WorldWideScience

Sample records for two-level parallel direct

  1. A two-level parallel direct search implementation for arbitrarily sized objective functions

    Energy Technology Data Exchange (ETDEWEB)

    Hutchinson, S.A.; Shadid, N.; Moffat, H.K. [Sandia National Labs., Albuquerque, NM (United States)] [and others]

    1994-12-31

    In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency in one of two ways. When the objective function calculations are large and expensive, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p2) of processors. If p, the number of processors available, is greater than or equal to 2p2, then the optimization may be parallelized as well. This allows for efficient use of computational resources, since the objective function calculations can be performed on the number of processors that gives peak parallel efficiency, and further speedup may then be achieved by parallelizing the optimization. Results are presented for an optimization problem in which the objective function calculation involves the solution of a PDE with a finite-element algorithm. The optimum number of processors for the finite-element calculations is less than p/2, so the PDS method is also parallelized. Performance comparisons are given for an nCUBE 2 implementation.
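
    The two-level idea above can be summarized in a few lines. The following is a minimal sketch, assuming only that p processor ranks are partitioned into groups of p2 ranks, with each group evaluating one trial point of the direct search; it is not the authors' nCUBE 2 implementation, and the function and variable names are hypothetical.

```python
# Illustrative sketch (not the paper's nCUBE 2 code): split p processor ranks
# into groups of size p2, so that p // p2 objective-function evaluations can
# run concurrently while each evaluation itself uses p2 processors.

def two_level_partition(p, p2):
    """Partition processor ranks 0..p-1 into groups of size p2.

    Each group evaluates one trial point of the direct search; the number
    of groups is the degree of parallelism available to the optimizer itself.
    """
    if p < p2:
        raise ValueError("need at least p2 processors for one evaluation")
    return [list(range(g * p2, (g + 1) * p2)) for g in range(p // p2)]

if __name__ == "__main__":
    # Example: 16 processors, objective function runs best on 4 processors,
    # so 4 trial points of the parallel direct search can be evaluated at once.
    for i, group in enumerate(two_level_partition(16, 4)):
        print(f"search point {i} -> processors {group}")
```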

  2. A novel two-level dynamic parallel data scheme for large 3-D SN calculations

    International Nuclear Information System (INIS)

    Sjoden, G.E.; Shedlock, D.; Haghighat, A.; Yi, C.

    2005-01-01

    We introduce a new dynamic parallel memory optimization scheme for executing large scale 3-D discrete ordinates (Sn) simulations on distributed memory parallel computers. In order for parallel transport codes to be truly scalable, they must use parallel data storage, where only the variables that are locally computed are locally stored. Even with parallel data storage for the angular variables, cumulative storage requirements for large discrete ordinates calculations can be prohibitive. To address this problem, Memory Tuning has been implemented into the PENTRAN 3-D parallel discrete ordinates code as an optimized, two-level ('large' array, 'small' array) parallel data storage scheme. Memory Tuning can be described as the process of parallel data memory optimization. Memory Tuning dynamically minimizes the amount of required parallel data in allocated memory on each processor using a statistical sampling algorithm. This algorithm is based on the integral average and standard deviation of the number of fine meshes contained in each coarse mesh in the global problem. Because PENTRAN only stores the locally computed problem phase space, optimal two-level memory assignments can be unique on each node, depending upon the parallel decomposition used (hybrid combinations of angular, energy, or spatial). As demonstrated in the two large discrete ordinates models presented (a storage cask and an OECD MOX Benchmark), Memory Tuning can save a substantial amount of memory per parallel processor, allowing one to accomplish very large scale Sn computations. (authors)
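
    The abstract does not give the Memory Tuning algorithm itself, but the statistical idea it describes (sizing storage from the mean and standard deviation of the number of fine meshes per coarse mesh) can be illustrated with a rough sketch. Everything below, including the threshold rule and the "large"/"small" labels per coarse mesh, is an assumption for illustration, not PENTRAN's actual scheme.

```python
# Rough sketch of a two-level ("large"/"small") storage assignment driven by
# the mean and standard deviation of fine meshes per coarse mesh; the actual
# PENTRAN Memory Tuning algorithm is more involved and node-dependent.
import statistics

def assign_storage(fine_mesh_counts, k=1.0):
    mean = statistics.mean(fine_mesh_counts)
    std = statistics.pstdev(fine_mesh_counts)
    threshold = mean + k * std
    # Coarse meshes at or below the threshold share a common "small" array
    # size; the few larger ones get individually sized "large" arrays.
    small_size = int(round(threshold))
    plan = {}
    for cm, n in enumerate(fine_mesh_counts):
        plan[cm] = ("small", small_size) if n <= threshold else ("large", n)
    return plan

print(assign_storage([120, 140, 130, 900, 125]))
```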

  3. Parallel Factor-Based Model for Two-Dimensional Direction Estimation

    Directory of Open Access Journals (Sweden)

    Nizar Tayem

    2017-01-01

    Two-dimensional (2D) direction-of-arrival (DOA) estimation of elevation and azimuth angles for noncoherent, mixed coherent and noncoherent, and coherent sources using extended three parallel uniform linear arrays (ULAs) is proposed. Most existing schemes have drawbacks when estimating 2D DOAs for multiple narrowband incident sources: they require a large number of snapshots, suffer from estimation failure for elevation and azimuth angles in the range typical of mobile communication, and cannot handle coherent sources. Moreover, DOA estimation for multiple sources requires complex pair-matching methods. The algorithm proposed in this paper is based on a first-order data matrix to overcome these problems. Its main contributions are as follows: (1) it avoids the estimation failure problem using a new antenna configuration and estimates elevation and azimuth angles for coherent sources; (2) it reduces the estimation complexity by constructing Toeplitz data matrices based on a single snapshot or a few snapshots; (3) it derives a parallel factor (PARAFAC) model to avoid pair-matching problems between multiple sources. Simulation results demonstrate the effectiveness of the proposed algorithm.
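
    As a small illustration of the kind of single-snapshot Toeplitz construction mentioned in contribution (2), the sketch below builds a Toeplitz-structured data matrix from one snapshot of an M-element array. The matrix size L and the exact construction are assumptions; the paper's matrices may differ in dimensions and conjugation conventions.

```python
import numpy as np

def toeplitz_from_snapshot(x, L):
    """Build an L x (M - L + 1) Toeplitz-structured data matrix from a single
    snapshot x of an M-element array (illustrative construction only)."""
    M = len(x)
    # Entry (i, j) depends only on j - i, so all diagonals are constant.
    return np.array([[x[(L - 1) - i + j] for j in range(M - L + 1)]
                     for i in range(L)])

snapshot = np.exp(1j * 2 * np.pi * 0.1 * np.arange(8))  # synthetic single snapshot
print(toeplitz_from_snapshot(snapshot, L=4).shape)       # -> (4, 5)
```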

  4. Series-parallel method of direct solar array regulation

    Science.gov (United States)

    Gooder, S. T.

    1976-01-01

    A 40-watt experimental solar array was directly regulated by shorting out appropriate combinations of series and parallel segments of a solar array. Regulation switches were employed to control the array at various set-point voltages between 25 and 40 volts. Regulation to within ±0.5 volt was obtained over a range of solar array temperatures and illumination levels as an active load was varied from open circuit to maximum available power. A fourfold reduction in regulation switch power dissipation was achieved with series-parallel regulation as compared to the usual series-only switching for direct solar array regulation.

  5. Cross-Circulating Current Suppression Method for Parallel Three-Phase Two-Level Inverters

    DEFF Research Database (Denmark)

    Wei, Baoze; Guerrero, Josep M.; Guo, Xiaoqiang

    2015-01-01

    The parallel architecture is very popular for increasing the power level of power inverters. This paper presents a method for the parallel operation of inverters in an ac-distributed system that suppresses the cross-circulating current based on virtual impedance, without a current-sharing bus...

  6. Parallel sparse direct solver for integrated circuit simulation

    CERN Document Server

    Chen, Xiaoming; Yang, Huazhong

    2017-01-01

    This book describes algorithmic methods and parallelization techniques to design a parallel sparse direct solver which is specifically targeted at integrated circuit simulation problems. The authors describe a complete flow and detailed parallel algorithms of the sparse direct solver. They also show how to improve the performance by simple but effective numerical techniques. The sparse direct solver techniques described can be applied to any SPICE-like integrated circuit simulator and have been proven to be high-performance in actual circuit simulation. Readers will benefit from the state-of-the-art parallel integrated circuit simulation techniques described in this book, especially the latest parallel sparse matrix solution techniques. · Introduces complicated algorithms of sparse linear solvers, using concise principles and simple examples, without complex theory or lengthy derivations; · Describes a parallel sparse direct solver that can be adopted to accelerate any SPICE-like integrated circuit simulato...

  7. Evidence for parallel consolidation of motion direction and orientation into visual short-term memory.

    Science.gov (United States)

    Rideaux, Reuben; Apthorp, Deborah; Edwards, Mark

    2015-02-12

    Recent findings have indicated that the capacity to consolidate multiple items into visual short-term memory in parallel varies as a function of the type of information. That is, while color can be consolidated in parallel, evidence suggests that orientation cannot. Here we investigated the capacity to consolidate multiple motion directions in parallel and reexamined this capacity for orientation. This was achieved by determining the shortest exposure duration necessary to consolidate a single item, then examining whether two items, presented simultaneously, could be consolidated in that time. The results show that parallel consolidation of direction and orientation information is possible, and that parallel consolidation of direction appears to be limited to two items. Additionally, we demonstrate the importance of adequate separation between the feature intervals used to define items when attempting to consolidate them in parallel, suggesting that when multiple items are consolidated in parallel, as opposed to serially, the resolution of their representations suffers. Finally, we used facilitation of spatial attention to show that the deterioration of item resolution occurs during parallel consolidation, as opposed to storage. © 2015 ARVO.

  8. Parallel alternating direction preconditioner for isogeometric simulations of explicit dynamics

    KAUST Repository

    Łoś, Marcin

    2015-04-27

    In this paper we present a parallel implementation of the alternating direction preconditioner for isogeometric simulations of explicit dynamics. The Alternating Direction Implicit (ADI) algorithm, which belongs to the category of matrix-splitting iterative methods, was proposed almost six decades ago for solving parabolic and elliptic partial differential equations; see [1–4]. A new version of this algorithm has recently been developed for isogeometric simulations of two-dimensional explicit dynamics [5] and steady-state diffusion equations with orthotropic heterogeneous coefficients [6]. In this paper we present a parallel version of the alternating direction implicit algorithm for three-dimensional simulations. The algorithm has been incorporated as part of PETIGA, an isogeometric framework [7] built on top of PETSc [8]. We show the scalability of the parallel algorithm on the STAMPEDE Linux cluster on up to 10,000 processors, as well as the convergence rate of the PCG solver with the ADI algorithm as a preconditioner.
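
    For readers unfamiliar with the ADI idea the preconditioner builds on, here is a minimal Peaceman-Rachford-style sweep for the 2D heat equation on a square grid: each half-step is implicit in one direction and explicit in the other, then the directions are swapped. This is only a sketch of the alternating-direction principle, not the isogeometric preconditioner of the paper; the grid, boundary conditions, and dense tridiagonal solves are simplifying assumptions.

```python
# Minimal ADI sweep for u_t = u_xx + u_yy on a square grid with zero
# Dirichlet boundaries (sketch of the alternating-direction principle).
import numpy as np

def adi_step(u, alpha):
    n = u.shape[0]
    I = np.eye(n)
    # 1D second-difference operator (Dirichlet boundaries outside the grid)
    D = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A_impl = I - alpha * D                     # tridiagonal; dense solve kept for brevity
    rhs1 = u + alpha * (u @ D)                 # explicit in y
    u_half = np.linalg.solve(A_impl, rhs1)     # implicit in x (acts along axis 0)
    rhs2 = u_half + alpha * (D @ u_half)       # explicit in x
    u_new = np.linalg.solve(A_impl, rhs2.T).T  # implicit in y (acts along axis 1)
    return u_new

u = np.zeros((32, 32))
u[12:20, 12:20] = 1.0                          # initial hot patch
for _ in range(10):
    u = adi_step(u, alpha=0.1)
print(float(u.max()))                          # peak decays as heat diffuses
```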

  9. Data-parallel tomographic reconstruction : A comparison of filtered backprojection and direct Fourier reconstruction

    NARCIS (Netherlands)

    Roerdink, J.B.T.M.; Westenberg, M.A.

    1998-01-01

    We consider the parallelization of two standard 2D reconstruction algorithms, filtered backprojection and direct Fourier reconstruction, using the data-parallel programming style. The algorithms are implemented on a Connection Machine CM-5 with 16 processors and a peak performance of 2 Gflop/s.

  10. Continuous liquid level detection based on two parallel plastic optical fibers in a helical structure

    Science.gov (United States)

    Zhang, Yingzi; Hou, Yulong; Zhang, Yanjun; Hu, Yanjun; Zhang, Liang; Gao, Xiaolong; Zhang, Huixin; Liu, Wenyi

    2018-02-01

    A simple and low-cost continuous liquid-level sensor based on two parallel plastic optical fibers (POFs) in a helical structure is presented. The change in the liquid level is determined by measuring the side-coupling power in the passive fiber. The side-coupling ratio is increased by just filling the gap between the two POFs with ultraviolet-curable optical cement, making the proposed sensor competitive. The experimental results show that the side-coupling power declines as the liquid level rises. The sensitivity and the measurement range are flexible and affected by the geometric parameters of the helical structure. A higher sensitivity of 0.0208 μW/mm is acquired for a smaller curvature radius of 5 mm, and the measurement range can be expanded to 120 mm by enlarging the screw pitch to 40 mm. In addition, the reversibility and temperature dependence are studied. The proposed sensor is a cost-effective solution offering the advantages of a simple fabrication process, good reversibility, and compensable temperature dependence.

  11. Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm

    Science.gov (United States)

    Povitsky, A.

    1998-01-01

    In this research an efficient parallel algorithm for 3-D directionally split problems is developed. The proposed algorithm is based on a reformulated version of the pipelined Thomas algorithm that starts the backward-step computations immediately after the completion of the forward-step computations for the first portion of lines. This algorithm makes data available for other computational tasks while processors would otherwise be idle in the Thomas algorithm. The proposed 3-D directionally split solver is based on static scheduling of processors, in which local and non-local, data-dependent and data-independent computations are scheduled while processors are idle. A theoretical model of parallelization efficiency is used to define optimal parameters of the algorithm, to show an asymptotic parallelization penalty, and to obtain an optimal cover of the global domain with subdomains. It is shown by computational experiments and by the theoretical model that the proposed algorithm reduces the parallelization penalty by about a factor of two relative to the basic algorithm over the range of numbers of processors (subdomains) and grid nodes per subdomain considered.
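
    The building block being pipelined is the classical Thomas algorithm: a forward-elimination sweep followed by a backward-substitution sweep along one tridiagonal line. A plain serial version is sketched below for reference; the reformulated, pipelined parallel variant described in the abstract is not reproduced.

```python
# Serial Thomas algorithm for one tridiagonal line: forward elimination,
# then backward substitution (the two sweeps the pipelined solver overlaps).
def thomas(a, b, c, d):
    """Sub-diagonal a, diagonal b, super-diagonal c, right-hand side d.
    a[0] and c[-1] are unused."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # backward substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

print(thomas([0, 1, 1], [4, 4, 4], [1, 1, 0], [5, 6, 5]))  # -> [1.0, 1.0, 1.0]
```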

  12. Graph Grammar-Based Multi-Frontal Parallel Direct Solver for Two-Dimensional Isogeometric Analysis

    KAUST Repository

    Kuźnik, Krzysztof

    2012-06-02

    This paper introduces a graph-grammar-based model for developing a multi-thread multi-frontal parallel direct solver for the two-dimensional isogeometric finite element method. Execution of the solver algorithm is expressed as a sequence of graph grammar productions. At the beginning, productions construct the elimination tree with leaves corresponding to finite elements. The following sequence of graph grammar productions generates element frontal matrices at leaf nodes, merges matrices at parent nodes, and eliminates rows corresponding to fully assembled degrees of freedom. Finally, graph grammar productions are responsible for the root problem solution and the recursive backward substitutions. Expressing the solver algorithm by graph grammar productions allows us to explore the concurrency of the algorithm. The graph grammar productions are grouped into sets of independent tasks that can be executed concurrently. The resulting concurrent multi-frontal solver algorithm is implemented and tested on an NVIDIA GPU, providing O(N log N) execution time complexity, where N is the number of degrees of freedom. We have confirmed this complexity by solving problems with up to 1 million degrees of freedom on a 448-core GPU.
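
    The concurrency exploited by the solver comes from the elimination tree: nodes at the same level are independent, so their frontal-matrix work can proceed in parallel, leaves first and root last. The sketch below groups tree nodes into such concurrent task sets; the tree representation and function name are hypothetical and stand in for the paper's graph-grammar productions.

```python
# Group elimination-tree nodes by height above the leaves; each group is a
# set of independent tasks that could be executed concurrently.
from collections import defaultdict

def levels_bottom_up(parent):
    """parent[i] is the parent of node i (root has parent -1)."""
    children = defaultdict(list)
    for node, p in enumerate(parent):
        if p >= 0:
            children[p].append(node)
    height = {}
    def h(node):
        if node not in height:
            height[node] = 0 if not children[node] else 1 + max(h(c) for c in children[node])
        return height[node]
    groups = defaultdict(list)
    for node in range(len(parent)):
        groups[h(node)].append(node)
    return [groups[k] for k in sorted(groups)]

# 7-node balanced elimination tree: leaves 0-3, merged nodes 4-5, root 6
print(levels_bottom_up([4, 4, 5, 5, 6, 6, -1]))
# -> [[0, 1, 2, 3], [4, 5], [6]]  (leaf fronts first, root problem last)
```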

  13. Input-Parallel Output-Parallel Three-Level DC/DC Converters With Interleaving Control Strategy for Minimizing and Balancing Capacitor Ripple Currents

    DEFF Research Database (Denmark)

    Liu, Dong; Deng, Fujin; Gong, Zheng

    2017-01-01

    In this paper, the input-parallel output-parallel (IPOP) three-level (TL) DC/DC converters associated with the interleaving control strategy are proposed for minimizing and balancing the capacitor ripple currents. The proposed converters consist of two four-switch half-bridge three-level (HBTL) DC...

  14. Two-state ion heating at quasi-parallel shocks

    International Nuclear Information System (INIS)

    Thomsen, M.F.; Gosling, J.T.; Bame, S.J.; Onsager, T.G.; Russell, C.T.

    1990-01-01

    In a previous study of ion heating at quasi-parallel shocks, the authors showed a case in which the ion distributions downstream from the shock alternated between a cooler, denser, core/shoulder type and a hotter, less dense, more Maxwellian type. In this paper they further document the alternating occurrence of two different ion states downstream from several quasi-parallel shocks. Three separate lines of evidence are presented to show that the two states are not related in an evolutionary sense, but rather both are produced alternately at the shock: (1) the asymptotic downstream plasma parameters (density, ion temperature, and flow speed) are intermediate between those characterizing the two different states closer to the shock, suggesting that the asymptotic state is produced by a mixing of the two initial states; (2) examples of apparently interpenetrating (i.e., mixing) distributions can be found during transitions from one state to the other; and (3) examples of both types of distributions can be found at actual crossings of the shock ramp. The alternation between the two different types of ion distribution provides direct observational support for the idea that the dissipative dynamics of at least some quasi-parallel shocks is non-stationary and cyclic in nature, as demonstrated by recent numerical simulations. Typical cycle times between intervals of similar ion heating states are ∼2 upstream ion gyroperiods. Both the simulations and the in situ observations indicate that a process of coherent ion reflection is commonly an important part of the dissipation at quasi-parallel shocks

  15. Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows

    Science.gov (United States)

    Moitra, Stuti; Gatski, Thomas B.

    1997-01-01

    A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.

  16. Direct drive digital servo press with high parallel control

    Science.gov (United States)

    Murata, Chikara; Yabe, Jun; Endou, Junichi; Hasegawa, Kiyoshi

    2013-12-01

    The direct drive digital servo press has been developed through university-industry joint research and development since 1998. On the basis of this work, a 4-axis direct drive digital servo press was developed and brought to market in April 2002. This servo press is composed of one slide supported by four ball screws, and each axis has a linear scale measuring its position with accuracy better than the micrometre level. Each axis is controlled independently by a servo motor and feedback system. This system can maintain a high level of parallelism and high accuracy even under a highly eccentric load. Furthermore, 'full stroke, full power' operation is obtained by using ball screws. Using these features, various new types of press forming and stamping have been developed and put into production. The new stamping and forming methods are introduced, together with a strategy for high-value-added press forming to meet manufacturing needs and the future direction of press forming.

  17. Parallel Machine Scheduling with Batch Delivery to Two Customers

    Directory of Open Access Journals (Sweden)

    Xueling Zhong

    2015-01-01

    In some make-to-order supply chains, the manufacturer needs to process and deliver products for customers at different locations. To coordinate production and distribution operations at the detailed scheduling level, we study a parallel machine scheduling model with batch delivery to two customers by a vehicle routing method. In this model, the supply chain consists of a processing facility with m parallel machines and two customers. A set of jobs containing n1 jobs from customer 1 and n2 jobs from customer 2 is first processed in the processing facility and then delivered to the customers directly, without intermediate inventory. The problem is to find a joint schedule of production and distribution such that the tradeoff between the maximum arrival time of the jobs and the total distribution cost is minimized. The distribution cost of a delivery shipment consists of a fixed charge and a variable cost proportional to the total distance of the route taken by the shipment. We provide polynomial-time heuristics with worst-case performance analysis for the problem. If m = 2 and (n1 - b)(n2 - b) < 0, we propose a heuristic with a worst-case ratio bound of 3/2, where b is the capacity of the delivery shipment. Otherwise, the worst-case ratio bound of the heuristic we propose is 2 - 2/(m + 1).
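
    To make the production/distribution coupling concrete, the sketch below assigns jobs to m machines longest-processing-time first and then ships each customer's jobs in batches of at most b in completion order. This is an illustrative toy heuristic only; it is not the paper's algorithm and does not carry its 3/2 or 2 - 2/(m + 1) worst-case guarantees.

```python
# Toy joint production/delivery heuristic (illustrative, not the paper's):
# longest-processing-time assignment to m machines, then batches of size <= b
# per customer in order of job completion.
import heapq

def schedule_and_batch(jobs, m, b):
    """jobs: list of (processing_time, customer); returns machine loads and batches."""
    loads = [(0.0, machine) for machine in range(m)]
    heapq.heapify(loads)
    completion = []                                  # (finish_time, customer)
    for p, cust in sorted(jobs, reverse=True):       # longest jobs first
        load, machine = heapq.heappop(loads)         # least-loaded machine
        heapq.heappush(loads, (load + p, machine))
        completion.append((load + p, cust))
    batches = {}
    for cust in (1, 2):
        ready = sorted(t for t, c in completion if c == cust)
        batches[cust] = [ready[i:i + b] for i in range(0, len(ready), b)]
    return sorted(loads), batches

print(schedule_and_batch([(3, 1), (2, 1), (4, 2), (1, 2), (2, 2)], m=2, b=2))
```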

  18. Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Blocksome, Michael A.; Mamidala, Amith R.

    2013-09-03

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  19. Fencing network direct memory access data transfers in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-07-07

    Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.

  20. Directions in parallel processor architecture, and GPUs too

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    Modern computing is power-limited in every domain of computing. Performance increments extracted from instruction-level parallelism (ILP) are no longer power-efficient; they haven't been for some time. Thread-level parallelism (TLP) is a more easily exploited form of parallelism, at the expense of programmer effort to expose it in the program. In this talk, I will introduce you to disparate topics in parallel processor architecture that will impact programming models (and you) in both the near and far future. About the speaker Olivier is a senior GPU (SM) architect at NVIDIA and an active participant in the concurrency working group of the ISO C++ committee. He has also worked on very large diesel engines as a mechanical engineer, and taught at McGill University (Canada) as a faculty instructor.

  1. Direct Power Control for Three-Phase Two-Level Voltage-Source Rectifiers Based on Extended-State Observation

    DEFF Research Database (Denmark)

    Song, Zhanfeng; Tian, Yanjun; Yan, Zhuo

    2016-01-01

    This paper proposes a direct power control strategy for three-phase two-level voltage-source rectifiers based on extended-state observation. Active and reactive powers are directly regulated in the stationary reference frame. Similar to the family of predictive controllers whose inherent...

  2. A path-level exact parallelization strategy for sequential simulation

    Science.gov (United States)

    Peredo, Oscar F.; Baeza, Daniel; Ortiz, Julián M.; Herrero, José R.

    2018-01-01

    Sequential simulation is a well-known method in geostatistical modelling. Following the Bayesian approach for simulation of conditionally dependent random events, the Sequential Indicator Simulation (SIS) method draws simulated values for K categories (categorical case) or classes defined by K different thresholds (continuous case). Similarly, the Sequential Gaussian Simulation (SGS) method draws simulated values from a multivariate Gaussian field. In this work, a path-level approach to parallelizing SIS and SGS is presented. A first stage re-arranges the simulation path, followed by a second stage of parallel simulation of non-conflicting nodes. A key advantage of the proposed parallelization method is that it generates realizations identical to those of the original non-parallelized methods. Case studies are presented using two sequential simulation codes from GSLIB: SISIM and SGSIM. Execution time and speedup results are shown for large-scale domains, with many categories and maximum kriging neighbours in each case, achieving high speedups in the best scenarios using 16 threads of execution on a single machine.
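
    The core idea of simulating "non-conflicting" nodes concurrently can be sketched as follows: nodes whose kriging neighbourhoods cannot overlap may be drawn in parallel. The grouping rule below (a simple distance threshold) and the function name are illustrative assumptions; the paper's strategy additionally preserves the random-number sequence so that realizations are identical to the serial ones.

```python
# Sketch: group path nodes whose search neighbourhoods cannot overlap, so
# that each group could be simulated in parallel (illustrative rule only).
import math

def group_non_conflicting(path, radius):
    """path: list of (x, y) nodes in simulation order; nodes closer than
    2*radius are treated as conflicting and go to different groups."""
    groups = []
    for node in path:
        placed = False
        for g in groups:
            if all(math.dist(node, other) >= 2 * radius for other in g):
                g.append(node)
                placed = True
                break
        if not placed:
            groups.append([node])
    return groups

path = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0)]
print(group_non_conflicting(path, radius=2.0))
```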

  3. Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

    Science.gov (United States)

    Qin, Cheng-Zhi; Zhan, Lijun

    2012-06-01

    As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of computing redundancy. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU
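
    The serial kernel that such GPU work parallelizes can be illustrated compactly: in a multiple-flow-direction pass, cells are visited from highest to lowest and each cell distributes its accumulated area to all lower neighbours in proportion to the elevation drop. The sketch below assumes a depression-free DEM and omits the MFD-md weighting exponent and all GPU scheduling.

```python
# Serial multiple-flow-direction accumulation on a small, depression-free DEM
# (illustrative baseline only; no CUDA, no MFD-md weighting exponent).
import numpy as np

def mfd_accumulation(dem):
    rows, cols = dem.shape
    acc = np.ones_like(dem, dtype=float)            # each cell contributes itself
    order = sorted(((dem[r, c], r, c) for r in range(rows) for c in range(cols)),
                   reverse=True)                    # process cells high to low
    for z, r, c in order:
        drops = {}
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols and dem[nr, nc] < z:
                    drops[(nr, nc)] = z - dem[nr, nc]
        total = sum(drops.values())
        for (nr, nc), d in drops.items():           # split flow by elevation drop
            acc[nr, nc] += acc[r, c] * d / total
    return acc

dem = np.array([[3., 2., 1.],
                [4., 3., 2.],
                [5., 4., 3.]])
print(mfd_accumulation(dem))
```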

  4. Coherent effects on two-photon correlation and directional emission of two two-level atoms

    International Nuclear Information System (INIS)

    Ooi, C. H. Raymond; Kim, Byung-Gyu; Lee, Hai-Woong

    2007-01-01

    Sub- and superradiant dynamics of spontaneously decaying atoms are manifestations of collective many-body systems. We study the internal dynamics and the radiation properties of two atoms in free space. Interesting results are obtained when the atoms are separated by less than half a wavelength of the atomic transition, where the dipole-dipole interaction gives rise to new coherent effects, such as (a) coherence between two intermediate collective states, (b) oscillations in the two-photon correlation G(2), (c) emission of two photons by one atom, and (d) the loss of directional correlation. We compare the population dynamics during the two-photon emission process with the dynamics of single-photon emission in the cases of a Λ and a V scheme. We compute the temporal correlation and angular correlation of two successively emitted photons using G(2) for different values of the atomic separation. We find antibunching when the atomic separation is a quarter wavelength λ/4. Oscillations in the temporal correlation provide a useful feature for measuring subwavelength atomic separation. Strong directional correlation between two emitted photons is found for atomic separations larger than a wavelength. We also compare the directionality of a photon spontaneously emitted by the two atoms prepared in the phased-symmetric and phased-antisymmetric entangled states |±⟩_k0 = e^(ik0·r1)|a1,b2⟩ ± e^(ik0·r2)|b1,a2⟩ by a laser pulse with wave vector k0. Photon emission is directionally suppressed along k0 for the phased-antisymmetric state. The directionality ceases for interatomic distances less than λ/2

  5. A Parallel Two-fluid Code for Global Magnetic Reconnection Studies

    International Nuclear Information System (INIS)

    Breslau, J.A.; Jardin, S.C.

    2001-01-01

    This paper describes a new algorithm for the computation of two-dimensional resistive magnetohydrodynamic (MHD) and two-fluid studies of magnetic reconnection in plasmas. It has been implemented on several parallel platforms and shows good scalability up to 32 CPUs for reasonable problem sizes. A fixed, nonuniform rectangular mesh is used to resolve the different spatial scales in the reconnection problem. The resistive MHD version of the code uses an implicit/explicit hybrid method, while the two-fluid version uses an alternating-direction implicit (ADI) method. The technique has proven useful for comparing several different theories of collisional and collisionless reconnection

  6. An Automatic Instruction-Level Parallelization of Machine Code

    Directory of Open Access Journals (Sweden)

    MARINKOVIC, V.

    2018-02-01

    Prevailing multicores and novel manycores have created a great challenge for the modern day: parallelization of embedded software that is still written as sequential code. In this paper, automatic code parallelization is considered, focusing on developing a parallelization tool at the binary level as well as on the validation of this approach. A novel instruction-level parallelization algorithm for assembly code is developed; it uses the register names after SSA conversion to find independent blocks of code and then schedules the independent blocks using METIS to achieve good load balance. Sequential consistency is verified, and validation is done by measuring program execution time on the target architecture. Large speedup, taken as the performance measure in the validation process, and optimal load balancing are achieved for multicore RISC processors with 2 to 16 cores (e.g., MIPS, MicroBlaze). In particular, for 16 cores, the average speedup is 7.92x, while in some cases it reaches 14x. The approach to automatic parallelization provided by this paper is useful to researchers and developers in the area of parallelization as a basis for further optimizations, as the back-end of a compiler, or as a code parallelization tool for an embedded system.

  7. Computer-Aided Parallelizer and Optimizer

    Science.gov (United States)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components of the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Because of the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops at the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts have also been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  8. Parallel Libraries to support High-Level Programming

    DEFF Research Database (Denmark)

    Larsen, Morten Nørgaard

    and the Microsoft .NET framework. Normally, one would not directly think of the .NET framework when talking about scientific applications, but Microsoft has in the last couple of versions of .NET introduced a number of tools for writing parallel and high-performance code. The first section examines how programmers can...

  9. Towards a streaming model for nested data parallelism

    DEFF Research Database (Denmark)

    Madsen, Frederik Meisner; Filinski, Andrzej

    2013-01-01

    The language-integrated cost semantics for nested data parallelism pioneered by NESL provides an intuitive, high-level model for predicting performance and scalability of parallel algorithms with reasonable accuracy. However, this predictability, obtained through a uniform, parallelism-flattening... -processable in a streaming fashion. This semantics is directly compatible with previously proposed piecewise execution models for nested data parallelism, but allows the expected space usage to be reasoned about directly at the source-language level. The language definition and implementation are still very much work...

  10. Feed-forward volume rendering algorithm for moderately parallel MIMD machines

    Science.gov (United States)

    Yagel, Roni

    1993-01-01

    Algorithms for direct volume rendering on parallel and vector processors are investigated. Volumes are transformed efficiently on parallel processors by dividing the data into slices and beams of voxels. Equal sized sets of slices along one axis are distributed to processors. Parallelism is achieved at two levels. Because each slice can be transformed independently of others, processors transform their assigned slices with no communication, thus providing maximum possible parallelism at the first level. Within each slice, consecutive beams are incrementally transformed using coherency in the transformation computation. Also, coherency across slices can be exploited to further enhance performance. This coherency yields the second level of parallelism through the use of the vector processing or pipelining. Other ongoing efforts include investigations into image reconstruction techniques, load balancing strategies, and improving performance.

  11. Direct kinematics solution architectures for industrial robot manipulators: Bit-serial versus parallel

    Science.gov (United States)

    Lee, J.; Kim, K.

    1991-01-01

    A Very Large Scale Integration (VLSI) architecture for robot direct kinematic computation suitable for industrial robot manipulators was investigated. The Denavit-Hartenberg transformations are reviewed to exploit a proper processing element, namely an augmented CORDIC. Specifically, two distinct implementations are elaborated on: bit-serial and parallel. The performance of each scheme is analyzed with respect to the time to compute one location of the end-effector of a 6-link manipulator and the number of transistors required.

  12. Direct kinematics solution architectures for industrial robot manipulators: Bit-serial versus parallel

    Science.gov (United States)

    Lee, J.; Kim, K.

    A Very Large Scale Integration (VLSI) architecture for robot direct kinematic computation suitable for industrial robot manipulators was investigated. The Denavit-Hartenberg transformations are reviewed to exploit a proper processing element, namely an augmented CORDIC. Specifically, two distinct implementations are elaborated on: bit-serial and parallel. The performance of each scheme is analyzed with respect to the time to compute one location of the end-effector of a 6-link manipulator and the number of transistors required.

  13. Parallel processing of two-dimensional Sn transport calculations

    International Nuclear Information System (INIS)

    Uematsu, M.

    1997-01-01

    A parallel processing method for the two-dimensional Sn transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation

  14. Adapting high-level language programs for parallel processing using data flow

    Science.gov (United States)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.

  15. Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs

    Directory of Open Access Journals (Sweden)

    Vaughn Matthew

    2010-11-01

    Background: Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories based on the data structures they employ: the first class uses an overlap/string graph and the second uses a de Bruijn graph. With the recent advances in short-read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm was given for this problem, where n is the size of the input and p is the number of processors. That algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). Results: In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model, and in this case it has an optimal I/O complexity of Θ(n log(n/B)/(B log(M/B))) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on an SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster, both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. Conclusions: The bi-directed
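
    The data structure at the centre of both algorithm classes can be illustrated with a tiny in-memory sketch: nodes are canonical (k-1)-mers and every k-mer occurrence in a read contributes one edge. The parallel, sorting-based and out-of-core construction that is the actual contribution of the paper is not reproduced here.

```python
# Tiny in-memory de Bruijn graph from short reads: canonical (k-1)-mer nodes,
# one edge per k-mer occurrence (illustration of the data structure only).
from collections import defaultdict

def revcomp(s):
    return s.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def canonical(s):
    return min(s, revcomp(s))            # one representative per strand pair

def debruijn(reads, k):
    edges = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            u, v = canonical(kmer[:-1]), canonical(kmer[1:])
            edges[(u, v)] += 1           # edge multiplicity = k-mer coverage
    return edges

print(debruijn(["ACGTACGT", "CGTACGTA"], k=4))
```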

  16. Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.

    Science.gov (United States)

    Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal

    2010-11-15

    Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ(n log(n/B)/(B log(M/B))) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster, both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. The bi-directed de Bruijn graph is a fundamental data structure for

  17. The effect of the flow direction inside the header on two-phase flow distribution in parallel vertical channels

    International Nuclear Information System (INIS)

    Marchitto, A.; Fossa, M.; Guglielmini, G.

    2012-01-01

    Uniform fluid distribution is essential for the efficient operation of chemical-processing equipment such as contactors, reactors, mixers and burners, and in most refrigeration equipment, where two phases act together. To obtain optimum distribution, proper consideration must be given to the flow behaviour in the distributor, the flow conditions upstream and downstream of the distributor, and the distribution requirements (fluid or phase) of the equipment. Even though the principles of single-phase distribution have been well developed for more than three decades, they are frequently not properly taken into account by equipment designers when a mixture is present, and a significant fraction of process equipment consequently suffers from maldistribution. The experimental investigation presented in this paper is aimed at understanding the main mechanisms which drive the flow distribution inside a two-phase horizontal header, in order to design improved distributors and to optimise the flow distribution inside compact heat exchangers. The experiments were devoted to establishing the influence of the inlet conditions and of the channel/distributor geometry on the phase/mass distribution into parallel vertical channels. The study is carried out with air–water mixtures and is based on the measurement of component flow rates in individual channels and on pressure drops across the distributor. The effects of the operating conditions, the header geometry and the inlet port nozzle were investigated in the ranges of liquid and gas superficial velocities of 0.2–1.2 and 1.5–16.5 m/s, respectively. In order to control the main flow direction inside the header, different fitting devices were tested; the insertion of a co-axial, multi-hole distributor inside the header confirmed the possibility of greatly improving the liquid and gas flow distribution by the proper selection of the position, diameter and number of the flow openings between the supplying distributor and the system of

  18. Directional Transport of a Liquid Drop between Parallel-Nonparallel Combinative Plates.

    Science.gov (United States)

    Huang, Yao; Hu, Liang; Chen, Wenyu; Fu, Xin; Ruan, Xiaodong; Xie, Haibo

    2018-04-17

    Liquids confined between two parallel plates can perform the function of transmission, support, or lubrication in many practical applications, so it is very important to keep such liquids stable within their working area. However, instabilities may lead to the formation of leaking drops outside the bulk liquid, and it is therefore necessary to transport the detached drops back without their overstepping the working area and causing destructive leakage to the system. In this study, we report a novel and facile method to solve this problem by introducing a wedgelike geometry into the parallel gap to form a parallel-nonparallel combinative construction. The transport performance of this structure was investigated. The criterion for self-propelled motion was established, and it appeared more difficult to meet than that in the nonparallel gap. We then performed a more detailed investigation of the drop dynamics under squeezing and relaxing modes, because drops reliably return in hydrophilic combinative gaps, whereas uncertainties arise in gaps with a weakly hydrophobic character. Through exploration of the transition mechanism of the drop motion state, a crucial factor named the turning point was identified and is believed to be directly related to the final state of the drops. On the basis of the theoretical model of the turning point, a criterion to identify whether a liquid drop returns to the parallel part under squeezing and relaxing modes was achieved. These criteria can provide guidance on parameter selection and structural optimization for the combinative gap, so that destructive leakage in practical production can be avoided.

  19. Exploiting Symmetry on Parallel Architectures.

    Science.gov (United States)

    Stiller, Lewis Benjamin

    1995-01-01

    This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs and discovered a number of new results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations of group transforms for dihedral and symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques and were used in the investigation of various physical phenomena.

  20. Hybrid parallelization of the XTOR-2F code for the simulation of two-fluid MHD instabilities in tokamaks

    Science.gov (United States)

    Marx, Alain; Lütjens, Hinrich

    2017-03-01

    A hybrid MPI/OpenMP parallel version of the XTOR-2F code [Lütjens and Luciani, J. Comput. Phys. 229 (2010) 8130], which solves the two-fluid MHD equations in full tokamak geometry by means of an iterative Newton-Krylov matrix-free method, has been developed. The present work shows that the code has been parallelized significantly despite the numerical profile of the problem solved by XTOR-2F, i.e. a discretization with pseudo-spectral representations in all angular directions, the stiffness of the two-fluid stability problem in tokamaks, and the use of a direct LU decomposition to invert the physical pre-conditioner at every Krylov iteration of the solver. The execution time of the parallelized version is an order of magnitude smaller than that of the sequential one for low-resolution cases, with an increasing speedup when the discretization mesh is refined. Moreover, it allows simulations with higher resolutions, previously out of reach because of memory limitations.

  1. An experimental study of two-phase flow instability on two parallel channel with low steam quality

    International Nuclear Information System (INIS)

    Jiang Shengyao; Wu Shaorong; Bo Jinhai; Yao Meisheng; Han Bing; Zhang Youjie

    1988-01-01

    Experimental results on two-phase flow instability in two parallel channels under natural circulation with low steam quality are presented. A comparison of the instability in a single channel and in the parallel channels is given. The effects of unequal inlet resistance coefficients and unequal power on the parallel-channel instability are described, and the behaviour of the instability with equal exit steam quality in the two channels is investigated

  2. Primal Domain Decomposition Method with Direct and Iterative Solver for Circuit-Field-Torque Coupled Parallel Finite Element Method to Electric Machine Modelling

    Directory of Open Access Journals (Sweden)

    Daniel Marcsa

    2015-01-01

    The analysis and design of electromechanical devices involve the solution of large sparse linear systems and therefore require high-performance algorithms. In this paper, the primal Domain Decomposition Method (DDM) with a parallel forward-backward solver and with a parallel Preconditioned Conjugate Gradient (PCG) solver is introduced into a two-dimensional parallel time-stepping finite element formulation to analyze a rotating machine, considering the electromagnetic field, the external circuit and the rotor movement. The proposed parallel direct solver and the iterative solver with two preconditioners are analyzed with respect to computational efficiency and the number of solver iterations obtained with the different preconditioners. Simulation results for a rotating machine are also presented.
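
    As background for the iterative branch of the comparison, a minimal preconditioned conjugate gradient loop is sketched below with a simple Jacobi (diagonal) preconditioner standing in for the DDM-specific preconditioners studied in the paper; it is illustrative only.

```python
# Minimal preconditioned conjugate gradient with a Jacobi preconditioner
# (stand-in for the paper's DDM preconditioners; illustration only).
import numpy as np

def pcg(A, b, tol=1e-10, maxit=1000):
    x = np.zeros_like(b)
    Minv = 1.0 / np.diag(A)              # Jacobi preconditioner
    r = b - A @ x
    z = Minv * r
    p = z.copy()
    rz = r @ z
    for it in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            return x, it + 1
        z = Minv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit

A = np.array([[4., 1., 0.], [1., 3., 1.], [0., 1., 5.]])
b = np.array([1., 2., 3.])
x, iters = pcg(A, b)
print(x, iters, np.allclose(A @ x, b))
```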

  3. Parallel Computing Characteristics of Two-Phase Thermal-Hydraulics code, CUPID

    International Nuclear Information System (INIS)

    Lee, Jae Ryong; Yoon, Han Young

    2013-01-01

    The parallelized CUPID code has proved capable of multi-dimensional thermal-hydraulic analysis, having been validated against various conceptual problems and experimental data. In this paper, the characteristics of the parallelized CUPID code were investigated. Both single- and two-phase simulations are taken into account. Since the scalability of a parallel simulation is known to be better for fine mesh systems, two types of mesh systems are considered. In addition, the dependence on the preconditioner of the matrix solver was also compared. The scalability for single-phase flow is better than that for two-phase flow because fewer iterations are needed to solve the pressure matrix. The CUPID code was parallelized with a domain decomposition method, and the MPI library was adopted to communicate the information at the interface cells. The parallel performance was investigated in terms of scalability: as the number of meshes increases, the scalability improves. For a given mesh, the single-phase flow simulation with a diagonal preconditioner shows the best speedup. However, for the two-phase flow simulation, the ILU preconditioner is recommended since it reduces the overall simulation time

  4. Abstract Level Parallelization of Finite Difference Methods

    Directory of Open Access Journals (Sweden)

    Edwin Vollebregt

    1997-01-01

    A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and the interactions between data items (“neighbouring relations”). The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automatic code generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing). This is important because message passing, unfortunately, is still the only approach that leads to acceptable performance for many unstructured or irregular problems on parallel computers that have non-uniform memory access times. It is shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.
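
    A minimal sketch of the index-set/stencil idea: from the index set a process owns and the stencil offsets, one can derive exactly which remote indices it must receive (its halo), which is the information a message-passing implementation needs. The names and the one-dimensional setting below are assumptions for illustration.

```python
# From an owned index set and a stencil, compute the halo: the indices a
# process needs from its neighbours for one stencil application.
def halo(own, stencil, domain):
    """own: set of locally owned indices; stencil: iterable of offsets;
    domain: set of all valid indices. Returns indices needed but not owned."""
    needed = {i + s for i in own for s in stencil}
    return (needed & domain) - own

domain = set(range(20))
own = set(range(5, 10))                                      # this process owns 5..9
print(sorted(halo(own, stencil=(-1, 0, 1), domain=domain)))  # -> [4, 10]
```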

  5. Fabrication of Si-nozzles for parallel mechano-electrospinning direct writing

    International Nuclear Information System (INIS)

    Pan, Yanqiao; Huang, YongAn; Bu, Ningbin; Yin, Zhouping

    2013-01-01

    Nozzles with micro-scale orifices drive high-resolution printing techniques for generating micro- to nano-scale droplets/lines. This paper presents the fabrication and application of Si-nozzles in mechano-electrospinning (MES). The fabrication process mainly consists of photolithography, Au deposition, inductively coupled plasma, and polydimethylsiloxane encapsulation. A 6 wt% polyethylene oxide solution is adopted to study the electrospinning behaviour and the relations between fibre diameter and process parameters in MES. A fibre grid with 250 µm spacing can be direct-written, with fibre diameters of less than 3 µm. To improve the printing efficiency, positioning accuracy and flexibility, a rotatable multi-nozzle is adopted. The distance between parallel lines reduces sharply from 4.927 to 0.308 mm as the rotating angle increases from 0° to 87°, and fibre grids with tunable spacing are achieved. This method paves the way for the fabrication of addressable Si-nozzle arrays for parallel MES direct writing. (paper)

  6. GRAVIDY, a GPU modular, parallel direct-summation N-body integrator: dynamics with softening

    Science.gov (United States)

    Maureira-Fredes, Cristián; Amaro-Seoane, Pau

    2018-01-01

    A wide variety of outstanding problems in astrophysics involve the motion of a large number of particles under the force of gravity. These include the global evolution of globular clusters, tidal disruptions of stars by a massive black hole, the formation of protoplanets and sources of gravitational radiation. The direct summation of N gravitational forces is a complex problem with no analytical solution and can only be tackled with approximations and numerical methods. To this end, the Hermite scheme is a widely used integration method. With different numerical techniques and special-purpose hardware, it can be used to speed up the calculations. But these methods tend to be computationally slow and cumbersome to work with. We present a new graphics processing unit (GPU), direct-summation N-body integrator written from scratch and based on this scheme, which includes relativistic corrections for sources of gravitational radiation. GRAVIDY has high modularity, allowing users to readily introduce new physics; it exploits available computational resources and will be maintained by regular updates. GRAVIDY can be used in parallel on multiple CPUs and GPUs, with a considerable speed-up benefit. The single-GPU version is between one and two orders of magnitude faster than the single-CPU version. A test run using four GPUs in parallel shows a speed-up factor of about 3 as compared to the single-GPU version. The conception and design of this first release is aimed at users with access to traditional parallel CPU clusters or computational nodes with one or a few GPU cards.
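    For orientation, the kernel at the heart of any direct-summation integrator is the O(N^2) evaluation of softened pairwise accelerations; the sketch below is a generic NumPy version of that kernel (not GRAVIDY's GPU implementation, and the unit system and softening length are arbitrary assumptions).

```python
import numpy as np

def accelerations(pos, mass, eps=1e-2, G=1.0):
    """pos: (N, 3) positions, mass: (N,) masses, eps: Plummer softening length."""
    dr = pos[None, :, :] - pos[:, None, :]          # r_j - r_i for all pairs
    r2 = np.sum(dr * dr, axis=-1) + eps**2          # softened squared distances
    np.fill_diagonal(r2, np.inf)                    # exclude self-interaction
    inv_r3 = r2 ** -1.5
    # a_i = G * sum_j m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^{3/2}
    return G * np.sum(mass[None, :, None] * dr * inv_r3[:, :, None], axis=1)

rng = np.random.default_rng(1)
pos = rng.standard_normal((1024, 3))
mass = np.full(1024, 1.0 / 1024)
a = accelerations(pos, mass)
print(a.shape)   # (1024, 3)
```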

  7. A DIRECT METHOD TO DETERMINE THE PARALLEL MEAN FREE PATH OF SOLAR ENERGETIC PARTICLES WITH ADIABATIC FOCUSING

    International Nuclear Information System (INIS)

    He, H.-Q.; Wan, W.

    2012-01-01

    The parallel mean free path of solar energetic particles (SEPs), which is determined by physical properties of SEPs as well as those of solar wind, is a very important parameter in space physics to study the transport of charged energetic particles in the heliosphere, especially for space weather forecasting. In space weather practice, it is necessary to find a quick approach to obtain the parallel mean free path of SEPs for a solar event. In addition, the adiabatic focusing effect caused by a spatially varying mean magnetic field in the solar system is important to the transport processes of SEPs. Recently, Shalchi presented an analytical description of the parallel diffusion coefficient with adiabatic focusing. Based on Shalchi's results, in this paper we provide a direct analytical formula as a function of parameters concerning the physical properties of SEPs and solar wind to directly and quickly determine the parallel mean free path of SEPs with adiabatic focusing. Since all of the quantities in the analytical formula can be directly observed by spacecraft, this direct method would be a very useful tool in space weather research. As applications of the direct method, we investigate the inherent relations between the parallel mean free path and various parameters concerning physical properties of SEPs and solar wind. Comparisons of parallel mean free paths with and without adiabatic focusing are also presented.
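    For reference, the textbook relations assumed in this kind of analysis (not Shalchi's specific formula, which is not reproduced here) connect the parallel mean free path to the parallel diffusion coefficient and define the focusing length from the mean-field gradient, in one common sign convention:

```latex
\lambda_\parallel = \frac{3\,\kappa_\parallel}{v},
\qquad
L(z) = -\,\frac{B(z)}{\partial B/\partial z}
```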

  8. Direct numerical simulation of bubbles with parallelized adaptive mesh refinement

    International Nuclear Information System (INIS)

    Talpaert, A.

    2015-01-01

    The study of two-phase thermal-hydraulics is a major topic for nuclear engineering, for both the safety and the efficiency of nuclear facilities. In addition to experiments, numerical modeling helps to know precisely where bubbles appear and how they behave, in the core as well as in the steam generators. This work presents the finest scale of representation of two-phase flows, Direct Numerical Simulation of bubbles. We use the 'Di-phasic Low Mach Number' equation model. It is particularly adapted to low-Mach-number flows, that is to say, flows whose velocity is much slower than the speed of sound; this is very typical of nuclear thermal-hydraulics conditions. Because we study bubbles, we capture the front between the vapor and liquid phases thanks to a downward flux limiting numerical scheme. The specific discrete analysis technique this work introduces is well-balanced parallel Adaptive Mesh Refinement (AMR). With AMR, we refine the coarse grid with a set of patches in order to locally increase precision in the areas that matter most and to capture fine changes in the front location and topology. We show that patch-based AMR is well suited to parallel computing. We use a variety of physical examples: forced advection, heat transfer, phase changes represented by a Stefan model, as well as the combination of all those models. We present the results of those numerical simulations, as well as the speed-up compared to equivalent non-AMR simulations and to serial computation of the same problems. This document is made up of an abstract and the slides of the presentation. (author)
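    The sketch below illustrates the basic patch-based AMR step of flagging cells near a steep front and building a patch around them; it is a deliberately crude stand-in (the gradient threshold and single bounding-box patch are assumptions), not the well-balanced scheme of the thesis.

```python
import numpy as np

def flag_cells(field, dx, threshold):
    """Flag cells where the gradient magnitude of a marker field is steep."""
    gx, gy = np.gradient(field, dx)
    return np.hypot(gx, gy) > threshold

def bounding_patch(flags):
    """Smallest rectangular patch covering all flagged cells (a crude patch generator)."""
    ii, jj = np.nonzero(flags)
    if ii.size == 0:
        return None
    return (ii.min(), ii.max() + 1), (jj.min(), jj.max() + 1)

# Example: a circular "bubble" interface on a coarse 64x64 grid.
n, dx = 64, 1.0 / 64
x, y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n), indexing="ij")
volume_fraction = (np.hypot(x - 0.5, y - 0.5) < 0.2).astype(float)
flags = flag_cells(volume_fraction, dx, threshold=1.0)
print("patch to refine:", bounding_patch(flags))
```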

  9. Out-of-order parallel discrete event simulation for electronic system-level design

    CERN Document Server

    Chen, Weiwei

    2014-01-01

    This book offers readers a set of new approaches, tools and techniques for facing the challenges of parallelization in the design of embedded systems. It provides an advanced parallel simulation infrastructure for efficient and effective system-level model validation and development, so as to build better products in less time. Since parallel discrete event simulation (PDES) has the potential to exploit the underlying parallel computational capability of today's multi-core simulation hosts, the author begins by reviewing the parallelization of discrete event simulation, identifying ...

  10. Bi-directional series-parallel elastic actuator and overlap of the actuation layers.

    Science.gov (United States)

    Furnémont, Raphaël; Mathijssen, Glenn; Verstraten, Tom; Lefeber, Dirk; Vanderborght, Bram

    2016-01-27

    Several robotics applications require high torque-to-weight ratio and energy-efficient actuators. Progress in that direction was made by introducing compliant elements into the actuation. A large variety of actuators were developed, such as series elastic actuators (SEAs), variable stiffness actuators and parallel elastic actuators (PEAs). SEAs can reduce the peak power while PEAs can reduce the torque requirement on the motor. Nonetheless, these actuators still cannot match performance close to that of humans. To combine both advantages, the series parallel elastic actuator (SPEA) was developed. The principle is inspired by biological muscles. Muscles are composed of motor units, placed in parallel, which are variably recruited as the required effort increases. This biological principle is exploited in the SPEA, where springs (layers), placed in parallel, can be recruited one by one. This recruitment is performed by an intermittent mechanism. This paper presents the development of a SPEA using the MACCEPA principle with a self-closing mechanism. This actuator can deliver a bi-directional output torque, variable stiffness and reduced friction. The load on the motor can also be reduced, leading to a lower power consumption. The variable recruitment of the parallel springs can also be tuned in order to further decrease the consumption of the actuator for a given task. First, an explanation of the concept and a brief description of the prior work will be given. Next, the design and the model of one of the layers will be presented. The working principle of the full actuator will then be given. At the end of this paper, experiments showing the electric consumption of the actuator will demonstrate the advantage of the SPEA over an equivalent stiff actuator.

  11. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes

    Directory of Open Access Journals (Sweden)

    Cronn Richard

    2009-12-01

    Full Text Available Abstract Background Molecular evolutionary studies share the common goal of elucidating historical relationships, and the common challenge of adequately sampling taxa and characters. Particularly at low taxonomic levels, recent divergence, rapid radiations, and conservative genome evolution yield limited sequence variation, and dense taxon sampling is often desirable. Recent advances in massively parallel sequencing make it possible to rapidly obtain large amounts of sequence data, and multiplexing makes extensive sampling of megabase sequences feasible. Is it possible to efficiently apply massively parallel sequencing to increase phylogenetic resolution at low taxonomic levels? Results We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. 30/33 ingroup nodes resolved with ≥ 95% bootstrap support; this is a substantial improvement relative to prior studies, and shows massively parallel sequencing-based strategies can produce sufficient high quality sequence to reach support levels originally proposed for the phylogenetic bootstrap. Resampling simulations show that at least the entire plastome is necessary to fully resolve Pinus, particularly in rapidly radiating clades. Meta-analysis of 99 published infrageneric phylogenies shows that whole plastome analysis should provide similar gains across a range of plant genera. A disproportionate amount of phylogenetic information resides in two loci (ycf1, ycf2), highlighting their unusual evolutionary properties. Conclusion Plastome sequencing is now an efficient option for increasing phylogenetic resolution at lower taxonomic levels in plant phylogenetic and population genetic analyses. With continuing improvements in sequencing capacity, the strategies herein should revolutionize efforts requiring dense taxon and character sampling.

  12. Co-ordination of directional overcurrent protection with load current for parallel feeders

    Energy Technology Data Exchange (ETDEWEB)

    Wright, J.W.; Lloyd, G.; Hindle, P.J. [Alstom, Inc., Stafford (United Kingdom). T and D Protection and Control

    1999-11-01

    Directional phase overcurrent relays are commonly applied at the receiving ends of parallel feeders or transformer feeders. Their purpose is to ensure full discrimination of main or back-up power system overcurrent protection for a fault near the receiving end of one feeder. This paper reviews this type of relay application and highlights load current setting constraints for directional protection. Such constraints have not previously been publicized in well-known text books. A directional relay current setting constraint that is suggested in some text books is based purely on thermal rating considerations for older technology relays. This constraint may not exist with modern numerical relays. In the absence of any apparent constraint, there is a temptation to adopt lower current settings with modern directional relays in relation to reverse load current at the receiving ends of parallel feeders. This paper identifies the danger of adopting very low current settings without any special relay feature to ensure protection security with load current during power system faults. A system incident recorded by numerical relays is also offered to highlight this danger. In cases where there is a need to infringe the identified constraints, an implemented and tested relaying technique is proposed.

  13. Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

    Science.gov (United States)

    Joslin, Ronald D.; Zubair, Mohammad

    1993-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube are documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. As the number of processors is increased, nearly ideal linear speedups are achieved with non-optimized routines, while slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and this routine itself achieves less than ideal speedups. However, with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single-processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model, which reduces the required number of grid points and turns the approach into a large-eddy simulation (PSLES), would reduce the computational cost and memory requirements by a factor of 10 relative to the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.

  14. Parallel and distributed processing in two SGBDS: A case study

    Directory of Open Access Journals (Sweden)

    Francisco Javier Moreno

    2017-04-01

    Full Text Available Context: One of the strategies for managing large volumes of data is distributed and parallel computing. Among the tools that allow applying these characteristics are some Data Base Management Systems (DBMS, such as Oracle, DB2, and SQL Server. Method: In this paper we present a case study where we evaluate the performance of an SQL query in two of these DBMS. The evaluation is done through various forms of data distribution in a computer network with different degrees of parallelism. Results: The tests of the SQL query evidenced the performance differences between the two DBMS analyzed. However, more thorough testing and a wider variety of queries are needed. Conclusions: The differences in performance between the two DBMSs analyzed show that when evaluating this aspect, it is necessary to consider the particularities of each DBMS and the degree of parallelism of the queries.

  15. Research on Two-channel Interleaved Two-stage Paralleled Buck DC-DC Converter for Plasma Cutting Power Supply

    DEFF Research Database (Denmark)

    Yang, Xi-jun; Qu, Hao; Yao, Chen

    2014-01-01

    For high-power plasma power supplies, the multi-channel interleaved multi-stage paralleled Buck DC-DC converter becomes the first choice due to its high efficiency and flexibility. In this paper, a two-channel interleaved two-stage paralleled Buck DC-DC converter powered by a three-phase AC power supply ...

  16. Two-fluid and parallel compressibility effects in tokamak plasmas

    International Nuclear Information System (INIS)

    Sugiyama, L.E.; Park, W.

    1998-01-01

    conductivity. The true parallel plasma dynamics are driven by additional kinetic processes that are not included in the fluid picture, but the basic fluid effects remain and should be understood first. The two-fluid code is part of a larger project, the M3D, or Multi-Level 3D project [3] for toroidal plasmas. Its goal is to develop a comprehensive suite of simulation models that cover a range of physics from simple to complex, starting from the fluid and progressing toward kinetic models. In addition to the new physics that it describes beyond MHD, the present two-fluid code provides a good base for adding additional, non-fluid effects in a fully electromagnetic and toroidal model. The use of gyrokinetic particle simulation to provide an improved closure for the ion fluid, combined with an electron fluid, is currently under development. (author)

  17. Linear predictions of supercritical flow instability in two parallel channels

    International Nuclear Information System (INIS)

    Shah, M.

    2008-01-01

    A steady-state linear code that can predict thermo-hydraulic instability boundaries in a two parallel channel system under supercritical conditions has been developed. Linear and non-linear solutions of the instability boundary in a two parallel channel system are also compared. The effect of gravity on the instability boundary in a two parallel channel system has been analyzed by changing the orientation of the system flow from horizontal flow to vertical up-flow and vertical down-flow. Vertical up-flow is found to be more unstable than horizontal flow, and vertical down-flow is found to be the most unstable configuration. The type of instability present in each flow orientation of a parallel channel system has been identified: the density wave oscillation type is observed in horizontal flow and vertical up-flow, while the static type of instability is observed in vertical down-flow for the cases studied here. The parameters affecting the instability boundary, such as the heating power, inlet temperature, and inlet and outlet K-factors, are varied to assess their effects. This study is important for the design of future Generation IV nuclear reactors in which supercritical light water is proposed as the primary coolant. (author)

  18. A two-level real-time vision machine combining coarse and fine grained parallelism

    DEFF Research Database (Denmark)

    Jensen, Lars Baunegaard With; Kjær-Nielsen, Anders; Pauwels, Karl

    2010-01-01

    In this paper, we describe a real-time vision machine having a stereo camera as input, generating visual information on two different levels of abstraction. The system provides visual low-level and mid-level information in terms of dense stereo and optical flow, egomotion, indicating areas ... a factor of 90 and a reduction of latency by a factor of 26 compared to processing on a single CPU core. Since the vision machine provides generic visual information it can be used in many contexts. Currently it is used in a driver assistance context as well as in two robotic applications ...

  19. A parallel direct-forcing fictitious domain method for simulating microswimmers

    Science.gov (United States)

    Gao, Tong; Lin, Zhaowu

    2017-11-01

    We present a 3D parallel direct-forcing fictitious domain method for simulating swimming micro-organisms at small Reynolds numbers. We treat the motile micro-swimmers as spherical rigid particles using the 'Squirmer' model. The particle dynamics are solved on moving Lagrangian meshes that overlie a fixed Eulerian mesh for solving the fluid motion, and the momentum exchange between the two phases is resolved by distributing pseudo body-forces over the particle interior regions, which constrain the background fictitious fluid to follow the particle movement. While the solid and fluid subproblems are solved separately, no inner iterations are required to enforce numerical convergence. We demonstrate the accuracy and robustness of the method by comparing our results with existing analytical and numerical studies for various cases of single-particle dynamics and particle-particle interactions. We also perform a series of numerical explorations to obtain statistical and rheological measurements to characterize the dynamics and structures of Squirmer suspensions. NSF DMS 1619960.

  20. Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

    Science.gov (United States)

    Leamy, Michael J.; Springer, Adam C.

    In this research we report a parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straightforward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to the cores present on a dual quad-core shared-memory system (eight total processors). We note that this message-passing parallelization strategy is directly applicable to cluster computing, which will be the focus of follow-on research. Results on the shared-memory platform indicate nearly ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.

  1. A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

    Science.gov (United States)

    Rao, Hariprasad Nannapaneni

    1989-01-01

    The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
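    The rollback idea can be summarized with a minimal sketch of a logical process that checkpoints its state and rolls back when a straggler message arrives (purely illustrative: anti-messages, re-execution of undone events, and RSIM's circuit models are omitted).

```python
import copy

class LogicalProcess:
    def __init__(self, state):
        self.lvt = 0                                  # local virtual time
        self.state = state
        self.saved = [(0, copy.deepcopy(state))]      # checkpoints for rollback

    def process(self, timestamp, apply_event):
        if timestamp < self.lvt:                      # straggler message: roll back first
            self.rollback(timestamp)
        apply_event(self.state)
        self.lvt = timestamp
        self.saved.append((timestamp, copy.deepcopy(self.state)))

    def rollback(self, timestamp):
        # Restore the most recent checkpoint taken at or before the straggler's timestamp.
        while self.saved and self.saved[-1][0] > timestamp:
            self.saved.pop()
        self.lvt, self.state = self.saved[-1][0], copy.deepcopy(self.saved[-1][1])

# Toy usage: a counter "circuit node" receiving out-of-order events.
lp = LogicalProcess({"count": 0})
bump = lambda s: s.__setitem__("count", s["count"] + 1)
for t in (1, 3, 2):            # the event at t=2 arrives late and triggers a rollback
    lp.process(t, bump)
print(lp.lvt, lp.state)
```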

  2. Stratified steady and unsteady two-phase flows between two parallel plates

    International Nuclear Information System (INIS)

    Sim, Woo Gun

    2006-01-01

    To understand the fluid dynamic forces acting on a structure subjected to two-phase flow, it is essential to get detailed information about the characteristics of the two-phase flow. Stratified steady and unsteady two-phase flows between two parallel plates have been studied to investigate the general characteristics of the flow related to flow-induced vibration. Based on the spectral collocation method, a numerical approach has been developed for the unsteady two-phase flow. The method is validated by comparing the numerical result to the analytical one given for a simple harmonic two-phase flow. The flow parameters for the steady two-phase flow, such as void fraction and two-phase frictional multiplier, are evaluated. The dynamic characteristics of the unsteady two-phase flow, including the effect of void fraction on the complex unsteady pressure, are illustrated.
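    As background on the spectral collocation machinery, the sketch below builds the standard Chebyshev collocation points and differentiation matrix (following the well-known Trefethen formulation); it is generic and not tied to the two-phase flow equations of the paper.

```python
import numpy as np

def cheb(N):
    """Chebyshev collocation points and first-derivative matrix on [-1, 1]."""
    if N == 0:
        return np.zeros((1, 1)), np.array([1.0])
    x = np.cos(np.pi * np.arange(N + 1) / N)              # Gauss-Lobatto points
    c = np.r_[2.0, np.ones(N - 1), 2.0] * (-1.0) ** np.arange(N + 1)
    dX = x[:, None] - x[None, :]
    D = np.outer(c, 1.0 / c) / (dX + np.eye(N + 1))
    D -= np.diag(D.sum(axis=1))                            # fix the diagonal entries
    return D, x

# Quick check: differentiate u(x) = sin(pi x) and compare with pi*cos(pi x).
D, x = cheb(32)
err = np.max(np.abs(D @ np.sin(np.pi * x) - np.pi * np.cos(np.pi * x)))
print(f"max derivative error: {err:.2e}")   # spectrally small
```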

  3. Testing a Quantum Heat Pump with a Two-Level Spin

    Directory of Open Access Journals (Sweden)

    Luis A. Correa

    2016-04-01

    Full Text Available Once in its non-equilibrium steady state, a nanoscale system coupled to several heat baths may be thought of as a “quantum heat pump”. Depending on the direction of its stationary heat flows, it may function as, e.g., a refrigerator or a heat transformer. These continuous heat devices can be arbitrarily complex multipartite systems, and yet, their working principle is always the same: they are made up of several elementary three-level stages operating in parallel. As a result, it is possible to devise external “black-box” testing strategies to learn about their functionality and performance regardless of any internal details. In particular, one such heat pump can be tested by coupling a two-level spin to one of its “contact transitions”. The steady state of this external probe contains information about the presence of heat leaks and internal dissipation in the device and, also, about the direction of its steady-state heat currents. Provided that the irreversibility of the heat pump is low, one can further estimate its coefficient of performance. These techniques may find applications in the emerging field of quantum thermal engineering, as they facilitate the diagnosis and design optimization of complex thermodynamic cycles.

  4. Efficient two-level preconditioned conjugate gradient method on the GPU

    NARCIS (Netherlands)

    Gupta, R.; Van Gijzen, M.B.; Vuik, K.

    2011-01-01

    We present an implementation of Two-Level Preconditioned Conjugate Gradient Method for the GPU. We investigate a Truncated Neumann Series based preconditioner in combination with deflation and compare it with Block Incomplete Cholesky schemes. This combination exhibits fine-grain parallelism and
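    A minimal sketch of the truncated Neumann series idea inside conjugate gradients is given below (the deflation/two-level component and the GPU implementation from the paper are omitted; the model matrix and the number of series terms are assumptions).

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Model SPD system: 1D Poisson matrix (a stand-in, not the problem from the paper).
n = 200
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Truncated Neumann series preconditioner:
# write A = D(I - N) with N = I - D^{-1}A, so A^{-1} = (I + N + N^2 + ...) D^{-1};
# keeping the first few terms gives an approximate inverse built from matrix-vector products.
d_inv = 1.0 / A.diagonal()

def m_inv(v, terms=3):
    w = d_inv * v                      # D^{-1} v
    acc, p = w.copy(), w
    for _ in range(terms - 1):
        p = p - d_inv * (A @ p)        # apply N = I - D^{-1} A
        acc += p
    return acc

M = spla.LinearOperator(A.shape, matvec=m_inv)
x, info = spla.cg(A, b, M=M, maxiter=1000)
print(info, np.linalg.norm(b - A @ x))
```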

  5. Controlling nonsequential double ionization of Ne with parallel-polarized two-color laser pulses.

    Science.gov (United States)

    Luo, Siqiang; Ma, Xiaomeng; Xie, Hui; Li, Min; Zhou, Yueming; Cao, Wei; Lu, Peixiang

    2018-05-14

    We measure the recoil-ion momentum distributions from nonsequential double ionization of Ne by two-color laser pulses consisting of a strong 800-nm field and a weak 400-nm field with parallel polarizations. The ion momentum spectra show pronounced asymmetries in the emission direction, which depend sensitively on the relative phase of the two-color components. Moreover, the peak of the doubly charged ion momentum distribution shifts gradually with the relative phase. The shifted range is much larger than the maximal vector potential of the 400-nm laser field. These features are well reproduced by a semiclassical model. Through analyzing the correlated electron dynamics, we found that the energy sharing between the two electrons is extremely unequal at the instant of recollision. We further show that the shift of the ion momentum corresponds to the change of the recollision time in the two-color laser field. By tuning the relative phase of the two-color components, the recollision time can be controlled with attosecond precision.

  6. Parallel thermal radiation transport in two dimensions

    International Nuclear Information System (INIS)

    Smedley-Stevenson, R.P.; Ball, S.R.

    2003-01-01

    This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massive parallel processors) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)

  7. Parallel thermal radiation transport in two dimensions

    Energy Technology Data Exchange (ETDEWEB)

    Smedley-Stevenson, R.P.; Ball, S.R. [AWE Aldermaston (United Kingdom)

    2003-07-01

    This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massive parallel processors) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)

  8. Connectionist Models and Parallelism in High Level Vision.

    Science.gov (United States)

    1985-01-01

    Jerome A. Feldman; Grant N00014-82-K-0193. Connectionist Models: Background and Overview. Computer science is just beginning to look seriously at parallel computation: it may turn out that ... the chair. The program includes intermediate level networks that compute more complex joints and ones that compute parallelograms in the image. These ...

  9. Dynamic workload balancing of parallel applications with user-level scheduling on the Grid

    CERN Document Server

    Korkhov, Vladimir V; Krzhizhanovskaya, Valeria V

    2009-01-01

    This paper suggests a hybrid resource management approach for efficient parallel distributed computing on the Grid. It operates on both the application and system levels, combining user-level job scheduling with a dynamic workload balancing algorithm that automatically adapts a parallel application to the heterogeneous resources, based on the actual resource parameters and the estimated requirements of the application. The hybrid environment and the algorithm for automated load balancing are described, the influence of the resource heterogeneity level is measured, and the speedup achieved with this technique is demonstrated for different types of applications and resources.

  10. Novel Differential Current Control Strategy Based on a Modified Three-Level SVPWM for Two Parallel-Connected Inverters

    DEFF Research Database (Denmark)

    Zorig, Abdelmalik; Barkat, Said; Belkheiri, Mohammed

    2017-01-01

    Recently, parallel inverters have been investigated to provide multilevel characteristics besides their advantage of increasing the power system capacity, reliability, and efficiency. However, the issue of differential current imbalance remains a challenge in parallel inverter operation. The distribution of switching vectors of the resulting multilevel topology has a certain degree of self-differential current balancing properties. Nevertheless, the method alone is not sufficient to maintain balanced differential currents in practical applications. This paper proposes a closed-loop differential current control method by introducing a control variable adjusting the dwell time of the selected switching vectors and thus maintaining the differential currents balanced without affecting the overall system performance. The control strategy, including the distribution of the switching sequence, selection ...

  11. Heating limits of boiling downward two-phase flow in parallel channels

    International Nuclear Information System (INIS)

    Fukuda, Kenji; Kondoh, Tetsuya; Hasegawa, Shu; Sakai, Takaaki.

    1989-01-01

    The flow characteristics and heating limits of downward two-phase flow in single or parallel multi-channels are investigated experimentally and analytically. The heating section used is made of a glass tube, in which the heater tube is inserted, and the flow regime inside it is observed. In single channel experiments with low flow rate conditions, it is found that, initially, the gas phase, which flows upward against the downward liquid phase flow, condenses and diminishes as it flows up, being cooled by the inflowing liquid. However, as the heating power is increased, some portion of the gas phase reaches the top and accumulates to form a liquid level, which eventually causes dryout. On the other hand, for high flow rate conditions, flooding at the bottom of the heated section is the cause of the dryout. In the parallel multi-channel experiments, reversed (upward) flow which leads to dryout is observed in some of the channels for low flow rate conditions, while the situation is the same as in the single channel case for high flow rate conditions. Analyses are carried out to predict the onset of dryout in a single channel using the drift flux model as well as Wallis' flooding correlation. The two above-mentioned types of dryout and the boundary between them are predicted, in good agreement with the experimental results. (author)

  12. Domain decomposition parallel computing for transient two-phase flow of nuclear reactors

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Jae Ryong; Yoon, Han Young [KAERI, Daejeon (Korea, Republic of); Choi, Hyoung Gwon [Seoul National University, Seoul (Korea, Republic of)

    2016-05-15

    KAERI (Korea Atomic Energy Research Institute) has been developing a multi-dimensional two-phase flow code named CUPID for multi-physics and multi-scale thermal hydraulics analysis of light water reactors (LWRs). The CUPID code has been validated against a set of conceptual problems and experimental data. In this work, the CUPID code has been parallelized based on the domain decomposition method with the Message passing interface (MPI) library. For domain decomposition, the CUPID code provides both manual and automatic methods with the METIS library. For effective memory management, the Compressed sparse row (CSR) format is adopted, which is one of the methods for representing a sparse asymmetric matrix. The CSR format stores only the non-zero values and their positions (row and column). By performing the verification for the fundamental problem set, the parallelization of CUPID has been successfully confirmed. Since the scalability of a parallel simulation is generally known to be better for fine mesh systems, three different scales of mesh system are considered: 40000 meshes for the coarse mesh system, 320000 meshes for the mid-size mesh system, and 2560000 meshes for the fine mesh system. In the given geometry, both single- and two-phase calculations were conducted. In addition, two types of preconditioners for the matrix solver were compared: a diagonal and an incomplete LU preconditioner. To enhance the parallel performance, hybrid OpenMP and MPI parallel computing for the pressure solver was also examined. It is revealed that the scalability of the hybrid calculation was enhanced for multi-core parallel computation.
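    To illustrate the CSR layout mentioned above (a generic sketch, not CUPID's actual data structures), a matrix is stored as three arrays: the non-zero values, their column indices, and row pointers marking where each row starts.

```python
import numpy as np
from scipy.sparse import csr_matrix

A = np.array([[4.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])

# Hand-built CSR arrays for the matrix above.
data    = [4.0, 1.0, 3.0, 2.0, 5.0]   # non-zero values, row by row
indices = [0,   2,   1,   0,   2  ]   # column index of each value
indptr  = [0, 2, 3, 5]                # row i occupies data[indptr[i]:indptr[i+1]]

S = csr_matrix((data, indices, indptr), shape=(3, 3))
assert np.allclose(S.toarray(), A)

# A sparse matrix-vector product, the core operation of an iterative pressure solver.
print(S @ np.array([1.0, 1.0, 1.0]))   # [5. 3. 7.]
```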

  13. The specificity of learned parallelism in dual-memory retrieval.

    Science.gov (United States)

    Strobach, Tilo; Schubert, Torsten; Pashler, Harold; Rickard, Timothy

    2014-05-01

    Retrieval of two responses from one visually presented cue occurs sequentially at the outset of dual-retrieval practice. Exclusively for subjects who adopt a mode of grouping (i.e., synchronizing) their response execution, however, reaction times after dual-retrieval practice indicate a shift to learned retrieval parallelism (e.g., Nino & Rickard, in Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 373-388, 2003). In the present study, we investigated how this learned parallelism is achieved and why it appears to occur only for subjects who group their responses. Two main accounts were considered: a task-level versus a cue-level account. The task-level account assumes that learned retrieval parallelism occurs at the level of the task as a whole and is not limited to practiced cues. Grouping response execution may thus promote a general shift to parallel retrieval following practice. The cue-level account states that learned retrieval parallelism is specific to practiced cues. This type of parallelism may result from cue-specific response chunking that occurs uniquely as a consequence of grouped response execution. The results of two experiments favored the second account and were best interpreted in terms of a structural bottleneck model.

  14. Parallel and distributed processing in two SGBDS: A case study

    OpenAIRE

    Francisco Javier Moreno; Nataly Castrillón Charari; Camilo Taborda Zuluaga

    2017-01-01

    Context: One of the strategies for managing large volumes of data is distributed and parallel computing. Among the tools that allow applying these characteristics are some Data Base Management Systems (DBMS), such as Oracle, DB2, and SQL Server. Method: In this paper we present a case study where we evaluate the performance of an SQL query in two of these DBMS. The evaluation is done through various forms of data distribution in a computer network with different degrees of parallelism. ...

  15. A study on evaluating validity of SNR calculation using a conventional two region method in MR images applied a multichannel coil and parallel imaging technique

    Energy Technology Data Exchange (ETDEWEB)

    Choi, Kwan Woo; Son, Soon Yong [Dept. of Radiology, Asan Medical Center, Seoul (Korea, Republic of); Min, Jung Whan [Dept. of Radiological Technology, Shingu University, Sungnam (Korea, Republic of); Kwon, Kyung Tae [Dept. of Radiological Technology, Dongnam Health University, Suwon (Korea, Republic of); Yoo, Beong Gyu; Lee, Jong Seok [Dept. of Radiotechnology, Wonkwang Health Science University, Iksan (Korea, Republic of)

    2015-12-15

    The purpose of this study was to investigate the problems of signal-to-noise ratio (SNR) measurement using the conventional two-region method when a multi-channel coil and a parallel imaging technique are used. As a research method, the standard SNR was first calculated using a single-channel head coil, which satisfies the three preconditions of the two-region measurement method. This was then compared with the SNR calculated by the two-region method applied, without regard to the methods recommended by reputable organizations or to the preconditions, to images acquired with a multi-channel coil and a parallel imaging technique. We found that the two-region measurement method used with a multi-channel coil and a parallel imaging technique shows the highest relative standard deviation, and thus the lowest precision. In addition, the difference in SNR according to ROI location was very large, indicating that the spatial noise distribution was not uniform. Also, the 95% confidence interval obtained through a Bland-Altman plot was the widest, and thus the agreement with the two-region measurement method using the standard single-channel head coil was low. By directly comparing, under the same image acquisition conditions, an AAPM method, which serves as a standard for performance evaluation tests of magnetic resonance imaging devices, an NEMA method, which can accurately determine the noise level in a signal region, and the methods recommended by manufacturers of magnetic resonance imaging devices, this study quantitatively verified the inaccuracy of the SNR obtained with the two-region measurement method when a multi-channel coil and a parallel imaging technique are used without satisfying preconditions that researchers could easily overlook.
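    For reference, a minimal sketch of the conventional two-region SNR estimate discussed above is shown below; it uses a common textbook Rayleigh-noise correction for single-channel magnitude images (factor of roughly 0.655), which is precisely the kind of precondition that breaks down for multi-channel coils and parallel imaging, and the synthetic data and ROI choices are arbitrary assumptions.

```python
import numpy as np

def snr_two_region(image, signal_roi, background_roi, rayleigh_correction=0.655):
    """Two-region SNR: mean signal in an object ROI divided by the noise estimated from
    the standard deviation of an air/background ROI (magnitude-image correction factor
    ~0.655 assumed; this does not hold in general for multi-channel / parallel imaging)."""
    signal = image[signal_roi].mean()
    noise = image[background_roi].std(ddof=1) / rayleigh_correction
    return signal / noise

# Toy example with synthetic data (purely illustrative).
rng = np.random.default_rng(0)
img = rng.rayleigh(scale=5.0, size=(128, 128))        # "background" noise
img[32:96, 32:96] += 100.0                            # "object" signal
roi_sig = (slice(48, 80), slice(48, 80))
roi_bg  = (slice(0, 16), slice(0, 16))
print(f"SNR ~ {snr_two_region(img, roi_sig, roi_bg):.1f}")
```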

  16. Overview of the Force Scientific Parallel Language

    Directory of Open Access Journals (Sweden)

    Gita Alaghband

    1994-01-01

    Full Text Available The Force parallel programming language designed for large-scale shared-memory multiprocessors is presented. The language provides a number of parallel constructs as extensions to the ordinary Fortran language and is implemented as a two-level macro preprocessor to support portability across shared-memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force have resulted in structured parallel programs that are ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples to illustrate some parallel programming approaches using the Force are also presented.

  17. Parallel-Batch Scheduling with Two Models of Deterioration to Minimize the Makespan

    Directory of Open Access Journals (Sweden)

    Cuixia Miao

    2014-01-01

    Full Text Available We consider the bounded parallel-batch scheduling problem with two models of deterioration, in which the processing time in the first model is p_j = a_j + αt and in the second model is p_j = a + α_j t. The objective is to minimize the makespan. We present O(n log n)-time algorithms for the single-machine problems, respectively. We also propose fully polynomial time approximation schemes to solve the identical-parallel-machine problem and the uniform-parallel-machine problem, respectively.

  18. Automatic Loop Parallelization via Compiler Guided Refactoring

    DEFF Research Database (Denmark)

    Larsen, Per; Ladelsky, Razya; Lidman, Jacob

    For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler's ability to generate loop-parallel code. We use this compilation system to modify two sequential benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an octo-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should ...

  19. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level.

    Directory of Open Access Journals (Sweden)

    Chandrasekhar Natarajan

    2015-12-01

    Full Text Available A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb) function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages), and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization). In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level.

  20. The level 1 and 2 specification for parallel benchmark and a benchmark test of scalar-parallel computer SP2 based on the specifications

    International Nuclear Information System (INIS)

    Orii, Shigeo

    1998-06-01

    A benchmark specification for the performance evaluation of parallel computers for numerical analysis is proposed. The Level 1 benchmark, which is a conventional benchmark based on processing time, measures the performance of a computer running a code. The Level 2 benchmark proposed in this report is intended to explain the reasons for that performance. As an example, the scalar-parallel computer SP2 is evaluated with this benchmark specification for a molecular dynamics code. The main factors suppressing parallel performance are found to be the maximum bandwidth and the start-up time of communication between nodes. In particular, the start-up time is proportional not only to the number of processors but also to the number of particles. (author)
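    The role attributed above to start-up time and bandwidth is commonly summarized by the linear communication-cost model below (a standard textbook model, not the report's own formulation), where t_s is the per-message start-up time, B the bandwidth, m the message size, p the number of processors, and N_msg the number of messages exchanged:

```latex
T_{\mathrm{comm}}(m) \approx t_s + \frac{m}{B},
\qquad
T_{\mathrm{total}} \approx \frac{T_{\mathrm{serial}}}{p} + N_{\mathrm{msg}}\left(t_s + \frac{m}{B}\right)
```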

  1. Automatic Thread-Level Parallelization in the Chombo AMR Library

    Energy Technology Data Exchange (ETDEWEB)

    Christen, Matthias; Keen, Noel; Ligocki, Terry; Oliker, Leonid; Shalf, John; Van Straalen, Brian; Williams, Samuel

    2011-05-26

    The increasing on-chip parallelism has substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread-level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite-difference-type PDE solvers. In Chombo, core algorithms are specified in ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already-used target language for an automatic migration of the large number of existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique, as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.

  2. Type Synthesis of Parallel Mechanisms with the First Class GF Sets and Two-Dimensional Rotations

    Directory of Open Access Journals (Sweden)

    Jialun Yang

    2012-09-01

    Full Text Available The novel design of parallel mechanisms plays a key role in their potential applications. In this paper, the type synthesis of parallel mechanisms with the first class GF sets and two-dimensional rotations is studied. The rule of two-dimensional rotations is given, which lays the theoretical foundation for the intersection operations of specific GF sets. Next, kinematic limbs with specific characteristics are designed according to the 2-D and 3-D axes movement theorems. Finally, several synthesized parallel mechanisms with the first class GF sets and two-dimensional rotations are illustrated to show the effectiveness of the proposed methodology.

  3. Short-term gas dispersion in idealised urban canopy in street parallel with flow direction

    Science.gov (United States)

    Chaloupecká, Hana; Jaňour, Zbyněk; Nosek, Štěpán

    2016-03-01

    Chemical attacks (e.g. Syria 2014-15 chlorine, 2013 sarin, or Iraq 2006-7 chlorine) as well as chemical plant disasters (e.g. Spain 2015 nitric oxide, ferric chloride; Texas 2014 methyl mercaptan) threaten mankind. In these crisis situations, gas clouds are released. The dispersion of gas clouds is the issue of interest investigated in this paper. The paper describes wind tunnel experiments on dispersion from a ground-level point gas source. The source is situated in a model of an idealised urban canopy. The short-duration releases of the passive contaminant ethane are created by an electromagnetic valve. The gas cloud concentrations are measured at individual places at the height of the human breathing zone within a street parallel with the flow direction by a Fast-response Ionisation Detector. The simulations of the gas release for each measurement position are repeated many times under the same experimental set-up to obtain representative datasets. These datasets are analysed to compute puff characteristics (arrival time, leaving time and duration). The results indicate that the mean value of the dimensionless arrival time can be described as an increasing linear function of the dimensionless coordinate along the street parallel with the flow direction in which the gas source is situated. The same might be stated about the dimensionless leaving time as well as the dimensionless duration; however, these fits are worse. Utilising a linear function, we might also estimate other statistical characteristics of the datasets than their means (medians, trimeans). The datasets of the dimensionless arrival time, the dimensionless leaving time and the dimensionless duration can be fitted by the generalized extreme value distribution (GEV) in all sampling positions except one.
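    As an illustration of the GEV fitting mentioned at the end of the abstract (a generic sketch on synthetic data, not the wind-tunnel datasets), the three GEV parameters can be estimated by maximum likelihood with SciPy:

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic "dimensionless arrival time" sample standing in for one measurement position.
rng = np.random.default_rng(0)
sample = genextreme.rvs(c=-0.1, loc=10.0, scale=2.0, size=500, random_state=rng)

# Maximum-likelihood fit of the GEV shape (c), location and scale parameters.
c, loc, scale = genextreme.fit(sample)
print(f"shape={c:.3f}, loc={loc:.3f}, scale={scale:.3f}")

# Quantiles of the fitted distribution, e.g. the median and the 95th percentile.
print(genextreme.ppf([0.5, 0.95], c, loc=loc, scale=scale))
```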

  4. A method of paralleling computer calculation for two-dimensional kinetic plasma model

    International Nuclear Information System (INIS)

    Brazhnik, V.A.; Demchenko, V.V.; Dem'yanov, V.G.; D'yakov, V.E.; Ol'shanskij, V.V.; Panchenko, V.I.

    1987-01-01

    A method of parallel computer calculation, and the OSIRIS program complex that implements it for numerical plasma simulation by the macroparticle method, are described. The calculation can be carried out either on one BESM-6 computer or simultaneously on two, which is made possible by a package of interacting programs running on each computer. Program interaction in each computer is based on event techniques implemented in OS DISPAK. Parallel calculation with two BESM-6 computers allows the computation to be accelerated by a factor of 1.5.

  5. MMC with parallel-connected MOSFETs as an alternative to wide bandgap converters for LVDC distribution networks

    Directory of Open Access Journals (Sweden)

    Yanni Zhong

    2017-03-01

    Full Text Available Low-voltage direct-current (LVDC networks offer improved conductor utilisation on existing infrastructure and reduced conversion stages, which can lead to a simpler and more efficient distribution network. However, LVDC networks must continue to support AC loads, requiring efficient, low-distortion DC–AC converters. Additionally, increasing numbers of DC loads on the LVAC network require controlled, low-distortion, unity power factor AC-DC converters with large capacity, and bi-directional capability. An AC–DC/DC–AC converter design is therefore proposed in this study to minimise conversion loss and maximise power quality. Comparative analysis is performed for a conventional IGBT two-level converter, a SiC MOSFET two-level converter, a Si MOSFET modular multi-level converter (MMC and a GaN HEMT MMC, in terms of power loss, reliability, fault tolerance, converter cost and heatsink size. The analysis indicates that the five-level MMC with parallel-connected Si MOSFETs is an efficient, cost-effective converter for low-voltage converter applications. MMC converters suffer negligible switching loss, which enables reduced device switching without loss penalty from increased harmonics and filtering. Optimal extent of parallel-connection for MOSFETs in an MMC is investigated. Experimental results are presented to show the reduction in device stress and electromagnetic interference generating transients through the use of reduced switching and device parallel-connection.

  6. Direct and iterative algorithms for the parallel solution of the one-dimensional macroscopic Navier-Stokes equations

    International Nuclear Information System (INIS)

    Doster, J.M.; Sills, E.D.

    1986-01-01

    Current efforts are under way to develop and evaluate numerical algorithms for the parallel solution of the large sparse matrix equations associated with the finite difference representation of the macroscopic Navier-Stokes equations. Previous work has shown that these equations can be cast into smaller coupled matrix equations suitable for solution utilizing multiple computer processors operating in parallel. The individual processors themselves may exhibit parallelism through the use of vector pipelines. This work has concentrated on the one-dimensional drift flux form of the Navier-Stokes equations. Direct and iterative algorithms that may be suitable for implementation on parallel computer architectures are evaluated in terms of accuracy and overall execution speed. This work has application to engineering and training simulations, on-line process control systems, and engineering workstations where increased computational speeds are required

  7. Distance-two interpolation for parallel algebraic multigrid

    International Nuclear Information System (INIS)

    Sterck, H de; Falgout, R D; Nolting, J W; Yang, U M

    2007-01-01

    In this paper we study the use of long distance interpolation methods with the low complexity coarsening algorithm PMIS. AMG performance and scalability is compared for classical as well as long distance interpolation methods on parallel computers. It is shown that the increased interpolation accuracy largely restores the scalability of AMG convergence factors for PMIS-coarsened grids, and in combination with complexity reducing methods, such as interpolation truncation, one obtains a class of parallel AMG methods that enjoy excellent scalability properties on large parallel computers

  8. Renal magnetic resonance angiography at 3.0 Tesla using a 32-element phased-array coil system and parallel imaging in 2 directions.

    Science.gov (United States)

    Fenchel, Michael; Nael, Kambiz; Deshpande, Vibhas S; Finn, J Paul; Kramer, Ulrich; Miller, Stephan; Ruehm, Stefan; Laub, Gerhard

    2006-09-01

    The aim of the present study was to assess the feasibility of renal magnetic resonance angiography at 3.0 T using a phased-array coil system with 32 coil elements. Specifically, high parallel imaging factors were used for an increased spatial resolution and anatomic coverage of the whole abdomen. Signal-to-noise values and the g-factor distribution of the 32-element coil were examined in phantom studies for the magnetic resonance angiography (MRA) sequence. Eleven volunteers (6 men, median age of 30.0 years) were examined on a 3.0-T MR scanner (Magnetom Trio, Siemens Medical Solutions, Malvern, PA) using a 32-element phased-array coil (prototype from In vivo Corp.). Contrast-enhanced 3D-MRA (TR 2.95 milliseconds, TE 1.12 milliseconds, flip angle 25-30 degrees, bandwidth 650 Hz/pixel) was acquired with integrated generalized autocalibrating partially parallel acquisition (GRAPPA), in both the phase- and slice-encoding directions. Images were assessed by 2 independent observers with regard to image quality, noise and presence of artifacts. Signal-to-noise levels of 22.2 +/- 22.0 and 57.9 +/- 49.0 were measured with (GRAPPA x 6) and without parallel imaging, respectively. The mean g-factor of the 32-element coil for GRAPPA with an acceleration of 3 and 2 in the phase-encoding and slice-encoding direction, respectively, was 1.61. High image quality was found in 9 of 11 volunteers (2.6 +/- 0.8) with good overall interobserver agreement (k = 0.87). Relatively low image quality with higher noise levels was encountered in 2 volunteers. MRA at 3.0 T using a 32-element phased-array coil is feasible in healthy volunteers. High diagnostic image quality and extended anatomic coverage could be achieved with the application of high parallel imaging factors.
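    For context, the SNR penalty of parallel imaging referred to above is conventionally written as the standard relation below (not derived in this paper), where R is the acceleration factor and g the coil geometry factor:

```latex
\mathrm{SNR}_{\mathrm{accelerated}} = \frac{\mathrm{SNR}_{\mathrm{full}}}{g\,\sqrt{R}}
```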

  9. A simple image-reject mixer based on two parallel phase modulators

    Science.gov (United States)

    Hu, Dapeng; Zhao, Shanghong; Zhu, Zihang; Li, Xuan; Qu, Kun; Lin, Tao; Zhang, Kun

    2018-02-01

    A simple photonic microwave image-reject mixer (IRM) using two parallel phase modulators is proposed. First, a photonic microwave mixer with phase shift ability is achieved using two parallel phase modulators (PMs), an optical bandpass filter, three polarization controllers, three polarization beam splitters and two balanced photodetectors. At the output of the mixer, two frequency-downconverted signals with a tunable phase difference can be obtained. By adjusting the phase difference to 90° and utilizing an electrical 90° hybrid, the unwanted components can be eliminated, and image-reject operation is realized. The key advantage of the proposed scheme is the use of PMs, which avoids the DC bias drifting problem and makes the system simple and stable. A simulation is performed to verify the proposed scheme: a relative -90° or 90° phase shift can be obtained between the two output ports of the photonic microwave mixer, and at the output of the IRM a 60 dB image-reject ratio is obtained.

  10. Device-independent parallel self-testing of two singlets

    Science.gov (United States)

    Wu, Xingyao; Bancal, Jean-Daniel; McKague, Matthew; Scarani, Valerio

    2016-06-01

    Device-independent self-testing offers the possibility of certifying the quantum state and measurements, up to local isometries, using only the statistics observed by querying uncharacterized local devices. In this paper we study parallel self-testing of two maximally entangled pairs of qubits; in particular, the local tensor product structure is not assumed but derived. We prove two criteria that achieve the desired result: a double use of the Clauser-Horne-Shimony-Holt inequality and the 3 × 3 magic square game. This demonstrates that the magic square game can only be won perfectly by measuring a two-singlet state. The tolerance to noise is well within reach of state-of-the-art experiments.
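    For reference, the first criterion builds on the standard CHSH expression (textbook form):

        S = \langle A_0 B_0\rangle + \langle A_0 B_1\rangle + \langle A_1 B_0\rangle - \langle A_1 B_1\rangle, \qquad |S| \le 2 \ \text{(classical)}, \qquad |S| \le 2\sqrt{2} \ \text{(quantum)},

    and maximal quantum violation singles out a maximally entangled pair of qubits up to local isometries.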

  11. PPOOLEX experiments with two parallel blowdown pipes

    Energy Technology Data Exchange (ETDEWEB)

    Laine, J.; Puustinen, M.; Raesaenen, A. (Lappeenranta Univ. of Technology, Nuclear Safety Research Unit (Finland))

    2011-01-15

    This report summarizes the results of the experiments with two transparent blowdown pipes carried out with the scaled-down PPOOLEX test facility designed and constructed at Lappeenranta University of Technology. Steam was blown into the dry well compartment and from there through either one or two vertical transparent blowdown pipes into the condensation pool. Five experiments with one pipe and six with two parallel pipes were carried out. The main purpose of the experiments was to study loads caused by chugging (rapid condensation) while steam is discharged into the condensation pool filled with sub-cooled water. The PPOOLEX test facility is a closed stainless steel vessel divided into two compartments, dry well and wet well. In the experiments the initial temperature of the condensation pool water varied from 12 deg. C to 55 deg. C, the steam flow rate from 40 g/s to 1 300 g/s, and the temperature of incoming steam from 120 deg. C to 185 deg. C. In the experiments with only one transparent blowdown pipe the chugging phenomenon was not as intense as in the preceding experiments carried out with a DN200 stainless steel pipe. With the steel blowdown pipe, pressure pulses up to 10 times higher were registered inside the pipe. Meanwhile, loads registered in the pool did not indicate significant differences between the steel and polycarbonate pipe experiments. In the experiments with two transparent blowdown pipes, the steam-water interface moved almost synchronously up and down inside both pipes. Chugging was stronger than in the one-pipe experiments, and loads up to two times higher were measured inside the pipes. The loads at the blowdown pipe outlet were approximately the same as in the one-pipe cases. Other registered loads around the pool were about 50-100% higher than with one pipe. The experiments with two parallel blowdown pipes gave contradictory results compared to the earlier studies dealing with chugging loads in the case of multiple pipes. Contributing

  12. Parallel solutions of the two-group neutron diffusion equations

    International Nuclear Information System (INIS)

    Zee, K.S.; Turinsky, P.J.

    1987-01-01

    Recent efforts to adapt various numerical solution algorithms to parallel computer architectures have addressed the possibility of substantially reducing the running time of few-group neutron diffusion calculations. The authors have developed an efficient iterative parallel algorithm and an associated computer code for the rapid solution of the finite difference representation of the two-group neutron diffusion equations on the CRAY X/MP-48 supercomputer, which has multiple CPUs and vector pipelines. For realistic simulation of light water reactor cores, the code employs a macroscopic depletion model with trace capability for selected fission product transients and critical boron. In addition, moderator and fuel temperature feedback models are incorporated into the code. The physics models used in the code were benchmarked against qualified codes and proved accurate. This work is an extension of previous work in that various feedback effects are accounted for in the system; the entire code is structured to accommodate extensive vectorization; and additional parallelism by multitasking is achieved not only for the solution of the matrix equations associated with the inner iterations but also for the other segments of the code, e.g., the outer iterations
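    For orientation, the two-group diffusion equations being solved have the standard form (generic notation, which may differ in detail from the code's formulation):

        -\nabla\cdot D_1\nabla\phi_1 + (\Sigma_{a1}+\Sigma_{1\to 2})\,\phi_1 = \frac{1}{k}\left(\nu\Sigma_{f1}\phi_1 + \nu\Sigma_{f2}\phi_2\right),
        \qquad
        -\nabla\cdot D_2\nabla\phi_2 + \Sigma_{a2}\,\phi_2 = \Sigma_{1\to 2}\,\phi_1 .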

  13. III - Template Metaprogramming for massively parallel scientific computing - Templates for Iteration; Thread-level Parallelism

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    Large-scale scientific computing raises questions on different levels, ranging from the formulation of the problems to the choice of the best algorithms and their implementation for a specific platform. There are similarities in these different topics that can be exploited by modern-style C++ template metaprogramming techniques to produce readable, maintainable and generic code. Traditional low-level code tends to be fast but platform-dependent, and it obfuscates the meaning of the algorithm. On the other hand, the object-oriented approach is nice to read, but may come with an inherent performance penalty. These lectures aim to present the basics of the Expression Template (ET) idiom, which allows us to keep the object-oriented approach without sacrificing performance. We will in particular show how to enhance ET to include SIMD vectorization. We will then introduce techniques for abstracting iteration, and introduce thread-level parallelism for use in heavy data-centric loads. We will show how to apply these methods i...
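    A minimal sketch of the Expression Template idiom referred to above (illustrative only; the class names and the lecture code are not reproduced here): vector expressions are composed lazily and evaluated in a single fused loop that the compiler can auto-vectorize.

        // Minimal expression-template sketch: no temporaries, one fused loop.
        #include <cstddef>
        #include <iostream>
        #include <vector>

        template <class E>
        struct Expr {
            const E& self() const { return static_cast<const E&>(*this); }
        };

        struct Vec : Expr<Vec> {
            std::vector<double> data;
            explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
            double operator[](std::size_t i) const { return data[i]; }
            std::size_t size() const { return data.size(); }
            template <class E>
            Vec& operator=(const Expr<E>& e) {   // evaluate any expression element-wise
                for (std::size_t i = 0; i < data.size(); ++i) data[i] = e.self()[i];
                return *this;
            }
        };

        template <class L, class R>
        struct Add : Expr<Add<L, R>> {
            const L& l; const R& r;
            Add(const L& l_, const R& r_) : l(l_), r(r_) {}
            double operator[](std::size_t i) const { return l[i] + r[i]; }
            std::size_t size() const { return l.size(); }
        };

        template <class L, class R>
        Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
            return Add<L, R>(l.self(), r.self());
        }

        int main() {
            Vec a(8, 1.0), b(8, 2.0), c(8, 3.0), out(8);
            out = a + b + c;              // builds Add<Add<Vec,Vec>,Vec>, no temporaries
            std::cout << out[0] << '\n';  // prints 6
        }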

  14. Position Analysis of a Hybrid Serial-Parallel Manipulator in Immersion Lithography

    Directory of Open Access Journals (Sweden)

    Jie-jie Shao

    2015-01-01

    Full Text Available This paper proposes a novel hybrid serial-parallel mechanism with 6 degrees of freedom. The new mechanism combines two different parallel modules in a serial form. The 3-P̲(PH parallel module is a 3-degree-of-freedom architecture based on higher joints and specializes in describing the relative pose of two planes. The 3-P̲SP parallel module is a typical architecture that has been widely investigated in recent research. In this paper, the direct and inverse position problems of the 3-P̲SP parallel module in the coupled mixed-type mode are analyzed in detail, and the solutions are obtained in analytical form. Furthermore, the solutions for the direct and inverse position problems of the novel hybrid serial-parallel mechanism are also derived and obtained in analytical form. The proposed hybrid serial-parallel mechanism is applied to regulate the immersion hood's pose in an immersion lithography system. By measuring and regulating the pose of the immersion hood with respect to the wafer surface simultaneously, the immersion hood can track the wafer surface's pose in real time and the gap status is stabilized. This is another exploration of the hybrid serial-parallel mechanism's applications.

  15. Comparison of the deflated preconditioned conjugate gradient method and parallel direct solver for composite materials

    NARCIS (Netherlands)

    Jönsthövel, T.B.; Van Gijzen, M.B.; MacLachlan, S.; Vuik, C.; Scarpas, A.

    2011-01-01

    The demand for large FE meshes increases as parallel computing becomes the standard in FE simulations. Direct and iterative solution methods are used to solve the resulting linear systems. Many applications concern composite materials, which are characterized by large discontinuities in the material

  16. Parallel Sn iteration schemes

    International Nuclear Information System (INIS)

    Wienke, B.R.; Hiromoto, R.E.

    1986-01-01

    The iterative, multigroup, discrete ordinates (Sn) technique for solving the linear transport equation enjoys widespread usage and appeal. Serial iteration schemes and numerical algorithms developed over the years provide a timely framework for parallel extension. On the Denelcor HEP, the authors investigate three parallel iteration schemes for solving the one-dimensional Sn transport equation. The multigroup representation and serial iteration methods are also reviewed. This analysis represents a first attempt to extend serial Sn algorithms to parallel environments and provides good baseline estimates on ease of parallel implementation, relative algorithm efficiency, comparative speedup, and some future directions. The authors examine ordered and chaotic versions of these strategies, with and without concurrent rebalance and diffusion acceleration. Two strategies efficiently support high degrees of parallelization and appear to be robust parallel iteration techniques. The third strategy is a weaker parallel algorithm. Chaotic iteration, difficult to simulate on serial machines, holds promise and converges faster than ordered versions of the schemes. Actual parallel speedup and efficiency are high and payoff appears substantial

  17. Parallels between a Collaborative Research Process and the Middle Level Philosophy

    Science.gov (United States)

    Dever, Robin; Ross, Diane; Miller, Jennifer; White, Paula; Jones, Karen

    2014-01-01

    The characteristics of the middle level philosophy as described in This We Believe closely parallel the collaborative research process. The journey of one research team is described in relationship to these characteristics. The collaborative process includes strengths such as professional relationships, professional development, courageous…

  18. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

    Directory of Open Access Journals (Sweden)

    Mark James Abraham

    2015-09-01

    Full Text Available GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.

  19. Analysis and Modeling of Circulating Current in Two Parallel-Connected Inverters

    DEFF Research Database (Denmark)

    Maheshwari, Ram Krishan; Gohil, Ghanshyamsinh Vijaysinh; Bede, Lorand

    2015-01-01

    Parallel-connected inverters are gaining attention for high power applications because of the limited power handling capability of the power modules. Moreover, the parallel-connected inverters may have low total harmonic distortion of the ac current if they are operated with interleaved pulse-width modulation (PWM). However, the interleaved PWM causes a circulating current between the inverters, which in turn causes additional losses. A model describing the dynamics of the circulating current is presented in this study, which shows that the circulating current depends on the common-mode voltage. Using this model, the circulating current between two parallel-connected inverters is analysed in this study. The peak and root mean square (rms) values of the normalised circulating current are calculated for different PWM methods, which makes this analysis a valuable tool to design a filter for the circulating current.
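    A commonly used minimal model of this coupling (a sketch under the usual simplifying assumptions of identical, linear filter inductors and negligible resistance; not the exact model of the paper) drives the circulating current i_0 with the difference of the two inverters' common-mode voltages:

        (L_1 + L_2)\,\frac{di_0}{dt} \approx v_{\mathrm{cm},1}(t) - v_{\mathrm{cm},2}(t) .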

  20. Computing NLTE Opacities -- Node Level Parallel Calculation

    Energy Technology Data Exchange (ETDEWEB)

    Holladay, Daniel [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-09-11

    Presentation. The goal: to produce a robust library capable of computing reasonably accurate opacities in-line, with the assumption of LTE relaxed (non-LTE). Near term: demonstrate acceleration of non-LTE opacity computation. Far term (if funded): connect to application codes with in-line capability and compute opacities. Study science problems. Use efficient algorithms that expose many levels of parallelism and utilize good memory access patterns for use on advanced architectures. Portability to multiple types of hardware, including multicore processors, manycore processors such as KNL, GPUs, etc. Easily coupled to radiation hydrodynamics and thermal radiative transfer codes.

  1. Speed Sensorless vector control of parallel-connected three-phase two-motor single-inverter drive system

    DEFF Research Database (Denmark)

    Gunabalan, Ramachandiran; Sanjeevikumar, Padmanaban; Blaabjerg, Frede

    2016-01-01

    This paper presents the characteristic behavior of direct vector control of two induction motors with sensorless speed feedback having the same rating parameters, paralleled combination, and supplied from a single current-controlled pulse-width-modulated voltage-source inverter drive. The natural observer design technique is known for its simple construction, which estimates the speed and rotor fluxes. Load torque is estimated by load torque adaptation and the average rotor flux was maintained constant by rotor flux feedback control. The technique's convergence rate is very fast and is robust to noise and parameter uncertainty. The gain matrix is absent in the natural observer. The rotor speed is estimated from the load torque, stator current, and rotor flux. Under symmetrical load conditions, the difference in speed between two induction motors is reduced by considering the motor parameters...

  2. GPU-based, parallel-line, omni-directional integration of measured acceleration field to obtain the 3D pressure distribution

    Science.gov (United States)

    Wang, Jin; Zhang, Cao; Katz, Joseph

    2016-11-01

    A PIV-based method to reconstruct the volumetric pressure field by direct integration of the 3D material acceleration has been developed. Extending the 2D virtual-boundary omni-directional method (Omni2D, Liu & Katz, 2013), the new 3D parallel-line omni-directional method (Omni3D) integrates the material acceleration along parallel lines aligned in multiple directions. Their angles are set by a spherical virtual grid. The integration is parallelized on a Tesla K40c GPU, which reduced the computing time from three hours to one minute for a single realization. To validate its performance, this method is used to calculate the 3D pressure fields in isotropic turbulence and channel flow using the JHU DNS databases (http://turbulence.pha.jhu.edu). Both integration of the DNS acceleration and integration of acceleration from synthetic 3D particles are tested. Results are compared to other methods, e.g., the solution of the pressure Poisson equation (PPE, e.g., Ghaemi et al., 2012) with Bernoulli-based Dirichlet boundary conditions, and the Omni2D method. The error in the Omni3D prediction is uniformly low, and its sensitivity to acceleration errors is local. It agrees with the PPE/Bernoulli prediction away from the Dirichlet boundary. The Omni3D method is also applied to experimental data obtained using tomographic PIV, and the results are correlated with the deformation of a compliant wall. ONR.
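    The relation being integrated (standard PIV-pressure form with the viscous term neglected; not a restatement of the authors' exact formulation) ties the pressure gradient to the measured material acceleration, which is then integrated along each line and averaged over the line directions:

        \nabla p \approx -\rho\,\frac{D\mathbf{u}}{Dt}, \qquad
        p(\mathbf{x}) \approx \frac{1}{N}\sum_{k=1}^{N}\left[\,p(\mathbf{x}_{0,k}) + \int_{\mathbf{x}_{0,k}}^{\mathbf{x}} \nabla p \cdot d\mathbf{l}\,\right].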

  3. Two-dimensional evaluation of an ion plasma produced by pulsed lasers extracted by non-parallel collectors

    International Nuclear Information System (INIS)

    Mahdieh, M H; Gavili, A

    2003-01-01

    Two-dimensional hydrodynamics of ion extraction from quasi-neutral plasmas has been calculated numerically for non-parallel ion extractors, and the results compared with those for the parallel case. The ions were assumed to be initially uniform with a very steep density profile at the boundaries, and held between two non-parallel metal plates acting as cathode and anode with fixed potentials. Experimentally, tunable pulsed lasers can produce such a plasma through stepwise photo-excitation and photo-ionization or multi-photo-ionization processes. Poisson's equation was solved simultaneously with the equations of mass and momentum, assuming the Maxwell-Boltzmann distribution for the electrons. Ordinary Cartesian co-ordinates are not suitable for the rotated extractor geometry; therefore, using the 'algebraic method', a transformation from the physical domain into the computational rectangular plane is applied for analysing the irregular boundaries. Such a technique provides adequate resolution for the boundary layer. Using first-order explicit upwind differencing in an appropriate transformed Cartesian co-ordinate system, the hydrodynamics of the plasma ions between the two non-parallel electrodes was evaluated. In these calculations the electric potential, the ion density between the two electrodes, and the extraction time were assessed, considering three separate regions of the plasma, i.e. the ion sheath where n_i >> n_e ~ 0, the transition region (pre-sheath) where n_i = n_e, and the quasi-neutral plasma where n_i - n_e << n_i. The results were compared with those for parallel electrodes. A significant discrepancy was found between the two results. From the calculation, the non-uniform asymmetric potential contour and the ion density contour across the plasma were obtained for the non-parallel electrodes. For comparison with the parallel extractors, we have also obtained almost the same extraction time for the non-parallel extractors
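    The field equation referred to above, with Boltzmann-distributed electrons, takes the standard form (generic notation; the boundary conditions and normalization follow the paper, not this sketch):

        \nabla^2\phi = -\frac{e}{\varepsilon_0}\left[\,n_i - n_{e0}\exp\!\left(\frac{e\phi}{k_B T_e}\right)\right].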

  4. Step by step parallel programming method for molecular dynamics code

    International Nuclear Information System (INIS)

    Orii, Shigeo; Ohta, Toshio

    1996-07-01

    Parallel programming of a numerical molecular dynamics simulation code is carried out with a step-by-step programming technique using the two-phase method. As a result, within a certain range of computing parameters, parallel performance is obtained by using do-loop-level parallel programming, which decomposes the calculation according to the indices of do-loops onto each processor, on the vector-parallel computer VPP500 and the scalar-parallel computer Paragon. It is also found that the VPP500 shows parallel performance over a wider range of computing parameters. The reason is that the time cost of the program parts that cannot be reduced by do-loop-level parallel programming can be reduced to a negligible level by vectorization. After that, the time-consuming parts of the program are concentrated in fewer sections that can be accelerated by do-loop-level parallel programming. This report shows the step-by-step parallel programming method and the parallel performance of the molecular dynamics code on the VPP500 and Paragon. (author)
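    As an illustration of the do-loop-level decomposition described above (a generic OpenMP sketch, not the VPP500/Paragon directives used in the report), an index-wise molecular dynamics update parallelizes directly because each iteration touches only its own element:

        // Hedged sketch: do-loop-level work distribution for an MD position update.
        #include <cstddef>
        #include <vector>

        void update_positions(std::vector<double>& x, const std::vector<double>& v,
                              const std::vector<double>& f, double mass, double dt) {
            const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
            // Each iteration writes only x[i], so the loop can be split across
            // processors exactly as a do-loop decomposition would split it.
            #pragma omp parallel for
            for (std::ptrdiff_t i = 0; i < n; ++i) {
                x[i] += v[i] * dt + 0.5 * (f[i] / mass) * dt * dt;
            }
        }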

  5. Parallel two-phase-flow-induced vibrations in fuel pin model

    International Nuclear Information System (INIS)

    Hara, Fumio; Yamashita, Tadashi

    1978-01-01

    This paper reports the experimental results of vibrations of a fuel pin model (herein meaning the essential form of a fuel pin from the standpoint of vibration) in a parallel air-and-water two-phase flow. The essential part of the experimental apparatus consisted of a flat elastic strip made of stainless steel, both ends of which were firmly supported in a circular channel conveying the two-phase fluid. Vibrational strain of the fuel pin model, pressure fluctuation of the two-phase flow, and two-phase-flow void signals were measured. Statistical measures such as power spectral density, variance and correlation function were calculated. The authors obtained (1) the relation between the variance of vibrational strain and the two-phase-flow velocity, (2) the relation between the variance of vibrational strain and the two-phase-flow pressure fluctuation, (3) the frequency characteristics of the variance of vibrational strain against the dominant frequency of the two-phase-flow pressure fluctuation, and (4) the frequency characteristics of the variance of vibrational strain against the dominant frequency of the two-phase-flow void signals. The authors conclude that there exist two kinds of excitation mechanisms in the vibrations of a fuel pin model inserted in a parallel air-and-water two-phase flow: (1) parametric excitation, which occurs when the fundamental natural frequency of the fuel pin model is related to the dominant travelling frequency of water slugs in the two-phase flow by the ratio 1/2, 1/1, 3/2 and so on; and (2) vibrational resonance, which occurs when the fundamental frequency coincides with the dominant frequency of the two-phase-flow pressure fluctuation. (auth.)
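    The parametric-excitation condition reported above is the classic Mathieu-type behavior (a generic single-degree-of-freedom sketch, not the authors' model): with the effective stiffness modulated at the slug-passing frequency Omega, instability bands appear near half-integer and integer frequency ratios,

        \ddot{x} + 2\zeta\omega_n\dot{x} + \omega_n^2\left[1 + \epsilon\cos(\Omega t)\right]x = 0,
        \qquad \frac{\omega_n}{\Omega} \approx \frac{n}{2}, \quad n = 1, 2, 3, \ldots,

    which matches the reported ratios 1/2, 1/1, 3/2 between the fundamental natural frequency and the dominant slug-travelling frequency.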

  6. Plastic collapse behavior for thin tube with two parallel cracks

    International Nuclear Information System (INIS)

    Moon, Seong In; Chang, Yoon Suk; Kim, Young Jin; Lee, Jin Ho; Song, Myung Ho; Choi, Young Hwan; Kim, Joung Soo

    2004-01-01

    The current plugging criterion is known to be too conservative for some locations and types of defects. Many defects detected during in-service inspection take the form of multiple cracks at the top of the tube sheet, but there is no reliable plugging criterion for steam generator tubes with multiple cracks. Most of the previous studies on multiple cracks are confined to elastic analyses, and only a few studies have been done on steam generator tubes failed by plastic collapse. Therefore, it is necessary to develop models which can be used to estimate the failure behavior of steam generator tubes with multiple cracks. The objective of this study is to verify the applicability of the optimum local failure prediction models proposed in the previous study. For this, plastic collapse tests are performed with tube specimens containing two parallel through-wall cracks. The plastic collapse load of steam generator tubes containing two parallel through-wall cracks is also estimated by using the proposed optimum global failure model, and the applicability is investigated by comparing the estimated results with the experimental results. Also, the interaction effect between the two cracks was evaluated to explain the plastic collapse behavior

  7. Two Phase Flow Split Model for Parallel Channels | Iloeje | Nigerian ...

    African Journals Online (AJOL)

    The model and code are capable of handling single and two phase flows, steady states and transients, up to ten parallel flow paths, simple and complicated geometries, including the boilers of fossil steam generators and nuclear power plants. A test calculation has been made with a simplified three-channel system ...

  8. Mapping robust parallel multigrid algorithms to scalable memory architectures

    Science.gov (United States)

    Overman, Andrea; Vanrosendale, John

    1993-01-01

    The convergence rate of standard multigrid algorithms degenerates on problems with stretched grids or anisotropic operators. The usual cure for this is the use of line or plane relaxation. However, multigrid algorithms based on line and plane relaxation have limited and awkward parallelism and are quite difficult to map effectively to highly parallel architectures. Newer multigrid algorithms that overcome anisotropy through the use of multiple coarse grids rather than relaxation are better suited to massively parallel architectures because they require only simple point-relaxation smoothers. In this paper, we look at the parallel implementation of a V-cycle multiple semicoarsened grid (MSG) algorithm on distributed-memory architectures such as the Intel iPSC/860 and Paragon computers. The MSG algorithms provide two levels of parallelism: parallelism within the relaxation or interpolation on each grid and across the grids on each multigrid level. Both levels of parallelism must be exploited to map these algorithms effectively to parallel architectures. This paper describes a mapping of an MSG algorithm to distributed-memory architectures that demonstrates how both levels of parallelism can be exploited. The result is a robust and effective multigrid algorithm for distributed-memory machines.

  9. Parallel processing for fluid dynamics applications

    International Nuclear Information System (INIS)

    Johnson, G.M.

    1989-01-01

    The impact of parallel processing on computational science and, in particular, on computational fluid dynamics is growing rapidly. In this paper, particular emphasis is given to developments which have occurred within the past two years. Parallel processing is defined and the reasons for its importance in high-performance computing are reviewed. Parallel computer architectures are classified according to the number and power of their processing units, their memory, and the nature of their connection scheme. Architectures which show promise for fluid dynamics applications are emphasized. Fluid dynamics problems are examined for parallelism inherent at the physical level. CFD algorithms and their mappings onto parallel architectures are discussed. Several examples are presented to document the performance of fluid dynamics applications on present-generation parallel processing devices

  10. About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems

    Directory of Open Access Journals (Sweden)

    Loredana MOCEAN

    2009-01-01

    Full Text Available In recent years, efforts have been made to delineate a stable and unified framework in which the problems of logical parallel processing can find solutions, at least at the level of imperative languages. The results obtained so far are not commensurate with these efforts. This paper aims to make a small contribution to those efforts. We propose an overview of parallel programming, parallel execution and collaborative systems.

  11. The new Exponential Directional Iterative (EDI) 3-D Sn scheme for parallel adaptive differencing

    International Nuclear Information System (INIS)

    Sjoden, G.E.

    2005-01-01

    The new Exponential Directional Iterative (EDI) discrete ordinates (Sn) scheme for 3-D Cartesian Coordinates is presented. The EDI scheme is a logical extension of the positive, efficient Exponential Directional Weighted (EDW) Sn scheme currently used as the third level of the adaptive spatial differencing algorithm in the PENTRAN parallel discrete ordinates solver. Here, the derivation and advantages of the EDI scheme are presented; EDI uses EDW-rendered exponential coefficients as initial starting values to begin a fixed point iteration of the exponential coefficients. One issue that required evaluation was an iterative cutoff criterion to prevent the application of an unstable fixed point iteration; although this was needed in some cases, it was readily treated with a default to EDW. Iterative refinement of the exponential coefficients in EDI typically converged in fewer than four fixed point iterations. Moreover, EDI yielded more accurate angular fluxes compared to the other schemes tested, particularly in streaming conditions. Overall, it was found that the EDI scheme was up to an order of magnitude more accurate than the EDW scheme on a given mesh interval in streaming cases, and is potentially a good candidate as a fourth-level differencing scheme in the PENTRAN adaptive differencing sequence. The 3-D Cartesian computational cost of EDI was only about 20% more than the EDW scheme, and about 40% more than Diamond Zero (DZ). More evaluation and testing are required to determine suitable upgrade metrics for EDI to be fully integrated into the current adaptive spatial differencing sequence in PENTRAN. (author)

  12. The FORCE: A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.

  13. The FORCE - A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.

  14. Assessing Reliability of Two Versions of Vocabulary Levels Tests in Iranian Context

    Directory of Open Access Journals (Sweden)

    Aso Bayazidi

    2017-02-01

    Full Text Available This study examined the equivalence and reliability of the two versions of the Vocabulary Levels Test in an Iranian context. This study was motivated by the fact that the Vocabulary Levels Test is increasingly being used in Iran for both research and pedagogical purposes without having been checked for validity and reliability in this context. The equivalence and reliability of the two versions of the test were examined through the parallel-forms approach to reliability in Classical True Score theory. Seventy-five intermediate learners of English as a foreign language at the Iran Language Institute took the two versions of the test, with a one-week interval between the two administrations, in a counterbalanced fashion. To examine the equivalence of the two versions, the means and variances of the scores obtained for the two tests were compared using a paired-samples t-test and one-way ANOVA, respectively. The results of the analyses indicated that the difference between the means of the two versions was significant, and the two versions cannot be considered parallel forms. To assess the reliability of the two versions, the correlation between the scores obtained from them was estimated using the Pearson product-moment correlation. The results of the analyses showed that the two versions are highly correlated and are reliable tests. It is concluded that the two versions should not be treated as equivalent in longitudinal and gain-score studies.

  15. Evaluation of fault-normal/fault-parallel directions rotated ground motions for response history analysis of an instrumented six-story building

    Science.gov (United States)

    Kalkan, Erol; Kwong, Neal S.

    2012-01-01

    According to regulatory building codes in the United States (for example, the 2010 California Building Code), at least two horizontal ground-motion components are required for three-dimensional (3D) response history analysis (RHA) of buildings. For sites within 5 km of an active fault, these records should be rotated to the fault-normal/fault-parallel (FN/FP) directions, and two RHAs should be performed separately (when FN and then FP are aligned with the transverse direction of the structural axes). It is assumed that this approach will lead to two sets of responses that envelope the range of possible responses over all nonredundant rotation angles. This assumption is examined here using a 3D computer model of a six-story reinforced-concrete instrumented building subjected to an ensemble of bidirectional near-fault ground motions. Peak responses of engineering demand parameters (EDPs) were obtained for rotation angles ranging from 0° through 180° to evaluate the FN/FP directions. It is demonstrated that rotating ground motions to the FN/FP directions (1) does not always lead to the maximum responses over all angles, (2) does not always envelope the range of possible responses, and (3) does not provide maximum responses for all EDPs simultaneously even if it provides a maximum response for a specific EDP.
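    For reference, the rotation applied to the as-recorded horizontal pair (a1, a2) to obtain components at an angle theta (the FN/FP pair being one particular theta) is the usual planar transformation:

        a_{\theta}(t) = a_1(t)\cos\theta + a_2(t)\sin\theta, \qquad
        a_{\theta+90^{\circ}}(t) = -a_1(t)\sin\theta + a_2(t)\cos\theta .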

  16. Computational cost estimates for parallel shared memory isogeometric multi-frontal solvers

    KAUST Repository

    Woźniak, Maciej; Kuźnik, Krzysztof M.; Paszyński, Maciej R.; Calo, Victor M.; Pardo, D.

    2014-01-01

    In this paper we present computational cost estimates for parallel shared memory isogeometric multi-frontal solvers. The estimates show that the ideal isogeometric shared memory parallel direct solver scales as O(p^2 log(N/p)) for one-dimensional problems, O(N p^2) for two-dimensional problems, and O(N^(4/3) p^2) for three-dimensional problems, where N is the number of degrees of freedom and p is the polynomial order of approximation. The computational costs of the shared memory parallel isogeometric direct solver are compared with those of the sequential isogeometric direct solver, the latter being equal to O(N p^2) for the one-dimensional case, O(N^1.5 p^3) for the two-dimensional case, and O(N^2 p^3) for the three-dimensional case. The shared memory version significantly reduces both the scalability in terms of N and p. Theoretical estimates are compared with numerical experiments performed with linear, quadratic, cubic, quartic, and quintic B-splines, in one and two spatial dimensions. © 2014 Elsevier Ltd. All rights reserved.

  17. Computational cost estimates for parallel shared memory isogeometric multi-frontal solvers

    KAUST Repository

    Woźniak, Maciej

    2014-06-01

    In this paper we present computational cost estimates for parallel shared memory isogeometric multi-frontal solvers. The estimates show that the ideal isogeometric shared memory parallel direct solver scales as O(p^2 log(N/p)) for one-dimensional problems, O(N p^2) for two-dimensional problems, and O(N^(4/3) p^2) for three-dimensional problems, where N is the number of degrees of freedom and p is the polynomial order of approximation. The computational costs of the shared memory parallel isogeometric direct solver are compared with those of the sequential isogeometric direct solver, the latter being equal to O(N p^2) for the one-dimensional case, O(N^1.5 p^3) for the two-dimensional case, and O(N^2 p^3) for the three-dimensional case. The shared memory version significantly reduces both the scalability in terms of N and p. Theoretical estimates are compared with numerical experiments performed with linear, quadratic, cubic, quartic, and quintic B-splines, in one and two spatial dimensions. © 2014 Elsevier Ltd. All rights reserved.

  18. Vortex structure behind highly heated two cylinders in parallel arrangements

    International Nuclear Information System (INIS)

    Kurita, Eiichirou; Yahagi, Yuji

    2008-01-01

    Vortex structures behind twin, highly heated cylinders in parallel arrangements have been investigated experimentally. The experiments were conducted under the following conditions: cylinder diameter, D = 4 mm; mean flow velocity, U∞ = 1.0 m/s; Reynolds number, Re = 250; cylinder clearance, S/D = 0.5-1.4; and cylinder heat flux, q = 0-72.6 kW/m². For S/D > 1.2, the Karman vortex street is formed alternately behind each cylinder divided on the slit flow. The slit flow velocity increases with a decrease in S/D and decreases with increasing heat flux. For S/D 2 ). As a result, the increased local kinematic viscosity and S/D play a key role for the vortex structure and formation behind arrangements of two parallel cylinders. (author)

  19. A new decomposition method for parallel processing multi-level optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Min Soo; Choi, Dong Hoon

    2002-01-01

    In practical designs, most multidisciplinary problems have a large and complicated design system. Since multidisciplinary problems involve hundreds of analyses and thousands of variables, the grouping of analyses and the order of the analyses within a group affect the speed of the total design cycle. Therefore, it is very important to reorder and regroup the original design processes in order to minimize the total computational cost, by decomposing large multidisciplinary problems into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. In this study, a new decomposition method is proposed for parallel processing of multidisciplinary design optimization, such as Collaborative Optimization (CO) and the Individual Discipline Feasible (IDF) method. Numerical results for two example problems are presented to show the feasibility of the proposed method

  20. Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

    Science.gov (United States)

    Choudhary, Alok Nidhi

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high-level application (e.g., object recognition). An IVS normally involves algorithms from low-level, intermediate-level, and high-level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.

  1. A Novel Design of 4-Class BCI Using Two Binary Classifiers and Parallel Mental Tasks

    Directory of Open Access Journals (Sweden)

    Tao Geng

    2008-01-01

    Full Text Available A novel 4-class single-trial brain computer interface (BCI) based on two (rather than four or more) binary linear discriminant analysis (LDA) classifiers is proposed, which is called a “parallel BCI.” Unlike other BCIs where mental tasks are executed and classified in a serial way one after another, the parallel BCI uses properly designed parallel mental tasks that are executed on both sides of the subject body simultaneously, which is the main novelty of the BCI paradigm used in our experiments. Each of the two binary classifiers only classifies the mental tasks executed on one side of the subject body, and the results of the two binary classifiers are combined to give the result of the 4-class BCI. Data was recorded in experiments with both real movement and motor imagery in 3 able-bodied subjects. Artifacts were not detected or removed. Offline analysis has shown that, in some subjects, the parallel BCI can generate a higher accuracy than a conventional 4-class BCI, although both of them have used the same feature selection and classification algorithms.
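    A minimal sketch of the "two binary classifiers, four classes" combination described above (illustrative feature dimension and placeholder weights, not the paper's trained LDA models): each classifier decides the task on one side of the body, and the pair of binary outcomes indexes one of four classes.

        // Hedged sketch: combining two binary LDA decisions into a 4-class label.
        #include <array>
        #include <cstddef>

        struct BinaryLDA {
            std::array<double, 3> w{};  // discriminant weights (placeholders)
            double b = 0.0;             // bias (placeholder)
            bool classify(const std::array<double, 3>& feat) const {
                double s = b;
                for (std::size_t i = 0; i < feat.size(); ++i) s += w[i] * feat[i];
                return s > 0.0;         // sign of w.x + b gives the binary label
            }
        };

        // Combine the left-side and right-side decisions into a class in {0,1,2,3}.
        int classify4(const BinaryLDA& left, const BinaryLDA& right,
                      const std::array<double, 3>& featLeft,
                      const std::array<double, 3>& featRight) {
            int l = left.classify(featLeft) ? 1 : 0;
            int r = right.classify(featRight) ? 1 : 0;
            return (l << 1) | r;        // 00, 01, 10, 11 -> classes 0..3
        }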

  2. Combining Compile-Time and Run-Time Parallelization

    Directory of Open Access Journals (Sweden)

    Sungdo Moon

    1999-01-01

    Full Text Available This paper demonstrates that significant improvements to automatic parallelization technology require that existing systems be extended in two ways: (1) they must combine high‐quality compile‐time analysis with low‐cost run‐time testing; and (2) they must take control flow into account during analysis. We support this claim with the results of an experiment that measures the safety of parallelization at run time for loops left unparallelized by the Stanford SUIF compiler’s automatic parallelization system. We present results of measurements on programs from two benchmark suites – SPECFP95 and NAS sample benchmarks – which identify inherently parallel loops in these programs that are missed by the compiler. We characterize remaining parallelization opportunities, and find that most of the loops require run‐time testing, analysis of control flow, or some combination of the two. We present a new compile‐time analysis technique that can be used to parallelize most of these remaining loops. This technique is designed to not only improve the results of compile‐time parallelization, but also to produce low‐cost, directed run‐time tests that allow the system to defer binding of parallelization until run‐time when safety cannot be proven statically. We call this approach predicated array data‐flow analysis. We augment array data‐flow analysis, which the compiler uses to identify independent and privatizable arrays, by associating predicates with array data‐flow values. Predicated array data‐flow analysis allows the compiler to derive “optimistic” data‐flow values guarded by predicates; these predicates can be used to derive a run‐time test guaranteeing the safety of parallelization.
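    A toy illustration of deferring the parallelization decision to run time (a hand-written overlap guard with illustrative names; not the predicates actually generated by SUIF or by predicated array data-flow analysis): compile-time analysis cannot prove that the arrays never overlap, so a cheap run-time test selects the parallel version only when it is safe.

        // Hedged sketch: run-time aliasing test guarding a parallelized loop.
        #include <cstddef>
        #include <cstdint>

        static bool no_overlap(const void* p, const void* q, std::size_t bytes) {
            auto a = reinterpret_cast<std::uintptr_t>(p);
            auto b = reinterpret_cast<std::uintptr_t>(q);
            return a + bytes <= b || b + bytes <= a;
        }

        void scaled_add(double* a, const double* b, const double* c, std::size_t n) {
            const std::size_t bytes = n * sizeof(double);
            if (no_overlap(a, b, bytes) && no_overlap(a, c, bytes)) {
                #pragma omp parallel for               // safe: output and inputs are disjoint
                for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i)
                    a[i] = 2.0 * b[i] + c[i];
            } else {
                for (std::size_t i = 0; i < n; ++i)    // fall back to the serial loop
                    a[i] = 2.0 * b[i] + c[i];
            }
        }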

  3. Reduction of momentum transfer rates by parallel electric fields: A two-fluid demonstration

    International Nuclear Information System (INIS)

    Delamere, P.A.; Stenbaek-Nielsen, H.C.; Otto, A.

    2002-01-01

    Momentum transfer between an ionized gas cloud and an ambient magnetized plasma through which it moves is a general problem in space plasma physics. Obvious examples include the Io-Jupiter interaction, comets, and coronal mass ejections. Active plasma experiments have demonstrated that momentum transfer rates associated with Alfven wave propagation are poorly understood. Barium injection experiments from the Combined Release and Radiation Effects Satellite (CRRES) have shown that dense ionized clouds are capable of E×B drifting over large distances perpendicular to the magnetic field. The CRRES 'skidding' distances were much larger than predicted by magnetohydrodynamic theory, and it has been proposed that parallel electric fields were a key component in the skidding phenomenon. A two-fluid code was used to demonstrate the role of parallel electric fields in reducing momentum transfer between two distinct plasma populations. In this study, a dense plasma was initialized moving relative to an ambient plasma and perpendicular to B. Parallel electric fields were introduced via a friction term in the electron momentum equation, and the collision frequency was scaled in proportion to the field-aligned current density. The simulation results showed that parallel electric fields decreased the decelerating magnetic tension force on the plasma cloud through a magnetic diffusion/reconnection process

  4. Direct reconstruction of parametric images for brain PET with event-by-event motion correction: evaluation in two tracers across count levels

    Science.gov (United States)

    Germino, Mary; Gallezot, Jean-Dominque; Yan, Jianhua; Carson, Richard E.

    2017-07-01

    Parametric images for dynamic positron emission tomography (PET) are typically generated by an indirect method, i.e. reconstructing a time series of emission images, then fitting a kinetic model to each voxel time activity curve. Alternatively, 'direct reconstruction' incorporates the kinetic model into the reconstruction algorithm itself, directly producing parametric images from projection data. Direct reconstruction has been shown to achieve parametric images with lower standard error than the indirect method. Here, we present direct reconstruction for brain PET using event-by-event motion correction of list-mode data, applied to two tracers. Event-by-event motion correction was implemented for direct reconstruction in the Parametric Motion-compensation OSEM List-mode Algorithm for Resolution-recovery reconstruction. The direct implementation was tested on simulated and human datasets with tracers [11C]AFM (serotonin transporter) and [11C]UCB-J (synaptic density), which follow the 1-tissue compartment model. Rigid head motion was tracked with the Vicra system. Parametric images of K1 and distribution volume (VT = K1/k2) were compared to those generated by the indirect method by regional coefficient of variation (CoV). Performance across count levels was assessed using sub-sampled datasets. For simulated and real datasets at high counts, the two methods estimated K1 and VT with comparable accuracy. At lower count levels, the direct method was substantially more robust to outliers than the indirect method. Compared to the indirect method, direct reconstruction reduced regional K1 CoV by 35-48% (simulated dataset), 39-43% ([11C]AFM dataset) and 30-36% ([11C]UCB-J dataset) across count levels (averaged over regions at matched iteration); VT CoV was reduced by 51-58%, 54-60% and 30-46%, respectively. Motion correction played an important role in the dataset with larger motion: correction increased regional VT by 51% on average in the [11C
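    For reference, the 1-tissue compartment model mentioned above has the standard form (textbook notation), with the distribution volume given by the ratio of the rate constants:

        \frac{dC_T(t)}{dt} = K_1\,C_p(t) - k_2\,C_T(t), \qquad V_T = \frac{K_1}{k_2} .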

  5. Improving matrix-vector product performance and multi-level preconditioning for the parallel PCG package

    Energy Technology Data Exchange (ETDEWEB)

    McLay, R.T.; Carey, G.F.

    1996-12-31

    In this study we consider the parallel solution of sparse linear systems arising from discretized PDEs. As part of our continuing work on our parallel PCG solver package, we have made improvements in two areas. The first is improving the performance of the matrix-vector product. Here, on regular finite-difference grids, we are able to use the cache memory more efficiently for smaller domains or where there are multiple degrees of freedom. The second problem of interest in the present work is the construction of preconditioners in the context of the parallel PCG solver we are developing. Here the problem is partitioned over a set of processor subdomains and the matrix-vector product for PCG is carried out in parallel for overlapping grid subblocks. For problems of scaled speedup, the actual rate of convergence of the unpreconditioned system deteriorates as the mesh is refined. Multigrid and subdomain strategies provide a logical approach to resolving the problem. We consider the parallel trade-offs between communication and computation and provide a complexity analysis of a representative algorithm. Some preliminary calculations using the parallel package and comparisons with other preconditioners are provided, together with parallel performance results.

  6. The language parallel Pascal and other aspects of the massively parallel processor

    Science.gov (United States)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  7. MINARET: Towards a time-dependent neutron transport parallel solver

    International Nuclear Information System (INIS)

    Baudron, A.M.; Lautard, J.J.; Maday, Y.; Mula, O.

    2013-01-01

    We present the newly developed time-dependent 3D multigroup discrete ordinates neutron transport solver that has recently been implemented in the MINARET code. The solver is the basis for a study of computing acceleration techniques that involve parallel architectures. In this work, we focus on the parallelization of two of the variables involved in our equation: the angular directions and the time. The latter variable has been parallelized by a (time-)domain decomposition method called the parareal-in-time algorithm. (authors)
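    The time parallelization mentioned above follows the standard parareal correction (generic form, with F and G denoting the fine and coarse propagators over one time window), iterated until the fine solution is recovered:

        U_{n+1}^{k+1} = \mathcal{G}(U_n^{k+1}) + \mathcal{F}(U_n^{k}) - \mathcal{G}(U_n^{k}) .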

  8. Scalable parallel prefix solvers for discrete ordinates transport

    International Nuclear Information System (INIS)

    Pautz, S.; Pandya, T.; Adams, M.

    2009-01-01

    The well-known 'sweep' algorithm for inverting the streaming-plus-collision term in first-order deterministic radiation transport calculations has some desirable numerical properties. However, it suffers from parallel scaling issues caused by a lack of concurrency. The maximum degree of concurrency, and thus the maximum parallelism, grows more slowly than the problem size for sweeps-based solvers. We investigate a new class of parallel algorithms that involves recasting the streaming-plus-collision problem in prefix form and solving via cyclic reduction. This method, although computationally more expensive at low levels of parallelism than the sweep algorithm, offers better theoretical scalability properties. Previous work has demonstrated this approach for one-dimensional calculations; we show how to extend it to multidimensional calculations. Notably, for multiple dimensions it appears that this approach is limited to long-characteristics discretizations; other discretizations cannot be cast in prefix form. We implement two variants of the algorithm within the radlib/SCEPTRE transport code library at Sandia National Laboratories and show results on two different massively parallel systems. Both the 'forward' and 'symmetric' solvers behave similarly, scaling well to larger degrees of parallelism than sweeps-based solvers. We do observe some issues at the highest levels of parallelism (relative to the system size) and discuss possible causes. We conclude that this approach shows good potential for future parallel systems, but the parallel scalability will depend heavily on the architecture of the communication networks of these systems. (authors)
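    The "prefix form" alluded to above can be seen on a 1-D discretized streaming-plus-collision sweep (schematic; the actual coefficients depend on the discretization): the cell-to-cell update is an affine first-order recurrence, so the chain of 2x2 matrix products can be evaluated with a parallel scan or cyclic reduction,

        \psi_i = a_i\,\psi_{i-1} + b_i
        \quad\Longleftrightarrow\quad
        \begin{pmatrix}\psi_i\\ 1\end{pmatrix}
        = \begin{pmatrix} a_i & b_i\\ 0 & 1\end{pmatrix}
        \begin{pmatrix}\psi_{i-1}\\ 1\end{pmatrix}.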

  9. A parallel orbital-updating based plane-wave basis method for electronic structure calculations

    International Nuclear Information System (INIS)

    Pan, Yan; Dai, Xiaoying; Gironcoli, Stefano de; Gong, Xin-Gao; Rignanese, Gian-Marco; Zhou, Aihui

    2017-01-01

    Highlights: • Propose three parallel orbital-updating based plane-wave basis methods for electronic structure calculations. • These new methods avoid generating large-scale eigenvalue problems and thus reduce the computational cost. • These new methods allow for two-level parallelization, which is particularly interesting for large-scale parallelization. • Numerical experiments show that these new methods are reliable and efficient for large-scale calculations on modern supercomputers. - Abstract: Motivated by the recently proposed parallel orbital-updating approach in the real-space method, we propose a parallel orbital-updating based plane-wave basis method for electronic structure calculations, for solving the corresponding eigenvalue problems. In addition, we propose two new modified parallel orbital-updating methods. Compared to traditional plane-wave methods, our methods allow for two-level parallelization, which is particularly interesting for large-scale parallelization. Numerical experiments show that these new methods are more reliable and efficient for large-scale calculations on modern supercomputers.

  10. Elimination of zero sequence circulating current between parallel operating three-level inverters

    DEFF Research Database (Denmark)

    Li, Kai; Wang, Xiaodong; Dong, Zhenhua

    2016-01-01

    In order to suppress the zero-sequence circulating currents (ZSCCs) between parallel operating three-level voltage source inverters with common AC and DC buses, a common-mode voltage reduction PWM (CMVR-PWM) technique and neutral-point potentials (NPPs) control based method is proposed in this paper...

  11. Locating hardware faults in a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

    2010-04-13

    Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.

  12. Parallel phase model : a programming model for high-end parallel machines with manycores.

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

    2009-04-01

    This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.

  13. A Two-Pass Exact Algorithm for Selection on Parallel Disk Systems.

    Science.gov (United States)

    Mi, Tian; Rajasekaran, Sanguthevar

    2013-07-01

    Numerous OLAP queries process selection operations of "top N", median, "top 5%", in data warehousing applications. Selection is a well-studied problem that has numerous applications in the management of data and databases since, typically, any complex data query can be reduced to a series of basic operations such as sorting and selection. Parallel selection has also become an important fundamental operation, especially after parallel databases were introduced. In this paper, we present a deterministic algorithm, Recursive Sampling Selection (RSS), to solve the exact out-of-core selection problem, which we show needs no more than (2 + ε) passes (ε being a very small fraction). We have compared our RSS algorithm with two other algorithms in the literature, namely, Deterministic Sampling Selection (DSS) and QuickSelect on parallel disk systems. Our analysis shows that DSS is a (2 + ε)-pass algorithm when the total number of input elements N is a polynomial in the memory size M (i.e., N = M^c for some constant c). In contrast, our proposed algorithm RSS runs in (2 + ε) passes without any assumptions. Experimental results indicate that both RSS and DSS outperform QuickSelect on parallel disk systems. In particular, the proposed algorithm RSS is more scalable and robust in handling big data when the input size is far greater than the core memory size, including the case of N ≫ M^c.

  14. Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis.

    Science.gov (United States)

    Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John

    2016-01-01

    Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.

  15. Diffraction of love waves by two parallel perfectly weak half planes

    International Nuclear Information System (INIS)

    Asghar, S.; Zaman, F.D.; Ayub, M.

    1986-04-01

    We consider the diffraction of Love waves by two parallel perfectly weak half planes in a layer overlying a half space. The problem is formulated in terms of the Wiener-Hopf equations in the transformed plane. The transmitted waves are then calculated using the Wiener-Hopf procedure and inverse transforms. (author)

  16. Effects of Parallel Channel Interactions on Two-Phase Flow Split in ...

    African Journals Online (AJOL)

    The tests would aid the development of a realistic transient computer model for tracking the distribution of two-phase flows into the multiple parallel channels of a Nuclear Reactor, during Loss of Coolant Accidents (LOCA), and were performed at the General Electric Nuclear Energy Division Laboratory, California. The test ...

  17. Shared memory parallelism for 3D cartesian discrete ordinates solver

    International Nuclear Information System (INIS)

    Moustafa, S.; Dutka-Malen, I.; Plagne, L.; Poncot, A.; Ramet, P.

    2013-01-01

    This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multi-core + SIMD - Single Instruction on Multiple Data) on shared-memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, which usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full-core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46×10^6 spatial cells and 1×10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-node nuclear simulation tool. (authors)
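
    The two nested levels of parallelism can be pictured with a toy flux-accumulation loop: threads across spatial cells and SIMD lanes across angular directions. The OpenMP sketch below is only a schematic analogue under assumed array names (w, psi, phi) and sizes; it is not the DOMINO sweep or its Intel TBB/Eigen implementation.

    ```cpp
    // Schematic of two nested levels of parallelism (multi-core + SIMD); this is
    // NOT the DOMINO sweep, just the general pattern it exploits.
    // Build: g++ -std=c++17 -O2 -fopenmp nested.cpp
    #include <cstdio>
    #include <vector>

    int main() {
        const int n_cells = 1 << 14;    // spatial cells (illustrative size)
        const int n_dirs  = 288;        // angular directions, e.g. S16

        std::vector<double> w(n_dirs, 1.0 / n_dirs);                               // quadrature weights
        std::vector<double> psi(static_cast<std::size_t>(n_cells) * n_dirs, 1.0);  // angular flux
        std::vector<double> phi(n_cells, 0.0);                                     // scalar flux

        // Outer level: threads across cells.  Inner level: SIMD across directions.
        #pragma omp parallel for schedule(static)
        for (int c = 0; c < n_cells; ++c) {
            double acc = 0.0;
            const double* psi_c = &psi[static_cast<std::size_t>(c) * n_dirs];
            #pragma omp simd reduction(+ : acc)
            for (int d = 0; d < n_dirs; ++d)
                acc += w[d] * psi_c[d];
            phi[c] = acc;
        }

        std::printf("phi[0] = %.3f\n", phi[0]);   // 1.000 for this toy data
    }
    ```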

  18. Pros and cons of rotating ground motion records to fault-normal/parallel directions for response history analysis of buildings

    Science.gov (United States)

    Kalkan, Erol; Kwong, Neal S.

    2014-01-01

    According to the regulatory building codes in the United States (e.g., 2010 California Building Code), at least two horizontal ground motion components are required for three-dimensional (3D) response history analysis (RHA) of building structures. For sites within 5 km of an active fault, these records should be rotated to fault-normal/fault-parallel (FN/FP) directions, and two RHAs should be performed separately (when FN and then FP are aligned with the transverse direction of the structural axes). It is assumed that this approach will lead to two sets of responses that envelope the range of possible responses over all nonredundant rotation angles. This assumption is examined here, for the first time, using a 3D computer model of a six-story reinforced-concrete instrumented building subjected to an ensemble of bidirectional near-fault ground motions. Peak values of engineering demand parameters (EDPs) were computed for rotation angles ranging from 0 through 180° to quantify the difference between peak values of EDPs over all rotation angles and those due to FN/FP direction rotated motions. It is demonstrated that rotating ground motions to FN/FP directions (1) does not always lead to the maximum responses over all angles, (2) does not always envelope the range of possible responses, and (3) does not provide maximum responses for all EDPs simultaneously even if it provides a maximum response for a specific EDP.
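
    The angle sweep at the heart of this comparison rotates the two recorded horizontal components as a_rot(t) = a1(t) cos(theta) + a2(t) sin(theta) over the nonredundant range 0 to 180 degrees. The sketch below tracks the peak of the rotated component for synthetic records; it only illustrates the rotation bookkeeping, with peak acceleration standing in (as an assumption) for the engineering demand parameters obtained from the study's 3D response history analyses.

    ```cpp
    // Sketch: rotating a pair of horizontal ground-motion components through
    // nonredundant angles (0-180 deg) and tracking a simple peak measure. The
    // records are synthetic; in the study the EDPs come from 3D response history
    // analysis of the building model, not from the records themselves.
    // Build: g++ -std=c++17 -O2 rotate.cpp
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        const double PI = std::acos(-1.0);
        const int n = 2000;
        const double dt = 0.01;

        // Two synthetic horizontal acceleration histories a1(t), a2(t).
        std::vector<double> a1(n), a2(n);
        for (int i = 0; i < n; ++i) {
            const double t = i * dt;
            a1[i] = 0.30 * std::sin(2.0 * PI * 1.0 * t) * std::exp(-0.2 * t);
            a2[i] = 0.25 * std::sin(2.0 * PI * 1.3 * t + 0.7) * std::exp(-0.2 * t);
        }

        double peak_over_all_angles = 0.0;
        int critical_angle = 0;
        for (int deg = 0; deg < 180; ++deg) {                 // nonredundant range
            const double th = deg * PI / 180.0;
            double peak = 0.0;
            for (int i = 0; i < n; ++i) {
                const double a_rot = a1[i] * std::cos(th) + a2[i] * std::sin(th);
                peak = std::max(peak, std::fabs(a_rot));
            }
            if (peak > peak_over_all_angles) {
                peak_over_all_angles = peak;
                critical_angle = deg;
            }
        }
        // Comparing this peak with the one at the FN/FP angles shows whether the
        // FN/FP rotation actually envelopes the response measure.
        std::printf("peak over all angles: %.4f at %d deg\n", peak_over_all_angles, critical_angle);
    }
    ```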

  19. The Use of Two Culturing Methods in Parallel Reveals a High Prevalence and Diversity of Arcobacter spp. in a Wastewater Treatment Plant

    Directory of Open Access Journals (Sweden)

    Arturo Levican

    2016-01-01

    Full Text Available The genus Arcobacter includes species considered emerging food and waterborne pathogens. Although Arcobacter has been linked to the presence of faecal pollution, few studies have investigated its prevalence in wastewater, and the only isolated species were Arcobacter butzleri and Arcobacter cryaerophilus. This study aimed to establish the prevalence of Arcobacter spp. at a WWTP using two culturing methods in parallel (direct plating and culturing after enrichment) and direct detection by m-PCR. In addition, the genetic diversity of the isolates was established using the ERIC-PCR genotyping method. Most of the wastewater samples (96.7%) were positive for Arcobacter, and a high genetic diversity was observed among the 651 investigated isolates, which belonged to 424 different ERIC genotypes. However, only a few strains persisted at different dates or sampling points. The use of direct plating in parallel with culturing after enrichment allowed recovery of the species A. butzleri, A. cryaerophilus, Arcobacter thereius, Arcobacter defluvii, Arcobacter skirrowii, Arcobacter ellisii, Arcobacter cloacae, and Arcobacter nitrofigilis, most of them isolated for the first time from wastewater. The predominant species overall was A. butzleri; however, by direct plating A. cryaerophilus predominated. Therefore, the overall predominance of A. butzleri was a bias associated with the use of enrichment.

  20. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

    Science.gov (United States)

    Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

    2014-01-01

    It is very time consuming to solve fractional differential equations. The computational complexity of solving the two-dimensional time fractional diffusion equation (2D-TFDE) with an iterative implicit finite difference method is O(M_x M_y N^2). In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of this algorithm. A task distribution model and a data layout with virtual boundaries are designed for this parallel algorithm. The experimental results show that the parallel solution agrees well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed-memory cluster system. We believe that parallel computing technology will become a basic method for computationally intensive fractional applications in the near future.

  1. High performance parallel computers for science

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

    1989-01-01

    This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16-port crossbar switch crate which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site-oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256-node, 5 GFlop system is under construction

  2. Parallelization of 2-D lattice Boltzmann codes

    International Nuclear Information System (INIS)

    Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo.

    1996-03-01

    Lattice Boltzmann (LB) codes to simulate two-dimensional fluid flow are developed on the vector parallel computer Fujitsu VPP500 and the scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code so that it can be vectorized along the axis perpendicular to the direction of the decomposition. High parallel efficiencies of 95.1% for the vector parallel calculation on 16 processors with a 1152x1152 grid and 88.6% for the scalar parallel calculation on 100 processors with an 800x800 grid are obtained. Performance models are developed to analyze the performance of the LB codes. These models show that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors, up to 100 processors. We also analyze the scalability when the available memory of each processor element is kept fully used. Our performance model predicts that the execution time of the vector parallel code increases by about 3% on 500 processors. Although the 1-D domain decomposition method in general has a drawback in interprocessor communication, the vector parallel LB code is still suitable for large-scale and/or high-resolution simulations. (author)
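
    The 1-D decomposition described here splits the grid into slabs along one axis, each slab exchanging a ghost (halo) row with its neighbours before every update, while the other axis stays contiguous and vectorizable. The MPI sketch below shows that data layout with a generic Jacobi-style averaging update standing in for the lattice Boltzmann collision and streaming steps; it is an illustration of the decomposition, not the VPP500 or Paragon code.

    ```cpp
    // Sketch: 1-D domain decomposition with ghost-row (halo) exchange, the data
    // layout used by the vector parallel LB code. A Jacobi-style averaging update
    // stands in for the LB collision/streaming step.
    // Build: mpicxx -O2 halo.cpp && mpirun -np 4 ./halo
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int nx = 64;                       // global rows, split along one axis
        const int ny = 64;                       // the other axis stays local (vectorizable)
        const int local_nx = nx / size;          // assume size divides nx for simplicity

        // local_nx interior rows plus one ghost row on each side
        std::vector<double> u((local_nx + 2) * ny, double(rank));
        std::vector<double> unew(u);

        const int up   = (rank + 1) % size;      // periodic neighbours for simplicity
        const int down = (rank - 1 + size) % size;

        for (int step = 0; step < 10; ++step) {
            // Exchange ghost rows with neighbours.
            MPI_Sendrecv(&u[local_nx * ny], ny, MPI_DOUBLE, up,   0,
                         &u[0],             ny, MPI_DOUBLE, down, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&u[1 * ny],              ny, MPI_DOUBLE, down, 1,
                         &u[(local_nx + 1) * ny], ny, MPI_DOUBLE, up,   1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            // Local update: the inner loop over j runs over contiguous data and vectorizes.
            for (int i = 1; i <= local_nx; ++i)
                for (int j = 0; j < ny; ++j)
                    unew[i * ny + j] = 0.5 * (u[(i - 1) * ny + j] + u[(i + 1) * ny + j]);
            u.swap(unew);
        }
        if (rank == 0) std::printf("done on %d ranks, u[ny] = %.3f\n", size, u[ny]);
        MPI_Finalize();
    }
    ```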

  4. Novel encoding and updating of positional, or directional, spatial cues are processed by distinct hippocampal subfields: Evidence for parallel information processing and the "what" stream.

    Science.gov (United States)

    Hoang, Thu-Huong; Aliane, Verena; Manahan-Vaughan, Denise

    2018-05-01

    The specific roles of hippocampal subfields in spatial information processing and encoding are, as yet, unclear. The parallel map theory postulates that whereas the CA1 processes discrete environmental features (positional cues used to generate a "sketch map"), the dentate gyrus (DG) processes large navigation-relevant landmarks (directional cues used to generate a "bearing map"). Additionally, the two-streams hypothesis suggests that hippocampal subfields engage in differentiated processing of information from the "where" and the "what" streams. We investigated these hypotheses by analyzing the effect of exploration of discrete "positional" features and large "directional" spatial landmarks on hippocampal neuronal activity in rats. As an indicator of neuronal activity we measured the mRNA induction of the immediate early genes (IEGs), Arc and Homer1a. We observed an increase of this IEG mRNA in CA1 neurons of the distal neuronal compartment and in proximal CA3, after novel spatial exploration of discrete positional cues, whereas novel exploration of directional cues led to increases in IEG mRNA in the lower blade of the DG and in proximal CA3. Strikingly, the CA1 did not respond to directional cues and the DG did not respond to positional cues. Our data provide evidence for both the parallel map theory and the two-streams hypothesis and suggest a precise compartmentalization of the encoding and processing of "what" and "where" information occurs within the hippocampal subfields. © 2018 The Authors. Hippocampus Published by Wiley Periodicals, Inc.

  5. High Performance Computation of a Jet in Crossflow by Lattice Boltzmann Based Parallel Direct Numerical Simulation

    Directory of Open Access Journals (Sweden)

    Jiang Lei

    2015-01-01

    Full Text Available Direct numerical simulation (DNS) of a round jet in crossflow based on the lattice Boltzmann method (LBM) is carried out on a multi-GPU cluster. The data-parallel SIMT (single instruction multiple thread) characteristic of the GPU matches the parallelism of the LBM well, which leads to the high efficiency of the GPU-based LBM solver. With the present GPU settings (6 Nvidia Tesla K20M), the present DNS can be completed in several hours. A grid system of 1.5 × 10^8 is adopted and the largest jet Reynolds number reaches 3000. The jet-to-free-stream velocity ratio is set as 3.3. The jet is orthogonal to the mainstream flow direction. The validated code shows good agreement with experiments. Vortical structures, including the counter-rotating vortex pair (CRVP), shear-layer vortices and horseshoe vortices, are presented and analyzed based on velocity fields and vorticity distributions. Turbulent statistical quantities of the Reynolds stress are also displayed. Coherent structures are revealed at very fine resolution based on the second invariant of the velocity gradients.

  6. Self-balanced modulation and magnetic rebalancing method for parallel multilevel inverters

    Science.gov (United States)

    Li, Hui; Shi, Yanjun

    2017-11-28

    A self-balanced modulation method and a closed-loop magnetic flux rebalancing control method for parallel multilevel inverters. The combination of the two methods provides for balancing of the magnetic flux of the inter-cell transformers (ICTs) of the parallel multilevel inverters without deteriorating the quality of the output voltage. In various embodiments a parallel multi-level inverter modulator is provided, including a multi-channel comparator to generate a multiplexed digitized ideal waveform for a parallel multi-level inverter and a finite state machine (FSM) module coupled to the parallel multi-channel comparator, the FSM module to receive the multiplexed digitized ideal waveform and to generate a pulse-width-modulated gate-drive signal for each switching device of the parallel multi-level inverter. The system and method provide for optimization of the output voltage spectrum without influencing the magnetic balancing.

  7. Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation

    Science.gov (United States)

    Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.

    1996-01-01

    We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.

  8. Theoretical investigations on two-phase flow instability in parallel channels under axial non-uniform heating

    International Nuclear Information System (INIS)

    Lu, Xiaodong; Wu, Yingwei; Zhou, Linglan; Tian, Wenxi; Su, Guanghui; Qiu, Suizheng; Zhang, Hong

    2014-01-01

    Highlights: • We developed a model based on the homogeneous flow model to analyze two-phase flow instability in parallel channels. • The influence of axially non-uniform heating on the system stability has been investigated. • Influences of various factors on system instability under cosine heat flux have been studied. • The system under top-peaked heat flux is the most stable system. - Abstract: Two-phase flow instability in parallel channels heated by an axially non-uniform heat flux has been theoretically studied in this paper. The system control equations of the parallel channels were established based on the homogeneous flow model in the two-phase region. A semi-implicit finite-difference scheme and a staggered mesh method were used to discretize the equations, and the difference equations were solved by the chasing method. Cosine, bottom-peaked and top-peaked heat fluxes were used to study the influence of non-uniform heating on the two-phase flow instability of the parallel channel system. The marginal stability boundaries (MSB) of the parallel channels and the three-dimensional instability spaces (or instability reefs) under different heat flux conditions have been obtained. Compared with axially uniform heating, axially non-uniform heating affects the system stability. Cosine and bottom-peaked heat fluxes can destabilize the system in the high inlet subcooling region, while the opposite effect is found in the low inlet subcooling region. However, a top-peaked heat flux can enhance the system stability in the whole region. In addition, for the cosine heat flux, increasing the system pressure or the inlet resistance coefficient can strengthen the system stability, and increasing the heating power destabilizes the system. The influence of the inlet subcooling number on the system stability is multi-valued under the cosine heat flux.
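
    The "chasing method" mentioned above is the classical tridiagonal sweep better known as the Thomas algorithm. A minimal, generic version is sketched below; the coefficient arrays are illustrative and are not the channel model's actual discretized equations.

    ```cpp
    // The "chasing method" is the Thomas algorithm for tridiagonal systems, the
    // kind produced by a semi-implicit finite-difference scheme on a staggered
    // mesh. Generic coefficients here, not the channel model itself.
    // Build: g++ -std=c++17 -O2 thomas.cpp
    #include <cstdio>
    #include <vector>

    // Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], i = 0..n-1 (a[0] = c[n-1] = 0).
    std::vector<double> thomas(std::vector<double> a, std::vector<double> b,
                               std::vector<double> c, std::vector<double> d) {
        const int n = static_cast<int>(b.size());
        // Forward "chase": eliminate the sub-diagonal.
        for (int i = 1; i < n; ++i) {
            const double m = a[i] / b[i - 1];
            b[i] -= m * c[i - 1];
            d[i] -= m * d[i - 1];
        }
        // Backward substitution.
        std::vector<double> x(n);
        x[n - 1] = d[n - 1] / b[n - 1];
        for (int i = n - 2; i >= 0; --i)
            x[i] = (d[i] - c[i] * x[i + 1]) / b[i];
        return x;
    }

    int main() {
        // -u'' = 1 on a small grid with u = 0 at both ends (classic test problem).
        const int n = 5;
        std::vector<double> a(n, -1.0), b(n, 2.0), c(n, -1.0), d(n, 1.0);
        a[0] = 0.0; c[n - 1] = 0.0;
        const auto x = thomas(a, b, c, d);
        for (double v : x) std::printf("%.3f ", v);   // symmetric, peaking in the middle
        std::printf("\n");
    }
    ```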

  9. Entropy resistance analyses of a two-stream parallel flow heat exchanger with viscous heating

    International Nuclear Information System (INIS)

    Cheng Xue-Tao; Liang Xin-Gang

    2013-01-01

    Heat exchangers are widely used in industry, and analyses and optimizations of the performance of heat exchangers are important topics. In this paper, we define the concept of entropy resistance based on the entropy generation analysis of a one-dimensional heat transfer process. With this concept, a two-stream parallel flow heat exchanger with viscous heating is analyzed and discussed. It is found that the minimization of entropy resistance always leads to the maximum heat transfer rate for the discussed two-stream parallel flow heat exchanger, while the minimizations of the entropy generation rate, the entropy generation numbers, and the revised entropy generation number do not always do so. (general)

  10. Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2

    Science.gov (United States)

    Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad

    1995-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and the IBM SP1 and SP2 parallel computers are documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that the PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs and to match the actual costs relative to changes in the number of grid points. As the number of processors is increased, slower-than-linear speedups are achieved with optimized (machine-dependent library) routines, because the computational cost is dominated by the FFT routine, which yields less-than-ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32-node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.

  11. A parallel direct solver for the self-adaptive hp Finite Element Method

    KAUST Repository

    Paszyński, Maciej R.

    2010-03-01

    In this paper we present a new parallel multi-frontal direct solver, dedicated to the hp Finite Element Method (hp-FEM). The self-adaptive hp-FEM generates, in a fully automatic mode, a sequence of hp-meshes delivering exponential convergence of the error with respect to the number of degrees of freedom (d.o.f.) as well as the CPU time, by performing a sequence of hp refinements starting from an arbitrary initial mesh. The solver constructs an initial elimination tree for an arbitrary initial mesh, and expands the elimination tree each time the mesh is refined. This allows us to keep track of the order of elimination for the solver. The solver also minimizes memory usage by de-allocating the partial LU factorizations computed during the elimination stage and recomputing them for the backward substitution stage, using only about 10% of the computational time necessary for the original computations. The solver has been tested on 3D Direct Current (DC) borehole resistivity measurement simulation problems. We measure the execution time and memory usage of the solver over a large regular mesh with 1.5 million degrees of freedom as well as on a highly non-regular mesh, generated by the self-adaptive hp-FEM, with finite elements of various sizes and polynomial orders of approximation varying from p = 1 to p = 9. From the presented experiments it follows that the parallel solver scales well up to the maximum number of utilized processors. The limit for the solver scalability is the maximum sequential part of the algorithm: the computation of the partial LU factorizations over the longest path from the root of the elimination tree down to the deepest leaf. © 2009 Elsevier Inc. All rights reserved.

  12. Parallel integer sorting with medium and fine-scale parallelism

    Science.gov (United States)

    Dagum, Leonardo

    1993-01-01

    Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message-passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128-processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
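
    The range-partitioning idea behind barrel-sort can be pictured with a small shared-memory bucket sort: keys are scattered into contiguous key ranges, each range is sorted independently, and concatenating the ranges yields the sorted sequence. The sketch below is only a hedged analogue using OpenMP threads, not the message-passing iPSC/860 implementation analyzed here.

    ```cpp
    // Shared-memory analogue of the "split keys into contiguous ranges, sort each
    // range locally" idea behind barrel-sort (the original is message passing on
    // an iPSC/860; this sketch just uses OpenMP threads).
    // Build: g++ -std=c++17 -O2 -fopenmp bucket.cpp
    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <vector>

    int main() {
        const std::size_t n = 1 << 20;
        const int n_buckets = 16;                       // one "processor" per bucket
        std::mt19937 rng(1);
        std::uniform_int_distribution<std::uint32_t> dist(0, 0xFFFFFFFFu);

        std::vector<std::uint32_t> keys(n);
        for (auto& k : keys) k = dist(rng);

        // Scatter keys into buckets by their high bits (contiguous key ranges).
        std::vector<std::vector<std::uint32_t>> buckets(n_buckets);
        for (auto k : keys)
            buckets[k >> 28].push_back(k);              // 2^32 / 16 = 2^28 keys per range

        // Each bucket is sorted independently -- this is the parallel part.
        #pragma omp parallel for schedule(dynamic)
        for (int b = 0; b < n_buckets; ++b)
            std::sort(buckets[b].begin(), buckets[b].end());

        // Concatenating the buckets in order yields the fully sorted sequence.
        std::vector<std::uint32_t> sorted;
        sorted.reserve(n);
        for (const auto& b : buckets) sorted.insert(sorted.end(), b.begin(), b.end());

        std::printf("sorted: %s\n", std::is_sorted(sorted.begin(), sorted.end()) ? "yes" : "no");
    }
    ```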

  13. Wing-Body Aeroelasticity Using Finite-Difference Fluid/Finite-Element Structural Equations on Parallel Computers

    Science.gov (United States)

    Byun, Chansup; Guruswamy, Guru P.; Kutler, Paul (Technical Monitor)

    1994-01-01

    In recent years significant advances have been made for parallel computers in both hardware and software. Now parallel computers have become viable tools in computational mechanics. Many application codes developed on conventional computers have been modified to benefit from parallel computers. Significant speedups in some areas have been achieved by parallel computations. For single-discipline use of both fluid dynamics and structural dynamics, computations have been made on wing-body configurations using parallel computers. However, only a limited amount of work has been completed in combining these two disciplines for multidisciplinary applications. The prime reason is the increased level of complication associated with a multidisciplinary approach. In this work, procedures to compute aeroelasticity on parallel computers using direct coupling of fluid and structural equations will be investigated for wing-body configurations. The parallel computer selected for computations is an Intel iPSC/860 computer which is a distributed-memory, multiple-instruction, multiple data (MIMD) computer with 128 processors. In this study, the computational efficiency issues of parallel integration of both fluid and structural equations will be investigated in detail. The fluid and structural domains will be modeled using finite-difference and finite-element approaches, respectively. Results from the parallel computer will be compared with those from the conventional computers using a single processor. This study will provide an efficient computational tool for the aeroelastic analysis of wing-body structures on MIMD type parallel computers.

  14. Highly efficient parallel direct solver for solving dense complex matrix equations from method of moments

    Directory of Open Access Journals (Sweden)

    Yan Chen

    2017-03-01

    Full Text Available Based on a vectorised and cache-optimised kernel, a parallel lower-upper (LU) decomposition with a novel communication-avoiding pivoting scheme is developed to solve the dense complex matrix equations generated by the method of moments. Fine-grain data rearrangement and assembler instructions are adopted to reduce the number of memory accesses and improve CPU cache utilisation, which also facilitates vectorisation of the code. By grouping processes in a binary tree, a parallel pivoting scheme is designed to optimise the communication pattern and thus reduce the solving time of the proposed solver. Two large electromagnetic radiation problems are solved on two supercomputers, respectively, and the numerical results demonstrate that the proposed method outperforms those in open-source and commercial libraries.
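
    For orientation, the kernel being accelerated is LU factorisation with row pivoting; a textbook, serial, real-valued version is sketched below. The actual solver works on distributed complex matrices with vectorised kernels and a binary-tree, communication-avoiding pivot search, none of which is shown in this sketch.

    ```cpp
    // Baseline in-place LU factorisation with partial (row) pivoting -- the kernel
    // that the paper's vectorised, communication-avoiding parallel solver speeds
    // up. Real MoM matrices are complex-valued and distributed; this toy is
    // real-valued and serial.
    // Build: g++ -std=c++17 -O2 lu.cpp
    #include <cmath>
    #include <cstdio>
    #include <utility>
    #include <vector>

    // Factor A (n x n, row-major) in place into L\U; piv records the row swaps.
    bool lu_factor(std::vector<double>& A, std::vector<int>& piv, int n) {
        for (int k = 0; k < n; ++k) {
            // Partial pivoting: largest |entry| in column k at or below row k.
            int p = k;
            for (int i = k + 1; i < n; ++i)
                if (std::fabs(A[i * n + k]) > std::fabs(A[p * n + k])) p = i;
            if (A[p * n + k] == 0.0) return false;          // singular
            piv[k] = p;
            if (p != k)
                for (int j = 0; j < n; ++j) std::swap(A[k * n + j], A[p * n + j]);
            // Eliminate below the pivot.
            for (int i = k + 1; i < n; ++i) {
                const double m = A[i * n + k] / A[k * n + k];
                A[i * n + k] = m;                            // store the L factor
                for (int j = k + 1; j < n; ++j)
                    A[i * n + j] -= m * A[k * n + j];
            }
        }
        return true;
    }

    int main() {
        const int n = 3;
        std::vector<double> A = {2, 1, 1,  4, 3, 3,  8, 7, 9};
        std::vector<int> piv(n);
        if (lu_factor(A, piv, n))
            std::printf("U diagonal: %.3f %.3f %.3f\n", A[0], A[n + 1], A[2 * n + 2]);
    }
    ```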

  15. Application of the DMRG in two dimensions: a parallel tempering algorithm

    Science.gov (United States)

    Hu, Shijie; Zhao, Jize; Zhang, Xuefeng; Eggert, Sebastian

    The Density Matrix Renormalization Group (DMRG) is known to be a powerful algorithm for treating one-dimensional systems. When the DMRG is applied in two dimensions, however, the convergence becomes much less reliable and "metastable states" typically appear, which are unfortunately quite robust even when keeping a very high number of DMRG states. To overcome this problem we have now successfully developed a parallel tempering DMRG algorithm. Similar to parallel tempering in quantum Monte Carlo, this algorithm allows the systematic switching of DMRG states between different model parameters, which is very efficient for solving convergence problems. Using this method we have determined the phase diagram of the XXZ model on the anisotropic triangular lattice, which can be realized by hardcore bosons in optical lattices. Supported by SFB Transregio 49 of the Deutsche Forschungsgemeinschaft (DFG) and the Allianz für Hochleistungsrechnen Rheinland-Pfalz (AHRP).
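
    For readers unfamiliar with the tempering idea, the generic replica-exchange bookkeeping is sketched below in its usual Monte Carlo form: replicas at different control parameters occasionally swap configurations under a Metropolis rule. This is only the familiar skeleton that the abstract alludes to; the DMRG variant swaps converged DMRG states between model parameters rather than thermal configurations, and its actual switching criterion is not shown here.

    ```cpp
    // Minimal replica-exchange (parallel tempering) skeleton in its usual Monte
    // Carlo form. The DMRG variant described in the abstract swaps DMRG states
    // between model parameters; this sketch only shows the generic pattern.
    // Build: g++ -std=c++17 -O2 pt.cpp
    #include <cmath>
    #include <cstdio>
    #include <random>
    #include <utility>
    #include <vector>

    double energy(double x) { return (x * x - 1.0) * (x * x - 1.0); }  // toy double well

    int main() {
        std::mt19937 rng(3);
        std::normal_distribution<double> step(0.0, 0.3);
        std::uniform_real_distribution<double> u01(0.0, 1.0);

        std::vector<double> beta = {0.2, 0.5, 1.0, 2.0, 5.0};   // control parameters
        std::vector<double> x(beta.size(), 1.0);                // one replica per parameter

        for (int sweep = 0; sweep < 10000; ++sweep) {
            // Local Metropolis updates within each replica.
            for (std::size_t r = 0; r < x.size(); ++r) {
                const double xp = x[r] + step(rng);
                if (u01(rng) < std::exp(-beta[r] * (energy(xp) - energy(x[r])))) x[r] = xp;
            }
            // Attempt to swap neighbouring replicas (systematic switching of states).
            for (std::size_t r = 0; r + 1 < x.size(); ++r) {
                const double delta = (beta[r + 1] - beta[r]) * (energy(x[r + 1]) - energy(x[r]));
                if (u01(rng) < std::exp(delta)) std::swap(x[r], x[r + 1]);
            }
        }
        std::printf("final replica positions:");
        for (double v : x) std::printf(" %.2f", v);
        std::printf("\n");
    }
    ```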

  16. Digital parallel-to-series pulse-train converter

    Science.gov (United States)

    Hussey, J.

    1971-01-01

    Circuit converts number represented as two level signal on n-bit lines to series of pulses on one of two lines, depending on sign of number. Converter accepts parallel binary input data and produces number of output pulses equal to number represented by input data.
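
    A small behavioural model of the described conversion is sketched below: a signed n-bit parallel word yields a count of pulses on either the positive or the negative output line, according to its sign. The structure and names are illustrative, not taken from the brief.

    ```cpp
    // Behavioural model of the described converter: an n-bit two's-complement
    // parallel word produces a number of pulses equal to its magnitude on one of
    // two output lines, chosen by the sign of the number. Names are illustrative.
    // Build: g++ -std=c++17 -O2 pulses.cpp
    #include <cstdint>
    #include <cstdio>

    struct PulseTrain {
        int pulses_on_positive_line = 0;
        int pulses_on_negative_line = 0;
    };

    // Convert the parallel binary input into a count of serial output pulses.
    PulseTrain convert(std::int16_t parallel_word) {
        PulseTrain out;
        if (parallel_word >= 0)
            out.pulses_on_positive_line = parallel_word;                 // one pulse per count
        else
            out.pulses_on_negative_line = -static_cast<int>(parallel_word);
        return out;
    }

    int main() {
        for (int w : {5, -3}) {
            const PulseTrain t = convert(static_cast<std::int16_t>(w));
            std::printf("word %d -> +line pulses: %d, -line pulses: %d\n",
                        w, t.pulses_on_positive_line, t.pulses_on_negative_line);
        }
    }
    ```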

  17. Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

    Science.gov (United States)

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-01-01

    We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
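
    The iterative soft-thresholding that enforces cross-channel joint sparsity can be sketched as a group soft-threshold: at each wavelet-coefficient position the magnitude across all coils is computed and every channel is scaled by the same shrinkage factor. The real-valued sketch below is a hedged stand-in for the complex, wavelet-domain operation inside ℓ1-SPIRiT.

    ```cpp
    // Sketch of the joint (cross-channel) soft-thresholding step used by iterative
    // reconstructions such as l1-SPIRiT: each coefficient position is shrunk by a
    // factor based on its magnitude across all coils. Real-valued stand-in for the
    // complex wavelet-domain operation.
    // Build: g++ -std=c++17 -O2 softthresh.cpp
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // coeffs[c][i]: wavelet coefficient i of channel c. Shrink jointly across channels.
    void joint_soft_threshold(std::vector<std::vector<double>>& coeffs, double lambda) {
        const std::size_t n_chan = coeffs.size();
        const std::size_t n_coef = coeffs[0].size();
        for (std::size_t i = 0; i < n_coef; ++i) {
            double norm = 0.0;                      // magnitude across channels
            for (std::size_t c = 0; c < n_chan; ++c) norm += coeffs[c][i] * coeffs[c][i];
            norm = std::sqrt(norm);
            const double scale = (norm > lambda) ? (norm - lambda) / norm : 0.0;
            for (std::size_t c = 0; c < n_chan; ++c) coeffs[c][i] *= scale;
        }
    }

    int main() {
        std::vector<std::vector<double>> coeffs = {
            {3.0,  0.1, -2.0},     // channel 1
            {4.0, -0.1,  1.0}      // channel 2
        };
        joint_soft_threshold(coeffs, 1.0);
        for (const auto& ch : coeffs) {
            for (double v : ch) std::printf("%6.3f ", v);
            std::printf("\n");
        }
        // The small joint-magnitude coefficient (second column) is zeroed; the
        // large ones are shrunk toward zero but keep their cross-channel pattern.
    }
    ```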

  18. Vectorization, parallelization and porting of nuclear codes on the VPP500 system (parallelization). Progress report fiscal 1996

    Energy Technology Data Exchange (ETDEWEB)

    Watanabe, Hideo; Kawai, Wataru; Nemoto, Toshiyuki [Fujitsu Ltd., Tokyo (Japan); and others

    1997-12-01

    Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. These results are reported in three parts, i.e., the vectorization part, the parallelization part and the porting part; this report describes the parallelization part. In the parallelization part, the parallelization of the 2-dimensional relativistic electromagnetic particle code EM2D, the cylindrical direct numerical simulation code CYLDNS and the molecular dynamics code DGR for simulating radiation damage in diamond crystals is described. In the vectorization part, the vectorization of the two- and three-dimensional discrete ordinates simulation code DORT-TORT, the gas dynamics analysis code FLOWGR and the relativistic Boltzmann-Uehling-Uhlenbeck simulation code RBUU is described. In the porting part, the porting of the reactor safety analysis codes RELAP5/MOD3.2 and RELAP5/MOD3.2.1.2, the nuclear data processing system NJOY and the 2-D multigroup discrete ordinates transport code TWOTRAN-II is described, together with a survey for the porting of the command-driven interactive data analysis plotting program IPLOT. (author)

  19. Directional detection of dark matter with two-dimensional targets

    Science.gov (United States)

    Hochberg, Yonit; Kahn, Yonatan; Lisanti, Mariangela; Tully, Christopher G.; Zurek, Kathryn M.

    2017-09-01

    We propose two-dimensional materials as targets for direct detection of dark matter. Using graphene as an example, we focus on the case where dark matter scattering deposits sufficient energy on a valence-band electron to eject it from the target. We show that the sensitivity of graphene to dark matter of MeV to GeV mass can be comparable, for similar exposure and background levels, to that of semiconductor targets such as silicon and germanium. Moreover, a two-dimensional target is an excellent directional detector, as the ejected electron retains information about the angular dependence of the incident dark matter particle. This proposal can be implemented by the PTOLEMY experiment, presenting for the first time an opportunity for directional detection of sub-GeV dark matter.

  20. Start-up flow in a three-dimensional lid-driven cavity by means of a massively parallel direction splitting algorithm

    KAUST Repository

    Guermond, J. L.; Minev, P. D.

    2011-01-01

    The purpose of this paper is to validate a new highly parallelizable direction splitting algorithm. The parallelization capabilities of this algorithm are illustrated by providing a highly accurate solution for the start-up flow in a three-dimensional lid-driven cavity.

  1. Almost two-dimensional treatment of drift wave turbulence

    International Nuclear Information System (INIS)

    Albert, J.M.; Similon, P.L.; Sudan, R.N.

    1990-01-01

    The approximation of two-dimensionality is studied and extended for electrostatic drift wave turbulence in a three-dimensional, magnetized plasma. It is argued on the basis of the direct interaction approximation that in the absence of parallel viscosity, purely 2-D solutions exist for which only modes with k_∥ = 0 are excited, but that the 2-D spectrum is unstable to perturbations at nonzero k_∥. A 1-D equation for the parallel profile g_{k_⊥}(k_∥) of the saturated spectrum at steady state is derived and solved, allowing for parallel viscosity; the spectrum has finite width in k_∥, and hence finite parallel correlation length, as a result of nonlinear coupling. The enhanced energy dissipation rate, a 3-D effect, may be incorporated in the 2-D approximation by a suitable renormalization of the linear dissipation term. An algorithm is presented that reduces the 3-D problem to coupled 1- and 2-D problems. Numerical results from a 2-D spectral direct simulation, thus modified, are compared with the results from the corresponding 3-D (unmodified) simulation for a specific model of drift wave excitation. Damping at high k_∥ is included. It is verified that the 1-D solution for g_{k_⊥}(k_∥) accurately describes the shape and width of the 3-D spectrum, and that the modified 2-D simulation gives a good estimate of the 3-D energy saturation level and distribution E(k_⊥).

  2. Thread-level parallelization and optimization of NWChem for the Intel MIC architecture

    Energy Technology Data Exchange (ETDEWEB)

    Shan, Hongzhang [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Williams, Samuel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); de Jong, Wibe [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2015-01-01

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.
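
    As a schematic of what threading a deep loop nest looks like, the toy contraction below collapses the outer loops of a nested kernel with OpenMP so that all threads get work even when one loop extent is small; it is only an illustrative pattern, not NWChem's CCSD(T) kernels or the TEXAS integral package.

    ```cpp
    // Toy illustration of threading a deep loop nest with OpenMP, the approach the
    // paper applies to the CCSD(T) tensor contractions (this is not NWChem code).
    // C(i,j) += sum_k A(i,k) * B(k,j), with the two outer loops collapsed.
    // Build: g++ -std=c++17 -O2 -fopenmp contract.cpp
    #include <cstdio>
    #include <vector>

    int main() {
        const int ni = 64, nj = 64, nk = 256;
        std::vector<double> A(ni * nk, 1.0), B(nk * nj, 2.0), C(ni * nj, 0.0);

        #pragma omp parallel for collapse(2) schedule(static)
        for (int i = 0; i < ni; ++i)
            for (int j = 0; j < nj; ++j) {
                double acc = 0.0;
                for (int k = 0; k < nk; ++k)
                    acc += A[i * nk + k] * B[k * nj + j];
                C[i * nj + j] = acc;                  // each (i,j) is owned by one thread
            }

        std::printf("C[0] = %.1f (expected %.1f)\n", C[0], 2.0 * nk);
    }
    ```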

  3. Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture

    Energy Technology Data Exchange (ETDEWEB)

    Shan, Hongzhang; Williams, Samuel; Jong, Wibe de; Oliker, Leonid

    2014-10-10

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65x better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6x better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.

  4. Concatenating algorithms for parallel numerical simulations coupling radiation hydrodynamics with neutron transport

    International Nuclear Information System (INIS)

    Mo Zeyao

    2004-11-01

    Multiphysics parallel numerical simulations are usually essential to simplify research on complex physical phenomena in which several physics are tightly coupled. How to concatenate those coupled physics for fully scalable parallel simulation is therefore very important. Meanwhile, three objectives should be balanced: the first is efficient data transfer among simulations, and the second and third are efficient parallel execution and simultaneous development of the simulation codes. Two concatenating algorithms for multiphysics parallel numerical simulations coupling radiation hydrodynamics with neutron transport on unstructured grids are presented. The first algorithm, Fully Loosely Concatenation (FLC), focuses on the independence of code development and on each code running independently with optimal performance. The second algorithm, Two-Level Tightly Concatenation (TLTC), focuses on the optimal tradeoffs among the above three objectives. Theoretical analyses of the communication complexity and parallel numerical experiments on hundreds of processors on two parallel machines have shown that these two algorithms are efficient and can be generalized to other multiphysics parallel numerical simulations. In particular, algorithm TLTC is linearly scalable and has achieved optimal parallel performance. (authors)

  5. A CS1 pedagogical approach to parallel thinking

    Science.gov (United States)

    Rague, Brian William

    Almost all collegiate programs in Computer Science offer an introductory course in programming primarily devoted to communicating the foundational principles of software design and development. The ACM designates this introduction to computer programming course for first-year students as CS1, during which methodologies for solving problems within a discrete computational context are presented. Logical thinking is highlighted, guided primarily by a sequential approach to algorithm development and made manifest by typically using the latest, commercially successful programming language. In response to the most recent developments in accessible multicore computers, instructors of these introductory classes may wish to include training on how to design workable parallel code. Novel issues arise when programming concurrent applications which can make teaching these concepts to beginning programmers a seemingly formidable task. Student comprehension of design strategies related to parallel systems should be monitored to ensure an effective classroom experience. This research investigated the feasibility of integrating parallel computing concepts into the first-year CS classroom. To quantitatively assess student comprehension of parallel computing, an experimental educational study using a two-factor mixed group design was conducted to evaluate two instructional interventions in addition to a control group: (1) topic lecture only, and (2) topic lecture with laboratory work using a software visualization Parallel Analysis Tool (PAT) specifically designed for this project. A new evaluation instrument developed for this study, the Perceptions of Parallelism Survey (PoPS), was used to measure student learning regarding parallel systems. The results from this educational study show a statistically significant main effect among the repeated measures, implying that student comprehension levels of parallel concepts as measured by the PoPS improve immediately after the delivery of

  6. Template based parallel checkpointing in a massively parallel computer system

    Science.gov (United States)

    Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN

    2009-01-13

    A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.

  7. Experimental Hamiltonian identification for controlled two-level systems

    International Nuclear Information System (INIS)

    Schirmer, S.G.; Kolli, A.; Oi, D.K.L.

    2004-01-01

    We present a strategy to empirically determine the internal and control Hamiltonians for an unknown two-level system (black box) subject to various (piecewise constant) control fields when direct readout by measurement is limited to a single, fixed observable

  8. New strategy for eliminating zero-sequence circulating current between parallel operating three-level NPC voltage source inverters

    DEFF Research Database (Denmark)

    Li, Kai; Dong, Zhenhua; Wang, Xiaodong

    2018-01-01

    buses, that are operating in parallel. First, an equivalent model of ZSCC in a three-phase three-level NPC inverter paralleled system is developed. Second, on the basis of the analysis of the excitation source of ZSCCs, i.e., the difference in common mode voltages (CMVs) between paralleled inverters......, the ZCMV-PWM method is presented to reduce CMVs, and a simple electric circuit is adopted to control ZSCCs and neutral point potential. Finally, simulation and experiment are conducted to illustrate effectiveness of the proposed strategy. Results show that ZSCCs between paralleled inverters can...... be eliminated effectively under steady and dynamic states. Moreover, the proposed strategy exhibits the advantage of not requiring carrier synchronization. It can be utilized in inverters with different types of filter....

  9. Coupled Model of channels in parallel and neutron kinetics in two dimensions

    International Nuclear Information System (INIS)

    Cecenas F, M.; Campos G, R.M.; Valle G, E. del

    2004-01-01

    In this work an arrangement of thermohydraulic channels is presented that represents the four quadrants of a BWR reactor core. The channels are coupled to a two-dimensional neutronics model that allows the radial power profile of the reactor to be generated. Although the neutronics model is two-dimensional, it is supplemented with additional axial information by considering the axial power profiles for each thermohydraulic channel. The stationary state is obtained by imposing, as a boundary condition, the same pressure drop across all channels. This condition is satisfied by iterating on the coolant flow in each channel until the pressure drop is equal in all channels. This stationary state is later perturbed by modifying the values of the effective cross sections corresponding to an assembly. The parallel calculation of the neutronics and the thermohydraulics is carried out with PVM (Parallel Virtual Machine) by means of a master-slave scheme on a local network of computers. (Author)

  10. Unsteady free convection MHD flow between two heated vertical parallel conducting plates

    International Nuclear Information System (INIS)

    Sanyal, D.C.; Adhikari, A.

    2006-01-01

    Unsteady free convection flow of a viscous incompressible electrically conducting fluid between two heated conducting vertical parallel plates subjected to a uniform transverse magnetic field is considered. The approximate analytical solutions for velocity, induced field and temperature distribution are obtained for small and large values of the magnetic Reynolds number. The problem is also extended to the thermometric case. (author)

  11. Convergence analysis of a class of massively parallel direction splitting algorithms for the Navier-Stokes equations in simple domains

    KAUST Repository

    Guermond, Jean-Luc; Minev, Peter D.; Salgado, Abner J.

    2012-01-01

    We provide a convergence analysis for a new fractional timestepping technique for the incompressible Navier-Stokes equations based on direction splitting. This new technique is of linear complexity, unconditionally stable and convergent, and suitable for massive parallelization. © 2012 American Mathematical Society.

  12. Large Scale Parallel DNA Detection by Two-Dimensional Solid-State Multipore Systems.

    Science.gov (United States)

    Athreya, Nagendra Bala Murali; Sarathy, Aditya; Leburton, Jean-Pierre

    2018-04-23

    We describe a scalable device design of a dense array of multiple nanopores made from nanoscale semiconductor materials to detect and identify translocations of many biomolecules in a massively parallel detection scheme. We use molecular dynamics coupled to nanoscale device simulations to illustrate the ability of this device setup to uniquely identify DNA parallel translocations. We show that the transverse sheet currents along membranes are immune to the crosstalk effects arising from simultaneous translocations of biomolecules through multiple pores, due to their ability to sense only the local potential changes. We also show that electronic sensing across the nanopore membrane offers a higher detection resolution compared to ionic current blocking technique in a multipore setup, irrespective of the irregularities that occur while fabricating the nanopores in a two-dimensional membrane.

  13. Parallelization in Modern C++

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The traditionally used and well-established parallel programming models OpenMP and MPI both target lower-level parallelism and are meant to be as language-agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher-level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable number of standardization proposals and technical specifications being developed. Those efforts, however, have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuations...
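
    As a small taste of the higher-level, in-language abstractions the talk contrasts with OpenMP and MPI, the sketch below expresses task-based parallelism with C++11's std::async and std::future; it is a generic illustration, not code from the talk.

    ```cpp
    // Task-based parallelism expressed directly in standard C++ (std::async /
    // std::future), one of the higher-level abstractions discussed in the talk.
    // Build: g++ -std=c++17 -O2 -pthread tasks.cpp
    #include <cstddef>
    #include <cstdio>
    #include <functional>
    #include <future>
    #include <numeric>
    #include <vector>

    // Sum a range, splitting the work into asynchronous tasks recursively.
    long long parallel_sum(const std::vector<int>& v, std::size_t lo, std::size_t hi) {
        if (hi - lo < 100000)                                   // small chunk: do it inline
            return std::accumulate(v.begin() + static_cast<std::ptrdiff_t>(lo),
                                   v.begin() + static_cast<std::ptrdiff_t>(hi), 0LL);
        const std::size_t mid = lo + (hi - lo) / 2;
        auto left = std::async(std::launch::async, parallel_sum, std::cref(v), lo, mid);
        const long long right = parallel_sum(v, mid, hi);       // reuse the current thread
        return left.get() + right;
    }

    int main() {
        std::vector<int> v(1 << 22, 1);
        std::printf("sum = %lld\n", parallel_sum(v, 0, v.size()));   // expect 4194304
    }
    ```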

  14. Totally parallel multilevel algorithms

    Science.gov (United States)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT-based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.

  15. Two-dimensional parallel array technology as a new approach to automated combinatorial solid-phase organic synthesis

    Science.gov (United States)

    Brennan; Biddison; Frauendorf; Schwarcz; Keen; Ecker; Davis; Tinder; Swayze

    1998-01-01

    An automated, 96-well parallel array synthesizer for solid-phase organic synthesis has been designed and constructed. The instrument employs a unique reagent array delivery format, in which each reagent utilized has a dedicated plumbing system. An inert atmosphere is maintained during all phases of a synthesis, and temperature can be controlled via a thermal transfer plate which holds the injection molded reaction block. The reaction plate assembly slides in the X-axis direction, while eight nozzle blocks holding the reagent lines slide in the Y-axis direction, allowing for the extremely rapid delivery of any of 64 reagents to 96 wells. In addition, there are six banks of fixed nozzle blocks, which deliver the same reagent or solvent to eight wells at once, for a total of 72 possible reagents. The instrument is controlled by software which allows the straightforward programming of the synthesis of a larger number of compounds. This is accomplished by supplying a general synthetic procedure in the form of a command file, which calls upon certain reagents to be added to specific wells via lookup in a sequence file. The bottle position, flow rate, and concentration of each reagent is stored in a separate reagent table file. To demonstrate the utility of the parallel array synthesizer, a small combinatorial library of hydroxamic acids was prepared in high throughput mode for biological screening. Approximately 1300 compounds were prepared on a 10 μmole scale (3-5 mg) in a few weeks. The resulting crude compounds were generally >80% pure, and were utilized directly for high throughput screening in antibacterial assays. Several active wells were found, and the activity was verified by solution-phase synthesis of analytically pure material, indicating that the system described herein is an efficient means for the parallel synthesis of compounds for lead discovery. Copyright 1998 John Wiley & Sons, Inc.

  16. High-speed fan-beam reconstruction using direct two-dimensional Fourier transform method

    International Nuclear Information System (INIS)

    Niki, Noboru; Mizutani, Toshio; Takahashi, Yoshizo; Inouye, Tamon.

    1984-01-01

    Since the first development of X-ray computed tomography (CT), various efforts have been made to obtain high-quality images at high speed. However, the development of high-resolution CT and ultra-high-speed CT applicable to hearts is still desired. The X-ray beam scanning method was already changed from the parallel-beam system to the fan-beam system in order to greatly shorten the scanning time, and the direct filtered back projection (DFBP) method has been employed to directly process fan-beam projection data as a reconstruction method. Although the two-dimensional Fourier transform (TFT) method, which is significantly faster than the FBP method, was proposed, it has not been sufficiently examined for fan-beam projection data. Thus, the ITFT method was investigated, which first executes a rebinning algorithm to convert the fan-beam projection data to parallel-beam projection data and thereafter uses the two-dimensional Fourier transform. Although high speed is expected with this method, the reconstructed images might be degraded due to the adoption of the rebinning algorithm. Therefore, the effect of the interpolation error of the rebinning algorithm on the reconstructed images has been analyzed theoretically, and it is shown, by numerical and visual evaluation based on simulation and actual data, that employing spline interpolation allows the acquisition of high-quality images with fewer errors. Computation time was reduced to 1/15 for an image matrix of 512 and to 1/30 for a doubled matrix. (Wakatsuki, Y.)

  17. HPC parallel programming model for gyrokinetic MHD simulation

    International Nuclear Information System (INIS)

    Naitou, Hiroshi; Yamada, Yusuke; Tokuda, Shinji; Ishii, Yasutomo; Yagi, Masatoshi

    2011-01-01

    The 3-dimensional gyrokinetic PIC (particle-in-cell) code for MHD simulation, Gpic-MHD, was installed on SR16000 (“Plasma Simulator”), which is a scalar cluster system consisting of 8,192 logical cores. The Gpic-MHD code advances particle and field quantities in time. In order to distribute calculations over a large number of logical cores, the total simulation domain in cylindrical geometry was broken up into N_DD-r × N_DD-z (number of radial decompositions times number of axial decompositions) small domains including approximately the same number of particles. The axial direction was uniformly decomposed, while the radial direction was non-uniformly decomposed. N_RP replicas (copies) of each decomposed domain were used (“particle decomposition”). The hybrid parallelization model of multi-threads and multi-processes was employed: threads were parallelized by auto-parallelization and N_DD-r × N_DD-z × N_RP processes were parallelized by MPI (message-passing interface). The parallelization performance of Gpic-MHD was investigated for the medium-size system of an N_r × N_θ × N_z = 1025 × 128 × 128 mesh with 4.196 or 8.192 billion particles. The highest speed for a fixed number of logical cores was obtained for two threads, the maximum number of N_DD-z, and an optimum combination of N_DD-r and N_RP. The observed optimum speeds demonstrated good scaling up to 8,192 logical cores. (author)

  18. Integrated parallel reception, excitation, and shimming (iPRES).

    Science.gov (United States)

    Han, Hui; Song, Allen W; Truong, Trong-Kha

    2013-07-01

    To develop a new concept for a hardware platform that enables integrated parallel reception, excitation, and shimming. This concept uses a single coil array rather than separate arrays for parallel excitation/reception and B0 shimming. It relies on a novel design that allows a radiofrequency current (for excitation/reception) and a direct current (for B0 shimming) to coexist independently in the same coil. Proof-of-concept B0 shimming experiments were performed with a two-coil array in a phantom, whereas B0 shimming simulations were performed with a 48-coil array in the human brain. Our experiments show that individually optimized direct currents applied in each coil can reduce the B0 root-mean-square error by 62-81% and minimize distortions in echo-planar images. The simulations show that dynamic shimming with the 48-coil integrated parallel reception, excitation, and shimming array can reduce the B0 root-mean-square error in the prefrontal and temporal regions by 66-79% as compared with static second-order spherical harmonic shimming and by 12-23% as compared with dynamic shimming with a 48-coil conventional shim array. Our results demonstrate the feasibility of the integrated parallel reception, excitation, and shimming concept to perform parallel excitation/reception and B0 shimming with a unified coil system as well as its promise for in vivo applications. Copyright © 2013 Wiley Periodicals, Inc.

  19. Testing New Programming Paradigms with NAS Parallel Benchmarks

    Science.gov (United States)

    Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

    2000-01-01

    was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as the lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195MHz) with the MIPSpro-f77 compiler 7.2.1 for the OpenMP and MPI codes and the PGI pghpf-2.4.3 compiler with MPI interface for the HPF programs.

  20. Design strategies for irregularly adapting parallel applications

    International Nuclear Information System (INIS)

    Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Singh, Jaswinder Pal

    2000-01-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications, under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high-level strategies. Adaptive applications have computational work loads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementations of such adaptive applications are therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability.

  1. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    Science.gov (United States)

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
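
    To make the unit of work behind the GCUPS figures concrete, the cell-update recurrence of the Smith-Waterman algorithm is sketched below in plain, unparallelized Python. The paper's contribution is to run very many such scans concurrently at the cluster, thread, and SIMD-vector levels; the linear gap penalty and the +2/-1 match/mismatch scores here are arbitrary assumptions, not the authors' parameters.

      import numpy as np

      def sw_score(a, b, match=2, mismatch=-1, gap=-2):
          """Smith-Waterman local alignment score with a linear gap penalty."""
          H = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
          best = 0
          for i in range(1, len(a) + 1):
              for j in range(1, len(b) + 1):
                  diag = H[i - 1, j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
                  H[i, j] = max(0, diag, H[i - 1, j] + gap, H[i, j - 1] + gap)  # one "cell update"
                  best = max(best, H[i, j])
          return best

      print(sw_score("HEAGAWGHEE", "PAWHEAE"))   # GCUPS counts billions of such cell updates per second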

  2. Parallel algorithms for nuclear reactor analysis via domain decomposition method

    International Nuclear Information System (INIS)

    Kim, Yong Hee

    1995-02-01

    In this thesis, the neutron diffusion equation in reactor physics is discretized by the finite difference method and is solved on a parallel computer network composed of T-800 transputers. The T-800 transputer is a message-passing type MIMD (multiple instruction streams and multiple data streams) architecture. A parallel variant of the Schwarz alternating procedure for overlapping subdomains is developed with domain decomposition. The thesis provides convergence analysis and improvement of the convergence of the algorithm. The convergence of the parallel Schwarz algorithms with DN (or ND), DD, NN, and mixed pseudo-boundary conditions (a weighted combination of Dirichlet and Neumann conditions) is analyzed for both continuous and discrete models in the two-subdomain case, and various underlying features are explored. The analysis shows that the convergence rate of the algorithm depends strongly on the pseudo-boundary conditions, and that the theoretically best choice is the mixed boundary conditions (MM conditions). It is also shown that there may exist a significant discrepancy between the continuous model analysis and the discrete model analysis. In order to accelerate the convergence of the parallel Schwarz algorithm, relaxation of the pseudo-boundary conditions is introduced and the convergence analysis of the algorithm for the two-subdomain case is carried out. The analysis shows that under-relaxation of the pseudo-boundary conditions accelerates the convergence of the parallel Schwarz algorithm if the convergence rate without relaxation is negative, and any relaxation (under or over) decelerates convergence if the convergence rate without relaxation is positive. Numerical implementation of the parallel Schwarz algorithm on an MIMD system requires multi-level iterations: two levels for fixed source problems, three levels for eigenvalue problems. Performance of the algorithm turns out to be very sensitive to the iteration strategy. In general, multi-level iterations provide good performance when
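
    The structure being analyzed (alternating solves on two overlapping subdomains that exchange relaxed pseudo-boundary values) can be illustrated with a small one-dimensional sketch. This is a generic Dirichlet-Dirichlet (DD) example in Python/NumPy, not the thesis' transputer implementation; the grid size, source, and relaxation factor are arbitrary, and whether under- or over-relaxation pays off depends on the boundary-condition choice, as discussed above.

      import numpy as np

      def solve_dirichlet(n, h, f, left, right):
          """Direct solve of -u'' = f on n interior points with fixed end values."""
          A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
          rhs = f.copy()
          rhs[0] += left / h**2
          rhs[-1] += right / h**2
          return np.linalg.solve(A, rhs)

      # Overlapping split of [0, 1]: subdomain 1 = (0, x_b), subdomain 2 = (x_a, 1), x_a < x_b.
      N = 101
      h = 1.0 / (N - 1)
      f = np.ones(N)              # model problem: -u'' = 1, u(0) = u(1) = 0
      ia, ib = 40, 60             # grid indices of the pseudo-boundaries x_a, x_b
      omega = 0.8                 # relaxation of the pseudo-boundary values (1.0 = plain Schwarz)
      g1 = g2 = 0.0               # current pseudo-boundary values at x_b and x_a

      for it in range(50):
          u1 = solve_dirichlet(ib - 1, h, f[1:ib], left=0.0, right=g1)
          u2 = solve_dirichlet(N - ia - 2, h, f[ia + 1:N - 1], left=g2, right=0.0)
          g1 = (1 - omega) * g1 + omega * u2[ib - ia - 1]   # relaxed value of u2 at x_b
          g2 = (1 - omega) * g2 + omega * u1[ia - 1]        # relaxed value of u1 at x_a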

  3. Multilevel Parallelization of AutoDock 4.2

    Directory of Open Access Journals (Sweden)

    Norgan Andrew P

    2011-04-01

    Full Text Available Abstract Background Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Results Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Conclusions Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarckian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.

  4. Multilevel Parallelization of AutoDock 4.2.

    Science.gov (United States)

    Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P

    2011-04-28

    Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarckian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.
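
    The two ideas above that live at the MPI level, distributing independent docking jobs across ranks and reusing receptor grid maps between consecutive jobs on a node, can be sketched as follows. This is not mpAD4's actual code: mpi4py is assumed, the job distribution is a simple static split, the OpenMP (node-level) layer is omitted, and load_grid_maps/run_docking are hypothetical placeholders.

      from mpi4py import MPI

      def load_grid_maps(receptor):
          return {"receptor": receptor}            # placeholder for reading the grid map files

      def run_docking(grid_maps, ligand):
          return (grid_maps["receptor"], ligand)   # placeholder for one docking run

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()           # run with at least 2 ranks
      jobs = [("receptorA", f"lig{i}") for i in range(100)]    # (receptor, ligand) pairs

      if rank == 0:                                # rank 0 hands out jobs and collects results
          for dest in range(1, size):
              comm.send(jobs[dest - 1::size - 1], dest=dest)
          results = [r for dest in range(1, size) for r in comm.recv(source=dest)]
      else:                                        # workers cache grid maps across jobs
          cache, out = {}, []
          for receptor, ligand in comm.recv(source=0):
              if receptor not in cache:            # grid-map I/O only on a cache miss
                  cache[receptor] = load_grid_maps(receptor)
              out.append(run_docking(cache[receptor], ligand))
          comm.send(out, dest=0)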

  5. Spin-orbit torques for current parallel and perpendicular to a domain wall

    International Nuclear Information System (INIS)

    Schulz, Tomek; Lee, Kyujoon; Karnad, Gurucharan V.; Alejos, Oscar; Martinez, Eduardo; Moretti, Simone; Hals, Kjetil M. D.; Garcia, Karin; Ravelosona, Dafiné; Vila, Laurent; Lo Conte, Roberto; Kläui, Mathias; Ocker, Berthold; Brataas, Arne

    2015-01-01

    We report field- and current-induced domain wall (DW) depinning experiments in Ta\Co20Fe60B20\MgO nanowires through a Hall cross geometry. While purely field-induced depinning shows no angular dependence on in-plane fields, the effect of the current depends crucially on the internal DW structure, which we manipulate by an external magnetic in-plane field. We show depinning measurements for a current sent parallel to the DW and compare its depinning efficiency with the conventional case of current flowing perpendicularly to the DW. We find that the maximum efficiency is similar for both current directions within the error bars, which is in line with a dominating damping-like spin-orbit torque (SOT) and indicates that no large additional torques arise for currents perpendicular to the DW. Finally, we find a varying dependence of the maximum depinning efficiency angle for different DWs and pinning levels. This emphasizes the importance of our full angular scans compared with previously used measurements for just two field directions (parallel and perpendicular to the DW) to determine the real torque strength and shows the sensitivity of the SOT to the precise DW structure and pinning sites.

  6. Markovian inventory model with two parallel queues, jockeying and impatient customers

    Directory of Open Access Journals (Sweden)

    Jeganathan K.

    2016-01-01

    Full Text Available This article presents a perishable stochastic inventory system under continuous review at a service facility consisting of two parallel queues with jockeying. Each server has its own queue, and jockeying among the queues is permitted. The capacity of each queue is of finite size L. The inventory is replenished according to an (s, S) inventory policy and the replenishing times are assumed to be exponentially distributed. The individual customer is issued a demanded item after a random service time, which is distributed as negative exponential. The life time of each item is assumed to be exponential. Customers arrive according to a Poisson process and, on arrival, they join the shortest feasible queue. Moreover, if the inventory level is more than one and one queue is empty while more than one customer is waiting in the other queue, then the customer who would be served next after the customer currently in service in that queue is transferred to the empty queue. This prevents one server from being idle while customers are waiting in the other queue. Each waiting customer independently reneges (abandons the system) after an exponentially distributed amount of time. The joint probability distribution of the inventory level, the number of customers in both queues, and the status of the server is obtained in the steady state. Some important system performance measures in the steady state are derived, as is the long-run total expected cost rate.

  7. Parallel computation for distributed parameter system-from vector processors to Adena computer

    Energy Technology Data Exchange (ETDEWEB)

    Nogi, T

    1983-04-01

    Research on advanced parallel hardware and software architectures for very high-speed computation deserves and needs more support and attention to fulfil its promise. Novel architectures for parallel processing are being made ready. Architectures for parallel processing can be roughly divided into two groups. One is a vector processor in which a single central processing unit involves multiple vector-arithmetic registers. The other is a processor array in which slave processors are connected to a host processor to perform parallel computation. In this review, the concept and data structure of the Adena (alternating-direction edition nexus array) architecture, which is conformable to distributed-parameter simulation algorithms, are described. 5 references.

  8. Parallel Breadth-First Search on Distributed Memory Systems

    Energy Technology Data Exchange (ETDEWEB)

    Computational Research Division; Buluc, Aydin; Madduri, Kamesh

    2011-04-15

    Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix-partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex-based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
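
    The first of the two approaches, a level-synchronous BFS over a one-dimensional (vertex-block) partition of the graph, can be sketched as follows. This is a minimal illustration assuming mpi4py; the toy ring graph, the block-distribution rule, and the use of a generic alltoall for frontier exchange are simplifications, not the paper's tuned implementation.

      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()
      N = 16                                    # global number of vertices
      owner = lambda v: v * size // N           # 1-D block distribution: vertex -> owning rank

      # Toy ring graph; each rank stores only the adjacency of the vertices it owns.
      local_adj = {v: [(v - 1) % N, (v + 1) % N] for v in range(N) if owner(v) == rank}

      level = {v: -1 for v in local_adj}        # BFS level of each owned vertex
      frontier = [0] if 0 in local_adj else []  # start the search from vertex 0
      if frontier:
          level[0] = 0

      depth = 0
      while comm.allreduce(len(frontier), op=MPI.SUM) > 0:
          outgoing = [[] for _ in range(size)]  # bucket frontier neighbours by owning rank
          for v in frontier:
              for w in local_adj[v]:
                  outgoing[owner(w)].append(w)
          incoming = comm.alltoall(outgoing)    # one collective exchange per BFS level
          depth += 1
          frontier = []
          for w in {w for bucket in incoming for w in bucket}:
              if level[w] == -1:                # unvisited vertices form the next frontier
                  level[w] = depth
                  frontier.append(w)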

  9. A new class of massively parallel direction splitting for the incompressible Navier–Stokes equations

    KAUST Repository

    Guermond, J.L.

    2011-06-01

    We introduce in this paper a new direction splitting algorithm for solving the incompressible Navier-Stokes equations. The main originality of the method consists of using the operator (I-∂xx)(I-∂yy)(I-∂zz) for approximating the pressure correction instead of the Poisson operator as done in all the contemporary projection methods. The complexity of the proposed algorithm is significantly lower than that of projection methods, and it is shown to have the same stability properties as the Poisson-based pressure-correction techniques, either in standard or rotational form. The first-order (in time) version of the method is proved to have the same convergence properties as the classical first-order projection techniques. Numerical tests reveal that the second-order version of the method has the same convergence rate as its second-order projection counterpart as well. The method is suitable for parallel implementation and preliminary tests show excellent parallel performance on a distributed memory cluster of up to 1024 processors. The method has been validated on the three-dimensional lid-driven cavity flow using grids composed of up to 2×10^9 points. © 2011 Elsevier B.V.
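
    The factorized pressure operator is what makes the method so parallel-friendly: each factor is a one-dimensional Helmholtz-like operator, so the pressure update reduces to independent batches of tridiagonal solves, one sweep per direction. The sketch below illustrates this in two dimensions with Python/SciPy; the second-order difference stencil, the boundary treatment, and the grid sizes are simplifying assumptions, not the paper's discretization.

      import numpy as np
      from scipy.linalg import solve_banded

      def one_d_banded(n, h):
          """(I - d2/dx2) on n points, second-order differences, banded storage."""
          ab = np.zeros((3, n))
          ab[0, 1:] = -1.0 / h**2        # super-diagonal
          ab[1, :] = 1.0 + 2.0 / h**2    # main diagonal
          ab[2, :-1] = -1.0 / h**2       # sub-diagonal
          return ab

      def direction_split_solve(r, hx, hy):
          """Apply (I - dxx)(I - dyy) phi = r as two sweeps of 1-D tridiagonal solves."""
          ny, nx = r.shape
          abx, aby = one_d_banded(nx, hx), one_d_banded(ny, hy)
          w = np.stack([solve_banded((1, 1), abx, row) for row in r])        # x-sweep (rows)
          phi = np.stack([solve_banded((1, 1), aby, col) for col in w.T]).T  # y-sweep (columns)
          return phi

      phi = direction_split_solve(np.random.rand(64, 64), hx=1.0 / 64, hy=1.0 / 64)

    Because every row (and then every column) is solved independently, the sweeps distribute naturally over processors, which is the property the method exploits at scale.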

  10. The STAPL Parallel Graph Library

    KAUST Repository

    Harshvardhan,; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence

    2013-01-01

    This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable

  11. High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

    Directory of Open Access Journals (Sweden)

    H. Y. Su

    2012-04-01

    Full Text Available This article presents two highly efficient parallel realizations of the context-based adaptive variable length coding (CAVLC) based on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weakened, including the context-based data dependence, the memory access dependence and the control dependence. The CAVLC pipeline is divided into three stages: two scans, coding, and lag packing, and is implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on the multicore stream processor STORM. The other is a component-oriented SIMT parallel encoder on the massively parallel architecture GPU. Both of them exploit rich data-level parallelism. Experimental results show that, compared with the CPU version, a speedup of more than 70 times can be obtained for STORM and over 50 times for GPU. The encoder implementation on STORM achieves real-time processing for 1080p@30fps, and the GPU-based version can satisfy the requirements for 720p real-time encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms.

  12. Electromagnetic pulse coupling through an aperture into a two-parallel-plate region

    Science.gov (United States)

    Rahmat-Samii, Y.

    1978-01-01

    Analysis of electromagnetic-pulse (EMP) penetration via apertures into cavities is an important study in designing hardened systems. In this paper, an integral equation procedure is developed for determining the frequency and consequently the time behavior of the field inside a two-parallel-plate region excited through an aperture by an EMP. Some discussion of the numerical results is also included in the paper for completeness.

  13. Parallel and Multivalued Logic by the Two-Dimensional Photon-Echo Response of a Rhodamine–DNA Complex

    Science.gov (United States)

    2015-01-01

    Implementing parallel and multivalued logic operations at the molecular scale has the potential to improve the miniaturization and efficiency of a new generation of nanoscale computing devices. Two-dimensional photon-echo spectroscopy is capable of resolving dynamical pathways on electronic and vibrational molecular states. We experimentally demonstrate the implementation of molecular decision trees, logic operations where all possible values of inputs are processed in parallel and the outputs are read simultaneously, by probing the laser-induced dynamics of populations and coherences in a rhodamine dye mounted on a short DNA duplex. The inputs are provided by the bilinear interactions between the molecule and the laser pulses, and the output values are read from the two-dimensional molecular response at specific frequencies. Our results highlight how ultrafast dynamics between multiple molecular states induced by light–matter interactions can be used as an advantage for performing complex logic operations in parallel, operations that are faster than electrical switching. PMID:25984269

  14. Start-up flow in a three-dimensional lid-driven cavity by means of a massively parallel direction splitting algorithm

    KAUST Repository

    Guermond, J. L.

    2011-05-04

    The purpose of this paper is to validate a new highly parallelizable direction splitting algorithm. The parallelization capabilities of this algorithm are illustrated by providing a highly accurate solution for the start-up flow in a three-dimensional impulsively started lid-driven cavity of aspect ratio 1×1×2 at Reynolds numbers 1000 and 5000. The computations are done in parallel (up to 1024 processors) on adapted grids of up to 2 billion nodes in three space dimensions. Velocity profiles are given at dimensionless times t=4, 8, and 12; at least four digits are expected to be correct at Re=1000. © 2011 John Wiley & Sons, Ltd.

  15. Combined spatial/angular domain decomposition SN algorithms for shared memory parallel machines

    International Nuclear Information System (INIS)

    Hunter, M.A.; Haghighat, A.

    1993-01-01

    Several parallel processing algorithms on the basis of spatial and angular domain decomposition methods are developed and incorporated into a two-dimensional discrete ordinates transport theory code. These algorithms divide the spatial and angular domains into independent subdomains so that the flux calculations within each subdomain can be processed simultaneously. Two spatial parallel algorithms (Block-Jacobi, red-black), one angular parallel algorithm (η-level), and their combinations are implemented on an eight processor CRAY Y-MP. Parallel performances of the algorithms are measured using a series of fixed source RZ geometry problems. Some of the results are also compared with those executed on an IBM 3090/600J machine. (orig.)

  16. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.

    2017-03-13

    Deep learning is based on a set of algorithms that attempt to model high-level abstractions in data. Specifically, RBM is a deep learning algorithm used in this project; its time performance is increased through an efficient parallel implementation with the OpenACC tool and the best possible optimizations, in order to harness the massively parallel power of NVIDIA GPUs. GPU development in the last few years has contributed to the growth of deep learning. OpenACC is a directive-based approach to computing in which directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Machine is a stochastic neural network that essentially performs a binary version of factor analysis. RBM is a useful neural network basis for larger modern deep learning models, such as the Deep Belief Network. RBM parameters are estimated using an efficient training method called Contrastive Divergence. Parallel implementations of RBM are available using different models such as OpenMP and CUDA, but this project is the first attempt to apply the OpenACC model to RBM.

  17. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.; Khan, Ayaz H.

    2017-01-01

    Deep learning is based on a set of algorithms that attempt to model high-level abstractions in data. Specifically, RBM is a deep learning algorithm used in this project; its time performance is increased through an efficient parallel implementation with the OpenACC tool and the best possible optimizations, in order to harness the massively parallel power of NVIDIA GPUs. GPU development in the last few years has contributed to the growth of deep learning. OpenACC is a directive-based approach to computing in which directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Machine is a stochastic neural network that essentially performs a binary version of factor analysis. RBM is a useful neural network basis for larger modern deep learning models, such as the Deep Belief Network. RBM parameters are estimated using an efficient training method called Contrastive Divergence. Parallel implementations of RBM are available using different models such as OpenMP and CUDA, but this project is the first attempt to apply the OpenACC model to RBM.

  18. PDDP, A Data Parallel Programming Model

    Directory of Open Access Journals (Sweden)

    Karen H. Warren

    1996-01-01

    Full Text Available PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements High Performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates code that is portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.

  19. Evaluation of Circulating Current Suppression Methods for Parallel Interleaved Inverters

    DEFF Research Database (Denmark)

    Gohil, Ghanshyamsinh Vijaysinh; Bede, Lorand; Teodorescu, Remus

    2016-01-01

    Two-level Voltage Source Converters (VSCs) are often connected in parallel to achieve the desired current rating in a multi-megawatt Wind Energy Conversion System (WECS). A multi-level converter can be realized by interleaving the carrier signals of the parallel VSCs. As a result, the harmonic performance of the WECS can be significantly improved. However, the interleaving of the carrier signals may lead to the flow of circulating current between the parallel VSCs, and it is highly desirable to avoid/suppress this unwanted circulating current. A comparative evaluation of the different methods to avoid/suppress the circulating current between the parallel interleaved VSCs is presented in this paper. The losses and the volume of the inductive components and the semiconductor losses are evaluated for the WECS with different circulating current suppression methods. Multi-objective optimizations of the inductive components...

  20. Strong nonlinearity-induced correlations for counterpropagating photons scattering on a two-level emitter

    DEFF Research Database (Denmark)

    Nysteen, Anders; McCutcheon, Dara; Mørk, Jesper

    2015-01-01

    We analytically treat the scattering of two counterpropagating photons on a two-level emitter embedded in an optical waveguide. We find that the nonlinearity of the emitter can give rise to significant pulse-dependent directional correlations in the scattered photonic state, which could...

  1. Intellectual Property Rights, Parallel Imports and Strategic Behavior

    OpenAIRE

    Maskus, Keith E.; Ganslandt, Mattias

    2007-01-01

    The existence of parallel imports (PI) raises a number of interesting policy and strategic questions, which are the subject of this survey article. For example, parallel trade is essentially arbitrage within policy-integrated markets of IPR-protected goods, which may have different prices across countries. Thus, we fully analyze two types of price differences that give rise to such arbitrage. First is simple retail-level trade in horizontal markets because consumer prices may differ. Second i...

  2. Direct measurement of human plasma corticotropin-releasing hormone by two-site immunoradiometric assay

    International Nuclear Information System (INIS)

    Linton, E.A.; McLean, C.; Nieuwenhuyzen Kruseman, A.C.; Tilders, F.J.; Van der Veen, E.A.; Lowry, P.J.

    1987-01-01

    A "two-site" immunoradiometric assay (IRMA) which allows the direct estimation of human CRH (hCRH) in plasma is described. Using this IRMA, basal levels of CRH in normal subjects ranged from 2-28 pg/mL [mean, 15 +/- 7 (+/- SD) pg/mL; n = 58]. Values in men and women were similar. Plasma CRH values within this range were also found in patients with Cushing's syndrome, Addison's disease, and Nelson's syndrome, with no correlation between plasma CRH and ACTH levels in these patients. Elevated plasma CRH levels were found in pregnant women near term [1462 +/- 752 (+/- SD) pg/mL; n = 55], and the dilution curve of this CRH-like immunoreactivity paralleled the IRMA standard curve. After its immunoadsorption from maternal plasma, this CRH-like material eluted on reverse phase high performance liquid chromatography with a retention time identical to that of synthetic CRH and had equipotent bioactivity with the synthetic peptide in the perfused anterior pituitary cell bioassay. Circulating CRH was not detected in Wistar rats, even after adrenalectomy and subsequent ether stress. Synthetic hCRH was degraded by fresh human plasma relatively slowly; 65% of added CRH remained after 1 h of incubation at 37 C. Degradation was inhibited by heat treatment (54 C; 1 h), cold treatment (4 C; 4 h), or freezing and thawing. Loss of synthetic rat CRH occurred more rapidly when fresh rat plasma was used; only 20% of added CRH remained under the same conditions. The inability to measure CRH in peripheral rat plasma may be due to the presence of active CRH-degrading enzymes which fragment the CRH molecule into forms not recognized by the CRH IRMA.

  3. High-throughput fabrication of micrometer-sized compound parabolic mirror arrays by using parallel laser direct-write processing

    International Nuclear Information System (INIS)

    Yan, Wensheng; Gu, Min; Cumming, Benjamin P

    2015-01-01

    Micrometer-sized parabolic mirror arrays have significant applications in both light-emitting diodes and solar cells. However, low fabrication throughput has been identified as a major obstacle to large-scale application of such mirror arrays, owing to the serial nature of the conventional method. Here, the mirror arrays are fabricated using parallel laser direct-write processing, which addresses this barrier. In addition, it is demonstrated that the parallel writing is able to fabricate complex arrays besides simple arrays and thus offers wider applications. Optical measurements show that each single mirror confines the full-width at half-maximum value to as small as 17.8 μm at a height of 150 μm whilst providing a transmittance of up to 68.3% at a wavelength of 633 nm, in good agreement with the calculated values. (paper)

  4. A Parallel Computational Model for Multichannel Phase Unwrapping Problem

    Science.gov (United States)

    Imperatore, Pasquale; Pepe, Antonio; Lanari, Riccardo

    2015-05-01

    In this paper, a parallel model for the solution of the computationally intensive multichannel phase unwrapping (MCh-PhU) problem is proposed. Firstly, the Extended Minimum Cost Flow (EMCF) algorithm for solving the MCh-PhU problem is revised within the rigorous mathematical framework of discrete calculus, thus permitting its topological structure to be captured in terms of meaningful discrete differential operators. Secondly, emphasis is placed on the methodological and practical aspects that lead to a parallel reformulation of the EMCF algorithm. Thus, a novel dual-level parallel computational model, in which the parallelism is hierarchically implemented at two different (i.e., process and thread) levels, is presented. The validity of our approach has been demonstrated through a series of experiments that have revealed a significant speedup. Therefore, the attained high-performance prototype is suitable for the solution of large-scale phase unwrapping problems in reasonable time frames, with a significant impact on the systematic exploitation of the existing, and rapidly growing, large archives of SAR data.

  5. Automatic Management of Parallel and Distributed System Resources

    Science.gov (United States)

    Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

    1990-01-01

    Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.

  6. Parallelization and automatic data distribution for nuclear reactor simulations

    Energy Technology Data Exchange (ETDEWEB)

    Liebrock, L.M. [Liebrock-Hicks Research, Calumet, MI (United States)

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  7. Parallelization and automatic data distribution for nuclear reactor simulations

    International Nuclear Information System (INIS)

    Liebrock, L.M.

    1997-01-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed

  8. System-Enforced Deterministic Streaming for Efficient Pipeline Parallelism

    Institute of Scientific and Technical Information of China (English)

    张昱; 李兆鹏; 曹慧芳

    2015-01-01

    Pipeline parallelism is a popular parallel programming pattern for emerging applications. However, programming pipelines directly on conventional multithreaded shared memory is difficult and error-prone. We present DStream, a C library that provides high-level abstractions of deterministic threads and streams for simply representing pipeline stage workers and their communications. The deterministic stream is established atop our proposed single-producer/multi-consumer (SPMC) virtual memory, which integrates synchronization with the virtual memory model to enforce determinism on shared memory accesses. We investigate various strategies on how to efficiently implement DStream atop the SPMC memory, so that an infinite sequence of data items can be asynchronously published (fixed) and asynchronously consumed in order among adjacent stage workers. We have successfully transformed two representative pipeline applications – ferret and dedup – using DStream, and conclude conversion rules. An empirical evaluation shows that the converted ferret performed on par with its Pthreads and TBB counterparts in terms of running time, while the converted dedup is close to 2.56X and 7.05X faster than the Pthreads counterpart and 1.06X and 3.9X faster than the TBB counterpart on 16 and 32 CPUs, respectively.
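
    The programming pattern being abstracted, stage workers connected by ordered streams, can be sketched in a few lines of plain Python, with bounded queues standing in for the streams. This only illustrates the pipeline-parallel structure; it does not reproduce DStream's C API, its SPMC virtual memory, or its determinism guarantees, and the stage functions are arbitrary placeholders.

      import threading, queue

      def stage(fn, inbox, outbox):
          """A pipeline stage worker: consume items in order, publish results downstream."""
          while True:
              item = inbox.get()
              if item is None:            # end-of-stream marker
                  outbox.put(None)
                  break
              outbox.put(fn(item))

      q0, q1, q2 = (queue.Queue(maxsize=64) for _ in range(3))   # bounded "streams"
      workers = [threading.Thread(target=stage, args=(fn, i, o))
                 for fn, i, o in [(lambda x: x * 2, q0, q1),     # stage 1
                                  (lambda x: x + 1, q1, q2)]]    # stage 2
      for t in workers:
          t.start()

      for item in range(10):              # producer feeds the first stage
          q0.put(item)
      q0.put(None)

      results = []
      while (out := q2.get()) is not None:   # consumer drains the last stage
          results.append(out)
      for t in workers:
          t.join()
      print(results)                      # [1, 3, 5, ..., 19]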

  9. Incorrectness of conventional one-dimensional parallel thermal resistance circuit model for two-dimensional circular composite pipes

    International Nuclear Information System (INIS)

    Wong, K.-L.; Hsien, T.-L.; Chen, W.-L.; Yu, S.-J.

    2008-01-01

    This study shows that two-dimensional steady-state heat transfer problems of composite circular pipes cannot be appropriately solved by the conventional one-dimensional parallel thermal resistance circuits (PTRC) model because its interface temperatures are not unique. Thus, the PTRC model is fundamentally different from its conventionally recognized analogue, the parallel electrical resistance circuits (PERC) model, which has unique node voltages. Two typical composite circular pipe examples are solved by CFD software, and the numerical results are compared with those obtained by the PTRC model. This shows that the PTRC model generates large errors. Thus, this conventional model, introduced in most heat transfer textbooks, cannot be applied to two-dimensional composite circular pipes. In contrast, an alternative one-dimensional separately series thermal resistance circuit (SSTRC) model is proposed and applied to a two-dimensional composite circular pipe with isothermal boundaries, and acceptable results are obtained.

  10. Structural Directed Growth of Ultrathin Parallel Birnessite on β-MnO2 for High-Performance Asymmetric Supercapacitors.

    Science.gov (United States)

    Zhu, Shijin; Li, Li; Liu, Jiabin; Wang, Hongtao; Wang, Tian; Zhang, Yuxin; Zhang, Lili; Ruoff, Rodney S; Dong, Fan

    2018-02-27

    Two-dimensional birnessite has attracted attention for electrochemical energy storage because of the presence of redox-active Mn4+/Mn3+ ions and spacious interlayer channels available for ion diffusion. However, current strategies are largely limited to enhancing the electrical conductivity of birnessite. One key limitation affecting the electrochemical properties of birnessite is the poor utilization of the MnO6 unit. Here, we assemble a β-MnO2/birnessite core-shell structure that exploits the exposed crystal face of β-MnO2 as the core and ultrathin birnessite sheets whose structure helps enhance the utilization efficiency of Mn from the bulk. Our birnessite, whose sheets are parallel to each other, is found to have an unusual crystal structure, with interlayer spacing, Mn(III)/Mn(IV) ratio and balancing-cation content differing from those of common birnessite. The substrate-directed growth mechanism is carefully investigated. The as-prepared core-shell nanostructures enhance the exposed surface area of birnessite and achieve high electrochemical performance (for example, 657 F g^-1 in 1 M Na2SO4 electrolyte based on the weight of parallel birnessite) and excellent rate capability over a potential window of up to 1.2 V. This strategy opens avenues for fundamental studies of birnessite and its properties and suggests the possibility of its use in energy storage and other applications. The potential window of an asymmetric supercapacitor that was assembled with this material can be enlarged to 2.2 V (in aqueous electrolyte) with good cycling ability.

  11. Characterizing information propagation through inter-vehicle communication on a simple network of two parallel roads

    Science.gov (United States)

    2010-10-01

    In this report, we study information propagation via inter-vehicle communication along two parallel roads. By identifying an inherent Bernoulli process, we are able to derive the mean and variance of propagation distance. A road separation distan...

  12. Parallel Atomistic Simulations

    Energy Technology Data Exchange (ETDEWEB)

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories: those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.

  13. Numerical investigation of two interacting parallel thruster-plumes and comparison to experiment

    Science.gov (United States)

    Grabe, Martin; Holz, André; Ziegenhagen, Stefan; Hannemann, Klaus

    2014-12-01

    Clusters of orbital thrusters are an attractive option to achieve graduated thrust levels and increased redundancy with available hardware, but the heavily under-expanded plumes of chemical attitude control thrusters placed in close proximity will interact, leading to a local amplification of downstream fluxes and of back-flow onto the spacecraft. The interaction of two similar, parallel, axi-symmetric cold-gas model thrusters has recently been studied in the DLR High-Vacuum Plume Test Facility STG under space-like vacuum conditions, employing a Patterson-type impact pressure probe with slot orifice. We reproduce a selection of these experiments numerically, and emphasise that a comparison of numerical results to the measured data is not straightforward. The signal of the probe used in the experiments must be interpreted according to the degree of rarefaction and local flow Mach number, and both vary dramatically throughout the flow-field. We present a procedure to reconstruct the probe signal by post-processing the numerically obtained flow-field data and show that agreement with the experimental results is then improved. Features of the investigated cold-gas thruster plume interaction are discussed on the basis of the numerical results.

  14. Spin-orbit torques for current parallel and perpendicular to a domain wall

    Energy Technology Data Exchange (ETDEWEB)

    Schulz, Tomek; Lee, Kyujoon; Karnad, Gurucharan V. [Institut für Physik, Johannes Gutenberg-Universität Mainz, Staudinger Weg 7, 55128 Mainz (Germany); Alejos, Oscar [Departamento de Electricidad y Electrónica, Universidad de Valladolid, Paseo de Belen, 7, E-47011 Valladolid (Spain); Martinez, Eduardo; Moretti, Simone [Departamento Fisica Aplicada, Universidad de Salamanca, Plaza de los Caidos s/n, E-38008 Salamanca (Spain); Hals, Kjetil M. D. [Niels Bohr International Academy and the Center for Quantum Devices, Niels Bohr Institute, University of Copenhagen, 2100 Copenhagen (Denmark); Garcia, Karin; Ravelosona, Dafiné [Institut d' Electronique Fondamentale, UMR CNRS 8622, Université Paris Sud, 91405 Orsay Cedex (France); Vila, Laurent [Institut Nanosciences et Cryogénie, Université Grenoble Alpes, F-38000 Grenoble (France); Institut Nanosciences et Cryogénie, CEA, F-38000 Grenoble (France); Lo Conte, Roberto; Kläui, Mathias [Institut für Physik, Johannes Gutenberg-Universität Mainz, Staudinger Weg 7, 55128 Mainz (Germany); Graduate School of Excellence “Materials Science in Mainz” (MAINZ), Staudinger Weg 9, 55128 Mainz (Germany); Ocker, Berthold [Singulus Technologies AG, 63796 Kahl am Main (Germany); Brataas, Arne [Department of Physics, Norwegian University of Science and Technology, NO-7491 Trondheim (Norway)

    2015-09-21

    We report field- and current-induced domain wall (DW) depinning experiments in Ta\Co20Fe60B20\MgO nanowires through a Hall cross geometry. While purely field-induced depinning shows no angular dependence on in-plane fields, the effect of the current depends crucially on the internal DW structure, which we manipulate by an external magnetic in-plane field. We show depinning measurements for a current sent parallel to the DW and compare its depinning efficiency with the conventional case of current flowing perpendicularly to the DW. We find that the maximum efficiency is similar for both current directions within the error bars, which is in line with a dominating damping-like spin-orbit torque (SOT) and indicates that no large additional torques arise for currents perpendicular to the DW. Finally, we find a varying dependence of the maximum depinning efficiency angle for different DWs and pinning levels. This emphasizes the importance of our full angular scans compared with previously used measurements for just two field directions (parallel and perpendicular to the DW) to determine the real torque strength and shows the sensitivity of the SOT to the precise DW structure and pinning sites.

  15. Parallel sites implicate functional convergence of the hearing gene prestin among echolocating mammals.

    Science.gov (United States)

    Liu, Zhen; Qi, Fei-Yan; Zhou, Xin; Ren, Hai-Qing; Shi, Peng

    2014-09-01

    Echolocation is a sensory system whereby certain mammals navigate and forage using sound waves, usually in environments where visibility is limited. Curiously, echolocation has evolved independently in bats and whales, which occupy entirely different environments. Based on this phenotypic convergence, recent studies identified several echolocation-related genes with parallel sites at the protein sequence level among different echolocating mammals, and among these, prestin seems the most promising. Although previous studies analyzed the evolutionary mechanism of prestin, the functional roles of the parallel sites in the evolution of mammalian echolocation are not clear. By functional assays, we show that a key parameter of prestin function, 1/α, is increased in all echolocating mammals and that the N7T parallel substitution accounted for this functional convergence. Moreover, another parameter, V1/2, was shifted toward the depolarization direction in a toothed whale, the bottlenose dolphin (Tursiops truncatus) and a constant-frequency (CF) bat, the Stoliczka's trident bat (Aselliscus stoliczkanus). The parallel site of I384T between toothed whales and CF bats was responsible for this functional convergence. Furthermore, the two parameters (1/α and V1/2) were correlated with mammalian high-frequency hearing, suggesting that the convergent changes of the prestin function in echolocating mammals may play important roles in mammalian echolocation. To our knowledge, these findings present the functional patterns of echolocation-related genes in echolocating mammals for the first time and rigorously demonstrate adaptive parallel evolution at the protein sequence level, paving the way to insights into the molecular mechanism underlying mammalian echolocation. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  16. Parallel MR imaging.

    Science.gov (United States)

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole

    2012-07-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.
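
    The undersampling/aliasing relationship described above is easy to see in a toy example: discarding every other k-space line halves the nominal acquisition but folds the image onto itself, which is exactly the artifact SENSE- and GRAPPA-type reconstructions use coil sensitivity information to undo. The sketch below uses Python/NumPy on a synthetic phantom; it only illustrates the aliasing effect, not a parallel imaging reconstruction.

      import numpy as np

      img = np.zeros((128, 128))
      img[40:90, 30:100] = 1.0                         # simple rectangular phantom
      kspace = np.fft.fftshift(np.fft.fft2(img))       # fully sampled k-space

      under = np.zeros_like(kspace)
      under[::2, :] = kspace[::2, :]                   # keep every other phase-encode (ky) line

      aliased = np.abs(np.fft.ifft2(np.fft.ifftshift(under)))
      # 'aliased' contains two half-intensity copies of the phantom folded on top of
      # each other along the undersampled direction; parallel imaging resolves this
      # folding using the distinct spatial sensitivities of the receiver coils.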

  17. Study Behaviors and USMLE Step 1 Performance: Implications of a Student Self-Directed Parallel Curriculum.

    Science.gov (United States)

    Burk-Rafel, Jesse; Santen, Sally A; Purkiss, Joel

    2017-11-01

    To determine medical students' study behaviors when preparing for the United States Medical Licensing Examination (USMLE) Step 1, and how these behaviors are associated with Step 1 scores when controlling for likely covariates. The authors distributed a study-behaviors survey in 2014 and 2015 at their institution to two cohorts of medical students who had recently taken Step 1. Demographic and academic data were linked to responses. Descriptive statistics, bivariate correlations, and multiple linear regression analyses were performed. Of 332 medical students, 274 (82.5%) participated. Most students (n = 211; 77.0%) began studying for Step 1 during their preclinical curriculum, increasing their intensity during a protected study period during which they averaged 11.0 hours studying per day (standard deviation [SD] 2.1) over a period of 35.3 days (SD 6.2). Students used numerous third-party resources, including reading an exam-specific 700-page review book on average 2.1 times (SD 0.8) and completing an average of 3,597 practice multiple-choice questions (SD 1,611). Initiating study prior to the designated study period, increased review book usage, and attempting more practice questions were all associated with higher Step 1 scores, even when controlling for Medical College Admission Test scores, preclinical exam performance, and self-identified score goal (adjusted R = 0.56, P < .001). Medical students at one public institution engaged in a self-directed, "parallel" Step 1 curriculum using third-party study resources. Several study behaviors were associated with improved USMLE Step 1 performance, informing both institutional- and student-directed preparation for this high-stakes exam.

  18. The inaccuracy of conventional one-dimensional parallel thermal resistance circuit model for two-dimensional composite walls

    International Nuclear Information System (INIS)

    Wong, K.-L.; Hsien, T.-L.; Hsiao, M.-C.; Chen, W.-L.; Lin, K.-C.

    2008-01-01

    This investigation shows that two-dimensional steady-state heat transfer problems of composite walls should not be solved by the conventional one-dimensional parallel thermal resistance circuits (PTRC) model because the interface temperatures are not unique. Thus, the PTRC model cannot be used like its conventionally recognized analogue, the parallel electrical resistance circuits (PERC) model, which has unique node voltages. Two typical composite wall examples, solved by CFD software, are used to demonstrate the incorrectness. The numerical results are compared with those obtained by the PTRC model, and very large differences are observed between their results. This proves that the application of the conventional heat transfer PTRC model to two-dimensional composite walls, introduced in most heat transfer textbooks, is incorrect. An alternative one-dimensional separately series thermal resistance circuit (SSTRC) model is proposed and applied to the two-dimensional composite walls with isothermal boundaries. Results with acceptable accuracy can be obtained by the new model.

  19. Quantification of the level descriptors for the standard EQ-5D three-level system and a five-level version according to two methods

    NARCIS (Netherlands)

    M.F. Janssen (Bas); E. Birnie (Erwin); G.J. Bonsel (Gouke)

    2008-01-01

    Objectives: Our aim was to compare the quantitative position of the level descriptors of the standard EQ-5D three-level system (3L) and a newly developed, experimental five-level version (5L) using a direct and a vignette-based indirect method. Methods: Eighty-two respondents took part

  20. Heat transfer analysis of GO-water nanofluid flow between two parallel disks

    Directory of Open Access Journals (Sweden)

    M. Azimi

    2015-03-01

    In this paper, the unsteady magnetohydrodynamic (MHD) squeezing flow between two parallel disks filled with nanofluid is considered. The Galerkin optimal homotopy asymptotic method (GOHAM) is used to obtain the solution of the governing equations. The effects of the Hartmann number, nanoparticle volume fraction, Brownian motion parameter, and suction/blowing parameter on the nanofluid concentration, temperature, and velocity profiles are discussed. Furthermore, a comparison between the obtained solutions and numerical ones is provided.

  1. Two-level method with coarse space size independent convergence

    Energy Technology Data Exchange (ETDEWEB)

    Vanek, P.; Brezina, M. [Univ. of Colorado, Denver, CO (United States); Tezaur, R.; Krizkova, J. [UWB, Plzen (Czech Republic)

    1996-12-31

    The basic disadvantage of the standard two-level method is the strong dependence of its convergence rate on the size of the coarse-level problem. In order to obtain the optimal convergence result, one is limited to using a coarse space which is only a few times smaller than the fine-level one. Consequently, the asymptotic cost of the resulting method is the same as when a coarse-level solver is applied to the original problem. Today's two-level domain decomposition methods typically offer an improvement by yielding a rate of convergence which depends only polylogarithmically on the ratio of the fine and coarse levels. However, these methods require local subdomain solvers, for which the straightforward application of iterative methods is problematic, while the usual application of direct solvers is expensive. We suggest a method that significantly diminishes these difficulties.
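
    To make the coarse-space trade-off concrete, the sketch below runs a generic multiplicative two-level cycle (damped-Jacobi smoothing plus an exact coarse-grid correction through a piecewise-constant aggregation prolongator) on a 1-D Poisson matrix. It is an editorial illustration of a standard two-level method under assumed parameters, not the method proposed by the authors.

      import numpy as np

      def two_level_cycle(A, P, b, x, omega=2.0 / 3.0, nu=2):
          """One multiplicative two-level cycle: damped-Jacobi smoothing,
          exact coarse-grid correction through the prolongator P, smoothing again."""
          Dinv = 1.0 / np.diag(A)
          Ac = P.T @ A @ P                         # Galerkin coarse-level operator
          for _ in range(nu):                      # pre-smoothing
              x = x + omega * Dinv * (b - A @ x)
          r = b - A @ x                            # coarse-grid correction
          x = x + P @ np.linalg.solve(Ac, P.T @ r)
          for _ in range(nu):                      # post-smoothing
              x = x + omega * Dinv * (b - A @ x)
          return x

      # Illustrative fine-level problem: 1-D Poisson matrix
      n = 64
      A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

      # Piecewise-constant aggregation: every 4 fine unknowns form 1 coarse unknown,
      # so the coarse space is 4x smaller than the fine-level one.
      P = np.zeros((n, n // 4))
      for i in range(n):
          P[i, i // 4] = 1.0

      b = np.ones(n)
      x = np.zeros(n)
      for it in range(100):
          x = two_level_cycle(A, P, b, x)
          res = np.linalg.norm(b - A @ x)
          if res < 1e-8:
              break
      print("two-level iterations:", it + 1, "residual:", res)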

  2. Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project.

    Science.gov (United States)

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L

    2012-06-13

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
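
    The multi-level strategy described above can be imitated in a few lines with mpi4py: the ranks are split into groups over blocks of Kohn-Sham states and, inside each group, over real-space domains. This is a hedged, generic sketch (Octopus itself is Fortran, and the communicator layout and group counts below are assumptions, not its actual implementation); run it with, e.g., mpiexec -n 4 python script.py.

      from mpi4py import MPI
      import numpy as np

      world = MPI.COMM_WORLD
      rank, size = world.Get_rank(), world.Get_size()

      # Assumed layout: size = n_state_groups * n_domains (falls back to 1 group).
      n_state_groups = 2 if size % 2 == 0 else 1
      n_domains = size // n_state_groups

      state_block = rank // n_domains          # which block of Kohn-Sham states I own
      domain_id = rank % n_domains             # which real-space sub-domain I own

      # Ranks that share a state block but own different sub-domains:
      domain_comm = world.Split(color=state_block, key=domain_id)
      # Ranks that own the same sub-domain but different state blocks:
      state_comm = world.Split(color=domain_id, key=state_block)

      # Toy "density" contribution from my block of states on my sub-domain.
      local_density = np.full(1000, float(state_block + 1))

      # Total density on my sub-domain = sum of contributions over all state blocks.
      density = np.empty_like(local_density)
      state_comm.Allreduce(local_density, density, op=MPI.SUM)

      if rank == 0:
          print("state groups:", n_state_groups, "domains per group:", n_domains)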

  3. Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

    Science.gov (United States)

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.

    2012-06-01

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  4. Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

    International Nuclear Information System (INIS)

    Andrade, Xavier; Aspuru-Guzik, Alán; Alberdi-Rodriguez, Joseba; Rubio, Angel; Strubbe, David A; Louie, Steven G; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Marques, Miguel A L

    2012-01-01

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures. (topical review)

  5. Influence of Paralleling Dies and Paralleling Half-Bridges on Transient Current Distribution in Multichip Power Modules

    DEFF Research Database (Denmark)

    Li, Helong; Zhou, Wei; Wang, Xiongfei

    2018-01-01

    This paper addresses the transient current distribution in the multichip half-bridge power modules, where two types of paralleling connections with different current commutation mechanisms are considered: paralleling dies and paralleling half-bridges. It reveals that with paralleling dies, both t...

  6. Determination of the onset nonlinearity hydrodynamic characteristics at two-phase flow in parallel vertical channels

    International Nuclear Information System (INIS)

    Jovic, V.; Afgan, N.; Jovic, L.; Spasojevic, D.

    1993-01-01

    The paper presents results of experimental and theoretical analyses of the linear and nonlinear characteristics of adiabatic two-phase water-air flow in vertical parallel channels. The conditions for changes in regime character and for the transition from linear to nonlinear dynamic characteristics were defined. (author)

  7. The STAPL Parallel Graph Library

    KAUST Repository

    Harshvardhan,

    2013-01-01

    This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraph pViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges. © Springer-Verlag Berlin Heidelberg 2013.
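
    As a minimal illustration of the level-synchronous paradigm mentioned above (not stapl code, which is C++), the Python sketch below expands one BFS frontier per level; in a parallel setting, the per-level loop over the frontier is what gets distributed.

      def level_synchronous_bfs(adj, source):
          """Level-synchronous BFS: process the whole frontier of one level before
          moving to the next; each level's frontier could be expanded in parallel."""
          level = {source: 0}
          frontier = [source]
          depth = 0
          while frontier:
              next_frontier = []
              for u in frontier:               # in a parallel setting, split this loop
                  for v in adj[u]:
                      if v not in level:
                          level[v] = depth + 1
                          next_frontier.append(v)
              frontier = next_frontier
              depth += 1
          return level

      # Small example graph given as an adjacency list.
      adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
      print(level_synchronous_bfs(adj, 0))   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}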

  8. PLAST: parallel local alignment search tool for database comparison

    Directory of Open Access Journals (Sweden)

    Lavenier Dominique

    2009-10-01

    Background: Sequence similarity searching is an important and challenging task in molecular biology, and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four, and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm suited to the parallel capabilities of modern microprocessors. Results: A parallel algorithm for comparing large genomic banks and targeting middle-range computers has been developed and implemented in the PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and the multithreading concept (multicore). Compared to multithreaded BLAST software, tests performed on an 8-processor server have shown speedups ranging from 3 to 6 with a similar level of accuracy. Conclusion: A parallel algorithmic approach driven by knowledge of the internal microprocessor architecture allows significant speedup to be obtained while preserving standard sensitivity for similarity search problems.
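
    The two levels of parallelism exploited by PLAST can be caricatured in Python: vectorized scoring over integer-encoded sequences stands in for the SIMD/SSE level, and a process pool over chunks of the bank stands in for the multicore level. The scoring function below is a deliberately simplified identity count, not PLAST's seed-based algorithm, and all sequences are made up.

      import numpy as np
      from concurrent.futures import ProcessPoolExecutor

      # Encode sequences as integer arrays so comparisons can be vectorized
      # (a crude stand-in for the SIMD level of parallelism used by PLAST).
      ALPHABET = {c: i for i, c in enumerate("ACGT")}

      def encode(seq):
          return np.array([ALPHABET[c] for c in seq], dtype=np.int8)

      def identity_score(query, subject):
          """Simplified score: number of matching positions over the overlap."""
          n = min(len(query), len(subject))
          return int(np.count_nonzero(query[:n] == subject[:n]))

      def score_chunk(args):
          query, chunk = args
          return [identity_score(query, s) for s in chunk]

      if __name__ == "__main__":
          query = encode("ACGTACGTAC")
          bank = [encode("ACGTTCGTAC"), encode("TTTTACGTAC"), encode("ACGTACGAAC")]

          # Multicore level of parallelism: split the bank across worker processes.
          n_workers = 2
          chunks = [bank[i::n_workers] for i in range(n_workers)]
          with ProcessPoolExecutor(max_workers=n_workers) as pool:
              results = list(pool.map(score_chunk, [(query, c) for c in chunks]))
          print(results)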

  9. Parallel hierarchical global illumination

    Energy Technology Data Exchange (ETDEWEB)

    Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  10. Quantification of the level descriptors for the standard EQ-5D three-level system and a five-level version according to two methods

    NARCIS (Netherlands)

    Janssen, M. F.; Birnie, E.; Bonsel, G. J.

    2008-01-01

    OBJECTIVES: Our aim was to compare the quantitative position of the level descriptors of the standard EQ-5D three-level system (3L) and a newly developed, experimental five-level version (5L) using a direct and a vignette-based indirect method. METHODS: Eighty-two respondents took part in the study.

  11. Parallelized event chain algorithm for dense hard sphere and polymer systems

    International Nuclear Information System (INIS)

    Kampmann, Tobias A.; Boltz, Horst-Holger; Kierfeld, Jan

    2015-01-01

    We combine parallelization and cluster Monte Carlo for hard sphere systems and present a parallelized event chain algorithm for the hard disk system in two dimensions. For parallelization we use a spatial partitioning approach into simulation cells. We find that it is crucial for correctness to ensure detailed balance on the level of Monte Carlo sweeps by drawing the starting sphere of event chains within each simulation cell with replacement. We analyze the performance gains for the parallelized event chain and find a criterion for an optimal degree of parallelization. Because of the cluster nature of event chain moves, massive parallelization will not be optimal. Finally, we discuss first applications of the event chain algorithm to dense polymer systems, i.e., bundle-forming solutions of attractive semiflexible polymers.
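
    A minimal serial sketch of one event chain for hard disks, assuming a dilute periodic configuration and chains along +x only, is given below; the comment inside marks the point where the parallel version draws the starting disk with replacement inside each simulation cell, as discussed above. Parameters and the initial lattice are illustrative assumptions.

      import numpy as np

      rng = np.random.default_rng(0)
      L, sigma = 10.0, 1.0                      # box size and disk diameter (illustrative)
      xs, ys = np.meshgrid(np.arange(5) * 2.0 + 0.5, np.arange(4) * 2.5 + 0.5)
      pos = np.column_stack([xs.ravel(), ys.ravel()])      # 20 non-overlapping disks

      def event_chain_x(pos, chain_length):
          """One event chain along +x: the moving disk slides until it touches another
          disk, which then becomes the moving disk, until the displacement budget is
          spent. In the parallelized algorithm one such chain is run per simulation
          cell, and the starting disk of each chain is drawn *with replacement*
          inside its cell to keep detailed balance on the level of sweeps."""
          i = rng.integers(len(pos))            # starting disk, drawn with replacement
          budget = chain_length
          while budget > 0.0:
              dx = (pos[:, 0] - pos[i, 0]) % L              # forward distance along +x
              dy = pos[:, 1] - pos[i, 1]
              dy -= L * np.round(dy / L)                    # minimum image in y
              cand = np.abs(dy) < sigma                     # only these disks can be hit
              cand[i] = False
              s = np.full(len(pos), np.inf)
              s[cand] = dx[cand] - np.sqrt(sigma**2 - dy[cand]**2)
              j = int(np.argmin(s))                         # first collision partner
              step = min(s[j], budget)
              pos[i, 0] = (pos[i, 0] + step) % L
              budget -= step
              i = j                                         # lifting: transfer the move
          return pos

      pos = event_chain_x(pos, chain_length=5.0)
      print(pos[:3])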

  12. Extended Kalman Filter Based Sliding Mode Control of Parallel-Connected Two Five-Phase PMSM Drive System

    Directory of Open Access Journals (Sweden)

    Tounsi Kamel

    2018-01-01

    This paper presents sliding mode control of two sensorless parallel-connected five-phase permanent magnet synchronous machines (PMSMs) fed by a single five-leg inverter. For both machines, the rotor speeds and rotor positions as well as the load torques are estimated by using an Extended Kalman Filter (EKF) scheme. Fully decoupled control of both machines is possible via an appropriate phase transposition while connecting the stator windings in parallel and employing the proposed speed-sensorless method. In the resulting parallel-connected two-machine drive, independent control of each machine in the group is achieved by controlling the stator currents and speed of each machine under vector control. The effectiveness of the proposed Extended Kalman Filter in conjunction with the sliding mode control is confirmed through application of different load torques over a wide speed range. A comparison between sliding mode control and PI control of the proposed two-motor drive is provided. With PI control, the speed response shows a short rise time and an overshoot during reverse operation, with a settling time of 0.075 s. The speed response obtained by SMC has no overshoot, follows its reference, and settles in 0.028 s. Simulation results confirm that, in transient periods, the sliding mode controller remarkably outperforms its PI counterpart.
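
    The sketch below shows a generic discrete-time Extended Kalman Filter predict/update cycle on a toy nonlinear system; it is only meant to illustrate the EKF structure used by the authors, not their five-phase PMSM model, and the dynamics, measurement model and noise covariances are assumptions.

      import numpy as np

      def ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R):
          """One Extended Kalman Filter cycle: predict with the nonlinear model f,
          then correct with the measurement z through the nonlinear observation h."""
          # --- predict ---
          x_pred = f(x)
          F = F_jac(x)
          P_pred = F @ P @ F.T + Q
          # --- update ---
          H = H_jac(x_pred)
          y = z - h(x_pred)                         # innovation
          S = H @ P_pred @ H.T + R                  # innovation covariance
          K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
          x_new = x_pred + K @ y
          P_new = (np.eye(len(x)) - K @ H) @ P_pred
          return x_new, P_new

      # Toy pendulum-like system (a stand-in for the drive's electromechanical model).
      dt = 0.01
      f = lambda x: np.array([x[0] + dt * x[1], x[1] - dt * np.sin(x[0])])
      F_jac = lambda x: np.array([[1.0, dt], [-dt * np.cos(x[0]), 1.0]])
      h = lambda x: np.array([x[0]])                # only the "angle" is measured
      H_jac = lambda x: np.array([[1.0, 0.0]])
      Q, R = 1e-4 * np.eye(2), 1e-2 * np.eye(1)

      x, P = np.array([0.1, 0.0]), np.eye(2)
      truth = np.array([0.2, 0.0])
      rng = np.random.default_rng(1)
      for _ in range(200):
          truth = f(truth)
          z = h(truth) + rng.normal(scale=0.1, size=1)
          x, P = ekf_step(x, P, z, f, F_jac, h, H_jac, Q, R)
      print("estimate:", x, "truth:", truth)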

  13. Experimental research on density wave oscillation of steam-water two-phase flow in parallel inclined internally ribbed pipes

    International Nuclear Information System (INIS)

    Gao Feng; Chen Tingkuan; Luo Yushan; Yin Fei; Liu Weimin

    2005-01-01

    At p = 3-10 MPa, G = 300-600 kg/(m²·s), Δt_sub = 30-90 °C, and q = 0-190 kW/m², experiments on steam-water two-phase flow instabilities were performed. The test sections are parallel inclined internally ribbed pipes with an outer diameter of φ38.1 mm, a wall thickness of 7.5 mm, an obliquity of 19.5°, and a length of more than 15 m. Based on the experimental results, the effects of pressure, mass velocity, inlet subcooling, and asymmetrical heat flux on steam-water two-phase flow density wave oscillation were analyzed. The experimental results showed that the flow system was more stable as pressure increased. With an increase in mass velocity, the critical heat flux increased but the critical steam quality decreased. Inlet subcooling had a monotonic effect on density wave oscillation: when inlet subcooling decreased, the critical heat flux decreased. Under a given working condition, the critical heat flux of asymmetrically heated parallel pipes is higher than that of symmetrically heated parallel pipes, which means the system with symmetrically heated parallel pipes was more stable. (authors)

  14. The Effect of Bite Registration on the Reproducibility of Parallel Periapical Radiographs Obtained with Two Month Intervals

    Directory of Open Access Journals (Sweden)

    L. Khojastehpour

    2006-06-01

    Statement of Problem: Digital subtraction radiography (DSR) needs reproducible alignment between the x-ray source, the object, and the film to obtain identical projections of the same anatomic region. Purpose: The aim of this study was to evaluate the effect of bite registrations (placed on individual bite blocks) on the reproducibility of parallel periapical radiographs, obtained every 2 months, in patients undergoing periodontal surgery for furcation involvement. Materials and Methods: Ninety-eight parallel periapical radiographs were used in this study. The radiographs were taken with individual bite blocks attached to the beam-guiding device. In order to individualize the bite blocks, bite registrations were fabricated using silicone impression material and placed on the individual bite blocks. All radiographs in each series were processed under similar conditions and digitized with a flatbed scanner fitted with a transparency adaptor (HP Scanjet 7400) at 300 dpi resolution. The reproducibility of this method for obtaining similar parallel periapical radiographs was assessed by measuring the horizontal and vertical distances between two selected unchanged reference points on each radiograph and comparing them within each series. The reliability of the measurements was analyzed using the one-way random model intraclass correlation coefficient for average of raters. Results: For both measurements (horizontal and vertical), statistically significant reliability was found between three repeated radiographs taken at two-month intervals in 16 patients, as well as five repeated radiographs taken at two-month intervals in 10 patients (P<0.001). Conclusion: The results of this study show that bite registration on individual bite blocks is sufficient for obtaining identical parallel periapical radiographs.

  15. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    International Nuclear Information System (INIS)

    Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

    2013-01-01

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i − f(x_{i−1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H_2O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a
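
    A small self-contained illustration of the root-finding formulation is sketched below for a harmonic oscillator propagated by a velocity-Verlet step: the whole trajectory is the unknown X, the residual is F(X) = [x_i − f(x_{i−1})], and a Jacobian-free Newton-Krylov solver is applied to it. In an actual parallel-in-time run, the M independent evaluations of f inside F would be assigned to different processors; the test problem, step size and window length here are assumptions, not the paper's simulations.

      import numpy as np
      from scipy.optimize import root

      # One velocity-Verlet step for a harmonic oscillator (the propagator f).
      dt, omega2 = 0.1, 1.0
      def f(x):
          r, v = x
          a = -omega2 * r
          r_new = r + dt * v + 0.5 * dt**2 * a
          a_new = -omega2 * r_new
          v_new = v + 0.5 * dt * (a + a_new)
          return np.array([r_new, v_new])

      M = 50                                   # number of time steps in the window
      x0 = np.array([1.0, 0.0])                # initial condition at t_0

      def F(X):
          """Residual F(X) = [x_i - f(x_{i-1})] for i = 1..M; the M propagator
          evaluations are independent and could be assigned to M processors."""
          X = X.reshape(M, 2)
          prev = np.vstack([x0[None, :], X[:-1]])
          return (X - np.array([f(p) for p in prev])).ravel()

      # Cheap initial guess: hold the initial condition constant over the window.
      X_guess = np.tile(x0, (M, 1)).ravel()
      sol = root(F, X_guess, method='krylov', tol=1e-10)
      trajectory = sol.x.reshape(M, 2)
      print("converged:", sol.success, "final state:", trajectory[-1])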

  16. Information Entropy Squeezing of a Two-Level Atom Interacting with Two-Mode Coherent Fields

    Institute of Scientific and Technical Information of China (English)

    LIU Xiao-Juan; FANG Mao-Fa

    2004-01-01

    From a quantum information point of view we investigate the entropy squeezing properties for a two-level atom interacting with the two-mode coherent fields via the two-photon transition. We discuss the influences of the initial state of the system on the atomic information entropy squeezing. Our results show that the squeezed component number,squeezed direction, and time of the information entropy squeezing can be controlled by choosing atomic distribution angle,the relative phase between the atom and the two-mode field, and the difference of the average photon number of the two field modes, respectively. Quantum information entropy is a remarkable precision measure for the atomic squeezing.

  17. Parallel programming of saccades during natural scene viewing: evidence from eye movement positions.

    Science.gov (United States)

    Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng

    2013-10-24

    Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.

  18. Parallel inter channel interaction mechanisms

    International Nuclear Information System (INIS)

    Jovic, V.; Afgan, N.; Jovic, L.

    1995-01-01

    Parallel channel interactions are examined. Results of the analysis of the phenomena and of the mechanisms of parallel channel interaction, obtained from experimental investigations of nonstationary flow regimes in three parallel vertical channels, are presented for adiabatic single-phase fluid and two-phase mixture flow. (author)

  19. Analysis on detection accuracy of binocular photoelectric instrument optical axis parallelism digital calibration instrument

    Science.gov (United States)

    Ying, Jia-ju; Yin, Jian-ling; Wu, Dong-sheng; Liu, Jie; Chen, Yu-dan

    2017-11-01

    Low-light-level night vision devices and thermal infrared imaging binocular photoelectric instruments are widely used. Maladjustment of the parallelism of a binocular instrument's optical axes causes symptoms such as dizziness and nausea in the observer after prolonged use. A digital calibration instrument for binocular photoelectric equipment was developed to detect optical axis parallelism, so that the optical axis deviation can be measured quantitatively. As a testing instrument, its precision must be much higher than that of the instrument under test. This paper analyzes the factors that influence the detection accuracy. Such factors exist in each link of the testing process and can be divided into two categories: those that directly affect the position of the reticle image, and those that affect the calculation of the center of the reticle image. The synthesized error is calculated, and the errors are further distributed reasonably to ensure the accuracy of the calibration instrument.

  20. Memory Retrieval Given Two Independent Cues: Cue Selection or Parallel Access?

    Science.gov (United States)

    Rickard, Timothy C.; Bajic, Daniel

    2004-01-01

    A basic but unresolved issue in the study of memory retrieval is whether multiple independent cues can be used concurrently (i.e., in parallel) to recall a single, common response. A number of empirical results, as well as potentially applicable theories, suggest that retrieval can proceed in parallel, though Rickard (1997) set forth a model that…

  1. Probabilistic Teleportation of an Arbitrary Three-Level Two-Particle State and Classical Communication Cost

    Institute of Scientific and Technical Information of China (English)

    DAI Hong-Yi; KUANG Le-Man; LI Cheng-Zu

    2005-01-01

    We propose a scheme to probabilistically teleport an unknown arbitrary three-level two-particle state by using two partially entangled three-level two-particle states as the quantum channel. The classical communication cost required in the ideal probabilistic teleportation process is also calculated. This scheme can be directly generalized to teleport an unknown arbitrary three-level K-particle state by using K partially entangled three-level two-particle states as the quantum channel.

  2. Enhanced 2D-DOA Estimation for Large Spacing Three-Parallel Uniform Linear Arrays

    Directory of Open Access Journals (Sweden)

    Dong Zhang

    2018-01-01

    An enhanced two-dimensional direction of arrival (2D-DOA) estimation algorithm for large-spacing three-parallel uniform linear arrays (ULAs) is proposed in this paper. Firstly, we use the propagator method (PM) to obtain a highly accurate but ambiguous estimate of the directional cosine. Then, we use the relationship between the directional cosines to eliminate the ambiguity. This algorithm not only makes use of the elements of the three-parallel ULAs but also utilizes the connection between the directional cosines to improve the estimation accuracy. Besides, it shows satisfactory estimation performance when the elevation angle is between 70° and 90°, and it can automatically pair the estimated azimuth and elevation angles. Furthermore, it has low complexity, since it applies no eigenvalue decomposition (EVD) or singular value decomposition (SVD) to the covariance matrix. Simulation results demonstrate the effectiveness of the proposed algorithm.

  3. Algorithm for Solution of Direct Kinematic Problem of Multi-sectional Manipulator with Parallel Structure

    Directory of Open Access Journals (Sweden)

    A. L. Lapikov

    2014-01-01

    The article is aimed at creating techniques to study multi-sectional manipulators with parallel structure. To this end, an analysis of the field was carried out to reveal both the advantages and drawbacks of such executive mechanisms and the main problems encountered in the course of research. The work shows that it is inefficient to create complete mathematical models of multi-sectional manipulators which, in the context of the direct kinematic problem, derive a functional dependence of the location and orientation of the end effector on all the generalized coordinates of the mechanism. The structure of multi-sectional manipulators is considered, where the sections are platform manipulators of parallel kinematics with six degrees of freedom. The paper offers an algorithm to define the location and orientation of the end effector of the manipulator by means of iterative solution of the analytical equation of the moving platform plane for each section. The equation for the unknown plane is derived using three points, which are the attachment points of the moving platform joints. To define the values of the joint coordinates, a system of nine nonlinear equations is assembled, all nine equations having the same type of nonlinearity: the physical sense of each equation is the Euclidean distance between two points of the manipulator. The result of the algorithm's execution is a matrix of homogeneous transformation for each section. The correlations describing transformations between adjoining sections of the manipulator are given. An example of a mechanism consisting of three sections is examined, and the theoretical calculations are compared with results obtained on a 3D prototype. The next step of the work is to conduct research both in the field of dynamics of platform parallel kinematics manipulators with six degrees of freedom and in the
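
    The "nine distance equations in nine unknowns" idea can be sketched for a single section as follows, assuming a 3-3 arrangement in which pairs of legs meet the moving platform at three joint points. The base geometry, leg lengths and platform edge lengths below are made-up but mutually consistent values, and a general-purpose solver (scipy's fsolve) stands in for the authors' iterative plane-equation scheme.

      import numpy as np
      from scipy.optimize import fsolve

      # Assumed 3-3 arrangement: six base anchors B, and three unknown platform
      # joint points P1, P2, P3 (9 unknown coordinates in total).
      B = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                    [2.0, 1.5, 0.0], [1.5, 2.5, 0.0],
                    [0.0, 2.5, 0.0], [-0.5, 1.0, 0.0]])
      legs = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]   # (base idx, platform idx)
      edges = [(0, 1), (1, 2), (2, 0)]                          # platform edges

      # Reference pose used only to manufacture consistent leg and edge lengths,
      # so that the 9x9 system below has an exact solution.
      P_ref = np.array([[0.5, 0.5, 1.2], [1.5, 1.8, 1.3], [0.0, 1.8, 1.1]])
      leg_len = np.array([np.linalg.norm(P_ref[k] - B[i]) for i, k in legs])
      edge_len = np.array([np.linalg.norm(P_ref[a] - P_ref[b]) for a, b in edges])

      def equations(x):
          """Nine residuals: six leg lengths plus three platform edge lengths."""
          P = x.reshape(3, 3)
          res = [np.linalg.norm(P[k] - B[i]) - leg_len[n] for n, (i, k) in enumerate(legs)]
          res += [np.linalg.norm(P[a] - P[b]) - edge_len[m] for m, (a, b) in enumerate(edges)]
          return res

      # Start from a rough guess above the base plane and solve the 9x9 system.
      x0 = np.array([[0.4, 0.6, 1.0], [1.4, 1.6, 1.0], [0.1, 1.6, 1.0]]).ravel()
      P_sol = fsolve(equations, x0).reshape(3, 3)
      print("recovered platform points:\n", P_sol)
      print("max residual:", np.max(np.abs(equations(P_sol.ravel()))))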

  4. Comments on X. Yin, A. Wen, Y. Chen, and T. Wang, `Studies in an optical millimeter-wave generation scheme via two parallel dual-parallel Mach-Zehnder modulators', Journal of Modern Optics, 58(8), 2011, pp. 665-673

    Science.gov (United States)

    Hasan, Mehedi; Maldonado-Basilio, Ramón; Hall, Trevor J.

    2015-04-01

    Yin et al. have described an innovative filter-less optical millimeter-wave generation scheme for octotupling of a 10 GHz RF oscillator, or sedecimtupling of a 5 GHz RF oscillator using two parallel dual-parallel Mach-Zehnder modulators (DP-MZMs). The great merit of their design is the suppression of all harmonics except those of order ? (octotupling) or all harmonics except those of order ? (sedecimtupling), where ? is an integer. A demerit of their scheme is the requirement to set a precise RF signal modulation index in order to suppress the zeroth order optical carrier. The purpose of this comment is to show that, in the case of the octotupling function, all harmonics may be suppressed except those of order ?, where ? is an odd integer, by the simple addition of an optical ? phase shift between the two DP-MZMs and an adjustment of the RF drive phases. Since the carrier is suppressed in the modified architecture, the octotupling circuit is thereby released of the strict requirement to set the drive level to a precise value without any significant increase in circuit complexity.

  5. Parallel transport of long mean-free-path plasma along open magnetic field lines: Parallel heat flux

    International Nuclear Information System (INIS)

    Guo Zehua; Tang Xianzhu

    2012-01-01

    In a long mean-free-path plasma where temperature anisotropy can be sustained, the parallel heat flux has two components with one associated with the parallel thermal energy and the other the perpendicular thermal energy. Due to the large deviation of the distribution function from local Maxwellian in an open field line plasma with low collisionality, the conventional perturbative calculation of the parallel heat flux closure in its local or non-local form is no longer applicable. Here, a non-perturbative calculation is presented for a collisionless plasma in a two-dimensional flux expander bounded by absorbing walls. Specifically, closures of previously unfamiliar form are obtained for ions and electrons, which relate two distinct components of the species parallel heat flux to the lower order fluid moments such as density, parallel flow, parallel and perpendicular temperatures, and the field quantities such as the magnetic field strength and the electrostatic potential. The plasma source and boundary condition at the absorbing wall enter explicitly in the closure calculation. Although the closure calculation does not take into account wave-particle interactions, the results based on passing orbits from steady-state collisionless drift-kinetic equation show remarkable agreement with fully kinetic-Maxwell simulations. As an example of the physical implications of the theory, the parallel heat flux closures are found to predict a surprising observation in the kinetic-Maxwell simulation of the 2D magnetic flux expander problem, where the parallel heat flux of the parallel thermal energy flows from low to high parallel temperature region.

  6. Comparative study on current limiting characteristics of flux-lock type SFCL with series or parallel connection of two coils

    International Nuclear Information System (INIS)

    Lim, S.H.

    2008-01-01

    We investigated the current limiting characteristics of the flux-lock type superconducting fault current limiter (SFCL) with series or parallel connection of its two coils. These two flux-lock type SFCLs with magnetically coupled coils have the same operational principle: the fault current is limited by the magnetic flux generated between the two coils of the SFCL when a fault happens. In addition, the inductance ratio and the winding direction of the two coils are the major design parameters that affect the fault current limiting characteristics of both SFCLs. On the other hand, the operational current and the limiting impedance of the two SFCLs under the same design conditions show different tendencies, which results from the different winding methods of the two coils on an iron core. Therefore, a comparative study of both SFCLs from the viewpoint of current limiting performance is needed. To compare their current limiting characteristics, the operational current and the limiting impedance, which describe the performance of the SFCL, were derived from each SFCL's electrical equivalent circuit. Through analysis of the fault current limiting experiments, the different current limiting characteristics of the two SFCLs are discussed.

  7. Accurately bi-orthogonal direct and adjoint lambda modes via two-sided Eigen-solvers

    International Nuclear Information System (INIS)

    Roman, J.E.; Vidal, V.; Verdu, G.

    2005-01-01

    This work is concerned with the accurate computation of the dominant λ-modes (Lambda modes) of the reactor core in order to approximate the solution of the neutron diffusion equation in different situations, such as transient modal analysis. In a previous work, the problem was already addressed by implementing a parallel program based on SLEPc (Scalable Library for Eigenvalue Problem Computations), a public domain software library for the solution of eigenvalue problems. Now, the proposed solution is extended by also incorporating the computation of the adjoint λ-modes in such a way that the bi-orthogonality condition is enforced very accurately. This feature is very desirable in some types of analyses, and in the proposed scheme it is achieved by making use of two-sided eigenvalue solving software. Current implementations of some of this software, while still susceptible to improvement, show that it can be competitive in terms of response time and accuracy with respect to other types of eigenvalue solving software. The code developed by the authors has parallel capabilities in order to be able to analyze reactors with a great level of detail in a short time. (authors)
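
    A dense, small-scale illustration of the bi-orthogonality requirement (not the authors' SLEPc-based parallel code) is given below: scipy returns the right (direct) and left (adjoint) eigenvectors of a non-symmetric matrix, and the left set is rescaled so that W^H V = I holds to machine precision.

      import numpy as np
      from scipy.linalg import eig

      rng = np.random.default_rng(0)
      A = rng.standard_normal((6, 6))          # stand-in for a non-symmetric operator

      # Right eigenvectors play the role of the direct modes, left eigenvectors
      # the role of the adjoint modes.
      w, W, V = eig(A, left=True, right=True)  # columns of V: right, of W: left

      # Enforce bi-orthogonality: rescale the left vectors so that W^H V = I.
      scale = np.diag(W.conj().T @ V)
      W = W / scale.conj()
      print(np.allclose(W.conj().T @ V, np.eye(len(w)), atol=1e-8))  # True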

  8. Accurately bi-orthogonal direct and adjoint lambda modes via two-sided Eigen-solvers

    Energy Technology Data Exchange (ETDEWEB)

    Roman, J.E.; Vidal, V. [Valencia Univ. Politecnica, D. Sistemas Informaticos y Computacion (Spain); Verdu, G. [Valencia Univ. Politecnica, D. Ingenieria Quimica y Nuclear (Spain)

    2005-07-01

    This work is concerned with the accurate computation of the dominant λ-modes (Lambda modes) of the reactor core in order to approximate the solution of the neutron diffusion equation in different situations, such as transient modal analysis. In a previous work, the problem was already addressed by implementing a parallel program based on SLEPc (Scalable Library for Eigenvalue Problem Computations), a public domain software library for the solution of eigenvalue problems. Now, the proposed solution is extended by also incorporating the computation of the adjoint λ-modes in such a way that the bi-orthogonality condition is enforced very accurately. This feature is very desirable in some types of analyses, and in the proposed scheme it is achieved by making use of two-sided eigenvalue solving software. Current implementations of some of this software, while still susceptible to improvement, show that it can be competitive in terms of response time and accuracy with respect to other types of eigenvalue solving software. The code developed by the authors has parallel capabilities in order to be able to analyze reactors with a great level of detail in a short time. (authors)

  9. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    Energy Technology Data Exchange (ETDEWEB)

    Bylaska, Eric J., E-mail: Eric.Bylaska@pnnl.gov [Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, P.O. Box 999, Richland, Washington 99352 (United States); Weare, Jonathan Q., E-mail: weare@uchicago.edu [Department of Mathematics, University of Chicago, Chicago, Illinois 60637 (United States); Weare, John H., E-mail: jweare@ucsd.edu [Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093 (United States)

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t{sub i} (trajectory positions and velocities x{sub i} = (r{sub i}, v{sub i})) to time t{sub i+1} (x{sub i+1}) by x{sub i+1} = f{sub i}(x{sub i}), the dynamics problem spanning an interval from t{sub 0}…t{sub M} can be transformed into a root finding problem, F(X) = [x{sub i} − f(x{sub i−1})]{sub i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H{sub 2}O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up

  10. Parallel alternating direction preconditioner for isogeometric simulations of explicit dynamics

    KAUST Repository

    Łoś, Marcin; Woźniak, Maciej; Paszyński, Maciej; Dalcin, Lisandro; Calo, Victor M.

    2015-01-01

    incorporated as a part of PETIGA, an isogeometric framework [7] built on top of PETSc [8]. We show the scalability of the parallel algorithm on the STAMPEDE Linux cluster up to 10,000 processors, as well as the convergence rate of the PCG solver

  11. Dynamic grid refinement for partial differential equations on parallel computers

    International Nuclear Information System (INIS)

    Mccormick, S.; Quinlan, D.

    1989-01-01

    The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids to provide adaptive resolution and fast solution of PDEs. An asynchronous version of FAC, called AFAC, that completely eliminates the bottleneck to parallelism is presented. This paper describes the advantage that this algorithm has in adaptive refinement for moving singularities on multiprocessor computers. This work is applicable to the parallel solution of two- and three-dimensional shock tracking problems. 6 refs

  12. Automatic Dictionary Expansion Using Non-parallel Corpora

    Science.gov (United States)

    Rapp, Reinhard; Zock, Michael

    Automatically generating bilingual dictionaries from parallel, manually translated texts is a well established technique that works well in practice. However, parallel texts are a scarce resource. Therefore, it is desirable also to be able to generate dictionaries from pairs of comparable monolingual corpora. For most languages, such corpora are much easier to acquire, and often in considerably larger quantities. In this paper we present the implementation of an algorithm which exploits such corpora with good success. Based on the assumption that the co-occurrence patterns between different languages are related, it expands a small base lexicon. For improved performance, it also realizes a novel interlingua approach. That is, if corpora of more than two languages are available, the translations from one language to another can be determined not only directly, but also indirectly via a pivot language.
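
    A toy version of the co-occurrence idea is sketched below: co-occurrence vectors are built in each language, the vector of an untranslated source word is mapped into the target space dimension by dimension through a small seed lexicon, and target words are ranked by cosine similarity. The miniature corpora, vocabularies and seed dictionary are invented for illustration, and the weighting is deliberately naive compared with the paper's algorithm.

      import numpy as np

      def cooccurrence(corpus, vocab, window=2):
          """Symmetric co-occurrence counts within +/-window, as a vocab x vocab matrix."""
          index = {w: i for i, w in enumerate(vocab)}
          M = np.zeros((len(vocab), len(vocab)))
          for sentence in corpus:
              for i, w in enumerate(sentence):
                  for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
                      if i != j and w in index and sentence[j] in index:
                          M[index[w], index[sentence[j]]] += 1
          return M

      # Miniature "comparable" corpora (not translations of each other).
      en = [["the", "dog", "barks", "at", "the", "cat"],
            ["the", "cat", "sleeps", "near", "the", "dog"]]
      de = [["der", "hund", "bellt", "die", "katze", "an"],
            ["die", "katze", "schlaeft", "neben", "dem", "hund"]]

      en_vocab = ["dog", "cat", "barks", "sleeps"]
      de_vocab = ["hund", "katze", "bellt", "schlaeft"]
      seed = {"cat": "katze", "barks": "bellt", "sleeps": "schlaeft"}  # small base lexicon

      E, D = cooccurrence(en, en_vocab), cooccurrence(de, de_vocab)

      # Translate the co-occurrence vector of an unknown word ("dog") dimension by
      # dimension through the seed dictionary, then rank target words by cosine.
      src = E[en_vocab.index("dog")]
      mapped = np.zeros(len(de_vocab))
      for w, t in seed.items():
          mapped[de_vocab.index(t)] = src[en_vocab.index(w)]

      def cosine(a, b):
          return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

      ranking = sorted(de_vocab, key=lambda w: -cosine(mapped, D[de_vocab.index(w)]))
      print(ranking[0])   # prints "hund" for this toy data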

  13. Design of a highly parallel board-level-interconnection with 320 Gbps capacity

    Science.gov (United States)

    Lohmann, U.; Jahns, J.; Limmer, S.; Fey, D.; Bauer, H.

    2012-01-01

    A parallel board-level interconnection design is presented consisting of 32 channels, each operating at 10 Gbps. The hardware uses available optoelectronic components (VCSEL, TIA, pin diodes) and a combination of planar-integrated free-space optics, fiber bundles and available MEMS components, like the DMD™ from Texas Instruments. As a specific feature, we present a new modular inter-board interconnect, realized by 3D fiber-matrix connectors. The performance of the interconnect is evaluated with regard to optical properties and power consumption. Finally, we discuss the application of the interconnect for strongly distributed system architectures, as, for example, in high performance embedded computing systems and data centers.

  14. A development framework for parallel CFD applications: TRIOU project

    International Nuclear Information System (INIS)

    Calvin, Ch.

    2003-01-01

    We present in this paper the parallel structure of a thermal-hydraulics framework: Trio-U. This development platform has been designed to solve large 3-dimensional structured or unstructured CFD (computational fluid dynamics) problems. The code is intrinsically parallel, and an object-oriented design, described in UML, is used. The implementation language chosen is C++. All the parallelism management and the communication routines have been encapsulated. Parallel I/O and communication classes over the standard C++ I/O streams have been defined, which allows the developer to easily use the different modules of the application without dealing with basic parallel process management and communications. Moreover, the encapsulation of the communication routines guarantees the portability of the application and allows efficient tuning of the basic communication methods in order to achieve the best performance on the target architecture. The speed-ups of parallel applications designed using the Trio-U framework are very good, since we obtained, for instance, an efficiency of up to 90% on 20 processors for complex turbulent flow Large Eddy Simulation (LES) computations. The efficiencies obtained on direct numerical simulations of two-phase flows are similar, since the speed-up is nearly equal to 7.5 for a 3-dimensional simulation using a one-million-element mesh on 8 processors. The purpose of this paper is to focus on the main concepts and their implementation that were the guidelines of the design of the parallel architecture of the code. (author)

  15. Unsteady free convection MHD flow between two heated vertical parallel plates in induced magnetic field

    International Nuclear Information System (INIS)

    Chakraborty, S.; Borkakati, A.K.

    1999-01-01

    An unsteady viscous incompressible free convection flow of an electrically conducting fluid between two heated vertical parallel plates is considered in the presence of a uniform magnetic field applied transversely to the flow. Approximate analytical solutions for the velocity, induced field and temperature distributions are obtained for small and large magnetic Reynolds numbers. The skin friction on the two plates is obtained and plotted graphically. The problem is extended to the thermometric case. (author)

  16. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations.

    Science.gov (United States)

    Bylaska, Eric J; Weare, Jonathan Q; Weare, John H

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H_2O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a

  17. Linear and nonlinear excitations in two stacks of parallel arrays of long Josephson junctions

    DEFF Research Database (Denmark)

    Carapella, G.; Constabile, Giovanni; Latempa, R.

    2000-01-01

    We investigate a structure consisting of two parallel arrays of long Josephson junctions sharing a common electrode that allows inductive coupling between the arrays. A model for this structure is derived starting from the description of its continuous limit. The excitation of linear cavity modes...... known from continuous and discrete systems as well as the excitation of a new state exhibiting synchronization in two dimensions are inferred from the mathematical model of the system. The stable nonlinear solution of the coupled sine-Gordon equations describing the system is found to consist...

  18. Parallel sparse direct solvers for Poisson's equation in streamer discharges

    NARCIS (Netherlands)

    M. Nool (Margreet); M. Genseberger (Menno); U. M. Ebert (Ute)

    2017-01-01

    The aim of this paper is to examine whether a hybrid approach to parallel computing, a combination of the message passing model (MPI) with the threads model (OpenMP), can deliver good performance in streamer discharge simulations. Since one of the bottlenecks of almost all streamer

  19. Parallel diffusion length on thermal neutrons in rod type lattices

    International Nuclear Information System (INIS)

    Ahmed, T.; Siddiqui, S.A.M.M.; Khan, A.M.

    1981-11-01

    Calculations of the diffusion length of thermal neutrons in lead-water and aluminum-water lattices in the direction parallel to the rods are performed using the one-group diffusion equation together with the Shevelev transport correction. The formalism is then applied to two practical cases, the Kawasaki (Hitachi) and the Douglas Point (CANDU) reactor lattices. Our results are in good agreement with the observed values. (author)

  20. The series-parallel circuit in the treatment of fulminant hepatitis.

    Science.gov (United States)

    Nakae, Hajime; Yonekawa, Chikara; Moon, Sunkwi; Tajimi, Kimitaka

    2004-04-01

    We developed a series-parallel treatment method for combined plasma exchange (PE) and continuous hemodiafiltration (CHDF) therapy in fulminant hepatitis. We then compared total serum bilirubin, citrate, and cytokine levels obtained by the new methods to those obtained with treatment by the single and reverse-parallel PE methods. Ten adult patients with fulminant hepatitis consented to participate. Plasma exchange was conducted 25 times by the single method (PE only), 16 times by the reverse-parallel method, and 37 times by the series-parallel method. The percentage of total bilirubin removed was highest with the single method followed in order by that with the series-parallel and reverse-parallel methods; the differences were significant. The percentage increase in citrate level was highest with the single method, followed in order by that with the series-parallel and the reverse-parallel methods; these differences were also significant. There was no significant difference in serum interleukin (IL)-6 levels after PE, by the single or the reverse-parallel methods. However, the IL-6 level decreased significantly following PE by the series-parallel method. The serum IL-18 level decreased significantly following PE by each of the three methods. Thus, removal of excess bilirubin, citrate, and cytokines by the series-parallel method, a simple maneuver with excellent removal rates, was considered effective.

  1. Parallel computing of a climate model on the dawn 1000 by domain decomposition method

    Science.gov (United States)

    Bi, Xunqiang

    1997-12-01

    In this paper the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000 is introduced. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massive parallel computer made by National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. The potential ways to increase the speed-up ratio and exploit more resources of future massively parallel supercomputation are also discussed.
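
    A generic two-dimensional domain decomposition of a latitude-longitude grid can be set up with an MPI Cartesian communicator as sketched below (mpi4py syntax); the grid sizes, periodicity choices and block layout are assumptions for illustration, not the IAP model's actual decomposition. Run with, e.g., mpiexec -n 4 python script.py.

      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      size = comm.Get_size()

      # Factor the processor count into a 2-D grid (latitude x longitude blocks).
      dims = MPI.Compute_dims(size, 2)
      cart = comm.Create_cart(dims, periods=[False, True], reorder=True)
      crank = cart.Get_rank()
      coords = cart.Get_coords(crank)

      # Source/destination ranks for halo exchange along each axis of the grid.
      lat_src, lat_dst = cart.Shift(0, 1)
      lon_src, lon_dst = cart.Shift(1, 1)

      # Split a global 180 x 360 grid into this rank's local block
      # (remainder rows/columns are ignored in this sketch).
      nlat, nlon = 180, 360
      lat_block, lon_block = nlat // dims[0], nlon // dims[1]
      lat0, lon0 = coords[0] * lat_block, coords[1] * lon_block
      print(f"rank {crank} owns lat rows {lat0}:{lat0 + lat_block}, "
            f"lon cols {lon0}:{lon0 + lon_block}")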

  2. Massively parallel mathematical sieves

    Energy Technology Data Exchange (ETDEWEB)

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
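
    The sketch below shows the general idea of a segmented sieve parallelized over independent segments with a process pool; it is a generic illustration under assumed segment sizes, not the hypercube scattered-decomposition implementation described in the record.

      import math
      from multiprocessing import Pool

      def base_primes(limit):
          """Ordinary serial sieve up to sqrt(N); these primes seed every segment."""
          flags = bytearray([1]) * (limit + 1)
          flags[0:2] = b"\x00\x00"
          for p in range(2, math.isqrt(limit) + 1):
              if flags[p]:
                  flags[p * p::p] = bytearray(len(flags[p * p::p]))
          return [i for i, f in enumerate(flags) if f]

      def sieve_segment(args):
          """Sieve one segment [lo, hi) using the shared base primes."""
          lo, hi, primes = args
          flags = bytearray([1]) * (hi - lo)
          for p in primes:
              start = max(p * p, ((lo + p - 1) // p) * p)
              flags[start - lo::p] = bytearray(len(flags[start - lo::p]))
          return [lo + i for i, f in enumerate(flags) if f]

      if __name__ == "__main__":
          N, segment = 1_000_000, 100_000
          primes = base_primes(math.isqrt(N))
          ranges = [(lo, min(lo + segment, N + 1), primes)
                    for lo in range(2, N + 1, segment)]
          with Pool() as pool:                 # each segment is sieved independently
              chunks = pool.map(sieve_segment, ranges)
          count = sum(len(c) for c in chunks)
          print("primes below", N, ":", count)   # 78498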

  3. Large-scale parallel configuration interaction. II. Two- and four-component double-group general active space implementation with application to BiH

    DEFF Research Database (Denmark)

    Knecht, Stefan; Jensen, Hans Jørgen Aagaard; Fleig, Timo

    2010-01-01

    We present a parallel implementation of a large-scale relativistic double-group configuration interaction CIprogram. It is applicable with a large variety of two- and four-component Hamiltonians. The parallel algorithm is based on a distributed data model in combination with a static load balanci...

  4. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-05-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. 6 figs

  5. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-01-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. (orig.)

  6. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    Science.gov (United States)

    Nash, Thomas

    1989-12-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.

  7. SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

    Science.gov (United States)

    Cooke, Daniel; Rushton, Nelson

    2013-01-01

    With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less
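
    As a rough, unofficial illustration of the Normalize-Transpose law in Python: a scalar operation applied to sequence arguments is "normalized" (scalars are broadcast) and "transposed" (applied element-wise), and the element-wise applications are independent, which is where the implicit parallelism comes from. This is an interpretation of the idea, not SequenceL itself.

      def normalize_transpose(op, *args):
          """If any argument is a sequence, broadcast the scalars, transpose the
          sequences element-wise and apply op to each tuple. The element-wise
          applications are independent, so the final comprehension could be
          replaced by a parallel map."""
          seq_lens = {len(a) for a in args if isinstance(a, (list, tuple))}
          if not seq_lens:
              return op(*args)                       # all scalars: plain application
          (n,) = seq_lens                            # sequences must agree in length
          rows = [[a[i] if isinstance(a, (list, tuple)) else a for a in args]
                  for i in range(n)]
          return [normalize_transpose(op, *row) for row in rows]

      add = lambda x, y: x + y
      print(normalize_transpose(add, [1, 2, 3], 10))        # [11, 12, 13]
      print(normalize_transpose(add, [1, 2, 3], [4, 5, 6])) # [5, 7, 9]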

  8. Hydrogenic donor impurity in parallel-triangular quantum wires: Hydrostatic pressure and applied electric field effects

    International Nuclear Information System (INIS)

    Restrepo, R.L.; Giraldo, E.; Miranda, G.L.; Ospina, W.; Duque, C.A.

    2009-01-01

    The combined effects of hydrostatic pressure and an in-growth direction applied electric field on the binding energy of hydrogenic shallow-donor impurity states in parallel-coupled GaAs-Ga(1-x)Al(x)As quantum-well wires are calculated using a variational procedure within the effective-mass and parabolic-band approximations. Results are obtained for several dimensions of the structure, shallow-donor impurity positions, hydrostatic pressures, and applied electric fields. Our results suggest that external inputs such as hydrostatic pressure and an in-growth direction electric field are two useful tools for modifying the binding energy of a donor impurity in parallel-coupled quantum-well wires.

  9. Transfer function modeling of parallel connected two three-phase induction motor implementation using LabView platform

    DEFF Research Database (Denmark)

    Gunabalan, R.; Sanjeevikumar, P.; Blaabjerg, Frede

    2015-01-01

    This paper presents the transfer function modeling and stability analysis of two induction motors of the same ratings and parameters connected in parallel. The induction motors are controlled by a single inverter and the entire drive system is modeled using transfer functions in LabView. Further...

  10. Parallel and vector implementation of APROS simulator code

    International Nuclear Information System (INIS)

    Niemi, J.; Tommiska, J.

    1990-01-01

    In this paper the vector and parallel processing implementation of a general purpose simulator code is discussed. In this code the utilization of vector processing is straightforward. In addition to loop-level parallel processing, functional decomposition and domain decomposition have been considered. Results presented for a PWR-plant simulation illustrate the potential speed-up factors of the alternatives. It turns out that loop-level parallelism and domain decomposition are the most promising alternatives for employing parallel processing. (author)

  11. The Development of Reading and Spelling in Arabic Orthography: Two Parallel Processes?

    Science.gov (United States)

    Taha, Haitham

    2016-01-01

    The parallels between reading and spelling skills in Arabic were tested. One-hundred forty-three native Arab students, with typical reading development, from second, fourth, and sixth grades were tested with reading, spelling and orthographic decision tasks. The results indicated a full parallel between the reading and spelling performances within…

  12. Physicality and Digitality: Parallelisms at a Material Level

    DEFF Research Database (Denmark)

    Gjerlufsen, Tony; Olsen, Jesper Wolff

    2007-01-01

    What is the striking difference between doing in a purely non-digital world and doing through, on, at and with digitally augmented physical entities? The work described in this article sets out to explore the nature of the physical and digital world at a material level. To contemplate the two world...... and Digitality. Each of which captures the essence of the two worlds, including their individual defining basic qualities. Through an increase in understanding of the two terms we hope to inform designers and researchers about the intermixture of the two worlds....

  13. The design of multi-core DSP parallel model based on message passing and multi-level pipeline

    Science.gov (United States)

    Niu, Jingyu; Hu, Jian; He, Wenjing; Meng, Fanrong; Li, Chuanrong

    2017-10-01

    Currently, the design of embedded signal processing systems is often based on a specific application, but this approach is not conducive to the rapid development of signal processing technology. In this paper, a parallel processing model architecture based on a multi-core DSP platform is designed; it is mainly suitable for complex algorithms that are composed of different modules. This model combines the ideas of multi-level pipeline parallelism and message passing, and draws on the advantages of the mainstream multi-core DSP models (the Master-Slave model and the Data Flow model), so that it achieves better performance. This paper uses a three-dimensional image generation algorithm to validate the efficiency of the proposed model by comparing it with the Master-Slave and the Data Flow models.
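
    As an illustration of the pipeline-plus-message-passing idea (a toy sketch, not the DSP implementation described in the record), the following Python script chains two processing stages with message queues so that successive data blocks occupy different stages at the same time; the stage functions are made up for the example.

        import multiprocessing as mp

        def stage(func, q_in, q_out):
            # one pipeline stage: consume messages, process them, pass results downstream
            while True:
                item = q_in.get()
                if item is None:              # poison pill shuts the stage down
                    q_out.put(None)
                    break
                q_out.put(func(item))

        def preprocess(block):
            return [v * 2 for v in block]

        def reduce_block(block):
            return sum(block)

        if __name__ == "__main__":
            q0, q1, q2 = mp.Queue(), mp.Queue(), mp.Queue()
            workers = [mp.Process(target=stage, args=(preprocess, q0, q1)),
                       mp.Process(target=stage, args=(reduce_block, q1, q2))]
            for w in workers:
                w.start()
            for block in ([1, 2, 3], [4, 5, 6], [7, 8, 9]):
                q0.put(block)                 # later blocks enter stage 1 while earlier ones sit in stage 2
            q0.put(None)
            results = []
            while True:
                r = q2.get()
                if r is None:
                    break
                results.append(r)
            print(results)                    # [12, 30, 48]
            for w in workers:
                w.join()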

  14. Theoretical analysis on ac loss properties of two-strand parallel conductors composed of superconducting multifilamentary strands

    CERN Document Server

    Iwakuma, M; Funaki, K

    2002-01-01

    The ac loss properties of two-strand parallel conductors composed of superconducting multifilamentary strands were theoretically investigated. The constituent strands generally need to be insulated and transposed for the sake of uniform current distribution and low ac loss. In case the transposition points deviate from the optimum ones, shielding current is induced according to the interlinkage magnetic flux of the twisted loop enclosed by the insulated strands and the contact resistances at the terminals. It produces an additional ac loss. Supposing a simple situation where a two-strand parallel conductor with one-point transposition is exposed to a uniform ac magnetic field, the basic equations for the magnetic field were proposed and the theoretical expressions of the additional ac losses derived. As a result, the following features were shown. The additional ac loss in the non-saturation case, where the induced shielding current is less than the critical current of a strand, is proportional to the square ...

  15. Comparative Evaluation and Case Studies of Shared-Memory and Data-Parallel Execution Patterns

    Directory of Open Access Journals (Sweden)

    Xiaodong Zhang

    1999-01-01

    Shared‐memory and data‐parallel programming models are two important paradigms for scientific applications. Both models provide high‐level program abstractions, and simple and uniform views of network structures. The common features of the two models significantly simplify program coding and debugging for scientific applications. However, the underlying execution and overhead patterns are significantly different between the two models due to their programming constraints, and due to different and complex structures of interconnection networks and systems which support the two models. We performed this experimental study to present implications and comparisons of execution patterns on two commercial architectures. We implemented a standard electromagnetic simulation program (EM) and a linear system solver using the shared‐memory model on the KSR‐1 and the data‐parallel model on the CM‐5. Our objectives are to examine the execution pattern changes required for an implementation transformation between the two models; to study memory access patterns; to address scalability issues; and to investigate relative costs and advantages/disadvantages of using the two models for scientific computations. Our results indicate that the EM program tends to become computation‐intensive in the KSR‐1 shared‐memory system, and memory‐demanding in the CM‐5 data‐parallel system when the systems and the problems are scaled. The EM program, a highly data‐parallel program, performed extremely well, and the linear system solver, a highly control‐structured program, suffered significantly in the data‐parallel model on the CM‐5. Our study provides further evidence that matching execution patterns of algorithms to parallel architectures would achieve better performance.

  16. Acceleration of cardiovascular MRI using parallel imaging: basic principles, practical considerations, clinical applications and future directions

    International Nuclear Information System (INIS)

    Niendorf, T.; Sodickson, D.

    2006-01-01

    Cardiovascular Magnetic Resonance (CVMR) imaging has proven to be of clinical value for non-invasive diagnostic imaging of cardiovascular diseases. CVMR requires rapid imaging; however, the speed of conventional MRI is fundamentally limited due to its sequential approach to image acquisition, in which data points are collected one after the other in the presence of sequentially-applied magnetic field gradients. Parallel imaging methods use arrays of radiofrequency coils to acquire multiple data points simultaneously, and thereby to increase imaging speed and efficiency beyond the limits of purely gradient-based approaches. The resulting improvements in imaging speed can be used in various ways, including shortening long examinations, improving spatial resolution and anatomic coverage, improving temporal resolution, enhancing image quality, overcoming physiological constraints, detecting and correcting for physiologic motion, and streamlining work flow. Examples of these strategies will be provided in this review, after some of the fundamentals of parallel imaging methods now in use for cardiovascular MRI are outlined. The emphasis will rest upon basic principles and clinical state-of-the-art cardiovascular MRI applications. In addition, practical aspects such as signal-to-noise ratio considerations, tailored parallel imaging protocols and potential artifacts will be discussed, and current trends and future directions will be explored. (orig.)

  17. Parallel Syndromes: Two Dimensions of Narcissism and the Facets of Psychopathic Personality in Criminally-Involved Individuals

    Science.gov (United States)

    2012-01-01

    Little research has examined different dimensions of narcissism that may parallel psychopathy facets in criminally-involved individuals. The present study examined the pattern of relationships between grandiose and vulnerable narcissism, assessed using the Narcissistic Personality Inventory-16 and the Hypersensitive Narcissism Scale, respectively, and the four facets of psychopathy (interpersonal, affective, lifestyle, and antisocial) assessed via the Psychopathy Checklist: Screening Version (PCL:SV). As predicted, grandiose and vulnerable narcissism showed differential relationships to psychopathy facets, with grandiose narcissism relating positively to the interpersonal facet of psychopathy and vulnerable narcissism relating positively to the lifestyle facet of psychopathy. Paralleling existing psychopathy research, vulnerable narcissism showed stronger associations than grandiose narcissism to 1) other forms of psychopathology, including internalizing and substance use disorders, and 2) self- and other-directed aggression, measured using the Life History of Aggression and the Forms of Aggression Questionnaire. Grandiose narcissism was nonetheless associated with social dysfunction marked by a manipulative and deceitful interpersonal style and unprovoked aggression. Potentially important implications for uncovering etiological pathways and developing treatment interventions for these disorders in externalizing adults are discussed. PMID:22448731

  18. On the Convergence of Asynchronous Parallel Pattern Search

    International Nuclear Information System (INIS)

    Tamara Gibson Kolda

    2002-01-01

    In this paper the authors prove global convergence for asynchronous parallel pattern search. In standard pattern search, decisions regarding the update of the iterate and the step-length control parameter are synchronized implicitly across all search directions. They lose this feature in asynchronous parallel pattern search since the search along each direction proceeds semi-autonomously. By bounding the value of the step-length control parameter after any step that produces decrease along a single search direction, they can prove that all the processes share a common accumulation point and that such a point is a stationary point of the standard nonlinear unconstrained optimization problem
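
    The step-length control described above can be illustrated with a small serial sketch (a simplified, synchronized poll, not the asynchronous parallel algorithm analyzed in the paper); the contraction factor, tolerance, and test function are arbitrary choices.

        import numpy as np

        def pattern_search(f, x0, step=1.0, tol=1e-6, max_iter=10000):
            # poll +/- step along each coordinate direction; accept any decrease,
            # otherwise contract the step-length control parameter
            x = np.asarray(x0, dtype=float)
            fx = f(x)
            directions = np.vstack((np.eye(x.size), -np.eye(x.size)))
            for _ in range(max_iter):
                improved = False
                for d in directions:
                    trial = x + step * d
                    ft = f(trial)
                    if ft < fx:
                        x, fx, improved = trial, ft, True
                if not improved:
                    step *= 0.5               # contract only after an unsuccessful poll
                    if step < tol:
                        break
            return x, fx

        # quick check on a smooth quadratic with minimum at (1, -2)
        print(pattern_search(lambda z: (z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2, [5.0, 5.0]))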

  19. Closed-form solution for piezoelectric layer with two collinear cracks parallel to the boundaries

    Directory of Open Access Journals (Sweden)

    B. M. Singh

    2006-01-01

    We consider the problem of determining the stress distribution in an infinitely long piezoelectric layer of finite width, with two collinear cracks of equal length and parallel to the layer boundaries. Within the framework of reigning piezoelectric theory under mode III, the cracked piezoelectric layer subjected to combined electromechanical loading is analyzed. The faces of the layers are subjected to electromechanical loading. The collinear cracks are located at the middle plane of the layer parallel to its face. By the use of Fourier transforms we reduce the problem to solving a set of triple integral equations with cosine kernel and a weight function. The triple integral equations are solved exactly. Closed form analytical expressions for stress intensity factors, electric displacement intensity factors, and shape of crack and energy release rate are derived. As the limiting case, the solution of the problem with one crack in the layer is derived. Some numerical results for the physical quantities are obtained and displayed graphically.

  20. PELVIC ROTATION AND LOWER EXTREMITY MOTION WITH TWO DIFFERENT FRONT FOOT DIRECTIONS IN THE TENNIS BACKHAND GROUNDSTROKE

    Directory of Open Access Journals (Sweden)

    Sayumi Iwamoto

    2013-06-01

    When a tennis player steps forward to hit a backhand groundstroke in a closed stance, modifying the direction of the front foot relative to the net may reduce the risk of ankle injury and increase performance. This study evaluated the relationship between pelvic rotation and lower extremity movement during the backhand groundstroke when players stepped with toes parallel to the net (Level) or with toes pointed towards the net (Net). High school competitive tennis players (eleven males and seven females, 16.8 ± 0.8 years, all right-handed) performed tennis court tests comprising five maximum speed directional runs to the court intersection line to hit an imaginary ball with forehand or backhand swings. The final backhand groundstroke for each player at the backcourt baseline was analyzed. Pelvic rotation and lower extremity motion were quantified using 3D video analysis from frontal and sagittal plane camera views reconstructed to 3D using DLT methods. Plantar flexion of the ankle and supination of the front foot were displayed for both Net and Level groups during the late phase of the front foot step. The timings of the peak pelvis rotational velocity and peak pelvis rotational acceleration showed different patterns for the Net and Level groups. The peak timing of the pelvis rotational velocity of the Level group occurred during the late phase of the step, suggesting an increase in the risk of inversion ankle sprain and a decrease in stroke power compared to the Net group.

  1. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    Science.gov (United States)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summary: Program title: SWsolver; Catalogue identifier: AEGY_v1_0; Program summary URL

  2. Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

    Science.gov (United States)

    Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

    2017-12-01

    This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
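
    The exchange of data across fictitious (ghost) cells between neighbouring subdomains, mentioned above, can be sketched with mpi4py (an illustrative 1D strip decomposition written in Python; it is not the code described in the record, and would be run under mpirun, e.g. mpirun -n 4 python halo.py):

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        left = rank - 1 if rank > 0 else MPI.PROC_NULL
        right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

        n_local = 8
        u = np.full(n_local + 2, float(rank))      # interior cells plus one ghost cell on each side

        # phase 1: everyone sends its last interior cell to the right neighbour and
        # receives the left neighbour's last interior cell into the left ghost cell
        comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
        # phase 2: the symmetric shift to the left fills the right ghost cell
        comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)

        # a SIMPLE/Jacobi-style stencil update can now read u[0] and u[-1] safely
        print(rank, u[0], u[-1])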

  3. Pa2 kinematic bond in translational parallel manipulators

    Directory of Open Access Journals (Sweden)

    A. Hernández

    2018-01-01

    The Pa2 pair is composed of two intertwined articulated parallelograms connecting two links of a kinematic chain in parallel. This pair has two translational degrees of freedom, leading to a translational plane that varies with position. Currently, the Pa2 pair appears in conceptual designs presented in recent papers; however, its practical application is very limited. One of the reasons for this may be the high number of redundant constraints it has, although most of them can be eliminated by judiciously replacing the revolute joints with spherical joints. On the other hand, the structure of the Pa2 pair contributes to increasing the global stiffness of the kinematic chain in which it is mounted. Also, its implementation is a promising alternative to problematic passive prismatic joints. In this paper, Pa2 pairs are used in the design of a 3 − P Pa2 parallel manipulator. The potential of this design is evaluated and demonstrated through the following analyses: direct and inverse kinematics, singularity study, and workspace computation and assessment.

  4. Parallel Fortran-MPI software for numerical inversion of the Laplace transform and its application to oscillatory water levels in groundwater environments

    Science.gov (United States)

    Zhan, X.

    2005-01-01

    A parallel Fortran-MPI (Message Passing Interface) software for numerical inversion of the Laplace transform based on a Fourier series method is developed to meet the need of solving intensive computational problems involving the response of oscillatory water levels to hydraulic tests in a groundwater environment. The software is a parallel version of ACM (The Association for Computing Machinery) Transactions on Mathematical Software (TOMS) Algorithm 796. Running 38 test examples indicated that implementation of MPI techniques with a distributed memory architecture speeds up the processing and improves the efficiency. Applications to oscillatory water levels in a well during aquifer tests are presented to illustrate how this package can be applied to solve complicated environmental problems involving differential and integral equations. The package is free and is easy to use for people with little or no previous experience in using MPI but who wish to get off to a quick start in parallel computing. © 2004 Elsevier Ltd. All rights reserved.
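
    For readers unfamiliar with the Fourier-series approach, a compact serial sketch is given below. It is a Durbin-type trapezoidal discretisation of the Bromwich integral, not Algorithm 796 or the parallel Fortran-MPI package itself, and the parameters T, aT and n_terms are illustrative choices; it assumes all singularities of F(s) lie to the left of Re(s) = a.

        import numpy as np

        def laplace_invert_fourier(F, t, T=None, aT=7.0, n_terms=4000):
            # f(t) ~ e^{a t}/T * [ Re F(a)/2 + sum_k ( Re F(a+ik*pi/T) cos(k*pi*t/T)
            #                                          - Im F(a+ik*pi/T) sin(k*pi*t/T) ) ]
            t = np.atleast_1d(np.asarray(t, dtype=float))
            if T is None:
                T = 2.0 * t.max()              # period parameter of the Fourier expansion
            a = aT / T                          # abscissa of the Bromwich line
            k = np.arange(1, n_terms + 1)
            s = a + 1j * k * np.pi / T
            Fk = np.array([F(sk) for sk in s])
            phase = np.outer(t, k) * np.pi / T
            series = np.cos(phase) @ Fk.real - np.sin(phase) @ Fk.imag
            return np.exp(a * t) / T * (0.5 * np.real(F(a)) + series)

        # sanity check against a known transform pair: F(s) = 1/(s+1)  <->  f(t) = exp(-t)
        ts = np.linspace(0.1, 5.0, 50)
        err = laplace_invert_fourier(lambda s: 1.0 / (s + 1.0), ts) - np.exp(-ts)
        print(np.max(np.abs(err)))             # a few correct digits without series acceleration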

  5. Investigating the dynamics of a direct parallel combination of supercapacitors and polymer electrolyte fuel cells

    Energy Technology Data Exchange (ETDEWEB)

    Papra, M.; Buechi, F.N.; Koetz, R. [Electrochemistry Laboratory, Paul Scherrer Institute, CH-5232 Villigen PSI (Switzerland)

    2010-10-15

    Hydrogen fuelled vehicles with a fuel cell based powertrain are considered to contribute to sustainable mobility by reducing CO2 emissions from road transport. In such vehicles the fuel cell system is typically hybridised with an energy storage device such as a battery or a supercapacitor (SC) to allow for recovering braking energy and assist the fuel cell system for peak power. The direct parallel combination of a polymer electrolyte fuel cell (PEFC) and a SC without any control electronics is investigated in the present study. It is demonstrated that the combination enhances the dynamics of the PEFC significantly during load changes. However, due to the lack of a power electronic interface the SC cannot be utilised to its optimum capacity. (Abstract Copyright [2010], Wiley Periodicals, Inc.)

  6. Numerical simulation of two-phase flow with front-capturing

    International Nuclear Information System (INIS)

    Tzanos, C.P.; Weber, D.P.

    2000-01-01

    Because of the complexity of two-phase flow phenomena, two-phase flow codes rely heavily on empirical correlations. This approach has a number of serious shortcomings. Advances in parallel computing and continuing improvements in computer speed and memory have stimulated the development of numerical simulation tools that rely less on empirical correlations and more on fundamental physics. The objective of this work is to take advantage of developments in massively parallel computing, single-phase computational fluid dynamics of complex systems, and numerical methods for front capturing in two-phase flows to develop a computer code for direct numerical simulation of two-phase flow. This includes bubble/droplet transport, interface deformation and topology change, bubble-droplet interactions, interface mass, momentum, and energy transfer. In this work, the Navier-Stokes and energy equations are solved by treating both phases as a single fluid with interfaces between the two phases, and a discontinuity in material properties across the moving interfaces. The evolution of the interfaces is simulated by using the front capturing technique of the level-set methods. In these methods, the boundary of a two-fluid interface is modeled as the zero level set of a smooth function φ. The level-set function φ is defined as the signed distance from the interface (φ is negative inside a droplet/bubble and positive outside). Compared to other front-capturing or front-tracking methods, the level-set approach is relatively easy to implement even in three-dimensional flows, and it has been shown to simulate well the coalescence and breakup of droplets/bubbles
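
    To make the front-capturing idea concrete, here is a minimal, self-contained sketch (prescribed uniform velocity, first-order upwind advection, periodic boundaries, no reinitialisation and no coupling to the Navier-Stokes solver, so it only illustrates how the zero level set of φ tracks a moving interface; grid size and velocity are arbitrary choices):

        import numpy as np

        # grid and a signed-distance level-set function (negative inside a circular "bubble")
        n, L = 128, 1.0
        dx = L / n
        x = np.linspace(0.0, L, n, endpoint=False)
        X, Y = np.meshgrid(x, x, indexing="ij")
        phi = np.sqrt((X - 0.3) ** 2 + (Y - 0.5) ** 2) - 0.15

        u, v = 0.5, 0.0                        # prescribed velocity field
        dt = 0.4 * dx / max(abs(u), abs(v))

        def upwind_step(phi, u, v, dt, dx):
            # one explicit step of phi_t + u phi_x + v phi_y = 0 (periodic boundaries)
            dpx = (phi - np.roll(phi, 1, axis=0)) / dx if u > 0 else (np.roll(phi, -1, axis=0) - phi) / dx
            dpy = (phi - np.roll(phi, 1, axis=1)) / dx if v > 0 else (np.roll(phi, -1, axis=1) - phi) / dx
            return phi - dt * (u * dpx + v * dpy)

        n_steps = 100
        for _ in range(n_steps):
            phi = upwind_step(phi, u, v, dt, dx)

        # the zero level set (the interface) should have drifted by about u * n_steps * dt in x
        print("bubble centre x ~", X[phi < 0.0].mean(), "expected ~", 0.3 + u * n_steps * dt)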

  7. A maintenance policy for two-unit parallel systems based on imperfect monitoring information

    Energy Technology Data Exchange (ETDEWEB)

    Barros, Anne [Department Genie des Systemes Industriels (GSI), Universite de technologie de Troyes, 12 rue Marie Curie, BP 2060, 10010 Troyes, Cedex (France)]. E-mail: anne.barros@utt.fr; Berenguer, Christophe [Department Genie des Systemes Industriels (GSI), Universite de technologie de Troyes, 12 rue Marie Curie, BP 2060, 10010 Troyes, Cedex (France); Grall, Antoine [Department Genie des Systemes Industriels (GSI), Universite de technologie de Troyes, 12 rue Marie Curie, BP 2060, 10010 Troyes, Cedex (France)

    2006-02-01

    In this paper a maintenance policy is optimised for a two-unit system with a parallel structure and stochastic dependences. Monitoring problems are taken into account in the optimisation scheme: the failure time of each unit may go undetected with a given probability. Conditions on the system parameters (unit failure rates) and on the non-detection probabilities must be verified to make the optimisation scheme valid. These conditions are clearly identified. Numerical experiments show the relevance of taking monitoring problems into account in the maintenance model.

  8. A maintenance policy for two-unit parallel systems based on imperfect monitoring information

    International Nuclear Information System (INIS)

    Barros, Anne; Berenguer, Christophe; Grall, Antoine

    2006-01-01

    In this paper a maintenance policy is optimised for a two-unit system with a parallel structure and stochastic dependences. Monitoring problems are taken into account in the optimisation scheme: the failure time of each unit may go undetected with a given probability. Conditions on the system parameters (unit failure rates) and on the non-detection probabilities must be verified to make the optimisation scheme valid. These conditions are clearly identified. Numerical experiments show the relevance of taking monitoring problems into account in the maintenance model

  9. Stepped-wedge cluster randomised controlled trials: a generic framework including parallel and multiple-level designs.

    Science.gov (United States)

    Hemming, Karla; Lilford, Richard; Girling, Alan J

    2015-01-30

    Stepped-wedge cluster randomised trials (SW-CRTs) are being used with increasing frequency in health service evaluation. Conventionally, these studies are cross-sectional in design with equally spaced steps, with an equal number of clusters randomised at each step and data collected at each and every step. Here we introduce several variations on this design and consider implications for power. One modification we consider is the incomplete cross-sectional SW-CRT, where the number of clusters varies at each step or where at some steps, for example, implementation or transition periods, data are not collected. We show that the parallel CRT with staggered but balanced randomisation can be considered a special case of the incomplete SW-CRT. As too can the parallel CRT with baseline measures. And we extend these designs to allow for multiple layers of clustering, for example, wards within a hospital. Building on results for complete designs, power and detectable difference are derived using a Wald test and obtaining the variance-covariance matrix of the treatment effect assuming a generalised linear mixed model. These variations are illustrated by several real examples. We recommend that whilst the impact of transition periods on power is likely to be small, where they are a feature of the design they should be incorporated. We also show examples in which the power of a SW-CRT increases as the intra-cluster correlation (ICC) increases and demonstrate that the impact of the ICC is likely to be smaller in a SW-CRT compared with a parallel CRT, especially where there are multiple levels of clustering. Finally, through this unified framework, the efficiency of the SW-CRT and the parallel CRT can be compared. © 2014 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
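
    As a concrete aid, the sketch below builds the cluster-by-period treatment indicator for a complete cross-sectional stepped-wedge design and shows one way an incomplete variant could be represented. It is an illustration only, not the authors' framework; the NaN masking convention and the example sizes are assumptions.

        import numpy as np

        def stepped_wedge_design(n_clusters, n_steps):
            # rows = clusters, columns = periods (n_steps + 1, including an all-control baseline);
            # an equal share of clusters crosses over to the intervention at each step
            assert n_clusters % n_steps == 0, "complete design needs equal groups per step"
            per_step = n_clusters // n_steps
            design = np.zeros((n_clusters, n_steps + 1), dtype=float)
            for g in range(n_steps):
                design[g * per_step:(g + 1) * per_step, g + 1:] = 1.0
            return design

        D = stepped_wedge_design(n_clusters=6, n_steps=3)
        print(D)

        # an incomplete variant: mask cells where no data are collected, e.g. transition periods
        D_incomplete = D.copy()
        D_incomplete[0:2, 1] = np.nan          # say, the first group's implementation period
        print(D_incomplete)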

  10. Influence of equilibrium shear flow in the parallel magnetic direction on edge localized mode crash

    Energy Technology Data Exchange (ETDEWEB)

    Luo, Y.; Xiong, Y. Y. [College of Physical Science and Technology, Sichuan University, 610064 Chengdu (China); Chen, S. Y., E-mail: sychen531@163.com [College of Physical Science and Technology, Sichuan University, 610064 Chengdu (China); Key Laboratory of High Energy Density Physics and Technology of Ministry of Education, Sichuan University, Chengdu 610064 (China); Southwestern Institute of Physics, Chengdu 610041 (China); Huang, J.; Tang, C. J. [College of Physical Science and Technology, Sichuan University, 610064 Chengdu (China); Key Laboratory of High Energy Density Physics and Technology of Ministry of Education, Sichuan University, Chengdu 610064 (China)

    2016-04-15

    The influence of the parallel shear flow on the evolution of peeling-ballooning (P-B) modes is studied with the BOUT++ four-field code in this paper. The parallel shear flow has different effects in linear and nonlinear simulations. In the linear simulations, the growth rate of the edge localized mode (ELM) can be increased by the Kelvin-Helmholtz term, which can be driven by the parallel shear flow. In the nonlinear simulations, the results agree with the linear simulations in the linear phase. However, the ELM size is reduced by the parallel shear flow at the beginning of the turbulence phase, which is recognized as the P-B filament structure. Then, during the turbulence phase, the ELM size is decreased by the shear flow.

  11. One Factor or Two Parallel Processes? Comorbidity and Development of Adolescent Anxiety and Depressive Disorder Symptoms

    Science.gov (United States)

    Hale, William W., III; Raaijmakers, Quinten A. W.; Muris, Peter; van Hoof, Anne; Meeus, Wim H. J.

    2009-01-01

    Background: This study investigates whether anxiety and depressive disorder symptoms of adolescents from the general community are best described by a model that assumes they are indicative of one general factor or by a model that assumes they are two distinct disorders with parallel growth processes. Additional analyses were conducted to explore…

  12. Large-scale modeling of epileptic seizures: scaling properties of two parallel neuronal network simulation algorithms.

    Science.gov (United States)

    Pesce, Lorenzo L; Lee, Hyong C; Hereld, Mark; Visser, Sid; Stevens, Rick L; Wildeman, Albert; van Drongelen, Wim

    2013-01-01

    Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.

  13. Large-Scale Modeling of Epileptic Seizures: Scaling Properties of Two Parallel Neuronal Network Simulation Algorithms

    Directory of Open Access Journals (Sweden)

    Lorenzo L. Pesce

    2013-01-01

    Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.

  14. Studies of electron collisions with polyatomic molecules using distributed-memory parallel computers

    International Nuclear Information System (INIS)

    Winstead, C.; Hipes, P.G.; Lima, M.A.P.; McKoy, V.

    1991-01-01

    Elastic electron scattering cross sections from 5-30 eV are reported for the molecules C2H4, C2H6, C3H8, Si2H6, and GeH4, obtained using an implementation of the Schwinger multichannel method for distributed-memory parallel computer architectures. These results, obtained within the static-exchange approximation, are in generally good agreement with the available experimental data. These calculations demonstrate the potential of highly parallel computation in the study of collisions between low-energy electrons and polyatomic gases. The computational methodology discussed is also directly applicable to the calculation of elastic cross sections at higher levels of approximation (target polarization) and of electronic excitation cross sections

  15. The effect of plasma background on the instability of two non-parallel quantum plasma shells in whole K space

    International Nuclear Information System (INIS)

    Mehdian, H.; Hajisharifi, K.; Hasanbeigi, A.

    2014-01-01

    In this paper, quantum fluid equations together with Maxwell's equations are used to study the stability problem of non-parallel and non-relativistic plasma shells colliding over a “background plasma” at an arbitrary angle, as a first step towards a microscopic understanding of the collision shocks. The calculations have been performed for all magnitudes and directions of the wave vector. Colliding plasma shells in the vacuum region have been investigated in previous works as a counter-streaming model, whereas in the presence of a background plasma (a more realistic system) the colliding shells are mainly non-parallel. The obtained results show that the presence of the background plasma often suppresses the maximum growth rate of the instabilities (in a particular case, the opposite behavior occurs). It is also found that the largest maximum growth rate occurs for the two-stream instability of the configuration consisting of counter-streaming currents in a very dilute plasma background. The results derived in this study can be used to analyze systems of three colliding plasma slabs, provided that the coordinate system used is stationary relative to one of the particle slabs. The present analytical investigation can be applied to describe violent quantum astrophysical phenomena such as the collision of white dwarf stars with other dense astrophysical bodies or supernova remnants. Moreover, in the limit ℏ→0, the obtained results describe classical (sufficiently dilute) colliding plasma shells such as those in gamma-ray bursts and flares in the solar wind

  16. Interaction between two parallel plates covered with a polyelectrolyte brush layer in an electrolyte solution.

    Science.gov (United States)

    Ohshima, Hiroyuki

    An approximate analytic expression is derived for the interaction energy between two parallel plates covered with a polyelectrolyte brush layer in an electrolyte solution. The interaction energy has three components: electrostatic interaction energy between two brush layers before and after their contact, steric interaction energy between two brush layers after their contact, and the van der Waals interaction energy between the cores of the plates. It is shown that these three components are of the same order of magnitude and contribute equally to the total interaction energy between two polyelectrolyte-coated plates in an electrolyte solution. On the basis of Derjaguin's approximation, an approximate expression for the interaction energy between two spherical particles covered with polyelectrolyte brush layers is also derived.
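
    For reference, Derjaguin's approximation relates the sphere-sphere interaction energy to the flat-plate interaction energy per unit area; in the notation below (our own, not the paper's), W_pp(h) is the plate-plate energy per unit area at separation h and R_1, R_2 are the particle radii:

        \[
          V_{ss}(H) \;\approx\; \frac{2\pi R_1 R_2}{R_1 + R_2}\int_{H}^{\infty} W_{pp}(h)\,\mathrm{d}h .
        \]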

  17. Real-time objects development: Study and proposal for a parallel scheduling architecture

    International Nuclear Information System (INIS)

    Rioux, Laurent

    1997-01-01

    This thesis contributes to the programming and execution control of real-time object-oriented applications. Real-time objects are very attractive for programming real-time applications because the model combines concurrency with encapsulation, modularity, and reusability, while taking the real-time constraints of the application into account. One essential quality of this approach is that parallelism and real-time constraints can be specified directly at the level of the application model. An annotation system for C++ has been defined to describe the real-time specifications in the model (or in the source code) of the application; it supplies the execution support with the information it needs for control. In this multitasking approach, control is distributed and encapsulated inside each real-time object. Three complementary levels of control have been defined: the state level (defining the capability of an object to process an operation), the concurrency level (ensuring coherence among the object attributes), and scheduling control (allocating processor resources to the object while taking real-time constraints into account). The proposed control architecture, named OROS, manages access to the attributes of each object individually, so it can run in parallel treatments that do not access the same data. This architecture provides dynamic control of an application that can benefit from the parallelism of new machines, both for the execution parallelism and for the control itself. It uses only the simplest primitives of industrial real-time operating systems, which ensures its feasibility and portability. (author) [fr]

  18. Comparative analysis of serial and parallel laser patterning of Ag nanowire thin films

    Energy Technology Data Exchange (ETDEWEB)

    Oh, Harim; Lee, Myeongkyu, E-mail: myeong@yonsei.ac.kr

    2017-03-31

    Highlights: • Serial and parallel laser patterning of Ag nanowire thin films is comparatively analyzed. • AgNW film can be directly patterned by a spatially-modulated pulsed Nd:YAG laser beam. • An area of 2.24 cm² can be simultaneously patterned by a single pulse with energy of 350 mJ. - Abstract: Ag nanowire (AgNW) films solution-coated on a glass substrate were laser-patterned in two different ways. For the conventional serial process, a pulsed ultraviolet laser of 30 kHz repetition rate and ∼20 ns pulse width was employed as the laser source. For parallel patterning, the film was directly irradiated by a spatially-modulated Nd:YAG laser beam that has a low repetition rate of 10 kHz and a shorter pulse width of 5 ns. While multiple pulses with energy density ranging from 3 to 9 J/cm² were required to pattern the film in the serial process, a single pulse with energy density of 0.16 J/cm² completely removed AgNWs in the parallel patterning. This may be explained by the difference in patterning mechanism. In the parallel process using short pulses of 5 ns width, AgNWs can be removed in their solid state by the laser-induced thermo-elastic force, whereas they must be evaporated in the serial process utilizing a high-repetition-rate laser. Important process parameters such as threshold energy density, speed, and available feature sizes are comparatively discussed for the two patterning approaches.

  19. Feasibility study of segmented-parallel-hole collimator for stationary cardiac SPECT

    Energy Technology Data Exchange (ETDEWEB)

    Mao, Yanfei [Utah Univ., Salt Lake City, UT (United States). Center for Advanced Imaging Research (UCAIR); Utah Univ., Salt Lake City, UT (United States). Dept. of Bioengineering; Zeng, Gengsheng L. [Utah Univ., Salt Lake City, UT (United States). Center for Advanced Imaging Research (UCAIR)

    2011-07-01

    The goal of this research is to propose a stationary cardiac SPECT system using the segmented parallel-beam collimator and to perform some computer simulations to test the feasibility. A stationary system has a benefit of acquiring temporally consistent projections. The most challenging issue in building a stationary system is to provide sufficient projection view-angles. A 2-detector, multi-segment collimator system with 14 view-angles over 180° in the transaxial direction and 3 view-angles in the axial directions was designed, where the two detectors are configured 90° apart in an L-shape. We applied the parallel-beam imaging geometry and used segmented parallel-hole collimator to acquire SPECT data. To improve the system condition due to data truncation, we measured more rays within the field-of-view (FOV) of the detector by using a relatively small detector bin-size. In image reconstruction, we used the maximum-likelihood expectation-maximization (ML-EM) algorithm. The criterion for evaluating the system is the summed pixel-to-pixel distance that measures the discrepancy between the 3D gold-standard image and the reconstructed 3D region of interest (ROI) with truncated data. Effects of limited number of view-angles, data truncation, varying body habitus, attenuation, and noise were considered in the system design. As a result, our segmented-parallel-beam stationary cardiac SPECT system is able to acquire sufficient data for cardiac imaging and has a high sensitivity gain. (orig.)
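
    Since the reconstruction above relies on ML-EM, a minimal dense-matrix sketch of the multiplicative update is shown here. A toy random system matrix stands in for the real projector that would encode the segmented-collimator geometry; the sizes, seed, and iteration count are arbitrary, and none of this is the authors' code.

        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.random((60, 40))               # toy system matrix: 60 detector bins, 40 image voxels
        x_true = rng.random(40) * 10.0
        y = rng.poisson(A @ x_true).astype(float)       # Poisson-noisy projections

        x = np.ones(40)                        # ML-EM needs a strictly positive starting image
        sens = A.T @ np.ones(A.shape[0])       # sensitivity image (back-projection of ones)
        for _ in range(200):
            ratio = y / np.clip(A @ x, 1e-12, None)     # measured / forward-projected
            x *= (A.T @ ratio) / sens                   # multiplicative ML-EM update
        print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))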

  20. Parallel Monte Carlo reactor neutronics

    International Nuclear Information System (INIS)

    Blomquist, R.N.; Brown, F.B.

    1994-01-01

    The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved

  1. Adaptive integrand decomposition in parallel and orthogonal space

    International Nuclear Information System (INIS)

    Mastrolia, Pierpaolo; Peraro, Tiziano; Primo, Amedeo

    2016-01-01

    We present the integrand decomposition of multiloop scattering amplitudes in parallel and orthogonal space-time dimensions, d=d_∥+d_⊥, with d_∥ the dimension of the parallel space spanned by the legs of the diagrams. When the number n of external legs is n≤4, the corresponding representation of multiloop integrals exposes a subset of integration variables which can be easily integrated away by means of Gegenbauer polynomials orthogonality condition. By decomposing the integration momenta along parallel and orthogonal directions, the polynomial division algorithm is drastically simplified. Moreover, the orthogonality conditions of Gegenbauer polynomials can be suitably applied to integrate the decomposed integrand, yielding the systematic annihilation of spurious terms. Consequently, multiloop amplitudes are expressed in terms of integrals corresponding to irreducible scalar products of loop momenta and external ones. We revisit the one-loop decomposition, which turns out to be controlled by the maximum-cut theorem in different dimensions, and we discuss the integrand reduction of two-loop planar and non-planar integrals up to n=8 legs, for arbitrary external and internal kinematics. The proposed algorithm extends to all orders in perturbation theory.

  2. Adaptive integrand decomposition in parallel and orthogonal space

    Energy Technology Data Exchange (ETDEWEB)

    Mastrolia, Pierpaolo [Dipartimento di Fisica ed Astronomia, Università di Padova,Via Marzolo 8, 35131 Padova (Italy); INFN, Sezione di Padova,Via Marzolo 8, 35131 Padova (Italy); Peraro, Tiziano [Higgs Centre for Theoretical Physics, School of Physics and Astronomy,The University of Edinburgh,James Clerk Maxwell Building,Peter Guthrie Tait Road, Edinburgh EH9 3FD, Scotland (United Kingdom); Primo, Amedeo [Dipartimento di Fisica ed Astronomia, Università di Padova,Via Marzolo 8, 35131 Padova (Italy); INFN, Sezione di Padova,Via Marzolo 8, 35131 Padova (Italy)

    2016-08-29

    We present the integrand decomposition of multiloop scattering amplitudes in parallel and orthogonal space-time dimensions, d=d_∥+d_⊥, with d_∥ the dimension of the parallel space spanned by the legs of the diagrams. When the number n of external legs is n≤4, the corresponding representation of multiloop integrals exposes a subset of integration variables which can be easily integrated away by means of Gegenbauer polynomials orthogonality condition. By decomposing the integration momenta along parallel and orthogonal directions, the polynomial division algorithm is drastically simplified. Moreover, the orthogonality conditions of Gegenbauer polynomials can be suitably applied to integrate the decomposed integrand, yielding the systematic annihilation of spurious terms. Consequently, multiloop amplitudes are expressed in terms of integrals corresponding to irreducible scalar products of loop momenta and external ones. We revisit the one-loop decomposition, which turns out to be controlled by the maximum-cut theorem in different dimensions, and we discuss the integrand reduction of two-loop planar and non-planar integrals up to n=8 legs, for arbitrary external and internal kinematics. The proposed algorithm extends to all orders in perturbation theory.

  3. Analytic Approximate Solutions for Unsteady Two-Dimensional and Axisymmetric Squeezing Flows between Parallel Plates

    Directory of Open Access Journals (Sweden)

    Mohammad Mehdi Rashidi

    2008-01-01

    The flow of a viscous incompressible fluid between two parallel plates due to the normal motion of the plates is investigated. The unsteady Navier-Stokes equations are reduced to a nonlinear fourth-order differential equation by using similarity solutions. The homotopy analysis method (HAM) is used to solve this nonlinear equation analytically. The convergence of the obtained series solution is carefully analyzed. The validity of our solutions is verified by the numerical results obtained by a fourth-order Runge-Kutta method.

  4. Parallel R-matrix computation

    International Nuclear Information System (INIS)

    Heggarty, J.W.

    1999-06-01

    For almost thirty years, sequential R-matrix computation has been used by atomic physics research groups, from around the world, to model collision phenomena involving the scattering of electrons or positrons with atomic or molecular targets. As considerable progress has been made in the understanding of fundamental scattering processes, new data, obtained from more complex calculations, is of current interest to experimentalists. Performing such calculations, however, places considerable demands on the computational resources to be provided by the target machine, in terms of both processor speed and memory requirement. Indeed, in some instances the computational requirements are so great that the proposed R-matrix calculations are intractable, even when utilising contemporary classic supercomputers. Historically, increases in the computational requirements of R-matrix computation were accommodated by porting the problem codes to a more powerful classic supercomputer. Although this approach has been successful in the past, it is no longer considered to be a satisfactory solution due to the limitations of current (and future) Von Neumann machines. As a consequence, there has been considerable interest in the high performance multicomputers, that have emerged over the last decade which appear to offer the computational resources required by contemporary R-matrix research. Unfortunately, developing codes for these machines is not as simple a task as it was to develop codes for successive classic supercomputers. The difficulty arises from the considerable differences in the computing models that exist between the two types of machine and results in the programming of multicomputers to be widely acknowledged as a difficult, time consuming and error-prone task. Nevertheless, unless parallel R-matrix computation is realised, important theoretical and experimental atomic physics research will continue to be hindered. This thesis describes work that was undertaken in

  5. Parallel Implicit Algorithms for CFD

    Science.gov (United States)

    Keyes, David E.

    1998-01-01

    The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in the Message Passing Interface (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSc library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSc during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSc framework.
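
    The Newton-Krylov kernel of an NKS solver can be illustrated with SciPy's matrix-free newton_krylov on a small 1D Bratu-type problem. The Schwarz preconditioning and MPI distribution are omitted, and the problem, grid size, and parameters are illustrative assumptions, not taken from the project.

        import numpy as np
        from scipy.optimize import newton_krylov

        # 1D Bratu-type problem: u'' + lam * exp(u) = 0 with u(0) = u(1) = 0
        n, lam = 100, 1.0
        h = 1.0 / (n + 1)

        def residual(u):
            # true nonlinear residual on the interior grid points
            upad = np.concatenate(([0.0], u, [0.0]))
            return (upad[:-2] - 2.0 * upad[1:-1] + upad[2:]) / h ** 2 + lam * np.exp(u)

        # the Jacobian is never formed; it is accessed only through matrix-vector products
        u = newton_krylov(residual, np.zeros(n), method="lgmres")
        print("max residual:", np.abs(residual(u)).max())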

  6. Streaming for Functional Data-Parallel Languages

    DEFF Research Database (Denmark)

    Madsen, Frederik Meisner

    In this thesis, we investigate streaming as a general solution to the space inefficiency commonly found in functional data-parallel programming languages. The data-parallel paradigm maps well to parallel SIMD-style hardware. However, the traditional fully materializing execution strategy...... by extending two existing data-parallel languages: NESL and Accelerate. In the extensions we map bulk operations to data-parallel streams that can evaluate fully sequential, fully parallel or anything in between. By a dataflow, piecewise parallel execution strategy, the runtime system can adjust to any target...... flattening necessitates all sub-computations to materialize at the same time. For example, naive n by n matrix multiplication requires n^3 space in NESL because the algorithm contains n^3 independent scalar multiplications. For large values of n, this is completely unacceptable. We address the problem...

  7. DGDFT: A massively parallel method for large scale density functional theory calculations.

    Science.gov (United States)

    Hu, Wei; Lin, Lin; Yang, Chao

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10(-4) Hartree/atom in terms of the error of energy and 6.2 × 10(-4) Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  8. DGDFT: A massively parallel method for large scale density functional theory calculations

    International Nuclear Information System (INIS)

    Hu, Wei; Yang, Chao; Lin, Lin

    2015-01-01

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail

  9. DGDFT: A massively parallel method for large scale density functional theory calculations

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Wei, E-mail: whu@lbl.gov; Yang, Chao, E-mail: cyang@lbl.gov [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Lin, Lin, E-mail: linlin@math.berkeley.edu [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Department of Mathematics, University of California, Berkeley, California 94720 (United States)

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  10. Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines

    KAUST Repository

    Woźniak, Maciej

    2015-02-01

    This paper derives theoretical estimates of the computational cost for isogeometric multi-frontal direct solver executed on parallel distributed memory machines. We show theoretically that for the C^(p-1) global continuity of the isogeometric solution, both the computational cost and the communication cost of a direct solver are of order O(log(N) p^2) for the one dimensional (1D) case, O(N p^2) for the two dimensional (2D) case, and O(N^(4/3) p^2) for the three dimensional (3D) case, where N is the number of degrees of freedom and p is the polynomial order of the B-spline basis functions. The theoretical estimates are verified by numerical experiments performed with three parallel multi-frontal direct solvers: MUMPS, PaStiX and SuperLU, available through the PetIGA toolkit built on top of PETSc. Numerical results confirm these theoretical estimates both in terms of p and N. For a given problem size, the strong efficiency rapidly decreases as the number of processors increases, becoming about 20% for 256 processors for a 3D example with 128^3 unknowns and linear B-splines with C^0 global continuity, and 15% for a 3D example with 64^3 unknowns and quartic B-splines with C^3 global continuity. At the same time, one cannot arbitrarily increase the problem size, since the memory required by higher order continuity spaces is large, quickly consuming all the available memory resources even in the parallel distributed memory version. Numerical results also suggest that the use of distributed parallel machines is highly beneficial when solving higher order continuity spaces, although the number of processors that one can efficiently employ is somehow limited.
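
    The quoted scalings can be collected into a small helper for back-of-the-envelope comparisons; the sketch below encodes only the orders given above, with all constants omitted (so absolute numbers are meaningless, only ratios are).

```python
import math

def multifrontal_cost(N, p, dim):
    """Asymptotic cost scalings quoted for the isogeometric multi-frontal solver
    (constants omitted): O(log(N) p^2) in 1D, O(N p^2) in 2D, O(N^(4/3) p^2) in 3D."""
    if dim == 1:
        return math.log(N) * p**2
    if dim == 2:
        return N * p**2
    if dim == 3:
        return N**(4.0 / 3.0) * p**2
    raise ValueError("dim must be 1, 2 or 3")

# Example: relative growth when refining a 3D mesh at fixed polynomial order p = 3.
print(multifrontal_cost(64**3, 3, 3) / multifrontal_cost(32**3, 3, 3))  # ~16x
```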

  11. Photoluminescence spectra of n-doped double quantum wells in a parallel magnetic field

    International Nuclear Information System (INIS)

    Huang, D.; Lyo, S.K.

    1999-01-01

    We show that the photoluminescence (PL) line shapes from tunnel-split ground sublevels of n-doped thin double quantum wells (DQW's) are sensitively modulated by an in-plane magnetic field B∥ at low temperatures (T). The modulation is caused by the B∥-induced distortion of the electronic structure. The latter arises from the relative shift of the energy-dispersion parabolas of the two quantum wells (QW's) in k space, both in the conduction and valence bands, and formation of an anticrossing gap in the conduction band. Using a self-consistent density-functional theory, the PL spectra and the band-gap narrowing are calculated as a function of B∥, T, and the homogeneous linewidths. The PL spectra from symmetric and asymmetric DQW's are found to show strikingly different behavior. In symmetric DQW's with a high density of electrons, two PL peaks are obtained at B∥ = 0, representing the interband transitions between the pair of the upper (i.e., antisymmetric) levels and that of the lower (i.e., symmetric) levels of the ground doublets. As B∥ increases, the upper PL peak develops an N-type kink, namely a maximum followed by a minimum, and merges with the lower peak, which rises monotonically as a function of B∥ due to the diamagnetic energy. When the electron density is low, however, only a single PL peak, arising from the transitions between the lower levels, is obtained. In asymmetric DQW's, the PL spectra show mainly one dominant peak at all B∥. In this case, the holes are localized in one of the QW's at low T and recombine only with the electrons in the same QW. At high electron densities, the upper PL peak shows an N-type kink like in symmetric DQW's. However, the lower peak is absent at low B∥ because it arises from the inter-QW transitions. Reasonable agreement is obtained with recent

  12. Dataflow Query Execution in a Parallel Main-Memory Environment

    NARCIS (Netherlands)

    Wilschut, A.N.; Apers, Peter M.G.

    1991-01-01

    The performance and characteristics of the execution of various join-trees on a parallel DBMS are studied. The results are a step in the direction of the design of a query optimization strategy that is fit for parallel execution of complex queries. Among others, synchronization issues are identified

  13. A parallel solution for high resolution histological image analysis.

    Science.gov (United States)

    Bueno, G; González, R; Déniz, O; García-Rojo, M; González-García, J; Fernández-Carrobles, M M; Vállez, N; Salido, J

    2012-10-01

    This paper describes a general methodology for developing parallel image processing algorithms based on message passing for high resolution images (on the order of several Gigabytes). These algorithms have been applied to histological images and must be executed on massively parallel processing architectures. Advances in new technologies for complete slide digitalization in pathology have been combined with developments in biomedical informatics. However, the efficient use of these digital slide systems is still a challenge. The image processing that these slides are subject to is still limited both in terms of data processed and processing methods. The work presented here focuses on the need to design and develop parallel image processing tools capable of obtaining and analyzing the entire gamut of information included in digital slides. Tools have been developed to assist pathologists in image analysis and diagnosis, and they cover low and high-level image processing methods applied to histological images. Code portability, reusability and scalability have been tested by using the following parallel computing architectures: distributed memory with massively parallel processors and two networks, InfiniBand and Myrinet, composed of 17 and 1024 nodes respectively. The proposed parallel framework is a flexible, high-performance solution and it shows that the efficient processing of digital microscopic images is possible and may offer important benefits to pathology laboratories. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
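
    The record does not reproduce the authors' pipeline; the sketch below only illustrates the tile-distribution pattern it describes, assuming mpi4py and NumPy, with a trivial placeholder standing in for the histological analysis.

```python
# Hypothetical master/worker tile distribution for a large image (not the authors' code).
from mpi4py import MPI
import numpy as np

comm, rank, size = MPI.COMM_WORLD, MPI.COMM_WORLD.rank, MPI.COMM_WORLD.size
TILE = 512  # assumed tile edge in pixels

def process_tile(tile):
    # Placeholder for a real low- or high-level image operation.
    return float(tile.mean())

if rank == 0:
    image = np.random.rand(4 * TILE, 4 * TILE)            # stand-in for a gigapixel slide
    tiles = [image[i:i + TILE, j:j + TILE]
             for i in range(0, image.shape[0], TILE)
             for j in range(0, image.shape[1], TILE)]
    chunks = [tiles[r::size] for r in range(size)]         # round-robin assignment
else:
    chunks = None

my_tiles = comm.scatter(chunks, root=0)                     # master hands out work
my_results = [process_tile(t) for t in my_tiles]            # every rank processes its share
all_results = comm.gather(my_results, root=0)               # results return to the master

if rank == 0:
    print("tiles processed:", sum(len(r) for r in all_results))
```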

  14. On Two-Level State-Dependent Routing Polling Systems with Mixed Service

    Directory of Open Access Journals (Sweden)

    Guan Zheng

    2015-01-01

    Based on priority differentiation and efficiency of the system, we consider a single-server two-level polling system of N+1 queues, consisting of one key queue and N normal queues. The novel contribution of the present paper is that the server polls only active queues, i.e., queues with customers waiting. Furthermore, the key queue is served with exhaustive service and the normal queues are served with 1-limited service in a parallel scheduling. For this model, we derive an expression for the probability generating function of the joint queue length distribution at polling epochs. Based on these results, we derive the explicit closed-form expressions for the mean waiting time. Numerical examples demonstrate that theoretical and simulation results are identical and that the new system is efficient at both the key queue and the normal queues.

  15. Optimisation of a parallel ocean general circulation model

    OpenAIRE

    M. I. Beare; D. P. Stevens

    1997-01-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by...

  16. Two-dimensional superconducting state of monolayer Pb films grown on GaAs(110) in a strong parallel magnetic field.

    Science.gov (United States)

    Sekihara, Takayuki; Masutomi, Ryuichi; Okamoto, Tohru

    2013-08-02

    Two-dimensional (2D) superconductivity was studied by magnetotransport measurements on single-atomic-layer Pb films on a cleaved GaAs(110) surface. The superconducting transition temperature shows only a weak dependence on the parallel magnetic field up to 14 T, which is higher than the Pauli paramagnetic limit. Furthermore, the perpendicular-magnetic-field dependence of the sheet resistance is almost independent of the presence of the parallel field component. These results are explained in terms of an inhomogeneous superconducting state predicted for 2D metals with a large Rashba spin splitting.

  17. MHD flow of a dusty viscoelastic liquid through a porous medium between two inclined parallel plates

    International Nuclear Information System (INIS)

    Singh, A.K.; Singh, N.P.

    1996-01-01

    Magnetohydrodynamic flow of a dusty viscoelastic liquid (Oldroyd B-liquid) through a porous medium between two parallel plates inclined to the horizon has been studied. The liquid velocity, dust particle velocity and flux of flow have been obtained. Earlier results have been deduced as particular cases of the present investigation. The physical situation of the motion has been discussed graphically. (author)

  18. Parallel Framework for Cooperative Processes

    Directory of Open Access Journals (Sweden)

    Mitică Craus

    2005-01-01

    This paper describes the work of an object-oriented framework designed to be used in the parallelization of a set of related algorithms. The idea behind the system we are describing is to have a re-usable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles and it should be possible to split the work between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO) parallel algorithm for the Travelling Salesman Problem (TSP) and an Image Processing (IP) parallel algorithm for the Symmetrical Neighborhood Filter (SNF). The implementations of these applications by means of the parallel framework prove to have good performance: approximately linear speedup and low communication cost.
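
    As a toy illustration of the cycle-based master-slave idea described above (not the framework itself, which uses message passing), the following sketch drives a stand-in per-unit computation through repeated cycles with Python's multiprocessing.

```python
# Toy master/slave cycle loop in the spirit of the framework described above
# (illustrative only; the original is an object-oriented, message-passing framework).
from multiprocessing import Pool

def work_unit(args):
    unit_id, state = args
    # Stand-in for one processing unit's share of an ACO / image-filter cycle.
    return state + unit_id

def run_cycles(n_units=4, n_cycles=5):
    state = 0
    with Pool(processes=n_units) as pool:
        for _ in range(n_cycles):                       # the algorithms "run in cycles"
            partials = pool.map(work_unit, [(u, state) for u in range(n_units)])
            state = sum(partials)                       # master combines slave results
    return state

if __name__ == "__main__":
    print(run_cycles())
```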

  19. Parallel electric fields from ionospheric winds

    International Nuclear Information System (INIS)

    Nakada, M.P.

    1987-01-01

    The possible production of electric fields parallel to the magnetic field by dynamo winds in the E region is examined, using a jet stream wind model. Current return paths through the F region above the stream are examined as well as return paths through the conjugate ionosphere. The Wulf geometry with horizontal winds moving in opposite directions one above the other is also examined. Parallel electric fields are found to depend strongly on the width of current sheets at the edges of the jet stream. If these are narrow enough, appreciable parallel electric fields are produced. These appear to be sufficient to heat the electrons which reduces the conductivity and produces further increases in parallel electric fields and temperatures. Calculations indicate that high enough temperatures for optical emission can be produced in less than 0.3 s. Some properties of auroras that might be produced by dynamo winds are examined; one property is a time delay in brightening at higher and lower altitudes

  20. Bistatic scattering from a three-dimensional object above a two-dimensional randomly rough surface modeled with the parallel FDTD approach.

    Science.gov (United States)

    Guo, L-X; Li, J; Zeng, H

    2009-11-01

    We present an investigation of the electromagnetic scattering from a three-dimensional (3-D) object above a two-dimensional (2-D) randomly rough surface. A Message Passing Interface-based parallel finite-difference time-domain (FDTD) approach is used, and the uniaxial perfectly matched layer (UPML) medium is adopted for truncation of the FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different number of processors is illustrated for one rough surface realization and shows that the computation time of our parallel FDTD algorithm is dramatically reduced relative to a single-processor implementation. Finally, the composite scattering coefficients versus scattered and azimuthal angle are presented and analyzed for different conditions, including the surface roughness, the dielectric constants, the polarization, and the size of the 3-D object.

  1. Dataflow Query Execution in a Parallel, Main-memory Environment

    NARCIS (Netherlands)

    Wilschut, A.N.; Apers, Peter M.G.

    In this paper, the performance and characteristics of the execution of various join-trees on a parallel DBMS are studied. The results of this study are a step in the direction of the design of a query optimization strategy that is fit for parallel execution of complex queries. Among others,

  2. Serine/threonine-protein phosphatase 1 α levels are paralleling olfactory memory formation in the CD1 mouse.

    Science.gov (United States)

    Winding, Christiana; Sun, Yanwei; Höger, Harald; Bubna-Littitz, Hermann; Pollak, Arnold; Schmidt, Peter; Lubec, Gert

    2011-06-01

    Although olfactory discrimination has already been studied in several mouse strains, data on protein levels linked to olfactory memory are limited. Wild mouse strains Mus musculus musculus, Mus musculus domesticus and CD1 laboratory outbred mice were tested in a conditioned odor preference task and trained to discriminate between two odors, Rose and Lemon, by pairing one odor with a sugar reward. Six hours following the final test, mice were sacrificed and olfactory bulbs (OB) were taken for gel-based proteomics analyses and immunoblotting. OB proteins were extracted, separated by 2-DE and quantified using specific software (Proteomweaver). Odor-trained mice showed a preference for the previously rewarded odor suggesting that conditioned odor preference occurred. In CD1 mice, the level of one out of 482 protein spots was significantly increased in odor-trained mice as compared with the control group; it was in-gel digested by trypsin and chymotrypsin and analyzed by tandem mass spectrometry (nano-ESI-LC-MS/MS). The spot was unambiguously identified as serine/threonine-protein phosphatase PP1-α catalytic subunit (PP-1A) and differential levels observed in gel-based proteomic studies were verified by immunoblotting. PP-1A is a key signalling element in synaptic plasticity and memory processes and is herein shown to parallel olfactory discrimination, representing olfactory memory. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. High temporal resolution functional MRI using parallel echo volumar imaging

    International Nuclear Information System (INIS)

    Rabrait, C.; Ciuciu, P.; Ribes, A.; Poupon, C.; Dehaine-Lambertz, G.; LeBihan, D.; Lethimonnier, F.; Le Roux, P.

    2008-01-01

    Purpose: To combine parallel imaging with 3D single-shot acquisition (echo volumar imaging, EVI) in order to acquire high temporal resolution volumar functional MRI (fMRI) data. Materials and Methods: An improved EVI sequence was associated with parallel acquisition and field of view reduction in order to acquire a large brain volume in 200 msec. Temporal stability and functional sensitivity were increased through optimization of all imaging parameters and Tikhonov regularization of parallel reconstruction. Two human volunteers were scanned with parallel EVI in a 1.5 T whole-body MR system, while submitted to a slow event-related auditory paradigm. Results: Thanks to parallel acquisition, the EVI volumes display a low level of geometric distortions and signal losses. After removal of low-frequency drifts and physiological artifacts, activations were detected in the temporal lobes of both volunteers and voxel-wise hemodynamic response functions (HRF) could be computed. On these HRF different habituation behaviors in response to sentence repetition could be identified. Conclusion: This work demonstrates the feasibility of high temporal resolution 3D fMRI with parallel EVI. Combined with advanced estimation tools, this acquisition method should prove useful to measure neural activity timing differences or study the nonlinearities and non-stationarities of the BOLD response. (authors)
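
    The Tikhonov regularization mentioned for the parallel reconstruction can be sketched generically; in the example below, A is a hypothetical encoding matrix and lam the regularization weight, so this is only the textbook regularized least-squares step, not the authors' reconstruction code.

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Solve min ||A x - b||^2 + lam ||x||^2, i.e. x = (A^H A + lam I)^-1 A^H b,
    the generic regularized inversion used to stabilize parallel MR reconstruction."""
    AhA = A.conj().T @ A
    n = AhA.shape[0]
    return np.linalg.solve(AhA + lam * np.eye(n), A.conj().T @ b)

# Toy ill-conditioned example: the regularization keeps the solution bounded.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 32)) @ np.diag(np.logspace(0, -6, 32))
x_true = rng.standard_normal(32)
b = A @ x_true + 1e-3 * rng.standard_normal(64)
x_hat = tikhonov_solve(A, b, lam=1e-4)
print(np.linalg.norm(x_hat))
```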

  4. Professional Parallel Programming with C# Master Parallel Extensions with NET 4

    CERN Document Server

    Hillar, Gastón

    2010-01-01

    Expert guidance for those programming today's dual-core processor PCs. As PC processors explode from one or two to now eight processors, there is an urgent need for programmers to master concurrent programming. This book dives deep into the latest technologies available to programmers for creating professional parallel applications using C#, .NET 4, and Visual Studio 2010. The book covers task-based programming, coordination data structures, PLINQ, thread pools, the asynchronous programming model, and more. It also teaches other parallel programming techniques, such as SIMD and vectorization. Teach

  5. Frame-Based and Subpicture-Based Parallelization Approaches of the HEVC Video Encoder

    Directory of Open Access Journals (Sweden)

    Héctor Migallón

    2018-05-01

    The most recent video coding standard, High Efficiency Video Coding (HEVC), is able to significantly improve the compression performance at the expense of a huge computational complexity increase with respect to its predecessor, H.264/AVC. Parallel versions of the HEVC encoder may help to reduce the overall encoding time in order to make it more suitable for practical applications. In this work, we study two parallelization strategies. One of them follows a coarse-grain approach, where parallelization is based on frames, and the other one follows a fine-grain approach, where parallelization is performed at the subpicture level. Two different frame-based approaches have been developed. The first one only uses MPI and the second one is a hybrid MPI/OpenMP algorithm. An exhaustive experimental test was carried out to study the performance of both approaches in order to find out the best setup in terms of parallel efficiency and coding performance. Both frame-based and subpicture-based approaches are compared under the same hardware platform. Although subpicture-based schemes provide an excellent performance with high-resolution video sequences, scalability is limited by resolution, and the coding performance worsens by increasing the number of processes. Conversely, the proposed frame-based approaches provide the best results with respect to both parallel performance (increasing scalability) and coding performance (not degrading the rate/distortion behavior).

  6. Unified Singularity Modeling and Reconfiguration of 3rTPS Metamorphic Parallel Mechanisms with Parallel Constraint Screws

    Directory of Open Access Journals (Sweden)

    Yufeng Zhuang

    2015-01-01

    This paper presents a unified singularity modeling and reconfiguration analysis of variable topologies of a class of metamorphic parallel mechanisms with parallel constraint screws. The new parallel mechanisms consist of three reconfigurable rTPS limbs that have two working phases stemming from the reconfigurable Hooke (rT) joint. While one phase has full mobility, the other supplies a constraint force to the platform. Based on these, the platform constraint screw systems show that the new metamorphic parallel mechanisms have four topologies by altering the limb phases, with mobility change among 1R2T (one rotation with two translations), 2R2T, and 3R2T, and mobility 6. Geometric conditions of the mechanism design are investigated with some special topologies illustrated considering the limb arrangement. Following this and the actuation scheme analysis, a unified Jacobian matrix is formed using screw theory to include the change between geometric constraints and actuation constraints in the topology reconfiguration. Various singular configurations are identified by analyzing screw dependency in the Jacobian matrix. The work in this paper provides a basis for singularity-free workspace analysis and optimal design of the class of metamorphic parallel mechanisms with parallel constraint screws, which shows simple geometric constraints with potential simple kinematics and dynamics properties.

  7. A Parallel Sweeping Preconditioner for Heterogeneous 3D Helmholtz Equations

    KAUST Repository

    Poulson, Jack

    2013-05-02

    A parallelization of a sweeping preconditioner for three-dimensional Helmholtz equations without large cavities is introduced and benchmarked for several challenging velocity models. The setup and application costs of the sequential preconditioner are shown to be O(γ^2 N^(4/3)) and O(γ N log N), where γ(ω) denotes the modestly frequency-dependent number of grid points per perfectly matched layer. Several computational and memory improvements are introduced relative to using black-box sparse-direct solvers for the auxiliary problems, and competitive runtimes and iteration counts are reported for high-frequency problems distributed over thousands of cores. Two open-source packages are released along with this paper: Parallel Sweeping Preconditioner (PSP) and the underlying distributed multifrontal solver, Clique. © 2013 Society for Industrial and Applied Mathematics.

  8. Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

    Energy Technology Data Exchange (ETDEWEB)

    Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

    1997-03-01

    Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.

  9. Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

    Science.gov (United States)

    Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

    1990-01-01

    Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
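
    The hypercube-specific machinery is beyond a short example, but the serial annealing kernel that both parallel algorithms build on looks roughly as follows; the comments mark where parallel moves and error control would enter. Function names and the toy placement problem are illustrative only.

```python
import math
import random

def anneal(cost, propose, state, t0=1.0, alpha=0.95, moves_per_temp=100, t_min=1e-3):
    """Plain simulated-annealing kernel. In the parallel placement algorithms above,
    processors would propose moves concurrently (e.g. on disjoint cell subsets) and
    periodically synchronize cell locations to control the error of interacting moves."""
    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_temp):
            cand = propose(current)
            delta = cost(cand) - current_cost
            if delta < 0 or random.random() < math.exp(-delta / t):
                current, current_cost = cand, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha                     # cooling schedule (adaptive in the paper)
    return best, best_cost

def swap_two(order):
    """Toy move: exchange two positions in a 1D placement."""
    i, j = random.randrange(len(order)), random.randrange(len(order))
    new = list(order)
    new[i], new[j] = new[j], new[i]
    return new

wirelength = lambda order: sum(abs(order[k] - order[k + 1]) for k in range(len(order) - 1))
start = list(range(10))
random.shuffle(start)
print(anneal(wirelength, swap_two, start)[1])
```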

  10. Parallel education: what is it?

    OpenAIRE

    Amos, Michelle Peta

    2017-01-01

    In the history of education it has long been discussed that single-sex and coeducation are the two models of education present in schools. With the introduction of parallel schools over the last 15 years, there has been very little research into this 'new model'. Many people do not understand what it means for a school to be parallel or they confuse a parallel model with co-education, due to the presence of both boys and girls within the one institution. Therefore, the main obj...

  11. Parallel electric fields in a simulation of magnetotail reconnection and plasmoid evolution

    International Nuclear Information System (INIS)

    Hesse, M.; Birn, J.

    1990-01-01

    Properties of the electric field component parallel to the magnetic field are investigated in a 3D MHD simulation of plasmoid formation and evolution in the magnetotail, in the presence of a net dawn-dusk magnetic field component. The spatial localization of E-parallel, the concept of a diffusion zone, and the role of E-parallel in accelerating electrons are discussed. A localization of the region of enhanced E-parallel in all space directions is found, with a strong concentration in the z direction. This region is identified as the diffusion zone, which plays a crucial role in reconnection theory through the local break-down of magnetic flux conservation. 12 refs

  12. PARALLEL IMPLEMENTATION OF MORPHOLOGICAL PROFILE BASED SPECTRAL-SPATIAL CLASSIFICATION SCHEME FOR HYPERSPECTRAL IMAGERY

    Directory of Open Access Journals (Sweden)

    B. Kumar

    2016-06-01

    Extended morphological profile (EMP) is a good technique for extracting spectral-spatial information from the images, but the large size of hyperspectral images is an important concern for creating EMPs. However, with the availability of modern multi-core processors and commodity parallel processing systems like graphics processing units (GPUs) at the desktop level, parallel computing provides a viable option to significantly accelerate execution of such computations. In this paper, parallel implementation of an EMP based spectral-spatial classification method for hyperspectral imagery is presented. The parallel implementation is done both on multi-core CPU and GPU. The impact of parallelization on speed up and classification accuracy is analyzed. For GPU, the implementation is done in compute unified device architecture (CUDA) C. The experiments are carried out on two well-known hyperspectral images. It is observed from the experimental results that the GPU implementation provides a speed up of about 7 times, while parallel implementation on multi-core CPU resulted in a speed up of about 3 times. It is also observed that parallel implementation has no adverse impact on the classification accuracy.
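
    The CUDA implementation is not shown in the record; the CPU sketch below builds a simplified morphological profile for each band with scipy.ndimage and distributes bands over processes, which is only a rough analogue of the EMP construction (structuring-element sizes are arbitrary).

```python
# Simplified morphological-profile construction, parallelized over bands.
# Not the authors' CUDA code; a CPU sketch with scipy.ndimage and multiprocessing.
import numpy as np
from multiprocessing import Pool
from scipy import ndimage

SIZES = (3, 5, 7)  # assumed structuring-element sizes

def morphological_profile(band):
    """Stack openings and closings of increasing size for one image band."""
    layers = [band]
    for s in SIZES:
        layers.append(ndimage.grey_opening(band, size=s))
        layers.append(ndimage.grey_closing(band, size=s))
    return np.stack(layers)

if __name__ == "__main__":
    bands = [np.random.rand(128, 128) for _ in range(4)]    # e.g. first PCA components
    with Pool(processes=4) as pool:                          # data-level parallelism
        emp = np.concatenate(pool.map(morphological_profile, bands))
    print(emp.shape)   # (4 * (1 + 2 * len(SIZES)), 128, 128)
```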

  13. Simulation of neutron transport equation using parallel Monte Carlo for deep penetration problems

    International Nuclear Information System (INIS)

    Bekar, K. K.; Tombakoglu, M.; Soekmen, C. N.

    2001-01-01

    The neutron transport equation is simulated using a parallel Monte Carlo method for a deep penetration neutron transport problem. The Monte Carlo simulation is parallelized by using three different techniques: direct parallelization, domain decomposition, and domain decomposition with load balancing, which are used with PVM (Parallel Virtual Machine) software on a LAN (Local Area Network). The results of the parallel simulation are given for various model problems. The performances of the parallelization techniques are compared with each other. Moreover, the effects of variance reduction techniques on parallelization are discussed.
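
    Of the three techniques named, direct parallelization of independent histories is the simplest to sketch; in the example below the transport physics is replaced by a crude slab-penetration placeholder, and only the mpi4py reduction pattern is meant to be representative.

```python
# Direct parallelization of independent Monte Carlo histories (simplest of the three
# schemes above); the "transport" is a crude placeholder, only the pattern matters.
from mpi4py import MPI
import math
import random

comm = MPI.COMM_WORLD
TOTAL_HISTORIES = 1_000_000
local_n = TOTAL_HISTORIES // comm.size

random.seed(1234 + comm.rank)            # naive independent streams per process
local_tally = 0.0
for _ in range(local_n):
    depth = 0.0                          # placeholder history in a 10-mean-free-path slab
    while depth < 10.0:
        depth += -math.log(1.0 - random.random())
        if random.random() < 0.3:        # assumed absorption probability per collision
            break
    else:
        local_tally += 1.0               # particle crossed the slab

total = comm.reduce(local_tally, op=MPI.SUM, root=0)
if comm.rank == 0:
    print("transmission estimate:", total / (local_n * comm.size))
```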

  14. PERFORMANCE ANALYSIS BETWEEN EXPLICIT SCHEDULING AND IMPLICIT SCHEDULING OF PARALLEL ARRAY-BASED DOMAIN DECOMPOSITION USING OPENMP

    Directory of Open Access Journals (Sweden)

    MOHAMMED FAIZ ABOALMAALY

    2014-10-01

    With the continuous revolution of multicore architecture, several parallel programming platforms have been introduced in order to pave the way for fast and efficient development of parallel algorithms. Broadly, parallel computing can be done in two forms: Data-Level Parallelism (DLP) or Task-Level Parallelism (TLP). The former can be done by the distribution of data among the available processing elements, while the latter is based on executing independent tasks concurrently. Most of the parallel programming platforms have built-in techniques to distribute the data among processors; these techniques are technically known as automatic distribution (scheduling). However, due to their wide range of purposes, variation of data types, amount of distributed data, possibility of extra computational overhead and other hardware-dependent factors, manual distribution could achieve better outcomes in terms of performance when compared to automatic distribution. In this paper, this assumption is investigated by conducting a comparison between automatic and our newly proposed manual distribution of data among threads in parallel. Empirical results of matrix addition and matrix multiplication show a considerable performance gain when manual distribution is applied against automatic distribution.
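
    The paper's comparison is done with OpenMP; a loose Python analogue is sketched below, where handing map many small blocks plays the role of automatic scheduling and programmer-chosen contiguous blocks play the role of manual distribution. Matrix sizes and chunking are arbitrary.

```python
# Automatic vs. manual data distribution for element-wise matrix addition (toy analogue
# of the OpenMP comparison above; sizes and chunk shapes are arbitrary choices).
import numpy as np
from multiprocessing import Pool

N, WORKERS = 2048, 4
A, B = np.ones((N, N)), np.ones((N, N))

def add_rows(rows):
    lo, hi = rows
    return A[lo:hi] + B[lo:hi]

def automatic(pool):
    # "Automatic": let map hand out many small row blocks.
    blocks = [(i, i + 64) for i in range(0, N, 64)]
    return np.vstack(pool.map(add_rows, blocks))

def manual(pool):
    # "Manual": one contiguous block per worker, chosen by the programmer.
    step = N // WORKERS
    blocks = [(w * step, N if w == WORKERS - 1 else (w + 1) * step) for w in range(WORKERS)]
    return np.vstack(pool.map(add_rows, blocks))

if __name__ == "__main__":
    with Pool(WORKERS) as pool:
        assert np.array_equal(automatic(pool), manual(pool))
```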

  15. Programming massively parallel processors a hands-on approach

    CERN Document Server

    Kirk, David B

    2010-01-01

    Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...

  16. Darboux transformation for two-level system

    Energy Technology Data Exchange (ETDEWEB)

    Bagrov, V.; Baldiotti, M.; Gitman, D.; Shamshutdinova, V. [Instituto de Física, Universidade de São Paulo, Caixa Postal 66318-CEP, 05315-970 São Paulo, S.P. (Brazil)]

    2005-06-01

    We develop the Darboux procedure for the case of the two-level system. In particular, it is demonstrated that one can construct the Darboux intertwining operator that does not violate the specific structure of the equations of the two-level system, transforming only one real potential into another real potential. We apply the obtained Darboux transformation to known exact solutions of the two-level system. Thus, we find three classes of new solutions for the two-level system and the corresponding new potentials that allow such solutions. (Abstract Copyright [2005], Wiley Periodicals, Inc.)
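
    For orientation only (this is the generic setting, not the specific potentials of the paper), the two-level problem and the Darboux intertwining idea can be written schematically as follows.

```latex
% Generic two-level Schroedinger problem and the intertwining relation behind a
% Darboux-type construction (schematic; the details differ in the paper).
\[
  i\hbar\,\partial_t \psi_1 = H_1(t)\,\psi_1 , \qquad
  H_1(t) = \begin{pmatrix} V_1(t) & \Delta \\ \Delta & -V_1(t) \end{pmatrix},
\]
\[
  L\,\bigl(i\hbar\,\partial_t - H_1\bigr) = \bigl(i\hbar\,\partial_t - H_2\bigr)\,L
  \;\;\Longrightarrow\;\;
  \psi_2 = L\,\psi_1 \ \text{solves}\ \ i\hbar\,\partial_t \psi_2 = H_2(t)\,\psi_2 .
\]
```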

  17. An application of analyzing the trajectories of two disorders: A parallel piecewise growth model of substance use and attention-deficit/hyperactivity disorder.

    Science.gov (United States)

    Mamey, Mary Rose; Barbosa-Leiker, Celestina; McPherson, Sterling; Burns, G Leonard; Parks, Craig; Roll, John

    2015-12-01

    Researchers often want to examine 2 comorbid conditions simultaneously. One strategy to do so is through the use of parallel latent growth curve modeling (LGCM). This statistical technique allows for the simultaneous evaluation of 2 disorders to determine the explanations and predictors of change over time. Additionally, a piecewise model can help identify whether there are more than 2 growth processes within each disorder (e.g., during a clinical trial). A parallel piecewise LGCM was applied to self-reported attention-deficit/hyperactivity disorder (ADHD) and self-reported substance use symptoms in 303 adolescents enrolled in cognitive-behavioral therapy treatment for a substance use disorder and receiving either oral-methylphenidate or placebo for ADHD across 16 weeks. Assessing these 2 disorders concurrently allowed us to determine whether elevated levels of 1 disorder predicted elevated levels or increased risk of the other disorder. First, a piecewise growth model measured ADHD and substance use separately. Next, a parallel piecewise LGCM was used to estimate the regressions across disorders to determine whether higher scores at baseline of the disorders (i.e., ADHD or substance use disorder) predicted rates of change in the related disorder. Finally, treatment was added to the model to predict change. While the analyses revealed no significant relationships across disorders, this study explains and applies a parallel piecewise growth model to examine the developmental processes of comorbid conditions over the course of a clinical trial. Strengths of piecewise and parallel LGCMs for other addictions researchers interested in examining dual processes over time are discussed. (PsycINFO Database Record (c) 2015 APA, all rights reserved).

  18. Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

    Science.gov (United States)

    Quealy, Angela; Cole, Gary L.; Blech, Richard A.

    1993-01-01

    The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.

  19. Water-based squeezing flow in the presence of carbon nanotubes between two parallel disks

    Directory of Open Access Journals (Sweden)

    Haq Rizwan Ul

    2016-01-01

    The present study investigates the squeezing flow of water-functionalized carbon nanotubes between two parallel disks. Moreover, we have considered magnetohydrodynamic effects normal to the disks. In addition, we have considered two kinds of carbon nanotubes, single-wall carbon nanotubes (SWCNT) and multi-wall carbon nanotubes (MWCNT), within the base fluid. For this squeezing flow mechanism, a model has been constructed in the form of partial differential equations. The transformed ordinary differential equations are solved numerically with the help of the Runge-Kutta-Fehlberg method. Results for velocity and temperature are constructed against all the emerging parameters. Comparisons between SWCNT and MWCNT are drawn for the skin friction coefficient and the local Nusselt number. Concluding remarks are drawn from the observations of the whole analysis.

  20. Parallel Algorithms for the Exascale Era

    Energy Technology Data Exchange (ETDEWEB)

    Robey, Robert W. [Los Alamos National Laboratory]

    2016-10-19

    New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this work has been done by undergraduates and published in leading scientific journals.
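
    One of the themes above, reproducibility of global sums, can be illustrated with compensated (Kahan) summation, which greatly reduces the order-dependence of a floating-point reduction; this is a generic sketch, not the students' algorithms.

```python
import random

def kahan_sum(values):
    """Compensated (Kahan) summation: carries the rounding error along so the result
    is far less sensitive to the order in which partial sums arrive from parallel ranks."""
    total, comp = 0.0, 0.0
    for v in values:
        y = v - comp
        t = total + y
        comp = (t - total) - y
        total = t
    return total

# Reordering usually changes the naive sum noticeably more than the compensated one.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8) for _ in range(10_000)]
ys = list(reversed(xs))
print("naive diff:", abs(sum(xs) - sum(ys)))
print("kahan diff:", abs(kahan_sum(xs) - kahan_sum(ys)))
```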

  1. Effects of parallel electron dynamics on plasma blob transport

    Energy Technology Data Exchange (ETDEWEB)

    Angus, Justin R.; Krasheninnikov, Sergei I. [University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093 (United States)]; Umansky, Maxim V. [Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, California 94550 (United States)]

    2012-08-15

    The 3D effects on sheath connected plasma blobs that result from parallel electron dynamics are studied by allowing for the variation of blob density and potential along the magnetic field line and using collisional Ohm's law to model the parallel current density. The parallel current density from linear sheath theory, typically used in the 2D model, is implemented as parallel boundary conditions. This model includes electrostatic 3D effects, such as resistive drift waves and blob spinning, while retaining all of the fundamental 2D physics of sheath connected plasma blobs. If the growth time of unstable drift waves is comparable to the 2D advection time scale of the blob, then the blob's density gradient will be depleted, resulting in a much more diffusive blob with little radial motion. Furthermore, blob profiles that are initially varying along the field line drive the potential to a Boltzmann relation that spins the blob and thereby acts as an additional sink of the 2D potential. Basic dimensionless parameters are presented to estimate the relative importance of these two 3D effects. The deviation of blob dynamics from that predicted by 2D theory in the appropriate limits of these parameters is demonstrated by a direct comparison of 2D and 3D seeded blob simulations.

  2. Excitation transfer in two two-level systems coupled to an oscillator

    International Nuclear Information System (INIS)

    Hagelstein, P L; Chaudhary, I U

    2008-01-01

    We consider a generalization of the spin-boson model in which two different two-level systems are coupled to an oscillator, under conditions where the oscillator energy is much less than the two-level system energies, and where the oscillator is highly excited. We find that the two-level system transition energy is shifted, producing a Bloch-Siegert shift in each two-level system similar to what would be obtained if the other were absent. At resonances associated with energy exchange between a two-level system and the oscillator, the level splitting is about the same as would be obtained in the spin-boson model at a Bloch-Siegert resonance. However, there occur resonances associated with the transfer of excitation between one two-level system and the other, an effect not present in the spin-boson model. We use a unitary transformation leading to a rotated system in which terms responsible for the shift and splittings can be identified. The level splittings at the anticrossings associated with both energy exchange and excitation transfer resonances are accounted for with simple two-state models and degenerate perturbation theory using operators that appear in the rotated Hamiltonian

  3. Instantaneous Kinematics Analysis via Screw-Theory of a Novel 3-CRC Parallel Mechanism

    Directory of Open Access Journals (Sweden)

    Hussein de la Torre

    2016-06-01

    This paper presents the mobility and kinematics analysis of a novel parallel mechanism that is composed of one base, one platform and three identical limbs with CRC joints. The paper obtains closed-form solutions to the direct and inverse kinematics problems, and determines the mobility of the mechanism and instantaneous kinematics by applying screw theory. The obtained results show that this parallel robot is part of the family 2R1T, since the platform shows 3 DOF, i.e., one translation perpendicular to the base and two rotations about skew axes. In order to calculate the direct instantaneous kinematics, this paper introduces the vector mh, which is part of the joint velocity vector that multiplies the overall inverse Jacobian matrix. This paper compares the results between simulations and numerical examples using Mathematica and SolidWorks in order to prove the accuracy of the analytical results.

  4. Multi-area market clearing in wind-integrated interconnected power systems: A fast parallel decentralized method

    International Nuclear Information System (INIS)

    Doostizadeh, Meysam; Aminifar, Farrokh; Lesani, Hamid; Ghasemi, Hassan

    2016-01-01

    Highlights: • A parallel-decentralized multi-area energy and reserve clearance model is proposed. • A fictitious area and joint variables coordinate and parallelize area market models. • Adjustable intervals of random variables compromise optimality and robustness. • The stochastic nature of the problem is tackled in an efficient deterministic manner. • The model is compact and applicable in multi-area real-scale systems. Abstract: The growing evolution of regional electricity markets and proliferation of wind power penetration underline the prominence of coordinated operation of interconnected regional power systems. This paper develops a parallel decentralized methodology for multi-area energy and reserve clearance under wind power uncertainty. Preserving the independence of regional markets while fully taking the advantages of interconnection is a salient feature of the new model. Additionally, the parallel procedure simultaneously clears regional markets for the sake of acceleration, particularly in large-scale systems. In order to achieve the optimal solution in a distributed fashion, the augmented Lagrangian relaxation along with the alternating direction method of multipliers is applied. The wind power intermittency and uncertainty are tackled through the interval optimization approach. In contrast to conventional wisdom, adjustable intervals, as subsets of conventional predefined intervals, are introduced here to compromise the cost and conservatism of the solution. The confidence level approach is employed to accommodate the stochastic nature of wind power in a computationally efficient deterministic manner. The effectiveness and robustness of the proposed method are evaluated through several case studies on a two-area 6-bus and the modified three-area IEEE 118-bus test systems.

  5. A Green's function method for two-dimensional reactive solute transport in a parallel fracture-matrix system

    Science.gov (United States)

    Chen, Kewei; Zhan, Hongbin

    2018-06-01

    The reactive solute transport in a single fracture bounded by upper and lower matrices is a classical problem that captures the dominant factors affecting transport behavior beyond pore scale. A parallel fracture-matrix system which considers the interaction among multiple parallel fractures is an extension to a single fracture-matrix system. The existing analytical or semi-analytical solutions for solute transport in a parallel fracture-matrix system simplify the problem to various degrees, such as neglecting the transverse dispersion in the fracture and/or the longitudinal diffusion in the matrix. The difficulty of solving the full two-dimensional (2-D) problem lies in the calculation of the mass exchange between the fracture and matrix. In this study, we propose an innovative Green's function approach to address the 2-D reactive solute transport in a parallel fracture-matrix system. The flux at the interface is calculated numerically. It is found that the transverse dispersion in the fracture can be safely neglected due to the small scale of the fracture aperture. However, neglecting the longitudinal matrix diffusion would overestimate the concentration profile near the solute entrance face and underestimate the concentration profile at the far side. The error caused by neglecting the longitudinal matrix diffusion decreases with increasing Peclet number. The longitudinal matrix diffusion does not have an obvious influence on the concentration profile in the long term. The developed model is applied to a dense non-aqueous-phase-liquid (DNAPL) contamination field case in the New Haven Arkose of Connecticut, USA, to estimate the trichloroethylene (TCE) behavior over 40 years. The ratio of the TCE mass stored in the matrix to the injected TCE mass increases above 90% in less than 10 years.

  6. Experiences in the parallelization of the discrete ordinates method using OpenMP and MPI

    Energy Technology Data Exchange (ETDEWEB)

    Pautz, A. [TÜV Hannover/Sachsen-Anhalt e.V. (Germany)]; Langenbuch, S. [Gesellschaft für Anlagen- und Reaktorsicherheit (GRS) mbH (Germany)]

    2003-07-01

    The method of Discrete Ordinates is in principle parallelizable to a high degree, since the transport 'mesh sweeps' are mutually independent for all angular directions. However, in the well-known production code Dort such a type of angular domain decomposition has to be done on a spatial line-by-line basis, causing the parallelism in the code to be very fine-grained. The construction of scalar fluxes and moments requires a large effort for inter-thread or inter-process communication. We have implemented two different parallelization approaches in Dort: firstly, we have used a shared-memory model suitable for SMP (Symmetric Multiprocessor) machines based on the standard OpenMP. The second approach uses the well-known Message Passing Interface (MPI) to establish communication between parallel processes running in a distributed-memory environment. We investigate the benefits and drawbacks of both models and show first results on performance and scaling behaviour of the parallel Dort code. (authors)
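
    The Dort internals are not available here; the sketch below only shows the coarse idea of angular domain decomposition for a one-group, purely absorbing 1D toy problem with mpi4py: each rank sweeps its share of discrete directions and the scalar flux is assembled with a reduction.

```python
# Toy angular decomposition for a 1D, one-group, source-driven absorber (not Dort):
# each rank sweeps its own subset of discrete directions, then the scalar flux is
# assembled with a reduction over ranks.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
NX, DX, SIGMA_T, SOURCE = 100, 0.1, 1.0, 1.0
mu = np.array([-0.8611363, -0.3399810, 0.3399810, 0.8611363])   # S4 cosines
w = np.array([0.3478548, 0.6521452, 0.6521452, 0.3478548])      # S4 weights

phi_local = np.zeros(NX)
for d in range(comm.rank, len(mu), comm.size):       # this rank's directions
    psi = 0.0                                        # vacuum boundary
    cells = range(NX) if mu[d] > 0 else range(NX - 1, -1, -1)
    for i in cells:                                  # simple step-difference sweep
        psi = (SOURCE * DX + abs(mu[d]) * psi) / (abs(mu[d]) + SIGMA_T * DX)
        phi_local[i] += w[d] * psi

phi = np.zeros(NX)
comm.Allreduce(phi_local, phi, op=MPI.SUM)           # scalar flux needs all directions
if comm.rank == 0:
    print("midplane scalar flux:", phi[NX // 2])
```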

  7. High performance parallel computers for science: New developments at the Fermilab advanced computer program

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.

    1988-08-01

    Fermilab's Advanced Computer Program (ACP) has been developing highly cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 MFlops (peak), 10 MByte single board computer. These are plugged into a 16-port crossbar switch crate which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site-oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction. 10 refs., 7 figs

  8. Experiences in the parallelization of the discrete ordinates method using OpenMP and MPI

    International Nuclear Information System (INIS)

    Pautz, A.; Langenbuch, S.

    2003-01-01

    The method of Discrete Ordinates is in principle parallelizable to a high degree, since the transport 'mesh sweeps' are mutually independent for all angular directions. However, in the well-known production code Dort such a type of angular domain decomposition has to be done on a spatial line-by-line basis, causing the parallelism in the code to be very fine-grained. The construction of scalar fluxes and moments requires a large effort for inter-thread or inter-process communication. We have implemented two different parallelization approaches in Dort: firstly, we have used a shared-memory model suitable for SMP (Symmetric Multiprocessor) machines based on the standard OpenMP. The second approach uses the well-known Message Passing Interface (MPI) to establish communication between parallel processes running in a distributed-memory environment. We investigate the benefits and drawbacks of both models and show first results on performance and scaling behaviour of the parallel Dort code. (authors)

  9. Expressing Parallelism with ROOT

    Energy Technology Data Exchange (ETDEWEB)

    Piparo, D. [CERN]; Tejedor, E. [CERN]; Guiraud, E. [CERN]; Ganis, G. [CERN]; Mato, P. [CERN]; Moneta, L. [CERN]; Valls Pla, X. [CERN]; Canal, P. [Fermilab]

    2017-11-22

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
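
    For readers coming from Python, the implicit parallelism discussed above is reachable through PyROOT; the minimal sketch below assumes a recent ROOT build with multi-threading support, and the file, tree, and branch names are placeholders.

```python
# Minimal PyROOT sketch of ROOT's implicit multi-threading (requires a ROOT build
# with MT support); "data.root", "Events" and the branch names are placeholders.
import ROOT

ROOT.EnableImplicitMT(4)                      # let ROOT parallelize internally

df = ROOT.RDataFrame("Events", "data.root")   # hypothetical tree and file
h = (df.Filter("nMuon >= 2")                  # JIT-compiled selection
       .Define("pt_sum", "Sum(Muon_pt)")      # per-event derived quantity
       .Histo1D("pt_sum"))                    # booked histogram
print("entries:", h.GetEntries())             # triggers the (multi-threaded) event loop
```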

  10. Expressing Parallelism with ROOT

    Science.gov (United States)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  11. Parallelizing AT with MatlabMPI

    International Nuclear Information System (INIS)

    2011-01-01

    The Accelerator Toolbox (AT) is a high-level collection of tools and scripts specifically oriented toward solving problems dealing with computational accelerator physics. It is integrated into the MATLAB environment, which provides an accessible, intuitive interface for accelerator physicists, allowing researchers to focus the majority of their efforts on simulations and calculations, rather than programming and debugging difficulties. Efforts toward parallelization of AT have been put in place to upgrade its performance to modern standards of computing. We utilized the packages MatlabMPI and pMatlab, which were developed by MIT Lincoln Laboratory, to set up a message-passing environment that could be called within MATLAB, which set up the necessary prerequisites for multithread processing capabilities. On local quad-core CPUs, we were able to demonstrate processor efficiencies of roughly 95% and speed increases of nearly 380%. By exploiting the efficacy of modern-day parallel computing, we were able to demonstrate incredibly efficient speed increments per processor in AT's beam-tracking functions. Extrapolating from prediction, we can expect to reduce week-long computation runtimes to less than 15 minutes. This is a huge performance improvement and has enormous implications for the future computing power of the accelerator physics group at SSRL. However, one of the downfalls of parringpass is its current lack of transparency; the pMatlab and MatlabMPI packages must first be well-understood by the user before the system can be configured to run the scripts. In addition, the instantiation of argument parameters requires internal modification of the source code. Thus, parringpass cannot be directly run from the MATLAB command line, which detracts from its flexibility and user-friendliness. Future work in AT's parallelization will focus on development of external functions and scripts that can be called from within MATLAB and configured on multiple nodes, while

  12. Semi-coarsening multigrid methods for parallel computing

    Energy Technology Data Exchange (ETDEWEB)

    Jones, J.E.

    1996-12-31

    Standard multigrid methods are not well suited for problems with anisotropic coefficients which can occur, for example, on grids that are stretched to resolve a boundary layer. There are several different modifications of the standard multigrid algorithm that yield efficient methods for anisotropic problems. In the paper, we investigate the parallel performance of these multigrid algorithms. Multigrid algorithms which work well for anisotropic problems are based on line relaxation and/or semi-coarsening. In semi-coarsening multigrid algorithms a grid is coarsened in only one of the coordinate directions unlike standard or full-coarsening multigrid algorithms where a grid is coarsened in each of the coordinate directions. When both semi-coarsening and line relaxation are used, the resulting multigrid algorithm is robust and automatic in that it requires no knowledge of the nature of the anisotropy. This is the basic multigrid algorithm whose parallel performance we investigate in the paper. The algorithm is currently being implemented on an IBM SP2 and its performance is being analyzed. In addition to looking at the parallel performance of the basic semi-coarsening algorithm, we present algorithmic modifications with potentially better parallel efficiency. One modification reduces the amount of computational work done in relaxation at the expense of using multiple coarse grids. This modification is also being implemented with the aim of comparing its performance to that of the basic semi-coarsening algorithm.
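
    The distinguishing operation, coarsening in only one coordinate direction, is easy to show in isolation; the sketch below applies full-weighting in x only (the choice of direction here is arbitrary, whereas the algorithm described above handles the anisotropy automatically).

```python
import numpy as np

def semi_coarsen_x(u):
    """Coarsen a 2D grid function in the x-direction only (keep every second column,
    full-weighting with its neighbours); the y-resolution is left untouched, which is
    what lets semi-coarsening cope with anisotropy aligned with y."""
    return 0.25 * u[:, 0:-2:2] + 0.5 * u[:, 1:-1:2] + 0.25 * u[:, 2::2]

u = np.random.rand(17, 17)           # fine grid (vertex-centred, 2^k + 1 points per side)
uc = semi_coarsen_x(u)
print(u.shape, "->", uc.shape)        # (17, 17) -> (17, 8)
```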

  13. Parallel vs. Convergent Evolution in Domestication and Diversification of Crops in the Americas

    Directory of Open Access Journals (Sweden)

    Barbara Pickersgill

    2018-05-01

    Full Text Available Domestication involves changes in various traits of the phenotype in response to human selection. Diversification may accompany or follow domestication, and results in variants within the crop adapted to different uses by humans or different agronomic conditions. Similar domestication and diversification traits may be shared by closely related species (parallel evolution or by distantly related species (convergent evolution. Many of these traits are produced by complex genetic networks or long biosynthetic pathways that are extensively conserved even in distantly related species. Similar phenotypic changes in different species may be controlled by homologous genes (parallel evolution at the genetic level or non-homologous genes (convergent evolution at the genetic level. It has been suggested that parallel evolution may be more frequent among closely related species, or among diversification rather than domestication traits, or among traits produced by simple metabolic pathways. Crops domesticated in the Americas span a spectrum of genetic relatedness, have been domesticated for diverse purposes, and have responded to human selection by changes in many different traits, so provide examples of both parallel and convergent evolution at various levels. However, despite the current explosion in relevant information, data are still insufficient to provide quantitative or conclusive assessments of the relative roles of these two processes in domestication and diversification

  14. Casimir effect of two conducting parallel plates in a general weak gravitational field

    Energy Technology Data Exchange (ETDEWEB)

    Nazari, Borzoo [University of Tehran, Faculty of Engineering Science, College of Engineering, P.O. Box 11155-4563, Tehran (Iran, Islamic Republic of)

    2015-10-15

    We calculate the finite vacuum energy density of the scalar and electromagnetic fields inside a Casimir apparatus made up of two conducting parallel plates in a general weak gravitational field. The metric of the weak gravitational field has a small deviation from flat spacetime inside the apparatus, and we find it by expanding the metric in terms of small parameters of the weak background. We show that the metric found can be transformed via a gauge transformation to the Fermi metric. We solve the Klein-Gordon equation exactly and find mode frequencies in Fermi spacetime. Using the fact that the electromagnetic field can be represented by two scalar fields in the Fermi spacetime, we find general formulas for the energy density and mode frequencies of the electromagnetic field. Some well-known weak backgrounds are examined and consistency of the results with the literature is shown. (orig.)
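
    For orientation, the flat-spacetime parallel-plate Casimir result that the weak-field calculation perturbs (standard textbook expressions; the gravitational corrections derived in the paper are not reproduced here):

```latex
% Standard flat-spacetime Casimir energy per unit area between two ideal
% conducting plates separated by a distance a, and the resulting pressure.
\[
  \frac{E_{\mathrm{Cas}}}{A} \;=\; -\,\frac{\pi^{2}\hbar c}{720\,a^{3}},
  \qquad
  \frac{F}{A} \;=\; -\,\frac{\partial}{\partial a}\!\left(\frac{E_{\mathrm{Cas}}}{A}\right)
  \;=\; -\,\frac{\pi^{2}\hbar c}{240\,a^{4}} .
\]
```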

  15. Parallel implementation of geometrical shock dynamics for two dimensional converging shock waves

    Science.gov (United States)

    Qiu, Shi; Liu, Kuang; Eliasson, Veronica

    2016-10-01

    Geometrical shock dynamics (GSD) theory is an appealing method for predicting shock motion in the sense that it is more computationally efficient than solving the traditional Euler equations, especially for converging shock waves. However, for solving and optimizing large-scale configurations, the main bottleneck is the computational cost. Among the existing numerical GSD schemes, only one has been implemented on parallel computers, with the purpose of analyzing detonation waves. To extend the computational advantage of GSD theory to more general applications such as converging shock waves, a numerical implementation using a spatial decomposition method has been coupled with a front-tracking approach on parallel computers. In addition, an efficient tridiagonal system solver for massively parallel computers has been applied to the most expensive function in this implementation, resulting in an efficiency of 0.93 while using 32 HPCC cores. Moreover, symmetric boundary conditions have been developed to further reduce the computational cost, achieving a speedup of 19.26 for a 12-sided polygonal converging shock.
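
    A serial sketch of the tridiagonal kernel in question, assuming the classic Thomas algorithm; the massively parallel solver actually used in the paper is not reproduced here.

```python
# Serial Thomas algorithm for a tridiagonal system A x = d, with diagonals
# given as arrays a (sub), b (main), c (super). It is O(n) but inherently
# sequential, which is why specialised parallel tridiagonal solvers exist.
import numpy as np

def thomas(a, b, c, d):
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                  # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):         # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# quick check against a dense solve
n = 6
a = np.r_[0.0, -np.ones(n - 1)]            # a[0] unused
b = 2.0 * np.ones(n)
c = np.r_[-np.ones(n - 1), 0.0]            # c[-1] unused
d = np.arange(1.0, n + 1)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
assert np.allclose(thomas(a, b, c, d), np.linalg.solve(A, d))
```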

  16. Flexure mechanism-based parallelism measurements for chip-on-glass bonding

    International Nuclear Information System (INIS)

    Jung, Seung Won; Yun, Won Soo; Jin, Songwan; Jeong, Young Hun; Kim, Bo Sun

    2011-01-01

    Recently, liquid crystal displays (LCDs) have played vital roles in a variety of electronic devices such as televisions, cellular phones, and desktop/laptop monitors because of their enhanced volume, performance, and functionality. However, there is still a need for thinner LCD panels due to the trend of miniaturization in electronic applications. Thus, chip-on-glass (COG) bonding has become one of the most important aspects in the LCD panel manufacturing process. In this study, a novel sensor was developed to measure the parallelism between the tooltip planes of the bonding head and the backup of the COG main bonder, which has previously been estimated by prescale pressure films in industry. The sensor developed in this study is based on a flexure mechanism, and it can measure the total pressing force and the inclination angles in two directions that satisfy the quantitative definition of parallelism. To improve the measurement accuracy, the sensor was calibrated based on the estimation of the total pressing force and the inclination angles using the least-squares method. To verify the accuracy of the sensor, the estimation results for parallelism were compared with those from prescale pressure film measurements. In addition, the influence of parallelism on the bonding quality was experimentally demonstrated. The sensor was successfully applied to the measurement of parallelism in the COG-bonding process with an accuracy of more than three times that of the conventional method using prescale pressure films

  17. Parallel manipulators with two end-effectors : Getting a grip on Jacobian-based stiffness analysis

    NARCIS (Netherlands)

    Hoevenaars, A.G.L.

    2016-01-01

    Robots that are developed for applications which require a high stiffness-over-inertia ratio, such as pick-and-place robots, machining robots, or haptic devices, are often based on parallel manipulators. Parallel manipulators connect an end-effector to an inertial base using multiple serial

  18. Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

    KAUST Repository

    Quintin, Jean-Noel

    2013-10-01

    Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.

  19. Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

    KAUST Repository

    Quintin, Jean-Noel; Hasanov, Khalid; Lastovetsky, Alexey

    2013-01-01

    Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.
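
    A single-process numpy illustration of the panel-update loop at the heart of SUMMA, using an invented helper name summa_like; in the parallel algorithm each step broadcasts the panels over the processor grid, and HSUMMA adds a second, hierarchical level to those broadcasts.

```python
# Single-process illustration of the SUMMA update: C is accumulated as a sum
# of products of column panels of A with row panels of B. In the parallel
# algorithm each step broadcasts the current panels along processor rows and
# columns; HSUMMA organises those broadcasts into a two-level hierarchy.
import numpy as np

def summa_like(A, B, panel=64):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for s in range(0, k, panel):
        e = min(s + panel, k)
        # in SUMMA: broadcast A[:, s:e] along processor rows and
        # B[s:e, :] along processor columns, then update the local block
        C += A[:, s:e] @ B[s:e, :]
    return C

A = np.random.rand(200, 300)
B = np.random.rand(300, 150)
assert np.allclose(summa_like(A, B), A @ B)
```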

  20. 6th International Parallel Tools Workshop

    CERN Document Server

    Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang

    2013-01-01

    The latest advances in High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have increased the complexity of parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and tuning work required. This process is supported by special software tools that facilitate debugging, performance analysis, and optimization and thus make a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.

  1. OPTIMIZATION OF AGGREGATION AND SEQUENTIAL-PARALLEL EXECUTION MODES OF INTERSECTING OPERATION SETS

    Directory of Open Access Journals (Sweden)

    G. М. Levin

    2016-01-01

    Full Text Available A mathematical model and a method for the problem of optimization of aggregation and of sequential-parallel execution modes of intersecting operation sets are proposed. The proposed method is based on the two-level decomposition scheme. At the top level the variant of aggregation for groups of operations is selected, and at the lower level the execution modes of operations are optimized for a fixed version of aggregation.

  2. Repeated sprint ability in young basketball players: one vs. two changes of direction (Part 2).

    Science.gov (United States)

    Attene, Giuseppe; Laffaye, Guillaume; Chaouachi, Anis; Pizzolato, Fabio; Migliaccio, Gian Mario; Padulo, Johnny

    2015-01-01

    The aim of this study was to compare the effects of training based on repeated sprint ability (RSA, with one change of direction) with those of intensive repeated sprint ability training (IRSA, with two changes of direction) on jump performance and aerobic fitness. Eighteen male basketball players were assigned to RSA and IRSA training groups (RSAG and IRSAG). RSA, IRSA, squat jump (SJ), countermovement jump (CMJ) and the Yo-Yo intermittent recovery level 1 test were assessed before and after four training weeks. The RSA and IRSA training consisted of three sets of six sprints (first two weeks) and eight sprints (second two weeks), with 4-min recovery between sets and 20-s recovery between sprints. Four weeks of training led to an overall improvement in most of the RSA measures, with little evidence of any difference between the two training modes. Jump performance was enhanced, with CMJ improving by 7.5%. In conclusion, training with one or two changes of direction promotes improvements in RSA and IRSA, respectively, with the larger gains seen in jump performance and only small changes in sprint and endurance performance.

  3. Analysis of parallel computing performance of the code MCNP

    International Nuclear Information System (INIS)

    Wang Lei; Wang Kan; Yu Ganglin

    2006-01-01

    Parallel computing can reduce the running time of the code MCNP effectively. With the MPI message-passing software, MCNP5 can perform parallel computing on a PC cluster with the Windows operating system. The parallel computing performance of MCNP is influenced by factors such as the type, the complexity level and the parameter configuration of the computing problem. This paper analyzes the parallel computing performance of MCNP with respect to these factors and gives measures to improve MCNP parallel computing performance. (authors)

  4. Parallel transmission techniques in magnetic resonance imaging: experimental realization, applications and perspectives; Parallele Sendetechniken in der Magnetresonanztomographie: experimentelle Realisierung, Anwendungen und Perspektiven

    Energy Technology Data Exchange (ETDEWEB)

    Ullmann, P.

    2007-06-15

    The primary objective of this work was the first experimental realization of parallel RF transmission for accelerating spatially selective excitation in magnetic resonance imaging. Furthermore, basic aspects regarding the performance of this technique were investigated, potential risks regarding the specific absorption rate (SAR) were considered and feasibility studies under application-oriented conditions as first steps towards a practical utilisation of this technique were undertaken. At first, based on the RF electronics platform of the Bruker Avance MRI systems, the technical foundations were laid to perform simultaneous transmission of individual RF waveforms on different RF channels. Another essential requirement for the realization of Parallel Excitation (PEX) was the design and construction of suitable RF transmit arrays with elements driven by separate transmit channels. In order to image the PEX results two imaging methods were implemented based on a spin-echo and a gradient-echo sequence, in which a parallel spatially selective pulse was included as an excitation pulse. In the course of this work PEX experiments were successfully performed on three different MRI systems, a 4.7 T and a 9.4 T animal system and a 3 T human scanner, using 5 different RF coil setups in total. In the last part of this work investigations regarding possible applications of Parallel Excitation were performed. A first study comprised experiments of slice-selective B1 inhomogeneity correction by using 3D-selective Parallel Excitation. The investigations were performed in a phantom as well as in a rat fixed in paraformaldehyde solution. In conjunction with these experiments a novel method of calculating RF pulses for spatially selective excitation based on a so-called Direct Calibration approach was developed, which is particularly suitable for this type of experiments. In the context of these experiments it was demonstrated how to combine the advantages of parallel transmission

  5. Analytical model for vibration prediction of two parallel tunnels in a full-space

    Science.gov (United States)

    He, Chao; Zhou, Shunhua; Guo, Peijun; Di, Honggui; Zhang, Xiaohui

    2018-06-01

    This paper presents a three-dimensional analytical model for the prediction of ground vibrations from two parallel tunnels embedded in a full-space. The two tunnels are modelled as cylindrical shells of infinite length, and the surrounding soil is modelled as a full-space with two cylindrical cavities. A virtual interface is introduced to divide the soil into the right layer and the left layer. By transforming the cylindrical waves into the plane waves, the solution of wave propagation in the full-space with two cylindrical cavities is obtained. The transformations from the plane waves to cylindrical waves are then used to satisfy the boundary conditions on the tunnel-soil interfaces. The proposed model provides a highly efficient tool to predict the ground vibration induced by the underground railway, which accounts for the dynamic interaction between neighbouring tunnels. Analysis of the vibration fields produced over a range of frequencies and soil properties is conducted. When the distance between the two tunnels is smaller than three times the tunnel diameter, the interaction between neighbouring tunnels is highly significant, at times in the order of 20 dB. It is necessary to consider the interaction between neighbouring tunnels for the prediction of ground vibrations induced by underground railways.

  6. Structured Parallel Programming Patterns for Efficient Computation

    CERN Document Server

    McCool, Michael; Robison, Arch

    2012-01-01

    Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th

  7. Metabolic profiling based on two-dimensional J-resolved 1H NMR data and parallel factor analysis

    DEFF Research Database (Denmark)

    Yilmaz, Ali; Nyberg, Nils T; Jaroszewski, Jerzy W.

    2011-01-01

    the intensity variances along the chemical shift axis are taken into account. Here, we describe the use of parallel factor analysis (PARAFAC) as a tool to preprocess a set of two-dimensional J-resolved spectra with the aim of keeping the J-coupling information intact. PARAFAC is a mathematical decomposition......-model was done automatically by evaluating amount of explained variance and core consistency values. Score plots showing the distribution of objects in relation to each other, and loading plots in the form of two-dimensional pseudo-spectra with the same appearance as the original J-resolved spectra...

  8. 3D printed soft parallel actuator

    Science.gov (United States)

    Zolfagharian, Ali; Kouzani, Abbas Z.; Khoo, Sui Yang; Noshadi, Amin; Kaynak, Akif

    2018-04-01

    This paper presents a 3-dimensional (3D) printed soft parallel contactless actuator for the first time. The actuator involves an electro-responsive parallel mechanism made of two segments, namely an active chain and a passive chain, both 3D printed. The active chain is attached to the ground from one end and constitutes two actuator links made of responsive hydrogel. The passive chain, on the other hand, is attached to the active chain from one end and consists of two rigid links made of polymer. The actuator links are printed using an extrusion-based 3D-Bioplotter with polyelectrolyte hydrogel as printer ink. The rigid links are also printed by a 3D fused deposition modelling (FDM) printer with acrylonitrile butadiene styrene (ABS) as print material. The kinematics model of the soft parallel actuator is derived via transformation matrix notation to simulate and determine the workspace of the actuator. The printed soft parallel actuator is then immersed into NaOH solution with a specific voltage applied to it via two contactless electrodes. The experimental data are then collected and used to develop a parametric model to estimate the end-effector position and regulate the kinematics model in response to a specific input voltage over time. It is observed that the electroactive actuator demonstrates expected behaviour according to the simulation of its kinematics model. The use of 3D printing for the fabrication of parallel soft actuators opens a new chapter in manufacturing sophisticated soft actuators with high dexterity and mechanical robustness for biomedical applications such as cell manipulation and drug release.
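
    A generic sketch of forward kinematics via homogeneous transformation matrices for a planar two-link chain, with made-up link lengths and joint angles; it only illustrates the notation, not the printed actuator's actual kinematics model.

```python
# Generic planar forward kinematics with homogeneous transformation matrices,
# the kind of notation used to derive a kinematics model of a two-link chain.
# Link lengths and joint angles below are arbitrary placeholders.
import numpy as np

def transform(theta, length):
    """Rotation by theta followed by a translation of `length` along the link."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, length * c],
                     [s,  c, length * s],
                     [0,  0, 1.0]])

def end_effector(thetas, lengths):
    T = np.eye(3)
    for th, L in zip(thetas, lengths):
        T = T @ transform(th, L)       # chain the link transformations
    return T[:2, 2]                    # (x, y) of the chain tip

print(end_effector([np.deg2rad(30), np.deg2rad(-15)], [0.05, 0.04]))
```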

  9. Studies on diversion cross-flow between two parallel channels communicating by a lateral slot. II

    International Nuclear Information System (INIS)

    Tapucu, A.; Merilo, M.

    1977-01-01

    The axial pressure variations of two parallel channels with single phase flows communicating by a long lateral slot have been studied experimentally. Using mass and momentum conservation principles, the axial pressure variations have been derived in terms of two parameters ksub(d) and ksub(r), for donor and recipient channels, respectively. These parameters include the combined effect of fluid transferred from donor to recipient channel, and drag force brought on by the connection gap, and are functions of the velocities and slot geometry parameters. A pressure difference oscillation between channels along the slot has been detected which is sinusoidal with wave lengths which seem to be a function of the gap clearance. (Auth.)

  10. Parallelization of a numerical simulation code for isotropic turbulence

    International Nuclear Information System (INIS)

    Sato, Shigeru; Yokokawa, Mitsuo; Watanabe, Tadashi; Kaburaki, Hideo.

    1996-03-01

    A parallel pseudospectral code which solves the three-dimensional Navier-Stokes equation by direct numerical simulation is developed, and its execution time, parallelization efficiency, load balance and scalability are evaluated. A vector parallel supercomputer, the Fujitsu VPP500 with up to 16 processors, is used for this calculation, for Fourier modes up to 256x256x256 using 16 processors. Good scalability with the number of processors is achieved when the number of Fourier modes is fixed. For small numbers of Fourier modes, the calculation time of the program is proportional to N log N, which is the ideal complexity for 3D-FFT calculations on vector parallel processors. It is found that the calculation performance decreases as the number of Fourier modes increases. (author)

  11. Parallel execution of chemical software on EGEE Grid

    CERN Document Server

    Sterzel, Mariusz

    2008-01-01

    Constant interest among the chemical community in studying larger and larger molecules forces the parallelization of existing computational methods in chemistry and the development of new ones. These are the main reasons for frequent port updates and requests from the community for Grid ports of new packages to satisfy their computational demands. Unfortunately, some parallelization schemes used by chemical packages cannot be directly used in a Grid environment. Here we present a solution for the Gaussian package. The current state of development of Grid middleware allows easy parallel execution for software using any MPI flavour. Unfortunately, many chemical packages do not use MPI for parallelization, so special treatment is needed. Gaussian can be executed in parallel on SMP architectures or via Linda. These require the reservation of a certain number of processors/cores on a given WN and of an equal number of processors/cores on each WN, respectively. The current implementation of EGEE middleware does not offer such f...

  12. Protecting quantum coherence of two-level atoms from vacuum fluctuations of electromagnetic field

    International Nuclear Information System (INIS)

    Liu, Xiaobao; Tian, Zehua; Wang, Jieci; Jing, Jiliang

    2016-01-01

    In the framework of open quantum systems, we study the dynamics of a static polarizable two-level atom interacting with a bath of fluctuating vacuum electromagnetic field and explore under which conditions the coherence of the open quantum system is unaffected by the environment. For both single-qubit and two-qubit systems, we find that the quantum coherence cannot be protected from noise when the atom interacts with an electromagnetic field without a boundary. However, in the presence of a boundary, the dynamical conditions for quantum coherence to remain unaffected are fulfilled only when the atom is close to the boundary and is transversely polarizable. Otherwise, the quantum coherence can only be partially protected for other polarization directions. -- Highlights: •We study the dynamics of a two-level atom interacting with a bath of fluctuating vacuum electromagnetic field. •For both single- and two-qubit systems, the quantum coherence cannot be protected from noise without a boundary. •Quantum coherence remains unaffected only when the atom is close to the boundary and is transversely polarizable. •Otherwise, the quantum coherence can only be partially protected for other polarization directions.

  13. Hall effects on unsteady MHD flow between two rotating disks with non-coincident parallel axes

    Energy Technology Data Exchange (ETDEWEB)

    Barik, R.N., E-mail: barik.rabinarayan@rediffmail.com [Department of Mathematics, Trident Academy of Technology, Bhubaneswar (India); Dash, G.C., E-mail: gcdash@indiatimes.com [Department of Mathematics, S.O.A. University, Bhubaneswar (India); Rath, P.K., E-mail: pkrath_1967@yahoo.in [Department of Mathematics, B.R.M. International Institute of Technology, Bhubaneswar (India)

    2013-01-15

    Hall effects on the unsteady MHD rotating flow of a viscous incompressible electrically conducting fluid between two rotating disks with non-coincident parallel axes have been studied. There exists an axisymmetric solution to this problem. The governing equations are solved by applying the Laplace transform method. It is found that the torque experienced by the disks decreases with an increase in either the Hall parameter, m, or the rotation parameter, S². Further, the axis of rotation has no effect on the fluid flow. (author)

  14. Hall effects on unsteady MHD flow between two rotating disks with non-coincident parallel axes

    International Nuclear Information System (INIS)

    Barik, R.N.; Dash, G.C.; Rath, P.K.

    2013-01-01

    Hall effects on the unsteady MHD rotating flow of a viscous incompressible electrically conducting fluid between two rotating disks with non-coincident parallel axes have been studied. There exists an axisymmetric solution to this problem. The governing equations are solved by applying the Laplace transform method. It is found that the torque experienced by the disks decreases with an increase in either the Hall parameter, m, or the rotation parameter, S². Further, the axis of rotation has no effect on the fluid flow. (author)

  15. Applications of the parallel computing system using network

    International Nuclear Information System (INIS)

    Ido, Shunji; Hasebe, Hiroki

    1994-01-01

    Parallel programming is applied to multiple processors connected by Ethernet. Data exchanges between tasks located in each processing element are realized in two ways. One is the socket interface, a standard library on recent UNIX operating systems. The other is Parallel Virtual Machine (PVM), free network-connection software developed by ORNL that allows many workstations connected to a network to be used as a parallel computer. This paper discusses the availability of parallel computing using networked UNIX workstations and a comparison with specialized parallel systems (Transputer and iPSC/860) for a Monte Carlo simulation, which generally shows a high parallelization ratio. (author)
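
    A minimal example of the kind of embarrassingly parallel Monte Carlo workload referred to above (estimating pi), assuming Python's multiprocessing in place of sockets or PVM.

```python
# Minimal embarrassingly parallel Monte Carlo example (pi estimation), the
# class of workload for which a high parallelization ratio is reported.
# Python's multiprocessing stands in here for sockets or PVM.
import random
from multiprocessing import Pool

def count_hits(args):
    seed, n = args
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

if __name__ == "__main__":
    workers, samples_per_worker = 4, 250_000
    with Pool(workers) as pool:
        hits = sum(pool.map(count_hits,
                            [(i, samples_per_worker) for i in range(workers)]))
    print("pi ~", 4.0 * hits / (workers * samples_per_worker))
```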

  16. Gamma-gamma directional correlations for levels excited up to 2.5 MeV of 214Po

    International Nuclear Information System (INIS)

    Morales, A.; Nunez-Lagos, R.; Morales, J.; Plo, M.

    1984-01-01

    The spins of twenty-two excited states (up to an energy of 2.5 MeV) of 214 Po have been measured using gamma-gamma directional correlation techniques, twelve of them for the first time. The multipole mixing ratios of the corresponding electromagnetic transitions from these levels to the first excited level have also been determined. (author)

  17. Design of high-performance parallelized gene predictors in MATLAB.

    Science.gov (United States)

    Rivard, Sylvain Robert; Mailloux, Jean-Gabriel; Beguenane, Rachid; Bui, Hung Tien

    2012-04-10

    This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based on either Goertzel's algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics processing unit (GPU). Results show that an implementation using a straightforward approach can require over 4.5 h to process 15 million base pairs (bps) whereas a properly designed one could perform the same task in less than five minutes. In the best case, a GPU implementation can yield these results in 57 s. The present work shows how parallelism can be used in MATLAB for gene prediction in very large DNA sequences to produce results that are over 270 times faster than a conventional approach. This is significant as MATLAB is typically overlooked due to its apparent slow processing time even though it offers a convenient environment for bioinformatics. From a practical standpoint, this work proposes two strategies for accelerating genome data processing which rely on different parallelization mechanisms. Using a CPU, the work shows that direct access to the MEX function increases execution speed and that the PARFOR construct should be used in order to take full advantage of the parallelizable Goertzel implementation. When the target is a GPU, the work shows that data needs to be segmented into manageable sizes within the GFOR construct before processing in order to minimize execution time.
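
    A plain-Python sketch of Goertzel's algorithm, which evaluates the power of a single DFT bin; in gene prediction it is commonly applied to a numerically encoded DNA sequence to extract the period-3 component (bin k = N/3). The encoding and sequence below are illustrative, not the paper's MATLAB/MEX/GPU implementation.

```python
# Goertzel's algorithm: power of a single DFT bin without a full FFT. In gene
# prediction it is commonly applied to numerically encoded DNA to pick out the
# period-3 (coding) component, bin k = N/3.
import math

def goertzel_power(x, k):
    """Squared magnitude of DFT bin k of the sequence x."""
    n = len(x)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

# binary indicator sequence for one nucleotide, e.g. 1 where the base is 'G'
seq = "ATGGTGCACCTGACTCCTGAGGAGAAGTCTGCC"
x = [1.0 if base == "G" else 0.0 for base in seq]
print(goertzel_power(x, k=len(x) // 3))   # period-3 signal strength
```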

  18. Development of a new dynamic turbulent model, applications to two-dimensional and plane parallel flows

    International Nuclear Information System (INIS)

    Laval, Jean Philippe

    1999-01-01

    We developed a turbulent model based on asymptotic development of the Navier-Stokes equations within the hypothesis of non-local interactions at small scales. This model provides expressions of the turbulent Reynolds sub-grid stresses via estimates of the sub-grid velocities rather than velocities correlations as is usually done. The model involves the coupling of two dynamical equations: one for the resolved scales of motions, which depends upon the Reynolds stresses generated by the sub-grid motions, and one for the sub-grid scales of motions, which can be used to compute the sub-grid Reynolds stresses. The non-locality of interaction at sub-grid scales allows to model their evolution with a linear inhomogeneous equation where the forcing occurs via the energy cascade from resolved to sub-grid scales. This model was solved using a decomposition of sub-grid scales on Gabor's modes and implemented numerically in 2D with periodic boundary conditions. A particles method (PIC) was used to compute the sub-grid scales. The results were compared with results of direct simulations for several typical flows. The model was also applied to plane parallel flows. An analytical study of the equations allows a description of mean velocity profiles in agreement with experimental results and theoretical results based on the symmetries of the Navier-Stokes equation. Possible applications and improvements of the model are discussed in the conclusion. (author) [fr

  19. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell , Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  20. Parallel habitat acclimatization is realized by the expression of different genes in two closely related salamander species (genus Salamandra).

    Science.gov (United States)

    Goedbloed, D J; Czypionka, T; Altmüller, J; Rodriguez, A; Küpfer, E; Segev, O; Blaustein, L; Templeton, A R; Nolte, A W; Steinfartz, S

    2017-12-01

    The utilization of similar habitats by different species provides an ideal opportunity to identify genes underlying adaptation and acclimatization. Here, we analysed the gene expression of two closely related salamander species: Salamandra salamandra in Central Europe and Salamandra infraimmaculata in the Near East. These species inhabit similar habitat types: 'temporary ponds' and 'permanent streams' during larval development. We developed two species-specific gene expression microarrays, each targeting over 12 000 transcripts, including an overlapping subset of 8331 orthologues. Gene expression was examined for systematic differences between temporary ponds and permanent streams in larvae from both salamander species to establish gene sets and functions associated with these two habitat types. Only 20 orthologues were associated with a habitat in both species, but these orthologues did not show parallel expression patterns across species more than expected by chance. Functional annotation of a set of 106 genes with the highest effect size for a habitat suggested four putative gene function categories associated with a habitat in both species: cell proliferation, neural development, oxygen responses and muscle capacity. Among these high effect size genes was a single orthologue (14-3-3 protein zeta/YWHAZ) that was downregulated in temporary ponds in both species. The emergence of four gene function categories combined with a lack of parallel expression of orthologues (except 14-3-3 protein zeta) suggests that parallel habitat adaptation or acclimatization by larvae from S. salamandra and S. infraimmaculata to temporary ponds and permanent streams is mainly realized by different genes with a converging functionality.

  1. Assessment of assembly homogenized two-steps core dynamic calculations using direct whole core transport solutions

    International Nuclear Information System (INIS)

    Hursin, Mathieu; Downar, Thomas J.; Yoon, Joo Il; Joo, Han Gyu

    2016-01-01

    Highlights: • Reactivity initiated accident analysis with direct whole core transient transport code. • Comparison with usual “two steps” procedure. • Effect of effective delayed neutron fraction definition on energy deposition in the fuel. • Effect of homogenized few-group cross sections generation at the assembly level on energy deposition in the fuel. • Effect of effective fuel temperature definition on energy deposition in the fuel. - Abstract: The impact of the approximations in the “two-steps” procedure used in the current generation of nodal simulators for core transient calculations is assessed by using a higher order solution obtained from a direct, whole core, transient transport calculation. A control rod ejection accident in an idealized minicore is analyzed with PARCS, which uses the two-steps procedure and DeCART which provides the higher order solution. DeCART is used as lattice code to provide the homogenized cross sections and kinetics parameters to PARCS. The approximations made by using (1) the homogenized few-group cross sections and kinetic parameters generated at the assembly level, (2) an effective delayed neutrons fraction, (3) an effective fuel temperature and (4) the few-group formulation are investigated in terms of global and local core power behavior. The results presented in the paper show that the current two-steps procedure produces sufficiently accurate transient results with respect to the direct whole core calculation solution, provided that its parameters are carefully generated using the prescriptions described in the present article.

  2. Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Barnett, D.A.

    1999-01-01

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The parallel performance models were compared against measurements from applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead

  3. Multitasking TORT under UNICOS: Parallel performance models and measurements

    International Nuclear Information System (INIS)

    Barnett, A.; Azmy, Y.Y.

    1999-01-01

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The parallel performance models were compared against measurements from applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead

  4. Particle image velocimetry measurements of the flow in the converging region of two parallel jets

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Huhu, E-mail: huhuwang@tamu.edu; Lee, Saya, E-mail: sayalee@tamu.edu; Hassan, Yassin A., E-mail: y-hassan@tamu.edu

    2016-09-15

    Highlights: • The flow behaviors in the converging region were non-intrusively investigated using PIV. • The PIV results using two measuring scales and LDV data matched very well. • Significant momentum transfer was observed in the merging region right after the merging point. • Instantaneous vector field revealed characteristic interacting patterns of the jets. - Abstract: The interaction between parallel jets plays a critical role in determining the characteristics of the momentum and heat transfer in the flow. Specifically for next generation VHTR, the output temperature will be about 900 °C, and any thermal oscillations will create safety issues. The mixing variations of the coolants in the reactor core may influence these power oscillations. Numerous numerical tools such as computational fluid dynamics (CFD) simulations have been used to support the reactor design. The validation of CFD method is important to ensure the fidelity of the calculations. This requires high-fidelity, qualified benchmark data. Particle image velocimetry (PIV), a non-intrusive measuring technique, was used to provide benchmark data for resolving a simultaneous flow field in the converging region of two submerged parallel jets issued from rectangular channels. The jets studied in this work had an equal discharge velocity at room temperature. The turbulent characteristics including the distributions of mean velocities, turbulence intensities, Reynolds stresses and z-component vorticity were studied. The streamwise mean velocity measured by PIV and LDV were compared, and they agreed very well.

  5. Data-Parallel Mesh Connected Components Labeling and Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Harrison, Cyrus; Childs, Hank; Gaither, Kelly

    2011-04-10

    We present a data-parallel algorithm for identifying and labeling the connected sub-meshes within a domain-decomposed 3D mesh. The identification task is challenging in a distributed-memory parallel setting because connectivity is transitive and the cells composing each sub-mesh may span many or all processors. Our algorithm employs a multi-stage application of the Union-find algorithm and a spatial partitioning scheme to efficiently merge information across processors and produce a global labeling of connected sub-meshes. Marking each vertex with its corresponding sub-mesh label allows us to isolate mesh features based on topology, enabling new analysis capabilities. We briefly discuss two specific applications of the algorithm and present results from a weak scaling study. We demonstrate the algorithm at concurrency levels up to 2197 cores and analyze meshes containing up to 68 billion cells.
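
    A serial union-find sketch with path compression and union by rank, the basic building block named above; the multi-stage distributed-memory merging across processors is not reproduced here.

```python
# Serial union-find (disjoint sets) with path compression and union by rank,
# the building block the multi-stage distributed algorithm applies across
# processors. Here it labels connected components of a small cell graph.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]   # path compression
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return
        if self.rank[ri] < self.rank[rj]:
            ri, rj = rj, ri
        self.parent[rj] = ri
        if self.rank[ri] == self.rank[rj]:
            self.rank[ri] += 1

# cells 0..5 with adjacency edges; two separate sub-meshes plus an isolated cell
edges = [(0, 1), (1, 2), (3, 4)]
uf = UnionFind(6)
for a, b in edges:
    uf.union(a, b)
print([uf.find(i) for i in range(6)])   # cells 0-2 share a label, 3-4 another
```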

  6. Parallel implementation of multireference coupled-cluster theories based on the reference-level parallelism

    Energy Technology Data Exchange (ETDEWEB)

    Brabec, Jiri; Pittner, Jiri; van Dam, Hubertus JJ; Apra, Edoardo; Kowalski, Karol

    2012-02-01

    A novel algorithm for implementing general type of multireference coupled-cluster (MRCC) theory based on the Jeziorski-Monkhorst exponential Ansatz [B. Jeziorski, H.J. Monkhorst, Phys. Rev. A 24, 1668 (1981)] is introduced. The proposed algorithm utilizes processor groups to calculate the equations for the MRCC amplitudes. In the basic formulation each processor group constructs the equations related to a specific subset of references. By flexible choice of processor groups and subset of reference-specific sufficiency conditions designated to a given group one can assure optimum utilization of available computing resources. The performance of this algorithm is illustrated on the examples of the Brillouin-Wigner and Mukherjee MRCC methods with singles and doubles (BW-MRCCSD and Mk-MRCCSD). A significant improvement in scalability and in reduction of time to solution is reported with respect to recently reported parallel implementation of the BW-MRCCSD formalism [J.Brabec, H.J.J. van Dam, K. Kowalski, J. Pittner, Chem. Phys. Lett. 514, 347 (2011)].

  7. Partitioning of electron flux between the respiratory chains of the yeast Candida parapsilosis: parallel working of the two chains.

    Science.gov (United States)

    Guerin, M G; Camougrand, N M

    1994-02-08

    Partitioning of the electron flux between the classical and the alternative respiratory chains of the yeast Candida parapsilosis, was measured as a function of the oxidation rate and of the Q-pool redox poise. At low respiration rate, electrons from external NADH travelled preferentially through the alternative pathway as indicated by the antimycin A-insensitivity of electron flow. Inhibition of the alternative pathway by SHAM restored full antimycin A-sensitivity to the remaining electro flow. The dependence of the respiratory rate on the redox poise of the quinone pool was investigated when the electron flux was mediated either by the main respiratory chain (growth in the absence of antimycin A) or by the second respiratory chain (growth in the presence of antimycin A). In the former case, a linear relationship was found between these two parameters. In contrast, in the latter case, the relationship between Q-pool reduction level and electron flux was non-linear, but it could be resolved into two distinct curves. This second quinone is not reducible in the presence of antimycin A but only in the presence of high concentrations of myxothiazol or cyanide. Since two quinone species exist in C. parapsilosis, UQ9 and Qx (C33H54O4), we hypothesized that these two curves could correspond to the functioning of the second quinone engaged during the alternative pathway activity. Partitioning of electrons between both respiratory chains could occur upstream of complex III with the second chain functioning in parallel to the main one, and with the additional possibility of merging into the main one at the complex IV level.

  8. New high accuracy super stable alternating direction implicit methods for two and three dimensional hyperbolic damped wave equations

    Directory of Open Access Journals (Sweden)

    R.K. Mohanty

    2014-01-01

    Full Text Available In this paper, we report new three-level implicit super-stable methods of order two in time and four in space for the solution of hyperbolic damped wave equations in one, two and three space dimensions subject to given appropriate initial and Dirichlet boundary conditions. We use uniform grid points in both the time and space directions. Our methods behave as fourth-order accurate when the grid size in the time direction is directly proportional to the square of the grid size in the space direction. The proposed methods are super-stable. The resulting system of algebraic equations is solved by the Gauss elimination method. We discuss new alternating direction implicit (ADI) methods for two- and three-dimensional problems. Numerical results and a graphical representation of the numerical solution are presented to illustrate the accuracy of the proposed methods.
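
    For reference, a generic form of the equation class treated (shown in two space dimensions) and the mesh-ratio condition under which the schemes behave as fourth-order accurate; the notation is generic rather than copied from the paper:

```latex
% Generic damped wave equation in two space dimensions, with the mesh-ratio
% condition under which the order-(2,4) schemes behave as fourth order.
\[
  \frac{\partial^{2} u}{\partial t^{2}}
  + 2\alpha \frac{\partial u}{\partial t}
  = c^{2}\!\left(\frac{\partial^{2} u}{\partial x^{2}}
                + \frac{\partial^{2} u}{\partial y^{2}}\right),
  \qquad \alpha > 0,
\]
\[
  \text{fourth-order behaviour when } \; \Delta t \propto h^{2},
  \quad h = \Delta x = \Delta y .
\]
```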

  9. A diffusion model for two parallel queues with processor sharing: transient behavior and asymptotics

    Directory of Open Access Journals (Sweden)

    Charles Knessl

    1999-01-01

    Full Text Available We consider two identical, parallel M/M/1 queues. Both queues are fed by a Poisson arrival stream of rate λ and have service rates equal to μ. When both queues are non-empty, the two systems behave independently of each other. However, when one of the queues becomes empty, the corresponding server helps in the other queue. This is called head-of-the-line processor sharing. We study this model in the heavy traffic limit, where ρ=λ/μ→1. We formulate the heavy traffic diffusion approximation and explicitly compute the time-dependent probability of the diffusion approximation to the joint queue length process. We then evaluate the solution asymptotically for large values of space and/or time. This leads to simple expressions that show how the process achieves its steady state, as well as other transient aspects.
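
    A small continuous-time Markov-chain simulation of the model described (two M/M/1 queues where an idle server helps the other queue), useful for sanity-checking the diffusion approximation as rho = lambda/mu approaches 1; the parameter values below are placeholders.

```python
# Direct CTMC simulation of the model: two M/M/1 queues, Poisson arrivals of
# rate lam to each, service rate mu per server; when one queue is empty its
# server helps the other (head-of-the-line processor sharing), so the busy
# queue is drained at rate 2*mu. Parameters are placeholders.
import random

def simulate(lam=0.95, mu=1.0, t_end=200_000.0, seed=1):
    rng = random.Random(seed)
    n1 = n2 = 0
    t = 0.0
    area1 = area2 = 0.0                       # time-integrals of queue lengths
    while t < t_end:
        r1 = (2 * mu if n2 == 0 else mu) if n1 > 0 else 0.0
        r2 = (2 * mu if n1 == 0 else mu) if n2 > 0 else 0.0
        total = lam + lam + r1 + r2
        dt = rng.expovariate(total)           # time to the next event
        area1 += n1 * dt
        area2 += n2 * dt
        t += dt
        u = rng.random() * total              # pick which event occurred
        if u < lam:
            n1 += 1
        elif u < 2 * lam:
            n2 += 1
        elif u < 2 * lam + r1:
            n1 -= 1
        else:
            n2 -= 1
    return area1 / t, area2 / t               # time-average queue lengths

print(simulate())
```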

  10. OpenMP parallelization of a gridded SWAT (SWATG)

    Science.gov (United States)

    Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin

    2017-12-01

    Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates a grid modeling scheme with different spatial representations also faces such problems. The resulting computational cost limits applications involving very high resolution, large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling at the HRU level. Such a parallel implementation takes better advantage of the computational power of a shared-memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling over a roughly 2000 km2 watershed with a 1 CPU, 15-thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computation of environmental models is beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management, in addition to offering data fusion and model coupling ability.

  11. A genetic polymorphism evolving in parallel in two cell compartments and in two clades

    Directory of Open Access Journals (Sweden)

    Watt Ward B

    2013-01-01

    Full Text Available Abstract Background The enzyme phosphoenolpyruvate carboxykinase, PEPCK, occurs in its guanosine-nucleotide-using form in animals and a few prokaryotes. We study its natural genetic variation in Colias (Lepidoptera, Pieridae). PEPCK offers a route, alternative to pyruvate kinase, for carbon skeletons to move between cytosolic glycolysis and mitochondrial Krebs cycle reactions. Results PEPCK is expressed in both cytosol and mitochondrion, but differently in diverse animal clades. In vertebrates and independently in Drosophila, compartment-specific paralogous genes occur. In a contrasting expression strategy, compartment-specific PEPCKs of Colias and of the silkmoth, Bombyx, differ only in their first, 5′, exons; these are alternatively spliced onto a common series of following exons. In two Colias species from distinct clades, PEPCK sequence is highly variable at nonsynonymous and synonymous sites, mainly in its common exons. Three major amino acid polymorphisms, Gly 335 ↔ Ser, Asp 503 ↔ Glu, and Ile 629 ↔ Val occur in both species, and in the first two cases are similar in frequency between species. Homology-based structural modelling shows that the variants can alter hydrogen bonding, salt bridging, or van der Waals interactions of amino acid side chains, locally or at one another's sites which are distant in PEPCK's structure, and thus may affect its enzyme function. We ask, using coalescent simulations, if these polymorphisms' cross-species similarities are compatible with neutral evolution by genetic drift, but find the probability of this null hypothesis is 0.001 ≤ P ≤ 0.006 under differing scenarios. Conclusion Our results make the null hypothesis of neutrality of these PEPCK polymorphisms quite unlikely, but support an alternative hypothesis that they are maintained by natural selection in parallel in the two species. This alternative can now be justifiably tested further via studies of PEPCK genotypes' effects

  12. Parallelization Issues and Particle-in-Cell Codes.

    Science.gov (United States)

    Elster, Anne Cathrine

    1994-01-01

    the simulation may lead to further improvements. For example, in the case of mean particle drift, it is often advantageous to partition the grid primarily along the direction of the drift. The particle-in-cell codes for this study were tested using physical parameters, which lead to predictable phenomena including plasma oscillations and two-stream instabilities. An overview of the most central references related to parallel particle codes is also given.

  13. Direct and indirect two-photon processes in semiconductors

    International Nuclear Information System (INIS)

    Hassan, A.R.

    1986-07-01

    The expressions describing direct and indirect two-photon absorption in crystals are given. They are valid both near and far from the energy gap. A perturbative approach through two different band models is adopted. The effects of the non-parabolicity and the degeneracy of the energy bands are considered. The numerical results are compared with other theories and with recent experimental data in Zn and AgCl. It is shown that the dominant transition mechanisms are of the allowed-allowed type near and far from the gap for both direct and indirect processes. (author)

  14. An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications

    Energy Technology Data Exchange (ETDEWEB)

    Vydyanathan, Naga; Krishnamoorthy, Sriram; Sabin, Gerald M.; Catalyurek, Umit V.; Kurc, Tahsin; Sadayappan, Ponnuswamy; Saltz, Joel H.

    2009-08-01

    Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application-tasks with dependences. These applications exhibit both task- and data-parallelism, and combining these two (also called mixed parallelism) has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task- and data-parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks and the inter-task data communication volumes. A locality conscious scheduling strategy is used to improve inter-task data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications as well as synthetic graphs shows that our algorithm consistently generates schedules with lower makespan as compared to CPR and CPA, two previously proposed scheduling algorithms. Our algorithm also produces schedules that have lower makespan than pure task- and data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.
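
    A small sketch of one ingredient such schedulers rely on: the bottom level (longest runtime path to an exit task) of each task in a DAG, used to order tasks by priority. The task graph and runtimes below are invented for illustration; this is not the proposed algorithm, nor CPR or CPA.

```python
# Bottom-level computation for a task DAG: the longest path (by runtime) from
# each task to an exit task. Priorities of this kind drive list-scheduling
# heuristics for mixed-parallel task graphs. Graph and runtimes are invented.
from functools import lru_cache

runtime = {"A": 4.0, "B": 2.0, "C": 3.0, "D": 1.0, "E": 5.0}
successors = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": []}

@lru_cache(maxsize=None)
def bottom_level(task):
    tail = max((bottom_level(s) for s in successors[task]), default=0.0)
    return runtime[task] + tail

order = sorted(runtime, key=bottom_level, reverse=True)   # scheduling priority
print({t: bottom_level(t) for t in runtime})
print("list-scheduling order:", order)
```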

  15. Flexibility and Performance of Parallel File Systems

    Science.gov (United States)

    Kotz, David; Nieuwejaar, Nils

    1996-01-01

    As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions make applications difficult to port. We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (API's). We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.

  16. Application of Pfortran and Co-Array Fortran in the Parallelization of the GROMOS96 Molecular Dynamics Module

    Directory of Open Access Journals (Sweden)

    Piotr Bała

    2001-01-01

    Full Text Available After at least a decade of parallel tool development, parallelization of scientific applications remains a significant undertaking. Typically parallelization is a specialized activity supported only partially by the programming tool set, with the programmer involved with parallel issues in addition to sequential ones. The details of concern range from algorithm design down to low-level data movement details. The aim of parallel programming tools is to automate the latter without sacrificing performance and portability, allowing the programmer to focus on algorithm specification and development. We present our use of two similar parallelization tools, Pfortran and Cray's Co-Array Fortran, in the parallelization of the GROMOS96 molecular dynamics module. Our parallelization started from the GROMOS96 distribution's shared-memory implementation of the replicated algorithm, but used little of that existing parallel structure. Consequently, our parallelization was close to starting with the sequential version. We found the intuitive extensions to Pfortran and Co-Array Fortran helpful in the rapid parallelization of the project. We present performance figures for both the Pfortran and Co-Array Fortran parallelizations showing linear speedup within the range expected by these parallelization methods.

  17. Configuration affects parallel stent grafting results.

    Science.gov (United States)

    Tanious, Adam; Wooster, Mathew; Armstrong, Paul A; Zwiebel, Bruce; Grundy, Shane; Back, Martin R; Shames, Murray L

    2018-05-01

    A number of adjunctive "off-the-shelf" procedures have been described to treat complex aortic diseases. Our goal was to evaluate parallel stent graft configurations and to determine an optimal formula for these procedures. This is a retrospective review of all patients at a single medical center treated with parallel stent grafts from January 2010 to September 2015. Outcomes were evaluated on the basis of parallel graft orientation, type, and main body device. Primary end points included parallel stent graft compromise and overall endovascular aneurysm repair (EVAR) compromise. There were 78 patients treated with a total of 144 parallel stents for a variety of pathologic processes. There was a significant correlation between main body oversizing and snorkel compromise (P = .0195) and overall procedural complication (P = .0019) but not with endoleak rates. Patients were organized into the following oversizing groups for further analysis: 0% to 10%, 10% to 20%, and >20%. Those oversized into the 0% to 10% group had the highest rate of overall EVAR complication (73%; P = .0003). There were no significant correlations between any one particular configuration and overall procedural complication. There was also no significant correlation between total number of parallel stents employed and overall complication. Composite EVAR configuration had no significant correlation with individual snorkel compromise, endoleak, or overall EVAR or procedural complication. The configuration most prone to individual snorkel compromise and overall EVAR complication was a four-stent configuration with two stents in an antegrade position and two stents in a retrograde position (60% complication rate). The configuration most prone to endoleak was one or two stents in retrograde position (33% endoleak rate), followed by three stents in an all-antegrade position (25%). There was a significant correlation between individual stent configuration and stent compromise (P = .0385), with 31

  18. Approximation algorithms for the parallel flow shop problem

    NARCIS (Netherlands)

    X. Zhang (Xiandong); S.L. van de Velde (Steef)

    2012-01-01

    We consider the NP-hard problem of scheduling n jobs in m two-stage parallel flow shops so as to minimize the makespan. This problem decomposes into two subproblems: assigning the jobs to parallel flow shops; and scheduling the jobs assigned to the same flow shop by use of Johnson's rule.
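As a rough illustration of this decomposition (a heuristic sketch, not the approximation algorithm analyzed in the paper), the Python fragment below assigns jobs to shops greedily by total processing time and then sequences each shop with Johnson's rule for the two-machine flow shop; the job data are made up.

```python
# Heuristic sketch: assign jobs to m two-stage flow shops, then apply Johnson's rule per shop.
def johnson_order(jobs):
    """jobs: list of (p1, p2). Johnson's rule for the two-machine flow shop."""
    front = sorted([j for j in jobs if j[0] <= j[1]], key=lambda j: j[0])
    back = sorted([j for j in jobs if j[0] > j[1]], key=lambda j: -j[1])
    return front + back

def makespan(seq):
    t1 = t2 = 0
    for p1, p2 in seq:
        t1 += p1                      # stage-1 machine finishes this job at t1
        t2 = max(t2, t1) + p2         # stage-2 machine starts when both it and the job are ready
    return t2

def assign_and_schedule(jobs, m):
    """Greedy assignment: longest jobs first, each to the currently least-loaded shop."""
    shops, loads = [[] for _ in range(m)], [0] * m
    for job in sorted(jobs, key=lambda j: -(j[0] + j[1])):
        k = loads.index(min(loads))
        shops[k].append(job)
        loads[k] += job[0] + job[1]
    seqs = [johnson_order(s) for s in shops]
    return seqs, max(makespan(s) for s in seqs if s)

jobs = [(3, 6), (5, 2), (1, 2), (6, 6), (2, 7), (4, 4)]
seqs, cmax = assign_and_schedule(jobs, m=2)
print(seqs, "makespan =", cmax)
```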

  19. Parallel Task Processing on a Multicore Platform in a PC-based Control System for Parallel Kinematics

    Directory of Open Access Journals (Sweden)

    Harald Michalik

    2009-02-01

    Full Text Available Multicore platforms are those that have one physical processor chip with multiple cores interconnected via a chip-level bus. Because they deliver greater computing power through concurrency and offer greater system density, multicore platforms are well suited to address the performance bottleneck encountered in PC-based control systems for parallel kinematic robots with heavy CPU load. Heavy-load control tasks are generated by new control approaches that include features such as singularity prediction, structure control algorithms, vision data integration and similar tasks. In this paper we introduce the parallel task scheduling extension of a communication architecture specially tailored for the development of PC-based control of parallel kinematics. The scheduling is specially designed for processing on a multicore platform. It breaks down the serial task processing of the robot control cycle and extends it with parallel task processing paths in order to enhance the overall control performance.
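The idea of replacing a purely serial control cycle with parallel task paths can be sketched with standard-library Python (a generic illustration only; the task names and timings are placeholders, and this is not the communication architecture described in the paper):

```python
# Generic sketch of one control cycle with parallel task paths.
from concurrent.futures import ThreadPoolExecutor
import math, time

def singularity_prediction(q):
    time.sleep(0.005)                       # placeholder for a heavy computation
    return min(abs(math.sin(sum(q))), 1.0)  # toy "distance to singularity" measure

def vision_processing(frame_id):
    time.sleep(0.005)                       # placeholder for vision data integration
    return {"frame": frame_id, "offset": 0.01 * frame_id}

def control_cycle(q, frame_id, pool):
    # Fork the independent heavy paths; keep the fast serial path inline.
    f_sing = pool.submit(singularity_prediction, q)
    f_vision = pool.submit(vision_processing, frame_id)
    feedforward = [0.1 * qi for qi in q]    # cheap serial task of the cycle
    # Join the parallel paths before producing the actuator command for this cycle.
    sing, vision = f_sing.result(), f_vision.result()
    return [u + vision["offset"] * (1.0 - sing) for u in feedforward]

with ThreadPoolExecutor(max_workers=2) as pool:
    for k in range(3):
        u = control_cycle([0.1 * k, 0.2, 0.3], k, pool)
        print("cycle", k, "command", [round(x, 4) for x in u])
```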

  20. BCD/CPS: An event-level GEANT3 parallelization via CPS

    International Nuclear Information System (INIS)

    Roberts, L.A.

    1991-04-01

    BCD/CPS is an implementation of the Bottom Collider Detector GEANT3 simulation for CPS processor ranches. BCD/CPS demonstrates some of the capabilities of event-parallel applications applicable to current SSC detector simulations using the CPS and CZ/CPS communications protocols. Design, implementation and usage of the BCD/CPS simulation are presented along with extensive source listings for novice GEANT3/CPS programmers. 11 refs

  1. Development of a parallel plate ion chamber for radiation protection level

    International Nuclear Information System (INIS)

    Bottaro, Marcio; Landi, Mauricio; Moralles, Mauricio

    2011-01-01

    A new parallel plate vented ion chamber is proposed in this paper. The chamber was primarily intended for the measurement of stray radiation in interventional procedures, but the energy response of about 2.6% obtained with the first prototype over the range from 40 to 150 kV, using ISO 4037-1 narrow qualities, opened the possibility of a wide range of applications in radiation protection. Preliminary studies with the Maxwell 2D electromagnetic field simulator revealed an optimized model regarding effective volume and saturation voltage levels, which gave the ion chamber a dual entrance window feature. Monte Carlo calculations were the main support tool in the development of this ion chamber, both for establishing the effective volume of the chamber and for determining the best materials for the housing mounting and conductive elements, such as guard rings, electrode, and windows. Even the composition of the conductive layers, which might be neglected because of their very small thicknesses (about 35 μm), had an important influence on the results and could be better understood with Monte Carlo N-Particle Transport Code System (MCNP) simulations. (author)

  2. Two-step values for games with two-level communication structure

    NARCIS (Netherlands)

    Béal, Silvain; Khmelnitskaya, Anna Borisovna; Solal, Philippe

    TU games with two-level communication structure, in which a two-level communication structure relates fundamentally to the given coalition structure and consists of a communication graph on the collection of the a priori unions in the coalition structure, as well as a collection of communication

  3. Combinatorics of spreads and parallelisms

    CERN Document Server

    Johnson, Norman

    2010-01-01

    Partitions of Vector Spaces; Quasi-Subgeometry Partitions; Finite Focal-Spreads; Generalizing André Spreads; The Going Up Construction for Focal-Spreads; Subgeometry Partitions; Subgeometry and Quasi-Subgeometry Partitions; Subgeometries from Focal-Spreads; Extended André Subgeometries; Kantor's Flag-Transitive Designs; Maximal Additive Partial Spreads; Subplane Covered Nets and Baer Groups; Partial Desarguesian t-Parallelisms; Direct Products of Affine Planes; Jha-Johnson SL(2,

  4. Parallel Object-Oriented Computation Applied to a Finite Element Problem

    Directory of Open Access Journals (Sweden)

    Jon B. Weissman

    1993-01-01

    Full Text Available The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application, Mentat, and the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.

  5. Parallel operation of primary sodium pumps in FBTR

    International Nuclear Information System (INIS)

    Athmalingam, S.; Ellappan, T.R.; Vaidyanathan, G.; Chetal, S.C.; Bhoje, S.B.

    1994-01-01

    Sodium pumps used in the primary main circuit of Fast Breeder Test Reactor (FBTR) are centrifugal pumps. These pumps have a free level of sodium with a cover gas above it to simplify the pump seal arrangement. The sodium level in the pumps will vary based on the flow. The minimum level is governed by consideration of gas entrainment and net positive suction head (NPSH) to the pump while the maximum level is limited by sodium entering the pump tank gas line. There is a special feature in these pumps in that a small portion of the pump outlet sodium flow is led back into the suction chamber to maintain level and avoid gas entrainment. A control valve in this line helps in controlling the level at the desired value. With parallel operation of two sodium pumps a study was conducted to find the regions of safe operation of the two pumps. The purpose of this paper is to give the various design features and methodology of the analysis to arrive at the limiting condition of operation for the different operating states of the two pumps and the effect of pump speed variations on the fluctuations in sodium flows. (author). 6 figs

  6. [Three-dimensional parallel collagen scaffold promotes tendon extracellular matrix formation].

    Science.gov (United States)

    Zheng, Zefeng; Shen, Weiliang; Le, Huihui; Dai, Xuesong; Ouyang, Hongwei; Chen, Weishan

    2016-03-01

    To investigate the effects of a three-dimensional parallel collagen scaffold on the cell shape, arrangement and extracellular matrix formation of tendon stem cells. The parallel collagen scaffold was fabricated by a unidirectional freezing technique, while the random collagen scaffold was fabricated by a freeze-drying technique. The effects of the two scaffolds on cell shape and extracellular matrix formation were investigated in vitro by seeding tendon stem/progenitor cells and in vivo by ectopic implantation. Parallel and random collagen scaffolds were produced successfully. The parallel collagen scaffold was more akin to tendon than the random collagen scaffold. Tendon stem/progenitor cells were spindle-shaped and uniformly oriented in the parallel collagen scaffold, while cells on the random collagen scaffold had a disordered orientation. Two weeks after ectopic implantation, cells had nearly the same orientation as the collagen substance. In the parallel collagen scaffold, cells were arranged in parallel, and more spindly cells were observed. By contrast, cells in the random collagen scaffold were disordered. The parallel collagen scaffold can induce cells into a spindly, parallel arrangement and promote parallel extracellular matrix formation, while the random collagen scaffold induces a random cell arrangement. The results indicate that the parallel collagen scaffold is an ideal structure to promote tendon repair.

  7. Parallel object-oriented term rewriting : the booleans

    NARCIS (Netherlands)

    Rodenburg, P.H.; Vrancken, J.L.M.

    As a first case study in parallel object-oriented term rewriting, we give two implementations of term rewriting algorithms for boolean terms, using the parallel object-oriented features of the language Pool-T. The term rewriting systems are specified in the specification formalism

  8. User-friendly parallelization of GAUDI applications with Python

    International Nuclear Information System (INIS)

    Mato, Pere; Smith, Eoin

    2010-01-01

    GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given that a considerable amount of existing software has been developed using a serial methodology, and has existed in some cases for many years, implementation of parallelisation techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced into the high-level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification and end users need make only minimal changes to their scripts. The developed solution leverages existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques is presented and discussed.

  9. User-friendly parallelization of GAUDI applications with Python

    Energy Technology Data Exchange (ETDEWEB)

    Mato, Pere; Smith, Eoin, E-mail: pere.mato@cern.c [PH Department, CERN, 1211 Geneva 23 (Switzerland)

    2010-04-01

    GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given that a considerable amount of existing software has been developed using a serial methodology, and has existed in some cases for many years, implementation of parallelisation techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced into the high-level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification and end users need make only minimal changes to their scripts. The developed solution leverages existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques is presented and discussed.
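A generic, heavily simplified sketch of this pattern — parallelism introduced in the Python driver while the per-event processing code itself is untouched — is shown below using the standard multiprocessing module. The function names and the event-processing stub are hypothetical and are not GAUDI APIs; the final assertion reflects the requirement quoted above that parallel and serial runs agree.

```python
# Hedged sketch: parallelize an event loop at the driver-script level.
# process_event() stands in for an unchanged, serial per-event application core (hypothetical).
from multiprocessing import Pool

def process_event(event_id):
    # Placeholder for the serial processing of one event.
    checksum = sum((event_id * k) % 97 for k in range(1, 1000))
    return event_id, checksum

def run_serial(events):
    return [process_event(e) for e in events]

def run_parallel(events, workers=4):
    with Pool(processes=workers) as pool:
        return pool.map(process_event, events)

if __name__ == "__main__":
    events = list(range(100))
    serial = run_serial(events)
    parallel = run_parallel(events)
    # The parallel run should reproduce the serial results exactly.
    assert sorted(serial) == sorted(parallel)
    print("serial and parallel outputs agree for", len(events), "events")
```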

  10. Change of direction ability test differentiates higher level and lower level soccer referees

    Science.gov (United States)

    Los, Arcos A; Grande, I; Casajús, JA

    2016-01-01

    This report examines the agility and level of acceleration capacity of Spanish soccer referees and investigates the possible differences between field referees of different categories. The speed test consisted of 3 maximum acceleration stretches of 15 metres. The change of direction ability (CODA) test used in this study was a modification of the Modified Agility Test (MAT). The study included a sample of 41 Spanish soccer field referees from the Navarre Committee of Soccer Referees divided into two groups: i) the higher level group (G1, n = 20): 2ndA, 2ndB and 3rd division referees from the Spanish National Soccer League (28.43 ± 1.39 years); and ii) the lower level group (G2, n = 21): Navarre Provincial League soccer referees (29.54 ± 1.87 years). Significant differences were found with respect to the CODA between G1 (5.72 ± 0.13 s) and G2 (6.06 ± 0.30 s), while no differences were encountered between groups in acceleration ability. No significant correlations were obtained in G1 between agility and the capacity to accelerate. Significant correlations were found between sprint and agility times in the G2 and in the total group. The results of this study showed that agility can be used as a discriminating factor for differentiating between national and regional field referees; however, no observable differences were found over the 5 and 15 m sprint tests. PMID:27274111

  11. Extended RF shimming: Sequence-level parallel transmission optimization applied to steady-state free precession MRI of the heart.

    Science.gov (United States)

    Beqiri, Arian; Price, Anthony N; Padormo, Francesco; Hajnal, Joseph V; Malik, Shaihan J

    2017-06-01

    Cardiac magnetic resonance imaging (MRI) at high field presents challenges because of the high specific absorption rate and significant transmit field (B1+) inhomogeneities. Parallel transmission MRI offers the ability to correct for both issues at the level of individual radiofrequency (RF) pulses, but must operate within strict hardware and safety constraints. The constraints are themselves affected by sequence parameters, such as the RF pulse duration and TR, meaning that an overall optimal operating point exists for a given sequence. This work seeks to obtain optimal performance by performing a 'sequence-level' optimization in which pulse sequence parameters are included as part of an RF shimming calculation. The method is applied to balanced steady-state free precession cardiac MRI with the objective of minimizing TR, hence reducing the imaging duration. Results are demonstrated using an eight-channel parallel transmit system operating at 3 T, with an in vivo study carried out on seven male subjects of varying body mass index (BMI). Compared with single-channel operation, a mean-squared-error shimming approach leads to reduced imaging durations of 32 ± 3% with simultaneous improvement in flip angle homogeneity of 32 ± 8% within the myocardium. © 2017 The Authors. NMR in Biomedicine published by John Wiley & Sons Ltd.

  12. Analysis and Implementation of Parallel Connected Two-Induction Motor Single-Inverter Drive by Direct Vector Control for Industrial Application

    DEFF Research Database (Denmark)

    Gunabalan, Ramachandiran; Padmanaban, Sanjeevikumar; Blaabjerg, Frede

    2015-01-01

    Sensorless-based direct vector control techniques are widely used for three-phase induction motor drive, whereas in case of multiple-motor control, it becomes intensively complicated and very few research articles in support to industrial applications were found. A straight-forward direct vector...... to estimate the rotor speed, rotor flux, and load torque of both motors. Simulation results along with theoretical background provided in this paper confirm the feasibility of operation of the ac motors and proves reliability for industrial applications....

  13. Simultaneous reflection masking: dependency on direct sound level and hearing-impairment

    DEFF Research Database (Denmark)

    Buchholz, Jörg; Mihai, Paul Glad

    2008-01-01

    B-SL direct sound level, NH-listeners showed a binaural suppression effect for delays smaller than 7-10 ms and a binaural enhancement effect for larger delays. When decreasing the direct sound level to 15 dB-SL, the only significant change observed was that the dichotic RMT increased for delays larger than...... expected from changed auditory filter bandwidth and audibility. However, the stimulus level-dependency of the auditory filters’ bandwidth was not reflected in the SRMT data....

  14. Optical path difference measurements with a two-step parallel phase shifting interferometer based on a modified Michelson configuration

    Science.gov (United States)

    Toto-Arellano, Noel Ivan; Serrano-Garcia, David I.; Rodriguez-Zurita, Gustavo

    2017-09-01

    We report an optical implementation of a parallel phase-shifting quasi-common path interferometer using two modified Michelson interferometers to generate two interferograms. By using a displaceable polarizer array placed on the image plane, we can obtain four phase-shifted interferograms in two captures. The system operates as a quasi-common path interferometer generating four beams, which are brought to interference by alignment of the mirrors of the Michelson configurations. The optical phase data are retrieved using the well-known four-step algorithm. To demonstrate the capabilities of the system, experimental results obtained from transparent structures are presented.
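For reference, the well-known four-step algorithm mentioned above recovers the wrapped phase from four interferograms shifted by 0, π/2, π and 3π/2 as φ = arctan2(I4 − I2, I1 − I3). A small NumPy sketch with synthetic fringes (not the authors' experimental pipeline) is:

```python
# Four-step phase-shifting demo on synthetic interferograms.
import numpy as np

x = np.linspace(-1, 1, 256)
X, Y = np.meshgrid(x, x)
phase_true = 6 * np.pi * (X**2 + Y**2)          # synthetic optical path difference (in radians)
A, B = 1.0, 0.8                                  # background and fringe modulation

shifts = [0, np.pi / 2, np.pi, 3 * np.pi / 2]
I1, I2, I3, I4 = [A + B * np.cos(phase_true + d) for d in shifts]

# Standard four-step formula; the result is wrapped to (-pi, pi].
phase_wrapped = np.arctan2(I4 - I2, I1 - I3)
err = np.abs(np.angle(np.exp(1j * (phase_wrapped - phase_true))))
print("max wrapped-phase error:", err.max())
```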

  15. An environment for parallel structuring of Fortran programs

    International Nuclear Information System (INIS)

    Sridharan, K.; McShea, M.; Denton, C.; Eventoff, B.; Browne, J.C.; Newton, P.; Ellis, M.; Grossbard, D.; Wise, T.; Clemmer, D.

    1990-01-01

    The paper describes and illustrates an environment for interactive support of the detection and implementation of macro-level parallelism in Fortran programs. The approach couples algorithms for dependence analysis with both innovative techniques for complexity management and capabilities for the measurement and analysis of the parallel computation structures generated through use of the environment. The resulting environment is complementary to the more common approach of seeking local parallelism by loop unrolling, either by an automatic compiler or manually. (orig.)

  16. Parallel Narrative Structure in Paul Harding's "Tinkers"

    Science.gov (United States)

    Çirakli, Mustafa Zeki

    2014-01-01

    The present paper explores the implications of parallel narrative structure in Paul Harding's "Tinkers" (2009). Besides primarily recounting the two sets of parallel narratives, "Tinkers" also comprises seemingly unrelated fragments such as excerpts from clock repair manuals and diaries. The main stories, however, told…

  17. Aperture-based antihydrogen gravity experiment: Parallel plate geometry

    Science.gov (United States)

    Rocha, J. R.; Hedlof, R. M.; Ordonez, C. A.

    2013-10-01

    An analytical model and a Monte Carlo simulation are presented of an experiment that could be used to determine the direction of the acceleration of antihydrogen due to gravity. The experiment would rely on methods developed by existing antihydrogen research collaborations. The configuration consists of two circular, parallel plates that have an axis of symmetry directed away from the center of the earth. The plates are separated by a small vertical distance, and include one or more pairs of circular barriers that protrude from the upper and lower plates, thereby forming an aperture between the plates. Antihydrogen annihilations that occur just beyond each barrier, within a "shadow" region, are asymmetric on the upper plate relative to the lower plate. The probability for such annihilations is determined for a point, line and spheroidal source of antihydrogen. The production of 100,000 antiatoms is predicted to be necessary for the aperture-based experiment to indicate the direction of free fall acceleration of antimatter, provided that antihydrogen is produced within a sufficiently small antiproton plasma at a temperature of 4 K.

  18. Aperture-based antihydrogen gravity experiment: Parallel plate geometry

    Energy Technology Data Exchange (ETDEWEB)

    Rocha, J. R.; Hedlof, R. M.; Ordonez, C. A. [Department of Physics, University of North Texas, Denton, Texas 76203 (United States)

    2013-10-15

    An analytical model and a Monte Carlo simulation are presented of an experiment that could be used to determine the direction of the acceleration of antihydrogen due to gravity. The experiment would rely on methods developed by existing antihydrogen research collaborations. The configuration consists of two circular, parallel plates that have an axis of symmetry directed away from the center of the earth. The plates are separated by a small vertical distance, and include one or more pairs of circular barriers that protrude from the upper and lower plates, thereby forming an aperture between the plates. Antihydrogen annihilations that occur just beyond each barrier, within a “shadow” region, are asymmetric on the upper plate relative to the lower plate. The probability for such annihilations is determined for a point, line and spheroidal source of antihydrogen. The production of 100,000 antiatoms is predicted to be necessary for the aperture-based experiment to indicate the direction of free fall acceleration of antimatter, provided that antihydrogen is produced within a sufficiently small antiproton plasma at a temperature of 4 K.

  19. Aperture-based antihydrogen gravity experiment: Parallel plate geometry

    Directory of Open Access Journals (Sweden)

    J. R. Rocha

    2013-10-01

    Full Text Available An analytical model and a Monte Carlo simulation are presented of an experiment that could be used to determine the direction of the acceleration of antihydrogen due to gravity. The experiment would rely on methods developed by existing antihydrogen research collaborations. The configuration consists of two circular, parallel plates that have an axis of symmetry directed away from the center of the earth. The plates are separated by a small vertical distance, and include one or more pairs of circular barriers that protrude from the upper and lower plates, thereby forming an aperture between the plates. Antihydrogen annihilations that occur just beyond each barrier, within a “shadow” region, are asymmetric on the upper plate relative to the lower plate. The probability for such annihilations is determined for a point, line and spheroidal source of antihydrogen. The production of 100,000 antiatoms is predicted to be necessary for the aperture-based experiment to indicate the direction of free fall acceleration of antimatter, provided that antihydrogen is produced within a sufficiently small antiproton plasma at a temperature of 4 K.
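The basic geometric idea in these three records — that the sign of the gravitational acceleration biases which plate a ballistic antiatom strikes first — can be illustrated with a toy Monte Carlo. The sketch below is purely illustrative: it ignores the aperture barriers, uses a made-up launch-speed scale chosen so the asymmetry is visible, and is not the collaboration's simulation.

```python
# Toy Monte Carlo: which plate does a ballistic (anti)atom launched from the midplane hit first?
import math, random

def first_hit(vz, a, h):
    """Return +1 (upper plate) or -1 (lower plate) for z(t) = vz*t + 0.5*a*t^2, plates at z = +/-h."""
    best_plate, best_t = 0, math.inf
    for plate, target in ((+1, h), (-1, -h)):
        # Solve 0.5*a*t^2 + vz*t - target = 0 for the smallest positive root.
        disc = vz * vz + 2.0 * a * target
        if disc < 0:
            continue                              # this plate is never reached
        s = math.sqrt(disc)
        for t in ((-vz + s) / a, (-vz - s) / a):
            if 1e-12 < t < best_t:
                best_plate, best_t = plate, t
    return best_plate

def asymmetry(a, h=0.01, v=0.5, n=100_000, seed=1):
    rng = random.Random(seed)
    up = down = 0
    for _ in range(n):
        vz = v * (2.0 * rng.random() - 1.0)       # toy vertical launch velocity from the midplane
        hit = first_hit(vz, a, h)
        up += hit == +1
        down += hit == -1
    return (up - down) / (up + down)

print("gravity pulls down :", asymmetry(a=-9.8))  # negative value -> excess hits on the lower plate
print("gravity pushes up  :", asymmetry(a=+9.8))  # positive value -> excess hits on the upper plate
```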

  20. A Comparison Study on Motion/Force Transmissibility of Two Typical 3-DOF Parallel Manipulators: The Sprint Z3 and A3 Tool Heads

    Directory of Open Access Journals (Sweden)

    Xiang Chen

    2014-01-01

    Full Text Available This paper presents a comparison study of two important three-degree-of-freedom (DOF) parallel manipulators, the Sprint Z3 head and the A3 head, both commonly used in industry. As an initial step, the inverse kinematics are derived and an analysis of two classes of limbs is carried out via screw theory. For comparison, three transmission indices are then defined to describe their motion/force transmission performance. Based on the same main parameters, the compared results reveal some distinct characteristics in addition to the similarities between the two parallel manipulators. To a certain extent, the A3 head outperforms the common Sprint Z3 head, providing a new and satisfactory option for a machine tool head in industry.

  1. The two parallel photocycles of the Chlamydomonas sensory photoreceptor histidine kinase rhodopsin 1.

    Science.gov (United States)

    Luck, Meike; Hegemann, Peter

    2017-10-01

    Histidine kinase rhodopsins (HKRs) belong to a class of unexplored sensory photoreceptors that share a similar modular architecture. The light sensing rhodopsin domain is covalently linked to signal-transducing modules and in some cases to a C-terminal guanylyl-cyclase effector. In spite of their wide distribution in unicellular organisms, very little is known about their physiological role and mechanistic functioning. We investigated the photochemical properties of the recombinant rhodopsin-fragment of Cr-HKR1 originating from Chlamydomonas reinhardtii. Our spectroscopic studies revealed an unusual thermal stability of the photoproducts with the deprotonated retinal Schiff base (RSB). Upon UV-irradiation these states with maximal absorbance in the UVA-region (Rh-UV) photochemically convert to stable blue light absorbing rhodopsin (Rh-Bl) with protonated chromophore. The heterogeneity of the sample is based on two parallel photocycles with the chromophore in the C15=N-syn or C15=N-anti configuration. This report represents an attempt to decipher the underlying reaction schemes and interconversions of the two coexisting photocycles. Copyright © 2017 Elsevier GmbH. All rights reserved.

  2. A qualitative single case study of parallel processes

    DEFF Research Database (Denmark)

    Jacobsen, Claus Haugaard

    2007-01-01

    Parallel process in psychotherapy and supervision is a phenomenon manifest in relationships and interactions, that originates in one setting and is reflected in another. This article presents an explorative single case study of parallel processes based on qualitative analyses of two successive...... randomly chosen psychotherapy sessions with a schizophrenic patient and the supervision session given in between. The author's analysis is verified by an independent examiner's analysis. Parallel processes are identified and described. Reflections on the dynamics of parallel processes and supervisory...

  3. Numerical discrepancy between serial and MPI parallel computations

    Directory of Open Access Journals (Sweden)

    Sang Bong Lee

    2016-09-01

    Full Text Available Numerical simulations of the 1D Burgers equation and a 2D sloshing problem were carried out to study the numerical discrepancy between serial and parallel computations. The numerical domain was decomposed into 2 and 4 subdomains for parallel computations with message passing interface. The numerical solution of the Burgers equation disclosed that the fully explicit boundary conditions used on subdomains of the parallel computation were responsible for the numerical discrepancy of the transient solution between serial and parallel computations. Two-dimensional sloshing problems in a rectangular domain were solved using OpenFOAM. After a lapse of initial transient time, sloshing patterns of water were significantly different in serial and parallel computations although the same numerical conditions were given. Based on the histograms of pressure measured at two points near the wall, the statistical characteristics of the numerical solution were not affected by the number of subdomains as much as the transient solution was.
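The mechanism described above — explicit (lagged) treatment of subdomain interface values producing a transient discrepancy relative to a single coupled solve — can be reproduced with a tiny 1-D implicit diffusion example. This is a generic NumPy illustration under assumed parameters, not the Burgers/OpenFOAM setup of the paper:

```python
# 1-D backward-Euler diffusion: one global solve vs. two subdomains with a lagged interface value.
import numpy as np

def step_implicit(u, left, right, r):
    """One backward-Euler step on interior unknowns with Dirichlet values `left`/`right`."""
    n = u.size
    A = np.zeros((n, n))
    b = u.copy()
    for i in range(n):
        A[i, i] = 1 + 2 * r
        if i > 0:
            A[i, i - 1] = -r
        if i < n - 1:
            A[i, i + 1] = -r
    b[0] += r * left
    b[-1] += r * right
    return np.linalg.solve(A, b)

nx, nt, r = 40, 50, 2.0                       # r = diffusivity * dt / dx^2 (assumed values)
x = np.linspace(0, 1, nx + 2)[1:-1]
u_serial = np.sin(np.pi * x)
u_split = u_serial.copy()
half = nx // 2

for _ in range(nt):
    u_serial = step_implicit(u_serial, 0.0, 0.0, r)
    # "Parallel-like" update: each half uses the neighbour's OLD interface value (explicit coupling).
    uL_old, uR_old = u_split[:half].copy(), u_split[half:].copy()
    uL = step_implicit(uL_old, 0.0, uR_old[0], r)
    uR = step_implicit(uR_old, uL_old[-1], 0.0, r)
    u_split = np.concatenate([uL, uR])

print("max |serial - split| after", nt, "steps:", np.abs(u_serial - u_split).max())
```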

  4. Rapid Screening of Acetylcholinesterase Inhibitors by Effect-Directed Analysis Using LC × LC Fractionation, a High Throughput in Vitro Assay, and Parallel Identification by Time of Flight Mass Spectrometry.

    Science.gov (United States)

    Ouyang, Xiyu; Leonards, Pim E G; Tousova, Zuzana; Slobodnik, Jaroslav; de Boer, Jacob; Lamoree, Marja H

    2016-02-16

    Effect-directed analysis (EDA) is a useful tool to identify bioactive compounds in complex samples. However, identification in EDA is usually challenging, mainly due to the limited separation power of liquid chromatography based fractionation. In this study, comprehensive two-dimensional liquid chromatography (LC × LC) based microfractionation combined with parallel high resolution time of flight (HR-ToF) mass spectrometric detection and a high throughput acetylcholinesterase (AChE) assay was developed. The LC × LC fractionation method was validated using analytical standards, and a C18 and pentafluorophenyl (PFP) stationary phase combination was selected for the two-dimensional separation and fractionation in four 96-well plates. The method was successfully applied to identify AChE inhibitors in a wastewater treatment plant (WWTP) effluent. Good separation orthogonality (>0.9) was achieved and three AChE inhibitors (tiapride, amisulpride, and lamotrigine), used as antipsychotic medicines, were identified and confirmed by two-dimensional retention alignment as well as their AChE inhibition activity.

  5. Stretchable Complementary Split Ring Resonator (CSRR)-Based Radio Frequency (RF) Sensor for Strain Direction and Level Detection

    Directory of Open Access Journals (Sweden)

    Seunghyun Eom

    2016-10-01

    Full Text Available In this paper, we proposed a stretchable radio frequency (RF) sensor to detect strain direction and level. The stretchable sensor is composed of two complementary split ring resonators (CSRRs) with microfluidic channels. In order to achieve stretchability, liquid metal (eutectic gallium-indium, EGaIn) and an Ecoflex substrate are used. Microfluidic channels are built from Ecoflex elastomer and microfluidic channel frames. A three-dimensional (3D) printer is used for fabrication of the microfluidic channel frames. The two CSRR resonators are designed to resonate at 2.03 GHz and 3.68 GHz. When the proposed sensor is stretched from 0 to 8 mm along the +x direction, the resonant frequency is shifted from 3.68 GHz to 3.13 GHz. When the proposed sensor is stretched from 0 to 8 mm along the −x direction, the resonant frequency is shifted from 2.03 GHz to 1.78 GHz. Therefore, we can detect the stretched length and direction from the independent variation of the two resonant frequencies.

  6. Parallel computing for event reconstruction in high-energy physics

    International Nuclear Information System (INIS)

    Wolbers, S.

    1993-01-01

    Parallel computing has been recognized as a solution to large computing problems. In High Energy Physics offline event reconstruction of detector data is a very large computing problem that has been solved with parallel computing techniques. A review of the parallel programming package CPS (Cooperative Processes Software) developed and used at Fermilab for offline reconstruction of Terabytes of data requiring the delivery of hundreds of Vax-Years per experiment is given. The Fermilab UNIX farms, consisting of 180 Silicon Graphics workstations and 144 IBM RS6000 workstations, are used to provide the computing power for the experiments. Fermilab has had a long history of providing production parallel computing starting with the ACP (Advanced Computer Project) Farms in 1986. The Fermilab UNIX Farms have been in production for over 2 years with 24 hour/day service to experimental user groups. Additional tools for management, control and monitoring these large systems will be described. Possible future directions for parallel computing in High Energy Physics will be given

  7. A novel laparoscopic grasper with two parallel jaws capable of extracting the mechanical behaviour of soft tissues.

    Science.gov (United States)

    Nazarynasab, Dariush; Farahmand, Farzam; Mirbagheri, Alireza; Afshari, Elnaz

    2017-07-01

    Data related to the force-deformation behaviour of soft tissue plays an important role in medical/surgical applications such as realistic modelling of the mechanical behaviour of soft tissue, minimally invasive surgery (MIS) and medical diagnosis. While the mechanical behaviour of soft tissue is very complex owing to its different constitutive components, some issues, such as behavioural differences between living and dead tissue, increase this complexity further. Indeed, an adequate quantitative description of the mechanical behaviour of soft tissues requires high quality in vivo experimental data to be obtained and analysed. This paper describes a novel laparoscopic grasper with two parallel jaws capable of obtaining compressive force-deformation data related to the mechanical behaviour of soft tissues. The new laparoscopic grasper includes four sections: mechanical hardware, a sensory part, an electrical/electronic part and a data storage part. Owing to a unique design of the mechanical hardware, data recording conditions are close to unconfined-compression-test conditions, so the obtained data can be properly used in extracting the mechanical behaviour of soft tissues. Another distinguishing feature of the new system is its applicability during different laparoscopic surgeries, and hence its ability to obtain in vivo data. However, more preclinical examinations are needed to evaluate the practicality of the novel laparoscopic grasper with two parallel jaws.

  8. Performance of a Two-Level Call Admission Control Scheme for DS-CDMA Wireless Networks

    Directory of Open Access Journals (Sweden)

    Fapojuwo Abraham O

    2007-01-01

    Full Text Available We propose a two-level call admission control (CAC) scheme for direct sequence code division multiple access (DS-CDMA) wireless networks supporting multimedia traffic and evaluate its performance. The first-level admission control assigns higher priority to real-time calls (also referred to as class 0 calls) in gaining access to the system resources. The second level admits non-real-time calls (or class 1 calls) based on the resources remaining after meeting the resource needs for real-time calls. However, to ensure some minimum level of performance for non-real-time calls, the scheme reserves some resources for such calls. The proposed two-level CAC scheme utilizes the delay-tolerant characteristic of non-real-time calls by incorporating a queue to temporarily store those that cannot be assigned resources at the time of initial access. We analyze and evaluate the call blocking, outage probability, throughput, and average queuing delay performance of the proposed two-level CAC scheme using Markov chain theory. The analytic results are validated by simulation results. The numerical results show that the proposed two-level CAC scheme provides better performance than the single-level CAC scheme. Based on these results, it is concluded that the proposed two-level CAC scheme serves as a good solution for supporting multimedia applications in DS-CDMA wireless communication systems.

  9. Performance of a Two-Level Call Admission Control Scheme for DS-CDMA Wireless Networks

    Directory of Open Access Journals (Sweden)

    Abraham O. Fapojuwo

    2007-11-01

    Full Text Available We propose a two-level call admission control (CAC) scheme for direct sequence code division multiple access (DS-CDMA) wireless networks supporting multimedia traffic and evaluate its performance. The first-level admission control assigns higher priority to real-time calls (also referred to as class 0 calls) in gaining access to the system resources. The second level admits non-real-time calls (or class 1 calls) based on the resources remaining after meeting the resource needs for real-time calls. However, to ensure some minimum level of performance for non-real-time calls, the scheme reserves some resources for such calls. The proposed two-level CAC scheme utilizes the delay-tolerant characteristic of non-real-time calls by incorporating a queue to temporarily store those that cannot be assigned resources at the time of initial access. We analyze and evaluate the call blocking, outage probability, throughput, and average queuing delay performance of the proposed two-level CAC scheme using Markov chain theory. The analytic results are validated by simulation results. The numerical results show that the proposed two-level CAC scheme provides better performance than the single-level CAC scheme. Based on these results, it is concluded that the proposed two-level CAC scheme serves as a good solution for supporting multimedia applications in DS-CDMA wireless communication systems.
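A minimal sketch of the admission logic described in these records (illustrative Python only; the channel counts and reservation rule are assumptions, and this is not the Markov-chain model used for the analysis): class 0 calls get first claim on channels up to the class 1 reservation, class 1 calls take whatever remains, and rejected class 1 calls wait in a finite queue.

```python
# Toy two-level call admission control: class 0 (real-time) prioritized over class 1 (non-real-time).
from collections import deque

class TwoLevelCAC:
    def __init__(self, channels, reserved_for_class1, queue_size):
        self.total = channels
        self.reserved1 = reserved_for_class1      # minimum resources kept available for class 1
        self.in_use = {0: 0, 1: 0}
        self.queue = deque(maxlen=queue_size)     # delay-tolerant class 1 calls wait here

    def request(self, cls):
        free = self.total - self.in_use[0] - self.in_use[1]
        if cls == 0:
            # Level 1: real-time calls may use everything except the (remaining) class-1 reservation.
            still_reserved = self.reserved1 - min(self.reserved1, self.in_use[1])
            if free > still_reserved:
                self.in_use[0] += 1
                return "admitted"
            return "blocked"
        # Level 2: non-real-time calls use whatever remains, otherwise join the queue.
        if free > 0:
            self.in_use[1] += 1
            return "admitted"
        if len(self.queue) < (self.queue.maxlen or 0):
            self.queue.append("call")
            return "queued"
        return "blocked"

    def release(self, cls):
        self.in_use[cls] -= 1
        if self.queue and self.total - self.in_use[0] - self.in_use[1] > 0:
            self.queue.popleft()                  # a queued class-1 call is now served
            self.in_use[1] += 1

cac = TwoLevelCAC(channels=10, reserved_for_class1=2, queue_size=5)
print([cac.request(0) for _ in range(9)])         # class 0 fills up to 10 - 2 = 8, ninth is blocked
print(cac.request(1), cac.request(1), cac.request(1))   # two admitted, third queued
```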

  10. 3-dimensional magnetotelluric inversion including topography using deformed hexahedral edge finite elements and direct solvers parallelized on symmetric multiprocessor computers - Part II: direct data-space inverse solution

    Science.gov (United States)

    Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.

    2016-01-01

    Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ˜1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.

  11. An Algorithm for Parallel Sn Sweeps on Unstructured Meshes

    International Nuclear Information System (INIS)

    Pautz, Shawn D.

    2002-01-01

    A new algorithm for performing parallel Sn sweeps on unstructured meshes is developed. The algorithm uses a low-complexity list ordering heuristic to determine a sweep ordering on any partitioned mesh. For typical problems and with 'normal' mesh partitionings, nearly linear speedups on up to 126 processors are observed. This is an important and desirable result, since although analyses of structured meshes indicate that parallel sweeps will not scale with normal partitioning approaches, no severe asymptotic degradation in the parallel efficiency is observed with modest (≤100) levels of parallelism. This result is a fundamental step in the development of efficient parallel Sn methods.

  12. Effect of parallel magnetic field on repetitively unipolar nanosecond pulsed dielectric barrier discharge under different pulse repetition frequencies

    Science.gov (United States)

    Liu, Yidi; Yan, Huijie; Guo, Hongfei; Fan, Zhihui; Wang, Yuying; Wu, Yun; Ren, Chunsheng

    2018-03-01

    A magnetic field, with the direction parallel to the electric field, is applied to the repetitively unipolar positive nanosecond pulsed dielectric barrier discharge. The effect of the parallel magnetic field on the plasma generated between two parallel-plate electrodes in quiescent air is experimentally studied under different pulse repetition frequencies (PRFs). It is indicated that only the current pulse in the rising front of the voltage pulse occurs, and the value of the current is increased by the parallel magnetic field under different PRFs. The discharge uniformity is improved with the decrease in PRF, and this phenomenon is also observed in the discharge with the parallel magnetic field. By using the line-ratio technique of optical emission spectra, it is found that the average electron density and electron temperature under the considered PRFs are both increased when the parallel magnetic field is applied. The incremental degree of average electron density is basically the same under the considered PRFs, while the incremental degree of electron temperature under the higher-PRFs is larger than that under the lower-PRFs. All the above phenomena are explained by the effect of parallel magnetic field on diffusion and dissipation of electrons.

  13. A Scalable Parallel PWTD-Accelerated SIE Solver for Analyzing Transient Scattering from Electrically Large Objects

    KAUST Repository

    Liu, Yang; Yucel, Abdulkadir; Bagci, Hakan; Michielssen, Eric

    2015-01-01

    of processors by leveraging two mechanisms: (i) a hierarchical parallelization strategy to evenly distribute the computation and memory loads at all levels of the PWTD tree among processors, and (ii) a novel asynchronous communication scheme to reduce the cost

  14. Optimisation of a parallel ocean general circulation model

    Science.gov (United States)

    Beare, M. I.; Stevens, D. P.

    1997-10-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  15. Domain decomposition method of stochastic PDEs: a two-level scalable preconditioner

    International Nuclear Information System (INIS)

    Subber, Waad; Sarkar, Abhijit

    2012-01-01

    For uncertainty quantification in many practical engineering problems, the stochastic finite element method (SFEM) may be computationally challenging. In SFEM, the size of the algebraic linear system grows rapidly with the spatial mesh resolution and the order of the stochastic dimension. In this paper, we describe a non-overlapping domain decomposition method, namely the iterative substructuring method, to tackle the large-scale linear system arising in the SFEM. The SFEM is based on domain decomposition in the geometric space and a polynomial chaos expansion in the probabilistic space. In particular, a two-level scalable preconditioner is proposed for the iterative solver of the interface problem for the stochastic systems. The preconditioner is equipped with a coarse problem which globally connects the subdomains both in the geometric and probabilistic spaces via their corner nodes. This coarse problem propagates the information quickly across the subdomains, leading to a scalable preconditioner. For numerical illustrations, a two-dimensional stochastic elliptic partial differential equation (SPDE) with spatially varying non-Gaussian random coefficients is considered. The numerical scalability of the preconditioner is investigated with respect to the mesh size, subdomain size, fixed problem size per subdomain and order of polynomial chaos expansion. The numerical experiments are performed on a Linux cluster using MPI and PETSc parallel libraries.
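For orientation, a generic two-level preconditioner of this kind can be written in additive form as a sum of local subdomain solves plus a coarse solve assembled from the corner (coarse) degrees of freedom; the expression below is the standard textbook template, not necessarily the exact operator used by the authors for the stochastic interface problem.

```latex
% Generic two-level additive preconditioner: N_s local solves plus one coarse correction.
M^{-1} \;=\; R_0^{\mathsf{T}} A_0^{-1} R_0 \;+\; \sum_{i=1}^{N_s} R_i^{\mathsf{T}} A_i^{-1} R_i,
\qquad A_0 \;=\; R_0 A R_0^{\mathsf{T}}
```

Here R_i restricts to subdomain i, R_0 restricts to the coarse (corner-node) space, and the coarse term is what propagates information globally across subdomains, which is the source of the scalability discussed above.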

  16. Differences Between Distributed and Parallel Systems

    Energy Technology Data Exchange (ETDEWEB)

    Brightwell, R.; Maccabe, A.B.; Rissen, R.

    1998-10-01

    Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.

  17. Spacecraft attitude maneuver control using two parallel mounted 3-DOF spherical actuators

    Directory of Open Access Journals (Sweden)

    Guidan Li

    2017-02-01

    Full Text Available A parallel configuration using two 3-degree-of-freedom (3-DOF) spherical electromagnetic momentum exchange actuators is investigated for large angle spacecraft attitude maneuvers. First, the full dynamic equations of motion for the spacecraft system are derived by the Newton-Euler method. To facilitate computation, virtual gimbal coordinate frames are established. Second, a nonlinear control law in terms of quaternions is developed via the backstepping method. The proposed control law compensates the coupling torques arising from the spacecraft rotation, and is robust against external disturbances. Then, the singularity problem is analyzed. To avoid singularities, a modified weighted Moore-Penrose pseudoinverse velocity steering law based on null motion is proposed. The weighting matrices are carefully designed to switch the actuators and redistribute the control torques. The null motion is used to reorient the rotor away from the tilt angle saturation state. Finally, numerical simulations of rest-to-rest maneuvers are performed to validate the effectiveness of the proposed method.

  18. Empirical valence bond models for reactive potential energy surfaces: a parallel multilevel genetic program approach.

    Science.gov (United States)

    Bellucci, Michael A; Coker, David F

    2011-07-28

    We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increases the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency are tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in the gas phase and in protic solvent. © 2011 American Institute of Physics
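The two-level organization can be caricatured in a few lines of Python (a toy illustration of the hierarchy only, with a made-up test objective; the real PMLGP evolves function forms and operator probabilities, and runs its lower-level populations in parallel rather than in a loop as done here): lower-level populations search the parameter space, and a higher level re-tunes each population's mutation rate according to its recent progress.

```python
# Toy two-level GA: lower-level populations search a test objective,
# a higher level re-tunes each population's mutation scale from its progress.
import random

def fitness(x):                               # made-up objective: maximize -sum((x - 1)^2)
    return -sum((xi - 1.0) ** 2 for xi in x)

def evolve(pop, mut_rate, rng, gens=20):
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: len(pop) // 2]
        children = []
        while len(children) < len(pop) - len(parents):
            a, b = rng.sample(parents, 2)
            children.append([(ai + bi) / 2 + rng.gauss(0, mut_rate) for ai, bi in zip(a, b)])
        pop = parents + children
    return pop

def two_level_ga(n_pops=4, dim=5, outer_rounds=5, seed=0):
    rng = random.Random(seed)
    pops = [[[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(20)] for _ in range(n_pops)]
    mut_rates = [0.5] * n_pops
    for _ in range(outer_rounds):
        before = [fitness(max(p, key=fitness)) for p in pops]
        pops = [evolve(p, m, rng) for p, m in zip(pops, mut_rates)]      # lower level
        after = [fitness(max(p, key=fitness)) for p in pops]
        # Higher level (crude stand-in for the HLGP): keep the mutation scale where it still
        # yields progress, shrink it where the population has stalled.
        mut_rates = [m if a - b > 1e-3 else m * 0.5 for m, b, a in zip(mut_rates, before, after)]
    return max((max(p, key=fitness) for p in pops), key=fitness)

best = two_level_ga()
print("best point:", [round(v, 3) for v in best], "fitness:", round(fitness(best), 4))
```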

  19. Identification of Arbitrary Zonation in Groundwater Parameters using the Level Set Method and a Parallel Genetic Algorithm

    Science.gov (United States)

    Lei, H.; Lu, Z.; Vesselinov, V. V.; Ye, M.

    2017-12-01

    Simultaneous identification of both the zonation structure of aquifer heterogeneity and the hydrogeological parameters associated with these zones is challenging, especially for complex subsurface heterogeneity fields. In this study, a new approach, based on the combination of the level set method and a parallel genetic algorithm is proposed. Starting with an initial guess for the zonation field (including both zonation structure and the hydraulic properties of each zone), the level set method ensures that material interfaces are evolved through the inverse process such that the total residual between the simulated and observed state variables (hydraulic head) always decreases, which means that the inversion result depends on the initial guess field and the minimization process might fail if it encounters a local minimum. To find the global minimum, the genetic algorithm (GA) is utilized to explore the parameters that define initial guess fields, and the minimal total residual corresponding to each initial guess field is considered as the fitness function value in the GA. Due to the expensive evaluation of the fitness function, a parallel GA is adapted in combination with a simulated annealing algorithm. The new approach has been applied to several synthetic cases in both steady-state and transient flow fields, including a case with real flow conditions at the chromium contaminant site at the Los Alamos National Laboratory. The results show that this approach is capable of identifying the arbitrary zonation structures of aquifer heterogeneity and the hydrogeological parameters associated with these zones effectively.

  20. Parallel segmented outlet flow high performance liquid chromatography with multiplexed detection

    International Nuclear Information System (INIS)

    Camenzuli, Michelle; Terry, Jessica M.; Shalliker, R. Andrew; Conlan, Xavier A.; Barnett, Neil W.; Francis, Paul S.

    2013-01-01

    Graphical abstract: -- Highlights: •Multiplexed detection for liquid chromatography. •‘Parallel segmented outlet flow’ distributes inner and outer portions of the analyte zone. •Three detectors were used simultaneously for the determination of opiate alkaloids. -- Abstract: We describe a new approach to multiplex detection for HPLC, exploiting parallel segmented outlet flow – a new column technology that provides pressure-regulated control of eluate flow through multiple outlet channels, which minimises the additional dead volume associated with conventional post-column flow splitting. Using three detectors: one UV-absorbance and two chemiluminescence systems (tris(2,2′-bipyridine)ruthenium(III) and permanganate), we examine the relative responses for six opium poppy (Papaver somniferum) alkaloids under conventional and multiplexed conditions, where approximately 30% of the eluate was distributed to each detector and the remaining solution directed to a collection vessel. The parallel segmented outlet flow mode of operation offers advantages in terms of solvent consumption, waste generation, total analysis time and solute band volume when applying multiple detectors to HPLC, but the manner in which each detection system is influenced by changes in solute concentration and solution flow rates must be carefully considered

  1. Parallel segmented outlet flow high performance liquid chromatography with multiplexed detection

    Energy Technology Data Exchange (ETDEWEB)

    Camenzuli, Michelle [Australian Centre for Research on Separation Science (ACROSS), School of Science and Health, University of Western Sydney (Parramatta), Sydney, NSW (Australia); Terry, Jessica M. [Centre for Chemistry and Biotechnology, School of Life and Environmental Sciences, Deakin University, Geelong, Victoria 3216 (Australia); Shalliker, R. Andrew, E-mail: r.shalliker@uws.edu.au [Australian Centre for Research on Separation Science (ACROSS), School of Science and Health, University of Western Sydney (Parramatta), Sydney, NSW (Australia); Conlan, Xavier A.; Barnett, Neil W. [Centre for Chemistry and Biotechnology, School of Life and Environmental Sciences, Deakin University, Geelong, Victoria 3216 (Australia); Francis, Paul S., E-mail: paul.francis@deakin.edu.au [Centre for Chemistry and Biotechnology, School of Life and Environmental Sciences, Deakin University, Geelong, Victoria 3216 (Australia)

    2013-11-25

    Graphical abstract: -- Highlights: •Multiplexed detection for liquid chromatography. •‘Parallel segmented outlet flow’ distributes inner and outer portions of the analyte zone. •Three detectors were used simultaneously for the determination of opiate alkaloids. -- Abstract: We describe a new approach to multiplex detection for HPLC, exploiting parallel segmented outlet flow – a new column technology that provides pressure-regulated control of eluate flow through multiple outlet channels, which minimises the additional dead volume associated with conventional post-column flow splitting. Using three detectors: one UV-absorbance and two chemiluminescence systems (tris(2,2′-bipyridine)ruthenium(III) and permanganate), we examine the relative responses for six opium poppy (Papaver somniferum) alkaloids under conventional and multiplexed conditions, where approximately 30% of the eluate was distributed to each detector and the remaining solution directed to a collection vessel. The parallel segmented outlet flow mode of operation offers advantages in terms of solvent consumption, waste generation, total analysis time and solute band volume when applying multiple detectors to HPLC, but the manner in which each detection system is influenced by changes in solute concentration and solution flow rates must be carefully considered.

  2. Functional Parallel Factor Analysis for Functions of One- and Two-dimensional Arguments.

    Science.gov (United States)

    Choi, Ji Yeh; Hwang, Heungsun; Timmerman, Marieke E

    2018-03-01

    Parallel factor analysis (PARAFAC) is a useful multivariate method for decomposing three-way data that consist of three different types of entities simultaneously. This method estimates trilinear components, each of which is a low-dimensional representation of a set of entities, often called a mode, to explain the maximum variance of the data. Functional PARAFAC permits the entities in different modes to be smooth functions or curves, varying over a continuum, rather than a collection of unconnected responses. The existing functional PARAFAC methods handle functions of a one-dimensional argument (e.g., time) only. In this paper, we propose a new extension of functional PARAFAC for handling three-way data whose responses are sequenced along both a two-dimensional domain (e.g., a plane with x- and y-axis coordinates) and a one-dimensional argument. Technically, the proposed method combines PARAFAC with basis function expansion approximations, using a set of piecewise quadratic finite element basis functions for estimating two-dimensional smooth functions and a set of one-dimensional basis functions for estimating one-dimensional smooth functions. In a simulation study, the proposed method appeared to outperform the conventional PARAFAC. We apply the method to EEG data to demonstrate its empirical usefulness.

  3. Adapting algorithms to massively parallel hardware

    CERN Document Server

    Sioulas, Panagiotis

    2016-01-01

    In the recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming in which applications are programmed to exploit the power provided by multi-cores. Usually there is gain in terms of the time-to-solution and the memory footprint. Specifically, this trend has sparked an interest towards massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in the GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.

  4. Using Direct Sub-Level Entity Access to Improve Nuclear Stockpile Simulation Modeling

    Energy Technology Data Exchange (ETDEWEB)

    Parker, Robert Y. [Brigham Young Univ., Provo, UT (United States)]

    1999-08-01

    Direct sub-level entity access is a seldom-used technique in discrete-event simulation modeling that addresses the accessibility of sub-level entity information. The technique has significant advantages over more common, alternative modeling methods--especially where hierarchical entity structures are modeled. As such, direct sub-level entity access is often preferable in modeling nuclear stockpile, life-extension issues, an area to which it has not been previously applied. Current nuclear stockpile, life-extension models were demonstrated to benefit greatly from the advantages of direct sub-level entity access. In specific cases, the application of the technique resulted in models that were up to 10 times faster than functionally equivalent models where alternative techniques were applied. Furthermore, specific implementations of direct sub-level entity access were observed to be more flexible, efficient, functional, and scalable than corresponding implementations using common modeling techniques. Common modeling techniques (''unbatch/batch'' and ''attribute-copying'') proved inefficient and cumbersome in handling many nuclear stockpile modeling complexities, including multiple weapon sites, true defect analysis, and large numbers of weapon and subsystem types. While significant effort was required to enable direct sub-level entity access in the nuclear stockpile simulation models, the enhancements were worth the effort--resulting in more efficient, more capable, and more informative models that effectively addressed the complexities of the nuclear stockpile.

  5. Many-Objective Particle Swarm Optimization Using Two-Stage Strategy and Parallel Cell Coordinate System.

    Science.gov (United States)

    Hu, Wang; Yen, Gary G; Luo, Guangchun

    2017-06-01

    It is a daunting challenge to balance the convergence and diversity of an approximate Pareto front in a many-objective optimization evolutionary algorithm. A novel algorithm, named many-objective particle swarm optimization with the two-stage strategy and parallel cell coordinate system (PCCS), is proposed in this paper to improve the comprehensive performance in terms of the convergence and diversity. In the proposed two-stage strategy, the convergence and diversity are separately emphasized at different stages by a single-objective optimizer and a many-objective optimizer, respectively. A PCCS is exploited to manage the diversity, such as maintaining a diverse archive, identifying the dominance resistant solutions, and selecting the diversified solutions. In addition, a leader group is used for selecting the global best solutions to balance the exploitation and exploration of a population. The experimental results illustrate that the proposed algorithm outperforms six chosen state-of-the-art designs in terms of the inverted generational distance and hypervolume over the DTLZ test suite.

  6. Parallel-propagated frame along null geodesics in higher-dimensional black hole spacetimes

    International Nuclear Information System (INIS)

    Kubiznak, David; Frolov, Valeri P.; Connell, Patrick; Krtous, Pavel

    2009-01-01

    In [arXiv:0803.3259] the equations describing the parallel transport of orthonormal frames along timelike (spacelike) geodesics in a spacetime admitting a nondegenerate principal conformal Killing-Yano 2-form h were solved. The construction employed is based on studying the Darboux subspaces of the 2-form F obtained as a projection of h along the geodesic trajectory. In this paper we demonstrate that, although slightly modified, a similar construction is possible also in the case of null geodesics. In particular, we explicitly construct the parallel-transported frames along null geodesics in D=4, 5, 6 Kerr-NUT-(A)dS spacetimes. We further discuss the parallel transport along principal null directions in these spacetimes. Such directions coincide with the eigenvectors of the principal conformal Killing-Yano tensor. Finally, we show how to obtain a parallel-transported frame along null geodesics in the background of the 4D Plebanski-Demianski metric which admits only a conformal generalization of the Killing-Yano tensor.

  7. Spatial data analytics on heterogeneous multi- and many-core parallel architectures using python

    Science.gov (United States)

    Laura, Jason R.; Rey, Sergio J.

    2017-01-01

    Parallel vector spatial analysis concerns the application of parallel computational methods to facilitate vector-based spatial analysis. The history of parallel computation in spatial analysis is reviewed, and this work is placed into the broader context of high-performance computing (HPC) and parallelization research. The rise of cyber infrastructure and its manifestation in spatial analysis as CyberGIScience is seen as a main driver of renewed interest in parallel computation in the spatial sciences. Key problems in spatial analysis that have been the focus of parallel computing are covered. Chief among these are spatial optimization problems, computational geometric problems including polygonization and spatial contiguity detection, the use of Monte Carlo Markov chain simulation in spatial statistics, and parallel implementations of spatial econometric methods. Future directions for research on parallelization in computational spatial analysis are outlined.

  8. Current Trends in Numerical Simulation for Parallel Engineering Environments New Directions and Work-in-Progress

    International Nuclear Information System (INIS)

    Trinitis, C; Schulz, M

    2006-01-01

    In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Remarkable progress in CPU architecture, system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are paralleled by progress in parallel algorithms, simulation techniques, and software integration from multiple disciplines. ParSim brings together researchers from both application disciplines and computer science and aims at fostering closer cooperation between these fields. Since its successful introduction in 2002, ParSim has established itself as an integral part of the EuroPVM/MPI conference series. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a short turn-around time. This offers a unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal surrounding for ParSim. This combination enables the participants to present and discuss their work within the scope of both the session and the host conference. This year, eleven papers from authors in nine countries were submitted to ParSim, and we selected five of them. They cover a wide range of different application fields including gas flow simulations, thermo-mechanical processes in nuclear waste storage, and cosmological simulations. At the same time, the selected contributions also address the computer science side of their codes and discuss different parallelization strategies, programming models and languages, as well as the use of nonblocking collective operations in MPI. We are confident that this provides an attractive program and that ParSim will be an informal setting for lively discussions and for fostering new

  9. Operation and Control of a Direct-Driven PMSG-Based Wind Turbine System with an Auxiliary Parallel Grid-Side Converter

    Directory of Open Access Journals (Sweden)

    Jiawei Chu

    2013-07-01

    Full Text Available In this paper, based on the similarity, in structure and principle, between a grid-connected converter for a direct-driven permanent magnet synchronous generator (D-PMSG and an active power filter (APF, a new D-PMSG-based wind turbine (WT system configuration that includes not only an auxiliary converter in parallel with the grid-side converter, but also a coordinated control strategy, is proposed to enhance the low voltage ride through (LVRT capability and improve power quality. During normal operation, the main grid-side converter maintains the DC-link voltage constant, whereas the auxiliary grid-side converter functions as an APF with harmonic suppression and reactive power compensation to improve the power quality. During grid faults, a hierarchical coordinated control scheme for the generator-side converter, main grid-side converter and auxiliary grid-side converter, depending on the grid voltage sags, is presented to enhance the LVRT capability of the direct-driven PMSG WT. The feasibility and the effectiveness of the proposed system’s topology and hierarchical coordinated control strategy were verified using MATLAB/Simulink simulations.

  10. Discrete Hadamard transformation algorithm's parallelism analysis and achievement

    Science.gov (United States)

    Hu, Hui

    2009-07-01

    The Discrete Hadamard Transformation (DHT) is widely used in real-time signal processing, but its throughput is limited by the operation speed of a single DSP. This article investigates parallel formulations of the DHT and analyses their parallel performance. Based on the programming structure of the TMS320C80 multiprocessor platform, two kinds of parallel DHT algorithms are implemented, and several experiments demonstrate the effectiveness of the proposed algorithms.
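
    The article's TMS320C80 implementations are not reproduced here, but a minimal fast Walsh-Hadamard transform in Python shows where the parallelism in the DHT lies: within one stage the butterfly updates act on disjoint index pairs and are independent of each other, while the stages themselves must run in order. This is an illustrative serial reference, not either of the article's two parallel algorithms.

        import numpy as np

        def fwht(x):
            """Fast Walsh-Hadamard transform of a length-2^n vector (serial reference)."""
            x = np.asarray(x, dtype=float).copy()
            n = x.size
            assert n & (n - 1) == 0, "length must be a power of two"
            h = 1
            while h < n:                                   # log2(n) sequential stages
                for start in range(0, n, 2 * h):           # blocks within a stage are independent
                    a = x[start:start + h].copy()
                    b = x[start + h:start + 2 * h].copy()
                    x[start:start + h] = a + b
                    x[start + h:start + 2 * h] = a - b
                h *= 2
            return x

        print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))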

  11. Parallel plate detectors

    International Nuclear Information System (INIS)

    Gardes, D.; Volkov, P.

    1981-01-01

    Two parallel plate avalanche counters (PPACs), a 5×3 cm² one (timing only) and a 15×5 cm² one (timing and position), are considered. The theory of operation and timing resolution is given. The measurement set-up and the curves of experimental results illustrate the possibilities of the two counters [fr

  12. New strategy for eliminating zero-sequence circulating current between parallel operating three-level NPC voltage source inverters

    DEFF Research Database (Denmark)

    Li, Kai; Dong, Zhenhua; Wang, Xiaodong

    2018-01-01

    A novel strategy based on a zero common mode voltage pulse-width modulation (ZCMV-PWM) technique and zero-sequence circulating current (ZSCC) feedback control is proposed in this study to eliminate ZSCCs between three-level neutral point clamped (NPC) voltage source inverters, with common AC and DC......, the ZCMV-PWM method is presented to reduce CMVs, and a simple electric circuit is adopted to control ZSCCs and neutral point potential. Finally, simulation and experiment are conducted to illustrate effectiveness of the proposed strategy. Results show that ZSCCs between paralleled inverters can...

  13. Current distribution characteristics of superconducting parallel circuits

    International Nuclear Information System (INIS)

    Mori, K.; Suzuki, Y.; Hara, N.; Kitamura, M.; Tominaka, T.

    1994-01-01

    In order to increase the current carrying capacity of the current path of the superconducting magnet system, the portion of parallel circuits such as insulated multi-strand cables or parallel persistent current switches (PCS) are made. In superconducting parallel circuits of an insulated multi-strand cable or a parallel persistent current switch (PCS), the current distribution during the current sweep, the persistent mode, and the quench process were investigated. In order to measure the current distribution, two methods were used. (1) Each strand was surrounded with a pure iron core with the air gap. In the air gap, a Hall probe was located. The accuracy of this method was deteriorated by the magnetic hysteresis of iron. (2) The Rogowski coil without iron was used for the current measurement of each path in a 4-parallel PCS. As a result, it was shown that the current distribution characteristics of a parallel PCS is very similar to that of an insulated multi-strand cable for the quench process

  14. Diverse Assessment and Active Student Engagement Sustain Deep Learning: A Comparative Study of Outcomes in Two Parallel Introductory Biochemistry Courses

    Science.gov (United States)

    Bevan, Samantha J.; Chan, Cecilia W. L.; Tanner, Julian A.

    2014-01-01

    Although there is increasing evidence for a relationship between courses that emphasize student engagement and achievement of student deep learning, there is a paucity of quantitative comparative studies in a biochemistry and molecular biology context. Here, we present a pedagogical study in two contrasting parallel biochemistry introductory…

  15. Construction of a digital elevation model: methods and parallelization

    International Nuclear Information System (INIS)

    Mazzoni, Christophe

    1995-01-01

    The aim of this work is to reduce the computation time needed to produce Digital Elevation Models (DEM) by using a parallel machine. It was carried out in collaboration between the French 'Institut Geographique National' (IGN) and the Laboratoire d'Electronique de Technologie et d'Instrumentation (LETI) of the French Atomic Energy Commission (CEA). The IGN has developed a system that provides the DEMs used to produce topographic maps. The kernel of this system is the correlator, a piece of software that automatically matches pairs of homologous points in a stereo pair of photographs. The correlator is, however, expensive in computing time. In order to reduce the computation time while producing DEMs with the same accuracy as the current system, we have parallelized the IGN's correlator on the OPENVISION system. This hardware solution uses SYMPATI-2, a SIMD (Single Instruction, Multiple Data) parallel machine developed by the LETI, which works on parallel architectures and image processing. Our analysis of the implementation demonstrated the difficulty of efficiently coupling scalar and parallel structures, and we propose solutions to reinforce this coupling. To accelerate the processing further, we evaluate SYMPHONIE, a SIMD computer that succeeds SYMPATI-2. We also developed a multi-agent approach for which a MIMD (Multiple Instruction, Multiple Data) architecture is available. Finally, we describe a Multi-SIMD architecture that reconciles our two approaches. This architecture can handle multi-level image processing efficiently; it is flexible thanks to its modularity, and its communication network provides the reliability required by sensitive systems. (author) [fr

  16. Parallel and non-parallel laminar mixed convection flow in an inclined tube: The effect of the boundary conditions

    International Nuclear Information System (INIS)

    Barletta, A.

    2008-01-01

    The necessary condition for the onset of parallel flow in the fully developed region of an inclined duct is applied to the case of a circular tube. Parallel flow in inclined ducts is an uncommon regime, since in most cases buoyancy tends to produce the onset of secondary flow. The present study shows how proper thermal boundary conditions may preserve parallel flow regime. Mixed convection flow is studied for a special non-axisymmetric thermal boundary condition that, with a proper choice of a switch parameter, may be compatible with parallel flow. More precisely, a circumferentially variable heat flux distribution is prescribed on the tube wall, expressed as a sinusoidal function of the azimuthal coordinate θ with period 2π. A π/2 rotation in the position of the maximum heat flux, achieved by setting the switch parameter, may allow or not the existence of parallel flow. Two cases are considered corresponding to parallel and non-parallel flow. In the first case, the governing balance equations allow a simple analytical solution. On the contrary, in the second case, the local balance equations are solved numerically by employing a finite element method

  17. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  18. Parallel rendering

    Science.gov (United States)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
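
    As one concrete example of the ideas surveyed (chosen for illustration, not taken from the article), a sort-last compositing step can be sketched in a few lines of Python: each processor renders its share of the scene into colour and depth buffers, and the final image is assembled by a per-pixel depth test across the partial images.

        import numpy as np

        def depth_composite(colors, depths):
            """Sort-last compositing: keep, per pixel, the colour with the smallest depth.

            colors: (P, H, W, 3) partial colour buffers, one per processor
            depths: (P, H, W)    matching depth buffers (np.inf where nothing was drawn)
            """
            nearest = np.argmin(depths, axis=0)          # (H, W) index of the winning processor
            h, w = np.indices(nearest.shape)
            return colors[nearest, h, w]                 # (H, W, 3) composited image

        # Two partial renders of a tiny 2 x 2 image; processor 1 has the nearer fragment.
        colors = np.zeros((2, 2, 2, 3))
        depths = np.full((2, 2, 2), np.inf)
        colors[0, 0, 0] = [1, 0, 0]; depths[0, 0, 0] = 1.0
        colors[1, 0, 0] = [0, 0, 1]; depths[1, 0, 0] = 0.5
        print(depth_composite(colors, depths)[0, 0])     # -> [0. 0. 1.]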

  19. Parallel Application Performance on Two Generations of Intel Xeon HPC Platforms

    Energy Technology Data Exchange (ETDEWEB)

    Chang, Christopher H.; Long, Hai; Sides, Scott; Vaidhynathan, Deepthi; Jones, Wesley

    2015-10-15

    Two next-generation node configurations hosting the Haswell microarchitecture were tested with a suite of microbenchmarks and application examples, and compared with a current Ivy Bridge production node on NREL's Peregrine high-performance computing cluster. A primary conclusion from this study is that the additional cores are of little value to individual task performance--limitations to application parallelism, or resource contention among concurrently running but independent tasks, limits effective utilization of these added cores. Hyperthreading generally impacts throughput negatively, but can improve performance in the absence of detailed attention to runtime workflow configuration. The observations offer some guidance to procurement of future HPC systems at NREL. First, raw core count must be balanced with available resources, particularly memory bandwidth. Balance-of-system will determine value more than processor capability alone. Second, hyperthreading continues to be largely irrelevant to the workloads that are commonly seen, and were tested here, at NREL. Finally, perhaps the most impactful enhancement to productivity might occur through enabling multiple concurrent jobs per node. Given the right type and size of workload, more may be achieved by doing many slow things at once, than fast things in order.

  20. Optimisation of a parallel ocean general circulation model

    Directory of Open Access Journals (Sweden)

    M. I. Beare

    1997-10-01

    Full Text Available This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  2. Development of design technology on thermal-hydraulic performance in tight-lattice rod bundle. 4. Large paralleled simulation by the advanced two-fluid model code

    International Nuclear Information System (INIS)

    Misawa, Takeharu; Yoshida, Hiroyuki; Akimoto, Hajime

    2008-01-01

    In the Japan Atomic Energy Agency (JAEA), the Innovative Water Reactor for Flexible Fuel Cycle (FLWR) has been developed. For the thermal design of the FLWR, an analytical method to predict its boiling transition is needed. JAEA has been developing the three-dimensional two-fluid model analysis code ACE-3D, which adopts a boundary-fitted coordinate system to simulate flow in complex-shaped channels. In this paper, as part of the development of ACE-3D for rod-bundle analysis, the introduction of parallelization into ACE-3D and assessments of the code are described. In the analysis of a large-scale domain such as a rod bundle, even the two-fluid model requires a computational cost that exceeds the memory available to a single CPU. Parallelization was therefore introduced into ACE-3D to distribute the data for such large-scale domains among many CPUs, and it is confirmed that the analysis of a large-scale domain such as a rod bundle can be performed by parallel computation while maintaining parallel performance, even when a large number of CPUs is used. ACE-3D adopts two-phase flow models, some of which depend on the channel geometry. Analyses of domains simulating an individual subchannel and a 37-rod bundle are therefore performed and compared with experiments. The results of both analyses with ACE-3D agree qualitatively with past experimental results. (author)

  3. Research on Parallel Three Phase PWM Converters base on RTDS

    Science.gov (United States)

    Xia, Yan; Zou, Jianxiao; Li, Kai; Liu, Jingbo; Tian, Jun

    2018-01-01

    Parallel operation of converters can increase the capacity of a system, but it may give rise to a zero-sequence circulating current, so controlling this current is an important goal in the design of parallel inverters. In this paper, the Real Time Digital Simulator (RTDS) is used to model the parallel converter system in real time and to study suppression of the circulating current. An equivalent model of two parallel converters and of the zero-sequence circulating current (ZSCC) is established and analysed, and a strategy using variable zero-vector control is then proposed to suppress the circulating current. For two parallel modular converters, a hardware-in-the-loop (HIL) study based on RTDS and a practical experiment were carried out; the results show that the proposed control strategy is feasible and effective.

  4. A Computer Simulation of the System-Wide Effects of Parallel-Offset Route Maneuvers

    Science.gov (United States)

    Lauderdale, Todd A.; Santiago, Confesor; Pankok, Carl

    2010-01-01

    Most aircraft managed by air-traffic controllers in the National Airspace System are capable of flying parallel-offset routes. This paper presents the results of two related studies on the effects of increased use of offset routes as a conflict resolution maneuver. The first study analyzes offset routes in the context of all standard resolution types which air-traffic controllers currently use. This study shows that by utilizing parallel-offset route maneuvers, significant system-wide savings in delay due to conflict resolution of up to 30% are possible. It also shows that most offset resolutions replace horizontal-vectoring resolutions. The second study builds on the results of the first and directly compares offset resolutions and standard horizontal-vectoring maneuvers to determine that in-trail conflicts are often more efficiently resolved by offset maneuvers.

  5. Visualization of biomedical image data and irradiation planning using a parallel computing system

    International Nuclear Information System (INIS)

    Lehrig, R.

    1991-01-01

    The contribution explains the development of a novel, low-cost workstation for the processing of biomedical tomographic data sequences. The workstation was to allow both graphical display of the data and implementation of modelling software for irradiation planning, especially for calculation of dose distributions on the basis of the measured tomogram data. The system developed according to these criteria is a parallel computing system which performs secondary, two-dimensional image reconstructions irrespective of the imaging direction of the original tomographic scans. Three-dimensional image reconstructions can be generated from any direction of view, with random selection of sections of the scanned object. (orig./MM) With 69 figs., 2 tabs [de

  6. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed.Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  7. A parallel approach to the stable marriage problem

    DEFF Research Database (Denmark)

    Larsen, Jesper

    1997-01-01

    This paper describes two parallel algorithms for the stable marriage problem implemented on a MIMD parallel computer. The algorithms are tested against sequential algorithms on randomly generated and worst-case instances. The results clearly show that the combination of a very simple problem ... and a commercial MIMD system results in parallel algorithms which are not competitive with sequential algorithms with respect to practical performance.
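
    The paper's two parallel algorithms are not reproduced here; for reference, the sequential baseline they compete against is the Gale-Shapley algorithm, sketched below with made-up preference lists.

        def gale_shapley(men_prefs, women_prefs):
            """Sequential Gale-Shapley: men propose, women tentatively keep their best offer."""
            n = len(men_prefs)
            rank = [{m: r for r, m in enumerate(prefs)} for prefs in women_prefs]
            next_choice = [0] * n          # index of the next woman each man will propose to
            fiance = [None] * n            # fiance[w] = man currently engaged to woman w
            free_men = list(range(n))
            while free_men:
                m = free_men.pop()
                w = men_prefs[m][next_choice[m]]
                next_choice[m] += 1
                if fiance[w] is None:
                    fiance[w] = m
                elif rank[w][m] < rank[w][fiance[w]]:
                    free_men.append(fiance[w])
                    fiance[w] = m
                else:
                    free_men.append(m)
            return {w: m for w, m in enumerate(fiance)}

        men = [[0, 1, 2], [1, 0, 2], [0, 2, 1]]       # illustrative preference lists
        women = [[1, 0, 2], [0, 1, 2], [2, 1, 0]]
        print(gale_shapley(men, women))               # {0: 0, 1: 1, 2: 2}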

  8. Massively parallel multicanonical simulations

    Science.gov (United States)

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10⁴ parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.

  9. Two interacting spins in external fields. Four-level systems

    Energy Technology Data Exchange (ETDEWEB)

    Bagrov, V.G.; Baldiotti, M.C.; Gitman, D.M. [Instituto de Fisica, Universidade de Sao Paulo, Caixa Postal 66318-CEP, 05315-970 Sao Paulo, S.P. (Brazil)]; Levin, A.D. [Dexter Research Center (United States)]

    2007-04-15

    In the present article, we consider the so-called two-spin equation that describes four-level quantum systems. These systems have recently attracted attention due to their relation to the problem of quantum computation. We study general properties of the two-spin equation and show that the problem for certain external backgrounds can be identified with the problem of one spin in an appropriate background. This allows one to generate a number of exact solutions for two-spin equations on the basis of already known exact solutions of the one-spin equation. Besides, we present some exact solutions for the two-spin equation with an external background different for each spin but having the same direction. We study the eigenvalue problem for a time-independent spin interaction and a time-independent external background. A possible analogue of the Rabi problem for the two-spin equation is defined. We present its exact solution and demonstrate the existence of magnetic resonances at two specific frequencies, one of them coinciding with the Rabi frequency and the other depending on the rotating field magnitude. The resonance that corresponds to the second frequency is suppressed with respect to the first one. (Abstract Copyright [2007], Wiley Periodicals, Inc.)
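
    For orientation, the standard one-spin Rabi problem that the abstract refers to can be summarised as follows (a textbook statement, not the paper's two-spin solution): a spin with level splitting omega_0, driven by a transverse field of amplitude omega_1 rotating at frequency omega, flips with the probability given by the Rabi formula.

        % Textbook one-spin Rabi problem, for orientation only.
        \[
          H(t) = \frac{\hbar\omega_0}{2}\,\sigma_z
               + \frac{\hbar\omega_1}{2}\left(\sigma_x\cos\omega t + \sigma_y\sin\omega t\right),
        \qquad
          P_{\mathrm{flip}}(t) = \frac{\omega_1^{2}}{\Omega_R^{2}}\,
               \sin^{2}\!\left(\frac{\Omega_R t}{2}\right),
        \qquad
          \Omega_R = \sqrt{\omega_1^{2}+(\omega-\omega_0)^{2}}.
        \]

    The single resonance sits at omega = omega_0; the result reported above is that the two-spin (four-level) analogue exhibits a second resonance frequency that depends on the rotating-field magnitude.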

  10. Broadcasting a message in a parallel computer

    Science.gov (United States)

    Berg, Jeremy E [Rochester, MN]; Faraj, Ahmad A [Rochester, MN]

    2011-08-02

    Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point-to-point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
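
    The patent constructs a Hamiltonian path through the compute nodes of a plane of the torus network. Purely as a rough analogue (not the patented implementation), a pipeline broadcast along a ready-made path of MPI ranks can be sketched with mpi4py: each node receives the message from its predecessor on the path and forwards it to its successor.

        # Rough analogue only: ranks 0..size-1 stand in for the Hamiltonian path;
        # the patent's construction of the path in a multi-dimensional torus is not shown.
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        size = comm.Get_size()

        if rank == 0:                      # the logical root injects the message
            message = {"payload": "broadcast along the path"}
        else:                              # every other node waits for its predecessor
            message = comm.recv(source=rank - 1, tag=7)

        if rank < size - 1:                # forward to the successor, if any
            comm.send(message, dest=rank + 1, tag=7)

        print(f"rank {rank} received: {message}")

    Run with, for example, mpiexec -n 8 python path_broadcast.py (the file name is arbitrary).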

  11. Plane parallel radiance transport for global illumination in vegetation

    Energy Technology Data Exchange (ETDEWEB)

    Max, N.; Mobley, C.; Keating, B.; Wu, E.H.

    1997-01-05

    This paper applies plane parallel radiance transport techniques to scattering from vegetation. The leaves, stems, and branches are represented as a volume density of scattering surfaces, depending only on height and the vertical component of the surface normal. Ordinary differential equations are written for the multiply scattered radiance as a function of the height above the ground, with the sky radiance and ground reflectance as boundary conditions. They are solved using a two-pass integration scheme to unify the two-point boundary conditions, and Fourier series for the dependence on the azimuthal angle. The resulting radiance distribution is used to precompute diffuse and specular 'ambient' shading tables, as a function of height and surface normal, to be used in rendering, together with a z-buffer shadow algorithm for direct solar illumination.

  12. Parallel supercomputing: Advanced methods, algorithms, and software for large-scale linear and nonlinear problems

    Energy Technology Data Exchange (ETDEWEB)

    Carey, G.F.; Young, D.M.

    1993-12-31

    The program outlined here is directed to research on methods, algorithms, and software for distributed parallel supercomputers. Of particular interest are finite element methods and finite difference methods together with sparse iterative solution schemes for scientific and engineering computations of very large-scale systems. Both linear and nonlinear problems will be investigated. In the nonlinear case, applications with bifurcation to multiple solutions will be considered using continuation strategies. The parallelizable numerical methods of particular interest are a family of partitioning schemes embracing domain decomposition, element-by-element strategies, and multi-level techniques. The methods will be further developed incorporating parallel iterative solution algorithms with associated preconditioners in parallel computer software. The schemes will be implemented on distributed memory parallel architectures such as the CRAY MPP, Intel Paragon, the NCUBE3, and the Connection Machine. We will also consider other new architectures such as the Kendall-Square (KSQ) and proposed machines such as the TERA. The applications will focus on large-scale three-dimensional nonlinear flow and reservoir problems with strong convective transport contributions. These are legitimate grand challenge class computational fluid dynamics (CFD) problems of significant practical interest to DOE. The methods developed and algorithms will, however, be of wider interest.

  13. Parallel sorting algorithms

    CERN Document Server

    Akl, Selim G

    1985-01-01

    Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms for architectures such as linear arrays, mesh-connected computers, and cube-connected computers. Another setting where the algorithms can be applied is the shared-memory SIMD (single instruction stream, multiple data stream) computer, in which the whole sequence to be sorted can fit in the
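
    As a taste of the algorithms such a text covers, odd-even transposition sort for a linear array of processors can be sketched as follows; within each phase all compare-exchanges act on disjoint pairs, so on a linear array they can all be performed simultaneously. This is a generic textbook sketch, not code from the book.

        def odd_even_transposition_sort(a):
            """Odd-even transposition sort: n phases of independent compare-exchanges."""
            a = list(a)
            n = len(a)
            for phase in range(n):
                start = phase % 2                  # even phases: pairs (0,1),(2,3),...; odd: (1,2),(3,4),...
                for i in range(start, n - 1, 2):   # these pairs are disjoint, hence parallel on a linear array
                    if a[i] > a[i + 1]:
                        a[i], a[i + 1] = a[i + 1], a[i]
            return a

        print(odd_even_transposition_sort([5, 3, 8, 1, 9, 2]))   # [1, 2, 3, 5, 8, 9]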

  14. High-energy physics software parallelization using database techniques

    International Nuclear Information System (INIS)

    Argante, E.; Van der Stok, P.D.V.; Willers, I.

    1997-01-01

    A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradigm, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI. (orig.)

  15. Parallel processing of structural integrity analysis codes

    International Nuclear Information System (INIS)

    Swami Prasad, P.; Dutta, B.K.; Kushwaha, H.S.

    1996-01-01

    Structural integrity analysis plays an important role in assessing and demonstrating the safety of nuclear reactor components. This analysis is performed using analytical tools such as the Finite Element Method (FEM) with the help of digital computers. The complexity of the problems involved in nuclear engineering demands high-speed computation facilities to obtain solutions in a reasonable amount of time. Parallel processing systems such as ANUPAM provide an efficient platform for realising such high-speed computation. The development and implementation of software on parallel processing systems is an interesting and challenging task. The data and algorithm structure of the codes plays an important role in exploiting the capabilities of a parallel processing system. Structural analysis codes based on FEM can be divided into two categories with respect to their implementation on parallel processing systems. The first category, such as codes used for harmonic analysis and mechanistic fuel performance codes, does not require parallelisation of the individual modules. The second category, such as conventional FEM codes, requires parallelisation of individual modules, and there the parallelisation of the equation-solution module poses the major difficulty. Different solution schemes such as the domain decomposition method (DDM), the parallel active column solver and the substructuring method are currently used on parallel processing systems. Two codes, FAIR and TABS, belonging to each of these categories, have been implemented on ANUPAM. The implementation details of these codes and the performance of different equation solvers are highlighted. (author). 5 refs., 12 figs., 1 tab

  16. Large-Scale Parallel Viscous Flow Computations using an Unstructured Multigrid Algorithm

    Science.gov (United States)

    Mavriplis, Dimitri J.

    1999-01-01

    The development and testing of a parallel unstructured agglomeration multigrid algorithm for steady-state aerodynamic flows is discussed. The agglomeration multigrid strategy uses a graph algorithm to construct the coarse multigrid levels from the given fine grid, similar to an algebraic multigrid approach, but operates directly on the non-linear system using the FAS (Full Approximation Scheme) approach. The scalability and convergence rate of the multigrid algorithm are examined on the SGI Origin 2000 and the Cray T3E. An argument is given which indicates that the asymptotic scalability of the multigrid algorithm should be similar to that of its underlying single grid smoothing scheme. For medium size problems involving several million grid points, near perfect scalability is obtained for the single grid algorithm, while only a slight drop-off in parallel efficiency is observed for the multigrid V- and W-cycles, using up to 128 processors on the SGI Origin 2000, and up to 512 processors on the Cray T3E. For a large problem using 25 million grid points, good scalability is observed for the multigrid algorithm using up to 1450 processors on a Cray T3E, even when the coarsest grid level contains fewer points than the total number of processors.

  17. Parallel evolutionary computation in bioinformatics applications.

    Science.gov (United States)

    Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

    2013-05-01

    A large number of optimization problems within the field of Bioinformatics require methods able to handle its inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on the easiness of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure its environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  18. HAM solutions on MHD squeezing axisymmetric flow of water nanofluid through saturated porous medium between two parallel disks

    Science.gov (United States)

    Reddy, B. Siva Kumar; Rao, K. V. Surya Narayana; Vijaya, R. Bhuvana

    2017-07-01

    In this paper, we have considered the unsteady magnetohydrodynamic squeezing axisymmetric flow of a water nanofluid through a saturated porous medium between two parallel disks. The governing flow equations are solved by the Galerkin optimal homotopy asymptotic method. The effects of the non-dimensional parameters on velocity, temperature and concentration are discussed with the help of graphs. The local Nusselt number is also obtained and discussed computationally with reference to the flow parameters.

  19. Two NP-hardness results for preemptive minsum scheduling of unrelated parallel machines

    NARCIS (Netherlands)

    Sitters, R.A.; Aardal, K.; Gerards, B.

    2001-01-01

    We show that the problems of minimizing total completion time and of minimizing the number of late jobs on unrelated parallel machines, when preemption is allowed, are both NP-hard in the strong sense. The former result settles a long-standing open question.

  20. The BLAZE language - A parallel language for scientific programming

    Science.gov (United States)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  1. The BLAZE language: A parallel language for scientific programming

    Science.gov (United States)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.

  2. Aspects of two-level systems under external time-dependent fields

    Energy Technology Data Exchange (ETDEWEB)

    Bagrov, V.G.; Wreszinski, W.F. [Tomsk State University and Tomsk Institute of High Current Electronics (Russian Federation)]; Barata, J.C.A.; Gitman, D.M. [Universidade de Sao Paulo, Instituto de Fisica (Brazil)]. E-mails: jbarata@fma.if.usp.br; gitman@fma.if.usp.br

    2001-12-14

    The dynamics of two-level systems in time-dependent backgrounds is under consideration. We present some new exact solutions in special backgrounds decaying in time. On the other hand, following ideas of Feynman et al, we discuss in detail the possibility of reducing the quantum dynamics to a classical Hamiltonian system. This, in particular, opens the possibility of directly applying powerful methods of classical mechanics (e.g. KAM methods) to study the quantum system. Following such an approach, we draw conclusions of relevance for 'quantum chaos' when the external background is periodic or quasi-periodic in time. (author)

  3. Parallel programming practical aspects, models and current limitations

    CERN Document Server

    Tarkov, Mikhail S

    2014-01-01

    Parallel programming is designed for the use of parallel computer systems for solving time-consuming problems that cannot be solved on a sequential computer in a reasonable time. These problems can be divided into two classes: (1) processing of large data arrays (including processing images and signals in real time), and (2) simulation of complex physical processes and chemical reactions. For each of these classes, prospective methods are designed for solving problems. For data processing, one of the most promising technologies is the use of artificial neural networks. The particle-in-cell method and cellular automata are very useful for simulation. Problems of scalability of parallel algorithms and the transfer of existing parallel programs to future parallel computers are very acute now. An important task is to optimize the use of the equipment (including the CPU cache) of parallel computers. Along with parallelizing information processing, it is essential to ensure the processing reliability by the relevant organization ...

  4. A comprehensive study of task coalescing for selecting parallelism granularity in a two-stage bidiagonal reduction

    KAUST Repository

    Haidar, Azzam

    2012-05-01

    We present new high performance numerical kernels combined with advanced optimization techniques that significantly increase the performance of parallel bidiagonal reduction. Our approach is based on developing efficient fine-grained computational tasks as well as reducing overheads associated with their high-level scheduling during the so-called bulge chasing procedure that is an essential phase of a scalable bidiagonalization procedure. In essence, we coalesce multiple tasks in a way that reduces the time needed to switch execution context between the scheduler and useful computational tasks. At the same time, we maintain the crucial information about the tasks and their data dependencies between the coalescing groups. This is the necessary condition to preserve numerical correctness of the computation. We show our annihilation strategy based on multiple applications of single orthogonal reflectors. Despite non-trivial characteristics in computational complexity and memory access patterns, our optimization approach smoothly applies to the annihilation scenario. The coalescing positively influences another equally important aspect of the bulge chasing stage: the memory reuse. For the tasks within the coalescing groups, the data is retained in high levels of the cache hierarchy and, as a consequence, operations that are normally memory-bound increase their ratio of computation to off-chip communication and become compute-bound which renders them amenable to efficient execution on multicore architectures. The performance for the new two-stage bidiagonal reduction is staggering. Our implementation results in up to 50-fold and 12-fold improvement (∼130 Gflop/s) compared to the equivalent routines from LAPACK V3.2 and Intel MKL V10.3, respectively, on an eight socket hexa-core AMD Opteron multicore shared-memory system with a matrix size of 24000 x 24000. Last but not least, we provide a comprehensive study on the impact of the coalescing group size in terms of cache

  5. A Coupling Tool for Parallel Molecular Dynamics-Continuum Simulations

    KAUST Repository

    Neumann, Philipp

    2012-06-01

    We present a tool for coupling Molecular Dynamics and continuum solvers. It is written in C++ and is meant to support the developers of hybrid molecular - continuum simulations in terms of both realisation of the respective coupling algorithm as well as parallel execution of the hybrid simulation. We describe the implementational concept of the tool and its parallel extensions. We particularly focus on the parallel execution of particle insertions into dense molecular systems and propose a respective parallel algorithm. Our implementations are validated for serial and parallel setups in two and three dimensions. © 2012 IEEE.

  6. Personal Level Customer Orientation in Russian Direct Selling Market

    Directory of Open Access Journals (Sweden)

    Alexander Rozhkov

    2014-06-01

    Full Text Available In the modern world, the importance of customer orientation cannot be overstated. It strongly affects overall business performance, as well as specific areas of business-customer interaction. In this paper, we examine the role of personal-level relations and customer orientation in the direct selling industry in the Russian market. Based on a sample of over 6000 participants in 74 regions of Russia, we develop a model revealing the factors that define the level of customer orientation in personal-level interactions.

  7. Enhancing Scalability of Sparse Direct Methods

    International Nuclear Information System (INIS)

    Li, Xiaoye S.; Demmel, James; Grigori, Laura; Gu, Ming; Xia, Jianlin; Jardin, Steve; Sovinec, Carl; Lee, Lie-Quan

    2007-01-01

    TOPS is providing high-performance, scalable sparse direct solvers, which have had significant impacts on the SciDAC applications, including fusion simulation (CEMM), accelerator modeling (COMPASS), as well as many other mission-critical applications in DOE and elsewhere. Our recent developments have been focusing on new techniques to overcome scalability bottleneck of direct methods, in both time and memory. These include parallelizing symbolic analysis phase and developing linear-complexity sparse factorization methods. The new techniques will make sparse direct methods more widely usable in large 3D simulations on highly-parallel petascale computers

  8. Programming parallel architectures - The BLAZE family of languages

    Science.gov (United States)

    Mehrotra, Piyush

    1989-01-01

    This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.

  9. Parallel and Distributed Systems for Probabilistic Reasoning

    Science.gov (United States)

    2012-12-01

    Ranganathan "et"al...typically a random permutation over the vertices. Advances by Elidan et al. [2006] and Ranganathan et al. [2007] have focused on dynamic asynchronous...Wildfire algorithm shown in Alg. 3.6 is a direct parallelization of the algorithm proposed by [ Ranganathan et al., 2007]. The Wildfire algorithm

  10. Kemari: A Portable High Performance Fortran System for Distributed Memory Parallel Processors

    Directory of Open Access Journals (Sweden)

    T. Kamachi

    1997-01-01

    Full Text Available We have developed a compilation system which extends High Performance Fortran (HPF) in various aspects. We support the parallelization of well-structured problems with loop distribution and alignment directives similar to HPF's data distribution directives. Such directives give both additional control to the user and simplify the compilation process. For the support of unstructured problems, we provide directives for dynamic data distribution through user-defined mappings. The compiler also allows integration of message-passing interface (MPI) primitives. The system is part of a complete programming environment which also comprises a parallel debugger and a performance monitor and analyzer. After an overview of the compiler, we describe the language extensions and related compilation mechanisms in detail. Performance measurements demonstrate the compiler's applicability to a variety of application classes.

  11. Coherent transport in a system of periodic linear chain of quantum dots situated between two parallel quantum wires

    International Nuclear Information System (INIS)

    Petrosyan, Lyudvig S

    2016-01-01

    We study coherent transport in a system consisting of a periodic linear chain of quantum dots situated between two parallel quantum wires. We show that the resonant-tunneling conductance between the wires exhibits a Rabi splitting of the resonance peak as a function of the Fermi energy in the wires. This effect is an electron-transport analogue of the Rabi splitting in the optical spectra of two interacting systems. The conductance peak splitting originates from the anticrossing of Bloch bands in a periodic system, which is caused by strong coupling between the electron states in the quantum dot chain and the quantum wires. (paper)

  12. Variation in efficiency of parallel algorithms. [for study of stiffness matrices in planar trusses

    Science.gov (United States)

    Hayashi, A.; Melosh, R. J.; Utku, S.; Salama, M.

    1985-01-01

    The objective of the present study is to investigate the efficiency of several iterative parallel-processor linear-equation-solving algorithms for analyses of typical linear engineering systems. Attention is given to a set of n linear equations, Ku = p, where K = an n x n positive definite, sparsely populated, symmetric matrix, u = an n x 1 vector of unknown responses, and p = an n x 1 vector of prescribed constants. This study is concerned with a hybrid method in which iteration is used to solve the problem, while a direct method is used on the local processor level. Variations in the efficiency of parallel algorithms are explored. Measures of the efficiency are based on computer experiments regarding the algorithms. For all the algorithms, the wall clock time is found to decrease as the number of processors increases.
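
    The hybrid scheme described here, iterating globally while solving directly at the local processor level, corresponds to a block-Jacobi iteration. Below is a serial sketch with arbitrary block sizes; each diagonal-block solve stands in for the direct solve that would run on one processor, and the block updates within one sweep are independent of each other.

        import numpy as np

        def block_jacobi(K, p, block_size, n_iter=200):
            """Block-Jacobi iteration for K u = p with a direct solve per diagonal block."""
            n = K.shape[0]
            blocks = [slice(i, min(i + block_size, n)) for i in range(0, n, block_size)]
            u = np.zeros(n)
            for _ in range(n_iter):
                u_new = u.copy()
                for b in blocks:                             # independent: one block per processor
                    r = p[b] - K[b, :] @ u + K[b, b] @ u[b]
                    u_new[b] = np.linalg.solve(K[b, b], r)   # local direct solve
                u = u_new
            return u

        # Illustrative symmetric, strictly diagonally dominant system (so the iteration converges).
        rng = np.random.default_rng(0)
        n = 12
        K = rng.random((n, n)); K = K + K.T + 2 * n * np.eye(n)
        p = rng.random(n)
        u = block_jacobi(K, p, block_size=4)
        print(np.linalg.norm(K @ u - p))                     # small residual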

  13. Functional Parallel Factor Analysis for Functions of One- and Two-dimensional Arguments

    NARCIS (Netherlands)

    Choi, Ji Yeh; Hwang, Heungsun; Timmerman, Marieke

    Parallel factor analysis (PARAFAC) is a useful multivariate method for decomposing three-way data that consist of three different types of entities simultaneously. This method estimates trilinear components, each of which is a low-dimensional representation of a set of entities, often called a mode,

  14. Fencing data transfers in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Blocksome, Michael A.; Mamidala, Amith R.

    2015-06-02

    Fencing data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task; the compute nodes coupled for data communications through the PAMI and through data communications resources including at least one segment of shared random access memory; including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers through a segment of shared memory; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.

  15. Parallel multigrid smoothing: polynomial versus Gauss-Seidel

    International Nuclear Information System (INIS)

    Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

    2003-01-01

    Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines
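
    The appeal of the polynomial smoothers discussed in this record is that each step needs only a matrix-vector product plus vector updates, with none of the ordering dependencies of Gauss-Seidel. The sketch below is the textbook Chebyshev iteration for an SPD matrix with assumed eigenvalue bounds, not the authors' multilevel-specific polynomial; used as a multigrid smoother, one would typically supply bounds covering only the upper part of the spectrum.

        import numpy as np

        def chebyshev_iteration(A, b, x, lam_min, lam_max, n_steps=40):
            """Chebyshev iteration for SPD A with eigenvalue bounds [lam_min, lam_max]."""
            theta = 0.5 * (lam_max + lam_min)
            delta = 0.5 * (lam_max - lam_min)
            sigma = theta / delta
            rho = 1.0 / sigma
            r = b - A @ x                     # only SpMV and vector updates: easy to parallelize
            d = r / theta
            for _ in range(n_steps):
                x = x + d
                r = r - A @ d
                rho_new = 1.0 / (2.0 * sigma - rho)
                d = rho_new * rho * d + (2.0 * rho_new / delta) * r
                rho = rho_new
            return x

        # 1-D Poisson matrix, whose eigenvalues 4 sin^2(k pi / (2(n+1))) are known exactly.
        n = 50
        A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
        lam = 4.0 * np.sin(np.pi * np.arange(1, n + 1) / (2 * (n + 1))) ** 2
        b = np.ones(n)
        x = chebyshev_iteration(A, b, np.zeros(n), lam.min(), lam.max(), n_steps=40)
        print(np.linalg.norm(b - A @ x))      # shrinks further as n_steps grows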

  16. Parallel multigrid smoothing: polynomial versus Gauss-Seidel

    Science.gov (United States)

    Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

    2003-07-01

    Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
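
    The appeal of polynomial smoothing is that each step needs only matrix-vector products, which distribute naturally across processors, whereas Gauss-Seidel's multiplicative updates serialize. Below is a sketch of a standard Chebyshev smoother for a symmetric positive definite matrix (the classical three-term recurrence); the eigenvalue bound and the smoothing interval [lam_max/30, lam_max] are common heuristics assumed here, and the record's multilevel-specific polynomial is not reproduced.

        import numpy as np

        def chebyshev_smooth(A, b, x, lam_max, lam_min, degree=3):
            # Chebyshev iteration targeting eigenvalues in [lam_min, lam_max];
            # only matrix-vector products with A are required.
            theta = 0.5 * (lam_max + lam_min)
            delta = 0.5 * (lam_max - lam_min)
            sigma = theta / delta
            rho = 1.0 / sigma
            r = b - A @ x
            d = r / theta
            for _ in range(degree):
                x = x + d
                r = r - A @ d
                rho_new = 1.0 / (2.0 * sigma - rho)
                d = rho_new * rho * d + (2.0 * rho_new / delta) * r
                rho = rho_new
            return x

        # smooth a random error for the 1-D Poisson matrix (eigenvalues lie in (0, 4))
        n = 64
        A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
        lam_max = 4.0
        e = np.random.rand(n)
        e_smooth = chebyshev_smooth(A, np.zeros(n), e, lam_max, lam_max / 30.0)
        print(np.linalg.norm(A @ e), np.linalg.norm(A @ e_smooth))   # high-frequency content is damped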

  17. Iteration schemes for parallelizing models of superconductivity

    Energy Technology Data Exchange (ETDEWEB)

    Gray, P.A. [Michigan State Univ., East Lansing, MI (United States)]

    1996-12-31

    The time dependent Lawrence-Doniach model, valid for high fields and high values of the Ginzburg-Landau parameter, is often used for studying vortex dynamics in layered high-T_c superconductors. When solving these equations numerically, the added degrees of complexity due to the coupling and nonlinearity of the model often warrant the use of high-performance computers for their solution. However, the interdependence between the layers can be manipulated so as to allow parallelization of the computations at an individual layer level. The reduced parallel tasks may then be solved independently using a heterogeneous cluster of networked workstations connected together with Parallel Virtual Machine (PVM) software. Here, this parallelization of the model is discussed and several computational implementations of varying degrees of parallelism are presented. Computational results are also given which contrast properties of convergence speed, stability, and consistency of these implementations. Included in these results are models involving the motion of vortices due to an applied current and pinning effects due to various material properties.
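
    The layer-level decoupling can be sketched as follows: the inter-layer coupling is frozen at the previous iterate so that each layer's subproblem can be solved independently and in parallel. The per-layer solve below is a linear toy stand-in, not the Lawrence-Doniach equations, and Python threads stand in for the PVM workstation cluster of the original work.

        import numpy as np
        from concurrent.futures import ThreadPoolExecutor

        n_layers, n, coupling = 8, 50, 0.1
        A = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # toy per-layer operator
        f = np.ones(n)
        zero = np.zeros(n)

        def layer_solve(u_below, u_above):
            # local solve with the neighbouring layers frozen at the old iterate
            return np.linalg.solve(A, f + coupling * (u_below + u_above))

        u = [np.zeros(n) for _ in range(n_layers)]
        with ThreadPoolExecutor() as pool:
            for sweep in range(40):                              # outer (Jacobi-like) iteration
                nbrs = [(u[l - 1] if l > 0 else zero,
                         u[l + 1] if l < n_layers - 1 else zero)
                        for l in range(n_layers)]
                u = list(pool.map(lambda nb: layer_solve(*nb), nbrs))
        print(np.linalg.norm(u[0]))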

  18. Simulation Exploration through Immersive Parallel Planes

    Energy Technology Data Exchange (ETDEWEB)

    Brunhart-Lupo, Nicholas J [National Renewable Energy Laboratory (NREL), Golden, CO (United States)]; Bush, Brian W [National Renewable Energy Laboratory (NREL), Golden, CO (United States)]; Gruchalla, Kenny M [National Renewable Energy Laboratory (NREL), Golden, CO (United States)]; Smith, Steve [Los Alamos Visualization Associates]

    2017-05-25

    We present a visualization-driven simulation system that tightly couples systems-dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic into an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest, and a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selections, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
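
    The core mapping can be pictured with a flat stand-in for the immersive environment (assuming matplotlib is available): pairs of variables become planes stacked along one axis, and each observation becomes a polyline through its (x, y) coordinates on successive planes.

        import numpy as np
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(1)
        data = rng.random((30, 6))                 # 30 observations, 6 variables
        pairs = [(0, 1), (2, 3), (4, 5)]           # variable pairs, one plane each
        planes = np.arange(len(pairs))

        fig = plt.figure()
        ax = fig.add_subplot(projection='3d')
        for obs in data:                           # one polyline per observation
            xs = [obs[i] for i, _ in pairs]
            ys = [obs[j] for _, j in pairs]
            ax.plot(planes, xs, ys, alpha=0.4)
        ax.set_xlabel('plane index')
        ax.set_ylabel('first variable of pair')
        ax.set_zlabel('second variable of pair')
        plt.show()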

  19. SWAMP+: multiple subsequence alignment using associative massive parallelism

    Energy Technology Data Exchange (ETDEWEB)

    Steinfadt, Shannon Irene [Los Alamos National Laboratory]; Baker, Johnnie W [Kent State Univ.]

    2010-10-18

    A new parallel algorithm, SWAMP+, incorporates Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that extends traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.
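
    For reference, the recurrence being parallelized is the standard Smith-Waterman local-alignment dynamic program; its anti-diagonal cells are mutually independent, which is what yields the O(m+n) wavefront time with m processing elements. The serial sketch below computes only the best local score; the ASC associative-parallel mapping and the multiple non-overlapping alignments of SWAMP+ are not reproduced.

        def smith_waterman_score(s, t, match=2, mismatch=-1, gap=-1):
            # H[i][j] = best local-alignment score ending at s[i-1], t[j-1]
            H = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
            best = 0
            for i in range(1, len(s) + 1):
                for j in range(1, len(t) + 1):
                    diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
                    H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
                    best = max(best, H[i][j])
            return best

        print(smith_waterman_score("ACACACTA", "AGCACACA"))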

  20. Pthreads vs MPI Parallel Performance of Angular-Domain Decomposed S_n

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Barnett, D.A.

    2000-01-01

    Two programming models for parallelizing the Angular Domain Decomposition (ADD) of the discrete ordinates (S_n) approximation of the neutron transport equation are examined: the shared-memory model based on the POSIX threads (Pthreads) standard, and the message-passing model based on the Message Passing Interface (MPI) standard. These standard libraries are available on most multiprocessor platforms, making the resulting parallel codes widely portable. The question is: on a fixed platform, and for a particular code solving a given test problem, which of the two programming models delivers better parallel performance? Such a comparison is possible on Symmetric Multi-Processor (SMP) architectures, in which several CPUs physically share a common memory and, in addition, are capable of emulating message-passing functionality. Implementation of the two-dimensional S_n Arbitrarily High Order Transport (AHOT) code for solving neutron transport problems using these two parallelization models is described. Measured parallel performance of each model on the COMPAQ AlphaServer 8400 and the SGI Origin 2000 platforms is presented, and the observed speedups of the two programming models are compared. For the case presented in this paper, the MPI implementation appears to scale better than the Pthreads implementation on both platforms.
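
    The angular decomposition itself can be sketched as follows (assuming mpi4py is available): each rank owns a subset of the discrete-ordinates directions, sweeps them independently, and the scalar flux is recovered with a global reduction. The per-direction "sweep" here is a toy 1-D attenuation march, not the AHOT transport kernel, and a Pthreads version would replace the reduction with a shared accumulator.

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        n_dirs, n_cells = 16, 100
        mu, w = np.polynomial.legendre.leggauss(n_dirs)      # ordinates and weights
        my_dirs = range(rank, n_dirs, size)                  # round-robin angular ownership

        def sweep(mu_m, n_cells, sigma_t=1.0, source=1.0, dx=0.1):
            # toy per-direction sweep: upwind march of the angular flux across the cells
            psi = np.zeros(n_cells)
            incoming = 0.0
            for i in (range(n_cells) if mu_m > 0 else reversed(range(n_cells))):
                incoming = (source * dx + abs(mu_m) * incoming) / (abs(mu_m) + sigma_t * dx)
                psi[i] = incoming
            return psi

        phi_local = np.zeros(n_cells)
        for m in my_dirs:                                    # quadrature sum over owned angles
            phi_local += w[m] * sweep(mu[m], n_cells)

        phi = np.empty_like(phi_local)
        comm.Allreduce(phi_local, phi, op=MPI.SUM)           # combine angular contributions
        if rank == 0:
            print(phi[:5])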