WorldWideScience

Sample records for massively parallel machines

  1. Optimal evaluation of array expressions on massively parallel machines

    Science.gov (United States)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert; Teng, Shang-Hua

    1992-01-01

    We investigate the problem of evaluating FORTRAN 90 style array expressions on massively parallel distributed-memory machines. On such machines, an elementwise operation can be performed in constant time for arrays whose corresponding elements are in the same processor. If the arrays are not aligned in this manner, the cost of aligning them is part of the cost of evaluating the expression. The choice of where to perform the operation then affects this cost. We present algorithms based on dynamic programming to solve this problem efficiently for a wide variety of interconnection schemes, including multidimensional grids and rings, hypercubes, and fat-trees. We also consider expressions containing operations that change the shape of the arrays, and show that our approach extends naturally to handle this case.
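    The dynamic-programming idea is easy to sketch. The toy below is an illustration only, under an assumed cost model (a hypothetical 4-node ring where realignment cost is ring distance), not the paper's algorithm: for each node of a small expression tree, it chooses an evaluation site minimizing total realignment cost.

```python
# Sketch of alignment-aware expression evaluation via dynamic programming.
# Assumptions: a 4-node ring; moving an array between sites costs the ring
# distance; leaves have fixed alignments; ops may be evaluated anywhere.

LOCATIONS = range(4)

def move_cost(src, dst):
    """Toy cost of realigning an array between two sites on a 4-node ring."""
    d = abs(src - dst)
    return min(d, 4 - d)

class Node:
    def __init__(self, loc=None, children=()):
        self.loc = loc            # fixed site for a leaf array, None for ops
        self.children = children

def best_cost(node, target):
    """Minimum total cost to deliver node's value to site `target`."""
    if node.loc is not None:      # leaf: just pay the move
        return move_cost(node.loc, target)
    # elementwise op: evaluate at the site minimizing child costs
    # plus the cost of shipping the result to `target`
    return min(
        sum(best_cost(c, s) for c in node.children) + move_cost(s, target)
        for s in LOCATIONS
    )

# (A + B) * C with A aligned at site 0, B at 3, C at 1; result wanted at 0.
expr = Node(children=(Node(children=(Node(loc=0), Node(loc=3))), Node(loc=1)))
print(best_cost(expr, 0))
```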

  2. C++ and Massively Parallel Computers

    Directory of Open Access Journals (Sweden)

    Daniel J. Lickly

    1993-01-01

    Full Text Available Our goal is to apply the software engineering advantages of object-oriented programming to the raw power of massively parallel architectures. To do this we have constructed a hierarchy of C++ classes to support the data-parallel paradigm. Feasibility studies and initial coding can be supported by any serial machine that has a C++ compiler. Parallel execution requires an extended Cfront, which understands the data-parallel classes and generates C* code. (C* is a data-parallel superset of ANSI C developed by Thinking Machines Corporation.) This approach provides potential portability across parallel architectures and leverages the existing compiler technology for translating data-parallel programs onto both SIMD and MIMD hardware.

  3. Massively Parallel Genetics.

    Science.gov (United States)

    Shendure, Jay; Fields, Stanley

    2016-06-01

    Human genetics has historically depended on the identification of individuals whose natural genetic variation underlies an observable trait or disease risk. Here we argue that new technologies now augment this historical approach by allowing the use of massively parallel assays in model systems to measure the functional effects of genetic variation in many human genes. These studies will help to establish the disease risk of both observed and potential genetic variants and to overcome the problem of "variants of uncertain significance." Copyright © 2016 by the Genetics Society of America.

  4. Massively parallel evolutionary computation on GPGPUs

    CERN Document Server

    Tsutsui, Shigeyoshi

    2013-01-01

    Evolutionary algorithms (EAs) are metaheuristics that learn from natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened up …

  5. Massively Parallel Computing: A Sandia Perspective

    Energy Technology Data Exchange (ETDEWEB)

    Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.

    1999-05-06

    The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant breakthroughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.

  6. A Massively Parallel Face Recognition System

    Directory of Open Access Journals (Sweden)

    Lahdenoja Olli

    2007-01-01

    Full Text Available We present methods for processing LBPs (local binary patterns) with massively parallel hardware, especially with the CNN-UM (cellular nonlinear network-universal machine). In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA). We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in view of its performance and accuracy restrictions.
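    For readers unfamiliar with the operator, a minimal serial reference for the 8-neighbour LBP that such hardware parallelizes might look as follows. This is a numpy sketch under assumed names; no CNN-UM specifics are modeled.

```python
# Serial reference for the 8-neighbour local binary pattern (LBP):
# each interior pixel gets an 8-bit code, one bit per neighbour,
# set when the neighbour is >= the centre pixel.

import numpy as np

def lbp8(img):
    """Return the 8-bit LBP code of every interior pixel of a 2-D array."""
    h, w = img.shape
    c = img[1:-1, 1:-1]                       # centre pixels
    # neighbours in a fixed clockwise order, one bit each
    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1+dy:h-1+dy, 1+dx:w-1+dx]    # shifted neighbour view
        code |= ((nb >= c).astype(np.uint8) << bit)
    return code

img = np.random.randint(0, 256, (6, 6))
print(lbp8(img))
```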

  7. A Massively Parallel Face Recognition System

    Directory of Open Access Journals (Sweden)

    Ari Paasio

    2006-12-01

    Full Text Available We present methods for processing LBPs (local binary patterns) with massively parallel hardware, especially with the CNN-UM (cellular nonlinear network-universal machine). In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA). We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in view of its performance and accuracy restrictions.

  8. Real-time parallel implementation of Pulse-Doppler radar signal processing chain on a massively parallel machine based on multi-core DSP and Serial RapidIO interconnect

    Science.gov (United States)

    Klilou, Abdessamad; Belkouch, Said; Elleaume, Philippe; Le Gall, Philippe; Bourzeix, François; Hassani, Moha M'Rabet

    2014-12-01

    Pulse-Doppler radars require high computing power. A massively parallel machine has been developed in this paper to implement a Pulse-Doppler radar signal processing chain in real time. The proposed machine consists of two C6678 digital signal processors (DSPs), each with eight DSP cores, interconnected with a Serial RapidIO (SRIO) bus. In this study, each individual core is considered as the basic processing element; hence, the proposed parallel machine contains 16 processing elements. A straightforward model has been adopted to distribute the Pulse-Doppler radar signal processing chain. This model provides low latency, but communication inefficiency limits system performance. This paper proposes several optimizations that greatly reduce the inter-processor communication in the straightforward model and improve the parallel efficiency of the system. A use case of the Pulse-Doppler radar signal processing chain has been used to illustrate and validate the concept of the proposed mapping model. Experimental results show that the parallel efficiency of the proposed parallel machine is about 90%.

  9. Massively parallel quantum computer simulator

    NARCIS (Netherlands)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, and a Cray …

  10. Fast, Massively Parallel Data Processors

    Science.gov (United States)

    Heaton, Robert A.; Blevins, Donald W.; Davis, ED

    1994-01-01

    The proposed fast, massively parallel data processor contains an 8x16 array of processing elements with an efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on an "X" interconnection grid and with external memory via a high-capacity input/output bus. This approach to conditional operation nearly doubles the speed of various arithmetic operations.

  11. Massively collaborative machine learning

    NARCIS (Netherlands)

    Rijn, van J.N.

    2016-01-01

    Many scientists are focused on building models; we process nearly all information we perceive into models. There are many techniques that enable computers to build models as well. The field of research that develops such techniques is called Machine Learning. Much research is devoted to developing comp…

  12. Accelerating Monte Carlo Molecular Simulations Using Novel Extrapolation Schemes Combined with Fast Database Generation on Massively Parallel Machines

    KAUST Repository

    Amir, Sahar Z.

    2013-05-01

    We introduce an efficient thermodynamically consistent technique to extrapolate and interpolate normalized canonical NVT ensemble averages like pressure and energy for Lennard-Jones (L-J) fluids. Preliminary results show promising applicability in oil and gas modeling, where accurate determination of thermodynamic properties in reservoirs is challenging. The thermodynamic interpolation and thermodynamic extrapolation schemes predict ensemble averages at different thermodynamic conditions from expensively simulated data points. The methods reweight and reconstruct previously generated database values of Markov chains at neighboring temperature and density conditions. To investigate the efficiency of these methods, two databases corresponding to different combinations of normalized density and temperature are generated. One contains 175 Markov chains with 10,000,000 MC cycles each and the other contains 3000 Markov chains with 61,000,000 MC cycles each. For such massive database creation, two algorithms to parallelize the computations have been investigated. The accuracy of the thermodynamic extrapolation scheme is investigated with respect to classical interpolation and extrapolation. Finally, thermodynamic interpolation benefiting from four neighboring Markov chains is implemented and compared with the previous schemes. The thermodynamic interpolation scheme using knowledge from the four neighboring points proves to be more accurate than the thermodynamic extrapolation from the closest point only, while both thermodynamic extrapolation and thermodynamic interpolation are more accurate than classical interpolation and extrapolation. The investigated extrapolation scheme has great potential in oil and gas reservoir modeling. That is, such a scheme has the potential to speed up the MCMC thermodynamic computation to be comparable with conventional Equation of State approaches in efficiency. In particular, this makes it applicable to large-scale optimization of L…
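    The kind of identity such reweighting schemes build on can be written compactly. The formula below is standard single-point temperature reweighting in our own notation (energies E_i and observables A_i sampled at inverse temperature beta, estimates sought at a neighboring beta'); the paper's exact scheme may differ.

```latex
% Standard single-point reweighting of a canonical ensemble average
% (a sketch of the identity such extrapolation schemes build on):
\[
  \langle A \rangle_{\beta'}
  \;=\;
  \frac{\sum_i A_i \, e^{-(\beta' - \beta) E_i}}
       {\sum_i e^{-(\beta' - \beta) E_i}}
\]
```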

  13. Massive Parallel Quantum Computer Simulator

    CERN Document Server

    De Raedt, K; De Raedt, H; Ito, N; Lippert, T; Michielsen, K; Richter, M; Trieu, B; Watanabe, H; Lippert, Th.

    2006-01-01

    We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.

  14. Implementation of GAMMA on a Massively Parallel Computer

    Institute of Scientific and Technical Information of China (English)

    黄林鹏; 童维勤; et al.

    1997-01-01

    The GAMMA paradigm was recently proposed by Banâtre and Le Métayer to describe the systematic construction of parallel programs without introducing artificial sequentiality. This paper presents two synchronous execution models for GAMMA and discusses how to implement them on the MasPar MP-1, a massively data-parallel computer. The results show that the GAMMA paradigm can be implemented very naturally on data-parallel machines, and that a very-high-level language such as GAMMA, in which parallelism is left implicit, is suitable for specifying massively parallel applications.
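    GAMMA programs are multiset rewrites: reactions fire on any elements satisfying a condition until no reaction applies, and the order in which reactions fire is left unspecified, which is where the implicit parallelism lives. A classic toy example, sketched below independently of the MasPar implementation, computes the maximum of a multiset with the reaction (x, y) -> max(x, y).

```python
# GAMMA-style multiset rewriting, serial sketch: repeatedly pick any two
# elements and replace them by their maximum until one element remains.
# In GAMMA the reaction order is unspecified, so disjoint pairs could
# react simultaneously -- that freedom is the implicit parallelism.

def gamma_max(multiset):
    ms = list(multiset)
    while len(ms) > 1:
        x, y = ms.pop(), ms.pop()      # any two elements may react
        ms.append(x if x >= y else y)  # reaction result re-enters the multiset
    return ms[0]

print(gamma_max([3, 1, 4, 1, 5, 9, 2, 6]))   # -> 9
```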

  15. A massively asynchronous, parallel brain

    Science.gov (United States)

    Zeki, Semir

    2015-01-01

    Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously—with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain. PMID:25823871

  16. Advances in Domain Mapping of Massively Parallel Scientific Computations

    Energy Technology Data Exchange (ETDEWEB)

    Leland, Robert W. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Hendrickson, Bruce A. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2015-10-01

    One of the most important concerns in parallel computing is the proper distribution of workload across processors. For most scientific applications on massively parallel machines, the best approach to this distribution is to employ data parallelism; that is, to break the data structures supporting a computation into pieces and then to assign those pieces to different processors. Collectively, these partitioning and assignment tasks comprise the domain mapping problem.
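    In its simplest form, the partitioning step can be illustrated with a 1-D block decomposition; this is a minimal sketch only, whereas production domain mappers balance estimated work and minimize inter-piece communication, e.g. via graph partitioning.

```python
# Simplest data-parallel decomposition: assign near-equal contiguous
# pieces of a data structure to p processors (sizes differ by at most 1).

def block_partition(n_items, p):
    """Return one (start, end) half-open range per processor."""
    base, extra = divmod(n_items, p)
    ranges, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

print(block_partition(10, 3))   # [(0, 4), (4, 7), (7, 10)]
```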

  17. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo

    2010-01-01

    Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  18. Performance of Air Pollution Models on Massively Parallel Computers

    DEFF Research Database (Denmark)

    Brown, John; Hansen, Per Christian; Wasniewski, Jerzy

    1996-01-01

    To compare the performance and use of three massively parallel SIMD computers, we implemented a large air pollution model on the computers. Using a realistic large-scale model, we gain detailed insight about the performance of the three computers when used to solve large-scale scientific problems … that involve several types of numerical computations. The computers considered in our study are the Connection Machines CM-200 and CM-5, and the MasPar MP-2216 …

  19. Adapting algorithms to massively parallel hardware

    CERN Document Server

    Sioulas, Panagiotis

    2016-01-01

    In recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming, in which applications are programmed to exploit the power provided by multi-cores; usually there are gains in time-to-solution and memory footprint. Specifically, this trend has sparked an interest towards massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.

  20. A Computational Fluid Dynamics Algorithm on a Massively Parallel Computer

    Science.gov (United States)

    Jespersen, Dennis C.; Levit, Creon

    1989-01-01

    The discipline of computational fluid dynamics is demanding ever-increasing computational power to deal with complex fluid flow problems. We investigate the performance of a finite-difference computational fluid dynamics algorithm on a massively parallel computer, the Connection Machine. Of special interest is an implicit time-stepping algorithm; to obtain maximum performance from the Connection Machine, it is necessary to use a nonstandard algorithm to solve the linear systems that arise in the implicit algorithm. We find that the Connection Machine can achieve very high computation rates on both explicit and implicit algorithms. The performance of the Connection Machine puts it in the same class as today's most powerful conventional supercomputers.

  1. Micro-mechanical Simulations of Soils using Massively Parallel Supercomputers

    Directory of Open Access Journals (Sweden)

    David W. Washington

    2004-06-01

    Full Text Available In this research a computer program, Trubal version 1.51, based on the Discrete Element Method, was converted to run on a Connection Machine (CM-5), a massively parallel supercomputer with 512 nodes, to expedite the computational times of simulating geotechnical boundary value problems. The dynamic memory algorithm in the Trubal program did not perform efficiently on the CM-2 machine with the Single Instruction Multiple Data (SIMD) architecture. This was due to the communication overhead involving global array reductions, global array broadcasts and random data movement. Therefore, the dynamic memory algorithm in the Trubal program was converted to a static memory arrangement, and the Trubal program was successfully converted to run on CM-5 machines. The converted program was called "TRUBAL for Parallel Machines (TPM)." Simulating two physical triaxial experiments and comparing simulation results with Trubal simulations validated the TPM program. With a 512-node CM-5 machine, TPM produced a nine-fold speedup, demonstrating the inherent parallelism within algorithms based on the Discrete Element Method.

  2. ADAPTATION OF PARALLEL VIRTUAL MACHINES MECHANISMS TO PARALLEL SYSTEMS

    Directory of Open Access Journals (Sweden)

    Zafer DEMİR

    2001-02-01

    Full Text Available In this study, the Parallel Virtual Machine (PVM) is first reviewed. Since it is based upon parallel processing, it is similar in principle to parallel systems in terms of architecture. The Parallel Virtual Machine is neither an operating system nor a programming language; it is a specific software tool that supports heterogeneous parallel systems, taking advantage of the features of both to bring users closer to parallel systems. Since tasks can be executed in parallel on parallel systems by the Parallel Virtual Machine, there is an important similarity between PVM and distributed systems and multiple processors. In this study, the relations in question are examined by making use of the master-slave programming technique. In conclusion, PVM is tested with a simple factorial computation on a distributed system to observe its adaptation to parallel architectures.
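    The master-slave factorial test described above is easy to sketch. The version below uses Python's multiprocessing in place of the PVM C API, so it illustrates the pattern rather than PVM itself: the master splits 1..n into ranges, the slaves multiply their ranges, and the master combines the partial products.

```python
# Master-slave factorial: the master partitions the work, a pool of
# slave processes computes partial products, the master reduces them.

from functools import reduce
from multiprocessing import Pool

def partial_product(bounds):
    """Slave task: multiply all integers in the inclusive range."""
    lo, hi = bounds
    return reduce(lambda a, b: a * b, range(lo, hi + 1), 1)

def factorial(n, workers=4):
    step = max(1, n // workers)
    ranges = [(i, min(i + step - 1, n)) for i in range(1, n + 1, step)]
    with Pool(workers) as pool:                       # spawn the slaves
        parts = pool.map(partial_product, ranges)     # scatter/gather
    return reduce(lambda a, b: a * b, parts, 1)       # master combines

if __name__ == "__main__":
    print(factorial(20))      # 2432902008176640000
```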

  3. Template based parallel checkpointing in a massively parallel computer system

    Science.gov (United States)

    Archer, Charles Jens; Inglett, Todd Alan

    2009-01-13

    A method and apparatus for a template-based parallel checkpoint save for a massively parallel supercomputer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
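    The template idea can be sketched in a few lines. Block size, hashing choice, and all names below are assumptions for illustration, not the patent's code: hash fixed-size blocks of a node's checkpoint data and ship only the blocks whose checksums differ from the stored template.

```python
# Template-based checkpoint delta: compare per-block checksums of the
# current node state against a previously stored template and keep only
# the blocks that changed.

import hashlib

BLOCK = 4096  # assumed block size

def block_checksums(data):
    return [hashlib.md5(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

def delta_checkpoint(node_state, template_sums):
    """Return (index, block) pairs whose checksum differs from the template."""
    delta = []
    for i, chk in enumerate(block_checksums(node_state)):
        if i >= len(template_sums) or chk != template_sums[i]:
            delta.append((i, node_state[i * BLOCK:(i + 1) * BLOCK]))
    return delta

template = bytes(16 * BLOCK)                        # previously stored state
state = bytearray(template); state[5 * BLOCK] = 1   # one block changed
print([i for i, _ in delta_checkpoint(bytes(state), block_checksums(template))])
# -> [5]: only the changed block would be transmitted and stored
```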

  4. Massively Parallel Algorithms for Solution of Schrodinger Equation

    Science.gov (United States)

    Fijany, Amir; Barhen, Jacob; Toomerian, Nikzad

    1994-01-01

    In this paper, massively parallel algorithms for the solution of the Schrödinger equation are developed. Our results clearly indicate that the Crank-Nicolson method, in addition to its excellent numerical properties, is also highly suitable for massively parallel computation.
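    For reference, the Crank-Nicolson step for the time-dependent Schrödinger equation takes the standard textbook form below (written in units with hbar = 1; the abstract names the method but not its exact discretization).

```latex
% Crank-Nicolson step for i\,\partial_t\psi = H\psi (units with \hbar = 1):
\[
  \Bigl(I + \tfrac{i\,\Delta t}{2} H\Bigr)\,\psi^{\,n+1}
  \;=\;
  \Bigl(I - \tfrac{i\,\Delta t}{2} H\Bigr)\,\psi^{\,n},
\]
% which is unconditionally stable and unitary, so the wavefunction norm
% is preserved at every time step; each step requires a linear solve.
```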

  5. Calibration of a Parallel Kinematic Machine Tool

    Institute of Scientific and Technical Information of China (English)

    HE Xiao-mei; DING Hong-sheng; FU Tie; XIE Dian-huang; XU Jin-zhong; LI Hua-feng; LIU Hui-lin

    2006-01-01

    A calibration method is presented to enhance the static accuracy of a parallel kinematic machine tool by using a coordinate measuring machine and a laser tracker. According to the established calibration model and the calibration experiment, the actual 42 kinematic parameters of the BKX-I parallel kinematic machine tool are obtained. Circular tests comparing the calibrated and uncalibrated parameters show an 80% improvement in the accuracy of this machine tool.

  6. Ordered fast fourier transforms on a massively parallel hypercube multiprocessor

    Science.gov (United States)

    Tong, Charles; Swarztrauber, Paul N.

    1989-01-01

    Design alternatives for ordered Fast Fourier Transformation (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication which is known to dominate the overall computing time. To this end, the order and computational phases of the FFT were combined, and the sequence to processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely, standard-order and A-order which can be implemented with equal ease on the Connection Machine where orderings are determined by geometries and priorities. If the sequence has N = 2^r elements and the hypercube has P = 2^d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.
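    Plugging in concrete sizes makes the comparison vivid; the worked instance below simply evaluates the counts quoted in the abstract for an assumed N = 2^20 and P = 2^10.

```latex
% Worked instance of the transmission counts quoted above, for a sequence
% of N = 2^r = 2^{20} elements on P = 2^d = 2^{10} processors:
\[
  \underbrace{d + \tfrac{r}{2} + 1}_{\text{standard-order}} = 21,
  \qquad
  \underbrace{2d - \tfrac{r}{2}}_{\text{A-order}} = 10,
  \qquad
  21 - 10 = 11 = r - d + 1 .
\]
```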

  7. Massively parallel sequencing of forensic STRs

    DEFF Research Database (Denmark)

    Parson, Walther; Ballard, David; Budowle, Bruce

    2016-01-01

    The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data … accessible genome assembly, and in place before the uptake of MPS by the general forensic community starts to generate sequence data on a large scale. While the established nomenclature for CE-based STR analysis will remain unchanged in the future, the nomenclature of sequence-based STR genotypes will need …

  8. Massively Parallel Direct Simulation of Multiphase Flow

    Energy Technology Data Exchange (ETDEWEB)

    COOK,BENJAMIN K.; PREECE,DALE S.; WILLIAMS,J.R.

    2000-08-10

    The authors' understanding of multiphase physics and the associated predictive capability for multi-phase systems are severely limited by current continuum modeling methods and experimental approaches. This research will deliver an unprecedented modeling capability to directly simulate three-dimensional multi-phase systems at the particle-scale. The model solves the fully coupled equations of motion governing the fluid phase and the individual particles comprising the solid phase using a newly discovered, highly efficient coupled numerical method based on the discrete-element method and the Lattice-Boltzmann method. A massively parallel implementation will enable the solution of large, physically realistic systems.

  9. Parallel Machine Scheduling with Special Jobs

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    This paper considers parallel machine scheduling with special jobs. Normal jobs can be processed on any of the parallel machines, while special jobs can only be processed on one machine. The problem is analyzed for various manufacturing conditions and service requirements. The off-line scheduling problem is transformed into a classical parallel machine scheduling problem. The on-line scheduling uses the FCFS (first come, first served), SWSC (special window for special customers), and FFFS (first fit, first served) algorithms to satisfy the various requirements. Furthermore, this paper proves that FCFS has a competitive ratio of m, where m is the number of parallel machines, and this bound is asymptotically tight; that SWSC has a competitive ratio of 2; and that FFFS has a competitive ratio of , and these bounds are tight.

  10. MASSIVE HYBRID PARALLELISM FOR FULLY IMPLICIT MULTIPHYSICS

    Energy Technology Data Exchange (ETDEWEB)

    Cody J. Permann; David Andrs; John W. Peterson; Derek R. Gaston

    2013-05-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task that consumes substantial development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design is given. Representative massively parallel results from several application areas are presented, and a brief discussion of future areas of research for the framework is provided.

  11. Parallel Impurity Spreading During Massive Gas Injection

    Science.gov (United States)

    Izzo, V. A.

    2016-10-01

    Extended-MHD simulations of disruption mitigation in DIII-D demonstrate that both pre-existing islands (locked-modes) and plasma rotation can significantly influence toroidal spreading of impurities following massive gas injection (MGI). Given the importance of successful disruption mitigation in ITER and the large disparity in device parameters, empirical demonstrations of disruption mitigation strategies in present tokamaks are insufficient to inspire unreserved confidence for ITER. Here, MHD simulations elucidate how impurities injected as a localized jet spread toroidally and poloidally. Simulations with large pre-existing islands at the q = 2 surface reveal that the magnetic topology strongly influences the rate of impurity spreading parallel to the field lines. Parallel spreading is largely driven by rapid parallel heat conduction, and is much faster at low order rational surfaces, where a short parallel connection length leads to faster thermal equilibration. Consequently, the presence of large islands, which alter the connection length, can slow impurity transport; but the simulations also show that the appearance of a 4/2 harmonic of the 2/1 mode, which breaks up the large islands, can increase the rate of spreading. This effect is seen both for simulations with spontaneously growing and directly imposed 4/2 modes. Given the prevalence of locked-modes as a cause of disruptions, understanding the effect of large islands is of particular importance. Simulations with and without islands also show that rotation can alter impurity spreading, even reversing the predominant direction of spreading, which is toward the high-field-side in the absence of rotation. Given expected differences in rotation for ITER vs. DIII-D, rotation effects are another important consideration when extrapolating experimental results. Work supported by US DOE under DE-FG02-95ER54309.

  12. Broadband monitoring simulation with massively parallel processors

    Science.gov (United States)

    Trubetskov, Mikhail; Amotchkina, Tatiana; Tikhonravov, Alexander

    2011-09-01

    Modern efficient optimization techniques, namely needle optimization and gradual evolution, enable one to design optical coatings of any type. Even more, these techniques allow obtaining multiple solutions with close spectral characteristics. It is important, therefore, to develop software tools that can allow one to choose a practically optimal solution from a wide variety of possible theoretical designs. A practically optimal solution provides the highest production yield when the optical coating is manufactured. Computational manufacturing is a low-cost tool for choosing a practically optimal solution. The theory of probability predicts that reliable production yield estimations require many hundreds or even thousands of computational manufacturing experiments. As a result, reliable estimation of the production yield may require too much computational time. The most time-consuming operation is calculation of the discrepancy function used by a broadband monitoring algorithm. This function is formed by a sum of terms over a wavelength grid. These terms can be computed simultaneously in different threads of computation, which opens great opportunities for parallelization. Multi-core and multi-processor systems can provide speedups of several times. Additional potential for further acceleration of computations is connected with using Graphics Processing Units (GPU). A modern GPU consists of hundreds of massively parallel processors and is capable of performing floating-point operations efficiently.

  13. Scalable Machine Learning for Massive Astronomical Datasets

    Science.gov (United States)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms: kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex …

  14. Parallelization of TMVA Machine Learning Algorithms

    CERN Document Server

    Hajili, Mammad

    2017-01-01

    This report reflects my work on the parallelization of TMVA machine learning algorithms integrated into the ROOT Data Analysis Framework during a summer internship at CERN. The report consists of four important parts: the data set used in training and validation, the algorithms to which multiprocessing was applied, the parallelization techniques, and the resulting changes in execution time with the number of workers.

  15. Multiplexed microsatellite recovery using massively parallel sequencing.

    Science.gov (United States)

    Jennings, T N; Knaus, B J; Mullins, T D; Haig, S M; Cronn, R C

    2011-11-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356,958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5 M (USD).

  16. Massively Parallel Dimension Independent Adaptive Metropolis

    KAUST Repository

    Chen, Yuxin

    2015-05-14

    This work considers black-box Bayesian inference over high-dimensional parameter spaces. The well-known and widely respected adaptive Metropolis (AM) algorithm is extended herein to asymptotically scale uniformly with respect to the underlying parameter dimension, by respecting the variance, for Gaussian targets. The resulting algorithm, referred to as the dimension-independent adaptive Metropolis (DIAM) algorithm, also shows improved performance with respect to adaptive Metropolis on non-Gaussian targets. This algorithm is further improved, and the possibility of probing high-dimensional targets is enabled, via GPU-accelerated numerical libraries and periodically synchronized concurrent chains (justified a posteriori). Asymptotically in dimension, this massively parallel dimension-independent adaptive Metropolis (MPDIAM) GPU implementation exhibits a factor of four improvement versus the CPU-based Intel MKL version alone, which is itself already a factor of three improvement versus the serial version. The scaling to multiple CPUs and GPUs exhibits a form of strong scaling in terms of the time necessary to reach a certain convergence criterion, through a combination of longer time per sample batch (weak scaling) and yet fewer necessary samples to convergence. This is illustrated by efficiently sampling from several Gaussian and non-Gaussian targets for dimension d ≥ 1000.
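    For orientation, the covariance adaptation at the core of adaptive Metropolis is usually written as below, in the standard Haario-style form; the dimension-independent variant in this work modifies the scaling, so treat this as background rather than the paper's final rule.

```latex
% Standard adaptive-Metropolis ingredient (Haario et al. form): after t
% steps in dimension d, propose from a Gaussian whose covariance tracks
% the empirical covariance of the chain history X_0, ..., X_t.
\[
  C_t = s_d \,\operatorname{cov}(X_0,\dots,X_t) + s_d\,\varepsilon I_d,
  \qquad
  x' \sim \mathcal{N}(x_t,\, C_t),
  \qquad
  s_d = \frac{2.38^2}{d}.
\]
```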

  17. Programming massively parallel processors a hands-on approach

    CERN Document Server

    Kirk, David B

    2010-01-01

    Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...

  18. Stiffness estimation of a parallel kinematic machine

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    This paper presents a simple yet comprehensive approach to quickly estimating the stiffness of a tripod-based parallel kinematic machine. This approach can be implemented in two steps. In the first step, the machine structure is decomposed into two substructures associated with the machine frame and parallel mechanism. The stiffness models of these two substructures are formulated by means of the virtual work principle. This is followed by the second step that enables the stiffness model of the machine structure as a whole to be achieved by linear superposition. The 3D representations of the machine stiffness within the usable workspace are depicted and the contributions of different component rigidities to the machine stiffness are discussed. The result is compared with that obtained through finite element analysis.

  19. Implementation of QR up- and downdating on a massively parallel computer

    DEFF Research Database (Denmark)

    Bendtsen, Claus; Hansen, Per Christian; Madsen, Kaj;

    1995-01-01

    We describe an implementation of QR up- and downdating on a massively parallel computer (the Connection Machine CM-200) and show that the algorithm maps well onto the computer. In particular, we show how the use of corrected semi-normal equations for downdating can be efficiently implemented. We …

  20. Scheduling identical jobs on uniform parallel machines

    NARCIS (Netherlands)

    M. Dessouky (Mohamed); B. Lageweg (Ben); J.K. Lenstra; S.L. van de Velde (Steef)

    1990-01-01

    We address the problem of scheduling n identical jobs on m uniform parallel machines to optimize scheduling criteria that are nondecreasing in the job completion times. It is well known that this can be formulated as a linear assignment problem, and subsequently solved in O(n^3) time. We …

  1. The EMCC / DARPA Massively Parallel Electromagnetic Scattering Project

    Science.gov (United States)

    Woo, Alex C.; Hill, Kueichien C.

    1996-01-01

    The Electromagnetic Code Consortium (EMCC) was sponsored by the Advanced Research Projects Agency (ARPA) to demonstrate the effectiveness of massively parallel computing in large-scale radar signature predictions. The EMCC/ARPA project consisted of three parts.

  2. Some Massively Parallel Algorithms from Nature

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    We introduce the work on parallel problem solvers from physics and biology being developed by the research team at the State Key Laboratory of Software Engineering, Wuhan University. Results on parallel solvers include the following areas: evolutionary algorithms, based on imitating the evolution processes of nature, for parallel problem solving, especially for parallel optimization and model-building; and asynchronous parallel algorithms based on domain decomposition, inspired by physical analogies such as the elastic relaxation process and the annealing process, for scientific computations, especially for solving nonlinear mathematical physics problems. All these algorithms share the following common characteristics: inherent parallelism, self-adaptation and self-organization, because the basic ideas of these solvers come from imitating natural evolutionary processes.

  3. Development of a Massively Parallel NOGAPS Forecast Model

    Science.gov (United States)

    2016-06-07

    … parallel computer architectures. These algorithms will be critical for interprocessor-communication-dependent and computationally intensive model … to exploit massively parallel processor (MPP), distributed-memory computer architectures. Future increases in computer power from MPPs will allow … Message passing (MPI) is the paradigm chosen for communication between distributed-memory processors. APPROACH: Use integrations of the current operational …

  4. Multilanguage parallel programming of heterogeneous machines

    Energy Technology Data Exchange (ETDEWEB)

    Bisiani, R.; Forin, A.

    1988-08-01

    The authors designed and implemented a system, Agora, that supports the development of multilanguage parallel applications for heterogeneous machines. Agora hinges on two ideas: the first is that shared memory can be a suitable abstraction for programming concurrent, multilanguage modules running on heterogeneous machines. The second is that a shared memory abstraction can be efficiently supported across different computer architectures that are not connected by a physical shared memory, for example, local area network workstations or ensemble machines. Agora has been in use for more than a year. This paper describes the Agora shared memory and its software implementation on both tightly and loosely coupled architectures. Measurements of the current implementation are also included.

  5. Parallel machine scheduling with a common server

    Energy Technology Data Exchange (ETDEWEB)

    Hall, N.; Sriskandarajah, C.; Potts, C.

    1994-12-31

    This paper considers the nonpreemptive scheduling of a given set of jobs on several identical, parallel machines. Each job must be processed on one of the machines. Prior to processing, a job must be loaded (setup) by a single server onto the relevant machine. The server may be a human operator, a robot, or a piece of specialized equipment. We study a number of classical scheduling objectives in this environment, including makespan, maximum lateness, the sum of completion times, the number of late jobs, and total tardiness, as well as weighted versions of some of these. The number of machines may be constant or arbitrary. Setup times may be unit, equal, or arbitrary. Processing times may be unit or arbitrary. For each problem considered, we attempt to provide either an efficient algorithm, or a proof that such an algorithm is unlikely to exist. Our results provide a mapping of the computational complexity of these problems. Included in these results are generalizations of the classical algorithms of Moore, Lawler and Moore, and Lawler. In addition, we describe two heuristics for makespan scheduling in this environment, and provide an exact analysis of their worst-case performance.

  6. Parallelization of the ROOT Machine Learning Methods

    CERN Document Server

    Vakilipourtakalou, Pourya

    2016-01-01

    Today computation is an inseparable part of scientific research, especially in particle physics, where there are classification problems such as discriminating signals from backgrounds originating from particle collisions. On the other hand, Monte Carlo simulations can be used to generate known data sets of signals and backgrounds based on theoretical physics. The aim of machine learning is to train algorithms on a known data set and then apply the trained algorithms to unknown data sets. The most common framework for data analysis in particle physics, however, is ROOT. In order to use machine learning methods, a Toolkit for Multivariate Data Analysis (TMVA) has been added to ROOT. The major consideration in this report is the parallelization of some TMVA methods, especially cross-validation and BDT.

  7. Scheduling and Subcontracting under Parallel Machines

    Institute of Scientific and Technical Information of China (English)

    CHEN Rong-jun; TANG Guo-chun

    2012-01-01

    In this paper, we study a model on joint decisions of scheduling and subcontracting, in which jobs (orders) can be either processed by parallel machines at the manufacturer in-house or subcontracted to a subcontractor. The manufacturer needs to determine which jobs should be produced in-house and which jobs should be subcontracted. Furthermore, it needs to determine a production schedule for jobs to be produced in-house. We discuss five classical scheduling objectives as production costs. For each problem with different objective functions, we give optimality conditions and propose dynamic programming algorithms.

  8. Linear Time Algorithms for Parallel Machine Scheduling

    Institute of Scientific and Technical Information of China (English)

    Zhi Yi TAN; Yong HE

    2007-01-01

    This paper addresses linear time algorithms for parallel machine scheduling problems. We introduce a class of threshold algorithms and discuss their main features. Three linear time threshold algorithm classes, DT, PT and DTm, are studied thoroughly. For each class, we identify the best possible algorithm within the class. We also present their application to several scheduling problems. The new algorithms improve on classical algorithms in time complexity and/or worst-case ratio. A computer-aided proof technique is used in the proof of the main results, which greatly simplifies the proof and reduces case-by-case analysis.

  9. Verifiable Computation with Massively Parallel Interactive Proofs

    CERN Document Server

    Thaler, Justin; Mitzenmacher, Michael; Pfister, Hanspeter

    2012-01-01

    As the cloud computing paradigm has gained prominence, the need for verifiable computation has grown increasingly urgent. The concept of verifiable computation enables a weak client to outsource difficult computations to a powerful, but untrusted, server. Protocols for verifiable computation aim to provide the client with a guarantee that the server performed the requested computations correctly, without requiring the client to perform the computations herself. By design, these protocols impose a minimal computational burden on the client. However, existing protocols require the server to perform a large amount of extra bookkeeping in order to enable a client to easily verify the results. Verifiable computation has thus remained a theoretical curiosity, and protocols for it have not been implemented in real cloud computing systems. Our goal is to leverage GPUs to reduce the server-side slowdown for verifiable computation. To this end, we identify abundant data parallelism in a state-of-the-art general-purpose...

  10. Fast parallel Markov clustering in bioinformatics using massively parallel computing on GPU with CUDA and ELLPACK-R sparse format.

    Science.gov (United States)

    Bustamam, Alhadi; Burrage, Kevin; Hamilton, Nicholas A

    2012-01-01

    Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However, with the increasing amount of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses the CUDA toolkit for implementing a massively parallel computing environment in the GPU card, is becoming a very powerful, efficient, and low-cost option to achieve substantial performance gains over CPU approaches. The use of on-chip memory on the GPU efficiently lowers the latency time, thus circumventing a major issue in other parallel computing environments, such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We utilized the ELLPACK-R sparse format to allow effective, fine-grain massively parallel processing to cope with the sparse nature of interaction-network data sets in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on a CPU. Thus, large-scale parallel computation on off-the-shelf desktop machines, which was previously only possible on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data.
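    The two kernels named above, sparse expansion (a sparse matrix-matrix product) and inflation followed by column renormalization, are easy to see in a minimal serial MCL loop. In this sketch scipy stands in for the paper's CUDA/ELLPACK-R kernels, and the graph is assumed to carry self-loops so no column sum is zero.

```python
# Minimal serial MCL loop: expansion (M @ M), inflation (elementwise
# power), and column renormalization, iterated to convergence.

import numpy as np
from scipy.sparse import csr_matrix

def mcl(adj, inflation=2.0, iters=50):
    M = csr_matrix(adj, dtype=float)
    M = csr_matrix(M.multiply(1.0 / M.sum(axis=0)))      # column-stochastic
    for _ in range(iters):
        M = M @ M                                        # expansion
        M = M.power(inflation)                           # inflation
        M = csr_matrix(M.multiply(1.0 / M.sum(axis=0)))  # renormalize columns
    return M

# Tiny graph with self-loops; nonzeros of the limit matrix reveal clusters.
adj = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]])
print(np.round(mcl(adj).toarray(), 2))
```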

  11. Duality-based algorithms for scheduling unrelated parallel machines

    NARCIS (Netherlands)

    S.L. van de Velde (Steef)

    1993-01-01

    We consider the following parallel machine scheduling problem. Each of n independent jobs has to be scheduled on one of m unrelated parallel machines. The processing of job J_j on machine M_i requires an uninterrupted period of positive length p_ij. The objective is to find an a…

  12. Performance analysis of active schedules in identical parallel machine

    Institute of Scientific and Technical Information of China (English)

    Changjun WANG; Yugeng XI

    2007-01-01

    Active schedules are one of the most basic and popular concepts in production scheduling research. For identical parallel machine scheduling with dynamic job arrivals, tight performance bounds for active schedules under four popular objectives are given in this paper. The analysis method and conclusions can be generalized to static identical parallel machine and single-machine scheduling problems.

  13. Parallel optimization of pixel purity index algorithm for massive hyperspectral images in cloud computing environment

    Science.gov (United States)

    Chen, Yufeng; Wu, Zebin; Sun, Le; Wei, Zhihui; Li, Yonglong

    2016-04-01

    With the gradual increase in the spatial and spectral resolution of hyperspectral images, the size of image data becomes larger and larger, and the complexity of processing algorithms is growing, which poses a big challenge to efficient massive hyperspectral image processing. Cloud computing technologies distribute computing tasks to a large number of computing resources for handling large data sets without the limitation of memory and computing resource of a single machine. This paper proposes a parallel pixel purity index (PPI) algorithm for unmixing massive hyperspectral images based on a MapReduce programming model for the first time in the literature. According to the characteristics of hyperspectral images, we describe the design principle of the algorithm, illustrate the main cloud unmixing processes of PPI, and analyze the time complexity of serial and parallel algorithms. Experimental results demonstrate that the parallel implementation of the PPI algorithm on the cloud can effectively process big hyperspectral data and accelerate the algorithm.
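    The PPI kernel being distributed is itself short. Below is a serial numpy sketch with illustrative names; in a MapReduce version along the lines described, the map step would score pixel blocks against the skewers and the reduce step would sum the per-pixel extremum counts.

```python
# Pixel purity index (PPI) by random "skewer" projections: pixels that
# repeatedly land at the extremes of random projections are endmember
# candidates.

import numpy as np

def pixel_purity_index(pixels, n_skewers=1000, seed=0):
    """pixels: (n_pixels, n_bands) spectra; returns an extremum count per pixel."""
    rng = np.random.default_rng(seed)
    n, bands = pixels.shape
    counts = np.zeros(n, dtype=int)
    for _ in range(n_skewers):
        skewer = rng.standard_normal(bands)   # random projection direction
        proj = pixels @ skewer                # project all spectra onto it
        counts[np.argmin(proj)] += 1          # count the two extreme pixels
        counts[np.argmax(proj)] += 1
    return counts

pixels = np.random.rand(500, 50)
print(np.argsort(pixel_purity_index(pixels))[-5:])   # purest candidates
```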

  14. Performance of Air Pollution Models on Massively Parallel Computers

    DEFF Research Database (Denmark)

    Brown, John; Hansen, Per Christian; Wasniewski, Jerzy

    1996-01-01

    To compare the performance and use of three massively parallel SIMD computers, we implemented a large air pollution model on the computers. Using a realistic large-scale model, we gain detailed insight about the performance of the three computers when used to solve large-scale scientific problems...

  15. Design and Performance Analysis of a Massively Parallel Atmospheric General Circulation Model

    Directory of Open Access Journals (Sweden)

    Daniel S. Schaffer

    2000-01-01

    Full Text Available In the 1990s, computer manufacturers were increasingly turning to the development of parallel processor machines to meet the high performance needs of their customers. Simultaneously, atmospheric scientists studying weather and climate phenomena ranging from hurricanes to El Niño to global warming require increasingly fine resolution models. Here, the implementation of a parallel atmospheric general circulation model (GCM) which exploits the power of massively parallel machines is described. Using the horizontal data domain decomposition methodology, this FORTRAN 90 model is able to integrate a 0.6° longitude by 0.5° latitude problem at a rate of 19 Gigaflops on 512 processors of a Cray T3E 600, corresponding to 280 seconds of wall-clock time per simulated model day. At this resolution, the model has 64 times as many degrees of freedom and performs 400 times as many floating point operations per simulated day as the model it replaces.

  16. Salinas - An implicit finite element structural dynamics code developed for massively parallel platforms

    Energy Technology Data Exchange (ETDEWEB)

    BHARDWAJ, MANLJ K.; REESE,GARTH M.; DRIESSEN,BRIAN; ALVIN,KENNETH F.; DAY,DAVID M.

    2000-04-06

    As computational needs for structural finite element analysis increase, a robust implicit structural dynamics code is needed which can handle millions of degrees of freedom in the model and produce results with quick turnaround time. A parallel code is needed to avoid the limitations of serial platforms. Salinas is an implicit structural dynamics code specifically designed for massively parallel platforms. It computes the structural response of very large complex structures and provides solutions faster than any existing serial machine. This paper gives the current status of Salinas and uses demonstration problems to show Salinas' performance.

  17. Kinematic Analysis of a New Parallel Machine Tool: the Orthoglide

    CERN Document Server

    Wenger, Philippe

    2007-01-01

    This paper describes a new parallel kinematic architecture for machining applications: the orthoglide. This machine features three fixed parallel linear joints which are mounted orthogonally and a mobile platform which moves in the Cartesian x-y-z space with fixed orientation. The main interest of the orthoglide is that it combines the advantages of the popular PPP serial machines (regular Cartesian workspace shape and uniform performances) with those of the parallel kinematic arrangement of the links (less inertia and better dynamic performances), which makes the orthoglide well suited to high-speed machining applications. Possible extension of the orthoglide to 5-axis machining is also investigated.

  18. A New Three-DOF Parallel Mechanism: Milling Machine Applications

    CERN Document Server

    Chablat, Damien

    2000-01-01

    This paper describes a new parallel kinematic architecture for machining applications, namely, the Orthoglide. This machine features three fixed parallel linear joints that are mounted orthogonally and a mobile platform that moves in the Cartesian x-y-z space with fixed orientation. The main interest of the Orthoglide is that it benefits from the advantages of the popular PPP serial machines (regular Cartesian workspace shape and uniform performance) as well as from the parallel kinematic arrangement of the links (lower inertia and better dynamic performance), which makes the Orthoglide well suited to high-speed machining applications. A possible extension of the Orthoglide to 5-axis machining is also investigated.

  19. Multigene amplification and massively parallel sequencing for cancer mutation discovery

    Science.gov (United States)

    Dahl, Fredrik; Stenberg, Johan; Fredriksson, Simon; Welch, Katrina; Zhang, Michael; Nilsson, Mats; Bicknell, David; Bodmer, Walter F.; Davis, Ronald W.; Ji, Hanlee

    2007-01-01

    We have developed a procedure for massively parallel resequencing of multiple human genes by combining a highly multiplexed, target-specific amplification process with a high-throughput parallel sequencing technology. The amplification process is based on oligonucleotide constructs, called selectors, which guide the circularization of specific DNA target regions. Subsequently, the circularized target sequences are amplified in multiplex and analyzed using a highly parallel sequencing-by-synthesis technology. As a proof-of-concept study, we demonstrate parallel resequencing of 10 cancer genes covering 177 exons with an average sequence coverage per sample of 93%. Seven cancer cell lines and one normal genomic DNA sample were studied, with multiple mutations and polymorphisms identified among the 10 genes. Mutations and polymorphisms in the TP53 gene were confirmed by traditional sequencing. PMID:17517648

  20. Development of massively parallel quantum chemistry program SMASH

    Energy Technology Data Exchange (ETDEWEB)

    Ishimura, Kazuya [Department of Theoretical and Computational Molecular Science, Institute for Molecular Science 38 Nishigo-Naka, Myodaiji, Okazaki, Aichi 444-8585 (Japan)

    2015-12-31

    A massively parallel program for quantum chemistry calculations, SMASH, was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with the MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to keep program development simple. The speed-up of the B3LYP energy calculation for (C₁₅₀H₃₀)₂ with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.

  1. Massively parallel Wang-Landau sampling on multiple GPUs

    Science.gov (United States)

    Yin, Junqi; Landau, D. P.

    2012-08-01

    Wang-Landau sampling is implemented on the Graphics Processing Unit (GPU) with the Compute Unified Device Architecture (CUDA). Performance on three different GPU cards, including the new-generation Fermi architecture card, is compared with that on a Central Processing Unit (CPU). The parameters for massively parallel Wang-Landau sampling are tuned in order to achieve fast convergence. For simulations of the water cluster systems, we obtain an average speedup of over 50 times for a given workload.
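
    For reference, the serial core of Wang-Landau sampling is compact; the toy system below (N coins, with the "energy" being the number of heads) is a hypothetical example whose exact density of states ln C(N, E) makes the result easy to check:

      import math, random

      N = 20
      random.seed(1)
      state = [random.randint(0, 1) for _ in range(N)]
      E = sum(state)
      ln_g = [0.0] * (N + 1)
      ln_f = 1.0                                  # modification factor
      while ln_f > 1e-4:                          # loose tolerance for brevity
          hist = [0] * (N + 1)
          for _ in range(200000):
              i = random.randrange(N)             # propose one coin flip
              E_new = E + (1 if state[i] == 0 else -1)
              # accept with probability min(1, g(E) / g(E_new))
              if random.random() < math.exp(min(0.0, ln_g[E] - ln_g[E_new])):
                  state[i] ^= 1
                  E = E_new
              ln_g[E] += ln_f
              hist[E] += 1
          if min(hist) > 0.8 * sum(hist) / len(hist):   # flatness check
              ln_f /= 2.0
      offset = ln_g[0]                            # normalize so ln g(0) = 0
      for e in (0, 5, 10):
          print(e, round(ln_g[e] - offset, 2), round(math.log(math.comb(N, e)), 2))

    The GPU versions parallelize exactly this walk, e.g. by running many walkers over subranges of the energy window; the acceptance rule and the ln g update are unchanged.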

  2. Increasing the reach of forensic genetics with massively parallel sequencing.

    Science.gov (United States)

    Budowle, Bruce; Schmedes, Sarah E; Wendt, Frank R

    2017-06-19

    The field of forensic genetics has made great strides in the analysis of biological evidence related to criminal and civil matters. Moreover, the discipline has set a standard of performance and quality in the forensic sciences. The advent of massively parallel sequencing will allow the field to expand its capabilities substantially. This review describes the salient features of massively parallel sequencing and how it can impact forensic genetics. The technology offers an increased number and range of genetic markers that can be analyzed, higher sample throughput, and the capability of targeting different organisms, all within one unifying methodology. While there are many applications, three are described where massively parallel sequencing will have immediate impact: molecular autopsy, microbial forensics, and differentiation of monozygotic twins. The intent of this review is to expose the forensic science community to the potential enhancements that have arrived or soon will, and to demonstrate the continued expansion of the field of forensic genetics and its service in the investigation of legal matters.

  3. Requirements for supercomputing in energy research: The transition to massively parallel computing

    Energy Technology Data Exchange (ETDEWEB)

    1993-02-01

    This report discusses: The emergence of a practical path to TeraFlop computing and beyond; requirements of energy research programs at DOE; implementation: supercomputer production computing environment on massively parallel computers; and implementation: user transition to massively parallel computing.

  4. Parallel Machine Problems with a Single Server and Release Times

    Institute of Scientific and Technical Information of China (English)

    SHI Ling

    2005-01-01

    Parallel machine problems with a single server and release times are generalizations of classical parallel machine problems. Before processing, each job must be loaded on a machine, which takes a certain setup time; each job also has a release time before which it cannot be processed. All setups have to be done by a single server, which can handle at most one job at a time. In this paper, we continue the study of complexity results for parallel machine problems with a single server and release times. New complexity results are derived for special cases.

  5. Engineering-Based Thermal CFD Simulations on Massive Parallel Systems

    Directory of Open Access Journals (Sweden)

    Jérôme Frisch

    2015-05-01

    The development of parallel Computational Fluid Dynamics (CFD) codes is a challenging task that entails efficient parallelization concepts and strategies in order to achieve good scalability values when running those codes on modern supercomputers with several thousands to millions of cores. In this paper, we present a hierarchical data structure for massively parallel computations that supports the coupling of a Navier–Stokes-based fluid flow code with the Boussinesq approximation in order to address complex thermal scenarios for energy-related assessments. The data structure is specifically designed with interactive data exploration and visualization during the runtime of the simulation code in mind, a capability that traditional high-performance computing (HPC) simulation codes typically lack. We further show and discuss speed-up values obtained on one of Germany's top-ranked supercomputers with up to 140,000 processes and present simulation results for different engineering-based thermal problems.

  6. Engineering-Based Thermal CFD Simulations on Massive Parallel Systems

    KAUST Repository

    Frisch, Jérôme

    2015-05-22

    The development of parallel Computational Fluid Dynamics (CFD) codes is a challenging task that entails efficient parallelization concepts and strategies in order to achieve good scalability values when running those codes on modern supercomputers with several thousands to millions of cores. In this paper, we present a hierarchical data structure for massively parallel computations that supports the coupling of a Navier–Stokes-based fluid flow code with the Boussinesq approximation in order to address complex thermal scenarios for energy-related assessments. The data structure is specifically designed with interactive data exploration and visualization during the runtime of the simulation code in mind, a capability that traditional high-performance computing (HPC) simulation codes typically lack. We further show and discuss speed-up values obtained on one of Germany's top-ranked supercomputers with up to 140,000 processes and present simulation results for different engineering-based thermal problems.

  7. Massively parallel processing of remotely sensed hyperspectral images

    Science.gov (United States)

    Plaza, Javier; Plaza, Antonio; Valencia, David; Paz, Abel

    2009-08-01

    In this paper, we develop several parallel techniques for hyperspectral image processing that have been specifically designed to be run on massively parallel systems. The techniques developed cover the three relevant areas of hyperspectral image processing: 1) spectral mixture analysis, a popular approach to characterize mixed pixels in hyperspectral data addressed in this work via efficient implementation of a morphological algorithm for automatic identification of pure spectral signatures or endmembers from the input data; 2) supervised classification of hyperspectral data using multi-layer perceptron neural networks with back-propagation learning; and 3) automatic target detection in the hyperspectral data using orthogonal subspace projection concepts. The scalability of the proposed parallel techniques is investigated using Barcelona Supercomputing Center's MareNostrum facility, one of the most powerful supercomputers in Europe.

  8. Generic, hierarchical framework for massively parallel Wang-Landau sampling.

    Science.gov (United States)

    Vogel, Thomas; Li, Ying Wai; Wüst, Thomas; Landau, David P

    2013-05-24

    We introduce a parallel Wang-Landau method based on the replica-exchange framework for Monte Carlo simulations. To demonstrate its advantages and general applicability for simulations of complex systems, we apply it to different spin models including spin glasses, the Ising model, and the Potts model, lattice protein adsorption, and the self-assembly process in amphiphilic solutions. Without loss of accuracy, the method gives significant speed-up and potentially scales up to petaflop machines.

  9. Generic, Hierarchical Framework for Massively Parallel Wang-Landau Sampling

    Science.gov (United States)

    Vogel, Thomas; Li, Ying Wai; Wüst, Thomas; Landau, David P.

    2013-05-01

    We introduce a parallel Wang-Landau method based on the replica-exchange framework for Monte Carlo simulations. To demonstrate its advantages and general applicability for simulations of complex systems, we apply it to different spin models including spin glasses, the Ising model, and the Potts model, lattice protein adsorption, and the self-assembly process in amphiphilic solutions. Without loss of accuracy, the method gives significant speed-up and potentially scales up to petaflop machines.

  10. A generic, hierarchical framework for massively parallel Wang Landau sampling

    Energy Technology Data Exchange (ETDEWEB)

    Vogel, Thomas [University of Georgia, Athens, GA; Li, Ying Wai [ORNL; Wuest, Thomas [Swiss Federal Research Institute, Switzerland; Landau, David P [University of Georgia, Athens, GA

    2013-01-01

    We introduce a parallel Wang-Landau method based on the replica-exchange framework for Monte Carlo simulations. To demonstrate its advantages and general applicability for simulations of complex systems, we apply it to the self-assembly process in amphiphilic solutions and to lattice protein adsorption. Without loss of accuracy, the method gives significant speed-up on small architectures like multi-core processors, and should be beneficial for petaflop machines.

  11. Scientific programming on massively parallel processor CP-PACS

    Energy Technology Data Exchange (ETDEWEB)

    Boku, Taisuke [Tsukuba Univ., Ibaraki (Japan). Inst. of Information Sciences and Electronics

    1998-03-01

    The massively parallel processor CP-PACS targets a range of problems in computational physics, and its architecture has been designed to support a variety of numerical workloads. This report outlines the CP-PACS, gives an example of programming the Kernel CG benchmark from NAS Parallel Benchmarks, version 1, and describes the two main architectural features of the CP-PACS: the pseudo vector processing mechanism and the tuning of parallel scientific and technical computations for the three-dimensional hyper-crossbar network. The CP-PACS uses processing units (PUs) based on a RISC processor augmented with pseudo vector processing, which is realized as loop processing built from scalar instructions. The features of the network connecting the PUs are explained, and the algorithm of the NPB version 1 Kernel CG is shown. The part of the main loop that takes the most processing time is the matrix-vector product (matvec), and its parallelization is explained. The computation time per CPU is determined. As a performance evaluation, execution times, the short-vector processing of the pseudo vector processor based on a slide window, and a comparison with other parallel computers are reported. (K.I.)
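
    The parallelization of the matvec kernel follows the usual owner-computes pattern; a serial Python emulation (with numpy standing in for the vector hardware) might look like this, where the 4-PU layout is a hypothetical example:

      import numpy as np

      def local_matvec(A, x, rank, nprocs):
          # each PU owns a contiguous block of rows of A; x is replicated
          n = A.shape[0]
          base, rem = divmod(n, nprocs)
          lo = rank * base + min(rank, rem)
          hi = lo + base + (1 if rank < rem else 0)
          return lo, hi, A[lo:hi] @ x

      rng = np.random.default_rng(0)
      A, x = rng.normal(size=(12, 12)), rng.normal(size=12)
      y = np.empty(12)
      for rank in range(4):                     # emulate 4 PUs
          lo, hi, y_local = local_matvec(A, x, rank, 4)
          y[lo:hi] = y_local                    # on a real machine this gather
      assert np.allclose(y, A @ x)              # runs over the interconnect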

  12. Routing performance analysis and optimization within a massively parallel computer

    Science.gov (United States)

    Archer, Charles Jens; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen

    2013-04-16

    An apparatus, program product and method optimize the operation of a massively parallel computer system by, in part, receiving actual performance data concerning an application executed by the plurality of interconnected nodes, and analyzing the actual performance data to identify an actual performance pattern. A desired performance pattern may be determined for the application, and an algorithm may be selected from among a plurality of algorithms stored within a memory, the algorithm being configured to achieve the desired performance pattern based on the actual performance data.

  13. Local Search Method for a Parallel Machine Scheduling Problem of Minimizing the Number of Machines Operated

    Science.gov (United States)

    Yamana, Takashi; Iima, Hitoshi; Sannomiya, Nobuo

    Although there have been many studies on parallel machine scheduling problems, the number of machines operated is fixed in these studies. From the viewpoint of machine operating costs, it is desirable to generate a schedule that uses fewer machines. In this paper, we address the problem of minimizing the number of parallel machines subject to the constraint that the total tardiness does not exceed a value given in advance. For this problem, we introduce a local search method in which the number of machines operated is changed efficiently and appropriately in a short time, while the total tardiness is also reduced.

  14. Massively parallel single-molecule manipulation using centrifugal force

    CERN Document Server

    Halvorsen, Ken

    2009-01-01

    Precise manipulation of single molecules has already led to remarkable insights in physics, chemistry, biology and medicine. However, widespread adoption of single-molecule techniques has been impeded by equipment cost and the laborious nature of making measurements one molecule at a time. We have solved these issues with a new approach: massively parallel single-molecule force measurements using centrifugal force. This approach is realized in a novel instrument that we call the Centrifuge Force Microscope (CFM), in which objects in an orbiting sample are subjected to a calibration-free, macroscopically uniform force-field while their micro-to-nanoscopic motions are observed. We demonstrate high-throughput single-molecule force spectroscopy with this technique by performing thousands of rupture experiments in parallel, characterizing force-dependent unbinding kinetics of an antibody-antigen pair in minutes rather than days. Additionally, we verify the force accuracy of the instrument by measuring the well-est...

  15. Solving the Stokes problem on a massively parallel computer

    DEFF Research Database (Denmark)

    Axelsson, Owe; Barker, Vincent A.; Neytcheva, Maya

    2001-01-01

    We describe a numerical procedure for solving the stationary two‐dimensional Stokes problem based on piecewise linear finite element approximations for both velocity and pressure, a regularization technique for stability, and a defect‐correction technique for improving accuracy. Eliminating … boundary value problem for each velocity component, are solved by the conjugate gradient method with a preconditioning based on the algebraic multi‐level iteration (AMLI) technique. The velocity is found from the computed pressure. The method is optimal in the sense that the computational work … is proportional to the number of unknowns. Further, it is designed to exploit a massively parallel computer with distributed memory architecture. Numerical experiments on a Cray T3E computer illustrate the parallel performance of the method.
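
    The unpreconditioned conjugate gradient iteration at the heart of such a solver fits in a few lines; this sketch omits the AMLI preconditioner and the distributed-memory data layout described in the paper:

      import numpy as np

      def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
          # plain CG for a symmetric positive definite matrix A
          x = np.zeros_like(b)
          r = b - A @ x
          p = r.copy()
          rr = r @ r
          for _ in range(max_iter):
              Ap = A @ p
              alpha = rr / (p @ Ap)
              x += alpha * p
              r -= alpha * Ap
              rr_new = r @ r
              if np.sqrt(rr_new) < tol:
                  break
              p = r + (rr_new / rr) * p
              rr = rr_new
          return x

      n = 100                                   # SPD test matrix: 1D Laplacian
      A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
      x = conjugate_gradient(A, np.ones(n))
      print(np.linalg.norm(np.ones(n) - A @ x))

    In the parallel setting each process owns a block of rows, and the matrix-vector product and the two inner products become the only communication points of the iteration.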

  16. A conventional, massively parallel eigensolver for electronic structure theory

    Energy Technology Data Exchange (ETDEWEB)

    Blum, V.; Scheffler, M. [Fritz Haber Institute, Berlin (Germany); Johanni, R.; Lederer, H. [RZ Garching (Germany); Auckenthaler, T.; Huckle, T.; Bungartz, H.J. [TU Muenchen (Germany); Kraemer, L.; Willems, P.; Lang, B. [BU Wuppertal (Germany); Havu, V. [Aalto University, Helsinki (Finland)

    2011-07-01

    We demonstrate a robust large-scale, massively parallel conventional eigensolver for first-principles theory of molecules and materials. Despite much research into O(N) methods, standard approaches (Kohn-Sham or Hartree-Fock theory and excited-state formalisms) must still rely on conventional but robust O(N³) solvers for many system classes, most notably metals. In particular, our eigensolver overcomes parallel scalability limitations where standard implementations of certain steps (reduction to tridiagonal form, solution of the reduced tridiagonal eigenproblem) can be a serious bottleneck already for a few hundred CPUs. We demonstrate scalable implementations of these and all other steps of the full generalized eigenvalue problem. Our largest example is a production run with 1046 Pt (heavy-metal) atoms with converged all-electron accuracy in the numeric atom-centered orbital code FHI-aims, but the implementation is generic and should easily be portable to other codes.

  17. Massively parallelized replica-exchange simulations of polymers on GPUs

    CERN Document Server

    Groß, Jonathan; Bachmann, Michael

    2011-01-01

    We discuss the advantages of parallelization by multithreading on graphics processing units (GPUs) for parallel tempering Monte Carlo simulations of an exemplary bead-spring model for homopolymers. Since the sampling of a large ensemble of conformations is a prerequisite for the precise estimation of statistical quantities, such as typical indicators for conformational transitions like the peak structure of the specific heat, the advantage of a strong increase in the performance of Monte Carlo simulations cannot be overestimated. Employing multithreading and utilizing the massive power of the large number of cores on GPUs available in modern commodity graphics cards, we find a rapid increase in efficiency when porting parts of the code from the central processing unit (CPU) to the GPU.
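
    The core of parallel tempering is the temperature-swap move; a minimal serial sketch on a hypothetical one-dimensional double-well energy (standing in for the polymer model) shows the rule that the GPU threads evaluate in parallel:

      import math, random

      random.seed(0)
      betas = [0.2, 0.5, 1.0, 2.0, 4.0]         # inverse temperatures
      xs = [0.0] * len(betas)                   # one replica per temperature
      E = lambda x: (x * x - 1.0) ** 2          # toy double-well energy

      for _ in range(5000):
          for i, beta in enumerate(betas):      # local Metropolis moves
              x_new = xs[i] + random.gauss(0.0, 0.5)
              dE = E(x_new) - E(xs[i])
              if random.random() < math.exp(min(0.0, -beta * dE)):
                  xs[i] = x_new
          for i in range(len(betas) - 1):       # neighbour swap attempts
              delta = (betas[i + 1] - betas[i]) * (E(xs[i + 1]) - E(xs[i]))
              if random.random() < math.exp(min(0.0, delta)):
                  xs[i], xs[i + 1] = xs[i + 1], xs[i]
      print([round(x, 2) for x in xs])

    On a GPU, each replica (and the many interaction terms within it) maps naturally onto threads, which is precisely the parallelism exploited here.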

  18. Random number generators for massively parallel simulations on GPU

    CERN Document Server

    Manssen, Markus; Hartmann, Alexander K

    2012-01-01

    High-performance streams of (pseudo) random numbers are crucial for the efficient implementation of countless stochastic algorithms, most importantly Monte Carlo simulations and molecular dynamics simulations with stochastic thermostats. A number of random number generator implementations have been discussed for GPU platforms before, and some generators are even included in the CUDA support libraries. Nevertheless, not all of these generators are well suited for highly parallel applications where each thread requires its own generator instance. In this specific situation, encountered for instance in simulations of lattice models, most of the high-quality generators with large state, such as the Mersenne Twister, cannot be used efficiently without substantial changes. We provide a broad review of existing CUDA variants of random number generators and present the CUDA implementation of a new massively parallel, high-quality, high-performance generator with a small memory overhead.
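
    The per-thread-stream idea can be illustrated in plain Python; real GPU codes would use a counter-based or otherwise parallel-safe generator rather than this naive seeding, which is shown only to make the structure concrete:

      import random

      GLOBAL_SEED = 42

      def make_stream(thread_id):
          # one independent generator instance per "thread"; naive seed mixing
          # like this can correlate streams, which is exactly the pitfall that
          # dedicated parallel generators are designed to avoid
          return random.Random((GLOBAL_SEED << 20) ^ thread_id)

      streams = [make_stream(t) for t in range(8)]
      for t, s in enumerate(streams):
          print(t, [round(s.random(), 3) for _ in range(3)])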

  19. Parallelization of the NASA Goddard Cumulus Ensemble Model for Massively Parallel Computing

    Directory of Open Access Journals (Sweden)

    Hann-Ming Henry Juang

    2007-01-01

    Massively parallel computing, using a message passing interface (MPI), has been implemented in a three-dimensional version of the Goddard Cumulus Ensemble (GCE) model. The implementation uses the domain-resemble concept to design a code structure for both the whole domain and the sub-domains after decomposition. Instead of inserting a group of MPI-related statements into the model routines, these statements are packed into a single routine. In other words, only a single call statement is inserted into the model code in each place, so there is minimal impact on the original code. The model is therefore easily modified and/or managed by model developers and users who have little knowledge of massively parallel computing.

  20. The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

    Directory of Open Access Journals (Sweden)

    Matthew O'Keefe

    1995-01-01

    Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish for MPPs what a vectorizable style has accomplished for vector machines, by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code.

  1. Representing and computing regular languages on massively parallel networks

    Energy Technology Data Exchange (ETDEWEB)

    Miller, M.I.; O' Sullivan, J.A. (Electronic Systems and Research Lab., of Electrical Engineering, Washington Univ., St. Louis, MO (US)); Boysam, B. (Dept. of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Inst., Troy, NY (US)); Smith, K.R. (Dept. of Electrical Engineering, Southern Illinois Univ., Edwardsville, IL (US))

    1991-01-01

    This paper proposes a general method for incorporating rule-based constraints corresponding to regular languages into stochastic inference problems, thereby allowing for a unified representation of stochastic and syntactic pattern constraints. The authors' approach first establishes the formal connection of rules to Chomsky grammars, and generalizes the original work of Shannon on the encoding of rule-based channel sequences to Markov chains of maximum entropy. This maximum-entropy probabilistic view leads to Gibbs representations with potentials whose number of minima grows at precisely the exponential rate at which the language of deterministically constrained sequences grows. These representations are coupled to stochastic diffusion algorithms, which sample the language-constrained sequences by visiting the energy minima according to the underlying Gibbs probability law. The coupling to stochastic search methods yields the important practical result that fully parallel stochastic cellular automata may be derived to generate samples from the rule-based constraint sets. The production rules and neighborhood state structure of the language of sequences directly determine the necessary connection structure of the required parallel computing surface. Representations of this type have been mapped to the DAP-510 massively parallel processor, consisting of 1024 mesh-connected bit-serial processing elements, for performing automated segmentation of electron-micrograph images.

  2. X: A Comprehensive Analytic Model for Parallel Machines

    Energy Technology Data Exchange (ETDEWEB)

    Li, Ang; Song, Shuaiwen; Brugel, Eric; Kumar, Akash; Chavarría-Miranda, Daniel; Corporaal, Henk

    2016-05-23

    To continuously comply with Moore’s Law, modern parallel machines become increasingly complex. Effectively tuning application performance for these machines therefore becomes a daunting task. Moreover, identifying performance bottlenecks at application and architecture level, as well as evaluating various optimization strategies, are becoming extremely difficult when the entanglement of numerous correlated factors is being presented. To tackle these challenges, we present a visual analytical model named “X”. It is intuitive and sufficiently flexible to track all the typical features of a parallel machine.

  3. PUMA: An Operating System for Massively Parallel Systems

    Directory of Open Access Journals (Sweden)

    Stephen R. Wheat

    1994-01-01

    This article presents an overview of PUMA (Performance-oriented, User-managed Messaging Architecture), a message-passing kernel for massively parallel systems. Message passing in PUMA is based on portals, openings in the address space of an application process. Once an application process has established a portal, other processes can write values into the portal using a simple send operation. Because messages are written directly into the address space of the receiving process, there is no need to buffer messages in the PUMA kernel and later copy them into the application's address space. PUMA consists of two components: the quintessential kernel (Q-Kernel) and the process control thread (PCT). Although the PCT provides management decisions, the Q-Kernel controls access and implements the policies specified by the PCT.

  4. Massive Parallelization of STED Nanoscopy Using Optical Lattices

    CERN Document Server

    Yang, Bin; Mestre, Michael; Trebbia, Jean-Baptiste; Lounis, Brahim

    2013-01-01

    Recent developments in stimulated emission depletion (STED) microscopy have achieved nanometer-scale resolution and shown great potential for live cell imaging. Yet STED nanoscopy techniques are based on single-point scanning. This constitutes a drawback for wide-field imaging, since the gain in spatial resolution requires dense pixelation and hence long recording times. Here we achieve massive parallelization of STED nanoscopy using wide-field excitation together with well-designed optical lattices for depletion and a fast camera for detection. Acquisition of large field-of-view super-resolved images requires scanning over only a single unit cell of the optical lattice, which can be as small as 290 nm × 290 nm. Interference STED (In-STED) images of 2.9 μm × 2.9 μm with resolution down to 70 nm are obtained at 12.5 frames per second. The development of this technique opens many prospects for fast wide-field nanoscopy.

  5. Microresonator solitons for massively parallel coherent optical communications

    CERN Document Server

    Marin-Palomo, Pablo; Karpov, Maxim; Kordts, Arne; Pfeifle, Joerg; Pfeiffer, Martin H P; Trocha, Philipp; Wolf, Stefan; Brasch, Victor; Rosenberger, Ralf; Vijayan, Kovendhan; Freude, Wolfgang; Kippenberg, Tobias J; Koos, Christian

    2016-01-01

    Optical solitons are waveforms that preserve their shape while travelling, relying on a balance of dispersion and nonlinearity. Data transmission schemes using solitons were heavily investigated in the 1980s, promising to overcome the limitations imposed by the dispersion of optical fibers. These approaches, however, were eventually abandoned in favour of WDM schemes, which are easier to implement and offer much better scalability to higher data rates. Here, we show that optical solitons may experience a comeback in optical terabit communications, this time not as a competitor, but as a key element of massively parallel WDM. Instead of encoding data on the soliton itself, we exploit continuously circulating solitons in Kerr-nonlinear microresonators to generate broadband optical frequency combs. In our experiments, we use two interleaved Kerr combs to transmit data on a total of 179 individual optical carriers that span the entire C and L bands. Using higher-order modulation formats (16QAM), net data rates exceedin...

  6. Tolerating correlated failures in Massively Parallel Stream Processing Engines

    DEFF Research Database (Denmark)

    Su, L.; Zhou, Y.

    2016-01-01

    Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime states and can recover a failed task by restoring its runtime state using its latest checkpoint. … On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPEs). … the passive approach is applied to all tasks while only a selected set of tasks will be actively replicated. The number of actively replicated tasks depends on the available resources. If tasks without active replicas fail, tentative outputs will be generated before the completion of the recovery process. We …

  7. Direct stereo radargrammetric processing using massively parallel processing

    Science.gov (United States)

    Balz, Timo; Zhang, Lu; Liao, Mingsheng

    2013-05-01

    Synthetic Aperture Radar (SAR) offers many ways to reconstruct digital surface models (DSMs). The two most commonly used methods are SAR interferometry (InSAR) and stereo radargrammetry. Stereo radargrammetry is a very stable and reliable process and is far less affected by temporal decorrelation compared with InSAR. It is therefore often used for DSM generation in heavily vegetated areas. However, stereo radargrammetry often produces rather noisy DSMs, sometimes containing large outliers. In this manuscript, we present a new approach for stereo radargrammetric processing, where the homologous points between the images are found by geocoding large amount of points. This offers a very flexible approach, allowing the simultaneous processing of multiple images and of cross-heading image pairs. Our approach relies on a good initial geocoding accuracy of the data and on very fast processing using a massively parallel implementation. The approach is demonstrated using TerraSAR-X images from Mount Song, China, and from Trento, Italy.

  8. Performance Evaluation Methodologies and Tools for Massively Parallel Programs

    Science.gov (United States)

    Yan, Jerry C.; Sarukkai, Sekhar; Tucker, Deanne (Technical Monitor)

    1994-01-01

    The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. The recent introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CRI's Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance tool technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time-consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g., AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.

  9. MASSIVE PARALLELISM WITH GPUS FOR CENTRALITY RANKING IN COMPLEX NETWORKS

    Directory of Open Access Journals (Sweden)

    Frederico L. Cabral

    2014-10-01

    Many problems in Computer Science can be modelled using graphs. Evaluating node centrality in complex networks, which can be considered equivalent to undirected graphs, provides a useful metric of the relative importance of each node inside the evaluated network. Knowing which nodes are the most central has various applications, such as improving information spreading in diffusion networks; in this case, the most central nodes can be considered to have higher influence rates over other nodes in the network. The main purpose of this work is to develop a GPU-based, massively parallel application to evaluate node centrality in complex networks using the NVIDIA CUDA programming model. The main contribution of this work is the set of strategies for developing such an algorithm. We show that these strategies improve the algorithm's speed-up by two orders of magnitude on one NVIDIA Tesla K20 GPU cluster node, when compared to the hybrid OpenMP/MPI version running in the same cluster with 4 nodes, each with 2 Intel(R) Xeon(R) CPU E5-2660 processors, for radius zero.
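
    As a serial reference point for what is being parallelized, here is a BFS-based closeness-centrality computation; the per-source loop is the part a GPU implementation distributes over thousands of threads (the graph below is a made-up example):

      from collections import deque

      def closeness(adj):
          n = len(adj)
          scores = []
          for s in range(n):                    # one BFS per source node
              dist = [-1] * n
              dist[s] = 0
              q = deque([s])
              while q:
                  u = q.popleft()
                  for v in adj[u]:
                      if dist[v] < 0:
                          dist[v] = dist[u] + 1
                          q.append(v)
              total = sum(d for d in dist if d > 0)
              scores.append((n - 1) / total if total else 0.0)
          return scores

      adj = [[1, 2], [0, 2, 3], [0, 1, 3], [1, 2, 4], [3]]   # undirected graph
      print([round(c, 3) for c in closeness(adj)])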

  10. CHOLLA: A New Massively Parallel Hydrodynamics Code for Astrophysical Simulation

    Science.gov (United States)

    Schneider, Evan E.; Robertson, Brant E.

    2015-04-01

    We present Computational Hydrodynamics On ParaLLel Architectures (Cholla), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256³) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.

  11. Massive parallel 3D PIC simulation of negative ion extraction

    Science.gov (United States)

    Revel, Adrien; Mochalskyy, Serhiy; Montellano, Ivar Mauricio; Wünderlich, Dirk; Fantz, Ursel; Minea, Tiberiu

    2017-09-01

    The 3D PIC-MCC code ONIX is dedicated to modeling Negative hydrogen/deuterium Ion (NI) extraction and co-extraction of electrons from radio-frequency driven, low pressure plasma sources. It provides valuable insight on the complex phenomena involved in the extraction process. In previous calculations, a mesh size larger than the Debye length was used, implying numerical electron heating. Important steps have been achieved in terms of computation performance and parallelization efficiency allowing successful massive parallel calculations (4096 cores), imperative to resolve the Debye length. In addition, the numerical algorithms have been improved in terms of grid treatment, i.e., the electric field near the complex geometry boundaries (plasma grid) is calculated more accurately. The revised model preserves the full 3D treatment, but can take advantage of a highly refined mesh. ONIX was used to investigate the role of the mesh size, the re-injection scheme for lost particles (extracted or wall absorbed), and the electron thermalization process on the calculated extracted current and plasma characteristics. It is demonstrated that all numerical schemes give the same NI current distribution for extracted ions. Concerning the electrons, the pair-injection technique is found well-adapted to simulate the sheath in front of the plasma grid.

  12. Massively Parallel Atomic Force Microscope with Digital Holographic Readout

    Energy Technology Data Exchange (ETDEWEB)

    Sache, L [Laboratory of Robotic Systems, Ecole Polytechnique Federale de Lausanne, EPFLSRO1, Station 9, CH-1015 Lausanne (Switzerland); Kawakatsu, H [Institute of Industrial Science, University of Tokyo, Tokyo (Japan); Emery, Y [Lyncee Tec SA, PSE-A, CH-1015 Lausanne (Switzerland); Bleuler, H [Laboratory of Robotic Systems, Ecole Polytechnique Federale de Lausanne, EPFLSRO1, Station 9, CH-1015 Lausanne (Switzerland)

    2007-03-15

    Massively Parallel Scanning Probe Microscopy is an obvious path for data storage (E Grochowski, R F Hoyt, Future Trends in Hard disc Drives, IEEE Trans. Magn. 1996, 32, 1850-1854; J L Griffin, S W Schlosser, G R Ganger and D F Nagle, Modeling and Performance of MEMS-Based Storage Devices, Proc. ACM SIGMETRICS, 2000). Current experimental systems still lie far behind Hard Disc Drive (HDD) or Digital Video Disk (DVD) systems, be it in access speed, data throughput, storage density or cost per bit. This paper presents an entirely new approach with the promise to break several of these barriers. The key idea is readout of a Scanning Probe Microscope (SPM) array by Digital Holographic Microscopy (DHM). This technology directly gives phase information at each pixel of a CCD array. This means that no contact line to each individual SPM probe is needed. The data is directly available in parallel form. Moreover, the optical setup needs in principle no expensive components, optical (or, to a large extent, mechanical) imperfections being compensated in the signal processing, i.e. in electronics. This gives the system the potential to be a low-cost device with fast Terabit readout capability.

  13. MCBooster: a tool for MC generation for massively parallel platforms

    CERN Multimedia

    Alves Junior, Antonio Augusto

    2016-01-01

    MCBooster is a header-only, C++11-compliant library for the generation of large samples of phase-space Monte Carlo events on massively parallel platforms. It was released on GitHub in the spring of 2016. The library core algorithms implement the Raubold-Lynch method; they are able to generate the full kinematics of decays with up to nine particles in the final state. The library supports the generation of sequential decays as well as the parallel evaluation of arbitrary functions over the generated events. The output of MCBooster completely accords with popular and well-tested software packages such as GENBOD (W515 from CERNLIB) and TGenPhaseSpace from the ROOT framework. MCBooster is developed on top of the Thrust library and runs on Linux systems. It deploys transparently on NVidia CUDA-enabled GPUs as well as multicore CPUs. This contribution summarizes the main features of MCBooster. A basic description of the user interface and some examples of applications are provided, along with measurements of perfor...

  14. Massively Parallel Interrogation of Aptamer Sequence, Structure and Function

    Energy Technology Data Exchange (ETDEWEB)

    Fischer, N O; Tok, J B; Tarasow, T M

    2008-02-08

    Optimization of high-affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings: High-density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer-mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high-information-content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high-affinity biomolecules.

  15. Massively parallel interrogation of aptamer sequence, structure and function.

    Directory of Open Access Journals (Sweden)

    Nicholas O Fischer

    BACKGROUND: Optimization of high-affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. METHODOLOGY/PRINCIPAL FINDINGS: High-density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer-mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. CONCLUSION AND SIGNIFICANCE: The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high-information-content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high-affinity biomolecules.

  16. cellGPU: Massively parallel simulations of dynamic vertex models

    Science.gov (United States)

    Sussman, Daniel M.

    2017-10-01

    Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected between running cellGPU entirely on the CPU versus its performance when running on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations. Program Files doi:http://dx.doi.org/10.17632/6j2cj29t3r.1 Licensing provisions: MIT Programming language: CUDA/C++ Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate. Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU. Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation

  17. Parallel machine covering with limited number of preemptions

    Institute of Scientific and Technical Information of China (English)

    JIANG Yi-wei; HU Jue-liang; WENG Ze-wei; ZHU Yu-qing

    2014-01-01

    In this paper, we investigate i-preemptive scheduling on parallel machines to maximize the minimum machine completion time, i.e., the machine covering problem with a limited number of preemptions. The aim is to obtain the worst-case ratio between the objective value of the optimal schedule with unlimited preemptions and that of the schedule allowed to be preempted at most i times. For the case of m identical machines, we show that the worst-case ratio is (2m-i-1)/m, and we present a polynomial-time algorithm which guarantees this ratio for any 0 ≤ i ≤ m-1. For i-preemptive scheduling on two uniform machines, we only need to consider the cases of i=0 and i=1. For both cases, we present two linear-time algorithms and obtain the worst-case ratios with respect to s, i.e., the ratio of the speeds of the two machines.
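
    For the non-preemptive case (i = 0) a simple LPT-style greedy is a common baseline; this sketch is not the paper's algorithm, which additionally uses up to i preemptions to raise the minimum load:

      def lpt_cover(jobs, m):
          # assign jobs in decreasing size to the least-loaded machine;
          # the cover value is the minimum machine completion time
          loads = [0] * m
          for p in sorted(jobs, reverse=True):
              loads[loads.index(min(loads))] += p
          return min(loads), loads

      print(lpt_cover([9, 7, 6, 5, 4, 3, 2], 3))   # -> (11, [12, 13, 11])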

  18. Waveform iterative techniques for device transient simulation on parallel machines

    Energy Technology Data Exchange (ETDEWEB)

    Lumsdaine, A. [Univ. of Notre Dame, IN (United States); Reichelt, M.W. [Massachusetts Institute of Technology, Cambridge, MA (United States)

    1993-12-31

    In this paper we describe our experiences with parallel implementations of several different waveform algorithms for performing transient simulation of semiconductor devices. Because of their inherent computation and communication structure, waveform methods are well suited to MIMD-type parallel machines with high communication latency, such as a cluster of workstations. Experimental results using pWORDS, a parallel waveform-based device transient simulation program, in conjunction with PVM running on a cluster of eight workstations, demonstrate that parallel waveform techniques are an efficient and faster alternative to standard simulation algorithms.

  19. Porting a 3D-model for the transport of reactive air pollutants to the parallel machine T3D

    NARCIS (Netherlands)

    Kessler, C.; Blom, J.G.; Verwer, J.G.

    1995-01-01

    Air pollution forecasting puts a high demand on the memory and the floating point performance of modern computers. For this kind of problems massively parallel computers are very promising, although the software tools and the I/O facilities on those machines are still under-developed. This report de

  20. Reliable Radio Access for Massive Machine-to-Machine (M2M) Communication

    DEFF Research Database (Denmark)

    Madueño, Germán Corrales

    Machine-to-Machine (M2M) communication is a term that identifies the emerging paradigm of interconnected systems, machines, and things that communicate and collaborate without human intervention. The characteristics of M2M communications are small payloads and sporadic transmissions, while … the service requirements can range from massive numbers of devices to ultra-reliable. This PhD thesis focuses on novel mechanisms to meet these requirements in a variety of wireless systems, from well-established technologies such as cellular networks to emerging technologies like IEEE 802.11ah. Today … an overwhelming 89% of the deployed M2M modules are GPRS-based. This motivates us to investigate the potential of GPRS as a dedicated M2M network. We show that by introducing minimal modifications to GPRS operation, a large number of devices can be reliably supported. Surprisingly, even though LTE is seen …

  1. Approximation algorithms for scheduling unrelated parallel machines with release dates

    Science.gov (United States)

    Avdeenko, T. V.; Mesentsev, Y. A.; Estraykh, I. V.

    2017-01-01

    In this paper we propose approaches to the optimal scheduling of unrelated parallel machines with release dates. One approach is based on a dynamic programming scheme modified with adaptive narrowing of the search domain to ensure computational effectiveness. We discuss the complexity of exact schedule synthesis and compare it with approximate, close-to-optimal solutions. We also explain how the algorithm works on an example with two unrelated parallel machines and five jobs with release dates. Performance results that show the efficiency of the proposed approach are given.
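
    At the size of the paper's illustration (two machines, five jobs) the optimum can even be found by enumeration, which makes a useful correctness check for any heuristic; the data below are made up:

      from itertools import product

      r = [0, 2, 3, 5, 8]                            # release dates
      p = [[4, 6], [3, 2], [5, 4], [2, 7], [6, 3]]   # p[j][k]: job j on machine k

      def machine_finish(jobs, k):
          # sequencing in nondecreasing release order minimizes Cmax on one machine
          t = 0
          for j in sorted(jobs, key=lambda j: r[j]):
              t = max(t, r[j]) + p[j][k]
          return t

      best = min(
          (max(machine_finish([j for j in range(5) if a[j] == k], k)
               for k in (0, 1)), a)
          for a in product((0, 1), repeat=5)         # 2^5 assignments
      )
      print("optimal makespan:", best[0], "assignment:", best[1])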

  2. KINEMATIC DESIGN OF A RECONFIGURABLE MINIATURE PARALLEL KINEMATIC MACHINE

    Institute of Scientific and Technical Information of China (English)

    2003-01-01

    This paper deals with the kinematic design of a reconfigurable miniature parallel kinematic machine. It shows that reconfigurability may be realized by packaging a tripod-based parallel mechanism with fixed-length struts into a compact and rigid frame with which different configurations can be formed. Utilizing a dual-parameter model, the influences of the geometrical parameters on the dexterous performance and the workspace/machine-volume ratio are investigated. A novel global performance index for the dimensional synthesis is proposed and optimized, resulting in a set of dimensionless geometrical parameters.

  3. The Distributed Assembly Parallel Machine Scheduling Problem with eligibility constraints.

    Directory of Open Access Journals (Sweden)

    Sara Hatami

    2015-01-01

    In this paper we jointly consider two realistic scheduling extensions. First, we study the distributed unrelated parallel machines problem, in which there is a set of identical factories, each with parallel machines in a production stage. Jobs have to be assigned to factories and to machines. Additionally, there is an assembly stage with a single assembly machine: jobs finished at the manufacturing stage are assembled into final products in this second stage. These two joint features are referred to as the Distributed Assembly Parallel Machine Scheduling Problem, or DAPMSP. The objective is to minimize the makespan in the assembly stage. Due to technological constraints, machines cannot be left empty, and some jobs may be processed only at certain factories. We propose a mathematical model and two high-performing heuristics. The model is tested with two state-of-the-art solvers and, together with the heuristics, 2220 instances are solved in comprehensive computational experiments. Results show that the proposed model is able to solve moderately-sized instances and that one of the heuristics is fast, giving close-to-optimal solutions in less than half a second in the worst case.

  4. Kinematic performance analysis of a parallel-chain hexapod machine

    Energy Technology Data Exchange (ETDEWEB)

    Jing Song; Jong-I Mou; Calvin King

    1998-05-18

    Inverse and forward kinematic models were derived to analyze the performance of a parallel-chain hexapod machine. Analytical models were constructed for both ideal and real structures. Performance assessment and enhancement algorithms were developed to determine the strut lengths for both ideal and real structures. The strut lengths determined from both cases can be used to analyze the effect of structural imperfections on machine performance. In an open-architecture control environment, strut length errors can be fed back to the controller to compensate for the displacement errors and thus improve the machine's accuracy in production.
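
    The inverse kinematics underlying the strut-length computation is the standard hexapod relation l_i = ||p + R a_i - b_i||; the joint layout below is a hypothetical ideal geometry, whereas the paper also handles real (imperfect) structures:

      import numpy as np

      def strut_lengths(base_pts, plat_pts, pos, rpy):
          # pose -> strut lengths for an ideal hexapod (Z-Y-X Euler rotation)
          cr, sr = np.cos(rpy[0]), np.sin(rpy[0])
          cp, sp = np.cos(rpy[1]), np.sin(rpy[1])
          cy, sy = np.cos(rpy[2]), np.sin(rpy[2])
          Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
          Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
          Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
          R = Rz @ Ry @ Rx
          return np.linalg.norm(pos + plat_pts @ R.T - base_pts, axis=1)

      ang = np.deg2rad([0, 60, 120, 180, 240, 300])
      base_pts = np.stack([np.cos(ang), np.sin(ang), np.zeros(6)], axis=1)
      plat_pts = 0.5 * base_pts                 # platform joints on a smaller circle
      print(strut_lengths(base_pts, plat_pts, np.array([0.0, 0.0, 1.2]),
                          (0.0, 0.05, 0.1)).round(4))

    Feeding the difference between ideal and measured strut lengths back to the controller is what enables the error compensation mentioned above.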

  5. Selection and Assignment of Machines: a Parallel Aproach

    OpenAIRE

    Francisco Ribeiro, José

    2003-01-01

    In this paper, a two-phase method is presented for selection of machines to be kept on the shop floor and assignment of parts to be manufactured to these machines. In the first phase, dynamic programming or a heuristic procedure identifies a set of feasible solutions to a knapsack problem. In the second phase, implicit enumeration technique or a greedy algorithm solves an assignment problem. The proposed method is written in language C and runs on a parallel virtual machine called PVM-W95. Th...
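
    Phase one is a knapsack-type selection; the classic 0/1 knapsack dynamic program below is a generic sketch of that building block (the values, weights, and budget are made-up stand-ins for machine utilities and floor-space costs):

      def knapsack(values, weights, capacity):
          # dp[i][c]: best value using the first i items within capacity c
          n = len(values)
          dp = [[0] * (capacity + 1) for _ in range(n + 1)]
          for i in range(1, n + 1):
              for c in range(capacity + 1):
                  dp[i][c] = dp[i - 1][c]
                  if weights[i - 1] <= c:
                      dp[i][c] = max(dp[i][c],
                                     dp[i - 1][c - weights[i - 1]] + values[i - 1])
          chosen, c = [], capacity               # backtrack to the chosen items
          for i in range(n, 0, -1):
              if dp[i][c] != dp[i - 1][c]:
                  chosen.append(i - 1)
                  c -= weights[i - 1]
          return dp[n][capacity], sorted(chosen)

      print(knapsack([60, 100, 120], [10, 20, 30], 50))   # -> (220, [1, 2])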

  6. Wavelet-Based DFT calculations on Massively Parallel Hybrid Architectures

    Science.gov (United States)

    Genovese, Luigi

    2011-03-01

    In this contribution, we present an implementation of a full DFT code that can run on massively parallel hybrid CPU-GPU clusters. Our implementation is based on modern GPU architectures that support double-precision floating-point numbers. This DFT code, named BigDFT, is delivered under the GNU-GPL license either as a stand-alone version or integrated in the ABINIT software package. Hybrid BigDFT routines were initially ported with NVIDIA's CUDA language, and recently more functionalities have been added with new routines written within Khronos' OpenCL standard. The formalism of this code is based on Daubechies wavelets, a systematic real-space basis set. As we will see in the presentation, the properties of this basis set are well suited for an extension to a GPU-accelerated environment. In addition to focusing on the implementation of the operators of the BigDFT code, this presentation also covers the usage of GPU resources in a complex code with different kinds of operations. A discussion of the interest of present and expected performance of hybrid-architecture computation in the framework of electronic structure calculations is also included.

  7. Comparing current cluster, massively parallel, and accelerated systems

    Energy Technology Data Exchange (ETDEWEB)

    Barker, Kevin J [Los Alamos National Laboratory; Davis, Kei [Los Alamos National Laboratory; Hoisie, Adolfy [Los Alamos National Laboratory; Kerbyson, Darren J [Los Alamos National Laboratory; Pakin, Scott [Los Alamos National Laboratory; Lang, Mike [Los Alamos National Laboratory; Sancho Pitarch, Jose C [Los Alamos National Laboratory

    2010-01-01

    Currently there is large architectural diversity in high performance computing systems. They include 'commodity' cluster systems that optimize per-node performance for small jobs, massively parallel processors (MPPs) that optimize aggregate performance for large jobs, and accelerated systems that optimize both per-node and aggregate performance but only for applications custom-designed to take advantage of such systems. Because of these dissimilarities, meaningful comparisons of achievable performance are not straightforward. In this work we utilize a methodology that combines both empirical analysis and performance modeling to compare clusters (represented by a 4,352-core IB cluster), MPPs (represented by a 147,456-core BG/P), and accelerated systems (represented by the 129,600-core Roadrunner) across a workload of four applications. Strengths of our approach include the ability to compare architectures, as opposed to specific implementations of an architecture, to attribute each application's performance bottlenecks to characteristics unique to each system, and to explore performance scenarios in advance of their availability for measurement. Our analysis illustrates that application performance is essentially unrelated to relative peak performance, but that application performance can be both predicted and explained using modeling.

  8. Information theory of massively parallel probe storage channels

    CERN Document Server

    Hambrey, Oliver; Zaboronski, Oleg

    2011-01-01

    Motivated by the concept of probe storage, we study the problem of information retrieval using a large array of N nano-mechanical probes, N ~ 4000. At the nanometer scale it is impossible to avoid errors in the positioning of the array, thus all signals retrieved by the probes of the array at a given sampling moment are affected by the same amount of random position jitter. Therefore a massively parallel probe storage device is an example of a noisy communication channel with long range correlations between channel outputs due to the global positioning errors. We find that these correlations have a profound effect on the channel's properties. For example, it turns out that the channel's information capacity does approach 1 bit per probe in the limit of high signal-to-noise ratio, but the rate of the approach is only polynomial in the channel noise strength. Moreover, any error correction code with block size N >> 1 such that codewords correspond to the instantaneous outputs of the all probes in the array exhi...
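
    The long-range correlations described here arise because every probe shares the same positioning error at each sampling instant. The toy C++ Monte Carlo below (an invented Gaussian read-back model, not the paper's channel) shows that two probes with independent electronic noise still produce strongly correlated outputs once a common jitter term enters both.

        // Toy model (not the paper's channel): each probe reads a pulse
        // displaced by a jitter J drawn once per sample and shared by all
        // probes, plus independent noise. The shared J correlates the outputs.
        #include <cmath>
        #include <iostream>
        #include <random>

        int main() {
            const int samples = 20000;
            std::mt19937 rng(42);
            std::normal_distribution<double> jitter(0.0, 0.1), noise(0.0, 0.05);
            auto pulse = [](double x) { return std::exp(-x * x / 0.02); };

            double s01 = 0, s0 = 0, s1 = 0, q0 = 0, q1 = 0;
            for (int s = 0; s < samples; ++s) {
                double J = jitter(rng);              // one global positioning error
                double y0 = pulse(J) + noise(rng);   // probe 0
                double y1 = pulse(J) + noise(rng);   // probe 1: same J, fresh noise
                s01 += y0 * y1; s0 += y0; s1 += y1; q0 += y0 * y0; q1 += y1 * y1;
            }
            double m0 = s0 / samples, m1 = s1 / samples;
            double corr = (s01 / samples - m0 * m1) /
                          std::sqrt((q0 / samples - m0 * m0) *
                                    (q1 / samples - m1 * m1));
            std::cout << "inter-probe correlation: " << corr << '\n'; // well above 0
        }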

  9. Use of maximum entropy method with parallel processing machine. [for x-ray object image reconstruction

    Science.gov (United States)

    Yin, Lo I.; Bielefeld, Michael J.

    1987-01-01

    The maximum entropy method (MEM) and balanced correlation method were used to reconstruct the images of low-intensity X-ray objects obtained experimentally by means of a uniformly redundant array coded aperture system. The reconstructed images from MEM are clearly superior. However, the MEM algorithm is computationally more time-consuming because of its iterative nature. On the other hand, both the inherently two-dimensional character of images and the iterative computations of MEM suggest the use of parallel processing machines. Accordingly, computations were carried out on the massively parallel processor at Goddard Space Flight Center as well as on the serial processing machine VAX 8600, and the results are compared.

  10. Kinematic Analysis of a Serial - Parallel Machine Tool: the VERNE machine

    CERN Document Server

    Kanaan, Daniel; Chablat, Damien; 10.1016/j.mechmachtheory.2008.03.002

    2008-01-01

    The paper derives the inverse and the forward kinematic equations of a serial-parallel 5-axis machine tool: the VERNE machine. This machine is composed of a three-degree-of-freedom (DOF) parallel module and a two-DOF serial tilting table. The parallel module consists of a moving platform that is connected to a fixed base by three non-identical legs. These legs are connected in a way that the combined effects of the three legs lead to an over-constrained mechanism with complex motion. This motion is defined as a simultaneous combination of rotation and translation. In this paper we propose symbolic methods that are able to calculate all kinematic solutions and identify the acceptable one by adding an analytical constraint on the disposition of the legs of the parallel module.

  11. Time efficient 3-D electromagnetic modeling on massively parallel computers

    Energy Technology Data Exchange (ETDEWEB)

    Alumbaugh, D.L.; Newman, G.A.

    1995-08-01

    A numerical modeling algorithm has been developed to simulate the electromagnetic response of a three-dimensional earth to a dipole source for frequencies ranging from 100 Hz to 100 MHz. The numerical problem is formulated in terms of a frequency-domain, modified vector Helmholtz equation for the scattered electric fields. The resulting differential equation is approximated using a staggered finite-difference grid, which results in a linear system of equations whose matrix is sparse and complex symmetric. The system of equations is solved using a preconditioned quasi-minimum-residual method. Dirichlet boundary conditions are employed at the edges of the mesh by setting the tangential electric fields equal to zero. At frequencies less than 1 MHz, normal grid stretching is employed to mitigate unwanted reflections off the grid boundaries. For frequencies greater than this, absorbing boundary conditions must be employed by making the stretching parameters of the modified vector Helmholtz equation complex, which introduces loss at the boundaries. To allow for faster calculation of realistic models, the original serial version of the code has been modified to run on a massively parallel architecture. This modification involves three distinct tasks: (1) mapping the finite-difference stencil to a processor stencil, which allows the necessary information to be exchanged between processors that contain adjacent nodes in the model, (2) determining the most efficient method to input the model, which is accomplished by dividing the input into "global" and "local" data and then reading the two sets in differently, and (3) deciding how to output the data, which is an inherently nonparallel process.

  12. PFLOTRAN: Recent Developments Facilitating Massively-Parallel Reactive Biogeochemical Transport

    Science.gov (United States)

    Hammond, G. E.

    2015-12-01

    With the recent shift towards modeling carbon and nitrogen cycling in support of climate-related initiatives, emphasis has been placed on incorporating increasingly mechanistic biogeochemistry within Earth system models to more accurately predict the response of terrestrial processes to natural and anthropogenic climate cycles. PFLOTRAN is an open-source subsurface code that is specialized for simulating multiphase flow and multicomponent biogeochemical transport on supercomputers. The object-oriented code was designed with modularity in mind and has been coupled with several third-party simulators (e.g. CLM to simulate land surface processes and E4D for coupled hydrogeophysical inversion). Central to PFLOTRAN's capabilities is its ability to simulate tightly-coupled reactive transport processes. This presentation focuses on recent enhancements to the code that enable the solution of large parameterized biogeochemical reaction networks with numerous chemical species. PFLOTRAN's "reaction sandbox" is described, which facilitates the implementation of user-defined reaction networks without the need for a comprehensive understanding of PFLOTRAN software infrastructure. The reaction sandbox is written in modern Fortran (2003-2008) and leverages encapsulation, inheritance, and polymorphism to provide the researcher with a flexible workspace for prototyping reactions within a massively parallel flow and transport simulation framework. As these prototypical reactions mature into well-accepted implementations, they can be incorporated into PFLOTRAN as native biogeochemistry capability. Users of the reaction sandbox are encouraged to upload their source code to PFLOTRAN's main source code repository, including the addition of simple regression tests to better ensure the long-term code compatibility and validity of simulation results.
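
    The reaction sandbox itself is written in Fortran 2003; purely to illustrate the plugin pattern the abstract describes (an abstract base type whose evaluation routine is overridden by user-defined reactions), here is a C++ analogue in which the class names and the rate law are invented.

        // Illustration of the "sandbox" plugin pattern (a C++ stand-in for
        // PFLOTRAN's Fortran 2003 design; names and rate law are invented).
        #include <iostream>
        #include <memory>
        #include <vector>

        struct ReactionBase {
            virtual ~ReactionBase() = default;
            // Add this reaction's contribution to the residual vector.
            virtual void evaluate(const std::vector<double>& conc,
                                  std::vector<double>& residual) const = 0;
        };

        // A user-defined first-order decay A -> B, prototyped without
        // touching the rest of the framework.
        struct FirstOrderDecay : ReactionBase {
            double k;
            explicit FirstOrderDecay(double rate) : k(rate) {}
            void evaluate(const std::vector<double>& c,
                          std::vector<double>& r) const override {
                r[0] -= k * c[0];   // consume species A
                r[1] += k * c[0];   // produce species B
            }
        };

        int main() {
            std::vector<std::unique_ptr<ReactionBase>> sandbox;
            sandbox.push_back(std::make_unique<FirstOrderDecay>(0.3));
            std::vector<double> conc = {1.0, 0.0}, residual = {0.0, 0.0};
            for (const auto& rxn : sandbox) rxn->evaluate(conc, residual);
            std::cout << "dA/dt = " << residual[0]
                      << ", dB/dt = " << residual[1] << '\n';
        }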

  13. Massively parallel computational fluid dynamics calculations for aerodynamics and aerothermodynamics applications

    Energy Technology Data Exchange (ETDEWEB)

    Payne, J.L.; Hassan, B.

    1998-09-01

    Massively parallel computers have enabled the analyst to solve complicated flow fields (turbulent, chemically reacting) that were previously intractable. Calculations are presented using a massively parallel CFD code called SACCARA (Sandia Advanced Code for Compressible Aerothermodynamics Research and Analysis) currently under development at Sandia National Laboratories as part of the Department of Energy (DOE) Accelerated Strategic Computing Initiative (ASCI). Computations were made on a generic reentry vehicle in a hypersonic flowfield utilizing three different distributed parallel computers to assess the parallel efficiency of the code with increasing numbers of processors. The parallel efficiencies for the SACCARA code will be presented for cases using 1, 150, 100 and 500 processors. Computations were also made on a subsonic/transonic vehicle using both 236 and 521 processors on a grid containing approximately 14.7 million grid points. Ongoing and future plans to implement a parallel overset grid capability and couple SACCARA with other mechanics codes in a massively parallel environment are discussed.

  14. A Model of Parallel Kinematics for Machine Calibration

    DEFF Research Database (Denmark)

    Pedersen, David Bue; Bæk Nielsen, Morten; Kløve Christensen, Simon

    2016-01-01

    Parallel kinematics have been adopted by more than 25 manufacturers of high-end desktop 3D printers [Wohlers Report (2015), p.118] as well as by research projects such as the WASP project [WASP (2015)], a 12 meter tall linear delta robot for Additive Manufacture of large-scale components...... developed in order to decompose the different types of geometrical errors into 6 elementary cases. Deliberate introduction of errors to the virtual machine has subsequently allowed for the generation of deviation plots that can be used as a strong tool for the identification and correction of geometrical...... errors on a physical machine tool....

  15. ON-LINE SCHEDULING WITH REJECTION ON IDENTICAL PARALLEL MACHINES

    Institute of Scientific and Technical Information of China (English)

    Cuixia MIAO; Yuzhong ZHANG

    2006-01-01

    In this paper, we consider the on-line scheduling of unit-time jobs with rejection on m identical parallel machines. The objective is to minimize the total completion time of the accepted jobs plus the total penalty of the rejected jobs. We give an on-line algorithm for the problem with competitive ratio (2 + √3)/2 ≈ 1.86602.

  16. Machine translation with minimal reliance on parallel resources

    CERN Document Server

    Tambouratzis, George; Sofianopoulos, Sokratis

    2017-01-01

    This book provides a unified view of a new methodology for Machine Translation (MT). This methodology extracts information from widely available resources (extensive monolingual corpora) while assuming only the existence of a very limited parallel corpus, and thus has a starting point distinct from that of Statistical Machine Translation (SMT). In this book, a detailed presentation of the methodology principles and system architecture is followed by a series of experiments in which the proposed system is compared to other MT systems using a set of established metrics including BLEU, NIST, Meteor and TER. Additionally, free-to-use code is available that allows the creation of new MT systems. The volume is addressed to both language professionals and researchers. Prerequisites for the readers are very limited and include a basic understanding of machine translation as well as of the basic tools of natural language processing.

  17. Detection of arboviruses and other micro-organisms in experimentally infected mosquitoes using massively parallel sequencing.

    Science.gov (United States)

    Hall-Mendelin, Sonja; Allcock, Richard; Kresoje, Nina; van den Hurk, Andrew F; Warrilow, David

    2013-01-01

    Human disease incidence attributed to arbovirus infection is increasing throughout the world, with effective control interventions limited by issues of sustainability, insecticide resistance and the lack of effective vaccines. Several promising control strategies are currently under development, such as the release of mosquitoes trans-infected with virus-blocking Wolbachia bacteria. Implementation of any control program is dependent on effective virus surveillance and a thorough understanding of virus-vector interactions. Massively parallel sequencing has enormous potential for providing comprehensive genomic information that can be used to assess many aspects of arbovirus ecology, as well as to evaluate novel control strategies. To demonstrate proof-of-principle, we analyzed Aedes aegypti or Aedes albopictus experimentally infected with dengue, yellow fever or chikungunya viruses. Random amplification was used to prepare sufficient template for sequencing on the Personal Genome Machine. Viral sequences were present in all infected mosquitoes. In addition, in most cases, we were also able to identify the mosquito species and mosquito micro-organisms, including the bacterial endosymbiont Wolbachia. Importantly, naturally occurring Wolbachia strains could be differentiated from strains that had been trans-infected into the mosquito. The method allowed us to assemble near full-length viral genomes and detect other micro-organisms without prior sequence knowledge, in a single reaction. This is a step toward the application of massively parallel sequencing as an arbovirus surveillance tool. It has the potential to provide insight into virus transmission dynamics, and has applicability to the post-release monitoring of Wolbachia in mosquito populations.

  18. Manufacturing methods for machining spring ends parallel at loaded length

    Science.gov (United States)

    Hinke, Patrick Thomas (Inventor); Benson, Dwayne M. (Inventor); Atkins, Donald J. (Inventor)

    1995-01-01

    A first end surface of a coiled compression spring at its relaxed length is machined to a plane transverse to the spring axis. The spring is then placed in a press structure having first and second opposed planar support surfaces, with the machined spring end surface bearing against the first support surface, the unmachined spring end surface bearing against a planar first surface of a lateral force compensation member, and an opposite, generally spherically curved surface of the compensation member bearing against the second press structure support surface. The spring is then compressed generally to its loaded length, and a circumferentially spaced series of marks, lying in a plane parallel to the second press structure support surface, are formed on the spring coil on which the second spring end surface lies. The spring is then removed from the press structure, and the second spring end surface is machined to the mark plane. When the spring is subsequently compressed to its loaded length the precisely parallel relationship between the machined spring end surfaces substantially eliminates undesirable lateral deflection of the spring.

  19. Massively parallel neural circuits for stereoscopic color vision: encoding, decoding and identification.

    Science.gov (United States)

    Lazar, Aurel A; Slutskiy, Yevgeniy B; Zhou, Yiyin

    2015-03-01

    Past work demonstrated how monochromatic visual stimuli could be faithfully encoded and decoded under Nyquist-type rate conditions. Color visual stimuli were then traditionally encoded and decoded in multiple separate monochromatic channels. The brain, however, appears to mix information about color channels at the earliest stages of the visual system, including the retina itself. If information about color is mixed and encoded by a common pool of neurons, how can colors be demixed and perceived? We present Color Video Time Encoding Machines (Color Video TEMs) for encoding color visual stimuli that take into account a variety of color representations within a single neural circuit. We then derive a Color Video Time Decoding Machine (Color Video TDM) algorithm for color demixing and reconstruction of color visual scenes from spikes produced by a population of visual neurons. In addition, we formulate Color Video Channel Identification Machines (Color Video CIMs) for functionally identifying color visual processing performed by a spiking neural circuit. Furthermore, we derive a duality between TDMs and CIMs that unifies the two and leads to a general theory of neural information representation for stereoscopic color vision. We provide examples demonstrating that a massively parallel color visual neural circuit can be first identified with arbitrary precision and its spike trains can be subsequently used to reconstruct the encoded stimuli. We argue that evaluation of the functional identification methodology can be effectively and intuitively performed in the stimulus space. In this space, a signal reconstructed from spike trains generated by the identified neural circuit can be compared to the original stimulus.

  20. SWAMP+: multiple subsequence alignment using associative massive parallelism

    Energy Technology Data Exchange (ETDEWEB)

    Steinfadt, Shannon Irene [Los Alamos National Laboratory; Baker, Johnnie W [KENT STATE UNIV.

    2010-10-18

    A new parallel algorithm SWAMP+ incorporates the Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that expands traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for the end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.
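
    The O(m+n) behaviour comes from the wavefront structure of the Smith-Waterman recurrence: every cell on one anti-diagonal depends only on earlier diagonals, so all of them can be computed simultaneously. Below is a serial C++ sketch that walks the matrix by anti-diagonals; the inner loop is the parallel unit, and the ASC-specific machinery of SWAMP+ is not reproduced.

        // Smith-Waterman scored by anti-diagonals: cells on one diagonal are
        // mutually independent, giving O(m+n) wavefront steps with m processors.
        #include <algorithm>
        #include <iostream>
        #include <string>
        #include <vector>

        int main() {
            std::string a = "ACACACTA", b = "AGCACACA";
            const int match = 2, mismatch = -1, gap = -1;
            const int m = (int)a.size(), n = (int)b.size();
            int best = 0;
            std::vector<std::vector<int>> H(m + 1, std::vector<int>(n + 1, 0));

            for (int d = 2; d <= m + n; ++d) {                 // wavefront index
                int lo = std::max(1, d - n), hi = std::min(m, d - 1);
                for (int i = lo; i <= hi; ++i) {               // parallelisable loop
                    int j = d - i;
                    int s = (a[i - 1] == b[j - 1]) ? match : mismatch;
                    H[i][j] = std::max({0, H[i - 1][j - 1] + s,
                                        H[i - 1][j] + gap, H[i][j - 1] + gap});
                    best = std::max(best, H[i][j]);
                }
            }
            std::cout << "best local alignment score: " << best << '\n';
        }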

  1. A hybrid algorithm for unrelated parallel machines scheduling

    Directory of Open Access Journals (Sweden)

    Mohsen Shafiei Nikabadi

    2016-09-01

    Full Text Available In this paper, a new hybrid algorithm based on a multi-objective genetic algorithm (MOGA) and simulated annealing (SA) is proposed for scheduling unrelated parallel machines with sequence-dependent setup times, varying due dates, ready times and precedence relations among jobs. Our objective is to minimize the makespan (the maximum completion time over all machines), the number of tardy jobs, the total tardiness and the total earliness at the same time, which can be more advantageous in real environments than considering each objective separately. To obtain both good global and local search abilities, a hybrid algorithm based on MOGA and SA is proposed. Simulation results and four well-known multi-objective performance metrics indicate that the proposed hybrid algorithm outperforms the genetic algorithm (GA) and SA in terms of each objective, and significantly so in minimizing the total cost of the weighted function.
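
    As a rough illustration of how an SA acceptance test refines a schedule on unrelated machines, here is a single-objective C++ stand-in with an invented instance; the paper's hybrid applies this kind of local search inside a multi-objective GA, which is not reproduced here.

        // Stand-in for the SA half of the hybrid: a random reassignment move
        // on an unrelated-machine schedule, accepted by a Metropolis test.
        #include <algorithm>
        #include <cmath>
        #include <iostream>
        #include <random>
        #include <vector>

        double makespan(const std::vector<int>& assign,
                        const std::vector<std::vector<double>>& p, int machines) {
            std::vector<double> load(machines, 0.0);
            for (std::size_t j = 0; j < assign.size(); ++j)
                load[assign[j]] += p[j][assign[j]];
            return *std::max_element(load.begin(), load.end());
        }

        int main() {
            std::mt19937 rng(1);
            const int jobs = 12, machines = 3;
            std::uniform_real_distribution<double> U(1.0, 9.0), R(0.0, 1.0);
            std::vector<std::vector<double>> p(jobs, std::vector<double>(machines));
            for (auto& row : p) for (auto& t : row) t = U(rng); // unrelated times

            std::uniform_int_distribution<int> J(0, jobs - 1), M(0, machines - 1);
            std::vector<int> s(jobs);
            for (auto& x : s) x = M(rng);                       // random start
            double cur = makespan(s, p, machines);
            for (double T = 5.0; T > 0.01; T *= 0.995) {        // cooling schedule
                auto cand = s;
                cand[J(rng)] = M(rng);                          // neighbourhood move
                double c = makespan(cand, p, machines);
                if (c < cur || R(rng) < std::exp((cur - c) / T)) { s = cand; cur = c; }
            }
            std::cout << "makespan after annealing: " << cur << '\n';
        }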

  2. Dimensional Synthesis Design of Novel Parallel Machine Tool

    Institute of Scientific and Technical Information of China (English)

    汪劲松; 唐晓强; 段广洪; 尹文生

    2002-01-01

    This paper presents dimensional synthesis design theory for a novel planar 3-DOF (degrees of freedom) parallel machine tool. Closed-form solutions are developed for both the inverse and direct kinematics. The formulation of the dexterity and the definitions of the theoretical workspace and the valid workspace are used to analyze the effects of the design parameters on the dexterity and workspace. The analysis results are used to propose an approach to satisfy the platform motion requirement while realizing orientation capability, dexterity and valid workspace. A design example is given to illustrate the effectiveness of this approach.

  3. Programming Massively Parallel Architectures using MARTE: a Case Study

    CERN Document Server

    Rodrigues, Wendell; Dekeyser, Jean-Luc

    2011-01-01

    Nowadays, several industrial applications are being ported to parallel architectures. These applications take advantage of the potential parallelism provided by multiple-core processors. Many-core processors, especially GPUs (Graphics Processing Units), have led the race in floating-point performance since 2003. While the performance improvement of general-purpose microprocessors has slowed significantly, GPUs have continued to improve relentlessly. As of 2009, the ratio between many-core GPUs and multicore CPUs for peak floating-point calculation throughput is about 10 times. However, as parallel programming requires a non-trivial distribution of tasks and data, developers find it hard to implement their applications effectively. Aiming to improve the use of many-core processors, this work presents a case study using UML and the MARTE profile to specify and generate OpenCL code for intensive signal processing applications. Benchmark results show us the viability of the use of MDE approaches to generate G...

  4. Calibration of parallel kinematics machine using generalized distance error model

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    This paper focuses on the accuracy enhancement of parallel kinematics machines through kinematic calibration. In the calibration process, the two main difficulties are the construction of a well-structured identification Jacobian matrix and the measurement of end-effector position and orientation. In this paper, the identification Jacobian matrix is constructed easily by numerical calculation using the unit virtual velocity method. A generalized distance error model is presented to avoid measuring the position and orientation directly, which is difficult to do. Finally, a measurement tool is given for acquiring the data points in the calibration process. Experimental studies confirmed the effectiveness of the method. It is also shown in the paper that the proposed approach can be applied to other types of parallel manipulators.

  5. A Parallel Vector Machine for the PM Programming Language

    Science.gov (United States)

    Bellerby, Tim

    2016-04-01

    PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. Inter-node communication is accomplished using
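
    The two mask representations described above are easy to picture in a few lines; the C++ sketch below (illustrative only, not the PM virtual machine's instruction set) applies an operation under a dense Boolean mask, then switches to an active-location list of the kind the abstract says is used once active lanes become sparse.

        // Sketch of the two mask representations: a Boolean mask for dense
        // activity, an index list once active lanes are sparse.
        #include <cstddef>
        #include <iostream>
        #include <vector>

        int main() {
            std::vector<double> v = {1, 2, 3, 4, 5, 6, 7, 8};
            std::vector<bool> mask = {true, false, true, false,
                                      false, false, false, true};

            // Dense form: test every lane.
            for (std::size_t i = 0; i < v.size(); ++i)
                if (mask[i]) v[i] *= 2.0;

            // Sparse form: convert the mask to an active-location list and
            // touch only those lanes (what a conditional would switch to
            // when the fraction of active cells drops).
            std::vector<std::size_t> active;
            for (std::size_t i = 0; i < mask.size(); ++i)
                if (mask[i]) active.push_back(i);
            for (std::size_t i : active) v[i] += 1.0;

            for (double x : v) std::cout << x << ' ';
            std::cout << '\n';
        }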

  6. Solving Lotsizing Problems on Parallel Identical Machines Using Symmetry Breaking Constraints

    NARCIS (Netherlands)

    R.F. Jans (Raf)

    2006-01-01

    Production planning on multiple parallel machines is an interesting problem, both from a theoretical and practical point of view. The parallel machine lotsizing problem consists of finding the optimal timing and level of production and the best allocation of products to machines. In this
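
    The record is cut off before the constraints themselves, but symmetry breaking on identical machines typically orders the machines so that permuted copies of one solution become infeasible. One generic family of such constraints (an illustration, not necessarily the paper's formulation), with y_k indicating that machine k is used and x_{jk} that product j is assigned to machine k, is

        y_k \ge y_{k+1}, \qquad \sum_j p_j \, x_{jk} \ \ge\ \sum_j p_j \, x_{j,k+1}, \qquad k = 1, \dots, m-1

    which forces machines to be activated in index order with nonincreasing workloads and so removes relabelings of identical machines from the search tree.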

  7. Parallel-Machine Scheduling with Time-Dependent and Machine Availability Constraints

    Directory of Open Access Journals (Sweden)

    Cuixia Miao

    2015-01-01

    Full Text Available We consider the parallel-machine scheduling problem in which the machines have availability constraints and the processing time of each job is a simple linear increasing function of its starting time. For the makespan minimization problem, which is NP-hard in the strong sense, we discuss the Longest Deteriorating Rate algorithm and the List Scheduling algorithm; we also provide a lower bound on any optimal schedule. For the total completion time minimization problem, we analyze the strong NP-hardness, and we present a dynamic programming algorithm and a fully polynomial time approximation scheme for the two-machine problem. Furthermore, we extend the dynamic programming algorithm to the total weighted completion time minimization problem.
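
    With proportional deterioration p_j(t) = b_j t, a job started at time s completes at s(1 + b_j), so a machine's finish time is its start time multiplied by (1 + b_j) over the jobs it runs. A C++ sketch of the Longest Deteriorating Rate rule under that model (the paper's machine availability constraints are omitted here):

        // Longest Deteriorating Rate sketch: a job with rate b started at
        // time s completes at s*(1+b); sort by rate, assign to the machine
        // that frees up earliest. Availability constraints omitted.
        #include <algorithm>
        #include <iostream>
        #include <vector>

        int main() {
            std::vector<double> b = {0.5, 0.2, 0.9, 0.4, 0.3, 0.7}; // rates
            const int m = 2;
            const double t0 = 1.0;            // common start time (must be > 0)

            std::sort(b.begin(), b.end(),
                      [](double u, double v) { return u > v; }); // longest first
            std::vector<double> finish(m, t0);
            for (double rate : b) {
                auto mach = std::min_element(finish.begin(), finish.end());
                *mach *= (1.0 + rate);        // job starts when machine frees up
            }
            std::cout << "makespan: "
                      << *std::max_element(finish.begin(), finish.end()) << '\n';
        }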

  8. Introduction to massively-parallel computing in high-energy physics

    CERN Document Server

    Smith, Mark

    1993-01-01

    Ever since computers were first used for scientific and numerical work, there has existed an "arms race" between the technical development of faster computing hardware and the desire of scientists to solve larger problems in shorter time-scales. However, the vast leaps in processor performance achieved through advances in semiconductor science have reached a hiatus as the technology comes up against the physical limits of the speed of light and quantum effects. This has led all high-performance computer manufacturers to turn towards a parallel architecture for their new machines. In these lectures we will introduce the history and concepts behind parallel computing, and review the various parallel architectures and software environments currently available. We will then introduce programming methodologies that allow efficient exploitation of parallel machines, and present case studies of the parallelization of typical High Energy Physics codes for the two main classes of parallel computing architecture (S...

  9. BlueGene/L Applications: Parallelism on a Massive Scale

    Energy Technology Data Exchange (ETDEWEB)

    de Supinski, B R; Schulz, M; Bulatov, V V; Cabot, W; Chan, B; Cook, A W; Draeger, E W; Glosli, J N; Greenough, J A; Henderson, K; Kubota, A; Louis, S; Miller, B J; Patel, M V; Spelce, T E; Streitz, F H; Williams, P L; Yates, R K; Yoo, A; Almasi, G; Bhanot, G; Gara, A; Gunnels, J A; Gupta, M; Moreira, J; Sexton, J; Walkup, B; Archer, C; Gygi, F; Germann, T C; Kadau, K; Lomdahl, P S; Rendleman, C; Welcome, M L; McLendon, W; Hendrickson, B; Franchetti, F; Lorenz, J; Uberhuber, C W; Chow, E; Catalyurek, U

    2006-09-08

    BlueGene/L (BG/L), developed through a partnership between IBM and Lawrence Livermore National Laboratory (LLNL), is currently the world's largest system both in terms of scale with 131,072 processors and absolute performance with a peak rate of 367 TFlop/s. BG/L has led the Top500 list the last four times with a Linpack rate of 280.6 TFlop/s for the full machine installed at LLNL and is expected to remain the fastest computer in the next few editions. However, the real value of a machine like BG/L derives from the scientific breakthroughs that real applications can produce by successfully using its unprecedented scale and computational power. In this paper, we describe our experiences with eight large scale applications on BG/L from several application domains, ranging from molecular dynamics to dislocation dynamics and turbulence simulations to searches in semantic graphs. We also discuss the challenges we faced when scaling these codes and present several successful optimization techniques. All applications show excellent scaling behavior, even at very large processor counts, with one code even achieving a sustained performance of more than 100 TFlop/s, clearly demonstrating the real success of the BG/L design.

  10. Numerical simulations of astrophysical problems on massively parallel supercomputers

    Science.gov (United States)

    Kulikov, Igor; Chernykh, Igor; Glinsky, Boris

    2016-10-01

    In this paper, we propose the latest version of the numerical model for simulating the dynamics of astrophysical objects, and a new realization of our AstroPhi code for Intel Xeon Phi based RSC PetaStream supercomputers. The co-design of a computational model for the description of astrophysical objects is described. The parallel implementation and scalability tests of the AstroPhi code are presented. We achieve 73% weak scaling efficiency using 256 Intel Xeon Phi accelerators with 61440 threads.
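
    For reference, weak scaling efficiency keeps the work per accelerator fixed and compares runtimes:

        E_w(N) = T(1) / T(N)

    so the reported 73% on 256 accelerators corresponds to T(256) ≈ T(1) / 0.73, i.e. roughly 1.4 times the single-accelerator runtime for 256 times the total work.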

  11. Cross-platform compatibility of Hi-Plex, a streamlined approach for targeted massively parallel sequencing.

    Science.gov (United States)

    Nguyen-Dumont, Tú; Pope, Bernard J; Hammet, Fleur; Mahmoodi, Maryam; Tsimiklis, Helen; Southey, Melissa C; Park, Daniel J

    2013-11-15

    Although per-base sequencing costs have decreased during recent years, library preparation for targeted massively parallel sequencing remains constrained by high reagent cost, limited design flexibility, and protocol complexity. To address these limitations, we previously developed Hi-Plex, a polymerase chain reaction (PCR) massively parallel sequencing strategy for screening panels of genomic target regions. Here, we demonstrate that Hi-Plex applied with hybrid adapters can generate a library suitable for sequencing with both the Ion Torrent and the TruSeq chemistries and that adjusting primer concentrations improves coverage uniformity. These results expand Hi-Plex capabilities as an accurate, affordable, flexible, and rapid approach for various genetic screening applications.

  12. MASSIVELY PARALLEL LATENT SEMANTIC ANALYSES USING A GRAPHICS PROCESSING UNIT

    Energy Technology Data Exchange (ETDEWEB)

    Cavanagh, J.; Cui, S.

    2009-01-01

    Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of datasets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. A graphics processing unit (GPU) can solve some highly parallel problems much faster than a traditional sequential processor or central processing unit (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a PC cluster. Due to the GPU's application-specific architecture, harnessing the GPU's computational prowess for LSA is a great challenge. We presented a parallel LSA implementation on the GPU, using NVIDIA® Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms software. The performance of this implementation is compared to traditional LSA implementation on a CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000x1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits of the GPU for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.

  13. Massively Parallel Latent Semantic Analyzes using a Graphics Processing Unit

    Energy Technology Data Exchange (ETDEWEB)

    Cavanagh, Joseph M [ORNL; Cui, Xiaohui [ORNL

    2009-01-01

    Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of data sets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel problems much faster than the traditional sequential processor (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a computer cluster. Due to the GPU's application-specific architecture, harnessing the GPU's computational prowess for LSA is a great challenge. We present a parallel LSA implementation on the GPU, using NVIDIA Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms. The performance of this implementation is compared to traditional LSA implementation on CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000x1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits the GPU has for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.
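
    Both records attribute the large performance swing to whether the matrix dimensions are divisible by 16, a typical GPU tile and memory-alignment width. A common workaround, sketched below in C++ (generic, not the authors' code), is to zero-pad each dimension up to the next multiple of 16 before dispatching tiled kernels and to crop the result afterwards.

        // Generic illustration of the divisible-by-16 effect noted above:
        // pad each matrix dimension to the next multiple of 16 before
        // launching tiled GPU kernels. Not the authors' code.
        #include <cstddef>
        #include <iostream>
        #include <vector>

        std::size_t padTo16(std::size_t n) { return (n + 15) / 16 * 16; }

        int main() {
            std::size_t rows = 1000, cols = 1000;
            std::size_t pr = padTo16(rows), pc = padTo16(cols); // 1008 x 1008
            std::vector<double> padded(pr * pc, 0.0);           // zero borders
            // Copy the 1000x1000 term-document matrix into the top-left
            // block, run the tiled kernels on the padded buffer, then crop.
            std::cout << rows << "x" << cols << " -> "
                      << pr << "x" << pc << '\n';
        }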

  14. Machine Learning and Parallelism in the Reconstruction of LHCb and its Upgrade

    Science.gov (United States)

    De Cian, Michel

    2016-11-01

    The LHCb detector at the LHC is a general purpose detector in the forward region with a focus on reconstructing decays of c- and b-hadrons. For Run II of the LHC, a new trigger strategy with a real-time reconstruction, alignment and calibration was employed. This was made possible by implementing an offline-like track reconstruction in the high level trigger. However, the ever-increasing need for higher throughput and the move to parallelism in CPU architectures in recent years necessitated the use of vectorization techniques to achieve the desired speed, and a more extensive use of machine learning to veto bad events early on. This document discusses selected improvements in computationally expensive parts of the track reconstruction, like the Kalman filter, as well as an improved approach to rejecting fake tracks using fast machine learning techniques. In the last part, a short overview of the track reconstruction challenges for the upgrade of LHCb is given. Running a fully software-based trigger, a large gain in reconstruction speed has to be achieved to cope with the 40 MHz bunch-crossing rate. Two possible approaches for techniques exploiting massive parallelization are discussed.

  15. Massively parallel Monte Carlo for many-particle simulations on GPUs

    CERN Document Server

    Anderson, Joshua A; Grubb, Thomas L; Engel, Michael; Glotzer, Sharon C

    2013-01-01

    Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a GeForce GTX 680, our GPU implementation executes 95 times faster than on a single Intel Xeon E5540 CPU core, enabling 17 times better performance per dollar and cutting energy usage by a factor of 10.
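
    A standard way to extract parallelism from particle Monte Carlo is a checkerboard domain decomposition: cells of one colour can be updated concurrently when moves are confined to their cell, so same-colour updates cannot interact. The serial C++ sketch below illustrates one such sweep for hard disks; it is deliberately simplified and does not reproduce the paper's careful treatment of the detailed-balance conditions.

        // Serial sketch of one checkerboard sweep for hard disks
        // (illustrative; not the paper's GPU scheme). Moves stay inside
        // their cell, so all active-colour cells could run in parallel.
        #include <cmath>
        #include <iostream>
        #include <random>
        #include <vector>

        int main() {
            const double L = 8.0, cell = 2.0, sigma = 0.5; // box, cell, diameter
            std::mt19937 rng(7);
            std::uniform_real_distribution<double> U(0.0, L), D(-0.2, 0.2);
            std::vector<double> x(64), y(64);
            for (std::size_t i = 0; i < x.size(); ++i) { x[i] = U(rng); y[i] = U(rng); }

            auto overlaps = [&](std::size_t i, double nx, double ny) {
                for (std::size_t j = 0; j < x.size(); ++j)
                    if (j != i && std::hypot(x[j] - nx, y[j] - ny) < sigma)
                        return true;
                return false;
            };

            for (int colour = 0; colour < 2; ++colour)     // two half-sweeps
                for (std::size_t i = 0; i < x.size(); ++i) {
                    int cx = int(x[i] / cell), cy = int(y[i] / cell);
                    if ((cx + cy) % 2 != colour) continue; // active colour only
                    double nx = x[i] + D(rng), ny = y[i] + D(rng);
                    bool inCell = nx >= cx * cell && nx < (cx + 1) * cell &&
                                  ny >= cy * cell && ny < (cy + 1) * cell;
                    if (inCell && !overlaps(i, nx, ny)) { x[i] = nx; y[i] = ny; }
                }
            std::cout << "completed one checkerboard sweep of "
                      << x.size() << " disks\n";
        }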

  16. Comparing and Optimising Parallel Haskell Implementations for Multicore Machines

    DEFF Research Database (Denmark)

    Berthold, Jost; Marlow, Simon; Hammond, Kevin

    2009-01-01

    The GpH implementation investigated here uses a physically-shared heap, which should be well-suited to multicore architectures. In contrast, the Eden implementation adopts an approach that has been designed for use on distributed-memory parallel machines: a system of multiple, independent heaps (one per core), with inter-core communication handled by message-passing rather than through shared heap cells. We report two main results. Firstly, we report on the effect of a number of optimisations that we applied to the shared-memory GpH implementation in order to address some performance issues that were revealed by our testing: for example, we implemented a work-stealing approach to task allocation. Our optimisations improved the performance of the shared-heap GpH implementation by as much as 30% on eight cores. Secondly, the shared heap approach is, rather surprisingly, not superior to a distributed heap...

  17. I - Template Metaprogramming for Massively Parallel Scientific Computing - Expression Templates

    CERN Document Server

    CERN. Geneva

    2016-01-01

    Large scale scientific computing raises questions on different levels, ranging from the formulation of the problems to the choice of the best algorithms and their implementation for a specific platform. There are similarities in these different topics that can be exploited by modern-style C++ template metaprogramming techniques to produce readable, maintainable and generic code. Traditional low-level code tends to be fast but platform-dependent, and it obfuscates the meaning of the algorithm. On the other hand, an object-oriented approach is nice to read, but may come with an inherent performance penalty. These lectures aim to present the basics of the Expression Template (ET) idiom, which allows us to keep the object-oriented approach without sacrificing performance. We will in particular show how to enhance ET to include SIMD vectorization. We will then introduce techniques for abstracting iteration, and introduce thread-level parallelism for use in heavy data-centric loads. We will show how to apply these methods i...
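
    The core of the ET idiom is that an arithmetic expression builds a lightweight tree type at compile time, and a single loop in the assignment operator evaluates the whole expression element-wise without allocating temporaries. A minimal C++ sketch (a toy, not the lecture's library):

        // Minimal expression-template sketch: a + b builds a Sum node type
        // instead of a temporary vector; the loop in operator= fuses the
        // whole expression into one pass.
        #include <cstddef>
        #include <iostream>
        #include <vector>

        template <class E> struct Expr {
            const E& self() const { return static_cast<const E&>(*this); }
        };

        template <class L, class R>
        struct Sum : Expr<Sum<L, R>> {
            const L& l; const R& r;
            Sum(const L& ll, const R& rr) : l(ll), r(rr) {}
            double operator[](std::size_t i) const { return l[i] + r[i]; }
            std::size_t size() const { return l.size(); }
        };

        struct Vec : Expr<Vec> {
            std::vector<double> d;
            explicit Vec(std::size_t n, double v = 0.0) : d(n, v) {}
            double operator[](std::size_t i) const { return d[i]; }
            std::size_t size() const { return d.size(); }
            template <class E> Vec& operator=(const Expr<E>& e) {
                for (std::size_t i = 0; i < size(); ++i) d[i] = e.self()[i];
                return *this;                       // one fused loop, no temps
            }
        };

        template <class L, class R>
        Sum<L, R> operator+(const Expr<L>& a, const Expr<R>& b) {
            return Sum<L, R>(a.self(), b.self());
        }

        int main() {
            Vec a(4, 1.0), b(4, 2.0), c(4, 3.0), out(4);
            out = a + b + c;              // no intermediate vectors allocated
            std::cout << out[0] << '\n';  // prints 6
        }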

  18. Guide to the development of scalar massively parallel programming on Paragon

    Energy Technology Data Exchange (ETDEWEB)

    Ueshima, Yutaka; Arakawa, Takuya; Sasaki, Akira [Japan Atomic Energy Research Inst., Neyagawa, Osaka (Japan). Kansai Research Establishment; Yokota, Hisasi

    1998-10-01

    Parallel calculation using more than a hundred computers began in Japan only several years ago. The Intel Paragon XP/S 15GP256 and 75MP834 were introduced as pioneers at the Japan Atomic Energy Research Institute (JAERI) to pursue massively parallel simulations for advanced photon and fusion research. Recently, a large number of parallel programs have been ported or newly written to perform parallel calculations on these computers. However, these programs were developed with software technologies intended for conventional supercomputers, and they therefore sometimes cause trouble in massively parallel computing. In principle, when programs are developed for a different computer and operating system (OS), careful guidance and knowledge are needed. However, integrating such knowledge and standardizing the environment are quite difficult, because the number of Paragon systems, and of Paragon users, in Japan is very small. Therefore, we have summarized the information gained in the process of developing a massively parallel program on the Paragon XP/S 75MP834. (author)

  19. Design and performance characterization of electronic structure calculations on massively parallel supercomputers

    DEFF Research Database (Denmark)

    Romero, N. A.; Glinsvad, Christian; Larsen, Ask Hjorth

    2013-01-01

    Density functional theory (DFT) is the most widely employed electronic structure method because of its favorable scaling with system size and accuracy for a broad range of molecular and condensed-phase systems. The advent of massively parallel supercomputers has enhanced the scientific community's ...

  20. Implementation of whole genome massively parallel sequencing for noninvasive prenatal testing in laboratories

    NARCIS (Netherlands)

    Thung, G.W.D.T.; Beulen, L.; Hehir-Kwa, J.Y.; Faas, B.H.W.

    2015-01-01

    Noninvasive prenatal testing (NIPT) for fetal aneuploidies using cell-free fetal DNA in maternal plasma has revolutionized the field of prenatal care and methods using massively parallel sequencing are now being implemented almost worldwide. Substantial progress has been made from initially testing

  1. Detection of reverse transcriptase termination sites using cDNA ligation and massive parallel sequencing

    DEFF Research Database (Denmark)

    Kielpinski, Lukasz J; Boyd, Mette; Sandelin, Albin

    2013-01-01

    of these methods can be increased by applying massive parallel sequencing technologies.Here, we describe a versatile method for detection of reverse transcriptase termination sites based on ligation of an adapter to the 3' end of cDNA with bacteriophage TS2126 RNA ligase (CircLigase™). In the following PCR...

  2. Reduced complexity and latency for a massive MIMO system using a parallel detection algorithm

    Directory of Open Access Journals (Sweden)

    Shoichi Higuchi

    2017-09-01

    Full Text Available In recent years, massive MIMO systems have been widely researched to realize high-speed data transmission. Since massive MIMO systems use a large number of antennas, they require enormous computational complexity to detect the signal. In this paper, we propose a novel detection method for massive MIMO using parallel detection with maximum likelihood detection with QR decomposition and the M-algorithm (QRM-MLD) to reduce the complexity and latency. The proposed scheme obtains an R matrix after permutation of the H matrix and QR decomposition. Entries of the R matrix are then eliminated using a Gauss–Jordan elimination method. By using the modified R matrix, the proposed method can detect the transmitted signal using parallel detection. Simulation results show that the proposed scheme can achieve reduced complexity and latency with only a little degradation of the bit error rate (BER) performance compared with the conventional method.

  3. ARTS - adaptive runtime system for massively parallel systems. Final report [ARTS - optimal execution support for complex applications on massively parallel systems. Subproject: parallel fluid mechanics. Final report]

    Energy Technology Data Exchange (ETDEWEB)

    Gentzsch, W.; Ferstl, F.; Paap, H.G.; Riedel, E.

    1998-03-20

    In the ARTS project, system software has been developed to support smog and fluid dynamics applications on massively parallel systems. The aim is to implement and test specific software structures within an adaptive run-time system that separate the parallel core algorithms of the applications from the platform-independent runtime aspects. Only slight modifications to existing Fortran and C code are necessary to integrate the application code into the new object-oriented parallel integrated ARTS framework. The OO design offers easy control, re-use and adaptation of the system services, resulting in a dramatic decrease in application development time and in better future maintainability of the application software. (orig.) [Translated from the German:] In the ARTS project, base software for supporting applications from the fields of smog analysis and fluid mechanics on massively parallel systems is developed and optimized. The focus is on testing suitable structures for placing system-level functionality in a runtime environment, thereby separating the parallel core algorithms of the application programs from the platform-independent runtime aspects. The applications comprise conventionally structured Fortran code, which must remain usable with minimal changes, as well as C code with an object-based design, which can exploit the full functionality of the ARTS platform. An object-oriented design permits simple control, reuse and adaptation of the basic services provided by the system. This results in clearly reduced development and runtime costs for the application. ARTS creates an integrating platform that combines modern technologies from the field of object-oriented runtime systems with practice-relevant requirements from the field of scientific high-performance computing. (orig.)

  4. Three-dimensional electromagnetic modeling and inversion on massively parallel computers

    Energy Technology Data Exchange (ETDEWEB)

    Newman, G.A.; Alumbaugh, D.L. [Sandia National Labs., Albuquerque, NM (United States). Geophysics Dept.

    1996-03-01

    This report has demonstrated techniques that can be used to construct solutions to the 3-D electromagnetic inverse problem using full wave equation modeling. To this point great progress has been made in developing an inverse solution using the method of conjugate gradients, which employs a 3-D finite difference solver to construct model sensitivities and predicted data. The forward modeling code has been developed to incorporate absorbing boundary conditions for high frequency solutions (radar), as well as complex electrical properties, including electrical conductivity, dielectric permittivity and magnetic permeability. In addition, both forward and inverse codes have been ported to a massively parallel computer architecture, which allows for more realistic solutions than can be achieved with serial machines. While the inversion code has been demonstrated on field data collected at the Richmond field site, techniques for appraising the quality of the reconstructions still need to be developed. Here it is suggested that, rather than employing direct matrix inversion to construct the model covariance matrix, which would be impossible because of the size of the problem, one can linearize about the 3-D model achieved in the inverse and use Monte-Carlo simulations to construct it. Using these appraisal and construction tools, it is now necessary to demonstrate 3-D inversion for a variety of EM data sets that span the frequency range from induction sounding to radar: below 100 kHz to 100 MHz. Appraised 3-D images of the earth's electrical properties can provide researchers opportunities to infer the flow paths, flow rates and perhaps the chemistry of fluids in geologic media. It also offers a means to study the frequency-dependent behavior of the properties in situ. This is of significant relevance to the Department of Energy, being paramount to the characterization and monitoring of environmental waste sites and to oil and gas exploration.

  5. Cholla : A New Massively-Parallel Hydrodynamics Code For Astrophysical Simulation

    CERN Document Server

    Schneider, Evan E

    2014-01-01

    We present Cholla (Computational Hydrodynamics On ParaLLel Architectures), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind (CTU) algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Cholla performs all hydrodynamical calculations in a massively-parallel manner, using GPUs to evolve the fluid properties of thousands of cells simultaneously while leaving the power of central processing units (CPUs) available for modeling additional physics. On current hardware, Cholla can update more than ten million cells per GPU-second while using an exact Riemann solver and PPM reconstruction with the CTU algorithm. Owing to the massively-parallel architecture of GPUs and the design of the Cholla ...

  6. Implementation of Shifted Periodic Boundary Conditions in the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) Software

    Science.gov (United States)

    2015-08-01

    Implementation of Shifted Periodic Boundary Conditions in the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) Software, by N Scott Weingarten and James P Larentzos, US Army Research Laboratory, report no. ...0687, AUG 2015. Approved for public release.

  7. CODE BLUE: Three dimensional massively-parallel simulation of multi-scale configurations

    Science.gov (United States)

    Juric, Damir; Kahouadji, Lyes; Chergui, Jalel; Shin, Seungwon; Craster, Richard; Matar, Omar

    2016-11-01

    We present recent progress on BLUE, a solver for massively parallel simulations of fully three-dimensional multiphase flows which runs on a variety of computer architectures from laptops to supercomputers and on 131072 threads or more (limited only by the availability to us of more threads). The code is wholly written in Fortran 2003 and uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid Front Tracking/Level Set method designed to handle highly deforming interfaces with complex topology changes. We developed parallel GMRES and multigrid iterative solvers suited to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. Particular attention is drawn to the details and performance of the parallel Multigrid solver. EPSRC UK Programme Grant MEMPHIS (EP/K003976/1).

  8. Massive Machine-Type Communication (mMTC) Access with Integrated Authentication

    DEFF Research Database (Denmark)

    Pratas, Nuno; Pattathil, Sarath; Stefanovic, Cedomir

    2017-01-01

    We present a connection establishment protocol with integrated authentication, suited for Massive Machine-Type Communications (mMTC). The protocol is contention-based and its main feature is that a device contends with a unique signature that also enables the authentication of the device towards...

  9. Porting Gravitational Wave Signal Extraction to Parallel Virtual Machine (PVM)

    Science.gov (United States)

    Thirumalainambi, Rajkumar; Thompson, David E.; Redmon, Jeffery

    2009-01-01

    Laser Interferometer Space Antenna (LISA) is a planned NASA-ESA mission to be launched around 2012. Gravitational wave detection is fundamentally the determination of frequency, source parameters, and waveform amplitude, derived in a specific order from the interferometric time-series of the rotating LISA spacecraft. The LISA Science Team has developed a Mock LISA Data Challenge intended to promote the testing of complicated nested search algorithms to detect the 100-1 millihertz frequency signals at amplitudes of 10E-21. However, it has become clear that sequential search of the parameters is very time consuming and ultra-sensitive; hence, a new strategy has been developed. Parallelization of existing sequential search algorithms for gravitational wave signal identification consists of decomposing sequential search loops, beginning with the outermost loops and working inward. In this process, the main challenge is to detect interdependencies among loops and to partition the loops so as to preserve concurrency. Existing parallel programs are based upon either shared-memory or distributed-memory paradigms. In PVM, master and node programs are used to execute parallelization and process spawning. PVM can handle process management and process addressing schemes using a virtual machine configuration. Task scheduling, messaging and signaling can be implemented efficiently for the LISA gravitational wave search process using a master and 6 nodes. This approach is accomplished using a server at NASA Ames Research Center that has been dedicated to the LISA Data Challenge Competition. Historically, extraction of gravitational wave and source identification parameters has taken around 7 days on this dedicated single-threaded Linux-based server. Using the PVM approach, the parameter extraction problem can be reduced to within a day. The low-frequency computation and a proxy signal-to-noise ratio are calculated in separate nodes that are controlled by the master

  10. Determination of the workspace of a new coordinate-measuring machine using parallel-link mechanism

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Presents the detailed algorithm established for determining the workspace of a 3-DOF coordinate-measuring machine using a parallel-link mechanism, by first constructing the inverse kinematic model and then reviewing the physical and kinematical constraints arising from the structural characteristics of the parallel-link mechanism, and discusses the actual geometry of the workspace and the factors affecting it through computer simulation, thereby providing the necessary theoretical basis for the research and development of coordinate-measuring machines using parallel-link mechanisms.

  11. A Solver for Massively Parallel Direct Numerical Simulation of Three-Dimensional Multiphase Flows

    CERN Document Server

    Shin, S; Juric, D

    2014-01-01

    We present a new solver for massively parallel simulations of fully three-dimensional multiphase flows. The solver runs on a variety of computer architectures from laptops to supercomputers and on 65536 threads or more (limited only by the availability to us of more threads). The code is wholly written by the authors in Fortran 2003 and uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of the LCRM hybrid Front Tracking/Level Set method designed to handle highly deforming interfaces with complex topology changes. We discuss the implementation of this interface method and its particular suitability to distributed processing where all operations are carried out locally on distributed subdomains. We have developed parallel GMRES and Multigrid iterative solvers suited to the linear systems arising from the implicit solution of the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across flu...

  12. A highly scalable massively parallel fast marching method for the Eikonal equation

    CERN Document Server

    Yang, Jianming

    2015-01-01

    In this study, we present a highly scalable massively parallel implementation of the fast marching method using a domain decomposition approach. Central to this algorithm is a novel restarted narrow band approach that coordinates the frequency of communications and the amount of computations extra to a sequential run for achieving an unprecedented parallel performance. Within each restart, the narrow band fast marching method is executed; simple synchronous local exchanges and global reductions are adopted for communicating updated data in the overlapping regions between neighboring subdomains and getting the latest front status, respectively. The independence of front characteristics is exploited through special data structures and augmented status tags to extract the masked parallelism within the fast marching method. The efficiency, flexibility, and applicability of the parallel algorithm are demonstrated through several examples. These problems are extensively tested on grids with up to 1 billion points u...

  13. A Massive Data Parallel Computational Framework for Petascale/Exascale Hybrid Computer Systems

    CERN Document Server

    Blazewicz, Marek; Diener, Peter; Koppelman, David M; Kurowski, Krzysztof; Löffler, Frank; Schnetter, Erik; Tao, Jian

    2012-01-01

    Heterogeneous systems are becoming more common on High Performance Computing (HPC) systems. Even using tools like CUDA and OpenCL it is a non-trivial task to obtain optimal performance on the GPU. Approaches to simplifying this task include Merge (a library based framework for heterogeneous multi-core systems), Zippy (a framework for parallel execution of codes on multiple GPUs), BSGP (a new programming language for general purpose computation on the GPU) and CUDA-lite (an enhancement to CUDA that transforms code based on annotations). In addition, efforts are underway to improve compiler tools for automatic parallelization and optimization of affine loop nests for GPUs and for automatic translation of OpenMP parallelized codes to CUDA. In this paper we present an alternative approach: a new computational framework for the development of massively data-parallel scientific applications suitable for use on such petascale/exascale hybrid systems, built upon the highly scalable Cactus framework. As the first...

  14. A highly scalable massively parallel fast marching method for the Eikonal equation

    Science.gov (United States)

    Yang, Jianming; Stern, Frederick

    2017-03-01

    The fast marching method is a widely used numerical method for solving the Eikonal equation arising from a variety of scientific and engineering fields. It has long been deemed inherently sequential, and an efficient parallel algorithm applicable to large-scale practical applications has not been available in the literature. In this study, we present a highly scalable massively parallel implementation of the fast marching method using a domain decomposition approach. Central to this algorithm is a novel restarted narrow band approach that coordinates the frequency of communications and the amount of computations extra to a sequential run for achieving an unprecedented parallel performance. Within each restart, the narrow band fast marching method is executed; simple synchronous local exchanges and global reductions are adopted for communicating updated data in the overlapping regions between neighboring subdomains and getting the latest front status, respectively. The independence of front characteristics is exploited through special data structures and augmented status tags to extract the masked parallelism within the fast marching method. The efficiency, flexibility, and applicability of the parallel algorithm are demonstrated through several examples. These problems are extensively tested on six grids with up to 1 billion points using different numbers of processes ranging from 1 to 65536. Remarkable parallel speedups are achieved using tens of thousands of processes. Detailed pseudo-codes for both the sequential and parallel algorithms are provided to illustrate the simplicity of the parallel implementation and its similarity to the sequential narrow band fast marching algorithm.
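    For reference, the sequential narrow band algorithm that the parallel restarted variant wraps can be written compactly. The following is a minimal Python sketch (first-order upwind updates, unit grid spacing; not the authors' code) of narrow band fast marching on a 2D grid:

      import heapq
      import numpy as np

      def fast_marching(speed, sources):
          """Sequential narrow-band fast marching on a 2D grid (sketch).
          speed: array of positive propagation speeds; sources: seed cells."""
          ny, nx = speed.shape
          T = np.full((ny, nx), np.inf)
          accepted = np.zeros((ny, nx), dtype=bool)
          band = []                            # min-heap = the narrow band
          for (i, j) in sources:
              T[i, j] = 0.0
              heapq.heappush(band, (0.0, i, j))
          while band:
              t, i, j = heapq.heappop(band)
              if accepted[i, j]:
                  continue                     # stale heap entry
              accepted[i, j] = True
              for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                  ni, nj = i + di, j + dj
                  if 0 <= ni < ny and 0 <= nj < nx and not accepted[ni, nj]:
                      # First-order upwind update of the Eikonal equation
                      tx = min(T[ni, nj - 1] if nj > 0 else np.inf,
                               T[ni, nj + 1] if nj < nx - 1 else np.inf)
                      ty = min(T[ni - 1, nj] if ni > 0 else np.inf,
                               T[ni + 1, nj] if ni < ny - 1 else np.inf)
                      a, b = sorted((tx, ty))
                      h = 1.0 / speed[ni, nj]
                      if b - a >= h:           # one-sided update
                          t_new = a + h
                      else:                    # two-sided quadratic update
                          t_new = 0.5 * (a + b + np.sqrt(2 * h * h - (a - b) ** 2))
                      if t_new < T[ni, nj]:
                          T[ni, nj] = t_new
                          heapq.heappush(band, (t_new, ni, nj))
          return T

      T = fast_marching(np.ones((64, 64)), [(0, 0)])
      print(T[-1, -1])   # roughly the Euclidean distance to the far corner

    The paper's parallel algorithm runs this loop independently on each subdomain, then restarts it after exchanging updated ghost values along subdomain overlaps until a global reduction reports no further changes.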

  15. Analysis of gallium arsenide deposition in a horizontal chemical vapor deposition reactor using massively parallel computations

    Energy Technology Data Exchange (ETDEWEB)

    Salinger, A.G.; Shadid, J.N.; Hutchinson, S.A. [and others]

    1998-01-01

    A numerical analysis of the deposition of gallium from trimethylgallium (TMG) and arsine in a horizontal CVD reactor with tilted susceptor and a three inch diameter rotating substrate is performed. The three-dimensional model includes complete coupling between fluid mechanics, heat transfer, and species transport, and is solved using an unstructured finite element discretization on a massively parallel computer. The effects of three operating parameters (the disk rotation rate, inlet TMG fraction, and inlet velocity) and two design parameters (the tilt angle of the reactor base and the reactor width) on the growth rate and uniformity are presented. The nonlinear dependence of the growth rate uniformity on the key operating parameters is discussed in detail. Efficient and robust algorithms for massively parallel reacting flow simulations, as incorporated into our analysis code MPSalsa, make detailed analysis of this complicated system feasible.

  16. Development of Microreactor Array Chip-Based Measurement System for Massively Parallel Analysis of Enzymatic Activity

    Science.gov (United States)

    Hosoi, Yosuke; Akagi, Takanori; Ichiki, Takanori

    Microarray chip technology, such as DNA chips, peptide chips and protein chips, is one of the promising approaches for achieving high-throughput screening (HTS) of biomolecule function: one-to-one indexing between array position and molecular function makes automated information processing feasible, while down-sizing and large-scale integration allow massively parallel sample analysis. Mostly, however, the function that can be evaluated by such microarray chips is limited to the affinity of target molecules. In this paper, we propose a new HTS system for enzymatic activity based on microreactor array chip technology. A prototype of an automated, massively parallel measurement system for fluorometric assay of enzymatic reactions was developed by combining microreactor array chips with a highly sensitive fluorescence microscope. The design strategy of the microreactor array chips and an optical measurement platform for the high-throughput enzyme assay are discussed.

  17. A domain decomposition study of massively parallel computing in compressible gas dynamics

    Energy Technology Data Exchange (ETDEWEB)

    Wong, C.C.; Blottner, F.G.; Payne, J.L. [Sandia National Labs., Albuquerque, NM (United States); Soetrisno, M. [Amtec Engineering, Inc., Bellevue, WA (United States)

    1995-01-01

    The appropriate utilization of massively parallel computers for solving the Navier-Stokes equations is investigated and determined from an engineering perspective. The issues investigated are: (1) Should strip or patch domain decomposition of the spatial mesh be used to reduce computer time? (2) How many computer nodes should be used for a problem with a given sized mesh to reduce computer time? (3) Is the convergence of the Navier-Stokes solution procedure (LU-SGS) adversely influenced by the domain decomposition approach? The results of the paper show that the present Navier-Stokes solution technique has good performance on a massively parallel computer for transient flow problems. For steady-state problems with a large number of mesh cells, the solution procedure will require significant computer time due to an increased number of iterations to achieve a converged solution. There is an optimum number of computer nodes to use for a problem with a given global mesh size.
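    The strip-versus-patch question in the first item can be made concrete with the usual surface-to-volume argument. The helper below (a back-of-envelope sketch, not taken from the paper) counts ghost cells exchanged per step for the two decompositions of an n-by-n mesh:

      import math

      def strip_halo_cells(n, p):
          """Ghost cells exchanged per step for a 1-D (strip) decomposition
          of an n-by-n mesh over p nodes: p - 1 internal cuts, two sides each."""
          return 2 * (p - 1) * n

      def patch_halo_cells(n, p):
          """Same for a 2-D (patch) decomposition on a sqrt(p) x sqrt(p)
          process grid (p assumed to be a perfect square)."""
          q = int(math.isqrt(p))
          assert q * q == p, "patch layout assumes a square process grid"
          return 4 * (q - 1) * n   # (q-1) cuts in each direction, two sides each

      for p in (4, 16, 64):
          print(p, strip_halo_cells(1024, p), patch_halo_cells(1024, p))

    For large process counts the patch layout exchanges roughly 4(sqrt(p)-1)n cells against the strip layout's 2(p-1)n, which is one reason patch decompositions tend to win as p grows; the paper weighs such communication costs against convergence behavior.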

  18. Performance analysis of massively parallel embedded hardware architectures for retinal image processing

    Directory of Open Access Journals (Sweden)

    Osorio Roberto

    2011-01-01

    This paper examines the implementation of a retinal vessel tree extraction technique on different hardware platforms and architectures. Retinal vessel tree extraction is a representative application of those found in the domain of medical image processing. The low signal-to-noise ratio of the images leads to a large number of low-level tasks in order to meet the accuracy requirements. In some applications, this might compromise computing speed. This paper is focused on assessing the performance of a retinal vessel tree extraction method on different hardware platforms. In particular, the retinal vessel tree extraction method is mapped onto a massively parallel SIMD (MP-SIMD) chip, a massively parallel processor array (MPPA) and a field-programmable gate array (FPGA).

  19. Developing a Massively Parallel Forward Projection Radiography Model for Large-Scale Industrial Applications

    Energy Technology Data Exchange (ETDEWEB)

    Bauerle, Matthew [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2014-08-01

    This project utilizes Graphics Processing Units (GPUs) to compute radiograph simulations for arbitrary objects. The generation of radiographs, also known as the forward projection imaging model, is computationally intensive and not widely utilized. The goal of this research is to develop a massively parallel algorithm that can compute forward projections for objects with a trillion voxels (3D pixels). To achieve this end, the data are divided into blocks that can each fit into GPU memory. The forward projected image is also divided into segments to allow for future parallelization and to avoid needless computations.
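    The blocking scheme described above can be illustrated in a few lines. This numpy sketch (hypothetical names; the actual code runs the per-block work as GPU kernels) accumulates a parallel-beam forward projection by streaming z-slabs whose voxel counts stay under a memory budget:

      import numpy as np

      def forward_project_blocked(volume, block_voxels=2**24):
          """Accumulate a parallel-beam forward projection (sum along z)
          block by block, so each block could fit in GPU memory (sketch)."""
          nz, ny, nx = volume.shape
          image = np.zeros((ny, nx), dtype=np.float64)
          # choose a z-slab thickness whose voxel count stays under the budget
          slab = max(1, block_voxels // (ny * nx))
          for z0 in range(0, nz, slab):
              block = volume[z0:z0 + slab]      # this slab is what a GPU kernel
              image += block.sum(axis=0)        # would process in one launch
          return image

      vol = np.random.rand(64, 128, 128)
      print(forward_project_blocked(vol).shape)  # (128, 128)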

  20. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes

    Directory of Open Access Journals (Sweden)

    Cronn Richard

    2009-12-01

    Background: Molecular evolutionary studies share the common goal of elucidating historical relationships, and the common challenge of adequately sampling taxa and characters. Particularly at low taxonomic levels, recent divergence, rapid radiations, and conservative genome evolution yield limited sequence variation, and dense taxon sampling is often desirable. Recent advances in massively parallel sequencing make it possible to rapidly obtain large amounts of sequence data, and multiplexing makes extensive sampling of megabase sequences feasible. Is it possible to efficiently apply massively parallel sequencing to increase phylogenetic resolution at low taxonomic levels? Results: We reconstruct the infrageneric phylogeny of Pinus from 37 nearly complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. 30 of 33 ingroup nodes resolved with ≥ 95% bootstrap support; this is a substantial improvement relative to prior studies, and shows that massively parallel sequencing-based strategies can produce sufficient high-quality sequence to reach support levels originally proposed for the phylogenetic bootstrap. Resampling simulations show that at least the entire plastome is necessary to fully resolve Pinus, particularly in rapidly radiating clades. Meta-analysis of 99 published infrageneric phylogenies shows that whole-plastome analysis should provide similar gains across a range of plant genera. A disproportionate amount of phylogenetic information resides in two loci (ycf1, ycf2), highlighting their unusual evolutionary properties. Conclusion: Plastome sequencing is now an efficient option for increasing phylogenetic resolution at lower taxonomic levels in plant phylogenetic and population genetic analyses. With continuing improvements in sequencing capacity, the strategies described here should revolutionize efforts requiring dense taxon and character sampling.

  1. Parallel contributing area calculation with granularity control on massive grid terrain datasets

    Science.gov (United States)

    Jiang, Ling; Tang, Guoan; Liu, Xuejun; Song, Xiaodong; Yang, Jianyi; Liu, Kai

    2013-10-01

    The calculation of contributing areas from digital elevation models (DEMs) is one of the important tasks in digital terrain analysis (DTA). The computational process usually involves two steps in a real application: (1) calculating flow directions via a flow model, and (2) computing the contributing area for each grid cell in the DEM. The traditional algorithm for calculating contributing areas is coded as a sequential program executed on a single processor. With the increase in scope and resolution of DEMs, the serial algorithm has become increasingly difficult to run and is often very time-consuming, especially for DEMs of large areas and fine scales. In recent years, parallel computing has become able to meet this challenge. However, parallel implementation with granularity control, an efficient strategy for reaping the best parallel performance and for breaking the limitation of computing resources in processing massive grid terrain datasets, has not appeared in the DTA research field. This paper develops a message-passing-interface (MPI) parallel approach with granularity control to calculate contributing areas. Following the proposed parallelization strategy, a parallel D8 algorithm with granularity control is designed, as well as a parallel AreaD8 algorithm. Based on the domain decomposition of the DEM data, each process can handle multiple partitions decomposed under a given grain size. Through an iterative procedure of reading source data, executing the operator and writing resulting data, each process computes its partitions one by one. The experimental results on a multi-node cluster show that the proposed parallel algorithms with granularity control are powerful tools for processing big datasets; the parallel D8 algorithm is insensitive to granularity, while the parallel AreaD8 algorithm has an optimal grain size that reaps the best parallel performance.
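    The iterative read-compute-write cycle over multiple partitions per process can be sketched with mpi4py. The helper names below are hypothetical stand-ins for the paper's D8/AreaD8 operators:

      # Sketch of granularity-controlled processing with mpi4py
      # (hypothetical helpers; run with e.g. `mpiexec -n 8 python aread8.py`).
      from mpi4py import MPI

      def process_partitions(n_partitions, read_part, compute_part, write_part):
          comm = MPI.COMM_WORLD
          rank, size = comm.Get_rank(), comm.Get_size()
          # Round-robin assignment: each process owns several partitions,
          # streamed through one by one to bound peak memory use.
          for p in range(rank, n_partitions, size):
              dem_tile = read_part(p)          # read one source DEM partition
              result = compute_part(dem_tile)  # e.g. D8 flow directions
              write_part(p, result)            # write the resulting tile
          comm.Barrier()                       # all tiles done before next stage

      if __name__ == "__main__":
          process_partitions(16,
                             read_part=lambda p: p,         # stand-in tile
                             compute_part=lambda t: t * 2,  # stand-in operator
                             write_part=lambda p, r: print(MPI.COMM_WORLD.Get_rank(), p, r))

    The grain size in the paper controls how many such partitions exist relative to the process count; the skeleton shows why peak memory stays bounded regardless of that choice.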

  2. 3-PRS serial-parallel machine tool error calibration and parameter identification

    Institute of Scientific and Technical Information of China (English)

    ZHAO Jun-wei; DAI Jun; HUANG Jun-jie

    2009-01-01

    The 3-PRS serial-parallel machine tool consists of a 3-degree-of-freedom (DOF) implementation platform and a 2-DOF X-Y platform. Error modeling and parameter identification methods were derived for this machine tool. The mechanisms of error analysis, error modeling, identification of error parameters, and the measurement equipment used to measure mechanism errors were investigated. In order to achieve geometric parameter calibration and error compensation of the serial-parallel machine tool, the nominal structural parameters in the controller were adjusted by identifying the actual structure of the machine tool. With the establishment of a vector-space dimension chain, error analysis, error modeling, error measurement and error compensation can be carried out.

  3. Using CLIPS in the domain of knowledge-based massively parallel programming

    Science.gov (United States)

    Dvorak, Jiri J.

    1994-01-01

    The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency with respect to parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about the application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS within the C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering is discussed.

  4. Parallel Machine Scheduling (PMS) in Manufacturing Systems Using the Ant Colonies Optimization Algorithmic Rule

    Science.gov (United States)

    Senthiil, P. V.; Selladurai, V.; Rajesh, R.

    This study introduces a new approach for decentralized scheduling in a parallel machine environment based on the ant colony optimization algorithm. The algorithm extends the travelling salesman formulation used for scheduling on a single machine to the multiple-machine problem. The results are presented using simple, illustrative examples and show that the algorithm is able to optimize the different scheduling problems. Using the same parameters, the completion time of the tasks is minimized and the processing time of the parallel machines is balanced.
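    The core loop shape of such an ant colony scheduler is easy to sketch. The toy below is a minimal sketch, not the authors' algorithm: it ignores setup times, treats the machines as identical, and uses current machine load as the heuristic that biases the pheromone-guided choice:

      import random

      def aco_parallel_machines(jobs, n_machines, n_ants=20, n_iters=50,
                                rho=0.1, seed=0):
          """Toy ant-colony search assigning jobs to identical machines
          to reduce makespan (a sketch, not the paper's exact method)."""
          rng = random.Random(seed)
          n = len(jobs)
          tau = [[1.0] * n_machines for _ in range(n)]  # pheromone[job][machine]
          best_assign, best_mk = None, float("inf")
          for _ in range(n_iters):
              for _ in range(n_ants):
                  load = [0.0] * n_machines
                  assign = []
                  for j in range(n):
                      # Probability ~ pheromone * heuristic (prefer light machines)
                      w = [tau[j][m] / (1.0 + load[m]) for m in range(n_machines)]
                      m = rng.choices(range(n_machines), weights=w)[0]
                      assign.append(m)
                      load[m] += jobs[j]
                  mk = max(load)
                  if mk < best_mk:
                      best_assign, best_mk = assign, mk
              # Evaporate, then reinforce the best-so-far assignment
              for j in range(n):
                  for m in range(n_machines):
                      tau[j][m] *= (1.0 - rho)
              for j, m in enumerate(best_assign):
                  tau[j][m] += 1.0 / best_mk
          return best_assign, best_mk

      print(aco_parallel_machines([3, 7, 2, 8, 5, 4, 6, 1], 3))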

  5. Massively Parallel Computing at the Large Hadron Collider up to the HL-LHC

    CERN Document Server

    Halyo, Valerie

    2015-01-01

    As the Large Hadron Collider (LHC) continues its upward progression in energy and luminosity towards the planned High-Luminosity LHC (HL-LHC) in 2025, the challenges of the experiments in processing increasingly complex events will also continue to increase. Improvements in computing technologies and algorithms will be a key part of the advances necessary to meet this challenge. Parallel computing techniques, especially those using massively parallel computing (MPC), promise to be a significant part of this effort. In these proceedings, we discuss these algorithms in the specific context of a particularly important problem: the reconstruction of charged particle tracks in the trigger algorithms in an experiment, in which high computing performance is critical for executing the track reconstruction in the available time. We discuss some areas where parallel computing has already shown benefits to the LHC experiments, and also demonstrate how a MPC-based trigger at the CMS experiment could not only improve perf...

  6. Massively parallel-in-space-time, adaptive finite element framework for non-linear parabolic equations

    CERN Document Server

    Dyja, Robert; van der Zee, Kristoffer G

    2016-01-01

    We present an adaptive methodology for the solution of (linear and) non-linear time dependent problems that is especially tailored for massively parallel computations. The basic concept is to solve for large blocks of space-time unknowns instead of marching sequentially in time. The methodology is a combination of a computationally efficient implementation of a parallel-in-space-time finite element solver coupled with a posteriori space-time error estimates and a parallel mesh generator. This methodology enables, in principle, simultaneous adaptivity in both space and time (within the block) domains. We explore this basic concept in the context of a variety of time-steppers including $\Theta$-schemes and Backward Differentiation Formulas. We specifically illustrate this framework with applications involving time dependent linear, quasi-linear and semi-linear diffusion equations. We focus on investigating how the coupled space-time refinement indicators for this class of problems affect spatial adaptivity. Final...

  7. Fast structural design and analysis via hybrid domain decomposition on massively parallel processors

    Science.gov (United States)

    Farhat, Charbel

    1993-01-01

    A hybrid domain decomposition framework for static, transient and eigen finite element analyses of structural mechanics problems is presented. Its basic ingredients include physical substructuring and /or automatic mesh partitioning, mapping algorithms, 'gluing' approximations for fast design modifications and evaluations, and fast direct and preconditioned iterative solvers for local and interface subproblems. The overall methodology is illustrated with the structural design of a solar viewing payload that is scheduled to fly in March 1993. This payload has been entirely designed and validated by a group of undergraduate students at the University of Colorado using the proposed hybrid domain decomposition approach on a massively parallel processor. Performance results are reported on the CRAY Y-MP/8 and the iPSC-860/64 Touchstone systems, which represent both extreme parallel architectures. The hybrid domain decomposition methodology is shown to outperform leading solution algorithms and to exhibit an excellent parallel scalability.

  8. III - Template Metaprogramming for massively parallel scientific computing - Templates for Iteration; Thread-level Parallelism

    CERN Document Server

    CERN. Geneva

    2016-01-01

    Large scale scientific computing raises questions on different levels, ranging from the formulation of the problems to the choice of the best algorithms and their implementation for a specific platform. There are similarities in these different topics that can be exploited by modern-style C++ template metaprogramming techniques to produce readable, maintainable and generic code. Traditional low-level code tends to be fast but platform-dependent, and it obfuscates the meaning of the algorithm. On the other hand, the object-oriented approach is nice to read, but may come with an inherent performance penalty. These lectures aim to present the basics of the Expression Template (ET) idiom, which allows us to keep the object-oriented approach without sacrificing performance. We will in particular show how to enhance ET to include SIMD vectorization. We will then introduce techniques for abstracting iteration, and introduce thread-level parallelism for use in heavy data-centric loads. We will show how to apply these methods i...

  9. A generic, hierarchical framework for massively parallel Wang-Landau sampling

    CERN Document Server

    Vogel, Thomas; Wüst, Thomas; Landau, David P

    2013-01-01

    We introduce a parallel Wang-Landau method based on the replica-exchange framework for Monte Carlo simulations. To demonstrate its advantages and general applicability for simulations of complex systems, we apply it to different spin models including spin glasses, the Ising model and the Potts model, lattice protein adsorption, and the self-assembly process in amphiphilic solutions. Without loss of accuracy, the method gives significant speed-up and potentially scales up to petaflop machines.
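    The serial kernel that the replica-exchange framework parallelizes is standard Wang-Landau sampling. A minimal single-walker sketch for a small 2D Ising model (not the paper's code; the parallel framework splits the energy range across many walkers and exchanges configurations between them) might look like:

      import math, random
      import numpy as np

      def wang_landau_ising(L=8, flatness=0.8, ln_f_min=1e-3, seed=1):
          """Single-walker Wang-Landau density-of-states estimate for the
          2D Ising model (the serial kernel the parallel framework wraps)."""
          rng = random.Random(seed)
          spins = np.ones((L, L), dtype=int)
          # Total energy with periodic boundaries, each bond counted once
          E = -int(np.sum(spins * (np.roll(spins, 1, 0) + np.roll(spins, 1, 1))))
          log_g, hist, ln_f = {}, {}, 1.0
          while ln_f > ln_f_min:
              for _ in range(10000):
                  i, j = rng.randrange(L), rng.randrange(L)
                  nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j] +
                        spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
                  E_new = E + 2 * spins[i, j] * nb   # local energy change
                  # Accept with probability min(1, g(E)/g(E_new))
                  d = log_g.get(E, 0.0) - log_g.get(E_new, 0.0)
                  if d >= 0 or rng.random() < math.exp(d):
                      spins[i, j] *= -1
                      E = E_new
                  log_g[E] = log_g.get(E, 0.0) + ln_f
                  hist[E] = hist.get(E, 0) + 1
              mean_h = sum(hist.values()) / len(hist)
              if min(hist.values()) > flatness * mean_h:  # histogram flat enough
                  hist = {}
                  ln_f *= 0.5                # refine the modification factor
          return log_g

      lng = wang_landau_ising()
      print(len(lng), "energy levels visited")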

  10. Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

    Science.gov (United States)

    Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

    2017-08-01

    We present an application of massively parallel processing of quantitative flow measurement data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150-fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements.
    Catalogue identifier: AFBT_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GNU GPLv3
    No. of lines in distributed program, including test data, etc.: 913552
    No. of bytes in distributed program, including test data, etc.: 270876249
    Distribution format: tar.gz
    Programming language: CUDA/C, MATLAB
    Computer: Intel x64 CPU, GPU supporting CUDA technology
    Operating system: 64-bit Windows 7 Professional
    Has the code been vectorized or parallelized?: Yes, the CPU code has been vectorized in MATLAB and the CUDA code has been parallelized
    RAM: Dependent on the user's parameters, typically between several gigabytes and several tens of gigabytes
    Classification: 6.5, 18
    Nature of problem: Speed-up of data processing in optical coherence microscopy
    Solution method: Utilization of GPUs for massively parallel data processing
    Additional comments: Compiled DLL library with source code and documentation; example of utilization (MATLAB script with raw data)
    Running time: 1.8 s for one B-scan (150× faster in comparison to the CPU

  11. Evaluating automatically parallelized versions of the support vector machine

    NARCIS (Netherlands)

    Codreanu, Valeriu; Droge, Bob; Williams, David; Yasar, Burhan; Yang, Fo; Liu, Baoquan; Dong, Feng; Surinta, Olarik; Schomaker, Lambertus; Roerdink, Jos; Wiering, Marco

    2014-01-01

    The support vector machine (SVM) is a supervised learning algorithm used for recognizing patterns in data. It is a very popular technique in machine learning and has been successfully used in applications such as image classification, protein classification, and handwriting recognition. However, the

  12. Online Scheduling with Delivery Time on a Bounded Parallel Batch Machine with Limited Restart

    National Research Council Canada - National Science Library

    Liu, Hailing; Wan, Long; Yan, Zhigang; Yuan, Jinjiang

    2015-01-01

      We consider the online (over time) scheduling of equal length jobs on a bounded parallel batch machine with batch capacity b to minimize the time by which all jobs have been delivered with limited restart...

  13. A general purpose subroutine for fast fourier transform on a distributed memory parallel machine

    Science.gov (United States)

    Dubey, A.; Zubair, M.; Grosch, C. E.

    1992-01-01

    One issue which is central in developing a general purpose Fast Fourier Transform (FFT) subroutine on a distributed memory parallel machine is the data distribution. It is possible that different users would like to use the FFT routine with different data distributions. Thus, there is a need to design FFT schemes on distributed memory parallel machines which can support a variety of data distributions. An FFT implementation on a distributed memory parallel machine which works for a number of data distributions commonly encountered in scientific applications is presented. The problem of rearranging the data after computing the FFT is also addressed. The performance of the implementation on a distributed memory parallel machine Intel iPSC/860 is evaluated.
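    The data-distribution issue the authors address shows up already in the classic transpose-based 2D FFT. The numpy sketch below simulates a block-row distribution over p "processors" within a single process; the column-chunk regrouping in the middle is exactly the step an all-to-all exchange (e.g. MPI_Alltoall) would perform on a distributed-memory machine:

      import numpy as np

      def distributed_fft2(a, p):
          """2-D FFT computed the way a block-row-distributed code would:
          local FFTs along rows, a (simulated) all-to-all transpose, then
          local FFTs again. `p` plays the role of the process count."""
          n = a.shape[0]
          assert a.shape == (n, n) and n % p == 0
          rows = n // p
          blocks = [a[r * rows:(r + 1) * rows].copy() for r in range(p)]  # owned rows
          blocks = [np.fft.fft(b, axis=1) for b in blocks]                # local FFTs
          # All-to-all transpose: "rank" r sends column-chunk c of its rows to rank c.
          new_blocks = []
          for c in range(p):
              cols = slice(c * rows, (c + 1) * rows)
              new_blocks.append(np.vstack([blocks[r][:, cols] for r in range(p)]).T)
          blocks = [np.fft.fft(b, axis=1) for b in new_blocks]            # local FFTs
          return np.vstack(blocks).T                                      # undo transpose

      x = np.random.rand(8, 8)
      print(np.allclose(distributed_fft2(x, 4), np.fft.fft2(x)))  # True

    Supporting other user-supplied distributions, as the paper does, amounts to inserting the appropriate data rearrangement before and after this core pattern.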

  14. LDRD final report on massively-parallel linear programming : the parPCx system.

    Energy Technology Data Exchange (ETDEWEB)

    Parekh, Ojas (Emory University, Atlanta, GA); Phillips, Cynthia Ann; Boman, Erik Gunnar

    2005-02-01

    This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runs on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Integer and

  15. O( N) tight-binding molecular dynamics on massively parallel computers: an orbital decomposition approach

    Science.gov (United States)

    Canning, A.; Galli, G.; Mauri, F.; De Vita, A.; Car, R.

    1996-04-01

    The implementation of an O( N) tight-binding molecular dynamics code on the Cray T3D parallel computer is discussed. The O( N) energy functional depends on non-orthogonal, localised orbitals and a chemical potential parameter which determines the number of electrons in the system. The localisation introduces a sparse nature to the orbital data and Hamiltonian matrix, greatly changing the coding on parallel machines compared to non-localised systems. The data distribution, communication routines and dynamic load-balancing scheme of the program are presented in detail together with the speed and scaling of the code on various homogeneous and inhomogeneous physical systems. Performance results will be presented for systems of 2048 to 32768 atoms on 32 to 512 processors. We discuss the relevance to quantum molecular dynamics simulations with localised orbitals, of techniques used for programming short-range classical molecular dynamics simulations on parallel machines. The absence of global communications and the localised nature of the orbitals makes these algorithms extremely scalable in terms of memory and speed on parallel systems with fast communications. The main aim of this article is to present in detail all the new concepts and programming techniques that localisation of the orbitals introduces which scientists, coming from a background in non-localised quantum molecular dynamics simulations, may be unfamiliar with.

  16. Research on a Novel Parallel Engraving Machine and its Key Technologies

    OpenAIRE

    Zhang Shi-hui; Kong Ling-fu

    2008-01-01

    In order to compensate for the disadvantages of conventional engraving machines and exploit the advantages of parallel mechanisms, a novel parallel engraving machine is presented and some of its key technologies are studied in this paper. Mechanism performance is first analyzed in terms of the first- and second-order influence coefficient matrices. In this way, mechanism dimensions that are better for all the kinematic and dynamic performance indices can be determined, and the restriction due to con...

  17. A Multiobjective Optimization Approach to Solve a Parallel Machines Scheduling Problem

    OpenAIRE

    2010-01-01

    A multiobjective optimization problem which focuses on parallel machines scheduling is considered. This problem consists of scheduling independent jobs on identical parallel machines with release dates, due dates, and sequence-dependent setup times. The preemption of jobs is forbidden. The aim is to minimize two different objectives: makespan and total tardiness. The contribution of this paper is to propose first a new mathematical model for this specific p...

  18. Visualizing Network Traffic to Understand the Performance of Massively Parallel Simulations

    KAUST Repository

    Landge, A. G.

    2012-12-01

    The performance of massively parallel applications is often heavily impacted by the cost of communication among compute nodes. However, determining how to best use the network is a formidable task, made challenging by the ever increasing size and complexity of modern supercomputers. This paper applies visualization techniques to aid parallel application developers in understanding the network activity by enabling a detailed exploration of the flow of packets through the hardware interconnect. In order to visualize this large and complex data, we employ two linked views of the hardware network. The first is a 2D view that represents the network structure as one of several simplified planar projections. This view is designed to allow a user to easily identify trends and patterns in the network traffic. The second is a 3D view that augments the 2D view by preserving the physical network topology and providing a context that is familiar to the application developers. Using the massively parallel multi-physics code pF3D as a case study, we demonstrate that our tool provides valuable insight that we use to explain and optimize pF3D's performance on an IBM Blue Gene/P system.

  19. On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods.

    Science.gov (United States)

    Lee, Anthony; Yau, Christopher; Giles, Michael B; Doucet, Arnaud; Holmes, Christopher C

    2010-12-01

    We present a case study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups of 35- to 500-fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data-rich domains through the availability of cheap and accessible many-core computation. We believe the speedups we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design.
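    The structure being exploited is embarrassingly parallel: many independent sampling streams whose results are reduced at the end. A CPU-side analogue using multiprocessing (a toy integrand, not one of the paper's examples; on a GPU each stream would map to a thread) looks like:

      import numpy as np
      from multiprocessing import Pool

      def mc_stream(args):
          """One independent Monte Carlo stream; on a GPU each stream
          would map to a thread or block with its own RNG state."""
          seed, n = args
          rng = np.random.default_rng(seed)
          x = rng.normal(size=n)
          return np.mean(np.exp(-x * x))        # toy integrand f(X)

      if __name__ == "__main__":
          n_streams, n_samples = 64, 100000
          with Pool() as pool:
              parts = pool.map(mc_stream, [(s, n_samples) for s in range(n_streams)])
          # E[exp(-X^2)] for X ~ N(0,1) equals 1/sqrt(3) ~ 0.577
          print("estimate:", np.mean(parts))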

  20. Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units

    Energy Technology Data Exchange (ETDEWEB)

    Mburu, Joe Mwangi; Hah, Chang Joo Hah [KEPCO International Nuclear Graduate School, Ulsan (Korea, Republic of)

    2014-05-15

    Within the past decade, research has been done on utilizing massive GPU parallelization in core simulation with impressive results, but unfortunately not much commercial application has been made in the nuclear field, especially in reactor core simulation. The purpose of this paper is to give an introductory concept of the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing on a simple Monte Carlo simulation with very minimal hardware specifications. For a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented for both the CPU and the GPU in order to evaluate the performance gain of each computing device. The heterogeneous platform utilized in this analysis is a slow notebook with only a 1 GHz processor. The end results are quite surprising: the speedups obtained are almost a factor of 10. In this work, we have utilized heterogeneous computing in a GPU-based approach to a potentially arithmetic-intensive calculation. By applying a complex Monte Carlo simulation on a GPU platform, we have sped up the computational process by almost a factor of 10, based on one million neutrons. This shows how easy, cheap and efficient it is to use GPUs to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation, where deterministic and stochastic calculations are well suited to parallelization.

  1. A cost-effective methodology for the design of massively-parallel VLSI functional units

    Science.gov (United States)

    Venkateswaran, N.; Sriram, G.; Desouza, J.

    1993-01-01

    In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique, functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the performance of functional units designed by our method yields promising results.

  2. Automated Parallel Computing Tools for Multicore Machines and Clusters Project

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to improve productivity of high performance computing for applications on multicore computers and clusters. These machines built from one or more chips...

  3. The Utilization of Parallel Corpora for the Extension of Machine ...

    African Journals Online (AJOL)

    ...ety of areas such as lexical knowledge acquisition, grammar construction and machine ... ever linguistic knowledge is used by the system, is derived empirically by examining the real ... The KANT Perspective: A Critique of Pure Transfer.

  4. Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

    Science.gov (United States)

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.

    2012-06-01

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  5. Commodity cluster and hardware-based massively parallel implementations of hyperspectral imaging algorithms

    Science.gov (United States)

    Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David

    2006-05-01

    The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed up the computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques, covering four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure the parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in the selection of parallel hyperspectral algorithms for specific applications.
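    As an example of why these algorithms parallelize well, consider linear spectral unmixing: every pixel solves an independent least-squares problem for its endmember abundances. A hedged numpy sketch (unconstrained unmixing with hypothetical array shapes, not the paper's implementation) of this data-parallel structure:

      import numpy as np

      def unmix(cube, endmembers):
          """Unconstrained linear unmixing: per-pixel least-squares
          abundances for a (rows, cols, bands) cube and a (bands, k)
          endmember matrix. Each pixel is independent -> data parallel."""
          r, c, b = cube.shape
          pixels = cube.reshape(-1, b).T                   # (bands, n_pixels)
          abund, *_ = np.linalg.lstsq(endmembers, pixels, rcond=None)
          return abund.T.reshape(r, c, endmembers.shape[1])

      cube = np.random.rand(4, 5, 10)       # tiny 10-band "image"
      E = np.random.rand(10, 3)             # 3 hypothetical endmembers
      print(unmix(cube, E).shape)           # (4, 5, 3) abundance maps

    On a cluster the pixel set is simply split among nodes; on an FPGA the same per-pixel solve becomes a pipelined kernel.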

  6. L'Orthoglide: a fast machine tool with an isotropic parallel architecture

    CERN Document Server

    Wenger, Philippe; Majou, Félix

    2007-01-01

    This article presents the Orthoglide project. The purpose of this project is the realization of a prototype machine tool with three translational degrees of freedom. The distinguishing feature of this machine is a parallel kinematic architecture optimized to obtain a compact workspace with homogeneous performance. To this end, isotropy was used as the principal design criterion.

  7. A Massive Parallel Variational Multiscale FEM Scheme Applied to Nonhydrostatic Atmospheric Dynamics

    Science.gov (United States)

    Vazquez, Mariano; Marras, Simone; Moragues, Margarida; Jorba, Oriol; Houzeaux, Guillaume; Aubry, Romain

    2010-05-01

    The solution of the fully compressible Euler equations of stratified flows is approached from the point of view of Computational Fluid Dynamics techniques. Specifically, the main aim of this contribution is the introduction of a Variational Multiscale Finite Element (CVMS-FE) approach to solve dry atmospheric dynamics effectively on massively parallel architectures with more than 1000 processors. The conservation form of the equations of motion is discretized in all directions with a Galerkin scheme, with stabilization given by the compressible counterpart of the variational multiscale technique of Hughes [1] and Houzeaux et al. [2]. The justification of this effort is twofold. First, the search for optimal parallelization characteristics and linear scalability trends on petascale machines: the development of a numerical algorithm whose local nature helps keep communication among the processors minimal implies a large leap towards efficient parallel computing. Second, the rising trend towards global models and models of higher spatial resolution naturally suggests the use of adaptive grids that resolve only the zones of larger gradients while keeping the computational mesh properly coarse elsewhere (thus keeping the computational cost low). With these two hypotheses in mind, the finite element scheme presented here is an open option for the development of the next generation of Numerical Weather Prediction (NWP) codes. This methodology is as new in Computational Fluid Dynamics for compressible flows at low Mach number as it is in Numerical Weather Prediction. We mean, however, to show its ability to maintain stability in the solution of thermal, gravity-driven flows in a stratified environment in the specific context of dry atmospheric dynamics. Standard two-dimensional benchmarks are implemented and compared against the reference literature. In the context of thermal and gravity-driven flows in a neutral atmosphere, we present: (1) the density current

  8. Massively parallel DNA sequencing facilitates diagnosis of patients with Usher syndrome type 1.

    Directory of Open Access Journals (Sweden)

    Hidekane Yoshimura

    Usher syndrome is an autosomal recessive disorder manifesting hearing loss, retinitis pigmentosa and vestibular dysfunction, with three clinical subtypes. Usher syndrome type 1 is the most severe subtype due to its profound hearing loss, lack of vestibular responses, and retinitis pigmentosa that appears in prepuberty. Six of the corresponding genes have been identified, making early diagnosis through DNA testing possible, with many immediate and several long-term advantages for patients and their families. However, conventional genetic techniques, such as direct sequence analysis, are both time-consuming and expensive. Targeted exon sequencing of selected genes using massively parallel DNA sequencing technology will potentially enable us to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using this technique combined with direct sequence analysis, we screened 17 unrelated Usher syndrome type 1 patients and detected probable pathogenic variants in 16 of them (94.1%), each carrying at least one mutation. Seven patients had MYO7A mutations (41.2%), the most common type in Japanese patients. Most of the mutations were detected only by massively parallel DNA sequencing. We report here four patients who had probable pathogenic mutations in two different Usher syndrome type 1 genes, and one case of MYO7A/PCDH15 digenic inheritance. This is the first report of Usher syndrome mutation analysis using massively parallel DNA sequencing, and of the frequencies of Usher syndrome type 1 genes in Japanese patients. Mutation screening using this technique has the power to quickly identify mutations in many causative genes while maintaining cost-effectiveness. In addition, the simultaneous mutation analysis of large numbers of genes is useful for detecting mutations in different genes that are possibly disease modifiers or reflect digenic inheritance.

  9. Tumor Genomic Profiling in Breast Cancer Patients Using Targeted Massively Parallel Sequencing

    Science.gov (United States)

    2015-04-30

    Annual report (01-01-2014 to 04-30-2015) for award W81XWH-13-1-0032: Tumor Genomic Profiling in Breast Cancer Patients Using Targeted Massively Parallel Sequencing.
    Cited reference (truncated): McDonald S, Watson M, Dooling DJ, Ota D, Chang LW, Bose R, Ley TJ, Piwnica-Worms D, Stuart JM, Wilson RK, Mardis ER. Whole-genome analysis informs...

  10. A Massively Parallel Solver for the Mechanical Harmonic Analysis of Accelerator Cavities

    Energy Technology Data Exchange (ETDEWEB)

    Kononenko, O. [SLAC National Accelerator Lab., Menlo Park, CA (United States)

    2015-02-17

    ACE3P is a 3D massively parallel simulation suite developed at SLAC National Accelerator Laboratory that can perform coupled electromagnetic, thermal and mechanical studies. By effectively utilizing supercomputer resources, ACE3P has become a key simulation tool for particle accelerator R&D. A new frequency-domain solver to perform mechanical harmonic response analysis of accelerator components has been developed within the existing parallel framework. This solver is designed to determine the frequency response of the mechanical system to external harmonic excitations for time-efficient, accurate analysis of large-scale problems. Coupled with the ACE3P electromagnetic modules, this capability complements a set of multi-physics tools for a comprehensive study of microphonics in superconducting accelerating cavities, in order to understand the RF response and feedback requirements for the operational reliability of a particle accelerator.

  11. Massively Parallel Computation of Soil Surface Roughness Parameters on A Fermi GPU

    Science.gov (United States)

    Li, Xiaojie; Song, Changhe

    2016-06-01

    Surface roughness is a description of the random or irregular component of surface microtopography. The standard deviation of surface heights and the surface correlation length describe the statistical variation of the random component of surface height relative to a reference surface. When the number of data points is large, calculation of surface roughness parameters is time-consuming. With the advent of Graphics Processing Unit (GPU) architectures, inherently parallel problems can be solved effectively using GPUs. In this paper we propose a GPU-based massively parallel computing method for 2D bare-soil surface roughness estimation. This method was applied to data collected by a surface roughness tester based on the laser triangulation principle during a field experiment in April 2012. The total number of data points was 52,040. The computation took 47 seconds on a Fermi GTX 590 GPU, whereas the serial CPU version took 5422 seconds, a significant 115x speedup.
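    The roughness parameters named above are cheap per element but numerous, which is what makes them GPU-friendly. A vectorized sketch (1D profile version, not the paper's CUDA code) of the two statistics:

      import numpy as np

      def roughness_params(h, dx):
          """RMS height s and correlation length l of a 1-D profile h
          sampled at spacing dx (vectorized sketch of the GPU kernels)."""
          z = h - h.mean()
          s = np.sqrt(np.mean(z * z))           # standard deviation of height
          n = z.size
          # Normalized autocorrelation for all lags; each lag is independent,
          # which is the parallelism a massively parallel version exploits.
          acf = np.array([np.mean(z[:n - k] * z[k:]) for k in range(n // 2)])
          acf /= acf[0]
          # Correlation length: first lag where the ACF drops below 1/e
          below = np.nonzero(acf < 1.0 / np.e)[0]
          l = below[0] * dx if below.size else np.nan
          return s, l

      x = np.linspace(0.0, 10.0, 2000)
      h = np.sin(2 * np.pi * x) + 0.1 * np.random.randn(x.size)
      print(roughness_params(h, x[1] - x[0]))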

  12. "Multipoint Force Feedback" Leveling of Massively Parallel Tip Arrays in Scanning Probe Lithography.

    Science.gov (United States)

    Noh, Hanaul; Jung, Goo-Eun; Kim, Sukhyun; Yun, Seong-Hun; Jo, Ahjin; Kahng, Se-Jong; Cho, Nam-Joon; Cho, Sang-Joon

    2015-09-16

    Nanoscale patterning with massively parallel 2D array tips is of significant interest in scanning probe lithography. A challenging task for tip-based large area nanolithography is maintaining parallel tip arrays at the same contact point with a sample substrate in order to pattern a uniform array. Here, polymer pen lithography is demonstrated with a novel leveling method to account for the magnitude and direction of the total applied force of tip arrays by a multipoint force sensing structure integrated into the tip holder. This high-precision approach results in a 0.001° slope of feature edge length variation over 1 cm wide tip arrays. The position sensitive leveling operates in a fully automated manner and is applicable to recently developed scanning probe lithography techniques of various kinds which can enable "desktop nanofabrication." © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. Machine and Collection Abstractions for User-Implemented Data-Parallel Programming

    Directory of Open Access Journals (Sweden)

    Magne Haveraaen

    2000-01-01

    Data parallelism has emerged as a fruitful approach to the parallelisation of compute-intensive programs. Data parallelism has the advantage of mimicking the sequential (and deterministic) structure of programs, as opposed to task parallelism, where the explicit interaction of processes has to be programmed. In data parallelism, data structures, typically collection classes in the form of large arrays, are distributed over the processors of the target parallel machine. Trying to extract distribution aspects from conventional code often runs into problems with a lack of uniformity in the use of the data structures and in the expression of data dependency patterns within the code. Here we propose a framework with two conceptual classes, Machine and Collection. The Machine class abstracts hardware communication and distribution properties. This gives a programmer high-level access to the important parts of the low-level architecture. The Machine class may readily be used in the implementation of a Collection class, giving the programmer full control of the parallel distribution of data, as well as allowing a normal sequential implementation of this class. Any program using such a collection class will be parallelisable, without requiring any modification, by choosing between sequential and parallel versions at link time. Experiments with a commercial application, built using the Sophus library which takes this approach to parallelisation, show good parallel speed-ups, without any adaptation of the application program being needed.
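    A Python rendering of the two conceptual classes (a sketch of the idea, not the Sophus library's actual interface) makes the division of labor concrete:

      class Machine:
          """Abstracts distribution: how many 'processors' there are and
          how data is partitioned among them (sequential machine = 1)."""
          def __init__(self, n_procs=1):
              self.n_procs = n_procs
          def partition(self, data):
              k, n = self.n_procs, len(data)
              return [data[i * n // k:(i + 1) * n // k] for i in range(k)]

      class Collection:
          """A distributed array: stores blocks as laid out by a Machine
          and applies elementwise operations block by block."""
          def __init__(self, data, machine):
              self.machine = machine
              self.blocks = machine.partition(list(data))
          def map(self, f):
              out = Collection([], self.machine)
              out.blocks = [[f(x) for x in blk] for blk in self.blocks]
              return out
          def gather(self):
              return [x for blk in self.blocks for x in blk]

      c = Collection(range(10), Machine(n_procs=4))
      print(c.map(lambda x: x * x).gather())

    In a parallel build, Machine.partition would place blocks on processors and map would execute them concurrently; the Collection code, and any program written against it, stays unchanged, which is the link-time choice the abstract describes.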

  14. Ordinal scheduling problem and its asymptotically optimal algorithms on parallel machine system

    Institute of Scientific and Technical Information of China (English)

    TAN Zhiyi; HE Yong

    2004-01-01

    Focusing on the ordinal scheduling problem on a parallel machine system, we discuss the background of ordinal scheduling and the motivation of ordinal algorithms. In addition, for the ordinal scheduling problem on identical parallel machines with the objective to maximize the minimum machine load, we then give two asymptotically optimal algorithm classes which have worst-case ratios very close to the upper bound of the problem for any given m. These results greatly improve the results proposed by He Yong and Tan Zhiyi in 2002.

  15. Parallel Machine Scheduling Models with Fuzzy Parameters and Precedence Constraints: A Credibility Approach

    Institute of Scientific and Technical Information of China (English)

    HOU Fu-jun; WU Qi-zong

    2007-01-01

    A method for modeling parallel machine scheduling problems with fuzzy parameters and precedence constraints based on credibility measure is provided. For the given n jobs to be processed on m machines, it is assumed that the processing times and the due dates are nonnegative fuzzy numbers and all the weights are positive, crisp numbers. Based on credibility measure, three parallel machine scheduling problems and a goal-programming model are formulated. Feasible schedules are evaluated not only by their objective values but also by the credibility degree of satisfaction with their precedence constraints. The genetic algorithm is utilized to find the best solutions in a short period of time. An illustrative numerical example is also given. Simulation results show that the proposed models are effective and can deal with parallel machine scheduling problems with fuzzy parameters and precedence constraints based on credibility measure.

  16. Molecular dynamics simulation on a network of workstations using a machine-independent parallel programming language.

    OpenAIRE

    1991-01-01

    Molecular dynamics simulations investigate local and global motion in molecules. Several parallel computing approaches have been taken to attack the most computationally expensive phase of molecular simulations, the evaluation of long range interactions. This paper develops a straightforward but effective algorithm for molecular dynamics simulations using the machine-independent parallel programming language, Linda. The algorithm was run both on a shared memory parallel computer and on a netw...

  17. DGDFT: A massively parallel method for large scale density functional theory calculations

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Wei, E-mail: whu@lbl.gov; Yang, Chao, E-mail: cyang@lbl.gov [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Lin, Lin, E-mail: linlin@math.berkeley.edu [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Department of Mathematics, University of California, Berkeley, California 94720 (United States)

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10^-4 Hartree/atom in terms of the error of energy and 6.2 × 10^-4 Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  18. Design of high-performing hybrid meta-heuristics for unrelated parallel machine scheduling with machine eligibility and precedence constraints

    Science.gov (United States)

    Afzalirad, Mojtaba; Rezaeian, Javad

    2016-04-01

    This study involves an unrelated parallel machine scheduling problem in which sequence-dependent set-up times, different release dates, machine eligibility and precedence constraints are considered to minimize total late works. A new mixed-integer programming model is presented and two efficient hybrid meta-heuristics, genetic algorithm and ant colony optimization, combined with the acceptance strategy of the simulated annealing algorithm (Metropolis acceptance rule), are proposed to solve this problem. Manifestly, the precedence constraints greatly increase the complexity of the scheduling problem to generate feasible solutions, especially in a parallel machine environment. In this research, a new corrective algorithm is proposed to obtain the feasibility in all stages of the algorithms. The performance of the proposed algorithms is evaluated in numerical examples. The results indicate that the suggested hybrid ant colony optimization statistically outperformed the proposed hybrid genetic algorithm in solving large-size test problems.
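
    The Metropolis acceptance rule mentioned above is simple to state in code. The sketch below is a minimal, generic version of how a hybrid metaheuristic might use it; all names and the cooling schedule are illustrative assumptions, not the authors' implementation.

    ```python
    import math
    import random

    # Minimal sketch of the Metropolis acceptance rule used to hybridize
    # GA/ACO with simulated annealing: a worse neighbour is accepted with
    # probability exp(-delta / T).

    def metropolis_accept(current_cost, candidate_cost, temperature, rng=random):
        """Return True if the candidate schedule should replace the current one."""
        if candidate_cost <= current_cost:      # improving moves always pass
            return True
        delta = candidate_cost - current_cost   # degradation of the objective
        return rng.random() < math.exp(-delta / temperature)

    # Typical use inside a metaheuristic loop with geometric cooling:
    # T = 100.0
    # while T > 1e-3:
    #     if metropolis_accept(cost(schedule), cost(neighbour), T):
    #         schedule = neighbour
    #     T *= 0.95
    ```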

  19. On distributed memory MPI-based parallelization of SPH codes in massive HPC context

    Science.gov (United States)

    Oger, G.; Le Touzé, D.; Guibert, D.; de Leffe, M.; Biddiscombe, J.; Soumagne, J.; Piccinali, J.-G.

    2016-03-01

    Most particle methods share the problem of high computational cost, and to satisfy the demands of solvers, currently available hardware technologies must be fully exploited. Two complementary technologies are now accessible. On the one hand, CPUs can be structured into a multi-node framework, allowing massive data exchanges through a high-speed network; in this case, each node usually comprises several cores available for multithreaded computation. On the other hand, GPUs, derived from graphics computing technologies, are able to perform highly multi-threaded calculations with hundreds of independent threads connected together through a common shared memory. This paper is primarily dedicated to the distributed-memory parallelization of particle methods, targeting several thousands of CPU cores. The experience gained clearly shows that parallelizing a particle-based code on a moderate number of cores can easily lead to acceptable scalability, whilst a scalable speedup on thousands of cores is much more difficult to obtain. The discussion revolves around speeding up particle methods as a whole, in a massive HPC context, by making use of the MPI library. We focus on one particular particle method, Smoothed Particle Hydrodynamics (SPH), one of the most widespread today in the literature as well as in engineering.
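
    To make the distributed-memory idea concrete, here is a minimal mpi4py sketch (not the authors' code) of the halo exchange that a 1-D domain decomposition of an SPH solver requires; the smoothing length h and the dummy particle positions are assumptions for illustration.

    ```python
    # Each rank owns a slice of [0, 1) and sends the particles lying within
    # one smoothing length h of its subdomain edges to its neighbours, which
    # treat them as ghost (halo) particles.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    h = 0.05                                  # smoothing length (assumed)
    lo, hi = rank / size, (rank + 1) / size   # this rank's slice of [0, 1)
    particles = [lo + (hi - lo) * i / 100.0 for i in range(100)]  # dummy data

    left = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    send_left = [x for x in particles if x < lo + h]
    send_right = [x for x in particles if x > hi - h]

    ghosts_from_right = comm.sendrecv(send_left, dest=left, source=right)
    ghosts_from_left = comm.sendrecv(send_right, dest=right, source=left)

    halo = (ghosts_from_left or []) + (ghosts_from_right or [])
    print(f"rank {rank}: {len(particles)} local, {len(halo)} ghost particles")
    ```

    Run with, e.g., mpiexec -n 4 python halo.py; in 3-D the same pattern repeats per face of each subdomain.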

  20. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarria-Miranda, Daniel

    2009-02-15

    We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
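
    For reference, betweenness centrality is usually computed with Brandes' algorithm, whose serial form is sketched below; the paper's contribution is a lock-free, cache-friendlier parallel variant of this kernel, which the sketch does not attempt to reproduce.

    ```python
    from collections import deque

    # Serial Brandes' algorithm for an unweighted, undirected graph given as
    # an adjacency dict {vertex: [neighbours]}.

    def betweenness(adj):
        bc = {v: 0.0 for v in adj}
        for s in adj:
            # BFS from s, counting shortest paths (sigma) and predecessors.
            sigma = {v: 0 for v in adj}; sigma[s] = 1
            dist = {v: -1 for v in adj}; dist[s] = 0
            preds = {v: [] for v in adj}
            order, queue = [], deque([s])
            while queue:
                v = queue.popleft()
                order.append(v)
                for w in adj[v]:
                    if dist[w] < 0:
                        dist[w] = dist[v] + 1
                        queue.append(w)
                    if dist[w] == dist[v] + 1:
                        sigma[w] += sigma[v]
                        preds[w].append(v)
            # Dependency accumulation in reverse BFS order.
            delta = {v: 0.0 for v in adj}
            for w in reversed(order):
                for v in preds[w]:
                    delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
                if w != s:
                    bc[w] += delta[w]
        return bc  # divide by 2 for undirected graphs if desired

    print(betweenness({0: [1], 1: [0, 2], 2: [1]}))  # node 1 carries the 0-2 path
    ```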

  1. Genetic testing in hereditary breast and ovarian cancer using massive parallel sequencing.

    Science.gov (United States)

    Ruiz, Anna; Llort, Gemma; Yagüe, Carmen; Baena, Neus; Viñas, Marina; Torra, Montse; Brunet, Anna; Seguí, Miquel A; Saigí, Eugeni; Guitart, Miriam

    2014-01-01

    High-throughput methods such as next-generation sequencing are increasingly used in molecular diagnosis. The aim of this study was to develop a workflow for the detection of BRCA1 and BRCA2 mutations using massively parallel sequencing on a 454 GS Junior benchtop sequencer. Our approach was first validated on a panel of 23 patients containing 62 unique variants that had previously been Sanger sequenced. Subsequently, 101 patients with familial breast and ovarian cancer were studied. BRCA1 and BRCA2 exon enrichment was performed by PCR amplification using the BRCA MASTR kit (Multiplicom). Bioinformatic analysis of reads was performed with the AVA software v2.7 (Roche). In total, all 62 variants were detected, resulting in a sensitivity of 100%. Seventy-one false positives were called, resulting in a specificity of 97.35%; all of them correspond to deletions located in homopolymeric stretches. Analysis of homopolymer stretches of 6 bp or longer using the BRCA HP kit (Multiplicom) increased the specificity of the detection of BRCA1 and BRCA2 mutations to 99.99%. We show here that massively parallel pyrosequencing can be used as a diagnostic strategy to test for BRCA1 and BRCA2 mutations, meeting very stringent sensitivity and specificity parameters and replacing traditional Sanger sequencing at a lower cost.

  2. Decreasing Data Analytics Time: Hybrid Architecture MapReduce-Massive Parallel Processing for a Smart Grid

    Directory of Open Access Journals (Sweden)

    Abdeslam Mehenni

    2017-03-01

    Full Text Available As our populations grow in a world of limited resources, enterprises seek ways to lighten our load on the planet, and the idea of modifying consumer behavior appears as a foundation for smart grids. Enterprises demonstrate the value available from deep analysis of electricity consumption histories, consumers' messages, outage alerts, etc., mining massive structured and unstructured data. In a nutshell, smart grids result in a flood of data that needs to be analyzed, to better adjust to demand and to give customers more ability to delve into their power consumption. Simply put, smart grids will increasingly have a flexible data warehouse attached to them. The key driver for the adoption of data management strategies is clearly the need to handle and analyze the large amounts of information utilities are now faced with. New approaches to data integration are needed now; Hadoop is in fact already being used by utilities to help manage the huge growth in data whilst maintaining coherence of the data warehouse. In this paper we define a new Meter Data Management System (MDMS) architecture that differs from the three leading MDMSs in that we use the MapReduce programming model for ETL and a parallel DBMS (Massively Parallel Processing, MPP) for query statements.
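
    The division of labour described above (MapReduce for the ETL stage, an MPP database for queries) can be illustrated with a toy map/reduce pass over meter readings; the record format and field names below are invented for the example.

    ```python
    from collections import defaultdict
    from itertools import chain

    # Toy illustration (not the paper's Hadoop/MPP architecture) of the
    # MapReduce ETL step: map raw meter readings to (meter_id, kWh) pairs,
    # then reduce by key to get per-meter consumption totals.

    def map_reading(line):
        meter_id, timestamp, kwh = line.split(",")
        yield meter_id, float(kwh)

    def reduce_by_key(pairs):
        totals = defaultdict(float)
        for key, value in pairs:
            totals[key] += value
        return totals

    log = [
        "meter-17,2017-03-01T00:00,0.42",
        "meter-17,2017-03-01T01:00,0.38",
        "meter-23,2017-03-01T00:00,1.05",
    ]
    pairs = chain.from_iterable(map_reading(line) for line in log)
    print(dict(reduce_by_key(pairs)))  # {'meter-17': 0.8..., 'meter-23': 1.05}
    ```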

  3. Massively Parallel, Molecular Analysis Platform Developed Using a CMOS Integrated Circuit With Biological Nanopores

    Science.gov (United States)

    Roever, Stefan

    2012-01-01

    A massively parallel, low cost molecular analysis platform will dramatically change the nature of protein, molecular and genomics research, DNA sequencing, and ultimately, molecular diagnostics. An integrated circuit (IC) with 264 sensors was fabricated using standard CMOS semiconductor processing technology. Each of these sensors is individually controlled with precision analog circuitry and is capable of single molecule measurements. Under electronic and software control, the IC was used to demonstrate the feasibility of creating and detecting lipid bilayers and biological nanopores using wild type α-hemolysin. The ability to dynamically create bilayers over each of the sensors will greatly accelerate pore development and pore mutation analysis. In addition, the noise performance of the IC was measured to be 30 fA (rms). With this noise performance, single base detection of DNA was demonstrated using α-hemolysin. The data shows that a single molecule, electrical detection platform using biological nanopores can be operationalized and can ultimately scale to millions of sensors. Such a massively parallel platform will revolutionize molecular analysis and will completely change the field of molecular diagnostics in the future.

  4. Assessment of whole genome amplification for sequence capture and massively parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Johanna Hasmats

    Full Text Available Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74%) of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information.

  5. Assessment of whole genome amplification for sequence capture and massively parallel sequencing.

    Science.gov (United States)

    Hasmats, Johanna; Gréen, Henrik; Orear, Cedric; Validire, Pierre; Huss, Mikael; Käller, Max; Lundeberg, Joakim

    2014-01-01

    Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74%) of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information.

  6. Application of Massively Parallel Sequencing in the Clinical Diagnostic Testing of Inherited Cardiac Conditions

    Directory of Open Access Journals (Sweden)

    Ivone U. S. Leong

    2014-06-01

    Full Text Available Sudden cardiac death in people between the ages of 1 and 40 years is a devastating event and is frequently caused by several heritable cardiac disorders. These disorders include cardiac ion channelopathies, such as long QT syndrome, catecholaminergic polymorphic ventricular tachycardia and Brugada syndrome, and cardiomyopathies, such as hypertrophic cardiomyopathy and arrhythmogenic right ventricular cardiomyopathy. Through careful molecular genetic evaluation of DNA from sudden death victims, the causative gene mutation can be uncovered, and the rest of the family can be screened and preventative measures implemented in at-risk individuals. The current screening approach in most diagnostic laboratories uses Sanger-based sequencing; however, this method is time consuming and labour intensive. The development of massively parallel sequencing has made it possible to produce millions of sequence reads simultaneously and is potentially an ideal approach to screen for mutations in genes that are associated with sudden cardiac death. This approach offers mutation screening at reduced cost and turnaround time. Here, we review the current commercially available enrichment kits, massively parallel sequencing (MPS) platforms, downstream data analysis and its application to sudden cardiac death in a diagnostic environment.

  7. MADmap: A Massively Parallel Maximum-Likelihood Cosmic Microwave Background Map-Maker

    Energy Technology Data Exchange (ETDEWEB)

    Cantalupo, Christopher; Borrill, Julian; Jaffe, Andrew; Kisner, Theodore; Stompor, Radoslaw

    2009-06-09

    MADmap is a software application used to produce maximum-likelihood images of the sky from time-ordered data which include correlated noise, such as those gathered by Cosmic Microwave Background (CMB) experiments. It works efficiently on platforms ranging from small workstations to the most massively parallel supercomputers. Map-making is a critical step in the analysis of all CMB data sets, and the maximum-likelihood approach is the most accurate and widely applicable algorithm; however, it is a computationally challenging task. This challenge will only increase with the next generation of ground-based, balloon-borne and satellite CMB polarization experiments. The faintness of the B-mode signal that these experiments seek to measure requires them to gather enormous data sets. MADmap is already being run on up to O(10^11) time samples, O(10^8) pixels and O(10^4) cores, with ongoing work to scale to the next generation of data sets and supercomputers. We describe MADmap's algorithm based around a preconditioned conjugate gradient solver, fast Fourier transforms and sparse matrix operations. We highlight MADmap's ability to address problems typically encountered in the analysis of realistic CMB data sets and describe its application to simulations of the Planck and EBEX experiments. The massively parallel and distributed implementation is detailed and scaling complexities are given for the resources required. MADmap is capable of analysing the largest data sets now being collected on computing resources currently available, and we argue that, given Moore's Law, MADmap will be capable of reducing the most massive projected data sets.
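
    The core solver named above, preconditioned conjugate gradient, is sketched below in a dense toy form with a Jacobi preconditioner; in MADmap itself the matrix is applied matrix-free via FFTs and sparse operations, which this illustration does not reproduce.

    ```python
    import numpy as np

    # Preconditioned conjugate gradient for A x = b, A symmetric positive
    # definite, with a diagonal (Jacobi) preconditioner.

    def pcg(A, b, tol=1e-10, max_iter=1000):
        M_inv = 1.0 / np.diag(A)            # Jacobi preconditioner
        x = np.zeros_like(b)
        r = b - A @ x
        z = M_inv * r
        p = z.copy()
        rz = r @ z
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) < tol:
                break
            z = M_inv * r
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x

    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(pcg(A, b))  # ~[0.0909, 0.6364], i.e. [1/11, 7/11]
    ```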

  8. Scheduling Jobs with Variable Job Processing Times on Unrelated Parallel Machines

    Directory of Open Access Journals (Sweden)

    Guang-Qian Zhang

    2014-01-01

    Full Text Available We consider m unrelated parallel machine scheduling problems with variable job processing times, where the processing time of a job is a function of its position in a sequence, its starting time, and its resource allocation. The objective is to determine the optimal resource allocation and the optimal schedule to minimize a total cost function that depends on the total completion (waiting) time, the total machine load, the total absolute differences in completion (waiting) times on all machines, and the total resource cost. If the number of machines is a given constant, we propose a polynomial-time algorithm to solve the problem.

  9. Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

    Science.gov (United States)

    Schultz, A.

    2010-12-01

    We describe our ongoing efforts to achieve massive parallelization on a novel hybrid GPU testbed machine currently configured with 12 Intel Westmere Xeon CPU cores (or 24 parallel computational threads) with 96 GB DDR3 system memory, 4 GPU subsystems which in aggregate contain 960 NVidia Tesla GPU cores with 16 GB dedicated DDR3 GPU memory, and a second interleaved bank of 4 GPU subsystems containing in aggregate 1792 NVidia Fermi GPU cores with 12 GB dedicated DDR5 GPU memory. We are applying domain decomposition methods to a modified version of Weiss' (2001) 3D frequency-domain full-physics EM finite difference code, an open-source GPL-licensed f90 code available for download from www.OpenEM.org. This will be the core of a new hybrid 3D inversion that parallelizes frequencies across CPUs and individual forward solutions across GPUs. We describe progress made in modifying the code to use direct solvers in GPU cores dedicated to each small subdomain, iteratively improving the solution by matching adjacent subdomain boundary solutions, rather than iterative Krylov-space sparse solvers as currently applied to the whole domain.

  10. ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains.

    Science.gov (United States)

    Torre, Emiliano; Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz; Grün, Sonja

    2016-07-01

    With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity.
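
    Step zero of this pipeline, building the intersection matrix from binned spike trains, can be written compactly; the sketch below uses raw overlap counts and invented data, leaving out ASSET's normalization and statistical testing.

    ```python
    import numpy as np

    # From binned parallel spike trains (neurons x time bins, 0/1), build the
    # intersection matrix whose entry (i, j) counts the neurons active in both
    # bin i and bin j. Repeated synchronous sequences appear as diagonal
    # structures of high overlap.

    def intersection_matrix(binned):
        """binned: 0/1 array of shape (n_neurons, n_bins)."""
        b = binned.astype(int)
        return b.T @ b          # (n_bins, n_bins) overlap counts

    spikes = np.array([[1, 0, 1, 0],
                       [1, 0, 1, 1],
                       [0, 1, 0, 1]])
    print(intersection_matrix(spikes))
    ```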

  11. ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains

    Science.gov (United States)

    Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz

    2016-01-01

    With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734

  12. MPI/OpenMP Hybrid Parallel Algorithm of Resolution of Identity Second-Order Møller-Plesset Perturbation Calculation for Massively Parallel Multicore Supercomputers.

    Science.gov (United States)

    Katouda, Michio; Nakajima, Takahito

    2013-12-10

    A new algorithm for massively parallel calculations of electron correlation energy of large molecules based on the resolution of identity second-order Møller-Plesset perturbation (RI-MP2) technique is developed and implemented into the quantum chemistry software NTChem. In this algorithm, a Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) hybrid parallel programming model is applied to attain efficient parallel performance on massively parallel supercomputers. An in-core storage scheme for the intermediate data of three-center electron repulsion integrals utilizing the distributed memory is developed to eliminate input/output (I/O) overhead. The parallel performance of the algorithm is tested on massively parallel supercomputers such as the K computer (using up to 45,992 central processing unit (CPU) cores) and a commodity Intel Xeon cluster (using up to 8192 CPU cores). The parallel RI-MP2/cc-pVTZ calculation of two-layer nanographene sheets (C150H30)2 (number of atomic orbitals is 9640) is performed using 8991 nodes and 71,288 CPU cores of the K computer.

  13. Design of a novel parallel reconfigurable machine tool

    CSIR Research Space (South Africa)

    Modungwa, D

    2008-06-01

    Full Text Available The repair and reconditioning of moulds and dies requires equipment with high stiffness. Each worn out mould or die has different defects, which require different mechanical processing and positioning of the reconditioning tools. Conventional machine tools and serial manipulators have been found incapable of this, motivating a design based on a philosophy of re-configurability. The structure is influenced by (1) specifications for the repair and re-conditioning of moulds and dies, and (2) the manufacturing processes involved.

  14. Accelerating Relevance-Vector-Machine-Based Classification of Hyperspectral Image with Parallel Computing

    Directory of Open Access Journals (Sweden)

    Chao Dong

    2012-01-01

    Full Text Available Benefiting from the kernel trick and the sparsity property, the relevance vector machine (RVM) can acquire a sparse solution with generalization ability equivalent to that of the support vector machine. The sparse solution requires much less time in prediction, making RVM a good candidate for classifying large-scale hyperspectral images. However, RVM is not in widespread use, owing to its slow training procedure. To address this problem, the RVM-based classification of hyperspectral images is accelerated with parallel computing techniques in this paper. Parallelization is applied at the levels of the multiclass strategy, the ensemble of multiple weak classifiers, and the matrix operations. The parallel RVMs are implemented in the C language together with the parallel functions of linear algebra packages and the message passing interface library. The proposed methods are evaluated on the AVIRIS Indian Pines data set on a Beowulf cluster and on multicore platforms. The results show that the parallel RVMs substantially accelerate the training procedure.
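
    One of the parallelization axes named above, the multiclass strategy, can be illustrated with a toy one-vs-rest training loop distributed over processes; train_binary_rvm below is a hypothetical stand-in for the real RVM training routine, which the paper implements in C with MPI.

    ```python
    from multiprocessing import Pool

    # In a one-vs-rest multiclass strategy, the per-class binary classifiers
    # are independent and can be trained concurrently.

    def train_binary_rvm(task):
        class_label, X, y = task
        y_binary = [1 if label == class_label else 0 for label in y]
        # ... the expensive RVM training would run here ...
        return class_label, sum(y_binary)  # placeholder "model"

    if __name__ == "__main__":
        X = [[0.1, 0.2], [0.4, 0.1], [0.3, 0.9], [0.8, 0.7]]
        y = [0, 0, 1, 2]
        tasks = [(c, X, y) for c in sorted(set(y))]
        with Pool(processes=3) as pool:
            models = dict(pool.map(train_binary_rvm, tasks))
        print(models)  # {0: 2, 1: 1, 2: 1}
    ```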

  15. User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

    Energy Technology Data Exchange (ETDEWEB)

    Earth Sciences Division; Zhang, Keni; Zhang, Keni; Wu, Yu-Shu; Pruess, Karsten

    2008-05-27

    TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one-, two-, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on TOUGH2 Version 1.4 with the EOS3, EOS9, and T2R3D modules, software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick-start guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code. The report also gives a brief technical description of the code, including a discussion of the parallel methodology and code structure, as well as the mathematical and numerical methods used.

  16. A self-calibrating robot based upon a virtual machine model of parallel kinematics

    DEFF Research Database (Denmark)

    Pedersen, David Bue; Eiríksson, Eyþór Rúnar; Hansen, Hans Nørgaard

    2016-01-01

    A delta-type parallel kinematics system for Additive Manufacturing has been created, which through a probing system can recognise its geometrical deviations from nominal and compensate for these in the driving inverse kinematic model of the machine. The novelty is that this model is derived from a virtual machine of the kinematics system, built on principles from geometrical metrology. Relevant, mathematically non-trivial deviations from the ideal machine are identified and decomposed into elemental deviations. From these deviations, a routine is added to the physical machine tool, which allows it to recognise its own geometry by probing the vertical offset from the tool point to the machine table at positions in the horizontal plane. After automatic calibration, the positioning error of the machine tool was reduced from an initial post-assembly error of ±170 µm to a calibrated error of ±3 µm.

  17. Compact Graph Representations and Parallel Connectivity Algorithms for Massive Dynamic Network Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Madduri, Kamesh; Bader, David A.

    2009-02-15

    Graph-theoretic abstractions are extensively used to analyze massive data sets. Temporal data streams from socioeconomic interactions, social networking web sites, communication traffic, and scientific computing can be intuitively modeled as graphs. We present the first study of novel high-performance combinatorial techniques for analyzing large-scale information networks, encapsulating dynamic interaction data in the order of billions of entities. We present new data structures to represent dynamic interaction networks, and discuss algorithms for processing parallel insertions and deletions of edges in small-world networks. With these new approaches, we achieve an average performance rate of 25 million structural updates per second and a parallel speedup of nearly 28 on a 64-way Sun UltraSPARC T2 multicore processor, for insertions and deletions to a small-world network of 33.5 million vertices and 268 million edges. We also design parallel implementations of fundamental dynamic graph kernels related to connectivity and centrality queries. Our implementations are freely distributed as part of the open-source SNAP (Small-world Network Analysis and Partitioning) complex network analysis framework.
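
    A minimal adjacency-set structure conveys the flavour of O(1) edge insertions and deletions combined with a connectivity query, though the SNAP data structures are far more elaborate and support concurrent batched updates; the sketch below is illustrative only.

    ```python
    from collections import deque

    # Dynamic graph as per-vertex adjacency sets: edge updates are O(1),
    # connectivity is answered with a BFS.

    class DynamicGraph:
        def __init__(self):
            self.adj = {}

        def insert_edge(self, u, v):
            self.adj.setdefault(u, set()).add(v)
            self.adj.setdefault(v, set()).add(u)

        def delete_edge(self, u, v):
            self.adj.get(u, set()).discard(v)
            self.adj.get(v, set()).discard(u)

        def connected(self, s, t):
            seen, queue = {s}, deque([s])
            while queue:
                u = queue.popleft()
                if u == t:
                    return True
                for w in self.adj.get(u, ()):
                    if w not in seen:
                        seen.add(w)
                        queue.append(w)
            return False

    g = DynamicGraph()
    g.insert_edge(1, 2); g.insert_edge(2, 3)
    print(g.connected(1, 3))   # True
    g.delete_edge(2, 3)
    print(g.connected(1, 3))   # False
    ```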

  18. 3-D readout-electronics packaging for high-bandwidth massively paralleled imager

    Science.gov (United States)

    Kwiatkowski, Kris; Lyke, James

    2007-12-18

    Dense, massively parallel signal processing electronics are co-packaged behind associated sensor pixels. Microchips containing a linear or bilinear arrangement of photo-sensors, together with associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photo-sensitive cells is disposed on a stacked CMOS chip's surface at a 45° angle from light-reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of said stacked CMOS chip layers and a distribution grid, the connections distributing power and signals to components associated with each stacked CMOS chip layer.

  19. Simultaneous digital quantification and fluorescence-based size characterization of massively parallel sequencing libraries.

    Science.gov (United States)

    Laurie, Matthew T; Bertout, Jessica A; Taylor, Sean D; Burton, Joshua N; Shendure, Jay A; Bielas, Jason H

    2013-08-01

    Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are crucial steps in the library preparation process for massively parallel sequencing (or next-generation sequencing). Current library quality control methods commonly involve quantification using real-time quantitative PCR and size determination using gel or capillary electrophoresis. These methods are laborious and subject to a number of significant limitations that can make library calibration unreliable. Herein, we propose and test an alternative method for quality control of sequencing libraries using droplet digital PCR (ddPCR). By exploiting a correlation we have discovered between droplet fluorescence and amplicon size, we achieve the joint quantification and size determination of target DNA with a single ddPCR assay. We demonstrate the accuracy and precision of applying this method to the preparation of sequencing libraries.
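
    The key observation, that droplet fluorescence correlates with amplicon size, implies a simple calibration workflow: fit a curve on fragments of known length, then invert it for unknowns. The sketch below uses an invented linear calibration; the real assay's correlation need not be linear.

    ```python
    import numpy as np

    # Hypothetical calibration data: fragments of known size (bp) and the
    # droplet fluorescence they produce. The numbers are invented.
    known_size_bp = np.array([150.0, 300.0, 450.0, 600.0])
    known_fluorescence = np.array([9800.0, 8400.0, 7100.0, 5700.0])

    # Fit fluorescence as a linear function of size, then invert it.
    slope, intercept = np.polyfit(known_size_bp, known_fluorescence, deg=1)

    def estimate_size(fluorescence):
        return (fluorescence - intercept) / slope

    print(round(estimate_size(7700.0)))  # ~381 bp for this toy calibration
    ```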

  20. 'Proxy-equation' paradigm - A novel strategy for massively-parallel asynchronous computations

    CERN Document Server

    Mittal, Ankita

    2016-01-01

    Massively parallel simulations of physical systems call for a paradigm change in algorithm development to achieve efficient scalability. Traditional approaches require time synchronization of processing elements (PEs), which severely restricts scalability. Relaxing the synchronization requirement introduces error and slows down the convergence rate. In this paper, we propose a novel 'proxy-equation' concept for a general transport equation that (i) tolerates asynchrony without introducing much error, (ii) preserves convergence characteristics and (iii) scales efficiently. The central idea is to modify a priori the transport characteristics (e.g., advection speed, diffusivity) at the PE boundaries to counteract asynchrony errors. We develop the theoretical framework and derive the relationship between the characteristics and the degree of PE time delay. Proof-of-concept computations are performed using a one-dimensional advection-diffusion equation. Error and convergence rates for various discretization levels and wavelengths ...
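
    The model problem named at the end, a 1-D advection-diffusion equation, is easy to set up; the sketch below shows only the baseline synchronous explicit update (the proxy-equation idea would then modify the advection speed and diffusivity near processor boundaries). All parameters are illustrative.

    ```python
    import numpy as np

    # 1-D advection-diffusion u_t + c u_x = nu u_xx on a periodic domain,
    # advanced with an explicit upwind/central finite-difference scheme.

    nx, c, nu = 64, 1.0, 0.01
    dx = 1.0 / nx
    dt = 0.4 * min(dx / c, dx * dx / (2 * nu))   # stable step for both terms
    x = np.arange(nx) * dx
    u = np.sin(2 * np.pi * x)                    # initial condition

    for _ in range(200):
        up = np.roll(u, -1)      # u[i+1] with periodic boundaries
        um = np.roll(u, 1)       # u[i-1]
        u = u - c * dt / dx * (u - um) + nu * dt / dx**2 * (up - 2 * u + um)

    print(float(u.max()))  # the advected, diffusively damped wave amplitude
    ```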

  1. GPAW - massively parallel electronic structure calculations with Python-based software

    DEFF Research Database (Denmark)

    Enkovaara, Jussi; Romero, Nichols A.; Shende, Sameer

    2011-01-01

    Electronic structure calculations are a widely used tool in materials science and a large consumer of supercomputing resources. Traditionally, the software packages for these kinds of simulations have been implemented in compiled languages, where Fortran in its different versions has been the most popular choice. While dynamic, interpreted languages, such as Python, can increase the efficiency of the programmer, they cannot compete directly with the raw performance of compiled languages. However, by using an interpreted language together with a compiled language, it is possible to have most of the productivity-enhancing features together with a good numerical performance. We have used this approach in implementing the electronic structure simulation software GPAW using the combination of the Python and C programming languages. While the chosen approach works well in standard workstations and Unix environments, massively parallel supercomputing systems can present some challenges in porting, debugging and profiling the software. In this paper we describe some details of the implementation and discuss the advantages and challenges of the combined Python/C approach. We show that despite the challenges it is possible to obtain good numerical performance and good parallel scalability with Python-based software.

  2. Massively Parallel Phase-Field Simulations for Ternary Eutectic Directional Solidification

    CERN Document Server

    Bauer, Martin; Steinmetz, Philipp; Jainta, Marcus; Berghoff, Marco; Schornbaum, Florian; Godenschwager, Christian; Köstler, Harald; Nestler, Britta; Rüde, Ulrich

    2015-01-01

    Microstructures forming during ternary eutectic directional solidification processes have significant influence on the macroscopic mechanical properties of metal alloys. For a realistic simulation, we use the well established thermodynamically consistent phase-field method and improve it with a new grand potential formulation to couple the concentration evolution. This extension is very compute intensive due to a temperature dependent diffusive concentration. We significantly extend previous simulations that have used simpler phase-field models or were performed on smaller domain sizes. The new method has been implemented within the massively parallel HPC framework waLBerla that is designed to exploit current supercomputers efficiently. We apply various optimization techniques, including buffering techniques, explicit SIMD kernel vectorization, and communication hiding. Simulations utilizing up to 262,144 cores have been run on three different supercomputing architectures and weak scalability results are show...

  3. Passive and Partially Active Fault Tolerance for Massively Parallel Stream Processing Engines

    DEFF Research Database (Denmark)

    Su, Li; Zhou, Yongluan

    2017-01-01

    Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime state and can recover a failed task by restoring its runtime state from its latest checkpoint. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPEs). In the hybrid approach proposed here, the passive approach is applied to all tasks while only a selected set of tasks is actively replicated. The number of actively replicated tasks depends on the available resources. If tasks without active replicas fail, tentative outputs are generated before the recovery process completes.

  5. Large-Scale Eigenvalue Calculations for Stability Analysis of Steady Flows on Massively Parallel Computers

    Energy Technology Data Exchange (ETDEWEB)

    Lehoucq, Richard B.; Salinger, Andrew G.

    1999-08-01

    We present an approach for determining the linear stability of steady states of PDEs on massively parallel computers. Linearizing the transient behavior around a steady state leads to a generalized eigenvalue problem. The eigenvalues with largest real part are calculated using Arnoldi's iteration driven by a novel implementation of the Cayley transformation to recast the problem as an ordinary eigenvalue problem. The Cayley transformation requires the solution of a linear system at each Arnoldi iteration, which must be done iteratively for the algorithm to scale with problem size. A representative model problem of 3D incompressible flow and heat transfer in a rotating disk reactor is used to analyze the effect of algorithmic parameters on the performance of the eigenvalue algorithm. Successful calculations of leading eigenvalues for matrix systems of order up to 4 million were performed, identifying the critical Grashof number for a Hopf bifurcation.
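
    The Cayley transformation recasts the generalized eigenproblem, written here as J x = λ M x (names illustrative), so that the rightmost eigenvalues become the largest-magnitude eigenvalues of C = (J - σM)⁻¹(J - μM), which Arnoldi finds quickly. The SciPy sketch below illustrates this on a small random test matrix; σ, μ and the matrices are assumptions, and the paper's parallel iterative solves are not reproduced.

    ```python
    import numpy as np
    from scipy.sparse import identity, random as sprandom
    from scipy.sparse.linalg import LinearOperator, eigs, splu

    n, sigma, mu = 200, 1.0, -1.0
    J = (sprandom(n, n, density=0.02, random_state=0)
         - 2.0 * identity(n)).tocsc()            # a stable-ish test matrix
    M = identity(n, format="csc")

    lu = splu(J - sigma * M)                     # factor once, reuse per matvec
    C = LinearOperator((n, n),
                       matvec=lambda v: lu.solve((J - mu * M) @ v))

    theta, V = eigs(C, k=4, which="LM")          # Arnoldi on the transform
    lam = (sigma * theta - mu) / (theta - 1.0)   # undo the Cayley map
    print(np.sort_complex(lam))                  # rightmost eigenvalues of (J, M)
    ```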

  6. Inside the intraterrestrials: The deep biosphere seen through massively parallel sequencing

    Science.gov (United States)

    Biddle, J.

    2009-12-01

    Deeply buried marine sediments may house a large portion of the Earth's microbial population. Initial studies based on 16S rRNA clone libraries suggest that these sediments contain unique phylotypes of microorganisms, particularly from the archaeal domain. Since this environment is so difficult to study, microbiologists are challenged to find ways to examine these populations remotely. A major approach to studying this environment uses massively parallel sequencing to examine the inner genetic workings of these microorganisms after the sediment has been drilled. Both metagenomics and tagged-amplicon sequencing have been employed on deep sediments, and initial results show that different geographic regions can be differentiated through genomics and that minor populations may cause major geochemical changes.

  7. Radiation Hydrodynamics using Characteristics on Adaptive Decomposed Domains for Massively Parallel Star Formation Simulations

    CERN Document Server

    Buntemeyer, Lars; Peters, Thomas; Klassen, Mikhail; Pudritz, Ralph E

    2015-01-01

    We present an algorithm for solving the radiative transfer problem on massively parallel computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magneto-hydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation simulations with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin as well as the optically thick regimes and the transition region in particular. We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse simulations resembling the ear...

  8. GRay: a Massively Parallel GPU-Based Code for Ray Tracing in Relativistic Spacetimes

    CERN Document Server

    Chan, Chi-kwan; Ozel, Feryal

    2013-01-01

    We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This GPU-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single precision floating-point arithmetic on a single GPU exceeds 300 GFLOP (or 1 nanosecond per photon per time step). For a realistic problem, where the peak performance cannot be reached, GRay is two orders of magnitude faster than existing CPU-based ray tracing codes. This performance enhancement allows more effective searches of large parameter spaces when comparing theoretical predictions of images, spectra, and lightcurves from the vicinities of compact objects to observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Making use of this algorithm, we calculate the properties of the shadows of K...

  9. ls1 mardyn: The massively parallel molecular dynamics code for large systems

    CERN Document Server

    Niethammer, Christoph; Bernreuther, Martin; Buchholz, Martin; Eckhardt, Wolfgang; Heinecke, Alexander; Werth, Stephan; Bungartz, Hans-Joachim; Glass, Colin W; Hasse, Hans; Vrabec, Jadran; Horsch, Martin

    2014-01-01

    The molecular dynamics simulation code ls1 mardyn is presented. It is a highly scalable code, optimized for massively parallel execution on supercomputing architectures, and currently holds the world record for the largest molecular simulation with over four trillion particles. It enables the application of pair potentials to length and time scales which were previously out of scope for molecular dynamics simulation. With an efficient dynamic load balancing scheme, it delivers high scalability even for challenging heterogeneous configurations. Presently, multi-center rigid potential models based on Lennard-Jones sites, point charges and higher-order polarities are supported. Due to its modular design, ls1 mardyn can be extended to new physical models, methods, and algorithms, allowing future users to tailor it to suit their respective needs. Possible applications include scenarios with complex geometries, e.g. for fluids at interfaces, as well as non-equilibrium molecular dynamics simulation of heat and mass ...

  10. Parallel-Machine Scheduling Problems with Past-Sequence-Dependent Delivery Times and Aging Maintenance

    Directory of Open Access Journals (Sweden)

    Wei-min Ma

    2015-01-01

    Full Text Available We consider parallel-machine scheduling problems with past-sequence-dependent (psd) delivery times and aging maintenance. The delivery time is proportional to the waiting time in the system. Each machine has an aging maintenance activity. We develop polynomial algorithms for three versions of the problem, minimizing the total absolute deviation of job completion times, the total load, and the total completion time.

  11. A Novel Parallel Engraving Machine Based on 6-PUS Mechanism and Related Technologies

    OpenAIRE

    Ling-fu, Kong; Shi-hui, Zhang

    2005-01-01

    A novel parallel engraving machine is proposed and some of its key technologies are studied in this paper. After the mechanism type is confirmed, a group of mechanisms is obtained by varying the dimensions of the engraving machine. Performance indices are analyzed by considering both the first- and second-order influence coefficient matrices at different sample points in each mechanism's workspace, and mechanism dimensions better suited to both kinematics and dynamics are thereby achieved, providing the theoretical basis for...

  12. Efficient Parallel Sorting for Migrating Birds Optimization When Solving Machine-Part Cell Formation Problems

    OpenAIRE

    Ricardo Soto; Broderick Crawford; Boris Almonacid; Fernando Paredes

    2016-01-01

    The Machine-Part Cell Formation Problem (MPCFP) is a NP-Hard optimization problem that consists in grouping machines and parts in a set of cells, so that each cell can operate independently and the intercell movements are minimized. This problem has largely been tackled in the literature by using different techniques ranging from classic methods such as linear programming to more modern nature-inspired metaheuristics. In this paper, we present an efficient parallel version of the Migrating Bi...

  13. Massively parallel cis-regulatory analysis in the mammalian central nervous system.

    Science.gov (United States)

    Shen, Susan Q; Myers, Connie A; Hughes, Andrew E O; Byrne, Leah C; Flannery, John G; Corbo, Joseph C

    2016-02-01

    Cis-regulatory elements (CREs, e.g., promoters and enhancers) regulate gene expression, and variants within CREs can modulate disease risk. Next-generation sequencing has enabled the rapid generation of genomic data that predict the locations of CREs, but a bottleneck lies in functionally interpreting these data. To address this issue, massively parallel reporter assays (MPRAs) have emerged, in which barcoded reporter libraries are introduced into cells, and the resulting barcoded transcripts are quantified by next-generation sequencing. Thus far, MPRAs have been largely restricted to assaying short CREs in a limited repertoire of cultured cell types. Here, we present two advances that extend the biological relevance and applicability of MPRAs. First, we adapt exome capture technology to instead capture candidate CREs, thereby tiling across the targeted regions and markedly increasing the length of CREs that can be readily assayed. Second, we package the library into adeno-associated virus (AAV), thereby allowing delivery to target organs in vivo. As a proof of concept, we introduce a capture library of about 46,000 constructs, corresponding to roughly 3500 DNase I hypersensitive (DHS) sites, into the mouse retina by ex vivo plasmid electroporation and into the mouse cerebral cortex by in vivo AAV injection. We demonstrate tissue-specific cis-regulatory activity of DHSs and provide examples of high-resolution truncation mutation analysis for multiplex parsing of CREs. Our approach should enable massively parallel functional analysis of a wide range of CREs in any organ or species that can be infected by AAV, such as nonhuman primates and human stem cell-derived organoids.

  14. Architecture for next-generation massively parallel maskless lithography system (MPML2)

    Science.gov (United States)

    Su, Ming-Shing; Tsai, Kuen-Yu; Lu, Yi-Chang; Kuo, Yu-Hsuan; Pei, Ting-Hang; Yen, Jia-Yush

    2010-03-01

    Electron-beam lithography is promising for future manufacturing technology because it does not suffer from the wavelength limits set by light sources. Since single electron-beam lithography systems suffer from low throughput, a multi-electron-beam lithography (MEBL) system is a feasible alternative based on the concept of massive parallelism. In this paper, we evaluate the advantages and disadvantages of different MEBL system architectures, and propose our novel Massively Parallel MaskLess Lithography System, MPML2. The MPML2 system targets cost-effective manufacturing at the 32 nm node and beyond. The key structure of the proposed system is its beamlet array cells (BACs). Hundreds of BACs are uniformly arranged over the whole wafer area in the proposed system. Each BAC has a data processor and an array of beamlets, and each beamlet consists of an electron-beam source, a source controller, a set of electron lenses, a blanker, a deflector, and an electron detector. These essential parts of the beamlets are integrated using MEMS technology, which increases the density of beamlets and reduces the system cost. The data processor in the BAC processes layout information coming off-chamber and dispatches it to the corresponding beamlet to control its ON/OFF status. Maskless lithography systems avoid the high manufacturing cost of masks; however, immense pattern data must be handled and transmitted. Therefore, a data compression technique is applied to reduce the required transmission bandwidth. The compression algorithm is fast and efficient, so that a real-time decoder can be implemented on-chip. Consequently, the proposed MPML2 can achieve a throughput of 10 wafers per hour (WPH) for 300 mm-wafer systems.
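
    The text does not specify the compression scheme, but run-length coding is the classic example of one whose decoder is simple enough for on-chip, real-time implementation; the sketch below is a generic illustration, not the MPML2 datapath.

    ```python
    # Run-length encode/decode one row of pixel ON/OFF data.

    def rle_encode(bits):
        runs, count, current = [], 0, bits[0]
        for b in bits:
            if b == current:
                count += 1
            else:
                runs.append((current, count))
                current, count = b, 1
        runs.append((current, count))
        return runs

    def rle_decode(runs):
        return [b for b, n in runs for _ in range(n)]

    row = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
    encoded = rle_encode(row)
    print(encoded)                         # [(0, 3), (1, 2), (0, 4), (1, 1)]
    assert rle_decode(encoded) == row
    ```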

  15. Kinematic simulation of a parallel NC machine tool in the manufacturing process

    Institute of Scientific and Technical Information of China (English)

    ZHANG Jian-min; FAN Yu; JIA Dong-yun; ZOU Qiu-ling; WU Ying

    2006-01-01

    To advance research on parallel machine tools, this paper introduces the structure of a 6-SPS parallel machine tool and explains the significance of kinematic simulation of the manufacturing process. The simulation software was developed in Microsoft Visual C++ 6.0 using OpenGL API functions, and the data flows are presented. Finally, a simulation of machining an odd-leaf hyperboloid is performed before the manufacturing process; the kinematic simulation demonstrates the soundness of the trajectory planning and the feasibility of the manufacturing instructions.

  16. The Design of a Novel Prismatic Drive for a Three-DOF Parallel-Kinematics Machine

    CERN Document Server

    Renotte, Jérome; Angeles, Jorge

    2004-01-01

    The design of a novel prismatic drive is reported in this paper. This transmission is based on Slide-O-Cam, a cam mechanism with multiple rollers mounted on a common translating follower. The design of Slide-O-Cam was reported elsewhere. This drive thus provides pure-rolling motion, thereby reducing the friction of rack-and-pinions and linear drives. Such properties can be used to design new transmissions for parallel-kinematics machines. In this paper, this transmission is optimized to replace ball-screws in Orthoglide, a three-DOF parallel robot optimized for machining applications.

  17. The Design of a Novel Prismatic Drive for a Three-DOF Parallel-Kinematics Machine

    CERN Document Server

    Chablat, Damien

    2011-01-01

    The design of a novel prismatic drive is reported in this paper. This transmission is based on Slide-o-Cam, a cam mechanism with multiple rollers mounted on a common translating follower. The design of Slide-o-Cam was reported elsewhere. This drive thus provides pure-rolling motion, thereby reducing the friction of rack-and-pinions and linear drives. Such properties can be used to design new transmissions for parallel-kinematics machines. In this paper, this transmission is intended to replace the ball-screws in Orthoglide, a three-dof parallel robot intended for machining applications.

  18. A comparison study of two Tricept units for reconfigurable parallel kinematic machines

    Institute of Scientific and Technical Information of China (English)

    Shi Junshan; Tang Xiaoqiang; Lin Chunshen; Wang Liping

    2005-01-01

    This paper presents a comparison study of workspace and dexterity of two Tricept units for Reconfigurable Parallel Kinematic Machines (RPKMs). The modular leg of RPKMs is designed and the RPKMs can be built by changing the setting of modules. A compositive kinematic model is developed accordingly. The inverse kinematics and Jacobian of these two Tricept units are analyzed. Considering workspace volume and dexterity, the effects of geometric size of some modules on the two Tricept units are discussed. In the end, comparison results of these two Tricept units are given. The comparison of two kinds of Parallel Kinematic Machines (PKMs) can be of help in the design and configuration planning of the RPKMs.

  19. Adaptation and optimization of basic operations for an unstructured mesh CFD algorithm for computation on massively parallel accelerators

    Science.gov (United States)

    Bogdanov, P. B.; Gorobets, A. V.; Sukov, S. A.

    2013-08-01

    The design of efficient algorithms for large-scale gas dynamics computations with hybrid (heterogeneous) computing systems whose high performance relies on massively parallel accelerators is addressed. A high-order accurate finite volume algorithm with polynomial reconstruction on unstructured hybrid meshes is used to compute compressible gas flows in domains of complex geometry. The basic operations of the algorithm are implemented in detail for massively parallel accelerators, including AMD and NVIDIA graphics processing units (GPUs). Major optimization approaches and a computation transfer technique are covered. The underlying programming tool is the Open Computing Language (OpenCL) standard, which performs on accelerators of various architectures, both existing and emerging.

  20. Preemptive Semi-online Algorithms for Parallel Machine Scheduling with Known Total Size

    Institute of Scientific and Technical Information of China (English)

    Yong HE; Hao ZHOU; Yi Wei JIANG

    2006-01-01

    This paper investigates preemptive semi-online scheduling problems on m identical parallel machines, where the total size of all jobs is known in advance. The goal is to minimize the maximum machine completion time or to maximize the minimum machine completion time. For the first objective, we present an optimal semi-online algorithm with competitive ratio 1. For the second objective, we show that the competitive ratio of any semi-online algorithm is at least (2m-3)/(m-1) for any m > 2 and present optimal semi-online algorithms for m = 2, 3.

  1. Complete data preparation flow for Massively Parallel E-Beam lithography on 28nm node full-field design

    Science.gov (United States)

    Fay, Aurélien; Browning, Clyde; Brandt, Pieter; Chartoire, Jacky; Bérard-Bergery, Sébastien; Hazart, Jérôme; Chagoya, Alexandre; Postnikov, Sergei; Saib, Mohamed; Lattard, Ludovic; Schavione, Patrick

    2016-03-01

    Massively parallel mask-less electron beam lithography (MP-EBL) offers large intrinsic flexibility at a low cost of ownership compared with conventional optical lithography tools. This attractive direct-write technique needs a dedicated data preparation flow to correct both electronic and resist processes. Moreover, data preparation has to be completed quickly enough to preserve the flexibility advantage of MP-EBL. While MP-EBL tools have entered an advanced stage of development, this paper focuses on the data preparation side of the work, specifically for the MAPPER Lithography FLX-1200 tool [1]-[4], using the ASELTA Nanographics Inscale software. The complete flow, as well as the methodology used to achieve full-field layout data preparation within an acceptable cycle time, is presented. The layout used for the data preparation evaluation was a 28 nm technology node Metal1 chip with a field size of 26×33 mm², compatible with typical stepper/scanner field sizes and wafer stepping plans. Proximity Effect Correction (PEC) was applied to the entire field, which was then exported as a single file in MAPPER Lithography's machine format, containing fractured shapes and dose assignments. The Soft Edge beam-to-beam stitching method was employed in the specific overlap regions defined by the machine format as well. In addition to PEC, verification of the correction was included as part of the overall data preparation cycle time. This verification step was executed on the machine file format to ensure pattern fidelity and accuracy as late in the flow as possible. Verification over the full chip, involving billions of evaluation points, is performed both at nominal conditions and at process-window corners in order to ensure proper exposure and process latitude. The complete MP-EBL data preparation flow was demonstrated for a 28 nm node Metal1 layout in 37 hours. The final verification step shows that the Edge Placement Error (EPE) is kept below 2.25 nm.

  2. Automation of Molecular-Based Analyses: A Primer on Massively Parallel Sequencing

    Science.gov (United States)

    Nguyen, Lan; Burnett, Leslie

    2014-01-01

    Recent advances in genetics have been enabled by new genetic sequencing techniques called massively parallel sequencing (MPS) or next-generation sequencing. Through the ability to sequence hundreds of thousands to millions of DNA fragments in parallel, the cost and time required for sequencing have dramatically decreased. There are a number of different MPS platforms currently available and being used in Australia. Although they differ in the underlying technology involved, their overall processes are very similar: DNA fragmentation, adaptor ligation, immobilisation, amplification, sequencing reaction and data analysis. MPS is being used in research, translational and, increasingly, clinical settings. Common applications include sequencing of whole genomes, whole exomes or targeted genes for disease-causing gene discovery, genetic diagnosis and targeted cancer therapy. Even though the revolution that is occurring with MPS is exciting due to its increasing use, improving and emerging technologies and new applications, significant challenges still exist. Particularly challenging issues are the bioinformatics required for data analysis, interpretation of results and the ethical dilemma of ‘incidental findings’. PMID:25336762

  3. GPAW - massively parallel electronic structure calculations with Python-based software.

    Energy Technology Data Exchange (ETDEWEB)

    Enkovaara, J.; Romero, N.; Shende, S.; Mortensen, J. (LCF)

    2011-01-01

    Electronic structure calculations are a widely used tool in materials science and a large consumer of supercomputing resources. Traditionally, the software packages for these kinds of simulations have been implemented in compiled languages, where Fortran in its different versions has been the most popular choice. While dynamic, interpreted languages such as Python can increase programmer productivity, they cannot compete directly with the raw performance of compiled languages. However, by using an interpreted language together with a compiled language, it is possible to have most of the productivity-enhancing features together with good numerical performance. We have used this approach in implementing the electronic structure simulation software GPAW, using the combination of the Python and C programming languages. While the chosen approach works well on standard workstations and in Unix environments, massively parallel supercomputing systems can present challenges in porting, debugging and profiling the software. In this paper we describe some details of the implementation and discuss the advantages and challenges of the combined Python/C approach. We show that despite the challenges it is possible to obtain good numerical performance and good parallel scalability with Python-based software.
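
    As a loose illustration of the interpreted-plus-compiled pattern described here (and not GPAW's actual code), the sketch below keeps the orchestration logic in Python while delegating the numerical kernels to compiled BLAS routines through NumPy; the power-iteration example and matrix sizes are invented for the demonstration.

```python
import numpy as np

def power_iteration(a, iters=200, tol=1e-10):
    """Dominant eigenvalue of a symmetric matrix.

    The loop logic, convergence test, and bookkeeping live in Python;
    the O(n^2) work per step (matrix-vector products) runs in compiled code.
    """
    v = np.random.default_rng(0).standard_normal(a.shape[0])
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        w = a @ v                      # compiled kernel does the heavy lifting
        lam_new = v @ w
        v = w / np.linalg.norm(w)
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam, v

rng = np.random.default_rng(1)
m = rng.standard_normal((1000, 1000))
a = (m + m.T) / 2                      # symmetrize
lam, _ = power_iteration(a)
print(f"dominant eigenvalue ~ {lam:.6f}")
```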

  4. A Massively Parallel Computational Method of Reading Index Files for SOAPsnv.

    Science.gov (United States)

    Zhu, Xiaoqian; Peng, Shaoliang; Liu, Shaojie; Cui, Yingbo; Gu, Xiang; Gao, Ming; Fang, Lin; Fang, Xiaodong

    2015-12-01

    SOAPsnv is software for identifying single nucleotide variation in cancer genes. However, its performance has yet to match the massive amount of data to be processed. Experiments reveal that the main performance bottleneck of SOAPsnv is its pileup algorithm, whose I/O process reads input files in a time-consuming and inefficient way. Moreover, the scalability of the pileup algorithm is poor. We therefore designed a new algorithm, named BamPileup, aimed at improving sequential read performance, and implemented a parallel read mode based on an index. Using this method, each thread can read data starting directly from a specific position. Experiments on the Tianhe-2 supercomputer show that, when reading data in a multi-threaded parallel I/O way, the processing time of the algorithm is reduced to 3.9 s and the application achieves a speedup of up to 100×. The scalability of the new algorithm is also satisfactory.
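
    The key idea, threads seeking directly to indexed offsets instead of streaming a file serially, can be sketched in a few lines of Python; the byte-range "index" and worker count here are hypothetical, and chunks of a real BAM file would have to be aligned to record boundaries.

```python
import os
from multiprocessing import Pool

def read_chunk(task):
    """Read one indexed byte range; each worker seeks straight to its offset."""
    path, offset, length = task
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

def parallel_read(path, n_workers=4):
    # A real index would store record-aligned offsets; here we simply
    # split the file into equal byte ranges for demonstration.
    size = os.path.getsize(path)
    step = size // n_workers
    tasks = [(path, i * step, step if i < n_workers - 1 else size - i * step)
             for i in range(n_workers)]
    with Pool(n_workers) as pool:
        return pool.map(read_chunk, tasks)

if __name__ == "__main__":
    chunks = parallel_read(__file__)   # demo: read this script in parallel
    assert b"".join(chunks) == open(__file__, "rb").read()
```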

  5. Hybrid Numerical Solvers for Massively Parallel Eigenvalue Computation and Their Benchmark with Electronic Structure Calculations

    CERN Document Server

    Imachi, Hiroto

    2015-01-01

    Optimally hybrid numerical solvers were constructed for the massively parallel generalized eigenvalue problem (GEP). A strong scaling benchmark was carried out on the K computer and other supercomputers for electronic structure calculation problems with matrix sizes of M = 10^4-10^6 on up to 10^5 cores. The GEP procedure is decomposed into two subprocedures: the reducer to a standard eigenvalue problem (SEP) and the solver of the SEP. A hybrid solver is constructed by choosing a routine for each subprocedure from the three parallel solver libraries ScaLAPACK, ELPA and EigenExa. The hybrid solvers with the two newer libraries, ELPA and EigenExa, give better benchmark results than the conventional ScaLAPACK library. Detailed analysis of the results implies that the reducer can become a bottleneck on next-generation (exa-scale) supercomputers, which indicates a direction for future research. The code was developed as a middleware and a mini-application and will appear online.
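
    To make the reducer/solver decomposition concrete, here is a small serial NumPy/SciPy sketch of the same two-stage procedure (Cholesky-based reduction of A v = λ B v to a standard problem, then a symmetric eigensolve); the matrix sizes are toy values and none of the parallel libraries named in the abstract are involved.

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

rng = np.random.default_rng(0)
n = 200
a = rng.standard_normal((n, n)); a = (a + a.T) / 2             # symmetric A
b = rng.standard_normal((n, n)); b = b @ b.T + n * np.eye(n)   # SPD B

# Reducer: with B = L L^T, the GEP A v = lambda B v becomes the SEP
# C z = lambda z, where C = L^{-1} A L^{-T} and v = L^{-T} z.
L = cholesky(b, lower=True)
y = solve_triangular(L, a, lower=True)        # y = L^{-1} A
c = solve_triangular(L, y.T, lower=True).T    # c = L^{-1} A L^{-T}

# SEP solver (the role played by ScaLAPACK/ELPA/EigenExa in the paper).
w, z = eigh(c)

# Back-transform the eigenvectors and check the generalized residual.
v = solve_triangular(L.T, z, lower=False)
print(np.max(np.abs(a @ v - (b @ v) * w)))    # small, ~1e-12
```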

  6. Massive Exploration of Perturbed Conditions of the Blood Coagulation Cascade through GPU Parallelization

    Directory of Open Access Journals (Sweden)

    Paolo Cazzaniga

    2014-01-01

    The use of high-performance computing solutions in Bioinformatics, Systems Biology, and Computational Biology is motivated by the need to perform large numbers of in silico analyses to study the behavior of biological systems under different conditions, which requires computing power that usually exceeds the capability of standard desktop computers. In this work we present coagSODA, a CUDA-powered computational tool that was purposely developed for the analysis of a large mechanistic model of the blood coagulation cascade (BCC), defined according to both mass-action kinetics and Hill functions. coagSODA allows the execution of parallel simulations of the dynamics of the BCC by automatically deriving the system of ordinary differential equations and then exploiting the numerical integration algorithm LSODA. We present the biological results achieved with a massive exploration of perturbed conditions of the BCC, carried out with one-dimensional and bi-dimensional parameter sweep analysis, and show that GPU-accelerated parallel simulations of this model can increase the computational performance up to a 181× speedup compared to the corresponding sequential simulations.

  7. GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations

    Science.gov (United States)

    Nguyen, Trung Dac

    2017-03-01

    The Tersoff potential is one of the empirical many-body potentials that has been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require many more arithmetic operations and introduce data dependencies. In this contribution, we have implemented a GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits better scalability and offers a speedup of 2.2X when run on 1000 compute nodes of the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and the hardware configuration. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results, and its support for both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.
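
    As a sketch of how such an accelerated pair style is typically invoked (assuming a LAMMPS build with the GPU package and its Python module, plus the Si.tersoff potential file that ships with LAMMPS), one might drive a small silicon run like this; the system size and run length are illustrative.

```python
from lammps import lammps

# "-sf gpu" applies the GPU suffix to styles (tersoff -> tersoff/gpu);
# "-pk gpu 1" requests one GPU per MPI rank.
lmp = lammps(cmdargs=["-sf", "gpu", "-pk", "gpu", "1"])

cmds = """
units metal
boundary p p p
lattice diamond 5.431
region box block 0 10 0 10 0 10
create_box 1 box
create_atoms 1 box
mass 1 28.0855
pair_style tersoff
pair_coeff * * Si.tersoff Si
velocity all create 300.0 87287
fix 1 all nvt temp 300.0 300.0 0.1
timestep 0.001
run 1000
"""
for line in cmds.strip().splitlines():
    lmp.command(line)
```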

  8. Massive exploration of perturbed conditions of the blood coagulation cascade through GPU parallelization.

    Science.gov (United States)

    Cazzaniga, Paolo; Nobile, Marco S; Besozzi, Daniela; Bellini, Matteo; Mauri, Giancarlo

    2014-01-01

    The introduction of general-purpose Graphics Processing Units (GPUs) is boosting scientific applications in Bioinformatics, Systems Biology, and Computational Biology. In these fields, the use of high-performance computing solutions is motivated by the need to perform large numbers of in silico analyses to study the behavior of biological systems under different conditions, which requires computing power that usually exceeds the capability of standard desktop computers. In this work we present coagSODA, a CUDA-powered computational tool that was purposely developed for the analysis of a large mechanistic model of the blood coagulation cascade (BCC), defined according to both mass-action kinetics and Hill functions. coagSODA allows the execution of parallel simulations of the dynamics of the BCC by automatically deriving the system of ordinary differential equations and then exploiting the numerical integration algorithm LSODA. We present the biological results achieved with a massive exploration of perturbed conditions of the BCC, carried out with one-dimensional and bi-dimensional parameter sweep analysis, and show that GPU-accelerated parallel simulations of this model can increase the computational performance up to a 181× speedup compared to the corresponding sequential simulations.
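
    coagSODA itself is CUDA code, but the sweep-over-perturbed-conditions idea can be imitated on a CPU with SciPy, whose odeint integrator wraps the same LSODA algorithm; the two-species model and parameter grid below are generic stand-ins for the coagulation cascade, chosen only to show the structure of a one-dimensional parameter sweep.

```python
import numpy as np
from multiprocessing import Pool
from scipy.integrate import odeint   # odeint wraps the LSODA integrator

def model(y, t, k1, k2):
    """Toy activation/decay system standing in for the BCC ODEs."""
    a, b = y
    return [-k1 * a, k1 * a - k2 * b]

def simulate(params):
    k1, k2 = params
    t = np.linspace(0.0, 50.0, 501)
    sol = odeint(model, [1.0, 0.0], t, args=(k1, k2))
    return params, sol[:, 1].max()   # e.g., peak of the intermediate species

if __name__ == "__main__":
    # One-dimensional sweep over k1 with k2 fixed (a PSA-1D in the paper's terms).
    grid = [(k1, 0.1) for k1 in np.logspace(-3, 1, 64)]
    with Pool() as pool:
        for (k1, k2), peak in pool.map(simulate, grid):
            print(f"k1={k1:9.4f}  peak[B]={peak:.4f}")
```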

  9. Massively parallel sequencing for early molecular diagnosis in Leber congenital amaurosis.

    Science.gov (United States)

    Coppieters, Frauke; De Wilde, Bram; Lefever, Steve; De Meester, Ellen; De Rocker, Nina; Van Cauwenbergh, Caroline; Pattyn, Filip; Meire, Françoise; Leroy, Bart P; Hellemans, Jan; Vandesompele, Jo; De Baere, Elfride

    2012-06-01

    Leber congenital amaurosis (LCA) is a rare congenital retinal dystrophy associated with 16 genes. Recent breakthroughs in LCA gene therapy offer the first prospect of treating inherited blindness, which requires an unequivocal and early molecular diagnosis. Present genetic tests do not meet this need because of the tremendous genetic heterogeneity of the disease, but massively parallel sequencing (MPS) strategies may provide a solution. Here, we developed a comprehensive molecular test for LCA based on targeted MPS of all exons of the 16 known LCA genes. We designed a unique and flexible workflow for targeted resequencing of all 236 exons from the 16 LCA genes, based on quantitative PCR (qPCR) amplicon ligation, shearing, and parallel sequencing of multiple patients on a single lane of a short-read sequencer. Twenty-two prescreened LCA patients were included, five of whom had a known molecular cause. Validation of 107 variations was performed as a proof of concept. In addition, the causal genetic defect was identified in 3, and a single heterozygous mutation in 5, of the 17 patients without previously identified mutations. We propose a novel targeted MPS-based approach that is suitable for accurate, fast, and cost-effective early molecular testing in LCA, and easily applicable to other genetic disorders.

  10. Massive parallel IGHV gene sequencing reveals a germinal center pathway in origins of human multiple myeloma.

    Science.gov (United States)

    Cowan, Graeme; Weston-Bell, Nicola J; Bryant, Dean; Seckinger, Anja; Hose, Dirk; Zojer, Niklas; Sahota, Surinder S

    2015-05-30

    Human multiple myeloma (MM) is characterized by accumulation of malignant terminally differentiated plasma cells (PCs) in the bone marrow (BM), raising the question when during maturation neoplastic transformation begins. Immunoglobulin IGHV genes carry imprints of clonal tumor history, delineating somatic hypermutation (SHM) events that generally occur in the germinal center (GC). Here, we examine MM-derived IGHV genes using massive parallel deep sequencing, comparing them with profiles in normal BM PCs. In 4/4 presentation IgG MM, monoclonal tumor-derived IGHV sequences revealed significant evidence for intraclonal variation (ICV) in mutation patterns. IGHV sequences of 2/2 normal PC IgG populations revealed dominant oligoclonal expansions, each expansion also displaying mutational ICV. Clonal expansions in MM and in normal BM PCs reveal common IGHV features. In such MM, the data fit a model of tumor origins in which neoplastic transformation is initiated in a GC B-cell committed to terminal differentiation but still targeted by on-going SHM. Strikingly, the data parallel IGHV clonal sequences in some monoclonal gammopathy of undetermined significance (MGUS) known to display on-going SHM imprints. Since MGUS generally precedes MM, these data suggest origins of MGUS and MM with IGHV gene mutational ICV from the same GC B-cell, arising via a distinctive pathway.

  11. Massively parallel recording of unit and local field potentials with silicon-based electrodes.

    Science.gov (United States)

    Csicsvari, Jozsef; Henze, Darrell A; Jamieson, Brian; Harris, Kenneth D; Sirota, Anton; Barthó, Péter; Wise, Kensall D; Buzsáki, György

    2003-08-01

    Parallel recording of neuronal activity in the behaving animal is a prerequisite for our understanding of neuronal representation and storage of information. Here we describe the development of micro-machined silicon microelectrode arrays for unit and local field recordings. The two-dimensional probes with 96 or 64 recording sites provided high-density recording of unit and field activity with minimal tissue displacement or damage. The on-chip active circuit eliminated movement and other artifacts and greatly reduced the weight of the headgear. The precise geometry of the recording tips allowed for the estimation of the spatial location of the recorded neurons and for high-resolution estimation of extracellular current source density. Action potentials could be simultaneously recorded from the soma and dendrites of the same neurons. Silicon technology is a promising approach for high-density, high-resolution sampling of neuronal activity in both basic research and prosthetic devices.

  12. Implementing Flexible and Scalable Particle-in-Cell Methods for Massively Parallel Computations

    Science.gov (United States)

    Gassmoeller, R.; Bangerth, W.; Puckett, E. G.; Thieulot, C.; Heien, E. M.

    2016-12-01

    Particle-in-cell methods have a long history in the modeling of mantle convection, lithospheric deformation and crustal dynamics. They are primarily used to track material information, the strain a material has undergone, the pressure-temperature history of a certain material, or the amount of volatiles or partial melt present in a region. However, their efficient parallel implementation - in particular combined with adaptive meshes - is complicated due to the complex communication patterns and frequent reassignment of particles to cells. Consequently, many scientific software packages accomplish this efficiency by designing particle methods for a single purpose, like the advection of scalar properties that do not evolve over time (e.g., chemical heterogeneities). Design choices for particle advection, data storage, and parallel communication are then optimized for this single purpose, making the code rigid to changing requirements. Here, we present algorithms for a flexible, scalable and efficient particle-in-cell method for massively parallel finite-element codes with adaptively changing meshes. Using a modular plugin structure, we allow maximum flexibility in the generation of particles, the carried tracer properties, the advection and output algorithms, and the projection of properties to the finite-element mesh. We discuss the complexity of these algorithms and present scaling tests ranging up to tens of thousands of cores and tens of billions of particles. We also discuss load-balancing strategies such as balanced repartitioning for particles in adaptive meshes, quantify sources of error for the advection of particles, show how a proposed velocity correction can address the divergence of the velocity within a cell, and show how higher-order finite elements can reduce the need for such a correction. Finally, we present whole mantle convection models as application cases and compare our implementation to a modern advection-field approach.
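
    A bare-bones illustration of the particle-in-cell bookkeeping discussed here (advecting particles through a mesh and projecting a carried property back onto cells) might look like the following NumPy sketch; the 1-D periodic domain, prescribed velocity field, nearest-cell deposition, and all sizes are invented for the demonstration.

```python
import numpy as np

n_cells, n_parts, dt = 64, 10_000, 0.01
edges = np.linspace(0.0, 1.0, n_cells + 1)

rng = np.random.default_rng(0)
x = rng.random(n_parts)                  # particle positions
c = (x > 0.5).astype(float)              # carried property, e.g. composition

def velocity(x):
    return 0.2 + 0.1 * np.sin(2 * np.pi * x)   # prescribed flow field

for _ in range(100):
    x = (x + dt * velocity(x)) % 1.0           # advect on a periodic domain

# Project the particle property onto the mesh: average per cell.
cell = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_cells - 1)
total, count = np.zeros(n_cells), np.zeros(n_cells)
np.add.at(total, cell, c)
np.add.at(count, cell, 1.0)
field = np.divide(total, count, out=np.zeros(n_cells), where=count > 0)
print(field.round(2))
```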

  13. Parallel computing works!

    CERN Document Server

    Fox, Geoffrey C; Messina, Guiseppe C

    2014-01-01

    A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop

  14. List scheduling in a parallel machine environment with precedence constraints and setup times

    NARCIS (Netherlands)

    Hurink, J.L.; Knust, S.

    2000-01-01

    We present complexity results which influence the strength of list scheduling in a parallel machine environment where, additionally, precedence constraints and sequence-dependent setup times are given and the makespan has to be minimized. We show that, contrary to various other scheduling problems

  15. List scheduling in a parallel machine environment with precedence constraints and setup times

    NARCIS (Netherlands)

    Hurink, Johann; Knust, Sigrid

    2001-01-01

    We present complexity results which influence the strength of list scheduling in a parallel machine environment where, additionally, precedence constraints and sequence-dependent setup times are given and the makespan has to be minimized. We show that, contrary to various other scheduling problems

  16. Convex quadratic programming relaxations for parallel machine scheduling with controllable processing times subject to release times

    Institute of Scientific and Technical Information of China (English)

    ZHANG Feng; CHEN Feng; TANG Guochun

    2004-01-01

    Scheduling unrelated parallel machines with controllable processing times subject to release times is investigated. Based on a convex quadratic programming relaxation and a randomized rounding strategy, a 2-approximation algorithm is obtained for a special case with the all-or-none property, and a 3-approximation algorithm is then presented for the general problem.

  17. An Ant Optimization Model for Unrelated Parallel Machine Scheduling with Energy Consumption and Total Tardiness

    Directory of Open Access Journals (Sweden)

    Peng Liang

    2015-01-01

    Full Text Available This research considers an unrelated parallel machine scheduling problem with energy consumption and total tardiness. The problem is compounded by two challenges: differences in energy consumption among unrelated parallel machines, and the interaction between job assignments and machine state operations. We first establish a mathematical model for this problem. Then an ant colony optimization algorithm based on the ATC heuristic rule (ATC-ACO) is presented. Furthermore, optimal parameters of the proposed algorithm are determined via Taguchi methods for the generated test data. Finally, comparative experiments indicate that the proposed ATC-ACO algorithm performs better at minimizing energy consumption as well as total tardiness, and that the modified ATC heuristic rule is more effective at reducing energy consumption.
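
    For reference, the ATC (Apparent Tardiness Cost) rule on which the ants' heuristic information is built ranks waiting jobs by a slack-discounted weighted shortest-processing-time index; a minimal single-machine version, with an invented job set and the usual scaling parameter k, is sketched below.

```python
import math

def atc_index(job, t, k, p_bar):
    """ATC priority of a waiting job at time t.

    job = (weight w, processing time p, due date d); larger = schedule sooner.
    """
    w, p, d = job
    slack = max(d - p - t, 0.0)
    return (w / p) * math.exp(-slack / (k * p_bar))

jobs = [(1.0, 4.0, 10.0), (2.0, 3.0, 8.0), (1.5, 5.0, 20.0)]   # invented
t, k = 0.0, 2.0
p_bar = sum(p for _, p, _ in jobs) / len(jobs)

schedule = []
while jobs:
    best = max(jobs, key=lambda j: atc_index(j, t, k, p_bar))
    jobs.remove(best)
    schedule.append(best)
    t += best[1]                    # machine is busy until the job completes
print(schedule)
```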

  18. Ant System Based Optimization Algorithm and Its Applications in Identical Parallel Machine Scheduling

    Institute of Scientific and Technical Information of China (English)

    陈义保; 姚建初; 钟毅芳

    2002-01-01

    Identical parallel machine scheduling for minimizing the makespan is a very important production scheduling problem. When its scale is large, many difficulties arise in the course of solving it. Ant system based optimization algorithms (ASBOA) have shown great advantages in solving combinatorial optimization problems in view of their high efficiency and suitability for practical applications. In this paper, an ASBOA for minimizing the makespan in identical parallel machine scheduling is presented. Two numerical examples of different scales demonstrate that the proposed ASBOA is efficient and fit for large-scale identical parallel machine scheduling for minimizing the makespan; the quality of its solutions has advantages over a heuristic procedure and simulated annealing, as well as a genetic algorithm.
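
    The heuristic procedure such studies typically benchmark against is the classic LPT (longest processing time) rule; a compact version for identical machines, with invented job data, is shown below.

```python
import heapq

def lpt_makespan(proc_times, m):
    """Assign jobs in decreasing length to the currently least-loaded machine."""
    loads = [0.0] * m
    heapq.heapify(loads)                 # min-heap of machine loads
    for p in sorted(proc_times, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + p)
    return max(loads)

jobs = [7, 7, 6, 6, 5, 5, 4, 4, 4]       # invented processing times
print(lpt_makespan(jobs, m=3))           # LPT is a (4/3 - 1/(3m))-approximation
```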

  19. Convergence analysis of a class of massively parallel direction splitting algorithms for the Navier-Stokes equations in simple domains

    KAUST Repository

    Guermond, Jean-Luc

    2012-01-01

    We provide a convergence analysis for a new fractional timestepping technique for the incompressible Navier-Stokes equations based on direction splitting. This new technique is of linear complexity, unconditionally stable and convergent, and suitable for massive parallelization. © 2012 American Mathematical Society.
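
    The essence of direction splitting is to replace one expensive multi-dimensional solve per time step with a sequence of cheap one-dimensional solves. Schematically, for a diffusion-type operator A = A_1 + A_2 + A_3 split along the coordinate directions, a Douglas-type factorized step reads (a generic sketch, not necessarily the exact scheme analyzed in the paper):

```latex
\left(I + \tfrac{\Delta t}{2} A_1\right)
\left(I + \tfrac{\Delta t}{2} A_2\right)
\left(I + \tfrac{\Delta t}{2} A_3\right)
\frac{u^{n+1} - u^{n}}{\Delta t} = f^{n+\frac{1}{2}} - A u^{n},
```

    where each factor requires only tridiagonal solves along one grid direction, which is what makes the approach suitable for massive parallelization in the remaining directions.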

  20. Revealing the Physics of Galactic Winds Through Massively-Parallel Hydrodynamics Simulations

    Science.gov (United States)

    Schneider, Evan Elizabeth

    This thesis documents the hydrodynamics code Cholla and a numerical study of multiphase galactic winds. Cholla is a massively-parallel, GPU-based code designed for astrophysical simulations that is freely available to the astrophysics community. A static-mesh Eulerian code, Cholla is ideally suited to carrying out massive simulations (> 2048^3 cells) that require very high resolution. The code incorporates state-of-the-art hydrodynamics algorithms including third-order spatial reconstruction, exact and linearized Riemann solvers, and unsplit integration algorithms that account for transverse fluxes on multidimensional grids. Operator-split radiative cooling and a dual-energy formalism for high Mach number flows are also included. An extensive test suite demonstrates Cholla's superior ability to model shocks and discontinuities, while the GPU-native design makes the code extremely computationally efficient: speeds of 5-10 million cell updates per GPU-second are typical on current hardware for 3D simulations with all of the aforementioned physics. The latter half of this work comprises a comprehensive study of the mixing between a hot, supernova-driven wind and cooler clouds representative of those observed in multiphase galactic winds. Both adiabatic and radiatively-cooling clouds are investigated. The analytic theory of cloud-crushing is applied to the problem, and adiabatic turbulent clouds are found to be mixed with the hot wind on timescales similar to the classic spherical case (4-5 t_cc) with an appropriate rescaling of the cloud-crushing time. Radiatively cooling clouds survive considerably longer, and the differences in evolution between turbulent and spherical clouds cannot be reconciled with a simple rescaling. The rapid incorporation of low-density material into the hot wind implies efficient mass-loading of the hot phases of galactic winds. At the same time, the extreme compression of high-density cloud material leads to long-lived but slow-moving clumps.

  1. Automatic Mapping Of Large Signal Processing Systems To A Parallel Machine

    Science.gov (United States)

    Printz, Harry; Kung, H. T.; Mummert, Todd; Scherer, Paul M.

    1989-12-01

    Since the spring of 1988, Carnegie Mellon University and the Naval Air Development Center have been working together to implement several large signal processing systems on the Warp parallel computer. In the course of this work, we have developed a prototype of a software tool that can automatically and efficiently map signal processing systems to distributed-memory parallel machines, such as Warp. We have used this tool to produce Warp implementations of small test systems. The automatically generated programs compare favorably with hand-crafted code. We believe this tool will be a significant aid in the creation of high speed signal processing systems. We assume that signal processing systems have the following characteristics: they can be described by directed graphs of computational tasks, and these graphs may contain thousands of task vertices; some tasks can be parallelized in a systolic or data-partitioned manner, while others cannot be parallelized at all; the side effects of each task, if any, are limited to changes in local variables; and each task has a data-independent execution time bound, which may be expressed as a function of the way it is parallelized and the number of processors it is mapped to. In this paper we describe techniques to automatically map such systems to Warp-like parallel machines. We identify and address key issues in gracefully combining different parallel programming styles, in allocating processor, memory and communication bandwidth, and in generating and scheduling efficient parallel code. When iWarp, the VLSI version of the Warp machine, becomes available in 1990, we will extend this tool to generate efficient code for very large applications, which may require as many as 3000 iWarp processors, with an aggregate peak performance of 60 gigaflops.
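
    A toy version of the mapping problem described here (assigning a directed task graph with known execution-time bounds to a fixed set of processors) can be written as a classic list scheduler; the task graph and times below are invented, and the real tool also modeled communication costs and systolic/data-partitioned parallelization, which this sketch ignores.

```python
import heapq
from collections import deque

# Invented task DAG: task -> (execution time, successors).
tasks = {
    "src":  (1.0, ["fft1", "fft2"]),
    "fft1": (4.0, ["comb"]),
    "fft2": (4.0, ["comb"]),
    "comb": (2.0, []),
}

# Topological order via Kahn's algorithm.
indeg = {t: 0 for t in tasks}
for _, (_, succs) in tasks.items():
    for s in succs:
        indeg[s] += 1
order, ready = [], deque(t for t, d in indeg.items() if d == 0)
while ready:
    t = ready.popleft()
    order.append(t)
    for s in tasks[t][1]:
        indeg[s] -= 1
        if indeg[s] == 0:
            ready.append(s)

# List scheduling on P processors: a task starts once its inputs are done
# and a processor is free (communication cost ignored).
P = 2
procs = [(0.0, i) for i in range(P)]     # (available time, processor id)
heapq.heapify(procs)
finish = {}
for t in order:
    exec_time, _ = tasks[t]
    avail, pid = heapq.heappop(procs)
    start = max([avail] + [finish[p] for p, (_, ss) in tasks.items() if t in ss])
    finish[t] = start + exec_time
    heapq.heappush(procs, (finish[t], pid))
    print(f"{t}: proc {pid}, start {start:.1f}, finish {finish[t]:.1f}")
```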

  2. A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data.

    Science.gov (United States)

    Siretskiy, Alexey; Sundqvist, Tore; Voznesenskiy, Mikhail; Spjuth, Ola

    2015-01-01

    New high-throughput technologies, such as massively parallel sequencing, have transformed the life sciences into a data-intensive field. The most common e-infrastructure for analyzing this data consists of batch systems that are based on high-performance computing resources; however, the bioinformatics software that is built on this platform does not scale well in the general case. Recently, the Hadoop platform has emerged as an interesting option to address the challenges of increasingly large datasets with distributed storage, distributed processing, built-in data locality, fault tolerance, and an appealing programming methodology. In this work we introduce metrics and report on a quantitative comparison between Hadoop and a single node of conventional high-performance computing resources for the tasks of short read mapping and variant calling. We calculate efficiency as a function of data size and observe that the Hadoop platform is more efficient for biologically relevant data sizes in terms of computing hours for both split and un-split data files. We also quantify the advantages of the data locality provided by Hadoop for NGS problems, and show that a classical architecture with network-attached storage will not scale when computing resources increase in number. Measurements were performed using ten datasets of different sizes, up to 100 gigabases, using the pipeline implemented in Crossbow. To make a fair comparison, we implemented an improved preprocessor for Hadoop with better performance for splittable data files. For improved usability, we implemented a graphical user interface for Crossbow in a private cloud environment using the CloudGene platform. All of the code and data in this study are freely available as open source in public repositories. From our experiments we conclude that the improved Hadoop pipeline scales better than the same pipeline on high-performance computing resources; we also conclude that Hadoop is an economically viable
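
    The programming methodology in question is easiest to see in miniature: a MapReduce job is a mapper emitting key/value pairs, a shuffle grouping them by key, and a reducer folding each group. The pure-Python toy below (counting read alignments per chromosome, with invented records) mimics that flow without any actual Hadoop dependency.

```python
from collections import defaultdict

records = [("read1", "chr1"), ("read2", "chr2"), ("read3", "chr1"),
           ("read4", "chr1"), ("read5", "chr2")]      # invented alignments

def mapper(record):
    _read_id, chrom = record
    yield chrom, 1

def shuffle(pairs):
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reducer(key, values):
    return key, sum(values)

pairs = [kv for rec in records for kv in mapper(rec)]
for key, values in sorted(shuffle(pairs).items()):
    print(reducer(key, values))    # ('chr1', 3) then ('chr2', 2)
```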

  3. Wideband aperture array using RF channelizers and massively parallel digital 2D IIR filterbank

    Science.gov (United States)

    Sengupta, Arindam; Madanayake, Arjuna; Gómez-García, Roberto; Engeberg, Erik D.

    2014-05-01

    Wideband receive-mode beamforming applications in wireless location, electronically-scanned antennas for radar, RF sensing, microwave imaging and wireless communications require digital aperture arrays that offer a relatively constant far-field beam over several octaves of bandwidth. Several beamforming schemes, including the well-known true time-delay and phased array beamformers, have been realized using either finite impulse response (FIR) or fast Fourier transform (FFT) digital filter-sum based techniques. These beamforming algorithms offer the desired selectivity at the cost of high computational complexity and frequency-dependent far-field array patterns. A novel approach to receiver beamforming is the use of massively parallel 2-D infinite impulse response (IIR) fan filterbanks for the synthesis of relatively frequency independent RF beams at an order of magnitude lower multiplier complexity compared to FFT or FIR filter based conventional algorithms. The 2-D IIR filterbanks demand fast digital processing that can support several octaves of RF bandwidth, and fast analog-to-digital converters (ADCs) for RF-to-bits type direct conversion of wideband antenna element signals. Fast digital implementation platforms that can realize the high-precision recursive filter structures necessary for real-time beamforming at RF radio bandwidths are also desired. We propose a novel technique that combines a passive RF channelizer, multichannel ADC technology, and single-phase massively parallel 2-D IIR digital fan filterbanks, realized at low complexity using FPGA and/or ASIC technology. There exists native support for a larger bandwidth than the maximum clock frequency of the digital implementation technology. We also strive to achieve More-than-Moore throughput by processing a wideband RF signal having content with N-fold (B = N Fclk/2) bandwidth compared to the maximum clock frequency Fclk Hz of the digital VLSI platform under consideration. Such increase in bandwidth is

  4. Comparison of pre-analytical FFPE sample preparation methods and their impact on massively parallel sequencing in routine diagnostics.

    Directory of Open Access Journals (Sweden)

    Carina Heydt

    Full Text Available Over the last few years, massively parallel sequencing has rapidly evolved and has now transitioned into molecular pathology routine laboratories. It is an attractive platform for analysing multiple genes at the same time with very little input material. Therefore, the need for high quality DNA obtained from automated DNA extraction systems has increased, especially in laboratories dealing with formalin-fixed paraffin-embedded (FFPE) material and high sample throughput. This study evaluated five automated FFPE DNA extraction systems as well as five DNA quantification systems using the three most common techniques, UV spectrophotometry, fluorescent dye-based quantification and quantitative PCR, on 26 FFPE tissue samples. Additionally, the effects on downstream applications were analysed to find the most suitable pre-analytical methods for massively parallel sequencing in routine diagnostics. The results revealed that the Maxwell 16 from Promega (Mannheim, Germany) seems to be the superior system for DNA extraction from FFPE material. The extracts had a 1.3-24.6-fold higher DNA concentration in comparison to the other extraction systems, a higher quality, and were most suitable for downstream applications. The comparison of the five quantification methods showed inter-method variations, but all methods could be used to estimate the right amount for PCR amplification and for massively parallel sequencing. Interestingly, the best results in massively parallel sequencing were obtained with a DNA input of 15 ng determined by the NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). No difference could be detected in mutation analysis based on the results of the quantification methods. These findings emphasise that it is particularly important to choose the most reliable and constant DNA extraction system, especially when using small biopsies and low elution volumes, and that all common DNA quantification techniques can

  5. Hierarchical Image Segmentation of Remotely Sensed Data using Massively Parallel GNU-LINUX Software

    Science.gov (United States)

    Tilton, James C.

    2003-01-01

    A hierarchical set of image segmentations is a set of several image segmentations of the same image at different levels of detail, in which the segmentations at coarser levels of detail can be produced from simple merges of regions at finer levels of detail. In [1], Tilton et al. describe an approach for producing hierarchical segmentations (called HSEG) and give a progress report on exploiting these hierarchical segmentations for image information mining. The HSEG algorithm is a hybrid of region growing and constrained spectral clustering that produces a hierarchical set of image segmentations based on detected convergence points. In the main, HSEG employs the hierarchical stepwise optimization (HSWO) approach to region growing, which was described as early as 1989 by Beaulieu and Goldberg. The HSWO approach seeks to produce segmentations that are more optimized than those produced by more classic approaches to region growing (e.g., Horowitz and Pavlidis [3]). In addition, HSEG optionally interjects, between HSWO region growing iterations, merges between spatially non-adjacent regions (i.e., spectrally based merging or clustering) constrained by a threshold derived from the previous HSWO region growing iteration. While the addition of constrained spectral clustering improves the utility of the segmentation results, especially for larger images, it also significantly increases HSEG's computational requirements. To counteract this, a computationally efficient, recursive, divide-and-conquer implementation of HSEG (RHSEG) was devised, which includes special code to avoid processing artifacts caused by RHSEG's recursive subdivision of the image data. The recursive nature of RHSEG makes for a straightforward parallel implementation. This paper describes the HSEG algorithm, its recursive formulation (referred to as RHSEG), and the implementation of RHSEG using massively parallel GNU-LINUX software. Results with Landsat TM data are included comparing RHSEG with classic
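
    Hierarchical stepwise optimization is easy to demonstrate in one dimension: repeatedly merge the adjacent pair of regions whose union increases the total squared error the least, recording each merge so that coarser segmentations can be recovered from finer ones. The sketch below uses an invented signal, and the full HSEG additionally interleaves the constrained spectral clustering step.

```python
def hswo_1d(signal):
    """Greedy hierarchical merging of adjacent 1-D regions (toy HSWO)."""
    regions = [(float(v), 1) for v in signal]   # each region: (sum, count)
    hierarchy = []
    while len(regions) > 1:
        def merge_cost(i):
            # Squared-error increase of merging neighbors i and i+1 (Ward).
            (s1, n1), (s2, n2) = regions[i], regions[i + 1]
            m1, m2 = s1 / n1, s2 / n2
            return (n1 * n2) / (n1 + n2) * (m1 - m2) ** 2
        i = min(range(len(regions) - 1), key=merge_cost)
        hierarchy.append((len(regions), merge_cost(i)))  # level, merge cost
        s1, n1 = regions[i]
        s2, n2 = regions.pop(i + 1)
        regions[i] = (s1 + s2, n1 + n2)
    return hierarchy

# Cheap merges come first; expensive ones cross the real region boundaries.
print(hswo_1d([1, 1, 2, 9, 9, 8, 3, 3]))
```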

  6. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities

    Science.gov (United States)

    Jolma, Arttu; Kivioja, Teemu; Toivonen, Jarkko; Cheng, Lu; Wei, Gonghong; Enge, Martin; Taipale, Mikko; Vaquerizas, Juan M.; Yan, Jian; Sillanpää, Mikko J.; Bonke, Martin; Palin, Kimmo; Talukder, Shaheynoor; Hughes, Timothy R.; Luscombe, Nicholas M.; Ukkonen, Esko; Taipale, Jussi

    2010-01-01

    The genetic code—the binding specificity of all transfer-RNAs—defines how protein primary structure is determined by DNA sequence. DNA also dictates when and where proteins are expressed, and this information is encoded in a pattern of specific sequence motifs that are recognized by transcription factors. However, the DNA-binding specificity is only known for a small fraction of the ∼1400 human transcription factors (TFs). We describe here a high-throughput method for analyzing transcription factor binding specificity that is based on systematic evolution of ligands by exponential enrichment (SELEX) and massively parallel sequencing. The method is optimized for analysis of large numbers of TFs in parallel through the use of affinity-tagged proteins, barcoded selection oligonucleotides, and multiplexed sequencing. Data are analyzed by a new bioinformatic platform that uses the hundreds of thousands of sequencing reads obtained to control the quality of the experiments and to generate binding motifs for the TFs. The described technology allows higher throughput and identification of much longer binding profiles than current microarray-based methods. In addition, as our method is based on proteins expressed in mammalian cells, it can also be used to characterize DNA-binding preferences of full-length proteins or proteins requiring post-translational modifications. We validate the method by determining binding specificities of 14 different classes of TFs and by confirming the specificities for NFATC1 and RFX3 using ChIP-seq. Our results reveal unexpected dimeric modes of binding for several factors that were thought to preferentially bind DNA as monomers. PMID:20378718

  7. Research on postprocessing of seven-axis linkage parallel kinematics machine with complicated surfaces

    Institute of Scientific and Technical Information of China (English)

    WEI Yong-geng; SHI Yong; ZHAO Kun

    2007-01-01

    Because the workspace of a parallel kinematics machine (PKM) is restricted, a six-axis (6-DOF) PKM cannot machine a workpiece with complicated surfaces in a single setup; the workpiece must be fixtured two or more times, which lowers machining precision. A seven-axis linkage PKM is therefore implemented by fixing a turntable on the worktable of the six-axis linkage PKM. However, the problem of decomposing the turntable angle from the CL file must be carefully considered: with traditional decomposition methods, the nutation angle usually goes beyond the workspace of the machine. Therefore, based on the relation between the machine coordinate system and the workpiece coordinate system, turntable angle decomposition algorithms for the consistent coordinate system and for the non-consistent coordinate system are developed to resolve the problem mentioned above. The decomposition for the non-consistent coordinate system builds on that for the consistent coordinate system: it calculates the initial angle of the located workpiece, and the turntable angle decomposed in the machine coordinate system keeps the nutation angle within the workspace of the machine, thereby simplifying the decomposition process.

  8. DGDFT: A Massively Parallel Method for Large Scale Density Functional Theory Calculations

    CERN Document Server

    Hu, Wei; Yang, Chao

    2015-01-01

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) [J. Comput. Phys. 2012, 231, 2140] method for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field (SCF) iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. It minimizes the number of degrees of freedom required to represent the solution to the Kohn-Sham problem for a desired level of accuracy. In particular, DGDFT can reach planewave accuracy with far fewer degrees of freedom. By using the pole expansion and selected inversion (PEXSI) technique to compute the electron density, energy and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both i...

  9. Prenatal detection of aneuploidy and imbalanced chromosomal arrangements by massively parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Shan Dan

    Full Text Available Fetal chromosomal abnormalities are the most common reasons for invasive prenatal testing. Currently, G-band karyotyping and several molecular genetic methods have been established for the diagnosis of chromosomal abnormalities. Although these testing methods are highly reliable, they remain limited by restricted resolution or by covering only a limited portion of the human genome at one time. Massively parallel sequencing (MPS) technologies, which can reach single base pair resolution, allow detection of genome-wide intragenic deletions and duplications, challenging karyotyping and microarrays as the tools for prenatal diagnosis. Here we report a novel and robust MPS-based method to detect aneuploidy and imbalanced chromosomal arrangements in amniotic fluid (AF) samples. We sequenced 62 AF samples on the Illumina GAIIx platform, and with on average 0.01× whole-genome sequencing data we detected 13 samples with numerical chromosomal abnormalities by z-test. With up to 2× whole-genome sequencing data we were able to detect microdeletions/microduplications (ranging from 1.4 Mb to 37.3 Mb) in 5 samples from chorionic villus sampling (CVS) using the SeqSeq algorithm. Our work demonstrates that MPS is a robust and accurate approach to detect aneuploidy and imbalanced chromosomal arrangements in prenatal samples.
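
    The z-test used in this kind of analysis typically compares each chromosome's share of mapped reads in a test sample against the mean and standard deviation of that share in a euploid reference panel; a minimal sketch with fabricated counts follows (a real pipeline would add GC correction and a much larger reference set).

```python
import numpy as np

# Fraction of reads mapping to chr21 in euploid reference samples (fabricated).
ref_fractions = np.array([0.01305, 0.01298, 0.01310, 0.01302, 0.01295,
                          0.01300, 0.01307, 0.01299, 0.01304, 0.01301])

def chr21_z(sample_chr21_reads, sample_total_reads):
    frac = sample_chr21_reads / sample_total_reads
    return (frac - ref_fractions.mean()) / ref_fractions.std(ddof=1)

# A trisomy 21 sample carries roughly 1.5x the euploid chr21 share.
z = chr21_z(sample_chr21_reads=19_800, sample_total_reads=1_000_000)
print(f"z = {z:.1f} -> {'aneuploid' if abs(z) > 3 else 'euploid'} call")
```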

  10. Effector identification in the lettuce downy mildew Bremia lactucae by massively parallel transcriptome sequencing.

    Science.gov (United States)

    Stassen, Joost H M; Seidl, Michael F; Vergeer, Pim W J; Nijman, Isaäc J; Snel, Berend; Cuppen, Edwin; Van den Ackerveken, Guido

    2012-09-01

    Lettuce downy mildew (Bremia lactucae) is a rapidly adapting oomycete pathogen affecting commercial lettuce cultivation. Oomycetes are known to use a diverse arsenal of secreted proteins (effectors) to manipulate their hosts. Two classes of effector are known to be translocated by the host: the RXLRs and Crinklers. To gain insight into the repertoire of effectors used by B. lactucae to manipulate its host, we performed massively parallel sequencing of cDNA derived from B. lactucae spores and infected lettuce (Lactuca sativa) seedlings. From over 2.3 million 454 GS FLX reads, 59 618 contigs were assembled representing both plant and pathogen transcripts. Of these, 19 663 contigs were determined to be of B. lactucae origin as they matched pathogen genome sequences (SOLiD) that were obtained from >270 million reads of spore-derived genomic DNA. After correction of cDNA sequencing errors with SOLiD data, translation into protein models and filtering, 16 372 protein models remained, 1023 of which were predicted to be secreted. This secretome included elicitins, necrosis and ethylene-inducing peptide 1-like proteins, glucanase inhibitors and lectins, and was enriched in cysteine-rich proteins. Candidate host-translocated effectors included 78 protein models with RXLR effector features. In addition, we found indications for an unknown number of Crinkler-like sequences. Similarity clustering of secreted proteins revealed additional effector candidates. We provide a first look at the transcriptome of B. lactucae and its encoded effector arsenal.

  11. Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing

    Science.gov (United States)

    Just, Rebecca S.; Irwin, Jodi A.; Parson, Walther

    2015-01-01

    Long an important and useful tool in forensic genetic investigations, mitochondrial DNA (mtDNA) typing continues to mature. Research in the last few years has demonstrated both that data from the entire molecule will have practical benefits in forensic DNA casework, and that massively parallel sequencing (MPS) methods will make full mitochondrial genome (mtGenome) sequencing of forensic specimens feasible and cost-effective. A spate of recent studies has employed these new technologies to assess intraindividual mtDNA variation. However, in several instances, contamination and other sources of mixed mtDNA data have been erroneously identified as heteroplasmy. Well vetted mtGenome datasets based on both Sanger and MPS sequences have found authentic point heteroplasmy in approximately 25% of individuals when minor component detection thresholds are in the range of 10–20%, along with positional distribution patterns in the coding region that differ from patterns of point heteroplasmy in the well-studied control region. A few recent studies that examined very low-level heteroplasmy are concordant with these observations when the data are examined at a common level of resolution. In this review we provide an overview of considerations related to the use of MPS technologies to detect mtDNA heteroplasmy. In addition, we examine published reports on point heteroplasmy to characterize features of the data that will assist in the evaluation of future mtGenome data developed by any typing method. PMID:26009256

  12. GRay: A Massively Parallel GPU-based Code for Ray Tracing in Relativistic Spacetimes

    Science.gov (United States)

    Chan, Chi-kwan; Psaltis, Dimitrios; Özel, Feryal

    2013-11-01

    We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This graphics-processing-unit (GPU)-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single-precision floating-point arithmetic on a single GPU exceeds 300 GFLOP (or 1 ns per photon per time step). For a realistic problem, where the peak performance cannot be reached, GRay is two orders of magnitude faster than existing central-processing-unit-based ray-tracing codes. This performance enhancement allows more effective searches of large parameter spaces when comparing theoretical predictions of images, spectra, and light curves from the vicinities of compact objects to observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Making use of this algorithm, we calculate the properties of the shadows of Kerr black holes and the photon rings that surround them. We also provide accurate fitting formulae of their dependencies on black hole spin and observer inclination, which can be used to interpret upcoming observations of the black holes at the center of the Milky Way, as well as M87, with the Event Horizon Telescope.

  13. Discovering multiple transcripts of human hepatocytes using massively parallel signature sequencing (MPSS

    Directory of Open Access Journals (Sweden)

    Li Yi-Xue

    2007-07-01

    Full Text Available Abstract Background The liver is the largest human internal organ – it is composed of multiple cell types and plays a vital role in fulfilling the body's metabolic needs and maintaining homeostasis. Of these cell types, the hepatocytes, which account for three-quarters of the liver's volume, perform its main functions. To discover the molecular basis of hepatocyte function, we employed Massively Parallel Signature Sequencing (MPSS) to determine the transcriptomic profile of adult human hepatocytes obtained by laser capture microdissection (LCM). Results 10,279 UniGene clusters, representing 7,475 known genes, were detected in human hepatocytes. In addition, 1,819 unique MPSS signatures matching the antisense strand of 1,605 non-redundant UniGene clusters (such as APOC1, APOC2, APOB and APOH) were highly expressed in hepatocytes. Conclusion Apart from a large number of protein-coding genes, some of the antisense transcripts expressed in hepatocytes could play important roles in transcriptional interference via a cis-/trans-regulation mechanism. Our results provide a comprehensive transcriptomic atlas of human hepatocytes obtained using the MPSS technique, which can serve as a resource for an in-depth understanding of human liver biology and disease.

  14. [A safe and easy method for building consensus HIV sequences from 454 massively parallel sequencing data].

    Science.gov (United States)

    Fernández-Caballero Rico, Jose Ángel; Chueca Porcuna, Natalia; Álvarez Estévez, Marta; Mosquera Gutiérrez, María Del Mar; Marcos Maeso, María Ángeles; García, Federico

    2016-10-03

    To show how to generate a consensus sequence from the massively parallel sequencing data obtained in routine HIV antiretroviral resistance studies, suitable for molecular epidemiology studies. Paired Sanger (Trugene-Siemens) and next-generation sequencing (NGS) (454 GSJunior-Roche) HIV RT and protease sequences from 62 patients were studied. NGS consensus sequences were generated using Mesquite, using 10%, 15%, and 20% thresholds. Molecular Evolutionary Genetics Analysis (MEGA) was used for phylogenetic studies. At a 10% threshold, NGS-Sanger sequences from 17/62 patients were phylogenetically related, with a median bootstrap value of 88% (IQR 83.5-95.5). Association increased to 36/62 sequences, with a median bootstrap of 94% (IQR 85.5-98), using a 15% threshold. Maximum association was obtained at the 20% threshold, with 61/62 sequences associated and a median bootstrap value of 99% (IQR 98-100). A safe method is thus presented to generate consensus sequences from HIV NGS data at a 20% threshold, which should prove useful for molecular epidemiology studies. Copyright © 2016 Elsevier España, S.L.U. and Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
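
    Generating a consensus at a given threshold amounts to, at each position, keeping every base whose frequency meets the threshold and emitting the matching IUPAC code when more than one survives. The sketch below uses made-up per-position counts and a simplified two-base IUPAC table; a real workflow would start from aligned reads, as the authors did with Mesquite.

```python
IUPAC = {frozenset("A"): "A", frozenset("C"): "C",
         frozenset("G"): "G", frozenset("T"): "T",
         frozenset("AG"): "R", frozenset("CT"): "Y",
         frozenset("CG"): "S", frozenset("AT"): "W",
         frozenset("GT"): "K", frozenset("AC"): "M"}

def consensus(position_counts, threshold=0.20):
    """position_counts: one {base: read count} dict per alignment position."""
    seq = []
    for counts in position_counts:
        depth = sum(counts.values())
        kept = frozenset(b for b, n in counts.items() if n / depth >= threshold)
        seq.append(IUPAC.get(kept, "N"))   # N for 3+ base mixtures (simplified)
    return "".join(seq)

# Made-up pileup: position 3 carries a 25% minor variant, above the threshold.
pileup = [{"A": 98, "G": 2}, {"C": 100}, {"A": 75, "G": 25}, {"T": 99, "C": 1}]
print(consensus(pileup))    # -> "ACRT" at the 20% threshold
```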

  15. Climate systems modeling on massively parallel processing computers at Lawrence Livermore National Laboratory

    Energy Technology Data Exchange (ETDEWEB)

    Wehner, W.F.; Mirin, A.A.; Bolstad, J.H. [and others

    1996-09-01

    A comprehensive climate system model is under development at Lawrence Livermore National Laboratory. The basis for this model is a consistent coupling of multiple complex subsystem models, each describing a major component of the Earth's climate. Among these are general circulation models of the atmosphere and ocean, a dynamic and thermodynamic sea ice model, and models of the chemical processes occurring in the air, sea water, and near-surface land. The computational resources necessary to carry out simulations at adequate spatial resolutions for durations of climatic time scales exceed those currently available. Distributed memory massively parallel processing (MPP) computers promise to affordably scale to the computational rates required by directing large numbers of relatively inexpensive processors onto a single problem. We have developed a suite of routines designed to exploit current generation MPP architectures via domain and functional decomposition strategies. These message passing techniques have been implemented in each of the component models and in their coupling interfaces. Production runs of the atmospheric and oceanic components performed on the National Environmental Supercomputing Center (NESC) Cray T3D are described.

  16. Hybrid Parallel Bundle Adjustment for 3D Scene Reconstruction with Massive Points

    Institute of Scientific and Technical Information of China (English)

    Xin Liu; Wei Gao; Zhan-Yi Hu

    2012-01-01

    Bundle adjustment (BA) is a crucial but time-consuming step in 3D reconstruction. In this paper, we tackle a special class of BA problems in which the reconstructed 3D points are much more numerous than the camera parameters, called Massive-Points BA (MPBA) problems. This is often the case when high-resolution images are used. We present the design and implementation of a new bundle adjustment algorithm for efficiently solving MPBA problems. The use of hardware parallelism, on multi-core CPUs as well as GPUs, is explored. By careful memory-usage design, the graphics-memory limitation is effectively alleviated. Several modern acceleration strategies for bundle adjustment, such as mixed-precision arithmetic, embedded point iteration, and preconditioned conjugate gradients, are explored and compared. Using several high-resolution image datasets, we generate a variety of MPBA problems, with which the performance of five bundle adjustment algorithms is evaluated. The experimental results show that our algorithm is up to 40 times faster than classical Sparse Bundle Adjustment, while maintaining comparable precision.
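
    For orientation, bundle adjustment minimizes the total reprojection error jointly over camera parameters and 3D points; in the MPBA regime the point block of unknowns dominates, which is what techniques such as the embedded point iteration exploit. Schematically (with notation invented here):

```latex
\min_{\{C_i\},\,\{X_j\}} \; \sum_{(i,j)\in\mathcal{O}}
  \left\| \, u_{ij} - \pi\!\left(C_i, X_j\right) \right\|^2 ,
```

    where u_ij is the observed projection of point X_j in camera C_i, π is the projection function, and O indexes the observations; with points far outnumbering cameras, eliminating or iterating on the X_j block first is the natural strategy.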

  17. Massively parallel energy space exploration for uncluttered visualization of vascular structures.

    Science.gov (United States)

    Jeon, Yongkweon; Won, Joong-Ho; Yoon, Sungroh

    2013-01-01

    Images captured using computed tomography and magnetic resonance angiography are used in the examination of the abdominal aorta and its branches. The examination of all clinically relevant branches simultaneously in a single 2-D image without any misleading overlaps facilitates the diagnosis of vascular abnormalities. This problem is called uncluttered single-image visualization (USIV). We can solve the USIV problem by assigning energy-based scores to visualization candidates and then finding the candidate that optimizes the score; this approach is similar to the manner in which the protein side-chain placement problem has been solved. To obtain near-optimum images, we need to explore the energy space extensively, which is often time consuming. This paper describes a method for exploring the energy space in a massively parallel fashion using graphics processing units. According to our experiments, in which we used 30 images obtained from five patients, the proposed method can reduce the total visualization time substantially. We believe that the proposed method can make a significant contribution to the effective visualization of abdominal vascular structures and precise diagnosis of related abnormalities.

  18. Characterization of the Zoarces viviparus liver transcriptome using massively parallel pyrosequencing

    Directory of Open Access Journals (Sweden)

    Asker Noomi

    2009-07-01

    Full Text Available Abstract Background The teleost Zoarces viviparus (eelpout) lives along the coasts of Northern Europe and has long been an established model organism for marine ecology and environmental monitoring. The scarce information about this species' genome has, however, restrained the use of efficient molecular-level assays, such as gene expression microarrays. Results In the present study we present the first comprehensive characterization of the Zoarces viviparus liver transcriptome. From 400,000 reads generated by massively parallel pyrosequencing, more than 50,000 putative transcript fragments were assembled, annotated and functionally classified. The data were estimated to cover roughly 40% of the total transcriptome, and homologues for about half of the genes of Gasterosteus aculeatus (stickleback) were identified. The sequence data were consequently used to design an oligonucleotide microarray for large-scale gene expression analysis. Conclusion Our results show that one run using a Genome Sequencer FLX from 454 Life Science/Roche generates enough genomic information for adequate de novo assembly of a large number of genes in a higher vertebrate. The generated sequence data, including the validated microarray probes, are publicly available to promote genome-wide research in Zoarces viviparus.

  19. Whole genome characterization of hepatitis B virus quasispecies with massively parallel pyrosequencing.

    Science.gov (United States)

    Li, F; Zhang, D; Li, Y; Jiang, D; Luo, S; Du, N; Chen, W; Deng, L; Zeng, C

    2015-03-01

    Viral quasispecies analysis is important for basic and clinical research. This study was designed to profile genome-wide hepatitis B virus (HBV) mutations with detailed variant composition in individual patients, and especially quasispecies evolution correlating with liver disease progression. We characterized viral populations by massively parallel pyrosequencing at the whole HBV genome level in 17 patients with advanced liver disease (ALD) and 30 chronic carriers (CC). Average sequencing coverages of 2047× and 687× were achieved in the ALD and CC groups, respectively. Deep sequencing data resolved the landscape of HBV substitutions and a more complicated quasispecies composition than previously observed. The substitution frequencies in quasispecies clustered as either more than 80% or less than 20%, forming a unique U-shaped distribution pattern in both clinical groups. Furthermore, quantitative comparison of mutation frequencies at each site between the two groups yielded a spectrum of substitutions associated with liver disease progression, among which C2288A/T, C2304A, and A/G2525C/T were novel candidates. Moreover, distinct deletion patterns in the preS, X, and C regions were shown between the two groups. In conclusion, pyrosequencing of the whole HBV genome revealed a panorama of viral quasispecies composition, characteristics of substitution distribution, and mutations correlating with severe liver disease.

  20. MPRAnator: a web-based tool for the design of massively parallel reporter assay experiments.

    Science.gov (United States)

    Georgakopoulos-Soares, Ilias; Jain, Naman; Gray, Jesse M; Hemberg, Martin

    2017-01-01

    With the rapid advances in DNA synthesis and sequencing technologies and the continuing decline in the associated costs, high-throughput experiments can be performed to investigate the regulatory role of thousands of oligonucleotide sequences simultaneously. Nevertheless, designing high-throughput reporter assay experiments such as massively parallel reporter assays (MPRAs) and similar methods remains challenging. We introduce MPRAnator, a set of tools that facilitate rapid design of MPRA experiments. With MPRA Motif design, a set of variables provides fine control of how motifs are placed into sequences, thereby allowing the investigation of the rules that govern transcription factor (TF) occupancy. MPRA single-nucleotide polymorphism design can be used to systematically examine the functional effects of single or combinations of single-nucleotide polymorphisms at regulatory sequences. Finally, the Transmutation tool allows for the design of negative controls by permitting scrambling, reversing, complementing or introducing multiple random mutations in the input sequences or motifs. The MPRAnator tool set is implemented in Python, Perl and Javascript and is freely available at www.genomegeek.com and www.sanger.ac.uk/science/tools/mpranator. The source code is available on www.github.com/hemberg-lab/MPRAnator/ under the MIT license. The REST API allows programmatic access to MPRAnator using simple URLs. Contact: igs@sanger.ac.uk or mh26@sanger.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  1. Novel Y-chromosome Short Tandem Repeat Variants Detected Through the Use of Massively Parallel Sequencing

    Directory of Open Access Journals (Sweden)

    David H. Warshauer

    2015-08-01

    Massively parallel sequencing (MPS) technology is capable of determining the sizes of short tandem repeat (STR) alleles as well as their individual nucleotide sequences. Thus, single nucleotide polymorphisms (SNPs) within the repeat regions of STRs and variations in the pattern of repeat units in a given repeat motif can be used to differentiate alleles of the same length. In this study, MPS was used to sequence 28 forensically-relevant Y-chromosome STRs in a set of 41 DNA samples from the 3 major U.S. population groups (African Americans, Caucasians, and Hispanics). The resulting sequence data, which were analyzed with STRait Razor v2.0, revealed 37 unique allele sequence variants that have not been previously reported. Of these, 19 sequences were variations of documented sequences resulting from the presence of intra-repeat SNPs or alternative repeat unit patterns. Despite a limited sampling, two of the most frequently-observed variants were found only in African American samples. The remaining 18 variants represented allele sequences for which there were no published data with which to compare. These findings illustrate the great potential of MPS with regard to increasing the resolving power of STR typing and emphasize the need for sample population characterization of STR alleles.

  2. Unique archaeal assemblages in the Arctic Ocean unveiled by massively parallel tag sequencing.

    Science.gov (United States)

    Galand, Pierre E; Casamayor, Emilio O; Kirchman, David L; Potvin, Marianne; Lovejoy, Connie

    2009-07-01

    The Arctic Ocean plays a critical role in controlling nutrient budgets between the Pacific and Atlantic Ocean. Archaea are key players in the nitrogen cycle and in cycling nutrients, but their community composition has been little studied in the Arctic Ocean. Here, we characterize archaeal assemblages from surface and deep Arctic water masses using massively parallel tag sequencing of the V6 region of the 16S rRNA gene. This approach gave very high coverage of the natural communities, allowing a precise description of archaeal assemblages. This first taxonomic description of archaeal communities by tag sequencing shows that it is possible to assign an identity below the phylum level to most (95%) of the archaeal V6 tags, and that tag sequencing is a powerful tool for resolving the diversity and distribution of specific microbes in the environment. Marine group I Crenarchaeota was overall the most abundant group in the Arctic Ocean, comprising between 27% and 63% of all tags. Group III Euryarchaeota were more abundant in deep-water masses and represented the largest archaeal group in the deep Atlantic layer of the central Arctic Ocean. Coastal surface waters, in turn, harbored more group II Euryarchaeota. Moreover, the group II sequences that dominated surface waters were different from the group II sequences detected in deep waters, suggesting functional differences between closely related groups. Our results unveiled for the first time an archaeal community dominated by group III Euryarchaeota and show biogeographical traits for marine Arctic Archaea.

  3. Novel Y-chromosome Short Tandem Repeat Variants Detected Through the Use of Massively Parallel Sequencing

    Institute of Scientific and Technical Information of China (English)

    David H Warshauer; Jennifer D Churchill; Nicole Novroski; Jonathan L King; Bruce Budowle

    2015-01-01

    Massively parallel sequencing (MPS) technology is capable of determining the sizes of short tandem repeat (STR) alleles as well as their individual nucleotide sequences. Thus, single nucleotide polymorphisms (SNPs) within the repeat regions of STRs and variations in the pattern of repeat units in a given repeat motif can be used to differentiate alleles of the same length. In this study, MPS was used to sequence 28 forensically-relevant Y-chromosome STRs in a set of 41 DNA samples from the 3 major U.S. population groups (African Americans, Caucasians, and Hispanics). The resulting sequence data, which were analyzed with STRait Razor v2.0, revealed 37 unique allele sequence variants that have not been previously reported. Of these, 19 sequences were variations of documented sequences resulting from the presence of intra-repeat SNPs or alternative repeat unit patterns. Despite a limited sampling, two of the most frequently-observed variants were found only in African American samples. The remaining 18 variants represented allele sequences for which there were no published data with which to compare. These findings illustrate the great potential of MPS with regard to increasing the resolving power of STR typing and emphasize the need for sample population characterization of STR alleles.

  4. Radiation hydrodynamics using characteristics on adaptive decomposed domains for massively parallel star formation simulations

    Science.gov (United States)

    Buntemeyer, Lars; Banerjee, Robi; Peters, Thomas; Klassen, Mikhail; Pudritz, Ralph E.

    2016-02-01

    We present an algorithm for solving the radiative transfer problem on massively parallel computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magneto-hydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation simulations with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin as well as the optically thick regimes and the transition region in particular. We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse simulations resembling the early stages of protostar and disc formation.

  5. ALEGRA -- A massively parallel h-adaptive code for solid dynamics

    Energy Technology Data Exchange (ETDEWEB)

    Summers, R.M.; Wong, M.K.; Boucheron, E.A.; Weatherby, J.R. [Sandia National Labs., Albuquerque, NM (United States)

    1997-12-31

    ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Using this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.

  6. Practical tools to implement massive parallel pyrosequencing of PCR products in next generation molecular diagnostics.

    Directory of Open Access Journals (Sweden)

    Kim De Leeneer

    Despite improvements in terms of sequence quality and price per basepair, Sanger sequencing remains restricted to screening of individual disease genes. The development of massively parallel sequencing (MPS) technologies heralded an era in which molecular diagnostics for multigenic disorders becomes reality. Here, we outline different PCR amplification based strategies for the screening of a multitude of genes in a patient cohort. We performed a thorough evaluation in terms of set-up, coverage and sequencing variants on the data of 10 GS-FLX experiments (over 200 patients). Crucially, we determined the actual coverage that is required for reliable diagnostic results using MPS, and provide a tool to calculate the number of patients that can be screened in a single run. Finally, we provide an overview of factors contributing to false negative or false positive mutation calls and suggest ways to maximize sensitivity and specificity, both important in a routine setting. By describing practical strategies for screening of multigenic disorders in a multitude of samples and providing answers to questions about minimum required coverage, the number of patients that can be screened in a single run and the factors that may affect sensitivity and specificity, we hope to facilitate the implementation of MPS technology in molecular diagnostics.
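
    As a rough illustration of the kind of calculation such a tool performs, the sketch below estimates how many patients fit in one run from the run's read throughput and the required per-amplicon coverage. All parameter names and values are hypothetical, not taken from the paper.

```python
# Back-of-envelope sketch (not the authors' tool): estimate how many
# patients fit in one sequencing run at a required minimum coverage.
# All parameter values below are illustrative, not from the paper.
def patients_per_run(reads_per_run, amplicons_per_patient,
                     required_coverage, usable_fraction=0.8):
    """Reads needed per patient = amplicons x coverage; the usable
    fraction discounts filtered or off-target reads."""
    usable_reads = reads_per_run * usable_fraction
    reads_per_patient = amplicons_per_patient * required_coverage
    return int(usable_reads // reads_per_patient)

# e.g., a 1M-read run, 200 amplicons per patient, 40x minimum coverage
print(patients_per_run(1_000_000, 200, 40))  # -> 100
```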

  7. Adaptive Flow Simulation of Turbulence in Subject-Specific Abdominal Aortic Aneurysm on Massively Parallel Computers

    Science.gov (United States)

    Sahni, Onkar; Jansen, Kenneth; Shephard, Mark; Taylor, Charles

    2007-11-01

    Flow within the healthy human vascular system is typically laminar, but diseased conditions can alter the geometry sufficiently to produce transitional/turbulent flows in regions at (and immediately downstream of) the diseased section. The mean unsteadiness (pulsatile or respiratory cycle) further complicates the situation, making traditional turbulence simulation techniques (e.g., Reynolds-averaged Navier-Stokes simulations (RANS)) suspect. At the other extreme, direct numerical simulation (DNS), while fully appropriate, can incur large computational expense, particularly when the simulations must be done quickly since they are intended to affect the outcome of a medical treatment (e.g., virtual surgical planning). Producing simulations in a clinically relevant time frame requires: 1) adaptive meshing techniques that closely match the desired local mesh resolution in all three directions to the highly anisotropic physical length scales in the flow, 2) efficient solution algorithms, and 3) excellent scaling on massively parallel computers. In this presentation we demonstrate results for a subject-specific simulation of an abdominal aortic aneurysm using a stabilized finite element method on anisotropically adapted meshes consisting of O(10^8) elements over O(10^4) processors.

  8. Research on a Novel Parallel Engraving Machine and its Key Technologies

    Directory of Open Access Journals (Sweden)

    Zhang Shi-hui

    2008-11-01

    To compensate for the disadvantages of conventional engraving machines and exploit the advantages of parallel mechanisms, a novel parallel engraving machine is presented and some of its key technologies are studied in this paper. First, mechanism performance is analyzed in terms of the first- and second-order influence coefficient matrices. In this way, mechanism dimensions that are better for all the kinematic and dynamic performance indices can be determined, overcoming the earlier restriction of considering only the first-order influence coefficient matrix. This provides a theoretical basis for designing the dimensions of a novel engraving machine with better performance. In addition, a method for tool path planning and control technology for the engraving force are also studied. The proposed algorithm for tool path planning on curved surfaces can, in theory, be applied to arbitrary spatial curved surfaces, and the control technology for the engraving force, based on a fuzzy neural network (FNN), adapts well to changing environments. Research on teleoperation of the parallel engraving machine based on a B/S architecture resolves key problems such as the control mode, a sharing mechanism for multiple users, real-time control of engraving jobs and real-time transmission of video information. Simulation results further show the feasibility and validity of the proposed methods.

  9. Minimisation of total tardiness for identical parallel machine scheduling using genetic algorithm

    Indian Academy of Sciences (India)

    IMRAN ALI CHAUDHRY; ISAM A Q ELBADAWI

    2017-01-01

    In recent years research on parallel machine scheduling has received increased attention. This paper considers minimisation of total tardiness for scheduling of n jobs on a set of m parallel machines. A spreadsheet-based genetic algorithm (GA) approach is proposed for the problem. The proposed approach is a domain-independent general-purpose approach, which has been effectively used to solve this class of problem. The performance of the GA is compared with branch-and-bound and particle swarm optimisation approaches. Two sets of problems having 20 and 25 jobs, with the number of parallel machines equal to 2, 4, 6, 8 and 10, are solved with the proposed approach. Each combination of number of jobs and machines consists of 125 benchmark problems; thus a total of 2250 problems is solved. The results obtained by the proposed approach are comparable with those of the two earlier approaches. It is also demonstrated that a simple GA can produce results that are comparable with those of problem-specific approaches. The proposed approach can also be used to optimise any objective function without changing the basic GA routine.
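
    A minimal sketch of a GA of this kind is given below (not the authors' spreadsheet implementation): a chromosome is a job permutation, jobs are assigned greedily to the earliest-free machine, and total tardiness is the fitness. Operator choices and parameters are illustrative assumptions.

```python
# Minimal GA sketch for total-tardiness minimisation on identical
# parallel machines; all operators and parameters are illustrative.
import random

def tardiness(perm, proc, due, m):
    """Decode a job permutation: assign each job in order to the
    earliest-free machine and accumulate tardiness."""
    loads = [0.0] * m
    total = 0.0
    for j in perm:
        k = loads.index(min(loads))          # earliest-free machine
        loads[k] += proc[j]
        total += max(0.0, loads[k] - due[j])
    return total

def ga(proc, due, m, pop=50, gens=200, pmut=0.2):
    n = len(proc)
    popl = [random.sample(range(n), n) for _ in range(pop)]
    for _ in range(gens):
        popl.sort(key=lambda p: tardiness(p, proc, due, m))
        parents = popl[:pop // 2]            # truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)
            # order crossover: prefix of a, remaining jobs in b's order
            child = a[:cut] + [j for j in b if j not in a[:cut]]
            if random.random() < pmut:       # swap mutation
                i, k = random.sample(range(n), 2)
                child[i], child[k] = child[k], child[i]
            children.append(child)
        popl = parents + children
    best = min(popl, key=lambda p: tardiness(p, proc, due, m))
    return best, tardiness(best, proc, due, m)
```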

  10. Characterization of robotics parallel algorithms and mapping onto a reconfigurable SIMD machine

    Science.gov (United States)

    Lee, C. S. G.; Lin, C. T.

    1989-01-01

    The kinematics, dynamics, Jacobian, and their corresponding inverse computations are six essential problems in the control of robot manipulators. Efficient parallel algorithms for these computations are discussed and analyzed. Their characteristics are identified and a scheme for mapping these algorithms to a reconfigurable parallel architecture is presented. Based on the characteristics, including type of parallelism, degree of parallelism, uniformity of the operations, fundamental operations, data dependencies, and communication requirements, it is shown that most of the algorithms for robotic computations possess highly regular properties and some common structures, especially the linear recursive structure. Moreover, they are well-suited to implementation on a single-instruction-stream multiple-data-stream (SIMD) computer with a reconfigurable interconnection network. The model of a reconfigurable dual-network SIMD machine with internal direct feedback is introduced. A systematic procedure to map these computations to the proposed machine is presented. A new scheduling problem for SIMD machines is investigated and a heuristic algorithm, called neighborhood scheduling, that reorders the processing sequence of subtasks to reduce the communication time is described. Mapping results of a benchmark algorithm are illustrated and discussed.

  11. The R package "sperrorest" : Parallelized spatial error estimation and variable importance assessment for geospatial machine learning

    Science.gov (United States)

    Schratz, Patrick; Herrmann, Tobias; Brenning, Alexander

    2017-04-01

    Computational and statistical prediction methods such as the support vector machine have gained popularity in remote-sensing applications in recent years and are often compared to more traditional approaches like maximum-likelihood classification. However, the accuracy assessment of such predictive models in a spatial context needs to account for the presence of spatial autocorrelation in geospatial data by using spatial cross-validation and bootstrap strategies instead of their now more widely used non-spatial equivalent. The R package sperrorest by A. Brenning [IEEE International Geoscience and Remote Sensing Symposium, 1, 374 (2012)] provides a generic interface for performing (spatial) cross-validation of any statistical or machine-learning technique available in R. Since spatial statistical models as well as flexible machine-learning algorithms can be computationally expensive, parallel computing strategies are required to perform cross-validation efficiently. The most recent major release of sperrorest therefore comes with two new features (aside from improved documentation): The first one is the parallelized version of sperrorest(), parsperrorest(). This function features two parallel modes to greatly speed up cross-validation runs. Both parallel modes are platform independent and provide progress information. par.mode = 1 relies on the pbapply package and calls interactively (depending on the platform) parallel::mclapply() or parallel::parApply() in the background. While forking is used on Unix-Systems, Windows systems use a cluster approach for parallel execution. par.mode = 2 uses the foreach package to perform parallelization. This method uses a different way of cluster parallelization than the parallel package does. In summary, the robustness of parsperrorest() is increased with the implementation of two independent parallel modes. A new way of partitioning the data in sperrorest is provided by partition.factor.cv(). This function gives the user the

  12. Program Suite for Conceptual Designing of Parallel Mechanism-Based Robots and Machine Tools

    Directory of Open Access Journals (Sweden)

    Slobodan Tabaković

    2013-06-01

    This paper describes the categorization of criteria for the conceptual design of parallel mechanism-based robots or machine tools, resulting from workspace analysis, as well as the procedure for defining them. Furthermore, it also presents the design methodology that was implemented in a program suite for creating a space model of a robot or machine tool and optimizing the resulting solution. For verification of the criteria and the program suite, three common (conceptually different) mechanisms with a similar mechanical structure and kinematic characteristics were used.

  13. Real-time topological image smoothing on shared memory parallel machines

    Science.gov (United States)

    Mahmoudi, Ramzi; Akil, Mohamed

    2011-03-01

    Smoothing filters are the method of choice for image preprocessing and pattern recognition. We present a new concurrent method for smoothing 2D objects in the binary case. The proposed method provides parallel computation while preserving the topology by using homotopic transformations. We introduce an adapted parallelization strategy called split, distribute and merge (SDM), which allows efficient parallelization of a large class of topological operators including, mainly, smoothing, skeletonization, and watershed algorithms. To achieve a good speedup, particular attention was paid to task scheduling: the work done during the smoothing process is distributed over a variable number of threads. Tests on a 2D binary image (512×512), using a shared memory parallel machine (SMPM) with 8 CPU cores (2× Xeon E5405 running at 2 GHz), showed a speedup of 5.2, so a rate of 32 images per second is achieved.
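
    The split-distribute-merge idea itself can be sketched compactly: split the image into bands with a one-pixel halo, filter each band in its own thread, and merge. In the sketch below, a simple median (majority) filter stands in for the paper's homotopic, topology-preserving operator, which is considerably more involved.

```python
# Sketch of the split-distribute-merge (SDM) pattern only: the image
# is split into horizontal bands, each band is filtered by a worker
# thread, and the results are merged. The median filter is a stand-in
# for the paper's homotopic smoothing operator.
from concurrent.futures import ThreadPoolExecutor
import numpy as np
from scipy.ndimage import median_filter

def smooth_band(img, lo, hi, halo=1):
    # Extend the band by a halo so border pixels are filtered correctly.
    a, b = max(0, lo - halo), min(img.shape[0], hi + halo)
    out = median_filter(img[a:b], size=3)        # stand-in smoothing op
    return out[lo - a: out.shape[0] - (b - hi)]  # drop the halo rows

def sdm_smooth(img, workers=8):
    bounds = np.linspace(0, img.shape[0], workers + 1, dtype=int)
    with ThreadPoolExecutor(max_workers=workers) as ex:
        parts = ex.map(lambda i: smooth_band(img, bounds[i], bounds[i + 1]),
                       range(workers))
    return np.vstack(list(parts))                # merge the bands
```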

  14. Algorithms and data structures for massively parallel generic adaptive finite element codes

    KAUST Repository

    Bangerth, Wolfgang

    2011-12-01

    Today's largest supercomputers have 100,000s of processor cores and offer the potential to solve partial differential equations discretized by billions of unknowns. However, the complexity of scaling to such large machines and problem sizes has so far prevented the emergence of generic software libraries that support such computations, although these would lower the threshold of entry and enable many more applications to benefit from large-scale computing. We are concerned with providing this functionality for mesh-adaptive finite element computations. We assume the existence of an "oracle" that implements the generation and modification of an adaptive mesh distributed across many processors, and that responds to queries about its structure. Based on querying the oracle, we develop scalable algorithms and data structures for generic finite element methods. Specifically, we consider the parallel distribution of mesh data, global enumeration of degrees of freedom, constraints, and postprocessing. Our algorithms remove the bottlenecks that typically limit large-scale adaptive finite element analyses. We demonstrate scalability of complete finite element workflows on up to 16,384 processors. An implementation of the proposed algorithms, based on the open source software p4est as mesh oracle, is provided under an open source license through the widely used deal.II finite element software library. © 2011 ACM.

  15. An effective estimation of distribution algorithm for parallel litho machine scheduling with reticle constraints

    Institute of Scientific and Technical Information of China (English)

    周炳海

    2016-01-01

    In order to improve the scheduling efficiency of photolithography, the bottleneck process of wafer fabrication in the semiconductor industry, an effective estimation of distribution algorithm is proposed for scheduling problems of parallel litho machines with reticle constraints, where multiple reticles are available for each reticle type. First, the scheduling problem domain of parallel litho machines with reticle constraints is described and mathematical programming formulations are put forward with the objective of minimizing total weighted completion time. Second, an estimation of distribution algorithm is developed with a decoding scheme specially designed to deal with the reticle constraints. Third, an insert-based local search with a first-move strategy is introduced to enhance the local exploitation ability of the algorithm. Finally, simulation experiments and analysis demonstrate the effectiveness of the proposed algorithm.
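
    The core EDA loop can be sketched independently of the reticle constraints and the local search, both of which are omitted below. The sketch maintains a job-to-machine probability model, samples assignments from it, and re-estimates it from the elite samples; all parameters are illustrative.

```python
# Compact estimation-of-distribution sketch for assigning jobs to
# parallel machines, minimising total weighted completion time. The
# paper's reticle constraints and local search are omitted.
import numpy as np

def twct(assign, proc, weight, m):
    """Total weighted completion time: per machine, sequence the
    assigned jobs in WSPT order (ascending proc/weight)."""
    total = 0.0
    for k in range(m):
        jobs = np.where(assign == k)[0]
        order = jobs[np.argsort(proc[jobs] / weight[jobs])]
        total += np.sum(weight[order] * np.cumsum(proc[order]))
    return total

def eda(proc, weight, m, pop=60, elite=15, iters=100, lr=0.3):
    n = len(proc)
    P = np.full((n, m), 1.0 / m)                  # probability model
    for _ in range(iters):
        samples = np.array([[np.random.choice(m, p=P[j]) for j in range(n)]
                            for _ in range(pop)])
        scores = np.array([twct(s, proc, weight, m) for s in samples])
        best = samples[np.argsort(scores)[:elite]]
        freq = np.stack([(best == k).mean(axis=0) for k in range(m)], axis=1)
        P = (1 - lr) * P + lr * freq              # update toward elites
    return samples[np.argmin(scores)]
```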

  16. Literature Review on the Hybrid Flow Shop Scheduling Problem with Unrelated Parallel Machines

    Directory of Open Access Journals (Sweden)

    Eliana Marcela Peña Tibaduiza

    2017-01-01

    Context: The hybrid flow shop problem with unrelated parallel machines has received less attention in academia than the hybrid flow shop with identical processors. For this reason, there are few reports on industrial applications of this problem. Method: A literature review of the state of the art on the flow-shop scheduling problem was conducted by collecting and analyzing academic papers from several scientific databases. To this end, a search query was constructed using keywords defining the problem, checking that the definition included unrelated parallel machines; as a result, 50 papers were finally selected for this study. Results: The problem was classified according to the characteristics of the production system; solution methods, constraints and commonly used objective functions are also presented. Conclusions: An increasing trend is observed in studies of flow shops with multiple stages, but few are based on industrial case studies.

  17. A SNP panel for identity and kinship testing using massive parallel sequencing.

    Science.gov (United States)

    Grandell, Ida; Samara, Raed; Tillmar, Andreas O

    2016-07-01

    Within forensic genetics, there is still a need for supplementary DNA marker typing in order to increase the power to solve cases, for both identity testing and complex kinship issues. One major disadvantage of current capillary electrophoresis (CE) methods is the limited DNA marker multiplexing capability. By utilizing massively parallel sequencing (MPS) technology, this capability can, however, be increased. We have designed a customized GeneRead DNASeq SNP panel (Qiagen) of 140 previously published autosomal forensically relevant identity SNPs for analysis using MPS. A single amplification step was followed by library preparation using the GeneRead Library Prep workflow (Qiagen). The sequencing was performed on a MiSeq System (Illumina), and the bioinformatic analyses were done using the software Biomedical Genomics Workbench (CLC Bio, Qiagen). Forty-nine individuals from a Swedish population were genotyped in order to establish genotype frequencies and to evaluate the performance of the assay. The assay showed balanced coverage among the included loci, and fewer than 0.5% of heterozygote balance values were outliers. Analyses of dilution series of the 2800M Control DNA gave reproducible results down to 0.2 ng of DNA input. In addition, typing of FTA samples and bone samples was performed with promising results. Further studies and optimizations are, however, required for a more detailed evaluation of performance on degraded and PCR-inhibited forensic samples. In summary, the assay offers a straightforward sample-to-genotype workflow and could be useful for gaining information in forensic casework, both for identity testing and for solving complex kinship issues.

  18. Transcriptional analysis of the Arabidopsis ovule by massively parallel signature sequencing

    Science.gov (United States)

    Sánchez-León, Nidia; Arteaga-Vázquez, Mario; Alvarez-Mejía, César; Mendiola-Soto, Javier; Durán-Figueroa, Noé; Rodríguez-Leal, Daniel; Rodríguez-Arévalo, Isaac; García-Campayo, Vicenta; García-Aguilar, Marcelina; Olmedo-Monfil, Vianey; Arteaga-Sánchez, Mario; Martínez de la Vega, Octavio; Nobuta, Kan; Vemaraju, Kalyan; Meyers, Blake C.; Vielle-Calzada, Jean-Philippe

    2012-01-01

    The life cycle of flowering plants alternates between a predominant sporophytic (diploid) and an ephemeral gametophytic (haploid) generation that only occurs in reproductive organs. In Arabidopsis thaliana, the female gametophyte is deeply embedded within the ovule, complicating the study of the genetic and molecular interactions involved in the sporophytic to gametophytic transition. Massively parallel signature sequencing (MPSS) was used to conduct a quantitative large-scale transcriptional analysis of the fully differentiated Arabidopsis ovule prior to fertilization. The expression of 9775 genes was quantified in wild-type ovules, additionally detecting >2200 new transcripts mapping to antisense or intergenic regions. A quantitative comparison of global expression in wild-type and sporocyteless (spl) individuals resulted in 1301 genes showing 25-fold reduced or null activity in ovules lacking a female gametophyte, including those encoding 92 signalling proteins, 75 transcription factors, and 72 RNA-binding proteins not reported in previous studies based on microarray profiling. A combination of independent genetic and molecular strategies confirmed the differential expression of 28 of them, showing that they are either preferentially active in the female gametophyte, or dependent on the presence of a female gametophyte to be expressed in sporophytic cells of the ovule. Among 18 genes encoding pentatricopeptide-repeat proteins (PPRs) that show transcriptional activity in wild-type but not spl ovules, CIHUATEOTL (At4g38150) is specifically expressed in the female gametophyte and necessary for female gametogenesis. These results expand the nature of the transcriptional universe present in the ovule of Arabidopsis, and offer a large-scale quantitative reference of global expression for future genomic and developmental studies. PMID:22442422

  19. Design considerations for massively parallel sequencing studies of complex human disease.

    Directory of Open Access Journals (Sweden)

    Bing-Jian Feng

    Massively Parallel Sequencing (MPS) allows sequencing of entire exomes and genomes to now be done at reasonable cost, and its utility for identifying genes responsible for rare Mendelian disorders has been demonstrated. However, for a complex disease, study designs need to accommodate substantial degrees of locus, allelic, and phenotypic heterogeneity, as well as complex relationships between genotype and phenotype. Such considerations include careful selection of samples for sequencing and a well-developed strategy for identifying the few "true" disease susceptibility genes from among the many irrelevant genes that will be found to harbor rare variants. To examine these issues we have performed simulation-based analyses in order to compare several strategies for MPS sequencing in complex disease. Factors examined include genetic architecture, sample size, number and relationship of individuals selected for sequencing, and a variety of filters based on variant type, multiple observations of genes and concordance of genetic variants within pedigrees. A two-stage design was assumed, where genes from the MPS analysis of high-risk families are evaluated in a secondary screening phase of a larger set of probands with more modest family histories. Designs were evaluated using a cost function that assumes the cost of sequencing the whole exome is 400 times that of sequencing a single candidate gene. Results indicate that while requiring variants to be identified in multiple pedigrees and/or in multiple individuals in the same pedigree are effective strategies for reducing false positives, there is a danger of over-filtering so that most true susceptibility genes are missed. In most cases, sequencing more than two individuals per pedigree results in reduced power without any benefit in terms of reduced overall cost. Further, our results suggest that although no single strategy is optimal, simulations can provide important guidelines for study design.
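
    For intuition, a back-of-envelope version of this cost function is sketched below under the stated 400:1 exome-to-gene cost ratio; the stage sizes are invented for illustration.

```python
# Illustrative cost comparison for the two-stage design described
# above, using the paper's assumption that one whole exome costs 400
# times one candidate-gene sequencing. The counts below are made up.
C_GENE = 1.0
C_EXOME = 400 * C_GENE

n_family_cases = 20        # exomes sequenced in high-risk families (stage 1)
n_candidate_genes = 50     # genes surviving the stage-1 filters
n_probands = 500           # probands screened per gene in stage 2

total = n_family_cases * C_EXOME + n_candidate_genes * n_probands * C_GENE
print(total)               # 8000 + 25000 = 33000 gene-equivalents
```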

  20. Implementation of a Message Passing Interface into a Cloud-Resolving Model for Massively Parallel Computing

    Science.gov (United States)

    Juang, Hann-Ming Henry; Tao, Wei-Kuo; Zeng, Xi-Ping; Shie, Chung-Lin; Simpson, Joanne; Lang, Steve

    2004-01-01

    The capability for massively parallel programming (MPP) using a message passing interface (MPI) has been implemented in a three-dimensional version of the Goddard Cumulus Ensemble (GCE) model. The design for MPP with MPI maintains a similar code structure for the whole domain and for the portions after decomposition; hence the model follows the same integration for single and multiple tasks (CPUs). It also requires minimal changes to the original code, so it is easily modified and/or managed by the model developers and users who have little knowledge of MPP. The entire model domain can be sliced into a one- or two-dimensional decomposition with a halo regime overlaid on the partial domains. The halo regime requires that no data be fetched across tasks during the computational stage, but it must be updated before the next computational stage through data exchange via MPI. For reproducibility, transposing data among tasks is required for the spectral transform (Fast Fourier Transform, FFT), which is used in the anelastic version of the model for solving the pressure equation. The performance of the MPI-implemented codes (i.e., the compressible and anelastic versions) was tested on three different computing platforms. The major results are: 1) both versions achieve speedup efficiencies of about 99% up to 256 tasks, but not at 512 tasks; 2) the anelastic version has better speedup and efficiency because it requires more computation than the compressible version; 3) equal or approximately equal numbers of slices in the x- and y-directions provide the fastest integration due to fewer data exchanges; and 4) one-dimensional slices in the x-direction result in the slowest integration due to the need for more memory relocation during computation.
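
    A hedged mpi4py sketch of the halo update described above (not the GCE implementation): each task computes on its interior rows, then exchanges one-row halos with its neighbours before the next computational stage.

```python
# Hedged mpi4py sketch of a 1-D domain decomposition with a halo
# regime: compute on the interior, then exchange halo rows via MPI.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local = np.zeros((66, 128))           # 64 interior rows + 2 halo rows
up = rank - 1 if rank > 0 else MPI.PROC_NULL
down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(10):
    # ... computational stage: update local[1:-1] using local data ...
    # Halo update: send the first interior row up and receive the
    # bottom halo from below, then the mirror-image exchange.
    comm.Sendrecv(local[1], dest=up, recvbuf=local[-1], source=down)
    comm.Sendrecv(local[-2], dest=down, recvbuf=local[0], source=up)
```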

  1. Genotypic tropism testing by massively parallel sequencing: qualitative and quantitative analysis

    Directory of Open Access Journals (Sweden)

    Thiele Bernhard

    2011-05-01

    Background: Inferring viral tropism from genotype is a fast and inexpensive alternative to phenotypic testing. While being highly predictive when performed on clonal samples, sensitivity of predicting CXCR4-using (X4) variants drops substantially in clinical isolates. This is mainly attributed to minor variants not detected by standard bulk-sequencing. Massively parallel sequencing (MPS) detects single clones, thereby being much more sensitive. Using this technology we wanted to improve genotypic prediction of coreceptor usage. Methods: Plasma samples from 55 antiretroviral-treated patients tested for coreceptor usage with the Monogram Trofile Assay were sequenced with standard population-based approaches. Fourteen of these samples were selected for further analysis with MPS. Tropism was predicted from each sequence with geno2pheno[coreceptor]. Results: Prediction based on bulk-sequencing yielded 59.1% sensitivity and 90.9% specificity compared to the Trofile assay. With MPS, 7600 reads were generated on average per isolate. Minorities of sequences with high confidence in CXCR4-usage were found in all samples, irrespective of phenotype. When using the default false-positive rate of geno2pheno[coreceptor] (10%), and defining a minority cutoff of 5%, the results were concordant in all but one isolate. Conclusions: The combination of MPS and coreceptor usage prediction results in a fast and accurate alternative to phenotypic assays. The detection of X4-viruses in all isolates suggests that coreceptor usage as well as fitness of minorities is important for therapy outcome. The high sensitivity of this technology, in combination with a quantitative description of the viral population, may allow implementing meaningful cutoffs for predicting response to CCR5-antagonists in the presence of X4-minorities.
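
    The reported decision rule can be paraphrased in a few lines. In this toy sketch, each read carries a geno2pheno[coreceptor] false-positive rate (lower FPR means a more confident X4 call); the FPR values themselves are invented.

```python
# Toy sketch of the minority-aware call described above: predict X4
# per read from geno2pheno-style FPR values (values assumed), then
# call the sample X4 if confident X4 reads exceed a 5% minority cutoff.
def call_tropism(read_fprs, fpr_cutoff=0.10, minority_cutoff=0.05):
    """read_fprs: one geno2pheno[coreceptor] FPR per sequenced read;
    a low FPR means a confident X4 prediction for that read."""
    x4_fraction = sum(f < fpr_cutoff for f in read_fprs) / len(read_fprs)
    return "X4" if x4_fraction >= minority_cutoff else "R5"

print(call_tropism([0.02, 0.45, 0.60, 0.08] + [0.5] * 96))  # 2% X4 -> R5
```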

  2. Transcriptional analysis of the Arabidopsis ovule by massively parallel signature sequencing.

    Science.gov (United States)

    Sánchez-León, Nidia; Arteaga-Vázquez, Mario; Alvarez-Mejía, César; Mendiola-Soto, Javier; Durán-Figueroa, Noé; Rodríguez-Leal, Daniel; Rodríguez-Arévalo, Isaac; García-Campayo, Vicenta; García-Aguilar, Marcelina; Olmedo-Monfil, Vianey; Arteaga-Sánchez, Mario; de la Vega, Octavio Martínez; Nobuta, Kan; Vemaraju, Kalyan; Meyers, Blake C; Vielle-Calzada, Jean-Philippe

    2012-06-01

    The life cycle of flowering plants alternates between a predominant sporophytic (diploid) and an ephemeral gametophytic (haploid) generation that only occurs in reproductive organs. In Arabidopsis thaliana, the female gametophyte is deeply embedded within the ovule, complicating the study of the genetic and molecular interactions involved in the sporophytic to gametophytic transition. Massively parallel signature sequencing (MPSS) was used to conduct a quantitative large-scale transcriptional analysis of the fully differentiated Arabidopsis ovule prior to fertilization. The expression of 9775 genes was quantified in wild-type ovules, additionally detecting >2200 new transcripts mapping to antisense or intergenic regions. A quantitative comparison of global expression in wild-type and sporocyteless (spl) individuals resulted in 1301 genes showing 25-fold reduced or null activity in ovules lacking a female gametophyte, including those encoding 92 signalling proteins, 75 transcription factors, and 72 RNA-binding proteins not reported in previous studies based on microarray profiling. A combination of independent genetic and molecular strategies confirmed the differential expression of 28 of them, showing that they are either preferentially active in the female gametophyte, or dependent on the presence of a female gametophyte to be expressed in sporophytic cells of the ovule. Among 18 genes encoding pentatricopeptide-repeat proteins (PPRs) that show transcriptional activity in wild-type but not spl ovules, CIHUATEOTL (At4g38150) is specifically expressed in the female gametophyte and necessary for female gametogenesis. These results expand the nature of the transcriptional universe present in the ovule of Arabidopsis, and offer a large-scale quantitative reference of global expression for future genomic and developmental studies.

  3. Comprehensive microRNA profiling in B-cells of human centenarians by massively parallel sequencing

    Directory of Open Access Journals (Sweden)

    Gombar Saurabh

    2012-07-01

    Background: MicroRNAs (miRNAs) are small, non-coding RNAs that regulate gene expression and play a critical role in development, homeostasis, and disease. Despite their demonstrated roles in age-associated pathologies, little is known about the role of miRNAs in human aging and longevity. Results: We employed massively parallel sequencing technology to identify miRNAs expressed in B-cells from Ashkenazi Jewish centenarians, i.e., those living to a hundred and a human model of exceptional longevity, and from younger controls without a family history of longevity. With data from 26.7 million reads comprising 9.4 × 10^8 bp from 3 centenarian and 3 control individuals, we discovered a total of 276 known miRNAs and 8 unknown miRNAs ranging several orders of magnitude in expression level, a typical characteristic of saturated miRNA sequencing. A total of 22 miRNAs were found to be significantly upregulated, with only 2 miRNAs downregulated, in centenarians as compared to controls. Gene Ontology analysis of the predicted and validated targets of the 24 differentially expressed miRNAs indicated enrichment of functional pathways involved in cell metabolism, cell cycle, cell signaling, and cell differentiation. A cross-sectional expression analysis of the differentially expressed miRNAs in B-cells from Ashkenazi Jewish individuals between the 50th and 100th years of age indicated that the expression level of miR-363* declined significantly with age. Centenarians, however, maintained the youthful expression level. This result suggests that miR-363* may be a candidate longevity-associated miRNA. Conclusion: Our comprehensive miRNA data provide a resource for further studies to identify genetic pathways associated with aging and longevity in humans.

  4. High throughput whole rumen metagenome profiling using untargeted massively parallel sequencing

    Directory of Open Access Journals (Sweden)

    Ross Elizabeth M

    2012-07-01

    Background: Variation in the microorganism communities in the rumen of cattle (Bos taurus) is of great interest because of possible links to economically or environmentally important traits, such as feed conversion efficiency or methane emission levels. The resolution of studies investigating this variation may be improved by utilizing untargeted massively parallel sequencing (MPS), that is, sequencing without targeted amplification of genes. The objective of this study was to develop a method which used MPS to generate "rumen metagenome profiles", and to investigate if these profiles were repeatable among samples taken from the same cow. Given that faecal samples are much easier to obtain than rumen fluid samples, we also investigated whether rumen metagenome profiles were predictive of faecal metagenome profiles. Results: Rather than focusing on individual organisms within the rumen, our method used MPS data to generate quantitative rumen microbiome profiles, regardless of taxonomic classifications. The method requires a previously assembled reference metagenome. A number of such reference metagenomes were considered, including two rumen-derived metagenomes, a human faecal microflora metagenome and a reference metagenome made up of publicly available prokaryote sequences. Sequence reads from each test sample were aligned to these references. The "rumen metagenome profile" was generated from the number of reads that aligned to each contig in the database. We used this method to test the hypothesis that rumen fluid microbial community profiles vary more between cows than within multiple samples from the same cow. Rumen fluid samples were taken from three cows, at three locations within the rumen. DNA from the samples was sequenced on the Illumina GAIIx. When the reads were aligned to a rumen metagenome reference, the rumen metagenome profiles were repeatable (P …). Conclusions: We have presented a simple and high-throughput method of

  5. An extensible operating system design for large-scale parallel machines.

    Energy Technology Data Exchange (ETDEWEB)

    Riesen, Rolf E.; Ferreira, Kurt Brian

    2009-04-01

    Running untrusted user-level code inside an operating system kernel was studied in the 1990s but has not really caught on. We believe the time has come to resurrect kernel extensions for operating systems that run on highly parallel clusters and supercomputers. The reason is that the usage model for these machines differs significantly from that of a desktop machine or a server. In addition, vendors are starting to add features such as floating-point accelerators, multicore processors, and reconfigurable compute elements. An operating system for such machines must be adaptable to the requirements of specific applications and provide abstractions to access next-generation hardware features, without sacrificing performance or scalability.

  6. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment. Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages.

  7. Recessive RYR1 mutations in a patient with severe congenital nemaline myopathy with ophthalmoplegia identified through massively parallel sequencing.

    Science.gov (United States)

    Kondo, Eri; Nishimura, Takafumi; Kosho, Tomoki; Inaba, Yuji; Mitsuhashi, Satomi; Ishida, Takefumi; Baba, Atsushi; Koike, Kenichi; Nishino, Ichizo; Nonaka, Ikuya; Furukawa, Toru; Saito, Kayoko

    2012-04-01

    Nemaline myopathy (NM) is a group of congenital myopathies, characterized by the presence of distinct rod-like inclusions, "nemaline bodies", in the sarcoplasm of skeletal muscle fibers. To date, ACTA1, NEB, TPM3, TPM2, TNNT1, and CFL2 have been found to cause NM. We have identified recessive RYR1 mutations in a patient with severe congenital NM, through high-throughput screening of congenital myopathy/muscular dystrophy-related genes using massively parallel sequencing with target gene capture. The patient manifested fetal akinesia, neonatal severe hypotonia with muscle weakness, respiratory insufficiency, swallowing disturbance, and ophthalmoplegia. Skeletal muscle histology demonstrated nemaline bodies and small type 1 fibers, but without central cores or minicores. Congenital myopathies, a molecularly, histopathologically, and clinically heterogeneous group of disorders, are considered to be good candidates for massively parallel sequencing. Copyright © 2012 Wiley Periodicals, Inc.

  8. Some Solution Approaches to Reduce the Imbalance of Workload in Parallel Machines while Planning in Flexible Manufacturing System

    Directory of Open Access Journals (Sweden)

    B.V. Raghavendra

    2010-05-01

    The loading problem in a Flexible Manufacturing System (FMS) is viewed as selecting a subset of jobs from a job pool and allocating the jobs among the machines. Balancing the workload on the parallel machines reduces bottlenecks and improves the utilization of the machine tools. In this paper an effort is made to develop strategies at the pre-release/planning stage that reduce the imbalance between the parallel machines. Two different strategies are developed, and the traditional shortest- and longest-processing-time sequencing rules are applied to determine the relative performance index. An illustrative example using the shortest processing time rule is presented.
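
    Longest-processing-time (LPT) loading is one classical baseline for this kind of workload balancing; a minimal sketch with illustrative job times follows (a standard heuristic, not necessarily the paper's strategy).

```python
# Minimal sketch of longest-processing-time (LPT) loading: assign
# jobs, longest first, to the currently least-loaded machine.
import heapq

def lpt(jobs, m):
    """jobs: processing times; m: number of parallel machines.
    Returns the jobs assigned to each machine."""
    heap = [(0.0, k) for k in range(m)]          # (load, machine)
    assignment = {k: [] for k in range(m)}
    for t in sorted(jobs, reverse=True):
        load, k = heapq.heappop(heap)            # least-loaded machine
        assignment[k].append(t)
        heapq.heappush(heap, (load + t, k))
    return assignment

print(lpt([7, 5, 4, 3, 3, 2], m=2))   # balanced loads: 12 vs 12
```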

  9. Automatic QSO Selection Using Machine Learning: Application on Massive Astronomical Database

    Science.gov (United States)

    Kim, D.-W.; Protopapas, P.; Alcock, C.; Byun, Y.-I.; Khardon, R.

    2011-07-01

    We present a new QSO (Quasi-Stellar Object) selection algorithm using Support Vector Machine (SVM), a supervised classification method, on a set of multiple extracted times series features such as period, amplitude, color, and autocorrelation value. We train a model that separates QSOs from variable stars, non-variable stars and microlensing events using the richest possible training set consisting of all known types of variables including QSOs from the MAssive Compact Halo Object (MACHO) database. We applied the trained model on the MACHO Large Magellanic Cloud (LMC) dataset, which consists of 40 million lightcurves, and found 1,620 QSO candidates. During the selection none of the 33,242 known MACHO variables were misclassified as QSO candidates. In order to estimate the true false positive rate, we crossmatched the candidates with astronomical catalogs including the Spitzer Surveying the Agents of a Galaxy's Evolution (SAGE) LMC catalog. The results further suggest that the majority of the candidates, more than 70%, are QSOs.
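
    A hedged scikit-learn sketch of this workflow (not the authors' code): train an SVM on extracted light-curve features and flag QSO candidates in a survey set. The feature values here are random stand-ins for the MACHO training data.

```python
# Sketch of SVM-based QSO candidate selection on extracted time-series
# features (period, amplitude, colour, autocorrelation); the data are
# synthetic placeholders, not the MACHO training set.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_train = rng.normal(size=(1000, 4))       # period, amplitude, colour, ACF
y_train = rng.integers(0, 2, size=1000)    # 1 = QSO, 0 = other variable

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)

X_survey = rng.normal(size=(10000, 4))     # features from survey light curves
candidates = np.where(clf.predict(X_survey) == 1)[0]
print(len(candidates), "QSO candidates flagged")
```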

  10. Parallel Sparse Matrix Solver on the GPU Applied to Simulation of Electrical Machines

    CERN Document Server

    Rodrigues, Antonio Wendell De Oliveira; Menach, Yvonnick Le; Dekeyser, Jean-Luc

    2010-01-01

    Nowadays, several industrial applications are being ported to parallel architectures, since these platforms provide more performance for system modelling and simulation. In the electrical machines area, there are many problems whose solution needs to be sped up. This paper examines the parallelization of a sparse matrix solver on graphics processors. More specifically, we implement the conjugate gradient technique with the input matrix stored in CSR, Symmetric CSR and CSC formats. This method is one of the most efficient iterative methods available for solving the linear systems arising from finite-element discretizations of Maxwell's equations. The GPU (Graphics Processing Unit), which is used for the implementation, provides mechanisms to parallelize the algorithm, and thus significantly increases computation speed relative to serial code on CPU-based systems.
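
    The solver pattern can be sketched on the CPU with SciPy's CSR storage and conjugate gradient routine; on a GPU the same structure would map to CuPy/cuSPARSE kernels. The tridiagonal test matrix is only a stand-in for a finite-element system.

```python
# CPU sketch of the solver pattern described above: conjugate
# gradients on a sparse symmetric positive-definite matrix in CSR.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

n = 10000
# SPD tridiagonal stand-in for a finite-element system matrix.
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = cg(A, b)
assert info == 0                         # 0 means the iteration converged
print(np.linalg.norm(A @ x - b))         # residual check
```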

  11. Molecular dynamics simulation on a network of workstations using a machine-independent parallel programming language.

    Science.gov (United States)

    Shifman, M A; Windemuth, A; Schulten, K; Miller, P L

    1992-04-01

    Molecular dynamics simulations investigate local and global motion in molecules. Several parallel computing approaches have been taken to attack the most computationally expensive phase of molecular simulations, the evaluation of long range interactions. This paper reviews these approaches and develops a straightforward but effective algorithm using the machine-independent parallel programming language, Linda. The algorithm was run both on a shared memory parallel computer and on a network of high performance Unix workstations. Performance benchmarks were performed on both systems using two proteins. This algorithm offers a portable cost-effective alternative for molecular dynamics simulations. In view of the increasing numbers of networked workstations, this approach could help make molecular dynamics simulations more easily accessible to the research community.

  12. More comprehensive forensic genetic marker analyses for accurate human remains identification using massively parallel DNA sequencing.

    Science.gov (United States)

    Ambers, Angie D; Churchill, Jennifer D; King, Jonathan L; Stoljarova, Monika; Gill-King, Harrell; Assidi, Mourad; Abu-Elmagd, Muhammad; Buhmeida, Abdelbaset; Al-Qahtani, Mohammed; Budowle, Bruce

    2016-10-17

    Although the primary objective of forensic DNA analyses of unidentified human remains is positive identification, cases involving historical or archaeological skeletal remains often lack reference samples for comparison. Massively parallel sequencing (MPS) offers an opportunity to provide biometric data in such cases, and these cases provide valuable data on the feasibility of applying MPS for characterization of modern forensic casework samples. In this study, MPS was used to characterize 140-year-old human skeletal remains discovered at a historical site in Deadwood, South Dakota, United States. The remains were in an unmarked grave and there were no records or other metadata available regarding the identity of the individual. Due to the high throughput of MPS, a variety of biometric markers could be typed using a single sample. Using MPS and suitable forensic genetic markers, more relevant information could be obtained from a limited quantity and quality sample. Results were obtained for 25/26 Y-STRs, 34/34 Y SNPs, 166/166 ancestry-informative SNPs, 24/24 phenotype-informative SNPs, 102/102 human identity SNPs, 27/29 autosomal STRs (plus amelogenin), and 4/8 X-STRs (as well as ten regions of mtDNA). The Y-chromosome (Y-STR, Y-SNP) and mtDNA profiles of the unidentified skeletal remains are consistent with the R1b and H1 haplogroups, respectively. Both of these haplogroups are the most common haplogroups in Western Europe. Ancestry-informative SNP analysis also supported European ancestry. The genetic results are consistent with anthropological findings that the remains belong to a male of European ancestry (Caucasian). Phenotype-informative SNP data provided strong support that the individual had light red hair and brown eyes. This study is among the first to genetically characterize historical human remains with forensic genetic marker kits specifically designed for MPS. The outcome demonstrates that substantially more genetic information can be obtained from

  13. Global transcriptional profiling of the toxic dinoflagellate Alexandrium fundyense using Massively Parallel Signature Sequencing

    Directory of Open Access Journals (Sweden)

    Anderson Donald M

    2006-04-01

    Background: Dinoflagellates are one of the most important classes of marine and freshwater algae, notable both for their functional diversity and ecological significance. They occur naturally as free-living cells, as endosymbionts of marine invertebrates and are well known for their involvement in "red tides". Dinoflagellates are also notable for their unusual genome content and structure, which suggests that the organization and regulation of dinoflagellate genes may be very different from that of most eukaryotes. To investigate the content and regulation of the dinoflagellate genome, we performed a global analysis of the transcriptome of the toxic dinoflagellate Alexandrium fundyense under nitrate- and phosphate-limited conditions using Massively Parallel Signature Sequencing (MPSS). Results: Data from the two MPSS libraries showed that the number of unique signatures found in A. fundyense cells is similar to that of humans and Arabidopsis thaliana, two eukaryotes that have been extensively analyzed using this method. The general distribution, abundance and expression patterns of the A. fundyense signatures were also quite similar to other eukaryotes, and at least 10% of the A. fundyense signatures were differentially expressed between the two conditions. RACE amplification and sequencing of a subset of signatures showed that multiple signatures arose from sequence variants of a single gene. Single signatures also mapped to different sequence variants of the same gene. Conclusion: The MPSS data presented here provide a quantitative view of the transcriptome and its regulation in these unusual single-celled eukaryotes. The observed signature abundance and distribution in Alexandrium is similar to that of other eukaryotes that have been analyzed using MPSS. Results of signature mapping via RACE indicate that many signatures result from sequence variants of individual genes. These data add to the growing body of evidence for widespread gene

  14. STATE SPACE MODELING OF DIMENSIONAL MACHINING ERRORS OF SERIAL-PARALLEL HYBRID MULTI-STAGE MACHINING SYSTEM

    Institute of Scientific and Technical Information of China (English)

    XI Lifeng; DU Shichang

    2007-01-01

    Final product quality is determined by the accumulation, coupling, and propagation of quality variations from all stations in a multi-stage manufacturing system (MMS). Modeling and control of variation propagation is essential to improve product quality. However, current stream-of-variation (SOV) theory can only handle the case in which a single variation stream affects product quality. Because multiple variation streams exist, limited research has been done on quality control in serial-parallel hybrid multi-stage manufacturing systems (SPH-MMSs). A state space model and its modeling strategies are developed to describe how multiple variation streams stack up in an SPH-MMS, extending the SOV theory to the SPH-MMS. The dimensions of the system model are reduced to a level practical for production, and the effectiveness and feasibility of the model are validated with a machining case.
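
    For reference, a generic stream-of-variation state-space form is written out below; the notation is the standard one and is assumed rather than taken verbatim from the paper.

```latex
% Generic stream-of-variation state-space form (notation assumed, not
% taken verbatim from the paper): x_k stacks the dimensional
% deviations after station k, u_k the errors introduced there.
\[
  x_k = A_{k-1}\,x_{k-1} + B_k\,u_k + w_k, \qquad
  y_k = C_k\,x_k + v_k,
\]
% A_{k-1} propagates upstream deviations to station k, B_k maps the
% fixturing/machining errors introduced at station k, w_k and v_k are
% noise terms, and y_k is what an inspection at station k observes.
```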

  15. Modeling the Scheduling Problem of Identical Parallel Machines with Load Balancing by Time Petri Nets

    Directory of Open Access Journals (Sweden)

    Sekhri Larbi

    2014-12-01

    Full Text Available The optimal resources allocation to tasks was the primary objective of the research dealing with scheduling problems. These problems are characterized by their complexity, known as NP-hard in most cases. Currently with the evolution of technology, classical methods are inadequate because they degrade system performance (inflexibility, inefficient resources using policy, etc.. In the context of parallel and distributed systems, several computing units process multitasking applications in concurrent way. Main goal of such process is to schedule tasks and map them on the appropriate machines to achieve the optimal overall system performance (Minimize the Make-span and balance the load among the machines. In this paper we present a Time Petri Net (TPN based approach to solve the scheduling problem by mapping each entity (tasks, resources and constraints to correspondent one in the TPN. In this case, the scheduling problem can be reduced to finding an optimal sequence of transitions leading from an initial marking to a final one. Our approach improves the classical mapping algorithms by introducing a control over resources allocation and by taking into consideration the resource balancing aspect leading to an acceptable state of the system. The approach is applied to a specific class of problems where the machines are parallel and identical. This class is analyzed by using the TiNA (Time Net Analyzer tool software developed in the LAAS laboratory (Toulouse, France.

  16. Mlifdect: Android Malware Detection Based on Parallel Machine Learning and Information Fusion

    Directory of Open Access Journals (Sweden)

    Xin Wang

    2017-01-01

    Full Text Available In recent years, Android malware has continued to grow at an alarming rate. The highly sophisticated detection-avoidance techniques employed by recent malicious apps make traditional machine-learning-based malware detection methods far less effective. More specifically, such methods cannot cope with the various types of Android malware and are limited by their reliance on a single classification algorithm. To address this limitation, we propose a novel approach in this paper that leverages parallel machine learning and information fusion techniques for better Android malware detection, named Mlifdect. To implement this approach, we first extract eight types of features from static analysis of Android apps and build two kinds of feature sets after feature selection. Then, a parallel machine learning detection model is developed to speed up the process of classification. Finally, we investigate probability-analysis-based and Dempster-Shafer-theory-based information fusion approaches, which can effectively obtain the detection results. To validate our method, other state-of-the-art detection works are selected for comparison on real-world Android apps. The experimental results demonstrate that Mlifdect achieves higher detection accuracy as well as remarkable run-time efficiency compared to existing malware detection solutions.
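
    A minimal sketch of the probability-based fusion step (the Dempster-Shafer variant is omitted; the classifiers below are illustrative stand-ins for trained models, not Mlifdect's): run the base classifiers in parallel and average their malware probabilities:

        from concurrent.futures import ThreadPoolExecutor

        def fuse(classifiers, sample):
            # Each classifier returns P(malware); fuse by simple averaging.
            with ThreadPoolExecutor() as pool:
                probs = list(pool.map(lambda clf: clf(sample), classifiers))
            return sum(probs) / len(probs)

        clfs = [lambda s: 0.9, lambda s: 0.7, lambda s: 0.8]   # stand-in models
        print("malware" if fuse(clfs, sample=None) > 0.5 else "benign")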

  17. Performance analysis of three dimensional integral equation computations on a massively parallel computer. M.S. Thesis

    Science.gov (United States)

    Logan, Terry G.

    1994-01-01

    The purpose of this study is to investigate the performance of integral equation computations using a numerical source field-panel method in a massively parallel processing (MPP) environment. A comparative study of the computational performance of the MPP CM-5 computer and the conventional Cray-YMP supercomputer for a three-dimensional flow problem is made. A serial FORTRAN code is converted into a parallel CM-FORTRAN code. Performance results are obtained on the CM-5 with 32, 64, and 128 nodes, along with those on the Cray-YMP with a single processor. The comparison indicates that the parallel CM-FORTRAN code matches or outperforms the equivalent serial FORTRAN code in some cases.

  18. Predictive ability of machine learning methods for massive crop yield prediction

    Directory of Open Access Journals (Sweden)

    Alberto Gonzalez-Sanchez

    2014-04-01

    Full Text Available An important issue for agricultural planning purposes is accurate yield estimation for the numerous crops involved in the planning. Machine learning (ML) is an essential approach for achieving practical and effective solutions for this problem. Many comparisons of ML methods for yield prediction have been made, seeking the most accurate technique. Generally, the number of evaluated crops and techniques is too low and does not provide enough information for agricultural planning purposes. This paper compares the predictive accuracy of ML and linear regression techniques for crop yield prediction in ten crop datasets. Multiple linear regression, M5-Prime regression trees, perceptron multilayer neural networks, support vector regression and k-nearest neighbor methods were ranked. Four accuracy metrics were used to validate the models: the root mean square error (RMSE), root relative square error (RRSE), normalized mean absolute error (MAE), and correlation factor (R). Real data from an irrigation zone of Mexico were used for building the models. Models were tested with samples from two consecutive years. The results show that the M5-Prime and k-nearest neighbor techniques obtain the lowest average RMSE errors (5.14 and 4.91), the lowest RRSE errors (79.46% and 79.78%), the lowest average MAE errors (18.12% and 19.42%), and the highest average correlation factors (0.41 and 0.42). Since M5-Prime achieves the largest number of crop yield models with the lowest errors, it is a very suitable tool for massive crop yield prediction in agricultural planning.
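
    For reference, a minimal sketch of the four validation metrics named above, as they are commonly defined (this is not code from the paper):

        import math

        def metrics(y_true, y_pred):
            # RMSE, RRSE, MAE and the correlation factor R for one model.
            n = len(y_true)
            my = sum(y_true) / n
            mp = sum(y_pred) / n
            sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
            rmse = math.sqrt(sq_err / n)
            rrse = math.sqrt(sq_err / sum((t - my) ** 2 for t in y_true))
            mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
            r = (sum((t - my) * (p - mp) for t, p in zip(y_true, y_pred))
                 / math.sqrt(sum((t - my) ** 2 for t in y_true)
                             * sum((p - mp) ** 2 for p in y_pred)))
            return rmse, rrse, mae, r

        print(metrics([10, 12, 14, 16], [11, 12, 13, 17]))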

  19. A Parallel Decision Model Based on Support Vector Machines and Its Application to Fault Diagnosis

    Institute of Scientific and Technical Information of China (English)

    Yan Weiwu(阎威武); Shao Huihe

    2004-01-01

    Many industrial process systems are becoming more and more complex and are characterized by distributed features. To ensure that such a system operates in working order, distributed parameter values are often inspected from subsystems or at different points in order to judge the working conditions of the system and make global decisions. In this paper, a parallel decision model based on Support Vector Machines (PDMSVM) is introduced and applied to distributed fault diagnosis in industrial processes. PDMSVM is convenient for information fusion in distributed systems and performs well in fault diagnosis with distributed features. PDMSVM makes decisions based on the synthesized information of subsystems and takes advantage of the Support Vector Machine. Therefore decisions made by PDMSVM are highly reliable and accurate.

  20. A Hybrid Genetic Algorithm to Minimize Total Tardiness for Unrelated Parallel Machine Scheduling with Precedence Constraints

    Directory of Open Access Journals (Sweden)

    Chunfeng Liu

    2013-01-01

    Full Text Available The paper presents a novel hybrid genetic algorithm (HGA) for a deterministic scheduling problem where multiple jobs with arbitrary precedence constraints are processed on multiple unrelated parallel machines. The objective is to minimize total tardiness, since delays of the jobs may lead to penalty costs or cancellation of orders by clients in many situations. A priority-rule-based heuristic algorithm, which at each iteration schedules the highest-priority job on the machine indicated by the priority rule, is suggested and embedded into the HGA to produce initial feasible schedules that can be improved in further stages. Computational experiments show that the proposed HGA performs well with respect to accuracy and efficiency of solutions for small-sized problems and gets better results than the conventional genetic algorithm within the same runtime for large-sized problems.
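
    A minimal sketch of a priority-rule list-scheduling step of the kind described (illustrative only: it uses an earliest-due-date priority and picks the machine giving the earliest finish; the paper's actual rule may differ):

        def priority_schedule(proc, due, preds):
            # Greedy list scheduling on unrelated machines.
            # proc[j][k] = time of job j on machine k; preds[j] = set of predecessors.
            n, m = len(proc), len(proc[0])
            free = [0.0] * m                     # machine ready times
            done = {}                            # job -> completion time
            while len(done) < n:
                ready = [j for j in range(n)
                         if j not in done and preds[j] <= done.keys()]
                j = min(ready, key=lambda j: due[j])      # earliest due date first
                est = max((done[p] for p in preds[j]), default=0.0)
                k = min(range(m), key=lambda k: max(free[k], est) + proc[j][k])
                done[j] = max(free[k], est) + proc[j][k]
                free[k] = done[j]
            return done, sum(max(0.0, done[j] - due[j]) for j in done)

        # 3 jobs, 2 machines; job 2 must follow job 0
        print(priority_schedule(proc=[[3, 4], [2, 5], [4, 2]],
                                due=[4, 3, 9], preds=[set(), set(), {0}]))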

  1. Multi-machine scaling of the main SOL parallel heat flux width in tokamak limiter plasmas

    Science.gov (United States)

    Horacek, J.; Pitts, R. A.; Adamek, J.; Arnoux, G.; Bak, J.-G.; Brezinsek, S.; Dimitrova, M.; Goldston, R. J.; Gunn, J. P.; Havlicek, J.; Hong, S.-H.; Janky, F.; LaBombard, B.; Marsen, S.; Maddaluno, G.; Nie, L.; Pericoli, V.; Popov, Tsv; Panek, R.; Rudakov, D.; Seidl, J.; Seo, D. S.; Shimada, M.; Silva, C.; Stangeby, P. C.; Viola, B.; Vondracek, P.; Wang, H.; Xu, G. S.; Xu, Y.; Contributors, JET

    2016-07-01

    As in many of today's tokamaks, plasma start-up in ITER will be performed in limiter configuration on either the inner or outer midplane first wall (FW). The massive, beryllium-armored ITER FW panels are toroidally shaped to protect panel-to-panel misalignments, increasing the deposited power flux density compared with a purely cylindrical surface. The chosen shaping should thus be optimized for a given radial profile of parallel heat flux, $q_{\|}$, in the scrape-off layer (SOL) to ensure optimal power spreading. For plasmas limited on the outer wall in tokamaks, this profile is commonly observed to decay exponentially as $q_{\|} = q_0 \exp(-r/\lambda_q^{\mathrm{omp}})$, or, for inner-wall limiter plasmas, with a double exponential decay comprising a sharp near-SOL feature and a broader main SOL width, $\lambda_q^{\mathrm{omp}}$. The initial choice of $\lambda_q^{\mathrm{omp}}$, which is critical in ensuring that current ramp-up or ramp-down will be possible as planned in the ITER scenario design, was made on the basis of an extremely restricted L-mode divertor dataset, using infra-red thermography measurements on the outer divertor target to extrapolate to a heat flux width at the main plasma midplane. This unsatisfactory situation has now been significantly improved by a dedicated multi-machine ohmic and L-mode limiter plasma study, conducted under the auspices of the International Tokamak Physics Activity, involving 11 tokamaks covering a wide parameter range with $R = 0.4$-$2.8\,\mathrm{m}$, $B_0 = 1.2$-$7.5\,\mathrm{T}$, $I_{\mathrm{p}} = 9$-$2500\,\mathrm{kA}$. Measurements of $\lambda_q^{\mathrm{omp}}$ in the database are made exclusively using a variety of fast reciprocating Langmuir probes on all devices, entering the plasma at a variety of poloidal locations, but with the majority being on the low field side. Statistical analysis of the database reveals nine reasonable engineering and dimensionless scalings. All yield, however, similar

  2. Step kinematic calibration of a 3-DOF planar parallel kinematic machine tool

    Institute of Scientific and Technical Information of China (English)

    CHANG Peng; WANG JinSong; LI TieMin; LIU XinJun; GUAN LiWen

    2008-01-01

    This paper presents a novel step kinematic calibration method for a 3 degree-of-freedom (DOF) planar parallel kinematic machine tool, based on the minimal linear combinations (MLCs) of error parameters. The method, using mapping of linear combinations of parameters in an error-transfer multi-parameter coupling system, changes the modeling, identification and error compensation of geometric parameters in general kinematic calibration into those of linear combinations of parameters. By using the four theorems of the MLCs, the sets of the MLCs that are respectively related to the relative precision and absolute precision are determined. All simple and feasible measurement methods in practice are given, and identification analysis of the set of the MLCs for each measurement is carried out. According to the identification analysis results, a step calibration including step measurement, step identification and step error compensation is determined by taking into account both measurement costs and observability. The experiment shows that the proposed method has the following merits: (1) the parameter errors that cannot influence precision are completely avoided; (2) it reflects the mapping of linear combinations of parameters more accurately and enhances the precision of identification; and (3) the method is robust, efficient and effective, so that the errors in position and orientation are kept at the same order as the measurement noise. Due to these merits, the present method is attractive for the 3-DOF planar parallel kinematic machine tool and can also be applied to other parallel kinematic machine tools with weakly nonlinear kinematics.

  4. Identical parallel machine scheduling with nonlinear deterioration and multiple rate modifying activities

    Directory of Open Access Journals (Sweden)

    Ömer Öztürkoğlu

    2017-07-01

    Full Text Available This study focuses on identical parallel machine scheduling of jobs with deteriorating processing times and rate-modifying activities (RMAs). We consider nonlinearly increasing processing times of jobs based on their position assignment. RMAs are also considered to recover the increase in the processing times of jobs due to deterioration. We also propose heuristic algorithms that rely on ant colony optimization and simulated annealing to solve the problem with multiple RMAs in a reasonable amount of time. Finally, we show that the ant colony optimization algorithm generates near-optimal solutions and outperforms the simulated annealing algorithm.
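
    A sketch of one common position-based deterioration model with an RMA that resets the machine (the exponent, RMA trigger and RMA duration are made-up parameters; the paper's exact model may differ):

        def completion_times(jobs, base, rma_after, a=0.3, rma_time=2.0):
            # The r-th job since the last RMA takes base[j] * r**a time;
            # after rma_after jobs an RMA restores the machine's rate.
            t, r, out = 0.0, 1, []
            for j in jobs:                  # jobs already assigned to one machine
                if r > rma_after:
                    t += rma_time           # perform the rate-modifying activity
                    r = 1
                t += base[j] * r ** a
                out.append(t)
                r += 1
            return out

        print(completion_times([0, 1, 2, 3, 4], base=[3, 5, 2, 4, 1], rma_after=2))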

  5. Minimizing flowtime subject to optimal makespan on two identical parallel machines

    Directory of Open Access Journals (Sweden)

    Jatinder N. D. Gupta

    2000-06-01

    Full Text Available We consider the problem of scheduling jobs on two parallel identical machines where an optimal schedule is defined as one that gives the smallest total flowtime (the sum of the completion times of all jobs) among the set of schedules with optimal makespan (the completion time of the latest job). Utilizing an existing optimization algorithm for the minimization of makespan, we propose an algorithm to determine optimal schedules for this problem. We empirically show that the proposed algorithm can quickly find optimal schedules for problems containing a large number of jobs.
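
    Both objectives are easy to evaluate for any candidate two-machine schedule; a minimal helper (illustrative, not the paper's optimization algorithm):

        def makespan_and_flowtime(schedule, p):
            # schedule = (ordered jobs on machine 1, ordered jobs on machine 2);
            # p[j] is the processing time of job j.
            flow, ends = 0.0, []
            for machine in schedule:
                t = 0.0
                for j in machine:
                    t += p[j]        # completion time of job j
                    flow += t
                ends.append(t)
            return max(ends), flow   # (makespan, total flowtime)

        print(makespan_and_flowtime(([2, 0], [1, 3]), p=[4, 3, 5, 2]))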

  6. CLIPS meets the connection machine: Or how to create a parallel production system

    Science.gov (United States)

    Geyer, Steve

    1990-01-01

    Production systems usually present unacceptable run-times when faced with applications requiring tens of thousands to millions of facts. Many efforts have focused on the use of parallelism as a way to increase overall system performance. While these efforts have increased pattern matching and rule evaluation rates, they have only indirectly dealt with the problems faced by fact-burdened applications. We have implemented PPS, a version of CLIPS running on the Connection Machine, to directly address the problems faced by these applications. This paper will describe our system, discuss its implementation, and present results.

  7. Comparing the performance of different meta-heuristics for unweighted parallel machine scheduling

    Directory of Open Access Journals (Sweden)

    Adamu, Mumuni Osumah

    2015-08-01

    Full Text Available This article considers the due-window scheduling problem of minimising the number of early and tardy jobs on identical parallel machines. This problem is known to be NP-complete, and thus finding an optimal solution is unlikely. Three meta-heuristics and their hybrids are proposed and extensive computational experiments are conducted. The purpose of this paper is to compare the performance of these meta-heuristics and their hybrids and to determine the best among them. Detailed comparative tests have also been conducted to analyse the different heuristics, with the simulated annealing hybrid giving the best results.

  8. Integrated configurable equipment selection and line balancing for mass production with serial-parallel machining systems

    Science.gov (United States)

    Battaïa, Olga; Dolgui, Alexandre; Guschinsky, Nikolai; Levin, Genrikh

    2014-10-01

    Solving equipment selection and line balancing problems together allows better line configurations to be reached and avoids local optima. This article considers these two decision problems jointly for mass production lines with serial-parallel workplaces. This study was motivated by the design of production lines based on machines with rotary or mobile tables. Nevertheless, the results are more general and can be applied to assembly and production lines with similar structures. The designers' objectives and the constraints are studied in order to suggest a relevant mathematical model and an efficient optimization approach to solve it. A real case study is used to validate the model and the developed approach.

  9. Nonlinearity for a parallel kinematic machine tool and its application to interpolation accuracy analysis

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    This paper is concerned with a kinematic nonlinearity measure for parallel kinematic machine tools (PKMs), which is based on differential-geometric curvature. The nonlinearity manifests itself in the curving of the solution locus and in equal-interval inputs of the joints mapping into unequal-interval outputs of the end-effector. Such curving and non-uniformity can be measured by BW curvature, so the curvature can measure the nonlinearity of a PKM indirectly. The distribution of BW curvature in a local area and over the whole workspace is then discussed. An example of application to the interpolation accuracy analysis of a PKM is given to illustrate the effectiveness of this approach.

  10. An Optimal Algorithm for a Class of Parallel Machines Scheduling Problem

    Institute of Scientific and Technical Information of China (English)

    常俊林; 邵惠鹤

    2004-01-01

    This paper considers the parallel machine scheduling problem where jobs are subject to different release times. A constructive heuristic is first proposed to solve the problem in a modest amount of computer time. In general, the quality of the solutions provided by heuristics degrades as the problem's scale increases. Combining the heuristic with the global search ability of a genetic algorithm, this paper proposes a hybrid heuristic to further improve the quality of solutions. The computational results show that the hybrid heuristic effectively combines the advantages of the heuristic and the genetic algorithm and can provide very good solutions to some large problems in a reasonable amount of computer time.

  11. A Comparative Study between Two Three-DOF Parallel Kinematic Machines using Kinetostatic Criteria and Interval Analysis

    CERN Document Server

    Chablat, Damien; Merlet, Jean-Pierre

    2007-01-01

    This paper addresses the workspace analysis of two 3-DOF translational parallel mechanisms designed for machining applications. Both machines feature three fixed linear joints. The joint axes of the first machine are orthogonal, whereas those of the second are parallel. In both cases, the mobile platform moves in the Cartesian $x-y-z$ space with fixed orientation. The workspace analysis is conducted on the basis of prescribed kinetostatic performances. Interval analysis based methods are used to compute the dextrous workspace and the largest cube enclosed in this workspace.

  12. Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines

    KAUST Repository

    Woźniak, Maciej

    2015-02-01

    This paper derives theoretical estimates of the computational cost of an isogeometric multi-frontal direct solver executed on parallel distributed memory machines. We show theoretically that for C^{p-1} global continuity of the isogeometric solution, both the computational cost and the communication cost of a direct solver are of order O(log(N) p^2) for the one dimensional (1D) case, O(N p^2) for the two dimensional (2D) case, and O(N^{4/3} p^2) for the three dimensional (3D) case, where N is the number of degrees of freedom and p is the polynomial order of the B-spline basis functions. The theoretical estimates are verified by numerical experiments performed with three parallel multi-frontal direct solvers: MUMPS, PaStiX and SuperLU, available through the PETIGA toolkit built on top of PETSc. Numerical results confirm these theoretical estimates both in terms of p and N. For a given problem size, the strong efficiency rapidly decreases as the number of processors increases, becoming about 20% for 256 processors for a 3D example with 128^3 unknowns and linear B-splines with C^0 global continuity, and 15% for a 3D example with 64^3 unknowns and quartic B-splines with C^3 global continuity. At the same time, one cannot arbitrarily increase the problem size, since the memory required by higher order continuity spaces is large, quickly consuming all the available memory resources even in the parallel distributed memory version. Numerical results also suggest that the use of distributed parallel machines is highly beneficial when solving higher order continuity spaces, although the number of processors that one can efficiently employ is somewhat limited.

  13. Development and application of a massively parallel KKR Green function method for large scale systems

    Energy Technology Data Exchange (ETDEWEB)

    Thiess, Alexander R.

    2011-12-19

    In this thesis we present the development of the self-consistent, full-potential Korringa-Kohn-Rostoker (KKR) Green function method KKRnano for calculating the electronic properties, magnetic interactions, and total energy including all electrons on the basis of density functional theory (DFT) on high-end massively parallelized high-performance computers, for supercells containing thousands of atoms without sacrifice of accuracy. KKRnano was used for the following two applications. The first application is centered in the field of dilute magnetic semiconductors. In this field a promising new material combination was identified: gadolinium-doped gallium nitride, which shows ferromagnetic ordering of colossal magnetic moments above room temperature. It quickly turned out that additional extrinsic defects induce the striking properties. However, the question of which kind of extrinsic defects are present in experimental samples is still unresolved. In order to shed light on this open question, we perform extensive studies of the most promising candidates: interstitial nitrogen and oxygen, as well as gallium vacancies. By analyzing the pairwise magnetic coupling among defects, it is shown that nitrogen and oxygen interstitials cannot support thermally stable ferromagnetic order. Gallium vacancies, on the other hand, facilitate an important coupling mechanism. The vacancies are found to induce large magnetic moments on all surrounding nitrogen sites, which then couple ferromagnetically both among themselves and with the gadolinium dopants. Based on a statistical evaluation it can be concluded that even small concentrations of gallium vacancies can lead to a distinct long-range ferromagnetic ordering. Beyond this important finding we present further indications from which we infer that gallium vacancies likely cause the striking ferromagnetic coupling of colossal magnetic moments in GaN:Gd. The second application deals with the phase-change material germanium

  14. Machine Learning Based Online Performance Prediction for Runtime Parallelization and Task Scheduling

    Energy Technology Data Exchange (ETDEWEB)

    Li, J; Ma, X; Singh, K; Schulz, M; de Supinski, B R; McKee, S A

    2008-10-09

    With the emerging many-core paradigm, parallel programming must extend beyond its traditional realm of scientific applications. Converting existing sequential applications as well as developing next-generation software requires assistance from hardware, compilers and runtime systems to exploit parallelism transparently within applications. These systems must decompose applications into tasks that can be executed in parallel and then schedule those tasks to minimize load imbalance. However, many systems lack a priori knowledge about the execution time of all tasks, which is needed to perform effective load balancing with low scheduling overhead. In this paper, we approach this fundamental problem using machine learning techniques, first generating performance models for all tasks and then applying those models to perform automatic performance prediction across program executions. We also extend an existing scheduling algorithm to use the generated task cost estimates for online task partitioning and scheduling. We implement the above techniques in the pR framework, which transparently parallelizes scripts in the popular R language, and evaluate their performance and overhead with both a real-world application and a large number of synthetic representative test scripts. Our experimental results show that our proposed approach significantly improves task partitioning and scheduling, with maximum improvements of 21.8%, 40.3% and 22.1% and average improvements of 15.9%, 16.9% and 4.2% for LMM (a real R application) and synthetic test cases with independent and dependent tasks, respectively.

  15. Enhanced computation method of topological smoothing on shared memory parallel machines

    Directory of Open Access Journals (Sweden)

    Mahmoudi Ramzi

    2011-01-01

    Full Text Available Abstract To prepare images for better segmentation, preprocessing applications such as smoothing are needed to reduce noise. In this paper, we present an enhanced computation method for smoothing 2D objects in the binary case. Unlike existing approaches, the proposed method provides parallel computation and better memory management, while preserving the topology (number of connected components) of the original image by using homotopic transformations defined in the framework of digital topology. We introduce an adapted parallelization strategy called split, distribute and merge (SDM), which allows efficient parallelization of a large class of topological operators. To achieve good speedup and better memory allocation, we paid attention to task scheduling and management. The work distributed during the smoothing process is done by a variable number of threads. Tests on a 2D grayscale image (512*512), using a shared memory parallel machine (SMPM) with 8 CPU cores (2× Xeon E5405 running at 2 GHz), showed a speedup of 5.2 with a cache success rate of 70%.
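
    A minimal shared-memory sketch of the split-distribute-merge strategy (illustrative: a one-pass box filter stands in for the homotopic smoothing operator, with a one-pixel halo per band):

        from concurrent.futures import ThreadPoolExecutor
        import numpy as np

        def sdm_smooth(img, n_threads=8):
            def smooth_band(lo, hi):
                # Split: take the band plus a one-pixel halo on each side.
                top, bot = max(lo - 1, 0), min(hi + 1, img.shape[0])
                pad = img[top:bot].astype(float)
                sm = pad.copy()
                sm[1:-1] = (pad[:-2] + pad[1:-1] + pad[2:]) / 3.0
                return lo, sm[lo - top: lo - top + (hi - lo)]
            bounds = np.linspace(0, img.shape[0], n_threads + 1, dtype=int)
            result = img.astype(float).copy()
            # Distribute the bands over threads, then merge the results.
            with ThreadPoolExecutor(n_threads) as pool:
                for lo, band in pool.map(lambda b: smooth_band(*b),
                                         zip(bounds[:-1], bounds[1:])):
                    result[lo:lo + len(band)] = band
            return result

        print(sdm_smooth(np.random.rand(512, 512)).shape)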

  16. Massively parallel read mapping on GPUs with the q-group index and PEANUT

    NARCIS (Netherlands)

    J. Köster (Johannes); S. Rahmann (Sven)

    2014-01-01

    We present the q-group index, a novel data structure for read mapping tailored towards graphics processing units (GPUs) with a small memory footprint and efficient parallel algorithms for querying and building. On top of the q-group index we introduce PEANUT, a highly parallel GPU-based

  17. Noninvasive prenatal diagnosis of fetal trisomy 21 by allelic ratio analysis using targeted massively parallel sequencing of maternal plasma DNA.

    Directory of Open Access Journals (Sweden)

    Gary J W Liao

    Full Text Available BACKGROUND: Plasma DNA obtained from a pregnant woman contains a mixture of maternal and fetal DNA. The fetal DNA proportion in maternal plasma is relatively consistent as determined using polymorphic genetic markers across different chromosomes in euploid pregnancies. For aneuploid pregnancies, the observed fetal DNA proportion measured using polymorphic genetic markers for the aneuploid chromosome would be perturbed. In this study, we investigated the feasibility of analyzing single nucleotide polymorphisms using targeted massively parallel sequencing to detect such perturbations in mothers carrying trisomy 21 fetuses. METHODOLOGY/PRINCIPAL FINDINGS: DNA was extracted from plasma samples collected from fourteen pregnant women carrying singleton fetuses. Hybridization-based targeted sequencing was used to enrich 2 906 single nucleotide polymorphism loci on chr7, chr13, chr18 and chr21. Plasma DNA libraries with and without target enrichment were analyzed by massively parallel sequencing. Genomic DNA samples of both the mother and fetus for each case were genotyped by single nucleotide polymorphism microarray analysis. For the targeted regions, the mean sequencing depth of the enriched samples was 225-fold higher than that of the non-enriched samples. From the targeted sequencing data, the ratio between fetus-specific and shared alleles increased by approximately 2-fold on chr21 in the paternally-derived trisomy 21 case. In comparison, the ratio is decreased by approximately 11% on chr21 in the maternally-derived trisomy 21 cases but with much overlap with the ratio of the euploid cases. Computer simulation revealed the relationship between the fetal DNA proportion, the number of informative alleles and the depth of sequencing. CONCLUSIONS/SIGNIFICANCE: Targeted massively parallel sequencing of single nucleotide polymorphism loci in maternal plasma DNA is a potential approach for trisomy 21 detection. However, the method appears to be less

  18. Method and apparatus for obtaining stack traceback data for multiple computing nodes of a massively parallel computer system

    Science.gov (United States)

    Gooding, Thomas Michael; McCarthy, Patrick Joseph

    2010-03-02

    A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, nodes having identical instruction addresses being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.

  19. FAST-SeqS: a simple and efficient method for the detection of aneuploidy by massively parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Isaac Kinde

    Full Text Available Massively parallel sequencing of cell-free, maternal plasma DNA was recently demonstrated to be a safe and effective screening method for fetal chromosomal aneuploidies. Here, we report an improved sequencing method achieving significantly increased throughput and decreased cost by replacing laborious sequencing library preparation steps with PCR employing a single primer pair designed to amplify a discrete subset of repeated regions. Using this approach, samples containing as little as 4% trisomy 21 DNA could be readily distinguished from euploid samples.

  20. Massively Parallel Linear Stability Analysis with P_ARPACK for 3D Fluid Flow Modeled with MPSalsa

    Energy Technology Data Exchange (ETDEWEB)

    Lehoucq, R.B.; Salinger, A.G.

    1998-10-13

    We are interested in the stability of three-dimensional fluid flows to small disturbances. One computational approach is to solve a sequence of large sparse generalized eigenvalue problems for the leading modes that arise from discretizing the differential equations modeling the flow. The modes of interest are the eigenvalues of largest real part and their associated eigenvectors. We discuss our work to develop an efficient and reliable eigensolver for use by the massively parallel simulation code MPSalsa. MPSalsa allows simulation of complex 3D fluid flow, heat transfer, and mass transfer with detailed bulk fluid and surface chemical reaction kinetics.

  1. Efficient Parallel Sorting for Migrating Birds Optimization When Solving Machine-Part Cell Formation Problems

    Directory of Open Access Journals (Sweden)

    Ricardo Soto

    2016-01-01

    Full Text Available The Machine-Part Cell Formation Problem (MPCFP) is an NP-hard optimization problem that consists in grouping machines and parts into a set of cells, so that each cell can operate independently and intercell movements are minimized. This problem has largely been tackled in the literature by using different techniques, ranging from classic methods such as linear programming to more modern nature-inspired metaheuristics. In this paper, we present an efficient parallel version of the Migrating Birds Optimization metaheuristic for solving the MPCFP. Migrating Birds Optimization is a population metaheuristic based on the V-flight formation of migrating birds, which has been shown to be an energy-efficient formation. This approach is enhanced by the incorporation of parallel procedures that notably improve the performance of the several sorting processes performed by the metaheuristic. We perform computational experiments on 1080 benchmarks resulting from the combination of 90 well-known MPCFP instances with 12 sorting configurations, with and without threads. The results are promising: the proposal reaches the global optimum on all instances, while the solving time with respect to a nonparallel approach is notably reduced.
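
    There are many ways to parallelize such sorting steps; a minimal sketch (illustrative, not the paper's implementation) sorts chunks in worker processes and then k-way merges the sorted runs:

        import heapq
        from concurrent.futures import ProcessPoolExecutor

        def parallel_sort(values, workers=4):
            # Split into chunks, sort each chunk in its own process,
            # then merge the sorted runs with a k-way merge.
            step = max(1, len(values) // workers)
            chunks = [values[i:i + step] for i in range(0, len(values), step)]
            with ProcessPoolExecutor(workers) as pool:
                runs = list(pool.map(sorted, chunks))
            return list(heapq.merge(*runs))

        if __name__ == "__main__":
            print(parallel_sort([9, 1, 7, 3, 8, 2, 6, 5, 4, 0]))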

  2. A Multiobjective Optimization Approach to Solve a Parallel Machines Scheduling Problem

    Directory of Open Access Journals (Sweden)

    Xiaohui Li

    2010-01-01

    Full Text Available A multiobjective optimization problem which focuses on parallel machine scheduling is considered. This problem consists of scheduling independent jobs on identical parallel machines with release dates, due dates, and sequence-dependent setup times. The preemption of jobs is forbidden. The aim is to minimize two different objectives: makespan and total tardiness. The contribution of this paper is first to propose a new mathematical model for this specific problem. Then, since this problem is NP-hard in the strong sense, two well-known approximation methods, NSGA-II and SPEA-II, are adopted to solve it. Experimental results show the advantages of NSGA-II for the studied problem. An exact method is then applied and compared with the NSGA-II algorithm in order to assess the efficiency of the latter. Computational experiments show that on all the tested instances, our NSGA-II algorithm was able to find the optimal solutions.

  3. Integer-encoded massively parallel processing of fast-learning fuzzy ARTMAP neural networks

    Science.gov (United States)

    Bahr, Hubert A.; DeMara, Ronald F.; Georgiopoulos, Michael

    1997-04-01

    In this paper we develop techniques that are suitable for the parallel implementation of Fuzzy ARTMAP networks. Speedup and learning performance results are provided for execution on a DECmpp/Sx-1208 parallel processor consisting of a DEC RISC Workstation Front-End and MasPar MP-1 Back-End with 8,192 processors. Experiments with the parallel implementation were conducted on the Letters benchmark database developed by Frey and Slate. The results indicate a speedup on the order of 1000-fold, which allows a combined training and testing time of under four minutes.

  4. Minimizing the total tardiness for the tool change scheduling problem on parallel machines

    Directory of Open Access Journals (Sweden)

    Antonio Costa

    2016-04-01

    Full Text Available This paper deals with the total tardiness minimization problem in a parallel-machine manufacturing environment where tool change operations have to be scheduled along with the jobs. This issue belongs to the family of scheduling problems under deterministic machine availability restrictions. A new model that considers the effects of tool wear on the quality characteristics of the worked product is proposed. Since no mathematical-programming-based approach has been developed in the literature so far, two distinct mixed integer linear programming models, able to schedule jobs as well as tool change activities along the provided production horizon, have been devised. The former is an adaptation of a well-known model presented in the relevant literature for the single machine scheduling problem with tool changes. The latter has been specifically developed for the issue at hand. After a theoretical analysis aimed at revealing the differences between the proposed mathematical models in terms of computational complexity, an extensive experimental campaign was carried out to assess the performance of the proposed methods in terms of CPU time. The obtained results were statistically analyzed through a properly arranged ANOVA.

  5. Dispatching Equal-Length Jobs to Parallel Machines to Maximize Throughput

    Science.gov (United States)

    Bunde, David P.; Goldwasser, Michael H.

    We consider online, nonpreemptive scheduling of equal-length jobs on parallel machines. Jobs have arbitrary release times and deadlines, and a scheduler's goal is to maximize the number of completed jobs (Pm | r_j, p_j = p | ∑(1 − U_j)). This problem has been previously studied under two distinct models. In the first, a scheduler must provide immediate notification to a released job as to whether it is accepted into the system. In a stricter model, a scheduler must provide an immediate decision for an accepted job, selecting both the time interval and machine on which it will run. We examine an intermediate model in which a scheduler immediately dispatches an accepted job to a machine, but without committing it to a specific time interval. We present a natural algorithm that is optimally competitive for m = 2. For the special case of unit-length jobs, it achieves competitive ratios for m ≥ 2 that are strictly better than the lower bounds for the immediate decision model.

  6. Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations

    Energy Technology Data Exchange (ETDEWEB)

    Detrixhe, Miles, E-mail: mdetrixhe@engineering.ucsb.edu [Department of Mechanical Engineering (United States); University of California Santa Barbara, Santa Barbara, CA, 93106 (United States); Gibou, Frédéric, E-mail: fgibou@engineering.ucsb.edu [Department of Mechanical Engineering (United States); University of California Santa Barbara, Santa Barbara, CA, 93106 (United States); Department of Computer Science (United States); Department of Mathematics (United States)

    2016-10-01

    The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
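
    In one dimension the method reduces to a few lines; the following minimal sketch (illustrative and serial, not the paper's hybrid parallel algorithm) solves the eikonal equation |u'(x)| = f(x) by alternating sweep directions:

        def fast_sweep_1d(f, h, source):
            # Gauss-Seidel sweeps for |u'(x)| = f(x) on a uniform grid of
            # spacing h, with u = 0 at the source index.
            n = len(f)
            u = [float("inf")] * n
            u[source] = 0.0
            for _ in range(2):    # a couple of alternating passes suffice in 1D
                for i in list(range(n)) + list(range(n - 1, -1, -1)):
                    if i > 0:
                        u[i] = min(u[i], u[i - 1] + f[i] * h)
                    if i < n - 1:
                        u[i] = min(u[i], u[i + 1] + f[i] * h)
            return u

        print(fast_sweep_1d([1.0] * 9, h=0.5, source=4))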

  8. Massive problem reports mining and analysis based parallelism for similar search

    Science.gov (United States)

    Zhou, Ya; Hu, Cailin; Xiong, Han; Wei, Xiafei; Li, Ling

    2017-05-01

    Massive numbers of problem reports and their solutions accumulate over time and are continuously collected in XML Spreadsheet (XMLSS) format from enterprises and organizations; they record comprehensive descriptions of problems, which can help technicians trace problems and their solutions. Effectively managing and analyzing these massive semi-structured data in order to provide solutions to similar problems, support decisions on immediate problems, and assist product optimization during hardware and software maintenance is a significant and challenging issue. For this purpose, we built a data management system to manage, mine and analyze these data, whose search results can be categorized and organized into several categories so that users can quickly find where the results of interest are located. Experimental results demonstrate that this system greatly outperforms a traditional centralized management system in performance and in its capability to adapt to heterogeneous data. Moreover, by re-extracting topics, it enables each cluster to be described more precisely and reasonably.

  9. Massively parallel processing on the Intel Paragon system: One tool in achieving the goals of the Human Genome Project

    Energy Technology Data Exchange (ETDEWEB)

    Ecklund, D.J. [Intel Supercomputer Systems Division, Beaverton, OR (United States)

    1993-12-31

    A massively parallel computing system is one tool that has been adopted by researchers in the Human Genome Project. This tool is one of many in a toolbox of theories, algorithms, and systems that are used to attack the many questions posed by the project. A good tool functions well when applied alone to the problem for which it was devised. A superior tool achieves its solitary goal, and supports and interacts with other tools to achieve goals beyond the scope of any individual tool. The author believes that Intel's massively parallel Paragon™ XP/S system is a superior tool. This paper presents specific requirements for a superior computing tool for the Human Genome Project (HGP) and shows how the Paragon system addresses these requirements. Computing requirements for HGP are based on three factors: (1) computing requirements of algorithms currently used in sequence homology, protein folding, and database insertion/retrieval; (2) estimates of the computing requirements of new applications arising from evolving biological theories; and (3) the requirements for facilities that support collaboration among scientists in a project of this magnitude. The Paragon system provides many hardware and software features that effectively address these requirements.

  10. A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification

    Science.gov (United States)

    Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun

    2016-12-01

    Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
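
    In MapReduce terms, the strong classifier's decision is a map of weighted weak-classifier votes followed by a reducing sum; a minimal sketch with stand-in learners (not the paper's Hadoop implementation):

        from functools import reduce

        def map_votes(weak_learners, alphas, x):
            # Map: each weak learner emits its weighted vote (+a or -a) for x.
            return [a * (1 if h(x) else -1) for h, a in zip(weak_learners, alphas)]

        def reduce_vote(votes):
            # Reduce: sum the votes; a positive sum means the positive class.
            return 1 if reduce(lambda s, v: s + v, votes, 0.0) > 0 else 0

        learners = [lambda x: x[0] > 0.5,            # stand-ins for trained
                    lambda x: x[1] > 0.2,            # BP neural networks
                    lambda x: x[0] + x[1] > 1.0]
        alphas = [0.8, 0.4, 0.6]                     # Adaboost weights
        print(reduce_vote(map_votes(learners, alphas, x=(0.7, 0.1))))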

  12. Enhancing Radio Access Network Performance over LTE-A for Machine-to-Machine Communications under Massive Access

    Directory of Open Access Journals (Sweden)

    Fatemah Alsewaidi

    2016-01-01

    Full Text Available The expected tremendous growth of machine-to-machine (M2M) devices will require solutions to improve random access channel (RACH) performance. Recent studies have shown that radio access network (RAN) performance is degraded under the high density of devices. In this paper, we propose three methods to enhance RAN performance for M2M communications over the LTE-A standard. The first method employs a different value for the physical RACH configuration index to increase random access opportunities. The second method addresses a heterogeneous network by using a number of picocells to increase resources and offload control traffic from the macro base station. The third method involves aggregation points and addresses their effect on RAN performance. Based on evaluation results, our methods improved RACH performance in terms of the access success probability and average access delay.

  13. Massively parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.

    2001-08-31

    This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented in the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce higher-resolution results and reveal flow patterns that cannot be obtained using coarse-grid models.

  14. A Novel Algorithm for Solving the Multidimensional Neutron Transport Equation on Massively Parallel Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Azmy, Yousry

    2014-06-10

    We employ the Integral Transport Matrix Method (ITMM) as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iteration (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells' fluxes and between the cells' and boundary surfaces' fluxes. The main goals of this work are to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and parallel performance of the developed methods with increasing number of processes, P. The fastest observed parallel solution method, Parallel Gauss-Seidel (PGS), was used in a weak scaling comparison with the PARTISN transport code, which uses the source iteration (SI) scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method, even without acceleration/preconditioning, is competitive for optically thick problems as P is increased to the tens of thousands range. For the most optically thick cells tested, PGS reduced execution time by an approximate factor of three for problems with more than 130 million computational cells on P = 32,768. Moreover, the SI-DSA execution-time trend generally rises more steeply with increasing P than the PGS trend. Furthermore, the PGS method outperforms SI for the periodic heterogeneous layers (PHL) configuration problems. The PGS method outperforms SI and SI-DSA on as few as P = 16 for PHL problems and reduces execution time by a factor of ten or more for all problems considered with more than 2 million computational cells on P = 4,096.

  15. Compliance modeling and analysis of a 3-RPS parallel kinematic machine module

    Science.gov (United States)

    Zhang, Jun; Zhao, Yanqin; Dai, Jiansheng

    2014-07-01

    The compliance modeling and rigidity performance evaluation of lower-mobility parallel manipulators remain two major challenges at the conceptual design stage due to their geometric complexity. Using screw theory, this paper explores the compliance modeling and eigencompliance evaluation of a newly patented 1T2R spindle head whose topological architecture is a 3-RPS parallel mechanism. The kinematic definitions and inverse position analysis are briefly addressed in the first place to provide the necessary information for compliance modeling. By considering the 3-RPS parallel kinematic machine (PKM) as a typical compliant parallel device, whose three limb assemblages have bending, extending and torsional deflections, an analytical compliance model for the spindle head is established with screw theory and the analytical stiffness matrix of the platform is formulated. Based on the eigenscrew decomposition, the eigencompliance and corresponding eigenscrews are analyzed and the platform's compliance properties are physically interpreted as the suspension of six screw springs. The distributions of the stiffness constants of the six screw springs throughout the workspace are predicted in a quick manner with a piece-by-piece calculation algorithm. The numerical simulation reveals a strong dependency of the platform's compliance on its configuration, in that the distributions are axially symmetric due to structural features. In the final stage, the effects of design variables such as structural, configurational and dimensional parameters on system rigidity characteristics are investigated with the purpose of providing useful information for the structural design and performance improvement of the PKM. Compared with previous efforts in compliance analysis of PKMs, the present methodology is more intuitive and universal and thus can be easily applied to evaluate the overall rigidity performance of other PKMs with high efficiency.

  16. An Interval Analysis Based Study for the Design and the Comparison of 3-DOF Parallel Kinematic Machines

    CERN Document Server

    Chablat, Damien; Majou, Félix; Merlet, Jean-Pierre

    2004-01-01

    This paper addresses an interval analysis based study that is applied to the design and the comparison of 3-DOF parallel kinematic machines. Two design criteria are used, (i) a regular workspace shape and, (ii) a kinetostatic performance index that needs to be as homogeneous as possible throughout the workspace. The interval analysis based method takes these two criteria into account: on the basis of prescribed kinetostatic performances, the workspace is analysed to find out the largest regular dextrous workspace enclosed in the Cartesian workspace. An algorithm describing this method is introduced. Two 3-DOF translational parallel mechanisms designed for machining applications are compared using this method. The first machine features three fixed linear joints which are mounted orthogonally and the second one features three linear joints which are mounted in parallel. In both cases, the mobile platform moves in the Cartesian x-y-z space with fixed orientation.

  17. Use of massively parallel computing to improve modelling accuracy within the nuclear sector

    Directory of Open Access Journals (Sweden)

    L M Evans

    2016-06-01

    This work presents recent advancements in three techniques: uncertainty quantification (UQ), cellular automata finite element (CAFE), and image-based finite element methods (IBFEM). Case studies are presented demonstrating their suitability for use in nuclear engineering, made possible by advancements in parallel computing hardware that is projected to be available to industry within the next decade at a cost on the order of $100k.

  18. Spatiotemporal Domain Decomposition for Massive Parallel Computation of Space-Time Kernel Density

    Science.gov (United States)

    Hohl, A.; Delmelle, E. M.; Tang, W.

    2015-07-01

    Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. In order to avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers to include adjacent data points that are within the spatial and temporal kernel bandwidths. Then, we quantify the computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of Dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density reaches substantial speedup compared to sequential processing, and achieves high levels of workload balance among processors due to great accuracy in quantifying computational intensity. Our approach is portable to other space-time analytical tests.
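
    For reference, the space-time kernel density at one evaluation point can be sketched as follows (Epanechnikov kernels are an illustrative choice here, not necessarily the study's):

        import math

        def stkd(points, x, y, t, hs, ht):
            # Sum separable space and time kernels over all events within
            # the spatial bandwidth hs and temporal bandwidth ht.
            total = 0.0
            for xi, yi, ti in points:
                ds2 = ((x - xi) ** 2 + (y - yi) ** 2) / hs ** 2
                dt2 = ((t - ti) / ht) ** 2
                if ds2 < 1 and dt2 < 1:
                    total += (2 / math.pi) * (1 - ds2) * 0.75 * (1 - dt2)
            return total / (len(points) * hs ** 2 * ht)

        events = [(0.0, 0.0, 1.0), (0.5, 0.2, 1.5), (3.0, 3.0, 9.0)]
        print(stkd(events, x=0.2, y=0.1, t=1.2, hs=1.0, ht=2.0))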

  19. High performance domain decomposition methods on massively parallel architectures with FreeFEM++

    OpenAIRE

    Jolivet, Pierre; Dolean, Victorita; Hecht, Frédéric; Nataf, Frédéric; Prud'Homme, Christophe; Spillane, Nicole

    2012-01-01

    In this document, we present a parallel implementation in Freefem++ of scalable two-level domain decomposition methods. Numerical studies with highly heterogeneous problems are then performed on large clusters in order to assess the performance of our code.

  20. Machine learning and parallelism in the reconstruction of LHCb and its upgrade

    CERN Document Server

    Stahl, Marian

    2017-01-01

    After a highly successful first data-taking period at the LHC, the LHCb experiment developed a new trigger strategy with real-time reconstruction, alignment and calibration for Run II. This strategy relies on offline-like track reconstruction in the high level trigger, making a separate offline event reconstruction unnecessary. To enable such a reconstruction, and additionally to keep up with a higher event rate due to the accelerator upgrade, the time used by the track reconstruction had to be decreased. Timing improvements have in part been achieved by utilizing parallel computing techniques, which are described in this document by considering two example applications. Despite the decreased computing time, the reconstruction quality in terms of reconstruction efficiency and fake rate could be improved in several places. Two applications of fast machine learning techniques are highlighted, refining track candidate selection at the early stages of the reconstruction.

  1. Variable Neighborhood Search for Parallel Machines Scheduling Problem with Step Deteriorating Jobs

    Directory of Open Access Journals (Sweden)

    Wenming Cheng

    2012-01-01

    Full Text Available In many real scheduling environments, a job processed later needs a longer time than the same job when it starts earlier. This phenomenon, known as scheduling with deteriorating jobs, arises in many industrial applications. In this paper, we study a scheduling problem of minimizing the total completion time on identical parallel machines, where the processing time of a job is a step function of its starting time and a deteriorating date that is individual to each job. First, a mixed integer programming model is presented for the problem. Then, a modified weight-combination search algorithm and a variable neighborhood search are employed to yield optimal or near-optimal schedules. To evaluate the performance of the proposed algorithms, computational experiments are performed on randomly generated test instances. The computational results show that the proposed approaches obtain near-optimal solutions in a reasonable computational time, even for large-sized problems.
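
    The step-deterioration effect is easy to state in code. The sketch below, with hypothetical data a (normal processing times), b (penalties) and d (deteriorating dates), evaluates total completion time on identical machines and applies a plain swap-based local search, a much simpler relative of the proposed variable neighborhood search.

    ```python
    import random

    def total_completion_time(schedule, a, b, d):
        """schedule: one job sequence per machine; step-deteriorating times."""
        total = 0.0
        for jobs in schedule:
            t = 0.0
            for j in jobs:
                t += a[j] if t <= d[j] else a[j] + b[j]   # step function of start time
                total += t
        return total

    def local_search(schedule, a, b, d, iters=2000, seed=1):
        rng = random.Random(seed)
        best = total_completion_time(schedule, a, b, d)
        for _ in range(iters):
            m1, m2 = rng.randrange(len(schedule)), rng.randrange(len(schedule))
            if not schedule[m1] or not schedule[m2]:
                continue
            i, k = rng.randrange(len(schedule[m1])), rng.randrange(len(schedule[m2]))
            schedule[m1][i], schedule[m2][k] = schedule[m2][k], schedule[m1][i]
            cost = total_completion_time(schedule, a, b, d)
            if cost < best:
                best = cost                                # keep improving swaps
            else:
                schedule[m1][i], schedule[m2][k] = schedule[m2][k], schedule[m1][i]
        return schedule, best

    rng = random.Random(0)
    n, m = 30, 3
    a = [rng.uniform(1, 10) for _ in range(n)]             # normal processing times
    b = [rng.uniform(1, 5) for _ in range(n)]              # deterioration penalties
    d = [rng.uniform(5, 40) for _ in range(n)]             # deteriorating dates
    init = [list(range(i, n, m)) for i in range(m)]
    print(local_search(init, a, b, d)[1])
    ```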

  2. Nonlinear Elastodynamic Behaviour Analysis of High-Speed Spatial Parallel Coordinate Measuring Machines

    Directory of Open Access Journals (Sweden)

    Xiulong Chen

    2012-10-01

    Full Text Available In order to study the elastodynamic behaviour of 4-UPS-UPU (universal joint-prismatic pair-spherical joint / universal joint-prismatic pair-universal joint) high-speed spatial PCMMs (parallel coordinate measuring machines), a nonlinear time-varying dynamics model, which comprehensively considers geometric nonlinearity and the rigid-flexible coupling effect, is derived using Lagrange equations and finite element methods. Based on the Newmark method, the kinematic output response of 4-UPS-UPU PCMMs is illustrated through numerical simulation. The simulation results show that the flexibility of the links has a significant impact on the system dynamics response. This research provides an important theoretical basis for the optimization design and vibration control of 4-UPS-UPU PCMMs.

  3. An Adaptive Method For Texture Characterization In Medical Images Implemented on a Parallel Virtual Machine

    Directory of Open Access Journals (Sweden)

    Socrates A. Mylonas

    2003-06-01

    Full Text Available This paper describes the application of a new texture characterization algorithm for the segmentation of medical ultrasound images. The morphology of these images poses significant problems for the application of traditional image processing techniques and their analysis has been the subject of research for several years. The basis of the algorithm is an optimum signal modelling algorithm (Least Mean Squares-based), which estimates a set of parameters from small image regions. The algorithm has been converted to a structure suitable for implementation on a Parallel Virtual Machine (PVM) consisting of a Network of Workstations (NoW), to improve processing speed. Tests were initially carried out on standard textured images. This paper describes preliminary results of the application of the algorithm in texture discrimination and segmentation of medical ultrasound images. The images examined are primarily used in the diagnosis of carotid plaques, which are linked to the risk of stroke.

  4. DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI.

    Science.gov (United States)

    Liu, Yongchao; Schmidt, Bertil; Maskell, Douglas L

    2011-03-29

    Next-generation sequencing technologies have led to the high-throughput production of sequence data (reads) at low cost. However, these reads are significantly shorter and more error-prone than conventional Sanger shotgun reads. This poses a challenge for the de novo assembly in terms of assembly quality and scalability for large-scale short read datasets. We present DecGPU, the first parallel and distributed error correction algorithm for high-throughput short reads (HTSRs) using a hybrid combination of CUDA and MPI parallel programming models. DecGPU provides CPU-based and GPU-based versions, where the CPU-based version employs coarse-grained and fine-grained parallelism using the MPI and OpenMP parallel programming models, and the GPU-based version takes advantage of the CUDA and MPI parallel programming models and employs a hybrid CPU+GPU computing model to maximize the performance by overlapping the CPU and GPU computation. The distributed feature of our algorithm makes it feasible and flexible for the error correction of large-scale HTSR datasets. Using simulated and real datasets, our algorithm demonstrates superior performance, in terms of error correction quality and execution speed, to the existing error correction algorithms. Furthermore, when combined with Velvet and ABySS, the resulting DecGPU-Velvet and DecGPU-ABySS assemblers demonstrate the potential of our algorithm to improve de novo assembly quality for de-Bruijn-graph-based assemblers. DecGPU is publicly available open-source software, written in CUDA C++ and MPI. The experimental results suggest that DecGPU is an effective and feasible error correction algorithm to tackle the flood of short reads produced by next-generation sequencing technologies.
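
    For intuition, here is a toy k-mer-spectrum corrector in plain Python; DecGPU itself uses a distributed spectrum over CUDA and MPI, so the cutoff value and the single-substitution repair below are simplifying assumptions, not the tool's actual procedure.

    ```python
    from collections import Counter
    from itertools import product

    # Toy k-mer-spectrum error correction: k-mers seen fewer than `cutoff`
    # times are deemed erroneous; we try the single base substitution that
    # makes every overlapping k-mer solid again.

    def kmer_spectrum(reads, k):
        counts = Counter()
        for r in reads:
            for i in range(len(r) - k + 1):
                counts[r[i:i + k]] += 1
        return counts

    def is_solid(read, k, counts, cutoff):
        return all(counts[read[i:i + k]] >= cutoff for i in range(len(read) - k + 1))

    def correct(read, k, counts, cutoff=2):
        if is_solid(read, k, counts, cutoff):
            return read
        for pos, base in product(range(len(read)), "ACGT"):
            if base != read[pos]:
                cand = read[:pos] + base + read[pos + 1:]
                if is_solid(cand, k, counts, cutoff):
                    return cand
        return read  # uncorrectable with a single substitution

    reads = ["ACGTACGTACGT"] * 20 + ["ACGTACCTACGT"]  # last read carries one error
    counts = kmer_spectrum(reads, k=5)
    print(correct(reads[-1], 5, counts))              # recovers ACGTACGTACGT
    ```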

  5. DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI

    Directory of Open Access Journals (Sweden)

    Schmidt Bertil

    2011-03-01

    Full Text Available Abstract Background Next-generation sequencing technologies have led to the high-throughput production of sequence data (reads) at low cost. However, these reads are significantly shorter and more error-prone than conventional Sanger shotgun reads. This poses a challenge for the de novo assembly in terms of assembly quality and scalability for large-scale short read datasets. Results We present DecGPU, the first parallel and distributed error correction algorithm for high-throughput short reads (HTSRs) using a hybrid combination of CUDA and MPI parallel programming models. DecGPU provides CPU-based and GPU-based versions, where the CPU-based version employs coarse-grained and fine-grained parallelism using the MPI and OpenMP parallel programming models, and the GPU-based version takes advantage of the CUDA and MPI parallel programming models and employs a hybrid CPU+GPU computing model to maximize the performance by overlapping the CPU and GPU computation. The distributed feature of our algorithm makes it feasible and flexible for the error correction of large-scale HTSR datasets. Using simulated and real datasets, our algorithm demonstrates superior performance, in terms of error correction quality and execution speed, to the existing error correction algorithms. Furthermore, when combined with Velvet and ABySS, the resulting DecGPU-Velvet and DecGPU-ABySS assemblers demonstrate the potential of our algorithm to improve de novo assembly quality for de-Bruijn-graph-based assemblers. Conclusions DecGPU is publicly available open-source software, written in CUDA C++ and MPI. The experimental results suggest that DecGPU is an effective and feasible error correction algorithm to tackle the flood of short reads produced by next-generation sequencing technologies.

  6. A Heuristic for the Job Scheduling Problem with a Common Due Window on Parallel and Non-Identical Machines

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    In this paper, we give a mathematical model for the earliness-tardiness job scheduling problem with a common due window on parallel and non-identical machines. Because the job scheduling problem discussed in the paper contains a makespan minimization problem, which is NP-complete on parallel and uniform machines, a heuristic algorithm is presented, after proving an important theorem, to find an approximate solution for the scheduling problem. Two numerical examples illustrate that the heuristic algorithm is very useful and effective in obtaining a near-optimal solution.

  7. OpenCL Implementation of a Parallel Universal Kriging Algorithm for Massive Spatial Data Interpolation on Heterogeneous Systems

    Directory of Open Access Journals (Sweden)

    Fang Huang

    2016-06-01

    Full Text Available In some digital Earth engineering applications, spatial interpolation algorithms are required to process and analyze large amounts of data. Due to its powerful computing capacity, heterogeneous computing has been used in many applications for data processing in various fields. In this study, we explore the design and implementation of a parallel universal kriging spatial interpolation algorithm using the OpenCL programming model on heterogeneous computing platforms for massive geo-spatial data processing. This study focuses primarily on transforming the hotspots in serial algorithms, i.e., the universal kriging interpolation function, into the corresponding kernel function in OpenCL. We also employ parallelization and optimization techniques in our implementation to improve the code performance. Finally, based on the results of experiments performed on two different high performance heterogeneous platforms, i.e., an NVIDIA graphics processing unit system and an Intel Xeon Phi system (MIC), we show that the parallel universal kriging algorithm can achieve speedups of up to 40× with a single computing device and up to 80× with multiple devices.
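
    The hotspot being ported is, at its core, a dense linear solve per prediction point. A minimal numpy sketch of universal kriging with a linear drift and an exponential covariance follows; the sill and correlation-length parameters are illustrative.

    ```python
    import numpy as np

    def universal_kriging(xy, z, xy0, sill=1.0, corr_len=5.0):
        """Predict z at xy0 from samples (xy, z) with a linear drift 1, x, y."""
        n = len(xy)
        dist = np.linalg.norm(xy[:, None] - xy[None, :], axis=-1)
        K = sill * np.exp(-dist / corr_len)              # data-to-data covariance
        F = np.column_stack([np.ones(n), xy])            # drift basis at the data
        A = np.block([[K, F], [F.T, np.zeros((3, 3))]])  # universal kriging system
        k0 = sill * np.exp(-np.linalg.norm(xy - xy0, axis=1) / corr_len)
        rhs = np.concatenate([k0, [1.0, xy0[0], xy0[1]]])
        w = np.linalg.solve(A, rhs)[:n]                  # kriging weights
        return w @ z

    rng = np.random.default_rng(0)
    pts = rng.uniform(0, 10, (200, 2))
    vals = np.sin(pts[:, 0]) + 0.1 * rng.standard_normal(200)
    print(universal_kriging(pts, vals, np.array([5.0, 5.0])))
    ```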

  8. MIP models and hybrid algorithms for simultaneous job splitting and scheduling on unrelated parallel machines.

    Science.gov (United States)

    Eroglu, Duygu Yilmaz; Ozmutlu, H Cenk

    2014-01-01

    We developed mixed integer programming (MIP) models and hybrid genetic-local search algorithms for the scheduling problem of unrelated parallel machines with job-sequence- and machine-dependent setup times and with the job splitting property. The first contribution of this paper is to introduce novel algorithms that perform splitting and scheduling simultaneously, with a variable number of subjobs. We propose a simple chromosome structure constituted by random key numbers in the hybrid genetic-local search algorithm (GAspLA). Random key numbers are used frequently in genetic algorithms, but they create additional difficulty when hybrid factors in local search are implemented. We developed algorithms that adapt the results of local search into the genetic algorithm with a minimum relocation operation of the genes' random key numbers. This is the second contribution of the paper. The third contribution is three new MIP models that perform splitting and scheduling simultaneously. The fourth contribution is the implementation of GAspLAMIP, which lets us verify the optimality of GAspLA for the studied combinations. The proposed methods are tested on a set of problems taken from the literature, and the results validate the effectiveness of the proposed algorithms.
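
    As a sketch of how random-key chromosomes decode into parallel-machine schedules (Bean's classic scheme; the paper's GAspLA additionally creates sub-jobs from job splitting), consider:

    ```python
    import random

    # Classic random-key decoding: keys[j] lies in [0, m); the integer part
    # picks the machine, and jobs are sequenced on each machine by sorting
    # the fractional parts.

    def decode(keys, m):
        machines = [[] for _ in range(m)]
        for j, g in enumerate(keys):
            i = min(int(g), m - 1)        # guard against the edge case g == m
            machines[i].append((g - int(g), j))
        return [[j for _, j in sorted(mach)] for mach in machines]

    def makespan(schedule, p):
        """p[j][i] is the processing time of job j on (unrelated) machine i."""
        return max(sum(p[j][i] for j in jobs) for i, jobs in enumerate(schedule))

    rng = random.Random(7)
    n, m = 10, 3
    p = [[rng.uniform(1, 9) for _ in range(m)] for _ in range(n)]
    keys = [rng.uniform(0, m) for _ in range(n)]
    sched = decode(keys, m)
    print(sched, round(makespan(sched, p), 2))
    ```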

  9. Machine learning and parallelism in the reconstruction of LHCb and its upgrade

    CERN Document Server

    De Cian, Michel

    2016-01-01

    The LHCb detector at the LHC is a general purpose detector in the forward region with a focus on reconstructing decays of c- and b-hadrons. For Run II of the LHC, a new trigger strategy with a real-time reconstruction, alignment and calibration was employed. This was made possible by implementing an offline-like track reconstruction in the high level trigger. However, the ever increasing need for a higher throughput and the move to parallelism in the CPU architectures in the last years necessitated the use of vectorization techniques to achieve the desired speed and a more extensive use of machine learning to veto bad events early on. This document discusses selected improvements in computationally expensive parts of the track reconstruction, like the Kalman filter, as well as an improved approach to get rid of fake tracks using fast machine learning techniques. In the last part, a short overview of the track reconstruction challenges for the upgrade of LHCb, is given. Running a fully software-based trigger, a l...

  10. Parallel HOP: A Scalable Halo Finder for Massive Cosmological Data Sets

    CERN Document Server

    Skory, Stephen; Norman, Michael L; Coil, Alison L

    2010-01-01

    Modern N-body cosmological simulations contain billions ($10^9$) of dark matter particles. These simulations require hundreds to thousands of gigabytes of memory, and employ hundreds to tens of thousands of processing cores on many compute nodes. In order to study the distribution of dark matter in a cosmological simulation, the dark matter halos must be identified using a halo finder, which establishes the halo membership of every particle in the simulation. The resources required for halo finding are similar to the requirements for the simulation itself. In particular, simulations have become too extensive to use commonly-employed halo finders, such that the computational requirements to identify halos must now be spread across multiple nodes and cores. Here we present a scalable-parallel halo finding method called Parallel HOP for large-scale cosmological simulation data. Based on the halo finder HOP, it utilizes MPI and domain decomposition to distribute the halo finding workload across multiple compute n...

  11. Analysis and selection of optimal function implementations in massively parallel computer

    Science.gov (United States)

    Archer, Charles Jens; Peters, Amanda; Ratterman, Joseph D.

    2011-05-31

    An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.

  12. Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions

    Science.gov (United States)

    2013-02-13

    …regular decomposition, which employs a structured communication pattern well-suited for torus network architectures. The ACES III package uses the SIAL framework [14], [15] for distributed-memory tensor contractions in coupled-cluster theory. Like the NWChem TCE, SIAL uses tiling to extract parallelism from each tensor contraction. However, SIAL has a different runtime approach that does not require active messages, but rather…

  13. ESPRIT-Forest: Parallel clustering of massive amplicon sequence data in subquadratic time.

    Science.gov (United States)

    Cai, Yunpeng; Zheng, Wei; Yao, Jin; Yang, Yujie; Mai, Volker; Mao, Qi; Sun, Yijun

    2017-04-01

    The rapid development of sequencing technology has led to an explosive accumulation of genomic sequence data. Clustering is often the first step to perform in sequence analysis, and hierarchical clustering is one of the most commonly used approaches for this purpose. However, it is currently computationally expensive to perform hierarchical clustering of extremely large sequence datasets due to its quadratic time and space complexities. In this paper we developed a new algorithm called ESPRIT-Forest for parallel hierarchical clustering of sequences. The algorithm achieves subquadratic time and space complexity and maintains a high clustering accuracy comparable to the standard method. The basic idea is to organize sequences into a pseudo-metric based partitioning tree for sub-linear time searching of nearest neighbors, and then use a new multiple-pair merging criterion to construct clusters in parallel using multiple threads. The new algorithm was tested on the human microbiome project (HMP) dataset, currently one of the largest published microbial 16S rRNA sequence datasets. Our experiment demonstrated that with the power of parallel computing it is now computationally feasible to perform hierarchical clustering analysis of tens of millions of sequences. The software is available at http://www.acsu.buffalo.edu/∼yijunsun/lab/ESPRIT-Forest.html.

  14. Fully Parallel Self-Learning Analog Support Vector Machine Employing Compact Gaussian Generation Circuits

    Science.gov (United States)

    Zhang, Renyuan; Shibata, Tadashi

    2012-04-01

    An analog support vector machine (SVM) processor employing a fully parallel self-learning circuitry was developed for the classification of highly dimensional patterns. To implement a highly dimensional Gaussian function, which is the most powerful kernel function in classification algorithms but computationally expensive, a compact analog Gaussian generation circuit was developed. By employing this proposed Gaussian generation circuit, a fully parallel self-learning processor based on an SVM algorithm was built for 64 dimension pattern classification. The chip real estate occupied by the processor is very small. The object images from two classes were converted into 64 dimension vectors using the algorithm developed in a previous work and fed into the processor. The learning process autonomously proceeded without any clock-based control and self-converged within a single clock cycle of the system (at 10 MHz). Some test object images were used to verify the learning performance. According to the circuit simulation results, it was shown that all the test images were classified into correct classes in real time. A proof-of-concept chip was designed in a 0.18 µm complementary metal-oxide-semiconductor (CMOS) technology, and the performance of the proposed SVM processor was confirmed from the measurement results of the fabricated chips.

  15. Massively parallel digital high resolution melt for rapid and absolutely quantitative sequence profiling

    Science.gov (United States)

    Velez, Daniel Ortiz; Mack, Hannah; Jupe, Julietta; Hawker, Sinead; Kulkarni, Ninad; Hedayatnia, Behnam; Zhang, Yang; Lawrence, Shelley; Fraley, Stephanie I.

    2017-02-01

    In clinical diagnostics and pathogen detection, profiling of complex samples for low-level genotypes represents a significant challenge. Advances in speed, sensitivity, and extent of multiplexing of molecular pathogen detection assays are needed to improve patient care. We report the development of an integrated platform enabling the identification of bacterial pathogen DNA sequences in complex samples in less than four hours. The system incorporates a microfluidic chip and instrumentation to accomplish universal PCR amplification, High Resolution Melting (HRM), and machine learning within 20,000 picoliter scale reactions, simultaneously. Clinically relevant concentrations of bacterial DNA molecules are separated by digitization across 20,000 reactions and amplified with universal primers targeting the bacterial 16S gene. Amplification is followed by HRM sequence fingerprinting in all reactions, simultaneously. The resulting bacteria-specific melt curves are identified by Support Vector Machine learning, and individual pathogen loads are quantified. The platform reduces reaction volumes by 99.995% and achieves a greater than 200-fold increase in dynamic range of detection compared to traditional PCR HRM approaches. Type I and II error rates are reduced by 99% and 100% respectively, compared to intercalating dye-based digital PCR (dPCR) methods. This technology could impact a number of quantitative profiling applications, especially infectious disease diagnostics.
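
    The classification step can be sketched with an off-the-shelf SVM: each reaction contributes one melt curve, treated as a feature vector. The sigmoidal synthetic curves below are stand-ins for real HRM data, and the melting temperatures are invented.

    ```python
    import numpy as np
    from sklearn.svm import SVC

    np.random.seed(0)

    def melt_curve(tm, temps, noise=0.02):
        """Sigmoidal loss of fluorescence around melting temperature tm."""
        return 1 / (1 + np.exp((temps - tm) / 0.8)) + noise * np.random.randn(len(temps))

    temps = np.linspace(75, 95, 200)
    # Two hypothetical organisms with melting temperatures 82 C and 86 C.
    X = np.array([melt_curve(tm, temps) for tm in [82] * 50 + [86] * 50])
    y = np.array([0] * 50 + [1] * 50)

    clf = SVC(kernel="rbf", gamma="scale").fit(X[::2], y[::2])   # train on half
    print("held-out accuracy:", clf.score(X[1::2], y[1::2]))
    ```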

  16. Massively parallel digital high resolution melt for rapid and absolutely quantitative sequence profiling

    Science.gov (United States)

    Velez, Daniel Ortiz; Mack, Hannah; Jupe, Julietta; Hawker, Sinead; Kulkarni, Ninad; Hedayatnia, Behnam; Zhang, Yang; Lawrence, Shelley; Fraley, Stephanie I.

    2017-01-01

    In clinical diagnostics and pathogen detection, profiling of complex samples for low-level genotypes represents a significant challenge. Advances in speed, sensitivity, and extent of multiplexing of molecular pathogen detection assays are needed to improve patient care. We report the development of an integrated platform enabling the identification of bacterial pathogen DNA sequences in complex samples in less than four hours. The system incorporates a microfluidic chip and instrumentation to accomplish universal PCR amplification, High Resolution Melting (HRM), and machine learning within 20,000 picoliter scale reactions, simultaneously. Clinically relevant concentrations of bacterial DNA molecules are separated by digitization across 20,000 reactions and amplified with universal primers targeting the bacterial 16S gene. Amplification is followed by HRM sequence fingerprinting in all reactions, simultaneously. The resulting bacteria-specific melt curves are identified by Support Vector Machine learning, and individual pathogen loads are quantified. The platform reduces reaction volumes by 99.995% and achieves a greater than 200-fold increase in dynamic range of detection compared to traditional PCR HRM approaches. Type I and II error rates are reduced by 99% and 100% respectively, compared to intercalating dye-based digital PCR (dPCR) methods. This technology could impact a number of quantitative profiling applications, especially infectious disease diagnostics. PMID:28176860

  17. Massively parallel digital high resolution melt for rapid and absolutely quantitative sequence profiling.

    Science.gov (United States)

    Velez, Daniel Ortiz; Mack, Hannah; Jupe, Julietta; Hawker, Sinead; Kulkarni, Ninad; Hedayatnia, Behnam; Zhang, Yang; Lawrence, Shelley; Fraley, Stephanie I

    2017-02-08

    In clinical diagnostics and pathogen detection, profiling of complex samples for low-level genotypes represents a significant challenge. Advances in speed, sensitivity, and extent of multiplexing of molecular pathogen detection assays are needed to improve patient care. We report the development of an integrated platform enabling the identification of bacterial pathogen DNA sequences in complex samples in less than four hours. The system incorporates a microfluidic chip and instrumentation to accomplish universal PCR amplification, High Resolution Melting (HRM), and machine learning within 20,000 picoliter scale reactions, simultaneously. Clinically relevant concentrations of bacterial DNA molecules are separated by digitization across 20,000 reactions and amplified with universal primers targeting the bacterial 16S gene. Amplification is followed by HRM sequence fingerprinting in all reactions, simultaneously. The resulting bacteria-specific melt curves are identified by Support Vector Machine learning, and individual pathogen loads are quantified. The platform reduces reaction volumes by 99.995% and achieves a greater than 200-fold increase in dynamic range of detection compared to traditional PCR HRM approaches. Type I and II error rates are reduced by 99% and 100% respectively, compared to intercalating dye-based digital PCR (dPCR) methods. This technology could impact a number of quantitative profiling applications, especially infectious disease diagnostics.

  18. SIESTA-PEXSI: Massively parallel method for efficient and accurate ab initio materials simulation without matrix diagonalization

    CERN Document Server

    Lin, Lin; Huhs, Georg; Yang, Chao

    2014-01-01

    We describe a scheme for efficient large-scale electronic-structure calculations based on the combination of the pole expansion and selected inversion (PEXSI) technique with the SIESTA method, which uses numerical atomic orbitals within the Kohn-Sham density functional theory (KSDFT) framework. The PEXSI technique can efficiently utilize the sparsity pattern of the Hamiltonian and overlap matrices generated in SIESTA, and for large systems has a much lower computational complexity than that associated with the matrix diagonalization procedure. The PEXSI technique can be used to evaluate the electron density, free energy, atomic forces, density of states and local density of states without computing any eigenvalue or eigenvector of the Kohn-Sham Hamiltonian. It can achieve accuracy fully comparable to that obtained from a matrix diagonalization procedure for general systems, including metallic systems at low temperature. The PEXSI method is also highly scalable. With the recently developed massively parallel P...

  19. Process Simulation of Complex Biological Pathways in Physical Reactive Space and Reformulated for Massively Parallel Computing Platforms.

    Science.gov (United States)

    Ganesan, Narayan; Li, Jie; Sharma, Vishakha; Jiang, Hanyu; Compagnoni, Adriana

    2016-01-01

    Biological systems encompass complexity that far surpasses many artificial systems. Modeling and simulation of large and complex biochemical pathways is a computationally intensive challenge. Traditional tools, such as ordinary differential equations, partial differential equations, stochastic master equations, and Gillespie type methods, are all limited either by their modeling fidelity or computational efficiency or both. In this work, we present a scalable computational framework based on modeling biochemical reactions in explicit 3D space, that is suitable for studying the behavior of large and complex biological pathways. The framework is designed to exploit parallelism and scalability offered by commodity massively parallel processors such as the graphics processing units (GPUs) and other parallel computing platforms. The reaction modeling in 3D space is aimed at enhancing the realism of the model compared to traditional modeling tools and framework. We introduce the Parallel Select algorithm that is key to breaking the sequential bottleneck limiting the performance of most other tools designed to study biochemical interactions. The algorithm is designed to be computationally tractable, handle hundreds of interacting chemical species and millions of independent agents by considering all-particle interactions within the system. We also present an implementation of the framework on the popular graphics processing units and apply it to the simulation study of JAK-STAT Signal Transduction Pathway. The computational framework will offer a deeper insight into various biological processes within the cell and help us observe key events as they unfold in space and time. This will advance the current state-of-the-art in simulation study of large scale biological systems and also enable the realistic simulation study of macro-biological cultures, where inter-cellular interactions are prevalent.

  20. Advances in time-domain electromagnetic simulation capabilities through the use of overset grids and massively parallel computing

    Science.gov (United States)

    Blake, Douglas Clifton

    A new methodology is presented for conducting numerical simulations of electromagnetic scattering and wave-propagation phenomena on massively parallel computing platforms. A process is constructed which is rooted in the Finite-Volume Time-Domain (FVTD) technique to create a simulation capability that is both versatile and practical. In terms of versatility, the method is platform independent, is easily modifiable, and is capable of solving a large number of problems with no alterations. In terms of practicality, the method is sophisticated enough to solve problems of engineering significance and is not limited to mere academic exercises. In order to achieve this capability, techniques are integrated from several scientific disciplines including computational fluid dynamics, computational electromagnetics, and parallel computing. The end result is the first FVTD solver capable of utilizing the highly flexible overset-gridding process in a distributed-memory computing environment. In the process of creating this capability, work is accomplished to conduct the first study designed to quantify the effects of domain-decomposition dimensionality on the parallel performance of hyperbolic partial differential equations solvers; to develop a new method of partitioning a computational domain comprised of overset grids; and to provide the first detailed assessment of the applicability of overset grids to the field of computational electromagnetics. Using these new methods and capabilities, results from a large number of wave propagation and scattering simulations are presented. The overset-grid FVTD algorithm is demonstrated to produce results of comparable accuracy to single-grid simulations while simultaneously shortening the grid-generation process and increasing the flexibility and utility of the FVTD technique. Furthermore, the new domain-decomposition approaches developed for overset grids are shown to be capable of producing partitions that are better load balanced and

  1. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C^3P), a five year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C^3P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C^3P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  2. Genome-wide footprints of pig domestication and selection revealed through massive parallel sequencing of pooled DNA.

    Directory of Open Access Journals (Sweden)

    Andreia J Amaral

    Full Text Available BACKGROUND: Artificial selection has caused rapid evolution in domesticated species. The identification of selection footprints across domesticated genomes can contribute to uncovering the genetic basis of phenotypic diversity. METHODOLOGY/MAIN FINDINGS: Genome-wide footprints of pig domestication and selection were identified using massive parallel sequencing of pooled reduced representation libraries (RRL) representing ∼2% of the genome from wild boar and four domestic pig breeds (Large White, Landrace, Duroc and Pietrain) which have been under strong selection for muscle development, growth, behavior and coat color. Using specifically developed statistical methods that account for DNA pooling, low mean sequencing depth, and sequencing errors, we provide genome-wide estimates of nucleotide diversity and genetic differentiation in pig. Widespread signals suggestive of positive and balancing selection were found and the strongest signals were observed in Pietrain, one of the breeds most intensively selected for muscle development. Most signals were population-specific but affected genomic regions which harbored genes for common biological categories including coat color, brain development, muscle development, growth, metabolism, olfaction and immunity. Genetic differentiation in regions harboring genes related to muscle development and growth was higher between breeds than between a given breed and the wild boar. CONCLUSIONS/SIGNIFICANCE: These results suggest that although domesticated breeds have experienced similar selective pressures, selection has acted upon different genes. This might reflect the multiple domestication events of European breeds or could be the result of subsequent introgression of Asian alleles. Overall, it was estimated that approximately 7% of the porcine genome has been affected by selection events. This study illustrates that the massive parallel sequencing of genomic pools is a cost-effective approach to identify

  3. Identification of rare X-linked neuroligin variants by massively parallel sequencing in males with autism spectrum disorder

    Directory of Open Access Journals (Sweden)

    Steinberg Karyn

    2012-09-01

    Full Text Available Abstract Background Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% of ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. Methods We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes, by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. Results We found seven rare variants at evolutionarily conserved sites in our study population. Functional analyses of the three 3’ UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. Conclusions These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects.

  4. Massively parallel DNA sequencing successfully identifies new causative mutations in deafness genes in patients with cochlear implantation and EAS.

    Directory of Open Access Journals (Sweden)

    Maiko Miyagawa

    Full Text Available Genetic factors, the most common etiology in severe to profound hearing loss, are one of the key determinants of Cochlear Implantation (CI) and Electric Acoustic Stimulation (EAS) outcomes. Satisfactory auditory performance after receiving a CI/EAS in patients with certain deafness gene mutations indicates that genetic testing would be helpful in predicting CI/EAS outcomes and deciding treatment choices. However, because of the extreme genetic heterogeneity of deafness, clinical application of genetic information still entails difficulties. Target exon sequencing using massively parallel DNA sequencing is a new powerful strategy to discover rare causative genes in Mendelian disorders such as deafness. We used massive sequencing of the exons of 58 target candidate genes to analyze 8 (4 early-onset, 4 late-onset) Japanese CI/EAS patients, who did not have mutations in commonly found genes including GJB2, SLC26A4, or mitochondrial 1555A>G or 3243A>G mutations. We successfully identified four rare causative mutations in the MYO15A, TECTA, TMPRSS3, and ACTG1 genes in four patients who showed relatively good auditory performance with CI including EAS, suggesting that genetic testing may be able to predict the performance after implantation.

  5. A hybrid multi-objective evolutionary algorithm approach for handling sequence- and machine-dependent set-up times in unrelated parallel machine scheduling problem

    Indian Academy of Sciences (India)

    V K MANUPATI; G RAJYALAKSHMI; FELIX T S CHAN; J J THAKKAR

    2017-03-01

    This paper addresses a fuzzy mixed-integer non-linear programming (FMINLP) model by considering machine-dependent and job-sequence-dependent set-up times that minimizes the total completion time, the number of tardy jobs, the total flow time and the machine load variation in the context of the unrelated parallel machine scheduling (UPMS) problem. The above-mentioned multi-objectives were considered based on nonzero ready times, machine- and sequence-dependent set-up times and secondary resource constraints for jobs. The proposed approach considers unrelated parallel machines with inherent uncertainty in processing times and due dates. Since the problem is shown to be NP-hard in nature, it is a challenging task to find optimal/near-optimal solutions for conflicting objectives simultaneously in a reasonable time. Therefore, we introduce a new multi-objective-based evolutionary artificial immune non-dominated sorting genetic algorithm (AI-NSGA-II) to resolve the above-mentioned complex problem. The performance of the proposed multi-objective AI-NSGA-II algorithm has been compared to that of multi-objective particle swarm optimization (MOPSO) and the conventional non-dominated sorting genetic algorithm (CNSGA-II), and it is found that the proposed multi-objective-based hybrid meta-heuristic produces high-quality solutions. Finally, the results obtained from benchmark instances and randomly generated instances as test problems evince the robust performance of the proposed multi-objective algorithm.
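
    At the heart of NSGA-II-style algorithms such as the proposed AI-NSGA-II is Pareto ranking of candidate schedules over the four objectives. A minimal sketch with made-up objective vectors:

    ```python
    # Pareto ranking sketch: sort candidate schedules into non-dominated
    # fronts, comparing several minimization objectives at once.

    def dominates(a, b):
        """a dominates b if no worse in all objectives and better in one."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def non_dominated_fronts(points):
        remaining = list(range(len(points)))
        fronts = []
        while remaining:
            front = [i for i in remaining
                     if not any(dominates(points[j], points[i]) for j in remaining)]
            fronts.append(front)
            remaining = [i for i in remaining if i not in front]
        return fronts

    # objectives: (total completion time, tardy jobs, flow time, load variation)
    objs = [(120, 3, 300, 5.0), (110, 4, 310, 4.0), (130, 2, 290, 6.0), (125, 4, 320, 6.5)]
    print(non_dominated_fronts(objs))   # -> [[0, 1, 2], [3]]
    ```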

  6. Extended computational kernels in a massively parallel implementation of the Trotter-Suzuki approximation

    Science.gov (United States)

    Wittek, Peter; Calderaro, Luca

    2015-12-01

    We extended a parallel and distributed implementation of the Trotter-Suzuki algorithm for simulating quantum systems to study a wider range of physical problems and to make the library easier to use. The new release allows periodic boundary conditions, many-body simulations of non-interacting particles, arbitrary stationary potential functions, and imaginary time evolution to approximate the ground state energy. The new release is more resilient to the computational environment: a wider range of compiler chains and more platforms are supported. To ease development, we provide a more extensive command-line interface, an application programming interface, and wrappers from high-level languages.
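
    The imaginary-time feature can be sketched with a second-order split-step scheme on a 1D harmonic oscillator; grid size, time step and potential below are illustrative, while the library itself targets distributed simulations.

    ```python
    import numpy as np

    # Second-order Trotter-Suzuki split-step evolution in imaginary time:
    # repeated application of exp(-dt V/2) exp(-dt K) exp(-dt V/2) with
    # renormalisation drives the state towards the ground state (E0 = 0.5
    # for the harmonic oscillator in these units).

    n, L, dt = 512, 20.0, 0.01
    x = np.linspace(-L / 2, L / 2, n, endpoint=False)
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
    V = 0.5 * x ** 2
    psi = np.exp(-x ** 2)                      # arbitrary initial state

    half_v = np.exp(-0.5 * dt * V)             # exp(-dt V/2) in real space
    kin = np.exp(-dt * 0.5 * k ** 2)           # exp(-dt K) in Fourier space
    for _ in range(2000):
        psi = half_v * np.fft.ifft(kin * np.fft.fft(half_v * psi))
        psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * (L / n))   # renormalise

    E_pot = np.sum(V * np.abs(psi) ** 2) * (L / n)
    psi_k = np.fft.fft(psi)
    E_kin = np.sum(0.5 * k ** 2 * np.abs(psi_k) ** 2) / np.sum(np.abs(psi_k) ** 2)
    print("ground state energy ~", E_pot + E_kin)            # approaches 0.5
    ```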

  7. A massively parallel GPU-accelerated model for analysis of fully nonlinear free surface waves

    DEFF Research Database (Denmark)

    Engsig-Karup, Allan Peter; Madsen, Morten G.; Glimberg, Stefan Lemvig

    2011-01-01

    …throughput co-processors to the CPU. We describe and demonstrate how this approach makes it possible to do fast desktop computations for large nonlinear wave problems in numerical wave tanks (NWTs) with close to 50/100 million total grid points in double/single precision with 4 GB global device memory … space dimensions and is useful for fast analysis and prediction purposes in coastal and offshore engineering. A dedicated numerical model based on the proposed algorithm is executed in parallel by utilizing an affordable modern special-purpose graphics processing unit (GPU). The model is based on a low…

  8. Parallel group independent component analysis for massive fMRI data sets

    Science.gov (United States)

    Huang, Lei; Qiu, Huitong; Nebel, Mary Beth; Mostofsky, Stewart H.; Pekar, James J.; Lindquist, Martin A.; Eloyan, Ani; Caffo, Brian S.

    2017-01-01

    Independent component analysis (ICA) is widely used in the field of functional neuroimaging to decompose data into spatio-temporal patterns of co-activation. In particular, ICA has found wide usage in the analysis of resting state fMRI (rs-fMRI) data. Recently, a number of large-scale data sets have become publicly available that consist of rs-fMRI scans from thousands of subjects. As a result, efficient ICA algorithms that scale well to the increased number of subjects are required. To address this problem, we propose a two-stage likelihood-based algorithm for performing group ICA, which we denote Parallel Group Independent Component Analysis (PGICA). By utilizing the sequential nature of the algorithm and parallel computing techniques, we are able to efficiently analyze data sets from large numbers of subjects. We illustrate the efficacy of PGICA, which has been implemented in R and is freely available through the Comprehensive R Archive Network, through simulation studies and application to rs-fMRI data from two large multi-subject data sets, consisting of 301 and 779 subjects respectively. PMID:28278208

  9. 3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite

    Directory of Open Access Journals (Sweden)

    Oleksiy Kononenko

    2017-10-01

    Full Text Available Radiofrequency cavities based on superconducting technology are widely used in particle accelerators for various applications. The cavities usually have high quality factors and hence narrow bandwidths, so the field stability is sensitive to detuning from the Lorentz force and external loads, including vibrations and helium pressure variations. If not properly controlled, the detuning can result in a serious performance degradation of a superconducting accelerator, so an understanding of the underlying detuning mechanisms can be very helpful. Recent advances in the simulation suite ace3p have enabled realistic multiphysics characterization of such complex accelerator systems on supercomputers. In this paper, we present the new capabilities in ace3p for large-scale 3D multiphysics modeling of superconducting cavities, in particular, a parallel eigensolver for determining mechanical resonances, a parallel harmonic response solver to calculate the response of a cavity to external vibrations, and a numerical procedure to decompose mechanical loads, such as from the Lorentz force or piezoactuators, into the corresponding mechanical modes. These capabilities have been used to do an extensive rf-mechanical analysis of dressed TESLA-type superconducting cavities. The simulation results and their implications for the operational stability of the Linac Coherent Light Source-II are discussed.

  10. A new class of massively parallel direction splitting for the incompressible Navier–Stokes equations

    KAUST Repository

    Guermond, J.L.

    2011-06-01

    We introduce in this paper a new direction splitting algorithm for solving the incompressible Navier-Stokes equations. The main originality of the method consists of using the operator (I-∂xx)(I-∂yy)(I-∂zz) for approximating the pressure correction instead of the Poisson operator as done in all the contemporary projection methods. The complexity of the proposed algorithm is significantly lower than that of projection methods, and it is shown to have the same stability properties as the Poisson-based pressure-correction techniques, either in standard or rotational form. The first-order (in time) version of the method is proved to have the same convergence properties as the classical first-order projection techniques. Numerical tests reveal that the second-order version of the method has the same convergence rate as its second-order projection counterpart as well. The method is suitable for parallel implementation and preliminary tests show excellent parallel performance on a distributed memory cluster of up to 1024 processors. The method has been validated on the three-dimensional lid-driven cavity flow using grids composed of up to 2×10^9 points. © 2011 Elsevier B.V.
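
    The key point is that each factor (I-∂xx), (I-∂yy), (I-∂zz) is tridiagonal along one grid direction, so the pressure update reduces to independent line solves instead of one global Poisson solve. A 2D numpy/scipy sketch with homogeneous Dirichlet conditions, a simplification of the paper's setting:

    ```python
    import numpy as np
    from scipy.linalg import solve_banded

    def solve_1d(rhs, h):
        """Solve (I - d^2/dx^2) u = rhs along axis 0 (Dirichlet boundaries)."""
        n = rhs.shape[0]
        ab = np.zeros((3, n))
        ab[0, 1:] = -1 / h ** 2                 # super-diagonal
        ab[1, :] = 1 + 2 / h ** 2               # main diagonal
        ab[2, :-1] = -1 / h ** 2                # sub-diagonal
        return solve_banded((1, 1), ab, rhs)    # one tridiagonal solve per column

    nx = ny = 64
    h = 1.0 / nx
    rhs = np.random.default_rng(0).standard_normal((nx, ny))
    u = solve_1d(rhs, h)                        # sweep in x (axis 0)
    u = solve_1d(u.T, h).T                      # sweep in y (axis 1)
    print(u.shape)
    ```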

  11. Parallel group independent component analysis for massive fMRI data sets.

    Science.gov (United States)

    Chen, Shaojie; Huang, Lei; Qiu, Huitong; Nebel, Mary Beth; Mostofsky, Stewart H; Pekar, James J; Lindquist, Martin A; Eloyan, Ani; Caffo, Brian S

    2017-01-01

    Independent component analysis (ICA) is widely used in the field of functional neuroimaging to decompose data into spatio-temporal patterns of co-activation. In particular, ICA has found wide usage in the analysis of resting state fMRI (rs-fMRI) data. Recently, a number of large-scale data sets have become publicly available that consist of rs-fMRI scans from thousands of subjects. As a result, efficient ICA algorithms that scale well to the increased number of subjects are required. To address this problem, we propose a two-stage likelihood-based algorithm for performing group ICA, which we denote Parallel Group Independent Component Analysis (PGICA). By utilizing the sequential nature of the algorithm and parallel computing techniques, we are able to efficiently analyze data sets from large numbers of subjects. We illustrate the efficacy of PGICA, which has been implemented in R and is freely available through the Comprehensive R Archive Network, through simulation studies and application to rs-fMRI data from two large multi-subject data sets, consisting of 301 and 779 subjects respectively.

  12. Makespan Minimization for The Identical Machine Parallel Shop with Sequence Dependent Setup Times Using a Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Salazar-Hornig E.

    2013-01-01

    Full Text Available A genetic algorithm for the identical-machine parallel shop scheduling problem with sequence-dependent setup times and makespan (Cmax) minimization is presented. The genetic algorithm is compared with other heuristic methods using a randomly generated test problem set. A local improvement procedure in the evolutionary process of the genetic algorithm is introduced, which significantly improves its performance.

  13. Library preparation and multiplex capture for massive parallel sequencing applications made efficient and easy.

    Directory of Open Access Journals (Sweden)

    Mårten Neiman

    Full Text Available During the recent years, rapid development of sequencing technologies and a competitive market has enabled researchers to perform massive sequencing projects at a reasonable cost. As the price for the actual sequencing reactions drops, enabling more samples to be sequenced, the relative price for preparing libraries gets larger and the practical laboratory work becomes complex and tedious. We present a cost-effective strategy for simplified library preparation compatible with both whole genome- and targeted sequencing experiments. An optimized enzyme composition and reaction buffer reduces the number of required clean-up steps and allows for usage of bulk enzymes which makes the whole process cheap, efficient and simple. We also present a two-tagging strategy, which allows for multiplex sequencing of targeted regions. To prove our concept, we have prepared libraries for low-pass sequencing from 100 ng DNA, performed 2-, 4- and 8-plex exome capture and a 96-plex capture of a 500 kb region. In all samples we see a high concordance (>99.4%) of SNP calls when comparing to commercially available SNP-chip platforms.

  14. Massively parallel simulations of relativistic fluid dynamics on graphics processing units with CUDA

    CERN Document Server

    Bazow, Dennis; Strickland, Michael

    2016-01-01

    Relativistic fluid dynamics is a major component in dynamical simulations of the quark-gluon plasma created in relativistic heavy-ion collisions. Simulations of the full three-dimensional dissipative dynamics of the quark-gluon plasma with fluctuating initial conditions are computationally expensive and typically require some degree of parallelization. In this paper, we present a GPU implementation of the Kurganov-Tadmor algorithm which solves the 3+1d relativistic viscous hydrodynamics equations including the effects of both bulk and shear viscosities. We demonstrate that the resulting CUDA-based GPU code is approximately two orders of magnitude faster than the corresponding serial implementation of the Kurganov-Tadmor algorithm. We validate the code using (semi-)analytic tests such as the relativistic shock-tube and Gubser flow.

  15. A new massively parallel version of CRYSTAL for large systems on high performance computing architectures.

    Science.gov (United States)

    Orlando, Roberto; Delle Piane, Massimo; Bush, Ian J; Ugliengo, Piero; Ferrabone, Matteo; Dovesi, Roberto

    2012-10-30

    Fully ab initio treatment of complex solid systems needs computational software which is able to efficiently take advantage of the growing power of high performance computing (HPC) architectures. Recent improvements in CRYSTAL, a periodic ab initio code that uses a Gaussian basis set, allows treatment of very large unit cells for crystalline systems on HPC architectures with high parallel efficiency in terms of running time and memory requirements. The latter is a crucial point, due to the trend toward architectures relying on a very high number of cores with associated relatively low memory availability. An exhaustive performance analysis shows that density functional calculations, based on a hybrid functional, of low-symmetry systems containing up to 100,000 atomic orbitals and 8000 atoms are feasible on the most advanced HPC architectures available to European researchers today, using thousands of processors.

  16. Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures

    Science.gov (United States)

    Sandalski, Stou

    Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named neptune after the Roman god of water. It is written in OpenMP parallelized C++ and OpenCL and includes octree based hydrodynamic and gravitational acceleration. The design relies on object-oriented methodologies in order to provide a flexible and modular framework that can be easily extended and modified by the user. Several pre-built scenarios for simulating collisions of polytropes and black-hole accretion are provided. The code is released under the MIT Open Source license and publicly available at http://code.google.com/p/neptune-sph/.

  17. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    Science.gov (United States)

    Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter

    2015-12-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.
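
    The fork/copy-on-write mechanism is easy to demonstrate outside Athena. In the Python sketch below (POSIX only), workers forked from the parent read a large array through shared memory pages while pulling event numbers from a shared queue; names such as conditions are purely illustrative.

    ```python
    import multiprocessing as mp
    import numpy as np

    # The parent initialises large read-only data, then forks workers; the
    # pages holding `conditions` stay shared until a worker writes to them.

    conditions = None

    def worker(queue, out):
        while True:
            evt = queue.get()
            if evt is None:                     # sentinel: no more events
                break
            out.put((evt, float(conditions[evt % len(conditions)])))  # read-only

    if __name__ == "__main__":
        mp.set_start_method("fork")             # fork gives copy-on-write sharing
        conditions = np.arange(10_000_000, dtype=np.float64)  # big shared payload
        queue, out = mp.Queue(), mp.Queue()
        procs = [mp.Process(target=worker, args=(queue, out)) for _ in range(4)]
        for p in procs:
            p.start()
        for evt in range(100):                  # shared event queue
            queue.put(evt)
        for _ in procs:
            queue.put(None)                     # one sentinel per worker
        results = [out.get() for _ in range(100)]
        for p in procs:
            p.join()
        print(len(results), "events processed")
    ```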

  18. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    CERN Document Server

    Calafiura, Paolo; The ATLAS collaboration; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter

    2015-01-01

    AthenaMP is a multi-process version of the ATLAS reconstruction and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows the sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows running AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the...

  19. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    CERN Document Server

    Calafiura, Paolo; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter

    2015-01-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows running AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of Ath...

  20. Deep mutational scanning of an antibody against epidermal growth factor receptor using mammalian cell display and massively parallel pyrosequencing.

    Science.gov (United States)

    Forsyth, Charles M; Juan, Veronica; Akamatsu, Yoshiko; DuBridge, Robert B; Doan, Minhtam; Ivanov, Alexander V; Ma, Zhiyuan; Polakoff, Dixie; Razo, Jennifer; Wilson, Keith; Powers, David B

    2013-01-01

    We developed a method for deep mutational scanning of antibody complementarity-determining regions (CDRs) that can determine in parallel the effect of every possible single amino acid CDR substitution on antigen binding. The method uses libraries of full length IgGs containing more than 1000 CDR point mutations displayed on mammalian cells, sorted by flow cytometry into subpopulations based on antigen affinity and analyzed by massively parallel pyrosequencing. Higher, lower and neutral affinity mutations are identified by their enrichment or depletion in the FACS subpopulations. We applied this method to a humanized version of the anti-epidermal growth factor receptor antibody cetuximab, generated a near comprehensive data set for 1060 point mutations that recapitulates previously determined structural and mutational data for these CDRs and identified 67 point mutations that increase affinity. The large-scale, comprehensive sequence-function data sets generated by this method should have broad utility for engineering properties such as antibody affinity and specificity and may advance theoretical understanding of antibody-antigen recognition.
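
    The enrichment/depletion statistic can be sketched directly: compare each variant's frequency in a sorted FACS subpopulation with its frequency in the unsorted library. The variant names and counts below are invented for illustration.

    ```python
    import math

    # Log2 enrichment of each variant in a high-affinity FACS gate relative
    # to the unsorted library; pseudocounts of 0.5 avoid division by zero.

    def enrichment(counts_gate, counts_library):
        tot_g = sum(counts_gate.values())
        tot_l = sum(counts_library.values())
        return {
            v: math.log2(((counts_gate.get(v, 0) + 0.5) / tot_g) /
                         ((counts_library.get(v, 0) + 0.5) / tot_l))
            for v in counts_library
        }

    library = {"WT": 9000, "H31Y": 450, "S52A": 300, "G99D": 250}
    high_gate = {"WT": 6000, "H31Y": 900, "S52A": 40, "G99D": 60}
    for v, s in sorted(enrichment(high_gate, library).items(), key=lambda kv: -kv[1]):
        print(f"{v}: {s:+.2f}")   # positive = enriched in the high-affinity gate
    ```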

  1. Nonlinearity for a parallel kinematic machine tool and its application to interpolation accuracy analysis

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    [1] Carlo, I., Direct position analysis of the Stewart platform mechanism, Mech. Mach. Theory, 1990, 25(6): 611-621.
    [2] Fitchter, E. F., A Stewart platform based manipulator: general theory and practical construction, Inter. Journal of Robotics Research, 1986, (2): 157-182.
    [3] Raghavan, M., The Stewart platform of general geometry has 40 configurations, ASME J. Mechanical Design, 1993, 115(2): 277-281.
    [4] Wampler, C. W., Forward displacement analysis of general six-in-parallel SPS (Stewart) platform manipulators using SOMA coordinates, Mech. Mach. Theory, 1995, 31(3): 331-337.
    [5] Wen, F. A., Liang, C. G., Displacement analysis of the 6-6 Stewart platform mechanisms, Mech. Mach. Theory, 1994, 29(4): 547-557.
    [6] Liang, C. G., Rong, H., The forward displacement solution to a Stewart platform type manipulator, China Journal of Mechanical Engineering, 1991, 27(2): 26-30.
    [7] Huang, Z., Spatial Mechanisms, Beijing: China Machine Press, 1991.
    [8] Liu, K. et al., The singularities and dynamics of a Stewart platform manipulator, J. of Intelligent and Robotic Systems, 1993, 8: 287-308.
    [9] Bates, D. M., Watts, D. G., Relative curvature measures of nonlinearity, Journal of the Royal Statistical Society, 1980(B42): 1-25.
    [10] Huang, Tian, Wang, Jinsong, Whitehouse, D. J., Theory and methodology for kinematic design of Gough-Stewart platforms, Science in China, Series E, 1999, 42(4): 425-436.
    [11] Wang, Z. H., Wang, J. S., Yang, X. D., Wei, Y. M., Interpolation scheme and simulation study on its accuracy for a Stewart platform based CNC machine tool VAMTIY, China Mechanical Engineering, 1999, 10(10): 1121-1123.
    [12] Wang, Zhonghua, Wang, Jinsong, Study on interpolation and its error distribution in workspace for a Stewart platform based CNC machine tool, in Proceedings of the International Conference on Advanced Manufacturing Systems and Manufacturing Automation (AMSMA'2000), 19th-21st June, 2000, Guangzhou, China.

  2. Massively-parallel electrical-conductivity imaging of hydrocarbons using the Blue Gene/L supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Commer, M.; Newman, G.A.; Carazzone, J.J.; Dickens, T.A.; Green,K.E.; Wahrmund, L.A.; Willen, D.E.; Shiu, J.

    2007-05-16

    Large-scale controlled source electromagnetic (CSEM) three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. To cope with the typically large computational requirements of the 3D CSEM imaging problem, our strategies exploit computational parallelism and optimized finite-difference meshing. We report on an imaging experiment, utilizing 32,768 tasks/processors on the IBM Watson Research Blue Gene/L (BG/L) supercomputer. Over a 24-hour period, we were able to image a large scale marine CSEM field data set that previously required over four months of computing time on distributed clusters utilizing 1024 tasks on an Infiniband fabric. The total initial data misfit could be decreased by 67 percent within 72 completed inversion iterations, indicating an electrically resistive region in the southern survey area below a depth of 1500 m below the seafloor. The major part of the residual misfit stems from transmitter-parallel receiver components that have an offset from the transmitter sail line (broadside configuration). Modeling confirms that improved broadside data fits can be achieved by considering anisotropic electrical conductivities. While delivering a satisfactory gross scale image for the depths of interest, the experiment provides important evidence for the necessity of discriminating between horizontal and vertical conductivities for maximally consistent 3D CSEM inversions.

  3. Delta: An object-oriented finite element code architecture for massively parallel computers

    Energy Technology Data Exchange (ETDEWEB)

    Weatherby, J.R.; Schutt, J.A.; Peery, J.S.; Hogan, R.E.

    1996-02-01

    Delta is an object-oriented code architecture based on the finite element method which enables simulation of a wide range of engineering mechanics problems in a parallel processing environment. Written in C++, Delta is a natural framework for algorithm development and for research involving coupling of mechanics from different Engineering Science disciplines. To enhance flexibility and encourage code reuse, the architecture provides a clean separation of the major aspects of finite element programming. Spatial discretization, temporal discretization, and the solution of linear and nonlinear systems of equations are each implemented separately, independent from the governing field equations. Other attractive features of the Delta architecture include support for constitutive models with internal variables, reusable "matrix-free" equation solvers, and support for region-to-region variations in the governing equations and the active degrees of freedom. A demonstration code built from the Delta architecture has been used in two-dimensional and three-dimensional simulations involving dynamic and quasi-static solid mechanics, transient and steady heat transport, and flow in porous media.
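
    The separation the abstract describes can be illustrated in miniature (a Python toy, not Delta's C++ classes): the spatial operator, the time integrator, and the driving loop know nothing about each other's internals, so any piece can be swapped independently.

        # 1-D periodic heat equation with independently swappable pieces.
        import numpy as np

        class FiniteDifferenceLaplacian:            # spatial discretization
            def __init__(self, dx):
                self.dx = dx
            def apply(self, u):
                return (np.roll(u, 1) - 2 * u + np.roll(u, -1)) / self.dx ** 2

        class ExplicitEuler:                        # temporal discretization
            def step(self, u, operator, dt):
                return u + dt * operator.apply(u)

        space, time = FiniteDifferenceLaplacian(dx=0.1), ExplicitEuler()
        u = np.sin(np.linspace(0, 2 * np.pi, 64, endpoint=False))
        for _ in range(100):
            u = time.step(u, space, dt=1e-4)
        print(u.max())                              # amplitude decays slightly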

  4. Massively parallel measurements of molecular interaction kinetics on a microfluidic platform.

    Science.gov (United States)

    Geertz, Marcel; Shore, David; Maerkl, Sebastian J

    2012-10-09

    Quantitative biology requires quantitative data. No high-throughput technologies exist capable of obtaining several hundred independent kinetic binding measurements in a single experiment. We present an integrated microfluidic device (k-MITOMI) for the simultaneous kinetic characterization of 768 biomolecular interactions. We applied k-MITOMI to the kinetic analysis of transcription factor (TF)-DNA interactions, measuring the detailed kinetic landscapes of the mouse TF Zif268, and the yeast TFs Tye7p, Yox1p, and Tbf1p. We demonstrated the integrated nature of k-MITOMI by expressing, purifying, and characterizing 27 additional yeast transcription factors in parallel on a single device. Overall, we obtained 2,388 association and dissociation curves of 223 unique molecular interactions with equilibrium dissociation constants ranging from 2 × 10^-6 M to 2 × 10^-9 M, and dissociation rate constants of approximately 6 s^-1 to 8.5 × 10^-3 s^-1. Association rate constants were uniform across 3 TF families, ranging from 3.7 × 10^6 M^-1 s^-1 to 9.6 × 10^7 M^-1 s^-1, and are well below the diffusion limit. We expect that k-MITOMI will contribute to our quantitative understanding of biological systems and accelerate the development and characterization of engineered systems.
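
    The quoted equilibrium constants follow directly from the fitted rate constants, since K_D = k_off / k_on. A quick check, pairing extreme values from the abstract's ranges purely for illustration:

        # K_D in molar: (1/s) divided by (1/(M*s)) gives M.
        def k_d(k_off, k_on):
            return k_off / k_on

        print(k_d(k_off=6.0, k_on=3.7e6))      # ~1.6e-06 M, a weak interaction
        print(k_d(k_off=8.5e-3, k_on=9.6e7))   # ~8.9e-11 M, a very tight one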

  5. II - Template Metaprogramming for Massively Parallel Scientific Computing - Vectorization with Expression Templates

    CERN Document Server

    CERN. Geneva

    2016-01-01

    Large scale scientific computing raises questions on different levels ranging from the formulation of the problems to the choice of the best algorithms and their implementation for a specific platform. There are similarities in these different topics that can be exploited by modern-style C++ template metaprogramming techniques to produce readable, maintainable and generic code. Traditional low-level code tends to be fast but platform-dependent, and it obfuscates the meaning of the algorithm. On the other hand, the object-oriented approach is nice to read, but may come with an inherent performance penalty. These lectures aim to present the basics of the Expression Template (ET) idiom which allows us to keep the object-oriented approach without sacrificing performance. We will in particular show how to enhance ET to include SIMD vectorization. We will then introduce techniques for abstracting iteration, and introduce thread-level parallelism for use in heavy data-centric loads. We will show how to apply these methods i...
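
    Expression templates are a C++ idiom, but the idea, operator overloading that builds a lazy expression tree which is then evaluated in a single fused pass with no temporary vectors, can be sketched in a few lines of Python (a rough analog, not the lecture's C++ code):

        class Expr:
            def __add__(self, other):
                return Add(self, other)       # build a node, compute nothing

        class Vec(Expr):
            def __init__(self, data):
                self.data = data
            def __getitem__(self, i):
                return self.data[i]

        class Add(Expr):
            def __init__(self, a, b):
                self.a, self.b = a, b
            def __getitem__(self, i):         # fused element access
                return self.a[i] + self.b[i]

        x, y, z = Vec([1, 2]), Vec([10, 20]), Vec([100, 200])
        expr = x + y + z                      # an expression tree, not a result
        print([expr[i] for i in range(2)])    # [111, 222] in one pass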

  6. Comparison of Parallel Kinematic Machines with Three Translational Degrees of Freedom and Linear Actuation

    Institute of Scientific and Technical Information of China (English)

    PRAUSE Isabel; CHARAF EDDINE Sami; CORVES Burkhard

    2015-01-01

    The development of new robot structures, in particular of parallel kinematic machines (PKM), is widely systematized by different structure synthesis methods. Recent research increasingly focuses on PKM with less than six degrees of freedom (DOF). However, an overall comparison and evaluation of these structures is missing. In order to compare symmetrical PKM with three translational DOF, different evaluation criteria are used. Workspace, maximum actuation forces and velocities, power, actuator stiffness, accuracy and transmission behavior are taken into account to investigate strengths and weaknesses of the PKMs. A selection scheme based on possible configurations of translational PKM including different frame configurations is presented. Moreover, an optimization method based on a genetic algorithm is described to determine the geometric parameters of the selected PKM for an exemplary load case and a prescribed workspace. The values of the mentioned criteria are determined for all considered PKM with respect to certain boundary conditions. The distribution and spreading of these values within the prescribed workspace is presented by using box plots for each criterion. Thereby, the performance characteristics of the different structures can be compared directly. The results show that there is no "best" PKM. Further inquiries such as dynamic or stiffness analysis are necessary to extend the comparison and to finally select a PKM.

  7. A distributed parallel genetic algorithm of placement strategy for virtual machines deployment on cloud platform.

    Science.gov (United States)

    Dong, Yu-Shuang; Xu, Gao-Chao; Fu, Xiao-Dong

    2014-01-01

    The cloud platform provides various services to users. More and more cloud centers provide infrastructure as the main way of operating. To improve the utilization rate of the cloud center and to decrease the operating cost, the cloud center provides services according to the requirements of users by partitioning the resources with virtualization. Considering both QoS for users and cost saving for cloud computing providers, we try to maximize performance and minimize energy cost as well. In this paper, we propose a distributed parallel genetic algorithm (DPGA) of placement strategy for virtual machines deployment on the cloud platform. In the first stage, it executes the genetic algorithm in parallel and in a distributed fashion on several selected physical hosts. It then executes the genetic algorithm of the second stage with the solutions obtained from the first stage as the initial population. The solution calculated by the second-stage genetic algorithm is the final placement chosen by the proposed approach. The experimental results show that the proposed placement strategy of VM deployment can ensure QoS for users, and it is more effective and more energy efficient than other placement strategies on the cloud platform.
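
    A compact sketch of the two-stage scheme (toy real-valued objective, not the paper's VM-placement model; all parameters are illustrative): independent first-stage GA runs execute in parallel, and their winners seed the second stage's initial population.

        import random
        from multiprocessing import Pool

        def fitness(x):                      # stand-in for an energy/QoS cost
            return sum((xi - 0.5) ** 2 for xi in x)

        def ga(pop, generations=50):
            for _ in range(generations):
                pop.sort(key=fitness)
                parents = pop[: len(pop) // 2]
                children = []
                while len(parents) + len(children) < len(pop):
                    a, b = random.sample(parents, 2)
                    cut = random.randrange(1, len(a))
                    child = a[:cut] + b[cut:]            # one-point crossover
                    i = random.randrange(len(child))
                    child[i] += random.gauss(0, 0.1)     # mutation
                    children.append(child)
                pop = parents + children
            return min(pop, key=fitness)

        def stage_one(seed):
            random.seed(seed)
            pop = [[random.random() for _ in range(8)] for _ in range(30)]
            return ga(pop)

        if __name__ == "__main__":
            with Pool(4) as pool:                  # stage 1: parallel GA runs
                elites = pool.map(stage_one, range(4))
            pop = elites + [[random.random() for _ in range(8)]
                            for _ in range(26)]
            print(fitness(ga(pop)))                # stage 2: seeded final GA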

  8. Robust Parallel Machine Scheduling Problem with Uncertainties and Sequence-Dependent Setup Time

    Directory of Open Access Journals (Sweden)

    Hongtao Hu

    2016-01-01

    Full Text Available A parallel machine scheduling problem in plastic production is studied in this paper. In this problem, the processing times and arrival times are uncertain but lie in their respective intervals. In addition, each job must be processed together with a mold, and jobs that belong to the same family can share the same mold. Therefore, a mold-changing time is required between two consecutive jobs that belong to different families, which is known as a sequence-dependent setup time. This paper aims to identify a robust schedule under the min–max regret criterion. It is proved that the scenario incurring the maximal regret for each feasible solution lies among a finite set of extreme scenarios. A mixed integer linear programming formulation and an exact algorithm are proposed to solve the problem. Moreover, a modified artificial bee colony algorithm is developed to solve large-scale problems. The performance of the presented algorithm is evaluated through extensive computational experiments, and the results show that the proposed algorithm surpasses the exact method in terms of objective value and computational time.

  9. Hybrid Black Hole Algorithm for Bi-Criteria Job Scheduling on Parallel Machines

    Directory of Open Access Journals (Sweden)

    Kawal Jeet

    2016-04-01

    Full Text Available Nature-inspired algorithms are increasingly being applied to complex optimization and engineering problems. The black hole algorithm is a recent nature-inspired algorithm that draws its inspiration from the black hole theory of the universe. In this paper, four formulations of a multi-objective black hole algorithm have been developed by using a combination of weighted objectives, the use of secondary storage for managing possible solutions, and the use of a Genetic Algorithm (GA). These formulations are further applied to scheduling jobs on parallel machines while optimizing two criteria, namely maximum tardiness and weighted flow time. It has been empirically verified that the GA-based multi-objective black hole algorithms lead to better results than their counterparts. Also, the combined use of secondary storage and GA further improves the resulting job sequence. The proposed algorithms are further compared to some of the existing algorithms, and empirically found to be better. The results have been validated by numerical illustrations and statistical tests.
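
    In miniature, the weighted-objectives formulation combines the two criteria into a single score. A single-machine toy with illustrative weights (not the paper's model):

        # jobs: (processing_time, due_date, job_weight) in sequence order.
        def score(jobs, w=0.5):
            t, max_tardiness, weighted_flow = 0, 0, 0.0
            for p, due, wt in jobs:
                t += p
                max_tardiness = max(max_tardiness, max(0, t - due))
                weighted_flow += wt * t
            return w * max_tardiness + (1 - w) * weighted_flow

        print(score([(3, 4, 1.0), (2, 6, 2.0), (4, 7, 1.0)]))   # 12.0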

  10. Modeling cardiovascular hemodynamics using the lattice Boltzmann method on massively parallel supercomputers

    Science.gov (United States)

    Randles, Amanda Elizabeth

    the modeling of fluids in vessels with smaller diameters and a method for introducing the deformational forces exerted on the arterial flows by the movement of the heart, borrowing concepts from cosmodynamics, are presented. These additional forces have a great impact on the endothelial shear stress. Third, the fluid model is extended to not only recover Navier-Stokes hydrodynamics, but also a wider range of Knudsen numbers, which is especially important in micro- and nano-scale flows. The tradeoffs of many optimization methods, such as the use of deep halo-level ghost cells that, alongside hybrid programming models, reduce the impact of such higher-order models and enable efficient modeling of extreme regimes of computational fluid dynamics, are discussed. Fourth, the extension of these models to other research questions, like clogging in microfluidic devices and determining the severity of coarctation of the aorta, is presented. Through this work, a validation of these methods is shown: real patient data and the pressure measured before the narrowing of the aorta are used to predict the pressure drop across the coarctation. Comparison with the measured pressure drop in vivo highlights the accuracy and potential impact of such patient-specific simulations. Finally, a method to enable the simulation of longer trajectories in time by discretizing both spatially and temporally is presented. In this method, a serial coarse iterator is used to initialize data at discrete time steps for a fine model that runs in parallel. This coarse solver is based on a larger time step and typically a coarser discretization in space. Iterative refinement enables the compute-intensive fine iterator to be modeled with temporal parallelization. The algorithm consists of a series of predictor-corrector iterations that complete when the results have converged within a certain tolerance. Combined, these developments allow large fluid models to be simulated for longer time durations
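
    The coarse/fine time-parallel scheme sketched in the final paragraph follows the well-known parareal pattern. A toy version for du/dt = -u (illustrative only; the fine solves inside each iteration are independent of one another and could run concurrently):

        import math

        def coarse(u, dt):                 # one cheap Euler step per slice
            return u * (1 - dt)

        def fine(u, dt, n=100):            # expensive, finely resolved solve
            h = dt / n
            for _ in range(n):
                u = u * (1 - h)
            return u

        T, N, K = 1.0, 10, 5               # horizon, time slices, iterations
        dt = T / N
        U = [1.0]                          # serial coarse pass initializes slices
        for i in range(N):
            U.append(coarse(U[i], dt))

        for _ in range(K):                 # predictor-corrector sweeps
            F = [fine(U[i], dt) for i in range(N)]   # parallelizable solves
            new = [U[0]]
            for i in range(N):
                new.append(coarse(new[i], dt) + F[i] - coarse(U[i], dt))
            U = new

        print(U[-1], math.exp(-1))         # converges toward exp(-1)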

  11. Massively Parallelized Pollen Tube Guidance and Mechanical Measurements on a Lab-on-a-Chip Platform.

    Science.gov (United States)

    Shamsudhin, Naveen; Laeubli, Nino; Atakan, Huseyin Baris; Vogler, Hannes; Hu, Chengzhi; Haeberle, Walter; Sebastian, Abu; Grossniklaus, Ueli; Nelson, Bradley J

    2016-01-01

    Pollen tubes are used as a model in the study of plant morphogenesis, cellular differentiation, cell wall biochemistry, biomechanics, and intra- and intercellular signaling. For a "systems-understanding" of the bio-chemo-mechanics of tip-polarized growth in pollen tubes, the need for a versatile, experimental assay platform for quantitative data collection and analysis is critical. We introduce a Lab-on-a-Chip (LoC) concept for high-throughput pollen germination and pollen tube guidance for parallelized optical and mechanical measurements. The LoC localizes a large number of growing pollen tubes on a single plane of focus with unidirectional tip-growth, enabling high-resolution quantitative microscopy. This species-independent LoC platform can be integrated with micro-/nano-indentation systems, such as the cellular force microscope (CFM) or the atomic force microscope (AFM), allowing for rapid measurements of cell wall stiffness of growing tubes. As a demonstrative example, we show the growth and directional guidance of hundreds of lily (Lilium longiflorum) and Arabidopsis (Arabidopsis thaliana) pollen tubes on a single LoC microscopy slide. Combining the LoC with the CFM, we characterized the cell wall stiffness of lily pollen tubes. Using the stiffness statistics and finite-element-method (FEM)-based approaches, we computed an effective range of the linear elastic moduli of the cell wall spanning the variability space of physiological parameters including internal turgor, cell wall thickness, and tube diameter. We propose the LoC device as a versatile and high-throughput phenomics platform for plant reproductive and developmental biology using the pollen tube as a model.

  12. Massively parallel tag sequencing reveals the complexity of anaerobic marine protistan communities

    Directory of Open Access Journals (Sweden)

    Chistoserdov Andrei

    2009-11-01

    Full Text Available Background: Recent advances in sequencing strategies make possible unprecedented depth and scale of sampling for molecular detection of microbial diversity. Two major paradigm-shifting discoveries include the detection of bacterial diversity that is one to two orders of magnitude greater than previous estimates, and the discovery of an exciting 'rare biosphere' of molecular signatures ('species') of poorly understood ecological significance. We applied a high-throughput parallel tag sequencing (454 sequencing) protocol adapted for eukaryotes to investigate protistan community complexity in two contrasting anoxic marine ecosystems (Framvaren Fjord, Norway; Cariaco deep-sea basin, Venezuela). Both sampling sites have previously been scrutinized for protistan diversity by traditional clone library construction and Sanger sequencing. By comparing these clone library data with 454 amplicon library data, we assess the efficiency of high-throughput tag sequencing strategies. We here present a novel, highly conservative bioinformatic analysis pipeline for the processing of large tag sequence data sets. Results: The analyses of ca. 250,000 sequence reads revealed that the number of detected Operational Taxonomic Units (OTUs) far exceeded previous richness estimates from the same sites based on clone libraries and Sanger sequencing. More than 90% of this diversity was represented by OTUs with less than 10 sequence tags. We detected a substantial number of taxonomic groups like Apusozoa, Chrysomerophytes, Centroheliozoa, Eustigmatophytes, Hyphochytriomycetes, Ichthyosporea, Oikomonads, Phaeothamniophytes, and Rhodophytes which remained undetected by previous clone library-based diversity surveys of the sampling sites. The most important innovations in our newly developed bioinformatics pipeline employ (i) BLASTN with query parameters adjusted for highly variable domains and a complete database of public ribosomal RNA (rRNA) gene sequences for taxonomic

  13. Efficient Parallel Sorting for Migrating Birds Optimization When Solving Machine-Part Cell Formation Problems

    National Research Council Canada - National Science Library

    Soto, Ricardo; Crawford, Broderick; Almonacid, Boris; Paredes, Fernando

    2016-01-01

    The Machine-Part Cell Formation Problem (MPCFP) is a NP-Hard optimization problem that consists in grouping machines and parts in a set of cells, so that each cell can operate independently and the intercell movements are minimized...

  14. Modular and efficient ozone systems based on massively parallel chemical processing in microchannel plasma arrays: performance and commercialization

    Science.gov (United States)

    Kim, M.-H.; Cho, J. H.; Park, S.-J.; Eden, J. G.

    2017-08-01

    Plasmachemical systems based on the production of a specific molecule (O3) in literally thousands of microchannel plasmas simultaneously have been demonstrated, developed and engineered over the past seven years, and commercialized. At the heart of this new plasma technology is the plasma chip, a flat aluminum strip fabricated by photolithographic and wet chemical processes and comprising 24-48 channels, micromachined into nanoporous aluminum oxide, with embedded electrodes. By integrating 4-6 chips into a module, the mass output of an ozone microplasma system is scaled linearly with the number of modules operating in parallel. A 115 g/hr (2.7 kg/day) ozone system, for example, is realized by the combined output of 18 modules comprising 72 chips and 1,800 microchannels. The implications of this plasma processing architecture for scaling ozone production capability, and reducing capital and service costs when introducing redundancy into the system, are profound. In contrast to conventional ozone generator technology, microplasma systems operate reliably (albeit with reduced output) in ambient air and humidity levels up to 90%, a characteristic attributable to the water adsorption/desorption properties and electrical breakdown strength of nanoporous alumina. Extensive testing has documented chip and system lifetimes (MTBF) beyond 5,000 hours, and efficiencies >130 g/kWh when oxygen is the feedstock gas. Furthermore, the weight and volume of microplasma systems are a factor of 3-10 lower than those for conventional ozone systems of comparable output. Massively-parallel plasmachemical processing offers functionality, performance, and commercial value beyond that afforded by conventional technology, and is currently in operation in more than 30 countries worldwide.
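
    The linear scaling quoted above is simple to restate as arithmetic (the per-chip channel count is chosen here so that the abstract's figures agree; chips per module vary from 4 to 6 in the text):

        modules = 18
        chips = 4 * modules                # 72 chips at 4 chips per module
        channels = 25 * chips              # 1,800 microchannels in total
        grams_per_hour = 115.0             # combined output of the 18 modules
        print(chips, channels, grams_per_hour * 24 / 1000)
        # 72 1800 2.76  (matches the ~2.7 kg/day quoted)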

  15. A bumpy ride on the diagnostic bench of massive parallel sequencing, the case of the mitochondrial genome.

    Directory of Open Access Journals (Sweden)

    Kim Vancampenhout

    Full Text Available The advent of massive parallel sequencing (MPS) has revolutionized the field of human molecular genetics, including the diagnostic study of mitochondrial (mt) DNA dysfunction. The analysis of the complete mitochondrial genome using MPS platforms is now common and will soon outrun conventional sequencing. However, the development of a robust and reliable protocol is rather challenging. A previous pilot study for the re-sequencing of human mtDNA revealed an uneven coverage, affecting predominantly part of the plus strand. In an attempt to address this problem, we undertook a comparative study of standard and modified protocols for the Ion Torrent PGM system. We could not improve strand representation by altering the recommended shearing methodology of the standard workflow or omitting the DNA polymerase amplification step from the library construction process. However, we were able to associate coverage bias of the plus strand with a specific sequence motif. Additionally, we compared coverage and variant calling across technologies. The same samples were also sequenced on a MiSeq device, which showed that coverage and heteroplasmic variant calling were much improved.

  16. Extended testing of a general contextual classifier using the massively parallel processor - Preliminary results and test plans [for thematic mapping]

    Science.gov (United States)

    Tilton, J. C.

    1985-01-01

    Earlier encouraging test results of a contextual classifier that combines spatial and spectral information employing a general statistical approach are expanded. The earlier results were of limited meaning because they were produced from small (50-by-50 pixel) data sets. An implementation of the contextual classifier on NASA Goddard's Massively Parallel Processor (MPP) is presented; for the first time the MPP makes feasible the testing of the classifier on large data sets (a 12-hour test on a VAX-11/780 minicomputer now takes 5 minutes on the MPP). The MPP is a Single-Instruction, Multiple Data Stream computer, consisting of 16,384 bit serial microprocessors connected in a 128-by-128 mesh array with each element having data transfer connections with its four nearest neighbors so that the MPP is capable of billions of operations per second. Preliminary results are given (with more expected for the conference) and plans are mentioned for extended testing of the contextual classifier on Thematic Mapper data sets.

  17. LiNbO3: A photovoltaic substrate for massive parallel manipulation and patterning of nano-objects

    Science.gov (United States)

    Carrascosa, M.; García-Cabañes, A.; Jubera, M.; Ramiro, J. B.; Agulló-López, F.

    2015-12-01

    The application of evanescent photovoltaic (PV) fields, generated by visible illumination of Fe:LiNbO3 substrates, for parallel massive trapping and manipulation of micro- and nano-objects is critically reviewed. The technique has been often referred to as photovoltaic or photorefractive tweezers. The main advantage of the new method is that the involved electrophoretic and/or dielectrophoretic forces do not require any electrodes and large scale manipulation of nano-objects can be easily achieved using the patterning capabilities of light. The paper describes the experimental techniques for particle trapping and the main reported experimental results obtained with a variety of micro- and nano-particles (dielectric and conductive) and different illumination configurations (single beam, holographic geometry, and spatial light modulator projection). The report also pays attention to the physical basis of the method, namely, the coupling of the evanescent photorefractive fields to the dielectric response of the nano-particles. The role of a number of physical parameters such as the contrast and spatial periodicities of the illumination pattern or the particle deposition method is discussed. Moreover, the main properties of the obtained particle patterns in relation to potential applications are summarized, and first demonstrations reviewed. Finally, the PV method is discussed in comparison to other patterning strategies, such as those based on the pyroelectric response and the electric fields associated to domain poling of ferroelectric materials.

  18. Non-CAR resists and advanced materials for Massively Parallel E-Beam Direct Write process integration

    Science.gov (United States)

    Pourteau, Marie-Line; Servin, Isabelle; Lepinay, Kévin; Essomba, Cyrille; Dal'Zotto, Bernard; Pradelles, Jonathan; Lattard, Ludovic; Brandt, Pieter; Wieland, Marco

    2016-03-01

    The emerging Massively Parallel Electron Beam Direct Write (MP-EBDW) technique is an attractive high-resolution, high-throughput lithography technology. As previously shown, Chemically Amplified Resists (CARs) meet process/integration specifications in terms of dose-to-size, resolution, contrast, and energy latitude. However, they are still limited by their line width roughness. To overcome this issue, we tested an alternative advanced non-CAR and showed that it brings a substantial gain in sensitivity compared to CAR. We also implemented and assessed in-line post-lithographic treatments for roughness mitigation. For outgassing-reduction purposes, a top-coat layer is added to the total process stack. A new-generation top-coat was tested and showed improved printing performance compared to the previous product, especially avoiding dark erosion: SEM cross-sections showed a straight pattern profile. A spin-coatable charge dissipation layer based on conductive polyaniline has also been tested for conductivity and lithographic performance, and compatibility experiments revealed that the underlying resist type has to be carefully chosen when using this product. Finally, the Process Of Reference (POR) trilayer stack defined for 5 kV multi-e-beam lithography was successfully etched with well-opened and straight patterns, and no lithography-etch bias.

  19. LiNbO3: A photovoltaic substrate for massive parallel manipulation and patterning of nano-objects

    Energy Technology Data Exchange (ETDEWEB)

    Carrascosa, M.; García-Cabañes, A.; Jubera, M. [Dept. Física de Materiales, Universidad Autónoma de Madrid, Madrid 28049 (Spain); Ramiro, J. B. [Dept. Mecánica de Fluidos y Propulsión Aeroespacial, Universidad Politécnica de Madrid, Madrid 28040 (Spain); Agulló-López, F. [Centro de Microanálisis de Materiales (CMAM), Universidad Autónoma de Madrid, Madrid 28049 (Spain)

    2015-12-15

    The application of evanescent photovoltaic (PV) fields, generated by visible illumination of Fe:LiNbO3 substrates, for parallel massive trapping and manipulation of micro- and nano-objects is critically reviewed. The technique has been often referred to as photovoltaic or photorefractive tweezers. The main advantage of the new method is that the involved electrophoretic and/or dielectrophoretic forces do not require any electrodes and large scale manipulation of nano-objects can be easily achieved using the patterning capabilities of light. The paper describes the experimental techniques for particle trapping and the main reported experimental results obtained with a variety of micro- and nano-particles (dielectric and conductive) and different illumination configurations (single beam, holographic geometry, and spatial light modulator projection). The report also pays attention to the physical basis of the method, namely, the coupling of the evanescent photorefractive fields to the dielectric response of the nano-particles. The role of a number of physical parameters such as the contrast and spatial periodicities of the illumination pattern or the particle deposition method is discussed. Moreover, the main properties of the obtained particle patterns in relation to potential applications are summarized, and first demonstrations reviewed. Finally, the PV method is discussed in comparison to other patterning strategies, such as those based on the pyroelectric response and the electric fields associated to domain poling of ferroelectric materials.

  20. Performance of microarray and liquid based capture methods for target enrichment for massively parallel sequencing and SNP discovery.

    Directory of Open Access Journals (Sweden)

    Anna Kiialainen

    Full Text Available Targeted sequencing is a cost-efficient way to obtain answers to biological questions in many projects, but the choice of the enrichment method to use can be difficult. In this study we compared two hybridization methods for target enrichment for massively parallel sequencing and single nucleotide polymorphism (SNP) discovery, namely Nimblegen sequence capture arrays and the SureSelect liquid-based hybrid capture system. We prepared sequencing libraries from three HapMap samples using both methods, sequenced the libraries on the Illumina Genome Analyzer, mapped the sequencing reads back to the genome, and called variants in the sequences. 74-75% of the sequence reads originated from the targeted region in the SureSelect libraries and 41-67% in the Nimblegen libraries. We could sequence up to 99.9% and 99.5% of the regions targeted by capture probes from the SureSelect libraries and from the Nimblegen libraries, respectively. The Nimblegen probes covered 0.6 Mb more of the original 3.1 Mb target region than the SureSelect probes. In each sample, we called more SNPs and detected more novel SNPs from the libraries that were prepared using the Nimblegen method. Thus the Nimblegen method gave better results when judged by the number of SNPs called, but this came at the cost of more over-sampling.

  1. Massively parallel and highly quantitative single-particle analysis on interactions between nanoparticles on supported lipid bilayer.

    Science.gov (United States)

    Lee, Young Kwang; Kim, Sungi; Oh, Jeong-Wook; Nam, Jwa-Min

    2014-03-12

    Observation of individual single-nanoparticle reactions provides direct information and insight for many complex chemical, physical, and biological processes, but this is extremely challenging with conventional high-resolution imaging techniques and platforms. Here, we developed a photostable plasmonic nanoparticle-modified supported lipid bilayer (PNP-SLB) platform that allows for massively parallel in situ analysis of the interactions between nanoparticles with single-particle resolution on a two-dimensional (2D) fluidic surface. Each particle-by-particle PNP clustering process was monitored in real time and quantified via analysis of individual particle diffusion trajectories and single-particle-level plasmonic coupling. Importantly, the measured PNP cluster growth kinetics on the PNP-SLB were well fitted by the analysis. As an application example, we performed a DNA detection assay, and the result suggests that our approach has very promising sensitivity and dynamic range (high attomolar to high femtomolar) without optimization, as well as remarkable single-base mismatch discrimination capability. The method shown herein can be readily applied to many different types of intermolecular and interparticle interactions and provides convenient tools and new insights for studying dynamic interactions on a highly controllable and analytical platform.

  2. Massively parallel E-beam inspection: enabling next-generation patterned defect inspection for wafer and mask manufacturing

    Science.gov (United States)

    Malloy, Matt; Thiel, Brad; Bunday, Benjamin D.; Wurm, Stefan; Mukhtar, Maseeh; Quoi, Kathy; Kemen, Thomas; Zeidler, Dirk; Eberle, Anna Lena; Garbowski, Tomasz; Dellemann, Gregor; Peters, Jan Hendrik

    2015-03-01

    SEMATECH aims to identify and enable disruptive technologies to meet the ever-increasing demands of semiconductor high volume manufacturing (HVM). As such, a program was initiated in 2012 focused on high-speed e-beam defect inspection as a complement, and eventual successor, to bright field optical patterned defect inspection [1]. The primary goal is to enable a new technology to overcome the key gaps that are limiting modern day inspection in the fab; primarily, throughput and sensitivity to detect ultra-small critical defects. The program specifically targets revolutionary solutions based on massively parallel e-beam technologies, as opposed to incremental improvements to existing e-beam and optical inspection platforms. Wafer inspection is the primary target, but attention is also being paid to next generation mask inspection. During the first phase of the multi-year program multiple technologies were reviewed, a down-selection was made to the top candidates, and evaluations began on proof of concept systems. A champion technology has been selected and as of late 2014 the program has begun to move into the core technology maturation phase in order to enable eventual commercialization of an HVM system. Performance data from early proof of concept systems will be shown along with roadmaps to achieving HVM performance. SEMATECH's vision for moving from early-stage development to commercialization will be shown, including plans for development with industry leading technology providers.

  3. Combined fragment molecular orbital cluster in molecule approach to massively parallel electron correlation calculations for large systems.

    Science.gov (United States)

    Findlater, Alexander D; Zahariev, Federico; Gordon, Mark S

    2015-04-16

    The local correlation "cluster-in-molecule" (CIM) method is combined with the fragment molecular orbital (FMO) method, providing a flexible, massively parallel, and near-linear scaling approach to the calculation of electron correlation energies for large molecular systems. Although the computational scaling of the CIM algorithm is already formally linear, previous knowledge of the Hartree-Fock (HF) reference wave function and subsequent localized orbitals is required; therefore, extending the CIM method to arbitrarily large systems requires the aid of low-scaling/linear-scaling approaches to HF and orbital localization. Through fragmentation, the combined FMO-CIM method linearizes the scaling, with respect to system size, of the HF reference and orbital localization calculations, achieving near-linear scaling at both the reference and electron correlation levels. For the 20-residue alanine α helix, the preliminary implementation of the FMO-CIM method captures 99.6% of the MP2 correlation energy, requiring 21% of the MP2 wall time. The new method is also applied to solvated adamantine to illustrate the multilevel capability of the FMO-CIM method.

  4. External Quality Assessment for Detection of Fetal Trisomy 21, 18, and 13 by Massively Parallel Sequencing in Clinical Laboratories.

    Science.gov (United States)

    Zhang, Rui; Zhang, Hongyun; Li, Yulong; Han, Yanxi; Xie, Jiehong; Li, Jinming

    2016-03-01

    An external quality assessment for the detection of trisomy 21, 18, and 13 by massively parallel sequencing was implemented by the National Center for Clinical Laboratories of the People's Republic of China in 2014. Simulated samples were prepared by mixing fragmented abnormal DNA with plasma from non-pregnant women. The external quality assessment panel, comprising 5 samples from healthy pregnant women, 2 samples with sex chromosome aneuploidies, and 13 samples with different concentrations of fetal fractions positive for trisomy 21, 18, and 13, was then distributed to participating laboratories. In total, 55.6% (47 of 84) of respondents correctly identified each of the samples in the panel. Seventeen false-negative and 87 gray-zone results were reported, most [102 of 104 (98.1%)] of which were derived from trisomy samples with low effective fetal fractions or from a trisomy sample generated by BGISEQ-100. Overall, most clinical laboratories detected samples containing effective fetal fractions >4%. Our study shows the need for further laboratory training in the management of samples with low fetal fractions. For some assays, the precision of Z values needs to be improved.

  5. HLA-F coding and regulatory segments variability determined by massively parallel sequencing procedures in a Brazilian population sample.

    Science.gov (United States)

    Lima, Thálitta Hetamaro Ayala; Buttura, Renato Vidal; Donadi, Eduardo Antônio; Veiga-Castelli, Luciana Caricati; Mendes-Junior, Celso Teixeira; Castelli, Erick C

    2016-10-01

    Human Leucocyte Antigen F (HLA-F) is a non-classical HLA class I gene distinguished from its classical counterparts by low allelic polymorphism and distinctive expression patterns. Its exact function remains unknown. It is believed that HLA-F has tolerogenic and immune modulatory properties. Currently, there is little information regarding HLA-F allelic variation among human populations, and the available studies have evaluated only a fraction of the HLA-F gene segment and/or have searched for known alleles only. Here we present a strategy to evaluate the complete HLA-F variability, including its 5' upstream, coding and 3' downstream segments, by using massively parallel sequencing procedures. HLA-F variability was surveyed in 196 individuals from the Brazilian Southeast. The results indicate that the HLA-F gene is indeed conserved at the protein level, where thirty coding haplotypes or coding alleles were detected, encoding only four different HLA-F full-length protein molecules. Moreover, a single protein molecule is encoded by 82.45% of all coding alleles detected in this Brazilian population sample. However, the HLA-F nucleotide and haplotype variability is much higher than previously reported, both in Brazilians and in the 1000 Genomes Project data. This protein conservation is probably a consequence of the key role of HLA-F in immune system physiology.

  6. Discussion on "Techniques for Massive-Data Machine Learning in Astronomy" by A. Gray

    CERN Document Server

    Ball, Nicholas M

    2011-01-01

    Astronomy is increasingly encountering two fundamental truths: (1) The field is faced with the task of extracting useful information from extremely large, complex, and high dimensional datasets; (2) The techniques of astroinformatics and astrostatistics are the only way to make this tractable, and bring the required level of sophistication to the analysis. Thus, an approach which provides these tools in a way that scales to these datasets is not just desirable, it is vital. The expertise required spans not just astronomy, but also computer science, statistics, and informatics. As a computer scientist and expert in machine learning, Alex contributes expertise and a large number of fast algorithms designed to scale to large datasets, which is extremely welcome. We focus in this discussion on the questions raised by the practical application of these algorithms to real astronomical datasets. That is, what is needed to maximally leverage their potential to improve the science return? This is not a trivial task. W...

  7. Concise review of relaxations and approximation algorithms for nonidentical parallel-machine scheduling to minimize total weighted completion times

    Institute of Scientific and Technical Information of China (English)

    Li Kai; Yang Shanlin

    2008-01-01

    A class of nonidentical parallel machine scheduling problems is considered in which the goal is to minimize the total weighted completion time. Models and relaxations are collected. Most of these problems are NP-hard in the strong sense or are open problems; therefore, approximation algorithms are studied. The review reveals that there exist some potential areas worthy of further research.

  8. A note on the paper "Minimizing total tardiness on parallel machines with preemptions" by Kravchenko and Werner [2010]

    CERN Document Server

    Prot, D; Lahlou, C

    2011-01-01

    In this note, we point out two major errors in the paper "Minimizing total tardiness on parallel machines with preemptions" by Kravchenko and Werner [2010]. More precisely, they proved that both problems P|pmtn|sum(Tj) and P|rj, pj = p, pmtn|sum(Tj) are NP-hard. We give a counter-example to their proofs, leaving the complexity of these two problems open.

  9. Parallel machine scheduling with step-deteriorating jobs and setup times by a hybrid discrete cuckoo search algorithm

    Science.gov (United States)

    Guo, Peng; Cheng, Wenming; Wang, Yi

    2015-11-01

    This article considers the parallel machine scheduling problem with step-deteriorating jobs and sequence-dependent setup times. The objective is to minimize the total tardiness by determining the allocation and sequence of jobs on identical parallel machines. In this problem, the processing time of each job is a step function of its starting time: a job incurs an additional penalty time when its starting time is later than a specific deterioration date. The possibility of deterioration makes the parallel machine scheduling problem more challenging than ordinary ones. A mixed integer programming model for the optimal solution is derived. Due to its NP-hard nature, a hybrid discrete cuckoo search algorithm is proposed to solve this problem. In order to generate a good initial swarm, a modified Biskup-Hermann-Gupta (BHG) heuristic called MBHG is incorporated into the population initialization. Several discrete operators are proposed in the random walk of Lévy flights and the crossover search. Moreover, a local search procedure based on variable neighbourhood descent is integrated into the algorithm as a hybrid strategy in order to improve the quality of elite solutions. Computational experiments are executed on two sets of randomly generated test instances. The results show that the proposed hybrid algorithm can yield better solutions in comparison with the commercial solver CPLEX® with a one-hour time limit, the discrete cuckoo search algorithm and the existing variable neighbourhood search algorithm.
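
    A common form of the step-deterioration model described above (the notation here is ours): a job's processing time keeps its base value if the job starts by its deterioration date, and jumps by a penalty otherwise.

        def processing_time(start, base, deterioration_date, penalty):
            # Step function: the penalty applies only to late starts.
            return base if start <= deterioration_date else base + penalty

        def tardiness(completion, due):
            return max(0, completion - due)

        t = 0
        for base, d_date, penalty, due in [(3, 2, 2, 5), (4, 3, 1, 9), (2, 1, 3, 8)]:
            p = processing_time(t, base, d_date, penalty)
            t += p
            print(f"finish={t} tardiness={tardiness(t, due)}")
        # The last job starts after its deterioration date, so its processing
        # time jumps from 2 to 5 and it finishes 4 time units tardy.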

  10. Enabling inspection solutions for future mask technologies through the development of massively parallel E-Beam inspection

    Science.gov (United States)

    Malloy, Matt; Thiel, Brad; Bunday, Benjamin D.; Wurm, Stefan; Jindal, Vibhu; Mukhtar, Maseeh; Quoi, Kathy; Kemen, Thomas; Zeidler, Dirk; Eberle, Anna Lena; Garbowski, Tomasz; Dellemann, Gregor; Peters, Jan Hendrik

    2015-09-01

    The new device architectures and materials being introduced for sub-10nm manufacturing, combined with the complexity of multiple patterning and the need for improved hotspot detection strategies, have pushed current wafer inspection technologies to their limits. In parallel, gaps in mask inspection capability are growing as new generations of mask technologies are developed to support these sub-10nm wafer manufacturing requirements. In particular, the challenges associated with nanoimprint and extreme ultraviolet (EUV) mask inspection require new strategies that enable fast inspection at high sensitivity. The tradeoffs between sensitivity and throughput for optical and e-beam inspection are well understood. Optical inspection offers the highest throughput and is the current workhorse of the industry for both wafer and mask inspection. E-beam inspection offers the highest sensitivity but has historically lacked the throughput required for widespread adoption in the manufacturing environment. It is unlikely that continued incremental improvements to either technology will meet tomorrow's requirements, and therefore a new inspection technology approach is required; one that combines the high-throughput performance of optical with the high-sensitivity capabilities of e-beam inspection. To support the industry in meeting these challenges SUNY Poly SEMATECH has evaluated disruptive technologies that can meet the requirements for high volume manufacturing (HVM), for both the wafer fab [1] and the mask shop. High-speed massively parallel e-beam defect inspection has been identified as the leading candidate for addressing the key gaps limiting today's patterned defect inspection techniques. As of late 2014 SUNY Poly SEMATECH completed a review, system analysis, and proof of concept evaluation of multiple e-beam technologies for defect inspection. A champion approach has been identified based on a multibeam technology from Carl Zeiss. This paper includes a discussion on the

  11. Parallel machine scheduling with release dates, due dates and family setup times

    NARCIS (Netherlands)

    Schutten, J.M.J.; Leussink, R.A.M.

    1996-01-01

    In manufacturing, there is a fundamental conflict between efficient production and delivery performance. Maximizing machine utilization by batching similar jobs may lead to poor delivery performance. Minimizing customers' dissatisfaction may lead to an inefficient use of the machines. In this paper,

  12. Monte Carlo standardless approach for laser induced breakdown spectroscopy based on massive parallel graphic processing unit computing

    Science.gov (United States)

    Demidov, A.; Eschlböck-Fuchs, S.; Kazakov, A. Ya.; Gornushkin, I. B.; Kolmhofer, P. J.; Pedarnig, J. D.; Huber, N.; Heitz, J.; Schmid, T.; Rössler, R.; Panne, U.

    2016-11-01

    An improved Monte Carlo (MC) method for standardless analysis in laser-induced breakdown spectroscopy (LIBS) is presented. Concentrations in MC LIBS are found by fitting model-generated synthetic spectra to experimental spectra. The current version of MC LIBS is based on graphic processing unit (GPU) computation and reduces the analysis time to several seconds per spectrum/sample. The previous version of MC LIBS, based on central processing unit (CPU) computation, required unacceptably long analysis times of tens of minutes per spectrum/sample. The reduction of the computational time is achieved through massively parallel computing on the GPU, which embeds thousands of co-processors. It is shown that the number of iterations on the GPU exceeds that on the CPU by a factor > 1000 for the 5-dimensional parameter space, and yet requires a > 10-fold shorter computational time. The improved GPU-MC LIBS outperforms the CPU-based MC LIBS in terms of accuracy, precision, and analysis time. The performance was tested on LIBS spectra obtained from pelletized powders of metal oxides consisting of CaO, Fe2O3, MgO, and TiO2 that simulate by-products of the steel industry, steel slags. It is demonstrated that GPU-based MC LIBS is capable of rapid multi-element analysis with a relative error between one and a few tens of percent, which is sufficient for industrial applications (e.g. steel slag analysis). The results of the improved GPU-based MC LIBS compare favorably to those of the CPU-based MC LIBS as well as to the results of standard calibration-free (CF) LIBS based on the Boltzmann plot method.
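
    The core idea, many independent trial spectra fitted against one measurement, is what maps naturally onto GPU threads. A toy CPU sketch with a linear two-species spectrum model (illustrative only, not the paper's radiative model):

        import numpy as np

        rng = np.random.default_rng(0)
        lines = np.array([[1.0, 0.2],      # emission line strengths per
                          [0.3, 1.0]])     # species (toy values)

        true_conc = np.array([0.7, 0.3])
        measured = true_conc @ lines + rng.normal(0, 0.01, 2)

        trials = rng.dirichlet([1.0, 1.0], size=100_000)  # random compositions
        spectra = trials @ lines                          # all trials at once
        misfit = ((spectra - measured) ** 2).sum(axis=1)
        print(trials[misfit.argmin()])                    # close to [0.7, 0.3]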

  13. Genome evolution and meiotic maps by massively parallel DNA sequencing: spotted gar, an outgroup for the teleost genome duplication.

    Science.gov (United States)

    Amores, Angel; Catchen, Julian; Ferrara, Allyse; Fontenot, Quenton; Postlethwait, John H

    2011-08-01

    Genomic resources for hundreds of species of evolutionary, agricultural, economic, and medical importance are unavailable due to the expense of well-assembled genome sequences and difficulties with multigenerational studies. Teleost fish provide many models for human disease but possess anciently duplicated genomes that sometimes obfuscate connectivity. Genomic information representing a fish lineage that diverged before the teleost genome duplication (TGD) would provide an outgroup for exploring the mechanisms of evolution after whole-genome duplication. We exploited massively parallel DNA sequencing to develop meiotic maps with thrift and speed by genotyping F1 offspring of a single female and a single male spotted gar (Lepisosteus oculatus) collected directly from nature utilizing only polymorphisms existing in these two wild individuals. Using Stacks, software that automates the calling of genotypes from polymorphisms assayed by Illumina sequencing, we constructed a map containing 8406 markers. RNA-seq on two map-cross larvae provided a reference transcriptome that identified nearly 1000 mapped protein-coding markers and allowed genome-wide analysis of conserved synteny. Results showed that the gar lineage diverged from teleosts before the TGD and its genome is organized more similarly to that of humans than teleosts. Thus, spotted gar provides a critical link between medical models in teleost fish, to which gar is biologically similar, and humans, to which gar is genomically similar. Application of our F1 dense mapping strategy to species with no prior genome information promises to facilitate comparative genomics and provide a scaffold for ordering the numerous contigs arising from next generation genome sequencing.

  14. Investigating the effect of two methane-mitigating diets on the rumen microbiome using massively parallel sequencing.

    Science.gov (United States)

    Ross, E M; Moate, P J; Marett, L; Cocks, B G; Hayes, B J

    2013-09-01

    Variation in the composition of microorganisms in the rumen (the rumen microbiome) of dairy cattle (Bos taurus) is of great interest because of possible links to methane emission levels. Feed additives are one method being investigated to reduce enteric methane production by dairy cattle. Here we report the effect of 2 methane-mitigating feed additives (grapemarc and a combination of lipids and tannin) on rumen microbiome profiles of Holstein dairy cattle. We used untargeted (shotgun) massively parallel sequencing of microbes present in rumen fluid to generate quantitative rumen microbiome profiles. We observed large effects of the feed additives on the rumen microbiome profiles using multiple approaches, including linear mixed modeling, hierarchical clustering, and metagenomic predictions. The effect on the fecal microbiome profiles was not detectable using hierarchical clustering, but was significant in the linear mixed model and when metagenomic predictions were used, suggesting a more subtle effect of the diets on the lower gastrointestinal microbiome. A differential representation analysis (analogous to differential expression in RNA sequencing) showed significant overlap in the contigs (which are genome fragments representing different microorganism species) that were differentially represented between experiments. These similarities suggest that, despite the different additives used, the 2 diets assessed in this investigation altered the microbiomes of the samples in similar ways. Contigs that were differentially represented in both experiments were tested for associations with methane production in an independent set of animals. These animals were not treated with a methane-mitigating diet, but did show substantial natural variation in methane emission levels. The contigs that were significantly differentially represented in response to both dietary additives showed a significant enrichment for associations with methane production. This suggests that these

  15. Application of affymetrix array and massively parallel signature sequencing for identification of genes involved in prostate cancer progression

    Directory of Open Access Journals (Sweden)

    Eichner Lillian J

    2005-07-01

    Full Text Available Background: Affymetrix GeneChip Array and Massively Parallel Signature Sequencing (MPSS) are two high-throughput methodologies used to profile transcriptomes. Each method has certain strengths and weaknesses; however, no comparison has been made between the data derived from Affymetrix arrays and MPSS. In this study, two lineage-related prostate cancer cell lines, LNCaP and C4-2, were used for transcriptome analysis with the aim of identifying genes associated with prostate cancer progression. Methods: Affymetrix GeneChip array and MPSS analyses were performed. Data were analyzed with GeneSpring 6.2 and in-house Perl scripts. Expression array results were verified with RT-PCR. Results: Comparison of the data revealed that both technologies detected genes the other did not. In LNCaP, 3,180 genes were only detected by Affymetrix and 1,169 genes were only detected by MPSS. Similarly, in C4-2, 4,121 genes were only detected by Affymetrix and 1,014 genes were only detected by MPSS. Analysis of the combined transcriptomes identified 66 genes unique to LNCaP cells and 33 genes unique to C4-2 cells. Expression analysis of these genes in prostate cancer specimens showed CA1 to be highly expressed in bone metastasis but not expressed in primary tumor, and EPHA7 to be expressed in normal prostate and primary tumor but not bone metastasis. Conclusion: Our data indicate that transcriptome profiling with a single methodology will not fully assess the expression of all genes in a cell line. A combination of transcription profiling technologies such as DNA array and MPSS provides a more robust means to assess the expression profile of an RNA sample. Finally, genes that were differentially expressed in cell lines were also differentially expressed in primary prostate cancer and its metastases.

  16. Relationship Between Faults Oriented Parallel and Oblique to Bedding in Neogene Massive Siliceous Mudstones at The Horonobe Underground Research Laboratory, Japan

    Science.gov (United States)

    Hayano, Akira; Ishii, Eiichi

    2016-10-01

    This study investigates the mechanical relationship between bedding-parallel and bedding-oblique faults in a Neogene massive siliceous mudstone at the site of the Horonobe Underground Research Laboratory (URL) in Hokkaido, Japan, on the basis of observations of drill-core recovered from pilot boreholes and fracture mapping on shaft and gallery walls. Four bedding-parallel faults with visible fault gouge, named respectively the MM Fault, the Last MM Fault, the S1 Fault, and the S2 Fault (stratigraphically, from the highest to the lowest), were observed in two pilot boreholes (PB-V01 and SAB-1). The distribution of the bedding-parallel faults at 350 m depth in the Horonobe URL indicates that these faults are spread over at least several tens of meters in parallel along a bedding plane. The observation that the bedding-oblique fault displaces the Last MM fault is consistent with the previous interpretation that the bedding-oblique faults formed after the bedding-parallel faults. In addition, the bedding-oblique faults terminate near the MM and S1 faults, indicating that bedding-parallel faults with visible fault gouge act to terminate the propagation of younger bedding-oblique faults. In particular, the MM and S1 faults, which have a relatively thick fault gouge, appear to have had a stronger control on the propagation of bedding-oblique faults than did the Last MM fault, which has a relatively thin fault gouge.

  17. An imperialist competitive algorithm for a bi-objective parallel machine scheduling problem with load balancing consideration

    Directory of Open Access Journals (Sweden)

    Mansooreh Madani-Isfahani

    2013-04-01

    Full Text Available In this paper, we present a new Imperialist Competitive Algorithm (ICA) to solve a bi-objective scheduling problem on unrelated parallel machines where setup times are sequence dependent. The objectives are the mean completion time of the tasks and the mean squared deviation of the machine workloads from their average. The performance of the proposed ICA (PICA) method is examined using randomly generated data and compared with three alternative methods, namely particle swarm optimization (PSO), the original version of the imperialist competitive algorithm (OICA), and a genetic algorithm (GA), in terms of the objective function values. The preliminary results indicate that the proposed method outperforms the alternative methods. In addition, while OICA performs the worst among the alternative solution strategies, PSO and GA perform better.
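
    For illustration, the two objectives described in this record can be evaluated for a candidate schedule as in the following minimal Python sketch; the data layout (proc_time, setup) is a hypothetical encoding, not taken from the paper.

        def evaluate(schedule, proc_time, setup):
            """schedule: one job sequence per machine.
            proc_time[m][j]: processing time of job j on machine m.
            setup[m][i][j]: sequence-dependent setup when j follows i on m."""
            completions, loads = [], []
            for m, seq in enumerate(schedule):
                t, prev = 0.0, None
                for j in seq:
                    if prev is not None:
                        t += setup[m][prev][j]   # sequence-dependent setup time
                    t += proc_time[m][j]
                    completions.append(t)        # completion time of job j
                    prev = j
                loads.append(t)                  # total workload of machine m
            mean_completion = sum(completions) / len(completions)
            avg = sum(loads) / len(loads)
            workload_deviation = sum((l - avg) ** 2 for l in loads) / len(loads)
            return mean_completion, workload_deviation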

  18. Massively Parallel Assimilation of TOGA/TAO and Topex/Poseidon Measurements into a Quasi Isopycnal Ocean General Circulation Model Using an Ensemble Kalman Filter

    Science.gov (United States)

    Keppenne, Christian L.; Rienecker, Michele; Borovikov, Anna Y.; Suarez, Max

    1999-01-01

    A massively parallel ensemble Kalman filter (EnKF) is used to assimilate temperature data from the TOGA/TAO array and altimetry from TOPEX/POSEIDON into a Pacific basin version of the NASA Seasonal to Interannual Prediction Project (NSIPP)'s quasi-isopycnal ocean general circulation model. The EnKF is an approximate Kalman filter in which the error-covariance propagation step is modeled by the integration of multiple instances of a numerical model. An estimate of the true error covariances is then inferred from the distribution of the ensemble of model state vectors. This implementation of the filter takes advantage of the inherent parallelism in the EnKF algorithm by running all the model instances concurrently. The Kalman filter update step also occurs in parallel by having each processor process the observations that occur in the region of physical space for which it is responsible. The massively parallel data assimilation system is validated by withholding some of the data and then quantifying the extent to which the withheld information can be inferred from the assimilation of the remaining data. The distributions of the forecast and analysis error covariances predicted by the EnKF are also examined.
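
    As background, the EnKF analysis step that each processor applies to the observations in its region can be sketched in serial NumPy as below; the perturbed-observations form and all variable names are illustrative assumptions, not details taken from the paper.

        import numpy as np

        def enkf_update(ensemble, H, obs, obs_cov, rng=np.random.default_rng(0)):
            """ensemble: (n_state, n_members) matrix of model state vectors.
            H: (n_obs, n_state) observation operator; obs: observed values."""
            n_members = ensemble.shape[1]
            A = ensemble - ensemble.mean(axis=1, keepdims=True)  # anomalies
            P_HT = A @ (H @ A).T / (n_members - 1)   # ensemble estimate of P H^T
            S = H @ P_HT + obs_cov                   # innovation covariance
            K = P_HT @ np.linalg.inv(S)              # Kalman gain
            # one perturbed observation vector per ensemble member
            perturbed = obs[:, None] + rng.multivariate_normal(
                np.zeros(len(obs)), obs_cov, size=n_members).T
            return ensemble + K @ (perturbed - H @ ensemble)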

  19. Error modeling, sensitivity analysis and assembly process of a class of 3-DOF parallel kinematic machines with parallelogram struts

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    This paper presents an error modeling methodology that enables the tolerance design, assembly and kinematic calibration of a class of 3-DOF parallel kinematic machines with parallelogram struts to be integrated into a unified framework. The error mapping function is formulated to identify the source errors affecting the uncompensable pose error. The sensitivity analysis in the sense of statistics is also carried out to investigate the influences of source errors on the pose accuracy. An assembly process that can effectively minimize the uncompensable pose error is proposed as one of the results of this investigation.

  20. AN IMPROVED BRANCH-AND-BOUND ALGORITHM TO MINIMIZE THE WEIGHTED FLOWTIME ON IDENTICAL PARALLEL MACHINES WITH FAMILY SETUP TIMES

    Institute of Scientific and Technical Information of China (English)

    Belgacem BETTAYEB; Imed KACEM; Kondo H.ADJALLAH

    2008-01-01

    This article investigates identical parallel machine scheduling with family setup times. The objective function being the weighted sum of completion times, the problem is known to be strongly NP-hard. We propose a constructive heuristic algorithm and three complementary lower bounds. Two of these bounds proceed by elimination of setup times or by distributing each of them over the jobs of the corresponding family, while the third one is based on a Lagrangian relaxation. The bounds and the heuristic are incorporated into a branch-and-bound algorithm. The experimental results obtained outperform those of the methods presented in previous works in terms of the size of solved problems.

  1. On the Parallel Elliptic Single/Multigrid Solutions about Aligned and Nonaligned Bodies Using the Virtual Machine for Multiprocessors

    Directory of Open Access Journals (Sweden)

    A. Averbuch

    1994-01-01

    Full Text Available Parallel elliptic single/multigrid solutions around an aligned and nonaligned body are presented and implemented on two multi-user and single-user shared memory multiprocessors (Sequent Symmetry and MOS) and on a distributed memory multiprocessor (a Transputer network). Our parallel implementation uses the Virtual Machine for Multi-Processors (VMMP), a software package that provides a coherent set of services for explicitly parallel application programs running on diverse multiple instruction multiple data (MIMD) multiprocessors, both shared memory and message passing. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. Furthermore, it ensures high portability of application programs by implementing the same services on all target multiprocessors. The performance of our algorithm is investigated in detail. It is seen to fit the above architectures well when the number of processors is less than the maximal number of grid points along the axes. In general, the efficiency in the nonaligned case is higher than in the aligned case. Alignment overhead is observed to be up to 200% in the shared-memory case and up to 65% in the message-passing case. We have demonstrated that when using VMMP, the portability of the algorithms is straightforward and efficient.

  2. An efficient parallel algorithm for O(N^2) direct summation method and its variations on distributed-memory parallel machines

    CERN Document Server

    Makino, J

    2001-01-01

    We present a novel, highly efficient algorithm to parallelize the O(N^2) direct summation method for N-body problems with individual timesteps on distributed-memory parallel machines such as Beowulf clusters. Previously known algorithms, in which all processors have complete copies of the N-body system, have the serious problem that the communication-computation ratio increases as we increase the number of processors, since the communication cost is independent of the number of processors. In the new algorithm, p processors are organized as a $\sqrt{p}\times \sqrt{p}$ two-dimensional array. Each processor has $N/\sqrt{p}$ particles, but the data are distributed in such a way that the complete system is present in any row or column consisting of $\sqrt{p}$ processors. In this algorithm, the communication cost scales as $N /\sqrt{p}$, while the calculation cost scales as $N^2/p$. Thus, we can use a much larger number of processors without losing efficiency compared to what was practical with previously known...
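
    The scaling claim can be made concrete with a toy cost model; the per-unit costs below are illustrative assumptions, not values from the paper.

        import math

        def costs(N, p, t_calc=1.0, t_comm=1.0):
            """Per-step costs for the 2-D processor array of the abstract:
            calculation scales as N^2/p, communication as N/sqrt(p)."""
            return t_calc * N ** 2 / p, t_comm * N / math.sqrt(p)

        # going from 64 to 256 processors quarters the computation term
        # but only halves the communication term
        for p in (64, 256):
            print(p, costs(10_000, p))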

  3. A bound for the convergence rate of parallel tempering for sampling restricted Boltzmann machines

    DEFF Research Database (Denmark)

    Fischer, Asja; Igel, Christian

    2015-01-01

    Abstract Sampling from restricted Boltzmann machines (RBMs) is done by Markov chain Monte Carlo (MCMC) methods. The faster the convergence of the Markov chain, the more efficiently can high quality samples be obtained. This is also important for robust training of RBMs, which usually relies...

  4. Modelling and Control of Inverse Dynamics for a 5-DOF Parallel Kinematic Polishing Machine

    Directory of Open Access Journals (Sweden)

    Weiyang Lin

    2013-08-01

     /  control method is presented and investigated 2∞ in order to track the error control of the inverse dynamic model; the simulation results from different conditions show that the mixed  /  control method could 2∞ achieve an optimal and robust control performance. This work shows that the presented PKPM has a higher dynamic performance than conventional machine tools.

  5. A Three-Stage Optimization Algorithm for the Stochastic Parallel Machine Scheduling Problem with Adjustable Production Rates

    Directory of Open Access Journals (Sweden)

    Rui Zhang

    2013-01-01

    Full Text Available We consider a parallel machine scheduling problem with random processing/setup times and adjustable production rates. The objective function to be minimized consists of two parts; the first part is related to the due date performance (i.e., the tardiness of the jobs), while the second part is related to the setting of machine speeds. Therefore, the decision variables include both the production schedule (the sequences of jobs) and the production rate of each machine. The optimization process, however, is significantly complicated by the stochastic factors in the manufacturing system. To address the difficulty, a simulation-based three-stage optimization framework is presented in this paper for obtaining high-quality robust solutions to the integrated scheduling problem. The first stage (crude optimization) is based on ordinal optimization theory, the second stage (finer optimization) is implemented with a metaheuristic called differential evolution, and the third stage (fine-tuning) is characterized by a perturbation-based local search. Finally, computational experiments are conducted to verify the effectiveness of the proposed approach. Sensitivity analysis and practical implications are also discussed.
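
    The second stage uses differential evolution; as a rough illustration, one generation of the textbook DE/rand/1/bin scheme over real-valued vectors (e.g., machine production rates) looks as follows. This is a generic sketch, and the fitness function, which the paper evaluates by stochastic simulation, is left abstract.

        import random

        def de_step(population, fitness, F=0.5, CR=0.9):
            """One DE/rand/1/bin generation; fitness: lower is better."""
            dim = len(population[0])
            new_pop = []
            for i, target in enumerate(population):
                a, b, c = random.sample(
                    [x for j, x in enumerate(population) if j != i], 3)
                j_rand = random.randrange(dim)       # forced crossover position
                trial = [a[k] + F * (b[k] - c[k])
                         if (random.random() < CR or k == j_rand) else target[k]
                         for k in range(dim)]
                # greedy selection between trial and target vector
                new_pop.append(trial if fitness(trial) <= fitness(target) else target)
            return new_pop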

  6. MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning.

    Science.gov (United States)

    Liu, Yang; Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man

    2015-01-01

    Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the size of the data is large. Nowadays, big data has gained momentum from both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model for facilitating data intensive applications. Three data intensive scenarios are considered in the parallelization process in terms of the volume of the classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated in an experimental MapReduce computer cluster from the aspects of accuracy in classification and efficiency in computation.
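
    The map/reduce split of the training work can be illustrated as follows; this is a generic data-parallel gradient pattern (here for a single linear layer), not the paper's actual Hadoop-based implementation.

        import numpy as np

        def map_gradient(weights, shard_x, shard_y):
            # "map": each mapper computes a gradient on its own data shard
            pred = shard_x @ weights
            return shard_x.T @ (pred - shard_y) / len(shard_x)

        def reduce_gradients(grads):
            # "reduce": average the per-shard gradients
            return np.mean(grads, axis=0)

        def train_step(weights, shards, lr=0.1):
            grads = [map_gradient(weights, x, y) for x, y in shards]
            return weights - lr * reduce_gradients(grads)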

  7. A two-level real-time vision machine combining coarse and fine grained parallelism

    DEFF Research Database (Denmark)

    Jensen, Lars Baunegaard With; Kjær-Nielsen, Anders; Pauwels, Karl;

    2010-01-01

    In this paper, we describe a real-time vision machine having a stereo camera as input, generating visual information on two different levels of abstraction. The system provides visual low-level and mid-level information in terms of dense stereo and optical flow, egomotion, indicating areas with independently moving objects as well as a condensed geometric description of the scene. The system operates at more than 20 Hz using a hybrid architecture consisting of one dual-GPU card and one quad-core CPU. The different processing stages of visual information have rather different characteristics ... a factor 90 and a reduction of latency of a factor 26 compared to processing on a single CPU core. Since the vision machine provides generic visual information it can be used in many contexts. Currently it is used in a driver assistance context as well as in two robotic applications.

  8. Implementation of a Massive Log Analysis System Based on Parallel Computing

    Institute of Scientific and Technical Information of China (English)

    白超; 杨静; 吴建国

    2013-01-01

    On the basis of an in-depth analysis of log types and features, a massive log processing system based on parallel computing is designed and implemented. The system collects logs in parallel across a cluster, stores them in a distributed file system, and analyzes them with parallel computing. It fully automates log collection and analysis, and after deployment can effectively support security maintenance, system performance optimization, and system failure diagnosis. By combining distributed and cloud computing solutions, the system improves the efficiency of log analysis, effectively solves the major problems of massive log processing, and provides a complete and effective solution for it.

  9. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by dynamic global mapping of contended links

    Science.gov (United States)

    Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

    2011-10-04

    A massively parallel nodal computer system periodically collects and broadcasts usage data for an internal communications network. A node sending data over the network makes a global routing determination using the network usage data. Preferably, network usage data comprises an N-bit usage value for each output buffer associated with a network link. An optimum routing is determined by summing the N-bit values associated with each link through which a data packet must pass, and comparing the sums associated with different possible routes.
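
    The routing determination described here amounts to a minimum-sum choice over candidate routes; a minimal sketch (with illustrative names, not the patent's code):

        def best_route(routes, usage):
            """routes: candidate routes, each a list of link ids.
            usage: link id -> N-bit output-buffer usage value."""
            return min(routes, key=lambda r: sum(usage[link] for link in r))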

  10. Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by dynamically adjusting local routing strategies

    Science.gov (United States)

    Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

    2010-03-16

    A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Each node implements a respective routing strategy for routing data through the network, the routing strategies not necessarily being the same in every node. The routing strategies implemented in the nodes are dynamically adjusted during application execution to shift network workload as required. Preferably, adjustment of routing policies in selective nodes is performed at synchronization points. The network may be dynamically monitored, and routing strategies adjusted according to detected network conditions.

  11. Method and apparatus for analyzing error conditions in a massively parallel computer system by identifying anomalous nodes within a communicator set

    Science.gov (United States)

    Gooding, Thomas Michael

    2011-04-19

    An analytical mechanism for a massively parallel computer system automatically analyzes data retrieved from the system, and identifies nodes which exhibit anomalous behavior in comparison to their immediate neighbors. Preferably, anomalous behavior is determined by comparing call-return stack tracebacks for each node, grouping like nodes together, and identifying neighboring nodes which do not themselves belong to the group. A node, not itself in the group, having a large number of neighbors in the group, is a likely locality of error. The analyzer preferably presents this information to the user by sorting the neighbors according to number of adjoining members of the group.
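
    A minimal sketch of the grouping idea, assuming tracebacks are reduced to hashable summaries; names and details are illustrative only.

        from collections import Counter

        def likely_error_nodes(tracebacks, neighbors):
            """tracebacks: node -> call-return stack traceback summary.
            neighbors: node -> list of adjacent nodes."""
            majority, _ = Counter(tracebacks.values()).most_common(1)[0]
            group = {n for n, tb in tracebacks.items() if tb == majority}
            outside = [(n, sum(1 for v in neighbors[n] if v in group))
                       for n in tracebacks if n not in group]
            # nodes with many adjoining group members are likely error localities
            return sorted(outside, key=lambda t: -t[1])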

  12. Animation of interactive fluid flow visualization tools on a data parallel machine

    Energy Technology Data Exchange (ETDEWEB)

    Sethian, J.A. (California Univ., Berkeley, CA (USA). Dept. of Mathematics); Salem, J.B. (Thinking Machines Corp., Cambridge, MA (USA))

    1989-01-01

    The authors describe a new graphics environment for essentially real-time interactive visualization of computational fluid mechanics. The researcher may interactively examine fluid data on a graphics display using animated flow visualization diagnostics that mimic those in the experimental laboratory. These tools include display of moving color contours for scalar fields, smoke or dye injection of passive particles to identify coherent flow structures, and bubble wire tracers for velocity profiles, as well as three-dimensional interactive rotation, zoom, and pan. The system is implemented on a data parallel supercomputer attached to a framebuffer. Since most fluid visualization techniques are highly parallel in nature, this allows rapid animation of fluid motion. The authors demonstrate their interactive graphics fluid flow system by analyzing data generated by numerical simulations of viscous, incompressible, laminar and turbulent flow over a backward-facing step and in a closed cavity. Input parameters are menu-driven, and images are updated at nine frames per second.

  13. MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning

    OpenAIRE

    Yang Liu; Jie Yang; Yuan Huang; Lixiong Xu; Siguang Li; Man Qi

    2015-01-01

    Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the size of the data is large. Nowadays, big data has gained momentum from both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing mo...

  14. Dynamic Resource Allocation Using Virtual Machines and Parallel Data Processing in the Cloud

    Directory of Open Access Journals (Sweden)

    Y.Bharath Bhushan

    2015-11-01

    Full Text Available The main enabling technology for cloud computing is virtualization, which generalizes the physical infrastructure and makes it easy to use and manage. Virtualization is used to allocate resources based on need and also supports the green computing concept. Parallel data processing has emerged as one of the killer applications for Infrastructure-as-a-Service (IaaS) clouds. The processing frameworks currently in use were designed for static, homogeneous cluster setups and disregard the particular nature of a cloud; the allocated compute resources may be inadequate for large parts of the submitted job and unnecessarily increase processing time and cost. In this paper we apply the concept of "skewness" to measure the unevenness in the multi-dimensional resource utilization of a server. By minimizing skewness, we can combine different types of workloads and improve the overall utilization of server resources, and we discuss the opportunities and challenges for efficient parallel data processing in clouds using Nephele's architecture. Nephele's architecture offers efficient parallel data processing in clouds; it is the first data processing framework to exploit the dynamic resource allocation offered by today's IaaS clouds for both task scheduling and execution.
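
    One common way to quantify such unevenness (assumed here for illustration; the paper's exact formula is not reproduced) is the root of the summed squared relative deviations of the per-resource utilizations:

        import math

        def resource_skewness(utilizations):
            """utilizations: fractional usage of each resource
            (CPU, memory, network I/O, ...) on one server."""
            avg = sum(utilizations) / len(utilizations)
            return math.sqrt(sum((u / avg - 1) ** 2 for u in utilizations))

        print(resource_skewness([0.9, 0.2, 0.2]))    # CPU-heavy: high skewness
        print(resource_skewness([0.5, 0.45, 0.5]))   # balanced: low skewness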

  15. AFL-1: A programming Language for Massively Concurrent Computers.

    Science.gov (United States)

    1986-11-01

    Keywords: Languages, Massively Parallel Systems, Connectionist Networks, Activity Flow, Connection Machine, Rule Based Systems. ... compile time that can be used to execute some function at run time. To implement a rule based system with such a language one wants a way to ... (The remainder of the record is unrecoverable OCR residue of a granularity table contrasting Fine Grain (SIMD), Medium Grain, and Coarse Grain machines, with the Connection Machine and Dado among the examples.)

  16. Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing

    Directory of Open Access Journals (Sweden)

    Chen Zuozhou

    2010-11-01

    Full Text Available Abstract Background Massively parallel sequencing readouts of epigenomic assays are enabling integrative genome-wide analyses of genomic and epigenomic variation. Pash 3.0 performs sequence comparison and read mapping and can be employed as a module within diverse configurable analysis pipelines, including ChIP-Seq and methylome mapping by whole-genome bisulfite sequencing. Results Pash 3.0 generally matches the accuracy and speed of niche programs for fast mapping of short reads, and exceeds their performance on longer reads generated by a new generation of massively parallel sequencing technologies. By exploiting longer read lengths, Pash 3.0 maps reads onto the large fraction of genomic DNA that contains repetitive elements and polymorphic sites, including indel polymorphisms. Conclusions We demonstrate the versatility of Pash 3.0 by analyzing the interaction between CpG methylation, CpG SNPs, and imprinting based on publicly available whole-genome shotgun bisulfite sequencing data. Pash 3.0 makes use of gapped k-mer alignment, a non-seed based comparison method, which is implemented using multi-positional hash tables. This allows Pash 3.0 to run on diverse hardware platforms, including individual computers with standard RAM capacity, multi-core hardware architectures and large clusters.

  17. Massively parallel sequencing of patients with intellectual disability, congenital anomalies and/or autism spectrum disorders with a targeted gene panel.

    Directory of Open Access Journals (Sweden)

    Maggie Brett

    Full Text Available Developmental delay and/or intellectual disability (DD/ID) affects 1-3% of all children. At least half of these are thought to have a genetic etiology. Recent studies have shown that massively parallel sequencing (MPS) using a targeted gene panel is particularly suited for diagnostic testing for genetically heterogeneous conditions. We report on our experiences with using massively parallel sequencing of a targeted gene panel of 355 genes for investigating the genetic etiology of eight patients with a wide range of phenotypes including DD/ID, congenital anomalies and/or autism spectrum disorder. Targeted sequence enrichment was performed using the Agilent SureSelect Target Enrichment Kit and sequenced on the Illumina HiSeq2000 using paired-end reads. For all eight patients, 81-84% of the targeted regions achieved read depths of at least 20×, with average read depths overlapping targets ranging from 322× to 798×. Causative variants were successfully identified in two of the eight patients: a nonsense mutation in the ATRX gene and a canonical splice site mutation in the L1CAM gene. In a third patient, a canonical splice site variant in the USP9X gene could likely explain all or some of her clinical phenotypes. These results confirm the value of targeted MPS for investigating DD/ID in children for diagnostic purposes. However, targeted gene MPS was less likely to provide a genetic diagnosis for children whose phenotype includes autism.

  18. Hybrid Metaheuristics for the Unrelated Parallel Machine Scheduling to Minimize Makespan and Maximum Just-in-Time Deviations

    Directory of Open Access Journals (Sweden)

    Chiuh Cheng Chyu

    2012-06-01

    Full Text Available This paper studies the unrelated parallel machine scheduling problem with three minimization objectives – makespan, maximum earliness, and maximum tardiness (MET-UPMSP). The last two objectives combined are related to the just-in-time (JIT) performance of a solution. Three hybrid algorithms are presented to solve the MET-UPMSP: reactive GRASP with path relinking, a dual-archived memetic algorithm (DAMA), and SPEA2. In order to improve the solution quality, min-max matching is included in the decoding scheme of each algorithm. An experiment is conducted to evaluate the performance of the three algorithms, using 100 (jobs) x 3 (machines) and 200 x 5 problem instances with three combinations of two due date factors – tightness and range. The numerical results indicate that DAMA performs best and GRASP second best for most problem instances in three performance metrics: HVR, GD, and Spread. The experimental results also show that incorporating min-max matching into the decoding scheme significantly improves the solution quality for the two population-based algorithms. It is worth noting that the solutions produced by DAMA with matching decoding can be used as a benchmark to evaluate the performance of other algorithms.

  19. Hybrid soft-lithography/laser machined microchips for the parallel generation of droplets.

    Science.gov (United States)

    Muluneh, M; Issadore, D

    2013-12-21

    Microfluidic chips have been developed to generate droplets and microparticles with control over size, shape, and composition not possible using conventional methods. However, it has remained a challenge to scale-up production for practical applications due to the inherently limited throughput of micro-scale devices. To address this problem, we have developed a self-contained microchip that integrates many (N = 512) micro-scale droplet makers. This 3 × 3 cm(2) PDMS microchip consists of a two-dimensional array of 32 × 16 flow-focusing droplet makers, a network of flow channels that connect them, and only two inputs and one output. The key innovation of this technology is the hybrid use of both soft-lithography and direct laser-micromachining. The microscale resolution of soft lithography is used to fabricate flow-focusing droplet makers that can produce small and precisely defined droplets. Deeply engraved (h ≈ 500 μm) laser-machined channels are utilized to supply each of the droplet makers with its oil phase, aqueous phase, and access to an output channel. The engraved channels' low hydrodynamic resistance ensures that each droplet maker is driven with the same flow rates for highly uniform droplet formation. To demonstrate the utility of this approach, water droplets (d ≈ 80 μm) were generated in hexadecane on both 8 × 1 and 32 × 16 geometries.

  20. A Cross-platform Parallel Painting Algorithm for Massive Waveform Data

    Institute of Scientific and Technical Information of China (English)

    桂勋; 姚兰; 钱清泉

    2009-01-01

    To address the low efficiency and slow response of current third-party power transient data analysis software when rendering massive waveform data, a cross-platform parallel rendering algorithm for massive COMTRADE waveform data is proposed, building on multi-core parallel computing technology. Based on an analysis of the internal relationships in a traditional serial rendering system, the algorithm introduces a new structure based on parallel rendering: the original single layer is split into a waveform layer and a user-control layer, the waveform layer is drawn in parallel, and the final image is produced by compositing the layers. By experimentally analyzing the available graphics technologies under Windows and UNIX, the most suitable cross-platform combination for parallel rendering of massive waveform data was identified: the "QImage + QPainter" mode. Each step of the parallel rendering algorithm is described in detail in combination with the cross-platform thread library Pthreads; workload-sharing formulas that let the channel-rendering threads run with balanced load are given, together with pseudocode for the channel-rendering threads and the layer-composition algorithm. Experiments show that the proposed parallel rendering algorithm achieves a large speedup, and that the speedup scales linearly as the rendering workload and the number of CPU cores increase.
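
    The load-balancing idea (sharing the rendering work evenly among channel-rendering threads) can be approximated with a simple greedy partition; the cost measure and names below are illustrative, and the paper's exact workload-sharing formulas are not reproduced here.

        def split_channels(render_costs, n_threads):
            """Assign each waveform channel to the currently least-loaded
            thread. render_costs: estimated cost per channel, e.g. its
            sample count; each thread renders its channels into one layer."""
            buckets = [[] for _ in range(n_threads)]
            totals = [0] * n_threads
            for ch in sorted(range(len(render_costs)),
                             key=lambda c: -render_costs[c]):
                t = totals.index(min(totals))    # least-loaded thread so far
                buckets[t].append(ch)
                totals[t] += render_costs[ch]
            return buckets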

  1. Massively parallel and linear-scaling algorithm for second-order Møller-Plesset perturbation theory applied to the study of supramolecular wires

    Science.gov (United States)

    Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul

    2017-03-01

    We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI+X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.

  2. Protruding microgripper with force amplification and parallel jaw motion for in-situ sample manipulation in sem and fib-machines

    NARCIS (Netherlands)

    Krijnen, Gijsbertus J.M.; Haanstra, R.P.; Haanstra, R.; Potters, E.; Potters, D.R.; Berenschot, Johan W.; Von Harrach, S.; Elwenspoek, Michael Curt

    2003-01-01

    We report on, to our best knowledge the first, protruding electrostatic microgripper with force amplification and parallel jaw motion for in-situ manipulation of sub-micrometer thick membranes in combined Scanning Electron Microscopy (SEM) / Focussed Ion Beam (FIB) machines. The gripper is used

  3. The PVM (Parallel Virtual Machine) system: Supercomputer level concurrent computation on a network of IBM RS/6000 power stations

    Energy Technology Data Exchange (ETDEWEB)

    Sunderam, V.S. (Emory Univ., Atlanta, GA (USA). Dept. of Mathematics and Computer Science); Geist, G.A. (Oak Ridge National Lab., TN (USA))

    1991-01-01

    The PVM (Parallel Virtual Machine) system enables supercomputer level concurrent computations to be performed on interconnected networks of heterogeneous computer systems. Specifically, a network of 13 IBM RS/6000 powerstations has been successfully used to execute production quality runs of superconductor modeling codes at more than 250 Mflops. This work demonstrates the effectiveness of cooperative concurrent processing for high performance applications, and shows that supercomputer level computations may be attained at a fraction of the cost on distributed computing platforms. This paper describes the PVM programming environment and user facilities, as they apply to hardware platforms comprising a network of IBM RS/6000 powerstations. The salient design features of PVM will be discussed, including heterogeneity, scalability, multilanguage support, provisions for fault tolerance, the use of multiprocessors and scalar machines, an interactive graphical front end, and support for profiling, tracing, and visual analysis. The PVM system has been used extensively, and a range of production quality concurrent applications have been successfully executed using PVM on a variety of networked platforms. The paper will mention representative examples, and discuss two in detail. The first is a material sciences problem that was originally developed on a Cray 2. This application code calculates the electronic structure of metallic alloys from first principles and is based on the KKR-CPA algorithm. The second is a molecular dynamics simulation for calculating materials properties. Performance results for both applications on networks of RS/6000 powerstations will be presented, and accompanied by discussions of the other advantages of PVM and its potential as a complement or alternative to conventional supercomputers.

  4. Predicting the Parallel File System Performance via Machine Learning

    Institute of Scientific and Technical Information of China (English)

    赵铁柱; 董守斌; Verdi March; Simon See

    2011-01-01

    Parallel file systems can effectively solve the problems of massive data storage and I/O bottlenecks in high performance computing systems. Because the factors that influence system performance are highly complex, how to evaluate and predict the performance of a parallel file system becomes a potential challenge and hot topic. In this work, we aim to research the performance evaluation and prediction of parallel file systems. After studying the architecture and performance factors of such file systems, we design a predictive model of parallel file systems based on machine learning approaches. We use feature selection algorithms to reduce the number of performance factors to be tested in validating the performance, and mine the particular relationship between system performance and impact factors to predict the performance of a specific file system. We validate and predict the performance of a specific Lustre file system through a series of experiment cases. Our evaluation and experiment results indicate that threads/OST, the number of OSSs (Object Storage Servers), the number of disks, and the number and type of RAID are the four most important parameters for tuning the performance; the average relative error of the predictions can be kept between 25.1% and 32.1%, showing good prediction accuracy.
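
    As a rough illustration of the approach (feature selection followed by a learned performance model), here is a hedged scikit-learn sketch; the factor names echo the four parameters found most important above, while the choice of SelectKBest and a random-forest regressor is an assumption, not the paper's exact method.

        from sklearn.ensemble import RandomForestRegressor
        from sklearn.feature_selection import SelectKBest, f_regression

        FACTORS = ["threads_per_ost", "num_oss", "num_disks", "raid_type"]

        def fit_predictor(X, y, k=4):
            # keep only the k most informative performance factors
            selector = SelectKBest(f_regression, k=k).fit(X, y)
            model = RandomForestRegressor(n_estimators=200, random_state=0)
            model.fit(selector.transform(X), y)
            return selector, model

        def predict_throughput(selector, model, X_new):
            return model.predict(selector.transform(X_new))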

  5. Message passing interface and multithreading hybrid for parallel molecular docking of large databases on petascale high performance computing machines.

    Science.gov (United States)

    Zhang, Xiaohua; Wong, Sergio E; Lightstone, Felice C

    2013-04-30

    A mixed parallel scheme that combines message passing interface (MPI) and multithreading was implemented in the AutoDock Vina molecular docking program. The resulting program, named VinaLC, was tested on the petascale high performance computing (HPC) machines at Lawrence Livermore National Laboratory. To exploit the typical cluster-type supercomputers, thousands of docking calculations were dispatched by the master process to run simultaneously on thousands of slave processes, where each docking calculation takes one slave process on one node, and within the node each docking calculation runs via multithreading on multiple CPU cores and shared memory. Input and output of the program and the data handling within the program were carefully designed to deal with large databases and ultimately achieve HPC on a large number of CPU cores. Parallel performance analysis of the VinaLC program shows that the code scales up to more than 15K CPUs with a very low overhead cost of 3.94%. One million flexible compound docking calculations took only 1.4 h to finish on about 15K CPUs. The docking accuracy of VinaLC has been validated against the DUD data set by the re-docking of X-ray ligands and an enrichment study, 64.4% of the top scoring poses have RMSD values under 2.0 Å. The program has been demonstrated to have good enrichment performance on 70% of the targets in the DUD data set. An analysis of the enrichment factors calculated at various percentages of the screening database indicates VinaLC has very good early recovery of actives.
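
    The master/slave dispatch pattern described above can be sketched with mpi4py as follows; run_docking is a stand-in for one multithreaded docking calculation, and all names are illustrative rather than VinaLC's actual code.

        from mpi4py import MPI

        def run_docking(task):
            return -0.01 * task              # placeholder score, not real docking

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        N_TASKS = 1000

        if rank == 0:                        # master: hand out tasks, gather results
            tasks = list(range(N_TASKS))
            for dst in range(1, size):       # seed every slave with one task
                comm.send(tasks.pop() if tasks else None, dest=dst)
            for _ in range(N_TASKS):
                res = comm.recv(source=MPI.ANY_SOURCE)
                comm.send(tasks.pop() if tasks else None, dest=res["rank"])
        else:                                # slave: work until told to stop
            while (task := comm.recv(source=0)) is not None:
                comm.send({"rank": rank, "task": task,
                           "score": run_docking(task)}, dest=0)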

  6. Adaptation of a Multi-Block Structured Solver for Effective Use in a Hybrid CPU/GPU Massively Parallel Environment

    Science.gov (United States)

    Gutzwiller, David; Gontier, Mathieu; Demeulenaere, Alain

    2014-11-01

    Multi-Block structured solvers hold many advantages over their unstructured counterparts, such as a smaller memory footprint and efficient serial performance. Historically, multi-block structured solvers have not been easily adapted for use in a High Performance Computing (HPC) environment, and the recent trend towards hybrid GPU/CPU architectures has further complicated the situation. This paper will elaborate on developments and innovations applied to the NUMECA FINE/Turbo solver that have allowed near-linear scalability with real-world problems on over 250 hybrid GPU/CPU cluster nodes. Discussion will focus on the implementation of virtual partitioning and load balancing algorithms using a novel meta-block concept. This implementation is transparent to the user, allowing all pre- and post-processing steps to be performed using a simple, unpartitioned grid topology. Additional discussion will elaborate on developments that have improved parallel performance, including fully parallel I/O with the ADIOS API and the GPU porting of the computationally heavy CPUBooster convergence acceleration module.

  7. Rapid profiling of the antigen regions recognized by serum antibodies using massively parallel sequencing of antigen-specific libraries.

    KAUST Repository

    Domina, Maria

    2014-12-04

    There is a need for techniques capable of identifying the antigenic epitopes targeted by polyclonal antibody responses during deliberate or natural immunization. Although successful, traditional phage library screening is laborious and can map only some of the epitopes. To accelerate and improve epitope identification, we have employed massive sequencing of phage-displayed antigen-specific libraries using the Illumina MiSeq platform. This enabled us to precisely identify the regions of a model antigen, the meningococcal NadA virulence factor, targeted by serum antibodies in vaccinated individuals and to rank hundreds of antigenic fragments according to their immunoreactivity. We found that next generation sequencing can significantly empower the analysis of antigen-specific libraries by allowing simultaneous processing of dozens of library/serum combinations in less than two days, including the time required for antibody-mediated library selection. Moreover, compared with traditional plaque picking, the new technology (named Phage-based Representation OF Immuno-Ligand Epitope Repertoire or PROFILER) provides superior resolution in epitope identification. PROFILER seems ideally suited to streamline and guide rational antigen design, adjuvant selection, and quality control of newly produced vaccines. Furthermore, this method is also likely to find important applications in other fields covered by traditional quantitative serology.

  8. Rapid profiling of the antigen regions recognized by serum antibodies using massively parallel sequencing of antigen-specific libraries.

    Directory of Open Access Journals (Sweden)

    Maria Domina

    Full Text Available There is a need for techniques capable of identifying the antigenic epitopes targeted by polyclonal antibody responses during deliberate or natural immunization. Although successful, traditional phage library screening is laborious and can map only some of the epitopes. To accelerate and improve epitope identification, we have employed massive sequencing of phage-displayed antigen-specific libraries using the Illumina MiSeq platform. This enabled us to precisely identify the regions of a model antigen, the meningococcal NadA virulence factor, targeted by serum antibodies in vaccinated individuals and to rank hundreds of antigenic fragments according to their immunoreactivity. We found that next generation sequencing can significantly empower the analysis of antigen-specific libraries by allowing simultaneous processing of dozens of library/serum combinations in less than two days, including the time required for antibody-mediated library selection. Moreover, compared with traditional plaque picking, the new technology (named Phage-based Representation OF Immuno-Ligand Epitope Repertoire or PROFILER) provides superior resolution in epitope identification. PROFILER seems ideally suited to streamline and guide rational antigen design, adjuvant selection, and quality control of newly produced vaccines. Furthermore, this method is also likely to find important applications in other fields covered by traditional quantitative serology.

  9. Parallel Algorithm for GPU Processing; for use in High Speed Machine Vision Sensing of Cotton Lint Trash

    Directory of Open Access Journals (Sweden)

    Mathew G. Pelletier

    2008-02-01

    Full Text Available One of the main hurdles standing in the way of optimal cleaning of cotton lint is the lack of sensing systems that can react fast enough to provide the control system with real-time information as to the level of trash contamination of the cotton lint. This research examines the use of programmable graphic processing units (GPU) as an alternative to the PC's traditional use of the central processing unit (CPU). The use of the GPU, as an alternative computation platform, allowed for the machine vision system to gain a significant improvement in processing time. By improving the processing time, this research seeks to address the lack of availability of rapid trash sensing systems and thus alleviate a situation in which the current systems view the cotton lint either well before, or after, the cotton is cleaned. This extended lag/lead time that is currently imposed on the cotton trash cleaning control systems is what is responsible for system operators utilizing a very large dead-band safety buffer in order to ensure that the cotton lint is not under-cleaned. Unfortunately, the utilization of a large dead-band buffer results in the majority of the cotton lint being over-cleaned, which in turn causes lint fiber-damage as well as significant losses of the valuable lint due to the excessive use of cleaning machinery. This research estimates that upwards of a 30% reduction in lint loss could be gained through the use of a trash sensor tightly coupled to the cleaning machinery control systems. This research seeks to improve processing times through the development of a new algorithm for cotton trash sensing that allows for implementation on a highly parallel architecture. Additionally, by moving the new parallel algorithm onto an alternative computing platform, the graphic processing unit "GPU", for processing of the cotton trash images, a speed-up of over 6.5 times over optimized code running on the PC's central processing unit was achieved.

  10. Partition-of-unity finite-element method for large scale quantum molecular dynamics on massively parallel computational platforms

    Energy Technology Data Exchange (ETDEWEB)

    Pask, J E; Sukumar, N; Guney, M; Hu, W

    2011-02-28

    Over the course of the past two decades, quantum mechanical calculations have emerged as a key component of modern materials research. However, the solution of the required quantum mechanical equations is a formidable task and this has severely limited the range of materials systems which can be investigated by such accurate, quantum mechanical means. The current state of the art for large-scale quantum simulations is the planewave (PW) method, as implemented in the now-ubiquitous VASP, ABINIT, and QBox codes, among many others. However, since the PW method uses a global Fourier basis, with strictly uniform resolution at all points in space, and in which every basis function overlaps every other at every point, it suffers from substantial inefficiencies in calculations involving atoms with localized states, such as first-row and transition-metal atoms, and requires substantial nonlocal communications in parallel implementations, placing critical limits on scalability. In recent years, real-space methods such as finite-differences (FD) and finite-elements (FE) have been developed to address these deficiencies by reformulating the required quantum mechanical equations in a strictly local representation. However, while addressing both resolution and parallel-communications problems, such local real-space approaches have been plagued by one key disadvantage relative to planewaves: excessive degrees of freedom (grid points, basis functions) needed to achieve the required accuracies. And so, despite critical limitations, the PW method remains the standard today. In this work, we show for the first time that this key remaining disadvantage of real-space methods can in fact be overcome: by building known atomic physics into the solution process using modern partition-of-unity (PU) techniques in finite element analysis. Indeed, our results show order-of-magnitude reductions in basis size relative to state-of-the-art planewave based methods. The method developed here is
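
    For context, a generic partition-of-unity enrichment takes the form (a textbook statement of the PU idea, not necessarily the paper's exact construction)

    $$u_h(\mathbf{x}) = \sum_i \varphi_i(\mathbf{x}) \Bigl( \sum_j c_{ij}\, p_j(\mathbf{x}) + \sum_k d_{ik}\, \psi_k(\mathbf{x}) \Bigr), \qquad \sum_i \varphi_i(\mathbf{x}) = 1,$$

    where the $\varphi_i$ are the finite-element partition-of-unity functions, the $p_j$ are polynomial basis functions, and the $\psi_k$ are known atomic-like orbitals built into the basis; the partition-of-unity condition is what allows the known atomic functions to be reproduced exactly over the mesh.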

  11. The Parallel C Preprocessor

    Directory of Open Access Journals (Sweden)

    Eugene D. Brooks III

    1992-01-01

    Full Text Available We describe a parallel extension of the C programming language designed for multiprocessors that provide a facility for sharing memory between processors. The programming model was initially developed on conventional shared memory machines with small processor counts such as the Sequent Balance and Alliant FX/8, but has more recently been used on a scalable massively parallel machine, the BBN TC2000. The programming model is split-join rather than fork-join. Concurrency is exploited to use a fixed number of processors more efficiently rather than to exploit more processors as in the fork-join model. Team splitting, a mechanism to split the team of processors executing a code into subteams to handle parallel subtasks, is used to provide an efficient mechanism to exploit nested concurrency. We have found the split-join programming model to have an inherent implementation advantage, compared to the fork-join model, when the number of processors in a machine becomes large.

  12. RH 1.5D: a massively parallel code for multi-level radiative transfer with partial frequency redistribution and Zeeman polarisation

    Science.gov (United States)

    Pereira, Tiago M. D.; Uitenbroek, Han

    2015-02-01

    The emergence of three-dimensional magneto-hydrodynamic simulations of stellar atmospheres has sparked a need for efficient radiative transfer codes to calculate detailed synthetic spectra. We present RH 1.5D, a massively parallel code based on the RH code and capable of performing Zeeman polarised multi-level non-local thermodynamical equilibrium calculations with partial frequency redistribution for an arbitrary number of chemical species. The code calculates spectra from 3D, 2D or 1D atmospheric models on a column-by-column basis (or 1.5D). While the 1.5D approximation breaks down in the cores of very strong lines in an inhomogeneous environment, it is nevertheless suitable for a large range of scenarios and allows for faster convergence with finer control over the iteration of each simulation column. The code scales well to at least tens of thousands of CPU cores, and is publicly available. In the present work we briefly describe its inner workings, strategies for convergence optimisation, its parallelism, and some possible applications.

  13. Research in Parallel Algorithms and Software for Computational Aerosciences

    Science.gov (United States)

    Domel, Neal D.

    1996-01-01

    Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  14. A two-stage flexible flow-shop scheduling problem with m identical parallel machines on one stage and a batch processor on the other stage

    Institute of Scientific and Technical Information of China (English)

    HE Long-min; SUN Shi-jie; CHENG Ming-bao

    2008-01-01

    This paper considers a hybrid two-stage flow-shop scheduling problem with m identical parallel machines on one stage and a batch processor on the other stage. The processing time of job Jj on any of the m identical parallel machines is aj≡a (j∈N), and the processing time of job Jj is bj (j∈N) on the batch processor M. We take the makespan (Cmax) as our minimization objective. In this paper, for the problem FSMP-BI (m identical parallel machines on the first stage and a batch processor on the second stage), based on the algorithm given by Sung and Choung for the problem 1 | rj, B | Cmax under the constraint of a given processing sequence, we develop an optimal dynamic programming algorithm H1 that runs in max{O(n log n), O(nB)} time. A symmetric algorithm H2 with the same max{O(n log n), O(nB)} time bound is then given for the problem BI-FSMP (a batch processor on the first stage and m identical parallel machines on the second stage).

  15. Guided ultrasonic waves in the cylindrical layer-substrate structures. Application to the control of the massive machine elements with cylindrical cavities

    Science.gov (United States)

    El Ouahdani, M.; Sidki, M.; Ramdani, A.

    2005-12-01

    This paper presents a study of ultrasonic wave propagation in a cylindrical layer-substrate structure of infinite length. We determine the dispersion curves of the structure, the displacement field in the structure, and the impact of the contact quality between the layer and the substrate. The industrial application targeted by our study is the control of massive machine elements with cylindrical cavities that are coated and exposed to corrosion. The obtained results show that some modes of propagation are insensitive to the layer thickness; these modes can therefore be generated during the ultrasonic control of the layer. Moreover, for the dimensions considered here, the second mode of propagation is the best adapted for the detection of defects in the vicinity of the internal layer wall. Finally, our study shows the possibility of characterizing the quality of contact between the layer and the substrate from the analysis of the dispersion curves of the structure.

  16. Implementation of a flexible and scalable particle-in-cell method for massively parallel computations in the mantle convection code ASPECT

    Science.gov (United States)

    Gassmöller, Rene; Bangerth, Wolfgang

    2016-04-01

    Particle-in-cell methods have a long history and many applications in geodynamic modelling of mantle convection, lithospheric deformation and crustal dynamics. They are primarily used to track material information, the strain a material has undergone, the pressure-temperature history a certain material region has experienced, or the amount of volatiles or partial melt present in a region. However, their efficient parallel implementation - in particular combined with adaptive finite-element meshes - is complicated due to the complex communication patterns and frequent reassignment of particles to cells. Consequently, many current scientific software packages accomplish this efficient implementation by specifically designing particle methods for a single purpose, like the advection of scalar material properties that do not evolve over time (e.g., for chemical heterogeneities). Design choices for particle integration, data storage, and parallel communication are then optimized for this single purpose, making the code relatively rigid to changing requirements. Here, we present the implementation of a flexible, scalable and efficient particle-in-cell method for massively parallel finite-element codes with adaptively changing meshes. Using a modular plugin structure, we allow maximum flexibility of the generation of particles, the carried tracer properties, the advection and output algorithms, and the projection of properties to the finite-element mesh. We present scaling tests ranging up to tens of thousands of cores and tens of billions of particles. Additionally, we discuss efficient load-balancing strategies for particles in adaptive meshes with their strengths and weaknesses, local particle-transfer between parallel subdomains utilizing existing communication patterns from the finite element mesh, and the use of established parallel output algorithms like the HDF5 library. Finally, we show some relevant particle application cases, compare our implementation to a

  17. Massively parallel approach to time-domain forward and inverse modelling of EM induction problem in spherical Earth

    Science.gov (United States)

    Velimsky, J.

    2011-12-01

    Inversion of observatory and low-orbit satellite geomagnetic data in terms of the three-dimensional distribution of electrical conductivity in the Earth's mantle can provide an independent constraint on the physical, chemical, and mineralogical composition of the Earth's mantle. This problem has been recently approached by different numerical methods. There are several key challenges from the numerical and algorithmic point of view, in particular the accuracy and speed of the forward solver, the effective evaluation of sensitivities of data to changes of model parameters, and the dependence of results on the a-priori knowledge of the spatio-temporal structure of the primary ionospheric and magnetospheric electric currents. Here I present recent advancements of the time-domain, spherical harmonic-finite element approach. The forward solver has been adapted to distributed-memory parallel architecture using band-matrix routines from the ScaLapack library. The evaluation of the gradient of the data misfit in the model space using the adjoint approach has also been parallelized. Finally, the inverse problem has been reformulated in a way which allows for simultaneous reconstruction of the conductivity model and the external field model directly from the data.

  18. Massively parallel signal processing using the graphics processing unit for real-time brain-computer interface feature extraction

    Directory of Open Access Journals (Sweden)

    J. Adam Wilson

    2009-07-01

    Full Text Available The clock speeds of modern computer processors have nearly plateaued in the past five years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card (GPU) was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally-intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a CPU-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
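
    The first two steps of the signal chain (spatial filtering and auto-regressive spectral estimation) can be sketched on the CPU with NumPy as below; the paper maps the same linear algebra onto CUDA kernels, and this simplified Yule-Walker form is an assumption standing in for the paper's exact AR method.

        import numpy as np

        def spatial_filter(data, W):
            """data: (channels, samples) raw signals; W: (out, channels)."""
            return W @ data                       # matrix-matrix multiply

        def ar_psd(signal, order=6, n_freqs=64):
            """Yule-Walker AR power spectrum of one channel."""
            x = signal - signal.mean()
            n = len(x)
            r = np.correlate(x, x, "full")[n - 1:n + order] / n   # lags 0..order
            R = np.array([[r[abs(i - j)] for j in range(order)]
                          for i in range(order)])
            a = np.linalg.solve(R, r[1:order + 1])                # AR coefficients
            sigma2 = r[0] - a @ r[1:order + 1]                    # noise variance
            w = np.linspace(0, np.pi, n_freqs)
            denom = np.abs(1 - np.exp(-1j * np.outer(w, np.arange(1, order + 1)))
                           @ a) ** 2
            return sigma2 / denom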

  19. Massively Parallel Signal Processing using the Graphics Processing Unit for Real-Time Brain-Computer Interface Feature Extraction.

    Science.gov (United States)

    Wilson, J Adam; Williams, Justin C

    2009-01-01

    The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
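
    The two stages that were offloaded to the GPU in this study are easy to state in array terms. The NumPy sketch below shows the spatial filter as a matrix-matrix product and a per-channel autoregressive power spectral density estimated from the Yule-Walker equations; the channel count, filter weights and AR model order are assumptions for the example, and the actual system used CUDA kernels rather than NumPy.

```python
# Plain-NumPy sketch of the two GPU-accelerated stages: (1) spatial filtering
# as a matrix-matrix multiply, (2) per-channel AR power spectral density via
# the Yule-Walker equations. Sizes and weights are illustrative.
import numpy as np

def ar_psd_yule_walker(x, order=16, n_freq=128):
    """AR power spectral density of one channel via Yule-Walker equations."""
    x = x - x.mean()
    acov = np.correlate(x, x, mode="full")[len(x) - 1:] / len(x)
    R = np.array([[acov[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, acov[1:order + 1])         # AR coefficients
    sigma2 = acov[0] - a @ acov[1:order + 1]          # innovation variance
    w = np.linspace(0.0, np.pi, n_freq)
    denom = np.abs(1.0 - np.exp(-1j * np.outer(w, np.arange(1, order + 1))) @ a)
    return sigma2 / denom**2

rng = np.random.default_rng(1)
raw = rng.standard_normal((64, 250))        # 64 channels x 250 samples
W = rng.standard_normal((64, 64))           # spatial filter weights
filtered = W @ raw                          # stage 1: matrix-matrix multiply
psd = np.array([ar_psd_yule_walker(ch) for ch in filtered])  # stage 2
```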

  20. The Single Machine Parallel Batch Scheduling Problem with Job Compatibility Constraints

    Institute of Scientific and Technical Information of China (English)

    张群发; 林诒勋

    2007-01-01

    The single machine parallel batch scheduling problem with job compatibility constraints is considered with the objective of minimizing the makespan, where the compatibility constraints are represented by a graph G. This problem is proved to be NP-hard. When G is restricted to a general bipartite graph, a complete bipartite graph, or a complete m-partite graph, the problem is solved in polynomial time in each case.
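
    To make the problem setting concrete, the following brute-force sketch (not the paper's algorithm) assumes unbounded batch capacity: jobs may share a batch only if they are pairwise compatible in G, a batch takes as long as its longest job, and the makespan is the sum of the batch times. It is exponential in the number of jobs; the paper's contribution is polynomial algorithms for the special graph classes above.

```python
# Tiny exhaustive solver for single-machine parallel batching with a
# compatibility graph. Assumes unbounded batch capacity; purely illustrative.
from itertools import combinations

def partitions(items):
    """Yield all set partitions of a list."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def min_makespan(times, compatible):
    jobs = list(range(len(times)))
    best = float("inf")
    for batches in partitions(jobs):
        # every batch must be a clique in the compatibility graph
        if all(compatible[i][j] for b in batches for i, j in combinations(b, 2)):
            best = min(best, sum(max(times[j] for j in b) for b in batches))
    return best

times = [3, 5, 2, 4]
compatible = [[1, 1, 0, 1],   # job 0 is incompatible with job 2
              [1, 1, 1, 1],
              [0, 1, 1, 1],
              [1, 1, 1, 1]]
print(min_makespan(times, compatible))   # 7: batches {0,1,3} and {2}
```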

  1. Parallel Genetic Algorithm for Solving the Identical Parallel Machine Scheduling Problem with Constraints

    Institute of Scientific and Technical Information of China (English)

    吴昊; 程锦松

    2001-01-01

    Genetic algorithms (GAs) are global numerical optimization methods with an inherently parallel character. In this paper, we present a master-slave parallel genetic algorithm for the identical parallel machine scheduling problem with constraints and implement it under the PVM environment. The computational results show that the parallel genetic algorithm is efficient and well suited to large-scale identical parallel machine scheduling problems.
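
    A minimal sketch of the master-slave scheme, assuming Python's multiprocessing pool in place of PVM slaves: the master performs selection, crossover and mutation, while fitness evaluations (here, the makespan of a job-to-machine assignment) are distributed to worker processes. The population size, rates and job data are illustrative, not from the paper.

```python
# Master-slave parallel GA sketch: the master applies selection, crossover
# and mutation; fitness evaluations are farmed out to worker processes,
# which play the role of the PVM slaves in the paper.
import random
from multiprocessing import Pool

_rng = random.Random(0)
JOBS = [_rng.randint(1, 20) for _ in range(40)]   # processing times
MACHINES = 5

def makespan(assign):
    """Fitness: makespan of a job-to-machine assignment (lower is better)."""
    loads = [0] * MACHINES
    for t, m in zip(JOBS, assign):
        loads[m] += t
    return max(loads)

def evolve(pop_size=60, generations=100):
    rng = random.Random(1)
    pop = [[rng.randrange(MACHINES) for _ in JOBS] for _ in range(pop_size)]
    with Pool() as workers:                        # the "slaves"
        for _ in range(generations):
            fits = workers.map(makespan, pop)      # parallel evaluation
            ranked = [p for _, p in sorted(zip(fits, pop))]
            parents = ranked[:pop_size // 2]       # truncation selection
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, len(JOBS))  # one-point crossover
                child = a[:cut] + b[cut:]
                if rng.random() < 0.1:             # mutation
                    child[rng.randrange(len(JOBS))] = rng.randrange(MACHINES)
                children.append(child)
            pop = parents + children
        return min(workers.map(makespan, pop))

if __name__ == "__main__":
    print("best makespan:", evolve())
```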

  2. JASMIN-based Massive Parallel Computing of Large Scale Groundwater Flow

    Institute of Scientific and Technical Information of China (English)

    程汤培; 莫则尧; 邵景力

    2013-01-01

    To overcome the prohibitive computational time and memory requirements of simulating groundwater flow models with detailed spatial discretization over long time periods, we present JOGFLOW, an efficient massively parallel program for large-scale groundwater flow simulation. In the program, the three-dimensional transient flow computation of MODFLOW is re-implemented on the JASMIN framework by designing patch-based core algorithms and a communication mechanism based on adding ghost cells to each patch. The accuracy and performance of JOGFLOW are verified by simulating the groundwater flow at the Yanming Lake water source area in Zhongmu County, Zhengzhou, Henan Province. Parallel scalability is tested with a hypothetical groundwater conceptual model with a highly refined grid. Relative to a 32-core run, the parallel efficiency reaches 77.2% on 512 processors and 67.5% on 1024 processors. The numerical results demonstrate the good performance and scalability of JOGFLOW, which can effectively use hundreds to thousands of compute cores and supports massively parallel simulation of groundwater flow models with tens of millions of grid cells.
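
    The ghost-cell communication pattern described above can be illustrated with two patches in a single process standing in for the distributed patches JASMIN manages: each patch carries one extra cell layer, and after every relaxation sweep the neighbouring patches copy their physical boundary columns into each other's ghost layer. Boundary values and patch sizes are illustrative assumptions.

```python
# Two-patch ghost-cell exchange sketch: a Jacobi relaxation of a steady
# head field, with the inter-patch "communication" done by copying boundary
# columns into ghost columns each sweep. Illustrative only.
import numpy as np

nx, ny, steps = 16, 16, 200
left = np.zeros((nx + 2, ny + 2))     # interior plus one ghost layer
right = np.zeros((nx + 2, ny + 2))
left[:, 0] = 100.0                    # fixed head on the west boundary
right[:, -1] = 20.0                   # fixed head on the east boundary

for _ in range(steps):
    # "communication": copy physical boundary columns into ghost columns
    left[:, -1] = right[:, 1]
    right[:, 0] = left[:, -2]
    for patch in (left, right):
        patch[1:-1, 1:-1] = 0.25 * (patch[:-2, 1:-1] + patch[2:, 1:-1] +
                                    patch[1:-1, :-2] + patch[1:-1, 2:])
```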

  3. RH 1.5D: a massively parallel code for multi-level radiative transfer with partial frequency redistribution and Zeeman polarisation

    CERN Document Server

    Pereira, Tiago M D

    2014-01-01

    The emergence of three-dimensional magneto-hydrodynamic (MHD) simulations of stellar atmospheres has sparked a need for efficient radiative transfer codes to calculate detailed synthetic spectra. We present RH 1.5D, a massively parallel code based on the RH code and capable of performing Zeeman polarised multi-level non-local thermodynamical equilibrium (NLTE) calculations with partial frequency redistribution for an arbitrary amount of chemical species. The code calculates spectra from 3D, 2D or 1D atmospheric models on a column-by-column basis (or 1.5D). While the 1.5D approximation breaks down in the cores of very strong lines in an inhomogeneous environment, it is nevertheless suitable for a large range of scenarios and allows for faster convergence with finer control over the iteration of each simulation column. The code scales well to at least tens of thousands of CPU cores, and is publicly available. In the present work we briefly describe its inner workings, strategies for convergence optimisation, it...
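
    The 1.5D approach treats every column of the model atmosphere as an independent 1D problem, which makes the workload embarrassingly parallel. The sketch below distributes columns over local worker processes with Python's multiprocessing; solve_column is a toy placeholder, not RH 1.5D's actual interface, and the array sizes are illustrative.

```python
# Column-by-column (1.5D) parallelization sketch: each column of a 3D model
# atmosphere maps onto an independent task for a worker process.
import numpy as np
from multiprocessing import Pool

def solve_column(column):
    """Toy per-column 'solve': attenuated emission along the column."""
    tau = np.cumsum(column)                  # toy optical-depth scale
    return float(np.sum(np.exp(-tau) * column))

if __name__ == "__main__":
    atmosphere = np.random.default_rng(2).random((64, 32, 32))  # (depth, x, y)
    columns = [atmosphere[:, i, j] for i in range(32) for j in range(32)]
    with Pool() as pool:
        intensities = pool.map(solve_column, columns)  # one task per column
    print(len(intensities), "columns solved")
```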

  4. HLA-G variability and haplotypes detected by massively parallel sequencing procedures in geographically distinct population samples of Brazil and Cyprus.

    Science.gov (United States)

    Castelli, Erick C; Gerasimou, Petroula; Paz, Michelle A; Ramalho, Jaqueline; Porto, Iane O P; Lima, Thálitta H A; Souza, Andréia S; Veiga-Castelli, Luciana C; Collares, Cristhianna V A; Donadi, Eduardo A; Mendes-Junior, Celso T; Costeas, Paul

    2017-03-01

    The HLA-G molecule presents immunomodulatory properties that might inhibit immune responses when interacting with specific Natural Killer and T cell receptors, such as KIR2DL4, ILT2 and ILT4. Thus, HLA-G might influence the outcome of situations in which fine immune system modulation is required, such as autoimmune diseases, transplants, cancer and pregnancy. The majority of studies regarding HLA-G gene variability have so far been restricted to a specific gene segment (i.e., the promoter, coding or 3' untranslated region) and were performed using Sanger sequencing and probabilistic models to infer haplotypes. Here we propose a massively parallel sequencing (NGS) approach, combined with a bioinformatics strategy, to evaluate the entire HLA-G regulatory and coding segments, with haplotypes inferred relying more on the straightforward haplotyping capabilities of NGS and less on probabilistic models. HLA-G variability was then surveyed in two admixed population samples from distinct geographical regions and demographic backgrounds, Cyprus and Brazil. Most haplotypes (promoter, coding, 3'UTR and extended ones) were detected in both Brazil and Cyprus and were identical to the ones already described by probabilistic models, indicating that these haplotypes are quite old and may be present worldwide. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. Sequencing the hypervariable regions of human mitochondrial DNA using massively parallel sequencing: Enhanced data acquisition for DNA samples encountered in forensic testing.

    Science.gov (United States)

    Davis, Carey; Peters, Dixie; Warshauer, David; King, Jonathan; Budowle, Bruce

    2015-03-01

    Mitochondrial DNA testing is a useful tool in the analysis of forensic biological evidence. In cases where nuclear DNA is damaged or limited in quantity, the higher copy number of mitochondrial genomes available in a sample can provide information about the source of the sample. Currently, Sanger-type sequencing (STS) is the primary method used to develop mitochondrial DNA profiles. This method is laborious and time-consuming. Massively parallel sequencing (MPS) can increase the amount of information obtained from mitochondrial DNA samples while improving turnaround time, by decreasing the number of manipulations and, more so, by exploiting high-throughput analyses to obtain interpretable results. In this study, 18 buccal swabs, three different tissue samples from five individuals, and four bone samples from casework were sequenced at hypervariable regions I and II using STS and MPS. Sample enrichment for STS and MPS was PCR-based. Library preparation for MPS was performed using the Nextera® XT DNA Sample Preparation Kit, and sequencing was performed on the MiSeq™ (Illumina, Inc.). MPS yielded full concordance of base calls with STS results, and the newer methodology was able to resolve length heteroplasmy in homopolymeric regions. This study demonstrates that short-amplicon MPS of mitochondrial DNA is feasible, can provide information not obtainable with STS, and lays the groundwork for development of a whole-genome sequencing strategy for degraded samples.

  6. Massively Parallel Sequencing (MPS) of Cell-Free Fetal DNA (cffDNA) for Trisomies 21, 18, and 13 in Twin Pregnancies.

    Science.gov (United States)

    Du, Erqiu; Feng, Chun; Cao, Yuming; Yao, Yanru; Lu, Jing; Zhang, Yuanzhen

    2017-06-01

    Massively parallel sequencing (MPS) technology has become increasingly available and has been widely used to screen for trisomies 21, 18, and 13 in singleton pregnancies. This study assessed the performance of MPS testing of cell-free fetal DNA (cffDNA) from maternal plasma for trisomies 21, 18, and 13 in twin pregnancies. Ninety-two women with twin pregnancies were recruited. The outcomes were established through amniocentesis karyotypes or through clinical examination and follow-up of the neonates. Fluorescence in-situ hybridization (FISH) was used to examine the placentas postnatally in cases of false-positive results. Fetuses with autosomal trisomy 21 (n = 2) and trisomy 15 (n = 1) were successfully detected via MPS testing of cffDNA. There was one false-positive result for trisomy 13 (n = 1), and FISH identified confined placental mosaicism in this case. For twin pregnancies undergoing second-trimester screening for trisomy, MPS testing of cffDNA is feasible and can enhance the diagnostic spectrum of non-invasive prenatal testing, which could effectively reduce the use of invasive prenatal diagnostic methods. In addition to screening for trisomies 21, 18, and 13 by cffDNA, MPS can detect additional fetal autosomal trisomies.

  7. Forensic massively parallel sequencing data analysis tool: Implementation of MyFLq as a standalone web- and Illumina BaseSpace(®)-application.

    Science.gov (United States)

    Van Neste, Christophe; Gansemans, Yannick; De Coninck, Dieter; Van Hoofstat, David; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

    2015-03-01

    Routine use of massively parallel sequencing (MPS) for forensic genomics is on the horizon. In the last few years, several algorithms and workflows have been developed to analyze forensic MPS data. However, none have yet been tailored to the needs of the forensic analyst who does not possess an extensive bioinformatics background. We developed our previously published forensic MPS data analysis framework MyFLq (My-Forensic-Loci-queries) into an open-source, user-friendly, web-based application. It can be installed as a standalone web application, or run directly from the Illumina BaseSpace environment. In the former, laboratories can keep their data on-site, while in the latter, data from forensic samples that are sequenced on an Illumina sequencer can be uploaded to BaseSpace during acquisition and subsequently analyzed using the published MyFLq BaseSpace application. Additional features were implemented, such as an interactive graphical report of the results, an interactive threshold-selection bar, and an allele length-based analysis in addition to the sequence-based analysis. Practical use of the application is demonstrated through the analysis of four 16-plex short tandem repeat (STR) samples, showing the complementarity between the sequence- and length-based analyses of the same MPS data. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  8. Non-invasive prenatal testing using massively parallel sequencing of maternal plasma DNA: from molecular karyotyping to fetal whole-genome sequencing.

    Science.gov (United States)

    Lo, Y M Dennis

    2013-12-01

    The discovery of cell-free fetal DNA in maternal plasma in 1997 has stimulated a rapid development of non-invasive prenatal testing. The recent advent of massively parallel sequencing has allowed the analysis of circulating cell-free fetal DNA to be performed with unprecedented sensitivity and precision. Fetal trisomies 21, 18 and 13 are now robustly detectable in maternal plasma and such analyses have been available clinically since 2011. Fetal genome-wide molecular karyotyping and whole-genome sequencing have now been demonstrated in a number of proof-of-concept studies. Genome-wide and targeted sequencing of maternal plasma has been shown to allow the non-invasive prenatal testing of β-thalassaemia and can potentially be generalized to other monogenic diseases. It is thus expected that plasma DNA-based non-invasive prenatal testing will play an increasingly important role in future obstetric care. It is thus timely and important that the ethical, social and legal issues of non-invasive prenatal testing be discussed actively by all parties involved in prenatal care.

  9. Dielectrophoresis-assisted massively parallel cell pairing and fusion based on field constriction created by a micro-orifice array sheet.

    Science.gov (United States)

    Kimura, Yuji; Gel, Murat; Techaumnat, Boonchai; Oana, Hidehiro; Kotera, Hidetoshi; Washizu, Masao

    2011-09-01

    In this paper, we present a novel electrofusion device that enables massive parallelism, using an electrically insulating sheet having a two-dimensional micro-orifice array. The sheet is sandwiched by a pair of micro-chambers with immersed electrodes, and each chamber is filled with the suspension of one of the two types of cells to be fused. Dielectrophoresis, assisted by sedimentation, is used to position the cells in the upper chamber down onto the orifices; the device is then flipped over to position the cells on the other side, so that cell pairs making contact in the orifices are formed. When a pulse voltage is applied to the electrodes, most of the voltage drop occurs around each orifice and is impressed on the cell membranes in the orifice. This makes it possible to apply a size-independent voltage that fuses the two cells in contact at every orifice in an exclusively 1:1 manner. In the experiment, the cytoplasm of one of the cells is stained with a fluorescent dye, and transfer of the fluorescence to the other cell is used as the indication of fusion events. The two-dimensional orifice arrangement at a pitch of 50 μm realizes simultaneous fusion of 6 × 10³ cells on a 4 mm diameter chip, and a fusion yield of 78-90% is achieved for various sizes and types of cells.

  10. Parallel Computational Fluid Dynamics: Current Status and Future Requirements

    Science.gov (United States)

    Simon, Horst D.; VanDalsem, William R.; Dagum, Leonardo; Kutler, Paul (Technical Monitor)

    1994-01-01

    One of the key objectives of the Applied Research Branch in the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center is the accelerated introduction of highly parallel machines into a full operational environment. In this report we discuss the performance results obtained from the implementation of some computational fluid dynamics (CFD) applications on the Connection Machine CM-2 and the Intel iPSC/860. We summarize some of the experience gained so far with the parallel testbed machines at the NAS Applied Research Branch. Then we discuss the long-term computational requirements for accomplishing some of the grand challenge problems in computational aerosciences. We argue that only massively parallel machines will be able to meet these grand challenge requirements, and we outline the computer science and algorithm research challenges ahead.

  11. Distributed Learning over Massive XML Documents in ELM Feature Space

    Directory of Open Access Journals (Sweden)

    Xin Bi

    2015-01-01

    With the exponentially increasing volume of XML data, centralized learning solutions are unable to meet the requirements of mining applications with massive training samples. In this paper, a solution for distributed learning over massive XML documents is proposed, which provides distributed conversion of XML documents into a representation model in parallel based on MapReduce, and a distributed learning component based on the Extreme Learning Machine for mining tasks of classification or clustering. Within this framework, training samples are converted from raw XML datasets with better efficiency and information representation ability and are fed to distributed learning algorithms in Extreme Learning Machine (ELM) feature space. Extensive experiments are conducted on massive XML document datasets to verify the effectiveness and efficiency for both classification and clustering applications.
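
    A minimal sketch of the Extreme Learning Machine at the heart of the learning component, assuming toy dense features in place of the paper's XML representation model: hidden-layer weights are random and fixed, and only the output weights are fitted, by least squares on the hidden activations.

```python
# Extreme Learning Machine sketch: random fixed hidden layer, output weights
# fitted by least squares. The toy data stands in for XML feature vectors.
import numpy as np

def elm_train(X, T, n_hidden=64, seed=3):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                    # random feature space
    beta = np.linalg.pinv(H) @ T              # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

rng = np.random.default_rng(4)
X = rng.standard_normal((500, 20))            # stand-in for XML feature vectors
labels = (X[:, 0] + X[:, 1] > 0).astype(int)
T = np.eye(2)[labels]                         # one-hot targets
W, b, beta = elm_train(X, T)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
print("training accuracy:", (pred == labels).mean())
```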

  12. Kinematic and dynamic analysis and simulation on parallel kinematics machine tool

    Institute of Scientific and Technical Information of China (English)

    张毅; 牟思惠; 陈怀军

    2011-01-01

    A parallel kinematics machine tool is based on a spatial parallel mechanism: it can perform complex spatial motions, meet multi-degree-of-freedom machining requirements, and machine complex parts. Because the design process of a parallel kinematics machine tool is quite complicated, traditional design methods based on experience and static analysis have limitations, and it is necessary to explore the relevant design theory and study the dynamic characteristics of such machines. Taking a 4-XPXUYYUX parallel kinematics machine tool as an example, this study establishes the mathematical model for kinematic analysis using the influence coefficient method and derives the dynamic model from the Lagrange equations, laying a foundation for further theoretical analysis. A simulation model of the parallel machine tool is then built with the virtual prototyping software ADAMS and used for dynamic analysis, yielding the dynamic characteristics of the machine and thereby improving and optimizing the system design of the parallel kinematics machine tool.
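
    The influence coefficient method relates the platform (end-effector) velocity to the actuated joint rates through a matrix of first-order influence coefficients, q̇ = G(x) ẋ. The generic numerical sketch below builds such a matrix by finite differences of an inverse-kinematics map; the planar two-strut ik() is a hypothetical stand-in, not the kinematics of the 4-XPXUYYUX machine.

```python
# Generic first-order influence coefficient (Jacobian) sketch: map a
# commanded platform velocity to joint rates via dq/dx built by central
# differences of a toy inverse-kinematics function.
import numpy as np

def ik(x):
    """Toy inverse kinematics: extensions of two struts anchored at
    (0,0) and (1,0) reaching a platform point x = (px, py)."""
    a0, a1 = np.array([0.0, 0.0]), np.array([1.0, 0.0])
    return np.array([np.linalg.norm(x - a0), np.linalg.norm(x - a1)])

def influence_matrix(x, h=1e-6):
    """First-order influence coefficients dq/dx by central differences."""
    n = len(x)
    J = np.zeros((len(ik(x)), n))
    for k in range(n):
        dx = np.zeros(n)
        dx[k] = h
        J[:, k] = (ik(x + dx) - ik(x - dx)) / (2 * h)
    return J

x = np.array([0.4, 0.8])              # platform position
x_dot = np.array([0.1, -0.05])        # commanded platform velocity
q_dot = influence_matrix(x) @ x_dot   # required joint rates
print(q_dot)
```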

  13. Development of ballistic hot electron emitter and its applications to parallel processing: active-matrix massive direct-write lithography in vacuum and thin films deposition in solutions

    Science.gov (United States)

    Koshida, N.; Kojima, A.; Ikegami, N.; Suda, R.; Yagi, M.; Shirakashi, J.; Yoshida, T.; Miyaguchi, H.; Muroyama, M.; Nishino, H.; Yoshida, S.; Sugata, M.; Totsu, K.; Esashi, M.

    2015-03-01

    Making the best use of the characteristic features of the nanocrystalline Si (nc-Si) ballistic hot electron source, an alternative lithographic technology is presented based on two approaches: physical excitation in vacuum and chemical reduction in solutions. The nc-Si cold cathode is a kind of metal-insulator-semiconductor (MIS) diode, composed of a thin metal film, an nc-Si layer, an n+-Si substrate, and an ohmic back contact. Under a biased condition, energetic electrons are uniformly and directionally emitted through the thin surface electrodes. In vacuum, this emitter is available for active-matrix-driven massively parallel lithography. Arrayed 100×100 emitters (each size: 10×10 μm2, pitch: 100 μm) are fabricated on a silicon substrate by a conventional planar process, and then every emitter is bonded with an integrated complementary metal-oxide-semiconductor (CMOS) driver using through-silicon-via (TSV) interconnect technology. Electron multi-beams emitted from selected devices are focused by a micro-electro-mechanical system (MEMS) condenser lens array and introduced into an accelerating system with a demagnification factor of 100. The electron accelerating voltage is 5 kV. The designed size of each beam landing on the target is 10×10 nm2 in square. Here we discuss the fabrication process of the emitter array with TSV holes, the implementation of the integrated active-matrix driver circuit, the bonding of these components, the construction of the electron optics, and the overall operation of the exposure system including the correction of possible aberrations. The experimental results of this mask-less parallel pattern transfer are shown in terms of simple 1:1 projection and parallel lithography under an active-matrix drive scheme. Another application is the use of this emitter as an active electrode supplying highly reducing electrons into solutions. A very small amount of metal-salt solution is dripped onto the nc-Si emitter surface, and the emitter is driven without

  14. Parallel plasma fluid turbulence calculations

    Energy Technology Data Exchange (ETDEWEB)

    Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.

    1994-12-31

    The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated.

  15. Parallel inversion of a massive ERT data set to characterize deep vadose zone contamination beneath former nuclear waste infiltration galleries at the Hanford Site B-Complex (Invited)

    Science.gov (United States)

    Johnson, T.; Rucker, D. F.; Wellman, D.

    2013-12-01

    revealed the general footprint of vadose zone contamination beneath infiltration galleries. In 2011, the USDOE commissioned an effort to re-invert the B-Complex ERT data as a whole using a recently developed massively parallel 3D ERT inversion code. The computational mesh included approximately 1.085 million elements and closely honored the 37m of topographic relief as determined by LiDAR imaging. The water table and tank boundaries were also incorporated into the mesh to facilitate regularization disconnects, enabling sharp conductivity contrasts where they occur naturally without penalty. The data were inverted using 1024 processors, requiring 910 Gb of memory and 11.5 hours of computation time. The imaging results revealed previously unrealized detail concerning the distribution and behavior of contaminants migrating through the vadose zone, and are currently being used by site cleanup operators and regulators to understand the origin of a groundwater nitrate plume emerging from one of the infiltration galleries. The results overall demonstrate the utility of high performance computing, unstructured meshing, and custom regularization constraints for optimal processing of massive ERT data sets enabled by modern ERT survey hardware.
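
    The underlying computation is regularized least squares: minimize ||d - Gm||² + λ||Lm||², where zeroing rows of the smoothness operator L across chosen interfaces (the regularization disconnects mentioned above) permits sharp conductivity contrasts there without penalty. The dense 1D toy below stands in for the massively parallel 3D ERT inversion; G, L and λ are illustrative assumptions.

```python
# Tikhonov-regularized least squares with and without a "regularization
# disconnect": cutting the smoothness operator at an interface lets the
# recovered model jump there. A dense 1D toy, not the parallel 3D ERT code.
import numpy as np

rng = np.random.default_rng(5)
n, m = 80, 40
G = rng.standard_normal((n, m))                      # toy sensitivity matrix
m_true = np.where(np.arange(m) < m // 2, 1.0, 3.0)   # sharp contrast at m//2
d = G @ m_true + 0.05 * rng.standard_normal(n)

L = np.eye(m - 1, m) - np.eye(m - 1, m, k=1)         # first-difference smoother
L_cut = L.copy()
L_cut[m // 2 - 1, :] = 0.0                           # disconnect at interface

lam = 10.0
for Lk, name in ((L, "smooth everywhere"), (L_cut, "with disconnect")):
    A = G.T @ G + lam * Lk.T @ Lk
    m_est = np.linalg.solve(A, G.T @ d)
    print(name, "| jump at interface:", abs(m_est[m // 2] - m_est[m // 2 - 1]))
```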

  16. An approximation algorithm for parallel machine scheduling with simple linear deterioration

    Institute of Scientific and Technical Information of China (English)

    任传荣; 康丽英

    2007-01-01

    In this paper, a parallel machine scheduling problem is considered in which the processing time of a job is a simple linear function of its starting time, and the objective is to minimize the makespan. A fully polynomial time approximation scheme is developed for the problem of scheduling n deteriorating jobs on two identical machines. Furthermore, the result is generalized to the case of any fixed number of machines.
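
    Under simple linear deterioration, a job j started at time s has processing time p_j = b_j s and thus completes at s(1 + b_j), so a machine that becomes available at t_0 and processes a set S of jobs finishes at t_0 ∏_{j∈S}(1 + b_j), independent of the job order. Minimizing the makespan on two machines is therefore a partition problem over the factors (1 + b_j). The exhaustive sketch below illustrates this model with assumed deterioration rates; it is exponential in the number of jobs, whereas the paper's scheme is a fully polynomial time approximation.

```python
# Exhaustive two-machine scheduler for simply deteriorating jobs: each job
# multiplies its machine's completion time by (1 + b_j), so the makespan of
# an assignment is t0 * max over machines of the product of its factors.
from itertools import product

def makespan_two_machines(b, t0=1.0):
    best = float("inf")
    for mask in product((0, 1), repeat=len(b)):   # all job-to-machine assignments
        loads = [t0, t0]
        for bj, m in zip(b, mask):
            loads[m] *= (1.0 + bj)
        best = min(best, max(loads))
    return best

growth_rates = [0.5, 0.2, 0.9, 0.4, 0.3]          # deterioration rates b_j
print(makespan_two_machines(growth_rates))
```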

  17. Assessing quality standards for ChIP-seq and related massive parallel sequencing-generated datasets: When rating goes beyond avoiding the crisis.

    Science.gov (United States)

    Mendoza-Parra, Marco Antonio; Gronemeyer, Hinrich

    2014-12-01

    Massively parallel DNA sequencing combined with chromatin immunoprecipitation and a large variety of DNA/RNA-enrichment methodologies is at the origin of data resources of major importance. Indeed, these resources, available for multiple genomes, represent the most comprehensive catalogue of (i) cell, development and signal transduction-specified patterns of binding sites for transcription factors ('cistromes') and for transcription and chromatin modifying machineries and (ii) the patterns of specific local post-translational modifications of histones and DNA ('epigenome') or of regulatory chromatin binding factors. In addition, (iii) resources specifying chromatin structure alterations are emerging. Importantly, these types of "omics" datasets increasingly populate public repositories and provide highly valuable resources for the exploration of general principles of cell function in a multi-dimensional genome-transcriptome-epigenome-chromatin structure context. However, data mining is critically dependent on data quality, an issue that, surprisingly, is still largely ignored by scientists and well-financed consortia, data repositories and scientific journals. So what determines the quality of ChIP-seq experiments and the datasets generated therefrom, and what keeps scientists from associating quality criteria with their data? In this 'opinion' we trace the various parameters that influence the quality of this type of dataset, as well as the computational efforts made until now to qualify them. Moreover, we describe a universal quality control (QC) certification approach that provides a quality rating for ChIP-seq and enrichment-related assays. The corresponding QC tool and a regularly updated database, from which at present the quality parameters of more than 8000 datasets can be retrieved, are freely accessible at www.ngs-qc.org.

  18. The effect of the macrolide antibiotic tylosin on microbial diversity in the canine small intestine as demonstrated by massive parallel 16S rRNA gene sequencing.

    Science.gov (United States)

    Suchodolski, Jan S; Dowd, Scot E; Westermarck, Elias; Steiner, Jörg M; Wolcott, Randy D; Spillmann, Thomas; Harmoinen, Jaana A

    2009-10-02

    Recent studies have shown that the fecal microbiota is generally resilient to short-term antibiotic administration, but some bacterial taxa may remain depressed for several months. Limited information is available about the effect of antimicrobials on the small intestinal microbiota, an important contributor to gastrointestinal health. The antibiotic tylosin is often successfully used for the treatment of chronic diarrhea in dogs, but its exact mode of action and its effect on the intestinal microbiota remain unknown. The aim of this study was to evaluate the effect of tylosin on the canine jejunal microbiota. Tylosin was administered at 20 to 22 mg/kg q 24 hr for 14 days to five healthy dogs, each with a pre-existing jejunal fistula. Jejunal brush samples were collected through the fistula on days 0, 14, and 28 (14 days after withdrawal of tylosin). Bacterial diversity was characterized using massively parallel 16S rRNA gene pyrosequencing. Pyrosequencing revealed a previously unrecognized species richness in the canine small intestine. Ten bacterial phyla were identified. Microbial populations were phylogenetically more similar during tylosin treatment. However, a remarkable inter-individual response was observed for specific taxa. Fusobacteria, Bacteroidales, and Moraxella tended to decrease. The proportions of Enterococcus-like organisms, Pasteurella spp., and Dietzia spp. increased significantly during tylosin administration, and several bacterial groups considered to be sensitive to tylosin increased in their proportions. Tylosin may lead to prolonged effects on the composition and diversity of the jejunal microbiota. However, these changes were not associated with any short-term clinical signs of gastrointestinal disease in healthy dogs. Our results illustrate the complexity of the intestinal microbiota and the challenges associated with evaluating the effect of antibiotic administration on the various bacterial groups and their potential interactions.

  19. Application of high-resolution, massively parallel pyrosequencing for estimation of haplotypes and gene expression levels of swine leukocyte antigen (SLA) class I genes.

    Science.gov (United States)

    Kita, Yuki F; Ando, Asako; Tanaka, Keiko; Suzuki, Shingo; Ozaki, Yuki; Uenishi, Hirohide; Inoko, Hidetoshi; Kulski, Jerzy K; Shiina, Takashi

    2012-03-01

    The swine is an important animal model for allo- and xeno-transplantation donor studies, which necessitates an extensive characterization of the expression and sequence variations within the highly polygenic and polymorphic swine leukocyte antigen (SLA) region. Massively parallel pyrosequencing is potentially an effective new 2ndGen method for simultaneous high-throughput genotyping and detection of SLA class I gene expression levels. In this study, we compared the 2ndGen method using the Roche Genome Sequencer 454 FLX with the conventional method using sub-cloning and Sanger sequencing to genotype SLA class I genes in five pigs of the Clawn breed and four pigs of the Landrace breed. We obtained an average of 10.4 SLA class I sequences per pig by the 2ndGen method, consistent with the inheritance data, and an average of only 6.0 sequences by the conventional method. We also performed a correlation analysis between the sequence read numbers obtained by the 2ndGen method and the relative expression values obtained by quantitative real-time PCR analysis at the allele level. A significant correlation coefficient (r = 0.899) was obtained for the SLA class I genes SLA-1, SLA-2, and SLA-3, suggesting that the sequence read numbers closely reflect the gene expression levels in white blood cells. Overall, five novel class I sequences, different haplotype-specific expression patterns and a splice variant for one of the SLA class I genes were identified by the 2ndGen method at greater efficiency and sensitivity than the conventional method.

  20. Influences of diurnal sampling bias on fixed-point monitoring of plankton biodiversity determined using a massively parallel sequencing-based technique.

    Science.gov (United States)

    Nagai, Satoshi; Hida, Kohsuke; Urushizaki, Shingo; Onitsuka, Goh; Yasuike, Motoshige; Nakamura, Yoji; Fujiwara, Atushi; Tajimi, Seisuke; Kimoto, Katsunori; Kobayashi, Takanori; Gojobori, Takashi; Ototake, Mitsuru

    2016-02-01

    In this study, we investigated the influence of diurnal sampling bias on the community structure of plankton by comparing the biodiversity among seawater samples (n=9) obtained every 3 h for 24 h using massively parallel sequencing (MPS)-based plankton monitoring at a fixed point at Himedo seaport in Yatsushiro Sea, Japan. The number of raw operational taxonomic units (OTUs) and OTUs after re-sampling was 507-658 (558 ± 104, mean ± standard deviation) and 448-544 (467 ± 81), respectively, indicating high plankton biodiversity at the sampling location. The relative abundance of the top 20 OTUs in the samples from Himedo seaport was 48.8-67.7% (58.0 ± 5.8%), and the highest-ranked OTU was a Pseudo-nitzschia species (Bacillariophyta) with a relative abundance of 17.3-39.2%, followed by Oithona sp. 1 and Oithona sp. 2 (Arthropoda). During seawater sampling, a semidiurnal tidal current with an amplitude of 0.3 m s(-1) was dominant, and a westward residual current driven by the northeasterly wind was continuously observed throughout the 24-h monitoring. The relative abundances of plankton species therefore fluctuated among the samples, but no significant difference was noted according to a G-test (p > 0.05). Significant differences were observed for samples obtained from a different locality (Kusuura in Yatsushiro Sea) and on different dates, suggesting that the influence of diurnal sampling bias on plankton diversity, as determined using the MPS-based survey, was insignificant and acceptable. Copyright © 2015 Elsevier B.V. All rights reserved.