WorldWideScience

Sample records for times speedup compared

  1. The time-varying shortest path problem with fuzzy transit costs and speedup

    Directory of Open Access Journals (Sweden)

    Rezapour Hassan

    2016-08-01

    Full Text Available In this paper, we focus on the time-varying shortest path problem, where the transit costs are fuzzy numbers. Moreover, we consider this problem in which the transit time can be shortened at a fuzzy speedup cost. Speeding up selected arcs may also be a worthwhile option when seeking the shortest path from a source vertex to a specified destination vertex.
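
    A crisp (non-fuzzy) simplification helps make the underlying computation concrete. The sketch below runs a time-dependent Dijkstra search in which each arc's transit time is a function of the departure time; it assumes FIFO arc behaviour, ignores the speedup decision and the fuzzy costs, and uses a purely hypothetical example graph.

      import heapq

      def time_dependent_shortest_path(graph, source, target, t0=0.0):
          """Earliest-arrival search with time-varying (crisp) transit times.

          graph[u] is a list of (v, transit) pairs, where transit(t) returns the
          arc transit time when departing u at time t (FIFO behaviour assumed).
          """
          earliest = {source: t0}
          heap = [(t0, source)]
          while heap:
              t, u = heapq.heappop(heap)
              if u == target:
                  return t                      # earliest possible arrival at the target
              if t > earliest.get(u, float("inf")):
                  continue                      # stale heap entry
              for v, transit in graph[u]:
                  arrival = t + transit(t)
                  if arrival < earliest.get(v, float("inf")):
                      earliest[v] = arrival
                      heapq.heappush(heap, (arrival, v))
          return float("inf")

      # Hypothetical network: arc (a, b) becomes slower for departures after t = 5
      graph = {
          "s": [("a", lambda t: 2.0), ("b", lambda t: 6.0)],
          "a": [("b", lambda t: 1.0 if t < 5 else 4.0)],
          "b": [],
      }
      print(time_dependent_shortest_path(graph, "s", "b"))   # -> 3.0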

  2. Conversion of DAP models to SPEEDUP

    International Nuclear Information System (INIS)

    Aull, J.E.

    1993-08-01

    Several processes at the Savannah River Site are modeled using Bechtel's Dynamic Analysis Program (DAP) which uses a sequential modular modeling architecture. The feasibility of conversion of DAP models to SPEEDUP was examined because of the benefits associated with this de facto industry standard. The equation-based approach used in SPEEDUP gives accuracy, stability, and ease of maintenance. The DAP licenses on our site are for single-user PS/2 machines whereas the SPEEDUP product is licensed on a VAX minicomputer which provides faster execution and ease of integration with existing visualization tools. In this paper the basic unit operations of a DAP model that simulates a ventilation system are described. The basic operations were modeled with both DAP and SPEEDUP, and the two models yield results that are in close agreement. Since the basic unit operations of the DAP model have been successfully duplicated using SPEEDUP, it is feasible to proceed with model conversion. DAP subroutines and functions that involve only algebraic manipulation may be inserted directly into the SPEEDUP model or their underlying equations may be extracted and written as SPEEDUP model equations. A problem modeled in SPEEDUP running on a VAX 8810 runs approximately fifteen times faster in elapsed time than the same problem modeled with DAP on a 33 MHz Intel 80486 processor

  3. Quantum computing. Defining and detecting quantum speedup.

    Science.gov (United States)

    Rønnow, Troels F; Wang, Zhihui; Job, Joshua; Boixo, Sergio; Isakov, Sergei V; Wecker, David; Martinis, John M; Lidar, Daniel A; Troyer, Matthias

    2014-07-25

    The development of small-scale quantum devices raises the question of how to fairly assess and detect quantum speedup. Here, we show how to define and measure quantum speedup and how to avoid pitfalls that might mask or fake such a speedup. We illustrate our discussion with data from tests run on a D-Wave Two device with up to 503 qubits. By using random spin glass instances as a benchmark, we found no evidence of quantum speedup when the entire data set is considered and obtained inconclusive results when comparing subsets of instances on an instance-by-instance basis. Our results do not rule out the possibility of speedup for other classes of problems and illustrate the subtle nature of the quantum speedup question. Copyright © 2014, American Association for the Advancement of Science.
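
    The abstract's key methodological point is that a speedup must be read off from how time to solution scales with problem size, not from a single timing ratio. The sketch below illustrates that test on purely synthetic scaling models; the constants and exponents are invented placeholders, not D-Wave or benchmark data.

      import numpy as np

      # Hypothetical time-to-solution models T(N) = a * 2**(b*N); illustrative only.
      def t_classical(n):
          return 1e-3 * 2 ** (0.10 * n)

      def t_quantum(n):
          return 5e-2 * 2 ** (0.08 * n)

      sizes = np.array([64, 128, 256, 512])
      speedup = t_classical(sizes) / t_quantum(sizes)

      # An asymptotic speedup requires this ratio to keep growing with problem size;
      # exceeding 1 at one fixed size is not evidence of quantum speedup.
      slope = np.polyfit(sizes, np.log2(speedup), 1)[0]
      for n, s in zip(sizes, speedup):
          print(f"N = {n:4d}: speedup ratio = {s:8.2f}")
      print("growth rate of log2(speedup) per problem-size unit:", round(slope, 3))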

  4. Asymptotic speedups, bisimulation and distillation (Work in progress)

    DEFF Research Database (Denmark)

    Jones, Neil; Hamilton, G. W.

    2015-01-01

    Distillation is a fully automatic program transformation that can yield superlinear program speedups. Bisimulation is a key to the proof that distillation is correct, i.e., preserves semantics. However, the proof, based on observational equivalence, is insensitive to program running times. This paper shows how distillation can give superlinear speedups on some “old chestnut” programs well-known from the early program transformation literature: naive reverse, factorial sum, and Fibonacci.

  5. Architectures for Quantum Simulation Showing a Quantum Speedup

    Science.gov (United States)

    Bermejo-Vega, Juan; Hangleiter, Dominik; Schwarz, Martin; Raussendorf, Robert; Eisert, Jens

    2018-04-01

    One of the main aims in the field of quantum simulation is to achieve a quantum speedup, often referred to as "quantum computational supremacy": the experimental realization of a quantum device that computationally outperforms classical computers. In this work, we show that one can devise versatile and feasible schemes of two-dimensional, dynamical, quantum simulators showing such a quantum speedup, building on intermediate problems involving nonadaptive, measurement-based, quantum computation. In each of the schemes, an initial product state is prepared, potentially involving an element of randomness as in disordered models, followed by a short-time evolution under a basic translationally invariant Hamiltonian with simple nearest-neighbor interactions and a mere sampling measurement in a fixed basis. The correctness of the final-state preparation in each scheme is fully efficiently certifiable. We discuss experimental necessities and possible physical architectures, inspired by platforms of cold atoms in optical lattices and a number of others, as well as specific assumptions that enter the complexity-theoretic arguments. This work shows that benchmark settings exhibiting a quantum speedup may require little control, in contrast to universal quantum computing. Thus, our proposal puts a convincing experimental demonstration of a quantum speedup within reach in the near term.

  6. Numerical benchmarking of SPEEDUP trademark against point kinetics solutions

    International Nuclear Information System (INIS)

    Gregory, M.V.

    1993-02-01

    SPEEDUP trademark is a state-of-the-art, dynamic, chemical process modeling package offered by Aspen Technology. In anticipation of new customers' needs for new analytical tools to support the site's waste management activities, SRTC has secured a multiple-user license to SPEEDUP trademark. In order to verify both the installation and mathematical correctness of the algorithms in SPEEDUP trademark, we have performed several numerical benchmarking calculations. These calculations are the first steps in establishing an on-site quality assurance pedigree for SPEEDUP trademark. The benchmark calculations consisted of SPEEDUP trademark Version 5.3L representations of five neutron kinetics benchmarks (each a mathematically stiff system of seven coupled ordinary differential equations), whose exact solutions are documented in the open literature. In all cases, the SPEEDUP trademark solutions were found to be in excellent agreement with the reference solutions. A minor peculiarity in dealing with a non-existent discontinuity in the OPERATION section of the model made itself evident.
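
    The benchmarks referred to are point-kinetics systems: one neutron-density equation coupled to six delayed-neutron precursor equations, a classic stiff set of seven ODEs. The sketch below is a minimal SciPy version of such a system, given only to show its structure; the group constants and the step reactivity are illustrative values, not those of the cited benchmarks, and the code has no connection to SPEEDUP itself.

      import numpy as np
      from scipy.integrate import solve_ivp

      # Illustrative six-group delayed-neutron data (not the benchmark values)
      beta_i = np.array([2.1e-4, 1.4e-3, 1.3e-3, 2.6e-3, 7.5e-4, 2.7e-4])
      lam_i = np.array([0.0124, 0.0305, 0.111, 0.301, 1.14, 3.01])    # 1/s
      beta = beta_i.sum()
      Lambda = 1.0e-5               # neutron generation time, s
      rho = 0.5 * beta              # step reactivity insertion

      def point_kinetics(t, y):
          n, c = y[0], y[1:]
          dn = (rho - beta) / Lambda * n + np.dot(lam_i, c)
          dc = beta_i / Lambda * n - lam_i * c
          return np.concatenate(([dn], dc))

      # Start from the steady state corresponding to n(0) = 1
      y0 = np.concatenate(([1.0], beta_i / (lam_i * Lambda)))
      sol = solve_ivp(point_kinetics, (0.0, 10.0), y0, method="BDF",
                      rtol=1e-8, atol=1e-10)
      print("n(10 s) =", sol.y[0, -1])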

  7. Speedup predictions on large scientific parallel programs

    International Nuclear Information System (INIS)

    Williams, E.; Bobrowicz, F.

    1985-01-01

    How much speedup can we expect for large scientific parallel programs running on supercomputers? For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N ≥ 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory.
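
    A first-order way to reason about such measurements is Amdahl's law, which bounds speedup by the code's serial fraction. The sketch below is a generic illustration, not the Los Alamos simulation method: it infers the parallel fraction consistent with the quoted 14.4x speedup on 16 processors and extrapolates it to larger processor counts.

      def amdahl_speedup(parallel_fraction, n_procs):
          """Amdahl's law: S(N) = 1 / ((1 - p) + p / N)."""
          return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)

      def parallel_fraction_from_speedup(speedup, n_procs):
          """Invert Amdahl's law to estimate p from a measured speedup."""
          return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_procs)

      p = parallel_fraction_from_speedup(14.4, 16)     # measurement quoted above
      print(f"implied parallel fraction: {p:.4f}")
      for n in (16, 64, 256, 1024):
          print(f"N = {n:5d}: predicted speedup = {amdahl_speedup(p, n):6.1f}")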

  8. Search for scalar-tensor gravity theories with a non-monotonic time evolution of the speed-up factor

    Energy Technology Data Exchange (ETDEWEB)

    Navarro, A [Dept Fisica, Universidad de Murcia, E30071-Murcia (Spain); Serna, A [Dept Fisica, Computacion y Comunicaciones, Universidad Miguel Hernandez, E03202-Elche (Spain); Alimi, J-M [Lab. de l' Univers et de ses Theories (LUTH, CNRS FRE2462), Observatoire de Paris-Meudon, F92195-Meudon (France)

    2002-08-21

    We present a method to detect, in the framework of scalar-tensor gravity theories, the existence of stationary points in the time evolution of the speed-up factor. An attractive aspect of this method is that, once the particular scalar-tensor theory has been specified, the stationary points are found through a simple algebraic equation which does not contain any integration. By applying this method to the three classes of scalar-tensor theories defined by Barrow and Parsons, we have found several new cosmological models with a non-monotonic evolution of the speed-up factor. The physical interest of these models is that, as previously shown by Serna and Alimi, they predict the observed primordial abundance of light elements for a very wide range of baryon density. These models are then consistent with recent CMB and Lyman-α estimates of the baryon content of the universe.

  9. Comparison of Speed-Up Over Hills Derived from Wind-Tunnel Experiments, Wind-Loading Standards, and Numerical Modelling

    Science.gov (United States)

    Safaei Pirooz, Amir A.; Flay, Richard G. J.

    2018-03-01

    We evaluate the accuracy of the speed-up provided in several wind-loading standards by comparison with wind-tunnel measurements and numerical predictions, which are carried out at a nominal scale of 1:500 and full-scale, respectively. Airflow over two- and three-dimensional bell-shaped hills is numerically modelled using the Reynolds-averaged Navier-Stokes method with a pressure-driven atmospheric boundary layer and three different turbulence models. Investigated in detail are the effects of grid size on the speed-up and flow separation, as well as the resulting uncertainties in the numerical simulations. Good agreement is obtained between the numerical prediction of speed-up, as well as the wake region size and location, with that according to large-eddy simulations and the wind-tunnel results. The numerical results demonstrate the ability to predict the airflow over a hill with good accuracy with considerably less computational time than for large-eddy simulation. Numerical simulations for a three-dimensional hill show that the speed-up and the wake region decrease significantly when compared with the flow over two-dimensional hills due to the secondary flow around three-dimensional hills. Different hill slopes and shapes are simulated numerically to investigate the effect of hill profile on the speed-up. In comparison with more peaked hill crests, flat-topped hills have a lower speed-up at the crest up to heights of about half the hill height, for which none of the standards gives entirely satisfactory values of speed-up. Overall, the latest versions of the National Building Code of Canada and the Australian and New Zealand Standard give the best predictions of wind speed over isolated hills.
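
    The quantity being compared throughout is the fractional speed-up: the relative increase of the mean wind speed over the hill crest above the upstream value at the same height. The sketch below simply evaluates that definition for two hypothetical velocity profiles; the numbers are illustrative and are not data from the study.

      import numpy as np

      # Heights above local ground (m) and hypothetical mean wind speeds (m/s)
      z = np.array([2.0, 5.0, 10.0, 20.0, 40.0])
      u_upstream = np.array([4.1, 5.0, 5.8, 6.6, 7.4])   # flat upstream terrain
      u_crest = np.array([6.3, 7.2, 7.8, 8.2, 8.5])      # above the hill crest

      # Fractional speed-up ratio: dS(z) = (U_crest - U_upstream) / U_upstream
      delta_s = (u_crest - u_upstream) / u_upstream
      for zi, ds in zip(z, delta_s):
          print(f"z = {zi:5.1f} m: speed-up = {ds:.2f}")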

  10. Rotational speedups accompanying angular deceleration of a superfluid

    International Nuclear Information System (INIS)

    Campbell, L.J.

    1979-01-01

    Exact calculations of the angular deceleration of superfluid vortex arrays show momentary speedups in the angular velocity caused by coherent, multiple vortex loss at the boundary. The existence and shape of the speedups depend on the vortex friction, the deceleration rate, and the pattern symmetry. The phenomenon resembles, in several ways, that observed in pulsars

  11. A speedup technique for (l, d)-motif finding algorithms

    Directory of Open Access Journals (Sweden)

    Dinh Hieu

    2011-03-01

    Full Text Available Abstract Background The discovery of patterns in DNA, RNA, and protein sequences has led to the solution of many vital biological problems. For instance, the identification of patterns in nucleic acid sequences has resulted in the determination of open reading frames, identification of promoter elements of genes, identification of intron/exon splicing sites, identification of SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have proven to be extremely helpful in domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, etc. Motifs are important patterns that are helpful in finding transcriptional regulatory elements, transcription factor binding sites, functional genomics, drug design, etc. As a result, numerous papers have been written to solve the motif search problem. Results Three versions of the motif search problem have been proposed in the literature: Simple Motif Search (SMS), (l, d)-motif search (or Planted Motif Search, PMS), and Edit-distance-based Motif Search (EMS). In this paper we focus on PMS. Two kinds of algorithms can be found in the literature for solving the PMS problem: exact and approximate. An exact algorithm always identifies the motifs, while an approximate algorithm may fail to identify some or all of the motifs. The exact version of the PMS problem has been shown to be NP-hard. Exact algorithms proposed in the literature for PMS take time that is exponential in some of the underlying parameters. In this paper we propose a generic technique that can be used to speed up PMS algorithms. Conclusions We present a speedup technique that can be used on any PMS algorithm. We have tested our speedup technique on a number of algorithms. These experimental results show that our speedup technique is indeed very
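
    For orientation, the (l, d) planted motif search problem asks for every l-mer that occurs, with at most d mismatches, in every input sequence. The brute-force sketch below states that definition directly; it is only the exponential baseline that speedup techniques such as the one in this paper aim to accelerate, and the tiny input is hypothetical.

      from itertools import product

      def hamming(a, b):
          return sum(x != y for x, y in zip(a, b))

      def occurs_within_d(motif, seq, d):
          l = len(motif)
          return any(hamming(motif, seq[i:i + l]) <= d
                     for i in range(len(seq) - l + 1))

      def planted_motif_search(seqs, l, d, alphabet="ACGT"):
          """Exact (l, d)-PMS: enumerate all |alphabet|**l candidate motifs."""
          motifs = []
          for candidate in product(alphabet, repeat=l):
              motif = "".join(candidate)
              if all(occurs_within_d(motif, s, d) for s in seqs):
                  motifs.append(motif)
          return motifs

      seqs = ["ACGTACGT", "CCGTTCGA", "AAGTACGG"]        # toy instance
      print(planted_motif_search(seqs, l=4, d=1))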

  12. Optimization of speed-up network component values for the 30 Ω resistively terminated prototype kicker magnet

    International Nuclear Information System (INIS)

    Barnes, M.J.; Wait, G.D.

    1993-01-01

    Kicker magnets are required for all ring-to-ring transfers in the 5 rings of the proposed KAON factory synchrotron. The kick must rise from 1% to 99% of full strength during the time interval of gaps created in the beam (80 ns to 160 ns) so that the beam can be extracted with minimum losses. In order to achieve the specified rise-time and 'flatness' for the kick it is necessary to utilize speed-up networks, comprising a capacitor and a resistor, in the electrical circuit. Speed-up networks may be connected electrically on both the input and output of the kicker magnet. In addition it is advantageous to connect a 'speed-up' network on the input of the resistive terminator(s). A sequence which may minimize the number of mathematical simulations required to optimize the values of the 8 possible speed-up components is presented. PE2D has been utilized to determine inductance and capacitance values for the resistive terminator; this data has been used in PSpice transient analyses. Results of the PE2D predictions are also presented. The research has culminated in a predicted kick rise time (1% to 99%) of less than 50 ns for a TRIUMF 10 cell prototype kicker magnet. The proposed improvements are currently being implemented on our prototype kicker system.

  13. Understanding the Quantum Computational Speed-up via De-quantisation

    Directory of Open Access Journals (Sweden)

    Cristian S. Calude

    2010-06-01

    Full Text Available While it seems possible that quantum computers may allow for algorithms offering a computational speed-up over classical algorithms for some problems, the issue is poorly understood. We explore this computational speed-up by investigating the ability to de-quantise quantum algorithms into classical simulations of the algorithms which are as efficient in both time and space as the original quantum algorithms. The process of de-quantisation helps formulate conditions to determine if a quantum algorithm provides a real speed-up over classical algorithms. These conditions can be used to develop new quantum algorithms more effectively (by avoiding features that could allow the algorithm to be efficiently classically simulated), as well as providing the potential to create new classical algorithms (by using features which have proved valuable for quantum algorithms). Results on many different methods of de-quantisations are presented, as well as a general formal definition of de-quantisation. De-quantisations employing higher-dimensional classical bits, as well as those using matrix-simulations, put emphasis on entanglement in quantum algorithms; a key result is that any algorithm in which the entanglement is bounded is de-quantisable. These methods are contrasted with the stabiliser formalism de-quantisations due to the Gottesman-Knill Theorem, as well as those which take advantage of the topology of the circuit for a quantum algorithm. The benefits of the different methods are contrasted, and the importance of a range of techniques is emphasised. We further discuss some features of quantum algorithms which current de-quantisation methods do not cover.

  14. Quantum Speedup for Active Learning Agents

    Directory of Open Access Journals (Sweden)

    Giuseppe Davide Paparo

    2014-07-01

    Full Text Available Can quantum mechanics help us build intelligent learning agents? A defining signature of intelligent behavior is the capacity to learn from experience. However, a major bottleneck for agents to learn in real-life situations is the size and complexity of the corresponding task environment. Even in a moderately realistic environment, it may simply take too long to rationally respond to a given situation. If the environment is impatient, allowing only a certain time for a response, an agent may then be unable to cope with the situation and to learn at all. Here, we show that quantum physics can help and provide a quadratic speedup for active learning as a genuine problem of artificial intelligence. This result will be particularly relevant for applications involving complex task environments.

  15. Controls on the inland propagation of terminus-driven speedups at Helheim Glacier, SE Greenland

    Science.gov (United States)

    Kehrl, L. M.; Joughin, I. R.; Smith, B.

    2017-12-01

    Tidewater glaciers are very sensitive to changes in the stress balance near their termini. When submarine melt or iceberg calving reduce lateral or basal resistance near the terminus, the glacier typically must speed up to produce the additional longitudinal and lateral stress gradients necessary to restore stress balance. Once speedup near the terminus is initiated, it can propagate inland through longitudinal stress coupling, thinning-induced changes in the effective pressure, and/or a steepening of surface slopes. The controls on these processes and the timing and spatial extent of the inland response, however, remain poorly understood. In this study, we use a three-dimensional, Full Stokes model (Elmer/Ice) to investigate the effects of different ice rheology and basal sliding parameterizations on the inland propagation of speedups at Helheim Glacier, SE Greenland. Using satellite observations of terminus position, we force the model with the observed 3-km, 2013/14 retreat history and allow the model to evolve in response to this retreat. We run a set of simulations that vary the ice rheology (constant or spatially variable ice temperature) and basal sliding law (linear, nonlinear, and effective-pressure-dependent). Our results show that the choice of parameterizations affect the timing and spatial extent of the inland response, but that the range of acceptable parameters can be constrained by comparing the model results to satellite observations of surface velocity and elevation.

  16. Tunneling and Speedup in Quantum Optimization for Permutation-Symmetric Problems

    Directory of Open Access Journals (Sweden)

    Siddharth Muthukrishnan

    2016-07-01

    Full Text Available Tunneling is often claimed to be the key mechanism underlying possible speedups in quantum optimization via quantum annealing (QA), especially for problems featuring a cost function with tall and thin barriers. We present and analyze several counterexamples from the class of perturbed Hamming weight optimization problems with qubit permutation symmetry. We first show that, for these problems, the adiabatic dynamics that make tunneling possible should be understood not in terms of the cost function but rather the semiclassical potential arising from the spin-coherent path-integral formalism. We then provide an example where the shape of the barrier in the final cost function is short and wide, which might suggest no quantum advantage for QA, yet where tunneling renders QA superior to simulated annealing in the adiabatic regime. However, the adiabatic dynamics turn out not to be optimal. Instead, an evolution involving a sequence of diabatic transitions through many avoided-level crossings, involving no tunneling, is optimal and outperforms adiabatic QA. We show that this phenomenon of speedup by diabatic transitions is not unique to this example, and we provide an example where it provides an exponential speedup over adiabatic QA. In yet another twist, we show that a classical algorithm, spin-vector dynamics, is at least as efficient as diabatic QA. Finally, in a different example with a convex cost function, the diabatic transitions result in a speedup relative to both adiabatic QA with tunneling and classical spin-vector dynamics.

  17. Seeking Quantum Speedup Through Spin Glasses: The Good, the Bad, and the Ugly*

    Directory of Open Access Journals (Sweden)

    Helmut G. Katzgraber

    2015-09-01

    Full Text Available There has been considerable progress in the design and construction of quantum annealing devices. However, a conclusive detection of quantum speedup over traditional silicon-based machines remains elusive, despite multiple careful studies. In this work we outline strategies to design hard tunable benchmark instances based on insights from the study of spin glasses—the archetypal random benchmark problem for novel algorithms and optimization devices. We propose to complement head-to-head scaling studies that compare quantum annealing machines to state-of-the-art classical codes with an approach that compares the performance of different algorithms and/or computing architectures on different classes of computationally hard tunable spin-glass instances. The advantage of such an approach lies in having to compare only the performance hit felt by a given algorithm and/or architecture when the instance complexity is increased. Furthermore, we propose a methodology that might not directly translate into the detection of quantum speedup but might elucidate whether quantum annealing has a “quantum advantage” over corresponding classical algorithms, such as simulated annealing. Our results on a 496-qubit D-Wave Two quantum annealing device are compared to recently used state-of-the-art thermal simulated annealing codes.

  18. The role of real-time in biomedical science: a meta-analysis on computational complexity, delay and speedup.

    Science.gov (United States)

    Faust, Oliver; Yu, Wenwei; Rajendra Acharya, U

    2015-03-01

    The concept of real-time is very important, as it deals with the realizability of computer based health care systems. In this paper we review biomedical real-time systems with a meta-analysis on computational complexity (CC), delay (Δ) and speedup (Sp). During the review we found that, in the majority of papers, the term real-time is part of the thesis indicating that a proposed system or algorithm is practical. However, these papers were not considered for detailed scrutiny. Our detailed analysis focused on papers which support their claim of achieving real-time, with a discussion on CC or Sp. These papers were analyzed in terms of processing system used, application area (AA), CC, Δ, Sp, implementation/algorithm (I/A) and competition. The results show that the ideas of parallel processing and algorithm delay were only recently introduced and journal papers focus more on Algorithm (A) development than on implementation (I). Most authors compete on big O notation (O) and processing time (PT). Based on these results, we adopt the position that the concept of real-time will continue to play an important role in biomedical systems design. We predict that parallel processing considerations, such as Sp and algorithm scaling, will become more important. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. Web Caching - A Technique to Speedup Access to Web Contents

    Indian Academy of Sciences (India)

    Web Caching - A Technique to Speedup Access to Web Contents. Harsha Srinath and Shiva Shankar Ramanna. General Article, Resonance – Journal of Science Education, Volume 7, Issue 7, July 2002, pp 54-62. Keywords: World wide web; data caching; internet traffic; web page access.

  20. Quantum speedup in solving the maximal-clique problem

    Science.gov (United States)

    Chang, Weng-Long; Yu, Qi; Li, Zhaokai; Chen, Jiahui; Peng, Xinhua; Feng, Mang

    2018-03-01

    The maximal-clique problem, to find the maximally sized clique in a given graph, is classically an NP-complete computational problem, which has potential applications ranging from electrical engineering, computational chemistry, and bioinformatics to social networks. Here we develop a quantum algorithm to solve the maximal-clique problem for any graph G with n vertices with quadratic speedup over its classical counterparts, where the time and spatial complexities are reduced to O(√(2ⁿ)) and O(n²), respectively. With respect to oracle-related quantum algorithms for the NP-complete problems, we identify our algorithm as optimal. To justify the feasibility of the proposed quantum algorithm, we successfully solve a typical clique problem for a graph G with two vertices and one edge by carrying out a nuclear magnetic resonance experiment involving four qubits.
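
    For context, the classical baseline for this problem examines on the order of 2^n vertex subsets. The sketch below is that naive classical enumeration, not the quantum algorithm, and is included only to make concrete the search space on which the quadratic quantum speedup acts; the small example graph is arbitrary.

      from itertools import combinations

      def is_clique(vertices, edges):
          return all((u, v) in edges or (v, u) in edges
                     for u, v in combinations(vertices, 2))

      def maximum_clique(n, edge_list):
          """Naive O(2^n) search over all vertex subsets (classical baseline)."""
          edges = set(edge_list)
          for size in range(n, 0, -1):              # try the largest subsets first
              for subset in combinations(range(n), size):
                  if is_clique(subset, edges):
                      return subset
          return ()

      # Small example: a triangle with one pendant vertex
      print(maximum_clique(4, [(0, 1), (1, 2), (0, 2), (2, 3)]))   # -> (0, 1, 2)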

  1. A simple, practical and complete O(n³/log n)-time Algorithm for RNA folding using the Four-Russians Speedup

    Directory of Open Access Journals (Sweden)

    Gusfield Dan

    2010-01-01

    Full Text Available Abstract Background The problem of computationally predicting the secondary structure (or folding) of RNA molecules was first introduced more than thirty years ago and yet continues to be an area of active research and development. The basic RNA-folding problem of finding a maximum cardinality, non-crossing, matching of complementary nucleotides in an RNA sequence of length n, has an O(n³)-time dynamic programming solution that is widely applied. It is known that an o(n³) worst-case time solution is possible, but the published and suggested methods are complex and have not been established to be practical. Significant practical improvements to the original dynamic programming method have been introduced, but they retain the O(n³) worst-case time bound when n is the only problem-parameter used in the bound. Surprisingly, the most widely-used, general technique to achieve a worst-case (and often practical) speed up of dynamic programming, the Four-Russians technique, has not been previously applied to the RNA-folding problem. This is perhaps due to technical issues in adapting the technique to RNA-folding. Results In this paper, we give a simple, complete, and practical Four-Russians algorithm for the basic RNA-folding problem, achieving a worst-case time-bound of O(n³/log(n)). Conclusions We show that this time-bound can also be obtained for richer nucleotide matching scoring-schemes, and that the method achieves consistent speed-ups in practice. The contribution is both theoretical and practical, since the basic RNA-folding problem is often solved multiple times in the inner-loop of more complex algorithms, and for long RNA molecules in the study of RNA virus genomes.
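
    The "basic RNA-folding problem" described here is the classic Nussinov-style dynamic program. A minimal O(n³) version is sketched below purely for orientation; it is the baseline recurrence, not the Four-Russians-accelerated algorithm of the paper, and it ignores biological refinements such as minimum loop lengths.

      def nussinov_max_pairs(rna):
          """O(n^3) dynamic program for the maximum number of non-crossing base pairs."""
          pairs = {("A", "U"), ("U", "A"), ("G", "C"),
                   ("C", "G"), ("G", "U"), ("U", "G")}
          n = len(rna)
          dp = [[0] * n for _ in range(n)]
          for span in range(1, n):
              for i in range(n - span):
                  j = i + span
                  best = max(dp[i + 1][j], dp[i][j - 1])        # i or j unpaired
                  if (rna[i], rna[j]) in pairs:
                      best = max(best, dp[i + 1][j - 1] + 1)    # i pairs with j
                  for k in range(i + 1, j):                     # bifurcation
                      best = max(best, dp[i][k] + dp[k + 1][j])
                  dp[i][j] = best
          return dp[0][n - 1]

      print(nussinov_max_pairs("GGGAAAUCC"))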

  2. Evaluation of speedup of Monte Carlo calculations of two simple reactor physics problems coded for the GPU/CUDA environment

    International Nuclear Information System (INIS)

    Ding, Aiping; Liu, Tianyu; Liang, Chao; Ji, Wei; Shephard, Mark S.; Xu, X George; Brown, Forrest B.

    2011-01-01

    Monte Carlo simulation is ideally suited for solving Boltzmann neutron transport equation in inhomogeneous media. However, routine applications require the computation time to be reduced to hours and even minutes in a desktop system. The interest in adopting GPUs for Monte Carlo acceleration is rapidly mounting, fueled partially by the parallelism afforded by the latest GPU technologies and the challenge to perform full-size reactor core analysis on a routine basis. In this study, Monte Carlo codes for a fixed-source neutron transport problem and an eigenvalue/criticality problem were developed for CPU and GPU environments, respectively, to evaluate issues associated with computational speedup afforded by the use of GPUs. The results suggest that a speedup factor of 30 in Monte Carlo radiation transport of neutrons is within reach using the state-of-the-art GPU technologies. However, for the eigenvalue/criticality problem, the speedup was 8.5. In comparison, for a task of voxelizing unstructured mesh geometry that is more parallel in nature, the speedup of 45 was obtained. It was observed that, to date, most attempts to adopt GPUs for Monte Carlo acceleration were based on naïve implementations and have not yielded the level of anticipated gains. Successful implementation of Monte Carlo schemes for GPUs will likely require the development of an entirely new code. Given the prediction that future-generation GPU products will likely bring exponentially improved computing power and performances, innovative hardware and software solutions may make it possible to achieve full-core Monte Carlo calculation within one hour using a desktop computer system in a few years. (author)

  3. Experimental observation of pulse delay and speed-up in cascaded quantum well gain and absorber media

    DEFF Research Database (Denmark)

    Hansen, Per Lunnemann; Poel, Mike van der; Yvind, Kresten

    2008-01-01

    Slow-down and speed-up of 180 fs pulses in semiconductor waveguides beyond the existing models is observed. Cascaded gain and absorbing sections are shown to provide significant temporal pulse shifting at near-constant output pulse energy.

  4. The quadratic speedup in Grover's search algorithm from the entanglement perspective

    International Nuclear Information System (INIS)

    Rungta, Pranaw

    2009-01-01

    We show that Grover's algorithm can be described as an iterative change of the bipartite entanglement, which leads to a necessary and sufficient condition for quadratic speedup. This allows us to reestablish, from the entanglement perspective, that Grover's search algorithm is the only optimal pure state search algorithm.

  5. CUDA-based real time surgery simulation.

    Science.gov (United States)

    Liu, Youquan; De, Suvranu

    2008-01-01

    In this paper we present a general software platform that enables real time surgery simulation on the newly available compute unified device architecture (CUDA) from NVIDIA. CUDA-enabled GPUs harness the power of 128 processors which allow data parallel computations. Compared to the previous GPGPU approach, it is significantly more flexible, with a C language interface. We report implementation of both collision detection and consequent deformation computation algorithms. Our test results indicate that CUDA enables a twenty times speedup for collision detection and about a fifteen times speedup for deformation computation on an Intel Core 2 Quad 2.66 GHz machine with a GeForce 8800 GTX.

  6. SPEEDUP modeling of the defense waste processing facility at the SRS

    International Nuclear Information System (INIS)

    Smith, F.G. III.

    1997-01-01

    A computer model has been developed for the dynamic simulation of batch process operations within the Defense Waste Processing Facility (DWPF) at the Savannah River Site (SRS). The DWPF chemically treats high level waste materials from the site tank farm and vitrifies the resulting slurry into a borosilicate glass for permanent disposal. The DWPF consists of three major processing areas: Salt Processing Cell (SPC), Chemical Processing Cell (CPC) and the Melt Cell. A fully integrated model of these process units has been developed using the SPEEDUP trademark software from Aspen Technology. Except for glass production in the Melt Cell, all of the chemical operations within DWPF are batch processes. Since SPEEDUP is designed for dynamic modeling of continuous processes, considerable effort was required to devise batch process algorithms. This effort was successful and the model is able to simulate batch operations and the dynamic behavior of the process. The model also includes an optimization calculation that maximizes the waste content in the final glass product. In this paper, we will describe the process model in some detail and present preliminary results from a few simulation studies.

  7. A simple, practical and complete O(n³/log n)-time algorithm for RNA folding using the Four-Russians speedup.

    Science.gov (United States)

    Frid, Yelena; Gusfield, Dan

    2010-01-04

    The problem of computationally predicting the secondary structure (or folding) of RNA molecules was first introduced more than thirty years ago and yet continues to be an area of active research and development. The basic RNA-folding problem of finding a maximum cardinality, non-crossing, matching of complementary nucleotides in an RNA sequence of length n, has an O(n³)-time dynamic programming solution that is widely applied. It is known that an o(n³) worst-case time solution is possible, but the published and suggested methods are complex and have not been established to be practical. Significant practical improvements to the original dynamic programming method have been introduced, but they retain the O(n³) worst-case time bound when n is the only problem-parameter used in the bound. Surprisingly, the most widely-used, general technique to achieve a worst-case (and often practical) speed up of dynamic programming, the Four-Russians technique, has not been previously applied to the RNA-folding problem. This is perhaps due to technical issues in adapting the technique to RNA-folding. In this paper, we give a simple, complete, and practical Four-Russians algorithm for the basic RNA-folding problem, achieving a worst-case time-bound of O(n³/log(n)). We show that this time-bound can also be obtained for richer nucleotide matching scoring-schemes, and that the method achieves consistent speed-ups in practice. The contribution is both theoretical and practical, since the basic RNA-folding problem is often solved multiple times in the inner-loop of more complex algorithms, and for long RNA molecules in the study of RNA virus genomes.

  8. Design Patterns to Achieve 300x Speedup for Oceanographic Analytics in the Cloud

    Science.gov (United States)

    Jacob, J. C.; Greguska, F. R., III; Huang, T.; Quach, N.; Wilson, B. D.

    2017-12-01

    We describe how we achieve super-linear speedup over standard approaches for oceanographic analytics on a cluster computer and the Amazon Web Services (AWS) cloud. NEXUS is an open source platform for big data analytics in the cloud that enables this performance through a combination of horizontally scalable data parallelism with Apache Spark and rapid data search, subset, and retrieval with tiled array storage in cloud-aware NoSQL databases like Solr and Cassandra. NEXUS is the engine behind several public portals at NASA and OceanWorks is a newly funded project for the ocean community that will mature and extend this capability for improved data discovery, subset, quality screening, analysis, matchup of satellite and in situ measurements, and visualization. We review the Python language API for Spark and how to use it to quickly convert existing programs to use Spark to run with cloud-scale parallelism, and discuss strategies to improve performance. We explain how partitioning the data over space, time, or both leads to algorithmic design patterns for Spark analytics that can be applied to many different algorithms. We use NEXUS analytics as examples, including area-averaged time series, time averaged map, and correlation map.
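
    The area-averaged time series pattern mentioned above maps naturally onto a Spark spatial filter followed by a group-by over time. The sketch below is a generic PySpark illustration of that pattern, not the NEXUS API; the Parquet path, column names and averaging box are hypothetical placeholders.

      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("area-averaged-time-series").getOrCreate()

      # Hypothetical tiled dataset with columns: time, lat, lon, sst
      tiles = spark.read.parquet("s3://example-bucket/sst-tiles/")

      # Spatial subset, then a data-parallel reduction per time step
      series = (
          tiles
          .filter(F.col("lat").between(-5.0, 5.0) &
                  F.col("lon").between(-170.0, -120.0))
          .groupBy("time")
          .agg(F.avg("sst").alias("area_mean_sst"))
          .orderBy("time")
      )
      series.show(10)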

  9. Methods for calculating the speed-up characteristics of steam-water turbines

    International Nuclear Information System (INIS)

    Golovach, E.A.

    1981-01-01

    The methods of approximate and refined calculation of the speed-up characteristics of steam-water turbines are considered. The refined, non-linear method takes into account the change of thermal efficiency, heat drop and losses in the turbine, as well as vacuum break-up in the condenser. Speed-up characteristics of the K-1000-60-1500 turbine are presented. The calculational results obtained by the non-linear method are compared with the calculations conducted by the approximate, linearized method. Differences in the rotational speed-up of the turbine rotor calculated by the two methods amount to only 0.5-2.0%. Therefore, the refined calculations should take into account first of all the most important factors governing the rotor speed-up, in the following sequence: valve shift of the high-pressure cylinder (HPC); steam volume in front of the HPC; shift of the valves behind the separator-steam superheater (SSS); steam volumes and moisture boiling in the SSS; steam consumption for regenerative heating of feed water; steam volumes in the intermediate elements of the turbine; losses in the turbine; heat drop and thermal efficiency.

  10. Environment-Assisted Speed-up of the Field Evolution in Cavity Quantum Electrodynamics.

    Science.gov (United States)

    Cimmarusti, A D; Yan, Z; Patterson, B D; Corcos, L P; Orozco, L A; Deffner, S

    2015-06-12

    We measure the quantum speed of the state evolution of the field in a weakly driven optical cavity QED system. To this end, the mode of the electromagnetic field is considered as a quantum system of interest with a preferential coupling to a tunable environment: the atoms. By controlling the environment, i.e., changing the number of atoms coupled to the optical cavity mode, an environment-assisted speed-up is realized: the quantum speed of the state repopulation in the optical cavity increases with the coupling strength between the optical cavity mode and this non-Markovian environment (the number of atoms).

  11. Environment-Assisted Speed-up of the Field Evolution in Cavity Quantum Electrodynamics

    International Nuclear Information System (INIS)

    Cimmarusti, A. D.; Yan, Z.; Patterson, B. D.; Corcos, L. P.; Orozco, L. A.; Deffner, S.

    2015-01-01

    We measure the quantum speed of the state evolution of the field in a weakly-driven optical cavity QED system. To this end, the mode of the electromagnetic field is considered as a quantum system of interest with a preferential coupling to a tunable environment: the atoms. By controlling the environment, i.e., changing the number of atoms coupled to the optical cavity mode, an environment assisted speed-up is realized: the quantum speed of the state re-population in the optical cavity increases with the coupling strength between the optical cavity mode and this non-Markovian environment (the number of atoms)

  12. A method for the accelerated teaching of ski sport technique to second-year students of a higher sports institute

    Directory of Open Access Journals (Sweden)

    Sidorova T.V.

    2010-09-01

    Full Text Available A rational method is determined for the accelerated teaching of the compulsory discipline «Ski sport» to students under the credit-module system. 60 students took part in the experiment. The accelerated teaching is based on an integral-separate approach to mastering and perfecting the technique of the methods of movement on skis. An optimal ratio of lessons for teaching the technique of the classic and skating styles of movement on skis is established, taking into account the morpho-functional and physical qualities of the students.

  13. Relating two proposed methods for speedup of algorithms for fitting two- and three-way principal component and related multilinear models

    NARCIS (Netherlands)

    Kiers, Henk A.L.; Harshman, Richard A.

    Multilinear analysis methods such as component (and three-way component) analysis of very large data sets can become very computationally demanding and even infeasible unless some method is used to compress the data and/or speed up the algorithms. We discuss two previously proposed speedup methods.

  14. Climate-driven speedup of alpine treeline forest growth in the Tianshan Mountains, Northwestern China.

    Science.gov (United States)

    Qi, Zhaohuan; Liu, Hongyan; Wu, Xiuchen; Hao, Qian

    2015-02-01

    Forest growth is sensitive to interannual climatic change in the alpine treeline ecotone (ATE). Whether the alpine treeline ecotone shares a similar pattern of forest growth with lower elevational closed forest belt (CFB) under changing climate remains unclear. Here, we reported an unprecedented acceleration of Picea schrenkiana forest growth since 1960s in the ATE of Tianshan Mountains, northwestern China by a stand-total sampling along six altitudinal transects with three plots in each transect: one from the ATE between the treeline and the forest line, and the other two from the CFB. All the sampled P. schrenkiana forest patches show a higher growth speed after 1960 and, comparatively, forest growth in the CFB has sped up much slower than that in the ATE. The speedup of forest growth at the ATE is mainly accounted for by climate factors, with increasing temperature suggested to be the primary driver. Stronger water deficit as well as more competition within the CFB might have restricted forest growth there more than that within the ATE, implying biotic factors were also significant for the accelerated forest growth in the ATE, which should be excluded from simulations and predictions of warming-induced treeline dynamics. © 2014 John Wiley & Sons Ltd.

  15. Real-time quantitative phase reconstruction in off-axis digital holography using multiplexing.

    Science.gov (United States)

    Girshovitz, Pinhas; Shaked, Natan T

    2014-04-15

    We present a new approach for obtaining significant speedup in the digital processing of extracting unwrapped phase profiles from off-axis digital holograms. The new technique digitally multiplexes two orthogonal off-axis holograms, where the digital reconstruction, including spatial filtering and two-dimensional phase unwrapping on a decreased number of pixels, can be performed on both holograms together, without redundant operations. Using this technique, we were able to reconstruct, for the first time to our knowledge, unwrapped phase profiles from off-axis holograms with 1 megapixel in more than 30 frames per second using a standard single-core personal computer on a MATLAB platform, without using graphic-processing-unit programming or parallel computing. This new technique is important for real-time quantitative visualization and measurements of highly dynamic samples and is applicable for a wide range of applications, including rapid biological cell imaging and real-time nondestructive testing. After comparing the speedups obtained by the new technique for holograms of various sizes, we present experimental results of real-time quantitative phase visualization of cells flowing rapidly through a microchannel.
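
    The processing chain described, spatial filtering of an off-axis hologram followed by phase extraction and two-dimensional unwrapping, can be sketched in a few NumPy/scikit-image lines. This is a generic single-hologram illustration, not the authors' multiplexed two-hologram algorithm; the carrier location, crop radius and loading step are hypothetical placeholders.

      import numpy as np
      from numpy.fft import fft2, ifft2, fftshift, ifftshift
      from skimage.restoration import unwrap_phase

      def reconstruct_phase(hologram, carrier, radius):
          """Spatially filter one off-axis hologram and return its unwrapped phase."""
          rows, cols = hologram.shape
          spectrum = fftshift(fft2(hologram))
          rr, cc = np.ogrid[:rows, :cols]
          mask = (rr - carrier[0]) ** 2 + (cc - carrier[1]) ** 2 <= radius ** 2
          sideband = np.where(mask, spectrum, 0)
          # Move the selected cross-correlation term to the centre of the spectrum
          sideband = np.roll(sideband,
                             (rows // 2 - carrier[0], cols // 2 - carrier[1]),
                             axis=(0, 1))
          wrapped = np.angle(ifft2(ifftshift(sideband)))
          return unwrap_phase(wrapped)          # 2D phase unwrapping

      # Hypothetical usage on a 1024x1024 hologram with a carrier term near (340, 700):
      # phase = reconstruct_phase(hologram, carrier=(340, 700), radius=80)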

  16. Acceleration for 2D time-domain elastic full waveform inversion using a single GPU card

    Science.gov (United States)

    Jiang, Jinpeng; Zhu, Peimin

    2018-05-01

    Full waveform inversion (FWI) is a challenging procedure due to the high computational cost related to the modeling, especially for the elastic case. The graphics processing unit (GPU) has become a popular device for the high-performance computing (HPC). To reduce the long computation time, we design and implement the GPU-based 2D elastic FWI (EFWI) in time domain using a single GPU card. We parallelize the forward modeling and gradient calculations using the CUDA programming language. To overcome the limitation of relatively small global memory on GPU, the boundary saving strategy is exploited to reconstruct the forward wavefield. Moreover, the L-BFGS optimization method used in the inversion increases the convergence of the misfit function. A multiscale inversion strategy is performed in the workflow to obtain the accurate inversion results. In our tests, the GPU-based implementations using a single GPU device achieve >15 times speedup in forward modeling, and about 12 times speedup in gradient calculation, compared with the eight-core CPU implementations optimized by OpenMP. The test results from the GPU implementations are verified to have enough accuracy by comparing the results obtained from the CPU implementations.

  17. Who Has the Time? The Relationship between Household Labor Time and Sexual Frequency

    Science.gov (United States)

    Gager, Constance T.; Yabiku, Scott T.

    2010-01-01

    Motivated by the trend of women spending more time in paid labor and the general speedup of everyday life, the authors explore whether the resulting time crunch affects sexual frequency among married couples. Although prior research has examined the associations between relationship quality and household labor time, few have examined a dimension…

  18. Multigrid Reduction in Time for Nonlinear Parabolic Problems

    Energy Technology Data Exchange (ETDEWEB)

    Falgout, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Manteuffel, T. A. [Univ. of Colorado, Boulder, CO (United States); O' Neill, B. [Univ. of Colorado, Boulder, CO (United States); Schroder, J. B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2016-01-04

    The need for parallel-in-time is being driven by changes in computer architectures, where future speed-ups will be available through greater concurrency, but not faster clock speeds, which are stagnant. This leads to a bottleneck for sequential time marching schemes, because they lack parallelism in the time dimension. Multigrid Reduction in Time (MGRIT) is an iterative procedure that allows for temporal parallelism by utilizing multigrid reduction techniques and a multilevel hierarchy of coarse time grids. MGRIT has been shown to be effective for linear problems, with speedups of up to 50 times. The goal of this work is the efficient solution of nonlinear problems with MGRIT, where efficient is defined as achieving similar performance when compared to a corresponding linear problem. As our benchmark, we use the p-Laplacian, where p = 4 corresponds to a well-known nonlinear diffusion equation and p = 2 corresponds to our benchmark linear diffusion problem. When considering linear problems and implicit methods, the use of optimal spatial solvers such as spatial multigrid implies that the cost of one time step evaluation is fixed across temporal levels, which have a large variation in time step sizes. This is not the case for nonlinear problems, where the work required increases dramatically on coarser time grids, where relatively large time steps lead to worse conditioned nonlinear solves and increased nonlinear iteration counts per time step evaluation. This is the key difficulty explored by this paper. We show that by using a variety of strategies, most importantly, spatial coarsening and an alternate initial guess to the nonlinear time-step solver, we can reduce the work per time step evaluation over all temporal levels to a range similar to that of the corresponding linear problem. This allows for parallel scaling behavior comparable to the corresponding linear problem.
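
    MGRIT generalizes the two-level parallel-in-time idea sketched below. The sketch is a plain Parareal iteration on a scalar linear test equation, shown only to illustrate how a cheap coarse propagator and an accurate fine propagator are combined across the time dimension; it is not the MGRIT/XBraid solver used in the paper, and all problem parameters are arbitrary.

      import numpy as np

      # Test problem: y' = lam * y on [0, T], y(0) = 1 (linear diffusion analogue)
      lam, T, y0 = -1.0, 5.0, 1.0
      n_coarse, n_fine = 50, 20          # coarse intervals; fine substeps per interval

      def fine(y, dt, substeps):         # accurate propagator: many backward-Euler steps
          for _ in range(substeps):
              y = y / (1.0 - lam * dt / substeps)
          return y

      def coarse(y, dt):                 # cheap propagator: one backward-Euler step
          return y / (1.0 - lam * dt)

      dt = T / n_coarse
      u = np.empty(n_coarse + 1)
      u[0] = y0
      for n in range(n_coarse):          # initial guess: serial coarse sweep
          u[n + 1] = coarse(u[n], dt)

      for k in range(5):                 # Parareal iterations
          # The fine solves below are independent and could run in parallel in time
          fine_vals = np.array([fine(u[n], dt, n_fine) for n in range(n_coarse)])
          coarse_old = np.array([coarse(u[n], dt) for n in range(n_coarse)])
          for n in range(n_coarse):      # cheap sequential correction sweep
              u[n + 1] = coarse(u[n], dt) + fine_vals[n] - coarse_old[n]

      print("parareal y(T) =", u[-1], " exact =", np.exp(lam * T))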

  19. COMPARATIVE STUDY OF THREE LINEAR SYSTEM SOLVER APPLIED TO FAST DECOUPLED LOAD FLOW METHOD FOR CONTINGENCY ANALYSIS

    Directory of Open Access Journals (Sweden)

    Syafii

    2017-03-01

    Full Text Available This paper presents an assessment of fast decoupled load flow computation using three linear system solver schemes. The full-matrix version of the fast decoupled load flow based on the XB method is used in this study. The numerical investigations are carried out on small and large test systems. The execution times of small systems such as IEEE 14, 30, and 57 are very short, so the computation times cannot be compared for these cases. The larger cases IEEE 118, 300 and TNB 664 produced significant execution speedups. The superLU factorization sparse matrix solver has the best performance and speedup for the load flow solution as well as in contingency analysis. The inverse full-matrix solver could be run only for the IEEE 118-bus test system, in 3.715 seconds, and for the other cases took too long. In contrast, the superLU factorization linear solver solved all of the test systems, taking 7.832 seconds for the largest test system. Therefore, the superLU factorization linear solver is a viable alternative for contingency analysis.
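
    The dense-inverse versus superLU comparison can be reproduced in miniature with SciPy, whose splu routine wraps the SuperLU factorization. The sketch below times both approaches on a synthetic sparse, diagonally dominant system standing in for the load flow matrices; the matrix is a random placeholder, not a power-system case, and the timings will of course vary by machine.

      import time
      import numpy as np
      import scipy.sparse as sp
      from scipy.sparse.linalg import splu

      # Synthetic sparse, diagonally dominant system (placeholder for the B'/B'' matrices)
      n = 2000
      rng = np.random.default_rng(0)
      A = sp.random(n, n, density=0.002, format="csc", random_state=0)
      A = A + A.T + sp.diags(np.full(n, 10.0))
      b = rng.standard_normal(n)

      t0 = time.perf_counter()
      x_dense = np.linalg.inv(A.toarray()) @ b      # explicit dense inverse (slow, memory heavy)
      t_dense = time.perf_counter() - t0

      t0 = time.perf_counter()
      lu = splu(A.tocsc())                          # SuperLU factorization
      x_sparse = lu.solve(b)                        # factorization is reusable per contingency
      t_sparse = time.perf_counter() - t0

      print(f"dense inverse: {t_dense:.3f} s, superLU factorization: {t_sparse:.3f} s")
      print("max solution difference:", np.abs(x_dense - x_sparse).max())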

  20. Timing measurements of some tracking algorithms and suitability of FPGA's to improve the execution speed

    CERN Document Server

    Khomich, A; Kugel, A; Männer, R; Müller, M; Baines, J T M

    2003-01-01

    Some of the track reconstruction algorithms which are common to all B-physics channels and standard RoI processing have been tested for execution time and assessed for suitability for speed-up by using an FPGA coprocessor. The studies presented in this note were performed in the C/C++ framework, CTrig, which was the fullest set of algorithms available at the time of the study. For investigation of possible speed-up of the algorithms, the most time-consuming parts of TRT-LUT were implemented in VHDL for running in the FPGA coprocessor board MPRACE. MPRACE (Reconfigurable Accelerator / Computing Engine) is an FPGA coprocessor based on a Xilinx Virtex-2 FPGA, made as a 64-bit/66 MHz PCI card and developed at the University of Mannheim. Timing measurement results for a TRT Full Scan algorithm executed on the MPRACE are presented here as well. The measurement results show a speed-up factor of ~2 for this algorithm.

  1. Real-time computation of parameter fitting and image reconstruction using graphical processing units

    Science.gov (United States)

    Locans, Uldis; Adelmann, Andreas; Suter, Andreas; Fischer, Jannis; Lustermann, Werner; Dissertori, Günther; Wang, Qiulin

    2017-06-01

    In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of μSR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission Tomography) image reconstruction and analysis. Applications currently in use were examined to identify parts of the algorithms in need of optimization. Efficient GPU kernels were created in order to allow applications to use a GPU, to speed up the previously identified parts. Benchmarking tests were performed in order to measure the achieved speedup. During this work, we focused on single GPU systems to show that real time data analysis of these problems can be achieved without the need for large computing clusters. The results show that the currently used application for parameter fitting, which uses OpenMP to parallelize calculations over multiple CPU cores, can be accelerated around 40 times through the use of a GPU. The speedup may vary depending on the size and complexity of the problem. For PET image analysis, the obtained speedups of the GPU version were more than 40 times compared to a single-core CPU implementation. The achieved results show that it is possible to improve the execution time by orders of magnitude.

  2. Josephson comparator switching time

    Energy Technology Data Exchange (ETDEWEB)

    Herr, Quentin P; Miller, Donald L; Przybysz, John X [Northrop Grumman, Baltimore, MD (United States)

    2006-05-15

    Comparator performance can be characterized in terms of both sensitivity and decision time. Delta-sigma analogue-to-digital converters are tolerant of sensitivity errors but require short decision time due to feedback. We have analysed the Josephson comparator using the numerical solution of the Fokker-Planck equation, which describes the time evolution of the ensemble probability distribution. At balance, the result is essentially independent of temperature in the range 5-20 K. There is a very small probability, 1 × 10⁻¹⁴, that the decision time will be longer than seven single-flux-quantum pulse widths, defined as Φ₀/(I_c R_n). For junctions with a critical current density of 4.5 kA, this decision time is only 20 ps. Decision time error probability decreases rapidly with lengthening time interval, at a rate of two orders of magnitude per pulse width. We conclude that Josephson comparator performance is quite favourable for analogue-to-digital converter applications.

  3. Optimization Techniques for Dimensionally Truncated Sparse Grids on Heterogeneous Systems

    KAUST Repository

    Deftu, A.; Murarasu, A.

    2013-01-01

    and especially the similarities between our optimization strategies for the two architectures. With regard to our test case for which achieving high speedups is a "must" for real-time visualization, we report a speedup of up to 6.2x compared to the state

  4. Finite Difference Time Domain (FDTD) Simulations Using Graphics Processors

    National Research Council Canada - National Science Library

    Adams, Samuel; Payne, Jason; Boppana, Rajendra

    2007-01-01

    .... This paper shows how GPUs can be used to greatly speed up FDTD simulations. The main objective is to leverage GPU processing power for FDTD update calculations and complete computationally expensive simulations in reasonable time...
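
    The "FDTD update calculations" referred to are the leapfrog field updates of the Yee scheme, which are independent across grid cells and therefore map well onto GPUs. The NumPy sketch below shows the one-dimensional free-space update loop purely to illustrate that structure; it is CPU code, not the paper's GPU implementation, and the grid, source and Courant number are arbitrary choices.

      import numpy as np

      # 1D free-space FDTD (Yee scheme) in normalized units
      nx, nt = 400, 1000
      ez = np.zeros(nx)          # electric field
      hy = np.zeros(nx - 1)      # magnetic field, staggered half a cell
      S = 0.5                    # Courant factor (c * dt / dx)

      for n in range(nt):
          # These two vectorized updates are the per-cell kernels a GPU would parallelize
          hy += S * (ez[1:] - ez[:-1])
          ez[1:-1] += S * (hy[1:] - hy[:-1])
          ez[nx // 4] += np.exp(-((n - 60) / 20.0) ** 2)   # soft Gaussian source

      print("peak |Ez| after", nt, "steps:", np.abs(ez).max())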

  5. Speedup of minimum discontinuity phase unwrapping algorithm with a reference phase distribution

    Science.gov (United States)

    Liu, Yihang; Han, Yu; Li, Fengjiao; Zhang, Qican

    2018-06-01

    In three-dimensional (3D) shape measurement based on phase analysis, the phase analysis process usually produces a wrapped phase map ranging from -π to π with some 2π discontinuities, and thus a phase unwrapping algorithm is necessary to recover the continuous and natural phase map from which the 3D height distribution can be restored. Usually, the minimum discontinuity phase unwrapping algorithm can be used to solve many different kinds of phase unwrapping problems, but its main drawback is that it requires a large amount of computation and has low efficiency in searching for the improving loop within the phase's discontinuity area. To overcome this drawback, an improvement that speeds up the minimum discontinuity phase unwrapping algorithm by using the phase distribution on a reference plane is proposed. In this improved algorithm, before the minimum discontinuity phase unwrapping algorithm is carried out to unwrap the phase, an integer number K is calculated from the ratio of the wrapped phase to the natural phase on a reference plane. The jump counts of the unwrapped phase can then be reduced by adding 2Kπ, so the efficiency of the minimum discontinuity phase unwrapping algorithm is significantly improved. Both simulated and experimental data results verify the feasibility of the proposed improved algorithm, and both of them clearly show that the algorithm works very well and has high efficiency.
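
    The speedup idea described, pre-adding an integer multiple of 2π before the minimum discontinuity unwrapper runs, can be written in a couple of NumPy lines. The sketch below assumes an already-unwrapped reference-plane phase is available and shows only that pre-correction step; the remaining minimum discontinuity unwrapping is left as a hypothetical placeholder call, and this is one plausible reading of the K described in the abstract rather than the authors' exact formula.

      import numpy as np

      def prealign_with_reference(wrapped, reference_unwrapped):
          """Add 2*K*pi to a wrapped phase map using an unwrapped reference-plane phase.

          K is the per-pixel integer fringe order estimated from the reference, so the
          subsequent minimum discontinuity unwrapping has far fewer jumps to resolve.
          """
          k = np.round((reference_unwrapped - wrapped) / (2.0 * np.pi))
          return wrapped + 2.0 * np.pi * k

      # wrapped = measured phase in (-pi, pi]; ref = unwrapped phase of the reference plane
      # coarse = prealign_with_reference(wrapped, ref)
      # final = minimum_discontinuity_unwrap(coarse)    # hypothetical remaining step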

  6. Changing basal conditions during the speed-up of Jakobshavn Isbræ, Greenland

    Science.gov (United States)

    Habermann, M.; Truffer, M.; Maxwell, D.

    2013-11-01

    Ice-sheet outlet glaciers can undergo dynamic changes such as the rapid speed-up of Jakobshavn Isbræ following the disintegration of its floating ice tongue. These changes are associated with stress changes on the boundary of the ice mass. We invert for basal conditions from surface velocity data throughout a well-observed period of rapid change and evaluate parameterizations currently used in ice-sheet models. A Tikhonov inverse method with a shallow-shelf approximation forward model is used for diagnostic inversions for the years 1985, 2000, 2005, 2006 and 2008. Our ice-softness, model norm, and regularization parameter choices are justified using the data-model misfit metric and the L curve method. The sensitivity of the inversion results to these parameter choices is explored. We find a lowering of effective basal yield stress in the first 7 km upstream from the 2008 grounding line and no significant changes higher upstream. The temporal evolution in the fast flow area is in broad agreement with a Mohr-Coulomb parameterization of basal shear stress, but with a till friction angle much lower than has been measured for till samples. The lowering of effective basal yield stress is significant within the uncertainties of the inversion, but it cannot be ruled out that there are other significant contributors to the acceleration of the glacier.

  7. Changing basal conditions during the speed-up of Jakobshavn Isbræ, Greenland

    Directory of Open Access Journals (Sweden)

    M. Habermann

    2013-11-01

    Full Text Available Ice-sheet outlet glaciers can undergo dynamic changes such as the rapid speed-up of Jakobshavn Isbræ following the disintegration of its floating ice tongue. These changes are associated with stress changes on the boundary of the ice mass. We invert for basal conditions from surface velocity data throughout a well-observed period of rapid change and evaluate parameterizations currently used in ice-sheet models. A Tikhonov inverse method with a shallow-shelf approximation forward model is used for diagnostic inversions for the years 1985, 2000, 2005, 2006 and 2008. Our ice-softness, model norm, and regularization parameter choices are justified using the data-model misfit metric and the L curve method. The sensitivity of the inversion results to these parameter choices is explored. We find a lowering of effective basal yield stress in the first 7 km upstream from the 2008 grounding line and no significant changes higher upstream. The temporal evolution in the fast flow area is in broad agreement with a Mohr–Coulomb parameterization of basal shear stress, but with a till friction angle much lower than has been measured for till samples. The lowering of effective basal yield stress is significant within the uncertainties of the inversion, but it cannot be ruled out that there are other significant contributors to the acceleration of the glacier.

  8. Design Flow Instantiation for Run-Time Reconfigurable Systems: A Case Study

    Directory of Open Access Journals (Sweden)

    Yang Qu

    2007-12-01

    Full Text Available Reconfigurable systems are a promising alternative for delivering both flexibility and performance at the same time. New reconfigurable technologies and technology-dependent tools have been developed, but a complete overview of the whole design flow for run-time reconfigurable systems is missing. In this work, we present a design flow instantiation for such systems using a real-life application. The design flow is roughly divided into two parts: system level and implementation. At the system level, our support for hardware resource estimation and performance evaluation is applied. At the implementation level, technology-dependent tools are used to realize the run-time reconfiguration. The design case is part of a WCDMA decoder on a commercially available reconfigurable platform. The results show that using run-time reconfiguration can save over 40% area when compared to a functionally equivalent fixed system and achieve a 30 times speedup in processing time when compared to a functionally equivalent pure software design.

  9. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    International Nuclear Information System (INIS)

    Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

    2013-01-01

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i − f(x_{i−1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H_2O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a
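
    A toy sketch of the root-finding formulation quoted above, with a velocity-Verlet step of a harmonic oscillator standing in for the expensive MD/AIMD propagator and SciPy's Newton-Krylov solver standing in for the paper's quasi-Newton schemes; all names and parameters are illustrative assumptions. The whole trajectory is stacked into X and F(X) = [x_i − f(x_{i−1})] is driven to zero; in a parallel run each residual entry would be evaluated by its own processor.

    ```python
    import numpy as np
    from scipy.optimize import newton_krylov

    DT, OMEGA2 = 0.05, 1.0   # step size and (angular frequency)^2 of the toy model

    def step(x):
        """One velocity-Verlet step of a 1D harmonic oscillator, x = [r, v];
        a stand-in for the expensive forward integrator f of the paper."""
        r, v = x
        a = -OMEGA2 * r
        r_new = r + DT * v + 0.5 * DT * DT * a
        v_new = v + 0.5 * DT * (a - OMEGA2 * r_new)
        return np.array([r_new, v_new])

    M = 100                           # number of time steps solved simultaneously
    x0 = np.array([1.0, 0.0])         # initial condition

    def residual(X):
        """F(X)_i = x_i - f(x_{i-1}); given the previous iterate, each entry is
        independent, which is what the paper parallelizes over."""
        X = X.reshape(M, 2)
        prev = np.vstack([x0, X[:-1]])
        return (X - np.array([step(p) for p in prev])).ravel()

    # Crude initial guess: the trajectory frozen at the initial condition
    # (the paper instead uses cheap coarse/simplified models as preconditioners).
    guess = np.tile(x0, (M, 1)).ravel()
    trajectory = newton_krylov(residual, guess, f_tol=1e-8).reshape(M, 2)
    ```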

  10. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations.

    Science.gov (United States)

    Bylaska, Eric J; Weare, Jonathan Q; Weare, John H

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i − f(x_{i−1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H_2O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a

  11. High Resolution Near Real Time Image Processing and Support for MSSS Modernization

    Science.gov (United States)

    Duncan, R. B.; Sabol, C.; Borelli, K.; Spetka, S.; Addison, J.; Mallo, A.; Farnsworth, B.; Viloria, R.

    2012-09-01

    This paper describes image enhancement software applications engineering development work that has been performed in support of Maui Space Surveillance System (MSSS) Modernization. It also covers R&D and transition activity performed over the past few years with the objective of providing increased space situational awareness (SSA) capabilities, including Air Force Research Laboratory (AFRL) use of an FY10 Dedicated High Performance Investment (DHPI) cluster award and our selection and planned use of an FY12 DHPI award. We provide an introduction to image processing of electro-optical (EO) telescope sensor data and an overview of high-resolution image enhancement and near real-time processing status. We then describe recent image enhancement applications development and support for MSSS Modernization and results to date, and end with a discussion of desired future development work and conclusions. Significant improvements to image processing enhancement have been realized over the past several years, including a key application that has realized more than a 10,000-times speedup compared to the original R&D code, and a greater than 72-times speedup over the past few years. The latest version of this code maintains software efficiency for post-mission processing while providing optimization for image processing of data from a new EO sensor at MSSS. Additional work has also been performed to develop low-latency, near real-time processing of data collected by the ground-based sensor during overhead passes of space objects.

  12. An explicit multi-time-stepping algorithm for aerodynamic flows

    OpenAIRE

    Niemann-Tuitman, B.E.; Veldman, A.E.P.

    1997-01-01

    An explicit multi-time-stepping algorithm with applications to aerodynamic flows is presented. In the algorithm, different time steps are taken in different parts of the computational domain, and the flow is synchronized at so-called synchronization levels. The algorithm is validated for aerodynamic turbulent flows. For two-dimensional flows, speedups on the order of five with respect to single time stepping are obtained.

  13. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

    Science.gov (United States)

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-08-01

    RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n^6). Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.

  14. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    Energy Technology Data Exchange (ETDEWEB)

    Bylaska, Eric J., E-mail: Eric.Bylaska@pnnl.gov [Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, P.O. Box 999, Richland, Washington 99352 (United States); Weare, Jonathan Q., E-mail: weare@uchicago.edu [Department of Mathematics, University of Chicago, Chicago, Illinois 60637 (United States); Weare, John H., E-mail: jweare@ucsd.edu [Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093 (United States)

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i − f(x_{i−1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H_2O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time)) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up

  15. Time and Outcome Framing in Intertemporal Tradeoffs

    Science.gov (United States)

    Scholten, Marc; Read, Daniel

    2013-01-01

    A robust anomaly in intertemporal choice is the delay-speedup asymmetry: Receipts are discounted more, and payments are discounted less, when delayed than when expedited over the same interval. We developed 2 versions of the tradeoff model (Scholten & Read, 2010) to address such situations, in which an outcome is expected at a given time but…

  16. A multi-GPU real-time dose simulation software framework for lung radiotherapy.

    Science.gov (United States)

    Santhanam, A P; Min, Y; Neelakkantan, H; Papp, N; Meeks, S L; Kupelian, P A

    2012-09-01

    Medical simulation frameworks facilitate both the preoperative and postoperative analysis of the patient's pathophysical condition. Of particular importance is the simulation of radiation dose delivery for real-time radiotherapy monitoring and retrospective analyses of the patient's treatment. In this paper, a software framework tailored for the development of simulation-based real-time radiation dose monitoring medical applications is discussed. A multi-GPU-based computational framework coupled with inter-process communication methods is introduced for simulating the radiation dose delivery on a deformable 3D volumetric lung model and its real-time visualization. The model deformation and the corresponding dose calculation are allocated among the GPUs in a task-specific manner and performed in a pipelined fashion. Radiation dose calculations are computed on two different GPU hardware architectures. The integration of this computational framework with a front-end software layer and a back-end patient database repository is also discussed. Real-time simulation of the dose delivered is achieved once every 120 ms using the proposed framework. With a linear increase in the number of GPU cores, the computational time of the simulation decreased linearly. The inter-process communication time also improved with an increase in the hardware memory. Variations in the delivered dose and computational speedup for variations in the data dimensions are investigated using D70 and D90 as well as gEUD as metrics for a set of 14 patients. Computational speed-up increased with an increase in the beam dimensions when compared with a CPU-based commercial software, while the error in the dose calculation was … The proposed GPU-based, deformable lung model-based radiotherapy simulation is an effective tool for performing both real-time and retrospective analyses.

  17. Parareal in Time for Dynamic Simulations of Power Systems

    Energy Technology Data Exchange (ETDEWEB)

    Gurrala, Gurunath [ORNL; Dimitrovski, Aleksandar D [ORNL; Pannala, Sreekanth [ORNL; Simunovic, Srdjan [ORNL; Starke, Michael R [ORNL

    2015-01-01

    In recent years, there have been significant developments in parallel algorithms and high performance parallel computing platforms. The parareal in time algorithm has become popular for long transient simulations (e.g., molecular dynamics, fusion, reacting flows). Parareal is a parallel algorithm which divides the time interval into sub-intervals and solves them concurrently. This paper investigates the applicability of the parareal algorithm to power system dynamic simulations. Preliminary results on the application of parareal to multi-machine power systems are reported. Two widely used test systems, the WECC 3-generator 9-bus system and the New England 10-generator 39-bus system, are used to explore the effectiveness of parareal. Severe 3-phase bus faults are simulated using both the classical and detailed models of multi-machine power systems. An actual speedup of 5-7 times is observed assuming ideal parallelization. It has been observed that speedup factors on the order of 20 can be achieved by using fast coarse approximations of power system models. A dependency of parareal convergence on fault duration and location has been observed.
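
    For reference, a compact sketch of the generic parareal iteration investigated here, using a damped pendulum as a stand-in for a single-machine swing equation (the multi-machine WECC/New England models and their solvers are not reproduced; all constants below are illustrative). The fine propagations of all sub-intervals are independent, which is where the reported 5-7x (and up to ~20x with fast coarse models) speedups come from; only the cheap coarse sweep stays serial.

    ```python
    import numpy as np

    def rhs(x):
        """Damped-pendulum dynamics, a toy stand-in for a swing equation."""
        delta, omega = x
        return np.array([omega, 1.0 - 0.8 * np.sin(delta) - 0.1 * omega])

    def coarse(x, dT):                    # cheap propagator G: 4 Euler steps
        for _ in range(4):
            x = x + (dT / 4) * rhs(x)
        return x

    def fine(x, dT):                      # accurate propagator F: 200 RK2 steps
        h = dT / 200
        for _ in range(200):
            k1 = rhs(x); k2 = rhs(x + h * k1)
            x = x + 0.5 * h * (k1 + k2)
        return x

    T, N, K = 10.0, 20, 5                 # horizon, sub-intervals, parareal iterations
    dT, x0 = T / N, np.array([0.3, 0.0])

    U = [x0]                              # initial serial coarse sweep
    for n in range(N):
        U.append(coarse(U[-1], dT))

    for _ in range(K):
        F_vals = [fine(U[n], dT) for n in range(N)]   # embarrassingly parallel part
        new_U = [x0]
        for n in range(N):                            # serial correction sweep
            new_U.append(coarse(new_U[-1], dT) + F_vals[n] - coarse(U[n], dT))
        U = new_U                                     # converges to the fine solution
    ```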

  18. On the Feasibility and Limitations of Just-in-Time Instruction Set Extension for FPGA-Based Reconfigurable Processors

    Directory of Open Access Journals (Sweden)

    Mariusz Grad

    2012-01-01

    Full Text Available Reconfigurable instruction set processors provide the possibility of tailoring the instruction set of a CPU to a particular application. While this customization process could be performed during runtime in order to adapt the CPU to the currently executed workload, this use case has hardly been investigated. In this paper, we study the feasibility of moving the customization process to runtime and evaluate the relation between the expected speedups and the associated overheads. To this end, we present a tool flow that is tailored to the requirements of this just-in-time ASIP specialization scenario. We evaluate our methods by targeting our previously introduced Woolcano reconfigurable ASIP architecture for a set of applications from the SPEC2006, SPEC2000, MiBench, and SciMark2 benchmark suites. Our results show that just-in-time ASIP specialization is promising for embedded computing applications, where average speedups of 5x can be achieved by spending 50 minutes on custom instruction identification and hardware generation. These overheads will be compensated if the applications execute for more than 2 hours. For the scientific computing benchmarks, the achievable speedup is only 1.2x, which requires significant execution times in the order of days to amortize the overheads.

  19. Development of volume rendering module for real-time visualization system

    International Nuclear Information System (INIS)

    Otani, Takayuki; Muramatsu, Kazuhiro

    2000-03-01

    Volume rendering is a method for visualizing the distribution of physical quantities in three-dimensional space from any viewpoint by tracing rays toward an ordinary two-dimensional display. By producing translucent images, it can present interior as well as surface information, and it is therefore regarded as a very useful and important means of analyzing the results of scientific computations, although it unfortunately has the disadvantage of requiring a large amount of computing time. This report describes the algorithm and performance of the volume rendering software which was developed as an important functional module in the real-time visualization system PATRAS. This module can directly visualize computed results on a BFC grid. Moreover, speed-up of some parts of the software has already been realized through a newly developed heuristic technique. This report also includes an investigation of speeding up the software by parallel processing. (author)

  20. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

    Science.gov (United States)

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-01-01

    Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n^6). Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465

  1. Porting of the transfer-matrix method for multilayer thin-film computations on graphics processing units

    Science.gov (United States)

    Limmer, Steffen; Fey, Dietmar

    2013-07-01

    Thin-film computations are often a time-consuming task during optical design. An efficient way to accelerate these computations with the help of graphics processing units (GPUs) is described, and significant speed-ups can be achieved. We investigate the circumstances under which the best speed-up values can be expected by comparing different GPUs among themselves and with a modern CPU. Furthermore, the effect of thickness modulation on the speed-up, and the runtime behavior depending on the input data, are examined.
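
    The per-wavelength kernel that such a GPU port accelerates is the standard characteristic-matrix recurrence; a CPU-side NumPy sketch at normal incidence is shown below, with a made-up quarter-wave stack (the layer indices, thicknesses, and wavelength grid are illustrative assumptions, not data from the paper).

    ```python
    import numpy as np

    def stack_reflectance(n_layers, d_layers, n_in, n_sub, wavelengths):
        """Normal-incidence reflectance of a thin-film stack via the
        characteristic (transfer) matrix method."""
        R = np.empty(len(wavelengths))
        for i, lam in enumerate(wavelengths):
            M = np.eye(2, dtype=complex)
            for n, d in zip(n_layers, d_layers):
                delta = 2.0 * np.pi * n * d / lam              # phase thickness
                M = M @ np.array([[np.cos(delta), 1j * np.sin(delta) / n],
                                  [1j * n * np.sin(delta), np.cos(delta)]])
            B, C = M @ np.array([1.0, n_sub])                  # stack admittance terms
            r = (n_in * B - C) / (n_in * B + C)                # amplitude reflectance
            R[i] = abs(r) ** 2
        return R

    # Hypothetical 8-layer quarter-wave high/low-index stack on glass.
    lams = np.linspace(400e-9, 800e-9, 401)
    n_hi, n_lo, lam0 = 2.35, 1.46, 550e-9
    n_layers = [n_hi, n_lo] * 4
    d_layers = [lam0 / (4.0 * n) for n in n_layers]
    R = stack_reflectance(n_layers, d_layers, n_in=1.0, n_sub=1.52, wavelengths=lams)
    ```

    Each wavelength (and each candidate design during optimization) is independent of the others, which is what makes this computation map well onto a GPU.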

  2. Rigid Body Sampling and Individual Time Stepping for Rigid-Fluid Coupling of Fluid Simulation

    Directory of Open Access Journals (Sweden)

    Xiaokun Wang

    2017-01-01

    Full Text Available In this paper, we propose an efficient and simple rigid-fluid coupling scheme with scientific programming algorithms for particle-based fluid simulation and three-dimensional visualization. Our approach samples the surface of rigid bodies with boundary particles that interact with fluids. It contains two procedures, surface sampling and sampling relaxation, which ensure a uniform distribution of particles with fewer iterations. Furthermore, we present a rigid-fluid coupling scheme integrating individual time stepping into rigid-fluid coupling, which gains an obvious speedup compared to the previous method. The experimental results demonstrate the effectiveness of our approach.

  3. A real-time hybrid neuron network for highly parallel cognitive systems.

    Science.gov (United States)

    Christiaanse, Gerrit Jan; Zjajo, Amir; Galuzzi, Carlo; van Leuken, Rene

    2016-08-01

    For a comprehensive understanding of how neurons communicate with each other, new tools need to be developed that can accurately mimic the behaviour of such neurons and neuron networks under 'real-time' constraints. In this paper, we propose an easily customisable, highly pipelined, neuron network design, which executes optimally scheduled floating-point operations for the maximal number of biophysically plausible neurons per FPGA family type. To reduce the required amount of resources without an adverse effect on the calculation latency, a single exponent instance is used for multiple neuron calculation operations. Experimental results indicate that the proposed network design allows the simulation of up to 1188 neurons on a Virtex7 (XC7VX550T) device in brain real-time, yielding a speed-up of 12.4x compared to the state of the art.

  4. Achieving Performance Speed-up in FPGA Based Bit-Parallel Multipliers using Embedded Primitive and Macro support

    Directory of Open Access Journals (Sweden)

    Burhan Khurshid

    2015-05-01

    Full Text Available Modern Field Programmable Gate Arrays (FPGAs) are fast moving into the consumer market, and their domain has expanded from prototype design to low- and medium-volume production. FPGAs are proving to be an attractive replacement for Application Specific Integrated Circuits (ASICs), primarily because of the low Non-recurring Engineering (NRE) costs associated with FPGA platforms. This has prompted FPGA vendors to improve the capacity and flexibility of the underlying primitive fabric and to include specialized macro support and intellectual property (IP) cores in their offerings. However, most of the work related to FPGA implementations does not take full advantage of these offerings. This is primarily because designers rely mainly on technology-independent optimization to enhance the performance of the system and completely neglect the speed-up that is achievable using these embedded primitives and macro support. In this paper, we consider the technology-dependent optimization of fixed-point bit-parallel multipliers by carrying out their implementations using the embedded primitives and macro support that are inherent in modern-day FPGAs. Our implementation targets three different FPGA families, viz. Spartan-6, Virtex-4 and Virtex-5. The implementation results indicate that a considerable speed-up in performance is achievable using these embedded FPGA resources.

  5. Plate Speed-up and Deceleration during Continental Rifting: Insights from Global 2D Mantle Convection Models.

    Science.gov (United States)

    Brune, S.; Ulvrova, M.; Williams, S.

    2017-12-01

    The surface of the Earth is divided into a jigsaw of tectonic plates, some carrying continents that disperse and aggregate through time, forming transient supercontinents like Pangea and Rodinia. Here, we study continental rifting using large-scale numerical simulations with self-consistent evolution of plate boundaries, where continental break-up emerges spontaneously due to slab pull, basal drag and trench suction forces. We use the StagYY convection code employing a visco-plastic rheology in a spherical annulus geometry. We consider an incompressible mantle under the Boussinesq approximation that is basally and internally heated. We show that continental separation follows a characteristic evolution with three distinctive phases: (1) A pre-rift phase that typically lasts for several hundreds of millions of years, with tectonic quiescence in the suture and extensional stresses that slowly build up. (2) A rift phase that further divides into a slow rift period of several tens of millions of years, where stresses continuously increase, followed by a rift acceleration period featuring an abrupt stress drop within several millions of years. The speed-up takes place before lithospheric break-up and therefore affects the structural architecture of the rifted margins. (3) The drifting phase, with initially high divergence rates, persists over tens of millions of years until the system adjusts to new conditions and the spreading typically slows down. By illustrating the geodynamic connection between subduction dynamics and rift evolution, our results allow new interpretations of plate tectonic reconstructions. Rift acceleration within the second phase of rifting is compensated by enhanced convergence rates at subduction zones. This model outcome predicts enhanced subduction velocities, e.g. between North America and the Farallon plate during Central Atlantic rifting 200 My ago, or closure of potential back-arc basins such as in the proto-Andean ranges of South America

  6. Multiple time step integrators in ab initio molecular dynamics

    International Nuclear Information System (INIS)

    Luehr, Nathan; Martínez, Todd J.; Markland, Thomas E.

    2014-01-01

    Multiple time-scale algorithms exploit the natural separation of time-scales in chemical systems to greatly accelerate the efficiency of molecular dynamics simulations. Although the utility of these methods in systems where the interactions are described by empirical potentials is now well established, their application to ab initio molecular dynamics calculations has been limited by difficulties associated with splitting the ab initio potential into fast and slowly varying components. Here we present two schemes that enable efficient time-scale separation in ab initio calculations: one based on fragment decomposition and the other on range separation of the Coulomb operator in the electronic Hamiltonian. We demonstrate for both water clusters and a solvated hydroxide ion that multiple time-scale molecular dynamics allows for outer time steps of 2.5 fs, which are as large as those obtained when such schemes are applied to empirical potentials, while still allowing for bonds to be broken and reformed throughout the dynamics. This permits computational speedups of up to 4.4x, compared to standard Born-Oppenheimer ab initio molecular dynamics with a 0.5 fs time step, while maintaining the same energy conservation and accuracy
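
    A minimal r-RESPA-style sketch of the inner/outer splitting the abstract describes, on a 1D toy model with a stiff "fast" force and a weak "slow" force (the paper's fragment-decomposition and range-separated splittings of the ab initio potential are not reproduced; the masses, force constants, and step sizes below are arbitrary).

    ```python
    def f_fast(x):                    # stiff harmonic force: needs a small step
        return -400.0 * x

    def f_slow(x):                    # weak anharmonic force: tolerates a big step
        return -0.5 * x ** 3

    def respa_step(x, v, dt_outer, n_inner):
        """One outer step: slow-force half kick, n_inner velocity-Verlet
        sub-steps driven by the fast force only, then another slow half kick."""
        dt = dt_outer / n_inner
        v += 0.5 * dt_outer * f_slow(x)          # slow half kick (expensive force)
        for _ in range(n_inner):                 # fast inner loop (cheap force)
            v += 0.5 * dt * f_fast(x)
            x += dt * v
            v += 0.5 * dt * f_fast(x)
        v += 0.5 * dt_outer * f_slow(x)          # slow half kick
        return x, v

    x, v = 1.0, 0.0
    for _ in range(1000):
        # The slow (expensive) force is evaluated 5x less often than the fast
        # one, which is the source of the reported speedups.
        x, v = respa_step(x, v, dt_outer=0.05, n_inner=5)
    ```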

  7. Time and outcome framing in intertemporal tradeoffs.

    Science.gov (United States)

    Scholten, Marc; Read, Daniel

    2013-07-01

    A robust anomaly in intertemporal choice is the delay-speedup asymmetry: Receipts are discounted more, and payments are discounted less, when delayed than when expedited over the same interval. We developed 2 versions of the tradeoff model (Scholten & Read, 2010) to address such situations, in which an outcome is expected at a given time but then its timing is changed. The outcome framing model generalizes the approach taken by the hyperbolic discounting model (Loewenstein & Prelec, 1992): Not obtaining a positive outcome when expected is a worse than expected state, to which people are over-responsive, or hypersensitive, and not incurring a negative outcome when expected is a better than expected state, to which people are under-responsive, or hyposensitive. The time framing model takes a new approach: Delaying a positive outcome or speeding up a negative one involves a loss of time to which people are hypersensitive, and speeding up a positive outcome or delaying a negative one involves a gain of time to which people are hyposensitive. We compare the models on their quantitative predictions of indifference data from matching and preference data from choice. The time framing model systematically outperforms the outcome framing model. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  8. Distributed Algorithms for Time Optimal Reachability Analysis

    DEFF Research Database (Denmark)

    Zhang, Zhengkui; Nielsen, Brian; Larsen, Kim Guldstrand

    2016-01-01

    Time optimal reachability analysis is a novel model-based technique for solving scheduling and planning problems. After modeling them as reachability problems using timed automata, a real-time model checker can compute the fastest trace to the goal states, which constitutes a time optimal schedule. We propose distributed computing to accelerate time optimal reachability analysis. We develop five distributed state exploration algorithms and implement them in Uppaal, enabling it to exploit the compute resources of a dedicated model-checking cluster. We experimentally evaluate the implemented algorithms with four models in terms of their ability to compute near- or proven-optimal solutions, their scalability, time and memory consumption, and communication overhead. Our results show that the distributed algorithms work much faster than sequential algorithms and have good speedup in general.

  9. Comparing an FPGA to a Cell for an Image Processing Application

    Science.gov (United States)

    Rakvic, Ryan N.; Ngo, Hau; Broussard, Randy P.; Ives, Robert W.

    2010-12-01

    Modern advancements in configurable hardware, most notably Field-Programmable Gate Arrays (FPGAs), have provided an exciting opportunity to discover the parallel nature of modern image processing algorithms. On the other hand, PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high performance. In this research project, our aim is to study the differences in performance of a modern image processing algorithm on these two hardware platforms. In particular, Iris Recognition Systems have recently become an attractive identification method because of their extremely high accuracy. Iris matching, a repeatedly executed portion of a modern iris recognition algorithm, is parallelized on an FPGA system and a Cell processor. We demonstrate a 2.5 times speedup of the parallelized algorithm on the FPGA system when compared to a Cell processor-based version.

  10. Comparing an FPGA to a Cell for an Image Processing Application

    Directory of Open Access Journals (Sweden)

    Robert W. Ives

    2010-01-01

    Full Text Available Modern advancements in configurable hardware, most notably Field-Programmable Gate Arrays (FPGAs), have provided an exciting opportunity to discover the parallel nature of modern image processing algorithms. On the other hand, PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high performance. In this research project, our aim is to study the differences in performance of a modern image processing algorithm on these two hardware platforms. In particular, Iris Recognition Systems have recently become an attractive identification method because of their extremely high accuracy. Iris matching, a repeatedly executed portion of a modern iris recognition algorithm, is parallelized on an FPGA system and a Cell processor. We demonstrate a 2.5 times speedup of the parallelized algorithm on the FPGA system when compared to a Cell processor-based version.

  11. Multi-GPU based acceleration of a list-mode DRAMA toward real-time OpenPET imaging

    Energy Technology Data Exchange (ETDEWEB)

    Kinouchi, Shoko [Chiba Univ. (Japan); National Institute of Radiological Sciences, Chiba (Japan); Yamaya, Taiga; Yoshida, Eiji; Tashima, Hideaki [National Institute of Radiological Sciences, Chiba (Japan); Kudo, Hiroyuki [Tsukuba Univ., Ibaraki (Japan); Suga, Mikio [Chiba Univ. (Japan)

    2011-07-01

    OpenPET, which has a physical gap between two detector rings, is our new PET geometry. In order to realize future radiation therapy guided by OpenPET, real-time imaging is required. Therefore we developed a list-mode image reconstruction method using general purpose graphics processing units (GPUs). For GPU implementation, the efficiency of acceleration depends on the implementation method, which must avoid conditional statements. Therefore, in our previous study, we developed a new system model which was suited to GPU implementation. In this paper, we implemented our image reconstruction method using 4 GPUs to obtain further acceleration. We applied the developed reconstruction method to a small OpenPET prototype. Using 4 GPUs, the total iteration was completed 3.4 times faster than with a single GPU. Compared to using a single CPU, we achieved a reconstruction-time speed-up of 142 times using 4 GPUs. (orig.)

  12. How emotions change time

    Directory of Open Access Journals (Sweden)

    Annett eSchirmer

    2011-10-01

    Full Text Available Experimental evidence suggests that emotions can both speed up and slow down the internal clock. Speeding up has been observed for to-be-timed emotional stimuli that have the capacity to sustain attention, whereas slowing down has been observed for to-be-timed neutral stimuli that are presented in the context of emotional distractors. These effects have been explained by mechanisms that involve changes in bodily arousal, attention or sentience. A review of these mechanisms suggests both merits and difficulties in the explanation of the emotion-timing link. Therefore, a hybrid mechanism involving stimulus-specific sentient representations is proposed as a candidate for mediating emotional influences on time. According to this proposal, emotional events enhance sentient representations, which in turn support temporal estimates. Emotional stimuli with a larger share in one's sentience are then perceived as longer than neutral stimuli with a smaller share.

  13. Explicit time integration of finite element models on a vectorized, concurrent computer with shared memory

    Science.gov (United States)

    Gilbertsen, Noreen D.; Belytschko, Ted

    1990-01-01

    The implementation of a nonlinear explicit program on a vectorized, concurrent computer with shared memory is described and studied. The conflict between vectorization and concurrency is described and some guidelines are given for optimal block sizes. Several example problems are summarized to illustrate the types of speed-ups which can be achieved by reprogramming as compared to compiler optimization.

  14. OpenMP GNU and Intel Fortran programs for solving the time-dependent Gross-Pitaevskii equation

    Science.gov (United States)

    Young-S., Luis E.; Muruganandam, Paulsamy; Adhikari, Sadhan K.; Lončar, Vladimir; Vudragović, Dušan; Balaž, Antun

    2017-11-01

    We present Open Multi-Processing (OpenMP) version of Fortran 90 programs for solving the Gross-Pitaevskii (GP) equation for a Bose-Einstein condensate in one, two, and three spatial dimensions, optimized for use with GNU and Intel compilers. We use the split-step Crank-Nicolson algorithm for imaginary- and real-time propagation, which enables efficient calculation of stationary and non-stationary solutions, respectively. The present OpenMP programs are designed for computers with multi-core processors and optimized for compiling with both commercially-licensed Intel Fortran and popular free open-source GNU Fortran compiler. The programs are easy to use and are elaborated with helpful comments for the users. All input parameters are listed at the beginning of each program. Different output files provide physical quantities such as energy, chemical potential, root-mean-square sizes, densities, etc. We also present speedup test results for new versions of the programs. Program files doi:http://dx.doi.org/10.17632/y8zk3jgn84.2 Licensing provisions: Apache License 2.0 Programming language: OpenMP GNU and Intel Fortran 90. Computer: Any multi-core personal computer or workstation with the appropriate OpenMP-capable Fortran compiler installed. Number of processors used: All available CPU cores on the executing computer. Journal reference of previous version: Comput. Phys. Commun. 180 (2009) 1888; ibid.204 (2016) 209. Does the new version supersede the previous version?: Not completely. It does supersede previous Fortran programs from both references above, but not OpenMP C programs from Comput. Phys. Commun. 204 (2016) 209. Nature of problem: The present Open Multi-Processing (OpenMP) Fortran programs, optimized for use with commercially-licensed Intel Fortran and free open-source GNU Fortran compilers, solve the time-dependent nonlinear partial differential (GP) equation for a trapped Bose-Einstein condensate in one (1d), two (2d), and three (3d) spatial dimensions for
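
    To illustrate the imaginary-time propagation these programs perform, here is a compact NumPy sketch for the 1D GP ground state in a harmonic trap; it uses a Fourier kinetic step purely to keep the example short, whereas the published Fortran/OpenMP programs use the split-step Crank-Nicolson scheme, and the grid, time step, and nonlinearity g below are arbitrary assumptions.

    ```python
    import numpy as np

    N, L, g, dt = 512, 20.0, 10.0, 1e-3
    x = np.linspace(-L / 2, L / 2, N, endpoint=False)
    dx = L / N
    k = 2.0 * np.pi * np.fft.fftfreq(N, d=dx)
    V = 0.5 * x ** 2                              # harmonic trap

    psi = np.exp(-x ** 2).astype(complex)         # arbitrary initial guess
    kin = np.exp(-0.5 * dt * k ** 2)              # imaginary-time kinetic factor

    for _ in range(20000):
        psi *= np.exp(-0.5 * dt * (V + g * np.abs(psi) ** 2))  # half potential step
        psi = np.fft.ifft(kin * np.fft.fft(psi))               # full kinetic step
        psi *= np.exp(-0.5 * dt * (V + g * np.abs(psi) ** 2))  # half potential step
        psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)          # renormalize

    # psi now approximates the condensate ground state; the chemical potential
    # is the expectation value of the GP Hamiltonian in this state.
    mu = np.real(np.sum(np.conj(psi) * (np.fft.ifft(0.5 * k ** 2 * np.fft.fft(psi))
          + (V + g * np.abs(psi) ** 2) * psi)) * dx)
    ```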

  15. Quantum random-walk search algorithm

    International Nuclear Information System (INIS)

    Shenvi, Neil; Whaley, K. Birgitta; Kempe, Julia

    2003-01-01

    Quantum random walks on graphs have been shown to display many interesting properties, including exponentially fast hitting times when compared with their classical counterparts. However, it is still unclear how to use these novel properties to gain an algorithmic speedup over classical algorithms. In this paper, we present a quantum search algorithm based on the quantum random-walk architecture that provides such a speedup. It will be shown that this algorithm performs an oracle search on a database of N items with O(√(N)) calls to the oracle, yielding a speedup similar to other quantum search algorithms. It appears that the quantum random-walk formulation has considerable flexibility, presenting interesting opportunities for development of other, possibly novel quantum algorithms
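
    For a sense of scale of the O(√N) oracle complexity quoted above, the short comparison below uses the Grover-style (π/4)√N prefactor as a representative constant (the walk-based algorithm's constant differs, but the scaling is the same) against the roughly N/2 expected queries of a classical linear scan.

    ```python
    import math

    for N in (10 ** 4, 10 ** 6, 10 ** 8):
        quantum = math.ceil((math.pi / 4.0) * math.sqrt(N))   # representative O(sqrt(N)) count
        classical = N // 2                                    # expected classical queries
        print(f"N = {N:>11,}:  ~{quantum:>7,} quantum oracle calls  "
              f"vs  ~{classical:>11,} classical  ({classical / quantum:,.0f}x fewer)")
    ```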

  16. Confabulation Based Real-time Anomaly Detection for Wide-area Surveillance Using Heterogeneous High Performance Computing Architecture

    Science.gov (United States)

    2015-06-01

    Contract FA8750-12-1-0251. ...processors including graphic processor units (GPUs) and Intel Xeon Phi processors. Experimental results showed significant speedups, which can enable

  17. An Internal Data Non-hiding Type Real-time Kernel and its Application to the Mechatronics Controller

    Science.gov (United States)

    Yoshida, Toshio

    For mechatronics equipment controllers that control robots and machine tools, high-speed motion control processing is essential. Like other embedded systems, the software system of the controller is composed of three layers: a real-time kernel layer, a middleware layer, and an application software layer on dedicated hardware. The application layer at the top is composed of many tasks, and the application function of the system is realized by cooperation between these tasks. In this paper we propose an internal-data non-hiding type real-time kernel in which the task control can be customized only by changing the program code on the task side, without any changes to the program code of the real-time kernel. Reducing the overhead caused by real-time kernel task control is necessary to speed up the motion control of mechatronics equipment, and this requires customizing the task control function. We developed the internal-data non-hiding type real-time kernel ZRK to evaluate this method and applied it to the control of a multi-system automatic lathe. The speed-up of the task cooperation processing was confirmed by combined task control processing in the task-side program code using ZRK.

  18. Accuracy versus run time in an adiabatic quantum search

    International Nuclear Information System (INIS)

    Rezakhani, A. T.; Pimachev, A. K.; Lidar, D. A.

    2010-01-01

    Adiabatic quantum algorithms are characterized by their run time and accuracy. The relation between the two is essential for quantifying adiabatic algorithmic performance yet is often poorly understood. We study the dynamics of a continuous time, adiabatic quantum search algorithm and find rigorous results relating the accuracy and the run time. Proceeding with estimates, we show that under fairly general circumstances the adiabatic algorithmic error exhibits a behavior with two discernible regimes: The error decreases exponentially for short times and then decreases polynomially for longer times. We show that the well-known quadratic speedup over classical search is associated only with the exponential error regime. We illustrate the results through examples of evolution paths derived by minimization of the adiabatic error. We also discuss specific strategies for controlling the adiabatic error and run time.

  19. Compositional data types

    DEFF Research Database (Denmark)

    Bahr, Patrick; Hvitved, Tom

    2011-01-01

    …short-cut fusion style deforestation, which yields considerable speedups. We demonstrate our framework in the setting of compiler construction; moreover, we compare compositional data types with generic programming techniques and show that both are comparable in run-time performance and expressivity, while our

  20. An FPGA Architecture for Extracting Real-Time Zernike Coefficients from Measured Phase Gradients

    Science.gov (United States)

    Moser, Steven; Lee, Peter; Podoleanu, Adrian

    2015-04-01

    Zernike modes are commonly used in adaptive optics systems to represent optical wavefronts. However, real-time calculation of Zernike modes is time consuming due to two factors: the large factorial components in the radial polynomials used to define them, and the large inverse matrix calculation needed for the linear fit. This paper presents an efficient parallel method for calculating Zernike coefficients from phase gradients produced by a Shack-Hartmann sensor, and its real-time implementation on an FPGA by pre-calculation and storage of subsections of the large inverse matrix. The architecture exploits symmetries within the Zernike modes to achieve a significant reduction in memory requirements and a speed-up of 2.9 when compared to published results utilising a 2D-FFT method for a grid size of 8×8. Analysis of processing-element internal word length requirements shows that 24-bit precision in pre-calculated values of the Zernike mode partial derivatives ensures less than 0.5% error per Zernike coefficient and an overall error of …; RAM usage is <16% for Shack-Hartmann grid sizes up to 32×32.
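
    The linear-algebra core of the method described above can be sketched as follows: the (pseudo-)inverse that maps measured Shack-Hartmann slopes to Zernike coefficients is computed once offline (the FPGA stores it in sub-blocks), so each frame reduces to a single matrix-vector product. The low-order Cartesian Zernike derivatives, the 8x8 lenslet grid, and the test coefficients below are illustrative assumptions, not the paper's configuration.

    ```python
    import numpy as np

    def zernike_gradients(x, y):
        """x/y partial derivatives of five low-order, unnormalized Zernike
        modes (tip, tilt, defocus, 0/45-deg astigmatism) on the unit disk."""
        ones, zeros = np.ones_like(x), np.zeros_like(x)
        dzdx = np.stack([ones, zeros, 4 * x, 2 * x, 2 * y], axis=1)
        dzdy = np.stack([zeros, ones, 4 * y, -2 * y, 2 * x], axis=1)
        return np.vstack([dzdx, dzdy])            # shape (2*M, 5)

    # Hypothetical 8x8 lenslet grid inside the unit pupil.
    g = np.linspace(-0.875, 0.875, 8)
    X, Y = np.meshgrid(g, g)
    mask = X ** 2 + Y ** 2 <= 1.0
    A = zernike_gradients(X[mask], Y[mask])

    A_pinv = np.linalg.pinv(A)                    # offline: the stored inverse matrix

    # Online, per frame: measured slopes -> coefficients in one matrix-vector product.
    true_coeffs = np.array([0.10, -0.20, 0.05, 0.30, 0.00])
    slopes = A @ true_coeffs + 1e-3 * np.random.randn(A.shape[0])   # simulated sensor data
    coeffs = A_pinv @ slopes
    ```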

  1. Highly comparative time-series analysis: the empirical structure of time series and their methods.

    Science.gov (United States)

    Fulcher, Ben D; Little, Max A; Jones, Nick S

    2013-06-06

    The process of collecting and organizing sets of observations represents a common theme throughout the history of science. However, despite the ubiquity of scientists measuring, recording and analysing the dynamics of different processes, an extensive organization of scientific time-series data and analysis methods has never been performed. Addressing this, annotated collections of over 35 000 real-world and model-generated time series, and over 9000 time-series analysis algorithms are analysed in this work. We introduce reduced representations of both time series, in terms of their properties measured by diverse scientific methods, and of time-series analysis methods, in terms of their behaviour on empirical time series, and use them to organize these interdisciplinary resources. This new approach to comparing across diverse scientific data and methods allows us to organize time-series datasets automatically according to their properties, retrieve alternatives to particular analysis methods developed in other scientific disciplines and automate the selection of useful methods for time-series classification and regression tasks. The broad scientific utility of these tools is demonstrated on datasets of electroencephalograms, self-affine time series, heartbeat intervals, speech signals and others, in each case contributing novel analysis techniques to the existing literature. Highly comparative techniques that compare across an interdisciplinary literature can thus be used to guide more focused research in time-series analysis for applications across the scientific disciplines.

  2. GOTHIC: Gravitational oct-tree code accelerated by hierarchical time step controlling

    Science.gov (United States)

    Miki, Yohei; Umemura, Masayuki

    2017-04-01

    The tree method is a widely implemented algorithm for collisionless N-body simulations in astrophysics well suited for GPU(s). Adopting hierarchical time stepping can accelerate N-body simulations; however, it is infrequently implemented and its potential remains untested in GPU implementations. We have developed a Gravitational Oct-Tree code accelerated by HIerarchical time step Controlling named GOTHIC, which adopts both the tree method and the hierarchical time step. The code adopts some adaptive optimizations by monitoring the execution time of each function on-the-fly and minimizes the time-to-solution by balancing the measured time of multiple functions. Results of performance measurements with realistic particle distribution performed on NVIDIA Tesla M2090, K20X, and GeForce GTX TITAN X, which are representative GPUs of the Fermi, Kepler, and Maxwell generation of GPUs, show that the hierarchical time step achieves a speedup by a factor of around 3-5 times compared to the shared time step. The measured elapsed time per step of GOTHIC is 0.30 s or 0.44 s on GTX TITAN X when the particle distribution represents the Andromeda galaxy or the NFW sphere, respectively, with 2^24 = 16,777,216 particles. The averaged performance of the code corresponds to 10-30% of the theoretical single precision peak performance of the GPU.

  3. Exploration of automatic optimisation for CUDA programming

    KAUST Repository

    Al-Mouhamed, Mayez; Khan, Ayaz ul Hassan

    2014-01-01

    © 2014 Taylor & Francis. Writing optimised compute unified device architecture (CUDA) program for graphic processing units (GPUs) is complex even for experts. We present a design methodology for a restructuring tool that converts C-loops into optimised CUDA kernels based on a three-step algorithm which are loop tiling, coalesced memory access and resource optimisation. A method for finding possible loop tiling solutions with coalesced memory access is developed and a simplified algorithm for restructuring C-loops into an efficient CUDA kernel is presented. In the evaluation, we implement matrix multiply (MM), matrix transpose (M-transpose), matrix scaling (M-scaling) and matrix vector multiply (MV) using the proposed algorithm. We present the analysis of the execution time and GPU throughput for the above applications, which favourably compare to other proposals. Evaluation is carried out while scaling the problem size and running under a variety of kernel configurations. The obtained speedup is about 28-35% for M-transpose compared to NVIDIA Software Development Kit, 33% speedup for MV compared to general purpose computation on graphics processing unit compiler, and more than 80% speedup for MM and M-scaling compared to CUDA-lite.

  4. Exploration of automatic optimisation for CUDA programming

    KAUST Repository

    Al-Mouhamed, Mayez

    2014-09-16

    © 2014 Taylor & Francis. Writing optimised compute unified device architecture (CUDA) program for graphic processing units (GPUs) is complex even for experts. We present a design methodology for a restructuring tool that converts C-loops into optimised CUDA kernels based on a three-step algorithm which are loop tiling, coalesced memory access and resource optimisation. A method for finding possible loop tiling solutions with coalesced memory access is developed and a simplified algorithm for restructuring C-loops into an efficient CUDA kernel is presented. In the evaluation, we implement matrix multiply (MM), matrix transpose (M-transpose), matrix scaling (M-scaling) and matrix vector multiply (MV) using the proposed algorithm. We present the analysis of the execution time and GPU throughput for the above applications, which favourably compare to other proposals. Evaluation is carried out while scaling the problem size and running under a variety of kernel configurations. The obtained speedup is about 28-35% for M-transpose compared to NVIDIA Software Development Kit, 33% speedup for MV compared to general purpose computation on graphics processing unit compiler, and more than 80% speedup for MM and M-scaling compared to CUDA-lite.

  5. GPU accelerated fully space and time resolved numerical simulations of self-focusing laser beams in SBS-active media

    Energy Technology Data Exchange (ETDEWEB)

    Mauger, Sarah; Colin de Verdière, Guillaume [CEA-DAM, DIF, 91297 Arpajon (France); Bergé, Luc, E-mail: luc.berge@cea.fr [CEA-DAM, DIF, 91297 Arpajon (France); Skupin, Stefan [Max Planck Institute for the Physics of Complex Systems, 01187 Dresden (Germany); Friedrich Schiller University, Institute of Condensed Matter Theory and Optics, 07743 Jena (Germany)

    2013-02-15

    A computer cluster equipped with Graphics Processing Units (GPUs) is used for simulating nonlinear optical wave packets undergoing Kerr self-focusing and stimulated Brillouin scattering in fused silica. We first recall the model equations in full (3+1) dimensions. These consist of two coupled nonlinear Schrödinger equations for counterpropagating optical beams closed with a source equation for light-induced acoustic waves seeded by thermal noise. Compared with simulations on a conventional cluster of Central Processing Units (CPUs), GPU-based computations allow us to use a significant (16 times) larger number of mesh points within similar computation times. Reciprocally, simulations employing the same number of mesh points are between 3 and 20 times faster on GPUs than on the same number of classical CPUs. Performance speedups close to 45 are reported for isolated functions evaluating, e.g., the optical nonlinearities. Since the field intensities may reach the ionization threshold of silica, the action of a defocusing electron plasma is also addressed.

  6. GPU accelerated fully space and time resolved numerical simulations of self-focusing laser beams in SBS-active media

    International Nuclear Information System (INIS)

    Mauger, Sarah; Colin de Verdière, Guillaume; Bergé, Luc; Skupin, Stefan

    2013-01-01

    A computer cluster equipped with Graphics Processing Units (GPUs) is used for simulating nonlinear optical wave packets undergoing Kerr self-focusing and stimulated Brillouin scattering in fused silica. We first recall the model equations in full (3+1) dimensions. These consist of two coupled nonlinear Schrödinger equations for counterpropagating optical beams closed with a source equation for light-induced acoustic waves seeded by thermal noise. Compared with simulations on a conventional cluster of Central Processing Units (CPUs), GPU-based computations allow us to use a significant (16 times) larger number of mesh points within similar computation times. Reciprocally, simulations employing the same number of mesh points are between 3 and 20 times faster on GPUs than on the same number of classical CPUs. Performance speedups close to 45 are reported for isolated functions evaluating, e.g., the optical nonlinearities. Since the field intensities may reach the ionization threshold of silica, the action of a defocusing electron plasma is also addressed

  7. Parallel time domain solvers for electrically large transient scattering problems

    KAUST Repository

    Liu, Yang

    2014-09-26

    Marching on in time (MOT)-based integral equation solvers represent an increasingly appealing avenue for analyzing transient electromagnetic interactions with large and complex structures. MOT integral equation solvers for analyzing electromagnetic scattering from perfect electrically conducting objects are obtained by enforcing electric field boundary conditions and implicitly time advance electric surface current densities by iteratively solving sparse systems of equations at all time steps. Contrary to finite difference and element competitors, these solvers apply to nonlinear and multi-scale structures comprising geometrically intricate and deep sub-wavelength features residing atop electrically large platforms. Moreover, they are high-order accurate, stable in the low- and high-frequency limits, and applicable to conducting and penetrable structures represented by highly irregular meshes. This presentation reviews some recent advances in the parallel implementations of time domain integral equation solvers, specifically those that leverage multilevel plane-wave time-domain algorithm (PWTD) on modern manycore computer architectures including graphics processing units (GPUs) and distributed memory supercomputers. The GPU-based implementation achieves at least one order of magnitude speedups compared to serial implementations while the distributed parallel implementation are highly scalable to thousands of compute-nodes. A distributed parallel PWTD kernel has been adopted to solve time domain surface/volume integral equations (TDSIE/TDVIE) for analyzing transient scattering from large and complex-shaped perfectly electrically conducting (PEC)/dielectric objects involving ten million/tens of millions of spatial unknowns.

  8. GPU Computing in Bayesian Inference of Realized Stochastic Volatility Model

    International Nuclear Information System (INIS)

    Takaishi, Tetsuya

    2015-01-01

    The realized stochastic volatility (RSV) model, which utilizes the realized volatility as additional information, has been proposed to infer the volatility of financial time series. We consider the Bayesian inference of the RSV model by the Hybrid Monte Carlo (HMC) algorithm. The HMC algorithm can be parallelized and thus performed on the GPU for speedup. The GPU code is developed with CUDA Fortran. We compare the computational time of the HMC algorithm on a GPU (GTX 760) and a CPU (Intel i7-4770, 3.4 GHz) and find that the GPU can be up to 17 times faster than the CPU. We also code the program with OpenACC and find that appropriate coding can achieve a speedup similar to that of CUDA Fortran.
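    The kernel that is offloaded to the GPU in such studies is the leapfrog/accept-reject cycle of HMC. The sketch below shows one HMC update for a toy standard-normal target rather than the RSV posterior; the step size, trajectory length and target are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(0)

        def grad_neg_log_post(theta):
            return theta                       # gradient of 0.5*theta^2 (toy target)

        def hmc_step(theta, eps=0.1, n_leap=20):
            p = rng.standard_normal(theta.shape)
            th, q = theta.copy(), p.copy()
            # leapfrog integration of the Hamiltonian dynamics
            q -= 0.5 * eps * grad_neg_log_post(th)
            for _ in range(n_leap - 1):
                th += eps * q
                q -= eps * grad_neg_log_post(th)
            th += eps * q
            q -= 0.5 * eps * grad_neg_log_post(th)
            # Metropolis accept/reject on the total "energy"
            h_old = 0.5 * np.sum(theta**2) + 0.5 * np.sum(p**2)
            h_new = 0.5 * np.sum(th**2) + 0.5 * np.sum(q**2)
            return th if rng.random() < np.exp(h_old - h_new) else theta

        theta, samples = np.zeros(10), []
        for _ in range(1000):
            theta = hmc_step(theta)
            samples.append(theta.copy())
        print("sample variance (target is 1):", np.var(samples))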

  9. A Model for Speedup of Parallel Programs

    Science.gov (United States)

    1997-01-01

    Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job Scheduling Strategies for Parallel Processing, pages 89-99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A comparative analysis of static

  10. Where Does the Time Go in Software DSMs?--Experiences with JIAJIA

    Institute of Scientific and Technical Information of China (English)

    SHI Weisong; HU Weiwu; TANG Zhimin

    1999-01-01

    The performance gap between software DSM systems and message passing platforms greatly prevents the prevalence of software DSM systems, though great efforts have been devoted to this area in the past decade. In this paper, we take up the challenge of finding where we should focus our efforts in future designs. The components of the total system overhead of software DSM systems are first analyzed in detail. Based on a state-of-the-art software DSM system, JIAJIA, we measure these components on the Dawning parallel system and draw five important conclusions that differ from some traditional viewpoints. (1) The performance of the JIAJIA software DSM system is acceptable: for four of eight applications, the parallel efficiency achieved by JIAJIA is about 80%, while for two others, 70% efficiency can be obtained. (2) 40.94% of interrupt service time is overlapped with waiting time. (3) Encoding and decoding diffs do not cost much time (<1%), so using hardware support to encode/decode diffs and send/receive messages is not worthwhile. (4) Great endeavours should be put into reducing the data miss penalty and optimizing synchronization operations, which occupy 11.75% and 13.65% of total execution time, respectively. (5) Communication hardware overhead occupies 66.76% of the whole communication time in the experimental environment, and communication software overhead does not take as much time as expected. Moreover, by studying the effect of CPU speed on system overhead, we find that the common speedup formula for distributed memory systems does not hold for software DSM systems. Therefore, we design a new speedup formula specific to software DSM systems, and point out that when the CPU speed increases the speedup can increase too even if the network speed is fixed, which is impossible in message passing systems. Finally, we argue that the JIAJIA system has the desired scalability.
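    The record's new speedup formula is not reproduced here, but the qualitative point can be illustrated with a deliberately simple toy model in which part of the communication cost is fixed by the network (as in message passing) and part runs on the CPU (as in a software DSM). All numbers below are made-up assumptions.

        def speedup(cpu_factor, work=100.0, procs=8, comm_sw=0.0, comm_hw=0.0):
            # work and comm_sw shrink with CPU speed; comm_hw is fixed wall-clock time
            t_seq = work / cpu_factor
            t_par = work / (procs * cpu_factor) + comm_sw / cpu_factor + comm_hw
            return t_seq / t_par

        for c in (1.0, 2.0, 4.0, 8.0):
            mp_like = speedup(c, comm_sw=0.0, comm_hw=10.0)   # network-bound communication
            dsm_like = speedup(c, comm_sw=9.0, comm_hw=1.0)   # mostly CPU-bound communication
            print(f"CPU x{c:>3}: message-passing-like {mp_like:5.2f}   DSM-like {dsm_like:5.2f}")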

  11. A Scalable GVT Estimation Algorithm for PDES: Using Lower Bound of Event-Bulk-Time

    Directory of Open Access Journals (Sweden)

    Yong Peng

    2015-01-01

    Full Text Available Global Virtual Time computation of Parallel Discrete Event Simulation is crucial for conducting fossil collection and detecting the termination of simulation. The triggering condition of GVT computation in typical approaches is generally based on the wall-clock time or logical time intervals. However, the GVT value depends on the timestamps of events rather than the wall-clock time or logical time intervals. Therefore, it is difficult for the existing approaches to select appropriate time intervals to compute the GVT value. In this study, we propose a scalable GVT estimation algorithm based on Lower Bound of Event-Bulk-Time, which triggers the computation of the GVT value according to the number of processed events. In order to calculate the number of transient messages, our algorithm employs Event-Bulk to record the messages sent and received by Logical Processes. To eliminate the performance bottleneck, we adopt an overlapping computation approach to distribute the workload of GVT computation to all worker-threads. We compare our algorithm with the fast asynchronous GVT algorithm using PHOLD benchmark on the shared memory machine. Experimental results indicate that our algorithm has a light overhead and shows higher speedup and accuracy of GVT computation than the fast asynchronous GVT algorithm.
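    The essence of any GVT algorithm is that GVT is a lower bound on the timestamps of all unprocessed events and of all messages still in transit. The few lines below only illustrate that definition; the Event-Bulk bookkeeping, processed-event-count triggering and overlapped computation described in the record are not reproduced, and all values are hypothetical.

        def estimate_gvt(lp_next_event_times, in_flight_message_times):
            # lower bound over every LP's next unprocessed event and every transient message
            candidates = list(lp_next_event_times) + list(in_flight_message_times)
            return min(candidates) if candidates else float("inf")

        # three logical processes and two transient messages (made-up timestamps)
        print(estimate_gvt([12.5, 9.0, 17.3], [8.2, 14.0]))   # -> 8.2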

  12. Taylor Series Trajectory Calculations Including Oblateness Effects and Variable Atmospheric Density

    Science.gov (United States)

    Scott, James R.

    2011-01-01

    Taylor series integration is implemented in NASA Glenn's Spacecraft N-body Analysis Program, and compared head-to-head with the code's existing 8th-order Runge-Kutta-Fehlberg time integration scheme. This paper focuses on trajectory problems that include oblateness and/or variable atmospheric density. Taylor series is shown to be significantly faster and more accurate for oblateness problems up through a 4x4 field, with speedups ranging from a factor of 2 to 13. For problems with variable atmospheric density, speedups average 24 for atmospheric density alone, and average 1.6 to 8.2 when density and oblateness are combined.
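    Taylor-series integrators advance the state by recurring the Taylor coefficients of the solution at each step. The sketch below applies the idea to the scalar test problem y' = -y, whose coefficients obey a one-line recurrence; the force-model recurrences of the actual trajectory code are far more involved, and the order and step size chosen here are arbitrary.

        import numpy as np

        def taylor_step(y, h, order=12):
            # y(t+h) = sum_k c_k h^k with c_0 = y and c_k = -c_{k-1}/k for y' = -y
            coeff, result = y, y
            for k in range(1, order + 1):
                coeff = -coeff / k
                result += coeff * h**k
            return result

        y, t, h = 1.0, 0.0, 0.5
        for _ in range(20):
            y = taylor_step(y, h)
            t += h
        print("numerical:", y, " exact:", np.exp(-t))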

  13. On run-time exploitation of concurrency

    NARCIS (Netherlands)

    Holzenspies, P.K.F.

    2010-01-01

    The 'free' speed-up stemming from ever-increasing processor speed is over. Performance increase in computer systems can now only be achieved through parallelism. One of the biggest challenges in computer science is how to map applications onto parallel computers. Concurrency, seen as the set of

  14. A NEW MONTE CARLO METHOD FOR TIME-DEPENDENT NEUTRINO RADIATION TRANSPORT

    International Nuclear Information System (INIS)

    Abdikamalov, Ernazar; Ott, Christian D.; O'Connor, Evan; Burrows, Adam; Dolence, Joshua C.; Löffler, Frank; Schnetter, Erik

    2012-01-01

    Monte Carlo approaches to radiation transport have several attractive properties such as simplicity of implementation, high accuracy, and good parallel scaling. Moreover, Monte Carlo methods can handle complicated geometries and are relatively easy to extend to multiple spatial dimensions, which makes them potentially interesting in modeling complex multi-dimensional astrophysical phenomena such as core-collapse supernovae. The aim of this paper is to explore Monte Carlo methods for modeling neutrino transport in core-collapse supernovae. We generalize the Implicit Monte Carlo photon transport scheme of Fleck and Cummings and gray discrete-diffusion scheme of Densmore et al. to energy-, time-, and velocity-dependent neutrino transport. Using our 1D spherically-symmetric implementation, we show that, similar to the photon transport case, the implicit scheme enables significantly larger timesteps compared with explicit time discretization, without sacrificing accuracy, while the discrete-diffusion method leads to significant speed-ups at high optical depth. Our results suggest that a combination of spectral, velocity-dependent, Implicit Monte Carlo and discrete-diffusion Monte Carlo methods represents a robust approach for use in neutrino transport calculations in core-collapse supernovae. Our velocity-dependent scheme can easily be adapted to photon transport.

  15. A NEW MONTE CARLO METHOD FOR TIME-DEPENDENT NEUTRINO RADIATION TRANSPORT

    Energy Technology Data Exchange (ETDEWEB)

    Abdikamalov, Ernazar; Ott, Christian D.; O' Connor, Evan [TAPIR, California Institute of Technology, MC 350-17, 1200 E California Blvd., Pasadena, CA 91125 (United States); Burrows, Adam; Dolence, Joshua C. [Department of Astrophysical Sciences, Princeton University, Peyton Hall, Ivy Lane, Princeton, NJ 08544 (United States); Loeffler, Frank; Schnetter, Erik, E-mail: abdik@tapir.caltech.edu [Center for Computation and Technology, Louisiana State University, 216 Johnston Hall, Baton Rouge, LA 70803 (United States)

    2012-08-20

    Monte Carlo approaches to radiation transport have several attractive properties such as simplicity of implementation, high accuracy, and good parallel scaling. Moreover, Monte Carlo methods can handle complicated geometries and are relatively easy to extend to multiple spatial dimensions, which makes them potentially interesting in modeling complex multi-dimensional astrophysical phenomena such as core-collapse supernovae. The aim of this paper is to explore Monte Carlo methods for modeling neutrino transport in core-collapse supernovae. We generalize the Implicit Monte Carlo photon transport scheme of Fleck and Cummings and gray discrete-diffusion scheme of Densmore et al. to energy-, time-, and velocity-dependent neutrino transport. Using our 1D spherically-symmetric implementation, we show that, similar to the photon transport case, the implicit scheme enables significantly larger timesteps compared with explicit time discretization, without sacrificing accuracy, while the discrete-diffusion method leads to significant speed-ups at high optical depth. Our results suggest that a combination of spectral, velocity-dependent, Implicit Monte Carlo and discrete-diffusion Monte Carlo methods represents a robust approach for use in neutrino transport calculations in core-collapse supernovae. Our velocity-dependent scheme can easily be adapted to photon transport.

  16. Exact diagonalization of quantum lattice models on coprocessors

    Science.gov (United States)

    Siro, T.; Harju, A.

    2016-10-01

    We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
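    The operation benchmarked in the record is a single Lanczos step, whose cost is dominated by the matrix-vector product. The sketch below runs the plain three-term Lanczos recurrence on a small dense symmetric test matrix; sizes are illustrative and no reorthogonalization is performed.

        import numpy as np

        rng = np.random.default_rng(1)
        A = rng.standard_normal((200, 200))
        A = 0.5 * (A + A.T)                    # symmetric test matrix (assumed)

        m = 50                                 # number of Lanczos steps (assumed)
        V = np.zeros((A.shape[0], m + 1))
        alpha, beta = np.zeros(m), np.zeros(m + 1)
        V[:, 0] = rng.standard_normal(A.shape[0])
        V[:, 0] /= np.linalg.norm(V[:, 0])

        for j in range(m):
            w = A @ V[:, j]                    # the step accelerated on GPU / Xeon Phi
            if j > 0:
                w -= beta[j] * V[:, j - 1]
            alpha[j] = V[:, j] @ w
            w -= alpha[j] * V[:, j]
            beta[j + 1] = np.linalg.norm(w)
            V[:, j + 1] = w / beta[j + 1]

        T = np.diag(alpha) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
        print("smallest Ritz value:", np.linalg.eigvalsh(T)[0])
        print("exact smallest eigenvalue:", np.linalg.eigvalsh(A)[0])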

  17. Reconfiguration in FPGA-Based Multi-Core Platforms for Hard Real-Time Applications

    DEFF Research Database (Denmark)

    Pezzarossa, Luca; Schoeberl, Martin; Sparsø, Jens

    2016-01-01

    In general-purpose computing multi-core platforms, hardware accelerators and reconfiguration are means to improve performance, i.e., the average-case execution time of a software application. In hard real-time systems, such average-case speed-up is not in itself relevant - it is the worst-case execution time of tasks of an application that determines the system's ability to respond in time. To support this focus, the platform must provide service guarantees for both communication and computation resources. In addition, many hard real-time applications have multiple modes of operation, and each mode has specific requirements. An interesting perspective on reconfigurable computing is to exploit run-time reconfiguration to support mode changes. In this paper we explore approaches to reconfiguration of communication and computation resources in the T-CREST hard real-time multi-core platform...

  18. A Controller for Dynamic Partial Reconfiguration in FPGA-Based Real-Time Systems

    DEFF Research Database (Denmark)

    Pezzarossa, Luca; Schoeberl, Martin; Sparsø, Jens

    2017-01-01

    In real-time systems, the use of hardware accelerators can lead to a worst-case execution-time speed-up, to a simplification of its analysis, and to a reduction of its pessimism. When using FPGA technology, dynamic partial reconfiguration (DPR) can be used to minimize the area, by only loading... The open-source DPR controller presented here is specially developed for hard real-time systems and prototyped in connection with the open-source multi-core platform for real-time applications T-CREST. The controller enables a processor to perform reconfiguration in a time-predictable manner and supports different operating modes. The paper also presents a software tool for bitstream conversion, compression, and for reconfiguration time analysis. The DPR controller is evaluated in terms of hardware cost, operating frequency, speed, and bitstream compression ratio vs. reconfiguration time trade-off. A simple application example...

  19. Travel Time to Hospital for Childbirth: Comparing Calculated Versus Reported Travel Times in France.

    Science.gov (United States)

    Pilkington, Hugo; Prunet, Caroline; Blondel, Béatrice; Charreire, Hélène; Combier, Evelyne; Le Vaillant, Marc; Amat-Roze, Jeanne-Marie; Zeitlin, Jennifer

    2018-01-01

    Objectives Timely access to health care is critical in obstetrics. Yet obtaining reliable estimates of travel times to hospital for childbirth poses methodological challenges. We compared two measures of travel time, self-reported and calculated, to assess concordance and to identify determinants of long travel time to hospital for childbirth. Methods Data came from the 2010 French National Perinatal Survey, a national representative sample of births (N = 14 681). We compared both travel time measures by maternal, maternity unit and geographic characteristics in rural, peri-urban and urban areas. Logistic regression models were used to study factors associated with reported and calculated times ≥30 min. Cohen's kappa coefficients were also calculated to estimate the agreement between reported and calculated times according to women's characteristics. Results In urban areas, the proportion of women with travel times ≥30 min was higher when reported rather than calculated times were used (11.0 vs. 3.6%). Longer reported times were associated with non-French nationality [adjusted odds ratio (aOR) 1.3 (95% CI 1.0-1.7)] and inadequate prenatal care [aOR 1.5 (95% CI 1.2-2.0)], but not for calculated times. Concordance between the two measures was higher in peri-urban and rural areas (52.4 vs. 52.3% for rural areas). Delivery in a specialised level 2 or 3 maternity unit was a principal determinant of long reported and measured times in peri-urban and rural areas. Conclusions for Practice The level of agreement between reported and calculated times varies according to geographic context. Poor measurement of travel time in urban areas may mask problems in accessibility.

  20. Computational time analysis of the numerical solution of 3D electrostatic Poisson's equation

    Science.gov (United States)

    Kamboh, Shakeel Ahmed; Labadin, Jane; Rigit, Andrew Ragai Henri; Ling, Tech Chaw; Amur, Khuda Bux; Chaudhary, Muhammad Tayyab

    2015-05-01

    3D Poisson's equation is solved numerically to simulate the electric potential in a prototype design of an electrohydrodynamic (EHD) ion-drag micropump. The finite difference method (FDM) is employed to discretize the governing equation. The system of linear equations resulting from FDM is solved iteratively by using the sequential Jacobi (SJ) and sequential Gauss-Seidel (SGS) methods, and the simulation results are compared to examine the difference between them. The main objective was to analyze the computational time required by both methods with respect to different grid sizes and to parallelize the Jacobi method to reduce the computational time. In general, the SGS method is faster than the SJ method, but the data parallelism of the Jacobi method may produce a good speedup over the SGS method. In this study, the feasibility of using the parallel Jacobi (PJ) method is examined in relation to the SGS method. The MATLAB Parallel/Distributed computing environment is used and a parallel code for the SJ method is implemented. It was found that for small grid sizes the SGS method remains dominant over the SJ method and the PJ method, while for large grid sizes both sequential methods may take prohibitively long to converge. Yet, the PJ method reduces the computational time to some extent for large grid sizes.
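    The reason Jacobi parallelizes so naturally is that every grid point is updated from the previous iterate only, so all updates are independent. A minimal vectorized Jacobi sweep for the 3D Poisson equation ∇²u = f with homogeneous Dirichlet boundaries is sketched below; the grid size, right-hand side and iteration count are arbitrary assumptions and no convergence check is shown.

        import numpy as np

        n, h = 32, 1.0 / 33                    # interior points per axis, grid spacing (assumed)
        f = np.ones((n + 2, n + 2, n + 2))     # right-hand side, including ghost layer
        u = np.zeros_like(f)                   # boundary values stay zero

        for _ in range(500):
            # the whole right-hand side is built from the old iterate before assignment,
            # so this is a true Jacobi sweep and every point could be updated in parallel
            u[1:-1, 1:-1, 1:-1] = (u[2:, 1:-1, 1:-1] + u[:-2, 1:-1, 1:-1] +
                                   u[1:-1, 2:, 1:-1] + u[1:-1, :-2, 1:-1] +
                                   u[1:-1, 1:-1, 2:] + u[1:-1, 1:-1, :-2] -
                                   h * h * f[1:-1, 1:-1, 1:-1]) / 6.0

        print("centre value:", u[n // 2 + 1, n // 2 + 1, n // 2 + 1])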

  1. Accelerating solutions of one-dimensional unsteady PDEs with GPU-based swept time-space decomposition

    Science.gov (United States)

    Magee, Daniel J.; Niemeyer, Kyle E.

    2018-03-01

    The expedient design of precision components in aerospace and other high-tech industries requires simulations of physical phenomena often described by partial differential equations (PDEs) without exact solutions. Modern design problems require simulations with a level of resolution difficult to achieve in reasonable amounts of time, even in effectively parallelized solvers. Though the scale of the problem relative to available computing power is the greatest impediment to accelerating these applications, significant performance gains can be achieved through careful attention to the details of memory communication and access. The swept time-space decomposition rule reduces communication between sub-domains by exhausting the domain of influence before communicating boundary values. Here we present a GPU implementation of the swept rule, which modifies the algorithm for improved performance on this processing architecture by prioritizing use of private (shared) memory, avoiding interblock communication, and overwriting unnecessary values. It shows significant improvement in the execution time of finite-difference solvers for one-dimensional unsteady PDEs, producing speedups of 2-9× for a range of problem sizes compared with simple GPU versions, and 7-300× compared with parallel CPU versions. However, for a more sophisticated one-dimensional system of equations discretized with a second-order finite-volume scheme, the swept rule performs 1.2-1.9× worse than a standard implementation for all problem sizes.

  2. Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

    Science.gov (United States)

    Joslin, Ronald D.; Zubair, Mohammad

    1993-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors, nearly ideal linear speedups are achieved with nonoptimized routines; slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and exhibits less than ideal speedups. However, with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single-processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model, which reduces the required number of grid points and turns the method into a large-eddy simulation (PSLES), would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.

  3. Smoke simulation for fire engineering using a multigrid method on graphics hardware

    DEFF Research Database (Denmark)

    Glimberg, Stefan; Erleben, Kenny; Bennetsen, Jens

    2009-01-01

    interactive physical simulation for engineering purposes, has the benefit of reducing production turn-around time. We have measured speed-up improvements by a factor of up to 350, compared to existing CPU-based solvers. The present CUDA-based solver promises huge potential in economical benefits, as well...

  4. Accelerating the explicitly restarted Arnoldi method with GPUs using an auto-tuned matrix vector product

    International Nuclear Information System (INIS)

    Dubois, J.; Calvin, Ch.; Dubois, J.; Petiton, S.

    2011-01-01

    This paper presents a parallelized hybrid single-vector Arnoldi algorithm for computing approximations to eigenpairs of a nonsymmetric matrix. We are interested in the use of accelerators and multi-core units to speed up the Arnoldi process. The main goal is to propose a parallel version of the Arnoldi solver, which can efficiently use multiple multi-core processors or multiple graphics processing units (GPUs) in a mixed coarse and fine grain fashion. In the proposed algorithms, this is achieved by an auto-tuning of the matrix vector product before starting the Arnoldi eigensolver as well as the reorganization of the data and global communications so that communication time is reduced. The execution time, performance, and scalability are assessed with well-known dense and sparse test matrices on multiple Nehalems, GT200 NVidia Tesla, and next generation Fermi Tesla. With one processor, we see a performance speedup of 2 to 3x when using all the physical cores, and a total speedup of 2 to 8x when adding a GPU to this multi-core unit, and hence a speedup of 4 to 24x compared to the sequential solver. (authors)
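    The inner loop being accelerated is the Arnoldi iteration itself, in which a matrix-vector product is followed by orthogonalization against the existing Krylov basis. A bare-bones serial version on a small dense test matrix is sketched below; the matrix size, subspace dimension and random starting vector are assumptions, and none of the record's auto-tuning or communication reorganization appears here.

        import numpy as np

        rng = np.random.default_rng(2)
        A = rng.standard_normal((300, 300))    # nonsymmetric test matrix (assumed)
        m = 40                                 # Krylov subspace dimension (assumed)

        Q = np.zeros((A.shape[0], m + 1))
        H = np.zeros((m + 1, m))
        Q[:, 0] = rng.standard_normal(A.shape[0])
        Q[:, 0] /= np.linalg.norm(Q[:, 0])

        for j in range(m):
            w = A @ Q[:, j]                    # matrix-vector product (GPU-accelerated in the record)
            for i in range(j + 1):             # modified Gram-Schmidt orthogonalization
                H[i, j] = Q[:, i] @ w
                w -= H[i, j] * Q[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            Q[:, j + 1] = w / H[j + 1, j]

        ritz = np.linalg.eigvals(H[:m, :m])
        print("Ritz value of largest magnitude:", ritz[np.argmax(np.abs(ritz))])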

  5. GPU-accelerated Modeling and Element-free Reverse-time Migration with Gauss Points Partition

    Science.gov (United States)

    Zhen, Z.; Jia, X.

    2014-12-01

    Element-free method (EFM) has been applied to seismic modeling and migration. Compared with finite element method (FEM) and finite difference method (FDM), it is much cheaper and more flexible because only the information of the nodes and the boundary of the study area are required in computation. In the EFM, the number of Gauss points should be consistent with the number of model nodes; otherwise the accuracy of the intermediate coefficient matrices would be harmed. Thus when we increase the nodes of velocity model in order to obtain higher resolution, we find that the size of the computer's memory will be a bottleneck. The original EFM can deal with at most 81×81 nodes in the case of 2G memory, as tested by Jia and Hu (2006). In order to solve the problem of storage and computation efficiency, we propose a concept of Gauss points partition (GPP), and utilize the GPUs to improve the computation efficiency. Considering the characteristics of the Gaussian points, the GPP method doesn't influence the propagation of seismic wave in the velocity model. To overcome the time-consuming computation of the stiffness matrix (K) and the mass matrix (M), we also use the GPUs in our computation program. We employ the compressed sparse row (CSR) format to compress the intermediate sparse matrices and try to simplify the operations by solving the linear equations with the CULA Sparse's Conjugate Gradient (CG) solver instead of the linear sparse solver 'PARDISO'. It is observed that our strategy can significantly reduce the computational time of K and M compared with the algorithm based on CPU. The model tested is Marmousi model. The length of the model is 7425m and the depth is 2990m. We discretize the model with 595x298 nodes, 300x300 Gauss cells and 3x3 Gauss points in each cell. In contrast to the computational time of the conventional EFM, the GPUs-GPP approach can substantially improve the efficiency. The speedup ratio of time consumption of computing K, M is 120 and the

  6. Comparing of the Reaction Time in Substance-Dependent and Non-Dependent Individuals

    Directory of Open Access Journals (Sweden)

    Mohammad Narimani

    2012-11-01

    Full Text Available Aim: The aim of this study was to compare the simple, selective, and discrimination reaction time in substance-dependent and non-dependent individuals. Method: In this causal-comparative study, the population included 425 males (opium and crystal dependents) who were referred to addiction rehabilitation centers in Tabriz. By random sampling, 16 opium dependents, 16 crystal dependents, and, as the comparison group, 16 non-dependent individuals with no history of dependency were selected. All groups were matched in age and marital status. For gathering data, the "Addicts Admit Questionnaire" and a laboratory device known as the "Reaction Time Assay" were used. Results: The results of this study showed that there are significant differences among all groups in simple reaction time, choice reaction time and reaction time to auditory stimuli, but no significant difference in discrimination reaction time and reaction time to visual stimuli was observed. Conclusion: The reaction time of the substance-dependent groups is slower than that of the non-dependent group.

  7. GPU-accelerated low-latency real-time searches for gravitational waves from compact binary coalescence

    International Nuclear Information System (INIS)

    Liu Yuan; Du Zhihui; Chung, Shin Kee; Hooper, Shaun; Blair, David; Wen Linqing

    2012-01-01

    We present a graphics processing unit (GPU)-accelerated time-domain low-latency algorithm to search for gravitational waves (GWs) from coalescing binaries of compact objects based on the summed parallel infinite impulse response (SPIIR) filtering technique. The aim is to facilitate fast detection of GWs with a minimum delay to allow prompt electromagnetic follow-up observations. To maximize the GPU acceleration, we apply an efficient batched parallel computing model that significantly reduces the number of synchronizations in SPIIR and optimizes the usage of the memory and hardware resource. Our code is tested on the CUDA ‘Fermi’ architecture in a GTX 480 graphics card and its performance is compared with a single core of Intel Core i7 920 (2.67 GHz). A 58-fold speedup is achieved while giving results in close agreement with the CPU implementation. Our result indicates that it is possible to conduct a full search for GWs from compact binary coalescence in real time with only one desktop computer equipped with a Fermi GPU card for the initial LIGO detectors which in the past required more than 100 CPUs. (paper)

  8. A Novel UDT-Based Transfer Speed-Up Protocol for Fog Computing

    Directory of Open Access Journals (Sweden)

    Zhijie Han

    2018-01-01

    Full Text Available Fog computing is a distributed computing model that sits as the middle layer between the cloud data center and the IoT device/sensor. It provides computing, network, and storage devices so that cloud-based services can be closer to IoT devices and sensors. Cloud computing requires a lot of bandwidth, and the bandwidth of the wireless network is limited. In contrast, the amount of bandwidth required for "fog computing" is much less. In this paper, we improve a new protocol, the Peer Assistant UDT-Based Data Transfer Protocol (PaUDT), applied to IoT-Cloud computing. Furthermore, we compare the efficiency of the congestion control algorithm of UDT with that of Adobe's Secure Real-Time Media Flow Protocol (RTMFP), which is based entirely on UDP at the transport layer. At last, we build an evaluation model of UDT in terms of RTT and bit error ratio which describes the performance. The theoretical analysis and experimental results have shown that UDT has good performance in IoT-Cloud computing.

  9. Modelling and Comparative Performance Analysis of a Time-Reversed UWB System

    Directory of Open Access Journals (Sweden)

    Popovski K

    2007-01-01

    Full Text Available The effects of multipath propagation lead to a significant decrease in system performance in most of the proposed ultra-wideband communication systems. A time-reversed system utilises the multipath channel impulse response to decrease receiver complexity, through a prefiltering at the transmitter. This paper discusses the modelling and comparative performance of a UWB system utilising time-reversed communications. System equations are presented, together with a semianalytical formulation on the level of intersymbol interference and multiuser interference. The standardised IEEE 802.15.3a channel model is applied, and the estimated error performance is compared through simulation with the performance of both time-hopped time-reversed and RAKE-based UWB systems.

  10. Restless Tuneup of High-Fidelity Qubit Gates

    NARCIS (Netherlands)

    Rol, M.A.; Bultink, C.C.; O'Brien, T.E.; Jong, S.R. de; Theis, L.S.; Fu, X.; Luthi, F.; Vermeulen, R.F.L.; Sterke, J.C. de; Bruno, A.; Deurloo, D.; Schouten, R.N.; Wilhelm, F.K.; Dicarlo, L.

    2017-01-01

    We present a tuneup protocol for qubit gates with tenfold speedup over traditional methods reliant on qubit initialization by energy relaxation. This speedup is achieved by constructing a cost function for Nelder-Mead optimization from real-time correlation of nondemolition measurements interleaving

  11. Restless Tuneup of High-Fidelity Qubit Gates

    NARCIS (Netherlands)

    Rol, M.A.; Bultink, C.C.; O'Brien, T.E.; De Jong, S. R.; Theis, L. S.; Fu, X.; Lüthi, F.; Vermeulen, R.F.L.; de Sterke, J.C.; Bruno, A.; Deurloo, D.; Schouten, R.N.; Wilhelm, FK; Di Carlo, L.

    2017-01-01

    We present a tuneup protocol for qubit gates with tenfold speedup over traditional methods reliant on qubit initialization by energy relaxation. This speedup is achieved by constructing a cost function for Nelder-Mead optimization from real-time correlation of nondemolition measurements

  12. Reducing external speedup requirements for input-queued crossbars

    DEFF Research Database (Denmark)

    Berger, Michael Stubert

    2005-01-01

    performance degradation. This implies that the required bandwidth between port card and switch card is 2 times the actual port speed, adding to cost and complexity. To reduce this bandwidth, a modified architecture is proposed that introduces a small amount of input and output memory on the switch card chip

  13. Performance Comparison of Big Data Analytics With NEXUS and Giovanni

    Science.gov (United States)

    Jacob, J. C.; Huang, T.; Lynnes, C.

    2016-12-01

    NEXUS is an emerging data-intensive analysis framework developed with a new approach for handling science data that enables large-scale data analysis. It is available through open source. We compare performance of NEXUS and Giovanni for 3 statistics algorithms applied to NASA datasets. Giovanni is a statistics web service at NASA Distributed Active Archive Centers (DAACs). NEXUS is a cloud-computing environment developed at JPL and built on Apache Solr, Cassandra, and Spark. We compute global time-averaged map, correlation map, and area-averaged time series. The first two algorithms average over time to produce a value for each pixel in a 2-D map. The third algorithm averages spatially to produce a single value for each time step. This talk is our report on benchmark comparison findings that indicate 15x speedup with NEXUS over Giovanni to compute area-averaged time series of daily precipitation rate for the Tropical Rainfall Measuring Mission (TRMM with 0.25 degree spatial resolution) for the Continental United States over 14 years (2000-2014) with 64-way parallelism and 545 tiles per granule. 16-way parallelism with 16 tiles per granule worked best with NEXUS for computing an 18-year (1998-2015) TRMM daily precipitation global time averaged map (2.5 times speedup) and 18-year global map of correlation between TRMM daily precipitation and TRMM real time daily precipitation (7x speedup). These and other benchmark results will be presented along with key lessons learned in applying the NEXUS tiling approach to big data analytics in the cloud.
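    The three statistics benchmarked in the record reduce to simple array reductions once the data are in memory. The sketch below computes them with NumPy on a synthetic (time, lat, lon) array; the shapes, the synthetic precipitation fields and the unweighted spatial mean (a real area average would weight by cos(latitude)) are all illustrative assumptions, and nothing of the NEXUS tiling or Spark execution is reproduced.

        import numpy as np

        rng = np.random.default_rng(3)
        precip = rng.gamma(2.0, 1.5, size=(365, 40, 80))          # daily rate, one year (assumed)
        precip_rt = precip + rng.normal(0.0, 0.5, precip.shape)   # stand-in "real-time" product

        time_averaged_map = precip.mean(axis=0)                   # one value per pixel
        area_averaged_series = precip.mean(axis=(1, 2))           # one value per day (unweighted)

        # per-pixel correlation over time between the two products
        a = precip - precip.mean(axis=0)
        b = precip_rt - precip_rt.mean(axis=0)
        correlation_map = (a * b).sum(axis=0) / np.sqrt((a**2).sum(axis=0) * (b**2).sum(axis=0))

        print(time_averaged_map.shape, area_averaged_series.shape, correlation_map.shape)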

  14. Comparative Evaluations of Four Specification Methods for Real-Time Systems

    Science.gov (United States)

    1989-12-01

    December 1989. Comparative Evaluations of Four Specification Methods for Real-Time Systems. David P. Wood, William G. Wood. Specification and Design Methods... Methods for Real-Time Systems. Abstract: A number of methods have been proposed in the last decade for the specification of system and software requirements... and software specification for real-time systems. Our process for the identification of methods that meet the above criteria is described in greater

  15. Real-Time Incompressible Fluid Simulation on the GPU

    Directory of Open Access Journals (Sweden)

    Xiao Nie

    2015-01-01

    Full Text Available We present a parallel framework for simulating incompressible fluids with predictive-corrective incompressible smoothed particle hydrodynamics (PCISPH) on the GPU in real time. To this end, we propose an efficient GPU streaming pipeline to map the entire computational task onto the GPU, fully exploiting the massive computational power of state-of-the-art GPUs. In PCISPH-based simulations, neighbor search is the major performance obstacle because this process is performed several times at each time step. To eliminate this bottleneck, an efficient parallel sorting method for this time-consuming step is introduced. Moreover, we discuss several optimization techniques, including using fast on-chip shared memory to avoid global memory bandwidth limitations and thus further improve performance on modern GPU hardware. With our framework, the realism of real-time fluid simulation is significantly improved, since our method enforces the incompressibility constraint, which is typically ignored for efficiency reasons in previous GPU-based SPH methods. The performance results illustrate that our approach can efficiently simulate realistic incompressible fluid in real time and results in a speed-up factor of up to 23 on a high-end NVIDIA GPU in comparison to a single-threaded CPU-based implementation.

  16. Energy expenditure of sedentary screen time compared with active screen time for children.

    Science.gov (United States)

    Lanningham-Foster, Lorraine; Jensen, Teresa B; Foster, Randal C; Redmond, Aoife B; Walker, Brian A; Heinz, Dieter; Levine, James A

    2006-12-01

    We examined the effect of activity-enhancing screen devices on children's energy expenditure compared with performing the same activities while seated. Our hypothesis was that energy expenditure would be significantly greater when children played activity-promoting video games, compared with sedentary video games. Energy expenditure was measured for 25 children aged 8 to 12 years, 15 of whom were lean, while they were watching television seated, playing a traditional video game seated, watching television while walking on a treadmill at 1.5 miles per hour, and playing activity-promoting video games. Watching television and playing video games while seated increased energy expenditure by 20 +/- 13% and 22 +/- 12% above resting values, respectively. When subjects were walking on the treadmill and watching television, energy expenditure increased by 138 +/- 40% over resting values. For the activity-promoting video games, energy expenditure increased by 108 +/- 40% with the EyeToy (Sony Computer Entertainment) and by 172 +/- 68% with Dance Dance Revolution Ultramix 2 (Konami Digital Entertainment). Energy expenditure more than doubles when sedentary screen time is converted to active screen time. Such interventions might be considered for obesity prevention and treatment.

  17. Direct comparison of quantum and simulated annealing on a fully connected Ising ferromagnet

    Science.gov (United States)

    Wauters, Matteo M.; Fazio, Rosario; Nishimori, Hidetoshi; Santoro, Giuseppe E.

    2017-08-01

    We compare the performance of quantum annealing (QA, through Schrödinger dynamics) and simulated annealing (SA, through a classical master equation) on the p-spin infinite range ferromagnetic Ising model, by slowly driving the system across its equilibrium, quantum or classical, phase transition. When the phase transition is second order (p = 2, the familiar two-spin Ising interaction) SA shows a remarkable exponential speed-up over QA. For a first-order phase transition (p ≥ 3, i.e., with multispin Ising interactions), in contrast, the classical annealing dynamics appears to remain stuck in the disordered phase, while we have clear evidence that QA shows a residual energy which decreases towards zero when the total annealing time τ increases, albeit in a rather slow (logarithmic) fashion. This is one of the rare examples where a limited quantum speedup, a speedup by QA over SA, has been shown to exist by direct solutions of the Schrödinger and master equations in combination with a nonequilibrium Landau-Zener analysis. We also analyze the imaginary-time QA dynamics of the model, finding a 1/τ² behavior for all finite values of p, as predicted by the adiabatic theorem of quantum mechanics. The Grover-search limit p(odd) = ∞ is also discussed.

  18. Fast Simulation of Dynamic Ultrasound Images Using the GPU.

    Science.gov (United States)

    Storve, Sigurd; Torp, Hans

    2017-10-01

    Simulated ultrasound data is a valuable tool for development and validation of quantitative image analysis methods in echocardiography. Unfortunately, simulation time can become prohibitive for phantoms consisting of a large number of point scatterers. The COLE algorithm by Gao et al. is a fast convolution-based simulator that trades simulation accuracy for improved speed. We present highly efficient parallelized CPU and GPU implementations of the COLE algorithm with an emphasis on dynamic simulations involving moving point scatterers. We argue that it is crucial to minimize the amount of data transfers from the CPU to achieve good performance on the GPU. We achieve this by storing the complete trajectories of the dynamic point scatterers as spline curves in the GPU memory. This leads to good efficiency when simulating sequences consisting of a large number of frames, such as B-mode and tissue Doppler data for a full cardiac cycle. In addition, we propose a phase-based subsample delay technique that efficiently eliminates flickering artifacts seen in B-mode sequences when COLE is used without enough temporal oversampling. To assess the performance, we used a laptop computer and a desktop computer, each equipped with a multicore Intel CPU and an NVIDIA GPU. Running the simulator on a high-end TITAN X GPU, we observed two orders of magnitude speedup compared to the parallel CPU version, three orders of magnitude speedup compared to simulation times reported by Gao et al. in their paper on COLE, and a speedup of 27000 times compared to the multithreaded version of Field II, using numbers reported in a paper by Jensen. We hope that by releasing the simulator as an open-source project we will encourage its use and further development.

  19. A pseudospectral matrix method for time-dependent tensor fields on a spherical shell

    International Nuclear Information System (INIS)

    Brügmann, Bernd

    2013-01-01

    We construct a pseudospectral method for the solution of time-dependent, non-linear partial differential equations on a three-dimensional spherical shell. The problem we address is the treatment of tensor fields on the sphere. As a test case we consider the evolution of a single black hole in numerical general relativity. A natural strategy would be the expansion in tensor spherical harmonics in spherical coordinates. Instead, we consider the simpler and potentially more efficient possibility of a double Fourier expansion on the sphere for tensors in Cartesian coordinates. As usual for the double Fourier method, we employ a filter to address time-step limitations and certain stability issues. We find that a tensor filter based on spin-weighted spherical harmonics is successful, while two simplified, non-spin-weighted filters do not lead to stable evolutions. The derivatives and the filter are implemented by matrix multiplication for efficiency. A key technical point is the construction of a matrix multiplication method for the spin-weighted spherical harmonic filter. As example for the efficient parallelization of the double Fourier, spin-weighted filter method we discuss an implementation on a GPU, which achieves a speed-up of up to a factor of 20 compared to a single core CPU implementation

  20. An exact and efficient first passage time algorithm for reaction–diffusion processes on a 2D-lattice

    International Nuclear Information System (INIS)

    Bezzola, Andri; Bales, Benjamin B.; Alkire, Richard C.; Petzold, Linda R.

    2014-01-01

    We present an exact and efficient algorithm for reaction–diffusion–nucleation processes on a 2D-lattice. The algorithm makes use of first passage time (FPT) to replace the computationally intensive simulation of diffusion hops in KMC by larger jumps when particles are far away from step-edges or other particles. Our approach computes exact probability distributions of jump times and target locations in a closed-form formula, based on the eigenvectors and eigenvalues of the corresponding 1D transition matrix, maintaining atomic-scale resolution of resulting shapes of deposit islands. We have applied our method to three different test cases of electrodeposition: pure diffusional aggregation for large ranges of diffusivity rates and for simulation domain sizes of up to 4096×4096 sites, the effect of diffusivity on island shapes and sizes in combination with a KMC edge diffusion, and the calculation of an exclusion zone in front of a step-edge, confirming statistical equivalence to standard KMC simulations. The algorithm achieves significant speedup compared to standard KMC for cases where particles diffuse over long distances before nucleating with other particles or being captured by larger islands

  1. An exact and efficient first passage time algorithm for reaction–diffusion processes on a 2D-lattice

    Energy Technology Data Exchange (ETDEWEB)

    Bezzola, Andri, E-mail: andri.bezzola@gmail.com [Mechanical Engineering Department, University of California, Santa Barbara, CA 93106 (United States); Bales, Benjamin B., E-mail: bbbales2@gmail.com [Mechanical Engineering Department, University of California, Santa Barbara, CA 93106 (United States); Alkire, Richard C., E-mail: r-alkire@uiuc.edu [Department of Chemical Engineering, University of Illinois, Urbana, IL 61801 (United States); Petzold, Linda R., E-mail: petzold@engineering.ucsb.edu [Mechanical Engineering Department and Computer Science Department, University of California, Santa Barbara, CA 93106 (United States)

    2014-01-01

    We present an exact and efficient algorithm for reaction–diffusion–nucleation processes on a 2D-lattice. The algorithm makes use of first passage time (FPT) to replace the computationally intensive simulation of diffusion hops in KMC by larger jumps when particles are far away from step-edges or other particles. Our approach computes exact probability distributions of jump times and target locations in a closed-form formula, based on the eigenvectors and eigenvalues of the corresponding 1D transition matrix, maintaining atomic-scale resolution of resulting shapes of deposit islands. We have applied our method to three different test cases of electrodeposition: pure diffusional aggregation for large ranges of diffusivity rates and for simulation domain sizes of up to 4096×4096 sites, the effect of diffusivity on island shapes and sizes in combination with a KMC edge diffusion, and the calculation of an exclusion zone in front of a step-edge, confirming statistical equivalence to standard KMC simulations. The algorithm achieves significant speedup compared to standard KMC for cases where particles diffuse over long distances before nucleating with other particles or being captured by larger islands.

  2. Interval Abstraction Refinement for Model Checking of Timed-Arc Petri Nets

    DEFF Research Database (Denmark)

    Viesmose, Sine Lyhne; Jacobsen, Thomas Stig; Jensen, Jacob Jon

    2014-01-01

    can be considerably faster but it does not in general guarantee conclusive answers. We implement the algorithms within the open-source model checker TAPAAL and demonstrate on a number of experiments that our approximation techniques often result in a significant speed-up of the verification....

  3. Comparative analysis of clustering methods for gene expression time course data

    Directory of Open Access Journals (Sweden)

    Ivan G. Costa

    2004-01-01

    Full Text Available This work performs a data driven comparative study of clustering methods used in the analysis of gene expression time courses (or time series. Five clustering methods found in the literature of gene expression analysis are compared: agglomerative hierarchical clustering, CLICK, dynamical clustering, k-means and self-organizing maps. In order to evaluate the methods, a k-fold cross-validation procedure adapted to unsupervised methods is applied. The accuracy of the results is assessed by the comparison of the partitions obtained in these experiments with gene annotation, such as protein function and series classification.

  4. Real-time trajectory optimization on parallel processors

    Science.gov (United States)

    Psiaki, Mark L.

    1993-01-01

    A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm that is suitable to do real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an INTEL iPSC/860 message passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on 3 example problems, the Goddard problem, the acceleration-limited, planar minimum-time to the origin problem, and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors. A 32-stage National Aerospace Plane problem required 2 hours when solved on 32 processors. A speed-up factor of 7.2 has been achieved by using 32-nodes instead of 1-node to solve a 64-stage Goddard problem.

  5. Numerical approaches to time evolution of complex quantum systems

    International Nuclear Information System (INIS)

    Fehske, Holger; Schleede, Jens; Schubert, Gerald; Wellein, Gerhard; Filinov, Vladimir S.; Bishop, Alan R.

    2009-01-01

    We examine several numerical techniques for the calculation of the dynamics of quantum systems. In particular, we single out an iterative method which is based on expanding the time evolution operator into a finite series of Chebyshev polynomials. The Chebyshev approach benefits from two advantages over the standard time-integration Crank-Nicolson scheme: speedup and efficiency. Potential competitors are semiclassical methods such as the Wigner-Moyal or quantum tomographic approaches. We outline the basic concepts of these techniques and benchmark their performance against the Chebyshev approach by monitoring the time evolution of a Gaussian wave packet in restricted one-dimensional (1D) geometries. Thereby the focus is on tunnelling processes and the motion in anharmonic potentials. Finally we apply the prominent Chebyshev technique to two highly non-trivial problems of current interest: (i) the injection of a particle in a disordered 2D graphene nanoribbon and (ii) the spatiotemporal evolution of polaron states in finite quantum systems.
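    The Chebyshev propagation idea is to expand exp(-iHt) in Chebyshev polynomials of a rescaled Hamiltonian, with Bessel-function coefficients. The sketch below applies this to a small 1D tight-binding chain; the chain size, hopping, time and expansion order are illustrative assumptions, not the record's graphene or polaron models.

        import numpy as np
        from scipy.special import jv

        n, t = 400, 5.0                            # chain length and evolution time (assumed)
        H = np.zeros((n, n))
        idx = np.arange(n - 1)
        H[idx, idx + 1] = -1.0                     # nearest-neighbour hopping
        H[idx + 1, idx] = -1.0

        emin, emax = -2.0, 2.0                     # spectral bounds of this Hamiltonian
        a, b = (emax - emin) / 2, (emax + emin) / 2
        Hn = (H - b * np.eye(n)) / a               # spectrum rescaled to [-1, 1]

        psi = np.zeros(n, dtype=complex)
        psi[n // 2] = 1.0                          # particle launched at the centre

        order = 60                                 # expansion order (assumed)
        phi_prev, phi = psi, Hn @ psi              # T_0 psi, T_1 psi
        result = jv(0, a * t) * phi_prev + 2 * (-1j) * jv(1, a * t) * phi
        for k in range(2, order + 1):
            phi_next = 2 * Hn @ phi - phi_prev     # Chebyshev recurrence
            result += 2 * (-1j) ** k * jv(k, a * t) * phi_next
            phi_prev, phi = phi, phi_next
        psi_t = np.exp(-1j * b * t) * result

        print("norm of evolved state (should stay 1):", np.linalg.norm(psi_t))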

  6. Spatial search by quantum walk

    International Nuclear Information System (INIS)

    Childs, Andrew M.; Goldstone, Jeffrey

    2004-01-01

    Grover's quantum search algorithm provides a way to speed up combinatorial search, but is not directly applicable to searching a physical database. Nevertheless, Aaronson and Ambainis showed that a database of N items laid out in d spatial dimensions can be searched in time of order √(N) for d>2, and in time of order √(N) poly(log N) for d=2. We consider an alternative search algorithm based on a continuous-time quantum walk on a graph. The case of the complete graph gives the continuous-time search algorithm of Farhi and Gutmann, and other previously known results can be used to show that √(N) speedup can also be achieved on the hypercube. We show that full √(N) speedup can be achieved on a d-dimensional periodic lattice for d>4. In d=4, the quantum walk search algorithm takes time of order √(N) poly(log N), and in d<4, the algorithm does not provide substantial speedup

  7. Vectorization of DOT3.5 code

    International Nuclear Information System (INIS)

    Nonomiya, Iwao; Ishiguro, Misako; Tsutsui, Tsuneo

    1990-07-01

    In this report, we describe the vectorization of the two-dimensional Sn-method radiation transport code DOT3.5. The vectorized codes are not only the NEA original version developed at ORNL but also the versions improved by JAERI: the DOT3.5 FNS version for fusion neutronics analyses, the DOT3.5 FER version for fusion reactor design, and the ESPRIT module of the RADHEAT-V4 code system for radiation shielding and radiation transport analyses. In DOT3.5, input/output processing time amounts to a great part of the elapsed time when a large number of energy groups and/or a large number of spatial mesh points are used in the calculated problem. Therefore, an improvement has been made for the speedup of input/output processing in the DOT3.5 FNS version and the DOT-DD (Double Differential cross section) code. The total speedup ratio of the vectorized version to the original scalar one is 1.7∼1.9 for the DOT3.5 NEA version, 2.2∼2.3 for the DOT3.5 FNS version, 1.7 for the DOT3.5 FER version, and 3.1∼4.4 for RADHEAT-V4, respectively. The elapsed times for the improved DOT3.5 FNS version and DOT-DD are reduced to 50∼65% of that of the original version by the input/output speedup. In this report, we describe a summary of the codes, the techniques used for the vectorization and input/output speedup, verification of computed results, and the speedup effect. (author)

  8. Feynman’s clock, a new variational principle, and parallel-in-time quantum dynamics

    Science.gov (United States)

    McClean, Jarrod R.; Parkhill, John A.; Aspuru-Guzik, Alán

    2013-01-01

    We introduce a discrete-time variational principle inspired by the quantum clock originally proposed by Feynman and use it to write down quantum evolution as a ground-state eigenvalue problem. The construction allows one to apply ground-state quantum many-body theory to quantum dynamics, extending the reach of many highly developed tools from this fertile research area. Moreover, this formalism naturally leads to an algorithm to parallelize quantum simulation over time. We draw an explicit connection between previously known time-dependent variational principles and the time-embedded variational principle presented. Sample calculations are presented, applying the idea to a hydrogen molecule and the spin degrees of freedom of a model inorganic compound, demonstrating the parallel speedup of our method as well as its flexibility in applying ground-state methodologies. Finally, we take advantage of the unique perspective of this variational principle to examine the error of basis approximations in quantum dynamics. PMID:24062428

  9. Accelerating electrostatic surface potential calculation with multi-scale approximation on graphics processing units.

    Science.gov (United States)

    Anandakrishnan, Ramu; Scogland, Tom R W; Fenley, Andrew T; Gordon, John C; Feng, Wu-chun; Onufriev, Alexey V

    2010-06-01

    Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed-up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson-Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multi-scale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. Copyright (c) 2010 Elsevier Inc. All rights reserved.

  10. Preconditioned conjugate gradient methods for the Navier-Stokes equations

    Science.gov (United States)

    Ajmani, Kumud; Ng, Wing-Fai; Liou, Meng-Sing

    1994-01-01

    A preconditioned Krylov subspace method (GMRES) is used to solve the linear systems of equations formed at each time-integration step of the unsteady, two-dimensional, compressible Navier-Stokes equations of fluid flow. The Navier-Stokes equations are cast in an implicit, upwind finite-volume, flux-split formulation. Several preconditioning techniques are investigated to enhance the efficiency and convergence rate of the implicit solver based on the GMRES algorithm. The superiority of the new solver is established by comparisons with a conventional implicit solver, namely line Gauss-Seidel relaxation (LGSR). Computational test results for low-speed (incompressible flow over a backward-facing step at Mach 0.1), transonic flow (trailing edge flow in a transonic turbine cascade), and hypersonic flow (shock-on-shock interactions on a cylindrical leading edge at Mach 6.0) are presented. For the Mach 0.1 case, overall speedup factors of up to 17 (in terms of time-steps) and 15 (in terms of CPU time on a CRAY-YMP/8) are found in favor of the preconditioned GMRES solver, when compared with the LGSR solver. The corresponding speedup factors for the transonic flow case are 17 and 23, respectively. The hypersonic flow case shows slightly lower speedup factors of 9 and 13, respectively. The study of preconditioners conducted in this research reveals that a new LUSGS-type preconditioner is much more efficient than a conventional incomplete LU-type preconditioner.
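    For readers who want to experiment with the same Krylov-plus-preconditioner pairing at small scale, the sketch below runs restarted GMRES with an incomplete-LU preconditioner from SciPy on a generic sparse five-point model matrix. The matrix, drop tolerance and restart length are arbitrary stand-ins; the record's LUSGS-type preconditioner and flux-split Navier-Stokes discretization are not reproduced.

        import numpy as np
        import scipy.sparse as sp
        import scipy.sparse.linalg as spla

        n = 100                                          # grid points per side (assumed)
        main = 4.0 * np.ones(n * n)
        off = -1.0 * np.ones(n * n - 1)
        off[np.arange(1, n * n) % n == 0] = 0.0          # decouple neighbouring grid rows
        far = -1.0 * np.ones(n * n - n)
        A = sp.diags([main, off, off, far, far], [0, -1, 1, -n, n], format="csc")
        b = np.ones(n * n)

        ilu = spla.spilu(A, drop_tol=1e-4)               # incomplete LU factorization
        M = spla.LinearOperator(A.shape, matvec=ilu.solve)

        x, info = spla.gmres(A, b, M=M, restart=30)
        print("converged:", info == 0, " residual norm:", np.linalg.norm(b - A @ x))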

  11. Extraction of Phrase-Structure Fragments with a Linear Average Time Tree-Kernel

    NARCIS (Netherlands)

    van Cranenburgh, Andreas

    2014-01-01

    We present an algorithm and implementation for extracting recurring fragments from treebanks. Using a tree-kernel method the largest common fragments are extracted from each pair of trees. The algorithm presented achieves a thirty-fold speedup over the previously available method on the Wall Street

  12. OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers

    Science.gov (United States)

    Kimura, Keiji; Mase, Masayoshi; Mikami, Hiroki; Miyamoto, Takamichi; Shirako, Jun; Kasahara, Hironori

    OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores to generate parallel programs for various multicores from different vendors by using the OSCAR parallelizing compiler. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in an METI/NEDO project entitled "Multicore Technology for Realtime Consumer Electronics." By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables hierarchical multigrain parallel processing with memory optimization under capacity restriction for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating for various embedded multicores. In addition, a parallelized program automatically generated by the OSCAR compiler with OSCAR API can be compiled by the ordinary OpenMP compilers since the OSCAR API is designed on a subset of the OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and a newly developed consumer electronics multicore chip RP2 by Renesas, Hitachi and Waseda. From the results of scalability evaluation, it is found that on an average, the OSCAR compiler with the OSCAR API can exploit 5.8 times speedup over the sequential execution on the Power5+ workstation with eight cores and 2.9 times speedup on RP2 with four cores, respectively. In addition, the OSCAR compiler can accelerate an IBM XL Fortran compiler up to 3.3 times on the Power6 SMP server. Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API

  13. Horizontal vectorization of electron repulsion integrals.

    Science.gov (United States)

    Pritchard, Benjamin P; Chow, Edmond

    2016-10-30

    We present an efficient implementation of the Obara-Saika algorithm for the computation of electron repulsion integrals that utilizes vector intrinsics to calculate several primitive integrals concurrently in a SIMD vector. Initial benchmarks display a 2-4 times speedup with AVX instructions over comparable scalar code, depending on the basis set. Speedup over scalar code is found to be sensitive to the level of contraction of the basis set, and is best for (lA lB|lC lD) quartets when lD = 0 or lB = lD = 0, which makes such a vectorization scheme particularly suitable for density fitting. The basic Obara-Saika algorithm, how it is vectorized, and the performance bottlenecks are analyzed and discussed. © 2016 Wiley Periodicals, Inc.
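    As a rough NumPy analogue of the batching idea (not the Obara-Saika recursion or the AVX intrinsics code of the record), the sketch below evaluates many primitive s-type Gaussian overlap integrals in one vectorized pass, the way SIMD lanes are filled with several primitives at once; all exponents and centers are randomly generated placeholders.

```python
# Illustrative "horizontal" batching: many primitive (s|s) Gaussian overlap
# integrals evaluated at once with NumPy.  This is NOT the electron-repulsion
# code of the record, only the batching idea.
import numpy as np

rng = np.random.default_rng(0)
n = 1 << 16
alpha = rng.uniform(0.2, 3.0, n)          # exponents of the bra primitives
beta = rng.uniform(0.2, 3.0, n)           # exponents of the ket primitives
RA = rng.normal(size=(n, 3))              # centers A
RB = rng.normal(size=(n, 3))              # centers B

def overlap_scalar(a, b, A, B):
    """One primitive (s|s) overlap: (pi/(a+b))**1.5 * exp(-a*b/(a+b) * |A-B|^2)."""
    p = a + b
    r2 = float(np.dot(A - B, A - B))
    return (np.pi / p) ** 1.5 * np.exp(-a * b / p * r2)

# Scalar reference loop (what a non-vectorized kernel does).
ref = np.array([overlap_scalar(alpha[i], beta[i], RA[i], RB[i]) for i in range(n)])

# Vectorized version: one pass over contiguous arrays, several primitives per "lane".
p = alpha + beta
r2 = np.sum((RA - RB) ** 2, axis=1)
vec = (np.pi / p) ** 1.5 * np.exp(-alpha * beta / p * r2)

print("max abs difference between scalar and vectorized:", np.max(np.abs(vec - ref)))
```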

  14. IMPROVING THE PERFORMANCE OF THE LINEAR SYSTEMS SOLVERS USING CUDA

    Directory of Open Access Journals (Sweden)

    BOGDAN OANCEA

    2012-05-01

    Parallel computing can offer an enormous performance advantage for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high-performance many-core processors that can achieve very high FLOP rates. Since the first idea of using GPUs for general-purpose computing, things have evolved, and there are now several approaches to GPU programming: CUDA from NVIDIA and Stream from AMD. CUDA is now a popular programming model for general-purpose computation on GPUs for C/C++ programmers. A great number of applications have been ported to the CUDA programming model, obtaining speedups of orders of magnitude compared to optimized CPU implementations. In this paper we present an implementation of a library for solving linear systems using the CUDA framework. We present the results of performance tests and show that using a GPU one can obtain speedups of approximately 80 times compared with a CPU implementation.
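    A minimal sketch of how such CPU-versus-GPU speedups are usually measured, using NumPy only; it is not the library described in the record, and the CuPy swap mentioned in the comments is an assumption about one possible accelerated backend.

```python
# Timing harness for a dense linear solve; the "speedup" is the ratio of the
# two measured times.  CPU-only here -- not the CUDA library of the record.
import time
import numpy as np

def bench(solve, A, b, repeats=3):
    """Return the solution and the best wall-clock time over a few repeats."""
    best = float("inf")
    x = None
    for _ in range(repeats):
        t0 = time.perf_counter()
        x = solve(A, b)
        best = min(best, time.perf_counter() - t0)
    return x, best

n = 2000
rng = np.random.default_rng(1)
A = rng.normal(size=(n, n)) + n * np.eye(n)   # diagonally dominant test system
b = rng.normal(size=n)

x_cpu, t_cpu = bench(np.linalg.solve, A, b)

# To measure a GPU speedup, swap in an accelerated backend here, e.g. CuPy's
# cupy.linalg.solve on cupy arrays (assumed available; its API mirrors NumPy),
# then report t_cpu / t_gpu as the speedup factor.
print(f"CPU solve: {t_cpu:.4f} s for a {n}x{n} system")
print("residual norm:", np.linalg.norm(A @ x_cpu - b))
```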

  15. An optimization of a GPU-based parallel wind field module

    International Nuclear Information System (INIS)

    Pinheiro, André L.S.; Shirru, Roberto

    2017-01-01

    Atmospheric radionuclide dispersion systems (ARDS) are important tools to predict the impact of radioactive releases from Nuclear Power Plants and to guide the evacuation of people from affected areas. ARDS comprises four modules: Source Term, Wind Field, Plume Dispersion and Doses Calculations. The slowest is the Wind Field module, which was previously parallelized using the CUDA C language. The purpose of this work is to show the speedup gained by optimizing the already parallel code of the GPU-based Wind Field module, based on the WEST model (Extrapolated from Stability and Terrain). Due to the way the wind field module was parallelized, it was observed that some CUDA processors became idle, contributing to a reduction in speedup. This work proposes a way of allocating these idle CUDA processors in order to increase the speedup. An acceleration of about 4 times can be seen in the comparative case study between the regular CUDA code and the optimized CUDA code. These results are quite encouraging and point out that, even after code has been parallelized, parallel code optimization should still be taken into account. (author)

  16. An optimization of a GPU-based parallel wind field module

    Energy Technology Data Exchange (ETDEWEB)

    Pinheiro, André L.S.; Shirru, Roberto [Coordenacao de Pos-Graduacao e Pesquisa de Engenharia (PEN/COPPE/UFRJ), Rio de Janeiro, RJ (Brazil). Programa de Engenharia Nuclear; Pereira, Cláudio M.N.A., E-mail: apinheiro99@gmail.com, E-mail: schirru@lmp.ufrj.br, E-mail: cmnap@ien.gov.br [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil)

    2017-07-01

    Atmospheric radionuclide dispersion systems (ARDS) are important tools to predict the impact of radioactive releases from Nuclear Power Plants and to guide the evacuation of people from affected areas. ARDS comprises four modules: Source Term, Wind Field, Plume Dispersion and Doses Calculations. The slowest is the Wind Field module, which was previously parallelized using the CUDA C language. The purpose of this work is to show the speedup gained by optimizing the already parallel code of the GPU-based Wind Field module, based on the WEST model (Extrapolated from Stability and Terrain). Due to the way the wind field module was parallelized, it was observed that some CUDA processors became idle, contributing to a reduction in speedup. This work proposes a way of allocating these idle CUDA processors in order to increase the speedup. An acceleration of about 4 times can be seen in the comparative case study between the regular CUDA code and the optimized CUDA code. These results are quite encouraging and point out that, even after code has been parallelized, parallel code optimization should still be taken into account. (author)

  17. Parallel protein secondary structure prediction based on neural networks.

    Science.gov (United States)

    Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi

    2004-01-01

    Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and tertiary classifiers for protein secondary structure prediction are implemented on the Denoeux belief neural network (DBNN) architecture. A hydrophobicity matrix, an orthogonal matrix, BLOSUM62 and PSSM (position-specific scoring matrix) are tested separately as encoding schemes for the DBNN. The experimental results contribute to the design of new encoding schemes. A new binary classifier for Helix versus not-Helix (~H) for the DBNN produces a prediction accuracy of 87% when PSSM is used for the input profile. The performance of the DBNN binary classifier is comparable to the best prediction methods. The good test results for binary classifiers open a new approach for protein structure prediction with neural networks. Because training the neural networks is time-consuming, Pthreads and OpenMP are employed to parallelize the DBNN on a hyperthreading-enabled Intel architecture. The speedup for 16 Pthreads is 4.9 and the speedup for 16 OpenMP threads is 4 on the 4-processor shared-memory architecture. The speedup performance of both OpenMP and Pthreads is superior to that reported in other research. With the new parallel training algorithm, thousands of amino acids can be processed in a reasonable amount of time. Our research also shows that hyperthreading technology on the Intel architecture is efficient for parallel biological algorithms.
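    A hedged sketch of how such a training speedup can be measured on a multicore machine, using Python's multiprocessing as a stand-in for the Pthreads/OpenMP parallelization above; the dummy workload and worker count are illustrative only. Processes rather than threads are used because CPython's GIL prevents CPU-bound threads from running in parallel, unlike the native-thread setting of the record.

```python
# Measuring parallel speedup of an embarrassingly parallel, CPU-bound workload
# (a stand-in for training independent classifiers).  Not the DBNN code.
import time
from multiprocessing import Pool

def work(seed, n=200_000):
    # A deliberately CPU-bound dummy task.
    acc = 0.0
    x = float(seed) + 1.0
    for i in range(1, n):
        acc += (x * i) % 7 / (i + 1)
    return acc

def timed(fn):
    t0 = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - t0

if __name__ == "__main__":
    tasks = list(range(16))
    _, t_serial = timed(lambda: [work(s) for s in tasks])
    with Pool(processes=4) as pool:
        _, t_parallel = timed(lambda: pool.map(work, tasks))
    print(f"serial {t_serial:.2f} s, parallel {t_parallel:.2f} s, "
          f"speedup = {t_serial / t_parallel:.2f}x on 4 worker processes")
```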

  18. Seasonal time series forecasting: a comparative study of arima and ...

    African Journals Online (AJOL)

    This paper addresses the concerns of Faraway and Chatfield (1998) who questioned the forecasting ability of Artificial Neural Networks (ANN). In particular the paper compares the performance of Artificial Neural Networks (ANN) and ARIMA models in forecasting of seasonal (monthly) Time series. Using the Airline data ...

  19. GPU-based parallel computing in real-time modeling of atmospheric transport and diffusion of radioactive material

    Energy Technology Data Exchange (ETDEWEB)

    Santos, Marcelo C. dos; Pereira, Claudio M.N.A.; Schirru, Roberto; Pinheiro, André, E-mail: jovitamarcelo@gmail.com, E-mail: cmnap@ien.gov.br, E-mail: schirru@lmp.ufrj.br, E-mail: apinheiro99@gmail.com [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil); Coordenacao de Pos-Graduacao e Pesquisa de Engenharia (COPPE/UFRJ), Rio de Janeiro, RJ (Brazil). Programa de Engenharia Nuclear

    2017-07-01

    Atmospheric radionuclide dispersion systems (ARDS) are essential mechanisms to predict the consequences of unexpected radioactive releases from nuclear power plants. Considering, that during an eventuality of an accident with a radioactive material release, an accurate forecast is vital to guide the evacuation plan of the possible affected areas. However, in order to predict the dispersion of the radioactive material and its impact on the environment, the model must process information about source term (radioactive materials released, activities and location), weather condition (wind, humidity and precipitation) and geographical characteristics (topography). Furthermore, ARDS is basically composed of 4 main modules: Source Term, Wind Field, Plume Dispersion and Doses Calculations. The Wind Field and Plume Dispersion modules are the ones that require a high computational performance to achieve accurate results within an acceptable time. Taking this into account, this work focuses on the development of a GPU-based parallel Plume Dispersion module, focusing on the radionuclide transport and diffusion calculations, which use a given wind field and a released source term as parameters. The program is being developed using the C ++ programming language, allied with CUDA libraries. In comparative case study between a parallel and sequential version of the slower function of the Plume Dispersion module, a speedup of 11.63 times could be observed. (author)

  20. GPU-based parallel computing in real-time modeling of atmospheric transport and diffusion of radioactive material

    International Nuclear Information System (INIS)

    Santos, Marcelo C. dos; Pereira, Claudio M.N.A.; Schirru, Roberto; Pinheiro, André; Coordenacao de Pos-Graduacao e Pesquisa de Engenharia

    2017-01-01

    Atmospheric radionuclide dispersion systems (ARDS) are essential mechanisms to predict the consequences of unexpected radioactive releases from nuclear power plants. Considering, that during an eventuality of an accident with a radioactive material release, an accurate forecast is vital to guide the evacuation plan of the possible affected areas. However, in order to predict the dispersion of the radioactive material and its impact on the environment, the model must process information about source term (radioactive materials released, activities and location), weather condition (wind, humidity and precipitation) and geographical characteristics (topography). Furthermore, ARDS is basically composed of 4 main modules: Source Term, Wind Field, Plume Dispersion and Doses Calculations. The Wind Field and Plume Dispersion modules are the ones that require a high computational performance to achieve accurate results within an acceptable time. Taking this into account, this work focuses on the development of a GPU-based parallel Plume Dispersion module, focusing on the radionuclide transport and diffusion calculations, which use a given wind field and a released source term as parameters. The program is being developed using the C ++ programming language, allied with CUDA libraries. In comparative case study between a parallel and sequential version of the slower function of the Plume Dispersion module, a speedup of 11.63 times could be observed. (author)

  1. Beyond the sticker price: including and excluding time in comparing food prices.

    Science.gov (United States)

    Yang, Yanliang; Davis, George C; Muth, Mary K

    2015-07-01

    An ongoing debate in the literature is how to measure the price of food. Most analyses have not considered the value of time in measuring the price of food. Whether or not the value of time is included in measuring the price of a food may have important implications for classifying foods based on their relative cost. The purpose of this article is to compare prices that exclude time (time-exclusive price) with prices that include time (time-inclusive price) for 2 types of home foods: home foods using basic ingredients (home recipes) vs. home foods using more processed ingredients (processed recipes). The time-inclusive and time-exclusive prices are compared to determine whether the time-exclusive prices in isolation may mislead in drawing inferences regarding the relative prices of foods. We calculated the time-exclusive price and time-inclusive price of 100 home recipes and 143 processed recipes and then categorized them into 5 standard food groups: grains, proteins, vegetables, fruit, and dairy. We then examined the relation between the time-exclusive prices and the time-inclusive prices and dietary recommendations. For any food group, the processed food time-inclusive price was always less than the home recipe time-inclusive price, even if the processed food's time-exclusive price was more expensive. Time-inclusive prices for home recipes were especially higher for the more time-intensive food groups, such as grains, vegetables, and fruit, which are generally underconsumed relative to the guidelines. Focusing only on the sticker price of a food and ignoring the time cost may lead to different conclusions about relative prices and policy recommendations than when the time cost is included. © 2015 American Society for Nutrition.

  2. Optimization Techniques for Dimensionally Truncated Sparse Grids on Heterogeneous Systems

    KAUST Repository

    Deftu, A.

    2013-02-01

    Given the existing heterogeneous processor landscape dominated by CPUs and GPUs, topics such as programming productivity and performance portability have become increasingly important. In this context, an important question is how we can develop optimization strategies that cover both CPUs and GPUs. We answer this for fastsg, a library that provides functionality for handling high-dimensional functions efficiently. As it can be employed for compressing and decompressing large-scale simulation data, it finds itself at the core of a computational steering application which serves us as test case. We describe our experience with implementing fastsg's time-critical routines for Intel CPUs and Nvidia Fermi GPUs. We show the differences and especially the similarities between our optimization strategies for the two architectures. With regard to our test case, for which achieving high speedups is a "must" for real-time visualization, we report a speedup of up to 6.2x compared to the state-of-the-art implementation of the sparse grid technique for GPUs. © 2013 IEEE.

  3. Time dependent analysis of assay comparability: a novel approach to understand intra- and inter-site variability over time

    Science.gov (United States)

    Winiwarter, Susanne; Middleton, Brian; Jones, Barry; Courtney, Paul; Lindmark, Bo; Page, Ken M.; Clark, Alan; Landqvist, Claire

    2015-09-01

    We demonstrate here a novel use of statistical tools to study intra- and inter-site assay variability of five early drug metabolism and pharmacokinetics in vitro assays over time. Firstly, a tool for process control is presented. It shows the overall assay variability, but also allows changes due to assay adjustments to be followed and can additionally highlight other, potentially unexpected variations. Secondly, we define the minimum discriminatory difference/ratio to help projects understand how experimental values measured at different sites at a given time can be compared. Such discriminatory values are calculated for 3-month periods and followed over time for each assay. Again, assay modifications, especially assay harmonization efforts, can be noted. Both the process control tool and the variability estimates are based on the results of control compounds tested every time an assay is run. Variability estimates for a limited set of project compounds were computed as well and found to be comparable. This analysis reinforces the need to consider assay variability in decision making, compound ranking and in silico modeling.

  4. Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics

    KAUST Repository

    Cecka, Cris

    2012-01-01

    This chapter discusses multiple strategies to perform general computations on unstructured grids, with specific application to the assembly of matrices in finite element methods (FEMs). It reviews and applies two methods for assembly of FEMs to produce and accelerate a FEM model for a nonlinear hyperelastic solid where the assembly, solution, update, and visualization stages are performed solely on the GPU, benefiting from speed-ups in each stage and avoiding costly GPU-CPU transfers of data. For each method, the chapter discusses the NVIDIA GPU hardware's limiting resources, optimizations, key data structures, and dependence of the performance with respect to problem size, element size, and GPU hardware generation. Furthermore, this chapter informs potential users of the benefits of GPU technology, provides guidelines to help them implement their own FEM solutions, gives potential speed-ups that can be expected, and provides source code for reference. © 2012 Elsevier Inc. All rights reserved.

  5. Making historic loss data comparable over time and place

    Science.gov (United States)

    Eichner, Jan; Steuer, Markus; Löw, Petra

    2017-04-01

    When utilizing historic loss data for present day risk assessment, it is necessary to make the data comparable over time and place. To achieve this, the assessment of costs from natural hazard events requires consistent and homogeneous methodologies for loss estimation as well as a robust treatment of loss data to estimate and/or reduce distorting effects due to a temporal bias in the reporting of small-scale loss events. Here we introduce Munich Re's NatCatSERVICE loss database and present a novel methodology of peril-specific normalization of the historic losses (to account for socio-economic growth of assets over time), and we introduce a metric of severity classification (called CatClass) that allows for a global comparison of impact severity across countries of different stages of economic development.

  6. A Near-linear Time Approximation Algorithm for Angle-based Outlier Detection in High-dimensional Data

    DEFF Research Database (Denmark)

    Pham, Ninh Dang; Pagh, Rasmus

    2012-01-01

    Outlier mining in d-dimensional point sets is a fundamental and well studied data mining task due to its variety of applications. Most such applications arise in high-dimensional domains. A bottleneck of existing approaches is that implicit or explicit assessments on concepts of distance or nearest neighbor are deteriorated in high-dimensional data. Following up on the work of Kriegel et al. (KDD '08), we investigate the use of angle-based outlier factor in mining high-dimensional outliers. While their algorithm runs in cubic time (with a quadratic time heuristic), we propose a novel random projection-based technique that is able to estimate the angle-based outlier factor for all data points in time near-linear in the size of the data. Also, our approach is suitable to be performed in parallel environment to achieve a parallel speedup. We introduce a theoretical analysis of the quality ...
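    To make the estimated quantity concrete, the sketch below computes a brute-force-style angle-based outlier factor (the variance of distance-weighted angles over sampled point pairs, following Kriegel et al.); it is not the random-projection, near-linear-time estimator proposed in the record, and all data are synthetic.

```python
# Angle-based outlier factor (ABOF) sketch: for each point, the variance over
# point pairs of a distance-weighted angle.  Low variance suggests an outlier.
import numpy as np

def abof_scores(X, n_pairs=200, seed=None):
    rng = np.random.default_rng(seed)
    n = len(X)
    scores = np.empty(n)
    for i in range(n):
        others = np.delete(np.arange(n), i)
        pairs = rng.choice(others, size=(n_pairs, 2))
        va = X[pairs[:, 0]] - X[i]
        vb = X[pairs[:, 1]] - X[i]
        na2 = np.einsum("ij,ij->i", va, va)
        nb2 = np.einsum("ij,ij->i", vb, vb)
        weighted_angle = np.einsum("ij,ij->i", va, vb) / (na2 * nb2)
        scores[i] = np.var(weighted_angle)
    return scores  # low ABOF => likely outlier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 10)),          # inliers
               rng.normal(6.0, 1.0, size=(3, 10))])  # three planted outliers
scores = abof_scores(X, seed=1)
print("indices with the 3 lowest ABOF scores:", np.argsort(scores)[:3])
```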

  7. SPEEDUP™ ion exchange column model

    International Nuclear Information System (INIS)

    Hang, T.

    2000-01-01

    A transient model to describe the process of loading a solute onto the granular fixed bed in an ion exchange (IX) column has been developed using the SpeedUp™ software package. SpeedUp offers the advantage of smooth integration into other existing SpeedUp flowsheet models. The mathematical algorithm of a porous particle diffusion model was adopted to account for convection, axial dispersion, film mass transfer, and pore diffusion. The method of orthogonal collocation on finite elements was employed to solve the governing transport equations. The model allows the use of a non-linear Langmuir isotherm based on an effective binary ionic exchange process. The SpeedUp column model was tested by comparing to the analytical solutions of three transport problems from the ion exchange literature. In addition, a sample calculation of a train of three crystalline silicotitanate (CST) IX columns in series was made using both the SpeedUp model and Purdue University's VERSE-LC code. All test cases showed excellent agreement between the SpeedUp model results and the test data. The model can be readily used for SuperLig™ ion exchange resins, once the experimental data are complete.
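    For reference, the non-linear Langmuir isotherm mentioned above has the closed form q(c) = q_max·K·c/(1 + K·c); the short sketch below evaluates it for a hypothetical resin (the parameter values are placeholders, not CST or SuperLig™ data).

```python
# Minimal sketch of a Langmuir isotherm evaluation; parameters are hypothetical.
import numpy as np

def langmuir(c, q_max, K):
    """Equilibrium loading q (per unit resin) at liquid-phase concentration c."""
    c = np.asarray(c, dtype=float)
    return q_max * K * c / (1.0 + K * c)

c = np.linspace(0.0, 2.0, 9)          # mol/L, illustrative range
q = langmuir(c, q_max=0.58, K=12.0)   # hypothetical q_max [mol/kg] and K [L/mol]

for ci, qi in zip(c, q):
    print(f"c = {ci:4.2f} mol/L  ->  q = {qi:5.3f} mol/kg")
```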

  8. Accelerating Electrostatic Surface Potential Calculation with Multiscale Approximation on Graphics Processing Units

    Science.gov (United States)

    Anandakrishnan, Ramu; Scogland, Tom R. W.; Fenley, Andrew T.; Gordon, John C.; Feng, Wu-chun; Onufriev, Alexey V.

    2010-01-01

    Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multiscale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. PMID:20452792

  9. Building Input Adaptive Parallel Applications: A Case Study of Sparse Grid Interpolation

    KAUST Repository

    Murarasu, Alin

    2012-12-01

    The well-known power wall resulting in multi-cores requires special techniques for speeding up applications. In this sense, parallelization plays a crucial role. Besides standard serial optimizations, techniques such as input specialization can also bring a substantial contribution to the speedup. By identifying common patterns in the input data, we propose new algorithms for sparse grid interpolation that accelerate the state-of-the-art non-specialized version. Sparse grid interpolation is an inherently hierarchical method of interpolation employed for example in computational steering applications for decompressing high-dimensional simulation data. In this context, improving the speedup is essential for real-time visualization. Using input specialization, we report a speedup of up to 9x over the non-specialized version. The paper covers the steps we took to reach this speedup by means of input adaptivity. Our algorithms will be integrated in fastsg, a library for fast sparse grid interpolation. © 2012 IEEE.

  10. Diagnostic time in digital pathology: A comparative study on 400 cases

    Directory of Open Access Journals (Sweden)

    Aleksandar Vodovnik

    2016-01-01

    Background: Numerous validation studies in digital pathology have confirmed its value as a diagnostic tool. However, a longer time to diagnosis than traditional microscopy has been seen as a significant barrier to the routine use of digital pathology. As a part of our validation study, we compared digital and microscopic diagnostic times in the routine diagnostic setting. Materials and Methods: One senior staff pathologist reported 400 consecutive cases in histology, nongynecological, and fine needle aspiration cytology (20 sessions, 20 cases/session, over 4 weeks). Complex, difficult, and rare cases were excluded from the study to reduce bias. The primary diagnosis was digital, followed by traditional microscopy 6 months later, with only request forms available for both. Microscopic slides were scanned at ×20, and digital images were accessed through the fully integrated laboratory information management system (LIMS) and viewed in the image viewer on double 23" displays. The median broadband speed was 299 Mbps. Diagnostic time was measured from the point slides were made available to the point a diagnosis was made or additional investigations were deemed necessary, recorded independently in minutes/session and compared. Results: The digital diagnostic time was 1841 min and the microscopic 1956 min; digital was shorter than microscopic in 13 sessions. Four sessions with shorter microscopic diagnostic time included more cases requiring extensive use of magnifications over ×20. Diagnostic time was similar in three sessions. Conclusions: Diagnostic time in digital pathology can be shorter than traditional microscopy in the routine diagnostic setting, with adequate and stable network speeds, fully integrated LIMS, and double displays as default parameters. This also related to better ergonomics, a larger viewing field, and the absence of physical slide handling, with effects on both diagnostic and nondiagnostic time. Differences with previous studies included a design

  11. Real-time unmanned aircraft systems surveillance video mosaicking using GPU

    Science.gov (United States)

    Camargo, Aldo; Anderson, Kyle; Wang, Yi; Schultz, Richard R.; Fevig, Ronald A.

    2010-04-01

    Digital video mosaicking from Unmanned Aircraft Systems (UAS) is being used for many military and civilian applications, including surveillance, target recognition, border protection, forest fire monitoring, traffic control on highways, and monitoring of transmission lines, among others. Additionally, NASA is using digital video mosaicking to explore the moon and planets such as Mars. In order to compute a "good" mosaic from video captured by a UAS, the algorithm must deal with motion blur, frame-to-frame jitter associated with an imperfectly stabilized platform, perspective changes as the camera tilts in flight, as well as a number of other factors. The most suitable algorithms use SIFT (Scale-Invariant Feature Transform) to detect the features consistent between video frames. Utilizing these features, the next step is to estimate the homography between two consecutive video frames, perform warping to properly register the image data, and finally blend the video frames, resulting in a seamless video mosaic. All this processing takes a great deal of resources from the CPU, so it is almost impossible to compute a real-time video mosaic on a single processor. Modern graphics processing units (GPUs) offer computational performance that far exceeds current CPU technology, allowing for real-time operation. This paper presents the development of a GPU-accelerated digital video mosaicking implementation and compares it with CPU performance. Our tests are based on two sets of real video captured by a small UAS aircraft; the videos come from Infrared (IR) and Electro-Optical (EO) cameras. Our results show that we can obtain a speed-up of more than 50 times using GPU technology, so real-time operation at a video capture rate of 30 frames per second is feasible.
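    The registration chain described above (SIFT features, homography estimation, warping, blending) can be sketched on the CPU with OpenCV as below; this is an illustration of the pipeline, not the paper's GPU implementation, and the frame file names are placeholders.

```python
# CPU-only sketch of frame-to-frame registration for mosaicking: SIFT features,
# ratio-test matching, RANSAC homography, warping, and a naive blend.
import cv2
import numpy as np

def register_pair(frame_prev, frame_next):
    """Estimate the homography mapping frame_next onto frame_prev."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(frame_prev, None)
    k2, d2 = sift.detectAndCompute(frame_next, None)

    # Lowe's ratio test on brute-force matches.
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(d2, d1, k=2)
            if m.distance < 0.75 * n.distance]

    src = np.float32([k2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H

def warp_onto(frame_prev, frame_next, H):
    """Warp frame_next into the coordinate frame of frame_prev and blend."""
    h, w = frame_prev.shape[:2]
    warped = cv2.warpPerspective(frame_next, H, (w, h))
    return cv2.addWeighted(frame_prev, 0.5, warped, 0.5, 0)

# Usage (paths are placeholders):
# a = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
# b = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)
# mosaic = warp_onto(a, b, register_pair(a, b))
```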

  12. A Parallel Framework with Block Matrices of a Discrete Fourier Transform for Vector-Valued Discrete-Time Signals

    Directory of Open Access Journals (Sweden)

    Pablo Soto-Quiros

    2015-01-01

    This paper presents a parallel implementation of a kind of discrete Fourier transform (DFT): the vector-valued DFT. The vector-valued DFT is a novel tool to analyze the spectra of vector-valued discrete-time signals. This parallel implementation is developed in terms of a mathematical framework with a set of block matrix operations. These block matrix operations contribute to the analysis, design, and implementation of parallel algorithms in multicore processors. In this work, an implementation and experimental investigation of the mathematical framework are performed using MATLAB with the Parallel Computing Toolbox. We found that there is an advantage to using multicore processors and a parallel computing environment to reduce the high execution time. Additionally, the speedup increases when the number of logical processors and the length of the signal increase.

  13. GPU-accelerated algorithms for many-particle continuous-time quantum walks

    Science.gov (United States)

    Piccinini, Enrico; Benedetti, Claudia; Siloi, Ilaria; Paris, Matteo G. A.; Bordone, Paolo

    2017-06-01

    Many-particle continuous-time quantum walks (CTQWs) represent a resource for several tasks in quantum technology, including quantum search algorithms and universal quantum computation. In order to design and implement CTQWs in a realistic scenario, one needs effective simulation tools for Hamiltonians that take into account static noise and fluctuations in the lattice, i.e. Hamiltonians containing stochastic terms. To this aim, we suggest a parallel algorithm based on the Taylor series expansion of the evolution operator, and compare its performance with that of algorithms based on the exact diagonalization of the Hamiltonian or a 4th-order Runge-Kutta integration. We prove that both the Taylor-series expansion and Runge-Kutta algorithms are reliable and have a low computational cost, the Taylor-series expansion showing the additional advantage of a memory allocation not depending on the precision of calculation. Both algorithms are also highly parallelizable within the SIMT paradigm, and are thus suitable for GPGPU computing. In turn, we have benchmarked 4 NVIDIA GPUs and 3 quad-core Intel CPUs for a 2-particle system over lattices of increasing dimension, showing that the speedup provided by GPU computing, with respect to the OPENMP parallelization, lies in the range between 8x and (more than) 20x, depending on the frequency of post-processing. GPU-accelerated codes thus allow one to overcome concerns about the execution time, and make simulations with many interacting particles on large lattices possible, with the only limit being the memory available on the device.
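    A minimal sketch of the Taylor-series propagation idea: apply exp(-iHΔt) to a state by summing a few terms of the series, using only matrix-vector products, and compare against a dense matrix exponential. The Hamiltonian below is a random Hermitian toy, not the noisy CTQW Hamiltonians of the record.

```python
# Taylor-series step for the evolution operator versus scipy's dense expm.
import numpy as np
from scipy.linalg import expm

def taylor_step(H, psi, dt, order=20):
    """psi(t+dt) ~= sum_k ((-i*dt)^k / k!) H^k psi, built term by term."""
    out = psi.copy()
    term = psi.copy()
    for k in range(1, order + 1):
        term = (-1j * dt / k) * (H @ term)
        out = out + term
    return out

rng = np.random.default_rng(0)
n = 64
A = rng.normal(size=(n, n))
H = (A + A.T) / 2                      # Hermitian toy Hamiltonian
psi0 = np.zeros(n, dtype=complex)
psi0[0] = 1.0

dt = 0.05
psi_taylor = taylor_step(H, psi0, dt)
psi_exact = expm(-1j * dt * H) @ psi0

print("max deviation from expm:", np.max(np.abs(psi_taylor - psi_exact)))
print("norm after Taylor step:", np.linalg.norm(psi_taylor))
```

    The key design point mirrored here is that the Taylor step needs only repeated matrix-vector products, which map naturally onto GPU kernels, whereas exact diagonalization or a dense exponential does not.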

  14. Rough surface scattering simulations using graphics cards

    International Nuclear Information System (INIS)

    Klapetek, Petr; Valtr, Miroslav; Poruba, Ales; Necas, David; Ohlidal, Miloslav

    2010-01-01

    In this article we present results of rough surface scattering calculations using a graphical processing unit implementation of the Finite Difference in Time Domain algorithm. Numerical results are compared to real measurements and computational performance is compared to a computer processor implementation of the same algorithm. As a basis for the computations, atomic force microscope measurements of surface morphology are used. It is shown that graphical processing unit capabilities can be used to speed up the presented computationally demanding algorithms without loss of precision.

  15. Real-time implementation of optimized maximum noise fraction transform for feature extraction of hyperspectral images

    Science.gov (United States)

    Wu, Yuanfeng; Gao, Lianru; Zhang, Bing; Zhao, Haina; Li, Jun

    2014-01-01

    We present a parallel implementation of the optimized maximum noise fraction (G-OMNF) transform algorithm for feature extraction of hyperspectral images on commodity graphics processing units (GPUs). The proposed approach explored the algorithm data-level concurrency and optimized the computing flow. We first defined a three-dimensional grid, in which each thread calculates a sub-block data to easily facilitate the spatial and spectral neighborhood data searches in noise estimation, which is one of the most important steps involved in OMNF. Then, we optimized the processing flow and computed the noise covariance matrix before computing the image covariance matrix to reduce the original hyperspectral image data transmission. These optimization strategies can greatly improve the computing efficiency and can be applied to other feature extraction algorithms. The proposed parallel feature extraction algorithm was implemented on an Nvidia Tesla GPU using the compute unified device architecture and basic linear algebra subroutines library. Through the experiments on several real hyperspectral images, our GPU parallel implementation provides a significant speedup of the algorithm compared with the CPU implementation, especially for highly data parallelizable and arithmetically intensive algorithm parts, such as noise estimation. In order to further evaluate the effectiveness of G-OMNF, we used two different applications: spectral unmixing and classification for evaluation. Considering the sensor scanning rate and the data acquisition time, the proposed parallel implementation met the on-board real-time feature extraction.

  16. Operating Time Division for a Bus Route Based on the Recovery of GPS Data

    Directory of Open Access Journals (Sweden)

    Jian Wang

    2017-01-01

    Bus travel time is an important source of data for time-of-day partition of a bus route. However, in practice, a bus driver may deliberately speed up or slow down en route so as to follow the predetermined timetable. The raw GPS data collected by the GPS device equipped on the bus, as a result, cannot reflect its real operating conditions. To address this concern, this study first develops a method to identify whether there is deliberate speed-up or slow-down movement of a bus. Building upon the relationships between the intersection delay, link travel time, and traffic flow, a recovery method is established for calculating the real bus travel time. Using the dwell time at each stop and the recovered travel time between each of them as the division indexes, a sequential clustering-based time-of-day partition method is proposed. The effectiveness of the developed method is demonstrated using the data of bus route 63 in Harbin, China. Results show that the partition method can help bus enterprises to design reasonable time-of-day intervals and significantly improve their level of service.

  17. Adiabatic condition and the quantum hitting time of Markov chains

    International Nuclear Information System (INIS)

    Krovi, Hari; Ozols, Maris; Roland, Jeremie

    2010-01-01

    We present an adiabatic quantum algorithm for the abstract problem of searching marked vertices in a graph, or spatial search. Given a random walk (or Markov chain) P on a graph with a set of unknown marked vertices, one can define a related absorbing walk P ' where outgoing transitions from marked vertices are replaced by self-loops. We build a Hamiltonian H(s) from the interpolated Markov chain P(s)=(1-s)P+sP ' and use it in an adiabatic quantum algorithm to drive an initial superposition over all vertices to a superposition over marked vertices. The adiabatic condition implies that, for any reversible Markov chain and any set of marked vertices, the running time of the adiabatic algorithm is given by the square root of the classical hitting time. This algorithm therefore demonstrates a novel connection between the adiabatic condition and the classical notion of hitting time of a random walk. It also significantly extends the scope of previous quantum algorithms for this problem, which could only obtain a full quadratic speedup for state-transitive reversible Markov chains with a unique marked vertex.
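    A small classical companion to the definitions above: the sketch builds a reversible walk P on a cycle, the absorbing walk P' with self-loops on marked vertices, the interpolation P(s) = (1-s)P + sP', and computes the classical hitting time by solving a linear system. The graph and marked set are toy choices, and the quantum algorithm itself is not simulated.

```python
# Classical ingredients of the record: P, the absorbing walk P', the interpolated
# chain P(s), and the classical hitting time of the marked set.
import numpy as np

def cycle_walk(n):
    """Simple random walk on an n-cycle (a reversible Markov chain)."""
    P = np.zeros((n, n))
    for i in range(n):
        P[i, (i - 1) % n] = 0.5
        P[i, (i + 1) % n] = 0.5
    return P

def absorbing_walk(P, marked):
    Pp = P.copy()
    for m in marked:
        Pp[m, :] = 0.0
        Pp[m, m] = 1.0          # outgoing transitions replaced by a self-loop
    return Pp

def hitting_time(P, marked, start):
    """Expected steps to reach the marked set, from h = 1 + P_uu h on unmarked vertices."""
    n = P.shape[0]
    unmarked = [i for i in range(n) if i not in marked]
    Puu = P[np.ix_(unmarked, unmarked)]
    h = np.linalg.solve(np.eye(len(unmarked)) - Puu, np.ones(len(unmarked)))
    full = np.zeros(n)
    full[unmarked] = h
    return full[start]

n, marked = 16, {0}
P = cycle_walk(n)
Pp = absorbing_walk(P, marked)
s = 0.5
P_s = (1 - s) * P + s * Pp       # the interpolated chain used to build H(s)

HT = hitting_time(P, marked, start=n // 2)
print(f"classical hitting time from vertex {n//2}: {HT:.1f}")
print(f"square root (the adiabatic running-time scaling): {np.sqrt(HT):.2f}")
```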

  18. Across-province standardization and comparative analysis of time-to-care intervals for cancer

    Directory of Open Access Journals (Sweden)

    Nugent Zoann

    2007-10-01

    Background: A set of consistent, standardized definitions of intervals and populations on which to report across provinces is needed to inform the Provincial/Territorial Deputy Ministries of Health on progress of the Ten-Year Plan to Strengthen Health Care. The objectives of this project were to: 1) identify a set of criteria and variables needed to create comparable measures of important time-to-cancer-care intervals that could be applied across provinces and 2) use the measures to compare time-to-care across participating provinces for lung and colorectal cancer patients diagnosed in 2004. Methods: A broad-based group of stakeholders from each of the three participating cancer agencies was assembled to identify criteria for time-to-care intervals to standardize, evaluate possible intervals and their corresponding start and end time points, and finalize the selection of intervals to pursue. Inclusion/exclusion criteria were identified for the patient population and the selected time points to reduce potential selection bias. The provincial 2004 colorectal and lung cancer data were used to illustrate across-province comparisons for the selected time-to-care intervals. Results: Criteria identified as critical for time-to-care intervals and corresponding start and end points were: 1) relevant to patients, 2) relevant to clinical care, 3) unequivocally defined, and 4) currently captured consistently across cancer agencies. Time from diagnosis to first radiation or chemotherapy treatment and the smaller components, time from diagnosis to first consult with an oncologist and time from first consult to first radiation or chemotherapy treatment, were the only intervals that met all four criteria. Timeliness of care for the intervals evaluated was similar between the provinces for lung cancer patients but significant differences were found for colorectal cancer patients. Conclusion: We identified criteria important for selecting time-to-care intervals

  19. Dynamical decoupling assisted acceleration of two-spin evolution in XY spin-chain environment

    Energy Technology Data Exchange (ETDEWEB)

    Wei, Yong-Bo; Zou, Jian [School of Physics, Beijing Institute of Technology, Beijing 100081 (China); Wang, Zhao-Ming [Department of Physics, Ocean University of China, Qingdao 266100 (China); Shao, Bin, E-mail: sbin610@bit.edu.cn [School of Physics, Beijing Institute of Technology, Beijing 100081 (China); Li, Hai [School of Information and Electronic Engineering, Shandong Institute of Business and Technology, Yantai 264000 (China)

    2016-01-28

    We study the speed-up role of dynamical decoupling in an open system, which is modeled as two central spins coupled to their own XY spin-chain environment. We show that fast bang–bang pulses can suppress the system evolution, which manifests the quantum Zeno effect. In contrast, as the pulse interval time increases, the bang–bang pulses can enhance the decay of the quantum speed limit time and induce a speed-up process, which displays the quantum anti-Zeno effect. In addition, we show that random pulses can also induce a speed-up of the quantum evolution. - Highlights: • We propose a scheme to accelerate the dynamical evolution of central spins in an open system. • The quantum speed limit of central spins can be modulated by changing the pulse frequency. • The random pulses can play the same role as the regular pulses do for small perturbation.

  20. DTW4Omics: comparing patterns in biological time series.

    Directory of Open Access Journals (Sweden)

    Rachel Cavill

    When studying time courses of biological measurements and comparing these to other measurements, e.g. gene expression and phenotypic endpoints, the analysis is complicated by the fact that although the associated elements may show the same patterns of behaviour, the changes do not occur simultaneously. In these cases standard correlation-based measures of similarity will fail to find significant associations. Dynamic time warping (DTW) is a technique which can be used in these situations to find the optimal match between two time courses, which may then be assessed for its significance. We implement DTW4Omics, a tool for performing DTW in R. This tool extends existing R scripts for DTW, making them applicable for "omics" datasets where thousands of entities may need to be compared with a range of markers and endpoints. It includes facilities to estimate the significance of the matches between the supplied data, and provides a set of plots to enable the user to easily visualise the output. We illustrate the utility of this approach using a dataset linking the exposure of the colon carcinoma Caco-2 cell line to oxidative stress by hydrogen peroxide (H2O2) and menadione across 9 timepoints and show that on average 85% of the genes found are not obtained from a standard correlation analysis between the genes and the measured phenotypic endpoints. We then show that when we analyse the genes identified by DTW4Omics as significantly associated with a marker for oxidative DNA damage (8-oxodG) through over-representation, an Oxidative Stress pathway is identified as the most over-represented pathway, demonstrating that the genes found by DTW4Omics are biologically relevant. In contrast, when the positively correlated genes were similarly analysed, no pathways were found. The tool is implemented as an R Package and is available, along with a user guide, from http://web.tgx.unimaas.nl/svn/public/dtw/.
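    The tool above is an R package; for illustration, here is a plain-Python version of the underlying dynamic program, the classic O(n·m) DTW recurrence. The toy "gene" and "endpoint" series below are synthetic placeholders, and significance estimation is not included.

```python
# Minimal dynamic time warping (DTW): optimal alignment cost between two series.
import numpy as np

def dtw_distance(x, y):
    """Optimal-warping cost between 1-D series x and y (absolute-difference cost)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 50)
gene = np.sin(t)                 # e.g. an expression time course
endpoint = np.sin(t - 0.8)       # the same pattern, shifted in time

print("DTW cost (shifted but similar):", round(dtw_distance(gene, endpoint), 3))
print("DTW cost (unrelated noise):    ",
      round(dtw_distance(gene, np.random.default_rng(0).normal(size=50)), 3))
```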

  1. Comparing Part-time Employment in Germany, Sweden, Ireland and the Netherland

    DEFF Research Database (Denmark)

    Bekker, Sonja; Hipp, Lena; Leschke, Janine

    2017-01-01

    In the current discussions on combining work and family, the idea of shorter working hours is becoming ever more popular. However, much of the research on part-time employment has looked at women and mothers in particular. Much less is known about part-time work among men or fathers. Therefore, this paper aims to establish the differences and similarities between men and women, and particularly between mothers and fathers, in their choices to work part-time, taking into account different household contexts and welfare state institutions. By analysing part-time work in Germany, Sweden, Ireland and the Netherlands in 2014 using individual-level data from the European Labour Force Survey, we show that for men a lower earning capacity compared to their partner or family responsibilities do not seem to lead to higher part-time shares. This is the opposite of what we find for women. According to our analysis...

  2. TIME ZONE DIFFERENCE, COMPARATIVE ADVANTAGE AND TRADE: A REVIEW OF LITERATURE

    Directory of Open Access Journals (Sweden)

    Alaka Shree Prasad

    2017-09-01

    Full Text Available With the growing development in communication technology and increased fragmentation of production process, services that were once considered non-tradable can now be traded across different nations. In this respect, trading countries located in different time zones of the world with non-overlapping working hours are able to develop a comparative advantage together for the supply of these services. Disintegrating the production of a service across different time zones can allow the production to be completed efficiently and make the product available in the market meeting consumer demand in a timely fashion. In this paper, we have reviewed some of important research that has been conducted in the area of time zone differences and trade. This type of trade further affects the factor market and production patterns of the involved countries and has also been significant for their growth and welfare.

  3. P-HS-SFM: a parallel harmony search algorithm for the reproduction of experimental data in the continuous microscopic crowd dynamic models

    Science.gov (United States)

    Jaber, Khalid Mohammad; Alia, Osama Moh'd.; Shuaib, Mohammed Mahmod

    2018-03-01

    Finding the optimal parameters that can reproduce experimental data (such as the velocity-density relation and the specific flow rate) is a very important component of the validation and calibration of microscopic crowd dynamic models. Heavy computational demand during parameter search is a known limitation that exists in a previously developed model known as the Harmony Search-Based Social Force Model (HS-SFM). In this paper, a parallel-based mechanism is proposed to reduce the computational time and memory resource utilisation required to find these parameters. More specifically, two MATLAB-based multicore techniques (parfor and create independent jobs) using shared memory are developed by taking advantage of the multithreading capabilities of parallel computing, resulting in a new framework called the Parallel Harmony Search-Based Social Force Model (P-HS-SFM). The experimental results show that the parfor-based P-HS-SFM achieved a better computational time of about 26 h, an efficiency improvement of approximately 54% and a speedup factor of 2.196 times in comparison with the HS-SFM sequential processor. The performance of the P-HS-SFM using the create independent jobs approach is also comparable to parfor, with a computational time of 26.8 h, an efficiency improvement of about 30% and a speedup of 2.137 times.

  4. Expanding Universe: slowdown or speedup?

    International Nuclear Information System (INIS)

    Bolotin, Yuriy L; Erokhin, Danylo A; Lemets, Oleg A

    2012-01-01

    The kinematics and the dynamical interpretation of cosmological expansion are reviewed in a widely accessible manner with emphasis on the acceleration aspect. Virtually all the approaches that can in principle account for the accelerated expansion of the Universe are reviewed, including dark energy as an item in the energy budget of the Universe, modified Einstein equations, and, on a fundamentally new level, the use of the holographic principle. (physics of our days)

  5. Towards real-time detection and tracking of spatio-temporal features: Blob-filaments in fusion plasma

    International Nuclear Information System (INIS)

    Wu, Lingfei; Wu, Kesheng; Sim, Alex; Churchill, Michael; Choi, Jong Youl

    2016-01-01

    A novel algorithm and implementation of real-time identification and tracking of blob-filaments in fusion reactor data is presented. Similar spatio-temporal features are important in many other applications, for example, ignition kernels in combustion and tumor cells in a medical image. This work presents an approach for extracting these features by dividing the overall task into three steps: local identification of feature cells, grouping feature cells into extended features, and tracking the movement of features through overlap in space. Through our extensive work in parallelization, we demonstrate that this approach can effectively make use of a large number of compute nodes to detect and track blob-filaments in real time in fusion plasma. Here, on a set of 30 GB of fusion simulation data, we observed linear speedup on 1024 processes and completed blob detection in less than three milliseconds using Edison, a Cray XC30 system at NERSC.
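    A hedged sketch of the three-step idea on synthetic 2-D frames: threshold to find "feature cells", group cells into connected blobs, and track blobs between frames by spatial overlap. This is an illustration only, not the fusion data pipeline itself, and the synthetic frames stand in for simulation output.

```python
# Toy blob detection and overlap-based tracking on two synthetic frames.
import numpy as np
from scipy import ndimage

def detect_blobs(frame, threshold):
    mask = frame > threshold                   # step 1: local identification
    labels, n = ndimage.label(mask)            # step 2: group cells into blobs
    return labels, n

def match_by_overlap(labels_a, labels_b):
    """Step 3: pair blobs in consecutive frames that overlap in space."""
    pairs = set()
    overlap = (labels_a > 0) & (labels_b > 0)
    for ia, ib in zip(labels_a[overlap], labels_b[overlap]):
        pairs.add((int(ia), int(ib)))
    return sorted(pairs)

rng = np.random.default_rng(0)
frame1 = ndimage.gaussian_filter(rng.normal(size=(128, 128)), 3)
frame2 = np.roll(frame1, shift=2, axis=1)      # the "blobs" drift between frames

thr = frame1.mean() + 2 * frame1.std()
la, na = detect_blobs(frame1, thr)
lb, nb = detect_blobs(frame2, thr)
print(f"{na} blobs in frame 1, {nb} in frame 2, matched pairs: {match_by_overlap(la, lb)}")
```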

  6. An Efficient UD-Based Algorithm for the Computation of Maximum Likelihood Sensitivity of Continuous-Discrete Systems

    DEFF Research Database (Denmark)

    Boiroux, Dimitri; Juhl, Rune; Madsen, Henrik

    2016-01-01

    This paper addresses maximum likelihood parameter estimation of continuous-time nonlinear systems with discrete-time measurements. We derive an efficient algorithm for the computation of the log-likelihood function and its gradient, which can be used in gradient-based optimization algorithms. This algorithm uses UD decomposition of symmetric matrices and the array algorithm for covariance update and gradient computation. We test our algorithm on the Lotka-Volterra equations. Compared to the maximum likelihood estimation based on finite difference gradient computation, we get a significant speedup...

  7. R package imputeTestbench to compare imputations methods for univariate time series

    OpenAIRE

    Bokde, Neeraj; Kulat, Kishore; Beck, Marcus W; Asencio-Cortés, Gualberto

    2016-01-01

    This paper describes the R package imputeTestbench that provides a testbench for comparing imputation methods for missing data in univariate time series. The imputeTestbench package can be used to simulate the amount and type of missing data in a complete dataset and compare filled data using different imputation methods. The user has the option to simulate missing data by removing observations completely at random or in blocks of different sizes. Several default imputation methods are includ...

  8. Massively Parallel and Scalable Implicit Time Integration Algorithms for Structural Dynamics

    Science.gov (United States)

    Farhat, Charbel

    1997-01-01

    Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because of the following additional facts: (a) explicit schemes are easier to parallelize than implicit ones, and (b) explicit schemes induce short range interprocessor communications that are relatively inexpensive, while the factorization methods used in most implicit schemes induce long range interprocessor communications that often ruin the sought-after speed-up. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet be offset by the speed of the currently available parallel hardware. Therefore, it is essential to develop efficient alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating the low-frequency dynamics of aerospace structures.
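    A worked illustration of the Courant restriction mentioned above, with hypothetical numbers not taken from the paper: the admissible explicit time step scales with element size over wave speed, which is why unconditionally stable implicit schemes become attractive for low-frequency response.

```python
# CFL-style time-step bound for an explicit scheme: dt <= C * dx / c.
dx = 0.01          # characteristic element size [m], hypothetical
c = 5000.0         # wave speed in the structure [m/s], hypothetical
CFL = 1.0          # stability constant of the explicit scheme, hypothetical

dt_max = CFL * dx / c
print(f"maximum stable explicit time step: {dt_max:.2e} s")
print(f"steps needed to simulate 1 s of low-frequency response: {1.0 / dt_max:,.0f}")
```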

  9. Evaluation of the Xeon phi processor as a technology for the acceleration of real-time control in high-order adaptive optics systems

    Science.gov (United States)

    Barr, David; Basden, Alastair; Dipper, Nigel; Schwartz, Noah; Vick, Andy; Schnetler, Hermine

    2014-08-01

    We present wavefront reconstruction acceleration of high-order AO systems using an Intel Xeon Phi processor. The Xeon Phi is a coprocessor providing many integrated cores and designed for accelerating compute intensive, numerical codes. Unlike other accelerator technologies, it allows virtually unchanged C/C++ to be recompiled to run on the Xeon Phi, giving the potential of making development, upgrade and maintenance faster and less complex. We benchmark the Xeon Phi in the context of AO real-time control by running a matrix vector multiply (MVM) algorithm. We investigate variability in execution time and demonstrate a substantial speed-up in loop frequency. We examine the integration of a Xeon Phi into an existing RTC system and show that performance improvements can be achieved with limited development effort.
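    The core real-time operation benchmarked in the record is a matrix-vector multiply (MVM) mapping wavefront-sensor measurements to actuator commands; a minimal CPU-only timing sketch follows, with illustrative dimensions that are not taken from the paper and no Xeon Phi offload.

```python
# Timing a reconstruction-style matrix-vector multiply and the implied loop rate.
import time
import numpy as np

n_slopes, n_actuators = 9232, 5316        # illustrative high-order AO dimensions
R = np.random.default_rng(0).normal(size=(n_actuators, n_slopes)).astype(np.float32)
s = np.random.default_rng(1).normal(size=n_slopes).astype(np.float32)

iters = 200
t0 = time.perf_counter()
for _ in range(iters):
    a = R @ s                              # reconstruction: actuator commands
elapsed = (time.perf_counter() - t0) / iters

print(f"mean MVM time: {elapsed * 1e3:.2f} ms  ->  max loop rate ~ {1.0 / elapsed:.0f} Hz")
```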

  10. Comparing Dislodgeable 2,4-D Residues across Athletic Field Turfgrass Species and Time.

    Directory of Open Access Journals (Sweden)

    Matthew D Jeffries

    2,4-dimethylamine salt (2,4-D) is an herbicide commonly applied on athletic fields for broadleaf weed control that can dislodge from treated turfgrass. Dislodge potential is affected by numerous factors, including turfgrass canopy conditions. Building on previous research confirming herbicide-turfgrass dynamics can vary widely between species, field research was initiated in 2014 and 2015 in Raleigh, NC, USA to quantify dislodgeable 2,4-D residues from dormant hybrid bermudagrass (Cynodon dactylon L. x C. transvaalensis) and hybrid bermudagrass overseeded with perennial ryegrass (Lolium perenne L.), which are common athletic field playing surfaces in subtropical climates. Additionally, dislodgeable 2,4-D was compared at AM (7:00 eastern standard time) and PM (14:00) sample timings within a day. Samples collected from perennial ryegrass consistently resulted in greater 2,4-D dislodgment immediately after application (9.4 to 9.9% of applied) compared to dormant hybrid bermudagrass (2.3 to 2.9%), as well as at all AM compared to PM timings from 1 to 3 d after treatment (DAT; 0.4 to 6.3% compared to 0.1 to 0.8%). Dislodgeable 2,4-D did not differ across turfgrass species at PM sample collections, with ≤ 0.1% of the 2,4-D applied dislodged from 1 to 6 DAT, and 2,4-D detection did not occur at 12 and 24 DAT. In conclusion, dislodgeable 2,4-D from treated turfgrass can vary between species and over short time-scales within a day. This information should be taken into account in human exposure risk assessments, as well as by turfgrass managers and athletic field event coordinators to minimize 2,4-D exposure.

  11. A method for real-time memory efficient implementation of blob detection in large images

    Directory of Open Access Journals (Sweden)

    Petrović Vladimir L.

    2017-01-01

    In this paper we propose a method for real-time blob detection in large images with low memory cost. The method is suitable for implementation on specialized parallel hardware such as multi-core platforms, FPGAs and ASICs. It uses parallelism to speed up the blob detection. The input image is divided into blocks of equal size, to which the maximally stable extremal regions (MSER) blob detector is applied in parallel. We propose the usage of multiresolution analysis for detection of large blobs which are not detected by processing the small blocks. This method can find its place in many applications such as medical imaging, text recognition, as well as video surveillance or wide area motion imagery (WAMI). We also explored the possibilities of using the detected blobs for feature-based image alignment. When large images are processed, our approach is 10 to over 20 times more memory efficient than the state-of-the-art hardware implementation of MSER.
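    A hedged sketch of the block-wise idea: split a large grayscale image into equal tiles and run OpenCV's MSER detector on each tile (serially here). The record's contribution is the parallel hardware implementation plus a multiresolution pass for large blobs, neither of which this sketch includes; the file name is a placeholder.

```python
# Block-wise MSER detection with OpenCV; region coordinates are shifted back
# into full-image coordinates.
import cv2
import numpy as np

def mser_on_blocks(image, block=512):
    mser = cv2.MSER_create()
    regions = []
    h, w = image.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = image[y:y + block, x:x + block]
            msers, _ = mser.detectRegions(tile)
            # Shift region pixel coordinates (px, py) back by the tile offset.
            regions.extend(r + np.array([x, y]) for r in msers)
    return regions

# Usage (path is a placeholder):
# img = cv2.imread("large_frame.png", cv2.IMREAD_GRAYSCALE)
# blobs = mser_on_blocks(img)
# print(len(blobs), "MSER regions detected")
```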

  12. Parallel External Memory Graph Algorithms

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

    2010-01-01

    In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking, which leads to efficient solutions to problems on trees, such as computing lowest common ancestors. The proposed algorithms achieve an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts.

  13. Comparative study of on-line response time measurement methods for platinum resistance thermometer

    International Nuclear Information System (INIS)

    Zwingelstein, G.; Gopal, R.

    1979-01-01

    This study deals with the in situ determination of the response time of platinum resistance sensors. In the first part of this work, two methods furnishing the reference response time of the sensors are studied. In the second part, two methods for obtaining the response time without dismounting the sensor are studied. A comparative study of the performance of these methods is included for fluid velocities varying from 0 to 10 m/s, in both laboratory and plant conditions.

  14. Recent improvements in the performance of the multitasked TORT on time-shared Cray computers

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1996-01-01

    Coarse-grained angular domain decomposition of the mesh sweep algorithm has been implemented in ORNL's three-dimensional transport code TORT for Cray's macrotasking environment on platforms running the UNICOS operating system. A performance model constructed earlier is reviewed and its main result, namely the identification of the sources of parallelization overhead, is used to motivate the present work. The sources of overhead treated here are: redundant operations in the angular loop across participating tasks; repetitive task creation; and lock utilization to prevent overwriting the flux moment arrays accumulated by the participating tasks. Substantial reduction in the parallelization overhead is demonstrated via sample runs with fixed tuning, i.e. zero CPU hold time. Up to 50% improvement in the wall clock speedup over the previous implementation with autotuning is observed in some test problems.

  15. Comparative Performance Analysis of Coarse Solvers for Algebraic Multigrid on Multicore and Manycore Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Druinsky, A; Ghysels, P; Li, XS; Marques, O; Williams, S; Barker, A; Kalchev, D; Vassilevski, P

    2016-04-02

    In this paper, we study the performance of a two-level algebraic-multigrid algorithm, with a focus on the impact of the coarse-grid solver on performance. We consider two algorithms for solving the coarse-space systems: the preconditioned conjugate gradient method and a new robust HSS-embedded low-rank sparse-factorization algorithm. Our test data comes from the SPE Comparative Solution Project for oil-reservoir simulations. We contrast the performance of our code on one 12-core socket of a Cray XC30 machine with performance on a 60-core Intel Xeon Phi coprocessor. To obtain top performance, we optimized the code to take full advantage of fine-grained parallelism and made it thread-friendly for high thread count. We also developed a bounds-and-bottlenecks performance model of the solver which we used to guide us through the optimization effort, and also carried out performance tuning in the solver’s large parameter space. Finally, as a result, significant speedups were obtained on both machines.

  16. Blip decomposition of the path integral: exponential acceleration of real-time calculations on quantum dissipative systems.

    Science.gov (United States)

    Makri, Nancy

    2014-10-07

    The real-time path integral representation of the reduced density matrix for a discrete system in contact with a dissipative medium is rewritten in terms of the number of blips, i.e., elementary time intervals over which the forward and backward paths are not identical. For a given set of blips, it is shown that the path sum with respect to the coordinates of all remaining time points is isomorphic to that for the wavefunction of a system subject to an external driving term and thus can be summed by an inexpensive iterative procedure. This exact decomposition reduces the number of terms by a factor that increases exponentially with propagation time. Further, under conditions (moderately high temperature and/or dissipation strength) that lead primarily to incoherent dynamics, the "fully incoherent limit" zero-blip term of the series provides a reasonable approximation to the dynamics, and the blip series converges rapidly to the exact result. Retention of only the blips required for satisfactory convergence leads to speedup of full-memory path integral calculations by many orders of magnitude.

  17. A Comparative Study of Personal Time Perspective Differences between Korean and American College Students

    Science.gov (United States)

    Kim, Oi-Sook; Geistfeld, Loren V.

    2007-01-01

    This article compares the personal time perspectives of Korean and American college students. The results indicate American students have a personal time perspective that is different from their Korean counterparts. Implications for working with Koreans and Americans as foreign students are considered. (Contains 5 tables.)

  18. Speeding Up FPGA Placement via Partitioning and Multithreading

    Directory of Open Access Journals (Sweden)

    Cristinel Ababei

    2009-01-01

    placement subproblems are created by partitioning and then processed concurrently by multiple worker threads that are run on multiple cores of the same processor. Our main goal is to investigate the speedup that can be achieved with this simple approach compared to previous approaches that were based on distributed computing. The new hybrid parallel placement algorithm achieves an average speedup of 2.5× using four worker threads, while the total wire length and circuit delay after routing are minimally degraded.

  19. Structure and Stability of Molecular Crystals with Many-Body Dispersion-Inclusive Density Functional Tight Binding.

    Science.gov (United States)

    Mortazavi, Majid; Brandenburg, Jan Gerit; Maurer, Reinhard J; Tkatchenko, Alexandre

    2018-01-18

    Accurate prediction of structure and stability of molecular crystals is crucial in materials science and requires reliable modeling of long-range dispersion interactions. Semiempirical electronic structure methods are computationally more efficient than their ab initio counterparts, allowing structure sampling with significant speedups. We combine the Tkatchenko-Scheffler van der Waals method (TS) and the many-body dispersion method (MBD) with third-order density functional tight-binding (DFTB3) via a charge population-based method. We find an overall good performance for the X23 benchmark database of molecular crystals, despite an underestimation of crystal volume that can be traced to the DFTB parametrization. We achieve accurate lattice energy predictions with DFT+MBD energetics on top of vdW-inclusive DFTB3 structures, resulting in a speedup of up to 3000 times compared with a full DFT treatment. This suggests that vdW-inclusive DFTB3 can serve as a viable structural prescreening tool in crystal structure prediction.

  20. Combining Acceleration Techniques for Low-Dose X-Ray Cone Beam Computed Tomography Image Reconstruction.

    Science.gov (United States)

    Huang, Hsuan-Ming; Hsiao, Ing-Tsung

    2017-01-01

    Over the past decade, image quality in low-dose computed tomography has been greatly improved by various compressive sensing- (CS-) based reconstruction methods. However, these methods have some disadvantages including high computational cost and slow convergence rate. Many different speed-up techniques for CS-based reconstruction algorithms have been developed. The purpose of this paper is to propose a fast reconstruction framework that combines a CS-based reconstruction algorithm with several speed-up techniques. First, total difference minimization (TDM) was implemented using the soft-threshold filtering (STF). Second, we combined TDM-STF with the ordered subsets transmission (OSTR) algorithm for accelerating the convergence. To further speed up the convergence of the proposed method, we applied the power factor and the fast iterative shrinkage thresholding algorithm to OSTR and TDM-STF, respectively. Results obtained from simulation and phantom studies showed that many speed-up techniques could be combined to greatly improve the convergence speed of a CS-based reconstruction algorithm. More importantly, the increased computation time (≤10%) was minor as compared to the acceleration provided by the proposed method. In this paper, we have presented a CS-based reconstruction framework that combines several acceleration techniques. Both simulation and phantom studies provide evidence that the proposed method has the potential to satisfy the requirement of fast image reconstruction in practical CT.
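    For reference, the soft-threshold filtering used in such CS-based schemes is, in its generic form, the element-wise shrinkage operator (the threshold τ here is a generic symbol, not a value from this paper):

        \mathcal{S}_{\tau}(x) = \operatorname{sign}(x)\,\max\bigl(|x| - \tau,\; 0\bigr)

    It is typically applied to the difference (gradient-like) coefficients at each iteration; a larger τ enforces sparsity more strongly at the cost of bias, which is one reason such methods benefit from the acceleration steps described above.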

  1. Musrfit-Real Time Parameter Fitting Using GPUs

    Science.gov (United States)

    Locans, Uldis; Suter, Andreas

    using the GPU version. The speedups using the GPU were measured relative to the CPU implementation. Two different GPUs were used for the comparison — a high-end Nvidia Tesla K40c GPU designed for HPC applications and an AMD Radeon R9 390X GPU designed for the gaming industry.

  2. Comparing and combining biomarkers as principle surrogates for time-to-event clinical endpoints.

    Science.gov (United States)

    Gabriel, Erin E; Sachs, Michael C; Gilbert, Peter B

    2015-02-10

    Principal surrogate endpoints are useful as targets for phase I and II trials. In many recent trials, multiple post-randomization biomarkers are measured. However, few statistical methods exist for comparison of or combination of biomarkers as principal surrogates, and none of these methods to our knowledge utilize time-to-event clinical endpoint information. We propose a Weibull model extension of the semi-parametric estimated maximum likelihood method that allows for the inclusion of multiple biomarkers in the same risk model as multivariate candidate principal surrogates. We propose several methods for comparing candidate principal surrogates and evaluating multivariate principal surrogates. These include the time-dependent and surrogate-dependent true and false positive fraction, the time-dependent and the integrated standardized total gain, and the cumulative distribution function of the risk difference. We illustrate the operating characteristics of our proposed methods in simulations and outline how these statistics can be used to evaluate and compare candidate principal surrogates. We use these methods to investigate candidate surrogates in the Diabetes Control and Complications Trial. Copyright © 2014 John Wiley & Sons, Ltd.

  3. Low pain vs no pain multi-core Haskells

    DEFF Research Database (Denmark)

    Aswad, Mustafa; Trinder, Phil; Al Zain, Abyd

    2011-01-01

    Multicore and NUMA architectures are becoming the dominant processor technology and functional languages are theoretically well suited to exploit them. In practice, however, implementing effective high level parallel functional languages is extremely challenging. This paper is a systematic...... to compare a ‘no pain’, i.e. entirely implicit, parallel implementation with three ‘low pain’, i.e. semi-explicit, language implementations. We report detailed studies comparing the parallel performance delivered. The comparative performance metric is speedup which normalises against sequential performance...... the parallelism. The results of the study are encouraging and, on occasion, surprising. We find that fully implicit parallelism as implemented in FDIP cannot yet compete with semi-explicit parallel approaches. Semi-explicit parallelism shows encouraging speedup for many of the programs in the test suite......

  4. PLAST: parallel local alignment search tool for database comparison

    Directory of Open Access Journals (Sweden)

    Lavenier Dominique

    2009-10-01

    Full Text Available Abstract Background Sequence similarity searching is an important and challenging task in molecular biology and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm to fit with the parallel capabilities of modern microprocessors. Results A parallel algorithm for comparing large genomic banks and targeting middle-range computers has been developed and implemented in PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and the multithreading concept (multicore). Compared to multithreaded BLAST software, tests performed on an 8-processor server have shown speedup ranging from 3 to 6 with a similar level of accuracy. Conclusion A parallel algorithmic approach driven by the knowledge of the internal microprocessor architecture allows significant speedup to be obtained while preserving standard sensitivity for similarity search problems.

  5. Modification of Brueschweiler quantum searching algorithm and realization by NMR experiment

    International Nuclear Information System (INIS)

    Yang Xiaodong; Wei Daxiu; Luo Jun; Miao Xijia

    2002-01-01

    In recent years, quantum computing research has made significant progress by exploiting quantum mechanical laws, such as interference, superposition and parallelism, to perform computing tasks. The most attractive feature is the large speedup that quantum algorithms can provide: quantum computing can solve some problems that are impossible or difficult for classical computing. The problem of searching for a specific item in an unsorted database can be solved with certain quantum algorithms, for example the Grover quantum algorithm and the Brueschweiler quantum algorithm. The former gives a quadratic speedup, and the latter gives an exponential speedup compared with the corresponding classical algorithm. In the Brueschweiler quantum searching algorithm, the data qubit and the read-out qubit (the ancilla qubit) are different qubits. The authors have studied the Brueschweiler algorithm and proposed a modified version in which no ancilla qubit is needed to reach an exponential speedup in the searching; the data and read-out qubits are the same qubits. The modified Brueschweiler algorithm can be easier to design and realize. The authors also demonstrate the modified Brueschweiler algorithm in a 3-qubit molecular system by a Nuclear Magnetic Resonance (NMR) experiment.

  6. Fast optimization and dose calculation in scanned ion beam therapy

    International Nuclear Information System (INIS)

    Hild, S.; Graeff, C.; Trautmann, J.; Kraemer, M.; Zink, K.; Durante, M.; Bert, C.

    2014-01-01

    Purpose: Particle therapy (PT) has advantages over photon irradiation on static tumors. An increased biological effectiveness and active target conformal dose shaping are strong arguments for PT. However, the sensitivity to changes of internal geometry complicates the use of PT for moving organs. In case of interfractionally moving objects, adaptive radiotherapy (ART) concepts known from intensity modulated radiotherapy (IMRT) can be adopted for PT treatments. One ART strategy is to optimize a new treatment plan based on daily image data directly before a radiation fraction is delivered [treatment replanning (TRP)]. Optimizing treatment plans for PT using a scanned beam is a time-consuming problem, especially for particles other than protons where the biological effective dose has to be calculated. For the purpose of TRP, fast optimization and fast dose calculation have been implemented into the GSI in-house treatment planning system (TPS) TRiP98. Methods: This work reports on the outcome of a code analysis that resulted in optimization of the calculation processes as well as implementation of routines supporting parallel execution of the code. To benchmark the new features, the calculation time for therapy treatment planning has been studied. Results: Compared to the original version of the TPS, calculation times for treatment planning (optimization and dose calculation) have been improved by a factor of 10 with code optimization. The parallelization of the TPS resulted in a speedup factor of 12 and 5.5 for the original version and the code-optimized version, respectively. Hence the total speedup of the new implementation of the authors' TPS yielded speedup factors up to 55. Conclusions: The improved TPS is capable of completing treatment planning for ion beam therapy of a prostate irradiation, considering organs at risk, in under 6 min.
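    The quoted factors combine multiplicatively, which is where the total figure comes from (a one-line check using only the numbers reported above):

        S_{\mathrm{total}} = S_{\mathrm{code\,opt}} \times S_{\mathrm{parallel}} = 10 \times 5.5 = 55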

  7. Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units

    Science.gov (United States)

    Kemal, Jonathan Yashar

    For purposes of optimizing and analyzing turbomachinery and other designs, the unsteady Favre-averaged flow-field differential equations for an ideal compressible gas can be solved in conjunction with the heat conduction equation. We solve all equations using the finite-volume multiple-grid numerical technique, with the dual time-step scheme used for unsteady simulations. Our numerical solver code targets CUDA-capable Graphical Processing Units (GPUs) produced by NVIDIA. Making use of MPI, our solver can run across networked compute nodes, where each MPI process can use either a GPU or a Central Processing Unit (CPU) core for primary solver calculations. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture, and compare our resulting performance against Intel Xeon X5690 CPUs. Solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. We used a conjugate cylinder computational grid and ran a turbulent steady flow simulation using 4 increasingly dense computational grids. Our densest computational grid is divided into 13 blocks each containing 1033x1033 grid points, for a total of 13.87 million grid points or 1.07 million grid points per domain block. To obtain overall speedups, we compare the execution time of the solver's iteration loop, including all resource-intensive GPU-related memory copies. Comparing the performance of 8 GPUs to that of 8 CPUs, we obtain an overall speedup of about 6.0 when using our densest computational grid. This amounts to an 8-GPU simulation running about 39.5 times faster than a single-CPU simulation.
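    A quick back-of-the-envelope check of the grid figures quoted above (an illustrative calculation only, not code from the solver):

        # Verify the grid-point totals quoted in the abstract.
        blocks = 13
        points_per_block = 1033 * 1033                 # points in one domain block
        total_points = blocks * points_per_block
        print(f"{points_per_block / 1e6:.2f} M per block, {total_points / 1e6:.2f} M total")
        # -> 1.07 M per block, 13.87 M total, matching the reported figures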

  8. Just-in-Time Compilation-Inspired Methodology for Parallelization of Compute Intensive Java Code

    Directory of Open Access Journals (Sweden)

    GHULAM MUSTAFA

    2017-01-01

    Full Text Available Compute intensive programs generally consume significant fraction of execution time in a small amount of repetitive code. Such repetitive code is commonly known as hotspot code. We observed that compute intensive hotspots often possess exploitable loop level parallelism. A JIT (Just-in-Time compiler profiles a running program to identify its hotspots. Hotspots are then translated into native code, for efficient execution. Using similar approach, we propose a methodology to identify hotspots and exploit their parallelization potential on multicore systems. Proposed methodology selects and parallelizes each DOALL loop that is either contained in a hotspot method or calls a hotspot method. The methodology could be integrated in front-end of a JIT compiler to parallelize sequential code, just before native translation. However, compilation to native code is out of scope of this work. As a case study, we analyze eighteen JGF (Java Grande Forum benchmarks to determine parallelization potential of hotspots. Eight benchmarks demonstrate a speedup of up to 7.6x on an 8-core system

  9. Just-in-time compilation-inspired methodology for parallelization of compute intensive java code

    International Nuclear Information System (INIS)

    Mustafa, G.; Ghani, M.U.

    2017-01-01

    Compute intensive programs generally consume significant fraction of execution time in a small amount of repetitive code. Such repetitive code is commonly known as hotspot code. We observed that compute intensive hotspots often possess exploitable loop level parallelism. A JIT (Just-in-Time) compiler profiles a running program to identify its hotspots. Hotspots are then translated into native code, for efficient execution. Using similar approach, we propose a methodology to identify hotspots and exploit their parallelization potential on multicore systems. Proposed methodology selects and parallelizes each DOALL loop that is either contained in a hotspot method or calls a hotspot method. The methodology could be integrated in front-end of a JIT compiler to parallelize sequential code, just before native translation. However, compilation to native code is out of scope of this work. As a case study, we analyze eighteen JGF (Java Grande Forum) benchmarks to determine parallelization potential of hotspots. Eight benchmarks demonstrate a speedup of up to 7.6x on an 8-core system. (author)
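    As an illustration of the transformation described in these two records, the sketch below parallelizes a DOALL loop (one with no cross-iteration dependences) across worker processes. It is a Python stand-in for the Java/JIT setting, and the kernel and names are hypothetical, not the authors' benchmark code.

        # Sketch: a DOALL loop body distributed across cores; the result is
        # identical to the sequential run because iterations are independent.
        from concurrent.futures import ProcessPoolExecutor
        import math

        def hotspot_kernel(x):
            # Stand-in for a compute-intensive, iteration-independent loop body.
            return math.sin(x) ** 2 + math.cos(x) ** 2

        def run_doall(data, parallel=False, workers=8):
            if not parallel:
                return [hotspot_kernel(x) for x in data]
            with ProcessPoolExecutor(max_workers=workers) as pool:
                return list(pool.map(hotspot_kernel, data, chunksize=1024))

        if __name__ == "__main__":
            data = [i * 0.001 for i in range(1_000_000)]
            serial = run_doall(data)
            parallel = run_doall(data, parallel=True)
            assert serial == parallel   # same semantics, potentially less wall time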

  10. RNA motif search with data-driven element ordering.

    Science.gov (United States)

    Rampášek, Ladislav; Jimenez, Randi M; Lupták, Andrej; Vinař, Tomáš; Brejová, Broňa

    2016-05-18

    In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo .
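    A toy illustration of why element ordering matters in backtracking motif search: scanning the most selective (rarest) elements first prunes the search earliest. This is only a sketch of the general idea, not RNArobo's actual data-driven ordering algorithm, and the names and frequencies below are hypothetical.

        def order_elements(elements, hit_frequency):
            """Order motif elements by estimated hit frequency (rarest first)."""
            return sorted(elements, key=lambda e: hit_frequency[e])

        elements = ["helix1", "ss_loop", "helix2"]
        # Hypothetical per-element match frequencies estimated from scanned data.
        hit_frequency = {"helix1": 0.02, "ss_loop": 0.30, "helix2": 0.005}
        print(order_elements(elements, hit_frequency))
        # -> ['helix2', 'helix1', 'ss_loop']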

  11. Multitasking the code ARC3D. [for computational fluid dynamics]

    Science.gov (United States)

    Barton, John T.; Hsiung, Christopher C.

    1986-01-01

    The CRAY multitasking system was developed in order to utilize all four processors and sharply reduce the wall clock run time. This paper describes the techniques used to modify the computational fluid dynamics code ARC3D for this run and analyzes the achieved speedup. The ARC3D code solves either the Euler or thin-layer N-S equations using an implicit approximate factorization scheme. Results indicate that multitask processing can be used to achieve wall clock speedup factors of over three times, depending on the nature of the program code being used. Multitasking appears to be particularly advantageous for large-memory problems running on multiple CPU computers.

  12. Comparing an Annual and a Daily Time-Step Model for Predicting Field-Scale Phosphorus Loss.

    Science.gov (United States)

    Bolster, Carl H; Forsberg, Adam; Mittelstet, Aaron; Radcliffe, David E; Storm, Daniel; Ramirez-Avila, John; Sharpley, Andrew N; Osmond, Deanna

    2017-11-01

    A wide range of mathematical models are available for predicting phosphorus (P) losses from agricultural fields, ranging from simple, empirically based annual time-step models to more complex, process-based daily time-step models. In this study, we compare field-scale P-loss predictions between the Annual P Loss Estimator (APLE), an empirically based annual time-step model, and the Texas Best Management Practice Evaluation Tool (TBET), a process-based daily time-step model based on the Soil and Water Assessment Tool. We first compared predictions of field-scale P loss from both models using field and land management data collected from 11 research sites throughout the southern United States. We then compared predictions of P loss from both models with measured P-loss data from these sites. We observed a strong and statistically significant correlation in predicted P loss between the two models; however, APLE predicted, on average, 44% greater dissolved P loss, whereas TBET predicted, on average, 105% greater particulate P loss for the conditions simulated in our study. When we compared model predictions with measured P-loss data, neither model consistently outperformed the other, indicating that more complex models do not necessarily produce better predictions of field-scale P loss. Our results also highlight limitations with both models and the need for continued efforts to improve their accuracy. Copyright © by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America, Inc.

  13. On program restructuring, scheduling, and communication for parallel processor systems

    Energy Technology Data Exchange (ETDEWEB)

    Polychronopoulos, Constantine D. [Univ. of Illinois, Urbana, IL (United States)

    1986-08-01

    This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler, was used to transform programs into parallel form and to conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup is discussed and experimental results are presented.

  14. QR-decomposition based SENSE reconstruction using parallel architecture.

    Science.gov (United States)

    Ullah, Irfan; Nisar, Habab; Raza, Haseeb; Qasim, Malik; Inam, Omair; Omer, Hammad

    2018-04-01

    Magnetic Resonance Imaging (MRI) is a powerful medical imaging technique that provides essential clinical information about the human body. One major limitation of MRI is its long scan time. Implementation of advance MRI algorithms on a parallel architecture (to exploit inherent parallelism) has a great potential to reduce the scan time. Sensitivity Encoding (SENSE) is a Parallel Magnetic Resonance Imaging (pMRI) algorithm that utilizes receiver coil sensitivities to reconstruct MR images from the acquired under-sampled k-space data. At the heart of SENSE lies inversion of a rectangular encoding matrix. This work presents a novel implementation of GPU based SENSE algorithm, which employs QR decomposition for the inversion of the rectangular encoding matrix. For a fair comparison, the performance of the proposed GPU based SENSE reconstruction is evaluated against single and multicore CPU using openMP. Several experiments against various acceleration factors (AFs) are performed using multichannel (8, 12 and 30) phantom and in-vivo human head and cardiac datasets. Experimental results show that GPU significantly reduces the computation time of SENSE reconstruction as compared to multi-core CPU (approximately 12x speedup) and single-core CPU (approximately 53x speedup) without any degradation in the quality of the reconstructed images. Copyright © 2018 Elsevier Ltd. All rights reserved.
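    A minimal sketch of the QR-based unfolding step at the heart of SENSE, for one set of aliased pixels (shapes and values here are assumed for illustration; this is not the paper's GPU implementation):

        import numpy as np

        def sense_unfold_qr(E, y):
            """Solve the least-squares problem E x = y via QR decomposition.

            E : (n_coils, R) encoding matrix of coil sensitivities at the R pixel
                locations folded onto one aliased pixel (acceleration factor R).
            y : (n_coils,) aliased pixel values measured by the coils.
            """
            Q, R_ = np.linalg.qr(E)                  # E = Q R_, Q has orthonormal columns
            return np.linalg.solve(R_, Q.conj().T @ y)

        rng = np.random.default_rng(0)
        n_coils, R = 8, 2
        E = rng.standard_normal((n_coils, R)) + 1j * rng.standard_normal((n_coils, R))
        x_true = np.array([1.0 + 0.5j, -0.3 + 2.0j])
        y = E @ x_true
        print(np.allclose(sense_unfold_qr(E, y), x_true))   # True

    Because every aliased pixel set leads to an independent small least-squares problem of this kind, the reconstruction maps naturally onto the massively parallel GPU architecture discussed above.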

  15. Limits on the efficiency of event-based algorithms for Monte Carlo neutron transport

    Directory of Open Access Journals (Sweden)

    Paul K. Romano

    2017-09-01

    Full Text Available The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup due to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size to achieve vector efficiency greater than 90%. When the execution times for events are allowed to vary, the vector speedup is also limited by differences in the execution time for events being carried out in a single event-iteration.
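    One simple ingredient in reasoning about such efficiency is the vector lane-fill fraction when n particles remain active in the bank and the hardware vector width is W; a minimal expression (an illustration, not the full model developed in the paper) is

        \eta(n, W) = \frac{n}{W\,\lceil n / W \rceil}

    which stays close to 1 only while n is much larger than W, one reason the particle bank must remain much larger than the vector width.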

  16. Highly parallel line-based image coding for many cores.

    Science.gov (United States)

    Peng, Xiulian; Xu, Jizheng; Zhou, You; Wu, Feng

    2012-01-01

    Computers are developing along with a new trend from the dual-core and quad-core processors to ones with tens or even hundreds of cores. Multimedia, as one of the most important applications in computers, has an urgent need for parallel coding algorithms for compression. Taking intraframe/image coding as a start point, this paper proposes a pure line-by-line coding scheme (LBLC) to meet the need. In LBLC, an input image is processed line by line sequentially, and each line is divided into small fixed-length segments. The compression of all segments from prediction to entropy coding is completely independent and concurrent at many cores. Results on a general-purpose computer show that our scheme can get a 13.9 times speedup with 15 cores at the encoder and a 10.3 times speedup at the decoder. Ideally, such a near-linear speedup relation with the number of cores can be maintained for more than 100 cores. In addition to the high parallelism, the proposed scheme can perform comparably to or even better than the H.264 high profile above middle bit rates. At near-lossless coding, it outperforms H.264 by more than 10 dB. At lossless coding, up to 14% bit-rate reduction is observed compared with H.264 lossless coding at the high 4:4:4 profile.

  17. SPEEDUP simulation of liquid waste batch processing. Revision 1

    International Nuclear Information System (INIS)

    Shannahan, K.L.; Aull, J.E.; Dimenna, R.A.

    1994-01-01

    The Savannah River Site (SRS) has accumulated radioactive hazardous waste for over 40 years during the time SRS made nuclear materials for the United States Department of Energy (DOE) and its predecessors. This waste is being stored as caustic slurry in a large number of 1 million gallon steel tanks, some of which were initially constructed in the early 1950's. SRS and DOE intend to clean up the Site and convert this waste into stable forms which can then be safely stored. The liquid waste will be separated into a partially decontaminated low-level waste and a radioactive high-level waste in one feed preparation operation, In-Tank Precipitation. The low-level waste will be used to make a concrete product called saltstone in the Saltstone Facility, a part of the Defense Waste Processing Facility (DWPF). The concrete will be poured into large vaults, where it will be permanently stored. The high-level waste will be added to glass-formers and waste slurry solids from another feed preparation operation, Extended Sludge Processing. The mixture will then be converted to a stable borosilicate glass by a vitrification process that is the other major part of the DWPF. This glass will be poured into stainless steel canisters and sent to a temporary storage facility prior to delivery to a permanent underground storage site.

  18. LAPACKrc: Fast linear algebra kernels/solvers for FPGA accelerators

    International Nuclear Information System (INIS)

    Gonzalez, Juan; Nunez, Rafael C

    2009-01-01

    We present LAPACKrc, a family of FPGA-based linear algebra solvers able to achieve more than 100x speedup per commodity processor on certain problems. LAPACKrc subsumes some of the LAPACK and ScaLAPACK functionalities, and it also incorporates sparse direct and iterative matrix solvers. Current LAPACKrc prototypes demonstrate between 40x-150x speedup compared against top-of-the-line hardware/software systems. A technology roadmap is in place to validate current performance of LAPACKrc in HPC applications, and to increase the computational throughput by factors of hundreds within the next few years.

  19. Comparing Biomechanical Properties, Repair Times, and Value of Common Core Flexor Tendon Repairs.

    Science.gov (United States)

    Chauhan, Aakash; Schimoler, Patrick; Miller, Mark C; Kharlamov, Alexander; Merrell, Gregory A; Palmer, Bradley A

    2018-05-01

    The aim of the study was to compare biomechanical strength, repair times, and repair values for zone II core flexor tendon repairs. A total of 75 fresh-frozen human cadaveric flexor tendons were harvested from the index through small finger and randomized into one of 5 repair groups: 4-stranded cross-stitch cruciate (4-0 polyester and 4-0 braided suture), 4-stranded double Pennington (2-0 knotless barbed suture), 4-stranded Pennington (4-0 double-stranded braided suture), and 6-stranded modified Lim-Tsai (4-0 looped braided suture). Repairs were measured in situ and their repair times were measured. Tendons were linearly loaded to failure and multiple biomechanical values were measured. The repair value was calculated based on operating room costs, repair times, and suture costs. Analysis of variance (ANOVA) and Tukey post hoc statistical analysis were used to compare repair data. The braided cruciate was the strongest repair (P > .05) but the slowest (P > .05), and the 4-stranded Pennington using double-stranded suture was the fastest (P > .05) to perform. The total repair value was the highest for the braided cruciate (P > .05) compared with all other repairs. Barbed suture did not outperform any repairs in any categories. The braided cruciate was the strongest of the tested flexor tendon repairs. The 2-mm gapping and maximum load to failure for this repair approached the historical strength of other 6- and 8-stranded repairs. In this study, suture cost was negligible in the overall repair cost and should not be a determining factor in choosing a repair.

  20. Real-Time Agent-Based Modeling Simulation with in-situ Visualization of Complex Biological Systems: A Case Study on Vocal Fold Inflammation and Healing.

    Science.gov (United States)

    Seekhao, Nuttiiya; Shung, Caroline; JaJa, Joseph; Mongeau, Luc; Li-Jessen, Nicole Y K

    2016-05-01

    We present an efficient and scalable scheme for implementing agent-based modeling (ABM) simulation with In Situ visualization of large complex systems on heterogeneous computing platforms. The scheme is designed to make optimal use of the resources available on a heterogeneous platform consisting of a multicore CPU and a GPU, resulting in minimal to no resource idle time. Furthermore, the scheme was implemented under a client-server paradigm that enables remote users to visualize and analyze simulation data as it is being generated at each time step of the model. Performance of a simulation case study of vocal fold inflammation and wound healing with 3.8 million agents shows 35× and 7× speedup in execution time over single-core and multi-core CPU respectively. Each iteration of the model took less than 200 ms to simulate, visualize and send the results to the client. This enables users to monitor the simulation in real-time and modify its course as needed.

  1. Krylov subspace acceleration of waveform relaxation

    Energy Technology Data Exchange (ETDEWEB)

    Lumsdaine, A.; Wu, Deyun [Univ. of Notre Dame, IN (United States)

    1996-12-31

    Standard solution methods for numerically solving time-dependent problems typically begin by discretizing the problem on a uniform time grid and then sequentially solving for successive time points. The initial time discretization imposes a serialization to the solution process and limits parallel speedup to the speedup available from parallelizing the problem at any given time point. This bottleneck can be circumvented by the use of waveform methods in which multiple time-points of the different components of the solution are computed independently. With the waveform approach, a problem is first spatially decomposed and distributed among the processors of a parallel machine. Each processor then solves its own time-dependent subsystem over the entire interval of interest using previous iterates from other processors as inputs. Synchronization and communication between processors take place infrequently, and communication consists of large packets of information - discretized functions of time (i.e., waveforms).
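    A minimal Jacobi waveform-relaxation sketch for a small linear system x' = A x on [0, T] (illustrative only; real implementations assign subsystems to different processors and exchange whole waveforms infrequently, as described above):

        import numpy as np

        def waveform_relaxation(A, x0, T=1.0, steps=200, sweeps=30):
            n = len(x0)
            dt = T / steps
            # Rows of X are the waveforms (time histories) of each component.
            X = np.tile(np.asarray(x0, float)[:, None], (1, steps + 1))
            for _ in range(sweeps):
                X_old = X.copy()
                for i in range(n):                  # each subsystem independently
                    x = np.empty(steps + 1)
                    x[0] = x0[i]
                    for k in range(steps):          # forward Euler in time
                        coupling = A[i] @ X_old[:, k] - A[i, i] * X_old[i, k]
                        x[k + 1] = x[k] + dt * (A[i, i] * x[k] + coupling)
                    X[i] = x
            return X

        A = np.array([[-2.0, 1.0], [1.0, -3.0]])
        X = waveform_relaxation(A, x0=[1.0, 0.0])

        # Reference: forward-Euler solve of the fully coupled system.
        ref, dt = np.array([1.0, 0.0]), 1.0 / 200
        for _ in range(200):
            ref = ref + dt * (A @ ref)
        print(abs(X[:, -1] - ref).max())            # tiny after enough sweeps

    Each subsystem's inner time loop is independent within a sweep, which is exactly the parallelism the waveform approach exposes.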

  2. Implementation of a RANLUX Based Pseudo-Random Number Generator in FPGA Using VHDL and Impulse C

    OpenAIRE

    Agnieszka Dąbrowska-Boruch; Grzegorz Gancarczyk; Kazimierz Wiatr

    2014-01-01

    Monte Carlo simulations are widely used, e.g. in the field of physics and molecular modelling. A key role in these is played by high-performance random number generators, such as RANLUX or MERSENNE TWISTER. In this paper the authors introduce the world's first implementation of the RANLUX algorithm on an FPGA platform for high performance computing purposes. A significant speed-up of one generator instance, of over 60 times compared with a graphic-card-based solution, can be observed. Compa...

  3. Energy Efficient FPGA based Hardware Accelerators for Financial Applications

    DEFF Research Database (Denmark)

    Kenn Toft, Jakob; Nannarelli, Alberto

    2014-01-01

    Field Programmable Gate Array (FPGA) based accelerators are very suitable for implementing application-specific processors using uncommon operations or number systems. In this work, we design FPGA-based accelerators for two financial computations with different characteristics and we compare...... the accelerator performance and energy consumption to a software execution of the application. The experimental results show that significant speed-up and energy savings can be obtained for large data sets by using the accelerator, at the expense of a longer development time....

  4. Comparing Sources of Storm-Time Ring Current O+

    Science.gov (United States)

    Kistler, L. M.

    2015-12-01

    The first observations of the storm-time ring current composition using AMPTE/CCE data showed that the O+ contribution to the ring current increases significantly during storms. The ring current is predominantly formed from inward transport of the near-earth plasma sheet. Thus the increase of O+ in the ring current implies that the ionospheric contribution to the plasma sheet has increased. The ionospheric plasma that reaches the plasma sheet can come from both the cusp and the nightside aurora. The cusp outflow moves through the lobe and enters the plasma sheet through reconnection at the near-earth neutral line. The nightside auroral outflow has direct access to the nightside plasma sheet. Using data from Cluster and the Van Allen Probes spacecraft, we compare the development of storms in cases where there is a clear input of nightside auroral outflow and in cases where there is a significant cusp input. We find that the cusp input, which enters the tail at ~15-20 Re, becomes isotropized when it crosses the neutral sheet and becomes part of the hot (>1 keV) plasma sheet population as it convects inward. The auroral outflow, which enters the plasma sheet closer to the earth, where the radius of curvature of the field line is larger, does not isotropize or become significantly energized, but remains a predominantly field-aligned low-energy population in the inner magnetosphere. It is the hot plasma sheet population that gets accelerated to high enough energies in the inner magnetosphere to contribute strongly to the ring current pressure. Thus it appears that O+ that enters the plasma sheet further down the tail has a greater impact on the storm-time ring current than ions that enter closer to the earth.

  5. Comparative antianaerobic activities of doripenem determined by MIC and time-kill analysis.

    Science.gov (United States)

    Credito, Kim L; Ednie, Lois M; Appelbaum, Peter C

    2008-01-01

    Against 447 anaerobe strains, the investigational carbapenem doripenem had an MIC50 of 0.125 μg/ml and an MIC90 of 1 μg/ml. Results were similar to those for imipenem, meropenem, and ertapenem. Time-kill studies showed that doripenem had very good bactericidal activity compared to other carbapenems, with 99.9% killing of 11 strains at 2× MIC after 48 h.

  6. Comparing the impact of time displaced and biased precipitation estimates for online updated urban runoff models.

    Science.gov (United States)

    Borup, Morten; Grum, Morten; Mikkelsen, Peter Steen

    2013-01-01

    When an online runoff model is updated from system measurements, the requirements of the precipitation input change. Using rain gauge data as precipitation input there will be a displacement between the time when the rain hits the gauge and the time where the rain hits the actual catchment, due to the time it takes for the rain cell to travel from the rain gauge to the catchment. Since this time displacement is not present for system measurements the data assimilation scheme might already have updated the model to include the impact from the particular rain cell when the rain data is forced upon the model, which therefore will end up including the same rain twice in the model run. This paper compares forecast accuracy of updated models when using time displaced rain input to that of rain input with constant biases. This is done using a simple time-area model and historic rain series that are either displaced in time or affected with a bias. The results show that for a 10 minute forecast, time displacements of 5 and 10 minutes compare to biases of 60 and 100%, respectively, independent of the catchments time of concentration.
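    A toy version of the comparison described above, using a simple time-area model in which runoff is the convolution of rainfall with a time-area histogram; the rain series, histogram, shift, and bias values below are illustrative assumptions, not the paper's catchment setup.

        import numpy as np

        def time_area_runoff(rain, histogram):
            # Discrete convolution of rainfall with the time-area histogram.
            return np.convolve(rain, histogram)[: len(rain)]

        rng = np.random.default_rng(1)
        rain = np.clip(rng.normal(0.5, 0.5, 300), 0, None)   # synthetic rain series
        histogram = np.array([0.1, 0.3, 0.4, 0.2])           # 4-step time of concentration

        truth = time_area_runoff(rain, histogram)
        shifted_rain = np.concatenate(([0.0], rain[:-1]))    # rain delayed by one step
        displaced = time_area_runoff(shifted_rain, histogram)
        biased = time_area_runoff(1.6 * rain, histogram)     # +60% rain bias

        for name, sim in [("displaced", displaced), ("biased", biased)]:
            rmse = np.sqrt(np.mean((sim - truth) ** 2))
            print(name, round(float(rmse), 3))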

  7. The BlueGene/L Supercomputer and Quantum ChromoDynamics

    International Nuclear Information System (INIS)

    Vranas, P; Soltz, R

    2006-01-01

    In summary, our update contains: (1) Perfect speedup sustaining 19.3% of peak for the Wilson D D-slash Dirac operator. (2) Measurements of the full Conjugate Gradient (CG) inverter that inverts the Dirac operator. The CG inverter contains two global sums over the entire machine. Nevertheless, our measurements retain perfect speedup scaling, demonstrating the robustness of our methods. (3) We ran on the largest BG/L system, the LLNL 64 rack BG/L supercomputer, and obtained a sustained speed of 59.1 TFlops. Furthermore, the speedup scaling of the Dirac operator and of the CG inverter are perfect all the way up to the full size of the machine, 131,072 cores (please see Figure II). The local lattice is rather small (4 x 4 x 4 x 16) while the total lattice has been a lattice QCD vision for thermodynamic studies (a total of 128 x 128 x 256 x 32 lattice sites). This speed is about five times larger than the speed we quoted in our submission. As we have pointed out in our paper, QCD is notoriously sensitive to network and memory latencies, has a relatively high communication to computation ratio which cannot be overlapped in BGL in virtual node mode, and as an application is in a class of its own. The above results are thrilling to us and a 30 year long dream for lattice QCD.

  8. The Validity and Precision of the Comparative Interrupted Time-Series Design: Three Within-Study Comparisons

    Science.gov (United States)

    St. Clair, Travis; Hallberg, Kelly; Cook, Thomas D.

    2016-01-01

    We explore the conditions under which short, comparative interrupted time-series (CITS) designs represent valid alternatives to randomized experiments in educational evaluations. To do so, we conduct three within-study comparisons, each of which uses a unique data set to test the validity of the CITS design by comparing its causal estimates to…

  9. BAYESIAN TECHNIQUES FOR COMPARING TIME-DEPENDENT GRMHD SIMULATIONS TO VARIABLE EVENT HORIZON TELESCOPE OBSERVATIONS

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Junhan; Marrone, Daniel P.; Chan, Chi-Kwan; Medeiros, Lia; Özel, Feryal; Psaltis, Dimitrios, E-mail: junhankim@email.arizona.edu [Department of Astronomy and Steward Observatory, University of Arizona, 933 N. Cherry Avenue, Tucson, AZ 85721 (United States)

    2016-12-01

    The Event Horizon Telescope (EHT) is a millimeter-wavelength, very-long-baseline interferometry (VLBI) experiment that is capable of observing black holes with horizon-scale resolution. Early observations have revealed variable horizon-scale emission in the Galactic Center black hole, Sagittarius A* (Sgr A*). Comparing such observations to time-dependent general relativistic magnetohydrodynamic (GRMHD) simulations requires statistical tools that explicitly consider the variability in both the data and the models. We develop here a Bayesian method to compare time-resolved simulation images to variable VLBI data, in order to infer model parameters and perform model comparisons. We use mock EHT data based on GRMHD simulations to explore the robustness of this Bayesian method and contrast it to approaches that do not consider the effects of variability. We find that time-independent models lead to offset values of the inferred parameters with artificially reduced uncertainties. Moreover, neglecting the variability in the data and the models often leads to erroneous model selections. We finally apply our method to the early EHT data on Sgr A*.

  10. Comparative Antianaerobic Activities of Doripenem Determined by MIC and Time-Kill Analysis

    Science.gov (United States)

    Credito, Kim L.; Ednie, Lois M.; Appelbaum, Peter C.

    2008-01-01

    Against 447 anaerobe strains, the investigational carbapenem doripenem had an MIC50 of 0.125 μg/ml and an MIC90 of 1 μg/ml. Results were similar to those for imipenem, meropenem, and ertapenem. Time-kill studies showed that doripenem had very good bactericidal activity compared to other carbapenems, with 99.9% killing of 11 strains at 2× MIC after 48 h. PMID:17938185

  11. Nuclear arms race gearing for speedup

    International Nuclear Information System (INIS)

    Heylin, M.

    1981-01-01

    To probe the rationale behind the big buildup in US strategic arms that is presaged by the current enhanced R and D effort - and to explore the broader, more long-term role of science and technology in the nuclear arms race - C and EN in recent months spoke with a host of experts both within and outside the defense establishment. It is a topic of incredible complexity, high controversy, and of the highest stakes imaginable - the survival of civilization. This buildup will include over the next decade, apart from the MX, a new, highly accurate, submarine-launched ballistic missile and a fleet of very large submarines to carry it; an air-launched cruise missile; a new long-range bomber; a new intermediate-range missile and a new ground-launched cruise missile, both capable of hitting targets in the Soviet Union from proposed bases in Western Europe; and a new sea-launched cruise missile that can be fired from conventional submarines or other naval vessels. To spokesmen for, and members of, the defense establishment the US buildup is prudent, even minimal. According to them, it is needed to keep the US at least on a par with the growth of Soviet strategic might which was very substantial in the 1970's and which will carry over into the 1980's with further major gains. It also is needed to keep the lid on Soviet expansionism; and it is the best way to prevent a nuclear war. To critics, the proposed buildup is the height of lunacy. According to them, the US strategic arsenal is more than adequate today. And it can continue to serve its only legitimate purpose - to deter nuclear war, no matter how much the Soviets may choose to build up their nuclear forces - with a much-more-modest modernization program

  12. Comparative measurement of the neutral density and particle confinement time in EBT

    International Nuclear Information System (INIS)

    Glowienka, J.C.; Richards, R.K.

    1985-11-01

    The neutral density and particle confinement time in the ELMO Bumpy Torus-Scale Experiment (EBT-S) have been determined by two different techniques. These involve a spectroscopic measurement of molecular and atomic hydrogen emissions and a time-decay measurement of a fast-ion population using a diagnostic neutral beam. The results from both diagnostics exhibit identical trends for either estimate, although the absolute values differ by a factor of 2 to 3. The observed variations with fill gas pressure and microwave power from either technique are consistent with measurements of electron density and temperature. In this paper, the measurement techniques are discussed, and the results are compared in the context of consistency with independently observed plasma behavior. 6 refs., 7 figs

  13. Parallel Backprojection: A Case Study in High-Performance Reconfigurable Computing

    Directory of Open Access Journals (Sweden)

    Cordes Ben

    2009-01-01

    Full Text Available High-performance reconfigurable computing (HPRC) is a novel approach to provide large-scale computing power to modern scientific applications. Using both general-purpose processors and FPGAs allows application designers to exploit fine-grained and coarse-grained parallelism, achieving high degrees of speedup. One scientific application that benefits from this technique is backprojection, an image formation algorithm that can be used as part of a synthetic aperture radar (SAR) processing system. We present an implementation of backprojection for SAR on an HPRC system. Using simulated data taken at a variety of ranges, our implementation runs over 200 times faster than a similar software program, with an overall application speedup better than 50x. The backprojection application is easily parallelizable, achieving near-linear speedup when run on multiple nodes of a clustered HPRC system. The results presented can be applied to other systems and other algorithms with similar characteristics.

  14. Parallel Backprojection: A Case Study in High-Performance Reconfigurable Computing

    Directory of Open Access Journals (Sweden)

    2009-03-01

    Full Text Available High-performance reconfigurable computing (HPRC) is a novel approach to provide large-scale computing power to modern scientific applications. Using both general-purpose processors and FPGAs allows application designers to exploit fine-grained and coarse-grained parallelism, achieving high degrees of speedup. One scientific application that benefits from this technique is backprojection, an image formation algorithm that can be used as part of a synthetic aperture radar (SAR) processing system. We present an implementation of backprojection for SAR on an HPRC system. Using simulated data taken at a variety of ranges, our implementation runs over 200 times faster than a similar software program, with an overall application speedup better than 50x. The backprojection application is easily parallelizable, achieving near-linear speedup when run on multiple nodes of a clustered HPRC system. The results presented can be applied to other systems and other algorithms with similar characteristics.
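    A highly simplified backprojection sketch, far from a full SAR processor; the geometry, nearest-bin lookup, and omission of phase terms are simplifying assumptions made only to illustrate the structure of the algorithm described in the two records above.

        import numpy as np

        def backproject(range_profiles, platform_pos, pixels, range_bins):
            """Accumulate range-compressed returns into an image grid.
            Assumes uniformly spaced range_bins; phase terms are omitted."""
            dr = range_bins[1] - range_bins[0]
            image = np.zeros(len(pixels), dtype=complex)
            for profile, pos in zip(range_profiles, platform_pos):
                r = np.linalg.norm(pixels - pos, axis=1)     # pixel-to-sensor range
                idx = np.clip(np.rint((r - range_bins[0]) / dr).astype(int),
                              0, len(range_bins) - 1)
                image += profile[idx]                        # nearest-bin lookup
            return image

        # Tiny synthetic scene: one point scatterer at the origin, short aperture.
        n_pulses = 32
        range_bins = np.linspace(95.0, 105.0, 4096)
        platform = np.stack([np.linspace(-10, 10, n_pulses),
                             np.full(n_pulses, -100.0)], axis=1)
        profiles = np.zeros((n_pulses, len(range_bins)), dtype=complex)
        for p, pos in enumerate(platform):
            r_t = np.linalg.norm(pos)                        # range to the scatterer
            profiles[p, np.abs(range_bins - r_t).argmin()] = 1.0
        xs = np.linspace(-5, 5, 11)
        pixels = np.array([[x, y] for x in xs for y in xs])
        image = np.abs(backproject(profiles, platform, pixels, range_bins))
        print(pixels[image.argmax()])                        # peaks near (0, 0)

    Each pulse's contribution is independent of the others, which is why the application parallelizes so readily and shows the near-linear multi-node speedup reported above.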

  15. Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

    Science.gov (United States)

    Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

    2017-01-01

    Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
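    A Python analogue of the idea (the MATLAB spmd implementation itself is not shown in the record): each column of a forward-difference Jacobian depends only on one perturbed evaluation of the residual, so the columns can be computed on separate workers. The residual below is a hypothetical stand-in for an FE/MBD residual function.

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def residual(x):
            # Example nonlinear system; stands in for a solver's residual function.
            return np.array([x[0] ** 2 + x[1] - 3.0, x[0] + x[1] ** 3 - 5.0])

        def jacobian_column(args):
            x, j, h = args
            xp = x.copy()
            xp[j] += h
            return (residual(xp) - residual(x)) / h          # forward difference

        def parallel_jacobian(x, h=1e-7, workers=4):
            tasks = [(x, j, h) for j in range(len(x))]
            with ProcessPoolExecutor(max_workers=workers) as pool:
                cols = list(pool.map(jacobian_column, tasks))
            return np.column_stack(cols)

        if __name__ == "__main__":
            print(parallel_jacobian(np.array([1.0, 2.0])))   # approx [[2, 1], [1, 12]]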

  16. A Unified Framework for Estimating Minimum Detectable Effects for Comparative Short Interrupted Time Series Designs

    Science.gov (United States)

    Price, Cristofer; Unlu, Fatih

    2014-01-01

    The Comparative Short Interrupted Time Series (C-SITS) design is a frequently employed quasi-experimental method, in which the pre- and post-intervention changes observed in the outcome levels of a treatment group is compared with those of a comparison group where the difference between the former and the latter is attributed to the treatment. The…

  17. Accelerating large-scale protein structure alignments with graphics processing units

    Directory of Open Access Journals (Sweden)

    Pang Bin

    2012-02-01

    Full Text Available Abstract Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using the massive parallel computing power of GPUs.

  18. Rubus: A compiler for seamless and extensible parallelism

    Science.gov (United States)

    Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. For a matrix multiplication benchmark, the average execution speedup of 84 times has been

  19. Rubus: A compiler for seamless and extensible parallelism.

    Directory of Open Access Journals (Sweden)

    Muhammad Adnan

    Full Text Available Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. For a matrix multiplication benchmark, the average execution speedup of 84

  19. Speeding up spin-component-scaled third-order perturbation theory with the chain of spheres approximation: the COSX-SCS-MP3 method

    Science.gov (United States)

    Izsák, Róbert; Neese, Frank

    2013-07-01

    The 'chain of spheres' approximation, developed earlier for the efficient evaluation of the self-consistent field exchange term, is introduced here into the evaluation of the external exchange term of higher order correlation methods. Its performance is studied in the specific case of the spin-component-scaled third-order Møller-Plesset perturbation (SCS-MP3) theory. The results indicate that the approximation performs excellently in terms of both computer time and achievable accuracy. Significant speedups over a conventional method are obtained for larger systems and basis sets. Owing to this development, SCS-MP3 calculations on molecules of the size of penicillin (42 atoms) with a polarised triple-zeta basis set can be performed in ∼3 hours using 16 cores of an Intel Xeon E7-8837 processor with a 2.67 GHz clock speed, which represents a speedup by a factor of 8-9 compared to the previously most efficient algorithm. Thus, the increased accuracy offered by SCS-MP3 can now be explored for at least medium-sized molecules.

  1. Parallel Sn iteration schemes

    International Nuclear Information System (INIS)

    Wienke, B.R.; Hiromoto, R.E.

    1986-01-01

    The iterative, multigroup, discrete ordinates (Sn) technique for solving the linear transport equation enjoys widespread usage and appeal. Serial iteration schemes and numerical algorithms developed over the years provide a timely framework for parallel extension. On the Denelcor HEP, the authors investigate three parallel iteration schemes for solving the one-dimensional Sn transport equation. The multigroup representation and serial iteration methods are also reviewed. This analysis represents a first attempt to extend serial Sn algorithms to parallel environments and provides good baseline estimates on ease of parallel implementation, relative algorithm efficiency, comparative speedup, and some future directions. The authors examine ordered and chaotic versions of these strategies, with and without concurrent rebalance and diffusion acceleration. Two strategies efficiently support high degrees of parallelization and appear to be robust parallel iteration techniques. The third strategy is a weaker parallel algorithm. Chaotic iteration, difficult to simulate on serial machines, holds promise and converges faster than ordered versions of the schemes. Actual parallel speedup and efficiency are high and payoff appears substantial.

  2. A J matrix engine for density functional theory calculations

    International Nuclear Information System (INIS)

    White, C.A.; Head-Gordon, M.

    1996-01-01

    We introduce a new method for the formation of the J matrix (Coulomb interaction matrix) within a basis of Cartesian Gaussian functions, as needed in density functional theory and Hartree-Fock calculations. By summing the density matrix into the underlying Gaussian integral formulas, we have developed a J matrix "engine" which forms the exact J matrix without explicitly forming the full set of two-electron integral intermediates. Several precomputable quantities have been identified, substantially reducing the number of floating point operations and memory accesses needed in a J matrix calculation. Initial timings indicate a speedup of greater than four times for the (pp||pp) class of integrals, with speedups increasing to over ten times for (ff||ff) integrals. © 1996 American Institute of Physics

  3. Dynamic response of Sjögren Inlet glaciers, Antarctic Peninsula, to ice shelf breakup derived from multi-mission remote sensing time series.

    NARCIS (Netherlands)

    Seehaus, T.C.; Marinsek, S.; Skvarca, P.; van Wessem, J.M.; Reijmer, C.H.; Seco, J.L.; Braun, M.

    2016-01-01

    The substantial retreat or disintegration of numerous ice shelves has been observed on the Antarctic Peninsula. The ice shelf in the Prince Gustav Channel has retreated gradually since the late 1980s and broke up in 1995. Tributary glaciers reacted with speed-up, surface lowering and increased ice

  4. 78 FR 49768 - Notice Pursuant to the National Cooperative Research and Production Act of 1993-Joint Task-Force...

    Science.gov (United States)

    2013-08-15

    ...; Ciena, Kanata, Ontario, CANADA; Cinegy, Munich, GERMANY; Cisco, San Jose, CA; Cobalt Digital Inc... speed-up content time-to-market. Patricia A. Brink, Director of Civil Enforcement, Antitrust Division...

  5. Parallel algorithms for placement and routing in VLSI design. Ph.D. Thesis

    Science.gov (United States)

    Brouwer, Randall Jay

    1991-01-01

    The computational requirements for high quality synthesis, analysis, and verification of very large scale integration (VLSI) designs have rapidly increased with the fast growing complexity of these designs. Research in the past has focused on the development of heuristic algorithms, special purpose hardware accelerators, or parallel algorithms for the numerous design tasks to decrease the time required for solution. Two new parallel algorithms are proposed for two VLSI synthesis tasks, standard cell placement and global routing. The first algorithm, a parallel algorithm for global routing, uses hierarchical techniques to decompose the routing problem into independent routing subproblems that are solved in parallel. Results are then presented which compare the routing quality to the results of other published global routers and which evaluate the speedups attained. The second algorithm, a parallel algorithm for cell placement and global routing, hierarchically integrates a quadrisection placement algorithm, a bisection placement algorithm, and the previous global routing algorithm. Unique partitioning techniques are used to decompose the various stages of the algorithm into independent tasks which can be evaluated in parallel. Finally, results are presented which evaluate the various algorithm alternatives and compare the algorithm performance to other placement programs. Measurements are presented on the parallel speedups available.

  6. Efficient Hardware Implementation of the Horn-Schunck Algorithm for High-Resolution Real-Time Dense Optical Flow Sensor

    Science.gov (United States)

    Komorkiewicz, Mateusz; Kryjak, Tomasz; Gorgon, Marek

    2014-01-01

    This article presents an efficient hardware implementation of the Horn-Schunck algorithm that can be used in an embedded optical flow sensor. An architecture is proposed that realises the iterative Horn-Schunck algorithm in a pipelined manner. This modification allows a data throughput of 175 MPixels/s to be achieved and makes processing of a Full HD video stream (1,920 × 1,080 @ 60 fps) possible. The structure of the optical flow module, as well as the pre- and post-filtering blocks and a flow reliability computation unit, is described in detail. Three versions of the optical flow module, with different numerical precision, working frequency and result accuracy, are proposed. The errors caused by switching from floating- to fixed-point computations are also evaluated. The described architecture was tested on popular sequences from the Middlebury University optical flow dataset. It achieves state-of-the-art results among hardware implementations of single-scale methods. The designed fixed-point architecture achieves a performance of 418 GOPS with a power efficiency of 34 GOPS/W. The proposed floating-point module achieves 103 GFLOPS, with a power efficiency of 24 GFLOPS/W. Moreover, a 100 times speedup compared to a modern CPU with SIMD support is reported. A complete, working vision system realized on a Xilinx VC707 evaluation board is also presented. It is able to compute optical flow for a Full HD video stream received from an HDMI camera in real time. The obtained results prove that FPGA devices are an ideal platform for embedded vision systems. PMID:24526303
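
    For reference, the iterative update at the heart of the Horn-Schunck method can be sketched in a few lines of NumPy/SciPy. This is a plain software illustration of the recurrence only, not the pipelined FPGA architecture described above; the derivative kernels, the regularisation weight alpha and the iteration count are illustrative choices.

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(im1, im2, alpha=1.0, n_iter=100):
    """Minimal Horn-Schunck optical flow returning dense (u, v) fields."""
    im1 = im1.astype(np.float64)
    im2 = im2.astype(np.float64)
    # Simple spatial/temporal derivative estimates over both frames.
    kx = np.array([[-1, 1], [-1, 1]]) * 0.25
    ky = np.array([[-1, -1], [1, 1]]) * 0.25
    Ix = convolve(im1, kx) + convolve(im2, kx)
    Iy = convolve(im1, ky) + convolve(im2, ky)
    It = convolve(im2 - im1, np.full((2, 2), 0.25))
    # Averaging kernel used for the local flow means (u_avg, v_avg).
    avg = np.array([[1/12, 1/6, 1/12], [1/6, 0, 1/6], [1/12, 1/6, 1/12]])
    u = np.zeros_like(im1)
    v = np.zeros_like(im1)
    for _ in range(n_iter):
        u_avg = convolve(u, avg)
        v_avg = convolve(v, avg)
        d = (Ix * u_avg + Iy * v_avg + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_avg - Ix * d
        v = v_avg - Iy * d
    return u, v
```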

  7. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards.

    Science.gov (United States)

    Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G

    2011-07-01

    In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses a summed absolute difference (SAD) error criterion and a full grid search (FS) for finding the optimal block displacement. In this evaluation we compared the execution time of a GPU and a CPU implementation for images of various sizes, using integer and non-integer search grids. The results show that the use of a GPU card can shorten computation time by a factor of 200 times for an integer and 1000 times for a non-integer search grid. The additional speedup for the non-integer search grid comes from the fact that the GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with the number of cards is achievable. In addition, we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU-based motion estimation methods, namely the implementation of the Pyramidal Lucas-Kanade optical flow algorithm in OpenCV and the Simplified Unsymmetrical multi-Hexagon search in the H.264/AVC standard. In these comparisons, the FS GPU implementation still showed a modest improvement even though the computational complexity of the FS GPU implementation is substantially higher than that of the non-FS CPU implementations. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards.
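
    As a point of reference, the SAD full-search criterion that the GPU implementation accelerates can be written directly in NumPy. The block size and search range below are illustrative placeholders rather than the settings used in the paper, and no GPU/CUDA specifics are shown.

```python
import numpy as np

def full_search_sad(ref, cur, y, x, block=16, search=8):
    """Best integer-pixel displacement of one block by exhaustive SAD search."""
    b = cur[y:y + block, x:x + block].astype(np.int32)
    best_sad, best_dy, best_dx = None, 0, 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            # Skip candidate blocks that fall outside the reference frame.
            if yy < 0 or xx < 0 or yy + block > ref.shape[0] or xx + block > ref.shape[1]:
                continue
            cand = ref[yy:yy + block, xx:xx + block].astype(np.int32)
            sad = np.abs(b - cand).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_dy, best_dx = sad, dy, dx
    return best_dy, best_dx, best_sad
```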

  8. Increasing the computational efficiency of digital cross correlation by a vectorization method

    Science.gov (United States)

    Chang, Ching-Yuan; Ma, Chien-Ching

    2017-08-01

    This study presents a vectorization method for use in MATLAB programming aimed at increasing the computational efficiency of digital cross correlation in sound and images, resulting in a speedup of 6.387 and 36.044 times compared with performance values obtained from looped expression. This work bridges the gap between matrix operations and loop iteration, preserving flexibility and efficiency in program testing. This paper uses numerical simulation to verify the speedup of the proposed vectorization method as well as experiments to measure the quantitative transient displacement response subjected to dynamic impact loading. The experiment involved the use of a high speed camera as well as a fiber optic system to measure the transient displacement in a cantilever beam under impact from a steel ball. Experimental measurement data obtained from the two methods are in excellent agreement in both the time and frequency domain, with discrepancies of only 0.68%. Numerical and experiment results demonstrate the efficacy of the proposed vectorization method with regard to computational speed in signal processing and high precision in the correlation algorithm. We also present the source code with which to build MATLAB-executable functions on Windows as well as Linux platforms, and provide a series of examples to demonstrate the application of the proposed vectorization method.
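
    The loop-versus-vectorized contrast the paper quantifies in MATLAB can be illustrated with an analogous NumPy sketch; the signal length is arbitrary and the measured ratio will differ from the 6.387 and 36.044 times figures reported above.

```python
import time
import numpy as np

def xcorr_loop(x, y):
    """Looped cross-correlation at all non-negative lags."""
    n = len(x)
    r = np.zeros(n)
    for lag in range(n):
        for i in range(n - lag):
            r[lag] += x[i + lag] * y[i]
    return r

def xcorr_vec(x, y):
    """Same quantity via a single vectorized call."""
    full = np.correlate(x, y, mode="full")   # lags -(n-1) .. n-1
    return full[len(x) - 1:]                 # keep non-negative lags only

x = np.random.rand(1000)
y = np.random.rand(1000)
t0 = time.perf_counter(); r1 = xcorr_loop(x, y); t1 = time.perf_counter()
r2 = xcorr_vec(x, y);                          t2 = time.perf_counter()
print(np.allclose(r1, r2), (t1 - t0) / (t2 - t1))   # same result, measured speedup
```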

  9. High-Speed Computation of the Kleene Star in Max-Plus Algebraic System Using a Cell Broadband Engine

    Science.gov (United States)

    Goto, Hiroyuki

    This research addresses a high-speed computation method for the Kleene star of the weighted adjacency matrix in a max-plus algebraic system. We focus on systems whose precedence constraints are represented by a directed acyclic graph and implement it on a Cell Broadband Engine™ (CBE) processor. Since the resulting matrix gives the longest travel times between two adjacent nodes, it is often utilized in scheduling problem solvers for a class of discrete event systems. This research, in particular, attempts to achieve a speedup by using two approaches: parallelization and SIMDization (Single Instruction, Multiple Data), both of which can be accomplished by a CBE processor. The former refers to a parallel computation using multiple cores, while the latter is a method whereby multiple elements are computed by a single instruction. Using the implementation on a Sony PlayStation 3™ equipped with a CBE processor, we found that the SIMDization is effective regardless of the system's size and the number of processor cores used. We also found that the scalability of using multiple cores is remarkable especially for systems with a large number of nodes. In a numerical experiment where the number of nodes is 2000, we achieved a speedup of 20 times compared with the method without the above techniques.
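
    For readers unfamiliar with the max-plus setting: "addition" is max, "multiplication" is ordinary +, and the Kleene star of the weighted adjacency matrix collects longest-path weights between node pairs. A small sequential NumPy sketch is given below; it shows the algebra only, none of the Cell/SIMD parallelisation discussed in the paper.

```python
import numpy as np

NEG_INF = -np.inf  # the max-plus "zero" element

def maxplus_matmul(A, B):
    """(A (x) B)[i, j] = max_k (A[i, k] + B[k, j])."""
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def maxplus_kleene_star(A):
    """A* = I (+) A (+) A(x)A (+) ...; finite for a directed acyclic graph."""
    n = A.shape[0]
    I = np.full((n, n), NEG_INF)
    np.fill_diagonal(I, 0.0)              # max-plus identity matrix
    star, power = I.copy(), I.copy()
    for _ in range(n - 1):                # a DAG path uses at most n-1 edges
        power = maxplus_matmul(power, A)
        star = np.maximum(star, power)
    return star

# Tiny 3-node DAG: 0 -> 1 (weight 2), 1 -> 2 (weight 3), 0 -> 2 (weight 4).
A = np.full((3, 3), NEG_INF)
A[0, 1], A[1, 2], A[0, 2] = 2.0, 3.0, 4.0
print(maxplus_kleene_star(A))             # longest travel time 0 -> 2 is 5 (via node 1)
```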

  10. A SEMI-LAGRANGIAN TWO-LEVEL PRECONDITIONED NEWTON-KRYLOV SOLVER FOR CONSTRAINED DIFFEOMORPHIC IMAGE REGISTRATION.

    Science.gov (United States)

    Mang, Andreas; Biros, George

    2017-01-01

    We propose an efficient numerical algorithm for the solution of diffeomorphic image registration problems. We use a variational formulation constrained by a partial differential equation (PDE), where the constraints are a scalar transport equation. We use a pseudospectral discretization in space and second-order accurate semi-Lagrangian time stepping scheme for the transport equations. We solve for a stationary velocity field using a preconditioned, globalized, matrix-free Newton-Krylov scheme. We propose and test a two-level Hessian preconditioner. We consider two strategies for inverting the preconditioner on the coarse grid: a nested preconditioned conjugate gradient method (exact solve) and a nested Chebyshev iterative method (inexact solve) with a fixed number of iterations. We test the performance of our solver in different synthetic and real-world two-dimensional application scenarios. We study grid convergence and computational efficiency of our new scheme. We compare the performance of our solver against our initial implementation that uses the same spatial discretization but a standard, explicit, second-order Runge-Kutta scheme for the numerical time integration of the transport equations and a single-level preconditioner. Our improved scheme delivers significant speedups over our original implementation. As a highlight, we observe a 20 × speedup for a two dimensional, real world multi-subject medical image registration problem.

  11. Propensity score estimation to address calendar time-specific channeling in comparative effectiveness research of second generation antipsychotics.

    Directory of Open Access Journals (Sweden)

    Stacie B Dusetzina

    Full Text Available Channeling occurs when a medication and its potential comparators are selectively prescribed based on differences in underlying patient characteristics. Drug safety advisories can provide new information regarding the relative safety or effectiveness of a drug product which might increase selective prescribing. In particular, when reported adverse effects vary among drugs within a therapeutic class, clinicians may channel patients toward or away from a drug based on the patient's underlying risk for an adverse outcome. If channeling is not identified and appropriately managed it might lead to confounding in observational comparative effectiveness studies. To demonstrate channeling among new users of second generation antipsychotics following a Food and Drug Administration safety advisory and to evaluate the impact of channeling on cardiovascular risk estimates over time. Florida Medicaid data from 2001-2006. Retrospective cohort of adults initiating second generation antipsychotics. We used propensity scores to match olanzapine initiators with other second generation antipsychotic initiators. To evaluate channeling away from olanzapine following an FDA safety advisory, we estimated calendar time-specific propensity scores. We compare the performance of these calendar time-specific propensity scores with conventionally-estimated propensity scores on estimates of cardiovascular risk. Increased channeling away from olanzapine was evident for some, but not all, cardiovascular risk factors and corresponded with the timing of the FDA advisory. Covariate balance was optimized within period and across all periods when using the calendar time-specific propensity score. Hazard ratio estimates for cardiovascular outcomes did not differ across models (conventional PS: 0.97, 95%CI: 0.81-3.18 versus calendar time-specific PS: 0.93, 95%CI: 0.77-3.04). Changes in channeling over time were evident for several covariates but had limited impact on cardiovascular risk

  12. The overlapped radial basis function-finite difference (RBF-FD) method: A generalization of RBF-FD

    Science.gov (United States)

    Shankar, Varun

    2017-08-01

    We present a generalization of the RBF-FD method that computes RBF-FD weights in finite-sized neighborhoods around the centers of RBF-FD stencils by introducing an overlap parameter δ ∈ (0, 1] such that δ = 1 recovers the standard RBF-FD method and δ = 0 results in a full decoupling of stencils. We provide experimental evidence to support this generalization, and develop an automatic stabilization procedure based on local Lebesgue functions for the stable selection of stencil weights over a wide range of δ values. We provide an a priori estimate for the speedup of our method over RBF-FD that serves as a good predictor for the true speedup. We apply our method to parabolic partial differential equations with time-dependent inhomogeneous boundary conditions - Neumann in 2D, and Dirichlet in 3D. Our results show that our method can achieve as high as a 60× speedup in 3D over existing RBF-FD methods in the task of forming differentiation matrices.

  13. Monte Carlo methods for neutron transport on graphics processing units using Cuda - 015

    International Nuclear Information System (INIS)

    Nelson, A.G.; Ivanov, K.N.

    2010-01-01

    This work examined the feasibility of utilizing Graphics Processing Units (GPUs) to accelerate Monte Carlo neutron transport simulations. First, a clean-sheet MC code was written in C++ for an x86 CPU and later ported to run on GPUs using NVIDIA's CUDA programming language. After further optimization, the GPU ran 21 times faster than the CPU code when using single-precision floating point math. This can be further increased with no additional effort if accuracy is sacrificed for speed: using a compiler flag, the speedup was increased to 22x. Further, if double-precision floating point math is desired for neutron tracking through the geometry, a speedup of 11x was obtained. The GPUs have proven to be useful in this study, but the current generation does have limitations: the maximum memory currently available on a single GPU is only 4 GB; the GPU RAM does not provide error-checking and correction; and the optimization required for large speedups can lead to confusing code. (authors)

  14. Computational efficiency using the CYBER-205 computer for the PACER Monte Carlo Program

    International Nuclear Information System (INIS)

    Candelore, N.R.; Maher, C.M.; Gast, R.C.

    1985-09-01

    The use of the large memory of the CYBER-205 and its vector data handling logic produced speedups over scalar code ranging from a factor of 7 for unit cell calculations with relatively few compositions to a factor of 5 for problems having more detailed geometry and materials. By vectorizing the neutron tracking in PACER (the collision analysis remained in scalar code), an asymptotic value of 200 neutrons/cpu-second was achieved for a batch size of 10,000 neutrons. The complete vectorization of the Monte Carlo method as performed by Brown resulted in even higher speedups in neutron processing rates over the use of scalar code. Large speedups in neutron processing rates are beneficial not only to achieve more accurate results for the neutronics calculations which are routinely done using Monte Carlo, but also to extend the use of the Monte Carlo method to applications that were previously considered impractical because of large running times

  15. An improved Four-Russians method and sparsified Four-Russians algorithm for RNA folding.

    Science.gov (United States)

    Frid, Yelena; Gusfield, Dan

    2016-01-01

    The basic RNA secondary structure prediction problem or single sequence folding problem (SSF) was solved 35 years ago by a now well-known [Formula: see text]-time dynamic programming method. Recently, three methodologies (Valiant, Four-Russians, and Sparsification) have been applied to speed up RNA secondary structure prediction. The sparsification method exploits two properties of the input: the number of subsequences Z with endpoints belonging to the optimal folding set and the maximum number of base pairs L. These sparsity properties satisfy [Formula: see text] and [Formula: see text], and the method reduces the algorithmic running time to O(LZ). The Four-Russians method, by contrast, utilizes tabling of partial results. In this paper, we explore three different algorithmic speedups. We first expand and reformulate the single sequence folding Four-Russians [Formula: see text]-time algorithm to utilize an on-demand lookup table. Second, we create a framework that combines the fastest Sparsification and the new fastest on-demand Four-Russians methods. This combined method has a worst-case running time of [Formula: see text], where [Formula: see text] and [Formula: see text]. Third, we update the Four-Russians formulation to achieve an on-demand [Formula: see text]-time parallel algorithm. This then leads to an asymptotic speedup of [Formula: see text], where [Formula: see text] and [Formula: see text] is the number of subsequences with endpoint j belonging to the optimal folding set. The on-demand formulation not only removes all extraneous computation and allows us to incorporate more realistic scoring schemes, but also lets us take advantage of the sparsity properties. Through asymptotic analysis and empirical testing on the base-pair maximization variant and a more biologically informative scoring scheme, we show that this Sparse Four-Russians framework is able to achieve a speedup on every problem instance, that is asymptotically never worse, and empirically better than achieved by
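
    For orientation, the cubic-time baseline that the Valiant, Four-Russians and Sparsification techniques all accelerate is the classic base-pair-maximization dynamic program. A plain Python sketch of that unaccelerated recurrence is shown below; it is not the Four-Russians or sparsified algorithm of the paper, and the minimum hairpin-loop length is an illustrative choice.

```python
def max_base_pairs(seq, min_loop=3):
    """O(n^3) single-sequence folding: maximum number of nested base pairs."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    N = [[0] * n for _ in range(n)]
    for span in range(1, n):                          # span = j - i
        for i in range(n - span):
            j = i + span
            best = N[i + 1][j]                        # base i left unpaired
            if span > min_loop and (seq[i], seq[j]) in pairs:
                best = max(best, N[i + 1][j - 1] + 1) # base i pairs with base j
            for k in range(i + 1, j):                 # i pairs somewhere inside: split
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1] if n else 0

print(max_base_pairs("GGGAAAUCC"))   # 3 nested pairs for this hairpin-like sequence
```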

  16. Anisotropy, propagation failure, and wave speedup in traveling waves of discretizations of a Nagumo PDE

    International Nuclear Information System (INIS)

    Elmer, Christopher E.; Vleck, Erik S. van

    2003-01-01

    This article is concerned with the effect of spatial and temporal discretizations on traveling wave solutions to parabolic PDEs (Nagumo type) possessing piecewise linear bistable nonlinearities. Solution behavior is compared in terms of waveforms and in terms of the so-called (a,c) relationship, where a is a parameter controlling the bistable nonlinearity by varying the potential energy difference of the two phases and c is the wave speed of the traveling wave. Uniform spatial discretizations and A(α) stable linear multistep methods in time are considered. Results obtained show that although the traveling wave solutions to parabolic PDEs are stationary for only one value of the parameter a, a0, spatial discretization of these PDEs produces traveling waves which are stationary for a nontrivial interval of a values which includes a0, i.e., failure of the solution to propagate in the presence of a driving force. This is true no matter how wide the interface is with respect to the discretization. For temporal discretizations at large wave speeds the set of parameter a values for which there are traveling wave solutions is constrained. An analysis of a complete discretization points out the potential for nonuniqueness in the (a,c) relationship.

  17. SWAMP+: multiple subsequence alignment using associative massive parallelism

    Energy Technology Data Exchange (ETDEWEB)

    Steinfadt, Shannon Irene [Los Alamos National Laboratory]; Baker, Johnnie W [Kent State Univ.]

    2010-10-18

    A new parallel algorithm SWAMP+ incorporates the Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that expands traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for the end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.
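
    The sequential recurrence that SWAMP+ parallelises is standard Smith-Waterman with a linear gap penalty; a compact Python reference is given below. The scoring parameters are illustrative, and none of the ASC/ClearSpeed parallel machinery is represented.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment score via the O(mn) Smith-Waterman dynamic program."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman("ACACACTA", "AGCACACA"))    # best local alignment score
```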

  18. A comparative study of the time performance between NINO and FlexToT ASICs

    International Nuclear Information System (INIS)

    Sarasola, I.; Rato, P.; Marín, J.; Nemallapudi, M.V.; Gundacker, S.; Auffray, E.; Sánchez, D.; Gascón, D.

    2017-01-01

    Universitat de Barcelona (UB) and CIEMAT have designed the FlexToT ASIC for the front-end readout of SiPM-based scintillator detectors. This ASIC is aimed at time of flight (ToF) positron emission tomography (PET) applications. In this work we have evaluated the time performance of the FlexToT v2 ASIC compared to the NINO ASIC, a fast ASIC developed at CERN. NINO electronics give 64 ps sigma for single-photon time resolution (SPTR) and 93 ps FWHM for coincidence time resolution (CTR) with 2 × 2 × 5 mm³ LSO:Ce,Ca crystals and S13360-3050CS SiPMs. Using the same SiPMs and crystals, the FlexToT v2 ASIC yields 91 ps sigma for SPTR and 123 ps FWHM for CTR. Despite worse time performance than NINO, FlexToT v2 features lower power consumption (11 vs. 27 mW/ch) and linear ToT energy measurement.

  19. Speedup and fracturing of George VI Ice Shelf, Antarctic Peninsula

    Directory of Open Access Journals (Sweden)

    T. O. Holt

    2013-05-01

    Full Text Available George VI Ice Shelf (GVIIS) is located on the Antarctic Peninsula, a region where several ice shelves have undergone rapid breakup in response to atmospheric and oceanic warming. We use a combination of optical (Landsat), radar (ERS 1/2 SAR) and laser altimetry (GLAS) datasets to examine the response of GVIIS to environmental change and to offer an assessment on its future stability. The spatial and structural changes of GVIIS (ca. 1973 to ca. 2010) are mapped and surface velocities are calculated at different time periods (InSAR and optical feature tracking) from 1989 to 2009 to document changes in the ice shelf's flow regime. Surface elevation changes are recorded between 2003 and 2008 using repeat track ICESat acquisitions. We note an increase in fracture extent and distribution at the south ice front, ice-shelf acceleration towards both the north and south ice fronts and spatially varied negative surface elevation change throughout, with greater variations observed towards the central and southern regions of the ice shelf. We propose that whilst GVIIS is in no imminent danger of collapse, it is vulnerable to ongoing atmospheric and oceanic warming and is more susceptible to breakup along its southern margin in ice preconditioned for further retreat.

  20. Fast Gridding on Commodity Graphics Hardware

    DEFF Research Database (Denmark)

    Sørensen, Thomas Sangild; Schaeffter, Tobias; Noe, Karsten Østergaard

    2007-01-01

    is the far most time consuming of the three steps (Table 1). Modern graphics cards (GPUs) can be utilised as a fast parallel processor provided that algorithms are reformulated in a parallel solution. The purpose of this work is to test the hypothesis, that a non-cartesian reconstruction can be efficiently...... implemented on graphics hardware giving a significant speedup compared to CPU based alternatives. We present a novel GPU implementation of the convolution step that overcomes the problems of memory bandwidth that has limited the speed of previous GPU gridding algorithms [2]....

  1. Open problems in CEM: Porting an explicit time-domain volume-integral- equation solver on GPUs with OpenACC

    KAUST Repository

    Ergül, Özgür

    2014-04-01

    Graphics processing units (GPUs) are gradually becoming mainstream in high-performance computing, as their capabilities for enhancing the performance of a large spectrum of scientific applications many-fold compared to multi-core CPUs have been clearly identified and proven. In this paper, implementation and performance-tuning details for porting an explicit marching-on-in-time (MOT)-based time-domain volume-integral-equation (TDVIE) solver onto GPUs are described in detail. To this end, a high-level approach, utilizing the OpenACC directive-based parallel programming model, is used to minimize two often-faced challenges in GPU programming: developer productivity and code portability. The MOT-TDVIE solver code, originally developed for CPUs, is annotated with compiler directives to port it to GPUs in a fashion similar to how OpenMP targets multi-core CPUs. In contrast to CUDA and OpenCL, where significant modifications to CPU-based codes are required, this high-level approach therefore requires minimal changes to the codes. In this work, we make use of two available OpenACC compilers, CAPS and PGI. Our experience reveals that different annotations of the code are required for each of the compilers, due to different interpretations of the fairly new standard by the compiler developers. Both versions of the OpenACC-accelerated code achieved significant performance improvements, with up to 30× speedup against the sequential CPU code using recent hardware technology. Moreover, we demonstrated that the GPU-accelerated fully explicit MOT-TDVIE solver leveraged energy-consumption gains of the order of 3× against its CPU counterpart. © 2014 IEEE.

  2. Comparing predictors of part-time and no vocational engagement in youth primary mental health services: A brief report.

    Science.gov (United States)

    Cairns, Alice J; Kavanagh, David J; Dark, Frances; McPhail, Steven M

    2017-05-19

    This investigation aims to identify if correlates of not working or studying were also correlated with part-time vocational participation. Demographic and vocational engagement information was collected from the clinical charts of 226 participants aged 15 to 25 years accessing a primary youth health clinic. Multinomial logistic regressions were used to examine potential correlates of no and part-time vocational engagement compared to full-time engagement. A total of 33% were not working or studying and 19% were part-time. Not working or studying was associated with secondary school dropout and a history of drug use. These associations were not observed in those participating part-time. This result suggests that the markers of disadvantage observed in those not working or studying do not carry over to those who are part-time. Potentially, those who are part-time are less vulnerable to long-term disadvantage compared to their unemployed counterparts as they do not share the same indicators of disadvantage. © 2017 John Wiley & Sons Australia, Ltd.

  3. Comparative timing measurements of LYSO and LFS-3 to achieve the best time resolution for TOF-PET

    CERN Document Server

    Doroud, K; Zichichi, A; Zuyeuski, R

    2015-01-01

    The best Coincidence Time Resolution (CTR) obtained so far – with very short crystals of 3–5 mm in length – reaches values between 100 and 150 ps. Such crystals are not really practical for a TOF-PET imaging device, since the sensitivity is quite small for the detection of the 511 keV gammas resulting from a positron annihilation. We present our setup and measurements using 15 mm length crystals, a length we regard as reasonable for a TOF-PET scanner. We have used a new series of Silicon Photo-Multipliers (SiPM) manufactured by Hamamatsu. These are the High Fill Factor (HFF) and Low Cross-Talk (LCT) Multi-Pixel Photon Counters (MPPC). We have compared three different crystals, LFS-3 (supplied by Zecotek) and two samples of LYSO (manufactured by Saint Gobain and CPI). We have obtained an excellent value of 148 ps for the Coincidence Time Resolution (CTR) with two LFS-3 crystals (15 mm long) mounted on each side of a 22Na radioactive source with the HFF-MPPCs at 3.3 V over-voltage. Our results are 148 ps obt...

  4. An alternative effective method for verifying the multileaf collimator leaves speed by using a digital-video imaging system

    International Nuclear Information System (INIS)

    Hwang, Ing-Ming; Wu, Jay; Chuang, Keh-Shih; Ding, Hueisch-Jy

    2010-01-01

    We present an alternative effective method for verifying the multileaf collimator (MLC) leaf speed using a digital-video imaging system in daily dynamic conformal radiation therapy (DCRT) and intensity-modulated radiation therapy (IMRT), achieving increased convenience and shorter treatment times. The measured horizontal leaf speed was within 1.76-2.08 cm/s. The mean full range of traveling time was 20 s. The initial speed-up time was within 1.5-2.0 s, and the slowing-down time was within 2.0-2.5 s. Due to gravity the maximum speed-up effect in the X1 bank was +0.10 cm/s, but the lagging effect in the X2 bank was -0.20 cm/s. This technique offers an alternative to measuring MLC leaf speed with an electronic portal imaging device (EPID), a charge-coupled device (CCD) or a light field. When time taken on the linac was kept to a minimum, the image could be processed off-line.

  5. Ultrafast convolution/superposition using tabulated and exponential kernels on GPU

    Energy Technology Data Exchange (ETDEWEB)

    Chen Quan; Chen Mingli; Lu Weiguo [TomoTherapy Inc., 1240 Deming Way, Madison, Wisconsin 53717 (United States)

    2011-03-15

    Purpose: Collapsed-cone convolution/superposition (CCCS) dose calculation is the workhorse for IMRT dose calculation. The authors present a novel algorithm for computing CCCS dose on the modern graphic processing unit (GPU). Methods: The GPU algorithm includes a novel TERMA calculation that has no write-conflicts and has linear computation complexity. The CCCS algorithm uses either tabulated or exponential cumulative-cumulative kernels (CCKs) as reported in literature. The authors have demonstrated that the use of exponential kernels can reduce the computation complexity by an order of a dimension and achieve excellent accuracy. Special attention is paid to the unique architecture of the GPU, especially the memory accessing pattern, which increases performance by more than tenfold. Results: As a result, the tabulated kernel implementation on the GPU is two to three times faster than other GPU implementations reported in literature. The implementation of CCCS showed significant speedup on the GPU over a single-core CPU. On tabulated CCK, speedups as high as 70 are observed; on exponential CCK, speedups as high as 90 are observed. Conclusions: Overall, the GPU algorithm using exponential CCK is 1000-3000 times faster than a highly optimized single-threaded CPU implementation using tabulated CCK, while the dose differences are within 0.5% and 0.5 mm. This ultrafast CCCS algorithm will allow many time-sensitive applications to use accurate dose calculation.

  6. High-resolution, time-resolved MRA provides superior definition of lower-extremity arterial segments compared to 2D time-of-flight imaging.

    Science.gov (United States)

    Thornton, F J; Du, J; Suleiman, S A; Dieter, R; Tefera, G; Pillai, K R; Korosec, F R; Mistretta, C A; Grist, T M

    2006-08-01

    To evaluate a novel time-resolved contrast-enhanced (CE) projection reconstruction (PR) magnetic resonance angiography (MRA) method for identifying potential bypass graft target vessels in patients with Class II-IV peripheral vascular disease. Twenty patients (M:F = 15:5, mean age = 58 years, range = 48-83 years), were recruited from routine MRA referrals. All imaging was performed on a 1.5 T MRI system with fast gradients (Signa LX; GE Healthcare, Waukesha, WI). Images were acquired with a novel technique that combined undersampled PR with a time-resolved acquisition to yield an MRA method with high temporal and spatial resolution. The method is called PR hyper time-resolved imaging of contrast kinetics (PR-hyperTRICKS). Quantitative and qualitative analyses were used to compare two-dimensional (2D) time-of-flight (TOF) and PR-hyperTRICKS in 13 arterial segments per lower extremity. Statistical analysis was performed with the Wilcoxon signed-rank test. Fifteen percent (77/517) of the vessels were scored as missing or nondiagnostic with 2D TOF, but were scored as diagnostic with PR-hyperTRICKS. Image quality was superior with PR-hyperTRICKS vs. 2D TOF (on a four-point scale, mean rank = 3.3 +/- 1.2 vs. 2.9 +/- 1.2, P < 0.0001). PR-hyperTRICKS produced images with high contrast-to-noise ratios (CNR) and high spatial and temporal resolution. 2D TOF images were of inferior quality due to moderate spatial resolution, inferior CNR, greater flow-related artifacts, and absence of temporal resolution. PR-hyperTRICKS provides superior preoperative assessment of lower limb ischemia compared to 2D TOF.

  7. Quantum supremacy in constant-time measurement-based computation: A unified architecture for sampling and verification

    Science.gov (United States)

    Miller, Jacob; Sanders, Stephen; Miyake, Akimasa

    2017-12-01

    While quantum speed-up in solving certain decision problems by a fault-tolerant universal quantum computer has been promised, a timely research interest includes how far one can reduce the resource requirement to demonstrate a provable advantage in quantum devices without demanding quantum error correction, which is crucial for prolonging the coherence time of qubits. We propose a model device made of locally interacting multiple qubits, designed such that simultaneous single-qubit measurements on it can output probability distributions whose average-case sampling is classically intractable, under similar assumptions as the sampling of noninteracting bosons and instantaneous quantum circuits. Notably, in contrast to these previous unitary-based realizations, our measurement-based implementation has two distinctive features. (i) Our implementation involves no adaptation of measurement bases, leading output probability distributions to be generated in constant time, independent of the system size. Thus, it could be implemented in principle without quantum error correction. (ii) Verifying the classical intractability of our sampling is done by changing the Pauli measurement bases only at certain output qubits. Our usage of random commuting quantum circuits in place of computationally universal circuits allows a unique unification of sampling and verification, so they require the same physical resource requirements in contrast to the more demanding verification protocols seen elsewhere in the literature.

  8. Performing T-tests to Compare Autocorrelated Time Series Data Collected from Direct-Reading Instruments.

    Science.gov (United States)

    O'Shaughnessy, Patrick; Cavanaugh, Joseph E

    2015-01-01

    Industrial hygienists now commonly use direct-reading instruments to evaluate hazards in the workplace. The stored values over time from these instruments constitute a time series of measurements that are often autocorrelated. Given the need to statistically compare two occupational scenarios using values from a direct-reading instrument, a t-test must consider measurement autocorrelation or the resulting test will have a largely inflated type-1 error probability (false rejection of the null hypothesis). A method is described for both the one-sample and two-sample cases which properly adjusts for autocorrelation. This method involves the computation of an "equivalent sample size" that effectively decreases the actual sample size when determining the standard error of the mean for the time series. An example is provided for the one-sample case, and an example is given where a two-sample t-test is conducted for two autocorrelated time series comprised of lognormally distributed measurements.
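
    Below is a minimal sketch of the described adjustment, assuming an AR(1)-like autocorrelation so that the lag-1 coefficient alone determines the equivalent sample size; the exact estimator and critical-value details used in the paper may differ.

```python
import numpy as np
from scipy import stats

def one_sample_t_autocorr(x, mu0):
    """One-sample t-test with an AR(1)-style equivalent-sample-size correction."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    rho1 = np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)   # lag-1 autocorrelation estimate
    n_eff = n * (1 - rho1) / (1 + rho1)               # equivalent (effective) sample size
    se = x.std(ddof=1) / np.sqrt(n_eff)               # standard error using n_eff, not n
    t = (x.mean() - mu0) / se
    p = 2 * stats.t.sf(abs(t), df=n_eff - 1)          # two-sided p-value
    return t, p, n_eff
```

    For positively autocorrelated instrument readings, rho1 > 0, so n_eff < n and the standard error grows, which is exactly what keeps the type-1 error probability from being inflated.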

  9. Time-Efficiency Analysis Comparing Digital and Conventional Workflows for Implant Crowns: A Prospective Clinical Crossover Trial.

    Science.gov (United States)

    Joda, Tim; Brägger, Urs

    2015-01-01

    To compare time-efficiency in the production of implant crowns using a digital workflow versus the conventional pathway. This prospective clinical study used a crossover design that included 20 study participants receiving single-tooth replacements in posterior sites. Each patient received a customized titanium abutment plus a computer-aided design/computer-assisted manufacture (CAD/CAM) zirconia suprastructure (for those in the test group, using digital workflow) and a standardized titanium abutment plus a porcelain-fused-to-metal crown (for those in the control group, using a conventional pathway). The start of the implant prosthetic treatment was established as the baseline. Time-efficiency analysis was defined as the primary outcome, and was measured for every single clinical and laboratory work step in minutes. Statistical analysis was calculated with the Wilcoxon rank sum test. All crowns could be provided within two clinical appointments, independent of the manufacturing process. The mean total production time, as the sum of clinical plus laboratory work steps, was significantly different. The mean ± standard deviation (SD) time was 185.4 ± 17.9 minutes for the digital workflow process and 223.0 ± 26.2 minutes for the conventional pathway (P = .0001). Therefore, digital processing for overall treatment was 16% faster. Detailed analysis for the clinical treatment revealed a significantly reduced mean ± SD chair time of 27.3 ± 3.4 minutes for the test group compared with 33.2 ± 4.9 minutes for the control group (P = .0001). Similar results were found for the mean laboratory work time, with a significant decrease of 158.1 ± 17.2 minutes for the test group vs 189.8 ± 25.3 minutes for the control group (P = .0001). Only a few studies have investigated efficiency parameters of digital workflows compared with conventional pathways in implant dental medicine. This investigation shows that the digital workflow seems to be more time-efficient than the

  10. Compaction-Based Deformable Terrain Model as an Interface for Real-Time Vehicle Dynamics Simulations

    Science.gov (United States)

    2013-04-16

    Wulfsohn, D., and Upadhyaya, S. K., 1992, "Prediction of traction and soil compaction using three-dimensional soil-tyre contact profile," Journal of Terramechanics, 29(6), pp. 541-564. ... the relative speedup of utilizing GPUs for computational acceleration ... in order to enable off-road vehicle dynamics analysis ... (Figure 2: tire geometry used to determine collision points with the terrain).

  11. Comparing the accuracy of ABC and time-driven ABC in complex and dynamic environments: a simulation analysis

    OpenAIRE

    S. HOOZÉE; M. VANHOUCKE; W. BRUGGEMAN

    2010-01-01

    This paper compares the accuracy of traditional ABC and time-driven ABC in complex and dynamic environments through simulation analysis. First, when unit times in time-driven ABC are known or can be flawlessly estimated, time-driven ABC coincides with the benchmark system and in this case our results show that the overall accuracy of traditional ABC depends on (1) existing capacity utilization, (2) diversity in the actual mix of productive work, and (3) error in the estimated percentage mix. ...

  12. Parallel/vector algorithms for the spherical SN transport theory method

    International Nuclear Information System (INIS)

    Haghighat, A.; Mattis, R.E.

    1990-01-01

    This paper discusses vector and parallel processing of a 1-D curvilinear (i.e. spherical) SN transport theory algorithm on the Cornell National SuperComputer Facility (CNSF) IBM 3090/600E. Two different vector algorithms were developed and parallelized based on angular decomposition. It is shown that significant speedups are attainable. For example, for problems with large granularity, using 4 processors, the parallel/vector algorithm achieves speedups (for wall-clock time) of more than 4.5 relative to the old serial/scalar algorithm. Furthermore, this work has demonstrated the existing potential for the development of faster processing vector and parallel algorithms for multidimensional curvilinear geometries. (author)

  13. A comparative study of simple auditory reaction time in blind (congenitally) and sighted subjects.

    Science.gov (United States)

    Gandhi, Pritesh Hariprasad; Gokhale, Pradnya A; Mehta, H B; Shah, C J

    2013-07-01

    Reaction time is the time interval between the application of a stimulus and the appearance of an appropriate voluntary response by a subject. It involves stimulus processing, decision making, and response programming. Reaction time studies have been popular due to their implications in sports physiology. Reaction time has been widely studied as its practical implications may be of great consequence, e.g., a slower than normal reaction time while driving can have grave results. To study simple auditory reaction time in congenitally blind subjects and in age- and sex-matched sighted subjects. To compare the simple auditory reaction time between congenitally blind subjects and healthy control subjects. The study was carried out in two groups: the 1st comprised 50 congenitally blind subjects and the 2nd group comprised 50 healthy controls. It was carried out on a Multiple Choice Reaction Time Apparatus, Inco Ambala Ltd. (accuracy ±0.001 s) in a sitting position at Government Medical College and Hospital, Bhavnagar and at a Blind School, PNR campus, Bhavnagar, Gujarat, India. Simple auditory reaction time response with four different types of sound (horn, bell, ring, and whistle) was recorded in both groups. According to our study, there is no significant difference in reaction time between congenitally blind and normal healthy persons. Blind individuals commonly utilize tactual and auditory cues for information and orientation, and their reliance on touch and audition, together with more practice in using these modalities to guide behavior, is often reflected in better performance of blind relative to sighted participants in tactile or auditory discrimination tasks, but there is not any difference in reaction time between congenitally blind and sighted people.

  14. Performance of a Bounce-Averaged Global Model of Super-Thermal Electron Transport in the Earth's Magnetic Field

    Science.gov (United States)

    McGuire, Tim

    1998-01-01

    In this paper, we report the results of our recent research on the application of a multiprocessor Cray T916 supercomputer in modeling super-thermal electron transport in the earth's magnetic field. In general, this mathematical model requires numerical solution of a system of partial differential equations. The code we use for this model is moderately vectorized. By using Amdahl's Law for vector processors, it can be verified that the code is about 60% vectorized on a Cray computer. Speedup factors on the order of 2.5 were obtained compared to the unvectorized code. In the following sections, we discuss the methodology of improving the code. In addition to our goal of optimizing the code for solution on the Cray computer, we had the goal of scalability in mind. Scalability combines the concepts of portability with near-linear speedup. Specifically, a scalable program is one whose performance is portable across many different architectures with differing numbers of processors for many different problem sizes. Though we have access to a Cray at this time, the goal was to also have code which would run well on a variety of architectures.
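
    The quoted figures are consistent with Amdahl's law: if a fraction f of the runtime is vectorizable and the vector unit speeds that fraction up by a factor s, the overall speedup is 1/((1-f) + f/s), which saturates at 1/(1-f) = 2.5 for f = 0.6. A small back-of-the-envelope check follows; the values of s are purely illustrative.

```python
def amdahl(f, s):
    """Overall speedup when a fraction f of runtime is accelerated by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

f = 0.60                          # vectorizable fraction reported for the code
for s in (5, 20, 1e9):            # hypothetical vector-unit speedups
    print(s, round(amdahl(f, s), 3))
# As s grows, the overall speedup saturates at 1/(1-f) = 2.5,
# matching the factor of ~2.5 quoted above.
```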

  15. Time-frequency analysis of phonocardiogram signals using wavelet transform: a comparative study.

    Science.gov (United States)

    Ergen, Burhan; Tatar, Yetkin; Gulcur, Halil Ozcan

    2012-01-01

    Analysis of phonocardiogram (PCG) signals provides a non-invasive means to determine the abnormalities caused by cardiovascular system pathology. In general, time-frequency representation (TFR) methods are used to study the PCG signal because it is one of the non-stationary bio-signals. The continuous wavelet transform (CWT) is especially suitable for the analysis of non-stationary signals and for obtaining the TFR, due to its high resolution both in time and in frequency, and it has recently become a favourite tool. It decomposes a signal in terms of elementary contributions called wavelets, which are shifted and dilated copies of a fixed mother wavelet function, and yields a joint TFR. Although the basic characteristics of the wavelets are similar, each type of wavelet produces a different TFR. In this study, eight of the most widely known real wavelets are examined on typical PCG signals indicating heart abnormalities in order to determine the best wavelet for obtaining a reliable TFR. For this purpose, the wavelet energy and frequency spectrum estimations based on the CWT and the spectra of the chosen wavelets were compared with the energy distribution and the autoregressive frequency spectra in order to determine the most suitable wavelet. The results show that the Morlet wavelet is the most reliable wavelet for the time-frequency analysis of PCG signals.

  16. Desflurane Allows for a Faster Emergence when Compared to Sevoflurane Without Affecting the Baseline Cognitive Recovery Time.

    Directory of Open Access Journals (Sweden)

    Joseph G. Werner

    2015-10-01

    Full Text Available Aims: We compared the effect of desflurane and sevoflurane on anesthesia recovery time in patients undergoing urological cystoscopic surgery. The Short Orientation Memory Concentration Test (SOMCT) measured and compared cognitive impairment between groups, and coughing was assessed throughout the anesthetic. Methods and Materials: This investigation included 75 ambulatory patients. Patients were randomized to receive either desflurane or sevoflurane. Inhalational anesthetics were discontinued after removal of the cystoscope and once repositioning of the patient was final. Coughing assessment and awakening time from anesthesia were assessed by a blinded observer. Statistical analysis used: Statistical analysis was performed using the t-test for parametric variables and the Mann-Whitney U test for nonparametric variables. Results: The primary endpoint, mean time to eye-opening, was 5.0±2.5 minutes for desflurane and 7.9±4.1 minutes for sevoflurane (p <0.001). There were no significant differences in time to SOMCT recovery (p=0.109), overall time spent in the post-anesthesia care unit (p=0.924) or time to discharge (p=0.363). Median time until readiness for discharge was nine minutes in the desflurane group, while the sevoflurane group had a median time of 20 minutes (p=0.020). The overall incidence of coughing during the perioperative period was significantly higher in the desflurane group (p=0.030). Conclusions: We re-confirmed that patients receiving desflurane had a faster emergence and met the criteria to be discharged from the post-anesthesia care unit earlier. No difference was found in time to return to baseline cognition between desflurane and sevoflurane.

  17. Optimizing the performance and structure of the D0 Collie confidence limit evaluator

    Energy Technology Data Exchange (ETDEWEB)

    Fishchler, Mark; /Fermilab

    2010-07-01

    D0 Collie is a program used to perform limit calculations based on ensembles of pseudo-experiments ('PEs'). Since the application of this program to the crucial Higgs mass limit is quite CPU intensive, it has been deemed important to carefully review this program, with an eye toward identifying and implementing potential performance improvements. At the same time, we identify any coding errors or opportunities for potential structural (or algorithm) improvement discovered in the course of gaining sufficient understanding of the workings of Collie to sensibly explore for optimizations. Based on a careful analysis of the program, a series of code changes with potential for improving performance has been identified. The implementation and evaluation of the most important parts of this series has been done, with gratifying speedup results. The bottom line: We have identified and implemented changes leading to a factor of 2.19 speedup in the example program provided, and expected to translate to a factor of roughly 4 speedup in typical realistic usage.

  18. Accelerated pharmacokinetic map determination for dynamic contrast enhanced MRI using frequency-domain based Tofts model.

    Science.gov (United States)

    Vajuvalli, Nithin N; Nayak, Krupa N; Geethanath, Sairam

    2014-01-01

    Dynamic Contrast Enhanced Magnetic Resonance Imaging (DCE-MRI) is widely used in the diagnosis of cancer and is also a promising tool for monitoring tumor response to treatment. The Tofts model has become a standard for the analysis of DCE-MRI. The process of curve fitting employed in the Tofts equation to obtain the pharmacokinetic (PK) parameters is time-consuming for high resolution scans. Current work demonstrates a frequency-domain approach applied to the standard Tofts equation to speed-up the process of curve-fitting in order to obtain the pharmacokinetic parameters. The results obtained show that using the frequency domain approach, the process of curve fitting is computationally more efficient compared to the time-domain approach.
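
    The standard Tofts model is a convolution, Ct(t) = Ktrans ∫ from 0 to t of Cp(τ) exp(-kep (t - τ)) dτ, which is what makes a frequency-domain (FFT) evaluation attractive. The sketch below shows the forward model computed both ways; the arterial input function and parameter values are made up for illustration, and this is not the authors' fitting code.

```python
import numpy as np

def tofts_time_domain(cp, ktrans, kep, dt):
    """Ct(t) = Ktrans * int_0^t Cp(tau) exp(-kep (t - tau)) dtau, direct convolution."""
    t = np.arange(len(cp)) * dt
    kernel = np.exp(-kep * t)
    return ktrans * np.convolve(cp, kernel)[:len(cp)] * dt

def tofts_freq_domain(cp, ktrans, kep, dt):
    """Same convolution evaluated with zero-padded FFTs."""
    n = len(cp)
    kernel = np.exp(-kep * np.arange(n) * dt)
    m = 2 * n                                   # zero-pad to avoid circular wrap-around
    ct = np.fft.irfft(np.fft.rfft(cp, m) * np.fft.rfft(kernel, m), m)[:n]
    return ktrans * ct * dt

# Hypothetical arterial input function and parameters (illustrative only).
dt = 2.0                                        # seconds per sample
t = np.arange(300) * dt
cp = 5.0 * (t / 60.0) * np.exp(-t / 90.0)       # simple gamma-variate-like AIF
c1 = tofts_time_domain(cp, ktrans=0.25 / 60, kep=0.5 / 60, dt=dt)
c2 = tofts_freq_domain(cp, ktrans=0.25 / 60, kep=0.5 / 60, dt=dt)
print(np.allclose(c1, c2))                      # both evaluations agree
```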

  19. Data assimilation using a GPU accelerated path integral Monte Carlo approach

    Science.gov (United States)

    Quinn, John C.; Abarbanel, Henry D. I.

    2011-09-01

    The answers to data assimilation questions can be expressed as path integrals over all possible state and parameter histories. We show how these path integrals can be evaluated numerically using a Markov Chain Monte Carlo method designed to run in parallel on a graphics processing unit (GPU). We demonstrate the application of the method to an example with a transmembrane voltage time series of a simulated neuron as an input, and using a Hodgkin-Huxley neuron model. By taking advantage of GPU computing, we gain a parallel speedup factor of up to about 300, compared to an equivalent serial computation on a CPU, with performance increasing as the length of the observation time used for data assimilation increases.

  20. Real-time high-level video understanding using data warehouse

    Science.gov (United States)

    Lienard, Bruno; Desurmont, Xavier; Barrie, Bertrand; Delaigle, Jean-Francois

    2006-02-01

    High-level video content analysis such as video-surveillance is often limited by computational aspects of automatic image understanding, i.e. it requires huge computing resources for reasoning processes like categorization and a huge amount of data to represent knowledge of objects, scenarios and other models. This article explains how to design and develop a "near real-time adaptive image datamart", used first as a decision-support system for vision algorithms, and then as a mass storage system. Using the RDF specification as the storage format for vision algorithm meta-data, we can optimise the data warehouse concepts for video analysis, add processes able to adapt the current model, and pre-process data to speed up queries. In this way, when new data is sent from a sensor to the data warehouse for long-term storage, using remote procedure calls embedded in object-oriented interfaces to simplify queries, it is processed and the in-memory data model is updated. After some processing, possible interpretations of this data can be returned to the sensor. To demonstrate this new approach, we will present typical scenarios applied to this architecture such as people tracking and event detection in a multi-camera network. Finally we will show how this system becomes a high-semantic data container for external data-mining.

  1. Accelerating Pseudo-Random Number Generator for MCNP on GPU

    Science.gov (United States)

    Gong, Chunye; Liu, Jie; Chi, Lihua; Hu, Qingfeng; Deng, Li; Gong, Zhenghu

    2010-09-01

    Pseudo-random number generators (PRNGs) are intensively used in many stochastic algorithms in particle simulations, artificial neural networks and other scientific computations. The PRNG in the Monte Carlo N-Particle Transport Code (MCNP) requires a long period, high quality, flexible jump-ahead and sufficient speed. In this paper, we implement such a PRNG for MCNP on NVIDIA's GTX200 Graphics Processing Units (GPU) using the CUDA programming model. Results show that speedups of 3.80 to 8.10 times are achieved compared with 4- to 6-core CPUs, and more than 679.18 million double-precision random numbers can be generated per second on the GPU.
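
    The "flexible jump-ahead" requirement means that each particle history can be given its own reproducible substream by skipping the generator ahead by a known count in O(log k) steps. MCNP's generator is a linear congruential generator; the sketch below shows generic LCG jump-ahead only, not the paper's GPU kernel, and the example constants are textbook (MMIX) values rather than MCNP's actual parameters.

        def lcg_jump(seed, k, g, c, m):
            # Jump the generator x -> (g*x + c) mod m ahead by k steps in
            # O(log k) time by binary composition of the affine map.
            gk, ck = 1, 0                                  # accumulated multiplier / increment
            while k:
                if k & 1:
                    gk, ck = (g * gk) % m, (g * ck + c) % m
                g, c = (g * g) % m, (c * (g + 1)) % m      # square the map
                k >>= 1
            return (gk * seed + ck) % m

        # With g = 6364136223846793005, c = 1442695040888963407, m = 2**64,
        # lcg_jump(seed, 10**6, g, c, m) equals stepping that LCG a million times.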

  2. Indexing Motion Detection Data for Surveillance Video

    DEFF Research Database (Denmark)

    Vind, Søren Juhl; Bille, Philip; Gørtz, Inge Li

    2014-01-01

    We show how to compactly index video data to support fast motion detection queries. A query specifies a time interval T, an area A in the video and two thresholds v and p. The answer to a query is a list of timestamps in T where ≥ p% of A has changed by ≥ v values. Our results show that by building a small index, we can support queries with a speedup of two to three orders of magnitude compared to motion detection without an index. For high resolution video, the index size is about 20% of the compressed video size.

  3. MOSRA-Light; high speed three-dimensional nodal diffusion code for vector computers

    Energy Technology Data Exchange (ETDEWEB)

    Okumura, Keisuke [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment

    1998-10-01

    MOSRA-Light is a three-dimensional neutron diffusion calculation code for X-Y-Z geometry. It is based on the 4th-order polynomial nodal expansion method (NEM). As the 4th-order NEM is not sensitive to mesh size, accurate calculation is possible with coarse meshes of about 20 cm. The drastic decrease in the number of unknowns in a 3-dimensional problem results in very fast computation. Furthermore, it employs a newly developed computation algorithm, the 'boundary separated checkerboard sweep method', appropriate for vector computers. This method is very efficient because the speedup factor from vectorization increases as the scale of the problem becomes larger. The speed-up factor compared to the scalar calculation is from 20 to 40 in the case of a PWR core calculation. Considering both the effect of vectorization and that of the coarse-mesh method, the total speedup factor is more than 1000 compared with a conventional scalar code using the finite difference method. MOSRA-Light is available on most vector or scalar computers running UNIX or similar operating systems (e.g. free systems such as Linux). Users can easily install it with the help of the conversational installer. This report contains the general theory of NEM, the fast computation algorithm, benchmark calculation results and detailed information for usage of this code, including input data instructions and sample input data. (author)

  4. MOSRA-Light; high speed three-dimensional nodal diffusion code for vector computers

    International Nuclear Information System (INIS)

    Okumura, Keisuke

    1998-10-01

    MOSRA-Light is a three-dimensional neutron diffusion calculation code for X-Y-Z geometry. It is based on the 4th-order polynomial nodal expansion method (NEM). As the 4th-order NEM is not sensitive to mesh size, accurate calculation is possible with coarse meshes of about 20 cm. The drastic decrease in the number of unknowns in a 3-dimensional problem results in very fast computation. Furthermore, it employs a newly developed computation algorithm, the 'boundary separated checkerboard sweep method', appropriate for vector computers. This method is very efficient because the speedup factor from vectorization increases as the scale of the problem becomes larger. The speed-up factor compared to the scalar calculation is from 20 to 40 in the case of a PWR core calculation. Considering both the effect of vectorization and that of the coarse-mesh method, the total speedup factor is more than 1000 compared with a conventional scalar code using the finite difference method. MOSRA-Light is available on most vector or scalar computers running UNIX or similar operating systems (e.g. free systems such as Linux). Users can easily install it with the help of the conversational installer. This report contains the general theory of NEM, the fast computation algorithm, benchmark calculation results and detailed information for usage of this code, including input data instructions and sample input data. (author)

  5. Limits on the Efficiency of Event-Based Algorithms for Monte Carlo Neutron Transport

    Energy Technology Data Exchange (ETDEWEB)

    Romano, Paul K.; Siegel, Andrew R.

    2017-04-16

    The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC were then used in conjunction with the models to calculate the speedup due to vectorization as a function of two parameters: the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than the vector size in order to achieve a vector efficiency greater than 90%. When the execution times for events are allowed to vary, however, the vector speedup is also limited by differences in execution time for events being carried out in a single event-iteration. For some problems, this implies that vector efficiencies over 50% may not be attainable. While there are many factors impacting the performance of an event-based algorithm that are not captured by our model, it nevertheless provides insights into factors that may be limiting in a real implementation.
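
    As a toy illustration of the constant-event-time case (this is not the model from the paper, just a sketch under simplified assumptions: every particle terminates with a fixed probability after each event), one can estimate how partially filled vectors waste lanes once the particle bank drains:

        import random

        def vector_efficiency(bank_size, vector_width, survival_prob=0.9, trials=50):
            # Toy model: every particle terminates with probability
            # (1 - survival_prob) after each event; each iteration processes the
            # surviving bank in full vectors, so partial vectors waste lanes.
            ratios = []
            for _ in range(trials):
                active, useful, lanes = bank_size, 0, 0
                while active:
                    useful += active
                    lanes += -(-active // vector_width) * vector_width   # round up to full vectors
                    active = sum(random.random() < survival_prob for _ in range(active))
                ratios.append(useful / lanes)
            return sum(ratios) / trials

        # vector_efficiency(160, 8) stays close to 1, while vector_efficiency(16, 8)
        # wastes many lanes once fewer particles remain than the vector width.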

  6. Contextualizing Teacher Autonomy in Time and Space: A Model for Comparing Various Forms of Governing the Teaching Profession

    Science.gov (United States)

    Wermke, Wieland; Höstfält, Gabriella

    2014-01-01

    This study aims to develop a model for comparing different forms of teacher autonomy in various national contexts and at different times. Understanding and explaining local differences and global similarities in the teaching profession in a globalized world require conceptions that contribute to further theorization of comparative and…

  7. Comparative effects of ionizing radiation on cycle time and mitotic duration. A time-lapse cinematography study

    International Nuclear Information System (INIS)

    D'Hooghe, M.C.; Hemon, D.; Valleron, A.J.; Malaise, E.P.

    1980-01-01

    The effects of 60Co γ rays on the length of the intermitotic period, the duration of mitosis, and the division probability of EMT6 cells have been studied in vitro using time-lapse cinematography. Irradiation increases the duration of the mitosis and of the cycle in comparable proportions: both parameters are practically doubled by a dose of 10 Gy. When daughters of irradiated cells die, the mitotic delay and lengthening of mitosis of their mother cells are longer than average. Mitotic delay and lengthening of mitosis depend on the age of cells at the moment of irradiation. The mitotic delay increases progressively when cells are irradiated during the first 8 h of their cycle (i.e., before the transition point), whereas mitosis is slightly prolonged. On the other hand, when the cells are irradiated after this transition point the mitotic delay decreases markedly, whereas the lengthening of mitosis increases sharply. These results tend to indicate that two different mechanisms are responsible for mitotic delay and prolongation of mitosis observed after irradiation.

  8. Comparative effects of ionizing radiation on cycle time and mitotic duration. A time-lapse cinematography study

    Energy Technology Data Exchange (ETDEWEB)

    D' Hooghe, M.C. (Institut de Recherches sur le Cancer, Lille, France); Hemon, D.; Valleron, A.J.; Malaise, E.P.

    1980-03-01

    The effects of 60Co γ rays on the length of the intermitotic period, the duration of mitosis, and the division probability of EMT6 cells have been studied in vitro using time-lapse cinematography. Irradiation increases the duration of the mitosis and of the cycle in comparable proportions: both parameters are practically doubled by a dose of 10 Gy. When daughters of irradiated cells die, the mitotic delay and lengthening of mitosis of their mother cells are longer than average. Mitotic delay and lengthening of mitosis depend on the age of cells at the moment of irradiation. The mitotic delay increases progressively when cells are irradiated during the first 8 h of their cycle (i.e., before the transition point), whereas mitosis is slightly prolonged. On the other hand, when the cells are irradiated after this transition point the mitotic delay decreases markedly, whereas the lengthening of mitosis increases sharply. These results tend to indicate that two different mechanisms are responsible for mitotic delay and prolongation of mitosis observed after irradiation.

  9. Vectorization of nuclear codes for atmospheric transport and exposure calculation of radioactive materials

    International Nuclear Information System (INIS)

    Asai, Kiyoshi; Shinozawa, Naohisa; Ishikawa, Hirohiko; Chino, Masamichi; Hayashi, Takashi

    1983-02-01

    Three computer codes, MATHEW and ADPIC of LLNL and GAMPUL of JAERI, for prediction of the wind field, concentration and external exposure rate of airborne radioactive materials are vectorized and the results are presented. Using the continuity equation of incompressible flow as a constraint, MATHEW calculates the three-dimensional wind field by a variational method. Using the particle-in-cell method, ADPIC calculates the advection and diffusion of radioactive materials in a three-dimensional wind field and terrain, and gives the concentration of the materials in each cell of the domain. GAMPUL calculates the external exposure rate assuming a Gaussian-plume-type distribution of concentration. The vectorized code MATHEW attained a 7.8 times speedup on a FACOM230-75 APU vector processor. ADPIC and GAMPUL are estimated to attain speedups of 1.5 and 4 times, respectively, on a CRAY-1 type vector processor. (author)

  10. Modeling laser wakefield accelerators in a Lorentz boosted frame

    Energy Technology Data Exchange (ETDEWEB)

    Vay, J.-L.; Geddes, C.G.R.; Cormier-Michel, E.; Grote, D.P.

    2010-09-15

    Modeling of laser-plasma wakefield accelerators in an optimal frame of reference [1] is shown to produce orders of magnitude speed-up of calculations from first principles. Obtaining these speedups requires mitigation of a high-frequency instability that otherwise limits effectiveness, in addition to solutions for handling data input and output in a relativistically boosted frame of reference. The observed high-frequency instability is mitigated using methods including an electromagnetic solver with tunable coefficients, its extension to accommodate Perfectly Matched Layers and Friedman's damping algorithms, as well as an efficient large bandwidth digital filter. It is shown that choosing the frame of the wake as the frame of reference allows for higher levels of filtering and damping than is possible in other frames for the same accuracy. Detailed testing also revealed serendipitously the existence of a singular time step at which the instability level is minimized, independently of numerical dispersion, thus indicating that the observed instability may not be due primarily to Numerical Cerenkov as has been conjectured. The techniques developed for Cerenkov mitigation prove nonetheless to be very efficient at controlling the instability. Using these techniques, agreement at the percentage level is demonstrated between simulations using different frames of reference, with speedups reaching two orders of magnitude for 0.1 GeV class stages. The method then allows direct and efficient full-scale modeling of deeply depleted laser-plasma stages of 10 GeV-1 TeV for the first time, verifying the scaling of plasma accelerators to very high energies. Over 4, 5 and 6 orders of magnitude speedup is achieved for the modeling of 10 GeV, 100 GeV and 1 TeV class stages, respectively.

  11. Modeling laser wakefield accelerators in a Lorentz boosted frame

    Energy Technology Data Exchange (ETDEWEB)

    Vay, J.-L.; Geddes, C.G.R.; Cormier-Michel, E.; Grotec, D. P.

    2010-06-15

    Modeling of laser-plasma wakefield accelerators in an optimal frame of reference is shown to produce orders of magnitude speed-up of calculations from first principles. Obtaining these speedups requires mitigation of a high-frequency instability that otherwise limits effectiveness, in addition to solutions for handling data input and output in a relativistically boosted frame of reference. The observed high-frequency instability is mitigated using methods including an electromagnetic solver with tunable coefficients, its extension to accommodate Perfectly Matched Layers and Friedman's damping algorithms, as well as an efficient large bandwidth digital filter. It is shown that choosing the frame of the wake as the frame of reference allows for higher levels of filtering and damping than is possible in other frames for the same accuracy. Detailed testing also revealed serendipitously the existence of a singular time step at which the instability level is minimized, independently of numerical dispersion, thus indicating that the observed instability may not be due primarily to Numerical Cerenkov as has been conjectured. The techniques developed for Cerenkov mitigation prove nonetheless to be very efficient at controlling the instability. Using these techniques, agreement at the percentage level is demonstrated between simulations using different frames of reference, with speedups reaching two orders of magnitude for 0.1 GeV class stages. The method then allows direct and efficient full-scale modeling of deeply depleted laser-plasma stages of 10 GeV-1 TeV for the first time, verifying the scaling of plasma accelerators to very high energies. Over 4, 5 and 6 orders of magnitude speedup is achieved for the modeling of 10 GeV, 100 GeV and 1 TeV class stages, respectively.

  12. Modeling laser wakefield accelerators in a Lorentz boosted frame

    International Nuclear Information System (INIS)

    Vay, J.-L.; Geddes, C.G.R.; Cormier-Michel, E.; Grote, D.P.

    2010-01-01

    Modeling of laser-plasma wakefield accelerators in an optimal frame of reference (1) is shown to produce orders of magnitude speed-up of calculations from first principles. Obtaining these speedups requires mitigation of a high-frequency instability that otherwise limits effectiveness, in addition to solutions for handling data input and output in a relativistically boosted frame of reference. The observed high-frequency instability is mitigated using methods including an electromagnetic solver with tunable coefficients, its extension to accommodate Perfectly Matched Layers and Friedman's damping algorithms, as well as an efficient large bandwidth digital filter. It is shown that choosing the frame of the wake as the frame of reference allows for higher levels of filtering and damping than is possible in other frames for the same accuracy. Detailed testing also revealed serendipitously the existence of a singular time step at which the instability level is minimized, independently of numerical dispersion, thus indicating that the observed instability may not be due primarily to Numerical Cerenkov as has been conjectured. The techniques developed for Cerenkov mitigation prove nonetheless to be very efficient at controlling the instability. Using these techniques, agreement at the percentage level is demonstrated between simulations using different frames of reference, with speedups reaching two orders of magnitude for 0.1 GeV class stages. The method then allows direct and efficient full-scale modeling of deeply depleted laser-plasma stages of 10 GeV-1 TeV for the first time, verifying the scaling of plasma accelerators to very high energies. Over 4, 5 and 6 orders of magnitude speedup is achieved for the modeling of 10 GeV, 100 GeV and 1 TeV class stages, respectively.

  13. Comparing emerging and mature markets during times of crises: A non-extensive statistical approach

    Science.gov (United States)

    Namaki, A.; Koohi Lai, Z.; Jafari, G. R.; Raei, R.; Tehrani, R.

    2013-07-01

    One of the important issues in finance and economics for both scholars and practitioners is to describe the behavior of markets, especially during times of crises. In this paper, we analyze the behavior of some mature and emerging markets within a Tsallis entropy framework, a non-extensive statistical approach based on non-linear dynamics. During the past decade, this technique has been successfully applied to a considerable number of complex systems, such as stock markets, in order to describe their non-Gaussian behavior. In this approach, there is a parameter q, a measure of deviation from Gaussianity, that has proved to be a good index for detecting crises. We investigate the behavior of this parameter at different time scales for the market indices. The pattern of q for mature markets differs from that for emerging markets. The findings show that the approach robustly follows market conditions over time: in times of crises, q is much greater than at other times. In addition, the response of emerging markets to global events is delayed compared to that of mature markets, and tends to a Gaussian profile on increasing the scale. This approach could be very useful in application to risk and portfolio management in order to detect crises by following the parameter q at different time scales.
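
    For reference, the q parameter comes from the Tsallis (non-extensive) entropy; the definition below is standard textbook background rather than a formula quoted from the paper, and it recovers the Boltzmann-Gibbs form in the limit q -> 1.

        S_q = \frac{1 - \sum_i p_i^{\,q}}{q - 1},
        \qquad
        \lim_{q \to 1} S_q = -\sum_i p_i \ln p_i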

  14. Parallel local search for solving Constraint Problems on the Cell Broadband Engine (Preliminary Results)

    Directory of Open Access Journals (Sweden)

    Salvator Abreu

    2009-10-01

    We explore the use of the Cell Broadband Engine (Cell/BE for short) for combinatorial optimization applications: we present a parallel version of a constraint-based local search algorithm that has been implemented on a multiprocessor BladeCenter machine with twin Cell/BE processors (a total of 16 SPUs per blade). This algorithm was chosen because it fits the Cell/BE architecture very well and requires neither shared memory nor communication between processors, while retaining a compact memory footprint. We study the performance on several large optimization benchmarks and show that it achieves mostly linear time speedups, sometimes even super-linear. This is possible because the parallel implementation may explore different parts of the search space simultaneously and therefore converge faster towards the best sub-space and thus towards a solution. Besides the speedups, the resulting times exhibit a much smaller variance, which benefits applications where a timely reply is critical.

  15. Galois Field Instructions in the Sandblaster 2.0 Architecture

    Directory of Open Access Journals (Sweden)

    Mayan Moudgill

    2009-01-01

    SIMD variants of the poly-multiply and poly-remainder instructions. We use a Reed-Solomon encoder and decoder to demonstrate the performance of our approach. Our new approach achieves a speedup of 11.5x, compared with 8x for the standard SIMD processor.

  16. Restless Tuneup of High-Fidelity Qubit Gates

    Science.gov (United States)

    Rol, M. A.; Bultink, C. C.; O'Brien, T. E.; de Jong, S. R.; Theis, L. S.; Fu, X.; Luthi, F.; Vermeulen, R. F. L.; de Sterke, J. C.; Bruno, A.; Deurloo, D.; Schouten, R. N.; Wilhelm, F. K.; DiCarlo, L.

    2017-04-01

    We present a tuneup protocol for qubit gates with tenfold speedup over traditional methods reliant on qubit initialization by energy relaxation. This speedup is achieved by constructing a cost function for Nelder-Mead optimization from real-time correlation of nondemolition measurements interleaving gate operations without pause. Applying the protocol on a transmon qubit achieves 0.999 average Clifford fidelity in one minute, as independently verified using randomized benchmarking and gate-set tomography. The adjustable sensitivity of the cost function allows the detection of fractional changes in the gate error with a nearly constant signal-to-noise ratio. The restless concept demonstrated can be readily extended to the tuneup of two-qubit gates and measurement operations.

  17. Experiences and results multitasking a hydrodynamics code on global and local memory machines

    International Nuclear Information System (INIS)

    Mandell, D.

    1987-01-01

    A one-dimensional, time-dependent Lagrangian hydrodynamics code using a Godunov solution method has been multitasked for the Cray X-MP/48, the Intel iPSC hypercube, the Alliant FX series and the IBM RP3 computers. Actual multitasking results have been obtained for the Cray, Intel and Alliant computers, and simulated results were obtained for the Cray and RP3 machines. The differences in the methods required to multitask on each of the machines are discussed. Results are presented for a sample problem involving a shock wave moving down a channel. Comparisons are made between the theoretical speedups predicted by Amdahl's law and the actual speedups obtained. The problems of debugging on the different machines are also described.
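
    For reference, the theoretical speedups mentioned come from Amdahl's law; the helper below is a generic statement of that law (standard background, not code from the report).

        def amdahl_speedup(p, n):
            # Amdahl's law: ideal speedup on n processors when a fraction p of
            # the runtime parallelizes perfectly and the rest stays serial.
            return 1.0 / ((1.0 - p) + p / n)

        # e.g. amdahl_speedup(0.95, 4) is about 3.48: even a 5% serial fraction
        # keeps a 4-processor run well below the ideal factor of 4.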

  18. Higher dimensional time-energy entanglement

    International Nuclear Information System (INIS)

    Richart, Daniel Lampert

    2014-01-01

    freedom improves its applicability to long distance quantum communication schemes. By doing that, the intrinsic limitations of other schemes based on the encoding into the momentum and polarization degree of freedom are overcome. This work presents results on a scalable experimental implementation of time-energy encoded higher dimensional states, demonstrating the feasibility of the scheme. Further tools are defined and used to characterize the properties of the prepared quantum states, such as their entanglement, their dimension and their preparation fidelity. Finally, the method of quantum state tomography is used to fully determine the underlying quantum states at the cost of an increased measurement effort and thus operation time. It is at this point that results obtained from the research field of compressed sensing help to decrease the necessary number of measurements. This scheme is compared with an adaptive tomography scheme designed to offer an additional reconstruction speedup. These results display the scalability of the scheme to bipartite dimensions higher than 2 x 8, equivalent to the encoding of quantum information into more than 6 qubits.

  19. Higher dimensional time-energy entanglement

    Energy Technology Data Exchange (ETDEWEB)

    Richart, Daniel Lampert

    2014-07-08

    freedom improves its applicability to long distance quantum communication schemes. By doing that, the intrinsic limitations of other schemes based on the encoding into the momentum and polarization degree of freedom are overcome. This work presents results on a scalable experimental implementation of time-energy encoded higher dimensional states, demonstrating the feasibility of the scheme. Further tools are defined and used to characterize the properties of the prepared quantum states, such as their entanglement, their dimension and their preparation fidelity. Finally, the method of quantum state tomography is used to fully determine the underlying quantum states at the cost of an increased measurement effort and thus operation time. It is at this point that results obtained from the research field of compressed sensing help to decrease the necessary number of measurements. This scheme is compared with an adaptive tomography scheme designed to offer an additional reconstruction speedup. These results display the scalability of the scheme to bipartite dimensions higher than 2 x 8, equivalent to the encoding of quantum information into more than 6 qubits.

  20. Comparative evaluation of nickel discharge from brackets in artificial saliva at different time intervals.

    Science.gov (United States)

    Jithesh, C; Venkataramana, V; Penumatsa, Narendravarma; Reddy, S N; Poornima, K Y; Rajasigamani, K

    2015-08-01

    To determine and compare nickel release from three different orthodontic brackets in artificial saliva at different pH values and different time intervals. Twenty-seven samples of three different orthodontic brackets were selected and grouped as 1, 2, and 3. Each group was divided into three subgroups depending on the type of orthodontic bracket, salivary pH and the time interval. The nickel release from each subgroup was analyzed using an inductively coupled plasma atomic emission spectrophotometer (Perkin Elmer, Optima 2100 DV, USA). Quantitative analysis of nickel was performed three times, and the mean value was used as the result. ANOVA (F-test) was used to test for significant differences among the groups at the 0.05 level of significance (P < 0.05). Nickel release was highest at pH 4.2 at all time intervals except 120 h. The study results show that the nickel release from the recycled stainless steel brackets is highest. Metal-slot ceramic brackets release significantly less nickel. So, recycled stainless steel brackets should not be used for nickel-allergic patients. Metal-slot ceramic brackets are advisable.

  1. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

    Science.gov (United States)

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

    2004-09-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 x 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 x 10^8 histories. For a smaller number of histories (1 x 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.

  2. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications

    International Nuclear Information System (INIS)

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J.

    2004-01-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1x10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8x10^8 histories. For a smaller number of histories (1x10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1x10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy

  3. Analysis code of three dimensional core dynamics for high temperature gas-cooled reactors, COMIC-2

    International Nuclear Information System (INIS)

    Takano, Makoto

    1987-04-01

    The code has been improved and modified in order to speed up calculation and to make it more effective since its development in 1985. This report is written as a user's manual for the latest version of the code (COMIC-2). Speedup of the code is achieved by improving the program flow and by vector programming. The total speedup factor depends on the problem; it is about 10 in the case of a sample problem. (author)

  4. Performance comparison analysis library communication cluster system using merge sort

    Science.gov (United States)

    Wulandari, D. A. R.; Ramadhan, M. E.

    2018-04-01

    Computing began with single processors; to increase computing speed, the use of multiple processors was introduced. This second paradigm is known as parallel computing, an example of which is the cluster. A cluster must have a communication protocol for processing, one of which is the Message Passing Interface (MPI). MPI has several implementations, among them OpenMPI and MPICH2. The performance of a cluster machine depends on how well the performance characteristics of the communication library suit the characteristics of the problem, so this study aims to analyze the comparative performance of these libraries in handling a parallel computing process. The cases studied in this research are MPICH2 and OpenMPI, executing a sorting problem (merge sort) to evaluate the performance of the cluster system. The research method implements OpenMPI and MPICH2 on a Linux-based cluster of five virtual computers and then analyzes the performance of the system under different test scenarios using three parameters: execution time, speedup and efficiency. The results of this study show that as the data size increases, OpenMPI and MPICH2 tend to show increasing average speedup and efficiency, which then decrease at large data sizes; an increased data size does not necessarily increase speedup and efficiency, only execution time, for example at a data size of 100000. The execution times of the two libraries also differ; for example, at a data size of 1000 the average execution time with MPICH2 is 0.009721 and with OpenMPI 0.003895. OpenMPI can customize communication to the needs of the application.
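
    The scatter/sort/gather pattern behind such a cluster merge-sort benchmark can be sketched with mpi4py as below. This is a hedged illustration of the measurement setup, not the code used in the study; the script name, data size and the use of mpi4py are assumptions.

        # Run with e.g.:  mpiexec -n 4 python merge_sort_bench.py
        from mpi4py import MPI
        import heapq, random, time

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        if rank == 0:
            data = [random.random() for _ in range(100_000)]     # assumed problem size
            chunks = [data[i::size] for i in range(size)]        # one chunk per rank
            t0 = time.perf_counter()
        else:
            chunks = None

        local = comm.scatter(chunks, root=0)    # distribute the chunks
        local.sort()                            # local sort runs in parallel
        parts = comm.gather(local, root=0)      # collect the sorted runs

        if rank == 0:
            merged = list(heapq.merge(*parts))  # final k-way merge on the root
            print(f"{size} ranks: {time.perf_counter() - t0:.4f} s")

    Timing the same script under both MPI implementations and several rank counts gives the execution time, speedup and efficiency figures the abstract refers to.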

  5. A prospective observational study comparing a physiological scoring system with time-based discharge criteria in pediatric ambulatory surgical patients.

    Science.gov (United States)

    Armstrong, James; Forrest, Helen; Crawford, Mark W

    2015-10-01

    Discharge criteria based on physiological scoring systems can be used in the postanesthesia care unit (PACU) to fast-track patients after ambulatory surgery; however, studies comparing physiological scoring systems with traditional time-based discharge criteria are lacking. The purpose of this study was to compare PACU discharge readiness times using physiological vs time-based discharge criteria in pediatric ambulatory surgical patients. We recorded physiological observations from consecutive American Society of Anesthesiologists physical status I-III patients aged 1-18 yr who were admitted to the PACU after undergoing ambulatory surgery in a tertiary academic pediatric hospital. The physiological score was a combination of the Aldrete and Chung systems. Scores were recorded every 15 min starting upon arrival in the PACU. Patients were considered fit for discharge once they attained a score ≥12 (maximum score, 14), provided no score was zero, with the time to achieve a score ≥12 defining the criteria-based discharge (CBD) time. Patients were discharged from the PACU when both the CBD and the existing time-based discharge (TBD) criteria were met. The CBD and TBD data were compared using Kaplan-Meier and log-rank analysis. Observations from 506 children are presented. Median (interquartile range [IQR]) age was 5.5 [2.8-9.9] yr. Median [IQR] CBD and TBD PACU discharge readiness times were 30 [15-45] min and 60 [45-60] min, respectively. Analysis of Kaplan-Meier curves indicated a significant difference in discharge times using the different criteria (hazard ratio, 5.43; 95% confidence interval, 4.51 to 6.53; P < 0.001). All patients were discharged home without incident. This prospective study suggests that discharge decisions based on physiological criteria have the potential for significantly speeding the transit of children through the PACU, thereby enhancing PACU efficiency and resource utilization.

  6. Massively parallel mathematical sieves

    Energy Technology Data Exchange (ETDEWEB)

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
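
    The decomposition idea is that each processing element independently sieves its own block of the range using only the primes up to sqrt(N), so almost no communication is needed. The sketch below shows one such block sieve in serial form; it is illustrative only, not the hypercube implementation described in the abstract.

        def base_primes(limit):
            # Ordinary sieve up to sqrt(N); computed once and shared by all workers.
            flags = [True] * (limit + 1)
            primes = []
            for p in range(2, limit + 1):
                if flags[p]:
                    primes.append(p)
                    for multiple in range(p * p, limit + 1, p):
                        flags[multiple] = False
            return primes

        def sieve_segment(lo, hi, small_primes):
            # Sieve the block [lo, hi) owned by one processor.
            flags = [True] * (hi - lo)
            for p in small_primes:
                start = max(p * p, ((lo + p - 1) // p) * p)   # first multiple of p in the block
                for multiple in range(start, hi, p):
                    flags[multiple - lo] = False
            return [lo + i for i, keep in enumerate(flags) if keep and lo + i > 1]

        # For N = 10**6 split across P workers, worker k computes
        # sieve_segment(k * N // P, (k + 1) * N // P, base_primes(1000)).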

  7. Hypercube Expert System Shell - Applying Production Parallelism.

    Science.gov (United States)

    1989-12-01

    possible processor organizations, or interconnection methods, for parallel architectures. The following are examples of commonly used interconnection ... this timing analysis because match speed-up available from production parallelism is proportional to the average number of affected productions.

  8. The impact of closed-loop electronic medication management on time to first dose: a comparative study between paper and digital hospital environments.

    Science.gov (United States)

    Austin, Jodie A; Smith, Ian R; Tariq, Amina

    2018-01-22

    Closed-loop electronic medication management systems (EMMS) are recognised as an effective intervention to improve medication safety, yet evidence of their effectiveness in hospitals is limited. Few studies have compared medication turnaround time for a closed-loop electronic versus a paper-based medication management environment. To compare medication turnaround times in a paper-based hospital environment with those in a digital hospital equipped with a closed-loop EMMS, consisting of computerised physician order entry, profiled automated dispensing cabinets packaged with unit-dose medications and barcode medication administration. Data were collected during 2 weeks at three private hospital sites (one with a closed-loop EMMS) within the same organisation network in Queensland, Australia. Time between scheduled and actual administration times was analysed for the first dose of time-critical and non-critical medications located on the ward or sourced via pharmacy. Medication turnaround times at the EMMS site were less than at the paper-based sites (median, IQR: 35 min, 8-57 min versus 120 min, 30-180 min; P < 0.001). For time-critical medications, 77% were administered within 60 min of the scheduled time at the EMMS site versus 38% at the paper-based sites. A similar difference was observed for non-critical medications: 80% were administered within 60 min of their scheduled time at the EMMS site versus 41% at the paper-based facilities. The study indicates that medication turnaround times with a closed-loop EMMS are less than with paper-based systems. This improvement may be attributable to increased accessibility of medications using automated dispensing cabinets and to electronic medication administration records flagging tasks to nurses in real time. © 2018 Royal Pharmaceutical Society.

  9. High performance GPU processing for inversion using uniform grid searches

    Science.gov (United States)

    Venetis, Ioannis E.; Saltogianni, Vasso; Stiros, Stathis; Gallopoulos, Efstratios

    2017-04-01

    Many geophysical problems are described by systems of redundant, highly non-linear ordinary equations with constant terms deriving from measurements and hence representing stochastic variables. Solution (inversion) of such problems is based on numerical optimization methods, either Monte Carlo sampling or exhaustive searches in cases of two or even three "free" unknown variables. Recently the TOPological INVersion (TOPINV) algorithm, a grid-search-based technique in the R^n space, has been proposed. TOPINV is not based on the minimization of a certain cost function and involves only forward computations, hence avoiding computational errors. The basic concept is to transform observation equations into inequalities on the basis of an optimization parameter k and of their standard errors, and through repeated "scans" of n-dimensional search grids for decreasing values of k to identify the optimal clusters of gridpoints which satisfy the observation inequalities and by definition contain the "true" solution. Stochastic optimal solutions and their variance-covariance matrices are then computed as first and second statistical moments. Such exhaustive uniform searches produce an excessive computational load and are extremely time-consuming on common CPU-based computers. An alternative is to use a computing platform based on a GPU, which is nowadays affordable to the research community and provides much higher computing performance. Using the CUDA programming language to implement TOPINV allows investigation of the attained speedup in execution time on such a high-performance platform. Based on synthetic data we compared the execution time required for two typical geophysical problems, modeling magma sources and seismic faults, described with up to 18 unknown variables, on both CPU/FORTRAN and GPU/CUDA platforms. The same problems for several different sizes of search grids (up to 10^12 gridpoints) and numbers of unknown variables were solved on
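
    A minimal sketch of the grid-scan idea described above (names, data structures and the stopping rule are assumptions, not the published TOPINV or CUDA code): keep the gridpoints that satisfy every observation inequality |f_i(x)| <= k*sigma_i for progressively smaller k, then take the surviving cluster's first and second moments.

        import itertools
        import numpy as np

        def topinv_scan(residuals, sigmas, grids, k_values):
            # residuals: callables f_i(x) giving model-minus-observation;
            # sigmas: their standard errors; grids: one 1-D array of trial
            # values per unknown; k_values: sequence of the parameter k.
            points = np.array(list(itertools.product(*grids)))   # uniform grid in R^n
            cluster = points
            for k in sorted(k_values, reverse=True):
                ok = np.ones(len(cluster), dtype=bool)
                for f, s in zip(residuals, sigmas):
                    ok &= np.abs(np.array([f(p) for p in cluster])) <= k * s
                if not ok.any():
                    break                                         # stop before emptying the cluster
                cluster = cluster[ok]
            # first and second statistical moments of the surviving cluster
            return cluster.mean(axis=0), np.cov(cluster, rowvar=False)

    On a GPU, the inner inequality checks over all gridpoints are the part that parallelizes trivially, which is where the reported speedups come from.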

  10. QuickVina: accelerating AutoDock Vina using gradient-based heuristics for global optimization.

    Science.gov (United States)

    Handoko, Stephanus Daniel; Ouyang, Xuchang; Su, Chinh Tran To; Kwoh, Chee Keong; Ong, Yew Soon

    2012-01-01

    Predicting binding between a macromolecule and a small molecule is a crucial phase in the field of rational drug design. AutoDock Vina, one of the most widely used docking programs, released in 2009, uses an empirical scoring function to evaluate the binding affinity between the molecules and employs the iterated local search global optimizer for global optimization, achieving significantly improved speed and better accuracy of binding mode prediction compared with its predecessor, AutoDock 4. In this paper, we propose a further improvement in the local search algorithm of Vina by heuristically preventing some intermediate points from undergoing local search. Our improved version of Vina, dubbed QVina, achieved a maximum acceleration of about 25 times, with an average speed-up of 8.34 times compared to the original Vina when tested on a set of 231 protein-ligand complexes, while keeping the optimal scores mostly identical.

  11. Lessons from Elsewhere?: Comparative Music Education in Times of Globalization

    Science.gov (United States)

    Kertz-Welzel, Alexandra

    2015-01-01

    In recent years, comparative education and comparative music education became important fields of research. Due to globalization, but also to international student assessments, it is most common to compare the outcomes of entire school systems or specific subject areas. The main goal is to identify the most successful systems and their best…

  12. Effect of saturating ferrite on the field in a prototype kicker magnet

    International Nuclear Information System (INIS)

    Barnes, M.J.; Wait, G.D.

    1994-06-01

    The field rise for kicker magnets is often specified between 1% and 99% of full strength. Three-gap thyratrons are frequently used as switches for kicker magnet systems. These thyratrons turn on in three stages: the collapse of voltage across one gap causes a displacement current to flow in the parasitic capacitance of off-state gap(s). The displacement current flows in the external circuit and can thus increase the effective rise-time of the field in the kicker magnet. One promising method of decreasing the effect of the displacement current involves the use of saturating ferrites. Another method for achieving the specified rise-time and 'flatness' for the kick strength is to utilize speed-up networks in the electrical circuit. Measurements have been carried out on a prototype kicker magnet with a speed-up network and various geometries of saturating ferrite. Measurements and PSpice calculations are presented. (author)

  13. Resonator reset in circuit QED by optimal control for large open quantum systems

    Science.gov (United States)

    Boutin, Samuel; Andersen, Christian Kraglund; Venkatraman, Jayameenakshi; Ferris, Andrew J.; Blais, Alexandre

    2017-10-01

    We study an implementation of the open GRAPE (gradient ascent pulse engineering) algorithm well suited for large open quantum systems. While typical implementations of optimal control algorithms for open quantum systems rely on explicit matrix exponential calculations, our implementation avoids these operations, leading to a polynomial speedup of the open GRAPE algorithm in cases of interest. This speedup, as well as the reduced memory requirements of our implementation, is illustrated by comparison to a standard implementation of open GRAPE. As a practical example, we apply this open-system optimization method to active reset of a readout resonator in circuit QED. In this problem, the shape of a microwave pulse is optimized so as to empty the cavity of measurement photons as fast as possible. Using our open GRAPE implementation, we obtain pulse shapes leading to a reset over 4 times faster than passive reset.

  14. A comparative analysis of short-range travel time prediction methods

    NARCIS (Netherlands)

    Huisken, Giovanni; van Berkum, Eric C.

    2003-01-01

    Increasing car mobility has led to an increasing demand for traffic information. This contribution deals with information about travel times. When car drivers are provided with this type of information, the travel times should ideally be the times that they will encounter. As a result travel times

  15. The Mechanisms of Space-Time Association: Comparing Motor and Perceptual Contributions in Time Reproduction

    Science.gov (United States)

    Fabbri, Marco; Cellini, Nicola; Martoni, Monica; Tonetti, Lorenzo; Natale, Vincenzo

    2013-01-01

    The spatial-temporal association indicates that time is represented spatially along a left-to-right line. It is unclear whether the spatial-temporal association is mainly related to a perceptual or a motor component. In addition, the spatial-temporal association is not consistently found using a time reproduction task. Our rationale for this…

  16. The effects of Red Bull energy drink compared with caffeine on cycling time-trial performance.

    Science.gov (United States)

    Quinlivan, Alannah; Irwin, Christopher; Grant, Gary D; Anoopkumar-Dukie, Sheilandra; Skinner, Tina; Leveritt, Michael; Desbrow, Ben

    2015-10-01

    This study investigated the ergogenic effects of a commercial energy drink (Red Bull) or an equivalent dose of anhydrous caffeine in comparison with a noncaffeinated control beverage on cycling performance. Eleven trained male cyclists (31.7 ± 5.9 y, 82.3 ± 6.1 kg, VO2max = 60.3 ± 7.8 mL·kg^-1·min^-1) participated in a double-blind, placebo-controlled, crossover-design study involving 3 experimental conditions. Participants were randomly administered Red Bull (9.4 mL/kg body mass [BM] containing 3 mg/kg BM caffeine), anhydrous caffeine (3 mg/kg BM given in capsule form), or a placebo 90 min before commencing a time trial equivalent to 1 h cycling at 75% peak power output. Carbohydrate and fluid volumes were matched across all trials. Performance improved by 109 ± 153 s (2.8%, P = .039) after Red Bull compared with placebo and by 120 ± 172 s (3.1%, P = .043) after caffeine compared with placebo. No significant difference (P > .05) in performance time was detected between Red Bull and caffeine treatments. There was no significant difference (P > .05) in mean heart rate or rating of perceived exertion among the 3 treatments. This study demonstrated that a moderate dose of caffeine consumed as either Red Bull or in anhydrous form enhanced cycling time-trial performance. The ergogenic benefits of Red Bull energy drink are therefore most likely due to the effects of caffeine, with the other ingredients not likely to offer additional benefit.

  17. Parallel optimization of IDW interpolation algorithm on multicore platform

    Science.gov (United States)

    Guan, Xuefeng; Wu, Huayi

    2009-10-01

    Due to increasing power consumption, heat dissipation, and other physical issues, the architecture of the central processing unit (CPU) has been turning rapidly to multicore in recent years. A multicore processor packages multiple processor cores in the same chip, which not only offers increased performance but also presents significant challenges to application developers. As a matter of fact, most current GIS algorithms are implemented serially and cannot fully exploit the parallelism potential of such multicore platforms. In this paper, we choose the Inverse Distance Weighted (IDW) spatial interpolation algorithm as an example to study how to optimize serial GIS algorithms on a multicore platform in order to maximize the performance speedup. With the help of OpenMP, a threading methodology is introduced to split and share the whole interpolation work among processor cores. After parallel optimization, the execution time of the interpolation algorithm is greatly reduced and good performance speedup is achieved. For example, the performance speedup on an Intel Xeon 5310 is 1.943 with 2 execution threads and 3.695 with 4 execution threads, respectively. An additional output comparison between pre-optimization and post-optimization shows that parallel optimization does not affect the final interpolation result.
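
    The parallelization pattern is simply to split the query points among workers, since each interpolated value is independent. The sketch below mirrors that OpenMP-style loop partitioning in Python, with worker processes standing in for threads; it is an illustration, not the code evaluated in the paper.

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def idw_chunk(queries, samples, values, power=2.0):
            # Inverse-distance-weighted estimate for one chunk of query points.
            d = np.linalg.norm(queries[:, None, :] - samples[None, :, :], axis=2)
            w = 1.0 / np.maximum(d, 1e-12) ** power     # guard against zero distance
            return (w @ values) / w.sum(axis=1)

        def idw_parallel(queries, samples, values, workers=4):
            # Split the query grid among workers, one chunk per worker.
            chunks = np.array_split(queries, workers)
            with ProcessPoolExecutor(workers) as pool:
                parts = pool.map(idw_chunk, chunks,
                                 [samples] * workers, [values] * workers)
            return np.concatenate(list(parts))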

  18. Reliable RANSAC Using a Novel Preprocessing Model

    Directory of Open Access Journals (Sweden)

    Xiaoyan Wang

    2013-01-01

    Geometric assumption and verification with RANSAC has become a crucial step in establishing correspondences between local features, owing to its wide application in biomedical feature analysis and vision computing. However, conventional RANSAC is very time-consuming because of redundant sampling, especially when dealing with numerous matching pairs. This paper presents a novel preprocessing model that extracts a reduced set of reliable correspondences from the initial matching dataset. Both geometric model generation and verification are carried out on this reduced set, which leads to considerable speedups. This paper then proposes a reliable RANSAC framework using the preprocessing model, which was implemented and verified using Harris and SIFT features, respectively. Compared with traditional RANSAC, experimental results show that our method is more efficient.
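
    For context, the RANSAC loop being accelerated looks roughly like the sketch below (here for a 2-D line model). The prefilter argument stands in for the paper's preprocessing model, whose internals are application specific and not shown; all names are illustrative.

        import random
        import numpy as np

        def ransac_line(points, n_iters=500, tol=1.0, prefilter=None):
            # `prefilter` returns a reduced, more reliable set to sample from,
            # which is where the reported speedups come from.
            candidates = prefilter(points) if prefilter is not None else points
            best_model, best_inliers = None, np.empty((0, 2))
            for _ in range(n_iters):
                (x1, y1), (x2, y2) = random.sample(list(candidates), 2)   # minimal sample
                a, b, c = y2 - y1, x1 - x2, x2 * y1 - x1 * y2             # line a*x + b*y + c = 0
                norm = np.hypot(a, b)
                if norm == 0:
                    continue
                dist = np.abs(a * points[:, 0] + b * points[:, 1] + c) / norm
                inliers = points[dist < tol]
                if len(inliers) > len(best_inliers):
                    best_model, best_inliers = (a, b, c), inliers
            return best_model, best_inliers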

  19. GPU-advanced 3D electromagnetic simulations of superconductors in the Ginzburg–Landau formalism

    Energy Technology Data Exchange (ETDEWEB)

    Stošić, Darko; Stošić, Dušan; Ludermir, Teresa [Centro de Informática, Universidade Federal de Pernambuco, Av. Luiz Freire s/n, 50670-901, Recife, PE (Brazil); Stošić, Borko [Departamento de Estatística e Informática, Universidade Federal Rural de Pernambuco, Rua Dom Manoel de Medeiros s/n, Dois Irmãos, 52171-900 Recife, PE (Brazil); Milošević, Milorad V., E-mail: milorad.milosevic@uantwerpen.be [Departement Fysica, Universiteit Antwerpen, Groenenborgerlaan 171, B-2020 Antwerpen (Belgium)

    2016-10-01

    Ginzburg–Landau theory is one of the most powerful phenomenological theories in physics, with particular predictive value in superconductivity. The formalism solves coupled nonlinear differential equations for both the electronic and magnetic responsiveness of a given superconductor to external electromagnetic excitations. With order parameter varying on the short scale of the coherence length, and the magnetic field being long-range, the numerical handling of 3D simulations becomes extremely challenging and time-consuming for realistic samples. Here we show precisely how one can employ graphics-processing units (GPUs) for this type of calculations, and obtain physics answers of interest in a reasonable time-frame – with speedup of over 100× compared to best available CPU implementations of the theory on a 256^3 grid.

  20. Comparing Young and Elderly Serial Reaction Time Task Performance on Repeated and Random Conditions

    Directory of Open Access Journals (Sweden)

    Fatemeh Ehsani

    2012-07-01

    Objectives: Acquisition of motor skills through training in the elderly is of great importance. The main purpose of this study was to compare the performance of young and elderly subjects in a serial reaction time task under repeated and random conditions. Methods & Materials: A software-based serial reaction time task was used to study motor learning in 30 young and 30 elderly subjects. Each group was randomly divided into implicit and explicit subgroups. In the task, 4 squares with different colors appeared on the monitor and subjects were asked to press the corresponding key immediately after observing each one. Subjects practiced 8 motor blocks (4 repeated blocks, then 2 random blocks and 2 repeated blocks). Block time, the dependent variable, was measured, and independent-samples t-tests with repeated-measures ANOVA were used. Results: The young groups performed both repeated and random sequences significantly faster than the elderly (P<0.05). The explicit older subgroup performed blocks 7 and 8 significantly slower than block 6 (P<0.05). Conclusion: Young adults showed higher-level performance than the elderly in both repeated and random practice. The elderly performed random practice better than repeated practice.

  1. GPU-accelerated atmospheric chemical kinetics in the ECHAM/MESSy (EMAC) Earth system model (version 2.52)

    Science.gov (United States)

    Alvanos, Michail; Christoudias, Theodoros

    2017-10-01

    This paper presents an application of GPU accelerators in Earth system modeling. We focus on atmospheric chemical kinetics, one of the most computationally intensive tasks in climate-chemistry model simulations. We developed a software package that automatically generates CUDA kernels to numerically integrate atmospheric chemical kinetics in the global climate model ECHAM/MESSy Atmospheric Chemistry (EMAC), used to study climate change and air quality scenarios. A source-to-source compiler outputs a CUDA-compatible kernel by parsing the FORTRAN code generated by the Kinetic PreProcessor (KPP) general analysis tool. All Rosenbrock methods that are available in the KPP numerical library are supported. Performance evaluation, using Fermi and Pascal CUDA-enabled GPU accelerators, shows achieved speed-ups of 4.5× and 20.4×, respectively, in kernel execution time. A node-to-node real-world production performance comparison shows a 1.75× speed-up over the non-accelerated application using the KPP three-stage Rosenbrock solver. We provide a detailed description of the code optimizations used to improve the performance, including memory optimizations, control code simplification, and reduction of idle time. The accuracy and correctness of the accelerated implementation are evaluated by comparing to the CPU-only code of the application. The median relative difference is found to be less than 0.000000001% when comparing the output of the accelerated kernel with that of the CPU-only code. The approach followed, including the computational workload division, and the developed GPU solver code can potentially be used as the basis for hardware acceleration of numerous geoscientific models that rely on KPP for atmospheric chemical kinetics applications.

  2. Untitled

    Indian Academy of Sciences (India)

    The speedup is defined as G_s = (run time of the sequential algorithm on 1 CPU) / (run time of the parallel algorithm on N CPUs). This measure only shows the efficiency of the parallelization in terms of the algorithm itself but not a comparison with the best sequential algorithm. Thus a more realistic measure is defined as the speedup of the parallel algorithm against ...

  3. Utilization of reduced fuelling ripple set in ROP detector layout optimization

    International Nuclear Information System (INIS)

    Kastanya, Doddy

    2012-01-01

    Highlights: ► ADORE is an ROP detector layout optimization algorithm for CANDU reactors. ► The effect of using a reduced set of fuelling ripples in ADORE is assessed. ► Significant speedup can be realized by adopting this approach. ► The quality of the results is comparable to results from the full set of ripples. - Abstract: The ADORE (Alternative Detector layout Optimization for REgional overpower protection system) algorithm for performing the optimization of the regional overpower protection (ROP) system for CANDU® reactors has been recently developed. This algorithm utilizes the simulated annealing (SA) stochastic optimization technique to come up with an optimized detector layout for the ROP systems. For each history in the SA iteration where a particular detector layout is evaluated, the goodness of this detector layout is measured in terms of its trip set point value, which is obtained by performing a probabilistic trip set point calculation using the ROVER-F code. Since during each optimization execution thousands of candidate detector layouts are evaluated, the overall optimization process is time consuming. Since for each ROVER-F evaluation the number of fuelling ripples controls the execution time, reducing the number of fuelling ripples will reduce the overall execution time. This approach has been investigated and the results are presented in this paper. The challenge is to construct a set of representative fuelling ripples which will significantly speed up the optimization process while guaranteeing that the resulting detector layout has similar quality to the ones produced when the complete set of fuelling ripples is employed.
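
    The simulated-annealing loop that ADORE builds on has the generic shape sketched below; the score and neighbour callables are placeholders for the ROVER-F-based trip set point figure of merit and the detector-layout perturbation, neither of which is reproduced here.

        import math
        import random

        def simulated_annealing(initial, neighbour, score, t0=1.0, cooling=0.995, steps=10_000):
            # Generic SA loop: accept improvements always, and regressions with
            # a Boltzmann probability that shrinks as the temperature cools.
            current, current_score = initial, score(initial)
            best, best_score = current, current_score
            t = t0
            for _ in range(steps):
                candidate = neighbour(current)
                candidate_score = score(candidate)
                accept = (candidate_score > current_score or
                          random.random() < math.exp((candidate_score - current_score) / t))
                if accept:
                    current, current_score = candidate, candidate_score
                    if current_score > best_score:
                        best, best_score = current, current_score
                t *= cooling
            return best, best_score

    Shrinking the ripple set reduces the cost of each score evaluation, which is why it speeds up the whole optimization.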

  4. Comparative evaluation of conventional RT-PCR and real-time RT-PCR (RRT-PCR) for detection of avian metapneumovirus subtype A

    OpenAIRE

    Ferreira, HL; Spilki, FR; dos Santos, MMAB; de Almeida, RS; Arns, CW

    2009-01-01

    Avian metapneumovirus (AMPV) belongs to Metapneumovirus genus of Paramyxoviridae family. Virus isolation, serology, and detection of genomic RNA are used as diagnostic methods for AMPV. The aim of the present study was to compare the detection of six subgroup A AMPV isolates (AMPV/A) viral RNA by using different conventional and real time RT-PCR methods. Two new RT-PCR tests and two real time RT-PCR tests, both detecting fusion (F) gene and nucleocapsid (N) gene were compared with an establis...

  5. View-Dependent Adaptive Cloth Simulation with Buckling Compensation.

    Science.gov (United States)

    Koh, Woojong; Narain, Rahul; O'Brien, James F

    2015-10-01

    This paper describes a method for view-dependent cloth simulation using dynamically adaptive mesh refinement and coarsening. Given a prescribed camera motion, the method adjusts the criteria controlling refinement to account for visibility and apparent size in the camera's view. Objectionable dynamic artifacts are avoided by anticipative refinement and smoothed coarsening, while locking in extremely coarsened regions is inhibited by modifying the material model to compensate for unresolved sub-element buckling. This approach preserves the appearance of detailed cloth throughout the animation while avoiding the wasted effort of simulating details that would not be discernible to the viewer. The computational savings realized by this method increase as scene complexity grows. The approach produces a 2× speed-up for a single character and more than 4× for a small group as compared to view-independent adaptive simulations, and respectively 5× and 9× speed-ups as compared to non-adaptive simulations.

  6. Comparative analysis of long-time variations of multicomponent ion ring current according to data of geostationary Gorizont satellite

    International Nuclear Information System (INIS)

    Kovtyukh, A.S.; Panasyuk, M.I.; Vlasova, N.A.; Sosnovets, Eh.N.

    1990-01-01

    Long-time variations of the fluxes of the H+, [N,O]2+ and [C,N,O]6+ ions with energy E/Q ∼ 60-120 keV/e measured by the GORIZONT (1985-07A) satellite in the geostationary orbit at noon time are analyzed. The results are discussed and are compared with current models of the formation of the Earth's ion ring current.

  7. Conventional and narrow bore short capillary columns with cyclodextrin derivatives as chiral selectors to speed-up enantioselective gas chromatography and enantioselective gas chromatography-mass spectrometry analyses.

    Science.gov (United States)

    Bicchi, Carlo; Liberto, Erica; Cagliero, Cecilia; Cordero, Chiara; Sgorbini, Barbara; Rubiolo, Patrizia

    2008-11-28

    The analysis of complex real-world samples of vegetable origin requires rapid and accurate routine methods, enabling laboratories to increase sample throughput and productivity while reducing analysis costs. This study examines shortening enantioselective GC (ES-GC) analysis time following the approaches used in fast GC. ES-GC separations are due to a weak enantiomer-CD host-guest interaction; the separation is thermodynamically driven and strongly influenced by temperature. As a consequence, fast temperature rates can interfere with enantiomeric discrimination; thus the use of short and/or narrow-bore columns is a possible approach to speeding up ES-GC analyses. The performance of ES-GC with a conventional inner diameter (I.D.) column (25 m length x 0.25 mm I.D., 0.15 μm and 0.25 μm d_f) coated with 30% of 2,3-di-O-ethyl-6-O-tert-butyldimethylsilyl-beta-cyclodextrin in PS-086 is compared to those of a conventional-I.D. short column (5 m length x 0.25 mm I.D., 0.15 μm d_f) and of narrow-bore columns of different lengths (1, 2, 5 and 10 m long x 0.10 mm I.D., 0.10 μm d_f) in analysing racemate standards of pesticides and of flavour and fragrance compounds, as well as real-world samples. Short conventional-I.D. columns gave shorter analysis time and comparable or lower resolutions with the racemate standards, depending mainly on analyte volatility. Narrow-bore columns were tested under different analysis conditions; they provided shorter analysis time and resolutions comparable to those of conventional-I.D. ES columns. The narrow-bore columns offering the most effective compromise between separation efficiency and analysis time are the 5 and 2 m columns; in combination with mass spectrometry as detector, applied to lavender and bergamot essential oil analyses, these reduced analysis time by a factor of at least three while separation of the chiral markers remained unaltered.

  8. Large-Scale Multi-Dimensional Document Clustering on GPU Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Mueller, Frank [North Carolina State University; Zhang, Yongpeng [ORNL; Potok, Thomas E [ORNL

    2010-01-01

    Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, including k-means, in the sense that the outcome is not sensitive to the initial state. One limitation of this approach is that the algorithmic complexity is inherently quadratic in the number of documents. As a result, execution time becomes a bottleneck with a large number of documents. In this paper, we assess the benefits of exploiting the computational power of Beowulf-like clusters equipped with contemporary Graphics Processing Units (GPUs) as a means to significantly reduce the runtime of flocking-based document clustering. Our framework scales up to over one million documents processed simultaneously in a sixteen-node GPU cluster. Results are also compared to a four-node cluster with higher-end GPUs. On these clusters, we observe 30X-50X speedups, which demonstrates the potential of GPU clusters to efficiently solve massive data mining problems. Such speedups, combined with the scalability potential and accelerator-based parallelization, are unique in the domain of document-based data mining, to the best of our knowledge.

  9. GPGPU Implementation of a Genetic Algorithm for Stereo Refinement

    Directory of Open Access Journals (Sweden)

    Álvaro Arranz

    2015-03-01

    Full Text Available During the last decade, general-purpose computing on graphics processing units (GPGPU) has turned out to be a useful tool for speeding up many scientific calculations. Computer vision is known to be one of the fields with greater penetration of these new techniques. This paper explores the advantages of using a GPGPU implementation to speed up a genetic algorithm used for stereo refinement. The main contribution of this paper is analyzing which genetic operators take advantage of a parallel approach and the description of an efficient state-of-the-art implementation for each one. As a result, speed-ups close to 80× can be achieved, proving this to be the only way of achieving close to real-time performance.

  10. A Coflow-based Co-optimization Framework for High-performance Data Analytics

    NARCIS (Netherlands)

    Cheng, Long; Wang, Ying; Pei, Yulong; Epema, D.H.J.

    2017-01-01

    Efficient execution of distributed database operators such as joining and aggregating is critical for the performance of big data analytics. With the increase of the compute speedup of modern CPUs, reducing the network communication time of these operators in large systems is becoming

  11. A coflow-based co-optimization framework for high-performance data analytics

    NARCIS (Netherlands)

    Cheng, L.; Wang, Y.; Pei, Y.; Epema, D.H.J.

    2017-01-01

    Efficient execution of distributed database operators such as joining and aggregating is critical for the performance of big data analytics. With the increase of the compute speedup of modern CPUs, reducing the network communication time of these operators in large systems is becoming increasingly

  12. Low noise buffer amplifiers and buffered phase comparators for precise time and frequency measurement and distribution

    Science.gov (United States)

    Eichinger, R. A.; Dachel, P.; Miller, W. H.; Ingold, J. S.

    1982-01-01

    Extremely low noise, high performance, wideband buffer amplifiers and buffered phase comparators were developed. These buffer amplifiers are designed to distribute reference frequencies from 30 kHz to 45 MHz from a hydrogen maser without degrading the hydrogen maser's performance. The buffered phase comparators are designed to intercompare the phase of state-of-the-art hydrogen masers without adding any significant measurement system noise. These devices have a 27 femtosecond phase stability floor and are stable to better than one picosecond for long periods of time. Their temperature coefficient is less than one picosecond per degree C, and they have shown virtually no voltage coefficients.

  13. Numerical Solution of Diffusion Models in Biomedical Imaging on Multicore Processors

    Directory of Open Access Journals (Sweden)

    Luisa D'Amore

    2011-01-01

    Full Text Available In this paper, we consider nonlinear partial differential equations (PDEs) of diffusion/advection type underlying most problems in image analysis. As a case study, we address the segmentation of medical structures. We perform a comparative study of numerical algorithms arising from using the semi-implicit and the fully implicit discretization schemes. Comparison criteria take into account both the accuracy and the efficiency of the algorithms. As measures of accuracy, we consider the Hausdorff distance and the residuals of the numerical solvers, while as measures of efficiency we consider convergence history, execution time, speedup, and parallel efficiency. This analysis is carried out in a multicore-based parallel computing environment.
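
    For orientation, the two discretization families compared in this record can be written schematically for a generic nonlinear diffusion equation u_t = div(g(u) grad u) as follows (the notation g, tau and the operator form are illustrative, not taken from the paper):

        \frac{u^{n+1}-u^{n}}{\tau} = \nabla\cdot\bigl(g(u^{n})\,\nabla u^{n+1}\bigr)
        \qquad \text{(semi-implicit: one linear solve per time step)},

        \frac{u^{n+1}-u^{n}}{\tau} = \nabla\cdot\bigl(g(u^{n+1})\,\nabla u^{n+1}\bigr)
        \qquad \text{(fully implicit: one nonlinear solve per time step)}.

    The semi-implicit scheme trades some accuracy per step for a cheaper linear solve, which is the kind of accuracy/efficiency trade-off the comparison above quantifies.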

  14. Sample size for comparing negative binomial rates in noninferiority and equivalence trials with unequal follow-up times.

    Science.gov (United States)

    Tang, Yongqiang

    2017-05-25

    We derive the sample size formulae for comparing two negative binomial rates based on both the relative and absolute rate difference metrics in noninferiority and equivalence trials with unequal follow-up times, and establish an approximate relationship between the sample sizes required for the treatment comparison based on the two treatment effect metrics. The proposed method allows the dispersion parameter to vary by treatment groups. The accuracy of these methods is assessed by simulations. It is demonstrated that ignoring the between-subject variation in the follow-up time by setting the follow-up time for all individuals to be the mean follow-up time may greatly underestimate the required size, resulting in underpowered studies. Methods are provided for back-calculating the dispersion parameter based on the published summary results.

  15. Transmission Near-Infrared (NIR) and Photon Time-of-Flight (PTOF) Spectroscopy in a Comparative Analysis of Pharmaceuticals

    DEFF Research Database (Denmark)

    Kamran, Faisal; Abildgaard, Otto Højager Attermann; Sparén, Anders

    2015-01-01

    We present a comprehensive study of the application of photon time-of-flight spectroscopy (PTOFS) in the wavelength range 1050– 1350 nm as a spectroscopic technique for the evaluation of the chemical composition and structural properties of pharmaceutical tablets. PTOFS is compared to transmissio...

  16. AirCompare

    Data.gov (United States)

    U.S. Environmental Protection Agency — AirCompare contains air quality information that allows a user to compare conditions in different localities over time and compare conditions in the same location at...

  17. Parallel Optimization of a Reversible (Quantum) Ripple-Carry Adder

    DEFF Research Database (Denmark)

    Thomsen, Michael Kirkedal; Axelsen, Holger Bock

    2008-01-01

    (mk). We also show designs for garbage-less reversible set-less-than circuits. We compare the circuit costs of the CDKM and parallel adder in measures of circuit delay, width, gate and transistor count, and find that the parallelized adder offers significant speedups at realistic word sizes with modest...

  18. MONA Implementation Secrets

    DEFF Research Database (Denmark)

    Klarlund, Nils; Møller, Anders; Schwartzbach, Michael Ignatieff

    2002-01-01

    a period of six years. Compared to the first naive version, the present tool is faster by several orders of magnitude. This speedup is obtained from many different contributions working on all levels of the compilation and execution of formulas. We present a selection of implementation "secrets" that have...

  19. Baseline simple and complex reaction times in female compared to male boxers.

    Science.gov (United States)

    Bianco, M; Ferri, M; Fabiano, C; Giorgiano, F; Tavella, S; Manili, U; Faina, M; Palmieri, V; Zeppilli, P

    2011-06-01

    The aim of the study was to compare the baseline cognitive performance of female and male amateur boxers. The study population included 28 female amateur boxers. Fifty-six male boxers, matched for age, employment and competitive level to the female athletes, formed the control group. All boxers had no history of head concussions (except boxing). Each boxer was requested to: 1) fill in a questionnaire collecting demographic data, level of education, occupational status, boxing record and number of head concussions during boxing; 2) undergo a baseline computerized neuropsychological (NP) test (CogSport) measuring simple and complex reaction times (RT). Female boxers were lighter than male boxers (56±7 vs. 73.1±9.8 kg); none of the boxing record variables (number of bouts, knock-outs, etc.) correlated with NP scores. Female and male Olympic-style boxers have no (or minimal) differences in baseline cognitive performance. Further research with larger series of female boxers is required to confirm these findings.

  20. Using Intel's Knight Landing Processor to Accelerate Global Nested Air Quality Prediction Modeling System (GNAQPMS) Model

    Science.gov (United States)

    Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.

    2016-12-01

    The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled with a mercury transport module to investigate mercury pollution. In this study, we present our work porting the GNAQPMS model to the Intel Xeon Phi processor Knights Landing (KNL) to accelerate the model. KNL is the second-generation product adopting the Many Integrated Core (MIC) architecture. Compared with the first-generation Knights Corner (KNC), KNL has new hardware features and can be used as a standalone processor as well as a coprocessor with other CPUs. Using the VTune tool, the high-overhead modules in the GNAQPMS model were identified, including the CBMZ gas chemistry, the advection and convection module, and the wet deposition module. These high-overhead modules were accelerated by optimizing the code and using new features of KNL. The following optimization measures were taken: 1) changing the pure MPI parallel mode to a hybrid parallel mode with MPI and OpenMP; 2) vectorizing the code to use the 512-bit wide vector computation unit; 3) reducing unnecessary memory access and calculation; 4) reducing Thread Local Storage (TLS) for common variables within each OpenMP thread in CBMZ; 5) changing global communication from file writing and reading to MPI functions. After optimization, the performance of GNAQPMS is greatly improved on both the CPU and KNL platforms: a single-node test showed that the optimized version achieves a 2.6x speedup on a two-socket CPU platform and a 3.3x speedup on a one-socket KNL platform compared with the baseline version, which means KNL delivers a 1.29x speedup compared with the two-socket CPU platform.

  1. Comparative Effectiveness of Low-Volume Time-Efficient Resistance Training Versus Endurance Training in Patients With Heart Failure

    DEFF Research Database (Denmark)

    Munch, Gregers Winding; Birgitte Rosenmeier, Jaya; Petersen, Morten

    2018-01-01

    PURPOSE: Cardiorespiratory fitness is positively related to heart failure (HF) prognosis, but lack of time and low energy are barriers for adherence to exercise. We, therefore, compared the effect of low-volume time-based resistance exercise training (TRE) with aerobic moderate-intensity cycling (AMC) on maximal and submaximal exercise capacity, health-related quality of life, and vascular function. METHODS: Twenty-eight HF patients (New York Heart Association class I-II) performed AMC (n = 14) or TRE (n = 14). Maximal and submaximal exercise capacity, health-related quality of life...... -related quality of life in lower New York Heart Association-stage HF patients, despite less time required as well as lower energy expenditure during TRE than during AMC. Therefore, TRE might represent a time-efficient exercise modality for improving adherence to exercise in patients with class I-II HF.

  2. Load balancing in highly parallel processing of Monte Carlo code for particle transport

    International Nuclear Information System (INIS)

    Higuchi, Kenji; Takemiya, Hiroshi; Kawasaki, Takuji

    1998-01-01

    In parallel processing of Monte Carlo (MC) codes for neutron, photon and electron transport problems, particle histories are assigned to processors, making use of the independence of the calculation for each particle. Although the main part of an MC code can easily be parallelized by this method, it is necessary, and practically difficult, to optimize the code with respect to load balancing in order to attain a high speedup ratio in highly parallel processing. In fact, the speedup ratio in the case of 128 processors remains at nearly one hundred when using the test bed for the performance evaluation. Through parallel processing of the MCNP code, which is widely used in the nuclear field, it is shown that it is difficult to attain high performance by static load balancing, especially in neutron transport problems, and that a load balancing method which dynamically changes the number of assigned particles, minimizing the sum of the computational and communication costs, overcomes the difficulty, resulting in nearly fifteen percent reduction in execution time. (author)

  3. Aerodynamic optimization of the blades of diffuser-augmented wind turbines

    International Nuclear Information System (INIS)

    Vaz, Jerson R.P.; Wood, David H.

    2016-01-01

    Highlights: • An optimization procedure to design shrouded wind turbine blades is proposed. • The procedure relies on the diffuser speed-up ratio. • The diffuser speed-up ratio increases the velocity at the rotor plane. • Chord and twist angle are optimized for typical speed-up ratios. • The procedure is applicable for any tip-speed ratio greater than 1. - Abstract: Adding an exit diffuser is known to allow wind turbines to exceed the classical Betz–Joukowsky limit for a bare turbine. It is not clear, however, if there is a limit for diffuser-augmented turbines or whether the structural and other costs of the diffuser outweigh any gain in power. This work presents a new approach to the aerodynamic optimization of a wind turbine with a diffuser. It is based on an extension of the well-known Blade Element Theory and a simple model for diffuser efficiency. It is assumed that the same conditions for the axial velocity in the wake of an ordinary wind turbine can be applied on the flow far downwind of the diffuser outlet. An algorithm to optimize the blade chord and twist angle distributions in the presence of a diffuser was developed and implemented. As a result, an aerodynamic improvement of the turbine rotor geometry was achieved with the blade shape sensitive to the diffuser speed-up ratio. In order to evaluate the proposed approach, a comparison with the classical Glauert optimization was performed for a flanged diffuser, which increased the efficiency. In addition, a comparative assessment was made with experimental results available in the literature, suggesting better performance for the rotor designed with the proposed optimization procedure.

  4. MONA Implementation Secrets

    DEFF Research Database (Denmark)

    Klarlund, Nils; Møller, Anders; Schwartzbach, Michael Ignatieff

    2001-01-01

    a period of six years. Compared to the first naive version, the present tool is faster by several orders of magnitude. This speedup is obtained from many different contributions working on all levels of the compilation and execution of formulas. We present a selection of implementation “secrets” that have...

  5. The importance of the time scale in radiation detection exemplified by comparing conventional and avalache semiconductor detectors

    Energy Technology Data Exchange (ETDEWEB)

    Tove, P A; Cho, Z H; Huth, G C [California Univ., Los Angeles (USA). Lab. of Nuclear Medicine and Radiation Biology

    1976-02-01

    The profound importance of the time scale of a radiation detection process is discussed in an analysis of limitations in energy resolution and timing, with emphasis on semiconductor detectors used for X-ray detection. The basic event detection time involves stopping of the particle and creating a distribution of free electrons and holes containing all desired information (energy, time position) about the particle or quantum, in a time of approximately 10^-12 s. The process of extracting this information usually involves a much longer time because the signal is generated in the relatively slow process of charge collection, and further prolongation may be caused by signal processing required to depress noise for improving energy resolution. This is a common situation for conventional semiconductor detectors with external amplifiers, where time constants of 10^-5-10^-4 s may be optimum, primarily because of amplifier noise. A different situation applies to the avalanche detector, where internal amplification helps in suppressing noise without expanding the time scale of detection, resulting in an optimum time of 10^-9-10^-8 s. These two cases are illustrated by plotting energy resolution vs. time constant, for different magnitudes of the parallel and series type noise sources. The effects of the inherent energy spread due to statistics and spatial inhomogeneities are also discussed to illustrate the potential of these two approaches for energy and time determination. Two constructional approaches for avalanche detectors are briefly compared.

  6. Overtaking CPU DBMSes with a GPU in whole-query analytic processing with parallelism-friendly execution plan optimization

    NARCIS (Netherlands)

    A. Agbaria (Adnan); D. Minor (David); N. Peterfreund (Natan); E. Rozenberg (Eyal); O. Rosenberg (Ofer); Huawei Research

    2016-01-01

    textabstractExisting work on accelerating analytic DB query processing with (discrete) GPUs fails to fully realize their potential for speedup through parallelism: Published results do not achieve significant speedup over more performant CPU-only DBMSes when processing complete queries. This

  7. JIST: Just-In-Time Scheduling Translation for Parallel Processors

    Directory of Open Access Journals (Sweden)

    Giovanni Agosta

    2005-01-01

    Full Text Available The application fields of bytecode virtual machines and VLIW processors overlap in the area of embedded and mobile systems, where the two technologies offer different benefits, namely high code portability, low power consumption and reduced hardware cost. Dynamic compilation makes it possible to bridge the gap between the two technologies, but special attention must be paid to software instruction scheduling, a must for the VLIW architectures. We have implemented JIST, a virtual machine and JIT compiler for Java bytecode targeted to a VLIW processor. We show the impact of various optimizations on the performance of code compiled with JIST through an experimental study on a set of benchmark programs. We report significant speedups, and increases in the number of instructions issued per cycle of up to 50% with respect to the non-scheduling version of the JIT compiler. Further optimizations are discussed.

  8. A Matter of Time: Faster Percolator Analysis via Efficient SVM Learning for Large-Scale Proteomics.

    Science.gov (United States)

    Halloran, John T; Rocke, David M

    2018-05-04

    Percolator is an important tool for greatly improving the results of a database search and subsequent downstream analysis. Using support vector machines (SVMs), Percolator recalibrates peptide-spectrum matches based on the learned decision boundary between targets and decoys. To improve analysis time for large-scale data sets, we update Percolator's SVM learning engine through software and algorithmic optimizations rather than heuristic approaches that necessitate the careful study of their impact on learned parameters across different search settings and data sets. We show that by optimizing Percolator's original learning algorithm, l2-SVM-MFN, large-scale SVM learning requires nearly only a third of the original runtime. Furthermore, we show that by employing the widely used Trust Region Newton (TRON) algorithm instead of l2-SVM-MFN, large-scale Percolator SVM learning is reduced to nearly only a fifth of the original runtime. Importantly, these speedups only affect the speed at which Percolator converges to a global solution and do not alter recalibration performance. The upgraded versions of both l2-SVM-MFN and TRON are optimized within the Percolator codebase for multithreaded and single-thread use and are available under Apache license at bitbucket.org/jthalloran/percolator_upgrade.

  9. Effects of hyperbolic rotation in Minkowski space on the modeling of plasma accelerators in a Lorentz boosted frame

    International Nuclear Information System (INIS)

    Vay, J.-L.; Geddes, C. G. R.; Cormier-Michel, E.; Grote, D. P.

    2011-01-01

    The effects of hyperbolic rotation in Minkowski space resulting from the use of Lorentz boosted frames of calculation on laser propagation in plasmas are analyzed. Selection of a boost frame at the laser group velocity is shown to alter the laser spectrum, allowing the use of higher boost velocities. The technique is applied to simulations of laser driven plasma wakefield accelerators, which promise much smaller machines and whose development requires detailed simulations that challenge or exceed current capabilities. Speedups approaching the theoretical optima are demonstrated, producing the first direct simulations of stages up to 1 TeV. This is made possible by a million-times speedup thanks to a frame boost with a relativistic factor γ_b as high as 1300, taking advantage of the rotation to mitigate an instability that limited previous work.

  10. Parallel iterative solvers and preconditioners using approximate hierarchical methods

    Energy Technology Data Exchange (ETDEWEB)

    Grama, A.; Kumar, V.; Sameh, A. [Univ. of Minnesota, Minneapolis, MN (United States)

    1996-12-31

    In this paper, we report results of the performance, convergence, and accuracy of a parallel GMRES solver for Boundary Element Methods. The solver uses a hierarchical approximate matrix-vector product based on a hybrid Barnes-Hut / Fast Multipole Method. We study the impact of various accuracy parameters on the convergence and show that with minimal loss in accuracy, our solver yields significant speedups. We demonstrate the excellent parallel efficiency and scalability of our solver. The combined speedups from approximation and parallelism represent an improvement of several orders of magnitude in solution time. We also develop fast and parallelizable preconditioners for this problem. We report on the performance of an inner-outer scheme and a preconditioner based on a truncated Green's function. Experimental results on a 256 processor Cray T3D are presented.

  11. On developing B-spline registration algorithms for multi-core processors

    International Nuclear Information System (INIS)

    Shackleford, J A; Kandasamy, N; Sharp, G C

    2010-01-01

    Spline-based deformable registration methods are quite popular within the medical-imaging community due to their flexibility and robustness. However, they require a large amount of computing time to obtain adequate results. This paper makes two contributions towards accelerating B-spline-based registration. First, we propose a grid-alignment scheme and associated data structures that greatly reduce the complexity of the registration algorithm. Based on this grid-alignment scheme, we then develop highly data parallel designs for B-spline registration within the stream-processing model, suitable for implementation on multi-core processors such as graphics processing units (GPUs). Particular attention is focused on an optimal method for performing analytic gradient computations in a data parallel fashion. CPU and GPU versions are validated for execution time and registration quality. Performance results on large images show that our GPU algorithm achieves a speedup of 15 times over the single-threaded CPU implementation whereas our multi-core CPU algorithm achieves a speedup of 8 times over the single-threaded implementation. The CPU and GPU versions achieve near-identical registration quality in terms of RMS differences between the generated vector fields.

  12. Exploiting MIC architectures for the simulation of channeling of charged particles in crystals

    Science.gov (United States)

    Bagli, Enrico; Karpusenko, Vadim

    2016-08-01

    Coherent effects of ultra-relativistic particles in crystals are an area of science under development. DYNECHARM++ is a toolkit for the simulation of coherent interactions between high-energy charged particles and complex crystal structures. The particle trajectory in a crystal is computed through numerical integration of the equation of motion. The code was revised and improved in order to exploit parallelization on multiple cores and vectorization of single instructions on multiple data. An Intel Xeon Phi card was adopted for the performance measurements. The computation time was shown to scale linearly as a function of the number of physical and virtual cores. By enabling the auto-vectorization flag of the compiler, a threefold speedup was obtained. The performance of the card was compared to that of a dual Xeon system.

  13. Coverage-maximization in networks under resource constraints.

    Science.gov (United States)

    Nandi, Subrata; Brusch, Lutz; Deutsch, Andreas; Ganguly, Niloy

    2010-06-01

    Efficient coverage algorithms are essential for information search or dispersal in all kinds of networks. We define an extended coverage problem which accounts for constrained resources of consumed bandwidth B and time T. Our solution to the network challenge is here studied for regular grids only. Using methods from statistical mechanics, we develop a coverage algorithm with proliferating message packets and a temporally modulated proliferation rate. The algorithm performs as efficiently as a single random walker but is O(B^((d-2)/d)) times faster, resulting in a significant service speed-up on a regular grid of dimension d. The algorithm is numerically compared to a class of generalized proliferating random walk strategies and on regular grids shown to perform best in terms of the product metric of speed and efficiency.
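
    A minimal sketch of the search strategy described in this record, under assumptions that are not taken from the paper (the grid size, the linear decay of the proliferation rate and the rate constant are all illustrative): walkers, standing in for message packets, move on a 2-D periodic grid, occasionally proliferate at a rate that is modulated over time, and stop once the bandwidth budget B (total packet-steps) or the time budget T is exhausted.

        import random

        def proliferating_coverage(n=64, B=20000, T=500, p0=0.02, seed=1):
            # Coverage of an n x n periodic grid by proliferating random walkers.
            rng = random.Random(seed)
            covered = {(0, 0)}
            walkers = [(0, 0)]
            steps_used = 0                     # consumed bandwidth (packet-steps)
            moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
            for t in range(T):
                if steps_used >= B or not walkers:
                    break
                p_branch = p0 * (1.0 - t / T)  # temporally modulated proliferation
                next_walkers = []
                for (x, y) in walkers:
                    dx, dy = rng.choice(moves)
                    pos = ((x + dx) % n, (y + dy) % n)
                    covered.add(pos)
                    steps_used += 1
                    next_walkers.append(pos)
                    if rng.random() < p_branch:    # the packet proliferates
                        next_walkers.append(pos)
                    if steps_used >= B:
                        break
                walkers = next_walkers
            return len(covered) / (n * n), steps_used

        coverage, used = proliferating_coverage()
        print("covered fraction:", round(coverage, 3), "packet-steps used:", used)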

  14. Examining the Internal Validity and Statistical Precision of the Comparative Interrupted Time Series Design by Comparison with a Randomized Experiment

    Science.gov (United States)

    St.Clair, Travis; Cook, Thomas D.; Hallberg, Kelly

    2014-01-01

    Although evaluators often use an interrupted time series (ITS) design to test hypotheses about program effects, there are few empirical tests of the design's validity. We take a randomized experiment on an educational topic and compare its effects to those from a comparative ITS (CITS) design that uses the same treatment group as the experiment…

  15. Fast solution of neutron diffusion problem by reduced basis finite element method

    International Nuclear Information System (INIS)

    Chunyu, Zhang; Gong, Chen

    2018-01-01

    Highlights: •An extremely efficient method is proposed to solve the neutron diffusion equation with varying cross sections. •Three orders of magnitude speedup is achieved for IAEA benchmark problems. •The method may open a new possibility of efficient high-fidelity modeling of large-scale problems in nuclear engineering. -- Abstract: For important applications which need to carry out neutron diffusion calculations many times, such as fuel depletion analysis and neutronics-thermohydraulics coupling analysis, fast and accurate solutions of the neutron diffusion equation are demanding but necessary. In the present work, the certified reduced basis finite element method is proposed and implemented to solve the generalized eigenvalue problems of neutron diffusion with variable cross sections. The order-reduced model is built upon high-fidelity finite element approximations during the offline stage. During the online stage, both k_eff and the spatial distribution of the neutron flux can be obtained very efficiently for any given set of cross sections. Numerical tests show that a speedup of around 1100 is achieved for the IAEA two-dimensional PWR benchmark problem and a speedup of around 3400 is achieved for the three-dimensional counterpart, with the fission cross sections, the absorption cross sections and the scattering cross sections treated as parameters.
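
    Schematically, the offline/online split described in this record amounts to projecting the parameterized generalized eigenvalue problem onto a small reduced basis; the notation below is generic and not taken from the paper. With finite element operators A(mu) and F(mu) of dimension N and a basis matrix V = [zeta_1, ..., zeta_m] (m much smaller than N) assembled offline from high-fidelity snapshots, each online query for a new set of cross sections mu solves only the m x m problem

        V^{T} A(\mu) V\, a = \frac{1}{k_{\mathrm{eff}}(\mu)}\, V^{T} F(\mu) V\, a,
        \qquad \phi \approx V a,

    instead of the full N-dimensional eigenproblem A(\mu)\,\phi = \frac{1}{k_{\mathrm{eff}}}\,F(\mu)\,\phi, which is the source of the large online speedups reported above.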

  16. Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.

    Science.gov (United States)

    Bhandarkar, S M; Chirravuri, S; Arnold, J

    1996-01-01

    Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
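
    As a schematic of the "multiple independent searches" strategy that the study found most effective on the coarse-grained MIMD machine (this is not the authors' clone-ordering code; the toy arrangement objective, the geometric cooling schedule and the use of Python multiprocessing are assumptions made for illustration), each worker runs its own annealing chain and the processes synchronize only once, at the end, to keep the best solution:

        import math
        import random
        from multiprocessing import Pool

        def anneal(seed, n=40, iters=20000, t0=2.0, alpha=0.9995):
            # One independent simulated-annealing chain on a toy linear
            # arrangement problem: order 0..n-1 so neighbours are close.
            rng = random.Random(seed)
            perm = list(range(n))
            rng.shuffle(perm)
            cost = lambda p: sum(abs(p[i] - p[i + 1]) for i in range(n - 1))
            c, t = cost(perm), t0
            for _ in range(iters):
                i, j = rng.randrange(n), rng.randrange(n)
                perm[i], perm[j] = perm[j], perm[i]          # propose a swap
                c_new = cost(perm)
                if c_new <= c or rng.random() < math.exp(-(c_new - c) / t):
                    c = c_new                                # accept
                else:
                    perm[i], perm[j] = perm[j], perm[i]      # undo the swap
                t *= alpha                                   # geometric cooling
            return c, perm

        if __name__ == "__main__":
            with Pool(4) as pool:                # four independent searches
                results = pool.map(anneal, range(4))
            best_cost, best_perm = min(results)  # single synchronization point
            print("best cost:", best_cost)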

  17. GPU-accelerated atmospheric chemical kinetics in the ECHAM/MESSy (EMAC) Earth system model (version 2.52)

    Directory of Open Access Journals (Sweden)

    M. Alvanos

    2017-10-01

    Full Text Available This paper presents an application of GPU accelerators in Earth system modeling. We focus on atmospheric chemical kinetics, one of the most computationally intensive tasks in climate–chemistry model simulations. We developed a software package that automatically generates CUDA kernels to numerically integrate atmospheric chemical kinetics in the global climate model ECHAM/MESSy Atmospheric Chemistry (EMAC), used to study climate change and air quality scenarios. A source-to-source compiler outputs a CUDA-compatible kernel by parsing the FORTRAN code generated by the Kinetic PreProcessor (KPP) general analysis tool. All Rosenbrock methods that are available in the KPP numerical library are supported. Performance evaluation, using Fermi and Pascal CUDA-enabled GPU accelerators, shows achieved speed-ups of 4.5× and 20.4×, respectively, of the kernel execution time. A node-to-node real-world production performance comparison shows a 1.75× speed-up over the non-accelerated application using the KPP three-stage Rosenbrock solver. We provide a detailed description of the code optimizations used to improve the performance, including memory optimizations, control code simplification, and reduction of idle time. The accuracy and correctness of the accelerated implementation are evaluated by comparing to the CPU-only code of the application. The median relative difference is found to be less than 0.000000001 % when comparing the output of the accelerated kernel with that of the CPU-only code. The approach followed, including the computational workload division, and the developed GPU solver code can potentially be used as the basis for hardware acceleration of numerous geoscientific models that rely on KPP for atmospheric chemical kinetics applications.

  18. GPU-based streaming architectures for fast cone-beam CT image reconstruction and demons deformable registration

    International Nuclear Information System (INIS)

    Sharp, G C; Kandasamy, N; Singh, H; Folkert, M

    2007-01-01

    This paper shows how to significantly accelerate cone-beam CT reconstruction and 3D deformable image registration using the stream-processing model. We describe data-parallel designs for the Feldkamp, Davis and Kress (FDK) reconstruction algorithm, and the demons deformable registration algorithm, suitable for use on a commodity graphics processing unit. The streaming versions of these algorithms are implemented using the Brook programming environment and executed on an NVidia 8800 GPU. Performance results using CT data of a preserved swine lung indicate that the GPU-based implementations of the FDK and demons algorithms achieve a substantial speedup, up to 80 times for FDK and 70 times for demons, when compared to an optimized reference implementation on a 2.8 GHz Intel processor. In addition, the accuracy of the GPU-based implementations was found to be excellent. Compared with CPU-based implementations, the RMS differences were less than 0.1 Hounsfield unit for reconstruction and less than 0.1 mm for deformable registration.

  19. Long-time atomistic simulations with the Parallel Replica Dynamics method

    Science.gov (United States)

    Perez, Danny

    Molecular Dynamics (MD) -- the numerical integration of atomistic equations of motion -- is a workhorse of computational materials science. Indeed, MD can in principle be used to obtain any thermodynamic or kinetic quantity, without introducing any approximation or assumptions beyond the adequacy of the interaction potential. It is therefore an extremely powerful and flexible tool to study materials with atomistic spatio-temporal resolution. These enviable qualities however come at a steep computational price, hence limiting the system sizes and simulation times that can be achieved in practice. While the size limitation can be efficiently addressed with massively parallel implementations of MD based on spatial decomposition strategies, allowing for the simulation of trillions of atoms, the same approach usually cannot extend the timescales much beyond microseconds. In this article, we discuss an alternative parallel-in-time approach, the Parallel Replica Dynamics (ParRep) method, that aims at addressing the timescale limitation of MD for systems that evolve through rare state-to-state transitions. We review the formal underpinnings of the method and demonstrate that it can provide arbitrarily accurate results for any definition of the states. When an adequate definition of the states is available, ParRep can simulate trajectories with a parallel speedup approaching the number of replicas used. We demonstrate the usefulness of ParRep by presenting different examples of materials simulations where access to long timescales was essential to access the physical regime of interest and discuss practical considerations that must be addressed to carry out these simulations. Work supported by the United States Department of Energy (U.S. DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division.

  20. GPU-accelerated 3D neutron diffusion code based on finite difference method

    Energy Technology Data Exchange (ETDEWEB)

    Xu, Q.; Yu, G.; Wang, K. [Dept. of Engineering Physics, Tsinghua Univ. (China)

    2012-07-01

    The finite difference method, as a traditional numerical solution to the neutron diffusion equation, although considered simpler and more precise than the coarse mesh nodal methods, has a bottleneck to wide application caused by the huge memory and prohibitive computation time it requires. In recent years, the concept of general-purpose computation on GPUs has provided us with a powerful computational engine for scientific research. In this study, a GPU-accelerated multi-group 3D neutron diffusion code based on the finite difference method was developed. First, a clean-sheet neutron diffusion code (3DFD-CPU) was written in C++ on the CPU architecture, and later ported to GPUs under NVIDIA's CUDA platform (3DFD-GPU). The IAEA 3D PWR benchmark problem was calculated in the numerical test, where three different codes, including the original CPU-based sequential code, the HYPRE (High Performance Preconditioners)-based diffusion code and CITATION, were used as counterpoints to test the efficiency and accuracy of the GPU-based program. The results demonstrate both high efficiency and adequate accuracy of the GPU implementation for the neutron diffusion equation. A speedup factor of about 46 times was obtained, using NVIDIA's GeForce GTX470 GPU card against a 2.50 GHz Intel Quad Q9300 CPU processor. Compared with the HYPRE-based code running in parallel on an 8-core tower server, a speedup of about 2 could still be observed. More encouragingly, without any mathematical acceleration technology, the GPU implementation ran about 5 times faster than CITATION, which was sped up by using the SOR method and the Chebyshev extrapolation technique. (authors)

  1. GPU-accelerated 3D neutron diffusion code based on finite difference method

    International Nuclear Information System (INIS)

    Xu, Q.; Yu, G.; Wang, K.

    2012-01-01

    The finite difference method, as a traditional numerical solution to the neutron diffusion equation, although considered simpler and more precise than the coarse mesh nodal methods, has a bottleneck to wide application caused by the huge memory and prohibitive computation time it requires. In recent years, the concept of general-purpose computation on GPUs has provided us with a powerful computational engine for scientific research. In this study, a GPU-accelerated multi-group 3D neutron diffusion code based on the finite difference method was developed. First, a clean-sheet neutron diffusion code (3DFD-CPU) was written in C++ on the CPU architecture, and later ported to GPUs under NVIDIA's CUDA platform (3DFD-GPU). The IAEA 3D PWR benchmark problem was calculated in the numerical test, where three different codes, including the original CPU-based sequential code, the HYPRE (High Performance Preconditioners)-based diffusion code and CITATION, were used as counterpoints to test the efficiency and accuracy of the GPU-based program. The results demonstrate both high efficiency and adequate accuracy of the GPU implementation for the neutron diffusion equation. A speedup factor of about 46 times was obtained, using NVIDIA's GeForce GTX470 GPU card against a 2.50 GHz Intel Quad Q9300 CPU processor. Compared with the HYPRE-based code running in parallel on an 8-core tower server, a speedup of about 2 could still be observed. More encouragingly, without any mathematical acceleration technology, the GPU implementation ran about 5 times faster than CITATION, which was sped up by using the SOR method and the Chebyshev extrapolation technique. (authors)

  2. Comparative study of the blinking time between young adult and adult video display terminal users in indoor environment.

    Science.gov (United States)

    Schaefer, Tânia Mara Cunha; Schaefer, Arthur Rubens Cunha; Abib, Fernando Cesar; José, Newton Kara

    2009-01-01

    To investigate the average blinking time during conversation and during video display terminal (VDT) use in young adults and adults in the presbyopic age group, a transversal analytical study was performed in a readily accessible sample consisting of Volkswagen do Brasil - Curitiba, Paraná employees. The cohort consisted of 108 subjects divided into two age groups: Group 1, the young adult group (age range 20-39): 77 employees, mean age of 30.09 +/- 5.09; Group 2, the presbyopic adult group (age range 40-53): 31 employees, mean age of 44.17 +/- 3. Subjects under 18 years of age, subjects with a history of ocular disorders, contact lens wearers and computer non-users were excluded. The subjects had their faces filmed for 10 minutes during conversation and during VDT reading. Student's t-test was used and the statistical significance level was 95%. The average time between blinks in Group 1 for conversation and VDT reading was 5.16 +/- 1.83 and 10.42 +/- 7.78 seconds, respectively; in Group 2, 4.9 +/- 1.49 and 10.46 +/- 5.54 seconds. In both age groups, the time between blinks was significantly longer during VDT reading than during conversation, while no statistically significant difference was found when the two age groups were compared (p>0.05). There was an increase in the blinking time between young adults and the presbyopic group in VDT use situations when compared with reading situations. The difference in the blinking frequency between young adults and the presbyopic group in VDT use and reading situations was not statistically significant.

  3. High Performance Biological Pairwise Sequence Alignment: FPGA versus GPU versus Cell BE versus GPP

    Directory of Open Access Journals (Sweden)

    Khaled Benkrid

    2012-01-01

    Full Text Available This paper explores the pros and cons of reconfigurable computing in the form of FPGAs for high performance efficient computing. In particular, the paper presents the results of a comparative study between three different acceleration technologies, namely, Field Programmable Gate Arrays (FPGAs), Graphics Processor Units (GPUs), and IBM's Cell Broadband Engine (Cell BE), in the design and implementation of the widely-used Smith-Waterman pairwise sequence alignment algorithm, with general purpose processors as a base reference implementation. Comparison criteria include speed, energy consumption, and purchase and development costs. The study shows that FPGAs largely outperform all other implementation platforms on the performance per watt criterion and perform better than all other platforms on the performance per dollar criterion, although by a much smaller margin. Cell BE and GPU come second and third, respectively, on both performance per watt and performance per dollar criteria. In general, in order to outperform other technologies on the performance per dollar criterion (using currently available hardware and development tools), FPGAs need to achieve at least two orders of magnitude speed-up compared to general-purpose processors and one order of magnitude speed-up compared to domain-specific technologies such as GPUs.

  4. Analysis of impact of general-purpose graphics processor units in supersonic flow modeling

    Science.gov (United States)

    Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.

    2017-06-01

    Computational methods are widely used in the prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPUs) provide architectures and new programming models that make it possible to harness their large processing power and to design computational fluid dynamics (CFD) simulations with both high performance and low cost. Possibilities of the use of GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve the three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high-resolution numerical schemes. CUDA technology is used for the programming implementation of the parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the computed results are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. The speedup of the solution on GPUs with respect to the solution on a central processing unit (CPU) is evaluated. Performance measurements show that the numerical schemes developed achieve a 20-50 times speedup on GPU hardware compared to the CPU reference implementation. The results obtained provide a promising perspective for designing a GPU-based software framework for applications in CFD.

  5. A Comparative Analysis of Competency Frameworks for Youth Workers in the Out-of-School Time Field

    OpenAIRE

    Vance, Femi

    2010-01-01

    Research suggests that the quality of out-of-school time (OST) programs is related to positive youth outcomes and skilled staff are a critical component of high quality programming. This descriptive case study of competency frameworks for youth workers in the OST field demonstrates how experts and practitioners characterize a skilled youth worker. A comparative analysis of 11 competency frameworks is conducted to identify a set of common core competencies. A set of 12 competency areas that ar...

  6. Comparative study of internet cloud and cloudlet over wireless mesh networks for real-time applications

    Science.gov (United States)

    Khan, Kashif A.; Wang, Qi; Luo, Chunbo; Wang, Xinheng; Grecos, Christos

    2014-05-01

    Mobile cloud computing is receiving world-wide momentum for ubiquitous on-demand cloud services for mobile users provided by Amazon, Google etc. with low capital cost. However, Internet-centric clouds introduce wide area network (WAN) delays that are often intolerable for real-time applications such as video streaming. One promising approach to addressing this challenge is to deploy decentralized mini-cloud facilities known as cloudlets to enable localized cloud services. When supported by local wireless connectivity, a wireless cloudlet is expected to offer low cost and high performance cloud services for the users. In this work, we implement a realistic framework that comprises both a popular Internet cloud (Amazon Cloud) and a real-world cloudlet (based on Ubuntu Enterprise Cloud (UEC)) for mobile cloud users in a wireless mesh network. We focus on real-time video streaming over the HTTP standard and implement a typical application. We further perform a comprehensive comparative analysis and empirical evaluation of the application's performance when it is delivered over the Internet cloud and the cloudlet respectively. The study quantifies the influence of the two different cloud networking architectures on supporting real-time video streaming. We also enable movement of the users in the wireless mesh network and investigate the effect of user mobility on mobile cloud computing over the cloudlet and the Amazon cloud respectively. Our experimental results demonstrate the advantages of the cloudlet paradigm over its Internet cloud counterpart in supporting the quality of service of real-time applications.

  7. Measurement of the ecological flow of the Acaponeta river, Nayarit, comparing different time intervals

    Directory of Open Access Journals (Sweden)

    Guadalupe de la Lanza Espino

    2012-07-01

    Full Text Available The management of river water in Mexico has been uneven due to different anthropogenic activities, and it is associated with inter-annual changes in the climate and runoff patterns, leading to a loss of ecosystem integrity. However, nowadays there are different methods to assess the water volume that is necessary to conserve the environment, among which are hydrological methods, such as those applied here, that are based on information on water volumes recorded over decades, which is not always available in the country. For this reason, this study compares runoff records for different time ranges: a minimum of 10 years, a medium of 20 years, and more than 50 years, to quantify the environmental flow. These time intervals provided similar results, which means that not only for the Acaponeta river, but possibly for other lotic systems as well, a 10-year interval may be used satisfactorily. In this river, the runoff water that must be kept for environmental purposes is: for 10 years 70.1%, for 20 years 78.1% and for >50 years 68.8%, with an average of 72.3% of the total water volume or of the mean annual runoff.

  8. High speed finite element simulations on the graphics card

    Energy Technology Data Exchange (ETDEWEB)

    Huthwaite, P.; Lowe, M. J. S. [Department of Mechanical Engineering, Imperial College, London, SW7 2AZ (United Kingdom)

    2014-02-18

    A software package is developed to perform explicit time domain finite element simulations of ultrasonic propagation on the graphical processing unit, using Nvidia’s CUDA. Of critical importance for this problem is the arrangement of nodes in memory, allowing data to be loaded efficiently and minimising communication between the independently executed blocks of threads. The initial stage of memory arrangement is partitioning the mesh; both a well established ‘greedy’ partitioner and a new, more efficient ‘aligned’ partitioner are investigated. A method is then developed to efficiently arrange the memory within each partition. The technique is compared to a commercial CPU equivalent, demonstrating an overall speedup of at least 100 for a non-destructive testing weld model.

  9. High speed finite element simulations on the graphics card

    International Nuclear Information System (INIS)

    Huthwaite, P.; Lowe, M. J. S.

    2014-01-01

    A software package is developed to perform explicit time domain finite element simulations of ultrasonic propagation on the graphical processing unit, using Nvidia’s CUDA. Of critical importance for this problem is the arrangement of nodes in memory, allowing data to be loaded efficiently and minimising communication between the independently executed blocks of threads. The initial stage of memory arrangement is partitioning the mesh; both a well established ‘greedy’ partitioner and a new, more efficient ‘aligned’ partitioner are investigated. A method is then developed to efficiently arrange the memory within each partition. The technique is compared to a commercial CPU equivalent, demonstrating an overall speedup of at least 100 for a non-destructive testing weld model

  10. Comparative mapping reveals quantitative trait loci that affect spawning time in coho salmon (Oncorhynchus kisutch)

    Directory of Open Access Journals (Sweden)

    Cristian Araneda

    2012-01-01

    Full Text Available Spawning time in salmonids is a sex-limited quantitative trait that can be modified by selection. In rainbow trout (Oncorhynchus mykiss), various quantitative trait loci (QTL) that affect the expression of this trait have been discovered. In this study, we describe four microsatellite loci associated with two possible spawning time QTL regions in coho salmon (Oncorhynchus kisutch). The four loci were identified in females from two populations (early and late spawners) produced by divergent selection from the same base population. Three of the loci (OmyFGT34TUF, One2ASC and One19ASC) that were strongly associated with spawning time in coho salmon (p < 0.0002) were previously associated with QTL for the same trait in rainbow trout; a fourth locus (Oki10) with a suggestive association (p = 0.00035) mapped 10 cM from locus OmyFGT34TUF in rainbow trout. The changes in allelic frequency observed after three generations of selection were greater than expected from genetic drift alone. This work shows that comparing information from closely-related species is a valid strategy for identifying QTLs for marker-assisted selection in species whose genomes are poorly characterized or lack a saturated genetic map.

  11. The effect of charging time on the comparative environmental performance of different vehicle types

    International Nuclear Information System (INIS)

    Crossin, Enda; Doherty, Peter J.B.

    2016-01-01

    Highlights: • The environmental performance of a PHEV and equivalent ICE were analysed using LCA. • Charging behaviour and electricity profiles of Australia’s NEM grid were included. • A methodology to model the marginal electricity supply mix was developed. • PHEVs charged from the NEM present greenhouse gas benefits. • Burden shifts towards other environmental indicators may occur, but are uncertain. - Abstract: This study combines electricity supply mix profiles and observed charging behaviour to compare the environmental performance of a plug-in hybrid electric vehicle (PHEV) with a class-equivalent internal combustion engine (ICE) vehicle over the full life cycle. Environmental performance is compared using a suite of indicators across the life cycle, accounting for both marginal and average electricity supply mixes for Australia’s National Energy Market (NEM) grid. The use of average emission factors for the NEM grid can serve as a good proxy for accounting for charging behaviour, provided that there is a strong correlation between the time of charging and total electricity demand. Compared with an equivalent ICE, PHEVs charged from Australia’s NEM can reduce greenhouse gas emissions over the life cycle. Potential burden shifts towards acidification, eutrophication and human toxicity impacts may occur, but these impacts are uncertain due to modelling limitations. This study has the potential to inform both short and long term forecasts of the environmental impacts associated with EV deployment in Australia and provides a better understanding of the temporal variations in emissions associated with electricity use in the short term.

  12. MUMAX: A new high-performance micromagnetic simulation tool

    International Nuclear Information System (INIS)

    Vansteenkiste, A.; Van de Wiele, B.

    2011-01-01

    We present MUMAX, a general-purpose micromagnetic simulation tool running on graphical processing units (GPUs). MUMAX is designed for high-performance computations and specifically targets large simulations. In that case speedups of over a factor of 100 can be obtained compared to the CPU-based OOMMF program developed at NIST. MUMAX aims to be general and broadly applicable. It solves the classical Landau-Lifshitz equation taking into account the magnetostatic, exchange and anisotropy interactions, thermal effects and spin-transfer torque. Periodic boundary conditions can optionally be imposed. A spatial discretization using finite differences in two or three dimensions can be employed. MUMAX is publicly available as open-source software. It can thus be freely used and extended by the community. Due to its high computational performance, MUMAX should open up the possibility of running extensive simulations that would be nearly inaccessible with typical CPU-based simulators. - Highlights: → Novel, open-source micromagnetic simulator on GPU hardware. → Speedup of ~100× compared to other widely used tools. → Extensively validated against standard problems. → Makes previously infeasible simulations accessible.

  13. Use of general purpose graphics processing units with MODFLOW

    Science.gov (United States)

    Hughes, Joseph D.; White, Jeremy T.

    2013-01-01

    To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
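
    As a rough, CPU-only illustration of the kind of preconditioned conjugate gradient iteration described (here with the Jacobi preconditioner and a CSR matrix), the sketch below uses NumPy/SciPy; it is not the UPCG solver itself, and a GPGPU version would offload the same sparse matrix-vector and vector operations to the device.

    ```python
    import numpy as np
    import scipy.sparse as sp

    def jacobi_pcg(A, b, tol=1e-8, max_iter=500):
        """Jacobi-preconditioned conjugate gradient for a symmetric positive-definite
        CSR matrix A. Mirrors the SpMV/vector kernel structure a GPU solver would
        offload; this sketch stays on the CPU."""
        x = np.zeros_like(b)
        inv_diag = 1.0 / A.diagonal()          # Jacobi preconditioner M^-1
        r = b - A @ x
        z = inv_diag * r
        p = z.copy()
        rz = r @ z
        for _ in range(max_iter):
            Ap = A @ p                         # sparse matrix-vector product
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) < tol * np.linalg.norm(b):
                break
            z = inv_diag * r
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x

    # Example: 1D Poisson-like system.
    n = 100
    A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
    b = np.ones(n)
    x = jacobi_pcg(A, b)
    print(np.linalg.norm(A @ x - b))
    ```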

  14. Efficient sequential and parallel algorithms for record linkage.

    Science.gov (United States)

    Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar

    2014-01-01

    Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find the connected components. Our sequential and parallel algorithms have been tested on a real dataset of 1,083,878 records and synthetic datasets ranging in size from 50,000 to 9,000,000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm.
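
    A toy Python sketch of the two ideas named in the abstract — sorting on key attributes to drop exact duplicates, then linking similar records in a graph and taking its connected components (here with union-find) — is given below. The record layout and the similarity predicate are placeholders, and the all-pairs comparison is only workable for small inputs; the paper's algorithms are designed to avoid exactly that cost.

    ```python
    from itertools import combinations

    def dedup_and_cluster(records, similar):
        """records: list of (name, dob) tuples; similar: pairwise predicate.
        Step 1: sort the records so exact duplicates collapse in one pass
        (a stand-in for the radix sort described in the paper).
        Step 2: build a similarity graph and return its connected components
        via union-find."""
        records = sorted(set(records))                 # collapse exact duplicates

        parent = list(range(len(records)))
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]          # path halving
                i = parent[i]
            return i
        def union(i, j):
            parent[find(i)] = find(j)

        for i, j in combinations(range(len(records)), 2):
            if similar(records[i], records[j]):
                union(i, j)

        clusters = {}
        for i, rec in enumerate(records):
            clusters.setdefault(find(i), []).append(rec)
        return list(clusters.values())

    # Toy similarity: same date of birth and names sharing a 4-letter prefix.
    sim = lambda a, b: a[1] == b[1] and a[0][:4].lower() == b[0][:4].lower()
    data = [("Smith, John", "1970-01-01"), ("Smithe, John", "1970-01-01"),
            ("Smith, John", "1970-01-01"), ("Doe, Jane", "1980-05-05")]
    print(dedup_and_cluster(data, sim))
    ```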

  15. A GPU-accelerated semi-implicit fractional-step method for numerical solutions of incompressible Navier-Stokes equations

    Science.gov (United States)

    Ha, Sanghyun; Park, Junshin; You, Donghyun

    2018-01-01

    Utility of the computational power of Graphics Processing Units (GPUs) is elaborated for solutions of incompressible Navier-Stokes equations which are integrated using a semi-implicit fractional-step method. The Alternating Direction Implicit (ADI) and the Fourier-transform-based direct solution methods used in the semi-implicit fractional-step method take advantage of multiple tridiagonal matrices whose inversion is known as the major bottleneck for acceleration on a typical multi-core machine. A novel implementation of the semi-implicit fractional-step method designed for GPU acceleration of the incompressible Navier-Stokes equations is presented. Aspects of the programing model of Compute Unified Device Architecture (CUDA), which are critical to the bandwidth-bound nature of the present method are discussed in detail. A data layout for efficient use of CUDA libraries is proposed for acceleration of tridiagonal matrix inversion and fast Fourier transform. OpenMP is employed for concurrent collection of turbulence statistics on a CPU while the Navier-Stokes equations are computed on a GPU. Performance of the present method using CUDA is assessed by comparing the speed of solving three tridiagonal matrices using ADI with the speed of solving one heptadiagonal matrix using a conjugate gradient method. An overall speedup of 20 times is achieved using a Tesla K40 GPU in comparison with a single-core Xeon E5-2660 v3 CPU in simulations of turbulent boundary-layer flow over a flat plate conducted on over 134 million grids. Enhanced performance of 48 times speedup is reached for the same problem using a Tesla P100 GPU.
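
    The GPU implementation in the paper relies on CUDA libraries for the tridiagonal solves and FFTs; as a plain NumPy illustration of the batched tridiagonal structure that the ADI sweeps expose, a vectorized Thomas algorithm might look like the following sketch (array shapes and names are assumptions).

    ```python
    import numpy as np

    def thomas_batched(a, b, c, d):
        """Solve many independent tridiagonal systems at once.
        a, b, c: sub-, main- and super-diagonals, shape (batch, n)
        (a[:, 0] and c[:, -1] are unused). d: right-hand sides, shape (batch, n)."""
        b = b.copy(); d = d.copy()
        n = b.shape[1]
        for i in range(1, n):                       # forward elimination
            w = a[:, i] / b[:, i - 1]
            b[:, i] -= w * c[:, i - 1]
            d[:, i] -= w * d[:, i - 1]
        x = np.empty_like(d)
        x[:, -1] = d[:, -1] / b[:, -1]
        for i in range(n - 2, -1, -1):              # back substitution
            x[:, i] = (d[:, i] - c[:, i] * x[:, i + 1]) / b[:, i]
        return x

    # Example: 4 copies of a small system, checked against numpy.linalg.solve.
    n, batch = 8, 4
    a = np.full((batch, n), -1.0); b = np.full((batch, n), 2.0); c = np.full((batch, n), -1.0)
    d = np.random.rand(batch, n)
    A = (np.diag(np.full(n, 2.0)) + np.diag(np.full(n - 1, -1.0), 1)
         + np.diag(np.full(n - 1, -1.0), -1))
    print(np.allclose(thomas_batched(a, b, c, d), np.linalg.solve(A, d.T).T))
    ```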

  16. Stochastic coalescence in finite systems: an algorithm for the numerical solution of the multivariate master equation.

    Science.gov (United States)

    Alfonso, Lester; Zamora, Jose; Cruz, Pedro

    2015-04-01

    The stochastic approach to coagulation considers the coalescence process going on in a system of a finite number of particles enclosed in a finite volume. Within this approach, the full description of the system can be obtained from the solution of the multivariate master equation, which models the evolution of the probability distribution of the state vector for the number of particles of a given mass. Unfortunately, due to its complexity, only limited results were obtained for certain types of kernels and monodisperse initial conditions. In this work, a novel numerical algorithm for the solution of the multivariate master equation for stochastic coalescence that works for any type of kernel and initial condition is introduced. The performance of the method was checked by comparing the numerically calculated particle mass spectrum with analytical solutions obtained for the constant and sum kernels, with an excellent correspondence between the analytical and numerical solutions. To speed up the algorithm, it was parallelized with the OpenMP standard, along with an implementation designed to take advantage of new accelerator technologies. Simulation results show an important speedup of the parallelized algorithms. This study was funded by a grant from Consejo Nacional de Ciencia y Tecnologia de Mexico SEP-CONACYT CB-131879. The authors also thank LUFAC® Computacion SA de CV for CPU time and all the support provided.

  17. A comparative analysis of spectral exponent estimation techniques for 1/f^β processes with applications to the analysis of stride interval time series.

    Science.gov (United States)

    Schaefer, Alexander; Brach, Jennifer S; Perera, Subashan; Sejdić, Ervin

    2014-01-30

    The time evolution and complex interactions of many nonlinear systems, such as in the human body, result in fractal types of parameter outcomes that exhibit self-similarity over long time scales by a power law in the frequency spectrum S(f) = 1/f^β. The scaling exponent β is thus often interpreted as a "biomarker" of relative health and decline. This paper presents a thorough comparative numerical analysis of fractal characterization techniques with specific consideration given to experimentally measured gait stride interval time series. The ideal fractal signals generated in the numerical analysis are constrained under varying lengths and biases indicative of a range of physiologically conceivable fractal signals. This analysis is to complement previous investigations of fractal characteristics in healthy and pathological gait stride interval time series, with which this study is compared. The results of our analysis showed that the averaged wavelet coefficient method consistently yielded the most accurate results. Class dependent methods proved to be unsuitable for physiological time series. Detrended fluctuation analysis, the most prevalent method in the literature, exhibited large estimation variances. The comparative numerical analysis and experimental applications provide a thorough basis for determining an appropriate and robust method for measuring and comparing a physiologically meaningful biomarker, the spectral index β. In consideration of the constraints of application, we note the significant drawbacks of detrended fluctuation analysis and conclude that the averaged wavelet coefficient method can provide reasonable consistency and accuracy for characterizing these fractal time series. Copyright © 2013 Elsevier B.V. All rights reserved.
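
    The study's preferred estimator is the averaged wavelet coefficient method; as a much cruder illustration of spectral-exponent estimation for a 1/f^β series, the sketch below simply fits the slope of the log-log periodogram of a synthetic signal. It is a stand-in for, not a reproduction of, the estimators compared in the paper.

    ```python
    import numpy as np

    def estimate_beta(x, fs=1.0):
        """Estimate the spectral exponent beta of a 1/f**beta series from the
        slope of its log-log periodogram (a rough stand-in for the estimators
        compared in the study)."""
        x = np.asarray(x, dtype=float) - np.mean(x)
        freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)[1:]          # drop f = 0
        power = np.abs(np.fft.rfft(x))[1:] ** 2
        slope, _ = np.polyfit(np.log(freqs), np.log(power), 1)
        return -slope                                            # S(f) ~ f**(-beta)

    # Quick check on synthetic 1/f**beta noise built by shaping white noise.
    rng = np.random.default_rng(0)
    n, beta_true = 4096, 0.8
    f = np.fft.rfftfreq(n)
    spectrum = rng.normal(size=f.size) + 1j * rng.normal(size=f.size)
    spectrum[1:] *= f[1:] ** (-beta_true / 2.0)
    spectrum[0] = 0.0
    series = np.fft.irfft(spectrum, n)
    print(estimate_beta(series))   # should land near 0.8
    ```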

  18. A comparative analysis of spectral exponent estimation techniques for 1/f^β processes with applications to the analysis of stride interval time series

    Science.gov (United States)

    Schaefer, Alexander; Brach, Jennifer S.; Perera, Subashan; Sejdić, Ervin

    2013-01-01

    Background The time evolution and complex interactions of many nonlinear systems, such as in the human body, result in fractal types of parameter outcomes that exhibit self-similarity over long time scales by a power law in the frequency spectrum S(f) = 1/f^β. The scaling exponent β is thus often interpreted as a “biomarker” of relative health and decline. New Method This paper presents a thorough comparative numerical analysis of fractal characterization techniques with specific consideration given to experimentally measured gait stride interval time series. The ideal fractal signals generated in the numerical analysis are constrained under varying lengths and biases indicative of a range of physiologically conceivable fractal signals. This analysis is to complement previous investigations of fractal characteristics in healthy and pathological gait stride interval time series, with which this study is compared. Results The results of our analysis showed that the averaged wavelet coefficient method consistently yielded the most accurate results. Comparison with Existing Methods Class dependent methods proved to be unsuitable for physiological time series. Detrended fluctuation analysis, the most prevalent method in the literature, exhibited large estimation variances. Conclusions The comparative numerical analysis and experimental applications provide a thorough basis for determining an appropriate and robust method for measuring and comparing a physiologically meaningful biomarker, the spectral index β. In consideration of the constraints of application, we note the significant drawbacks of detrended fluctuation analysis and conclude that the averaged wavelet coefficient method can provide reasonable consistency and accuracy for characterizing these fractal time series. PMID:24200509

  19. A comparative study of boar semen extenders with different proposed preservation times and their effect on semen quality and fertility

    Directory of Open Access Journals (Sweden)

    Marina Anastasia Karageorgiou

    2016-01-01

    Full Text Available The present study compared the quality characteristics of boar semen diluted with three extenders with different proposed preservation times (short-term, medium-term and long-term). Part of the extended semen was used for artificial insemination on the farm (30 sows/extender), while the remaining part was stored for three days (16–18 °C). Stored and used semen was also laboratory assessed at insemination time, on days 1 and 2 after the collection (day 0). The long-term extender was used for a short time, within 2 days from semen collection, with the aim of investigating a possible advantage over the others regarding laboratory or farm fertility indicators at the beginning of the preservation time. Viability, motility, kinetic indicators, morphology and DNA fragmentation were estimated. The results showed reduced viability, higher values for most of the kinetic indicators, and more immotile spermatozoa from day 1 to day 2 in all extenders; however, the long-term extender was superior to the other two on both days. With regard to morphology and chromatin integrity, the percentage of abnormal and fragmented spermatozoa increased on day 2 compared to day 1 for all of the extenders. However, based on the farrowing rate and the number of piglets born alive after the application of conventional artificial insemination within 2 days from semen collection/dilution, it was found that the medium-term diluents were more effective. In conclusion, it seems that the in vivo fertilization process involves more factors than simply the quality of laboratory evaluated sperm indicators, warranting further research.

  20. Monte Carlo method for neutron transport calculations in graphics processing units (GPUs)

    International Nuclear Information System (INIS)

    Pellegrino, Esteban

    2011-01-01

    Monte Carlo simulation is well suited for solving the Boltzmann neutron transport equation in inhomogeneous media and for complicated geometries. However, routine applications require the computation time to be reduced to hours and even minutes on a desktop PC. The interest in adopting Graphics Processing Units (GPUs) for Monte Carlo acceleration is rapidly growing. This is due to the massive parallelism provided by the latest GPU technologies, which is the most promising solution to the challenge of performing full-size reactor core analysis on a routine basis. In this study, Monte Carlo codes for a fixed-source neutron transport problem were developed for GPU environments in order to evaluate issues associated with computational speedup using GPUs. Results obtained in this work suggest that a speedup of several orders of magnitude is possible using state-of-the-art GPU technologies. (author)
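
    The record does not describe the transport kernel itself; the NumPy sketch below is only a minimal, vectorized analogue of a fixed-source history loop — mono-energetic neutrons in a homogeneous 1D slab with isotropic scattering and capture — written in the one-history-per-lane, data-parallel style that maps naturally onto GPU threads. The cross sections are made-up numbers.

    ```python
    import numpy as np

    def slab_transmission(n_hist, thickness, sigma_t, sigma_s, rng):
        """Track n_hist neutron histories through a homogeneous slab (cm) and
        return the transmitted fraction. All histories are advanced in lock-step
        with array operations, the pattern a GPU kernel would exploit."""
        x = np.zeros(n_hist)                  # positions
        mu = np.ones(n_hist)                  # direction cosines (beam source)
        alive = np.ones(n_hist, dtype=bool)
        transmitted = np.zeros(n_hist, dtype=bool)
        while alive.any():
            n = alive.sum()
            # sample exponential flight distances and move along mu
            x[alive] += mu[alive] * (-np.log(1.0 - rng.random(n)) / sigma_t)
            leaked_right = alive & (x > thickness)
            leaked_left = alive & (x < 0.0)
            transmitted |= leaked_right
            alive &= ~(leaked_right | leaked_left)
            # collision: scatter with probability sigma_s/sigma_t, else capture
            idx = np.flatnonzero(alive)
            absorbed = rng.random(idx.size) > sigma_s / sigma_t
            alive[idx[absorbed]] = False
            mu[idx[~absorbed]] = rng.uniform(-1.0, 1.0, (~absorbed).sum())  # isotropic
        return transmitted.mean()

    rng = np.random.default_rng(1)
    print(slab_transmission(100_000, thickness=5.0, sigma_t=1.0, sigma_s=0.5, rng=rng))
    ```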

  1. Reconstruction of Computerized Tomography Images on a Cell Broadband Engine using Ray based Interpolation

    DEFF Research Database (Denmark)

    Jørgensen, M. E.; Vinter, Brian

    2009-01-01

    The implementation is tested on a Playstation 3 with 256MB memory and six available SPEs. The system runs Fedora Core 7 as its OS. Tested on 1, 2, 3, 4 and 6 SPEs, the result is a parallel algorithm with near linear speedup. The results of the parallel algorithm are compared to a sequential implementation, executed...

  2. Curvature Analysis of Cardiac Excitation Wavefronts

    Science.gov (United States)

    2013-04-01

    reconstructed the isolines of 10,000 frames, both with PIRA and the contour function of MATLAB. The first took 1.65 seconds while the second took 720...seconds. Hence, PIRA had a 444.44-fold speedup compared to contour. The platform was equipped with Intel Core i7-930 2.8 GHz LGA 1,366 130W Quad-Core

  3. Journal of Chemical Sciences | Indian Academy of Sciences

    Indian Academy of Sciences (India)

    Therefore, the speed-up compared to the conventional EOM-MP2 method is more prominent in the EA, EE and SF cases, where the storage bottleneck is significant, than for the EOM-IP-MP2 method, where the storage requirement is significantly smaller. However, the RI/CD based EOM-IP-MP2 can be coupled with frozen natural ...

  4. Physical Activity and Sedentary Time among Young Children in Full-Day Kindergarten: Comparing Traditional and Balanced Day Schedules

    Science.gov (United States)

    Vanderloo, Leigh M.; Tucker, Patricia

    2017-01-01

    Objective: To compare physical activity and sedentary time among young children whose schools adhere to traditional (i.e. three outdoor playtimes = 70 minutes) versus balanced day (i.e. two outdoor playtimes = ~55 minutes) schedules in Ontario full-day kindergarten classrooms. Design: The project was part of a larger, 2-year cross-sectional study.…

  5. Waiting times for diagnosis and treatment of head and neck cancer in Denmark in 2010 compared to 1992 and 2002

    DEFF Research Database (Denmark)

    Lyhne, N M; Christensen, A; Alanin, M C

    2013-01-01

    BACKGROUND AND AIM: Significant tumour progression was observed during waiting time for treatment of head and neck cancer. To reduce waiting times, a Danish national policy of fast track accelerated clinical pathways was introduced in 2007. This study describes changes in waiting time and the potential influence of fast track by comparing waiting times in 2010 to 2002 and 1992. METHODS: Charts of all new patients diagnosed with squamous cell carcinoma of the oral cavity, pharynx and larynx at the five Danish head and neck oncology centres from January to April 2010 (n=253) were reviewed...

  6. Comparative Analysis of Several Real-Time Systems for Tracking People and/or Moving Objects using GPS

    OpenAIRE

    Radinski, Gligorcho; Mileva, Aleksandra

    2015-01-01

    When we talk about real-time systems for tracking people and/or moving objects using the Global Positioning System (GPS), there are several categories of such systems and different ways in which they work. Some use additional hardware to extend their functionality, some are free, and some are too complex and too expensive. This paper aims to provide a clearer picture of several such systems and to show results from a comparative analysis of some popular systems for trac...

  7. Comparative Time Course Profiles of Phthalate Stereoisomers in Mice

    Science.gov (United States)

    ABSTRACT More efficient models are needed to assess potential carcinogenicity hazard of environmental chemicals. Here we evaluated time course profiles for two reference phthalates, di(2-ethylhexyl) phthalate (DEHP) and its stereoisomer di-n-octyl phthalate (DNOP), to identify...

  8. Optimization Strategies for Hardware-Based Cofactorization

    Science.gov (United States)

    Loebenberger, Daniel; Putzka, Jens

    We use the specific structure of the inputs to the cofactorization step in the general number field sieve (GNFS) in order to optimize the runtime for the cofactorization step on a hardware cluster. An optimal distribution of bitlength-specific ECM modules is proposed and compared to existing ones. With our optimizations we obtain a speedup between 17% and 33% of the cofactorization step of the GNFS when compared to the runtime of an unoptimized cluster.

  9. Application of the reduction of scale range in a Lorentz boosted frame to the numerical simulation of particle acceleration devices

    International Nuclear Information System (INIS)

    Vay, J.; Fawley, W.M.; Geddes, C.G.; Cormier-Michel, E.; Grote, D.P.

    2009-01-01

    It has been shown that the ratio of longest to shortest space and time scales of a system of two or more components crossing at relativistic velocities is not invariant under Lorentz transformation. This implies the existence of a frame of reference minimizing an aggregate measure of the ratio of space and time scales. It was demonstrated that this translated into a reduction by orders of magnitude in computer simulation run times, using methods based on first principles (e.g., Particle-In-Cell), for particle acceleration devices and for problems such as free electron lasers, laser-plasma accelerators, and particle beams interacting with electron clouds. Since then, speed-ups ranging from 75 to more than four orders of magnitude have been reported for the simulation of either scaled or reduced models of the above-cited problems. It was also shown that to achieve the full benefits of the calculation in a boosted frame, some of the standard numerical techniques needed to be revised. The theory behind the speed-up of numerical simulation in a boosted frame, the latest developments of numerical methods, and example applications with the new opportunities that they offer are all presented.

  10. Space Object Collision Probability via Monte Carlo on the Graphics Processing Unit

    Science.gov (United States)

    Vittaldev, Vivek; Russell, Ryan P.

    2017-09-01

    Fast and accurate collision probability computations are essential for protecting space assets. Monte Carlo (MC) simulation is the most accurate but computationally intensive method. A Graphics Processing Unit (GPU) is used to parallelize the computation and reduce the overall runtime. Using MC techniques to compute the collision probability is common in literature as the benchmark. An optimized implementation on the GPU, however, is a challenging problem and is the main focus of the current work. The MC simulation takes samples from the uncertainty distributions of the Resident Space Objects (RSOs) at any time during a time window of interest and outputs the separations at closest approach. Therefore, any uncertainty propagation method may be used and the collision probability is automatically computed as a function of RSO collision radii. Integration using a fixed time step and a quartic interpolation after every Runge Kutta step ensures that no close approaches are missed. Two orders of magnitude speedups over a serial CPU implementation are shown, and speedups improve moderately with higher fidelity dynamics. The tool makes the MC approach tractable on a single workstation, and can be used as a final product, or for verifying surrogate and analytical collision probability methods.
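
    As a serial caricature of the Monte Carlo step described — sample the objects' position uncertainties and count how often the miss distance falls below the combined collision radius — the NumPy sketch below uses Gaussian position errors at a single closest-approach epoch in place of the full trajectory propagation performed by the tool. All numbers are illustrative.

    ```python
    import numpy as np

    def mc_collision_probability(mean_sep, cov1, cov2, combined_radius, n_samples, rng):
        """Monte Carlo estimate of collision probability at closest approach.
        mean_sep: nominal relative position (3,); cov1/cov2: 3x3 position
        covariances of the two objects; combined_radius: sum of collision radii."""
        rel_cov = cov1 + cov2                       # covariance of the relative position
        samples = rng.multivariate_normal(mean_sep, rel_cov, size=n_samples)
        miss = np.linalg.norm(samples, axis=1)
        return np.mean(miss < combined_radius)

    rng = np.random.default_rng(42)
    p = mc_collision_probability(
        mean_sep=np.array([120.0, 40.0, 0.0]),       # metres
        cov1=np.diag([2500.0, 40000.0, 400.0]),      # (50 m, 200 m, 20 m) 1-sigma
        cov2=np.diag([900.0, 10000.0, 100.0]),       # (30 m, 100 m, 10 m) 1-sigma
        combined_radius=20.0,
        n_samples=1_000_000,
        rng=rng,
    )
    print(p)
    ```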

  11. gSLICr: SLIC superpixels at over 250Hz

    OpenAIRE

    Ren, Carl Yuheng; Prisacariu, Victor Adrian; Reid, Ian D

    2015-01-01

    We introduce a parallel GPU implementation of the Simple Linear Iterative Clustering (SLIC) superpixel segmentation. Using a single graphics card, our implementation achieves speedups of up to 83× over the standard sequential implementation. Our implementation is fully compatible with the standard sequential implementation and the software is now available online and is open source.

  12. Multicore and GPU algorithms for Nussinov RNA folding

    Science.gov (United States)

    2014-01-01

    Background One segment of an RNA sequence might be paired with another segment of the same RNA sequence due to the force of hydrogen bonds. This two-dimensional structure is called the RNA sequence's secondary structure. Several algorithms have been proposed to predict an RNA sequence's secondary structure. These algorithms are referred to as RNA folding algorithms. Results We develop cache efficient, multicore, and GPU algorithms for RNA folding using Nussinov's algorithm. Conclusions Our cache efficient algorithm provides a speedup between 1.6 and 3.0 relative to a naive straightforward single core code. The multicore version of the cache efficient single core algorithm provides a speedup, relative to the naive single core algorithm, between 7.5 and 14.0 on a 6 core hyperthreaded CPU. Our GPU algorithm for the NVIDIA C2050 is up to 1582 times as fast as the naive single core algorithm and between 5.1 and 11.2 times as fast as the fastest previously known GPU algorithm for Nussinov RNA folding. PMID:25082539
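
    Nussinov's recurrence itself is standard; the single-core Python version below is the O(n^3) baseline that the cache-efficient, multicore and GPU variants in the paper accelerate (the minimum hairpin-loop length is an illustrative parameter, not taken from the paper).

    ```python
    def nussinov(seq, min_loop=1):
        """Maximum number of base pairs for an RNA sequence via Nussinov's
        O(n^3) dynamic program. min_loop is the minimum hairpin loop length."""
        pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
        n = len(seq)
        dp = [[0] * n for _ in range(n)]
        for span in range(min_loop + 1, n):            # fill by increasing span
            for i in range(n - span):
                j = i + span
                best = dp[i + 1][j]                    # i left unpaired
                if (seq[i], seq[j]) in pairs:          # i pairs with j
                    best = max(best, dp[i + 1][j - 1] + 1)
                for k in range(i + 1, j):              # bifurcation
                    best = max(best, dp[i][k] + dp[k + 1][j])
                dp[i][j] = best
        return dp[0][n - 1]

    print(nussinov("GGGAAAUCC"))   # small example; expect 3 pairs
    ```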

  13. CUDA-Accelerated Geodesic Ray-Tracing for Fiber Tracking

    Directory of Open Access Journals (Sweden)

    Evert van Aart

    2011-01-01

    Full Text Available Diffusion Tensor Imaging (DTI allows to noninvasively measure the diffusion of water in fibrous tissue. By reconstructing the fibers from DTI data using a fiber-tracking algorithm, we can deduce the structure of the tissue. In this paper, we outline an approach to accelerating such a fiber-tracking algorithm using a Graphics Processing Unit (GPU. This algorithm, which is based on the calculation of geodesics, has shown promising results for both synthetic and real data, but is limited in its applicability by its high computational requirements. We present a solution which uses the parallelism offered by modern GPUs, in combination with the CUDA platform by NVIDIA, to significantly reduce the execution time of the fiber-tracking algorithm. Compared to a multithreaded CPU implementation of the same algorithm, our GPU mapping achieves a speedup factor of up to 40 times.

  14. Leisure activities are linked to mental health benefits by providing time structure: comparing employed, unemployed and homemakers.

    Science.gov (United States)

    Goodman, William K; Geiger, Ashley M; Wolf, Jutta M

    2017-01-01

    Unemployment has consistently been linked to negative mental health outcomes, emphasising the need to characterise the underlying mechanisms. The current study aimed at testing whether compared with other employment groups, fewer leisure activities observed in unemployment may contribute to elevated risk for negative mental health via loss of time structure. Depressive symptoms (Center for Epidemiologic Studies Depression), leisure activities (exercise, self-focused, social), and time structure (Time Structure Questionnaire (TSQ)) were assessed cross-sectionally in 406 participants (unemployed=155, employed=140, homemakers=111) recruited through Amazon Mechanical Turk. Controlling for gender and age, structural equation modelling revealed time structure partially (employed, homemakers) and fully (unemployed) mediated the relationship between leisure activities and depressive symptoms. With the exception of differential effects for structured routines, all other TSQ factors (sense of purpose, present orientation, effective organisation and persistence) contributed significantly to all models. These findings support the idea that especially for the unemployed, leisure activities impose their mental health benefits through increasing individuals' perception of spending their time effectively. Social leisure activities that provide a sense of daily structure may thereby be a particularly promising low-cost intervention to improve mental health in this population. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  15. GPU-accelerated adjoint algorithmic differentiation

    Science.gov (United States)

    Gremse, Felix; Höfter, Andreas; Razik, Lukas; Kiessling, Fabian; Naumann, Uwe

    2016-03-01

    Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the "tape". Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
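
    The AAD software itself is not shown in the record; the minimal scalar reverse-mode tape below only illustrates what the "tape" refers to — the forward pass records each operation's local partial derivatives, and the reverse pass propagates adjoints through them in reverse topological order. The vectorized approach in the paper tapes whole matrix/vector operations instead of scalars.

    ```python
    class Var:
        """Scalar node on a reverse-mode AD tape: value, adjoint, and the local
        partial derivatives with respect to its parents."""
        def __init__(self, value, parents=()):
            self.value = value
            self.adjoint = 0.0
            self.parents = tuple(parents)      # (parent Var, local partial) pairs

        def __add__(self, other):
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            return Var(self.value * other.value,
                       [(self, other.value), (other, self.value)])

    def backward(output):
        """Propagate adjoints in reverse topological order of the tape."""
        order, seen = [], set()
        def topo(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    topo(parent)
                order.append(node)
        topo(output)
        output.adjoint = 1.0
        for node in reversed(order):           # each node's adjoint is final here
            for parent, partial in node.parents:
                parent.adjoint += partial * node.adjoint

    # f(x, y) = (x + y) * x  ->  df/dx = 2x + y = 8, df/dy = x = 3
    x, y = Var(3.0), Var(2.0)
    f = (x + y) * x
    backward(f)
    print(f.value, x.adjoint, y.adjoint)       # 15.0 8.0 3.0
    ```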

  16. 3D Seismic Imaging through Reverse-Time Migration on Homogeneous and Heterogeneous Multi-Core Processors

    Directory of Open Access Journals (Sweden)

    Mauricio Araya-Polo

    2009-01-01

    Full Text Available Reverse-Time Migration (RTM) is a state-of-the-art technique in seismic acoustic imaging, because of the quality and integrity of the images it provides. Oil and gas companies trust RTM with crucial decisions on multi-million-dollar drilling investments. But RTM requires vastly more computational power than its predecessor techniques, and this has somewhat hindered its practical success. On the other hand, although multi-core architectures promise to deliver unprecedented computational power, little attention has been devoted to mapping RTM efficiently onto multi-cores. In this paper, we present a mapping of the RTM computational kernel to the IBM Cell/B.E. processor that reaches close-to-optimal performance. The kernel proves to be memory-bound and it achieves a 98% utilization of the peak memory bandwidth. Our Cell/B.E. implementation outperforms a traditional processor (PowerPC 970MP) in terms of performance (with a 15.0× speedup) and energy-efficiency (with a 10.0× increase in the GFlops/W delivered). Also, it is the fastest RTM implementation available to the best of our knowledge. These results increase the practical usability of RTM. Also, the RTM-Cell/B.E. combination proves to be a strong competitor in the seismic arena.
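
    The Cell/B.E. mapping is not reproduced here; as an illustration of the memory-bound stencil at the heart of RTM, the NumPy sketch below advances a second-order 2D constant-density acoustic wavefield by one time step. Production RTM codes add higher-order stencils, absorbing boundaries and an imaging condition on top of this kernel; the grid and velocity values are illustrative.

    ```python
    import numpy as np

    def acoustic_step(p_prev, p_curr, vel, dt, dx):
        """One explicit time step of the 2D constant-density acoustic wave equation
        p_tt = v^2 * laplacian(p), second order in time and space. This is the
        memory-bound stencil that forward and reverse RTM propagation repeat."""
        lap = (np.roll(p_curr, 1, 0) + np.roll(p_curr, -1, 0) +
               np.roll(p_curr, 1, 1) + np.roll(p_curr, -1, 1) - 4.0 * p_curr) / dx**2
        p_next = 2.0 * p_curr - p_prev + (vel * dt) ** 2 * lap
        p_next[0, :] = p_next[-1, :] = p_next[:, 0] = p_next[:, -1] = 0.0  # crude edges
        return p_next

    # Tiny demo: point source in a homogeneous 2 km/s medium.
    n, dx, dt = 201, 10.0, 0.002           # grid points, metres, seconds (v*dt/dx = 0.4, stable)
    vel = np.full((n, n), 2000.0)
    p_prev = np.zeros((n, n))
    p_curr = np.zeros((n, n))
    p_curr[n // 2, n // 2] = 1.0           # impulsive source at the centre
    for _ in range(300):
        p_prev, p_curr = p_curr, acoustic_step(p_prev, p_curr, vel, dt, dx)
    print(float(np.abs(p_curr).max()))
    ```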

  17. Conjugate-Gradient Algorithms For Dynamics Of Manipulators

    Science.gov (United States)

    Fijany, Amir; Scheid, Robert E.

    1993-01-01

    Algorithms for serial and parallel computation of forward dynamics of multiple-link robotic manipulators by conjugate-gradient method developed. Parallel algorithms have potential for speedup of computations on multiple linked, specialized processors implemented in very-large-scale integrated circuits. Such processors used to simulate dynamics, possibly faster than in real time, for purposes of planning and control.

  18. Comparative study of the requantization of the time-dependent mean field for the dynamics of nuclear pairing

    Science.gov (United States)

    Ni, Fang; Nakatsukasa, Takashi

    2018-04-01

    To describe quantal collective phenomena, it is useful to requantize the time-dependent mean-field dynamics. We study the time-dependent Hartree-Fock-Bogoliubov (TDHFB) theory for the two-level pairing Hamiltonian, and compare results of different quantization methods. The one constructing microscopic wave functions, using the TDHFB trajectories fulfilling the Einstein-Brillouin-Keller quantization condition, turns out to be the most accurate. The method is based on the stationary-phase approximation to the path integral. We also examine the performance of the collective model which assumes that the pairing gap parameter is the collective coordinate. The applicability of the collective model is limited for the nuclear pairing with a small number of single-particle levels, because the pairing gap parameter represents only a half of the pairing collective space.

  19. GPU accelerated simulations of 3D deterministic particle transport using discrete ordinates method

    International Nuclear Information System (INIS)

    Gong Chunye; Liu Jie; Chi Lihua; Huang Haowei; Fang Jingyue; Gong Zhenghu

    2011-01-01

    Graphics Processing Unit (GPU), originally developed for real-time, high-definition 3D graphics in computer games, now provides great faculty in solving scientific applications. The basis of particle transport simulation is the time-dependent, multi-group, inhomogeneous Boltzmann transport equation. The numerical solution to the Boltzmann equation involves the discrete ordinates (Sn) method and the procedure of source iteration. In this paper, we present a GPU accelerated simulation of one energy group time-independent deterministic discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The performance of the GPU simulations is reported for simulations with a vacuum boundary condition. The discussion of the relative advantages and disadvantages of the GPU implementation, the simulation on multi GPUs, the programming effort and code portability are also reported. The results show that the overall performance speedup of one NVIDIA Tesla M2050 GPU ranges from 2.56 compared with one Intel Xeon X5670 chip to 8.14 compared with one Intel Core Q6600 chip for no flux fixup. The simulation with flux fixup on one M2050 is 1.23 times faster than on one X5670.

  20. GPU accelerated simulations of 3D deterministic particle transport using discrete ordinates method

    Science.gov (United States)

    Gong, Chunye; Liu, Jie; Chi, Lihua; Huang, Haowei; Fang, Jingyue; Gong, Zhenghu

    2011-07-01

    Graphics Processing Unit (GPU), originally developed for real-time, high-definition 3D graphics in computer games, now provides great faculty in solving scientific applications. The basis of particle transport simulation is the time-dependent, multi-group, inhomogeneous Boltzmann transport equation. The numerical solution to the Boltzmann equation involves the discrete ordinates (Sn) method and the procedure of source iteration. In this paper, we present a GPU accelerated simulation of one energy group time-independent deterministic discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The performance of the GPU simulations is reported for simulations with a vacuum boundary condition. The discussion of the relative advantages and disadvantages of the GPU implementation, the simulation on multi GPUs, the programming effort and code portability are also reported. The results show that the overall performance speedup of one NVIDIA Tesla M2050 GPU ranges from 2.56 compared with one Intel Xeon X5670 chip to 8.14 compared with one Intel Core Q6600 chip for no flux fixup. The simulation with flux fixup on one M2050 is 1.23 times faster than on one X5670.

  1. A facile method to compare EFTEM maps obtained from materials changing composition over time

    KAUST Repository

    Casu, Alberto

    2015-10-31

    Energy Filtered Transmission Electron Microscopy (EFTEM) is an analytical tool that has been successfully and widely employed in the last two decades for obtaining fast elemental maps in TEM mode. Several studies and efforts have been addressed to investigate the limitations and advantages of this technique, as well as to improve the spatial resolution of compositional maps. Usually, EFTEM maps undergo post-acquisition treatments by changing brightness and contrast levels, either via dedicated software or via human elaboration, in order to maximize their signal-to-noise ratio and render them as visible as possible. However, elemental maps forming a single set of EFTEM images are usually subjected to independent map-by-map image treatment. This post-acquisition step becomes crucial when analyzing materials that change composition over time as a consequence of an external stimulus, because the map-by-map approach doesn't take into account how the chemical features of the imaged materials actually progress, in particular when the investigated elements exhibit very low signals. In this article, we present a facile procedure applicable to whole sets of EFTEM maps acquired on a sample that is evolving over time. The main aim is to find a common method to treat the image features, in order to make them as comparable as possible without affecting the information contained therein. Microsc. Res. Tech. 78:1090–1097, 2015. © 2015 Wiley Periodicals, Inc.

  2. A facile method to compare EFTEM maps obtained from materials changing composition over time

    KAUST Repository

    Casu, Alberto; Genovese, Alessandro; Di Benedetto, Cristiano; Lentijo Mozo, Sergio; Sogne, Elisa; Zuddas, Efisio; Falqui, Andrea

    2015-01-01

    Energy Filtered Transmission Electron Microscopy (EFTEM) is an analytical tool that has been successfully and widely employed in the last two decades for obtaining fast elemental maps in TEM mode. Several studies and efforts have been addressed to investigate the limitations and advantages of this technique, as well as to improve the spatial resolution of compositional maps. Usually, EFTEM maps undergo post-acquisition treatments by changing brightness and contrast levels, either via dedicated software or via human elaboration, in order to maximize their signal-to-noise ratio and render them as visible as possible. However, elemental maps forming a single set of EFTEM images are usually subjected to independent map-by-map image treatment. This post-acquisition step becomes crucial when analyzing materials that change composition over time as a consequence of an external stimulus, because the map-by-map approach doesn't take into account how the chemical features of the imaged materials actually progress, in particular when the investigated elements exhibit very low signals. In this article, we present a facile procedure applicable to whole sets of EFTEM maps acquired on a sample that is evolving over time. The main aim is to find a common method to treat the image features, in order to make them as comparable as possible without affecting the information contained therein. Microsc. Res. Tech. 78:1090–1097, 2015. © 2015 Wiley Periodicals, Inc.

  3. Comparative evaluation of image quality in computed radiology systems using imaging plates with different usage time

    International Nuclear Information System (INIS)

    Lazzaro, M.V.; Luz, R.M. da; Capaverde, A.S.; Silva, A.M. Marques da

    2015-01-01

    Computed Radiology (CR) systems use imaging plates (IPs) for latent image acquisition. From the standpoint of quality control (QC) of these systems, the usable lifetime of imaging plates is undetermined. Different recommendations and publications on the subject suggest tests to evaluate these systems. The objective of this study is to compare the image quality of IPs of a CR system in a mammography service, considering usage time and the consistency of the assessments. Eight IPs were used, divided into two groups: the first group included 4 IPs with 3 years of use (Group A); the second group consisted of 4 new IPs with no previous exposure (Group B). The tests used to assess IP quality were: Uniformity, Differential Signal to Noise Ratio (SDNR), Ghost Effect and Figure of Merit (FOM). The statistical results show that the proposed tests are effective in assessing the image quality obtained with CR systems in mammography and can be used as determining factors for the replacement of IPs. Moreover, comparing the two sets of IPs, the results led to the replacement of the entire set of IPs with 3 years of use. This work demonstrates the importance of efficient quality control, with regard not only to the quality of the IPs used, but to the acquisition system as a whole. From this work, these tests will be conducted on an annual basis, with future work already targeting the monitoring of the wear of the Group B IPs and the creation of a baseline for analysis and future replacements. (author)

  4. Automating data analysis for two-dimensional gas chromatography/time-of-flight mass spectrometry non-targeted analysis of comparative samples.

    Science.gov (United States)

    Titaley, Ivan A; Ogba, O Maduka; Chibwe, Leah; Hoh, Eunha; Cheong, Paul H-Y; Simonich, Staci L Massey

    2018-03-16

    Non-targeted analysis of environmental samples, using comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC × GC/ToF-MS), poses significant data analysis challenges due to the large number of possible analytes. Non-targeted data analysis of complex mixtures is prone to human bias and is laborious, particularly for comparative environmental samples such as contaminated soil pre- and post-bioremediation. To address this research bottleneck, we developed OCTpy, a Python™ script that acts as a data reduction filter to automate GC × GC/ToF-MS data analysis from LECO ® ChromaTOF ® software and facilitates selection of analytes of interest based on peak area comparison between comparative samples. We used data from polycyclic aromatic hydrocarbon (PAH) contaminated soil, pre- and post-bioremediation, to assess the effectiveness of OCTpy in facilitating the selection of analytes that have formed or degraded following treatment. Using datasets from the soil extracts pre- and post-bioremediation, OCTpy selected, on average, 18% of the initial suggested analytes generated by the LECO ® ChromaTOF ® software Statistical Compare feature. Based on this list, 63-100% of the candidate analytes identified by a highly trained individual were also selected by OCTpy. This process was accomplished in several minutes per sample, whereas manual data analysis took several hours per sample. OCTpy automates the analysis of complex mixtures of comparative samples, reduces the potential for human error during heavy data handling and decreases data analysis time by at least tenfold. Copyright © 2018 Elsevier B.V. All rights reserved.
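
    OCTpy itself is not reproduced here; the sketch below only shows the general shape of such a data-reduction filter: align exported peak tables from the pre- and post-treatment runs by analyte name and keep analytes whose peak area appears, disappears, or changes by more than a chosen ratio. The CSV format, column names and threshold are assumptions, not details from the paper.

    ```python
    import csv

    def select_changed_analytes(pre_csv, post_csv, min_ratio=5.0):
        """Compare peak tables exported as CSV (columns assumed: 'Name', 'Area')
        and return analytes whose area grew or shrank by at least min_ratio
        between the pre- and post-treatment samples, plus analytes seen in only
        one of the two runs."""
        def load(path):
            with open(path, newline="") as fh:
                return {row["Name"]: float(row["Area"]) for row in csv.DictReader(fh)}

        pre, post = load(pre_csv), load(post_csv)
        selected = {}
        for name in set(pre) | set(post):
            a_pre, a_post = pre.get(name, 0.0), post.get(name, 0.0)
            if a_pre == 0.0 or a_post == 0.0:
                selected[name] = "formed" if a_pre == 0.0 else "degraded"
            elif a_post / a_pre >= min_ratio or a_pre / a_post >= min_ratio:
                selected[name] = f"changed {a_post / a_pre:.1f}x"
        return selected

    # Usage (hypothetical file names):
    # print(select_changed_analytes("soil_pre.csv", "soil_post.csv", min_ratio=5.0))
    ```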

  5. Split-time artificial insemination in beef cattle: III. Comparing fixed-time artificial insemination to split-time artificial insemination with delayed administration of GnRH in postpartum cows.

    Science.gov (United States)

    Bishop, B E; Thomas, J M; Abel, J M; Poock, S E; Ellersieck, M R; Smith, M F; Patterson, D J

    2017-09-01

    This experiment was designed to compare pregnancy rates in postpartum beef cows following split-time (STAI) or fixed-time (FTAI) artificial insemination. Estrus was synchronized for 671 cows at seven locations following administration of the 7-d CO-Synch + CIDR protocol (100 μg GnRH + CIDR insert [1.38 g progesterone] on d 0; 25 mg prostaglandin F2α [PG] at CIDR removal on d 7). Cows were assigned to treatments that were balanced across locations based on age, body condition score, and days postpartum at the time treatments were initiated. All cows in treatment 1 (n = 333; FTAI) were inseminated at 66 h after PG, and GnRH was administered concurrent with insemination regardless of estrus expression. For cows in treatment 2 (n = 338; STAI), inseminations were performed at 66 or 90 h after PG, and estrous status was recorded at these times. Cows in the STAI treatment that exhibited estrus by 66 h were inseminated at that time and did not receive GnRH, whereas AI was delayed 24 h until 90 h after PG for cows that failed to exhibit estrus by 66 h. Gonadotropin-releasing hormone (100 μg) was administered concurrent with AI at 90 h only to cows failing to exhibit estrus. Estrus expression during the 24 h delay period among cows assigned to the STAI treatment increased the total proportion of cows that expressed estrus prior to insemination (1 = 60%; 2 = 86%). Pregnancy rates among cows inseminated at 66 h that exhibited estrus did not differ between treatments (1 = 58%; 2 = 58%; P = 0.93); however, pregnancy rates among non-estrous cows at 66 h were improved (1 = 35%; 2 = 51%; P = 0.01) among cows assigned to the STAI treatment when insemination was postponed by 24 h. Consequently, total AI pregnancy rate tended to be higher for cows that received STAI (1 = 49%; 2 = 56%; P = 0.06). In summary, following administration of the 7-d CO-Synch + CIDR protocol, total estrous response increased and pregnancy rates resulting from AI...

  6. Incidence of Appendicitis over Time: A Comparative Analysis of an Administrative Healthcare Database and a Pathology-Proven Appendicitis Registry

    Science.gov (United States)

    Clement, Fiona; Zimmer, Scott; Dixon, Elijah; Ball, Chad G.; Heitman, Steven J.; Swain, Mark; Ghosh, Subrata

    2016-01-01

    Importance At the turn of the 21st century, studies evaluating the change in incidence of appendicitis over time have reported inconsistent findings. Objectives We compared the differences in the incidence of appendicitis derived from a pathology registry versus an administrative database in order to validate coding in administrative databases and establish temporal trends in the incidence of appendicitis. Design We conducted a population-based comparative cohort study to identify all individuals with appendicitis from 2000 to2008. Setting & Participants Two population-based data sources were used to identify cases of appendicitis: 1) a pathology registry (n = 8,822); and 2) a hospital discharge abstract database (n = 10,453). Intervention & Main Outcome The administrative database was compared to the pathology registry for the following a priori analyses: 1) to calculate the positive predictive value (PPV) of administrative codes; 2) to compare the annual incidence of appendicitis; and 3) to assess differences in temporal trends. Temporal trends were assessed using a generalized linear model that assumed a Poisson distribution and reported as an annual percent change (APC) with 95% confidence intervals (CI). Analyses were stratified by perforated and non-perforated appendicitis. Results The administrative database (PPV = 83.0%) overestimated the incidence of appendicitis (100.3 per 100,000) when compared to the pathology registry (84.2 per 100,000). Codes for perforated appendicitis were not reliable (PPV = 52.4%) leading to overestimation in the incidence of perforated appendicitis in the administrative database (34.8 per 100,000) as compared to the pathology registry (19.4 per 100,000). The incidence of appendicitis significantly increased over time in both the administrative database (APC = 2.1%; 95% CI: 1.3, 2.8) and pathology registry (APC = 4.1; 95% CI: 3.1, 5.0). Conclusion & Relevance The administrative database overestimated the incidence of appendicitis

  7. Time-driven activity based costing: a comparative study with the activity based costing

    Directory of Open Access Journals (Sweden)

    Marina Battistella Luna

    2017-06-01

    Full Text Available Activity-based costing (ABC) emerged in the 1980s to meet the new needs for cost information facing companies as a result of continuous changes in the business environment. In the 2000s, a new costing method, known as time-driven activity-based costing (TDABC), was introduced in order to simplify ABC. This paper compares these methods in order to provide information to assist managers in deciding which of them better suits the reality of their companies. Therefore, they were analyzed based on information obtained through a systematic search in the Scopus and Web of Knowledge databases, as well as papers from the annals of the Congresso Brasileiro de Custos, Congresso de Controladoria e Contabilidade da USP and Encontro Nacional de Engenharia de Produção (considering scientific papers published between 2004 and 2016). From this analysis, in most cases it was concluded that TDABC is a simpler and more practical option than ABC. However, it was also apparent that managers, before choosing a particular method, must verify whether the conditions that enable its applicability exist.

  8. A quantum speedup in machine learning: finding an N-bit Boolean function for a classification

    International Nuclear Information System (INIS)

    Yoo, Seokwon; Lee, Jinhyoung; Bang, Jeongho; Lee, Changhyoup

    2014-01-01

    We compare quantum and classical machines designed for learning an N-bit Boolean function in order to address how a quantum system improves the machine learning behavior. The machines of the two types consist of the same number of operations and control parameters, but only the quantum machines utilize the quantum coherence naturally induced by unitary operators. We show that quantum superposition enables quantum learning that is faster than classical learning by expanding the approximate solution regions, i.e., the acceptable regions. This is also demonstrated by means of numerical simulations with a standard feedback model, namely random search, and a practical model, namely differential evolution. (paper)

  9. Using cover, copy, and compare spelling with and without timing for elementary students with behavior disorders

    Directory of Open Access Journals (Sweden)

    Danette Darrow

    2012-04-01

    Full Text Available The purpose of this study was to determine the effectiveness of cover, copy, and compare (CCC) procedures on spelling performance with two students. The participants were two elementary students enrolled in a self-contained behavior intervention classroom. A multiple baseline design across participants was employed to evaluate the effects of CCC on time to completion and words spelled correctly. Improvements in all measures were found when CCC was in effect. The participants enjoyed the procedures and each improved their spelling over baseline performance. The applicability of CCC across academic contexts and for students with behavior disorders was discussed.

  10. Vectorization of nuclear codes on FACOM 230-75 APU computer

    International Nuclear Information System (INIS)

    Harada, Hiroo; Higuchi, Kenji; Ishiguro, Misako; Tsutsui, Tsuneo; Fujii, Minoru

    1983-02-01

    To provide for the future usage of supercomputers, we have investigated the vector processing efficiency of nuclear codes which are being used at JAERI. The investigation was performed using the FACOM 230-75 APU computer. The codes are CITATION (3D neutron diffusion), SAP5 (structural analysis), CASCMARL (irradiation damage simulation), FEM-BABEL (3D neutron diffusion by FEM), GMSCOPE (microscope simulation) and DWBA (cross section calculation for molecular collisions). A new type of cell density calculation for the particle-in-cell method is also investigated. For each code we have obtained a significant speedup, ranging from 1.8 (CASCMARL) to 7.5 (GMSCOPE). We describe in this report the running-time dynamic profile analysis of the codes, the numerical algorithms used, program restructuring for vectorization, numerical experiments on the iterative process, vectorized ratios, speedup ratios on the FACOM 230-75 APU computer, and some views on vectorization. (author)

  11. High spatial and temporal resolution retrospective cine cardiovascular magnetic resonance from shortened free breathing real-time acquisitions.

    Science.gov (United States)

    Xue, Hui; Kellman, Peter; Larocca, Gina; Arai, Andrew E; Hansen, Michael S

    2013-11-14

    scores were given to segmented and retrospective cine series. Volumetric measurements of cardiac function were also compared by manually tracing the myocardium for segmented and retrospective cines. Mean image quality scores were similar for short axis and long axis views for both tested resolutions. Short axis scores were 4.52/4.31 (high/low matrix sizes) for breath-hold vs. 4.54/4.56 for real-time (paired t-test, P = 0.756/0.011). Long axis scores were 4.09/4.37 vs. 3.99/4.29 (P = 0.475/0.463). Mean ejection fraction was 60.8/61.4 for breath-held acquisitions vs. 60.3/60.3 for real-time acquisitions (P = 0.439/0.093). No significant differences were seen in end-diastolic volume (P = 0.460/0.268) but there was a trend towards a small overestimation of end-systolic volume of 2.0/2.5 ml, which did not reach statistical significance (P = 0.052/0.083). Real-time free breathing CMR can be used to obtain high quality retrospectively gated cine images in 16-20s per slice. Volumetric measurements and image quality scores were similar in images from breath-held segmented and free breathing, real-time acquisitions. Further speedup of image reconstruction is still needed.

  12. Fast prototyping H.264 deblocking filter using ESL tools

    International Nuclear Information System (INIS)

    Damak, T.; Werda, I.; Masmoud, N.; Bilavarn, S.

    2011-01-01

    This paper presents a design methodology for hardware/software (HW/SW) architecture design using ESL (Electronic System Level) tools. From C++ descriptions, our design flow is able to generate hardware blocks running with a software part and all the code necessary to prototype the HW/SW system on Xilinx FPGAs. To this end, we use high-level synthesis tools (Catapult C Synthesis), logic synthesis and Xilinx tools. As an application, we developed optimized deblocking filter C code, designed to be used as part of a complete H.264 video coding system [1]. Based on this code, we explored many configurations of Catapult Synthesis to analyze different area/time tradeoffs. Results show execution speedups of 95.5% compared to pure software execution.

  13. A simple two stage optimization algorithm for constrained power economic dispatch

    International Nuclear Information System (INIS)

    Huang, G.; Song, K.

    1994-01-01

    A simple two-stage optimization algorithm is proposed and investigated for fast computation of constrained power economic dispatch control problems. The method is a simple demonstration of the hierarchical aggregation-disaggregation (HAD) concept. The algorithm first solves an aggregated problem to obtain an initial solution. This aggregated problem turns out to be the classical economic dispatch formulation, and it can be solved in 1% of the overall computation time. In the second stage, a linear programming method finds an optimal solution which satisfies power balance constraints, generation and transmission inequality constraints, and security constraints. Implementation of the algorithm for IEEE systems and EPRI Scenario systems shows that the two-stage method obtains an average speedup ratio of 10.64 compared to the classical LP-based method.

  14. Cooling many particles at once

    International Nuclear Information System (INIS)

    Vitiello, G.; Knight, P.; Beige, A.

    2005-01-01

    Full text: We propose a mechanism for the collective cooling of a large number N of trapped particles to very low temperatures by applying red-detuned laser fields and coupling them to the quantized field inside an optical resonator. The dynamics is described by what appear to be rate equations but where some of the major quantities are coherences and not populations. It is shown that the cooperative behaviour of the system provides cooling rates of the same order of magnitude as the cavity decay rate. This constitutes a significant speed-up compared to other cooling mechanisms since this rate can, in principle, be as large as the square root of N times the single-particle cavity or laser coupling constants. (author)

  15. Examination of Speed Contribution of Parallelization for Several Fingerprint Pre-Processing Algorithms

    Directory of Open Access Journals (Sweden)

    GORGUNOGLU, S.

    2014-05-01

    Full Text Available In the analysis of minutiae-based fingerprint systems, fingerprints need to be pre-processed. The pre-processing is carried out to enhance the quality of the fingerprint and to obtain more accurate minutiae points. Reducing the pre-processing time is important for identification and verification in real-time systems, and especially for databases holding information on large numbers of fingerprints. Parallel processing and parallel CPU computing can be considered as the distribution of processes over multi-core processors. This is done by using parallel programming techniques. Reducing the execution time is the main objective in parallel processing. In this study, pre-processing of a minutiae-based fingerprint system is implemented by parallel processing on multi-core computers using OpenMP and on graphics processors using CUDA to improve execution time. The execution times and speedup ratios are compared with those of a single-core processor. The results show that by using parallel processing, execution time is substantially improved. The improvement ratios obtained for different pre-processing algorithms allowed us to make suggestions on the more suitable approaches for parallelization.
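
    As a rough, hedged illustration of the CPU-side idea (not the authors' OpenMP or CUDA code), the sketch below distributes a batch of fingerprint images across worker processes, applies a placeholder enhancement step to each, and reports the measured speedup over serial execution; the image sizes and the enhancement itself are invented for the example.

```python
import time
from multiprocessing import Pool

import numpy as np

def enhance(image):
    """Placeholder pre-processing step (stands in for filtering/binarisation/thinning)."""
    blurred = (image + np.roll(image, 1, axis=0) + np.roll(image, 1, axis=1)) / 3.0
    return (blurred > blurred.mean()).astype(np.uint8)  # crude binarisation

def run_serial(images):
    return [enhance(img) for img in images]

def run_parallel(images, workers=4):
    with Pool(workers) as pool:
        return pool.map(enhance, images)

if __name__ == "__main__":
    images = [np.random.rand(512, 512) for _ in range(64)]  # stand-in fingerprint batch
    t0 = time.perf_counter()
    run_serial(images)
    t1 = time.perf_counter()
    run_parallel(images)
    t2 = time.perf_counter()
    print(f"speedup over serial: {(t1 - t0) / (t2 - t1):.2f}x")
```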

  16. Distillation with labelled transition systems

    DEFF Research Database (Denmark)

    Hamilton, Geoffrey William; Jones, Neil

    2012-01-01

    In this paper, we provide an improved basis for the "distillation" program transformation. It is known that superlinear speedups can be obtained using distillation, but cannot be obtained by other earlier automatic program transformation techniques such as deforestation, positive supercompilation and partial evaluation. We give distillation an improved semantic basis, and explain how superlinear speedups can occur.

  17. Excited-state absorption in tetrapyridyl porphyrins: comparing real-time and quadratic-response time-dependent density functional theory

    Energy Technology Data Exchange (ETDEWEB)

    Bowman, David N. [Department of Chemistry; Supercomputing Institute and Chemical Theory Center; University of Minnesota; Minneapolis; USA; Asher, Jason C. [Department of Chemistry; Supercomputing Institute and Chemical Theory Center; University of Minnesota; Minneapolis; USA; Fischer, Sean A. [William R. Wiley Environmental Molecular Sciences Laboratory; Pacific Northwest National Laboratory; P.O. Box 999; Richland; USA; Cramer, Christopher J. [Department of Chemistry; Supercomputing Institute and Chemical Theory Center; University of Minnesota; Minneapolis; USA; Govind, Niranjan [William R. Wiley Environmental Molecular Sciences Laboratory; Pacific Northwest National Laboratory; P.O. Box 999; Richland; USA

    2017-01-01

    Three meso-substituted tetrapyridyl porphyrins (free base, Ni(ii), and Cu(ii)) were investigated for their optical limiting (OL) capabilities using real-time (RT-), linear-response (LR-), and quadratic-response (QR-) time-dependent density functional theory (TDDFT) methods.

  18. Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2

    Science.gov (United States)

    Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad

    1995-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that the PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs to match the actual costs relative to changes in the number of grid points. By increasing the number of processors, slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup results because the computational cost is dominated by the FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32-node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.
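
    The observation that the FFT-dominated portion limits scalability is essentially an Amdahl-type argument. The sketch below is a generic illustration of that reasoning, assuming made-up fractions and parallel efficiencies rather than figures from the paper.

```python
# Minimal sketch: Amdahl-style estimate of why a code dominated by a
# sub-ideally scaling routine (here, an FFT) shows less-than-linear speedup.
# The fractions and efficiencies below are illustrative, not taken from the paper.

def amdahl_speedup(p, serial_fraction):
    """Ideal Amdahl speedup on p processors when a fixed fraction stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

def two_part_speedup(p, fft_fraction, fft_efficiency):
    """Speedup when the FFT part scales with a reduced parallel efficiency."""
    fft_time = fft_fraction / (p * fft_efficiency)   # FFT scales imperfectly
    rest_time = (1.0 - fft_fraction) / p             # remaining work scales ideally
    return 1.0 / (fft_time + rest_time)

if __name__ == "__main__":
    for p in (1, 4, 8, 16, 32):
        print(p, round(amdahl_speedup(p, 0.05), 2),
              round(two_part_speedup(p, 0.6, 0.7), 2))
```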

  19. Faster Algorithms for Computing Longest Common Increasing Subsequences

    DEFF Research Database (Denmark)

    Kutz, Martin; Brodal, Gerth Stølting; Kaligosi, Kanela

    2011-01-01

    We present algorithms for finding a longest common increasing subsequence of two or more input sequences. For two sequences of lengths n and m, where m⩾n, we present an algorithm with an output-dependent expected running time (expressed in terms of ℓ, the length of an LCIS, σ, the size of the alphabet, and Sort, the time to sort each input sequence) and O(m) space. For k⩾3 length-n sequences we present an algorithm which improves the previous best bound by more than a factor k for many inputs. In both cases, our algorithms are conceptually quite simple but rely on existing sophisticated data structures. Finally, we introduce the problem of longest common weakly-increasing (or non-decreasing) subsequences (LCWIS), for which we present an algorithm for the 3-letter alphabet case. For the extensively studied longest common subsequence problem, comparable speedups have not been achieved for small alphabets.
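
    For orientation, the classical O(nm)-time, O(m)-space dynamic program for the two-sequence LCIS problem (the baseline that output-dependent algorithms such as the one above improve on) can be sketched as follows; this is a textbook formulation, not the authors' algorithm.

```python
def lcis_length(a, b):
    """Length of a longest common increasing subsequence of sequences a and b.

    Classic O(len(a) * len(b)) time, O(len(b)) space dynamic program;
    dp[j] = length of an LCIS ending with b[j].
    """
    m = len(b)
    dp = [0] * m
    for x in a:
        best = 0  # best LCIS length ending with a value smaller than x
        for j in range(m):
            if b[j] == x and best + 1 > dp[j]:
                dp[j] = best + 1
            elif b[j] < x and dp[j] > best:
                best = dp[j]
    return max(dp, default=0)

# e.g. the LCIS of these two sequences is 3, 5, 6 (length 3)
assert lcis_length([2, 3, 1, 6, 5, 4, 6], [1, 3, 5, 6]) == 3
```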

  20. Time to pediatric epilepsy surgery is longer and developmental outcomes lower for government compared with private insurance.

    Science.gov (United States)

    Hauptman, Jason S; Dadour, Andrew; Oh, Taemin; Baca, Christine B; Vickrey, Barbara G; Vassar, Stefanie; Sankar, Raman; Salamon, Noriko; Vinters, Harry V; Mathern, Gary W

    2013-07-01

    It is unclear if socioeconomic factors like type of insurance influence time to referral and developmental outcomes for pediatric patients undergoing epilepsy surgery. This study determined whether private compared with state government insurance was associated with shorter intervals of seizure onset to surgery and better developmental quotients for pediatric patients undergoing epilepsy surgery. A consecutive cohort (n = 420) of pediatric patients undergoing epilepsy surgery were retrospectively categorized into those with Medicaid (California Children's Services; n = 91) or private (Preferred Provider Organization, Health Maintenance Organization, Indemnity; n = 329) insurance. Intervals from seizure onset to referral and surgery and Vineland developmental assessments were compared by insurance type with the use of log-rank tests. Compared with private insurance, children with Medicaid had longer intervals from seizure onset to referral for evaluation (log-rank test, P = .034), and from seizure onset to surgery (P = .017). In a subset (25%) that had Vineland assessments, children with Medicaid compared with private insurance had lower Vineland scores presurgery (P = .042) and postsurgery (P = .003). Type of insurance was not associated with seizure severity, types of operations, etiology, postsurgical seizure-free outcomes, and complication rate. Compared with Medicaid, children with private insurance had shorter intervals from seizure onset to referral and to epilepsy surgery, and this was associated with lower Vineland scores before surgery. These findings may reflect delayed access for uninsured children who eventually obtained state insurance. Reasons for the delay and whether longer intervals before epilepsy surgery affect long-term cognitive and developmental outcomes warrant further prospective investigations.

  1. Usage of Microsoft Kinect for augmented prototyping speed-up

    Directory of Open Access Journals (Sweden)

    Jaromír Landa

    2012-01-01

    Full Text Available A physical model is a common tool for testing product features during the design process. This model is usually made of clay or plastic because of the modifiability of these materials. Therefore, the designer can easily adjust the model shape to enhance the look or ergonomics of the product. Nowadays, some companies use augmented reality to enhance their design process. This concept is called augmented prototyping. A common approach uses artificial markers to augment the product prototype with digital 3D models. These 3D models, shown at the marker positions, can represent e.g. car spare parts such as different lights, wheels, spoilers etc. This allows the designer to interactively change the look of the physical model. Further, it is also necessary to transfer physical adjustments made on the model surface back to the digital model in the computer. A well-known tool for this purpose is a professional 3D scanner. Nevertheless, the cost of such a scanner is substantial. Therefore, we focused on a different solution – the Microsoft Kinect, a motion capture device used for computer games. This article outlines a new augmented prototyping approach that directly updates the digital model during the design process using the Kinect depth camera. This solution is a cost-effective alternative to professional 3D scanners. Our article describes especially how depth data can be obtained by the Kinect and also provides an evaluation of depth measurement precision.

  2. Parallelizing Compiler Framework and API for Power Reduction and Software Productivity of Real-Time Heterogeneous Multicores

    Science.gov (United States)

    Hayashi, Akihiro; Wada, Yasutaka; Watanabe, Takeshi; Sekiguchi, Takeshi; Mase, Masayoshi; Shirako, Jun; Kimura, Keiji; Kasahara, Hironori

    Heterogeneous multicores have been attracting much attention as a way to attain high performance while keeping power consumption low in a wide range of areas. However, heterogeneous multicores impose very difficult programming on developers, and the long application development period lowers product competitiveness. In order to overcome this situation, this paper proposes a compilation framework which bridges the gap between programmers and heterogeneous multicores. In particular, this paper describes the compilation framework based on the OSCAR compiler. It realizes coarse-grain task parallel processing, data transfer using a DMA controller, and power reduction control from user programs with DVFS and clock gating on various heterogeneous multicores from different vendors. This paper also evaluates processing performance and power reduction by the proposed framework on a newly developed 15-core heterogeneous multicore chip named RP-X, integrating 8 general-purpose processor cores and 3 types of accelerator cores, which was developed by Renesas Electronics, Hitachi, Tokyo Institute of Technology and Waseda University. The framework attains speedups of up to 32x for an optical flow program with eight general-purpose processor cores and four DRP (Dynamically Reconfigurable Processor) accelerator cores against sequential execution by a single processor core, and an 80% power reduction for real-time AAC encoding.

  3. A comparative study of boar semen extenders with different proposed preservation times and their effect on semen quality and fertility

    OpenAIRE

    Marina Anastasia Karageorgiou; Georgios Tsousis; Constantin M. Boscos; Eleni D. Tzika; Panagiotis D. Tassis; Ioannis A. Tsakmakidis

    2016-01-01

    The present study compared the quality characteristics of boar semen diluted with three extenders of different proposed preservation times (short-term, medium-term and long-term). A part of extended semen was used for artificial insemination on the farm (30 sows/extender), while the remaining part was stored for three days (16–18 °C). Stored and used semen was also laboratory assessed at insemination time, on days 1 and 2 after the collection (day 0). The long-term extender was used for a sho...

  4. Particle-in-Cell laser-plasma simulation on Xeon Phi coprocessors

    Science.gov (United States)

    Surmin, I. A.; Bastrakov, S. I.; Efimenko, E. S.; Gonoskov, A. A.; Korzhimanov, A. V.; Meyerov, I. B.

    2016-05-01

    This paper concerns the development of a high-performance implementation of the Particle-in-Cell method for plasma simulation on Intel Xeon Phi coprocessors. We discuss the suitability of the method for Xeon Phi architecture and present our experience in the porting and optimization of the existing parallel Particle-in-Cell code PICADOR. Direct porting without code modification gives performance on Xeon Phi close to that of an 8-core CPU on a benchmark problem with 50 particles per cell. We demonstrate step-by-step optimization techniques, such as improving data locality, enhancing parallelization efficiency and vectorization leading to an overall 4.2 × speedup on CPU and 7.5 × on Xeon Phi compared to the baseline version. The optimized version achieves 16.9 ns per particle update on an Intel Xeon E5-2660 CPU and 9.3 ns per particle update on an Intel Xeon Phi 5110P. For a real problem of laser ion acceleration in targets with surface grating, where a large number of macroparticles per cell is required, the speedup of Xeon Phi compared to CPU is 1.6 ×.

  5. Multitasking a three-dimensional Navier-Stokes algorithm on the Cray-2

    Science.gov (United States)

    Swisshelm, Julie M.

    1989-01-01

    A three-dimensional computational aerodynamics algorithm has been multitasked for efficient parallel execution on the Cray-2. It provides a means for examining the multitasking performance of a complete CFD application code. An embedded zonal multigrid scheme is used to solve the Reynolds-averaged Navier-Stokes equations for an internal flow model problem. The explicit nature of each component of the method allows a spatial partitioning of the computational domain to achieve a well-balanced task load for MIMD computers with vector-processing capability. Experiments have been conducted with both two- and three-dimensional multitasked cases. The best speedup attained by an individual task group was 3.54 on four processors of the Cray-2, while the entire solver yielded a speedup of 2.67 on four processors for the three-dimensional case. The multiprocessing efficiency of various types of computational tasks is examined, performance on two Cray-2s with different memory access speeds is compared, and extrapolation to larger problems is discussed.

  6. Vectorization of the KENO V.a criticality safety code

    International Nuclear Information System (INIS)

    Hollenbach, D.F.; Dodds, H.L.; Petrie, L.M.

    1991-01-01

    The development of the vector processor, which is used in the current generation of supercomputers and is beginning to be used in workstations, provides the potential for dramatic speed-up for codes that are able to process data as vectors. Unfortunately, the stochastic nature of Monte Carlo codes prevents the old scalar versions of these codes from taking advantage of the vector processors. New Monte Carlo algorithms that process all the histories undergoing the same event as a batch are required. Recently, new vectorized Monte Carlo codes have been developed that show significant speed-ups when compared to their scalar versions or equivalent codes. This paper discusses the vectorization of an already existing and widely used criticality safety code, KENO V.a. All the changes made to KENO V.a are transparent to the user, making it possible to upgrade from the standard scalar version of KENO V.a to the vectorized version without learning a new code.

  7. Performance Evaluation of Multithreaded Geant4 Simulations Using an Intel Xeon Phi Cluster

    Directory of Open Access Journals (Sweden)

    P. Schweitzer

    2015-01-01

    Full Text Available The objective of this study is to evaluate the performance of Intel Xeon Phi hardware accelerators for Geant4 simulations, especially for multithreaded applications. We present the complete methodology to guide users through the compilation of their Geant4 applications on Phi processors. Then, we propose a series of benchmarks to compare the performance of Xeon CPUs and Phi processors for a Geant4 example dedicated to the simulation of electron dose point kernels, the TestEm12 example. First, we compare a distributed execution of a sequential version of the Geant4 example on both architectures before evaluating the multithreaded version of the Geant4 example. While Phi processors demonstrated their ability to accelerate computing time (up to a factor of 3.83) when distributing sequential Geant4 simulations, we do not reach the same level of speedup when considering the multithreaded version of the Geant4 example.

  8. A fast resonance interference treatment scheme with subgroup method

    International Nuclear Information System (INIS)

    Cao, L.; He, Q.; Wu, H.; Zu, T.; Shen, W.

    2015-01-01

    A fast Resonance Interference Factor (RIF) scheme is proposed to treat the resonance interference effects between different resonance nuclides. This scheme utilizes the conventional subgroup method to evaluate the self-shielded cross sections of the dominant resonance nuclide in the heterogeneous system and the hyper-fine energy group method to represent the resonance interference effects in a simplified homogeneous model. In this paper, the newly implemented scheme is compared to the background iteration scheme, the Resonance Nuclide Group (RNG) scheme and the conventional RIF scheme. The numerical results show that the errors of the effective self-shielded cross sections are significantly reduced by the fast RIF scheme compared with the background iteration scheme and the RNG scheme. Besides, the fast RIF scheme consumes less computation time than the conventional RIF schemes. The speed-up ratio is ~4.5 for MOX pin cell problems. (author)

  9. Smoothed Particle Hydrodynamics Coupled with Radiation Transfer

    Science.gov (United States)

    Susa, Hajime

    2006-04-01

    We have constructed a brand-new radiation hydrodynamics solver based upon Smoothed Particle Hydrodynamics, which works on a parallel computer system. The code is designed to investigate the formation and evolution of first-generation objects at z ≳ 10, where the radiative feedback from various sources plays important roles. The code can compute the fractions of the chemical species e, H+, H, H-, H2, and H2+ by fully implicit time integration. It can also deal with multiple sources of ionizing radiation, as well as radiation in the Lyman-Werner band. We compare the results of a few test calculations with the results of one-dimensional simulations and find good agreement with each other. We also evaluate the speedup by parallelization, which is found to be almost ideal, as long as the number of sources is comparable to the number of processors.

  10. Development of efficient GPU parallelization of WRF Yonsei University planetary boundary layer scheme

    Directory of Open Access Journals (Sweden)

    M. Huang

    2015-09-01

    Full Text Available The planetary boundary layer (PBL) is the lowest part of the atmosphere, and its character is directly affected by its contact with the underlying planetary surface. The PBL is responsible for vertical sub-grid-scale fluxes due to eddy transport in the whole atmospheric column. It determines the flux profiles within the well-mixed boundary layer and the more stable layer above. It thus provides an evolutionary model of atmospheric temperature, moisture (including clouds), and horizontal momentum in the entire atmospheric column. For such purposes, several PBL models have been proposed and employed in the weather research and forecasting (WRF) model, of which the Yonsei University (YSU) scheme is one. To expedite weather research and prediction, we have put tremendous effort into developing an accelerated implementation of the entire WRF model using graphics processing unit (GPU) massively parallel computing architecture whilst maintaining its accuracy as compared to its central processing unit (CPU)-based implementation. This paper presents our efficient GPU-based design of the WRF YSU PBL scheme. Using one NVIDIA Tesla K40 GPU, the GPU-based YSU PBL scheme achieves a speedup of 193× with respect to its CPU counterpart running on one CPU core, whereas the speedup for one CPU socket (4 cores) with respect to 1 CPU core is only 3.5×. We can even boost the speedup to 360× with respect to 1 CPU core when two K40 GPUs are applied.

  11. Fast index based algorithms and software for matching position specific scoring matrices

    Directory of Open Access Journals (Sweden)

    Homann Robert

    2006-08-01

    Full Text Available Abstract Background In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common, but computationally expensive task. Results We present a new non-heuristic algorithm, called ESAsearch, to efficiently find matches of PSSMs in large databases. Our approach preprocesses the search space, e.g., a complete genome or a set of protein sequences, and builds an enhanced suffix array that is stored on file. This allows the searching of a database with a PSSM in sublinear expected time. Since ESAsearch benefits from small alphabets, we present a variant operating on sequences recoded according to a reduced alphabet. We also address the problem of non-comparable PSSM scores by developing a method which allows the efficient computation of a matrix similarity threshold for a PSSM, given an E-value or a p-value. Our method is based on dynamic programming and, in contrast to other methods, it employs lazy evaluation of the dynamic programming matrix. We evaluated algorithm ESAsearch with nucleotide PSSMs and with amino acid PSSMs. Compared to the best previous methods, ESAsearch shows speedups of a factor between 17 and 275 for nucleotide PSSMs, and speedups up to a factor of 1.8 for amino acid PSSMs. Comparisons with the most widely used programs even show speedups by a factor of at least 3.8. Alphabet reduction yields an additional speedup factor of 2 on amino acid sequences compared to results achieved with the 20-symbol standard alphabet. The lazy evaluation method is also much faster than previous methods, with speedups of a factor between 3 and 330. Conclusion Our analysis of ESAsearch reveals sublinear runtime in the expected case, and linear runtime in the worst case for sequences not shorter than a threshold determined by the alphabet size and the PSSM length.
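
    For reference, the brute-force baseline that index-based methods such as ESAsearch are designed to outperform is a sliding-window scan that scores every window of a sequence against the PSSM and reports windows above a threshold. The sketch below illustrates only that baseline; the toy matrix, alphabet and threshold are assumptions, and nothing here reproduces the enhanced-suffix-array algorithm itself.

```python
# Generic sliding-window PSSM scan: the O(|sequence| * m) baseline that
# index-based methods such as ESAsearch are designed to outperform.
# The example matrix, alphabet and threshold are illustrative assumptions.

def pssm_scan(sequence, pssm, alphabet, threshold):
    """Yield (position, score) for every length-m window scoring >= threshold.

    pssm[i][c] is the score of symbol index c at motif position i.
    """
    m = len(pssm)
    index = {c: k for k, c in enumerate(alphabet)}
    for pos in range(len(sequence) - m + 1):
        score = sum(pssm[i][index[sequence[pos + i]]] for i in range(m))
        if score >= threshold:
            yield pos, score

alphabet = "ACGT"
pssm = [  # a toy 3-column nucleotide motif (rows = positions, cols = A, C, G, T)
    [2, -1, -1, -1],
    [-1, -1, 2, -1],
    [-1, 2, -1, -1],
]
print(list(pssm_scan("TTAGCAGCTT", pssm, alphabet, threshold=5)))  # matches "AGC"
```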

  12. Improvement of visual debugging tool. Shortening the elapsed time for getting data and adding new functions to compare/combine a set of visualized data

    International Nuclear Information System (INIS)

    Matsuda, Katsuyuki; Takemiya, Hiroshi

    2001-03-01

    The visual debugging tool 'vdebug', designed for debugging programs for scientific computing, has been improved in the following two respects: (1) shortening the elapsed time required for getting the appropriate data to visualize; (2) adding new functions which make it possible to compare and/or combine sets of visualized data originating from two or more different programs. As for shortening the elapsed time for getting data, with the improved version of 'vdebug' we achieved a more than hundredfold reduction in elapsed time with dbx and pdbx on the SX-4, and a more than tenfold reduction with ndb on the SR2201. As for the new functions to compare/combine visualized data, it was confirmed that we could easily check the consistency between the computational results obtained at each calculation step on two different computers: SP and ONYX. In this report, we illustrate how the tool 'vdebug' has been improved with an example. (author)

  13. Introduction of electronic referral from community associated with more timely review by secondary services.

    Science.gov (United States)

    Warren, J; White, S; Day, K J; Gu, Y; Pollock, M

    2011-01-01

    Electronic referral (eReferral) from the community into public secondary healthcare services was introduced to 30 referring general medical practices and 28 hospital-based services in late 2007. The aim was to measure the extent of uptake of eReferral and its association with changes in referral processing. The methods comprised analysis of transactional data from the eReferral message service and the patient information management system of the affected hospital, together with interviews of clinical, operational and management stakeholders. eReferral use rose steadily to 1000 transactions per month in 2008, thereafter showing moderate growth to 1200 per month in 2010. The rate of eReferral from the community in 2010 is estimated at 56% of total referrals to the hospital from general practice, and at 71% of referrals from those having done at least one referral electronically. Referral latency from letter date to hospital triage improved significantly from 2007 to 2009; interviewees nonetheless reported some system usability issues. With eReferrals, a referral's status can be checked, and its content read, by any authorized user at any time. The period of eReferral uptake was associated with a significant speed-up in referral processing without changes in staffing levels. The eReferral system provides a foundation for further innovation in the community-secondary interface, such as electronic decision support and shared care planning systems. We observed substantial, rapid, voluntary uptake of eReferrals associated with faster, more reliable and more transparent referral processing.

  14. Disparities in abnormal mammogram follow-up time for Asian women compared to non-Hispanic Whites and between Asian ethnic groups

    Science.gov (United States)

    Nguyen, KH; Pasick, RJ; Stewart, SL; Kerlikowske, K; Karliner, LS

    2017-01-01

    Background: Delays in abnormal mammogram follow-up contribute to poor outcomes. We examined differences in abnormal screening mammogram follow-up between non-Hispanic Whites (NHW) and Asian women. Methods: Prospective cohort of NHW and Asian women with a Breast Imaging Reporting and Data System abnormal result of 0 or 3+ in the San Francisco Mammography Registry between 2000 and 2010. We performed Kaplan-Meier estimation for median days to follow-up with a diagnostic radiologic test, and compared the proportion with follow-up at 30, 60 and 90 days, and with no follow-up at one year, for Asians overall (and Asian ethnic groups) and NHWs. We additionally assessed the relationship between race/ethnicity and time to follow-up with adjusted Cox proportional hazards models. Results: Among Asian women, Vietnamese and Filipinas had the longest, and Japanese the shortest, median follow-up time (32, 28, 19 days, respectively) compared to NHWs (15 days). The proportion of women receiving follow-up at 30 days was lower for Asians vs NHWs (57% vs 77%), as it was for all Asian ethnic groups except Japanese. Asians had a reduced hazard of follow-up compared with NHWs (aHR 0.70, 95% CI 0.69–0.72). Asians also had a higher rate than NHWs of no follow-up (15% vs 10%); among Asian ethnic groups, Filipinas had the highest percentage of women with no follow-up (18.1%). Conclusion: Asian, particularly Filipina and Vietnamese, women were less likely than NHWs to receive timely follow-up after an abnormal screening mammogram. Research should disaggregate Asian ethnicity to better understand and address barriers to effective cancer prevention. PMID:28603859

  15. Quantifying and comparing dynamic predictive accuracy of joint models for longitudinal marker and time-to-event in presence of censoring and competing risks.

    Science.gov (United States)

    Blanche, Paul; Proust-Lima, Cécile; Loubère, Lucie; Berr, Claudine; Dartigues, Jean-François; Jacqmin-Gadda, Hélène

    2015-03-01

    Thanks to the growing interest in personalized medicine, joint modeling of longitudinal marker and time-to-event data has recently started to be used to derive dynamic individual risk predictions. Individual predictions are called dynamic because they are updated when information on the subject's health profile grows with time. We focus in this work on statistical methods for quantifying and comparing dynamic predictive accuracy of this kind of prognostic models, accounting for right censoring and possibly competing events. Dynamic area under the ROC curve (AUC) and Brier Score (BS) are used to quantify predictive accuracy. Nonparametric inverse probability of censoring weighting is used to estimate dynamic curves of AUC and BS as functions of the time at which predictions are made. Asymptotic results are established and both pointwise confidence intervals and simultaneous confidence bands are derived. Tests are also proposed to compare the dynamic prediction accuracy curves of two prognostic models. The finite sample behavior of the inference procedures is assessed via simulations. We apply the proposed methodology to compare various prediction models using repeated measures of two psychometric tests to predict dementia in the elderly, accounting for the competing risk of death. Models are estimated on the French Paquid cohort and predictive accuracies are evaluated and compared on the French Three-City cohort. © 2014, The International Biometric Society.

  16. Aerodynamic optimization of supersonic compressor cascade using differential evolution on GPU

    Energy Technology Data Exchange (ETDEWEB)

    Aissa, Mohamed Hasanine; Verstraete, Tom [Von Karman Institute for Fluid Dynamics (VKI) 1640 Sint-Genesius-Rode (Belgium); Vuik, Cornelis [Delft University of Technology 2628 CD Delft (Netherlands)

    2016-06-08

    Differential Evolution (DE) is a powerful stochastic optimization method. Compared to gradient-based algorithms, DE is able to avoid local minima but requires at the same time more function evaluations. In turbomachinery applications, function evaluations are performed with time-consuming CFD simulations, which results in a long, unaffordable design cycle. Modern high-performance computing systems, especially graphics processing units (GPUs), are able to alleviate this inconvenience by accelerating the design evaluation itself. In this work we present a validated CFD solver running on GPUs, able to accelerate the design evaluation and thus the entire design process. An achieved speedup of 20x to 30x enabled the DE algorithm to run on a high-end computer instead of a costly large cluster. The GPU-enhanced DE was used to optimize the aerodynamics of a supersonic compressor cascade, achieving an aerodynamic loss minimization of 20%.
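
    For readers unfamiliar with the optimizer itself, a minimal DE/rand/1/bin loop is sketched below. The toy objective and parameter values are assumptions made for illustration; in the work described above, each objective evaluation would be a full GPU-accelerated CFD simulation rather than a cheap analytic function.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, F=0.8, CR=0.9, generations=100):
    """Minimal DE/rand/1/bin sketch. bounds is a list of (low, high) pairs per dimension."""
    dim = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]        # in the paper: one CFD run per candidate
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = random.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = random.randrange(dim)
            trial = []
            for j in range(dim):
                if random.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])  # mutation
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)                      # clamp to bounds
                else:
                    v = pop[i][j]                                # crossover keeps parent gene
                trial.append(v)
            s = objective(trial)
            if s <= scores[i]:                                   # greedy selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best]

# toy usage: minimize a 3-dimensional sphere function
x, fx = differential_evolution(lambda v: sum(t * t for t in v), [(-5, 5)] * 3)
print(x, fx)
```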

  17. Comparing cryptomarkets for drugs. A characterisation of sellers and buyers over time.

    Science.gov (United States)

    Tzanetakis, Meropi

    2018-02-12

    Cryptomarkets operating on the darknet are a recent phenomenon that has gained importance only over the last couple of years (Barratt, 2012). However, they now constitute an evolving part of illicit drug markets. Although selling and buying a variety of psychoactive substances on the Internet has a long history, new technological developments enable systematic drug trading on the net. These technological innovations allow users to carry out (illicit) drug transactions with almost completely anonymous identities and locations. In this paper, we provide a systematic measurement analysis of structures and trends on the most popular anonymous drug marketplace, and discuss the role of cryptomarkets in drug distribution. Data collection and analysis include a long-term measurement of the cryptomarket 'AlphaBay', the most popular platform during the survey period. By developing and applying a web-scraping tool, market data was extracted from the marketplace on a daily basis over a period of twelve months between September 2015 and August 2016. The data was analysed using business-intelligence software, which allows the linking of various data sets. We found 2188 unique vendors offering 11,925 drug items. The findings of our long-term monitoring and data analysis are compared over time and across marketplaces, offering a detailed understanding of the development of revenues generated, characterisation of countries of origin and destination, and distribution of vendors and customers over time. We provide a nuanced and highly detailed longitudinal analysis of drug trading on the darknet marketplace 'AlphaBay', which was the largest cryptomarket in operation. 1) Total sales volume for the 'drugs' section was estimated at approximately USD 94 million for the period from September 2015 to August 2016. 2) In addition, about 64% of all sales are made with cocaine-, cannabis-, heroin-, and ecstasy-related products. 3) Average selling prices increase over

  18. Ground-state projection multigrid for propagators in 4-dimensional SU(2) gauge fields

    International Nuclear Information System (INIS)

    Kalkreuter, T.

    1991-09-01

    The ground-state projection multigrid method is studied for computations of slowly decaying bosonic propagators in 4-dimensional SU(2) lattice gauge theory. The defining eigenvalue equation for the restriction operator is solved exactly. Although the critical exponent z is not reduced in nontrivial gauge fields, multigrid still yields considerable speedup compared with conventional relaxation. Multigrid is also able to outperform the conjugate gradient algorithm. (orig.)

  19. Comparative Evaluation of Four Real-Time PCR Methods for the Quantitative Detection of Epstein-Barr Virus from Whole Blood Specimens.

    Science.gov (United States)

    Buelow, Daelynn; Sun, Yilun; Tang, Li; Gu, Zhengming; Pounds, Stanley; Hayden, Randall

    2016-07-01

    Monitoring of Epstein-Barr virus (EBV) load in immunocompromised patients has become integral to their care. An increasing number of reagents are available for quantitative detection of EBV; however, there are few published comparative data. Four real-time PCR systems (one using laboratory-developed reagents and three using analyte-specific reagents) were compared with one another for detection of EBV from whole blood. Whole blood specimens seeded with EBV were used to determine quantitative linearity, analytical measurement range, lower limit of detection, and CV for each assay. Retrospective testing of 198 clinical samples was performed in parallel with all methods; results were compared to determine relative quantitative and qualitative performance. All assays showed similar performance. No significant difference was found in limit of detection (3.12-3.49 log10 copies/mL; P = 0.37). A strong qualitative correlation was seen with all assays that used clinical samples (positive detection rates of 89.5%-95.8%). Quantitative correlation of clinical samples across assays was also seen in pairwise regression analysis, with R(2) ranging from 0.83 to 0.95. Normalizing clinical sample results to IU/mL did not alter the quantitative correlation between assays. Quantitative EBV detection by real-time PCR can be performed over a wide linear dynamic range, using three different commercially available reagents and laboratory-developed methods. EBV was detected with comparable sensitivity and quantitative correlation for all assays. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
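
    The pairwise quantitative comparison mentioned above amounts to regressing one assay's log10 viral loads on another's and examining R². A minimal sketch of that calculation, using invented paired values rather than the study data, might look as follows.

```python
import numpy as np

# Hypothetical paired quantitative results (log10 copies/mL) from two assays
# measured on the same specimens; the values are invented for illustration only.
assay_a = np.array([2.9, 3.4, 3.8, 4.2, 4.9, 5.5, 6.1])
assay_b = np.array([3.1, 3.3, 3.9, 4.4, 4.7, 5.6, 6.0])

# Pairwise linear regression of one assay on the other, plus R^2.
slope, intercept = np.polyfit(assay_a, assay_b, 1)
predicted = slope * assay_a + intercept
ss_res = np.sum((assay_b - predicted) ** 2)
ss_tot = np.sum((assay_b - assay_b.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"assay_b ~= {slope:.2f} * assay_a + {intercept:.2f}, R^2 = {r_squared:.3f}")
```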

  20. Outage time reduction in GKN II without loss of safety

    International Nuclear Information System (INIS)

    Sturm, J.

    1999-01-01

    GKN II is a 1340 MWe 4-loop pressurised water reactor of the Siemens KONVOI type, located in the south of Germany. It was originally connected to the grid at the end of 1988. Commercial operation under utility responsibility started in the second half of 1989. The first outage was performed in 1990. Beginning from this date, the outage duration was continuously reduced from 33 days to 15 days in 1996. In 1998, two refueling and maintenance outages were performed, each with a duration of 7 days. Key planning factors to achieve these results are: A well-adapted planning organisation with an outage manager and an outage planning team. Effective long-term planning, meaning that work of long duration is combined every 4 or 8 years. No long-lasting work in the years in between. Main work on only one safety train per year. Optimisation and standardisation of the shutdown and startup sequences: the actual changes of reactor states have been modified compared to the vendor recommendations, and tests are assigned to the plant conditions where they are most effective and least time critical. Small modifications in the plant, mainly on the auxiliary systems, to speed up some sequences. Extremely detailed planning of maintenance and periodic tests, so that each work item and test can be found in a detailed schedule with a dedicated time window. Optimized tools to perform the detailed planning and to implement the feedback of experience from former outages. Optimized tools for maintenance and handling of heavy equipment on the critical path. Optimized tools to perform periodic tests. Key factors during the outage are: Permanent control of the schedules with an updated 3-day program. Best and permanent information of all people involved via this 3-day program. Fast reaction to delays. Outage managers permanently on site. Gain in safety during shutdown states with reduced outage duration: it has to be proven that short outages do not lead to faster and less accurate work. It can be

  1. Comparing twice- versus four-times daily insulin in mothers with gestational diabetes in Pakistan and its implications.

    Science.gov (United States)

    Saleem, Nazish; Godman, Brian; Hussain, Shahzad

    2016-08-01

    Gestational diabetes mellitus is a common medical problem associated with maternal and fetal complications. Good glycemic control is the cornerstone of treatment. The aim was to compare outcomes between four-times-daily (q.i.d) and twice-daily (b.i.d) insulin regimens. The morning dose of the b.i.d regimen contained two-thirds of the total insulin, comprising one-third human regular insulin and two-thirds human intermediate insulin, with equal amounts in the evening. 480 women at >30 weeks with gestational diabetes mellitus and failure to control blood glucose were randomly assigned to either regimen. Mean time to control of blood glucose was significantly shorter and glycemic control significantly better with the q.i.d regimen. Operative deliveries, the extent of neonatal hypoglycemia, and the numbers of babies with low Apgar scores and with hyperbilirubinemia were significantly higher with the b.i.d regimen. The q.i.d regimen was associated with improved fetal and maternal outcomes and consequently should increasingly be used in Pakistan, assisted by lower acquisition costs.

  2. Parallel Solver for Diffuse Optical Tomography on Realistic Head Models With Scattering and Clear Regions.

    Science.gov (United States)

    Placati, Silvio; Guermandi, Marco; Samore, Andrea; Scarselli, Eleonora Franchi; Guerrieri, Roberto

    2016-09-01

    Diffuse optical tomography is an imaging technique based on evaluating how light propagates within the human head to obtain functional information about the brain. Precision in reconstructing such an optical-properties map is highly affected by the accuracy of the light propagation model implemented, which needs to take into account the presence of clear and scattering tissues. We present a numerical solver based on the radiosity-diffusion model, integrating the anatomical information provided by a structural MRI. The solver is designed to run on parallel heterogeneous platforms based on multiple GPUs and CPUs. We demonstrate how the solver provides a 7 times speed-up over an isotropic-scattering parallel Monte Carlo engine based on the radiative transport equation for a domain composed of 2 million voxels, along with a significant improvement in accuracy. The speed-up greatly increases for larger domains, allowing us to compute the light distribution of a full human head (≈ 3 million voxels) in 116 s on the platform used.

  3. Parallelization of the model-based iterative reconstruction algorithm DIRA

    International Nuclear Information System (INIS)

    Oertenberg, A.; Sandborg, M.; Alm Carlsson, G.; Malusek, A.; Magnusson, M.

    2016-01-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphics processing units (GPU). Despite their obvious benefits, the parallelization of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelization of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelized using the OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelization of the code with OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelization with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause is explained. (authors)

  4. Computational chemistry with transputers: A direct SCF program

    International Nuclear Information System (INIS)

    Wedig, U.; Burkhardt, A.; Schnering, H.G. von

    1989-01-01

    By using transputers it is possible to build up networks of parallel processors with varying topology. Due to the architecture of the processors, it is appropriate to use the MIMD (multiple instruction multiple data) concept of parallel computing. The most suitable programming language is OCCAM. We investigate the use of transputer networks in computational chemistry, starting with the direct SCF method. The most time-consuming step, the calculation of the two-electron integrals, is executed in parallel. Each node in the network calculates whole batches of integrals. The main program is written in OCCAM. For some large-scale arithmetic processes running on a single node, however, we used FORTRAN subroutines from standard ab-initio programs to reduce the programming effort. Test calculations show that the integral calculation step can be parallelized very efficiently. We observed a speed-up of almost 8 using eight network processors. Even taking the scalar part of the SCF iteration into account, the speed-up is not less than 7.1. (orig.)

  5. Solution to PDEs using radial basis function finite-differences (RBF-FD) on multiple GPUs

    International Nuclear Information System (INIS)

    Bollig, Evan F.; Flyer, Natasha; Erlebacher, Gordon

    2012-01-01

    This paper presents parallelization strategies for the radial basis function-finite difference (RBF-FD) method. As a generalized finite differencing scheme, the RBF-FD method functions without the need for underlying meshes to structure nodes. It offers high-order accuracy approximation and scales as O(N) per time step, with N being the total number of nodes. To our knowledge, this is the first implementation of the RBF-FD method to leverage GPU accelerators for the solution of PDEs. Additionally, this implementation is the first to span both multiple CPUs and multiple GPUs. OpenCL kernels target the GPUs, and inter-processor communication and synchronization is managed by the Message Passing Interface (MPI). We verify our implementation of the RBF-FD method with two hyperbolic PDEs on the sphere, and demonstrate up to 9x speedup on a commodity GPU with unoptimized kernel implementations. On a high performance cluster, the method achieves up to 7x speedup for the maximum problem size of 27,556 nodes.

  6. Comparative study of the sensitivity of ADC value and T{sub 2} relaxation time for early detection of Wallerian degeneration

    Energy Technology Data Exchange (ETDEWEB)

    Zhang Fan [Department of Radiology, Nanjing Jinling Hospital, Clinical School of Medical College of Nanjing University, Nanjing 210002 (China); Lu Guangming, E-mail: cjr.luguangming@vip.163.com [Department of Radiology, Nanjing Jinling Hospital, Clinical School of Medical College of Nanjing University, Nanjing 210002 (China); Zee Chishing, E-mail: chishing@usc.edu [Department of Radiology, USC Keck School of Medicine (United States)

    2011-07-15

    Background and purpose: Wallerian degeneration (WD), the secondary degeneration of axons from cortical and subcortical injuries, is associated with poor neurological outcome. Several quantitative MR imaging techniques are used to estimate the biologic changes secondary to delayed neuronal and axonal losses. Our purpose is to assess the sensitivity of the ADC value and T{sub 2} relaxation time for early detection of WD. Methods: Ten male Sprague-Dawley rats were used to establish an in vivo Wallerian degeneration model of the CNS by ipsilateral motor-sensory cortex ablation. Five days after cortex ablation, multiecho T{sub 2} relaxometry and multi-b-value DWI were acquired using a 7 T MR imaging scanner. An ADC map and a T{sub 2} map were reconstructed by post-processing. ROIs were selected along the pathway of the corticospinal tract, from cortex, internal capsule, cerebral peduncle, pons and medulla oblongata to the upper cervical spinal cord, to measure the ADC value and T{sub 2} relaxation time on the healthy and affected sides. The results were compared between the side with cortical ablation and the side without ablation. Results: Excluding the ablated cortex, ADC values of the corticospinal tract were significantly increased (P < 0.05) on the affected side compared to the unaffected, healthy side; no difference in T{sub 2} relaxation time was observed between the affected and healthy sides. Imaging findings were correlated with histological examinations. Conclusion: As shown in this animal experiment, ADC values could non-invasively demonstrate the secondary degeneration involving descending white matter tracts. ADC values are more sensitive indicators for detection of early WD than T{sub 2} relaxation time.

  7. Ultrafast and scalable cone-beam CT reconstruction using MapReduce in a cloud computing environment.

    Science.gov (United States)

    Meng, Bowen; Pratx, Guillem; Xing, Lei

    2011-12-01

    Four-dimensional CT (4DCT) and cone beam CT (CBCT) are widely used in radiation therapy for accurate tumor target definition and localization. However, high-resolution and dynamic image reconstruction is computationally demanding because of the large amount of data processed. Efficient use of these imaging techniques in the clinic requires high-performance computing. The purpose of this work is to develop a novel ultrafast, scalable and reliable image reconstruction technique for 4D CBCT/CT using a parallel computing framework called MapReduce. We show the utility of MapReduce for solving large-scale medical physics problems in a cloud computing environment. In this work, we accelerated the Feldkamp-Davis-Kress (FDK) algorithm by porting it to Hadoop, an open-source MapReduce implementation. Gated phases from a 4DCT scan were reconstructed independently. Following the MapReduce formalism, Map functions were used to filter and backproject subsets of projections, and a Reduce function to aggregate those partial backprojections into the whole volume. MapReduce automatically parallelized the reconstruction process on a large cluster of computer nodes. As a validation, reconstruction of a digital phantom and an acquired CatPhan 600 phantom was performed in a commercial cloud computing environment using the proposed 4D CBCT/CT reconstruction algorithm. The speedup of reconstruction time is found to be roughly linear with the number of nodes employed. For instance, greater than 10 times speedup was achieved using 200 nodes for all cases, compared to the same code executed on a single machine. Without modifying the code, faster reconstruction is readily achievable by allocating more nodes in the cloud computing environment. The root mean square error between the images obtained using MapReduce and a single-threaded reference implementation was on the order of 10(-7). Our study also proved that cloud computing with MapReduce is fault tolerant: the reconstruction completed
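
    Conceptually, the Map step produces partial backprojection volumes from subsets of projections and the Reduce step sums them into the final volume. The toy, single-machine sketch below mimics only that decomposition; it is not the authors' Hadoop code, the volume size is arbitrary, and the filtered backprojection is replaced by a placeholder accumulation.

```python
from functools import reduce
from multiprocessing import Pool

import numpy as np

VOLUME_SHAPE = (64, 64, 64)   # illustrative volume size, not from the paper

def backproject_subset(projection_subset):
    """Map step: 'filter + backproject' one subset of projections into a partial volume.

    The real FDK filtering/backprojection is replaced by a placeholder accumulation.
    """
    partial = np.zeros(VOLUME_SHAPE, dtype=np.float32)
    for proj in projection_subset:
        partial += proj.mean()    # stand-in for the filtered backprojection of 'proj'
    return partial

def combine(vol_a, vol_b):
    """Reduce step: aggregate partial backprojections into the whole volume."""
    return vol_a + vol_b

if __name__ == "__main__":
    projections = [np.random.rand(64, 64) for _ in range(32)]
    subsets = [projections[i::4] for i in range(4)]           # 4 "map tasks"
    with Pool(4) as pool:
        partials = pool.map(backproject_subset, subsets)      # Map phase
    volume = reduce(combine, partials)                        # Reduce phase
    print(volume.shape, float(volume.mean()))
```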

  8. On the adequacy of message-passing parallel supercomputers for solving neutron transport problems

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1990-01-01

    A coarse-grained, static-scheduling parallelization of the standard iterative scheme used for solving the discrete-ordinates approximation of the neutron transport equation is described. The parallel algorithm is based on a decomposition of the angular domain along the discrete ordinates, thus naturally producing a set of completely uncoupled systems of equations in each iteration. Implementation of the parallel code on Intel's iPSC/2 hypercube, and solutions to test problems, are presented as evidence of the high speedup and efficiency of the parallel code. The performance of the parallel code on the iPSC/2 is analyzed, and a model for the CPU time as a function of the problem size (order of angular quadrature) and the number of participating processors is developed and validated against measured CPU times. The performance model is used to speculate on the potential of massively parallel computers for significantly speeding up real-life transport calculations at acceptable efficiencies. We conclude that parallel computers with a few hundred processors are capable of producing large speedups at very high efficiencies in very large three-dimensional problems. 10 refs., 8 figs

  9. Extended Traffic Crash Modelling through Precision and Response Time Using Fuzzy Clustering Algorithms Compared with Multi-layer Perceptron

    Directory of Open Access Journals (Sweden)

    Iman Aghayan

    2012-11-01

    Full Text Available This paper compares two fuzzy clustering algorithms – fuzzy subtractive clustering and fuzzy C-means clustering – to a multi-layer perceptron neural network for their ability to predict the severity of crash injuries and to estimate the response time on traffic crash data. Four clustering algorithms – hierarchical, K-means, subtractive clustering, and fuzzy C-means clustering – were used to obtain the optimum number of clusters based on the mean silhouette coefficient and R-value before applying the fuzzy clustering algorithms. The best-fit algorithms were selected according to two criteria: precision (root mean square, R-value, mean absolute errors, and sum of square error) and response time (t). The highest R-value was obtained for the multi-layer perceptron (0.89), demonstrating that the multi-layer perceptron had high precision in traffic crash prediction among the prediction models, and that it was stable even in the presence of outliers and overlapping data. Meanwhile, in comparison with other prediction models, fuzzy subtractive clustering provided the lowest value for response time (0.284 seconds, 9.28 times faster than the multi-layer perceptron), meaning that it could lead to developing an on-line system for processing data from detectors and/or a real-time traffic database. The model can be extended through improvements based on additional data through an induction procedure.
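
    The core of fuzzy C-means is an alternating update of the membership matrix and the cluster centres. A compact, generic sketch of those two updates is given below; the parameters and toy data are assumptions, and this is not the authors' crash-data pipeline.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Generic fuzzy C-means clustering: returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)              # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]          # weighted cluster means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        new_U = inv / inv.sum(axis=1, keepdims=True)            # membership update
        if np.max(np.abs(new_U - U)) < tol:
            U = new_U
            break
        U = new_U
    return centers, U

# toy usage on two well-separated blobs
X = np.vstack([np.random.default_rng(1).normal(0, 0.3, (50, 2)),
               np.random.default_rng(2).normal(3, 0.3, (50, 2))])
centers, U = fuzzy_c_means(X, n_clusters=2)
print(centers)
```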

  10. Accelerating ROP detector layout optimization

    International Nuclear Information System (INIS)

    Kastanya, D.; Fodor, B.

    2012-01-01

    The ADORE (Alternating Detector layout Optimization for REgional overpower protection system) algorithm for optimizing the regional overpower protection (ROP) system for CANDU® reactors has recently been developed. The simulated annealing (SA) stochastic optimization technique is utilized to come up with a quasi-optimized detector layout for the ROP systems. Within each simulated annealing history, the objective function is calculated as a function of the trip set point (TSP) corresponding to the detector layout for that particular history. The evaluation of the TSP is done probabilistically using the ROVER-F code. Since thousands of candidate detector layouts are evaluated during each optimization execution, the overall optimization process is time consuming. Since the number of fuelling ripples controls the execution time of each ROVER-F evaluation, reducing the number of fuelling ripples used during the calculation of the TSP will reduce the overall optimization execution time. This approach has been investigated and the results are presented in this paper. The challenge is to construct a set of representative fuelling ripples which will significantly speed up the optimization process while guaranteeing that the resulting detector layout has similar quality to the ones produced when the complete set of fuelling ripples is employed. Results presented in this paper indicate that a speedup of up to around 40 times is attainable when this approach is utilized. (author)
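
    The optimization itself is a standard simulated annealing loop in which each candidate layout is scored by an expensive evaluation (here, the probabilistic TSP calculation via ROVER-F). The skeleton below is a generic SA sketch with placeholder objective and move functions; none of the names or parameters come from ADORE or ROVER-F.

```python
import math
import random

def simulated_annealing(initial, objective, neighbour,
                        t_start=1.0, t_end=1e-3, cooling=0.95, moves_per_temp=50):
    """Generic simulated annealing minimisation skeleton.

    'objective' plays the role of the expensive per-layout evaluation;
    'neighbour' proposes a modified candidate layout.
    """
    current, current_cost = initial, objective(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            candidate = neighbour(current)
            cost = objective(candidate)              # the dominant expense in practice
            accept = cost < current_cost or random.random() < math.exp((current_cost - cost) / t)
            if accept:
                current, current_cost = candidate, cost
                if cost < best_cost:
                    best, best_cost = candidate, cost
        t *= cooling                                  # geometric cooling schedule
    return best, best_cost

# toy usage: minimise a 1-D quadratic by perturbing a single number
best, cost = simulated_annealing(
    initial=10.0,
    objective=lambda x: (x - 3.0) ** 2,
    neighbour=lambda x: x + random.uniform(-0.5, 0.5))
print(best, cost)
```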

  11. A Novel Approach to Fast SOLIS Stokes Inversion for Photospheric Vector Magnetography.

    Science.gov (United States)

    Harker, Brian; Mighell, K.

    2009-05-01

    The SOLIS (Synoptic Optical Long-term Investigations of the Sun) Vector Spectromagnetograph (VSM) is a full-disc spectropolarimeter, located at Kitt Peak National Observatory, which records Zeeman-induced polarization in the magnetically-sensitive FeI spectral lines at 630.15 nm and 630.25 nm. A SOLIS VSM full-disc dataset consists of 2048 scanlines, with each scanline containing the Stokes I, Q, U, and V spectral line profiles in 128 unique wavelength bins for all 2048 pixels in the scanline. These Stokes polarization profiles are inverted to obtain the magnetic and thermodynamic structure of the observations, based on a model Milne-Eddington plane-parallel atmosphere. Until recently, this has been a compute-intensive, relatively slow process. This poster presents a novel method of producing such model-based characterizations of the photospheric magnetic field by utilizing an inversion engine based on a genetic algorithm. The algorithm executes in a heterogeneous compute environment composed of both a CPU and a graphics processing unit (GPU). Using the cutting-edge NVIDIA CUDA platform, we are able to offload the compute-intensive portions of the inversion code to the GPU, which results in significant speedup. This speedup provides the impetus which has driven the development of this strategy. Currently, SOLIS vector magnetic field products are generated with a modified version of the HAO ASP inversion code developed by Skumanich & Lites (1987), and these data products are made available to the scientific community 24 hours after the actual observation(s). With this work, we aim to drastically reduce this waiting period to allow near real-time characterizations of the photospheric vector magnetic field. We detail the inversion method we have pioneered, present preliminary results on the derived full-disc magnetic field as well as timing/speedup considerations, and finally offer some outlooks on the future direction of this work.

  12. Deterministic and stochastic transport theories for the analysis of complex nuclear systems

    International Nuclear Information System (INIS)

    Giffard, F.X.

    2000-01-01

    In the field of reactor and fuel cycle physics, particle transport plays an important role. Neutronic design, operation and evaluation calculations of nuclear systems make use of large and powerful computer codes. However, current limitations in terms of computer resources make it necessary to introduce simplifications and approximations in order to keep calculation time and cost within reasonable limits. Two different types of methods are available in these codes. The first one is the deterministic method, which is applicable in most practical cases but requires approximations. The other method is the Monte Carlo method, which does not make these approximations but which generally requires exceedingly long running times. The main motivation of this work is to investigate the possibility of a combined use of the two methods in such a way as to retain their advantages while avoiding their drawbacks. Our work has mainly focused on the speed-up of 3-D continuous energy Monte Carlo calculations (TRIPOLI-4 code) by means of an optimized biasing scheme derived from importance maps obtained from the deterministic code ERANOS. The application of this method to two different practical shielding-type problems has demonstrated its efficiency: speed-up factors of 100 have been reached. In addition, the method offers the advantage of being easily implemented as it is not very sensitive to the choice of the importance mesh grid. It has also been demonstrated that significant speed-ups can be achieved by this method in the case of coupled neutron-gamma transport problems, provided that the interdependence of the neutron and photon importance maps is taken into account. Complementary studies are necessary to tackle a problem brought out by this work, namely undesirable jumps in the Monte Carlo variance estimates. (author)
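    As a schematic illustration of the coupling described above (not the actual TRIPOLI-4/ERANOS biasing scheme), the Python sketch below shows the classic splitting/Russian-roulette mechanism driven by an importance map such as one derived from a deterministic adjoint calculation; the cell names and data structures are hypothetical.

        import random

        def split_or_roulette(tracks, importance, seed=0):
            """Population control driven by an importance map I(cell), e.g. obtained
            from a deterministic adjoint (ERANOS-style) calculation.  Each track is a
            (cell_from, cell_to, weight) triple; crossing into a more important cell
            splits the particle, crossing into a less important one plays Russian
            roulette, keeping the expected total weight unchanged."""
            rng = random.Random(seed)
            out = []
            for cell_from, cell_to, weight in tracks:
                ratio = importance[cell_to] / importance[cell_from]
                if ratio >= 1.0:                          # splitting
                    n = int(ratio)
                    if rng.random() < ratio - n:          # stochastic rounding
                        n += 1
                    out.extend((cell_to, weight / ratio) for _ in range(n))
                else:                                     # Russian roulette
                    if rng.random() < ratio:
                        out.append((cell_to, weight / ratio))
            return out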

  13. Developments based on stochastic and determinist methods for studying complex nuclear systems

    International Nuclear Information System (INIS)

    Giffard, F.X.

    2000-01-01

    In the field of reactor and fuel cycle physics, particle transport plays an important role. Neutronic design, operation and evaluation calculations of nuclear systems make use of large and powerful computer codes. However, current limitations in terms of computer resources make it necessary to introduce simplifications and approximations in order to keep calculation time and cost within reasonable limits. Two different types of methods are available in these codes. The first one is the deterministic method, which is applicable in most practical cases but requires approximations. The other method is the Monte Carlo method, which does not make these approximations but which generally requires exceedingly long running times. The main motivation of this work is to investigate the possibility of a combined use of the two methods in such a way as to retain their advantages while avoiding their drawbacks. Our work has mainly focused on the speed-up of 3-D continuous energy Monte Carlo calculations (TRIPOLI-4 code) by means of an optimized biasing scheme derived from importance maps obtained from the deterministic code ERANOS. The application of this method to two different practical shielding-type problems has demonstrated its efficiency: speed-up factors of 100 have been reached. In addition, the method offers the advantage of being easily implemented as it is not very sensitive to the choice of the importance mesh grid. It has also been demonstrated that significant speed-ups can be achieved by this method in the case of coupled neutron-gamma transport problems, provided that the interdependence of the neutron and photon importance maps is taken into account. Complementary studies are necessary to tackle a problem brought out by this work, namely undesirable jumps in the Monte Carlo variance estimates. (author)

  14. Developments based on stochastic and determinist methods for studying complex nuclear systems; Developpements utilisant des methodes stochastiques et deterministes pour l'analyse de systemes nucleaires complexes

    Energy Technology Data Exchange (ETDEWEB)

    Giffard, F.X

    2000-05-19

    In the field of reactor and fuel cycle physics, particle transport plays an important role. Neutronic design, operation and evaluation calculations of nuclear systems make use of large and powerful computer codes. However, current limitations in terms of computer resources make it necessary to introduce simplifications and approximations in order to keep calculation time and cost within reasonable limits. Two different types of methods are available in these codes. The first one is the deterministic method, which is applicable in most practical cases but requires approximations. The other method is the Monte Carlo method, which does not make these approximations but which generally requires exceedingly long running times. The main motivation of this work is to investigate the possibility of a combined use of the two methods in such a way as to retain their advantages while avoiding their drawbacks. Our work has mainly focused on the speed-up of 3-D continuous energy Monte Carlo calculations (TRIPOLI-4 code) by means of an optimized biasing scheme derived from importance maps obtained from the deterministic code ERANOS. The application of this method to two different practical shielding-type problems has demonstrated its efficiency: speed-up factors of 100 have been reached. In addition, the method offers the advantage of being easily implemented as it is not very sensitive to the choice of the importance mesh grid. It has also been demonstrated that significant speed-ups can be achieved by this method in the case of coupled neutron-gamma transport problems, provided that the interdependence of the neutron and photon importance maps is taken into account. Complementary studies are necessary to tackle a problem brought out by this work, namely undesirable jumps in the Monte Carlo variance estimates. (author)

  16. Fast acceleration of 2D wave propagation simulations using modern computational accelerators.

    Directory of Open Access Journals (Sweden)

    Wei Wang

    Full Text Available Recent developments in modern computational accelerators like Graphics Processing Units (GPUs and coprocessors provide great opportunities for making scientific applications run faster than ever before. However, efficient parallelization of scientific code using new programming tools like CUDA requires a high level of expertise that is not available to many scientists. This, plus the fact that parallelized code is usually not portable to different architectures, creates major challenges for exploiting the full capabilities of modern computational accelerators. In this work, we sought to overcome these challenges by studying how to achieve both automated parallelization using OpenACC and enhanced portability using OpenCL. We applied our parallelization schemes using GPUs as well as Intel Many Integrated Core (MIC coprocessor to reduce the run time of wave propagation simulations. We used a well-established 2D cardiac action potential model as a specific case-study. To the best of our knowledge, we are the first to study auto-parallelization of 2D cardiac wave propagation simulations using OpenACC. Our results identify several approaches that provide substantial speedups. The OpenACC-generated GPU code achieved more than 150x speedup above the sequential implementation and required the addition of only a few OpenACC pragmas to the code. An OpenCL implementation provided speedups on GPUs of at least 200x faster than the sequential implementation and 30x faster than a parallelized OpenMP implementation. An implementation of OpenMP on Intel MIC coprocessor provided speedups of 120x with only a few code changes to the sequential implementation. We highlight that OpenACC provides an automatic, efficient, and portable approach to achieve parallelization of 2D cardiac wave simulations on GPUs. Our approach of using OpenACC, OpenCL, and OpenMP to parallelize this particular model on modern computational accelerators should be applicable to other

  17. Time-domain, nuclear-resonant, forward scattering: the classical approach

    International Nuclear Information System (INIS)

    Hoy, G.R.

    1997-01-01

    This paper deals with the interaction of electromagnetic radiation with matter, assuming the matter to have nuclear transitions in resonance with the incident electromagnetic radiation. The source of the radiation is taken to be of two types: natural radioactive gamma decay and synchrotron radiation. Numerical examples using 57Fe are given for the two types of source radiation, and calculated results are contrasted for the two cases. Electromagnetic radiation produced by recoil-free gamma-ray emission has essentially the natural linewidth. Electromagnetic radiation from a synchrotron, even with the best monochromators available, has a relatively broad-band spectrum, essentially constant for these considerations. Polarization effects are considered. In general, the nuclear-resonant medium changes the polarization of the input radiation on traversing the medium. Calculations are presented to illustrate that synchrotron radiation studies using nuclear-resonant forward scattering have the potential for making high-precision measurements of hyperfine fields and recoilless fractions. An interesting aspect of nuclear-resonant forward scattering, relative to possible gamma-ray laser development, is the so-called 'speed-up' effect.

  18. Clinical evaluation of PET image quality as a function of acquisition time in a new TOF-PET/MR compared to TOF-PET/CT - initial results

    International Nuclear Information System (INIS)

    Zeimpekis, Konstantinos; Huellner, Martin; De Galiza Barbosa, Felipe; Ter Voert, Edwin; Davison, Helen; Delso, Gaspar; Veit-Haibach, Patrick

    2015-01-01

    The recently available integrated PET/MR imaging can offer significant additional advances in clinical imaging. The purpose of this study was to compare the PET performance between a PET/CT scanner and an integrated TOF-PET/MR scanner concerning image quality parameters and quantification in terms of SUV as a function of acquisition time (a surrogate of dose). Five brain and five whole-body patients were included in the study. The PET/CT scan was used as a reference and the PET/MR acquisition time was consecutively adjusted, taking into account the decay between the scans in order to expose both systems to the same amount of emitted signal. The acquisition times were then retrospectively reduced to assess the performance of the PET/MR for lower count rates. Image quality, image sharpness, artifacts and noise were evaluated. SUV measurements were taken in the liver and in white matter to compare quantification. Quantitative evaluation showed good correlation between PET/CT and PET/MR brain SUVs. Liver correlation was lower, with uptake underestimation in PET/MR, partially justified by bio-redistribution. The clinical evaluation showed that PET/MR offers higher image quality and sharpness with lower levels of noise and artifacts compared to PET/CT with reduced acquisition times for whole-body scans, while for brain scans there is no significant difference. The PET component of the TOF-PET/MR showed higher image quality compared to PET/CT as tested with reduced imaging times. However, these results account mainly for body imaging, while no significant differences were found in brain imaging. This overall higher image quality suggests that the acquisition time or injected activity can be reduced by at least 37% on the PET/MR scanner.

  19. Clinical evaluation of PET image quality as a function of acquisition time in a new TOF-PET/MR compared to TOF-PET/CT - initial results

    Energy Technology Data Exchange (ETDEWEB)

    Zeimpekis, Konstantinos; Huellner, Martin; De Galiza Barbosa, Felipe; Ter Voert, Edwin; Davison, Helen; Delso, Gaspar; Veit-Haibach, Patrick [Nuclear Medicine, University Hospital Zurich (Switzerland)

    2015-05-18

    The recently available integrated PET/MR imaging can offer significant additional advances in clinical imaging. The purpose of this study was to compare the PET performance between a PET/CT scanner and an integrated TOF-PET/MR scanner concerning image quality parameters and quantification in terms of SUV as a function of acquisition time (a surrogate of dose). Five brain and five whole-body patients were included in the study. The PET/CT scan was used as a reference and the PET/MR acquisition time was consecutively adjusted, taking into account the decay between the scans in order to expose both systems to the same amount of emitted signal. The acquisition times were then retrospectively reduced to assess the performance of the PET/MR for lower count rates. Image quality, image sharpness, artifacts and noise were evaluated. SUV measurements were taken in the liver and in white matter to compare quantification. Quantitative evaluation showed good correlation between PET/CT and PET/MR brain SUVs. Liver correlation was lower, with uptake underestimation in PET/MR, partially justified by bio-redistribution. The clinical evaluation showed that PET/MR offers higher image quality and sharpness with lower levels of noise and artifacts compared to PET/CT with reduced acquisition times for whole-body scans, while for brain scans there is no significant difference. The PET component of the TOF-PET/MR showed higher image quality compared to PET/CT as tested with reduced imaging times. However, these results account mainly for body imaging, while no significant differences were found in brain imaging. This overall higher image quality suggests that the acquisition time or injected activity can be reduced by at least 37% on the PET/MR scanner.

  20. Parallel statistical image reconstruction for cone-beam x-ray CT on a shared memory computation platform

    International Nuclear Information System (INIS)

    Kole, J S; Beekman, F J

    2005-01-01

    Statistical reconstruction methods offer possibilities of improving image quality as compared to analytical methods, but current reconstruction times prohibit routine clinical applications. To reduce reconstruction times we have parallelized a statistical reconstruction algorithm for cone-beam x-ray CT, the ordered subset convex algorithm (OSC), and evaluated it on a shared memory computer. Two different parallelization strategies were developed: one that employs parallelism by computing the work for all projections within a subset in parallel, and one that divides the total volume into parts and processes the work for each sub-volume in parallel. Both methods are used to reconstruct a three-dimensional mathematical phantom on two different grid densities. The reconstructed images are binary identical to the result of the serial (non-parallelized) algorithm. The speed-up factor is approximately 30 when using 32 to 40 processors, and scales almost linearly with the number of CPUs for both methods. The huge reduction in computation time allows us to apply statistical reconstruction to clinically relevant studies for the first time.
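    A minimal sketch of the second parallelization strategy described above (partitioning the volume), assuming a stand-in slab-update function rather than the actual ordered subset convex update; the projection-parallel strategy would instead distribute the projections within a subset across workers.

        from concurrent.futures import ProcessPoolExecutor
        import numpy as np

        def parallel_subvolume_update(volume, projections, update_slab, n_workers=8):
            """Volume-partition strategy, sketched: split the reconstruction volume
            into z-slabs and let each worker apply one iteration of the update to its
            slab.  'update_slab(slab, projections)' is a placeholder for the OSC slab
            update (forward/back projection of that sub-volume)."""
            slabs = np.array_split(volume, n_workers, axis=0)
            with ProcessPoolExecutor(max_workers=n_workers) as pool:
                updated = list(pool.map(update_slab, slabs, [projections] * n_workers))
            return np.concatenate(updated, axis=0)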

  1. Comparing personal alpha dosimetry with the conventional area monitoring-time weighting methods of exposure estimation: a Canadian assessment

    International Nuclear Information System (INIS)

    Balint, A.B.; Viljoen, J.

    1988-01-01

    An experimental personal alpha dosimetry programme for monitoring the exposures of uranium mining facility workers in Canada has been completed. All licensed operating mining facilities participated. Dosimetry techniques, descriptions of the dosimeters used by licensees, performance and problems associated with the implementation of the programme, as well as technical and administrative advantages and difficulties experienced, are discussed. The area monitoring-time weighting methods used, the individual radon and thoron daughter exposure results they produced, and the exposure results generated using dosimeters are assessed and compared.

  2. Multiple time step molecular dynamics in the optimized isokinetic ensemble steered with the molecular theory of solvation: Accelerating with advanced extrapolation of effective solvation forces

    International Nuclear Information System (INIS)

    Omelyan, Igor; Kovalenko, Andriy

    2013-01-01

    We develop efficient handling of solvation forces in the multiscale method of multiple time step molecular dynamics (MTS-MD) of a biomolecule steered by the solvation free energy (effective solvation forces) obtained from the 3D-RISM-KH molecular theory of solvation (three-dimensional reference interaction site model complemented with the Kovalenko-Hirata closure approximation). To reduce the computational expenses, we calculate the effective solvation forces acting on the biomolecule by using advanced solvation force extrapolation (ASFE) at inner time steps while converging the 3D-RISM-KH integral equations only at large outer time steps. The idea of ASFE consists in developing a discrete non-Eckart rotational transformation of atomic coordinates that minimizes the distances between the atomic positions of the biomolecule at different time moments. The effective solvation forces for the biomolecule in a current conformation at an inner time step are then extrapolated in the transformed subspace of those at outer time steps by using a modified least square fit approach applied to a relatively small number of the best force-coordinate pairs. The latter are selected from an extended set collecting the effective solvation forces obtained from 3D-RISM-KH at outer time steps over a broad time interval. The MTS-MD integration with effective solvation forces obtained by converging 3D-RISM-KH at outer time steps and applying ASFE at inner time steps is stabilized by employing the optimized isokinetic Nosé-Hoover chain (OIN) ensemble. Compared to the previous extrapolation schemes used in combination with the Langevin thermostat, the ASFE approach substantially improves the accuracy of evaluation of effective solvation forces and in combination with the OIN thermostat enables a dramatic increase of outer time steps. We demonstrate on a fully flexible model of alanine dipeptide in aqueous solution that the MTS-MD/OIN/ASFE/3D-RISM-KH multiscale method of molecular dynamics
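    To illustrate the general multiple-time-step idea discussed above (not the actual OIN/ASFE formulation, which involves a non-Eckart coordinate transformation and a least-squares fit over many stored force-coordinate pairs), here is a toy Python loop in which the expensive solvation force is recomputed only at outer steps and a user-supplied extrapolation stands in for it at inner steps; all function names are placeholders.

        import numpy as np

        def mts_md(x, v, masses, fast_force, slow_force, extrapolate,
                   dt, n_outer, n_inner):
            """Toy multiple-time-step integrator: 'slow_force' (e.g. 3D-RISM-KH
            effective solvation forces) is evaluated once per outer step and stored;
            at inner steps 'extrapolate(history, x)' supplies a cheap estimate, while
            'fast_force' (intramolecular terms) is evaluated every inner step.
            x: (n_atoms, 3) coordinates, v: velocities, masses: (n_atoms,) array."""
            history = []                                  # (coords, slow force) pairs
            for _ in range(n_outer):
                f_slow = slow_force(x)                    # expensive, outer step only
                history.append((x.copy(), f_slow.copy()))
                for _ in range(n_inner):
                    f = fast_force(x) + extrapolate(history, x)   # kick
                    v = v + 0.5 * dt * f / masses[:, None]
                    x = x + dt * v                                # drift
                    f = fast_force(x) + extrapolate(history, x)   # kick
                    v = v + 0.5 * dt * f / masses[:, None]
            return x, v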

  3. INFLUENCE ANALYSES OF DESIGNED CHARACTERISTICS OF THE ELEVATOR TO THE PARAMETERS OF ITS DRIVE

    Directory of Open Access Journals (Sweden)

    V. M. Bohomaz

    2015-04-01

    Full Text Available Purpose. The drive is one of the basic elements of scoop belt elevators, and determining its power by the standard calculation methods takes considerable time. It is therefore necessary to analyse the influence of the design characteristics of a scoop belt elevator on the parameters of its high-speed drive and to construct an improved algorithm for rapid determination of the drive power from the design characteristics, taking into account the type of load, rise height, required productivity, and the standard parameters of scoops and belt. Methodology. Using parametric dependences of elevator drive power on the design characteristics obtained by the author earlier, an improved algorithm for rapid determination of the drive power of high-speed elevators with deep and shallow scoops, for a given type of load, productivity and rise height, is proposed. Findings. An algorithm for rapid determination of the drive power of high-speed vertical elevators with deep and shallow scoops as a function of the design parameters is offered. Its application is illustrated for an elevator intended for transporting cement: analytical dependences of the drive power of such an elevator on productivity and rise height are determined, the corresponding graphic dependences are built, and the character of the change in drive power with a change in any design characteristic is established. Originality. An improved algorithm for determining elevator drive power from the given design characteristics (type of load, rise height, productivity) that takes into account the standard sizes and parameters of scoops and belts is built for the first time. Practical value. The proposed calculation algorithm makes it possible to determine a reference value of the drive power of high-speed elevators with deep and shallow scoops relatively quickly and to build graphic dependences of the drive power on productivity and rise height at the certain
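    The parametric dependences derived in the paper are tied to standard scoop and belt sizes; as a generic illustration of how drive power scales with productivity and rise height, the sketch below uses only the textbook lifting-power relation with assumed loss and efficiency coefficients, not the author's algorithm.

        def elevator_drive_power(Q_tph, H_m, efficiency=0.85, k_losses=1.3, g=9.81):
            """Rough estimate of bucket-elevator drive power (kW) from productivity
            Q (t/h) and rise height H (m): lifting power scaled by an assumed loss
            factor (scooping, friction) and drive efficiency."""
            mass_flow = Q_tph * 1000.0 / 3600.0            # kg/s
            lifting_power = mass_flow * g * H_m            # W
            return k_losses * lifting_power / efficiency / 1000.0   # kW

        # e.g. elevator_drive_power(100, 30) -> about 12.5 kW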

  4. A comparative study of peripheral to central circulation delivery times between intraosseous and intravenous injection using a radionuclide technique in normovolemic and hypovolemic canines

    International Nuclear Information System (INIS)

    Cameron, J.L.; Fontanarosa, P.B.; Passalaqua, A.M.

    1989-01-01

    Intraosseous infusion is considered a useful technique for administration of medications and fluids in emergency situations when peripheral intravascular access is unobtainable. This study examined the effectiveness of intraosseous infusion for delivery of substances to the central circulation. Central deliveries of a radionuclide tracer administered by the intraosseous and intravenous routes were evaluated during normovolemic and hypovolemic states. Intraosseous infusion achieved peripheral-to-central circulation transit times comparable to those achieved by the intravenous route. Analysis of variance revealed no statistically significant differences between the peripheral-to-central delivery times for intraosseous and intravenous administration. The results demonstrate that intraosseous infusion is a rapid and effective method of delivery to the central circulation and is an alternative method for intravascular access. This study also suggests that a radionuclide tracer is useful for the evaluation of transit times following intraosseous injection.

  5. Least-squares Migration and Full Waveform Inversion with Multisource Frequency Selection

    KAUST Repository

    Huang, Yunsong

    2013-09-01

    Multisource Least-Squares Migration (LSM) of phase-encoded supergathers has shown great promise in reducing the computational cost of conventional migration. But for the marine acquisition geometry this approach faces the challenge of erroneous misfit due to the mismatch between the limited number of live traces/shot recorded in the field and the pervasive number of traces generated by the finite-difference modeling method. To tackle this mismatch problem, I present a frequency selection strategy with LSM of supergathers. The key idea is, at each LSM iteration, to assign a unique frequency band to each shot gather, so that the spectral overlap among those shots—and therefore their crosstalk—is zero. Consequently, each receiver can unambiguously identify and then discount the superfluous sources—those that are not associated with the receiver in marine acquisition. To compare with standard migration, I apply the proposed method to the 2D SEG/EAGE salt model and obtain better resolved images computed at about 1/8 the cost; results for the 3D SEG/EAGE salt model, with an Ocean Bottom Seismometer (OBS) survey, show a speedup of 40×. This strategy is next extended to multisource Full Waveform Inversion (FWI) of supergathers for marine streamer data, with the same advantages of computational efficiency and storage savings. In the Finite-Difference Time-Domain (FDTD) method, to mitigate spectral leakage due to delayed onsets of sine waves detected at receivers, I double the simulation time and retain only the second half of the simulated records. To compare with standard FWI, I apply the proposed method to the 2D velocity model of SEG/EAGE salt and to Gulf of Mexico (GOM) field data, and obtain speedups of about 4× and 8×, respectively. Formulas are then derived for the resolution limits of various constituent wavepaths pertaining to FWI: diving waves, primary reflections, diffractions, and multiple reflections. They suggest that inverting multiples can provide some low and intermediate
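    A minimal sketch of the frequency-selection bookkeeping described above: each shot gather gets a unique, non-overlapping frequency band, and the assignment is rotated from one iteration to the next so that every shot eventually contributes all frequencies. The band layout and rotation rule here are illustrative assumptions, not the thesis implementation.

        import numpy as np

        def assign_frequency_bands(n_shots, f_min, f_max, iteration):
            """Give each shot gather a unique frequency band (no spectral overlap,
            hence no crosstalk in the blended supergather) and cycle the assignment
            every iteration.  Returns a list of (f_low, f_high) per shot."""
            edges = np.linspace(f_min, f_max, n_shots + 1)
            bands = list(zip(edges[:-1], edges[1:]))
            shift = iteration % n_shots                   # cyclic re-assignment
            return [bands[(s + shift) % n_shots] for s in range(n_shots)]

        def bandpass_shot(shot_fft, freqs, band):
            """Zero all frequency components outside the shot's assigned band."""
            f_low, f_high = band
            mask = (np.abs(freqs) >= f_low) & (np.abs(freqs) < f_high)
            return shot_fft * mask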

  6. PCIU: Hardware Implementations of an Efficient Packet Classification Algorithm with an Incremental Update Capability

    Directory of Open Access Journals (Sweden)

    O. Ahmed

    2011-01-01

    Full Text Available Packet classification plays a crucial role for a number of network services such as policy-based routing, firewalls, and traffic billing, to name a few. However, classification can be a bottleneck in the above-mentioned applications if not implemented properly and efficiently. In this paper, we propose PCIU, a novel classification algorithm, which improves upon previously published work. PCIU provides lower preprocessing time, lower memory consumption, ease of incremental rule update, and reasonable classification time compared to state-of-the-art algorithms. The proposed algorithm was evaluated and compared to RFC and HiCut using several benchmarks. Results obtained indicate that PCIU outperforms these algorithms in terms of speed, memory usage, incremental update capability, and preprocessing time. The algorithm, furthermore, was improved and made more accessible for a variety of applications through implementation in hardware. Two such implementations are detailed and discussed in this paper. The results indicate that a hardware/software co-design approach yields a PCIU solution that is slower, but easier to optimize and improve within time constraints. A hardware accelerator based on an ESL approach using Handel-C, on the other hand, resulted in a 31x speed-up over a pure software implementation running on a state-of-the-art Xeon processor.

  7. Rate and time to develop first central line-associated bloodstream infections when comparing open and closed infusion containers in a Brazilian Hospital

    Directory of Open Access Journals (Sweden)

    Margarete Vilins

    Full Text Available The objective of the study was to determine the effect of switching from an open (glass or semi-rigid plastic) infusion container to a closed, fully collapsible plastic infusion container (Viaflex®) on the rate and time to onset of central line-associated bloodstream infections (CLABSI). An open-label, prospective cohort, active healthcare-associated infection surveillance, sequential study was conducted in three intensive care units in Brazil. The CLABSI rate using open infusion containers was compared to the rate using a closed infusion container. The probability of acquiring CLABSI was assessed over time and compared between the open and closed infusion container periods; three-day intervals were examined. A total of 1125 adult ICU patients were enrolled. The CLABSI rate was significantly higher during the open than during the closed infusion container period (6.5 versus 3.2 CLABSI/1000 CL days; RR=0.49, 95%CI=0.26-0.95, p=0.031). During the closed infusion container period, the probability of acquiring a CLABSI remained relatively constant over the time of central line use (0.8% Days 2-4 to 0.7% Days 11-13) but increased in the open infusion container period (1.5% Days 2-4 to 2.3% Days 11-13). Combined across all time intervals, the chance of a patient acquiring a CLABSI was significantly lower (by 55%) in the closed infusion container period (Cox proportional hazard ratio 0.45, p=0.019). CLABSIs can be reduced with the use of full barrier precautions, education, and performance feedback. Our results show that switching from an open to a closed infusion container may further reduce the CLABSI rate as well as delay the onset of CLABSIs. Closed infusion containers significantly reduced the CLABSI rate and the probability of acquiring CLABSI.

  8. Times of Crisis – From a Comparative Perspective

    Directory of Open Access Journals (Sweden)

    Gabriela Marchis

    2011-10-01

    Full Text Available Are we cursed to live in the tumultuous times we are now crossing? One of the most frequently heard questions nowadays is: what is the economic crisis and how has it manifested itself over the years? We ask about its causes and consequences and, most of all, when it will end. Economic crises are forms of disruption to economic life, due in large part to "overproduction". The term "overproduction" does not refer here to an output exceeding society's needs, but to the situation in which those needs remain uncovered while demand drops due to lack of funds. This major financial crisis affected the economies of all countries in all their segments: industry, agriculture, construction, trade, transport, and so on, owing to the close links between countries as a natural consequence of globalization. Thus the current financial and economic crisis has affected the industries on which the entire world economy relies. But, from an economic perspective, the crisis is not a surprise, given that economic cycles repeat. This paper tries to identify the similarities with previous economic downturns, as a necessary step toward learning the lessons of the past.

  9. Decreased Time to Return to Work Using Robotic-Assisted Unicompartmental Knee Arthroplasty Compared to Conventional Techniques.

    Science.gov (United States)

    Jinnah, Alexander H; Augart, Marco A; Lara, Daniel L; Jinnah, Riyaz H; Poehling, Gary G; Gwam, Chukwuweike U; Plate, Johannes F

    2018-06-01

    Unicompartmental knee arthroplasty (UKA) is a commonly used procedure for patients suffering from debilitating unicompartmental knee arthritis. For UKA recipients, robotic-assisted surgery has served as an aid in improving surgical accuracy and precision. While studies exist detailing outcomes of robotic UKA, to our knowledge, there are no studies assessing time to return to work using robotic-assisted UKA. Thus, the purpose of this study was to prospectively assess the time to return to work and the level of work activity achieved following robotic-assisted UKA, to create recommendations for patients preoperatively. We hypothesized that the return to work time would be shorter for robotic-assisted UKAs compared with TKAs and manual UKAs, due to more accurate ligament balancing and precise implementation of the operative plan. Thirty consecutive patients scheduled to undergo a robotic-assisted UKA at an academic teaching hospital were prospectively enrolled in the study. Inclusion criteria included employment at the time of surgery, with the intent of returning to the same occupation following surgery, and end-stage knee degenerative joint disease (DJD) limited to the medial compartment. Patients were contacted via email, letter, or phone at two, four, six, and 12 weeks following surgery until they returned to work. The Baecke physical activity questionnaire (BQ) was administered to assess patients' level of activity at work pre- and postoperatively. Statistical analysis was performed using SAS Enterprise Guide (SAS Institute Inc., Cary, North Carolina) and Excel® (Microsoft Corporation, Redmond, Washington). Descriptive statistics were calculated to assess the demographics of the patient population. Boxplots were generated using an Excel® spreadsheet to visualize the BQ scores, and a two-tailed t-test was used to assess for differences between pre- and postoperative scores with alpha = 0.05. The mean time to return to work was 6.4 weeks (SD=3.4, range 2

  10. A robust variant of block Jacobi-Davidson for extracting a large number of eigenpairs: Application to grid-based real-space density functional theory

    Science.gov (United States)

    Lee, M.; Leiter, K.; Eisner, C.; Breuer, A.; Wang, X.

    2017-09-01

    In this work, we investigate a block Jacobi-Davidson (J-D) variant suitable for sparse symmetric eigenproblems where a substantial number of extremal eigenvalues are desired (e.g., ground-state real-space quantum chemistry). Most J-D algorithm variations tend to slow down as the number of desired eigenpairs increases due to frequent orthogonalization against a growing list of solved eigenvectors. In our specification of block J-D, all of the steps of the algorithm are performed in clusters, including the linear solves, which allows us to greatly reduce computational effort with blocked matrix-vector multiplies. In addition, we move orthogonalization against locked eigenvectors and working eigenvectors outside of the inner loop but retain the single Ritz vector projection corresponding to the index of the correction vector. Furthermore, we minimize the computational effort by constraining the working subspace to the current vectors being updated and the latest set of corresponding correction vectors. Finally, we incorporate accuracy thresholds based on the precision required by the Fermi-Dirac distribution. The net result is a significant reduction in the computational effort against most previous block J-D implementations, especially as the number of wanted eigenpairs grows. We compare our approach with another robust implementation of block J-D (JDQMR) and the state-of-the-art Chebyshev filter subspace (CheFSI) method for various real-space density functional theory systems. Versus CheFSI, for first-row elements, our method yields competitive timings for valence-only systems and 4-6× speedups for all-electron systems with up to 10× reduced matrix-vector multiplies. For all-electron calculations on larger elements (e.g., gold) where the wanted spectrum is quite narrow compared to the full spectrum, we observe 60× speedup with 200× fewer matrix-vector multiples vs. CheFSI.

  11. Multidimensional upwind hydrodynamics on unstructured meshes using graphics processing units - I. Two-dimensional uniform meshes

    Science.gov (United States)

    Paardekooper, S.-J.

    2017-08-01

    We present a new method for numerical hydrodynamics which uses a multidimensional generalization of the Roe solver and operates on an unstructured triangular mesh. The main advantage over traditional methods based on Riemann solvers, which commonly use one-dimensional flux estimates as building blocks for a multidimensional integration, is its inherently multidimensional nature, and as a consequence its ability to recognize multidimensional stationary states that are not hydrostatic. A second novelty is the focus on graphics processing units (GPUs). By tailoring the algorithms specifically to GPUs, we are able to get speedups of 100-250 compared to a desktop machine. We compare the multidimensional upwind scheme to a traditional, dimensionally split implementation of the Roe solver on several test problems, and we find that the new method significantly outperforms the Roe solver in almost all cases. This comes with increased computational costs per time-step, which makes the new method approximately a factor of 2 slower than a dimensionally split scheme acting on a structured grid.

  12. Completing the Physical Representation of Quantum Algorithms Provides a Quantitative Explanation of Their Computational Speedup

    Science.gov (United States)

    Castagnoli, Giuseppe

    2018-03-01

    The usual representation of quantum algorithms, limited to the process of solving the problem, is physically incomplete. We complete it in three steps: (i) extending the representation to the process of setting the problem, (ii) relativizing the extended representation to the problem solver, to whom the problem setting must be concealed, and (iii) symmetrizing the relativized representation for time reversal to represent the reversibility of the underlying physical process. The third step projects the input state of the representation, where the problem solver is completely ignorant of the setting and thus the solution of the problem, onto one where she knows half of the solution (half of the information specifying it when the solution is an unstructured bit string). Completing the physical representation shows that the number of computation steps (oracle queries) required to solve any oracle problem in an optimal quantum way should be that of a classical algorithm endowed with advance knowledge of half of the solution.

  13. VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites.

    Science.gov (United States)

    Spinozzi, Giulio; Calabria, Andrea; Brasca, Stefano; Beretta, Stefano; Merelli, Ivan; Milanesi, Luciano; Montini, Eugenio

    2017-11-25

    Bioinformatics tools designed to identify lentiviral or retroviral vector insertion sites in the genome of host cells are used to address the safety and long-term efficacy of hematopoietic stem cell gene therapy applications and to study the clonal dynamics of hematopoietic reconstitution. The increasing number of gene therapy clinical trials, combined with the increasing amount of Next Generation Sequencing data aimed at identifying integration sites, requires computational software that is both highly accurate and efficient, able to correctly process "big data" in a reasonable computational time. Here we present VISPA2 (Vector Integration Site Parallel Analysis, version 2), the latest optimized computational pipeline for integration site identification and analysis, with the following features: (1) the sequence analysis for integration site processing is fully compliant with paired-end reads and includes a sequence quality filter before and after the alignment on the target genome; (2) a heuristic algorithm to reduce false positive integration sites at the nucleotide level, reducing the impact of Polymerase Chain Reaction or trimming/alignment artifacts; (3) a classification and annotation module for integration sites; (4) a user-friendly web interface as the researcher front-end to perform integration site analyses without computational skills; (5) the time speedup of all steps through parallelization (Hadoop-free). We tested VISPA2 performance using simulated and real datasets of lentiviral vector integration sites, previously obtained from patients enrolled in a hematopoietic stem cell gene therapy clinical trial, and compared the results with other preexisting tools for integration site analysis. On the computational side, VISPA2 showed a > 6-fold speedup and improved precision and recall metrics (1 and 0.97, respectively) compared to previously developed computational pipelines. These performances indicate that VISPA2 is a fast, reliable and user-friendly tool for

  14. A Benchmark for Comparing Different Approaches for Specifying and Verifying Real-Time Systems

    Science.gov (United States)

    1993-01-01

    To be considered correct or useful, real-time systems must deliver results within specified time intervals, either without exception or with high...probability. Recently, a large number of formal methods have been invented for specifying and verifying real-time systems. It has been suggested that...these formal methods need to be tested out on actual real-time systems. Such testing will allow the scalability of the methods to be assessed and also

  15. Increased detection of mastitis pathogens by real-time PCR compared to bacterial culture.

    Science.gov (United States)

    Keane, O M; Budd, K E; Flynn, J; McCoy, F

    2013-09-21

    Rapid and accurate identification of mastitis pathogens is important for disease control. Bacterial culture and isolate identification is considered the gold standard in mastitis diagnosis but is time consuming and results in many culture-negative samples. Identification of mastitis pathogens by PCR has been proposed as a fast and sensitive alternative to bacterial culture. The results of bacterial culture and PCR for the identification of the aetiological agent of clinical mastitis were compared. The pathogen identified by traditional culture methods was also detected by PCR in 98 per cent of cases indicating good agreement between the positive results of bacterial culture and PCR. A mastitis pathogen could not be recovered from approximately 30 per cent of samples by bacterial culture, however, an aetiological agent was identified by PCR in 79 per cent of these samples. Therefore, a mastitis pathogen was detected in significantly more milk samples by PCR than by bacterial culture (92 per cent and 70 per cent, respectively) although the clinical relevance of PCR-positive culture-negative results remains controversial. A mixed infection of two or more mastitis pathogens was also detected more commonly by PCR. Culture-negative samples due to undetected Staphylococcus aureus infections were rare. The use of PCR technology may assist in rapid mastitis diagnosis, however, accurate interpretation of PCR results in the absence of bacterial culture remains problematic.

  16. PEM-PCA: A Parallel Expectation-Maximization PCA Face Recognition Architecture

    Directory of Open Access Journals (Sweden)

    Kanokmon Rujirakul

    2014-01-01

    Full Text Available Principal component analysis or PCA has been traditionally used as one of the feature extraction techniques in face recognition systems, yielding high accuracy when requiring a small number of features. However, the covariance matrix and eigenvalue decomposition stages cause high computational complexity, especially for a large database. Thus, this research presents an alternative approach utilizing an Expectation-Maximization algorithm to reduce the determinant matrix manipulation, resulting in a reduction of the stages' complexity. To improve the computational time, a novel parallel architecture was employed to utilize the benefits of parallelization of matrix computation during the feature extraction and classification stages, including parallel preprocessing and their combinations, in the so-called Parallel Expectation-Maximization PCA architecture. Compared to a traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision, leading to high-speed face recognition systems, that is, a speed-up of over nine times and three times relative to PCA and Parallel PCA, respectively.
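    For reference, here is a serial sketch of the expectation-maximization route to PCA (in the spirit of Roweis' EM-PCA), which avoids forming and eigendecomposing the full covariance matrix; the parallel preprocessing, feature-extraction and classification stages of the PEM-PCA architecture are not shown, and the variable names are assumptions.

        import numpy as np

        def em_pca(Y, n_components, n_iter=50, seed=0):
            """EM for PCA: alternate
               E-step  X = (W^T W)^{-1} W^T Y
               M-step  W = Y X^T (X X^T)^{-1}
            so no d x d covariance matrix or eigendecomposition is needed.
            Y is (d, n) with the mean already removed; returns an orthonormal
            basis of the leading principal subspace."""
            rng = np.random.default_rng(seed)
            d, n = Y.shape
            W = rng.standard_normal((d, n_components))
            for _ in range(n_iter):
                X = np.linalg.solve(W.T @ W, W.T @ Y)       # E-step
                W = Y @ X.T @ np.linalg.inv(X @ X.T)        # M-step
            Q, _ = np.linalg.qr(W)                          # orthonormalize the subspace
            return Q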

  17. PEM-PCA: a parallel expectation-maximization PCA face recognition architecture.

    Science.gov (United States)

    Rujirakul, Kanokmon; So-In, Chakchai; Arnonkijpanich, Banchar

    2014-01-01

    Principal component analysis or PCA has been traditionally used as one of the feature extraction techniques in face recognition systems, yielding high accuracy when requiring a small number of features. However, the covariance matrix and eigenvalue decomposition stages cause high computational complexity, especially for a large database. Thus, this research presents an alternative approach utilizing an Expectation-Maximization algorithm to reduce the determinant matrix manipulation, resulting in a reduction of the stages' complexity. To improve the computational time, a novel parallel architecture was employed to utilize the benefits of parallelization of matrix computation during the feature extraction and classification stages, including parallel preprocessing and their combinations, in the so-called Parallel Expectation-Maximization PCA architecture. Compared to a traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision, leading to high-speed face recognition systems, that is, a speed-up of over nine times and three times relative to PCA and Parallel PCA, respectively.

  18. Research on Palmprint Identification Method Based on Quantum Algorithms

    Directory of Open Access Journals (Sweden)

    Hui Li

    2014-01-01

    Full Text Available Quantum image recognition is a technology that uses quantum algorithms to process image information; it can obtain better results than classical algorithms. In this paper, four different quantum algorithms are used in the three stages of palmprint recognition. First, a quantum adaptive median filtering algorithm is presented for palmprint filtering; the comparison shows that the quantum filtering algorithm achieves a better filtering result than the classical algorithm. Next, the quantum Fourier transform (QFT) is used to extract pattern features in only one operation, owing to quantum parallelism. The proposed algorithm exhibits an exponential speed-up compared with the discrete Fourier transform in the feature extraction. Finally, quantum set operations and the Grover algorithm are used in palmprint matching. According to the experimental results, the quantum algorithm needs only on the order of the square root of N operations to find the target palmprint, whereas the traditional method needs N calculations. At the same time, the matching accuracy of the quantum algorithm is almost 100%.
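    To make the claimed scaling concrete, the sketch below is a small classical simulation of Grover-style amplitude amplification: roughly (pi/4)·sqrt(N) oracle applications concentrate the amplitude on the matching item, versus about N lookups for a linear scan. It illustrates the quantum search primitive only, not the authors' palmprint matching pipeline.

        import numpy as np

        def grover_search(n_items, target, n_iter=None):
            """Classical simulation of Grover amplitude amplification over n_items
            basis states.  The 'oracle' flips the sign of the target amplitude;
            the diffusion operator reflects all amplitudes about their mean."""
            amps = np.full(n_items, 1.0 / np.sqrt(n_items))   # uniform superposition
            if n_iter is None:
                n_iter = int(np.floor(np.pi / 4 * np.sqrt(n_items)))
            for _ in range(n_iter):
                amps[target] *= -1.0                          # oracle: mark the target
                amps = 2.0 * amps.mean() - amps               # diffusion step
            return int(np.argmax(amps ** 2)), amps[target] ** 2

        # e.g. grover_search(1024, 123) -> (123, ~0.999) after 25 oracle calls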

  19. FPGA-Based Efficient Hardware/Software Co-Design for Industrial Systems with Consideration of Output Selection

    Science.gov (United States)

    Deliparaschos, Kyriakos M.; Michail, Konstantinos; Zolotas, Argyrios C.; Tzafestas, Spyros G.

    2016-05-01

    This work presents a field programmable gate array (FPGA)-based embedded software platform coupled with a software-based plant, forming a hardware-in-the-loop (HIL) setup that is used to validate a systematic sensor selection framework. The systematic sensor selection framework combines multi-objective optimization, linear-quadratic-Gaussian (LQG)-type control, and the nonlinear model of a maglev suspension. A robustness analysis of the closed loop follows (prior to implementation), supporting the appropriateness of the solution under parametric variation. The analysis also shows that quantization is robust under different controller gains. While the LQG controller is implemented on an FPGA, the physical process is realized in a high-level system modeling environment. FPGA technology enables rapid evaluation of the algorithms and test designs under realistic scenarios, avoiding the heavy time penalty associated with hardware description language (HDL) simulators. The HIL technique facilitates a significant speed-up in the required execution time when compared to its software-based counterpart model.
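    The controller in this work is LQG-type; as a small, generic illustration of the state-feedback half of such a design (the Kalman-filter half and the FPGA fixed-point details are omitted), the sketch below computes a discrete-time LQR gain by iterating the Riccati recursion. The matrix names and iteration count are assumptions, not the paper's implementation.

        import numpy as np

        def dlqr_gain(A, B, Q, R, n_iter=500):
            """Discrete-time LQR gain via fixed-point iteration of the Riccati
            recursion:  P <- Q + A'P(A - B K),  K = (R + B'P B)^{-1} B'P A.
            Returns the state-feedback gain K for the control law u = -K x."""
            P = Q.copy()
            for _ in range(n_iter):
                BtP = B.T @ P
                K = np.linalg.solve(R + BtP @ B, BtP @ A)
                P = Q + A.T @ P @ (A - B @ K)
            return K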

  20. Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages

    Science.gov (United States)

    Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel

    2018-01-01

    This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.

  1. Clinical Evaluation of PET Image Quality as a Function of Acquisition Time in a New TOF-PET/MRI Compared to TOF-PET/CT--Initial Results.

    Science.gov (United States)

    Zeimpekis, Konstantinos G; Barbosa, Felipe; Hüllner, Martin; ter Voert, Edwin; Davison, Helen; Veit-Haibach, Patrick; Delso, Gaspar

    2015-10-01

    The purpose of this study was to compare only the performance of the PET component between a TOF-PET/CT (henceforth noted as PET/CT) scanner and an integrated TOF-PET/MRI (henceforth noted as PET/MRI) scanner concerning image quality parameters and quantification in terms of standardized uptake value (SUV) as a function of acquisition time (a surrogate of dose). The CT and MR image quality was not assessed, as that is beyond the scope of this study. Five brain and five whole-body patients were included in the study. The PET/CT scan was used as a reference and the PET/MRI acquisition time was consecutively adjusted, taking into account the decay between the scans in order to expose both systems to the same amount of the emitted signal. The acquisition times were then retrospectively reduced to assess the performance of the PET/MRI for lower count rates. Image quality, image sharpness, artifacts, and noise were evaluated. SUV measurements were taken in the liver and in the white matter to compare quantification. Quantitative evaluation showed strong correlation between PET/CT and PET/MRI brain SUVs. Liver correlation was good, however, with lower uptake estimation in PET/MRI, partially justified by bio-redistribution. The clinical evaluation showed that PET/MRI offers higher image quality and sharpness with lower levels of noise and artifacts compared to PET/CT with reduced acquisition times for whole-body scans, while for brain scans there is no significant difference. The TOF-PET/MRI showed higher image quality compared to TOF-PET/CT as tested with reduced imaging times. However, this result accounts mainly for body imaging, while no significant differences were found in brain imaging.

  2. Real-time autocorrelator for fluorescence correlation spectroscopy based on graphical-processor-unit architecture: method, implementation, and comparative studies

    Science.gov (United States)

    Laracuente, Nicholas; Grossman, Carl

    2013-03-01

    We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphical processor units (GPUs). Recent developments in hardware and software have allowed for general purpose computing with inexpensive GPU hardware. These devices are more suited for emulating hardware autocorrelators than traditional CPU-based software applications by emphasizing parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core graphics PCI card in a computer with a 3.2 GHz Intel i5 CPU running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College.
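    A serial NumPy reference of the multi-tau binning scheme mentioned above (configurable points per bin, pairwise coarsening between levels); the GPU memory mapping and real-time streaming of the actual implementation are not reproduced, and the normalization is a simple g2-style estimate assumed for illustration.

        import numpy as np

        def multi_tau_autocorr(counts, points_per_bin=16, n_levels=8):
            """Serial reference of a multi-tau autocorrelator: at each level the
            photon-count trace is coarsened by summing pairs of bins, and the
            correlation is estimated at 'points_per_bin' lags per level.
            Returns (lags in units of the original bin width, normalized g2)."""
            x = np.asarray(counts, dtype=float)
            lags, g2 = [], []
            bin_width = 1
            for level in range(n_levels):
                if len(x) < 2 * points_per_bin:
                    break                                    # trace too short to continue
                start = 0 if level == 0 else points_per_bin // 2   # smaller lags done earlier
                for k in range(start, points_per_bin):
                    a, b = (x, x) if k == 0 else (x[:-k], x[k:])
                    lags.append(k * bin_width)
                    g2.append(np.mean(a * b) / np.mean(x) ** 2)
                x = x[: len(x) // 2 * 2].reshape(-1, 2).sum(axis=1) / 2.0   # coarsen by 2
                bin_width *= 2
            return np.array(lags), np.array(g2)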

  3. Comparative Study of Time-Frequency Decomposition Techniques for Fault Detection in Induction Motors Using Vibration Analysis during Startup Transient

    Directory of Open Access Journals (Sweden)

    Paulo Antonio Delgado-Arredondo

    2015-01-01

    Full Text Available Induction motors are critical components in most industries, and condition monitoring has become necessary to detect faults. There are several techniques for fault diagnosis of induction motors, and analyzing the startup transient vibration signals is not as widely used as other techniques such as motor current signature analysis. Vibration analysis gives a fault diagnosis focused on the location of the spectral components associated with faults. Therefore, this paper presents a comparative study of different time-frequency analysis methodologies that can be used for detecting faults in induction motors by analyzing vibration signals during the startup transient. The studied methodologies are the time-frequency distribution of Gabor (TFDG), the time-frequency Morlet scalogram (TFMS), multiple signal classification (MUSIC), and the fast Fourier transform (FFT). The analyzed vibration signals correspond to one broken rotor bar, two broken bars, unbalance, and bearing defects. The obtained results show the feasibility of detecting faults in induction motors using time-frequency spectral analysis applied to vibration signals; the proposed methodology is applicable when current signals are unavailable and only vibration signals are at hand. The methodology also has applications in motors that are not fed directly from the supply line; in such cases the analysis of current signals is not recommended due to poor current signal quality.

  4. Comparing Response Times and Error Rates in a Simultaneous Masking Paradigm

    Directory of Open Access Journals (Sweden)

    F Hermens

    2014-08-01

    Full Text Available In simultaneous masking, performance on a foveally presented target is impaired by one or more flanking elements. Previous studies have demonstrated strong effects of the grouping of the target and the flankers on the strength of masking (e.g., Malania, Herzog & Westheimer, 2007). These studies have predominantly examined performance by measuring offset discrimination thresholds as a measure of performance, and it is therefore unclear whether other measures of performance provide similar outcomes. A recent study, which examined the role of grouping on error rates and response times in a speeded vernier offset discrimination task, similar to that used by Malania et al. (2007), suggested a possible dissociation between the two measures, with error rates mimicking threshold performance, but response times showing differential results (Panis & Hermens, 2014). We here report the outcomes of three experiments examining this possible dissociation, and demonstrate an overall similar pattern of results for error rates and response times across a broad range of mask layouts. Moreover, the pattern of results in our experiments strongly correlates with threshold performance reported earlier (Malania et al., 2007). Our results suggest that outcomes in a simultaneous masking paradigm do not critically depend on the outcome measure used, and therefore provide evidence for a common underlying mechanism.

  5. One Coin has Two Sides: A Comparative Appraisal of New York Times and China Daily’s News Coverage of Alleged Internet Hacking

    Directory of Open Access Journals (Sweden)

    Wenyu Liu

    2015-04-01

    Full Text Available There is always some debate as to whether news reporting is objective and emotionless. Guided by the engagement system, a sub-system of Appraisal Theory concerned with the conveyance of ideology, the present study makes a comparative analysis of the different ideological attitudes in the news coverage of alleged internet hacking by the New York Times and China Daily (English version). The findings suggest that the news reports adopt similar distributions of engagement resources when conveying an ideological attitude. Additionally, China Daily and the New York Times hold different attitudes toward internet hacking: the attitude of China Daily changes before and after the Snowden event, while that of the New York Times remains the same.

  6. FPGA Accelerator for Wavelet-Based Automated Global Image Registration

    Directory of Open Access Journals (Sweden)

    Baofeng Li

    2009-01-01

    Full Text Available Wavelet-based automated global image registration (WAGIR) is fundamental for most remote sensing image processing algorithms and extremely computation-intensive. With more and more algorithms migrating from ground computing to onboard computing, an efficient dedicated architecture for WAGIR is desired. In this paper, the BWAGIR architecture is proposed based on a block resampling scheme. BWAGIR achieves significant performance by pipelining the computational logic, parallelizing the resampling process and the calculation of the correlation coefficient, and parallelizing memory access. A proof-of-concept implementation with 1 BWAGIR processing unit performs at least 7.4X faster than the CL cluster system with 1 node, and at least 3.4X faster than the MPM massively parallel machine with 1 node. Further speedup can be achieved by parallelizing multiple BWAGIR units. The architecture with 5 units achieves a speedup of about 3X against the CL with 16 nodes and a comparable speed to the MPM with 30 nodes. More importantly, the BWAGIR architecture can be deployed onboard economically.
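
    For readers unfamiliar with the correlation step mentioned above, the following sketch estimates a translation by maximizing the normalized correlation coefficient between image blocks; it is a plain NumPy stand-in for illustration and does not reproduce the wavelet stages or the FPGA pipelining of BWAGIR.

      import numpy as np

      def ncc(a, b):
          """Normalized cross-correlation coefficient of two equal-size blocks."""
          a = a - a.mean()
          b = b - b.mean()
          denom = np.sqrt((a * a).sum() * (b * b).sum())
          return float((a * b).sum() / denom) if denom else 0.0

      rng = np.random.default_rng(0)
      reference = rng.random((64, 64))
      # sensed image = reference shifted by (3, 5) pixels inside a larger frame
      sensed = np.zeros((80, 80))
      sensed[3:67, 5:69] = reference

      best = max(((dy, dx, ncc(reference, sensed[dy:dy + 64, dx:dx + 64]))
                  for dy in range(16) for dx in range(16)),
                 key=lambda s: s[2])
      print("estimated shift:", best[:2], "coefficient:", round(best[2], 3))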

  7. FPGA Accelerator for Wavelet-Based Automated Global Image Registration

    Directory of Open Access Journals (Sweden)

    Li Baofeng

    2009-01-01

    Full Text Available Abstract Wavelet-based automated global image registration (WAGIR) is fundamental for most remote sensing image processing algorithms and extremely computation-intensive. With more and more algorithms migrating from ground computing to onboard computing, an efficient dedicated architecture for WAGIR is desired. In this paper, the BWAGIR architecture is proposed based on a block resampling scheme. BWAGIR achieves significant performance by pipelining the computational logic, parallelizing the resampling process and the calculation of the correlation coefficient, and parallelizing memory access. A proof-of-concept implementation with 1 BWAGIR processing unit performs at least 7.4X faster than the CL cluster system with 1 node, and at least 3.4X faster than the MPM massively parallel machine with 1 node. Further speedup can be achieved by parallelizing multiple BWAGIR units. The architecture with 5 units achieves a speedup of about 3X against the CL with 16 nodes and a comparable speed to the MPM with 30 nodes. More importantly, the BWAGIR architecture can be deployed onboard economically.

  8. Using of FPGA coprocessor for improving the execution speed of the pattern recognition algorithm for ATLAS - high energy physics experiment

    CERN Document Server

    Hinkelbein, C; Kugel, A; Männer, R; Müller, M

    2004-01-01

    Pattern recognition algorithms are used in experimental high energy physics to extract the parameters (features) of particle tracks in detectors. It is particularly important to have fast algorithms in the trigger system. This paper investigates the suitability of an FPGA coprocessor for speeding up the TRT-LUT algorithm, one of the feature extraction algorithms of the second-level trigger for the ATLAS experiment (CERN). Two realizations of the same algorithm have been compared: a C++ realization tested on a computer equipped with a dual Xeon 2.4 GHz CPU, a 64-bit, 66 MHz PCI bus, and 1024 MB of DDR RAM main memory running Red Hat Linux 7.1, and a hybrid C++/VHDL realization tested on the same PC additionally equipped with an MPRACE board (an FPGA coprocessor board based on a Xilinx Virtex-II FPGA, built as a 64-bit, 66 MHz PCI card and developed at the University of Mannheim). Usage of the FPGA coprocessor can give a reasonable speedup compared with a general-purpose processor only for those algorithms (or parts of algorithms), for which there is a po...

  9. Efficient classical simulation of the Deutsch-Jozsa and Simon's algorithms

    Science.gov (United States)

    Johansson, Niklas; Larsson, Jan-Åke

    2017-09-01

    A long-standing aim of quantum information research is to understand what gives quantum computers their advantage. This requires separating problems that need genuinely quantum resources from those for which classical resources are enough. Two examples of quantum speed-up are the Deutsch-Jozsa and Simon's problem, both efficiently solvable on a quantum Turing machine, and both believed to lack efficient classical solutions. Here we present a framework that can simulate both quantum algorithms efficiently, solving the Deutsch-Jozsa problem with probability 1 using only one oracle query, and Simon's problem using linearly many oracle queries, just as expected of an ideal quantum computer. The presented simulation framework is in turn efficiently simulatable in a classical probabilistic Turing machine. This shows that the Deutsch-Jozsa and Simon's problem do not require any genuinely quantum resources, and that the quantum algorithms show no speed-up when compared with their corresponding classical simulation. Finally, this gives insight into what properties are needed in the two algorithms and calls for further study of oracle separation between quantum and classical computation.
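
    To make the query-count comparison concrete, the sketch below implements the textbook deterministic classical strategy for the Deutsch-Jozsa promise problem (worst case 2^(n-1)+1 oracle queries); it is standard query-complexity bookkeeping, not the simulation framework proposed by the authors.

      def deterministic_dj(oracle, n):
          """Classically decide constant vs. balanced; worst case 2**(n-1)+1 queries."""
          seen, queries = set(), 0
          for x in range(2 ** (n - 1) + 1):     # pigeonhole: this many queries suffice
              seen.add(oracle(x))
              queries += 1
              if len(seen) == 2:
                  return "balanced", queries
          return "constant", queries

      n = 4
      constant = lambda x: 0
      balanced = lambda x: bin(x).count("1") % 2
      print(deterministic_dj(constant, n))      # ('constant', 9)
      print(deterministic_dj(balanced, n))      # stops after only 2 queries here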

  10. Full Waveform Inversion with Multisource Frequency Selection of Marine Streamer Data

    KAUST Repository

    Huang, Yunsong

    2017-10-27

    The theory and practice of multisource full waveform inversion of marine supergathers are described with a frequency-selection strategy. The key enabling property of frequency selection is that it eliminates the crosstalk among sources, thus overcoming the aperture mismatch of marine multisource inversion. Tests on multisource full waveform inversion of synthetic marine data and Gulf of Mexico data show speedups of 4× and 8×, respectively, compared to conventional full waveform inversion.

  11. Full Waveform Inversion with Multisource Frequency Selection of Marine Streamer Data

    KAUST Repository

    Huang, Yunsong; Schuster, Gerard T.

    2017-01-01

    The theory and practice of multisource full waveform inversion of marine supergathers are described with a frequency-selection strategy. The key enabling property of frequency selection is that it eliminates the crosstalk among sources, thus overcoming the aperture mismatch of marine multisource inversion. Tests on multisource full waveform inversion of synthetic marine data and Gulf of Mexico data show speedups of 4× and 8×, respectively, compared to conventional full waveform inversion.

  12. GPU-Accelerated Stony-Brook University 5-class Microphysics Scheme in WRF

    Science.gov (United States)

    Mielikainen, J.; Huang, B.; Huang, A.

    2011-12-01

    multiple levels, which correspond to various vertical heights in the atmosphere. The size of the CONUS 12 km domain is 433 x 308 horizontal grid points with 35 vertical levels. First, the entire SBU-YLIN Fortran code was rewritten in C in preparation for the GPU-accelerated version. After that, the C code was verified against the Fortran code for identical outputs. Default compiler options from WRF were used for the gfortran and gcc compilers. The processing time is 12274 ms for the original Fortran code and 12893 ms for the C version. The processing times for the GPU implementation of the SBU-YLIN microphysics scheme with I/O are 57.7 ms and 37.2 ms for 1 and 2 GPUs, respectively. The corresponding speedups are 213x and 330x compared to the Fortran implementation. Without I/O, the speedup is 896x on 1 GPU. Ignoring I/O time, the speedup scales linearly with the number of GPUs; thus, 2 GPUs achieve a speedup of 1788x without I/O. The microphysics computation is just a small part of the whole WRF model. Once WRF has been completely implemented on GPUs, the inputs for SBU-YLIN will not have to be transferred from the CPU; instead, they will be the results of previous WRF modules. Therefore, the role of I/O is greatly diminished once all of WRF has been converted to run on GPUs. In the near future, we expect to have WRF running completely on GPUs for superior performance.
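
    A quick arithmetic check of the quoted speedups, using only the timings given in the abstract, can be written directly:

      fortran_ms = 12274.0                 # CPU Fortran time from the abstract
      gpu_ms = {1: 57.7, 2: 37.2}          # GPU times with I/O from the abstract
      for gpus, t in gpu_ms.items():
          print(f"{gpus} GPU(s): {fortran_ms / t:.0f}x speedup with I/O")
      # prints 213x and 330x, matching the reported figures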

  13. OpenCL-based vicinity computation for 3D multiresolution mesh compression

    Science.gov (United States)

    Hachicha, Soumaya; Elkefi, Akram; Ben Amar, Chokri

    2017-03-01

    3D multiresolution mesh compression systems are still widely addressed in many domains. These systems increasingly require volumetric data to be processed in real time. Therefore, performance is becoming constrained by material resource usage and by the overall computational time. In this paper, our contribution lies entirely in computing, in real time, the triangle neighborhoods of 3D progressive meshes for a robust compression algorithm based on the scan-based wavelet transform (WT) technique. The originality of the latter algorithm is to compute the WT with minimum memory usage by processing data as they are acquired. However, with large data, this technique is considered poor in terms of computational complexity. For that reason, this work exploits the GPU to accelerate the computation, using OpenCL as a heterogeneous programming language. Experiments demonstrate that, aside from the portability across various platforms and the flexibility guaranteed by the OpenCL-based implementation, this method achieves a speedup factor of 5 compared to the sequential CPU implementation.
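
    The neighborhood query being accelerated can be sketched on the CPU in a few lines (an edge-to-triangle map followed by a lookup); this plain Python version is only meant to show the data dependence, not the OpenCL kernels or the scan-based wavelet coder.

      from collections import defaultdict

      triangles = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (0, 2, 4)]   # toy mesh (vertex indices)

      # map every undirected edge to the set of triangles sharing it
      edge_to_tris = defaultdict(set)
      for t_idx, (a, b, c) in enumerate(triangles):
          for edge in ((a, b), (b, c), (a, c)):
              edge_to_tris[tuple(sorted(edge))].add(t_idx)

      # a triangle's vicinity = all triangles sharing at least one edge with it
      neighbours = {t: set() for t in range(len(triangles))}
      for tris in edge_to_tris.values():
          for t in tris:
              neighbours[t] |= tris - {t}
      print(neighbours)   # e.g. triangle 0 neighbours triangles 1 and 3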

  14. Green cloud environment by using robust planning algorithm

    Directory of Open Access Journals (Sweden)

    Jyoti Thaman

    2017-11-01

    Full Text Available Cloud computing provides a framework for seamless access to resources through a network. Access to resources is quantified through SLAs between service providers and users. Service providers try to exploit their resources as fully as possible and to reduce their idle time. Growing energy concerns put further pressure on service providers. Users' requests are served by allocating their tasks to resources in cloud and grid environments through scheduling and planning algorithms. With only a few planning algorithms in existence, planning and scheduling algorithms are rarely differentiated. This paper proposes a robust hybrid planning algorithm, Robust Heterogeneous-Earliest-Finish-Time (RHEFT), for binding tasks to VMs. The allocation of tasks to VMs is based on a novel task matching algorithm called Interior Scheduling. The performance of the proposed RHEFT algorithm is compared with Heterogeneous-Earliest-Finish-Time (HEFT) and Distributed HEFT (DHEFT) for various parameters such as utilization ratio, makespan, speed-up, and energy consumption. RHEFT's consistent performance against HEFT and DHEFT establishes the robustness of the hybrid planning algorithm through rigorous simulations.

  15. Recrafting the neighbor-joining method

    Directory of Open Access Journals (Sweden)

    Pedersen Christian NS

    2006-01-01

    Full Text Available Abstract Background The neighbor-joining method by Saitou and Nei is a widely used method for constructing phylogenetic trees. The formulation of the method gives rise to a canonical Θ(n³) algorithm upon which all existing implementations are based. Results In this paper we present techniques for speeding up the canonical neighbor-joining method. Our algorithms construct the same phylogenetic trees as the canonical neighbor-joining method. The best-case running time of our algorithms is O(n²), but the worst case remains O(n³). We empirically evaluate the performance of our algorithms on distance matrices obtained from the Pfam collection of alignments. The experiments indicate that the running time of our algorithms evolves as Θ(n²) on the examined instance collection. We also compare the running time with that of the QuickTree tool, a widely used efficient implementation of the canonical neighbor-joining method. Conclusion The experiments show that our algorithms also yield a significant speed-up, already for medium-sized instances.
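
    The Θ(n³) cost being attacked comes from repeatedly scanning the Q-matrix for the closest pair of nodes; a minimal NumPy version of that canonical selection step is sketched below on a small textbook distance matrix (not Pfam data).

      import numpy as np

      def nj_pick_pair(d):
          """Return the pair (i, j) minimizing the neighbor-joining Q criterion."""
          n = d.shape[0]
          r = d.sum(axis=1)                           # row sums
          q = (n - 2) * d - r[:, None] - r[None, :]   # Q(i, j) = (n-2)d(i,j) - r_i - r_j
          np.fill_diagonal(q, np.inf)
          return np.unravel_index(np.argmin(q), q.shape)

      d = np.array([[0, 5, 9, 9, 8],
                    [5, 0, 10, 10, 9],
                    [9, 10, 0, 8, 7],
                    [9, 10, 8, 0, 3],
                    [8, 9, 7, 3, 0]], dtype=float)
      print(nj_pick_pair(d))   # classic textbook matrix: joins leaves 0 and 1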

  16. Speed-up Template Matching through Integral Image based Weak Classifiers

    NARCIS (Netherlands)

    Wu, t.; Toet, A.

    2014-01-01

    Template matching is a widely used pattern recognition method, especially in industrial inspection. However, the computational cost of traditional template matching increases dramatically with both template and scene image size. This makes traditional template matching less useful for many (e.g.
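
    The integral-image primitive referred to in the title can be sketched as follows: after one cumulative pass, any rectangular sum is available in constant time, which is what makes cheap weak classifiers possible (the classifier cascade itself is not reproduced here).

      import numpy as np

      def integral_image(img):
          return img.cumsum(axis=0).cumsum(axis=1)

      def box_sum(ii, top, left, bottom, right):
          """Sum of img[top:bottom, left:right] from the integral image in O(1)."""
          total = ii[bottom - 1, right - 1]
          if top > 0:
              total -= ii[top - 1, right - 1]
          if left > 0:
              total -= ii[bottom - 1, left - 1]
          if top > 0 and left > 0:
              total += ii[top - 1, left - 1]
          return total

      img = np.arange(25, dtype=float).reshape(5, 5)
      ii = integral_image(img)
      print(box_sum(ii, 1, 1, 4, 4), img[1:4, 1:4].sum())   # identical values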

  17. The large-scale blast score ratio (LS-BSR) pipeline: a method to rapidly compare genetic content between bacterial genomes

    Directory of Open Access Journals (Sweden)

    Jason W. Sahl

    2014-04-01

    Full Text Available Background. As whole genome sequence data from bacterial isolates becomes cheaper to generate, computational methods are needed to correlate sequence data with biological observations. Here we present the large-scale BLAST score ratio (LS-BSR) pipeline, which rapidly compares the genetic content of hundreds to thousands of bacterial genomes, and returns a matrix that describes the relatedness of all coding sequences (CDSs) in all genomes surveyed. This matrix can be easily parsed in order to identify genetic relationships between bacterial genomes. Although pipelines have been published that group peptides by sequence similarity, no other software performs the rapid, large-scale, full-genome comparative analyses carried out by LS-BSR. Results. To demonstrate the utility of the method, the LS-BSR pipeline was tested on 96 Escherichia coli and Shigella genomes; the pipeline ran in 163 min using 16 processors, which is a greater than 7-fold speedup compared to using a single processor. The BSR values for each CDS, which indicate a relative level of relatedness, were then mapped to each genome on an independent core genome single nucleotide polymorphism (SNP) based phylogeny. Comparisons were then used to identify clade-specific CDS markers and validate the LS-BSR pipeline based on molecular markers that delineate between classical E. coli pathogenic variant (pathovar) designations. Scalability tests demonstrated that the LS-BSR pipeline can process 1,000 E. coli genomes in 27–57 h, depending upon the alignment method, using 16 processors. Conclusions. LS-BSR is an open-source, parallel implementation of the BSR algorithm, enabling rapid comparison of the genetic content of large numbers of genomes. The results of the pipeline can be used to identify specific markers between user-defined phylogenetic groups, and to identify the loss and/or acquisition of genetic information between bacterial isolates. Taxa-specific genetic markers can then be translated
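
    The ratio at the heart of the pipeline is simple bookkeeping once alignment scores are available; the sketch below uses placeholder bit scores (a real run would obtain them from the aligner, as LS-BSR does) purely to make the BSR definition explicit.

      # query-vs-self bit scores (placeholders, assumed values)
      self_scores = {"cdsA": 980.0, "cdsB": 450.0}
      # query-vs-genome bit scores (placeholders, assumed values)
      raw_scores = {
          ("cdsA", "genome1"): 975.0, ("cdsA", "genome2"): 310.0,
          ("cdsB", "genome1"): 448.0, ("cdsB", "genome2"): 0.0,
      }

      # BSR = score of CDS against a genome, normalized by its self score
      bsr = {(cds, genome): score / self_scores[cds]
             for (cds, genome), score in raw_scores.items()}
      for key, value in sorted(bsr.items()):
          print(key, round(value, 2))   # values near 1.0 indicate conserved CDSs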

  18. GPU Linear Algebra Libraries and GPGPU Programming for Accelerating MOPAC Semiempirical Quantum Chemistry Calculations.

    Science.gov (United States)

    Maia, Julio Daniel Carvalho; Urquiza Carvalho, Gabriel Aires; Mangueira, Carlos Peixoto; Santana, Sidney Ramos; Cabral, Lucidio Anjos Formiga; Rocha, Gerd B

    2012-09-11

    In this study, we present some modifications in the semiempirical quantum chemistry MOPAC2009 code that accelerate single-point energy calculations (1SCF) of medium-size (up to 2500 atoms) molecular systems using GPU coprocessors and multithreaded shared-memory CPUs. Our modifications consisted of using a combination of highly optimized linear algebra libraries for both CPU (LAPACK and BLAS from Intel MKL) and GPU (MAGMA and CUBLAS) to hasten time-consuming parts of MOPAC such as the pseudodiagonalization, full diagonalization, and density matrix assembling. We have shown that it is possible to obtain large speedups just by using CPU serial linear algebra libraries in the MOPAC code. As a special case, we show a speedup of up to 14 times for a methanol simulation box containing 2400 atoms and 4800 basis functions, with even greater gains in performance when using multithreaded CPUs (2.1 times in relation to the single-threaded CPU code using linear algebra libraries) and GPUs (3.8 times). This degree of acceleration opens new perspectives for modeling larger structures which appear in inorganic chemistry (such as zeolites and MOFs), biochemistry (such as polysaccharides, small proteins, and DNA fragments), and materials science (such as nanotubes and fullerenes). In addition, we believe that this parallel (GPU-GPU) MOPAC code will make it feasible to use semiempirical methods in lengthy molecular simulations using both hybrid QM/MM and QM/QM potentials.
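
    The dominant cost being offloaded is the dense symmetric diagonalization. The sketch below times that step on a Fock-matrix-sized random symmetric matrix using NumPy's LAPACK backend; swapping this call for a GPU eigensolver would be the analogue of the MAGMA/CUBLAS offload, and the matrix size here is an assumption for illustration, not a MOPAC benchmark.

      import time
      import numpy as np

      n = 2000                                    # roughly "a few thousand basis functions"
      a = np.random.default_rng(1).standard_normal((n, n))
      fock_like = (a + a.T) / 2                   # symmetric, Fock-matrix-like

      t0 = time.perf_counter()
      eigenvalues, eigenvectors = np.linalg.eigh(fock_like)
      print(f"eigh on a {n}x{n} matrix took {time.perf_counter() - t0:.2f} s")
      # offloading this call to a GPU linear algebra library is the analogous
      # step to the acceleration described in the abstract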

  19. Time to Detection with BacT/Alert FA Plus Compared to BacT/Alert FA Blood Culture Media.

    Science.gov (United States)

    Nutman, A; Fisher Even-Tsur, S; Shapiro, G; Braun, T; Schwartz, D; Carmeli, Y

    2016-09-01

    Rapid identification of the causative pathogen in patients with bacteremia allows adjustment of antibiotic therapy and improves patient outcomes. We compared in vitro and real-life time to detection (TTD) of two blood culture media, BacT/Alert FA (FA) and BacT/Alert FA Plus (FA Plus), for the nine most common species of bacterial pathogens recovered from blood samples. Experimental data from simulated cultures were compared with microbiology records of TTD for both culture media with growth of the species of interest in clinical blood cultures. Under experimental conditions, the median TTD was 3.8 hours (23.9%) shorter using FA Plus media. The magnitude of reduction differed between species. Similarly, in real-life data, FA Plus had a shorter TTD than FA media; however, the difference between culture media was smaller, and the median TTD was only 1 hour (8.5%) less. We found a shorter TTD with BacT/Alert FA Plus culture media, both experimentally and under real-life conditions and unrelated to antibiotic neutralization, highlighting the importance of appropriate blood culture media selection.

  20. Timely disclosure of progress in long-term cancer survival: the boomerang method substantially improved estimates in a comparative study.

    Science.gov (United States)

    Brenner, Hermann; Jansen, Lina

    2016-02-01

    Monitoring cancer survival is a key task of cancer registries, but timely disclosure of progress in long-term survival remains a challenge. We introduce and evaluate a novel method, denoted "boomerang method," for deriving more up-to-date estimates of long-term survival. We applied three established methods (cohort, complete, and period analysis) and the boomerang method to derive up-to-date 10-year relative survival of patients diagnosed with common solid cancers and hematological malignancies in the United States. Using the Surveillance, Epidemiology and End Results 9 database, we compared the most up-to-date age-specific estimates that might have been obtained with the database including patients diagnosed up to 2001 with 10-year survival later observed for patients diagnosed in 1997-2001. For cancers with little or no increase in survival over time, the various estimates of 10-year relative survival potentially available by the end of 2001 were generally rather similar. For malignancies with strongly increasing survival over time, including breast and prostate cancer and all hematological malignancies, the boomerang method provided estimates that were closest to later observed 10-year relative survival in 23 of the 34 groups assessed. The boomerang method can substantially improve up-to-dateness of long-term cancer survival estimates in times of ongoing improvement in prognosis. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Advanced Numerical Techniques of Performance Evaluation. Volume 1

    Science.gov (United States)

    1990-06-01

    34Speedup": Performance Analysis of Parallel Programs. Technical Report ANL-87-7, Mathematics and Computer Science Division, Argonne National Laboratory...and dequeue- pilicessoms a shared bus, and a write-through cache coherency4 hras anocu i amlc.wih a rotm sig if roocl[ Lvel & hakr 9U.Th Smety a atie...James M. Boyle. Beyond "Speedup": Parlor- mance Analysis of Parallel Progrms. Technical Report ANL-J7-7. Mathematics and Computer Science Division, Aronn

  2. Assessment of Clinically Suspected Tubercular Lymphadenopathy by Real-Time PCR Compared to Non-Molecular Methods on Lymph Node Aspirates.

    Science.gov (United States)

    Gupta, Vivek; Bhake, Arvind

    2018-01-01

    The diagnosis of tubercular lymphadenitis (TBLN) is challenging. This study assesses the role of diagnostic intervention with real-time PCR in clinically suspected tubercular lymphadenopathy in relation to cytology and microbiological methods. The cross-sectional study involved 214 patients, and PCR, cytology, and Ziehl-Neelsen (ZN) staining were performed on aspirates. The findings were compared with culture on Lowenstein-Jensen medium. The overall concordance of cytology and PCR, both individually and combined, was calculated. χ² and Phi values were assessed between cytology, PCR, and culture. A cytological diagnosis of tuberculosis (TB), reactive lymphoid hyperplasia, and suppurative lymphadenitis was made in 71, 112, and 6 patients, respectively. PCR and culture were positive in 40% of the cases. Among the TBLN patients, PCR showed higher positivity in necrosis and culture showed higher positivity in necrotizing granuloma. Positive ZN staining was observed in 29.6% of the TBLN cases, with an overall positivity of 11%. PCR could additionally detect 82 cases missed by ZN staining. The overall concordance rate for either diagnostic modality, i.e., PCR or cytology, was highest (75%), and for PCR alone was 74%. Phi values were observed to be 0.47 between PCR and culture. Real-time PCR for Mycobacterium tuberculosis complex on aspirates offers a definitive and comparable diagnosis of TBLN. Including this approach as the primary investigation in the work-up of TBLN could reduce the burden of TB. © 2017 S. Karger AG, Basel.

  3. Parallel 4-dimensional cellular automaton track finder for the CBM experiment

    Energy Technology Data Exchange (ETDEWEB)

    Akishina, Valentina [Goethe-Universitaet Frankfurt am Main, Frankfurt am Main (Germany); Frankfurt Institute for Advanced Studies, Frankfurt am Main (Germany); GSI Helmholtzzentrum fuer Schwerionenforschung GmbH, Darmstadt (Germany); JINR Joint Institute for Nuclear Research, Dubna (Russian Federation); Kisel, Ivan [Goethe-Universitaet Frankfurt am Main, Frankfurt am Main (Germany); Frankfurt Institute for Advanced Studies, Frankfurt am Main (Germany); GSI Helmholtzzentrum fuer Schwerionenforschung GmbH, Darmstadt (Germany); Collaboration: CBM-Collaboration

    2016-07-01

    The CBM experiment at FAIR will focus on the measurement of rare probes at interaction rates up to 10 MHz. The beam will provide a free stream of particles, so that information from different collisions may overlap in time. This requires full online event reconstruction not only in space but also in time, so-called 4D (4-dimensional) event building. This is a task of the First-Level Event Selection (FLES) package. The FLES reconstruction package consists of several modules: track finding, track fitting, short-lived particle finding, event building, and selection. The Silicon Tracking System (STS) time measurement information was included in the Cellular Automaton (CA) track finder algorithm. The speed of the 4D track finder algorithm (8.5 ms per event in a time-slice) and its efficiency are comparable with those of the event-based analysis. The CA track finder was fully parallelised inside the time-slice. The parallel version achieves a speed-up factor of 10.6 when parallelising over 10 Intel Xeon physical cores with hyper-threading. The first version of event building based on the 4D track finder was implemented.

  4. High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

    Directory of Open Access Journals (Sweden)

    H. Y. Su

    2012-04-01

    Full Text Available This article presents two highly efficient parallel realizations of context-based adaptive variable length coding (CAVLC) based on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weakened, including the context-based data dependence, the memory access dependence, and the control dependence. The CAVLC pipeline is divided into three stages: two scans, coding, and lag packing, and is implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on the multicore stream processor STORM. The other is a component-oriented SIMT parallel encoder on the massively parallel architecture GPU. Both of them exploit rich data-level parallelism. Experimental results show that, compared with the CPU version, a speedup of more than 70 times can be obtained for STORM and over 50 times for the GPU. The encoder implementation on STORM achieves real-time processing for 1080p @ 30 fps, and the GPU-based version satisfies the requirements for 720p real-time encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms.

  5. Comparative analysis of time efficiency and spatial resolution between different EIT reconstruction algorithms

    International Nuclear Information System (INIS)

    Kacarska, Marija; Loskovska, Suzana

    2002-01-01

    In this paper a comparative analysis of different EIT algorithms is presented. The analysis covers the spatial and temporal resolution of the images obtained by several different algorithms. The discussion also considers the dependence of spatial resolution on the data acquisition method. The obtained results show that conventional applied-current EIT is more powerful than induced-current EIT. (Author)

  6. Treatment comfort, time perception, and preference for conventional and digital impression techniques: A comparative study in young patients.

    Science.gov (United States)

    Burhardt, Lukasz; Livas, Christos; Kerdijk, Wouter; van der Meer, Wicher Joerd; Ren, Yijin

    2016-08-01

    The aim of this crossover study was to assess perceptions and preferences for impression techniques in young orthodontic patients receiving alginate and 2 different digital impressions. Thirty-eight subjects aged 10 to 17 years requiring impressions for orthodontic treatment were randomly allocated to 3 groups that differed in the order in which an alginate impression and 2 different intraoral scanning procedures were administered. After each procedure, the patients were asked to score their perceptions on a 5-point Likert scale for gag reflex, queasiness, difficulty to breathe, uncomfortable feeling, perception of the scanning time, state of anxiety, and use of a powder, and to select the preferred impression system. Chairside time and maximal mouth opening were also registered. More queasiness (P = 0.00) and discomfort (P = 0.02) during alginate impression taking of the maxilla were perceived compared with scanning with the CEREC Omnicam (Sirona Dental Systems, Bensheim, Germany). There were no significant differences in perceptions between the alginate impressions and the Lava C.O.S. (3M ESPE, St Paul, Minn) and between the 2 scanners. Chairside times for the alginate impressions (9.7 ± 1.8 minutes) and the CEREC Omnicam (10.7 ± 1.8 minutes) were significantly lower (P < 0.001) than for the Lava C.O.S. (17.8 ± 4.0 minutes). Digital impressions were favored by 51% of the subjects, whereas 29% chose alginate impressions, and 20% had no preference. Regardless of the significant differences in the registered times among the 3 impression-taking methods, the distributions of the Likert scores of time perception and maximal mouth opening were similar in all 3 groups. Young orthodontic patients preferred the digital impression techniques over the alginate method, although alginate impressions required the shortest chairside time. Copyright © 2016 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.

  7. General upper bounds on the runtime of parallel evolutionary algorithms.

    Science.gov (United States)

    Lässig, Jörg; Sudholt, Dirk

    2014-01-01

    We present a general method for analyzing the runtime of parallel evolutionary algorithms with spatially structured populations. Based on the fitness-level method, it yields upper bounds on the expected parallel runtime. This allows for a rigorous estimate of the speedup gained by parallelization. Tailored results are given for common migration topologies: ring graphs, torus graphs, hypercubes, and the complete graph. Example applications for pseudo-Boolean optimization show that our method is easy to apply and that it gives powerful results. In our examples the performance guarantees improve with the density of the topology. Surprisingly, even sparse topologies such as ring graphs lead to a significant speedup for many functions while not increasing the total number of function evaluations by more than a constant factor. We also identify which number of processors lead to the best guaranteed speedups, thus giving hints on how to parameterize parallel evolutionary algorithms.
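
    As a toy instance of the kind of parallel EA these bounds cover, the sketch below runs a ring-like set of independent (1+1) EAs on OneMax with occasional migration of the best individual; parameters are arbitrary, and counting parallel generations versus total evaluations mirrors how the speedup from parallelization would be assessed.

      import random

      def onemax(x):
          return sum(x)

      def island_ea(n_bits=50, islands=4, migrate_every=10, seed=0):
          rng = random.Random(seed)
          pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(islands)]
          evals, gen = 0, 0
          while max(onemax(x) for x in pop) < n_bits:
              gen += 1
              for i in range(islands):                 # one (1+1) EA step per island
                  child = [b ^ (rng.random() < 1.0 / n_bits) for b in pop[i]]
                  evals += 1
                  if onemax(child) >= onemax(pop[i]):
                      pop[i] = child
              if gen % migrate_every == 0:             # migrate the current best everywhere
                  best = max(pop, key=onemax)
                  pop = [max(x, best, key=onemax) for x in pop]
          return gen, evals

      print(island_ea())   # (parallel generations, total fitness evaluations)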

  8. Hierarchical neural network model of the visual system determining figure/ground relation

    Science.gov (United States)

    Kikuchi, Masayuki

    2017-07-01

    One of the most important functions of visual perception in the brain is figure/ground interpretation of input images. Figural regions in a 2D image, corresponding to objects in 3D space, are distinguished from the background region extending behind the objects. Previously the author proposed a neural network model of figure/ground separation built on the principle that local geometric features such as curvatures and outer angles at corners are extracted and propagated along the input contour in a single-layer network (Kikuchi & Akashi, 2001). However, such a processing principle has the defect that signal propagation requires many iterations, despite the fact that the actual visual system determines the figure/ground relation within a short period (Zhou et al., 2000). In order to speed up the determination of figure/ground, this study incorporates a hierarchical architecture into the previous model. The effect of this hierarchization on computation time was confirmed by simulation. As the number of layers increased, the required computation time decreased. However, the speed-up effect saturated once the number of layers increased beyond a certain point. This study attempts to explain this saturation effect using the notion of average distance between vertices from the field of complex networks, and succeeded in reproducing the saturation effect by computer simulation.

  9. Preconditioning 2D Integer Data for Fast Convex Hull Computations.

    Science.gov (United States)

    Cadenas, José Oswaldo; Megson, Graham M; Luengo Hendriks, Cris L

    2016-01-01

    In order to accelerate computing the convex hull on a set of n points, a heuristic procedure is often applied to reduce the number of points to a set of s points, s ≤ n, which also contains the same hull. We present an algorithm to precondition 2D data with integer coordinates bounded by a box of size p × q before building a 2D convex hull, with three distinct advantages. First, we prove that under the condition min(p, q) ≤ n the algorithm executes in time within O(n); second, no explicit sorting of data is required; and third, the reduced set of s points forms a simple polygonal chain and thus can be directly pipelined into an O(n) time convex hull algorithm. This paper empirically evaluates and quantifies the speedup gained by preconditioning a set of points by a method based on the proposed algorithm before using common convex hull algorithms to build the final hull. A speedup factor of at least four is consistently found from experiments on various datasets when the condition min(p, q) ≤ n holds; the smaller the ratio min(p, q)/n is in the dataset, the greater the speedup factor achieved.
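
    One common way to realize such a preconditioning pass on integer data is to keep only the extreme points of every row before running the hull algorithm; the sketch below shows that idea in plain Python and is consistent with, but not identical to, the O(n) procedure of the paper.

      from collections import defaultdict

      def reduce_points(points):
          """For every y value keep only the min-x and max-x points."""
          rows = defaultdict(lambda: [None, None])        # y -> [min_x, max_x]
          for x, y in points:
              lo, hi = rows[y]
              rows[y] = [x if lo is None else min(lo, x),
                         x if hi is None else max(hi, x)]
          reduced = {(lo, y) for y, (lo, hi) in rows.items()}
          reduced |= {(hi, y) for y, (lo, hi) in rows.items()}
          return sorted(reduced)

      points = [(3, 1), (7, 1), (5, 1), (2, 4), (9, 4), (4, 2), (4, 3)]
      print(reduce_points(points))   # interior row points dropped, hull unchanged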

  10. Accelerating statistical image reconstruction algorithms for fan-beam x-ray CT using cloud computing

    Science.gov (United States)

    Srivastava, Somesh; Rao, A. Ravishankar; Sheinin, Vadim

    2011-03-01

    Statistical image reconstruction algorithms potentially offer many advantages to x-ray computed tomography (CT), e.g. lower radiation dose. But, their adoption in practical CT scanners requires extra computation power, which is traditionally provided by incorporating additional computing hardware (e.g. CPU-clusters, GPUs, FPGAs etc.) into a scanner. An alternative solution is to access the required computation power over the internet from a cloud computing service, which is orders-of-magnitude more cost-effective. This is because users only pay a small pay-as-you-go fee for the computation resources used (i.e. CPU time, storage etc.), and completely avoid purchase, maintenance and upgrade costs. In this paper, we investigate the benefits and shortcomings of using cloud computing for statistical image reconstruction. We parallelized the most time-consuming parts of our application, the forward and back projectors, using MapReduce, the standard parallelization library on clouds. From preliminary investigations, we found that a large speedup is possible at a very low cost. But, communication overheads inside MapReduce can limit the maximum speedup, and a better MapReduce implementation might become necessary in the future. All the experiments for this paper, including development and testing, were completed on the Amazon Elastic Compute Cloud (EC2) for less than $20.
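
    The parallelization pattern described (map the forward projector over projection angles, then reduce the partial results) can be sketched locally with a process pool standing in for MapReduce on EC2; the projector here is a trivial rotate-and-sum stand-in, not a real CT system model, and scipy is assumed to be available.

      import numpy as np
      from multiprocessing import Pool

      IMAGE = np.ones((256, 256))            # toy object being projected

      def forward_project(angle_deg):
          """Map step: one (toy) projection = column sums of a rotated image."""
          from scipy.ndimage import rotate   # assumed available
          return rotate(IMAGE, angle_deg, reshape=False, order=1).sum(axis=0)

      if __name__ == "__main__":
          angles = list(range(0, 180, 2))
          with Pool() as pool:
              views = pool.map(forward_project, angles)   # map over angles
          sinogram = np.stack(views)                      # reduce: assemble sinogram
          print(sinogram.shape)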

  11. Intermittent compared to continuous real-time fMRI neurofeedback boosts control over amygdala activation.

    Science.gov (United States)

    Hellrung, Lydia; Dietrich, Anja; Hollmann, Maurice; Pleger, Burkhard; Kalberlah, Christian; Roggenhofer, Elisabeth; Villringer, Arno; Horstmann, Annette

    2018-02-01

    Real-time fMRI neurofeedback is a feasible tool to learn the volitional regulation of brain activity. So far, most studies provide continuous feedback information that is presented upon every volume acquisition. Although this maximizes the temporal resolution of feedback information, it may be accompanied by some disadvantages. Participants can be distracted from the regulation task due to (1) the intrinsic delay of the hemodynamic response and associated feedback and (2) limited cognitive resources available to simultaneously evaluate feedback information and stay engaged with the task. Here, we systematically investigate differences between groups presented with different variants of feedback (continuous vs. intermittent) and a control group receiving no feedback on their ability to regulate amygdala activity using positive memories and feelings. In contrast to the feedback groups, no learning effect was observed in the group without any feedback presentation. The group receiving intermittent feedback exhibited better amygdala regulation performance when compared with the group receiving continuous feedback. Behavioural measurements show that these effects were reflected in differences in task engagement. Overall, we not only demonstrate that the presentation of feedback is a prerequisite to learn volitional control of amygdala activity but also that intermittent feedback is superior to continuous feedback presentation. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  12. Development of potential map for landslides by comparing instability indices of various time periods

    Science.gov (United States)

    Chiang, Jie-Lun; Tian, Yu-Qing; Chen, Yie-Ruey; Tsai, Kuang-Jung

    2017-04-01

    In recent years, extreme rainfall events have occurred frequently and induced serious landslides and debris flow disasters in Taiwan. The instability indices differ when landslide maps of different time periods are used. We analyzed the landslide records for the period 2008–2012; the landslide area contributed 0.42%–2.94% of the total watershed area, where the 2.94% was caused by typhoon Morakot in August 2009, which brought massive rainfall with a cumulative maximum of up to 2900 mm. We analyzed instability factors including elevation, slope, aspect, soil, and geology, and compared the instability indices obtained using the individual landslide maps of 2008–2012, the union of the five years, and the intersection of the five years. The landslide area from the union of the five years contributed 3.71%, and the landslide area from the intersection of the five years contributed 0.14%. In this study, kriging was used to establish the susceptibility map of the selected watershed. From the intersection of the five years, we found that instability indices above 4.3 correspond to the landslide records. The potential landslide area of the selected watershed, where collapses are more likely to occur, belongs to the high and medium-high levels, covering 13.43% and 3.04% of the area, respectively.

  13. Is Time Predictability Quantifiable?

    DEFF Research Database (Denmark)

    Schoeberl, Martin

    2012-01-01

    Computer architects and researchers in the real-time domain have started to investigate processors and architectures optimized for real-time systems. Optimized for real-time systems means time predictable, i.e., architectures where it is possible to statically derive a tight bound on the worst-case execution time. To compare different approaches we would like to quantify time predictability. That means we need to measure time predictability. In this paper we discuss the different approaches for these measurements and conclude that time predictability is practically not quantifiable. We can only compare the worst-case execution time bounds of different architectures.

  14. Hardware Accelerators Targeting a Novel Group Based Packet Classification Algorithm

    Directory of Open Access Journals (Sweden)

    O. Ahmed

    2013-01-01

    Full Text Available Packet classification is a ubiquitous and key building block for many critical network devices. However, it remains one of the main bottlenecks faced when designing fast network devices. In this paper, we propose a novel Group Based Search packet classification Algorithm (GBSA) that is scalable, fast, and efficient. GBSA consumes an average of 0.4 megabytes of memory for a 10 k rule set. The worst-case classification time per packet is 2 microseconds, and the preprocessing speed is 3 M rules/second on a Xeon processor operating at 3.4 GHz. When compared with other state-of-the-art classification techniques, the results showed that GBSA outperforms the competition with respect to speed, memory usage, and processing time. Moreover, GBSA is amenable to implementation in hardware. Three different hardware implementations are also presented in this paper, including an Application Specific Instruction Set Processor (ASIP) implementation and two pure Register-Transfer Level (RTL) implementations based on Impulse-C and Handel-C flows, respectively. Speedups achieved with these hardware accelerators ranged from 9x to 18x compared with a pure software implementation running on a Xeon processor.

  15. Partial wave analysis using graphics processing units

    Energy Technology Data Exchange (ETDEWEB)

    Berger, Niklaus; Liu Beijiang; Wang Jike, E-mail: nberger@ihep.ac.c [Institute of High Energy Physics, Chinese Academy of Sciences, 19B Yuquan Lu, Shijingshan, 100049 Beijing (China)

    2010-04-01

    Partial wave analysis is an important tool for determining resonance properties in hadron spectroscopy. For large data samples however, the un-binned likelihood fits employed are computationally very expensive. At the Beijing Spectrometer (BES) III experiment, an increase in statistics compared to earlier experiments of up to two orders of magnitude is expected. In order to allow for a timely analysis of these datasets, additional computing power with short turnover times has to be made available. It turns out that graphics processing units (GPUs) originally developed for 3D computer games have an architecture of massively parallel single instruction multiple data floating point units that is almost ideally suited for the algorithms employed in partial wave analysis. We have implemented a framework for tensor manipulation and partial wave fits called GPUPWA. The user writes a program in pure C++ whilst the GPUPWA classes handle computations on the GPU, memory transfers, caching and other technical details. In conjunction with a recent graphics processor, the framework provides a speed-up of the partial wave fit by more than two orders of magnitude compared to legacy FORTRAN code.

  16. Parallelization of Subchannel Analysis Code MATRA

    International Nuclear Information System (INIS)

    Kim, Seongjin; Hwang, Daehyun; Kwon, Hyouk

    2014-01-01

    A stand-alone calculation of the MATRA code takes an acceptable amount of computing time for thermal margin calculations, while a considerably longer time is needed to solve whole-core pin-by-pin problems. In addition, improving the computation speed of the MATRA code is strongly required to satisfy the overall performance of multi-physics coupling calculations. Therefore, a parallel approach to improve and optimize the computability of the MATRA code is proposed and verified in this study. The parallel algorithm is embodied in the MATRA code using the MPI communication method, and modification of the previous code structure was minimized. The improvement is confirmed by comparing the results of the single- and multiple-processor algorithms. The speedup and efficiency are also evaluated when increasing the number of processors. The parallel algorithm was implemented in the subchannel code MATRA using MPI. The performance of the parallel algorithm was verified by comparing the results with those from MATRA with a single processor. It is also noted that the performance of the MATRA code was greatly improved by implementing the parallel algorithm for the 1/8 core and whole-core problems.
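
    The MPI pattern described above (decompose the subchannels across ranks, solve locally, gather the results) looks roughly like the mpi4py sketch below; the per-subchannel solve is a placeholder, since MATRA itself is a Fortran code and its internals are not reproduced here.

      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      n_subchannels = 1000
      local = range(rank, n_subchannels, size)        # round-robin decomposition

      # stand-in for the per-subchannel thermal-hydraulic solve
      local_result = {ch: 0.001 * ch for ch in local}

      gathered = comm.gather(local_result, root=0)    # collect partial results
      if rank == 0:
          merged = {k: v for part in gathered for k, v in part.items()}
          print(len(merged), "subchannels solved on", size, "ranks")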

  17. A Comparative Study of Simple Auditory Reaction Time in Blind (Congenitally) and Sighted Subjects

    OpenAIRE

    Gandhi, Pritesh Hariprasad; Gokhale, Pradnya A.; Mehta, H. B.; Shah, C. J.

    2013-01-01

    Background: Reaction time is the time interval between the application of a stimulus and the appearance of an appropriate voluntary response by a subject. It involves stimulus processing, decision making, and response programming. Reaction time studies have been popular due to their implications in sports physiology. Reaction time has been widely studied as its practical implications may be of great consequence; e.g., a slower-than-normal reaction time while driving can have grave results. Objective:...

  18. Accelerating population balance-Monte Carlo simulation for coagulation dynamics from the Markov jump model, stochastic algorithm and GPU parallel computing

    International Nuclear Information System (INIS)

    Xu, Zuwei; Zhao, Haibo; Zheng, Chuguang

    2015-01-01

    This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining Markov jump model, weighted majorant kernel and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of particle size distribution with low statistical noise over the full size range and as far as possible to reduce the number of time loopings. Here three coagulation rules are highlighted and it is found that constructing appropriate coagulation rule provides a route to attain the compromise between accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering the two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates being used for acceptance–rejection processes by single-looping over all particles, and meanwhile the mean time-step of coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is reduced greatly to be proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are parallel processed by multi-cores on a GPU that can implement the massively threaded data-parallel tasks to obtain remarkable speedup ratio (comparing with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10 000 simulation particles per cell). These accelerating approaches of PBMC are
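
    The acceptance-rejection step built on a majorant kernel can be sketched for a single coagulation event as below; the kernel, the majorant bound, and the equally weighted particles are illustrative simplifications of the differentially-weighted scheme described in the abstract.

      import random

      def kernel(v1, v2):
          return v1 ** (1 / 3) + v2 ** (1 / 3)       # toy coagulation kernel

      def one_event(volumes, rng):
          """Perform one coagulation via acceptance-rejection under a majorant kernel."""
          k_max = 2 * max(volumes) ** (1 / 3)        # cheap upper bound on the kernel
          while True:
              i, j = rng.sample(range(len(volumes)), 2)
              if rng.random() < kernel(volumes[i], volumes[j]) / k_max:
                  volumes[i] += volumes[j]           # merge particle j into i
                  volumes[j] = volumes[-1]
                  volumes.pop()
                  return volumes

      rng = random.Random(0)
      particles = [1.0] * 1000                       # equally weighted toy population
      print(len(one_event(particles, rng)))          # 999 particles remain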

  19. Time-dependent opportunities in energy business. A comparative study of locally available renewable and conventional fuels

    International Nuclear Information System (INIS)

    Tolis, Athanasios I.; Rentizelas, Athanasios A.; Tatsiopoulos, Ilias P.

    2010-01-01

    This work investigates and compares energy-related private business strategies potentially interesting for investors willing to exploit either local biomass sources or strategic conventional fuels. Two distinct fuels and their related power-production technologies are compared in terms of economic efficiency as a case study: biomass from cotton stalks and natural gas. The carbon capture and storage option is also investigated for power plants based on both fuel types. The model used in this study investigates important economic aspects using a 'real options' method instead of traditional discounted cash flow techniques, as it can handle more effectively the problems arising from the stochastic nature of the evolution of significant cash flow contributors such as electricity, fuel and CO2 allowance prices. The capital costs also have a functional relationship with time, thus providing an additional reason for implementing 'real options' as well as the learning-curves technique. The methodology and the results presented in this work may lead to interesting conclusions and affect potential private investment strategies and future decision making. This study indicates that both technologies lead to positive investment yields, with natural gas being more profitable for the case study examined, while carbon capture and storage does not seem to be cost-efficient at current CO2 allowance prices. Furthermore, low interest rates might encourage potential investors to wait before actualising their business plans, while higher interest rates favor immediate investment decisions. (author)

  20. Activation of Human Peripheral Blood Eosinophils by Cytokines in a Comparative Time-Course Proteomic/Phosphoproteomic Study.

    Science.gov (United States)

    Soman, Kizhake V; Stafford, Susan J; Pazdrak, Konrad; Wu, Zheng; Luo, Xuemei; White, Wendy I; Wiktorowicz, John E; Calhoun, William J; Kurosky, Alexander

    2017-08-04

    Activated eosinophils contribute to airway dysfunction and tissue remodeling in asthma and thus are considered to be important factors in asthma pathology. We report here comparative proteomic and phosphoproteomic changes upon activation of eosinophils using eight cytokines individually and in selected cytokine combinations in time-course reactions. Differential protein and phosphoprotein expressions were determined by mass spectrometry after 2-dimensional gel electrophoresis (2DGE) and by LC-MS/MS. We found that each cytokine-stimulation produced significantly different changes in the eosinophil proteome and phosphoproteome, with phosphoproteomic changes being more pronounced and having an earlier onset. Furthermore, we observed that IL-5, GM-CSF, and IL-3 showed the greatest change in protein expression and phosphorylation, and this expression differed markedly from those of the other five cytokines evaluated. Comprehensive univariate and multivariate statistical analyses were employed to evaluate the comparative results. We also monitored eosinophil activation using flow cytometry (FC) analysis of CD69. In agreement with our proteomic studies, FC indicated that IL-5, GM-CSF, and IL-3 were more effective than the other five cytokines studied in stimulating a cell surface CD69 increase indicative of eosinophil activation. Moreover, selected combinations of cytokines revealed proteomic patterns with many proteins in common with single cytokine expression patterns but also showed a greater effect of the two cytokines employed, indicating a more complex signaling pathway that was reflective of a more typical inflammatory pathology.