large scale parallelism: Topics by WorldWideScience.org

Sample records for large scale parallelism

Parallel clustering algorithm for large-scale biological data sets.

Science.gov (United States)

Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang

2014-01-01

The recent explosion of biological data poses a great challenge for traditional clustering algorithms. As data sets grow, cluster identification requires much more memory and longer runtimes. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a major bottleneck when handling large-scale data sets. Moreover, because the algorithm clusters data based on the similarities between pairs of data points, a similarity matrix must be constructed before affinity propagation can run, and this construction step is itself time-consuming. Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix construction and the affinity propagation algorithm. A shared-memory architecture is used to construct the similarity matrix, and a distributed system is used for affinity propagation because of its large memory and great computing capacity. An appropriate data partitioning and reduction scheme is designed to minimize the global communication cost among processes. A speedup of 100 is achieved with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm can handle large-scale data sets effectively. The parallel affinity propagation also performs well when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies.
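A minimal sketch of row-partitioned similarity-matrix construction follows, assuming negative squared Euclidean distance as the similarity (a common choice for affinity propagation; the abstract does not name the measure). The abstract does not describe its shared-memory implementation, so `parallel_similarity_matrix` and its helpers are hypothetical names. Python threads are limited by the GIL for pure-Python arithmetic, so this shows the work partitioning rather than a real speedup; a ProcessPoolExecutor or a compiled kernel would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor


def neg_sq_dist(a, b):
    # Negative squared Euclidean distance, a common similarity for affinity propagation.
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def similarity_rows(points, start, stop):
    # Compute rows start..stop-1 of the full n x n similarity matrix.
    return [[neg_sq_dist(points[i], q) for q in points] for i in range(start, stop)]


def parallel_similarity_matrix(points, n_workers=4):
    n = len(points)
    # Partition the row indices into contiguous blocks, one task per block.
    step = max(1, -(-n // n_workers))  # ceiling division
    blocks = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in block order, so concatenation restores row order.
        parts = list(pool.map(lambda blk: similarity_rows(points, *blk), blocks))
    return [row for part in parts for row in part]
```

Because the matrix is symmetric, a real implementation could compute only the upper triangle and halve the work. For reference, the reported speedup of 100 on 128 cores corresponds to a parallel efficiency of 100/128 ≈ 78%.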
A new asynchronous parallel algorithm for inferring large-scale gene regulatory networks.

Directory of Open Access Journals (Sweden)

Xiangyun Xiao

The reconstruction of gene regulatory networks (GRNs) from high-throughput experimental data has been considered one of the most important issues in systems biology research. With the development of high-throughput technology and the complexity of biological problems, we need to reconstruct GRNs that contain thousands of genes. However, when many existing algorithms are used to handle these large-scale problems, they will encounter two important issues: low accuracy and high computational cost. To overcome these difficulties, the main goal of this study is to design an effective parallel algorithm to infer large-scale GRNs based on high-performance parallel computing environments. In this study, we proposed a novel asynchronous parallel framework to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ordinary differential equation (ODE)-based optimization. The presented algorithm uses the sparsity and modularity of GRNs to split whole large-scale GRNs into many small-scale modular subnetworks. Through the ODE-based optimization of all subnetworks in parallel and their asynchronous communications, we can easily obtain the parameters of the whole network. To test the performance of the proposed approach, we used well-known benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM), experimentally determined GRN of Escherichia coli and one published dataset that contains more than 10 thousand genes to compare the proposed approach with several popular algorithms on the same high-performance computing environments in terms of both accuracy and time complexity. The numerical results demonstrate that our parallel algorithm exhibits obvious superiority in inferring large-scale GRNs.
A new asynchronous parallel algorithm for inferring large-scale gene regulatory networks.

Science.gov (United States)

Xiao, Xiangyun; Zhang, Wei; Zou, Xiufen

2015-01-01

The reconstruction of gene regulatory networks (GRNs) from high-throughput experimental data has been considered one of the most important issues in systems biology research. With the development of high-throughput technology and the complexity of biological problems, we need to reconstruct GRNs that contain thousands of genes. However, when many existing algorithms are used to handle these large-scale problems, they will encounter two important issues: low accuracy and high computational cost. To overcome these difficulties, the main goal of this study is to design an effective parallel algorithm to infer large-scale GRNs based on high-performance parallel computing environments. In this study, we proposed a novel asynchronous parallel framework to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ordinary differential equation (ODE)-based optimization. The presented algorithm uses the sparsity and modularity of GRNs to split whole large-scale GRNs into many small-scale modular subnetworks. Through the ODE-based optimization of all subnetworks in parallel and their asynchronous communications, we can easily obtain the parameters of the whole network. To test the performance of the proposed approach, we used well-known benchmark datasets from Dialogue for Reverse Engineering Assessments and Methods challenge (DREAM), experimentally determined GRN of Escherichia coli and one published dataset that contains more than 10 thousand genes to compare the proposed approach with several popular algorithms on the same high-performance computing environments in terms of both accuracy and time complexity. The numerical results demonstrate that our parallel algorithm exhibits obvious superiority in inferring large-scale GRNs.
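The split-then-fit idea can be sketched as follows, under assumptions stated up front: each gene follows a linear ODE dx_i/dt = Σ_j w_ij x_j over the genes of its own module, the module assignment is given, and derivatives are approximated by forward differences. The paper's actual ODE form, splitting procedure, and asynchronous communication are not given in the abstract; here a thread pool fits every gene's equation independently and synchronously, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system A x = b.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_gene(target, regulators, X, dt):
    # Least-squares fit of dx_target/dt ~ sum_j w_j * x_j over the module's genes,
    # using forward differences of the sampled trajectory X[t][gene].
    rows = [[X[t][j] for j in regulators] for t in range(len(X) - 1)]
    rhs = [(X[t + 1][target] - X[t][target]) / dt for t in range(len(X) - 1)]
    m = len(regulators)
    AtA = [[sum(r[a] * r[b] for r in rows) for b in range(m)] for a in range(m)]
    Atb = [sum(r[a] * y for r, y in zip(rows, rhs)) for a in range(m)]
    return target, dict(zip(regulators, solve(AtA, Atb)))


def fit_network(X, modules, dt, n_workers=4):
    # Each gene's equation depends only on its own module, so all fits are independent.
    tasks = [(gene, module) for module in modules for gene in module]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda t: fit_gene(t[0], t[1], X, dt), tasks))
    return dict(results)
```

The result maps each gene to its estimated incoming weights from genes in the same module; edges between modules are assumed absent, which is the price of the decomposition.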
Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

KAUST Repository

Hasanov, Khalid

2014-03-04

© 2014, Springer Science+Business Media New York. Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimizing message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing a hierarchy, and hence more parallelism, into the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly reduces the communication cost and improves the overall performance on large-scale platforms.
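For reference, plain (non-hierarchical) SUMMA can be simulated serially as below, assuming square n × n matrices on a q × q process grid with n divisible by q. The comments mark the row and column broadcasts whose cost HSUMMA targets; the hierarchical regrouping itself is not shown, and the function names are illustrative.

```python
def matmul(A, B):
    # Naive dense product, used for local block updates and as a reference.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def block(M, i, j, bs):
    # Extract the (i, j) block of size bs x bs from matrix M.
    return [row[j * bs:(j + 1) * bs] for row in M[i * bs:(i + 1) * bs]]


def summa(A, B, q):
    # Serial simulation of SUMMA on a q x q grid; process (i, j) owns block C_ij.
    n = len(A)
    assert n % q == 0, "matrix size must be divisible by the grid dimension"
    bs = n // q
    C = {(i, j): [[0.0] * bs for _ in range(bs)] for i in range(q) for j in range(q)}
    for k in range(q):
        # Communication: the owner of A_ik broadcasts it along grid row i, and the
        # owner of B_kj broadcasts it down grid column j. Here these are lookups.
        a_row = {i: block(A, i, k, bs) for i in range(q)}
        b_col = {j: block(B, k, j, bs) for j in range(q)}
        # Computation: every process adds its local rank-bs update.
        for (i, j), Cij in C.items():
            P = matmul(a_row[i], b_col[j])
            for r in range(bs):
                for c in range(bs):
                    Cij[r][c] += P[r][c]
    # Gather the distributed blocks back into one n x n matrix.
    out = [[0.0] * n for _ in range(n)]
    for (i, j), Cij in C.items():
        for r in range(bs):
            out[i * bs + r][j * bs:(j + 1) * bs] = Cij[r]
    return out
```

Each of the q steps broadcasts one block column of A and one block row of B; per the abstract, HSUMMA reduces this communication cost by introducing a hierarchy into the broadcasts.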
Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing

Directory of Open Access Journals (Sweden)

Qiang Liu

2018-05-01

Computing speed is a significant issue in large-scale flood simulation for real-time response in disaster prevention and mitigation. Even today, most large-scale flood simulations are run on supercomputers because of the massive amounts of data and computation involved. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite volume scheme was proposed for flood simulation. To enable fast simulation of large-scale floods on a personal computer, a Graphics Processing Unit (GPU)-based high-performance computing method using OpenACC was adopted to parallelize the shallow water model. An unstructured data management method was presented to control data transfer between the GPU and the CPU (Central Processing Unit) with minimal overhead; both computation and data were then offloaded from the CPU to the GPU, exploiting the computational capability of the GPU as much as possible. The parallel model was validated using various benchmarks and real-world case studies. The results demonstrate that speed-ups of up to one order of magnitude can be achieved compared with the serial model. The proposed parallel model provides a fast and reliable tool for quickly assessing flood hazards in large-scale areas and thus has promising applications in dynamic inundation risk identification and disaster assessment.
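To illustrate the shape of a Godunov-type finite volume update, here is a minimal 1D sketch for the shallow water equations on a uniform grid, assuming a Rusanov (local Lax-Friedrichs) interface flux and reflective walls. This is a swapped-in simplification: the paper uses a 2D unstructured mesh, its Riemann solver is not named in the abstract, and the GPU offload is only indicated in comments.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def physical_flux(h, hu):
    # Flux of the 1D shallow water equations for state (h, hu).
    u = hu / h if h > 0 else 0.0
    return hu, hu * u + 0.5 * G * h * h


def rusanov_flux(hl, hul, hr, hur):
    # Local Lax-Friedrichs (Rusanov) approximate Riemann solver at one interface.
    ul = hul / hl if hl > 0 else 0.0
    ur = hur / hr if hr > 0 else 0.0
    s = max(abs(ul) + math.sqrt(G * hl), abs(ur) + math.sqrt(G * hr))
    fl, fr = physical_flux(hl, hul), physical_flux(hr, hur)
    return (0.5 * (fl[0] + fr[0]) - 0.5 * s * (hr - hl),
            0.5 * (fl[1] + fr[1]) - 0.5 * s * (hur - hul))


def step(h, hu, dx, cfl=0.45):
    # One explicit finite-volume update with reflective walls (mirrored ghost cells).
    H = [h[0]] + h + [h[-1]]
    HU = [-hu[0]] + hu + [-hu[-1]]
    smax = max(abs(HU[i] / H[i]) + math.sqrt(G * H[i]) for i in range(len(H)) if H[i] > 0)
    dt = cfl * dx / smax
    # Each interface flux and each cell update below is independent: these loops
    # are the data-parallel work a GPU port would offload.
    F = [rusanov_flux(H[i], HU[i], H[i + 1], HU[i + 1]) for i in range(len(H) - 1)]
    new_h = [h[i] - dt / dx * (F[i + 1][0] - F[i][0]) for i in range(len(h))]
    new_hu = [hu[i] - dt / dx * (F[i + 1][1] - F[i][1]) for i in range(len(hu))]
    return new_h, new_hu, dt
```

In an OpenACC port, the per-interface and per-cell loops would typically be annotated with directives such as `#pragma acc parallel loop` while the arrays stay resident on the GPU between steps, in line with the abstract's aim of minimizing CPU-GPU transfers.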
Visual analysis of inter-process communication for large-scale parallel computing.

Science.gov (United States)

Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

2009-01-01

In serial computation, program profiling is often helpful for optimizing key sections of code. When moving to parallel computation, not only code execution but also communication between processes must be considered, since it can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of communication delays on performance. For large-scale parallel applications, it is critical to understand how communication impacts performance in order to make the code more efficient. Several tools are available for visualizing program execution and communication on parallel systems. These tools generally provide either views that statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes becomes very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts become difficult to work with and can even exceed the screen resolution. We propose a new visualization approach that affords more scalability and demonstrate it on systems running up to 16,384 processes.
Large-Scale Parallel Finite Element Analysis of the Stress Singular Problems

International Nuclear Information System (INIS)

Noriyuki Kushida; Hiroshi Okuda; Genki Yagawa

2002-01-01

In this paper, the convergence behavior of a large-scale parallel finite element method (FEM) for stress singular problems was investigated. The convergence of iterative solvers depends on the efficiency of the preconditioners, which in turn may be influenced by the domain decomposition required for parallel FEM. The following results were obtained. As expected, the conjugate gradient method without preconditioning and the diagonal-scaling preconditioned conjugate gradient method were not influenced by the domain decomposition. The symmetric successive over-relaxation (SSOR) preconditioned conjugate gradient method converged up to 6% faster when the stress singular area was contained in a single subdomain. (authors)
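The two solvers that the abstract reports as insensitive to the decomposition, plain CG and diagonal-scaling (Jacobi) preconditioned CG, can be sketched as below for a dense symmetric positive definite matrix. The SSOR preconditioner is not shown, and in a parallel FEM code the matrix-vector product and dot products would be distributed across subdomains; this single-process version is only illustrative.

```python
def pcg(A, b, precond=None, tol=1e-10, max_iter=1000):
    # Preconditioned conjugate gradient for a symmetric positive definite matrix A.
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    if precond is None:
        precond = lambda r: list(r)  # identity: plain CG
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    bnorm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * bnorm:
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def diagonal_scaling(A):
    # Jacobi preconditioner: z = D^{-1} r with D = diag(A).
    d = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, d)]
```

Diagonal scaling uses only each row's own diagonal entry, which fits the observation that it was unaffected by the decomposition; SSOR, by contrast, sweeps through coupled unknowns and in parallel codes is commonly applied within each subdomain, so where a boundary cuts the singular region can plausibly matter.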
Application of parallel computing techniques to a large-scale reservoir simulation

International Nuclear Information System (INIS)

Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten

2001-01-01

Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from the intensive computational requirements of detailed modeling investigations of real-world reservoirs. This paper presents the application of a massively parallel version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating the flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of future flow conditions at the site, aiding in the assessment of proposed repository performance.
Parallelizing Gene Expression Programming Algorithm in Enabling Large-Scale Classification

Directory of Open Access Journals (Sweden)

Lixiong Xu

2017-01-01

As one of the most effective function mining algorithms, the Gene Expression Programming (GEP) algorithm has been widely used in classification, pattern recognition, prediction, and other research fields. Through self-evolution, GEP is able to mine an optimal function for handling further complicated tasks. However, in big data research, GEP suffers from low efficiency because of its long mining process. To improve the efficiency of GEP in big data research, especially for large-scale classification tasks, this paper presents a parallelized GEP algorithm using the MapReduce computing model. The experimental results show that the presented algorithm is scalable and efficient for processing large-scale classification tasks.
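One common way to apply MapReduce to GEP is to parallelize fitness evaluation, usually the expensive step. The sketch below assumes that strategy (the abstract does not say which stage is parallelized): the training set is split into partitions, a map step counts correct predictions per partition, and a reduce step sums the counts. The candidate expressions are hypothetical, already-decoded Python functions, and a thread pool stands in for a MapReduce cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(partition, classifier):
    # Map: count correct predictions within one data partition.
    correct = sum(1 for features, label in partition if classifier(features) == label)
    return correct, len(partition)


def reduce_counts(a, b):
    # Reduce: combine (correct, total) pairs from two partitions.
    return a[0] + b[0], a[1] + b[1]


def fitness(dataset, classifier, n_partitions=4):
    # Classification accuracy of one candidate, computed map/reduce style.
    size = max(1, -(-len(dataset) // n_partitions))
    parts = [dataset[i:i + size] for i in range(0, len(dataset), size)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        counts = list(pool.map(lambda p: map_partition(p, classifier), parts))
    correct, total = reduce(reduce_counts, counts, (0, 0))
    return correct / total


def best_candidate(dataset, population):
    # Select the fittest expression of the current generation.
    return max(population, key=lambda c: fitness(dataset, c))


# Hypothetical decoded GEP expressions: x0*x1 - x2 > 0, x0 > x2, and constant 0.
population = [
    lambda x: int(x[0] * x[1] - x[2] > 0),
    lambda x: int(x[0] > x[2]),
    lambda x: 0,
]
```

Because the map outputs are small (correct, total) pairs, the reduce step is cheap regardless of data size, which is what makes this stage a natural fit for MapReduce.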
Parallel Quasi Newton Algorithms for Large Scale Non Linear Unconstrained Optimization

International Nuclear Information System (INIS)

Rahman, M. A.; Basarudin, T.

1997-01-01

This paper discusses the quasi-Newton (QN) method for solving nonlinear unconstrained minimization problems. An important aspect of the QN method is the choice of the matrix H_k, which must be positive definite and satisfy the quasi-Newton (secant) condition. Our interest here is in parallel QN methods suitable for solving large-scale optimization problems. QN methods become less attractive for large-scale problems because of their storage and computational requirements. However, the Hessian is often a sparse matrix. In this paper we describe a mechanism for reducing the cost of the Hessian update while preserving the Hessian's properties. One major motivation for this research is that a QN method may perform well on certain types of minimization problems, but its efficiency degrades when it is applied to other categories of problems. For this reason, we use an algorithm containing several direction strategies that are processed in parallel. We parallelize the algorithm by exploring different search directions generated by various QN updates during the minimization process. Different line search strategies will be employed simultaneously to locate the minimum along each direction. The algorithm will be written in the Occam 2 language and run on a transputer machine.
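The multiple-direction idea can be sketched as below, assuming a steepest-descent direction plus one direction each from BFGS and DFP inverse-Hessian updates, Armijo backtracking line search, and selection of the best resulting point. The abstract does not specify which QN updates or line searches were used, and the original targets Occam 2 on transputers; here a thread pool only emulates the concurrent line searches, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(H, v):
    return [dot(row, v) for row in H]


def bfgs_update(H, s, y):
    # Inverse-Hessian BFGS update: (I - rho s y^T) H (I - rho y s^T) + rho s s^T.
    rho, Hy = 1.0 / dot(y, s), matvec(H, y)
    yHy, n = dot(y, Hy), len(s)
    return [[H[i][j] - rho * (Hy[i] * s[j] + s[i] * Hy[j]) + (rho * rho * yHy + rho) * s[i] * s[j]
             for j in range(n)] for i in range(n)]


def dfp_update(H, s, y):
    # Inverse-Hessian DFP update: H - (H y y^T H)/(y^T H y) + (s s^T)/(y^T s).
    Hy, ys = matvec(H, y), dot(y, s)
    yHy, n = dot(y, Hy), len(s)
    return [[H[i][j] - Hy[i] * Hy[j] / yHy + s[i] * s[j] / ys for j in range(n)] for i in range(n)]


def backtrack(f, x, fx, g, d, c=1e-4, max_halvings=60):
    # Armijo backtracking along d; returns (f_new, x_new) or None if d is unusable.
    slope, alpha = dot(g, d), 1.0
    if slope >= 0:
        return None
    for _ in range(max_halvings):
        xn = [xi + alpha * di for xi, di in zip(x, d)]
        fn = f(xn)
        if fn <= fx + c * alpha * slope:
            return fn, xn
        alpha *= 0.5
    return None


def parallel_qn(f, grad, x0, iters=200, tol=1e-10):
    n = len(x0)
    eye = [[float(i == j) for j in range(n)] for i in range(n)]
    H_bfgs, H_dfp = eye, [row[:] for row in eye]
    x, fx, g = list(x0), f(x0), grad(x0)
    with ThreadPoolExecutor(max_workers=3) as pool:
        for _ in range(iters):
            if dot(g, g) ** 0.5 < tol:
                break
            # One candidate direction per strategy, each line-searched concurrently.
            dirs = [[-gi for gi in g]] + [[-v for v in matvec(H, g)] for H in (H_bfgs, H_dfp)]
            trials = [t for t in pool.map(lambda d: backtrack(f, x, fx, g, d), dirs) if t is not None]
            if not trials:
                break
            fn, xn = min(trials, key=lambda t: t[0])  # keep the best point found
            gn = grad(xn)
            s = [a - b for a, b in zip(xn, x)]
            y = [a - b for a, b in zip(gn, g)]
            if dot(y, s) > 1e-12:  # curvature condition keeps both H positive definite
                H_bfgs, H_dfp = bfgs_update(H_bfgs, s, y), dfp_update(H_dfp, s, y)
            x, fx, g = xn, fn, gn
    return x
```

Keeping steepest descent among the candidates means each iteration decreases f at least as much as a steepest-descent step would, while the QN directions usually win once their Hessian approximations improve.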
Large-Scale, Parallel, Multi-Sensor Data Fusion in the Cloud

Science.gov (United States)

Wilson, B. D.; Manipon, G.; Hua, H.

2012-12-01

NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instrument swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To efficiently assemble such decade-scale datasets in a timely manner, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. "SciReduce" is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, in which simple tuples (keys & values) are passed between the map and reduce functions, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Thus, SciReduce uses the native datatypes (geolocated grids, swaths, and points) that geo-scientists are familiar with. We are deploying within Sci
Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

Directory of Open Access Journals (Sweden)

Sai Kiranmayee Samudrala

2015-01-01

Dimensionality reduction refers to a set of mathematical techniques used to reduce the complexity of the original high-dimensional data while preserving selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying spectral dimensionality reduction techniques and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate the applicability of our framework, we perform dimensionality reduction of 75,000 images representing morphology evolution during the manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.
Streaming Parallel GPU Acceleration of Large-Scale filter-based Spiking Neural Networks

NARCIS (Netherlands)

L.P. Slazynski (Leszek); S.M. Bohte (Sander)

2012-01-01

The arrival of graphics processing unit (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of
Random number generators for large-scale parallel Monte Carlo simulations on FPGA

Science.gov (United States)

Lin, Y.; Wang, F.; Liu, B.

2018-05-01

Through parallelization, field programmable gate arrays (FPGAs) can achieve unprecedented speeds in large-scale parallel Monte Carlo (LPMC) simulations. FPGAs present both new constraints and new opportunities for implementing random number generators (RNGs), which are key elements of any Monte Carlo (MC) simulation system. Using empirical and application-based tests, this study evaluates all four RNGs used in previous FPGA-based MC studies, together with newly proposed FPGA implementations of two well-known high-quality RNGs that are suitable for LPMC studies on FPGAs. One of the newly proposed FPGA implementations, a parallel version of the additive lagged Fibonacci generator (Parallel ALFG), is found to be the best among the evaluated RNGs in fulfilling the needs of LPMC simulations on FPGAs.
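A minimal software model of an additive lagged Fibonacci generator, x_n = (x_{n-j} + x_{n-k}) mod 2^m, is shown below. The lags (24, 55) and the modulus 2^32 are common textbook choices assumed here; the paper's lags, word width, seeding, and FPGA parallelization scheme are not given in the abstract. Parallel streams are modelled simply as separately seeded generators (a hypothetical `make_streams` helper), which by itself does not guarantee statistical independence.

```python
import random


class ALFG:
    """Additive lagged Fibonacci generator: x_n = (x_{n-j} + x_{n-k}) mod 2^m."""

    def __init__(self, seed, j=24, k=55, m=32):
        assert 0 < j < k
        self.j, self.k, self.mod = j, k, 1 << m
        # Fill the k-word lag table from a seeded auxiliary generator.
        aux = random.Random(seed)
        self.state = [aux.getrandbits(m) for _ in range(k)]
        self.state[0] |= 1  # at least one odd word is needed for the maximal period
        self.i = 0  # circular-buffer position of x_{n-k}, the oldest word

    def next_value(self):
        # Buffer positions i, i+1, ..., i+k-1 hold x_{n-k}, ..., x_{n-1}.
        x_nk = self.state[self.i]
        x_nj = self.state[(self.i + self.k - self.j) % self.k]
        new = (x_nk + x_nj) % self.mod
        self.state[self.i] = new  # overwrite the oldest word with x_n
        self.i = (self.i + 1) % self.k
        return new


def make_streams(base_seed, n_streams):
    # One separately seeded generator per parallel worker.
    return [ALFG(seed=base_seed * 1000003 + s) for s in range(n_streams)]
```

Each output needs only one addition and a k-word circular buffer, which plausibly explains why lagged Fibonacci generators map well to hardware.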
Exploiting multi-scale parallelism for large scale numerical modelling of laser wakefield accelerators

International Nuclear Information System (INIS)

Fonseca, R A; Vieira, J; Silva, L O; Fiuza, F; Davidson, A; Tsung, F S; Mori, W B

2013-01-01

A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-class lasers with underdense targets, promises the production of high-quality electron beams over short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large-scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large-scale modelling of LWFA, demonstrating speedups of more than an order of magnitude on the same hardware. Finally, scalability to over ∼10⁶ cores and sustained performance over ∼2 PFlop/s is demonstrated, opening the way for large-scale modelling of LWFA scenarios. (paper)
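Dynamic load balancing in a particle-in-cell code means moving domain boundaries so that each process holds a similar amount of work. The 1D sketch below assumes the particle count per cell is the cost measure and picks boundaries from the cumulative load; OSIRIS's multi-dimensional algorithm is not described in the abstract, so this is illustrative only and `balance` is a hypothetical name.

```python
from bisect import bisect_left
from itertools import accumulate


def balance(counts, n_domains):
    # Split cells 0..len(counts)-1 into n_domains contiguous ranges of roughly
    # equal load; domain d owns cells bounds[d] .. bounds[d+1]-1.
    prefix = list(accumulate(counts))
    total = prefix[-1]
    bounds = [0]
    for d in range(1, n_domains):
        target = total * d / n_domains
        # Smallest number of leading cells whose cumulative load reaches the target.
        bounds.append(min(bisect_left(prefix, target) + 1, len(counts)))
    bounds.append(len(counts))
    return bounds


def domain_loads(counts, bounds):
    # Total particle count owned by each domain.
    return [sum(counts[a:b]) for a, b in zip(bounds, bounds[1:])]
```

With 10 light cells (1 particle each) followed by 10 heavy cells (10 particles each), an equal-cell split into two domains gives loads of 10 and 100, whereas `balance` moves the boundary to cell 15 and gives loads of 60 and 50. Rerunning it as particles move is what makes the balancing dynamic.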
Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy.

Science.gov (United States)

Penas, David R; González, Patricia; Egea, Jose A; Doallo, Ramón; Banga, Julio R

2017-01-21

The development of large-scale kinetic models is one of the current key issues in computational systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic models. Global optimization methods can be used to solve this type of problem, but the associated computational cost is very large. Moreover, many of these methods require tuning a number of adjustable search parameters, which calls for a number of initial exploratory runs and therefore further increases computation times. Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse- and fine-grained parallelism, and (iii) self-tuning strategies. The performance and robustness of saCeSS are illustrated by solving a set of challenging parameter estimation problems, including medium- and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The results consistently show that saCeSS is a robust and efficient method, allowing very significant reductions in computation time with respect to several previous state-of-the-art methods (from days to minutes, in several cases) even when only a small number of processors is used. The new parallel cooperative method presented here allows the solution of medium- and large-scale parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the method includes self-tuning mechanisms which facilitate its use by non-experts. We believe that this new method can play a key role in the development of large-scale and even whole-cell dynamic models.
Decomposition and parallelization strategies for solving large-scale MDO problems

Energy Technology Data Exchange (ETDEWEB)

Grauer, M.; Eschenauer, H.A. [Research Center for Multidisciplinary Analyses and Applied Structural Optimization, FOMAAS, Univ. of Siegen (Germany)]

2007-07-01

In previous years, structural optimization has been recognized as a useful tool within the disciplines of engineering and economics. However, the optimization of large-scale systems or structures is impeded by an immense solution effort. This was the reason for starting a joint research and development (R and D) project between the Institute of Mechanics and Control Engineering and the Information and Decision Sciences Institute within the Research Center for Multidisciplinary Analyses and Applied Structural Optimization (FOMAAS) on cluster computing for the parallel and distributed solution of multidisciplinary optimization (MDO) problems based on the OpTiX-Workbench. Here the focus is on coarse-grained parallelization and its implementation on clusters of workstations. A further point of emphasis was the development of a parallel decomposition strategy called PARDEC for the solution of very complex optimization problems that cannot be solved efficiently by sequential integrated optimization. The use of the OpTiX-Workbench together with the FEM groundwater simulation system FEFLOW is shown for a special water management problem. (orig.)
Large-scale parallel genome assembler over cloud computing environment.

Science.gov (United States)

Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong

2017-06-01

The size of high-throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications have started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay-as-you-go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model, and understanding the hardware environment these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by co-locating computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over a traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure instead of a traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of a traditional HPC cluster.
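The core data structure of a de Bruijn graph assembler can be sketched as below: (k-1)-mers are vertices, each k-mer in a read adds an edge, and vertices are spread over workers by a hash, similar in spirit to vertex partitioning in Pregel-style systems such as Giraph. This is not GiGA's implementation; the CRC32-based partitioner, the function names, and the simple unitig walk (which ignores sequencing errors and reverse complements) are assumptions for illustration.

```python
import zlib
from collections import defaultdict


def kmers(read, k):
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_partitioned_dbg(reads, k, n_workers):
    # Vertices are (k-1)-mers; each k-mer contributes an edge prefix -> suffix.
    # A deterministic hash assigns each vertex, with its out-edges, to one worker.
    parts = [defaultdict(set) for _ in range(n_workers)]
    for read in reads:
        for km in kmers(read, k):
            u, v = km[:-1], km[1:]
            parts[zlib.crc32(u.encode()) % n_workers][u].add(v)
    return parts


def walk_unitig(parts):
    # Merge the partitions (their vertex sets are disjoint) and follow the path
    # from a vertex with no incoming edges while each step is unambiguous.
    graph = {}
    for p in parts:
        graph.update(p)
    indeg = defaultdict(int)
    for vs in graph.values():
        for v in vs:
            indeg[v] += 1
    start = next((u for u in graph if indeg[u] == 0), None)
    if start is None:
        return ""
    seq, u, seen = start, start, {start}
    while len(graph.get(u, ())) == 1:
        (v,) = graph[u]
        if v in seen or indeg[v] != 1:
            break  # stop at cycles and at vertices where paths merge
        seq += v[-1]
        seen.add(v)
        u = v
    return seq
```

For the reads "ATGGCG", "GGCGTG", and "CGTGCA" with k = 4, the graph is a single chain and the walk spells back "ATGGCGTGCA". In a distributed setting the walk would proceed by message passing between the workers owning consecutive vertices rather than by merging the partitions.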
Parallel supercomputing: Advanced methods, algorithms, and software for large-scale linear and nonlinear problems

Energy Technology Data Exchange (ETDEWEB)

Carey, G.F.; Young, D.M.

1993-12-31

The program outlined here is directed to research on methods, algorithms, and software for distributed parallel supercomputers. Of particular interest are finite element methods and finite difference methods together with sparse iterative solution schemes for scientific and engineering computations of very large-scale systems. Both linear and nonlinear problems will be investigated. In the nonlinear case, applications with bifurcation to multiple solutions will be considered using continuation strategies. The parallelizable numerical methods of particular interest are a family of partitioning schemes embracing domain decomposition, element-by-element strategies, and multi-level techniques. The methods will be further developed incorporating parallel iterative solution algorithms with associated preconditioners in parallel computer software. The schemes will be implemented on distributed memory parallel architectures such as the CRAY MPP, Intel Paragon, the NCUBE3, and the Connection Machine. We will also consider other new architectures such as the Kendall-Square (KSQ) and proposed machines such as the TERA. The applications will focus on large-scale three-dimensional nonlinear flow and reservoir problems with strong convective transport contributions. These are legitimate grand challenge class computational fluid dynamics (CFD) problems of significant practical interest to DOE. The methods developed and algorithms will, however, be of wider interest.
Implementation of highly parallel and large scale GW calculations within the OpenAtom software

Science.gov (United States)

Ismail-Beigi, Sohrab

The need to describe electronic excitations with better accuracy than provided by band structures produced by Density Functional Theory (DFT) has been a long-term enterprise for the computational condensed matter and materials theory communities. In some cases, appropriate theoretical frameworks have existed for some time but have been difficult to apply widely due to computational cost. For example, the GW approximation incorporates a great deal of important non-local and dynamical electronic interaction effects but has been too computationally expensive for routine use in large materials simulations. OpenAtom is an open source massively parallel ab initio density functional software package based on plane waves and pseudopotentials (http://charm.cs.uiuc.edu/OpenAtom/) that takes advantage of the Charm++ parallel framework. At present, it is developed via a three-way collaboration, funded by an NSF SI2-SSI grant (ACI-1339804), between Yale (Ismail-Beigi), IBM T. J. Watson (Glenn Martyna) and the University of Illinois at Urbana-Champaign (Laxmikant Kale). We will describe the project and our current approach towards implementing large-scale GW calculations with OpenAtom. Potential applications of large-scale parallel GW software for problems involving electronic excitations in semiconductor and/or metal oxide systems will also be pointed out.

Parallel Motion Simulation of Large-Scale Real-Time Crowd in a Hierarchical Environmental Model

Directory of Open Access Journals (Sweden)

Xin Wang

2012-01-01

This paper presents a parallel real-time crowd simulation method based on a hierarchical environmental model. A dynamic model of the complex environment must be constructed to simulate the state transitions and propagation of individual motions. By modeling the virtual environment where virtual crowds reside, we employ different parallel methods on a topological layer, a path layer, and a perceptual layer. We propose a parallel motion path matching method based on the path layer and a parallel crowd simulation method based on the perceptual layer. These methods make large-scale real-time crowd simulation possible. Numerical experiments are carried out to demonstrate the methods and results.
Very Large-Scale Neighborhoods with Performance Guarantees for Minimizing Makespan on Parallel Machines

NARCIS (Netherlands)

Brueggemann, T.; Hurink, Johann L.; Vredeveld, T.; Woeginger, Gerhard

2006-01-01

We study the problem of minimizing the makespan on m parallel machines. We introduce a very large-scale neighborhood of exponential size (in the number of machines) that is based on a matching in a complete graph. The idea is to partition the jobs assigned to the same machine into two sets. This
DGDFT: A massively parallel method for large scale density functional theory calculations.

Science.gov (United States)

Hu, Wei; Lin, Lin; Yang, Chao

2015-09-28

We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3,500 to 14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
DGDFT: A massively parallel method for large scale density functional theory calculations

International Nuclear Information System (INIS)

Hu, Wei; Yang, Chao; Lin, Lin

2015-01-01

We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high-performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3,500 to 14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
DGDFT: A massively parallel method for large scale density functional theory calculations

Energy Technology Data Exchange (ETDEWEB)

Hu, Wei, E-mail: whu@lbl.gov; Yang, Chao, E-mail: cyang@lbl.gov [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Lin, Lin, E-mail: linlin@math.berkeley.edu [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Department of Mathematics, University of California, Berkeley, California 94720 (United States)

2015-09-28

We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high-performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3,500 to 14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
Efficient graph-based dynamic load-balancing for parallel large-scale agent-based traffic simulation

NARCIS (Netherlands)

Xu, Y.; Cai, W.; Aydt, H.; Lees, M.; Tolk, A.; Diallo, S.Y.; Ryzhov, I.O.; Yilmaz, L.; Buckley, S.; Miller, J.A.

2014-01-01

One of the issues in parallelizing large-scale agent-based traffic simulations is partitioning and load-balancing. Traffic simulations are dynamic applications where the distribution of workload in the spatial domain constantly changes. Dynamic load-balancing at run-time has shown better efficiency
Parallel Tensor Compression for Large-Scale Scientific Data.

Energy Technology Data Exchange (ETDEWEB)

Kolda, Tamara G. [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Ballard, Grey [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Austin, Woody Nathan [Univ. of Texas, Austin, TX (United States)

2015-10-01

As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
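The 8 TB figure is consistent with 8-byte double-precision values: 512³ × 64 × 128 × 8 bytes = 2⁴³ bytes, which is 8 TiB. The byte size per value is an assumption; the abstract does not state it.

Below is a minimal serial sketch of a Tucker decomposition via truncated HOSVD, using NumPy. For each mode, it keeps the leading left singular vectors of the mode-n unfolding, then projects the tensor onto them to get the core. It is not the authors' distributed-memory algorithm and reproduces none of their data layout. The function names are illustrative.

```python
import numpy as np


def unfold(X, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by all other modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def mode_product(X, M, mode):
    """Multiply tensor X by matrix M along `mode`; M has shape (new_dim, X.shape[mode])."""
    Y = np.tensordot(M, X, axes=(1, mode))  # contracted axis replaced by M's rows, placed first
    return np.moveaxis(Y, 0, mode)


def hosvd(X, ranks):
    """Truncated HOSVD: per-mode factor matrices, then the projected core tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = X
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Expand the core back to full size with the factor matrices."""
    X = core
    for mode, U in enumerate(factors):
        X = mode_product(X, U, mode)
    return X


def compression_ratio(X, core, factors):
    """Entries in the full tensor divided by entries stored in the Tucker format."""
    return X.size / (core.size + sum(U.size for U in factors))
```

When the requested ranks match the tensor's true multilinear rank, truncated HOSVD reconstructs the tensor exactly up to round-off. For real simulation data the ranks trade accuracy for compression.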
Multilevel parallel strategy on Monte Carlo particle transport for the large-scale full-core pin-by-pin simulations

International Nuclear Information System (INIS)

Zhang, B.; Li, G.; Wang, W.; Shangguan, D.; Deng, L.

2015-01-01

This paper introduces the multilevel hybrid parallelism strategy of the JCOGIN infrastructure for Monte Carlo particle transport in large-scale full-core pin-by-pin simulations. Particle parallelism, domain decomposition parallelism, and MPI/OpenMP parallelism are designed and implemented. In testing, JMCT demonstrates the parallel scalability of JCOGIN, reaching 80% parallel efficiency on 120,000 cores for the pin-by-pin computation of the BEAVRS benchmark. (author)
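Particle parallelism, the first of the levels listed above, distributes independent particle histories across processes. Each process draws from its own random stream, and the per-process tallies are summed.

The sketch below illustrates the idea with a deliberately simple hypothetical problem: uncollided transmission through a purely absorbing slab, whose exact answer is exp(−Σₜ·t). It is not JMCT or JCOGIN code. The per-rank loop runs serially here; with MPI each call would run on its own rank and the sum would be a reduction. The seed value is arbitrary.

```python
import numpy as np


def rank_tally(seed_seq, n_histories, sigma_t, thickness):
    """Histories simulated by one (hypothetical) rank using its own random stream."""
    rng = np.random.default_rng(seed_seq)
    # Distance to first collision is exponentially distributed with mean 1/sigma_t.
    flight = rng.exponential(1.0 / sigma_t, size=n_histories)
    return int(np.count_nonzero(flight > thickness))  # particles that cross the slab


def transmission(n_total, n_ranks, sigma_t, thickness, seed=2015):
    # Independent, non-overlapping streams for each rank.
    streams = np.random.SeedSequence(seed).spawn(n_ranks)
    base, extra = divmod(n_total, n_ranks)
    per_rank = [base + (1 if r < extra else 0) for r in range(n_ranks)]
    tallies = [rank_tally(s, n, sigma_t, thickness) for s, n in zip(streams, per_rank)]
    return sum(tallies) / n_total  # the "reduce" step


estimate = transmission(200_000, 4, sigma_t=1.0, thickness=1.0)
print(estimate, np.exp(-1.0))
```

Because histories are independent, this level scales almost perfectly. Domain decomposition becomes necessary when the geometry and tallies no longer fit in one process's memory.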
A review of parallel computing for large-scale remote sensing image mosaicking

OpenAIRE

Chen, Lajiao; Ma, Yan; Liu, Peng; Wei, Jingbo; Jie, Wei; He, Jijun

2015-01-01

Interest in image mosaicking has been spurred by a wide variety of research and management needs. However, for large-scale applications, remote sensing image mosaicking usually requires significant computational capabilities. Several studies have attempted to apply parallel computing to improve image mosaicking algorithms and to speed up the calculation process. The state of the art in this field has not yet been summarized, although such a summary is essential for a better understanding and for further ...
On Modeling Large-Scale Multi-Agent Systems with Parallel, Sequential and Genuinely Asynchronous Cellular Automata

International Nuclear Information System (INIS)

Tosic, P.T.

2011-01-01

We study certain types of Cellular Automata (CA) viewed as an abstraction of large-scale Multi-Agent Systems (MAS). We argue that the classical CA model needs to be modified in several important respects in order to become a relevant and sufficiently general model for large-scale MAS, so that the generalized model can capture many important MAS properties at the level of agent ensembles and their long-term collective behavior patterns. We specifically focus on the issue of inter-agent communication in CA, and propose sequential cellular automata (SCA) as the first step, and genuinely Asynchronous Cellular Automata (ACA) as the ultimate deterministic CA-based abstract models for large-scale MAS made of simple reactive agents. We first formulate deterministic and nondeterministic versions of sequential CA, and then summarize some interesting configuration space properties (i.e., possible behaviors) of a restricted class of sequential CA. In particular, we compare and contrast those properties of sequential CA with the corresponding properties of the classical (that is, parallel and perfectly synchronous) CA with the same restricted class of update rules. We analytically demonstrate that the studied sequential CA models fail to simulate all possible behaviors of perfectly synchronous parallel CA, even for a very restricted class of non-linear totalistic node update rules. The lesson learned is that the interleaving semantics of concurrency, when applied to sequential CA, is not refined enough to adequately capture the perfect synchrony of parallel CA updates. Last but not least, we outline what would be an appropriate CA-like abstraction for large-scale distributed computing insofar as the inter-agent communication model is concerned, and in that context we propose genuinely asynchronous CA. (author)
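The gap between synchronous and sequential updates can be seen on a tiny example. The rule is a 3-cell majority rule on a ring, a non-linear totalistic rule chosen here for illustration; the paper's exact rule class is not reproduced.

- Under parallel update, the alternating configuration 010101 flips to 101010 and back, a cycle of period 2.
- One left-to-right sequential sweep drives the same start to the all-ones fixed point, where it stays.

This shows only one sweep order on one configuration. It is not a proof of the paper's general result.

```python
def majority(left: int, centre: int, right: int) -> int:
    """Non-linear totalistic rule: 1 if at least two of the three cells are 1."""
    return 1 if left + centre + right >= 2 else 0


def parallel_step(x):
    """Classical CA: every cell reads the old configuration simultaneously (ring)."""
    n = len(x)
    return [majority(x[i - 1], x[i], x[(i + 1) % n]) for i in range(n)]


def sequential_sweep(x, order=None):
    """Sequential CA: cells update one at a time and see earlier updates in the sweep."""
    x = list(x)
    n = len(x)
    for i in (order if order is not None else range(n)):
        x[i] = majority(x[i - 1], x[i], x[(i + 1) % n])
    return x


start = [0, 1, 0, 1, 0, 1]
print(parallel_step(start))     # [1, 0, 1, 0, 1, 0]
print(sequential_sweep(start))  # [1, 1, 1, 1, 1, 1]
```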
Visual Data-Analytics of Large-Scale Parallel Discrete-Event Simulations

Energy Technology Data Exchange (ETDEWEB)

Ross, Caitlin; Carothers, Christopher D.; Mubarak, Misbah; Carns, Philip; Ross, Robert; Li, Jianping Kelvin; Ma, Kwan-Liu

2016-11-13

Parallel discrete-event simulation (PDES) is an important tool in the codesign of extreme-scale systems because PDES provides a cost-effective way to evaluate designs of high-performance computing systems. Optimistic synchronization algorithms for PDES, such as Time Warp, allow events to be processed without global synchronization among the processing elements. A rollback mechanism is provided when events are processed out of timestamp order. Although optimistic synchronization protocols enable the scalability of large-scale PDES, the performance of the simulations must be tuned to reduce the number of rollbacks and improve simulation runtime. To enable efficient large-scale optimistic simulations, one has to gain insight into the factors that affect the rollback behavior and simulation performance. We developed a tool for ROSS model developers that gives them detailed metrics on the performance of their large-scale optimistic simulations at varying levels of simulation granularity. Model developers can use this information for parameter tuning of optimistic simulations in order to achieve better runtime and fewer rollbacks. In this work, we instrument the ROSS optimistic PDES framework to gather detailed statistics about the simulation engine. We have also developed an interactive visualization interface that uses the data collected by the ROSS instrumentation to understand the underlying behavior of the simulation engine. The interface connects real time to virtual time in the simulation and provides the ability to view simulation data at different granularities. We demonstrate the usefulness of our framework by performing a visual analysis of the dragonfly network topology model provided by the CODES simulation framework built on top of ROSS. The instrumentation needs to minimize overhead in order to accurately collect data about the simulation performance. To ensure that the instrumentation does not introduce unnecessary overhead, we perform a
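The rollback idea behind Time Warp can be sketched for a single logical process (LP) that saves its state before each event. When a "straggler" arrives with a timestamp earlier than the LP's local virtual time (LVT), the LP:

1. restores the snapshot taken before the first later event;
2. re-executes the straggler and the undone events in timestamp order.

The state-update rule is hypothetical and deliberately order-sensitive so that the rollback matters. Anti-messages, global virtual time and fossil collection are omitted. ROSS itself is generally described as using reverse computation rather than state saving, so this is not ROSS's mechanism.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class LogicalProcess:
    """One optimistic logical process with state saving and rollback."""
    state: int = 0                                 # hypothetical model state
    lvt: float = 0.0                               # local virtual time
    processed: list = field(default_factory=list)  # (timestamp, payload), in timestamp order
    snapshots: list = field(default_factory=list)  # state saved before each processed event
    rollbacks: int = 0

    def _execute(self, ts, payload):
        self.snapshots.append(self.state)
        self.state = 2 * self.state + payload      # hypothetical order-sensitive update
        self.processed.append((ts, payload))
        self.lvt = ts

    def receive(self, ts, payload):
        if ts >= self.lvt:                         # in order: process optimistically
            self._execute(ts, payload)
            return
        # Straggler: undo every processed event with a later timestamp.
        self.rollbacks += 1
        times = [t for t, _ in self.processed]
        k = bisect.bisect_right(times, ts)
        undone = self.processed[k:]
        self.state = self.snapshots[k]             # state before the first undone event
        del self.processed[k:], self.snapshots[k:]
        self.lvt = self.processed[-1][0] if self.processed else 0.0
        # Re-execute the straggler together with the undone events, in timestamp order.
        for t, p in sorted(undone + [(ts, payload)]):
            self._execute(t, p)


lp = LogicalProcess()
for ts, payload in [(1.0, 10), (3.0, 5), (2.0, 1)]:  # the last event is a straggler
    lp.receive(ts, payload)
print(lp.state, lp.rollbacks)  # 47 1
```

Counting rollbacks per LP, as `rollbacks` does here, is the kind of metric that tuning tools collect.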
Large-Scale, Parallel, Multi-Sensor Atmospheric Data Fusion Using Cloud Computing

Science.gov (United States)

Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E. J.

2013-12-01

NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the 'A-Train' platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses of important climate variables presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS), and to a model (MERRA), stratify the comparisons using a classification of the 'cloud scenes' from CloudSat, and repeat the entire analysis over 10 years of data. To efficiently assemble such datasets, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. However, these are data-intensive computing problems, so data transfer times and storage costs (for caching) are key issues. SciReduce is a Hadoop-like parallel analysis system, programmed in parallel Python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Figure 1 shows the architecture of the full computational system, with SciReduce at the core. Multi-year datasets are automatically 'sharded' by time and space across a cluster of nodes so that years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the cached input and intermediate datasets. We are using SciReduce to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a NASA MEASURES grant. We will
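The map/reduce pattern for a climatology can be sketched in a few lines:

- Map: each chunk of records (standing in for one file) produces partial `(sum, count)` pairs per shard.
- Reduce: an associative merge combines the partials, and a final step turns them into means.

Everything here is a hypothetical sketch, not SciReduce's API. The shard key (year plus a 10° latitude/longitude tile), the record layout `(year, lat, lon, value)` and the function names are all illustrative. A local thread pool stands in for the cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def shard_key(year, lat, lon, band_deg=10.0):
    """Hypothetical shard: by year and by lat/lon tile of band_deg degrees."""
    return (year, int(lat // band_deg), int(lon // band_deg))


def map_chunk(records):
    """Map: partial [sum, count] per shard for one chunk (e.g. one file)."""
    partial = defaultdict(lambda: [0.0, 0])
    for year, lat, lon, value in records:
        acc = partial[shard_key(year, lat, lon)]
        acc[0] += value
        acc[1] += 1
    return dict(partial)


def merge(a, b):
    """Reduce: combine two partial results; associative, so merge order does not matter."""
    out = {k: list(v) for k, v in a.items()}
    for k, (s, c) in b.items():
        acc = out.setdefault(k, [0.0, 0])
        acc[0] += s
        acc[1] += c
    return out


def climatology(chunks, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))  # map step, in parallel
    totals = reduce(merge, partials, {})                # reduce step
    return {k: s / c for k, (s, c) in totals.items()}   # finalize: mean per shard
```

Carrying `(sum, count)` rather than means through the reduce keeps the merge exact regardless of how the records were split across chunks or nodes.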
Parallel Computational Fluid Dynamics 2007 : Implementations and Experiences on Large Scale and Grid Computing

CERN Document Server

2009-01-01

At the 19th Annual Conference on Parallel Computational Fluid Dynamics, held in Antalya, Turkey, in May 2007, the most recent developments and implementations of large-scale and grid computing were presented. This book, consisting of the invited and selected papers of this conference, details those advances, which are of particular interest to CFD and CFD-related communities. It also offers the results related to applications of various scientific and engineering problems involving flows and flow-related topics. Intended for CFD researchers and graduate students, this book is a state-of-the-art presentation of the relevant methodology and implementation techniques of large-scale computing.
Parallel Optimization of Polynomials for Large-scale Problems in Stability and Control

Science.gov (United States)

Kamyar, Reza

In this thesis, we focus on some of the NP-hard problems in control theory. Thanks to converse Lyapunov theory, these problems can often be modeled as optimization over polynomials. To avoid intractability, we establish a trade-off between accuracy and complexity. In particular, we develop a sequence of tractable optimization problems, in the form of Linear Programs (LPs) and/or Semi-Definite Programs (SDPs), whose solutions converge to the exact solution of the NP-hard problem. However, the computational and memory complexity of these LPs and SDPs grows exponentially as the sequence progresses, meaning that improving the accuracy of the solutions requires solving SDPs with tens of thousands of decision variables and constraints. Setting up and solving such problems is a significant challenge. Existing optimization algorithms and software are designed only for desktop computers or small cluster computers, machines that do not have sufficient memory for solving such large SDPs. Moreover, the speed-up of these algorithms does not scale beyond dozens of processors. This is why we seek parallel algorithms for setting up and solving large SDPs on large clusters and/or supercomputers. We propose parallel algorithms for stability analysis of two classes of systems: 1) linear systems with a large number of uncertain parameters; 2) nonlinear systems defined by polynomial vector fields. First, we develop a distributed parallel algorithm which applies Polya's and/or Handelman's theorems to some variants of parameter-dependent Lyapunov inequalities with parameters defined over the standard simplex. The result is a sequence of SDPs which possess a block-diagonal structure. We then develop a parallel SDP solver which exploits this structure in order to map the computation, memory and communication to a distributed parallel environment. Numerical tests on a supercomputer demonstrate the ability of the algorithm to
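Pólya's theorem states the following. If a homogeneous polynomial F is strictly positive on the standard simplex, then for some degree d the product (x₁ + … + xₙ)ᵈ·F has only strictly positive coefficients. Finding such a d therefore certifies positivity.

The scalar sketch below, in plain Python, shows only this degree-lifting step. It is not the thesis's parallel SDP algorithm, and the function names are illustrative. It also hints at the growth problem described above: the number of coefficients to check rises combinatorially with d.

```python
from collections import defaultdict
from math import comb


def poly_mul(p, q):
    """Multiply polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(int)
    for ea, ca in p.items():
        for eb, cb in q.items():
            out[tuple(a + b for a, b in zip(ea, eb))] += ca * cb
    return dict(out)


def polya_degree(F, n_vars, max_d=50):
    """Smallest d such that (x1+...+xn)^d * F has only strictly positive coefficients.

    F must be homogeneous. Returns None if no such d <= max_d exists, which is
    always the case when F is not positive on the simplex.
    """
    simplex_sum = {tuple(int(i == j) for i in range(n_vars)): 1 for j in range(n_vars)}
    degree = sum(next(iter(F)))
    G = dict(F)
    for d in range(max_d + 1):
        # Every monomial of the current total degree must appear with a positive coefficient.
        n_monomials = comb(degree + d + n_vars - 1, n_vars - 1)
        if len(G) == n_monomials and all(c > 0 for c in G.values()):
            return d
        G = poly_mul(G, simplex_sum)
    return None


F = {(2, 0): 1, (1, 1): -1, (0, 2): 1}  # x^2 - x*y + y^2, positive on the simplex
print(polya_degree(F, 2))               # 3
```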
SQDFT: Spectral Quadrature method for large-scale parallel O(N) Kohn-Sham calculations at high temperature

Science.gov (United States)

Suryanarayana, Phanish; Pratapa, Phanisri P.; Sharma, Abhiraj; Pask, John E.

2018-03-01

We present SQDFT: a large-scale parallel implementation of the Spectral Quadrature (SQ) method for O(N) Kohn-Sham Density Functional Theory (DFT) calculations at high temperature. Specifically, we develop an efficient and scalable finite-difference implementation of the infinite-cell Clenshaw-Curtis SQ approach, in which results for the infinite crystal are obtained by expressing quantities of interest as bilinear forms or sums of bilinear forms, that are then approximated by spatially localized Clenshaw-Curtis quadrature rules. We demonstrate the accuracy of SQDFT by showing systematic convergence of energies and atomic forces with respect to SQ parameters to reference diagonalization results, and convergence with discretization to established planewave results, for both metallic and insulating systems. We further demonstrate that SQDFT achieves excellent strong and weak parallel scaling on computer systems consisting of tens of thousands of processors, with near perfect O(N) scaling with system size and wall times as low as a few seconds per self-consistent field iteration. Finally, we verify the accuracy of SQDFT in large-scale quantum molecular dynamics simulations of aluminum at high temperature.
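The core idea of Clenshaw-Curtis-type spectral quadrature is to evaluate a bilinear form such as the diagonal density-matrix element eᵢᵀ f(H) eᵢ, where f is the Fermi-Dirac function. This is done with a Chebyshev approximation of f and a three-term recurrence on H, so no diagonalization is needed.

The sketch below shows that mechanism on a hypothetical 1-D tight-binding chain and checks it against diagonalization. It omits spatial localization, the infinite-cell treatment and everything else specific to SQDFT. The Hamiltonian, μ, kT and polynomial degree are illustrative choices. High temperature helps, because a smoother f needs a lower-degree polynomial.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def fermi(e, mu, kT):
    """Fermi-Dirac occupation, written with tanh for numerical stability."""
    return 0.5 * (1.0 - np.tanh((e - mu) / (2.0 * kT)))


def spectral_bounds(H):
    """Gershgorin bounds on the eigenvalues of a symmetric matrix."""
    d = np.diag(H)
    r = np.sum(np.abs(H), axis=1) - np.abs(d)
    return np.min(d - r), np.max(d + r)


def density_diag_chebyshev(H, i, mu, kT, deg=120):
    """Approximate (f(H))_ii with a Chebyshev expansion of the Fermi-Dirac function."""
    emin, emax = spectral_bounds(H)
    c, a = 0.5 * (emax + emin), 0.5 * (emax - emin) * 1.01  # map spectrum into [-1, 1]
    coeffs = C.chebinterpolate(lambda x: fermi(a * x + c, mu, kT), deg)
    Ht = (H - c * np.eye(len(H))) / a
    v_prev = np.zeros(len(H))
    v_prev[i] = 1.0                      # T_0(Ht) e_i
    v_curr = Ht @ v_prev                 # T_1(Ht) e_i
    total = coeffs[0] * v_prev[i] + coeffs[1] * v_curr[i]
    for k in range(2, deg + 1):          # T_{k+1} = 2 Ht T_k - T_{k-1}
        v_prev, v_curr = v_curr, 2.0 * (Ht @ v_curr) - v_prev
        total += coeffs[k] * v_curr[i]
    return total
```

Each diagonal element needs only matrix-vector products. Once those products are restricted to a local region, this independence is what makes O(N) scaling and near-perfect parallelism over atoms possible.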
Large-scale parallel configuration interaction. II. Two- and four-component double-group general active space implementation with application to BiH

DEFF Research Database (Denmark)

Knecht, Stefan; Jensen, Hans Jørgen Aagaard; Fleig, Timo

2010-01-01

We present a parallel implementation of a large-scale relativistic double-group configuration interaction CI program. It is applicable with a large variety of two- and four-component Hamiltonians. The parallel algorithm is based on a distributed data model in combination with a static load balanci...
Parallel Dynamic Analysis of a Large-Scale Water Conveyance Tunnel under Seismic Excitation Using ALE Finite-Element Method

Directory of Open Access Journals (Sweden)

Xiaoqing Wang

2016-01-01

Parallel analyses of the dynamic responses of a large-scale water conveyance tunnel under seismic excitation are presented in this paper. A full three-dimensional numerical model considering the water-tunnel-soil coupling is established and adopted to investigate the tunnel's dynamic responses. The movement and sloshing of the internal water are simulated using the multi-material Arbitrary Lagrangian Eulerian (ALE) method. Nonlinear fluid–structure interaction (FSI) between tunnel and inner water is treated by using the penalty method. Nonlinear soil-structure interaction (SSI) between soil and tunnel is dealt with by using the surface to surface contact algorithm. To overcome computing power limitations and to deal with such a large-scale calculation, a parallel algorithm based on the modified recursive coordinate bisection (MRCB) considering the balance of SSI and FSI loads is proposed and used. The whole simulation is accomplished on Dawning 5000 A using the proposed MRCB based parallel algorithm optimized to run on supercomputers. The simulation model and the proposed approaches are validated by comparison with the added mass method. Dynamic responses of the tunnel are analyzed and the parallelism is discussed. In addition, factors affecting the dynamic responses are investigated. Good speedup and parallel efficiency demonstrate the scalability of the parallel method, and the analysis results can be used to aid in the design of water conveyance tunnels.
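Recursive coordinate bisection splits a set of elements into 2ᵏ parts as follows:

1. Sort the elements along the longest axis of their bounding box.
2. Cut where the cumulative weight reaches half the total.
3. Recurse on each half.

Per-element weights let the partition balance uneven work, such as elements that carry extra FSI or SSI contact computation. The sketch below is plain weighted RCB in NumPy. The paper's specific modification (MRCB) is not described in the abstract and is not reproduced, and how contact loads would map to weights is an assumption.

```python
import numpy as np


def weighted_rcb(points, weights, n_levels):
    """Assign each point to one of 2**n_levels parts of roughly equal total weight."""
    parts = np.zeros(len(points), dtype=int)

    def split(idx, level, part_id):
        if len(idx) == 0:
            return
        if level == 0:
            parts[idx] = part_id
            return
        pts = points[idx]
        axis = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))  # longest extent
        order = idx[np.argsort(pts[:, axis], kind="stable")]
        cum = np.cumsum(weights[order])
        cut = int(np.searchsorted(cum, cum[-1] / 2.0)) + 1  # first index reaching half weight
        split(order[:cut], level - 1, 2 * part_id)
        split(order[cut:], level - 1, 2 * part_id + 1)

    split(np.arange(len(points)), n_levels, 0)
    return parts


rng = np.random.default_rng(1)
pts = rng.random((64, 3))
print(np.bincount(weighted_rcb(pts, np.ones(64), n_levels=2), minlength=4))  # [16 16 16 16]
```

Giving contact elements larger weights shifts the cuts so that each part receives a similar amount of work rather than a similar number of elements.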
An efficient implementation of 3D high-resolution imaging for large-scale seismic data with GPU/CPU heterogeneous parallel computing

Science.gov (United States)

Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng

2018-02-01

De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, computational efficiency remains the main problem in producing 3D, high-resolution images from real large-scale seismic data. In this paper, we propose a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we design an imaging point parallel strategy to achieve optimal parallel computing performance. Afterward, we adopt an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
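Double buffering overlaps loading the next chunk of data with computing on the current one. In the paper this overlap happens between CUDA streams, with host-device transfers running alongside kernels.

The sketch below shows only the scheduling pattern, with Python threads in place of CUDA streams. A background loader fills a one-slot queue while the main thread computes. `load` and `compute` are hypothetical callables supplied by the caller. Real overlap in CPython requires them to release the GIL (for example, I/O or NumPy work). Error handling is minimal: loader errors are forwarded to the caller, while errors in `compute` leave the daemon loader thread blocked.

```python
import queue
import threading


def double_buffered(chunks, load, compute):
    """Overlap load(next chunk) with compute(current chunk); returns the compute results."""
    ready = queue.Queue(maxsize=1)  # one loaded chunk waits while another is being computed

    def loader():
        try:
            for chunk in chunks:
                ready.put((True, load(chunk)))  # blocks while the slot is occupied
            ready.put((False, None))            # end-of-stream marker
        except Exception as exc:                # forward loader errors to the consumer
            ready.put((False, exc))

    threading.Thread(target=loader, daemon=True).start()
    results = []
    while True:
        ok, item = ready.get()
        if not ok:
            if item is not None:
                raise item
            return results
        results.append(compute(item))


print(double_buffered(range(5), lambda c: [c] * 3, sum))  # [0, 3, 6, 9, 12]
```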
Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

KAUST Repository

Hasanov, Khalid; Quintin, Jean-Noël; Lastovetsky, Alexey

2014-01-01

-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel
Large-scale modeling of epileptic seizures: scaling properties of two parallel neuronal network simulation algorithms.

Science.gov (United States)

Pesce, Lorenzo L; Lee, Hyong C; Hereld, Mark; Visser, Sid; Stevens, Rick L; Wildeman, Albert; van Drongelen, Wim

2013-01-01

Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.

Large-Scale Modeling of Epileptic Seizures: Scaling Properties of Two Parallel Neuronal Network Simulation Algorithms

Directory of Open Access Journals (Sweden)

Lorenzo L. Pesce

2013-01-01

Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.
Parallelization of a beam dynamics code and first large scale radio frequency quadrupole simulations

Directory of Open Access Journals (Sweden)

J. Xu

2007-01-01

The design and operation support of hadron (proton and heavy-ion) linear accelerators require substantial use of beam dynamics simulation tools. The beam dynamics code TRACK was originally developed at Argonne National Laboratory (ANL) to fulfill the special requirements of the rare isotope accelerator (RIA) accelerator systems. From the beginning, the code has been developed to make it useful in the three stages of a linear accelerator project, namely, the design, commissioning, and operation of the machine. To realize this concept, the code has unique features such as end-to-end simulations from the ion source to the final beam destination and automatic procedures for tuning of a multiple charge state heavy-ion beam. The TRACK code has become a general beam dynamics code for hadron linacs and has found wide applications worldwide. Until recently, the code had remained serial except for a simple parallelization used for the simulation of multiple seeds to study the machine errors. To speed up computation, the TRACK Poisson solver has been parallelized. This paper discusses different parallel models for solving the Poisson equation with the primary goal to extend the scalability of the code onto 1024 and more processors of the new generation of supercomputers known as BlueGene (BG/L). Domain decomposition techniques have been adapted and incorporated into the parallel version of the TRACK code. To demonstrate the new capabilities of the parallelized TRACK code, the dynamics of a 45 mA proton beam represented by 10⁸ particles has been simulated through the 325 MHz radio frequency quadrupole and initial accelerator section of the proposed FNAL proton driver. The results show the benefits and advantages of large-scale parallel computing in beam dynamics simulations.
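Domain decomposition for a Poisson solver can be illustrated in one dimension. Each subdomain stores its own cells plus one ghost cell on each side. Before every sweep, neighbouring subdomains exchange edge values (the "halo exchange" that MPI would perform), and then each subdomain updates independently.

The sketch uses Jacobi iteration on −u″ = f with u = 0 at both ends, chosen for simplicity rather than speed. TRACK's actual parallel Poisson solvers are not described in the abstract and are not reproduced here. The decomposed run produces the same iterates as the single-domain run, which is the property a correct halo exchange must preserve.

```python
import numpy as np


def jacobi_global(f, h, iters):
    """Single-domain Jacobi for -u'' = f with u = 0 at both boundaries."""
    u = np.zeros(len(f) + 2)  # includes the two boundary values
    for _ in range(iters):
        u[1:-1] = 0.5 * (u[:-2] + u[2:] + h * h * f)
    return u[1:-1]


def jacobi_decomposed(f, h, iters, n_ranks):
    """Same iteration split over n_ranks subdomains, each with one ghost cell per side."""
    chunks = np.array_split(f, n_ranks)
    local = [np.zeros(len(c) + 2) for c in chunks]
    for _ in range(iters):
        # Halo exchange: copy neighbours' edge values (from the previous sweep) into ghost cells.
        for r in range(n_ranks):
            local[r][0] = local[r - 1][-2] if r > 0 else 0.0
            local[r][-1] = local[r + 1][1] if r < n_ranks - 1 else 0.0
        # Independent local updates, as each rank would do concurrently.
        for r, c in enumerate(chunks):
            local[r][1:-1] = 0.5 * (local[r][:-2] + local[r][2:] + h * h * c)
    return np.concatenate([u[1:-1] for u in local])
```

Communication per sweep is two values per subdomain boundary, independent of subdomain size. This surface-to-volume ratio is why domain decomposition scales to many processors.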
Distributed parallel cooperative coevolutionary multi-objective large-scale immune algorithm for deployment of wireless sensor networks

DEFF Research Database (Denmark)

Cao, Bin; Zhao, Jianwei; Yang, Po

2018-01-01

-objective evolutionary algorithms (the Cooperative Coevolutionary Generalized Differential Evolution 3, the Cooperative Multi-objective Differential Evolution, and the Nondominated Sorting Genetic Algorithm III), the proposed algorithm addresses the deployment optimization problem efficiently and effectively. ... Using immune algorithms is generally a time-intensive process, especially for problems with a large number of variables. In this paper, we propose a distributed parallel cooperative coevolutionary multi-objective large-scale immune algorithm that is implemented using the message passing interface (MPI). The proposed algorithm is composed of three layers: objective, group and individual layers. First, for each objective in the multi-objective problem to be addressed, a subpopulation is used for optimization, and an archive population is used to optimize all the objectives. Second, the large...
Robust large-scale parallel nonlinear solvers for simulations.

Energy Technology Data Exchange (ETDEWEB)

Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson (Sandia National Laboratories, Livermore, CA)

2005-11-01

This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and compute each step from a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any
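Broyden's "good" method keeps an approximation H to the inverse Jacobian. After each step s with residual change y, it applies the rank-one Sherman-Morrison update H ← H + (s − Hy)(sᵀH)/(sᵀHy), so the Jacobian is evaluated at most once.

The sketch below is a dense, small-scale version in NumPy. It seeds H from one finite-difference Jacobian. It is not the limited-memory variant developed in the report, which avoids storing H explicitly, and it has no globalization (line search), so it is only reliable near a root. The test system is a hypothetical example.

```python
import numpy as np


def fd_jacobian(F, x, eps=1e-7):
    """Forward-difference Jacobian, used only once to seed Broyden's method."""
    f0 = F(x)
    J = np.empty((len(f0), len(x)))
    for j in range(len(x)):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (F(xp) - f0) / eps
    return J


def broyden(F, x0, tol=1e-10, max_iter=100):
    """Good Broyden method with an inverse-Jacobian approximation H."""
    x = np.asarray(x0, dtype=float)
    fx = F(x)
    H = np.linalg.inv(fd_jacobian(F, x))
    for k in range(max_iter):
        if np.linalg.norm(fx) < tol:
            return x, k
        s = -H @ fx                                  # quasi-Newton step
        x_new = x + s
        f_new = F(x_new)
        y = f_new - fx
        Hy = H @ y
        H += np.outer(s - Hy, s @ H) / (s @ Hy)      # Sherman-Morrison rank-one update
        x, fx = x_new, f_new
    raise RuntimeError("Broyden iteration did not converge")


# Hypothetical test system: circle of radius 2 intersected with the line x0 = x1.
F = lambda v: np.array([v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]])
root, iterations = broyden(F, [1.0, 1.5])
print(root)  # approximately [1.41421356, 1.41421356]
```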
A novel two-level dynamic parallel data scheme for large 3-D SN calculations

International Nuclear Information System (INIS)

Sjoden, G.E.; Shedlock, D.; Haghighat, A.; Yi, C.

2005-01-01

We introduce a new dynamic parallel memory optimization scheme for executing large scale 3-D discrete ordinates (Sn) simulations on distributed memory parallel computers. In order for parallel transport codes to be truly scalable, they must use parallel data storage, where only the variables that are locally computed are locally stored. Even with parallel data storage for the angular variables, cumulative storage requirements for large discrete ordinates calculations can be prohibitive. To address this problem, Memory Tuning has been implemented into the PENTRAN 3-D parallel discrete ordinates code as an optimized, two-level ('large' array, 'small' array) parallel data storage scheme. Memory Tuning can be described as the process of parallel data memory optimization. Memory Tuning dynamically minimizes the amount of required parallel data in allocated memory on each processor using a statistical sampling algorithm. This algorithm is based on the integral average and standard deviation of the number of fine meshes contained in each coarse mesh in the global problem. Because PENTRAN only stores the locally computed problem phase space, optimal two-level memory assignments can be unique on each node, depending upon the parallel decomposition used (hybrid combinations of angular, energy, or spatial). As demonstrated in the two large discrete ordinates models presented (a storage cask and an OECD MOX Benchmark), Memory Tuning can save a substantial amount of memory per parallel processor, allowing one to accomplish very large scale Sn computations. (authors)
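The abstract says only that Memory Tuning picks two-level ("small"/"large") array sizes from the average and standard deviation of the number of fine meshes per coarse mesh. The exact rule is not given.

The sketch below is a hypothetical rule in that spirit, in plain Python:

- Size the small array at mean + k·std (capped at the maximum).
- Give the large array only to coarse meshes that exceed it.
- Compare the total with allocating every coarse mesh for the worst case.

The threshold rule, the parameter `k` and the example counts are assumptions, not PENTRAN's algorithm.

```python
import math
import statistics


def two_level_allocation(fine_counts, k=1.0):
    """Hypothetical two-level sizing rule.

    Returns (small_size, two_level_total, single_level_total), in units of fine-mesh cells.
    """
    mean = statistics.fmean(fine_counts)
    std = statistics.pstdev(fine_counts)
    large = max(fine_counts)
    small = min(math.ceil(mean + k * std), large)  # assumed threshold: mean + k * std
    n_large = sum(1 for c in fine_counts if c > small)
    n_small = len(fine_counts) - n_large
    two_level = n_small * small + n_large * large
    single_level = len(fine_counts) * large        # every coarse mesh sized for the worst case
    return small, two_level, single_level


# Seven coarse meshes with 8 fine cells each and one refined mesh with 64.
print(two_level_allocation([8] * 7 + [64]))  # (34, 302, 512)
```

Because each processor stores only its local phase space, each processor would apply such a rule to its own coarse meshes, which is why the optimal assignment can differ from node to node.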
Parallel simulation of tsunami inundation on a large-scale supercomputer

Science.gov (United States)

Oishi, Y.; Imamura, F.; Sugawara, D.

2013-12-01

An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, the bottleneck of this approach is the large computational cost of the non-linear inundation simulation; the computational power of recent massively parallel supercomputers can help make faster-than-real-time execution of a tsunami inundation simulation possible. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), so very fast parallel computers are expected to become increasingly prevalent in the near future. It is therefore important to investigate how to conduct tsunami simulations efficiently on parallel computers. In this study, we target very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of one CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on the TUNAMI-N2 model of Tohoku University, which uses a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only in the coastal regions. To balance the computational load across CPUs, CPUs are first allocated to each nested layer in proportion to the number of grid points in that layer. Using the CPUs allocated to each layer, 1-D domain decomposition is then performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
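The two-stage load-balancing rule described above can be sketched as follows. CPUs are first divided among nested layers in proportion to each layer's grid-point count, and then each layer's rows are split into contiguous 1-D strips. Several details are our own assumptions, not taken from the paper: the function names, the largest-remainder rounding, and the one-CPU-per-layer minimum.

```python
def allocate_cpus(grid_points, n_cpus):
    """Split n_cpus among nested layers in proportion to their grid-point counts
    (largest-remainder rounding, at least one CPU per layer)."""
    if n_cpus < len(grid_points):
        raise ValueError("need at least one CPU per layer")
    total = sum(grid_points)
    ideal = [n_cpus * g / total for g in grid_points]
    alloc = [max(1, int(x)) for x in ideal]
    # Hand leftover CPUs to the layers that are most under-served
    while sum(alloc) < n_cpus:
        i = max(range(len(alloc)), key=lambda j: ideal[j] - alloc[j])
        alloc[i] += 1
    # Take CPUs back if the one-per-layer minimum overshot the budget
    while sum(alloc) > n_cpus:
        i = min((j for j in range(len(alloc)) if alloc[j] > 1),
                key=lambda j: ideal[j] - alloc[j])
        alloc[i] -= 1
    return alloc

def decompose_1d(n_rows, n_parts):
    """Contiguous row ranges [start, end) of near-equal size (1-D decomposition)."""
    base, extra = divmod(n_rows, n_parts)
    ranges, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

For example, `allocate_cpus([1000, 3000, 6000], 10)` gives `[1, 3, 6]`. Each layer's rows would then be split with `decompose_1d(rows_in_layer, cpus_for_layer)`.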
Parallel integer sorting with medium and fine-scale parallelism

Science.gov (United States)

Dagum, Leonardo

1993-01-01

Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures that allow multiple messages to be queued to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message-passing overhead. Performance results are given for implementations of queue-sort on a Connection Machine CM-2 and of barrel-sort on a 128-processor iPSC/860. The two implementations are found to be comparable in performance, but not as fast as a fully vectorized bucket sort on the Cray Y-MP.
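The abstract does not describe queue-sort or barrel-sort in enough detail to reproduce them. The following is instead a generic sketch of the key-range partitioning idea behind parallel bucket-style integer sorts, simulated sequentially. Each "processor" owns a contiguous key range. Keys are routed to their owners, which corresponds to the message-passing step. Buckets are then sorted locally and concatenated. The function name and the uniform key-range split are assumptions, and load balance depends on keys being roughly uniform.

```python
def bucket_partition_sort(keys, n_procs, key_max):
    """Simulate a key-range parallel integer sort on n_procs processors.

    1) each processor owns a contiguous key range of equal width,
    2) every key is routed to its owner's bucket (the communication step),
    3) each bucket is sorted independently, 4) buckets are concatenated in order."""
    width = key_max // n_procs + 1          # keys per processor range; covers 0..key_max
    buckets = [[] for _ in range(n_procs)]
    for k in keys:
        buckets[k // width].append(k)       # route key to the owning processor
    for b in buckets:
        b.sort()                            # local sort, independent per processor
    return [k for b in buckets for k in b]
```

Because the ranges are disjoint and ordered, no merge step is needed after the local sorts. This is one reason bucket-style sorts vectorize and parallelize well.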
Parallel continuous simulated tempering and its applications in large-scale molecular simulations

Energy Technology Data Exchange (ETDEWEB)

Zang, Tianwu; Yu, Linglin; Zhang, Chong [Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005 (United States); Ma, Jianpeng, E-mail: jpma@bcm.tmc.edu [Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005 (United States); Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030 (United States)

2014-07-28

In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method from our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Unlike conventional PT methods, the PCST method requires very few copies of simulations, typically 2–3, even when the total temperature range spans a large stride, yet it still maintains a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed, because the exchange rate is independent of the total potential energy; this gives an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on a two-dimensional Ising model, a Lennard-Jones liquid, and an all-atom folding simulation of the small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods, and it is particularly effective in simulating systems with long relaxation or correlation times. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems, such as phase transitions and the dynamics of macromolecules in explicit solvent.
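The system-size argument can be illustrated with the standard Metropolis acceptance rule for a swap in conventional parallel tempering, min(1, exp[(β_i − β_j)(E_i − E_j)]). This sketch shows conventional PT, not PCST. It uses a deliberately crude toy assumption: energies are extensive (E = N·e) with fixed, made-up per-particle energies.

```python
import math

def pt_swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Metropolis acceptance probability for swapping the configurations of
    replica i (inverse temperature beta_i, energy energy_i) and replica j."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def swap_probability_for_size(n_particles, beta_i=1.0, beta_j=0.95, e_i=-2.0, e_j=-1.9):
    # Toy model: total energy = number of particles * energy per particle
    return pt_swap_probability(beta_i, beta_j, n_particles * e_i, n_particles * e_j)

p_small = swap_probability_for_size(100)      # exp(-0.5), roughly 0.61
p_large = swap_probability_for_size(10_000)   # exp(-50), effectively zero
```

Because the exponent grows with the total energy difference, the same temperature spacing that works for a small system almost never exchanges for a large one. Conventional PT therefore needs more, closer-spaced replicas as systems grow, which is the dependence the abstract says PCST avoids.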
Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing

OpenAIRE

Qiang Liu; Yi Qin; Guodong Li

2018-01-01

Computing speed is a significant issue in large-scale flood simulations for real-time response in disaster prevention and mitigation. Even today, most large-scale flood simulations are run on supercomputers because of the massive amounts of data and computation required. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite volume scheme was proposed for flood simulation. To realize a fast simulation of large-scale floods on a personal...
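To make "Godunov-type finite volume scheme" concrete, here is a minimal 1-D sketch on a structured grid. Interface fluxes come from the Rusanov (local Lax-Friedrichs) approximate Riemann solver, which is one common Godunov-type choice and not necessarily the authors'. The sketch omits the 2-D unstructured mesh, wetting/drying, bed slope, and GPU parallelism described in the paper. Every cell update depends only on its two interface fluxes, and that locality is what makes such schemes map well onto GPUs.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m/s^2)

def rusanov_step(h, hu, dx, cfl=0.5):
    """One explicit finite-volume step for the 1-D shallow water equations
    with the Rusanov flux and transmissive boundaries.
    Assumes h > 0 everywhere (no wetting/drying treatment)."""
    hg = np.concatenate(([h[0]], h, [h[-1]]))       # ghost cells copy boundary values
    hug = np.concatenate(([hu[0]], hu, [hu[-1]]))
    u = hug / hg
    speed = np.abs(u) + np.sqrt(G * hg)              # max wave speed per cell
    dt = cfl * dx / speed.max()                      # CFL-limited time step
    U = np.array([hg, hug])                          # conserved variables, shape (2, n+2)
    F = np.array([hug, hug * u + 0.5 * G * hg**2])   # physical fluxes, shape (2, n+2)
    a = np.maximum(speed[:-1], speed[1:])            # wave-speed bound at each interface
    F_face = 0.5 * (F[:, :-1] + F[:, 1:]) - 0.5 * a * (U[:, 1:] - U[:, :-1])
    U_new = U[:, 1:-1] - dt / dx * (F_face[:, 1:] - F_face[:, :-1])
    return U_new[0], U_new[1], dt

# Dam-break example: deeper water on the left half of a 1 m channel
n = 100
dx = 1.0 / n
h0 = np.where(np.arange(n) < n // 2, 2.0, 1.0)
h, hu = h0.copy(), np.zeros(n)
for _ in range(20):
    h, hu, dt = rusanov_step(h, hu, dx)
```

The update is written in flux-difference form, so total water volume changes only through the boundary fluxes. Those fluxes are zero here until the wave reaches the channel ends.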
Parallel Algorithm for Incremental Betweenness Centrality on Large Graphs

KAUST Repository

Jamour, Fuad Tarek

2017-10-17

Betweenness centrality quantifies the importance of nodes in a graph in many applications, including network analysis, community detection and identification of influential users. Typically, graphs in such applications evolve over time, so betweenness centrality should be computed incrementally. This is challenging because updating even a single edge may trigger the computation of all-pairs shortest paths in the entire graph. Existing approaches cannot scale to large graphs: they either require excessive memory (i.e., quadratic in the size of the input graph) or perform unnecessary computations that render them prohibitively slow. We propose iCentral, a novel incremental algorithm for computing betweenness centrality in evolving graphs. We decompose the graph into biconnected components and prove that processing can be localized within the affected components. iCentral is the first algorithm to support incremental betweenness centrality computation within a graph component. This is done efficiently, in linear space; consequently, iCentral scales to large graphs. We demonstrate with real datasets that the serial implementation of iCentral is up to 3.7 times faster than existing serial methods. Our parallel implementation, which scales to large graphs, is an order of magnitude faster than the state-of-the-art parallel algorithm, while using an order of magnitude less computational resources.
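For reference, the static computation that incremental methods try to avoid repeating is Brandes' algorithm for unweighted graphs. It runs one BFS per source and then performs a back-propagation of pair dependencies. Below is a plain-Python sketch of that baseline, not iCentral itself. It treats the graph as undirected, so each pair is counted twice and the totals are halved. The per-source loop is also the natural unit of parallel work.

```python
from collections import deque

def brandes_betweenness(adj):
    """Betweenness centrality of an unweighted, undirected graph (Brandes' algorithm).
    adj: dict node -> list of neighbours."""
    bc = {v: 0.0 for v in adj}
    for s in adj:                                   # independent per source: parallelizable
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)               # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                                # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)             # dependency of s on each node
        while stack:                                # accumulate in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}        # undirected: each pair counted twice

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```

Every source runs a full BFS over the whole graph, so a single edge change naively forces the entire computation to be repeated. Localizing the work to affected biconnected components, as iCentral does, avoids this.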
MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning.

Science.gov (United States)

Liu, Yang; Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man

2015-01-01

Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the size of the data is large. Big data has gained momentum in both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model for data-intensive applications. Three data-intensive scenarios are considered in the parallelization process, in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated on an experimental MapReduce computer cluster in terms of classification accuracy and computational efficiency.
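The following sketch shows one common way to map training onto MapReduce when the training data are large. Each mapper computes a partial gradient on its data shard. A reducer sums the partial gradients, and a driver updates the weights once per job. It uses a single logistic neuron rather than the paper's multi-layer networks, and plain Python `map`/`reduce` stands in for a real MapReduce cluster. Function names and hyperparameters are assumptions.

```python
from functools import reduce
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def map_gradient(w, shard):
    """Map: one worker's partial gradient (summed, not averaged) on its data shard."""
    X, y = shard
    return X.T @ (sigmoid(X @ w) - y), len(y)

def combine(a, b):
    """Reduce: add partial gradients and sample counts."""
    return a[0] + b[0], a[1] + b[1]

def mapreduce_gradient(w, shards):
    grad_sum, n = reduce(combine, (map_gradient(w, s) for s in shards))
    return grad_sum / n

def train(shards, dim, lr=0.5, epochs=200):
    """One MapReduce job per epoch; the driver updates the shared weights."""
    w = np.zeros(dim)
    for _ in range(epochs):
        w = w - lr * mapreduce_gradient(w, shards)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(float)                     # synthetic labels
shards = [(X[i:i + 50], y[i:i + 50]) for i in range(0, 200, 50)]
w = train(shards, dim=3)
```

Mappers return summed gradients together with sample counts, so the reduced result equals the full-batch gradient regardless of how the data are sharded.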
Large-scale computing with Quantum Espresso

International Nuclear Information System (INIS)

Giannozzi, P.; Cavazzoni, C.

2009-01-01

This paper gives a short introduction to Quantum Espresso, a distribution of software for atomistic simulations in condensed-matter physics, chemical physics, and materials science, and to its usage in large-scale parallel computing.
MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning

Directory of Open Access Journals (Sweden)

Yang Liu

2015-01-01

Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the size of the data is large. Big data has gained momentum in both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model for data-intensive applications. Three data-intensive scenarios are considered in the parallelization process, in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated on an experimental MapReduce computer cluster in terms of classification accuracy and computational efficiency.
Parallel Computing in SCALE

International Nuclear Information System (INIS)

DeHart, Mark D.; Williams, Mark L.; Bowman, Stephen M.

2010-01-01

The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than is possible with the individual programs, allowing flexibility to address a wide variety of applications. However, the current system was originally designed for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances have been applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer-term SCALE enhancement
Topology Optimization of Large Scale Stokes Flow Problems

DEFF Research Database (Denmark)

Aage, Niels; Poulsen, Thomas Harpsøe; Gersborg-Hansen, Allan

2008-01-01

This note considers topology optimization of large-scale 2D and 3D Stokes flow problems using parallel computations. We solve problems with up to 1,125,000 elements in 2D and 128,000 elements in 3D on a shared-memory computer consisting of Sun UltraSparc IV CPUs.
A concurrent visualization system for large-scale unsteady simulations. Parallel vector performance on an NEC SX-4

International Nuclear Information System (INIS)

Takei, Toshifumi; Doi, Shun; Matsumoto, Hideki; Muramatsu, Kazuhiro

2000-01-01

We have developed a concurrent visualization system RVSLIB (Real-time Visual Simulation Library). This paper shows the effectiveness of the system when it is applied to large-scale unsteady simulations, for which the conventional post-processing approach may no longer work, on high-performance parallel vector supercomputers. The system performs almost all of the visualization tasks on a computation server and uses compressed visualized image data for efficient communication between the server and the user terminal. We have introduced several techniques, including vectorization and parallelization, into the system to minimize the computational costs of the visualization tools. The performance of RVSLIB was evaluated by using an actual CFD code on an NEC SX-4. The computational time increase due to the concurrent visualization was at most 3% for a smaller (1.6 million) grid and less than 1% for a larger (6.2 million) one. (author)
Concurrent Programming Using Actors: Exploiting Large-Scale Parallelism

Science.gov (United States)

1985-10-07

Scanned report documentation page (report D-R162 422): performing organization Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 545 Technology Square, Cambridge; authors G. Agha et al.
Parallel Index and Query for Large Scale Data Analysis

Energy Technology Data Exchange (ETDEWEB)

Chou, Jerry; Wu, Kesheng; Ruebel, Oliver; Howison, Mark; Qiang, Ji; Prabhat; Austin, Brian; Bethel, E. Wes; Ryne, Rob D.; Shoshani, Arie

2011-07-18

Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in designing a system for processing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize the underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that addresses these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process massive datasets on modern supercomputing platforms. We apply FastQuery to processing a massive 50 TB dataset generated by a large-scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for interesting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.
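FastBit is built on bitmap indexes. The sketch below shows the basic idea of a binned bitmap index answering a range query. Bins that lie entirely inside the query range are answered by OR-ing bitmaps. Bins that straddle a query boundary need a candidate check against the raw values. It is uncompressed and uses Python integers as bit sets, so it does not reflect FastBit's compression scheme or its API. The function names and bin edges are assumptions.

```python
import bisect

def build_bitmap_index(values, edges):
    """One bitmap (a Python int used as a bit set) per bin [edges[b], edges[b+1]).
    Bit i of bitmap b is set when values[i] falls in bin b."""
    bitmaps = [0] * (len(edges) - 1)
    for i, v in enumerate(values):
        b = bisect.bisect_right(edges, v) - 1
        if 0 <= b < len(bitmaps):
            bitmaps[b] |= 1 << i
    return bitmaps

def range_query(bitmaps, edges, values, lo, hi):
    """Sorted row ids with lo <= values[i] < hi."""
    hits = 0
    for b, bm in enumerate(bitmaps):
        b_lo, b_hi = edges[b], edges[b + 1]
        if b_hi <= lo or b_lo >= hi:
            continue                     # bin disjoint from the query range
        if lo <= b_lo and b_hi <= hi:
            hits |= bm                   # bin fully inside: answered by the index alone
        else:
            # Boundary bin: check candidate rows against the raw values
            for i in range(len(values)):
                if (bm >> i) & 1 and lo <= values[i] < hi:
                    hits |= 1 << i
    return [i for i in range(len(values)) if (hits >> i) & 1]

values = [0.5, 3.2, 7.7, 1.1, 9.9, 4.4, 2.0]
edges = [0, 2, 4, 6, 8, 10]
index = build_bitmap_index(values, edges)
hits = range_query(index, edges, values, 1.0, 5.0)
```

Only the rows in the two boundary bins are touched individually. Everything else is resolved with bitwise operations, which is what keeps such queries fast on very large datasets.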
Large-scale Intelligent Transportation Systems simulation

Energy Technology Data Exchange (ETDEWEB)

Ewing, T.; Canfield, T.; Hannebutte, U.; Levine, D.; Tentner, A.

1995-06-01

A prototype computer system has been developed that defines a high-level architecture for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS) capable of running on massively parallel computers and distributed (networked) computer systems. The prototype includes the modelling of instrumented "smart" vehicles with in-vehicle navigation units capable of optimal route planning, and of Traffic Management Centers (TMC). The TMC has probe vehicle tracking capabilities (displaying the position and attributes of instrumented vehicles) and can provide two-way interaction with traffic to provide advisories and link times. Both the in-vehicle navigation module and the TMC feature detailed graphical user interfaces to support human-factors studies. The prototype has been developed on a distributed system of networked UNIX computers but is designed to run on ANL's IBM SP-X parallel computer system for large-scale problems. A novel feature of our design is that vehicles will be represented by autonomous computer processes, each with a behavior model that performs independent route selection and reacts to external traffic events much like real vehicles. With this approach, one will be able to take advantage of emerging massively parallel processor (MPP) systems.
Optical technologies for data communication in large parallel systems

International Nuclear Information System (INIS)

Ritter, M B; Vlasov, Y; Kash, J A; Benner, A

2011-01-01

Large, parallel systems have greatly aided scientific computation and data collection, but performance scaling now relies on chip and system-level parallelism. This has happened because power density limits have caused processor frequency growth to stagnate, driving the new multi-core architecture paradigm, which would seem to provide generations of performance increases as transistors scale. However, this paradigm will be constrained by electrical I/O bandwidth limits; first off the processor card, then off the processor module itself. We will present best-estimates of these limits, then show how optical technologies can help provide more bandwidth to allow continued system scaling. We will describe the current status of optical transceiver technology which is already being used to exceed off-board electrical bandwidth limits, then present work on silicon nanophotonic transceivers and 3D integration technologies which, taken together, promise to allow further increases in off-module and off-card bandwidth. Finally, we will show estimated limits of nanophotonic links and discuss breakthroughs that are needed for further progress, and will speculate on whether we will reach Exascale-class machine performance at affordable powers.

Optical technologies for data communication in large parallel systems

Energy Technology Data Exchange (ETDEWEB)

Ritter, M B; Vlasov, Y; Kash, J A [IBM T.J. Watson Research Center, Yorktown Heights, NY (United States); Benner, A, E-mail: mritter@us.ibm.com [IBM Poughkeepsie, Poughkeepsie, NY (United States)

2011-01-15

Large, parallel systems have greatly aided scientific computation and data collection, but performance scaling now relies on chip and system-level parallelism. This has happened because power density limits have caused processor frequency growth to stagnate, driving the new multi-core architecture paradigm, which would seem to provide generations of performance increases as transistors scale. However, this paradigm will be constrained by electrical I/O bandwidth limits; first off the processor card, then off the processor module itself. We will present best-estimates of these limits, then show how optical technologies can help provide more bandwidth to allow continued system scaling. We will describe the current status of optical transceiver technology which is already being used to exceed off-board electrical bandwidth limits, then present work on silicon nanophotonic transceivers and 3D integration technologies which, taken together, promise to allow further increases in off-module and off-card bandwidth. Finally, we will show estimated limits of nanophotonic links and discuss breakthroughs that are needed for further progress, and will speculate on whether we will reach Exascale-class machine performance at affordable powers.
Regional-scale calculation of the LS factor using parallel processing

Science.gov (United States)

Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

2015-05-01

With the increasing resolution of data and the growing application of the USLE over large areas, the existing serial implementations of algorithms for computing the LS factor are becoming a bottleneck. In this paper, a parallel processing model based on the message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to whether data dependences exist, the algorithms are divided into local algorithms and global algorithms. Parallel strategies are designed according to the characteristics of the algorithms, including a decomposition method that maintains the integrity of the results, an optimized workflow that reduces the time spent exporting unnecessary intermediate data, and a buffer-communication-computation strategy that improves communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
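The following sketch illustrates why a "local" algorithm can be decomposed without losing result integrity. It assumes that flow direction is computed with the common D8 rule, which the abstract does not specify. A DEM is split into row strips, and each strip carries one halo (buffer) row above and below. Each strip then reproduces exactly the result of a whole-grid computation. The sketch is sequential NumPy rather than MPI, and the names, direction encoding, and border handling are assumptions.

```python
import numpy as np

# D8 neighbour offsets (row, col); index k in this list is the encoded direction
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def d8_direction(dem, cellsize=1.0):
    """Steepest-descent neighbour index (0-7) per interior cell; -1 for pits,
    flats and border cells (whose neighbourhood is incomplete)."""
    rows, cols = dem.shape
    out = np.full((rows, cols), -1, dtype=int)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            best_drop, best_k = 0.0, -1
            for k, (dr, dc) in enumerate(OFFSETS):
                dist = cellsize * (np.sqrt(2.0) if dr and dc else 1.0)
                drop = (dem[r, c] - dem[r + dr, c + dc]) / dist
                if drop > best_drop:
                    best_drop, best_k = drop, k
            out[r, c] = best_k
    return out

def d8_by_strips(dem, n_strips, cellsize=1.0):
    """Row-strip decomposition: each strip is computed with one halo (buffer) row
    above and below, as a separate process would after exchanging boundary rows."""
    rows = dem.shape[0]
    bounds = np.linspace(0, rows, n_strips + 1).astype(int)
    parts = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        glo, ghi = max(lo - 1, 0), min(hi + 1, rows)          # add halo rows if they exist
        local = d8_direction(dem[glo:ghi], cellsize)
        parts.append(local[lo - glo: lo - glo + (hi - lo)])   # drop halo rows
    return np.vstack(parts)

dem = np.random.default_rng(3).random((12, 9))   # synthetic elevation grid
```

Global algorithms such as flow accumulation do not have this property. A cell's value can depend on cells far outside any fixed halo, so they need the more elaborate communication strategies the paper describes.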
Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems

KAUST Repository

Frohne, Jörg; Heister, Timo; Bangerth, Wolfgang

2015-01-01

© 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.
Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems

KAUST Repository

Frohne, Jörg

2015-08-06

© 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.
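As a small illustration of an active set method for a contact-type constraint, here is a sketch of the primal-dual active set strategy applied to a 1-D obstacle problem: minimize ½uᵀAu − fᵀu subject to u ≤ g. The problem has a dense 1-D Laplacian A. This is a toy stand-in for the paper's 3-D elastoplastic contact problems. We do not claim it matches the authors' exact active set variant, and the parameter `c` and the test data are assumptions.

```python
import numpy as np

def pdas_obstacle(A, f, g, c=1.0, max_iter=50):
    """Primal-dual active set method for: min 1/2 u^T A u - f^T u  s.t.  u <= g.
    Returns (u, lam) satisfying A u + lam = f, lam >= 0, u <= g, lam * (g - u) = 0."""
    n = len(f)
    u = np.linalg.solve(A, f)            # start from the unconstrained solution
    lam = np.zeros(n)
    active = None
    for _ in range(max_iter):
        new_active = lam + c * (u - g) > 0
        if active is not None and np.array_equal(new_active, active):
            return u, lam                # active set unchanged: KKT conditions hold
        active, inactive = new_active, ~new_active
        u = np.empty(n)
        u[active] = g[active]            # contact: pin u to the obstacle
        if inactive.any():
            rhs = f[inactive] - A[np.ix_(inactive, active)] @ g[active]
            u[inactive] = np.linalg.solve(A[np.ix_(inactive, inactive)], rhs)
        lam = f - A @ u                  # contact pressure (multiplier)
        lam[inactive] = 0.0
    raise RuntimeError("active set did not settle")

n = 63
h = 1.0 / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2   # -u'' with u(0)=u(1)=0
f = np.ones(n)
g = np.full(n, 0.05)                     # obstacle below the unconstrained maximum of 0.125
u, lam = pdas_obstacle(A, f, g)
```

Each iteration solves one linear system on the current inactive set. In the large-scale setting these are the systems a multigrid-preconditioned solver would handle.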
Neurite, a finite difference large scale parallel program for the simulation of electrical signal propagation in neurites under mechanical loading.

Directory of Open Access Journals (Sweden)

Julián A García-Grajales

With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally characterized only by purely mechanistic criteria, that is, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has rarely been explored. In particular, a quantitative, mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of simulated cells grows. The solvers implemented in Neurite (explicit and implicit) were therefore parallelized using graphics processing units in order to reduce the simulation cost of large-scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon
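For the passive (Cable Theory) part, an explicit finite difference solver reduces to a simple per-node stencil update. This is the kind of update that parallelizes naturally on GPUs, since every node is independent within a time step. Below is a CPU/NumPy sketch of the passive cable equation τ ∂V/∂t = λ² ∂²V/∂x² − V, with V measured relative to rest. It is not Neurite's code. It covers no Hodgkin-Huxley active regions and no mechanical coupling, and the parameter names and values are hypothetical.

```python
import numpy as np

def simulate_passive_cable(v0, length_const, tau, dx, t_end):
    """Explicit (forward-time, centred-space) scheme for
    tau dV/dt = lambda^2 d2V/dx2 - V, with sealed (zero-flux) ends."""
    v = np.asarray(v0, dtype=float).copy()
    # Time step chosen so the centre-node coefficient stays at 0.5:
    # all stencil weights are non-negative, so the scheme is stable and non-oscillatory.
    dt = 0.5 * tau / (2 * length_const**2 / dx**2 + 1)
    steps = int(np.ceil(t_end / dt))
    for _ in range(steps):
        padded = np.pad(v, 1, mode="edge")           # mirror ends: zero axial current
        lap = (padded[:-2] - 2 * v + padded[2:]) / dx**2
        v = v + dt / tau * (length_const**2 * lap - v)
    return v, steps * dt

x = np.linspace(0.0, 1.0, 101)
dx = x[1] - x[0]
v0 = np.where(np.abs(x - 0.5) < 0.025, 10.0, 0.0)    # local depolarization (hypothetical)
v, t = simulate_passive_cable(v0, length_const=0.1, tau=1.0, dx=dx, t_end=1.0)
```

The stability limit ties the time step to dx², so fine meshes force many small steps. This is one reason implicit solvers, like the one Neurite also provides, are attractive.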
Large-Scale Parallel Viscous Flow Computations using an Unstructured Multigrid Algorithm

Science.gov (United States)

Mavriplis, Dimitri J.

1999-01-01

The development and testing of a parallel unstructured agglomeration multigrid algorithm for steady-state aerodynamic flows is discussed. The agglomeration multigrid strategy uses a graph algorithm to construct the coarse multigrid levels from the given fine grid, similar to an algebraic multigrid approach, but operates directly on the non-linear system using the FAS (Full Approximation Scheme) approach. The scalability and convergence rate of the multigrid algorithm are examined on the SGI Origin 2000 and the Cray T3E. An argument is given which indicates that the asymptotic scalability of the multigrid algorithm should be similar to that of its underlying single grid smoothing scheme. For medium size problems involving several million grid points, near perfect scalability is obtained for the single grid algorithm, while only a slight drop-off in parallel efficiency is observed for the multigrid V- and W-cycles, using up to 128 processors on the SGI Origin 2000, and up to 512 processors on the Cray T3E. For a large problem using 25 million grid points, good scalability is observed for the multigrid algorithm using up to 1450 processors on a Cray T3E, even when the coarsest grid level contains fewer points than the total number of processors.
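To show the structure of a multigrid V-cycle (smooth, restrict the residual, correct from a coarser level, smooth again), here is a minimal sketch for the 1-D Poisson problem −u'' = f on a uniform grid. It uses a linear correction scheme with geometric coarsening, weighted Jacobi smoothing, full-weighting restriction, and linear interpolation. The paper's method differs in two ways: it builds coarse levels by graph-based agglomeration on unstructured meshes, and it applies FAS directly to the nonlinear system. Grid sizes must be 2^k + 1 points.

```python
import numpy as np

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; boundary values are left unchanged."""
    for _ in range(sweeps):
        u_new = u.copy()
        u_new[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
        u = u_new
    return u

def restrict(r):
    """Full weighting from 2m+1 fine points to m+1 coarse points."""
    rc = r[::2].copy()
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    return rc

def prolong(ec):
    """Linear interpolation from m+1 coarse points to 2m+1 fine points."""
    ef = np.zeros(2 * (len(ec) - 1) + 1)
    ef[::2] = ec
    ef[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return ef

def v_cycle(u, f, h):
    if len(u) <= 3:                                   # one unknown: solve exactly
        u = u.copy()
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    u = smooth(u, f, h)                               # pre-smoothing
    rc = restrict(residual(u, f, h))
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)        # coarse-grid error equation
    u = u + prolong(ec)                               # coarse-grid correction
    return smooth(u, f, h)                            # post-smoothing

n = 129
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
f = np.pi**2 * np.sin(np.pi * x)                      # exact solution: sin(pi x)
u = np.zeros(n)
r0 = np.abs(residual(u, f, h)).max()
for _ in range(8):
    u = v_cycle(u, f, h)
r8 = np.abs(residual(u, f, h)).max()
```

The coarse levels shrink geometrically, so they contain fewer and fewer points. This is why parallel efficiency on coarse grids, which the abstract examines, becomes the scalability question.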
Speedup predictions on large scientific parallel programs

International Nuclear Information System (INIS)

Williams, E.; Bobrowicz, F.

1985-01-01

How much speedup can we expect for large scientific parallel programs running on supercomputers? For insight into this problem, we extend the parallel processing environment currently existing on the Cray X-MP (a shared-memory multiprocessor with at most four processors) to a simulated N-processor environment, where N ≥ 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory.
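One way to interpret a single measured speedup is to infer the serial fraction it implies. This assumes an Amdahl-style model with a fixed serial fraction, which the abstract does not state. The Karp-Flatt metric gives that fraction, and Amdahl's law then extrapolates to other processor counts. This is a back-of-the-envelope sketch, not the paper's prediction method.

```python
def karp_flatt(speedup, n_procs):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / n_procs) / (1.0 - 1.0 / n_procs)

def amdahl_speedup(serial_fraction, n_procs):
    """Amdahl's law: speedup on n_procs for a fixed serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

f = karp_flatt(14.4, 16)       # 1/135, about 0.74% serial work
s64 = amdahl_speedup(f, 64)    # about 43.6 under Amdahl's assumptions
s_limit = 1.0 / f              # asymptotic bound: 135
```

Even a serial fraction below 1% caps the speedup at 135, and the efficiency on 64 processors would fall to about 68%. In practice, communication and load imbalance usually lower these figures further.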
A Parallel Solver for Large-Scale Markov Chains

Czech Academy of Sciences Publication Activity Database

Benzi, M.; Tůma, Miroslav

2002-01-01

Vol. 41 (2002), pp. 135-153. ISSN 0168-9274. R&D Projects: GA AV ČR IAA2030801; GA ČR GA101/00/1035. Keywords: parallel preconditioning; iterative methods; discrete Markov chains; generalized inverses; singular matrices; graph partitioning; AINV; Bi-CGSTAB. Subject RIV: BA - General Mathematics. Impact factor: 0.504, year: 2002.
Parallel time domain solvers for electrically large transient scattering problems

KAUST Repository

Liu, Yang

2014-09-26

Marching-on-in-time (MOT)-based integral equation solvers represent an increasingly appealing avenue for analyzing transient electromagnetic interactions with large and complex structures. MOT integral equation solvers for analyzing electromagnetic scattering from perfect electrically conducting objects are obtained by enforcing electric field boundary conditions and implicitly time-advancing electric surface current densities by iteratively solving sparse systems of equations at all time steps. Unlike finite-difference and finite-element competitors, these solvers apply to nonlinear and multi-scale structures comprising geometrically intricate and deep sub-wavelength features residing atop electrically large platforms. Moreover, they are high-order accurate, stable in the low- and high-frequency limits, and applicable to conducting and penetrable structures represented by highly irregular meshes. This presentation reviews some recent advances in the parallel implementations of time domain integral equation solvers, specifically those that leverage the multilevel plane-wave time-domain (PWTD) algorithm on modern manycore computer architectures, including graphics processing units (GPUs) and distributed-memory supercomputers. The GPU-based implementation achieves speedups of at least one order of magnitude compared to serial implementations, while the distributed parallel implementation is highly scalable to thousands of compute nodes. A distributed parallel PWTD kernel has been adopted to solve time domain surface/volume integral equations (TDSIE/TDVIE) for analyzing transient scattering from large and complex-shaped perfectly electrically conducting (PEC)/dielectric objects involving ten million/tens of millions of spatial unknowns.
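A discretized MOT solver has the following recurrence structure: Z_0 I_n = V_n − Σ_{k≥1} Z_k I_{n−k}. At each step, the currents are found implicitly from the instantaneous interaction matrix Z_0. The history sum over earlier currents dominates the cost, and PWTD-type algorithms accelerate exactly that sum. The sketch below evaluates the sum directly. It uses random, well-conditioned placeholder matrices rather than an actual electromagnetic discretization, and a direct solve instead of the iterative solvers the abstract mentions.

```python
import numpy as np

def march_on_in_time(Z, V):
    """Z: (K, Ns, Ns) interaction matrices Z_0..Z_{K-1};
    V: (Nt, Ns) excitation samples. Returns currents I with shape (Nt, Ns)."""
    K = Z.shape[0]
    n_steps, n_unknowns = V.shape
    I = np.zeros((n_steps, n_unknowns))
    for n in range(n_steps):
        rhs = V[n].copy()
        # History term: fields radiated by currents at earlier time steps
        for k in range(1, min(K, n + 1)):
            rhs -= Z[k] @ I[n - k]
        # Implicit update with the instantaneous matrix Z_0
        I[n] = np.linalg.solve(Z[0], rhs)
    return I

rng = np.random.default_rng(0)
K, Ns, Nt = 4, 5, 12                       # placeholder sizes
Z = 0.1 * rng.normal(size=(K, Ns, Ns))
Z[0] += 4.0 * np.eye(Ns)                   # keep Z_0 well conditioned
V = rng.normal(size=(Nt, Ns))
I = march_on_in_time(Z, V)
```

Evaluated directly, the history sum costs on the order of N_t² N_s² operations over a full simulation. Reducing this cost is the motivation for PWTD acceleration and for distributing the work across GPUs and compute nodes.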
Accelerating large-scale protein structure alignments with graphics processing units

Directory of Open Access Journals (Sweden)

Pang Bin

2012-02-01

Background: Large-scale protein structure alignment, an indispensable tool in structural bioinformatics, poses a tremendous challenge to computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings: We present ppsAlign, a parallel protein structure alignment framework designed and optimized to exploit the parallelism of graphics processing units (GPUs). As a general-purpose GPU platform, ppsAlign could incorporate many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions: ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues arising from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using the massively parallel computing power of GPUs.
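TM-align scores alignments with the TM-score, TM = (1/L) Σ 1/(1 + (d_i/d_0)²), where d_0 = 1.24·∛(L − 15) − 1.8 and L is the target length. Scoring many candidate alignments at once is naturally data-parallel. The sketch below vectorizes it over a batch with NumPy as a CPU stand-in for GPU threads. It is not ppsAlign's code. It assumes the per-residue distances are already measured after superposition, and that all pairs in a batch share one target length. Clamping d_0 at 0.5 for very short chains is our assumption.

```python
import numpy as np

def tm_d0(length):
    """Distance scale d0 for the TM-score; clamped at 0.5 for very short chains (assumption)."""
    return max(1.24 * np.cbrt(length - 15.0) - 1.8, 0.5)

def tm_scores(distances, target_length):
    """TM-scores for a batch of alignments.
    distances: (n_pairs, n_aligned) distances in angstroms between aligned
    residue pairs after superposition; all pairs share target_length."""
    d0 = tm_d0(target_length)
    d = np.asarray(distances, dtype=float)
    return (1.0 / (1.0 + (d / d0) ** 2)).sum(axis=1) / target_length

batch = np.zeros((2, 100))                 # two perfect, full-length alignments
scores = tm_scores(batch, 100)
```

Every pair and every residue term is independent, so the work maps directly onto thousands of GPU threads. The expensive part in practice is the alignment and superposition search that produces these distances.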
Research on precision grinding technology of large scale and ultra thin optics

Science.gov (United States)

Zhou, Lian; Wei, Qiancai; Li, Jie; Chen, Xianhua; Zhang, Qinghua

2018-03-01

The flatness and parallelism errors of large-scale, ultra-thin optics strongly influence the efficiency and accuracy of subsequent polishing. To achieve high-precision grinding of these ductile elements, a low-deformation vacuum chuck was first designed to clamp the optics with high supporting rigidity over the full aperture. The optics were then surface-ground under vacuum adsorption. After machining, the vacuum system was turned off, and the form error of the optics was measured on-machine with a displacement sensor after elastic recovery. The flatness was made to converge with high accuracy by compensation machining, whose trajectories incorporated the measurement results. To obtain high parallelism, the optics were turned over and compensation-ground using the form error of the vacuum chuck. Finally, a grinding experiment was performed on large-scale, ultra-thin fused silica optics measuring 430 mm × 430 mm × 10 mm. The best P-V flatness was below 3 μm, and the parallelism was below 3″. This machining technique has been applied in batch grinding of large-scale, ultra-thin optics.
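The measurement and compensation steps above can be sketched as follows. The sketch assumes measured surface heights are flattened into a list, all in one length unit. It also assumes that compensation simply removes each point down to the lowest measured height. All helper names are hypothetical. Real compensation trajectories would also have to account for wheel geometry and elastic recovery.

```python
import math

def peak_to_valley(heights):
    """P-V flatness: highest minus lowest measured surface height."""
    return max(heights) - min(heights)

def compensation_depths(heights):
    """Extra removal at each measured point so every point reaches the lowest one."""
    low = min(heights)
    return [h - low for h in heights]

def wedge_arcsec(thickness_a, thickness_b, span):
    """Parallelism as the wedge angle (arcseconds) between the two faces.

    thickness_a, thickness_b: thicknesses measured `span` apart (same unit).
    """
    return math.degrees(math.atan(abs(thickness_a - thickness_b) / span)) * 3600
```

As a scale check, a thickness difference of about 6.25 µm across 430 mm corresponds to a wedge of roughly 3″.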
Leveraging human oversight and intervention in large-scale parallel processing of open-source data

Science.gov (United States)

Casini, Enrico; Suri, Niranjan; Bradshaw, Jeffrey M.

2015-05-01

The popularity of cloud computing, along with the increased availability of cheap storage, has made it necessary to process and transform large volumes of open-source data in parallel. One way to handle such volumes of information properly is to take advantage of distributed computing frameworks like MapReduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error-prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration in a variety of environments, such as warfare, cyber security, and threat monitoring. Although this mutual participation seems easy to exploit, human-machine collaboration in data analysis presents several challenges. First, because human intervention is asynchronous, it is necessary to verify that once a correction is made, all the necessary reprocessing is carried out down the chain. Second, the amount of reprocessing often needs to be minimized to make the best use of resources whose availability is limited. To meet these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the parallel processing of large amounts of open-source data.
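The "reprocessing in chain" requirement can be pictured as a dependency graph of processing tasks. This minimal sketch assumes a hypothetical acyclic graph that maps each task to the tasks consuming its output. After a human corrects one task's output, only the tasks downstream of it are recomputed, in topological order. This keeps reprocessing to the minimum the graph allows.

```python
from collections import deque

def tasks_to_reprocess(dependents, corrected):
    """Downstream tasks of `corrected`, ordered so inputs are rebuilt before consumers.

    dependents: dict mapping task -> list of tasks that consume its output (a DAG).
    corrected: the task whose output a human analyst just corrected.
    """
    # 1. Collect every task reachable from the corrected one (the affected set).
    affected = set()
    queue = deque(dependents.get(corrected, []))
    while queue:
        t = queue.popleft()
        if t not in affected:
            affected.add(t)
            queue.extend(dependents.get(t, []))
    # 2. Topologically sort the affected subgraph (Kahn's algorithm).
    indegree = {t: 0 for t in affected}
    for t in affected:
        for d in dependents.get(t, []):
            indegree[d] += 1
    ready = deque(sorted(t for t in affected if indegree[t] == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for d in dependents.get(t, []):
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    return order
```

Take a graph where `ingest` feeds `parse`, `parse` feeds `entities` and `geo`, and both of those feed `report`. Correcting `parse` reruns `entities`, `geo` and `report`, but never `ingest`.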
Faster Parallel Traversal of Scale Free Graphs at Extreme Scale with Vertex Delegates

KAUST Repository

Pearce, Roger

2014-11-01

© 2014 IEEE. At extreme scale, irregularities in the structure of scale-free graphs such as social network graphs limit our ability to analyze these important and growing datasets. A key challenge is the presence of high-degree vertices (hubs), which lead to parallel workload and storage imbalances. The imbalances occur because existing partitioning techniques are not able to effectively partition high-degree vertices. We present techniques to distribute storage, computation, and communication of hubs for extreme scale graphs in distributed memory supercomputers. To balance the hub processing workload, we distribute hub data structures and related computation among a set of delegates. The delegates coordinate using highly optimized, yet portable, asynchronous broadcast and reduction operations. We demonstrate the scalability of our new algorithmic technique using Breadth-First Search (BFS), Single Source Shortest Path (SSSP), K-Core Decomposition, and PageRank on synthetically generated scale-free graphs. Our results show excellent scalability on large scale-free graphs up to 131K cores of the IBM BG/P, and outperform the best known Graph500 performance on BG/P Intrepid by 15%.
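A minimal sketch of the hub-splitting idea follows, under simplifying assumptions. Edges are directed (source, target) pairs with integer vertex IDs. Ordinary vertices are owned by `vertex % num_parts`. A hub's edges are dealt round-robin across all partitions, where its delegates would live. The function name and the round-robin rule are hypothetical illustrations, not the paper's partitioning scheme. That scheme also distributes the associated computation and communication.

```python
from collections import defaultdict

def partition_with_delegates(edges, num_parts, hub_threshold):
    """Assign directed edges to partitions, spreading hub edges over all partitions.

    Returns (parts, hubs): a list of edge lists, one per partition, and the
    set of vertices treated as hubs (out-degree above hub_threshold).
    """
    degree = defaultdict(int)
    for u, _ in edges:
        degree[u] += 1
    hubs = {v for v, d in degree.items() if d > hub_threshold}
    parts = [[] for _ in range(num_parts)]
    next_slot = defaultdict(int)  # per-hub round-robin counter
    for u, v in edges:
        if u in hubs:
            # The hub has a delegate on every partition; deal its edges out evenly.
            p = next_slot[u] % num_parts
            next_slot[u] += 1
        else:
            # A low-degree vertex keeps all its edges on its owner partition.
            p = u % num_parts
        parts[p].append((u, v))
    return parts, hubs
```

Consider a star in which vertex 0 has 12 edges, plus two ordinary edges, on four partitions. Without delegates, partition 0 would hold 12 of the 14 edges. With delegates, no partition holds more than 4.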
Faster Parallel Traversal of Scale Free Graphs at Extreme Scale with Vertex Delegates

KAUST Repository

Pearce, Roger; Gokhale, Maya; Amato, Nancy M.

2014-01-01

Computational challenges of large-scale, long-time, first-principles molecular dynamics

International Nuclear Information System (INIS)

Kent, P R C

2008-01-01

Plane wave density functional calculations have traditionally been able to use the largest available supercomputing resources. We analyze the scalability of modern projector-augmented wave implementations to identify the challenges in performing molecular dynamics calculations of large systems containing many thousands of electrons. Benchmark calculations on the Cray XT4 demonstrate that global linear-algebra operations are the primary reason for limited parallel scalability. Plane-wave-related operations can be made sufficiently scalable. Improving parallel linear-algebra performance is an essential step toward reaching longer timescales in future large-scale molecular dynamics calculations.
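Amdahl's law illustrates why one poorly scaling component caps the overall speedup. This is a generic model, not the paper's analysis. As a simplification, it treats the non-scaling share of runtime (here standing in for global linear algebra) as fully serial.

```python
def amdahl_speedup(serial_fraction, p):
    """Speedup on p processors when `serial_fraction` of the runtime does not scale."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)
```

With 5% non-scaling work, 1,024 processors give a speedup of only about 19.6, and no processor count can reach 20.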
Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

KAUST Repository

Quintin, Jean-Noel

2013-10-01

Matrix multiplication is a very important computational kernel, both in its own right as a building block of many scientific applications and as a popular representative of other scientific applications. Cannon's algorithm, which dates back to 1969, was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However, this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm, as it can also be used on a nonsquare number of processors. Since then, the number of processors in HPC platforms has increased by two orders of magnitude, making the contribution of communication to the overall execution time more significant. Therefore, state-of-the-art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate reductions in communication cost of up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.
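The SUMMA data flow that HSUMMA builds on can be simulated serially with NumPy. This sketch assumes the matrix dimensions divide evenly by the grid and panel sizes. The "broadcasts" are plain slices because everything lives in one address space. The two-level hierarchy that HSUMMA adds is not modeled.

```python
import numpy as np

def summa(A, B, grid_rows, grid_cols, panel):
    """Serially simulate SUMMA on a grid_rows x grid_cols process grid.

    Process (i, j) owns block C[i, j]. At each step, the current column panel
    of A is broadcast along process rows and the matching row panel of B along
    process columns; every process then performs a local rank-`panel` update.
    """
    m, k = A.shape
    _, n = B.shape
    assert m % grid_rows == 0 and n % grid_cols == 0 and k % panel == 0
    mb, nb = m // grid_rows, n // grid_cols
    C = np.zeros((m, n))
    for s in range(0, k, panel):
        A_panel = A[:, s:s + panel]  # stands in for the row-wise broadcast
        B_panel = B[s:s + panel, :]  # stands in for the column-wise broadcast
        for i in range(grid_rows):
            for j in range(grid_cols):
                rows = slice(i * mb, (i + 1) * mb)
                cols = slice(j * nb, (j + 1) * nb)
                C[rows, cols] += A_panel[rows, :] @ B_panel[:, cols]
    return C
```

A 3 × 2 grid (six processes, not a perfect square) works without change, which is the flexibility SUMMA offers over Cannon's algorithm.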
Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

KAUST Repository

Quintin, Jean-Noel; Hasanov, Khalid; Lastovetsky, Alexey

2013-01-01

Building a parallel file system simulator

International Nuclear Information System (INIS)

Molina-Estolano, E; Maltzahn, C; Brandt, S A; Bent, J

2009-01-01

Parallel file systems are gaining in popularity in high-end computing centers as well as commercial data centers. High-end computing systems are expected to scale exponentially and to pose new challenges to their storage scalability in terms of cost and power. To address these challenges, scientists and file system designers will need a thorough understanding of the design space of parallel file systems. Yet few systematic studies exist of parallel file system behavior at petabyte and exabyte scale. An important reason is the significant cost of getting access to large-scale hardware to test parallel file systems. To contribute to this understanding, we are building a parallel file system simulator that can simulate parallel file systems at very large scale. Our goal is to simulate petabyte-scale parallel file systems on a small cluster or even a single machine in reasonable time and with reasonable fidelity. With this simulator, file system experts will be able to tune existing file systems for specific workloads, scientists and file system deployment engineers will be able to better communicate workload requirements, file system designers and researchers will be able to try out design alternatives and innovations at scale, and instructors will be able to study very large-scale parallel file system behavior in the classroom. In this paper we describe our approach and provide preliminary results that are encouraging both in terms of fidelity and simulation scalability.
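A simulator at this scale must model far more, such as networks, metadata and caching. The basic question it answers can still be illustrated with a toy analytic model. This sketch assumes one file striped round-robin across servers of identical bandwidth with no contention. The function and its parameters are hypothetical and not part of the authors' simulator.

```python
def striped_write_time(file_size, stripe_size, num_servers, server_bw):
    """Time to write one file striped round-robin over num_servers servers.

    Stripe i goes to server i % num_servers. Each server writes its stripes
    sequentially at server_bw (size units per second), and servers work in
    parallel, so the most loaded server determines completion time.
    """
    load = [0] * num_servers
    offset, i = 0, 0
    while offset < file_size:
        chunk = min(stripe_size, file_size - offset)
        load[i % num_servers] += chunk
        offset += chunk
        i += 1
    return max(load) / server_bw
```

Ten unit stripes on four servers leave two servers with three stripes each. Those two servers, not the average load, set the completion time.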
Large Scale Simulations of the Euler Equations on GPU Clusters

KAUST Repository

Liebmann, Manfred; Douglas, Craig C.; Haase, Gundolf; Horváth, Zoltán

2010-01-01

The paper investigates the scalability of a parallel Euler solver, using the Vijayasundaram method, on a GPU cluster with 32 NVIDIA GeForce GTX 295 boards. The aim of this research is to enable large scale fluid dynamics simulations with up to one
Final Report: Migration Mechanisms for Large-scale Parallel Applications

Energy Technology Data Exchange (ETDEWEB)

Jason Nieh

2009-10-30

Process migration is the ability to transfer a process from one machine to another. It is a useful facility in distributed computing environments, especially as computing devices become more pervasive and Internet access becomes more ubiquitous. The potential benefits of process migration, among others, are fault resilience by migrating processes off of faulty hosts, data access locality by migrating processes closer to the data, better system response time by migrating processes closer to users, dynamic load balancing by migrating processes to less loaded hosts, and improved service availability and administration by migrating processes before host maintenance so that applications can continue to run with minimal downtime. Although process migration provides substantial potential benefits and many approaches have been considered, achieving transparent process migration functionality has been difficult in practice. To address this problem, our work has designed, implemented, and evaluated new and powerful transparent process checkpoint-restart and migration mechanisms for desktop, server, and parallel applications that operate across heterogeneous cluster and mobile computing environments. A key aspect of this work has been to introduce lightweight operating system virtualization to provide processes with private, virtual namespaces that decouple and isolate processes from dependencies on the host operating system instance. This decoupling enables processes to be transparently checkpointed and migrated without modifying, recompiling, or relinking applications or the operating system. Building on this lightweight operating system virtualization approach, we have developed novel technologies that enable (1) coordinated, consistent checkpoint-restart and migration of multiple processes, (2) fast checkpointing of process and file system state to enable restart of multiple parallel execution environments and time travel, (3) process migration across heterogeneous
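A minimal sketch shows why private namespaces decouple a process from its host. The application holds only virtual PIDs, and a per-namespace table maps them to whatever real PIDs the current host assigns. The class below is hypothetical. It models only the PID table, not the kernel-level virtualization or the saving of process state that the report describes.

```python
import itertools

class PidNamespace:
    """Hypothetical private PID namespace: virtual PIDs mapped to host PIDs."""

    def __init__(self):
        self._virt_to_host = {}
        self._next_vpid = itertools.count(1)

    def register(self, host_pid):
        """Give a newly started process a virtual PID; return that virtual PID."""
        vpid = next(self._next_vpid)
        self._virt_to_host[vpid] = host_pid
        return vpid

    def to_host(self, vpid):
        """Translate a virtual PID to the real PID on the current host."""
        return self._virt_to_host[vpid]

    def checkpoint(self):
        # Only virtual identities are saved; host PIDs mean nothing elsewhere.
        return sorted(self._virt_to_host)

    @classmethod
    def restore(cls, vpids, spawn_on_new_host):
        """Rebuild the table on a new host; spawn_on_new_host(vpid) returns a host PID."""
        ns = cls()
        for vpid in vpids:
            ns._virt_to_host[vpid] = spawn_on_new_host(vpid)
        ns._next_vpid = itertools.count(max(vpids, default=0) + 1)
        return ns
```

After a restart, the application's PIDs 1 and 2 still refer to the same logical processes, even though the host PIDs behind them have changed.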

Performance characteristics of hybrid MPI/OpenMP implementations of NAS parallel benchmarks SP and BT on large-scale multicore supercomputers

KAUST Repository

Wu, Xingfu; Taylor, Valerie

2011-01-01

The NAS Parallel Benchmarks (NPB) are well-known applications with fixed algorithms for evaluating parallel systems and tools. Multicore supercomputers provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used for data sharing among the cores that comprise a node and MPI can be used for communication between nodes. In this paper, we use the SP and BT benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implementing hybrid MPI/OpenMP versions of SP and BT. In particular, we compare the performance of the hybrid SP and BT with their MPI counterparts on large-scale multicore supercomputers. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76%, and the hybrid BT outperforms the MPI BT by up to 8.58%, on up to 10,000 cores on BlueGene/P at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. We also use performance tools and MPI trace libraries available on these supercomputers to further investigate the performance characteristics of the hybrid SP and BT.
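The hybrid model divides each node's cores between MPI ranks and OpenMP threads. This minimal sketch of the mapping assumes every node has the same core count and that each rank's threads occupy consecutive cores. That placement policy is hypothetical; real runs set placement through the job launcher and the OpenMP runtime.

```python
def hybrid_layout(num_nodes, cores_per_node, ranks_per_node):
    """Map each (MPI rank, OpenMP thread) pair to a (node, core) slot."""
    if cores_per_node % ranks_per_node:
        raise ValueError("cores_per_node must be divisible by ranks_per_node")
    threads_per_rank = cores_per_node // ranks_per_node
    layout = {}
    for node in range(num_nodes):
        for local_rank in range(ranks_per_node):
            rank = node * ranks_per_node + local_rank
            for t in range(threads_per_rank):
                # Threads of one rank share the rank's memory on this node.
                layout[(rank, t)] = (node, local_rank * threads_per_rank + t)
    return layout
```

On a four-core BlueGene/P node, `hybrid_layout(n, 4, 1)` gives one rank with four threads per node. `hybrid_layout(n, 4, 4)` gives the pure-MPI layout with four single-threaded ranks per node.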
Performance characteristics of hybrid MPI/OpenMP implementations of NAS parallel benchmarks SP and BT on large-scale multicore supercomputers

KAUST Repository

Wu, Xingfu

2011-03-29

Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

Energy Technology Data Exchange (ETDEWEB)

Sreepathi, Sarat [ORNL; Kumar, Jitendra [ORNL; Mills, Richard T. [Argonne National Laboratory; Hoffman, Forrest M. [ORNL; Sripathi, Vamsi [Intel Corporation; Hargrove, William Walter [United States Department of Agriculture (USDA), United States Forest Service (USFS)

2017-09-01

A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne, etc.), observational facilities (meteorological, eddy covariance, etc.), state-of-the-art sensors, and simulation models offers unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling, and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies, like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and, specifically, for large-scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.
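The distributed step at the heart of parallel k-means can be sketched in plain Python. The sketch assumes the textbook data-parallel scheme. Each worker assigns its own chunk of points to the nearest centroid and returns per-cluster sums and counts. These partial results are then combined, much as an MPI allreduce would combine them. This is not the MSTC implementation, and the GPU offload mentioned above is not shown.

```python
def kmeans_step(chunks, centroids):
    """One k-means iteration over data split into chunks (one chunk per worker)."""
    k, dim = len(centroids), len(centroids[0])

    def local_partials(points):
        # Work done independently by one worker on its own chunk.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in points:
            c = min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            counts[c] += 1
            for d in range(dim):
                sums[c][d] += p[d]
        return sums, counts

    partials = [local_partials(chunk) for chunk in chunks]  # parallel in a real code
    # Combine partial sums and counts (the allreduce step).
    total_sums = [[sum(s[c][d] for s, _ in partials) for d in range(dim)] for c in range(k)]
    total_counts = [sum(n[c] for _, n in partials) for c in range(k)]
    # Keep the old centroid if a cluster received no points.
    return [
        [total_sums[c][d] / total_counts[c] for d in range(dim)] if total_counts[c] else list(centroids[c])
        for c in range(k)
    ]
```

Each worker exchanges only k sums and counts, and the result does not depend on how the points are split across workers.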
Parallel real-time visualization system for large-scale simulation. Application to WSPEEDI

International Nuclear Information System (INIS)

Muramatsu, Kazuhiro; Otani, Takayuki; Kitabata, Hideyuki; Matsumoto, Hideki; Takei, Toshifumi; Doi, Shun

2000-01-01

The real-time visualization system PATRAS (PArallel TRAcking Steering system) has been developed on parallel computing servers. The system performs almost all of the visualization tasks on a parallel computing server and uses image data compression for efficient communication between the server and the client terminal. As a result, the system achieves high-performance concurrent visualization in an internet computing environment. The experience of applying PATRAS to WSPEEDI (Worldwide version of System for Prediction of Environmental Emergency Dose Information) is reported. The application of PATRAS to WSPEEDI enables users to understand the behaviour of radioactive tracers from different release points easily and quickly. (author)
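The server-side compression step can be sketched as below. The abstract does not specify PATRAS's compression method. This sketch therefore assumes lossless zlib compression of an 8-bit grayscale frame, and the function names are hypothetical.

```python
import zlib

def encode_frame(pixels, width, height):
    """Server side: pack an 8-bit grayscale frame and compress it for transfer."""
    raw = bytes(pixels)
    if len(raw) != width * height:
        raise ValueError("pixel count does not match frame size")
    return zlib.compress(raw, 6)

def decode_frame(payload):
    """Client side: recover the raw pixel bytes for display."""
    return zlib.decompress(payload)
```

Rendered frames with large uniform regions compress well, so the client downloads far less than the raw frame size.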
Parallel multiple instance learning for extremely large histopathology image analysis.

Science.gov (United States)

Xu, Yan; Li, Yeshu; Shen, Zhengyang; Wu, Ziwei; Gao, Teng; Fan, Yubo; Lai, Maode; Chang, Eric I-Chao

2017-08-03

Histopathology images are critical for medical diagnosis and treatment, e.g., of cancer. A standard histopathology slice can easily be scanned at a high resolution of, say, 200,000×200,000 pixels. Such high-resolution images can make most existing image processing tools infeasible or less effective when run on a single machine with limited memory, disk space and computing power. In this paper, we propose an algorithm that tackles this newly emerging "big data" problem using parallel computing on High-Performance-Computing (HPC) clusters. Experimental results on a large-scale data set (1318 images at a scale of 10 billion pixels each) demonstrate the efficiency and effectiveness of the proposed algorithm for low-latency real-time applications. The proposed framework provides an effective and efficient system for extremely large histopathology image analysis. It is based on the multiple instance learning formulation for weakly supervised learning for image classification, segmentation and clustering. When a max-margin concept is adopted for different clusters, we obtain further improvement in clustering performance.
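Under the standard multiple instance learning assumption, a slide (the bag) is positive if any of its tiles (the instances) is positive. The slide score is therefore the maximum tile score. The sketch below assumes a hypothetical `score_tile` classifier and non-overlapping square tiles. It does not reproduce the authors' max-margin clustering. Each tile is scored independently, which is what makes the problem easy to spread across an HPC cluster.

```python
def tile_coords(width, height, tile):
    """Top-left corners of non-overlapping tiles covering a width x height image."""
    return [(x, y) for y in range(0, height, tile) for x in range(0, width, tile)]

def classify_slide(width, height, tile, score_tile, threshold=0.5):
    """Max-pooling MIL: the slide label follows its highest-scoring tile.

    score_tile(x, y) is a hypothetical per-tile classifier returning a
    probability. Tiles are independent, so they could be scored in parallel.
    Returns (is_positive, coordinates of the highest-scoring tile).
    """
    scores = {xy: score_tile(*xy) for xy in tile_coords(width, height, tile)}
    best = max(scores, key=scores.get)
    return scores[best] >= threshold, best
```

At the slide size quoted above, 1,000-pixel tiles already give 40,000 independent instances per slide.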
Parallel Scaling Characteristics of Selected NERSC User Project Codes

Energy Technology Data Exchange (ETDEWEB)

Skinner, David; Verdier, Francesca; Anand, Harsh; Carter, Jonathan; Durst, Mark; Gerber, Richard

2005-03-05

This report documents the parallel scaling characteristics of NERSC user project codes between Fiscal Year 2003 and the first half of Fiscal Year 2004 (Oct 2002-March 2004). The codes analyzed cover 60% of all the CPU hours delivered during that time frame on Seaborg, a 6,080-CPU IBM SP and the largest parallel computer at NERSC. The scale of the workload in terms of concurrency and problem size is analyzed. Drawing on batch queue logs, performance data, and feedback from researchers, we detail the motivations, benefits, and challenges of implementing highly parallel scientific codes on current NERSC High Performance Computing systems. An evaluation and outlook of the NERSC workload for Allocation Year 2005 is presented.
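Concurrency analyses of this kind usually weight each job by the CPU hours it consumed. This minimal sketch assumes batch-log records have already been reduced to hypothetical `(num_cpus, wall_hours)` pairs, with concurrency binned by powers of two.

```python
from collections import defaultdict

def cpu_hours_by_concurrency(jobs):
    """Fraction of delivered CPU hours per power-of-two concurrency bin.

    jobs: iterable of (num_cpus, wall_hours) with num_cpus >= 1.
    A job using n CPUs falls in bin 2**floor(log2(n)).
    """
    totals = defaultdict(float)
    for cpus, hours in jobs:
        bin_start = 1 << (cpus.bit_length() - 1)
        totals[bin_start] += cpus * hours
    grand = sum(totals.values())
    return {b: totals[b] / grand for b in sorted(totals)}
```

Weighting by CPU hours instead of counting jobs keeps many small jobs from hiding where the machine's time actually goes.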
Multi Scale Finite Element Analyses By Using SEM-EBSD Crystallographic Modeling and Parallel Computing

International Nuclear Information System (INIS)

Nakamachi, Eiji

2005-01-01

A crystallographic homogenization procedure is introduced into the conventional static-explicit and dynamic-explicit finite element formulations to develop a multiscale (double-scale) analysis code that predicts the plastic-strain-induced texture evolution, yield loci, and formability of sheet metal. The double-scale structure consists of a crystal aggregation (the microstructure) and a macroscopic elastic-plastic continuum. First, we measure crystal morphologies using an SEM-EBSD apparatus and define a unit cell of the microstructure that satisfies the periodicity condition at the real scale of the polycrystal. Next, this crystallographic homogenization FE code is applied to 3N pure-iron and 'Benchmark' aluminum A6022 polycrystal sheets. The results reveal that the initial crystal orientation distribution (the texture) strongly affects the plastic-strain-induced texture, the anisotropic hardening evolution, and sheet deformation. Since the multiscale finite element analysis requires a large computation time, a parallel computing technique using a PC cluster was developed for quick calculation. In this parallelization scheme, a dynamic workload balancing technique is introduced for quick and efficient calculations.
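A toy schedule illustrates the benefit of dynamic workload balancing. The sketch assumes each task has a known cost, for example a hypothetical block of integration points whose crystal-plasticity cost varies with deformation. It compares a fixed contiguous partition with a shared queue from which idle workers pull the next task. This is a generic greedy scheme, not the paper's exact technique.

```python
import heapq

def static_makespan(costs, workers):
    """Finish time when tasks are split into contiguous blocks fixed in advance."""
    size = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[i:i + size]) for i in range(0, len(costs), size))

def dynamic_makespan(costs, workers):
    """Finish time when each idle worker takes the next task from a shared queue."""
    finish = [0.0] * workers  # min-heap of worker finish times
    heapq.heapify(finish)
    for c in costs:
        t = heapq.heappop(finish)  # the earliest-idle worker takes the task
        heapq.heappush(finish, t + c)
    return max(finish)
```

With costs [8, 8, 1, 1, 1, 1, 1, 1] on two workers, the static split finishes at 18. The dynamic queue finishes at 11, the best possible for a total of 22. Both functions expect a non-empty task list.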
Large Scale GW Calculations on the Cori System

Science.gov (United States)

Deslippe, Jack; Del Ben, Mauro; da Jornada, Felipe; Canning, Andrew; Louie, Steven

The NERSC Cori system, powered by 9000+ Intel Xeon Phi processors, represents one of the largest HPC systems for open science in the United States and the world. We discuss the optimization of the GW methodology for this system, including both node-level and system-scale optimizations. We highlight multiple large-scale (thousands of atoms) case studies and discuss both absolute application performance and comparison to calculations on more traditional HPC architectures. We find that the GW method is particularly well suited for many-core architectures due to the ability to exploit a large amount of parallelism across many layers of the system. This work was supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division, as part of the Computational Materials Sciences Program.
Large scale electrolysers

International Nuclear Information System (INIS)

B Bello; M Junker

2006-01-01

Hydrogen production by water electrolysis represents nearly 4% of world hydrogen production. Future development of hydrogen vehicles will require large quantities of hydrogen, so large-scale hydrogen production plants will need to be installed. In this context, developing low-cost, large-scale electrolysers that could use 'clean power' seems necessary. ALPHEA HYDROGEN, a European network and center of expertise on hydrogen and fuel cells, performed a study for its members in 2005 to evaluate the potential of large-scale electrolysers to produce hydrogen in the future. The different electrolysis technologies were compared. Then, a state-of-the-art review of currently available electrolysis modules was made. A review of the large-scale electrolysis plants that have been installed worldwide was also carried out, and the main projects related to large-scale electrolysis were listed. The economics of large-scale electrolysers were discussed, and the influence of energy prices on the cost of hydrogen produced by large-scale electrolysis was evaluated. (authors)
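The influence of energy prices on production cost can be illustrated with a simple linear model. Cost per kilogram equals electricity price times specific energy consumption, plus a non-energy share for capital and O&M. The function names and inputs below are hypothetical placeholders, not figures from the ALPHEA study.

```python
def h2_cost_per_kg(electricity_price, specific_energy, fixed_cost_per_kg):
    """Hydrogen cost per kg = energy cost + non-energy cost.

    electricity_price: currency per kWh of electricity
    specific_energy: kWh of electricity consumed per kg of hydrogen
    fixed_cost_per_kg: annualized capital plus O&M, spread per kg produced
    """
    return electricity_price * specific_energy + fixed_cost_per_kg

def energy_share(electricity_price, specific_energy, fixed_cost_per_kg):
    """Fraction of the hydrogen cost due to electricity."""
    total = h2_cost_per_kg(electricity_price, specific_energy, fixed_cost_per_kg)
    return electricity_price * specific_energy / total
```

In this model the cost rises linearly with the electricity price, with a slope equal to the specific energy consumption. This is why energy prices weigh so heavily on electrolytic hydrogen.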
Parallel computing works!

CERN Document Server

Fox, Geoffrey C; Messina, Guiseppe C

2014-01-01

A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop
A parallel orbital-updating based plane-wave basis method for electronic structure calculations

International Nuclear Information System (INIS)

Pan, Yan; Dai, Xiaoying; Gironcoli, Stefano de; Gong, Xin-Gao; Rignanese, Gian-Marco; Zhou, Aihui

2017-01-01

Highlights:
• Propose three parallel orbital-updating based plane-wave basis methods for electronic structure calculations.
• These new methods avoid generating large-scale eigenvalue problems and thus reduce the computational cost.
• These new methods allow for two-level parallelization, which is particularly interesting for large-scale parallelization.
• Numerical experiments show that these new methods are reliable and efficient for large-scale calculations on modern supercomputers.
Abstract: Motivated by the recently proposed parallel orbital-updating approach in the real-space method, we propose a parallel orbital-updating based plane-wave basis method for electronic structure calculations, for solving the corresponding eigenvalue problems. In addition, we propose two new modified parallel orbital-updating methods. Compared to traditional plane-wave methods, our methods allow for two-level parallelization, which is particularly interesting for large-scale parallelization. Numerical experiments show that these new methods are more reliable and efficient for large-scale calculations on modern supercomputers.
Large-scale hydrogen production using nuclear reactors

Energy Technology Data Exchange (ETDEWEB)

Ryland, D.; Stolberg, L.; Kettner, A.; Gnanapragasam, N.; Suppiah, S. [Atomic Energy of Canada Limited, Chalk River, ON (Canada)

2014-07-01

For many years, Atomic Energy of Canada Limited (AECL) has been studying the feasibility of using nuclear reactors, such as the Supercritical Water-cooled Reactor, as an energy source for large scale hydrogen production processes such as High Temperature Steam Electrolysis and the Copper-Chlorine thermochemical cycle. Recent progress includes the augmentation of AECL's experimental capabilities by the construction of experimental systems to test high temperature steam electrolysis button cells at ambient pressure and temperatures up to 850 °C and CuCl/HCl electrolysis cells at pressures up to 7 bar and temperatures up to 100 °C. In parallel, detailed models of solid oxide electrolysis cells and the CuCl/HCl electrolysis cell are being refined and validated using experimental data. Process models are also under development to assess options for economic integration of these hydrogen production processes with nuclear reactors. Options for large-scale energy storage, including hydrogen storage, are also under study. (author)
PALNS - A software framework for parallel large neighborhood search

DEFF Research Database (Denmark)

Røpke, Stefan

2009-01-01

This paper proposes a simple, parallel, portable software framework for the metaheuristic named large neighborhood search (LNS). The aim is to provide a framework where the user only has to set up a few data structures and implement a few functions, and the framework then provides a metaheuristic where ... parallelization "comes for free". We apply the parallel LNS heuristic to two different problems: the traveling salesman problem with pickup and delivery (TSPPD) and the capacitated vehicle routing problem (CVRP)....
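The destroy-and-repair loop that LNS is built on can be sketched on a plain TSP. This sketch assumes random removal, greedy cheapest-insertion repair, and acceptance of improvements only. Many LNS variants use simulated-annealing acceptance instead. It runs sequentially and is not the PALNS API. Independent searches with different seeds could run concurrently, but PALNS's own scheme for sharing work between threads is not reproduced here.

```python
import math
import random

def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[i - 1]]) for i in range(len(tour)))

def repair(partial, removed, pts):
    """Reinsert each removed city at the position that lengthens the tour least."""
    tour = list(partial)
    for c in removed:
        best_pos, best_delta = 0, float("inf")
        for i in range(len(tour)):
            a, b = tour[i - 1], tour[i]  # inserting at i places c between a and b
            delta = math.dist(pts[a], pts[c]) + math.dist(pts[c], pts[b]) - math.dist(pts[a], pts[b])
            if delta < best_delta:
                best_pos, best_delta = i, delta
        tour.insert(best_pos, c)
    return tour

def lns(pts, iterations=200, destroy=3, seed=0):
    """Basic LNS: destroy `destroy` random cities, repair, keep improvements."""
    rng = random.Random(seed)
    best = list(range(len(pts)))
    best_len = tour_length(best, pts)
    for _ in range(iterations):
        removed = rng.sample(best, destroy)
        partial = [c for c in best if c not in removed]
        cand = repair(partial, removed, pts)
        cand_len = tour_length(cand, pts)
        if cand_len < best_len:
            best, best_len = cand, cand_len
    return best, best_len
```

The user-supplied pieces here are the destroy and repair functions. That mirrors the framework idea above, where the user implements a few problem-specific functions and the search loop is provided.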
Large scale simulations of lattice QCD thermodynamics on Columbia Parallel Supercomputers

International Nuclear Information System (INIS)

Ohta, Shigemi

1989-01-01

The Columbia Parallel Supercomputer project aims at the construction of a parallel-processing, multi-gigaflop computer optimized for numerical simulations of lattice QCD. The project has three stages: a 16-node, 1/4-GF machine completed in April 1985; a 64-node, 1-GF machine completed in August 1987; and a 256-node, 16-GF machine now under construction. The machines all share a common architecture: a two-dimensional torus formed from a rectangular array of N₁ × N₂ independent and identical processors. A processor is capable of operating in a multi-instruction, multi-data mode, except during periods of synchronous interprocessor communication with its four nearest neighbors. Here, the thermodynamics simulations on the two working machines are reported. (orig./HSI)
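On a two-dimensional torus, each processor communicates only with its four nearest neighbors, with wraparound at the edges. Below is a minimal sketch of that addressing, together with a hypothetical block decomposition of a lattice over the processor array. The lattice is shown in 2-D for brevity; QCD lattices are four-dimensional.

```python
def torus_neighbors(i, j, n1, n2):
    """Four nearest neighbors of processor (i, j) on an n1 x n2 torus."""
    return [((i - 1) % n1, j), ((i + 1) % n1, j), (i, (j - 1) % n2), (i, (j + 1) % n2)]

def owner(x, y, l1, l2, n1, n2):
    """Processor owning site (x, y) when an l1 x l2 lattice is split into equal blocks.

    Assumes l1 is divisible by n1 and l2 by n2.
    """
    return (x // (l1 // n1), y // (l2 // n2))
```

Because neighboring lattice sites live on the same or adjacent processors, updates only need nearest-neighbor communication, which is what the torus wiring provides.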
Using Agent Base Models to Optimize Large Scale Network for Large System Inventories

Science.gov (United States)

Shameldin, Ramez Ahmed; Bowling, Shannon R.

2010-01-01

The aim of this paper is to use agent-based models (ABM) to optimize large-scale network handling capabilities for large system inventories and to implement strategies for reducing capital expenses. The models used in this paper use either computational algorithms or procedures implemented in MATLAB to simulate agent-based models, combining a principal programming language with mathematical theory, and run on clusters that serve as high-performance computing resources for executing the program in parallel. In both cases, a model is defined as a compilation of a set of structures and processes assumed to underlie the behavior of a network system.
Enabling parallel simulation of large-scale HPC network systems

International Nuclear Information System (INIS)

Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert B.; Carns, Philip

2016-01-01

With the increasing complexity of today’s high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems, in particular their networks. To support effective design decisions, simulations of these systems must (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks used in today’s IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at flit-level detail, using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-event simulation. Our simulation framework meets all the requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems. Third, our models give network designers a choice in simulating a broad range of network workloads, including HPC application workloads using detailed network traces, an ability that is rarely offered in parallel with high-fidelity network simulations.
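The dragonfly topology is usually sized with three parameters: a routers per group, p terminals per router, and h global links per router. This sketch assumes the canonical arrangement, with one global link between every pair of groups, which gives a·h + 1 groups. The helper is a hypothetical illustration and not part of CODES.

```python
def dragonfly_size(a, p, h):
    """Maximum size of a canonical dragonfly (one global link per pair of groups).

    a: routers per group, p: terminals per router, h: global links per router.
    """
    groups = a * h + 1
    return {"groups": groups, "routers": a * groups, "terminals": a * p * groups}
```

For example, a = 4, p = 2, h = 2 gives 9 groups, 36 routers and 72 terminals.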
Vacuum Large Current Parallel Transfer Numerical Analysis

Directory of Open Access Journals (Sweden)

Enyuan Dong

2014-01-01

Stable operation and reliable breaking of large generator currents are a difficult problem in power systems. The problem can be solved by using parallel interrupters and a proper timing sequence with phase-control technology, in which the breaker control strategy is determined by the timing of both the first-opening and the second-opening phases. A precise model of the transfer current can provide the proper timing sequence for breaking with the generator circuit breaker. By analyzing transfer-current experiments and data, this paper obtains the actual vacuum arc resistance and a precise corrected model of the large-current transfer process. The transfer time calculated with the corrected transfer-current model is very close to the measured transfer time. The model can guide the planning of proper timing sequences for breaking with vacuum generator circuit breakers that use parallel interrupters.
Performance Characteristics of Hybrid MPI/OpenMP Implementations of NAS Parallel Benchmarks SP and BT on Large-Scale Multicore Clusters

KAUST Repository

Wu, X.; Taylor, V.

2011-01-01

The NAS Parallel Benchmarks (NPB) are well-known applications with fixed algorithms for evaluating parallel systems and tools. Multicore clusters provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used for data sharing among the cores that comprise a node, and MPI can be used for communication between nodes. In this paper, we use the Scalar Pentadiagonal (SP) and Block Tridiagonal (BT) benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implementing hybrid MPI/OpenMP versions of SP and BT. In particular, we compare the performance of the hybrid SP and BT with their MPI counterparts on two large-scale multicore clusters: Intrepid (BlueGene/P) at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76%, and the hybrid BT outperforms the MPI BT by up to 8.58%, on up to 10,000 cores on Intrepid and Jaguar. We also use performance tools and MPI trace libraries available on these clusters to further investigate the performance characteristics of the hybrid SP and BT. © 2011 The Author. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved.
Development of design technology on thermal-hydraulic performance in tight-lattice rod bundle. 4. Large paralleled simulation by the advanced two-fluid model code

International Nuclear Information System (INIS)

Misawa, Takeharu; Yoshida, Hiroyuki; Akimoto, Hajime

2008-01-01

At the Japan Atomic Energy Agency (JAEA), the Innovative Water Reactor for Flexible Fuel Cycle (FLWR) has been developed. The thermal design of the FLWR requires an analytical method for predicting boiling transition. JAEA has been developing ACE-3D, a three-dimensional two-fluid model analysis code that adopts a boundary-fitted coordinate system to simulate flow in complex channel shapes. In this paper, as part of extending ACE-3D to rod bundle analysis, the introduction of parallelization into ACE-3D and assessments of the code are presented. In the analysis of a large-scale domain such as a rod bundle, even the two-fluid model incurs a large computational cost, and the required memory exceeds what a single CPU provides. Parallelization was therefore introduced into ACE-3D to divide the data for a large-scale domain among many CPUs, and it was confirmed that a large-scale domain such as a rod bundle can be analyzed by parallel computation while maintaining parallel performance even with a large number of CPUs. ACE-3D adopts two-phase flow models, some of which depend on channel geometry. Therefore, analyses of domains simulating an individual subchannel and a 37-rod bundle were performed and compared with experiments. The results of both analyses using ACE-3D agree qualitatively with past experimental results. (author)
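Dividing a large domain among CPUs can be sketched in one dimension: each rank owns a contiguous block of cells plus one ghost cell on each side, which an MPI code would fill through a halo exchange. This is a minimal sketch under assumptions: the 3-point smoothing stencil is a hypothetical stand-in for a real two-fluid solver, and ACE-3D's boundary-fitted 3D coordinates are not modeled. Comparing against the serial result checks that the decomposition does not change the answer.

```python
def split_domain(n_cells, n_ranks):
    """Give each rank a contiguous, nearly equal block of cells as (start, stop)."""
    base, extra = divmod(n_cells, n_ranks)
    ranges, start = [], 0
    for rank in range(n_ranks):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def smooth_serial(u):
    """3-point average on interior cells; the two end cells stay fixed."""
    return [u[0]] + [(u[i - 1] + u[i] + u[i + 1]) / 3 for i in range(1, len(u) - 1)] + [u[-1]]


def smooth_decomposed(u, n_ranks):
    """Same stencil, but each rank only sees its own block plus one ghost cell per side."""
    n, out = len(u), []
    for lo, hi in split_domain(n, n_ranks):
        g_lo, g_hi = max(lo - 1, 0), min(hi + 1, n)
        local = u[g_lo:g_hi]  # what the rank holds after a (simulated) halo exchange
        for i in range(lo, hi):
            if i == 0 or i == n - 1:
                out.append(u[i])
            else:
                k = i - g_lo
                out.append((local[k - 1] + local[k] + local[k + 1]) / 3)
    return out


print(split_domain(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```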

Parallel Implementation of the Multi-Dimensional Spectral Code SPECT3D on large 3D grids.

Science.gov (United States)

Golovkin, Igor E.; Macfarlane, Joseph J.; Woodruff, Pamela R.; Pereyra, Nicolas A.

2006-10-01

The multi-dimensional collisional-radiative, spectral analysis code SPECT3D can be used to study radiation from complex plasmas. SPECT3D can generate instantaneous and time-gated images and spectra, space-resolved and streaked spectra, which makes it a valuable tool for post-processing hydrodynamics calculations and direct comparison between simulations and experimental data. On large three-dimensional grids, transporting radiation along lines of sight (LOS) requires substantial memory and CPU resources. Currently, the parallel option in SPECT3D is based on parallelization over photon frequencies and allows for a nearly linear speed-up for a variety of problems. In addition, we are introducing a new parallel mechanism that will greatly reduce memory requirements. In the new implementation, spatial domain decomposition will be utilized, allowing transport along a LOS to be performed only on the mesh cells the LOS crosses. The ability to operate on a fraction of the grid is crucial for post-processing the results of large-scale three-dimensional hydrodynamics simulations. We will present a parallel implementation of the code and provide a scalability study performed on a Linux cluster.
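Restricting transport to "the mesh cells the LOS crosses" first requires finding those cells. One standard technique on a uniform grid is an incremental traversal in the style of Amanatides and Woo. The 2D sketch below assumes that technique and a uniform square grid; the abstract does not say how SPECT3D identifies the cells, and `cells_crossed` is a hypothetical name.

```python
import math


def cells_crossed(origin, direction, nx, ny, cell=1.0):
    """Return the (i, j) indices of the cells a 2D ray visits, in order.

    Assumes the origin lies inside an nx-by-ny grid of square cells of side `cell`
    and that the direction vector is nonzero.
    """
    x, y = origin
    dx, dy = direction
    i, j = int(math.floor(x / cell)), int(math.floor(y / cell))
    step_i = 1 if dx > 0 else -1
    step_j = 1 if dy > 0 else -1
    # Ray parameter t at the first vertical/horizontal boundary crossing,
    # and the increment in t between successive crossings.
    if dx != 0:
        boundary_x = (i + (1 if dx > 0 else 0)) * cell
        t_max_x, t_delta_x = (boundary_x - x) / dx, cell / abs(dx)
    else:
        t_max_x, t_delta_x = math.inf, math.inf
    if dy != 0:
        boundary_y = (j + (1 if dy > 0 else 0)) * cell
        t_max_y, t_delta_y = (boundary_y - y) / dy, cell / abs(dy)
    else:
        t_max_y, t_delta_y = math.inf, math.inf
    visited = []
    while 0 <= i < nx and 0 <= j < ny:
        visited.append((i, j))
        if t_max_x < t_max_y:  # next crossing is a vertical boundary
            i += step_i
            t_max_x += t_delta_x
        else:  # next crossing is a horizontal boundary
            j += step_j
            t_max_y += t_delta_y
    return visited


print(cells_crossed((0.5, 0.2), (1.0, 1.0), 3, 3))
```

A diagonal ray through a 3×3 grid touches only 5 of the 9 cells, which is the kind of saving that motivates operating on a fraction of the grid.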
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

Science.gov (United States)

Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

2016-07-19

Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for efficient parallel implementations on modern accelerators. This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm, and to the first stage of progressive multiple sequence alignment based on the ClustalW heuristic, on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture, i.e., cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we reorganize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes, for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive with optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .
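For reference, the Smith-Waterman local-alignment recurrence that such database scanners parallelize can be sketched serially in a few lines. This sketch assumes a linear gap penalty and simple match/mismatch scores chosen for illustration; the paper's actual scoring scheme (e.g., substitution matrices, affine gaps) is not stated in the abstract. It also returns the number of DP cells, since GCUPS (giga cell updates per second) is that count divided by runtime in seconds and by 10⁹.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of DP cells computed).

    Linear gap penalty; only two rows of the DP matrix are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are clamped at 0 so an alignment can restart anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best, len(a) * len(b)


print(smith_waterman("ACGT", "TACGTA"))  # (8, 24): "ACGT" matches exactly inside "TACGTA"
```

The inner loop has a dependency along rows, columns, and anti-diagonals, which is why vectorized implementations typically reorganize the computation (for example across query profiles or anti-diagonals) rather than vectorizing this loop directly.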
THE EFFECT OF INTERMITTENT GYRO-SCALE SLAB TURBULENCE ON PARALLEL AND PERPENDICULAR COSMIC-RAY TRANSPORT

International Nuclear Information System (INIS)

Le Roux, J. A.

2011-01-01

Earlier work based on nonlinear guiding center (NLGC) theory suggested that perpendicular cosmic-ray transport is diffusive when cosmic rays encounter random three-dimensional magnetohydrodynamic turbulence dominated by uniform two-dimensional (2D) turbulence with a minor uniform slab turbulence component. In this approach, large-scale perpendicular cosmic-ray transport is due to cosmic rays microscopically diffusing along the meandering magnetic field dominated by 2D turbulence because of gyroresonant interactions with slab turbulence. However, turbulence in the solar wind is intermittent, and it has been suggested that intermittent turbulence might be responsible for the observation of 'dropout' events in solar energetic particle fluxes on small scales. In a previous paper, le Roux et al. suggested, using NLGC theory as a basis, that if gyro-scale slab turbulence is intermittent, large-scale perpendicular cosmic-ray transport in weak uniform 2D turbulence will be superdiffusive or subdiffusive depending on the statistical characteristics of the intermittent slab turbulence. In this paper, we further expand and refine our previous work by investigating how both parallel and perpendicular transport are affected by intermittent slab turbulence for weak as well as strong uniform 2D turbulence. The main new finding is that both parallel and perpendicular transport are the net effect of an interplay between diffusive and nondiffusive (superdiffusive or subdiffusive) transport effects as a consequence of this intermittency.
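Diffusive, superdiffusive, and subdiffusive transport are conventionally distinguished by how the mean squared displacement grows with time, ⟨(Δx)²⟩ ∝ t^α, with α = 1 for normal diffusion, α > 1 for superdiffusion, and α < 1 for subdiffusion. The sketch below estimates α as the slope of log MSD against log t; the data are synthetic and the tolerance used to call a result "diffusive" is an arbitrary assumption, not a value from the paper.

```python
import math


def msd_exponent(times, msd):
    """Least-squares slope of log(MSD) versus log(t), i.e. the exponent alpha."""
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in msd]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def classify_transport(alpha, tol=0.05):
    """Label transport from alpha; `tol` is an illustrative threshold."""
    if abs(alpha - 1.0) <= tol:
        return "diffusive"
    return "superdiffusive" if alpha > 1.0 else "subdiffusive"


times = [1.0, 2.0, 4.0, 8.0, 16.0]
synthetic_msd = [0.3 * t ** 1.5 for t in times]  # made-up data with alpha = 1.5
alpha = msd_exponent(times, synthetic_msd)
print(round(alpha, 3), classify_transport(alpha))  # 1.5 superdiffusive
```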
THE EFFECT OF INTERMITTENT GYRO-SCALE SLAB TURBULENCE ON PARALLEL AND PERPENDICULAR COSMIC-RAY TRANSPORT

Energy Technology Data Exchange (ETDEWEB)

Le Roux, J. A. [Department of Physics, University of Alabama in Huntsville, Huntsville, AL 35899 (United States)]

2011-12-10

Earlier work based on nonlinear guiding center (NLGC) theory suggested that perpendicular cosmic-ray transport is diffusive when cosmic rays encounter random three-dimensional magnetohydrodynamic turbulence dominated by uniform two-dimensional (2D) turbulence with a minor uniform slab turbulence component. In this approach, large-scale perpendicular cosmic-ray transport is due to cosmic rays microscopically diffusing along the meandering magnetic field dominated by 2D turbulence because of gyroresonant interactions with slab turbulence. However, turbulence in the solar wind is intermittent, and it has been suggested that intermittent turbulence might be responsible for the observation of 'dropout' events in solar energetic particle fluxes on small scales. In a previous paper, le Roux et al. suggested, using NLGC theory as a basis, that if gyro-scale slab turbulence is intermittent, large-scale perpendicular cosmic-ray transport in weak uniform 2D turbulence will be superdiffusive or subdiffusive depending on the statistical characteristics of the intermittent slab turbulence. In this paper, we further expand and refine our previous work by investigating how both parallel and perpendicular transport are affected by intermittent slab turbulence for weak as well as strong uniform 2D turbulence. The main new finding is that both parallel and perpendicular transport are the net effect of an interplay between diffusive and nondiffusive (superdiffusive or subdiffusive) transport effects as a consequence of this intermittency.
A parallel form of the Gudjonsson Suggestibility Scale.

Science.gov (United States)

Gudjonsson, G H

1987-09-01

The purpose of this study is twofold: (1) to present a parallel form of the Gudjonsson Suggestibility Scale (GSS, Form 1); (2) to study test-retest reliabilities of interrogative suggestibility. Three groups of subjects were administered the two suggestibility scales in a counterbalanced order. Group 1 (28 normal subjects) and Group 2 (32 'forensic' patients) completed both scales within the same testing session, whereas Group 3 (30 'forensic' patients) completed the two scales between one week and eight months apart. All the correlations were highly significant, giving support for high 'temporal consistency' of interrogative suggestibility.
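Test-retest (or parallel-form) reliability is typically reported as a correlation between the two administrations. The abstract does not say which coefficient was used, so the sketch below assumes Pearson's product-moment r; the five pairs of scores are invented for illustration and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five subjects on Form 1 and on the parallel form
form1 = [12, 7, 15, 9, 4]
form2 = [11, 8, 14, 10, 5]
print(round(pearson_r(form1, form2), 3))
```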
Constructing sites on a large scale

DEFF Research Database (Denmark)

Braae, Ellen Marie; Tietjen, Anne

2011-01-01

Since the 1990s, the regional scale has regained importance in urban and landscape design. In parallel, the focus in design tasks has shifted from master plans for urban extension to strategic urban transformation projects. A prominent example of a contemporary spatial development approach...... for setting the design brief in a large scale urban landscape in Norway, the Jaeren region around the city of Stavanger. In this paper, we first outline the methodological challenges and then present and discuss the proposed method based on our teaching experiences. On this basis, we discuss aspects...... is the IBA Emscher Park in the Ruhr area in Germany. Over a 10-year period (1988-1998), more than 100 local transformation projects contributed to the transformation from an industrial to a post-industrial region. The current paradigm of planning by projects reinforces the role of the design disciplines...
Large scale parallel FEM computations of far/near stress field changes in rocks

Czech Academy of Sciences Publication Activity Database

Blaheta, Radim; Byczanski, Petr; Jakl, Ondřej; Kohut, Roman; Kolcun, Alexej; Krečmer, Karel; Starý, Jiří

2006-01-01

Vol. 22, No. 4 (2006), pp. 449-459. ISSN 0167-739X. R&D Projects: GA ČR(CZ) GA105/02/0492; GA AV ČR(CZ) 1ET400300415. Institutional research plan: CEZ:AV0Z30860518. Keywords: large scale finite element analysis. Subject RIV: BA - General Mathematics. Impact factor: 0.722, year: 2006.
High performance parallel I/O

CERN Document Server

Prabhat

2014-01-01

Gain Critical Insight into the Parallel I/O Ecosystem. Parallel I/O is an integral component of modern high performance computing (HPC), especially in storing and processing very large datasets to facilitate scientific discovery. Revealing the state of the art in this field, High Performance Parallel I/O draws on insights from leading practitioners, researchers, software architects, developers, and scientists who shed light on the parallel I/O ecosystem. The first part of the book explains how large-scale HPC facilities scope, configure, and operate systems, with an emphasis on choices of I/O har
Large-scale pool fires

Directory of Open Access Journals (Sweden)

Steinhaus Thomas

2007-01-01

A review of research into the burning behavior of large pool fires and fuel spill fires is presented. The features that distinguish such fires from smaller pool fires are mainly associated with the fire dynamics at low source Froude numbers and the radiative interaction with the fire source. In hydrocarbon fires, higher soot levels at increased diameters result in radiation blockage effects around the perimeter of large fire plumes; this yields lower emissive powers and a drastic reduction in the radiative loss fraction. Although these phenomena bring some simplifying factors, arising from the fact that soot yield can saturate, there are other complications deriving from the intermittency of the behavior, with luminous regions of efficient combustion appearing randomly on the outer surface of the fire according to the turbulent fluctuations in the fire plume. Knowledge of the fluid flow instabilities that lead to the formation of large eddies is also key to understanding the behavior of large-scale fires. Here, modeling tools, including RANS- and LES-based computational fluid dynamics codes, can be effectively exploited to investigate the fluid flow phenomena. The latter are well suited to representing the turbulent motions, but a number of challenges remain in their practical application. Massively parallel computational resources are likely to be necessary to address the complex coupled phenomena at the required level of detail.
A Topology Visualization Early Warning Distribution Algorithm for Large-Scale Network Security Incidents

Directory of Open Access Journals (Sweden)

Hui He

2013-01-01

Research on early warning systems for large-scale network security incidents is of great significance: such systems can improve a network system’s emergency response capabilities, mitigate the damage caused by cyber attacks, and strengthen the system’s counterattack ability. This paper presents a comprehensive early warning system that combines active measurement and anomaly detection, focusing on the system’s key visualization algorithm and technology. Planar visualization of the large-scale network system is realized with a divide-and-conquer approach. First, the topology of the large-scale network is divided into several small-scale networks by the MLkP/CR algorithm. Second, the subgraph planar visualization algorithm is applied to each small-scale network. Finally, the small-scale network topologies are combined into a single topology using an automatic distribution algorithm based on force analysis. Because the algorithm transforms the large-scale network topology planar visualization problem into a series of small-scale network topology planar visualization and distribution problems, it has higher parallelism and can handle the display of ultra-large-scale network topologies.
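The final "distribution based on force analysis" step can be illustrated with a classic force-directed layout: every pair of nodes repels, every edge attracts, and node moves are capped by a cooling "temperature". This is a minimal sketch assuming Fruchterman-Reingold-style forces as a stand-in; the abstract does not specify the paper's force model, and the nodes of the toy graph could represent whole small-scale subnetworks. The function name and all parameters are hypothetical.

```python
import math


def force_layout(nodes, edges, iterations=200, area=1.0):
    """Fruchterman-Reingold-style layout; returns {node: (x, y)}. Assumes a non-empty node list."""
    n = len(nodes)
    k = math.sqrt(area / n)  # ideal edge length
    # Deterministic start: nodes evenly spaced on a circle.
    pos = {v: [0.5 * math.cos(2 * math.pi * i / n), 0.5 * math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    temperature = 0.1  # maximum move per iteration, reduced each round
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):  # repulsion between every pair: k^2 / d
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        for u, v in edges:  # attraction along edges: d^2 / k
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        for v in nodes:  # move each node along its net force, capped by the temperature
            dx, dy = disp[v]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temperature)
                pos[v][0] += dx / d * step
                pos[v][1] += dy / d * step
        temperature *= 0.98
    return {v: (p[0], p[1]) for v, p in pos.items()}


layout = force_layout(["a", "b", "c"], [("a", "b")])
```

In this toy graph, the linked nodes `a` and `b` settle near the ideal distance `k`, while the unlinked node `c` is pushed away. The pairwise repulsion loop is O(n²), which is one reason to lay out small subnetworks separately and then distribute them, as the abstract describes.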
GPU-based large-scale visualization

KAUST Repository

Hadwiger, Markus

2013-11-19

Recent advances in image and volume acquisition as well as computational advances in simulation have led to an explosion in the amount of data that must be visualized and analyzed. Modern techniques combine the parallel processing power of GPUs with out-of-core methods and data streaming to enable the interactive visualization of giga- and terabytes of image and volume data. A major enabler for interactivity is making both the computational and the visualization effort proportional to the amount of data that is actually visible on screen, decoupling it from the full data size. This leads to powerful display-aware multi-resolution techniques that enable the visualization of data of almost arbitrary size. The course consists of two major parts: An introductory part that progresses from fundamentals to modern techniques, and a more advanced part that discusses details of ray-guided volume rendering, novel data structures for display-aware visualization and processing, and the remote visualization of large online data collections. You will learn how to develop efficient GPU data structures and large-scale visualizations, implement out-of-core strategies and concepts such as virtual texturing that have only been employed recently, as well as how to use modern multi-resolution representations. These approaches reduce the GPU memory requirements of extremely large data to a working set size that fits into current GPUs. You will learn how to perform ray-casting of volume data of almost arbitrary size and how to render and process gigapixel images using scalable, display-aware techniques. We will describe custom virtual texturing architectures as well as recent hardware developments in this area. We will also describe client/server systems for distributed visualization, on-demand data processing and streaming, and remote visualization. We will describe implementations using OpenGL as well as CUDA, exploiting parallelism on GPUs combined with additional asynchronous
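The idea of shrinking GPU memory needs to a "working set" can be illustrated with a least-recently-used cache of fixed-size data bricks: only bricks requested for the current view stay resident, and the least recently used brick is evicted when the memory budget is exceeded. This CPU-side sketch is an illustration under assumptions; `BrickCache` and the `load_brick` callback are hypothetical, and real ray-guided renderers manage residency with GPU page tables rather than a Python dictionary.

```python
from collections import OrderedDict


class BrickCache:
    """Fixed-capacity least-recently-used cache of volume bricks (hypothetical)."""

    def __init__(self, capacity, load_brick):
        self.capacity = capacity      # number of bricks that fit in the memory budget
        self.load_brick = load_brick  # callback that fetches a brick out of core
        self.cache = OrderedDict()    # brick_id -> data, least recently used first
        self.misses = 0

    def request(self, brick_id):
        if brick_id in self.cache:
            self.cache.move_to_end(brick_id)  # mark as most recently used
            return self.cache[brick_id]
        self.misses += 1
        data = self.load_brick(brick_id)      # out-of-core fetch on a miss
        self.cache[brick_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used brick
        return data


cache = BrickCache(capacity=2, load_brick=lambda b: f"brick-{b}")
for brick in [1, 2, 1, 3]:  # brick 2 is evicted when 3 arrives, because 1 was reused
    cache.request(brick)
print(list(cache.cache), cache.misses)  # [1, 3] 3
```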
Imprint of non-linear effects on HI intensity mapping on large scales

Energy Technology Data Exchange (ETDEWEB)

Umeh, Obinna, E-mail: umeobinna@gmail.com [Department of Physics and Astronomy, University of the Western Cape, Cape Town 7535 (South Africa)]

2017-06-01

Intensity mapping of the HI brightness temperature provides a unique way of tracing large-scale structures of the Universe up to the largest possible scales. This is achieved by using low-angular-resolution radio telescopes to detect the emission line of cosmic neutral hydrogen in the post-reionization Universe. We use general relativistic perturbation theory techniques to derive, for the first time, the full expression for the HI brightness temperature up to third order in perturbation theory without making any plane-parallel approximation. We use this result and the renormalization prescription for biased tracers to study the impact of nonlinear effects on the power spectrum of the HI brightness temperature in both real and redshift space. We show how mode coupling at nonlinear order, due to nonlinear bias parameters and redshift-space distortion terms, modulates the power spectrum on large scales. This large-scale modulation may be understood as arising from the effective bias parameter and effective shot noise.
Large Scale Parallel DNA Detection by Two-Dimensional Solid-State Multipore Systems.

Science.gov (United States)

Athreya, Nagendra Bala Murali; Sarathy, Aditya; Leburton, Jean-Pierre

2018-04-23

We describe a scalable device design consisting of a dense array of nanopores made from nanoscale semiconductor materials to detect and identify translocations of many biomolecules in a massively parallel detection scheme. We use molecular dynamics coupled to nanoscale device simulations to illustrate the ability of this device setup to uniquely identify parallel DNA translocations. We show that the transverse sheet currents along the membranes are immune to crosstalk effects arising from simultaneous translocations of biomolecules through multiple pores, because they sense only local potential changes. We also show that electronic sensing across the nanopore membrane offers higher detection resolution than the ionic current blocking technique in a multipore setup, irrespective of the irregularities that occur while fabricating the nanopores in a two-dimensional membrane.
Developing a Massively Parallel Forward Projection Radiography Model for Large-Scale Industrial Applications

Energy Technology Data Exchange (ETDEWEB)

Bauerle, Matthew [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)]

2014-08-01

This project uses Graphics Processing Units (GPUs) to compute radiograph simulations for arbitrary objects. The generation of radiographs, also known as the forward projection imaging model, is computationally intensive and not widely used. The goal of this research is to develop a massively parallel algorithm that can compute forward projections for objects with a trillion voxels (3D pixels). To achieve this, the data are divided into blocks that can each fit into GPU memory. The forward-projected image is also divided into segments to allow for future parallelization and to avoid needless computations.
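Why block-wise processing works can be sketched with the simplest forward-projection model: a parallel beam along the z axis, where each detector pixel accumulates the line integral of the attenuation coefficient and the Beer-Lambert law I = I₀·exp(−∫μ ds) converts it to transmitted intensity. The geometry, the `forward_project` name, and its parameters are assumptions for illustration; the abstract does not describe the project's projection geometry. Each z block stands in for one chunk that fits into GPU memory, and its partial sums are simply accumulated.

```python
import math


def forward_project(volume, block_depth, voxel_size=1.0, i0=1.0):
    """Parallel-beam projection along z of volume[z][y][x], processed in z blocks."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    path = [[0.0] * nx for _ in range(ny)]  # accumulated line integrals per detector pixel
    for z0 in range(0, nz, block_depth):
        block = volume[z0:z0 + block_depth]  # one chunk that would be resident on the GPU
        for slab in block:
            for y in range(ny):
                for x in range(nx):
                    path[y][x] += slab[y][x] * voxel_size
    # Beer-Lambert: transmitted intensity for each ray
    return [[i0 * math.exp(-p) for p in row] for row in path]


vol = [[[0.1 * (x + y + z) for x in range(3)] for y in range(2)] for z in range(4)]
print(forward_project(vol, block_depth=1) == forward_project(vol, block_depth=4))  # True
```

Because the blocks are visited in the same z order, the result does not depend on the block size; a GPU implementation would also split the image into segments, as the abstract notes.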
Large-scale parallel uncontracted multireference-averaged quadratic coupled cluster: the ground state of the chromium dimer revisited.

Science.gov (United States)

Müller, Thomas

2009-11-12

The accurate prediction of the potential energy function of the X¹Σg⁺ state of Cr₂ is a remarkable challenge: large differential electron correlation effects, significant scalar relativistic contributions, the need for large flexible basis sets containing g functions, the importance of semicore valence electron correlation, and its multireference nature pose considerable obstacles. So far, the only reasonably successful approaches have been based on multireference perturbation theory (MRPT). Recently, there has been some controversy in the literature about the role of error compensation and systematic defects of various MRPT implementations that cannot be easily overcome. A detailed basis set study of the potential energy function is presented, adopting a variational method. The method of choice for this electron-rich target with up to 28 correlated electrons is fully uncontracted multireference-averaged quadratic coupled cluster (MR-AQCC), which shares the flexibility of the multireference configuration interaction (MRCI) approach and is, in addition, approximately size-extensive (an error of 0.02 eV, compared with the MRCI value of 1.37 eV, for two noninteracting chromium atoms). The best estimate for Dₑ is 1.48 eV, in good agreement with the experimental value of 1.47 ± 0.056 eV. At the estimated CBS limit, the equilibrium bond distance (1.685 Å) and vibrational frequency (459 cm⁻¹) agree with experiment (1.679 Å, 481 cm⁻¹). Large basis sets and reference configuration spaces invariably result in huge wave function expansions (here, up to 2.8 billion configuration state functions), so efficient parallel implementations of the method are crucial. Hence, relevant details of the implementation and general performance of the parallel program code are discussed as well.
Non-parametric co-clustering of large scale sparse bipartite networks on the GPU

DEFF Research Database (Denmark)

Hansen, Toke Jansen; Mørup, Morten; Hansen, Lars Kai

2011-01-01

of row and column clusters from a hypothesis space of an infinite number of clusters. To reach large scale applications of co-clustering we exploit that parameter inference for co-clustering is well suited for parallel computing. We develop a generic GPU framework for efficient inference on large scale...... sparse bipartite networks and achieve a speedup of two orders of magnitude compared to estimation based on conventional CPUs. In terms of scalability we find for networks with more than 100 million links that reliable inference can be achieved in less than an hour on a single GPU. To efficiently manage...
Node-based finite element method for large-scale adaptive fluid analysis in parallel environments

International Nuclear Information System (INIS)

Toshimitsu, Fujisawa; Genki, Yagawa

2003-01-01

In this paper, a mesh-free method based on the finite element method (FEM) with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from mesh generation to the solution of the system of equations, can be performed seamlessly in parallel on a per-node basis. A local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and data structures of the finite element calculation are node-based, and parallel computing is realized by partitioning the system of equations by rows of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-quality points for the nodes of the finite element mesh. Furthermore, the probabilistic node generation technique can itself be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which this kind of analysis requires, can be performed effectively on parallel processors using the proposed method. (authors)
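Dividing the system of equations by rows of the global coefficient matrix means that each process owns a contiguous block of rows and computes its own slice of every matrix-vector product, the core operation of iterative solvers. The sketch below assumes a compressed sparse row (CSR) layout and runs the row blocks sequentially; in a distributed code each part would also need the entries of `x` referenced by its rows. All function names are illustrative, not taken from the paper.

```python
def dense_to_csr(dense):
    """Convert a dense row-major matrix to CSR arrays (indptr, indices, data)."""
    indptr, indices, data = [0], [], []
    for row in dense:
        for j, value in enumerate(row):
            if value != 0:
                indices.append(j)
                data.append(value)
        indptr.append(len(indices))
    return indptr, indices, data


def row_blocks(n_rows, n_parts):
    """Split rows 0..n_rows-1 into n_parts contiguous, nearly equal blocks."""
    base, extra = divmod(n_rows, n_parts)
    blocks, start = [], 0
    for part in range(n_parts):
        stop = start + base + (1 if part < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def local_matvec(csr, x, lo, hi):
    """Rows lo..hi-1 of y = A x: the work one process would own."""
    indptr, indices, data = csr
    return [sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
            for i in range(lo, hi)]


def partitioned_matvec(csr, x, n_parts):
    """Assemble y from independent row blocks (executed sequentially here)."""
    n_rows = len(csr[0]) - 1
    y = []
    for lo, hi in row_blocks(n_rows, n_parts):
        y.extend(local_matvec(csr, x, lo, hi))
    return y


A = [[4, -1, 0, 0], [-1, 4, -1, 0], [0, -1, 4, -1], [0, 0, -1, 4]]
print(partitioned_matvec(dense_to_csr(A), [1, 2, 3, 4], n_parts=3))  # [2, 4, 6, 13]
```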
Node-based finite element method for large-scale adaptive fluid analysis in parallel environments

Energy Technology Data Exchange (ETDEWEB)

Toshimitsu, Fujisawa [Tokyo Univ., Collaborative Research Center of Frontier Simulation Software for Industrial Science, Institute of Industrial Science (Japan)]; Genki, Yagawa [Tokyo Univ., Department of Quantum Engineering and Systems Science (Japan)]

2003-07-01

In this paper, a mesh-free method based on the finite element method (FEM) with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from mesh generation to the solution of the system of equations, can be performed seamlessly in parallel on a per-node basis. A local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and data structures of the finite element calculation are node-based, and parallel computing is realized by partitioning the system of equations by rows of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-quality points for the nodes of the finite element mesh. Furthermore, the probabilistic node generation technique can itself be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which this kind of analysis requires, can be performed effectively on parallel processors using the proposed method. (authors)
Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

Science.gov (United States)

Tam, Wing-Kin; Yang, Zhi

2018-05-01

Large-scale neural recordings provide detailed information on neuronal activity and can help elucidate the underlying neural mechanisms of the brain. However, the computational burden of processing the huge data streams generated by such recordings is formidable. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphics processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing, such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox offers a 5× to 110× speedup over its CPU counterparts, depending on the algorithm. A user-friendly MATLAB interface allows easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing have focused on only a few rudimentary algorithms, are not well optimized, and often do not provide a user-friendly programming interface that fits into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings of up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
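A parallel compact (stream compaction) operation turns a per-sample decision into a dense list of results in three data-parallel phases: flag, exclusive prefix sum, scatter. The sketch runs those phases serially in Python to show the data flow; on a GPU each phase would run across thousands of threads, with a parallel scan in the middle. The peak criterion (a local maximum above a fixed threshold) is an assumption for illustration and is not the EC-PC detector; `detect_peaks` is a hypothetical name.

```python
def detect_peaks(signal, threshold):
    """Return indices of local maxima above `threshold` using flag / scan / scatter."""
    n = len(signal)
    # 1. Map: each sample independently decides whether it is a peak (one GPU thread each).
    flags = [
        1 if 0 < i < n - 1
        and signal[i] > threshold
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
        else 0
        for i in range(n)
    ]
    # 2. Exclusive prefix sum: offsets[i] = number of peaks before sample i.
    offsets, running = [], 0
    for f in flags:
        offsets.append(running)
        running += f
    # 3. Scatter: every flagged sample writes its index into its own output slot.
    peaks = [0] * running
    for i in range(n):
        if flags[i]:
            peaks[offsets[i]] = i
    return peaks


print(detect_peaks([0, 3, 1, 0, 5, 2, 0], threshold=2))  # [1, 4]
```

The scan is what makes the output dense without locks: every thread knows exactly where to write.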
OpenMP parallelization of a gridded SWAT (SWATG)

Science.gov (United States)

Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin

2017-12-01

Large-scale, long-term, high-spatial-resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG), which integrates a grid modeling scheme with different spatial representations, also faces this problem. Long run times limit applications of very-high-resolution, large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application programming interface is integrated with SWATG (the result is called SWATGP) to accelerate grid modeling at the HRU level. This parallel implementation takes better advantage of the computational power of a shared-memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG when modeling a roughly 2000 km² watershed with 1 CPU and a 15-thread configuration. The results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computation of environmental models is beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale, high-resolution water resources research and management, in addition to offering data fusion and model coupling capabilities.
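A reported speedup can be put in context with two standard metrics: parallel efficiency E = S/p and the Karp-Flatt experimentally determined serial fraction e = (1/S − 1/p)/(1 − 1/p), which assumes an Amdahl-style model. Treating the 15 threads as p = 15 workers and taking the best-case speedup S = 9 from the abstract is an assumption of this sketch.

```python
def speedup_metrics(speedup, workers):
    """Return (parallel efficiency, Karp-Flatt serial fraction) for a measured speedup."""
    efficiency = speedup / workers
    serial_fraction = (1 / speedup - 1 / workers) / (1 - 1 / workers)
    return efficiency, serial_fraction


eff, f = speedup_metrics(9, 15)
print(f"efficiency = {eff:.2f}, serial fraction = {f:.3f}")
# efficiency = 0.60, serial fraction = 0.048
```

Under this model, roughly 5% of the work behaves as if it were serial, which by Amdahl's law would cap the speedup near 1/e = 21 regardless of how many threads are added.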

Large-scale micromagnetics simulations with dipolar interaction using all-to-all communications

Directory of Open Access Journals (Sweden)

Hiroshi Tsukahara

2016-05-01

We implement low-complexity parallel fast-Fourier-transform algorithms in our micromagnetics simulator, which reduce the number of all-to-all communications from six to two. Almost all the computation time of a micromagnetics simulation is taken up by the calculation of the magnetostatic field, which can be computed using the fast Fourier transform method. The results show that the simulation time decreases with good scalability, even when the micromagnetics simulation is performed using 8192 physical cores. This high parallelization effect enables large-scale micromagnetics simulation using over one billion to be performed. Because massively parallel computing is needed to simulate the magnetization dynamics of real permanent magnets composed of many micron-sized grains, we expect our simulator to reveal how magnetization dynamics influences the coercivity of permanent magnets.
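All-to-all communication enters a distributed FFT in a standard way. Each process transforms the rows it owns, a global transpose (the all-to-all exchange) follows, and then a second set of transforms runs. The NumPy sketch below simulates this for a 2-D array split across `n_ranks` hypothetical processes. It illustrates only the standard transpose-based scheme and does not reproduce the reduced-communication algorithm described above.

```python
import numpy as np

def fft2_by_transpose(a, n_ranks=4):
    """2-D FFT computed as row FFTs, a global transpose, and row FFTs again."""
    blocks = np.array_split(np.asarray(a, dtype=complex), n_ranks, axis=0)
    # Stage 1: each simulated rank transforms the rows it owns (no communication).
    blocks = [np.fft.fft(b, axis=1) for b in blocks]
    # All-to-all exchange: in a distributed code this transpose is the communication step.
    transposed = np.concatenate(blocks, axis=0).T
    blocks = np.array_split(transposed, n_ranks, axis=0)
    # Stage 2: transform along the other dimension, now stored contiguously as rows.
    blocks = [np.fft.fft(b, axis=1) for b in blocks]
    return np.concatenate(blocks, axis=0).T

a = np.random.default_rng(0).standard_normal((8, 6))
print(np.allclose(fft2_by_transpose(a), np.fft.fft2(a)))  # True
```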
Visual coherence for large-scale line-plot visualizations

KAUST Repository

Muigg, Philipp

2011-06-01

Displaying a large number of lines within a limited amount of screen space is a task that is common to many different classes of visualization techniques such as time-series visualizations, parallel coordinates, link-node diagrams, and phase-space diagrams. This paper addresses the challenging problems of clutter and overdraw inherent to such visualizations. We generate a 2×2 tensor field during line rasterization that encodes the distribution of line orientations through each image pixel. Anisotropic diffusion of a noise texture is then used to generate a dense, coherent visualization of line orientation. In order to represent features of different scales, we employ a multi-resolution representation of the tensor field. The resulting technique can easily be applied to a wide variety of line-based visualizations. We demonstrate this for parallel coordinates, a time-series visualization, and a phase-space diagram. Furthermore, we demonstrate how to integrate a focus+context approach by incorporating a second tensor field. Our approach achieves interactive rendering performance for large data sets containing millions of data items, due to its image-based nature and ease of implementation on GPUs. Simulation results from computational fluid dynamics are used to evaluate the performance and usefulness of the proposed method. © 2011 The Author(s).
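The tensor-field step can be sketched by accumulating the outer product d dᵀ of each line's unit direction d into every pixel the line crosses. The eigenvector of the largest eigenvalue then gives the dominant orientation, and equal eigenvalues signal crossing lines with no preferred direction. This CPU sketch uses naive point sampling instead of GPU rasterization and omits the anisotropic diffusion and multi-resolution steps. Its function names are hypothetical.

```python
import numpy as np

def orientation_tensor_field(lines, width, height, samples=64):
    """Accumulate d d^T (d = unit line direction) into every pixel each line crosses."""
    field = np.zeros((height, width, 2, 2))
    for (x0, y0), (x1, y1) in lines:
        d = np.array([x1 - x0, y1 - y0], dtype=float)
        length = np.linalg.norm(d)
        if length == 0.0:
            continue
        d /= length
        touched = set()
        # Naive point sampling stands in for real rasterization.
        for t in np.linspace(0.0, 1.0, samples):
            px = int(round(x0 + t * (x1 - x0)))
            py = int(round(y0 + t * (y1 - y0)))
            if 0 <= px < width and 0 <= py < height and (px, py) not in touched:
                touched.add((px, py))  # count each line at most once per pixel
                field[py, px] += np.outer(d, d)
    return field

def dominant_orientation(tensor):
    """Unit eigenvector of the largest eigenvalue (its sign is arbitrary)."""
    eigvals, eigvecs = np.linalg.eigh(tensor)
    return eigvecs[:, np.argmax(eigvals)]

# A horizontal and a vertical line crossing at pixel (x=5, y=2).
field = orientation_tensor_field([((0, 2), (9, 2)), ((5, 0), (5, 4))], width=10, height=5)
print(field[2, 5])                         # isotropic at the crossing
print(dominant_orientation(field[2, 3]))   # horizontal direction away from it
```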
Visual coherence for large-scale line-plot visualizations

KAUST Repository

Muigg, Philipp; Hadwiger, Markus; Doleisch, Helmut; Gröller, Eduard M.

2011-01-01

Displaying a large number of lines within a limited amount of screen space is a task that is common to many different classes of visualization techniques such as time-series visualizations, parallel coordinates, link-node diagrams, and phase-space diagrams. This paper addresses the challenging problems of clutter and overdraw inherent to such visualizations. We generate a 2×2 tensor field during line rasterization that encodes the distribution of line orientations through each image pixel. Anisotropic diffusion of a noise texture is then used to generate a dense, coherent visualization of line orientation. In order to represent features of different scales, we employ a multi-resolution representation of the tensor field. The resulting technique can easily be applied to a wide variety of line-based visualizations. We demonstrate this for parallel coordinates, a time-series visualization, and a phase-space diagram. Furthermore, we demonstrate how to integrate a focus+context approach by incorporating a second tensor field. Our approach achieves interactive rendering performance for large data sets containing millions of data items, due to its image-based nature and ease of implementation on GPUs. Simulation results from computational fluid dynamics are used to evaluate the performance and usefulness of the proposed method. © 2011 The Author(s).
Traffic Flow Prediction Model for Large-Scale Road Network Based on Cloud Computing

Directory of Open Access Journals (Sweden)

Zhaosheng Yang

2014-01-01

To increase the efficiency and precision of large-scale road network traffic flow prediction, this paper proposes a genetic algorithm-support vector machine (GA-SVM) model based on cloud computing, building on an analysis of the characteristics and shortcomings of genetic algorithms and support vector machines. In the cloud computing environment, the SVM parameters are first optimized by a parallel genetic algorithm, and this optimized parallel SVM model is then used to predict traffic flow. Using traffic flow data from Haizhu District in Guangzhou City, the proposed model was verified and compared with a serial GA-SVM model and a parallel GA-SVM model based on MPI (Message Passing Interface). The results demonstrate that the cloud-based parallel GA-SVM model has higher prediction accuracy, shorter running time, and higher speedup.
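A parallel genetic algorithm for SVM parameter tuning mainly parallelizes the fitness evaluations, since each candidate (C, γ) can be scored independently. In the sketch below, `validation_error` is a hypothetical analytic stand-in for a cross-validated SVM error, and no SVM is actually trained. Threads stand in for cloud workers. The selection, crossover, and mutation operators are our own simple choices rather than the paper's.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def validation_error(log_c, log_gamma):
    """Hypothetical stand-in for the cross-validated error of an SVM with
    C = 10**log_c and gamma = 10**log_gamma (minimum at C = 10, gamma = 0.01)."""
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

def parallel_ga(pop_size=20, generations=30, workers=4, seed=0):
    rng = random.Random(seed)
    pop = [(rng.uniform(-2, 3), rng.uniform(-5, 1)) for _ in range(pop_size)]
    history = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Fitness evaluations are independent, so they are farmed out in parallel.
            fitness = list(pool.map(lambda ind: validation_error(*ind), pop))
            ranked = [ind for _, ind in sorted(zip(fitness, pop))]
            history.append(min(fitness))
            parents = ranked[: pop_size // 2]      # truncation selection
            children = [ranked[0]]                 # elitism: keep the best unchanged
            while len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                w = rng.random()                   # blend crossover plus Gaussian mutation
                children.append(tuple(w * x + (1 - w) * y + rng.gauss(0.0, 0.3)
                                      for x, y in zip(a, b)))
            pop = children
    return ranked[0], history

best, history = parallel_ga()
print(best, history[-1])
```

Elitism guarantees that the best error never increases from one generation to the next, which makes the search easy to monitor.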
Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis

Energy Technology Data Exchange (ETDEWEB)

Hasenkamp, Daren; Sim, Alexander; Wehner, Michael; Wu, Kesheng

2010-09-30

Extensive computing power has been used to tackle issues such as climate change, fusion energy, and other pressing scientific challenges. These computations produce a tremendous amount of data; however, many of the data analysis programs currently run on only a single processor. In this work, we explore the possibility of using the emerging cloud computing platform to parallelize such sequential data analysis tasks. As a proof of concept, we wrap a program for analyzing trends of tropical cyclones in a set of virtual machines (VMs). This approach allows users to keep their familiar data analysis environment in the VMs, while we provide the coordination and data transfer services to ensure the necessary input and output are directed to the desired locations. This work extensively exercises the networking capability of the cloud computing systems and has revealed a number of weaknesses in the current cloud system software. In our tests, we were able to scale the parallel data analysis job to a modest number of VMs and achieve a speedup comparable to running the same analysis task using MPI. However, compared to MPI-based parallelization, the cloud-based approach has a number of advantages. The cloud-based approach is more flexible because the VMs can capture arbitrary software dependencies without requiring users to rewrite their programs. The cloud-based approach is also more resilient to failure: as long as a single VM is running, the analysis can make progress, whereas the failure of a single MPI node causes the whole analysis job to fail. In short, this initial work demonstrates that a cloud computing system is a viable platform for distributed scientific data analyses traditionally conducted on dedicated supercomputing systems.
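The resilience argument can be illustrated with a task farm. Each independent analysis task (for example, one year of simulation output) is sent to a worker, and any task whose worker fails is simply resubmitted. In this sketch, threads stand in for VMs, and `run_with_retries` is a hypothetical helper, not part of the authors' system.

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_retries(tasks, analyze, workers=4, max_attempts=3):
    """Farm independent analysis tasks out to workers and resubmit failed ones."""
    results, attempts = {}, {t: 0 for t in tasks}
    pending = list(tasks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            futures = {t: pool.submit(analyze, t) for t in pending}
            pending = []
            for task, fut in futures.items():
                attempts[task] += 1
                try:
                    results[task] = fut.result()
                except Exception:
                    # A lost worker costs one task, not the whole job: retry it.
                    if attempts[task] >= max_attempts:
                        raise
                    pending.append(task)
    return results
```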
Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis

International Nuclear Information System (INIS)

Hasenkamp, Daren; Sim, Alexander; Wehner, Michael; Wu, Kesheng

2010-01-01

Extensive computing power has been used to tackle issues such as climate change, fusion energy, and other pressing scientific challenges. These computations produce a tremendous amount of data; however, many of the data analysis programs currently run on only a single processor. In this work, we explore the possibility of using the emerging cloud computing platform to parallelize such sequential data analysis tasks. As a proof of concept, we wrap a program for analyzing trends of tropical cyclones in a set of virtual machines (VMs). This approach allows users to keep their familiar data analysis environment in the VMs, while we provide the coordination and data transfer services to ensure the necessary input and output are directed to the desired locations. This work extensively exercises the networking capability of the cloud computing systems and has revealed a number of weaknesses in the current cloud system software. In our tests, we were able to scale the parallel data analysis job to a modest number of VMs and achieve a speedup comparable to running the same analysis task using MPI. However, compared to MPI-based parallelization, the cloud-based approach has a number of advantages. The cloud-based approach is more flexible because the VMs can capture arbitrary software dependencies without requiring users to rewrite their programs. The cloud-based approach is also more resilient to failure: as long as a single VM is running, the analysis can make progress, whereas the failure of a single MPI node causes the whole analysis job to fail. In short, this initial work demonstrates that a cloud computing system is a viable platform for distributed scientific data analyses traditionally conducted on dedicated supercomputing systems.
Large-scale solar purchasing

International Nuclear Information System (INIS)

1999-01-01

The principal objective of the project was to participate in the definition of a new IEA task concerning solar procurement ("the Task") and to assess whether involvement in the Task would be in the interest of the UK active solar heating industry. The project also aimed to assess the importance of large-scale solar purchasing to UK active solar heating market development and to evaluate the level of interest in large-scale solar purchasing amongst potential large-scale purchasers (in particular housing associations and housing developers). A further aim of the project was to consider means of stimulating large-scale active solar heating purchasing activity within the UK. (author)
Parallel Monte Carlo reactor neutronics

International Nuclear Information System (INIS)

Blomquist, R.N.; Brown, F.B.

1994-01-01

The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and a workstation network with uneven interactive load. Speedups linear in the number of nodes were achieved.
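One common way to make parallel Monte Carlo results reproducible regardless of node count is to tie random-number streams to units of work (here, particle batches) rather than to processors. The sketch below estimates uncollided transmission through a purely absorbing slab, whose exact value is exp(−Σt·T), and uses NumPy's `SeedSequence.spawn` to create independent streams. The slab problem, parameter values, and function names are illustrative assumptions and far simpler than reactor neutronics.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def simulate_batch(seed_seq, n_particles, sigma_t, thickness):
    """Count particles whose first collision lies beyond a purely absorbing slab."""
    rng = np.random.default_rng(seed_seq)
    flight = rng.exponential(1.0 / sigma_t, size=n_particles)  # free-flight distances
    return int(np.count_nonzero(flight > thickness))

def transmission(n_batches=16, n_particles=10_000, sigma_t=1.0, thickness=2.0,
                 workers=4, seed=12345):
    # One child stream per batch, fixed by the batch index rather than by the worker
    # that happens to run it, so the estimate does not depend on the worker count.
    streams = np.random.SeedSequence(seed).spawn(n_batches)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda s: simulate_batch(s, n_particles, sigma_t, thickness), streams)
        total = sum(counts)
    return total / (n_batches * n_particles)

print(transmission(workers=1), transmission(workers=4), np.exp(-2.0))
```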
Massive parallel electromagnetic field simulation program JEMS-FDTD design and implementation on jasmin

International Nuclear Information System (INIS)

Li Hanyu; Zhou Haijing; Dong Zhiwei; Liao Cheng; Chang Lei; Cao Xiaolin; Xiao Li

2010-01-01

A large-scale parallel electromagnetic field simulation program, JEMS-FDTD (J Electromagnetic Solver-Finite Difference Time Domain), is designed and implemented on JASMIN (J parallel Adaptive Structured Mesh applications INfrastructure). This program can simulate the propagation, radiation, and coupling of electromagnetic fields by explicitly solving Maxwell's equations on a structured mesh with the FDTD method. JEMS-FDTD is able to simulate billion-mesh-scale problems on thousands of processors. In this article, the program is verified by simulating the radiation of an electric dipole. A beam waveguide is simulated to demonstrate the capability of large-scale parallel computation. A parallel performance test indicates that high parallel efficiency is obtained. (authors)
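The core of an explicit FDTD solver is the leapfrog Yee update, in which electric and magnetic fields live on staggered grids and are advanced alternately. This 1-D sketch uses normalized units with Courant number S = cΔt/Δx (stable for S ≤ 1 in 1-D), perfectly conducting walls, and an additive Gaussian source. It is only a serial illustration of the kind of update JEMS-FDTD performs on billions of 3-D cells, and the parameters are arbitrary.

```python
import numpy as np

def fdtd_1d(n_cells=200, n_steps=300, courant=0.5, source_cell=100):
    """1-D Yee-scheme FDTD in normalized units with PEC walls (Ez = 0 at both ends)."""
    ez = np.zeros(n_cells)      # electric field at integer grid points
    hy = np.zeros(n_cells - 1)  # magnetic field at half-integer points, offset half a step in time
    for step in range(n_steps):
        hy += courant * (ez[1:] - ez[:-1])         # update H from the spatial difference of E
        ez[1:-1] += courant * (hy[1:] - hy[:-1])   # update interior E from the difference of H
        ez[source_cell] += np.exp(-((step - 30) / 10.0) ** 2)  # additive Gaussian pulse
    return ez

ez = fdtd_1d()
print(np.max(np.abs(ez)))
```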
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows

Science.gov (United States)

Bui, Trong T.

1999-01-01

A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-squares reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
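The key finding above concerns scaling down the Roe dissipation term. Consider a scalar model problem: Burgers' equation, used here as a stand-in for the compressible flow equations. Its Roe flux is the central average minus a dissipation term proportional to the Roe-averaged wave speed. The `dissipation_scale` parameter below is our illustrative way to keep only a fraction of that term; values of 0.03 to 0.05 would correspond to the 3-5 percent quoted above.

```python
def roe_flux_burgers(u_left, u_right, dissipation_scale=1.0):
    """Roe numerical flux for Burgers' equation u_t + (u**2 / 2)_x = 0.

    dissipation_scale = 1 gives the standard Roe (upwind) flux, smaller values keep
    only a fraction of the upwind dissipation, and 0 gives the central average.
    """
    f_left, f_right = 0.5 * u_left ** 2, 0.5 * u_right ** 2
    a_roe = 0.5 * (u_left + u_right)  # Roe-averaged speed: (f_R - f_L) / (u_R - u_L)
    return 0.5 * (f_left + f_right) - 0.5 * dissipation_scale * abs(a_roe) * (u_right - u_left)

for scale in (1.0, 0.04, 0.0):
    print(scale, roe_flux_burgers(2.0, 1.0, scale))
```

With full dissipation and a positive wave speed, the flux reduces to the upwind value f(u_left). Removing most of it moves the scheme toward the non-dissipative central flux.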
The build up of the correlation between halo spin and the large-scale structure

Science.gov (United States)

Wang, Peng; Kang, Xi

2018-01-01

Both simulations and observations have confirmed that the spin of haloes/galaxies is correlated with the large-scale structure (LSS), with a mass dependence: the spin of low-mass haloes/galaxies tends to be parallel to the LSS, while that of massive haloes/galaxies tends to be perpendicular to it. It is still unclear how this mass dependence is built up over time. We use N-body simulations to trace the evolution of the halo spin-LSS correlation and find that at early times the spin of all halo progenitors is parallel to the LSS. As time goes on, mass collapse around massive haloes is more isotropic; in particular, recent mass accretion along the slowest-collapsing direction is significant and turns the halo spin perpendicular to the LSS. Adopting the fractional anisotropy (FA) parameter to describe the degree of anisotropy of the large-scale environment, we find that the spin-LSS correlation is a strong function of the environment: a higher FA (more anisotropic environment) leads to an aligned signal, and a lower anisotropy leads to a misaligned signal. In general, our results show that the spin-LSS correlation is a combined consequence of mass flow and halo growth within the cosmic web. Our predicted environmental dependence between spin and large-scale structure can be further tested using galaxy surveys.
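Two quantities in this analysis can be written compactly. The first is the fractional anisotropy of the eigenvalues λ1, λ2, λ3 of the environment tensor. The second is the alignment |cos θ| between a halo spin and an LSS axis. We assume the commonly used definition FA = (1/√3)·√{[(λ1−λ3)² + (λ2−λ3)² + (λ1−λ2)²] / (λ1² + λ2² + λ3²)}, which lies between 0 and 1. The paper may normalize differently, and the function names are hypothetical.

```python
import numpy as np

def fractional_anisotropy(eigenvalues):
    """FA of a 3-D tensor's eigenvalues: 0 when isotropic, 1 for a traceless tensor."""
    l1, l2, l3 = (float(v) for v in eigenvalues)
    num = (l1 - l3) ** 2 + (l2 - l3) ** 2 + (l1 - l2) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    return np.sqrt(num / den) / np.sqrt(3.0)

def spin_alignment(spin, axis):
    """|cos(theta)| between a spin vector and an LSS axis (1 = parallel, 0 = perpendicular)."""
    s, e = np.asarray(spin, dtype=float), np.asarray(axis, dtype=float)
    return abs(s @ e) / (np.linalg.norm(s) * np.linalg.norm(e))

print(fractional_anisotropy([1.0, 1.0, 1.0]), fractional_anisotropy([1.0, 0.0, -1.0]))
```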
The TeraShake Computational Platform for Large-Scale Earthquake Simulations

Science.gov (United States)

Cui, Yifeng; Olsen, Kim; Chourasia, Amit; Moore, Reagan; Maechling, Philip; Jordan, Thomas

Geoscientific and computer science researchers with the Southern California Earthquake Center (SCEC) are conducting a large-scale, physics-based, computationally demanding earthquake system science research program with the goal of developing predictive models of earthquake processes. The computational demands of this program continue to increase rapidly as these researchers seek to perform physics-based numerical simulations of earthquake processes for larger … To meet the needs of this research program, a multiple-institution team coordinated by SCEC has integrated several scientific codes into a numerical modeling-based research tool we call the TeraShake computational platform (TSCP). A central component in the TSCP is a highly scalable earthquake wave propagation simulation program called the TeraShake anelastic wave propagation (TS-AWP) code. In this chapter, we describe how we extended an existing, stand-alone, well-validated, finite-difference, anelastic wave propagation modeling code into the highly scalable and widely used TS-AWP and then integrated this code into the TeraShake computational platform, which provides end-to-end (initialization to analysis) research capabilities. We also describe the techniques used to enhance the TS-AWP parallel performance on TeraGrid supercomputers, as well as the TeraShake simulation phases, including input preparation, run time, data archive management, and visualization. As a result of our efforts to improve its parallel efficiency, the TS-AWP has now shown highly efficient strong scaling on over 40K processors on IBM’s BlueGene/L Watson computer. In addition, the TSCP has developed into a computational system that is useful to many members of the SCEC community for performing large-scale earthquake simulations.
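Strong-scaling claims such as the one above are usually quantified by speedup and parallel efficiency relative to a reference run. The helper below computes both. The timings in the example are hypothetical, not TeraShake measurements.

```python
def strong_scaling(t_ref, p_ref, t_p, p):
    """Speedup and parallel efficiency of a run on p processors relative to one on p_ref."""
    speedup = t_ref / t_p
    efficiency = speedup * p_ref / p
    return speedup, efficiency

# Hypothetical timings: 8x more processors, 7x shorter run.
print(strong_scaling(t_ref=700.0, p_ref=5_000, t_p=100.0, p=40_000))  # (7.0, 0.875)
```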
Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale

Energy Technology Data Exchange (ETDEWEB)

Daily, Jeffrey A. [Washington State Univ., Pullman, WA (United States)]

2015-05-01

The field of bioinformatics and computational biology is currently experiencing a data revolution. The exciting prospect of making fundamental biological discoveries is fueling the rapid development and deployment of numerous cost-effective, high-throughput next-generation sequencing technologies. The result is that the DNA and protein sequence repositories are being bombarded with new sequence information. Databases are continuing to report a Moore’s law-like growth trajectory in their database sizes, roughly doubling every 18 months. In what seems to be a paradigm shift, individual projects are now capable of generating billions of raw sequence data that need to be analyzed in the presence of already annotated sequence information. While it is clear that data-driven methods, such as sequence homology detection, are becoming the mainstay in the field of computational life sciences, the algorithmic advancements essential for implementing complex data analytics at scale have mostly lagged behind. Sequence homology detection is central to a number of bioinformatics applications including genome sequencing and protein family characterization. Given millions of sequences, the goal is to identify all pairs of sequences that are highly similar (or “homologous”) on the basis of alignment criteria. While there are optimal alignment algorithms to compute pairwise homology, their large-scale deployment is currently not feasible; instead, heuristic methods are used at the expense of quality. In this dissertation, we present the design and evaluation of a parallel implementation for conducting optimal homology detection on distributed memory supercomputers. Our approach uses a combination of techniques from asynchronous load balancing (viz. work stealing, dynamic task counters), data replication, and exact-matching filters to achieve homology detection at scale. Results for a collection of 2.56M sequences show parallel efficiencies of ~75-100% on up to 8K cores.
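A standard example of an optimal pairwise alignment algorithm is Smith-Waterman local alignment. An exact-matching filter can skip pairs that share no k-mer before paying for the quadratic-time alignment. The sketch below combines both over all pairs. The scoring parameters, k, the score threshold, and the function names are assumptions. In the dissertation's setting the independent pair tasks would be distributed with work stealing rather than looped over serially.

```python
import itertools

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal local alignment score with a linear gap penalty (O(len(a) * len(b)))."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def kmers(s, k):
    return {s[i:i + k] for i in range(len(s) - k + 1)}

def homologous_pairs(seqs, k=3, min_score=6):
    """All pairs passing a shared-k-mer filter and a Smith-Waterman score threshold."""
    index = [kmers(s, k) for s in seqs]
    pairs = []
    for i, j in itertools.combinations(range(len(seqs)), 2):
        if not index[i] & index[j]:
            continue  # exact-match filter: no shared k-mer, skip the expensive alignment
        score = smith_waterman(seqs[i], seqs[j])
        if score >= min_score:
            pairs.append((i, j, score))
    return pairs

print(homologous_pairs(["ACGTACGT", "ACGTACGA", "TTTTTTTT"]))  # [(0, 1, 14)]
```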
Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale

International Nuclear Information System (INIS)

Daily, Jeffrey A.

2015-01-01

The field of bioinformatics and computational biology is currently experiencing a data revolution. The exciting prospect of making fundamental biological discoveries is fueling the rapid development and deployment of numerous cost-effective, high-throughput next-generation sequencing technologies. The result is that the DNA and protein sequence repositories are being bombarded with new sequence information. Databases are continuing to report a Moore's law-like growth trajectory in their database sizes, roughly doubling every 18 months. In what seems to be a paradigm shift, individual projects are now capable of generating billions of raw sequence data that need to be analyzed in the presence of already annotated sequence information. While it is clear that data-driven methods, such as sequence homology detection, are becoming the mainstay in the field of computational life sciences, the algorithmic advancements essential for implementing complex data analytics at scale have mostly lagged behind. Sequence homology detection is central to a number of bioinformatics applications including genome sequencing and protein family characterization. Given millions of sequences, the goal is to identify all pairs of sequences that are highly similar (or 'homologous') on the basis of alignment criteria. While there are optimal alignment algorithms to compute pairwise homology, their large-scale deployment is currently not feasible; instead, heuristic methods are used at the expense of quality. In this dissertation, we present the design and evaluation of a parallel implementation for conducting optimal homology detection on distributed memory supercomputers. Our approach uses a combination of techniques from asynchronous load balancing (viz. work stealing, dynamic task counters), data replication, and exact-matching filters to achieve homology detection at scale. Results for a collection of 2.56M sequences show parallel efficiencies of ~75-100% on up to 8K cores.
Parallel Computation of RCS of Electrically Large Platform with Coatings Modeled with NURBS Surfaces

Directory of Open Access Journals (Sweden)

Ying Yan

2012-01-01

The significance of Radar Cross Section (RCS) in military applications makes its prediction an important problem. This paper uses large-scale parallel Physical Optics (PO) to realize fast computation of the RCS of electrically large targets, which are modeled by Non-Uniform Rational B-Spline (NURBS) surfaces and coated with dielectric materials. Some numerical examples are presented to validate this paper’s method. In addition, 1024 CPUs at the Shanghai Supercomputer Center (SSC) are used to perform the simulation of a model with a maximum electrical size of 1966.7 λ, for the first time in China. These results show that the method can greatly speed up the calculation and is capable of solving real-life RCS prediction problems.
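Physical optics approximates the induced surface current on the illuminated side of a perfect conductor. For a flat plate at normal incidence it gives the textbook closed form σ = 4πA²/λ². The sketch below evaluates that simplest case only. It ignores the NURBS geometry and dielectric coatings treated in the paper, whose parallel PO code instead integrates over many surface patches.

```python
import math

def flat_plate_rcs(area_m2, wavelength_m):
    """PO monostatic RCS (m^2) of a perfectly conducting flat plate at normal incidence."""
    return 4.0 * math.pi * area_m2 ** 2 / wavelength_m ** 2

def to_dbsm(sigma_m2):
    """Convert an RCS in square metres to dBsm."""
    return 10.0 * math.log10(sigma_m2)

sigma = flat_plate_rcs(1.0, 0.03)  # 1 m^2 plate, 3 cm wavelength (about 10 GHz)
print(sigma, to_dbsm(sigma))
```

Because σ grows with A², doubling the plate area raises the RCS by about 6 dB.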
Performance of Air Pollution Models on Massively Parallel Computers

DEFF Research Database (Denmark)

Brown, John; Hansen, Per Christian; Wasniewski, Jerzy

1996-01-01

To compare the performance and use of three massively parallel SIMD computers, we implemented a large air pollution model on the computers. Using a realistic large-scale model, we gain detailed insight into the performance of the three computers when used to solve large-scale scientific problems...
An inertia-free filter line-search algorithm for large-scale nonlinear programming

Energy Technology Data Exchange (ETDEWEB)

Chiang, Nai-Yuan; Zavala, Victor M.

2016-02-15

We present a filter line-search algorithm that does not require inertia information of the linear system. This feature enables the use of a wide range of linear algebra strategies and libraries, which is essential to tackle large-scale problems on modern computing architectures. The proposed approach performs curvature tests along the search step to detect negative curvature and to trigger convexification. We prove that the approach is globally convergent and we implement the approach within a parallel interior-point framework to solve large-scale and highly nonlinear problems. Our numerical tests demonstrate that the inertia-free approach is as efficient as inertia detection via symmetric indefinite factorizations. We also demonstrate that the inertia-free approach can lead to reductions in solution time because it reduces the amount of convexification needed.
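Here is a simplified, unconstrained analogue of an inertia-free curvature test. Solve the regularized Newton system, check whether the step has sufficiently positive curvature, and increase the regularization δ (convexify) only if it does not. The parameters κ, δ0, and the growth factor are assumptions. The sketch assumes H + δI stays nonsingular along the way, and it omits the interior-point barrier terms and filter logic of the actual method.

```python
import numpy as np

def convexified_step(H, g, kappa=1e-8, delta0=1e-4, growth=10.0, max_tries=60):
    """Solve (H + delta I) d = -g, increasing delta until d passes a curvature test."""
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    identity = np.eye(len(g))
    delta = 0.0
    for _ in range(max_tries):
        W = H + delta * identity
        d = np.linalg.solve(W, -g)
        # Curvature test along the computed step; no inertia (eigenvalue count) is needed.
        if d @ W @ d >= kappa * (d @ d):
            return d, delta
        delta = delta0 if delta == 0.0 else growth * delta  # convexify more strongly
    raise RuntimeError("curvature test not satisfied")

d, delta = convexified_step(np.diag([1.0, -0.5]), [1.0, 1.0])
print(d, delta)
```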
The parallel volume at large distances

DEFF Research Database (Denmark)

Kampf, Jürgen

In this paper we examine the asymptotic behavior of the parallel volume of planar non-convex bodies as the distance tends to infinity. We show that the difference between the parallel volume of the convex hull of a body and the parallel volume of the body itself tends to 0. This yields a new proof...... for the fact that a planar body can only have polynomial parallel volume if it is convex. Extensions to Minkowski spaces and random sets are also discussed....
The parallel volume at large distances

DEFF Research Database (Denmark)

Kampf, Jürgen

In this paper we examine the asymptotic behavior of the parallel volume of planar non-convex bodies as the distance tends to infinity. We show that the difference between the parallel volume of the convex hull of a body and the parallel volume of the body itself tends to 0. This yields a new proof...... for the fact that a planar body can only have polynomial parallel volume if it is convex. Extensions to Minkowski spaces and random sets are also discussed....
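The result can be checked on a simple non-convex body: two disjoint discs of radius ρ whose centres are d apart. Their r-parallel set is the union of two discs of radius R = ρ + r. The parallel set of their convex hull is a stadium of area πR² + 2dR, which agrees with the Steiner formula A + Pr + πr² for the hull. The sketch below computes the difference, which decays roughly like d³/(12R) for large R. The choice of body and parameters is ours.

```python
import math

def two_disc_area(R, d):
    """Area of the union of two discs of radius R whose centres are d apart."""
    if d >= 2 * R:
        return 2 * math.pi * R ** 2
    lens = 2 * R ** 2 * math.acos(d / (2 * R)) - 0.5 * d * math.sqrt(4 * R ** 2 - d ** 2)
    return 2 * math.pi * R ** 2 - lens

def stadium_area(R, d):
    """Area of the convex hull of the two discs (a stadium)."""
    return math.pi * R ** 2 + 2 * d * R

def parallel_volume_gap(r, rho=1.0, d=3.0):
    """Hull parallel volume minus body parallel volume at distance r."""
    R = rho + r  # the r-parallel set of a disc of radius rho is a disc of radius rho + r
    return stadium_area(R, d) - two_disc_area(R, d)

for r in (0.0, 10.0, 100.0, 1000.0):
    print(r, parallel_volume_gap(r))
```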
Energy transfers in large-scale and small-scale dynamos

Science.gov (United States)

Samtaney, Ravi; Kumar, Rohit; Verma, Mahendra

2015-11-01

We present the energy transfers, mainly energy fluxes and shell-to-shell energy transfers, in small-scale dynamo (SSD) and large-scale dynamo (LSD) using numerical simulations of MHD turbulence for Pm = 20 (SSD) and for Pm = 0.2 on a 1024³ grid. For SSD, we demonstrate that the magnetic energy growth is caused by nonlocal energy transfers from the large-scale or forcing-scale velocity field to the small-scale magnetic field. The peak of these energy transfers moves towards lower wavenumbers as the dynamo evolves, which is the reason for the growth of the magnetic fields at large scales. The energy transfers U2U (velocity to velocity) and B2B (magnetic to magnetic) are forward and local. For LSD, we show that the magnetic energy growth takes place via energy transfers from the large-scale velocity field to the large-scale magnetic field. We observe forward U2U and B2B energy fluxes, similar to SSD.
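Shell-to-shell transfer analysis starts from binning Fourier modes into spherical shells of integer |k|. The sketch below computes the shell-binned kinetic-energy spectrum of a periodic 3-D field with NumPy. It is only the first building block, since the transfer functions themselves also require the nonlinear terms. It assumes a cubic 2π-periodic box, and at 1024³ a real code would need a distributed FFT.

```python
import numpy as np

def shell_spectrum(u):
    """Kinetic-energy spectrum E(k) summed over integer shells k = round(|k|).

    u: velocity field of shape (3, n, n, n) on a 2*pi-periodic grid.
    """
    n = u.shape[1]
    u_hat = np.fft.fftn(u, axes=(1, 2, 3)) / n ** 3        # normalized Fourier coefficients
    e_mode = 0.5 * np.sum(np.abs(u_hat) ** 2, axis=0)      # energy in each Fourier mode
    k1d = np.fft.fftfreq(n, d=1.0 / n)                      # integer wavenumbers 0, 1, ..., -1
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    shell = np.rint(np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)).astype(int)
    return np.bincount(shell.ravel(), weights=e_mode.ravel())

# A single Fourier mode with |k| = 2 puts all its energy into shell 2.
n = 16
x = np.arange(n)
u = np.zeros((3, n, n, n))
u[0] = np.cos(2 * np.pi * 2 * x / n)[:, None, None]
print(np.argmax(shell_spectrum(u)))
```

By Parseval's theorem the shell energies sum to the mean kinetic energy 0.5⟨|u|²⟩, which is a convenient sanity check.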

Large-scale Cosmic-Ray Anisotropy as a Probe of Interstellar Turbulence

Energy Technology Data Exchange (ETDEWEB)

Giacinti, Gwenael; Kirk, John G. [Max-Planck-Institut für Kernphysik, Postfach 103980, D-69029 Heidelberg (Germany)]

2017-02-01

We calculate the large-scale cosmic-ray (CR) anisotropies predicted for a range of Goldreich–Sridhar (GS) and isotropic models of interstellar turbulence, and compare them with IceTop data. In general, the predicted CR anisotropy is not a pure dipole; the cold spots reported at 400 TeV and 2 PeV are consistent with a GS model that contains a smooth deficit of parallel-propagating waves and a broad resonance function, though some other possibilities cannot, as yet, be ruled out. In particular, isotropic fast magnetosonic wave turbulence can match the observations at high energy, but cannot accommodate an energy dependence in the shape of the CR anisotropy. Our findings suggest that improved data on the large-scale CR anisotropy could provide a valuable probe of the properties of the interstellar turbulence, notably its power spectrum, within a few tens of parsecs of Earth.
A comparison of parallel dust and fibre measurements of airborne chrysotile asbestos in a large mine and processing factories in the Russian Federation

NARCIS (Netherlands)

Feletto, Eleonora; Schonfeld, Sara J; Kovalevskiy, Evgeny V; Bukhtiyarov, Igor V; Kashanskiy, Sergey V; Moissonnier, Monika; Straif, Kurt; Kromhout, Hans

2017-01-01

INTRODUCTION: Historic dust concentrations are available in a large-scale cohort study of workers in a chrysotile mine and processing factories in Asbest, Russian Federation. Parallel dust (gravimetric) and fibre (phase-contrast optical microscopy) concentrations collected in 1995, 2007 and 2013/14
Solving Large Quadratic Assignment Problems in Parallel

DEFF Research Database (Denmark)

Clausen, Jens; Perregaard, Michael

1997-01-01

... processors, and have hence not been ideally suited for computations essentially involving non-vectorizable computations on integers. In this paper we investigate the combination of one of the best bound functions for a Branch-and-Bound algorithm (the Gilmore-Lawler bound) and various testing, variable binding and recalculation of bounds between branchings when used in a parallel Branch-and-Bound algorithm. The algorithm has been implemented on a 16-processor MEIKO Computing Surface with Intel i860 processors. Computational results from the solution of a number of large QAPs, including the classical Nugent 20 ...
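The Gilmore-Lawler bound is built in two steps. First, form a matrix L whose entry L[i, j] lower-bounds the cost of placing facility i at location j, using the rearrangement inequality on sorted off-diagonal flows and distances. Then solve a linear assignment problem on L. The sketch below follows that textbook construction for a tiny instance and solves the assignment by brute force instead of, for example, the Hungarian algorithm. It shows the bound only, not the parallel Branch-and-Bound or the variable-binding tests studied in the paper.

```python
import itertools
import numpy as np

def qap_cost(F, D, perm):
    """Cost sum_{i,k} F[i,k] * D[perm[i], perm[k]]; perm[i] is facility i's location."""
    p = np.asarray(perm)
    return float(np.sum(F * D[np.ix_(p, p)]))

def gilmore_lawler_bound(F, D):
    """Gilmore-Lawler lower bound on min over permutations of qap_cost(F, D, perm)."""
    n = F.shape[0]
    L = np.empty((n, n))
    for i in range(n):
        f_row = np.sort(np.delete(F[i], i))              # off-diagonal flows, ascending
        for j in range(n):
            d_row = np.sort(np.delete(D[j], j))[::-1]    # off-diagonal distances, descending
            # Rearrangement inequality: ascending times descending is the minimal scalar product.
            L[i, j] = F[i, i] * D[j, j] + float(f_row @ d_row)
    # Linear assignment on L; brute force is only viable for tiny n.
    return min(sum(L[i, p[i]] for i in range(n)) for p in itertools.permutations(range(n)))

def brute_force_qap(F, D):
    n = F.shape[0]
    return min(qap_cost(F, D, p) for p in itertools.permutations(range(n)))

F = np.array([[0, 3, 1, 2], [3, 0, 4, 1], [1, 4, 0, 5], [2, 1, 5, 0]], dtype=float)
D = np.array([[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]], dtype=float)
print(gilmore_lawler_bound(F, D), brute_force_qap(F, D))
```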
Massively Parallel Finite Element Programming

KAUST Repository

Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang

2010-01-01

Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
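One common way to distribute mesh cells across processes is to order them along a space-filling curve and give each process a contiguous piece. This balances cell counts while keeping each piece spatially compact. The sketch below does this with a Z-order (Morton) curve on a uniform 2-D grid. It illustrates the idea only and is not deal.II's implementation; the function names are hypothetical.

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of (x, y) into a Z-order (Morton) key."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key

def partition_cells(nx, ny, n_ranks):
    """Split an nx-by-ny grid of cells into n_ranks contiguous pieces of a Z-order curve."""
    cells = sorted(((x, y) for x in range(nx) for y in range(ny)),
                   key=lambda c: morton_index(*c))
    base, extra = divmod(len(cells), n_ranks)
    parts, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)  # balance cell counts to within one
        parts.append(cells[start:start + size])
        start += size
    return parts

print(partition_cells(4, 4, 4)[0])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```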
Massively Parallel Finite Element Programming

KAUST Repository

Heister, Timo

2010-01-01

Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

Science.gov (United States)

Wan, Shixiang; Zou, Quan

2017-01-01

Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme growth of next-generation sequencing has exposed a shortage of efficient approaches for aligning ultra-large sets of biological sequences of different types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files larger than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets with files larger than 1 GB showed that HAlign-II saves time and space. It outperformed the current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.
Solving very large scattering problems using a parallel PWTD-enhanced surface integral equation solver

KAUST Repository

Liu, Yang

2013-07-01

The computational complexity and memory requirements of multilevel plane wave time domain (PWTD)-accelerated marching-on-in-time (MOT)-based surface integral equation (SIE) solvers scale as O(Nt Ns log² Ns) and O(Ns^1.5), respectively; here Nt and Ns denote the numbers of temporal and spatial basis functions discretizing the current [Shanker et al., IEEE Trans. Antennas Propag., 51, 628-641, 2003]. In the past, serial versions of these solvers have been successfully applied to the analysis of scattering from perfect electrically conducting as well as homogeneous penetrable targets involving up to Ns ≈ 0.5 × 10⁶ and Nt ≈ 10³. To solve larger problems, parallel PWTD-enhanced MOT solvers are called for. Even though a simple parallelization strategy was demonstrated in the context of electromagnetic compatibility analysis [M. Lu et al., in Proc. IEEE Int. Symp. AP-S, 4, 4212-4215, 2004], by and large, progress in this area has been slow. The lack of progress can be attributed wholesale to difficulties associated with the construction of a scalable PWTD kernel. © 2013 IEEE.
Large-scale data analytics

CERN Document Server

Gkoulalas-Divanis, Aris

2014-01-01

Provides cutting-edge research in large-scale data analytics from diverse scientific areas. Surveys varied subject areas and reports on individual results of research in the field. Shares many tips and insights into large-scale data analytics from authors and editors with long-term experience and specialization in the field.
Large-scale self-assembled zirconium phosphate smectic layers via a simple spray-coating process

Science.gov (United States)

Wong, Minhao; Ishige, Ryohei; White, Kevin L.; Li, Peng; Kim, Daehak; Krishnamoorti, Ramanan; Gunther, Robert; Higuchi, Takeshi; Jinnai, Hiroshi; Takahara, Atsushi; Nishimura, Riichi; Sue, Hung-Jue

2014-04-01

The large-scale assembly of asymmetric colloidal particles is used in creating high-performance fibres. A similar concept is extended here to the manufacture of thin films of self-assembled two-dimensional crystal-type materials with enhanced and tunable properties. We present a spray-coating method to manufacture thin, flexible and transparent epoxy films containing zirconium phosphate nanoplatelets self-assembled into a lamellar arrangement aligned parallel to the substrate. The self-assembled mesophase of zirconium phosphate nanoplatelets is stabilized by an epoxy pre-polymer and exhibits rheology favourable for large-scale manufacturing. The thermally cured film forms a mechanically robust coating and shows excellent gas barrier properties at both low and high humidity levels as a result of the highly aligned and overlapping arrangement of nanoplatelets. This work shows that large-scale ordering of high-aspect-ratio nanoplatelets is easier to achieve than previously thought, which may have implications for technological applications of similar materials.
Synchronization Techniques in Parallel Discrete Event Simulation

OpenAIRE

Lindén, Jonatan

2018-01-01

Discrete event simulation is an important tool for evaluating system models in many fields of science and engineering. To improve the performance of large-scale discrete event simulations, several techniques to parallelize discrete event simulation have been developed. In parallel discrete event simulation, the work of a single discrete event simulation is distributed over multiple processing elements. A key challenge in parallel discrete event simulation is to ensure that causally dependent ...
The multilevel fast multipole algorithm (MLFMA) for solving large-scale computational electromagnetics problems

CERN Document Server

Ergul, Ozgur

2014-01-01

The Multilevel Fast Multipole Algorithm (MLFMA) for Solving Large-Scale Computational Electromagnetic Problems provides a detailed and instructional overview of implementing MLFMA. The book: presents a comprehensive treatment of the MLFMA algorithm, including basic linear algebra concepts, recent developments on the parallel computation, and a number of application examples; covers solutions of electromagnetic problems involving dielectric objects and perfectly-conducting objects; discusses applications including scattering from airborne targets, scattering from red...
A Model of Parallel Kinematics for Machine Calibration

DEFF Research Database (Denmark)

Pedersen, David Bue; Bæk Nielsen, Morten; Kløve Christensen, Simon

2016-01-01

Parallel kinematics have been adopted by more than 25 manufacturers of high-end desktop 3D printers [Wohlers Report (2015), p.118] as well as by research projects such as the WASP project [WASP (2015)], a 12-meter-tall linear delta robot for Additive Manufacture of large-scale components for cons...
Large Scale Simulations of the Euler Equations on GPU Clusters

KAUST Repository

Liebmann, Manfred

2010-08-01

The paper investigates the scalability of a parallel Euler solver, using the Vijayasundaram method, on a GPU cluster with 32 Nvidia GeForce GTX 295 boards. The aim of this research is to enable large-scale fluid dynamics simulations with up to one billion elements. We investigate communication protocols for the GPU cluster to compensate for the slow Gigabit Ethernet network between the GPU compute nodes and to maintain overall efficiency. A diesel engine intake port and a nozzle, meshed at different resolutions, serve as good real-world examples for the scalability tests on the GPU cluster. © 2010 IEEE.
[Parallel virtual reality visualization of extremely large medical datasets].

Science.gov (United States)

Tang, Min

2010-04-01

On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extremely large medical datasets are discussed in connection with the intranets and common-configuration computers found in hospitals. Several core techniques are introduced, including the hardware structure, software framework, load balancing and virtual reality visualization. The Maximum Intensity Projection algorithm is implemented in parallel on a common PC cluster. In the virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through a control panel built with the Virtual Reality Modeling Language (VRML). Experimental results demonstrate that this method provides promising, real-time results and can serve as a useful aid in clinical diagnosis.
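The Maximum Intensity Projection parallelizes naturally because the maximum is associative: each node can project its own slab of slices, and the partial images are then combined with a second per-pixel maximum. The sketch below assumes a volume stored as a Python list of 2-D slices and projects along the slice axis; threads stand in for the PC-cluster nodes, and the contiguous-slab partitioning is an illustrative assumption rather than the paper's load-balancing scheme.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Image = List[List[float]]


def slab_mip(slab: List[Image]) -> Image:
    """Per-pixel maximum over the slices of one slab (projection along z)."""
    result = [row[:] for row in slab[0]]
    for image in slab[1:]:
        for y, row in enumerate(image):
            for x, value in enumerate(row):
                if value > result[y][x]:
                    result[y][x] = value
    return result


def parallel_mip(volume: List[Image], n_workers: int = 4) -> Image:
    """Split the volume into contiguous slabs, project each in parallel, then merge."""
    size = -(-len(volume) // n_workers)  # ceiling division
    slabs = [volume[i:i + size] for i in range(0, len(volume), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(slab_mip, slabs))
    # Max is associative, so the partial projections merge with one more MIP.
    return slab_mip(partials)
```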
A cloud-based framework for large-scale traditional Chinese medical record retrieval.

Science.gov (United States)

Liu, Lijun; Liu, Li; Fu, Xiaodong; Huang, Qingsong; Zhang, Xianwen; Zhang, Yin

2018-01-01

Electronic medical records are increasingly common in medical practice, and their secondary use has become increasingly important. Secondary use relies on the ability to retrieve complete information about the desired patient populations. Effectively and accurately retrieving relevant medical records from large-scale medical big data is becoming a major challenge. We therefore propose an efficient and robust cloud-based framework for large-scale Traditional Chinese Medical Record (TCMR) retrieval. First, we propose a parallel index-building method and build a distributed search cluster: the former improves the performance of index building, and the latter provides highly concurrent online TCMR retrieval. Second, a real-time multi-indexing model is proposed to ensure that the latest relevant TCMRs are indexed and retrieved in real time, and a semantics-based query expansion method and a multi-factor ranking model are proposed to improve retrieval quality. Third, we implement a template-based visualization method for displaying medical reports. The proposed parallel indexing method and distributed search cluster can improve the performance of index building and provide highly concurrent online TCMR retrieval. The multi-indexing model can ensure that the latest relevant TCMRs are indexed and retrieved in real time. The semantic expansion method and the multi-factor ranking model can enhance retrieval quality. The template-based visualization method can enhance availability and universality, with medical reports displayed via a user-friendly web interface. In conclusion, compared with current medical record retrieval systems, our system offers advantages that are useful for improving the secondary use of large-scale traditional Chinese medical records in a cloud environment. The proposed system can be more easily integrated with existing clinical systems and used in various scenarios. Copyright © 2017. Published by Elsevier Inc.
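A parallel index build of the kind described can be sketched as sharding the record collection, building a partial inverted index per shard concurrently, and merging the posting lists. The Python below is a generic illustration under stated assumptions, not the authors' system: it uses whitespace tokenization as a placeholder (Chinese text would need a proper word segmenter), round-robin sharding, and threads in place of cluster nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Index = Dict[str, List[int]]
Record = Tuple[int, str]  # (record id, text)


def build_partial(records: List[Record]) -> Index:
    """Build an inverted index (term -> record ids) for one shard."""
    index: Index = defaultdict(list)
    for record_id, text in records:
        for term in set(text.lower().split()):  # placeholder tokenizer
            index[term].append(record_id)
    return index


def merge(partials: List[Index]) -> Index:
    """Concatenate per-shard posting lists and sort them by record id."""
    merged: Index = defaultdict(list)
    for part in partials:
        for term, postings in part.items():
            merged[term].extend(postings)
    return {term: sorted(ids) for term, ids in merged.items()}


def parallel_index(records: List[Record], n_shards: int = 4) -> Index:
    """Shard records round-robin, index shards concurrently, then merge."""
    shards = [records[i::n_shards] for i in range(n_shards)]
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(build_partial, shards))
    return merge(partials)
```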
Large-scale grid management

International Nuclear Information System (INIS)

Langdal, Bjoern Inge; Eggen, Arnt Ove

2003-01-01

The network companies in the Norwegian electricity industry now have to establish large-scale network management, a concept essentially characterized by (1) a broader focus (broadband, multi-utility, ...) and (2) bigger units with large networks and more customers. Research by SINTEF Energy Research shows so far that the approaches within large-scale network management may be structured according to three main challenges: centralization, decentralization and outsourcing. The article is part of a planned series.
Parallel computing works

Energy Technology Data Exchange (ETDEWEB)

1991-10-23

An account of the Caltech Concurrent Computation Program (C³P), a five-year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations? As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
Research of the effectiveness of parallel multithreaded realizations of interpolation methods for scaling raster images

Science.gov (United States)

Vnukov, A. A.; Shershnev, M. B.

2018-01-01

The aim of this work is the software implementation of three image scaling algorithms using parallel computations, as well as the development of an application with a graphical user interface for the Windows operating system to demonstrate the operation of the algorithms and to study the relationship between system performance, algorithm execution time and the degree of parallelization of computations. Three interpolation methods were studied, formalized and adapted for image scaling. The result of the work is a program for scaling images by different methods. A comparison of the scaling quality of the different methods is given.
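The abstract does not name its three interpolation methods; bilinear interpolation is assumed here as one common choice. The sketch parallelizes over output rows, since each output row depends only on the source image and can be computed independently. It uses the align-corners convention (output corners map exactly onto source corners), which is an assumption; other conventions shift the sampling grid by half a pixel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Image = List[List[float]]


def _source_coord(out_i: int, out_n: int, in_n: int) -> float:
    """Align-corners mapping from an output index to a source coordinate."""
    return out_i * (in_n - 1) / (out_n - 1) if out_n > 1 else 0.0


def bilinear_row(src: Image, out_y: int, out_w: int, out_h: int) -> List[float]:
    """Compute one output row by bilinear interpolation of the source image."""
    in_h, in_w = len(src), len(src[0])
    sy = _source_coord(out_y, out_h, in_h)
    y0 = int(sy)
    y1, fy = min(y0 + 1, in_h - 1), sy - y0
    row = []
    for out_x in range(out_w):
        sx = _source_coord(out_x, out_w, in_w)
        x0 = int(sx)
        x1, fx = min(x0 + 1, in_w - 1), sx - x0
        top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
        bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
        row.append(top * (1 - fy) + bottom * fy)
    return row


def scale_bilinear(src: Image, out_w: int, out_h: int, n_workers: int = 4) -> Image:
    """Scale an image, distributing output rows across a thread pool."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda y: bilinear_row(src, y, out_w, out_h), range(out_h)))
```

In CPython the global interpreter lock prevents threads from speeding up this pure-Python arithmetic, so the thread pool only demonstrates the row decomposition; measuring real speedup against the degree of parallelization, as the abstract describes, would require processes or native code.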
Highly parallel machines and future of scientific computing

International Nuclear Information System (INIS)

Singh, G.S.

1992-01-01

The computing requirements of large-scale scientific computing have always been ahead of what state-of-the-art hardware could supply in the form of the supercomputers of the day, and the limit to increasing the computing power of any single-processor system was recognized a few years ago. Now, with the advent of parallel computing systems, machines with the required computing power seem to be within reach. In this paper the author tries to envision large-scale scientific computing in the penultimate decade of the present century. The author summarizes trends in parallel computers and emphasizes the need for a better programming environment and software tools for optimal performance. The author concludes the paper with a critique of parallel architectures, software tools and algorithms. (author). 10 refs., 2 tabs
Large Scale Community Detection Using a Small World Model

Directory of Open Access Journals (Sweden)

Ranjan Kumar Behera

2017-11-01

In a social network, small or large communities within the network play a major role in determining the network's functionality. Despite diverse definitions, communities may be defined as groups of nodes that are more densely connected to each other than to nodes outside the group. Revealing such hidden communities is a challenging research problem. Real-world social networks exhibit the small-world phenomenon: any two social entities can reach each other in a small number of steps. In this paper, nodes are mapped into communities based on random walks in the network. However, uncovering communities in large-scale networks is challenging because of the unprecedented growth in the size of social networks. A good number of random-walk-based community detection algorithms exist in the literature, but on large-scale social networks they are observed to take considerably longer. In this work, to improve the efficiency of such algorithms, a parallel programming framework, Map-Reduce, is used to uncover the hidden communities in social networks. The proposed approach has been compared with several standard community detection algorithms on both synthetic and real-world datasets, and the proposed algorithm is observed to be more efficient than the existing ones.
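One step of a random walk fits the Map-Reduce pattern directly: the map phase lets every node split its current probability mass among its neighbors, and the reduce phase sums the incoming shares per node. The following is a generic single-machine Python sketch of that step, not the proposed algorithm; the adjacency-list representation is an assumption, and every node is assumed to have at least one neighbor.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, List[str]]
Dist = Dict[str, float]


def map_phase(graph: Graph, prob: Dist) -> Iterable[Tuple[str, float]]:
    """Emit (neighbor, share) pairs: each node splits its mass evenly among neighbors."""
    for node, mass in prob.items():
        neighbors = graph[node]
        for nb in neighbors:
            yield nb, mass / len(neighbors)


def reduce_phase(pairs: Iterable[Tuple[str, float]]) -> Dist:
    """Sum the shares arriving at each node (the shuffle step is implicit here)."""
    totals: Dist = defaultdict(float)
    for node, share in pairs:
        totals[node] += share
    return dict(totals)


def random_walk(graph: Graph, start: str, steps: int) -> Dist:
    """Distribution of a simple random walk after `steps` steps from `start`."""
    prob: Dist = {start: 1.0}
    for _ in range(steps):
        prob = reduce_phase(map_phase(graph, prob))
    return prob
```

On two triangles joined by a single edge, a short walk started in one triangle keeps most of its probability mass inside that triangle, which is the intuition random-walk community detection builds on.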

Planning under uncertainty solving large-scale stochastic linear programs

Energy Technology Data Exchange (ETDEWEB)

Infanger, G. [Stanford Univ., CA (United States). Dept. of Operations Research; Technische Univ., Vienna (Austria). Inst. fuer Energiewirtschaft]

1992-12-01

For many practical problems, solutions obtained from deterministic models are unsatisfactory because they fail to hedge against certain contingencies that may occur in the future. Stochastic models address this shortcoming, but until recently seemed intractable due to their size. Recent advances both in solution algorithms and in computer technology now allow us to solve important and general classes of practical stochastic problems. We show how large-scale stochastic linear programs can be efficiently solved by combining classical decomposition and Monte Carlo (importance) sampling techniques. We discuss the methodology for solving two-stage stochastic linear programs with recourse, present numerical results of large problems with numerous stochastic parameters, show how to efficiently implement the methodology on a parallel multi-computer and derive the theory for solving a general class of multi-stage problems with dependency of the stochastic parameters within a stage and between different stages.
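The Monte Carlo ingredient can be illustrated with importance sampling of an expected second-stage (recourse) cost. The toy problem below is hypothetical: a one-dimensional recourse cost that charges a penalty per unit of unmet, exponentially distributed demand, for which the exact expectation (penalty * exp(-x) for unit-mean demand) is known. It shows only the estimator, not the decomposition algorithm or the parallel multi-computer implementation.

```python
import math
import random


def recourse_cost(x: float, demand: float, penalty: float = 4.0) -> float:
    """Hypothetical second-stage cost: pay `penalty` per unit of unmet demand."""
    return penalty * max(0.0, demand - x)


def crude_estimate(x: float, n: int, rng: random.Random) -> float:
    """Plain Monte Carlo with demand drawn from Exponential(rate 1)."""
    return sum(recourse_cost(x, rng.expovariate(1.0)) for _ in range(n)) / n


def importance_estimate(x: float, n: int, rng: random.Random, q_rate: float = 0.5) -> float:
    """Sample demand from a heavier-tailed Exponential(q_rate) and reweight by p/q."""
    total = 0.0
    for _ in range(n):
        d = rng.expovariate(q_rate)
        weight = math.exp(-d) / (q_rate * math.exp(-q_rate * d))  # p(d) / q(d)
        total += recourse_cost(x, d) * weight
    return total / n


if __name__ == "__main__":
    x, n = 3.0, 50_000
    print("exact     ", 4.0 * math.exp(-x))
    print("crude     ", crude_estimate(x, n, random.Random(1)))
    print("importance", importance_estimate(x, n, random.Random(1)))
```

With a first-stage decision x = 3, only about 5% of crude samples incur any cost, whereas the heavier-tailed proposal puts about 22% of its samples in the costly region, which lowers the variance of the estimate.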
Ethics of large-scale change

OpenAIRE

Arler, Finn

2006-01-01

The subject of this paper is long-term, large-scale changes in human society. Some very significant examples of large-scale change are presented: human population growth, human appropriation of land and primary production, the human use of fossil fuels, and climate change. The question posed is what kind of attitude is appropriate, from an ethical point of view, when dealing with large-scale changes like these. Three kinds of approaches are discussed: Aldo Leopold's mountain thinking, th...
Massively parallel Monte Carlo. Experiences running nuclear simulations on a large condor cluster

International Nuclear Information System (INIS)

Tickner, James; O'Dwyer, Joel; Roach, Greg; Uher, Josef; Hitchen, Greg

2010-01-01

The trivially parallel nature of Monte Carlo (MC) simulations makes them ideally suited to running in a distributed, heterogeneous computing environment. We report on the setup and operation of a large, cycle-harvesting Condor computer cluster, used to run MC simulations of nuclear instruments ('jobs') on approximately 4,500 desktop PCs. Successful operation must balance the competing goals of maximizing the availability of machines for running jobs while minimizing the impact on users' PC performance. This requires classifying jobs according to anticipated run time and priority, and carefully optimizing the parameters used to control job allocation to host machines. To make full use of a large Condor cluster, we have created a powerful suite of tools to handle job submission and analysis, as manually creating, submitting and evaluating large numbers (hundreds to thousands) of jobs would be too arduous. We describe some of the key aspects of this suite, which has been interfaced to the well-known MCNP and EGSnrc nuclear codes and our in-house PHOTON optical MC code. We report on our practical experiences of operating our Condor cluster and present examples of several large-scale instrument design problems that have been solved using this tool. (author)
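The "trivially parallel" property means each job can run with its own random-number stream, and the results only need to be pooled at the end. The sketch below is illustrative and assumes a toy problem, the uncollided transmission of photons through a slab with attenuation coefficient mu and thickness t (exact answer exp(-mu*t)); it does not use Condor, MCNP, EGSnrc or PHOTON, and a thread pool stands in for the cluster's host machines.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def transmission_job(seed: int, n_histories: int, mu: float, thickness: float) -> int:
    """One independent job: count photons whose first interaction lies beyond the slab."""
    rng = random.Random(seed)  # each job owns its random-number stream
    return sum(1 for _ in range(n_histories) if rng.expovariate(mu) > thickness)


def run_jobs(n_jobs: int, n_histories: int, mu: float, thickness: float,
             base_seed: int = 12345) -> float:
    """Submit all jobs, then pool their tallies into one transmission estimate."""
    seeds = [base_seed + i for i in range(n_jobs)]
    with ThreadPoolExecutor() as pool:
        counts = pool.map(lambda s: transmission_job(s, n_histories, mu, thickness), seeds)
        total = sum(counts)
    return total / (n_jobs * n_histories)


if __name__ == "__main__":
    estimate = run_jobs(n_jobs=8, n_histories=20_000, mu=1.0, thickness=1.0)
    print(f"estimate {estimate:.4f} vs exact {math.exp(-1.0):.4f}")
```

Seeding jobs with consecutive integers is a simplification; production codes usually rely on generators designed to provide independent parallel streams.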
Towards a Database System for Large-scale Analytics on Strings

KAUST Repository

Sahli, Majed A.

2015-07-23

Recent technological advances are causing an explosion in the production of sequential data. Biological sequences, web logs and time series are represented as strings. Currently, strings are stored, managed and queried in an ad hoc fashion because they lack a standardized data model and query language. String queries are computationally demanding, especially when strings are long and numerous. Existing approaches cannot handle the growing number of strings produced by environmental, healthcare, bioinformatic, and space applications. There is a trade-off between performing analytics efficiently and scaling to thousands of cores to finish in reasonable times. In this thesis, we introduce a data model that unifies the input and output representations of core string operations. We define a declarative query language for strings where operators can be pipelined to form complex queries. A rich set of core string operators is described to support string analytics. We then demonstrate a database system for string analytics based on our model and query language. In particular, we propose the use of a novel data structure augmented by efficient parallel computation to strike a balance between preprocessing overheads and query execution times. Next, we delve into repeated motifs extraction as a core string operation for large-scale string analytics. Motifs are frequent patterns used, for example, to identify biological functionality, periodic trends, or malicious activities. Statistical approaches are fast but inexact while combinatorial methods are sound but slow. We introduce ACME, a combinatorial repeated motifs extractor. We study the spatial and temporal locality of motif extraction and devise a cache-aware search space traversal technique. ACME is the only method that scales to gigabyte-long strings, handles large alphabets, and supports interesting motif types with minimal overhead. While ACME is cache-efficient, it is limited by being serial. We devise a lightweight
NonLinear Parallel OPtimization Tool, Phase II

Data.gov (United States)

National Aeronautics and Space Administration: The technological advancement proposed is a novel large-scale Nonlinear Parallel OPtimization Tool (NLPAROPT). This software package will eliminate the computational...
The linearly scaling 3D fragment method for large scale electronic structure calculations

Energy Technology Data Exchange (ETDEWEB)

Zhao Zhengji [National Energy Research Scientific Computing Center (NERSC) (United States)]; Meza, Juan; Shan Hongzhang; Strohmaier, Erich; Bailey, David; Wang Linwang [Computational Research Division, Lawrence Berkeley National Laboratory (United States)]; Lee, Byounghak [Physics Department, Texas State University (United States)]

2009-07-01

The linearly scaling three-dimensional fragment (LS3DF) method is an O(N) ab initio electronic structure method for large-scale nanomaterial simulations. It is a divide-and-conquer approach with a novel patching scheme that effectively cancels out the artificial boundary effects that exist in all divide-and-conquer schemes. This method has made ab initio simulations of thousand-atom nanosystems feasible in a couple of hours, while retaining essentially the same accuracy as direct calculation methods. The LS3DF method won the 2008 ACM Gordon Bell Prize for algorithm innovation. Our code has reached 442 Tflop/s running on 147,456 processors on the Cray XT5 (Jaguar) at OLCF, has been run on 163,840 processors on the Blue Gene/P (Intrepid) at ALCF, and has been applied to a system containing 36,000 atoms. In this paper, we present recent parallel performance results for this code and apply the method to asymmetric CdSe/CdS core/shell nanorods, which have potential applications in electronic devices and solar cells.
An Efficient Parallel Multi-Scale Segmentation Method for Remote Sensing Imagery

Directory of Open Access Journals (Sweden)

Haiyan Gu

2018-04-01

Remote sensing (RS) image segmentation is an essential step in geographic object-based image analysis (GEOBIA) to ultimately derive “meaningful objects”. While many segmentation methods exist, most of them are not efficient for large data sets. The goal of this research is therefore to develop an efficient parallel multi-scale segmentation method for RS imagery by combining graph theory and the fractal net evolution approach (FNEA). Specifically, a minimum spanning tree (MST) algorithm from graph theory is combined with the minimum heterogeneity rule (MHR) algorithm used in FNEA: the MST algorithm performs the initial segmentation, and the MHR algorithm performs object merging. An efficient implementation of the segmentation strategy is presented using data partitioning and a “reverse searching-forward processing” chain based on Message Passing Interface (MPI) parallel technology. Segmentation results for images from multiple sensors (airborne, SPECIM AISA EAGLE II, WorldView-2, RADARSAT-2) and different landscapes (residential/industrial, residential/agriculture) covering four test sites demonstrated the method's efficiency in both accuracy and speed. We conclude that the proposed method is applicable and efficient for segmenting a variety of RS imagery (airborne optical, satellite optical, SAR, high-spectral), with accuracy comparable to that of the FNEA method.
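The MST-based initial segmentation can be illustrated with a Kruskal-style pass over a 4-connected pixel graph: edges weighted by intensity difference are processed in increasing order and merged with a union-find structure until a fixed threshold is exceeded, which is equivalent to cutting all minimum spanning tree edges heavier than that threshold. This is a simplified stand-in, not the paper's method: the fixed threshold is a hypothetical parameter, and the MHR merging stage and MPI partitioning are omitted.

```python
from typing import List

Image = List[List[float]]


def mst_threshold_segment(image: Image, threshold: float) -> List[List[int]]:
    """Label pixels by merging 4-neighbors in increasing edge-weight order (Kruskal)."""
    h, w = len(image), len(image[0])
    parent = list(range(h * w))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = []
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:
                edges.append((abs(image[y][x] - image[y][x + 1]), i, i + 1))
            if y + 1 < h:
                edges.append((abs(image[y][x] - image[y + 1][x]), i, i + w))
    edges.sort()

    for weight, a, b in edges:
        if weight > threshold:
            break  # all remaining edges are heavier: the MST is cut here
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    labels = {}  # root -> consecutive region id, assigned in scan order
    return [[labels.setdefault(find(y * w + x), len(labels)) for x in range(w)]
            for y in range(h)]
```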
Large-scale visualization system for grid environment

International Nuclear Information System (INIS)

Suzuki, Yoshio

2007-01-01

The Center for Computational Science and E-systems of the Japan Atomic Energy Agency (CCSE/JAEA) has been conducting R&D on distributed computing (grid computing) environments: Seamless Thinking Aid (STA), Information Technology Based Laboratory (ITBL) and Atomic Energy Grid InfraStructure (AEGIS). In this R&D, we have developed visualization technology suited to distributed computing environments. As one of the visualization tools, we developed the Parallel Support Toolkit (PST), which executes the visualization process in parallel on a single computer. We now extend PST so that it can run simultaneously on multiple heterogeneous computers using the Seamless Thinking Aid Message Passing Interface (STAMPI). STAMPI, which we also developed in this R&D, is an MPI library that runs in a heterogeneous computing environment. This improvement enables visualization of extremely large-scale data and more efficient visualization processes in a distributed computing environment. (author)
A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images.

Science.gov (United States)

Afshar, Yaser; Sbalzarini, Ivo F

2016-01-01

Modern fluorescence microscopy modalities, such as light-sheet microscopy, can acquire large three-dimensional images at high data rates. This creates a bottleneck in computational processing and analysis, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit in the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers collectively solve the global segmentation problem. This not only enables segmentation of large images (we test images of up to $10^{10}$ pixels) but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments.
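Decomposing an image into sub-images for distributed processing can be sketched as splitting each axis into nearly equal ranges and padding each tile with a halo (ghost) layer, since updates near tile borders typically depend on neighboring pixels held by another computer. The 2-D Python below is a hypothetical illustration: the halo width and process grid are assumptions, the network exchange of halo data is not shown, and the paper's images are three-dimensional.

```python
from typing import List, Tuple

Range = Tuple[int, int]          # (start, stop), stop exclusive
Box = Tuple[int, int, int, int]  # (y_start, y_stop, x_start, x_stop)


def split_1d(n: int, parts: int) -> List[Range]:
    """Split range(n) into `parts` contiguous ranges whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def decompose(height: int, width: int, procs_y: int, procs_x: int,
              halo: int = 1) -> List[Tuple[Box, Box]]:
    """Return (core, padded) boxes per process; padded adds a halo clipped to the image."""
    tiles = []
    for ys, ye in split_1d(height, procs_y):
        for xs, xe in split_1d(width, procs_x):
            core = (ys, ye, xs, xe)
            padded = (max(0, ys - halo), min(height, ye + halo),
                      max(0, xs - halo), min(width, xe + halo))
            tiles.append((core, padded))
    return tiles
```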
A parallel nearly implicit time-stepping scheme

OpenAIRE

Botchev, Mike A.; van der Vorst, Henk A.

2001-01-01

Across-the-space parallelism still remains the most mature, convenient and natural way to parallelize large scale problems. One of the major problems here is that implicit time stepping is often difficult to parallelize due to the structure of the system. Approximate implicit schemes have been suggested to circumvent the problem. These schemes have attractive stability properties and they are also very well parallelizable. The purpose of this article is to give an overall assessment of the pa...
Overview of the Force Scientific Parallel Language

Directory of Open Access Journals (Sweden)

Gita Alaghband

1994-01-01

The Force, a parallel programming language designed for large-scale shared-memory multiprocessors, is presented. The language provides a number of parallel constructs as extensions to ordinary Fortran and is implemented as a two-level macro preprocessor to support portability across shared-memory multiprocessors. The global parallelism model on which the Force is based makes it a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force have resulted in structured parallel programs that have been ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples illustrating parallel programming approaches using the Force are also presented.
Large-Scale Cubic-Scaling Random Phase Approximation Correlation Energy Calculations Using a Gaussian Basis.

Science.gov (United States)

Wilhelm, Jan; Seewald, Patrick; Del Ben, Mauro; Hutter, Jürg

2016-12-13

We present an algorithm for computing the correlation energy in the random phase approximation (RPA) in a Gaussian basis requiring [Formula: see text] operations and [Formula: see text] memory. The method is based on the resolution of the identity (RI) with the overlap metric, a reformulation of RI-RPA in the Gaussian basis, imaginary time, and imaginary frequency integration techniques, and the use of sparse linear algebra. Additional memory reduction without extra computations can be achieved by an iterative scheme that overcomes the memory bottleneck of canonical RPA implementations. We report a massively parallel implementation that is the key for the application to large systems. Finally, cubic-scaling RPA is applied to a thousand water molecules using a correlation-consistent triple-ζ quality basis.
3D large-scale calculations using the method of characteristics

International Nuclear Information System (INIS)

Dahmani, M.; Roy, R.; Koclas, J.

2004-01-01

An overview of the computational requirements and the numerical developments made in order to solve 3D large-scale problems using the characteristics method will be presented. To accelerate the MCI solver, efficient acceleration techniques were implemented and parallelization was performed. However, for very large problems, the size of the tracking file used to store the tracks can still become prohibitive and exceed the capacity of the machine. The new 3D characteristics solver MCG will now be introduced. This methodology is dedicated to solving very large 3D problems (part of or a whole core) without spatial homogenization. To eliminate the input/output problems that occur when solving these large problems, we define a new computing scheme that requires more CPU resources than the usual one, which is based on sweeps over large tracking files. The huge storage capacity needed in some problems and the related I/O queries needed by the characteristics solver are replaced by on-the-fly recalculation of tracks at each iteration step. Using this technique, large 3D problems are no longer I/O-bound, and distributed CPU resources can be used efficiently. (author)
A Proactive Complex Event Processing Method for Large-Scale Transportation Internet of Things

OpenAIRE

Wang, Yongheng; Cao, Kening

2014-01-01

The Internet of Things (IoT) provides a new way to improve the transportation system. The key issue is how to process the numerous events generated by IoT. In this paper, a proactive complex event processing method is proposed for large-scale transportation IoT. Based on a multilayered adaptive dynamic Bayesian model, a Bayesian network structure learning algorithm using search-and-score is proposed to support accurate predictive analytics. A parallel Markov decision processes model is design...
State-of-the-Art in GPU-Based Large-Scale Volume Visualization

KAUST Repository

Beyer, Johanna

2015-05-01

This survey gives an overview of the current state of the art in GPU techniques for interactive large-scale volume visualization. Modern techniques in this field have brought about a sea change in how interactive visualization and analysis of giga-, tera- and petabytes of volume data can be enabled on GPUs. In addition to combining the parallel processing power of GPUs with out-of-core methods and data streaming, a major enabler for interactivity is making both the computational and the visualization effort proportional to the amount and resolution of data that is actually visible on screen, i.e. 'output-sensitive' algorithms and system designs. This leads to recent output-sensitive approaches that are 'ray-guided', 'visualization-driven' or 'display-aware'. In this survey, we focus on these characteristics and propose a new categorization of GPU-based large-scale volume visualization techniques based on the notions of actual output-resolution visibility and the current working set of volume bricks-the current subset of data that is minimally required to produce an output image of the desired display resolution. Furthermore, we discuss the differences and similarities of different rendering and data traversal strategies in volume rendering by putting them into a common context-the notion of address translation. For our purposes here, we view parallel (distributed) visualization using clusters as an orthogonal set of techniques that we do not discuss in detail but that can be used in conjunction with what we present in this survey. © 2015 The Eurographics Association and John Wiley & Sons Ltd.
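The address-translation view can be sketched as a page table from brick coordinates to slots in a GPU-resident brick cache: a voxel coordinate is split into a brick index and an offset, and a miss records a request so that only bricks actually touched by rays are loaded. The Python below is a CPU-side illustration with a hypothetical brick size of 32 voxels; it omits the cache replacement policy and the multi-resolution levels that such systems typically use.

```python
from typing import Dict, Optional, Set, Tuple

BRICK_SIZE = 32  # hypothetical brick edge length in voxels

Coord = Tuple[int, int, int]


class BrickPageTable:
    """Maps virtual brick coordinates to physical cache slots."""

    def __init__(self) -> None:
        self.slots: Dict[Coord, int] = {}  # resident bricks: brick coord -> cache slot
        self.requests: Set[Coord] = set()  # bricks needed but not yet resident

    def translate(self, voxel: Coord) -> Optional[Tuple[int, Coord]]:
        """Return (cache slot, offset within brick), or None on a miss."""
        x, y, z = voxel
        brick = (x // BRICK_SIZE, y // BRICK_SIZE, z // BRICK_SIZE)
        offset = (x % BRICK_SIZE, y % BRICK_SIZE, z % BRICK_SIZE)
        slot = self.slots.get(brick)
        if slot is None:
            self.requests.add(brick)  # only bricks actually touched get requested
            return None
        return slot, offset
```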
State-of-the-Art in GPU-Based Large-Scale Volume Visualization

KAUST Repository

Beyer, Johanna; Hadwiger, Markus; Pfister, Hanspeter

2015-01-01

Stability of Large Parallel Tunnels Excavated in Weak Rocks: A Case Study

Science.gov (United States)

Ding, Xiuli; Weng, Yonghong; Zhang, Yuting; Xu, Tangjin; Wang, Tuanle; Rao, Zhiwen; Qi, Zufang

2017-09-01

Diversion tunnels are important structures for hydropower projects but are always placed in locations with less favorable geological conditions than those in which other structures are placed. Because diversion tunnels are usually large and closely spaced, the rock pillar between adjacent tunnels in weak rocks is affected on both sides, and conventional support measures may not be adequate to achieve the required stability. Thus, appropriate reinforcement support measures are needed, and the design philosophy regarding large parallel tunnels in weak rocks should be updated. This paper reports a recent case in which two large parallel diversion tunnels were excavated. The rock masses are thin- to ultra-thin-layered strata coated with phyllitic films, which significantly decrease the soundness and strength of the strata and weaken the rocks. The behaviors of the surrounding rock masses under original (and conventional) support measures are detailed in terms of rock mass deformation, anchor bolt stress, and the extent of the excavation disturbed zone (EDZ), as obtained from safety monitoring and field testing. In situ observed phenomena and their interpretation are also included. The sidewall deformations exhibit significant time-dependent characteristics, and large magnitudes are recorded. The stresses in the anchor bolts are small, but the extents of the EDZs are large. The stability condition under the original support measures is evaluated as poor. To enhance rock mass stability, attempts are made to reinforce support design and improve safety monitoring programs. The main feature of these attempts is the use of prestressed cables that run through the rock pillar between the parallel tunnels. The efficacy of reinforcement support measures is verified by further safety monitoring data and field test results. Numerical analysis is performed continually throughout construction to provide a useful reference for decision making. The calculated deformations are in
The Parallel System for Integrating Impact Models and Sectors (pSIMS)

Science.gov (United States)

Elliott, Joshua; Kelly, David; Chryssanthacopoulos, James; Glotter, Michael; Jhunjhnuwala, Kanika; Best, Neil; Wilde, Michael; Foster, Ian

2014-01-01

We present a framework for massively parallel climate impact simulations: the Parallel System for Integrating Impact Models and Sectors (pSIMS). This framework comprises a) tools for ingesting and converting large amounts of data to a versatile datatype based on a common geospatial grid; b) tools for translating this datatype into custom formats for site-based models; c) a scalable parallel framework for performing large ensemble simulations, using any one of a number of different impacts models, on clusters, supercomputers, distributed grids, or clouds; d) tools and data standards for reformatting outputs to common datatypes for analysis and visualization; and e) methodologies for aggregating these datatypes to arbitrary spatial scales such as administrative and environmental demarcations. By automating many time-consuming and error-prone aspects of large-scale climate impacts studies, pSIMS accelerates computational research, encourages model intercomparison, and enhances reproducibility of simulation results. We present the pSIMS design and use example assessments to demonstrate its multi-model, multi-scale, and multi-sector versatility.
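Item e), aggregating gridded outputs to administrative or environmental regions, is commonly done as a weighted mean over the grid cells in each region. The sketch below shows that generic operation under stated assumptions: the record does not say which weights pSIMS uses, so the weight (for example harvested area per cell) and the function name `aggregate_to_regions` are hypothetical.

```python
from collections import defaultdict


def aggregate_to_regions(cell_values, cell_region, cell_weight):
    """Weighted mean of per-grid-cell outputs within each region.

    cell_values: {cell_id: simulated value, e.g. crop yield}
    cell_region: {cell_id: region name, e.g. a county}
    cell_weight: {cell_id: weight, e.g. harvested area in that cell}
    Cells with no region or a non-positive weight are skipped.
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for cell, value in cell_values.items():
        region = cell_region.get(cell)
        w = cell_weight.get(cell, 0.0)
        if region is None or w <= 0.0:
            continue
        weighted_sum[region] += w * value
        weight_total[region] += w
    return {r: weighted_sum[r] / weight_total[r] for r in weighted_sum}
```

Because each region's sums are independent, this reduction parallelizes naturally: partial sums per region can be computed on separate workers and added together before the final division.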
Political consultation and large-scale research

International Nuclear Information System (INIS)

Bechmann, G.; Folkers, H.

1977-01-01

Large-scale research and policy consulting occupy an intermediary position between sociological sub-systems. While large-scale research coordinates science, policy, and production, policy consulting coordinates science, policy, and the political sphere. In this very position, large-scale research and policy consulting lack the institutional guarantees and rational background guarantees that are characteristic of their sociological environment. Large-scale research can neither deal with the production of innovative goods with regard to profitability, nor can it hope for full recognition by the basic-research-oriented scientific community. Policy consulting has neither the political system's assigned competence to make decisions, nor can it judge successfully by the critical standards of established social science, at least as far as the present situation is concerned. This intermediary position of large-scale research and policy consulting has consequences in three respects that support the thesis that it is a new form of institutionalization of science: 1) external control, 2) the organizational form, and 3) the theoretical conception of large-scale research and policy consulting. (orig.)
Large-scale multimedia modeling applications

International Nuclear Information System (INIS)

Droppo, J.G. Jr.; Buck, J.W.; Whelan, G.; Strenge, D.L.; Castleton, K.J.; Gelston, G.M.

1995-08-01

Over the past decade, the US Department of Energy (DOE) and other agencies have faced increasing scrutiny for a wide range of environmental issues related to past and current practices. A number of large-scale applications have been undertaken that required analysis of large numbers of potential environmental issues over a wide range of environmental conditions and contaminants. Several of these applications, referred to here as large-scale applications, have addressed long-term public health risks using a holistic approach for assessing impacts from potential waterborne and airborne transport pathways. Multimedia models such as the Multimedia Environmental Pollutant Assessment System (MEPAS) were designed for use in such applications. MEPAS integrates radioactive and hazardous contaminants impact computations for major exposure routes via air, surface water, ground water, and overland flow transport. A number of large-scale applications of MEPAS have been conducted to assess various endpoints for environmental and human health impacts. These applications are described in terms of lessons learned in the development of an effective approach for large-scale applications.

Decentralized Large-Scale Power Balancing

DEFF Research Database (Denmark)

Halvgaard, Rasmus; Jørgensen, John Bagterp; Poulsen, Niels Kjølstad

2013-01-01

problem is formulated as a centralized large-scale optimization problem but is then decomposed into smaller subproblems that are solved locally by each unit connected to an aggregator. For large-scale systems the method is faster than solving the full problem and can be distributed to include an arbitrary...
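The snippet describes a centralized problem decomposed into subproblems solved locally by each unit, coordinated by an aggregator. One standard way to do this, shown here as a swapped-in illustration rather than the method confirmed for this paper, is price-based dual decomposition: the aggregator broadcasts a price, each unit solves its own small problem, and the price is nudged by the mismatch between total consumption and the target. The quadratic unit costs, bounds, and names (`local_response`, `balance`) are hypothetical.

```python
def local_response(a, r, p_min, p_max, price):
    """Unit subproblem: minimise 0.5*a*(p - r)**2 + price*p, p_min <= p <= p_max.

    The unconstrained minimiser is p = r - price/a; clipping it to the bounds
    gives the constrained minimiser because the objective is a 1D convex quadratic.
    """
    return min(max(r - price / a, p_min), p_max)


def balance(units, target, step=0.5, iters=1000, tol=1e-9):
    """Aggregator: adjust a common price until total consumption equals target.

    units: list of (a, r, p_min, p_max) tuples, one per connected unit.
    """
    price, powers = 0.0, []
    for _ in range(iters):
        # Each unit solves its own subproblem (in parallel in a real deployment).
        powers = [local_response(*u, price) for u in units]
        mismatch = sum(powers) - target
        if abs(mismatch) < tol:
            break
        price += step * mismatch   # dual (sub)gradient ascent on the price
    return price, powers
```

For unclipped units the mismatch shrinks by a factor of |1 - step·Σ(1/a_i)| per iteration, so this sketch converges when 0 < step·Σ(1/a_i) < 2. Only the price and each unit's total travel over the network, which is what makes the scheme attractive for a large number of units.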
Large Scale Earth's Bow Shock with Northern IMF as Simulated by PIC Code in Parallel with MHD Model

Science.gov (United States)

Baraka, Suleiman

2016-06-01

In this paper, we propose a 3D kinetic model (particle-in-cell, PIC) for the description of the large-scale Earth's bow shock. The proposed version is stable and does not require huge or extensive computer resources. Because PIC simulations work with scaled plasma and field parameters, we also propose to validate our code by comparing its results with the available MHD simulations under the same scaled solar wind (SW) and interplanetary magnetic field (IMF) conditions. We report new results from the two models. In both codes the Earth's bow shock position is found to be ≈14.8 $R_E$ along the Sun-Earth line, and ≈29 $R_E$ on the dusk side. Those findings are consistent with past in situ observations. Both simulations reproduce the theoretical jump conditions at the shock. However, the PIC code density and temperature distributions are inflated and slightly shifted sunward when compared to the MHD results. Kinetic electron motions and reflected ions upstream may cause this sunward shift. Species distributions in the foreshock region are depicted within the transition of the shock (measured ≈2 $c/\omega_{pi}$ for $\Theta_{Bn} = 90^\circ$ and $M_{MS} = 4.7$) and in the downstream. The size of the foot jump in the magnetic field at the shock is measured to be 1.7 $c/\omega_{pi}$. In the foreshock region, the thermal velocity is found to be 213 km s$^{-1}$ at 15 $R_E$ and 63 km s$^{-1}$ at 12 $R_E$ (magnetosheath region). Despite the large cell size of the current version of the PIC code, it can retain the macrostructure of planetary magnetospheres in a very short time, so it can be used for pedagogical test purposes. It is also likely to complement MHD in deepening our understanding of the large-scale magnetosphere.
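Shock-transition widths above are quoted in units of the ion inertial length $c/\omega_{pi}$, where $\omega_{pi} = \sqrt{n e^2 / (\varepsilon_0 m_p)}$ is the proton plasma frequency. The helper below converts that unit to kilometres for a proton plasma of a given density. It is a worked formula, not part of the paper's code; the density used in the test (5 cm$^{-3}$, a typical solar-wind value) is an assumption, and since PIC runs use scaled parameters, such conversions to physical units are only approximate.

```python
import math

# CODATA 2018 constants in SI units
C = 299_792_458.0              # speed of light, m/s
E = 1.602_176_634e-19          # elementary charge, C
EPS0 = 8.854_187_8128e-12      # vacuum permittivity, F/m
M_P = 1.672_621_923_69e-27     # proton mass, kg


def ion_inertial_length_km(n_cm3):
    """Ion inertial length c / omega_pi (km) for proton number density n (cm^-3)."""
    n = n_cm3 * 1e6                                  # cm^-3 -> m^-3
    omega_pi = math.sqrt(n * E**2 / (EPS0 * M_P))    # proton plasma frequency, rad/s
    return C / omega_pi / 1e3                        # m -> km
```

At 5 cm$^{-3}$ this gives roughly 102 km, so a transition of ≈2 $c/\omega_{pi}$ would span about 200 km at that density. The length scales as $n^{-1/2}$.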
Automating large-scale reactor systems

International Nuclear Information System (INIS)

Kisner, R.A.

1985-01-01

This paper conveys a philosophy for developing automated large-scale control systems that behave in an integrated, intelligent, flexible manner. Methods for operating large-scale systems under varying degrees of equipment degradation are discussed, and a design approach that separates the effort into phases is suggested. 5 refs., 1 fig
Parallel analysis tools and new visualization techniques for ultra-large climate data set

Energy Technology Data Exchange (ETDEWEB)

Middleton, Don [National Center for Atmospheric Research, Boulder, CO (United States); Haley, Mary [National Center for Atmospheric Research, Boulder, CO (United States)

2014-12-10

ParVis was a project funded under LAB 10-05: “Earth System Modeling: Advanced Scientific Visualization of Ultra-Large Climate Data Sets”. Argonne was the lead lab with partners at PNNL, SNL, NCAR and UC-Davis. This report covers progress from January 1st, 2013 through Dec 1st, 2014. Two previous reports covered the period from Summer, 2010, through September 2011 and October 2011 through December 2012, respectively. While the project was originally planned to end on April 30, 2013, personnel and priority changes allowed many of the institutions to continue work through FY14 using existing funds. A primary focus of ParVis was introducing parallelism to climate model analysis to greatly reduce the time-to-visualization for ultra-large climate data sets. Work in the first two years was conducted on two tracks with different time horizons: one track to provide immediate help to climate scientists already struggling to apply their analysis to existing large data sets and another focused on building a new data-parallel library and tool for climate analysis and visualization that will give the field a platform for performing analysis and visualization on ultra-large datasets for the foreseeable future. In the final 2 years of the project, we focused mostly on the new data-parallel library and associated tools for climate analysis and visualization.
A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images

Science.gov (United States)

Afshar, Yaser; Sbalzarini, Ivo F.

2016-01-01

Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers collectively solve the global segmentation problem. This not only enables segmentation of large images (we test images of up to $10^{10}$ pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments. PMID:27046144
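Decomposing an image into sub-images for distributed processing usually pads each tile with a "halo" (ghost layer) of neighbouring pixels so that updates near tile edges can see across the boundary. The sketch below shows that decomposition for a 2D array under stated assumptions: the halo width, the even tile grid, and the helper names `split_with_halo` and `reassemble` are hypothetical, and the message passing that refreshes halos between iterations in a distributed run is not shown.

```python
import numpy as np


def split_with_halo(image, tiles_y, tiles_x, halo=1):
    """Cut a 2D image into tiles_y x tiles_x sub-images, each padded with up to
    `halo` ghost pixels copied from neighbouring tiles (clipped at the border).

    Returns a list of (core, inner, padded): `core` locates the tile's own pixels
    in the global image and `inner` locates the same pixels inside `padded`.
    """
    h, w = image.shape
    ys = np.linspace(0, h, tiles_y + 1).astype(int)
    xs = np.linspace(0, w, tiles_x + 1).astype(int)
    parts = []
    for i in range(tiles_y):
        for j in range(tiles_x):
            y0, y1, x0, x1 = ys[i], ys[i + 1], xs[j], xs[j + 1]
            py0, py1 = max(y0 - halo, 0), min(y1 + halo, h)
            px0, px1 = max(x0 - halo, 0), min(x1 + halo, w)
            core = (slice(y0, y1), slice(x0, x1))
            inner = (slice(y0 - py0, y1 - py0), slice(x0 - px0, x1 - px0))
            parts.append((core, inner, image[py0:py1, px0:px1].copy()))
    return parts


def reassemble(parts, shape, dtype):
    """Write each tile's core pixels back into a full-size image."""
    out = np.empty(shape, dtype=dtype)
    for core, inner, padded in parts:
        out[core] = padded[inner]
    return out
```

In a distributed version each padded tile would live on a different process, so no single machine needs to hold the whole image in memory.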
A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images.

Directory of Open Access Journals (Sweden)

Yaser Afshar

Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers collectively solve the global segmentation problem. This not only enables segmentation of large images (we test images of up to $10^{10}$ pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments.
Synchronization Of Parallel Discrete Event Simulations

Science.gov (United States)

Steinman, Jeffrey S.

1992-01-01

Adaptive, parallel, discrete-event-simulation synchronization algorithm, Breathing Time Buckets, developed in the Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while the simulation is in progress. Combines the best of optimistic and conservative synchronization strategies while avoiding their major disadvantages. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
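The core idea of a "breathing" cycle can be sketched as follows: each node processes events optimistically but holds back the messages they generate; the earliest held-back timestamp is the node's event horizon; the minimum over all nodes (the global event horizon) bounds what can be committed safely, and anything processed at or beyond it is rolled back locally. The sketch below emulates one such cycle sequentially. It is a simplified illustration, not SPEEDES code: `btb_cycle`, the queue layout, and the requirement that the handler be side-effect free (so rollback only needs to requeue events) are assumptions.

```python
import heapq


def btb_cycle(queues, handler):
    """Emulate one Breathing Time Buckets cycle for several nodes, sequentially.

    queues:  per-node heaps of (timestamp, event) tuples.
    handler: handler(node, timestamp, event) -> list of (dest_node, new_timestamp,
             new_event); each new_timestamp must be strictly greater than the
             timestamp. The handler is assumed side-effect free, so rollback only
             requeues the event (state saving is omitted).
    Returns (global_event_horizon, number_of_committed_events).
    """
    processed, horizons = [], []
    for node, queue in enumerate(queues):
        done, horizon = [], float("inf")
        # Process optimistically while the next event precedes the earliest
        # event this node has generated in this cycle (its local event horizon).
        while queue and queue[0][0] < horizon:
            ts, ev = heapq.heappop(queue)
            generated = handler(node, ts, ev)          # held back, not sent yet
            done.append((ts, ev, generated))
            for _, new_ts, _ in generated:
                horizon = min(horizon, new_ts)
        processed.append(done)
        horizons.append(horizon)

    # No held-back message is earlier than the global event horizon, so every
    # event processed strictly before it is safe to commit.
    geh = min(horizons, default=float("inf"))
    committed, outgoing = 0, []
    for node, done in enumerate(processed):
        for ts, ev, generated in done:
            if ts < geh:
                committed += 1
                outgoing.extend(generated)              # release its messages
            else:
                heapq.heappush(queues[node], (ts, ev))  # local rollback only
    for dest, new_ts, new_ev in outgoing:
        heapq.heappush(queues[dest], (new_ts, new_ev))
    return geh, committed
```

Because messages are released only after the horizon is known, a rollback never has to chase messages already sent to other nodes, which is how the scheme combines optimistic processing with conservative message release.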
Massively Parallel Computing: A Sandia Perspective

Energy Technology Data Exchange (ETDEWEB)

Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.

1999-05-06

The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant breakthroughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.
Implicit solvers for large-scale nonlinear problems

International Nuclear Information System (INIS)

Keyes, David E; Reynolds, Daniel R; Woodward, Carol S

2006-01-01

Computational scientists are grappling with increasingly complex, multi-rate applications that couple such physical phenomena as fluid dynamics, electromagnetics, radiation transport, chemical and nuclear reactions, and wave and material propagation in inhomogeneous media. Parallel computers with large storage capacities are paving the way for high-resolution simulations of coupled problems; however, hardware improvements alone will not prove enough to enable simulations based on brute-force algorithmic approaches. To accurately capture nonlinear couplings between dynamically relevant phenomena, often while stepping over rapid adjustments to quasi-equilibria, simulation scientists are increasingly turning to implicit formulations that require a discrete nonlinear system to be solved for each time step or steady state solution. Recent advances in iterative methods have made fully implicit formulations a viable option for solution of these large-scale problems. In this paper, we overview one of the most effective iterative methods, Newton-Krylov, for nonlinear systems and point to software packages with its implementation. We illustrate the method with an example from magnetically confined plasma fusion and briefly survey other areas in which implicit methods have bestowed important advantages, such as allowing high-order temporal integration and providing a pathway to sensitivity analyses and optimization. Lastly, we overview algorithm extensions under development motivated by current SciDAC applications
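Newton-Krylov pairs an outer Newton iteration with an inner Krylov solve of $J(u)\,\delta u = -F(u)$. Krylov methods need only Jacobian-vector products, which can be approximated by a finite difference $J(u)v \approx (F(u+sv)-F(u))/s$, giving the "Jacobian-free" variant. The sketch below applies this to a hypothetical 1D test problem, $-u'' + u^3 = f$ with zero boundary values. Conjugate gradients is used as the Krylov solver only because this particular Jacobian is symmetric positive definite; GMRES is the usual general-purpose choice. The function names and tolerances are assumptions. Production implementations exist in libraries such as PETSc (SNES) and SUNDIALS (KINSOL), and SciPy offers `scipy.optimize.newton_krylov`.

```python
import numpy as np


def residual(u, f, h):
    """Discrete F(u) = -u'' + u**3 - f on interior points, with u = 0 at both ends."""
    lap = 2.0 * u
    lap[1:] -= u[:-1]
    lap[:-1] -= u[1:]
    return lap / h**2 + u**3 - f


def jacobian_vector(F, u, Fu, v, eps=1e-7):
    """Matrix-free directional derivative J(u) v ~= (F(u + s v) - F(u)) / s."""
    s = eps * (1.0 + np.linalg.norm(u)) / np.linalg.norm(v)
    return (F(u + s * v) - Fu) / s


def conjugate_gradient(matvec, b, rtol=1e-6, maxiter=500):
    """Krylov solve of J x = b using only products J p (J assumed SPD here)."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(maxiter):
        if np.sqrt(rr) <= rtol * b_norm:
            break
        Jp = matvec(p)
        alpha = rr / (p @ Jp)
        x += alpha * p
        r -= alpha * Jp
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x


def newton_krylov(F, u0, tol=1e-8, max_newton=20):
    """Inexact Newton: each step solves J du = -F(u) only to a loose tolerance."""
    u = u0.copy()
    for _ in range(max_newton):
        Fu = F(u)
        if np.linalg.norm(Fu) < tol:
            break
        du = conjugate_gradient(lambda v: jacobian_vector(F, u, Fu, v), -Fu)
        u = u + du
    return u
```

The Jacobian matrix is never formed or stored, which is what makes fully implicit formulations affordable for large coupled systems; in practice, preconditioning the Krylov solve is what keeps the inner iteration counts low.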
SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Large Scale

Energy Technology Data Exchange (ETDEWEB)

Meng, Jintao; Seo, Sangmin; Balaji, Pavan; Wei, Yanjie; Wang, Bingqiang; Feng, Shengzhong

2016-08-16

In this paper, we analyze and optimize the most time-consuming steps of the SWAP-Assembler, a parallel genome assembler, so that it can scale to a large number of cores for huge genomes with sequencing data ranging in size from terabytes to petabytes. According to the performance analysis results, the most time-consuming steps are input parallelization, k-mer graph construction, and graph simplification (edge merging). For input parallelization, the input data is divided into virtual fragments of nearly equal size, and the start and end positions of each fragment are automatically adjusted to fall at the beginning of a read. In k-mer graph construction, to improve communication efficiency, the message size between any two processes is kept constant by increasing the number of nucleotides read in the input parallelization step for each round in proportion to the number of processes. Memory usage also decreases because only a small part of the input data is processed in each round. For graph simplification, the communication protocol reduces the number of communication loops from four to two and decreases idle communication time. The optimized assembler is denoted SWAP-Assembler 2 (SWAP2). In our experiments using a 4-terabyte dataset from the 1000 Genomes project (the largest dataset ever used for assembly) on the supercomputer Mira, SWAP2 scales to 131,072 cores with an efficiency of 40%. We also compared our work with both the HipMer assembler and the SWAP-Assembler. On the Yanhuang dataset of 300 gigabytes, SWAP2 shows a 3X speedup and 4X better scalability compared with the HipMer assembler and is 45 times faster than the SWAP-Assembler. The SWAP2 software is available at https://sourceforge.net/projects/swapassembler.
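The input-parallelization step described above (nearly equal virtual fragments whose boundaries are moved to the start of a read) can be sketched for FASTA input as follows. This is a hypothetical illustration, not SWAP2 code: it assumes records start with a line beginning with `>`, which does not hold for FASTQ, where quality lines may begin with `@`, and the name `fragment_bounds` is invented for the example.

```python
def fragment_bounds(data: bytes, num_procs: int, record_start: bytes = b">"):
    """Split a FASTA-style buffer into num_procs nearly equal fragments whose
    boundaries fall at the start of a read (a line beginning with record_start).

    Returns a list of (start, end) byte offsets covering the whole buffer.
    """
    size = len(data)
    cuts = [0]
    for p in range(1, num_procs):
        pos = max(p * size // num_procs, cuts[-1])   # nominal equal-size cut
        # Advance to the first line that starts a new record.
        while pos < size and not (
            data.startswith(record_start, pos)
            and (pos == 0 or data[pos - 1:pos] == b"\n")
        ):
            pos += 1
        cuts.append(pos)
    cuts.append(size)
    return [(cuts[i], cuts[i + 1]) for i in range(num_procs)]
```

In a parallel run each rank can compute only its own boundaries by seeking to `p * size // num_procs` in a shared file and scanning forward, so no rank needs to read the whole input.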
Alignment between galaxies and large-scale structure

International Nuclear Information System (INIS)

Faltenbacher, A.; Li Cheng; White, Simon D. M.; Jing, Yi-Peng; Mao Shude; Wang Jie

2009-01-01

Based on the Sloan Digital Sky Survey DR6 (SDSS) and the Millennium Simulation (MS), we investigate the alignment between galaxies and large-scale structure. For this purpose, we develop two new statistical tools, namely the alignment correlation function and the cos(2θ)-statistic. The former is a two-dimensional extension of the traditional two-point correlation function and the latter is related to the ellipticity correlation function used for cosmic shear measurements. Both are based on the cross correlation between a sample of galaxies with orientations and a reference sample which represents the large-scale structure. We apply the new statistics to the SDSS galaxy catalog. The alignment correlation function reveals an overabundance of reference galaxies along the major axes of red, luminous ($L \sim L^*$) galaxies out to projected separations of 60 $h^{-1}$ Mpc. The signal increases with central galaxy luminosity. No alignment signal is detected for blue galaxies. The cos(2θ)-statistic yields very similar results. Starting from a MS semi-analytic galaxy catalog, we assign an orientation to each red, luminous and central galaxy, based on that of the central region of the host halo (with size similar to that of the stellar galaxy). As an alternative, we use the orientation of the host halo itself. We find a mean projected misalignment between a halo and its central region of $\sim 25^\circ$. The misalignment decreases slightly with increasing luminosity of the central galaxy. Using the orientations and luminosities of the semi-analytic galaxies, we repeat our alignment analysis on mock surveys of the MS. Agreement with the SDSS results is good if the central orientations are used. Predictions using the halo orientations as proxies for central galaxy orientations overestimate the observed alignment by more than a factor of 2. Finally, the large volume of the MS allows us to generate a two-dimensional map of the alignment correlation function, which shows the reference
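The idea behind the cos(2θ)-statistic is that θ is the angle between a galaxy's projected major axis and the direction to a reference galaxy, so ⟨cos 2θ⟩ > 0 signals an excess of neighbours along the major axis and ⟨cos 2θ⟩ = 0 is expected for isotropy. The sketch below computes a simplified pair-averaged version for one central galaxy in one separation bin. It is an assumption-laden illustration, not the paper's estimator: it ignores random catalogs, weighting, edge corrections, and averaging over many centrals, and `mean_cos2theta` is a hypothetical name.

```python
import math


def mean_cos2theta(center, major_axis_angle, references, r_min, r_max):
    """Mean cos(2*theta) over reference galaxies at projected separation in
    [r_min, r_max), where theta is the angle between the central galaxy's
    major axis and the direction to the reference galaxy.

    Returns NaN if no reference galaxy falls in the separation bin.
    """
    cx, cy = center
    total, count = 0.0, 0
    for x, y in references:
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)
        if not (r_min <= r < r_max):
            continue
        theta = math.atan2(dy, dx) - major_axis_angle
        total += math.cos(2.0 * theta)   # +1 along the major axis, -1 perpendicular
        count += 1
    return total / count if count else float("nan")
```

Using 2θ rather than θ makes the statistic invariant under a 180° flip of the axis, which is appropriate because a major axis has no preferred direction.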
Integration experiences and performance studies of A COTS parallel archive systems

Energy Technology Data Exchange (ETDEWEB)

Chen, Hsing-bung [Los Alamos National Laboratory; Scott, Cody [Los Alamos National Laboratory; Grider, Gary [Los Alamos National Laboratory; Torres, Aaron [Los Alamos National Laboratory; Turley, Milton [Los Alamos National Laboratory; Sanchez, Kathy [Los Alamos National Laboratory; Bremer, John [Los Alamos National Laboratory

2010-01-01

Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interfaces, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially in metadata search speed, for example through more caching and less robust semantics. Currently, the number of extremely scalable parallel archive solutions is very small, especially of those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, ls, tar, etc. We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner, and demonstrated its capability to address requirements of
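Feature (a), moving a single large file as many parallel streams, comes down to splitting the file into byte ranges and copying the ranges concurrently. The sketch below illustrates only that idea, with local files standing in for the parallel file system and tape drives; in an archive each stream would go to a separate drive. It is not the LANL system, and the names `copy_range` and `parallel_copy` are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def copy_range(src, dst, offset, length, bufsize=1 << 20):
    """Copy bytes [offset, offset + length) of src into the same range of dst."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        remaining = length
        while remaining > 0:
            chunk = fin.read(min(bufsize, remaining))
            if not chunk:
                break
            fout.write(chunk)
            remaining -= len(chunk)


def parallel_copy(src, dst, num_streams=4):
    """Move one large file as num_streams concurrent byte-range streams."""
    size = os.path.getsize(src)
    with open(dst, "wb") as f:
        f.truncate(size)       # pre-size dst so ranges can be written independently
    step = -(-size // num_streams) if size else 0   # ceiling division
    ranges = [(off, min(step, size - off)) for off in range(0, size, step)] if step else []
    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        futures = [pool.submit(copy_range, src, dst, off, n) for off, n in ranges]
        for fut in futures:
            fut.result()       # re-raise any I/O error from a worker
    return ranges
```

Each worker opens its own file handles, so streams never share a file position; that independence is what lets the ranges proceed in parallel.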
Integration experiments and performance studies of a COTS parallel archive system

Energy Technology Data Exchange (ETDEWEB)

Chen, Hsing-bung [Los Alamos National Laboratory; Scott, Cody [Los Alamos National Laboratory; Grider, Gary [Los Alamos National Laboratory; Torres, Aaron [Los Alamos National Laboratory; Turley, Milton [Los Alamos National Laboratory; Sanchez, Kathy [Los Alamos National Laboratory; Bremer, John [Los Alamos National Laboratory

2010-06-16

Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interfaces, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially in metadata search speed, for example through more caching and less robust semantics. Currently, the number of extremely scalable parallel archive solutions is very small, especially of those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, ls, tar, etc. We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner machine, and demonstrated its capability to address
Optimization of large-scale industrial systems : an emerging method

Energy Technology Data Exchange (ETDEWEB)

Hammache, A.; Aube, F.; Benali, M.; Cantave, R. [Natural Resources Canada, Varennes, PQ (Canada). CANMET Energy Technology Centre

2006-07-01

This paper reviewed optimization methods of large-scale industrial production systems and presented a novel systematic multi-objective and multi-scale optimization methodology. The methodology was based on a combined local optimality search with global optimality determination, and advanced system decomposition and constraint handling. The proposed method focused on the simultaneous optimization of the energy, economy and ecology aspects of industrial systems (E³-ISO). The aim of the methodology was to provide guidelines for decision-making strategies. The approach was based on evolutionary algorithms (EA) with specifications including hybridization of global optimality determination with a local optimality search; a self-adaptive algorithm to account for the dynamic changes of operating parameters and design variables occurring during the optimization process; interactive optimization; advanced constraint handling and decomposition strategy; and object-oriented programming and parallelization techniques. Flowcharts of the working principles of the basic EA were presented. It was concluded that the EA uses a novel decomposition and constraint handling technique to enhance the Pareto solution search procedure for multi-objective problems. 6 refs., 9 figs.
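Multi-objective EAs of this kind keep a population of candidate designs and retain the non-dominated (Pareto-optimal) ones. The sketch below shows the basic dominance test and a simple O(n²) Pareto filter, assuming every objective is minimized (for example energy use, cost, and emissions as a stand-in for the energy/economy/ecology objectives). This is a generic illustration, not the paper's algorithm; practical EAs such as NSGA-II use faster non-dominated sorting.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b in every objective and
    strictly better in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors,
    preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Each point on the resulting front represents a different trade-off, and choosing among them is the decision-making step the methodology aims to support.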
Parallel Computing Strategies for Irregular Algorithms

Science.gov (United States)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
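Irregular applications produce tasks of very unequal cost, so naive round-robin assignment leaves some processors idle. As a swapped-in illustration of load balancing (the record does not specify the authors' techniques), the sketch below uses the classic greedy longest-processing-time-first (LPT) heuristic: sort tasks by decreasing cost and always give the next task to the currently least-loaded processor. The function name `lpt_assign` is hypothetical. Unlike mesh partitioners such as METIS, this heuristic ignores communication between tasks.

```python
import heapq


def lpt_assign(task_costs, num_procs):
    """Greedy longest-processing-time-first assignment of irregular tasks.

    Returns (assignment, loads): assignment[i] is the processor given task i,
    and loads[p] is the total cost assigned to processor p.
    """
    heap = [(0.0, p) for p in range(num_procs)]   # (current load, processor)
    loads = [0.0] * num_procs
    assignment = [None] * len(task_costs)
    order = sorted(range(len(task_costs)), key=lambda i: task_costs[i], reverse=True)
    for i in order:
        load, p = heapq.heappop(heap)             # least-loaded processor
        assignment[i] = p
        loads[p] = load + task_costs[i]
        heapq.heappush(heap, (loads[p], p))
    return assignment, loads
```

Placing the expensive tasks first leaves the small ones to fill in gaps at the end, which is why LPT typically balances loads far better than assigning tasks in arrival order.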
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs

Directory of Open Access Journals (Sweden)

Vaughn Matthew

2010-11-01

-directed de Bruijn graph is a fundamental data structure for any sequence assembly program based on the Eulerian approach. Our algorithms for constructing Bi-directed de Bruijn graphs are efficient in parallel and out of core settings. These algorithms can be used in building large scale bi-directed de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communications in a parallel setting and perform better than the prior algorithms. Finally our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET.
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.

Science.gov (United States)

Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal

2010-11-15

any sequence assembly program based on the Eulerian approach. Our algorithms for constructing Bi-directed de Bruijn graphs are efficient in parallel and out of core settings. These algorithms can be used in building large scale bi-directed de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communications in a parallel setting and perform better than the prior algorithms. Finally our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET.
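In a bi-directed de Bruijn graph, a k-mer and its reverse complement share one node (its canonical, lexicographically smaller form). Each edge records the orientation in which it enters and leaves that node, so a read and its reverse complement yield the same graph. The sequential sketch below builds such edges from reads to show the data structure only; it does not reproduce the paper's parallel or out-of-core algorithms. The function names are hypothetical, and odd k is assumed so that no k-mer is its own reverse complement.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Return (canonical k-mer, orientation); '+' if kmer is already canonical."""
    rc = reverse_complement(kmer)
    return (kmer, "+") if kmer <= rc else (rc, "-")


def flip(orientation):
    return "-" if orientation == "+" else "+"


def bidirected_edges(reads, k):
    """Count edges between consecutive k-mers of each read in a bi-directed
    de Bruijn graph whose nodes are canonical k-mers (use odd k)."""
    edges = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k):
            a, oa = canonical(read[i:i + k])
            b, ob = canonical(read[i + 1:i + k + 1])
            # Walking (a, oa) -> (b, ob) on one strand is the same edge as
            # (b, flip(ob)) -> (a, flip(oa)) on the other; store one form.
            forward = (a, oa, b, ob)
            backward = (b, flip(ob), a, flip(oa))
            edges[min(forward, backward)] += 1
    return dict(edges)
```

Storing each edge in a single canonical form is what halves the memory needed compared with keeping both strands explicitly.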
Large-scale structure after COBE: Peculiar velocities and correlations of cold dark matter halos

Science.gov (United States)

Zurek, Wojciech H.; Quinn, Peter J.; Salmon, John K.; Warren, Michael S.

1994-01-01

Large N-body simulations on parallel supercomputers allow one to simultaneously investigate large-scale structure and the formation of galactic halos with unprecedented resolution. Our study shows that the masses as well as the spatial distribution of halos on scales of tens of megaparsecs in a cold dark matter (CDM) universe with the spectrum normalized to the anisotropies detected by the Cosmic Background Explorer (COBE) are compatible with the observations. We also show that the average value of the relative pairwise velocity dispersion $\sigma_v$, used as a principal argument against COBE-normalized CDM models, is significantly lower for halos than for individual particles. When the observational methods of extracting $\sigma_v$ are applied to the redshift catalogs obtained from the numerical experiments, estimates differ significantly between different observation-sized samples and overlap observational estimates obtained following the same procedure.
Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-12-31

This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.
A hybrid parallel framework for the cellular Potts model simulations

Energy Technology Data Exchange (ETDEWEB)

Jiang, Yi [Los Alamos National Laboratory]; He, Kejing [South China Univ.]; Dong, Shoubin [South China Univ.]

2009-01-01

The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximate, and so cannot be used for large-scale, complex 3D simulations. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving, and SMP systems are increasingly common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for large-scale simulations (~10^8 sites) of the complex collective behavior of numerous cells (~10^6).
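To make the "Monte Carlo lattice update" concrete, here is a minimal sequential sketch of one CPM sweep on a periodic 2D lattice, assuming a common textbook Hamiltonian with a uniform adhesion energy J per unlike neighbour pair plus a quadratic volume constraint (strength lam, hypothetical target volume) for every cell except the medium (label 0). The parameters and function names are illustrative assumptions, not taken from the paper; a shared-memory parallel version would additionally have to avoid updating neighbouring sites concurrently, which the abstract does not detail.

```python
import math
import random

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def volume_energy(sigma, volume, target, lam):
    """Volume-constraint term; sigma == 0 is the medium and is unconstrained."""
    return 0.0 if sigma == 0 else lam * (volume - target) ** 2


def cpm_sweep(lattice, target, J=2.0, lam=1.0, T=10.0, rng=random):
    """One Monte Carlo sweep (n*n copy attempts) of a 2D Cellular Potts Model.

    lattice is an n x n list of lists of cell labels, updated in place.
    Returns the number of accepted copy attempts."""
    n = len(lattice)
    vol = {}
    for row in lattice:
        for s in row:
            vol[s] = vol.get(s, 0) + 1
    accepted = 0
    for _ in range(n * n):
        # Pick a random site and try to copy a random neighbour's label into it.
        x, y = rng.randrange(n), rng.randrange(n)
        dx, dy = rng.choice(NEIGHBOURS)
        old, new = lattice[x][y], lattice[(x + dx) % n][(y + dy) % n]
        if new == old:
            continue
        # Adhesion: change in the number of unlike neighbour pairs around (x, y).
        dH = 0.0
        for ex, ey in NEIGHBOURS:
            s = lattice[(x + ex) % n][(y + ey) % n]
            dH += J * ((s != new) - (s != old))
        # Volume: the old cell loses one site, the new cell gains one.
        dH += volume_energy(old, vol[old] - 1, target, lam) - volume_energy(old, vol[old], target, lam)
        dH += volume_energy(new, vol[new] + 1, target, lam) - volume_energy(new, vol[new], target, lam)
        # Metropolis acceptance at fluctuation temperature T.
        if dH <= 0 or rng.random() < math.exp(-dH / T):
            lattice[x][y] = new
            vol[old] -= 1
            vol[new] += 1
            accepted += 1
    return accepted
```

Passing a seeded random.Random instance as rng makes a run reproducible.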

Large scale three-dimensional topology optimisation of heat sinks cooled by natural convection

DEFF Research Database (Denmark)

Alexandersen, Joe; Sigmund, Ole; Aage, Niels

2016-01-01

the Boussinesq approximation. The fully coupled non-linear multiphysics system is solved using stabilised trilinear equal-order finite elements in a parallel framework, allowing the optimisation of large-scale problems with on the order of 20-330 million state degrees of freedom. The flow is assumed to be laminar [...] topologies verify prior conclusions regarding fin length/thickness ratios and Biot numbers, but also indicate that carefully tailored and complex geometries may improve cooling behaviour considerably compared to simple heat fin geometries. (C) 2016 Elsevier Ltd. All rights reserved.
Xyce parallel electronic simulator release notes.

Energy Technology Data Exchange (ETDEWEB)

Keiter, Eric R; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.

2010-05-01

The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance, and object-oriented code design and implementation. The Xyce release notes describe: hardware and software requirements; new features and enhancements; any defects fixed since the last release; and current known defects and defect workarounds. For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.
The Software Reliability of Large Scale Integration Circuit and Very Large Scale Integration Circuit

OpenAIRE

Artem Ganiyev; Jan Vitasek

2010-01-01

This article describes a method for evaluating the fault-free operation of large-scale integration (LSI) and very-large-scale integration (VLSI) circuits. It presents a comparative analysis of the factors that determine the fault-free operation of integrated circuits, an analysis of existing methods, and a model for evaluating the fault-free operation of LSI and VLSI circuits. The main part describes a proposed algorithm and program for analyzing the fault rate of LSI and VLSI circuits.
Scaling Behavior of Dilute Polymer Solutions Confined between Parallel Plates

NARCIS (Netherlands)

Vliet, J.H. van; Luyten, M.C.; Brinke, G. ten

1992-01-01

The average size and shape of a polymer coil confined in a slit between two parallel plates depend on the distance L between the plates. On the basis of numerical results, four different regimes can be distinguished. For large values of L the coil is essentially unconfined. For intermediate values
Coupling graph perturbation theory with scalable parallel algorithms for large-scale enumeration of maximal cliques in biological graphs

International Nuclear Information System (INIS)

Samatova, N F; Schmidt, M C; Hendrix, W; Breimyer, P; Thomas, K; Park, B-H

2008-01-01

Data-driven construction of predictive models for biological systems faces challenges from data intensity, uncertainty, and computational complexity. Data-driven model inference is often considered a combinatorial graph problem in which an enumeration of all feasible models is sought. The data-intensive and NP-hard nature of such problems, however, challenges existing methods to meet the required scale of data size and uncertainty, even on modern supercomputers. Maximal clique enumeration (MCE) in a graph derived from such biological data is often a rate-limiting step in detecting protein complexes in protein interaction data, finding clusters of co-expressed genes in microarray data, or identifying clusters of orthologous genes in protein sequence data. We report two key advances that address this challenge. We designed and implemented the first (to the best of our knowledge) parallel MCE algorithm that scales linearly on thousands of processors running MCE on real-world biological networks with thousands and hundreds of thousands of vertices. In addition, we proposed and developed the Graph Perturbation Theory (GPT), which establishes a foundation for efficiently solving the MCE problem in perturbed graphs, which model the uncertainty in the data. GPT formulates necessary and sufficient conditions for detecting the differences between the sets of maximal cliques in the original and perturbed graphs, and it reduces the enumeration time by more than 80% compared to complete recomputation.
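The abstract does not describe the parallel MCE algorithm itself. For orientation, here is the classic sequential Bron-Kerbosch enumeration with pivoting, a standard baseline for MCE; it is a swapped-in illustration, not the authors' method. The top-level branches of the recursion are independent of each other, which is one natural (assumed, not confirmed) unit of work to distribute across processors.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph.

    adj maps each vertex to the set of its neighbours (no self-loops).
    Bron-Kerbosch with pivoting: R is the current clique, P the candidates
    that can extend it, X the vertices already explored from this branch."""
    cliques = []

    def expand(R, P, X):
        if not P and not X:
            cliques.append(R)  # R cannot be extended, so it is maximal
            return
        # Every maximal clique contains the pivot or a non-neighbour of it,
        # so branching only on P minus N(pivot) still finds all cliques.
        pivot = max(P | X, key=lambda u: len(P & adj[u]))
        for v in list(P - adj[pivot]):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(frozenset(), set(adj), set())
    return cliques
```

For a triangle a-b-c with a pendant edge c-d, this returns the two maximal cliques {a, b, c} and {c, d}.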
Apparatus to examine pulsed parallel field losses in large conductors

International Nuclear Information System (INIS)

Miller, J.R.; Shen, S.S.

1977-01-01

Conductors in tokamak toroidal field coils will be exposed to pulsed fields both parallel and perpendicular to the current direction. These conductors will likely have quite high current capacity (10 to 20 kA) and will therefore probably be built up out of smaller units. We have previously published measurements of losses in conductors exposed to a pulsed parallel field, but those experiments necessarily used monolithic conductors of relatively small cross section because the pulse coil, a torus that surrounded the test conductor, was itself small. Here we describe an apparatus that is conceptually similar but has been scaled up to accept conductors of much larger cross section and current capacity. The apparatus consists basically of a superconducting torus that contains a movable spool to allow test samples to be wound inside without unwinding the torus. Details of the apparatus design and capabilities are described, and preliminary results from tests of the apparatus and from loss measurements using it are reported.
Accurate and Efficient Parallel Implementation of an Effective Linear-Scaling Direct Random Phase Approximation Method.

Science.gov (United States)

Graf, Daniel; Beuerle, Matthias; Schurkus, Henry F; Luenser, Arne; Savasci, Gökcen; Ochsenfeld, Christian

2018-05-08

An efficient algorithm for calculating the random phase approximation (RPA) correlation energy is presented that is as accurate as the canonical molecular orbital resolution-of-the-identity RPA (RI-RPA), with the important advantage of an effective linear-scaling behavior (instead of quartic) for large systems due to a formulation in the local atomic orbital space. The high accuracy is achieved by utilizing optimized minimax integration schemes and the local Coulomb metric attenuated by the complementary error function for the RI approximation. The memory bottleneck of former atomic orbital (AO)-RI-RPA implementations (Schurkus, H. F.; Ochsenfeld, C. J. Chem. Phys. 2016, 144, 031101 and Luenser, A.; Schurkus, H. F.; Ochsenfeld, C. J. Chem. Theory Comput. 2017, 13, 1647-1655) is addressed by precontraction of the large 3-center integral matrix with the Cholesky factors of the ground state density, reducing the memory requirements of that matrix by a factor of [Formula: see text]. Furthermore, we present a parallel implementation of our method, which not only leads to faster RPA correlation energy calculations but also to a scalable decrease in memory requirements, opening the door for investigations of large molecules even on small- to medium-sized computing clusters. Although it is known that AO methods are highly efficient for extended systems, where sparsity allows for reaching the linear-scaling regime, we show that our work also extends the applicability when considering highly delocalized systems for which no linear scaling can be achieved. As an example, the interlayer distance of two covalent organic framework pore fragments (comprising 384 atoms in total) is analyzed.
Phylogenetic distribution of large-scale genome patchiness

Directory of Open Access Journals (Sweden)

Hackenberg Michael

2008-04-01

Background: The phylogenetic distribution of large-scale genome structure (i.e., mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences and recently developed computational methods that analyze patchiness directly on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. Results: The local variations in the scaling exponent of Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and to uncover directly the characteristic scales present in genome sequences. Furthermore, shuffling experiments on selected genome regions showed that computationally identified, isochore-like regions are the biological source of the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana), and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. Conclusion: Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.
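As a rough illustration of Detrended Fluctuation Analysis on DNA, the sketch below computes a single global DFA exponent with linear (first-order) detrending. Two assumptions are made here and are not taken from the paper: the sequence is mapped to a +1 (G/C) / -1 (A/T) walk, and the exponent is fitted over a fixed list of window sizes. The study itself examines local variations of the exponent along the sequence, which this sketch does not attempt.

```python
import numpy as np


def dfa_exponent(seq, scales):
    """First-order DFA scaling exponent alpha of a DNA string.

    The sequence is mapped to a +1 (G/C) / -1 (A/T) walk and integrated into
    a profile. For each scale s the profile is cut into non-overlapping
    windows, a straight line is fitted and removed in every window, and F(s)
    is the RMS residual. alpha is the slope of log F(s) against log s."""
    steps = np.array([1.0 if base in "GC" else -1.0 for base in seq])
    profile = np.cumsum(steps - steps.mean())
    fluct = []
    for s in scales:
        x = np.arange(s)
        residuals = []
        for w in range(len(profile) // s):
            seg = profile[w * s:(w + 1) * s]
            slope, intercept = np.polyfit(x, seg, 1)
            residuals.append(np.mean((seg - (slope * x + intercept)) ** 2))
        fluct.append(np.sqrt(np.mean(residuals)))
    alpha, _ = np.polyfit(np.log(scales), np.log(fluct), 1)
    return alpha
```

An uncorrelated sequence gives alpha near 0.5, while a sequence built from long blocks of alternating GC-rich and AT-rich composition (a crude stand-in for isochore-like patchiness) gives a clearly larger exponent at these scales.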
Managing large-scale models: DBS

International Nuclear Information System (INIS)

1981-05-01

A set of fundamental management tools for developing and operating a large-scale model and data base system is presented. Based on experience in operating and developing a large-scale computerized system, the only reasonable way to gain strong management control of such a system is to implement appropriate controls and procedures. Chapter I discusses the purpose of the book. Chapter II classifies a broad range of generic management problems into three groups: documentation, operations, and maintenance. System problems are first identified, and then solutions for gaining management control are discussed. Chapters III, IV, and V present practical methods for dealing with these problems. These methods were developed for managing SEAS but have general application for large-scale models and data bases.
Large Scale Self-Organizing Information Distribution System

National Research Council Canada - National Science Library

Low, Steven

2005-01-01

This project investigates issues in "large-scale" networks. Here "large-scale" refers to networks with a large number of high-capacity nodes and transmission links, shared by a large number of users...
Self-assembly of highly fluorescent semiconductor nanorods into large scale smectic liquid crystal structures by coffee stain evaporation dynamics

International Nuclear Information System (INIS)

Nobile, Concetta; Carbone, Luigi; Fiore, Angela; Cingolani, Roberto; Manna, Liberato; Krahne, Roman

2009-01-01

We deposit droplets of nanorods dispersed in solvents on substrate surfaces and let the solvent evaporate. We find that strong contact line pinning leads to dense nanorod deposition inside coffee stain fringes, where we observe large scale lateral ordering of the nanorods with the long axis of the rods oriented parallel to the contact line. We observe birefringence of these coffee stain fringes by polarized microscopy and we find the direction of the extraordinary refractive index parallel to the long axis of the nanorods.
Development of a large-scale general purpose two-phase flow analysis code

International Nuclear Information System (INIS)

Terasaka, Haruo; Shimizu, Sensuke

2001-01-01

A general-purpose three-dimensional two-phase flow analysis code has been developed for solving large-scale problems in industrial fields. The code uses a two-fluid model to describe the conservation equations for two-phase flow so that it is applicable to various phenomena. Complicated geometrical conditions are modeled by the FAVOR method in structured grid systems, and the discretized equations are solved by a modified SIMPLEST scheme. To reduce computing time, the matrix solver for the pressure correction equation is parallelized with OpenMP. Numerical examples show that accurate solutions can be obtained efficiently and stably. (author)
Highly uniform parallel microfabrication using a large numerical aperture system

Energy Technology Data Exchange (ETDEWEB)

Zhang, Zi-Yu; Su, Ya-Hui, E-mail: ustcsyh@ahu.edu.cn, E-mail: dongwu@ustc.edu.cn [School of Electrical Engineering and Automation, Anhui University, Hefei 230601 (China); Zhang, Chen-Chu; Hu, Yan-Lei; Wang, Chao-Wei; Li, Jia-Wen; Chu, Jia-Ru; Wu, Dong, E-mail: ustcsyh@ahu.edu.cn, E-mail: dongwu@ustc.edu.cn [CAS Key Laboratory of Mechanical Behavior and Design of Materials, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei 230026 (China)

2016-07-11

In this letter, we report an improved algorithm to produce accurate phase patterns for generating highly uniform diffraction-limited multifocal arrays in a large numerical aperture objective system. It is shown that, based on the original diffraction integral, the uniformity of the diffraction-limited focal arrays can be improved from ∼75% to >97%, owing to the critical consideration of the aperture function and apodization effect associated with a large numerical aperture objective. The experimental results, e.g., 3 × 3 arrays of squares and triangles and seven microlens arrays with high uniformity, further verify the advantage of the improved algorithm. This algorithm enables laser parallel processing technology to realize uniform microstructures and functional devices in microfabrication systems with a large numerical aperture objective.
Visual Interfaces for Parallel Simulations (VIPS), Phase I

Data.gov (United States)

National Aeronautics and Space Administration — Configuring the 3D geometry and physics of large scale parallel physics simulations is increasingly complex. Given the investment in time and effort to run these...
Large scale structure and baryogenesis

International Nuclear Information System (INIS)

Kirilova, D.P.; Chizhov, M.V.

2001-08-01

We discuss a possible connection between large-scale structure formation and baryogenesis in the universe. An updated review of the observational indications for the presence of a very large scale, 120 h^-1 Mpc, in the distribution of the visible matter of the universe is provided. The possibility of generating a periodic distribution with the characteristic scale 120 h^-1 Mpc through a mechanism producing quasi-periodic baryon density perturbations during the inflationary stage is discussed. The evolution of the baryon charge density distribution is explored in the framework of a low-temperature boson condensate baryogenesis scenario. Both the observed very large scale of the visible matter distribution in the universe and the observed baryon asymmetry value could naturally appear as a result of the evolution of a complex scalar field condensate formed at the inflationary stage. Moreover, for some model parameters a natural separation of matter superclusters from antimatter ones can be achieved. (author)
A Parallel Algorithm for Connected Component Labelling of Gray-scale Images on Homogeneous Multicore Architectures

International Nuclear Information System (INIS)

Niknam, Mehdi; Thulasiraman, Parimala; Camorlinga, Sergio

2010-01-01

Connected component labelling is an essential step in image processing. We provide a parallel version of Suzuki's sequential connected component algorithm in order to speed up the labelling process. We also modify the algorithm to enable the labelling of gray-scale images. Because of the data dependencies in the algorithm, we used a pipeline-like method to exploit parallelism. The parallel algorithm achieved a speedup of 2.5 for an image size of 256 x 256 pixels using 4 processing threads.
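For readers unfamiliar with labelling gray-scale images, here is a minimal sketch of a generic two-pass, union-find labelling in which two 4-neighbouring pixels belong to the same component when they have the same gray value. This is a swapped-in textbook technique, not Suzuki's algorithm or the authors' pipelined parallel version; the equality criterion for gray levels is an assumption.

```python
def label_gray(image):
    """Two-pass connected component labelling of a gray-scale image.

    Two pixels are connected when they are 4-neighbours with the same gray
    value. Returns (labels, count) with final labels numbered 1..count."""
    rows, cols = len(image), len(image[0])
    parent = []  # union-find forest over provisional labels

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    labels = [[0] * cols for _ in range(rows)]
    # First pass: provisional labels taken from the up and left neighbours.
    for r in range(rows):
        for c in range(cols):
            v = image[r][c]
            up = labels[r - 1][c] if r > 0 and image[r - 1][c] == v else None
            left = labels[r][c - 1] if c > 0 and image[r][c - 1] == v else None
            if up is None and left is None:
                parent.append(len(parent))  # start a new provisional label
                labels[r][c] = len(parent) - 1
            elif up is not None and left is not None:
                labels[r][c] = up
                union(up, left)  # record that the two labels are equivalent
            else:
                labels[r][c] = up if up is not None else left
    # Second pass: map each provisional label to a compact final label.
    final = {}
    for r in range(rows):
        for c in range(cols):
            root = find(labels[r][c])
            labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels, len(final)
```

The first pass only looks at rows already processed, which is the kind of row-to-row dependency that a pipelined parallel scheme has to respect.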
Automatic management software for large-scale cluster system

International Nuclear Information System (INIS)

Weng Yunjian; Chinese Academy of Sciences, Beijing; Sun Gongxing

2007-01-01

At present, large-scale cluster systems are difficult to manage. Administrators face a heavy workload and must spend much time on the management and maintenance of the system. The nodes of a large-scale cluster easily become disorganized: with thousands of nodes housed in large machine rooms, administrators can easily confuse one machine with another. How can accurate management be carried out effectively in a large-scale cluster system? This article introduces ELFms for large-scale cluster systems and proposes an approach to realizing automatic management of large-scale cluster systems. (authors)
A review of advanced small-scale parallel bioreactor technology for accelerated process development: current state and future need.

Science.gov (United States)

Bareither, Rachel; Pollard, David

2011-01-01

The pharmaceutical and biotech industries face continued pressure to reduce development costs and accelerate process development. This challenge occurs alongside the need for increased upstream experimentation to support quality by design initiatives and the pursuit of predictive models from systems biology. A small-scale system enabling multiple reactions in parallel (n ≥ 20), with automated sampling and integrated with purification, would provide a significant (four- to fivefold) improvement to development timelines. State-of-the-art attempts to pursue high-throughput process development include shake flasks, microfluidic reactors, microtiter plates and small-scale stirred reactors. The limitations of these systems are compared to desired criteria to mimic large-scale commercial processes. The comparison shows that significant technological improvement is still required to provide automated solutions that can speed upstream process development. Copyright © 2010 American Institute of Chemical Engineers (AIChE).
Parallel finite elements with domain decomposition and its pre-processing

International Nuclear Information System (INIS)

Yoshida, A.; Yagawa, G.; Hamada, S.

1993-01-01

This paper describes a parallel finite element analysis using a domain decomposition method, together with the pre-processing for the parallel calculation. Computer simulations are about to replace experiments in various fields, and the models to be simulated tend to be extremely large. At the same time, the computational environment has changed drastically in recent years. In particular, parallel processing on massively parallel computers or computer networks is considered a promising technique. To achieve high efficiency in such a parallel computing environment, coarse task granularity and a well-balanced workload distribution are key issues. It is also important to reduce the cost of pre-processing in such parallel FEM. From this point of view, the authors developed a domain decomposition FEM with an automatic and dynamic task-allocation mechanism, together with an automatic mesh generation/domain subdivision system for it. (author)
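The core idea of non-overlapping domain decomposition FEM can be shown on a toy problem. The numpy sketch below assumes a hypothetical 1D model problem, -u'' = 1 on (0, 1) with u(0) = u(1) = 0, discretized with linear elements and split into two subdomains at x = 1/2. Each subdomain eliminates its own interior unknowns independently (the part that would run on separate processors), a small interface (Schur complement) equation is solved, and each subdomain then back-substitutes. It illustrates substructuring in general, not the authors' task-allocation or mesh-generation system.

```python
import numpy as np


def assemble_poisson_1d(n):
    """Linear-element stiffness matrix and load vector for -u'' = 1 on (0, 1)
    with u(0) = u(1) = 0 on n uniform elements (interior nodes only)."""
    h = 1.0 / n
    m = n - 1
    K = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h
    f = np.full(m, h)
    return K, f


def solve_two_subdomains(n):
    """Split at x = 1/2 (n even, n >= 4), eliminate each subdomain's interior
    unknowns, solve the interface equation, then back-substitute."""
    K, f = assemble_poisson_1d(n)
    g = n // 2 - 1                         # index of the interface node x = 1/2
    subdomains = [np.arange(0, g), np.arange(g + 1, n - 1)]
    S, rhs = K[g, g], f[g]
    local = []
    for idx in subdomains:                 # independent: one task per processor
        K_ii = K[np.ix_(idx, idx)]
        K_ig = K[idx, g]
        w = np.linalg.solve(K_ii, K_ig)    # K_ii^{-1} K_ig
        z = np.linalg.solve(K_ii, f[idx])  # K_ii^{-1} f_i
        S -= K_ig @ w                      # Schur complement contribution
        rhs -= K_ig @ z
        local.append((idx, z, w))
    u = np.empty(n - 1)
    u[g] = rhs / S                         # interface unknown
    for idx, z, w in local:                # back-substitution, also independent
        u[idx] = z - w * u[g]
    return u
```

For this 1D problem the linear-element nodal values are exact, so the result matches both a direct solve and u(x) = x(1 - x)/2 at the nodes.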
Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

Science.gov (United States)

Shrimankar, D D; Sathe, S R

2016-01-01

Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes, and programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node. Programs written in MPI can span multiple SMP nodes, but this programming paradigm incurs the overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores. We also present a communication model to approximate the overhead from communication in OpenMP loops. Our results are of interest for a large variety of input data files. We have developed our own load balancing and cache optimization techniques for the message passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various sizes of input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.

PMID: 27932868
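The abstract mentions a tile-based method and tile size as a tuning parameter but does not define the tiling. A common way to tile a global alignment is to fill the dynamic-programming matrix in square blocks swept in anti-diagonal (wavefront) order, because all blocks on one anti-diagonal depend only on earlier anti-diagonals. The sketch below assumes Needleman-Wunsch scoring with hypothetical match/mismatch/gap values and runs the tiles sequentially; it only demonstrates that the tiled order reproduces the untiled score.

```python
def _init(n, m, gap):
    """DP matrix with the first row and column set to linear gap penalties."""
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap
    return H


def _fill_cell(H, a, b, i, j, match, mismatch, gap):
    diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
    H[i][j] = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score, filled row by row."""
    H = _init(len(a), len(b), gap)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            _fill_cell(H, a, b, i, j, match, mismatch, gap)
    return H[len(a)][len(b)]


def nw_score_tiled(a, b, tile, match=1, mismatch=-1, gap=-2):
    """Same score, with the DP matrix filled in tile x tile blocks swept in
    anti-diagonal (wavefront) order. Tiles on one anti-diagonal depend only
    on tiles of earlier anti-diagonals, so they could run concurrently."""
    n, m = len(a), len(b)
    H = _init(n, m, gap)
    tiles_i, tiles_j = -(-n // tile), -(-m // tile)  # ceiling division
    for d in range(tiles_i + tiles_j - 1):
        for p in range(max(0, d - tiles_j + 1), min(tiles_i, d + 1)):
            q = d - p  # tile (p, q) lies on anti-diagonal d
            for i in range(p * tile + 1, min((p + 1) * tile, n) + 1):
                for j in range(q * tile + 1, min((q + 1) * tile, m) + 1):
                    _fill_cell(H, a, b, i, j, match, mismatch, gap)
    return H[n][m]
```

The tile size trades parallel slack (more, smaller tiles per wavefront) against per-tile overhead and cache reuse, which is consistent with the abstract's treatment of tile size as a performance parameter.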
Parallel-In-Time For Moving Meshes

Energy Technology Data Exchange (ETDEWEB)

Falgout, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Manteuffel, T. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Southworth, B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Schroder, J. B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2016-02-04

With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High-performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
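XBraid implements multigrid reduction in time (MGRIT). Its two-level variant with F-relaxation is known to be equivalent to Parareal, so Parareal gives a compact picture of how time steps can be parallelized. The sketch below applies Parareal to the scalar decay equation du/dt = -lam*u, used here as a hypothetical stand-in for a single diffusion mode. It is not the XBraid API and does not include a moving mesh. The coarse propagator takes one backward Euler step per time slice, and the fine propagator takes fine_steps backward Euler steps.

```python
def backward_euler(u, lam, dt, steps):
    """Integrate du/dt = -lam * u with `steps` backward Euler steps of size dt."""
    for _ in range(steps):
        u = u / (1.0 + lam * dt)
    return u


def parareal(u0, lam, T, n_slices, fine_steps, iterations):
    """Parareal on [0, T] split into n_slices time slices.

    Returns the approximations U[0..n_slices] at the slice boundaries."""
    dT = T / n_slices

    def coarse(u):
        return backward_euler(u, lam, dT, 1)

    def fine(u):
        return backward_euler(u, lam, dT / fine_steps, fine_steps)

    U = [u0]
    for n in range(n_slices):  # initial guess: one cheap serial coarse sweep
        U.append(coarse(U[n]))
    for _ in range(iterations):
        F_vals = [fine(U[n]) for n in range(n_slices)]  # independent per slice
        G_old = [coarse(U[n]) for n in range(n_slices)]
        new = [u0]
        for n in range(n_slices):  # serial coarse correction sweep
            new.append(coarse(new[n]) + F_vals[n] - G_old[n])
        U = new
    return U
```

The expensive fine solves in each iteration are independent across slices, while only the cheap coarse sweep is serial. After k iterations the first k slice values agree (up to rounding) with the sequential fine solution, so in the worst case n_slices iterations reproduce it exactly; speedup depends on converging in far fewer iterations than that.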
Large scale network-centric distributed systems

CERN Document Server

Sarbazi-Azad, Hamid

2014-01-01

A highly accessible reference offering a broad range of topics and insights on large-scale network-centric distributed systems. Evolving from the fields of high-performance computing and networking, large-scale network-centric distributed systems continue to grow as one of the most important topics in computing and communication and in many interdisciplinary areas. Dealing with both wired and wireless networks, this book focuses on the design and performance issues of such systems. Large Scale Network-Centric Distributed Systems provides in-depth coverage ranging from ground-level hardware issu
Large-Scale Outflows in Seyfert Galaxies

Science.gov (United States)

Colbert, E. J. M.; Baum, S. A.

1995-12-01

Highly collimated outflows extend out to Mpc scales in many radio-loud active galaxies. In Seyfert galaxies, which are radio-quiet, the outflows extend out to kpc scales and do not appear to be as highly collimated. In order to study the nature of large-scale (≳1 kpc) outflows in Seyferts, we have conducted optical, radio and X-ray surveys of a distance-limited sample of 22 edge-on Seyfert galaxies. Results of the optical emission-line imaging and spectroscopic survey imply that large-scale outflows are present in ≳1/4 of all Seyferts. The radio (VLA) and X-ray (ROSAT) surveys show that large-scale radio and X-ray emission is present at about the same frequency. Kinetic luminosities of the outflows in Seyferts are comparable to those in starburst-driven superwinds. Large-scale radio sources in Seyferts appear diffuse, but do not resemble radio halos found in some edge-on starburst galaxies (e.g. M82). We discuss the feasibility of the outflows being powered by the active nucleus (e.g. a jet) or a circumnuclear starburst.
GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations.

Science.gov (United States)

Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

2015-07-01

GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310-323. doi: 10.1002/wcms.1220.
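Among the REMD variants listed, temperature REMD (T-REMD) rests on a simple Metropolis criterion for swapping configurations between neighbouring temperatures: a swap between replicas at T_i and T_j with potential energies E_i and E_j is accepted with probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]), where beta = 1/(k_B T). The sketch below is a generic illustration of that rule, not GENESIS code or its input options; energies in kcal/mol are an assumption, and the alternating-offset neighbour sweep is one common scheme.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); energies assumed in kcal/mol


def exchange_probability(E_i, E_j, T_i, T_j):
    """Metropolis acceptance probability for swapping the configurations of
    replicas at temperatures T_i and T_j with potential energies E_i, E_j."""
    beta_i, beta_j = 1.0 / (K_B * T_i), 1.0 / (K_B * T_j)
    delta = (beta_i - beta_j) * (E_i - E_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def swap_sweep(energies, temps, rng, offset):
    """Attempt swaps between neighbouring temperatures (k, k+1) for
    k = offset, offset + 2, ...; energies[k] belongs to the configuration
    currently held at temps[k]. Accepted swaps are applied in place."""
    accepted = 0
    for k in range(offset, len(temps) - 1, 2):
        p = exchange_probability(energies[k], energies[k + 1], temps[k], temps[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            accepted += 1
    return accepted
```

Alternating offset between 0 and 1 on successive exchange attempts lets configurations diffuse across the whole temperature ladder. A swap that moves a lower-energy configuration to the colder replica is always accepted.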
Large-scale transportation network congestion evolution prediction using deep learning theory.

Science.gov (United States)

Ma, Xiaolei; Yu, Haiyang; Wang, Yunpeng; Wang, Yinhai

2015-01-01

Understanding how congestion at one location can cause ripples throughout a large-scale transportation network is vital for transportation researchers and practitioners to pinpoint traffic bottlenecks for congestion mitigation. Traditional studies rely on either mathematical equations or simulation techniques to model traffic congestion dynamics. However, most of these approaches have limitations, largely due to unrealistic assumptions and a cumbersome parameter calibration process. With the development of Intelligent Transportation Systems (ITS) and the Internet of Things (IoT), transportation data are becoming increasingly ubiquitous. This has triggered a series of data-driven studies of transportation phenomena. Among them, deep learning theory is considered one of the most promising techniques for handling enormous volumes of high-dimensional data. This study attempts to extend deep learning theory into large-scale transportation network analysis. A deep Restricted Boltzmann Machine and Recurrent Neural Network architecture is used to model and predict traffic congestion evolution based on Global Positioning System (GPS) data from taxis. A numerical study in Ningbo, China, is conducted to validate the effectiveness and efficiency of the proposed method. Results show that the prediction accuracy can reach as high as 88% within less than 6 minutes when the model is implemented in a Graphics Processing Unit (GPU)-based parallel computing environment. The predicted congestion evolution patterns can be visualized temporally and spatially through a map-based platform to identify the vulnerable links for proactive congestion mitigation.
Scale Interaction in a Mixing Layer. The Role of the Large-Scale Gradients

KAUST Repository

Fiscaletti, Daniele

2015-08-23

The interaction between scales is investigated in a turbulent mixing layer. The large-scale amplitude modulation of the small scales, already observed in other works, depends on the crosswise location. Large-scale positive fluctuations correlate with stronger small-scale activity on the low-speed side of the mixing layer and with reduced activity on the high-speed side. However, from physical considerations we would expect the scales to interact in a qualitatively similar way within the flow and across different turbulent flows. Therefore, in addition to the modulation by the large-scale fluctuations, the modulation of the small scales by the large-scale gradients has also been investigated.
A Web-based Distributed Voluntary Computing Platform for Large Scale Hydrological Computations

Science.gov (United States)

Demir, I.; Agliamzanov, R.

2014-12-01

Distributed volunteer computing can enable researchers and scientists to form large parallel computing environments that harness the computing power of millions of computers on the Internet, and to use them to run large-scale environmental simulations and models that serve the common good of local communities and the world. Recent developments in web technologies and standards allow client-side scripting languages to run at speeds close to those of native applications and to utilize the power of Graphics Processing Units (GPUs). Using a client-side scripting language such as JavaScript, we have developed an open distributed computing framework that makes it easy for researchers to write their own hydrologic models and run them on volunteer computers. Users can easily enable their websites so that visitors can volunteer their computer resources to help run advanced hydrological models and simulations. Using a web-based system allows users to start volunteering their computational resources within seconds, without installing any software. The framework distributes the model simulation to thousands of nodes in small spatial and computational units. A relational database system is used to manage data connections and the task queue for the distributed computing nodes. In this paper, we present a web-based distributed volunteer computing platform that enables large-scale hydrological simulations and model runs in an open and integrated environment.
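The queue-management role the abstract assigns to a relational database can be sketched with Python's built-in `sqlite3` module. The table layout, the tile-based split of a gridded watershed into work units and all function names are hypothetical; the actual platform runs its models in JavaScript in the volunteer's browser, and a server shared by many clients would also need proper locking (for example `BEGIN IMMEDIATE` transactions) rather than this single-connection demo.

```python
import json
import sqlite3

def make_queue():
    """In-memory task table: one row per small work unit."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending', node TEXT, result REAL)"
    )
    return conn

def split_domain(n_rows, n_cols, tile):
    """Cut an n_rows x n_cols grid into small rectangular work units."""
    for r0 in range(0, n_rows, tile):
        for c0 in range(0, n_cols, tile):
            yield {"r0": r0, "r1": min(r0 + tile, n_rows),
                   "c0": c0, "c1": min(c0 + tile, n_cols)}

def enqueue(conn, units):
    with conn:  # commits on success
        conn.executemany("INSERT INTO tasks (payload) VALUES (?)",
                         [(json.dumps(u),) for u in units])

def claim(conn, node):
    """Hand the oldest pending unit to a volunteer node, or return None."""
    with conn:
        row = conn.execute("SELECT id, payload FROM tasks WHERE status = 'pending'"
                           " ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        cur = conn.execute("UPDATE tasks SET status = 'running', node = ?"
                           " WHERE id = ? AND status = 'pending'", (node, row[0]))
        if cur.rowcount != 1:  # another claimant got there first
            return None
    return row[0], json.loads(row[1])

def complete(conn, task_id, result):
    with conn:
        conn.execute("UPDATE tasks SET status = 'done', result = ? WHERE id = ?",
                     (result, task_id))

def status_counts(conn):
    return dict(conn.execute("SELECT status, COUNT(*) FROM tasks GROUP BY status"))
```

A browser node would call `claim`, run the model on the returned tile, and report back with `complete`; the `status = 'pending'` guard in the `UPDATE` keeps a unit from being handed out twice.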
A Parallel, Multi-Scale Watershed-Hydrologic-Inundation Model with Adaptively Switching Mesh for Capturing Flooding and Lake Dynamics

Science.gov (United States)

Ji, X.; Shen, C.

2017-12-01

Flood inundation presents substantial societal hazards and also alters the biogeochemistry of systems like the Amazon. It is often expensive to simulate high-resolution flood inundation and propagation in a long-term, watershed-scale model. Due to the Courant-Friedrichs-Lewy (CFL) restriction, high resolution and large local flow velocities both demand prohibitively small time steps, even for parallel codes. Here we develop a parallel, process-based surface-subsurface model enhanced by multi-resolution meshes that are adaptively switched on or off. The high-resolution overland flow meshes are enabled only when the flood wave invades the floodplains. The model applies a semi-implicit, semi-Lagrangian (SISL) scheme to solve the dynamic wave equations and, with the assistance of the multi-mesh method, adaptively uses the dynamic wave equation only in areas of deep inundation. The model therefore achieves a balance between accuracy and computational cost.
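For an explicit shallow-water scheme, the CFL condition limits the time step to roughly dt ≤ C·dx / (|u| + √(g·h)), where √(g·h) is the gravity-wave speed at depth h. The sketch below shows why refining dx from 1000 m to 50 m shrinks the stable step twenty-fold, and how a simple depth test could decide where fine meshes are needed. The Courant number C = 0.9, the depth threshold and the function names are illustrative assumptions, not values or rules from the paper.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def cfl_time_step(dx, velocity, depth, courant=0.9):
    """Largest stable explicit step: dt <= C * dx / (|u| + sqrt(g h))."""
    return courant * dx / (abs(velocity) + math.sqrt(G * depth))

def cells_needing_fine_mesh(depths, threshold):
    """Indices of coarse cells deep enough to count as inundated (hypothetical rule)."""
    return [i for i, h in enumerate(depths) if h > threshold]
```

For the same flow, `cfl_time_step(50.0, 0.5, 2.0)` is one twentieth of `cfl_time_step(1000.0, 0.5, 2.0)`, and a faster flow shrinks it further; switching fine meshes on only in flagged cells confines that cost to the flooded area.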
Dissecting the large-scale galactic conformity

Science.gov (United States)

Seo, Seongu

2018-01-01

Galactic conformity is the observed phenomenon that galaxies located in the same region have similar properties, such as star formation rate, color and gas fraction. Conformity was first observed among galaxies within the same halos (“one-halo conformity”), which can be readily explained by mutual interactions among galaxies within a halo. Recent observations, however, have revealed a puzzling connection among galaxies with no direct interaction. In particular, galaxies located within a sphere of ~5 Mpc radius tend to show similarities even though they do not share common halos (“two-halo conformity” or “large-scale conformity”). Using the cosmological hydrodynamic simulation Illustris, we investigate the physical origin of two-halo conformity and put forward two scenarios. First, back-splash galaxies are likely responsible for large-scale conformity: they evolved into red galaxies due to ram-pressure stripping in a galaxy cluster and now happen to reside within a ~5 Mpc sphere. Second, galaxies in strong tidal fields induced by large-scale structure also seem to give rise to large-scale conformity, since strong tides suppress star formation in these galaxies. We discuss the importance of large-scale conformity in the context of galaxy evolution.
Fast electrostatic force calculation on parallel computer clusters

International Nuclear Information System (INIS)

Kia, Amirali; Kim, Daejoong; Darve, Eric

2008-01-01

The fast multipole method (FMM) and smooth particle mesh Ewald (SPME) are well-known fast algorithms for evaluating long-range electrostatic interactions in molecular dynamics and other fields. FMM is a multi-scale method that reduces the computational cost by approximating the potential due to a group of particles at a large distance using a few multipole functions. This algorithm scales as O(N) for N particles. SPME is an O(N log N) method based on interpolating the Fourier-space part of the Ewald sum and evaluating the resulting convolutions with the fast Fourier transform (FFT). These algorithms suffer from relatively poor efficiency on large parallel machines, especially for mid-size problems of around hundreds of thousands of atoms. A variant of the FMM, called PWA, based on plane wave expansions is presented in this paper. A new parallelization strategy for PWA, which takes advantage of the specific form of this expansion, is described. Its parallel efficiency is compared with that of SPME through detailed timing measurements on two different computer clusters.
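The far-field approximation at the heart of FMM can be illustrated by truncating the multipole expansion of a small cluster of charges after the monopole and dipole terms and comparing it with the direct sum. This is a plain Cartesian multipole sketch in Gaussian units with hypothetical function names; it is not the plane-wave (PWA) expansion the paper proposes, and a real FMM adds a hierarchical tree, higher orders and translation operators.

```python
import math

def direct_potential(charges, target):
    """Exact Coulomb potential at target, summed over (q, position) pairs."""
    return sum(q / math.dist(pos, target) for q, pos in charges)

def multipole_potential(charges, center, target):
    """Monopole + dipole approximation about center, valid far from the cluster."""
    total = sum(q for q, _ in charges)
    dipole = [sum(q * (pos[k] - center[k]) for q, pos in charges) for k in range(3)]
    R = [target[k] - center[k] for k in range(3)]
    r = math.sqrt(sum(c * c for c in R))
    return total / r + sum(dipole[k] * R[k] for k in range(3)) / r ** 3
```

The truncation error is dominated by the omitted quadrupole term, so the relative error falls roughly as (cluster size / distance)²: four numbers (charge plus dipole vector) replace the whole cluster for every distant target.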
Analysis of passive scalar advection in parallel shear flows: Sorting of modes at intermediate time scales

Science.gov (United States)

Camassa, Roberto; McLaughlin, Richard M.; Viotti, Claudio

2010-11-01

The time evolution of a passive scalar advected by parallel shear flows is studied for a class of rapidly varying initial data. Such situations are of practical importance in a wide range of applications, from microfluidics to geophysics. In these contexts, it is well known that the long-time evolution of the tracer concentration is governed by Taylor's asymptotic theory of dispersion. In contrast, we focus here on the evolution of the tracer at intermediate time scales. We show how intermediate regimes can be identified before Taylor's, and in particular, how the Taylor regime can be delayed indefinitely by properly manufactured initial data. A complete characterization of the sorting of these time scales and their associated spatial structures is presented. These analytical predictions are compared with highly resolved numerical simulations. Specifically, this comparison is carried out for the case of periodic variations in the streamwise direction on the short scale with envelope modulations on the long scales, showing how this structure can lead to "anomalously" diffusive transients in the evolution of the scalar toward the ultimate regime governed by Taylor dispersion. Mathematically, the occurrence of these transients can be viewed as a competition in asymptotic dominance between large Péclet (Pe) numbers and the long/short scale aspect ratio (k ≡ L_Vel/L_Tracer), two independent nondimensional parameters of the problem. We provide analytical predictions of the associated time scales by a modal analysis of the eigenvalue problem arising in the separation of variables of the governing advection-diffusion equation. The anomalous time scale in the asymptotic limit of large k Pe is derived for the short-scale periodic structure of the scalar's initial data, both for exactly solvable cases and in general with WKBJ analysis. In particular, the exactly solvable sawtooth flow is especially important in that it provides a shortcut to the exact solution to the
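The modal analysis mentioned above starts from separating variables in the advection-diffusion equation. The following derivation uses our own notation, not the paper's: a parallel shear flow u(y) in the streamwise x direction, molecular diffusivity κ, and a streamwise Fourier wavenumber α (we avoid k, which the abstract reserves for the aspect ratio). The nondimensionalization in terms of Pe and k is not reproduced here.

```latex
\begin{align*}
&\partial_t c + u(y)\,\partial_x c = \kappa\,\bigl(\partial_x^2 c + \partial_y^2 c\bigr) \\
&c = \hat{c}(y,t)\,e^{i\alpha x}
  \;\Rightarrow\;
  \partial_t \hat{c} + i\alpha\,u(y)\,\hat{c} = \kappa\,\bigl(\partial_y^2 \hat{c} - \alpha^2 \hat{c}\bigr) \\
&\hat{c} = \phi(y)\,e^{-\lambda t}
  \;\Rightarrow\;
  \kappa\,\phi'' - \bigl(\kappa\alpha^2 + i\alpha\,u(y)\bigr)\,\phi = -\lambda\,\phi
\end{align*}
```

With boundary conditions such as no-flux walls (φ' = 0, an assumption here), the real parts of the eigenvalues λ give the decay rates of the individual modes, and ordering them is one way to read the "sorting" of time scales the abstract describes.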
Cosmic Shear With ACS Pure Parallels

Science.gov (United States)

Rhodes, Jason

2002-07-01

Small distortions of the shapes of background galaxies caused by foreground mass provide a powerful method of directly measuring the amount and distribution of dark matter. Several groups have recently detected this weak lensing by large-scale structure, also called cosmic shear. The high resolution and sensitivity of HST/ACS provide a unique opportunity to measure cosmic shear accurately on small scales. Using 260 parallel orbits in Sloan F775W, we will measure for the first time: (1) the cosmic shear variance on scales Omega_m^0.5, with signal-to-noise (s/n) 20; and (2) the mass density Omega_m with s/n = 4. These measurements will be made at small angular scales where non-linear effects dominate the power spectrum, providing a test of the gravitational instability paradigm for structure formation. Measurements on these scales are not possible from the ground because of the systematic effects induced by PSF smearing from seeing. Having many independent lines of sight reduces the uncertainty due to cosmic variance, making parallel observations ideal.
Large-scale perspective as a challenge

NARCIS (Netherlands)

Plomp, M.G.A.

2012-01-01

1. Scale forms a challenge for chain researchers: when exactly is something ‘large-scale’? What are the underlying factors (e.g. number of parties, data, objects in the chain, complexity) that determine this? It appears to be a continuum between small- and large-scale, where positioning on that
Algorithm 896: LSA: Algorithms for Large-Scale Optimization

Czech Academy of Sciences Publication Activity Database

Lukšan, Ladislav; Matonoha, Ctirad; Vlček, Jan

2009-01-01

Vol. 36, No. 3 (2009), 16-1-16-29. ISSN 0098-3500. R&D Projects: GA AV ČR IAA1030405; GA ČR GP201/06/P397. Institutional research plan: CEZ:AV0Z10300504. Keywords: algorithms * design * large-scale optimization * large-scale nonsmooth optimization * large-scale nonlinear least squares * large-scale nonlinear minimax * large-scale systems of nonlinear equations * sparse problems * partially separable problems * limited-memory methods * discrete Newton methods * quasi-Newton methods * primal interior-point methods. Subject RIV: BB - Applied Statistics, Operational Research. Impact factor: 1.904, year: 2009
Scale interactions in a mixing layer – the role of the large-scale gradients

KAUST Repository

Fiscaletti, D.

2016-02-15

© 2016 Cambridge University Press. The interaction between the large and the small scales of turbulence is investigated in a mixing layer, at a Reynolds number based on the Taylor microscale of , via direct numerical simulations. The analysis is performed in physical space, and the local vorticity root-mean-square (r.m.s.) is taken as a measure of the small-scale activity. It is found that positive large-scale velocity fluctuations correspond to large vorticity r.m.s. on the low-speed side of the mixing layer, whereas they correspond to low vorticity r.m.s. on the high-speed side. The relationship between large and small scales thus depends on position if the vorticity r.m.s. is correlated with the large-scale velocity fluctuations. By contrast, the correlation coefficient is nearly constant throughout the mixing layer and close to unity if the vorticity r.m.s. is correlated with the large-scale velocity gradients. Therefore, the small-scale activity appears closely related to large-scale gradients, while the correlation between the small-scale activity and the large-scale velocity fluctuations is shown to reflect a property of the large scales. Furthermore, the vorticity from unfiltered (small scales) and from low-pass filtered (large scales) velocity fields tends to be aligned when examined within vortical tubes. These results provide evidence for the so-called 'scale invariance' (Meneveau & Katz, Annu. Rev. Fluid Mech., vol. 32, 2000, pp. 1-32), and suggest that some of the large-scale characteristics are not lost at the small scales, at least at the Reynolds number achieved in the present simulation.
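The analysis procedure (separate large and small scales by filtering, measure local small-scale activity, then correlate it with either the large-scale signal or its gradient) can be tried on a toy one-dimensional periodic signal. Everything here is an assumption for illustration: the running-mean low-pass filter, the window equal to one small-scale period, the local r.m.s. of the high-pass residual standing in for vorticity r.m.s., and a synthetic signal whose small-scale amplitude we deliberately tie to |dU/dx|. It demonstrates the measurement, not the paper's DNS result.

```python
import math

def circular_mean(u, w):
    """Running mean over w samples on a periodic signal (low-pass filter)."""
    n, h = len(u), w // 2
    return [sum(u[(i + k - h) % n] for k in range(w)) / w for i in range(n)]

def circular_rms(u, w):
    """Local r.m.s. over w samples: a proxy for small-scale activity."""
    n, h = len(u), w // 2
    return [math.sqrt(sum(u[(i + k - h) % n] ** 2 for k in range(w)) / w)
            for i in range(n)]

def central_gradient(u, dx):
    n = len(u)
    return [(u[(i + 1) % n] - u[(i - 1) % n]) / (2.0 * dx) for i in range(n)]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def scale_correlations(u, w, dx):
    """Correlate small-scale activity with large-scale values and |gradients|."""
    large = circular_mean(u, w)
    small = [ui - li for ui, li in zip(u, large)]
    activity = circular_rms(small, w)
    grad = [abs(g) for g in central_gradient(large, dx)]
    return pearson(activity, large), pearson(activity, grad)

def synthetic_signal(n=2000, big=500, tiny=8):
    """Hypothetical signal: small-scale amplitude follows |cos|, i.e. |dU/dx|."""
    u = []
    for i in range(n):
        theta = 2.0 * math.pi * i / big
        amplitude = 0.05 + 0.1 * abs(math.cos(theta))
        u.append(math.sin(theta) + amplitude * math.sin(2.0 * math.pi * i / tiny))
    return u
```

On this signal the activity is almost uncorrelated with the large-scale values but strongly correlated with the magnitude of their gradient, mirroring the contrast the abstract draws between the two choices of large-scale quantity.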
Parallel processing for artificial intelligence 2

CERN Document Server

Kumar, V; Suttner, CB

1994-01-01

With the increasing availability of parallel machines and rising interest in large-scale and real-world applications, research on parallel processing for Artificial Intelligence (AI) is gaining importance in computer science. Many applications have been implemented and delivered, but the field is still considered to be in its infancy. This book assembles diverse aspects of research in the area, providing an overview of the current state of technology. It also aims to promote further growth across the discipline. Contributions have been grouped according to their
Efficient Computation of Sparse Matrix Functions for Large-Scale Electronic Structure Calculations: The CheSS Library.

Science.gov (United States)

Mohr, Stephan; Dawson, William; Wagner, Michael; Caliste, Damien; Nakajima, Takahito; Genovese, Luigi

2017-10-10

We present CheSS, the "Chebyshev Sparse Solvers" library, which has been designed to solve typical problems arising in large-scale electronic structure calculations using localized basis sets. The library is based on a flexible and efficient expansion in terms of Chebyshev polynomials and presently features the calculation of the density matrix, the calculation of matrix powers with arbitrary exponents, and the extraction of eigenvalues in a selected interval. CheSS is able to exploit the sparsity of the matrices and scales linearly with respect to the number of nonzero entries, making it well suited for large-scale calculations. The approach is particularly adapted to setups leading to small spectral widths of the involved matrices and outperforms alternative methods in this regime. By coupling CheSS to the DFT code BigDFT, we show that such a favorable setup is indeed possible in practice. In addition, the approach based on Chebyshev polynomials can be massively parallelized, and CheSS exhibits excellent scaling up to thousands of cores even for relatively small matrix sizes.
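The core idea of a Chebyshev-based solver, approximating a matrix function f(H) by a polynomial expansion evaluated with the three-term recurrence T_{k+1} = 2·H·T_k - T_{k-1}, can be sketched in a few lines. This is a generic illustration, not the CheSS API: it uses dense lists instead of sparse matrices, assumes the caller supplies bounds `e_min` and `e_max` that enclose the spectrum, and obtains the coefficients by interpolation at Chebyshev nodes. Applying a Fermi function to H gives a density-matrix-like object, the kind of quantity the abstract mentions.

```python
import math

def matmul(A, B):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def chebyshev_coefficients(g, degree):
    """Coefficients of g on [-1, 1] from interpolation at Chebyshev-Gauss nodes."""
    m = degree + 1
    angles = [math.pi * (j + 0.5) / m for j in range(m)]
    return [2.0 / m * sum(g(math.cos(t)) * math.cos(k * t) for t in angles)
            for k in range(m)]

def matrix_function(H, f, e_min, e_max, degree=40):
    """Approximate f(H) for symmetric H whose spectrum lies in [e_min, e_max]."""
    n = len(H)
    a, b = (e_max + e_min) / 2.0, (e_max - e_min) / 2.0
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Shift and scale so that the spectrum of Hs lies in [-1, 1]
    Hs = [[(H[i][j] - a * identity[i][j]) / b for j in range(n)] for i in range(n)]
    c = chebyshev_coefficients(lambda x: f(a + b * x), degree)
    t_prev, t_cur = identity, Hs
    F = [[0.5 * c[0] * identity[i][j] + c[1] * Hs[i][j] for j in range(n)]
         for i in range(n)]
    for k in range(2, degree + 1):
        prod = matmul(Hs, t_cur)
        t_next = [[2.0 * prod[i][j] - t_prev[i][j] for j in range(n)]
                  for i in range(n)]
        F = [[F[i][j] + c[k] * t_next[i][j] for j in range(n)] for i in range(n)]
        t_prev, t_cur = t_cur, t_next
    return F
```

Only matrix-matrix products are needed, which is why the approach parallelizes well and, with sparse storage, scales with the number of nonzeros; a smooth f (wide spectral gap or small spectral width) needs fewer terms.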
Large-scale matrix-handling subroutines 'ATLAS'

International Nuclear Information System (INIS)

Tsunematsu, Toshihide; Takeda, Tatsuoki; Fujita, Keiichi; Matsuura, Toshihiko; Tahara, Nobuo

1978-03-01

The subroutine package 'ATLAS' has been developed for handling large-scale matrices. The package is composed of four kinds of subroutines: basic arithmetic routines, routines for solving linear simultaneous equations, routines for solving general eigenvalue problems, and utility routines. The subroutines are useful in large-scale plasma-fluid simulations. (auth.)
Large Scale Flutter Data for Design of Rotating Blades Using Navier-Stokes Equations

Science.gov (United States)

Guruswamy, Guru P.

2012-01-01

A procedure to compute flutter boundaries of rotating blades is presented, based on a) the Navier-Stokes equations and b) a frequency-domain method compatible with industry practice. The procedure is first validated against a) unsteady loads from a flapping-wing experiment and b) the flutter boundary from a fixed-wing experiment. Large-scale flutter computation is demonstrated for a rotating blade: a) with a single job-submission script, b) producing a flutter boundary in 24 hours of wall-clock time on 100 cores, and c) scaling linearly with the number of cores. A test with 1000 cores produced data for 10 flutter boundaries in 25 hours. Further wall-clock speed-up is possible by performing parallel computations within each case.

Parallel computing by Monte Carlo codes MVP/GMVP

International Nuclear Information System (INIS)

Nagaya, Yasunobu; Nakagawa, Masayuki; Mori, Takamasa

2001-01-01

The general-purpose Monte Carlo codes MVP/GMVP are well vectorized and thus enable high-speed Monte Carlo calculations. To achieve further speedups, we parallelized the codes for different types of parallel computing platforms, or by using the standard parallelization library MPI. The platforms used for benchmark calculations are a distributed-memory vector-parallel computer (Fujitsu VPP500), a distributed-memory massively parallel computer (Intel Paragon), and distributed-memory scalar-parallel computers (Hitachi SR2201 and IBM SP2). As is generally expected, linear speedup could be obtained for large-scale problems, but parallelization efficiency decreased as the batch size per processing element (PE) became smaller. It was also found that the statistical uncertainty for assembly powers was less than 0.1% in a PWR full-core calculation with more than 10 million histories, which took about 1.5 hours with massively parallel computing. (author)
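Distributing Monte Carlo histories over processing elements amounts to giving each PE its own random-number stream and batch of histories, then reducing the tallies. A minimal sketch, assuming a hypothetical test problem (uncollided transmission through a slab, whose exact answer exp(-Σt·thickness) lets the estimate be checked): the loop over `rank` stands in for separate PEs, and the simple per-rank seeding of Python's `random.Random` is our assumption; a production code would normally use a dedicated parallel random-number scheme.

```python
import math
import random

def run_batch(rank, n_histories, sigma_t, thickness, base_seed=2001):
    """Work done by one processing element: its own RNG stream and tally."""
    rng = random.Random(base_seed + rank)  # simple per-rank seeding (assumption)
    transmitted = 0
    for _ in range(n_histories):
        # Distance to first collision ~ Exp(sigma_t); 1 - random() lies in (0, 1]
        distance = -math.log(1.0 - rng.random()) / sigma_t
        if distance > thickness:
            transmitted += 1
    return transmitted

def parallel_estimate(total_histories, n_pe, sigma_t=1.0, thickness=1.0):
    """Split histories across n_pe batches, then reduce the tallies."""
    per_pe = total_histories // n_pe
    tallies = [run_batch(rank, per_pe, sigma_t, thickness) for rank in range(n_pe)]
    n = per_pe * n_pe
    p = sum(tallies) / n
    return p, math.sqrt(p * (1.0 - p) / n)  # estimate and its standard error
```

The standard error shrinks as 1/√N with the total number of histories regardless of how they are split; but each PE's useful work is proportional to its batch, so small batches leave fixed per-PE costs (start-up, reduction) relatively larger, which is consistent with the efficiency loss reported above.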
Large-scale solar heat

Energy Technology Data Exchange (ETDEWEB)

Tolonen, J.; Konttinen, P.; Lund, P. [Helsinki Univ. of Technology, Otaniemi (Finland). Dept. of Engineering Physics and Mathematics]

1998-12-31

In this project a large domestic solar heating system was built and a solar district heating system was modelled and simulated. Objectives were to improve the performance and reduce costs of a large-scale solar heating system. As a result of the project the benefit/cost ratio can be increased by 40 % through dimensioning and optimising the system at the designing stage. (orig.)
Parallel Implementation and Scaling of an Adaptive Mesh Discrete Ordinates Algorithm for Transport

International Nuclear Information System (INIS)

Howell, L H

2004-01-01

Block-structured adaptive mesh refinement (AMR) uses a mesh structure built up out of locally-uniform rectangular grids. In the BoxLib parallel framework used by the Raptor code, each processor operates on one or more of these grids at each refinement level. The decomposition of the mesh into grids and the distribution of these grids among processors may change every few timesteps as a calculation proceeds. Finer grids use smaller timesteps than coarser grids, requiring additional work to keep the system synchronized and ensure conservation between different refinement levels. In a paper for NECDC 2002 I presented preliminary results on implementation of parallel transport sweeps on the AMR mesh, conjugate gradient acceleration, accuracy of the AMR solution, and scalar speedup of the AMR algorithm compared to a uniform fully-refined mesh. This paper continues with a more in-depth examination of the parallel scaling properties of the scheme, both in single-level and multi-level calculations. Both sweeping and setup costs are considered. The algorithm scales with acceptable performance to several hundred processors. Trends suggest, however, that this is the limit for efficient calculations with traditional transport sweeps, and that modifications to the sweep algorithm will be increasingly needed as job sizes in the thousands of processors become common.
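Subcycling, where finer levels take several smaller steps per coarse step and are then synchronized with the coarser level, can be sketched as a short recursion. The refinement ratio, the function name and the `"sync"` placeholder (standing in for operations such as averaging fine data down and correcting fluxes at coarse-fine boundaries) are assumptions for illustration; this is not BoxLib or Raptor code.

```python
def advance(level, t, dt, finest_level, ratio, log):
    """Advance one refinement level by dt, then subcycle the next finer level."""
    log.append(("step", level, t))
    if level < finest_level:
        fine_dt = dt / ratio
        for k in range(ratio):
            advance(level + 1, t + k * fine_dt, fine_dt, finest_level, ratio, log)
        # The finer level has now caught up to t + dt: reconcile the two levels
        log.append(("sync", level, t + dt))
```

With three levels and a refinement ratio of 2, one coarse step triggers 2 steps on level 1, 4 on level 2 and 3 synchronizations; this extra bookkeeping grows with the number of levels, which is part of the cost the abstract attributes to keeping levels synchronized.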
Multiple Independent File Parallel I/O with HDF5

Energy Technology Data Exchange (ETDEWEB)

Miller, M. C.

2016-07-13

The HDF5 library has supported the I/O requirements of HPC codes at Lawrence Livermore National Laboratory (LLNL) since the late '90s. In particular, HDF5 used in the Multiple Independent File (MIF) parallel I/O paradigm has supported the scalable I/O requirements of LLNL codes and has recently been used successfully at scales as large as O(10⁶) parallel tasks.
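The Multiple Independent File pattern can be sketched without any HDF5 calls. The parallel tasks are split into as many groups as there are files; groups write to their own files concurrently, while within a group a "baton" is passed so that one task at a time opens the shared file, the first creating it and the rest appending. The contiguous grouping rule `rank * n_files // n_tasks` and the function names are our assumptions, and the actual file I/O is omitted.

```python
def mif_groups(n_tasks, n_files):
    """File index for each rank: contiguous blocks of ranks share one file."""
    return [rank * n_files // n_tasks for rank in range(n_tasks)]

def mif_schedule(n_tasks, n_files):
    """Per file, the baton order of (rank, action) pairs within its group."""
    file_of = mif_groups(n_tasks, n_files)
    schedule = {f: [] for f in range(n_files)}
    for rank in range(n_tasks):
        f = file_of[rank]
        action = "create" if not schedule[f] else "append"
        schedule[f].append((rank, action))
    return schedule
```

The number of files is the tuning knob: it sets how many writers hit the file system at once, while the serialized baton passing inside each group avoids the need for collective parallel I/O.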
Reducing computational costs in large scale 3D EIT by using a sparse Jacobian matrix with block-wise CGLS reconstruction

International Nuclear Information System (INIS)

Yang, C L; Wei, H Y; Soleimani, M; Adler, A

2013-01-01

Electrical impedance tomography (EIT) is a fast and cost-effective technique for obtaining a tomographic conductivity image of a subject from boundary current–voltage data. This paper proposes a time- and memory-efficient method for solving a large-scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. A 3D EIT system with a large number of measurements can produce a very large Jacobian matrix, which causes difficulties in both computer storage and the inversion process. One of the challenges in 3D EIT is to decrease the reconstruction time and memory usage while retaining image quality. First, a sparse matrix reduction technique is proposed that uses thresholding to set very small values of the Jacobian matrix to zero. Storing the Jacobian in a sparse format means that the zero elements are eliminated, which reduces the memory requirement. Second, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. A sparse Jacobian with block-wise CG enables the large-scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction on the reconstruction results. (paper)
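The two ideas in the abstract, thresholding small Jacobian entries to obtain a sparse matrix and solving the least-squares problem with a conjugate-gradient method, can be sketched with CGLS (conjugate gradients applied to the normal equations without ever forming JᵀJ). The relative threshold, the row-dictionary storage and the function names are assumptions; the paper's block-wise parallel variant and any regularization it uses are not reproduced.

```python
import math

def threshold_sparse(J, rel_tol):
    """Drop entries below rel_tol * max|J|; store each row as {column: value}."""
    cutoff = rel_tol * max(abs(v) for row in J for v in row)
    return [{j: v for j, v in enumerate(row) if abs(v) >= cutoff} for row in J]

def matvec(S, x):
    return [sum(v * x[j] for j, v in row.items()) for row in S]

def rmatvec(S, y, n_cols):
    """Compute S^T y using only the stored nonzeros."""
    out = [0.0] * n_cols
    for yi, row in zip(y, S):
        for j, v in row.items():
            out[j] += v * yi
    return out

def cgls(S, b, n_cols, tol=1e-12, max_iter=200):
    """Minimize ||S x - b|| by conjugate gradients on the normal equations."""
    x = [0.0] * n_cols
    r = list(b)
    s = rmatvec(S, r, n_cols)
    p = list(s)
    gamma = sum(v * v for v in s)
    for _ in range(max_iter):
        if math.sqrt(gamma) < tol:
            break
        q = matvec(S, p)
        alpha = gamma / sum(v * v for v in q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(S, r, n_cols)
        gamma_new = sum(v * v for v in s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x
```

Every iteration touches only the stored nonzeros, so the memory saved by thresholding also cuts the cost of each matrix-vector product; in a block-wise parallel version, the rows could be split among workers and the partial products summed.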
Reducing computational costs in large scale 3D EIT by using a sparse Jacobian matrix with block-wise CGLS reconstruction.

Science.gov (United States)

Yang, C L; Wei, H Y; Adler, A; Soleimani, M

2013-06-01

Electrical impedance tomography (EIT) is a fast and cost-effective technique for obtaining a tomographic conductivity image of a subject from boundary current-voltage data. This paper proposes a time- and memory-efficient method for solving a large-scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. A 3D EIT system with a large number of measurements can produce a very large Jacobian matrix, which causes difficulties in both computer storage and the inversion process. One of the challenges in 3D EIT is to decrease the reconstruction time and memory usage while retaining image quality. First, a sparse matrix reduction technique is proposed that uses thresholding to set very small values of the Jacobian matrix to zero. Storing the Jacobian in a sparse format means that the zero elements are eliminated, which reduces the memory requirement. Second, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. A sparse Jacobian with block-wise CG enables the large-scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction on the reconstruction results.
A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

Energy Technology Data Exchange (ETDEWEB)

Moreland, Kenneth [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Geveci, Berk [Kitware, Inc., Clifton Park, NY (United States)

2014-11-01

The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends indicate that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It follows that efficient concurrency on exascale machines requires a massive number of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based on what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed-memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme-scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked, unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive number of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.
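The worklet idea, a stateless operation on one data element that a scheduler may apply to every element independently, can be sketched as follows. The names `magnitude` and `invoke` are hypothetical and do not belong to the framework described in the abstract; Python threads are limited by the GIL for CPU-bound work, so the executor here only stands in for the many-threaded devices the abstract targets.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def magnitude(vector):
    """A stateless worklet: its output depends only on one input element."""
    x, y, z = vector
    return math.sqrt(x * x + y * y + z * z)

def invoke(worklet, field, max_workers=8):
    """Apply a worklet independently to every element of a field."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worklet, field))
```

Because the worklet keeps no state and touches no neighbours, the dispatcher is free to assign elements to any number of threads in any order, whereas a pipeline filter processes its whole input when the pipeline requests it.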
Probes of large-scale structure in the Universe

International Nuclear Information System (INIS)

Suto, Yasushi; Gorski, K.; Juszkiewicz, R.; Silk, J.

1988-01-01

Recent progress in observational techniques has made it possible to confront quantitatively various models for the large-scale structure of the Universe with detailed observational data. We develop a general formalism to show that the gravitational instability theory for the origin of large-scale structure is now capable of critically confronting observational results on cosmic microwave background radiation angular anisotropies, large-scale bulk motions and large-scale clumpiness in the galaxy counts. (author)
Micrometer and nanometer-scale parallel patterning of ceramic and organic-inorganic hybrid materials

NARCIS (Netherlands)

ten Elshof, Johan E.; Khan, Sajid; Göbel, Ole

2010-01-01

This review gives an overview of the progress made in recent years in the development of low-cost parallel patterning techniques for ceramic materials, silica, and organic–inorganic silsesquioxane-based hybrids from wet-chemical solutions and suspensions on the micrometer and nanometer-scale. The
Dynamic Analysis and Vibration Attenuation of Cable-Driven Parallel Manipulators for Large Workspace Applications

Directory of Open Access Journals (Sweden)

Jingli Du

2013-01-01

Cable-driven parallel manipulators are one of the best solutions for achieving a large workspace, since flexible cables can easily be stored on reels. However, due to the negligible flexural stiffness of cables, long cables will unavoidably vibrate during operation in large-workspace applications. In this paper, a finite element model for cable-driven parallel manipulators is proposed to mimic small-amplitude vibration of cables around their desired positions. Output feedback of the cable tension variation at the end of the end-effector is utilized to design a vibration attenuation controller, which aims to attenuate the vibration of the cables by slightly varying the cable length, thus decreasing its effect on the end-effector. Once cable vibration is attenuated, a motion controller can be designed to implement precise large motions that track given trajectories. A numerical example is presented to demonstrate the dynamic model and the control algorithm.
Large-scale grid management; Storskala Nettforvaltning

Energy Technology Data Exchange (ETDEWEB)

Langdal, Bjoern Inge; Eggen, Arnt Ove

2003-07-01

The network companies in the Norwegian electricity industry now have to establish large-scale network management, a concept essentially characterized by (1) a broader focus (broadband, multi-utility, ...) and (2) bigger units with larger networks and more customers. Research done by SINTEF Energy Research shows so far that approaches to large-scale network management may be structured according to three main challenges: centralization, decentralization and outsourcing. The article is part of a planned series.
Japanese large-scale interferometers

CERN Document Server

Kuroda, K; Miyoki, S; Ishizuka, H; Taylor, C T; Yamamoto, K; Miyakawa, O; Fujimoto, M K; Kawamura, S; Takahashi, R; Yamazaki, T; Arai, K; Tatsumi, D; Ueda, A; Fukushima, M; Sato, S; Shintomi, T; Yamamoto, A; Suzuki, T; Saitô, Y; Haruyama, T; Sato, N; Higashi, Y; Uchiyama, T; Tomaru, T; Tsubono, K; Ando, M; Takamori, A; Numata, K; Ueda, K I; Yoneda, H; Nakagawa, K; Musha, M; Mio, N; Moriwaki, S; Somiya, K; Araya, A; Kanda, N; Telada, S; Sasaki, M; Tagoshi, H; Nakamura, T; Tanaka, T; Ohara, K

2002-01-01

The objective of the TAMA 300 interferometer was to develop advanced technologies for kilometre scale interferometers and to observe gravitational wave events in nearby galaxies. It was designed as a power-recycled Fabry-Perot-Michelson interferometer and was intended as a step towards a final interferometer in Japan. The present successful status of TAMA is presented. TAMA forms a basis for LCGT (large-scale cryogenic gravitational wave telescope), a 3 km scale cryogenic interferometer to be built in the Kamioka mine in Japan, implementing cryogenic mirror techniques. The plan of LCGT is schematically described along with its associated R and D.
Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies.

Science.gov (United States)

Zhao, Shanrong; Prenger, Kurt; Smith, Lance

2013-01-01

RNA-Seq is becoming a promising replacement for microarrays in transcriptome profiling and differential gene expression studies. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Using Amazon Web Services as the infrastructure for Stormbow allows us to scale up easily to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost-effective, open-source tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and used out of the box to process Illumina RNA-Seq datasets.
BFAST: an alignment tool for large scale genome resequencing.

Directory of Open Access Journals (Sweden)

Nils Homer

2009-11-01

The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation. We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired-end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole-genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels. We compare BFAST to a selection of large-scale alignment tools (BLAT, MAQ, SHRiMP, and SOAP) in terms of both speed and accuracy, using simulated and real-world datasets. We show that BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show that BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.
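The two-stage strategy described above (an index lookup that proposes candidate alignment locations, followed by Smith-Waterman local alignment at those locations) can be sketched as follows. This is a simplified illustration, not BFAST's code: it uses a single index of contiguous k-mers with voting, whereas BFAST's own multiple indexes are more elaborate, and the scoring values (match +2, mismatch -1, linear gap -2) are assumptions.

```python
import random
from collections import Counter, defaultdict

def build_index(reference, k):
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def candidate_locations(read, index, k):
    """Vote for read start positions implied by shared k-mers, best first."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            votes[pos - offset] += 1
    return [start for start, _ in votes.most_common()]

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

A possible usage: build the index once, take the top candidates for each read, and run `smith_waterman` only on a short reference window around each candidate, so the quadratic alignment step never sees the whole genome.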
Large-scale transportation network congestion evolution prediction using deep learning theory.

Directory of Open Access Journals (Sweden)

Xiaolei Ma

Understanding how congestion at one location can cause ripples throughout a large-scale transportation network is vital for transportation researchers and practitioners to pinpoint traffic bottlenecks for congestion mitigation. Traditional studies rely on either mathematical equations or simulation techniques to model traffic congestion dynamics. However, most of these approaches have limitations, largely due to unrealistic assumptions and cumbersome parameter calibration processes. With the development of Intelligent Transportation Systems (ITS) and the Internet of Things (IoT), transportation data have become increasingly ubiquitous. This has triggered a wave of data-driven research investigating transportation phenomena. Among these approaches, deep learning theory is considered one of the most promising techniques for handling massive, high-dimensional data. This study attempts to extend deep learning theory to large-scale transportation network analysis. A deep Restricted Boltzmann Machine and Recurrent Neural Network architecture is used to model and predict traffic congestion evolution based on Global Positioning System (GPS) data from taxis. A numerical study in Ningbo, China is conducted to validate the effectiveness and efficiency of the proposed method. Results show that the prediction accuracy can reach 88% in less than 6 minutes when the model is implemented in a Graphics Processing Unit (GPU)-based parallel computing environment. The predicted congestion evolution patterns can be visualized temporally and spatially through a map-based platform to identify vulnerable links for proactive congestion mitigation.
Compiler Technology for Parallel Scientific Computation

Directory of Open Access Journals (Sweden)

Can Özturan

1994-01-01

There is a need for compiler technology that, given the source program, will generate efficient parallel code for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet the use of parallel computation is limited by the high cost of developing the needed software. To overcome this difficulty, we advocate a comprehensive approach to the development of scalable, architecture-independent software for scientific computation, based on our experience with the equational programming language (EPL). Our approach is based on program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.
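To make "wavefront determination" concrete, here is a small sketch under our own assumptions; it is not EPL's algorithm. For a recurrence in which each cell depends on its upper and left neighbours, all cells on one anti-diagonal are mutually independent, so each anti-diagonal forms a wavefront whose cells could be computed in parallel. The recurrence and function names are hypothetical.

```python
def wavefronts(n_rows, n_cols):
    """Group grid cells (i, j) by anti-diagonal d = i + j."""
    return [[(i, d - i) for i in range(max(0, d - n_cols + 1), min(d, n_rows - 1) + 1)]
            for d in range(n_rows + n_cols - 1)]


def solve_by_wavefront(n_rows, n_cols):
    """Fill A[i][j] = A[i-1][j] + A[i][j-1] (edges = 1) one wavefront at a time."""
    A = [[0] * n_cols for _ in range(n_rows)]
    for front in wavefronts(n_rows, n_cols):
        # Cells in one front read only cells from earlier fronts, so this
        # inner loop has no internal dependencies and could run in parallel.
        for i, j in front:
            A[i][j] = 1 if i == 0 or j == 0 else A[i - 1][j] + A[i][j - 1]
    return A
```

The filled table holds binomial coefficients, so `solve_by_wavefront(4, 4)[3][3]` is 20. A compiler that determines such wavefronts can schedule each front as one parallel step, with a synchronization point between fronts.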
Survey and research for the enhancement of large-scale technology development 2. How large-scale technology development should be in the future; Ogata gijutsu kaihatsu suishin no tame no chosa kenkyu. 2. Kongo no ogata gijutsu kaihatsu no arikata

Energy Technology Data Exchange (ETDEWEB)

1981-03-01

A survey was conducted by interviewing people engaged in industrial technology development at the entrusted businesses participating in the large-scale industrial technology development system, as well as people with relevant experience or academic background involved in the project enhancement effort. The following improvements are called for: introducing the competition principle, for example through parallel development; practicing research-on-research so that tasks are set effectively; strengthening midway evaluation, since prior evaluation is difficult; organizing new industries that use the fruits of large-scale industrial technology to create markets without inducing economic conflicts; and enhancing the transfer of technologies from the private sector to the public sector. Studies are made on reviewing the research conducting systems, utilizing the research and development capabilities of the private sector, raising awareness of industrial property rights, and diffusing the large-scale project system. In this connection, problems are pointed out, requests are submitted, and remedial measures and suggestions are presented. (NEDO)
pcircle - A Suite of Scalable Parallel File System Tools

Energy Technology Data Exchange (ETDEWEB)

2015-10-01

Most file system software is written for conventional local file systems; it runs serially and cannot take advantage of a large-scale parallel file system. The "pcircle" software builds on the ubiquitous MPI in cluster computing environments and on a "work-stealing" pattern to provide a scalable, high-performance suite of file system tools. In particular, it implements parallel data copy and parallel data checksumming, with advanced features such as asynchronous progress reporting, checkpoint and restart, and integrity checking.
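The work-stealing pattern can be sketched in a few lines. This is a shared-memory illustration under stated assumptions, not pcircle's code or API: pcircle distributes work across MPI processes on a cluster, whereas here Python threads stand in for those processes, in-memory byte chunks stand in for file blocks, and the function name is hypothetical. Each worker starts with a static share of the chunks, takes work from the tail of its own deque, and, once idle, steals from the head of another worker's deque.

```python
import hashlib
import threading
from collections import deque


def work_stealing_checksums(chunks, num_workers=4):
    """SHA-256 of each byte chunk, scheduled with per-worker deques plus stealing."""
    # Static initial partition: deal chunk indices round-robin to the workers.
    queues = [deque(range(w, len(chunks), num_workers)) for w in range(num_workers)]
    results = {}
    lock = threading.Lock()

    def take_task(me):
        # The owner pops from the tail of its own deque ...
        try:
            return queues[me].pop()
        except IndexError:
            pass
        # ... and, when idle, steals from the head of another worker's deque.
        for offset in range(1, num_workers):
            try:
                return queues[(me + offset) % num_workers].popleft()
            except IndexError:
                continue
        # Tasks never spawn new tasks, so deques only shrink: if every deque
        # was empty when checked, no work remains anywhere.
        return None

    def worker(me):
        while True:
            idx = take_task(me)
            if idx is None:
                return
            digest = hashlib.sha256(chunks[idx]).hexdigest()
            with lock:
                results[idx] = digest

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[i] for i in range(len(chunks))]
```

Taking from opposite ends keeps owners and thieves from contending for the same items most of the time; `deque.pop` and `deque.popleft` are atomic in CPython, so each chunk is checksummed exactly once.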
Large scale model testing

International Nuclear Information System (INIS)

Brumovsky, M.; Filip, R.; Polachova, H.; Stepanek, S.

1989-01-01

Fracture mechanics and fatigue calculations for WWER reactor pressure vessels were checked by large-scale model testing performed on the large testing machine ZZ 8000 (maximum load 80 MN) at the SKODA WORKS. Results are described from tests of the material's resistance to non-ductile fracture, covering both base materials and welded joints. The rated specimen thickness was 150 mm, with defects between 15 and 100 mm deep. Results are also presented for nozzles of 850 mm inner diameter at a scale of 1:3; static, cyclic, and dynamic tests were performed both without and with surface defects (15, 30 and 45 mm deep). During the cyclic tests, the crack growth rate in the elastic-plastic region was also determined. (author). 6 figs., 2 tabs., 5 refs
Hi-Corrector: a fast, scalable and memory-efficient package for normalizing large-scale Hi-C data.

Science.gov (United States)

Li, Wenyuan; Gong, Ke; Li, Qingjiao; Alber, Frank; Zhou, Xianghong Jasmine

2015-03-15

Genome-wide proximity ligation assays, e.g. Hi-C and its variant TCC, have recently become important tools to study spatial genome organization. Removing biases from chromatin contact matrices generated by such techniques is a critical preprocessing step for subsequent analyses. The continuing decline of sequencing costs has led to ever-improving resolution of Hi-C data, resulting in very large matrices of chromatin contacts. Such large matrices, however, pose a great challenge for the memory usage and speed of their normalization. Therefore, there is an urgent need for fast and memory-efficient methods for normalization of Hi-C data. We developed Hi-Corrector, an easy-to-use, open source implementation of the Hi-C data normalization algorithm. Its salient features are (i) scalability: the software is capable of normalizing Hi-C data of any size in reasonable time; (ii) memory efficiency: the sequential version can run on any single computer with very limited memory, no matter how little; (iii) speed: the parallel version can run very fast on multiple computing nodes with limited local memory. The sequential version is implemented in ANSI C and can be easily compiled on any system; the parallel version is implemented in ANSI C with the MPI library (a standardized and portable parallel environment designed for solving large-scale scientific problems). The package is freely available at http://zhoulab.usc.edu/Hi-Corrector/. © The Author 2014. Published by Oxford University Press.
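Assuming the normalization is of the iterative-correction (matrix-balancing) kind, its core loop is short: repeatedly divide each contact by the relative coverage of its two bins until every bin has the same total coverage. The sketch below is a dense, sequential Python illustration under that assumption, not Hi-Corrector's ANSI C/MPI code, and the function name is hypothetical. It expects a symmetric, non-negative matrix; convergence is typical in practice but not guaranteed for every input (a purely diagonal matrix, for example, oscillates).

```python
def iterative_correction(matrix, n_iter=100):
    """Balance a symmetric non-negative contact matrix (iterative correction).

    Returns the corrected matrix W and per-bin biases such that
    W[i][j] == matrix[i][j] / (bias[i] * bias[j]) up to rounding.
    """
    n = len(matrix)
    W = [list(map(float, row)) for row in matrix]
    bias = [1.0] * n
    for _ in range(n_iter):
        coverage = [sum(row) for row in W]
        nonzero = [s for s in coverage if s > 0]
        if not nonzero:
            break  # empty matrix: nothing to balance
        mean = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins keep a factor of 1
        delta = [s / mean if s > 0 else 1.0 for s in coverage]
        for i in range(n):
            for j in range(n):
                W[i][j] /= delta[i] * delta[j]
        bias = [b * d for b, d in zip(bias, delta)]
    return W, bias
```

Calling it on `[[1, 4, 2], [4, 1, 3], [2, 3, 1]]` returns a symmetric matrix whose three row sums agree to within rounding error. Each pass needs only row sums and an element-wise rescaling, which is why the work can be split into row blocks across nodes with limited local memory.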

Why small-scale cannabis growers stay small: five mechanisms that prevent small-scale growers from going large scale.

Science.gov (United States)

Hammersvik, Eirik; Sandberg, Sveinung; Pedersen, Willy

2012-11-01

Over the past 15-20 years, domestic cultivation of cannabis has been established in a number of European countries. New techniques have made such cultivation easier; however, the bulk of growers remain small-scale. In this study, we explore the factors that prevent small-scale growers from increasing their production. The study is based on 1 year of ethnographic fieldwork and qualitative interviews conducted with 45 Norwegian cannabis growers, 10 of whom were growing on a large scale and 35 on a small scale. The study identifies five mechanisms that prevent small-scale indoor growers from going large-scale. First, large-scale operations involve a number of people, large sums of money, a high workload and a high risk of detection, and thus demand a higher level of organizational skills than small growing operations. Second, financial assets are needed to start a large 'grow-site': housing rent, electricity, equipment and nutrients are expensive. Third, to be able to sell large quantities of cannabis, growers need access to an illegal distribution network and knowledge of how to act according to black market norms and structures. Fourth, large-scale operations require advanced horticultural skills to maximize yield and quality, which demands greater skills and knowledge than does small-scale cultivation. Fifth, small-scale growers are often embedded in the 'cannabis culture', which emphasizes anti-commercialism, anti-violence and ecological and community values. Hence, starting up large-scale production would mean having to renegotiate or abandon these values. Going from small- to large-scale cannabis production is a demanding task: ideologically, technically, economically and personally. The many obstacles that small-scale growers face and the lack of interest and motivation for going large-scale suggest that the risk of a 'slippery slope' from small-scale to large-scale growing is limited. Possible political implications of the findings are discussed.
Distributed large-scale dimensional metrology new insights

CERN Document Server

Franceschini, Fiorenzo; Maisano, Domenico

2011-01-01

Focuses on the latest insights into, and challenges of, distributed large-scale dimensional metrology. Enables practitioners to study distributed large-scale dimensional metrology independently. Includes specific examples of the development of new system prototypes.
High-Speed Interrogation for Large-Scale Fiber Bragg Grating Sensing.

Science.gov (United States)

Hu, Chenyuan; Bai, Wei

2018-02-24

A high-speed interrogation scheme for large-scale fiber Bragg grating (FBG) sensing arrays is presented. This technique employs parallel computing and pipeline control to modulate the incident light and demodulate the reflected sensing signal. One electro-optic modulator (EOM) and one semiconductor optical amplifier (SOA) were used to generate a phase delay to filter the reflected spectrum from multiple candidate FBGs with the same optical path difference (OPD). Experimental results showed that the fastest interrogation delay time for the proposed method was only about 27.2 μs for a single FBG interrogation, and that the system scanning period was limited only by the optical transmission delay in the sensing fiber, owing to the multiple simultaneous central wavelength calculations. Furthermore, the proposed FPGA-based technique had a verified FBG wavelength demodulation stability of ±1 pm without averaging.
Scale interaction in a mixing layer: the role of the large-scale gradients

KAUST Repository

Fiscaletti, Daniele; Attili, Antonio; Bisetti, Fabrizio; Elsinga, Gerrit E.

2015-01-01

From physical considerations, we would expect the scales to interact in a qualitatively similar way within the flow and across different turbulent flows. Therefore, rather than only the large-scale fluctuations, the modulation of the small scales by the large-scale gradients has also been investigated.
Parallelism, fractal geometry and other aspects of computational mathematics

International Nuclear Information System (INIS)

Churchhouse, R.F.

1991-01-01

In some fields, such as meteorology, theoretical physics, quantum chemistry and hydrodynamics, there are problems that involve so much computation that computers a thousand times as powerful as a Cray 2 could be fully utilised if they were available. Since uniprocessors of such power are unlikely to become available, such large-scale problems could be solved by using systems of computers running in parallel. This approach, of course, requires finding appropriate algorithms for these problems that can efficiently use a large number of computers working in parallel. 11 refs, 10 figs, 1 tab
Parallel scalability of Hartree-Fock calculations

Science.gov (United States)

Chow, Edmond; Liu, Xing; Smelyanskiy, Mikhail; Hammond, Jeff R.

2015-03-01

Quantum chemistry is increasingly performed using large cluster computers consisting of multiple interconnected nodes. For a fixed molecular problem, the efficiency of a calculation usually decreases as more nodes are used, due to the cost of communication between the nodes. This paper empirically investigates the parallel scalability of Hartree-Fock calculations. The construction of the Fock matrix and the density matrix calculation are analyzed separately. For the former, we use a parallelization of Fock matrix construction based on a static partitioning of work followed by a work stealing phase. For the latter, we use density matrix purification from the linear scaling methods literature, but without using sparsity. When using large numbers of nodes for moderately sized problems, density matrix computations are network-bandwidth bound, making purification methods potentially faster than eigendecomposition methods.
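McWeeny purification is one classic density matrix purification scheme from the linear-scaling literature; the abstract does not say which variant is used, so treat this as an illustrative sketch. Starting from a guess built from the Fock matrix `F` and a chemical potential `mu` assumed to lie strictly between the occupied and virtual levels, the iteration `P = 3P^2 - 2P^3` drives eigenvalues above 1/2 to 1 and those below 1/2 to 0, giving the idempotent density matrix without an eigendecomposition. Matrices are dense (no sparsity), matching the setting described; the spectral bounds come from Gershgorin discs, and all names are hypothetical.

```python
def matmul(A, B):
    """Dense matrix product of two lists-of-lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def gershgorin_bounds(F):
    """Lower/upper bounds on the eigenvalues of a symmetric matrix via Gershgorin discs."""
    n = len(F)
    radii = [sum(abs(F[i][j]) for j in range(n) if j != i) for i in range(n)]
    return (min(F[i][i] - radii[i] for i in range(n)),
            max(F[i][i] + radii[i] for i in range(n)))


def mcweeny_purify(F, mu, max_iter=100, tol=1e-12):
    """Density matrix from Fock matrix F by McWeeny purification (dense, no sparsity)."""
    n = len(F)
    fmin, fmax = gershgorin_bounds(F)
    if not fmin < mu < fmax:
        raise ValueError("mu must lie strictly inside the spectral bounds of F")
    lam = min(1.0 / (fmax - mu), 1.0 / (mu - fmin))
    # Initial guess P0 = (lam/2)(mu*I - F) + I/2 maps levels below mu into (1/2, 1]
    # and levels above mu into [0, 1/2).
    P = [[(lam / 2) * ((mu if i == j else 0.0) - F[i][j]) + (0.5 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        P2 = matmul(P, P)
        P3 = matmul(P2, P)
        P_next = [[3 * P2[i][j] - 2 * P3[i][j] for j in range(n)] for i in range(n)]
        change = max(abs(P_next[i][j] - P[i][j]) for i in range(n) for j in range(n))
        P = P_next
        if change < tol:
            break
    return P
```

With `F = [[-1.0, 0.2], [0.2, 1.0]]` and `mu = 0.0`, the result has trace 1 (one occupied level) and satisfies `P·P ≈ P`. Each iteration costs two matrix multiplications, which parallelize well but, as the abstract notes, become network-bandwidth bound when spread over many nodes.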
Trends in large-scale testing of reactor structures

International Nuclear Information System (INIS)

Blejwas, T.E.

2003-01-01

Large-scale tests of reactor structures have been conducted at Sandia National Laboratories since the late 1970s. This paper describes a number of different large-scale impact tests, pressurization tests of models of containment structures, and thermal-pressure tests of models of reactor pressure vessels. The advantages of large-scale testing are evident, but cost in particular limits its use. As computer models have grown in size (for example, in number of degrees of freedom), the advent of computer graphics has made possible very realistic representations of results, results that may not accurately represent reality. A necessary condition for avoiding this pitfall is the validation of the analytical methods and underlying physical representations. Ironically, the immensely larger computer models sometimes increase the need for large-scale testing, because the modeling is applied to increasingly complex structural systems and/or more complex physical phenomena. Unfortunately, the cost of large-scale tests is a disadvantage that will likely severely limit similar testing in the future. International collaborations may provide the best mechanism for funding future programs with large-scale tests. (author)
Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

Science.gov (United States)

Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton

2014-11-11

We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with a fast interconnect. Our work has focused on applying OpenMP parallelism to the routines that dominate the computational load, attempting where possible to parallelize different loops from those already parallelized with MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations on an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonication, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms, this would not have been feasible without the developments described here.
Assessing Programming Costs of Explicit Memory Localization on a Large Scale Shared Memory Multiprocessor

Directory of Open Access Journals (Sweden)

Silvio Picano

1992-01-01

We present detailed experimental work involving a commercially available large-scale shared memory multiple instruction stream-multiple data stream (MIMD) parallel computer with a software-controlled cache coherence mechanism. To make effective use of such an architecture, the programmer is responsible for designing the program's structure to match the underlying multiprocessor's capabilities. We describe the techniques used to exploit our multiprocessor (the BBN TC2000) on a network simulation program, showing the resulting performance gains and the associated programming costs. We show that an efficient implementation relies heavily on the user's ability to explicitly manage the memory system.
Remote collaboration system based on large scale simulation

International Nuclear Information System (INIS)

Kishimoto, Yasuaki; Sugahara, Akihiro; Li, J.Q.

2008-01-01

Large-scale simulation using supercomputers, which generally requires long CPU times and produces large amounts of data, has been extensively studied as a third pillar of various advanced science fields alongside theory and experiment. Such simulation is expected to lead to new scientific discoveries through the elucidation of complex phenomena that are hard to identify by conventional theoretical and experimental approaches alone. To assist large simulation studies in which many collaborators working at geographically different places participate and contribute, we have developed a unique remote collaboration system, referred to as SIMON (simulation monitoring system), which is based on client-server system control and introduces the idea of update processing, in contrast to the widely used post-processing. As a key ingredient, we have developed a trigger method, which transmits various requests for update processing from the simulation (client) running on a supercomputer to a workstation (server). Namely, the simulation running on the supercomputer actively controls the timing of update processing. The server, having received requests from the ongoing simulation, such as data transfer, data analyses, and visualizations, starts operations according to the requests during the simulation. The server makes the latest results available to web browsers, so that collaborators can monitor the results at any place and time in the world. By applying the system to a specific simulation project on laser-matter interaction, we have confirmed that the system works well and plays an important role as a collaboration platform on which many collaborators work with one another.
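A minimal in-process sketch of the trigger idea follows, assuming a queue stands in for the network link between the supercomputer and the workstation. The simulation loop itself decides when to enqueue an update request, and a separate server thread services requests while the simulation keeps running. The function name, the request format and the toy "simulation" are hypothetical; SIMON's actual protocol is not described in the abstract.

```python
import queue
import threading


def run_with_update_processing(n_steps, trigger_every):
    """Toy simulation that pushes update-processing requests to a server thread."""
    requests = queue.Queue()
    handled = []

    def server():
        while True:
            req = requests.get()
            if req is None:  # sentinel: the simulation has finished
                return
            # Stand-in for data transfer, analysis or visualization on the workstation.
            handled.append(req)

    srv = threading.Thread(target=server)
    srv.start()
    state = 1.0
    for step in range(1, n_steps + 1):
        state *= 0.5  # hypothetical simulation update
        if step % trigger_every == 0:
            # The simulation (client) itself decides when update processing happens.
            requests.put(("visualize", step, state))
    requests.put(None)
    srv.join()
    return handled
```

Calling `run_with_update_processing(6, 2)` yields requests for steps 2, 4 and 6. Because the client only enqueues and moves on, analysis overlaps with computation instead of waiting for a post-processing pass after the run.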
Plasma turbulence driven by transversely large-scale standing shear Alfvén waves

International Nuclear Information System (INIS)

Singh, Nagendra; Rao, Sathyanarayan

2012-01-01

Using two-dimensional particle-in-cell simulations, we study the generation of turbulence consisting of transversely small-scale dispersive Alfvén and electrostatic waves when plasma is driven by a large-scale standing shear Alfvén wave (LS-SAW). The standing wave is set up by reflecting a propagating LS-SAW. The ponderomotive force of the standing wave generates transversely large-scale density modifications consisting of density cavities and enhancements. The drifts of the charged particles driven by the ponderomotive force, and those directly caused by the fields of the standing LS-SAW, generate non-thermal features in the plasma. Parametric instabilities driven by the inherent plasma nonlinearities associated with the LS-SAW, in combination with the non-thermal features, generate small-scale electromagnetic and electrostatic waves, yielding a broad frequency spectrum ranging from below the source frequency of the LS-SAW to the ion cyclotron and lower hybrid frequencies and beyond. The power spectrum of the turbulence has peaks at distinct perpendicular wave numbers ($k_\perp$) lying in the range $d_e^{-1}$ to $6d_e^{-1}$, $d_e$ being the electron inertial length, suggesting non-local parametric decay from small to large $k_\perp$. The turbulence spectrum, encompassing both electromagnetic and electrostatic fluctuations, is also broadband in parallel wave number ($k_\parallel$). In a standing-wave supported density cavity, the ratio of the perpendicular electric to magnetic field amplitude is $R(k_\perp) = |E_\perp(k_\perp)|/|B_\perp(k_\perp)| \ll V_A$ for $k_\perp d_e$ ..., where $V_A$ is the Alfvén velocity. The characteristic features of the broadband plasma turbulence are compared with those available from satellite observations in space plasmas.
A parallel algorithm for 3D dislocation dynamics

International Nuclear Information System (INIS)

Wang Zhiqiang; Ghoniem, Nasr; Swaminarayan, Sriram; LeSar, Richard

2006-01-01

Dislocation dynamics (DD), a discrete dynamic simulation method in which dislocations are the fundamental entities, is a powerful tool for investigating plasticity, deformation and fracture of materials at the micron length scale. However, severe computational difficulties arising from complex, long-range interactions between these curvilinear line defects limit the application of DD to the study of large-scale plastic deformation. We present here the development of a parallel algorithm for accelerated computer simulations of DD. By representing dislocations as a 3D set of dislocation particles, we show that the problem of an interacting ensemble of dislocations can be converted into a problem of a particle ensemble interacting with a long-range force field. A grid using binary space partitioning is constructed to keep track of node connectivity across domains. We demonstrate the computational efficiency of the parallel micro-plasticity code and discuss how O(N) methods map naturally onto the parallel data structure. Finally, we present results from applications of the parallel code to deformation in single-crystal fcc metals.
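A binary space partition of the kind mentioned can be sketched with recursive coordinate bisection, one common way to build such a partition (an assumption; the paper's grid construction may differ). Dislocation nodes, represented here as plain 3D points, are split at the median along the axis of largest extent until up to 2**depth domains remain, giving each domain a similar number of nodes. The function name is hypothetical, and tracking of connectivity across domain boundaries is omitted.

```python
def bsp_partition(points, depth):
    """Split 3D points into up to 2**depth domains by recursive median bisection."""
    if depth == 0 or len(points) <= 1:
        return [list(points)]
    # Cut across the coordinate axis along which the points are most spread out.
    extents = [max(p[a] for p in points) - min(p[a] for p in points) for a in range(3)]
    axis = extents.index(max(extents))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # median split balances node counts between halves
    return bsp_partition(ordered[:mid], depth - 1) + bsp_partition(ordered[mid:], depth - 1)
```

For eight nodes spaced along the x axis, `bsp_partition(nodes, 2)` returns four domains of two nodes each. Median splits keep the per-process workload balanced even when dislocation nodes cluster in part of the crystal.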
Large Scale Computations in Air Pollution Modelling

DEFF Research Database (Denmark)

Zlatev, Z.; Brandt, J.; Builtjes, P. J. H.

Proceedings of the NATO Advanced Research Workshop on Large Scale Computations in Air Pollution Modelling, Sofia, Bulgaria, 6-10 July 1998.
Large-Scale 3D Printing: The Way Forward

Science.gov (United States)

Jassmi, Hamad Al; Najjar, Fady Al; Ismail Mourad, Abdel-Hamid

2018-03-01

Research on small-scale 3D printing has evolved rapidly, and numerous industrial products have been tested and successfully applied. Nonetheless, research on large-scale 3D printing, directed at large-scale applications such as construction and automotive manufacturing, still demands a great deal of effort. Large-scale 3D printing is considered an interdisciplinary topic and requires establishing a blended knowledge base from numerous research fields, including structural engineering, materials science, mechatronics, software engineering, artificial intelligence and architectural engineering. This review article summarizes key topics relevant to new research trends in large-scale 3D printing, particularly pertaining to (1) technological solutions for additive construction (i.e. the 3D printers themselves), (2) materials science challenges, and (3) new design opportunities.
Growth Limits in Large Scale Networks

DEFF Research Database (Denmark)

Knudsen, Thomas Phillip

The subject of large-scale networks is approached from the perspective of the network planner. An analysis of the long-term planning problems is presented, with the main focus on the changing requirements for large-scale networks and the potential problems in meeting these requirements. The problems ... the fundamental technological resources in network technologies are analysed for scalability. Here several technological limits to continued growth are presented. The third step involves a survey of major problems in managing large-scale networks given the growth of user requirements and the technological limitations. The rising complexity of network management with the convergence of communications platforms is shown to be problematic both for the feasibility of automatic management and for manpower resource management. In the fourth step the scope is extended to include the present society with the DDN project as its ...
Accelerating sustainability in large-scale facilities

CERN Multimedia

Marina Giampietro

2011-01-01

Scientific research centres and large-scale facilities are intrinsically energy intensive, but how can big science improve its energy management and eventually contribute to the environmental cause with new cleantech? CERN’s commitment to providing tangible answers to these questions was sealed at the first workshop on energy management for large scale scientific infrastructures, held in Lund, Sweden, on 13-14 October. The workshop, co-organised with the European Spallation Source (ESS) and the European Association of National Research Facilities (ERF), tackled a recognised need to address energy issues in relation to science and technology policies. It brought together more than 150 representatives of Research Infrastructures (RIs) and energy experts from Europe and North America. “Without compromising our scientific projects, we can ...
Large scale reflood test

International Nuclear Information System (INIS)

Hirano, Kemmei; Murao, Yoshio

1980-01-01

The large-scale reflood test, aimed at ensuring the safety of light water reactors, was started in fiscal 1976 under the special account act for power source development promotion measures, on commission from the Science and Technology Agency. Subsequently, to establish the safety of PWRs in loss-of-coolant accidents through joint international efforts, the Japan-West Germany-U.S. research cooperation program was started in April 1980, and the large-scale reflood test is now included in this program. It consists of two tests: one using a cylindrical core testing apparatus to examine the overall system effect, and one using a plate core testing apparatus to test individual effects. Each apparatus is composed of mock-ups of the pressure vessel, primary loop, containment vessel and ECCS. The testing method, the test results and the research cooperation program are described. (J.P.N.)
Multi-Agent System Supporting Automated Large-Scale Photometric Computations

Directory of Open Access Journals (Sweden)

Adam Sȩdziwy

2016-02-01

Technologies related to green energy, smart cities and similar areas, which have developed dynamically in recent years, frequently face problems of a computational rather than a technological nature. One example is the ability to accurately predict weather conditions for PV farms or wind turbines. Another group of issues relates to the complexity of the computations required to obtain an optimal setup of a solution being designed. In this article, we present a case from the latter group of problems, namely designing large-scale power-saving lighting installations. The term “large-scale” refers to an entire city area containing tens of thousands of luminaires. Although a simple power reduction for a single street, giving limited savings, is relatively easy, it becomes infeasible for tasks covering thousands of luminaires described by precise coordinates (instead of simplified layouts). To overcome this critical issue, we propose introducing a formal representation of the computing problem and applying a multi-agent system to perform design-related computations in parallel. An important measure introduced in the article to indicate optimization progress is entropy, which also allows optimization to be terminated when the solution is satisfactory. The article contains the results of real-life calculations made with the help of the presented approach.
A Parallel Computational Model for Multichannel Phase Unwrapping Problem

Science.gov (United States)

Imperatore, Pasquale; Pepe, Antonio; Lanari, Riccardo

2015-05-01

In this paper, a parallel model for the solution of the computationally intensive multichannel phase unwrapping (MCh-PhU) problem is proposed. First, the Extended Minimum Cost Flow (EMCF) algorithm for solving the MCh-PhU problem is revised within the rigorous mathematical framework of discrete calculus, which permits its topological structure to be captured in terms of meaningful discrete differential operators. Second, emphasis is placed on the methodological and practical aspects that lead to a parallel reformulation of the EMCF algorithm. On this basis, a novel dual-level parallel computational model, in which parallelism is implemented hierarchically at two different levels (process and thread), is presented. The validity of our approach has been demonstrated through a series of experiments that revealed a significant speedup. The resulting high-performance prototype is therefore suitable for solving large-scale phase unwrapping problems in reasonable time frames, with a significant impact on the systematic exploitation of the existing, and rapidly growing, large archives of SAR data.
Large Scale Cosmological Anomalies and Inhomogeneous Dark Energy

Directory of Open Access Journals (Sweden)

Leandros Perivolaropoulos

2014-01-01

A wide range of large-scale observations hint at possible modifications of the standard cosmological model, which is based on a homogeneous and isotropic universe with a small cosmological constant and matter. These observations, also known as “cosmic anomalies”, include unexpected Cosmic Microwave Background perturbations on large angular scales, large dipolar peculiar velocity flows of galaxies (“bulk flows”), the measurement of inhomogeneous values of the fine structure constant on cosmological scales (“alpha dipole”) and other effects. The observational anomalies could either be a large statistical fluctuation in the context of ΛCDM or indicate a non-trivial departure from the cosmological principle on Hubble scales. Such a departure is strongly constrained by cosmological observations for matter. For dark energy, however, there are no significant observational constraints on Hubble-scale inhomogeneities. In this brief review I discuss some of the theoretical models that can naturally lead to inhomogeneous dark energy, their observational constraints and their potential to explain the large-scale cosmic anomalies.

Large-scale patterns in Rayleigh-Benard convection

International Nuclear Information System (INIS)

Hardenberg, J. von; Parodi, A.; Passoni, G.; Provenzale, A.; Spiegel, E.A.

2008-01-01

Rayleigh-Benard convection at large Rayleigh number is characterized by the presence of intense, vertically moving plumes. Both laboratory and numerical experiments reveal that the rising and descending plumes aggregate into separate clusters so as to produce large-scale updrafts and downdrafts. The horizontal scales of the aggregates reported so far have been comparable to the horizontal extent of the containers, but it has not been clear whether that represents a limitation imposed by domain size. In this work, we present numerical simulations of convection at sufficiently large aspect ratio to ascertain whether there is an intrinsic saturation scale for the clustering process when that ratio is large enough. From a series of simulations of Rayleigh-Benard convection with Rayleigh numbers between 10^5 and 10^8 and with aspect ratios up to 12π, we conclude that the clustering process has a finite horizontal saturation scale with at most a weak dependence on Rayleigh number in the range studied.
FFTLasso: Large-Scale LASSO in the Fourier Domain

KAUST Repository

Bibi, Adel Aamer

2017-11-09

In this paper, we revisit the LASSO sparse representation problem, which has been studied and used in a variety of areas, ranging from signal processing and information theory to computer vision and machine learning. In the vision community, it has found its way into many important applications, including face recognition, tracking, super-resolution, and image denoising, to name a few. Despite advances in efficient sparse algorithms, solving large-scale LASSO problems remains a challenge. To circumvent this difficulty, practitioners tend to downsample and subsample the problem (e.g., via dimensionality reduction) to keep the LASSO at a manageable size, which usually comes at the cost of solution accuracy. This paper proposes a novel circulant reformulation of the LASSO that lifts the problem to a higher dimension, where ADMM can be efficiently applied to its dual form. Because of this lifting, all optimization variables are updated using only basic element-wise operations, the most computationally expensive of which is a 1D FFT. In this way, neither a linear system solver nor matrix-vector multiplication is needed. Since all operations in our FFTLasso method are element-wise, the subproblems are completely independent and can be trivially parallelized (e.g., on a GPU). The attractive computational properties of FFTLasso are verified by extensive experiments on synthetic and real data and on the face recognition task, which demonstrate that FFTLasso scales much more effectively than a state-of-the-art solver.
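The property that makes element-wise updates possible is a standard one: a circulant matrix is diagonalized by the discrete Fourier transform, so both the product C x and the regularized solve (CᵀC + ρI) z = b reduce to element-wise operations in the Fourier domain. The sketch below demonstrates only that identity, using a naive O(n²) DFT for self-containment (a real implementation would call an FFT routine); it does not reproduce the FFTLasso lifting or its ADMM iterations, and the function names are hypothetical.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform (an FFT computes the same values)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circulant_matvec(c, x):
    """Direct product C @ x, where C[i][j] = c[(i - j) % n] (c is the first column)."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def circulant_matvec_fourier(c, x):
    """Same product, computed as an element-wise multiplication of the spectra."""
    return [v.real for v in idft([ci * xi for ci, xi in zip(dft(c), dft(x))])]

def solve_regularized(c, b, rho):
    """Solve (C^T C + rho I) z = b without a linear solver.

    For real c, the eigenvalues of C^T C are |dft(c)|^2, so the solve is a
    division per frequency.
    """
    z_hat = [bi / (abs(ci) ** 2 + rho) for ci, bi in zip(dft(c), dft(b))]
    return [v.real for v in idft(z_hat)]
```

Because each frequency is handled independently, the per-frequency divisions are exactly the kind of independent subproblems that map well onto a GPU.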
FFTLasso: Large-Scale LASSO in the Fourier Domain

KAUST Repository

Bibi, Adel Aamer; Itani, Hani; Ghanem, Bernard

2017-01-01

In this paper, we revisit the LASSO sparse representation problem, which has been studied and used in a variety of areas, ranging from signal processing and information theory to computer vision and machine learning. In the vision community, it has found its way into many important applications, including face recognition, tracking, super-resolution, and image denoising, to name a few. Despite advances in efficient sparse algorithms, solving large-scale LASSO problems remains a challenge. To circumvent this difficulty, practitioners tend to downsample and subsample the problem (e.g., via dimensionality reduction) to keep the LASSO at a manageable size, which usually comes at the cost of solution accuracy. This paper proposes a novel circulant reformulation of the LASSO that lifts the problem to a higher dimension, where ADMM can be efficiently applied to its dual form. Because of this lifting, all optimization variables are updated using only basic element-wise operations, the most computationally expensive of which is a 1D FFT. In this way, neither a linear system solver nor matrix-vector multiplication is needed. Since all operations in our FFTLasso method are element-wise, the subproblems are completely independent and can be trivially parallelized (e.g., on a GPU). The attractive computational properties of FFTLasso are verified by extensive experiments on synthetic and real data and on the face recognition task, which demonstrate that FFTLasso scales much more effectively than a state-of-the-art solver.
Benefits of Parallel I/O in Ab Initio Nuclear Physics Calculations

International Nuclear Information System (INIS)

Laghave, Nikhil; Sosonkina, Masha; Maris, Pieter; Vary, James P.

2009-01-01

Many modern scientific applications rely on highly parallel calculations that scale to tens of thousands of processors. However, most applications do not focus on parallelizing input/output operations. In particular, sequential I/O has been identified as a bottleneck for the highly scalable MFDn (Many Fermion Dynamics for nuclear structure) code, which performs ab initio nuclear structure calculations. In this paper, we develop interfaces and parallel I/O procedures to use a well-known parallel I/O library in MFDn. As a result, we gain efficient input/output of large datasets, along with their portability and ease of use in downstream processing.
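The abstract does not name the parallel I/O library, so the following is only a generic illustration of the underlying idea: instead of funnelling all data through one process, each process computes the byte offset of its own contiguous block and writes directly into a shared file. Threads stand in for MPI ranks here, fixed-size records are assumed, and `block_range` and `parallel_write` are hypothetical helpers, not MFDn or library API.

```python
from concurrent.futures import ThreadPoolExecutor

def block_range(n_items, n_ranks, rank):
    """Contiguous block decomposition; the first n_items % n_ranks ranks get one extra item."""
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def parallel_write(path, records, n_ranks):
    """Each 'rank' writes its own block of fixed-size records at its own file offset."""
    item_size = len(records[0])
    if any(len(r) != item_size for r in records):
        raise ValueError("fixed-size records are assumed")
    # Pre-size the shared file so every rank can seek straight to its offset.
    with open(path, "wb") as f:
        f.truncate(len(records) * item_size)

    def write_rank(rank):
        start, stop = block_range(len(records), n_ranks, rank)
        with open(path, "r+b") as f:  # each rank uses its own file handle
            f.seek(start * item_size)
            f.write(b"".join(records[start:stop]))

    with ThreadPoolExecutor(max_workers=n_ranks) as pool:
        list(pool.map(write_rank, range(n_ranks)))
```

Because the blocks are disjoint, no coordination is needed beyond agreeing on the decomposition; production parallel I/O libraries add collective buffering and file views on top of this basic pattern.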
Development of large scale fusion plasma simulation and storage grid on JAERI Origin3800 system

International Nuclear Information System (INIS)

Idomura, Yasuhiro; Wang, Xin

2003-01-01

Under the Numerical EXperiment of Tokamak (NEXT) research project, various fluid, particle, and hybrid codes have been developed. These codes require a computational environment consisting of high-performance processors, a high-speed storage system, and a high-speed parallelized visualization system. In this paper, the performance of the JAERI Origin3800 system is examined with respect to these requirements. The performance tests show that the representative particle and fluid codes operate at 15-40% processing efficiency on up to 512 processors. A storage area network (SAN) provides high-speed parallel data transfer. A parallel visualization system makes visualization of large-scale simulation data orders of magnitude faster than on the previous graphics workstations. Accordingly, an extremely advanced simulation environment is realized on the JAERI Origin3800 system. A storage grid is currently being developed to improve the computational environment for remote users. The storage grid is constructed by combining the SAN with a wavelength division multiplexer (WDM). Preliminary tests show that, compared with existing data transfer methods, it enables dramatically faster data transfer, at ∼100 Gbps over a wide area network. (author)
Manufacturing test of large scale hollow capsule and long length cladding in the large scale oxide dispersion strengthened (ODS) martensitic steel

International Nuclear Information System (INIS)

Narita, Takeshi; Ukai, Shigeharu; Kaito, Takeji; Ohtsuka, Satoshi; Fujiwara, Masayuki

2004-04-01

The mass production capability of oxide dispersion strengthened (ODS) martensitic steel cladding (9Cr) has been evaluated in Phase II of the Feasibility Studies on Commercialized Fast Reactor Cycle System. The cost of manufacturing the mother tube (raw material powder production, mechanical alloying (MA) by ball mill, canning, hot extrusion, and machining) is a dominant factor in the total cost of manufacturing ODS ferritic steel cladding. In this study, a large-scale 9Cr-ODS martensitic steel mother tube made with a large-scale hollow capsule, as well as long-length claddings, were manufactured, and the applicability of these processes was evaluated. The following results were obtained. (1) A large-scale mother tube measuring 32 mm OD, 21 mm ID, and 2 m in length was successfully manufactured using a large-scale hollow capsule. This mother tube has high dimensional accuracy. (2) The chemical composition and microstructure of the manufactured mother tube are similar to those of the existing mother tube manufactured with a small-scale can, and no remarkable difference between the bottom and top of the manufactured mother tube was observed. (3) Long-length cladding was successfully manufactured from the large-scale mother tube made using a large-scale hollow capsule. (4) To reduce the manufacturing cost of ODS steel cladding, manufacturing mother tubes using large-scale hollow capsules is a promising process. (author)
Large amplitude parallel propagating electromagnetic oscillitons

International Nuclear Information System (INIS)

Cattaert, Tom; Verheest, Frank

2005-01-01

Earlier systematic nonlinear treatments of parallel propagating electromagnetic waves were given within a fluid dynamic approach, in a frame where the nonlinear structures are stationary and various constraining first integrals can be obtained. This has led to the concept of oscillitons, which has found application in various space plasmas. The present paper differs from the previous studies in three main aspects. First, the invariants are derived in the plasma frame, as is customary in the Sagdeev method, thus retaining all possible effects in Maxwell's equations. Second, a single differential equation is obtained for the parallel fluid velocity, in a form reminiscent of the Sagdeev integrals, which allows a fully nonlinear discussion of the oscilliton properties at amplitudes as large as the underlying Mach number restrictions allow. Third, the transition to weakly nonlinear whistler oscillitons is made analytically rather than numerically.
Quantum Monte Carlo for large chemical systems: implementing efficient strategies for petascale platforms and beyond

International Nuclear Information System (INIS)

Scemama, Anthony; Caffarel, Michel; Oseret, Emmanuel; Jalby, William

2013-01-01

Various strategies for efficiently implementing quantum Monte Carlo (QMC) simulations of large chemical systems are presented. These include: (i) an efficient algorithm to calculate the computationally expensive Slater matrices, based on the highly localized character of the atomic Gaussian basis functions (not the molecular orbitals, as is usually done); (ii) the possibility of keeping the memory footprint minimal; (iii) the significant enhancement of single-core performance when efficient optimization tools are used; and (iv) the definition of a universal, dynamic, fault-tolerant, and load-balanced framework adapted to all kinds of computational platforms (massively parallel machines, clusters, or distributed grids). These strategies have been implemented in the QMC-Chem code developed at Toulouse and illustrated with numerical applications on small peptides of increasing size (158, 434, 1056, and 1731 electrons). Using 10k to 80k computing cores of the Curie machine (GENCI-TGCC-CEA, France), QMC-Chem has been shown to be capable of running at the petascale level, demonstrating that a large part of this machine's peak performance can be achieved. Implementing large-scale QMC simulations on future exascale platforms with a comparable level of efficiency is expected to be feasible. (authors)
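A dynamic, fault-tolerant Monte Carlo framework depends on partial results from independent workers being mergeable in any order. The sketch below illustrates this with the standard pairwise update for (count, mean, sum of squared deviations) due to Chan et al.; it is an assumption about how such merging can work, not QMC-Chem's actual implementation, and the function names are hypothetical. A worker that fails simply never contributes its summary, and the result is still a valid estimate from the samples that did arrive.

```python
import math

def summarize(samples):
    """Per-worker summary: (count, mean, M2), where M2 is the sum of squared deviations."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples)
    return (n, mean, m2)

def combine(stats_a, stats_b):
    """Merge two summaries; the operation is order-independent (up to rounding)."""
    n_a, mean_a, m2_a = stats_a
    n_b, mean_b, m2_b = stats_b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return (n, mean, m2)

def mean_and_error(stats):
    """Mean and standard error, assuming the merged values are uncorrelated."""
    n, mean, m2 = stats
    return mean, math.sqrt(m2 / (n - 1) / n)
```

QMC samples along a walk are correlated, so in practice the values fed to `summarize` would be block averages rather than raw samples for the error bar to be meaningful.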
Amplification of large-scale magnetic field in nonhelical magnetohydrodynamics

KAUST Repository

Kumar, Rohit

2017-08-11

It is typically assumed that kinetic and magnetic helicities play a crucial role in the growth of a large-scale dynamo. In this paper, we demonstrate that helicity is not essential for the amplification of a large-scale magnetic field. For this purpose, we perform nonhelical magnetohydrodynamic (MHD) simulations and show that the large-scale magnetic field can grow in nonhelical MHD when random external forcing is applied at a scale of 1/10 of the box size. The energy fluxes and shell-to-shell transfer rates computed from the numerical data show that the large-scale magnetic energy grows due to energy transfer from the velocity field at the forcing scales.
Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

Directory of Open Access Journals (Sweden)

Anjani Ragothaman

2014-01-01

While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict could uncover the underlying function. However, threading tools are generally compute-intensive, and the number of protein sequences even in small genomes such as those of prokaryotes is large, typically many thousands, which prohibits their application as a genome-wide structural systems biology tool. To make the most of its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that uses computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data- and task-level parallelism and manages large variations in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based on task requirements. We present a runtime analysis to characterize the computational complexity of eThread and the EC2 infrastructure. Based on the results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly for small genomes such as those of prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
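The pilot abstraction can be pictured as follows: resources are acquired once as a pool of workers, and heterogeneous tasks are then pulled from a shared queue, so fast workers automatically take more of the load. The sketch below is a minimal single-machine illustration under that assumption; it is not the SAGA-Pilot API, threads stand in for cloud instances, and `run_pilot` is a hypothetical name.

```python
import queue
import threading

def run_pilot(tasks, n_workers):
    """Pilot-job pattern: a fixed pool of workers pulls tasks until the queue is empty.

    tasks: list of (task_id, callable) pairs. Returns {task_id: result}.
    """
    work = queue.Queue()
    for item in tasks:
        work.put(item)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task_id, fn = work.get_nowait()
            except queue.Empty:
                return  # no work left: this worker (pilot slot) exits
            value = fn()
            with lock:
                results[task_id] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Pulling work dynamically, rather than assigning a fixed share up front, is what lets this pattern absorb large variations in per-task runtime. Python threads share one interpreter lock, so for CPU-bound threading jobs the workers would in practice be separate processes or machines.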
Hydrometeorological variability on a large french catchment and its relation to large-scale circulation across temporal scales

Science.gov (United States)

Massei, Nicolas; Dieppois, Bastien; Fritier, Nicolas; Laignel, Benoit; Debret, Maxime; Lavers, David; Hannah, David

2015-04-01

In the present context of global change, considerable efforts have been made by the hydrological scientific community to improve our understanding of the impacts of climate fluctuations on water resources. Both observational and modeling studies have been extensively employed to characterize hydrological changes and trends, assess the impact of climate variability, or provide future scenarios of water resources. To better understand hydrological changes, it is crucial to determine how, and to what extent, trends and long-term oscillations detectable in hydrological variables are linked to global climate oscillations. In this work, we develop an approach combining large-scale/local-scale correlation, empirical statistical downscaling, and wavelet multiresolution decomposition of monthly precipitation and streamflow over the Seine river watershed and the North Atlantic sea level pressure (SLP), in order to gain additional insight into the atmospheric patterns associated with the regional hydrology. We hypothesized that: i) atmospheric patterns may change according to the different temporal wavelengths defining the variability of the signals; and ii) defining these hydrological/circulation relationships for each temporal wavelength may improve the determination of large-scale predictors of local variations. The results showed that the large-scale/local-scale links were not necessarily constant across time scales (i.e. for the different frequencies characterizing the signals), resulting in spatial patterns that change across scales. This was then taken into account by developing an empirical statistical downscaling (ESD) modeling approach that integrated discrete wavelet multiresolution analysis to reconstruct local hydrometeorological processes (predictands: precipitation and streamflow on the Seine river catchment) from a large-scale predictor (SLP over the Euro-Atlantic sector) at a monthly time step. This approach
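A discrete wavelet multiresolution analysis splits a monthly series into additive components, one per time scale, so that each component can be related to the large-scale predictor separately. The sketch below uses the Haar wavelet purely for simplicity; the abstract does not say which wavelet the authors used, and `haar_mra` is a hypothetical name.

```python
def haar_mra(x, levels):
    """Haar multiresolution analysis of a series whose length is divisible by 2**levels.

    Returns (details, smooth): details[j-1] holds the variability at scale 2**j
    samples (e.g. months), smooth holds the remaining slow component. All are
    full-length, and their element-wise sum reconstructs x exactly.
    """
    if len(x) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    approx = list(x)
    details = []
    for j in range(1, levels + 1):
        pairs = range(0, len(approx), 2)
        # Detail = half the difference, approximation = mean of each pair.
        d = [(approx[i] - approx[i + 1]) / 2 for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / 2 for i in pairs]
        half = 2 ** (j - 1)
        # Expand back to full length: each coefficient covers 2**j samples.
        details.append([v for c in d for v in [c] * half + [-c] * half])
    smooth = [c for c in approx for _ in range(2 ** levels)]
    return details, smooth
```

For a monthly series, the level-3 detail, for example, captures fluctuations on roughly 8-month scales, and a downscaling model could be fitted to each component before the components are summed back together.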
Superconducting materials for large scale applications

International Nuclear Information System (INIS)

Dew-Hughes, D.

1975-01-01

Applications of superconductors capable of carrying large current densities in large-scale electrical devices are examined. Discussions are included on critical current density, superconducting materials available, and future prospects for improved superconducting materials. (JRD)
Large-scale parallel configuration interaction. I. Nonrelativistic and scalar-relativistic general active space implementation with application to (Rb-Ba)+

DEFF Research Database (Denmark)

Knecht, Stefan; Jensen, Hans Jørgen Aagaard; Fleig, Timo

2008-01-01

We present a parallel implementation of a string-driven general active space configuration interaction program for nonrelativistic and scalar-relativistic electronic-structure calculations. The code has been modularly incorporated in the DIRAC quantum chemistry program package. The implementation...
Large-scale influences in near-wall turbulence.

Science.gov (United States)

Hutchins, Nicholas; Marusic, Ivan

2007-03-15

Hot-wire data acquired in a high Reynolds number facility are used to illustrate the need for adequate scale separation when considering the coherent structure in wall-bounded turbulence. It is found that a large-scale motion in the log region becomes increasingly comparable in energy to the near-wall cycle as the Reynolds number increases. Through decomposition of fluctuating velocity signals, it is shown that this large-scale motion has a distinct modulating influence on the small-scale energy (akin to amplitude modulation). Reassessment of DNS data, in light of these results, shows similar trends, with the rate and intensity of production due to the near-wall cycle subject to a modulating influence from the largest-scale motions.
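The decomposition-and-modulation idea can be sketched as follows: low-pass filter the velocity signal to obtain the large-scale part, treat the residual as the small-scale part, and correlate the large-scale signal with the envelope of the small scales. The abstract does not specify the filter or envelope method; the centred moving average and rectified, smoothed envelope used here are assumptions, and the function names are hypothetical.

```python
import math

def moving_average(x, window):
    """Centred moving average (odd window); returns values only where the full window fits."""
    half = window // 2
    return [sum(x[i - half:i + half + 1]) / (2 * half + 1)
            for i in range(half, len(x) - half)]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def modulation_coefficient(u, window):
    """Correlate the large-scale part of u with the envelope of its small-scale part."""
    half = window // 2
    u_large = moving_average(u, window)
    u_core = u[half:len(u) - half]          # samples aligned with u_large
    u_small = [a - b for a, b in zip(u_core, u_large)]
    # Crude envelope: low-pass filtered magnitude of the small-scale signal
    envelope = moving_average([abs(v) for v in u_small], window)
    return pearson(u_large[half:len(u_large) - half], envelope)
```

For a synthetic signal whose small-scale amplitude grows with the large-scale signal, the coefficient comes out strongly positive, which is the signature of the amplitude modulation described above.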
PKI security in large-scale healthcare networks.

Science.gov (United States)

Mantas, Georgios; Lymberopoulos, Dimitrios; Komninos, Nikos

2012-06-01

During the past few years, many PKIs (Public Key Infrastructures) have been proposed for healthcare networks to ensure secure communication services and data exchange among healthcare professionals. However, these healthcare PKIs face a plethora of challenges, especially when they are deployed over large-scale healthcare networks. In this paper, we propose a PKI to ensure security in a large-scale Internet-based healthcare network connecting a wide spectrum of healthcare units geographically distributed within a wide region. Furthermore, the proposed PKI addresses the trust issues that arise in a large-scale healthcare network, including multi-domain PKIs.
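In a multi-domain PKI, a relying party in one domain must build a certification path from a foreign certificate up to one of its own trust anchors, possibly through cross-certificates between domains. The toy sketch below shows only that path-building step as a breadth-first search over issuer names; it is not the paper's architecture, it performs no signature, validity, or revocation checks, and all names are hypothetical.

```python
from collections import deque

def find_cert_path(issued_by, leaf, trust_anchors):
    """Breadth-first search for a certification path from `leaf` to any trust anchor.

    issued_by maps a subject name to the set of issuer names that certified it;
    a cross-certificate between domains simply adds one more issuer.
    Returns the shortest path as a list of names, or None if no path exists.
    """
    frontier = deque([[leaf]])
    seen = {leaf}
    while frontier:
        path = frontier.popleft()
        subject = path[-1]
        if subject in trust_anchors:
            return path
        for issuer in sorted(issued_by.get(subject, ())):
            if issuer not in seen:
                seen.add(issuer)
                frontier.append(path + [issuer])
    return None
```

Without the cross-certificate, a clinic that trusts only its own regional root cannot validate a hospital clinician's certificate; adding one issuer edge between the two roots is enough to connect the domains.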
Large Scale Software Building with CMake in ATLAS

Science.gov (United States)

Elmsheuser, J.; Krasznahorkay, A.; Obreshkov, E.; Undrus, A.; ATLAS Collaboration

2017-10-01

The offline software of the ATLAS experiment at the Large Hadron Collider (LHC) serves as the platform for detector data reconstruction, simulation and analysis. It is also used in the detector’s trigger system to select LHC collision events during data taking. The ATLAS offline software consists of several million lines of C++ and Python code organized in a modular design of more than 2000 specialized packages. Because of different workflows, many stable numbered releases are in parallel production use. To accommodate specific workflow requests, software patches with modified libraries are distributed on top of existing software releases on a daily basis. The different ATLAS software applications also require a flexible build system that strongly supports unit and integration tests. Within the last year this build system was migrated to CMake. A CMake configuration has been developed that allows one to easily set up and build the above mentioned software packages. This also makes it possible to develop and test new and modified packages on top of existing releases. The system also allows one to detect and execute partial rebuilds of the release based on single package changes. The build system makes use of CPack for building RPM packages out of the software releases, and CTest for running unit and integration tests. We report on the migration and integration of the ATLAS software to CMake and show working examples of this large scale project in production.
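Detecting a partial rebuild after a single package changes amounts to finding every package that depends on it, directly or transitively. The sketch below shows that computation on an explicit dependency graph; it is an illustration of the idea, not the ATLAS CMake configuration (CMake tracks target dependencies itself), and the package names are hypothetical.

```python
from collections import deque

def packages_to_rebuild(depends_on, changed):
    """Return the changed packages plus every package that depends on them, transitively.

    depends_on maps each package to the set of packages it uses.
    """
    # Invert the graph: package -> packages that use it
    users = {}
    for pkg, deps in depends_on.items():
        for dep in deps:
            users.setdefault(dep, set()).add(pkg)
    rebuild = set(changed)
    work = deque(changed)
    while work:
        pkg = work.popleft()
        for user in users.get(pkg, ()):
            if user not in rebuild:
                rebuild.add(user)
                work.append(user)
    return rebuild
```

Packages outside the returned set can be taken unchanged from the existing release, which is what makes building on top of a nightly or stable release cheap.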
Cosmic Shear With ACS Pure Parallels. Targeted Portion.

Science.gov (United States)

Rhodes, Jason

2002-07-01

Small distortions in the shapes of background galaxies by foreground mass provide a powerful method of directly measuring the amount and distribution of dark matter. Several groups have recently detected this weak lensing by large-scale structure, also called cosmic shear. The high resolution and sensitivity of HST/ACS provide a unique opportunity to measure cosmic shear accurately on small scales. Using 260 parallel orbits in Sloan i (F775W), we will measure for the first time: the cosmic shear variance on scales Omega_m^0.5, with signal-to-noise (s/n) 20, and the mass density Omega_m with s/n=4. These measurements will be made at small angular scales where non-linear effects dominate the power spectrum, providing a test of the gravitational instability paradigm for structure formation. Measurements on these scales are not possible from the ground because of the systematic effects induced by PSF smearing from seeing. Having many independent lines of sight reduces the uncertainty due to cosmic variance, making parallel observations ideal.
Emerging large-scale solar heating applications

International Nuclear Information System (INIS)

Wong, W.P.; McClung, J.L.

2009-01-01

Currently the market for solar heating applications in Canada is dominated by outdoor swimming pool heating, make-up air pre-heating and domestic water heating in homes, commercial and institutional buildings. All of these involve relatively small systems, except for a few air pre-heating systems on very large buildings. Together these applications make up well over 90% of the solar thermal collectors installed in Canada during 2007. These three applications, along with the recent re-emergence of large-scale concentrated solar thermal for generating electricity, also dominate the world markets. This paper examines some emerging markets for large scale solar heating applications, with a focus on the Canadian climate and market. (author)
Emerging large-scale solar heating applications

Energy Technology Data Exchange (ETDEWEB)

Wong, W.P.; McClung, J.L. [Science Applications International Corporation (SAIC Canada), Ottawa, Ontario (Canada)]

2009-07-01

Currently the market for solar heating applications in Canada is dominated by outdoor swimming pool heating, make-up air pre-heating and domestic water heating in homes, commercial and institutional buildings. All of these involve relatively small systems, except for a few air pre-heating systems on very large buildings. Together these applications make up well over 90% of the solar thermal collectors installed in Canada during 2007. These three applications, along with the recent re-emergence of large-scale concentrated solar thermal for generating electricity, also dominate the world markets. This paper examines some emerging markets for large scale solar heating applications, with a focus on the Canadian climate and market. (author)
Parallelization of 2-D lattice Boltzmann codes

International Nuclear Information System (INIS)

Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo.

1996-03-01

Lattice Boltzmann (LB) codes to simulate two-dimensional fluid flow are developed on the vector parallel computer Fujitsu VPP500 and the scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code so that it can be vectorized along the axis perpendicular to the direction of the decomposition. High parallel efficiencies were obtained: 95.1% for the vector parallel calculation on 16 processors with a 1152x1152 grid, and 88.6% for the scalar parallel calculation on 100 processors with an 800x800 grid. Performance models are developed to analyze the performance of the LB codes. These models show that the vector parallel code executes about one hundred times faster than the scalar parallel code with the same number of processors, up to 100 processors. We also analyze the scalability when the available memory of each processor element is kept fully used. Our performance model predicts that the execution time of the vector parallel code increases by about 3% on 500 processors. Although the 1-D domain decomposition method generally has a drawback in interprocessor communication, the vector parallel LB code is still suitable for large-scale and/or high-resolution simulations. (author)
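The communication drawback of 1-D decomposition mentioned above can be made concrete by counting the boundary cells each process exchanges per time step. The sketch below is a simple halo-counting model, not the authors' performance model; it assumes a one-cell-wide halo, counts only an interior subdomain, and ignores the corner (diagonal) exchanges a D2Q9 lattice would also need. The function names are hypothetical.

```python
import math

def halo_cells_strips(n, p):
    """1-D decomposition of an n x n grid into p strips of n/p full rows.

    An interior strip exchanges one row of n cells with each of its two
    neighbours, independent of p; the rows stay length n, which suits
    long vector loops.
    """
    return 2 * n

def halo_cells_blocks(n, p):
    """2-D decomposition into a sqrt(p) x sqrt(p) grid of blocks (p a perfect square).

    An interior block exchanges its four edges of n / sqrt(p) cells each.
    """
    q = math.isqrt(p)
    if q * q != p:
        raise ValueError("p must be a perfect square")
    return 4 * n / q

def comm_to_compute_ratio(halo_cells, n, p):
    """Halo cells exchanged per lattice cell updated on one process."""
    return halo_cells / (n * n / p)
```

For the 800x800 grid on 100 processors, strips exchange 1600 cells per process against 320 for blocks, a communication-to-computation ratio of 0.25 versus 0.05. The vector code accepts the larger halo in exchange for full-length vector loops.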

Parallelization of 2-D lattice Boltzmann codes

Energy Technology Data Exchange (ETDEWEB)

Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo

1996-03-01

Lattice Boltzmann (LB) codes to simulate two-dimensional fluid flow are developed on the vector parallel computer Fujitsu VPP500 and the scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code so that it can be vectorized along the axis perpendicular to the direction of the decomposition. High parallel efficiencies were obtained: 95.1% for the vector parallel calculation on 16 processors with a 1152x1152 grid, and 88.6% for the scalar parallel calculation on 100 processors with an 800x800 grid. Performance models are developed to analyze the performance of the LB codes. These models show that the vector parallel code executes about one hundred times faster than the scalar parallel code with the same number of processors, up to 100 processors. We also analyze the scalability when the available memory of each processor element is kept fully used. Our performance model predicts that the execution time of the vector parallel code increases by about 3% on 500 processors. Although the 1-D domain decomposition method generally has a drawback in interprocessor communication, the vector parallel LB code is still suitable for large-scale and/or high-resolution simulations. (author)
Sensitivity Analysis of the Proximal-Based Parallel Decomposition Methods

Directory of Open Access Journals (Sweden)

Feng Ma

2014-01-01

The proximal-based parallel decomposition methods were recently proposed to solve structured convex optimization problems. These algorithms are suitable for parallel computation and can be used efficiently to solve large-scale separable problems. In this paper, we show that, compared with previous theoretical results, the range of the involved parameters can be enlarged while convergence can still be established. Preliminary numerical tests on the stable principal component pursuit problem confirm the advantages of the enlargement.
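To show what "parallel decomposition" means in practice, the sketch below applies a proximal Jacobian ADMM-type iteration, one standard member of this family, to a toy separable problem: minimize Σᵢ ½(xᵢ − aᵢ)² subject to Σᵢ xᵢ = b. Every block update uses only the previous iterate of the other blocks, so the updates could run in parallel. This is not the paper's method, nor its enlarged parameter ranges: the default proximal weight uses the conservative sufficient condition τ > β(n/(2 − γ) − 1) reported for proximal Jacobian ADMM, which is assumed here, and the function name is hypothetical.

```python
def prox_jacobian_admm(a, b, beta=1.0, tau=None, gamma=1.0, iters=200):
    """Minimise sum_i 0.5*(x_i - a_i)**2 subject to sum_i x_i = b.

    Each block solves, in closed form,
        0.5*(x_i - a_i)**2 + beta/2*(x_i + others_old - b - lam/beta)**2
        + tau/2*(x_i - x_i_old)**2,
    using only the previous iterate of the other blocks (Jacobi style).
    """
    n = len(a)
    if tau is None:
        # Assumed sufficient condition for convergence, plus a unit margin.
        tau = beta * (n / (2 - gamma) - 1) + 1.0
    x = [0.0] * n
    lam = 0.0
    for _ in range(iters):
        total = sum(x)
        # All n updates read the same old iterate, so they are independent.
        x = [(a[i] + beta * (b - (total - x[i])) + lam + tau * x[i]) / (1 + beta + tau)
             for i in range(n)]
        # Dual (multiplier) update on the coupling constraint.
        lam -= gamma * beta * (sum(x) - b)
    return x, lam
```

The exact solution is xᵢ = aᵢ + λ with λ = (b − Σᵢ aᵢ)/n, which the iteration reaches for the examples in the test; widening the admissible ranges of β, τ, and γ, as the paper does, matters because larger steps typically mean fewer iterations.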
Hybrid parallel execution model for logic-based specification languages

CERN Document Server

Tsai, Jeffrey J P

2001-01-01

Parallel processing is a very important technique for improving the performance of various software development and maintenance activities. The purpose of this book is to introduce important techniques for the parallel execution of high-level specifications of software systems. These techniques are very useful for the construction, analysis, and transformation of reliable, large-scale, and complex software systems. Contents: Current Approaches; Overview of the New Approach; FRORL Requirements Specification Language and Its Decomposition; Rewriting and Data Dependency, Control Flow Analysis of a Lo
Large-scale regions of antimatter

International Nuclear Information System (INIS)

Grobov, A. V.; Rubin, S. G.

2015-01-01

A modified mechanism of the formation of large-scale antimatter regions is proposed. Antimatter appears owing to fluctuations of a complex scalar field that carries a baryon charge in the inflation era.
Large-scale regions of antimatter

Energy Technology Data Exchange (ETDEWEB)

Grobov, A. V., E-mail: alexey.grobov@gmail.com; Rubin, S. G., E-mail: sgrubin@mephi.ru [National Research Nuclear University MEPhI (Russian Federation)]

2015-07-15

A modified mechanism of the formation of large-scale antimatter regions is proposed. Antimatter appears owing to fluctuations of a complex scalar field that carries a baryon charge in the inflation era.
Effects of baryons on the statistical properties of large scale structure of the Universe

International Nuclear Information System (INIS)

Guillet, T.

2010-01-01

Observations of weak gravitational lensing will provide strong constraints on the cosmic expansion history and the growth rate of large-scale structure, yielding clues to the properties and nature of dark energy. Their interpretation is affected by baryonic physics, which is expected to modify the total matter distribution at small scales. My work has focused on determining and modeling the impact of baryons on the statistics of the large-scale matter distribution in the Universe. Using numerical simulations, I have extracted the effect of baryons on the power spectrum, variance, and skewness of the total density field as predicted by these simulations. I have shown that a model based on the halo model construction, featuring a concentrated central component to account for cool condensed baryons, is able to reproduce accurately, and down to very small scales, the measured amplifications of both the variance and skewness of the density field. Because of well-known issues with baryons in current cosmological simulations, I have extended the central component model to rely on as many observation-based ingredients as possible. As an application, I have studied the effect of baryons on the predictions of the upcoming Euclid weak lensing survey. During the course of this work, I have also worked on developing and extending the RAMSES code, in particular by developing a parallel self-gravity solver, which offers significant performance gains, especially for the simulation of some astrophysical setups such as isolated galaxy or cluster simulations. (author)
Large-Scale Analysis of Art Proportions

DEFF Research Database (Denmark)

Jensen, Karl Kristoffer

2014-01-01

While literature often tries to impute mathematical constants into art, this large-scale study (11 databases of paintings and photos, around 200,000 items) shows a different truth. The analysis, consisting of the width/height proportions, shows a value of rarely if ever one (square) and with majo...
The Expanded Large Scale Gap Test

Science.gov (United States)

1987-03-01

NSWC TR 86-32 (DTIC). The Expanded Large Scale Gap Test, by T. P. Liddiard and D. Price, Research and Technology Department, March 1987. Approved for public... arises, to reduce the spread in the LSGT 50% gap value.) The worst charges, such as those with the highest or lowest densities, the largest re-pressed...
Modified stress intensity factor as a crack growth parameter applicable under large scale yielding conditions

International Nuclear Information System (INIS)

Yasuoka, Tetsuo; Mizutani, Yoshihiro; Todoroki, Akira

2014-01-01

High-temperature water stress corrosion cracking has high tensile stress sensitivity, and its growth rate has been evaluated using the stress intensity factor, which is a linear fracture mechanics parameter. Stress corrosion cracking mainly occurs and propagates around welded metals or heat-affected zones. These regions have complex residual stress distributions and yield strength distributions because of input heat effects. The authors previously reported that the stress intensity factor becomes inapplicable when steep residual stress distributions or yield strength distributions occur along the crack propagation path, because small-scale yielding conditions break down around those distributions. If the stress intensity factor is modified to account for these distributions, the modified stress intensity factor may be used for crack growth evaluation under large-scale yielding. The authors previously proposed a modified stress intensity factor incorporating the stress distribution or yield strength distribution in front of the crack, using the rate of change of the stress intensity factor and of the yield strength. However, the applicable range of the modified stress intensity factor for large-scale yielding was not clarified. In this study, that range was investigated analytically by comparison with the J-integral solution. A three-point bending specimen with a parallel surface crack was adopted as the analytical model, and the stress intensity factor, the modified stress intensity factor and the equivalent stress intensity factor derived from the J-integral were calculated and compared under large-scale yielding conditions. The modified stress intensity factor was closer to the equivalent stress intensity factor than the unmodified stress intensity factor was. If a deviation of up to 2% from the J-integral solution is acceptable, the modified stress intensity factor is applicable up to 30% of the J-integral limit, while the stress intensity factor is applicable only up to 10%. These results showed that
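For reference, an equivalent stress intensity factor is conventionally obtained from J as K_J = sqrt(E' J), with E' = E in plane stress and E' = E/(1 − ν²) in plane strain. The sketch below assumes that relation and plane-strain conditions; the function names and material values are hypothetical, and the 2% tolerance simply mirrors the acceptance criterion quoted in the abstract. It shows how an estimate of K would be checked against the J-based reference, not how the authors computed the modified factor.

```python
import math


def equivalent_k_from_j(j_integral, youngs_modulus, poisson_ratio, plane_strain=True):
    """Equivalent stress intensity factor K_J = sqrt(E' * J).

    Units: J in N/m (J/m^2), E in Pa -> K_J in Pa*sqrt(m).
    E' = E / (1 - nu^2) for plane strain, E' = E for plane stress.
    """
    if plane_strain:
        e_prime = youngs_modulus / (1.0 - poisson_ratio ** 2)
    else:
        e_prime = youngs_modulus
    return math.sqrt(e_prime * j_integral)


def relative_deviation(k_estimate, k_reference):
    """Fractional deviation of an estimate from the J-based reference value."""
    return abs(k_estimate - k_reference) / k_reference


def within_tolerance(k_estimate, k_reference, tol=0.02):
    """True if the estimate deviates from K_J by at most `tol` (2% by default)."""
    return relative_deviation(k_estimate, k_reference) <= tol


if __name__ == "__main__":
    # Hypothetical steel-like values, for illustration only.
    E, nu, J = 200e9, 0.3, 5.0e3
    k_j = equivalent_k_from_j(J, E, nu)
    print(f"K_J = {k_j / 1e6:.2f} MPa*sqrt(m)")
    print(within_tolerance(1.01 * k_j, k_j))  # a 1% deviation passes the 2% criterion
```

In use, the two estimates (unmodified and modified K) would each be passed to `within_tolerance` against K_J at increasing load levels to find where they leave the 2% band.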
SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

International Nuclear Information System (INIS)

Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

1999-01-01

In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs.
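To illustrate the temporal-locality idea behind "vertical execution", the Python sketch below applies a sequence of element-wise data-parallel statements to an array split into blocks, either statement-by-statement over the whole array (horizontal) or block-by-block through all statements (vertical). This is a simplified illustration, not the SMARTS or POOMA API: the helper names and block size are hypothetical, and the reordering is only legal here because every statement reads and writes the same index (no cross-block dependencies). Deciding when such reordering is safe is the job of the dependence analysis in a macro-dataflow runtime.

```python
from typing import Callable, List

# A statement updates data[start:stop] in place.
Statement = Callable[[List[float], int, int], None]


def scale(data: List[float], start: int, stop: int) -> None:
    for i in range(start, stop):
        data[i] *= 2.0


def shift(data: List[float], start: int, stop: int) -> None:
    for i in range(start, stop):
        data[i] += 1.0


def square(data: List[float], start: int, stop: int) -> None:
    for i in range(start, stop):
        data[i] = data[i] * data[i]


def horizontal(data: List[float], statements: List[Statement], block: int) -> None:
    # Each statement sweeps the whole array before the next one starts,
    # so on a large array a block has left the cache before it is reused.
    for stmt in statements:
        for start in range(0, len(data), block):
            stmt(data, start, min(start + block, len(data)))


def vertical(data: List[float], statements: List[Statement], block: int) -> None:
    # All statements are applied to one block while it is still "hot",
    # then execution moves on to the next block (better temporal locality).
    for start in range(0, len(data), block):
        stop = min(start + block, len(data))
        for stmt in statements:
            stmt(data, start, stop)


if __name__ == "__main__":
    a = [float(i) for i in range(10)]
    b = list(a)
    horizontal(a, [scale, shift, square], block=4)
    vertical(b, [scale, shift, square], block=4)
    print(a == b)  # same result, different traversal order
```

Both orders compute the same values; only the order in which memory is touched differs, which is what a runtime can exploit for cache reuse.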
SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

Energy Technology Data Exchange (ETDEWEB)

Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

1999-01-04

In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs.
Large scale and big data processing and management

CERN Document Server

Sakr, Sherif

2014-01-01

Large Scale and Big Data: Processing and Management provides readers with a central source of reference on the data management techniques currently available for large-scale data processing. Presenting chapters written by leading researchers, academics, and practitioners, it addresses the fundamental challenges associated with Big Data processing tools and techniques across a range of computing environments. The book begins by discussing the basic concepts and tools of large-scale Big Data processing and cloud computing. It also provides an overview of different programming models and cloud-bas
Scalability of Parallel Scientific Applications on the Cloud

Directory of Open Access Journals (Sweden)

Satish Narayana Srirama

2011-01-01

Cloud computing, with its promise of virtually infinite resources, seems well suited to solving resource-hungry scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications, such as matrix–vector operations, the NAS Parallel Benchmarks, and DOUG (Domain decomposition On Unstructured Grids), on the cloud. DOUG is an open-source software package for the parallel iterative solution of very large sparse systems of linear equations. The detailed analysis of DOUG on the cloud showed that parallel applications benefit considerably and scale reasonably well on the cloud. We could also observe the limitations of the cloud and compare its performance with that of a cluster. However, to run scientific applications efficiently on cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit cloud resources, such as the MapReduce framework. Several iterative and embarrassingly parallel algorithms are reduced to the MapReduce model, and their performance is measured and analyzed. The analysis showed that Hadoop MapReduce has significant problems with iterative methods, while it is well suited to embarrassingly parallel algorithms. Scientific computing often uses iterative methods to solve large problems. Thus, for scientific computing on the cloud, this paper raises the need for better frameworks or optimizations for MapReduce.
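The mismatch between iterative solvers and MapReduce can be seen in a toy example. The Python sketch below expresses one sparse matrix–vector product as a map phase (emit partial products keyed by row) and a reduce phase (sum per key), then drives a Jacobi iteration that has to launch a fresh "job" for every sweep. The in-memory `map_phase`/`reduce_phase` functions are hypothetical stand-ins, not the Hadoop API; in Hadoop each such job would typically re-read its input from distributed storage, which is the kind of per-iteration overhead the abstract points to.

```python
from collections import defaultdict


def map_phase(matrix_entries, x):
    """Emit (row, a_ij * x_j) for every stored nonzero a_ij."""
    for (i, j), a_ij in matrix_entries.items():
        yield i, a_ij * x[j]


def reduce_phase(pairs):
    """Sum all values that share a key (the row index)."""
    sums = defaultdict(float)
    for key, value in pairs:
        sums[key] += value
    return sums


def jacobi_mapreduce(matrix_entries, b, iterations=50):
    """Jacobi iteration x_{k+1} = D^{-1} (b - R x_k), one 'job' per sweep.

    matrix_entries: dict {(row, col): value} for a square, diagonally
    dominant matrix; b: right-hand side as a list of floats.
    """
    n = len(b)
    diag = [matrix_entries[(i, i)] for i in range(n)]
    off_diag = {k: v for k, v in matrix_entries.items() if k[0] != k[1]}
    x = [0.0] * n
    for _ in range(iterations):  # each sweep would be a separate MapReduce job
        r_x = reduce_phase(map_phase(off_diag, x))
        x = [(b[i] - r_x.get(i, 0.0)) / diag[i] for i in range(n)]
    return x


if __name__ == "__main__":
    # Small diagonally dominant system whose exact solution is [1, 2, 3].
    entries = {(0, 0): 4.0, (0, 1): 1.0,
               (1, 0): 1.0, (1, 1): 4.0, (1, 2): 1.0,
               (2, 1): 1.0, (2, 2): 4.0}
    print(jacobi_mapreduce(entries, [6.0, 12.0, 14.0]))
```

An embarrassingly parallel task, by contrast, would need only a single map (and possibly one reduce), so the per-job startup cost is paid once rather than once per iteration.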
Large scale cluster computing workshop

International Nuclear Information System (INIS)

Dane Skow; Alan Silverman

2002-01-01

Recent revolutions in computer hardware and software technologies have paved the way for the large-scale deployment of clusters of commodity computers to address problems heretofore the domain of tightly coupled SMP processors. Near-term projects within High Energy Physics and other computing communities will deploy clusters of 1000s of processors, to be used by 100s to 1000s of independent users. This will expand the reach in both dimensions by an order of magnitude from the current successful production facilities. The goals of this workshop were: (1) to determine what tools exist which can scale up to the cluster sizes foreseen for the next generation of HENP experiments (several thousand nodes) and by implication to identify areas where some investment of money or effort is likely to be needed; (2) to compare and record experiences gained with such tools; (3) to produce a practical guide to all stages of planning, installing, building and operating a large computing cluster in HENP; (4) to identify and connect groups with similar interests within HENP and the larger clustering community.
Iterative algorithms for large sparse linear systems on parallel computers

Science.gov (United States)

Adams, L. M.

1982-01-01

Algorithms are developed for assembling, in parallel, the sparse systems of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed, and results of this model for the algorithms are given.
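As one concrete instance of a parallel linear stationary iteration (chosen here for illustration; the abstract does not say which orderings were studied), the sketch below applies red-black Gauss–Seidel to the 1D finite-difference Poisson problem −u'' = f with zero boundary values. Within each colour, every update depends only on points of the other colour, so all red (or all black) points could be updated simultaneously; the NumPy slicing performs each colour's half-sweep as one vectorized step. The grid size and sweep count are hypothetical.

```python
import numpy as np


def red_black_gauss_seidel(f, h, sweeps=2000):
    """Solve -u'' = f at n interior points with u = 0 at both ends.

    Discretization: (-u[i-1] + 2 u[i] - u[i+1]) / h^2 = f[i],
    so each point update is u[i] = (u[i-1] + u[i+1] + h^2 f[i]) / 2.
    """
    n = len(f)
    u = np.zeros(n + 2)  # indices 0 and n+1 hold the boundary values
    rhs = np.concatenate(([0.0], h * h * np.asarray(f, dtype=float), [0.0]))
    for _ in range(sweeps):
        # Red points (odd indices 1, 3, ...): neighbours are all black.
        u[1:-1:2] = 0.5 * (u[0:-2:2] + u[2::2] + rhs[1:-1:2])
        # Black points (even interior indices 2, 4, ...): use fresh red values.
        u[2:-1:2] = 0.5 * (u[1:-2:2] + u[3::2] + rhs[2:-1:2])
    return u[1:-1]


if __name__ == "__main__":
    n = 31
    h = 1.0 / (n + 1)
    x = np.arange(1, n + 1) * h
    u = red_black_gauss_seidel(np.ones(n), h)
    # For f = 1 the exact solution x(1 - x)/2 is reproduced by the scheme.
    print(np.max(np.abs(u - x * (1 - x) / 2)))
```

The same colouring idea extends to 2D and 3D grids (a checkerboard pattern for the 5-point stencil), which is what makes Gauss–Seidel-type stationary methods usable on array architectures.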
Large-Scale Agriculture and Outgrower Schemes in Ethiopia

DEFF Research Database (Denmark)

Wendimu, Mengistu Assefa

, the impact of large-scale agriculture and outgrower schemes on productivity, household welfare and wages in developing countries is highly contentious. Chapter 1 of this thesis provides an introduction to the study, while also reviewing the key debate in the contemporary land ‘grabbing’ and historical large...... sugarcane outgrower scheme on household income and asset stocks. Chapter 5 examines the wages and working conditions in ‘formal’ large-scale and ‘informal’ small-scale irrigated agriculture. The results in Chapter 2 show that moisture stress, the use of untested planting materials, and conflict over land...... commands a higher wage than ‘formal’ large-scale agriculture, while rather different wage determination mechanisms exist in the two sectors. Human capital characteristics (education and experience) partly explain the differences in wages within the formal sector, but play no significant role...
Neural nets for massively parallel optimization

Science.gov (United States)

Dixon, Laurence C. W.; Mills, David

1992-07-01

To apply massively parallel processing systems to the solution of large-scale optimization problems, it is desirable to be able to evaluate any function f(z), z ∈ R^n, in a parallel manner. The theorem of Cybenko, Hecht-Nielsen, Hornik, Stinchcombe and White, and Funahashi shows that this can be achieved by a neural network with one hidden layer. In this paper we address the problem of the number of nodes required in the hidden layer to achieve a given accuracy in the function and gradient values at all points within a given n-dimensional interval. The type of activation function needed to obtain nonsingular Hessian matrices is described, and a strategy for obtaining accurate minimal networks is presented.
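The function and gradient values discussed above have a simple closed form for a one-hidden-layer network. Assuming a tanh activation (a hypothetical choice for this sketch; the paper itself discusses which activation types are needed), f(z) = Σ_i c_i tanh(w_i · z + b_i) and ∇f(z) = Σ_i c_i (1 − tanh²(w_i · z + b_i)) w_i. Each hidden node's contribution is independent of the others, which is what makes the evaluation parallel; the NumPy sketch vectorizes over nodes. The weights are random placeholders, not a trained approximation.

```python
import numpy as np


def network_value_and_grad(z, W, b, c):
    """Evaluate f(z) = sum_i c_i * tanh(w_i . z + b_i) and its gradient.

    z: (n,) input point; W: (m, n) hidden weights; b: (m,) biases;
    c: (m,) output weights. Every hidden node is computed independently,
    here as one vectorized step over all m nodes.
    """
    a = W @ z + b                      # pre-activations of all hidden nodes
    h = np.tanh(a)                     # hidden-node outputs
    value = c @ h
    grad = W.T @ (c * (1.0 - h ** 2))  # d tanh(a)/da = 1 - tanh(a)^2
    return value, grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n = 8, 3
    W, b, c = rng.normal(size=(m, n)), rng.normal(size=m), rng.normal(size=m)
    z = rng.normal(size=n)
    value, grad = network_value_and_grad(z, W, b, c)
    # Central finite-difference check of the analytic gradient.
    eps = 1e-6
    fd = np.array([(network_value_and_grad(z + eps * e, W, b, c)[0]
                    - network_value_and_grad(z - eps * e, W, b, c)[0]) / (2 * eps)
                   for e in np.eye(n)])
    print(np.max(np.abs(fd - grad)))
```

On parallel hardware, the m hidden-node terms would be distributed across processors and summed, so the cost per evaluation grows with m divided by the processor count; that is why the number of hidden nodes needed for a given accuracy matters.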
Economically viable large-scale hydrogen liquefaction

Science.gov (United States)

Cardella, U.; Decker, L.; Klein, H.

2017-02-01

Liquid hydrogen demand, driven particularly by clean energy applications, will rise in the near future. As industrial large-scale liquefiers will play a major role within the hydrogen supply chain, production capacity will have to increase to a multiple of today's typical sizes. The main goal is to reduce the total cost of ownership for these plants by increasing energy efficiency with innovative and simple process designs that are optimized in capital expenditure. New concepts must ensure manageable plant complexity and flexible operability. In the phase of process development and selection, key equipment for large-scale liquefiers, such as turbines, compressors and heat exchangers, must be dimensioned iteratively to ensure technological feasibility and maturity. Further critical aspects of hydrogen liquefaction, e.g. fluid properties, ortho-para hydrogen conversion, and coldbox configuration, must be analysed in detail. This paper provides an overview of the approach, challenges and preliminary results in the development of efficient as well as economically viable concepts for large-scale hydrogen liquefaction.
Large scale chromatographic separations using continuous displacement chromatography (CDC)

International Nuclear Information System (INIS)

Taniguchi, V.T.; Doty, A.W.; Byers, C.H.

1988-01-01

A process for large scale chromatographic separations using a continuous chromatography technique is described. The process combines the advantages of large scale batch fixed column displacement chromatography with conventional analytical or elution continuous annular chromatography (CAC) to enable large scale displacement chromatography to be performed on a continuous basis (CDC). Such large scale, continuous displacement chromatography separations have not been reported in the literature. The process is demonstrated with the ion exchange separation of a binary lanthanide (Nd/Pr) mixture. The process is, however, applicable to any displacement chromatography separation that can be performed using conventional batch, fixed column chromatography
Large-Scale, Multi-Sensor Atmospheric Data Fusion Using Hybrid Cloud Computing

Science.gov (United States)

Wilson, B. D.; Manipon, G.; Hua, H.; Fetzer, E. J.

2015-12-01

NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, MODIS, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over decades. Moving to multi-sensor, long-duration analyses presents serious challenges for large-scale data mining and fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another (MODIS) and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over 10 years of data. HySDS is a Hybrid-Cloud Science Data System that has been developed and applied under NASA AIST, MEaSUREs, and ACCESS grants. HySDS uses the SciFlow workflow engine to partition analysis workflows into parallel tasks (e.g. segmenting by time or space) that are pushed into a durable job queue. The tasks are "pulled" from the queue by worker Virtual Machines (VMs) and executed in an on-premise Cloud (Eucalyptus or OpenStack) or at Amazon in the public Cloud or GovCloud. In this way, years of data (millions of files) can be processed in a massively parallel way. Input variables (arrays) are pulled on demand into the Cloud using OPeNDAP URLs or other subsetting services, thereby minimizing the size of the transferred data. We are using HySDS to automate the production of multiple versions of a ten-year A-Train water vapor climatology under a MEaSUREs grant. We will present the architecture of HySDS, describe the achieved "clock time" speedups in fusing datasets on our own nodes and in the Amazon Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer. Our system demonstrates how one can pull A-Train variables (Levels 2 & 3) on demand into the Amazon Cloud, and cache only those variables that are heavily used, so that any number of compute jobs can be
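The partition-then-pull pattern described above can be sketched with Python's standard `queue` and `threading` modules: a multi-year analysis is split into monthly tasks pushed onto a shared queue, and a pool of workers pulls tasks until the queue is empty. Everything here is hypothetical (the task format, the `process_month` placeholder, the worker count); it is not the HySDS or SciFlow interface, and an in-memory `queue.Queue` is not durable, unlike the job queue described in the abstract, so tasks would be lost if the process died.

```python
import queue
import threading
from datetime import date


def monthly_tasks(start_year, end_year):
    """Partition a multi-year analysis into one task per calendar month."""
    return [date(y, m, 1) for y in range(start_year, end_year + 1) for m in range(1, 13)]


def process_month(month):
    """Hypothetical placeholder for fusing one month of granules."""
    return f"{month:%Y-%m}: done"


def run_workers(tasks, n_workers=4):
    """Push all tasks onto a shared queue and let workers pull until it is empty."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                month = task_queue.get_nowait()  # workers "pull" their next task
            except queue.Empty:
                return
            outcome = process_month(month)
            with lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results)


if __name__ == "__main__":
    done = run_workers(monthly_tasks(2003, 2012))
    print(len(done), done[0], done[-1])
```

Because workers pull rather than being assigned work, slow or failed workers do not hold up the others; in a cloud deployment the threads would be replaced by worker VMs and the in-memory queue by a persistent message broker.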

Large Scale Processes and Extreme Floods in Brazil

Science.gov (United States)

Ribeiro Lima, C. H.; AghaKouchak, A.; Lall, U.

2016-12-01

Persistent large-scale anomalies in the atmospheric circulation and ocean state have been associated with heavy rainfall and extreme floods in water basins of different sizes across the world. Such studies have emerged in recent years as a new tool to improve the traditional, stationarity-based approach to flood frequency analysis and flood prediction. Here we seek to advance previous studies by evaluating the dominance of large-scale processes (e.g. atmospheric rivers/moisture transport) over local processes (e.g. local convection) in producing floods. We consider flood-prone regions in Brazil as case studies, and the role of large-scale climate processes in generating extreme floods in such regions is explored by means of observed streamflow, reanalysis data and machine learning methods. The dynamics of the large-scale atmospheric circulation in the days prior to the flood events are evaluated based on the vertically integrated moisture flux and its divergence field, which are interpreted in a low-dimensional space obtained by machine learning techniques, particularly supervised kernel principal component analysis. In this reduced-dimensional space, clusters are obtained in order to better understand the role of regional moisture recycling or teleconnected moisture in producing floods of a given magnitude. Convective available potential energy (CAPE) is also used as a measure of local convective activity. We investigate, for individual sites, the exceedance probability at which large-scale atmospheric fluxes dominate the flood process. Finally, we analyze regional patterns of floods and how the scaling law of floods with drainage area responds to changes in the climate forcing mechanisms (e.g. local vs large-scale).
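The scaling law of floods with drainage area mentioned at the end is commonly written as a power law, Q = αA^θ, so the exponent θ can be estimated as the slope of a least-squares line in log-log space. The sketch below assumes that form; the synthetic data are hypothetical and serve only to check that the fit recovers a known exponent. Fitting θ separately for groups of events (e.g. locally driven versus large-scale driven floods) is one way a change in scaling could be quantified.

```python
import numpy as np


def fit_flood_scaling(area_km2, peak_flow_m3s):
    """Fit Q = alpha * A**theta by least squares on log Q = log alpha + theta * log A.

    Returns (alpha, theta).
    """
    log_a = np.log(np.asarray(area_km2, dtype=float))
    log_q = np.log(np.asarray(peak_flow_m3s, dtype=float))
    theta, log_alpha = np.polyfit(log_a, log_q, 1)  # slope, intercept
    return np.exp(log_alpha), theta


if __name__ == "__main__":
    # Synthetic, noise-free example with alpha = 2.5 and theta = 0.7.
    area = np.array([10.0, 100.0, 1000.0, 10000.0])
    alpha, theta = fit_flood_scaling(area, 2.5 * area ** 0.7)
    print(round(alpha, 3), round(theta, 3))
```

With real data the points scatter around the line, and a confidence interval on θ (e.g. from the regression residuals) would be needed before comparing exponents between groups.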
Computing in Large-Scale Dynamic Systems

NARCIS (Netherlands)

Pruteanu, A.S.

2013-01-01

Software applications developed for large-scale systems have always been difficult to develop due to problems caused by the large number of computing devices involved. Above a certain network size (roughly one hundred), necessary services such as code updating, topology discovery and data
Rapid Large Scale Reprocessing of the ODI Archive using the QuickReduce Pipeline

Science.gov (United States)

Gopu, A.; Kotulla, R.; Young, M. D.; Hayashi, S.; Harbeck, D.; Liu, W.; Henschel, R.

2015-09-01

The traditional model of astronomers collecting their observations as raw instrument data is increasingly being replaced by astronomical observatories serving standard calibrated data products to observers, and to the public at large once proprietary restrictions are lifted. For this model to be effective, observatories need the ability to periodically re-calibrate archival data products as improved master calibration products or pipeline improvements become available, and also to allow users to rapidly calibrate their data on the fly. Traditional astronomy pipelines are heavily I/O dependent and do not scale with increasing data volumes. In this paper, we present the One Degree Imager - Portal, Pipeline and Archive (ODI-PPA) calibration pipeline framework, which integrates the efficient and parallelized QuickReduce pipeline to enable a large number of simultaneous, parallel data reduction jobs, initiated by operators and/or users, while also ensuring rapid processing times and full data provenance. Our integrated pipeline system allows re-processing of the entire ODI archive (~15,000 raw science frames, ~3.0 TB compressed) within ~18 hours using twelve 32-core compute nodes on the Big Red II supercomputer. Our flexible, fast, easy-to-operate, and highly scalable framework improves access to ODI data, in particular when data rates double with an upgraded focal plane (scheduled for 2015), and it also serves as a template for future data processing infrastructure across the astronomical community and beyond.
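The quoted figures imply the following rough throughput. The arithmetic below simply rearranges the numbers in the abstract, treating the "~" values as exact, using decimal terabytes, and assuming all twelve 32-core nodes were allocated for the full ~18 hours; the per-frame figure is therefore an average of allocated core time, not a measured per-frame cost.

```python
frames = 15_000          # raw science frames (approximate)
data_tb = 3.0            # compressed archive size, TB (approximate, decimal units)
hours = 18.0             # wall-clock time (approximate)
nodes, cores_per_node = 12, 32

total_cores = nodes * cores_per_node                 # 384 cores
frames_per_hour = frames / hours                     # ~833 frames per hour
gb_per_hour = data_tb * 1000 / hours                 # ~167 GB per hour
core_hours = total_cores * hours                     # 6,912 core-hours allocated
core_seconds_per_frame = core_hours * 3600 / frames  # ~1,659 core-seconds per frame

print(total_cores, round(frames_per_hour), round(gb_per_hour), round(core_seconds_per_frame))
# -> 384 833 167 1659
```

Roughly 28 core-minutes of allocated time per frame is an upper bound on compute per frame, since some of that time is spent on I/O, scheduling and idle cores.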
Fires in large scale ventilation systems

International Nuclear Information System (INIS)

Gregory, W.S.; Martin, R.A.; White, B.W.; Nichols, B.D.; Smith, P.R.; Leslie, I.H.; Fenton, D.L.; Gunaji, M.V.; Blythe, J.P.

1991-01-01

This paper summarizes the experience gained simulating fires in large scale ventilation systems patterned after ventilation systems found in nuclear fuel cycle facilities. The series of experiments discussed included: (1) combustion aerosol loading of 0.61 × 0.61 m HEPA filters with the combustion products of two organic fuels, polystyrene and polymethyl methacrylate; (2) gas dynamics and heat transport through a large scale ventilation system consisting of a 0.61 × 0.61 m duct 90 m in length, with dampers, HEPA filters, blowers, etc.; (3) gas dynamics and simultaneous transport of heat and solid particulate (glass beads with a mean aerodynamic diameter of 10 μm) through the large scale ventilation system; and (4) the transport of heat and soot, generated by kerosene pool fires, through the large scale ventilation system. The FIRAC computer code, designed to predict fire-induced transients in nuclear fuel cycle facility ventilation systems, was used to predict the results of experiments (2) through (4). In general, the results of the predictions were satisfactory. The code predictions for the gas dynamics, heat transport, and particulate transport and deposition were within 10% of the experimentally measured values. However, the code was less successful in predicting the amount of soot generated by kerosene pool fires, probably because the fire module of the code is a one-dimensional zone model. The experiments revealed a complicated three-dimensional combustion pattern within the fire room of the ventilation system. Further refinement of the fire module within FIRAC is needed. (orig.)
Large-scale Complex IT Systems

OpenAIRE

Sommerville, Ian; Cliff, Dave; Calinescu, Radu; Keen, Justin; Kelly, Tim; Kwiatkowska, Marta; McDermid, John; Paige, Richard

2011-01-01

This paper explores the issues around the construction of large-scale complex systems which are built as 'systems of systems' and suggests that there are fundamental reasons, derived from the inherent complexity in these systems, why our current software engineering methods and techniques cannot be scaled up to cope with the engineering challenges of constructing such systems. It then goes on to propose a research and education agenda for software engineering that identifies the major challen...
Large-scale complex IT systems

OpenAIRE

Sommerville, Ian; Cliff, Dave; Calinescu, Radu; Keen, Justin; Kelly, Tim; Kwiatkowska, Marta; McDermid, John; Paige, Richard

2012-01-01

12 pages, 2 figures This paper explores the issues around the construction of large-scale complex systems which are built as 'systems of systems' and suggests that there are fundamental reasons, derived from the inherent complexity in these systems, why our current software engineering methods and techniques cannot be scaled up to cope with the engineering challenges of constructing such systems. It then goes on to propose a research and education agenda for software engineering that ident...
First Mile Challenges for Large-Scale IoT

KAUST Repository

Bader, Ahmed; Elsawy, Hesham; Gharbieh, Mohammad; Alouini, Mohamed-Slim; Adinoyi, Abdulkareem; Alshaalan, Furaih

2017-01-01

The Internet of Things is large-scale by nature. This is not only manifested by the large number of connected devices, but also by the sheer scale of spatial traffic intensity that must be accommodated, primarily in the uplink direction. To that end
MARVIN: Distributed reasoning over large-scale Semantic Web data

NARCIS (Netherlands)

Oren, E.; Kotoulas, S.; Anadiotis, G.; Siebes, R.M.; ten Teije, A.C.M.; van Harmelen, F.A.H.

2009-01-01

Many Semantic Web problems are difficult to solve through common divide-and-conquer strategies, since they are hard to partition. We present Marvin, a parallel and distributed platform for processing large amounts of RDF data, on a network of loosely coupled peers. We present our divide-conquer-swap
Intelligent spatial ecosystem modeling using parallel processors

International Nuclear Information System (INIS)

Maxwell, T.; Costanza, R.

1993-01-01

Spatial modeling of ecosystems is essential if one's modeling goals include developing a relatively realistic description of past behavior and predictions of the impacts of alternative management policies on future ecosystem behavior. Development of these models has been limited in the past by the large amount of input data required and the difficulty of even large mainframe serial computers in dealing with large spatial arrays. These two limitations have begun to erode with the increasing availability of remote sensing data and GIS systems to manipulate it, and the development of parallel computer systems which allow computation of large, complex, spatial arrays. Although many forms of dynamic spatial modeling are highly amenable to parallel processing, the primary focus in this project is on process-based landscape models. These models simulate spatial structure by first compartmentalizing the landscape into some geometric design and then describing flows within compartments and spatial processes between compartments according to location-specific algorithms. The authors are currently building and running parallel spatial models at the regional scale for the Patuxent River region in Maryland, the Everglades in Florida, and Barataria Basin in Louisiana. The authors are also planning a project to construct a series of spatially explicit linked ecological and economic simulation models aimed at assessing the long-term potential impacts of global climate change
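The compartment structure described above (local processes within each cell plus exchanges between neighbouring cells) maps naturally onto array operations, which is what makes such models amenable to parallel hardware. The NumPy sketch below is a hypothetical, much-simplified stand-in: a single stored quantity per cell that is lost locally at rate k and exchanged with the four neighbouring cells at rate d, with no flux across the outer edges. The rates and grid are made up and do not correspond to the Patuxent, Everglades or Barataria models.

```python
import numpy as np


def step(storage, k=0.05, d=0.1, dt=1.0):
    """Advance one explicit time step of a toy cell-based landscape model.

    storage: 2D array, one compartment per grid cell.
    Local process: first-order loss at rate k within each cell.
    Spatial process: exchange with the 4 neighbours at rate d;
    edge padding makes the flux across the outer boundary zero.
    Stability of this explicit scheme requires roughly 4 * d * dt <= 1.
    """
    padded = np.pad(storage, 1, mode="edge")
    neighbour_sum = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                     padded[1:-1, :-2] + padded[1:-1, 2:])
    exchange = d * (neighbour_sum - 4.0 * storage)  # net inflow from neighbours
    local_loss = k * storage
    return storage + dt * (exchange - local_loss)


if __name__ == "__main__":
    grid = np.zeros((5, 5))
    grid[2, 2] = 100.0          # a single source cell
    for _ in range(10):
        grid = step(grid)
    print(grid.round(2))
```

Every cell's update depends only on its own value and its neighbours' values from the previous step, so a grid can be split into tiles processed on different processors, with only the tile borders exchanged each step.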
A scalable PC-based parallel computer for lattice QCD

International Nuclear Information System (INIS)

Fodor, Z.; Katz, S.D.; Papp, G.

2003-01-01

A PC-based parallel computer for medium/large-scale lattice QCD simulations is suggested. The Eoetvoes Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7 GHz nodes. Gigabit Ethernet cards are used for nearest-neighbor communication in a two-dimensional mesh. The sustained performance for dynamical staggered (Wilson) quarks on large lattices is around 70 (110) GFlops. The exceptional price/performance ratio is below $1/Mflop.
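Nearest-neighbour communication on a two-dimensional mesh comes down to each node knowing the ranks of its four neighbours. The sketch below assumes a periodic Px × Py layout with row-major rank numbering; both are hypothetical choices, since the abstract specifies neither the rank mapping nor whether the mesh wraps around. It is not the cluster's actual communication code.

```python
def mesh_coords(rank, px):
    """Row-major mapping from rank to (x, y) on a mesh that is px nodes wide."""
    return rank % px, rank // px


def mesh_rank(x, y, px, py):
    """Inverse mapping; the modulo gives periodic wrap-around at the edges."""
    return (y % py) * px + (x % px)


def neighbours(rank, px, py):
    """Ranks of the +x, -x, +y and -y neighbours on a periodic 2D mesh."""
    x, y = mesh_coords(rank, px)
    return {
        "+x": mesh_rank(x + 1, y, px, py),
        "-x": mesh_rank(x - 1, y, px, py),
        "+y": mesh_rank(x, y + 1, px, py),
        "-y": mesh_rank(x, y - 1, px, py),
    }


if __name__ == "__main__":
    # On a 4 x 4 mesh, node 0 talks to 1 and 3 in x, and to 4 and 12 in y.
    print(neighbours(0, 4, 4))
```

In a lattice simulation, each node would hold a sub-lattice and exchange its boundary sites with exactly these four partners every iteration, which is why a simple mesh of point-to-point Ethernet links can suffice.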
A scalable PC-based parallel computer for lattice QCD

International Nuclear Information System (INIS)

Fodor, Z.; Papp, G.

2002-09-01

A PC-based parallel computer for medium/large-scale lattice QCD simulations is suggested. The Eoetvoes Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7 GHz nodes. Gigabit Ethernet cards are used for nearest-neighbor communication in a two-dimensional mesh. The sustained performance for dynamical staggered (Wilson) quarks on large lattices is around 70 (110) GFlops. The exceptional price/performance ratio is below $1/Mflop. (orig.)
Parallel preconditioning techniques for sparse CG solvers

Energy Technology Data Exchange (ETDEWEB)

Basermann, A.; Reichel, B.; Schelthoff, C. [Central Institute for Applied Mathematics, Juelich (Germany)]

1996-12-31

Conjugate gradient (CG) methods for solving sparse systems of linear equations play an important role in numerical methods for solving discretized partial differential equations. The large size and poor conditioning of many technical or physical applications in this area call for efficient parallelization and preconditioning techniques for the CG method. In particular, for very ill-conditioned matrices, sophisticated preconditioners are necessary to obtain both acceptable convergence and accuracy of CG. Here, we investigate variants of polynomial and incomplete Cholesky preconditioners that markedly reduce the number of iterations compared with simple diagonally scaled CG and are shown to be well suited for massively parallel machines.
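The baseline that the preconditioners are compared against, diagonally scaled CG, is CG preconditioned with M = diag(A). The NumPy sketch below shows only that baseline; the polynomial and incomplete Cholesky variants studied in the paper are not reproduced, and the test matrix is a hypothetical symmetric positive definite example. Replacing `apply_preconditioner` with a stronger M^{-1} is where any iteration savings would come from.

```python
import numpy as np


def diagonally_scaled_cg(A, b, tol=1e-10, max_iter=1000):
    """Preconditioned CG with the Jacobi (diagonal) preconditioner M = diag(A).

    A must be symmetric positive definite. Returns (x, iterations).
    """
    inv_diag = 1.0 / np.diag(A)

    def apply_preconditioner(r):
        return inv_diag * r  # z = M^{-1} r, purely element-wise (trivially parallel)

    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_preconditioner(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_preconditioner(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    n = 50
    # Tridiagonal SPD matrix with a varying diagonal (hypothetical example).
    A = np.diag(np.arange(1, n + 1, dtype=float) + 2.0) - np.eye(n, k=1) - np.eye(n, k=-1)
    x_true = np.linspace(0.0, 1.0, n)
    x, iterations = diagonally_scaled_cg(A, A @ x_true)
    print(iterations, np.max(np.abs(x - x_true)))
```

Diagonal scaling parallelizes perfectly but does little for strongly ill-conditioned matrices; polynomial preconditioners keep the parallelism (they need only matrix–vector products), while incomplete Cholesky is usually stronger but involves triangular solves that are harder to parallelize.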
Large scale access tests and online interfaces to ATLAS conditions databases

International Nuclear Information System (INIS)

Amorim, A; Lopes, L; Pereira, P; Simoes, J; Soloviev, I; Burckhart, D; Schmitt, J V D; Caprini, M; Kolos, S

2008-01-01

The access of the ATLAS Trigger and Data Acquisition (TDAQ) system to the ATLAS Conditions Databases places strong reliability and performance requirements on the database storage and access infrastructures. Several applications were developed to support the integration of Conditions database access with the online services in TDAQ, including the interface to the Information Services (IS) and to the TDAQ Configuration Databases. The information storage requirements were the motivation for the ONline ASynchronous Interface to COOL (ONASIC) from the Information Service (IS) to LCG/COOL databases. ONASIC avoids possible backpressure from online database servers by managing a local cache. In parallel, OKS2COOL was developed to store Configuration Databases into an offline database with a history record. The DBStressor application was developed to test and stress the access to the Conditions database using the LCG/COOL interface while operating in an integrated way as a TDAQ application. The performance scaling of simultaneous Conditions database read accesses was studied in the context of the ATLAS High Level Trigger large computing farms. A large set of tests was performed involving up to 1000 computing nodes that simultaneously accessed the LCG central database server infrastructure at CERN.
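ONASIC is described as avoiding backpressure by buffering updates in a local cache, so that the producer never waits on the database. The producer/consumer sketch below illustrates that idea. The `AsyncCachedWriter` class and the `slow_store` callback are hypothetical stand-ins, not the ONASIC, IS or COOL APIs, and a real implementation would also need bounded memory and persistence of the cache across crashes.

```python
import threading
import time
from collections import deque


class AsyncCachedWriter:
    """Accepts updates immediately; a background thread flushes them to a slow store."""

    def __init__(self, slow_store, flush_interval=0.01):
        self._cache = deque()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._store = slow_store          # callable taking a list of items
        self._interval = flush_interval
        self._thread = threading.Thread(target=self._flush_loop, daemon=True)
        self._thread.start()

    def publish(self, item):
        # Never waits on the database: only appends to the local cache.
        with self._lock:
            self._cache.append(item)

    def _drain(self):
        with self._lock:
            batch = list(self._cache)
            self._cache.clear()
        if batch:
            self._store(batch)            # the only call that touches the slow backend

    def _flush_loop(self):
        while not self._stop.is_set():
            self._drain()
            time.sleep(self._interval)

    def close(self):
        """Stop the background thread and flush whatever is still cached."""
        self._stop.set()
        self._thread.join()
        self._drain()


if __name__ == "__main__":
    written = []

    def slow_store(batch):
        time.sleep(0.05)                  # simulated slow database write
        written.extend(batch)

    writer = AsyncCachedWriter(slow_store)
    start = time.perf_counter()
    for i in range(100):
        writer.publish(i)
    print(f"publishing took {time.perf_counter() - start:.4f} s")
    writer.close()
    print(len(written))
```

The producer's cost is one lock and one append per update, independent of database latency; the slow store only sees batched writes, so a backlog on the server side grows the cache instead of stalling the online system.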
Parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada

International Nuclear Information System (INIS)

Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.

2001-01-01

This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented in the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using a three-dimensional numerical model with over one million cells, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce higher-resolution results and reveal some flow patterns that cannot be obtained using coarse-grid models.
Prospects for large scale electricity storage in Denmark

DEFF Research Database (Denmark)

Krog Ekman, Claus; Jensen, Søren Højgaard

2010-01-01

In future power systems with additional wind power capacity, there will be an increased need for large scale power management as well as reliable balancing and reserve capabilities. Different technologies for large scale electricity storage provide solutions to the different challenges arising w...
Multi-format all-optical processing based on a large-scale, hybridly integrated photonic circuit.

Science.gov (United States)

Bougioukos, M; Kouloumentas, Ch; Spyropoulou, M; Giannoulis, G; Kalavrouziotis, D; Maziotis, A; Bakopoulos, P; Harmon, R; Rogers, D; Harrison, J; Poustie, A; Maxwell, G; Avramopoulos, H

2011-06-06

We investigate through numerical studies and experiments the performance of a large scale, silica-on-silicon photonic integrated circuit for multi-format regeneration and wavelength-conversion. The circuit encompasses a monolithically integrated array of four SOAs inside two parallel Mach-Zehnder structures, four delay interferometers and a large number of silica waveguides and couplers. Exploiting phase-incoherent techniques, the circuit is capable of processing OOK signals at variable bit rates, DPSK signals at 22 or 44 Gb/s and DQPSK signals at 44 Gbaud. Simulation studies reveal the wavelength-conversion potential of the circuit with enhanced regenerative capabilities for OOK and DPSK modulation formats and acceptable quality degradation for DQPSK format. Regeneration of 22 Gb/s OOK signals with amplified spontaneous emission (ASE) noise and DPSK data signals degraded with amplitude, phase and ASE noise is experimentally validated demonstrating a power penalty improvement up to 1.5 dB.
KINETIC ALFVÉN WAVE GENERATION BY LARGE-SCALE PHASE MIXING

International Nuclear Information System (INIS)

Vásconez, C. L.; Pucci, F.; Valentini, F.; Servidio, S.; Malara, F.; Matthaeus, W. H.

2015-01-01

One view of the solar wind turbulence is that the observed highly anisotropic fluctuations at spatial scales near the proton inertial length d_p may be considered as kinetic Alfvén waves (KAWs). In the present paper, we show how phase mixing of large-scale parallel-propagating Alfvén waves is an efficient mechanism for the production of KAWs at wavelengths close to d_p and at a large propagation angle with respect to the magnetic field. Magnetohydrodynamic (MHD), Hall magnetohydrodynamic (HMHD), and hybrid Vlasov–Maxwell (HVM) simulations modeling the propagation of Alfvén waves in inhomogeneous plasmas are performed. In the linear regime, the role of dispersive effects is singled out by comparing MHD and HMHD results. Fluctuations produced by phase mixing are identified as KAWs through a comparison of the polarization of magnetic fluctuations and the wave-group velocity with analytical linear predictions. In the nonlinear regime, a comparison of HMHD and HVM simulations allows us to point out the role of kinetic effects in shaping the proton-distribution function. We observe the generation of temperature anisotropy with respect to the local magnetic field and the production of field-aligned beams. The regions where the proton-distribution function departs strongly from thermal equilibrium are located inside the shear layers, where the KAWs are excited, suggesting that the distortions of the proton distribution are driven by a resonant interaction of protons with KAW fluctuations. Our results are relevant in configurations where magnetic-field inhomogeneities are present, as, for example, in the solar corona, where the presence of Alfvén waves has been ascertained.
KINETIC ALFVÉN WAVE GENERATION BY LARGE-SCALE PHASE MIXING

Energy Technology Data Exchange (ETDEWEB)

Vásconez, C. L.; Pucci, F.; Valentini, F.; Servidio, S.; Malara, F. [Dipartimento di Fisica, Università della Calabria, I-87036, Rende (CS) (Italy); Matthaeus, W. H. [Department of Physics and Astronomy, University of Delaware, DE 19716 (United States)]

2015-12-10

One view of the solar wind turbulence is that the observed highly anisotropic fluctuations at spatial scales near the proton inertial length d_p may be considered as kinetic Alfvén waves (KAWs). In the present paper, we show how phase mixing of large-scale parallel-propagating Alfvén waves is an efficient mechanism for the production of KAWs at wavelengths close to d_p and at a large propagation angle with respect to the magnetic field. Magnetohydrodynamic (MHD), Hall magnetohydrodynamic (HMHD), and hybrid Vlasov–Maxwell (HVM) simulations modeling the propagation of Alfvén waves in inhomogeneous plasmas are performed. In the linear regime, the role of dispersive effects is singled out by comparing MHD and HMHD results. Fluctuations produced by phase mixing are identified as KAWs through a comparison of polarization of magnetic fluctuations and wave-group velocity with analytical linear predictions. In the nonlinear regime, a comparison of HMHD and HVM simulations allows us to point out the role of kinetic effects in shaping the proton-distribution function. We observe the generation of temperature anisotropy with respect to the local magnetic field and the production of field-aligned beams. The regions where the proton-distribution function departs strongly from thermal equilibrium are located inside the shear layers, where the KAWs are excited, which suggests that the distortions of the proton distribution are driven by a resonant interaction of protons with KAW fluctuations. Our results are relevant in configurations where magnetic-field inhomogeneities are present, as, for example, in the solar corona, where the presence of Alfvén waves has been ascertained.
Evolution of scaling emergence in large-scale spatial epidemic spreading.

Science.gov (United States)

Wang, Lin; Li, Xiang; Zhang, Yi-Qing; Zhang, Yan; Zhang, Kan

2011-01-01

Zipf's law and Heaps' law are two representative scaling laws that play a significant role in the study of complexity science. Their coexistence has motivated different interpretations of how the two scalings depend on each other, a question that has still hardly been clarified. In this article, we observe how the scalings evolve: Zipf's law and Heaps' law naturally coexist at early times; a crossover follows at later times as the two become inconsistent, before the system reaches a stable state in which Heaps' law persists while strict Zipf's law disappears. These findings are illustrated with a scenario of large-scale spatial epidemic spreading, and the empirical results for pandemic disease support a universal analysis of the relation between the two laws regardless of the biological details of the disease. Using United States domestic air transportation and demographic data to construct a metapopulation model that simulates pandemic spread at the U.S. national level, we find that the broad heterogeneity of the infrastructure plays a key role in the evolution of scaling emergence. The analyses of large-scale spatial epidemic spreading help explain the temporal evolution of the scalings, indicating that the coexistence of Zipf's law and Heaps' law depends on the collective dynamics of epidemic processes, and the heterogeneity of epidemic spread indicates the importance of targeted containment strategies early in a pandemic.
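To make the two scaling laws concrete, the sketch below estimates both exponents from a toy event stream. The event list and function names are hypothetical illustrations, not the paper's metapopulation model or data: Zipf's law is checked on the rank-frequency curve, Heaps' law on the growth of the number of distinct items, and both exponents come from a least-squares fit in log-log space.

```python
import math
from collections import Counter


def heaps_curve(events):
    """Distinct items seen after each event; Heaps' law predicts growth ~ t**lam."""
    seen, curve = set(), []
    for item in events:
        seen.add(item)
        curve.append(len(seen))
    return curve


def zipf_frequencies(events):
    """Item frequencies in descending order; Zipf's law predicts f(rank) ~ rank**(-alpha)."""
    return sorted(Counter(events).values(), reverse=True)


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Hypothetical event stream: each entry is the location reporting a new case.
events = ["A", "B", "A", "C", "A", "B", "D", "A"]
freqs = zipf_frequencies(events)                         # [4, 2, 1, 1]
alpha = -loglog_slope(range(1, len(freqs) + 1), freqs)   # Zipf exponent estimate
curve = heaps_curve(events)                              # [1, 2, 2, 3, 3, 3, 4, 4]
lam = loglog_slope(range(1, len(curve) + 1), curve)      # Heaps exponent estimate
```

With real outbreak data one would fit only the scaling region of each curve rather than every point, since the crossover described above bends the curves at later times.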
Efficient parallel simulation of CO2 geologic sequestration in saline aquifers

International Nuclear Information System (INIS)

Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten

2007-01-01

An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling of single- and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, and salt precipitation or dissolution. The new parallel simulator uses MPI for the parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations on multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger-scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.
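Speedup claims like the one above are usually read through two ratios: speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p, where super-linear speedup means E_p > 1 (often attributed to each subdomain fitting into faster memory once the problem is split). A minimal sketch, using made-up timings rather than the paper's measurements:

```python
import math


def scaling_report(t_serial, timings):
    """Speedup S_p = T_1 / T_p and efficiency E_p = S_p / p for each core count p."""
    report = {}
    for cores, t_parallel in sorted(timings.items()):
        speedup = t_serial / t_parallel
        efficiency = speedup / cores
        if math.isclose(efficiency, 1.0, rel_tol=1e-3):
            kind = "linear"
        elif efficiency > 1.0:
            kind = "super-linear"
        else:
            kind = "sub-linear"
        report[cores] = (speedup, efficiency, kind)
    return report


# Hypothetical wall-clock times in seconds (not results from the paper).
report = scaling_report(1200.0, {8: 150.0, 16: 70.0, 32: 60.0})
for cores, (speedup, efficiency, kind) in report.items():
    print(f"{cores:3d} cores: speedup {speedup:6.2f}, efficiency {efficiency:5.2f} ({kind})")
```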

Large-Scale Structure and Hyperuniformity of Amorphous Ices

Science.gov (United States)

Martelli, Fausto; Torquato, Salvatore; Giovambattista, Nicolas; Car, Roberto

2017-09-01

We investigate the large-scale structure of amorphous ices and transitions between their different forms by quantifying their large-scale density fluctuations. Specifically, we simulate the isothermal compression of low-density amorphous ice (LDA) and hexagonal ice to produce high-density amorphous ice (HDA). Both HDA and LDA are nearly hyperuniform; i.e., they are characterized by an anomalous suppression of large-scale density fluctuations. By contrast, in correspondence with the nonequilibrium phase transitions to HDA, the presence of structural heterogeneities strongly suppresses the hyperuniformity and the system becomes hyposurficial (devoid of "surface-area fluctuations"). Our investigation challenges the largely accepted "frozen-liquid" picture, which views glasses as structurally arrested liquids. Beyond implications for water, our findings enrich our understanding of pressure-induced structural transformations in glasses.
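Hyperuniformity can be illustrated with a toy 1D calculation: for a hyperuniform point pattern the variance of the number of points in a window grows more slowly than the window size, whereas for an uncorrelated (Poisson-like) pattern it grows in proportion to it. The sketch below is an illustrative assumption, not the paper's 3D analysis of simulated ice configurations; it compares a perfect lattice with uniformly random points.

```python
import numpy as np


def number_variance(points, window, domain, n_windows=4000, seed=0):
    """Variance of the point count in randomly placed 1D windows [s, s + window)."""
    rng = np.random.default_rng(seed)
    pts = np.sort(np.asarray(points, dtype=float))
    starts = rng.uniform(0.0, domain - window, size=n_windows)
    counts = np.searchsorted(pts, starts + window) - np.searchsorted(pts, starts)
    return float(counts.var())


domain, window = 10_000.0, 50.5
lattice = np.arange(10_000, dtype=float)                               # ordered, hyperuniform
poisson = np.random.default_rng(1).uniform(0.0, domain, size=10_000)   # uncorrelated

var_lattice = number_variance(lattice, window, domain)   # bounded: counts are only 50 or 51
var_poisson = number_variance(poisson, window, domain)   # grows roughly like the window length
```

Repeating the calculation for several window sizes and plotting variance against window length would show the lattice curve staying flat while the random curve rises linearly.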
A Scalable Parallel PWTD-Accelerated SIE Solver for Analyzing Transient Scattering from Electrically Large Objects

KAUST Repository

Liu, Yang

2015-12-17

A scalable parallel plane-wave time-domain (PWTD) algorithm for efficient and accurate analysis of transient scattering from electrically large objects is presented. The algorithm produces scalable communication patterns on very large numbers of processors by leveraging two mechanisms: (i) a hierarchical parallelization strategy to evenly distribute the computation and memory loads at all levels of the PWTD tree among processors, and (ii) a novel asynchronous communication scheme to reduce the cost and memory requirement of the communications between the processors. The efficiency and accuracy of the algorithm are demonstrated through its applications to the analysis of transient scattering from a perfect electrically conducting (PEC) sphere with a diameter of 70 wavelengths and a PEC square plate with a dimension of 160 wavelengths. Furthermore, the proposed algorithm is used to analyze transient fields scattered from realistic airplane and helicopter models under high frequency excitation.
WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

Directory of Open Access Journals (Sweden)

Parichit Sharma

The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next-generation sequencing technologies, it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to a lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture
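mpiBLAST speeds up searches by segmenting the database so that each worker searches its own fragment. As a hypothetical illustration of balancing that work (mpiBLAST's own segmentation tool may split databases differently), the sketch below assigns sequences to fragments with the greedy longest-first heuristic so that each fragment holds a similar number of residues.

```python
import heapq


def balance_fragments(seq_lengths, n_fragments):
    """Greedy longest-first assignment of sequences to the currently lightest fragment."""
    heap = [(0, f) for f in range(n_fragments)]  # (total residues, fragment id)
    heapq.heapify(heap)
    fragments = [[] for _ in range(n_fragments)]
    # Visit sequences from longest to shortest (indices into seq_lengths).
    order = sorted(range(len(seq_lengths)), key=lambda i: seq_lengths[i], reverse=True)
    for i in order:
        load, f = heapq.heappop(heap)
        fragments[f].append(i)
        heapq.heappush(heap, (load + seq_lengths[i], f))
    return fragments


# Hypothetical database of six sequences (lengths in residues) split across two workers.
lengths = [500, 400, 300, 300, 200, 100]
fragments = balance_fragments(lengths, 2)
loads = [sum(lengths[i] for i in frag) for frag in fragments]
```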
WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

Science.gov (United States)

Sharma, Parichit; Mantri, Shrikant S

2014-01-01

The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design
Massively parallel sparse matrix function calculations with NTPoly

Science.gov (United States)

Dawson, William; Nakajima, Takahito

2018-04-01

We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well-developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization-free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication-avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is used to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large-scale calculations on the K computer.
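The polynomial-expansion idea can be sketched in a few lines: evaluate p(A) with Horner's rule and optionally drop small entries after each multiplication to keep iterates sparse. This uses dense NumPy arrays and a plain Taylor series for exp(A) as stand-ins; NTPoly's actual expansions, thresholding rules and distributed sparse multiplication are not reproduced here.

```python
import math

import numpy as np


def matrix_polynomial(a, coeffs, threshold=0.0):
    """Evaluate p(A) = sum_k coeffs[k] * A^k with Horner's rule.

    If threshold > 0, entries smaller in magnitude than threshold are dropped after
    each step, a simplified imitation of how sparse polynomial-expansion codes keep
    iterates sparse (at the cost of a controlled truncation error).
    """
    identity = np.eye(a.shape[0])
    result = coeffs[-1] * identity
    for c in reversed(coeffs[:-1]):
        result = result @ a + c * identity
        if threshold > 0.0:
            result[np.abs(result) < threshold] = 0.0
    return result


# Symmetric tridiagonal test matrix and a truncated Taylor series for exp(x).
a = np.array([[0.0, 0.1, 0.0],
              [0.1, 0.0, 0.1],
              [0.0, 0.1, 0.0]])
taylor_exp = [1.0 / math.factorial(k) for k in range(15)]
approx = matrix_polynomial(a, taylor_exp)

# Reference via eigendecomposition, feasible only for small dense matrices.
w, v = np.linalg.eigh(a)
exact = v @ np.diag(np.exp(w)) @ v.T
```

Each step costs one matrix multiplication, which is why the cost stays linear in matrix size when the iterates remain sparse.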
Community response to large-scale federal projects: the case of the MX

International Nuclear Information System (INIS)

Albrecht, S.L.

1983-01-01

An analysis of community response to large-scale defense projects, such as the proposals to site MX missiles in Utah and Nevada, is one way to identify those factors likely to be important in determining community response to nuclear waste repository siting. This chapter gives a brief overview of the MX system's characteristics and the potential impacts it would have had on the rural areas, describes the patterns of community mobilization that occurred in Utah and Nevada, and suggests where this response may parallel community concerns about a repository siting. Three lessons from the MX experience are that local residents, asked to assume a disproportionate share of the negative impacts, should be involved in the siting process, that local residents should be treated as equals, and that compensation should be offered when local residents suffer from political expediency.
A Stream Tilling Approach to Surface Area Estimation for Large Scale Spatial Data in a Shared Memory System

Directory of Open Access Journals (Sweden)

Liu Jiping

2017-12-01

Surface area estimation is a widely used tool for resource evaluation in the physical world. When processing large-scale spatial data, input/output (I/O) can easily become the bottleneck in parallelizing the algorithm because of limited physical memory and very slow disk transfer rates. In this paper, we propose a stream tilling approach to surface area estimation that first decomposes a spatial data set into tiles with topological expansions. With these tiles, the one-to-one mapping between the input and the computing process is broken. We then build a streaming framework for scheduling the I/O processes and computing units. Each computing unit encapsulates an identical copy of the estimation algorithm, and multiple asynchronous computing units can work independently in parallel. Finally, experiments demonstrate that the stream tilling estimation can efficiently relieve the heavy pressure of I/O-bound work, and that the speedup of the optimized version greatly outperforms that of directly parallelized versions on shared-memory systems with multi-core processors.
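A minimal sketch of the tiling idea, assuming a regular DEM grid whose cells are each split into two triangles, and interpreting the "topological expansion" as one boundary row shared between neighboring tiles. Threads from `concurrent.futures` stand in for the asynchronous computing units; the paper's I/O streaming and scheduling are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def _triangle_areas(p, q, r):
    """Areas of triangles whose vertices are stored in arrays of shape (..., 3)."""
    return 0.5 * np.linalg.norm(np.cross(q - p, r - p), axis=-1)


def surface_area(z, dx=1.0):
    """Surface area of a gridded DEM, splitting every grid cell into two triangles."""
    rows, cols = z.shape
    y, x = np.meshgrid(np.arange(rows) * dx, np.arange(cols) * dx, indexing="ij")
    pts = np.stack([x, y, z], axis=-1)
    a, b = pts[:-1, :-1], pts[:-1, 1:]
    c, d = pts[1:, :-1], pts[1:, 1:]
    return float(np.sum(_triangle_areas(a, b, d) + _triangle_areas(a, d, c)))


def tiled_surface_area(z, tile_rows, dx=1.0, workers=4):
    """Process row tiles that each carry one extra boundary row, then sum the partial areas.

    The shared boundary row acts as the overlap that keeps every cell in exactly
    one tile, so no cell is lost or counted twice when the grid is cut apart.
    """
    last = z.shape[0] - 1
    tiles = [z[s:min(s + tile_rows, last) + 1] for s in range(0, last, tile_rows)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda tile: surface_area(tile, dx), tiles))


dem = np.random.default_rng(0).random((37, 23))   # hypothetical elevation grid
whole = surface_area(dem)
tiled = tiled_surface_area(dem, tile_rows=8)
```

Because each tile only needs its own rows plus one boundary row, tiles can be read from disk and processed independently, which is what breaks the one-to-one link between input and computation.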
Double inflation: A possible resolution of the large-scale structure problem

International Nuclear Information System (INIS)

Turner, M.S.; Villumsen, J.V.; Vittorio, N.; Silk, J.; Juszkiewicz, R.

1986-11-01

A model is presented for the large-scale structure of the universe in which two successive inflationary phases resulted in large small-scale and small large-scale density fluctuations. This bimodal density fluctuation spectrum in an Ω = 1 universe dominated by hot dark matter leads to large-scale structure of the galaxy distribution that is consistent with recent observational results. In particular, large, nearly empty voids and significant large-scale peculiar velocity fields are produced over scales of ∼100 Mpc, while the small-scale structure over ≤ 10 Mpc resembles that in a low density universe, as observed. Detailed analytical calculations and numerical simulations are given of the spatial and velocity correlations. 38 refs., 6 figs
Large-scale fracture mechanics testing -- requirements and possibilities

International Nuclear Information System (INIS)

Brumovsky, M.

1993-01-01

Application of fracture mechanics to very important and/or complicated structures, such as reactor pressure vessels, also raises questions about the reliability and precision of such calculations. These problems become more pronounced under elastic-plastic loading conditions and/or in parts with non-homogeneous materials (base metal and austenitic cladding, property gradients through the material thickness) or with non-homogeneous stress fields (nozzles, bolt threads, residual stresses etc.). For such special cases, verification by large-scale testing is necessary and valuable. This paper discusses problems connected with planning such experiments with respect to their limitations and the requirements for a good transfer of the results obtained to an actual vessel. At the same time, an analysis of the possibilities of small-scale model experiments is also shown, mostly in connection with transferring results between standard, small-scale and large-scale experiments. Experience from 30 years of large-scale testing at SKODA is used as an example to support this analysis. 1 fig
Ethics of large-scale change

DEFF Research Database (Denmark)

Arler, Finn

2006-01-01

, which kind of attitude is appropriate when dealing with large-scale changes like these from an ethical point of view. Three kinds of approaches are discussed: Aldo Leopold's mountain thinking, the neoclassical economists' approach, and finally the so-called Concentric Circle Theories approach...
A model for optimizing file access patterns using spatio-temporal parallelism

Energy Technology Data Exchange (ETDEWEB)

Boonthanome, Nouanesengsy [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Patchett, John [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Geveci, Berk [Kitware Inc., Clifton Park, NY (United States); Ahrens, James [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Bauer, Andy [Kitware Inc., Clifton Park, NY (United States); Chaudhary, Aashish [Kitware Inc., Clifton Park, NY (United States); Miller, Ross G. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Shipman, Galen M. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Williams, Dean N. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2013-01-01

For many years now, I/O read time has been recognized as the primary bottleneck for parallel visualization and analysis of large-scale data. In this paper, we introduce a model that can estimate the read time for a file stored in a parallel filesystem when given the file access pattern. Read times ultimately depend on how the file is stored and the access pattern used to read the file. The file access pattern will be dictated by the type of parallel decomposition used. We employ spatio-temporal parallelism, which combines both spatial and temporal parallelism, to provide greater flexibility to possible file access patterns. Using our model, we were able to configure the spatio-temporal parallelism to design optimized read access patterns that resulted in a speedup factor of approximately 400 over traditional file access patterns.
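Spatio-temporal parallelism can be pictured as splitting the ranks into groups: each group handles a subset of timesteps (temporal parallelism), and within a group the spatial blocks of a timestep are spread across ranks (spatial parallelism). The function below is a hypothetical sketch of such an assignment; it does not reproduce the paper's read-time model.

```python
from itertools import product


def spatio_temporal_assignment(n_ranks, n_timesteps, n_blocks, n_time_groups):
    """Map every (timestep, spatial block) work item to a rank id."""
    if n_ranks % n_time_groups:
        raise ValueError("n_ranks must be divisible by n_time_groups")
    group_size = n_ranks // n_time_groups
    assignment = {}
    for t, b in product(range(n_timesteps), range(n_blocks)):
        group = t % n_time_groups          # temporal parallelism: timesteps to groups
        rank_in_group = b % group_size     # spatial parallelism: blocks within a group
        assignment[(t, b)] = group * group_size + rank_in_group
    return assignment


# Hypothetical run: 8 ranks, 4 timesteps, 6 spatial blocks, 2 temporal groups.
plan = spatio_temporal_assignment(8, 4, 6, 2)
```

Setting n_time_groups=1 gives a purely spatial decomposition and n_time_groups=n_ranks a purely temporal one; intermediate values change which parts of which files each rank reads, which is the kind of configuration choice the paper's model is used to guide.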
Algorithm and Application of Gcp-Independent Block Adjustment for Super Large-Scale Domestic High Resolution Optical Satellite Imagery

Science.gov (United States)

Sun, Y. S.; Zhang, L.; Xu, B.; Zhang, Y.

2018-04-01

Accurate positioning of optical satellite images without ground control is a precondition for remote sensing applications and small/medium-scale mapping of large areas abroad or with large-scale imagery. In this paper, targeting the geometric features of optical satellite images, and building on the Alternating Direction Method of Multipliers (ADMM), a widely used optimization method for constrained problems, and on RFM least-squares block adjustment, we propose a GCP-independent block adjustment method for large-scale domestic high-resolution optical satellite imagery, GISIBA (GCP-Independent Satellite Imagery Block Adjustment), which is easy to parallelize and highly efficient. In this method, virtual "average" control points are constructed to resolve the rank-defect problem and to support qualitative and quantitative analysis of block adjustment without control. The test results show that the horizontal and vertical accuracies of multi-covered and multi-temporal satellite images are better than 10 m and 6 m, respectively. Meanwhile, the mosaic problem of adjacent areas in large-area DOM production can be solved if public geographic information data are introduced as horizontal and vertical constraints in the block adjustment process. Finally, experiments with GF-1 and ZY-3 satellite images over several typical test areas are used to present and examine the reliability, accuracy and performance of the developed procedure.
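ADMM parallelizes well because it splits a large least-squares problem into blocks that are solved independently and then reconciled. The sketch below is a generic consensus-ADMM solver for a linear least-squares problem; the block structure, penalty rho and iteration count are illustrative assumptions, and the paper's RFM parameterization and virtual control points are not modeled.

```python
import numpy as np


def consensus_admm_lstsq(blocks, rho=10.0, iters=1000):
    """Solve min_x sum_i 0.5 * ||A_i x - b_i||^2 with consensus ADMM.

    Each block i keeps a local copy x_i. The x_i updates are independent of one
    another (the parallelizable step); the z update averages them so that all
    local copies are driven to agree on a single solution.
    """
    n = blocks[0][0].shape[1]
    z = np.zeros(n)
    us = [np.zeros(n) for _ in blocks]
    # Local system matrices and A_i^T b_i terms are fixed across iterations.
    lhs = [a.T @ a + rho * np.eye(n) for a, _ in blocks]
    atb = [a.T @ b for a, b in blocks]
    for _ in range(iters):
        xs = [np.linalg.solve(m, r + rho * (z - u)) for m, r, u in zip(lhs, atb, us)]
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        us = [u + x - z for x, u in zip(xs, us)]
    return z


rng = np.random.default_rng(0)
blocks = [(rng.standard_normal((20, 3)), rng.standard_normal(20)) for _ in range(4)]
z = consensus_admm_lstsq(blocks)
reference = np.linalg.lstsq(np.vstack([a for a, _ in blocks]),
                            np.concatenate([b for _, b in blocks]), rcond=None)[0]
```

At convergence every local copy equals z and the scaled dual variables sum to zero, so z satisfies the normal equations of the full stacked problem, which is why it matches the direct least-squares reference.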
Coexistence and conflict: IWRM and large-scale water infrastructure development in Piura, Peru

Directory of Open Access Journals (Sweden)

Megan Mills-Novoa

2017-06-01

Despite the emphasis of Integrated Water Resources Management (IWRM) on 'soft' demand-side management, large-scale water infrastructure is increasingly being constructed in basins managed under an IWRM framework. While there has been substantial research on IWRM, few scholars have unpacked how IWRM and large-scale water infrastructure development coexist and conflict. Piura, Peru is an important site for understanding how IWRM and capital-intensive, concrete-heavy water infrastructure development articulate in practice. After 70 years of proposals and planning, the Regional Government of Piura began construction of the mega-irrigation project, Proyecto Especial de Irrigación e Hidroeléctrico del Alto Piura (PEIHAP), in 2013. PEIHAP, which will irrigate an additional 19,000 hectares (ha), is being realised in the wake of major reforms in the Chira-Piura River Basin, a pilot basin for the IWRM-inspired 2009 Water Resources Law. We first map the historical trajectory of PEIHAP as it mirrors the shifting political priorities of the Peruvian state. We then draw on interviews with the newly formed River Basin Council, regional government, PEIHAP, and civil society actors to understand why and how these differing water management paradigms coexist. We find that while the 2009 Water Resources Law labels large-scale irrigation infrastructure as an 'exceptional measure', this development continues to eclipse IWRM provisions of the new law. This uneasy coexistence reflects the parallel desires of the state to imbue water policy reform with international credibility via IWRM while also furthering economic development goals via large-scale water infrastructure. While the participatory mechanisms and expertise of IWRM-inspired river basin councils have not been brought to bear on the approval and construction of PEIHAP, these institutions will play a crucial role in managing the myriad resource and social conflicts that are likely to result.
Comparison Between Overtopping Discharge in Small and Large Scale Models

DEFF Research Database (Denmark)

Helgason, Einar; Burcharth, Hans F.

2006-01-01

The present paper presents overtopping measurements from small-scale model tests performed at the Hydraulic & Coastal Engineering Laboratory, Aalborg University, Denmark, and large-scale model tests performed at the Large Wave Channel, Hannover, Germany. Comparison between results obtained from... small and large scale model tests shows no clear evidence of scale effects for overtopping above a threshold value. In the large scale model, no overtopping was measured for wave heights below Hs = 0.5 m, as the water sank into the voids between the stones on the crest. For low overtopping scale effects...
Bayesian Image Restoration Using a Large-Scale Total Patch Variation Prior

Directory of Open Access Journals (Sweden)

Yang Chen

2011-01-01

Edge-preserving Bayesian restorations using nonquadratic priors are often inefficient in restoring continuous variations and tend to produce block artifacts around edges in ill-posed inverse image restorations. To overcome this, we have proposed a spatial adaptive (SA) prior with improved performance. However, restoration with this SA prior suffers from high computational cost and unguaranteed convergence. To address these issues, this paper proposes a Large-scale Total Patch Variation (LS-TPV) prior model for Bayesian image restoration. In this model, the prior for each pixel is defined as a singleton conditional probability, which takes the form of a mixture of a patch similarity prior and a weight entropy prior. A joint MAP estimation is thus built to ensure the monotonicity of the iterations. The computational burden of the intensive patch-distance calculations is greatly alleviated by parallelization with the Compute Unified Device Architecture (CUDA). Experiments with both simulated and real data validate the good performance of the proposed restoration.
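The patch-distance step that the abstract offloads to CUDA can be sketched with vectorized NumPy in place of GPU kernels. The patch radius, search window and reflect padding are illustrative assumptions, and the LS-TPV weighting (the mixture of patch-similarity and weight-entropy priors) is not reproduced.

```python
import numpy as np


def patch_distances(img, radius=1, search=2):
    """Squared distances between each pixel's patch and the patches at nearby offsets.

    Returns (offsets, dist) where dist[k, i, j] is the sum of squared differences
    between the (2*radius+1)x(2*radius+1) patch centred at (i, j) and the patch
    centred at (i, j) + offsets[k]. Every (k, i, j) entry is independent of the
    others, which is why this step maps naturally onto GPU threads.
    """
    img = np.asarray(img, dtype=float)
    pad = radius + search
    p = np.pad(img, pad, mode="reflect")
    h, w = img.shape
    offsets = [(dy, dx) for dy in range(-search, search + 1)
               for dx in range(-search, search + 1)]
    dist = np.zeros((len(offsets), h, w))
    for k, (dy, dx) in enumerate(offsets):
        # shifted[i, j] == p[i + dy, j + dx]; wrap-around never reaches the cropped region.
        shifted = np.roll(p, shift=(-dy, -dx), axis=(0, 1))
        diff2 = (p - shifted) ** 2
        # Box-sum the squared differences over the patch window.
        for py in range(-radius, radius + 1):
            for px in range(-radius, radius + 1):
                dist[k] += diff2[pad + py:pad + py + h, pad + px:pad + px + w]
    return offsets, dist


ramp = np.tile(np.arange(8.0), (8, 1))   # intensity increases by 1 per column
offsets, dist = patch_distances(ramp)
```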
Xyce Parallel Electronic Simulator Users' Guide Version 6.8

Energy Technology Data Exchange (ETDEWEB)

Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aadithya, Karthik Venkatraman [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard L. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2017-10-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state of the art in the following areas: the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase (a message-passing parallel implementation), which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
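The benefit of a DAE formulation is easiest to see in a toy integrator. The sketch below assumes a scalar DAE of the form d q(x)/dt + f(x) + b(t) = 0, a common way to write circuit equations; the device model supplies only q, f and their derivatives, while backward Euler and Newton's method know nothing about the device. This is illustrative Python, not Xyce's code or API.

```python
def backward_euler(q, dq, f, df, b, x0, h, n_steps, newton_iters=20, tol=1e-12):
    """Integrate the scalar DAE  d q(x)/dt + f(x) + b(t) = 0  with backward Euler.

    Each step solves (q(x_new) - q(x_old)) / h + f(x_new) + b(t_new) = 0 for x_new
    by Newton's method. The integrator only calls q, f, b and the derivatives
    dq, df; it never needs to know which device produced them.
    """
    x, t = x0, 0.0
    for _ in range(n_steps):
        t += h
        x_new = x
        for _ in range(newton_iters):
            residual = (q(x_new) - q(x)) / h + f(x_new) + b(t)
            jacobian = dq(x_new) / h + df(x_new)
            step = residual / jacobian
            x_new -= step
            if abs(step) < tol:
                break
        x = x_new
    return x


# Hypothetical device model: a 1 uF capacitor discharging through 1 MOhm (RC = 1 s).
C, R = 1e-6, 1e6
v_end = backward_euler(q=lambda v: C * v, dq=lambda v: C,
                       f=lambda v: v / R, df=lambda v: 1.0 / R,
                       b=lambda t: 0.0, x0=1.0, h=1e-3, n_steps=1000)
```

For this RC example, one second of simulated time ends near exp(-1), about 0.368, the analytic discharge value; backward Euler slightly overestimates it at this step size. Swapping in a different device only means passing different q and f functions.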
The relationship between small-scale and large-scale ionospheric electron density irregularities generated by powerful HF electromagnetic waves at high latitudes

Directory of Open Access Journals (Sweden)

E. D. Tereshchenko

2006-11-01

Satellite radio beacons were used in June 2001 to probe the ionosphere modified by a radio beam produced by the EISCAT high-power, high-frequency (HF) transmitter located near Tromsø (Norway). Amplitude scintillations and variations of the phase of 150- and 400-MHz signals from Russian navigational satellites passing over the modified region were observed at three receiver sites. In several papers it has been stressed that in the polar ionosphere the thermal self-focusing on striations during ionospheric modification is the main mechanism resulting in the formation of large-scale (hundreds of meters to kilometers) nonlinear structures aligned along the geomagnetic field (magnetic zenith effect). It has also been claimed that the maximum effects caused by small-scale (tens of meters) irregularities detected in satellite signals are also observed in the direction parallel to the magnetic field. Contrary to those studies, the present paper shows that the maximum in amplitude scintillations does not correspond strictly to the magnetic zenith direction, because high-latitude drifts typically cause a considerable anisotropy of small-scale irregularities in a plane perpendicular to the geomagnetic field, resulting in a deviation of the amplitude-scintillation peak relative to the minimum angle between the line-of-sight to the satellite and the direction of the geomagnetic field lines. The variance of the logarithmic relative amplitude fluctuations, which is a useful quantity in such studies, is considered here. The experimental values of the variance are compared with model calculations, and good agreement is found. It is also shown from the experimental data that in most of the satellite passes a variance maximum occurs at a minimum in the phase fluctuations, indicating that the artificial excitation of large-scale irregularities is minimal when the excitation of small-scale irregularities is maximal.
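The variance of the logarithmic relative amplitude fluctuations can be computed directly from amplitude samples. A minimal sketch, assuming normalization by the sample mean over the analysis window (in practice a detrended or running mean over the satellite pass would typically be used; the paper's exact procedure is not stated here):

```python
import numpy as np


def log_amplitude_variance(amplitude):
    """Variance of chi = ln(A / <A>), the log relative amplitude fluctuation."""
    a = np.asarray(amplitude, dtype=float)
    chi = np.log(a / a.mean())
    return float(chi.var())


# Hypothetical amplitude samples from part of a satellite pass.
samples = [1.0, 1.1, 0.9, 1.05, 0.95]
sigma2 = log_amplitude_variance(samples)
```

Dividing by the mean makes the quantity independent of the absolute received signal level, so passes with different link budgets can be compared.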
Pore-Scale Simulation for Predicting Material Transport Through Porous Media

International Nuclear Information System (INIS)

Goichi Itoh; Jinya Nakamura; Koji Kono; Tadashi Watanabe; Hirotada Ohashi; Yu Chen; Shinya Nagasaki

2002-01-01

Microscopic models based on the real-coded lattice gas automata (RLG) method with a special boundary condition and on the lattice Boltzmann method (LBM) are developed for simulating three-dimensional fluid dynamics in complex geometry. These models enable us to simulate pore-scale fluid dynamics, which is essential for precisely predicting material transport in porous media. For large-scale, high-resolution simulation of porous media, the RLG and LBM programs are designed for parallel computation. Simulation results of porous media flow by the LBM under different pressure-gradient conditions show quantitative agreement with the macroscopic relations of Darcy's law and the Kozeny-Carman equation. To assess the efficiency of parallel computing, a standard parallel computation using MPI (Message Passing Interface) is compared with a hybrid parallel computation using the MPI-node parallel technique. The benchmark tests show that when a large number of computing nodes is used, parallel performance declines because data communication between nodes increases, and that the hybrid parallel computation performs better overall than the standard parallel computation. (authors)
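The two macroscopic checks mentioned above can be written down directly. Darcy's law gives the permeability from a measured flux and pressure drop, and the Kozeny-Carman equation, in its common form for packed spheres of diameter d, predicts it from porosity. A minimal sketch with hypothetical inputs, not values from the paper:

```python
def darcy_permeability(flux, viscosity, pressure_drop, length):
    """Permeability k (m^2) from Darcy's law q = (k / mu) * (dP / L), using magnitudes.

    flux: superficial (Darcy) velocity in m/s; viscosity in Pa*s;
    pressure_drop in Pa across a sample of the given length in m.
    """
    return flux * viscosity * length / pressure_drop


def kozeny_carman_permeability(porosity, grain_diameter):
    """Kozeny-Carman estimate k = phi^3 d^2 / (180 (1 - phi)^2) for packed spheres."""
    phi = porosity
    return phi**3 * grain_diameter**2 / (180.0 * (1.0 - phi) ** 2)


# Hypothetical inputs: water-like viscosity and 1 mm grains at 40 % porosity.
k_darcy = darcy_permeability(flux=1e-3, viscosity=1e-3, pressure_drop=1e4, length=0.1)
k_kc = kozeny_carman_permeability(porosity=0.4, grain_diameter=1e-3)
```

In a pore-scale validation, the flux would come from the simulated velocity field; Darcy's law holds if k_darcy stays constant as the imposed pressure gradient changes.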
Displacement and deformation measurement for large structures by camera network

Science.gov (United States)

Shang, Yang; Yu, Qifeng; Yang, Zhen; Xu, Zhiqiang; Zhang, Xiaohu

2014-03-01

A method for measuring the displacement and deformation of large structures with a series-parallel connection camera network is presented. Taking the dynamic monitoring of a large-scale crane during lifting operations as an example, a series-parallel connection camera network is designed, and the displacement and deformation measurement method based on this network is studied. The crane body moves over a small range, whereas the crane arm moves over a large range. The displacement of the crane body, the displacement of the crane arm relative to the body, and the deformation of the arm are measured. Compared with a pure series or pure parallel connection camera network, the designed series-parallel network can measure not only the movement and displacement of a large structure but also the relative movement and deformation of selected parts of it, using a relatively simple optical measurement system.
Hybrid parallel strategy for the simulation of fast transient accidental situations at reactor scale

International Nuclear Information System (INIS)

Faucher, V.; Galon, P.; Beccantini, A.; Crouzet, F.; Debaud, F.; Gautier, T.

2015-01-01

Highlights: • Reference accidental situations for current and future reactors are considered. • They require the modeling of complex fluid–structure systems at full reactor scale. • EPX software computes the non-linear transient solution with explicit time stepping. • Focus on the parallel hybrid solver specific to the proposed coupled equations. - Abstract: This contribution is dedicated to the latest methodological developments implemented in the fast transient dynamics software EUROPLEXUS (EPX) to simulate the mechanical response of fully coupled fluid–structure systems to accidental situations to be considered at reactor scale, among them the Loss of Coolant Accident, the Core Disruptive Accident, and the Hydrogen Explosion. Time integration is explicit, and the search for reference solutions within the safety framework rules out any simplification or approximation in the coupled algorithm: for instance, all kinematic constraints are handled with Lagrange multipliers, yielding a complex flow chart when non-permanent constraints such as unilateral contact or immersed fluid–structure boundaries are considered. The parallel acceleration of the solution process is then achieved through a hybrid approach, based on a weighted domain decomposition for distributed-memory computing and on the KAAPI library for self-balanced shared-memory processing inside subdomains.
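A weighted domain decomposition assigns elements to subdomains so that each subdomain carries a similar estimated cost rather than a similar element count. The sketch below is a deliberately simple one-dimensional version: elements are taken in a fixed order (assumed to follow the mesh, so subdomains stay contiguous) and cut into blocks of roughly equal total weight. The function names and weights are hypothetical; the abstract does not describe EPX's actual partitioner, and production codes typically use graph partitioners instead.

```python
from itertools import accumulate


def weighted_contiguous_partition(weights, n_parts):
    """Assign each element (in mesh order) to one of n_parts contiguous subdomains
    so that every subdomain carries roughly the same total weight."""
    total = float(sum(weights))
    if n_parts < 1 or total <= 0:
        raise ValueError("need n_parts >= 1 and a positive total weight")
    owners = []
    for w, end in zip(weights, accumulate(weights)):
        midpoint = end - 0.5 * w  # center of this element's weight interval
        owners.append(min(int(n_parts * midpoint / total), n_parts - 1))
    return owners


def subdomain_loads(weights, owners, n_parts):
    """Total weight assigned to each subdomain."""
    loads = [0.0] * n_parts
    for w, part in zip(weights, owners):
        loads[part] += w
    return loads


# Hypothetical costs: six cheap elements, two expensive coupled ones, six cheap ones.
weights = [1, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1, 1, 1]
owners = weighted_contiguous_partition(weights, 3)
print(owners)                               # [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2]
print(subdomain_loads(weights, owners, 3))  # [6.0, 6.0, 6.0]
```

The middle subdomain receives only two elements because they are expensive, which is the point of weighting the decomposition.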

Needs, opportunities, and options for large scale systems research

Energy Technology Data Exchange (ETDEWEB)

Thompson, G.L.

1984-10-01

The Office of Energy Research was recently asked to perform a study of Large Scale Systems in order to facilitate the development of a true large systems theory. It was decided to ask experts in the fields of electrical engineering, chemical engineering and manufacturing/operations research for their ideas concerning large scale systems research. The author was asked to distribute a questionnaire among these experts to find out their opinions concerning recent accomplishments and future research directions in large scale systems research. He was also requested to convene a conference which included three experts in each area as panel members to discuss the general area of large scale systems research. The conference was held on March 26--27, 1984 in Pittsburgh with nine panel members, and 15 other attendees. The present report is a summary of the ideas presented and the recommendations proposed by the attendees.
Parallel Object-Oriented Computation Applied to a Finite Element Problem

Directory of Open Access Journals (Sweden)

Jon B. Weissman

1993-01-01

The conventional wisdom in the scientific computing community is that the best way to solve large-scale, numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher-level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application, Mentat, and the implementation, and present performance results for both a Mentat version and a hand-coded parallel Fortran version.
Large-scale structure of the Universe

International Nuclear Information System (INIS)

Doroshkevich, A.G.

1978-01-01

The problems discussed at the "Large-scale Structure of the Universe" symposium are presented at a popular level. The cellular structure of the galaxy distribution in the Universe and the principles of mathematically modeling the galaxy distribution are described, and computer-processed images of the cell structures are given. Three hypotheses of the origin of galaxies and galaxy clusters (vortical, entropic, and adiabatic), each proposing different processes, are discussed, and a considerable advantage of the adiabatic hypothesis is recognized. The relic radiation is considered as a means of directly studying the processes taking place in the Universe: the large-scale peculiarities and small-scale fluctuations of the relic radiation temperature make it possible to estimate the properties of perturbations at the pre-galaxy stage. The discussion of problems pertaining to the hot gas contained in galaxy clusters, and to interactions within galaxy clusters and with the intergalactic medium, is recognized as a notable contribution to the development of theoretical and observational cosmology.
Seismic safety in conducting large-scale blasts

Science.gov (United States)

Mashukov, I. V.; Chaplygin, V. V.; Domanov, V. P.; Semin, A. A.; Klimkin, M. A.

2017-09-01

Mining enterprises use the drilling and blasting method to prepare hard rock for excavation. As mining operations approach settlements, the negative effect of large-scale blasts increases. To assess the level of seismic impact of large-scale blasts, the scientific staff of Siberian State Industrial University carried out expert assessments for coal mines and iron ore enterprises. The magnitude of surface seismic vibrations caused by mass explosions was determined using seismic receivers and an analog-to-digital converter, with recording on a laptop. The recorded surface seismic vibrations from more than 280 large-scale blasts at 17 mining enterprises in 22 settlements are presented. The maximum velocity values of the Earth's surface vibrations are determined. Seismic safety was evaluated against the permissible value of vibration velocity. Where the permissible values were exceeded, recommendations were developed to reduce the level of seismic impact.
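The safety evaluation described above compares the maximum measured surface vibration velocity with a permissible value. The following is a minimal sketch, assuming three-component velocity records in cm/s and a single permissible threshold. The threshold in the example is a placeholder, not a value from the study or from any particular standard, and whether the peak is taken per component or as the vector sum depends on the applicable code.

```python
import numpy as np


def peak_velocities(vx, vy, vz):
    """Per-component peak |v| and the peak of the vector-sum magnitude."""
    v = np.vstack([vx, vy, vz]).astype(float)      # shape (3, n_samples)
    component_peaks = np.max(np.abs(v), axis=1)    # one peak per component
    vector_peak = float(np.max(np.sqrt(np.sum(v**2, axis=0))))
    return component_peaks, vector_peak


def check_blast(vx, vy, vz, permissible, use_vector_sum=True):
    """Return (exceeds, peak) for one blast record."""
    component_peaks, vector_peak = peak_velocities(vx, vy, vz)
    peak = vector_peak if use_vector_sum else float(component_peaks.max())
    return peak > permissible, peak


# Toy record in cm/s and a placeholder limit (not from the study or a standard).
vx = [0.0, 0.3, -0.4]
vy = [0.0, 0.4, 0.3]
vz = [0.0, 0.0, 0.0]
print(check_blast(vx, vy, vz, permissible=0.45))                        # vector peak about 0.5: exceeds
print(check_blast(vx, vy, vz, permissible=0.45, use_vector_sum=False))  # component peak 0.4: does not
```

The toy record shows why the choice of peak definition matters: the same blast passes on the largest single component but fails on the vector sum.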
Large-scale simulations of error-prone quantum computation devices

International Nuclear Information System (INIS)

Trieu, Doan Binh

2009-01-01

The theoretical concepts of quantum computation in the idealized and undisturbed case are well understood. In practice, however, all quantum computation devices suffer from decoherence effects as well as from operational imprecisions. This work assesses the power of error-prone quantum computation devices using large-scale numerical simulations on parallel supercomputers. We present the Juelich Massively Parallel Ideal Quantum Computer Simulator (JUMPIQCS), which simulates a generic quantum computer at the gate level. It comprises an error model for decoherence and operational errors. The robustness of various algorithms in the presence of noise has been analyzed. The simulation results show that for large system sizes and long computations it is imperative to actively correct errors by means of quantum error correction. We implemented the 5-, 7-, and 9-qubit quantum error correction codes. Our simulations confirm that error-prone correction circuits with non-fault-tolerant quantum error correction will always fail, because they introduce more errors than they correct. Fault-tolerant methods can overcome this problem, provided that the single-qubit error rate is below a certain threshold. We incorporated fault-tolerant quantum error correction techniques into JUMPIQCS using Steane's 7-qubit code and determined this threshold numerically. Using the depolarizing channel as the source of decoherence, we find a threshold error rate of (5.2 ± 0.2) × 10^-6. For Gaussian-distributed operational over-rotations, the threshold lies at a standard deviation of 0.0431 ± 0.0002. We conclude that quantum error correction is especially well suited for correcting operational imprecisions and systematic over-rotations. For realistic simulations of specific quantum computation devices, the generic model must be extended to dynamic simulations, i.e., time-dependent Hamiltonian simulations of realistic hardware models. We focus on today's most advanced technology, i
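The depolarizing channel used as the decoherence source can be illustrated on a single qubit. In one common convention, with Pauli-error probability p, the channel maps ρ to (1-p)ρ + (p/3)(XρX + YρY + ZρZ). The sketch below applies that map with NumPy. It is a textbook illustration of the channel, not JUMPIQCS code, and because other texts parameterize the channel differently (for example ρ → (1-p)ρ + p·I/2), the convention should be checked before comparing with the quoted threshold.

```python
import numpy as np

# Single-qubit Pauli matrices.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def depolarize(rho, p):
    """Depolarizing channel: with probability p apply X, Y, or Z uniformly."""
    return (1 - p) * rho + (p / 3) * (X @ rho @ X + Y @ rho @ Y + Z @ rho @ Z)


def fidelity_with_pure(rho, psi):
    """<psi| rho |psi> for a pure reference state psi."""
    return float(np.real(np.conj(psi) @ rho @ psi))


psi0 = np.array([1, 0], dtype=complex)          # |0>
rho0 = np.outer(psi0, psi0.conj())              # |0><0|
rho1 = depolarize(rho0, p=0.01)
print(fidelity_with_pure(rho1, psi0))           # 1 - 2p/3 for this state
print(np.trace(rho1).real)                      # trace is preserved
```

For |0⟩, the X and Y errors flip the state while the Z error leaves it unchanged, so the fidelity drops to 1 - 2p/3 rather than 1 - p.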
Study of an electromagnetic pump applied to a primary main pump of a large scale sodium cooled reactor

International Nuclear Information System (INIS)

Aizawa, Kosuke; Kotake, Shoji; Chikazawa, Yoshitaka; Ara, Kuniaki; Araseki, Hideo; Aizawa, Rie; Ota, Hiroyuki

2009-01-01

This paper describes future innovative design options using a parallel electromagnetic pump (EMP) system as the main circulating pump of the JSFR design. A conceptual design of EMPs integrated with an intermediate heat exchanger (IHX) is carried out. The major design parameters are consistent with the current JSFR design: the main flow rate is 630 m³/min, and the flow halving time is the same as that of the mechanical pump, with similar reliability. As a result of several design studies, a system of five parallel EMPs integrated with the IHX has been selected for its geometric suitability for the JSFR design. The advantages of EMPs over mechanical pumps are investigated from the viewpoints of in-service inspection, maintenance, and reliability. Numerical analysis with two-dimensional MHD codes is conducted on a former experiment with a 160 m³/min EMP. The experimental data and the numerical results agree in overall trend, as they do for small-scale EMPs. However, the difference between the experimental data and the numerical results appears larger than for small-scale EMPs, which is attributed to the large magnetic Reynolds number and interaction parameter of the 160 m³/min EMP. (author)
Image-based Exploration of Large-Scale Pathline Fields

KAUST Repository

Nagoor, Omniah H.

2014-05-27

While real-time applications are nowadays routinely used to visualize large numerical simulations and volumes, handling these large-scale datasets requires high-end graphics clusters or supercomputers to process and visualize them. However, not all users have access to powerful clusters. It is therefore challenging to devise a visualization approach that provides insight into large-scale datasets on a single computer. Explorable images (EI) is one method that allows users to handle large data on a single workstation. Although it is a view-dependent method, it combines both exploration and modification of visual aspects without re-accessing the original huge data. In this thesis, we propose a novel image-based method that applies the concept of EI to visualizing large flow-field pathline data. The goal of our work is to provide an optimized image-based method that scales well with the dataset size. Our approach is based on constructing a per-pixel linked-list data structure in which each pixel contains a list of pathline segments. With this view-dependent method it is possible to filter, color-code, and explore large-scale flow data in real time. In addition, optimization techniques such as early-ray termination and deferred shading are applied, which further improve the performance and scalability of our approach.
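A per-pixel linked list stores one head index per pixel plus a shared node pool in which each node holds a payload and the index of the next node in that pixel's list. The sketch below shows the data structure in plain Python with pathline-segment fragments as the payload. The class name, the fragment fields (a pathline id and a depth), and the depth-sorted traversal are illustrative assumptions, not the thesis implementation, which runs on the GPU.

```python
NIL = -1  # end-of-list marker


class PerPixelLinkedList:
    """Head index per pixel plus a shared node pool (payload + next index)."""

    def __init__(self, width, height):
        self.width = width
        self.heads = [NIL] * (width * height)   # one list head per pixel
        self.next_node = []                     # node pool: index of next node
        self.payload = []                       # node pool: (pathline_id, depth)

    def insert(self, x, y, pathline_id, depth):
        """Prepend one pathline-segment fragment to pixel (x, y) in O(1)."""
        pixel = y * self.width + x
        node = len(self.payload)                # next free slot in the pool
        self.payload.append((pathline_id, depth))
        self.next_node.append(self.heads[pixel])
        self.heads[pixel] = node

    def fragments(self, x, y):
        """Walk the pixel's list and return its fragments sorted front to back."""
        out = []
        node = self.heads[y * self.width + x]
        while node != NIL:
            out.append(self.payload[node])
            node = self.next_node[node]
        return sorted(out, key=lambda frag: frag[1])

    def filtered(self, x, y, keep):
        """Filter fragments by pathline id without re-reading the flow data."""
        return [frag for frag in self.fragments(x, y) if keep(frag[0])]


image = PerPixelLinkedList(width=4, height=4)
image.insert(1, 2, pathline_id=7, depth=0.8)
image.insert(1, 2, pathline_id=3, depth=0.2)
image.insert(0, 0, pathline_id=5, depth=0.5)
print(image.fragments(1, 2))                            # [(3, 0.2), (7, 0.8)]
print(image.filtered(1, 2, keep=lambda pid: pid != 3))  # [(7, 0.8)]
```

Because each fragment keeps its pathline id, filtering and recoloring operate on the stored image data alone, which is what lets this kind of view-dependent method avoid touching the original dataset.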
Large Spatial Scale Ground Displacement Mapping through the P-SBAS Processing of Sentinel-1 Data on a Cloud Computing Environment

Science.gov (United States)

Casu, F.; Bonano, M.; de Luca, C.; Lanari, R.; Manunta, M.; Manzo, M.; Zinno, I.

2017-12-01

Since its launch in 2014, the Sentinel-1 (S1) constellation has played a key role in SAR data availability and dissemination all over the world. Indeed, the free and open access data policy adopted by the European Copernicus program, together with the global coverage acquisition strategy, makes the Sentinel constellation a game changer in the Earth Observation scenario. With SAR data now ubiquitous, the technological and scientific challenge is to maximize the exploitation of such a huge data flow. In this direction, innovative processing algorithms and distributed computing infrastructures, such as Cloud Computing platforms, can play a crucial role. In this work we present a Cloud Computing solution for the advanced interferometric (DInSAR) processing chain based on the Parallel SBAS (P-SBAS) approach, aimed at processing S1 Interferometric Wide Swath (IWS) data to generate large spatial scale deformation time series in an efficient, automatic, and systematic way. This DInSAR chain ingests Sentinel-1 SLC images and carries out several processing steps to finally compute deformation time series and mean deformation velocity maps. Different parallel strategies, encompassing both multi-core and multi-node programming techniques, have been designed ad hoc for each processing step of the P-SBAS S1 chain, in order to maximize the computational efficiency achieved within a Cloud Computing environment and cut down the relevant processing times. The presented P-SBAS S1 processing chain has been implemented on the Amazon Web Services platform, and a thorough analysis of the attained parallel performance has been performed to identify and overcome the major bottlenecks to scalability. The presented approach is used to perform national-scale DInSAR analyses over Italy, involving the processing of more than 3000 S1 IWS images acquired from both ascending and descending orbits. Such an experiment confirms the big advantage of
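The SBAS approach underlying P-SBAS inverts a set of small-baseline interferograms for a displacement time series. A minimal single-pixel sketch, assuming unwrapped interferometric displacements (already converted from phase to meters) and known acquisition times, solves for the mean velocity in each interval between consecutive acquisitions. It uses a minimum-norm least-squares fit (NumPy's SVD-based `lstsq`), in the spirit of the SBAS velocity formulation, and then integrates the velocities to displacements. The function name and data are hypothetical. The real chain also handles atmospheric filtering, topographic errors, and the parallel distribution of pixels and processing steps that the abstract describes.

```python
import numpy as np


def sbas_invert(times, pairs, ifg_disp):
    """Single-pixel SBAS-style inversion.

    times: sorted acquisition times (e.g. years), N values.
    pairs: (earlier_idx, later_idx) for each interferogram.
    ifg_disp: unwrapped displacement of each interferogram (later minus earlier).
    Returns the displacement at every acquisition relative to the first.
    """
    times = np.asarray(times, dtype=float)
    dt = np.diff(times)                          # N-1 inter-acquisition intervals
    design = np.zeros((len(pairs), len(dt)))
    for row, (a, b) in enumerate(pairs):
        if not 0 <= a < b < len(times):
            raise ValueError("each pair must satisfy 0 <= earlier < later < N")
        design[row, a:b] = dt[a:b]               # intervals spanned by this pair
    # Minimum-norm least squares (SVD-based) for the interval velocities.
    velocities, *_ = np.linalg.lstsq(design, np.asarray(ifg_disp, dtype=float), rcond=None)
    return np.concatenate([[0.0], np.cumsum(velocities * dt)])


# Synthetic pixel: steady 20 mm/yr subsidence sampled every 0.1 yr (hypothetical).
times = [0.0, 0.1, 0.2, 0.3]
truth = np.array([0.0, -0.002, -0.004, -0.006])  # m
pairs = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
ifg_disp = [truth[b] - truth[a] for a, b in pairs]
print(sbas_invert(times, pairs, ifg_disp))       # recovers truth up to rounding
```

Parametrizing by interval velocities rather than displacements is what lets the minimum-norm solution behave sensibly when the interferograms split into disconnected subsets. Each pixel is independent, which is one natural axis for parallelizing this step.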
Parallel processing and non-uniform grids in global air quality modeling

NARCIS (Netherlands)

Berkvens, P.J.F.; Bochev, Mikhail A.

2002-01-01

A large-scale global air quality model, running efficiently on a single vector processor, is enhanced to make more realistic and more long-term simulations feasible. Two strategies are combined: non-uniform grids and parallel processing. The communication through the hierarchy of non-uniform grids
Study on MPI/OpenMP hybrid parallelism for Monte Carlo neutron transport code

International Nuclear Information System (INIS)

Liang Jingang; Xu Qi; Wang Kan; Liu Shiwen

2013-01-01

Parallel programming that mixes message passing and shared memory has several advantages when used in Monte Carlo neutron transport codes, such as fitting the hardware of distributed-shared clusters, reducing the memory demand of Monte Carlo transport, and improving parallel performance. MPI/OpenMP hybrid parallelism was implemented in a one-dimensional Monte Carlo neutron transport code. Some critical factors affecting parallel performance were analyzed, and solutions were proposed for several problems such as access contention, lock contention, and false sharing. After optimization, the code was tested. The results show that the hybrid parallel code performs as well as the pure MPI parallel program while using much less memory. Hybrid parallelism is therefore an efficient way to achieve large-scale parallelization of Monte Carlo neutron transport. (authors)
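One standard way to avoid lock contention and false sharing on tallies is to give each worker its own private tally and combine the tallies only once at the end (a reduction), instead of updating a shared counter under a lock. The sketch below illustrates that pattern for a hypothetical one-dimensional problem: particles normally incident on a purely absorbing slab, where the expected transmission is exp(-Σt·L). It uses Python threads only to show the structure, so it gives no speedup under the GIL. It is not the authors' code, which uses MPI processes and OpenMP threads.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_batch(n_particles, sigma_t, thickness, seed):
    """Track particles normally incident on a purely absorbing slab.

    Returns this worker's private count of transmitted particles.
    """
    rng = random.Random(seed)      # independent random stream per worker
    transmitted = 0                # private tally: no lock, no shared counter
    for _ in range(n_particles):
        # Sample the distance to the first collision (1 - random() lies in (0, 1]).
        distance = -math.log(1.0 - rng.random()) / sigma_t
        if distance > thickness:
            transmitted += 1
    return transmitted


def transmission_probability(n_total, sigma_t, thickness, n_workers=4):
    per_worker = n_total // n_workers
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        tallies = pool.map(
            simulate_batch,
            [per_worker] * n_workers,
            [sigma_t] * n_workers,
            [thickness] * n_workers,
            range(n_workers),          # one seed per worker
        )
        transmitted = sum(tallies)     # single reduction of the private tallies
    return transmitted / (per_worker * n_workers)


estimate = transmission_probability(200_000, sigma_t=1.0, thickness=1.0)
print(estimate, math.exp(-1.0))  # the two values should agree to about two decimals
```

In a compiled OpenMP code, the same idea appears as per-thread tally arrays (padded or allocated separately so they do not share cache lines) or an OpenMP `reduction` clause.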
Homogenization of Large-Scale Movement Models in Ecology

Science.gov (United States)

Garlick, M.J.; Powell, J.A.; Hooten, M.B.; McFarlane, L.R.

2011-01-01

A difficulty in using diffusion models to predict large-scale animal population dispersal is that individuals move differently based on local information (as opposed to gradients) in differing habitat types. This can be accommodated by using ecological diffusion. However, real environments are often spatially complex, limiting the application of a direct approach. Homogenization for partial differential equations has long been applied to Fickian diffusion (in which average individual movement is organized along gradients of habitat and population density). We derive a homogenization procedure for ecological diffusion and apply it to a simple model for chronic wasting disease in mule deer. Homogenization allows us to determine the impact of small-scale (10-100 m) habitat variability on large-scale (10-100 km) movement. The procedure generates asymptotic equations for solutions on the large scale, with parameters defined by small-scale variation. The simplicity of this homogenization procedure is striking when compared with the multi-dimensional homogenization procedure for Fickian diffusion, and the method will be equally straightforward for more complex models. © 2010 Society for Mathematical Biology.
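For ecological diffusion, u_t = Δ(μu), substituting w = μu gives (1/μ) w_t = Δw. On scales much larger than the habitat patches, the coefficient 1/μ multiplying the time derivative simply averages, so the effective motility is the harmonic mean μ̄ = 1/⟨1/μ⟩ over a small-scale cell. This short argument is a sketch that assumes periodic (or well-mixed) small-scale variation and clear scale separation; the paper gives the full asymptotic treatment. The code computes μ̄ for a hypothetical cell of habitat-specific motilities and contrasts it with the arithmetic mean.

```python
import numpy as np


def homogenized_motility(mu_cell):
    """Effective large-scale motility: harmonic mean of mu over one small-scale cell."""
    mu = np.asarray(mu_cell, dtype=float)
    if np.any(mu <= 0):
        raise ValueError("motility must be positive everywhere")
    return float(1.0 / np.mean(1.0 / mu))


# Hypothetical cell: fast movement in open habitat (top row), slow in forest (bottom row).
mu = np.array([[10.0, 10.0],
               [1.0, 1.0]])
print(homogenized_motility(mu))  # about 1.82, pulled toward the slow habitat
print(mu.mean())                 # 5.5, the arithmetic mean, for comparison
```

The harmonic mean is dominated by the slowest habitat, reflecting that animals accumulate where they move slowly.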
The role of large-scale, extratropical dynamics in climate change

Energy Technology Data Exchange (ETDEWEB)

Shepherd, T.G. [ed.]

1994-02-01

The climate modeling community has focused recently on improving our understanding of certain processes, such as cloud feedbacks and ocean circulation, that are deemed critical to climate-change prediction. Although attention to such processes is warranted, emphasis on these areas has diminished a general appreciation of the role played by the large-scale dynamics of the extratropical atmosphere. Lack of interest in extratropical dynamics may reflect the assumption that these dynamical processes are a non-problem as far as climate modeling is concerned, since general circulation models (GCMs) calculate motions on this scale from first principles. Nevertheless, serious shortcomings in our ability to understand and simulate large-scale dynamics exist. Partly due to a paucity of standard GCM diagnostic calculations of large-scale motions and their transports of heat, momentum, potential vorticity, and moisture, a comprehensive understanding of the role of large-scale dynamics in GCM climate simulations has not been developed. Uncertainties remain in our understanding and simulation of large-scale extratropical dynamics and their interaction with other climatic processes, such as cloud feedbacks, large-scale ocean circulation, moist convection, air-sea interaction and land-surface processes. To address some of these issues, the 17th Stanstead Seminar was convened at Bishop's University in Lennoxville, Quebec. The purpose of the Seminar was to promote discussion of the role of large-scale extratropical dynamics in global climate change. Abstracts of the talks are included in this volume. On the basis of these talks, several key issues emerged concerning large-scale extratropical dynamics and their climatic role. Individual records are indexed separately for the database.
The role of large-scale, extratropical dynamics in climate change

International Nuclear Information System (INIS)

Shepherd, T.G.

1994-02-01

The climate modeling community has focused recently on improving our understanding of certain processes, such as cloud feedbacks and ocean circulation, that are deemed critical to climate-change prediction. Although attention to such processes is warranted, emphasis on these areas has diminished a general appreciation of the role played by the large-scale dynamics of the extratropical atmosphere. Lack of interest in extratropical dynamics may reflect the assumption that these dynamical processes are a non-problem as far as climate modeling is concerned, since general circulation models (GCMs) calculate motions on this scale from first principles. Nevertheless, serious shortcomings in our ability to understand and simulate large-scale dynamics exist. Partly due to a paucity of standard GCM diagnostic calculations of large-scale motions and their transports of heat, momentum, potential vorticity, and moisture, a comprehensive understanding of the role of large-scale dynamics in GCM climate simulations has not been developed. Uncertainties remain in our understanding and simulation of large-scale extratropical dynamics and their interaction with other climatic processes, such as cloud feedbacks, large-scale ocean circulation, moist convection, air-sea interaction and land-surface processes. To address some of these issues, the 17th Stanstead Seminar was convened at Bishop's University in Lennoxville, Quebec. The purpose of the Seminar was to promote discussion of the role of large-scale extratropical dynamics in global climate change. Abstracts of the talks are included in this volume. On the basis of these talks, several key issues emerged concerning large-scale extratropical dynamics and their climatic role. Individual records are indexed separately for the database.
Parallelization Issues and Particle-in-Cell Codes.

Science.gov (United States)

Elster, Anne Cathrine

1994-01-01

"Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on the parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory, and cache utilization. Our test-bed, a KSR1, is a distributed-memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses, with accompanying KSR benchmarks, are included both for this scheme and for the traditional replicated-grids approach. The latter approach maintains load balance with respect to particles. However, our results demonstrate that it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies becomes significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load-balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid points within the same cache line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. A consideration of the input data's effect on
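Reordering the grid indexing so that neighboring points share cache lines can be illustrated with a tiled (blocked) index map. Instead of row-major order, grid points are numbered tile by tile, so a 4-by-4 neighborhood occupies 16 consecutive locations. The sketch assumes square tiles whose side divides the number of grid columns. It shows a generic blocked layout, which is not necessarily the exact hierarchical ordering used on the KSR1, and the function names are hypothetical.

```python
def row_major_index(i, j, n_cols):
    """Conventional row-major linear index of grid point (i, j)."""
    return i * n_cols + j


def blocked_index(i, j, n_cols, tile=4):
    """Tile-major linear index: points in the same tile x tile block are consecutive.

    Assumes n_cols is a multiple of tile.
    """
    if n_cols % tile:
        raise ValueError("n_cols must be a multiple of the tile size")
    tiles_per_row = n_cols // tile
    tile_row, local_i = divmod(i, tile)
    tile_col, local_j = divmod(j, tile)
    tile_id = tile_row * tiles_per_row + tile_col
    return tile_id * tile * tile + local_i * tile + local_j


# The four corners of one 4 x 4 tile on a 128 x 128 grid.
corners = [(4, 4), (4, 7), (7, 4), (7, 7)]
print([row_major_index(i, j, 128) for i, j in corners])  # [516, 519, 900, 903]
print([blocked_index(i, j, 128) for i, j in corners])    # [528, 531, 540, 543]
```

In row-major order, the tile's corners lie almost 400 elements apart. In the blocked order, the whole tile fits in a span of 16 indices, so a stencil or charge-deposition step touching the tile reads far fewer cache lines.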
Xyce parallel electronic simulator : users' guide.

Energy Technology Data Exchange (ETDEWEB)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

2011-05-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state of the art in the following areas: (1) capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; (2) improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; (3) device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase (a message-passing parallel implementation), which allows it to run efficiently on the widest possible range of computing platforms. These include serial, shared-memory, and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is
Status: Large-scale subatmospheric cryogenic systems

International Nuclear Information System (INIS)

Peterson, T.

1989-01-01

In the late 1960s and early 1970s, an interest in testing and operating RF cavities at 1.8 K motivated the development and construction of four large (300 W) 1.8 K refrigeration systems. In the past decade, the development of successful superconducting RF cavities and interest in obtaining higher magnetic fields with the improved niobium-titanium superconductors have once again created interest in large-scale 1.8 K refrigeration systems. The L'Air Liquide plant for Tore Supra is a recently commissioned 300 W, 1.8 K system that incorporates a new technology, cold compressors, to obtain the low vapor pressure needed for low-temperature cooling. CEBAF proposes to use cold compressors to obtain 5 kW at 2.0 K. Magnetic refrigerators of 10 W capacity or higher at 1.8 K are now being developed. The state of the art of large-scale refrigeration in the range below 4 K will be reviewed. 28 refs., 4 figs., 7 tabs.
Enhanced 2D-DOA Estimation for Large Spacing Three-Parallel Uniform Linear Arrays

Directory of Open Access Journals (Sweden)

Dong Zhang

2018-01-01

An enhanced two-dimensional direction of arrival (2D-DOA) estimation algorithm for large spacing three-parallel uniform linear arrays (ULAs) is proposed in this paper. First, we use the propagator method (PM) to obtain a highly accurate but ambiguous estimate of the directional cosine. Then, we use the relationship between the directional cosines to eliminate the ambiguity. This algorithm not only makes use of the elements of the three-parallel ULAs but also utilizes the connection between the directional cosines to improve estimation accuracy. In addition, it achieves satisfactory estimation performance when the elevation angle is between 70° and 90°, and it automatically pairs the estimated azimuth and elevation angles. Furthermore, it has low complexity, since it does not apply any eigenvalue decomposition (EVD) or singular value decomposition (SVD) to the covariance matrix. Simulation results demonstrate the effectiveness of our proposed algorithm.
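With element spacing d larger than half a wavelength λ, the phase difference between adjacent elements, φ = 2πdu/λ, determines the directional cosine u only modulo λ/d. The accurate phase-based estimate therefore has several candidates, u = (φ/2π + k)λ/d. A generic way to remove the ambiguity, sketched below under that model, is to pick the physically valid candidate closest to a coarse but unambiguous estimate. This illustrates the principle only: the paper resolves ambiguity through the relationship between the directional cosines of the three ULAs, and the function name and numbers here are hypothetical.

```python
import math


def resolve_direction_cosine(phase, spacing_wl, coarse_u):
    """Pick the direction cosine consistent with a wrapped phase and nearest a coarse estimate.

    phase: inter-element phase difference wrapped to [-pi, pi].
    spacing_wl: element spacing d expressed in wavelengths (d / lambda).
    coarse_u: coarse but unambiguous estimate of the direction cosine.
    """
    base = phase / (2 * math.pi * spacing_wl)   # principal-value candidate
    step = 1.0 / spacing_wl                     # ambiguity period in u
    k_max = math.ceil(spacing_wl) + 1
    candidates = [base + k * step for k in range(-k_max, k_max + 1)]
    candidates = [u for u in candidates if -1.0 <= u <= 1.0]  # physical range
    return min(candidates, key=lambda u: abs(u - coarse_u))


# Hypothetical example: d = 2.5 wavelengths, true u = 0.55, coarse estimate 0.5.
spacing = 2.5
true_u = 0.55
wrapped = math.remainder(2 * math.pi * spacing * true_u, 2 * math.pi)
print(resolve_direction_cosine(wrapped, spacing, coarse_u=0.5))  # about 0.55
```

This kind of resolution succeeds only while the coarse estimate's error stays below half the ambiguity period (here 0.2), which is why a wide aperture is combined with an unambiguous reference.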
Large-scale weakly supervised object localization via latent category learning.

Science.gov (United States)

Chong Wang; Kaiqi Huang; Weiqiang Ren; Junge Zhang; Maybank, Steve

2015-04-01

Localizing objects in cluttered backgrounds is challenging under large-scale weakly supervised conditions. Because of the cluttered image conditions, objects are often highly ambiguous with respect to their backgrounds. Moreover, effective algorithms for large-scale weakly supervised localization in cluttered backgrounds are lacking. However, backgrounds contain useful latent information, e.g., the sky in the aeroplane class. If this latent information can be learned, object-background ambiguity can be largely reduced and the background can be suppressed effectively. In this paper, we propose latent category learning (LCL) for large-scale cluttered conditions. LCL is an unsupervised learning method that requires only image-level class labels. First, we use latent semantic analysis with a semantic object representation to learn the latent categories, which represent objects, object parts, or backgrounds. Second, to determine which category contains the target object, we propose a category selection strategy that evaluates each category's discrimination. Finally, we propose an online LCL for use in large-scale conditions. Evaluation on the challenging PASCAL Visual Object Classes (VOC) 2007 and the large-scale ImageNet Large Scale Visual Recognition Challenge 2013 detection data sets shows that the method improves annotation precision by 10% over previous methods. More importantly, we achieve detection precision that outperforms previous results by a large margin and is competitive with the supervised deformable part model 5.0 baseline on both data sets.
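Latent semantic analysis factors a feature-by-region count matrix with a truncated singular value decomposition, so that regions sharing co-occurring features map to nearby directions in a low-dimensional latent space. The sketch below shows only that generic step with NumPy on a toy matrix. How LCL builds its semantic object representation, and how it then selects the category that contains the object, are not reproduced, and all names and data are hypothetical.

```python
import numpy as np


def lsa_embed(counts, n_latent):
    """Truncated SVD of a (features x regions) count matrix.

    Returns one row of latent coordinates per region (column of `counts`).
    """
    u, s, vt = np.linalg.svd(np.asarray(counts, dtype=float), full_matrices=False)
    return (np.diag(s[:n_latent]) @ vt[:n_latent]).T


def cosine(a, b):
    """Cosine similarity between two latent vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy counts (hypothetical): rows = features "wing", "engine", "cloud", "blue";
# columns = two aeroplane regions followed by two sky regions.
counts = np.array([[3, 2, 0, 0],
                   [2, 3, 0, 0],
                   [0, 0, 3, 2],
                   [0, 0, 2, 3]])
z = lsa_embed(counts, n_latent=2)
print(cosine(z[0], z[1]))  # close to 1: both aeroplane regions share a latent category
print(cosine(z[0], z[2]))  # close to 0: aeroplane vs. sky
```

The background regions form their own latent category, which is the kind of structure the abstract proposes to exploit for suppressing backgrounds.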
Large-scale networks in engineering and life sciences

CERN Document Server

Findeisen, Rolf; Flockerzi, Dietrich; Reichl, Udo; Sundmacher, Kai

2014-01-01

This edited volume provides insights into and tools for the modeling, analysis, optimization, and control of large-scale networks in the life sciences and in engineering. Large-scale systems are often the result of networked interactions between a large number of subsystems, and their analysis and control are becoming increasingly important. The chapters of this book present the basic concepts and theoretical foundations of network theory and discuss its applications in different scientific areas such as biochemical reactions, chemical production processes, systems biology, electrical circuits, and mobile agents. The aim is to identify common concepts, to understand the underlying mathematical ideas, and to inspire discussions across the borders of the various disciplines. The book originates from the interdisciplinary summer school “Large Scale Networks in Engineering and Life Sciences” hosted by the International Max Planck Research School Magdeburg, September 26-30, 2011, and will therefore be of int...
Massively parallel implementations of coupled-cluster methods for electron spin resonance spectra. I. Isotropic hyperfine coupling tensors in large radicals

Energy Technology Data Exchange (ETDEWEB)

Verma, Prakash; Morales, Jorge A., E-mail: jorge.morales@ttu.edu [Department of Chemistry and Biochemistry, Texas Tech University, P.O. Box 41061, Lubbock, Texas 79409-1061 (United States); Perera, Ajith [Department of Chemistry and Biochemistry, Texas Tech University, P.O. Box 41061, Lubbock, Texas 79409-1061 (United States); Department of Chemistry, Quantum Theory Project, University of Florida, Gainesville, Florida 32611 (United States)

2013-11-07

Coupled cluster (CC) methods provide highly accurate predictions of molecular properties, but their high computational cost has precluded their routine application to large systems. Fortunately, recent computational developments in the ACES III program by the Bartlett group [the OED/ERD atomic integral package, the super instruction processor, and the super instruction architecture language] permit overcoming that limitation by providing a framework for massively parallel CC implementations. Within that framework, we are further extending those parallel CC efforts to systematically predict the three main electron spin resonance (ESR) tensors (A-, g-, and D-tensors), to be reported in a series of papers. In this paper, which inaugurates that series, we report our new ACES III parallel capabilities for calculating isotropic hyperfine coupling constants in 38 neutral, cationic, and anionic radicals that include the ¹¹B, ¹⁷O, ⁹Be, ¹⁹F, ¹H, ¹³C, ³⁵Cl, ³³S, ¹⁴N, ³¹P, and ⁶⁷Zn nuclei. The present parallel calculations are conducted at the Hartree-Fock (HF), second-order many-body perturbation theory [MBPT(2)], CC singles and doubles (CCSD), and CCSD with perturbative triples [CCSD(T)] levels using Roos augmented double- and triple-zeta atomic natural orbital basis sets. HF results consistently overestimate isotropic hyperfine coupling constants. Including electron correlation effects in the simplest way, via MBPT(2), provides significant improvements in the predictions, but not without occasional failures. In contrast, CCSD results are consistently in very good agreement with experimental results. Adding perturbative triples to CCSD via CCSD(T) leads to small improvements in the predictions, which might not compensate for the extra computational effort of the non-iterative N⁷-scaling step in CCSD(T). The importance of these accurate computations of isotropic hyperfine coupling constants to elucidate
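The cost argument against CCSD(T) can be made concrete from the scaling exponents alone. Assuming the usual formal scalings (N⁶ per CCSD iteration, plus a single non-iterative N⁷ step for the perturbative triples) and ignoring prefactors, doubling the system size multiplies the costs by:

```latex
\frac{t_{\mathrm{CCSD}}(2N)}{t_{\mathrm{CCSD}}(N)} = 2^{6} = 64 \quad \text{(per iteration)},
\qquad
\frac{t_{(T)}(2N)}{t_{(T)}(N)} = 2^{7} = 128 .
```

The triples step thus grows faster with system size than any single CCSD iteration, which is why its cost must be weighed against the small accuracy gain reported for CCSD(T).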

A Novel Architecture of Large-scale Communication in IOT

Science.gov (United States)

Ma, Wubin; Deng, Su; Huang, Hongbin

2018-03-01

In recent years, many scholars have done a great deal of research on the development of the Internet of Things and networked physical systems. However, few have produced a detailed visualization of the large-scale communication architecture in the IoT. In fact, the non-uniform technology between IPv6 and access points has led to a lack of broad principles for large-scale communication architectures. Therefore, this paper presents the Uni-IPv6 Access and Information Exchange Method (UAIEM), a new architecture and algorithm that addresses large-scale communication in the IoT.
Benefits of transactive memory systems in large-scale development

OpenAIRE

Aivars, Sablis

2016-01-01

Context. Large-scale software development projects are those consisting of a large number of teams, maybe even spread across multiple locations, and working on large and complex software tasks. That means that neither a team member individually nor an entire team holds all the knowledge about the software being developed and teams have to communicate and coordinate their knowledge. Therefore, teams and team members in large-scale software development projects must acquire and manage expertise...
An Implementation and Parallelization of the Scale Space Meshing Algorithm

Directory of Open Access Journals (Sweden)

Julie Digne

2015-11-01

Creating an interpolating mesh from an unorganized set of oriented points is a difficult problem which is often overlooked. Most methods indeed focus on building a watertight smoothed mesh by defining some function whose zero level set is the surface of the object. However, in some cases it is crucial to build a mesh that interpolates the points and does not fill the acquisition holes: either because the data are sparse and trying to fill the holes would create spurious artifacts, or because the goal is to explore the data visually exactly as they were acquired, without any smoothing process. In this paper we detail a parallel implementation of the Scale-Space Meshing algorithm, which builds on the scale-space framework for reconstructing a high-precision mesh from an input oriented point set. This algorithm first smooths the point set, producing a singularity-free shape. It then uses a standard mesh reconstruction technique, the Ball Pivoting Algorithm, to build a mesh from the smoothed point set. The final step consists in back-projecting the mesh built on the smoothed positions onto the original point set. The result of this process is an interpolating, hole-preserving surface mesh reconstruction.
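The smoothing and back-projection steps can be illustrated with a short Python/NumPy sketch. We assume the scale-space smoothing operator is an iterated projection of each point onto the least-squares plane of its neighbours within a radius r; that is our reading of the scale-space framework, not code from the paper. Ball pivoting itself is not implemented, and the `triangles` array stands in for its output. Because each point's projection depends only on the previous iteration, the per-point loop is the part that parallelizes naturally.

```python
import numpy as np


def scale_space_step(points: np.ndarray, radius: float) -> np.ndarray:
    """One smoothing iteration: project every point onto the least-squares
    (PCA) plane of its neighbours within `radius`."""
    smoothed = points.copy()
    for i, p in enumerate(points):  # iterations are independent -> parallel
        # Brute-force neighbour search; a k-d tree would be used in practice.
        nbrs = points[np.linalg.norm(points - p, axis=1) <= radius]
        if len(nbrs) < 3:
            continue  # too few neighbours to fit a plane; keep the point
        centroid = nbrs.mean(axis=0)
        # eigh returns eigenvalues in ascending order, so column 0 is the
        # direction of least variance, i.e. the local plane normal.
        _, eigvecs = np.linalg.eigh(np.cov((nbrs - centroid).T))
        normal = eigvecs[:, 0]
        smoothed[i] = p - np.dot(p - centroid, normal) * normal
    return smoothed


def back_project(original: np.ndarray, triangles: np.ndarray) -> np.ndarray:
    """Triangles index the point set, so moving the mesh back onto the raw
    data only means reading vertex positions from the original points."""
    return original[triangles]  # shape (n_triangles, 3, 3)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = np.column_stack([rng.random((400, 2)), 0.01 * rng.standard_normal(400)])
    smooth = pts
    for _ in range(3):  # more iterations correspond to a coarser scale
        smooth = scale_space_step(smooth, radius=0.15)
    print("height spread before/after:", pts[:, 2].std(), smooth[:, 2].std())
```

Since the mesh is built on the smoothed copy but stored as vertex indices, the final mesh interpolates the raw points exactly, which is what keeps acquisition holes open.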
Study of a large scale neutron measurement channel

International Nuclear Information System (INIS)

Amarouayache, Anissa; Ben Hadid, Hayet.

1982-12-01

A large-scale measurement channel processes the signal from a single neutron sensor in three different operating modes: pulse, fluctuation, and current. The study described in this note has three parts: - A theoretical study of the large-scale channel and a brief description of it are given, together with the results obtained so far in that domain. - The fluctuation mode is studied thoroughly and the improvements to be made are defined. The study of a linear fluctuation channel with automatic scale switching is described and the test results are given. In this large-scale channel, the data processing method is analog. - To avoid the problems caused by analog processing of the fluctuation signal, a digital data processing method is tested. The validity of that method is improved. The results obtained on a test system built according to this method are given, and a preliminary plan for further research is defined [fr]
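The abstract does not spell out how the fluctuation signal is processed. Fluctuation-mode neutron channels are conventionally based on Campbell's theorem, which relates the variance of a shot-noise signal to the event rate. Assuming that principle applies here, a digital method can estimate the rate from the variance of the sampled signal, as in the Python sketch below. The exponential pulse shape, sampling interval, and rate are illustrative values, not parameters from the study.

```python
import numpy as np


def simulate_signal(rate: float, dt: float, n_samples: int,
                    pulse: np.ndarray, seed: int = 0) -> np.ndarray:
    """Sampled detector signal: Poisson-distributed neutron counts per sample,
    each count contributing one copy of `pulse` (shot noise)."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(rate * dt, n_samples)
    signal = np.convolve(counts, pulse)
    # Drop the start-up transient and the tail so every sample is stationary.
    return signal[len(pulse):n_samples]


def campbell_rate(signal: np.ndarray, pulse: np.ndarray, dt: float) -> float:
    """Discrete Campbell relation: var(signal) = rate * dt * sum(pulse**2)."""
    return float(np.var(signal) / (dt * np.sum(pulse ** 2)))


if __name__ == "__main__":
    dt, tau = 1e-8, 1e-7                        # 10 ns sampling, 100 ns decay
    pulse = np.exp(-np.arange(50) * dt / tau)   # assumed pulse shape
    true_rate = 2e7                             # events per second
    sig = simulate_signal(true_rate, dt, 400_000, pulse)
    print(f"estimated rate: {campbell_rate(sig, pulse, dt):.3e} /s")
```

Because the variance weights each pulse by the square of its amplitude, the estimate depends on a known, stable pulse shape; in a real channel that shape would be calibrated rather than assumed.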
Large-scale ground motion simulation using GPGPU

Science.gov (United States)

Aoi, S.; Maeda, T.; Nishizawa, N.; Aoki, T.

2012-12-01

Huge computational resources are required to perform large-scale ground motion simulations using the 3-D finite difference method (FDM) for realistic and complex models with high accuracy. Furthermore, thousands of different simulations are necessary to evaluate the variability of the assessment caused by uncertainty in the assumed source models for future earthquakes. To overcome the problem of restricted computational resources, we introduced GPGPU (general-purpose computing on graphics processing units), the technique of using a GPU to accelerate computations traditionally performed by the CPU. We took the CPU version of GMS (Ground motion Simulator; Aoi et al., 2004) as the original code and implemented GPU calculation using CUDA (Compute Unified Device Architecture). GMS is a complete system for seismic wave propagation simulation based on a 3-D FDM scheme using discontinuous grids (Aoi & Fujiwara, 1999), which includes the solver as well as preprocessor tools (a parameter generation tool) and postprocessor tools (a filter tool, a visualization tool, and so on). The computational model is decomposed in the two horizontal directions, and each subdomain is allocated to a different GPU. We evaluated the performance of the newly developed GPU version of GMS on TSUBAME2.0, one of Japan's fastest supercomputers, operated by the Tokyo Institute of Technology. First, we performed a strong scaling test using a model with about 22 million grid points and achieved speed-ups of 3.2 and 7.3 times using 4 and 16 GPUs, respectively. Next, we examined a weak scaling test in which the model size (number of grid points) increases in proportion to the degree of parallelism (number of GPUs). The result showed almost perfect linearity up to a simulation with 22 billion grid points using 1024 GPUs, where the calculation speed reached 79.7 TFlops, about 34 times faster than the CPU calculation using the same number
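The strong- and weak-scaling figures quoted above can be turned into parallel efficiencies with a few lines of arithmetic. This Python sketch assumes the quoted speed-ups are relative to a single-GPU run, which the abstract implies but does not state explicitly.

```python
def parallel_efficiency(speedup: float, n_gpus: int, base_gpus: int = 1) -> float:
    """Strong-scaling efficiency: achieved speed-up divided by ideal speed-up."""
    return speedup / (n_gpus / base_gpus)


if __name__ == "__main__":
    # Strong scaling, ~22 million grid points (speed-ups quoted in the abstract).
    for n, s in [(4, 3.2), (16, 7.3)]:
        print(f"{n:3d} GPUs: speed-up {s}, efficiency {parallel_efficiency(s, n):.1%}")

    # Weak scaling: the work per GPU stays roughly constant.
    grid_points, gpus, tflops = 22e9, 1024, 79.7
    print(f"grid points per GPU: {grid_points / gpus / 1e6:.1f} million")
    print(f"sustained speed per GPU: {tflops * 1e3 / gpus:.1f} GFlops")
```

Under that assumption, efficiency falls from 80% on 4 GPUs to about 46% on 16 GPUs for the fixed model, while the weak-scaling run keeps roughly 21.5 million grid points per GPU, close to the size of the strong-scaling model.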
Hybrid shared/distributed parallelism for 3D characteristics transport solvers

International Nuclear Information System (INIS)

Dahmani, M.; Roy, R.

2005-01-01

In this paper, we present a new hybrid parallel model for solving the large-scale three-dimensional neutron transport problems used in nuclear reactor simulations. Large heterogeneous reactor problems, such as those that occur when simulating CANDU cores, have remained computationally intensive and impractical for routine applications on single-node or even vector computers. Based on the method of characteristics, this new model solves the transport equation after distributing the calculation load over a network of shared-memory multiprocessors. The tracks are either generated on the fly at each characteristics sweep or stored in sequential files. Load balancing is handled by estimating the calculation load of the tracks and distributing batches of uniform load to each node of the network. Moreover, the communication overhead can be predicted after benchmarking the latency and bandwidth with an appropriate network test suite. These models are useful for predicting the performance of parallel applications and for analyzing the scalability of parallel systems. (authors)
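The two ingredients above, estimating each track's cost and giving every node a batch of roughly equal load, plus a latency-bandwidth estimate of communication, can be sketched in Python as follows. The greedy longest-processing-time heuristic and the per-track cost values are assumptions for illustration; the abstract does not say which partitioning rule the authors use.

```python
import heapq


def balance_tracks(track_loads: list[float], n_nodes: int) -> list[list[int]]:
    """Greedy longest-processing-time assignment: the heaviest remaining track
    goes to the node with the smallest accumulated load."""
    heap = [(0.0, node) for node in range(n_nodes)]  # (current load, node id)
    heapq.heapify(heap)
    batches: list[list[int]] = [[] for _ in range(n_nodes)]
    for track in sorted(range(len(track_loads)), key=lambda t: -track_loads[t]):
        load, node = heapq.heappop(heap)
        batches[node].append(track)
        heapq.heappush(heap, (load + track_loads[track], node))
    return batches


def message_time(n_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Latency-bandwidth model of one point-to-point message."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical per-track costs, e.g. proportional to segment counts.
    loads = [5, 4, 3, 3, 2, 1]
    batches = balance_tracks(loads, n_nodes=2)
    print(batches, [sum(loads[t] for t in b) for b in batches])  # [[0, 3, 5], [1, 2, 4]] [9, 9]
    # Hypothetical network: 5 us latency, 1 GB/s bandwidth, 8 MB flux exchange.
    print(f"{message_time(8e6, 5e-6, 1e9) * 1e3:.3f} ms")
```

Sorting tracks by decreasing cost before assignment keeps the final node loads close even when track lengths vary widely; the latency and bandwidth arguments would come from the network benchmark the abstract mentions.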
Performance evaluation for compressible flow calculations on five parallel computers of different architectures

International Nuclear Information System (INIS)

Kimura, Toshiya.

1997-03-01

A two-dimensional explicit Euler solver has been implemented on five MIMD parallel computers of different machine architectures at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. These parallel computers are the Fujitsu VPP300, NEC SX-4, CRAY T94, IBM SP2, and Hitachi SR2201. The code was parallelized using several parallelization methods, and a typical compressible flow problem was calculated for different grid sizes while varying the number of processors. The effective performance of the parallel calculations, including calculation speed, speed-up ratio, and parallel efficiency, has been investigated and evaluated. The communication time among processors has also been measured and evaluated. As a result, the differences in performance and characteristics between vector-parallel and scalar-parallel computers can be identified, and the results provide basic data for the efficient use of parallel computers and for large-scale CFD simulations on them. (author)
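The report compares several parallelization methods without detailing them. A common one for explicit solvers is row-wise domain decomposition, with one halo (ghost) row exchanged between neighbouring processors each step. The Python sketch below uses a simple averaging stencil as a stand-in for the Euler flux update and checks that the strip-by-strip result matches the serial one; it illustrates the halo idea only and is not the JAERI code.

```python
import numpy as np


def stencil_step(u: np.ndarray) -> np.ndarray:
    """Explicit 5-point update on interior cells; boundary cells held fixed."""
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
    return new


def decomposed_step(u: np.ndarray, n_parts: int) -> np.ndarray:
    """Same update computed strip by strip, as separate processors would,
    each strip padded with one halo row from each neighbouring strip."""
    rows = u.shape[0]
    bounds = np.linspace(0, rows, n_parts + 1, dtype=int)
    out = np.empty_like(u)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # "Halo exchange": borrow one row from each neighbour, if it exists.
        h_lo, h_hi = max(lo - 1, 0), min(hi + 1, rows)
        local = stencil_step(u[h_lo:h_hi])
        out[lo:hi] = local[lo - h_lo: lo - h_lo + (hi - lo)]
    return out


if __name__ == "__main__":
    field = np.random.default_rng(0).random((37, 20))
    print(np.array_equal(decomposed_step(field, 4), stencil_step(field)))  # True
```

Each strip needs only its neighbours' edge rows, so the per-step communication volume grows with the strip width rather than the strip area, which is why communication time is worth measuring separately, as the study does.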
Efficacy of the SU(3) scheme for ab initio large-scale calculations beyond the lightest nuclei

Energy Technology Data Exchange (ETDEWEB)

Dytrych, T. [Academy of Sciences of the Czech Republic (ASCR), Prague (Czech Republic); Louisiana State Univ., Baton Rouge, LA (United States); Maris, Pieter [Iowa State Univ., Ames, IA (United States); Launey, K. D. [Louisiana State Univ., Baton Rouge, LA (United States); Draayer, J. P. [Louisiana State Univ., Baton Rouge, LA (United States); Vary, James [Iowa State Univ., Ames, IA (United States); Langr, D. [Czech Technical Univ., Prague (Czech Republic); Aerospace Research and Test Establishment, Prague (Czech Republic); Saule, E. [Univ. of North Carolina, Charlotte, NC (United States); Caprio, M. A. [Univ. of Notre Dame, IN (United States); Catalyurek, U. [The Ohio State Univ., Columbus, OH (United States). Dept. of Electrical and Computer Engineering; Sosonkina, M. [Old Dominion Univ., Norfolk, VA (United States)

2016-06-09

We report on the computational characteristics of ab initio nuclear structure calculations in a symmetry-adapted no-core shell model (SA-NCSM) framework. We examine the computational complexity of the current implementation of the SA-NCSM approach, dubbed LSU3shell, by analyzing ab initio results for ⁶Li and ¹²C in large harmonic oscillator model spaces and SU(3)-selected subspaces. We demonstrate LSU3shell's strong-scaling properties achieved with highly parallel methods for computing the many-body matrix elements. Results compare favorably with complete model space calculations, and significant memory savings are achieved in physically important applications. In particular, a well-chosen symmetry-adapted basis affords memory savings in calculations of states with a fixed total angular momentum in large model spaces while exactly preserving translational invariance.
Capabilities of the Large-Scale Sediment Transport Facility

Science.gov (United States)

2016-04-01

pump flow meters, sediment trap weigh tanks, and beach profiling lidar. A detailed discussion of the original LSTF features and capabilities can be... ERDC/CHL CHETN-I-88 April 2016 Approved for public release; distribution is unlimited. Capabilities of the Large-Scale Sediment Transport... describes the Large-Scale Sediment Transport Facility (LSTF) and recent upgrades to the measurement systems. The purpose of these upgrades was to increase
Par@Graph - a parallel toolbox for the construction and analysis of large complex climate networks

NARCIS (Netherlands)

Tantet, A.J.J.

2015-01-01

In this paper, we present Par@Graph, a software toolbox to reconstruct and analyze complex climate networks having a large number of nodes (up to at least 10⁶) and edges (up to at least 10¹²). The key innovation is an efficient set of parallel software tools designed to leverage the inherited hybrid
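The abstract is cut off before it describes how the networks are built. A widely used construction for climate networks, assumed here, links two grid points when the correlation of their time series exceeds a threshold. The Python sketch below computes the correlation matrix in row blocks, the kind of independent work unit a parallel toolbox could distribute across processes. The dense adjacency matrix suits a small example only; at the 10⁶-node scale quoted above, edges would have to be stored sparsely.

```python
import numpy as np


def climate_network(series: np.ndarray, threshold: float, block: int = 256) -> np.ndarray:
    """Adjacency matrix of a correlation-threshold network.

    series: array (n_nodes, n_times), one non-constant time series per grid point.
    An edge joins nodes i and j when |Pearson r_ij| > threshold.
    """
    z = series - series.mean(axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # now r_ij = z_i . z_j
    n = z.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    for start in range(0, n, block):  # independent row blocks
        stop = min(start + block, n)
        adj[start:stop] = np.abs(z[start:stop] @ z.T) > threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((500, 120))  # 500 hypothetical grid points
    net = climate_network(data, threshold=0.3)
    print("edges:", int(net.sum()) // 2, "mean degree:", net.sum(axis=1).mean())
```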
Spatiotemporal property and predictability of large-scale human mobility

Science.gov (United States)

Zhang, Hai-Tao; Zhu, Tao; Fu, Dongfei; Xu, Bowen; Han, Xiao-Pu; Chen, Duxin

2018-04-01

Spatiotemporal characteristics of human mobility emerging from complexity on the individual scale have been studied extensively because of their potential applications in human behavior prediction, recommendation, and control of epidemic spreading. We collect and investigate a comprehensive data set of human activities on large geographical scales, including both website browsing and mobile-tower visits. Numerical results show that the degree of activity decays as a power law, indicating that human behaviors are reminiscent of the scale-free random walks known as Lévy flights. More significantly, this study suggests that human activities on large geographical scales have specific non-Markovian characteristics, such as a two-segment power-law distribution of dwelling time and high predictability. Furthermore, we propose a scale-free mobility model with two essential ingredients, preferential return and exploration, together with a Gaussian distribution assumption on the exploration tendency parameter; it outperforms existing human mobility models in scenarios on large geographical scales.
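A minimal Python sketch of a walker combining the two ingredients named above: exploration of new places and preferential return to known ones, with the exploration tendency drawn per individual from a Gaussian. The exploration probability rho * S**(-gamma), with S the number of distinct places visited, follows the widely used exploration and preferential return idea; the exact form used in the paper and the parameter values here (mean 0.6, standard deviation 0.1, gamma 0.2) are assumptions for illustration. Spatial displacements (the Lévy-flight part) are left out, and locations are abstract ids.

```python
import random


def sample_rho(rng: random.Random, mean: float = 0.6, sd: float = 0.1) -> float:
    """Individual exploration tendency from a Gaussian, clipped into (0, 1)."""
    return min(0.99, max(0.01, rng.gauss(mean, sd)))


def epr_walk(n_steps: int, rho: float, gamma: float = 0.2, seed: int = 0) -> list[int]:
    """Exploration / preferential-return walk over abstract location ids.

    With probability rho * S**(-gamma), where S is the number of distinct
    locations seen so far, the walker explores a new location; otherwise it
    returns to a known one with probability proportional to past visits.
    """
    rng = random.Random(seed)
    visits = {0: 1}  # location id -> visit count; start at location 0
    trajectory = [0]
    for _ in range(n_steps):
        s = len(visits)
        if rng.random() < rho * s ** (-gamma):
            loc = s  # ids are 0..s-1, so s is a fresh location
        else:
            locations = list(visits)
            loc = rng.choices(locations, weights=[visits[k] for k in locations])[0]
        visits[loc] = visits.get(loc, 0) + 1
        trajectory.append(loc)
    return trajectory


if __name__ == "__main__":
    pop_rng = random.Random(42)
    for person in range(3):
        rho = sample_rho(pop_rng)
        traj = epr_walk(1000, rho, seed=person)
        print(f"rho={rho:.2f}: {len(set(traj))} distinct locations in {len(traj)} steps")
```

Because exploration becomes rarer as S grows while returns favour already popular places, visits concentrate on a few locations over time, which is the property that makes such trajectories predictable.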
Development of the Large-Scale Statistical Analysis System of Satellites Observations Data with Grid Datafarm Architecture

Science.gov (United States)

Yamamoto, K.; Murata, K.; Kimura, E.; Honda, R.

2006-12-01

In the Solar-Terrestrial Physics (STP) field, the amount of satellite observation data has been increasing every year. Three problems must be solved to achieve large-scale statistical analyses of such a large amount of data. (i) More CPU power and larger memory and disk size are required. However, the total power of personal computers is not enough to analyze such an amount of data. Supercomputers provide high-performance CPUs and abundant memory, but they are usually separated from the Internet or connected only for programming or data file transfer. (ii) Most of the observation data files are managed at distributed data sites over the Internet, so users have to know where the data files are located. (iii) Since no common data format is currently available in the STP field, users have to prepare a reading program for each data set themselves. To overcome problems (i) and (ii), we constructed a parallel and distributed data analysis environment based on the Gfarm reference implementation of the Grid Datafarm architecture. Gfarm shares computational resources and performs parallel distributed processing. In addition, Gfarm provides the Gfarm filesystem, which can be used as a virtual directory tree among nodes. The Gfarm environment is composed of three parts: a metadata server that manages information about distributed files, filesystem nodes that provide computational resources, and a client that submits jobs to the metadata server and manages data-processing scheduling. In the present study, both data files and data processing are parallelized on Gfarm with 6 filesystem nodes, each with a Pentium V 1 GHz CPU, 256 MB of memory, and a 40 GB disk. To evaluate the performance of the present Gfarm system, we scanned a large number of data files, each about 300 MB in size, using three processing methods: sequential processing on one node, sequential processing by each node, and parallel processing by each node. As a result, in comparison between the
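Parallel processing by each node works best when every analysis runs where its file is stored. The Python sketch below models that file-affinity (owner-computes) pattern, which is our assumption about how the third method operates. Threads stand in for filesystem nodes and in-memory lists stand in for data files; the function names, the placement map, and the choice of statistic are hypothetical. Threads only model the structure here, since real Gfarm jobs would run as separate processes on the nodes themselves.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def analyse(values: list[float]) -> tuple[float, int]:
    """Hypothetical per-file statistic; (sum, count) pairs merge exactly."""
    return float(sum(values)), len(values)


def run_on_node(file_names: list[str], store: dict[str, list[float]]) -> tuple[float, int]:
    """Process, one after another, the files held by a single node."""
    total, count = 0.0, 0
    for name in file_names:
        s, c = analyse(store[name])
        total, count = total + s, count + c
    return total, count


def owner_computes_mean(store: dict[str, list[float]], placement: dict[str, str]) -> float:
    """Group files by the node that stores them, run the groups in parallel,
    then reduce the partial results into one mean."""
    by_node: dict[str, list[str]] = defaultdict(list)
    for name, node in placement.items():
        by_node[node].append(name)
    with ThreadPoolExecutor(max_workers=len(by_node)) as pool:
        partials = list(pool.map(lambda names: run_on_node(names, store), by_node.values()))
    return sum(s for s, _ in partials) / sum(c for _, c in partials)


if __name__ == "__main__":
    files = {"f0": [1.0, 2.0, 3.0], "f1": [4.0, 5.0], "f2": [6.0]}
    where = {"f0": "node1", "f1": "node2", "f2": "node1"}
    print(owner_computes_mean(files, where))
```

Returning mergeable partial results (sums and counts rather than per-file means) lets the reduction at the end give the same answer as sequential processing on one node.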
Problems of large-scale vertically-integrated aquaculture

Energy Technology Data Exchange (ETDEWEB)

Webber, H H; Riordan, P F

1976-01-01

The problems of vertically-integrated aquaculture are outlined; they concern species limitations (market, biological, and technological), site selection, feed, manpower needs, and legal, institutional, and financial requirements. The gaps in understanding of, and the constraints limiting, large-scale aquaculture are listed. Future action is recommended with respect to: types and diversity of species to be cultivated, marketing, biotechnology (seed supply, disease control, water quality, and concerted effort), siting, feed, manpower, legal and institutional aids (granting of water rights, grants, tax breaks, duty-free imports, etc.), and adequate financing. The lack of hard data based on experience suggests that large-scale vertically-integrated aquaculture is a high-risk enterprise, and given the high capital investment required, banks and funding institutions are wary of supporting it. Investment in pilot projects is suggested to demonstrate that large-scale aquaculture can be a fully functional and successful business. Construction and operation of such pilot farms is judged to be in the interest of both the public and private sectors.
Reionization on large scales. IV. Predictions for the 21 cm signal incorporating the light cone effect

Energy Technology Data Exchange (ETDEWEB)

La Plante, P.; Battaglia, N.; Natarajan, A.; Peterson, J. B.; Trac, H. [McWilliams Center for Cosmology, Department of Physics, Carnegie Mellon University, Pittsburgh, PA 15213 (United States); Cen, R. [Department of Astrophysical Science, Princeton University, Princeton, NJ 08544 (United States); Loeb, A., E-mail: plaplant@andrew.cmu.edu [Harvard-Smithsonian Center for Astrophysics, Cambridge, MA 02138 (United States)

2014-07-01

We present predictions for the 21 cm brightness temperature power spectrum during the Epoch of Reionization (EoR). We discuss the implications of the 'light cone' effect, which incorporates evolution of the neutral hydrogen fraction and 21 cm brightness temperature along the line of sight. Using a novel method calibrated against radiation-hydrodynamic simulations, we model the neutral hydrogen density field and 21 cm signal in large volumes (L = 2 h⁻¹ Gpc). The inclusion of the light cone effect leads to a relative decrease of about 50% in the 21 cm power spectrum on all scales. We also find that the effect is more prominent at the midpoint of reionization and later. The light cone effect can also introduce an anisotropy along the line of sight. By decomposing the 3D power spectrum into components perpendicular to and along the line of sight, we find that in our fiducial reionization model, there is no significant anisotropy. However, parallel modes can contribute up to 40% more power for shorter reionization scenarios. The scales on which the light cone effect is relevant are comparable to scales where one measures the baryon acoustic oscillation. We argue that due to its large comoving scale and introduction of anisotropy, the light cone effect is important when considering redshift space distortions and future application to the Alcock-Paczyński test for the determination of cosmological parameters.
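The decomposition of the 3D power spectrum into components perpendicular to and along the line of sight can be sketched in Python with NumPy's FFT: compute |δ(k)|² on the grid, then average it in bins of k⊥ = sqrt(kx² + ky²) and k∥ = |kz|, taking the last axis as the line of sight. This is a generic estimator under those assumptions, not the authors' pipeline; it does not model the light-cone evolution itself and uses a simple periodic-box normalisation P = V |δ_k|² / N², with N the number of cells.

```python
import numpy as np


def cylindrical_power(field: np.ndarray, box_size: float,
                      n_perp: int = 8, n_par: int = 8) -> np.ndarray:
    """Average P(k) of a cubic periodic field in (k_perp, k_par) bins.

    The last axis is taken as the line of sight: k_par = |k_z| and
    k_perp = sqrt(k_x**2 + k_y**2). Empty bins are returned as NaN.
    """
    n = field.shape[0]
    n_cells = field.size
    # Periodic-box estimator: P(k) = V |delta_k|^2 / N_cells^2.
    power = np.abs(np.fft.fftn(field)) ** 2 * box_size ** 3 / n_cells ** 2
    k1d = 2 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k_perp = np.sqrt(kx ** 2 + ky ** 2).ravel()
    k_par = np.abs(kz).ravel()
    bins = [np.linspace(0, k_perp.max(), n_perp + 1),
            np.linspace(0, k_par.max(), n_par + 1)]
    sums, _, _ = np.histogram2d(k_perp, k_par, bins=bins, weights=power.ravel())
    counts, _, _ = np.histogram2d(k_perp, k_par, bins=bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    delta = rng.standard_normal((32, 32, 32))  # white noise as a stand-in field
    p2d = cylindrical_power(delta, box_size=32.0)
    print(np.round(p2d, 2))  # fluctuates around 1 for unit-variance white noise
```

An isotropic field gives the same power along both axes of the table, so an excess in the k∥ direction at fixed |k| is the kind of anisotropy the abstract looks for.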
SCALE Continuous-Energy Monte Carlo Depletion with Parallel KENO in TRITON

International Nuclear Information System (INIS)

Goluoglu, Sedat; Bekar, Kursat B.; Wiarda, Dorothea

2012-01-01

The TRITON sequence of the SCALE code system is a powerful and robust tool for performing multigroup (MG) reactor physics analysis using either the 2-D deterministic solver NEWT or the 3-D Monte Carlo transport code KENO. However, as with all MG codes, the accuracy of the results depends on the accuracy of the MG cross sections that are generated and/or used. While SCALE resonance self-shielding modules provide rigorous resonance self-shielding, they are based on 1-D models and therefore 2-D or 3-D effects such as heterogeneity of the lattice structures may render final MG cross sections inaccurate. Another potential drawback to MG Monte Carlo depletion is the need to perform resonance self-shielding calculations at each depletion step for each fuel segment that is being depleted. The CPU time and memory required for self-shielding calculations can often eclipse the resources needed for the Monte Carlo transport. This summary presents the results of the new continuous-energy (CE) calculation mode in TRITON. With the new capability, accurate reactor physics analyses can be performed for all types of systems using the SCALE Monte Carlo code KENO as the CE transport solver. In addition, transport calculations can be performed in parallel mode on multiple processors.
Xyce™ Parallel Electronic Simulator Users' Guide, Version 6.5.

Energy Technology Data Exchange (ETDEWEB)

Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Aadithya, Karthik V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Schiek, Richard L. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Verley, Jason C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation

2016-06-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase, a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.
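The benefit of a DAE formulation, keeping device physics out of the solver, shows up even in a scalar example. The Python sketch below assumes the form dq(x)/dt + f(x) = b(t), a common way to write circuit equations (we do not claim it is Xyce's exact internal form), and integrates it with backward Euler and Newton's method. The integrator only calls the model functions q, f, and b, so adding a new device means writing new functions, not a new solver. The RC values are illustrative.

```python
import math
from typing import Callable

Fn = Callable[[float], float]


def backward_euler(q: Fn, f: Fn, b: Fn, x0: float, h: float, n_steps: int,
                   newton_iters: int = 20, tol: float = 1e-12) -> float:
    """Integrate the scalar DAE  d q(x)/dt + f(x) = b(t)  with backward Euler.
    Device behaviour enters only through q, f and b."""
    x, t = x0, 0.0
    for _ in range(n_steps):
        t += h
        x_old = x

        def residual(v: float) -> float:
            # Discretised DAE: (q(v) - q(x_old)) / h + f(v) - b(t) = 0
            return (q(v) - q(x_old)) / h + f(v) - b(t)

        for _ in range(newton_iters):
            r = residual(x)
            eps = 1e-7 * max(1.0, abs(x))
            jac = (residual(x + eps) - r) / eps  # finite-difference Jacobian
            dx = -r / jac
            x += dx
            if abs(dx) < tol:
                break
    return x


if __name__ == "__main__":
    # RC node: C dv/dt + v/R = VS/R  (1 V source behind 1 kOhm, 1 uF to ground).
    R, C, VS = 1e3, 1e-6, 1.0
    v_end = backward_euler(q=lambda v: C * v,   # capacitor charge
                           f=lambda v: v / R,   # current through R from node voltage
                           b=lambda t: VS / R,  # source contribution
                           x0=0.0, h=1e-6, n_steps=5000)
    print(v_end, VS * (1 - math.exp(-5)))       # simulated vs analytic at t = 5 RC
```

Swapping the capacitor for a nonlinear charge model only changes `q`; the Newton loop already handles nonlinear residuals, which is the kind of isolation between device models and solver algorithms the manual describes.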
Xyce parallel electronic simulator users guide, version 6.0.

Energy Technology Data Exchange (ETDEWEB)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

2013-08-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase, a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users' guide, Version 6.0.1.

Energy Technology Data Exchange (ETDEWEB)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

2014-01-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase, a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
Xyce parallel electronic simulator users guide, version 6.1

Energy Technology Data Exchange (ETDEWEB)

Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

2014-03-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase, a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
VESPA: Very large-scale Evolutionary and Selective Pressure Analyses

Directory of Open Access Journals (Sweden)

Andrew E. Webb

2017-06-01

Background: Large-scale molecular evolutionary analyses of protein coding sequences require a number of preparatory inter-related steps, from finding gene families to generating alignments and phylogenetic trees and assessing selective pressure variation. Each phase of these analyses can represent significant challenges, particularly when working with entire proteomes (all protein coding sequences in a genome) from a large number of species. Methods: We present VESPA, software capable of automating a selective pressure analysis using codeML in addition to the preparatory analyses and summary statistics. VESPA is written in Python and Perl and is designed to run within a UNIX environment. Results: We have benchmarked VESPA and our results show that the method is consistent, performs well on both large-scale and smaller-scale datasets, and produces results in line with previously published datasets. Discussion: Large-scale gene family identification, sequence alignment, and phylogeny reconstruction are all important aspects of large-scale molecular evolutionary analyses. VESPA provides flexible software for simplifying these processes along with downstream selective pressure variation analyses. The software automatically interprets results from codeML and produces simplified summary files to assist the user in better understanding the results. VESPA may be found at the following website: http://www.mol-evol.org/VESPA.
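Selective pressure analyses with codeML commonly compare nested site models (for example M7 against M8, which differ by two parameters) with a likelihood-ratio test. The Python sketch below shows that arithmetic for the two-degree-of-freedom case, where the chi-square survival function has the closed form exp(-x/2). How VESPA itself summarises codeML output is not described in the abstract, and the log-likelihood values below are hypothetical.

```python
import math


def lrt_pvalue_df2(lnl_null: float, lnl_alt: float) -> float:
    """p-value of the likelihood-ratio statistic 2*(lnL_alt - lnL_null)
    under a chi-square distribution with 2 degrees of freedom."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # guard against optimiser noise
    return math.exp(-stat / 2.0)                 # chi2(df=2) survival function


if __name__ == "__main__":
    lnl_m7, lnl_m8 = -5234.8, -5228.1  # hypothetical codeML log-likelihoods
    p = lrt_pvalue_df2(lnl_m7, lnl_m8)
    print(f"2*dlnL = {2 * (lnl_m8 - lnl_m7):.1f}, p = {p:.4f}")
```

For comparisons with other degrees of freedom, the closed form no longer applies and a general chi-square survival function from a statistics library would be needed.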

Effect of grain boundary phase on the magnetization reversal process of nanocrystalline magnet using large-scale micromagnetic simulation

Directory of Open Access Journals (Sweden)

Hiroshi Tsukahara

2018-05-01

We investigated the effects of grain boundary phases on magnetization reversal in permanent magnets by performing large-scale micromagnetic simulations based on the Landau–Lifshitz–Gilbert equation under a periodic boundary condition. We considered planar grain boundary phases parallel and perpendicular to the easy axis of the permanent magnet and assumed the saturation magnetization and exchange stiffness constant of the grain boundary phase to be 10% and 1%, respectively, of those of the Nd2Fe14B grains. The grain boundary phase parallel to the easy axis effectively inhibits propagation of magnetization reversal. In contrast, the domain wall moves across the grain boundary perpendicular to the easy axis. These properties of the domain wall motion are explained by the dipole interaction, which stabilizes the antiparallel magnetic configuration in the direction perpendicular to the magnetization orientation. On the other hand, in the direction parallel to the magnetization orientation, the dipole interaction aligns the magnetization in the same direction. This anisotropy in the effect of the grain boundary phase shows that improving the grain boundary phase perpendicular to the easy axis effectively enhances the coercivity of permanent magnets.
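For a unit magnetization m, the Landau-Lifshitz-Gilbert equation, dm/dt = -γ m × H_eff + α m × dm/dt, can be rewritten in the explicit Landau-Lifshitz form dm/dt = -γ/(1+α²) [m × H_eff + α m × (m × H_eff)]. The Python sketch below integrates that form for a single macrospin with RK4 in dimensionless units (γ = 1) and shows damped precession relaxing m onto the field. A large-scale simulation like the one described solves the same equation on a grid, with exchange, anisotropy, and dipole contributions to H_eff that this sketch omits.

```python
import math

Vec = tuple[float, float, float]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def llg_rhs(m: Vec, h: Vec, alpha: float, gamma: float = 1.0) -> Vec:
    """Landau-Lifshitz form: dm/dt = -gamma/(1+alpha^2) [m x h + alpha m x (m x h)]."""
    mxh = cross(m, h)
    mxmxh = cross(m, mxh)
    pre = -gamma / (1.0 + alpha ** 2)
    return tuple(pre * (mxh[i] + alpha * mxmxh[i]) for i in range(3))


def relax(m: Vec, h: Vec, alpha: float, dt: float, n_steps: int) -> Vec:
    """RK4 time stepping, renormalising m to unit length after each step."""
    def shifted(v: Vec, k: Vec, s: float) -> Vec:
        return tuple(v[i] + s * k[i] for i in range(3))

    for _ in range(n_steps):
        k1 = llg_rhs(m, h, alpha)
        k2 = llg_rhs(shifted(m, k1, 0.5 * dt), h, alpha)
        k3 = llg_rhs(shifted(m, k2, 0.5 * dt), h, alpha)
        k4 = llg_rhs(shifted(m, k3, dt), h, alpha)
        m = tuple(m[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(3))
        norm = math.sqrt(sum(c * c for c in m))
        m = tuple(c / norm for c in m)
    return m


if __name__ == "__main__":
    m0 = (math.sin(1.0), 0.0, math.cos(1.0))  # tilted 1 rad away from the field
    m_final = relax(m0, (0.0, 0.0, 1.0), alpha=0.1, dt=0.02, n_steps=10_000)
    print(m_final)  # the moment has relaxed close to the field direction (0, 0, 1)
```

The damping term α m × (m × H_eff) is what pulls m toward the field; without it (α = 0) the moment would precess at a fixed angle forever.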
Parallel computation for solving the tridiagonal linear system of equations

International Nuclear Information System (INIS)

Ishiguro, Misako; Harada, Hiroo; Fujii, Minoru; Fujimura, Toichiro; Nakamura, Yasuhiro; Nanba, Katsumi.

1981-09-01

Recently, applications of parallel computation to scientific calculations have increased owing to the need for high-speed execution of large-scale programs. At the JAERI computing center, an array processor, the FACOM 230-75 APU, has been installed to study the applicability of parallel computation to nuclear codes. We performed numerical experiments with the APU on methods for solving tridiagonal linear systems, an important problem in scientific calculations. Drawing on recent papers on parallel methods, we investigated eight of them: the Gauss elimination method, Parallel Gauss method, Accelerated parallel Gauss method, Jacobi method, Recursive doubling method, Cyclic reduction method, Chebyshev iteration method, and Conjugate gradient method. Computing time and accuracy were compared among the methods on the basis of the numerical experiments. As a result, the Cyclic reduction method was found to be the best in both computing time and accuracy, with the Gauss elimination method second. (author)
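For readers comparing the two best performers, the Python sketch below implements cyclic reduction next to serial Gauss elimination for tridiagonal systems (the Thomas algorithm). In cyclic reduction, every update inside a reduction or back-substitution sweep is independent of the others, which is what suits it to an array processor, whereas the Thomas recurrences must run in order. The sketch assumes n = 2^k - 1 equations, the textbook setting, and does no pivoting, so it is meant for diagonally dominant systems; it is not the JAERI implementation.

```python
def thomas(a: list[float], b: list[float], c: list[float], d: list[float]) -> list[float]:
    """Serial Gauss elimination for a_i x_{i-1} + b_i x_i + c_i x_{i+1} = d_i."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):  # each step depends on the previous one
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def cyclic_reduction(a: list[float], b: list[float], c: list[float], d: list[float]) -> list[float]:
    """Cyclic reduction for n = 2**k - 1 equations (a[0] = c[-1] = 0)."""
    n = len(b)
    assert ((n + 1) & n) == 0, "n must be 2**k - 1"
    if n == 1:
        return [d[0] / b[0]]
    # Reduction: each odd-indexed equation absorbs its two even neighbours.
    # The iterations are independent, so they can run in parallel.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = -a[i] / b[i - 1]
        gamma = -c[i] / b[i + 1]
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + gamma * a[i + 1])
        rc.append(gamma * c[i + 1])
        rd.append(d[i] + alpha * d[i - 1] + gamma * d[i + 1])
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)  # half-size system
    # Back-substitution for even-indexed unknowns, again independent.
    for i in range(0, n, 2):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
    return x


if __name__ == "__main__":
    n = 15
    a = [0.0] + [1.0] * (n - 1)
    c = [1.0] * (n - 1) + [0.0]
    b = [4.0] * n
    d = [6.0] + [5.0] * (n - 2) + [5.0]  # arbitrary right-hand side
    print(max(abs(u - v) for u, v in zip(cyclic_reduction(a, b, c, d), thomas(a, b, c, d))))
```

Each reduction level halves the system, so cyclic reduction needs about log2(n) parallel sweeps instead of n sequential steps, at the price of more total arithmetic.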
Scheduling Parallel Jobs Using Migration and Consolidation in the Cloud

Directory of Open Access Journals (Sweden)

Xiaocheng Liu

2012-01-01

An increasing number of high-performance computing parallel applications leverage the power of the cloud for parallel processing. Scheduling these applications so as to improve quality of service is key to hosting them successfully in the cloud. The large scale of the cloud makes parallel job scheduling more complicated, as even the simple parallel job scheduling problem is NP-complete. In this paper, we propose a parallel job scheduling algorithm named MEASY. MEASY adopts migration and consolidation to enhance the most popular scheduling algorithm, EASY. Our extensive experiments on well-known workloads show that our algorithm substantially improves quality of service. For two common parallel job scheduling objectives, our algorithm achieves an improvement of up to 41.1% (23.1% on average) in average response time and of up to 82.9% (69.3% on average) in average slowdown. The algorithm is also robust: it tolerates inaccurate CPU usage estimates and high migration costs. Our approach involves only trivial modifications to EASY and requires no additional techniques; it is practical and effective in the cloud environment.
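MEASY builds on EASY backfilling. As a reference point, here is a minimal sketch of one EASY scheduling pass under common textbook assumptions: jobs are rigid, user runtime estimates are treated as upper bounds, and only the job at the head of the queue receives a reservation. The migration and consolidation that MEASY adds are not shown, and the function name and tuple layout are hypothetical.

```python
from typing import List, Tuple

Job = Tuple[str, int, float]  # (job_id, processors, estimated runtime)

def easy_backfill(now: float, free: int, running: List[Tuple[float, int]],
                  queue: List[Job]) -> List[str]:
    """One EASY scheduling pass at time `now`; returns the IDs of jobs started.

    `running` holds (estimated end time, processors) and `queue` is in FCFS
    order; both are updated in place.
    """
    started: List[str] = []
    # 1. Start jobs strictly in FCFS order while they fit.
    while queue and queue[0][1] <= free:
        job_id, procs, runtime = queue.pop(0)
        free -= procs
        running.append((now + runtime, procs))
        started.append(job_id)
    if not queue:
        return started
    # 2. Reservation for the blocked head job: the "shadow time" at which enough
    #    processors will have been released, and the "extra" processors left then.
    head_procs = queue[0][1]
    avail, shadow, extra = free, None, 0
    for end, procs in sorted(running):
        avail += procs
        if avail >= head_procs:
            shadow, extra = end, avail - head_procs
            break
    if shadow is None:
        raise ValueError("head job needs more processors than the machine has")
    # 3. Backfill: a later job may start now only if it cannot delay the reservation.
    for job in list(queue[1:]):
        job_id, procs, runtime = job
        if procs > free:
            continue
        ends_before_shadow = now + runtime <= shadow
        if ends_before_shadow or procs <= extra:
            queue.remove(job)
            free -= procs
            running.append((now + runtime, procs))
            started.append(job_id)
            if not ends_before_shadow:
                extra -= procs  # still holding these processors at the shadow time
    return started
```

For example, with 4 of 10 processors free, one running job holding 6 processors until t = 10, and a queue A (8 procs, 5 s), B (2, 20 s), C (2, 5 s), D (1, 5 s), a pass at t = 0 starts B (it fits in the 2 processors left over at A's reservation) and C (it finishes before t = 10), while A and D keep waiting.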
Restructuring of the Large-Scale Sprinklers

Directory of Open Access Journals (Sweden)

Paweł Kozaczyk

2016-09-01

Irrigation is one of the best ways for agriculture to become independent of precipitation shortages. In the 1970s and 1980s, a number of large-scale sprinkler systems were built in Wielkopolska. At the end of the 1970s, 67 sprinkler systems with a total area of 6400 ha were installed in the Poznan province; the average system covered 95 ha. In 1989 there were 98 systems, equipping an area of more than 10 130 ha. The study was conducted in 1986–1998 on 7 large sprinkler systems with areas ranging from 230 to 520 hectares. After the introduction of the market economy in the early 1990s and the ownership changes in agriculture, the large-scale sprinkler systems were significantly or totally devastated. State Farm land managed by the State Agricultural Property Agency was leased or sold, and the new owners used the existing sprinklers only to a very small extent. This involved changes in crop structure and demand structure and an increase in operating costs. Electricity prices also rose threefold. The operation of large-scale irrigation met all kinds of practical barriers: limitations of system solutions, supply difficulties, and high levels of equipment failure, none of which encouraged rational use of the available sprinklers. An on-site inspection documented the current status of the remaining irrigation infrastructure. The adopted scheme for restructuring Polish agriculture was not the best solution, causing massive destruction of assets previously invested in the sprinkler systems.
Large-scale synthesis of YSZ nanopowder by Pechini method

Indian Academy of Sciences (India)

Administrator

structure and chemical purity of 99.1% by inductively coupled plasma optical emission spectroscopy on a large scale. Keywords: sol–gel; yttria-stabilized zirconia; large scale; nanopowder; Pechini method. 1. Introduction. Zirconia has attracted the attention of many scientists because of its tremendous thermal, mechanical ...
Block-Parallel Data Analysis with DIY2

Energy Technology Data Exchange (ETDEWEB)

Morozov, Dmitriy [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Peterka, Tom [Argonne National Lab. (ANL), Argonne, IL (United States)

2017-08-30

DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
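The block-structured pattern described above (decompose the data into blocks, assign blocks to processing elements, iterate over blocks, and combine results through a reusable communication pattern) can be mimicked in a few lines. This is a minimal single-process sketch, assuming threads as the processing elements and a pairwise tree reduction as the communication pattern; all names are hypothetical and are not DIY2's C++ API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

@dataclass
class Block:
    gid: int                  # global block id
    data: Sequence[float]     # this block's piece of the domain
    result: float = 0.0       # per-block output written by the computation

def decompose(data: Sequence[float], nblocks: int) -> List[Block]:
    """Split a 1-D domain into contiguous blocks (the last may be shorter)."""
    size = -(-len(data) // nblocks)  # ceiling division
    return [Block(gid, data[gid * size:(gid + 1) * size]) for gid in range(nblocks)]

def assign(nblocks: int, nprocs: int) -> Dict[int, List[int]]:
    """Round-robin assignment of block ids to processing elements."""
    return {p: list(range(p, nblocks, nprocs)) for p in range(nprocs)}

def foreach(blocks: List[Block], fn: Callable[[Block], None], nthreads: int = 4) -> None:
    """Apply fn to every block; blocks are independent, so they may run concurrently."""
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        list(pool.map(fn, blocks))  # force completion and surface exceptions

def tree_reduce(values: List[float], op: Callable[[float, float], float]) -> float:
    """Pairwise (binary-tree) reduction over the per-block results."""
    while len(values) > 1:
        values = [op(values[i], values[i + 1]) if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
    return values[0]

def local_sum(block: Block) -> None:
    block.result = sum(block.data)
```

Summing the integers 0 to 99 this way over eight blocks (`foreach(blocks, local_sum)` followed by `tree_reduce([b.result for b in blocks], lambda x, y: x + y)`) yields 4950.0. Because the computation is written per block, the same code would run unchanged with one thread or many.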
Algorithms for large scale singular value analysis of spatially variant tomography systems

International Nuclear Information System (INIS)

Cao-Huu, Tuan; Brownell, G.; Lachiver, G.

1996-01-01

The problem of determining the eigenvalues of large matrices occurs often in the design and analysis of modern tomography systems. As there is interest in solving systems containing an ever-increasing number of variables, current research aims to create more robust solvers that do not depend on some special feature of the matrix for convergence (e.g. block circulant), and to improve the speed of already known and understood solvers so that solving even larger systems in a reasonable time becomes viable. Our standard techniques for singular value analysis are based on sparse matrix factorization and are not applicable when the input matrices are large, because the algorithms cause too much fill. Fill refers to the increase of non-zero elements in the LU decomposition of the original matrix A (the system matrix). We have therefore developed iterative solutions that are based on sparse direct methods. Data motion and preconditioning techniques are critical for performance. This conference paper describes our algorithmic approaches for large-scale singular value analysis of spatially variant imaging systems, in particular of PCR2, a cylindrical three-dimensional PET imager built at the Massachusetts General Hospital (MGH) in Boston. We outline the desirable features and challenges for the next generation of parallel machines for optimal performance of our solver.
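The factorization-free idea can be illustrated with the simplest iterative method of this kind: power iteration on AᵀA, which estimates the largest singular value of A using only sparse products with A and Aᵀ and therefore creates no fill at all. This is a generic sketch under that assumption, not the authors' solver; the coordinate (COO) storage and the function names are illustrative.

```python
import math
from typing import List, Tuple

Coo = List[Tuple[int, int, float]]  # sparse matrix as (row, col, value) entries

def matvec(A: Coo, x: List[float], m: int) -> List[float]:
    """y = A x, touching only the stored non-zeros."""
    y = [0.0] * m
    for r, c, v in A:
        y[r] += v * x[c]
    return y

def rmatvec(A: Coo, y: List[float], n: int) -> List[float]:
    """x = A^T y, computed without forming the transpose."""
    x = [0.0] * n
    for r, c, v in A:
        x[c] += v * y[r]
    return x

def largest_singular_value(A: Coo, m: int, n: int, iters: int = 500) -> float:
    """Power iteration on A^T A; returns ||A x|| for the converged unit vector x."""
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        z = rmatvec(A, matvec(A, x, m), n)   # z = A^T A x
        norm = math.sqrt(sum(t * t for t in z))
        if norm == 0.0:
            return 0.0
        x = [t / norm for t in z]
    y = matvec(A, x, m)
    return math.sqrt(sum(t * t for t in y))
```

Convergence is governed by the ratio of the two largest eigenvalues of AᵀA, which is why practical large-scale solvers add acceleration or preconditioning on top of this basic iteration.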
The Phoenix series large scale LNG pool fire experiments.

Energy Technology Data Exchange (ETDEWEB)

Simpson, Richard B.; Jensen, Richard Pearson; Demosthenous, Byron; Luketa, Anay Josephine; Ricks, Allen Joseph; Hightower, Marion Michael; Blanchat, Thomas K.; Helmick, Paul H.; Tieszen, Sheldon Robert; Deola, Regina Anne; Mercier, Jeffrey Alan; Suo-Anttila, Jill Marie; Miller, Timothy J.

2010-12-01

The increasing demand for natural gas could increase the number and frequency of Liquefied Natural Gas (LNG) tanker deliveries to ports across the United States. Because of the increasing number of shipments and the number of possible new facilities, concerns about the safety of the public and property in the event of accidental and, even more importantly, intentional spills have increased. While improvements have been made over the past decade in assessing hazards from LNG spills, the existing experimental data come from spills much smaller in size and scale than many postulated large accidental and intentional spills. Since the physics and hazards of a fire change with fire size, there are concerns about the adequacy of current hazard prediction techniques for large LNG spills and fires. To address these concerns, Congress funded the Department of Energy (DOE) in 2008 to conduct a series of laboratory and large-scale LNG pool fire experiments at Sandia National Laboratories (Sandia) in Albuquerque, New Mexico. This report presents the test data and results of both sets of fire experiments. A series of five reduced-scale (gas burner) tests (yielding 27 sets of data) were conducted in 2007 and 2008 at Sandia's Thermal Test Complex (TTC) to assess flame height to fire diameter ratios as a function of nondimensional heat release rates for extrapolation to large-scale LNG fires. The large-scale LNG pool fire experiments were conducted in a 120 m diameter pond specially designed and constructed in Sandia's Area III large-scale test complex. Two fire tests of LNG spills of 21 and 81 m in diameter were conducted in 2009 to improve the understanding of flame height, smoke production, and burn rate, and therefore of the physics and hazards of large LNG spills and fires.
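The reduced-scale tests relate flame height to fire diameter through a nondimensional heat release rate. One widely used correlation of this form is Heskestad's, L/D = 3.7 Q*^(2/5) − 1.02 with Q* = Q / (ρ∞ c_p T∞ √g D^(5/2)). The sketch below evaluates it, assuming ambient air properties ρ∞ = 1.2 kg/m³, c_p = 1005 J/(kg·K) and T∞ = 293 K; it is shown for orientation only and is not presented as the report's own fit to the Sandia data.

```python
import math

def q_star(q_watts: float, diameter_m: float, rho: float = 1.2, cp: float = 1005.0,
           t_amb: float = 293.0, g: float = 9.81) -> float:
    """Nondimensional heat release rate Q* = Q / (rho * cp * T * sqrt(g) * D^(5/2))."""
    return q_watts / (rho * cp * t_amb * math.sqrt(g) * diameter_m ** 2.5)

def heskestad_flame_ratio(qs: float) -> float:
    """Heskestad mean flame height correlation: L/D = 3.7 * Q*^(2/5) - 1.02."""
    return 3.7 * qs ** 0.4 - 1.02
```

For Q* = 1 the correlation gives L/D = 2.68, and the ratio grows with Q*. Extrapolating such fits to pool diameters far beyond the tested range is exactly the uncertainty the large-scale tests were designed to address.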
Potential climatic impacts and reliability of very large-scale wind farms

Directory of Open Access Journals (Sweden)

C. Wang

2010-02-01

Meeting future world energy needs while addressing climate change requires large-scale deployment of low or zero greenhouse gas (GHG) emission technologies such as wind energy. The widespread availability of wind power has fueled substantial interest in this renewable energy source as one of the needed technologies. For very large-scale utilization of this resource, there are, however, potential environmental impacts, as well as problems arising from its inherent intermittency, in addition to the present need to lower unit costs. To explore some of these issues, we use a three-dimensional climate model to simulate the potential climate effects associated with installation of wind-powered generators over vast areas of land or coastal ocean. Using wind turbines to meet 10% or more of global energy demand in 2100 could cause surface warming exceeding 1 °C over land installations. In contrast, surface cooling exceeding 1 °C is computed over ocean installations, but the validity of simulating the impacts of wind turbines by simply increasing the ocean surface drag needs further study. Significant warming or cooling remote from both the land and ocean installations, and alterations of the global distributions of rainfall and clouds, also occur. These results are influenced by the competing effects of increases in roughness and decreases in wind speed on near-surface turbulent heat fluxes, the differing nature of land and ocean surface friction, and the dimensions of the installations parallel and perpendicular to the prevailing winds. These results also depend on the accuracy of the model used and the realism of the methods applied to simulate wind turbines. Additional theory and new field observations will be required for their ultimate validation. Intermittency of wind power on daily, monthly and longer time scales, as computed in these simulations and inferred from meteorological observations, poses a demand for one or more options to ensure …
Geospatial Optimization of Siting Large-Scale Solar Projects

Energy Technology Data Exchange (ETDEWEB)

Macknick, Jordan [National Renewable Energy Lab. (NREL), Golden, CO (United States); Quinby, Ted [National Renewable Energy Lab. (NREL), Golden, CO (United States); Caulfield, Emmet [Stanford Univ., CA (United States); Gerritsen, Margot [Stanford Univ., CA (United States); Diffendorfer, Jay [U.S. Geological Survey, Boulder, CO (United States); Haines, Seth [U.S. Geological Survey, Boulder, CO (United States)

2014-03-01

Recent policy and economic conditions have encouraged a renewed interest in developing large-scale solar projects in the U.S. Southwest. However, siting large-scale solar projects is complex. In addition to the quality of the solar resource, solar developers must take into consideration many environmental, social, and economic factors when evaluating a potential site. This report describes a proof-of-concept, Web-based Geographical Information Systems (GIS) tool that evaluates multiple user-defined criteria in an optimization algorithm to inform discussions and decisions regarding the locations of utility-scale solar projects. Existing siting recommendations for large-scale solar projects from governmental and non-governmental organizations are not consistent with each other, are often not transparent in methods, and do not take into consideration the differing priorities of stakeholders. The siting assistance GIS tool we have developed improves upon the existing siting guidelines by being user-driven, transparent, interactive, capable of incorporating multiple criteria, and flexible. This work provides the foundation for a dynamic siting assistance tool that can greatly facilitate siting decisions among multiple stakeholders.
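A common way to combine multiple user-defined siting criteria is a weighted-sum suitability score over candidate cells, with hard exclusions (for example protected land) removed first. The sketch below assumes that scheme with min-max normalization; the criterion names, weights and function are hypothetical and do not describe the report's optimization algorithm.

```python
from typing import Dict, List, Tuple

Criteria = Dict[str, Tuple[float, bool]]  # name -> (weight, higher_is_better)

def rank_sites(cells: List[Dict[str, float]], criteria: Criteria,
               excluded: List[bool]) -> List[Tuple[int, float]]:
    """Rank non-excluded cells by a weighted sum of normalized criteria (best first)."""
    candidates = [i for i in range(len(cells)) if not excluded[i]]
    if not candidates:
        return []
    total_w = sum(w for w, _ in criteria.values())
    scores = {i: 0.0 for i in candidates}
    for name, (weight, higher_is_better) in criteria.items():
        vals = [cells[i][name] for i in candidates]
        lo, hi = min(vals), max(vals)
        for i in candidates:
            # Min-max normalize to [0, 1]; a constant criterion counts as fully met.
            norm = 1.0 if hi == lo else (cells[i][name] - lo) / (hi - lo)
            if not higher_is_better:
                norm = 1.0 - norm        # e.g. distance to a transmission line
            scores[i] += weight * norm / total_w
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Exposing the weights and exclusion flags as inputs is what makes a tool of this kind user-driven and transparent: each stakeholder can rerun the ranking with their own priorities and see exactly why a cell scores as it does.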
Large-scale Agricultural Land Acquisitions in West Africa

International Development Research Centre (IDRC) Digital Library (Canada)

This project will examine large-scale agricultural land acquisitions in nine West African countries -Burkina Faso, Guinea-Bissau, Guinea, Benin, Mali, Togo, Senegal, Niger, and Côte d'Ivoire. ... They will use the results to increase public awareness and knowledge about the consequences of large-scale land acquisitions.
WebPIE : A web-scale parallel inference engine using MapReduce

NARCIS (Netherlands)

Urbani, Jacopo; Kotoulas, Spyros; Maassen, Jason; Van Harmelen, Frank; Bal, Henri

2012-01-01

The large amount of Semantic Web data and its fast growth pose a significant computational challenge in performing efficient and scalable reasoning. On a large scale, the resources of single machines are no longer sufficient and we are required to distribute the process to improve performance. The …
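WebPIE expresses RDFS inference rules as MapReduce jobs. A single-machine simulation of that idea for one rule, rdfs9 (if x has type C and C is a subclass of D, then x has type D), is sketched below, assuming triples are plain string tuples. The map phase keys each triple on the shared class term, a grouping dict stands in for the shuffle, and the reduce phase joins instances with superclasses; rerunning the job until nothing new is derived yields the type closure. WebPIE's engineering optimizations are not reproduced, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def map_phase(triples: Iterable[Triple]) -> Iterator[Tuple[str, Tuple[str, str]]]:
    """Emit (join key, tagged value) pairs for rule rdfs9."""
    for s, p, o in triples:
        if p == TYPE:
            yield o, ("instance", s)       # key on the class
        elif p == SUBCLASS:
            yield s, ("superclass", o)     # key on the subclass

def reduce_phase(values: List[Tuple[str, str]]) -> Iterator[Triple]:
    """Join every instance of the key class with every superclass of it."""
    instances = [v for tag, v in values if tag == "instance"]
    supers = [v for tag, v in values if tag == "superclass"]
    for x in instances:
        for d in supers:
            yield (x, TYPE, d)

def rdfs9_job(triples: Iterable[Triple]) -> Set[Triple]:
    groups = defaultdict(list)             # simulated shuffle: group by key
    for key, value in map_phase(triples):
        groups[key].append(value)
    derived: Set[Triple] = set()
    for values in groups.values():
        derived.update(reduce_phase(values))
    return derived

def type_closure(triples: Iterable[Triple]) -> Set[Triple]:
    """Repeat the job until it derives nothing new (a fixpoint)."""
    closure = set(triples)
    while True:
        new = rdfs9_job(closure) - closure
        if not new:
            return closure
        closure |= new
```

On a real cluster each reduce group would be handled by a different worker, which is where the scalability comes from; the fixpoint loop corresponds to launching further jobs.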
Large-scale motions in the universe: a review

International Nuclear Information System (INIS)

Burstein, D.

1990-01-01

The expansion of the universe can be retarded in localised regions within the universe both by the presence of gravity and by non-gravitational motions generated in the post-recombination universe. The motions of galaxies thus generated are called 'peculiar motions', and the amplitudes, size scales and coherence of these peculiar motions are among the most direct records of the structure of the universe. As such, measurements of these properties of the present-day universe provide some of the severest tests of cosmological theories. This is a review of the current evidence for large-scale motions of galaxies out to a distance of ∼5000 km s⁻¹ (in an expanding universe, distance is proportional to radial velocity). 'Large-scale' in this context refers to motions that are correlated over size scales larger than the typical sizes of groups of galaxies, up to and including the size of the volume surveyed. To orient the reader into this relatively new field of study, a short modern history is given together with an explanation of the terminology. Careful consideration is given to the data used to measure the distances, and hence the peculiar motions, of galaxies. The evidence for large-scale motions is presented in a graphical fashion, using only the most reliable data for galaxies spanning a wide range in optical properties and over the complete range of galactic environments. The kinds of systematic errors that can affect this analysis are discussed, and the reliability of these motions is assessed. The predictions of two models of large-scale motion are compared to the observations, and special emphasis is placed on those motions in which our own Galaxy directly partakes. (author)
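As a worked illustration of the distance–velocity conversion mentioned above, assuming the Hubble law at low redshift and writing the Hubble constant as H₀ = 100h km s⁻¹ Mpc⁻¹: the first relation defines a galaxy's peculiar velocity from its redshift and an independently measured distance d, and the second converts the quoted survey depth into a distance, neglecting peculiar motions.

```latex
v_{\mathrm{pec}} \simeq cz - H_0 d, \qquad
d \simeq \frac{cz}{H_0}
  = \frac{5000\ \mathrm{km\,s^{-1}}}{100\,h\ \mathrm{km\,s^{-1}\,Mpc^{-1}}}
  = 50\,h^{-1}\ \mathrm{Mpc}.
```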
GRAPES: a software for parallel searching on biological graphs targeting multi-core architectures.

Directory of Open Access Journals (Sweden)

Rosalba Giugno

Biological applications, from genomics to ecology, deal with graphs that represent the structure of interactions. Analyzing such data requires searching for subgraphs in collections of graphs. This task is computationally expensive. Even though multicore architectures, from commodity computers to more advanced symmetric multiprocessing (SMP) systems, offer scalable computing power, currently published software implementations for indexing and graph matching are fundamentally sequential. As a consequence, such implementations (i) do not fully exploit available parallel computing power and (ii) do not scale with respect to the size of graphs in the database. We present GRAPES, software for parallel searching on databases of large biological graphs. GRAPES implements a parallel version of well-established graph searching algorithms and introduces new strategies that naturally lead to a faster parallel searching system, especially for large graphs. GRAPES decomposes graphs into subcomponents that can be efficiently searched in parallel. We show the performance of GRAPES on representative biological datasets containing antiviral chemical compounds, DNA, RNA, proteins, protein contact maps and protein interaction networks.
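A minimal sketch of the decomposition idea, assuming a connected query pattern (so every match must lie inside a single connected component of the target graph), undirected labeled graphs stored as dicts, and a plain backtracking matcher in place of GRAPES's indexed algorithms. All names are hypothetical. Threads are used for simplicity; in CPython a process pool would be needed for real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple

Labels = Dict[int, str]
Adj = Dict[int, Set[int]]   # undirected: v in adj[u] iff u in adj[v]

def components(labels: Labels, adj: Adj) -> List[Set[int]]:
    """Connected components of the target graph (iterative DFS)."""
    seen: Set[int] = set()
    comps: List[Set[int]] = []
    for start in labels:
        if start in seen:
            continue
        comp: Set[int] = set()
        stack = [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def has_match(p_labels: Labels, p_adj: Adj, t_labels: Labels, t_adj: Adj,
              allowed: Set[int]) -> bool:
    """Backtracking search for a label-preserving embedding inside `allowed`."""
    order = list(p_labels)
    mapping: Dict[int, int] = {}

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        u = order[i]
        used = set(mapping.values())
        for v in allowed:
            if v in used or t_labels[v] != p_labels[u]:
                continue
            # Every already-mapped pattern neighbour of u must map onto a neighbour of v.
            if all(mapping[w] in t_adj[v] for w in p_adj[u] if w in mapping):
                mapping[u] = v
                if extend(i + 1):
                    return True
                del mapping[u]
        return False

    return extend(0)

def parallel_search(pattern: Tuple[Labels, Adj], target: Tuple[Labels, Adj],
                    workers: int = 4) -> bool:
    """Search each connected component of the target concurrently."""
    p_labels, p_adj = pattern
    t_labels, t_adj = target
    comps = components(t_labels, t_adj)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(lambda c: has_match(p_labels, p_adj, t_labels, t_adj, c), comps)
        return any(hits)
```

Because components share no edges, the per-component searches need no coordination beyond combining their boolean results.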
State of the Art in Large-Scale Soil Moisture Monitoring

Science.gov (United States)

Ochsner, Tyson E.; Cosh, Michael Harold; Cuenca, Richard H.; Dorigo, Wouter; Draper, Clara S.; Hagimoto, Yutaka; Kerr, Yan H.; Larson, Kristine M.; Njoku, Eni Gerald; Small, Eric E.;

2013-01-01

Soil moisture is an essential climate variable influencing land–atmosphere interactions, an essential hydrologic variable impacting rainfall–runoff processes, an essential ecological variable regulating net ecosystem exchange, and an essential agricultural variable constraining food security. Large-scale soil moisture monitoring has advanced in recent years, creating opportunities to transform scientific understanding of soil moisture and related processes. These advances are being driven by researchers from a broad range of disciplines, but this complicates collaboration and communication. For some applications, the science required to utilize large-scale soil moisture data is poorly developed. In this review, we describe the state of the art in large-scale soil moisture monitoring and identify some critical needs for research to optimize the use of increasingly available soil moisture data. We review representative examples of 1) emerging in situ and proximal sensing techniques, 2) dedicated soil moisture remote sensing missions, 3) soil moisture monitoring networks, and 4) applications of large-scale soil moisture measurements. Significant near-term progress seems possible in the use of large-scale soil moisture data for drought monitoring. Assimilation of soil moisture data for meteorological or hydrologic forecasting also shows promise, but significant challenges related to model structures and model errors remain. Little progress has been made yet in the use of large-scale soil moisture observations within the context of ecological or agricultural modeling. Opportunities abound to advance the science and practice of large-scale soil moisture monitoring for the sake of improved Earth system monitoring, modeling, and forecasting.

A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification

Science.gov (United States)

Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun

2016-01-01

Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value. PMID:27905520
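The Adaboost step that combines weak classifiers into a strong one can be sketched as follows. For brevity this minimal binary version swaps the paper's BP neural networks for one-dimensional decision stumps as the weak classifiers and omits the MapReduce layer; the 15 rounds mirror the 15 weak classifiers mentioned above, and labels are assumed to be ±1.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, int]  # (threshold, polarity)

def stump_predict(stump: Stump, x: float) -> int:
    thr, pol = stump
    return pol if x >= thr else -pol

def fit_stump(xs: List[float], ys: List[int], w: List[float]) -> Tuple[Stump, float]:
    """Weak learner: the threshold/polarity with the lowest weighted error."""
    best, best_err = (xs[0], 1), float("inf")
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict((thr, pol), xi) != yi)
            if err < best_err:
                best, best_err = (thr, pol), err
    return best, best_err

def adaboost(xs: List[float], ys: List[int], rounds: int = 15) -> List[Tuple[float, Stump]]:
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = fit_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)   # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)  # vote weight of this weak classifier
        ensemble.append((alpha, stump))
        # Re-weight the samples: misclassified ones gain weight for the next round.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble: List[Tuple[float, Stump]], x: float) -> int:
    """Strong classifier: sign of the alpha-weighted vote."""
    s = sum(alpha * stump_predict(st, x) for alpha, st in ensemble)
    return 1 if s >= 0 else -1
```

In a MapReduce setting, the expensive parts (feature extraction and the per-sample error sums inside each round) are the natural candidates for map tasks, while the small ensemble update stays in the reducer.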
Large-Scale Multi-Dimensional Document Clustering on GPU Clusters

Energy Technology Data Exchange (ETDEWEB)

Cui, Xiaohui [ORNL; Mueller, Frank [North Carolina State University; Zhang, Yongpeng [ORNL; Potok, Thomas E [ORNL

2010-01-01

Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, including k-means, in the sense that the outcome is not sensitive to the initial state. One limitation of this approach is that the algorithmic complexity is inherently quadratic in the number of documents. As a result, execution time becomes a bottleneck with a large number of documents. In this paper, we assess the benefits of exploiting the computational power of Beowulf-like clusters equipped with contemporary Graphics Processing Units (GPUs) as a means to significantly reduce the runtime of flocking-based document clustering. Our framework scales up to over one million documents processed simultaneously in a sixteen-node GPU cluster. Results are also compared to a four-node cluster with higher-end GPUs. On these clusters, we observe 30X-50X speedups, which demonstrates the potential of GPU clusters to efficiently solve massive data mining problems. Such speedups combined with the scalability potential and accelerator-based parallelization are unique in the domain of document-based data mining, to the best of our knowledge.
A route to explosive large-scale magnetic reconnection in a super-ion-scale current sheet

Directory of Open Access Journals (Sweden)

K. G. Tanaka

2009-01-01

How to trigger magnetic reconnection is one of the most interesting and important problems in space plasma physics. Recently, electron temperature anisotropy (αeo = Te⊥/Te∥) at the center of a current sheet and the non-local effect of the lower-hybrid drift instability (LHDI) that develops at the current sheet edges have attracted attention in this context. In addition to these effects, here we also study the effects of ion temperature anisotropy (αio = Ti⊥/Ti∥). Electron anisotropy effects are known to be ineffective in a current sheet of ion-scale thickness. In this range of current sheet thickness, the LHDI effects are shown to weaken substantially with a small increase in thickness, and the resulting saturation level is too low for large-scale reconnection to be achieved. We then investigate whether introducing electron and ion temperature anisotropies in the initial stage would couple with the LHDI effects to revive quick triggering of large-scale reconnection in a super-ion-scale current sheet. The results are as follows. (1) The initial electron temperature anisotropy is consumed very quickly when a number of minuscule magnetic islands (each with a lateral length of 1.5–3 times the ion inertial length) form. These minuscule islands do not coalesce into a large-scale island to enable large-scale reconnection. (2) The subsequent LHDI effects disturb the current sheet filled with the small islands. This substantially shortens the triggering time scale but does not raise the saturation level of reconnected flux. (3) When the ion temperature anisotropy is added, it survives through the small-island formation stage and produces even quicker triggering once the LHDI effects set in. Furthermore, the saturation level is elevated by a factor of ~2, and large-scale reconnection is achieved only in this case. Comparison with two-dimensional simulations that exclude the LHDI effects confirms that the saturation level …
Kinetic-Monte-Carlo-Based Parallel Evolution Simulation Algorithm of Dust Particles

Directory of Open Access Journals (Sweden)

Xiaomei Hu

2014-01-01

The evolution simulation of dust particles provides an important way to analyze the impact of dust on the environment. A parallel algorithm based on kinetic Monte Carlo (KMC) is proposed to simulate the evolution of dust particles. In this parallel evolution simulation algorithm, a data distribution scheme and a communication optimization strategy are introduced to balance the load across processes and reduce the communication overhead among them. The experimental results show that the diffusion, sedimentation, and resuspension of dust particles in a virtual campus are successfully simulated and that the parallel algorithm shortens the simulation time, overcoming the limitations of serial computing and making simulation of large-scale virtual environments possible.
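The core of a kinetic Monte Carlo simulation is the rejection-free step: pick an event with probability proportional to its rate, then advance the clock by an exponentially distributed waiting time. The serial sketch below applies this to a toy dust model with three event types echoing the abstract (diffusion, sedimentation, resuspension) and made-up per-particle rates; the paper's spatial model, data distribution and communication strategy are not shown, and all names and rates are hypothetical.

```python
import bisect
import math
import random
from typing import Dict, List, Tuple

# Hypothetical per-particle rates (1/s); the abstract does not give the real ones.
RATES: Dict[str, float] = {"diffuse": 5.0, "settle": 0.5, "resuspend": 0.1}

def kmc_step(airborne: int, deposited: int,
             rng: random.Random) -> Tuple[int, int, float, str]:
    """One rejection-free KMC step; returns (airborne, deposited, dt, event)."""
    events: List[Tuple[str, float]] = [
        ("diffuse", RATES["diffuse"] * airborne),
        ("settle", RATES["settle"] * airborne),
        ("resuspend", RATES["resuspend"] * deposited),
    ]
    cumulative, total = [], 0.0
    for _, rate in events:
        total += rate
        cumulative.append(total)
    if total == 0.0:
        return airborne, deposited, math.inf, "none"
    # Select event k with probability rate_k / total (zero-rate events are skipped).
    k = min(bisect.bisect_right(cumulative, rng.random() * total), len(events) - 1)
    name = events[k][0]
    # Exponential waiting time; 1 - random() lies in (0, 1], so log() is defined.
    dt = -math.log(1.0 - rng.random()) / total
    if name == "settle":
        airborne, deposited = airborne - 1, deposited + 1
    elif name == "resuspend":
        airborne, deposited = airborne + 1, deposited - 1
    # "diffuse" moves a particle but leaves the counts unchanged in this toy model.
    return airborne, deposited, dt, name

def simulate(airborne: int, deposited: int, t_end: float,
             seed: int = 0) -> Tuple[int, int, float]:
    rng = random.Random(seed)
    t = 0.0
    while True:
        a2, d2, dt, _ = kmc_step(airborne, deposited, rng)
        if t + dt > t_end:
            return airborne, deposited, t
        airborne, deposited, t = a2, d2, t + dt
```

In a parallel version, each process would typically run this loop for the particles in its own subdomain and exchange particles that cross subdomain boundaries, which is where the load balancing and communication strategy described above come in.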
Large-scale Labeled Datasets to Fuel Earth Science Deep Learning Applications

Science.gov (United States)

Maskey, M.; Ramachandran, R.; Miller, J.

2017-12-01

Deep learning has revolutionized computer vision and natural language processing with various algorithms scaled using high-performance computing. However, generic large-scale labeled datasets such as ImageNet are the fuel that drives the impressive accuracy of deep learning results. Large-scale labeled datasets already exist in domains such as medical science, but creating them in the Earth science domain is a challenge. While there are ways to apply deep learning using limited labeled datasets, there is a need in the Earth sciences for creating large-scale labeled datasets for benchmarking and scaling deep learning applications. At the NASA Marshall Space Flight Center, we are using deep learning for a variety of Earth science applications where we have encountered the need for large-scale labeled datasets. We will discuss our approaches for creating such datasets and why these datasets are just as valuable as deep learning algorithms. We will also describe successful usage of these large-scale labeled datasets with our deep learning based applications.

A Parallel Butterfly Algorithm

KAUST Repository

Poulson, Jack; Demanet, Laurent; Maxwell, Nicholas; Ying, Lexing

2014-01-01

The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(N^d) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r^2 N^d log N). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a three-dimensional generalized Radon transform were observed to strong-scale from 1 node/16 cores up to 1024 nodes/16,384 cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.
A visual analytics system for optimizing the performance of large-scale networks in supercomputing systems

Directory of Open Access Journals (Sweden)

Takanori Fujiwara

2018-03-01

The overall efficiency of an extreme-scale supercomputer largely relies on the performance of its network interconnects. Several state-of-the-art supercomputers use networks based on the increasingly popular Dragonfly topology. It is crucial to study the behavior and performance of different parallel applications running on Dragonfly networks in order to make optimal system configurations and design choices, such as job scheduling and routing strategies. However, studying this temporal network behavior requires a tool to analyze and correlate numerous sets of multivariate time-series data collected from the Dragonfly's multi-level hierarchies. This paper presents such a tool: a visual analytics system for investigating the temporal behavior and optimizing the communication performance of a supercomputer that uses a Dragonfly network. We coupled interactive visualization with time-series analysis methods to help reveal hidden patterns in the network behavior with respect to different parallel applications and system configurations. Our system also provides multiple coordinated views for connecting behaviors observed at different levels of the network hierarchies, which effectively helps visual analysis tasks. We demonstrate the effectiveness of the system with a set of case studies. Our system and findings can help improve not only the communication performance of supercomputing applications but also the network performance of next-generation supercomputers. Keywords: Supercomputing, Parallel communication network, Dragonfly networks, Time-series data, Performance analysis, Visual analytics
Large scale photovoltaic field trials. Second technical report: monitoring phase

Energy Technology Data Exchange (ETDEWEB)

NONE

2007-09-15

This report provides an update on the Large-Scale Building Integrated Photovoltaic Field Trials (LS-BIPV FT) programme commissioned by the Department of Trade and Industry (Department for Business, Enterprise and Regulatory Reform; BERR). It provides detailed profiles of the 12 projects making up this programme, which is part of the UK programme on photovoltaics and has run in parallel with the Domestic Field Trial. These field trials aim to record the experience and use the lessons learnt to raise awareness of, and confidence in, the technology and increase UK capabilities. The projects involved: the visitor centre at the Gaia Energy Centre in Cornwall; a community church hall in London; council offices in West Oxfordshire; a sports science centre at Gloucester University; the visitor centre at Cotswold Water Park; the headquarters of the Insolvency Service; a Welsh Development Agency building; an athletics centre in Birmingham; a research facility at the University of East Anglia; a primary school in Belfast; and Barnstaple civic centre in Devon. The report describes the aims of the field trials, monitoring issues, performance, observations and trends, lessons learnt and the results of occupancy surveys.
Large-scale structure observables in general relativity

International Nuclear Information System (INIS)

Jeong, Donghui; Schmidt, Fabian

2015-01-01

We review recent studies that rigorously define several key observables of the large-scale structure of the Universe in a general relativistic context. Specifically, we consider (i) redshift perturbation of cosmic clock events; (ii) distortion of cosmic rulers, including weak lensing shear and magnification; and (iii) observed number density of tracers of the large-scale structure. We provide covariant and gauge-invariant expressions of these observables. Our expressions are given for a linearly perturbed flat Friedmann–Robertson–Walker metric including scalar, vector, and tensor metric perturbations. While we restrict ourselves to linear order in perturbation theory, the approach can be straightforwardly generalized to higher order. (paper)
Fatigue Analysis of Large-scale Wind turbine

Directory of Open Access Journals (Sweden)

Zhu Yongli

2017-01-01

Full Text Available This paper studies fatigue damage in the top flange of a large-scale wind turbine generator. It builds a finite element model of the top flange connection system with the finite element analysis software MSC.Marc/Mentat and analyzes its fatigue strain. It simulates the flange fatigue load cases with the Bladed software and obtains the flange fatigue load spectrum by rain-flow counting. Finally, it performs the fatigue analysis of the top flange with the fatigue analysis software MSC.Fatigue and the Palmgren-Miner linear cumulative damage theory. The results offer a new approach to flange fatigue analysis of large-scale wind turbine generators and have some practical engineering value.
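The rain-flow counting and Palmgren-Miner steps named above can be illustrated with a short Python sketch. It is not the MSC.Fatigue or Bladed implementation. It follows the three-point rainflow procedure described in ASTM E1049. The Basquin-type S-N curve N(S) = c·S^(-m), with c and m supplied by the caller, is an assumption made for illustration. All function names are hypothetical.

```python
from typing import List, Tuple

def reversals(series: List[float]) -> List[float]:
    """Reduce a load history to its peaks and valleys (turning points)."""
    pts: List[float] = []
    for x in series:
        if pts and x == pts[-1]:
            continue                                   # ignore plateaus
        if len(pts) >= 2 and (pts[-1] - pts[-2]) * (x - pts[-1]) > 0:
            pts[-1] = x                                # same direction: extend the excursion
        else:
            pts.append(x)
    return pts

def rainflow(series: List[float]) -> List[Tuple[float, float]]:
    """Three-point rainflow counting: returns (range, cycles) pairs, 1.0 = full, 0.5 = half."""
    stack: List[float] = []
    cycles: List[Tuple[float, float]] = []
    for point in reversals(series):
        stack.append(point)
        while len(stack) >= 3:
            x = abs(stack[-1] - stack[-2])             # most recent range
            y = abs(stack[-2] - stack[-3])             # previous range
            if x < y:
                break
            if len(stack) == 3:                        # range Y contains the starting point
                cycles.append((y, 0.5))
                stack.pop(0)
            else:                                      # range Y closes a full cycle
                cycles.append((y, 1.0))
                last = stack.pop()
                del stack[-2:]                         # discard the peak and valley of Y
                stack.append(last)
    # Ranges left on the stack are counted as half cycles
    cycles.extend((abs(b - a), 0.5) for a, b in zip(stack, stack[1:]))
    return cycles

def miner_damage(cycles: List[Tuple[float, float]], c: float, m: float) -> float:
    """Palmgren-Miner sum D = sum(n_i / N_i) with an assumed S-N curve N(S) = c * S**(-m)."""
    return sum(n * s ** m / c for s, n in cycles if s > 0)
```

Take the short history [-2, 1, -3, 5, -1, 3, -4, 4, -2]. The sketch counts these ranges:

- 3: 0.5 cycle
- 4: 1.5 cycles
- 6: 0.5 cycle
- 8: 1.0 cycle
- 9: 0.5 cycle

Under Miner's rule, failure is predicted when the damage sum D reaches 1.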
Real-time simulation of large-scale floods

Science.gov (United States)

Liu, Q.; Qin, Y.; Li, G. D.; Liu, Z.; Cheng, D. J.; Zhao, Y. H.

2016-08-01

Given complex, rapidly changing water conditions, real-time simulation of large-scale floods is very important for flood-prevention practice. Model robustness and running efficiency are two critical factors in successful real-time flood simulation. This paper proposes a robust, two-dimensional shallow water model based on the unstructured Godunov-type finite volume method. A robust wet/dry front method is used to enhance numerical stability, and an adaptive method is proposed to improve running efficiency. The proposed model is applied to large-scale flood simulation on real topography. Comparisons with MIKE21 results show the strong performance of the proposed model.
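The Godunov-type finite-volume idea can be sketched in one dimension. This is a simplified illustration under stated assumptions, not the authors' model:

- It uses a 1D structured grid instead of an unstructured 2D mesh, and a flat bed.
- It uses the HLL approximate Riemann solver; the abstract does not say which flux the model uses.
- A simple depth threshold `H_DRY` stands in for the paper's wet/dry front method.
- There is no adaptive method.

All names are hypothetical.

```python
import numpy as np

G = 9.81      # gravitational acceleration [m/s^2]
H_DRY = 1e-6  # hypothetical wet/dry depth threshold [m]

def hll_flux(hl, ul, hr, ur):
    """HLL flux for the 1D shallow-water equations, with dry-bed wave-speed estimates."""
    cl, cr = np.sqrt(G * hl), np.sqrt(G * hr)
    sl = np.minimum(ul - cl, ur - cr)
    sr = np.maximum(ul + cl, ur + cr)
    # Next to a dry cell, the front moves at u + 2c of the wet side
    sl = np.where(hl <= H_DRY, ur - 2 * cr, sl)
    sr = np.where(hr <= H_DRY, ul + 2 * cl, sr)
    ql = np.stack([hl, hl * ul])
    qr = np.stack([hr, hr * ur])
    fl = np.stack([hl * ul, hl * ul**2 + 0.5 * G * hl**2])
    fr = np.stack([hr * ur, hr * ur**2 + 0.5 * G * hr**2])
    width = np.where(sr > sl, sr - sl, 1.0)  # avoid 0/0 between two dry cells
    fstar = (sr * fl - sl * fr + sl * sr * (qr - ql)) / width
    flux = np.where(sl >= 0, fl, np.where(sr <= 0, fr, fstar))
    return flux, np.max(np.maximum(np.abs(sl), np.abs(sr)))

def step(h, hu, dx, cfl=0.45):
    """One explicit first-order Godunov step with solid-wall boundaries; returns (h, hu, dt)."""
    wet = h > H_DRY
    u = np.where(wet, hu / np.where(wet, h, 1.0), 0.0)  # no velocity in dry cells
    hg = np.concatenate([h[:1], h, h[-1:]])     # ghost cells mirror the depth
    ug = np.concatenate([-u[:1], u, -u[-1:]])   # ...and reverse the velocity (wall)
    flux, smax = hll_flux(hg[:-1], ug[:-1], hg[1:], ug[1:])  # n + 1 interface fluxes
    dt = cfl * dx / max(smax, 1e-12)
    h_new = h - dt / dx * (flux[0, 1:] - flux[0, :-1])
    hu_new = hu - dt / dx * (flux[1, 1:] - flux[1, :-1])
    h_new = np.maximum(h_new, 0.0)              # guard against negative depths at fronts
    hu_new = np.where(h_new > H_DRY, hu_new, 0.0)
    return h_new, hu_new, dt
```

With reflective walls, the flux differences telescope, so the total water volume is conserved. The clipping step guards against small negative depths at wet/dry fronts.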
Large-scale numerical simulations of plasmas

International Nuclear Information System (INIS)

Hamaguchi, Satoshi

2004-01-01

Recent trends in large-scale simulations of fusion plasmas and processing plasmas are briefly summarized. Many advanced simulation techniques have been developed for fusion plasmas, and some of these techniques are now applied to analyses of processing plasmas. (author)
Nearly incompressible fluids: Hydrodynamics and large scale inhomogeneity

International Nuclear Information System (INIS)

Hunana, P.; Zank, G. P.; Shaikh, D.

2006-01-01

A system of hydrodynamic equations in the presence of large-scale inhomogeneities for a high plasma beta solar wind is derived. The theory is derived under the assumption of low turbulent Mach number. It is developed for flows where the usual incompressible description is not satisfactory and a fully compressible treatment is too complex for analytical study. When the effects of compressibility are incorporated only weakly, a new description, referred to as 'nearly incompressible hydrodynamics', is obtained. The nearly incompressible theory was originally applied to homogeneous flows. However, large-scale gradients in density, pressure, temperature, etc., are typical in the solar wind, and it was unclear how inhomogeneities would affect the usual incompressible and nearly incompressible descriptions. In the homogeneous case, the lowest-order expansion of the fully compressible equations leads to the usual incompressible equations, followed at higher orders by the nearly incompressible equations, as introduced by Zank and Matthaeus. With this work we show that the inclusion of large-scale inhomogeneities (in this case a time-independent and radially symmetric background solar wind) modifies the leading-order incompressible description of solar wind flow. We find, for example, that the velocity fluctuations are not solenoidal (their divergence is nonzero) and that density fluctuations can be described to leading order as a passive scalar. Locally (for small length scales), this system of equations converges to the usual incompressible equations, and we therefore use the term 'locally incompressible' to describe the equations. This term should be distinguished from the term 'nearly incompressible', which is reserved for higher-order corrections. Furthermore, we find that density fluctuations scale linearly with Mach number, in contrast to the original homogeneous nearly incompressible theory, in which density fluctuations scale with the square of Mach number. Inhomogeneous nearly
A parallel FE-FV scheme to solve fluid flow in complex geologic media

NARCIS (Netherlands)

Coumou, Dim; Matthäi, Stephan; Geiger, Sebastian; Driesner, Thomas

2008-01-01

Field data-based simulations of geologic systems require much computational time because of their mathematical complexity and the often desired large scales in space and time. To conduct accurate simulations in an acceptable time period, methods to reduce runtime are required. A parallelization
Parallel Algorithms for Switching Edges in Heterogeneous Graphs.

Science.gov (United States)

Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

2017-06-01

An edge switch is an operation on a graph (or network) in which two edges are selected at random and one endpoint of each is swapped with the other's. Edge switch operations have important applications in graph theory and network analysis. These include generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. Designing a parallel algorithm is significantly challenging for two reasons: successive edge switch operations depend on one another, and the graph must stay simple (i.e., no self-loops or parallel edges) as the edges are switched. Addressing these challenges requires complex synchronization and communication among the processors, which makes it difficult to achieve good speedup through parallelization. In this paper, we present distributed-memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
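A single edge switch, together with the simplicity check that makes parallelization hard, can be sketched sequentially in Python. This is not the paper's distributed-memory algorithm; it only shows the operation being parallelized. The sketch assumes a common convention: which pair of endpoints to exchange is chosen at random. The function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

Edge = Tuple[int, int]

def _norm(u: int, v: int) -> Edge:
    """Store an undirected edge with the smaller endpoint first."""
    return (u, v) if u < v else (v, u)

def degree_sequence(edges: List[Edge]) -> Counter:
    """Degree of every vertex that appears in the edge list."""
    return Counter(v for e in edges for v in e)

def edge_switch(edges: List[Edge], n_switches: int, rng: random.Random,
                max_attempts: int = 0) -> Tuple[List[Edge], int]:
    """Perform up to n_switches successful switches on a simple undirected graph.

    Each switch (a,b),(c,d) -> (a,d),(c,b) preserves every vertex degree; proposals
    that would create a self-loop or a parallel edge are rejected.
    Returns the new edge list and the number of switches actually performed."""
    edges = [_norm(u, v) for u, v in edges]
    present = set(edges)
    max_attempts = max_attempts or 100 * n_switches
    done = 0
    for _ in range(max_attempts):
        if done == n_switches:
            break
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c                      # randomly pick which endpoints are exchanged
        e1, e2 = _norm(a, d), _norm(c, b)
        if a == d or c == b or e1 in present or e2 in present:
            continue                         # would break simplicity: reject the proposal
        present.difference_update((edges[i], edges[j]))
        present.update((e1, e2))
        edges[i], edges[j] = e1, e2
        done += 1
    return edges, done
```

Rejected proposals show why successive switches are dependent: whether a switch is legal depends on the edge set produced by earlier switches.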
Performance Health Monitoring of Large-Scale Systems

Energy Technology Data Exchange (ETDEWEB)

Rajamony, Ram [IBM Research, Austin, TX (United States)

2014-11-20

This report details the progress made on the ASCR-funded project Performance Health Monitoring for Large Scale Systems. A large-scale application may not achieve its full performance potential because of degraded performance in even a single subsystem. Detecting performance faults, isolating them, and taking remedial action is critical for the scale of systems on the horizon. PHM aims to develop techniques and tools that can be used to identify and mitigate such performance problems. We accomplish this through two main components. The PHM framework encompasses diagnostics, system monitoring, fault isolation, and performance evaluation capabilities. These indicate when a performance fault has been detected, whether due to an anomaly in the system itself or to contention for shared resources between concurrently executing jobs. Software components called the PHM Control system then build on the capabilities provided by the PHM framework to mitigate degradation caused by performance problems.
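The abstract does not describe PHM's detection techniques. The sketch below is a hypothetical illustration of flagging one degraded subsystem among otherwise similar ones. It applies the modified z-score of Iglewicz and Hoaglin to per-component timings, using the median and the median absolute deviation with the conventional cutoff of 3.5. The function name and the input format are assumptions.

```python
from statistics import median
from typing import Dict, List

def flag_slow_components(times: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Flag components that are unusually slow by the modified z-score (median/MAD)."""
    values = list(times.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Assumed fallback: with no spread at all, anything slower than the median is suspect
        return [k for k, v in times.items() if v > med]
    # One-sided test: only components slower than typical are reported
    return [k for k, v in times.items() if 0.6745 * (v - med) / mad > threshold]
```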
Large-scale simulations of error-prone quantum computation devices

Energy Technology Data Exchange (ETDEWEB)

Trieu, Doan Binh

2009-07-01

The theoretical concepts of quantum computation in the idealized and undisturbed case are well understood. However, in practice, all quantum computation devices suffer from decoherence effects as well as from operational imprecisions. This work assesses the power of error-prone quantum computation devices using large-scale numerical simulations on parallel supercomputers. We present the Juelich Massively Parallel Ideal Quantum Computer Simulator (JUMPIQCS), which simulates a generic quantum computer at the gate level. It comprises an error model for decoherence and operational errors. The robustness of various algorithms in the presence of noise has been analyzed. The simulation results show that for large system sizes and long computations it is imperative to actively correct errors by means of quantum error correction. We implemented the 5-, 7-, and 9-qubit quantum error correction codes. Our simulations confirm that using error-prone correction circuits with non-fault-tolerant quantum error correction will always fail, because more errors are introduced than are corrected. Fault-tolerant methods can overcome this problem, provided that the single-qubit error rate is below a certain threshold. We incorporated fault-tolerant quantum error correction techniques into JUMPIQCS using Steane's 7-qubit code and determined this threshold numerically. Using the depolarizing channel as the source of decoherence, we find a threshold error rate of (5.2 ± 0.2) × 10⁻⁶. For Gaussian-distributed operational over-rotations, the threshold lies at a standard deviation of 0.0431 ± 0.0002. We conclude that quantum error correction is especially well suited for the correction of operational imprecisions and systematic over-rotations. For realistic simulations of specific quantum computation devices, we need to extend the generic model to dynamic simulations, i.e. time-dependent Hamiltonian simulations of realistic hardware models. We focus on today's most advanced
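Gate-level simulation with a stochastic error model can be sketched with a dense state vector in NumPy. This is a minimal single-trajectory illustration, not JUMPIQCS, and it includes no error correction. The depolarizing channel is unravelled as a random Pauli error (X, Y or Z, chosen uniformly) applied with total probability p after each gate. Function names are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def apply_1q(state, gate, target, n):
    """Apply a 2x2 gate to qubit `target` of an n-qubit state (qubit 0 = most significant)."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [target]))  # gate output becomes axis 0
    psi = np.moveaxis(psi, 0, target)                   # put it back in place
    return psi.reshape(-1)

def noisy_gate(state, gate, target, n, p, rng):
    """Ideal gate followed by a depolarizing error: with probability p, a random X, Y or Z."""
    state = apply_1q(state, gate, target, n)
    if rng.random() < p:
        pauli = (X, Y, Z)[rng.integers(3)]
        state = apply_1q(state, pauli, target, n)
    return state
```

Averaging observables over many such trajectories approximates the evolution under the depolarizing channel. The 2^n amplitudes are the data that a parallel simulator would distribute across nodes.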
Environmental Impacts of Large Scale Biochar Application Through Spatial Modeling

Science.gov (United States)

Huber, I.; Archontoulis, S.

2017-12-01

In an effort to study the environmental (emissions, soil quality) and production (yield) impacts of biochar application at regional scales, we coupled the APSIM-Biochar model with the pSIMS parallel platform. So far, the majority of biochar research has concentrated on lab-to-field studies to advance scientific knowledge, yet regional-scale assessments are badly needed to assist decision making. The overall objective of this simulation study was to identify two kinds of areas in the USA: those that gain the most environmentally from biochar application, and those where our model predicts a notable yield increase due to the addition of biochar. We present the modifications to both the APSIM-Biochar and pSIMS components that were necessary to run these large-scale models across several regions of the United States at a resolution of 5 arcminutes. This study bases its simulations on the AgMERRA global climate data set (1980-2010) and the Global Soil Dataset for Earth Systems modeling. It also uses local management operations for maize and soybean cropping systems and different biochar application rates. The regional-scale simulation analysis is in progress. Preliminary results showed that the model predicts that high-quality soils (particularly those common to Iowa cropping systems) receive little, if any, production benefit from biochar. However, soils with low soil organic matter (< 0.5%) do show a noteworthy yield increase of around 5-10% in the best cases. We also found N2O emissions to be spatially and temporally specific, increasing in some areas and decreasing in others due to biochar application. In contrast, we found increases in soil organic carbon and plant-available water in all soils (top 30 cm) due to biochar application. The magnitude of these increases (% change from the control) was larger in soils with low organic matter (below 1.5%) and smaller in soils with high organic matter (above 3%), and also dependent on biochar
Learning from large scale neural simulations

DEFF Research Database (Denmark)

Serban, Maria

2017-01-01

Large-scale neural simulations have the marks of a distinct methodology which can be fruitfully deployed to advance scientific understanding of the human brain. Computer simulation studies can be used to produce surrogate observational data for better conceptual models and new how...
Phenomenology of two-dimensional stably stratified turbulence under large-scale forcing

KAUST Repository

Kumar, Abhishek; Verma, Mahendra K.; Sukhatme, Jai

2017-01-01

In this paper, we characterise the scaling of energy spectra, and the interscale transfer of energy and enstrophy, for strongly, moderately and weakly stably stratified two-dimensional (2D) turbulence, restricted to a vertical plane, under large-scale random forcing. In the strongly stratified case, a large-scale vertically sheared horizontal flow (VSHF) coexists with small-scale turbulence. The VSHF consists of internal gravity waves, and the turbulent flow has a kinetic energy (KE) spectrum that follows an approximate k^-3 scaling with zero KE flux and a robust positive enstrophy flux. The spectrum of the turbulent potential energy (PE) also approximately follows a k^-3 power law, and its flux is directed to small scales. For moderate stratification, there is no VSHF, and the KE of the turbulent flow exhibits Bolgiano–Obukhov scaling that transitions from a shallow k^(-11/5) form at large scales to a steeper approximate k^-3 scaling at small scales. The entire range of scales shows a strong forward enstrophy flux, and interestingly, large (small) scales show an inverse (forward) KE flux. The PE flux in this regime is directed to small scales, and the PE spectrum is characterised by an approximate k^-1.64 scaling. Finally, for weak stratification, KE is transferred upscale and its spectrum closely follows a k^-2.5 scaling, while PE exhibits a forward transfer and its spectrum shows an approximate k^-1.6 power law. For all stratification strengths, the total energy always flows from large to small scales, and almost all the spectral indices are well explained by accounting for the scale-dependent nature of the corresponding flux.
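Spectral indices such as the k^-3 scaling above are obtained by fitting a straight line to a spectrum in log-log coordinates. The sketch below shows this for a periodic 1D signal only. The paper's diagnostics are 2D kinetic- and potential-energy spectra and fluxes, which this does not reproduce. The spectrum is computed up to a constant normalization factor, which does not change the fitted slope. The function names are hypothetical.

```python
import numpy as np

def power_spectrum_1d(u):
    """|u_k|^2 for k = 1 .. N/2-1 of a real periodic signal (mean and Nyquist modes dropped)."""
    uk = np.fft.rfft(u) / u.size
    k = np.arange(uk.size)
    return k[1:-1], np.abs(uk[1:-1]) ** 2

def spectral_slope(k, e, kmin, kmax):
    """Least-squares slope of log E(k) versus log k over kmin <= k <= kmax."""
    sel = (k >= kmin) & (k <= kmax)
    slope, _ = np.polyfit(np.log(k[sel]), np.log(e[sel]), 1)
    return slope
```

Consider a synthetic signal built with Fourier amplitudes proportional to k^-1.5 and random phases. Its spectrum is proportional to k^-3, and the fitted slope is -3.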
Phenomenology of two-dimensional stably stratified turbulence under large-scale forcing

KAUST Repository

Kumar, Abhishek

2017-01-11

In this paper, we characterise the scaling of energy spectra, and the interscale transfer of energy and enstrophy, for strongly, moderately and weakly stably stratified two-dimensional (2D) turbulence, restricted to a vertical plane, under large-scale random forcing. In the strongly stratified case, a large-scale vertically sheared horizontal flow (VSHF) coexists with small-scale turbulence. The VSHF consists of internal gravity waves, and the turbulent flow has a kinetic energy (KE) spectrum that follows an approximate k^-3 scaling with zero KE flux and a robust positive enstrophy flux. The spectrum of the turbulent potential energy (PE) also approximately follows a k^-3 power law, and its flux is directed to small scales. For moderate stratification, there is no VSHF, and the KE of the turbulent flow exhibits Bolgiano–Obukhov scaling that transitions from a shallow k^(-11/5) form at large scales to a steeper approximate k^-3 scaling at small scales. The entire range of scales shows a strong forward enstrophy flux, and interestingly, large (small) scales show an inverse (forward) KE flux. The PE flux in this regime is directed to small scales, and the PE spectrum is characterised by an approximate k^-1.64 scaling. Finally, for weak stratification, KE is transferred upscale and its spectrum closely follows a k^-2.5 scaling, while PE exhibits a forward transfer and its spectrum shows an approximate k^-1.6 power law. For all stratification strengths, the total energy always flows from large to small scales, and almost all the spectral indices are well explained by accounting for the scale-dependent nature of the corresponding flux.
Exploring the large-scale structure of Taylor–Couette turbulence through Large-Eddy Simulations

Science.gov (United States)

Ostilla-Mónico, Rodolfo; Zhu, Xiaojue; Verzicco, Roberto

2018-04-01

Large eddy simulations (LES) of Taylor-Couette (TC) flow, the flow between two coaxial, independently rotating cylinders, are performed in an attempt to explore the large-scale axially-pinned structures seen in experiments and simulations. Both static and dynamic LES models are used. The Reynolds number is kept fixed at Re = 3.4 · 10^4, and the radius ratio η = r_i/r_o is set to η = 0.909, limiting the effects of curvature and resulting in frictional Reynolds numbers of around Re_τ ≈ 500. Four rotation ratios from Rot = −0.0909 to Rot = 0.3 are simulated. First, the LES of TC is benchmarked for different rotation ratios. Both the Smagorinsky model with a constant of c_s = 0.1 and the dynamic model are found to produce reasonable results for no mean rotation and for cyclonic rotation, but deviations increase with increasing rotation. This is attributed to the increasingly anisotropic character of the fluctuations. Second, "over-damped" LES, i.e. LES with a large Smagorinsky constant, is performed and is shown to reproduce some features of the large-scale structures, even when the near-wall region is not adequately modeled. This shows the potential for using over-damped LES for fast explorations of the parameter space where large-scale structures are found.
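The Smagorinsky closure mentioned above computes an eddy viscosity ν_t = (c_s Δ)^2 |S|, where |S| = (2 S_ij S_ij)^(1/2) is the magnitude of the resolved strain-rate tensor. A minimal NumPy sketch at a single grid point follows. The filter width Δ is supplied by the caller as an assumption. "Over-damping" corresponds simply to a larger c_s.

```python
import numpy as np

def smagorinsky_nu_t(grad_u, delta, cs=0.1):
    """Smagorinsky eddy viscosity nu_t = (cs * delta)^2 * |S|, with |S| = sqrt(2 S_ij S_ij).

    grad_u[i, j] = du_i/dx_j is the 3x3 resolved velocity-gradient tensor at one point."""
    s = 0.5 * (grad_u + grad_u.T)          # resolved strain-rate tensor S_ij
    s_mag = np.sqrt(2.0 * np.sum(s * s))   # |S|
    return (cs * delta) ** 2 * s_mag
```

For a simple shear du/dy = γ, |S| = γ, so ν_t = (c_s Δ)^2 γ. Doubling c_s from 0.1 to 0.2 quadruples ν_t.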
Large-scale preparation of hollow graphitic carbon nanospheres

International Nuclear Information System (INIS)

Feng, Jun; Li, Fu; Bai, Yu-Jun; Han, Fu-Dong; Qi, Yong-Xin; Lun, Ning; Lu, Xi-Feng

2013-01-01

Hollow graphitic carbon nanospheres (HGCNSs) were synthesized on a large scale by a simple reaction between glucose and Mg at 550 °C in an autoclave. Characterization by X-ray diffraction, Raman spectroscopy and transmission electron microscopy demonstrates the formation of HGCNSs with an average diameter of about 10 nm and walls a few graphene layers thick. The HGCNSs exhibit a reversible capacity of 391 mAh g⁻¹ after 60 cycles when used as anode materials for Li-ion batteries. -- Graphical abstract: Hollow graphitic carbon nanospheres could be prepared on a large scale by the simple reaction between glucose and Mg at 550 °C, and they exhibit superior electrochemical performance to graphite. Highlights: ► Hollow graphitic carbon nanospheres (HGCNSs) were prepared on a large scale at 550 °C ► The preparation is simple, effective and eco-friendly. ► The in situ yielded MgO nanocrystals promote the graphitization. ► The HGCNSs exhibit superior electrochemical performance to graphite.
Accelerating large-scale phase-field simulations with GPU

Directory of Open Access Journals (Sweden)

Xiaoming Shi

2017-10-01

Full Text Available A new package for accelerating large-scale phase-field simulations was developed on the GPU, based on the semi-implicit Fourier method. The package can solve a variety of equilibrium equations with different inhomogeneities, including long-range elastic, magnetostatic, and electrostatic interactions. Using a dedicated algorithm in the Compute Unified Device Architecture (CUDA), the Fourier spectral iterative perturbation method was integrated into the GPU package. The Allen-Cahn equation, the Cahn-Hilliard equation, and a phase-field model with long-range interactions were each solved with the GPU algorithm to test the package's performance. A comparison of the results from the solver executed on a single CPU and on the GPU showed that the GPU version runs 50 times faster. The present study therefore contributes to the acceleration of large-scale phase-field simulations and provides guidance for experiments to design large-scale functional devices.
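The semi-implicit Fourier method mentioned above treats the stiff gradient term implicitly and the nonlinear term explicitly in Fourier space. The sketch below applies it to the Cahn-Hilliard equation ∂c/∂t = ∇²(c³ − c − κ∇²c) on a periodic 2D grid, using NumPy on the CPU. Each step computes ĉ ← (ĉ − Δt k² FFT(c³ − c)) / (1 + Δt κ k⁴). The double-well free energy f(c) = (c² − 1)²/4, the unit mobility and the parameter values are assumptions. Neither the GPU/CUDA implementation nor the iterative perturbation treatment of long-range interactions is included.

```python
import numpy as np

def cahn_hilliard_step(c, dt, kappa, dx):
    """One semi-implicit Fourier step of dc/dt = lap(c^3 - c - kappa*lap(c)) on a periodic grid."""
    n = c.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)           # angular wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    c_hat = np.fft.fft2(c)
    nonlinear_hat = np.fft.fft2(c**3 - c)             # bulk term, treated explicitly
    c_hat = (c_hat - dt * k2 * nonlinear_hat) / (1 + dt * kappa * k2**2)  # k^4 term implicit
    return np.real(np.fft.ifft2(c_hat))
```

The k = 0 mode is never changed, so the mean concentration is conserved exactly. Starting from small noise, the field phase-separates into regions near c = ±1.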

First Mile Challenges for Large-Scale IoT

KAUST Repository

Bader, Ahmed

2017-03-16

The Internet of Things is large-scale by nature. This is manifested not only by the large number of connected devices but also by the sheer spatial traffic intensity that must be accommodated, primarily in the uplink direction. To that end, cellular networks are a strong first-mile candidate to accommodate the data tsunami to be generated by the IoT. However, in the cellular paradigm, IoT devices must undergo random access procedures as a precursor to resource allocation. Such procedures impose a major bottleneck that hinders cellular networks' ability to support large-scale IoT. In this article, we shed light on the random access dilemma and present a case study based on experimental data as well as system-level simulations. Accordingly, a case is built for the latent need to revisit random access procedures. A call for action is motivated by listing a few potential remedies and recommendations.
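A back-of-the-envelope model shows why random access becomes a bottleneck. It rests on illustrative assumptions:

- Each of n devices picks one of M preambles uniformly at random in a single access opportunity.
- Any collision causes failure.
- Capture, retransmissions and back-off are ignored.

Under these assumptions, a given device succeeds with probability (1 − 1/M)^(n−1). LTE defines 64 preambles per cell, so M = 64 is used in the example below. The function names are hypothetical.

```python
def p_no_collision(n_devices: int, n_preambles: int) -> float:
    """Probability that no other device picks the same preamble as a given device."""
    return (1.0 - 1.0 / n_preambles) ** (n_devices - 1)

def expected_successes(n_devices: int, n_preambles: int) -> float:
    """Expected number of collision-free devices in one random-access opportunity."""
    return n_devices * p_no_collision(n_devices, n_preambles)
```

Expected successes peak near n = M, at about 23.7 of 64 devices. They fall to roughly 8.7 when 200 devices contend, so adding devices beyond this point reduces throughput.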
Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets

KAUST Repository

Litvinenko, Alexander; Sun, Ying; Genton, Marc G.; Keyes, David E.

2017-01-01

The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community. We describe the HLIBCov package, which is an extension of the HLIBpro library for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M \times 2M$ can be computed on a modern multi-core desktop in a few minutes. Further, HLIBCov is used for estimating unknown parameters, such as the covariance length, variance and smoothness parameter of a Matérn covariance function, by maximizing the joint Gaussian log-likelihood function. The computational bottleneck here is expensive linear algebra arithmetic due to large and dense covariance matrices. Therefore, covariance matrices are approximated in the hierarchical ($\mathcal{H}$-) matrix format with computational cost $\mathcal{O}(k^2 n \log^2 n / p)$ and storage $\mathcal{O}(k n \log n)$, where the rank $k$ is a small integer (typically $k < 25$), $p$ is the number of cores and $n$ is the number of locations on a fairly general mesh. We demonstrate a synthetic example in which the true values of the parameters are known. For reproducibility, we provide the C++ code, the documentation, and the synthetic data.
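The hierarchical-matrix format relies on the observation that covariance blocks between well-separated groups of locations are numerically low rank. The NumPy sketch below illustrates this for one such block, using a Matérn covariance with smoothness 3/2 and a truncated SVD. It does not use HLIBpro or HLIBCov. The kernel and the point sets are assumptions. Practical H-matrix libraries use cheaper low-rank constructions than a full SVD.

```python
import numpy as np

def matern32(r, length=1.0, sigma2=1.0):
    """Matérn covariance with smoothness nu = 3/2."""
    a = np.sqrt(3.0) * r / length
    return sigma2 * (1.0 + a) * np.exp(-a)

def low_rank_block(x, y, k, length=1.0):
    """Rank-k truncated-SVD approximation of the covariance block between point sets x and y."""
    r = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)  # pairwise distances
    block = matern32(r, length)
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    a = u[:, :k] * s[:k]   # m x k factor
    b = vt[:k, :]          # k x n factor
    return block, a, b, s
```

Consider two clusters of 300 points each, one in the unit square and one shifted three units away. Storing the two rank-10 factors takes 6,000 numbers instead of the 90,000 needed for the dense 300 × 300 block.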
Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets

KAUST Repository

Litvinenko, Alexander

2017-11-01

The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community. We describe the HLIBCov package, which is an extension of the HLIBpro library for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M \times 2M$ can be computed on a modern multi-core desktop in a few minutes. Further, HLIBCov is used for estimating unknown parameters, such as the covariance length, variance and smoothness parameter of a Matérn covariance function, by maximizing the joint Gaussian log-likelihood function. The computational bottleneck here is expensive linear algebra arithmetic due to large and dense covariance matrices. Therefore, covariance matrices are approximated in the hierarchical ($\mathcal{H}$-) matrix format with computational cost $\mathcal{O}(k^2 n \log^2 n / p)$ and storage $\mathcal{O}(k n \log n)$, where the rank $k$ is a small integer (typically $k < 25$), $p$ is the number of cores and $n$ is the number of locations on a fairly general mesh. We demonstrate a synthetic example in which the true values of the parameters are known. For reproducibility, we provide the C++ code, the documentation, and the synthetic data.
10th International Workshop on Parallel Tools for High Performance Computing

CERN Document Server

Gracia, José; Hilbrich, Tobias; Knüpfer, Andreas; Resch, Michael; Nagel, Wolfgang

2017-01-01

This book presents the proceedings of the 10th International Parallel Tools Workshop, held October 4-5, 2016 in Stuttgart, Germany, a forum to discuss the latest advances in parallel tools. High-performance computing plays an increasingly important role in numerical simulation and modelling in academic and industrial research. At the same time, using large-scale parallel systems efficiently is becoming more difficult. A number of tools addressing parallel program development and analysis have emerged from the high-performance computing community over the last decade. What may have started as a collection of small helper scripts has now matured into production-grade frameworks. Powerful user interfaces and an extensive body of documentation allow easy usage by non-specialists.
Observing the Cosmic Microwave Background Polarization with Variable-delay Polarization Modulators for the Cosmology Large Angular Scale Surveyor

Science.gov (United States)

Harrington, Kathleen; CLASS Collaboration

2018-01-01

The search for inflationary primordial gravitational waves and the optical depth to reionization, both through their imprint on the large angular scale correlations in the polarization of the cosmic microwave background (CMB), has created the need for high-sensitivity measurements of polarization across large fractions of the sky at millimeter wavelengths. These measurements are subject to instrumental and atmospheric 1/f noise, which has motivated the development of polarization modulators to facilitate the rejection of these large systematic effects. Variable-delay polarization modulators (VPMs) are used in the Cosmology Large Angular Scale Surveyor (CLASS) telescopes as the first element in the optical chain to rapidly modulate the incoming polarization. VPMs consist of a linearly polarizing wire grid in front of a movable flat mirror. Varying the distance between the grid and the mirror produces a changing phase shift between the polarization states parallel and perpendicular to the grid, which modulates Stokes U (linear polarization at 45°) and Stokes V (circular polarization). The reflective and scalable nature of the VPM enables its placement as the first optical element in a reflecting telescope. This simultaneously allows a lock-in style polarization measurement and the separation of sky polarization from any instrumental polarization farther along in the optical chain. The Q-Band CLASS VPM was the first VPM to begin observing the CMB full time, in 2016. I will present its design and characterization and demonstrate how modulating polarization significantly rejects atmospheric and instrumental long-time-scale noise.
HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation

Science.gov (United States)

Sterling, Thomas; Bergman, Larry

2000-01-01

Computational Aero Sciences and other numerically intensive disciplines demand computing throughputs substantially greater than those of the Teraflops-scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that, in combination with sufficient resolution and advanced adaptive techniques, may push performance requirements towards Petaflops. This will be especially true for compute-intensive models such as Navier-Stokes, or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations built from commodity processing and memory components has shown that larger scale frequently results in greater programming difficulty and lower system efficiency. Important advances in system software and algorithmic techniques have had some impact on efficiency and programmability for certain classes of problems. In general, however, software alone is unlikely to resolve the challenges of higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA-led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops-scale computing in the 2004/5 timeframe. The Hybrid-Technology MultiThreaded parallel computer architecture combines several advanced technologies with an innovative dynamic adaptive scheduling mechanism. This provides unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) and one percent of the power required by convention
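Latency tolerance through multithreading can be quantified with Little's law: the number of requests that must be in flight equals latency times throughput. The sketch below applies it to a processor clocked at 100 GHz, the RSFQ figure quoted above. The 100 ns memory latency and the one request per cycle are hypothetical values, not HTMT parameters.

```python
import math

def threads_to_hide_latency(latency_ns: float, clock_ghz: float,
                            requests_per_cycle: float = 1.0,
                            outstanding_per_thread: int = 1) -> int:
    """Little's law (concurrency = latency x throughput) applied to a multithreaded processor.

    Returns how many threads, each with `outstanding_per_thread` requests in flight,
    keep a processor issuing `requests_per_cycle` busy despite the memory latency."""
    latency_cycles = latency_ns * clock_ghz          # ns x (cycles per ns)
    in_flight = latency_cycles * requests_per_cycle  # requests that must be outstanding
    return math.ceil(in_flight / outstanding_per_thread)
```

At 100 GHz, a 100 ns latency spans 10,000 cycles. About 10,000 outstanding requests are therefore needed to keep the processor busy, or 1,250 threads with 8 requests each. At 1 GHz, 100 would suffice.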
Performance Analysis and Scaling Behavior of the Terrestrial Systems Modeling Platform TerrSysMP in Large-Scale Supercomputing Environments

Science.gov (United States)

Kollet, S. J.; Goergen, K.; Gasper, F.; Shresta, P.; Sulis, M.; Rihani, J.; Simmer, C.; Vereecken, H.

2013-12-01

In studies of the terrestrial hydrologic, energy and biogeochemical cycles, integrated multi-physics simulation platforms take a central role in characterizing non-linear interactions, variances and uncertainties of system states and fluxes in reciprocity with observations. Recently developed integrated simulation platforms attempt to honor the complexity of the terrestrial system across multiple time and space scales, from the deeper subsurface including groundwater dynamics into the atmosphere. Technically, this requires coupling atmospheric, land surface, and subsurface-surface flow models in supercomputing environments, while ensuring a high degree of efficiency in the use of, e.g., standard Linux clusters and massively parallel resources. A systematic performance analysis, including profiling and tracing, is crucial for such an application. It helps in understanding the runtime behavior and identifying optimum model settings, and it is an efficient way to detect potential parallel deficiencies. On sophisticated leadership-class supercomputers, such as the 28-rack 5.9 petaFLOP IBM Blue Gene/Q 'JUQUEEN' of the Jülich Supercomputing Centre (JSC), this is a challenging task, and all the more important when complex coupled component models are to be analysed. Here we present our experience from coupling, application tuning (e.g., a fivefold speedup through compiler optimizations), parallel scaling and performance monitoring of the parallel Terrestrial Systems Modeling Platform TerrSysMP. The modeling platform consists of three components: the weather prediction system COSMO of the German Weather Service, the Community Land Model (CLM) of NCAR, and the variably saturated surface-subsurface flow code ParFlow. The model system relies on the Multiple Program Multiple Data (MPMD) execution model, in which the external Ocean-Atmosphere-Sea-Ice-Soil coupler (OASIS3) links the component models. TerrSysMP has been instrumented with the performance analysis tool Scalasca and analyzed
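Parallel scaling analyses like the one described usually report speedup and parallel efficiency relative to a baseline run. A minimal sketch follows. The timings are hypothetical. The baseline is taken to be the smallest core count measured, an assumption made because large coupled models often cannot run on a single core.

```python
from typing import Dict, Tuple

def strong_scaling(times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map core count -> (speedup, efficiency), relative to the smallest core count measured."""
    p0 = min(times)
    t0 = times[p0]
    result = {}
    for p, t in sorted(times.items()):
        speedup = t0 / t               # relative to the baseline run, not to one core
        efficiency = speedup * p0 / p  # 1.0 means perfect strong scaling
        result[p] = (speedup, efficiency)
    return result
```

For hypothetical wall-clock times of 100 s, 52 s and 30 s on 512, 1024 and 2048 cores, the efficiencies are 1.0, about 0.96 and about 0.83.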
Thermal power generation projects "Large Scale Solar Heating" (EU THERMIE projects)

Energy Technology Data Exchange (ETDEWEB)

Kuebler, R.; Fisch, M.N. [Steinbeis-Transferzentrum Energie-, Gebaeude- und Solartechnik, Stuttgart (Germany)

1998-12-31

The aim of this project is the preparation of a "Large-Scale Solar Heating" programme for the further Europe-wide development of this technology. The resulting demonstration programme was rated positively by the reviewers but was not accepted for funding in the first round (1996). In November 1997 the EU Commission granted, at short notice, 1.5 million ECU of funding, which allows an updated project proposal to be realised. Already by mid-1997, a smaller project had been approved; it was applied for under the lead of Chalmers Industriteknik (CIT) in Sweden and serves mainly the transfer of technology. (orig.)
Introducing PROFESS 2.0: A parallelized, fully linear scaling program for orbital-free density functional theory calculations

Science.gov (United States)

Hung, Linda; Huang, Chen; Shin, Ilgyou; Ho, Gregory S.; Lignères, Vincent L.; Carter, Emily A.

2010-12-01

Orbital-free density functional theory (OFDFT) is a first principles quantum mechanics method to find the ground-state energy of a system by variationally minimizing with respect to the electron density. No orbitals are used in the evaluation of the kinetic energy (unlike Kohn-Sham DFT), and the method scales nearly linearly with the size of the system. The PRinceton Orbital-Free Electronic Structure Software (PROFESS) uses OFDFT to model materials from the atomic scale to the mesoscale. This new version of PROFESS allows the study of larger systems with two significant changes: PROFESS is now parallelized, and the ion-electron and ion-ion terms scale quasilinearly, instead of quadratically as in PROFESS v1 (L. Hung and E.A. Carter, Chem. Phys. Lett. 475 (2009) 163). At the start of a run, PROFESS reads the various input files that describe the geometry of the system (ion positions and cell dimensions), the type of elements (defined by electron-ion pseudopotentials), the actions you want it to perform (minimize with respect to electron density and/or ion positions and/or cell lattice vectors), and the various options for the computation (such as which functionals you want it to use). Based on these inputs, PROFESS sets up a computation and performs the appropriate optimizations. Energies, forces, stresses, material geometries, and electron density configurations are some of the values that can be output throughout the optimization.
New version program summary
Program Title: PROFESS
Catalogue identifier: AEBN_v2_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEBN_v2_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 68 721
No. of bytes in distributed program, including test data, etc.: 1 708 547
Distribution format: tar.gz
Programming language: Fortran 90
Computer
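In OFDFT the kinetic energy is written directly as a functional of the density. The simplest such functional is Thomas-Fermi, T_TF[ρ] = C_F ∫ ρ(r)^(5/3) dr with C_F = (3/10)(3π²)^(2/3) in Hartree atomic units. The Python sketch below evaluates it on a uniform real-space grid; it is an illustration under that assumption, not PROFESS code, and practical OFDFT calculations typically add gradient or nonlocal corrections to this term.

```python
import math

# Thomas-Fermi constant in Hartree atomic units: C_F = (3/10) * (3*pi^2)^(2/3)
C_F = 0.3 * (3.0 * math.pi ** 2) ** (2.0 / 3.0)


def thomas_fermi_energy(density, cell_volume):
    """Thomas-Fermi kinetic energy T = C_F * integral of rho^(5/3) on a uniform grid.

    density: flat list of electron densities (electrons/bohr^3), one per grid point
    cell_volume: volume of the periodic simulation cell in bohr^3
    """
    dv = cell_volume / len(density)  # volume element associated with each grid point
    return C_F * sum(rho ** (5.0 / 3.0) for rho in density) * dv


# Sanity check against the uniform electron gas, where T = C_F * rho0^(5/3) * V.
rho0, volume = 0.01, 1000.0
uniform_grid = [rho0] * 16 ** 3
print(thomas_fermi_energy(uniform_grid, volume), C_F * rho0 ** (5.0 / 3.0) * volume)
```

Because the functional is local, each grid point contributes independently, which is one reason orbital-free methods parallelize and scale well.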
Large-scale retrieval for medical image analytics: A comprehensive review.

Science.gov (United States)

Li, Zhongyu; Zhang, Xiaofan; Müller, Henning; Zhang, Shaoting

2018-01-01

Over the past decades, medical image analytics has been greatly facilitated by the explosion of digital imaging techniques, as huge amounts of medical images have been produced with ever-increasing quality and diversity. However, conventional methods for analyzing medical images have achieved limited success, as they cannot cope with such huge amounts of image data. In this paper, we review state-of-the-art approaches for large-scale medical image analysis, which are mainly based on recent advances in computer vision, machine learning and information retrieval. Specifically, we first present the general pipeline of large-scale retrieval and summarize the challenges and opportunities of medical image analytics on a large scale. Then, we provide a comprehensive review of algorithms and techniques relevant to major processes in the pipeline, including feature representation, feature indexing, searching, etc. On the basis of existing work, we introduce the evaluation protocols and multiple applications of large-scale medical image retrieval, with a variety of exploratory and diagnostic scenarios. Finally, we discuss future directions of large-scale retrieval, which can further improve the performance of medical image analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
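The final "searching" step of such a retrieval pipeline is a nearest-neighbour query over feature vectors. The Python sketch below is a brute-force version assuming images have already been mapped to fixed-length feature vectors; the image ids and feature values are hypothetical, and large-scale systems replace the linear scan with an index (for example hashing or quantization).

```python
import heapq
import math


def normalize(v):
    """Scale a feature vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def top_k(query, database, k=3):
    """Return the k (image_id, cosine_similarity) pairs most similar to query.

    database: dict mapping a hypothetical image id -> feature vector.
    This is an exhaustive scan; a large-scale system would query an index instead.
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), image_id)
        for image_id, vec in database.items()
    )
    best = heapq.nlargest(k, scored)
    return [(image_id, score) for score, image_id in best]


features = {
    "ct_001": [0.9, 0.1, 0.0],
    "ct_002": [0.8, 0.2, 0.1],
    "mri_001": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], features, k=2))
```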
Photorealistic large-scale urban city model reconstruction.

Science.gov (United States)

Poullis, Charalambos; You, Suya

2009-01-01

The rapid and efficient creation of virtual environments has become a crucial part of virtual reality applications. In particular, civil and defense applications often require and employ detailed models of operations areas for training, simulations of different scenarios, planning for natural or man-made events, monitoring, surveillance, games, and films. A realistic representation of large-scale environments is therefore imperative for the success of such applications, since it increases the immersive experience of users and helps reduce the difference between physical and virtual reality. However, creating such large-scale virtual environments remains time-consuming and largely manual work. In this work, we propose a novel method for the rapid reconstruction of photorealistic large-scale virtual environments. First, a novel, extendible, parameterized geometric primitive is presented for automatic building identification and reconstruction of building structures. In addition, buildings with complex roofs containing linear and nonlinear surfaces are reconstructed interactively using a linear polygonal primitive and a nonlinear primitive, respectively. Second, we present a rendering pipeline for the composition of photorealistic textures which, unlike existing techniques, can recover missing or occluded texture information by integrating information captured by different optical sensors (ground, aerial, and satellite).
Prototype Vector Machine for Large Scale Semi-Supervised Learning

Energy Technology Data Exchange (ETDEWEB)

Zhang, Kai; Kwok, James T.; Parvin, Bahram

2009-04-29

Practical data mining rarely falls exactly into the supervised learning scenario. Rather, the growing amount of unlabeled data poses a big challenge to large-scale semi-supervised learning (SSL). We note that the computational intensiveness of graph-based SSL arises largely from the manifold or graph regularization, which in turn leads to large models that are difficult to handle. To alleviate this, we propose the prototype vector machine (PVM), a highly scalable, graph-based algorithm for large-scale SSL. Our key innovation is the use of "prototype vectors" for efficient approximation of both the graph-based regularizer and the model representation. The choice of prototypes is grounded upon two important criteria: they not only perform effective low-rank approximation of the kernel matrix, but also span a model suffering the minimum information loss compared with the complete model. We demonstrate encouraging performance and appealing scaling properties of the PVM on a number of machine learning benchmark data sets.
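A standard way to build a low-rank kernel approximation from a small set of prototypes (landmark points) is the Nyström method, K ≈ K_nm K_mm⁺ K_mn. The abstract does not spell out PVM's exact construction or how it selects prototypes, so the NumPy sketch below is an assumption illustrating the general idea with a Gaussian kernel and randomly chosen prototypes.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


def nystrom(x, prototypes, gamma=1.0):
    """Nystrom approximation K ~= K_nm @ pinv(K_mm) @ K_nm.T built from prototypes."""
    k_nm = rbf_kernel(x, prototypes, gamma)           # n x m cross-kernel
    k_mm = rbf_kernel(prototypes, prototypes, gamma)  # m x m prototype kernel
    return k_nm @ np.linalg.pinv(k_mm) @ k_nm.T


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
# Hypothetical prototype choice: 20 points sampled at random from the data.
prototypes = x[rng.choice(len(x), size=20, replace=False)]
k_full = rbf_kernel(x, x, gamma=0.5)
k_approx = nystrom(x, prototypes, gamma=0.5)
rel_err = np.linalg.norm(k_full - k_approx) / np.linalg.norm(k_full)
print(f"relative Frobenius error with 20 prototypes: {rel_err:.3f}")
```

Only the n x m and m x m blocks are ever formed inside `nystrom`, which is where the savings come from when m is much smaller than n; the full kernel is computed here only to measure the error.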
Parallel steady state studies on a milliliter scale accelerate fed-batch bioprocess design for recombinant protein production with Escherichia coli.

Science.gov (United States)

Schmideder, Andreas; Cremer, Johannes H; Weuster-Botz, Dirk

2016-11-01

In general, fed-batch processes are applied for recombinant protein production with Escherichia coli (E. coli). However, state-of-the-art methods for identifying suitable reaction conditions suffer from severe drawbacks: direct transfer of process information from parallel batch studies is often defective, and sequential fed-batch studies are time-consuming and cost-intensive. In this study, continuously operated stirred-tank reactors on a milliliter scale were applied to identify suitable reaction conditions for fed-batch processes. Isopropyl β-D-1-thiogalactopyranoside (IPTG) induction strategies were varied in parallel-operated stirred-tank bioreactors to study the effects on the continuous production of the recombinant protein photoactivatable mCherry (PAmCherry) with E. coli. The best-performing induction strategies were transferred from the continuous processes on a milliliter scale to liter-scale fed-batch processes. Inducing recombinant protein expression by dynamically increasing the IPTG concentration to 100 µM increased the product concentration by 21% (8.4 g L⁻¹) compared with an existing high-performance production process using the most frequently applied induction strategy, a single addition of 1000 µM IPTG. Thus, identifying feasible reaction conditions for fed-batch processes in parallel continuous studies on a milliliter scale was shown to be a powerful, novel method to accelerate bioprocess design in a cost-reducing manner. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 32:1426-1435, 2016. © 2016 American Institute of Chemical Engineers.
General-purpose parallel algorithm based on CUDA for source pencils' deployment of large γ irradiator

International Nuclear Information System (INIS)

Yang Lei; Gong Xueyu; Wang Ling

2013-01-01

Combined with a standard mathematical model for evaluating the quality of deployment results, a new high-performance parallel algorithm for source-pencil deployment was obtained by using a parallel plant growth simulation algorithm that was completely parallelized with the CUDA execution model, so that the corresponding code runs on a GPU. Based on this work, several instances of various scales were used to test the new version of the algorithm. The results show that, building on the advantages of the earlier versions, the new version is more than 500 times faster than the CPU version and 30 times faster than the hybrid CPU+GPU version. For irradiators with an activity below 111 PBq, the new version needs less than ten minutes of computation. On a single GTX275 GPU, the new version can handle irradiators with activities of up to 167 PBq, with a computation time of no more than 25 minutes; with multiple GPUs, this capacity can be increased further. Overall, the new version of the algorithm running on GPU can satisfy the source-pencil deployment requirements of any domestic irradiator and is highly competitive. (authors)
GATECloud.net: a platform for large-scale, open-source text processing on the cloud.

Science.gov (United States)

Tablan, Valentin; Roberts, Ian; Cunningham, Hamish; Bontcheva, Kalina

2013-01-28

Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. Important infrastructural issues are dealt with by the platform, completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on the virtual machines, security and fault tolerance. We also include a cost-benefit analysis and usage evaluation.
Evaluation of parallel milliliter-scale stirred-tank bioreactors for the study of biphasic whole-cell biocatalysis with ionic liquids.

Science.gov (United States)

Dennewald, Danielle; Hortsch, Ralf; Weuster-Botz, Dirk

2012-01-01

As clear structure-activity relationships are still rare for ionic liquids, preliminary experiments are necessary for the process development of biphasic whole-cell processes involving these solvents. To reduce the time investment and material costs, the process development of such biphasic reaction systems would profit from a small-scale high-throughput platform. As an example, the reduction of 2-octanone to (R)-2-octanol by a recombinant Escherichia coli in a biphasic ionic liquid/water system was studied in a miniaturized stirred-tank bioreactor system allowing the parallel operation of up to 48 reactors at the mL scale. The results were compared with those obtained in a 20-fold larger stirred-tank reactor. The maximum local energy dissipation was evaluated at the larger scale and compared with the data available for the small-scale reactors, to verify whether similar mass transfer could be obtained at both scales. Thereafter, the reaction kinetics and final conversions reached in different reaction setups were analysed. The results agreed well between both scales for varying ionic liquids and for ionic liquid volume fractions up to 40%. The parallel bioreactor system can thus be used for the process development of the majority of biphasic reaction systems involving ionic liquids, reducing the time and resource investment during the process development of this type of application. Copyright © 2011. Published by Elsevier B.V.
Scalable multi-objective control for large scale water resources systems under uncertainty

Science.gov (United States)

Giuliani, Matteo; Quinn, Julianne; Herman, Jonathan; Castelletti, Andrea; Reed, Patrick

2016-04-01

The use of mathematical models to support the optimal management of environmental systems has expanded rapidly in recent years, owing to advances in scientific knowledge of natural processes, the efficiency of optimization techniques, and the availability of computational resources. However, ongoing changes in climate and society introduce additional challenges for controlling these systems, ultimately motivating the emergence of complex models to explore key causal relationships and dependencies on uncontrolled sources of variability. In this work, we contribute a novel implementation of the evolutionary multi-objective direct policy search (EMODPS) method for controlling environmental systems under uncertainty. The proposed approach combines direct policy search (DPS) with hierarchical parallelization of multi-objective evolutionary algorithms (MOEAs) and offers a threefold advantage. First, the DPS simulation-based optimization can be combined with any simulation model and does not add any constraint on modeled information, allowing the use of exogenous information in conditioning the decisions. Second, the combination of DPS and MOEAs prompts the generation of Pareto-approximate sets of solutions for up to 10 objectives, thus overcoming the decision biases produced by cognitive myopia, where narrow or restrictive definitions of optimality strongly limit the discovery of decision-relevant alternatives. Finally, large-scale parallelization of the MOEAs improves the ability of the designed solutions to handle the uncertainty due to severe natural variability. The proposed approach is demonstrated on a challenging water resources management problem represented by the optimal control of a network of four multipurpose water reservoirs in the Red River basin (Vietnam). As part of the medium-to-long-term national energy and food security strategy, four large reservoirs have been constructed on the Red River tributaries, which are mainly operated for hydropower
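MOEAs such as those used in EMODPS return an approximation of the Pareto front: the candidate solutions that no other candidate dominates. The Python sketch below shows the dominance test and a simple non-dominated filter, assuming every objective is minimized; the objective vectors are hypothetical, and the quadratic scan stands in for the faster sorting procedures real MOEAs use.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points not dominated by any other point (simple O(n^2) scan).

    A point never dominates itself, so no self-exclusion is needed.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective values for four candidate control policies.
policies = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
print(pareto_front(policies))  # (3.0, 4.0) is dominated by (2.0, 3.0)
```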
Improving the Communication Pattern in Matrix-Vector Operations for Large Scale-Free Graphs by Disaggregation

Energy Technology Data Exchange (ETDEWEB)

Kuhlemann, Verena [Emory Univ., Atlanta, GA (United States)]; Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]

2013-10-28

Matrix-vector multiplication is the key operation in any Krylov-subspace iteration method. We are interested in Krylov methods applied to problems associated with the graph Laplacian arising from large scale-free graphs. Computations with graphs of this type on parallel distributed-memory computers are challenging, because scale-free graphs have a degree distribution that follows a power law, and currently available graph partitioners are not efficient for such an irregular degree distribution. The lack of a good partitioning leads to excessive interprocessor communication during every matrix-vector product. Here, we present an approach that alleviates this problem by embedding the original irregular graph into a more regular one through disaggregating (splitting up) vertices of the original graph. The matrix-vector operations for the original graph are performed via a factored triple matrix-vector product involving the embedding graph. Even though the latter graph is larger, we are able to decrease the communication requirements considerably and improve the performance of the matrix-vector product.
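The kernel in question is y = Lx with the graph Laplacian L = D − A, where D is the diagonal degree matrix and A the adjacency matrix. The serial Python sketch below computes it from an undirected edge list; it shows only the plain product, not the authors' disaggregated triple-product formulation. In a distributed setting, every edge whose endpoints live on different processes requires exchanging an entry of x, which is why high-degree hubs in scale-free graphs drive communication cost.

```python
def laplacian_matvec(edges, x):
    """Compute y = L x for the graph Laplacian L = D - A of an undirected graph.

    edges: iterable of (u, v) pairs with integer vertex ids in range(len(x)).
    """
    y = [0.0] * len(x)
    for u, v in edges:
        # Each edge adds (x_u - x_v) to row u and (x_v - x_u) to row v; summed over
        # all edges this gives deg(u) * x_u minus the sum of the neighbours' values.
        diff = x[u] - x[v]
        y[u] += diff
        y[v] -= diff
    return y


# Star graph: vertex 0 is a hub connected to vertices 1..4.
star = [(0, i) for i in range(1, 5)]
print(laplacian_matvec(star, [1.0, 0.0, 0.0, 0.0, 0.0]))  # [4.0, -1.0, -1.0, -1.0, -1.0]
```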
Accelerating Relevance Vector Machine for Large-Scale Data on Spark

Directory of Open Access Journals (Sweden)

Liu Fang

2017-01-01

Relevance vector machine (RVM) is a machine learning algorithm based on a sparse Bayesian framework, which performs well when running classification and regression tasks on small-scale datasets. However, RVM also has certain drawbacks that restrict its practical applications, such as (1) a slow training process and (2) poor performance when training on large-scale datasets. To solve these problems, we first propose Discrete AdaBoost RVM (DAB-RVM), which incorporates ensemble learning into RVM. This method performs well with large-scale low-dimensional datasets. However, as the number of features increases, the training time of DAB-RVM increases as well. To avoid this, we exploit the abundant training samples of large-scale datasets and propose all features boosting RVM (AFB-RVM), which modifies the way weak classifiers are obtained. In our experiments we study the differences between various boosting techniques with RVM, demonstrating the performance of the proposed approaches on Spark. As a result of this paper, two approaches on Spark for different types of large-scale datasets are available.
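DAB-RVM builds on Discrete AdaBoost, which trains weak classifiers on reweighted samples and combines them by a weighted vote. The Python sketch below is plain Discrete AdaBoost with one-dimensional decision stumps standing in for RVM weak learners; this substitution is a deliberate simplification, not the paper's implementation, and labels are assumed to be ±1.

```python
import math


def train_stump(xs, ys, w):
    """Pick the (error, threshold, polarity) stump with the lowest weighted error on 1-D data."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= thr else -polarity for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost(xs, ys, rounds=5):
    """Discrete AdaBoost: returns a list of (alpha, threshold, polarity) weak learners."""
    n = len(xs)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this weak learner
        model.append((alpha, thr, pol))
        # Up-weight misclassified samples, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * y * (pol if x >= thr else -pol))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * (p if x >= t else -p) for a, t, p in model)
    return 1 if score >= 0 else -1


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [-1, -1, 1, -1, 1, 1]  # not separable by a single stump
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

No single stump classifies this toy set correctly, but three boosting rounds do; the same reweighting loop applies unchanged when a stronger learner such as an RVM replaces the stump.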
A Parallel Approach for Frequent Subgraph Mining in a Single Large Graph Using Spark

Directory of Open Access Journals (Sweden)

Fengcai Qiao

2018-02-01

Frequent subgraph mining (FSM) plays an important role in graph mining, attracting a great deal of attention in many areas, such as bioinformatics, web data mining and social networks. In this paper, we propose SSiGraM (Spark-based Single Graph Mining), a Spark-based parallel frequent subgraph mining algorithm for a single large graph. To address the two computational challenges of FSM, we parallelize subgraph extension and support evaluation across all worker nodes of the distributed cluster. In addition, we employ a heuristic search strategy and three novel optimizations in the support evaluation process (load balancing, pre-search pruning and top-down pruning), which significantly improve performance. Extensive experiments with four different real-world datasets demonstrate that the proposed algorithm outperforms the existing GraMi (Graph Mining) algorithm by an order of magnitude on all datasets and can work with a lower support threshold.
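In single-graph FSM, support cannot simply count matches, because overlapping embeddings would break the anti-monotonicity needed for pruning. The abstract does not name SSiGraM's support measure; GraMi, the baseline it is compared against, uses minimum image-based (MNI) support, sketched below in Python under the assumption that the embeddings of a pattern have already been enumerated. A pattern would be reported as frequent when this value reaches the chosen support threshold.

```python
def mni_support(embeddings):
    """Minimum image-based (MNI) support of a pattern in a single large graph.

    embeddings: list of dicts mapping each pattern vertex -> data-graph vertex,
    one dict per isomorphism of the pattern into the data graph.
    Support = min over pattern vertices of the number of distinct images.
    """
    if not embeddings:
        return 0
    images = {}
    for emb in embeddings:
        for pattern_vertex, graph_vertex in emb.items():
            images.setdefault(pattern_vertex, set()).add(graph_vertex)
    return min(len(s) for s in images.values())


# Hypothetical pattern a-b matched three times; all matches share data vertex 10.
matches = [{"a": 10, "b": 1}, {"a": 10, "b": 2}, {"a": 10, "b": 3}]
print(mni_support(matches))  # 1: pattern vertex "a" has only one distinct image
```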

Bayesian hierarchical model for large-scale covariance matrix estimation.

Science.gov (United States)

Zhu, Dongxiao; Hero, Alfred O

2007-12-01

Many bioinformatics problems implicitly depend on estimating a large-scale covariance matrix. Traditional approaches tend to give rise to high variance and low accuracy due to "overfitting." We cast the large-scale covariance matrix estimation problem into the Bayesian hierarchical model framework and introduce dependency between covariance parameters. We demonstrate the advantages of our approaches over the traditional approaches using simulations and OMICS data analysis.
Gaussian Pulse-Based Two-Threshold Parallel Scaling Tone Reservation for PAPR Reduction of OFDM Signals

Directory of Open Access Journals (Sweden)

Lei Guan

2011-01-01

Tone Reservation (TR) is a technique proposed to combat the high Peak-to-Average Power Ratio (PAPR) problem of Orthogonal Frequency Division Multiplexing (OFDM) signals. However, conventional TR suffers from high computational cost due to the difficulty of finding an effective cancellation signal in the time domain using only a few tones in the frequency domain. It also suffers from a high hardware implementation cost and long processing delays, because multiple iterations are needed to cancel multiple high signal peaks. In this paper, we propose an efficient approach, called two-threshold parallel scaling, for implementing a previously proposed Gaussian pulse-based Tone Reservation algorithm. Compared with conventional approaches, this technique significantly reduces hardware implementation complexity and cost, while also reducing signal processing delay by using just two iterations. Experimental results show that the proposed technique can effectively reduce the PAPR of OFDM signals with only a very small number of reserved tones and limited usage of hardware resources. This technique is suitable for any OFDM-based communication system, especially Digital Video Broadcasting (DVB) systems employing large IFFT/FFT transforms.
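PAPR is the ratio of the peak instantaneous power of the time-domain OFDM symbol to its mean power, usually quoted in dB. The NumPy sketch below computes it from frequency-domain subcarrier values via an IFFT; it assumes no oversampling and does not implement the paper's tone reservation scheme, only the quantity that scheme tries to reduce.

```python
import numpy as np


def papr_db(subcarriers):
    """PAPR in dB of one OFDM symbol, given its frequency-domain subcarrier values."""
    x = np.fft.ifft(subcarriers)  # time-domain samples (no oversampling)
    power = np.abs(x) ** 2
    return 10.0 * np.log10(power.max() / power.mean())


rng = np.random.default_rng(1)
n = 64
qpsk = (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n)) / np.sqrt(2)
print(f"random QPSK symbol: {papr_db(qpsk):.2f} dB")
# Worst case: all subcarriers in phase add up to a single peak, so PAPR = N.
print(f"all-ones symbol: {papr_db(np.ones(n)):.2f} dB")  # 10*log10(64), about 18.06 dB
```

Tone reservation lowers this number by adding a cancellation signal on a few reserved subcarriers that carry no data, so the data tones in `subcarriers` are left untouched.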
Creating Large Scale Database Servers

International Nuclear Information System (INIS)

Becla, Jacek

2001-01-01

The BaBar experiment at the Stanford Linear Accelerator Center (SLAC) is designed to perform a high precision investigation of the decays of the B-meson produced from electron-positron interactions. The experiment, started in May 1999, will generate approximately 300TB/year of data for 10 years. All of the data will reside in Objectivity databases accessible via the Advanced Multi-threaded Server (AMS). To date, over 70TB of data have been placed in Objectivity/DB, making it one of the largest databases in the world. Providing access to such a large quantity of data through a database server is a daunting task. A full-scale testbed environment had to be developed to tune various software parameters and a fundamental change had to occur in the AMS architecture to allow it to scale past several hundred terabytes of data. Additionally, several protocol extensions had to be implemented to provide practical access to large quantities of data. This paper will describe the design of the database and the changes that we needed to make in the AMS for scalability reasons and how the lessons we learned would be applicable to virtually any kind of database server seeking to operate in the Petabyte region.
Creating Large Scale Database Servers

Energy Technology Data Exchange (ETDEWEB)

Becla, Jacek

2001-12-14

The BaBar experiment at the Stanford Linear Accelerator Center (SLAC) is designed to perform a high precision investigation of the decays of the B-meson produced from electron-positron interactions. The experiment, started in May 1999, will generate approximately 300TB/year of data for 10 years. All of the data will reside in Objectivity databases accessible via the Advanced Multi-threaded Server (AMS). To date, over 70TB of data have been placed in Objectivity/DB, making it one of the largest databases in the world. Providing access to such a large quantity of data through a database server is a daunting task. A full-scale testbed environment had to be developed to tune various software parameters and a fundamental change had to occur in the AMS architecture to allow it to scale past several hundred terabytes of data. Additionally, several protocol extensions had to be implemented to provide practical access to large quantities of data. This paper will describe the design of the database and the changes that we needed to make in the AMS for scalability reasons and how the lessons we learned would be applicable to virtually any kind of database server seeking to operate in the Petabyte region.
Decentralised stabilising controllers for a class of large-scale linear ...

Indian Academy of Sciences (India)

subsystems resulting from a new aggregation-decomposition technique. The method has been illustrated through a numerical example of a large-scale linear system consisting of three subsystems each of the fourth order. Keywords. Decentralised stabilisation; large-scale linear systems; optimal feedback control; algebraic ...
Large Scale Survey Data in Career Development Research

Science.gov (United States)

Diemer, Matthew A.

2008-01-01

Large scale survey datasets have been underutilized but offer numerous advantages for career development scholars, as they contain many career development constructs with large and diverse samples that are followed longitudinally. Constructs such as work salience, vocational expectations, educational expectations, work satisfaction, and…
Similitude and scaling of large structural elements: Case study

Directory of Open Access Journals (Sweden)

M. Shehadeh

2015-06-01

Scaled-down models are widely used for experimental investigations of large structures because of the limited capacities of testing facilities and the expense of experimentation. The modeling accuracy depends upon the model material properties, fabrication accuracy and loading techniques. In the present work, the Buckingham π theorem is used to develop the relations (i.e. geometry, loading and properties) between the model and a large structural element such as those found in huge existing petroleum oil drilling rigs. The model is to be designed, loaded and treated according to a set of similitude requirements that relate the model to the large structural element. Three independent scale factors, which represent the three fundamental dimensions mass, length and time, need to be selected for designing the scaled-down model. The stress distribution within the model and its elastic deformation under steady loading are to be predicted numerically. The results are compared with those obtained from numerical computations of the full-scale structure. The effect of the scaled-down model's size and material on the accuracy of the modeling technique is thoroughly examined.
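Once the three fundamental scale factors are fixed (λ_M for mass, λ_L for length, λ_T for time), the scale factor of any derived quantity with dimensions M^a L^b T^c follows as λ_M^a λ_L^b λ_T^c. The Python sketch below applies this rule; the 1:10 model and the choice of the prototype material are hypothetical, and the dimension table uses standard SI dimensions.

```python
# Dimensional exponents (mass M, length L, time T) of some derived quantities.
DIMENSIONS = {
    "force": (1, 1, -2),         # N = kg m s^-2
    "stress": (1, -1, -2),       # Pa = kg m^-1 s^-2 (same as elastic modulus)
    "density": (1, -3, 0),       # kg m^-3
    "moment": (1, 2, -2),        # N m
    "acceleration": (0, 1, -2),  # m s^-2
}


def scale_factor(quantity, lam_mass, lam_length, lam_time):
    """Model/prototype ratio lam_M**a * lam_L**b * lam_T**c for dimensions M^a L^b T^c."""
    a, b, c = DIMENSIONS[quantity]
    return lam_mass ** a * lam_length ** b * lam_time ** c


# Hypothetical 1:10 model made of the prototype material: equal density requires
# lam_M = lam_L**3, and equal stress (equal modulus) then requires lam_T = lam_L.
lam_L = 0.1
lam_M = lam_L ** 3
lam_T = lam_L
for q in DIMENSIONS:
    # With these choices the acceleration ratio is 1/lam_L, so gravity-driven
    # effects would not be reproduced by the model.
    print(q, scale_factor(q, lam_M, lam_L, lam_T))
```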
Large-Scale Image Analytics Using Deep Learning

Science.gov (United States)

Ganguly, S.; Nemani, R. R.; Basu, S.; Mukhopadhyay, S.; Michaelis, A.; Votava, P.

2014-12-01

High-resolution land cover classification maps are needed to increase the accuracy of current land ecosystem and climate model outputs. Few studies demonstrate the state of the art in deriving very high resolution (VHR) land cover products. In addition, most methods rely heavily on commercial software that is difficult to scale to the region of study (e.g. continents to the globe). Complexities in present approaches relate to (a) scalability of the algorithm, (b) large image data processing (compute and memory intensive), (c) computational cost, (d) massively parallel architecture, and (e) machine learning automation. In addition, VHR satellite datasets are of the order of terabytes, and features extracted from these datasets are of the order of petabytes. In our present study, we have acquired the National Agriculture Imagery Program (NAIP) dataset for the Continental United States at a spatial resolution of 1 m. This data comes as image tiles (a total of a quarter million image scenes with ~60 million pixels) and has a total size of ~100 terabytes for a single acquisition. Features extracted from the entire dataset would amount to ~8-10 petabytes. In our proposed approach, we have implemented a novel semi-automated machine learning algorithm rooted in the principles of "deep learning" to delineate the percentage of tree cover. In order to perform image analytics in such a granular system, it is mandatory to devise an intelligent archiving and query system for image retrieval, file structuring, metadata processing and filtering of all available image scenes. Using the Open NASA Earth Exchange (NEX) initiative, which is a partnership with Amazon Web Services (AWS), we have developed an end-to-end architecture for designing the database and the deep belief network (following the DistBelief computing model) to solve a grand challenge of scaling this process across a quarter million NAIP tiles that cover the entire Continental United States. The
Large-scale preparation of hollow graphitic carbon nanospheres

Energy Technology Data Exchange (ETDEWEB)

Feng, Jun; Li, Fu [Key Laboratory for Liquid-Solid Structural Evolution and Processing of Materials, Ministry of Education, Shandong University, Jinan 250061 (China); Bai, Yu-Jun, E-mail: byj97@126.com [Key Laboratory for Liquid-Solid Structural Evolution and Processing of Materials, Ministry of Education, Shandong University, Jinan 250061 (China); State Key laboratory of Crystal Materials, Shandong University, Jinan 250100 (China); Han, Fu-Dong; Qi, Yong-Xin; Lun, Ning [Key Laboratory for Liquid-Solid Structural Evolution and Processing of Materials, Ministry of Education, Shandong University, Jinan 250061 (China); Lu, Xi-Feng [Lunan Institute of Coal Chemical Engineering, Jining 272000 (China)

2013-01-15

Hollow graphitic carbon nanospheres (HGCNSs) were synthesized on a large scale by a simple reaction between glucose and Mg at 550 °C in an autoclave. Characterization by X-ray diffraction, Raman spectroscopy and transmission electron microscopy demonstrates the formation of HGCNSs with an average diameter of about 10 nm and a wall thickness of a few graphene layers. The HGCNSs exhibit a reversible capacity of 391 mAh g⁻¹ after 60 cycles when used as anode materials for Li-ion batteries.
Graphical abstract: Hollow graphitic carbon nanospheres could be prepared on a large scale by the simple reaction between glucose and Mg at 550 °C; they exhibit superior electrochemical performance to graphite.
Highlights:
► Hollow graphitic carbon nanospheres (HGCNSs) were prepared on a large scale at 550 °C.
► The preparation is simple, effective and eco-friendly.
► The in situ yielded MgO nanocrystals promote the graphitization.
► The HGCNSs exhibit superior electrochemical performance to graphite.
Large-scale impact cratering on the terrestrial planets

International Nuclear Information System (INIS)

Grieve, R.A.F.

1982-01-01

The crater densities on the earth and moon form the basis for a standard flux-time curve that can be used in dating unsampled planetary surfaces and constraining the temporal history of endogenic geologic processes. Abundant evidence is seen not only that impact cratering was an important surface process in planetary history but also that large impact events produced effects that were crucial in scale. By way of example, it is noted that the formation of multiring basins on the early moon was as important in defining the planetary tectonic framework as plate tectonics is on the earth. Evidence from several planets suggests that the effects of very-large-scale impacts go beyond the simple formation of an impact structure and serve to localize increased endogenic activity over an extended period of geologic time. Even though they no longer occur with the frequency and magnitude of early solar system history, large-scale impact events continue to affect the local geology of the planets. 92 references
Optical interconnect for large-scale systems

Science.gov (United States)

Dress, William

2013-02-01

This paper presents a switchless, optical interconnect module that serves as a node in a network of identical distribution modules for large-scale systems. Thousands to millions of hosts or endpoints may be interconnected by a network of such modules, avoiding the need for multi-level switches. Several common network topologies are reviewed and their scaling properties assessed. The concept of message-flow routing is discussed in conjunction with the unique properties enabled by the optical distribution module where it is shown how top-down software control (global routing tables, spanning-tree algorithms) may be avoided.
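The abstract assesses how common topologies scale. The sketch below is a rough illustration that uses standard textbook figures, not results from this paper. It tabulates hop diameter and links per node for four topologies as the number of hosts grows:

- a ring
- a 2-D torus
- a hypercube
- a fully connected network

The function name `topology_scaling` is hypothetical. Host counts are restricted to powers of 4 so that every formula is exact.

```python
import math
from typing import Dict, Tuple


def topology_scaling(n_hosts: int) -> Dict[str, Tuple[int, int]]:
    """Return (diameter in hops, links per node) for common direct topologies.

    n_hosts must be a power of 4 (at least 16) so that it is both a perfect
    square (k x k 2-D torus) and a power of two (hypercube).
    """
    k = math.isqrt(n_hosts)
    d = n_hosts.bit_length() - 1  # hypercube dimension, log2(n_hosts)
    if n_hosts < 16 or k * k != n_hosts or 2 ** d != n_hosts:
        raise ValueError("n_hosts must be a power of 4, at least 16")
    return {
        "ring": (n_hosts // 2, 2),
        "2d_torus": (2 * (k // 2), 4),      # floor(k/2) hops per dimension
        "hypercube": (d, d),
        "fully_connected": (1, n_hosts - 1),
    }


for n in (16, 1024, 1_048_576):
    print(n, topology_scaling(n))
```

The rows show the usual tension. A ring keeps two links per node, but its diameter grows linearly with N. Full connectivity keeps the diameter at one hop, at the cost of N − 1 links per node.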
Massive Asynchronous Parallelization of Sparse Matrix Factorizations

Energy Technology Data Exchange (ETDEWEB)

Chow, Edmond [Georgia Inst. of Technology, Atlanta, GA (United States)]

2018-01-08

Solving sparse problems is at the core of many DOE computational science applications. We focus on the challenge of developing sparse algorithms that can fully exploit the parallelism in extreme-scale computing systems, in particular systems with massive numbers of cores per node. Our approach is to express a sparse matrix factorization as a large number of bilinear constraint equations and then to solve these equations via an asynchronous iterative method. The unknowns in these equations are the entries of the desired factors.
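The sketch below makes the "bilinear constraint equations" idea concrete for an incomplete LU factorization. For every position (i, j) in a sparsity pattern S, the factors must satisfy (LU)_ij = a_ij. Each such equation is solved for a single unknown using the current values of the others. The following are all illustrative assumptions, not details taken from the project:

- S is the nonzero pattern of A (ILU(0)).
- L has a unit diagonal.
- The initial guess is built from A itself.
- The sweeps are synchronous (Jacobi-style) and written in plain Python.

An asynchronous version would instead let each thread read whatever entry values happen to be current. The names `parilu_sweeps` and `pattern_residual` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def parilu_sweeps(A: Matrix, sweeps: int = 20) -> Tuple[Matrix, Matrix]:
    """Fixed-point sweeps for an incomplete LU factorization A ~ L U.

    Each (i, j) in the pattern S (here: the nonzeros of A) gives one bilinear
    equation sum_k L[i][k] * U[k][j] = A[i][j], solved for L[i][j] (i > j)
    or U[i][j] (i <= j). All updates in a sweep read the previous iterate,
    so they are independent and could be distributed across cores.
    """
    n = len(A)
    pattern = [(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0]
    # Initial guess (assumes a nonzero diagonal): unit-diagonal L with the
    # strictly lower part of A scaled by the diagonal, U = upper part of A.
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i, j in pattern:
        if i > j:
            L[i][j] = A[i][j] / A[j][j]
        else:
            U[i][j] = A[i][j]
    for _ in range(sweeps):
        new_L = [row[:] for row in L]
        new_U = [row[:] for row in U]
        for i, j in pattern:
            s = sum(L[i][k] * U[k][j] for k in range(min(i, j)))
            if i > j:
                new_L[i][j] = (A[i][j] - s) / U[j][j]
            else:
                new_U[i][j] = A[i][j] - s
        L, U = new_L, new_U
    return L, U


def pattern_residual(A: Matrix, L: Matrix, U: Matrix) -> float:
    """Largest |(L U)_ij - a_ij| over the nonzero pattern of A."""
    n = len(A)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            if A[i][j] != 0.0:
                lu = sum(L[i][k] * U[k][j] for k in range(n))
                worst = max(worst, abs(lu - A[i][j]))
    return worst


A = [[4.0, 1.0, 0.0, 1.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [1.0, 0.0, 1.0, 4.0]]
L, U = parilu_sweeps(A)
print("max residual on pattern:", pattern_residual(A, L, U))
```

The pattern determines what the sweeps converge to:

- With a full pattern, the sweeps reproduce the exact LU factors.
- With a sparse pattern, they converge to ILU(0), whose product matches A only on the pattern.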
Real-world-time simulation of memory consolidation in a large-scale cerebellar model

Directory of Open Access Journals (Sweden)

Masato Gosui

2016-03-01

We report the development of a large-scale spiking network model of the cerebellum composed of more than 1 million neurons. The model is implemented on graphics processing units (GPUs), which are dedicated hardware for parallel computing. Using 4 GPUs simultaneously, we achieve real-time simulation, in which computer simulation of cerebellar activity for 1 s completes within 1 s of real-world time, with a temporal resolution of 1 ms. This allows us to carry out very long-term computer simulations of cerebellar activity in a practical time with millisecond temporal resolution. Using the model, we carry out computer simulation of long-term gain adaptation of optokinetic response (OKR) eye movements for 5 days, aimed at studying the neural mechanisms of posttraining memory consolidation. The simulation results are consistent with animal experiments and with our theory of posttraining memory consolidation. These results suggest that real-time computing provides a useful means to study very slow neural processes such as memory consolidation in the brain.
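"Real time" here means that advancing the model by 1 s of simulated activity (1,000 steps of 1 ms) takes no more than 1 s of wall-clock time. The sketch below illustrates that bookkeeping with a population of leaky integrate-and-fire (LIF) neurons stepped at 1 ms. It returns the real-time factor (wall-clock seconds per simulated second), where a value of 1 or less means real time.

The sketch makes several assumptions that are not from the paper:

- the neuron model and its parameter values are illustrative only;
- the constant input drives are hypothetical;
- the code is single-threaded Python, not the authors' GPU implementation.

```python
import time
from typing import List, Tuple

DT_MS = 1.0  # integration step: 1 ms temporal resolution


def simulate_lif(inputs: List[float], t_sim_ms: float = 1000.0,
                 tau_ms: float = 20.0, v_rest: float = -70.0,
                 v_thresh: float = -54.0,
                 v_reset: float = -70.0) -> Tuple[List[int], float]:
    """Advance independent LIF neurons driven by constant inputs.

    inputs[i] is a hypothetical constant drive (mV) for neuron i.
    Returns spike counts per neuron and the real-time factor
    (wall-clock seconds per simulated second).
    """
    n = len(inputs)
    v = [v_rest] * n
    spikes = [0] * n
    steps = int(t_sim_ms / DT_MS)
    start = time.perf_counter()
    for _ in range(steps):
        for i in range(n):
            # Forward-Euler step of tau * dv/dt = -(v - v_rest) + input
            v[i] += DT_MS / tau_ms * (-(v[i] - v_rest) + inputs[i])
            if v[i] >= v_thresh:
                spikes[i] += 1
                v[i] = v_reset
    elapsed = time.perf_counter() - start
    return spikes, elapsed / (t_sim_ms / 1000.0)


counts, rtf = simulate_lif([0.0, 10.0, 30.0])
print(counts, f"real-time factor: {rtf:.4f}")
```

Scaling the same loop to more than a million coupled neurons is what makes parallel hardware necessary. Each neuron's update within one step is independent of the others, which is the property a GPU implementation can exploit.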
Dealing with BIG Data - Exploiting the Potential of Multicore Parallelism and Auto-Tuning

CERN Multimedia

CERN. Geneva

2012-01-01

Physics experiments nowadays produce tremendous amounts of data that require sophisticated analyses in order to gain new insights. At such a large scale, scientists are facing non-trivial software engineering problems in addition to the physics problems. Ubiquitous multicore processors and GPGPUs have turned almost any computer into a parallel machine and have pushed compute clusters and clouds to become multicore-based and more heterogeneous. These developments complicate the exploitation of various types of parallelism within different layers of hardware and software. As a consequence, manual performance tuning is non-intuitive and tedious due to the large search space spanned by numerous inter-related tuning parameters. This talk addresses these challenges at CERN and discusses how to leverage multicore parallelization techniques in this context. It presents recent advances in automatic performance tuning to algorithmically find sweet spots with good performance. The talk also presents results from empiri...
[A large-scale accident in Alpine terrain].

Science.gov (United States)

Wildner, M; Paal, P

2015-02-01

Because of the geographical conditions, large-scale accidents amounting to mass casualty incidents (MCI) in Alpine terrain regularly present rescue teams with huge challenges. Using an example incident, specific conditions and typical problems associated with such a situation are presented. The first rescue team members to arrive have the elementary tasks of qualified triage and communication with the control room, which must dispatch the necessary additional support. Only with a clear "concept", to which all have to adhere, can the subsequent chaos phase be limited. In this respect, time pressure is enormous and is compounded by adverse weather conditions or darkness. Additional hazards are frostbite and hypothermia. If priorities can be established in terms of urgency, then treatment and procedure algorithms have proven successful. For the evacuation of casualties, helicopter transport should be sought. Due to the low density of hospitals in Alpine regions, it is often necessary to distribute the patients over a wide area. Rescue operations in Alpine terrain have to be performed according to the particular conditions and require rescue teams to have specific knowledge and expertise. The possibility of a large-scale accident should be considered when planning events. To optimize rescue measures, regular training and exercises are sensible, as is the analysis of previous large-scale Alpine accidents.
A path-level exact parallelization strategy for sequential simulation

Science.gov (United States)

Peredo, Oscar F.; Baeza, Daniel; Ortiz, Julián M.; Herrero, José R.

2018-01-01

Sequential simulation is a well-known method in geostatistical modelling. Following the Bayesian approach for simulation of conditionally dependent random events, the Sequential Indicator Simulation (SIS) method draws simulated values for K categories (categorical case) or classes defined by K different thresholds (continuous case). Similarly, the Sequential Gaussian Simulation (SGS) method draws simulated values from a multivariate Gaussian field. In this work, a path-level approach to parallelize the SIS and SGS methods is presented. In a first stage the simulation path is rearranged, and in a second stage non-conflicting nodes are simulated in parallel. A key advantage of the proposed parallelization method is that it generates realizations identical to those of the original non-parallelized methods. Case studies are presented using two sequential simulation codes from GSLIB: SISIM and SGSIM. Execution time and speedup results are shown for large-scale domains with many categories and maximum kriging neighbours in each case, achieving high speedups in the best scenarios using 16 threads of execution on a single machine.
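The sketch below shows one way to rearrange a random path into parallel stages without changing the realization. It assumes that two nodes "conflict" when they lie within the kriging search radius of each other. Each node is placed one level after the latest earlier-path node it conflicts with. Nodes on the same level therefore never conflict and can be simulated concurrently.

Every earlier conflicting node always lands on a lower level. Each node therefore sees exactly the same previously simulated neighbours as in the sequential run. This only holds if random numbers are tied to path positions rather than to threads.

The function `schedule_path` and this radius-based conflict rule are illustrative assumptions. They are not GSLIB code, and they are not necessarily the authors' exact re-arrangement scheme.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def schedule_path(path: Sequence[Point], radius: float) -> List[List[int]]:
    """Group path positions into levels of mutually non-conflicting nodes.

    Node p conflicts with an earlier node q if they are within `radius`;
    p is placed one level after the latest conflicting earlier node.
    Returns a list of levels, each a list of path indices (in path order).
    """
    level = [0] * len(path)
    for p in range(len(path)):
        for q in range(p):
            if math.dist(path[p], path[q]) <= radius:
                level[p] = max(level[p], level[q] + 1)
    levels: List[List[int]] = [[] for _ in range(max(level, default=-1) + 1)]
    for idx, lv in enumerate(level):
        levels[lv].append(idx)
    return levels


# Usage: a shuffled random path over a hypothetical 10 x 10 grid.
random.seed(0)
nodes = [(float(x), float(y)) for x in range(10) for y in range(10)]
random.shuffle(nodes)
stages = schedule_path(nodes, radius=2.5)
print(len(stages), "parallel stages for", len(nodes), "nodes")
```

The quadratic conflict check keeps the sketch short. For large grids, a spatial index such as cell binning would be needed to find conflicts efficiently.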
Multimode Resource-Constrained Multiple Project Scheduling Problem under Fuzzy Random Environment and Its Application to a Large Scale Hydropower Construction Project

Science.gov (United States)

Xu, Jiuping

2014-01-01

This paper presents an extension of the multimode resource-constrained project scheduling problem for a large scale construction project where multiple parallel projects and a fuzzy random environment are considered. By taking into account the most typical goals in project management, a cost/weighted makespan/quality trade-off optimization model is constructed. To deal with the uncertainties, a hybrid crisp approach is used to transform the fuzzy random parameters into fuzzy variables that are subsequently defuzzified using an expected value operator with an optimistic-pessimistic index. Then a combinatorial-priority-based hybrid particle swarm optimization algorithm is developed to solve the proposed model, where the combinatorial particle swarm optimization and priority-based particle swarm optimization are designed to assign modes to activities and to schedule activities, respectively. Finally, the results and analysis of a practical example at a large scale hydropower construction project are presented to demonstrate the practicality and efficiency of the proposed model and optimization method. PMID:24550708
Multimode resource-constrained multiple project scheduling problem under fuzzy random environment and its application to a large scale hydropower construction project.

Science.gov (United States)

Xu, Jiuping; Feng, Cuiying

2014-01-01

This paper presents an extension of the multimode resource-constrained project scheduling problem for a large scale construction project where multiple parallel projects and a fuzzy random environment are considered. By taking into account the most typical goals in project management, a cost/weighted makespan/quality trade-off optimization model is constructed. To deal with the uncertainties, a hybrid crisp approach is used to transform the fuzzy random parameters into fuzzy variables that are subsequently defuzzified using an expected value operator with an optimistic-pessimistic index. Then a combinatorial-priority-based hybrid particle swarm optimization algorithm is developed to solve the proposed model, where the combinatorial particle swarm optimization and priority-based particle swarm optimization are designed to assign modes to activities and to schedule activities, respectively. Finally, the results and analysis of a practical example at a large scale hydropower construction project are presented to demonstrate the practicality and efficiency of the proposed model and optimization method.
Hierarchical Cantor set in the large scale structure with torus geometry

Energy Technology Data Exchange (ETDEWEB)

Murdzek, R. [Physics Department, 'Al. I. Cuza' University, Blvd. Carol I, Nr. 11, Iassy 700506 (Romania)], E-mail: rmurdzek@yahoo.com

2008-12-15

The formation of large-scale structures is considered within a model with a string on a toroidal space-time. First, the space-time geometry is presented. In this geometry, the Universe is represented by a string describing a torus surface. The large-scale structure of the Universe is then derived from the string oscillations. The results agree with the cellular structure of the large-scale distribution and with the theory of a Cantorian space-time.
Accelerating research into bio-based FDCA-polyesters by using small scale parallel film reactors.

Science.gov (United States)

Gruter, Gert-Jan M; Sipos, Laszlo; Adrianus Dam, Matheus

2012-02-01

High-throughput experimentation is now well established as a tool in early-stage catalyst development and in catalyst and process scale-up. One of the more challenging areas of catalytic research is polymer catalysis. The main difference from most non-polymer catalytic conversions is that the product is not a well-defined molecule, so catalytic performance cannot easily be expressed only in terms of catalyst activity and selectivity. In polymerization reactions, polymer chains are formed that can have various lengths (resulting in a molecular weight distribution rather than a defined molecular weight), different compositions (when random or block copolymers are produced), cross-linking (often significantly affecting physical properties), different end groups (often affecting subsequent processing steps), and several other variations. In addition, for polyolefins, mass and heat transfer, oxygen and moisture sensitivity, stereoregularity and many other intrinsic features make relevant high-throughput screening in this field extremely challenging. For polycondensation reactions performed in the melt, the viscosity often becomes high already at modest molecular weights, which greatly influences mass transfer of the condensation product (often water or methanol). When reactions become mass-transfer limited, comparing catalyst performance is often no longer meaningful. This does not, however, mean that relevant experiments for these application areas cannot be performed on a small scale. Relevant catalyst screening experiments for polycondensation reactions can be performed in very efficient small-scale parallel equipment. Parallel equipment has been developed for transesterification and polycondensation as well as for post-condensation through solid-stating. Besides polymer synthesis, polymer characterization also needs to be accelerated, without compromising quality, in order to draw relevant conclusions.

Large-area parallel near-field optical nanopatterning of functional materials using microsphere mask

Energy Technology Data Exchange (ETDEWEB)

Chen, G.X. [NUS Nanoscience and Nanotechnology Initiative, National University of Singapore, 2 Engineering Drive 3, Singapore 117576 (Singapore); Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore)]; Hong, M.H. [NUS Nanoscience and Nanotechnology Initiative, National University of Singapore, 2 Engineering Drive 3, Singapore 117576 (Singapore); Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Data Storage Institute, A*STAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore)], E-mail: Hong_Minghui@dsi.a-star.edu.sg; Lin, Y. [NUS Nanoscience and Nanotechnology Initiative, National University of Singapore, 2 Engineering Drive 3, Singapore 117576 (Singapore); Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore)]; Wang, Z.B. [Data Storage Institute, A*STAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore)]; Ng, D.K.T. [Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Data Storage Institute, A*STAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore)]; Xie, Q. [Data Storage Institute, A*STAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore)]; Tan, L.S. [NUS Nanoscience and Nanotechnology Initiative, National University of Singapore, 2 Engineering Drive 3, Singapore 117576 (Singapore); Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore)]; Chong, T.C. [Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Data Storage Institute, A*STAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore)]

2008-01-31

Large-area parallel near-field optical nanopatterning on functional material surfaces was investigated with KrF excimer laser irradiation. A monolayer of silicon dioxide microspheres was self-assembled on the sample surfaces as the processing mask. Nanoholes and nanospots were obtained on silicon surfaces and thin silver films, respectively. The nanopatterning results were affected by the refractive indices of the surrounding media. Near-field optical enhancement beneath the microspheres is the physical origin of nanostructure formation. Theoretical calculation was performed to study the intensity of optical field distributions under the microspheres according to the light scattering model of a sphere on the substrate.
Large-scale Motion of Solar Filaments

Indian Academy of Sciences (India)

Ambrož, Pavel; Schroll, Alfred

Large-scale Motion of Solar Filaments. Pavel Ambrož, Astronomical Institute of the Acad. Sci. of the Czech Republic, CZ-25165 Ondrejov, The Czech Republic. e-mail: pambroz@asu.cas.cz. Alfred Schroll, Kanzelhöhe Solar Observatory of the University of Graz, A-9521 Treffen, Austria. e-mail: schroll@solobskh.ac.at.
Sensitivity analysis for large-scale problems

Science.gov (United States)

Noor, Ahmed K.; Whitworth, Sandra L.

1987-01-01

The development of efficient techniques for calculating sensitivity derivatives is studied. The objective is to present a computational procedure for calculating sensitivity derivatives as part of performing structural reanalysis for large-scale problems. The scope is limited to framed type structures. Both linear static analysis and free-vibration eigenvalue problems are considered.
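For linear static analysis, a standard way to obtain sensitivity derivatives is direct differentiation of K(p) u = f. With a load that does not depend on the design parameter p, this gives K du/dp = −(dK/dp) u. The derivative therefore needs only one extra solve with the already-assembled (in practice, already-factored) stiffness matrix.

The sketch below applies this to a hypothetical two-spring chain and checks the result against a finite difference. It is a textbook illustration, not the framed-structure procedure of this report. The helper names `solve2`, `stiffness` and `displacement_sensitivity` are assumptions made for illustration.

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def solve2(K: List[List[float]], b: Vec) -> Vec:
    """Solve a 2x2 linear system K x = b by Cramer's rule."""
    det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
    return ((b[0] * K[1][1] - K[0][1] * b[1]) / det,
            (K[0][0] * b[1] - b[0] * K[1][0]) / det)


def stiffness(k1: float, k2: float) -> List[List[float]]:
    """Two springs in series: ground -k1- node 1 -k2- node 2."""
    return [[k1 + k2, -k2], [-k2, k2]]


def displacement_sensitivity(k1: float, k2: float,
                             force: float) -> Tuple[Vec, Vec]:
    """Return u and du/dk2 for a tip load `force` applied at node 2."""
    K = stiffness(k1, k2)
    u = solve2(K, (0.0, force))
    dK = [[1.0, -1.0], [-1.0, 1.0]]  # dK/dk2 for this assembly
    rhs = (-(dK[0][0] * u[0] + dK[0][1] * u[1]),
           -(dK[1][0] * u[0] + dK[1][1] * u[1]))
    # Same stiffness matrix, new right-hand side: K du/dk2 = -(dK/dk2) u
    return u, solve2(K, rhs)


u, du = displacement_sensitivity(k1=2.0, k2=4.0, force=8.0)
print("u =", u, "du/dk2 =", du)
```

For this chain the exact tip displacement is u₂ = F/k₁ + F/k₂, so du₂/dk₂ = −F/k₂². The direct-differentiation result can be checked against that closed form.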
The Cosmology Large Angular Scale Surveyor

Science.gov (United States)

Harrington, Kathleen; Marriage, Tobias; Ali, Aamir; Appel, John; Bennett, Charles; Boone, Fletcher; Brewer, Michael; Chan, Manwei; Chuss, David T.; Colazo, Felipe;

2016-01-01

The Cosmology Large Angular Scale Surveyor (CLASS) is a four-telescope array designed to characterize relic primordial gravitational waves from inflation and the optical depth to reionization through a measurement of the polarized cosmic microwave background (CMB) on the largest angular scales. The frequencies of the four CLASS telescopes, one at 38 GHz, two at 93 GHz, and one dichroic system at 145/217 GHz, are chosen to avoid spectral regions of high atmospheric emission and to span the minimum of the polarized Galactic foregrounds: synchrotron emission at lower frequencies and dust emission at higher frequencies. Low-noise transition edge sensor detectors and a rapid front-end polarization modulator provide a unique combination of high sensitivity, stability, and control of systematics. The CLASS site, at 5200 m in the Chilean Atacama desert, allows for daily mapping of up to 70% of the sky and enables the characterization of CMB polarization at the largest angular scales. Using this combination of a broad frequency range, large sky coverage, control over systematics, and high sensitivity, CLASS will observe the reionization and recombination peaks of the CMB E- and B-mode power spectra. CLASS will make a cosmic-variance-limited measurement of the optical depth to reionization and will measure or place upper limits on the tensor-to-scalar ratio, r, down to a level of 0.01 (95% C.L.).

Prehospital Acute Stroke Severity Scale to Predict Large Artery Occlusion: Design and Comparison With Other Scales.

Science.gov (United States)

Hastrup, Sidsel; Damgaard, Dorte; Johnsen, Søren Paaske; Andersen, Grethe

2016-07-01

We designed and validated a simple prehospital stroke scale to identify emergent large vessel occlusion (ELVO) in patients with acute ischemic stroke and compared the scale to other published scales for prediction of ELVO. A national historical test cohort of 3127 patients with information on intracranial vessel status (angiography) before reperfusion therapy was identified. National Institutes of Health Stroke Scale (NIHSS) items with the highest predictive value for occlusion of a large intracranial artery were identified, and the optimal combination meeting predefined criteria to ensure usefulness in the prehospital phase was determined. The predictive performance of the Prehospital Acute Stroke Severity (PASS) scale was compared with other published scales for ELVO. The PASS scale was composed of 3 NIHSS scores: level of consciousness (month/age), gaze palsy/deviation, and arm weakness. Two-thirds of the test cohort was used for the derivation of PASS, which showed an accuracy (area under the curve) of 0.76 for detecting large arterial occlusion. The optimal cut point of ≥2 abnormal scores showed: sensitivity=0.66 (95% CI, 0.62-0.69), specificity=0.83 (0.81-0.85), and area under the curve=0.74 (0.72-0.76). Validation on the remaining third of the test cohort showed similar performance. Patients with a large artery occlusion on angiography and PASS ≥2 had a median NIHSS score of 17 (interquartile range=6), as opposed to a median NIHSS score of 6 (interquartile range=5) for those with PASS <2. The PASS scale performed as well as other scales predicting ELVO while being simpler. The PASS scale is simple and has promising accuracy for prediction of ELVO in the field. © 2016 American Heart Association, Inc.
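The scoring rule described above counts abnormal findings on three NIHSS items and flags suspected ELVO at a count of 2 or more. The sketch below encodes that rule together with the standard definitions of sensitivity and specificity. It is illustrative only and not for clinical use.

The sketch assumes the following, none of which is stated in the abstract:

- the items map to NIHSS 1b (month/age questions), 2 (best gaze) and 5 (motor arm, worse side);
- an item counts as abnormal whenever its NIHSS score is above 0.

Check the original PASS definition before relying on either assumption. All function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NihssItems:
    loc_questions: int  # NIHSS item 1b, month/age questions (0-2)
    best_gaze: int      # NIHSS item 2, gaze palsy/deviation (0-2)
    arm_motor: int      # NIHSS item 5, worse arm (0-4)


def pass_score(items: NihssItems) -> int:
    """Count abnormal PASS items (assumed abnormal = item score > 0)."""
    return sum(score > 0 for score in
               (items.loc_questions, items.best_gaze, items.arm_motor))


def suspect_elvo(items: NihssItems, cut_point: int = 2) -> bool:
    """Flag suspected emergent large vessel occlusion at PASS >= cut_point."""
    return pass_score(items) >= cut_point


def sensitivity_specificity(tp: int, fn: int, tn: int,
                            fp: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


patient = NihssItems(loc_questions=1, best_gaze=0, arm_motor=2)
print(pass_score(patient), suspect_elvo(patient))
# Hypothetical 2x2 counts, only to show the arithmetic:
print(sensitivity_specificity(tp=66, fn=34, tn=83, fp=17))
```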
Performance studies of the parallel VIM code

International Nuclear Information System (INIS)

Shi, B.; Blomquist, R.N.

1996-01-01

In this paper, the authors evaluate the performance of the parallel version of the VIM Monte Carlo code on the IBM SPx at the High Performance Computing Research Facility at ANL. Three test problems with contrasting computational characteristics were used to assess effects on performance. A statistical method for estimating the inefficiencies due to load imbalance and communication is also introduced. VIM is a large-scale continuous-energy Monte Carlo radiation transport program and was parallelized using history partitioning, the master/worker approach, and the p4 message-passing library. Dynamic load balancing is accomplished when the master processor assigns chunks of histories to workers that have completed a previously assigned task, accommodating variations in the lengths of histories, processor speeds, and worker loads. At the end of each batch (generation), the fission sites and tallies are sent from each worker to the master process, contributing to the parallel inefficiency. All communications are between master and workers and are serial. The SPx is a scalable 128-node parallel supercomputer with high-performance Omega switches of 63 µs latency and 35 MB/s bandwidth. For uniform and reproducible performance, they used only the 120 identical regular processors (IBM RS/6000) and excluded the remaining eight planet nodes, which may be loaded by other users' jobs.
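The master/worker scheme hands the next chunk of histories to whichever worker becomes idle first. The event-driven sketch below shows why this absorbs differences in processor speed. It assumes each worker processes histories at a fixed, hypothetical rate and ignores message costs. On the SPx, each master–worker exchange would additionally cost roughly 63 µs plus message size divided by 35 MB/s. The function name `master_worker_schedule` is hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def master_worker_schedule(n_histories: int, chunk: int,
                           speeds: List[float]) -> Tuple[float, Dict[int, int]]:
    """Simulate dynamic chunk assignment of Monte Carlo histories.

    speeds[w] is histories per second for worker w (hypothetical values).
    The master gives the next chunk to the worker that becomes idle first.
    Returns the batch makespan (seconds) and histories done per worker.
    """
    idle = [(0.0, w) for w in range(len(speeds))]  # (time free, worker id)
    heapq.heapify(idle)
    done = {w: 0 for w in range(len(speeds))}
    remaining = n_histories
    makespan = 0.0
    while remaining > 0:
        t_free, w = heapq.heappop(idle)
        size = min(chunk, remaining)
        remaining -= size
        t_end = t_free + size / speeds[w]
        done[w] += size
        makespan = max(makespan, t_end)
        heapq.heappush(idle, (t_end, w))
    return makespan, done


span, per_worker = master_worker_schedule(10_000, 100, [1000.0, 1000.0, 500.0])
print(f"makespan {span:.2f} s", per_worker)
```

Faster workers automatically receive more chunks. The chunk size trades balance against message count: smaller chunks balance better, but each one costs another master–worker exchange.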
Analysis using large-scale ringing data

Directory of Open Access Journals (Sweden)

Baillie, S. R.

2004-06-01

Birds are highly mobile organisms and there is increasing evidence that studies at large spatial scales are needed if we are to properly understand their population dynamics. While classical metapopulation models have rarely proved useful for birds, more general metapopulation ideas involving collections of populations interacting within spatially structured landscapes are highly relevant (Harrison, 1994). There is increasing interest in understanding patterns of synchrony, or lack of synchrony, between populations and the environmental and dispersal mechanisms that bring about these patterns (Paradis et al., 2000). To investigate these processes we need to measure abundance, demographic rates and dispersal at large spatial scales, in addition to gathering data on relevant environmental variables. There is an increasing realisation that conservation needs to address rapid declines of common and widespread species (they will not remain so if such trends continue) as well as the management of small populations that are at risk of extinction. While the knowledge needed to support the management of small populations can often be obtained from intensive studies in a few restricted areas, conservation of widespread species often requires information on population trends and processes measured at regional, national and continental scales (Baillie, 2001). While management prescriptions for widespread populations may initially be developed from a small number of local studies or experiments, there is an increasing need to understand how such results will scale up when applied across wider areas. There is also a vital role for monitoring at large spatial scales both in identifying such population declines and in assessing population recovery. Gathering data on avian abundance and demography at large spatial scales usually relies on the efforts of large numbers of skilled volunteers. Volunteer studies based on ringing (for example Constant Effort Sites [CES
Managing Risk and Uncertainty in Large-Scale University Research Projects

Science.gov (United States)

Moore, Sharlissa; Shangraw, R. F., Jr.

2011-01-01

Both publicly and privately funded research projects managed by universities are growing in size and scope. Complex, large-scale projects (over $50 million) pose new management challenges and risks for universities. This paper explores the relationship between project success and a variety of factors in large-scale university projects. First, we…
Parallel Auxiliary Space AMG Solver for $H(div)$ Problems

Energy Technology Data Exchange (ETDEWEB)

Kolev, Tzanio V. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]; Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]

2012-12-18

We present a family of scalable preconditioners for matrices arising in the discretization of $H(div)$ problems using the lowest-order Raviart–Thomas finite elements. Our approach belongs to the class of “auxiliary space”-based methods and requires only the finite element stiffness matrix plus some minimal additional discretization information about the topology and orientation of mesh entities. We also provide a detailed algebraic description of the theory, parallel implementation, and different variants of this parallel auxiliary space divergence solver (ADS) and discuss its relations to the Hiptmair–Xu (HX) auxiliary space decomposition of $H(div)$ [SIAM J. Numer. Anal., 45 (2007), pp. 2483–2509] and to the auxiliary space Maxwell solver AMS [J. Comput. Math., 27 (2009), pp. 604–623]. Finally, an extensive set of numerical experiments demonstrates the robustness and scalability of our implementation on large-scale $H(div)$ problems with large jumps in the material coefficients.
Large-Scale Star Formation-Driven Outflows at 1<z<2 in the 3D-HST Survey

Science.gov (United States)

Lundgren, Britt; Brammer, G.; Van Dokkum, P. G.; Bezanson, R.; Franx, M.; Fumagalli, M.; Momcheva, I. G.; Nelson, E.; Skelton, R.; Wake, D.; Whitaker, K. E.; da Cunha, E.; Erb, D.; Fan, X.; Kriek, M.; Labbe, I.; Marchesini, D.; Patel, S.; Rix, H.; Schmidt, K.; van der Wel, A.

2013-01-01

We present evidence of large-scale outflows from three low-mass star-forming galaxies observed at z = 1.24, z = 1.35 and z = 1.75 in the 3D-HST Survey. Each of these galaxies is located within a projected physical distance of 60 kpc of the sight line to the quasar SDSS J123622.93+621526.6, which exhibits well-separated strong (W > 0.8 Å) MgII absorption systems matching precisely the redshifts of the three galaxies. We derive the star formation surface densities from the H-alpha emission in the WFC3 G141 grism observations of the galaxies and find that in each case the star formation surface density well exceeds 0.1 M☉ yr⁻¹ kpc⁻², the typical threshold for starburst galaxies in the local Universe. From a small but complete parallel census of the 0.65 […] 0.8 Å MgII covering fraction of star-forming galaxies at 1 […] 0.4 Å MgII absorbing gas around star-forming galaxies may evolve from […] 2 to the present, consistent with recent observations of an increasing collimation of star formation-driven outflows with time from […] 3.
A novel conceptual design of parallel nitrogen expansion liquefaction process for small-scale LNG (liquefied natural gas) plant in skid-mount packages

International Nuclear Information System (INIS)

He, Tianbiao; Ju, Yonglin

2014-01-01

The utilization of unconventional natural gas is still a great challenge for China because of the locations of its deposits and their small reserves. Liquefying unconventional natural gas in a small-scale LNG plant in skid-mount packages is therefore a good choice with great economic benefits. A novel conceptual design of a parallel nitrogen expansion liquefaction process for small-scale plants in skid-mount packages is proposed. The study first designs a process configuration. Then, a thermodynamic analysis of the process is conducted. Next, an optimization model based on a genetic algorithm is developed to optimize the process. Finally, the flexibility of the process is tested with two different feed gases. In conclusion, the proposed parallel nitrogen expansion liquefaction process can be used in small-scale LNG plants in skid-mount packages with high exergy efficiency and great economic benefits. - Highlights: • A novel design of a parallel nitrogen expansion liquefaction process is proposed. • A genetic algorithm is applied to optimize the novel process. • The unit energy consumption of the optimized process is 0.5163 kWh/Nm³. • The exergy efficiency of the optimized case is 0.3683. • The novel process has good flexibility for different feed gas conditions.
Parallelization of pressure equation solver for incompressible N-S equations

International Nuclear Information System (INIS)

Ichihara, Kiyoshi; Yokokawa, Mitsuo; Kaburaki, Hideo.

1996-03-01

A pressure equation solver in a code for 3-dimensional incompressible flow analysis has been parallelized using the red-black SOR method and the PCG method on the Fujitsu VPP500, a vector parallel computer with distributed memory. For comparison of scalability, the solver using the red-black SOR method has also been parallelized on the Intel Paragon, a scalar parallel computer with distributed memory. The red-black SOR method lost scalability on both the VPP500 and the Paragon when the number of processor elements was increased. The reason for the lack of scalability on both systems is the increasing communication time between processor elements. In addition, parallelization by DO-loop division lowers the vectorization efficiency on the VPP500. For an effective implementation on the VPP500, a large-scale problem with very long vectorized DO-loops in the parallel program should be solved. The PCG method with the red-black SOR method applied to the incomplete LU factorization (red-black PCG) requires more iteration steps than the normal PCG method with forward and backward substitution, despite the same number of floating-point operations in a DO-loop of the incomplete LU factorization. The parallelized red-black PCG method offers fewer advantages than the parallelized red-black SOR method when the computational region has fewer grid points, because of the low vectorization efficiency obtained with the red-black PCG method. (author)
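Red-black ordering colours the grid like a checkerboard. With a 5-point stencil, every red point depends only on black neighbours and vice versa, so all points of one colour can be updated at once. This is what makes the method parallelizable. It is also why a distributed implementation must exchange boundary values after every half-sweep, the communication cost the report identifies.

The sketch below solves the 2-D Poisson problem −∇²u = f on the unit square with u = 0 on the boundary. The grid size, relaxation factor ω, tolerance and function names are illustrative assumptions. They do not reproduce the report's 3-D solver.

```python
import math
from typing import List

Grid = List[List[float]]


def red_black_sor(f: Grid, h: float, omega: float = 1.5,
                  tol: float = 1e-8, max_iter: int = 10_000) -> Grid:
    """Solve -lap(u) = f with u = 0 on the boundary by red-black SOR.

    f and the returned u are square arrays that include boundary points.
    Within one colour every update reads only the other colour, so the
    loops over a colour could be split across processors.
    """
    size = len(f)
    u = [[0.0] * size for _ in range(size)]
    for _ in range(max_iter):
        max_change = 0.0
        for colour in (0, 1):  # 0 = red, 1 = black: (i + j) % 2 == colour
            for i in range(1, size - 1):
                start = 1 + (i + 1 + colour) % 2  # first interior j of colour
                for j in range(start, size - 1, 2):
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                                 + u[i][j + 1] + h * h * f[i][j])
                    change = omega * (gs - u[i][j])
                    u[i][j] += change
                    max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    return u


def manufactured_error(n: int = 16) -> float:
    """Max error against u = sin(pi x) sin(pi y), for which f = 2 pi^2 u."""
    h = 1.0 / n
    exact = [[math.sin(math.pi * i * h) * math.sin(math.pi * j * h)
              for j in range(n + 1)] for i in range(n + 1)]
    f = [[2.0 * math.pi ** 2 * e for e in row] for row in exact]
    u = red_black_sor(f, h)
    return max(abs(u[i][j] - exact[i][j])
               for i in range(n + 1) for j in range(n + 1))


print("max error on a 16 x 16 grid:", manufactured_error(16))
```

The remaining error is dominated by the O(h²) discretization, not by the iteration.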
Parallelizing ATLAS Reconstruction and Simulation: Issues and Optimization Solutions for Scaling on Multi- and Many-CPU Platforms

International Nuclear Information System (INIS)

Leggett, C; Jackson, K; Tatarkhanov, M; Yao, Y; Binet, S; Levinthal, D

2011-01-01

Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance to producing chip designs with multi- and many-core architectures. Further, the cores themselves can run multiple threads with a zero-overhead context switch, allowing low-level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non-uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the ATLAS event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores by means of event-based parallelism and final-stage I/O synchronization. However, initial studies on 8- and 16-core Intel architectures have shown marked non-linearities as parallel process counts increase, with reductions in event throughput of as much as 30% in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware-based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which, due to its size, places huge burdens on the memory infrastructure of today's processors.
Symmetries in eleven dimensional supergravity compactified on a parallelized seven sphere

CERN Document Server

Englert, F; Spindel, P

1983-01-01

We analyse, in eleven-dimensional supergravity compactified on S⁷, the spontaneous symmetry breaking induced by a spontaneous parallelization of the sphere. The eight supersymmetries are broken at a common scale and the SO(8) gauge group is reduced to Spin(7). Such a large residual symmetry has a simple geometrical significance revealed through use of octonions; this is explained in elementary terms.
Adaptive visualization for large-scale graph

International Nuclear Information System (INIS)

Nakamura, Hiroko; Shinano, Yuji; Ohzahata, Satoshi

2010-01-01

We propose an adaptive visualization technique for representing a large-scale hierarchical dataset within a limited display space. A hierarchical dataset has nodes and links that show the parent-child relationships between the nodes. These nodes and links are drawn using graphics primitives. When the number of these primitives is large, the structure of the hierarchical data is difficult to recognize because many primitives overlap within a limited region. To overcome this difficulty, the proposed technique selects an appropriate graph style according to the node density in each area. (author)
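The idea of choosing a graph style from local node density can be sketched as follows. The display is divided into a grid, nodes are counted per cell, and a style is picked per cell. This sketch rests on several assumptions that are not in the abstract: the grid partition, the single density threshold, the style names `"compact"` and `"node-link"`, and the function `choose_styles` are all hypothetical and are not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def choose_styles(
    positions: List[Point], width: float, height: float, grid: int, dense_threshold: int
) -> Dict[Cell, str]:
    """Assign a drawing style to each occupied grid cell based on node density.

    Assumes every node position lies within [0, width] x [0, height].
    """
    counts: Counter = Counter()
    for x, y in positions:
        # Clamp so nodes on the right/bottom edge land in the last cell.
        cx = min(int(x / width * grid), grid - 1)
        cy = min(int(y / height * grid), grid - 1)
        counts[(cx, cy)] += 1

    styles: Dict[Cell, str] = {}
    for cell, n in counts.items():
        # Hypothetical rule: crowded cells switch to a compact style that
        # avoids overlapping primitives; sparse cells keep explicit
        # node-link drawing, where overlap is not a problem.
        styles[cell] = "compact" if n > dense_threshold else "node-link"
    return styles
```

With five nodes clustered near one corner and a single node near the opposite corner, a 2 x 2 grid and a threshold of 3 draw the crowded cell compactly and keep the lone node as a node-link drawing.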
Stabilization Algorithms for Large-Scale Problems

DEFF Research Database (Denmark)

Jensen, Toke Koldborg

2006-01-01

The focus of the project is on stabilization of large-scale inverse problems where structured models and iterative algorithms are necessary for computing approximate solutions. For this purpose, we study various iterative Krylov methods and their abilities to produce regularized solutions. Some......-curve. This heuristic is implemented as a part of a larger algorithm which is developed in collaboration with G. Rodriguez and P. C. Hansen. Last, but not least, a large part of the project has, in different ways, revolved around the object-oriented Matlab toolbox MOORe Tools developed by PhD Michael Jacobsen. New...
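To illustrate how an iterative Krylov method can produce regularized solutions, here is a minimal NumPy sketch of CGLS, which applies conjugate gradients to the normal equations of min ||Ax - b||. It is not taken from the thesis or from MOORe Tools, and the function name `cgls` and its parameters are our own. The key point is that for ill-posed problems the iteration count acts as the regularization parameter. Early iterates capture the dominant, smooth components of the solution, while later ones increasingly fit noise.

```python
import numpy as np


def cgls(A: np.ndarray, b: np.ndarray, n_iter: int) -> list:
    """Run CGLS on min ||A x - b||_2 from x = 0 and return the iterates.

    Each returned iterate x_k minimizes the residual over a growing Krylov
    subspace, so stopping early yields a regularized approximation.
    """
    x = np.zeros(A.shape[1])
    r = b.astype(float).copy()  # residual b - A x (x = 0 initially)
    s = A.T @ r                 # residual of the normal equations
    p = s.copy()                # search direction
    gamma = s @ s
    iterates = []
    for _ in range(n_iter):
        if gamma == 0.0:        # already at a least-squares solution
            break
        q = A @ p
        alpha = gamma / (q @ q)     # step length along p
        x = x + alpha * p
        r = r - alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p  # next conjugate direction
        gamma = gamma_new
        iterates.append(x.copy())
    return iterates
```

In a regularization setting, one would run many iterations and pick the iterate selected by a parameter-choice rule. For a small well-conditioned system, CGLS simply reaches the least-squares solution in at most n iterations (n columns), which is a convenient correctness check.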
Influence of weathering and pre-existing large scale fractures on gravitational slope failure: insights from 3-D physical modelling

Directory of Open Access Journals (Sweden)

D. Bachmann

2004-01-01

Using a new 3-D physical modelling technique, we investigated the initiation and evolution of large-scale landslides in the presence of pre-existing large-scale fractures, taking into account the weakening of the slope material due to alteration/weathering. The modelling technique is based on specially developed, properly scaled analogue materials, as well as on an original vertical accelerator device that can increase the 'gravity acceleration' by up to a factor of 50. Weathering primarily affects the uppermost layers through water circulation. We simulated the effect of this process with two-part models. The shallower part represents the zone subject to homogeneous weathering and is made of a low-strength material with compressive strength σl. The deeper (core) part of the model is stronger and simulates intact rock. When such a model was subjected to the gravity force, deformation occurred only in its upper (low-strength) layer. In another set of experiments, narrow low-strength (σw) planar zones sub-parallel to the slope surface (σw < σl) were introduced into the model's superficial low-strength layer to simulate localized, highly weathered zones. In this configuration, landslides were initiated much more easily (at a lower 'gravity force'), were shallower, and had a smaller horizontal size, largely defined by the size of the weak zones. Pre-existing fractures were introduced into the model by cutting it along a given plane. They proved to have little influence on slope stability, except when they were associated with highly weathered zones; in this latter case, the fractures laterally limited the slides. The initiation of deep-seated rockslides is thus directly determined by the mechanical structure of the hillslope's uppermost levels, and especially by the presence of weak zones due to weathering. The large-scale fractures play a more passive role and can only influence the shape and volume of the sliding units.
Design study on sodium cooled large-scale reactor

International Nuclear Information System (INIS)

Murakami, Tsutomu; Hishida, Masahiko; Kisohara, Naoyuki

2004-07-01

In Phase 1 of the 'Feasibility Studies on Commercialized Fast Reactor Cycle Systems (F/S)', an advanced loop-type reactor was selected as a promising concept for a sodium-cooled large-scale reactor that could fulfill the design requirements of the F/S. In Phase 2, design improvements aimed at further cost reduction and at establishing the plant concept have been made. This report summarizes the results of the design study on the sodium-cooled large-scale reactor performed in JFY2003, the third year of Phase 2. In the JFY2003 design study, critical issues related to safety, structural integrity and thermal hydraulics that had been identified in the previous fiscal year were examined, and the plant concept was modified accordingly. Furthermore, the fundamental specifications of the main systems and components were set, and the economics were evaluated. In addition, because an interim evaluation of the candidate concepts for the FBR fuel cycle is to be conducted, the cost effectiveness and achievability of the development goals were evaluated, and data on the three large-scale reactor candidate concepts were prepared. As a result of this study, a plant concept for the sodium-cooled large-scale reactor has been constructed that shows promise of satisfying the economic goal (construction cost: less than 200,000 yen/kWe, etc.) and of resolving the critical issues. From now on, reflecting the results of elemental experiments, the preliminary conceptual design of this plant will proceed toward the selection for narrowing down the candidate concepts at the end of Phase 2. (author)
Design study on sodium-cooled large-scale reactor

International Nuclear Information System (INIS)

Shimakawa, Yoshio; Nibe, Nobuaki; Hori, Toru

2002-05-01

In Phase 1 of the 'Feasibility Study on Commercialized Fast Reactor Cycle Systems (F/S)', an advanced loop-type reactor was selected as a promising concept for a sodium-cooled large-scale reactor that could fulfill the design requirements of the F/S. In Phase 2 of the F/S, the plan is to proceed with a preliminary conceptual design of a sodium-cooled large-scale reactor based on the design of the advanced loop-type reactor. Through this design study, the aim is to construct a plant concept that demonstrates its attractiveness and competitiveness as a commercial reactor. This report summarizes the results of the design study on the sodium-cooled large-scale reactor performed in JFY2001, the first year of Phase 2. In the JFY2001 design study, a plant concept was constructed based on the design of the advanced loop-type reactor, and the fundamental specifications of the main systems and components were set. Furthermore, critical issues related to safety, structural integrity, thermal hydraulics, operability, maintainability and economics were examined and evaluated. As a result of this study, a plant concept for the sodium-cooled large-scale reactor has been constructed that shows promise of satisfying the economic goal (construction cost: less than 200,000 yen/kWe, etc.) and of resolving the critical issues. From now on, reflecting the results of elemental experiments, the preliminary conceptual design of this plant will proceed toward the selection for narrowing down the candidate concepts at the end of Phase 2. (author)
LSD: Large Survey Database framework

Science.gov (United States)

Juric, Mario

2012-09-01

The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to more than 10^2 nodes and can be made to function in "shared nothing" architectures.
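Positionally indexed cross-matching of the kind LSD is optimized for can be illustrated with the declination "zones" technique. It is swapped in here as a simple, well-known stand-in, and LSD's own spatial partitioning may differ. The names `cross_match`, `zone_of` and `ang_sep_deg` are hypothetical. Catalog B is bucketed into declination bands at least as tall as the match radius. Any true match then lies in the same band or an adjacent one, so each object in A is compared only against three bands rather than the whole catalog.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Obj = Tuple[str, float, float]  # (id, ra_deg, dec_deg)


def ang_sep_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def zone_of(dec: float, zone_height: float) -> int:
    # Index of the declination band containing dec.
    return int(math.floor((dec + 90.0) / zone_height))


def cross_match(a: List[Obj], b: List[Obj], radius_deg: float) -> List[Tuple[str, str]]:
    """Return all (a_id, b_id) pairs closer than radius_deg."""
    zone_height = radius_deg  # bands as tall as the radius => neighbors suffice
    zones: Dict[int, List[Obj]] = defaultdict(list)
    for obj in b:
        zones[zone_of(obj[2], zone_height)].append(obj)

    matches = []
    for aid, ra, dec in a:
        z = zone_of(dec, zone_height)
        # A separation <= r implies |delta dec| <= r, i.e. at most one band away.
        for dz in (-1, 0, 1):
            for bid, rb, db in zones.get(z + dz, ()):
                if ang_sep_deg(ra, dec, rb, db) <= radius_deg:
                    matches.append((aid, bid))
    return matches
```

Because each band can be processed independently, bands are a natural unit of work to distribute across nodes in a parallel sweep. The sketch runs serially for clarity, and a production system would also avoid the linear scan inside each band.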