parallel structurally-symmetric sparse: Topics by WorldWideScience.org

Sample records for parallel structurally-symmetric sparse

Massively parallel sparse matrix function calculations with NTPoly

Science.gov (United States)

Dawson, William; Nakajima, Takahito

2018-04-01

We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well-developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization-free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication-avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large-scale calculations on the K computer.
Parallel transposition of sparse data structures

DEFF Research Database (Denmark)

Wang, Hao; Liu, Weifeng; Hou, Kaixi

2016-01-01

Many applications in computational sciences and social sciences exploit sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel tr...... transposition in the latest vendor-supplied library on an Intel multicore CPU platform, and the MergeTrans approach achieves an average 3.4-fold (up to 11.7-fold) speedup on an Intel Xeon Phi many-core processor....
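
For orientation, the transposition primitive discussed in this record can be sketched serially. The listing below is a minimal counting-sort transposition of a matrix in compressed sparse row (CSR) form; it is not the MergeTrans approach named in the abstract, and the function name `csr_transpose` and its argument layout are assumptions made for illustration only.

```python
def csr_transpose(n_rows, n_cols, row_ptr, col_idx, values):
    """Transpose a CSR matrix; returns the CSR arrays of the transpose.

    Serial sketch only: count entries per column, prefix-sum the counts
    into row pointers of the transpose, then scatter each entry.
    """
    nnz = len(values)

    # Step 1: count how many entries fall in each column of A
    # (these become the row lengths of A^T).
    counts = [0] * n_cols
    for c in col_idx:
        counts[c] += 1

    # Step 2: exclusive prefix sum gives the row pointers of A^T.
    row_ptr_t = [0] * (n_cols + 1)
    for c in range(n_cols):
        row_ptr_t[c + 1] = row_ptr_t[c] + counts[c]

    # Step 3: scatter every entry (r, c, v) of A to row c of A^T.
    next_slot = row_ptr_t[:-1]  # slicing copies, so row_ptr_t is untouched
    col_idx_t = [0] * nnz
    values_t = [0.0] * nnz
    for r in range(n_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            c = col_idx[k]
            dst = next_slot[c]
            col_idx_t[dst] = r
            values_t[dst] = values[k]
            next_slot[c] += 1
    return row_ptr_t, col_idx_t, values_t
```

The scatter in step 3 is where concurrent writers would collide in a parallel version, which is one reason the operation is harder to parallelize than SpMV.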
Partitioning sparse rectangular matrices for parallel processing

Energy Technology Data Exchange (ETDEWEB)

Kolda, T.G.

1998-05-01

The authors are interested in partitioning sparse rectangular matrices for parallel processing. The partitioning problem has been well-studied in the square symmetric case, but the rectangular problem has received very little attention. They will formalize the rectangular matrix partitioning problem and discuss several methods for solving it. They will extend the spectral partitioning method for symmetric matrices to the rectangular case and compare this method to three new methods -- the alternating partitioning method and two hybrid methods. The hybrid methods will be shown to be best.
Storage of sparse files using parallel log-structured file system

Science.gov (United States)

Bent, John M.; Faibish, Sorin; Grider, Gary; Torres, Aaron

2017-11-07

A sparse file is stored without holes by storing a data portion of the sparse file using a parallel log-structured file system; and generating an index entry for the data portion, the index entry comprising a logical offset, physical offset and length of the data portion. The holes can be restored to the sparse file upon a reading of the sparse file. The data portion can be stored at a logical end of the sparse file. Additional storage efficiency can optionally be achieved by (i) detecting a write pattern for a plurality of the data portions and generating a single patterned index entry for the plurality of the patterned data portions; and/or (ii) storing the patterned index entries for a plurality of the sparse files in a single directory, wherein each entry in the single directory comprises an identifier of a corresponding sparse file.
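
The index-entry idea in this record can be illustrated with a small in-memory sketch. It assumes a single append-only log and one index entry per write; the class and field names (`SparseLogFile`, `IndexEntry`) are hypothetical and do not correspond to any real parallel log-structured file system API, and the pattern-detection optimizations mentioned in the abstract are not modeled.

```python
from dataclasses import dataclass


@dataclass
class IndexEntry:
    logical_offset: int   # where the data sits in the sparse file
    physical_offset: int  # where the data sits in the log
    length: int           # number of bytes in the data portion


class SparseLogFile:
    """Toy model: data portions are appended to a log, holes are never stored."""

    def __init__(self):
        self.log = bytearray()
        self.index = []
        self.logical_size = 0

    def write(self, logical_offset, data):
        # Append the data portion at the end of the log and record an entry.
        entry = IndexEntry(logical_offset, len(self.log), len(data))
        self.log.extend(data)
        self.index.append(entry)
        self.logical_size = max(self.logical_size, logical_offset + len(data))

    def read_all(self):
        # Restore the holes as zero bytes, then copy each data portion back.
        out = bytearray(self.logical_size)
        for e in self.index:
            chunk = self.log[e.physical_offset:e.physical_offset + e.length]
            out[e.logical_offset:e.logical_offset + e.length] = chunk
        return bytes(out)
```

In this sketch, writing 2 bytes at logical offset 4 and 1 byte at offset 10 stores only 3 bytes in the log, while `read_all()` returns an 11-byte file with the holes filled by zeros.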
A new scheduling algorithm for parallel sparse LU factorization with static pivoting

Energy Technology Data Exchange (ETDEWEB)

Grigori, Laura; Li, Xiaoye S.

2002-08-20

In this paper we present a static scheduling algorithm for parallel sparse LU factorization with static pivoting. The algorithm is divided into mapping and scheduling phases, using the symmetric pruned graphs of L' and U to represent dependencies. The scheduling algorithm is designed for driving the parallel execution of the factorization on a distributed-memory architecture. Experimental results and comparisons with SuperLU_DIST are reported after applying this algorithm on real-world application matrices on an IBM SP RS/6000 distributed memory machine.
Parallel sparse direct solver for integrated circuit simulation

CERN Document Server

Chen, Xiaoming; Yang, Huazhong

2017-01-01

This book describes algorithmic methods and parallelization techniques to design a parallel sparse direct solver which is specifically targeted at integrated circuit simulation problems. The authors describe a complete flow and detailed parallel algorithms of the sparse direct solver. They also show how to improve the performance by simple but effective numerical techniques. The sparse direct solver techniques described can be applied to any SPICE-like integrated circuit simulator and have been proven to be high-performance in actual circuit simulation. Readers will benefit from the state-of-the-art parallel integrated circuit simulation techniques described in this book, especially the latest parallel sparse matrix solution techniques. · Introduces complicated algorithms of sparse linear solvers, using concise principles and simple examples, without complex theory or lengthy derivations; · Describes a parallel sparse direct solver that can be adopted to accelerate any SPICE-like integrated circuit simulato...
New Structural Representation and Digital-Analysis Platform for Symmetrical Parallel Mechanisms

Directory of Open Access Journals (Sweden)

Wenao Cao

2013-05-01

An automatic design platform capable of automatic structural analysis, structural synthesis and the application of parallel mechanisms will be a great aid in the conceptual design of mechanisms, though up to now such a platform has only existed as an idea. The work in this paper constitutes part of such a platform. Based on screw theory and a new structural representation method proposed here, which builds a one-to-one correspondence between the strings of representative characters and the kinematic structures of symmetrical parallel mechanisms (SPMs), this paper develops a fully automatic approach for mobility (degree-of-freedom) analysis, and further establishes an automatic digital-analysis platform for SPMs. With this platform, users simply have to enter the strings of representative characters, and the kinematic structures of the SPMs will be generated and displayed automatically, and the mobility and its properties will also be analysed and displayed automatically. Typical examples are provided to show the effectiveness of the approach.
Parallel Sparse Matrix - Vector Product

DEFF Research Database (Denmark)

Alexandersen, Joe; Lazarov, Boyan Stefanov; Dammann, Bernd

This technical report contains a case study of a sparse matrix-vector product routine, implemented for parallel execution on a compute cluster with both pure MPI and hybrid MPI-OpenMP solutions. C++ classes for sparse data types were developed and the report shows how these classes can be used...
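
As a point of reference, a sparse matrix-vector product over CSR storage can be sketched in a few lines. This is a serial Python illustration, not the report's C++ classes or its MPI/OpenMP code; the function name `csr_matvec_rows` and the row-range parameters are assumptions chosen to show how rows could be divided among workers.

```python
def csr_matvec_rows(row_ptr, col_idx, values, x, start, stop):
    """Compute y[start:stop] of y = A @ x for a CSR matrix A.

    Each output row depends only on its own slice of col_idx/values,
    so disjoint row ranges can be computed independently.
    """
    y_part = []
    for i in range(start, stop):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y_part.append(acc)
    return y_part


def csr_matvec(row_ptr, col_idx, values, x):
    """Full product, expressed as a single row range."""
    return csr_matvec_rows(row_ptr, col_idx, values, x, 0, len(row_ptr) - 1)
```

In a distributed setting each process would typically own one row range and the entries of `x` it needs, but how the report partitions the data is not stated in this truncated abstract.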
Iterative algorithms for large sparse linear systems on parallel computers

Science.gov (United States)

Adams, L. M.

1982-01-01

Algorithms are developed for assembling in parallel the sparse systems of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
On the Automatic Parallelization of Sparse and Irregular Fortran Programs

Directory of Open Access Journals (Sweden)

Yuan Lin

1999-01-01

Automatic parallelization is usually believed to be less effective at exploiting implicit parallelism in sparse/irregular programs than in their dense/regular counterparts. However, not much is really known because there have been few research reports on this topic. In this work, we have studied the possibility of using an automatic parallelizing compiler to detect the parallelism in sparse/irregular programs. The study with a collection of sparse/irregular programs led us to some common loop patterns. Based on these patterns new techniques were derived that produced good speedups when manually applied to our benchmark codes. More importantly, these parallelization methods can be implemented in a parallelizing compiler and can be applied automatically.
Sparse BLIP: BLind Iterative Parallel imaging reconstruction using compressed sensing.

Science.gov (United States)

She, Huajun; Chen, Rong-Rong; Liang, Dong; DiBella, Edward V R; Ying, Leslie

2014-02-01

To develop a sensitivity-based parallel imaging reconstruction method to reconstruct iteratively both the coil sensitivities and MR image simultaneously based on their prior information. The parallel magnetic resonance imaging reconstruction problem can be formulated as a multichannel sampling problem where solutions are sought analytically. However, the channel functions given by the coil sensitivities in parallel imaging are not known exactly and the estimation error usually leads to artifacts. In this study, we propose a new reconstruction algorithm, termed Sparse BLind Iterative Parallel, for blind iterative parallel imaging reconstruction using compressed sensing. The proposed algorithm reconstructs both the sensitivity functions and the image simultaneously from undersampled data. It enforces the sparseness constraint in the image as done in compressed sensing, but is different from compressed sensing in that the sensing matrix is unknown and an additional constraint is enforced on the sensitivities as well. Both phantom and in vivo imaging experiments were carried out with retrospective undersampling to evaluate the performance of the proposed method. Experiments show improvement in Sparse BLind Iterative Parallel reconstruction when compared with Sparse SENSE, JSENSE, IRGN-TV, and L1-SPIRiT reconstructions with the same number of measurements. The proposed Sparse BLind Iterative Parallel algorithm reduces the reconstruction errors when compared to the state-of-the-art parallel imaging methods. Copyright © 2013 Wiley Periodicals, Inc.
Sparse-matrix factorizations for fast symmetric Fourier transforms

International Nuclear Information System (INIS)

Sequel, J.

1987-01-01

This work proposes new fast algorithms computing the discrete Fourier transform of certain families of symmetric sequences. Sequences commonly found in problems of structure determination by x-ray crystallography and in numerical solutions of boundary-value problems in partial differential equations are dealt with. In the algorithms presented, the redundancies in the input and output data, due to the presence of symmetries in the input data sequence, were eliminated. Using ring-theoretical methods, a matrix representation is obtained for the remaining calculations, which factors as the product of a complex block-diagonal matrix times an integral matrix. A basic two-step algorithm scheme arises from this factorization, with a first step consisting of pre-additions and a second step containing the calculations involved in computing with the blocks in the block-diagonal factor. These blocks are structured as block-Hankel matrices, and two sparse-matrix factoring formulas are developed in order to diminish their arithmetic complexity.
Algorithms for sparse, symmetric, definite quadratic lambda-matrix eigenproblems

International Nuclear Information System (INIS)

Scott, D.S.; Ward, R.C.

1981-01-01

Methods are presented for computing eigenpairs of the quadratic lambda-matrix, $M\lambda^2 + C\lambda + K$, where M, C, and K are large and sparse, and have special symmetry-type properties. These properties are sufficient to ensure that all the eigenvalues are real and that theory analogous to the standard symmetric eigenproblem exists. The methods employ some standard techniques such as partial tri-diagonalization via the Lanczos Method and subsequent eigenpair calculation, shift-and-invert strategy and subspace iteration. The methods also employ some new techniques such as Rayleigh-Ritz quadratic roots and the inertia of symmetric, definite, quadratic lambda-matrices.
Massive Asynchronous Parallelization of Sparse Matrix Factorizations

Energy Technology Data Exchange (ETDEWEB)

Chow, Edmond [Georgia Inst. of Technology, Atlanta, GA (United States)

2018-01-08

Solving sparse problems is at the core of many DOE computational science applications. We focus on the challenge of developing sparse algorithms that can fully exploit the parallelism in extreme-scale computing systems, in particular systems with massive numbers of cores per node. Our approach is to express a sparse matrix factorization as a large number of bilinear constraint equations, and then to solve these equations via an asynchronous iterative method. The unknowns in these equations are the matrix entries of the factorization that is desired.
Graph Transformation and Designing Parallel Sparse Matrix Algorithms beyond Data Dependence Analysis

Directory of Open Access Journals (Sweden)

H.X. Lin

2004-01-01

Algorithms are often parallelized based on data dependence analysis manually or by means of parallel compilers. Some vector/matrix computations such as the matrix-vector products with simple data dependence structures (data parallelism) can be easily parallelized. For problems with more complicated data dependence structures, parallelization is less straightforward. The data dependence graph is a powerful means for designing and analyzing parallel algorithms. However, for sparse matrix computations, parallelization based solely on exploiting the existing parallelism in an algorithm does not always give satisfactory results. For example, the conventional Gaussian elimination algorithm for the solution of a tri-diagonal system is inherently sequential, so algorithms have to be designed specifically for parallel computation. After briefly reviewing different parallelization approaches, a powerful graph formalism for designing parallel algorithms is introduced. This formalism will be discussed using a tri-diagonal system as an example. Its application to general matrix computations is also discussed. Its power in designing parallel algorithms beyond the ability of data dependence analysis is shown by means of a new algorithm called ACER (Alternating Cyclic Elimination and Reduction algorithm).
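
To make the "inherently sequential" remark concrete, the conventional elimination for a tri-diagonal system (often called the Thomas algorithm) can be sketched as follows. This is the baseline the abstract contrasts against, not ACER; the function name `thomas_solve` and the three-array storage are assumptions, and the sketch does no pivoting, so it presumes a system (for example, diagonally dominant) for which that is safe.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tri-diagonal system by forward elimination and back substitution.

    lower[i-1] is A[i][i-1], diag[i] is A[i][i], upper[i] is A[i][i+1].
    Every step i needs the result of step i-1, which is the loop-carried
    dependence that prevents straightforward parallelization.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    if n > 1:
        c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]

    # Forward sweep: eliminate the sub-diagonal, one row after another.
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom

    # Back substitution: again strictly sequential, from the last row up.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Both loops carry a dependence from one iteration to the next, so a data dependence analysis of this code finds no parallelism to exploit.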
Parallelism in matrix computations

CERN Document Server

Gallopoulos, Efstratios; Sameh, Ahmed H

2016-01-01

This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded, Vandermonde, Toeplitz, and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...
P-SPARSLIB: A parallel sparse iterative solution package

Energy Technology Data Exchange (ETDEWEB)

Saad, Y. [Univ. of Minnesota, Minneapolis, MN (United States)

1994-12-31

Iterative methods are gaining popularity in engineering and sciences at a time when the computational environment is changing rapidly. P-SPARSLIB is a project to build a software library for sparse matrix computations on parallel computers. The emphasis is on iterative methods and the use of distributed sparse matrices, an extension of the domain decomposition approach to general sparse matrices. One of the goals of this project is to develop a software package geared towards specific applications. For example, the author will test the performance and usefulness of P-SPARSLIB modules on linear systems arising from CFD applications. Equally important is the goal of portability. In the long run, the author wishes to ensure that this package is portable on a variety of platforms, including SIMD environments and shared memory environments.
User's Manual for PCSMS (Parallel Complex Sparse Matrix Solver). Version 1.

Science.gov (United States)

Reddy, C. J.

2000-01-01

PCSMS (Parallel Complex Sparse Matrix Solver) is a computer code written to make use of the existing real sparse direct solvers to solve complex, sparse matrix linear equations. PCSMS converts complex matrices into real matrices and uses real, sparse direct matrix solvers to factor and solve the real matrices. The solution vector is reconverted to complex numbers. Though this utility is written for Silicon Graphics (SGI) real sparse matrix solution routines, it is general in nature and can be easily modified to work with any real sparse matrix solver. The User's Manual is written to make the user acquainted with the installation and operation of the code. Driver routines are given to aid the users to integrate PCSMS routines in their own codes.
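
One common way to convert a complex system into a real one is the block form [[Re A, -Im A], [Im A, Re A]] acting on the stacked real and imaginary parts of the unknown. The sketch below uses that form with a small dense solver purely for illustration; whether PCSMS uses this exact layout is an assumption, the function names are hypothetical, and a real code would pass the doubled matrix to a sparse direct solver instead.

```python
def to_real_system(a, b):
    """Build the 2n x 2n real system equivalent to the complex system a x = b."""
    n = len(a)
    big = [[0.0] * (2 * n) for _ in range(2 * n)]
    rhs = [0.0] * (2 * n)
    for i in range(n):
        for j in range(n):
            z = complex(a[i][j])
            big[i][j] = z.real          # top-left block:     Re A
            big[i][j + n] = -z.imag     # top-right block:   -Im A
            big[i + n][j] = z.imag      # bottom-left block:  Im A
            big[i + n][j + n] = z.real  # bottom-right block: Re A
        rhs[i] = complex(b[i]).real
        rhs[i + n] = complex(b[i]).imag
    return big, rhs


def solve_dense(m, rhs):
    """Stand-in real solver: Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def solve_complex_via_real(a, b):
    """Convert to real form, solve, and reassemble the complex solution."""
    n = len(a)
    big, rhs = to_real_system(a, b)
    y = solve_dense(big, rhs)
    return [complex(y[i], y[i + n]) for i in range(n)]
```

The price of reusing a real solver this way is a system with twice the dimension and four times the stored entries of the complex original.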
Learning Joint-Sparse Codes for Calibration-Free Parallel MR Imaging.

Science.gov (United States)

Wang, Shanshan; Tan, Sha; Gao, Yuan; Liu, Qiegen; Ying, Leslie; Xiao, Taohui; Liu, Yuanyuan; Liu, Xin; Zheng, Hairong; Liang, Dong

2018-01-01

The integration of compressed sensing and parallel imaging (CS-PI) has shown an increased popularity in recent years to accelerate magnetic resonance (MR) imaging. Among them, calibration-free techniques have presented encouraging performances due to their capability in robustly handling the sensitivity information. Unfortunately, existing calibration-free methods have only explored joint-sparsity with direct analysis transform projections. To further exploit joint-sparsity and improve reconstruction accuracy, this paper proposes to Learn joINt-sparse coDes for caliBration-free parallEl mR imaGing (LINDBERG) by modeling the parallel MR imaging problem as an $\ell_2$-$F$-$\ell_{2,1}$ minimization objective with an $\ell_2$ norm constraining data fidelity, Frobenius norm enforcing sparse representation error and the mixed $\ell_{2,1}$ norm triggering joint sparsity across multichannels. A corresponding algorithm has been developed to alternately update the sparse representation, sensitivity encoded images and K-space data. Then, the final image is produced as the square root of sum of squares of all channel images. Experimental results on both physical phantom and in vivo data sets show that the proposed method is comparable and even superior to state-of-the-art CS-PI reconstruction approaches. Specifically, LINDBERG has presented strong capability in suppressing noise and artifacts while reconstructing MR images from highly undersampled multichannel measurements.
Parallel preconditioning techniques for sparse CG solvers

Energy Technology Data Exchange (ETDEWEB)

Basermann, A.; Reichel, B.; Schelthoff, C. [Central Institute for Applied Mathematics, Juelich (Germany)

1996-12-31

Conjugate gradient (CG) methods to solve sparse systems of linear equations play an important role in numerical methods for solving discretized partial differential equations. The large size and the condition of many technical or physical applications in this area result in the need for efficient parallelization and preconditioning techniques of the CG method. In particular, for very ill-conditioned matrices, sophisticated preconditioners are necessary to obtain both acceptable convergence and accuracy of CG. Here, we investigate variants of polynomial and incomplete Cholesky preconditioners that markedly reduce the iterations of the simply diagonally scaled CG and are shown to be well suited for massively parallel machines.
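
The "simply diagonally scaled CG" used as the baseline in this record corresponds to CG with a diagonal (Jacobi) preconditioner. The sketch below shows that baseline only, not the polynomial or incomplete Cholesky variants the authors investigate; the function name `pcg_diag`, the callable `matvec` argument, and the absolute residual tolerance are assumptions for illustration.

```python
def pcg_diag(matvec, diag, b, tol=1e-10, max_iter=1000):
    """Diagonally preconditioned CG for a symmetric positive definite system.

    matvec(v) returns A @ v as a list; diag holds the diagonal of A.
    Returns the approximate solution and the number of iterations used.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                               # residual b - A x with x = 0
    z = [r[i] / diag[i] for i in range(n)]    # apply M^{-1} = diag(A)^{-1}
    p = list(z)
    rz = sum(r[i] * z[i] for i in range(n))

    for it in range(max_iter):
        # Stop once the residual 2-norm is small enough.
        if sum(v * v for v in r) ** 0.5 <= tol:
            return x, it
        ap = matvec(p)
        alpha = rz / sum(p[i] * ap[i] for i in range(n))
        for i in range(n):
            x[i] += alpha * p[i]
            r[i] -= alpha * ap[i]
        z = [r[i] / diag[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter
```

Applying the diagonal preconditioner is an elementwise division with no communication between entries, which is why it parallelizes trivially; stronger preconditioners trade that simplicity for fewer iterations.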

Structural Sparse Tracking

KAUST Repository

Zhang, Tianzhu

2015-06-01

Sparse representation has been applied to visual tracking by finding the best target candidate with minimal reconstruction error by use of target templates. However, most sparse representation based trackers only consider holistic or local representations and do not make full use of the intrinsic structure among and inside target candidates, thereby making the representation less effective when similar objects appear or under occlusion. In this paper, we propose a novel Structural Sparse Tracking (SST) algorithm, which not only exploits the intrinsic relationship among target candidates and their local patches to learn their sparse representations jointly, but also preserves the spatial layout structure among the local patches inside each target candidate. We show that our SST algorithm accommodates most existing sparse trackers with the respective merits. Both qualitative and quantitative evaluations on challenging benchmark image sequences demonstrate that the proposed SST algorithm performs favorably against several state-of-the-art methods.
Sparse Probabilistic Parallel Factor Analysis for the Modeling of PET and Task-fMRI Data

DEFF Research Database (Denmark)

Beliveau, Vincent; Papoutsakis, Georgios; Hinrich, Jesper Løve

2017-01-01

Modern datasets are often multiway in nature and can contain patterns common to a mode of the data (e.g. space, time, and subjects). Multiway decompositions such as parallel factor analysis (PARAFAC) take into account the intrinsic structure of the data, and sparse versions of these methods improv...
Parallelized preconditioned BiCGStab solution of sparse linear system equations in F-COBRA-TF

International Nuclear Information System (INIS)

Geemert, Rene van; Glück, Markus; Riedmann, Michael; Gabriel, Harry

2011-01-01

Recently, the in-house development of a preconditioned and parallelized BiCGStab solver has been pursued successfully in AREVA’s advanced sub-channel code F-COBRA-TF. This solver can be run either in a sequential computation mode on a single CPU, or in a parallel computation mode on multiple parallel CPUs. The developed procedure enables the computation of several thousands of successive sparse linear system solutions in F-COBRA-TF with acceptable wall clock run times. The current paper provides general information about F-COBRA-TF in terms of modeling capabilities and application areas, and points out where the relevance arises for the efficient iterative solution of sparse linear systems. Furthermore, the preconditioning and parallelization strategies in the developed BiCGStab iterative solution approach are discussed. The paper is concluded with a number of verification examples. (author)
Efficient diagonalization of the sparse matrices produced within the framework of the UK R-matrix molecular codes

Science.gov (United States)

Galiatsatos, P. G.; Tennyson, J.

2012-11-01

The most time-consuming step within the framework of the UK R-matrix molecular codes is that of the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive based parallel language, the MPI function based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and finally a parallel sparse matrix-vector product (PSMV). The efficient application of the previous techniques relies on two important facts: the sparsity of the matrix is large enough (more than 98%), and only a small part of the matrix spectrum is needed to obtain converged results.
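
A coordinate-format variation for real symmetric matrices can be illustrated by storing only the entries on or above the diagonal and applying each off-diagonal entry twice in the matrix-vector product. The abstract does not spell out the authors' variation, so the upper-triangle convention and the function name `sym_coo_matvec` below are assumptions.

```python
def sym_coo_matvec(n, rows, cols, vals, x):
    """Compute y = A @ x where A is symmetric and only entries with
    rows[k] <= cols[k] are stored in coordinate (COO) form."""
    y = [0.0] * n
    for i, j, v in zip(rows, cols, vals):
        y[i] += v * x[j]      # contribution of the stored entry A[i][j]
        if i != j:
            y[j] += v * x[i]  # mirrored contribution of A[j][i]
    return y
```

Iterative eigensolvers of the ARPACK kind only need the action of the matrix on a vector, so a routine of this shape is the piece a caller would supply, while the halved storage matters for matrices of the size described here.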
Sparse structure regularized ranking

KAUST Repository

Wang, Jim Jing-Yan; Sun, Yijun; Gao, Xin

2014-01-01

Learning ranking scores is critical for the multimedia database retrieval problem. In this paper, we propose a novel ranking score learning algorithm by exploring the sparse structure and using it to regularize ranking scores. To explore the sparse
Sparse structure regularized ranking

KAUST Repository

Wang, Jim Jing-Yan

2014-04-17

Learning ranking scores is critical for the multimedia database retrieval problem. In this paper, we propose a novel ranking score learning algorithm by exploring the sparse structure and using it to regularize ranking scores. To explore the sparse structure, we assume that each multimedia object could be represented as a sparse linear combination of all other objects, and combination coefficients are regarded as a similarity measure between objects and used to regularize their ranking scores. Moreover, we propose to learn the sparse combination coefficients and the ranking scores simultaneously. A unified objective function is constructed with regard to both the combination coefficients and the ranking scores, and is optimized by an iterative algorithm. Experiments on two multimedia database retrieval data sets demonstrate the significant improvements of the proposed algorithm over state-of-the-art ranking score learning algorithms.
A parallel algorithm for the non-symmetric eigenvalue problem

International Nuclear Information System (INIS)

Sidani, M.M.

1991-01-01

An algorithm is presented for the solution of the non-symmetric eigenvalue problem. The algorithm is based on a divide-and-conquer procedure that provides initial approximations to the eigenpairs, which are then refined using Newton iterations. Since the smaller subproblems can be solved independently, and since Newton iterations with different initial guesses can be started simultaneously, the algorithm - unlike the standard QR method - is ideal for parallel computers. The author also reports on his investigation of deflation methods designed to obtain further eigenpairs if needed. Numerical results from implementations on a host of parallel machines (distributed and shared-memory) are presented
Parallel coupling of symmetric and asymmetric exclusion processes

International Nuclear Information System (INIS)

Tsekouras, K; Kolomeisky, A B

2008-01-01

A system consisting of two parallel coupled channels where particles in one of them follow the rules of totally asymmetric exclusion processes (TASEP) and in another one move as in symmetric simple exclusion processes (SSEP) is investigated theoretically. Particles interact with each other via hard-core exclusion potential, and in the asymmetric channel they can only hop in one direction, while on the symmetric lattice particles jump in both directions with equal probabilities. Inter-channel transitions are also allowed at every site of both lattices. Stationary state properties of the system are solved exactly in the limit of strong couplings between the channels. It is shown that strong symmetric couplings between totally asymmetric and symmetric channels lead to an effective partially asymmetric simple exclusion process (PASEP) and properties of both channels become almost identical. However, strong asymmetric couplings between symmetric and asymmetric channels yield an effective TASEP with nonzero particle flux in the asymmetric channel and zero flux on the symmetric lattice. For intermediate strength of couplings between the lattices a vertical-cluster mean-field method is developed. This approximate approach treats exactly particle dynamics during the vertical transitions between the channels and it neglects the correlations along the channels. Our calculations show that in all cases there are three stationary phases defined by particle dynamics at entrances, at exits or in the bulk of the system, while phase boundaries depend on the strength and symmetry of couplings between the channels. Extensive Monte Carlo computer simulations strongly support our theoretical predictions. Theoretical calculations and computer simulations predict that inter-channel couplings have a strong effect on stationary properties. It is also argued that our results might be relevant for understanding multi-particle dynamics of motor proteins
Parallel Reservoir Simulations with Sparse Grid Techniques and Applications to Wormhole Propagation

KAUST Repository

Wu, Yuanqing

2015-09-08

In this work, two topics of reservoir simulations are discussed. The first topic is the two-phase compositional flow simulation in hydrocarbon reservoir. The major obstacle that impedes the applicability of the simulation code is the long run time of the simulation procedure, and thus speeding up the simulation code is necessary. Two means are demonstrated to address the problem: parallelism in physical space and the application of sparse grids in parameter space. The parallel code can gain satisfactory scalability, and the sparse grids can remove the bottleneck of flash calculations. Instead of carrying out the flash calculation in each time step of the simulation, a sparse grid approximation of all possible results of the flash calculation is generated before the simulation. Then the constructed surrogate model is evaluated to approximate the flash calculation results during the simulation. The second topic is the wormhole propagation simulation in carbonate reservoir. In this work, different from the traditional simulation technique relying on the Darcy framework, we propose a new framework called Darcy-Brinkman-Forchheimer framework to simulate wormhole propagation. Furthermore, to process the large quantity of cells in the simulation grid and shorten the long simulation time of the traditional serial code, standard domain-based parallelism is employed, using the Hypre multigrid library. In addition to that, a new technique called “experimenting field approach” to set coefficients in the model equations is introduced. In the 2D dissolution experiments, different configurations of wormholes and a series of properties simulated by both frameworks are compared. We conclude that the numerical results of the DBF framework are more like wormholes and more stable than the Darcy framework, which is a demonstration of the advantages of the DBF framework. The scalability of the parallel code is also evaluated, and good scalability can be achieved. Finally, a mixed
Building Input Adaptive Parallel Applications: A Case Study of Sparse Grid Interpolation

KAUST Repository

Murarasu, Alin

2012-12-01

The well-known power wall resulting in multi-cores requires special techniques for speeding up applications. In this sense, parallelization plays a crucial role. Besides standard serial optimizations, techniques such as input specialization can also bring a substantial contribution to the speedup. By identifying common patterns in the input data, we propose new algorithms for sparse grid interpolation that accelerate the state-of-the-art non-specialized version. Sparse grid interpolation is an inherently hierarchical method of interpolation employed for example in computational steering applications for decompressing high-dimensional simulation data. In this context, improving the speedup is essential for real-time visualization. Using input specialization, we report a speedup of up to 9x over the non-specialized version. The paper covers the steps we took to reach this speedup by means of input adaptivity. Our algorithms will be integrated in fastsg, a library for fast sparse grid interpolation. © 2012 IEEE.
Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.

Energy Technology Data Exchange (ETDEWEB)

Deveci, Mehmet [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Trott, Christian Robert [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Rajamanickam, Sivasankaran [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2018-01-01

Sparse matrix-matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
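To make the notions of an accumulator and a two-phase implementation concrete, the following is a minimal serial sketch of row-wise sparse matrix-matrix multiplication in Python. It is not kkSpGEMM: the CSR triple layout `(indptr, indices, data)` and the function names `spgemm_symbolic` and `spgemm_numeric` are assumptions made for illustration. The symbolic phase only determines the sparsity pattern of the product, so its result could be reused when the same pattern is multiplied again with new values.

```python
# Illustrative sketch only. CSR matrices are (indptr, indices, data) triples;
# this layout and the function names are assumptions, not the paper's API.

def spgemm_symbolic(a_ptr, a_idx, b_ptr, b_idx, n_rows):
    """Phase 1: compute the row pointers (nonzero counts) of C = A * B."""
    c_ptr = [0]
    for i in range(n_rows):
        cols = set()  # set accumulator: only the pattern is needed here
        for k in a_idx[a_ptr[i]:a_ptr[i + 1]]:
            cols.update(b_idx[b_ptr[k]:b_ptr[k + 1]])
        c_ptr.append(c_ptr[-1] + len(cols))
    return c_ptr


def spgemm_numeric(a, b, c_ptr):
    """Phase 2: compute column indices and values with a hash-map accumulator."""
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    c_idx, c_val = [], []
    for i in range(len(c_ptr) - 1):
        acc = {}  # hash-map accumulator for row i of C
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        for j in sorted(acc):  # emit the row with sorted column indices
            c_idx.append(j)
            c_val.append(acc[j])
    return c_ptr, c_idx, c_val


# A = [[1, 0], [2, 3]] and B = [[0, 4], [5, 0]] in CSR form.
A = ([0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0])
B = ([0, 1, 2], [1, 0], [4.0, 5.0])
pattern = spgemm_symbolic(A[0], A[1], B[0], B[1], 2)
C = spgemm_numeric(A, B, pattern)
```

A dense array or a sorted list could replace the dictionary as the accumulator; which choice is faster depends on the row lengths and the target architecture, which is the kind of trade-off the abstract refers to.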
Symmetric and asymmetric capillary bridges between a rough surface and a parallel surface.

Science.gov (United States)

Wang, Yongxin; Michielsen, Stephen; Lee, Hoon Joo

2013-09-03

Although the formation of a capillary bridge between two parallel surfaces has been extensively studied, the majority of research has described only symmetric capillary bridges between two smooth surfaces. In this work, an instrument was built to form a capillary bridge by squeezing a liquid drop on one surface with another surface. An analytical solution that describes the shape of symmetric capillary bridges joining two smooth surfaces has been extended to bridges that are asymmetric about the midplane and to rough surfaces. The solution, given by elliptical integrals of the first and second kind, is consistent with a constant Laplace pressure over the entire surface and has been verified for water, Kaydol, and dodecane drops forming symmetric and asymmetric bridges between parallel smooth surfaces. This solution has been applied to asymmetric capillary bridges between a smooth surface and a rough fabric surface as well as symmetric bridges between two rough surfaces. These solutions have been experimentally verified, and good agreement has been found between predicted and experimental profiles for small drops where the effect of gravity is negligible. Finally, a protocol for determining the profile from the volume and height of the capillary bridge has been developed and experimentally verified.
On symmetric structures of order two

Directory of Open Access Journals (Sweden)

Michel Bousquet

2008-04-01

Let $(\omega_n)_{0<n}$ be the sequence known as Integer Sequence A047749 (http://www.research.att.com/ njas/sequences/A047749). In this paper, we show that the integer $\omega_n$ enumerates various kinds of symmetric structures of order two. We first consider ternary trees having a reflexive symmetry and we relate all symmetric combinatorial objects by means of bijection. We then generalize the symmetric structures and correspondences to an infinite family of symmetric objects.
An M-step preconditioned conjugate gradient method for parallel computation

Science.gov (United States)

Adams, L.

1983-01-01

This paper describes a preconditioned conjugate gradient method that can be effectively implemented on both vector machines and parallel arrays to solve sparse symmetric and positive definite systems of linear equations. The implementation on the CYBER 203/205 and on the Finite Element Machine is discussed and results obtained using the method on these machines are given.
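As a rough illustration of the idea, the sketch below runs conjugate gradients with a preconditioner built from m sweeps of a stationary iteration. Jacobi sweeps are used here as a stand-in, and the dense list-of-lists storage and the names `m_step_jacobi` and `pcg` are assumptions made for this example; the paper's actual preconditioner and its vector and parallel implementations are not reproduced.

```python
# Illustrative sketch only: CG preconditioned by m Jacobi sweeps.
# Dense list-of-lists storage is used for brevity; names are hypothetical.

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def m_step_jacobi(A, r, m):
    """Approximate A^{-1} r by m Jacobi sweeps started from the zero vector."""
    d = [A[i][i] for i in range(len(A))]
    z = [0.0] * len(r)
    for _ in range(m):
        Az = matvec(A, z)
        z = [zi + (ri - azi) / di for zi, ri, azi, di in zip(z, r, Az, d)]
    return z


def pcg(A, b, m=2, tol=1e-10, max_iter=200):
    """Preconditioned CG for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)
    z = m_step_jacobi(A, r, m)
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 < tol:  # stop on a small residual norm
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = m_step_jacobi(A, r, m)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small diagonally dominant tridiagonal test system with known solution.
n = 5
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]
b = matvec(A, [1.0] * n)
x, iters = pcg(A, b)
```

Each sweep costs one matrix-vector product and one diagonal scaling, which is why preconditioners of this kind map well onto vector and parallel hardware. The sketch assumes the Jacobi iteration converges for the given matrix, as it does for the strictly diagonally dominant example above.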
Weakly Interacting Symmetric and Anti-Symmetric States in the Bilayer Systems

Science.gov (United States)

Marchewka, M.; Sheregii, E. M.; Tralle, I.; Tomaka, G.; Ploch, D.

We have studied the parallel magneto-transport in DQW structures of two different potential shapes: quasi-rectangular and quasi-triangular. The quantum beats effect was observed in Shubnikov-de Haas (SdH) oscillations for both types of DQW structures in the perpendicular magnetic field arrangement. We developed a special scheme for calculating the Landau level energies, by means of which we carried out the necessary simulations of the beating effect. In order to obtain agreement between our experimental data and the results of simulations, we introduced two different quasi-Fermi levels which characterize symmetric and anti-symmetric states in DQWs. The existence of two different quasi-Fermi levels simply means that one can treat the two sub-systems (charge carriers characterized by symmetric and anti-symmetric wave functions) as weakly interacting and having their own rate of establishing the equilibrium state.
Modeling an in-register, parallel "Iowa" Aβ fibril structure using solid-state NMR data from labeled samples with Rosetta.

Science.gov (United States)

Sgourakis, Nikolaos G; Yau, Wai-Ming; Qiang, Wei

2015-01-06

Determining the structures of amyloid fibrils is an important first step toward understanding the molecular basis of neurodegenerative diseases. For β-amyloid (Aβ) fibrils, conventional solid-state NMR structure determination using uniform labeling is limited by extensive peak overlap. We describe the characterization of a distinct structural polymorph of Aβ using solid-state NMR, transmission electron microscopy (TEM), and Rosetta model building. First, the overall fibril arrangement is established using mass-per-length measurements from TEM. Then, the fibril backbone arrangement, stacking registry, and "steric zipper" core interactions are determined using a number of solid-state NMR techniques on sparsely ¹³C-labeled samples. Finally, we perform Rosetta structure calculations with an explicitly symmetric representation of the system. We demonstrate the power of the hybrid Rosetta/NMR approach by modeling the in-register, parallel "Iowa" mutant (D23N) at high resolution (1.2 Å backbone rmsd). The final models are validated using an independent set of NMR experiments that confirm key features. Copyright © 2015 Elsevier Ltd. All rights reserved.
A Non-static Data Layout Enhancing Parallelism and Vectorization in Sparse Grid Algorithms

KAUST Repository

Buse, Gerrit; Pfluger, Dirk; Murarasu, Alin; Jacob, Riko

2012-01-01

performance and facilitate the use of vector registers for our sparse grid benchmark problem hierarchization. Based on the compact data structure proposed for regular sparse grids in [2], we developed a new algorithm that outperforms existing implementations
A Non-static Data Layout Enhancing Parallelism and Vectorization in Sparse Grid Algorithms

KAUST Repository

Buse, Gerrit

2012-06-01

The name sparse grids denotes a highly space-efficient, grid-based numerical technique to approximate high-dimensional functions. Although employed in a broad spectrum of applications from different fields, there have been only a few attempts to use it in real-time visualization (e.g. [1]), due to complex data structures and long algorithm runtime. In this work we present a novel approach inspired by principles of I/O-efficient algorithms. Locally applied coefficient permutations lead to improved cache performance and facilitate the use of vector registers for our sparse grid benchmark problem hierarchization. Based on the compact data structure proposed for regular sparse grids in [2], we developed a new algorithm that outperforms existing implementations on modern multi-core systems by a factor of 37 for a grid size of 127 million points. For larger problems the speedup increases further, and with execution times below 1 s, sparse grids are well-suited for visualization applications. Furthermore, we point out how a broad class of sparse grid algorithms can benefit from our approach. © 2012 IEEE.
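For readers unfamiliar with the benchmark problem, hierarchization converts nodal function values into hierarchical surpluses. The sketch below shows only the one-dimensional building block on a full grid with zero boundary values; it does not implement the data layout or the coefficient permutations of the paper, and the function name `hierarchize_1d` is an assumption.

```python
# Illustrative sketch only: 1D hierarchization with zero boundary values.
# The paper's non-static multi-dimensional layout is not reproduced.

def hierarchize_1d(values, level):
    """Turn nodal values at x_i = i / 2**level, i = 1 .. 2**level - 1,
    into hierarchical surpluses for the piecewise linear hat basis."""
    n = 2 ** level
    u = [0.0] + list(values) + [0.0]  # pad with the zero boundary values
    for l in range(level, 0, -1):  # process the finest level first
        h = 2 ** (level - l)  # index distance to the hierarchical neighbours
        for i in range(h, n, 2 * h):  # grid points that belong to level l
            u[i] -= 0.5 * (u[i - h] + u[i + h])
    return u[1:-1]


# The level-1 hat function 1 - |2x - 1| sampled on a level-3 grid:
# only the midpoint should keep a nonzero surplus.
samples = [1.0 - abs(2.0 * i / 8 - 1.0) for i in range(1, 8)]
surpluses = hierarchize_1d(samples, 3)
```

The strided access `u[i - h]` and `u[i + h]` at coarse levels hints at why cache behaviour and vectorization are difficult for this kernel, which is the problem the abstract's data layout addresses.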
Radial electric field and ion parallel flow in the quasi-symmetric and Mirror configurations of HSX

Science.gov (United States)

Kumar, S. T. A.; Dobbins, T. J.; Talmadge, J. N.; Wilcox, R. S.; Anderson, D. T.

2018-05-01

The radial electric field and the ion mean parallel flow are obtained in the helically symmetric experiment stellarator from toroidal flow measurements of C+6 ion at two locations on a flux surface, using the Pfirsch–Schlüter effect. Results from the standard quasi-helically symmetric magnetic configuration are compared with those from the Mirror configuration where the quasi-symmetry is deliberately degraded using auxiliary coils. For similar injected power, the quasi-symmetric configuration is observed to have significantly lower flows while the experimental observations from the Mirror geometry are in better agreement with neoclassical calculations. Indications are that the radial electric field near the core of the quasi-symmetric configuration may be governed by non-neoclassical processes.
The Non–Symmetric s–Step Lanczos Algorithm: Derivation of Efficient Recurrences and Synchronization–Reducing Variants of BiCG and QMR

Directory of Open Access Journals (Sweden)

Feuerriegel Stefan

2015-12-01

The Lanczos algorithm is among the most frequently used iterative techniques for computing a few dominant eigenvalues of a large sparse non-symmetric matrix. At the same time, it serves as a building block within biconjugate gradient (BiCG) and quasi-minimal residual (QMR) methods for solving large sparse non-symmetric systems of linear equations. It is well known that, when implemented on distributed-memory computers with a huge number of processes, the synchronization time spent on computing dot products increasingly limits the parallel scalability. Therefore, we propose synchronization-reducing variants of the Lanczos, as well as BiCG and QMR methods, in an attempt to mitigate these negative performance effects. These so-called s-step algorithms are based on grouping dot products for joint execution and replacing time-consuming matrix operations by efficient vector recurrences. The purpose of this paper is to provide a rigorous derivation of the recurrences for the s-step Lanczos algorithm, introduce s-step BiCG and QMR variants, and compare the parallel performance of these new s-step versions with previous algorithms.
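The idea of grouping dot products for joint execution can be sketched as follows: each process forms the partial inner products of all s vector pairs over the entries it owns, and a single global reduction then combines the s-by-s blocks. The Python below simulates this with plain lists; `local_gram` and `allreduce_sum` are hypothetical names, and `allreduce_sum` merely stands in for one MPI-style all-reduce. The s-step recurrences themselves are not shown.

```python
# Illustrative sketch only: one reduction of an s-by-s block replaces
# s*s separately synchronized dot products. Names are hypothetical.

def local_gram(V_local, W_local):
    """Partial inner products <v_i, w_j> over the entries one process owns."""
    s = len(V_local)
    return [[sum(a * b for a, b in zip(V_local[i], W_local[j]))
             for j in range(s)] for i in range(s)]


def allreduce_sum(blocks):
    """Stand-in for a single global all-reduce over s*s numbers."""
    s = len(blocks[0])
    return [[sum(blk[i][j] for blk in blocks) for j in range(s)]
            for i in range(s)]


# Two simulated processes, each owning half of every length-4 vector (s = 2).
V = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
W = [[1.0, 0.0, 1.0, 0.0], [2.0, 2.0, 2.0, 2.0]]
parts = [local_gram([v[lo:lo + 2] for v in V], [w[lo:lo + 2] for w in W])
         for lo in (0, 2)]
gram = allreduce_sum(parts)
```

The amount of data reduced grows from one number to s squared numbers, but the number of synchronization points drops to one, which is the trade that matters when latency dominates at large process counts.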

Symmetric metamaterials based on flower-shaped structure

International Nuclear Information System (INIS)

Tuong, P.V.; Park, J.W.; Rhee, J.Y.; Kim, K.W.; Cheong, H.; Jang, W.H.; Lee, Y.P.

2013-01-01

We proposed new models of metamaterials (MMs) based on a flower-shaped structure (FSS), whose “meta-atoms” consist of two flower-shaped metallic parts separated by a dielectric layer. Like the non-symmetric MMs based on cut-wire-pairs or electric ring resonators, the symmetrical FSS demonstrates negative permeability at GHz frequencies. Employing the results, we designed a symmetric negative-refractive-index MM [a symmetric combined structure (SCS)], which is composed of FSSs and cross continuous wires. The MM properties of the FSS and the SCS are presented numerically and experimentally. - Highlights: • A newly designed sub-wavelength metamaterial, the flower-shaped structure, was proposed. • The flower-shaped meta-atom exhibited effective negative permeability. • Based on the meta-atom, a negative refractive index was conventionally obtained. • The negative refractive index was demonstrated with symmetric properties for electromagnetic waves. • Dimensional parameters were studied under normal electromagnetic wave
Structure-aware Local Sparse Coding for Visual Tracking

KAUST Repository

Qi, Yuankai

2018-01-24

Sparse coding has been applied to visual tracking and related vision problems with demonstrated success in recent years. Existing tracking methods based on local sparse coding sample patches from a target candidate and sparsely encode these using a dictionary consisting of patches sampled from target template images. The discriminative strength of existing methods based on local sparse coding is limited as spatial structure constraints among the template patches are not exploited. To address this problem, we propose a structure-aware local sparse coding algorithm which encodes a target candidate using templates with both global and local sparsity constraints. For robust tracking, we show local regions of a candidate region should be encoded only with the corresponding local regions of the target templates that are the most similar from the global view. Thus, a more precise and discriminative sparse representation is obtained to account for appearance changes. To alleviate the issues with tracking drifts, we design an effective template update scheme. Extensive experiments on challenging image sequences demonstrate the effectiveness of the proposed algorithm against numerous state-of-the-art methods.
A performance study of sparse Cholesky factorization on INTEL iPSC/860

Science.gov (United States)

Zubair, M.; Ghose, M.

1992-01-01

The problem of Cholesky factorization of a sparse matrix has been very well investigated on sequential machines. A number of efficient codes exist for factorizing large unstructured sparse matrices. However, there is a lack of such efficient codes on parallel machines in general, and distributed machines in particular. Some of the issues that are critical to the implementation of sparse Cholesky factorization on a distributed memory parallel machine are ordering, partitioning and mapping, load balancing, and ordering of various tasks within a processor. Here, we focus on the effect of various partitioning schemes on the performance of sparse Cholesky factorization on the Intel iPSC/860. Also, a new partitioning heuristic for structured as well as unstructured sparse matrices is proposed, and its performance is compared with other schemes.
Structure-based bayesian sparse reconstruction

KAUST Repository

Quadeer, Ahmed Abdul

2012-12-01

Sparse signal reconstruction algorithms have attracted research attention due to their wide applications in various fields. In this paper, we present a simple Bayesian approach that utilizes the sparsity constraint and a priori statistical information (Gaussian or otherwise) to obtain near optimal estimates. In addition, we make use of the rich structure of the sensing matrix encountered in many signal processing applications to develop a fast sparse recovery algorithm. The computational complexity of the proposed algorithm is very low compared with the widely used convex relaxation methods as well as greedy matching pursuit techniques, especially at high sparsity. © 1991-2012 IEEE.
Sparse symmetric preconditioners for dense linear systems in electromagnetism

NARCIS (Netherlands)

Carpentieri, Bruno; Duff, Iain S.; Giraud, Luc; Monga Made, M. Magolu

2004-01-01

We consider symmetric preconditioning strategies for the iterative solution of dense complex symmetric non-Hermitian systems arising in computational electromagnetics. In particular, we report on the numerical behaviour of the classical incomplete Cholesky factorization as well as some of its recent
Sparse Parallel MRI Based on Accelerated Operator Splitting Schemes.

Science.gov (United States)

Cai, Nian; Xie, Weisi; Su, Zhenghang; Wang, Shanshan; Liang, Dong

2016-01-01

Recently, the sparsity which is implicit in MR images has been successfully exploited for fast MR imaging with incomplete acquisitions. In this paper, two novel algorithms are proposed to solve the sparse parallel MR imaging problem, which consists of $\ell_1$ regularization and fidelity terms. The two algorithms combine forward-backward operator splitting and Barzilai-Borwein schemes. Theoretically, the presented algorithms overcome the nondifferentiability of the $\ell_1$ regularization term. Meanwhile, they are able to treat a general matrix operator that may not be diagonalized by the fast Fourier transform and to ensure that a well-conditioned optimization system of equations is simply solved. In addition, we build connections between the proposed algorithms and the state-of-the-art existing methods and prove their convergence with a constant stepsize in the Appendix. Numerical results and comparisons with the advanced methods demonstrate the efficiency of the proposed algorithms.
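A generic version of the combination named in the abstract can be sketched for the model problem of minimizing 0.5·||Ax − b||² + λ·||x||₁. The forward step is a gradient step on the fidelity term, the backward step is soft-thresholding (the proximal map of the ℓ1 term), and the step size is updated with a Barzilai-Borwein formula. This is a textbook-style sketch, not either of the paper's two algorithms; the function names, the dense real-valued operator, and the absence of any safeguard on the step size are all assumptions.

```python
# Illustrative sketch only: forward-backward splitting with a
# Barzilai-Borwein step for 0.5*||Ax - b||^2 + lam*||x||_1.
# No safeguard or line search is included; names are hypothetical.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_threshold(v, t):
    """Proximal map of t*||.||_1: shrink every entry toward zero by t."""
    return [max(abs(vi) - t, 0.0) * ((vi > 0) - (vi < 0)) for vi in v]


def fbs_bb(A, b, lam, n_iter=100):
    m, n = len(A), len(A[0])

    def grad(x):
        # Gradient of the fidelity term: A^T (A x - b).
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        return [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]

    x = [0.0] * n
    g = grad(x)
    step = 1.0
    for _ in range(n_iter):
        # Forward (gradient) step followed by backward (proximal) step.
        x_new = soft_threshold([xi - step * gi for xi, gi in zip(x, g)],
                               step * lam)
        g_new = grad(x_new)
        s = [a - c for a, c in zip(x_new, x)]
        y = [a - c for a, c in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 0:
            step = dot(s, s) / sy  # Barzilai-Borwein step size
        x, g = x_new, g_new
    return x


# With A equal to the identity, the minimizer is soft_threshold(b, lam).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
solution = fbs_bb(identity, [3.0, -0.5, 1.5], lam=1.0)
```

Soft-thresholding is what lets the iteration handle the nondifferentiable ℓ1 term without smoothing it. In general, a Barzilai-Borwein step without a line search is not guaranteed to converge, so practical use would need a safeguard or the conditions established in the paper.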
Turbo-SMT: Parallel Coupled Sparse Matrix-Tensor Factorizations and Applications

Science.gov (United States)

Papalexakis, Evangelos E.; Faloutsos, Christos; Mitchell, Tom M.; Talukdar, Partha Pratim; Sidiropoulos, Nicholas D.; Murphy, Brian

2016-01-01

How can we correlate the neural activity in the human brain as it responds to typed words, with properties of these terms (like 'edible', 'fits in hand')? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we enhance any CMTF solver, so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm by up to 65-fold, parallelizes it, and produces sparse and interpretable solutions. Additionally, we improve upon ALS, the work-horse algorithm for CMTF, with respect to efficiency and robustness to missing values. We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy. Finally, we demonstrate the generality of Turbo-SMT, by applying it on a Facebook dataset (users, 'friends', wall-postings); there, Turbo-SMT spots spammer-like anomalies. PMID:27672406
A distributed-memory hierarchical solver for general sparse linear systems

Energy Technology Data Exchange (ETDEWEB)

Chen, Chao [Stanford Univ., CA (United States). Inst. for Computational and Mathematical Engineering; Pouransari, Hadi [Stanford Univ., CA (United States). Dept. of Mechanical Engineering; Rajamanickam, Sivasankaran [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Center for Computing Research; Boman, Erik G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Center for Computing Research; Darve, Eric [Stanford Univ., CA (United States). Inst. for Computational and Mathematical Engineering and Dept. of Mechanical Engineering

2017-12-20

We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We also provide various numerical results to demonstrate the versatility and scalability of the parallel algorithm.
Object tracking by occlusion detection via structured sparse learning

KAUST Repository

Zhang, Tianzhu

2013-06-01

Sparse representation based methods have recently drawn much attention in visual tracking due to good performance against illumination variation and occlusion. They assume the errors caused by image variations can be modeled as pixel-wise sparse. However, in many practical scenarios these errors are not truly pixel-wise sparse but rather sparsely distributed in a structured way. In fact, pixels in error constitute contiguous regions within the object's track. This is the case when significant occlusion occurs. To accommodate non-sparse occlusion in a given frame, we assume that occlusion detected in previous frames can be propagated to the current one. This propagated information determines which pixels will contribute to the sparse representation of the current track. In other words, pixels that were detected as part of an occlusion in the previous frame will be removed from the target representation process. As such, this paper proposes a novel tracking algorithm that models and detects occlusion through structured sparse learning. We test our tracker on challenging benchmark sequences, such as sports videos, which involve heavy occlusion, drastic illumination changes, and large pose variations. Experimental results show that our tracker consistently outperforms the state-of-the-art. © 2013 IEEE.
Magnetospectroscopy of symmetric and anti-symmetric states in double quantum wells

Science.gov (United States)

Marchewka, M.; Sheregii, E. M.; Tralle, I.; Ploch, D.; Tomaka, G.; Furdak, M.; Kolek, A.; Stadler, A.; Mleczko, K.; Zak, D.; Strupinski, W.; Jasik, A.; Jakiela, R.

2008-02-01

The experimental results obtained for magnetotransport in the InGaAs/InAlAs double quantum well (DQW) structures of two different shapes of wells are reported. A beating effect occurring in the Shubnikov-de Haas (SdH) oscillations was observed for both types of structures at low temperatures in the parallel transport when the magnetic field was perpendicular to the layers. An approach for the calculation of the Landau level energies for DQW structures was developed and then applied to the analysis and interpretation of the experimental data related to the beating effect. We also argue that in order to account for the observed magnetotransport phenomena (SdH and integer quantum Hall effect), one should introduce two different quasi-Fermi levels characterizing two electron subsystems regarding the symmetry properties of their states, symmetric and anti-symmetric ones, which are not mixed by electron-electron interaction.
Iterative solution of general sparse linear systems on clusters of workstations

Energy Technology Data Exchange (ETDEWEB)

Lo, Gen-Ching; Saad, Y. [Univ. of Minnesota, Minneapolis, MN (United States)

1996-12-31

Solving sparse irregularly structured linear systems on parallel platforms poses several challenges. First, sparsity makes it difficult to exploit data locality, whether in a distributed or shared memory environment. A second, perhaps more serious challenge, is to find efficient ways to precondition the system. Preconditioning techniques which have a large degree of parallelism, such as multicolor SSOR, often have a slower rate of convergence than their sequential counterparts. Finally, a number of other computational kernels such as inner products could erase any gains obtained from parallel speed-ups, and this is especially true on workstation clusters where start-up times may be high. In this paper we discuss these issues and report on our experience with PSPARSLIB, an on-going project for building a library of parallel iterative sparse matrix solvers.
Enhancing Scalability of Sparse Direct Methods

International Nuclear Information System (INIS)

Li, Xiaoye S.; Demmel, James; Grigori, Laura; Gu, Ming; Xia, Jianlin; Jardin, Steve; Sovinec, Carl; Lee, Lie-Quan

2007-01-01

TOPS is providing high-performance, scalable sparse direct solvers, which have had significant impacts on the SciDAC applications, including fusion simulation (CEMM), accelerator modeling (COMPASS), as well as many other mission-critical applications in DOE and elsewhere. Our recent developments have focused on new techniques to overcome the scalability bottlenecks of direct methods, in both time and memory. These include parallelizing the symbolic analysis phase and developing linear-complexity sparse factorization methods. The new techniques will make sparse direct methods more widely usable in large 3D simulations on highly parallel petascale computers.
JiTTree: A Just-in-Time Compiled Sparse GPU Volume Data Structure

KAUST Repository

Labschutz, Matthias

2015-08-12

Sparse volume data structures enable the efficient representation of large but sparse volumes in GPU memory for computation and visualization. However, the choice of a specific data structure for a given data set depends on several factors, such as the memory budget, the sparsity of the data, and data access patterns. In general, there is no single optimal sparse data structure, but a set of several candidates with individual strengths and drawbacks. One solution to this problem is the use of hybrid data structures, which locally adapt themselves to the sparsity. However, they typically suffer from increased traversal overhead, which limits their utility in many applications. This paper presents JiTTree, a novel sparse hybrid volume data structure that uses just-in-time compilation to overcome these problems. By combining multiple sparse data structures and reducing traversal overhead we leverage their individual advantages. We demonstrate that hybrid data structures adapt well to a large range of data sets. They are especially superior to other sparse data structures for data sets that locally vary in sparsity. Possible optimization criteria are memory, performance and a combination thereof. Through just-in-time (JIT) compilation, JiTTree reduces the traversal overhead of the resulting optimal data structure. As a result, our hybrid volume data structure enables efficient computations on the GPU, while being superior in terms of memory usage when compared to non-hybrid data structures.
JiTTree: A Just-in-Time Compiled Sparse GPU Volume Data Structure

KAUST Repository

Labschutz, Matthias; Bruckner, Stefan; Groller, M. Eduard; Hadwiger, Markus; Rautek, Peter

2015-01-01

JiTTree: A Just-in-Time Compiled Sparse GPU Volume Data Structure.

Science.gov (United States)

Labschütz, Matthias; Bruckner, Stefan; Gröller, M Eduard; Hadwiger, Markus; Rautek, Peter

2016-01-01

Sparse linear systems: Theory of decomposition, methods, technology, applications and implementation in Wolfram Mathematica

Energy Technology Data Exchange (ETDEWEB)

Pilipchuk, L. A., E-mail: pilipchik@bsu.by [Belarussian State University, 220030 Minsk, 4, Nezavisimosti avenue, Republic of Belarus (Belarus); Pilipchuk, A. S., E-mail: an.pilipchuk@gmail.com [The Natural Resources and Environmental Protection Ministry of the Republic of Belarus, 220004 Minsk, 10 Kollektornaya Street, Republic of Belarus (Belarus)

2015-11-30

In this paper we propose the theory of decomposition, methods, technologies, applications and implementation in Wolfram Mathematica for constructing the solutions of sparse linear systems. One of the applications is the Sensor Location Problem for the symmetric graph in the case when split ratios of some arc flows can be zero. The objective of that application is to minimize the number of sensors that are assigned to the nodes. We obtain a sparse system of linear algebraic equations and investigate its matrix rank. Sparse systems of these types appear in generalized network flow programming problems in the form of restrictions and can be characterized as systems with a large sparse sub-matrix representing the embedded network structure.
Sparse linear systems: Theory of decomposition, methods, technology, applications and implementation in Wolfram Mathematica

International Nuclear Information System (INIS)

Pilipchuk, L. A.; Pilipchuk, A. S.

2015-01-01

Cavity approach to the first eigenvalue problem in a family of symmetric random sparse matrices

International Nuclear Information System (INIS)

Kabashima, Yoshiyuki; Takahashi, Hisanao; Watanabe, Osamu

2010-01-01

A methodology to analyze the properties of the first (largest) eigenvalue and its eigenvector is developed for large symmetric random sparse matrices utilizing the cavity method of statistical mechanics. Under a tree approximation, which is plausible for infinitely large systems, in conjunction with the introduction of a Lagrange multiplier for constraining the length of the eigenvector, the eigenvalue problem is reduced to a set of optimization problems, each for a quadratic function of a single variable, and the coefficients of the first and the second order terms of the functions act as cavity fields that are handled in cavity analysis. We show that the first eigenvalue is determined in such a way that the distribution of the cavity fields has a finite value for the second order moment with respect to the cavity fields of the first order coefficient. The validity and utility of the developed methodology are examined by applying it to two analytically solvable examples and one simple but non-trivial example, in conjunction with numerical justification.
Occlusion detection via structured sparse learning for robust object tracking

KAUST Repository

Zhang, Tianzhu

2014-01-01

Sparse representation based methods have recently drawn much attention in visual tracking due to good performance against illumination variation and occlusion. They assume the errors caused by image variations can be modeled as pixel-wise sparse. However, in many practical scenarios, these errors are not truly pixel-wise sparse but rather sparsely distributed in a structured way. In fact, pixels in error constitute contiguous regions within the object’s track. This is the case when significant occlusion occurs. To accommodate for nonsparse occlusion in a given frame, we assume that occlusion detected in previous frames can be propagated to the current one. This propagated information determines which pixels will contribute to the sparse representation of the current track. In other words, pixels that were detected as part of an occlusion in the previous frame will be removed from the target representation process. As such, this paper proposes a novel tracking algorithm that models and detects occlusion through structured sparse learning. We test our tracker on challenging benchmark sequences, such as sports videos, which involve heavy occlusion, drastic illumination changes, and large pose variations. Extensive experimental results show that our proposed tracker consistently outperforms the state-of-the-art trackers.
SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems

Energy Technology Data Exchange (ETDEWEB)

Li, Xiaoye S.; Demmel, James W.

2002-03-27

In this paper, we present the main algorithmic features in the software package SuperLU_DIST, a distributed-memory sparse direct solver for large sets of linear equations. We give in detail our parallelization strategies, with focus on scalability issues, and demonstrate the parallel performance and scalability on current machines. The solver is based on sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by the authors. The main advantage of static pivoting over classical partial pivoting is that it permits a priori determination of data structures and communication pattern for sparse Gaussian elimination, which makes it more scalable on distributed memory machines. Based on this a priori knowledge, we designed highly parallel and scalable algorithms for both LU decomposition and triangular solve and we show that they are suitable for large-scale distributed memory machines.

Sparse distributed memory overview

Science.gov (United States)

Raugh, Mike

1990-01-01

The Sparse Distributed Memory (SDM) project is investigating the theory and applications of a massively parallel computing architecture, called sparse distributed memory, that will support the storage and retrieval of sensory and motor patterns characteristic of autonomous systems. The immediate objectives of the project are centered in studies of the memory itself and in the use of the memory to solve problems in speech, vision, and robotics. Investigation of methods for encoding sensory data is an important part of the research. Examples of NASA missions that may benefit from this work are Space Station, planetary rovers, and solar exploration. Sparse distributed memory offers promising technology for systems that must learn through experience and be capable of adapting to new circumstances, and for operating any large complex system requiring automatic monitoring and control. Sparse distributed memory is a massively parallel architecture motivated by efforts to understand how the human brain works. Sparse distributed memory is an associative memory, able to retrieve information from cues that only partially match patterns stored in the memory. It is able to store long temporal sequences derived from the behavior of a complex system, such as progressive records of the system's sensory data and correlated records of the system's motor controls.
The FORCE: A portable parallel programming language supporting computational structural mechanics

Science.gov (United States)

Jordan, Harry F.; Benten, Muhammad S.; Brehm, Juergen; Ramanan, Aruna

1989-01-01

This project supports the conversion of codes in Computational Structural Mechanics (CSM) to a parallel form which will efficiently exploit the computational power available from multiprocessors. The work is a part of a comprehensive, FORTRAN-based system to form a basis for a parallel version of the NICE/SPAR combination which will form the CSM Testbed. The software is macro-based and rests on the force methodology developed by the principal investigator in connection with an early scientific multiprocessor. Machine independence is an important characteristic of the system so that retargeting it to the Flex/32, or any other multiprocessor on which NICE/SPAR might be implemented, is well supported. The principal investigator has experience in producing parallel software for both full and sparse systems of linear equations using the force macros. Other researchers have used the Force in finite element programs. It has been possible to rapidly develop software which performs at maximum efficiency on a multiprocessor. The inherent machine independence of the system also means that the parallelization will not be limited to a specific multiprocessor.
A Parallel Prefix Algorithm for Almost Toeplitz Tridiagonal Systems

Science.gov (United States)

Sun, Xian-He; Joslin, Ronald D.

1995-01-01

A compact scheme is a discretization scheme that is advantageous in obtaining highly accurate solutions. However, the resulting systems from compact schemes are tridiagonal systems that are difficult to solve efficiently on parallel computers. Considering the almost symmetric Toeplitz structure, a parallel algorithm, simple parallel prefix (SPP), is proposed. The SPP algorithm requires less memory than the conventional LU decomposition and is efficient on parallel machines. It consists of a prefix communication pattern and AXPY operations. Both the computation and the communication can be truncated without degrading the accuracy when the system is diagonally dominant. A formal accuracy study has been conducted to provide a simple truncation formula. Experimental results have been measured on a MasPar MP-1 SIMD machine and on a Cray 2 vector machine. Experimental results show that the simple parallel prefix algorithm is a good algorithm for symmetric, almost symmetric Toeplitz tridiagonal systems and for the compact scheme on high-performance computers.
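For orientation, the conventional sequential LU-style solve that the abstract uses as its point of comparison can be sketched as follows. This is not the SPP algorithm itself; it is the standard Thomas algorithm, and the function name `thomas_solve` and its argument layout are assumptions made for illustration. The forward sweep carries a dependency from row i-1 to row i, which is what makes this baseline hard to parallelize.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the sequential Thomas algorithm.

    lower[i] is A[i+1][i], diag[i] is A[i][i], upper[i] is A[i][i+1].
    Assumes no pivoting is needed (e.g. a diagonally dominant system).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    # Forward elimination: each step depends on the previous one.
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    # Back substitution, also inherently sequential.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Symmetric Toeplitz example: 4 on the diagonal, 1 on both off-diagonals.
solution = thomas_solve([1.0] * 3, [4.0] * 4, [1.0] * 3, [6.0, 12.0, 18.0, 19.0])
```

For a Toeplitz system the three bands are constant, so the coefficients `c[i]` converge quickly when the matrix is diagonally dominant; the abstract's remark about truncating computation and communication relies on that dominance.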
Duality, phase structures, and dilemmas in symmetric quantum games

International Nuclear Information System (INIS)

Ichikawa, Tsubasa; Tsutsui, Izumi

2007-01-01

Symmetric quantum games for 2-player, 2-qubit strategies are analyzed in detail by using a scheme in which all pure states in the 2-qubit Hilbert space are utilized for strategies. We consider two different types of symmetric games exemplified by the familiar games, the Battle of the Sexes (BoS) and the Prisoners' Dilemma (PD). These two types of symmetric games are shown to be related by a duality map, which ensures that they share common phase structures with respect to the equilibria of the strategies. We find eight distinct phase structures possible for the symmetric games, which are determined by the classical payoff matrices from which the quantum games are defined. We also discuss the possibility of resolving the dilemmas in the classical BoS, PD, and the Stag Hunt (SH) game based on the phase structures obtained in the quantum games. It is observed that quantization cannot resolve the dilemma fully for the BoS, while it generically can for the PD and SH if appropriate correlations for the strategies of the players are provided.
Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.

Energy Technology Data Exchange (ETDEWEB)

Deveci, Mehmet [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Rajamanickam, Sivasankaran [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Trott, Christian Robert [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2017-12-01

Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
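To make the role of an accumulator concrete, here is a minimal sequential sketch of row-wise (Gustavson-style) sparse matrix-matrix multiplication on CSR inputs, using a hash map as the per-row accumulator. It is an illustration only, not the kkSpGEMM implementation; the function name `spgemm_csr` and the CSR argument layout are assumptions.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Compute C = A * B row by row; all matrices are in CSR form."""
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # hash-map accumulator: column index -> partial sum for row i
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of B by A[i][k] and merge it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        # Flush the accumulator into the output row, sorted by column.
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


# A = [[1, 0], [2, 3]], B = [[0, 4], [5, 0]]
c = spgemm_csr([0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0],
               [0, 1, 2], [1, 0], [4.0, 5.0])
```

Swapping the dictionary for a dense array of length equal to the number of columns of B gives a dense accumulator, which trades memory for faster inserts; that is the kind of data-structure choice the abstract compares.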
Fast sparsely synchronized brain rhythms in a scale-free neural network.

Science.gov (United States)

Kim, Sang-Yoon; Lim, Woochang

2015-08-01

We consider a directed version of the Barabási-Albert scale-free network model with symmetric preferential attachment with the same in- and out-degrees and study the emergence of sparsely synchronized rhythms for a fixed attachment degree in an inhibitory population of fast-spiking Izhikevich interneurons. Fast sparsely synchronized rhythms with stochastic and intermittent neuronal discharges are found to appear for large values of J (synaptic inhibition strength) and D (noise intensity). For an intensive study we fix J at a sufficiently large value and investigate the population states by increasing D. For small D, full synchronization with the same population-rhythm frequency fp and mean firing rate (MFR) fi of individual neurons occurs, while for large D partial synchronization with fp>〈fi〉 (〈fi〉: ensemble-averaged MFR) appears due to intermittent discharge of individual neurons; in particular, the case of fp>4〈fi〉 is referred to as sparse synchronization. For the case of partial and sparse synchronization, MFRs of individual neurons vary depending on their degrees. As D passes a critical value D* (which is determined by employing an order parameter), a transition to unsynchronization occurs due to the destructive role of noise to spoil the pacing between sparse spikes. For D...... sparse synchronization do contributions of individual neuronal dynamics to population synchronization change depending on their degrees, unlike in the case of full synchronization. Consequently, dynamics of individual neurons reveal the inhomogeneous network structure for the case of partial and sparse synchronization, which is in contrast to the case of statistically homogeneous......
Robust visual tracking via structured multi-task sparse learning

KAUST Repository

Zhang, Tianzhu; Ghanem, Bernard; Liu, Si; Ahuja, Narendra

2012-01-01

In this paper, we formulate object tracking in a particle filter framework as a structured multi-task sparse learning problem, which we denote as Structured Multi-Task Tracking (S-MTT). Since we model particles as linear combinations of dictionary
Compact data structure and scalable algorithms for the sparse grid technique

KAUST Repository

Murarasu, Alin

2011-01-01

The sparse grid discretization technique enables a compressed representation of higher-dimensional functions. In its original form, it relies heavily on recursion and complex data structures, thus being far from well-suited for GPUs. In this paper, we describe optimizations that enable us to implement compression and decompression, the crucial sparse grid algorithms for our application, on Nvidia GPUs. The main idea consists of a bijective mapping between the set of points in a multi-dimensional sparse grid and a set of consecutive natural numbers. The resulting data structure consumes a minimum amount of memory. For a 10-dimensional sparse grid with approximately 127 million points, it consumes up to 30 times less memory than trees or hash tables which are typically used. Compared to a sequential CPU implementation, the speedups achieved on GPU are up to 17 for compression and up to 70 for decompression, respectively. We show that the optimizations are also applicable to multicore CPUs. Copyright © 2011 ACM.
Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

Science.gov (United States)

Oliker, Leonid; Heber, Gerd; Biswas, Rupak

2000-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
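The structure of a CG iteration, with its single sparse matrix-vector multiply per step, can be sketched as below. This is a plain sequential version for a symmetric positive definite matrix in CSR form, not the parallel implementations studied in the paper; the names `spmv_csr`, `dot`, and `conjugate_gradient`, and the default tolerance, are assumptions.

```python
def spmv_csr(ptr, idx, val, x):
    """Sparse matrix-vector product y = A x for A stored in CSR form."""
    return [sum(val[p] * x[idx[p]] for p in range(ptr[i], ptr[i + 1]))
            for i in range(len(ptr) - 1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(ptr, idx, val, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the zero initial guess
    p = list(r)          # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        ap = spmv_csr(ptr, idx, val, p)  # the one SpMV per iteration
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# A = [[4, 1], [1, 3]], b = [1, 2]; the exact solution is [1/11, 7/11].
x = conjugate_gradient([0, 2, 4], [0, 1, 0, 1], [4.0, 1.0, 1.0, 3.0], [1.0, 2.0])
```

In `spmv_csr` the access `x[idx[p]]` is indirect, so the row and column ordering of the matrix determines how well those reads reuse cache lines. That is one way to see why ordering strategies can matter as much as communication volume.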
Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

Science.gov (United States)

Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2002-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that for this class of applications: ordering significantly improves overall performance on both distributed and distributed shared-memory systems, that cache reuse may be more important than reducing communication, that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gain. An implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread level parallelism.
A General Sparse Tensor Framework for Electronic Structure Theory.

Science.gov (United States)

Manzer, Samuel; Epifanovsky, Evgeny; Krylov, Anna I; Head-Gordon, Martin

2017-03-14

Linear-scaling algorithms must be developed in order to extend the domain of applicability of electronic structure theory to molecules of any desired size. However, the increasing complexity of modern linear-scaling methods makes code development and maintenance a significant challenge. A major contributor to this difficulty is the lack of robust software abstractions for handling block-sparse tensor operations. We therefore report the development of a highly efficient symbolic block-sparse tensor library in order to provide access to high-level software constructs to treat such problems. Our implementation supports arbitrary multi-dimensional sparsity in all input and output tensors. We avoid cumbersome machine-generated code by implementing all functionality as a high-level symbolic C++ language library and demonstrate that our implementation attains very high performance for linear-scaling sparse tensor contractions.
Structural Sparse Tracking

KAUST Repository

Zhang, Tianzhu; Yang, Ming-Hsuan; Ahuja, Narendra; Ghanem, Bernard; Yan, Shuicheng; Xu, Changsheng; Liu, Si

2015-01-01

candidate. We show that our SST algorithm accommodates most existing sparse trackers with the respective merits. Both qualitative and quantitative evaluations on challenging benchmark image sequences demonstrate that the proposed SST algorithm performs
Evolution of symmetric reconnection layer in the presence of parallel shear flow

Energy Technology Data Exchange (ETDEWEB)

Lu Haoyu [Space Science Institute, School of Astronautics, Beihang University, Beijing 100191 (China); State Key Laboratory of Space Weather, Chinese Academy of Sciences, Beijing 100190 (China); Cao Jinbin [Space Science Institute, School of Astronautics, Beihang University, Beijing 100191 (China)

2011-07-15

The development of the structure of symmetric reconnection layer in the presence of a shear flow parallel to the antiparallel magnetic field component is studied by using a set of one-dimensional (1D) magnetohydrodynamic (MHD) equations. The Riemann problem is simulated through a second-order conservative TVD (total variation diminishing) scheme, in conjunction with Roe's averages for the Riemann problem. The simulation results indicate that besides the MHD shocks and expansion waves, there exist some new small-scale structures in the reconnection layer. For the case of zero initial guide magnetic field (i.e., $B_{y0} = 0$), a pair of intermediate shock and slow shock (SS) is formed in the presence of the parallel shear flow. The critical velocity of initial shear flow $V_{zc}$ is just the Alfvén velocity in the inflow region. As $V_{z\infty}$ increases to the value larger than $V_{zc}$, a new slow expansion wave appears in the position of SS in the case $V_{z\infty} < V_{zc}$, and one of the current densities drops to zero. As plasma $\beta$ increases, the out-flow region is widened. For $B_{y0} \neq 0$, a pair of SSs and an additional pair of time-dependent intermediate shocks (TDISs) are found to be present. Similar to the case of $B_{y0} = 0$, there exists a critical velocity of initial shear flow $V_{zc}$. The value of $V_{zc}$ is, however, smaller than the Alfvén velocity of the inflow region. As plasma $\beta$ increases, the velocities of SS and TDIS increase, and the out-flow region is widened. However, the velocity of downstream SS increases even faster, making the distance between SS and TDIS smaller. Consequently, the interaction between SS and TDIS in the case of high plasma $\beta$ influences the property of direction rotation of magnetic field across TDIS. Thereby, a wedge in the hodogram of tangential magnetic field comes into being. When $\beta \to \infty$, TDISs disappear and the guide magnetic field becomes constant.
A Fast Parallel Algorithm for Selected Inversion of Structured Sparse Matrices with Application to 2D Electronic Structure Calculations

International Nuclear Information System (INIS)

Lin Lin; Chao Yang; Jiangfeng Lu; Lexing Ying; Weinan, E.

2009-01-01

We present an efficient parallel algorithm and its implementation for computing the diagonal of $H^{-1}$ where $H$ is a 2D Kohn-Sham Hamiltonian discretized on a rectangular domain using a standard second order finite difference scheme. This type of calculation can be used to obtain an accurate approximation to the diagonal of a Fermi-Dirac function of $H$ through a recently developed pole-expansion technique [LinLuYingE2009]. The diagonal elements are needed in electronic structure calculations for quantum mechanical systems [HohenbergKohn1964, KohnSham1965, DreizlerGross1990]. We show how an elimination tree is used to organize the parallel computation and how synchronization overhead is reduced by passing data level by level along this tree using the technique of local buffers and relative indices. We analyze the performance of our implementation by examining its load balance and communication overhead. We show that our implementation exhibits excellent weak scaling on a large-scale high performance distributed parallel machine. When compared with the standard approach for evaluating the diagonal of a Fermi-Dirac function of a Kohn-Sham Hamiltonian associated with a 2D electron quantum dot, the new pole-expansion technique that uses our algorithm to compute the diagonal of $(H - z_i I)^{-1}$ for a small number of poles $z_i$ is much faster, especially when the quantum dot contains many electrons.
A Fast Parallel Algorithm for Selected Inversion of Structured Sparse Matrices with Application to 2D Electronic Structure Calculations

Energy Technology Data Exchange (ETDEWEB)

Lin, Lin; Yang, Chao; Lu, Jiangfeng; Ying, Lexing; E, Weinan

2009-09-25

We present an efficient parallel algorithm and its implementation for computing the diagonal of $H^{-1}$ where $H$ is a 2D Kohn-Sham Hamiltonian discretized on a rectangular domain using a standard second order finite difference scheme. This type of calculation can be used to obtain an accurate approximation to the diagonal of a Fermi-Dirac function of $H$ through a recently developed pole-expansion technique \cite{LinLuYingE2009}. The diagonal elements are needed in electronic structure calculations for quantum mechanical systems \cite{HohenbergKohn1964, KohnSham1965, DreizlerGross1990}. We show how an elimination tree is used to organize the parallel computation and how synchronization overhead is reduced by passing data level by level along this tree using the technique of local buffers and relative indices. We analyze the performance of our implementation by examining its load balance and communication overhead. We show that our implementation exhibits excellent weak scaling on a large-scale high performance distributed parallel machine. When compared with the standard approach for evaluating the diagonal of a Fermi-Dirac function of a Kohn-Sham Hamiltonian associated with a 2D electron quantum dot, the new pole-expansion technique that uses our algorithm to compute the diagonal of $(H - z_i I)^{-1}$ for a small number of poles $z_i$ is much faster, especially when the quantum dot contains many electrons.
Object tracking by occlusion detection via structured sparse learning

KAUST Repository

Zhang, Tianzhu; Ghanem, Bernard; Xu, Changsheng; Ahuja, Narendra

2013-01-01

occlusion through structured sparse learning. We test our tracker on challenging benchmark sequences, such as sports videos, which involve heavy occlusion, drastic illumination changes, and large pose variations. Experimental results show that our tracker
Prosodic structure as a parallel to musical structure

Directory of Open Access Journals (Sweden)

Christopher Cullen Heffner

2015-12-01

What structural properties do language and music share? Although early speculation identified a wide variety of possibilities, the literature has largely focused on the parallels between musical structure and syntactic structure. Here, we argue that parallels between musical structure and prosodic structure deserve more attention. We review the evidence for a link between musical and prosodic structure and find it to be strong. In fact, certain elements of prosodic structure may provide a parsimonious comparison with musical structure without sacrificing empirical findings related to the parallels between language and music. We then develop several predictions related to such a hypothesis.
A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors

DEFF Research Database (Denmark)

Liu, Weifeng; Vinter, Brian

2015-01-01

General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle...... extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse matrix is unknown in advance, (2) very expensive parallel insert operations at random positions in the resulting sparse matrix dominate the execution time, and (3) load balancing must account for sparse data...... memory space and efficiently utilizes the very limited on-chip scratchpad memory. Parallel insert operations of the nonzero entries are implemented through the GPU merge path algorithm that is experimentally found to be the fastest GPU merge approach. Load balancing builds on the number of necessary...
Adaptive structured dictionary learning for image fusion based on group-sparse-representation

Science.gov (United States)

Yang, Jiajie; Sun, Bin; Luo, Chengwei; Wu, Yuzhong; Xu, Limei

2018-04-01

Dictionary learning is the key process of sparse representation, which is one of the most widely used image representation theories in image fusion. The existing dictionary learning method does not use the group structure information and the sparse coefficients well. In this paper, we propose a new adaptive structured dictionary learning algorithm and an l1-norm maximum fusion rule that innovatively utilizes grouped sparse coefficients to merge the images. In the dictionary learning algorithm, we do not need prior knowledge about any group structure of the dictionary. By using the characteristics of the dictionary in expressing the signal, our algorithm can automatically find the desired potential structure information that is hidden in the dictionary. The fusion rule takes the physical meaning of the group structure dictionary, and makes activity-level judgement on the structure information when the images are being merged. Therefore, the fused image can retain more significant information. Comparisons have been made with several state-of-the-art dictionary learning methods and fusion rules. The experimental results demonstrate that the dictionary learning algorithm and the fusion rule both outperform others in terms of several objective evaluation metrics.

Parallel and Scalable Sparse Basic Linear Algebra Subprograms

DEFF Research Database (Denmark)

Liu, Weifeng

and heterogeneous processors. The thesis compares the proposed methods with state-of-the-art approaches on six homogeneous and five heterogeneous processors from Intel, AMD and nVidia. Using in total 38 sparse matrices as a benchmark suite, the experimental results show that the proposed methods obtain significant...
Seeing or moving in parallel

DEFF Research Database (Denmark)

Christensen, Mark Schram; Ehrsson, H Henrik; Nielsen, Jens Bo

2013-01-01

a different network, involving bilateral dorsal premotor cortex (PMd), primary motor cortex, and SMA, was more active when subjects viewed parallel movements while performing either symmetrical or parallel movements. Correlations between behavioral instability and brain activity were present in right lateral...... adduction-abduction movements symmetrically or in parallel with real-time congruent or incongruent visual feedback of the movements. One network, consisting of bilateral superior and middle frontal gyrus and supplementary motor area (SMA), was more active when subjects performed parallel movements, whereas...
Remote sensing image segmentation using local sparse structure constrained latent low rank representation

Science.gov (United States)

Tian, Shu; Zhang, Ye; Yan, Yimin; Su, Nan; Zhang, Junping

2016-09-01

Latent low-rank representation (LatLRR) has attracted considerable attention in the field of remote sensing image segmentation due to its effectiveness in exploring the multiple subspace structures of data. However, the increasingly heterogeneous texture information in high spatial resolution remote sensing images leads to more severe interference among pixels in a local neighborhood, and LatLRR fails to capture the local complex structure information. Therefore, we present a local sparse structure constrained latent low-rank representation (LSSLatLRR) segmentation method, which explicitly imposes the local sparse structure constraint on LatLRR to capture the intrinsic local structure in manifold structure feature subspaces. The whole segmentation framework can be viewed as two stages in cascade. In the first stage, we use the local histogram transform to extract the texture local histogram features (LHOG) at each pixel, which can efficiently capture the complex and micro-texture pattern. In the second stage, a local sparse structure (LSS) formulation is established on LHOG, which aims to preserve the local intrinsic structure and enhance the relationship between pixels having similar local characteristics. Meanwhile, by integrating the LSS and the LatLRR, we can efficiently capture the local sparse and low-rank structure in the mixture of feature subspace, and we adopt the subspace segmentation method to improve the segmentation accuracy. Experimental results on remote sensing images with different spatial resolutions show that, compared with three state-of-the-art image segmentation methods, the proposed method achieves more accurate segmentation results.
Building Input Adaptive Parallel Applications: A Case Study of Sparse Grid Interpolation

KAUST Repository

Murarasu, Alin; Weidendorfer, Josef

2012-01-01

bring a substantial contribution to the speedup. By identifying common patterns in the input data, we propose new algorithms for sparse grid interpolation that accelerate the state-of-the-art non-specialized version. Sparse grid interpolation
Sparse data structure design for wavelet-based methods

Directory of Open Access Journals (Sweden)

Latu Guillaume

2011-12-01

This course gives an introduction to the design of efficient datatypes for adaptive wavelet-based applications. It presents some code fragments and benchmark techniques useful for learning about the design of sparse data structures and adaptive algorithms. Material and practical examples are given, and they provide a good introduction for anyone involved in the development of adaptive applications. An answer will be given to the question: how to implement and efficiently use the discrete wavelet transform in computer applications? The focus is on time-evolution problems and the use of wavelet-based schemes for adaptively solving partial differential equations (PDEs). One crucial issue is that the benefits of the adaptive method in terms of algorithmic cost reduction must not be wasted on overheads associated with sparse data management.
An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data

DEFF Research Database (Denmark)

Liu, Weifeng; Vinter, Brian

2014-01-01

General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method, breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM algorithm has to handle extra...... irregularity from three aspects: (1) the number of the nonzero entries in the result sparse matrix is unknown in advance, (2) very expensive parallel insert operations at random positions in the result sparse matrix dominate the execution time, and (3) load balancing must account for sparse data in both input....... Load balancing builds on the number of the necessary arithmetic operations on the nonzero entries and is guaranteed in all stages. Compared with the state-of-the-art GPU SpGEMM methods in the CUSPARSE library and the CUSP library and the latest CPU SpGEMM method in the Intel Math Kernel Library, our...
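To make the first source of irregularity concrete, here is a minimal serial sketch of row-wise SpGEMM on CSR inputs. It is not the GPU algorithm of the record above; the function name `spgemm_csr` and the dictionary-based accumulator are assumptions chosen for illustration. Note that the length of each output row is only known after the row has been accumulated.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Row-wise product C = A * B for matrices stored in CSR form.

    Illustrative serial sketch only (hypothetical helper, not a library API).
    """
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # sparse accumulator for row i of C: column -> value
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of B by A[i, k] and merge it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        # Only now is the number of nonzeros in row i of C known.
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


# A = [[1, 0], [2, 3]], B = [[0, 4], [5, 0]]
result = spgemm_csr([0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0],
                    [0, 1, 2], [1, 0], [4.0, 5.0])
```

A parallel version has to replace the per-row dictionary with a structure that tolerates concurrent inserts, which is where the cost described in point (2) of the abstract comes from.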
A symmetric positive definite formulation for monolithic fluid structure interaction

KAUST Repository

Robinson-Mosher, Avi; Schroeder, Craig; Fedkiw, Ronald

2011-01-01

In this paper we consider a strongly coupled (monolithic) fluid structure interaction framework for incompressible flow, as opposed to a loosely coupled (partitioned) method. This requires solving a single linear system that combines the unknown velocities of the structure with the unknown pressures of the fluid. In our previous work, we were able to obtain a symmetric formulation of this coupled system; however, it was also indefinite, making it more difficult to solve. In fact in practice there have been cases where we have been unable to invert the system. In this paper we take a novel approach that consists of factoring the damping matrix of deformable structures and show that this can be used to obtain a symmetric positive definite system, at least to the extent that the uncoupled systems were symmetric positive definite. We use a traditional MAC grid discretization of the fluid and a fully Lagrangian discretization of the structures for the sake of exposition, noting that our procedure can be generalized to other scenarios. For the special case of rigid bodies, where there are no internal damping forces, we exactly recover the system of Batty et al. (2007) [4]. © 2010 Elsevier Inc.
Occlusion detection via structured sparse learning for robust object tracking

KAUST Repository

Zhang, Tianzhu; Ghanem, Bernard; Xu, Changsheng; Ahuja, Narendra

2014-01-01

occlusion through structured sparse learning. We test our tracker on challenging benchmark sequences, such as sports videos, which involve heavy occlusion, drastic illumination changes, and large pose variations. Extensive experimental results show that our
A Projected Conjugate Gradient Method for Sparse Minimax Problems

DEFF Research Database (Denmark)

Madsen, Kaj; Jonasson, Kristjan

1993-01-01

A new method for nonlinear minimax problems is presented. The method is of the trust region type and based on sequential linear programming. It is a first order method that only uses first derivatives and does not approximate Hessians. The new method is well suited for large sparse problems...... as it only requires that software for sparse linear programming and a sparse symmetric positive definite equation solver are available. On each iteration a special linear/quadratic model of the function is minimized, but contrary to the usual practice in trust region methods the quadratic model is only...... with the method are presented. In fact, we find that the number of iterations required is comparable to that of state-of-the-art quasi-Newton codes....
Efficient implementations of block sparse matrix operations on shared memory vector machines

International Nuclear Information System (INIS)

Washio, T.; Maruyama, K.; Osoda, T.; Doi, S.; Shimizu, F.

2000-01-01

In this paper, we propose vectorization and shared memory-parallelization techniques for block-type random sparse matrix operations in finite element (FEM) applications. Here, a block corresponds to unknowns on one node in the FEM mesh and we assume that the block size is constant over the mesh. First, we discuss some basic vectorization ideas (the jagged diagonal (JAD) format and the segmented scan algorithm) for the sparse matrix-vector product. Then, we extend these ideas to the shared memory parallelization. After that, we show that the techniques can be applied not only to the sparse matrix-vector product but also to the sparse matrix-matrix product, the incomplete or complete sparse LU factorization and preconditioning. Finally, we report the performance evaluation results obtained on an NEC SX-4 shared memory vector machine for linear systems in some FEM applications. (author)
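The jagged diagonal (JAD) idea mentioned above can be sketched for scalar entries as follows. This is a simplified illustration under stated assumptions, not the block-type scheme of the paper: the helper names `to_jad` and `jad_spmv` and the list-of-pairs input layout are hypothetical. Rows are sorted by decreasing length so that the d-th nonzeros of all rows form one long, contiguous inner loop.

```python
def to_jad(rows):
    """Convert rows (one list of (col, value) pairs per matrix row) to JAD."""
    # Stable sort of row indices by decreasing number of nonzeros.
    perm = sorted(range(len(rows)), key=lambda i: -len(rows[i]))
    max_len = len(rows[perm[0]]) if rows else 0
    jd_ptr, col, val = [0], [], []
    for d in range(max_len):
        for i in perm:
            if len(rows[i]) <= d:
                break  # rows are sorted by length, so the rest are shorter
            c, v = rows[i][d]
            col.append(c)
            val.append(v)
        jd_ptr.append(len(col))
    return perm, jd_ptr, col, val


def jad_spmv(perm, jd_ptr, col, val, x):
    """Compute y = A x from the JAD arrays produced by to_jad."""
    y_sorted = [0.0] * len(perm)
    for d in range(len(jd_ptr) - 1):
        start = jd_ptr[d]
        # One jagged diagonal: a long stride-1 loop, friendly to vector units.
        for r in range(jd_ptr[d + 1] - start):
            y_sorted[r] += val[start + r] * x[col[start + r]]
    # Undo the row permutation.
    y = [0.0] * len(perm)
    for r, i in enumerate(perm):
        y[i] = y_sorted[r]
    return y


rows = [[(0, 1.0)], [(0, 2.0), (1, 3.0), (2, 4.0)], [(1, 5.0), (2, 6.0)]]
y = jad_spmv(*to_jad(rows), [1.0, 2.0, 3.0])
```

The inner loop length equals the number of rows that still have a d-th entry, which is typically much longer than a single CSR row and is the reason the format suits vector machines.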
A sparse-grid isogeometric solver

KAUST Repository

Beck, Joakim; Sangalli, Giancarlo; Tamellini, Lorenzo

2018-01-01

Isogeometric Analysis (IGA) typically adopts tensor-product splines and NURBS as a basis for the approximation of the solution of PDEs. In this work, we investigate to what extent IGA solvers can benefit from the so-called sparse-grids construction in its combination technique form, which was first introduced in the early 90’s in the context of the approximation of high-dimensional PDEs. The tests that we report show that, in accordance with the literature, a sparse-grid construction can indeed be useful if the solution of the PDE at hand is sufficiently smooth. Sparse grids can also be useful in the case of non-smooth solutions when some a-priori knowledge on the location of the singularities of the solution can be exploited to devise suitable non-equispaced meshes. Finally, we remark that sparse grids can be seen as a simple way to parallelize pre-existing serial IGA solvers in a straightforward fashion, which can be beneficial in many practical situations.
Improving the Stability and Robustness of Incomplete Symmetric Indefinite Factorization Preconditioners

Czech Academy of Sciences Publication Activity Database

Scott, J.; Tůma, Miroslav

2017-01-01

Vol. 24, No. 5 (2017), article no. e2099. ISSN 1070-5325 Grant - others: GA ČR(CZ) GC17-04150J; EPSRC(GB) EP/I013067/1 Institutional support: RVO:67985807 Keywords: incomplete factorizations * indefinite symmetric systems * iterative solvers * pivoting * preconditioning * sparse linear systems * sparse matrices Subject RIV: BA - General Mathematics OECD field: Applied mathematics Impact factor: 1.303, year: 2016
Variation in efficiency of parallel algorithms. [for study of stiffness matrices in planar trusses

Science.gov (United States)

Hayashi, A.; Melosh, R. J.; Utku, S.; Salama, M.

1985-01-01

The present study has the objective to investigate some iterative parallel-processor linear equation solving algorithms with respect to efficiency for analyses of typical linear engineering systems. Attention is given to a set of n linear equations, Ku = p, where K = an n x n positive definite, sparsely populated, symmetric matrix, u = an n x 1 vector of unknown responses, and p = an n x 1 vector of prescribed constants. This study is concerned with a hybrid method in which iteration is used to solve the problem, while a direct method is used on the local processor level. Variations in the efficiency of parallel algorithms are explored. Measures of the efficiency are based on computer experiments regarding the algorithms. For all the algorithms, the wall clock time is found to decrease as the number of processors increases.
Symmetric structures of coherent states in superfluid helium-4

International Nuclear Information System (INIS)

Ahmad, M.

1981-02-01

Coherent States in superfluid helium-4 are discussed and symmetric structures are assigned to these states. Discrete and continuous series functions are exhibited for such states. Coherent State structure has been assigned to oscillating condensed bosons and their inter-relations and their effects on the superfluid system are analysed. (author)
Efficient Pseudorecursive Evaluation Schemes for Non-adaptive Sparse Grids

KAUST Repository

Buse, Gerrit

2014-01-01

In this work we propose novel algorithms for storing and evaluating sparse grid functions, operating on regular (not spatially adaptive), yet potentially dimensionally adaptive grid types. Besides regular sparse grids our approach includes truncated grids, both with and without boundary grid points. Similar to the implicit data structures proposed in Feuersänger (Dünngitterverfahren für hochdimensionale elliptische partielle Differentialgleichungen. Diploma Thesis, Institut für Numerische Simulation, Universität Bonn, 2005) and Murarasu et al. (Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming. Cambridge University Press, New York, 2011, pp. 25–34) we also define a bijective mapping from the multi-dimensional space of grid points to a contiguous index, such that the grid data can be stored in a simple array without overhead. Our approach is especially well-suited to exploit all levels of current commodity hardware, including cache-levels and vector extensions. Furthermore, this kind of data structure is extremely attractive for today’s real-time applications, as it gives direct access to the hierarchical structure of the grids, while outperforming other common sparse grid structures (hash maps, etc.) which do not match modern compute platforms that well. For dimensionality d ≤ 10 we achieve good speedups on a 12 core Intel Westmere-EP NUMA platform compared to the results presented in Murarasu et al. (Proceedings of the International Conference on Computational Science—ICCS 2012. Procedia Computer Science, 2012). As we show, this also holds for the results obtained on Nvidia Fermi GPUs, for which we observe speedups over our own CPU implementation of up to 4.5 when dealing with moderate dimensionality. In high-dimensional settings, in the order of tens to hundreds of dimensions, our sparse grid evaluation kernels on the CPU outperform any other known implementation.
Semi-supervised sparse coding

KAUST Repository

Wang, Jim Jing-Yan; Gao, Xin

2014-01-01

Sparse coding approximates the data sample as a sparse linear combination of some basic codewords and uses the sparse codes as new presentations. In this paper, we investigate learning discriminative sparse codes by sparse coding in a semi-supervised manner, where only a few training samples are labeled. By using the manifold structure spanned by the data set of both labeled and unlabeled samples and the constraints provided by the labels of the labeled samples, we learn the variable class labels for all the samples. Furthermore, to improve the discriminative ability of the learned sparse codes, we assume that the class labels could be predicted from the sparse codes directly using a linear classifier. By solving the codebook, sparse codes, class labels and classifier parameters simultaneously in a unified objective function, we develop a semi-supervised sparse coding algorithm. Experiments on two real-world pattern recognition problems demonstrate the advantage of the proposed methods over supervised sparse coding methods on partially labeled data sets.
In Defense of Sparse Tracking: Circulant Sparse Tracker

KAUST Repository

Zhang, Tianzhu; Bibi, Adel Aamer; Ghanem, Bernard

2016-01-01

Sparse representation has been introduced to visual tracking by finding the best target candidate with minimal reconstruction error within the particle filter framework. However, most sparse representation based trackers have high computational cost, less than promising tracking performance, and limited feature representation. To deal with the above issues, we propose a novel circulant sparse tracker (CST), which exploits circulant target templates. Because of the circulant structure property, CST has the following advantages: (1) It can refine and reduce particles using circular shifts of target templates. (2) The optimization can be efficiently solved entirely in the Fourier domain. (3) High dimensional features can be embedded into CST to significantly improve tracking performance without sacrificing much computation time. Both qualitative and quantitative evaluations on challenging benchmark sequences demonstrate that CST performs better than all other sparse trackers and favorably against state-of-the-art methods.

Analog system for computing sparse codes

Science.gov (United States)

Rozell, Christopher John; Johnson, Don Herrick; Baraniuk, Richard Gordon; Olshausen, Bruno A.; Ortman, Robert Lowell

2010-08-24

A parallel dynamical system for computing sparse representations of data, i.e., where the data can be fully represented in terms of a small number of non-zero code elements, and for reconstructing compressively sensed images. The system is based on the principles of thresholding and local competition and solves a family of sparse approximation problems corresponding to various sparsity metrics. The system utilizes Locally Competitive Algorithms (LCAs), in which nodes in a population continually compete with neighboring units using (usually one-way) lateral inhibition to calculate coefficients representing an input in an overcomplete dictionary.
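The thresholding-and-competition principle can be illustrated in software with a discrete-time simulation. The sketch below is an assumption-laden illustration, not the analog system of the record: it assumes the commonly cited LCA dynamics with a soft threshold (which corresponds to an L1 sparsity metric), forward Euler integration, and hypothetical names `lca`, `lam`, and `step`.

```python
def soft_threshold(u, lam):
    """Soft thresholding: shrink u toward zero by lam."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0


def lca(dictionary, signal, lam=0.1, step=0.1, iters=500):
    """Simulate node states u and return the thresholded coefficients.

    dictionary: list of atoms (lists of floats), assumed to have unit norm.
    Assumed dynamics: du_i/dt = b_i - u_i - sum_{j != i} G_ij * a_j,
    with b = Phi^T x, G = Phi^T Phi and a = soft_threshold(u).
    """
    n = len(dictionary)

    def dot(p, q):
        return sum(pi * qi for pi, qi in zip(p, q))

    b = [dot(atom, signal) for atom in dictionary]  # feed-forward drive
    gram = [[dot(dictionary[i], dictionary[j]) for j in range(n)]
            for i in range(n)]
    u = [0.0] * n
    for _ in range(iters):
        a = [soft_threshold(ui, lam) for ui in u]
        # Active nodes inhibit the others in proportion to atom overlap.
        u = [u[i] + step * (b[i] - u[i]
                            - sum(gram[i][j] * a[j] for j in range(n) if j != i))
             for i in range(n)]
    return [soft_threshold(ui, lam) for ui in u]


# Orthonormal dictionary: no competition, so the result is plain shrinkage.
coeffs = lca([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.05])
```

With an orthonormal dictionary the inhibition terms vanish and the coefficients reduce to the soft-thresholded inner products; with an overcomplete dictionary the off-diagonal Gram entries make overlapping atoms suppress one another.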
On Symmetric Polynomials

OpenAIRE

Golden, Ryan; Cho, Ilwoo

2015-01-01

In this paper, we study structure theorems of algebras of symmetric functions. Based on a certain relation on elementary symmetric polynomials generating such algebras, we consider perturbation in the algebras. In particular, we understand generators of the algebras as perturbations. From such perturbations, define injective maps on generators, which induce algebra-monomorphisms (or embeddings) on the algebras. They provide inductive structure theorems on algebras of symmetric polynomials. As...
A sparse version of IGA solvers

KAUST Repository

Beck, Joakim; Sangalli, Giancarlo; Tamellini, Lorenzo

2017-01-01

Isogeometric Analysis (IGA) typically adopts tensor-product splines and NURBS as a basis for the approximation of the solution of PDEs. In this work, we investigate to which extent IGA solvers can benefit from the so-called sparse-grids construction in its combination technique form, which was first introduced in the early 90s in the context of the approximation of high-dimensional PDEs. The tests that we report show that, in accordance to the literature, a sparse grids construction can indeed be useful if the solution of the PDE at hand is sufficiently smooth. Sparse grids can also be useful in the case of non-smooth solutions when some a-priori knowledge on the location of the singularities of the solution can be exploited to devise suitable non-equispaced meshes. Finally, we remark that sparse grids can be seen as a simple way to parallelize pre-existing serial IGA solvers in a straightforward fashion, which can be beneficial in many practical situations.
Compact data structure and scalable algorithms for the sparse grid technique

KAUST Repository

Murarasu, Alin; Weidendorfer, Josef; Buse, Gerrit; Butnaru, Daniel; Pflüger, Dirk

2011-01-01

The sparse grid discretization technique enables a compressed representation of higher-dimensional functions. In its original form, it relies heavily on recursion and complex data structures, thus being far from well-suited for GPUs. In this paper
Analog/RF performance of two tunnel FETs with symmetric structures

Science.gov (United States)

Chen, Shupeng; Liu, Hongxia; Wang, Shulong; Li, Wei; Wang, Qianqiong

2017-11-01

In this paper, the radio frequency and analog performance of two tunnel field-effect transistors with symmetric structures are analyzed. The symmetric U-shape gate tunnel field-effect transistor (SUTFET) and symmetric tunnel field-effect transistor (STFET) are investigated by Silvaco Atlas simulation. The basic electrical properties and the parameters related to frequency and analog characteristics are analyzed. Due to the lower off-state leakage current, the STFET has better power consumption performance. The SUTFET obtains larger operating current (242 μA/μm), transconductance (490 μS/μm), output conductance (494 μS/μm), gain bandwidth product (3.2 GHz) and cut-off frequency (27.7 GHz). The simulation result of these two devices can be used as a guideline for their analog/RF applications.
New Parallel Algorithms for Structural Analysis and Design of Aerospace Structures

Science.gov (United States)

Nguyen, Duc T.

1998-01-01

Subspace and Lanczos iterations have been developed, well documented, and widely accepted as efficient methods for obtaining p-lowest eigen-pair solutions of large-scale, practical engineering problems. The focus of this paper is to incorporate recent developments in vectorized sparse technologies in conjunction with Subspace and Lanczos iterative algorithms for computational enhancements. Numerical performance, in terms of accuracy and efficiency of the proposed sparse strategies for Subspace and Lanczos algorithm, is demonstrated by solving for the lowest frequencies and mode shapes of structural problems on the IBM-R6000/590 and SunSparc 20 workstations.
Sparse RNA folding revisited: space-efficient minimum free energy structure prediction.

Science.gov (United States)

Will, Sebastian; Jabbari, Hosna

2016-01-01

RNA secondary structure prediction by energy minimization is the central computational tool for the analysis of structural non-coding RNAs and their interactions. Sparsification has been successfully applied to improve the time efficiency of various structure prediction algorithms while guaranteeing the same result; however, for many such folding problems, space efficiency is of even greater concern, particularly for long RNA sequences. So far, space-efficient sparsified RNA folding with fold reconstruction was solved only for simple base-pair-based pseudo-energy models. Here, we revisit the problem of space-efficient free energy minimization. Whereas the space-efficient minimization of the free energy has been sketched before, the reconstruction of the optimum structure has not even been discussed. We show that this reconstruction is not possible in trivial extension of the method for simple energy models. Then, we present the time- and space-efficient sparsified free energy minimization algorithm SparseMFEFold that guarantees MFE structure prediction. In particular, this novel algorithm provides efficient fold reconstruction based on dynamically garbage-collected trace arrows. The complexity of our algorithm depends on two parameters, the number of candidates Z and the number of trace arrows T; both are bounded by [Formula: see text], but are typically much smaller. The time complexity of RNA folding is reduced from [Formula: see text] to [Formula: see text]; the space complexity, from [Formula: see text] to [Formula: see text]. Our empirical results show more than 80 % space savings over RNAfold [Vienna RNA package] on the long RNAs from the RNA STRAND database (≥2500 bases). The presented technique is intentionally generalizable to complex prediction algorithms; due to their high space demands, algorithms like pseudoknot prediction and RNA-RNA-interaction prediction are expected to profit even stronger than "standard" MFE folding. SparseMFEFold is free
De novo protein structure determination using sparse NMR data

International Nuclear Information System (INIS)

Bowers, Peter M.; Strauss, Charlie E.M.; Baker, David

2000-01-01

We describe a method for generating moderate to high-resolution protein structures using limited NMR data combined with the ab initio protein structure prediction method Rosetta. Peptide fragments are selected from proteins of known structure based on sequence similarity and consistency with chemical shift and NOE data. Models are built from these fragments by minimizing an energy function that favors hydrophobic burial, strand pairing, and satisfaction of NOE constraints. Models generated using this procedure with ∼1 NOE constraint per residue are in some cases closer to the corresponding X-ray structures than the published NMR solution structures. The method requires only the sparse constraints available during initial stages of NMR structure determination, and thus holds promise for increasing the speed with which protein solution structures can be determined
An iteration for indefinite and non-symmetric systems and its application to the Navier-Stokes equations

Energy Technology Data Exchange (ETDEWEB)

Wathen, A. [Oxford Univ. (United Kingdom)]; Golub, G. [Stanford Univ., CA (United States)]

1996-12-31

A simple fixed point linearisation of the Navier-Stokes equations leads to the Oseen problem which after appropriate discretisation yields large sparse linear systems with coefficient matrices of the form $\begin{pmatrix} A & B^T \\ B & -C \end{pmatrix}$. Here A is non-symmetric but its symmetric part is positive definite, and C is symmetric and positive semi-definite. Such systems arise in other situations. In this talk we will describe and present some analysis for an iteration based on an indefinite and symmetric preconditioner of the form $\begin{pmatrix} D & B^T \\ B & -C \end{pmatrix}$.
Structure-based bayesian sparse reconstruction

KAUST Repository

Quadeer, Ahmed Abdul; Al-Naffouri, Tareq Y.

2012-01-01

Sparse signal reconstruction algorithms have attracted research attention due to their wide applications in various fields. In this paper, we present a simple Bayesian approach that utilizes the sparsity constraint and a priori statistical
Solving Sparse Polynomial Optimization Problems with Chordal Structure Using the Sparse, Bounded-Degree Sum-of-Squares Hierarchy

NARCIS (Netherlands)

Marandi, Ahmadreza; de Klerk, Etienne; Dahl, Joachim

The sparse bounded degree sum-of-squares (sparse-BSOS) hierarchy of Weisser, Lasserre and Toh [arXiv:1607.01151,2016] constructs a sequence of lower bounds for a sparse polynomial optimization problem. Under some assumptions, it is proven by the authors that the sequence converges to the optimal
Structured Parallel Programming Patterns for Efficient Computation

CERN Document Server

McCool, Michael; Robison, Arch

2012-01-01

Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th
Structure-constrained sparse canonical correlation analysis with an application to microbiome data analysis.

Science.gov (United States)

Chen, Jun; Bushman, Frederic D; Lewis, James D; Wu, Gary D; Li, Hongzhe

2013-04-01

Motivated by studying the association between nutrient intake and human gut microbiome composition, we developed a method for structure-constrained sparse canonical correlation analysis (ssCCA) in a high-dimensional setting. ssCCA takes into account the phylogenetic relationships among bacteria, which provides important prior knowledge on evolutionary relationships among bacterial taxa. Our ssCCA formulation utilizes a phylogenetic structure-constrained penalty function to impose certain smoothness on the linear coefficients according to the phylogenetic relationships among the taxa. An efficient coordinate descent algorithm is developed for optimization. A human gut microbiome data set is used to illustrate this method. Both simulations and real data applications show that ssCCA performs better than the standard sparse CCA in identifying meaningful variables when there are structures in the data.
SLAP, Large Sparse Linear System Solution Package

International Nuclear Information System (INIS)

Greenbaum, A.

1987-01-01

1 - Description of program or function: SLAP is a set of routines for solving large sparse systems of linear equations. One need not store the entire matrix - only the nonzero elements and their row and column numbers. Any nonzero structure is acceptable, so the linear system solver need not be modified when the structure of the matrix changes. Auxiliary storage space is acquired and released within the routines themselves by use of the LRLTRAN POINTER statement. 2 - Method of solution: SLAP contains one direct solver, a band matrix factorization and solution routine, BAND, and several iterative solvers. The iterative routines are as follows: JACOBI, Jacobi iteration; GS, Gauss-Seidel iteration; ILUIR, incomplete LU decomposition with iterative refinement; DSCG and ICCG, diagonal scaling and incomplete Cholesky decomposition with conjugate gradient iteration (for symmetric positive definite matrices only); DSCGN and ILUGGN, diagonal scaling and incomplete LU decomposition with conjugate gradient iteration on the normal equations; DSBCG and ILUBCG, diagonal scaling and incomplete LU decomposition with bi-conjugate gradient iteration; and DSOMN and ILUOMN, diagonal scaling and incomplete LU decomposition with ORTHOMIN iteration.
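The storage idea described above (only the nonzero elements with their row and column numbers) combines naturally with the simplest of the listed methods, Jacobi iteration. The following is a minimal sketch of that combination, not SLAP's own routine or calling sequence; the function name `jacobi_coo` and its argument layout are assumptions for illustration.

```python
def jacobi_coo(n, rows, cols, vals, rhs, iters=100):
    """Jacobi iteration for A x = rhs with A given as (row, col, value) triplets.

    Illustrative only: assumes a nonzero diagonal and a matrix for which
    Jacobi converges (for example, strict diagonal dominance).
    """
    diag = [0.0] * n
    for r, c, v in zip(rows, cols, vals):
        if r == c:
            diag[r] += v
    x = [0.0] * n
    for _ in range(iters):
        # Off-diagonal part of A times the previous iterate.
        off = [0.0] * n
        for r, c, v in zip(rows, cols, vals):
            if r != c:
                off[r] += v * x[c]
        x = [(rhs[i] - off[i]) / diag[i] for i in range(n)]
    return x


# A = [[4, 1], [1, 3]] stored as triplets, rhs = [1, 2]
x = jacobi_coo(2, [0, 0, 1, 1], [0, 1, 0, 1], [4.0, 1.0, 1.0, 3.0], [1.0, 2.0])
```

Because the loop only walks the triplet lists, nothing in the solver depends on where the nonzeros sit, which mirrors the point that the solver need not change when the matrix structure does.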
Reconstruction of sparse connectivity in neural networks from spike train covariances

International Nuclear Information System (INIS)

Pernice, Volker; Rotter, Stefan

2013-01-01

The inference of causation from correlation is in general highly problematic. Correspondingly, it is difficult to infer the existence of physical synaptic connections between neurons from correlations in their activity. Covariances in neural spike trains and their relation to network structure have been the subject of intense research, both experimentally and theoretically. The influence of recurrent connections on covariances can be characterized directly in linear models, where connectivity in the network is described by a matrix of linear coupling kernels. However, as indirect connections also give rise to covariances, the inverse problem of inferring network structure from covariances can generally not be solved unambiguously. Here we study to what degree this ambiguity can be resolved if the sparseness of neural networks is taken into account. To reconstruct a sparse network, we determine the minimal set of linear couplings consistent with the measured covariances by minimizing the L1 norm of the coupling matrix under appropriate constraints. Contrary to intuition, after stochastic optimization of the coupling matrix, the resulting estimate of the underlying network is directed, despite the fact that a symmetric matrix of count covariances is used for inference. The performance of the new method is best if connections are neither exceedingly sparse, nor too dense, and it is easily applicable for networks of a few hundred nodes. Full coupling kernels can be obtained from the matrix of full covariance functions. We apply our method to networks of leaky integrate-and-fire neurons in an asynchronous–irregular state, where spike train covariances are well described by a linear model. (paper)
Ordering schemes for sparse matrices using modern programming paradigms

International Nuclear Information System (INIS)

Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak

2000-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. In previous work, we investigated the effects of various ordering and partitioning strategies on the performance of CG using different programming paradigms and architectures. This paper makes several extensions to our prior research. First, we present a hybrid (MPI+OpenMP) implementation of the CG algorithm on the IBM SP and show that the hybrid paradigm increases programming complexity with little performance gain compared to a pure MPI implementation. For ill-conditioned linear systems, it is often necessary to use a preconditioning technique. We present MPI results for ILU(0) preconditioned CG (PCG) using the BlockSolve95 library, and show that the initial ordering of the input matrix dramatically affects PCG's performance. Finally, a multithreaded version of the PCG is developed on the Cray (Tera) MTA. Unlike the message-passing version, this implementation did not require the complexities of special orderings or graph dependency analysis. However, only limited scalability was achieved due to the lack of available thread-level parallelism.
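For readers unfamiliar with CG, the following is a minimal serial, unpreconditioned sketch in Python. It is not the MPI, OpenMP, or MTA code discussed in this record; the names `conjugate_gradient` and `laplacian_1d_matvec` and the test matrix (a 1D Laplacian) are assumptions chosen for illustration. The matrix enters only through a matrix-vector product, which is the operation whose cost depends on ordering and partitioning in a parallel setting.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    matvec(v) must return the product A v as a list.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    if rs_old ** 0.5 < tol:
        return x
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new ** 0.5 < tol:
            break
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def laplacian_1d_matvec(v):
    """Apply the tridiagonal matrix tridiag(-1, 2, -1) without storing it."""
    n = len(v)
    out = []
    for i in range(n):
        s = 2.0 * v[i]
        if i > 0:
            s -= v[i - 1]
        if i < n - 1:
            s -= v[i + 1]
        out.append(s)
    return out


# Usage: recover a known solution of a 10-by-10 system.
x_true = [float(i + 1) for i in range(10)]
x = conjugate_gradient(laplacian_1d_matvec, laplacian_1d_matvec(x_true))
```

A preconditioned variant such as the ILU(0) PCG mentioned above would additionally apply an approximate inverse of the matrix to the residual in each iteration; that step is omitted here.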
Tunable elastic parity-time symmetric structure based on the shunted piezoelectric materials

Science.gov (United States)

Hou, Zhilin; Assouar, Badreddine

2018-02-01

We theoretically and numerically report on the tunable elastic Parity-Time (PT) symmetric structure based on shunted piezoelectric units. We show that the elastic loss and gain can be achieved in piezoelectric materials when they are shunted by external circuits containing positive and negative resistances. We present and discuss, as an example, the strongly dependent relationship between the exceptional points of a three-layered system and the impedance of their external shunted circuit. The achieved results evidence that the PT symmetric structures based on this proposed concept can actively be tuned without any change of their geometric configurations.
Fast wavelet based sparse approximate inverse preconditioner

Energy Technology Data Exchange (ETDEWEB)

Wan, W.L. [Univ. of California, Los Angeles, CA (United States)

1996-12-31

Incomplete LU factorization is a robust preconditioner for both general and PDE problems but unfortunately is not easy to parallelize. Recent studies by Huckle and Grote and by Chow and Saad showed that a sparse approximate inverse could be a potential alternative that is readily parallelizable. However, for the special class of matrices A that come from elliptic PDE problems, their preconditioners are not optimal in the sense of being independent of mesh size. A reason may be that no good sparse approximate inverse exists for the dense inverse matrix. Our observation is that for this kind of matrix, the entries of the inverse typically have piecewise smooth changes. We can take advantage of this fact and use wavelet compression techniques to construct a better sparse approximate inverse preconditioner. We shall show numerically that our approach is effective for this kind of matrix.

A low loss superconducting filter with four states based on symmetrical interdigital-loaded structure

International Nuclear Information System (INIS)

Gao, Tianqi; Wei, Bin; Cao, Bisong; Wang, Dan; Guo, Xubo

2016-01-01

Highlights: • A novel symmetrical interdigital-loaded microstrip structure is presented. • A six-pole L-band HTS filter with four states has similar in-band responses. • The coupling coefficients between resonators remain unchanged during tuning. • The low loss HTS filter can be tuned from 1.382 GHz to 1.193 GHz. - Abstract: This paper presents a new symmetrical interdigital-loaded microstrip structure. The symmetrical structure can be applied to design a filter that can work at different frequencies. The filter has similar in-band response at each working frequency with low insertion loss. Based on the proposed structures, a low-loss six-pole high temperature superconducting (HTS) filter with four different working states is designed and fabricated. The center frequency of the filter can be tuned discretely from 1.382 GHz to 1.193 GHz. All four states have similar in-band characteristics, whereas the insertion losses are less than 0.3 dB. The measured results are consistent with the simulations.
Structural Properties of G,T-Parallel Duplexes

Directory of Open Access Journals (Sweden)

Anna Aviñó

2010-01-01

The structure of G,T-parallel-stranded duplexes of DNA carrying similar amounts of adenine and guanine residues is studied by means of molecular dynamics (MD) simulations and UV and CD spectroscopies. In addition, the impact of the substitution of adenine by 8-aminoadenine and guanine by 8-aminoguanine is analyzed. The presence of 8-aminoadenine and 8-aminoguanine stabilizes the parallel duplex structure. Binding of these oligonucleotides to their target polypyrimidine sequences to form the corresponding G,T-parallel triplex was not observed. Instead, when unmodified parallel-stranded duplexes were mixed with their polypyrimidine target, an interstrand Watson-Crick duplex was formed. As predicted by theoretical calculations, parallel-stranded duplexes carrying 8-aminopurines did not bind to their target. The preference for the parallel duplex over the Watson-Crick antiparallel duplex is attributed to the strong stabilization of the parallel duplex produced by the 8-aminopurines. Theoretical studies show that the isomorphism of the triads is crucial for the stability of the parallel triplex.
Permuting sparse rectangular matrices into block-diagonal form

Energy Technology Data Exchange (ETDEWEB)

Aykanat, Cevdet; Pinar, Ali; Catalyurek, Umit V.

2002-12-09

This work investigates the problem of permuting a sparse rectangular matrix into block diagonal form. Block diagonal form of a matrix grants an inherent parallelism for the solution of the deriving problem, as recently investigated in the context of mathematical programming, LU factorization and QR factorization. We propose graph and hypergraph models to represent the nonzero structure of a matrix, which reduce the permutation problem to those of graph partitioning by vertex separator and hypergraph partitioning, respectively. Besides proposing the models to represent sparse matrices and investigating related combinatorial problems, we provide a detailed survey of relevant literature to bridge the gap between different societies, investigate existing techniques for partitioning and propose new ones, and finally present a thorough empirical study of these techniques. Our experiments on a wide range of matrices, using state-of-the-art graph and hypergraph partitioning tools MeTiS and PaToH, revealed that the proposed methods yield very effective solutions both in terms of solution quality and run time.
Automatic Management of Parallel and Distributed System Resources

Science.gov (United States)

Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

1990-01-01

Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.
Second order parallel tensors on some paracontact manifolds | Liu ...

African Journals Online (AJOL)

The object of the present paper is to study the symmetric and skew-symmetric properties of a second order parallel tensor on paracontact metric (k;μ)-spaces and almost β-para-Kenmotsu (k;μ)-spaces. In this paper, we prove that if there exists a second order symmetric parallel tensor on a paracontact metric (k;μ)-space M, ...
The hidden symmetries and their algebraic structure of the static axially symmetric SDYM fields

International Nuclear Information System (INIS)

Hao Sanru

1993-01-01

A new explicit transformation about the static axially symmetric self-dual Yang-Mills (SDYM) fields is presented. The theory has proved that the new transformation is a symmetric one. For the two kinds of the Lie algebraic generators of the Lie group SL(N, R)/SO(N), the corresponding transformations are given. By making use of the Yang-Baxter equality and their square brackets, the loop and conformal algebraic structures of the symmetric transformations for the basic fields have been obtained. All the results obtained can be directly generalized to the other models.
Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels

KAUST Repository

Haidar, Azzam

2011-01-01

This paper introduces a novel implementation in reducing a symmetric dense matrix to tridiagonal form, which is the preprocessing step toward solving symmetric eigenvalue problems. Based on tile algorithms, the reduction follows a two-stage approach, where the tile matrix is first reduced to symmetric band form prior to the final condensed structure. The challenging trade-off between algorithmic performance and task granularity has been tackled through a grouping technique, which consists of aggregating fine-grained and memory-aware computational tasks during both stages, while sustaining the application's overall high performance. A dynamic runtime environment system then schedules the different tasks in an out-of-order fashion. The performance for the tridiagonal reduction reported in this paper is unprecedented. Our implementation results in up to 50-fold and 12-fold improvement (130 Gflop/s) compared to the equivalent routines from LAPACK V3.2 and Intel MKL V10.3, respectively, on an eight socket hexa-core AMD Opteron multicore shared-memory system with a matrix size of 24000×24000. Copyright 2011 ACM.
Harmonic analysis on symmetric spaces

CERN Document Server

Terras, Audrey

This text explores the geometry and analysis of higher rank analogues of the symmetric spaces introduced in volume one. To illuminate both the parallels and differences of the higher rank theory, the space of positive matrices is treated in a manner mirroring that of the upper-half space in volume one. This concrete example furnishes motivation for the general theory of noncompact symmetric spaces, which is outlined in the final chapter. The book emphasizes motivation and comprehensibility, concrete examples and explicit computations (by pen and paper, and by computer), history, and, above all, applications in mathematics, statistics, physics, and engineering. The second edition includes new sections on Donald St. P. Richards’s central limit theorem for O(n)-invariant random variables on the symmetric space of GL(n, R), on random matrix theory, and on advances in the theory of automorphic forms on arithmetic groups.
Structural synthesis of parallel robots

CERN Document Server

Gogu, Grigore

This book represents the fifth part of a larger work dedicated to the structural synthesis of parallel robots. The originality of this work resides in the fact that it combines new formulae for mobility, connectivity, redundancy and overconstraints with evolutionary morphology in a unified structural synthesis approach that yields interesting and innovative solutions for parallel robotic manipulators. This is the first book on robotics that presents solutions for coupled, decoupled, uncoupled, fully-isotropic and maximally regular robotic manipulators with Schönflies motions systematically generated by using the structural synthesis approach proposed in Part 1. Overconstrained non-redundant/overactuated/redundantly actuated solutions with simple/complex limbs are proposed. Many solutions are presented here for the first time in the literature. The author had to make a difficult and challenging choice between protecting these solutions through patents and releasing them directly into the public domain. T...
Stiffness Analysis and Comparison of 3-PPR Planar Parallel Manipulators with Actuation Compliance

DEFF Research Database (Denmark)

Wu, Guanglei; Bai, Shaoping; Kepler, Jørgen Asbøl

2012-01-01

In this paper, the stiffness of 3-PPR planar parallel manipulator (PPM) is analyzed with the consideration of nonlinear actuation compliance. The characteristics of the stiffness matrix pertaining to the planar parallel manipulators are analyzed and discussed. Graphic representation of the stiffn...... of the stiffness characteristics by means of translational and rotational stiffness mapping is developed. The developed method is illustrated with an unsymmetrical 3-PPR PPM, being compared with its structure-symmetrical counterpart....
Spectra of sparse random matrices

International Nuclear Information System (INIS)

Kuehn, Reimer

2008-01-01

We compute the spectral density for ensembles of sparse symmetric random matrices using replica. Our formulation of the replica-symmetric ansatz shares the symmetries of that suggested in a seminal paper by Rodgers and Bray (symmetry with respect to permutation of replica and rotation symmetry in the space of replica), but uses a different representation in terms of superpositions of Gaussians. It gives rise to a pair of integral equations which can be solved by a stochastic population-dynamics algorithm. Remarkably, our representation allows us to identify pure-point contributions to the spectral density related to the existence of normalizable eigenstates. Our approach is not restricted to matrices defined on graphs with Poissonian degree distribution. Matrices defined on regular random graphs or on scale-free graphs are easily handled. We also look at matrices with row constraints such as discrete graph Laplacians. Our approach naturally allows us to unfold the total density of states into contributions coming from vertices of different local coordinations, and an example of such an unfolding is presented. Our results are well corroborated by numerical diagonalization studies of large finite random matrices.
Joint Group Sparse PCA for Compressed Hyperspectral Imaging.

Science.gov (United States)

Khan, Zohaib; Shafait, Faisal; Mian, Ajmal

2015-12-01

A sparse principal component analysis (PCA) seeks a sparse linear combination of input features (variables), so that the derived features still explain most of the variations in the data. A group sparse PCA introduces structural constraints on the features in seeking such a linear combination. Collectively, the derived principal components may still require measuring all the input features. We present a joint group sparse PCA (JGSPCA) algorithm, which forces the basic coefficients corresponding to a group of features to be jointly sparse. Joint sparsity ensures that the complete basis involves only a sparse set of input features, whereas the group sparsity ensures that the structural integrity of the features is maximally preserved. We evaluate the JGSPCA algorithm on the problems of compressed hyperspectral imaging and face recognition. Compressed sensing results show that the proposed method consistently outperforms sparse PCA and group sparse PCA in reconstructing the hyperspectral scenes of natural and man-made objects. The efficacy of the proposed compressed sensing method is further demonstrated in band selection for face recognition.
Factored Facade Acquisition using Symmetric Line Arrangements

KAUST Repository

Ceylan, Duygu

2012-05-01

We introduce a novel framework for image-based 3D reconstruction of urban buildings based on symmetry priors. Starting from image-level edges, we generate a sparse and approximate set of consistent 3D lines. These lines are then used to simultaneously detect symmetric line arrangements while refining the estimated 3D model. Operating both on 2D image data and intermediate 3D feature representations, we perform iterative feature consolidation and effective outlier pruning, thus eliminating reconstruction artifacts arising from ambiguous or wrong stereo matches. We exploit non-local coherence of symmetric elements to generate precise model reconstructions, even in the presence of a significant amount of outlier image-edges arising from reflections, shadows, outlier objects, etc. We evaluate our algorithm on several challenging test scenarios, both synthetic and real. Beyond reconstruction, the extracted symmetry patterns are useful towards interactive and intuitive model manipulations.
Rotationally symmetric structure in two extragalactic radio sources

International Nuclear Information System (INIS)

Lonsdale, C.J.; Morison, I.

1980-01-01

The new multi-telescope radio-linked interferometer (MTRLI) at Jodrell Bank was used during January and February 1980 at a frequency of 408 MHz to map the extragalactic radio sources 3C196 and 3C305 with a resolution of approximately 1 arc s. It is shown here that both the markedly symmetric structures observed and the spectral index distributions inferred from comparisons with previously published 5 GHz maps provide evidence for the source axes having rotated during the lifetime of the emitting regions. (U.K.)
Parallel processing of structural integrity analysis codes

International Nuclear Information System (INIS)

Swami Prasad, P.; Dutta, B.K.; Kushwaha, H.S.

1996-01-01

Structural integrity analysis plays an important role in assessing and demonstrating the safety of nuclear reactor components. This analysis is performed using analytical tools such as Finite Element Method (FEM) with the help of digital computers. The complexity of the problems involved in nuclear engineering demands high speed computation facilities to obtain solutions in a reasonable amount of time. Parallel processing systems such as ANUPAM provide an efficient platform for realising the high speed computation. The development and implementation of software on parallel processing systems is an interesting and challenging task. The data and algorithm structure of the codes plays an important role in exploiting the parallel processing system capabilities. Structural analysis codes based on FEM can be divided into two categories with respect to their implementation on parallel processing systems. Codes in the first category, such as those used for harmonic analysis and mechanistic fuel performance codes, do not require the parallelisation of individual modules of the codes. Codes in the second category, such as conventional FEM codes, require parallelisation of individual modules. In this category, parallelisation of the equation solution module poses major difficulties. Different solution schemes such as domain decomposition method (DDM), parallel active column solver and substructuring method are currently used on parallel processing systems. Two codes, FAIR and TABS, belonging to each of these categories have been implemented on ANUPAM. The implementation details of these codes and the performance of different equation solvers are highlighted. (author). 5 refs., 12 figs., 1 tab
Robust visual tracking via structured multi-task sparse learning

KAUST Repository

Zhang, Tianzhu

2012-11-09

In this paper, we formulate object tracking in a particle filter framework as a structured multi-task sparse learning problem, which we denote as Structured Multi-Task Tracking (S-MTT). Since we model particles as linear combinations of dictionary templates that are updated dynamically, learning the representation of each particle is considered a single task in Multi-Task Tracking (MTT). By employing popular sparsity-inducing ℓ_{p,q} mixed norms (specifically p ∈ {2, ∞} and q = 1), we regularize the representation problem to enforce joint sparsity and learn the particle representations together. As compared to previous methods that handle particles independently, our results demonstrate that mining the interdependencies between particles improves tracking performance and overall computational complexity. Interestingly, we show that the popular L1 tracker (Mei and Ling, IEEE Trans Pattern Anal Mach Intel 33(11):2259-2272, 2011) is a special case of our MTT formulation (denoted as the L11 tracker) when p = q = 1. Under the MTT framework, some of the tasks (particle representations) are often more closely related and more likely to share common relevant covariates than other tasks. Therefore, we extend the MTT framework to take into account pairwise structural correlations between particles (e.g. spatial smoothness of representation) and denote the novel framework as S-MTT. The problem of learning the regularized sparse representation in MTT and S-MTT can be solved efficiently using an Accelerated Proximal Gradient (APG) method that yields a sequence of closed form updates. As such, S-MTT and MTT are computationally attractive. We test our proposed approach on challenging sequences involving heavy occlusion, drastic illumination changes, and large pose variations. Experimental results show that S-MTT is much better than MTT, and both methods consistently outperform state-of-the-art trackers. © 2012 Springer Science+Business Media New York.
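The mixed norm mentioned in this record can be made concrete with a short Python sketch. It assumes the common row-wise convention, in which the p-norm is taken over each row of the coefficient matrix (one row per dictionary template, one column per particle) and the q-norm over the resulting row norms; the abstract does not spell out the convention, and the function name `mixed_norm` is hypothetical.

```python
import math


def mixed_norm(X, p, q=1.0):
    """Return the l_{p,q} mixed norm of a matrix given as a list of rows.

    The p-norm is applied to each row, then the q-norm to the vector of
    row norms (a row-wise convention assumed for this sketch).
    Use p = math.inf for the max norm. Rows must be non-empty.
    """
    row_norms = []
    for row in X:
        if p == math.inf:
            row_norms.append(max(abs(v) for v in row))
        else:
            row_norms.append(sum(abs(v) ** p for v in row) ** (1.0 / p))
    return sum(r ** q for r in row_norms) ** (1.0 / q)


# Usage: the middle row is entirely zero, so it adds nothing to the penalty.
X = [[3.0, 4.0], [0.0, 0.0], [1.0, -1.0]]
norm_21 = mixed_norm(X, 2)            # 5 + 0 + sqrt(2)
norm_inf1 = mixed_norm(X, math.inf)   # 4 + 0 + 1
norm_11 = mixed_norm(X, 1)            # sum of absolute values of all entries
```

With q = 1 the penalty is a sum of row norms, so minimizing it tends to zero out whole rows at once, meaning the particles share a small set of templates. With p = q = 1 the norm reduces to the entrywise L1 norm, which treats every coefficient independently.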
Ab initio nuclear structure - the large sparse matrix eigenvalue problem

Energy Technology Data Exchange (ETDEWEB)

Vary, James P; Maris, Pieter [Department of Physics, Iowa State University, Ames, IA, 50011 (United States); Ng, Esmond; Yang, Chao [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (United States); Sosonkina, Masha, E-mail: jvary@iastate.ed [Scalable Computing Laboratory, Ames Laboratory, Iowa State University, Ames, IA, 50011 (United States)

2009-07-01

The structure and reactions of light nuclei represent fundamental and formidable challenges for microscopic theory based on realistic strong interaction potentials. Several ab initio methods have now emerged that provide nearly exact solutions for some nuclear properties. The ab initio no core shell model (NCSM) and the no core full configuration (NCFC) method frame this quantum many-particle problem as a large sparse matrix eigenvalue problem where one evaluates the Hamiltonian matrix in a basis space consisting of many-fermion Slater determinants and then solves for a set of the lowest eigenvalues and their associated eigenvectors. The resulting eigenvectors are employed to evaluate a set of experimental quantities to test the underlying potential. For fundamental problems of interest, the matrix dimension often exceeds 10^10 and the number of nonzero matrix elements may saturate available storage on present-day leadership class facilities. We survey recent results and advances in solving this large sparse matrix eigenvalue problem. We also outline the challenges that lie ahead for achieving further breakthroughs in fundamental nuclear theory using these ab initio approaches.
Ab initio nuclear structure - the large sparse matrix eigenvalue problem

International Nuclear Information System (INIS)

Vary, James P; Maris, Pieter; Ng, Esmond; Yang, Chao; Sosonkina, Masha

2009-01-01

The structure and reactions of light nuclei represent fundamental and formidable challenges for microscopic theory based on realistic strong interaction potentials. Several ab initio methods have now emerged that provide nearly exact solutions for some nuclear properties. The ab initio no core shell model (NCSM) and the no core full configuration (NCFC) method frame this quantum many-particle problem as a large sparse matrix eigenvalue problem where one evaluates the Hamiltonian matrix in a basis space consisting of many-fermion Slater determinants and then solves for a set of the lowest eigenvalues and their associated eigenvectors. The resulting eigenvectors are employed to evaluate a set of experimental quantities to test the underlying potential. For fundamental problems of interest, the matrix dimension often exceeds 10^10 and the number of nonzero matrix elements may saturate available storage on present-day leadership class facilities. We survey recent results and advances in solving this large sparse matrix eigenvalue problem. We also outline the challenges that lie ahead for achieving further breakthroughs in fundamental nuclear theory using these ab initio approaches.
Efficient Computation of Sparse Matrix Functions for Large-Scale Electronic Structure Calculations: The CheSS Library.

Science.gov (United States)

Mohr, Stephan; Dawson, William; Wagner, Michael; Caliste, Damien; Nakajima, Takahito; Genovese, Luigi

2017-10-10

We present CheSS, the "Chebyshev Sparse Solvers" library, which has been designed to solve typical problems arising in large-scale electronic structure calculations using localized basis sets. The library is based on a flexible and efficient expansion in terms of Chebyshev polynomials and presently features the calculation of the density matrix, the calculation of matrix powers for arbitrary powers, and the extraction of eigenvalues in a selected interval. CheSS is able to exploit the sparsity of the matrices and scales linearly with respect to the number of nonzero entries, making it well-suited for large-scale calculations. The approach is particularly adapted for setups leading to small spectral widths of the involved matrices and outperforms alternative methods in this regime. By coupling CheSS to the DFT code BigDFT, we show that such a favorable setup is indeed possible in practice. In addition, the approach based on Chebyshev polynomials can be massively parallelized, and CheSS exhibits excellent scaling up to thousands of cores even for relatively small matrix sizes.
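The idea of expanding a matrix function in Chebyshev polynomials can be sketched in a few lines of Python. This is not the CheSS interface: the function names are hypothetical, small dense lists stand in for the sparse distributed matrices a real library would use, and the matrix is assumed to be symmetric with its spectrum already scaled into [-1, 1].

```python
import math


def chebyshev_coefficients(f, degree, num_nodes=64):
    """Chebyshev coefficients of f on [-1, 1] by Gauss-Chebyshev quadrature.

    num_nodes should exceed degree.
    """
    coeffs = []
    for k in range(degree + 1):
        s = 0.0
        for j in range(num_nodes):
            theta = math.pi * (j + 0.5) / num_nodes
            s += f(math.cos(theta)) * math.cos(k * theta)
        coeffs.append(2.0 * s / num_nodes)
    coeffs[0] *= 0.5  # the constant term carries half weight
    return coeffs


def matmul(A, B):
    """Dense product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def chebyshev_matrix_function(f, A, degree=12):
    """Approximate f(A) by sum_k c_k T_k(A); requires degree >= 1."""
    n = len(A)
    c = chebyshev_coefficients(f, degree)
    t_prev = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    t_curr = [row[:] for row in A]
    result = [[c[0] * t_prev[i][j] + c[1] * t_curr[i][j] for j in range(n)]
              for i in range(n)]
    for k in range(2, degree + 1):
        # Three-term recurrence: T_{k}(A) = 2 A T_{k-1}(A) - T_{k-2}(A).
        at = matmul(A, t_curr)
        t_next = [[2.0 * at[i][j] - t_prev[i][j] for j in range(n)]
                  for i in range(n)]
        for i in range(n):
            for j in range(n):
                result[i][j] += c[k] * t_next[i][j]
        t_prev, t_curr = t_curr, t_next
    return result


# Usage: exp(A) for a 2-by-2 symmetric matrix with eigenvalues +0.5 and -0.5.
A = [[0.0, 0.5], [0.5, 0.0]]
expA = chebyshev_matrix_function(math.exp, A)
```

The only matrix operation in the loop is a matrix-matrix product. When the matrices are sparse and remain sparse, that product is what makes linear scaling and parallelization possible; the narrower the spectrum, the fewer polynomial terms are needed for a given accuracy.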
Photoluminescence spectra of n-doped double quantum wells in a parallel magnetic field

International Nuclear Information System (INIS)

Huang, D.; Lyo, S.K.

1999-01-01

We show that the photoluminescence (PL) line shapes from tunnel-split ground sublevels of n-doped thin double quantum wells (DQW's) are sensitively modulated by an in-plane magnetic field B∥ at low temperatures (T). The modulation is caused by the B∥-induced distortion of the electronic structure. The latter arises from the relative shift of the energy-dispersion parabolas of the two quantum wells (QW's) in k space, both in the conduction and valence bands, and formation of an anticrossing gap in the conduction band. Using a self-consistent density-functional theory, the PL spectra and the band-gap narrowing are calculated as a function of B∥, T, and the homogeneous linewidths. The PL spectra from symmetric and asymmetric DQW's are found to show strikingly different behavior. In symmetric DQW's with a high density of electrons, two PL peaks are obtained at B∥ = 0, representing the interband transitions between the pair of the upper (i.e., antisymmetric) levels and that of the lower (i.e., symmetric) levels of the ground doublets. As B∥ increases, the upper PL peak develops an N-type kink, namely a maximum followed by a minimum, and merges with the lower peak, which rises monotonically as a function of B∥ due to the diamagnetic energy. When the electron density is low, however, only a single PL peak, arising from the transitions between the lower levels, is obtained. In asymmetric DQW's, the PL spectra show mainly one dominant peak at all B∥'s. In this case, the holes are localized in one of the QW's at low T and recombine only with the electrons in the same QW. At high electron densities, the upper PL peak shows an N-type kink like in symmetric DQW's. However, the lower peak is absent at low B∥'s because it arises from the inter-QW transitions. Reasonable agreement is obtained with recent

Biclustering via Sparse Singular Value Decomposition

KAUST Repository

Lee, Mihee

2010-02-16

Sparse singular value decomposition (SSVD) is proposed as a new exploratory analysis tool for biclustering or identifying interpretable row-column associations within high-dimensional data matrices. SSVD seeks a low-rank, checkerboard structured matrix approximation to data matrices. The desired checkerboard structure is achieved by forcing both the left- and right-singular vectors to be sparse, that is, having many zero entries. By interpreting singular vectors as regression coefficient vectors for certain linear regressions, sparsity-inducing regularization penalties are imposed to the least squares regression to produce sparse singular vectors. An efficient iterative algorithm is proposed for computing the sparse singular vectors, along with some discussion of penalty parameter selection. A lung cancer microarray dataset and a food nutrition dataset are used to illustrate SSVD as a biclustering method. SSVD is also compared with some existing biclustering methods using simulated datasets. © 2010, The International Biometric Society.
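A simplified rank-one version of this idea can be sketched in Python by alternating between the two singular vectors and soft-thresholding each update. This is an illustration only: the fixed thresholds `lam_u` and `lam_v`, the starting vector, and the function names are assumptions, whereas the record describes regression-based penalties together with a penalty parameter selection procedure.

```python
import math


def soft_threshold(z, lam):
    """Shrink each entry toward zero by lam; entries within lam of zero become 0."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in z]


def normalize(z):
    """Scale z to unit Euclidean length; an all-zero vector is returned as is."""
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z] if norm > 0 else list(z)


def sparse_rank_one(X, lam_u, lam_v, num_iter=50):
    """Return (u, s, v) with sparse unit vectors u, v such that X is roughly s * u v^T."""
    n_rows, n_cols = len(X), len(X[0])
    # Start from the column of X with the largest Euclidean norm.
    cols = [[X[i][j] for i in range(n_rows)] for j in range(n_cols)]
    u = normalize(max(cols, key=lambda c: sum(v * v for v in c)))
    v = [0.0] * n_cols
    for _ in range(num_iter):
        # Regress the columns of X on u, then shrink to obtain a sparse v.
        xtu = [sum(X[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        v = normalize(soft_threshold(xtu, lam_v))
        # Regress the rows of X on v, then shrink to obtain a sparse u.
        xv = [sum(X[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        u = normalize(soft_threshold(xv, lam_u))
    s = sum(u[i] * X[i][j] * v[j] for i in range(n_rows) for j in range(n_cols))
    return u, s, v


# Usage: a strong 2-by-2 block on a weak background.
X = [[5.0, 5.0, 0.1, 0.1],
     [5.0, 5.0, 0.1, 0.1],
     [0.1, 0.1, 0.1, 0.1],
     [0.1, 0.1, 0.1, 0.1]]
u, s, v = sparse_rank_one(X, lam_u=0.5, lam_v=0.5)
```

The rows with nonzero entries in `u` and the columns with nonzero entries in `v` together mark one bicluster; in the usage example these are the first two rows and the first two columns.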
Multi-information fusion sparse coding with preserving local structure for hyperspectral image classification

Science.gov (United States)

Wei, Xiaohui; Zhu, Wen; Liao, Bo; Gu, Changlong; Li, Weibiao

2017-10-01

The key question of sparse coding (SC) is how to exploit the information that already exists to acquire robust sparse representations (SRs) that distinguish different objects for hyperspectral image (HSI) classification. We propose a multi-information fusion SC framework, which fuses the spectral, spatial, and label information at the same level, to solve the above question. In particular, pixels from disjoint spatial clusters, which are obtained by cutting the given HSI in space, are individually and sparsely encoded. Then, due to the importance of spatial structure, graph- and hypergraph-based regularizers are enforced to promote smoothness of the obtained representations and to preserve the local consistency for each spatial cluster. The latter simultaneously considers the spectral, spatial, and label information of multiple pixels that have a high probability of sharing the same label. Finally, a linear support vector machine is selected as the final classifier with the learned SRs as input. Experiments conducted on three frequently used real HSIs show that our methods can achieve satisfactory results compared with other state-of-the-art methods.
Parallel Narrative Structure in Paul Harding's "Tinkers"

Science.gov (United States)

Çirakli, Mustafa Zeki

2014-01-01

The present paper explores the implications of parallel narrative structure in Paul Harding's "Tinkers" (2009). Besides primarily recounting the two sets of parallel narratives, "Tinkers" also comprises seemingly unrelated fragments such as excerpts from clock repair manuals and diaries. The main stories, however, told…
Manifold regularization for sparse unmixing of hyperspectral images.

Science.gov (United States)

Liu, Junmin; Zhang, Chunxia; Zhang, Jiangshe; Li, Huirong; Gao, Yuelin

2016-01-01

Recently, sparse unmixing has been successfully applied to spectral mixture analysis of remotely sensed hyperspectral images. Based on the assumption that the observed image signatures can be expressed in the form of linear combinations of a number of pure spectral signatures known in advance, unmixing of each mixed pixel in the scene is to find an optimal subset of signatures in a very large spectral library, which is cast into the framework of sparse regression. However, traditional sparse regression models, such as collaborative sparse regression, ignore the intrinsic geometric structure in the hyperspectral data. In this paper, we propose a novel model, called manifold regularized collaborative sparse regression, by introducing a manifold regularization to the collaborative sparse regression model. The manifold regularization utilizes a graph Laplacian to incorporate the locally geometrical structure of the hyperspectral data. An algorithm based on alternating direction method of multipliers has been developed for the manifold regularized collaborative sparse regression model. Experimental results on both the simulated and real hyperspectral data sets have demonstrated the effectiveness of our proposed model.
Parallel Synthesis of a Library of Symmetrically- and Dissymmetrically-disubstituted Imidazole-4,5-dicarboxamides Bearing Amino Acid Esters

Directory of Open Access Journals (Sweden)

Rosanna Solinas

2009-01-01

The imidazole-4,5-dicarboxylic acid scaffold is readily derivatized with amino acid esters to afford symmetrically- and dissymmetrically-disubstituted imidazole-4,5-dicarboxamides with intramolecularly hydrogen bonded conformations that predispose the presentation of amino acid pharmacophores. In this work, a total of 45 imidazole-4,5-dicarboxamides bearing amino acid esters were prepared by parallel synthesis. The library members were purified by column chromatography on silica gel and the purified compounds characterized by LC-MS with LC detection at 214 nm. A selection of the final compounds was also analyzed by 1H-NMR spectroscopy. The analytically pure final products have been submitted to the Molecular Library Small Molecule Repository (MLSMR) for screening in the Molecular Library Screening Center Network (MLSCN) as part of the NIH Roadmap.
A novel sandwich differential capacitive accelerometer with symmetrical double-sided serpentine beam-mass structure

International Nuclear Information System (INIS)

Xiao, D B; Li, Q S; Hou, Z Q; Wang, X H; Chen, Z H; Xia, D W; Wu, X Z

2016-01-01

This paper presents a novel differential capacitive silicon micro-accelerometer with symmetrical double-sided serpentine beam-mass sensing structure and glass–silicon–glass sandwich structure. The symmetrical double-sided serpentine beam-mass sensing structure is fabricated with a novel pre-buried mask fabrication technology, which is convenient for manufacturing multi-layer sensors. The glass–silicon–glass sandwich structure is realized by a double anodic bonding process. To solve the problem of the difficulty of leading out signals from the top and bottom layer simultaneously in the sandwich sensors, a silicon pillar structure is designed that is inherently simple and low-cost. The prototype is fabricated and tested. It has low noise performance (the peak to peak value is 40 μg) and μg-level Allan deviation of bias (2.2 μg in 1 h), experimentally demonstrating the effectiveness of the design and the novel fabrication technology. (paper)
A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

Science.gov (United States)

Zhang, L; Liu, X J

2016-06-03

With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each RNA-seq sample separately, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a nonparametric model to capture the general tendency of the non-uniform read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.
Totally parallel multilevel algorithms

Science.gov (United States)

Frederickson, Paul O.

1988-01-01

Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.
General upper bounds on the runtime of parallel evolutionary algorithms.

Science.gov (United States)

Lässig, Jörg; Sudholt, Dirk

2014-01-01

We present a general method for analyzing the runtime of parallel evolutionary algorithms with spatially structured populations. Based on the fitness-level method, it yields upper bounds on the expected parallel runtime. This allows for a rigorous estimate of the speedup gained by parallelization. Tailored results are given for common migration topologies: ring graphs, torus graphs, hypercubes, and the complete graph. Example applications for pseudo-Boolean optimization show that our method is easy to apply and that it gives powerful results. In our examples the performance guarantees improve with the density of the topology. Surprisingly, even sparse topologies such as ring graphs lead to a significant speedup for many functions while not increasing the total number of function evaluations by more than a constant factor. We also identify which number of processors lead to the best guaranteed speedups, thus giving hints on how to parameterize parallel evolutionary algorithms.
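To make the setting of this abstract concrete, the following is a minimal sketch of an island-model evolutionary algorithm with a ring migration topology on the pseudo-Boolean function OneMax. It is an illustrative toy, not the analysis or code from the paper: the island count, migration interval, seed, and all function names are assumptions, and the islands are stepped sequentially here although each step is independent and could run on its own processor.

```python
import random

def onemax(bits):
    """Fitness: number of ones in the bit string."""
    return sum(bits)

def mutate(bits, rng):
    """Standard bit mutation: flip each bit independently with probability 1/n."""
    n = len(bits)
    return [b ^ 1 if rng.random() < 1.0 / n else b for b in bits]

def ring_island_ea(n_bits=20, islands=4, migration_interval=5,
                   max_generations=5000, seed=0):
    """Run one (1+1) EA per island and migrate along a directed ring."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(islands)]
    for generation in range(1, max_generations + 1):
        # One elitist step per island; these steps do not depend on each other.
        for i in range(islands):
            child = mutate(pop[i], rng)
            if onemax(child) >= onemax(pop[i]):
                pop[i] = child
        # Migration: island i sends a copy to island (i + 1) mod islands.
        if generation % migration_interval == 0:
            emigrants = [list(ind) for ind in pop]
            for i in range(islands):
                target = (i + 1) % islands
                if onemax(emigrants[i]) > onemax(pop[target]):
                    pop[target] = emigrants[i]
        best = max(onemax(ind) for ind in pop)
        if best == n_bits:
            return generation, best
    return max_generations, max(onemax(ind) for ind in pop)

generations_used, best_fitness = ring_island_ea()
```

The returned generation count plays the role of the parallel runtime, while the total number of function evaluations is that count multiplied by the number of islands.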
SparseM: A Sparse Matrix Package for R

Directory of Open Access Journals (Sweden)

Roger Koenker

2003-02-01

SparseM provides some basic R functionality for linear algebra with sparse matrices. Use of the package is illustrated by a family of linear model fitting functions that implement least squares methods for problems with sparse design matrices. Significant performance improvements in memory utilization and computational speed are possible for applications involving large sparse matrices.
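To make the memory and speed argument concrete, here is a small sketch of compressed sparse row (CSR) storage and a matrix-vector product, written in Python rather than R. It is not SparseM code; the function names and the example matrix are illustrative assumptions.

```python
def dense_to_csr(rows):
    """Convert a dense list-of-lists matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends inside values/col_idx.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x, touching only the stored nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# Hypothetical 3 x 4 matrix with 4 nonzeros out of 12 entries.
A = [[4.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.0],
     [1.0, 0.0, 0.0, 3.0]]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0, 4.0])
```

Both storage and the work per product scale with the number of nonzeros rather than with rows times columns, which is where the savings for large sparse design matrices come from.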
Exploiting Symmetry on Parallel Architectures.

Science.gov (United States)

Stiller, Lewis Benjamin

1995-01-01

This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
Structured Sparse Principal Components Analysis With the TV-Elastic Net Penalty.

Science.gov (United States)

de Pierrefeu, Amicie; Lofstedt, Tommy; Hadj-Selem, Fouad; Dubois, Mathieu; Jardri, Renaud; Fovet, Thomas; Ciuciu, Philippe; Frouin, Vincent; Duchesnay, Edouard

2018-02-01

Principal component analysis (PCA) is an exploratory tool widely used in data analysis to uncover the dominant patterns of variability within a population. Despite its ability to represent a data set in a low-dimensional space, PCA's interpretability remains limited. Indeed, the components produced by PCA are often noisy or exhibit no visually meaningful patterns. Furthermore, the fact that the components are usually non-sparse may also impede interpretation, unless arbitrary thresholding is applied. However, in neuroimaging, it is essential to uncover clinically interpretable phenotypic markers that would account for the main variability in the brain images of a population. Recently, some alternatives to the standard PCA approach, such as sparse PCA (SPCA), have been proposed, their aim being to limit the density of the components. Nonetheless, sparsity alone does not entirely solve the interpretability problem in neuroimaging, since it may yield scattered and unstable components. We hypothesized that the incorporation of prior information regarding the structure of the data may lead to improved relevance and interpretability of brain patterns. We therefore present a simple extension of the popular PCA framework that adds structured sparsity penalties on the loading vectors in order to identify the few stable regions in the brain images that capture most of the variability. Such structured sparsity can be obtained by combining, e.g., ℓ1 and total variation (TV) penalties, where the TV regularization encodes information on the underlying structure of the data. This paper presents the structured SPCA (denoted SPCA-TV) optimization framework and its resolution. We demonstrate SPCA-TV's effectiveness and versatility on three different data sets. It can be applied to any kind of structured data, such as, e.g., N-dimensional array images or meshes of cortical surfaces. The gains of SPCA-TV over unstructured approaches (such as SPCA and ElasticNet PCA) or structured approach
Algorithm for advanced canonical coding of planar chemical structures that considers stereochemical and symmetric information.

Science.gov (United States)

Koichi, Shungo; Iwata, Satoru; Uno, Takeaki; Koshino, Hiroyuki; Satoh, Hiroko

2007-01-01

We describe a rigorous and fast algorithm for advanced canonical coding of planar chemical structures based on the algorithm of Faulon et al. (J. Chem. Inf. Comput. Sci. 2004, 44, 427-436). Our algorithm works well even for highly symmetric structures; moreover, an advantage of our algorithm includes providing a rigorous canonical numbering of atoms with a consideration of stereochemistry and recognizing symmetric moieties. The planar structural line notation with the canonical numbering is also fit for use with stereochemical line notation. These capabilities are usable for general purposes in chemical structural coding and are particularly essential for detecting equivalent atoms in NMR studies. This algorithm was implemented on a 13C NMR chemical shift prediction system CAST/CNMR. Applications of the algorithm to several organic compounds demonstrate the practical efficiency of the rigorous coding.
Conjugate gradient type methods for linear systems with complex symmetric coefficient matrices

Science.gov (United States)

Freund, Roland

1989-01-01

We consider conjugate gradient type methods for the solution of large sparse linear systems Ax = b with complex symmetric coefficient matrices A = A^T. Such linear systems arise in important applications, such as the numerical solution of the complex Helmholtz equation. Furthermore, most complex non-Hermitian linear systems which occur in practice are actually complex symmetric. We investigate conjugate gradient type iterations which are based on a variant of the nonsymmetric Lanczos algorithm for complex symmetric matrices. We propose a new approach with iterates defined by a quasi-minimal residual property. The resulting algorithm presents several advantages over the standard biconjugate gradient method. We also include some remarks on the obvious approach to general complex linear systems by solving equivalent real linear systems for the real and imaginary parts of x. Finally, numerical experiments for linear systems arising from the complex Helmholtz equation are reported.
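The "obvious approach" mentioned at the end of this abstract, rewriting a complex system as a real system of twice the size, can be sketched as follows. This is only an illustration of that equivalent real formulation, not the quasi-minimal residual method the abstract proposes; the test matrix, the right-hand side, and the function name are assumptions, and a dense direct solve stands in for an iterative solver.

```python
import numpy as np

def solve_via_real_form(A, b):
    """Solve the complex system A x = b through the 2n x 2n real system
    [[Re A, -Im A], [Im A, Re A]] [Re x; Im x] = [Re b; Im b]."""
    n = A.shape[0]
    K = np.block([[A.real, -A.imag],
                  [A.imag,  A.real]])
    rhs = np.concatenate([b.real, b.imag])
    z = np.linalg.solve(K, rhs)   # dense direct solve, for illustration only
    return z[:n] + 1j * z[n:]

# Hypothetical complex symmetric matrix: A equals A.T but is not Hermitian.
A = np.array([[4.0 + 1.0j, 1.0 - 0.5j, 0.0],
              [1.0 - 0.5j, 3.0 + 2.0j, 0.5j],
              [0.0,        0.5j,       5.0 - 1.0j]])
b = np.array([1.0 + 0.0j, 2.0 - 1.0j, 0.5j])

x = solve_via_real_form(A, b)
residual_norm = float(np.linalg.norm(A @ x - b))
```

The real system doubles the dimension, and in this particular block form the real matrix is nonsymmetric even though A is complex symmetric, which hints at why methods that work directly with the complex symmetric structure are of interest.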
Parallelization for X-ray crystal structural analysis program

Energy Technology Data Exchange (ETDEWEB)

Watanabe, Hiroshi [Japan Atomic Energy Research Inst., Tokyo (Japan); Minami, Masayuki; Yamamoto, Akiji

1997-10-01

In this report we study vectorization and parallelization of an X-ray crystal structural analysis program. The target machine is the NEC SX-4, a distributed/shared memory type vector parallel supercomputer. X-ray crystal structural analysis is surveyed, and a new multi-dimensional discrete Fourier transform method is proposed. The new method is designed to have a very long vector length, so that it achieves 12.0 times higher performance than the original code. Besides this vectorization, parallelization by micro-task functions on the SX-4 reaches 13.7 times acceleration in the multi-dimensional discrete Fourier transform part with 14 CPUs, and 3.0 times acceleration in the whole program. In total, 35.9 times acceleration over the original 1-CPU scalar version is achieved with vectorization and parallelization on the SX-4. (author)
Improved Sparse Channel Estimation for Cooperative Communication Systems

Directory of Open Access Journals (Sweden)

Guan Gui

2012-01-01

Accurate channel state information (CSI) is necessary at the receiver for coherent detection in amplify-and-forward (AF) cooperative communication systems. To estimate the channel, traditional methods, that is, least squares (LS) and least absolute shrinkage and selection operator (LASSO), are based on assumptions of either a dense channel or a global sparse channel. However, the LS-based linear method neglects the inherent sparse structure information while the LASSO-based sparse channel method cannot take full advantage of the prior information. Based on the partial sparse assumption of the cooperative channel model, we propose an improved channel estimation method with partial sparse constraint. At first, by using sparse decomposition theory, channel estimation is formulated as a compressive sensing problem. Secondly, the cooperative channel is reconstructed by LASSO with partial sparse constraint. Finally, numerical simulations are carried out to confirm the superiority of the proposed methods over global sparse channel estimation methods.
Synthesis and Structure of D3h-Symmetric Triptycene Trimaleimide

Directory of Open Access Journals (Sweden)

Anthony Linden

2010-01-01

A new D3h symmetric triptycene derivative has been synthesized with the aim of obtaining molecules that are able to assemble into porous structures, and can be used in the development of new ligands. The synthesis involves a Diels-Alder reaction as the key step, followed by an oxidation and the formation of a maleimide ring. Triptycene trimaleimide furnished single crystals which have been analyzed by means of X-ray diffraction.
A regularized matrix factorization approach to induce structured sparse-low-rank solutions in the EEG inverse problem

DEFF Research Database (Denmark)

Montoya-Martinez, Jair; Artes-Rodriguez, Antonio; Pontil, Massimiliano

2014-01-01

We consider the estimation of the Brain Electrical Sources (BES) matrix from noisy electroencephalographic (EEG) measurements, commonly named as the EEG inverse problem. We propose a new method to induce neurophysiological meaningful solutions, which takes into account the smoothness, structured...... sparsity, and low rank of the BES matrix. The method is based on the factorization of the BES matrix as a product of a sparse coding matrix and a dense latent source matrix. The structured sparse-low-rank structure is enforced by minimizing a regularized functional that includes the ℓ21-norm of the coding...... matrix and the squared Frobenius norm of the latent source matrix. We develop an alternating optimization algorithm to solve the resulting nonsmooth-nonconvex minimization problem. We analyze the convergence of the optimization procedure, and we compare, under different synthetic scenarios...
Thermally optimum spacing of vertical, natural convection cooled, parallel plates

Science.gov (United States)

Bar-Cohen, A.; Rohsenow, W. M.

Vertical two-dimensional channels formed by parallel plates or fins are a frequently encountered configuration in natural convection cooling in air of electronic equipment. In connection with the complexity of heat dissipation in vertical parallel plate arrays, little theoretical effort is devoted to thermal optimization of the relevant packaging configurations. The present investigation is concerned with the establishment of an analytical structure for analyses of such arrays, giving attention to useful relations for heat distribution patterns. The limiting relations for fully-developed laminar flow, in a symmetric isothermal or isoflux channel as well as in a channel with an insulated wall, are derived by use of a straightforward integral formulation.
MMS Observations of Large Guide Field Symmetric Reconnection Between Colliding Reconnection Jets at the Center of a Magnetic Flux Rope at the Magnetopause

Science.gov (United States)

Oieroset, M.; Phan, T. D.; Haggerty, C.; Shay, M. A.; Eastwood, J. P.; Gershman, D. J.; Drake, J. F.; Fujimoto, M.; Ergun, R. E.; Mozer, F. S.;

2016-01-01

We report evidence for reconnection between colliding reconnection jets in a compressed current sheet at the center of a magnetic flux rope at Earth's magnetopause. The reconnection involved nearly symmetric inflow boundary conditions with a strong guide field of two. The thin (2.5 ion-skin-depth (d_i) width) current sheet (at approximately 12 d_i downstream of the X line) was well resolved by MMS, which revealed large asymmetries in plasma and field structures in the exhaust. Ion perpendicular heating, electron parallel heating, and density compression occurred on one side of the exhaust, while ion parallel heating and density depression were shifted to the other side. The normal electric field and double out-of-plane (bifurcated) currents spanned almost the entire exhaust. These observations are in good agreement with a kinetic simulation for similar boundary conditions, demonstrating in new detail that the structure of large guide field symmetric reconnection is distinctly different from antiparallel reconnection.

JTpack90: A parallel, object-based, Fortran 90 linear algebra package

Energy Technology Data Exchange (ETDEWEB)

Turner, J.A.; Kothe, D.B. [Los Alamos National Lab., NM (United States); Ferrell, R.C. [Cambridge Power Computing Associates, Ltd., Brookline, MA (United States)

1997-03-01

The authors have developed an object-based linear algebra package, currently with emphasis on sparse Krylov methods, driven primarily by needs of the Los Alamos National Laboratory parallel unstructured-mesh casting simulation tool Telluride. Support for a number of sparse storage formats, methods, and preconditioners has been implemented, driven primarily by application needs. They describe the object-based Fortran 90 approach, which enhances maintainability, performance, and extensibility, the parallelization approach using a new portable gather/scatter library (PGSLib), current capabilities and future plans, and present preliminary performance results on a variety of platforms.
A Sparse Self-Consistent Field Algorithm and Its Parallel Implementation: Application to Density-Functional-Based Tight Binding.

Science.gov (United States)

Scemama, Anthony; Renon, Nicolas; Rapacioli, Mathias

2014-06-10

We present an algorithm and its parallel implementation for solving a self-consistent problem as encountered in Hartree-Fock or density functional theory. The algorithm takes advantage of the sparsity of matrices through the use of local molecular orbitals. The implementation allows one to exploit efficiently modern symmetric multiprocessing (SMP) computer architectures. As a first application, the algorithm is used within the density-functional-based tight binding method, for which most of the computational time is spent in the linear algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that with this algorithm (i) single point calculations on very large systems (millions of atoms) can be performed on large SMP machines, (ii) calculations involving intermediate size systems (1000-100 000 atoms) are also strongly accelerated and can run efficiently on standard servers, and (iii) the error on the total energy due to the use of a cutoff in the molecular orbital coefficients can be controlled such that it remains smaller than the SCF convergence criterion.
More on Generalizations and Modifications of Iterative Methods for Solving Large Sparse Indefinite Linear Systems

Directory of Open Access Journals (Sweden)

Jen-Yuan Chen

2014-01-01

Continuing from the works of Li et al. (2014), Li (2007), and Kincaid et al. (2000), we present more generalizations and modifications of iterative methods for solving large sparse symmetric and nonsymmetric indefinite systems of linear equations. We discuss a variety of iterative methods such as GMRES, MGMRES, MINRES, LQ-MINRES, QR MINRES, MMINRES, MGRES, and others.
Linear-scaling density-functional simulations of charged point defects in Al2O3 using hierarchical sparse matrix algebra.

Science.gov (United States)

Hine, N D M; Haynes, P D; Mostofi, A A; Payne, M C

2010-09-21

We present calculations of formation energies of defects in an ionic solid (Al2O3) extrapolated to the dilute limit, corresponding to a simulation cell of infinite size. The large-scale calculations required for this extrapolation are enabled by developments in the approach to parallel sparse matrix algebra operations, which are central to linear-scaling density-functional theory calculations. The computational cost of manipulating sparse matrices, whose sizes are determined by the large number of basis functions present, is greatly improved with this new approach. We present details of the sparse algebra scheme implemented in the ONETEP code using hierarchical sparsity patterns, and demonstrate its use in calculations on a wide range of systems, involving thousands of atoms on hundreds to thousands of parallel processes.
Structured building model reduction toward parallel simulation

Energy Technology Data Exchange (ETDEWEB)

Dobbs, Justin R. [Cornell University; Hencey, Brondon M. [Cornell University

2013-08-26

Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.
An environment for parallel structuring of Fortran programs

International Nuclear Information System (INIS)

Sridharan, K.; McShea, M.; Denton, C.; Eventoff, B.; Browne, J.C.; Newton, P.; Ellis, M.; Grossbard, D.; Wise, T.; Clemmer, D.

1990-01-01

The paper describes and illustrates an environment for interactive support of the detection and implementation of macro-level parallelism in Fortran programs. The approach couples algorithms for dependence analysis with both innovative techniques for complexity management and capabilities for the measurement and analysis of the parallel computation structures generated through use of the environment. The resulting environment is complementary to the more common approach of seeking local parallelism by loop unrolling, either by an automatic compiler or manually. (orig.)
Harmonic maps of the bounded symmetric domains

International Nuclear Information System (INIS)

Xin, Y.L.

1994-06-01

A shrinking property of harmonic maps into R_IV(2) is proved, which is used to classify complete spacelike surfaces of parallel mean curvature in R^4_2 with a reasonable condition on the Gauss image. Liouville-type theorems of harmonic maps from the higher dimensional bounded symmetric domains are also established. (author). 25 refs
A viewpoint on nearly conformally symmetric manifold

International Nuclear Information System (INIS)

Rahman, M.S.

1990-06-01

Some observations, with definition, on Nearly Conformally Symmetric (NCS) manifolds are made. A number of theorems concerning conformal change of metric and parallel tensors on NCS manifolds are presented. It is illustrated that a manifold M = R^{n-1} x R^1_+, endowed with a special metric, is NCS but not of harmonic curvature. (author). 8 refs
Orthogonal Matching Pursuit for Enhanced Recovery of Sparse Geological Structures With the Ensemble Kalman Filter

KAUST Repository

Sana, Furrukh; Katterbauer, Klemens; Al-Naffouri, Tareq Y.; Hoteit, Ibrahim

2016-01-01

Estimating the locations and the structures of subsurface channels holds significant importance for forecasting the subsurface flow and reservoir productivity. These channels exhibit high permeability and are easily contrasted from the low-permeability rock formations in their surroundings. This enables formulating the flow channels estimation problem as a sparse field recovery problem. The ensemble Kalman filter (EnKF) is a widely used technique for the estimation and calibration of subsurface reservoir model parameters, such as permeability. However, the conventional EnKF framework does not provide an efficient mechanism to incorporate prior information on the wide varieties of subsurface geological structures, and often fails to recover and preserve flow channel structures. Recent works in the area of compressed sensing (CS) have shown that estimating in a sparse domain, using algorithms such as the orthogonal matching pursuit (OMP), may significantly improve the estimation quality when dealing with such problems. We propose two new, and computationally efficient, algorithms combining OMP with the EnKF to improve the estimation and recovery of the subsurface geological channels. Numerical experiments suggest that the proposed algorithms provide efficient mechanisms to incorporate and preserve structural information in the EnKF and result in significant improvements in recovering flow channel structures.
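For readers unfamiliar with orthogonal matching pursuit, the sketch below shows the basic greedy loop on its own: select the dictionary column most correlated with the residual, then re-fit all selected columns by least squares. It does not reproduce the OMP and EnKF combination proposed in the abstract; the function name, the tolerance, and the tiny orthonormal dictionary (chosen so that exact recovery is easy to follow) are assumptions.

```python
import numpy as np

def omp(D, y, k, tol=1e-12):
    """Greedy OMP: pick up to k columns (atoms) of D to approximate y."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break
        # Select the atom most correlated with the current residual.
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0   # never pick an atom twice
        support.append(int(np.argmax(correlations)))
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    for idx, c in zip(support, coef):
        x[idx] = c
    return x, support

# Hypothetical dictionary: a scaled 4 x 4 Hadamard matrix (orthonormal columns).
D = 0.5 * np.array([[1.0,  1.0,  1.0,  1.0],
                    [1.0, -1.0,  1.0, -1.0],
                    [1.0,  1.0, -1.0, -1.0],
                    [1.0, -1.0, -1.0,  1.0]])
x_true = np.array([0.0, 3.0, 0.0, -2.0])   # 2-sparse signal
y = D @ x_true

x_hat, support = omp(D, y, k=2)
```

With an orthonormal dictionary the correlations equal the true coefficients, so the two largest entries are found in order of magnitude; practical dictionaries are overcomplete and recovery then depends on how correlated the atoms are.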
Orthogonal Matching Pursuit for Enhanced Recovery of Sparse Geological Structures With the Ensemble Kalman Filter

KAUST Repository

Sana, Furrukh

2016-02-23

Estimating the locations and the structures of subsurface channels holds significant importance for forecasting the subsurface flow and reservoir productivity. These channels exhibit high permeability and are easily contrasted from the low-permeability rock formations in their surroundings. This enables formulating the flow channels estimation problem as a sparse field recovery problem. The ensemble Kalman filter (EnKF) is a widely used technique for the estimation and calibration of subsurface reservoir model parameters, such as permeability. However, the conventional EnKF framework does not provide an efficient mechanism to incorporate prior information on the wide varieties of subsurface geological structures, and often fails to recover and preserve flow channel structures. Recent works in the area of compressed sensing (CS) have shown that estimating in a sparse domain, using algorithms such as the orthogonal matching pursuit (OMP), may significantly improve the estimation quality when dealing with such problems. We propose two new, and computationally efficient, algorithms combining OMP with the EnKF to improve the estimation and recovery of the subsurface geological channels. Numerical experiments suggest that the proposed algorithms provide efficient mechanisms to incorporate and preserve structural information in the EnKF and result in significant improvements in recovering flow channel structures.
Solid-State-NMR-Structure-Based Inhibitor Design to Achieve Selective Inhibition of the Parallel-in-Register β-Sheet versus Antiparallel Iowa Mutant β-Amyloid Fibrils.

Science.gov (United States)

Cheng, Qinghui; Qiang, Wei

2017-06-08

Solid-state nuclear magnetic resonance (ssNMR) spectroscopy has been widely applied to characterize the high-resolution structures of β-amyloid (Aβ) fibrils. While these structures provide crucial molecular insights on the deposition of amyloid plaques in Alzheimer's disease (AD), ssNMR structures have been rarely used so far as the basis for designing inhibitors. It remains a challenge because the ssNMR-based Aβ fibril structures were usually obtained with sparsely isotope-labeled peptides with limited experimental constraints, where the structural models, especially the side-chain coordinates, showed restricted precision. However, these structural models often possess a higher accuracy within the hydrophobic core regions with more well-defined experimental data, which provide potential targets for the molecular design. This work presents an ssNMR-based molecular design to achieve selective inhibition of a particular type of Aβ fibrillar structure, which was formed with the Iowa mutant of Aβ with parallel-in-register β-sheet hydrophobic core. The results show that short peptides that mimic the C-terminal β-strands of the fibril may have a preference in binding to the parallel Aβ fibrils rather than the antiparallel fibrils, mainly due to the differences in the high-resolution structures in the fibril elongation interfaces. The Iowa mutant Aβ fibrils are utilized in this work mainly as a model to demonstrate the feasibility of the strategy because it is relatively straightforward to distinguish the parallel and antiparallel fibril structures using ssNMR. Our results suggest that it is potentially feasible to design structure-selective inhibitors and/or diagnostic agents to Aβ fibrils using ssNMR-based structural models.
Communications oriented programming of parallel iterative solutions of sparse linear systems

Science.gov (United States)

Patrick, M. L.; Pratt, T. W.

1986-01-01

Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.
Research on Characteristics of New Energy Dissipation With Symmetrical Structure

Science.gov (United States)

Ming, Wen; Huang, Chun-mei; Huang, Hao-wen; Wang, Xin-fang

2018-03-01

Utilizing the good energy dissipation capacity of arc steel bars, a new energy dissipation device (damper) with a symmetrical structure is proposed in this article. On the basis of experimental data collected from damper specimens under low-cycle reversed loading, finite element models were built using ANSYS software, and the influences of parameter changes (conduction rod diameter, actuation plate thickness, diameter of the arc steel rod, initial bending of the curved bars) on energy dissipation performance were analyzed. Some useful conclusions, which can lay foundations for practical application, were drawn.
A flexible framework for sparse simultaneous component based data integration

Directory of Open Access Journals (Sweden)

Van Deun Katrijn

2011-11-01

Abstract. Background: High throughput data are complex and methods that reveal structure underlying the data are most useful. Principal component analysis, frequently implemented as a singular value decomposition, is a popular technique in this respect. Nowadays often the challenge is to reveal structure in several sources of information (e.g., transcriptomics, proteomics) that are available for the same biological entities under study. Simultaneous component methods are most promising in this respect. However, the interpretation of the principal and simultaneous components is often daunting because contributions of each of the biomolecules (transcripts, proteins) have to be taken into account. Results: We propose a sparse simultaneous component method that makes many of the parameters redundant by shrinking them to zero. It includes principal component analysis, sparse principal component analysis, and ordinary simultaneous component analysis as special cases. Several penalties can be tuned that account in different ways for the block structure present in the integrated data. This yields known sparse approaches as the lasso, the ridge penalty, the elastic net, the group lasso, sparse group lasso, and elitist lasso. In addition, the algorithmic results can be easily transposed to the context of regression. Metabolomics data obtained with two measurement platforms for the same set of Escherichia coli samples are used to illustrate the proposed methodology and the properties of different penalties with respect to sparseness across and within data blocks. Conclusion: Sparse simultaneous component analysis is a useful method for data integration: First, simultaneous analyses of multiple blocks offer advantages over sequential and separate analyses and second, interpretation of the results is highly facilitated by their sparseness. The approach offered is flexible and allows to take the block structure in different ways into account. As such
A flexible framework for sparse simultaneous component based data integration.

Science.gov (United States)

Van Deun, Katrijn; Wilderjans, Tom F; van den Berg, Robert A; Antoniadis, Anestis; Van Mechelen, Iven

2011-11-15

High throughput data are complex and methods that reveal structure underlying the data are most useful. Principal component analysis, frequently implemented as a singular value decomposition, is a popular technique in this respect. Nowadays often the challenge is to reveal structure in several sources of information (e.g., transcriptomics, proteomics) that are available for the same biological entities under study. Simultaneous component methods are most promising in this respect. However, the interpretation of the principal and simultaneous components is often daunting because contributions of each of the biomolecules (transcripts, proteins) have to be taken into account. We propose a sparse simultaneous component method that makes many of the parameters redundant by shrinking them to zero. It includes principal component analysis, sparse principal component analysis, and ordinary simultaneous component analysis as special cases. Several penalties can be tuned that account in different ways for the block structure present in the integrated data. This yields known sparse approaches as the lasso, the ridge penalty, the elastic net, the group lasso, sparse group lasso, and elitist lasso. In addition, the algorithmic results can be easily transposed to the context of regression. Metabolomics data obtained with two measurement platforms for the same set of Escherichia coli samples are used to illustrate the proposed methodology and the properties of different penalties with respect to sparseness across and within data blocks. Sparse simultaneous component analysis is a useful method for data integration: First, simultaneous analyses of multiple blocks offer advantages over sequential and separate analyses and second, interpretation of the results is highly facilitated by their sparseness. The approach offered is flexible and allows to take the block structure in different ways into account. As such, structures can be found that are exclusively tied to one data platform
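The lasso and group lasso penalties named in this abstract each correspond to a simple shrinkage rule. The Python sketch below shows the standard proximal (soft-thresholding) operators for the two penalties. The function names and example values are illustrative assumptions and are not taken from the authors' software.

```python
import math

def soft_threshold(x, lam):
    """Lasso shrinkage: move x toward zero by lam, clipping at zero."""
    return math.copysign(max(abs(x) - lam, 0.0), x)

def group_soft_threshold(group, lam):
    """Group lasso shrinkage: scale a whole block of values toward zero."""
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= lam:
        return [0.0] * len(group)   # the entire block is set to zero
    scale = 1.0 - lam / norm
    return [scale * v for v in group]

# Hypothetical loadings, used only to show the two behaviours.
loadings = [3.0, -0.5, 4.0]
elementwise = [soft_threshold(v, 1.0) for v in loadings]
blockwise = group_soft_threshold([3.0, 4.0], 1.0)
```

The elementwise rule zeroes individual small loadings, while the blockwise rule keeps or drops a whole data block together, which is the distinction between sparseness within and across blocks that the abstract refers to.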
Discussion about the design for mesh data structure within the parallel framework

International Nuclear Information System (INIS)

Shi Guangmei; Wu Ruian; Wang Keying; Ji Xiaoyu; Hao Zhiming; Mo Jun; He Yingbo

2010-01-01

The mesh data structure is one of the fundamental data structures of a parallel framework, and the quality of its design and implementation affects the framework's parallel capability. By comparing the architectures and fundamental data structures of several typical parallel frameworks, such as JASMIN, SIERRA, and ITAPS, the paper discusses design approaches for parallel frameworks. Borrowing from the layered set of services design of the SIERRA framework, and taking into account the near-term objectives of the PANDA framework, the paper presents a preliminary layered set of services for PANDA. On this basis, it describes in detail the definition and management of the mesh data structure, which sits in the bottom layer of the PANDA framework, with emphasis on the design and implementation of PANDA's parallel distributed mesh data structure. Extension of the PANDA framework and development of application programs on top of it build on this work.
Design of tryptophan-containing mutants of the symmetrical Pizza protein for biophysical studies.

Science.gov (United States)

Noguchi, Hiroki; Mylemans, Bram; De Zitter, Elke; Van Meervelt, Luc; Tame, Jeremy R H; Voet, Arnout

2018-03-18

β-propeller proteins are highly symmetrical, being composed of a repeated motif with four anti-parallel β-sheets arranged around a central axis. Recently we designed the first completely symmetrical β-propeller protein, Pizza6, consisting of six identical tandem repeats. Pizza6 is expected to prove a useful building block for bionanotechnology, and also a tool to investigate the folding and evolution of β-propeller proteins. Folding studies are made difficult by the high stability and the lack of buried Trp residues to act as monitor fluorophores, so we have designed and characterized several Trp-containing Pizza6 derivatives. In total four proteins were designed, of which three could be purified and characterized. Crystal structures confirm these mutant proteins maintain the expected structure, and a clear redshift of Trp fluorescence emission could be observed upon denaturation. Among the derivative proteins, Pizza6-AYW appears to be the most suitable model protein for future folding/unfolding kinetics studies as it has a comparable stability as natural β-propeller proteins. Copyright © 2018 Elsevier Inc. All rights reserved.
Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-12-31

This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.
Dictionary Learning Based on Nonnegative Matrix Factorization Using Parallel Coordinate Descent

Directory of Open Access Journals (Sweden)

Zunyi Tang

2013-01-01

Sparse representation of signals via an overcomplete dictionary has recently received much attention as it has produced promising results in various applications. Since the nonnegativities of the signals and the dictionary are required in some applications, for example, multispectral data analysis, the conventional dictionary learning methods imposed simply with nonnegativity may become inapplicable. In this paper, we propose a novel method for learning a nonnegative, overcomplete dictionary for such a case. This is accomplished by posing the sparse representation of nonnegative signals as a problem of nonnegative matrix factorization (NMF) with a sparsity constraint. By employing the coordinate descent strategy for optimization and extending it to the multivariable case for processing in parallel, we develop a so-called parallel coordinate descent dictionary learning (PCDDL) algorithm, which is structured by iteratively solving two optimization problems: the learning process of the dictionary and the estimating process of the coefficients for constructing the signals. Numerical experiments demonstrate that the proposed algorithm performs better than the conventional nonnegative K-SVD (NN-KSVD) algorithm and several other algorithms for comparison. What is more, its computational consumption is remarkably lower than that of the compared algorithms.
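The coefficient-estimation step described in this abstract can be illustrated with plain cyclic coordinate descent for a nonnegative, sparsity-penalized least-squares problem. The Python sketch below is a sequential illustration only: it is not the PCDDL algorithm, it omits the dictionary update and the parallel multivariable extension, and the objective, function name, and example data are assumptions.

```python
def nonneg_sparse_code(D, y, lam, sweeps=100):
    """Minimize 0.5*||y - D x||^2 + lam*sum(x) subject to x >= 0.

    D is given as a list of rows (signal dimension by number of atoms).
    """
    m, k = len(D), len(D[0])
    x = [0.0] * k
    r = list(y)                                   # residual y - D x, with x = 0
    col_sq = [sum(D[i][j] ** 2 for i in range(m)) for j in range(k)]
    for _ in range(sweeps):
        for j in range(k):
            if col_sq[j] == 0.0:
                continue                          # an all-zero atom cannot contribute
            # correlation of atom j with the residual that excludes atom j
            rho = sum(D[i][j] * r[i] for i in range(m)) + col_sq[j] * x[j]
            new = max(0.0, (rho - lam) / col_sq[j])   # closed-form 1-D minimizer
            delta = new - x[j]
            if delta != 0.0:
                for i in range(m):
                    r[i] -= D[i][j] * delta       # keep the residual up to date
                x[j] = new
    return x

# Hypothetical two-atom dictionary: the weak component is shrunk to zero.
codes = nonneg_sparse_code([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

Each coordinate has a closed-form update, which is what makes coordinate descent attractive here; a parallel variant would update several coordinates at once.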
Breaking symmetry in the structure determination of (large) symmetric protein dimers

Energy Technology Data Exchange (ETDEWEB)

Gaponenko, Vadim; Altieri, Amanda S.; Li, Jess; Byrd, R. Andrew [National Cancer Institute, Structural Biophysics Laboratory (United States)], E-mail: rabyrd@ncifcrf.gov

2002-10-15

We demonstrate a novel methodology to disrupt the symmetry in the NMR spectra of homodimers. A paramagnetic probe is introduced sub-stoichiometrically to create an asymmetric system with the paramagnetic probe residing on only one monomer within the dimer. This creates sufficient magnetic anisotropy for resolution of symmetry-related overlapped resonances and, consequently, detection of pseudocontact shifts and residual dipolar couplings specific to each monomeric component. These pseudocontact shifts can be readily incorporated into existing structure refinement calculations and enable determination of monomer orientation within the dimeric protein. This methodology can be widely used for solution structure determination of symmetric dimers.

Anti-symmetrized molecular dynamics: a new insight into the structure of nuclei

International Nuclear Information System (INIS)

Yoshiko, Kanada-En'yo; Masaaki, Kimura; Hisashi, Horiuchi

2003-01-01

The AMD (anti-symmetrized molecular dynamics) theory for nuclear structure is explained by showing its actual applications. First the formulation of AMD including various refined versions is briefly presented and its characteristics are discussed, putting a stress on its nature as an 'ab initio' theory. Then we demonstrate fruitful applications to various structure problems in stable nuclei, in order to explicitly verify the 'ab initio' nature of AMD, especially the ability to describe both mean-field-type structure and cluster structure. Finally, we show the results of applications of AMD to unstable nuclei, from which we see that AMD is powerful in elucidating and understanding various types of nuclear structure of unstable nuclei. (authors)
Discrete Sparse Coding.

Science.gov (United States)

Exarchakis, Georgios; Lücke, Jörg

2017-11-01

Sparse coding algorithms with continuous latent variables have been the subject of a large number of studies. However, discrete latent spaces for sparse coding have been largely ignored. In this work, we study sparse coding with latents described by discrete instead of continuous prior distributions. We consider the general case in which the latents (while being sparse) can take on any value of a finite set of possible values and in which we learn the prior probability of any value from data. This approach can be applied to any data generated by discrete causes, and it can be applied as an approximation of continuous causes. As the prior probabilities are learned, the approach then allows for estimating the prior shape without assuming specific functional forms. To efficiently train the parameters of our probabilistic generative model, we apply a truncated expectation-maximization approach (expectation truncation) that we modify to work with a general discrete prior. We evaluate the performance of the algorithm by applying it to a variety of tasks: (1) we use artificial data to verify that the algorithm can recover the generating parameters from a random initialization, (2) use image patches of natural images and discuss the role of the prior for the extraction of image components, (3) use extracellular recordings of neurons to present a novel method of analysis for spiking neurons that includes an intuitive discretization strategy, and (4) apply the algorithm on the task of encoding audio waveforms of human speech. The diverse set of numerical experiments presented in this letter suggests that discrete sparse coding algorithms can scale efficiently to work with realistic data sets and provide novel statistical quantities to describe the structure of the data.
Threshold partitioning of sparse matrices and applications to Markov chains

Energy Technology Data Exchange (ETDEWEB)

Choi, Hwajeong; Szyld, D.B. [Temple Univ., Philadelphia, PA (United States)]

1996-12-31

It is well known that the order of the variables and equations of a large, sparse linear system influences the performance of classical iterative methods. In particular if, after a symmetric permutation, the blocks in the diagonal have more nonzeros, classical block methods have a faster asymptotic rate of convergence. In this paper, different ordering and partitioning algorithms for sparse matrices are presented. They are modifications of PABLO. In the new algorithms, in addition to the location of the nonzeros, the values of the entries are taken into account. The matrix resulting after the symmetric permutation has dense blocks along the diagonal, and small entries in the off-diagonal blocks. Parameters can be easily adjusted to obtain, for example, denser blocks, or blocks with elements of larger magnitude. In particular, when the matrices represent Markov chains, the permuted matrices are well suited for block iterative methods that find the corresponding probability distribution. Applications to three types of methods are explored: (1) Classical block methods, such as Block Gauss Seidel. (2) Preconditioned GMRES, where a block diagonal preconditioner is used. (3) Iterative aggregation method (also called aggregation/disaggregation) where the partition obtained from the ordering algorithm with certain parameters is used as an aggregation scheme. In all three cases, experiments are presented which illustrate the performance of the methods with the new orderings. The complexity of the new algorithms is linear in the number of nonzeros and the order of the matrix, and thus adding little computational effort to the overall solution.
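The idea of using entry values as well as nonzero locations when forming diagonal blocks can be illustrated with a much simpler scheme than the one in the paper. The Python sketch below is not PABLO or the authors' modification of it: it merely groups indices that are linked by entries whose magnitude reaches a threshold, and lists the groups one after another to form a symmetric permutation. The function name, the threshold rule, and the example matrix are assumptions.

```python
def threshold_blocks(A, threshold):
    """Group indices connected by entries with |a_ij| >= threshold."""
    n = len(A)
    seen = [False] * n
    blocks = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, block = [start], []
        while stack:                      # depth-first search over strong couplings
            i = stack.pop()
            block.append(i)
            for j in range(n):
                strong = abs(A[i][j]) >= threshold or abs(A[j][i]) >= threshold
                if not seen[j] and strong:
                    seen[j] = True
                    stack.append(j)
        blocks.append(sorted(block))
    return blocks

# Hypothetical 4-state transition matrix: states 0 and 2 are strongly coupled, as are 1 and 3.
P = [[0.50, 0.01, 0.49, 0.00],
     [0.00, 0.50, 0.02, 0.48],
     [0.45, 0.03, 0.50, 0.02],
     [0.01, 0.47, 0.02, 0.50]]
blocks = threshold_blocks(P, 0.1)
permutation = [i for block in blocks for i in block]
```

Applying `permutation` to both rows and columns places the large entries inside the diagonal blocks and leaves only small entries outside them, which is the structure that block iterative methods benefit from.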
Preconditioned conjugate gradient technique for the analysis of symmetric anisotropic structures

Science.gov (United States)

Noor, Ahmed K.; Peters, Jeanne M.

1987-01-01

An efficient preconditioned conjugate gradient (PCG) technique and a computational procedure are presented for the analysis of symmetric anisotropic structures. The technique is based on selecting the preconditioning matrix as the orthotropic part of the global stiffness matrix of the structure, with all the nonorthotropic terms set equal to zero. This particular choice of the preconditioning matrix results in reducing the size of the analysis model of the anisotropic structure to that of the corresponding orthotropic structure. The similarities between the proposed PCG technique and a reduction technique previously presented by the authors are identified and exploited to generate from the PCG technique direct measures for the sensitivity of the different response quantities to the nonorthotropic (anisotropic) material coefficients of the structure. The effectiveness of the PCG technique is demonstrated by means of a numerical example of an anisotropic cylindrical panel.
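The general shape of a preconditioned conjugate gradient iteration can be sketched in a few lines of Python. In the abstract the preconditioner is the orthotropic part of the stiffness matrix. In the sketch below a diagonal matrix stands in for it so that its inverse is trivial to apply, and the small system is invented purely for illustration.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(A, b, apply_Minv, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive definite A; apply_Minv(r) returns M^-1 r."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the zero initial guess
    z = apply_Minv(r)
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = apply_Minv(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Hypothetical 3x3 stiffness-like matrix; the preconditioner keeps only its diagonal.
K = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
f = [1.0, 2.0, 3.0]
diagonal = [K[i][i] for i in range(3)]
u, iterations = pcg(K, f, lambda r: [ri / d for ri, d in zip(r, diagonal)])
```

Only products with the full matrix and solves with the simpler preconditioning matrix are needed, which is why choosing a preconditioner with a cheaper structure reduces the effective size of the problem that has to be factorized.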
Parallel Computing Strategies for Irregular Algorithms

Science.gov (United States)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Impact of repeated uniaxial mechanical strain on flexible a-IGZO thin film transistors with symmetric and asymmetric structures

Science.gov (United States)

Liao, Po-Yung; Chang, Ting-Chang; Su, Wan-Ching; Chen, Bo-Wei; Chen, Li-Hui; Hsieh, Tien-Yu; Yang, Chung-Yi; Chang, Kuan-Chang; Zhang, Sheng-Dong; Huang, Yen-Yu; Chang, Hsi-Ming; Chiang, Shin-Chuan

2017-06-01

This letter investigates repeated uniaxial mechanical stress-induced degradation behavior in flexible amorphous In-Ga-Zn-O thin-film transistors (TFTs) of different geometric structures. Two types of via-contact structure TFTs are investigated: symmetrical and UI structure (TFTs with I- and U-shaped asymmetric electrodes). After repeated mechanical stress, I-V curves for the symmetrical structure show a significant negative threshold voltage (VT) shift, due to mechanical stress-induced oxygen vacancy generation. However, degradation in the UI structure TFTs after stress is a negative VT shift along with the parasitic transistor characteristic in the forward-operation mode, with this hump not evident in the reverse-operation mode. This asymmetrical degradation is clarified by the mechanical strain simulation of the UI TFTs.
Estimation of kinship coefficient in structured and admixed populations using sparse sequencing data.

Directory of Open Access Journals (Sweden)

Jinzhuang Dou

2017-09-01

Knowledge of biological relatedness between samples is important for many genetic studies. In large-scale human genetic association studies, the estimated kinship is used to remove cryptic relatedness, control for family structure, and estimate trait heritability. However, estimation of kinship is challenging for sparse sequencing data, such as those from off-target regions in target sequencing studies, where genotypes are largely uncertain or missing. Existing methods often assume accurate genotypes at a large number of markers across the genome. We show that these methods, without accounting for the genotype uncertainty in sparse sequencing data, can yield a strong downward bias in kinship estimation. We develop a computationally efficient method called SEEKIN to estimate kinship for both homogeneous samples and heterogeneous samples with population structure and admixture. Our method models genotype uncertainty and leverages linkage disequilibrium through imputation. We test SEEKIN on a whole exome sequencing dataset (WES) of Singapore Chinese and Malays, which involves substantial population structure and admixture. We show that SEEKIN can accurately estimate kinship coefficient and classify genetic relatedness using off-target sequencing data down sampled to ~0.15X depth. In application to the full WES dataset without down sampling, SEEKIN also outperforms existing methods by properly analyzing shallow off-target data (~0.75X). Using both simulated and real phenotypes, we further illustrate how our method improves estimation of trait heritability for WES studies.
The effect of axial loads on free vibration of symmetric frame structures using continuous system method

Directory of Open Access Journals (Sweden)

Elham Ghandi

2016-09-01

The free vibration of frame structures has usually been studied in the literature without considering the effect of axial loads. In this paper, the continuous system method is employed to investigate this effect on the free flexural and torsional vibration of two- and three-dimensional symmetric frames. In the continuous system method, commonly used in the approximate analysis of buildings, the structure is replaced by an equivalent beam which matches the dominant characteristics of the structure. Accordingly, the natural frequencies of the symmetric frame structures are obtained by solving the governing differential equation of the equivalent beam, whose stiffness and mass are assumed to be uniformly distributed along the length. The corresponding axial load applied to the equivalent beam is calculated based on the total weight and the number of stories of the building. A numerical example is presented to show the simplicity and efficiency of the proposed solution.
Structure-aware Local Sparse Coding for Visual Tracking

KAUST Repository

Qi, Yuankai; Qin, Lei; Zhang, Jian; Zhang, Shengping; Huang, Qingming; Yang, Ming-Hsuan

2018-01-01

with the corresponding local regions of the target templates that are the most similar from the global view. Thus, a more precise and discriminative sparse representation is obtained to account for appearance changes. To alleviate the issues with tracking drifts, we
The symmetric extendibility of quantum states

International Nuclear Information System (INIS)

Nowakowski, Marcin L

2016-01-01

Studies on the symmetric extendibility of quantum states have become particularly important in the context of the analysis of one-way quantum measures of entanglement, and the distillability and security of quantum protocols. In this paper we analyze composite systems containing a symmetric extendible part, with particular attention devoted to the one-way security of such systems. Further, we introduce a new one-way entanglement monotone based on the best symmetric approximation of a quantum state and the extendible number of a quantum state. We underpin these results with geometric observations about the structures of multi-party settings which possess substantial symmetric extendible components in their subspaces. The impossibility of reducing the maximal symmetric extendibility by means of the one-way local operations and classical communication method is pointed out on multiple copies. Finally, we state a conjecture linking symmetric extendibility with the one-way distillability and security of all quantum states, analyzing the behavior of a private key in the neighborhood of symmetric extendible states. (paper)
A GPU-paralleled implementation of an enhanced face recognition algorithm

Science.gov (United States)

Chen, Hao; Liu, Xiyang; Shao, Shuai; Zan, Jiguo

2013-03-01

Face recognition based on compressed sensing and sparse representation has been a hot topic in recent years. This scheme increases the recognition rate as well as the robustness to noise. However, its computational cost is high and has become a main limiting factor for real-world applications. In this paper, we introduce a GPU-accelerated hybrid variant of the face recognition algorithm, named parallel face recognition algorithm (pFRA). We describe how to carry out a parallel optimization design that takes full advantage of the many-core structure of a GPU. The pFRA is tested and compared with several other implementations under different data sample sizes. Finally, our pFRA, implemented with an NVIDIA GPU and the Compute Unified Device Architecture (CUDA) programming model, achieves a significant speedup over traditional CPU implementations.
Multilevel sparse functional principal component analysis.

Science.gov (United States)

Di, Chongzhi; Crainiceanu, Ciprian M; Jank, Wolfgang S

2014-01-29

We consider analysis of sparsely sampled multilevel functional data, where the basic observational unit is a function and data have a natural hierarchy of basic units. An example is when functions are recorded at multiple visits for each subject. Multilevel functional principal component analysis (MFPCA; Di et al. 2009) was proposed for such data when functions are densely recorded. Here we consider the case when functions are sparsely sampled and may contain only a few observations per function. We exploit the multilevel structure of covariance operators and achieve data reduction by principal component decompositions at both between and within subject levels. We address inherent methodological differences in the sparse sampling context to: 1) estimate the covariance operators; 2) estimate the functional principal component scores; 3) predict the underlying curves. Through simulations the proposed method is able to discover dominating modes of variations and reconstruct underlying curves well even in sparse settings. Our approach is illustrated by two applications, the Sleep Heart Health Study and eBay auctions.
Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

Science.gov (United States)

Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

2017-07-01

Matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Parallelization is therefore needed to speed up a calculation that usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because they assume a square, symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model that was created by NVIDIA and is implemented on the GPU (graphics processing unit).
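One common way to model sparse matrix-vector multiplication as a hypergraph is the column-net model, in which each row is a vertex and each column is a net joining the rows that have a nonzero in it. The abstract does not say which model the authors use, so the Python sketch below is an assumption made for illustration, as are the function name and the small rectangular example. It evaluates a given row partition by the usual connectivity-minus-one metric.

```python
def communication_volume(nonzeros, row_part):
    """Sum over columns of (number of parts touching the column - 1).

    nonzeros: iterable of (row, col) positions of nonzero entries.
    row_part: dict mapping each row to the processor that owns it.
    """
    parts_per_col = {}
    for i, j in nonzeros:
        parts_per_col.setdefault(j, set()).add(row_part[i])
    return sum(len(parts) - 1 for parts in parts_per_col.values())

# Hypothetical 3x4 (non-square) sparsity pattern.
pattern = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)]
good = communication_volume(pattern, {0: 0, 1: 0, 2: 1})
bad = communication_volume(pattern, {0: 0, 1: 1, 2: 0})
```

Nothing in the metric requires the matrix to be square or symmetric, which is the property that makes hypergraph models applicable where the graph model is not.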
Basic design of parallel computational program for probabilistic structural analysis

International Nuclear Information System (INIS)

Kaji, Yoshiyuki; Arai, Taketoshi; Gu, Wenwei; Nakamura, Hitoshi

1999-06-01

In our laboratory, for 'development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory' (nuclear computational science cross-over research) we examine computational method related to super parallel computation system which is coupled with material strength theory based on microscopic fracture mechanics for latent cracks and continuum structural model to develop new structural reliability evaluation methods for ceramic structures. This technical report is the review results regarding probabilistic structural mechanics theory, basic terms of formula and program methods of parallel computation which are related to principal terms in basic design of computational mechanics program. (author)
Basic design of parallel computational program for probabilistic structural analysis

Energy Technology Data Exchange (ETDEWEB)

Kaji, Yoshiyuki; Arai, Taketoshi [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Gu, Wenwei; Nakamura, Hitoshi

1999-06-01

In our laboratory, for 'development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory' (nuclear computational science cross-over research) we examine computational method related to super parallel computation system which is coupled with material strength theory based on microscopic fracture mechanics for latent cracks and continuum structural model to develop new structural reliability evaluation methods for ceramic structures. This technical report is the review results regarding probabilistic structural mechanics theory, basic terms of formula and program methods of parallel computation which are related to principal terms in basic design of computational mechanics program. (author)
An NoC Traffic Compiler for Efficient FPGA Implementation of Sparse Graph-Oriented Workloads

Directory of Open Access Journals (Sweden)

Nachiket Kapre

2011-01-01

synchronization to optimize our workloads for large networks up to 2025 parallel elements for BSP model and 25 parallel elements for Token Dataflow. This allows us to demonstrate speedups between 1.2× and 22× (3.5× mean), area reductions (number of Processing Elements) between 3× and 15× (9× mean) and dynamic energy savings between 2× and 3.5× (2.7× mean) over a range of real-world graph applications in the BSP compute model. We deliver speedups of 0.5–13× (geomean 3.6×) for Sparse Direct Matrix Solve (Token Dataflow compute model) applied to a range of sparse matrices when using a high-quality placement algorithm. We expect such traffic optimization tools and techniques to become an essential part of the NoC application-mapping flow.
The electronic structure of quasi-one-dimensional disordered systems with parallel multi-chains

International Nuclear Information System (INIS)

Liu Xiaoliang; Xu Hui; Deng Chaosheng; Ma Songshan

2006-01-01

For quasi-one-dimensional disordered systems with parallel multi-chains, by taking a special method to code the sites and considering only the nearest-neighbor hopping integral, we write the systems' Hamiltonians as exactly symmetric matrices, which can be transformed into symmetric tridiagonal matrices by using the Householder transformation. The densities of states, the localization lengths and the conductance of the systems are calculated numerically using the negative eigenvalue theory and the transfer matrix method. From the results for quasi-one-dimensional disordered systems with varying numbers of chains, we find that the energy band of the systems broadens slightly, energy gaps appear, and the distribution of the density of states changes markedly as the dimensionality increases. In particular, for systems with four, five or six chains, there exist extended states at the energy band center whose localization lengths are greater than the size of the systems, and which accordingly give a large conductance. As the number of chains increases, the correlated ranges expand and the systems behave similarly to those with off-diagonal long-range correlation.
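The abstract does not say how the sites are coded, so the Python sketch below assumes one natural numbering in which the chain index runs fastest. With nearest-neighbor hopping only, this yields a real symmetric band matrix whose half-bandwidth equals the number of chains. The function name, the uniform on-site disorder, and all parameter values are illustrative assumptions, and the Householder reduction itself is not shown.

```python
import random

def multichain_hamiltonian(n_sites, n_chains, hopping=1.0, disorder=1.0, seed=0):
    """Tight-binding Hamiltonian for n_chains parallel chains of n_sites each."""
    rng = random.Random(seed)
    size = n_sites * n_chains
    H = [[0.0] * size for _ in range(size)]
    for n in range(n_sites):
        for m in range(n_chains):
            i = n * n_chains + m                     # assumed site numbering
            H[i][i] = rng.uniform(-disorder / 2, disorder / 2)   # random on-site energy
            if m + 1 < n_chains:                     # neighbor on the adjacent chain
                H[i][i + 1] = H[i + 1][i] = hopping
            if n + 1 < n_sites:                      # neighbor along the same chain
                H[i][i + n_chains] = H[i + n_chains][i] = hopping
    return H

H = multichain_hamiltonian(n_sites=4, n_chains=3)
```

A matrix built this way is already symmetric and banded, so a tridiagonal reduction only has to eliminate the entries between the first off-diagonal and the band edge.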
Aerial Observations of Symmetric Instability at the North Wall of the Gulf Stream

Science.gov (United States)

Savelyev, I.; Thomas, L. N.; Smith, G. B.; Wang, Q.; Shearman, R. K.; Haack, T.; Christman, A. J.; Blomquist, B.; Sletten, M.; Miller, W. D.; Fernando, H. J. S.

2018-01-01

An unusual spatial pattern on the ocean surface was captured by thermal airborne swaths taken across a strong sea surface temperature front at the North Wall of the Gulf Stream. The thermal pattern on the cold side of the front resembles a staircase consisting of tens of steps, each up to ~200 m wide and up to ~0.3°C warm. The steps are well organized, clearly separated by sharp temperature gradients, mostly parallel and aligned with the primary front. The interpretation of the airborne imagery is aided by oceanographic measurements from two research vessels. Analysis of the in situ observations indicates that the front was unstable to symmetric instability, a type of overturning instability that can generate coherent structures with similar dimensions to the temperature steps seen in the airborne imagery. It is concluded that the images capture, for the first time, the surface temperature field of symmetric instability turbulence.
Super-symmetric informationally complete measurements

Energy Technology Data Exchange (ETDEWEB)

Zhu, Huangjun, E-mail: hzhu@pitp.ca

2015-11-15

Symmetric informationally complete measurements (SICs in short) are highly symmetric structures in the Hilbert space. They possess many nice properties which render them an ideal candidate for fiducial measurements. The symmetry of SICs is intimately connected with the geometry of the quantum state space and also has profound implications for foundational studies. Here we explore those SICs that are most symmetric according to a natural criterion and show that all of them are covariant with respect to the Heisenberg–Weyl groups, which are characterized by the discrete analog of the canonical commutation relation. Moreover, their symmetry groups are subgroups of the Clifford groups. In particular, we prove that the SIC in dimension 2, the Hesse SIC in dimension 3, and the set of Hoggar lines in dimension 8 are the only three SICs up to unitary equivalence whose symmetry groups act transitively on pairs of SIC projectors. Our work not only provides valuable insight about SICs, Heisenberg–Weyl groups, and Clifford groups, but also offers a new approach and perspective for studying many other discrete symmetric structures behind finite state quantum mechanics, such as mutually unbiased bases and discrete Wigner functions.
Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes

Science.gov (United States)

Jones, Terry R.; Watson, Pythagoras C.; Tuel, William; Brenner, Larry; Caffrey, Patrick; Fier, Jeffrey

2010-10-05

In a parallel computing environment comprising a network of SMP nodes each having at least one processor, a parallel-aware co-scheduling method and system for improving the performance and scalability of a dedicated parallel job having synchronizing collective operations. The method and system uses a global co-scheduler and an operating system kernel dispatcher adapted to coordinate interfering system and daemon activities on a node and across nodes to promote intra-node and inter-node overlap of said interfering system and daemon activities as well as intra-node and inter-node overlap of said synchronizing collective operations. In this manner, the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, on synchronizing collective operations is minimized on large processor-count SPMD bulk-synchronous programming styles.

Parallel Framework for Cooperative Processes

Directory of Open Access Journals (Sweden)

Mitică Craus

2005-01-01

This paper describes an object-oriented framework designed for the parallelization of a set of related algorithms. The idea behind the system is to provide a reusable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles, and it must be possible to split the work between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO) parallel algorithm for the Travelling Salesman Problem (TSP) and an Image Processing (IP) parallel algorithm for the Symmetrical Neighborhood Filter (SNF). The implementations of these applications by means of the parallel framework perform well, with approximately linear speedup and low communication cost.
Sparse Bayesian Inference and the Temperature Structure of the Solar Corona

Energy Technology Data Exchange (ETDEWEB)

Warren, Harry P. [Space Science Division, Naval Research Laboratory, Washington, DC 20375 (United States); Byers, Jeff M. [Materials Science and Technology Division, Naval Research Laboratory, Washington, DC 20375 (United States); Crump, Nicholas A. [Naval Center for Space Technology, Naval Research Laboratory, Washington, DC 20375 (United States)

2017-02-20

Measuring the temperature structure of the solar atmosphere is critical to understanding how it is heated to high temperatures. Unfortunately, the temperature of the upper atmosphere cannot be observed directly, but must be inferred from spectrally resolved observations of individual emission lines that span a wide range of temperatures. Such observations are “inverted” to determine the distribution of plasma temperatures along the line of sight. This inversion is ill posed and, in the absence of regularization, tends to produce wildly oscillatory solutions. We introduce the application of sparse Bayesian inference to the problem of inferring the temperature structure of the solar corona. Within a Bayesian framework a preference for solutions that utilize a minimum number of basis functions can be encoded into the prior and many ad hoc assumptions can be avoided. We demonstrate the efficacy of the Bayesian approach by considering a test library of 40 assumed temperature distributions.
A novel parallel pipeline structure of VP9 decoder

Science.gov (United States)

Qin, Huabiao; Chen, Wu; Yi, Sijun; Tan, Yunfei; Yi, Huan

2018-04-01

To improve the efficiency of VP9 decoder, a novel parallel pipeline structure of VP9 decoder is presented in this paper. According to the decoding workflow, VP9 decoder can be divided into sub-modules which include entropy decoding, inverse quantization, inverse transform, intra prediction, inter prediction, deblocking and pixel adaptive compensation. By analyzing the computing time of each module, hotspot modules are located and the causes of low efficiency of VP9 decoder can be found. Then, a novel pipeline decoder structure is designed by using mixed parallel decoding methods of data division and function division. The experimental results show that this structure can greatly improve the decoding efficiency of VP9.
Parallel adaptation of a vectorised quantumchemical program system

International Nuclear Information System (INIS)

Van Corler, L.C.H.; Van Lenthe, J.H.

1987-01-01

Supercomputers, like the CRAY 1 or the Cyber 205, have had, and still have, a marked influence on Quantum Chemistry. Vectorization has led to a considerable increase in the performance of Quantum Chemistry programs. However, clock-cycle times more than a factor of 10 smaller than those of the present supercomputers are not to be expected. Therefore future supercomputers will have to depend on parallel structures. Recently, the first examples of such supercomputers have been installed. To be prepared for this new generation of (parallel) supercomputers, one should consider the concepts one wants to use and the kind of problems one will encounter when implementing existing vectorized programs on those parallel systems. The authors implemented four important parts of a large quantumchemical program system (ATMOL), i.e. integrals, SCF, 4-index and Direct-CI, in the parallel environment at ECSEC (Rome, Italy). This system offers simulated parallelism on the host computer (IBM 4381) and real parallelism on at most 10 attached processors (FPS-164). Quantumchemical programs usually handle large amounts of data and very large, often sparse matrices. The transfer of so much data can cause problems concerning communication and overhead, in view of which shared memory and shared disks must be considered. The strategy and the tools that were used to parallelize the programs are shown. Also, some examples are presented to illustrate the effectiveness and performance of the system in Rome for this type of calculation.
Symmetric Kullback-Leibler Metric Based Tracking Behaviors for Bioinspired Robotic Eyes.

Science.gov (United States)

Liu, Hengli; Luo, Jun; Wu, Peng; Xie, Shaorong; Li, Hengyu

2015-01-01

A symmetric Kullback-Leibler metric based tracking system, capable of tracking moving targets, is presented for a bionic spherical parallel mechanism to minimize a tracking error function and simulate the smooth pursuit of human eyes. More specifically, we propose a real-time moving target tracking algorithm which utilizes spatial histograms and takes the symmetric Kullback-Leibler metric into account. In the proposed algorithm, the key spatial histograms are extracted and fed into a particle filtering framework. Once the target is identified, an image-based control scheme is implemented to drive the bionic spherical parallel mechanism such that the identified target is tracked at the center of the captured images. Meanwhile, the robot motion information is fed forward to develop an adaptive smooth tracking controller inspired by the vestibulo-ocular reflex mechanism. The proposed tracking system is designed to make the robot track dynamic objects when the robot travels through transmittable terrains, especially bumpy environments. Experimental results obtained under violent attitude variation in such bumpy environments demonstrate the effectiveness and robustness of our bioinspired tracking system using a bionic spherical parallel mechanism inspired by head-eye coordination.
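As a rough illustration of the similarity measure named in this abstract, the sketch below computes a symmetrized Kullback-Leibler divergence between two plain intensity histograms. It assumes the Jeffreys form, KL(p||q) + KL(q||p), and a small smoothing constant `eps` for empty bins. The paper's spatial histograms carry additional spatial information that is not modeled here, and the function name and sample histograms are hypothetical.

```python
import math


def symmetric_kl(hist_p, hist_q, eps=1e-12):
    """Symmetrized KL divergence, KL(p||q) + KL(q||p), between two histograms.

    Both inputs are sequences of non-negative bin counts of equal length.
    Each histogram is normalized to sum to 1, and `eps` is added to every
    bin so that empty bins do not produce log(0) or division by zero.
    """
    if len(hist_p) != len(hist_q):
        raise ValueError("histograms must have the same number of bins")
    total_p = float(sum(hist_p))
    total_q = float(sum(hist_q))
    divergence = 0.0
    for count_p, count_q in zip(hist_p, hist_q):
        p = count_p / total_p + eps
        q = count_q / total_q + eps
        # KL(p||q) + KL(q||p) collapses to sum((p - q) * log(p / q)).
        divergence += (p - q) * math.log(p / q)
    return divergence


# Hypothetical 4-bin histograms for a target template and two candidates.
template = [10, 20, 30, 40]
close_candidate = [12, 18, 31, 39]
far_candidate = [40, 30, 20, 10]
```

In a particle filter, a score of this kind would typically be mapped to a particle weight so that candidates whose histograms are closer to the template receive larger weights; that weighting step is left out here.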
On the harmonic starlike functions with respect to symmetric ...

African Journals Online (AJOL)

In the present paper, we introduce the notions of functions harmonic starlike with respect to symmetric, conjugate and symmetric conjugate points. Such results as coefficient inequalities and structural formulae for these function classes are proved. Keywords: Harmonic functions, harmonic starlike functions, symmetric points, ...
Structural hierarchy in flow-aligned hexagonally self-organized microphases with parallel polyelectrolytic structures

NARCIS (Netherlands)

Ruotsalainen, T; Torkkeli, M; Serimaa, R; Makela, T; Maki-Ontto, R; Ruokolainen, J; ten Brinke, G; Ikkala, O; Mäkelä, Tapio; Mäki-Ontto, Riikka

2003-01-01

We report a novel structural hierarchy where a flow-aligned hexagonal self-organized structure is combined with a polyelectrolytic self-organization on a smaller length scale and where the two structures are mutually parallel. Polystyrene-block-poly(4-vinylpyridine) (PS-block-P4VP) is selected with
A Comparative Taxonomy of Parallel Algorithms for RNA Secondary Structure Prediction

Science.gov (United States)

Al-Khatib, Ra’ed M.; Abdullah, Rosni; Rashid, Nur’Aini Abdul

2010-01-01

RNA molecules have been found to play crucial roles in numerous biological and medical procedures and processes. RNA structure determination has become a major problem in biology. Recently, computer scientists have provided biologists with RNA secondary structures that ease the understanding of RNA functions and roles. Detecting RNA secondary structure is an NP-hard problem, especially for pseudoknotted RNA structures. The detection process is also time-consuming; as a result, an alternative approach such as using parallel architectures is a desirable option. The main goal of this paper is to conduct an intensive investigation of the parallel methods used in the literature to address the demanding issues related to RNA secondary structure prediction. Then, we introduce a new taxonomy for parallel RNA folding methods. Based on this proposed taxonomy, a systematic and scientific comparison is performed among these existing methods. PMID:20458364
Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

Science.gov (United States)

OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

1998-01-01

This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science.

Science.gov (United States)

Marek, A; Blum, V; Johanni, R; Havu, V; Lang, B; Auckenthaler, T; Heinecke, A; Bungartz, H-J; Lederer, H

2014-05-28

Obtaining the eigenvalues and eigenvectors of large matrices is a key problem in electronic structure theory and many other areas of computational science. The computational effort formally scales as O(N^3) with the size of the investigated problem, N (e.g. the electron count in electronic structure theory), and thus often defines the system size limit that practical calculations cannot overcome. In many cases, more than just a small fraction of the possible eigenvalue/eigenvector pairs is needed, so that iterative solution strategies that focus only on a few eigenvalues become ineffective. Likewise, it is not always desirable or practical to circumvent the eigenvalue solution entirely. We here review some current developments regarding dense eigenvalue solvers and then focus on the Eigenvalue soLvers for Petascale Applications (ELPA) library, which facilitates the efficient algebraic solution of symmetric and Hermitian eigenvalue problems for dense matrices that have real-valued and complex-valued matrix entries, respectively, on parallel computer platforms. ELPA addresses standard as well as generalized eigenvalue problems, relying on the well documented matrix layout of the Scalable Linear Algebra PACKage (ScaLAPACK) library but replacing all actual parallel solution steps with subroutines of its own. For these steps, ELPA significantly outperforms the corresponding ScaLAPACK routines and proprietary libraries that implement the ScaLAPACK interface (e.g. Intel's MKL). The most time-critical step is the reduction of the matrix to tridiagonal form and the corresponding backtransformation of the eigenvectors. ELPA offers both a one-step tridiagonalization (successive Householder transformations) and a two-step transformation that is more efficient especially towards larger matrices and larger numbers of CPU cores. ELPA is based on the MPI standard, with an early hybrid MPI-OpenMP implementation available as well. Scalability beyond 10,000 CPU cores for problem
Parallel time domain solvers for electrically large transient scattering problems

KAUST Repository

Liu, Yang

2014-09-26

Marching on in time (MOT)-based integral equation solvers represent an increasingly appealing avenue for analyzing transient electromagnetic interactions with large and complex structures. MOT integral equation solvers for analyzing electromagnetic scattering from perfect electrically conducting objects are obtained by enforcing electric field boundary conditions, and they implicitly advance electric surface current densities in time by iteratively solving sparse systems of equations at all time steps. Contrary to finite difference and finite element competitors, these solvers apply to nonlinear and multi-scale structures comprising geometrically intricate and deep sub-wavelength features residing atop electrically large platforms. Moreover, they are high-order accurate, stable in the low- and high-frequency limits, and applicable to conducting and penetrable structures represented by highly irregular meshes. This presentation reviews some recent advances in the parallel implementations of time domain integral equation solvers, specifically those that leverage the multilevel plane-wave time-domain (PWTD) algorithm on modern manycore computer architectures including graphics processing units (GPUs) and distributed memory supercomputers. The GPU-based implementation achieves speedups of at least one order of magnitude compared to serial implementations, while the distributed parallel implementation is highly scalable to thousands of compute nodes. A distributed parallel PWTD kernel has been adopted to solve time domain surface/volume integral equations (TDSIE/TDVIE) for analyzing transient scattering from large and complex-shaped perfectly electrically conducting (PEC)/dielectric objects involving ten million/tens of millions of spatial unknowns.
A Generic Mesh Data Structure with Parallel Applications

Science.gov (United States)

Cochran, William Kenneth, Jr.

2009-01-01

High performance, massively-parallel multi-physics simulations are built on efficient mesh data structures. Most data structures are designed from the bottom up, focusing on the implementation of linear algebra routines. In this thesis, we explore a top-down approach to design, evaluating the various needs of many aspects of simulation, not just…
Relaxations to Sparse Optimization Problems and Applications

Science.gov (United States)

Skau, Erik West

Parsimony is a fundamental property that is applied to many characteristics in a variety of fields. Of particular interest are optimization problems that apply rank, dimensionality, or support in a parsimonious manner. In this thesis we study some optimization problems and their relaxations, and focus on properties and qualities of the solutions of these problems. The Gramian tensor decomposition problem attempts to decompose a symmetric tensor as a sum of rank one tensors. We approach the Gramian tensor decomposition problem with a relaxation to a semidefinite program. We study conditions which ensure that the solution of the relaxed semidefinite problem gives the minimal Gramian rank decomposition. Sparse representations with learned dictionaries are one of the leading image modeling techniques for image restoration. When learning these dictionaries from a set of training images, the sparsity parameter of the dictionary learning algorithm strongly influences the content of the dictionary atoms. We describe geometrically the content of trained dictionaries and how it changes with the sparsity parameter. We use statistical analysis to characterize how the different content is used in sparse representations. Finally, a method to control the structure of the dictionaries is demonstrated, allowing us to learn a dictionary which can later be tailored for specific applications. Variations of dictionary learning can be broadly applied to a variety of applications. We explore a pansharpening problem with a triple factorization variant of coupled dictionary learning. Another application of dictionary learning is computer vision. Computer vision relies heavily on object detection, which we explore with a hierarchical convolutional dictionary learning model. Data fusion of disparate modalities is a growing topic of interest. We do a case study to demonstrate the benefit of using social media data with satellite imagery to estimate hazard extents. In this case study analysis we
The Use of Sparse Direct Solver in Vector Finite Element Modeling for Calculating Two Dimensional (2-D) Magnetotelluric Responses in Transverse Electric (TE) Mode

Science.gov (United States)

Yihaa Roodhiyah, Lisa’; Tjong, Tiffany; Nurhasan; Sutarno, D.

2018-04-01

In previous research, the linear systems arising from vector finite element modeling of two-dimensional (2-D) magnetotelluric (MT) responses in TE mode were solved with a non-sparse direct solver. That approach has weaknesses that need to be addressed, in particular accuracy at low frequencies (10^-3 Hz to 10^-5 Hz), which has not yet been achieved, and the high computational cost on dense meshes. In this work, a sparse direct solver is used instead of the non-sparse direct solver to overcome these weaknesses. A sparse direct solver is advantageous for the linear systems of the vector finite element method because the matrices are symmetric and sparse. The sparse direct solver has been validated against analytical solutions for a homogeneous half-space model and a vertical contact model. The validation results show that the sparse direct solver is more stable than the non-sparse direct solver for the linear problems of the vector finite element method, especially at low frequencies. As a result, accurate 2-D MT response modeling at low frequencies (10^-3 Hz to 10^-5 Hz) has been achieved with efficient array memory allocation and less computational time.
Helically symmetric equilibria with pressure anisotropy and incompressible plasma flow

Science.gov (United States)

Evangelias, A.; Kuiroukidis, A.; Throumoulopoulos, G. N.

2018-02-01

We derive a generalized Grad-Shafranov equation governing helically symmetric equilibria with pressure anisotropy and incompressible flow of arbitrary direction. Through the most general linearizing ansatz for the various free surface functions involved therein, we construct equilibrium solutions and study their properties. It turns out that pressure anisotropy can act either paramagnetically or diamagnetically, the parallel flow has a paramagnetic effect, while the non-parallel component of the flow associated with the electric field has a diamagnetic one. Also, pressure anisotropy and flow noticeably affect the helical current density.
Parallel sparse direct solvers for Poisson's equation in streamer discharges

NARCIS (Netherlands)

M. Nool (Margreet); M. Genseberger (Menno); U. M. Ebert (Ute)

2017-01-01

The aim of this paper is to examine whether a hybrid approach of parallel computing, a combination of the message passing model (MPI) with the threads model (OpenMP) can deliver good performance in streamer discharge simulations. Since one of the bottlenecks of almost all streamer
Wing-Body Aeroelasticity Using Finite-Difference Fluid/Finite-Element Structural Equations on Parallel Computers

Science.gov (United States)

Byun, Chansup; Guruswamy, Guru P.; Kutler, Paul (Technical Monitor)

1994-01-01

In recent years significant advances have been made for parallel computers in both hardware and software. Now parallel computers have become viable tools in computational mechanics. Many application codes developed on conventional computers have been modified to benefit from parallel computers. Significant speedups in some areas have been achieved by parallel computations. For single-discipline use of both fluid dynamics and structural dynamics, computations have been made on wing-body configurations using parallel computers. However, only a limited amount of work has been completed in combining these two disciplines for multidisciplinary applications. The prime reason is the increased level of complication associated with a multidisciplinary approach. In this work, procedures to compute aeroelasticity on parallel computers using direct coupling of fluid and structural equations will be investigated for wing-body configurations. The parallel computer selected for computations is an Intel iPSC/860 computer which is a distributed-memory, multiple-instruction, multiple data (MIMD) computer with 128 processors. In this study, the computational efficiency issues of parallel integration of both fluid and structural equations will be investigated in detail. The fluid and structural domains will be modeled using finite-difference and finite-element approaches, respectively. Results from the parallel computer will be compared with those from the conventional computers using a single processor. This study will provide an efficient computational tool for the aeroelastic analysis of wing-body structures on MIMD type parallel computers.
Porting of the DBCSR library for Sparse Matrix-Matrix Multiplications to Intel Xeon Phi systems

OpenAIRE

Bethune, Iain; Gloess, Andeas; Hutter, Juerg; Lazzaro, Alfio; Pabst, Hans; Reid, Fiona

2017-01-01

Multiplication of two sparse matrices is a key operation in the simulation of the electronic structure of systems containing thousands of atoms and electrons. The highly optimized sparse linear algebra library DBCSR (Distributed Block Compressed Sparse Row) has been specifically designed to efficiently perform such sparse matrix-matrix multiplications. This library is the basic building block for linear scaling electronic structure theory and low scaling correlated methods in CP2K. It is para...
A novel structure-aware sparse learning algorithm for brain imaging genetics.

Science.gov (United States)

Du, Lei; Jingwen, Yan; Kim, Sungeun; Risacher, Shannon L; Huang, Heng; Inlow, Mark; Moore, Jason H; Saykin, Andrew J; Shen, Li

2014-01-01

Brain imaging genetics is an emergent research field where the association between genetic variations such as single nucleotide polymorphisms (SNPs) and neuroimaging quantitative traits (QTs) is evaluated. Sparse canonical correlation analysis (SCCA) is a bi-multivariate analysis method that has the potential to reveal complex multi-SNP-multi-QT associations. Most existing SCCA algorithms are designed using the soft threshold strategy, which assumes that the features in the data are independent from each other. This independence assumption usually does not hold in imaging genetic data, and thus inevitably limits the capability of yielding optimal solutions. We propose a novel structure-aware SCCA (denoted as S2CCA) algorithm to not only eliminate the independence assumption for the input data, but also incorporate group-like structure in the model. Empirical comparison with a widely used SCCA implementation, on both simulated and real imaging genetic data, demonstrated that S2CCA could yield improved prediction performance and biologically meaningful findings.

Group-sparse representation with dictionary learning for medical image denoising and fusion.

Science.gov (United States)

Li, Shutao; Yin, Haitao; Fang, Leyuan

2012-12-01

Recently, sparse representation has attracted a lot of interest in various areas. However, the standard sparse representation does not consider the intrinsic structure, i.e., the nonzero elements occur in clusters, called group sparsity. Furthermore, there is no dictionary learning method for group sparse representation that considers the geometrical structure of the space spanned by atoms. In this paper, we propose a novel dictionary learning method, called Dictionary Learning with Group Sparsity and Graph Regularization (DL-GSGR). First, the geometrical structure of atoms is modeled as the graph regularization. Then, combining group sparsity and graph regularization, the DL-GSGR is presented, which is solved by alternating the group sparse coding and dictionary updating. In this way, the group coherence of the learned dictionary can be kept small enough that any signal can be group sparse coded effectively. Finally, group sparse representation with DL-GSGR is applied to 3-D medical image denoising and image fusion. Specifically, in 3-D medical image denoising, a 3-D processing mechanism (using the similarity among nearby slices) and temporal regularization (to preserve the correlations across nearby slices) are exploited. The experimental results on 3-D image denoising and image fusion demonstrate the superiority of our proposed denoising and fusion approaches.
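To make the notion of group sparsity in this abstract concrete, the sketch below shows group soft-thresholding, the proximal operator of the group-lasso penalty. It is a common building block in group sparse coding solvers, but it is offered here only as an assumption-laden illustration: the abstract does not state which solver DL-GSGR uses, and the function name, the grouping, and the threshold `lam` are hypothetical.

```python
import math


def group_soft_threshold(coeffs, groups, lam):
    """Shrink coefficients group by group (proximal operator of the group lasso).

    coeffs: list of coefficient values
    groups: list of index lists; each inner list names one group
    lam:    non-negative threshold
    A group whose Euclidean norm is at most `lam` is set to zero as a whole;
    otherwise every coefficient in it is scaled by (1 - lam / norm).
    """
    shrunk = list(coeffs)
    for indices in groups:
        norm = math.sqrt(sum(coeffs[i] ** 2 for i in indices))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        for i in indices:
            shrunk[i] = coeffs[i] * scale
    return shrunk


# Hypothetical example: one strong group and one weak group.
codes = [3.0, 4.0, 0.1, -0.1]
code_groups = [[0, 1], [2, 3]]
sparse_codes = group_soft_threshold(codes, code_groups, lam=1.0)
```

The weak group is removed entirely while the strong group is only shrunk, which is the clustered pattern of nonzeros that the abstract calls group sparsity.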
Low-Rank Sparse Coding for Image Classification

KAUST Repository

Zhang, Tianzhu; Ghanem, Bernard; Liu, Si; Xu, Changsheng; Ahuja, Narendra

2013-01-01

In this paper, we propose a low-rank sparse coding (LRSC) method that exploits local structure information among features in an image for the purpose of image-level classification. LRSC represents densely sampled SIFT descriptors, in a spatial neighborhood, collectively as low-rank, sparse linear combinations of code words. As such, it casts the feature coding problem as a low-rank matrix learning problem, which is different from previous methods that encode features independently. This LRSC has a number of attractive properties. (1) It encourages sparsity in feature codes, locality in codebook construction, and low-rankness for spatial consistency. (2) LRSC encodes local features jointly by considering their low-rank structure information, and is computationally attractive. We evaluate the LRSC by comparing its performance on a set of challenging benchmarks with that of 7 popular coding and other state-of-the-art methods. Our experiments show that by representing local features jointly, LRSC not only outperforms the state-of-the-art in classification accuracy but also improves the time complexity of methods that use a similar sparse linear representation model for feature coding.
Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

Science.gov (United States)

Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton

2014-11-11

We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonification, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.
Type synthesis for 4-DOF parallel press mechanism using GF set theory

Science.gov (United States)

He, Jun; Gao, Feng; Meng, Xiangdun; Guo, Weizhong

2015-07-01

Parallel mechanisms are used in large-capacity servo presses to avoid the over-constraint of traditional redundant actuation. Current research focuses mainly on the performance analysis of specific parallel press mechanisms. However, the type synthesis and evaluation of parallel press mechanisms are seldom studied, especially for four-degree-of-freedom (DOF) press mechanisms. The type synthesis of 4-DOF parallel press mechanisms is carried out based on generalized function (GF) set theory. Five design criteria for 4-DOF parallel press mechanisms are first proposed. The general procedure of type synthesis of parallel press mechanisms is obtained, which includes number synthesis, symmetrical synthesis of constraint GF sets, decomposition of motion GF sets and design of limbs. Nine combinations of constraint GF sets of 4-DOF parallel press mechanisms, ten combinations of GF sets of active limbs, and eleven combinations of GF sets of passive limbs are synthesized. Thirty-eight kinds of press mechanisms are presented and then different structures of kinematic limbs are designed. Finally, the geometrical constraint complexity (GCC), kinematic pair complexity (KPC), and type complexity (TC) are proposed to evaluate the press types, and the optimal press type is obtained. General methodologies of type synthesis and evaluation for parallel press mechanisms are suggested.
PARALLEL SOLUTION METHODS OF PARTIAL DIFFERENTIAL EQUATIONS

Directory of Open Access Journals (Sweden)

Korhan KARABULUT

1998-03-01

Partial differential equations arise in almost all fields of science and engineering. More computer time is spent solving partial differential equations than on any other problem class. For this reason, partial differential equations are well suited to parallel computers, which offer great computational power. In this study, the parallel solution of partial differential equations with the Jacobi, Gauss-Seidel, SOR (Successive Over-Relaxation) and SSOR (Symmetric SOR) algorithms is studied.
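As an illustration of the relaxation methods named in this abstract, the following is a minimal sequential sketch of SOR, not the parallel implementation studied in the paper. It assumes a one-dimensional model problem, -u'' = f on (0, 1) with zero boundary values, and the function name and parameters are hypothetical.

```python
def sor_poisson_1d(f, omega=1.5, tol=1e-10, max_sweeps=10_000):
    """Solve -u'' = f on (0, 1) with u(0) = u(1) = 0 by SOR.

    f holds the right-hand side at the n interior grid points.
    omega = 1.0 reduces each sweep to plain Gauss-Seidel.
    Returns (u, sweeps_used), where u includes both boundary values.
    """
    n = len(f)
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)
    for sweep in range(1, max_sweeps + 1):
        largest_change = 0.0
        for i in range(1, n + 1):
            # Gauss-Seidel value from the 3-point stencil, using updated u[i - 1]
            gauss_seidel = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i - 1])
            # Over-relax: blend the old value with the Gauss-Seidel value
            new_value = (1.0 - omega) * u[i] + omega * gauss_seidel
            largest_change = max(largest_change, abs(new_value - u[i]))
            u[i] = new_value
        if largest_change < tol:
            return u, sweep
    return u, max_sweeps
```

Calling `sor_poisson_1d([1.0] * 19)` approximates u(x) = x(1 - x)/2 on a 21-point grid. A Jacobi variant would read only values from the previous sweep, which is what makes it straightforward to parallelize.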
Physics Structure Analysis of Parallel Waves Concept of Physics Teacher Candidate

International Nuclear Information System (INIS)

Sarwi, S; Linuwih, S; Supardi, K I

2017-01-01

The aim of this research was to find a parallel structure concept of wave physics and the factors that influence the formation of parallel conceptions among physics teacher candidates. The method was qualitative research with a cross-sectional design. The subjects were five third-semester basic physics students and six fifth-semester wave course students. Data were collected using think-aloud protocols and written tests. Quantitative data were analysed with a descriptive percentage technique. Answers concerning belief and awareness were analysed with an explanatory analysis. Results of the research include: 1) the structure of the concept can be displayed through a map containing the theoretical core, supplements to the theory and phenomena that occur daily; 2) trends of parallel conceptions of wave physics have been identified for stationary waves, resonance of sound and the propagation of transverse electromagnetic waves; 3) the parallel conceptions were influenced by less comprehensive reading of textbooks and by partial understanding of the knowledge that forms the structure of the theory. (paper)
A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs

Directory of Open Access Journals (Sweden)

Guixia He

2016-01-01

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computations. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMVs on graphics processing units (GPUs), for example, CSR-scalar and CSR-vector, usually have poor performance due to irregular memory access patterns. This motivates us to propose a perfect CSR-based SpMV on the GPU that is called PCSR. PCSR involves two kernels and accesses CSR arrays in a fully coalesced manner by introducing a middle array, which greatly alleviates the deficiencies of CSR-scalar (rare coalescing) and CSR-vector (partial coalescing). Test results on a single C2050 GPU show that PCSR fully outperforms CSR-scalar, CSR-vector, and CSRMV and HYBMV in the vendor-tuned CUSPARSE library and is comparable with a most recently proposed CSR-based algorithm, CSR-Adaptive. Furthermore, we extend PCSR on a single GPU to multiple GPUs. Experimental results on four C2050 GPUs show that, whether or not the communication between GPUs is considered, PCSR on multiple GPUs achieves good performance and has high parallel efficiency.
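For readers unfamiliar with the CSR layout this abstract refers to, here is a minimal sequential reference for CSR-based SpMV. It is a sketch of the storage format only, not the PCSR kernels of the paper; the function and argument names are illustrative assumptions.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[r] .. row_ptr[r + 1] delimits the nonzeros of row r inside
    col_idx (column indices) and values (matching nonzero values).
    """
    y = [0.0] * (len(row_ptr) - 1)
    # In a CSR-scalar GPU kernel, each iteration of this outer loop would be
    # handled by one thread; here the rows are simply visited in order.
    for row in range(len(y)):
        total = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            total += values[k] * x[col_idx[k]]
        y[row] = total
    return y


# The 3 x 3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
print(csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 17.0]
```

Because rows hold different numbers of nonzeros, neighbouring threads in the one-thread-per-row scheme read addresses that are far apart, which is the irregular access pattern the abstract describes.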
HREM investigation of the structure of the Σ5(310)/[001] symmetric tilt grain boundaries in Nb

International Nuclear Information System (INIS)

King, W.E.; Campbell, G.H.; Coombs, A.; Ruehle, M.

1991-01-01

This paper reports on atomistic simulations using interatomic potentials for Nb developed employing the embedded atom method (EAM) and the model generalized pseudopotential theory (MGPT) that have indicated a possible cusp at the Σ5 (310) orientation in the energy vs tilt angle curves for ⟨001⟩ symmetric tilt grain boundaries. In addition, the most stable structure predicted using EAM exhibits shifts of one crystal relative to the other along the tilt axis and along the direction perpendicular to the tilt axis lying in the boundary plane. The structure predicted using the MGPT was mirror symmetric across the plane of the grain boundary. This boundary has been prepared for experimental study using the ultra high vacuum diffusion bonding method. A segment of this boundary has been studied using high resolution electron microscopy.
Parallel-Vector Algorithm For Rapid Structural Analysis

Science.gov (United States)

Agarwal, Tarun R.; Nguyen, Duc T.; Storaasli, Olaf O.

1993-01-01

New algorithm developed to overcome deficiency of skyline storage scheme by use of variable-band storage scheme. Exploits both parallel and vector capabilities of modern high-performance computers. Gives engineers and designers opportunity to include more design variables and constraints during optimization of structures. Enables use of more refined finite-element meshes to obtain improved understanding of complex behaviors of aerospace structures leading to better, safer designs. Not only attractive for current supercomputers but also for next generation of shared-memory supercomputers.
GRAPES: a software for parallel searching on biological graphs targeting multi-core architectures.

Directory of Open Access Journals (Sweden)

Rosalba Giugno

Biological applications, from genomics to ecology, deal with graphs that represent the structure of interactions. Analyzing such data requires searching for subgraphs in collections of graphs. This task is computationally expensive. Even though multicore architectures, from commodity computers to more advanced symmetric multiprocessing (SMP), offer scalable computing power, currently published software implementations for indexing and graph matching are fundamentally sequential. As a consequence, such software implementations (i) do not fully exploit available parallel computing power and (ii) do not scale with respect to the size of graphs in the database. We present GRAPES, software for parallel searching on databases of large biological graphs. GRAPES implements a parallel version of well-established graph searching algorithms, and introduces new strategies which naturally lead to a faster parallel searching system especially for large graphs. GRAPES decomposes graphs into subcomponents that can be efficiently searched in parallel. We show the performance of GRAPES on representative biological datasets containing antiviral chemical compounds, DNA, RNA, proteins, protein contact maps and protein interaction networks.
A parallel orbital-updating based plane-wave basis method for electronic structure calculations

International Nuclear Information System (INIS)

Pan, Yan; Dai, Xiaoying; Gironcoli, Stefano de; Gong, Xin-Gao; Rignanese, Gian-Marco; Zhou, Aihui

2017-01-01

Highlights: • Propose three parallel orbital-updating based plane-wave basis methods for electronic structure calculations. • These new methods can avoid the generating of large scale eigenvalue problems and then reduce the computational cost. • These new methods allow for two-level parallelization which is particularly interesting for large scale parallelization. • Numerical experiments show that these new methods are reliable and efficient for large scale calculations on modern supercomputers. - Abstract: Motivated by the recently proposed parallel orbital-updating approach in real space method, we propose a parallel orbital-updating based plane-wave basis method for electronic structure calculations, for solving the corresponding eigenvalue problems. In addition, we propose two new modified parallel orbital-updating methods. Compared to the traditional plane-wave methods, our methods allow for two-level parallelization, which is particularly interesting for large scale parallelization. Numerical experiments show that these new methods are more reliable and efficient for large scale calculations on modern supercomputers.
A possibility of parallel and anti-parallel diffraction measurements on neutron diffractometer employing bent perfect crystal monochromator at the monochromatic focusing condition

Science.gov (United States)

Choi, Yong Nam; Kim, Shin Ae; Kim, Sung Kyu; Kim, Sung Baek; Lee, Chang-Hee; Mikula, Pavel

2004-07-01

In a conventional diffractometer with a single monochromator, only one position, the parallel position, is used for the diffraction experiment (i.e. detection) because the resolution of the other one, the anti-parallel position, is very poor. However, a bent perfect crystal (BPC) monochromator at the monochromatic focusing condition can provide a quite flat and equal resolution at both parallel and anti-parallel positions, and thus both sides can be used for the diffraction experiment. From the data of the FWHM and the Delta d/d measured on three diffraction geometries (symmetric, asymmetric compression and asymmetric expansion), we can conclude that simultaneous diffraction measurement in both parallel and anti-parallel positions can be achieved.
Fast parallel diffractive multi-beam femtosecond laser surface micro-structuring

Energy Technology Data Exchange (ETDEWEB)

Zheng Kuang, E-mail: z.kuang@liv.ac.uk [Laser Group, Department of Engineering, University of Liverpool, Brodie Building, Liverpool L69 3GQ (United Kingdom); Dun Liu; Perrie, Walter; Edwardson, Stuart; Sharp, Martin; Fearon, Eamonn; Dearden, Geoff; Watkins, Ken [Laser Group, Department of Engineering, University of Liverpool, Brodie Building, Liverpool L69 3GQ (United Kingdom)

2009-04-15

Fast parallel femtosecond laser surface micro-structuring is demonstrated using a spatial light modulator (SLM). The Gratings and Lenses algorithm, which is simple and computationally fast, is used to calculate computer generated holograms (CGHs) producing diffractive multiple beams for the parallel processing. The results show that the finite laser bandwidth can significantly alter the intensity distribution of diffracted beams at higher angles resulting in elongated hole shapes. In addition, by synchronisation of applied CGHs and the scanning system, true 3D micro-structures are created on Ti6Al4V.
Causal symmetric spaces

CERN Document Server

Olafsson, Gestur; Helgason, Sigurdur

1996-01-01

This book is intended to introduce researchers and graduate students to the concepts of causal symmetric spaces. To date, results of recent studies considered standard by specialists have not been widely published. This book seeks to bring this information to students and researchers in geometry and analysis on causal symmetric spaces. It includes the newest results in harmonic analysis, including spherical functions on ordered symmetric spaces and the holomorphic discrete series and Hardy spaces on compactly causal symmetric spaces. It deals with the infinitesimal situation, coverings of symmetric spaces, classification of causal symmetric pairs and invariant cone fields. It presents basic geometric properties of semi-simple symmetric spaces. It includes appendices on Lie algebras and Lie groups, bounded symmetric domains (Cayley transforms), antiholomorphic involutions on bounded domains and para-Hermitian symmetric spaces.
Implementation of a Parallel Protein Structure Alignment Service on Cloud

Directory of Open Access Journals (Sweden)

Che-Lun Hung

2013-01-01

Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.
A structured representation for parallel algorithm design on multicomputers

International Nuclear Information System (INIS)

Sun, Xian-He; Ni, L.M.

1991-01-01

Traditionally, parallel algorithms have been designed by brute force methods and fine-tuned on each architecture to achieve high performance. Rather than studying the design case by case, a systematic approach is proposed. A notation is first developed. Using this notation, most of the frequently used scientific and engineering applications can be presented by simple formulas. The formulas constitute the structured representation of the corresponding applications. The structured representation is simple, adequate and easy to understand. It also contains sufficient information about uneven allocation and communication latency degradations. With the structured representation, applications can be compared, classified and partitioned. Some of the basic building blocks, called computation models, of frequently used applications are identified and studied. Most applications are combinations of some computation models. The structured representation relates general applications to computation models. Studying computation models leads to a guideline for efficient parallel algorithm design for general applications. 6 refs., 7 figs
Parallel computing techniques for rotorcraft aerodynamics

Science.gov (United States)

Ekici, Kivanc

The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, the main modifications to the implicit operator originally used in TURNS, Lower-Upper Symmetric Gauss Seidel (LU-SGS), are performed. Second, an inexact Newton method coupled with a Krylov subspace iterative method (Newton-Krylov method) is applied. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods in the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).
Enhanced Sensitivity of Anti-Symmetrically Structured Surface Plasmon Resonance Sensors with Zinc Oxide Intermediate Layers

Directory of Open Access Journals (Sweden)

Nan-Fu Chiu

2013-12-01

We report a novel design wherein high-refractive-index zinc oxide (ZnO) intermediary layers are used in anti-symmetrically structured surface plasmon resonance (SPR) devices to enhance signal quality and improve the full width at half maximum (FWHM) of the SPR reflectivity curve. The surface plasmon (SP) modes of the ZnO intermediary layer were excited by irradiating both sides of the Au film, thus inducing a high electric field at the Au/ZnO interface. We demonstrated that an improvement in the ZnO (002) crystal orientation led to a decrease in the FWHM of the SPR reflectivity curves. We optimized the design of ZnO thin films using different parameters and performed analytical comparisons of the ZnO with conventional chromium (Cr) and indium tin oxide (ITO) intermediary layers. The present study is based on application of the Fresnel equation, which provides an explanation and verification for the observed narrow SPR reflectivity curve and optical transmittance spectra exhibited by (ZnO/Au), (Cr/Au), and (ITO/Au) devices. On exposure to ethanol, the anti-symmetrically structured device showed a huge electric field at the Au/ZnO interface, a 2-fold decrease in the FWHM value, a 1.3-fold larger shift in angle interrogation, and a 4.5-fold higher sensitivity shift in intensity interrogation. The anti-symmetric structure with ZnO intermediate layers exhibited a wider linearity range and much higher sensitivity. It also exhibited a good linear relationship between the incident angle and ethanol concentration in the tested range. Thus, we demonstrated a novel and simple method for fabricating high-sensitivity, high-resolution SPR biosensors that provide high accuracy and precision over relevant ranges of analyte measurement.
Parallel magnetotransport in multiple quantum well structures

International Nuclear Information System (INIS)

Sheregii, E.M.; Ploch, D.; Marchewka, M.; Tomaka, G.; Kolek, A.; Stadler, A.; Mleczko, K.; Strupinski, W.; Jasik, A.; Jakiela, R.

2004-01-01

The results of investigations of parallel magnetotransport in AlGaAs/GaAs and InGaAs/InAlAs/InP multiple quantum well structures (MQW's) are presented in this paper. The MQW's were obtained by metalorganic vapour phase epitaxy with different shapes of QW, numbers of QW and levels of doping. The magnetotransport measurements were performed over a wide range of temperatures (0.5-300 K) and at high magnetic fields up to 30 T (B is perpendicular and current is parallel to the plane of the QW). Three types of observed effects are analyzed: quantum Hall effect and Shubnikov-de Haas oscillations at low temperatures (0.5-6 K) as well as magnetophonon resonance at higher temperatures (77-300 K).

Study on paralleled inverters with current-sharing coupled inductors on J-TEXT Tokamak

Energy Technology Data Exchange (ETDEWEB)

Shao, J.; Rao, B., E-mail: borao@hust.edu.cn; Zhang, M.; Ma, S.X.; Liang, X.; Yu, K.X.; Pan, Y.

2016-12-15

Highlights: • A modification scheme of heating field power supply system for plasma current modulation. • High-power fast control power supply with multilevel cascade circuit. • Restraining circulating current with coupled inductors in cyclic symmetric structure. • Analysis on the topology with current-sharing coupled inductors. - Abstract: The coupled inductors in paralleled inverters are applied to restrain the high frequency circulating current on J-TEXT Tokamak. Compared with individual inductor, this method has the benefit of high voltage utilization, less volume and weight of the inductor. In this paper, circuit topology of coupled inductors in cyclic symmetry structure for steady-state operation is analyzed and then the design of the inductor is introduced. The maximum circulating current is related to number of parallel branch, DC side voltage, self-inductance of the inductor and the frequency of carrier wave. The simulation and prototype experiment results verify the design.
PT-symmetric gain and loss in a rotating Bose-Einstein condensate

Science.gov (United States)

Haag, Daniel; Dast, Dennis; Cartarius, Holger; Wunner, Günter

2018-03-01

PT-symmetric quantum mechanics allows finding stationary states in mean-field systems with balanced gain and loss of particles. In this work we apply this method to rotating Bose-Einstein condensates with contact interaction which are known to support ground states with vortices. Due to the particle exchange with the environment, transport phenomena through ultracold gases with vortices can be studied. We find that even strongly interacting rotating systems support stable PT-symmetric ground states, sustaining a current parallel and perpendicular to the vortex cores. The vortices move through the nonuniform particle density and leave or enter the condensate through its borders, creating the required net current.
A Generalized Lanczos-QR Technique for Structural Analysis

DEFF Research Database (Denmark)

Vissing, S.

Within the field of solid mechanics such as structural dynamics and linearized as well as non-linear stability, the eigenvalue problem plays an important role. In the class of finite element and finite difference discretized problems these engineering problems are characterized by large matrix systems with very special properties. Due to the finite discretization the matrices are sparse and a relatively large number of problems also has real and symmetric matrices. The matrix equation for an undamped vibration contains two matrices describing tangent stiffness and mass distributions. Alternatively, in a stability analysis, tangent stiffness and geometric stiffness matrices are introduced into an eigenvalue problem used to determine possible bifurcation points. The common basis for these types of problems is that the matrix equation describing the problem contains two real, symmetric...
Solution Structure of a Novel C2-Symmetrical Bifunctional Bicyclic Inhibitor Based on SFTI-1

International Nuclear Information System (INIS)

Jaulent, Agnes M.; Brauer, Arnd B. E.; Matthews, Stephen J.; Leatherbarrow, Robin J.

2005-01-01

A novel bifunctional bicyclic inhibitor has been created that combines features both from the Bowman-Birk inhibitor (BBI) proteins, which have two distinct inhibitory sites, and from sunflower trypsin inhibitor-1 (SFTI-1), which has a compact bicyclic structure. The inhibitor was designed by fusing together a pair of reactive loops based on a sequence derived from SFTI-1 to create a backbone-cyclized disulfide-bridged 16-mer peptide. This peptide has two symmetrically spaced trypsin binding sites. Its synthesis and biological activity have been reported in a previous communication [Jaulent and Leatherbarrow, 2004, PEDS 17, 681]. In the present study we have examined the three-dimensional structure of the molecule. We find that the new inhibitor, which has a symmetrical 8-mer half-cystine CTKSIPP'I' motif repeated through a C2 symmetry axis, also shows complete symmetry in its three-dimensional structure. Each of the two loops adopts the expected canonical conformation common to all BBIs as well as SFTI-1. We also find that the inhibitor displays a strong and unique structural identity, with a notable lack of minor conformational isomers that characterise most reactive site loop mimics examined to date as well as SFTI-1. This suggests that the presence of the additional cyclic loop acts to restrict conformational mobility and that the deliberate introduction of cyclic symmetry may offer a general route to locking the conformation of β-hairpin structures.
Reducing computational costs in large scale 3D EIT by using a sparse Jacobian matrix with block-wise CGLS reconstruction

International Nuclear Information System (INIS)

Yang, C L; Wei, H Y; Soleimani, M; Adler, A

2013-01-01

Electrical impedance tomography (EIT) is a fast and cost-effective technique to provide a tomographic conductivity image of a subject from boundary current–voltage data. This paper proposes a time and memory efficient method for solving a large scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. A 3D EIT system with a large number of measurement data can produce a large Jacobian matrix; this could cause difficulties in computer storage and the inversion process. One of the challenges in 3D EIT is to decrease the reconstruction time and memory usage while retaining the image quality. Firstly, a sparse matrix reduction technique is proposed using thresholding to set very small values of the Jacobian matrix to zero. By storing the Jacobian matrix in a sparse format, the zero elements are eliminated, which reduces the memory requirement. Secondly, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. Sparse Jacobian with a block-wise CG enables the large scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction in reconstruction results. (paper)
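The two steps named in this abstract, thresholding the Jacobian into a sparse form and then solving with a conjugate gradient least-squares (CGLS) iteration, can be sketched as follows. This is a small sequential illustration under stated assumptions, not the authors' block-wise parallel method: the helper names, the row-wise list storage and the stopping tolerance are all hypothetical.

```python
def sparsify(dense_rows, threshold):
    """Drop entries with magnitude <= threshold; keep (column, value) pairs per row."""
    return [[(j, v) for j, v in enumerate(row) if abs(v) > threshold]
            for row in dense_rows]


def matvec(rows, x):
    """Sparse product J @ x."""
    return [sum(v * x[j] for j, v in row) for row in rows]


def rmatvec(rows, r, n_cols):
    """Sparse transposed product J.T @ r."""
    out = [0.0] * n_cols
    for r_i, row in zip(r, rows):
        for j, v in row:
            out[j] += v * r_i
    return out


def cgls(rows, b, n_cols, iterations=50, tol=1e-12):
    """Minimise ||J x - b||_2 by CGLS, starting from x = 0."""
    x = [0.0] * n_cols
    r = list(b)                      # residual b - J x
    s = rmatvec(rows, r, n_cols)     # normal-equations residual J.T r
    p = list(s)
    gamma = sum(v * v for v in s)
    for _ in range(iterations):
        if gamma < tol:              # stop once ||J.T r||^2 is negligible
            break
        q = matvec(rows, p)
        alpha = gamma / sum(v * v for v in q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(rows, r, n_cols)
        gamma_new = sum(v * v for v in s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x
```

Only the products with J and its transpose touch the matrix, so the memory saved by `sparsify` carries through every iteration. Splitting the rows into blocks and computing those products per block is one plausible way such a solver could be distributed.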
Reducing computational costs in large scale 3D EIT by using a sparse Jacobian matrix with block-wise CGLS reconstruction.

Science.gov (United States)

Yang, C L; Wei, H Y; Adler, A; Soleimani, M

2013-06-01

Electrical impedance tomography (EIT) is a fast and cost-effective technique to provide a tomographic conductivity image of a subject from boundary current-voltage data. This paper proposes a time and memory efficient method for solving a large scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. A 3D EIT system with a large number of measurement data can produce a large Jacobian matrix; this could cause difficulties in computer storage and the inversion process. One of the challenges in 3D EIT is to decrease the reconstruction time and memory usage while retaining the image quality. Firstly, a sparse matrix reduction technique is proposed using thresholding to set very small values of the Jacobian matrix to zero. By storing the Jacobian matrix in a sparse format, the zero elements are eliminated, which reduces the memory requirement. Secondly, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. Sparse Jacobian with a block-wise CG enables the large scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction in reconstruction results.
Sparse regularization for force identification using dictionaries

Science.gov (United States)

Qiao, Baijie; Zhang, Xingwu; Wang, Chenxi; Zhang, Hang; Chen, Xuefeng

2016-04-01

The classical function expansion method based on minimizing l2-norm of the response residual employs various basis functions to represent the unknown force. Its difficulty lies in determining the optimum number of basis functions. Considering the sparsity of force in the time domain or in other basis space, we develop a general sparse regularization method based on minimizing l1-norm of the coefficient vector of basis functions. The number of basis functions is adaptively determined by minimizing the number of nonzero components in the coefficient vector during the sparse regularization process. First, according to the profile of the unknown force, the dictionary composed of basis functions is determined. Second, a sparsity convex optimization model for force identification is constructed. Third, given the transfer function and the operational response, Sparse reconstruction by separable approximation (SpaRSA) is developed to solve the sparse regularization problem of force identification. Finally, experiments including identification of impact and harmonic forces are conducted on a cantilever thin plate structure to illustrate the effectiveness and applicability of SpaRSA. Besides the Dirac dictionary, other three sparse dictionaries including Db6 wavelets, Sym4 wavelets and cubic B-spline functions can also accurately identify both the single and double impact forces from highly noisy responses in a sparse representation frame. The discrete cosine functions can also successfully reconstruct the harmonic forces including the sinusoidal, square and triangular forces. Conversely, the traditional Tikhonov regularization method with the L-curve criterion fails to identify both the impact and harmonic forces in these cases.
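The l1-regularized formulation described in this abstract can be illustrated with plain iterative soft-thresholding (ISTA) using a fixed step size. ISTA is swapped in here as a simpler relative of SpaRSA and is not the solver used in the paper. The sketch assumes the problem min 0.5 * ||A x - b||^2 + lam * ||x||_1, where A stands for the transfer matrix times the dictionary; all names are hypothetical.

```python
def soft_threshold(value, cutoff):
    """Shrink value toward zero by cutoff; this is the proximal map of the l1-norm."""
    if value > cutoff:
        return value - cutoff
    if value < -cutoff:
        return value + cutoff
    return 0.0


def ista(A, b, lam, iterations=500):
    """Approximately minimise 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm bounds the squared spectral norm from above,
    # so this fixed step is small enough for the iteration to converge.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n_cols
    for _ in range(iterations):
        residual = [sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
                    for i in range(n_rows)]
        gradient = [sum(A[i][j] * residual[i] for i in range(n_rows))
                    for j in range(n_cols)]
        x = [soft_threshold(x[j] - step * gradient[j], step * lam)
             for j in range(n_cols)]
    return x
```

Coefficients whose contribution stays below the threshold are set exactly to zero, which is how the number of active basis functions ends up being chosen by the optimization rather than fixed in advance.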
A Spectral Algorithm for Envelope Reduction of Sparse Matrices

Science.gov (United States)

Barnard, Stephen T.; Pothen, Alex; Simon, Horst D.

1993-01-01

The problem of reordering a sparse symmetric matrix to reduce its envelope size is considered. A new spectral algorithm for computing an envelope-reducing reordering is obtained by associating a Laplacian matrix with the given matrix and then sorting the components of a specified eigenvector of the Laplacian. This Laplacian eigenvector solves a continuous relaxation of a discrete problem related to envelope minimization called the minimum 2-sum problem. The permutation vector computed by the spectral algorithm is a closest permutation vector to the specified Laplacian eigenvector. Numerical results show that the new reordering algorithm usually computes smaller envelope sizes than those obtained from the current standard algorithms such as Gibbs-Poole-Stockmeyer (GPS) or SPARSPAK reverse Cuthill-McKee (RCM), in some cases reducing the envelope by more than a factor of two.
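The core idea of this abstract, sorting the components of a Laplacian eigenvector to obtain an ordering, can be sketched in a few lines. The version below is an illustration under assumptions rather than the authors' algorithm: it assumes a connected graph with at least one edge, uses the eigenvector of the smallest nonzero eigenvalue (the Fiedler vector), and computes it by a simple shifted power iteration. Function names are hypothetical.

```python
def spectral_ordering(n, edges, iterations=500):
    """Order vertices 0..n-1 by the Fiedler vector of the graph Laplacian L = D - A."""
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    degree = [len(adj) for adj in neighbors]
    shift = 2.0 * max(degree)  # upper bound on the largest Laplacian eigenvalue

    x = [float(i) for i in range(n)]  # deterministic start; assumed not orthogonal to the target
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]  # project out the constant eigenvector
        # y = (shift * I - L) x, whose dominant remaining eigenvector is the Fiedler vector
        y = [(shift - degree[i]) * x[i] + sum(x[j] for j in neighbors[i])
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return sorted(range(n), key=lambda i: x[i])


def envelope_size(order, edges):
    """Sum over rows of the distance from the first nonzero to the diagonal."""
    position = {v: p for p, v in enumerate(order)}
    first = list(range(len(order)))  # leftmost nonzero column per row, diagonal by default
    for i, j in edges:
        lo, hi = sorted((position[i], position[j]))
        first[hi] = min(first[hi], lo)
    return sum(row - col for row, col in enumerate(first))
```

For a path graph whose vertices are labelled out of order, for example edges `[(3, 0), (0, 4), (4, 1), (1, 2)]`, the spectral ordering recovers the path and reduces the envelope relative to the natural labelling. A production code would use a Lanczos-type eigensolver instead of power iteration.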
Parallel hierarchical radiosity rendering

Energy Technology Data Exchange (ETDEWEB)

Carter, Michael [Iowa State Univ., Ames, IA (United States)

1993-07-01

In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.
Parallel algorithms for network routing problems and recurrences

International Nuclear Information System (INIS)

Wisniewski, J.A.; Sameh, A.H.

1982-01-01

In this paper, we consider the parallel solution of recurrences, and linear systems in the regular algebra of Carré. These problems are equivalent to solving the shortest path problem in graph theory, and they also arise in the analysis of Fortran programs. Our methods for solving linear systems in the regular algebra are analogues of well-known methods for solving systems of linear algebraic equations. A parallel version of Dijkstra's method, which has no linear algebraic analogue, is presented. Considerations for choosing an algorithm when the problem is large and sparse are also discussed.
Symmetrical metallic and magnetic edge states of nanoribbon from semiconductive monolayer PtS2

Science.gov (United States)

Liu, Shan; Zhu, Heyu; Liu, Ziran; Zhou, Guanghui

2018-03-01

Transition metal dichalcogenides (TMDs) such as MoS2, or graphene, can be designed into metallic nanoribbons, which always have only one edge showing metallic properties due to symmetry protection. In the present work, a nanoribbon with two parallel metallic and magnetic edges was designed from a noble TMD, PtS2, by employing first-principles calculations based on density functional theory (DFT). Edge energy, bonding charge density, band structure, density of states (DOS) and simulated scanning tunneling microscopy (STM) of four possible edge states of monolayer semiconductive PtS2 were systematically studied. Detailed calculations show that, among the four edge states, only the Pt-terminated edge state was relatively stable, metallic and magnetic. These metallic and magnetic properties are mainly contributed by the 5d orbitals of Pt atoms located at the edges. Moreover, two of these centrally symmetric edges coexist in one zigzag nanoribbon, providing two atomic metallic wires that may have promising applications for the realization of quantum effects, such as the Aharonov-Bohm effect, and atomic power transmission lines in a single nanoribbon.
Two-dimensional sparse wavenumber recovery for guided wavefields

Science.gov (United States)

Sabeti, Soroosh; Harley, Joel B.

2018-04-01

The multi-modal and dispersive behavior of guided waves is often characterized by their dispersion curves, which describe their frequency-wavenumber behavior. In prior work, compressive sensing based techniques, such as sparse wavenumber analysis (SWA), have been capable of recovering dispersion curves from limited data samples. A major limitation of SWA, however, is the assumption that the structure is isotropic. As a result, SWA fails when applied to composites and other anisotropic structures. There have been efforts to address this issue in the literature, but they either are not easily generalizable or do not sufficiently express the data. In this paper, we enhance the existing approaches by employing a two-dimensional wavenumber model to account for direction-dependent velocities in anisotropic media. We integrate this model with tools from compressive sensing to reconstruct a wavefield from incomplete data. Specifically, we create a modified two-dimensional orthogonal matching pursuit algorithm that takes an undersampled wavefield image, with specified unknown elements, and determines its sparse wavenumber characteristics. We then recover the entire wavefield from the sparse representations obtained with our small number of data samples.
A linear programming approach for estimating the structure of a sparse linear genetic network from transcript profiling data

Directory of Open Access Journals (Sweden)

Chandra Nagasuma R

2009-02-01

Background: A genetic network can be represented as a directed graph in which a node corresponds to a gene and a directed edge specifies the direction of influence of one gene on another. The reconstruction of such networks from transcript profiling data remains an important yet challenging endeavor. A transcript profile specifies the abundances of many genes in a biological sample of interest. Prevailing strategies for learning the structure of a genetic network from high-dimensional transcript profiling data assume sparsity and linearity. Many methods consider relatively small directed graphs, inferring graphs with up to a few hundred nodes. This work examines large undirected graph representations of genetic networks, graphs with many thousands of nodes where an undirected edge between two nodes does not indicate the direction of influence, and the problem of estimating the structure of such a sparse linear genetic network (SLGN) from transcript profiling data. Results: The structure learning task is cast as a sparse linear regression problem which is then posed as a LASSO (l1-constrained fitting) problem and solved finally by formulating a Linear Program (LP). A bound on the Generalization Error of this approach is given in terms of the Leave-One-Out Error. The accuracy and utility of LP-SLGNs are assessed quantitatively and qualitatively using simulated and real data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) initiative provides gold standard data sets and evaluation metrics that enable and facilitate the comparison of algorithms for deducing the structure of networks. The structures of LP-SLGNs estimated from the INSILICO1, INSILICO2 and INSILICO3 simulated DREAM2 data sets are comparable to those proposed by the first and/or second ranked teams in the DREAM2 competition. The structures of LP-SLGNs estimated from two published Saccharomyces cerevisiae cell cycle transcript profiling data sets capture known
Parallel protein secondary structure prediction based on neural networks.

Science.gov (United States)

Zhong, Wei; Altun, Gulsah; Tian, Xinmin; Harrison, Robert; Tai, Phang C; Pan, Yi

2004-01-01

Protein secondary structure prediction has a fundamental influence on today's bioinformatics research. In this work, binary and tertiary classifiers of protein secondary structure prediction are implemented on the Denoeux belief neural network (DBNN) architecture. Hydrophobicity matrix, orthogonal matrix, BLOSUM62 and PSSM (position specific scoring matrix) are tested separately as the encoding schemes for DBNN. The experimental results contribute to the design of new encoding schemes. A new binary classifier for Helix versus not Helix (~H) for DBNN produces a prediction accuracy of 87% when PSSM is used for the input profile. The performance of the DBNN binary classifier is comparable to that of the best other prediction methods. The good test results for binary classifiers open a new approach for protein structure prediction with neural networks. Because training the neural networks is time consuming, Pthread and OpenMP are employed to parallelize DBNN on a hyperthreading-enabled Intel architecture. The speedup for 16 Pthreads is 4.9 and the speedup for 16 OpenMP threads is 4 on the 4-processor shared memory architecture. The speedups of both OpenMP and Pthread are superior to those reported in other research. With the new parallel training algorithm, thousands of amino acids can be processed in a reasonable amount of time. Our research also shows that hyperthreading technology for Intel architecture is efficient for parallel biological algorithms.
Iterative methods for the solution of very large complex symmetric linear systems of equations in electrodynamics

Energy Technology Data Exchange (ETDEWEB)

Clemens, M.; Weiland, T. [Technische Hochschule Darmstadt (Germany)

1996-12-31

In the field of computational electrodynamics the discretization of Maxwell's equations using the Finite Integration Theory (FIT) yields very large, sparse, complex symmetric linear systems of equations. For this class of complex non-Hermitian systems a number of conjugate gradient-type algorithms are considered. The complex version of the biconjugate gradient (BiCG) method by Jacobs can be extended to a whole class of methods for complex-symmetric algorithms SCBiCG(T, n), which only require one matrix vector multiplication per iteration step. In this class the well-known conjugate orthogonal conjugate gradient (COCG) method for complex-symmetric systems corresponds to the case n = 0. The case n = 1 yields the BiCGCR method which corresponds to the conjugate residual algorithm for the real-valued case. These methods in combination with a minimal residual smoothing process are applied separately to practical 3D electro-quasistatic and eddy-current problems in electrodynamics. The practical performance of the SCBiCG methods is compared with other methods such as QMR and TFQMR.
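The COCG method mentioned above differs from the ordinary conjugate gradient method only in using the unconjugated product u^T v in place of the Hermitian inner product. The following is a minimal dense sketch under that textbook formulation; the 3 x 3 test matrix, tolerance and function name are made up for illustration, and no breakdown safeguards or residual smoothing are included.

```python
def cocg(A, b, tol=1e-10, max_iter=50):
    """Conjugate orthogonal CG for a complex symmetric matrix (A == A^T)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot_t(u, v):
        # Unconjugated bilinear form u^T v (not the Hermitian inner product).
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0j] * n
    r = list(b)          # residual for the zero initial guess
    p = list(r)
    rho = dot_t(r, r)
    for _ in range(max_iter):
        if sum(abs(ri) ** 2 for ri in r) ** 0.5 < tol:
            break
        q = matvec(p)    # the single matrix-vector product of this step
        alpha = rho / dot_t(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot_t(r, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return x


# Illustrative complex symmetric (non-Hermitian) system with a known solution.
A = [[4 + 1j, 1, 0],
     [1, 3 + 2j, 1j],
     [0, 1j, 2 + 1j]]
x_true = [1, 1j, -1]
b = [sum(A[i][j] * x_true[j] for j in range(3)) for i in range(3)]
x = cocg(A, b)
```

Because the bilinear form can vanish for a nonzero complex vector, a production solver would also guard against division by a near-zero `rho` or `p^T A p`.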
Cache-aware data structure model for parallelism and dynamic load balancing

International Nuclear Information System (INIS)

Sridi, Marwa

2016-01-01

This PhD thesis is dedicated to the implementation of innovative parallel methods in the framework of fast transient fluid-structure dynamics. It improves existing methods within EUROPLEXUS software, in order to optimize the shared memory parallel strategy, complementary to the original distributed memory approach, brought together into a global hybrid strategy for clusters of multi-core nodes. Starting from a sound analysis of the state of the art concerning data structuring techniques correlated to the hierarchic memory organization of current multi-processor architectures, the proposed work introduces an approach suitable for an explicit time integration (i.e. with no linear system to solve at each step). A data structure of type 'Structure of arrays' is conserved for the global data storage, providing flexibility and efficiency for current operations on kinematics fields (displacement, velocity and acceleration). By contrast, in the particular case of elementary operations (for internal forces generic computations, as well as fluxes computations between cell faces for fluid models), particularly time consuming but localized in the program, a temporary data structure of type 'Array of structures' is used instead, to force an efficient filling of the cache memory and increase the performance of the resolution, for both serial and shared memory parallel processing. Switching from the global structure to the temporary one is based on a cell grouping strategy, following classic cache-blocking principles but specifically handling, for this work, the neighboring data necessary to the efficient treatment of ALE fluxes for cells on the group boundaries. The proposed approach is extensively tested, from the points of view of both the computation time and the access failures into cache memory, weighing the gains obtained within the elementary operations against the potential overhead generated by the data structure switch. Obtained results are very satisfactory, especially
A Performance-Prediction Model for PIC Applications on Clusters of Symmetric MultiProcessors: Validation with Hierarchical HPF+OpenMP Implementation

Directory of Open Access Journals (Sweden)

Sergio Briguglio

2003-01-01

A performance-prediction model is presented, which describes different hierarchical workload decomposition strategies for particle-in-cell (PIC) codes on Clusters of Symmetric MultiProcessors. The devised workload decomposition is hierarchically structured: a higher-level decomposition among the computational nodes, and a lower-level one among the processors of each computational node. Several decomposition strategies are evaluated by means of the prediction model, with respect to the memory occupancy, the parallelization efficiency and the required programming effort. Such strategies have been implemented by integrating the high-level languages High Performance Fortran (at the inter-node stage) and OpenMP (at the intra-node one). The details of these implementations are presented, and the experimental values of parallelization efficiency are compared with the predicted results.
Laser-Printed In-Plane Micro-Supercapacitors: From Symmetric to Asymmetric Structure.

Science.gov (United States)

Huang, Gui-Wen; Li, Na; Du, Yi; Feng, Qing-Ping; Xiao, Hong-Mei; Wu, Xing-Hua; Fu, Shao-Yun

2018-01-10

Here, we propose and demonstrate a complete solution for efficiently fabricating in-plane micro-supercapacitors (MSCs) from a symmetric to asymmetric structure. By using an original laser printing process, a symmetric MSC with reduced graphene oxide (rGO)/silver nanowire (Ag-NW) hybrid electrodes was facilely fabricated and a high areal capacitance of 5.5 mF cm⁻² was achieved, which matches the best reported values for graphene-based MSCs. More importantly, a "print-and-fold" method has been creatively proposed that enabled the rapid manufacturing of asymmetric in-plane MSCs beyond the traditional cumbersome technologies. α-Ni(OH)2 particles with high tapping density were successfully synthesized and employed as the pseudocapacitive material. Consequently, an improved supply voltage of 1.5 V was obtained and an areal capacitance as high as 8.6 mF cm⁻² has been realized. Moreover, a demonstration of a miniaturized MSC pack was performed by multiply-folding the serial Ag-NW-connected MSC units. As a result, a compact MSC pack with a high supply voltage of 3 V was obtained, which can be utilized to power a light-emitting diode light. These presented technologies may pave the way for efficiently producing high performance in-plane MSCs, while offering a solution for the achievement of practical power supply packs integrated in limited spaces.
Parallel multigrid smoothing: polynomial versus Gauss-Seidel

International Nuclear Information System (INIS)

Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

2003-01-01

Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
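A Chebyshev polynomial smoother needs only matrix-vector products, which is what makes it easy to parallelize. The sketch below uses the standard three-term Chebyshev iteration on a 1D Poisson matrix; the target interval [1, 4], the number of sweeps and all names are assumptions chosen for illustration and do not reproduce the paper's multilevel-specific polynomial.

```python
import math


def chebyshev_smooth(matvec, b, x, lam_min, lam_max, sweeps):
    """Apply `sweeps` Chebyshev steps that damp error modes whose
    eigenvalues lie in [lam_min, lam_max]."""
    theta = 0.5 * (lam_max + lam_min)   # centre of the target interval
    delta = 0.5 * (lam_max - lam_min)   # half-width of the target interval
    sigma = theta / delta
    rho = 1.0 / sigma
    r = [bi - ai for bi, ai in zip(b, matvec(x))]
    d = [ri / theta for ri in r]
    for _ in range(sweeps):
        x = [xi + di for xi, di in zip(x, d)]
        r = [ri - ai for ri, ai in zip(r, matvec(d))]
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = [rho_new * rho * di + (2.0 * rho_new / delta) * ri
             for di, ri in zip(d, r)]
        rho = rho_new
    return x


def poisson_matvec(v):
    """Matrix-free product with the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


n = 8
zero_rhs = [0.0] * n  # with b = 0 the iterate itself is the error
rough = [math.sin((i + 1) * 8 * math.pi / (n + 1)) for i in range(n)]   # highest mode
smooth = [math.sin((i + 1) * 1 * math.pi / (n + 1)) for i in range(n)]  # lowest mode
rough_after = chebyshev_smooth(poisson_matvec, zero_rhs, rough, 1.0, 4.0, 3)
smooth_after = chebyshev_smooth(poisson_matvec, zero_rhs, smooth, 1.0, 4.0, 3)
```

Three sweeps cut the oscillatory error by more than a factor of ten while leaving the smooth error largely intact, which is the behaviour a multigrid smoother is meant to have; the coarse grid then handles the smooth part.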
Parallel multigrid smoothing: polynomial versus Gauss-Seidel

Science.gov (United States)

Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

2003-07-01

Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.

Parallel algorithms and architectures for computational structural mechanics

Science.gov (United States)

Patrick, Merrell; Ma, Shing; Mahajan, Umesh

1989-01-01

The determination of the fundamental (lowest) natural vibration frequencies and associated mode shapes is a key step used to uncover and correct potential failures or problem areas in most complex structures. However, the computation time taken by finite element codes to evaluate these natural frequencies is significant, often the most computationally intensive part of structural analysis calculations. There is continuing need to reduce this computation time. This study addresses this need by developing methods for parallel computation.
Sparse dictionary learning of resting state fMRI networks.

Science.gov (United States)

Eavani, Harini; Filipovych, Roman; Davatzikos, Christos; Satterthwaite, Theodore D; Gur, Raquel E; Gur, Ruben C

2012-07-02

Research in resting state fMRI (rsfMRI) has revealed the presence of stable, anti-correlated functional subnetworks in the brain. Task-positive networks are active during a cognitive process and are anti-correlated with task-negative networks, which are active during rest. In this paper, based on the assumption that the structure of the resting state functional brain connectivity is sparse, we utilize sparse dictionary modeling to identify distinct functional sub-networks. We propose two ways of formulating the sparse functional network learning problem that characterize the underlying functional connectivity from different perspectives. Our results show that the whole-brain functional connectivity can be concisely represented with highly modular, overlapping task-positive/negative pairs of sub-networks.
Massively Parallel and Scalable Implicit Time Integration Algorithms for Structural Dynamics

Science.gov (United States)

Farhat, Charbel

1997-01-01

Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because of the following additional facts: (a) explicit schemes are easier to parallelize than implicit ones, and (b) explicit schemes induce short range interprocessor communications that are relatively inexpensive, while the factorization methods used in most implicit schemes induce long range interprocessor communications that often ruin the sought-after speed-up. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet be offset by the speed of the currently available parallel hardware. Therefore, it is essential to develop efficient alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating the low-frequency dynamics of aerospace structures.
Symmetric normalisation for intuitionistic logic

DEFF Research Database (Denmark)

Guenot, Nicolas; Straßburger, Lutz

2014-01-01

We present two proof systems for implication-only intuitionistic logic in the calculus of structures. The first is a direct adaptation of the standard sequent calculus to the deep inference setting, and we describe a procedure for cut elimination, similar to the one from the sequent calculus......, but using a non-local rewriting. The second system is the symmetric completion of the first, as normally given in deep inference for logics with a DeMorgan duality: all inference rules have duals, as cut is dual to the identity axiom. We prove a generalisation of cut elimination, that we call symmetric...
Discrete integration of continuous Kalman filtering equations for time invariant second-order structural systems

Science.gov (United States)

Park, K. C.; Belvin, W. Keith

1990-01-01

A general form for the first-order representation of the continuous second-order linear structural-dynamics equations is introduced to derive a corresponding form of first-order continuous Kalman filtering equations. Time integration of the resulting equations is carried out via a set of linear multistep integration formulas. It is shown that a judicious combined selection of computational paths and the undetermined matrices introduced in the general form of the first-order linear structural systems leads to a class of second-order discrete Kalman filtering equations involving only symmetric sparse N x N solution matrices.
Dynamic Representations of Sparse Graphs

DEFF Research Database (Denmark)

Brodal, Gerth Stølting; Fagerberg, Rolf

1999-01-01

We present a linear space data structure for maintaining graphs with bounded arboricity—a large class of sparse graphs containing e.g. planar graphs and graphs of bounded treewidth—under edge insertions, edge deletions, and adjacency queries. The data structure supports adjacency queries in worst...... case O(c) time, and edge insertions and edge deletions in amortized O(1) and O(c+log n) time, respectively, where n is the number of nodes in the graph, and c is the bound on the arboricity....
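One way to answer adjacency queries in time proportional to a fixed degree bound is to keep every edge oriented so that each node stores only a few out-neighbours, and to flip a node's out-edges when it accumulates too many. The sketch below illustrates that idea only; the class name, the bound `delta` and the flipping rule as written are assumptions for the example, and `delta` is assumed to be large relative to the arboricity (otherwise the flipping loop need not terminate).

```python
class OrientedGraph:
    """Undirected graph stored as an orientation with out-degree <= delta."""

    def __init__(self, delta):
        self.delta = delta
        self.out = {}  # node -> list of out-neighbours

    def _out(self, v):
        return self.out.setdefault(v, [])

    def insert(self, u, v):
        """Insert edge {u, v} (assumed not already present), oriented u -> v."""
        self._out(u).append(v)
        self._out(v)  # make sure v is registered as a node
        stack = [u]
        while stack:
            w = stack.pop()
            if len(self._out(w)) <= self.delta:
                continue
            # Too many out-edges: reverse all of them so w's out-list empties.
            for x in self._out(w):
                self._out(x).append(w)
                if len(self._out(x)) > self.delta:
                    stack.append(x)
            self.out[w] = []

    def delete(self, u, v):
        """Remove edge {u, v} from whichever out-list holds it."""
        if v in self._out(u):
            self._out(u).remove(v)
        elif u in self._out(v):
            self._out(v).remove(u)

    def adjacent(self, u, v):
        """Scan two out-lists, each of length at most delta."""
        return v in self._out(u) or u in self._out(v)


# A star with ten leaves is a tree, so its arboricity is 1.
g = OrientedGraph(delta=4)
for leaf in range(1, 11):
    g.insert(0, leaf)
max_out = max(len(nbrs) for nbrs in g.out.values())
```

In this example the centre twice exceeds the bound and hands its edges over to the leaves, so every out-list stays short even though the centre has degree ten.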
Parallel preconditioned conjugate gradient algorithm applied to neutron diffusion problem

International Nuclear Information System (INIS)

Majumdar, A.; Martin, W.R.

1992-01-01

Numerical solution of the neutron diffusion problem requires solving a linear system of equations such as Ax = b, where A is an n x n symmetric positive definite (SPD) matrix; x and b are vectors with n components. The preconditioned conjugate gradient (PCG) algorithm is an efficient iterative method for solving such a linear system of equations. In this paper, the authors describe the implementation of a parallel PCG algorithm on a shared memory machine (BBN TC2000) and on a distributed workstation (IBM RS6000) environment created by the parallel virtual machine parallelization software.
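For reference, a serial version of the PCG iteration for Ax = b fits in a few lines. This sketch assumes a diagonal (Jacobi) preconditioner and a small dense test matrix, both chosen only for illustration; the abstract does not state which preconditioner or storage scheme the authors used.

```python
def pcg_jacobi(A, b, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for SPD A with M = diag(A)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual for the zero initial guess
    z = [inv_diag[i] * r[i] for i in range(n)]    # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Illustrative SPD system: tridiag(-1, 4, -1), loosely resembling a 1D diffusion stencil.
n = 6
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
x_true = [float(i + 1) for i in range(n)]
b = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]
x = pcg_jacobi(A, b)
```

In a parallel implementation the matrix-vector product and the dot products are the steps that require communication, since each dot product is a global reduction.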
Sparse PDF Volumes for Consistent Multi-Resolution Volume Rendering

KAUST Repository

Sicat, Ronell Barrera

2014-12-31

This paper presents a new multi-resolution volume representation called sparse pdf volumes, which enables consistent multi-resolution volume rendering based on probability density functions (pdfs) of voxel neighborhoods. These pdfs are defined in the 4D domain jointly comprising the 3D volume and its 1D intensity range. Crucially, the computation of sparse pdf volumes exploits data coherence in 4D, resulting in a sparse representation with surprisingly low storage requirements. At run time, we dynamically apply transfer functions to the pdfs using simple and fast convolutions. Whereas standard low-pass filtering and down-sampling incur visible differences between resolution levels, the use of pdfs facilitates consistent results independent of the resolution level used. We describe the efficient out-of-core computation of large-scale sparse pdf volumes, using a novel iterative simplification procedure of a mixture of 4D Gaussians. Finally, our data structure is optimized to facilitate interactive multi-resolution volume rendering on GPUs.
Development of structural schemes of parallel structure manipulators using screw calculus

Science.gov (United States)

Rashoyan, G. V.; Shalyukhin, K. A.; Gaponenko, EV

2018-03-01

The paper considers an approach to the structural analysis and synthesis of parallel structure robots based on the mathematical apparatus of screw groups and on the concept of reciprocity of screws. The results of synthesizing parallel structure robots with different numbers of degrees of freedom, corresponding to different screw groups, are presented. Power screws are applied for this purpose, based on the principle of static-kinematic analogy; the power screws are analogous to the unit vectors of the axes of the non-driven kinematic pairs of the corresponding connecting chain. Accordingly, the kinematic screws of the output chain of a robot, which are reciprocal to the power screws of the kinematic sub-chains, are determined simultaneously. The solution of certain synthesis problems is illustrated with practical applications. There are eight types of closed screw groups. The three-membered screw groups are of greatest significance, as are the four-membered screw groups [1] and six-membered screw groups. Three-membered screw groups correspond to translational guiding mechanisms, to spherical mechanisms, and to planar mechanisms. The four-membered group corresponds to the motion of the SCARA robot. The six-membered group includes all possible motions. From the works of A.P. Kotelnikov and F.M. Dimentberg, it is known that closed fifth-order screw groups do not exist. The article presents examples of the mechanisms corresponding to the given groups.
A redundant, 6-DOF parallel manipulator structure with improved workspace and dexterity

International Nuclear Information System (INIS)

Stoughton, R.S.; Salerno, R.; Canfield, S.; Reinholtz, C.

1994-08-01

This paper presents a novel manipulator structure which combines two known parallel manipulator structures: a Stewart Platform (SP), and a double octahedral Variable Geometry Truss (VGT). The combined VGT + SP structure is redundant, using nine actuators to realize six-DOF motion. Combining the two structures allows the translational and orientational workspaces of the two individual structures to sum together to a much larger workspace than is generally achievable with parallel manipulator structures. In addition, the VGT portion of the structure allows the configuration of the Stewart Platform to be changed "on the fly" from one with a large workspace to one with high dexterity. A useful application of this structure is at the distal end of a truss-based manipulator, where it can serve as a dexterous wrist while preserving an internal passageway for cabling and/or conveyance systems.
Linkage mechanisms in the vertebrate skull: Structure and function of three-dimensional, parallel transmission systems.

Science.gov (United States)

Olsen, Aaron M; Westneat, Mark W

2016-12-01

Many musculoskeletal systems, including the skulls of birds, fishes, and some lizards consist of interconnected chains of mobile skeletal elements, analogous to linkage mechanisms used in engineering. Biomechanical studies have applied linkage models to a diversity of musculoskeletal systems, with previous applications primarily focusing on two-dimensional linkage geometries, bilaterally symmetrical pairs of planar linkages, or single four-bar linkages. Here, we present new, three-dimensional (3D), parallel linkage models of the skulls of birds and fishes and use these models (available as free kinematic simulation software), to investigate structure-function relationships in these systems. This new computational framework provides an accessible and integrated workflow for exploring the evolution of structure and function in complex musculoskeletal systems. Linkage simulations show that kinematic transmission, although a suitable functional metric for linkages with single rotating input and output links, can give misleading results when applied to linkages with substantial translational components or multiple output links. To take into account both linear and rotational displacement we define force mechanical advantage for a linkage (analogous to lever mechanical advantage) and apply this metric to measure transmission efficiency in the bird cranial mechanism. For linkages with multiple, expanding output points we propose a new functional metric, expansion advantage, to measure expansion amplification and apply this metric to the buccal expansion mechanism in fishes. Using the bird cranial linkage model, we quantify the inaccuracies that result from simplifying a 3D geometry into two dimensions. We also show that by combining single-chain linkages into parallel linkages, more links can be simulated while decreasing or maintaining the same number of input parameters. This generalized framework for linkage simulation and analysis can accommodate linkages of differing
MULTISCALE SPARSE APPEARANCE MODELING AND SIMULATION OF PATHOLOGICAL DEFORMATIONS

Directory of Open Access Journals (Sweden)

Rami Zewail

2017-08-01

Machine learning and statistical modeling techniques have drawn much interest within the medical imaging research community. However, clinically-relevant modeling of anatomical structures continues to be a challenging task. This paper presents a novel method for multiscale sparse appearance modeling in medical images with application to simulation of pathological deformations in X-ray images of the human spine. The proposed appearance model benefits from the non-linear approximation power of Contourlets and their ability to capture higher order singularities to achieve a sparse representation while preserving the accuracy of the statistical model. Independent Component Analysis is used to extract statistically independent modes of variation from the sparse Contourlet-based domain. The new model is then used to simulate clinically-relevant pathological deformations in radiographic images.
Symmetric waterbomb origami.

Science.gov (United States)

Chen, Yan; Feng, Huijuan; Ma, Jiayao; Peng, Rui; You, Zhong

2016-06-01

The traditional waterbomb origami, produced from a pattern consisting of a series of vertices where six creases meet, is one of the most widely used origami patterns. From a rigid origami viewpoint, it generally has multiple degrees of freedom, but when the pattern is folded symmetrically, the mobility reduces to one. This paper presents a thorough kinematic investigation of symmetric folding of the waterbomb pattern. It has been found that the pattern can have two folding paths under certain circumstances. Moreover, the pattern can be used to fold thick panels. Not only do the additional constraints imposed to fold the thick panels lead to single degree of freedom folding, but the folding process is also kinematically equivalent to the origami of zero-thickness sheets. The findings pave the way for the pattern being readily used to fold deployable structures ranging from flat roofs to large solar panels.
Symmetric imaging findings in neuroradiology

International Nuclear Information System (INIS)

Zlatareva, D.

2015-01-01

Full text: Learning objectives: to make a list of diseases and syndromes which manifest as bilateral symmetric findings on computed tomography and magnetic resonance imaging; to discuss the clinical and radiological differential diagnosis for these diseases; to explain which of these conditions necessitate urgent therapy and when additional studies and laboratory tests can refine the diagnosis. There is symmetry in the human body, and quite often we compare the affected side to the normal one, but in neuroradiology we might have bilateral findings which affect paired structures or corresponding anatomic areas. Clinical data alone rarely prompt the diagnosis. Usually clinicians suspect such an involvement, but CT and MRI can reveal symmetric changes and are among the leading diagnostic tools. The most common location of bilateral findings is the basal ganglia and thalamus. There are a number of diseases affecting these structures symmetrically: metabolic and systemic diseases, intoxication, neurodegeneration and vascular conditions, toxoplasmosis, tumors and some infections. Malformations of cortical development, and especially bilateral perisylvian polymicrogyria, require not only an exact report on the most affected parts but in some cases genetic tests or combination with other clinical symptoms. In the case of herpes simplex encephalitis, bilateral temporal involvement is common and this finding very often prompts therapy even before laboratory results. Posterior reversible encephalopathy syndrome (PRES) and some forms of hypoxic ischemic encephalopathy can lead to symmetric changes. In these acute conditions MR plays a crucial role not only in diagnosis but also in monitoring of the therapeutic effect. Patients with neurofibromatosis type 1 or type 2 can demonstrate bilateral optic glioma combined with spinal neurofibroma and bilateral acoustic schwannoma respectively. Mirror-image aneurysm affecting both internal carotid or middle cerebral arteries is an example of symmetry in
Fast Sparse Coding for Range Data Denoising with Sparse Ridges Constraint

Directory of Open Access Journals (Sweden)

Zhi Gao

2018-05-01

Light detection and ranging (LiDAR) sensors have been widely deployed on intelligent systems such as unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs) to perform localization, obstacle detection, and navigation tasks. Thus, research into range data processing with competitive performance in terms of both accuracy and efficiency has attracted increasing attention. Sparse coding has revolutionized signal processing and led to state-of-the-art performance in a variety of applications. However, dictionary learning, which plays the central role in sparse coding techniques, is computationally demanding, resulting in its limited applicability in real-time systems. In this study, we propose sparse coding algorithms with a fixed pre-learned ridge dictionary to realize range data denoising via leveraging the regularity of laser range measurements in man-made environments. Experiments on both synthesized data and real data demonstrate that our method obtains accuracy comparable to that of sophisticated sparse coding methods, but with much higher computational efficiency.
Fast Sparse Coding for Range Data Denoising with Sparse Ridges Constraint.

Science.gov (United States)

Gao, Zhi; Lao, Mingjie; Sang, Yongsheng; Wen, Fei; Ramesh, Bharath; Zhai, Ruifang

2018-05-06

Light detection and ranging (LiDAR) sensors have been widely deployed on intelligent systems such as unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs) to perform localization, obstacle detection, and navigation tasks. Thus, research into range data processing with competitive performance in terms of both accuracy and efficiency has attracted increasing attention. Sparse coding has revolutionized signal processing and led to state-of-the-art performance in a variety of applications. However, dictionary learning, which plays the central role in sparse coding techniques, is computationally demanding, resulting in its limited applicability in real-time systems. In this study, we propose sparse coding algorithms with a fixed pre-learned ridge dictionary to realize range data denoising via leveraging the regularity of laser range measurements in man-made environments. Experiments on both synthesized data and real data demonstrate that our method obtains accuracy comparable to that of sophisticated sparse coding methods, but with much higher computational efficiency.
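As an illustration of sparse coding against a fixed dictionary, the following is a minimal sketch of the iterative shrinkage-thresholding algorithm (ISTA) for the lasso objective 0.5*||y - D x||^2 + lam*||x||_1. ISTA is a generic stand-in chosen here for illustration, not the algorithm proposed in the paper, and the names `ista`, `D`, `y`, and `lam` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (proximal map of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(D, y, lam, n_iter=500):
    """Approximate argmin_x 0.5*||y - D x||^2 + lam*||x||_1 for a fixed dictionary D.

    D is a list of rows (m x n), y has length m. Hypothetical helper, not from the paper.
    """
    m, n = len(D), len(D[0])
    # The squared Frobenius norm bounds the squared spectral norm, so 1/L is a safe step.
    L = sum(D[i][j] ** 2 for i in range(m) for j in range(n))
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [y[i] - sum(D[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Gradient step on the quadratic term, then shrinkage for the L1 term.
        stepped = [x[j] + sum(D[i][j] * residual[i] for i in range(m)) / L for j in range(n)]
        x = [soft_threshold(s, lam / L) for s in stepped]
    return x
```

Because the dictionary is fixed, only the code vector `x` is updated, which is the source of the cost saving relative to methods that also learn `D`.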
NUFFT-Based Iterative Image Reconstruction via Alternating Direction Total Variation Minimization for Sparse-View CT

Directory of Open Access Journals (Sweden)

Bin Yan

2015-01-01

Sparse-view imaging is a promising scanning method which can reduce the radiation dose in X-ray computed tomography (CT). The reconstruction algorithm for a sparse-view imaging system is of significant importance. The adoption of the spatial iterative algorithm for CT image reconstruction has a low operation efficiency and high computation requirement. A novel Fourier-based iterative reconstruction technique that utilizes the nonuniform fast Fourier transform is presented in this study, along with the advanced total variation (TV) regularization for sparse-view CT. Combined with the alternating direction method, the proposed approach shows excellent efficiency and rapid convergence. Numerical simulations and real data experiments are performed on a parallel beam CT. Experimental results validate that the proposed method has higher computational efficiency and better reconstruction quality than the conventional algorithms, such as the simultaneous algebraic reconstruction technique using the TV method and the alternating direction total variation minimization approach, with the same time duration. The proposed method appears to have extensive applications in X-ray CT imaging.
When sparse coding meets ranking: a joint framework for learning sparse codes and ranking scores

KAUST Repository

Wang, Jim Jing-Yan

2017-06-28

Sparse coding, which represents a data point as a sparse reconstruction code with regard to a dictionary, has been a popular data representation method. Meanwhile, in database retrieval problems, learning the ranking scores from data points plays an important role. Up to now, these two problems have always been considered separately, assuming that data coding and ranking are two independent and irrelevant problems. However, is there any internal relationship between sparse coding and ranking score learning? If yes, how to explore and make use of this internal relationship? In this paper, we try to answer these questions by developing the first joint sparse coding and ranking score learning algorithm. To explore the local distribution in the sparse code space, and also to bridge coding and ranking problems, we assume that in the neighborhood of each data point, the ranking scores can be approximated from the corresponding sparse codes by a local linear function. By considering the local approximation error of ranking scores, the reconstruction error and sparsity of sparse coding, and the query information provided by the user, we construct a unified objective function for learning of sparse codes, the dictionary and ranking scores. We further develop an iterative algorithm to solve this optimization problem.
Parallel ray tracing for one-dimensional discrete ordinate computations

International Nuclear Information System (INIS)

Jarvis, R.D.; Nelson, P.

1996-01-01

The ray-tracing sweep in discrete-ordinates, spatially discrete numerical approximation methods applied to the linear, steady-state, plane-parallel, mono-energetic, azimuthally symmetric, neutral-particle transport equation can be reduced to a parallel prefix computation. In so doing, the often severe penalty in convergence rate of the source iteration, suffered by most current parallel algorithms using spatial domain decomposition, can be avoided while attaining parallelism in the spatial domain to whatever extent desired. In addition, the reduction implies parallel algorithm complexity limits for the ray-tracing sweep. The reduction applies to all closed, linear, one-cell functional (CLOF) spatial approximation methods, which encompasses most in current popular use. Scalability test results of an implementation of the algorithm on a 64-node nCube-2S hypercube-connected, message-passing, multi-computer are described. (author)
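The reduction to a parallel prefix computation can be sketched under one assumption: that the sweep along a single ordinate is a linear recurrence psi[i+1] = a[i]*psi[i] + b[i], where the per-cell coefficients `a` and `b` are hypothetical placeholders for whatever one-cell spatial method is in use. Each cell is then an affine map, composition of affine maps is associative, and the fluxes leaving all cells follow from an inclusive scan. The code below emulates a Hillis-Steele scan sequentially; it is an illustration, not the authors' implementation.

```python
def compose(f, g):
    """Return the affine map 'apply f, then g'; maps are stored as (a, b) for x -> a*x + b."""
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)


def inclusive_scan(maps):
    """Hillis-Steele inclusive scan; the iterations of the inner loop are independent."""
    result = list(maps)
    step = 1
    while step < len(result):
        previous = list(result)
        for i in range(step, len(result)):
            # previous[i - step] covers earlier cells, so it is applied first.
            result[i] = compose(previous[i - step], previous[i])
        step *= 2
    return result


def sweep_by_scan(a, b, psi_in):
    """Fluxes leaving every cell for the recurrence psi[i+1] = a[i]*psi[i] + b[i]."""
    prefixes = inclusive_scan(list(zip(a, b)))
    return [pa * psi_in + pb for pa, pb in prefixes]


def sweep_sequential(a, b, psi_in):
    """Reference cell-by-cell sweep, used only to check the scan."""
    out, psi = [], psi_in
    for ai, bi in zip(a, b):
        psi = ai * psi + bi
        out.append(psi)
    return out
```

With one processor per cell the scan needs about log2(N) rounds instead of N sequential steps, which is the kind of complexity limit the abstract alludes to.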
Epileptic Seizure Detection with Log-Euclidean Gaussian Kernel-Based Sparse Representation.

Science.gov (United States)

Yuan, Shasha; Zhou, Weidong; Wu, Qi; Zhang, Yanli

2016-05-01

Epileptic seizure detection plays an important role in the diagnosis of epilepsy and reducing the massive workload of reviewing electroencephalography (EEG) recordings. In this work, a novel algorithm is developed to detect seizures employing log-Euclidean Gaussian kernel-based sparse representation (SR) in long-term EEG recordings. Unlike the traditional SR for vector data in Euclidean space, the log-Euclidean Gaussian kernel-based SR framework is proposed for seizure detection in the space of the symmetric positive definite (SPD) matrices, which form a Riemannian manifold. Since the Riemannian manifold is nonlinear, the log-Euclidean Gaussian kernel function is applied to embed it into a reproducing kernel Hilbert space (RKHS) for performing SR. The EEG signals of all channels are divided into epochs and the SPD matrices representing EEG epochs are generated by covariance descriptors. Then, the testing samples are sparsely coded over the dictionary composed by training samples utilizing log-Euclidean Gaussian kernel-based SR. The classification of testing samples is achieved by computing the minimal reconstructed residuals. The proposed method is evaluated on the Freiburg EEG dataset of 21 patients and shows its notable performance on both epoch-based and event-based assessments. Moreover, this method handles multiple channels of EEG recordings synchronously which is more speedy and efficient than traditional seizure detection methods.
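For reference, one common way to write a log-Euclidean Gaussian kernel on SPD matrices is given below, where log denotes the matrix logarithm and the norm is the Frobenius norm. The bandwidth symbol sigma and the exact normalization are assumptions; the abstract does not state them.

```latex
k(X, Y) = \exp\!\left( -\frac{\lVert \log X - \log Y \rVert_F^{2}}{2\sigma^{2}} \right),
\qquad X, Y \in \mathrm{Sym}^{+}_{n}
```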

Efficient sparse matrix-matrix multiplication for computing periodic responses by shooting method on Intel Xeon Phi

Science.gov (United States)

Stoykov, S.; Atanassov, E.; Margenov, S.

2016-10-01

Many scientific applications involve sparse or dense matrix operations, such as solving linear systems, matrix-matrix products, eigensolvers, etc. In structural nonlinear dynamics, the computation of periodic responses and the determination of the stability of the solution are of primary interest. The shooting method is widely used for obtaining periodic responses of nonlinear systems. The method involves simultaneous operations with sparse and dense matrices. One of the computationally expensive operations in the method is the multiplication of sparse by dense matrices. In the current work, a new algorithm for sparse matrix by dense matrix products is presented. The algorithm takes into account the structure of the sparse matrix, which is obtained by space discretization of the nonlinear Mindlin's plate equation of motion by the finite element method. The algorithm is developed to use the vector engine of Intel Xeon Phi coprocessors. It is compared with the standard sparse matrix by dense matrix algorithm and the one developed by Intel MKL, and it is shown that by considering the properties of the sparse matrix better algorithms can be developed.
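The generic baseline mentioned above, a sparse-by-dense product that ignores any special structure, can be sketched as follows. The sparse operand is assumed to be stored in compressed sparse row (CSR) form; the function name `csr_times_dense` is hypothetical, and this is not the structure-aware algorithm of the paper.

```python
def csr_times_dense(values, col_idx, row_ptr, dense):
    """Multiply a CSR matrix by a dense matrix given as a list of rows."""
    n_cols = len(dense[0])
    result = []
    for i in range(len(row_ptr) - 1):
        out_row = [0.0] * n_cols
        # Each stored entry scales one whole row of the dense operand.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a, src = values[k], dense[col_idx[k]]
            for j in range(n_cols):  # contiguous inner loop, a candidate for vector units
                out_row[j] += a * src[j]
        result.append(out_row)
    return result
```

The innermost loop runs over contiguous entries of a dense row, which is the part a vectorizing implementation would target.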
Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

Science.gov (United States)

Hsieh, Shang-Hsien

1993-01-01

The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Vector sparse representation of color image using quaternion matrix analysis.

Science.gov (United States)

Xu, Yi; Yu, Licheng; Xu, Hongteng; Zhang, Hao; Nguyen, Truong

2015-04-01

Traditional sparse image models treat color image pixel as a scalar, which represents color channels separately or concatenate color channels as a monochrome image. In this paper, we propose a vector sparse representation model for color images using quaternion matrix analysis. As a new tool for color image representation, its potential applications in several image-processing tasks are presented, including color image reconstruction, denoising, inpainting, and super-resolution. The proposed model represents the color image as a quaternion matrix, where a quaternion-based dictionary learning algorithm is presented using the K-quaternion singular value decomposition (QSVD) (generalized K-means clustering for QSVD) method. It conducts the sparse basis selection in quaternion space, which uniformly transforms the channel images to an orthogonal color space. In this new color space, it is significant that the inherent color structures can be completely preserved during vector reconstruction. Moreover, the proposed sparse model is more efficient comparing with the current sparse models for image restoration tasks due to lower redundancy between the atoms of different color channels. The experimental results demonstrate that the proposed sparse image model avoids the hue bias issue successfully and shows its potential as a general and powerful tool in color image analysis and processing domain.
Facade Layout Symmetrization

KAUST Repository

Jiang, Haiyong

2016-04-11

We present an automatic algorithm for symmetrizing facade layouts. Our method symmetrizes a given facade layout while minimally modifying the original layout. Based on the principles of symmetry in urban design, we formulate the problem of facade layout symmetrization as an optimization problem. Our system further enhances the regularity of the final layout by redistributing and aligning boxes in the layout. We demonstrate that the proposed solution can generate symmetric facade layouts efficiently. © 2015 IEEE.
Facade Layout Symmetrization

KAUST Repository

Jiang, Haiyong; Dong, Weiming; Yan, Dongming; Zhang, Xiaopeng

2016-01-01

We present an automatic algorithm for symmetrizing facade layouts. Our method symmetrizes a given facade layout while minimally modifying the original layout. Based on the principles of symmetry in urban design, we formulate the problem of facade layout symmetrization as an optimization problem. Our system further enhances the regularity of the final layout by redistributing and aligning boxes in the layout. We demonstrate that the proposed solution can generate symmetric facade layouts efficiently. © 2015 IEEE.
Symmetric cryptographic protocols

CERN Document Server

Ramkumar, Mahalingam

2014-01-01

This book focuses on protocols and constructions that make good use of symmetric pseudo random functions (PRF) like block ciphers and hash functions - the building blocks for symmetric cryptography. Readers will benefit from detailed discussion of several strategies for utilizing symmetric PRFs. Coverage includes various key distribution strategies for unicast, broadcast and multicast security, and strategies for constructing efficient digests of dynamic databases using binary hash trees. • Provides detailed coverage of symmetric key protocols • Describes various applications of symmetric building blocks • Includes strategies for constructing compact and efficient digests of dynamic databases
Development of an efficient iterative solver for linear systems in FE structural analysis

International Nuclear Information System (INIS)

Saint-Georges, P.; Warzee, G.; Beauwens, R.; Notay, Y.

1993-01-01

The preconditioned conjugate gradient is a well-known and powerful method to solve sparse symmetric positive definite systems of linear equations. Such systems are generated by the finite element discretization in structural analysis but users of finite element in this context generally still rely on direct methods. It is our purpose in the present paper to highlight the improvement brought forward by some new preconditioning techniques and show that the preconditioned conjugate gradient method is more performant than any direct method. (author)
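A minimal sketch of the preconditioned conjugate gradient iteration is given below. It assumes a simple Jacobi (diagonal) preconditioner, which is swapped in for illustration and is not one of the new preconditioning techniques the paper refers to; the dense list-of-lists storage and the name `pcg` are likewise hypothetical simplifications.

```python
def pcg(A, b, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive definite A with Jacobi-preconditioned CG."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                    # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]     # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x
```

Only matrix-vector products and the application of the preconditioner touch the matrix, so in practice both would operate on sparse storage.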
Domain decomposition parallel computing for transient two-phase flow of nuclear reactors

Energy Technology Data Exchange (ETDEWEB)

Lee, Jae Ryong; Yoon, Han Young [KAERI, Daejeon (Korea, Republic of); Choi, Hyoung Gwon [Seoul National University, Seoul (Korea, Republic of)

2016-05-15

KAERI (Korea Atomic Energy Research Institute) has been developing a multi-dimensional two-phase flow code named CUPID for multi-physics and multi-scale thermal hydraulics analysis of Light water reactors (LWRs). The CUPID code has been validated against a set of conceptual problems and experimental data. In this work, the CUPID code has been parallelized based on the domain decomposition method with Message passing interface (MPI) library. For domain decomposition, the CUPID code provides both manual and automatic methods with METIS library. For the effective memory management, the Compressed sparse row (CSR) format is adopted, which is one of the methods to represent the sparse asymmetric matrix. CSR format saves only non-zero value and its position (row and column). By performing the verification for the fundamental problem set, the parallelization of the CUPID has been successfully confirmed. Since the scalability of a parallel simulation is generally known to be better for fine mesh system, three different scales of mesh system are considered: 40000 meshes for coarse mesh system, 320000 meshes for mid-size mesh system, and 2560000 meshes for fine mesh system. In the given geometry, both single- and two-phase calculations were conducted. In addition, two types of preconditioners for a matrix solver were compared: Diagonal and incomplete LU preconditioner. In terms of enhancement of the parallel performance, the OpenMP and MPI hybrid parallel computing for a pressure solver was examined. It is revealed that the scalability of hybrid calculation was enhanced for the multi-core parallel computation.
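The CSR layout described above can be sketched in a few lines. In the usual CSR convention the column index of each non-zero is stored explicitly, while the row is implied by a pointer array marking where each row starts. The helper names `dense_to_csr` and `csr_matvec` are hypothetical and unrelated to the CUPID code.

```python
def dense_to_csr(matrix):
    """Convert a list-of-lists matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, entry in enumerate(row):
            if entry != 0:
                values.append(entry)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in values / col_idx.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector x."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y
```

For example, the matrix [[4, 0, 1], [0, 0, 2], [3, 5, 0]] is stored as values [4, 1, 2, 3, 5], column indices [0, 2, 2, 0, 1], and row pointers [0, 2, 3, 5]. No symmetry is assumed, which is why the format suits asymmetric matrices.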
The effect of earthquake on architecture geometry with non-parallel system irregularity configuration

Science.gov (United States)

Teddy, Livian; Hardiman, Gagoek; Nuroji; Tudjono, Sri

2017-12-01

Indonesia is an area prone to earthquake that may cause casualties and damage to buildings. The fatalities or the injured are not largely caused by the earthquake, but by building collapse. The collapse of the building is resulted from the building behaviour against the earthquake, and it depends on many factors, such as architectural design, geometry configuration of structural elements in horizontal and vertical plans, earthquake zone, geographical location (distance to earthquake center), soil type, material quality, and construction quality. One of the geometry configurations that may lead to the collapse of the building is irregular configuration of non-parallel system. In accordance with FEMA-451B, irregular configuration in non-parallel system is defined to have existed if the vertical lateral force-retaining elements are neither parallel nor symmetric with main orthogonal axes of the earthquake-retaining axis system. Such configuration may lead to torque, diagonal translation and local damage to buildings. It does not mean that non-parallel irregular configuration should not be formed on architectural design; however the designer must know the consequence of earthquake behaviour against buildings with irregular configuration of non-parallel system. The present research has the objective to identify earthquake behaviour in architectural geometry with irregular configuration of non-parallel system. The present research was quantitative with simulation experimental method. It consisted of 5 models, where architectural data and model structure data were inputted and analyzed using the software SAP2000 in order to find out its performance, and ETAB2015 to determine the eccentricity occurred. The output of the software analysis was tabulated, graphed, compared and analyzed with relevant theories. For areas of strong earthquake zones, avoid designing buildings which wholly form irregular configuration of non-parallel system. If it is inevitable to design a
Vortex structure behind highly heated two cylinders in parallel arrangements

International Nuclear Information System (INIS)

Kurita, Eiichirou; Yahagi, Yuji

2008-01-01

Vortex structures behind twin, highly heated cylinders in parallel arrangements have been investigated experimentally. The experiments were conducted under the following conditions: cylinder diameter, D = 4 mm; mean flow velocity, U∞ = 1.0 m/s; Reynolds number, Re = 250; cylinder clearance, S/D = 0.5 - 1.4; and cylinder heat flux, q = 0 - 72.6 kW/m². For S/D > 1.2, the Karman vortex street is formed alternately behind each cylinder divided on the slit flow. The slit flow velocity increases with a decrease in S/D and decreases with increasing heat flux. For S/D 2 ). As a result, the increased local kinematic viscosity and S/D play a key role for the vortex structure and formation behind arrangements of two parallel cylinders. (author)
Structural Synthesis of 3-DoF Spatial Fully Parallel Manipulators

Directory of Open Access Journals (Sweden)

Alfonso Hernandez

2014-07-01

In this paper, the architectures of three degrees of freedom (3-DoF) spatial, fully parallel manipulators (PMs), whose limbs are structurally identical, are obtained systematically. To do this, the methodology followed makes use of the concepts of the displacement group theory of rigid body motion. This theory works with so-called ‘motion generators’. That is, every limb is a kinematic chain that produces a certain type of displacement in the mobile platform or end-effector. The laws of group algebra will determine the actual motion pattern of the end-effector. The structural synthesis is a combinatorial process of different kinematic chains’ topologies employed in order to get all of the 3-DoF motion pattern possibilities in the end-effector of the fully parallel manipulator.
Are both symmetric and buckled dimers on Si(100) minima? Density functional and multireference perturbation theory calculations

International Nuclear Information System (INIS)

Jung, Yousung; Shao, Yihan; Gordon, Mark S.; Doren, Douglas J.; Head-Gordon, Martin

2003-01-01

We report a spin-unrestricted density functional theory (DFT) solution at the symmetric dimer structure for cluster models of Si(100). With this solution, it is shown that the symmetric structure is a minimum on the DFT potential energy surface, although higher in energy than the buckled structure. In restricted DFT calculations the symmetric structure is a saddle point connecting the two buckled minima. To further assess the effects of electron correlation on the relative energies of symmetric versus buckled dimers on Si(100), multireference second order perturbation theory (MRMP2) calculations are performed on these DFT optimized minima. The symmetric structure is predicted to be lower in energy than the buckled structure via MRMP2, while the reverse order is found by DFT. The implications for recent experimental interpretations are discussed.
Inter-dot coupling effects on transport through correlated parallel

Indian Academy of Sciences (India)

Transport through a symmetric parallel coupled quantum dot system has been studied using the non-equilibrium Green function formalism. Inter-dot tunnelling with on-dot and inter-dot Coulomb repulsion is included. The transmission coefficient and a Landauer–Büttiker-like current formula are shown in terms of internal states ...
A Parallel Algebraic Multigrid Solver on Graphics Processing Units

KAUST Repository

Haase, Gundolf

2010-01-01

The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node Infiniband cluster and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core. © 2010 Springer-Verlag.
Selectivity and sparseness in randomly connected balanced networks.

Directory of Open Access Journals (Sweden)

Cengiz Pehlevan

Neurons in sensory cortex show stimulus selectivity and sparse population response, even in cases where no strong functionally specific structure in connectivity can be detected. This raises the question whether selectivity and sparseness can be generated and maintained in randomly connected networks. We consider a recurrent network of excitatory and inhibitory spiking neurons with random connectivity, driven by random projections from an input layer of stimulus selective neurons. In this architecture, the stimulus-to-stimulus and neuron-to-neuron modulation of total synaptic input is weak compared to the mean input. Surprisingly, we show that in the balanced state the network can still support high stimulus selectivity and sparse population response. In the balanced state, strong synapses amplify the variation in synaptic input and recurrent inhibition cancels the mean. Functional specificity in connectivity emerges due to the inhomogeneity caused by the generative statistical rule used to build the network. We further elucidate the mechanism behind and evaluate the effects of model parameters on population sparseness and stimulus selectivity. Network response to mixtures of stimuli is investigated. It is shown that a balanced state with unselective inhibition can be achieved with densely connected input to inhibitory population. Balanced networks exhibit the "paradoxical" effect: an increase in excitatory drive to inhibition leads to decreased inhibitory population firing rate. We compare and contrast selectivity and sparseness generated by the balanced network to randomly connected unbalanced networks. Finally, we discuss our results in light of experiments.
Highly-Accelerated Real-Time Cardiac Cine MRI Using k-t SPARSE-SENSE

Science.gov (United States)

Feng, Li; Srichai, Monvadi B.; Lim, Ruth P.; Harrison, Alexis; King, Wilson; Adluru, Ganesh; Dibella, Edward VR.; Sodickson, Daniel K.; Otazo, Ricardo; Kim, Daniel

2012-01-01

For patients with impaired breath-hold capacity and/or arrhythmias, real-time cine MRI may be more clinically useful than breath-hold cine MRI. However, commercially available real-time cine MRI methods using parallel imaging typically yield relatively poor spatio-temporal resolution due to their low image acquisition speed. We sought to achieve relatively high spatial resolution (~2.5mm × 2.5mm) and temporal resolution (~40ms), to produce high-quality real-time cine MR images that could be applied clinically for wall motion assessment and measurement of left ventricular (LV) function. In this work, we present an 8-fold accelerated real-time cardiac cine MRI pulse sequence using a combination of compressed sensing and parallel imaging (k-t SPARSE-SENSE). Compared with reference, breath-hold cine MRI, our 8-fold accelerated real-time cine MRI produced significantly worse qualitative grades (1–5 scale), but its image quality and temporal fidelity scores were above 3.0 (adequate) and artifacts and noise scores were below 3.0 (moderate), suggesting that acceptable diagnostic image quality can be achieved. Additionally, both 8-fold accelerated real-time cine and breath-hold cine MRI yielded comparable LV function measurements, with coefficient of variation cine MRI with k-t SPARSE-SENSE is a promising modality for rapid imaging of myocardial function. PMID:22887290
Highly accelerated real-time cardiac cine MRI using k-t SPARSE-SENSE.

Science.gov (United States)

Feng, Li; Srichai, Monvadi B; Lim, Ruth P; Harrison, Alexis; King, Wilson; Adluru, Ganesh; Dibella, Edward V R; Sodickson, Daniel K; Otazo, Ricardo; Kim, Daniel

2013-07-01

For patients with impaired breath-hold capacity and/or arrhythmias, real-time cine MRI may be more clinically useful than breath-hold cine MRI. However, commercially available real-time cine MRI methods using parallel imaging typically yield relatively poor spatio-temporal resolution due to their low image acquisition speed. We sought to achieve relatively high spatial resolution (∼2.5 × 2.5 mm(2)) and temporal resolution (∼40 ms), to produce high-quality real-time cine MR images that could be applied clinically for wall motion assessment and measurement of left ventricular function. In this work, we present an eightfold accelerated real-time cardiac cine MRI pulse sequence using a combination of compressed sensing and parallel imaging (k-t SPARSE-SENSE). Compared with reference, breath-hold cine MRI, our eightfold accelerated real-time cine MRI produced significantly worse qualitative grades (1-5 scale), but its image quality and temporal fidelity scores were above 3.0 (adequate) and artifacts and noise scores were below 3.0 (moderate), suggesting that acceptable diagnostic image quality can be achieved. Additionally, both eightfold accelerated real-time cine and breath-hold cine MRI yielded comparable left ventricular function measurements, with coefficient of variation cine MRI with k-t SPARSE-SENSE is a promising modality for rapid imaging of myocardial function. Copyright © 2012 Wiley Periodicals, Inc.
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.

Science.gov (United States)

He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

2011-12-01

Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β -SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
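To make the idea of a multiplicative update for symmetric NMF concrete, the sketch below factors a nonnegative symmetric matrix A as H H^T using a commonly cited damped rule, H <- H * (1 - beta + beta * (A H) / (H H^T H)) with beta = 0.5. This rule is an assumption chosen for illustration and is not claimed to be the paper's algorithm, nor its alpha-SNMF or beta-SNMF variants; all function names are hypothetical.

```python
def matmul(X, Y):
    """Plain list-of-lists matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def snmf_error(A, H):
    """Squared Frobenius error ||A - H H^T||_F^2."""
    approx = matmul(H, transpose(H))
    return sum((A[i][j] - approx[i][j]) ** 2 for i in range(len(A)) for j in range(len(A)))


def snmf_step(A, H, beta=0.5, eps=1e-12):
    """One damped multiplicative update; entries of H stay nonnegative."""
    numer = matmul(A, H)
    denom = matmul(matmul(H, transpose(H)), H)
    return [[H[i][j] * (1.0 - beta + beta * numer[i][j] / (denom[i][j] + eps))
             for j in range(len(H[0]))] for i in range(len(H))]


def snmf(A, H0, n_iter=200):
    """Run a fixed number of updates from the nonnegative starting point H0."""
    H = [row[:] for row in H0]
    for _ in range(n_iter):
        H = snmf_step(A, H)
    return H
```

Every step is built from matrix-matrix products, which is where level 3 BLAS routines would be called in an optimized implementation. For clustering, row i of H can be read as soft memberships of item i.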
Parallel Implicit Algorithms for CFD

Science.gov (United States)

Keyes, David E.

1998-01-01

The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in the Message Passing Interface (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSc library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSc during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSc framework.
Slowness and sparseness have diverging effects on complex cell learning.

Directory of Open Access Journals (Sweden)

Jörn-Philipp Lies

2014-03-01

Following earlier studies which showed that a sparse coding principle may explain the receptive field properties of complex cells in primary visual cortex, it has been concluded that the same properties may be equally derived from a slowness principle. In contrast to this claim, we here show that slowness and sparsity drive the representations towards substantially different receptive field properties. To do so, we present complete sets of basis functions learned with slow subspace analysis (SSA) in the case of natural movies as well as translations, rotations, and scalings of natural images. SSA directly parallels independent subspace analysis (ISA), with the only difference that SSA maximizes slowness instead of sparsity. We find a large discrepancy between the filter shapes learned with SSA and ISA. We argue that SSA can be understood as a generalization of the Fourier transform where the power spectrum corresponds to the maximally slow subspace energies in SSA. Finally, we investigate the trade-off between slowness and sparseness when combined in one objective function.

Normalization for sparse encoding of odors by a wide-field interneuron.

Science.gov (United States)

Papadopoulou, Maria; Cassenaer, Stijn; Nowotny, Thomas; Laurent, Gilles

2011-05-06

Sparse coding presents practical advantages for sensory representations and memory storage. In the insect olfactory system, the representation of general odors is dense in the antennal lobes but sparse in the mushroom bodies, only one synapse downstream. In locusts, this transformation relies on the oscillatory structure of antennal lobe output, feed-forward inhibitory circuits, intrinsic properties of mushroom body neurons, and connectivity between antennal lobe and mushroom bodies. Here we show the existence of a normalizing negative-feedback loop within the mushroom body to maintain sparse output over a wide range of input conditions. This loop consists of an identifiable "giant" nonspiking inhibitory interneuron with ubiquitous connectivity and graded release properties.
A hybrid method for the parallel computation of Green's functions

International Nuclear Information System (INIS)

Petersen, Dan Erik; Li Song; Stokbro, Kurt; Sorensen, Hans Henrik B.; Hansen, Per Christian; Skelboe, Stig; Darve, Eric

2009-01-01

Quantum transport models for nanodevices using the non-equilibrium Green's function method require the repeated calculation of the block tridiagonal part of the Green's and lesser Green's function matrices. This problem is related to the calculation of the inverse of a sparse matrix. Because of the large number of times this calculation needs to be performed, this is computationally very expensive even on supercomputers. The classical approach is based on recurrence formulas which cannot be efficiently parallelized. This practically prevents the solution of large problems with hundreds of thousands of atoms. We propose new recurrences for a general class of sparse matrices to calculate Green's and lesser Green's function matrices which extend formulas derived by Takahashi and others. We show that these recurrences may lead to a dramatically reduced computational cost because they only require computing a small number of entries of the inverse matrix. Then, we propose a parallelization strategy for block tridiagonal matrices which involves a combination of Schur complement calculations and cyclic reduction. It achieves good scalability even on problems of modest size.
Energy Scaling Advantages of Resistive Memory Crossbar Based Computation and its Application to Sparse Coding

Directory of Open Access Journals (Sweden)

Sapan Agarwal

2016-01-01

The exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational advantages of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an NxN crossbar, these two kernels are at a minimum O(N) more energy efficient than a digital memory-based architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.
Effects of parallel dynamics on vortex structures in electron temperature gradient driven turbulence

International Nuclear Information System (INIS)

Nakata, M.; Watanabe, T.-H.; Sugama, H.; Horton, W.

2011-01-01

Vortex structures and related heat transport properties in slab electron temperature gradient (ETG) driven turbulence are comprehensively investigated by means of nonlinear gyrokinetic Vlasov simulations, with the aim of elucidating the underlying physical mechanisms of the transition from turbulent to coherent states. Numerical results show three different types of vortex structures, i.e., coherent vortex streets accompanied with the transport reduction, turbulent vortices with steady transport, and a zonal-flow-dominated state, depending on the relative magnitude of the parallel compression to the diamagnetic drift. In particular, the formation of coherent vortex streets is correlated with the strong generation of zonal flows for the cases with weak parallel compression, even though the maximum growth rate of linear ETG modes is relatively large. The zonal flow generation in the ETG turbulence is investigated by the modulational instability analysis with a truncated fluid model, where the parallel dynamics such as acoustic modes for electrons is incorporated. The modulational instability for zonal flows is found to be stabilized by the effect of the finite parallel compression. The theoretical analysis qualitatively agrees with secondary growth of zonal flows found in the slab ETG turbulence simulations, where the transition of vortex structures is observed.
Turbulent flows over sparse canopies

Science.gov (United States)

Sharma, Akshath; García-Mayoral, Ricardo

2018-04-01

Turbulent flows over sparse and dense canopies exerting a similar drag force on the flow are investigated using Direct Numerical Simulations. The dense canopies are modelled using a homogeneous drag force, while for the sparse canopy, the geometry of the canopy elements is represented. It is found that on using the friction velocity based on the local shear at each height, the streamwise velocity fluctuations and the Reynolds stress within the sparse canopy are similar to those from a comparable smooth-wall case. In addition, when scaled with the local friction velocity, the intensity of the off-wall peak in the streamwise vorticity for sparse canopies also recovers a value similar to a smooth-wall. This indicates that the sparse canopy does not significantly disturb the near-wall turbulence cycle, but causes its rescaling to an intensity consistent with a lower friction velocity within the canopy. In comparison, the dense canopy is found to have a higher damping effect on the turbulent fluctuations. For the case of the sparse canopy, a peak in the spectral energy density of the wall-normal velocity, and Reynolds stress is observed, which may indicate the formation of Kelvin-Helmholtz-like instabilities. It is also found that a sparse canopy is better modelled by a homogeneous drag applied on the mean flow alone, and not the turbulent fluctuations.
Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs

KAUST Repository

Abdelfattah, Ahmad

2016-05-23

Simulations of many multi-component PDE-based applications, such as petroleum reservoirs or reacting flows, are dominated by the solution, on each time step and within each Newton step, of large sparse linear systems. The standard solver is a preconditioned Krylov method. Along with application of the preconditioner, memory-bound Sparse Matrix-Vector Multiplication (SpMV) is the most time-consuming operation in such solvers. Multi-species models produce Jacobians with a dense block structure, where the block size can be as large as a few dozen. Failing to exploit this dense block structure vastly underutilizes hardware capable of delivering high performance on dense BLAS operations. This paper presents a GPU-accelerated SpMV kernel for block-sparse matrices. Dense matrix-vector multiplications within the sparse-block structure leverage optimization techniques from the KBLAS library, a high performance library for dense BLAS kernels. The design ideas of KBLAS can be applied to block-sparse matrices. Furthermore, a technique is proposed to balance the workload among thread blocks when there are large variations in the lengths of nonzero rows. Multi-GPU performance is highlighted. The proposed SpMV kernel outperforms existing state-of-the-art implementations using matrices with real structures from different applications. Copyright © 2016 John Wiley & Sons, Ltd.
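To make the block-sparse idea concrete, here is a minimal serial sketch of SpMV over a block compressed sparse row layout, in which every stored entry is a small dense block and the inner work is a dense matrix-vector product. The layout and the names (bsr_spmv, row_ptr, col_idx, blocks) are assumptions for illustration; this is neither the paper's GPU kernel nor the KBLAS API, and it does no load balancing.

```python
def bsr_spmv(block_size, row_ptr, col_idx, blocks, x):
    """Compute y = A x for a block-sparse matrix A.

    row_ptr[i]..row_ptr[i + 1] indexes the blocks stored for block row i,
    col_idx[p] is the block column of block p, and blocks[p] is a dense
    block_size x block_size block given as a list of rows.
    """
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * block_size)
    for bi in range(n_block_rows):
        for p in range(row_ptr[bi], row_ptr[bi + 1]):
            x0 = col_idx[p] * block_size  # first x entry touched by this block
            for r in range(block_size):
                acc = 0.0
                for c in range(block_size):
                    acc += blocks[p][r][c] * x[x0 + c]
                y[bi * block_size + r] += acc
    return y
```

Because each block is contiguous and dense, one column index serves block_size * block_size values, which reduces index traffic relative to storing every nonzero separately.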
Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs

KAUST Repository

Abdelfattah, Ahmad; Ltaief, Hatem; Keyes, David E.; Dongarra, Jack

2016-01-01

Simulations of many multi-component PDE-based applications, such as petroleum reservoirs or reacting flows, are dominated by the solution, on each time step and within each Newton step, of large sparse linear systems. The standard solver is a preconditioned Krylov method. Along with application of the preconditioner, memory-bound Sparse Matrix-Vector Multiplication (SpMV) is the most time-consuming operation in such solvers. Multi-species models produce Jacobians with a dense block structure, where the block size can be as large as a few dozen. Failing to exploit this dense block structure vastly underutilizes hardware capable of delivering high performance on dense BLAS operations. This paper presents a GPU-accelerated SpMV kernel for block-sparse matrices. Dense matrix-vector multiplications within the sparse-block structure leverage optimization techniques from the KBLAS library, a high performance library for dense BLAS kernels. The design ideas of KBLAS can be applied to block-sparse matrices. Furthermore, a technique is proposed to balance the workload among thread blocks when there are large variations in the lengths of nonzero rows. Multi-GPU performance is highlighted. The proposed SpMV kernel outperforms existing state-of-the-art implementations using matrices with real structures from different applications. Copyright © 2016 John Wiley & Sons, Ltd.
On A Nonlinear Generalization of Sparse Coding and Dictionary Learning.

Science.gov (United States)

Xie, Yuchen; Ho, Jeffrey; Vemuri, Baba

2013-01-01

Existing dictionary learning algorithms are based on the assumption that the data are vectors in a Euclidean vector space ℝ^d, and the dictionary is learned from the training data using the vector space structure of ℝ^d and its Euclidean L2-metric. However, in many applications, features and data often originate from a Riemannian manifold that does not support a global linear (vector space) structure. Furthermore, the extrinsic viewpoint of existing dictionary learning algorithms becomes inappropriate for modeling and incorporating the intrinsic geometry of the manifold that is potentially important and critical to the application. This paper proposes a novel framework for sparse coding and dictionary learning for data on a Riemannian manifold, and it shows that the existing sparse coding and dictionary learning methods can be considered as special (Euclidean) cases of the more general framework proposed here. We show that both the dictionary and sparse coding can be effectively computed for several important classes of Riemannian manifolds, and we validate the proposed method using two well-known classification problems in computer vision and medical imaging analysis.
Multivariate sparse group lasso for the multivariate multiple linear regression with an arbitrary group structure.

Science.gov (United States)

Li, Yanming; Nan, Bin; Zhu, Ji

2015-06-01

We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functional groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study. © 2015, The International Biometric Society.
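The two-level selection described here (dropping whole groups and also single coefficients inside kept groups) can be illustrated with the proximal operator of the standard sparse group lasso penalty lam1 * ||x||_1 + lam2 * sum_g ||x_g||_2. The sketch below assumes non-overlapping groups, which is simpler than the arbitrary group structures the paper handles, and it is not the authors' estimation algorithm; the names and parameters are illustrative.

```python
import math


def soft_threshold(v, t):
    """Shrink v toward zero by t (elementwise lasso step)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def sparse_group_prox(x, groups, lam1, lam2):
    """Proximal step for lam1 * ||x||_1 + lam2 * sum_g ||x_g||_2.

    groups is a list of index lists that must not overlap (an assumption
    of this sketch). Elementwise shrinkage is applied first, then each
    group is scaled toward zero and removed entirely if its norm is at
    most lam2.
    """
    out = [soft_threshold(v, lam1) for v in x]
    for g in groups:
        norm = math.sqrt(sum(out[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam2 / norm) if norm > 0.0 else 0.0
        for i in g:
            out[i] *= scale
    return out
```

With x = [3.0, -0.5, 0.2, 0.1], groups [[0, 1], [2, 3]], lam1 = 0.3 and lam2 = 1.0, the second group is removed as a whole while the first group survives with both coefficients shrunk.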
A parallel algorithm for solving linear equations arising from one-dimensional network problems

International Nuclear Information System (INIS)

Mesina, G.L.

1991-01-01

One-dimensional (1-D) network problems, such as those arising from 1-D fluid simulations and electrical circuitry, produce systems of sparse linear equations which are nearly tridiagonal and contain a few non-zero entries outside the tridiagonal. Most direct solution techniques for such problems either do not take advantage of the special structure of the matrix or do not fully utilize parallel computer architectures. We describe a new parallel direct linear equation solution algorithm, called TRBR, which is especially designed to take advantage of this structure on MIMD shared memory machines. The new method belongs to a family of methods which split the coefficient matrix into the sum of a tridiagonal matrix T and a matrix comprised of the remaining coefficients R. Efficient tridiagonal methods are used to algebraically simplify the linear system. A smaller auxiliary subsystem is created and solved and its solution is used to calculate the solution of the original system. The newly devised BR method solves the subsystem. The serial and parallel operation counts are given for the new method and related earlier methods. TRBR is shown to have the smallest operation count in this class of direct methods. Numerical results are given. Although the algorithm is designed for one-dimensional networks, it has been applied successfully to three-dimensional problems as well. 20 refs., 2 figs., 4 tabs.
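The T + R splitting can be sketched generically as follows. With y = T^{-1} b and Z = T^{-1} R_c, where R_c holds the k columns of R that contain nonzeros, the solution satisfies x = y - Z x_c, and restricting this identity to those k rows gives a small k x k system (I + Z_c) x_c = y_c. This is a serial illustration of the family of methods the abstract describes, under the assumption that T is nonsingular and well conditioned enough for the Thomas algorithm; it is not TRBR or the BR subsystem solver, and all names are hypothetical.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def dense_solve(m, rhs):
    """Gaussian elimination with partial pivoting for the small subsystem."""
    k = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            for j in range(col, k + 1):
                aug[r][j] -= f * aug[col][j]
    x = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = aug[i][k] - sum(aug[i][j] * x[j] for j in range(i + 1, k))
        x[i] = s / aug[i][i]
    return x


def solve_split(lower, diag, upper, extra, b):
    """Solve (T + R) x = b; extra maps (row, col) to entries of R."""
    n = len(diag)
    cols = sorted({j for (_, j) in extra})
    y = thomas(lower, diag, upper, b)
    z = []  # z[q] = T^{-1} times column cols[q] of R
    for j in cols:
        col = [0.0] * n
        for (i, jj), v in extra.items():
            if jj == j:
                col[i] = v
        z.append(thomas(lower, diag, upper, col))
    k = len(cols)
    small = [[(1.0 if p == q else 0.0) + z[q][cols[p]] for q in range(k)]
             for p in range(k)]
    xc = dense_solve(small, [y[j] for j in cols])
    return [y[i] - sum(z[q][i] * xc[q] for q in range(k)) for i in range(n)]
```

The k + 1 tridiagonal solves are independent of one another, which is where a parallel implementation would find its concurrency.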
OFDM receiver for fast time-varying channels using block-sparse Bayesian learning

DEFF Research Database (Denmark)

Barbu, Oana-Elena; Manchón, Carles Navarro; Rom, Christian

2016-01-01

characterized with a basis expansion model using a small number of terms. As a result, the channel estimation problem is posed as that of estimating a vector of complex coefficients that exhibits a block-sparse structure, which we solve with tools from block-sparse Bayesian learning. Using variational Bayesian...... inference, we embed the channel estimator in a receiver structure that performs iterative channel and noise precision estimation, intercarrier interference cancellation, detection and decoding. Simulation results illustrate the superior performance of the proposed receiver over state-of-art receivers....
Mutation rules and the evolution of sparseness and modularity in biological systems.

Directory of Open Access Journals (Sweden)

Tamar Friedlander

Biological systems exhibit two structural features on many levels of organization: sparseness, in which only a small fraction of possible interactions between components actually occur; and modularity--the near decomposability of the system into modules with distinct functionality. Recent work suggests that modularity can evolve in a variety of circumstances, including goals that vary in time such that they share the same subgoals (modularly varying goals), or when connections are costly. Here, we studied the origin of modularity and sparseness focusing on the nature of the mutation process, rather than on connection cost or variations in the goal. We use simulations of evolution with different mutation rules. We found that commonly used sum-rule mutations, in which interactions are mutated by adding random numbers, do not lead to modularity or sparseness except in special situations. In contrast, product-rule mutations, in which interactions are mutated by multiplying by random numbers--a better model for the effects of biological mutations--led to sparseness naturally. When the goals of evolution are modular, in the sense that specific groups of inputs affect specific groups of outputs, product-rule mutations also lead to modular structure; sum-rule mutations do not. Product-rule mutations generate sparseness and modularity because they tend to reduce interactions, and to keep small interaction terms small.
Discriminative sparse coding on multi-manifolds

KAUST Repository

Wang, J.J.-Y.; Bensmail, H.; Yao, N.; Gao, Xin

2013-01-01

Sparse coding has been popularly used as an effective data representation method in various applications, such as computer vision, medical imaging and bioinformatics. However, the conventional sparse coding algorithms and their manifold-regularized variants (graph sparse coding and Laplacian sparse coding), learn codebooks and codes in an unsupervised manner and neglect class information that is available in the training set. To address this problem, we propose a novel discriminative sparse coding method based on multi-manifolds, that learns discriminative class-conditioned codebooks and sparse codes from both data feature spaces and class labels. First, the entire training set is partitioned into multiple manifolds according to the class labels. Then, we formulate the sparse coding as a manifold-manifold matching problem and learn class-conditioned codebooks and codes to maximize the manifold margins of different classes. Lastly, we present a data sample-manifold matching-based strategy to classify the unlabeled data samples. Experimental results on somatic mutations identification and breast tumor classification based on ultrasonic images demonstrate the efficacy of the proposed data representation and classification approach. 2013 The Authors. All rights reserved.
Discriminative sparse coding on multi-manifolds

KAUST Repository

Wang, J.J.-Y.

2013-09-26

Sparse coding has been popularly used as an effective data representation method in various applications, such as computer vision, medical imaging and bioinformatics. However, the conventional sparse coding algorithms and their manifold-regularized variants (graph sparse coding and Laplacian sparse coding), learn codebooks and codes in an unsupervised manner and neglect class information that is available in the training set. To address this problem, we propose a novel discriminative sparse coding method based on multi-manifolds, that learns discriminative class-conditioned codebooks and sparse codes from both data feature spaces and class labels. First, the entire training set is partitioned into multiple manifolds according to the class labels. Then, we formulate the sparse coding as a manifold-manifold matching problem and learn class-conditioned codebooks and codes to maximize the manifold margins of different classes. Lastly, we present a data sample-manifold matching-based strategy to classify the unlabeled data samples. Experimental results on somatic mutations identification and breast tumor classification based on ultrasonic images demonstrate the efficacy of the proposed data representation and classification approach. 2013 The Authors. All rights reserved.
Sparse Regression by Projection and Sparse Discriminant Analysis

KAUST Repository

Qi, Xin

2015-04-03

© 2015 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R code for all simulation studies are provided.
A parallel code named NEPTUNE for 3D fully electromagnetic and PIC simulations

International Nuclear Information System (INIS)

Dong Ye; Yang Wenyuan; Chen Jun; Zhao Qiang; Xia Fang; Ma Yan; Xiao Li; Sun Huifang; Chen Hong; Zhou Haijing; Mao Zeyao; Dong Zhiwei

2010-01-01

A parallel code named NEPTUNE for 3D fully electromagnetic and particle-in-cell (PIC) simulations is introduced; it runs on Linux systems with hundreds to thousands of CPUs. NEPTUNE is suitable for simulating entire 3D HPM devices, and many HPM devices have been simulated and designed with it. In NEPTUNE, the electromagnetic fields are updated by using the finite-difference time-domain (FDTD) method to solve the Maxwell equations, and the particles are moved by using the Buneman-Boris advance method to solve the relativistic Newton-Lorentz equation. Electromagnetic fields and particles are coupled by using the linear weighting interpolation PIC method, and the electric field components are corrected by using the Boris method of solving the Poisson equation in order to ensure charge conservation. NEPTUNE can construct many complicated geometric structures, such as arbitrary axial-symmetric structures, plane transforming structures, slow-wave structures, coupling holes, foils, and so on. The boundary conditions used in NEPTUNE are introduced in brief, including the perfect electric conductor boundary, external wave boundary, and particle boundary. Finally, some typical HPM devices are simulated and tested with NEPTUNE, including MILO, RBWO, VCO, and RKA. The simulation results give correct and credible physical pictures, and the parallel efficiencies are also given. (authors)
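For readers unfamiliar with the particle advance mentioned above, the following is a textbook form of the relativistic Boris momentum update (half electric kick, magnetic rotation, half electric kick) acting on u = gamma * v. It is a generic sketch, not NEPTUNE's implementation; the function name, argument order, and the use of SI units with a default speed of light are assumptions.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def boris_push(u, e, b, q_over_m, dt, c=299792458.0):
    """Advance u = gamma * v by one time step dt in fields e and b."""
    half = 0.5 * q_over_m * dt
    # First half of the electric acceleration.
    u_minus = tuple(u[i] + half * e[i] for i in range(3))
    gamma = math.sqrt(1.0 + sum(x * x for x in u_minus) / (c * c))
    # Magnetic rotation, built from two cross products.
    t = tuple(half * b[i] / gamma for i in range(3))
    t2 = sum(x * x for x in t)
    s = tuple(2.0 * x / (1.0 + t2) for x in t)
    um_x_t = cross(u_minus, t)
    u_prime = tuple(u_minus[i] + um_x_t[i] for i in range(3))
    up_x_s = cross(u_prime, s)
    u_plus = tuple(u_minus[i] + up_x_s[i] for i in range(3))
    # Second half of the electric acceleration.
    return tuple(u_plus[i] + half * e[i] for i in range(3))
```

A useful property of this scheme is that, with zero electric field, the rotation step leaves the magnitude of u unchanged up to rounding, so a purely magnetic field does no numerical work on the particle.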
In-place sparse suffix sorting

DEFF Research Database (Denmark)

Prezza, Nicola

2018-01-01

information regarding the lexicographical order of a size-b subset of all n text suffixes is often needed. Such information can be stored space-efficiently (in b words) in the sparse suffix array (SSA). The SSA and its relative sparse LCP array (SLCP) can be used as a space-efficient substitute of the sparse...... suffix tree. Very recently, Gawrychowski and Kociumaka [11] showed that the sparse suffix tree (and therefore SSA and SLCP) can be built in asymptotically optimal O(b) space with a Monte Carlo algorithm running in O(n) time. The main reason for using the SSA and SLCP arrays in place of the sparse suffix...... tree is, however, their reduced space of b words each. This leads naturally to the quest for in-place algorithms building these arrays. Franceschini and Muthukrishnan [8] showed that the full suffix array can be built in-place and in optimal running time. On the other hand, finding sub-quadratic in...
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

Science.gov (United States)

Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

2015-01-01

Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n^6). Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465
A Hybrid FPGA/Coarse Parallel Processing Architecture for Multi-modal Visual Feature Descriptors

DEFF Research Database (Denmark)

Jensen, Lars Baunegaard With; Kjær-Nielsen, Anders; Alonso, Javier Díaz

2008-01-01

This paper describes the hybrid architecture developed for speeding up the processing of so-called multi-modal visual primitives which are sparse image descriptors extracted along contours. In the system, the first stages of visual processing are implemented on FPGAs due to their highly parallel...
Optimization of sparse matrix-vector multiplication on emerging multicore platforms

Energy Technology Data Exchange (ETDEWEB)

Williams, Samuel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Vuduc, Richard [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Shalf, John [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Yelick, Katherine [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States); Demmel, James [Univ. of California, Berkeley, CA (United States)

2007-01-01

We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
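As a baseline for what is being optimized, here is a plain serial SpMV over the compressed sparse row (CSR) format, together with one simple way to divide rows among workers so that each gets a similar number of nonzeros. Both functions are illustrative assumptions; the abstract does not say which storage formats or partitioning strategies the study used.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A x with A in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[p] * x[col_idx[p]]  # indirect, irregular access to x
        y[i] = acc
    return y


def split_rows_by_nnz(row_ptr, parts):
    """Return row boundaries giving each part roughly equal nonzeros."""
    n = len(row_ptr) - 1
    total = row_ptr[-1]
    bounds = [0]
    for k in range(1, parts):
        target = total * k / parts
        i = bounds[-1]
        while i < n and row_ptr[i] < target:
            i += 1
        bounds.append(i)
    bounds.append(n)
    return bounds
```

Each nonzero contributes one multiply-add but requires loading a value, a column index, and an entry of x, which is why the kernel is limited by memory bandwidth rather than arithmetic.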

Image Super-Resolution Algorithm Based on an Improved Sparse Autoencoder

Directory of Open Access Journals (Sweden)

Detian Huang

2018-01-01

Due to the limitations of the resolution of the imaging system and the influence of scene changes and other factors, sometimes only low-resolution images can be acquired, which cannot satisfy the practical application’s requirements. To improve the quality of low-resolution images, a novel super-resolution algorithm based on an improved sparse autoencoder is proposed. Firstly, in the training set preprocessing stage, the high- and low-resolution image training sets are constructed, respectively, by using high-frequency information of the training samples as the characterization, and then the zero-phase component analysis whitening technique is utilized to decorrelate the formed joint training set to reduce its redundancy. Secondly, a constructed sparse regularization term is added to the cost function of the traditional sparse autoencoder to further strengthen the sparseness constraint on the hidden layer. Finally, in the dictionary learning stage, the improved sparse autoencoder is adopted to achieve unsupervised dictionary learning to improve the accuracy and stability of the dictionary. Experimental results validate that the proposed algorithm outperforms the existing algorithms both in terms of the subjective visual perception and the objective evaluation indices, including the peak signal-to-noise ratio and the structural similarity measure.
Modeling of Electromagnetic Fields in Parallel-Plane Structures: A Unified Contour-Integral Approach

Directory of Open Access Journals (Sweden)

M. Stumpf

2017-04-01

A unified reciprocity-based modeling approach for analyzing electromagnetic fields in dispersive parallel-plane structures of arbitrary shape is described. It is shown that the use of the reciprocity theorem of the time-convolution type leads to a global contour-integral interaction quantity from which novel time-domain and frequency-domain numerical schemes can be derived. Applications of the numerical method concerning the time-domain radiated interference and susceptibility of parallel-plane structures are discussed and illustrated with numerical examples.
Classification of multispectral or hyperspectral satellite imagery using clustering of sparse approximations on sparse representations in learned dictionaries obtained using efficient convolutional sparse coding

Science.gov (United States)

Moody, Daniela; Wohlberg, Brendt

2018-01-02

An approach for land cover classification, seasonal and yearly change detection and monitoring, and identification of changes in man-made features may use a clustering of sparse approximations (CoSA) on sparse representations in learned dictionaries. The learned dictionaries may be derived using efficient convolutional sparse coding to build multispectral or hyperspectral, multiresolution dictionaries that are adapted to regional satellite image data. Sparse image representations of images over the learned dictionaries may be used to perform unsupervised k-means clustering into land cover categories. The clustering process behaves as a classifier in detecting real variability. This approach may combine spectral and spatial textural characteristics to detect geologic, vegetative, hydrologic, and man-made features, as well as changes in these features over time.
Effect of intermolecular dipole-dipole interactions on interfacial supramolecular structures of C3-symmetric hexa-peri-hexabenzocoronene derivatives.

Science.gov (United States)

Mu, Zhongcheng; Shao, Qi; Ye, Jun; Zeng, Zebing; Zhao, Yang; Hng, Huey Hoon; Boey, Freddy Yin Chiang; Wu, Jishan; Chen, Xiaodong

2011-02-15

Two-dimensional (2D) supramolecular assemblies of a series of novel C(3)-symmetric hexa-peri-hexabenzocoronene (HBC) derivatives bearing different substituents adsorbed on highly oriented pyrolytic graphite were studied by using scanning tunneling microscopy at a solid-liquid interface. It was found that the intermolecular dipole-dipole interactions play a critical role in controlling the interfacial supramolecular assembly of these C(3)-symmetric HBC derivatives at the solid-liquid interface. The HBC molecule bearing three -CF(3) groups could form 2D honeycomb structures because of antiparallel dipole-dipole interactions, whereas HBC molecules bearing three -CN or -NO(2) groups could form hexagonal superstructures because of a special trimeric arrangement induced by dipole-dipole interactions and weak hydrogen bonding interactions ([C-H···NC-] or [C-H···O(2)N-]). Molecular mechanics and dynamics simulations were performed to reveal the physics behind the 2D structures as well as detailed functional group interactions. This work provides an example of how intermolecular dipole-dipole interactions could enable fine control over the self-assembly of disklike π-conjugated molecules.
A density functional for sparse matter

DEFF Research Database (Denmark)

Langreth, D.C.; Lundqvist, Bengt; Chakarova-Kack, S.D.

2009-01-01

forces in molecules, to adsorbed molecules, like benzene, naphthalene, phenol and adenine on graphite, alumina and metals, to polymer and carbon nanotube (CNT) crystals, and hydrogen storage in graphite and metal-organic frameworks (MOFs), and to the structure of DNA and of DNA with intercalators......Sparse matter is abundant and has both strong local bonds and weak nonbonding forces, in particular nonlocal van der Waals (vdW) forces between atoms separated by empty space. It encompasses a broad spectrum of systems, like soft matter, adsorption systems and biostructures. Density-functional...... theory (DFT), long since proven successful for dense matter, seems now to have come to a point, where useful extensions to sparse matter are available. In particular, a functional form, vdW-DF (Dion et al 2004 Phys. Rev. Lett. 92 246401; Thonhauser et al 2007 Phys. Rev. B 76 125112), has been proposed...
A massively-parallel electronic-structure calculations based on real-space density functional theory

International Nuclear Information System (INIS)

Iwata, Jun-Ichi; Takahashi, Daisuke; Oshiyama, Atsushi; Boku, Taisuke; Shiraishi, Kenji; Okada, Susumu; Yabana, Kazuhiro

2010-01-01

Based on the real-space finite-difference method, we have developed a first-principles density functional program that efficiently performs large-scale calculations on massively-parallel computers. In addition to efficient parallel implementation, we also implemented several computational improvements, substantially reducing the computational costs of O(N^3) operations such as the Gram-Schmidt procedure and subspace diagonalization. Using the program on a massively-parallel computer cluster with a theoretical peak performance of several TFLOPS, we perform electronic-structure calculations for a system consisting of over 10,000 Si atoms, and obtain a self-consistent electronic-structure in a few hundred hours. We analyze in detail the costs of the program in terms of computation and of inter-node communications to clarify the efficiency, the applicability, and the possibility for further improvements.
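To make the cubic cost of the Gram-Schmidt procedure concrete, here is a minimal serial sketch of modified Gram-Schmidt. It is not the authors' parallel implementation; the function name and the test matrix are assumptions for illustration. For M vectors on N grid points the work is O(N M^2), which becomes cubic when both N and M grow with the number of atoms.

```python
import numpy as np

def modified_gram_schmidt(V):
    """Orthonormalize the columns of V (N grid points x M vectors)."""
    Q = np.array(V, dtype=float, copy=True)
    for j in range(Q.shape[1]):
        for i in range(j):
            # Remove the component of column j along the already
            # orthonormalized column i (O(N) work per pair of columns).
            Q[:, j] -= (Q[:, i] @ Q[:, j]) * Q[:, i]
        Q[:, j] /= np.linalg.norm(Q[:, j])
    return Q

# Illustrative input: 4 random vectors on 50 "grid points".
rng = np.random.default_rng(1)
V = rng.normal(size=(50, 4))
Q = modified_gram_schmidt(V)
```

The double loop over column pairs is what the abstract's computational improvements would target, for example by recasting the projections as matrix-matrix products.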
Parallel computation of rotating flows

DEFF Research Database (Denmark)

Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær

1999-01-01

This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process...... is that of solving a singular, large, sparse, over‐determined linear system of equations, and the iterative method CGLS is applied for this purpose. We discuss some of the mathematical and numerical aspects of this procedure and report on the performance of our software on a wide range of parallel computers. Darbe...
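The CGLS iteration named in this abstract applies conjugate gradients to the normal equations without forming A^T A explicitly. The following is a generic dense sketch, not the authors' software; the function name, stopping rule, and test problem are assumptions for illustration.

```python
import numpy as np

def cgls(A, b, n_iter=50, tol=1e-10):
    """Least-squares solution of A x ~ b by CGLS, starting from x = 0."""
    b = np.asarray(b, dtype=float)
    x = np.zeros(A.shape[1])
    r = b - A @ x            # residual in data space
    s = A.T @ r              # residual of the normal equations
    p = s.copy()
    gamma = s @ s
    gamma0 = gamma
    for _ in range(n_iter):
        if gamma <= tol**2 * gamma0:   # normal-equation residual small enough
            break
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

# Over-determined illustrative system: 20 equations, 5 unknowns.
rng = np.random.default_rng(2)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
x = cgls(A, b)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
```

Only products with A and A^T are needed, which is why the method suits large sparse systems; starting from zero keeps every iterate in the row space of A.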
Parallel structures for disaster risk reduction and climate change adaptation in Southern Africa

Directory of Open Access Journals (Sweden)

Per Becker

2013-01-01

Full Text Available During the last decade, the interest of the international community in the concepts of disaster risk reduction and climate change adaptation has been growing immensely. Even though an increasing number of scholars seem to view these concepts as two sides of the same coin (at least when not considering the potentially positive effects of climate change), in practice the two concepts have developed in parallel rather than in an integrated manner when it comes to policy, rhetoric and funding opportunities amongst international organisations and donors. This study investigates the extent of the creation of parallel structures for disaster risk reduction and climate change adaptation in the Southern African Development Community (SADC) region. The chosen methodology for the study is a comparative case study and the data are collected through focus groups and content analysis of documentary sources, as well as interviews with key informants. The results indicate that parallel structures for disaster risk reduction and climate change adaptation have been established in all but one of the studied countries. The qualitative interviews performed in some of the countries indicate that stakeholders in disaster risk reduction view this duplication of structures as unfortunate, inefficient and a fertile setup for conflict over resources for the implementation of similar activities. Additional research is called for in order to study the concrete effects of having these parallel structures as a foundation for advocacy for more efficient future disaster risk reduction and climate change adaptation.
Software abstractions and computational issues in parallel structure adaptive mesh methods for electronic structure calculations

Energy Technology Data Exchange (ETDEWEB)

Kohn, S.; Weare, J.; Ong, E.; Baden, S.

1997-05-01

We have applied structured adaptive mesh refinement techniques to the solution of the LDA equations for electronic structure calculations. Local spatial refinement concentrates memory resources and numerical effort where it is most needed, near the atomic centers and in regions of rapidly varying charge density. The structured grid representation enables us to employ efficient iterative solver techniques such as conjugate gradient with FAC multigrid preconditioning. We have parallelized our solver using an object-oriented adaptive mesh refinement framework.
Greedy Algorithms for Nonnegativity-Constrained Simultaneous Sparse Recovery

Science.gov (United States)

Kim, Daeun; Haldar, Justin P.

2016-01-01

This work proposes a family of greedy algorithms to jointly reconstruct a set of vectors that are (i) nonnegative and (ii) simultaneously sparse with a shared support set. The proposed algorithms generalize previous approaches that were designed to impose these constraints individually. Similar to previous greedy algorithms for sparse recovery, the proposed algorithms iteratively identify promising support indices. In contrast to previous approaches, the support index selection procedure has been adapted to prioritize indices that are consistent with both the nonnegativity and shared support constraints. Empirical results demonstrate for the first time that the combined use of simultaneous sparsity and nonnegativity constraints can substantially improve recovery performance relative to existing greedy algorithms that impose less signal structure. PMID:26973368
Symmetrization of Facade Layouts

KAUST Repository

Jiang, Haiyong; Yan, Dong-Ming; Dong, Weiming; Wu, Fuzhang; Nan, Liangliang; Zhang, Xiaopeng

2016-01-01

We present an automatic approach for symmetrizing urban facade layouts. Our method can generate a symmetric layout through minimally modifying the original input layout. Based on the principles of symmetry in urban design, we formulate facade layout symmetrization as an optimization problem. Our method further enhances the regularity of the final layout by redistributing and aligning elements in the layout. We demonstrate that the proposed solution can effectively generate symmetric facade layouts.
Symmetrization of Facade Layouts

KAUST Repository

Jiang, Haiyong

2016-02-26

We present an automatic approach for symmetrizing urban facade layouts. Our method can generate a symmetric layout through minimally modifying the original input layout. Based on the principles of symmetry in urban design, we formulate facade layout symmetrization as an optimization problem. Our method further enhances the regularity of the final layout by redistributing and aligning elements in the layout. We demonstrate that the proposed solution can effectively generate symmetric facade layouts.
Nonlinear spike-and-slab sparse coding for interpretable image encoding.

Directory of Open Access Journals (Sweden)

Jacquelyn A Shelton

Full Text Available Sparse coding is a popular approach to model natural images but has faced two main challenges: modelling low-level image components (such as edge-like structures and their occlusions) and modelling varying pixel intensities. Traditionally, images are modelled as a sparse linear superposition of dictionary elements, where the probabilistic view of this problem is that the coefficients follow a Laplace or Cauchy prior distribution. We propose a novel model that instead uses a spike-and-slab prior and nonlinear combination of components. With the prior, our model can easily represent exact zeros for e.g. the absence of an image component, such as an edge, and a distribution over non-zero pixel intensities. With the nonlinearity (the nonlinear max combination rule), the idea is to target occlusions; dictionary elements correspond to image components that can occlude each other. There are major consequences of the model assumptions made by both (non)linear approaches, thus the main goal of this paper is to isolate and highlight differences between them. Parameter optimization is analytically and computationally intractable in our model, thus as a main contribution we design an exact Gibbs sampler for efficient inference which we can apply to higher dimensional data using latent variable preselection. Results on natural and artificial occlusion-rich data with controlled forms of sparse structure show that our model can extract a sparse set of edge-like components that closely match the generating process, which we refer to as interpretable components. Furthermore, the sparseness of the solution closely follows the ground-truth number of components/edges in the images. The linear model did not learn such edge-like components with any level of sparsity. This suggests that our model can adaptively well-approximate and characterize the meaningful generation process.
Is the Universe matter-antimatter symmetric

International Nuclear Information System (INIS)

Alfven, H.

1976-09-01

According to the symmetric cosmology there should be antimatter regions in space which are equally as large as the matter regions. The regions of different kind are separated by Leidenfrost layers, which may be very thin and not observable from a distance. This view has met resistance which in part is based on the old view that the dilute interstellar and intergalactic medium is more or less homogeneous. However, through space research in the magnetosphere and interplanetary space we know that thin layers, dividing space into regions of different magnetisation, exist and based on this it is concluded that space in general has a cellular structure. This result may break down the psychological resistance to the symmetric theory. The possibility that every second star in our galaxy consists of antimatter is discussed, and it is shown that this view is not in conflict with any observations. As most stars are likely to be surrounded by solar systems of a structure like our own, it is concluded that collisions between comets and antistars (or anticomets and stars) would be rather frequent. Such collisions would result in phenomena of the same type as the observed cosmic γ-ray bursts. Another support for the symmetric cosmology is the continuous X-ray background radiation. Also many of the observed large energy releases in cosmos are likely to be due to annihilation
Ancestral informative marker selection and population structure visualization using sparse Laplacian eigenfunctions.

Directory of Open Access Journals (Sweden)

Jun Zhang

Full Text Available Identification of a small panel of population structure informative markers can reduce genotyping cost and is useful in various applications, such as ancestry inference in association mapping, forensics and evolutionary theory in population genetics. Traditional methods to ascertain ancestral informative markers usually require the prior knowledge of individual ancestry and have difficulty with admixed populations. Recently Principal Components Analysis (PCA) has been employed with success to select SNPs which are highly correlated with top significant principal components (PCs) without use of individual ancestral information. The approach is also applicable to admixed populations. Here we propose a novel approach based on our recent result on summarizing population structure by graph Laplacian eigenfunctions, which differs from PCA in that it is geometric and robust to outliers. Our approach also takes advantage of the a priori sparseness of informative markers in the genome. Through simulation of a ring population and the real global population sample HGDP of 650K SNPs genotyped in 940 unrelated individuals, we validate the proposed algorithm at selecting the most informative markers, a small fraction of which can recover a similar underlying population structure efficiently. Employing a standard Support Vector Machine (SVM) to predict individuals' continental memberships on the HGDP dataset of seven continents, we demonstrate that the SNPs selected by our method are more informative but less redundant than those selected by PCA. Our algorithm is a promising tool in genome-wide association studies and population genetics, facilitating the selection of structure informative markers, efficient detection of population substructure and ancestral inference.
Uniqueness of flat spherically symmetric spacelike hypersurfaces admitted by spherically symmetric static spacetimes

Science.gov (United States)

Beig, Robert; Siddiqui, Azad A.

2007-11-01

It is known that spherically symmetric static spacetimes admit a foliation by flat hypersurfaces. Such foliations have explicitly been constructed for some spacetimes, using different approaches, but none of them have proved or even discussed the uniqueness of these foliations. The issue of uniqueness becomes more important due to suitability of flat foliations for studying black hole physics. Here, flat spherically symmetric spacelike hypersurfaces are obtained by a direct method. It is found that spherically symmetric static spacetimes admit flat spherically symmetric hypersurfaces, and that these hypersurfaces are unique up to translation under the timelike Killing vector. This result guarantees the uniqueness of flat spherically symmetric foliations for such spacetimes.
SparseBeads data: benchmarking sparsity-regularized computed tomography

DEFF Research Database (Denmark)

Jørgensen, Jakob Sauer; Coban, Sophia B.; Lionheart, William R. B.

2017-01-01

-regularized reconstruction. A collection of 48 x-ray CT datasets called SparseBeads was designed for benchmarking SR reconstruction algorithms. Beadpacks comprising glass beads of five different sizes as well as mixtures were scanned in a micro-CT scanner to provide structured datasets with variable image sparsity levels...
Solution of generalized shifted linear systems with complex symmetric matrices

International Nuclear Information System (INIS)

Sogabe, Tomohiro; Hoshi, Takeo; Zhang, Shao-Liang; Fujiwara, Takeo

2012-01-01

We develop the shifted COCG method [R. Takayama, T. Hoshi, T. Sogabe, S.-L. Zhang, T. Fujiwara, Linear algebraic calculation of Green’s function for large-scale electronic structure theory, Phys. Rev. B 73 (165108) (2006) 1–9] and the shifted WQMR method [T. Sogabe, T. Hoshi, S.-L. Zhang, T. Fujiwara, On a weighted quasi-residual minimization strategy of the QMR method for solving complex symmetric shifted linear systems, Electron. Trans. Numer. Anal. 31 (2008) 126–140] for solving generalized shifted linear systems with complex symmetric matrices that arise from the electronic structure theory. The complex symmetric Lanczos process with a suitable bilinear form plays an important role in the development of the methods. The numerical examples indicate that the methods are highly attractive when the inner linear systems can efficiently be solved.
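For orientation, the basic (unshifted) COCG iteration for a single complex symmetric system can be sketched as follows. This is not the shifted COCG or shifted WQMR method of the paper; the function name, stopping rule, and test matrix are assumptions for illustration. The key difference from ordinary conjugate gradients is the unconjugated bilinear form x^T y.

```python
import numpy as np

def cocg(A, b, n_iter=200, tol=1e-12):
    """Solve A x = b for complex symmetric A (A == A.T, not Hermitian)."""
    b = np.asarray(b, dtype=complex)
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rho = r @ r                      # unconjugated product r^T r
    b_norm = np.linalg.norm(b)
    for _ in range(n_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            break
        q = A @ p
        alpha = rho / (p @ q)        # unconjugated product p^T A p
        x = x + alpha * p
        r = r - alpha * q
        rho_new = r @ r
        p = r + (rho_new / rho) * p
        rho = rho_new
    return x

# Illustrative test matrix: real SPD part plus a complex diagonal shift.
rng = np.random.default_rng(3)
n = 8
B = rng.normal(size=(n, n))
A = (B @ B.T + n * np.eye(n)) + 0.5j * np.eye(n)
b = rng.normal(size=n)
x = cocg(A, b)
x_ref = np.linalg.solve(A, b)
```

For 1-D arrays the `@` operator does not conjugate, which is exactly the bilinear form needed here; `np.vdot` would conjugate its first argument and give ordinary CG instead.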
The Möbius transform on symmetric ordered structures and its application to capacities on finite sets

OpenAIRE

Michel Grabisch

2004-01-01

International audience; Considering a linearly ordered set, we introduce its symmetric version, and endow it with two operations extending supremum and infimum, so as to obtain an algebraic structure close to a commutative ring. We show that imposing symmetry necessarily entails non-associativity, hence computing rules are defined in order to deal with non-associativity. We study in detail computing rules, which we endow with a partial order. This makes it possible to find solutions to the inversion f...
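The non-associativity mentioned in this abstract can be seen in a small sketch. The abstract does not spell out the operations, so the definition below is an assumption: it follows one common formulation of the symmetric maximum on a set with elements and their negatives, in which opposite elements cancel to zero and otherwise the element of larger absolute value wins. The function name `sym_max` is hypothetical.

```python
def sym_max(a, b):
    """Assumed symmetric maximum: 0 if b == -a, else the argument
    with the larger absolute value (sign preserved)."""
    if a == -b:
        return 0
    return a if abs(a) > abs(b) else b

# The grouping changes the result, so the operation is not associative.
left = sym_max(sym_max(-3, 3), 1)    # (-3 v 3) v 1 = 0 v 1 = 1
right = sym_max(-3, sym_max(3, 1))   # -3 v (3 v 1) = -3 v 3 = 0
```

A computing rule, in the sense of the abstract, would fix the order in which such a sequence of terms is combined so that the result is well defined.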
Sparse canonical methods for biological data integration: application to a cross-platform study

Directory of Open Access Journals (Sweden)

Robert-Granié Christèle

2009-01-01

Full Text Available Background: In the context of systems biology, few sparse approaches have been proposed so far to integrate several data sets. It is however an important and fundamental issue that will be widely encountered in post genomic studies, when simultaneously analyzing transcriptomics, proteomics and metabolomics data using different platforms, so as to understand the mutual interactions between the different data sets. In this high dimensional setting, variable selection is crucial to give interpretable results. We focus on a sparse Partial Least Squares approach (sPLS) to handle two-block data sets, where the relationship between the two types of variables is known to be symmetric. Sparse PLS has been developed either for a regression or a canonical correlation framework and includes a built-in procedure to select variables while integrating data. To illustrate the canonical mode approach, we analyzed the NCI60 data sets, where two different platforms (cDNA and Affymetrix chips) were used to study the transcriptome of sixty cancer cell lines. Results: We compare the results obtained with two other sparse or related canonical correlation approaches: CCA with Elastic Net penalization (CCA-EN) and Co-Inertia Analysis (CIA). The latter does not include a built-in procedure for variable selection and requires a two-step analysis. We stress the lack of statistical criteria to evaluate canonical correlation methods, which makes biological interpretation absolutely necessary to compare the different gene selections. We also propose comprehensive graphical representations of both samples and variables to facilitate the interpretation of the results. Conclusion: sPLS and CCA-EN selected highly relevant genes and complementary findings from the two data sets, which enabled a detailed understanding of the molecular characteristics of several groups of cell lines. These two approaches were found to bring similar results, although they highlighted the same

Non-coaxial-based microwave ablation antennas for creating symmetric and asymmetric coagulation zones

Science.gov (United States)

Mohtashami, Yahya; Luyen, Hung; Hagness, Susan C.; Behdad, Nader

2018-06-01

We present an investigation of a new class of microwave ablation (MWA) antennas capable of producing axially symmetric or asymmetric heating patterns. The antenna design is based on a dipole fed by a balanced parallel-wire transmission line. The angle and direction of the deployed dipole arms are used to control the heating pattern. We analyzed the specific absorption rate and temperature profiles using electromagnetic and thermal simulations. Two prototypes were fabricated and tested in ex vivo ablation experiments: one was designed to produce symmetric heating patterns and the other was designed to generate asymmetric heating patterns. Both fabricated prototypes exhibited good impedance matching and produced localized coagulation zones as predicted by the simulations. The prototype operating in porcine muscle created an ~10 cm^3 symmetric ablation zone after 10 min of ablation with a power level of 18 W. The prototype operating in egg white created an ~4 cm^3 asymmetric ablation zone with a directionality ratio of 40% after 5 min of ablation with a power level of 25 W. The proposed MWA antenna design shows promise for minimally invasive treatment of tumors in various clinical scenarios where, depending on the situation, a symmetric or an asymmetric heating pattern may be needed.
Structure, complexity and cooperation in parallel external chat interactions

DEFF Research Database (Denmark)

Grønning, Anette

2012-01-01

This article examines structure, complexity and cooperation in external chat interactions at the workplace in which one of the participants is taking part in multiple parallel conversations. The investigation is based on an analysis of nine chat interactions in a work-related context...... focus is on “turn-taking organisation as the fundamental and generic aspect of interaction organisation” (Drew & Heritage, 1992, p. 25), including the use of turn-taking rules, adjacency pairs, and the importance of pauses. Even though the employee and the union members do not know one another...... and cannot see, hear, or touch one another, it is possible to detect an informal, pleasant tone in their interactions. This challenges the basically asymmetrical relationship between employee and customer, and one can sense a further level of asymmetry. In terms of medium, chat interactions exist via various...
Efficient parallel implicit methods for rotary-wing aerodynamics calculations

Science.gov (United States)

Wissink, Andrew M.

Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited by a lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It exhibits a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good
Elastic-plastic analysis of an axi-symmetric problem by a finite element method

International Nuclear Information System (INIS)

Isozaki, Toshikuni

1984-06-01

Generally speaking, many structures are designed and fabricated as axi-symmetric structures. The finite element method is capable of solving these axi-symmetric problems beyond the elastic limit. As a first step toward solving these problems, a computer program for the elastic-plastic analysis of axi-symmetric problems was written. The program is based on the one described in Zienkiewicz's textbook for solving the elastic plane stress problem; it takes the plastic stress matrix by Yamada's method into consideration and is converted to solve the axi-symmetric problem. To verify the program, the plane strain problem of a cylindrical tube under internal pressure was solved. The computed results were compared with those shown in the ADINA user's manual and showed close agreement. (author)
Bayesian Inference Methods for Sparse Channel Estimation

DEFF Research Database (Denmark)

Pedersen, Niels Lovmand

2013-01-01

This thesis deals with sparse Bayesian learning (SBL) with application to radio channel estimation. As opposed to the classical approach for sparse signal representation, we focus on the problem of inferring complex signals. Our investigations within SBL constitute the basis for the development...... of Bayesian inference algorithms for sparse channel estimation. Sparse inference methods aim at finding the sparse representation of a signal given in some overcomplete dictionary of basis vectors. Within this context, one of our main contributions to the field of SBL is a hierarchical representation...... analysis of the complex prior representation, where we show that the ability to induce sparse estimates of a given prior heavily depends on the inference method used and, interestingly, whether real or complex variables are inferred. We also show that the Bayesian estimators derived from the proposed...
Symmetric q-Bessel functions

Directory of Open Access Journals (Sweden)

Giuseppe Dattoli

1996-05-01

Full Text Available q-analogs of Bessel functions, symmetric under the interchange of q and q^−1, are introduced. The definition is based on the generating function realized as a product of symmetric q-exponential functions with appropriate arguments. Symmetric q-Bessel functions are shown to satisfy various identities as well as second-order q-differential equations, which in the limit q → 1 reproduce those obeyed by the usual cylindrical Bessel functions. A brief discussion on the possible algebraic setting for symmetric q-Bessel functions is also provided.
Focusing optical waves with a rotationally symmetric sharp-edge aperture

Science.gov (United States)

Hu, Yanwen; Fu, Shenhe; Li, Zhen; Yin, Hao; Zhou, Jianying; Chen, Zhenqiang

2018-04-01

While various kinds of patterned structures have been proposed for wave focusing, these patterned structures usually involve complicated lithographic techniques since the element size of the patterned structures should be precisely controlled at the microscale or even nanoscale. Here we propose a new and straightforward method for focusing an optical plane wave in free space with a rotationally symmetric sharp-edge aperture. The focusing of the wave is realized by superposition of a portion of the higher-order symmetric plane waves generated from the sharp edges of the apertures, in contrast to previous focusing techniques, which usually depend on a curved phase. We demonstrate both experimentally and theoretically the focusing effect with a series of apertures having different rotational symmetry, and find that the intensity of the hotspots could be controlled by the symmetric strength of the sharp-edge apertures. The presented results would advance the conventional wisdom that light would diffract in all directions and expand when it propagates through an aperture. The proposed method is easy to implement, and might open potential applications in interferometry, imaging, and superresolution.
Population coding in sparsely connected networks of noisy neurons.

Science.gov (United States)

Tripp, Bryan P; Orchard, Jeff

2012-01-01

This study examines the relationship between population coding and spatial connection statistics in networks of noisy neurons. Encoding of sensory information in the neocortex is thought to require coordinated neural populations, because individual cortical neurons respond to a wide range of stimuli, and exhibit highly variable spiking in response to repeated stimuli. Population coding is rooted in network structure, because cortical neurons receive information only from other neurons, and because the information they encode must be decoded by other neurons, if it is to affect behavior. However, population coding theory has often ignored network structure, or assumed discrete, fully connected populations (in contrast with the sparsely connected, continuous sheet of the cortex). In this study, we modeled a sheet of cortical neurons with sparse, primarily local connections, and found that a network with this structure could encode multiple internal state variables with high signal-to-noise ratio. However, we were unable to create high-fidelity networks by instantiating connections at random according to spatial connection probabilities. In our models, high-fidelity networks required additional structure, with higher cluster factors and correlations between the inputs to nearby neurons.
Population Coding in Sparsely Connected Networks of Noisy Neurons

Directory of Open Access Journals (Sweden)

Bryan Patrick Tripp

2012-05-01

This study examines the relationship between population coding and spatial connection statistics in networks of noisy neurons. Encoding of sensory information in the neocortex is thought to require coordinated neural populations, because individual cortical neurons respond to a wide range of stimuli, and exhibit highly variable spiking in response to repeated stimuli. Population coding is rooted in network structure, because cortical neurons receive information only from other neurons, and because the information they encode must be decoded by other neurons, if it is to affect behaviour. However, population coding theory has often ignored network structure, or assumed discrete, fully-connected populations (in contrast with the sparsely connected, continuous sheet of the cortex). In this study, we model a sheet of cortical neurons with sparse, primarily local connections, and find that a network with this structure can encode multiple internal state variables with high signal-to-noise ratio. However, in our model, although connection probability varies with the distance between neurons, we find that the connections cannot be instantiated at random according to these probabilities, but must have additional structure if information is to be encoded with high fidelity.
Massively Parallel Interrogation of Aptamer Sequence, Structure and Function

Energy Technology Data Exchange (ETDEWEB)

Fischer, N O; Tok, J B; Tarasow, T M

2008-02-08

Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings. High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.
Massively parallel interrogation of aptamer sequence, structure and function.

Directory of Open Access Journals (Sweden)

Nicholas O Fischer

BACKGROUND: Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. METHODOLOGY/PRINCIPAL FINDINGS: High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. CONCLUSION AND SIGNIFICANCE: The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.
When sparse coding meets ranking: a joint framework for learning sparse codes and ranking scores

KAUST Repository

Wang, Jim Jing-Yan; Cui, Xuefeng; Yu, Ge; Guo, Lili; Gao, Xin

2017-01-01

Sparse coding, which represents a data point as a sparse reconstruction code with regard to a dictionary, has been a popular data representation method. Meanwhile, in database retrieval problems, learning the ranking scores from data points plays
BCYCLIC: A parallel block tridiagonal matrix cyclic solver

Science.gov (United States)

Hirshman, S. P.; Perumalla, K. S.; Lynch, V. E.; Sanchez, R.

2010-09-01

A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as is its ability to efficiently handle arbitrary (non-power-of-2) block row and processor numbers. A comparison with a state-of-the-art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited.
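To make the cyclic reduction idea concrete, here is a minimal sketch for the scalar tridiagonal case. It is not the BCYCLIC code: the function name `cyclic_reduction` and the array layout (`a` sub-diagonal, `b` diagonal, `c` super-diagonal, `d` right-hand side, with `a[0] == 0` and `c[-1] == 0`) are assumptions made for illustration, there is no pivoting, and nonzero diagonal entries are assumed. In a block version each scalar division would become a block solve.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system by recursive cyclic reduction.

    Row i reads: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    Works for any n (not only powers of two). No pivoting is done.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Build the reduced system for the odd-indexed unknowns by eliminating
    # their even-indexed neighbours. Each odd row is processed independently,
    # which is the source of parallelism in this scheme.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = -a[i] / b[i - 1]
        gamma = -c[i] / b[i + 1] if has_next else 0.0
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + (gamma * a[i + 1] if has_next else 0.0))
        rc.append(gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] + alpha * d[i - 1] + (gamma * d[i + 1] if has_next else 0.0))

    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back-substitute the even-indexed unknowns from their solved neighbours.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


# 5x5 system tridiag(-1, 2, -1) x = 1; n = 5 is deliberately not a power of two.
x = cyclic_reduction([0.0, -1.0, -1.0, -1.0, -1.0],
                     [2.0] * 5,
                     [-1.0, -1.0, -1.0, -1.0, 0.0],
                     [1.0] * 5)
```

Each level halves the number of unknowns, so a system with n rows needs about log2(n) reduction levels.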
Fluid/Structure Interaction Studies of Aircraft Using High Fidelity Equations on Parallel Computers

Science.gov (United States)

Guruswamy, Guru; VanDalsem, William (Technical Monitor)

1994-01-01

Aeroelasticity, which involves strong coupling of fluids, structures and controls, is an important element in designing an aircraft. Computational aeroelasticity using low fidelity methods, such as the linear aerodynamic flow equations coupled with the modal structural equations, is well advanced. Though these low fidelity approaches are computationally less intensive, they are not adequate for the analysis of modern aircraft such as High Speed Civil Transport (HSCT) and Advanced Subsonic Transport (AST), which can experience complex flow/structure interactions. HSCT can experience vortex induced aeroelastic oscillations whereas AST can experience transonic buffet associated structural oscillations. Both aircraft may experience a dip in the flutter speed in the transonic regime. For accurate aeroelastic computations in these complex fluid/structure interaction situations, high fidelity equations such as the Navier-Stokes equations for fluids and finite-element equations for structures are needed. Computations using these high fidelity equations require large computational resources both in memory and speed. Current conventional supercomputers have reached their limitations both in memory and speed. As a result, parallel computers have evolved to overcome the limitations of conventional computers. This paper will address the transition that is taking place in computational aeroelasticity from conventional computers to parallel computers. The paper will address special techniques needed to take advantage of the architecture of new parallel computers. Results will be illustrated from computations made on iPSC/860 and IBM SP2 computers by using the ENSAERO code, which directly couples the Euler/Navier-Stokes flow equations with high resolution finite-element structural equations.
Low-rank sparse learning for robust visual tracking

KAUST Repository

Zhang, Tianzhu

2012-01-01

In this paper, we propose a new particle-filter based tracking algorithm that exploits the relationship between particles (candidate targets). By representing particles as sparse linear combinations of dictionary templates, this algorithm capitalizes on the inherent low-rank structure of particle representations that are learned jointly. As such, it casts the tracking problem as a low-rank matrix learning problem. This low-rank sparse tracker (LRST) has a number of attractive properties. (1) Since LRST adaptively updates dictionary templates, it can handle significant changes in appearance due to variations in illumination, pose, scale, etc. (2) The linear representation in LRST explicitly incorporates background templates in the dictionary and a sparse error term, which enables LRST to address the tracking drift problem and to be robust against occlusion respectively. (3) LRST is computationally attractive, since the low-rank learning problem can be efficiently solved as a sequence of closed form update operations, which yield a time complexity that is linear in the number of particles and the template size. We evaluate the performance of LRST by applying it to a set of challenging video sequences and comparing it to 6 popular tracking methods. Our experiments show that by representing particles jointly, LRST not only outperforms the state-of-the-art in tracking accuracy but also significantly improves the time complexity of methods that use a similar sparse linear representation model for particles [1]. © 2012 Springer-Verlag.
Deformable segmentation via sparse representation and dictionary learning.

Science.gov (United States)

Zhang, Shaoting; Zhan, Yiqiang; Metaxas, Dimitris N

2012-10-01

"Shape" and "appearance", the two pillars of a deformable model, complement each other in object segmentation. In many medical imaging applications, while the low-level appearance information is weak or misleading, shape priors play a more important role in guiding a correct segmentation, thanks to the strong shape characteristics of biological structures. Recently a novel shape prior modeling method has been proposed based on sparse learning theory. Instead of learning a generative shape model, shape priors are incorporated on-the-fly through the sparse shape composition (SSC). SSC is robust to non-Gaussian errors and still preserves individual shape characteristics even when such characteristics are not statistically significant. Although it seems straightforward to incorporate SSC into a deformable segmentation framework as shape priors, the large-scale sparse optimization of SSC has low runtime efficiency, which cannot satisfy clinical requirements. In this paper, we design two strategies to decrease the computational complexity of SSC, making a robust, accurate and efficient deformable segmentation system. (1) When the shape repository contains a large number of instances, which is often the case in 2D problems, K-SVD is used to learn a more compact but still informative shape dictionary. (2) If the derived shape instance has a large number of vertices, which often occurs in 3D problems, an affinity propagation method is used to partition the surface into small sub-regions, on which the sparse shape composition is performed locally. Both strategies dramatically decrease the scale of the sparse optimization problem and hence speed up the algorithm. Our method is applied to a diverse set of biomedical image analysis problems. Compared to the original SSC, these two newly-proposed modules not only significantly reduce the computational complexity, but also improve the overall accuracy. Copyright © 2012 Elsevier B.V. All rights reserved.
Anti-symmetrized molecular dynamics: a new insight into the structure of nuclei; La dynamique moleculaire antisymetrisee, une nouvelle facon de comprendre la structure des noyaux

Energy Technology Data Exchange (ETDEWEB)

Yoshiko, Kanada-En' yo [High Energy Accelerator Research Organization - KEK, Institute of Particle and Nuclear Studies, Ibaraki (Japan); Masaaki, Kimura [Institute of Physical and Chemical Research - RIKEN, Saitama (Japan); Hisashi, Horiuchi [Kyoto Univ., Dept. of Physics, Graduate School of Science (Japan)

2003-06-01

The AMD (anti-symmetrized molecular dynamics) theory for nuclear structure is explained by showing its actual applications. First the formulation of AMD including various refined versions is briefly presented and its characteristics are discussed, putting a stress on its nature as an 'ab initio' theory. Then we demonstrate fruitful applications to various structure problems in stable nuclei, in order to explicitly verify the 'ab initio' nature of AMD, especially the ability to describe both mean-field-type structure and cluster structure. Finally, we show the results of applications of AMD to unstable nuclei, from which we see that AMD is powerful in elucidating and understanding various types of nuclear structure of unstable nuclei. (authors)
Optimal analysis of structures by concepts of symmetry and regularity

CERN Document Server

Kaveh, Ali

2013-01-01

Optimal analysis is defined as an analysis that creates and uses sparse, well-structured and well-conditioned matrices. The focus is on efficient methods for eigensolution of matrices involved in static, dynamic and stability analyses of symmetric and regular structures, or those general structures containing such components. Powerful tools are also developed for configuration processing, which is an important issue in the analysis and design of space structures and finite element models. Different mathematical concepts are combined to make the optimal analysis of structures feasible. Canonical forms from matrix algebra, product graphs from graph theory and symmetry groups from group theory are some of the concepts involved in the variety of efficient methods and algorithms presented. The algorithms elucidated in this book enable analysts to handle large-scale structural systems by lowering their computational cost, thus fulfilling the requirement for faster analysis and design of future complex systems. The ...
Sparse Image Reconstruction in Computed Tomography

DEFF Research Database (Denmark)

Jørgensen, Jakob Sauer

In recent years, increased focus on the potentially harmful effects of x-ray computed tomography (CT) scans, such as radiation-induced cancer, has motivated research on new low-dose imaging techniques. Sparse image reconstruction methods, as studied for instance in the field of compressed sensing...... applications. This thesis takes a systematic approach toward establishing quantitative understanding of conditions for sparse reconstruction to work well in CT. A general framework for analyzing sparse reconstruction methods in CT is introduced and two sets of computational tools are proposed: 1...... contributions to a general set of computational characterization tools. Thus, the thesis contributions help advance sparse reconstruction methods toward routine use in...
Investigating the degradation behavior under hot carrier stress for InGaZnO TFTs with symmetric and asymmetric structures

International Nuclear Information System (INIS)

Tsai, Ming-Yen; Chang, Ting-Chang; Chu, Ann-Kuo; Chen, Te-Chih; Hsieh, Tien-Yu; Chen, Yu-Te; Tsai, Wu-Wei; Chiang, Wen-Jen; Yan, Jing-Yi

2013-01-01

This letter studies the hot-carrier effect in indium–gallium–zinc oxide (IGZO) thin film transistors with symmetric and asymmetric source/drain structures. The different degradation behaviors after hot-carrier stress in symmetric and asymmetric source/drain devices indicate that different mechanisms dominate the degradation. Since the C–V measurement is highly sensitive to trap states compared to the I–V characterization, C–V curves are utilized to analyze the hot-carrier stress-induced trap state generation. Furthermore, the asymmetric C–V measurements C_GD (gate-to-drain capacitance) and C_GS (gate-to-source capacitance) are used to analyze the location of the trap states in the channel. The asymmetric source/drain structure under hot-carrier stress induces an asymmetric electrical field and causes different degradation behaviors. In this work, the on-current and subthreshold swing (S.S.) degrade under low electrical field, whereas an apparent V_t shift occurs under large electrical field. The different degradation behaviors indicate that trap states are generated under a low electrical field and the channel-hot-electron (CHE) effect occurs under a large electrical field. - Highlights: ► Asymmetric structure thin film transistors improve kick-back effect. ► Asymmetric structures under hot-carrier stress induce different degradation. ► Hot-carrier stress leads to capacitance–voltage curve distortion. ► Extra trap states are generated during hot-carrier stress

Sparse Regression by Projection and Sparse Discriminant Analysis

KAUST Repository

Qi, Xin; Luo, Ruiyan; Carroll, Raymond J.; Zhao, Hongyu

2015-01-01

predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths
Integrability and symmetric spaces. II- The coset spaces

International Nuclear Information System (INIS)

Ferreira, L.A.

1987-01-01

It is shown that a sufficient condition for a model describing the motion of a particle on a coset space to possess a fundamental Poisson bracket relation, and consequently charges in involution, is that it must be a symmetric space. The conditions that a Hamiltonian, or any function of the canonical variables, has to satisfy in order to commute with these charges are studied. It is shown that, for the case of a non-compact symmetric space, these conditions lead to an algebraic structure which plays an important role in the construction of conserved quantities. (author)
Evaluation of generalized degrees of freedom for sparse estimation by replica method

Science.gov (United States)

Sakata, A.

2016-12-01

We develop a method to evaluate the generalized degrees of freedom (GDF) for linear regression with sparse regularization. The GDF is a key factor in model selection, and thus its evaluation is useful in many modelling applications. An analytical expression for the GDF is derived using the replica method in the large-system-size limit with random Gaussian predictors. The resulting formula has a universal form that is independent of the type of regularization, providing us with a simple interpretation. Within the framework of replica symmetric (RS) analysis, GDF has a physical meaning as the effective fraction of non-zero components. The validity of our method in the RS phase is supported by the consistency of our results with previous mathematical results. The analytical results in the RS phase are calculated numerically using the belief propagation algorithm.
A symmetrical subtraction combined with interpolated values for eliminating scattering from fluorescence EEM data

Science.gov (United States)

Xu, Jing; Liu, Xiaofei; Wang, Yutian

2016-08-01

Parallel factor analysis is a widely used method for extracting qualitative and quantitative information about the analyte of interest from a fluorescence excitation-emission matrix (EEM) containing unknown components. Large-amplitude scattering influences the results of parallel factor analysis. Many methods of eliminating scattering have been proposed, each with its advantages and disadvantages. The combination of symmetrical subtraction and interpolated values is discussed here. The combination refers to both the combination of results and the combination of methods. Nine methods were used for comparison. The results show that the combination of results gives a better concentration prediction for all the components.
Inference of gene regulatory networks with sparse structural equation models exploiting genetic perturbations.

Directory of Open Access Journals (Sweden)

Xiaodong Cai

Integrating genetic perturbations with gene expression data not only improves accuracy of regulatory network topology inference, but also enables learning of causal regulatory relations between genes. Although a number of methods have been developed to integrate both types of data, efficient and powerful algorithms are still needed. In this paper, sparse structural equation models (SEMs) are employed to integrate both gene expression data and cis-expression quantitative trait loci (cis-eQTL), for modeling gene regulatory networks in accordance with biological evidence about genes regulating or being regulated by a small number of genes. A systematic inference method named sparsity-aware maximum likelihood (SML) is developed for SEM estimation. Using simulated directed acyclic or cyclic networks, the SML performance is compared with that of two state-of-the-art algorithms: the adaptive Lasso (AL) based scheme, and the QTL-directed dependency graph (QDG) method. Computer simulations demonstrate that the novel SML algorithm offers significantly better performance than the AL-based and QDG algorithms across all sample sizes from 100 to 1,000, in terms of detection power and false discovery rate, in all the cases tested, which include acyclic or cyclic networks of 10, 30 and 300 genes. The SML method is further applied to infer a network of 39 human genes that are related to the immune function and are chosen to have a reliable eQTL per gene. The resulting network consists of 9 genes and 13 edges. Most of the edges represent interactions reasonably expected from experimental evidence, while the remaining edges may indicate the emergence of new interactions. The sparse SEM and efficient SML algorithm provide an effective means of exploiting both gene expression and perturbation data to infer gene regulatory networks. An open-source computer program implementing the SML algorithm is freely available upon request.
Solution of finite element problems using hybrid parallelization with MPI and OpenMP

Directory of Open Access Journals (Sweden)

José Miguel Vargas-Félix

2012-11-01

The Finite Element Method (FEM) is used to solve problems like solid deformation and heat diffusion in domains with complex geometries. Such geometries require discretization with millions of elements; this is equivalent to solving systems of equations with sparse matrices and tens or hundreds of millions of variables. The aim is to use computer clusters to solve these systems. The solution method used is Schur substructuring. With it, a large system of equations can be divided into many small ones that are solved more efficiently. This method allows parallelization. MPI (Message Passing Interface) is used to distribute the systems of equations so that each one is solved on a computer of a cluster. Each system of equations is solved using a solver implemented to use OpenMP as a local parallelization method.
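The substructuring idea in this abstract can be illustrated with a small serial sketch. It is an assumption-laden toy, not the authors' solver: the names `solve` and `schur_solve`, the dense list-of-lists storage, and the block layout (`A_II`, `A_IB`, `A_BI`, `f_I` per subdomain, `A_BB`, `f_B` for the interface) are all hypothetical. The loop over subdomains marks the work that a distributed code would hand to separate processes.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def schur_solve(subdomains, A_BB, f_B):
    """Solve a system split into independent interiors plus a shared interface."""
    n_b = len(f_B)
    S = [list(row) for row in A_BB]   # becomes the Schur complement
    g = list(f_B)                     # becomes the condensed interface right-hand side
    local = []
    for A_II, A_IB, A_BI, f_I in subdomains:
        # Local solves; independent per subdomain, hence parallelizable.
        y = solve(A_II, f_I)                                   # A_II^{-1} f_I
        Z = [solve(A_II, [row[j] for row in A_IB]) for j in range(n_b)]
        for i in range(n_b):
            g[i] -= sum(A_BI[i][k] * y[k] for k in range(len(y)))
            for j in range(n_b):
                S[i][j] -= sum(A_BI[i][k] * Z[j][k] for k in range(len(y)))
        local.append((y, Z))
    x_B = solve(S, g)                 # small interface problem
    x_I = [[y[k] - sum(Z[j][k] * x_B[j] for j in range(n_b)) for k in range(len(y))]
           for y, Z in local]         # recover interior unknowns
    return x_I, x_B


# 1D Laplacian tridiag(-1, 2, -1) with 5 unknowns and right-hand side 1:
# unknowns 0-1 and 3-4 are the two interiors, unknown 2 is the interface.
K = [[2.0, -1.0], [-1.0, 2.0]]
left = (K, [[0.0], [-1.0]], [[0.0, -1.0]], [1.0, 1.0])
right = (K, [[-1.0], [0.0]], [[-1.0, 0.0]], [1.0, 1.0])
x_I, x_B = schur_solve([left, right], [[2.0]], [1.0])
```

Only the small interface system couples the subdomains, which is why the interior solves can run on separate machines.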
Robust visual tracking via multiscale deep sparse networks

Science.gov (United States)

Wang, Xin; Hou, Zhiqiang; Yu, Wangsheng; Xue, Yang; Jin, Zefenfen; Dai, Bo

2017-04-01

In visual tracking, deep learning with offline pretraining can extract more intrinsic and robust features. It has significant success solving the tracking drift in a complicated environment. However, offline pretraining requires numerous auxiliary training datasets and is considerably time-consuming for tracking tasks. To solve these problems, a multiscale sparse networks-based tracker (MSNT) under the particle filter framework is proposed. Based on the stacked sparse autoencoders and rectifier linear unit, the tracker has a flexible and adjustable architecture without the offline pretraining process and exploits the robust and powerful features effectively only through online training of limited labeled data. Meanwhile, the tracker builds four deep sparse networks of different scales, according to the target's profile type. During tracking, the tracker selects the matched tracking network adaptively in accordance with the initial target's profile type. It preserves the inherent structural information more efficiently than the single-scale networks. Additionally, a corresponding update strategy is proposed to improve the robustness of the tracker. Extensive experimental results on a large scale benchmark dataset show that the proposed method performs favorably against state-of-the-art methods in challenging environments.
Efficient MATLAB computations with sparse and factored tensors.

Energy Technology Data Exchange (ETDEWEB)

Bader, Brett William; Kolda, Tamara Gibson (Sandia National Lab, Livermore, CA)

2006-12-01

In this paper, the term tensor refers simply to a multidimensional or N-way array, and we consider how specially structured tensors allow for efficient storage and computation. First, we study sparse tensors, which have the property that the vast majority of the elements are zero. We propose storing sparse tensors using coordinate format and describe the computational efficiency of this scheme for various mathematical operations, including those typical to tensor decomposition algorithms. Second, we study factored tensors, which have the property that they can be assembled from more basic components. We consider two specific types: a Tucker tensor can be expressed as the product of a core tensor (which itself may be dense, sparse, or factored) and a matrix along each mode, and a Kruskal tensor can be expressed as the sum of rank-1 tensors. We are interested in the case where the storage of the components is less than the storage of the full tensor, and we demonstrate that many elementary operations can be computed using only the components. All of the efficiencies described in this paper are implemented in the Tensor Toolbox for MATLAB.
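The two storage ideas in this abstract, coordinate format for sparse tensors and component-only storage for Kruskal tensors, can be sketched in a few lines. This is plain Python for illustration and does not use or reproduce the Tensor Toolbox API; the function names `ttv` and `kruskal_entry` and the data layout are assumptions.

```python
from collections import defaultdict


def ttv(subs, vals, vec, mode):
    """Contract a coordinate-format sparse tensor with a vector along `mode`.

    `subs` is a list of index tuples and `vals` the matching nonzero values.
    The cost is proportional to the number of stored nonzeros.
    """
    out = defaultdict(float)
    for idx, v in zip(subs, vals):
        reduced = idx[:mode] + idx[mode + 1:]
        out[reduced] += v * vec[idx[mode]]
    return dict(out)


def kruskal_entry(weights, factors, index):
    """Evaluate one entry of a Kruskal tensor from its components only.

    The tensor is a weighted sum of rank-1 terms; `factors[m][i][r]` is the
    (i, r) entry of the factor matrix for mode m.
    """
    total = 0.0
    for r, w in enumerate(weights):
        term = w
        for m, i in enumerate(index):
            term *= factors[m][i][r]
        total += term
    return total


# A 3 x 3 x 2 sparse tensor with four nonzeros.
subs = [(0, 0, 0), (1, 2, 0), (1, 2, 1), (2, 1, 1)]
vals = [1.0, 2.0, 3.0, 4.0]
result = ttv(subs, vals, [10.0, 100.0], mode=2)

# A 2 x 2 x 1 Kruskal tensor with two rank-1 components.
factors = [
    [[1.0, 0.0], [0.0, 1.0]],   # mode-0 factor matrix
    [[1.0, 1.0], [2.0, 3.0]],   # mode-1 factor matrix
    [[1.0, 2.0]],               # mode-2 factor matrix
]
entry = kruskal_entry([1.0, 2.0], factors, (1, 1, 0))
```

Neither function ever forms the dense array, which is the storage saving the abstract describes.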
Sparse Linear Identifiable Multivariate Modeling

DEFF Research Database (Denmark)

Henao, Ricardo; Winther, Ole

2011-01-01

and bench-marked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs equally well or better than LiNGAM with comparable......In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully...
A Sparse Bayesian Imaging Technique for Efficient Recovery of Reservoir Channels With Time-Lapse Seismic Measurements

KAUST Repository

Sana, Furrukh

2016-06-01

Subsurface reservoir flow channels are characterized by high-permeability values and serve as preferred pathways for fluid propagation. Accurate estimation of their geophysical structures is thus of great importance for the oil industry. The ensemble Kalman filter (EnKF) is a widely used statistical technique for estimating subsurface reservoir model parameters. However, accurate reconstruction of the subsurface geological features with the EnKF is challenging because of the limited measurements available from the wells and the smoothing effects imposed by the $\ell_2$-norm nature of its update step. A new EnKF scheme based on sparse domain representation was introduced by Sana et al. (2015) to incorporate useful prior structural information in the estimation process for efficient recovery of subsurface channels. In this paper, we extend this work in two ways: 1) investigate the effects of incorporating time-lapse seismic data on the channel reconstruction; and 2) explore a Bayesian sparse reconstruction algorithm with the potential ability to reduce the computational requirements. Numerical results suggest that the performance of the new sparse Bayesian based EnKF scheme is enhanced with the availability of seismic measurements, leading to further improvement in the recovery of flow channel structures. The sparse Bayesian approach further provides a computationally efficient framework for enforcing a sparse solution, especially with the possibility of using high sparsity rates through the inclusion of seismic data.
A Sparse Bayesian Imaging Technique for Efficient Recovery of Reservoir Channels With Time-Lapse Seismic Measurements

KAUST Repository

Sana, Furrukh; Ravanelli, Fabio; Al-Naffouri, Tareq Y.; Hoteit, Ibrahim

2016-01-01

Subsurface reservoir flow channels are characterized by high-permeability values and serve as preferred pathways for fluid propagation. Accurate estimation of their geophysical structures is thus of great importance for the oil industry. The ensemble Kalman filter (EnKF) is a widely used statistical technique for estimating subsurface reservoir model parameters. However, accurate reconstruction of the subsurface geological features with the EnKF is challenging because of the limited measurements available from the wells and the smoothing effects imposed by the $\ell_2$-norm nature of its update step. A new EnKF scheme based on sparse domain representation was introduced by Sana et al. (2015) to incorporate useful prior structural information in the estimation process for efficient recovery of subsurface channels. In this paper, we extend this work in two ways: 1) investigate the effects of incorporating time-lapse seismic data on the channel reconstruction; and 2) explore a Bayesian sparse reconstruction algorithm with the potential ability to reduce the computational requirements. Numerical results suggest that the performance of the new sparse Bayesian based EnKF scheme is enhanced with the availability of seismic measurements, leading to further improvement in the recovery of flow channel structures. The sparse Bayesian approach further provides a computationally efficient framework for enforcing a sparse solution, especially with the possibility of using high sparsity rates through the inclusion of seismic data.
Sparse decompositions in 'incoherent' dictionaries

DEFF Research Database (Denmark)

Gribonval, R.; Nielsen, Morten

2003-01-01

a unique sparse representation in such a dictionary. In particular, it is proved that the result of Donoho and Huo, concerning the replacement of a combinatorial optimization problem with a linear programming problem when searching for sparse representations, has an analog for dictionaries that may...
Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms

Energy Technology Data Exchange (ETDEWEB)

Williams, Samuel; Oliker, Leonid; Vuduc, Richard; Shalf, John; Yelick, Katherine; Demmel, James

2008-10-16

We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one of the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
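For readers unfamiliar with the SpMV kernel named in the record above, the following is a minimal serial sketch of y = A x with A in compressed sparse row (CSR) form. It is not the authors' optimized code; the function name csr_spmv and the array names indptr, indices and data are assumptions chosen for illustration.

```python
def csr_spmv(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR form.

    indptr[r]..indptr[r+1] delimits the nonzeros of row r,
    indices[k] is the column of nonzero k, data[k] its value.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Each row is independent, so rows could be split across threads.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# Example matrix: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [4.0, 1.0, 2.0, 3.0, 5.0]
y = csr_spmv(indptr, indices, data, [1.0, 2.0, 3.0])
print(y)
```

The irregular reads x[indices[k]] are what make the kernel memory-bound, which is the behaviour the abstract refers to.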
Optimal Couple Projections for Domain Adaptive Sparse Representation-based Classification.

Science.gov (United States)

Zhang, Guoqing; Sun, Huaijiang; Porikli, Fatih; Liu, Yazhou; Sun, Quansen

2017-08-29

In recent years, sparse representation based classification (SRC) has become one of the most successful methods and has shown impressive performance in various classification tasks. However, when the training data has a different distribution than the testing data, the learned sparse representation may not be optimal, and the performance of SRC will be degraded significantly. To address this problem, in this paper, we propose the optimal couple projections for domain-adaptive sparse representation-based classification (OCPD-SRC) method, in which the discriminative features of data in the two domains are simultaneously learned with the dictionary that can succinctly represent the training and testing data in the projected space. OCPD-SRC is designed based on the decision rule of SRC, with the objective to learn coupled projection matrices and a common discriminative dictionary such that the between-class sparse reconstruction residuals of data from both domains are maximized, and the within-class sparse reconstruction residuals of data are minimized in the projected low-dimensional space. Thus, the resulting representations fit SRC well and simultaneously have a better discriminant ability. In addition, our method can be easily extended to multiple domains and can be kernelized to deal with the nonlinear structure of data. The optimal solution for the proposed method can be efficiently obtained by alternating optimization. Extensive experimental results on a series of benchmark databases show that our method is better than or comparable to many state-of-the-art methods.
3-dimensional magnetotelluric inversion including topography using deformed hexahedral edge finite elements and direct solvers parallelized on symmetric multiprocessor computers - Part II: direct data-space inverse solution

Science.gov (United States)

Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.

2016-01-01

Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ~1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.
Data analysis in high-dimensional sparse spaces

DEFF Research Database (Denmark)

Clemmensen, Line Katrine Harder

classification techniques for high-dimensional problems are presented: Sparse discriminant analysis, sparse mixture discriminant analysis and orthogonality constrained support vector machines. The first two introduce sparseness to the well known linear and mixture discriminant analysis and thereby provide low...... are applied to classifications of fish species, ear canal impressions used in the hearing aid industry, microbiological fungi species, and various cancerous tissues and healthy tissues. In addition, novel applications of sparse regressions (also called the elastic net) to the medical, concrete, and food...
Two-phase flow stability structure in a natural circulation system

Energy Technology Data Exchange (ETDEWEB)

Zhou, Zhiwei [Nuclear Engineering Laboratory Zurich (Switzerland)]

1995-09-01

The present study reports a numerical analysis of two-phase flow stability structures in a natural circulation system with two parallel, heated channels. The numerical model is derived, based on the Galerkin moving nodal method. This analysis is related to some design options applicable to integral heating reactors with a slightly-boiling operation mode, and is also of general interest to similar facilities. The options include: (1) Symmetric heating and throttling; (2) Asymmetric heating and symmetric throttling; (3) Asymmetric heating and throttling. The oscillation modes for these variants are discussed. Comparisons with the data from the INET two-phase flow stability experiment have qualitatively validated the present analysis.
CT Image Sequence Restoration Based on Sparse and Low-Rank Decomposition

Science.gov (United States)

Gou, Shuiping; Wang, Yueyue; Wang, Zhilong; Peng, Yong; Zhang, Xiaopeng; Jiao, Licheng; Wu, Jianshe

2013-01-01

Blurry organ boundaries and soft tissue structures present a major challenge in biomedical image restoration. In this paper, we propose a low-rank decomposition-based method for computed tomography (CT) image sequence restoration, where the CT image sequence is decomposed into a sparse component and a low-rank component. A new point spread function (PSF) for the Wiener filter is employed to efficiently remove blur in the sparse component; Wiener filtering with the Gaussian PSF is used to recover the average image of the low-rank component. We then obtain the recovered CT image sequence by combining the recovered low-rank image with the recovered sparse image sequence. Our method achieves restoration results with higher contrast, sharper organ boundaries and richer soft tissue structure information, compared with existing CT image restoration methods. The robustness of our method was assessed with numerical experiments using three different low-rank models: Robust Principal Component Analysis (RPCA), Linearized Alternating Direction Method with Adaptive Penalty (LADMAP) and Go Decomposition (GoDec). Experimental results demonstrated that the RPCA model was the most suitable for CT images with low noise, whereas the GoDec model was the best for CT images with heavy noise. PMID:24023764
Parallel Breadth-First Search on Distributed Memory Systems

Energy Technology Data Exchange (ETDEWEB)

Computational Research Division; Buluc, Aydin; Madduri, Kamesh

2011-04-15

Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix-partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex-based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
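The level-synchronous strategy mentioned in the record above can be illustrated with a small serial sketch. It shows only the frontier-by-frontier structure, not the distributed partitioning or multithreading the authors describe; the name bfs_levels and the adjacency-dictionary input are assumptions made for this example.

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS: return {vertex: distance from source}.

    adj maps each vertex to a list of its neighbours.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # All vertices of one frontier are expanded before the next level
        # starts; in a parallel code this loop is the distributed step.
        for u in frontier:
            for v in adj[u]:
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
levels = bfs_levels(adj, 0)
print(levels)
```

Vertex 5 is unreachable from the source and therefore never receives a level. In matrix terms, expanding one frontier corresponds to multiplying the adjacency matrix by a sparse frontier vector, which is where the sparse-matrix partitioning in the abstract enters.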
The immunity-related GTPase Irga6 dimerizes in a parallel head-to-head fashion.

Science.gov (United States)

Schulte, Kathrin; Pawlowski, Nikolaus; Faelber, Katja; Fröhlich, Chris; Howard, Jonathan; Daumke, Oliver

2016-03-02

The immunity-related GTPases (IRGs) constitute a powerful cell-autonomous resistance system against several intracellular pathogens. Irga6 is a dynamin-like protein that oligomerizes at the parasitophorous vacuolar membrane (PVM) of Toxoplasma gondii leading to its vesiculation. Based on a previous biochemical analysis, it has been proposed that the GTPase domains of Irga6 dimerize in an antiparallel fashion during oligomerization. We determined the crystal structure of an oligomerization-impaired Irga6 mutant bound to a non-hydrolyzable GTP analog. Contrary to the previous model, the structure shows that the GTPase domains dimerize in a parallel fashion. The nucleotides in the center of the interface participate in dimerization by forming symmetric contacts with each other and with the switch I region of the opposing Irga6 molecule. The latter contact appears to activate GTP hydrolysis by stabilizing the position of the catalytic glutamate 106 in switch I close to the active site. Further dimerization contacts involve switch II, the G4 helix and the trans stabilizing loop. The Irga6 structure features a parallel GTPase domain dimer, which appears to be a unifying feature of all dynamin and septin superfamily members. This study contributes important insights into the assembly and catalytic mechanisms of IRG proteins as prerequisite to understand their anti-microbial action.

Scalability of Parallel Scientific Applications on the Cloud

Directory of Open Access Journals (Sweden)

Satish Narayana Srirama

2011-01-01

Cloud computing, with its promise of virtually infinite resources, seems well suited to solving resource-greedy scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications, such as matrix–vector operations and the NAS parallel benchmarks, and DOUG (Domain decomposition On Unstructured Grids) on the cloud. DOUG is an open source software package for parallel iterative solution of very large sparse systems of linear equations. The detailed analysis of DOUG on the cloud showed that parallel applications benefit a lot and scale reasonably on the cloud. We could also observe the limitations of the cloud and compare it with a cluster in terms of performance. However, for efficiently running the scientific applications on the cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit the cloud resources, like the MapReduce framework. Several iterative and embarrassingly parallel algorithms are reduced to the MapReduce model and their performance is measured and analyzed. The analysis showed that Hadoop MapReduce has significant problems with iterative methods, while it is well suited to embarrassingly parallel algorithms. Scientific computing often uses iterative methods to solve large problems. Thus, for scientific computing on the cloud, this paper highlights the need for better frameworks or optimizations for MapReduce.
Sparse-View Ultrasound Diffraction Tomography Using Compressed Sensing with Nonuniform FFT

Directory of Open Access Journals (Sweden)

Shaoyan Hua

2014-01-01

Accurate reconstruction of the object from sparse-view sampling data is an appealing issue for ultrasound diffraction tomography (UDT). In this paper, we present a reconstruction method based on the compressed sensing framework for sparse-view UDT. Due to the piecewise uniform characteristics of anatomical structures, the total variation is introduced into the cost function to find a more faithful sparse representation of the object. The inverse problem of UDT is solved iteratively by the conjugate gradient method with the nonuniform fast Fourier transform. Simulation results show the effectiveness of the proposed method: the main characteristics of the object can be properly presented with only 16 views. Compared to interpolation and multiband methods, the proposed method can provide higher resolution and fewer artifacts with the same number of views. The robustness to noise and the computational complexity are also discussed.
Supervised Transfer Sparse Coding

KAUST Repository

Al-Shedivat, Maruan

2014-07-27

A combination of the sparse coding and transfer learning techniques was shown to be accurate and robust in classification tasks where training and testing objects have a shared feature space but are sampled from different underlying distributions, i.e., belong to different domains. The key assumption in such cases is that, in spite of the domain disparity, samples from different domains share some common hidden factors. Previous methods often assumed that all the objects in the target domain are unlabeled, and thus the training set solely comprised objects from the source domain. However, in real world applications, the target domain often has some labeled objects, or one can always manually label a small number of them. In this paper, we explore this possibility and show how a small number of labeled data in the target domain can significantly improve the classification accuracy of the state-of-the-art transfer sparse coding methods. We further propose a unified framework named supervised transfer sparse coding (STSC) which simultaneously optimizes sparse representation, domain transfer and classification. Experimental results on three applications demonstrate that a little manual labeling and then learning the model in a supervised fashion can significantly improve classification accuracy.
Scalable parallel prefix solvers for discrete ordinates transport

International Nuclear Information System (INIS)

Pautz, S.; Pandya, T.; Adams, M.

2009-01-01

The well-known 'sweep' algorithm for inverting the streaming-plus-collision term in first-order deterministic radiation transport calculations has some desirable numerical properties. However, it suffers from parallel scaling issues caused by a lack of concurrency. The maximum degree of concurrency, and thus the maximum parallelism, grows more slowly than the problem size for sweeps-based solvers. We investigate a new class of parallel algorithms that involves recasting the streaming-plus-collision problem in prefix form and solving via cyclic reduction. This method, although computationally more expensive at low levels of parallelism than the sweep algorithm, offers better theoretical scalability properties. Previous work has demonstrated this approach for one-dimensional calculations; we show how to extend it to multidimensional calculations. Notably, for multiple dimensions it appears that this approach is limited to long-characteristics discretizations; other discretizations cannot be cast in prefix form. We implement two variants of the algorithm within the radlib/SCEPTRE transport code library at Sandia National Laboratories and show results on two different massively parallel systems. Both the 'forward' and 'symmetric' solvers behave similarly, scaling well to larger degrees of parallelism than sweeps-based solvers. We do observe some issues at the highest levels of parallelism (relative to the system size) and discuss possible causes. We conclude that this approach shows good potential for future parallel systems, but the parallel scalability will depend heavily on the architecture of the communication networks of these systems. (authors)
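As an illustration of what "prefix form" means in the record above, the sketch below solves a first-order recurrence x_i = a_i x_{i-1} + b_i (a generic stand-in for a sweep along one characteristic) with an inclusive scan over affine maps. It uses a Hillis-Steele scan rather than the cyclic reduction the authors use, and the names compose, prefix_scan and solve_recurrence are assumptions for this example only.

```python
def compose(f, g):
    """Compose affine maps stored as (a, b), meaning x -> a*x + b.

    The result applies f first and g second.
    """
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)


def prefix_scan(maps):
    """Inclusive Hillis-Steele scan over an associative operation.

    Every pass updates all positions independently, so a pass could run
    in parallel; there are about log2(n) passes instead of n serial steps.
    """
    out = list(maps)
    step = 1
    while step < len(out):
        prev = out
        out = [
            prev[i] if i < step else compose(prev[i - step], prev[i])
            for i in range(len(prev))
        ]
        step *= 2
    return out


def solve_recurrence(a, b, x0):
    """Return [x_1, ..., x_n] for x_i = a[i-1] * x_{i-1} + b[i-1]."""
    scanned = prefix_scan(list(zip(a, b)))
    return [m[0] * x0 + m[1] for m in scanned]


x = solve_recurrence([0.5, 0.5, 0.5, 0.5], [1.0, 1.0, 1.0, 1.0], 0.0)
print(x)
```

The scan performs more total work than the serial loop, which mirrors the abstract's remark that the prefix approach costs more at low levels of parallelism but exposes more concurrency.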
The Importance of Structure in Incomplete Factorization Preconditioners

Czech Academy of Sciences Publication Activity Database

Scott, J.; Tůma, Miroslav

2011-01-01

Vol. 51, No. 2 (2011), pp. 385-404. ISSN 0006-3835. Grant (others): GA AV ČR (CZ) M100300902. Institutional research plan: CEZ:AV0Z10300504. Keywords: sparse symmetric linear systems; incomplete factorizations; preconditioners; level-based approach. Subject RIV: BA - General Mathematics. Impact factor: 0.724 (year: 2011).
SparseLeap: Efficient Empty Space Skipping for Large-Scale Volume Rendering

KAUST Repository

Hadwiger, Markus; Al-Awami, Ali K.; Beyer, Johanna; Agus, Marco; Pfister, Hanspeter

2017-01-01

Recent advances in data acquisition produce volume data of very high resolution and large size, such as terabyte-sized microscopy volumes. These data often contain many fine and intricate structures, which pose huge challenges for volume rendering, and make it particularly important to efficiently skip empty space. This paper addresses two major challenges: (1) The complexity of large volumes containing fine structures often leads to highly fragmented space subdivisions that make empty regions hard to skip efficiently. (2) The classification of space into empty and non-empty regions changes frequently, because the user or the evaluation of an interactive query activate a different set of objects, which makes it unfeasible to pre-compute a well-adapted space subdivision. We describe the novel SparseLeap method for efficient empty space skipping in very large volumes, even around fine structures. The main performance characteristic of SparseLeap is that it moves the major cost of empty space skipping out of the ray-casting stage. We achieve this via a hybrid strategy that balances the computational load between determining empty ray segments in a rasterization (object-order) stage, and sampling non-empty volume data in the ray-casting (image-order) stage. Before ray-casting, we exploit the fast hardware rasterization of GPUs to create a ray segment list for each pixel, which identifies non-empty regions along the ray. The ray-casting stage then leaps over empty space without hierarchy traversal. Ray segment lists are created by rasterizing a set of fine-grained, view-independent bounding boxes. Frame coherence is exploited by re-using the same bounding boxes unless the set of active objects changes. We show that SparseLeap scales better to large, sparse data than standard octree empty space skipping.
SparseLeap: Efficient Empty Space Skipping for Large-Scale Volume Rendering

KAUST Repository

Hadwiger, Markus

2017-08-28

Recent advances in data acquisition produce volume data of very high resolution and large size, such as terabyte-sized microscopy volumes. These data often contain many fine and intricate structures, which pose huge challenges for volume rendering, and make it particularly important to efficiently skip empty space. This paper addresses two major challenges: (1) The complexity of large volumes containing fine structures often leads to highly fragmented space subdivisions that make empty regions hard to skip efficiently. (2) The classification of space into empty and non-empty regions changes frequently, because the user or the evaluation of an interactive query activate a different set of objects, which makes it unfeasible to pre-compute a well-adapted space subdivision. We describe the novel SparseLeap method for efficient empty space skipping in very large volumes, even around fine structures. The main performance characteristic of SparseLeap is that it moves the major cost of empty space skipping out of the ray-casting stage. We achieve this via a hybrid strategy that balances the computational load between determining empty ray segments in a rasterization (object-order) stage, and sampling non-empty volume data in the ray-casting (image-order) stage. Before ray-casting, we exploit the fast hardware rasterization of GPUs to create a ray segment list for each pixel, which identifies non-empty regions along the ray. The ray-casting stage then leaps over empty space without hierarchy traversal. Ray segment lists are created by rasterizing a set of fine-grained, view-independent bounding boxes. Frame coherence is exploited by re-using the same bounding boxes unless the set of active objects changes. We show that SparseLeap scales better to large, sparse data than standard octree empty space skipping.
Some algorithms for the solution of the symmetric eigenvalue problem on a multiprocessor electronic computer

International Nuclear Information System (INIS)

Molchanov, I.N.; Khimich, A.N.

1984-01-01

This article shows how a reflection method can be used to find the eigenvalues of a matrix by transforming the matrix to tridiagonal form. The method of conjugate gradients is used to find the smallest eigenvalue and the corresponding eigenvector of symmetric positive-definite band matrices. Topics considered include the computational scheme of the reflection method, the organization of parallel calculations by the reflection method, the computational scheme of the conjugate gradient method, the organization of parallel calculations by the conjugate gradient method, and the effectiveness of parallel algorithms. It is concluded that it is possible to increase the overall effectiveness of multiprocessor electronic computers by either letting the newly available processors of a new problem operate in the multiprocessor mode, or by improving the coefficient of uniform partition of the original information.
Atmospheric inverse modeling via sparse reconstruction

Science.gov (United States)

Hase, Nils; Miller, Scot M.; Maaß, Peter; Notholt, Justus; Palm, Mathias; Warneke, Thorsten

2017-10-01

Many applications in atmospheric science involve ill-posed inverse problems. A crucial component of many inverse problems is the proper formulation of a priori knowledge about the unknown parameters. In most cases, this knowledge is expressed as a Gaussian prior. This formulation often performs well at capturing smoothed, large-scale processes but is often ill equipped to capture localized structures like large point sources or localized hot spots. Over the last decade, scientists from a diverse array of applied mathematics and engineering fields have developed sparse reconstruction techniques to identify localized structures. In this study, we present a new regularization approach for ill-posed inverse problems in atmospheric science. It is based on Tikhonov regularization with sparsity constraint and allows bounds on the parameters. We enforce sparsity using a dictionary representation system. We analyze its performance in an atmospheric inverse modeling scenario by estimating anthropogenic US methane (CH4) emissions from simulated atmospheric measurements. Different measures indicate that our sparse reconstruction approach is better able to capture large point sources or localized hot spots than other methods commonly used in atmospheric inversions. It captures the overall signal equally well but adds details on the grid scale. This feature can be of value for any inverse problem with point or spatially discrete sources. We show an example for source estimation of synthetic methane emissions from the Barnett shale formation.
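To make the idea of a sparsity-constrained, bounded inversion in the record above concrete, here is a small sketch that minimizes 0.5*||A x - y||^2 + lam*||x||_1 subject to box bounds using projected iterative soft-thresholding (ISTA). This is a generic textbook technique swapped in for illustration, not the authors' algorithm, and it applies the sparsity penalty directly to x rather than to dictionary coefficients. The names soft_threshold and ista and all parameter values are assumptions.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values with |v| <= t become 0."""
    return max(v - t, 0.0) - max(-v - t, 0.0)


def ista(A, y, lam, step, n_iter=200, lower=0.0, upper=float("inf")):
    """Projected ISTA for 0.5*||A x - y||^2 + lam*||x||_1, lower <= x <= upper.

    A is a list of rows. The step should not exceed 1 / (largest
    eigenvalue of A^T A) for the iteration to converge. The clip after
    the threshold assumes lower <= 0 <= upper.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual r = A x - y and gradient g = A^T r of the data term.
        r = [sum(A[i][j] * x[j] for j in range(n)) - y[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step, soft-threshold (sparsity), then clip to the bounds.
        x = [
            min(max(soft_threshold(x[j] - step * g[j], step * lam), lower), upper)
            for j in range(n)
        ]
    return x


A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.2, -1.0]
x_hat = ista(A, y, lam=0.5, step=1.0)
print(x_hat)
```

With the identity operator the effect is easy to read off: the small entry is set to zero by the threshold, the negative entry is removed by the lower bound, and the large entry survives with a shrinkage of lam.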
Single image super-resolution based on compressive sensing and improved TV minimization sparse recovery

Science.gov (United States)

Vishnukumar, S.; Wilscy, M.

2017-12-01

In this paper, we propose a single image Super-Resolution (SR) method based on Compressive Sensing (CS) and Improved Total Variation (TV) Minimization Sparse Recovery. In the CS framework, the low-resolution (LR) image is treated as the compressed version of the high-resolution (HR) image. Dictionary Training and Sparse Recovery are the two phases of the method. The K-Singular Value Decomposition (K-SVD) method is used for dictionary training, and the dictionary represents HR image patches in a sparse manner. Here, only the interpolated version of the LR image is used for training, and thereby the structural self-similarity inherent in the LR image is exploited. In the sparse recovery phase, the sparse representation coefficients of the LR image patches with respect to the trained dictionary are derived using the Improved TV Minimization method. The HR image can then be reconstructed as the linear combination of the dictionary and the sparse coefficients. The experimental results show that the proposed method gives better results quantitatively as well as qualitatively on both natural and remote sensing images. The reconstructed images have better visual quality since edges and other sharp details are preserved.
Symmetric vectors and algebraic classification

International Nuclear Information System (INIS)

Leibowitz, E.

1980-01-01

The concept of symmetric vector field in Riemannian manifolds, which arises in the study of relativistic cosmological models, is analyzed. Symmetric vectors are tied up with the algebraic properties of the manifold curvature. A procedure for generating a congruence of symmetric fields out of a given pair is outlined. The case of a three-dimensional manifold of constant curvature ("isotropic universe") is studied in detail, with all its symmetric vector fields being explicitly constructed.
Mode structure symmetry breaking of energetic particle driven beta-induced Alfvén eigenmode

Science.gov (United States)

Lu, Z. X.; Wang, X.; Lauber, Ph.; Zonca, F.

2018-01-01

The mode structure symmetry breaking of energetic particle driven Beta-induced Alfvén Eigenmode (BAE) is studied based on global theory and simulation. The weak coupling formula gives a reasonable estimate of the local eigenvalue compared with global hybrid simulation using XHMGC. The non-perturbative effect of energetic particles on global mode structure symmetry breaking in radial and parallel (along B) directions is demonstrated. With the contribution from energetic particles, two dimensional (radial and poloidal) BAE mode structures with symmetric/asymmetric tails are produced using an analytical model. It is demonstrated that the symmetry breaking in radial and parallel directions is intimately connected. The effects of mode structure symmetry breaking on nonlinear physics, energetic particle transport, and the possible insight for experimental studies are discussed.
Invariant subspaces in some function spaces on symmetric spaces. II

International Nuclear Information System (INIS)

Platonov, S S

1998-01-01

Let G be a semisimple connected Lie group with finite centre, K a maximal compact subgroup of G, and M=G/K a Riemannian symmetric space of non-compact type. We study the problem of describing the structure of closed linear subspaces in various function spaces on M that are invariant under the quasiregular representation of the group G. We consider the case when M is a symplectic symmetric space of rank 1.
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.

Science.gov (United States)

Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

2015-08-01

RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of [Formula: see text]. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity ([Formula: see text] quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.
PAM4 based symmetrical 112-Gbps long-reach TWDM-PON

Science.gov (United States)

Wu, Liyu; Gao, Fan; Zhang, Minming; Fu, Songnian; Deng, Lei; Choi, Michael; Chang, Donald; Lei, Gordon K. P.; Liu, Deming

2018-02-01

We experimentally demonstrate a cost-effective symmetrical 112-Gbps long-reach passive optical network (LR-PON) over 70-km standard single-mode fiber (SSMF), based on pulse amplitude modulation (PAM)-4. Four 10G-class directly modulated lasers (DMLs) at C-band are used for achieving 4 × 28-Gbps downstream transmission, while two 18G-class DMLs at O-band are used to realize 2 × 56-Gbps upstream transmission, without any optical amplification in the optical distribution network (ODN). Both a dispersion compensation fiber (DCF) for the downstream signal and a praseodymium-doped fiber amplifier (PDFA) for the upstream signal are installed at the optical line terminal (OLT). Meanwhile, a sparse Volterra filter (SVF) equalizer is proposed to mitigate the transmission impairments with a substantial reduction in computational complexity. Finally, we successfully provide a loss budget of 33 dB per downstream wavelength channel, which supports 64 optical network units (ONUs) with more than 1.25 Gbps per ONU.
Compact broadband polarization beam splitter using a symmetric directional coupler with sinusoidal bends.

Science.gov (United States)

Zhang, Fan; Yun, Han; Wang, Yun; Lu, Zeqin; Chrostowski, Lukas; Jaeger, Nicolas A F

2017-01-15

We design and demonstrate a compact broadband polarization beam splitter (PBS) using a symmetric directional coupler with sinusoidal bends on a silicon-on-insulator platform. The sinusoidal bends in our PBS suppress the power exchange between two parallel symmetric strip waveguides for the transverse-electric (TE) mode, while allowing for the maximum power transfer to the adjacent waveguide for the transverse-magnetic (TM) mode. Our PBS has a nominal coupler length of 8.55 μm, and it has an average extinction ratio (ER) of 12.0 dB for the TE mode, an average ER of 20.1 dB for the TM mode, an average polarization isolation (PI) of 20.6 dB for the through port, and an average PI of 11.5 dB for the cross port, all over a bandwidth of 100 nm.
Marginal Stability Diagrams for Infinite-n Ballooning Modes in Quasi-symmetric Stellarators

International Nuclear Information System (INIS)

Hudson, S.R.; Hegna, C.C.; Torasso, R.; Ware, A.

2003-01-01

By perturbing the pressure and rotational-transform profiles at a selected surface in a given equilibrium, and by inducing a coordinate variation such that the perturbed state is in equilibrium, a family of magnetohydrodynamic equilibria local to the surface and parameterized by the pressure gradient and shear is constructed for arbitrary stellarator geometry. The geometry of the surface is not changed. The perturbed equilibria are analyzed for infinite-n ballooning stability and marginal stability diagrams are constructed that are analogous to the (s; alpha) diagrams constructed for axi-symmetric configurations. The method describes how pressure and rotational-transform gradients influence the local shear, which in turn influences the ballooning stability. Stability diagrams for the quasi-axially-symmetric NCSX (National Compact Stellarator Experiment), a quasi-poloidally-symmetric configuration and the quasi-helically-symmetric HSX (Helically Symmetric Experiment) are presented. Regions of second-stability are observed in both NCSX and the quasi-poloidal configuration, whereas no second stable region is observed for the quasi-helically symmetric device. To explain the different regions of stability, the curvature and local shear of the quasi-poloidal configuration are analyzed. The results are seemingly consistent with the simple explanation: ballooning instability results when the local shear is small in regions of bad curvature. Examples will be given that show that the structure, and stability, of the ballooning mode is determined by the structure of the potential function arising in the Schroedinger form of the ballooning equation
Sparse approximation with bases

CERN Document Server

2015-01-01

This book systematically presents recent fundamental results on greedy approximation with respect to bases. Motivated by numerous applications, the last decade has seen great successes in studying nonlinear sparse approximation. Recent findings have established that greedy-type algorithms are suitable methods of nonlinear approximation in both sparse approximation with respect to bases and sparse approximation with respect to redundant systems. These insights, combined with some previous fundamental results, form the basis for constructing the theory of greedy approximation. Taking into account the theoretical and practical demand for this kind of theory, the book systematically elaborates a theoretical framework for greedy approximation and its applications. The book addresses the needs of researchers working in numerical mathematics, harmonic analysis, and functional analysis. It quickly takes the reader from classical results to the latest frontier, but is written at the level of a graduate course and do...
Efficient convolutional sparse coding

Science.gov (United States)

Wohlberg, Brendt

2017-06-20

Computationally efficient algorithms may be applied for fast dictionary learning solving the convolutional sparse coding problem in the Fourier domain. More specifically, efficient convolutional sparse coding may be derived within an alternating direction method of multipliers (ADMM) framework that utilizes fast Fourier transforms (FFT) to solve the main linear system in the frequency domain. Such algorithms may enable a significant reduction in computational cost over conventional approaches by implementing a linear solver for the most critical and computationally expensive component of the conventional iterative algorithm. The theoretical computational cost of the algorithm may be reduced from O(M^3 N) to O(MN log N), where N is the dimensionality of the data and M is the number of elements in the dictionary. This significant improvement in efficiency may greatly increase the range of problems that can practically be addressed via convolutional sparse representations.
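To illustrate why a frequency-domain solve can be cheap, here is a minimal sketch. It assumes that, at each frequency bin, the linear system has the form (rho*I + d d^H) x = b, with d holding the M dictionary coefficients at that bin. That form, the use of the Sherman-Morrison identity, and all function names are assumptions made for illustration; the abstract does not state them. Under this assumption each bin costs O(M) instead of O(M^3), leaving the FFTs as the dominant O(MN log N) term.

```python
def solve_rank_one_system(d, b, rho):
    """Solve (rho*I + d d^H) x = b in O(M) via the Sherman-Morrison identity.

    d, b: lists of M complex numbers for one frequency bin (hypothetical inputs)
    rho:  positive ADMM penalty parameter
    """
    d_h_b = sum(di.conjugate() * bi for di, bi in zip(d, b))  # inner product d^H b
    d_h_d = sum(abs(di) ** 2 for di in d)                     # squared norm d^H d
    scale = d_h_b / (rho + d_h_d)
    return [(bi - di * scale) / rho for di, bi in zip(d, b)]


def apply_system(d, x, rho):
    """Compute (rho*I + d d^H) x directly; used only to check the solve."""
    d_h_x = sum(di.conjugate() * xi for di, xi in zip(d, x))
    return [rho * xi + di * d_h_x for di, xi in zip(d, x)]
```

Applying `apply_system` to the returned vector should reproduce `b` up to rounding, which is an easy way to check the identity on small inputs.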
Robust numerical methods for boundary-layer equations for a model problem of flow over a symmetric curved surface

NARCIS (Netherlands)

A.R. Ansari; B. Hossain; B. Koren (Barry); G.I. Shishkin (Gregori)

2007-01-01

We investigate the model problem of flow of a viscous incompressible fluid past a symmetric curved surface when the flow is parallel to its axis. This problem is known to exhibit boundary layers. Also the problem does not have solutions in closed form, it is modelled by boundary-layer

Hyperspectral Unmixing with Robust Collaborative Sparse Regression

Directory of Open Access Journals (Sweden)

Chang Li

2016-07-01

Recently, sparse unmixing (SU) of hyperspectral data has received particular attention for analyzing remote sensing images. However, most SU methods are based on the commonly admitted linear mixing model (LMM), which ignores the possible nonlinear effects (i.e., nonlinearity). In this paper, we propose a new method named robust collaborative sparse regression (RCSR) based on the robust LMM (rLMM) for hyperspectral unmixing. The rLMM takes the nonlinearity into consideration, and the nonlinearity is merely treated as outlier, which has the underlying sparse property. The RCSR simultaneously takes the collaborative sparse property of the abundance and sparsely distributed additive property of the outlier into consideration, which can be formed as a robust joint sparse regression problem. The inexact augmented Lagrangian method (IALM) is used to optimize the proposed RCSR. The qualitative and quantitative experiments on synthetic datasets and real hyperspectral images demonstrate that the proposed RCSR is efficient for solving the hyperspectral SU problem compared with the other four state-of-the-art algorithms.
Stability Analysis on Sparsely Encoded Associative Memory with Short-Term Synaptic Dynamics

Science.gov (United States)

Xu, Muyuan; Katori, Yuichi; Aihara, Kazuyuki

This study investigates the stability of sparsely encoded associative memory in a network composed of stochastic neurons. The incorporation of short-term synaptic dynamics significantly changes the stability with respect to synaptic properties. Various states including static and oscillatory states are found in the network dynamics. Specifically, the sparseness of memory patterns raises the problem of spurious states. A mean field model is used to analyze the detailed structure in the stability and show that the performance of memory retrieval is recovered by appropriate feedback.
Representations of locally symmetric spaces

International Nuclear Information System (INIS)

Rahman, M.S.

1995-09-01

Locally symmetric spaces in reference to globally and Hermitian symmetric Riemannian spaces are studied. Some relations between locally and globally symmetric spaces are exhibited. A lucid account of results on relevant spaces, motivated by fundamental problems, are formulated as theorems and propositions. (author). 10 refs
A portable implementation of ARPACK for distributed memory parallel architectures

Energy Technology Data Exchange (ETDEWEB)

Maschhoff, K.J.; Sorensen, D.C.

1996-12-31

ARPACK is a package of Fortran 77 subroutines which implement the Implicitly Restarted Arnoldi Method used for solving large sparse eigenvalue problems. A parallel implementation of ARPACK is presented which is portable across a wide range of distributed memory platforms and requires minimal changes to the serial code. The communication layers used for message passing are the Basic Linear Algebra Communication Subprograms (BLACS) developed for the ScaLAPACK project and Message Passing Interface (MPI).
Design and Transmission Analysis of an Asymmetrical Spherical Parallel Manipulator

DEFF Research Database (Denmark)

Wu, Guanglei; Caro, Stéphane; Wang, Jiawei

2015-01-01

This paper presents an asymmetrical spherical parallel manipulator and its transmissibility analysis. This manipulator contains a center shaft to both generate a decoupled unlimited-torsion motion and support the mobile platform for high positioning accuracy. This work addresses the transmission analysis and optimal design of the proposed manipulator based on its kinematic analysis. The input and output transmission indices of the manipulator are defined for its optimum design based on the virtual coefficient between the transmission wrenches and twist screws. The sets of optimal parameters are identified and the distribution of the transmission index is visualized. Moreover, a comparative study regarding to the performances with the symmetrical spherical parallel manipulators is conducted and the comparison shows the advantages of the proposed manipulator with respect to its spherical parallel...
A language for data-parallel and task parallel programming dedicated to multi-SIMD computers. Contributions to hydrodynamic simulation with lattice gases

International Nuclear Information System (INIS)

Pic, Marc Michel

1995-01-01

Parallel programming covers task-parallelism and data-parallelism. Many problems need both kinds of parallelism. Multi-SIMD computers allow a hierarchical approach to these parallelisms. The T++ language, based on C++, is dedicated to exploiting Multi-SIMD computers using a programming paradigm which is an extension of array-programming to task management. Our language introduces arrays of independent tasks to be executed separately (MIMD), on subsets of processors of identical behaviour (SIMD), in order to translate the hierarchical inclusion of data-parallelism in task-parallelism. To manipulate tasks and data in a symmetrical way we propose meta-operations which have the same behaviour on task arrays and on data arrays. We explain how to implement this language on our parallel computer SYMPHONIE in order to take advantage of the locally-shared memory, the hardware virtualization, and the multiplicity of communication networks. We also analyse a typical application of such an architecture. Finite element schemes for fluid mechanics need powerful parallel computers and require strong floating-point capabilities. Lattice gases are an alternative to such simulations. Boolean lattice gases are simple, stable and modular, and need no floating-point computation, but include numerical noise. Boltzmann lattice gases offer high precision of computation, but need floating-point arithmetic and are only locally stable. We propose a new scheme, called multi-bit, which keeps the advantages of each boolean model to which it is applied, with large numerical precision and reduced noise. Experiments on viscosity, physical behaviour, noise reduction and spurious invariants are shown and implementation techniques for parallel Multi-SIMD computers detailed. (author)
Image fusion using sparse overcomplete feature dictionaries

Science.gov (United States)

Brumby, Steven P.; Bettencourt, Luis; Kenyon, Garrett T.; Chartrand, Rick; Wohlberg, Brendt

2015-10-06

Approaches for deciding what individuals in a population of visual system "neurons" are looking for using sparse overcomplete feature dictionaries are provided. A sparse overcomplete feature dictionary may be learned for an image dataset and a local sparse representation of the image dataset may be built using the learned feature dictionary. A local maximum pooling operation may be applied on the local sparse representation to produce a translation-tolerant representation of the image dataset. An object may then be classified and/or clustered within the translation-tolerant representation of the image dataset using a supervised classification algorithm and/or an unsupervised clustering algorithm.
Global Convergence of Schubert’s Method for Solving Sparse Nonlinear Equations

Directory of Open Access Journals (Sweden)

Huiping Cao

2014-01-01

Schubert’s method is an extension of Broyden’s method for solving sparse nonlinear equations, which can preserve the zero-nonzero structure defined by the sparse Jacobian matrix and can retain many good properties of Broyden’s method. In particular, Schubert’s method has been proved to be locally and q-superlinearly convergent. In this paper, we globalize Schubert’s method by using a nonmonotone line search. Under appropriate conditions, we show that the proposed algorithm converges globally and superlinearly. Some preliminary numerical experiments are presented, which demonstrate that our algorithm is effective for large-scale problems.
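The sparsity-preserving update at the heart of Schubert's method can be sketched as follows. This is a minimal illustration of the standard row-wise sparse Broyden update only; it does not include the nonmonotone line search studied in the paper. The function name, the dense list-of-lists storage, and the `pattern` argument are hypothetical choices made for readability.

```python
def schubert_update(B, pattern, s, y, eps=1e-14):
    """Return a sparsity-preserving Broyden (Schubert) update of B.

    B:       list-of-lists approximation of the Jacobian (zero off-pattern)
    pattern: pattern[i] is the set of column indices allowed to be nonzero in row i
    s:       step x_new - x_old
    y:       residual change F(x_new) - F(x_old)
    """
    B_new = [row[:] for row in B]
    for i in range(len(B)):
        cols = pattern[i]
        norm_sq = sum(s[j] ** 2 for j in cols)         # s_i^T s_i, with s masked to row i's pattern
        if norm_sq <= eps:
            continue                                   # step has no component in this row's pattern
        row_dot_s = sum(B[i][j] * s[j] for j in cols)  # B_i s (B_i is zero off-pattern)
        coeff = (y[i] - row_dot_s) / norm_sq
        for j in cols:
            B_new[i][j] += coeff * s[j]                # only on-pattern entries change
    return B_new
```

For every row whose masked step is nonzero, the updated matrix satisfies the secant condition B_new s = y while entries outside the pattern stay exactly zero.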
A parallel buffer tree

DEFF Research Database (Denmark)

Sitchinava, Nodar; Zeh, Norbert

2012-01-01

We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of...... in the optimal O(psort(N) + K/(PB)) parallel I/O complexity, where K is the size of the output reported in the process and psort(N) is the parallel I/O complexity of sorting N elements using P processors....
Bistable states of TM polarized non-linear waves guided by symmetric layered structures

International Nuclear Information System (INIS)

Mihalache, D.

1985-04-01

Dispersion relations for TM polarized non-linear waves propagating in a symmetric single film optical waveguide are derived. The system consists of a layer of thickness $d$ with dielectric constant $\varepsilon_1$ bounded at two sides by a non-linear medium characterized by the diagonal dielectric tensor $\varepsilon_{11} = \varepsilon_{22} = \varepsilon_0$, $\varepsilon_{33} = \varepsilon_0 + \alpha |E_3|^2$, where $E_3$ is the normal electric field component. For sufficiently large $d/\lambda$ ($\lambda$ is the wavelength) we predict bistable states of both symmetric and antisymmetric modes provided that the power flow is the control parameter. (author)
Real-Time Fabric Defect Detection Using Accelerated Small-Scale Over-Completed Dictionary of Sparse Coding

Directory of Open Access Journals (Sweden)

Tianpeng Feng

2016-01-01

An auto fabric defect detection system via computer vision is used to replace manual inspection. In this paper, we propose a hardware accelerated algorithm based on a small-scale over-completed dictionary (SSOCD) via sparse coding (SC) method, which is realized on a parallel hardware platform (TMS320C6678). In order to reduce computation, the image patches projections in the training SSOCD are taken as features and the proposed features are more robust, and exhibit obvious advantages in detection results and computational cost. Furthermore, we introduce detection ratio and false ratio in order to measure the performance and reliability of the hardware accelerated algorithm. The experiments show that the proposed algorithm can run with high parallel efficiency and that the detection speed meets the real-time requirements of industrial inspection.
Electron temperature structures associated with magnetic tearing modes in the Madison Symmetric Torus

Science.gov (United States)

Stephens, Hillary Dianne

Tearing mode induced magnetic islands have a significant impact on the thermal characteristics of magnetically confined plasmas such as those in the reversed-field-pinch. Using a state-of-the-art Thomson scattering (TS) diagnostic, electron temperature fluctuations correlated with magnetic tearing modes have been observed on the Madison Symmetric Torus reversed-field-pinch. The TS diagnostic consists of two independently triggerable Nd:YAG lasers that can each pulse up to 15 times each plasma discharge and 21 General Atomics polychromators equipped with avalanche photodiode modules. Detailed calibrations focusing on accuracy, ease of use and repeatability and in-situ measurements have been performed on the system. Electron temperature (Te) profiles are acquired at 25 kHz with 2 cm or less resolution along the minor radius, sufficient to measure the effect of an island on the profile as the island rotates by the measurement point. Bayesian data analysis techniques are developed and used to detect fluctuations over an ensemble of shots. Four cases are studied: standard plasmas in quiescent periods, through sawteeth, through core reconnection events and in plasmas where the tearing mode activity is decreased. With a spectrum of unstable tearing modes, remnant islands that tend to flatten the temperature profile are present in the core between sawtooth-like reconnection events. This flattening is characteristic of rapid parallel heat conduction along helical magnetic field lines. The spatial structure of the temperature fluctuations shows that the location of the rational surface of the m/n = 1/6 tearing mode is significantly further in than equilibrium suggestions predict. The fluctuations also provide a measurement of the remnant island width which is significantly smaller than the predicted full island width. These correlated fluctuations disappear during both global and core reconnection events. In striking contrast to temperature flattening, a temperature gradient
Application of regularization technique in image super-resolution algorithm via sparse representation

Science.gov (United States)

Huang, De-tian; Huang, Wei-qin; Huang, Hui; Zheng, Li-xin

2017-11-01

To make use of the prior knowledge of the image more effectively and restore more details of the edges and structures, a novel sparse coding objective function is proposed by applying the principle of the non-local similarity and manifold learning on the basis of super-resolution algorithm via sparse representation. Firstly, the non-local similarity regularization term is constructed by using the similar image patches to preserve the edge information. Then, the manifold learning regularization term is constructed by utilizing the locally linear embedding approach to enhance the structural information. The experimental results validate that the proposed algorithm has a significant improvement compared with several super-resolution algorithms in terms of the subjective visual effect and objective evaluation indices.
“Word upon a Word”: Parallelism, Meaning, and Emergent Structure in Kalevala-meter Poetry

Directory of Open Access Journals (Sweden)

Lotte Tarkka

2017-10-01

This essay treats parallelism as a means for articulating and communicating meaning in performance. Rather than a merely stylistic and structural marker, parallelism is discussed as an expressive and cognitive strategy for the elaboration of notions and cognitive categories that are vital in the culture and central for the individual performers. The essay is based on an analysis of short forms of Kalevala-meter poetry from Viena Karelia: proverbs, aphorisms, and lyric poetry. In the complex system of genres using the same poetic meter parallelism transformed genres and contributed to the emergence of cohesive and finalized performances.
Automated, parallel mass spectrometry imaging and structural identification of lipids

DEFF Research Database (Denmark)

Ellis, Shane R.; Paine, Martin R.L.; Eijkel, Gert B.

2018-01-01

We report a method that enables automated data-dependent acquisition of lipid tandem mass spectrometry data in parallel with a high-resolution mass spectrometry imaging experiment. The method does not increase the total image acquisition time and is combined with automatic structural assignments. This lipidome-per-pixel approach automatically identified and validated 104 unique molecular lipids and their spatial locations from rat cerebellar tissue....
Plasma and energetic particle structure of a collisionless quasi-parallel shock

Science.gov (United States)

Kennel, C. F.; Scarf, F. L.; Coroniti, F. V.; Russell, C. T.; Smith, E. J.; Wenzel, K. P.; Reinhard, R.; Sanderson, T. R.; Feldman, W. C.; Parks, G. K.

1983-01-01

The quasi-parallel interplanetary shock of November 11-12, 1978 was studied from both the collisionless shock and energetic particle points of view using measurements of the interplanetary magnetic and electric fields, solar wind electrons, plasma and MHD waves, and intermediate and high energy ions obtained on ISEE-1, -2, and -3. The interplanetary environment through which the shock was propagating when it encountered the three spacecraft was characterized; the observations of this shock are documented and current theories of quasi-parallel shock structure and particle acceleration are tested. These observations tend to confirm present self-consistent theories of first order Fermi acceleration by shocks and of collisionless shock dissipation involving firehose instability.
Pricing and collecting decisions in a closed-loop supply chain with symmetric and asymmetric information

DEFF Research Database (Denmark)

Wei, Jie; Govindan, Kannan; Li, Yongjian

2015-01-01

The optimal decision problem of a closed-loop supply chain with symmetric and asymmetric information structures is considered using game theory in this paper. The paper aims to explore how the manufacturer and the retailer make their own decisions about wholesale price, retail price, and collection rate under symmetric and asymmetric information conditions. Four game models are established, which allow one to examine the strategies of each firm and explore the role of the manufacturer and the retailer in four different game scenarios under symmetric and asymmetric information structures. The optimal strategies in closed form are given under the decision scenarios with symmetric information; moreover, the first order conditions that the optimal retail price, optimal wholesale price, and optimal collection rate satisfy are given under the decision scenarios with asymmetric information...
Reduced-Order Structure-Preserving Model for Parallel-Connected Three-Phase Grid-Tied Inverters: Preprint

Energy Technology Data Exchange (ETDEWEB)

Johnson, Brian B [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Purba, Victor [University of Minnesota; Jafarpour, Saber [University of California, Santa Barbara; Bullo, Francesco [University of California, Santa Barbara; Dhople, Sairaj [University of Minnesota

2017-08-31

Given that next-generation infrastructures will contain large numbers of grid-connected inverters and these interfaces will be satisfying a growing fraction of system load, it is imperative to analyze the impacts of power electronics on such systems. However, since each inverter model has a relatively large number of dynamic states, it would be impractical to execute complex system models where the full dynamics of each inverter are retained. To address this challenge, we derive a reduced-order structure-preserving model for parallel-connected grid-tied three-phase inverters. Here, each inverter in the system is assumed to have a full-bridge topology, LCL filter at the point of common coupling, and the control architecture for each inverter includes a current controller, a power controller, and a phase-locked loop for grid synchronization. We outline a structure-preserving reduced-order inverter model for the setting where the parallel inverters are each designed such that the filter components and controller gains scale linearly with the power rating. By structure preserving, we mean that the reduced-order three-phase inverter model is also composed of an LCL filter, a power controller, current controller, and PLL. That is, we show that the system of parallel inverters can be modeled exactly as one aggregated inverter unit and this equivalent model has the same number of dynamical states as an individual inverter in the paralleled system. Numerical simulations validate the reduced-order models.
From particle in a box to PT-symmetric systems via isospectral deformation

OpenAIRE

Cherian, Philip; Abhinav, Kumar; Panigrahi, P. K.

2011-01-01

A family of PT-symmetric complex potentials is obtained, which is isospectral to free particle in an infinite complex box in one dimension (1-D). These are generalizations to the $\mathrm{cosec}^2(x)$ potential, isospectral to particle in a real infinite box. In the complex plane, the infinite box is extended parallel to the real axis having a real width, which is found to be an integral multiple of a constant quantum factor, arising due to boundary conditions necessary for maintaining the PT-symmetry...
A new parallelization algorithm of ocean model with explicit scheme

Science.gov (United States)

Fu, X. D.

2017-08-01

This paper focuses on the parallelization of ocean models with an explicit scheme, which is one of the most commonly used schemes for discretizing the governing equations of an ocean model. The explicit scheme is simple to compute, and the value at a given grid point depends on grid-point values at the previous time step, which means that no sparse linear system has to be solved when integrating the governing equations. Exploiting these characteristics, the paper designs a parallel algorithm named halo cells update that requires only minor modification of the original ocean model and little change to its space step and time step, and that parallelizes the model by adding a transmission module between sub-domains. The paper takes GRGO (Global Reduced Gravity Ocean model) as an example and parallelizes it with halo update. The results demonstrate that higher speedup can be achieved at different problem sizes.
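The halo-cell idea can be sketched on a toy problem. The example below is not GRGO: it assumes a 1-D explicit diffusion update with fixed end points as a stand-in for the ocean model, and it emulates the exchange between sub-domains with plain list copies rather than real message passing. All function names are hypothetical.

```python
def step_serial(u, r):
    """One explicit diffusion step; the two end points are held fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def split_with_halos(u, parts):
    """Split the interior of u into `parts` chunks, each padded with one halo cell per side."""
    interior = len(u) - 2
    bounds = [1 + k * interior // parts for k in range(parts + 1)]
    return [u[lo - 1:hi + 1] for lo, hi in zip(bounds, bounds[1:])]


def exchange_halos(subdomains):
    """Copy each sub-domain's outermost owned value into its neighbour's halo cell."""
    for left, right in zip(subdomains, subdomains[1:]):
        left[-1] = right[1]   # first owned cell of the right neighbour
        right[0] = left[-2]   # last owned cell of the left neighbour


def step_decomposed(subdomains, r):
    """Advance every sub-domain independently, then refresh the halo cells."""
    updated = [step_serial(sub, r) for sub in subdomains]
    exchange_halos(updated)
    return updated


def gather(subdomains):
    """Reassemble the global field from owned cells plus the two physical boundaries."""
    field = [subdomains[0][0]]
    for sub in subdomains:
        field.extend(sub[1:-1])
    field.append(subdomains[-1][-1])
    return field
```

Because each owned cell only reads its immediate neighbours from the previous step, one halo cell per side is enough here, and the decomposed run reproduces the serial result. A wider stencil would need wider halos.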

Non-parametric co-clustering of large scale sparse bipartite networks on the GPU

DEFF Research Database (Denmark)

Hansen, Toke Jansen; Mørup, Morten; Hansen, Lars Kai

2011-01-01

of row and column clusters from a hypothesis space of an infinite number of clusters. To reach large scale applications of co-clustering we exploit that parameter inference for co-clustering is well suited for parallel computing. We develop a generic GPU framework for efficient inference on large scale sparse bipartite networks and achieve a speedup of two orders of magnitude compared to estimation based on conventional CPUs. In terms of scalability we find for networks with more than 100 million links that reliable inference can be achieved in less than an hour on a single GPU. To efficiently manage...
Regression with Sparse Approximations of Data

DEFF Research Database (Denmark)

Noorzad, Pardis; Sturm, Bob L.

2012-01-01

We propose sparse approximation weighted regression (SPARROW), a method for local estimation of the regression function that uses sparse approximation with a dictionary of measurements. SPARROW estimates the regression function at a point with a linear combination of a few regressands selected by a sparse approximation of the point in terms of the regressors. We show SPARROW can be considered a variant of $k$-nearest neighbors regression ($k$-NNR), and more generally, local polynomial kernel regression. Unlike $k$-NNR, however, SPARROW can adapt the number of regressors to use based...
Sparse adaptive filters for echo cancellation

CERN Document Server

Paleologu, Constantin

2011-01-01

Adaptive filters with a large number of coefficients are usually involved in both network and acoustic echo cancellation. Consequently, it is important to improve the convergence rate and tracking of the conventional algorithms used for these applications. This can be achieved by exploiting the sparseness character of the echo paths. Identification of sparse impulse responses was addressed mainly in the last decade with the development of the so-called "proportionate"-type algorithms. The goal of this book is to present the most important sparse adaptive filters developed for echo cancellati
Parallel computing solution of Boltzmann neutron transport equation

International Nuclear Information System (INIS)

Ansah-Narh, T.

2010-01-01

The focus of the research was on developing a parallel computing algorithm for solving eigenvalues of the Boltzmann Neutron Transport Equation (BNTE) in a slab geometry using a multi-grid approach. In response to the problem of slow execution of serial computing when solving large problems, such as BNTE, the study was focused on the design of parallel computing systems which was an evolution of serial computing that used multiple processing elements simultaneously to solve complex physical and mathematical problems. Finite element method (FEM) was used for the spatial discretization scheme, while angular discretization was accomplished by expanding the angular dependence in terms of Legendre polynomials. The eigenvalues representing the multiplication factors in the BNTE were determined by the power method. MATLAB Compiler Version 4.1 (R2009a) was used to compile the MATLAB codes of BNTE. The implemented parallel algorithms were enabled with matlabpool, a Parallel Computing Toolbox function. The option UseParallel was set to 'always' and the default value of the option was 'never'. When those conditions held, the solvers computed estimated gradients in parallel. The parallel computing system was used to handle all the bottlenecks in the matrix generated from the finite element scheme and each domain of the power method generated. The parallel algorithm was implemented on a Symmetric Multi Processor (SMP) cluster machine, which had Intel 32-bit quad-core x86 processors. Convergence rates and timings for the algorithm on the SMP cluster machine were obtained. Numerical experiments indicated the designed parallel algorithm could reach perfect speedup and had good stability and scalability. (au)
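The power method mentioned above can be sketched in a few lines. This is a generic serial illustration in Python, not the MATLAB implementation used in the study. The 2x2 test operator and all names are hypothetical, and a real transport calculation would apply the iteration to the discretized transport operators rather than to a small explicit matrix.

```python
import math


def power_method(matvec, n, tol=1e-12, max_iter=10000):
    """Estimate the dominant eigenvalue of a linear operator by power iteration.

    matvec: function mapping a length-n list x to the product A x
    Returns (eigenvalue estimate, unit-norm eigenvector estimate).
    """
    x = [1.0 / math.sqrt(n)] * n                # unit-norm start vector
    eigenvalue = 0.0
    for _ in range(max_iter):
        y = matvec(x)
        new_eigenvalue = sum(xi * yi for xi, yi in zip(x, y))  # Rayleigh quotient, since ||x|| = 1
        norm = math.sqrt(sum(yi * yi for yi in y))
        if norm == 0.0:
            raise ValueError("iterate collapsed to zero; choose another start vector")
        x = [yi / norm for yi in y]
        if abs(new_eigenvalue - eigenvalue) < tol:
            return new_eigenvalue, x
        eigenvalue = new_eigenvalue
    return eigenvalue, x


def small_matvec(x):
    """Hypothetical symmetric test operator [[2, 1], [1, 3]]."""
    return [2.0 * x[0] + 1.0 * x[1], 1.0 * x[0] + 3.0 * x[1]]
```

Each iteration needs only one operator application, which is the step a parallel implementation would distribute across processors.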
3D, parallel fluid-structure interaction code

CSIR Research Space (South Africa)

Oxtoby, Oliver F

2011-01-01

The authors describe the development of a 3D parallel Fluid–Structure–Interaction (FSI) solver and its application to benchmark problems. Fluid and solid domains are discretised using an edge-based finite-volume scheme for efficient parallel...
Workload Balancing on Heterogeneous Systems: A Case Study of Sparse Grid Interpolation

KAUST Repository

Muraraşu, Alin

2012-01-01

Multi-core parallelism and accelerators are becoming common features of today’s computer systems, as they allow for computational power without sacrificing energy efficiency. Due to heterogeneity, tuning for each type of compute unit and adequate load balancing is essential. This paper proposes static and dynamic solutions for load balancing in the context of an application for visualizing high-dimensional simulation data. The application relies on the sparse grid technique for data compression. Its performance critical part is the interpolation routine used for decompression. Results show that our load balancing scheme allows for an efficient acceleration of interpolation on heterogeneous systems containing multi-core CPUs and GPUs.
Design and performance characterization of electronic structure calculations on massively parallel supercomputers

DEFF Research Database (Denmark)

Romero, N. A.; Glinsvad, Christian; Larsen, Ask Hjorth

2013-01-01

Density functional theory (DFT) is the most widely employed electronic structure method because of its favorable scaling with system size and accuracy for a broad range of molecular and condensed-phase systems. The advent of massively parallel supercomputers has enhanced the scientific community...
Accelerating Scientific Applications using High Performance Dense and Sparse Linear Algebra Kernels on GPUs

KAUST Repository

Abdelfattah, Ahmad

2015-01-15

High performance computing (HPC) platforms are evolving to more heterogeneous configurations to support the workloads of various applications. The current hardware landscape is composed of traditional multicore CPUs equipped with hardware accelerators that can handle high levels of parallelism. Graphical Processing Units (GPUs) are popular high performance hardware accelerators in modern supercomputers. GPU programming has a different model than that for CPUs, which means that many numerical kernels have to be redesigned and optimized specifically for this architecture. GPUs usually outperform multicore CPUs in some compute intensive and massively parallel applications that have regular processing patterns. However, most scientific applications rely on crucial memory-bound kernels and may witness bottlenecks due to the overhead of the memory bus latency. They can still take advantage of the GPU compute power capabilities, provided that an efficient architecture-aware design is achieved. This dissertation presents a uniform design strategy for optimizing critical memory-bound kernels on GPUs. Based on hierarchical register blocking, double buffering and latency hiding techniques, this strategy leverages the performance of a wide range of standard numerical kernels found in dense and sparse linear algebra libraries. The work presented here focuses on matrix-vector multiplication kernels (MVM) as representative and most important memory-bound operations in this context. Each kernel inherits the benefits of the proposed strategies. By exposing a proper set of tuning parameters, the strategy is flexible enough to suit different types of matrices, ranging from large dense matrices, to sparse matrices with dense block structures, while high performance is maintained. Furthermore, the tuning parameters are used to maintain the relative performance across different GPU architectures. Multi-GPU acceleration is proposed to scale the performance on several devices. The
Reduced-Order Structure-Preserving Model for Parallel-Connected Three-Phase Grid-Tied Inverters

Energy Technology Data Exchange (ETDEWEB)

Johnson, Brian B [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Purba, Victor [University of Minnesota; Jafarpour, Saber [University of California Santa-Barbara; Bullo, Francesco [University of California Santa-Barbara; Dhople, Sairaj V. [University of Minnesota

2017-08-21

Next-generation power networks will contain large numbers of grid-connected inverters satisfying a significant fraction of system load. Since each inverter model has a relatively large number of dynamic states, it is impractical to analyze complex system models where the full dynamics of each inverter are retained. To address this challenge, we derive a reduced-order structure-preserving model for parallel-connected grid-tied three-phase inverters. Here, each inverter in the system is assumed to have a full-bridge topology, LCL filter at the point of common coupling, and the control architecture for each inverter includes a current controller, a power controller, and a phase-locked loop for grid synchronization. We outline a structure-preserving reduced-order inverter model with lumped parameters for the setting where the parallel inverters are each designed such that the filter components and controller gains scale linearly with the power rating. By structure preserving, we mean that the reduced-order three-phase inverter model is also composed of an LCL filter, a power controller, current controller, and PLL. We show that the system of parallel inverters can be modeled exactly as one aggregated inverter unit and this equivalent model has the same number of dynamical states as any individual inverter in the system. Numerical simulations validate the reduced-order model.
Fast alternating projected gradient descent algorithms for recovering spectrally sparse signals

KAUST Repository

Cho, Myung

2016-06-24

We propose fast algorithms that speed up or improve the performance of recovering spectrally sparse signals from underdetermined measurements. Our algorithms are based on a non-convex approach of using alternating projected gradient descent for structured matrix recovery. We apply this approach to two formulations of structured matrix recovery: Hankel and Toeplitz mosaic structured matrix, and Hankel structured matrix. Our methods provide better recovery performance, and faster signal recovery than existing algorithms, including atomic norm minimization.
Fast alternating projected gradient descent algorithms for recovering spectrally sparse signals

KAUST Repository

Cho, Myung; Cai, Jian-Feng; Liu, Suhui; Eldar, Yonina C.; Xu, Weiyu

2016-01-01

We propose fast algorithms that speed up or improve the performance of recovering spectrally sparse signals from underdetermined measurements. Our algorithms are based on a non-convex approach of using alternating projected gradient descent for structured matrix recovery. We apply this approach to two formulations of structured matrix recovery: Hankel and Toeplitz mosaic structured matrix, and Hankel structured matrix. Our methods provide better recovery performance, and faster signal recovery than existing algorithms, including atomic norm minimization.
Hardware and software and machine-tool simulation with parallel structures mechanisms

Directory of Open Access Journals (Sweden)

Keba P.V.

2016-12-01

The range of applications for mechanisms with parallel structure keeps growing. The mechanisms of machine tools and manipulators are becoming more complicated, and the program-controlled modules need to be improved. Closed-circuit mechanisms are most widespread in robotic complexes, where the manipulator performs complicated spatial movements along a given trajectory. The range of applications is very wide; the most popular are sorting, welding, assembling and others. However, the problem of designing the operating programs remains, because the available post-processors were created for existing equipment, while new machine tool designs appear every day and need to be controlled. The problems associated with using the hardware and software of mechanisms with parallel structure in computer-aided simulation are considered. A program for solving the inverse kinematics problem is designed. A new method of designing the control programs is found. The kinematic analysis method options and calculated data obtained by computer mathematics systems are shown with the «Tools Glide» software taken as an example.
The symmetric MSD encoder for one-step adder of ternary optical computer

Science.gov (United States)

Kai, Song; LiPing, Yan

2016-08-01

The symmetric Modified Signed-Digit (MSD) encoding is important for achieving the one-step MSD adder of Ternary Optical Computer (TOC). The paper described the symmetric MSD encoding algorithm in detail, and developed its truth table which has nine rows and nine columns. According to the truth table, the state table was developed, and the optical-path structure and circuit-implementation scheme of the symmetric MSD encoder (SME) for one-step adder of TOC were proposed. Finally, a series of experiments were designed and performed. The observed results of the experiments showed that the scheme to implement SME was correct, feasible and efficient.
Dual formulation of covariant nonlinear duality-symmetric action of kappa-symmetric D3-brane

Science.gov (United States)

Vanichchapongjaroen, Pichet

2018-02-01

We study the construction of covariant nonlinear duality-symmetric actions in dual formulation. Essentially, the construction is the PST-covariantisation and nonlinearisation of the Zwanziger action. The covariantisation makes use of three auxiliary scalar fields. Apart from these, the construction proceeds in a similar way to that of the standard formulation. For example, the theories can be extended to include interactions with external fields, and the theories possess two local PST symmetries. We then explicitly demonstrate the construction of covariant nonlinear duality-symmetric actions in dual formulation of DBI theory and the D3-brane. For each of these theories, the twisted selfduality condition obtained from the duality-symmetric actions is explicitly shown to match the duality relation between the field strength and its dual from the one-potential actions. The on-shell actions of the duality-symmetric and the one-potential versions are also shown to match. We also explicitly prove kappa-symmetry of the covariant nonlinear duality-symmetric D3-brane action in dual formulation.
Sparse representation for infrared Dim target detection via a discriminative over-complete dictionary learned online.

Science.gov (United States)

Li, Zheng-Zhou; Chen, Jing; Hou, Qian; Fu, Hong-Xia; Dai, Zhen; Jin, Gang; Li, Ru-Zhang; Liu, Chang-Ju

2014-05-27

It is difficult for structural over-complete dictionaries such as the Gabor function and discriminative over-complete dictionary, which are learned offline and classified manually, to represent natural images with the goal of ideal sparseness and to enhance the difference between background clutter and target signals. This paper proposes an infrared dim target detection approach based on sparse representation on a discriminative over-complete dictionary. An adaptive morphological over-complete dictionary is trained and constructed online according to the content of infrared image by K-singular value decomposition (K-SVD) algorithm. Then the adaptive morphological over-complete dictionary is divided automatically into a target over-complete dictionary describing target signals, and a background over-complete dictionary embedding background by the criteria that the atoms in the target over-complete dictionary could be decomposed more sparsely based on a Gaussian over-complete dictionary than the one in the background over-complete dictionary. This discriminative over-complete dictionary can not only capture significant features of background clutter and dim targets better than a structural over-complete dictionary, but also strengthens the sparse feature difference between background and target more efficiently than a discriminative over-complete dictionary learned offline and classified manually. The target and background clutter can be sparsely decomposed over their corresponding over-complete dictionaries, yet couldn't be sparsely decomposed based on their opposite over-complete dictionary, so their residuals after reconstruction by the prescribed number of target and background atoms differ very visibly. Some experiments are included and the results show that this proposed approach could not only improve the sparsity more efficiently, but also enhance the performance of small target detection more effectively.
Sparse Representation for Infrared Dim Target Detection via a Discriminative Over-Complete Dictionary Learned Online

Directory of Open Access Journals (Sweden)

Zheng-Zhou Li

2014-05-01

It is difficult for structural over-complete dictionaries such as the Gabor function and discriminative over-complete dictionary, which are learned offline and classified manually, to represent natural images with the goal of ideal sparseness and to enhance the difference between background clutter and target signals. This paper proposes an infrared dim target detection approach based on sparse representation on a discriminative over-complete dictionary. An adaptive morphological over-complete dictionary is trained and constructed online according to the content of infrared image by K-singular value decomposition (K-SVD) algorithm. Then the adaptive morphological over-complete dictionary is divided automatically into a target over-complete dictionary describing target signals, and a background over-complete dictionary embedding background by the criteria that the atoms in the target over-complete dictionary could be decomposed more sparsely based on a Gaussian over-complete dictionary than the one in the background over-complete dictionary. This discriminative over-complete dictionary can not only capture significant features of background clutter and dim targets better than a structural over-complete dictionary, but also strengthens the sparse feature difference between background and target more efficiently than a discriminative over-complete dictionary learned offline and classified manually. The target and background clutter can be sparsely decomposed over their corresponding over-complete dictionaries, yet couldn’t be sparsely decomposed based on their opposite over-complete dictionary, so their residuals after reconstruction by the prescribed number of target and background atoms differ very visibly. Some experiments are included and the results show that this proposed approach could not only improve the sparsity more efficiently, but also enhance the performance of small target detection more effectively.
Constraint treatment techniques and parallel algorithms for multibody dynamic analysis. Ph.D. Thesis

Science.gov (United States)

Chiou, Jin-Chern

1990-01-01

Computational procedures for kinematic and dynamic analysis of three-dimensional multibody dynamic (MBD) systems are developed from the differential-algebraic equations (DAE's) viewpoint. Constraint violations during the time integration process are minimized and penalty constraint stabilization techniques and partitioning schemes are developed. The governing equations of motion, a two-stage staggered explicit-implicit numerical algorithm, are treated which takes advantage of a partitioned solution procedure. A robust and parallelizable integration algorithm is developed. This algorithm uses a two-stage staggered central difference algorithm to integrate the translational coordinates and the angular velocities. The angular orientations of bodies in MBD systems are then obtained by using an implicit algorithm via the kinematic relationship between Euler parameters and angular velocities. It is shown that the combination of the present solution procedures yields a computationally more accurate solution. To speed up the computational procedures, parallel implementation of the present constraint treatment techniques, the two-stage staggered explicit-implicit numerical algorithm was efficiently carried out. The DAE's and the constraint treatment techniques were transformed into arrowhead matrices to which Schur complement form was derived. By fully exploiting the sparse matrix structural analysis techniques, a parallel preconditioned conjugate gradient numerical algorithm is used to solve the systems equations written in Schur complement form. A software testbed was designed and implemented in both sequential and parallel computers. This testbed was used to demonstrate the robustness and efficiency of the constraint treatment techniques, the accuracy of the two-stage staggered explicit-implicit numerical algorithm, and the speed up of the Schur-complement-based parallel preconditioned conjugate gradient algorithm on a parallel computer.
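The abstract above solves its Schur-complement system with a parallel preconditioned conjugate gradient (PCG) method. As a rough illustration of the underlying iteration only, here is a serial PCG sketch in plain Python. The Jacobi (diagonal) preconditioner, the function names, and the 2x2 test system are assumptions chosen for brevity; the thesis does not state which preconditioner it uses, and nothing here reproduces its parallel or Schur-complement machinery.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(matrix, vec):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [dot(row, vec) for row in matrix]


def jacobi_pcg(matrix, rhs, tol=1e-10, max_iters=None):
    """Solve matrix @ x = rhs for a symmetric positive definite matrix.

    Hypothetical helper using a diagonal (Jacobi) preconditioner,
    an assumption made for this sketch.
    """
    n = len(rhs)
    if max_iters is None:
        max_iters = 10 * n
    inv_diag = [1.0 / matrix[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(rhs)                                  # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)                                    # first search direction
    rz = dot(r, z)
    for _ in range(max_iters):
        if math.sqrt(dot(r, r)) < tol:
            break
        ap = mat_vec(matrix, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


if __name__ == "__main__":
    print(jacobi_pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

The matrix-vector product and the inner products are the steps a parallel implementation would distribute; the scalar recurrences for `alpha` and `beta` stay the same.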
In–HgCdTe–In structures with symmetric nonlinear I–V characteristics for sub-THz direct detection

Directory of Open Access Journals (Sweden)

N.I. Kukhtaruk

2017-07-01

This paper reports on the development and investigations of In–Hg1–xCdxTe–In structures with symmetric nonlinear I–V curves that are sensitive to sub-terahertz radiation. It is shown that at low currents the photoresponse of the detectors based on these structures is due to the presence of potential barriers at the contacts. The dependences of the photoresponse on the bias current are measured at a radiation frequency of 140 GHz in the 77–300 K temperature range. The studied structures may be used as detectors of sub-terahertz radiation at room temperature or under weak cooling. The calculated NEP of the investigated In–n-Hg0.61Cd0.39Te–In detectors was 3.5·10^–9 W/Hz^1/2, taking into account thermal and shot noise.
Robust Face Recognition Via Gabor Feature and Sparse Representation

Directory of Open Access Journals (Sweden)

Hao Yu-Juan

2016-01-01

Sparse representation based on compressed sensing theory has been widely used in the field of face recognition and has achieved good recognition results, but the face feature extraction based on sparse representation is too simple, and the sparse coefficient is not sparse. In this paper, we improve the classification algorithm based on the fusion of sparse representation and Gabor features. The improved Gabor feature algorithm overcomes the problem of the large dimension of the feature vector, reduces the computation and storage cost, and enhances the robustness of the algorithm to changes of the environment. The classification efficiency of sparse representation is determined by the collaborative representation, so we simplify the sparse constraint based on the L1 norm to a least squares constraint, which makes the sparse coefficients positive and reduces the complexity of the algorithm. Experimental results show that the proposed method is robust to illumination, facial expression and pose variations in face recognition, and that the recognition rate of the algorithm is improved.
Sparse Learning with Stochastic Composite Optimization.

Science.gov (United States)

Zhang, Weizhong; Zhang, Lijun; Jin, Zhongming; Jin, Rong; Cai, Deng; Li, Xuelong; Liang, Ronghua; He, Xiaofei

2017-06-01

In this paper, we study Stochastic Composite Optimization (SCO) for sparse learning, which aims to learn a sparse solution from a composite function. Most of the recent SCO algorithms have already reached the optimal expected convergence rate O(1/λT), but they often fail to deliver sparse solutions at the end, either due to the limited sparsity regularization during stochastic optimization (SO) or due to the limitation in online-to-batch conversion. Even when the objective function is strongly convex, their high probability bounds can only attain O(√(log(1/δ)/T)), where δ is the failure probability, which is much worse than the expected convergence rate. To address these limitations, we propose a simple yet effective two-phase Stochastic Composite Optimization scheme by adding a novel powerful sparse online-to-batch conversion to the general Stochastic Optimization algorithms. We further develop three concrete algorithms, OptimalSL, LastSL and AverageSL, directly under our scheme to prove the effectiveness of the proposed scheme. Both the theoretical analysis and the experimental results show that our methods can outperform the existing methods in sparse learning ability, and at the same time we can improve the high probability bound to approximately O(log(log(T)/δ)/λT).

Experimental research on density wave oscillation of steam-water two-phase flow in parallel inclined internally ribbed pipes

International Nuclear Information System (INIS)

Gao Feng; Chen Tingkuan; Luo Yushan; Yin Fei; Liu Weimin

2005-01-01

Experiments on steam-water two-phase flow instabilities were performed at p = 3-10 MPa, G = 300-600 kg/(m²·s), Δt_sub = 30-90 °C, and q = 0-190 kW/m². The test sections are parallel inclined internally ribbed pipes with an outer diameter of φ38.1 mm, a wall thickness of 7.5 mm, an obliquity of 19.5° and a length of more than 15 m. Based on the experimental results, the effects of pressure, mass velocity, inlet subcooling and asymmetrical heat flux on steam-water two-phase flow density wave oscillation were analyzed. The experimental results showed that the flow system was more stable as pressure increased. As mass velocity increased, critical heat flux increased but critical steam quality decreased. Inlet subcooling had a monotone effect on density wave oscillation: when inlet subcooling decreased, critical heat flux decreased. Under a certain working condition, critical heat flux on asymmetrically heated parallel pipes is higher than that on symmetrically heated parallel pipes, which means the system with symmetrically heated parallel pipes was more stable. (authors)
Shearlets and Optimally Sparse Approximations

DEFF Research Database (Denmark)

Kutyniok, Gitta; Lemvig, Jakob; Lim, Wang-Q

2012-01-01

Multivariate functions are typically governed by anisotropic features such as edges in images or shock fronts in solutions of transport-dominated equations. One major goal both for the purpose of compression as well as for an efficient analysis is the provision of optimally sparse approximations...... optimally sparse approximations of this model class in 2D as well as 3D. Even more, in contrast to all other directional representation systems, a theory for compactly supported shearlet frames was derived which moreover also satisfy this optimality benchmark. This chapter shall serve as an introduction...... to and a survey about sparse approximations of cartoon-like images by band-limited and also compactly supported shearlet frames as well as a reference for the state-of-the-art of this research field....
Symmetric Tensor Decomposition

DEFF Research Database (Denmark)

Brachat, Jerome; Comon, Pierre; Mourrain, Bernard

2010-01-01

We present an algorithm for decomposing a symmetric tensor, of dimension n and order d, as a sum of rank-1 symmetric tensors, extending the algorithm of Sylvester devised in 1886 for binary forms. We recall the correspondence between the decomposition of a homogeneous polynomial in n variables...... of polynomial equations of small degree in non-generic cases. We propose a new algorithm for symmetric tensor decomposition, based on this characterization and on linear algebra computations with Hankel matrices. The impact of this contribution is two-fold. First it permits an efficient computation...... of the decomposition of any tensor of sub-generic rank, as opposed to widely used iterative algorithms with unproved global convergence (e.g. Alternate Least Squares or gradient descents). Second, it gives tools for understanding uniqueness conditions and for detecting the rank....
An efficient implementation of parallel molecular dynamics method on SMP cluster architecture

International Nuclear Information System (INIS)

Suzuki, Masaaki; Okuda, Hiroshi; Yagawa, Genki

2003-01-01

The authors have applied the MPI/OpenMP hybrid parallel programming model to parallelize a molecular dynamics (MD) method on a symmetric multiprocessor (SMP) cluster architecture. On that architecture, the hybrid parallel programming model, which uses a message passing library such as MPI for inter-SMP node communication and loop directives such as OpenMP for intra-SMP node parallelization, can be expected to be the most effective one. In this study, the parallel performance of the hybrid style has been compared with that of the conventional flat parallel programming style, which uses only MPI, both with and without the fast multipole method (FMM) for computing long-distance interactions. The computing environment is the Hitachi SR8000/MPP at the University of Tokyo. The results of calculation are as follows. Without FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 90% with the hybrid style and 75% with the flat-MPI style for an MD simulation with 33,402 atoms. With FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 60% with the hybrid style and 48% with the flat-MPI style for an MD simulation with 117,649 atoms. (author)
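To put the reported efficiencies in terms of speedup, the short sketch below converts each figure under the common definition efficiency = speedup / number of PEs. That definition and a single-PE baseline are assumptions; the abstract does not state how its efficiencies were normalized. The function name `speedup_from_efficiency` is hypothetical.

```python
def speedup_from_efficiency(efficiency, num_pes):
    """Implied speedup, assuming efficiency = speedup / num_pes."""
    return efficiency * num_pes


# Efficiencies quoted in the abstract for 16 SMP nodes (128 PEs).
reported = {
    ("without FMM", "hybrid"): 0.90,
    ("without FMM", "flat MPI"): 0.75,
    ("with FMM", "hybrid"): 0.60,
    ("with FMM", "flat MPI"): 0.48,
}

if __name__ == "__main__":
    for (case, style), efficiency in reported.items():
        speedup = speedup_from_efficiency(efficiency, 128)
        print(f"{case}, {style}: implied speedup {speedup:.1f}")
```

Under these assumptions the hybrid style corresponds to a speedup of about 115 on 128 PEs without FMM, against 96 for flat MPI.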
Characterization of Generalized Young Measures Generated by Symmetric Gradients

Science.gov (United States)

De Philippis, Guido; Rindler, Filip

2017-06-01

This work establishes a characterization theorem for (generalized) Young measures generated by symmetric derivatives of functions of bounded deformation (BD) in the spirit of the classical Kinderlehrer-Pedregal theorem. Our result places such Young measures in duality with symmetric-quasiconvex functions with linear growth. The "local" proof strategy combines blow-up arguments with the singular structure theorem in BD (the analogue of Alberti's rank-one theorem in BV), which was recently proved by the authors. As an application of our characterization theorem we show how an atomic part in a BD-Young measure can be split off in generating sequences.
Superresolution radar imaging based on fast inverse-free sparse Bayesian learning for multiple measurement vectors

Science.gov (United States)

He, Xingyu; Tong, Ningning; Hu, Xiaowei

2018-01-01

Compressive sensing has been successfully applied to inverse synthetic aperture radar (ISAR) imaging of moving targets. By exploiting the block sparse structure of the target image, a sparse solution for multiple measurement vectors (MMV) can be applied in ISAR imaging and a substantial performance improvement can be achieved. As an effective sparse recovery method, sparse Bayesian learning (SBL) for MMV involves a matrix inverse at each iteration. Its associated computational complexity grows significantly with the problem size. To address this problem, we develop a fast inverse-free (IF) SBL method for MMV. A relaxed evidence lower bound (ELBO), which is computationally more amenable than the traditional ELBO used by SBL, is obtained by invoking a fundamental property of smooth functions. A variational expectation-maximization scheme is then employed to maximize the relaxed ELBO, and a computationally efficient IF-MSBL algorithm is proposed. Numerical results based on simulated and real data show that the proposed method can reconstruct row sparse signals accurately and obtain clear superresolution ISAR images. Moreover, the running time and computational complexity are reduced to a great extent compared with traditional SBL methods.
Language Recognition via Sparse Coding

Science.gov (United States)

2016-09-08

explanation is that sparse coding can achieve a near-optimal approximation of much complicated nonlinear relationship through local and piecewise linear...training examples, where x(i) ∈ R^N is the ith example in the batch. Optionally, X can be normalized and whitened before sparse coding for better result...normalized input vectors are then ZCA-whitened [20]. Empirically, we choose ZCA-whitening over PCA-whitening, and there is no dimensionality reduction
Sparse Adaptive Iteratively-Weighted Thresholding Algorithm (SAITA) for Lp-Regularization Using the Multiple Sub-Dictionary Representation

Directory of Open Access Journals (Sweden)

Yunyi Li

2017-12-01

Both L1/2 and L2/3 are typical non-convex regularizations of Lp (0 < p < 1), which can be employed to obtain a sparser solution than the L1 regularization. Recently, the multiple-state sparse transformation strategy has been developed to exploit the sparsity in L1 regularization for sparse signal recovery, which combines the iterative reweighted algorithms. To further exploit the sparse structure of signal and image, this paper adopts multiple dictionary sparse transform strategies for the two typical cases p ∈ {1/2, 2/3} based on an iterative Lp thresholding algorithm and then proposes a sparse adaptive iterative-weighted Lp thresholding algorithm (SAITA). Moreover, a simple yet effective regularization parameter is proposed to weight each sub-dictionary-based Lp regularizer. Simulation results have shown that the proposed SAITA not only performs better than the corresponding L1 algorithms but can also obtain a better recovery performance and achieve faster convergence than the conventional single-dictionary sparse transform-based Lp case. Moreover, we conduct some applications about sparse image recovery and obtain good results by comparison with related work.
Parallelization of the preconditioned IDR solver for modern multicore computer systems

Science.gov (United States)

Bessonov, O. A.; Fedoseyev, A. I.

2012-10-01

This paper presents the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK on modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs the iterative IDR algorithm with ILU preconditioning (user-chosen ILU preconditioning order). CNSPACK has been successfully used during the last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, there has been a dramatic change in processor architectures and computer system organization in recent years. Due to this, performance criteria and methods have been revisited, together with the parallelization of the solver and preconditioner using the OpenMP environment. Results of the successful implementation for efficient parallelization are presented for the most advanced computer systems (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
Sparse seismic imaging using variable projection

NARCIS (Netherlands)

Aravkin, Aleksandr Y.; Tu, Ning; van Leeuwen, Tristan

2013-01-01

We consider an important class of signal processing problems where the signal of interest is known to be sparse, and can be recovered from data given auxiliary information about how the data was generated. For example, a sparse Green's function may be recovered from seismic experimental data using
Tunable Sparse Network Coding for Multicast Networks

DEFF Research Database (Denmark)

Feizi, Soheil; Roetter, Daniel Enrique Lucani; Sørensen, Chres Wiant

2014-01-01

This paper shows the potential and key enabling mechanisms for tunable sparse network coding, a scheme in which the density of network coded packets varies during a transmission session. At the beginning of a transmission session, sparsely coded packets are transmitted, which benefits decoding...... complexity. At the end of a transmission, when receivers have accumulated degrees of freedom, coding density is increased. We propose a family of tunable sparse network codes (TSNCs) for multicast erasure networks with a controllable trade-off between completion time performance to decoding complexity...... a mechanism to perform efficient Gaussian elimination over sparse matrices going beyond belief propagation but maintaining low decoding complexity. Supporting simulation results are provided showing the trade-off between decoding complexity and completion time....
Parallel algorithms and architecture for computation of manipulator forward dynamics

Science.gov (United States)

Fijany, Amir; Bejczy, Antal K.

1989-01-01

Parallel computation of manipulator forward dynamics is investigated. Considering three classes of algorithms for the solution of the problem, that is, the O(n), the O(n^2), and the O(n^3) algorithms, parallelism in the problem is analyzed. It is shown that the problem belongs to the class of NC and that the time and processor bounds are of O(log_2^2 n) and O(n^4), respectively. However, the fastest stable parallel algorithms achieve the computation time of O(n) and can be derived by parallelization of the O(n^3) serial algorithms. Parallel computation of the O(n^3) algorithms requires the development of parallel algorithms for a set of fundamentally different problems, that is, the Newton-Euler formulation, the computation of the inertia matrix, decomposition of the symmetric, positive definite matrix, and the solution of triangular systems. Parallel algorithms for this set of problems are developed which can be efficiently implemented on a unique architecture, a triangular array of n(n+2)/2 processors with a simple nearest-neighbor interconnection. This architecture is particularly suitable for VLSI and WSI implementations. The developed parallel algorithm, compared to the best serial O(n) algorithm, achieves an asymptotic speedup of more than two orders of magnitude in the computation of the forward dynamics.
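Two of the sub-problems listed above, decomposition of a symmetric positive definite matrix and the solution of triangular systems, can be sketched serially as a Cholesky factorization followed by two triangular solves. This is a generic textbook version in plain Python, assuming Cholesky is the intended decomposition; it does not reproduce the authors' parallel algorithms or their processor array, and all function names are hypothetical.

```python
import math


def cholesky(matrix):
    """Return lower-triangular L with L @ L^T == matrix (matrix must be SPD)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def forward_substitution(lower, rhs):
    """Solve L @ y = rhs for lower-triangular L."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        partial = sum(lower[i][k] * y[k] for k in range(i))
        y[i] = (rhs[i] - partial) / lower[i][i]
    return y


def back_substitution_transposed(lower, y):
    """Solve L^T @ x = y, reading L^T entries directly from L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - partial) / lower[i][i]
    return x


def solve_spd(matrix, rhs):
    """Solve matrix @ x = rhs via Cholesky and two triangular solves."""
    lower = cholesky(matrix)
    return back_substitution_transposed(lower, forward_substitution(lower, rhs))


if __name__ == "__main__":
    print(solve_spd([[4.0, 2.0], [2.0, 3.0]], [2.0, 1.0]))
```

In the forward dynamics setting the matrix would be the manipulator inertia matrix and the right-hand side the net joint forces, with the solution giving joint accelerations.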
An Explanation of Jupiter's Equatorially Symmetric Gravitational Field using a Four-layer, Non-spheroidal Model with Zonal Flow

Science.gov (United States)

Kong, Dali; Zhang, Keke; Schubert, Gerald; Anderson, John

2017-10-01

The structure/amplitude of the Jovian equatorially symmetric gravitational field is affected by both rotational distortion and the fast equatorially symmetric zonal flow. We construct a fully self-consistent, four-layer, non-spheroidal (i.e., the shape is irregular) model of Jupiter that comprises an inner core, a metallic region, an outer molecular envelope and a thin transition layer between the metallic and molecular regions. While the core is assumed to have a uniform density, three different equations of state are adopted for the metallic, molecular and transition regions. We solve the governing equations via a perturbation approach. The leading-order problem accounts for the full effect of rotational distortion, and determines the density, size and shape of the core, the location and thickness of the transition layer, and the shape of the 1-bar pressure level; it also produces the mass, the equatorial and polar radii of Jupiter, and the even zonal gravitational coefficients caused by the rotational distortion. The next-order problem determines the corrections caused by the zonal flow, which is assumed to be confined within the molecular envelope and on cylinders parallel to the rotation axis. Our model provides the total even gravitational coefficients that can be compared with those acquired by the Juno spacecraft.
Sparse PCA with Oracle Property.

Science.gov (United States)

Gu, Quanquan; Wang, Zhaoran; Liu, Han

In this paper, we study the estimation of the k-dimensional sparse principal subspace of covariance matrix Σ in the high-dimensional setting. We aim to recover the oracle principal subspace solution, i.e., the principal subspace estimator obtained assuming the true support is known a priori. To this end, we propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations. In particular, under a weak assumption on the magnitude of the population projection matrix, one estimator within this family exactly recovers the true support with high probability, has exact rank k, and attains a [Formula: see text] statistical rate of convergence with s being the subspace sparsity level and n the sample size. Compared to existing support recovery results for sparse PCA, our approach does not hinge on the spiked covariance model or the limited correlation condition. As a complement to the first estimator that enjoys the oracle property, we prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA, even when the previous assumption on the magnitude of the projection matrix is violated. We validate the theoretical results by numerical experiments on synthetic datasets.
Technique detection software for Sparse Matrices

Directory of Open Access Journals (Sweden)

KHAN Muhammad Taimoor

2009-12-01

Sparse storage formats are techniques for storing and processing sparse matrix data efficiently. The performance of these storage formats depends upon the distribution of non-zeros within the matrix in different dimensions. To get better results, we need a technique that best suits the organization of data in a particular matrix. Selecting the right technique is therefore the main step towards improving the system's results; otherwise, efficiency can decrease. The purpose of this research is to help identify the best storage format, in terms of reduced storage size and high processing efficiency, for a sparse matrix.
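To make the notion of a storage format concrete, here is a minimal sketch of two common ones, coordinate (COO) and compressed sparse row (CSR). The function names and the `storage_words` size estimate are illustrative assumptions and are not taken from the software described in the record.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert coordinate (COO) triplets to compressed sparse row (CSR) arrays."""
    row_ptr = [0] * (n_rows + 1)
    for r in rows:                      # count non-zeros per row
        row_ptr[r + 1] += 1
    for i in range(n_rows):             # prefix sum gives row start offsets
        row_ptr[i + 1] += row_ptr[i]
    col_idx = [0] * len(vals)
    data = [0] * len(vals)
    next_slot = row_ptr[:-1]            # slicing copies; next free slot per row
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        col_idx[k], data[k] = c, v
        next_slot[r] += 1
    return row_ptr, col_idx, data


def csr_matvec(row_ptr, col_idx, data, x):
    """Sparse matrix-vector product y = A x for a CSR matrix."""
    return [
        sum(data[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


def storage_words(n_rows, nnz):
    """Stored words: COO keeps 3 per non-zero, CSR keeps 2 per non-zero plus row pointers."""
    return {"coo": 3 * nnz, "csr": 2 * nnz + n_rows + 1}
```

Which format wins depends on the non-zero distribution: CSR saves space when rows hold several non-zeros each, while COO can be smaller for matrices with many empty rows.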
Large-Scale Parallel Finite Element Analysis of the Stress Singular Problems

International Nuclear Information System (INIS)

Noriyuki Kushida; Hiroshi Okuda; Genki Yagawa

2002-01-01

In this paper, the convergence behavior of the large-scale parallel finite element method for stress singular problems was investigated. The convergence behavior of iterative solvers depends on the efficiency of the preconditioners. However, the efficiency of preconditioners may be influenced by the domain decomposition that is necessary for parallel FEM. In this study the following results were obtained: As expected, the conjugate gradient method without preconditioning and the diagonal-scaling preconditioned conjugate gradient method were not influenced by the domain decomposition. The symmetric successive over-relaxation preconditioned conjugate gradient method converged at most 6% faster when the stress singular area was contained in one sub-domain. (authors)
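The diagonal-scaling preconditioned conjugate gradient method mentioned above can be sketched as follows. This is a serial, dense-matrix illustration under assumed names (`pcg_diagonal` is hypothetical); it is not the authors' parallel FEM solver. Because the preconditioner only uses each unknown's own diagonal entry, it is unaffected by how unknowns are split across sub-domains, which is consistent with the result reported in the abstract.

```python
def pcg_diagonal(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients with diagonal (Jacobi) scaling for a symmetric positive definite A."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # the preconditioner M^-1
    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, iteration
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter
```

For example, `pcg_diagonal([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` returns an approximation of the exact solution (1/11, 7/11) together with the iteration count.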
Sparse Representations of Hyperspectral Images

KAUST Repository

Swanson, Robin J.

2015-11-23

Hyperspectral image data has long been an important tool for many areas of science. The addition of spectral data yields significant improvements in areas such as object and image classification, chemical and mineral composition detection, and astronomy. Traditional capture methods for hyperspectral data often require each wavelength to be captured individually, or by sacrificing spatial resolution. Recently there have been significant improvements in snapshot hyperspectral captures using, in particular, compressed sensing methods. As we move to a compressed sensing image formation model, the need for strong image priors to shape our reconstruction, as well as for a sparse basis, becomes more important. Here we compare several methods for representing hyperspectral images including learned three dimensional dictionaries, sparse convolutional coding, and decomposable nonlocal tensor dictionaries. Additionally, we further explore their parameter space to identify which parameters provide the most faithful and sparse representations.
The Crystal Structures of the N-terminal Photosensory Core Module of Agrobacterium Phytochrome Agp1 as Parallel and Anti-parallel Dimers

Science.gov (United States)

Nagano, Soshichiro; Scheerer, Patrick; Zubow, Kristina; Michael, Norbert; Inomata, Katsuhiko; Lamparter, Tilman; Krauß, Norbert

2016-01-01

Agp1 is a canonical biliverdin-binding bacteriophytochrome from the soil bacterium Agrobacterium fabrum that acts as a light-regulated histidine kinase. Crystal structures of the photosensory core modules (PCMs) of homologous phytochromes have provided a consistent picture of the structural changes that these proteins undergo during photoconversion between the parent red light-absorbing state (Pr) and the far-red light-absorbing state (Pfr). These changes include secondary structure rearrangements in the so-called tongue of the phytochrome-specific (PHY) domain and structural rearrangements within the long α-helix that connects the cGMP-specific phosphodiesterase, adenylyl cyclase, and FhlA (GAF) and the PHY domains. We present the crystal structures of the PCM of Agp1 at 2.70 Å resolution and of a surface-engineered mutant of this PCM at 1.85 Å resolution in the dark-adapted Pr states. Whereas in the mutant structure the dimer subunits are in anti-parallel orientation, the wild-type structure contains parallel subunits. The relative orientations between the PAS-GAF bidomain and the PHY domain are different in the two structures, due to movement involving two hinge regions in the GAF-PHY connecting α-helix and the tongue, indicating pronounced structural flexibility that may give rise to a dynamic Pr state. The resolution of the mutant structure enabled us to detect a sterically strained conformation of the chromophore at ring A that we attribute to the tight interaction with Pro-461 of the conserved PRXSF motif in the tongue. Based on this observation and on data from mutants where residues in the tongue region were replaced by alanine, we discuss the crucial roles of those residues in Pr-to-Pfr photoconversion. PMID:27466363

Supervised Convolutional Sparse Coding

KAUST Repository

Affara, Lama Ahmed

2018-04-08

Convolutional Sparse Coding (CSC) is a well-established image representation model especially suited for image restoration tasks. In this work, we extend the applicability of this model by proposing a supervised approach to convolutional sparse coding, which aims at learning discriminative dictionaries instead of purely reconstructive ones. We incorporate a supervised regularization term into the traditional unsupervised CSC objective to encourage the final dictionary elements to be discriminative. Experimental results show that using supervised convolutional learning results in two key advantages. First, we learn more semantically relevant filters in the dictionary and second, we achieve improved image reconstruction on unseen data.
Parallel iterative solution of the Hermite Collocation equations on GPUs II

International Nuclear Information System (INIS)

Vilanakis, N; Mathioudakis, E

2014-01-01

Hermite Collocation is a high order finite element method for Boundary Value Problems modelling applications in several fields of science and engineering. Application of this integration free numerical solver for the solution of linear BVPs results in a large and sparse general system of algebraic equations, suggesting the usage of an efficient iterative solver especially for realistic simulations. In part I of this work an efficient parallel algorithm of the Schur complement method coupled with Bi-Conjugate Gradient Stabilized (BiCGSTAB) iterative solver has been designed for multicore computing architectures with a Graphics Processing Unit (GPU). In the present work the proposed algorithm has been extended for high performance computing environments consisting of multiprocessor machines with multiple GPUs. Since this is a distributed GPU and shared CPU memory parallel architecture, a hybrid memory treatment is needed for the development of the parallel algorithm. The realization of the algorithm took place on a multiprocessor machine HP SL390 with Tesla M2070 GPUs using the OpenMP and OpenACC standards. Execution time measurements reveal the efficiency of the parallel implementation.
Sparse Representation of Deformable 3D Organs with Spherical Harmonics and Structured Dictionary

Directory of Open Access Journals (Sweden)

Dan Wang

2011-01-01

This paper proposed a novel algorithm to sparsely represent a deformable surface (SRDS) with low dimensionality based on spherical harmonic decomposition (SHD) and orthogonal subspace pursuit (OSP). The key idea in the SRDS method is to identify the subspaces from a training data set in the transformed spherical harmonic domain and then cluster each deformation into the best-fit subspace for fast and accurate representation. This algorithm is also generalized to applications involving organs with both interior and exterior surfaces. To test the feasibility, we first use computer models to demonstrate that the proposed approach matches the accuracy of complex mathematical modeling techniques, and then both ex vivo and in vivo experiments are conducted using 3D magnetic resonance imaging (MRI) scans for verification in practical settings. All results demonstrated that the proposed algorithm features sparse representation of deformable surfaces with low dimensionality and high accuracy. Specifically, the precision evaluated as maximum error distance between the reconstructed surface and the MRI ground truth is better than 3 mm in real MRI experiments.
Stationary states of a PT symmetric two-mode Bose–Einstein condensate

International Nuclear Information System (INIS)

Graefe, Eva-Maria

2012-01-01

The understanding of nonlinear PT symmetric quantum systems, arising for example in the theory of Bose–Einstein condensates in PT symmetric potentials, is widely based on numerical investigations, and little is known about generic features induced by the interplay of PT symmetry and nonlinearity. To gain deeper insights it is important to have analytically solvable toy models at hand. In the present paper the stationary states of a simple toy model of a PT symmetric system previously introduced in [1, 2] are investigated. The model can be interpreted as a simple description of a Bose–Einstein condensate in a PT symmetric double well trap in a two-mode approximation. The eigenvalues and eigenstates of the system can be explicitly calculated in a straightforward manner; the resulting structures resemble those that have recently been found numerically for a more realistic PT symmetric double delta potential. In addition, a continuation of the system is introduced that allows an interpretation in terms of a simple linear matrix model. This article is part of a special issue of Journal of Physics A: Mathematical and Theoretical devoted to ‘Quantum physics with non-Hermitian operators’. (paper)
MEASURING X-RAY VARIABILITY IN FAINT/SPARSELY SAMPLED ACTIVE GALACTIC NUCLEI

Energy Technology Data Exchange (ETDEWEB)

Allevato, V. [Department of Physics, University of Helsinki, Gustaf Haellstroemin katu 2a, FI-00014 Helsinki (Finland); Paolillo, M. [Department of Physical Sciences, University Federico II, via Cinthia 6, I-80126 Naples (Italy); Papadakis, I. [Department of Physics and Institute of Theoretical and Computational Physics, University of Crete, 71003 Heraklion (Greece); Pinto, C. [SRON Netherlands Institute for Space Research, Sorbonnelaan 2, 3584-CA Utrecht (Netherlands)

2013-07-01

We study the statistical properties of the normalized excess variance of variability processes characterized by a "red-noise" power spectral density (PSD), as in the case of active galactic nuclei (AGNs). We perform Monte Carlo simulations of light curves, assuming both a continuous and a sparse sampling pattern and various signal-to-noise ratios (S/Ns). We show that the normalized excess variance is a biased estimate of the variance even in the case of continuously sampled light curves. The bias depends on the PSD slope and on the sampling pattern, but not on the S/N. We provide a simple formula to account for the bias, which yields unbiased estimates with an accuracy better than 15%. We show that the normalized excess variance estimates based on single light curves (especially for sparse sampling and S/N < 3) are highly uncertain (even if corrected for bias) and we propose instead the use of an "ensemble estimate", based on multiple light curves of the same object, or on the use of light curves of many objects. These estimates have symmetric distributions, known errors, and can also be corrected for biases. We use our results to estimate the ability to measure the intrinsic source variability in current data, and show that they could also be useful in the planning of the observing strategy of future surveys such as those provided by X-ray missions studying distant and/or faint AGN populations and, more generally, in the estimation of the variability amplitude of sources that will result from future surveys such as Pan-STARRS and LSST.
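For readers unfamiliar with the quantity, the following sketch computes the normalized excess variance in its commonly used form: the sample variance of the light curve minus the mean squared measurement error, divided by the squared mean flux. The function names are hypothetical, the ensemble estimate is assumed here to be a plain average over light curves, and the bias-correction formula of the paper is not reproduced.

```python
def normalized_excess_variance(flux, flux_err):
    """(sample variance - mean squared error) / mean flux squared, for one light curve."""
    n = len(flux)
    mean = sum(flux) / n
    sample_var = sum((x - mean) ** 2 for x in flux) / (n - 1)
    mean_sq_err = sum(e * e for e in flux_err) / n
    return (sample_var - mean_sq_err) / mean ** 2


def ensemble_estimate(light_curves):
    """Assumed form of an ensemble estimate: the average over several (flux, error) light curves."""
    values = [normalized_excess_variance(f, e) for f, e in light_curves]
    return sum(values) / len(values)
```

Individual values can be negative when measurement noise dominates, which is one reason averaging over many light curves is more stable than relying on a single one.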
Parallel rendering

Science.gov (United States)

Crockett, Thomas W.

1995-01-01

This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
Sparse Frequency Waveform Design for Radar-Embedded Communication

Directory of Open Access Journals (Sweden)

Chaoyun Mai

2016-01-01

For tag applications that require covert communication, a method for sparse frequency waveform design based on radar-embedded communication is proposed. Firstly, sparse frequency waveforms are designed based on power spectral density fitting and the quasi-Newton method. Secondly, the eigenvalue decomposition of the sparse frequency waveform sequence is used to get the dominant space. Finally, the communication waveforms are designed through the projection of orthogonal pseudorandom vectors in the vertical subspace. Compared with the linear frequency modulation waveform, the sparse frequency waveform can further improve the bandwidth occupation of communication signals, thus achieving a higher communication rate. A certain correlation exists between the reciprocally orthogonal communication signal samples and the sparse frequency waveform, which guarantees a low SER (signal error rate) and LPI (low probability of intercept). The simulation results verify the effectiveness of this method.
Direct and iterative algorithms for the parallel solution of the one-dimensional macroscopic Navier-Stokes equations

International Nuclear Information System (INIS)

Doster, J.M.; Sills, E.D.

1986-01-01

Current efforts are under way to develop and evaluate numerical algorithms for the parallel solution of the large sparse matrix equations associated with the finite difference representation of the macroscopic Navier-Stokes equations. Previous work has shown that these equations can be cast into smaller coupled matrix equations suitable for solution utilizing multiple computer processors operating in parallel. The individual processors themselves may exhibit parallelism through the use of vector pipelines. This work has concentrated on the one-dimensional drift flux form of the Navier-Stokes equations. Direct and iterative algorithms that may be suitable for implementation on parallel computer architectures are evaluated in terms of accuracy and overall execution speed. This work has application to engineering and training simulations, on-line process control systems, and engineering workstations where increased computational speeds are required.
The 1/N Expansion of Tensor Models with Two Symmetric Tensors

Science.gov (United States)

Gurau, Razvan

2018-06-01

It is well known that tensor models for a tensor with no symmetry admit a 1/N expansion dominated by melonic graphs. This result relies crucially on identifying jackets, which are globally defined ribbon graphs embedded in the tensor graph. In contrast, no result of this kind has so far been established for symmetric tensors because global jackets do not exist. In this paper we introduce a new approach to the 1/N expansion in tensor models adapted to symmetric tensors. In particular we do not use any global structure like the jackets. We prove that, for any rank D, a tensor model with two symmetric tensors and interactions the complete graph K_{D+1} admits a 1/N expansion dominated by melonic graphs.
Symmetric extendibility of quantum states

OpenAIRE

Nowakowski, Marcin L.

2015-01-01

Studies on symmetric extendibility of quantum states become especially important in the context of analysis of one-way quantum measures of entanglement, distillability and security of quantum protocols. In this paper we analyse composite systems containing a symmetric extendible part, with particular attention devoted to one-way security of such systems. Further, we introduce a new one-way monotone based on the best symmetric approximation of a quantum state. We underpin those results with geome...
A Parallel Sweeping Preconditioner for Heterogeneous 3D Helmholtz Equations

KAUST Repository

Poulson, Jack

2013-05-02

A parallelization of a sweeping preconditioner for three-dimensional Helmholtz equations without large cavities is introduced and benchmarked for several challenging velocity models. The setup and application costs of the sequential preconditioner are shown to be O(γ^2 N^{4/3}) and O(γ N log N), where γ(ω) denotes the modestly frequency-dependent number of grid points per perfectly matched layer. Several computational and memory improvements are introduced relative to using black-box sparse-direct solvers for the auxiliary problems, and competitive runtimes and iteration counts are reported for high-frequency problems distributed over thousands of cores. Two open-source packages are released along with this paper: Parallel Sweeping Preconditioner (PSP) and the underlying distributed multifrontal solver, Clique. © 2013 Society for Industrial and Applied Mathematics.
A hybrid method for the parallel computation of Green's functions

DEFF Research Database (Denmark)

Petersen, Dan Erik; Li, Song; Stokbro, Kurt

2009-01-01

of the large number of times this calculation needs to be performed, this is computationally very expensive even on supercomputers. The classical approach is based on recurrence formulas which cannot be efficiently parallelized. This practically prevents the solution of large problems with hundreds...... of thousands of atoms. We propose new recurrences for a general class of sparse matrices to calculate Green's and lesser Green's function matrices which extend formulas derived by Takahashi and others. We show that these recurrences may lead to a dramatically reduced computational cost because they only...... require computing a small number of entries of the inverse matrix. Then we propose a parallelization strategy for block tridiagonal matrices which involves a combination of Schur complement calculations and cyclic reduction. It achieves good scalability even on problems of modest size....
Three components of postural control associated with pushing in symmetrical and asymmetrical stance.

Science.gov (United States)

Lee, Yun-Ju; Aruin, Alexander S

2013-07-01

A number of occupational and leisure activities that involve pushing are performed in symmetrical or asymmetrical stance. The goal of this study was to investigate early postural adjustments (EPAs), anticipatory postural adjustments (APAs), and compensatory postural adjustments (CPAs) during pushing performed while standing. Ten healthy volunteers stood in symmetrical stance (with feet parallel) or in asymmetrical stance (staggered stance with one foot forward) and were instructed to use both hands to push forward the handle of a pendulum attached to the ceiling. Bilateral EMG activity of the trunk and leg muscles and the center of pressure (COP) displacements in the anterior-posterior (AP) and medial-lateral (ML) directions were recorded and analyzed during the EPAs, APAs, and CPAs. The EMG activity and the COP displacement were different between the symmetrical and asymmetrical stance conditions. The COP displacements in the ML direction were significantly larger in staggered stance than in symmetrical stance. In staggered stance, the EPAs and APAs in the thigh muscles of the backward leg were significantly larger, and the CPAs were smaller than in the forward leg. There was no difference in the EMG activity of the trunk muscles between the stance conditions. The study outcome confirmed the existence of the three components of postural control (EPAs, APAs, and CPAs) in pushing. Moreover, standing asymmetrically was associated with asymmetrical patterns of EMG activity in the lower extremities reflecting the stance-related postural control during pushing. The study outcome provides a basis for studying postural control during other daily activities involving pushing.
Symmetric eikonal expansion

International Nuclear Information System (INIS)

Matsuki, Takayuki

1976-01-01

Symmetric eikonal expansion for the scattering amplitude is formulated for nonrelativistic and relativistic potential scatterings and also for the quantum field theory. The first approximations coincide with those of Levy and Sucher. The obtained scattering amplitudes are time reversal invariant for all cases and are crossing symmetric for the quantum field theory in each order of approximation. The improved eikonal phase introduced by Levy and Sucher is also derived from the different approximation scheme from the above. (auth.)
Structural analysis of alanine tripeptide with antiparallel and parallel beta-sheet structures in relation to the analysis of mixed beta-sheet structures in Samia cynthia ricini silk protein fiber using solid-state NMR spectroscopy.

Science.gov (United States)

Asakura, Tetsuo; Okonogi, Michi; Nakazawa, Yasumoto; Yamauchi, Kazuo

2006-05-10

The structural analysis of natural protein fibers with mixed parallel and antiparallel beta-sheet structures by solid-state NMR is reported. To obtain NMR parameters that can characterize these beta-sheet structures, ¹³C solid-state NMR experiments were performed on two alanine tripeptide samples: one with 100% parallel beta-sheet structure and the other with 100% antiparallel beta-sheet structure. All ¹³C resonances of the tripeptides could be assigned by a comparison of the methyl ¹³C resonances of Ala₃ with different [3-¹³C]Ala labeling schemes and also by a series of RFDR (radio frequency driven recoupling) spectra observed by changing mixing times. Two ¹³C resonances observed for each Ala residue could be assigned to two nonequivalent molecules per unit cell. Differences in the ¹³C chemical shifts and ¹³C spin-lattice relaxation times (T₁) were observed between the two beta-sheet structures. In particular, T₁ values about 3 times longer were obtained for the parallel beta-sheet structure than for the antiparallel beta-sheet structure, which could be explained by the difference in the hydrogen-bond networks of the two structures. This very large difference in T₁ is a good measure for differentiating between parallel and antiparallel beta-sheet structures. These differences in the NMR parameters found for the tripeptides may be applied to assign the parallel and antiparallel beta-sheet ¹³C resonances in the asymmetric and broad methyl spectra of [3-¹³C]Ala silk protein fiber of a wild silkworm, Samia cynthia ricini.
Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. I. An efficient and simple linear scaling local MP2 method that uses an intermediate basis of pair natural orbitals

Energy Technology Data Exchange (ETDEWEB)

Pinski, Peter; Riplinger, Christoph; Neese, Frank, E-mail: evaleev@vt.edu, E-mail: frank.neese@cec.mpg.de [Max Planck Institute for Chemical Energy Conversion, Stiftstr. 34-36, D-45470 Mülheim an der Ruhr (Germany); Valeev, Edward F., E-mail: evaleev@vt.edu, E-mail: frank.neese@cec.mpg.de [Department of Chemistry, Virginia Tech, Blacksburg, Virginia 24061 (United States)

2015-07-21

In this work, a systematic infrastructure is described that formalizes concepts implicit in previous work and greatly simplifies computer implementation of reduced-scaling electronic structure methods. The key concept is sparse representation of tensors using chains of sparse maps between two index sets. Sparse map representation can be viewed as a generalization of compressed sparse row, a common representation of a sparse matrix, to tensor data. By combining a few elementary operations on sparse maps (inversion, chaining, intersection, etc.), complex algorithms can be developed, illustrated here by a linear-scaling transformation of three-center Coulomb integrals based on our compact code library that implements sparse maps and operations on them. The sparsity of the three-center integrals arises from spatial locality of the basis functions and domain density fitting approximation. A novel feature of our approach is the use of differential overlap integrals computed in linear-scaling fashion for screening products of basis functions. Finally, a robust linear scaling domain based local pair natural orbital second-order Møller-Plesset (DLPNO-MP2) method is described based on the sparse map infrastructure that only depends on a minimal number of cutoff parameters that can be systematically tightened to approach 100% of the canonical MP2 correlation energy. With default truncation thresholds, DLPNO-MP2 recovers more than 99.9% of the canonical resolution of the identity MP2 (RI-MP2) energy while still showing a very early crossover with respect to the computational effort. Based on extensive benchmark calculations, relative energies are reproduced with an error of typically <0.2 kcal/mol. The efficiency of the local MP2 (LMP2) method can be drastically improved by carrying out the LMP2 iterations in a basis of pair natural orbitals. While the present work focuses on local electron correlation, it is of much broader applicability to computation with sparse tensors in
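The elementary sparse-map operations named in this abstract (inversion, chaining, intersection) can be illustrated with a small sketch. A sparse map is modeled here as a dictionary from an index in one set to a set of indices in another; this representation and all function names are assumptions for illustration and do not reflect the authors' code library.

```python
def chain(ab, bc):
    """Compose two sparse maps: a maps to the union of bc[b] over all b in ab[a]."""
    out = {}
    for a, bs in ab.items():
        cs = set()
        for b in bs:
            cs |= bc.get(b, set())
        if cs:
            out[a] = cs
    return out


def invert(ab):
    """Reverse a sparse map: b maps to every a whose image contains b."""
    out = {}
    for a, bs in ab.items():
        for b in bs:
            out.setdefault(b, set()).add(a)
    return out


def intersect(m1, m2):
    """Keep, for each shared key, only the targets present in both maps."""
    out = {}
    for a in m1.keys() & m2.keys():
        common = m1[a] & m2[a]
        if common:
            out[a] = common
    return out


def to_csr(sparse_map, n_rows):
    """Flatten a sparse map into CSR-style row pointers and sorted column indices."""
    row_ptr, col_idx = [0], []
    for a in range(n_rows):
        col_idx.extend(sorted(sparse_map.get(a, ())))
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx
```

The `to_csr` helper shows the connection to compressed sparse row that the abstract draws: a single sparse map is the index structure of a CSR matrix, and chaining maps extends the same idea to higher-order tensors.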
DEPOSITION DISTRICUTION AMONG THE PARALLEL PATHWAYS IN THE HUMAN LUNG CONDUCTING AIRWAY STRUCTURE.

Science.gov (United States)

DEPOSITION DISTRIBUTION AMONG THE PARALLEL PATHWAYS IN THE HUMAN LUNG CONDUCTING AIRWAY STRUCTURE. Chong S. Kim*, USEPA National Health and Environmental Effects Research Lab. RTP, NC 27711; Z. Zhang and C. Kleinstreuer, Department of Mechanical and Aerospace Engineering, North C...
Efficient Sparse Signal Transmission over a Lossy Link Using Compressive Sensing

Directory of Open Access Journals (Sweden)

Liantao Wu

2015-08-01

Reliable data transmission over a lossy communication link is expensive due to overheads for error protection. For signals that have inherent sparse structures, compressive sensing (CS) is applied to facilitate efficient sparse signal transmissions over lossy communication links without data compression or error protection. The natural packet loss in the lossy link is modeled as a random sampling process of the transmitted data, and the original signal is reconstructed from the lossy transmission results using the CS-based reconstruction method at the receiving end. The impacts of packet lengths on transmission efficiency under different channel conditions have been discussed, and interleaving is incorporated to mitigate the impact of burst data loss. Extensive simulations and experiments have been conducted and compared to the traditional automatic repeat request (ARQ) interpolation technique, and very favorable results have been observed in terms of both accuracy of the reconstructed signals and the transmission energy consumption. Furthermore, the packet length effect provides useful insights for using compressed sensing for efficient sparse signal transmission via lossy links.
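The role of interleaving mentioned in the abstract above can be illustrated with a generic block interleaver. The abstract does not say which interleaver the authors used, so the scheme, the function names, and the toy burst below are assumptions made only for illustration: symbols are written row by row and sent column by column, so a burst of consecutive losses on the link lands on widely separated positions of the original signal.

```python
def interleave(symbols, depth):
    """Write `symbols` row by row into `depth` rows, then read them column by column."""
    if len(symbols) % depth != 0:
        raise ValueError("length must be a multiple of depth")
    width = len(symbols) // depth
    return [symbols[r * width + c] for c in range(width) for r in range(depth)]


def deinterleave(received, depth):
    """Undo `interleave`, putting every symbol back at its original position."""
    if len(received) % depth != 0:
        raise ValueError("length must be a multiple of depth")
    width = len(received) // depth
    restored = [None] * len(received)
    position = 0
    for c in range(width):
        for r in range(depth):
            restored[r * width + c] = received[position]
            position += 1
    return restored


samples = list(range(12))
sent = interleave(samples, depth=3)

# Simulate a burst that wipes out three consecutive transmitted symbols.
damaged = list(sent)
for position in (3, 4, 5):
    damaged[position] = None

restored = deinterleave(damaged, depth=3)
lost_indices = [i for i, value in enumerate(restored) if value is None]
```

After deinterleaving, the three lost symbols sit four positions apart in the original ordering, which makes the loss pattern look more like the random sampling that CS reconstruction assumes.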
Effects of sparse sampling schemes on image quality in low-dose CT

International Nuclear Information System (INIS)

Abbas, Sajid; Lee, Taewon; Cho, Seungryong; Shin, Sukyoung; Lee, Rena

2013-01-01

Purpose: Various scanning methods and image reconstruction algorithms are actively investigated for low-dose computed tomography (CT) that can potentially reduce the health risk related to radiation dose. Particularly, compressive-sensing (CS) based algorithms have been successfully developed for reconstructing images from sparsely sampled data. Although these algorithms have shown promise in low-dose CT, it has not been studied how sparse sampling schemes affect image quality in CS-based image reconstruction. In this work, the authors present several sparse-sampling schemes for low-dose CT, quantitatively analyze their data properties, and compare effects of the sampling schemes on the image quality. Methods: Data properties of several sampling schemes are analyzed with respect to the CS-based image reconstruction using two measures: sampling density and data incoherence. The authors present five different sparse sampling schemes, and simulate those schemes to achieve a targeted dose reduction. Dose reduction factors of about 75% and 87.5%, compared to a conventional scan, were tested. A fully sampled circular cone-beam CT data set was used as a reference, and sparse sampling has been realized numerically based on the CBCT data. Results: It is found that both sampling density and data incoherence affect the image quality in the CS-based reconstruction. Among the sampling schemes the authors investigated, the sparse-view, many-view undersampling (MVUS)-fine, and MVUS-moving cases have shown promising results. These sampling schemes produced images with similar image quality compared to the reference image and their structure similarity index values were higher than 0.92 in the mouse head scan with 75% dose reduction. Conclusions: The authors found that in CS-based image reconstructions both sampling density and data incoherence affect the image quality, and suggest that a sampling scheme should be devised and optimized by use of these indicators. With this strategic
Seismic detection method for small-scale discontinuities based on dictionary learning and sparse representation

Science.gov (United States)

Yu, Caixia; Zhao, Jingtao; Wang, Yanfei

2017-02-01

Studying small-scale geologic discontinuities, such as faults, cavities and fractures, plays a vital role in analyzing the inner conditions of reservoirs, as these geologic structures and elements can provide storage spaces and migration pathways for petroleum. However, these geologic discontinuities have weak energy and are easily contaminated by noise, and therefore effectively extracting them from seismic data becomes a challenging problem. In this paper, a method for detecting small-scale discontinuities using dictionary learning and sparse representation is proposed that can extract high-resolution information by sparse coding. A K-SVD (K-means clustering via Singular Value Decomposition) sparse representation model with a two-stage iterative procedure, sparse coding and dictionary updating, is suggested for mathematically expressing these seismic small-scale discontinuities. Generally, the orthogonal matching pursuit (OMP) algorithm is employed for sparse coding. However, the method can only update one dictionary atom at a time. In order to improve calculation efficiency, a regularized version of the OMP algorithm is presented for simultaneously updating a number of atoms at one time. Two numerical experiments demonstrate the validity of the developed method for clarifying and enhancing small-scale discontinuities. The field example of carbonate reservoirs further demonstrates its effectiveness in revealing masked tiny faults and small-scale cavities.
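For readers unfamiliar with the sparse coding step, the following is a minimal sketch of plain orthogonal matching pursuit, the baseline that the abstract above says handles one atom per iteration. It is not the regularized variant the authors propose, and the helper names, the tiny dictionary, and the signal are hypothetical. It assumes the selected atoms are linearly independent so that the small least-squares system is solvable.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve a small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def omp(atoms, y, n_nonzero):
    """Greedy sparse coding: add one atom per iteration, then refit all chosen atoms."""
    residual = list(y)
    support, coeffs = [], []
    for _ in range(n_nonzero):
        # Select the unused atom most correlated with the current residual.
        scores = [abs(dot(a, residual)) for a in atoms]
        best = max((j for j in range(len(atoms)) if j not in support),
                   key=lambda j: scores[j])
        support.append(best)
        # Least-squares refit on the support via the normal equations.
        chosen = [atoms[j] for j in support]
        gram = [[dot(a, b) for b in chosen] for a in chosen]
        rhs = [dot(a, y) for a in chosen]
        coeffs = solve(gram, rhs)
        approx = [sum(c * a[i] for c, a in zip(coeffs, chosen)) for i in range(len(y))]
        residual = [yi - ai for yi, ai in zip(y, approx)]
    return dict(zip(support, coeffs))


# Hypothetical dictionary of four unit-norm atoms in R^3 and a 2-sparse signal.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
signal = [3.0, 0.0, -1.0]
code = omp(atoms, signal, n_nonzero=2)
```

The one-atom-per-iteration selection inside the loop is the step a regularized variant would replace with the selection of a group of atoms.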

Improving matrix-vector product performance and multi-level preconditioning for the parallel PCG package

Energy Technology Data Exchange (ETDEWEB)

McLay, R.T.; Carey, G.F.

1996-12-31

In this study we consider parallel solution of sparse linear systems arising from discretized PDEs. As part of our continuing work on our parallel PCG Solver package, we have made improvements in two areas. The first is improving the performance of the matrix-vector product. Here on regular finite-difference grids, we are able to use the cache memory more efficiently for smaller domains or where there are multiple degrees of freedom. The second problem of interest in the present work is the construction of preconditioners in the context of the parallel PCG solver we are developing. Here the problem is partitioned over a set of processor subdomains and the matrix-vector product for PCG is carried out in parallel for overlapping grid subblocks. For problems of scaled speedup, the actual rate of convergence of the unpreconditioned system deteriorates as the mesh is refined. Multigrid and subdomain strategies provide a logical approach to resolving the problem. We consider the parallel trade-offs between communication and computation and provide a complexity analysis of a representative algorithm. Some preliminary calculations using the parallel package and comparisons with other preconditioners are provided together with parallel performance results.
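As a reminder of where the matrix-vector product and the preconditioner enter a PCG iteration, here is a small serial sketch. It is not the authors' package: a simple Jacobi (diagonal) preconditioner is swapped in for the multilevel and subdomain preconditioners discussed above, and the tridiagonal test matrix and all names are assumptions chosen for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(diag, off, x):
    """y = A x for a symmetric tridiagonal A with diagonal `diag` and off-diagonal `off`."""
    n = len(x)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += off[i] * x[i + 1]
        y[i + 1] += off[i] * x[i]
    return y


def pcg(diag, off, b, tol=1e-10, max_iter=100):
    """Preconditioned conjugate gradients with a Jacobi preconditioner M = diag(A)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [r[i] / diag[i] for i in range(n)]        # apply M^{-1}
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ap = matvec(diag, off, p)                 # the dominant cost per iteration
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [r[i] / diag[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x


# Hypothetical symmetric positive definite test system (diagonally dominant).
diag = [4.0, 5.0, 6.0, 5.0, 4.0]
off = [-1.0, -1.0, -1.0, -1.0]
rhs = [1.0, 2.0, 3.0, 2.0, 1.0]
solution = pcg(diag, off, rhs)
```

In a distributed setting, `matvec` and the preconditioner application are the two places where subdomain communication would occur, which is why the abstract treats them as the main tuning targets.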
Complex {PT}-symmetric extensions of the nonlinear ultra-short light pulse model

Science.gov (United States)

Yan, Zhenya

2012-11-01

The short pulse equation u_{xt} = u + \frac{1}{2}(u^2 u_x)_x is PT symmetric, which arises in nonlinear optics for the ultra-short pulse case. We present a family of new complex PT-symmetric extensions of the short pulse equation, i[(iu_x)^{\sigma}]_t = au + bu^m + ic[u^n (iu_x)^{\epsilon}]_x (\sigma, \epsilon, a, b, c, m, n \in \mathbb{R}), based on the complex PT-symmetric extension principle. Some properties of these equations with some chosen parameters are studied including the Hamiltonian structures and exact solutions such as solitary wave solutions, doubly periodic wave solutions and compacton solutions. Our results may be useful to understand complex PT-symmetric nonlinear physical models. This article is part of a special issue of Journal of Physics A: Mathematical and Theoretical devoted to ‘Quantum physics with non-Hermitian operators’.
Sparse Variational Bayesian SAGE Algorithm With Application to the Estimation of Multipath Wireless Channels

DEFF Research Database (Denmark)

Shutin, Dmitriy; Fleury, Bernard Henri

2011-01-01

In this paper, we develop a sparse variational Bayesian (VB) extension of the space-alternating generalized expectation-maximization (SAGE) algorithm for the high resolution estimation of the parameters of relevant multipath components in the response of frequency and spatially selective wireless...... channels. The application context of the algorithm considered in this contribution is parameter estimation from channel sounding measurements for radio channel modeling purpose. The new sparse VB-SAGE algorithm extends the classical SAGE algorithm in two respects: i) by monotonically minimizing...... parametric sparsity priors for the weights of the multipath components. We revisit the Gaussian sparsity priors within the sparse VB-SAGE framework and extend the results by considering Laplace priors. The structure of the VB-SAGE algorithm allows for an analytical stability analysis of the update expression...
RT-Symmetric Laplace Operators on Star Graphs: Real Spectrum and Self-Adjointness

Directory of Open Access Journals (Sweden)

Maria Astudillo

2015-01-01

How ideas of PT-symmetric quantum mechanics can be applied to quantum graphs is analyzed, in particular to the star graph. The class of rotationally symmetric vertex conditions is analyzed. It is shown that all such conditions can effectively be described by circulant matrices: real in the case of odd number of edges and complex having particular block structure in the even case. Spectral properties of the corresponding operators are discussed.
Sparse representation based image interpolation with nonlocal autoregressive modeling.

Science.gov (United States)

Dong, Weisheng; Zhang, Lei; Lukac, Rastislav; Shi, Guangming

2013-04-01

Sparse representation is proven to be a promising approach to image super-resolution, where the low-resolution (LR) image is usually modeled as the down-sampled version of its high-resolution (HR) counterpart after blurring. When the blurring kernel is the Dirac delta function, i.e., the LR image is directly down-sampled from its HR counterpart without blurring, the super-resolution problem becomes an image interpolation problem. In such cases, however, the conventional sparse representation models (SRM) become less effective, because the data fidelity term fails to constrain the image local structures. In natural images, fortunately, many nonlocal similar patches to a given patch could provide nonlocal constraint to the local structure. In this paper, we incorporate the image nonlocal self-similarity into SRM for image interpolation. More specifically, a nonlocal autoregressive model (NARM) is proposed and taken as the data fidelity term in SRM. We show that the NARM-induced sampling matrix is less coherent with the representation dictionary, and consequently makes SRM more effective for image interpolation. Our extensive experimental results demonstrate that the proposed NARM-based image interpolation method can effectively reconstruct the edge structures and suppress the jaggy/ringing artifacts, achieving the best image interpolation results so far in terms of PSNR as well as perceptual quality metrics such as SSIM and FSIM.
Group sparse canonical correlation analysis for genomic data integration.

Science.gov (United States)

Lin, Dongdong; Zhang, Jigang; Li, Jingyao; Calhoun, Vince D; Deng, Hong-Wen; Wang, Yu-Ping

2013-08-12

The emergence of high-throughput genomic datasets from different sources and platforms (e.g., gene expression, single nucleotide polymorphisms (SNP), and copy number variation (CNV)) has greatly enhanced our understanding of the interplay of these genomic factors as well as their influences on complex diseases. It is challenging to explore the relationship between these different types of genomic data sets. In this paper, we focus on a multivariate statistical method, canonical correlation analysis (CCA), for this problem. The conventional CCA method does not work effectively if the number of data samples is significantly less than that of biomarkers, which is a typical case for genomic data (e.g., SNPs). Sparse CCA (sCCA) methods were introduced to overcome such difficulty, mostly using penalizations with l-1 norm (CCA-l1) or the combination of l-1 and l-2 norm (CCA-elastic net). However, they overlook the structural or group effects within genomic data in the analysis, which often exist and are important (e.g., SNPs spanning a gene interact and work together as a group). We propose a new group sparse CCA method (CCA-sparse group) along with an effective numerical algorithm to study the mutual relationship between two different types of genomic data (i.e., SNP and gene expression). We then extend the model to a more general formulation that can include the existing sCCA models. We apply the model to feature/variable selection from two data sets and compare our group sparse CCA method with existing sCCA methods on both simulation and two real datasets (human gliomas data and NCI60 data). We use a graphical representation of the samples with a pair of canonical variates to demonstrate the discriminating characteristic of the selected features. Pathway analysis is further performed for biological interpretation of those features. The CCA-sparse group method incorporates group effects of features into the correlation analysis while performing individual feature
One-shot 3D scanning by combining sparse landmarks with dense gradient information

Science.gov (United States)

Di Martino, Matías; Flores, Jorge; Ferrari, José A.

2018-06-01

Scene understanding is one of the most challenging and popular problems in the field of robotics and computer vision and the estimation of 3D information is at the core of most of these applications. In order to retrieve the 3D structure of a test surface we propose a single shot approach that combines dense gradient information with sparse absolute measurements. To that end, we designed a colored pattern that codes fine horizontal and vertical fringes, with sparse corner landmarks. By measuring the deformation (bending) of horizontal and vertical fringes, we are able to estimate surface local variations (i.e. its gradient field). Then sparse corner landmarks are detected and matched to infer sparse absolute information about the test surface height. Local gradient information is combined with the sparse absolute values which work as anchors to guide the integration process. We show that this can be mathematically done in a very compact and intuitive way by properly defining a Poisson-like partial differential equation. Then we address in detail how the problem can be formulated in a discrete domain and how it can be practically solved by straightforward linear numerical solvers. Finally, validation experiments are presented.
Mesotherapy for benign symmetric lipomatosis.

Science.gov (United States)

Hasegawa, Toshio; Matsukura, Tomoyuki; Ikeda, Shigaku

2010-04-01

Benign symmetric lipomatosis, also known as Madelung disease, is a rare disorder characterized by fat distribution around the shoulders, arms, and neck in the context of chronic alcoholism. Complete excision of nonencapsulated lipomas is difficult. However, reports describing conservative therapeutic measures for lipomatosis are rare. The authors present the case of a 42-year-old man with a diagnosis of benign symmetric lipomatosis who had multiple, large, symmetrical masses in his neck. Multiple phosphatidylcholine injections in the neck were administered 4 weeks apart, a total of seven times to achieve lipolysis. The patient's lipomatosis improved in response to the injections, and he achieved good cosmetic results. Intralesional injection, termed mesotherapy, using phosphatidylcholine is a potentially effective therapy for benign symmetric lipomatosis that should be reconsidered as a therapeutic option for this disease.
A parallel additive Schwarz preconditioned Jacobi-Davidson algorithm for polynomial eigenvalue problems in quantum dot simulation

International Nuclear Information System (INIS)

Hwang, F-N; Wei, Z-H; Huang, T-M; Wang Weichung

2010-01-01

We develop a parallel Jacobi-Davidson approach for finding a partial set of eigenpairs of large sparse polynomial eigenvalue problems with application in quantum dot simulation. A Jacobi-Davidson eigenvalue solver is implemented based on the Portable, Extensible Toolkit for Scientific Computation (PETSc). The eigensolver thus inherits PETSc's efficient parallel operations, variety of linear solvers and preconditioning schemes, and ease of use. The parallel eigenvalue solver is then used to solve higher degree polynomial eigenvalue problems arising in numerical simulations of three dimensional quantum dots governed by Schroedinger's equations. We find that the parallel restricted additive Schwarz preconditioner in conjunction with a parallel Krylov subspace method (e.g. GMRES) can solve the correction equations, the most costly step in the Jacobi-Davidson algorithm, very efficiently in parallel. Besides, the overall performance is quite satisfactory. We have observed near perfect superlinear speedup by using up to 320 processors. The parallel eigensolver can find all target interior eigenpairs of a quintic polynomial eigenvalue problem with more than 32 million variables within 12 minutes by using 272 Intel 3.0 GHz processors.
The numerical parallel computing of photon transport

International Nuclear Information System (INIS)

Huang Qingnan; Liang Xiaoguang; Zhang Lifa

1998-12-01

The parallel computing of photon transport is investigated; the parallel algorithm and the parallelization of programs on parallel computers with both shared memory and distributed memory are discussed. By analyzing the inherent law of the mathematical and physical model of photon transport according to the structural features of parallel computers, using a 'divide and conquer' strategy, adjusting the algorithm structure of the program, dissolving data dependencies, finding parallelizable components and creating large-grain parallel subtasks, the sequential computing of photon transport is efficiently transformed into parallel and vector computing. The program was run on various HP parallel computers such as the HY-1 (PVP), the Challenge (SMP) and the YH-3 (MPP), and very good parallel speedup was obtained.
Sparse reconstruction using distribution agnostic bayesian matching pursuit

KAUST Repository

Masood, Mudassir

2013-11-01

A fast matching pursuit method using a Bayesian approach is introduced for sparse signal recovery. This method performs Bayesian estimates of sparse signals even when the signal prior is non-Gaussian or unknown. It is agnostic on signal statistics and utilizes a priori statistics of additive noise and the sparsity rate of the signal, which are shown to be easily estimated from data if not available. The method utilizes a greedy approach and order-recursive updates of its metrics to find the most dominant sparse supports to determine the approximate minimum mean-square error (MMSE) estimate of the sparse signal. Simulation results demonstrate the power and robustness of our proposed estimator. © 2013 IEEE.
Image understanding using sparse representations

CERN Document Server

Thiagarajan, Jayaraman J; Turaga, Pavan; Spanias, Andreas

2014-01-01

Image understanding has been playing an increasingly crucial role in several inverse problems and computer vision. Sparse models form an important component in image understanding, since they emulate the activity of neural receptors in the primary visual cortex of the human brain. Sparse methods have been utilized in several learning problems because of their ability to provide parsimonious, interpretable, and efficient models. Exploiting the sparsity of natural signals has led to advances in several application areas including image compression, denoising, inpainting, compressed sensing, blin
Parallel checksumming of data chunks of a shared data object using a log-structured file system

Science.gov (United States)

Bent, John M.; Faibish, Sorin; Grider, Gary

2016-09-06

Checksum values are generated and used to verify the data integrity. A client executing in a parallel computing system stores a data chunk to a shared data object on a storage node in the parallel computing system. The client determines a checksum value for the data chunk; and provides the checksum value with the data chunk to the storage node that stores the shared object. The data chunk can be stored on the storage node with the corresponding checksum value as part of the shared object. The storage node may be part of a Parallel Log-Structured File System (PLFS), and the client may comprise, for example, a Log-Structured File System client on a compute node or burst buffer. The checksum value can be evaluated when the data chunk is read from the storage node to verify the integrity of the data that is read.
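The write-then-verify flow described in the record above can be sketched as follows. This is a toy in-memory model, not PLFS: the `StorageNode` class, the `client_write` helper, and the choice of CRC-32 from the standard `zlib` module are all assumptions made for illustration, since the abstract does not name a checksum algorithm or an API.

```python
import zlib


class StorageNode:
    """Hypothetical in-memory stand-in for a storage node holding a shared object."""

    def __init__(self):
        self._chunks = {}  # offset -> (data, checksum)

    def write(self, offset, data, checksum):
        # The checksum is stored alongside the chunk as part of the shared object.
        self._chunks[offset] = (bytes(data), checksum)

    def read(self, offset):
        data, checksum = self._chunks[offset]
        # Re-evaluate the checksum on read to verify integrity.
        if zlib.crc32(data) != checksum:
            raise ValueError(f"checksum mismatch for chunk at offset {offset}")
        return data


def client_write(node, offset, data):
    """Client side: compute the checksum, then send it together with the chunk."""
    node.write(offset, data, zlib.crc32(data))


node = StorageNode()
client_write(node, 0, b"first chunk")
client_write(node, 11, b"second chunk")
first = node.read(0)
```

Because each client computes the checksum for its own chunk before sending it, the checksum work is spread across the writers rather than concentrated on the storage node.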
Sparse inpainting and isotropy

Energy Technology Data Exchange (ETDEWEB)

Feeney, Stephen M.; McEwen, Jason D.; Peiris, Hiranya V. [Department of Physics and Astronomy, University College London, Gower Street, London, WC1E 6BT (United Kingdom); Marinucci, Domenico; Cammarota, Valentina [Department of Mathematics, University of Rome Tor Vergata, via della Ricerca Scientifica 1, Roma, 00133 (Italy); Wandelt, Benjamin D., E-mail: s.feeney@imperial.ac.uk, E-mail: marinucc@axp.mat.uniroma2.it, E-mail: jason.mcewen@ucl.ac.uk, E-mail: h.peiris@ucl.ac.uk, E-mail: wandelt@iap.fr, E-mail: cammarot@axp.mat.uniroma2.it [Kavli Institute for Theoretical Physics, Kohn Hall, University of California, 552 University Road, Santa Barbara, CA, 93106 (United States)

2014-01-01

Sparse inpainting techniques are gaining in popularity as a tool for cosmological data analysis, in particular for handling data which present masked regions and missing observations. We investigate here the relationship between sparse inpainting techniques using the spherical harmonic basis as a dictionary and the isotropy properties of cosmological maps, as for instance those arising from cosmic microwave background (CMB) experiments. In particular, we investigate the possibility that inpainted maps may exhibit anisotropies in the behaviour of higher-order angular polyspectra. We provide analytic computations and simulations of inpainted maps for a Gaussian isotropic model of CMB data, suggesting that the resulting angular trispectrum may exhibit small but non-negligible deviations from isotropy.
Superresolving Black Hole Images with Full-Closure Sparse Modeling

Science.gov (United States)

Crowley, Chelsea; Akiyama, Kazunori; Fish, Vincent

2018-01-01

It is believed that almost all galaxies have black holes at their centers. Imaging a black hole is a primary objective to answer scientific questions relating to relativistic accretion and jet formation. The Event Horizon Telescope (EHT) is set to capture images of two nearby black holes, Sagittarius A* at the center of the Milky Way galaxy roughly 26,000 light years away and the other M87 which is in Virgo A, a large elliptical galaxy that is 50 million light years away. Sparse imaging techniques have shown great promise for reconstructing high-fidelity superresolved images of black holes from simulated data. Previous work has included the effects of atmospheric phase errors and thermal noise, but not systematic amplitude errors that arise due to miscalibration. We explore a full-closure imaging technique with sparse modeling that uses closure amplitudes and closure phases to improve the imaging process. This new technique can successfully handle data with systematic amplitude errors. Applying our technique to synthetic EHT data of M87, we find that full-closure sparse modeling can reconstruct images better than traditional methods and recover key structural information on the source, such as the shape and size of the predicted photon ring. These results suggest that our new approach will provide superior imaging performance for data from the EHT and other interferometric arrays.
Sparse matrix-vector multiplication on GPGPU clusters: A new storage format and a scalable implementation

OpenAIRE

Kreutzer, Moritz; Hager, Georg; Wellein, Gerhard; Fehske, Holger; Basermann, Achim; Bishop, Alan R.

2011-01-01

Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia “Fermi” class of GPGPUs. A new “padded jagged diagonals storage” (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme while making no assumptions about the matrix structure. In our test scenarios the pJDS format cuts the ...
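To make the memory overhead mentioned above concrete, the sketch below builds an ELLPACK-R style layout (rows padded to a common width, plus a per-row length array) and multiplies it by a vector in plain Python. It does not implement pJDS itself, whose details are not given in the truncated abstract; the function names and the 3x3 example matrix are hypothetical, and a real GPGPU kernel would store the padded arrays column-major for coalesced access.

```python
def to_ellpack_r(rows, pad_col=0):
    """Build padded column/value arrays plus row lengths.

    `rows` is a list with one entry per matrix row, each a list of (column, value) pairs.
    """
    width = max((len(r) for r in rows), default=0)
    cols, vals, row_len = [], [], []
    for r in rows:
        padding = width - len(r)
        cols.append([c for c, _ in r] + [pad_col] * padding)
        vals.append([v for _, v in r] + [0.0] * padding)
        row_len.append(len(r))
    return cols, vals, row_len


def spmv_ellpack_r(cols, vals, row_len, x):
    """y = A x, reading only the first row_len[i] entries of each padded row."""
    y = []
    for i in range(len(row_len)):
        acc = 0.0
        for k in range(row_len[i]):
            acc += vals[i][k] * x[cols[i][k]]
        y.append(acc)
    return y


# Hypothetical matrix [[4, 0, 1], [0, 2, 0], [3, 0, 5]] in (column, value) form.
rows = [[(0, 4.0), (2, 1.0)], [(1, 2.0)], [(0, 3.0), (2, 5.0)]]
cols, vals, row_len = to_ellpack_r(rows)
y = spmv_ellpack_r(cols, vals, row_len, [1.0, 2.0, 3.0])

# Padding overhead: stored slots that hold no real matrix entry.
padded_slots = sum(len(v) for v in vals) - sum(row_len)
```

The `padded_slots` count grows with the spread in row lengths, which is the overhead that sorting rows by length and padding in blocks is meant to reduce.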
Introduction to parallel programming

CERN Document Server

Brawer, Steven

1989-01-01

Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race
Adaptive Distributed Data Structure Management for Parallel CFD Applications

KAUST Repository

Frisch, Jerome

2013-09-01

Computational fluid dynamics (CFD) simulations require a lot of computing resources in terms of CPU time and memory in order to compute with a reasonable physical accuracy. If only uniformly refined domains are applied, the amount of computing cells is growing rather fast if a certain small resolution is physically required. This can be remedied by applying adaptively refined grids. Unfortunately, due to the adaptive refinement procedures, errors are introduced which have to be taken into account. This paper is focussing on implementation details of the applied adaptive data structure management and a qualitative analysis of the introduced errors by analysing a Poisson problem on the given data structure, which has to be solved in every time step of a CFD analysis. Furthermore an adaptive CFD benchmark example is computed, showing the benefits of an adaptive refinement as well as measurements of parallel data distribution and performance. © 2013 IEEE.
Decentralized modal identification using sparse blind source separation

International Nuclear Information System (INIS)

Sadhu, A; Hazra, B; Narasimhan, S; Pandey, M D

2011-01-01

Popular ambient vibration-based system identification methods process information collected from a dense array of sensors centrally to yield the modal properties. In such methods, the need for a centralized processing unit capable of satisfying large memory and processing demands is unavoidable. With the advent of wireless smart sensor networks, it is now possible to process information locally at the sensor level, instead. The information at the individual sensor level can then be concatenated to obtain the global structure characteristics. A novel decentralized algorithm based on wavelet transforms to infer global structure mode information using measurements obtained using a small group of sensors at a time is proposed in this paper. The focus of the paper is on algorithmic development, while the actual hardware and software implementation is not pursued here. The problem of identification is cast within the framework of under-determined blind source separation invoking transformations of measurements to the time–frequency domain resulting in a sparse representation. The partial mode shape coefficients so identified are then combined to yield complete modal information. The transformations are undertaken using stationary wavelet packet transform (SWPT), yielding a sparse representation in the wavelet domain. Principal component analysis (PCA) is then performed on the resulting wavelet coefficients, yielding the partial mixing matrix coefficients from a few measurement channels at a time. This process is repeated using measurements obtained from multiple sensor groups, and the results so obtained from each group are concatenated to obtain the global modal characteristics of the structure
Decentralized modal identification using sparse blind source separation

Science.gov (United States)

Sadhu, A.; Hazra, B.; Narasimhan, S.; Pandey, M. D.

2011-12-01

Popular ambient vibration-based system identification methods process information collected from a dense array of sensors centrally to yield the modal properties. In such methods, the need for a centralized processing unit capable of satisfying large memory and processing demands is unavoidable. With the advent of wireless smart sensor networks, it is now possible to process information locally at the sensor level, instead. The information at the individual sensor level can then be concatenated to obtain the global structure characteristics. A novel decentralized algorithm based on wavelet transforms to infer global structure mode information using measurements obtained using a small group of sensors at a time is proposed in this paper. The focus of the paper is on algorithmic development, while the actual hardware and software implementation is not pursued here. The problem of identification is cast within the framework of under-determined blind source separation invoking transformations of measurements to the time-frequency domain resulting in a sparse representation. The partial mode shape coefficients so identified are then combined to yield complete modal information. The transformations are undertaken using stationary wavelet packet transform (SWPT), yielding a sparse representation in the wavelet domain. Principal component analysis (PCA) is then performed on the resulting wavelet coefficients, yielding the partial mixing matrix coefficients from a few measurement channels at a time. This process is repeated using measurements obtained from multiple sensor groups, and the results so obtained from each group are concatenated to obtain the global modal characteristics of the structure.
