Efficient conjugate gradient algorithms for computation of the manipulator forward dynamics
Fijany, Amir; Scheid, Robert E.
1989-01-01
The applicability of conjugate gradient algorithms for computation of the manipulator forward dynamics is investigated. The redundancies in the previously proposed conjugate gradient algorithm are analyzed. A new version is developed which, by avoiding these redundancies, achieves a significantly greater efficiency. A preconditioned conjugate gradient algorithm is also presented. A diagonal matrix whose elements are the diagonal elements of the inertia matrix is proposed as the preconditioner. In order to increase the computational efficiency, an algorithm is developed which exploits the synergism between the computation of the diagonal elements of the inertia matrix and that required by the conjugate gradient algorithm.
Automated Development of Accurate Algorithms and Efficient Codes for Computational Aeroacoustics
Goodrich, John W.; Dyson, Rodger W.
1999-01-01
The simulation of sound generation and propagation in three space dimensions with realistic aircraft components is a very large time dependent computation with fine details. Simulations in open domains with embedded objects require accurate and robust algorithms for propagation, for artificial inflow and outflow boundaries, and for the definition of geometrically complex objects. The development, implementation, and validation of methods for solving these demanding problems is being done to support the NASA pillar goals for reducing aircraft noise levels. Our goal is to provide algorithms which are sufficiently accurate and efficient to produce usable results rapidly enough to allow design engineers to study the effects on sound levels of design changes in propulsion systems, and in the integration of propulsion systems with airframes. There is a lack of design tools for these purposes at this time. Our technical approach to this problem combines the development of new, algorithms with the use of Mathematica and Unix utilities to automate the algorithm development, code implementation, and validation. We use explicit methods to ensure effective implementation by domain decomposition for SPMD parallel computing. There are several orders of magnitude difference in the computational efficiencies of the algorithms which we have considered. We currently have new artificial inflow and outflow boundary conditions that are stable, accurate, and unobtrusive, with implementations that match the accuracy and efficiency of the propagation methods. The artificial numerical boundary treatments have been proven to have solutions which converge to the full open domain problems, so that the error from the boundary treatments can be driven as low as is required. The purpose of this paper is to briefly present a method for developing highly accurate algorithms for computational aeroacoustics, the use of computer automation in this process, and a brief survey of the algorithms that
An efficient algorithm to compute subsets of points in ℤ n
Pacheco Martínez, Ana María; Real Jurado, Pedro
2012-01-01
In this paper we show a more efficient algorithm than that in [8] to compute subsets of points non-congruent by isometries. This algorithm can be used to reconstruct the object from the digital image. Both algorithms are compared, highlighting the improvements obtained in terms of CPU time.
Investigating the Multi-memetic Mind Evolutionary Computation Algorithm Efficiency
Directory of Open Access Journals (Sweden)
M. K. Sakharov
2017-01-01
Full Text Available In solving practically significant problems of global optimization, the objective function is often of high dimensionality and computational complexity and of nontrivial landscape as well. Studies show that often one optimization method is not enough for solving such problems efficiently - hybridization of several optimization methods is necessary.One of the most promising contemporary trends in this field are memetic algorithms (MA, which can be viewed as a combination of the population-based search for a global optimum and the procedures for a local refinement of solutions (memes, provided by a synergy. Since there are relatively few theoretical studies concerning the MA configuration, which is advisable for use to solve the black-box optimization problems, many researchers tend just to adaptive algorithms, which for search select the most efficient methods of local optimization for the certain domains of the search space.The article proposes a multi-memetic modification of a simple SMEC algorithm, using random hyper-heuristics. Presents the software algorithm and memes used (Nelder-Mead method, method of random hyper-sphere surface search, Hooke-Jeeves method. Conducts a comparative study of the efficiency of the proposed algorithm depending on the set and the number of memes. The study has been carried out using Rastrigin, Rosenbrock, and Zakharov multidimensional test functions. Computational experiments have been carried out for all possible combinations of memes and for each meme individually.According to results of study, conducted by the multi-start method, the combinations of memes, comprising the Hooke-Jeeves method, were successful. These results prove a rapid convergence of the method to a local optimum in comparison with other memes, since all methods perform the fixed number of iterations at the most.The analysis of the average number of iterations shows that using the most efficient sets of memes allows us to find the optimal
Efficient Geo-Computational Algorithms for Constructing Space-Time Prisms in Road Networks
Directory of Open Access Journals (Sweden)
Hui-Ping Chen
2016-11-01
Full Text Available The Space-time prism (STP is a key concept in time geography for analyzing human activity-travel behavior under various Space-time constraints. Most existing time-geographic studies use a straightforward algorithm to construct STPs in road networks by using two one-to-all shortest path searches. However, this straightforward algorithm can introduce considerable computational overhead, given the fact that accessible links in a STP are generally a small portion of the whole network. To address this issue, an efficient geo-computational algorithm, called NTP-A*, is proposed. The proposed NTP-A* algorithm employs the A* and branch-and-bound techniques to discard inaccessible links during two shortest path searches, and thereby improves the STP construction performance. Comprehensive computational experiments are carried out to demonstrate the computational advantage of the proposed algorithm. Several implementation techniques, including the label-correcting technique and the hybrid link-node labeling technique, are discussed and analyzed. Experimental results show that the proposed NTP-A* algorithm can significantly improve STP construction performance in large-scale road networks by a factor of 100, compared with existing algorithms.
International Nuclear Information System (INIS)
Azmy, Y.Y.; Kirk, B.L.
1990-01-01
Modern parallel computer architectures offer an enormous potential for reducing CPU and wall-clock execution times of large-scale computations commonly performed in various applications in science and engineering. Recently, several authors have reported their efforts in developing and implementing parallel algorithms for solving the neutron diffusion equation on a variety of shared- and distributed-memory parallel computers. Testing of these algorithms for a variety of two- and three-dimensional meshes showed significant speedup of the computation. Even for very large problems (i.e., three-dimensional fine meshes) executed concurrently on a few nodes in serial (nonvector) mode, however, the measured computational efficiency is very low (40 to 86%). In this paper, the authors present a highly efficient (∼85 to 99.9%) algorithm for solving the two-dimensional nodal diffusion equations on the Sequent Balance 8000 parallel computer. Also presented is a model for the performance, represented by the efficiency, as a function of problem size and the number of participating processors. The model is validated through several tests and then extrapolated to larger problems and more processors to predict the performance of the algorithm in more computationally demanding situations
DEFF Research Database (Denmark)
Boiroux, Dimitri; Juhl, Rune; Madsen, Henrik
2016-01-01
This paper addresses maximum likelihood parameter estimation of continuous-time nonlinear systems with discrete-time measurements. We derive an efficient algorithm for the computation of the log-likelihood function and its gradient, which can be used in gradient-based optimization algorithms....... This algorithm uses UD decomposition of symmetric matrices and the array algorithm for covariance update and gradient computation. We test our algorithm on the Lotka-Volterra equations. Compared to the maximum likelihood estimation based on finite difference gradient computation, we get a significant speedup...
Fast Ss-Ilm a Computationally Efficient Algorithm to Discover Socially Important Locations
Dokuz, A. S.; Celik, M.
2017-11-01
Socially important locations are places which are frequently visited by social media users in their social media lifetime. Discovering socially important locations provide several valuable information about user behaviours on social media networking sites. However, discovering socially important locations are challenging due to data volume and dimensions, spatial and temporal calculations, location sparseness in social media datasets, and inefficiency of current algorithms. In the literature, several studies are conducted to discover important locations, however, the proposed approaches do not work in computationally efficient manner. In this study, we propose Fast SS-ILM algorithm by modifying the algorithm of SS-ILM to mine socially important locations efficiently. Experimental results show that proposed Fast SS-ILM algorithm decreases execution time of socially important locations discovery process up to 20 %.
FAST SS-ILM: A COMPUTATIONALLY EFFICIENT ALGORITHM TO DISCOVER SOCIALLY IMPORTANT LOCATIONS
Directory of Open Access Journals (Sweden)
A. S. Dokuz
2017-11-01
Full Text Available Socially important locations are places which are frequently visited by social media users in their social media lifetime. Discovering socially important locations provide several valuable information about user behaviours on social media networking sites. However, discovering socially important locations are challenging due to data volume and dimensions, spatial and temporal calculations, location sparseness in social media datasets, and inefficiency of current algorithms. In the literature, several studies are conducted to discover important locations, however, the proposed approaches do not work in computationally efficient manner. In this study, we propose Fast SS-ILM algorithm by modifying the algorithm of SS-ILM to mine socially important locations efficiently. Experimental results show that proposed Fast SS-ILM algorithm decreases execution time of socially important locations discovery process up to 20 %.
A New Method of Histogram Computation for Efficient Implementation of the HOG Algorithm
Directory of Open Access Journals (Sweden)
Mariana-Eugenia Ilas
2018-03-01
Full Text Available In this paper we introduce a new histogram computation method to be used within the histogram of oriented gradients (HOG algorithm. The new method replaces the arctangent with the slope computation and the classical magnitude allocation based on interpolation with a simpler algorithm. The new method allows a more efficient implementation of HOG in general, and particularly in field-programmable gate arrays (FPGAs, by considerably reducing the area (thus increasing the level of parallelism, while maintaining very close classification accuracy compared to the original algorithm. Thus, the new method is attractive for many applications, including car detection and classification.
Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.
2017-12-01
This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
Computationally efficient model predictive control algorithms a neural network approach
Ławryńczuk, Maciej
2014-01-01
This book thoroughly discusses computationally efficient (suboptimal) Model Predictive Control (MPC) techniques based on neural models. The subjects treated include: · A few types of suboptimal MPC algorithms in which a linear approximation of the model or of the predicted trajectory is successively calculated on-line and used for prediction. · Implementation details of the MPC algorithms for feedforward perceptron neural models, neural Hammerstein models, neural Wiener models and state-space neural models. · The MPC algorithms based on neural multi-models (inspired by the idea of predictive control). · The MPC algorithms with neural approximation with no on-line linearization. · The MPC algorithms with guaranteed stability and robustness. · Cooperation between the MPC algorithms and set-point optimization. Thanks to linearization (or neural approximation), the presented suboptimal algorithms do not require d...
Efficient quantum algorithm for computing n-time correlation functions.
Pedernales, J S; Di Candia, R; Egusquiza, I L; Casanova, J; Solano, E
2014-07-11
We propose a method for computing n-time correlation functions of arbitrary spinorial, fermionic, and bosonic operators, consisting of an efficient quantum algorithm that encodes these correlations in an initially added ancillary qubit for probe and control tasks. For spinorial and fermionic systems, the reconstruction of arbitrary n-time correlation functions requires the measurement of two ancilla observables, while for bosonic variables time derivatives of the same observables are needed. Finally, we provide examples applicable to different quantum platforms in the frame of the linear response theory.
Computationally efficient real-time interpolation algorithm for non-uniform sampled biosignals.
Guven, Onur; Eftekhar, Amir; Kindt, Wilko; Constandinou, Timothy G
2016-06-01
This Letter presents a novel, computationally efficient interpolation method that has been optimised for use in electrocardiogram baseline drift removal. In the authors' previous Letter three isoelectric baseline points per heartbeat are detected, and here utilised as interpolation points. As an extension from linear interpolation, their algorithm segments the interpolation interval and utilises different piecewise linear equations. Thus, the algorithm produces a linear curvature that is computationally efficient while interpolating non-uniform samples. The proposed algorithm is tested using sinusoids with different fundamental frequencies from 0.05 to 0.7 Hz and also validated with real baseline wander data acquired from the Massachusetts Institute of Technology University and Boston's Beth Israel Hospital (MIT-BIH) Noise Stress Database. The synthetic data results show an root mean square (RMS) error of 0.9 μV (mean), 0.63 μV (median) and 0.6 μV (standard deviation) per heartbeat on a 1 mVp-p 0.1 Hz sinusoid. On real data, they obtain an RMS error of 10.9 μV (mean), 8.5 μV (median) and 9.0 μV (standard deviation) per heartbeat. Cubic spline interpolation and linear interpolation on the other hand shows 10.7 μV, 11.6 μV (mean), 7.8 μV, 8.9 μV (median) and 9.8 μV, 9.3 μV (standard deviation) per heartbeat.
Efficient frequent pattern mining algorithm based on node sets in cloud computing environment
Billa, V. N. Vinay Kumar; Lakshmanna, K.; Rajesh, K.; Reddy, M. Praveen Kumar; Nagaraja, G.; Sudheer, K.
2017-11-01
The ultimate goal of Data Mining is to determine the hidden information which is useful in making decisions using the large databases collected by an organization. This Data Mining involves many tasks that are to be performed during the process. Mining frequent itemsets is the one of the most important tasks in case of transactional databases. These transactional databases contain the data in very large scale where the mining of these databases involves the consumption of physical memory and time in proportion to the size of the database. A frequent pattern mining algorithm is said to be efficient only if it consumes less memory and time to mine the frequent itemsets from the given large database. Having these points in mind in this thesis we proposed a system which mines frequent itemsets in an optimized way in terms of memory and time by using cloud computing as an important factor to make the process parallel and the application is provided as a service. A complete framework which uses a proven efficient algorithm called FIN algorithm. FIN algorithm works on Nodesets and POC (pre-order coding) tree. In order to evaluate the performance of the system we conduct the experiments to compare the efficiency of the same algorithm applied in a standalone manner and in cloud computing environment on a real time data set which is traffic accidents data set. The results show that the memory consumption and execution time taken for the process in the proposed system is much lesser than those of standalone system.
Yu, Jieqing; Wu, Lixin; Hu, Qingsong; Yan, Zhigang; Zhang, Shaoliang
2017-12-01
Visibility computation is of great interest to location optimization, environmental planning, ecology, and tourism. Many algorithms have been developed for visibility computation. In this paper, we propose a novel method of visibility computation, called synthetic visual plane (SVP), to achieve better performance with respect to efficiency, accuracy, or both. The method uses a global horizon, which is a synthesis of line-of-sight information of all nearer points, to determine the visibility of a point, which makes it an accurate visibility method. We used discretization of horizon to gain a good performance in efficiency. After discretization, the accuracy and efficiency of SVP depends on the scale of discretization (i.e., zone width). The method is more accurate at smaller zone widths, but this requires a longer operating time. Users must strike a balance between accuracy and efficiency at their discretion. According to our experiments, SVP is less accurate but more efficient than R2 if the zone width is set to one grid. However, SVP becomes more accurate than R2 when the zone width is set to 1/24 grid, while it continues to perform as fast or faster than R2. Although SVP performs worse than reference plane and depth map with respect to efficiency, it is superior in accuracy to these other two algorithms.
M4GB : Efficient Groebner Basis algorithm
R.H. Makarim (Rusydi); M.M.J. Stevens (Marc)
2017-01-01
textabstractWe introduce a new efficient algorithm for computing Groebner-bases named M4GB. Like Faugere's algorithm F4 it is an extension of Buchberger's algorithm that describes: how to store already computed (tail-)reduced multiples of basis polynomials to prevent redundant work in the reduction
Labibian, Amir; Bahrami, Amir Hossein; Haghshenas, Javad
2017-09-01
This paper presents a computationally efficient algorithm for attitude estimation of remote a sensing satellite. In this study, gyro, magnetometer, sun sensor and star tracker are used in Extended Kalman Filter (EKF) structure for the purpose of Attitude Determination (AD). However, utilizing all of the measurement data simultaneously in EKF structure increases computational burden. Specifically, assuming n observation vectors, an inverse of a 3n×3n matrix is required for gain calculation. In order to solve this problem, an efficient version of EKF, namely Murrell's version, is employed. This method utilizes measurements separately at each sampling time for gain computation. Therefore, an inverse of a 3n×3n matrix is replaced by an inverse of a 3×3 matrix for each measurement vector. Moreover, gyro drifts during the time can reduce the pointing accuracy. Therefore, a calibration algorithm is utilized for estimation of the main gyro parameters.
Sundareshan, Malur K; Bhattacharjee, Supratik; Inampudi, Radhika; Pang, Ho-Yuen
2002-12-10
Computational complexity is a major impediment to the real-time implementation of image restoration and superresolution algorithms in many applications. Although powerful restoration algorithms have been developed within the past few years utilizing sophisticated mathematical machinery (based on statistical optimization and convex set theory), these algorithms are typically iterative in nature and require a sufficient number of iterations to be executed to achieve the desired resolution improvement that may be needed to meaningfully perform postprocessing image exploitation tasks in practice. Additionally, recent technological breakthroughs have facilitated novel sensor designs (focal plane arrays, for instance) that make it possible to capture megapixel imagery data at video frame rates. A major challenge in the processing of these large-format images is to complete the execution of the image processing steps within the frame capture times and to keep up with the output rate of the sensor so that all data captured by the sensor can be efficiently utilized. Consequently, development of novel methods that facilitate real-time implementation of image restoration and superresolution algorithms is of significant practical interest and is the primary focus of this study. The key to designing computationally efficient processing schemes lies in strategically introducing appropriate preprocessing steps together with the superresolution iterations to tailor optimized overall processing sequences for imagery data of specific formats. For substantiating this assertion, three distinct methods for tailoring a preprocessing filter and integrating it with the superresolution processing steps are outlined. These methods consist of a region-of-interest extraction scheme, a background-detail separation procedure, and a scene-derived information extraction step for implementing a set-theoretic restoration of the image that is less demanding in computation compared with the
Saving time in a space-efficient simulation algorithm
Markovski, J.
2011-01-01
We present an efficient algorithm for computing the simulation preorder and equivalence for labeled transition systems. The algorithm improves an existing space-efficient algorithm and improves its time complexity by employing a variant of the stability condition and exploiting properties of the
Algorithms and file structures for computational geometry
International Nuclear Information System (INIS)
Hinrichs, K.; Nievergelt, J.
1983-01-01
Algorithms for solving geometric problems and file structures for storing large amounts of geometric data are of increasing importance in computer graphics and computer-aided design. As examples of recent progress in computational geometry, we explain plane-sweep algorithms, which solve various topological and geometric problems efficiently; and we present the grid file, an adaptable, symmetric multi-key file structure that provides efficient access to multi-dimensional data along any space dimension. (orig.)
A new efficient algorithm for computing the imprecise reliability of monotone systems
International Nuclear Information System (INIS)
Utkin, Lev V.
2004-01-01
Reliability analysis of complex systems by partial information about reliability of components and by different conditions of independence of components may be carried out by means of the imprecise probability theory which provides a unified framework (natural extension, lower and upper previsions) for computing the system reliability. However, the application of imprecise probabilities to reliability analysis meets with a complexity of optimization problems which have to be solved for obtaining the system reliability measures. Therefore, an efficient simplified algorithm to solve and decompose the optimization problems is proposed in the paper. This algorithm allows us to practically implement reliability analysis of monotone systems under partial and heterogeneous information about reliability of components and under conditions of the component independence or the lack of information about independence. A numerical example illustrates the algorithm
International Nuclear Information System (INIS)
Woodruff, S.B.
1992-01-01
The Transient Reactor Analysis Code (TRAC), which features a two- fluid treatment of thermal-hydraulics, is designed to model transients in water reactors and related facilities. One of the major computational costs associated with TRAC and similar codes is calculating constitutive coefficients. Although the formulations for these coefficients are local the costs are flow-regime- or data-dependent; i.e., the computations needed for a given spatial node often vary widely as a function of time. Consequently, poor load balancing will degrade efficiency on either vector or data parallel architectures when the data are organized according to spatial location. Unfortunately, a general automatic solution to the load-balancing problem associated with data-dependent computations is not yet available for massively parallel architectures. This document discusses why developers algorithms, such as a neural net representation, that do not exhibit algorithms, such as a neural net representation, that do not exhibit load-balancing problems
Directory of Open Access Journals (Sweden)
Shaat Musbah
2010-01-01
Full Text Available Cognitive Radio (CR systems have been proposed to increase the spectrum utilization by opportunistically access the unused spectrum. Multicarrier communication systems are promising candidates for CR systems. Due to its high spectral efficiency, filter bank multicarrier (FBMC can be considered as an alternative to conventional orthogonal frequency division multiplexing (OFDM for transmission over the CR networks. This paper addresses the problem of resource allocation in multicarrier-based CR networks. The objective is to maximize the downlink capacity of the network under both total power and interference introduced to the primary users (PUs constraints. The optimal solution has high computational complexity which makes it unsuitable for practical applications and hence a low complexity suboptimal solution is proposed. The proposed algorithm utilizes the spectrum holes in PUs bands as well as active PU bands. The performance of the proposed algorithm is investigated for OFDM and FBMC based CR systems. Simulation results illustrate that the proposed resource allocation algorithm with low computational complexity achieves near optimal performance and proves the efficiency of using FBMC in CR context.
Algorithmic Mechanism Design of Evolutionary Computation.
Pei, Yan
2015-01-01
We consider algorithmic design, enhancement, and improvement of evolutionary computation as a mechanism design problem. All individuals or several groups of individuals can be considered as self-interested agents. The individuals in evolutionary computation can manipulate parameter settings and operations by satisfying their own preferences, which are defined by an evolutionary computation algorithm designer, rather than by following a fixed algorithm rule. Evolutionary computation algorithm designers or self-adaptive methods should construct proper rules and mechanisms for all agents (individuals) to conduct their evolution behaviour correctly in order to definitely achieve the desired and preset objective(s). As a case study, we propose a formal framework on parameter setting, strategy selection, and algorithmic design of evolutionary computation by considering the Nash strategy equilibrium of a mechanism design in the search process. The evaluation results present the efficiency of the framework. This primary principle can be implemented in any evolutionary computation algorithm that needs to consider strategy selection issues in its optimization process. The final objective of our work is to solve evolutionary computation design as an algorithmic mechanism design problem and establish its fundamental aspect by taking this perspective. This paper is the first step towards achieving this objective by implementing a strategy equilibrium solution (such as Nash equilibrium) in evolutionary computation algorithm.
Quantum algorithms for computational nuclear physics
Directory of Open Access Journals (Sweden)
Višňák Jakub
2015-01-01
Full Text Available While quantum algorithms have been studied as an efficient tool for the stationary state energy determination in the case of molecular quantum systems, no similar study for analogical problems in computational nuclear physics (computation of energy levels of nuclei from empirical nucleon-nucleon or quark-quark potentials have been realized yet. Although the difference between the above mentioned studies might seem negligible, it will be examined. First steps towards a particular simulation (on classical computer of the Iterative Phase Estimation Algorithm for deuterium and tritium nuclei energy level computation will be carried out with the aim to prove algorithm feasibility (and extensibility to heavier nuclei for its possible practical realization on a real quantum computer.
Parallel Computing Strategies for Irregular Algorithms
Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)
2002-01-01
Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
DEFF Research Database (Denmark)
Brodal, Gerth Stølting; Fagerberg, Rolf; Mailund, Thomas
2013-01-01
), respectively, and counting how often the induced topologies in the two input trees are different. In this paper we present efficient algorithms for computing these distances. We show how to compute the triplet distance in time O(n log n) and the quartet distance in time O(d n log n), where d is the maximal......The triplet and quartet distances are distance measures to compare two rooted and two unrooted trees, respectively. The leaves of the two trees should have the same set of n labels. The distances are defined by enumerating all subsets of three labels (triplets) and four labels (quartets...... degree of any node in the two trees. Within the same time bounds, our framework also allows us to compute the parameterized triplet and quartet distances, where a parameter is introduced to weight resolved (binary) topologies against unresolved (non-binary) topologies. The previous best algorithm...
Quantum computation and Shor close-quote s factoring algorithm
International Nuclear Information System (INIS)
Ekert, A.; Jozsa, R.
1996-01-01
Current technology is beginning to allow us to manipulate rather than just observe individual quantum phenomena. This opens up the possibility of exploiting quantum effects to perform computations beyond the scope of any classical computer. Recently Peter Shor discovered an efficient algorithm for factoring whole numbers, which uses characteristically quantum effects. The algorithm illustrates the potential power of quantum computation, as there is no known efficient classical method for solving this problem. The authors give an exposition of Shor close-quote s algorithm together with an introduction to quantum computation and complexity theory. They discuss experiments that may contribute to its practical implementation. copyright 1996 The American Physical Society
On the efficiency of chaos optimization algorithms for global optimization
International Nuclear Information System (INIS)
Yang Dixiong; Li Gang; Cheng Gengdong
2007-01-01
Chaos optimization algorithms as a novel method of global optimization have attracted much attention, which were all based on Logistic map. However, we have noticed that the probability density function of the chaotic sequences derived from Logistic map is a Chebyshev-type one, which may affect the global searching capacity and computational efficiency of chaos optimization algorithms considerably. Considering the statistical property of the chaotic sequences of Logistic map and Kent map, the improved hybrid chaos-BFGS optimization algorithm and the Kent map based hybrid chaos-BFGS algorithm are proposed. Five typical nonlinear functions with multimodal characteristic are tested to compare the performance of five hybrid optimization algorithms, which are the conventional Logistic map based chaos-BFGS algorithm, improved Logistic map based chaos-BFGS algorithm, Kent map based chaos-BFGS algorithm, Monte Carlo-BFGS algorithm, mesh-BFGS algorithm. The computational performance of the five algorithms is compared, and the numerical results make us question the high efficiency of the chaos optimization algorithms claimed in some references. It is concluded that the efficiency of the hybrid optimization algorithms is influenced by the statistical property of chaotic/stochastic sequences generated from chaotic/stochastic algorithms, and the location of the global optimum of nonlinear functions. In addition, it is inappropriate to advocate the high efficiency of the global optimization algorithms only depending on several numerical examples of low-dimensional functions
Computational efficiency for the surface renewal method
Kelley, Jason; Higgins, Chad
2018-04-01
Measuring surface fluxes using the surface renewal (SR) method requires programmatic algorithms for tabulation, algebraic calculation, and data quality control. A number of different methods have been published describing automated calibration of SR parameters. Because the SR method utilizes high-frequency (10 Hz+) measurements, some steps in the flux calculation are computationally expensive, especially when automating SR to perform many iterations of these calculations. Several new algorithms were written that perform the required calculations more efficiently and rapidly, and that tested for sensitivity to length of flux averaging period, ability to measure over a large range of lag timescales, and overall computational efficiency. These algorithms utilize signal processing techniques and algebraic simplifications that demonstrate simple modifications that dramatically improve computational efficiency. The results here complement efforts by other authors to standardize a robust and accurate computational SR method. Increased speed of computation time grants flexibility to implementing the SR method, opening new avenues for SR to be used in research, for applied monitoring, and in novel field deployments.
International Nuclear Information System (INIS)
Woodruff, S.B.
1994-01-01
The Transient Reactor Analysis Code (TRAC), which features a two-fluid treatment of thermal-hydraulics, is designed to model transients in water reactors and related facilities. One of the major computational costs associated with TRAC and similar codes is calculating constitutive coefficients. Although the formulations for these coefficients are local, the costs are flow-regime- or data-dependent; i.e., the computations needed for a given spatial node often vary widely as a function of time. Consequently, a fixed, uniform assignment of nodes to prallel processors will result in degraded computational efficiency due to the poor load balancing. A standard method for treating data-dependent models on vector architectures has been to use gather operations (or indirect adressing) to sort the nodes into subsets that (temporarily) share a common computational model. However, this method is not effective on distributed memory data parallel architectures, where indirect adressing involves expensive communication overhead. Another serious problem with this method involves software engineering challenges in the areas of maintainability and extensibility. For example, an implementation that was hand-tuned to achieve good computational efficiency would have to be rewritten whenever the decision tree governing the sorting was modified. Using an example based on the calculation of the wall-to-liquid and wall-to-vapor heat-transfer coefficients for three nonboiling flow regimes, we describe how the use of the Fortran 90 WHERE construct and automatic inlining of functions can be used to ameliorate this problem while improving both efficiency and software engineering. Unfortunately, a general automatic solution to the load-balancing problem associated with data-dependent computations is not yet available for massively parallel architectures. We discuss why developers should either wait for such solutions or consider alternative numerical algorithms, such as a neural network
Carroll, Chester C.; Youngblood, John N.; Saha, Aindam
1987-01-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
A class of parallel algorithms for computation of the manipulator inertia matrix
Fijany, Amir; Bejczy, Antal K.
1989-01-01
Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.
Fast Algorithm for Computing the Discrete Hartley Transform of Type-II
Directory of Open Access Journals (Sweden)
Mounir Taha Hamood
2016-06-01
Full Text Available The generalized discrete Hartley transforms (GDHTs have proved to be an efficient alternative to the generalized discrete Fourier transforms (GDFTs for real-valued data applications. In this paper, the development of direct computation of radix-2 decimation-in-time (DIT algorithm for the fast calculation of the GDHT of type-II (DHT-II is presented. The mathematical analysis and the implementation of the developed algorithm are derived, showing that this algorithm possesses a regular structure and can be implemented in-place for efficient memory utilization.The performance of the proposed algorithm is analyzed and the computational complexity is calculated for different transform lengths. A comparison between this algorithm and existing DHT-II algorithms shows that it can be considered as a good compromise between the structural and computational complexities.
Efficient classical simulation of the Deutsch-Jozsa and Simon's algorithms
Johansson, Niklas; Larsson, Jan-Åke
2017-09-01
A long-standing aim of quantum information research is to understand what gives quantum computers their advantage. This requires separating problems that need genuinely quantum resources from those for which classical resources are enough. Two examples of quantum speed-up are the Deutsch-Jozsa and Simon's problem, both efficiently solvable on a quantum Turing machine, and both believed to lack efficient classical solutions. Here we present a framework that can simulate both quantum algorithms efficiently, solving the Deutsch-Jozsa problem with probability 1 using only one oracle query, and Simon's problem using linearly many oracle queries, just as expected of an ideal quantum computer. The presented simulation framework is in turn efficiently simulatable in a classical probabilistic Turing machine. This shows that the Deutsch-Jozsa and Simon's problem do not require any genuinely quantum resources, and that the quantum algorithms show no speed-up when compared with their corresponding classical simulation. Finally, this gives insight into what properties are needed in the two algorithms and calls for further study of oracle separation between quantum and classical computation.
Ehsan, Shoaib; Clark, Adrian F; Naveed ur Rehman; McDonald-Maier, Klaus D
2015-07-10
The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF), allowing fast computation of rectangular features at constant speed, independent of filter size. For resource-constrained real-time embedded vision systems, computation and storage of integral image presents several design challenges due to strict timing and hardware limitations. Although calculation of the integral image only consists of simple addition operations, the total number of operations is large owing to the generally large size of image data. Recursive equations allow substantial decrease in the number of operations but require calculation in a serial fashion. This paper presents two new hardware algorithms that are based on the decomposition of these recursive equations, allowing calculation of up to four integral image values in a row-parallel way without significantly increasing the number of operations. An efficient design strategy is also proposed for a parallel integral image computation unit to reduce the size of the required internal memory (nearly 35% for common HD video). Addressing the storage problem of integral image in embedded vision systems, the paper presents two algorithms which allow substantial decrease (at least 44.44%) in the memory requirements. Finally, the paper provides a case study that highlights the utility of the proposed architectures in embedded vision systems.
Directory of Open Access Journals (Sweden)
Shoaib Ehsan
2015-07-01
Full Text Available The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF, allowing fast computation of rectangular features at constant speed, independent of filter size. For resource-constrained real-time embedded vision systems, computation and storage of integral image presents several design challenges due to strict timing and hardware limitations. Although calculation of the integral image only consists of simple addition operations, the total number of operations is large owing to the generally large size of image data. Recursive equations allow substantial decrease in the number of operations but require calculation in a serial fashion. This paper presents two new hardware algorithms that are based on the decomposition of these recursive equations, allowing calculation of up to four integral image values in a row-parallel way without significantly increasing the number of operations. An efficient design strategy is also proposed for a parallel integral image computation unit to reduce the size of the required internal memory (nearly 35% for common HD video. Addressing the storage problem of integral image in embedded vision systems, the paper presents two algorithms which allow substantial decrease (at least 44.44% in the memory requirements. Finally, the paper provides a case study that highlights the utility of the proposed architectures in embedded vision systems.
Algorithmically specialized parallel computers
Snyder, Lawrence; Gannon, Dennis B
1985-01-01
Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer.This book discusses the algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster
Parallel algorithms for computation of the manipulator inertia matrix
Amin-Javaheri, Masoud; Orin, David E.
1989-01-01
The development of an O(log2N) parallel algorithm for the manipulator inertia matrix is presented. It is based on the most efficient serial algorithm which uses the composite rigid body method. Recursive doubling is used to reformulate the linear recurrence equations which are required to compute the diagonal elements of the matrix. It results in O(log2N) levels of computation. Computation of the off-diagonal elements involves N linear recurrences of varying-size and a new method, which avoids redundant computation of position and orientation transforms for the manipulator, is developed. The O(log2N) algorithm is presented in both equation and graphic forms which clearly show the parallelism inherent in the algorithm.
Highly efficient computer algorithm for identifying layer thickness of atomically thin 2D materials
Lee, Jekwan; Cho, Seungwan; Park, Soohyun; Bae, Hyemin; Noh, Minji; Kim, Beom; In, Chihun; Yang, Seunghoon; Lee, Sooun; Seo, Seung Young; Kim, Jehyun; Lee, Chul-Ho; Shim, Woo-Young; Jo, Moon-Ho; Kim, Dohun; Choi, Hyunyong
2018-03-01
The fields of layered material research, such as transition-metal dichalcogenides (TMDs), have demonstrated that the optical, electrical and mechanical properties strongly depend on the layer number N. Thus, efficient and accurate determination of N is the most crucial step before the associated device fabrication. An existing experimental technique using an optical microscope is the most widely used one to identify N. However, a critical drawback of this approach is that it relies on extensive laboratory experiences to estimate N; it requires a very time-consuming image-searching task assisted by human eyes and secondary measurements such as atomic force microscopy and Raman spectroscopy, which are necessary to ensure N. In this work, we introduce a computer algorithm based on the image analysis of a quantized optical contrast. We show that our algorithm can apply to a wide variety of layered materials, including graphene, MoS2, and WS2 regardless of substrates. The algorithm largely consists of two parts. First, it sets up an appropriate boundary between target flakes and substrate. Second, to compute N, it automatically calculates the optical contrast using an adaptive RGB estimation process between each target, which results in a matrix with different integer Ns and returns a matrix map of Ns onto the target flake position. Using a conventional desktop computational power, the time taken to display the final N matrix was 1.8 s on average for the image size of 1280 pixels by 960 pixels and obtained a high accuracy of 90% (six estimation errors among 62 samples) when compared to the other methods. To show the effectiveness of our algorithm, we also apply it to TMD flakes transferred on optically transparent c-axis sapphire substrates and obtain a similar result of the accuracy of 94% (two estimation errors among 34 samples).
Static Load Balancing Algorithms In Cloud Computing Challenges amp Solutions
Directory of Open Access Journals (Sweden)
Nadeem Shah
2015-08-01
Full Text Available Abstract Cloud computing provides on-demand hosted computing resources and services over the Internet on a pay-per-use basis. It is currently becoming the favored method of communication and computation over scalable networks due to numerous attractive attributes such as high availability scalability fault tolerance simplicity of management and low cost of ownership. Due to the huge demand of cloud computing efficient load balancing becomes critical to ensure that computational tasks are evenly distributed across servers to prevent bottlenecks. The aim of this review paper is to understand the current challenges in cloud computing primarily in cloud load balancing using static algorithms and finding gaps to bridge for more efficient static cloud load balancing in the future. We believe the ideas suggested as new solution will allow researchers to redesign better algorithms for better functionalities and improved user experiences in simple cloud systems. This could assist small businesses that cannot afford infrastructure that supports complex amp dynamic load balancing algorithms.
An efficient non-dominated sorting method for evolutionary algorithms.
Fang, Hongbing; Wang, Qian; Tu, Yi-Cheng; Horstemeyer, Mark F
2008-01-01
We present a new non-dominated sorting algorithm to generate the non-dominated fronts in multi-objective optimization with evolutionary algorithms, particularly the NSGA-II. The non-dominated sorting algorithm used by NSGA-II has a time complexity of O(MN(2)) in generating non-dominated fronts in one generation (iteration) for a population size N and M objective functions. Since generating non-dominated fronts takes the majority of total computational time (excluding the cost of fitness evaluations) of NSGA-II, making this algorithm faster will significantly improve the overall efficiency of NSGA-II and other genetic algorithms using non-dominated sorting. The new non-dominated sorting algorithm proposed in this study reduces the number of redundant comparisons existing in the algorithm of NSGA-II by recording the dominance information among solutions from their first comparisons. By utilizing a new data structure called the dominance tree and the divide-and-conquer mechanism, the new algorithm is faster than NSGA-II for different numbers of objective functions. Although the number of solution comparisons by the proposed algorithm is close to that of NSGA-II when the number of objectives becomes large, the total computational time shows that the proposed algorithm still has better efficiency because of the adoption of the dominance tree structure and the divide-and-conquer mechanism.
An Efficient Algorithm for Computing Attractors of Synchronous And Asynchronous Boolean Networks
Zheng, Desheng; Yang, Guowu; Li, Xiaoyu; Wang, Zhicai; Liu, Feng; He, Lei
2013-01-01
Biological networks, such as genetic regulatory networks, often contain positive and negative feedback loops that settle down to dynamically stable patterns. Identifying these patterns, the so-called attractors, can provide important insights for biologists to understand the molecular mechanisms underlying many coordinated cellular processes such as cellular division, differentiation, and homeostasis. Both synchronous and asynchronous Boolean networks have been used to simulate genetic regulatory networks and identify their attractors. The common methods of computing attractors are that start with a randomly selected initial state and finish with exhaustive search of the state space of a network. However, the time complexity of these methods grows exponentially with respect to the number and length of attractors. Here, we build two algorithms to achieve the computation of attractors in synchronous and asynchronous Boolean networks. For the synchronous scenario, combing with iterative methods and reduced order binary decision diagrams (ROBDD), we propose an improved algorithm to compute attractors. For another algorithm, the attractors of synchronous Boolean networks are utilized in asynchronous Boolean translation functions to derive attractors of asynchronous scenario. The proposed algorithms are implemented in a procedure called geneFAtt. Compared to existing tools such as genYsis, geneFAtt is significantly faster in computing attractors for empirical experimental systems. Availability The software package is available at https://sites.google.com/site/desheng619/download. PMID:23585840
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows
Moitra, Stuti; Gatski, Thomas B.
1997-01-01
A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
Approximate Computing Techniques for Iterative Graph Algorithms
Energy Technology Data Exchange (ETDEWEB)
Panyala, Ajay R.; Subasi, Omer; Halappanavar, Mahantesh; Kalyanaraman, Anantharaman; Chavarria Miranda, Daniel G.; Krishnamoorthy, Sriram
2017-12-18
Approximate computing enables processing of large-scale graphs by trading off quality for performance. Approximate computing techniques have become critical not only due to the emergence of parallel architectures but also the availability of large scale datasets enabling data-driven discovery. Using two prototypical graph algorithms, PageRank and community detection, we present several approximate computing heuristics to scale the performance with minimal loss of accuracy. We present several heuristics including loop perforation, data caching, incomplete graph coloring and synchronization, and evaluate their efficiency. We demonstrate performance improvements of up to 83% for PageRank and up to 450x for community detection, with low impact of accuracy for both the algorithms. We expect the proposed approximate techniques will enable scalable graph analytics on data of importance to several applications in science and their subsequent adoption to scale similar graph algorithms.
An introduction to quantum computing algorithms
Pittenger, Arthur O
2000-01-01
In 1994 Peter Shor [65] published a factoring algorithm for a quantum computer that finds the prime factors of a composite integer N more efficiently than is possible with the known algorithms for a classical com puter. Since the difficulty of the factoring problem is crucial for the se curity of a public key encryption system, interest (and funding) in quan tum computing and quantum computation suddenly blossomed. Quan tum computing had arrived. The study of the role of quantum mechanics in the theory of computa tion seems to have begun in the early 1980s with the publications of Paul Benioff [6]' [7] who considered a quantum mechanical model of computers and the computation process. A related question was discussed shortly thereafter by Richard Feynman [35] who began from a different perspec tive by asking what kind of computer should be used to simulate physics. His analysis led him to the belief that with a suitable class of "quantum machines" one could imitate any quantum system.
Cache and energy efficient algorithms for Nussinov's RNA Folding.
Zhao, Chunchun; Sahni, Sartaj
2017-12-06
An RNA folding/RNA secondary structure prediction algorithm determines the non-nested/pseudoknot-free structure by maximizing the number of complementary base pairs and minimizing the energy. Several implementations of Nussinov's classical RNA folding algorithm have been proposed. Our focus is to obtain run time and energy efficiency by reducing the number of cache misses. Three cache-efficient algorithms, ByRow, ByRowSegment and ByBox, for Nussinov's RNA folding are developed. Using a simple LRU cache model, we show that the Classical algorithm of Nussinov has the highest number of cache misses followed by the algorithms Transpose (Li et al.), ByRow, ByRowSegment, and ByBox (in this order). Extensive experiments conducted on four computational platforms-Xeon E5, AMD Athlon 64 X2, Intel I7 and PowerPC A2-using two programming languages-C and Java-show that our cache efficient algorithms are also efficient in terms of run time and energy. Our benchmarking shows that, depending on the computational platform and programming language, either ByRow or ByBox give best run time and energy performance. The C version of these algorithms reduce run time by as much as 97.2% and energy consumption by as much as 88.8% relative to Classical and by as much as 56.3% and 57.8% relative to Transpose. The Java versions reduce run time by as much as 98.3% relative to Classical and by as much as 75.2% relative to Transpose. Transpose achieves run time and energy efficiency at the expense of memory as it takes twice the memory required by Classical. The memory required by ByRow, ByRowSegment, and ByBox is the same as that of Classical. As a result, using the same amount of memory, the algorithms proposed by us can solve problems up to 40% larger than those solvable by Transpose.
Efficient Algorithms for gcd and Cubic Residuosity in the Ring of Eisenstein Integers
DEFF Research Database (Denmark)
Damgård, Ivan Bjerre; Frandsen, Gudmund Skovbjerg
2003-01-01
We present simple and efficient algorithms for computing gcd and cubic residuosity in the ring of Eisenstein integers, bf Z[ ]i.e. the integers extended with , a complex primitive third root of unity. The algorithms are similar and may be seen as generalisations of the binary integer gcd and deri......We present simple and efficient algorithms for computing gcd and cubic residuosity in the ring of Eisenstein integers, bf Z[ ]i.e. the integers extended with , a complex primitive third root of unity. The algorithms are similar and may be seen as generalisations of the binary integer gcd...
On efficiently computing multigroup multi-layer neutron reflection and transmission conditions
International Nuclear Information System (INIS)
Abreu, Marcos P. de
2007-01-01
In this article, we present an algorithm for efficient computation of multigroup discrete ordinates neutron reflection and transmission conditions, which replace a multi-layered boundary region in neutron multiplication eigenvalue computations with no spatial truncation error. In contrast to the independent layer-by-layer algorithm considered thus far in our computations, the algorithm here is based on an inductive approach developed by the present author for deriving neutron reflection and transmission conditions for a nonactive boundary region with an arbitrary number of arbitrarily thick layers. With this new algorithm, we were able to increase significantly the computational efficiency of our spectral diamond-spectral Green's function method for solving multigroup neutron multiplication eigenvalue problems with multi-layered boundary regions. We provide comparative results for a two-group reactor core model to illustrate the increased efficiency of our spectral method, and we conclude this article with a number of general remarks. (author)
An Efficient Supervised Training Algorithm for Multilayer Spiking Neural Networks.
Xie, Xiurui; Qu, Hong; Liu, Guisong; Zhang, Malu; Kurths, Jürgen
2016-01-01
The spiking neural networks (SNNs) are the third generation of neural networks and perform remarkably well in cognitive tasks such as pattern recognition. The spike emitting and information processing mechanisms found in biological cognitive systems motivate the application of the hierarchical structure and temporal encoding mechanism in spiking neural networks, which have exhibited strong computational capability. However, the hierarchical structure and temporal encoding approach require neurons to process information serially in space and time respectively, which reduce the training efficiency significantly. For training the hierarchical SNNs, most existing methods are based on the traditional back-propagation algorithm, inheriting its drawbacks of the gradient diffusion and the sensitivity on parameters. To keep the powerful computation capability of the hierarchical structure and temporal encoding mechanism, but to overcome the low efficiency of the existing algorithms, a new training algorithm, the Normalized Spiking Error Back Propagation (NSEBP) is proposed in this paper. In the feedforward calculation, the output spike times are calculated by solving the quadratic function in the spike response model instead of detecting postsynaptic voltage states at all time points in traditional algorithms. Besides, in the feedback weight modification, the computational error is propagated to previous layers by the presynaptic spike jitter instead of the gradient decent rule, which realizes the layer-wised training. Furthermore, our algorithm investigates the mathematical relation between the weight variation and voltage error change, which makes the normalization in the weight modification applicable. Adopting these strategies, our algorithm outperforms the traditional SNN multi-layer algorithms in terms of learning efficiency and parameter sensitivity, that are also demonstrated by the comprehensive experimental results in this paper.
Workflow Scheduling Using Hybrid GA-PSO Algorithm in Cloud Computing
Directory of Open Access Journals (Sweden)
Ahmad M. Manasrah
2018-01-01
Full Text Available Cloud computing environment provides several on-demand services and resource sharing for clients. Business processes are managed using the workflow technology over the cloud, which represents one of the challenges in using the resources in an efficient manner due to the dependencies between the tasks. In this paper, a Hybrid GA-PSO algorithm is proposed to allocate tasks to the resources efficiently. The Hybrid GA-PSO algorithm aims to reduce the makespan and the cost and balance the load of the dependent tasks over the heterogonous resources in cloud computing environments. The experiment results show that the GA-PSO algorithm decreases the total execution time of the workflow tasks, in comparison with GA, PSO, HSGA, WSGA, and MTCT algorithms. Furthermore, it reduces the execution cost. In addition, it improves the load balancing of the workflow application over the available resources. Finally, the obtained results also proved that the proposed algorithm converges to optimal solutions faster and with higher quality compared to other algorithms.
Improving the efficiency of deconvolution algorithms for sound source localization
DEFF Research Database (Denmark)
Lylloff, Oliver Ackermann; Fernandez Grande, Efren; Agerkvist, Finn T.
2015-01-01
of the unknown acoustic source distribution and the beamformer's response to a point source, i.e., point-spread function. A significant limitation of deconvolution is, however, an additional computational effort compared to beamforming. In this paper, computationally efficient deconvolution algorithms...
Efficient computation of Laguerre polynomials
A. Gil (Amparo); J. Segura (Javier); N.M. Temme (Nico)
2017-01-01
textabstractAn efficient algorithm and a Fortran 90 module (LaguerrePol) for computing Laguerre polynomials . Ln(α)(z) are presented. The standard three-term recurrence relation satisfied by the polynomials and different types of asymptotic expansions valid for . n large and . α small, are used
An efficient dynamic load balancing algorithm
Lagaros, Nikos D.
2014-01-01
In engineering problems, randomness and uncertainties are inherent. Robust design procedures, formulated in the framework of multi-objective optimization, have been proposed in order to take into account sources of randomness and uncertainty. These design procedures require orders of magnitude more computational effort than conventional analysis or optimum design processes since a very large number of finite element analyses is required to be dealt. It is therefore an imperative need to exploit the capabilities of computing resources in order to deal with this kind of problems. In particular, parallel computing can be implemented at the level of metaheuristic optimization, by exploiting the physical parallelization feature of the nondominated sorting evolution strategies method, as well as at the level of repeated structural analyses required for assessing the behavioural constraints and for calculating the objective functions. In this study an efficient dynamic load balancing algorithm for optimum exploitation of available computing resources is proposed and, without loss of generality, is applied for computing the desired Pareto front. In such problems the computation of the complete Pareto front with feasible designs only, constitutes a very challenging task. The proposed algorithm achieves linear speedup factors and almost 100% speedup factor values with reference to the sequential procedure.
Efficient motif finding algorithms for large-alphabet inputs
Directory of Open Access Journals (Sweden)
Pavlovic Vladimir
2010-10-01
Full Text Available Abstract Background We consider the problem of identifying motifs, recurring or conserved patterns, in the biological sequence data sets. To solve this task, we present a new deterministic algorithm for finding patterns that are embedded as exact or inexact instances in all or most of the input strings. Results The proposed algorithm (1 improves search efficiency compared to existing algorithms, and (2 scales well with the size of alphabet. On a synthetic planted DNA motif finding problem our algorithm is over 10× more efficient than MITRA, PMSPrune, and RISOTTO for long motifs. Improvements are orders of magnitude higher in the same setting with large alphabets. On benchmark TF-binding site problems (FNP, CRP, LexA we observed reduction in running time of over 12×, with high detection accuracy. The algorithm was also successful in rapidly identifying protein motifs in Lipocalin, Zinc metallopeptidase, and supersecondary structure motifs for Cadherin and Immunoglobin families. Conclusions Our algorithm reduces computational complexity of the current motif finding algorithms and demonstrate strong running time improvements over existing exact algorithms, especially in important and difficult cases of large-alphabet sequences.
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications
Sun, Xian-He
1997-01-01
Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm
On the efficient parallel computation of Legendre transforms
Inda, M.A.; Bisseling, R.H.; Maslen, D.K.
2001-01-01
In this article, we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the
On the efficient parallel computation of Legendre transforms
Inda, M.A.; Bisseling, R.H.; Maslen, D.K.
1999-01-01
In this article we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the
A highly efficient multi-core algorithm for clustering extremely large datasets
Directory of Open Access Journals (Sweden)
Kraus Johann M
2010-04-01
Full Text Available Abstract Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer.
An efficient particle Fokker–Planck algorithm for rarefied gas flows
Energy Technology Data Exchange (ETDEWEB)
Gorji, M. Hossein; Jenny, Patrick
2014-04-01
This paper is devoted to the algorithmic improvement and careful analysis of the Fokker–Planck kinetic model derived by Jenny et al. [1] and Gorji et al. [2]. The motivation behind the Fokker–Planck based particle methods is to gain efficiency in low Knudsen rarefied gas flow simulations, where conventional direct simulation Monte Carlo (DSMC) becomes expensive. This can be achieved due to the fact that the resulting model equations are continuous stochastic differential equations in velocity space. Accordingly, the computational particles evolve along independent stochastic paths and thus no collision needs to be calculated. Therefore the computational cost of the solution algorithm becomes independent of the Knudsen number. In the present study, different computational improvements were persuaded in order to augment the method, including an accurate time integration scheme, local time stepping and noise reduction. For assessment of the performance, gas flow around a cylinder and lid driven cavity flow were studied. Convergence rates, accuracy and computational costs were compared with respect to DSMC for a range of Knudsen numbers (from hydrodynamic regime up to above one). In all the considered cases, the model together with the proposed scheme give rise to very efficient yet accurate solution algorithms.
Efficient and Flexible Computation of Many-Electron Wave Function Overlaps.
Plasser, Felix; Ruckenbauer, Matthias; Mai, Sebastian; Oppel, Markus; Marquetand, Philipp; González, Leticia
2016-03-08
A new algorithm for the computation of the overlap between many-electron wave functions is described. This algorithm allows for the extensive use of recurring intermediates and thus provides high computational efficiency. Because of the general formalism employed, overlaps can be computed for varying wave function types, molecular orbitals, basis sets, and molecular geometries. This paves the way for efficiently computing nonadiabatic interaction terms for dynamics simulations. In addition, other application areas can be envisaged, such as the comparison of wave functions constructed at different levels of theory. Aside from explaining the algorithm and evaluating the performance, a detailed analysis of the numerical stability of wave function overlaps is carried out, and strategies for overcoming potential severe pitfalls due to displaced atoms and truncated wave functions are presented.
Treat a new and efficient match algorithm for AI production system
Miranker, Daniel P
1988-01-01
TREAT: A New and Efficient Match Algorithm for AI Production Systems describes the architecture and software systems embodying the DADO machine, a parallel tree-structured computer designed to provide significant performance improvements over serial computers of comparable hardware complexity in the execution of large expert systems implemented in production system form.This book focuses on TREAT as a match algorithm for executing production systems that is presented and comparatively analyzed with the RETE match algorithm. TREAT, originally designed specifically for the DADO machine architect
Wireless-Uplinks-Based Energy-Efficient Scheduling in Mobile Cloud Computing
Directory of Open Access Journals (Sweden)
Xing Liu
2015-01-01
Full Text Available Mobile cloud computing (MCC combines cloud computing and mobile internet to improve the computational capabilities of resource-constrained mobile devices (MDs. In MCC, mobile users could not only improve the computational capability of MDs but also save operation consumption by offloading the mobile applications to the cloud. However, MCC faces the problem of energy efficiency because of time-varying channels when the offloading is being executed. In this paper, we address the issue of energy-efficient scheduling for wireless uplink in MCC. By introducing Lyapunov optimization, we first propose a scheduling algorithm that can dynamically choose channel to transmit data based on queue backlog and channel statistics. Then, we show that the proposed scheduling algorithm can make a tradeoff between queue backlog and energy consumption in a channel-aware MCC system. Simulation results show that the proposed scheduling algorithm can reduce the time average energy consumption for offloading compared to the existing algorithm.
A Visualization Review of Cloud Computing Algorithms in the Last Decade
Directory of Open Access Journals (Sweden)
Junhu Ruan
2016-10-01
Full Text Available Cloud computing has competitive advantages—such as on-demand self-service, rapid computing, cost reduction, and almost unlimited storage—that have attracted extensive attention from both academia and industry in recent years. Some review works have been reported to summarize extant studies related to cloud computing, but few analyze these studies based on the citations. Co-citation analysis can provide scholars a strong support to identify the intellectual bases and leading edges of a specific field. In addition, advanced algorithms, which can directly affect the availability, efficiency, and security of cloud computing, are the key to conducting computing across various clouds. Motivated by these observations, we conduct a specific visualization review of the studies related to cloud computing algorithms using one mainstream co-citation analysis tool—CiteSpace. The visualization results detect the most influential studies, journals, countries, institutions, and authors on cloud computing algorithms and reveal the intellectual bases and focuses of cloud computing algorithms in the literature, providing guidance for interested researchers to make further studies on cloud computing algorithms.
A Novel Parallel Algorithm for Edit Distance Computation
Directory of Open Access Journals (Sweden)
Muhammad Murtaza Yousaf
2018-01-01
Full Text Available The edit distance between two sequences is the minimum number of weighted transformation-operations that are required to transform one string into the other. The weighted transformation-operations are insert, remove, and substitute. Dynamic programming solution to find edit distance exists but it becomes computationally intensive when the lengths of strings become very large. This work presents a novel parallel algorithm to solve edit distance problem of string matching. The algorithm is based on resolving dependencies in the dynamic programming solution of the problem and it is able to compute each row of edit distance table in parallel. In this way, it becomes possible to compute the complete table in min(m,n iterations for strings of size m and n whereas state-of-the-art parallel algorithm solves the problem in max(m,n iterations. The proposed algorithm also increases the amount of parallelism in each of its iteration. The algorithm is also capable of exploiting spatial locality while its implementation. Additionally, the algorithm works in a load balanced way that further improves its performance. The algorithm is implemented for multicore systems having shared memory. Implementation of the algorithm in OpenMP shows linear speedup and better execution time as compared to state-of-the-art parallel approach. Efficiency of the algorithm is also proven better in comparison to its competitor.
Parallel algorithms and architecture for computation of manipulator forward dynamics
Fijany, Amir; Bejczy, Antal K.
1989-01-01
Parallel computation of manipulator forward dynamics is investigated. Considering three classes of algorithms for the solution of the problem, that is, the O(n), the O(n exp 2), and the O(n exp 3) algorithms, parallelism in the problem is analyzed. It is shown that the problem belongs to the class of NC and that the time and processors bounds are of O(log2/2n) and O(n exp 4), respectively. However, the fastest stable parallel algorithms achieve the computation time of O(n) and can be derived by parallelization of the O(n exp 3) serial algorithms. Parallel computation of the O(n exp 3) algorithms requires the development of parallel algorithms for a set of fundamentally different problems, that is, the Newton-Euler formulation, the computation of the inertia matrix, decomposition of the symmetric, positive definite matrix, and the solution of triangular systems. Parallel algorithms for this set of problems are developed which can be efficiently implemented on a unique architecture, a triangular array of n(n+2)/2 processors with a simple nearest-neighbor interconnection. This architecture is particularly suitable for VLSI and WSI implementations. The developed parallel algorithm, compared to the best serial O(n) algorithm, achieves an asymptotic speedup of more than two orders-of-magnitude in the computation the forward dynamics.
Efficient AM Algorithms for Stochastic ML Estimation of DOA
Directory of Open Access Journals (Sweden)
Haihua Chen
2016-01-01
Full Text Available The estimation of direction-of-arrival (DOA of signals is a basic and important problem in sensor array signal processing. To solve this problem, many algorithms have been proposed, among which the Stochastic Maximum Likelihood (SML is one of the most concerned algorithms because of its high accuracy of DOA. However, the estimation of SML generally involves the multidimensional nonlinear optimization problem. As a result, its computational complexity is rather high. This paper addresses the issue of reducing computational complexity of SML estimation of DOA based on the Alternating Minimization (AM algorithm. We have the following two contributions. First using transformation of matrix and properties of spatial projection, we propose an efficient AM (EAM algorithm by dividing the SML criterion into two components. One depends on a single variable parameter while the other does not. Second when the array is a uniform linear array, we get the irreducible form of the EAM criterion (IAM using polynomial forms. Simulation results show that both EAM and IAM can reduce the computational complexity of SML estimation greatly, while IAM is the best. Another advantage of IAM is that this algorithm can avoid the numerical instability problem which may happen in AM and EAM algorithms when more than one parameter converges to an identical value.
Computationally Efficient DOA Tracking Algorithm in Monostatic MIMO Radar with Automatic Association
Directory of Open Access Journals (Sweden)
Huaxin Yu
2014-01-01
Full Text Available We consider the problem of tracking the direction of arrivals (DOA of multiple moving targets in monostatic multiple-input multiple-output (MIMO radar. A low-complexity DOA tracking algorithm in monostatic MIMO radar is proposed. The proposed algorithm obtains DOA estimation via the difference between previous and current covariance matrix of the reduced-dimension transformation signal, and it reduces the computational complexity and realizes automatic association in DOA tracking. Error analysis and Cramér-Rao lower bound (CRLB of DOA tracking are derived in the paper. The proposed algorithm not only can be regarded as an extension of array-signal-processing DOA tracking algorithm in (Zhang et al. (2008, but also is an improved version of the DOA tracking algorithm in (Zhang et al. (2008. Furthermore, the proposed algorithm has better DOA tracking performance than the DOA tracking algorithm in (Zhang et al. (2008. The simulation results demonstrate effectiveness of the proposed algorithm. Our work provides the technical support for the practical application of MIMO radar.
Efficient quantum computing with weak measurements
International Nuclear Information System (INIS)
Lund, A P
2011-01-01
Projective measurements with high quantum efficiency are often assumed to be required for efficient circuit-based quantum computing. We argue that this is not the case and show that the fact that they are not required was actually known previously but was not deeply explored. We examine this issue by giving an example of how to perform the quantum-ordering-finding algorithm efficiently using non-local weak measurements considering that the measurements used are of bounded weakness and some fixed but arbitrary probability of success less than unity is required. We also show that it is possible to perform the same computation with only local weak measurements, but this must necessarily introduce an exponential overhead.
DANoC: An Efficient Algorithm and Hardware Codesign of Deep Neural Networks on Chip.
Zhou, Xichuan; Li, Shengli; Tang, Fang; Hu, Shengdong; Lin, Zhi; Zhang, Lei
2017-07-18
Deep neural networks (NNs) are the state-of-the-art models for understanding the content of images and videos. However, implementing deep NNs in embedded systems is a challenging task, e.g., a typical deep belief network could exhaust gigabytes of memory and result in bandwidth and computational bottlenecks. To address this challenge, this paper presents an algorithm and hardware codesign for efficient deep neural computation. A hardware-oriented deep learning algorithm, named the deep adaptive network, is proposed to explore the sparsity of neural connections. By adaptively removing the majority of neural connections and robustly representing the reserved connections using binary integers, the proposed algorithm could save up to 99.9% memory utility and computational resources without undermining classification accuracy. An efficient sparse-mapping-memory-based hardware architecture is proposed to fully take advantage of the algorithmic optimization. Different from traditional Von Neumann architecture, the deep-adaptive network on chip (DANoC) brings communication and computation in close proximity to avoid power-hungry parameter transfers between on-board memory and on-chip computational units. Experiments over different image classification benchmarks show that the DANoC system achieves competitively high accuracy and efficiency comparing with the state-of-the-art approaches.
Computational Discovery of Materials Using the Firefly Algorithm
Avendaño-Franco, Guillermo; Romero, Aldo
Our current ability to model physical phenomena accurately, the increase computational power and better algorithms are the driving forces behind the computational discovery and design of novel materials, allowing for virtual characterization before their realization in the laboratory. We present the implementation of a novel firefly algorithm, a population-based algorithm for global optimization for searching the structure/composition space. This novel computation-intensive approach naturally take advantage of concurrency, targeted exploration and still keeping enough diversity. We apply the new method in both periodic and non-periodic structures and we present the implementation challenges and solutions to improve efficiency. The implementation makes use of computational materials databases and network analysis to optimize the search and get insights about the geometric structure of local minima on the energy landscape. The method has been implemented in our software PyChemia, an open-source package for materials discovery. We acknowledge the support of DMREF-NSF 1434897 and the Donors of the American Chemical Society Petroleum Research Fund for partial support of this research under Contract 54075-ND10.
GLOA: A New Job Scheduling Algorithm for Grid Computing
Directory of Open Access Journals (Sweden)
Zahra Pooranian
2013-03-01
Full Text Available The purpose of grid computing is to produce a virtual supercomputer by using free resources available through widespread networks such as the Internet. This resource distribution, changes in resource availability, and an unreliable communication infrastructure pose a major challenge for efficient resource allocation. Because of the geographical spread of resources and their distributed management, grid scheduling is considered to be a NP-complete problem. It has been shown that evolutionary algorithms offer good performance for grid scheduling. This article uses a new evaluation (distributed algorithm inspired by the effect of leaders in social groups, the group leaders' optimization algorithm (GLOA, to solve the problem of scheduling independent tasks in a grid computing system. Simulation results comparing GLOA with several other evaluation algorithms show that GLOA produces shorter makespans.
An Efficient Algorithm for the Discrete Gabor Transform using full length Windows
DEFF Research Database (Denmark)
Søndergaard, Peter Lempel
2007-01-01
This paper extends the efficient factorization of the Gabor frame operator developed by Strohmer in [1] to the Gabor analysis/synthesis operator. This provides a fast method for computing the discrete Gabor transform (DGT) and several algorithms associated with it. The algorithm is used...
MODA: an efficient algorithm for network motif discovery in biological networks.
Omidi, Saeed; Schreiber, Falk; Masoudi-Nejad, Ali
2009-10-01
In recent years, interest has been growing in the study of complex networks. Since Erdös and Rényi (1960) proposed their random graph model about 50 years ago, many researchers have investigated and shaped this field. Many indicators have been proposed to assess the global features of networks. Recently, an active research area has developed in studying local features named motifs as the building blocks of networks. Unfortunately, network motif discovery is a computationally hard problem and finding rather large motifs (larger than 8 nodes) by means of current algorithms is impractical as it demands too much computational effort. In this paper, we present a new algorithm (MODA) that incorporates techniques such as a pattern growth approach for extracting larger motifs efficiently. We have tested our algorithm and found it able to identify larger motifs with more than 8 nodes more efficiently than most of the current state-of-the-art motif discovery algorithms. While most of the algorithms rely on induced subgraphs as motifs of the networks, MODA is able to extract both induced and non-induced subgraphs simultaneously. The MODA source code is freely available at: http://LBB.ut.ac.ir/Download/LBBsoft/MODA/
An efficient algorithm for incompressible N-phase flows
International Nuclear Information System (INIS)
Dong, S.
2014-01-01
We present an efficient algorithm within the phase field framework for simulating the motion of a mixture of N (N⩾2) immiscible incompressible fluids, with possibly very different physical properties such as densities, viscosities, and pairwise surface tensions. The algorithm employs a physical formulation for the N-phase system that honors the conservations of mass and momentum and the second law of thermodynamics. We present a method for uniquely determining the mixing energy density coefficients involved in the N-phase model based on the pairwise surface tensions among the N fluids. Our numerical algorithm has several attractive properties that make it computationally very efficient: (i) it has completely de-coupled the computations for different flow variables, and has also completely de-coupled the computations for the (N−1) phase field functions; (ii) the algorithm only requires the solution of linear algebraic systems after discretization, and no nonlinear algebraic solve is needed; (iii) for each flow variable the linear algebraic system involves only constant and time-independent coefficient matrices, which can be pre-computed during pre-processing, despite the variable density and variable viscosity of the N-phase mixture; (iv) within a time step the semi-discretized system involves only individual de-coupled Helmholtz-type (including Poisson) equations, despite the strongly-coupled phase–field system of fourth spatial order at the continuum level; (v) the algorithm is suitable for large density contrasts and large viscosity contrasts among the N fluids. Extensive numerical experiments have been presented for several problems involving multiple fluid phases, large density contrasts and large viscosity contrasts. In particular, we compare our simulations with the de Gennes theory, and demonstrate that our method produces physically accurate results for multiple fluid phases. We also demonstrate the significant and sometimes dramatic effects of the
EDDA: An Efficient Distributed Data Replication Algorithm in VANETs.
Zhu, Junyu; Huang, Chuanhe; Fan, Xiying; Guo, Sipei; Fu, Bin
2018-02-10
Efficient data dissemination in vehicular ad hoc networks (VANETs) is a challenging issue due to the dynamic nature of the network. To improve the performance of data dissemination, we study distributed data replication algorithms in VANETs for exchanging information and computing in an arbitrarily-connected network of vehicle nodes. To achieve low dissemination delay and improve the network performance, we control the number of message copies that can be disseminated in the network and then propose an efficient distributed data replication algorithm (EDDA). The key idea is to let the data carrier distribute the data dissemination tasks to multiple nodes to speed up the dissemination process. We calculate the number of communication stages for the network to enter into a balanced status and show that the proposed distributed algorithm can converge to a consensus in a small number of communication stages. Most of the theoretical results described in this paper are to study the complexity of network convergence. The lower bound and upper bound are also provided in the analysis of the algorithm. Simulation results show that the proposed EDDA can efficiently disseminate messages to vehicles in a specific area with low dissemination delay and system overhead.
EDDA: An Efficient Distributed Data Replication Algorithm in VANETs
Zhu, Junyu; Huang, Chuanhe; Fan, Xiying; Guo, Sipei; Fu, Bin
2018-01-01
Efficient data dissemination in vehicular ad hoc networks (VANETs) is a challenging issue due to the dynamic nature of the network. To improve the performance of data dissemination, we study distributed data replication algorithms in VANETs for exchanging information and computing in an arbitrarily-connected network of vehicle nodes. To achieve low dissemination delay and improve the network performance, we control the number of message copies that can be disseminated in the network and then propose an efficient distributed data replication algorithm (EDDA). The key idea is to let the data carrier distribute the data dissemination tasks to multiple nodes to speed up the dissemination process. We calculate the number of communication stages for the network to enter into a balanced status and show that the proposed distributed algorithm can converge to a consensus in a small number of communication stages. Most of the theoretical results described in this paper are to study the complexity of network convergence. The lower bound and upper bound are also provided in the analysis of the algorithm. Simulation results show that the proposed EDDA can efficiently disseminate messages to vehicles in a specific area with low dissemination delay and system overhead. PMID:29439443
Computationally Efficient Clustering of Audio-Visual Meeting Data
Hung, Hayley; Friedland, Gerald; Yeo, Chuohao
This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors, comprising a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that can identify who spoke and when, a problem in speech processing known as speaker diarization. We also extract visual activity features efficiently from MPEG4 video by taking advantage of the processing that was already done for video compression. Then, we present a method of associating the audio-visual data together so that the content of each participant can be managed individually. The methods presented in this article can be used as a principal component that enables many higher-level semantic analysis tasks needed in search, retrieval, and navigation.
Tools for Analyzing Computing Resource Management Strategies and Algorithms for SDR Clouds
Marojevic, Vuk; Gomez-Miguelez, Ismael; Gelonch, Antoni
2012-09-01
Software defined radio (SDR) clouds centralize the computing resources of base stations. The computing resource pool is shared between radio operators and dynamically loads and unloads digital signal processing chains for providing wireless communications services on demand. Each new user session request particularly requires the allocation of computing resources for executing the corresponding SDR transceivers. The huge amount of computing resources of SDR cloud data centers and the numerous session requests at certain hours of a day require an efficient computing resource management. We propose a hierarchical approach, where the data center is divided in clusters that are managed in a distributed way. This paper presents a set of computing resource management tools for analyzing computing resource management strategies and algorithms for SDR clouds. We use the tools for evaluating a different strategies and algorithms. The results show that more sophisticated algorithms can achieve higher resource occupations and that a tradeoff exists between cluster size and algorithm complexity.
Indian Academy of Sciences (India)
Shortest path problems. Road network on cities and we want to navigate between cities. . – p.8/30 ..... The rest of the talk... Computing connectivities between all pairs of vertices good algorithm wrt both space and time to compute the exact solution. . – p.15/30 ...
An efficient parallel algorithm for the solution of a tridiagonal linear system of equations
Stone, H. S.
1971-01-01
Tridiagonal linear systems of equations are solved on conventional serial machines in a time proportional to N, where N is the number of equations. The conventional algorithms do not lend themselves directly to parallel computations on computers of the ILLIAC IV class, in the sense that they appear to be inherently serial. An efficient parallel algorithm is presented in which computation time grows as log sub 2 N. The algorithm is based on recursive doubling solutions of linear recurrence relations, and can be used to solve recurrence relations of all orders.
Efficient Algorithms for Electrostatic Interactions Including Dielectric Contrasts
Directory of Open Access Journals (Sweden)
Christian Holm
2013-10-01
Full Text Available Coarse-grained models of soft matter are usually combined with implicit solvent models that take the electrostatic polarizability into account via a dielectric background. In biophysical or nanoscale simulations that include water, this constant can vary greatly within the system. Performing molecular dynamics or other simulations that need to compute exact electrostatic interactions between charges in those systems is computationally demanding. We review here several algorithms developed by us that perform exactly this task. For planar dielectric surfaces in partial periodic boundary conditions, the arising image charges can be either treated with the MMM2D algorithm in a very efficient and accurate way or with the electrostatic layer correction term, which enables the user to use his favorite 3D periodic Coulomb solver. Arbitrarily-shaped interfaces can be dealt with using induced surface charges with the induced charge calculation (ICC* algorithm. Finally, the local electrostatics algorithm, MEMD(Maxwell Equations Molecular Dynamics, even allows one to employ a smoothly varying dielectric constant in the systems. We introduce the concepts of these three algorithms and an extension for the inclusion of boundaries that are to be held fixed at a constant potential (metal conditions. For each method, we present a showcase application to highlight the importance of dielectric interfaces.
Improved FHT Algorithms for Fast Computation of the Discrete Hartley Transform
Directory of Open Access Journals (Sweden)
M. T. Hamood
2013-05-01
Full Text Available In this paper, by using the symmetrical properties of the discrete Hartley transform (DHT, an improved radix-2 fast Hartley transform (FHT algorithm with arithmetic complexity comparable to that of the real-valued fast Fourier transform (RFFT is developed. It has a simple and regular butterfly structure and possesses the in-place computation property. Furthermore, using the same principles, the development can be extended to more efficient radix-based FHT algorithms. An example for the improved radix-4 FHT algorithm is given to show the validity of the presented method. The arithmetic complexity for the new algorithms are computed and then compared with the existing FHT algorithms. The results of these comparisons have shown that the developed algorithms reduce the number of multiplications and additions considerably.
Efficient O(N) recursive computation of the operational space inertial matrix
International Nuclear Information System (INIS)
Lilly, K.W.; Orin, D.E.
1993-01-01
The operational space inertia matrix Λ reflects the dynamic properties of a robot manipulator to its tip. In the control domain, it may be used to decouple force and/or motion control about the manipulator workspace axes. The matrix Λ also plays an important role in the development of efficient algorithms for the dynamic simulation of closed-chain robotic mechanisms, including simple closed-chain mechanisms such as multiple manipulator systems and walking machines. The traditional approach used to compute Λ has a computational complexity of O(N 3 ) for an N degree-of-freedom manipulator. This paper presents the development of a recursive algorithm for computing the operational space inertia matrix (OSIM) that reduces the computational complexity to O(N). This algorithm, the inertia propagation method, is based on a single recursion that begins at the base of the manipulator and progresses out to the last link. Also applicable to redundant systems and mechanisms with multiple-degree-of-freedom joints, the inertia propagation method is the most efficient method known for computing Λ for N ≥ 6. The numerical accuracy of the algorithm is discussed for a PUMA 560 robot with a fixed base
An algorithm of discovering signatures from DNA databases on a computer cluster.
Lee, Hsiao Ping; Sheu, Tzu-Fang
2014-10-05
Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved. In this research, we are debuting the utilization of a divide-and-conquer strategy in signature discovery and have proposed a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms where they are unable to process large databases and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases such as the human whole-genome EST database which were previously unable to be processed by the existing algorithms. The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
A Computational Fluid Dynamics Algorithm on a Massively Parallel Computer
Jespersen, Dennis C.; Levit, Creon
1989-01-01
The discipline of computational fluid dynamics is demanding ever-increasing computational power to deal with complex fluid flow problems. We investigate the performance of a finite-difference computational fluid dynamics algorithm on a massively parallel computer, the Connection Machine. Of special interest is an implicit time-stepping algorithm; to obtain maximum performance from the Connection Machine, it is necessary to use a nonstandard algorithm to solve the linear systems that arise in the implicit algorithm. We find that the Connection Machine ran achieve very high computation rates on both explicit and implicit algorithms. The performance of the Connection Machine puts it in the same class as today's most powerful conventional supercomputers.
EDDA: An Efficient Distributed Data Replication Algorithm in VANETs
Directory of Open Access Journals (Sweden)
Junyu Zhu
2018-02-01
Full Text Available Efficient data dissemination in vehicular ad hoc networks (VANETs is a challenging issue due to the dynamic nature of the network. To improve the performance of data dissemination, we study distributed data replication algorithms in VANETs for exchanging information and computing in an arbitrarily-connected network of vehicle nodes. To achieve low dissemination delay and improve the network performance, we control the number of message copies that can be disseminated in the network and then propose an efficient distributed data replication algorithm (EDDA. The key idea is to let the data carrier distribute the data dissemination tasks to multiple nodes to speed up the dissemination process. We calculate the number of communication stages for the network to enter into a balanced status and show that the proposed distributed algorithm can converge to a consensus in a small number of communication stages. Most of the theoretical results described in this paper are to study the complexity of network convergence. The lower bound and upper bound are also provided in the analysis of the algorithm. Simulation results show that the proposed EDDA can efficiently disseminate messages to vehicles in a specific area with low dissemination delay and system overhead.
Computational Analysis of Distance Operators for the Iterative Closest Point Algorithm.
Directory of Open Access Journals (Sweden)
Higinio Mora
Full Text Available The Iterative Closest Point (ICP algorithm is currently one of the most popular methods for rigid registration so that it has become the standard in the Robotics and Computer Vision communities. Many applications take advantage of it to align 2D/3D surfaces due to its popularity and simplicity. Nevertheless, some of its phases present a high computational cost thus rendering impossible some of its applications. In this work, it is proposed an efficient approach for the matching phase of the Iterative Closest Point algorithm. This stage is the main bottleneck of that method so that any efficiency improvement has a great positive impact on the performance of the algorithm. The proposal consists in using low computational cost point-to-point distance metrics instead of classic Euclidean one. The candidates analysed are the Chebyshev and Manhattan distance metrics due to their simpler formulation. The experiments carried out have validated the performance, robustness and quality of the proposal. Different experimental cases and configurations have been set up including a heterogeneous set of 3D figures, several scenarios with partial data and random noise. The results prove that an average speed up of 14% can be obtained while preserving the convergence properties of the algorithm and the quality of the final results.
Unified commutation-pruning technique for efficient computation of composite DFTs
Castro-Palazuelos, David E.; Medina-Melendrez, Modesto Gpe.; Torres-Roman, Deni L.; Shkvarko, Yuriy V.
2015-12-01
An efficient computation of a composite length discrete Fourier transform (DFT), as well as a fast Fourier transform (FFT) of both time and space data sequences in uncertain (non-sparse or sparse) computational scenarios, requires specific processing algorithms. Traditional algorithms typically employ some pruning methods without any commutations, which prevents them from attaining the potential computational efficiency. In this paper, we propose an alternative unified approach with automatic commutations between three computational modalities aimed at efficient computations of the pruned DFTs adapted for variable composite lengths of the non-sparse input-output data. The first modality is an implementation of the direct computation of a composite length DFT, the second one employs the second-order recursive filtering method, and the third one performs the new pruned decomposed transform. The pruned decomposed transform algorithm performs the decimation in time or space (DIT) data acquisition domain and, then, decimation in frequency (DIF). The unified combination of these three algorithms is addressed as the DFTCOMM technique. Based on the treatment of the combinational-type hypotheses testing optimization problem of preferable allocations between all feasible commuting-pruning modalities, we have found the global optimal solution to the pruning problem that always requires a fewer or, at most, the same number of arithmetic operations than other feasible modalities. The DFTCOMM method outperforms the existing competing pruning techniques in the sense of attainable savings in the number of required arithmetic operations. It requires fewer or at most the same number of arithmetic operations for its execution than any other of the competing pruning methods reported in the literature. Finally, we provide the comparison of the DFTCOMM with the recently developed sparse fast Fourier transform (SFFT) algorithmic family. We feature that, in the sensing scenarios with
Efficient Active Contour and K-Means Algorithms in Image Segmentation
Directory of Open Access Journals (Sweden)
J.R. Rommelse
2004-01-01
Full Text Available In this paper we discuss a classic clustering algorithm that can be used to segment images and a recently developed active contour image segmentation model. We propose integrating aspects of the classic algorithm to improve the active contour model. For the resulting CVK and B-means segmentation algorithms we examine methods to decrease the size of the image domain. The CVK method has been implemented to run on parallel and distributed computers. By changing the order of updating the pixels, it was possible to replace synchronous communication with asynchronous communication and subsequently the parallel efficiency is improved.
Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™
Gomes, Jeremias M.; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H.
2016-01-01
We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP’s irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63× on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7× and 1.62×, respectively, as compared to efficient CPU and GPU implementations. PMID:27298591
Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™.
Gomes, Jeremias M; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H
2015-10-01
We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel ® Xeon Phi ™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP's irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63 × on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7 × and 1.62 × , respectively, as compared to efficient CPU and GPU implementations.
Efficient GPS Position Determination Algorithms
National Research Council Canada - National Science Library
Nguyen, Thao Q
2007-01-01
... differential GPS algorithm for a network of users. The stand-alone user GPS algorithm is a direct, closed-form, and efficient new position determination algorithm that exploits the closed-form solution of the GPS trilateration equations and works...
Efficient parallel implementation of active appearance model fitting algorithm on GPU.
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.
Computationally efficient clustering of audio-visual meeting data
Hung, H.; Friedland, G.; Yeo, C.; Shao, L.; Shan, C.; Luo, J.; Etoh, M.
2010-01-01
This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors,
Noise filtering algorithm for the MFTF-B computer based control system
International Nuclear Information System (INIS)
Minor, E.G.
1983-01-01
An algorithm to reduce the message traffic in the MFTF-B computer based control system is described. The algorithm filters analog inputs to the control system. Its purpose is to distinguish between changes in the inputs due to noise and changes due to significant variations in the quantity being monitored. Noise is rejected while significant changes are reported to the control system data base, thus keeping the data base updated with a minimum number of messages. The algorithm is memory efficient, requiring only four bytes of storage per analog channel, and computationally simple, requiring only subtraction and comparison. Quantitative analysis of the algorithm is presented for the case of additive Gaussian noise. It is shown that the algorithm is stable and tends toward the mean value of the monitored variable over a wide variety of additive noise distributions
Evolving Resilient Back-Propagation Algorithm for Energy Efficiency Problem
Directory of Open Access Journals (Sweden)
Yang Fei
2016-01-01
Full Text Available Energy efficiency is one of our most economical sources of new energy. When it comes to efficient building design, the computation of the heating load (HL and cooling load (CL is required to determine the specifications of the heating and cooling equipment. The objective of this paper is to model heating load and cooling load buildings using neural networks in order to predict HL load and CL load. Rprop with genetic algorithm was proposed to increase the global convergence capability of Rprop by modifying a corresponding weight. Comparison results show that Rprop with GA can successfully improve the global convergence capability of Rprop and achieve lower MSE than other perceptron training algorithms, such as Back-Propagation or original Rprop. In addition, the trained network has better generalization ability and stabilization performance.
Hard Real-Time Task Scheduling in Cloud Computing Using an Adaptive Genetic Algorithm
Directory of Open Access Journals (Sweden)
Amjad Mahmood
2017-04-01
Full Text Available In the Infrastructure-as-a-Service cloud computing model, virtualized computing resources in the form of virtual machines are provided over the Internet. A user can rent an arbitrary number of computing resources to meet their requirements, making cloud computing an attractive choice for executing real-time tasks. Economical task allocation and scheduling on a set of leased virtual machines is an important problem in the cloud computing environment. This paper proposes a greedy and a genetic algorithm with an adaptive selection of suitable crossover and mutation operations (named as AGA to allocate and schedule real-time tasks with precedence constraint on heterogamous virtual machines. A comprehensive simulation study has been done to evaluate the performance of the proposed algorithms in terms of their solution quality and efficiency. The simulation results show that AGA outperforms the greedy algorithm and non-adaptive genetic algorithm in terms of solution quality.
Iterative schemes for parallel Sn algorithms in a shared-memory computing environment
International Nuclear Information System (INIS)
Haghighat, A.; Hunter, M.A.; Mattis, R.E.
1995-01-01
Several two-dimensional spatial domain partitioning S n transport theory algorithms are developed on the basis of different iterative schemes. These algorithms are incorporated into TWOTRAN-II and tested on the shared-memory CRAY Y-MP C90 computer. For a series of fixed-source r-z geometry homogeneous problems, it is demonstrated that the concurrent red-black algorithms may result in large parallel efficiencies (>60%) on C90. It is also demonstrated that for a realistic shielding problem, the use of the negative flux fixup causes high load imbalance, which results in a significant loss of parallel efficiency
Directory of Open Access Journals (Sweden)
Sofia D Karamintziou
Full Text Available Advances in the field of closed-loop neuromodulation call for analysis and modeling approaches capable of confronting challenges related to the complex neuronal response to stimulation and the presence of strong internal and measurement noise in neural recordings. Here we elaborate on the algorithmic aspects of a noise-resistant closed-loop subthalamic nucleus deep brain stimulation system for advanced Parkinson's disease and treatment-refractory obsessive-compulsive disorder, ensuring remarkable performance in terms of both efficiency and selectivity of stimulation, as well as in terms of computational speed. First, we propose an efficient method drawn from dynamical systems theory, for the reliable assessment of significant nonlinear coupling between beta and high-frequency subthalamic neuronal activity, as a biomarker for feedback control. Further, we present a model-based strategy through which optimal parameters of stimulation for minimum energy desynchronizing control of neuronal activity are being identified. The strategy integrates stochastic modeling and derivative-free optimization of neural dynamics based on quadratic modeling. On the basis of numerical simulations, we demonstrate the potential of the presented modeling approach to identify, at a relatively low computational cost, stimulation settings potentially associated with a significantly higher degree of efficiency and selectivity compared with stimulation settings determined post-operatively. Our data reinforce the hypothesis that model-based control strategies are crucial for the design of novel stimulation protocols at the backstage of clinical applications.
Karamintziou, Sofia D; Custódio, Ana Luísa; Piallat, Brigitte; Polosan, Mircea; Chabardès, Stéphan; Stathis, Pantelis G; Tagaris, George A; Sakas, Damianos E; Polychronaki, Georgia E; Tsirogiannis, George L; David, Olivier; Nikita, Konstantina S
2017-01-01
Advances in the field of closed-loop neuromodulation call for analysis and modeling approaches capable of confronting challenges related to the complex neuronal response to stimulation and the presence of strong internal and measurement noise in neural recordings. Here we elaborate on the algorithmic aspects of a noise-resistant closed-loop subthalamic nucleus deep brain stimulation system for advanced Parkinson's disease and treatment-refractory obsessive-compulsive disorder, ensuring remarkable performance in terms of both efficiency and selectivity of stimulation, as well as in terms of computational speed. First, we propose an efficient method drawn from dynamical systems theory, for the reliable assessment of significant nonlinear coupling between beta and high-frequency subthalamic neuronal activity, as a biomarker for feedback control. Further, we present a model-based strategy through which optimal parameters of stimulation for minimum energy desynchronizing control of neuronal activity are being identified. The strategy integrates stochastic modeling and derivative-free optimization of neural dynamics based on quadratic modeling. On the basis of numerical simulations, we demonstrate the potential of the presented modeling approach to identify, at a relatively low computational cost, stimulation settings potentially associated with a significantly higher degree of efficiency and selectivity compared with stimulation settings determined post-operatively. Our data reinforce the hypothesis that model-based control strategies are crucial for the design of novel stimulation protocols at the backstage of clinical applications.
An extended Intelligent Water Drops algorithm for workflow scheduling in cloud computing environment
Directory of Open Access Journals (Sweden)
Shaymaa Elsherbiny
2018-03-01
Full Text Available Cloud computing is emerging as a high performance computing environment with a large scale, heterogeneous collection of autonomous systems and flexible computational architecture. Many resource management methods may enhance the efficiency of the whole cloud computing system. The key part of cloud computing resource management is resource scheduling. Optimized scheduling of tasks on the cloud virtual machines is an NP-hard problem and many algorithms have been presented to solve it. The variations among these schedulers are due to the fact that the scheduling strategies of the schedulers are adapted to the changing environment and the types of tasks. The focus of this paper is on workflows scheduling in cloud computing, which is gaining a lot of attention recently because workflows have emerged as a paradigm to represent complex computing problems. We proposed a novel algorithm extending the natural-based Intelligent Water Drops (IWD algorithm that optimizes the scheduling of workflows on the cloud. The proposed algorithm is implemented and embedded within the workflows simulation toolkit and tested in different simulated cloud environments with different cost models. Our algorithm showed noticeable enhancements over the classical workflow scheduling algorithms. We made a comparison between the proposed IWD-based algorithm with other well-known scheduling algorithms, including MIN-MIN, MAX-MIN, Round Robin, FCFS, and MCT, PSO and C-PSO, where the proposed algorithm presented noticeable enhancements in the performance and cost in most situations.
Research on allocation efficiency of the daisy chain allocation algorithm
Shi, Jingping; Zhang, Weiguo
2013-03-01
With the improvement of the aircraft performance in reliability, maneuverability and survivability, the number of the control effectors increases a lot. How to distribute the three-axis moments into the control surfaces reasonably becomes an important problem. Daisy chain method is simple and easy to be carried out in the design of the allocation system. But it can not solve the allocation problem for entire attainable moment subset. For the lateral-directional allocation problem, the allocation efficiency of the daisy chain can be directly measured by the area of its subset of attainable moments. Because of the non-linear allocation characteristic, the subset of attainable moments of daisy-chain method is a complex non-convex polygon, and it is difficult to solve directly. By analyzing the two-dimensional allocation problems with a "micro-element" idea, a numerical calculation algorithm is proposed to compute the area of the non-convex polygon. In order to improve the allocation efficiency of the algorithm, a genetic algorithm with the allocation efficiency chosen as the fitness function is proposed to find the best pseudo-inverse matrix.
An efficient parallel algorithm for the calculation of canonical MP2 energies.
Baker, Jon; Pulay, Peter
2002-09-01
We present the parallel version of a previous serial algorithm for the efficient calculation of canonical MP2 energies (Pulay, P.; Saebo, S.; Wolinski, K. Chem Phys Lett 2001, 344, 543). It is based on the Saebo-Almlöf direct-integral transformation, coupled with an efficient prescreening of the AO integrals. The parallel algorithm avoids synchronization delays by spawning a second set of slaves during the bin-sort prior to the second half-transformation. Results are presented for systems with up to 2000 basis functions. MP2 energies for molecules with 400-500 basis functions can be routinely calculated to microhartree accuracy on a small number of processors (6-8) in a matter of minutes with modern PC-based parallel computers. Copyright 2002 Wiley Periodicals, Inc. J Comput Chem 23: 1150-1156, 2002
Conformal geometry computational algorithms and engineering applications
Jin, Miao; He, Ying; Wang, Yalin
2018-01-01
This book offers an essential overview of computational conformal geometry applied to fundamental problems in specific engineering fields. It introduces readers to conformal geometry theory and discusses implementation issues from an engineering perspective. The respective chapters explore fundamental problems in specific fields of application, and detail how computational conformal geometric methods can be used to solve them in a theoretically elegant and computationally efficient way. The fields covered include computer graphics, computer vision, geometric modeling, medical imaging, and wireless sensor networks. Each chapter concludes with a summary of the material covered and suggestions for further reading, and numerous illustrations and computational algorithms complement the text. The book draws on courses given by the authors at the University of Louisiana at Lafayette, the State University of New York at Stony Brook, and Tsinghua University, and will be of interest to senior undergraduates, gradua...
An Efficient ABC_DE_Based Hybrid Algorithm for Protein–Ligand Docking
Directory of Open Access Journals (Sweden)
Boxin Guan
2018-04-01
Full Text Available Protein–ligand docking is a process of searching for the optimal binding conformation between the receptor and the ligand. Automated docking plays an important role in drug design, and an efficient search algorithm is needed to tackle the docking problem. To tackle the protein–ligand docking problem more efficiently, An ABC_DE_based hybrid algorithm (ADHDOCK, integrating artificial bee colony (ABC algorithm and differential evolution (DE algorithm, is proposed in the article. ADHDOCK applies an adaptive population partition (APP mechanism to reasonably allocate the computational resources of the population in each iteration process, which helps the novel method make better use of the advantages of ABC and DE. The experiment tested fifty protein–ligand docking problems to compare the performance of ADHDOCK, ABC, DE, Lamarckian genetic algorithm (LGA, running history information guided genetic algorithm (HIGA, and swarm optimization for highly flexible protein–ligand docking (SODOCK. The results clearly exhibit the capability of ADHDOCK toward finding the lowest energy and the smallest root-mean-square deviation (RMSD on most of the protein–ligand docking problems with respect to the other five algorithms.
An Algorithm for Computing Screened Coulomb Scattering in Geant4
Mendenhall, Marcus H.; Weller, Robert A.
2004-01-01
An algorithm has been developed for the Geant4 Monte-Carlo package for the efficient computation of screened Coulomb interatomic scattering. It explicitly integrates the classical equations of motion for scattering events, resulting in precise tracking of both the projectile and the recoil target nucleus. The algorithm permits the user to plug in an arbitrary screening function, such as Lens-Jensen screening, which is good for backscattering calculations, or Ziegler-Biersack-Littmark screenin...
International Nuclear Information System (INIS)
Kirk, B.L.; Azmy, Y.Y.
1992-01-01
In this paper the one-group, steady-state neutron diffusion equation in two-dimensional Cartesian geometry is solved using the nodal integral method. The discrete variable equations comprise loosely coupled sets of equations representing the nodal balance of neutrons, as well as neutron current continuity along rows or columns of computational cells. An iterative algorithm that is more suitable for solving large problems concurrently is derived based on the decomposition of the spatial domain and is accelerated using successive overrelaxation. This algorithm is very well suited for parallel computers, especially since the spatial domain decomposition occurs naturally, so that the number of iterations required for convergence does not depend on the number of processors participating in the calculation. Implementation of the authors' algorithm on the Intel iPSC/2 hypercube and Sequent Balance 8000 parallel computer is presented, and measured speedup and efficiency for test problems are reported. The results suggest that the efficiency of the hypercube quickly deteriorates when many processors are used, while the Sequent Balance retains very high efficiency for a comparable number of participating processors. This leads to the conjecture that message-passing parallel computers are not as well suited for this algorithm as shared-memory machines
Secure Computation, I/O-Efficient Algorithms and Distributed Signatures
DEFF Research Database (Denmark)
Damgård, Ivan Bjerre; Kölker, Jonas; Toft, Tomas
2012-01-01
values of form r, gr for random secret-shared r ∈ ℤq and gr in a group of order q. This costs a constant number of exponentiation per player per value generated, even if less than n/3 players are malicious. This can be used for efficient distributed computing of Schnorr signatures. We further develop...... the technique so we can sign secret data in a distributed fashion at essentially the same cost....
Efficient Dual Domain Decoding of Linear Block Codes Using Genetic Algorithms
Directory of Open Access Journals (Sweden)
Ahmed Azouaoui
2012-01-01
Full Text Available A computationally efficient algorithm for decoding block codes is developed using a genetic algorithm (GA. The proposed algorithm uses the dual code in contrast to the existing genetic decoders in the literature that use the code itself. Hence, this new approach reduces the complexity of decoding the codes of high rates. We simulated our algorithm in various transmission channels. The performance of this algorithm is investigated and compared with competitor decoding algorithms including Maini and Shakeel ones. The results show that the proposed algorithm gives large gains over the Chase-2 decoding algorithm and reach the performance of the OSD-3 for some quadratic residue (QR codes. Further, we define a new crossover operator that exploits the domain specific information and compare it with uniform and two point crossover. The complexity of this algorithm is also discussed and compared to other algorithms.
Computational geometry algorithms and applications
de Berg, Mark; Overmars, Mark; Schwarzkopf, Otfried
1997-01-01
Computational geometry emerged from the field of algorithms design and anal ysis in the late 1970s. It has grown into a recognized discipline with its own journals, conferences, and a large community of active researchers. The suc cess of the field as a research discipline can on the one hand be explained from the beauty of the problems studied and the solutions obtained, and, on the other hand, by the many application domains--computer graphics, geographic in formation systems (GIS), robotics, and others-in which geometric algorithms play a fundamental role. For many geometric problems the early algorithmic solutions were either slow or difficult to understand and implement. In recent years a number of new algorithmic techniques have been developed that improved and simplified many of the previous approaches. In this textbook we have tried to make these modem algorithmic solutions accessible to a large audience. The book has been written as a textbook for a course in computational geometry, but it can ...
Hardware-Efficient Design of Real-Time Profile Shape Matching Stereo Vision Algorithm on FPGA
Directory of Open Access Journals (Sweden)
Beau Tippetts
2014-01-01
Full Text Available A variety of platforms, such as micro-unmanned vehicles, are limited in the amount of computational hardware they can support due to weight and power constraints. An efficient stereo vision algorithm implemented on an FPGA would be able to minimize payload and power consumption in microunmanned vehicles, while providing 3D information and still leaving computational resources available for other processing tasks. This work presents a hardware design of the efficient profile shape matching stereo vision algorithm. Hardware resource usage is presented for the targeted micro-UV platform, Helio-copter, that uses the Xilinx Virtex 4 FX60 FPGA. Less than a fifth of the resources on this FGPA were used to produce dense disparity maps for image sizes up to 450 × 375, with the ability to scale up easily by increasing BRAM usage. A comparison is given of accuracy, speed performance, and resource usage of a census transform-based stereo vision FPGA implementation by Jin et al. Results show that the profile shape matching algorithm is an efficient real-time stereo vision algorithm for hardware implementation for resource limited systems such as microunmanned vehicles.
Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU
Directory of Open Access Journals (Sweden)
Jinwei Wang
2014-01-01
Full Text Available The active appearance model (AAM is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA on the Nvidia’s GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.
Algorithms for computational fluid dynamics n parallel processors
International Nuclear Information System (INIS)
Van de Velde, E.F.
1986-01-01
A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries
An Efficient Meta Heuristic Algorithm to Solve Economic Load Dispatch Problems
Directory of Open Access Journals (Sweden)
R Subramanian
2013-12-01
Full Text Available The Economic Load Dispatch (ELD problems in power generation systems are to reduce the fuel cost by reducing the total cost for the generation of electric power. This paper presents an efficient Modified Firefly Algorithm (MFA, for solving ELD Problem. The main objective of the problems is to minimize the total fuel cost of the generating units having quadratic cost functions subjected to limits on generator true power output and transmission losses. The MFA is a stochastic, Meta heuristic approach based on the idealized behaviour of the flashing characteristics of fireflies. This paper presents an application of MFA to ELD for six generator test case system. MFA is applied to ELD problem and compared its solution quality and computation efficiency to Genetic algorithm (GA, Differential Evolution (DE, Particle swarm optimization (PSO, Artificial Bee Colony optimization (ABC, Biogeography-Based Optimization (BBO, Bacterial Foraging optimization (BFO, Firefly Algorithm (FA techniques. The simulation result shows that the proposed algorithm outperforms previous optimization methods.
Parallel algorithms and cluster computing
Hoffmann, Karl Heinz
2007-01-01
This book presents major advances in high performance computing as well as major advances due to high performance computing. It contains a collection of papers in which results achieved in the collaboration of scientists from computer science, mathematics, physics, and mechanical engineering are presented. From the science problems to the mathematical algorithms and on to the effective implementation of these algorithms on massively parallel and cluster computers we present state-of-the-art methods and technology as well as exemplary results in these fields. This book shows that problems which seem superficially distinct become intimately connected on a computational level.
Processing-Efficient Distributed Adaptive RLS Filtering for Computationally Constrained Platforms
Directory of Open Access Journals (Sweden)
Noor M. Khan
2017-01-01
Full Text Available In this paper, a novel processing-efficient architecture of a group of inexpensive and computationally incapable small platforms is proposed for a parallely distributed adaptive signal processing (PDASP operation. The proposed architecture runs computationally expensive procedures like complex adaptive recursive least square (RLS algorithm cooperatively. The proposed PDASP architecture operates properly even if perfect time alignment among the participating platforms is not available. An RLS algorithm with the application of MIMO channel estimation is deployed on the proposed architecture. Complexity and processing time of the PDASP scheme with MIMO RLS algorithm are compared with sequentially operated MIMO RLS algorithm and liner Kalman filter. It is observed that PDASP scheme exhibits much lesser computational complexity parallely than the sequential MIMO RLS algorithm as well as Kalman filter. Moreover, the proposed architecture provides an improvement of 95.83% and 82.29% decreased processing time parallely compared to the sequentially operated Kalman filter and MIMO RLS algorithm for low doppler rate, respectively. Likewise, for high doppler rate, the proposed architecture entails an improvement of 94.12% and 77.28% decreased processing time compared to the Kalman and RLS algorithms, respectively.
Protein alignment algorithms with an efficient backtracking routine on multiple GPUs
Directory of Open Access Journals (Sweden)
Kierzynka Michal
2011-05-01
Full Text Available Abstract Background Pairwise sequence alignment methods are widely used in biological research. The increasing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the nearest future. To overcome this challenge several GPU (Graphics Processing Unit computing approaches have been proposed lately. These solutions show a great potential of a GPU platform but in most cases address the problem of sequence database scanning and computing only the alignment score whereas the alignment itself is omitted. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch, and Smith-Waterman algorithms with a backtracking procedure which is needed to construct the alignment. Results In this paper we present the solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. Performed tests show that the implementation, with performance up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison to other CPU and GPU-based solutions. Moreover, multiple GPUs support with load balancing makes the application very scalable. Conclusions The article shows that the backtracking procedure of the sequence alignment algorithms may be designed to fit in with the GPU architecture. Therefore, our algorithm, apart from scores, is able to compute pairwise alignments. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. Performed tests show that the efficiency of the implementation is excellent. Moreover, the speed of our GPU-based algorithms can be almost linearly increased when using more than one graphics card.
Essential algorithms a practical approach to computer algorithms
Stephens, Rod
2013-01-01
A friendly and accessible introduction to the most useful algorithms Computer algorithms are the basic recipes for programming. Professional programmers need to know how to use algorithms to solve difficult programming problems. Written in simple, intuitive English, this book describes how and when to use the most practical classic algorithms, and even how to create new algorithms to meet future needs. The book also includes a collection of questions that can help readers prepare for a programming job interview. Reveals methods for manipulating common data structures s
International Nuclear Information System (INIS)
Vecharynski, Eugene; Yang, Chao; Pask, John E.
2015-01-01
We present an iterative algorithm for computing an invariant subspace associated with the algebraically smallest eigenvalues of a large sparse or structured Hermitian matrix A. We are interested in the case in which the dimension of the invariant subspace is large (e.g., over several hundreds or thousands) even though it may still be small relative to the dimension of A. These problems arise from, for example, density functional theory (DFT) based electronic structure calculations for complex materials. The key feature of our algorithm is that it performs fewer Rayleigh–Ritz calculations compared to existing algorithms such as the locally optimal block preconditioned conjugate gradient or the Davidson algorithm. It is a block algorithm, and hence can take advantage of efficient BLAS3 operations and be implemented with multiple levels of concurrency. We discuss a number of practical issues that must be addressed in order to implement the algorithm efficiently on a high performance computer
Cloud Computing Task Scheduling Based on Cultural Genetic Algorithm
Directory of Open Access Journals (Sweden)
Li Jian-Wen
2016-01-01
Full Text Available The task scheduling strategy based on cultural genetic algorithm(CGA is proposed in order to improve the efficiency of task scheduling in the cloud computing platform, which targets at minimizing the total time and cost of task scheduling. The improved genetic algorithm is used to construct the main population space and knowledge space under cultural framework which get independent parallel evolution, forming a mechanism of mutual promotion to dispatch the cloud task. Simultaneously, in order to prevent the defects of the genetic algorithm which is easy to fall into local optimum, the non-uniform mutation operator is introduced to improve the search performance of the algorithm. The experimental results show that CGA reduces the total time and lowers the cost of the scheduling, which is an effective algorithm for the cloud task scheduling.
ProxImaL: efficient image optimization using proximal algorithms
Heide, Felix
2016-07-11
Computational photography systems are becoming increasingly diverse, while computational resources-for example on mobile platforms-are rapidly increasing. As diverse as these camera systems may be, slightly different variants of the underlying image processing tasks, such as demosaicking, deconvolution, denoising, inpainting, image fusion, and alignment, are shared between all of these systems. Formal optimization methods have recently been demonstrated to achieve state-of-the-art quality for many of these applications. Unfortunately, different combinations of natural image priors and optimization algorithms may be optimal for different problems, and implementing and testing each combination is currently a time-consuming and error-prone process. ProxImaL is a domain-specific language and compiler for image optimization problems that makes it easy to experiment with different problem formulations and algorithm choices. The language uses proximal operators as the fundamental building blocks of a variety of linear and nonlinear image formation models and cost functions, advanced image priors, and noise models. The compiler intelligently chooses the best way to translate a problem formulation and choice of optimization algorithm into an efficient solver implementation. In applications to the image processing pipeline, deconvolution in the presence of Poisson-distributed shot noise, and burst denoising, we show that a few lines of ProxImaL code can generate highly efficient solvers that achieve state-of-the-art results. We also show applications to the nonlinear and nonconvex problem of phase retrieval.
Directory of Open Access Journals (Sweden)
JongBeom Lim
2018-01-01
Full Text Available Many artificial intelligence applications often require a huge amount of computing resources. As a result, cloud computing adoption rates are increasing in the artificial intelligence field. To support the demand for artificial intelligence applications and guarantee the service level agreement, cloud computing should provide not only computing resources but also fundamental mechanisms for efficient computing. In this regard, a snapshot protocol has been used to create a consistent snapshot of the global state in cloud computing environments. However, the existing snapshot protocols are not optimized in the context of artificial intelligence applications, where large-scale iterative computation is the norm. In this paper, we present a distributed snapshot protocol for efficient artificial intelligence computation in cloud computing environments. The proposed snapshot protocol is based on a distributed algorithm to run interconnected multiple nodes in a scalable fashion. Our snapshot protocol is able to deal with artificial intelligence applications, in which a large number of computing nodes are running. We reveal that our distributed snapshot protocol guarantees the correctness, safety, and liveness conditions.
Neural network fusion capabilities for efficient implementation of tracking algorithms
Sundareshan, Malur K.; Amoozegar, Farid
1997-03-01
The ability to efficiently fuse information of different forms to facilitate intelligent decision making is one of the major capabilities of trained multilayer neural networks that is now being recognized. While development of innovative adaptive control algorithms for nonlinear dynamical plants that attempt to exploit these capabilities seems to be more popular, a corresponding development of nonlinear estimation algorithms using these approaches, particularly for application in target surveillance and guidance operations, has not received similar attention. We describe the capabilities and functionality of neural network algorithms for data fusion and implementation of tracking filters. To discuss details and to serve as a vehicle for quantitative performance evaluations, the illustrative case of estimating the position and velocity of surveillance targets is considered. Efficient target- tracking algorithms that can utilize data from a host of sensing modalities and are capable of reliably tracking even uncooperative targets executing fast and complex maneuvers are of interest in a number of applications. The primary motivation for employing neural networks in these applications comes from the efficiency with which more features extracted from different sensor measurements can be utilized as inputs for estimating target maneuvers. A system architecture that efficiently integrates the fusion capabilities of a trained multilayer neural net with the tracking performance of a Kalman filter is described. The innovation lies in the way the fusion of multisensor data is accomplished to facilitate improved estimation without increasing the computational complexity of the dynamical state estimator itself.
Variation in efficiency of parallel algorithms. [for study of stiffness matrices in planar trusses
Hayashi, A.; Melosh, R. J.; Utku, S.; Salama, M.
1985-01-01
The present study has the objective to investigate some iterative parallel-processor linear equation solving algorithms with respect to efficiency for analyses of typical linear engineering systems. Attention is given to a set of n linear equations, Ku = p, where K = an n x n positive definite, sparsely populated, symmetric matrix, u = an n x 1 vector of unknown responses, and p = an n x 1 vector of prescribed constants. This study is concerned with a hybrid method in which iteration is used to solve the problem, while a direct method is used on the local processor level. Variations in the efficiency of parallel algorithms are explored. Measures of the efficiency are based on computer experiments regarding the algorithms. For all the algorithms, the wall clock time is found to decrease as the number of processors increases.
Efficient sequential and parallel algorithms for record linkage.
Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar
2014-01-01
Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find the connected components. Our sequential and parallel algorithms have been tested on a real dataset of 1,083,878 records and synthetic datasets ranging in size from 50,000 to 9,000,000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm.
I/O-Efficient Computation of Water Flow Across a Terrain
DEFF Research Database (Denmark)
Arge, Lars Allan; Revsbæk, Morten; Zeh, Norbert
2010-01-01
). We present an I/O-efficient algorithm that solves this problem using O(sort(X) log (X/M) + sort(N)) I/Os, where N is the number of terrain vertices, X is the number of pits of the terrain, sort(N) is the cost of sorting N data items, and M is the size of the computer's main memory. Our algorithm...
A hardware-oriented concurrent TZ search algorithm for High-Efficiency Video Coding
Doan, Nghia; Kim, Tae Sung; Rhee, Chae Eun; Lee, Hyuk-Jae
2017-12-01
High-Efficiency Video Coding (HEVC) is the latest video coding standard, in which the compression performance is double that of its predecessor, the H.264/AVC standard, while the video quality remains unchanged. In HEVC, the test zone (TZ) search algorithm is widely used for integer motion estimation because it effectively searches the good-quality motion vector with a relatively small amount of computation. However, the complex computation structure of the TZ search algorithm makes it difficult to implement it in the hardware. This paper proposes a new integer motion estimation algorithm which is designed for hardware execution by modifying the conventional TZ search to allow parallel motion estimations of all prediction unit (PU) partitions. The algorithm consists of the three phases of zonal, raster, and refinement searches. At the beginning of each phase, the algorithm obtains the search points required by the original TZ search for all PU partitions in a coding unit (CU). Then, all redundant search points are removed prior to the estimation of the motion costs, and the best search points are then selected for all PUs. Compared to the conventional TZ search algorithm, experimental results show that the proposed algorithm significantly decreases the Bjøntegaard Delta bitrate (BD-BR) by 0.84%, and it also reduces the computational complexity by 54.54%.
International Nuclear Information System (INIS)
Zeeberg, B.R.; Bacharach, S.; Carson, R.; Green, M.V.; Larson, S.M.; Soucaille, J.F.
1985-01-01
An algorithm is presented which permits the reconstruction of SPECT images in the presence of spatially varying attenuation. The algorithm considers the spatially variant attenuation as a perturbation of the constant attenuation case and computes a reconstructed image and a correction image to estimate the effects of this perturbation. The corrected image will be computed from these two images and is of comparable quality both visually and quantitatively to those simulated for zero or constant attenuation taken as standard reference images. In addition, the algorithm is time efficient, in that the time required is approximately 2.5 times that for a standard convolution-back projection algorithm
Energy Technology Data Exchange (ETDEWEB)
Hough, Patricia Diane (Sandia National Laboratories, Livermore, CA); Gray, Genetha Anne (Sandia National Laboratories, Livermore, CA); Castro, Joseph Pete Jr. (; .); Giunta, Anthony Andrew
2006-01-01
Many engineering application problems use optimization algorithms in conjunction with numerical simulators to search for solutions. The formulation of relevant objective functions and constraints dictate possible optimization algorithms. Often, a gradient based approach is not possible since objective functions and constraints can be nonlinear, nonconvex, non-differentiable, or even discontinuous and the simulations involved can be computationally expensive. Moreover, computational efficiency and accuracy are desirable and also influence the choice of solution method. With the advent and increasing availability of massively parallel computers, computational speed has increased tremendously. Unfortunately, the numerical and model complexities of many problems still demand significant computational resources. Moreover, in optimization, these expenses can be a limiting factor since obtaining solutions often requires the completion of numerous computationally intensive simulations. Therefore, we propose a multifidelity optimization algorithm (MFO) designed to improve the computational efficiency of an optimization method for a wide range of applications. In developing the MFO algorithm, we take advantage of the interactions between multi fidelity models to develop a dynamic and computational time saving optimization algorithm. First, a direct search method is applied to the high fidelity model over a reduced design space. In conjunction with this search, a specialized oracle is employed to map the design space of this high fidelity model to that of a computationally cheaper low fidelity model using space mapping techniques. Then, in the low fidelity space, an optimum is obtained using gradient or non-gradient based optimization, and it is mapped back to the high fidelity space. In this paper, we describe the theory and implementation details of our MFO algorithm. We also demonstrate our MFO method on some example problems and on two applications: earth penetrators and
Efficient Backprojection-Based Synthetic Aperture Radar Computation with Many-Core Processors
Directory of Open Access Journals (Sweden)
Jongsoo Park
2013-01-01
Full Text Available Tackling computationally challenging problems with high efficiency often requires the combination of algorithmic innovation, advanced architecture, and thorough exploitation of parallelism. We demonstrate this synergy through synthetic aperture radar (SAR via backprojection, an image reconstruction method that can require hundreds of TFLOPS. Computation cost is significantly reduced by our new algorithm of approximate strength reduction; data movement cost is economized by software locality optimizations facilitated by advanced architecture support; parallelism is fully harnessed in various patterns and granularities. We deliver over 35 billion backprojections per second throughput per compute node on an Intel® Xeon® processor E5-2670-based cluster, equipped with Intel® Xeon Phi™ coprocessors. This corresponds to processing a 3K×3K image within a second using a single node. Our study can be extended to other settings: backprojection is applicable elsewhere including medical imaging, approximate strength reduction is a general code transformation technique, and many-core processors are emerging as a solution to energy-efficient computing.
An efficient algorithm for nucleolus and prekernel computation in some classes of TU-games
Faigle, U.; Kern, Walter; Kuipers, J.
1998-01-01
We consider classes of TU-games. We show that we can efficiently compute an allocation in the intersection of the prekernel and the least core of the game if we can efficiently compute the minimum excess for any given allocation. In the case where the prekernel of the game contains exactly one core
Quantum Computation and Algorithms
International Nuclear Information System (INIS)
Biham, O.; Biron, D.; Biham, E.; Grassi, M.; Lidar, D.A.
1999-01-01
It is now firmly established that quantum algorithms provide a substantial speedup over classical algorithms for a variety of problems, including the factorization of large numbers and the search for a marked element in an unsorted database. In this talk I will review the principles of quantum algorithms, the basic quantum gates and their operation. The combination of superposition and interference, that makes these algorithms efficient, will be discussed. In particular, Grover's search algorithm will be presented as an example. I will show that the time evolution of the amplitudes in Grover's algorithm can be found exactly using recursion equations, for any initial amplitude distribution
Directory of Open Access Journals (Sweden)
Vladimir A. Batura
2014-11-01
Full Text Available The efficiency of orthogonal transformations application in the frequency algorithms of the digital watermarking of still images is examined. Discrete Hadamard transform, discrete cosine transform and discrete Haar transform are selected. Their effectiveness is determined by the invisibility of embedded in digital image watermark and its resistance to the most common image processing operations: JPEG-compression, noising, changing of the brightness and image size, histogram equalization. The algorithm for digital watermarking and its embedding parameters remain unchanged at these orthogonal transformations. Imperceptibility of embedding is defined by the peak signal to noise ratio, watermark stability– by Pearson's correlation coefficient. Embedding is considered to be invisible, if the value of the peak signal to noise ratio is not less than 43 dB. Embedded watermark is considered to be resistant to a specific attack, if the Pearson’s correlation coefficient is not less than 0.5. Elham algorithm based on the image entropy is chosen for computing experiment. Computing experiment is carried out according to the following algorithm: embedding of a digital watermark in low-frequency area of the image (container by Elham algorithm, exposure to a harmful influence on the protected image (cover image, extraction of a digital watermark. These actions are followed by quality assessment of cover image and watermark on the basis of which efficiency of orthogonal transformation is defined. As a result of computing experiment it was determined that the choice of the specified orthogonal transformations at identical algorithm and parameters of embedding doesn't influence the degree of imperceptibility for a watermark. Efficiency of discrete Hadamard transform and discrete cosine transformation in relation to the attacks chosen for experiment was established based on the correlation indicators. Application of discrete Hadamard transform increases
An efficient quantum algorithm for spectral estimation
Steffens, Adrian; Rebentrost, Patrick; Marvian, Iman; Eisert, Jens; Lloyd, Seth
2017-03-01
We develop an efficient quantum implementation of an important signal processing algorithm for line spectral estimation: the matrix pencil method, which determines the frequencies and damping factors of signals consisting of finite sums of exponentially damped sinusoids. Our algorithm provides a quantum speedup in a natural regime where the sampling rate is much higher than the number of sinusoid components. Along the way, we develop techniques that are expected to be useful for other quantum algorithms as well—consecutive phase estimations to efficiently make products of asymmetric low rank matrices classically accessible and an alternative method to efficiently exponentiate non-Hermitian matrices. Our algorithm features an efficient quantum-classical division of labor: the time-critical steps are implemented in quantum superposition, while an interjacent step, requiring much fewer parameters, can operate classically. We show that frequencies and damping factors can be obtained in time logarithmic in the number of sampling points, exponentially faster than known classical algorithms.
Convolutional networks for fast, energy-efficient neuromorphic computing.
Esser, Steven K; Merolla, Paul A; Arthur, John V; Cassidy, Andrew S; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J; McKinstry, Jeffrey L; Melano, Timothy; Barch, Davis R; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D; Modha, Dharmendra S
2016-10-11
Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware's underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer.
Fast algorithms for computing defects and their derivatives in the Regge calculus
International Nuclear Information System (INIS)
Brewin, Leo
2011-01-01
Any practical attempt to solve the Regge equations, these being a large system of non-linear algebraic equations, will almost certainly employ a Newton-Raphson-like scheme. In such cases, it is essential that efficient algorithms be used when computing the defect angles and their derivatives with respect to the leg lengths. The purpose of this paper is to present details of such an algorithm.
Quantum Computations: Fundamentals and Algorithms
International Nuclear Information System (INIS)
Duplij, S.A.; Shapoval, I.I.
2007-01-01
Basic concepts of quantum information theory, principles of quantum calculations and the possibility of creation on this basis unique on calculation power and functioning principle device, named quantum computer, are concerned. The main blocks of quantum logic, schemes of quantum calculations implementation, as well as some known today effective quantum algorithms, called to realize advantages of quantum calculations upon classical, are presented here. Among them special place is taken by Shor's algorithm of number factorization and Grover's algorithm of unsorted database search. Phenomena of decoherence, its influence on quantum computer stability and methods of quantum errors correction are described
An Algorithm for Fast Computation of 3D Zernike Moments for Volumetric Images
Directory of Open Access Journals (Sweden)
Khalid M. Hosny
2012-01-01
Full Text Available An algorithm was proposed for very fast and low-complexity computation of three-dimensional Zernike moments. The 3D Zernike moments were expressed in terms of exact 3D geometric moments where the later are computed exactly through the mathematical integration of the monomial terms over the digital image/object voxels. A new symmetry-based method was proposed to compute 3D Zernike moments with 87% reduction in the computational complexity. A fast 1D cascade algorithm was also employed to add more complexity reduction. The comparison with existing methods was performed, where the numerical experiments and the complexity analysis ensured the efficiency of the proposed method especially with image and objects of large sizes.
Efficient Skyline Computation in Structured Peer-to-Peer Systems
DEFF Research Database (Denmark)
Cui, Bin; Chen, Lijiang; Xu, Linhao
2009-01-01
An increasing number of large-scale applications exploit peer-to-peer network architecture to provide highly scalable and flexible services. Among these applications, data management in peer-to-peer systems is one of the interesting domains. In this paper, we investigate the multidimensional...... skyline computation problem on a structured peer-to-peer network. In order to achieve low communication cost and quick response time, we utilize the iMinMax(\\theta ) method to transform high-dimensional data to one-dimensional value and distribute the data in a structured peer-to-peer network called BATON....... Thereafter, we propose a progressive algorithm with adaptive filter technique for efficient skyline computation in this environment. We further discuss some optimization techniques for the algorithm, and summarize the key principles of our algorithm into a query routing protocol with detailed analysis...
Hofstra, K.L.; Gerez, Sabih H.
2007-01-01
This paper addresses the efficient implementation of highperformance signal-processing algorithms. In early stages of such designs many computation-intensive simulations may be necessary. This calls for hardware description formalisms targeted for efficient simulation (such as the programming
Energy-efficient computing and networking. Revised selected papers
Energy Technology Data Exchange (ETDEWEB)
Hatziargyriou, Nikos; Dimeas, Aris [Ethnikon Metsovion Polytechneion, Athens (Greece); Weidlich, Anke (eds.) [SAP Research Center, Karlsruhe (Germany); Tomtsi, Thomai
2011-07-01
This book constitutes the postproceedings of the First International Conference on Energy-Efficient Computing and Networking, E-Energy, held in Passau, Germany in April 2010. The 23 revised papers presented were carefully reviewed and selected for inclusion in the post-proceedings. The papers are organized in topical sections on energy market and algorithms, ICT technology for the energy market, implementation of smart grid and smart home technology, microgrids and energy management, and energy efficiency through distributed energy management and buildings. (orig.)
Förster, Michael
2014-01-01
Numerical programs often use parallel programming techniques such as OpenMP to compute the program's output values as efficient as possible. In addition, derivative values of these output values with respect to certain input values play a crucial role. To achieve code that computes not only the output values simultaneously but also the derivative values, this work introduces several source-to-source transformation rules. These rules are based on a technique called algorithmic differentiation. The main focus of this work lies on the important reverse mode of algorithmic differentiation. The inh
ANNIT - An Efficient Inversion Algorithm based on Prediction Principles
Růžek, B.; Kolář, P.
2009-04-01
Solution of inverse problems represents meaningful job in geophysics. The amount of data is continuously increasing, methods of modeling are being improved and the computer facilities are also advancing great technical progress. Therefore the development of new and efficient algorithms and computer codes for both forward and inverse modeling is still up to date. ANNIT is contributing to this stream since it is a tool for efficient solution of a set of non-linear equations. Typical geophysical problems are based on parametric approach. The system is characterized by a vector of parameters p, the response of the system is characterized by a vector of data d. The forward problem is usually represented by unique mapping F(p)=d. The inverse problem is much more complex and the inverse mapping p=G(d) is available in an analytical or closed form only exceptionally and generally it may not exist at all. Technically, both forward and inverse mapping F and G are sets of non-linear equations. ANNIT solves such situation as follows: (i) joint subspaces {pD, pM} of original data and model spaces D, M, resp. are searched for, within which the forward mapping F is sufficiently smooth that the inverse mapping G does exist, (ii) numerical approximation of G in subspaces {pD, pM} is found, (iii) candidate solution is predicted by using this numerical approximation. ANNIT is working in an iterative way in cycles. The subspaces {pD, pM} are searched for by generating suitable populations of individuals (models) covering data and model spaces. The approximation of the inverse mapping is made by using three methods: (a) linear regression, (b) Radial Basis Function Network technique, (c) linear prediction (also known as "Kriging"). The ANNIT algorithm has built in also an archive of already evaluated models. Archive models are re-used in a suitable way and thus the number of forward evaluations is minimized. ANNIT is now implemented both in MATLAB and SCILAB. Numerical tests show good
Ltaief, Hatem
2011-08-31
This paper presents the power profile of two high performance dense linear algebra libraries i.e., LAPACK and PLASMA. The former is based on block algorithms that use the fork-join paradigm to achieve parallel performance. The latter uses fine-grained task parallelism that recasts the computation to operate on submatrices called tiles. In this way tile algorithms are formed. We show results from the power profiling of the most common routines, which permits us to clearly identify the different phases of the computations. This allows us to isolate the bottlenecks in terms of energy efficiency. Our results show that PLASMA surpasses LAPACK not only in terms of performance but also in terms of energy efficiency. © 2011 Springer-Verlag.
A Faster Algorithm for Computing Straight Skeletons
Cheng, Siu-Wing
2014-09-01
We present a new algorithm for computing the straight skeleton of a polygon. For a polygon with n vertices, among which r are reflex vertices, we give a deterministic algorithm that reduces the straight skeleton computation to a motorcycle graph computation in O(n (logn)logr) time. It improves on the previously best known algorithm for this reduction, which is randomized, and runs in expected O(n√h+1log2n) time for a polygon with h holes. Using known motorcycle graph algorithms, our result yields improved time bounds for computing straight skeletons. In particular, we can compute the straight skeleton of a non-degenerate polygon in O(n (logn) logr + r 4/3 + ε ) time for any ε > 0. On degenerate input, our time bound increases to O(n (logn) logr + r 17/11 + ε ).
A Faster Algorithm for Computing Straight Skeletons
Mencel, Liam A.
2014-05-06
We present a new algorithm for computing the straight skeleton of a polygon. For a polygon with n vertices, among which r are reflex vertices, we give a deterministic algorithm that reduces the straight skeleton computation to a motorcycle graph computation in O(n (log n) log r) time. It improves on the previously best known algorithm for this reduction, which is randomised, and runs in expected O(n √(h+1) log² n) time for a polygon with h holes. Using known motorcycle graph algorithms, our result yields improved time bounds for computing straight skeletons. In particular, we can compute the straight skeleton of a non-degenerate polygon in O(n (log n) log r + r^(4/3 + ε)) time for any ε > 0. On degenerate input, our time bound increases to O(n (log n) log r + r^(17/11 + ε))
A Faster Algorithm for Computing Straight Skeletons
Cheng, Siu-Wing; Mencel, Liam A.; Vigneron, Antoine E.
2014-01-01
We present a new algorithm for computing the straight skeleton of a polygon. For a polygon with n vertices, among which r are reflex vertices, we give a deterministic algorithm that reduces the straight skeleton computation to a motorcycle graph computation in O(n (logn)logr) time. It improves on the previously best known algorithm for this reduction, which is randomized, and runs in expected O(n√h+1log2n) time for a polygon with h holes. Using known motorcycle graph algorithms, our result yields improved time bounds for computing straight skeletons. In particular, we can compute the straight skeleton of a non-degenerate polygon in O(n (logn) logr + r 4/3 + ε ) time for any ε > 0. On degenerate input, our time bound increases to O(n (logn) logr + r 17/11 + ε ).
International Nuclear Information System (INIS)
Vignes, J.
1986-01-01
Any result of algorithms provided by a computer always contains an error resulting from floating-point arithmetic round-off error propagation. Furthermore signal processing algorithms are also generally performed with data containing errors. The permutation-perturbation method, also known under the name CESTAC (controle et estimation stochastique d'arrondi de calcul) is a very efficient practical method for evaluating these errors and consequently for estimating the exact significant decimal figures of any result of algorithms performed on a computer. The stochastic approach of this method, its probabilistic proof, and the perfect agreement between the theoretical and practical aspects are described in this paper [fr
A multiresolution approach to iterative reconstruction algorithms in X-ray computed tomography.
De Witte, Yoni; Vlassenbroeck, Jelle; Van Hoorebeke, Luc
2010-09-01
In computed tomography, the application of iterative reconstruction methods in practical situations is impeded by their high computational demands. Especially in high resolution X-ray computed tomography, where reconstruction volumes contain a high number of volume elements (several giga voxels), this computational burden prevents their actual breakthrough. Besides the large amount of calculations, iterative algorithms require the entire volume to be kept in memory during reconstruction, which quickly becomes cumbersome for large data sets. To overcome this obstacle, we present a novel multiresolution reconstruction, which greatly reduces the required amount of memory without significantly affecting the reconstructed image quality. It is shown that, combined with an efficient implementation on a graphical processing unit, the multiresolution approach enables the application of iterative algorithms in the reconstruction of large volumes at an acceptable speed using only limited resources.
Duality quantum algorithm efficiently simulates open quantum systems
Wei, Shi-Jie; Ruan, Dong; Long, Gui-Lu
2016-01-01
Because of inevitable coupling with the environment, nearly all practical quantum systems are open system, where the evolution is not necessarily unitary. In this paper, we propose a duality quantum algorithm for simulating Hamiltonian evolution of an open quantum system. In contrast to unitary evolution in a usual quantum computer, the evolution operator in a duality quantum computer is a linear combination of unitary operators. In this duality quantum algorithm, the time evolution of the open quantum system is realized by using Kraus operators which is naturally implemented in duality quantum computer. This duality quantum algorithm has two distinct advantages compared to existing quantum simulation algorithms with unitary evolution operations. Firstly, the query complexity of the algorithm is O(d3) in contrast to O(d4) in existing unitary simulation algorithm, where d is the dimension of the open quantum system. Secondly, By using a truncated Taylor series of the evolution operators, this duality quantum algorithm provides an exponential improvement in precision compared with previous unitary simulation algorithm. PMID:27464855
A Randomized Exchange Algorithm for Computing Optimal Approximate Designs of Experiments
Harman, Radoslav; Filová , Lenka; Richtarik, Peter
2018-01-01
We propose a class of subspace ascent methods for computing optimal approximate designs that covers both existing as well as new and more efficient algorithms. Within this class of methods, we construct a simple, randomized exchange algorithm (REX). Numerical comparisons suggest that the performance of REX is comparable or superior to the performance of state-of-the-art methods across a broad range of problem structures and sizes. We focus on the most commonly used criterion of D-optimality that also has applications beyond experimental design, such as the construction of the minimum volume ellipsoid containing a given set of data-points. For D-optimality, we prove that the proposed algorithm converges to the optimum. We also provide formulas for the optimal exchange of weights in the case of the criterion of A-optimality. These formulas enable one to use REX for computing A-optimal and I-optimal designs.
A Randomized Exchange Algorithm for Computing Optimal Approximate Designs of Experiments
Harman, Radoslav
2018-01-17
We propose a class of subspace ascent methods for computing optimal approximate designs that covers both existing as well as new and more efficient algorithms. Within this class of methods, we construct a simple, randomized exchange algorithm (REX). Numerical comparisons suggest that the performance of REX is comparable or superior to the performance of state-of-the-art methods across a broad range of problem structures and sizes. We focus on the most commonly used criterion of D-optimality that also has applications beyond experimental design, such as the construction of the minimum volume ellipsoid containing a given set of data-points. For D-optimality, we prove that the proposed algorithm converges to the optimum. We also provide formulas for the optimal exchange of weights in the case of the criterion of A-optimality. These formulas enable one to use REX for computing A-optimal and I-optimal designs.
Convolutional networks for fast, energy-efficient neuromorphic computing
Esser, Steven K.; Merolla, Paul A.; Arthur, John V.; Cassidy, Andrew S.; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J.; McKinstry, Jeffrey L.; Melano, Timothy; Barch, Davis R.; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D.; Modha, Dharmendra S.
2016-01-01
Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware’s underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer. PMID:27651489
Baiges Aznar, Joan; Bayona Roa, Camilo Andrés
2017-01-01
No separate or additional fees are collected for access to or distribution of the work. In this paper we present a novel algorithm for adaptive mesh refinement in computational physics meshes in a distributed memory parallel setting. The proposed method is developed for nodally based parallel domain partitions where the nodes of the mesh belong to a single processor, whereas the elements can belong to multiple processors. Some of the main features of the algorithm presented in this paper a...
Pirbhulal, Sandeep; Zhang, Heye; Mukhopadhyay, Subhas Chandra; Li, Chunyue; Wang, Yumei; Li, Guanglin; Wu, Wanqing; Zhang, Yuan-Ting
2015-06-26
Body Sensor Network (BSN) is a network of several associated sensor nodes on, inside or around the human body to monitor vital signals, such as, Electroencephalogram (EEG), Photoplethysmography (PPG), Electrocardiogram (ECG), etc. Each sensor node in BSN delivers major information; therefore, it is very significant to provide data confidentiality and security. All existing approaches to secure BSN are based on complex cryptographic key generation procedures, which not only demands high resource utilization and computation time, but also consumes large amount of energy, power and memory during data transmission. However, it is indispensable to put forward energy efficient and computationally less complex authentication technique for BSN. In this paper, a novel biometric-based algorithm is proposed, which utilizes Heart Rate Variability (HRV) for simple key generation process to secure BSN. Our proposed algorithm is compared with three data authentication techniques, namely Physiological Signal based Key Agreement (PSKA), Data Encryption Standard (DES) and Rivest Shamir Adleman (RSA). Simulation is performed in Matlab and results suggest that proposed algorithm is quite efficient in terms of transmission time utilization, average remaining energy and total power consumption.
Gong, Jian; Lou, Shuntian; Guo, Yiduo
2016-04-01
An estimation of signal parameters via a rotational invariance techniques-like (ESPRIT-like) algorithm is proposed to estimate the direction of arrival and direction of departure for bistatic multiple-input multiple-output (MIMO) radar. The properties of a noncircular signal and Euler's formula are first exploited to establish a real-valued bistatic MIMO radar array data, which is composed of sine and cosine data. Then the receiving/transmitting selective matrices are constructed to obtain the receiving/transmitting rotational invariance factors. Since the rotational invariance factor is a cosine function, symmetrical mirror angle ambiguity may occur. Finally, a maximum likelihood function is used to avoid the estimation ambiguities. Compared with the existing ESPRIT, the proposed algorithm can save about 75% of computational load owing to the real-valued ESPRIT algorithm. Simulation results confirm the effectiveness of the ESPRIT-like algorithm.
New accountant job market reform by computer algorithm: an experimental study
Directory of Open Access Journals (Sweden)
Hirose Yoshitaka
2017-01-01
Full Text Available The purpose of this study is to examine the matching of new accountants with accounting firms in Japan. A notable feature of the present study is that it brings a computer algorithm to the job-hiring task. Job recruitment activities for new accountants in Japan are one-time, short-term struggles. Accordingly, many have searched for new rules to replace the current ones of the process. Job recruitment activities for new accountants in Japan change every year. This study proposes modifying these job recruitment activities by combining computer and human efforts. Furthermore, the study formulates the job recruitment activities by using a model and conducting experiments. As a result, the Deferred Acceptance (DA algorithm derives a high truth-telling percentage, a stable matching percentage, and greater efficiency compared with the previous approach. This suggests the potential of the Deferred Acceptance algorithm as a replacement for current approaches. In terms of accurate percentage and stability, the DA algorithm is superior to the current methods and should be adopted.
Efficient Computation of Transition State Resonances and Reaction Rates from a Quantum Normal Form
Schubert, Roman; Waalkens, Holger; Wiggins, Stephen
2006-01-01
A quantum version of a recent formulation of transition state theory in phase space is presented. The theory developed provides an algorithm to compute quantum reaction rates and the associated Gamov-Siegert resonances with very high accuracy. The algorithm is especially efficient for
Chen, Gang; Yang, Bing; Zhang, Xiaoyun; Gao, Zhiyong
2017-07-01
The latest high efficiency video coding (HEVC) standard significantly increases the encoding complexity for improving its coding efficiency. Due to the limited computational capability of handheld devices, complexity constrained video coding has drawn great attention in recent years. A complexity control algorithm based on adaptive mode selection is proposed for interframe coding in HEVC. Considering the direct proportionality between encoding time and computational complexity, the computational complexity is measured in terms of encoding time. First, complexity is mapped to a target in terms of prediction modes. Then, an adaptive mode selection algorithm is proposed for the mode decision process. Specifically, the optimal mode combination scheme that is chosen through offline statistics is developed at low complexity. If the complexity budget has not been used up, an adaptive mode sorting method is employed to further improve coding efficiency. The experimental results show that the proposed algorithm achieves a very large complexity control range (as low as 10%) for the HEVC encoder while maintaining good rate-distortion performance. For the lowdelayP condition, compared with the direct resource allocation method and the state-of-the-art method, an average gain of 0.63 and 0.17 dB in BDPSNR is observed for 18 sequences when the target complexity is around 40%.
A micro-hydrology computation ordering algorithm
Croley, Thomas E.
1980-11-01
Discrete-distributed-parameter models are essential for watershed modelling where practical consideration of spatial variations in watershed properties and inputs is desired. Such modelling is necessary for analysis of detailed hydrologic impacts from management strategies and land-use effects. Trade-offs between model validity and model complexity exist in resolution of the watershed. Once these are determined, the watershed is then broken into sub-areas which each have essentially spatially-uniform properties. Lumped-parameter (micro-hydrology) models are applied to these sub-areas and their outputs are combined through the use of a computation ordering technique, as illustrated by many discrete-distributed-parameter hydrology models. Manual ordering of these computations requires fore-thought, and is tedious, error prone, sometimes storage intensive and least adaptable to changes in watershed resolution. A programmable algorithm for ordering micro-hydrology computations is presented that enables automatic ordering of computations within the computer via an easily understood and easily implemented "node" definition, numbering and coding scheme. This scheme and the algorithm are detailed in logic flow-charts and an example application is presented. Extensions and modifications of the algorithm are easily made for complex geometries or differing microhydrology models. The algorithm is shown to be superior to manual ordering techniques and has potential use in high-resolution studies.
A micro-hydrology computation ordering algorithm
International Nuclear Information System (INIS)
Croley, T.E. II
1980-01-01
Discrete-distributed-parameter models are essential for watershed modelling where practical consideration of spatial variations in watershed properties and inputs is desired. Such modelling is necessary for analysis of detailed hydrologic impacts from management strategies and land-use effects. Trade-offs between model validity and model complexity exist in resolution of the watershed. Once these are determined, the watershed is then broken into sub-areas which each have essentially spatially-uniform properties. Lumped-parameter (micro-hydrology) models are applied to these sub-areas and their outputs are combined through the use of a computation ordering technique, as illustrated by many discrete-distributed-parameter hydrology models. Manual ordering of these computations requires fore-thought, and is tedious, error prone, sometimes storage intensive and least adaptable to changes in watershed resolution. A programmable algorithm for ordering micro-hydrology computations is presented that enables automatic ordering of computations within the computer via an easily understood and easily implemented node definition, numbering and coding scheme. This scheme and the algorithm are detailed in logic flow-charts and an example application is presented. Extensions and modifications of the algorithm are easily made for complex geometries or differing micro-hydrology models. The algorithm is shown to be superior to manual ordering techniques and has potential use in high-resolution studies. (orig.)
An Efficient Sleepy Algorithm for Particle-Based Fluids
Directory of Open Access Journals (Sweden)
Xiao Nie
2014-01-01
Full Text Available We present a novel Smoothed Particle Hydrodynamics (SPH based algorithm for efficiently simulating compressible and weakly compressible particle fluids. Prior particle-based methods simulate all fluid particles; however, in many cases some particles appearing to be at rest can be safely ignored without notably affecting the fluid flow behavior. To identify these particles, a novel sleepy strategy is introduced. By utilizing this strategy, only a portion of the fluid particles requires computational resources; thus an obvious performance gain can be achieved. In addition, in order to resolve unphysical clumping issue due to tensile instability in SPH based methods, a new artificial repulsive force is provided. We demonstrate that our approach can be easily integrated with existing SPH based methods to improve the efficiency without sacrificing visual quality.
Computational algorithm for molybdenite concentrate annealing
International Nuclear Information System (INIS)
Alkatseva, V.M.
1995-01-01
Computational algorithm is presented for annealing of molybdenite concentrate with granulated return dust and that of granulated molybdenite concentrate. The algorithm differs from the known analogies for sulphide raw material annealing by including the calculation of return dust mass in stationary annealing; the latter quantity varies form the return dust mass value obtained in the first iteration step. Masses of solid products are determined by distribution of concentrate annealing products, including return dust and benthonite. The algorithm is applied to computations for annealing of other sulphide materials. 3 refs
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs
Directory of Open Access Journals (Sweden)
Vaughn Matthew
2010-11-01
Full Text Available Abstract Background Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ messages (Σ being the size of the alphabet. Results In this paper we present a Θ(n/p time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ(nlog(n/BBlog(M/B (M being the main memory size and B being the size of the disk block. We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster - both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. Conclusions The bi
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.
Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal
2010-11-15
Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ(nlog(n/B)Blog(M/B)) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster--both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. The bi-directed de Bruijn graph is a fundamental data structure for
Towards the Automatic Detection of Efficient Computing Assets in a Heterogeneous Cloud Environment
Iglesias, Jesus Omana; Stokes, Nicola; Ventresque, Anthony; Murphy, Liam, B.E.; Thorburn, James
2013-01-01
peer-reviewed In a heterogeneous cloud environment, the manual grading of computing assets is the first step in the process of configuring IT infrastructures to ensure optimal utilization of resources. Grading the efficiency of computing assets is however, a difficult, subjective and time consuming manual task. Thus, an automatic efficiency grading algorithm is highly desirable. In this paper, we compare the effectiveness of the different criteria used in the manual gr...
An efficient algorithm for 3D space time kinetics simulations for large PHWRs
International Nuclear Information System (INIS)
Jain, Ishi; Fernando, M.P.S.; Kumar, A.N.
2012-01-01
In nuclear reactor physics and allied areas like shielding, various forms of neutron transport equation or its approximation namely the diffusion equation have to be solved to estimate neutron flux distribution. This paper presents an efficient algorithm yielding accurate results along with promising gain in computational work. (author)
High-order hydrodynamic algorithms for exascale computing
Energy Technology Data Exchange (ETDEWEB)
Morgan, Nathaniel Ray [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
2016-02-05
Hydrodynamic algorithms are at the core of many laboratory missions ranging from simulating ICF implosions to climate modeling. The hydrodynamic algorithms commonly employed at the laboratory and in industry (1) typically lack requisite accuracy for complex multi- material vortical flows and (2) are not well suited for exascale computing due to poor data locality and poor FLOP/memory ratios. Exascale computing requires advances in both computer science and numerical algorithms. We propose to research the second requirement and create a new high-order hydrodynamic algorithm that has superior accuracy, excellent data locality, and excellent FLOP/memory ratios. This proposal will impact a broad range of research areas including numerical theory, discrete mathematics, vorticity evolution, gas dynamics, interface instability evolution, turbulent flows, fluid dynamics and shock driven flows. If successful, the proposed research has the potential to radically transform simulation capabilities and help position the laboratory for computing at the exascale.
MRPack: Multi-Algorithm Execution Using Compute-Intensive Approach in MapReduce
2015-01-01
Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real time and streaming data in variety of formats. These characteristics give rise to challenges in its modeling, computation, and processing. Hadoop MapReduce (MR) is a well known data-intensive distributed processing framework using the distributed file system (DFS) for Big Data. Current implementations of MR only support execution of a single algorithm in the entire Hadoop cluster. In this paper, we propose MapReducePack (MRPack), a variation of MR that supports execution of a set of related algorithms in a single MR job. We exploit the computational capability of a cluster by increasing the compute-intensiveness of MapReduce while maintaining its data-intensive approach. It uses the available computing resources by dynamically managing the task assignment and intermediate data. Intermediate data from multiple algorithms are managed using multi-key and skew mitigation strategies. The performance study of the proposed system shows that it is time, I/O, and memory efficient compared to the default MapReduce. The proposed approach reduces the execution time by 200% with an approximate 50% decrease in I/O cost. Complexity and qualitative results analysis shows significant performance improvement. PMID:26305223
A-VCI: A flexible method to efficiently compute vibrational spectra
Odunlami, Marc; Le Bris, Vincent; Bégué, Didier; Baraille, Isabelle; Coulaud, Olivier
2017-06-01
The adaptive vibrational configuration interaction algorithm has been introduced as a new method to efficiently reduce the dimension of the set of basis functions used in a vibrational configuration interaction process. It is based on the construction of nested bases for the discretization of the Hamiltonian operator according to a theoretical criterion that ensures the convergence of the method. In the present work, the Hamiltonian is written as a sum of products of operators. The purpose of this paper is to study the properties and outline the performance details of the main steps of the algorithm. New parameters have been incorporated to increase flexibility, and their influence has been thoroughly investigated. The robustness and reliability of the method are demonstrated for the computation of the vibrational spectrum up to 3000 cm-1 of a widely studied 6-atom molecule (acetonitrile). Our results are compared to the most accurate up to date computation; we also give a new reference calculation for future work on this system. The algorithm has also been applied to a more challenging 7-atom molecule (ethylene oxide). The computed spectrum up to 3200 cm-1 is the most accurate computation that exists today on such systems.
Bioinspired computation in combinatorial optimization: algorithms and their computational complexity
DEFF Research Database (Denmark)
Neumann, Frank; Witt, Carsten
2012-01-01
Bioinspired computation methods, such as evolutionary algorithms and ant colony optimization, are being applied successfully to complex engineering and combinatorial optimization problems, and it is very important that we understand the computational complexity of these algorithms. This tutorials...... problems. Classical single objective optimization is examined first. They then investigate the computational complexity of bioinspired computation applied to multiobjective variants of the considered combinatorial optimization problems, and in particular they show how multiobjective optimization can help...... to speed up bioinspired computation for single-objective optimization problems. The tutorial is based on a book written by the authors with the same title. Further information about the book can be found at www.bioinspiredcomputation.com....
An efficient macro-cell placement algorithm
Aarts, E.H.L.; Bont, de F.M.J.; Korst, J.H.M.; Rongen, J.M.J.
1991-01-01
A new approximation algorithm is presented for the efficient handling of large macro-cell placement problems. The algorithm combines simulated annealing with new features based on a hierarchical approach and a divide-and-conquer technique. Numerical results show that these features can lead to a
Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™
Gomes, Jeremias M.; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H.
2015-01-01
We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP’s irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high perfo...
Computationally efficient implementation of combustion chemistry in parallel PDF calculations
International Nuclear Information System (INIS)
Lu Liuyan; Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.
2009-01-01
In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f m pi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel
Algorithms for energy efficiency in wireless sensor networks
Energy Technology Data Exchange (ETDEWEB)
Busse, M
2007-01-21
The recent advances in microsensor and semiconductor technology have opened a new field within computer science: the networking of small-sized sensors which are capable of sensing, processing, and communicating. Such wireless sensor networks offer new applications in the areas of habitat and environment monitoring, disaster control and operation, military and intelligence control, object tracking, video surveillance, traffic control, as well as in health care and home automation. It is likely that the deployed sensors will be battery-powered, which will limit the energy capacity significantly. Thus, energy efficiency becomes one of the main challenges that need to be taken into account, and the design of energy-efficient algorithms is a major contribution of this thesis. As the wireless communication in the network is one of the main energy consumers, we first consider in detail the characteristics of wireless communication. By using the embedded sensor board (ESB) platform recently developed by the Free University of Berlin, we analyze the means of forward error correction and propose an appropriate resync mechanism, which improves the communication between two ESB nodes substantially. Afterwards, we focus on the forwarding of data packets through the network. We present the algorithms energy-efficient forwarding (EEF), lifetime-efficient forwarding (LEF), and energy-efficient aggregation forwarding (EEAF). While EEF is designed to maximize the number of data bytes delivered per energy unit, LEF additionally takes into account the residual energy of forwarding nodes. In so doing, LEF further prolongs the lifetime of the network. Energy savings due to data aggregation and in-network processing are exploited by EEAF. Besides single-link forwarding, in which data packets are sent to only one forwarding node, we also study the impact of multi-link forwarding, which exploits the broadcast characteristics of the wireless medium by sending packets to several (potential
Algorithms for energy efficiency in wireless sensor networks
Energy Technology Data Exchange (ETDEWEB)
Busse, M.
2007-01-21
The recent advances in microsensor and semiconductor technology have opened a new field within computer science: the networking of small-sized sensors which are capable of sensing, processing, and communicating. Such wireless sensor networks offer new applications in the areas of habitat and environment monitoring, disaster control and operation, military and intelligence control, object tracking, video surveillance, traffic control, as well as in health care and home automation. It is likely that the deployed sensors will be battery-powered, which will limit the energy capacity significantly. Thus, energy efficiency becomes one of the main challenges that need to be taken into account, and the design of energy-efficient algorithms is a major contribution of this thesis. As the wireless communication in the network is one of the main energy consumers, we first consider in detail the characteristics of wireless communication. By using the embedded sensor board (ESB) platform recently developed by the Free University of Berlin, we analyze the means of forward error correction and propose an appropriate resync mechanism, which improves the communication between two ESB nodes substantially. Afterwards, we focus on the forwarding of data packets through the network. We present the algorithms energy-efficient forwarding (EEF), lifetime-efficient forwarding (LEF), and energy-efficient aggregation forwarding (EEAF). While EEF is designed to maximize the number of data bytes delivered per energy unit, LEF additionally takes into account the residual energy of forwarding nodes. In so doing, LEF further prolongs the lifetime of the network. Energy savings due to data aggregation and in-network processing are exploited by EEAF. Besides single-link forwarding, in which data packets are sent to only one forwarding node, we also study the impact of multi-link forwarding, which exploits the broadcast characteristics of the wireless medium by sending packets to several (potential
A Faster Algorithm for Computing Motorcycle Graphs
Vigneron, Antoine E.; Yan, Lie
2014-01-01
We present a new algorithm for computing motorcycle graphs that runs in (Formula presented.) time for any (Formula presented.), improving on all previously known algorithms. The main application of this result is to computing the straight skeleton of a polygon. It allows us to compute the straight skeleton of a non-degenerate polygon with (Formula presented.) holes in (Formula presented.) expected time. If all input coordinates are (Formula presented.)-bit rational numbers, we can compute the straight skeleton of a (possibly degenerate) polygon with (Formula presented.) holes in (Formula presented.) expected time. In particular, it means that we can compute the straight skeleton of a simple polygon in (Formula presented.) expected time if all input coordinates are (Formula presented.)-bit rationals, while all previously known algorithms have worst-case running time (Formula presented.). © 2014 Springer Science+Business Media New York.
A Faster Algorithm for Computing Motorcycle Graphs
Vigneron, Antoine E.
2014-08-29
We present a new algorithm for computing motorcycle graphs that runs in (Formula presented.) time for any (Formula presented.), improving on all previously known algorithms. The main application of this result is to computing the straight skeleton of a polygon. It allows us to compute the straight skeleton of a non-degenerate polygon with (Formula presented.) holes in (Formula presented.) expected time. If all input coordinates are (Formula presented.)-bit rational numbers, we can compute the straight skeleton of a (possibly degenerate) polygon with (Formula presented.) holes in (Formula presented.) expected time. In particular, it means that we can compute the straight skeleton of a simple polygon in (Formula presented.) expected time if all input coordinates are (Formula presented.)-bit rationals, while all previously known algorithms have worst-case running time (Formula presented.). © 2014 Springer Science+Business Media New York.
A subzone reconstruction algorithm for efficient staggered compatible remapping
Energy Technology Data Exchange (ETDEWEB)
Starinshak, D.P., E-mail: starinshak1@llnl.gov; Owen, J.M., E-mail: mikeowen@llnl.gov
2015-09-01
Staggered-grid Lagrangian hydrodynamics algorithms frequently make use of subzonal discretization of state variables for the purposes of improved numerical accuracy, generality to unstructured meshes, and exact conservation of mass, momentum, and energy. For Arbitrary Lagrangian–Eulerian (ALE) methods using a geometric overlay, it is difficult to remap subzonal variables in an accurate and efficient manner due to the number of subzone–subzone intersections that must be computed. This becomes prohibitive in the case of 3D, unstructured, polyhedral meshes. A new procedure is outlined in this paper to avoid direct subzonal remapping. The new algorithm reconstructs the spatial profile of a subzonal variable using remapped zonal and nodal representations of the data. The reconstruction procedure is cast as an under-constrained optimization problem. Enforcing conservation at each zone and node on the remapped mesh provides the set of equality constraints; the objective function corresponds to a quadratic variation per subzone between the values to be reconstructed and a set of target reference values. Numerical results for various pure-remapping and hydrodynamics tests are provided. Ideas for extending the algorithm to staggered-grid radiation-hydrodynamics are discussed as well as ideas for generalizing the algorithm to include inequality constraints.
Efficient Maximum Likelihood Estimation for Pedigree Data with the Sum-Product Algorithm.
Engelhardt, Alexander; Rieger, Anna; Tresch, Achim; Mansmann, Ulrich
2016-01-01
We analyze data sets consisting of pedigrees with age at onset of colorectal cancer (CRC) as phenotype. The occurrence of familial clusters of CRC suggests the existence of a latent, inheritable risk factor. We aimed to compute the probability of a family possessing this risk factor as well as the hazard rate increase for these risk factor carriers. Due to the inheritability of this risk factor, the estimation necessitates a costly marginalization of the likelihood. We propose an improved EM algorithm by applying factor graphs and the sum-product algorithm in the E-step. This reduces the computational complexity from exponential to linear in the number of family members. Our algorithm is as precise as a direct likelihood maximization in a simulation study and a real family study on CRC risk. For 250 simulated families of size 19 and 21, the runtime of our algorithm is faster by a factor of 4 and 29, respectively. On the largest family (23 members) in the real data, our algorithm is 6 times faster. We introduce a flexible and runtime-efficient tool for statistical inference in biomedical event data with latent variables that opens the door for advanced analyses of pedigree data. © 2017 S. Karger AG, Basel.
Parallel computation of nondeterministic algorithms in VLSI
Energy Technology Data Exchange (ETDEWEB)
Hortensius, P D
1987-01-01
This work examines parallel VLSI implementations of nondeterministic algorithms. It is demonstrated that conventional pseudorandom number generators are unsuitable for highly parallel applications. Efficient parallel pseudorandom sequence generation can be accomplished using certain classes of elementary one-dimensional cellular automata. The pseudorandom numbers appear in parallel on each clock cycle. Extensive study of the properties of these new pseudorandom number generators is made using standard empirical random number tests, cycle length tests, and implementation considerations. Furthermore, it is shown these particular cellular automata can form the basis of efficient VLSI architectures for computations involved in the Monte Carlo simulation of both the percolation and Ising models from statistical mechanics. Finally, a variation on a Built-In Self-Test technique based upon cellular automata is presented. These Cellular Automata-Logic-Block-Observation (CALBO) circuits improve upon conventional design for testability circuitry.
In-Place Algorithms for Computing (Layers of) Maxima
DEFF Research Database (Denmark)
Blunck, Henrik; Vahrenhold, Jan
2010-01-01
We describe space-efficient algorithms for solving problems related to finding maxima among points in two and three dimensions. Our algorithms run in optimal time and occupy only constant extra......We describe space-efficient algorithms for solving problems related to finding maxima among points in two and three dimensions. Our algorithms run in optimal time and occupy only constant extra...
An algorithm to compute the square root of 3x3 positive definite matrix
International Nuclear Information System (INIS)
Franca, L.P.
1988-06-01
An efficient closed form to compute the square root of a 3 x 3 positive definite matrix is presented. The derivation employs the Cayley-Hamilton theorem avoiding calculation of eigenvectors. We show that evaluation of one eigenvalue of the square root matrix is needed and can not be circumvented. The algorithm is robust and efficient. (author) [pt
Structured Parallel Programming Patterns for Efficient Computation
McCool, Michael; Robison, Arch
2012-01-01
Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th
Computationally efficient optimisation algorithms for WECs arrays
DEFF Research Database (Denmark)
Ferri, Francesco
2017-01-01
In this paper two derivative-free global optimization algorithms are applied for the maximisation of the energy absorbed by wave energy converter (WEC) arrays. Wave energy is a large and mostly untapped source of energy that could have a key role in the future energy mix. The collection of this r...
Use of a genetic algorithm to solve two-fluid flow problems on an NCUBE multiprocessor computer
International Nuclear Information System (INIS)
Pryor, R.J.; Cline, D.D.
1992-01-01
A method of solving the two-phase fluid flow equations using a genetic algorithm on a NCUBE multiprocessor computer is presented. The topics discussed are the two-phase flow equations, the genetic representation of the unknowns, the fitness function, the genetic operators, and the implementation of the algorithm on the NCUBE computer. The efficiency of the implementation is investigated using a pipe blowdown problem. Effects of varying the genetic parameters and the number of processors are presented
Efficient MATLAB computations with sparse and factored tensors.
Energy Technology Data Exchange (ETDEWEB)
Bader, Brett William; Kolda, Tamara Gibson (Sandia National Lab, Livermore, CA)
2006-12-01
In this paper, the term tensor refers simply to a multidimensional or N-way array, and we consider how specially structured tensors allow for efficient storage and computation. First, we study sparse tensors, which have the property that the vast majority of the elements are zero. We propose storing sparse tensors using coordinate format and describe the computational efficiency of this scheme for various mathematical operations, including those typical to tensor decomposition algorithms. Second, we study factored tensors, which have the property that they can be assembled from more basic components. We consider two specific types: a Tucker tensor can be expressed as the product of a core tensor (which itself may be dense, sparse, or factored) and a matrix along each mode, and a Kruskal tensor can be expressed as the sum of rank-1 tensors. We are interested in the case where the storage of the components is less than the storage of the full tensor, and we demonstrate that many elementary operations can be computed using only the components. All of the efficiencies described in this paper are implemented in the Tensor Toolbox for MATLAB.
A simple algorithm for computing the smallest enclosing circle
DEFF Research Database (Denmark)
Skyum, Sven
1991-01-01
Presented is a simple O(n log n) algorithm for computing the smallest enclosing circle of a convex polygon. It can be easily extended to algorithms that compute the farthest-and the closest-point Voronoi diagram of a convex polygon within the same time bound.......Presented is a simple O(n log n) algorithm for computing the smallest enclosing circle of a convex polygon. It can be easily extended to algorithms that compute the farthest-and the closest-point Voronoi diagram of a convex polygon within the same time bound....
Li, X Y; Yang, G W; Zheng, D S; Guo, W S; Hung, W N N
2015-04-28
Genetic regulatory networks are the key to understanding biochemical systems. One condition of the genetic regulatory network under different living environments can be modeled as a synchronous Boolean network. The attractors of these Boolean networks will help biologists to identify determinant and stable factors. Existing methods identify attractors based on a random initial state or the entire state simultaneously. They cannot identify the fixed length attractors directly. The complexity of including time increases exponentially with respect to the attractor number and length of attractors. This study used the bounded model checking to quickly locate fixed length attractors. Based on the SAT solver, we propose a new algorithm for efficiently computing the fixed length attractors, which is more suitable for large Boolean networks and numerous attractors' networks. After comparison using the tool BooleNet, empirical experiments involving biochemical systems demonstrated the feasibility and efficiency of our approach.
Efficient computation method of Jacobian matrix
International Nuclear Information System (INIS)
Sasaki, Shinobu
1995-05-01
As well known, the elements of the Jacobian matrix are complex trigonometric functions of the joint angles, resulting in a matrix of staggering complexity when we write it all out in one place. This article addresses that difficulties to this subject are overcome by using velocity representation. The main point is that its recursive algorithm and computer algebra technologies allow us to derive analytical formulation with no human intervention. Particularly, it is to be noted that as compared to previous results the elements are extremely simplified throughout the effective use of frame transformations. Furthermore, in case of a spherical wrist, it is shown that the present approach is computationally most efficient. Due to such advantages, the proposed method is useful in studying kinematically peculiar properties such as singularity problems. (author)
Quantum Genetic Algorithms for Computer Scientists
Lahoz Beltrá, Rafael
2016-01-01
Genetic algorithms (GAs) are a class of evolutionary algorithms inspired by Darwinian natural selection. They are popular heuristic optimisation methods based on simulated genetic mechanisms, i.e., mutation, crossover, etc. and population dynamical processes such as reproduction, selection, etc. Over the last decade, the possibility to emulate a quantum computer (a computer using quantum-mechanical phenomena to perform operations on data) has led to a new class of GAs known as “Quantum Geneti...
Directory of Open Access Journals (Sweden)
Sandeep Pirbhulal
2015-06-01
Full Text Available Body Sensor Network (BSN is a network of several associated sensor nodes on, inside or around the human body to monitor vital signals, such as, Electroencephalogram (EEG, Photoplethysmography (PPG, Electrocardiogram (ECG, etc. Each sensor node in BSN delivers major information; therefore, it is very significant to provide data confidentiality and security. All existing approaches to secure BSN are based on complex cryptographic key generation procedures, which not only demands high resource utilization and computation time, but also consumes large amount of energy, power and memory during data transmission. However, it is indispensable to put forward energy efficient and computationally less complex authentication technique for BSN. In this paper, a novel biometric-based algorithm is proposed, which utilizes Heart Rate Variability (HRV for simple key generation process to secure BSN. Our proposed algorithm is compared with three data authentication techniques, namely Physiological Signal based Key Agreement (PSKA, Data Encryption Standard (DES and Rivest Shamir Adleman (RSA. Simulation is performed in Matlab and results suggest that proposed algorithm is quite efficient in terms of transmission time utilization, average remaining energy and total power consumption.
Pirbhulal, Sandeep; Zhang, Heye; Mukhopadhyay, Subhas Chandra; Li, Chunyue; Wang, Yumei; Li, Guanglin; Wu, Wanqing; Zhang, Yuan-Ting
2015-01-01
Body Sensor Network (BSN) is a network of several associated sensor nodes on, inside or around the human body to monitor vital signals, such as, Electroencephalogram (EEG), Photoplethysmography (PPG), Electrocardiogram (ECG), etc. Each sensor node in BSN delivers major information; therefore, it is very significant to provide data confidentiality and security. All existing approaches to secure BSN are based on complex cryptographic key generation procedures, which not only demands high resource utilization and computation time, but also consumes large amount of energy, power and memory during data transmission. However, it is indispensable to put forward energy efficient and computationally less complex authentication technique for BSN. In this paper, a novel biometric-based algorithm is proposed, which utilizes Heart Rate Variability (HRV) for simple key generation process to secure BSN. Our proposed algorithm is compared with three data authentication techniques, namely Physiological Signal based Key Agreement (PSKA), Data Encryption Standard (DES) and Rivest Shamir Adleman (RSA). Simulation is performed in Matlab and results suggest that proposed algorithm is quite efficient in terms of transmission time utilization, average remaining energy and total power consumption. PMID:26131666
Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo
McDaniel, T.; D'Azevedo, E. F.; Li, Y. W.; Wong, K.; Kent, P. R. C.
2017-11-01
Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is, therefore, formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with an application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo, where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core central processing units and graphical processing units.
International Nuclear Information System (INIS)
Chen Jian-Lin; Li Lei; Wang Lin-Yuan; Cai Ai-Long; Xi Xiao-Qi; Zhang Han-Ming; Li Jian-Xin; Yan Bin
2015-01-01
The projection matrix model is used to describe the physical relationship between reconstructed object and projection. Such a model has a strong influence on projection and backprojection, two vital operations in iterative computed tomographic reconstruction. The distance-driven model (DDM) is a state-of-the-art technology that simulates forward and back projections. This model has a low computational complexity and a relatively high spatial resolution; however, it includes only a few methods in a parallel operation with a matched model scheme. This study introduces a fast and parallelizable algorithm to improve the traditional DDM for computing the parallel projection and backprojection operations. Our proposed model has been implemented on a GPU (graphic processing unit) platform and has achieved satisfactory computational efficiency with no approximation. The runtime for the projection and backprojection operations with our model is approximately 4.5 s and 10.5 s per loop, respectively, with an image size of 256×256×256 and 360 projections with a size of 512×512. We compare several general algorithms that have been proposed for maximizing GPU efficiency by using the unmatched projection/backprojection models in a parallel computation. The imaging resolution is not sacrificed and remains accurate during computed tomographic reconstruction. (paper)
Homogeneous Buchberger algorithms and Sullivant's computational commutative algebra challenge
DEFF Research Database (Denmark)
Lauritzen, Niels
2005-01-01
We give a variant of the homogeneous Buchberger algorithm for positively graded lattice ideals. Using this algorithm we solve the Sullivant computational commutative algebra challenge.......We give a variant of the homogeneous Buchberger algorithm for positively graded lattice ideals. Using this algorithm we solve the Sullivant computational commutative algebra challenge....
Efficient scheduling request algorithm for opportunistic wireless access
Nam, Haewoon; Alouini, Mohamed-Slim
2011-01-01
An efficient scheduling request algorithm for opportunistic wireless access based on user grouping is proposed in this paper. Similar to the well-known opportunistic splitting algorithm, the proposed algorithm initially adjusts (or lowers
Directory of Open Access Journals (Sweden)
A. Yu. Popov
2014-01-01
Full Text Available Algorithms of optimization in networks and direct graphs find a broad application when solving the practical tasks. However, along with large-scale introduction of information technologies in human activity, requirements for volumes of input data and retrieval rate of solution are aggravated. In spite of the fact that by now the large number of algorithms for the various models of computers and computing systems have been studied and implemented, the solution of key problems of optimization for real dimensions of tasks remains difficult. In this regard search of new and more efficient computing structures, as well as update of known algorithms are of great current interest.The work considers an implementation of the search-end algorithm of the maximum flow on the direct graph for multiple instructions and single data computer system (MISD developed in BMSTU. Key feature of this architecture is deep hardware support of operations over sets and structures of data. Functions of storage and access to them are realized on the specialized processor of structures processing (SP which is capable to perform at the hardware level such operations as: add, delete, search, intersect, complete, merge, and others. Advantage of such system is possibility of parallel execution of parts of the computing tasks regarding the access to the sets to data structures simultaneously with arithmetic and logical processing of information.The previous works present the general principles of the computing process arrangement and features of programs implemented in MISD system, describe the structure and principles of functioning the processor of structures processing, show the general principles of the graph task solutions in such system, and experimentally study the efficiency of the received algorithms.The work gives command formats of the SP processor, offers the technique to update the algorithms realized in MISD system, suggests the option of Ford-Falkersona algorithm
Positive Wigner functions render classical simulation of quantum computation efficient.
Mari, A; Eisert, J
2012-12-07
We show that quantum circuits where the initial state and all the following quantum operations can be represented by positive Wigner functions can be classically efficiently simulated. This is true both for continuous-variable as well as discrete variable systems in odd prime dimensions, two cases which will be treated on entirely the same footing. Noting the fact that Clifford and Gaussian operations preserve the positivity of the Wigner function, our result generalizes the Gottesman-Knill theorem. Our algorithm provides a way of sampling from the output distribution of a computation or a simulation, including the efficient sampling from an approximate output distribution in the case of sampling imperfections for initial states, gates, or measurements. In this sense, this work highlights the role of the positive Wigner function as separating classically efficiently simulable systems from those that are potentially universal for quantum computing and simulation, and it emphasizes the role of negativity of the Wigner function as a computational resource.
International Nuclear Information System (INIS)
Li Jing; Sun Yi; Zhu Peiping
2013-01-01
Differential phase-contrast computed tomography (DPC-CT) reconstruction problems are usually solved by using parallel-, fan- or cone-beam algorithms. For rod-shaped objects, the x-ray beams cannot recover all the slices of the sample at the same time. Thus, if a rod-shaped sample is required to be reconstructed by the above algorithms, one should alternately perform translation and rotation on this sample, which leads to lower efficiency. The helical cone-beam CT may significantly improve scanning efficiency for rod-shaped objects over other algorithms. In this paper, we propose a theoretically exact filter-backprojection algorithm for helical cone-beam DPC-CT, which can be applied to reconstruct the refractive index decrement distribution of the samples directly from two-dimensional differential phase-contrast images. Numerical simulations are conducted to verify the proposed algorithm. Our work provides a potential solution for inspecting the rod-shaped samples using DPC-CT, which may be applicable with the evolution of DPC-CT equipments. (paper)
Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik
2015-06-09
Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers-particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.
Hybrid Swarm Intelligence Energy Efficient Clustered Routing Algorithm for Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Rajeev Kumar
2016-01-01
Full Text Available Currently, wireless sensor networks (WSNs are used in many applications, namely, environment monitoring, disaster management, industrial automation, and medical electronics. Sensor nodes carry many limitations like low battery life, small memory space, and limited computing capability. To create a wireless sensor network more energy efficient, swarm intelligence technique has been applied to resolve many optimization issues in WSNs. In many existing clustering techniques an artificial bee colony (ABC algorithm is utilized to collect information from the field periodically. Nevertheless, in the event based applications, an ant colony optimization (ACO is a good solution to enhance the network lifespan. In this paper, we combine both algorithms (i.e., ABC and ACO and propose a new hybrid ABCACO algorithm to solve a Nondeterministic Polynomial (NP hard and finite problem of WSNs. ABCACO algorithm is divided into three main parts: (i selection of optimal number of subregions and further subregion parts, (ii cluster head selection using ABC algorithm, and (iii efficient data transmission using ACO algorithm. We use a hierarchical clustering technique for data transmission; the data is transmitted from member nodes to the subcluster heads and then from subcluster heads to the elected cluster heads based on some threshold value. Cluster heads use an ACO algorithm to discover the best route for data transmission to the base station (BS. The proposed approach is very useful in designing the framework for forest fire detection and monitoring. The simulation results show that the ABCACO algorithm enhances the stability period by 60% and also improves the goodput by 31% against LEACH and WSNCABC, respectively.
Prospective Algorithms for Quantum Evolutionary Computation
Sofge, Donald A.
2008-01-01
This effort examines the intersection of the emerging field of quantum computing and the more established field of evolutionary computation. The goal is to understand what benefits quantum computing might offer to computational intelligence and how computational intelligence paradigms might be implemented as quantum programs to be run on a future quantum computer. We critically examine proposed algorithms and methods for implementing computational intelligence paradigms, primarily focused on ...
Perfect blind restoration of images blurred by multiple filters: theory and efficient algorithms.
Harikumar, G; Bresler, Y
1999-01-01
We address the problem of restoring an image from its noisy convolutions with two or more unknown finite impulse response (FIR) filters. We develop theoretical results about the existence and uniqueness of solutions, and show that under some generically true assumptions, both the filters and the image can be determined exactly in the absence of noise, and stably estimated in its presence. We present efficient algorithms to estimate the blur functions and their sizes. These algorithms are of two types, subspace-based and likelihood-based, and are extensions of techniques proposed for the solution of the multichannel blind deconvolution problem in one dimension. We present memory and computation-efficient techniques to handle the very large matrices arising in the two-dimensional (2-D) case. Once the blur functions are determined, they are used in a multichannel deconvolution step to reconstruct the unknown image. The theoretical and practical implications of edge effects, and "weakly exciting" images are examined. Finally, the algorithms are demonstrated on synthetic and real data.
Aguiar, Derek; Halldórsson, Bjarni V.; Morrow, Eric M.; Istrail, Sorin
2012-01-01
Motivation: The understanding of the genetic determinants of complex disease is undergoing a paradigm shift. Genetic heterogeneity of rare mutations with deleterious effects is more commonly being viewed as a major component of disease. Autism is an excellent example where research is active in identifying matches between the phenotypic and genomic heterogeneities. A considerable portion of autism appears to be correlated with copy number variation, which is not directly probed by single nucleotide polymorphism (SNP) array or sequencing technologies. Identifying the genetic heterogeneity of small deletions remains a major unresolved computational problem partly due to the inability of algorithms to detect them. Results: In this article, we present an algorithmic framework, which we term DELISHUS, that implements three exact algorithms for inferring regions of hemizygosity containing genomic deletions of all sizes and frequencies in SNP genotype data. We implement an efficient backtracking algorithm—that processes a 1 billion entry genome-wide association study SNP matrix in a few minutes—to compute all inherited deletions in a dataset. We further extend our model to give an efficient algorithm for detecting de novo deletions. Finally, given a set of called deletions, we also give a polynomial time algorithm for computing the critical regions of recurrent deletions. DELISHUS achieves significantly lower false-positive rates and higher power than previously published algorithms partly because it considers all individuals in the sample simultaneously. DELISHUS may be applied to SNP array or sequencing data to identify the deletion spectrum for family-based association studies. Availability: DELISHUS is available at http://www.brown.edu/Research/Istrail_Lab/. Contact: Eric_Morrow@brown.edu and Sorin_Istrail@brown.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22689755
A computationally efficient fuzzy control s
Directory of Open Access Journals (Sweden)
Abdel Badie Sharkawy
2013-12-01
Full Text Available This paper develops a decentralized fuzzy control scheme for MIMO nonlinear second order systems with application to robot manipulators via a combination of genetic algorithms (GAs and fuzzy systems. The controller for each degree of freedom (DOF consists of a feedforward fuzzy torque computing system and a feedback fuzzy PD system. The feedforward fuzzy system is trained and optimized off-line using GAs, whereas not only the parameters but also the structure of the fuzzy system is optimized. The feedback fuzzy PD system, on the other hand, is used to keep the closed-loop stable. The rule base consists of only four rules per each DOF. Furthermore, the fuzzy feedback system is decentralized and simplified leading to a computationally efficient control scheme. The proposed control scheme has the following advantages: (1 it needs no exact dynamics of the system and the computation is time-saving because of the simple structure of the fuzzy systems and (2 the controller is robust against various parameters and payload uncertainties. The computational complexity of the proposed control scheme has been analyzed and compared with previous works. Computer simulations show that this controller is effective in achieving the control goals.
Atanassov, E.; Dimitrov, D.; Gurov, T.
2015-10-01
The recent developments in the area of high-performance computing are driven not only by the desire for ever higher performance but also by the rising costs of electricity. The use of various types of accelerators like GPUs, Intel Xeon Phi has become mainstream and many algorithms and applications have been ported to make use of them where available. In Financial Mathematics the question of optimal use of computational resources should also take into account the limitations on space, because in many use cases the servers are deployed close to the exchanges. In this work we evaluate various algorithms for option pricing that we have implemented for different target architectures in terms of their energy and space efficiency. Since it has been established that low-discrepancy sequences may be better than pseudorandom numbers for these types of algorithms, we also test the Sobol and Halton sequences. We present the raw results, the computed metrics and conclusions from our tests.
Energy Technology Data Exchange (ETDEWEB)
Atanassov, E.; Dimitrov, D., E-mail: d.slavov@bas.bg, E-mail: emanouil@parallel.bas.bg, E-mail: gurov@bas.bg; Gurov, T. [Institute of Information and Communication Technologies, BAS, Acad. G. Bonchev str., bl. 25A, 1113 Sofia (Bulgaria)
2015-10-28
The recent developments in the area of high-performance computing are driven not only by the desire for ever higher performance but also by the rising costs of electricity. The use of various types of accelerators like GPUs, Intel Xeon Phi has become mainstream and many algorithms and applications have been ported to make use of them where available. In Financial Mathematics the question of optimal use of computational resources should also take into account the limitations on space, because in many use cases the servers are deployed close to the exchanges. In this work we evaluate various algorithms for option pricing that we have implemented for different target architectures in terms of their energy and space efficiency. Since it has been established that low-discrepancy sequences may be better than pseudorandom numbers for these types of algorithms, we also test the Sobol and Halton sequences. We present the raw results, the computed metrics and conclusions from our tests.
Privacy-Preserving Computation with Trusted Computing via Scramble-then-Compute
Directory of Open Access Journals (Sweden)
Dang Hung
2017-07-01
Full Text Available We consider privacy-preserving computation of big data using trusted computing primitives with limited private memory. Simply ensuring that the data remains encrypted outside the trusted computing environment is insufficient to preserve data privacy, for data movement observed during computation could leak information. While it is possible to thwart such leakage using generic solution such as ORAM [42], designing efficient privacy-preserving algorithms is challenging. Besides computation efficiency, it is critical to keep trusted code bases lean, for large ones are unwieldy to vet and verify. In this paper, we advocate a simple approach wherein many basic algorithms (e.g., sorting can be made privacy-preserving by adding a step that securely scrambles the data before feeding it to the original algorithms. We call this approach Scramble-then-Compute (StC, and give a sufficient condition whereby existing external memory algorithms can be made privacy-preserving via StC. This approach facilitates code-reuse, and its simplicity contributes to a smaller trusted code base. It is also general, allowing algorithm designers to leverage an extensive body of known efficient algorithms for better performance. Our experiments show that StC could offer up to 4.1× speedups over known, application-specific alternatives.
Efficient Algorithms for Real-Time GPU Volumetric Cloud Rendering with Enhanced Geometry
Carlos Jiménez de Parga; Sebastián Rubén Gómez Palomo
2018-01-01
This paper presents several new techniques for volumetric cloud rendering using efficient algorithms and data structures based on ray-tracing methods for cumulus generation, achieving an optimum balance between realism and performance. These techniques target applications such as flight simulations, computer games, and educational software, even with conventional graphics hardware. The contours of clouds are defined by implicit mathematical expressions or triangulated structures inside which ...
An Efficient Algorithm for Unconstrained Optimization
Directory of Open Access Journals (Sweden)
Sergio Gerardo de-los-Cobos-Silva
2015-01-01
Full Text Available This paper presents an original and efficient PSO algorithm, which is divided into three phases: (1 stabilization, (2 breadth-first search, and (3 depth-first search. The proposed algorithm, called PSO-3P, was tested with 47 benchmark continuous unconstrained optimization problems, on a total of 82 instances. The numerical results show that the proposed algorithm is able to reach the global optimum. This work mainly focuses on unconstrained optimization problems from 2 to 1,000 variables.
An algorithm for computing screened Coulomb scattering in GEANT4
Energy Technology Data Exchange (ETDEWEB)
Mendenhall, Marcus H. [Vanderbilt University Free Electron Laser Center, P.O. Box 351816 Station B, Nashville, TN 37235-1816 (United States)]. E-mail: marcus.h.mendenhall@vanderbilt.edu; Weller, Robert A. [Department of Electrical Engineering and Computer Science, Vanderbilt University, P.O. Box 351821 Station B, Nashville, TN 37235-1821 (United States)]. E-mail: robert.a.weller@vanderbilt.edu
2005-01-01
An algorithm has been developed for the GEANT4 Monte-Carlo package for the efficient computation of screened Coulomb interatomic scattering. It explicitly integrates the classical equations of motion for scattering events, resulting in precise tracking of both the projectile and the recoil target nucleus. The algorithm permits the user to plug in an arbitrary screening function, such as Lens-Jensen screening, which is good for backscattering calculations, or Ziegler-Biersack-Littmark screening, which is good for nuclear straggling and implantation problems. This will allow many of the applications of the TRIM and SRIM codes to be extended into the much more general GEANT4 framework where nuclear and other effects can be included.
An algorithm for computing screened Coulomb scattering in GEANT4
International Nuclear Information System (INIS)
Mendenhall, Marcus H.; Weller, Robert A.
2005-01-01
An algorithm has been developed for the GEANT4 Monte-Carlo package for the efficient computation of screened Coulomb interatomic scattering. It explicitly integrates the classical equations of motion for scattering events, resulting in precise tracking of both the projectile and the recoil target nucleus. The algorithm permits the user to plug in an arbitrary screening function, such as Lens-Jensen screening, which is good for backscattering calculations, or Ziegler-Biersack-Littmark screening, which is good for nuclear straggling and implantation problems. This will allow many of the applications of the TRIM and SRIM codes to be extended into the much more general GEANT4 framework where nuclear and other effects can be included
Becker, Hanna; Albera, Laurent; Comon, Pierre; Kachenoura, Amar; Merlet, Isabelle
2017-01-01
As a noninvasive technique, electroencephalography (EEG) is commonly used to monitor the brain signals of patients with epilepsy such as the interictal epileptic spikes. However, the recorded data are often corrupted by artifacts originating, for example, from muscle activities, which may have much higher amplitudes than the interictal epileptic signals of interest. To remove these artifacts, a number of independent component analysis (ICA) techniques were successfully applied. In this paper, we propose a new deflation ICA algorithm, called penalized semialgebraic unitary deflation (P-SAUD) algorithm, that improves upon classical ICA methods by leading to a considerably reduced computational complexity at equivalent performance. This is achieved by employing a penalized semialgebraic extraction scheme, which permits us to identify the epileptic components of interest (interictal spikes) first and obviates the need of extracting subsequent components. The proposed method is evaluated on physiologically plausible simulated EEG data and actual measurements of three patients. The results are compared to those of several popular ICA algorithms as well as second-order blind source separation methods, demonstrating that P-SAUD extracts the epileptic spikes with the same accuracy as the best ICA methods, but reduces the computational complexity by a factor of 10 for 32-channel recordings. This superior computational efficiency is of particular interest considering the increasing use of high-resolution EEG recordings, whose analysis requires algorithms with low computational cost.
International Nuclear Information System (INIS)
Guo, Li; Li, Pei; Pan, Cong; Cheng, Yuxuan; Ding, Zhihua; Li, Peng; Liao, Rujia; Hu, Weiwei; Chen, Zhong
2016-01-01
The complex-based OCT angiography (Angio-OCT) offers high motion contrast by combining both the intensity and phase information. However, due to involuntary bulk tissue motions, complex-valued OCT raw data are processed sequentially with different algorithms for correcting bulk image shifts (BISs), compensating global phase fluctuations (GPFs) and extracting flow signals. Such a complicated procedure results in massive computational load. To mitigate such a problem, in this work, we present an inter-frame complex-correlation (CC) algorithm. The CC algorithm is suitable for parallel processing of both flow signal extraction and BIS correction, and it does not need GPF compensation. This method provides high processing efficiency and shows superiority in motion contrast. The feasibility and performance of the proposed CC algorithm is demonstrated using both flow phantom and live animal experiments. (paper)
Contact-impact algorithms on parallel computers
International Nuclear Information System (INIS)
Zhong Zhihua; Nilsson, Larsgunnar
1994-01-01
Contact-impact algorithms on parallel computers are discussed within the context of explicit finite element analysis. The algorithms concerned include a contact searching algorithm and an algorithm for contact force calculations. The contact searching algorithm is based on the territory concept of the general HITA algorithm. However, no distinction is made between different contact bodies, or between different contact surfaces. All contact segments from contact boundaries are taken as a single set. Hierarchy territories and contact territories are expanded. A three-dimensional bucket sort algorithm is used to sort contact nodes. The defence node algorithm is used in the calculation of contact forces. Both the contact searching algorithm and the defence node algorithm are implemented on the connection machine CM-200. The performance of the algorithms is examined under different circumstances, and numerical results are presented. ((orig.))
Algorithms for parallel computers
International Nuclear Information System (INIS)
Churchhouse, R.F.
1985-01-01
Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)
International Nuclear Information System (INIS)
Snyder, Abigail C.; Jiao, Yu
2010-01-01
Neutron experiments at the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL) frequently generate large amounts of data (on the order of 106-1012 data points). Hence, traditional data analysis tools run on a single CPU take too long to be practical and scientists are unable to efficiently analyze all data generated by experiments. Our goal is to develop a scalable algorithm to efficiently compute high-dimensional integrals of arbitrary functions. This algorithm can then be used to integrate the four-dimensional integrals that arise as part of modeling intensity from the experiments at the SNS. Here, three different one-dimensional numerical integration solvers from the GNU Scientific Library were modified and implemented to solve four-dimensional integrals. The results of these solvers on a final integrand provided by scientists at the SNS can be compared to the results of other methods, such as quasi-Monte Carlo methods, computing the same integral. A parallelized version of the most efficient method can allow scientists the opportunity to more effectively analyze all experimental data.
Quantum Genetic Algorithms for Computer Scientists
Directory of Open Access Journals (Sweden)
Rafael Lahoz-Beltra
2016-10-01
Full Text Available Genetic algorithms (GAs are a class of evolutionary algorithms inspired by Darwinian natural selection. They are popular heuristic optimisation methods based on simulated genetic mechanisms, i.e., mutation, crossover, etc. and population dynamical processes such as reproduction, selection, etc. Over the last decade, the possibility to emulate a quantum computer (a computer using quantum-mechanical phenomena to perform operations on data has led to a new class of GAs known as “Quantum Genetic Algorithms” (QGAs. In this review, we present a discussion, future potential, pros and cons of this new class of GAs. The review will be oriented towards computer scientists interested in QGAs “avoiding” the possible difficulties of quantum-mechanical phenomena.
Progressive geometric algorithms
Alewijnse, S.P.A.; Bagautdinov, T.M.; de Berg, M.T.; Bouts, Q.W.; ten Brink, Alex P.; Buchin, K.A.; Westenberg, M.A.
2015-01-01
Progressive algorithms are algorithms that, on the way to computing a complete solution to the problem at hand, output intermediate solutions that approximate the complete solution increasingly well. We present a framework for analyzing such algorithms, and develop efficient progressive algorithms
Progressive geometric algorithms
Alewijnse, S.P.A.; Bagautdinov, T.M.; Berg, de M.T.; Bouts, Q.W.; Brink, ten A.P.; Buchin, K.; Westenberg, M.A.
2014-01-01
Progressive algorithms are algorithms that, on the way to computing a complete solution to the problem at hand, output intermediate solutions that approximate the complete solution increasingly well. We present a framework for analyzing such algorithms, and develop efficient progressive algorithms
An Efficient VQ Codebook Search Algorithm Applied to AMR-WB Speech Coding
Directory of Open Access Journals (Sweden)
Cheng-Yu Yeh
2017-04-01
Full Text Available The adaptive multi-rate wideband (AMR-WB speech codec is widely used in modern mobile communication systems for high speech quality in handheld devices. Nonetheless, a major disadvantage is that vector quantization (VQ of immittance spectral frequency (ISF coefficients takes a considerable computational load in the AMR-WB coding. Accordingly, a binary search space-structured VQ (BSS-VQ algorithm is adopted to efficiently reduce the complexity of ISF quantization in AMR-WB. This search algorithm is done through a fast locating technique combined with lookup tables, such that an input vector is efficiently assigned to a subspace where relatively few codeword searches are required to be executed. In terms of overall search performance, this work is experimentally validated as a superior search algorithm relative to a multiple triangular inequality elimination (MTIE, a TIE with dynamic and intersection mechanisms (DI-TIE, and an equal-average equal-variance equal-norm nearest neighbor search (EEENNS approach. With a full search algorithm as a benchmark for overall search load comparison, this work provides an 87% search load reduction at a threshold of quantization accuracy of 0.96, a figure far beyond 55% in the MTIE, 76% in the EEENNS approach, and 83% in the DI-TIE approach.
Once upon an algorithm how stories explain computing
Erwig, Martin
2017-01-01
How Hansel and Gretel, Sherlock Holmes, the movie Groundhog Day, Harry Potter, and other familiar stories illustrate the concepts of computing. Picture a computer scientist, staring at a screen and clicking away frantically on a keyboard, hacking into a system, or perhaps developing an app. Now delete that picture. In Once Upon an Algorithm, Martin Erwig explains computation as something that takes place beyond electronic computers, and computer science as the study of systematic problem solving. Erwig points out that many daily activities involve problem solving. Getting up in the morning, for example: You get up, take a shower, get dressed, eat breakfast. This simple daily routine solves a recurring problem through a series of well-defined steps. In computer science, such a routine is called an algorithm. Erwig illustrates a series of concepts in computing with examples from daily life and familiar stories. Hansel and Gretel, for example, execute an algorithm to get home from the forest. The movie Groundho...
An efficient parallel algorithm for matrix-vector multiplication
Energy Technology Data Exchange (ETDEWEB)
Hendrickson, B.; Leland, R.; Plimpton, S.
1993-03-01
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/[radical]p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
On the Computation of the Efficient Frontier of the Portfolio Selection Problem
Directory of Open Access Journals (Sweden)
Clara Calvo
2012-01-01
Full Text Available An easy-to-use procedure is presented for improving the ε-constraint method for computing the efficient frontier of the portfolio selection problem endowed with additional cardinality and semicontinuous variable constraints. The proposed method provides not only a numerical plotting of the frontier but also an analytical description of it, including the explicit equations of the arcs of parabola it comprises and the change points between them. This information is useful for performing a sensitivity analysis as well as for providing additional criteria to the investor in order to select an efficient portfolio. Computational results are provided to test the efficiency of the algorithm and to illustrate its applications. The procedure has been implemented in Mathematica.
An Efficient Return Algorithm for Non-Associated Mohr-Coulomb Plasticity
DEFF Research Database (Denmark)
Clausen, Johan Christian; Damkilde, Lars; Andersen, Lars
2005-01-01
. The stress return and the formation of the constitutive matrix is carried out in principal stress space, where the manipulations simplify and rely on geometrical arguments. The singularities arising at the intersection of yield planes are dealt with in a straightforward way also based on geometric......An efficient return algorithm for stress update in numerical plasticity computations is presented. The yield criterion must be linear in principal stress space, and can be composed of any number of yield planes. Each of these yield planes can have an associated or non-associated flow rule...
Efficient RNA structure comparison algorithms.
Arslan, Abdullah N; Anandan, Jithendar; Fry, Eric; Monschke, Keith; Ganneboina, Nitin; Bowerman, Jason
2017-12-01
Recently proposed relative addressing-based ([Formula: see text]) RNA secondary structure representation has important features by which an RNA structure database can be stored into a suffix array. A fast substructure search algorithm has been proposed based on binary search on this suffix array. Using this substructure search algorithm, we present a fast algorithm that finds the largest common substructure of given multiple RNA structures in [Formula: see text] format. The multiple RNA structure comparison problem is NP-hard in its general formulation. We introduced a new problem for comparing multiple RNA structures. This problem has more strict similarity definition and objective, and we propose an algorithm that solves this problem efficiently. We also develop another comparison algorithm that iteratively calls this algorithm to locate nonoverlapping large common substructures in compared RNAs. With the new resulting tools, we improved the RNASSAC website (linked from http://faculty.tamuc.edu/aarslan ). This website now also includes two drawing tools: one specialized for preparing RNA substructures that can be used as input by the search tool, and another one for automatically drawing the entire RNA structure from a given structure sequence.
An efficient dictionary learning algorithm and its application to 3-D medical image denoising.
Li, Shutao; Fang, Leyuan; Yin, Haitao
2012-02-01
In this paper, we propose an efficient dictionary learning algorithm for sparse representation of given data and suggest a way to apply this algorithm to 3-D medical image denoising. Our learning approach is composed of two main parts: sparse coding and dictionary updating. On the sparse coding stage, an efficient algorithm named multiple clusters pursuit (MCP) is proposed. The MCP first applies a dictionary structuring strategy to cluster the atoms with high coherence together, and then employs a multiple-selection strategy to select several competitive atoms at each iteration. These two strategies can greatly reduce the computation complexity of the MCP and assist it to obtain better sparse solution. On the dictionary updating stage, the alternating optimization that efficiently approximates the singular value decomposition is introduced. Furthermore, in the 3-D medical image denoising application, a joint 3-D operation is proposed for taking the learning capabilities of the presented algorithm to simultaneously capture the correlations within each slice and correlations across the nearby slices, thereby obtaining better denoising results. The experiments on both synthetically generated data and real 3-D medical images demonstrate that the proposed approach has superior performance compared to some well-known methods. © 2011 IEEE
In-Place Algorithms for Computing (Layers of) Maxima
DEFF Research Database (Denmark)
Blunck, Henrik; Vahrenhold, Jan
2006-01-01
We describe space-efficient algorithms for solving problems related to finding maxima among points in two and three dimensions. Our algorithms run in optimal O(n log2 n) time and require O(1) space in addition to the representation of the input.......We describe space-efficient algorithms for solving problems related to finding maxima among points in two and three dimensions. Our algorithms run in optimal O(n log2 n) time and require O(1) space in addition to the representation of the input....
Sorting on STAR. [CDC computer algorithm timing comparison
Stone, H. S.
1978-01-01
Timing comparisons are given for three sorting algorithms written for the CDC STAR computer. One algorithm is Hoare's (1962) Quicksort, which is the fastest or nearly the fastest sorting algorithm for most computers. A second algorithm is a vector version of Quicksort that takes advantage of the STAR's vector operations. The third algorithm is an adaptation of Batcher's (1968) sorting algorithm, which makes especially good use of vector operations but has a complexity of N(log N)-squared as compared with a complexity of N log N for the Quicksort algorithms. In spite of its worse complexity, Batcher's sorting algorithm is competitive with the serial version of Quicksort for vectors up to the largest that can be treated by STAR. Vector Quicksort outperforms the other two algorithms and is generally preferred. These results indicate that unusual instruction sets can introduce biases in program execution time that counter results predicted by worst-case asymptotic complexity analysis.
Computed laminography and reconstruction algorithm
International Nuclear Information System (INIS)
Que Jiemin; Cao Daquan; Zhao Wei; Tang Xiao
2012-01-01
Computed laminography (CL) is an alternative to computed tomography if large objects are to be inspected with high resolution. This is especially true for planar objects. In this paper, we set up a new scanning geometry for CL, and study the algebraic reconstruction technique (ART) for CL imaging. We compare the results of ART with variant weighted functions by computer simulation with a digital phantom. It proves that ART algorithm is a good choice for the CL system. (authors)
Computing return times or return periods with rare event algorithms
Lestang, Thibault; Ragone, Francesco; Bréhier, Charles-Edouard; Herbert, Corentin; Bouchet, Freddy
2018-04-01
The average time between two occurrences of the same event, referred to as its return time (or return period), is a useful statistical concept for practical applications. For instance insurances or public agencies may be interested by the return time of a 10 m flood of the Seine river in Paris. However, due to their scarcity, reliably estimating return times for rare events is very difficult using either observational data or direct numerical simulations. For rare events, an estimator for return times can be built from the extrema of the observable on trajectory blocks. Here, we show that this estimator can be improved to remain accurate for return times of the order of the block size. More importantly, we show that this approach can be generalised to estimate return times from numerical algorithms specifically designed to sample rare events. So far those algorithms often compute probabilities, rather than return times. The approach we propose provides a computationally extremely efficient way to estimate numerically the return times of rare events for a dynamical system, gaining several orders of magnitude of computational costs. We illustrate the method on two kinds of observables, instantaneous and time-averaged, using two different rare event algorithms, for a simple stochastic process, the Ornstein–Uhlenbeck process. As an example of realistic applications to complex systems, we finally discuss extreme values of the drag on an object in a turbulent flow.
Computational issues in alternating projection algorithms for fixed-order control design
DEFF Research Database (Denmark)
Beran, Eric Bengt; Grigoriadis, K.
1997-01-01
Alternating projection algorithms have been introduced recently to solve fixed-order controller design problems described by linear matrix inequalities and non-convex coupling rank constraints. In this work, an extensive numerical experimentation using proposed benchmark fixed-order control design...... examples is used to indicate the computational efficiency of the method. These results indicate that the proposed alternating projections are effective in obtaining low-order controllers for small and medium order problems...
Spin-neurons: A possible path to energy-efficient neuromorphic computers
Energy Technology Data Exchange (ETDEWEB)
Sharad, Mrigank; Fan, Deliang; Roy, Kaushik [School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907 (United States)
2013-12-21
Recent years have witnessed growing interest in the field of brain-inspired computing based on neural-network architectures. In order to translate the related algorithmic models into powerful, yet energy-efficient cognitive-computing hardware, computing-devices beyond CMOS may need to be explored. The suitability of such devices to this field of computing would strongly depend upon how closely their physical characteristics match with the essential computing primitives employed in such models. In this work, we discuss the rationale of applying emerging spin-torque devices for bio-inspired computing. Recent spin-torque experiments have shown the path to low-current, low-voltage, and high-speed magnetization switching in nano-scale magnetic devices. Such magneto-metallic, current-mode spin-torque switches can mimic the analog summing and “thresholding” operation of an artificial neuron with high energy-efficiency. Comparison with CMOS-based analog circuit-model of a neuron shows that “spin-neurons” (spin based circuit model of neurons) can achieve more than two orders of magnitude lower energy and beyond three orders of magnitude reduction in energy-delay product. The application of spin-neurons can therefore be an attractive option for neuromorphic computers of future.
Efficient algorithms of multidimensional γ-ray spectra compression
International Nuclear Information System (INIS)
Morhac, M.; Matousek, V.
2006-01-01
The efficient algorithms to compress multidimensional γ-ray events are presented. Two alternative kinds of compression algorithms based on both the adaptive orthogonal and randomizing transforms are proposed. In both algorithms we employ the reduction of data volume due to the symmetry of the γ-ray spectra
Quantum many-body effects in x-ray spectra efficiently computed using a basic graph algorithm
Liang, Yufeng; Prendergast, David
2018-05-01
The growing interest in using x-ray spectroscopy for refined materials characterization calls for an accurate electronic-structure theory to interpret the x-ray near-edge fine structure. In this work, we propose an efficient and unified framework to describe all the many-electron processes in a Fermi liquid after a sudden perturbation (such as a core hole). This problem has been visited by the Mahan-Noziéres-De Dominicis (MND) theory, but it is intractable to implement various Feynman diagrams within first-principles calculations. Here, we adopt a nondiagrammatic approach and treat all the many-electron processes in the MND theory on an equal footing. Starting from a recently introduced determinant formalism [Phys. Rev. Lett. 118, 096402 (2017), 10.1103/PhysRevLett.118.096402], we exploit the linear dependence of determinants describing different final states involved in the spectral calculations. An elementary graph algorithm, breadth-first search, can be used to quickly identify the important determinants for shaping the spectrum, which avoids the need to evaluate a great number of vanishingly small terms. This search algorithm is performed over the tree-structure of the many-body expansion, which mimics a path-finding process. We demonstrate that the determinantal approach is computationally inexpensive even for obtaining x-ray spectra of extended systems. Using Kohn-Sham orbitals from two self-consistent fields (ground and core-excited state) as input for constructing the determinants, the calculated x-ray spectra for a number of transition metal oxides are in good agreement with experiments. Many-electron aspects beyond the Bethe-Salpeter equation, as captured by this approach, are also discussed, such as shakeup excitations and many-body wave function overlap considered in Anderson's orthogonality catastrophe.
Applying Kitaev's algorithm in an ion trap quantum computer
International Nuclear Information System (INIS)
Travaglione, B.; Milburn, G.J.
2000-01-01
Full text: Kitaev's algorithm is a method of estimating eigenvalues associated with an operator. Shor's factoring algorithm, which enables a quantum computer to crack RSA encryption codes, is a specific example of Kitaev's algorithm. It has been proposed that the algorithm can also be used to generate eigenstates. We extend this proposal for small quantum systems, identifying the conditions under which the algorithm can successfully generate eigenstates. We then propose an implementation scheme based on an ion trap quantum computer. This scheme allows us to illustrate a simple example, in which the algorithm effectively generates eigenstates
Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh
2015-07-01
This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for current class of multicore architecture. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism however, when it comes to 3D data migration, as the data size increases the resource requirement of the algorithm also increases. This challenges its practical implementation even on current generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute intensive part of Kirchhoff depth migration algorithm is the calculation of traveltime tables due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for post and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of data to be migrated during runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over the conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM series of supercomputers. Optimization, performance and scalability experiment results along with the migration outcome show the effectiveness of the parallel algorithm.
Comparison of evolutionary computation algorithms for solving bi ...
Indian Academy of Sciences (India)
failure probability. Multiobjective Evolutionary Computation algorithms (MOEAs) are well-suited for Multiobjective task scheduling on heterogeneous environment. The two Multi-Objective Evolutionary Algorithms such as Multiobjective Genetic. Algorithm (MOGA) and Multiobjective Evolutionary Programming (MOEP) with.
Efficient Algorithms for Real-Time GPU Volumetric Cloud Rendering with Enhanced Geometry
Directory of Open Access Journals (Sweden)
Carlos Jiménez de Parga
2018-04-01
Full Text Available This paper presents several new techniques for volumetric cloud rendering using efficient algorithms and data structures based on ray-tracing methods for cumulus generation, achieving an optimum balance between realism and performance. These techniques target applications such as flight simulations, computer games, and educational software, even with conventional graphics hardware. The contours of clouds are defined by implicit mathematical expressions or triangulated structures inside which volumetric rendering is performed. Novel techniques are used to reproduce the asymmetrical nature of clouds and the effects of light-scattering, with low computing costs. The work includes a new method to create randomized fractal clouds using a recursive grammar. The graphical results are comparable to those produced by state-of-the-art, hyper-realistic algorithms. These methods provide real-time performance, and are superior to particle-based systems. These outcomes suggest that our methods offer a good balance between realism and performance, and are suitable for use in the standard graphics industry.
An efficient algorithm for global periodic orbits generation near irregular-shaped asteroids
Shang, Haibin; Wu, Xiaoyu; Ren, Yuan; Shan, Jinjun
2017-07-01
Periodic orbits (POs) play an important role in understanding dynamical behaviors around natural celestial bodies. In this study, an efficient algorithm was presented to generate the global POs around irregular-shaped uniformly rotating asteroids. The algorithm was performed in three steps, namely global search, local refinement, and model continuation. First, a mascon model with a low number of particles and optimized mass distribution was constructed to remodel the exterior gravitational potential of the asteroid. Using this model, a multi-start differential evolution enhanced with a deflection strategy with strong global exploration and bypassing abilities was adopted. This algorithm can be regarded as a search engine to find multiple globally optimal regions in which potential POs were located. This was followed by applying a differential correction to locally refine global search solutions and generate the accurate POs in the mascon model in which an analytical Jacobian matrix was derived to improve convergence. Finally, the concept of numerical model continuation was introduced and used to convert the POs from the mascon model into a high-fidelity polyhedron model by sequentially correcting the initial states. The efficiency of the proposed algorithm was substantiated by computing the global POs around an elongated shoe-shaped asteroid 433 Eros. Various global POs with different topological structures in the configuration space were successfully located. Specifically, the proposed algorithm was generic and could be conveniently extended to explore periodic motions in other gravitational systems.
Efficiently computing exact geodesic loops within finite steps.
Xin, Shi-Qing; He, Ying; Fu, Chi-Wing
2012-06-01
Closed geodesics, or geodesic loops, are crucial to the study of differential topology and differential geometry. Although the existence and properties of closed geodesics on smooth surfaces have been widely studied in mathematics community, relatively little progress has been made on how to compute them on polygonal surfaces. Most existing algorithms simply consider the mesh as a graph and so the resultant loops are restricted only on mesh edges, which are far from the actual geodesics. This paper is the first to prove the existence and uniqueness of geodesic loop restricted on a closed face sequence; it contributes also with an efficient algorithm to iteratively evolve an initial closed path on a given mesh into an exact geodesic loop within finite steps. Our proposed algorithm takes only an O(k) space complexity and an O(mk) time complexity (experimentally), where m is the number of vertices in the region bounded by the initial loop and the resultant geodesic loop, and k is the average number of edges in the edge sequences that the evolving loop passes through. In contrast to the existing geodesic curvature flow methods which compute an approximate geodesic loop within a predefined threshold, our method is exact and can apply directly to triangular meshes without needing to solve any differential equation with a numerical solver; it can run at interactive speed, e.g., in the order of milliseconds, for a mesh with around 50K vertices, and hence, significantly outperforms existing algorithms. Actually, our algorithm could run at interactive speed even for larger meshes. Besides the complexity of the input mesh, the geometric shape could also affect the number of evolving steps, i.e., the performance. We motivate our algorithm with an interactive shape segmentation example shown later in the paper.
Performance indices and evaluation of algorithms in building energy efficient design optimization
International Nuclear Information System (INIS)
Si, Binghui; Tian, Zhichao; Jin, Xing; Zhou, Xin; Tang, Peng; Shi, Xing
2016-01-01
Building energy efficient design optimization is an emerging technique that is increasingly being used to design buildings with better overall performance and a particular emphasis on energy efficiency. To achieve building energy efficient design optimization, algorithms are vital to generate new designs and thus drive the design optimization process. Therefore, the performance of algorithms is crucial to achieving effective energy efficient design techniques. This study evaluates algorithms used for building energy efficient design optimization. A set of performance indices, namely, stability, robustness, validity, speed, coverage, and locality, is proposed to evaluate the overall performance of algorithms. A benchmark building and a design optimization problem are also developed. Hooke–Jeeves algorithm, Multi-Objective Genetic Algorithm II, and Multi-Objective Particle Swarm Optimization algorithm are evaluated by using the proposed performance indices and benchmark design problem. Results indicate that no algorithm performs best in all six areas. Therefore, when facing an energy efficient design problem, the algorithm must be carefully selected based on the nature of the problem and the performance indices that matter the most. - Highlights: • Six indices of algorithm performance in building energy optimization are developed. • For each index, its concept is defined and the calculation formulas are proposed. • A benchmark building and benchmark energy efficient design problem are proposed. • The performance of three selected algorithms are evaluated.
Efficient computation of spaced seeds
Directory of Open Access Journals (Sweden)
Ilie Silvana
2012-02-01
Full Text Available Abstract Background The most frequently used tools in bioinformatics are those searching for similarities, or local alignments, between biological sequences. Since the exact dynamic programming algorithm is quadratic, linear-time heuristics such as BLAST are used. Spaced seeds are much more sensitive than the consecutive seed of BLAST and using several seeds represents the current state of the art in approximate search for biological sequences. The most important aspect is computing highly sensitive seeds. Since the problem seems hard, heuristic algorithms are used. The leading software in the common Bernoulli model is the SpEED program. Findings SpEED uses a hill climbing method based on the overlap complexity heuristic. We propose a new algorithm for this heuristic that improves its speed by over one order of magnitude. We use the new implementation to compute improved seeds for several software programs. We compute as well multiple seeds of the same weight as MegaBLAST, that greatly improve its sensitivity. Conclusion Multiple spaced seeds are being successfully used in bioinformatics software programs. Enabling researchers to compute very fast high quality seeds will help expanding the range of their applications.
Trobec, Roman
2015-01-01
This book is concentrated on the synergy between computer science and numerical analysis. It is written to provide a firm understanding of the described approaches to computer scientists, engineers or other experts who have to solve real problems. The meshless solution approach is described in more detail, with a description of the required algorithms and the methods that are needed for the design of an efficient computer program. Most of the details are demonstrated on solutions of practical problems, from basic to more complicated ones. This book will be a useful tool for any reader interes
An efficient genetic algorithm for maximum coverage deployment in wireless sensor networks.
Yoon, Yourim; Kim, Yong-Hyuk
2013-10-01
Sensor networks have a lot of applications such as battlefield surveillance, environmental monitoring, and industrial diagnostics. Coverage is one of the most important performance metrics for sensor networks since it reflects how well a sensor field is monitored. In this paper, we introduce the maximum coverage deployment problem in wireless sensor networks and analyze the properties of the problem and its solution space. Random deployment is the simplest way to deploy sensor nodes but may cause unbalanced deployment and therefore, we need a more intelligent way for sensor deployment. We found that the phenotype space of the problem is a quotient space of the genotype space in a mathematical view. Based on this property, we propose an efficient genetic algorithm using a novel normalization method. A Monte Carlo method is adopted to design an efficient evaluation function, and its computation time is decreased without loss of solution quality using a method that starts from a small number of random samples and gradually increases the number for subsequent generations. The proposed genetic algorithms could be further improved by combining with a well-designed local search. The performance of the proposed genetic algorithm is shown by a comparative experimental study. When compared with random deployment and existing methods, our genetic algorithm was not only about twice faster, but also showed significant performance improvement in quality.
A neural algorithm for a fundamental computing problem.
Dasgupta, Sanjoy; Stevens, Charles F; Navlakha, Saket
2017-11-10
Similarity search-for example, identifying similar images in a database or similar documents on the web-is a fundamental computing problem faced by large-scale information retrieval systems. We discovered that the fruit fly olfactory circuit solves this problem with a variant of a computer science algorithm (called locality-sensitive hashing). The fly circuit assigns similar neural activity patterns to similar odors, so that behaviors learned from one odor can be applied when a similar odor is experienced. The fly algorithm, however, uses three computational strategies that depart from traditional approaches. These strategies can be translated to improve the performance of computational similarity searches. This perspective helps illuminate the logic supporting an important sensory function and provides a conceptually new algorithm for solving a fundamental computational problem. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Zhou, Hong; Zhou, Michael; Li, Daisy; Manthey, Joseph; Lioutikova, Ekaterina; Wang, Hong; Zeng, Xiao
2017-11-17
The beauty and power of the genome editing mechanism, CRISPR Cas9 endonuclease system, lies in the fact that it is RNA-programmable such that Cas9 can be guided to any genomic loci complementary to a 20-nt RNA, single guide RNA (sgRNA), to cleave double stranded DNA, allowing the introduction of wanted mutations. Unfortunately, it has been reported repeatedly that the sgRNA can also guide Cas9 to off-target sites where the DNA sequence is homologous to sgRNA. Using human genome and Streptococcus pyogenes Cas9 (SpCas9) as an example, this article mathematically analyzed the probabilities of off-target homologies of sgRNAs and discovered that for large genome size such as human genome, potential off-target homologies are inevitable for sgRNA selection. A highly efficient computationl algorithm was developed for whole genome sgRNA design and off-target homology searches. By means of a dynamically constructed sequence-indexed database and a simplified sequence alignment method, this algorithm achieves very high efficiency while guaranteeing the identification of all existing potential off-target homologies. Via this algorithm, 1,876,775 sgRNAs were designed for the 19,153 human mRNA genes and only two sgRNAs were found to be free of off-target homology. By means of the novel and efficient sgRNA homology search algorithm introduced in this article, genome wide sgRNA design and off-target analysis were conducted and the results confirmed the mathematical analysis that for a sgRNA sequence, it is almost impossible to escape potential off-target homologies. Future innovations on the CRISPR Cas9 gene editing technology need to focus on how to eliminate the Cas9 off-target activity.
A computationally efficient OMP-based compressed sensing reconstruction for dynamic MRI
International Nuclear Information System (INIS)
Usman, M; Prieto, C; Schaeffter, T; Batchelor, P G; Odille, F; Atkinson, D
2011-01-01
Compressed sensing (CS) methods in MRI are computationally intensive. Thus, designing novel CS algorithms that can perform faster reconstructions is crucial for everyday applications. We propose a computationally efficient orthogonal matching pursuit (OMP)-based reconstruction, specifically suited to cardiac MR data. According to the energy distribution of a y-f space obtained from a sliding window reconstruction, we label the y-f space as static or dynamic. For static y-f space images, a computationally efficient masked OMP reconstruction is performed, whereas for dynamic y-f space images, standard OMP reconstruction is used. The proposed method was tested on a dynamic numerical phantom and two cardiac MR datasets. Depending on the field of view composition of the imaging data, compared to the standard OMP method, reconstruction speedup factors ranging from 1.5 to 2.5 are achieved. (note)
Dataflow-Based Mapping of Computer Vision Algorithms onto FPGAs
Directory of Open Access Journals (Sweden)
Ivan Corretjer
2007-01-01
Full Text Available We develop a design methodology for mapping computer vision algorithms onto an FPGA through the use of coarse-grain reconfigurable dataflow graphs as a representation to guide the designer. We first describe a new dataflow modeling technique called homogeneous parameterized dataflow (HPDF, which effectively captures the structure of an important class of computer vision applications. This form of dynamic dataflow takes advantage of the property that in a large number of image processing applications, data production and consumption rates can vary, but are equal across dataflow graph edges for any particular application iteration. After motivating and defining the HPDF model of computation, we develop an HPDF-based design methodology that offers useful properties in terms of verifying correctness and exposing performance-enhancing transformations; we discuss and address various challenges in efficiently mapping an HPDF-based application representation into target-specific HDL code; and we present experimental results pertaining to the mapping of a gesture recognition application onto the Xilinx Virtex II FPGA.
Role of computational efficiency in process simulation
Directory of Open Access Journals (Sweden)
Kurt Strand
1989-07-01
Full Text Available It is demonstrated how efficient numerical algorithms may be combined to yield a powerful environment for analysing and simulating dynamic systems. The importance of using efficient numerical algorithms is emphasized and demonstrated through examples from the petrochemical industry.
Algorithms for image processing and computer vision
Parker, J R
2010-01-01
A cookbook of algorithms for common image processing applications Thanks to advances in computer hardware and software, algorithms have been developed that support sophisticated image processing without requiring an extensive background in mathematics. This bestselling book has been fully updated with the newest of these, including 2D vision methods in content-based searches and the use of graphics cards as image processing computational aids. It's an ideal reference for software engineers and developers, advanced programmers, graphics programmers, scientists, and other specialists wh
Gentzsch, Wolfgang
1986-01-01
The GAMM Committee for Numerical Methods in Fluid Mechanics organizes workshops which should bring together experts of a narrow field of computational fluid dynamics (CFD) to exchange ideas and experiences in order to speed-up the development in this field. In this sense it was suggested that a workshop should treat the solution of CFD problems on vector computers. Thus we organized a workshop with the title "The efficient use of vector computers with emphasis on computational fluid dynamics". The workshop took place at the Computing Centre of the University of Karlsruhe, March 13-15,1985. The participation had been restricted to 22 people of 7 countries. 18 papers have been presented. In the announcement of the workshop we wrote: "Fluid mechanics has actively stimulated the development of superfast vector computers like the CRAY's or CYBER 205. Now these computers on their turn stimulate the development of new algorithms which result in a high degree of vectorization (sca1ar/vectorized execution-time). But w...
Efficient Algorithms for Segmentation of Item-Set Time Series
Chundi, Parvathi; Rosenkrantz, Daniel J.
We propose a special type of time series, which we call an item-set time series, to facilitate the temporal analysis of software version histories, email logs, stock market data, etc. In an item-set time series, each observed data value is a set of discrete items. We formalize the concept of an item-set time series and present efficient algorithms for segmenting a given item-set time series. Segmentation of a time series partitions the time series into a sequence of segments where each segment is constructed by combining consecutive time points of the time series. Each segment is associated with an item set that is computed from the item sets of the time points in that segment, using a function which we call a measure function. We then define a concept called the segment difference, which measures the difference between the item set of a segment and the item sets of the time points in that segment. The segment difference values are required to construct an optimal segmentation of the time series. We describe novel and efficient algorithms to compute segment difference values for each of the measure functions described in the paper. We outline a dynamic programming based scheme to construct an optimal segmentation of the given item-set time series. We use the item-set time series segmentation techniques to analyze the temporal content of three different data sets—Enron email, stock market data, and a synthetic data set. The experimental results show that an optimal segmentation of item-set time series data captures much more temporal content than a segmentation constructed based on the number of time points in each segment, without examining the item set data at the time points, and can be used to analyze different types of temporal data.
Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.
2009-12-01
FILMPAR is a highly efficient and portable parallel multigrid algorithm for solving a discretised form of the lubrication approximation to three-dimensional, gravity-driven, continuous thin film free-surface flow over substrates containing micro-scale topography. While generally applicable to problems involving heterogeneous and distributed features, for illustrative purposes the algorithm is benchmarked on a distributed memory IBM BlueGene/P computing platform for the case of flow over a single trench topography, enabling direct comparison with complementary experimental data and existing serial multigrid solutions. Parallel performance is assessed as a function of the number of processors employed and shown to lead to super-linear behaviour for the production of mesh-independent solutions. In addition, the approach is used to solve for the case of flow over a complex inter-connected topographical feature and a description provided of how FILMPAR could be adapted relatively simply to solve for a wider class of related thin film flow problems. Program summaryProgram title: FILMPAR Catalogue identifier: AEEL_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEEL_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 530 421 No. of bytes in distributed program, including test data, etc.: 1 960 313 Distribution format: tar.gz Programming language: C++ and MPI Computer: Desktop, server Operating system: Unix/Linux Mac OS X Has the code been vectorised or parallelised?: Yes. Tested with up to 128 processors RAM: 512 MBytes Classification: 12 External routines: GNU C/C++, MPI Nature of problem: Thin film flows over functional substrates containing well-defined single and complex topographical features are of enormous significance, having a wide variety of engineering
An Alternative Algorithm for Computing Watersheds on Shared Memory Parallel Computers
Meijster, A.; Roerdink, J.B.T.M.
1995-01-01
In this paper a parallel implementation of a watershed algorithm is proposed. The algorithm can easily be implemented on shared memory parallel computers. The watershed transform is generally considered to be inherently sequential since the discrete watershed of an image is defined using recursion.
Directory of Open Access Journals (Sweden)
Sergey Kharitonov
2015-06-01
Full Text Available Optimum transport infrastructure usage is an important aspect of the development of the national economy of the Russian Federation. Thus, development of instruments for assessing the efficiency of infrastructure is impossible without constant monitoring of a number of significant indicators. This work is devoted to the selection of indicators and the method of their calculation in relation to the transport subsystem as airport infrastructure. The work also reflects aspects of the evaluation of the possibilities of algorithmic computational mechanisms to improve the tools of public administration transport subsystems.
Directory of Open Access Journals (Sweden)
S. Selvi
2015-07-01
Full Text Available Grid computing solves high performance and high-throughput computing problems through sharing resources ranging from personal computers to super computers distributed around the world. As the grid environments facilitate distributed computation, the scheduling of grid jobs has become an important issue. In this paper, an investigation on implementing Multiobjective Variable Neighborhood Search (MVNS algorithm for scheduling independent jobs on computational grid is carried out. The performance of the proposed algorithm has been evaluated with Min–Min algorithm, Simulated Annealing (SA and Greedy Randomized Adaptive Search Procedure (GRASP algorithm. Simulation results show that MVNS algorithm generally performs better than other metaheuristics methods.
Discrete Riccati equation solutions: Distributed algorithms
Directory of Open Access Journals (Sweden)
D. G. Lainiotis
1996-01-01
Full Text Available In this paper new distributed algorithms for the solution of the discrete Riccati equation are introduced. The algorithms are used to provide robust and computational efficient solutions to the discrete Riccati equation. The proposed distributed algorithms are theoretically interesting and computationally attractive.
A Novel Clustering Algorithm Inspired by Membrane Computing
Directory of Open Access Journals (Sweden)
Hong Peng
2015-01-01
Full Text Available P systems are a class of distributed parallel computing models; this paper presents a novel clustering algorithm, which is inspired from mechanism of a tissue-like P system with a loop structure of cells, called membrane clustering algorithm. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior or competitive to k-means algorithm and several evolutionary clustering algorithms recently reported in the literature.
ERGC: an efficient referential genome compression algorithm.
Saha, Subrata; Rajasekaran, Sanguthevar
2015-11-01
Genome sequencing has become faster and more affordable. Consequently, the number of available complete genomic sequences is increasing rapidly. As a result, the cost to store, process, analyze and transmit the data is becoming a bottleneck for research and future medical applications. So, the need for devising efficient data compression and data reduction techniques for biological sequencing data is growing by the day. Although there exists a number of standard data compression algorithms, they are not efficient in compressing biological data. These generic algorithms do not exploit some inherent properties of the sequencing data while compressing. To exploit statistical and information-theoretic properties of genomic sequences, we need specialized compression algorithms. Five different next-generation sequencing data compression problems have been identified and studied in the literature. We propose a novel algorithm for one of these problems known as reference-based genome compression. We have done extensive experiments using five real sequencing datasets. The results on real genomes show that our proposed algorithm is indeed competitive and performs better than the best known algorithms for this problem. It achieves compression ratios that are better than those of the currently best performing algorithms. The time to compress and decompress the whole genome is also very promising. The implementations are freely available for non-commercial purposes. They can be downloaded from http://engr.uconn.edu/∼rajasek/ERGC.zip. rajasek@engr.uconn.edu. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Parallel grid generation algorithm for distributed memory computers
Moitra, Stuti; Moitra, Anutosh
1994-01-01
A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and implementation of multiple levELs of parallelism on multiple instruction multiple data machines are indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations bAsed on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.
Novel Intermode Prediction Algorithm for High Efficiency Video Coding Encoder
Directory of Open Access Journals (Sweden)
Chan-seob Park
2014-01-01
Full Text Available The joint collaborative team on video coding (JCT-VC is developing the next-generation video coding standard which is called high efficiency video coding (HEVC. In the HEVC, there are three units in block structure: coding unit (CU, prediction unit (PU, and transform unit (TU. The CU is the basic unit of region splitting like macroblock (MB. Each CU performs recursive splitting into four blocks with equal size, starting from the tree block. In this paper, we propose a fast CU depth decision algorithm for HEVC technology to reduce its computational complexity. In 2N×2N PU, the proposed method compares the rate-distortion (RD cost and determines the depth using the compared information. Moreover, in order to speed up the encoding time, the efficient merge SKIP detection method is developed additionally based on the contextual mode information of neighboring CUs. Experimental result shows that the proposed algorithm achieves the average time-saving factor of 44.84% in the random access (RA at Main profile configuration with the HEVC test model (HM 10.0 reference software. Compared to HM 10.0 encoder, a small BD-bitrate loss of 0.17% is also observed without significant loss of image quality.
Directory of Open Access Journals (Sweden)
A. Yu. Popov
2015-01-01
Full Text Available Bauman Moscow State Technical University is implementing a project to develop operating principles of computer system having radically new architecture. A developed working model of the system allowed us to evaluate an efficiency of developed hardware and software. The experimental results presented in previous studies, as well as the analysis of operating principles of new computer system permit to draw conclusions regarding its efficiency in solving discrete optimization problems related to processing of sets.The new architecture is based on a direct hardware support of operations of discrete mathematics, which is reflected in using the special facilities for processing of sets and data structures. Within the framework of the project a special device was designed, i.e. a structure processor (SP, which improved the performance, without limiting the scope of applications of such a computer system.The previous works presented the basic principles of the computational process organization in MISD (Multiple Instructions, Single Data system, showed the structure and features of the structure processor and the general principles to solve discrete optimization problems on graphs.This paper examines two search algorithms of the minimum spanning tree, namely Kruskal's and Prim's algorithms. It studies the implementations of algorithms for two SP operation modes: coprocessor mode and MISD one. The paper presents results of experimental comparison of MISD system performance in coprocessor mode with mainframes.
Efficient computation of clipped Voronoi diagram for mesh generation
Yan, Dongming
2013-04-01
The Voronoi diagram is a fundamental geometric structure widely used in various fields, especially in computer graphics and geometry computing. For a set of points in a compact domain (i.e. a bounded and closed 2D region or a 3D volume), some Voronoi cells of their Voronoi diagram are infinite or partially outside of the domain, but in practice only the parts of the cells inside the domain are needed, as when computing the centroidal Voronoi tessellation. Such a Voronoi diagram confined to a compact domain is called a clipped Voronoi diagram. We present an efficient algorithm to compute the clipped Voronoi diagram for a set of sites with respect to a compact 2D region or a 3D volume. We also apply the proposed method to optimal mesh generation based on the centroidal Voronoi tessellation. Crown Copyright © 2011 Published by Elsevier Ltd. All rights reserved.
Efficient computation of clipped Voronoi diagram for mesh generation
Yan, Dongming; Wang, Wen Ping; Lé vy, Bruno L.; Liu, Yang
2013-01-01
The Voronoi diagram is a fundamental geometric structure widely used in various fields, especially in computer graphics and geometry computing. For a set of points in a compact domain (i.e. a bounded and closed 2D region or a 3D volume), some Voronoi cells of their Voronoi diagram are infinite or partially outside of the domain, but in practice only the parts of the cells inside the domain are needed, as when computing the centroidal Voronoi tessellation. Such a Voronoi diagram confined to a compact domain is called a clipped Voronoi diagram. We present an efficient algorithm to compute the clipped Voronoi diagram for a set of sites with respect to a compact 2D region or a 3D volume. We also apply the proposed method to optimal mesh generation based on the centroidal Voronoi tessellation. Crown Copyright © 2011 Published by Elsevier Ltd. All rights reserved.
Directory of Open Access Journals (Sweden)
Dębski Roman
2016-06-01
Full Text Available A new dynamic programming based parallel algorithm adapted to on-board heterogeneous computers for simulation based trajectory optimization is studied in the context of “high-performance sailing”. The algorithm uses a new discrete space of continuously differentiable functions called the multi-splines as its search space representation. A basic version of the algorithm is presented in detail (pseudo-code, time and space complexity, search space auto-adaptation properties. Possible extensions of the basic algorithm are also described. The presented experimental results show that contemporary heterogeneous on-board computers can be effectively used for solving simulation based trajectory optimization problems. These computers can be considered micro high performance computing (HPC platforms-they offer high performance while remaining energy and cost efficient. The simulation based approach can potentially give highly accurate results since the mathematical model that the simulator is built upon may be as complex as required. The approach described is applicable to many trajectory optimization problems due to its black-box represented performance measure and use of OpenCL.
Yalavarthy, Phaneendra K; Lynch, Daniel R; Pogue, Brian W; Dehghani, Hamid; Paulsen, Keith D
2008-05-01
Three-dimensional (3D) diffuse optical tomography is known to be a nonlinear, ill-posed and sometimes under-determined problem, where regularization is added to the minimization to allow convergence to a unique solution. In this work, a generalized least-squares (GLS) minimization method was implemented, which employs weight matrices for both data-model misfit and optical properties to include their variances and covariances, using a computationally efficient scheme. This allows inversion of a matrix that is of a dimension dictated by the number of measurements, instead of by the number of imaging parameters. This increases the computation speed up to four times per iteration in most of the under-determined 3D imaging problems. An analytic derivation, using the Sherman-Morrison-Woodbury identity, is shown for this efficient alternative form and it is proven to be equivalent, not only analytically, but also numerically. Equivalent alternative forms for other minimization methods, like Levenberg-Marquardt (LM) and Tikhonov, are also derived. Three-dimensional reconstruction results indicate that the poor recovery of quantitatively accurate values in 3D optical images can also be a characteristic of the reconstruction algorithm, along with the target size. Interestingly, usage of GLS reconstruction methods reduces error in the periphery of the image, as expected, and improves by 20% the ability to quantify local interior regions in terms of the recovered optical contrast, as compared to LM methods. Characterization of detector photo-multiplier tubes noise has enabled the use of the GLS method for reconstructing experimental data and showed a promise for better quantification of target in 3D optical imaging. Use of these new alternative forms becomes effective when the ratio of the number of imaging property parameters exceeds the number of measurements by a factor greater than 2.
Efficient sampling algorithms for Monte Carlo based treatment planning
International Nuclear Information System (INIS)
DeMarco, J.J.; Solberg, T.D.; Chetty, I.; Smathers, J.B.
1998-01-01
Efficient sampling algorithms are necessary for producing a fast Monte Carlo based treatment planning code. This study evaluates several aspects of a photon-based tracking scheme and the effect of optimal sampling algorithms on the efficiency of the code. Four areas were tested: pseudo-random number generation, generalized sampling of a discrete distribution, sampling from the exponential distribution, and delta scattering as applied to photon transport through a heterogeneous simulation geometry. Generalized sampling of a discrete distribution using the cutpoint method can produce speedup gains of one order of magnitude versus conventional sequential sampling. Photon transport modifications based upon the delta scattering method were implemented and compared with a conventional boundary and collision checking algorithm. The delta scattering algorithm is faster by a factor of six versus the conventional algorithm for a boundary size of 5 mm within a heterogeneous geometry. A comparison of portable pseudo-random number algorithms and exponential sampling techniques is also discussed
Enabling high performance computational science through combinatorial algorithms
International Nuclear Information System (INIS)
Boman, Erik G; Bozdag, Doruk; Catalyurek, Umit V; Devine, Karen D; Gebremedhin, Assefaw H; Hovland, Paul D; Pothen, Alex; Strout, Michelle Mills
2007-01-01
The Combinatorial Scientific Computing and Petascale Simulations (CSCAPES) Institute is developing algorithms and software for combinatorial problems that play an enabling role in scientific and engineering computations. Discrete algorithms will be increasingly critical for achieving high performance for irregular problems on petascale architectures. This paper describes recent contributions by researchers at the CSCAPES Institute in the areas of load balancing, parallel graph coloring, performance improvement, and parallel automatic differentiation
Enabling high performance computational science through combinatorial algorithms
Energy Technology Data Exchange (ETDEWEB)
Boman, Erik G [Discrete Algorithms and Math Department, Sandia National Laboratories (United States); Bozdag, Doruk [Biomedical Informatics, and Electrical and Computer Engineering, Ohio State University (United States); Catalyurek, Umit V [Biomedical Informatics, and Electrical and Computer Engineering, Ohio State University (United States); Devine, Karen D [Discrete Algorithms and Math Department, Sandia National Laboratories (United States); Gebremedhin, Assefaw H [Computer Science and Center for Computational Science, Old Dominion University (United States); Hovland, Paul D [Mathematics and Computer Science Division, Argonne National Laboratory (United States); Pothen, Alex [Computer Science and Center for Computational Science, Old Dominion University (United States); Strout, Michelle Mills [Computer Science, Colorado State University (United States)
2007-07-15
The Combinatorial Scientific Computing and Petascale Simulations (CSCAPES) Institute is developing algorithms and software for combinatorial problems that play an enabling role in scientific and engineering computations. Discrete algorithms will be increasingly critical for achieving high performance for irregular problems on petascale architectures. This paper describes recent contributions by researchers at the CSCAPES Institute in the areas of load balancing, parallel graph coloring, performance improvement, and parallel automatic differentiation.
Energy Technology Data Exchange (ETDEWEB)
Jimenez, Edward S. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Orr, Laurel J. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thompson, Kyle R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2013-09-01
The goal of this work is to develop a fast computed tomography (CT) reconstruction algorithm based on graphics processing units (GPU) that achieves significant improvement over traditional central processing unit (CPU) based implementations. The main challenge in developing a CT algorithm that is capable of handling very large datasets is parallelizing the algorithm in such a way that data transfer does not hinder performance of the reconstruction algorithm. General Purpose Graphics Processing (GPGPU) is a new technology that the Science and Technology (S&T) community is starting to adopt in many fields where CPU-based computing is the norm. GPGPU programming requires a new approach to algorithm development that utilizes massively multi-threaded environments. Multi-threaded algorithms in general are difficult to optimize since performance bottlenecks occur that are non-existent in single-threaded algorithms such as memory latencies. If an efficient GPU-based CT reconstruction algorithm can be developed; computational times could be improved by a factor of 20. Additionally, cost benefits will be realized as commodity graphics hardware could potentially replace expensive supercomputers and high-end workstations. This project will take advantage of the CUDA programming environment and attempt to parallelize the task in such a way that multiple slices of the reconstruction volume are computed simultaneously. This work will also take advantage of the GPU memory by utilizing asynchronous memory transfers, GPU texture memory, and (when possible) pinned host memory so that the memory transfer bottleneck inherent to GPGPU is amortized. Additionally, this work will take advantage of GPU-specific hardware (i.e. fast texture memory, pixel-pipelines, hardware interpolators, and varying memory hierarchy) that will allow for additional performance improvements.
A Faster Algorithm for Computing Straight Skeletons
Mencel, Liam A.
2014-01-01
computation in O(n (log n) log r) time. It improves on the previously best known algorithm for this reduction, which is randomised, and runs in expected O(n √(h+1) log² n) time for a polygon with h holes. Using known motorcycle graph algorithms, our result
Enhanced Particle Swarm Optimization Algorithm: Efficient Training of ReaxFF Reactive Force Fields.
Furman, David; Carmeli, Benny; Zeiri, Yehuda; Kosloff, Ronnie
2018-05-04
Particle swarm optimization is a powerful metaheuristic population-based global optimization algorithm. However, when applied to non-separable objective functions its performance on multimodal landscapes is significantly degraded. Here we show that a significant improvement in the search quality and efficiency on multimodal functions can be achieved by enhancing the basic rotation-invariant particle swarm optimization algorithm with isotropic Gaussian mutation operators. The new algorithm demonstrates a superior performance across several nonlinear, multimodal benchmark functions compared to the rotation-invariant Particle Swam Optimization (PSO) algorithm and the well-established simulated annealing and sequential one-parameter parabolic interpolation methods. A search for the optimal set of parameters for the dispersion interaction model in ReaxFF-lg reactive force field is carried out with respect to accurate DFT-TS calculations. The resulting optimized force field accurately describes the equations of state of several high-energy molecular crystals where such interactions are of crucial importance. The improved algorithm also presents a better performance compared to a Genetic Algorithm optimization method in the optimization of a ReaxFF-lg correction model parameters. The computational framework is implemented in a standalone C++ code that allows a straightforward development of ReaxFF reactive force fields.
An efficient CU partition algorithm for HEVC based on improved Sobel operator
Sun, Xuebin; Chen, Xiaodong; Xu, Yong; Sun, Gang; Yang, Yunsheng
2018-04-01
As the latest video coding standard, High Efficiency Video Coding (HEVC) achieves over 50% bit rate reduction with similar video quality compared with previous standards H.264/AVC. However, the higher compression efficiency is attained at the cost of significantly increasing computational load. In order to reduce the complexity, this paper proposes a fast coding unit (CU) partition technique to speed up the process. To detect the edge features of each CU, a more accurate improved Sobel filtering is developed and performed By analyzing the textural features of CU, an early CU splitting termination is proposed to decide whether a CU should be decomposed into four lower-dimensions CUs or not. Compared with the reference software HM16.7, experimental results indicate the proposed algorithm can lessen the encoding time up to 44.09% on average, with a negligible bit rate increase of 0.24%, and quality losses lower 0.03 dB, respectively. In addition, the proposed algorithm gets a better trade-off between complexity and rate-distortion among the other proposed works.
Rodríguez, Manuel; Magdaleno, Eduardo; Pérez, Fernando; García, Cristhian
2017-03-28
Non-equispaced Fast Fourier transform (NFFT) is a very important algorithm in several technological and scientific areas such as synthetic aperture radar, computational photography, medical imaging, telecommunications, seismic analysis and so on. However, its computation complexity is high. In this paper, we describe an efficient NFFT implementation with a hardware coprocessor using an All-Programmable System-on-Chip (APSoC). This is a hybrid device that employs an Advanced RISC Machine (ARM) as Processing System with Programmable Logic for high-performance digital signal processing through parallelism and pipeline techniques. The algorithm has been coded in C language with pragma directives to optimize the architecture of the system. We have used the very novel Software Develop System-on-Chip (SDSoC) evelopment tool that simplifies the interface and partitioning between hardware and software. This provides shorter development cycles and iterative improvements by exploring several architectures of the global system. The computational results shows that hardware acceleration significantly outperformed the software based implementation.
Computational algorithms for simulations in atmospheric optics.
Konyaev, P A; Lukin, V P
2016-04-20
A computer simulation technique for atmospheric and adaptive optics based on parallel programing is discussed. A parallel propagation algorithm is designed and a modified spectral-phase method for computer generation of 2D time-variant random fields is developed. Temporal power spectra of Laguerre-Gaussian beam fluctuations are considered as an example to illustrate the applications discussed. Implementation of the proposed algorithms using Intel MKL and IPP libraries and NVIDIA CUDA technology is shown to be very fast and accurate. The hardware system for the computer simulation is an off-the-shelf desktop with an Intel Core i7-4790K CPU operating at a turbo-speed frequency up to 5 GHz and an NVIDIA GeForce GTX-960 graphics accelerator with 1024 1.5 GHz processors.
Efficient convolutional sparse coding
Wohlberg, Brendt
2017-06-20
Computationally efficient algorithms may be applied for fast dictionary learning solving the convolutional sparse coding problem in the Fourier domain. More specifically, efficient convolutional sparse coding may be derived within an alternating direction method of multipliers (ADMM) framework that utilizes fast Fourier transforms (FFT) to solve the main linear system in the frequency domain. Such algorithms may enable a significant reduction in computational cost over conventional approaches by implementing a linear solver for the most critical and computationally expensive component of the conventional iterative algorithm. The theoretical computational cost of the algorithm may be reduced from O(M.sup.3N) to O(MN log N), where N is the dimensionality of the data and M is the number of elements in the dictionary. This significant improvement in efficiency may greatly increase the range of problems that can practically be addressed via convolutional sparse representations.
Fast algorithm for computing complex number-theoretic transforms
Reed, I. S.; Liu, K. Y.; Truong, T. K.
1977-01-01
A high-radix FFT algorithm for computing transforms over FFT, where q is a Mersenne prime, is developed to implement fast circular convolutions. This new algorithm requires substantially fewer multiplications than the conventional FFT.
A fast algorithm for sparse matrix computations related to inversion
International Nuclear Information System (INIS)
Li, S.; Wu, W.; Darve, E.
2013-01-01
We have developed a fast algorithm for computing certain entries of the inverse of a sparse matrix. Such computations are critical to many applications, such as the calculation of non-equilibrium Green’s functions G r and G for nano-devices. The FIND (Fast Inverse using Nested Dissection) algorithm is optimal in the big-O sense. However, in practice, FIND suffers from two problems due to the width-2 separators used by its partitioning scheme. One problem is the presence of a large constant factor in the computational cost of FIND. The other problem is that the partitioning scheme used by FIND is incompatible with most existing partitioning methods and libraries for nested dissection, which all use width-1 separators. Our new algorithm resolves these problems by thoroughly decomposing the computation process such that width-1 separators can be used, resulting in a significant speedup over FIND for realistic devices — up to twelve-fold in simulation. The new algorithm also has the added advantage that desired off-diagonal entries can be computed for free. Consequently, our algorithm is faster than the current state-of-the-art recursive methods for meshes of any size. Furthermore, the framework used in the analysis of our algorithm is the first attempt to explicitly apply the widely-used relationship between mesh nodes and matrix computations to the problem of multiple eliminations with reuse of intermediate results. This framework makes our algorithm easier to generalize, and also easier to compare against other methods related to elimination trees. Finally, our accuracy analysis shows that the algorithms that require back-substitution are subject to significant extra round-off errors, which become extremely large even for some well-conditioned matrices or matrices with only moderately large condition numbers. When compared to these back-substitution algorithms, our algorithm is generally a few orders of magnitude more accurate, and our produced round-off errors
A fast algorithm for sparse matrix computations related to inversion
Energy Technology Data Exchange (ETDEWEB)
Li, S., E-mail: lisong@stanford.edu [Institute for Computational and Mathematical Engineering, Stanford University, 496 Lomita Mall, Durand Building, Stanford, CA 94305 (United States); Wu, W. [Department of Electrical Engineering, Stanford University, 350 Serra Mall, Packard Building, Room 268, Stanford, CA 94305 (United States); Darve, E. [Institute for Computational and Mathematical Engineering, Stanford University, 496 Lomita Mall, Durand Building, Stanford, CA 94305 (United States); Department of Mechanical Engineering, Stanford University, 496 Lomita Mall, Durand Building, Room 209, Stanford, CA 94305 (United States)
2013-06-01
We have developed a fast algorithm for computing certain entries of the inverse of a sparse matrix. Such computations are critical to many applications, such as the calculation of non-equilibrium Green’s functions G{sup r} and G{sup <} for nano-devices. The FIND (Fast Inverse using Nested Dissection) algorithm is optimal in the big-O sense. However, in practice, FIND suffers from two problems due to the width-2 separators used by its partitioning scheme. One problem is the presence of a large constant factor in the computational cost of FIND. The other problem is that the partitioning scheme used by FIND is incompatible with most existing partitioning methods and libraries for nested dissection, which all use width-1 separators. Our new algorithm resolves these problems by thoroughly decomposing the computation process such that width-1 separators can be used, resulting in a significant speedup over FIND for realistic devices — up to twelve-fold in simulation. The new algorithm also has the added advantage that desired off-diagonal entries can be computed for free. Consequently, our algorithm is faster than the current state-of-the-art recursive methods for meshes of any size. Furthermore, the framework used in the analysis of our algorithm is the first attempt to explicitly apply the widely-used relationship between mesh nodes and matrix computations to the problem of multiple eliminations with reuse of intermediate results. This framework makes our algorithm easier to generalize, and also easier to compare against other methods related to elimination trees. Finally, our accuracy analysis shows that the algorithms that require back-substitution are subject to significant extra round-off errors, which become extremely large even for some well-conditioned matrices or matrices with only moderately large condition numbers. When compared to these back-substitution algorithms, our algorithm is generally a few orders of magnitude more accurate, and our produced round
An efficient algorithm for generating random number pairs drawn from a bivariate normal distribution
Campbell, C. W.
1983-01-01
An efficient algorithm for generating random number pairs from a bivariate normal distribution was developed. Any desired value of the two means, two standard deviations, and correlation coefficient can be selected. Theoretically the technique is exact and in practice its accuracy is limited only by the quality of the uniform distribution random number generator, inaccuracies in computer function evaluation, and arithmetic. A FORTRAN routine was written to check the algorithm and good accuracy was obtained. Some small errors in the correlation coefficient were observed to vary in a surprisingly regular manner. A simple model was developed which explained the qualities aspects of the errors.
Graphics processor efficiency for realization of rapid tabular computations
International Nuclear Information System (INIS)
Dudnik, V.A.; Kudryavtsev, V.I.; Us, S.A.; Shestakov, M.V.
2016-01-01
Capabilities of graphics processing units (GPU) and central processing units (CPU) have been investigated for realization of fast-calculation algorithms with the use of tabulated functions. The realization of tabulated functions is exemplified by the GPU/CPU architecture-based processors. Comparison is made between the operating efficiencies of GPU and CPU, employed for tabular calculations at different conditions of use. Recommendations are formulated for the use of graphical and central processors to speed up scientific and engineering computations through the use of tabulated functions
An energy-efficient failure detector for vehicular cloud computing.
Liu, Jiaxi; Wu, Zhibo; Dong, Jian; Wu, Jin; Wen, Dongxin
2018-01-01
Failure detectors are one of the fundamental components for maintaining the high availability of vehicular cloud computing. In vehicular cloud computing, lots of RSUs are deployed along the road to improve the connectivity. Many of them are equipped with solar battery due to the unavailability or excess expense of wired electrical power. So it is important to reduce the battery consumption of RSU. However, the existing failure detection algorithms are not designed to save battery consumption RSU. To solve this problem, a new energy-efficient failure detector 2E-FD has been proposed specifically for vehicular cloud computing. 2E-FD does not only provide acceptable failure detection service, but also saves the battery consumption of RSU. Through the comparative experiments, the results show that our failure detector has better performance in terms of speed, accuracy and battery consumption.
Pal, Partha S; Kar, R; Mandal, D; Ghoshal, S P
2015-11-01
This paper presents an efficient approach to identify different stable and practically useful Hammerstein models as well as unstable nonlinear process along with its stable closed loop counterpart with the help of an evolutionary algorithm as Colliding Bodies Optimization (CBO) optimization algorithm. The performance measures of the CBO based optimization approach such as precision, accuracy are justified with the minimum output mean square value (MSE) which signifies that the amount of bias and variance in the output domain are also the least. It is also observed that the optimization of output MSE in the presence of outliers has resulted in a very close estimation of the output parameters consistently, which also justifies the effective general applicability of the CBO algorithm towards the system identification problem and also establishes the practical usefulness of the applied approach. Optimum values of the MSEs, computational times and statistical information of the MSEs are all found to be the superior as compared with those of the other existing similar types of stochastic algorithms based approaches reported in different recent literature, which establish the robustness and efficiency of the applied CBO based identification scheme. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Nakayama, Hiromasa
2006-01-01
We give an algorithm to compute the local $b$ function. In this algorithm, we use the Mora division algorithm in the ring of differential operators and an approximate division algorithm in the ring of differential operators with power series coefficient.
Parallel algorithms for mapping pipelined and parallel computations
Nicol, David M.
1988-01-01
Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
Computer and machine vision theory, algorithms, practicalities
Davies, E R
2012-01-01
Computer and Machine Vision: Theory, Algorithms, Practicalities (previously entitled Machine Vision) clearly and systematically presents the basic methodology of computer and machine vision, covering the essential elements of the theory while emphasizing algorithmic and practical design constraints. This fully revised fourth edition has brought in more of the concepts and applications of computer vision, making it a very comprehensive and up-to-date tutorial text suitable for graduate students, researchers and R&D engineers working in this vibrant subject. Key features include: Practical examples and case studies give the 'ins and outs' of developing real-world vision systems, giving engineers the realities of implementing the principles in practice New chapters containing case studies on surveillance and driver assistance systems give practical methods on these cutting-edge applications in computer vision Necessary mathematics and essential theory are made approachable by careful explanations and well-il...
An algorithm to improve sampling efficiency for uncertainty propagation using sampling based method
International Nuclear Information System (INIS)
Campolina, Daniel; Lima, Paulo Rubens I.; Pereira, Claubia; Veloso, Maria Auxiliadora F.
2015-01-01
Sample size and computational uncertainty were varied in order to investigate sample efficiency and convergence of the sampling based method for uncertainty propagation. Transport code MCNPX was used to simulate a LWR model and allow the mapping, from uncertain inputs of the benchmark experiment, to uncertain outputs. Random sampling efficiency was improved through the use of an algorithm for selecting distributions. Mean range, standard deviation range and skewness were verified in order to obtain a better representation of uncertainty figures. Standard deviation of 5 pcm in the propagated uncertainties for 10 n-samples replicates was adopted as convergence criterion to the method. Estimation of 75 pcm uncertainty on reactor k eff was accomplished by using sample of size 93 and computational uncertainty of 28 pcm to propagate 1σ uncertainty of burnable poison radius. For a fixed computational time, in order to reduce the variance of the uncertainty propagated, it was found, for the example under investigation, it is preferable double the sample size than double the amount of particles followed by Monte Carlo process in MCNPX code. (author)
An Efficient Topology-Based Algorithm for Transient Analysis of Power Grid
Yang, Lan
2015-08-10
In the design flow of integrated circuits, chip-level verification is an important step that sanity checks the performance is as expected. Power grid verification is one of the most expensive and time-consuming steps of chip-level verification, due to its extremely large size. Efficient power grid analysis technology is highly demanded as it saves computing resources and enables faster iteration. In this paper, a topology-base power grid transient analysis algorithm is proposed. Nodal analysis is adopted to analyze the topology which is mathematically equivalent to iteratively solving a positive semi-definite linear equation. The convergence of the method is proved.
Multiway simple cycle separators and I/O-efficient algorithms for planar graphs
DEFF Research Database (Denmark)
Arge, L.; Walderveen, Freek van; Zeh, Norbert
2013-01-01
memory, where sort(N) is the number of I/Os needed to sort N items in external memory. The key, and the main technical contribution of this paper, is a multiway version of Miller's simple cycle separator theorem. We show how to compute these separators in linear time in internal memory, and using O...... in internal memory, thereby completely negating the performance gain achieved by minimizing the number of disk accesses. In this paper, we show how to make these algorithms simultaneously efficient in internal and external memory so they achieve I/O complexity O(sort(N)) and take O(N log N) time in internal......(sort(N)) I/Os and O(N log N) (internal-memory computation) time in external memory....
NDPA: A generalized efficient parallel in-place N-Dimensional Permutation Algorithm
Directory of Open Access Journals (Sweden)
Muhammad Elsayed Ali
2015-09-01
Full Text Available N-dimensional transpose/permutation is a very important operation in many large-scale data intensive and scientific applications. These applications include but not limited to oil industry i.e. seismic data processing, nuclear medicine, media production, digital signal processing and business intelligence. This paper proposes an efficient in-place N-dimensional permutation algorithm. The algorithm is based on a novel 3D transpose algorithm that was published recently. The proposed algorithm has been tested on 3D, 4D, 5D, 6D and 7D data sets as a proof of concept. This is the first contribution which is breaking the dimensions’ limitation of the base algorithm. The suggested algorithm exploits the idea of mixing both logical and physical permutations together. In the logical permutation, the address map is transposed for each data unit access. In the physical permutation, actual data elements are swapped. Both permutation levels exploit the fast on-chip memory bandwidth by transferring large amount of data and allowing for fine-grain SIMD (Single Instruction, Multiple Data operations. Thus, the performance is improved as evident from the experimental results section. The algorithm is implemented on NVidia GeForce GTS 250 GPU (Graphics Processing Unit containing 128 cores. The rapid increase in GPUs performance coupled with the recent and continuous improvements in its programmability proved that GPUs are the right choice for computationally demanding tasks. The use of GPUs is the second contribution which reflects how strongly they fit for high performance tasks. The third contribution is improving the proposed algorithm performance to its peak as discussed in the results section.
Efficient 3D geometric and Zernike moments computation from unstructured surface meshes.
Pozo, José María; Villa-Uriol, Maria-Cruz; Frangi, Alejandro F
2011-03-01
This paper introduces and evaluates a fast exact algorithm and a series of faster approximate algorithms for the computation of 3D geometric moments from an unstructured surface mesh of triangles. Being based on the object surface reduces the computational complexity of these algorithms with respect to volumetric grid-based algorithms. In contrast, it can only be applied for the computation of geometric moments of homogeneous objects. This advantage and restriction is shared with other proposed algorithms based on the object boundary. The proposed exact algorithm reduces the computational complexity for computing geometric moments up to order N with respect to previously proposed exact algorithms, from N(9) to N(6). The approximate series algorithm appears as a power series on the rate between triangle size and object size, which can be truncated at any desired degree. The higher the number and quality of the triangles, the better the approximation. This approximate algorithm reduces the computational complexity to N(3). In addition, the paper introduces a fast algorithm for the computation of 3D Zernike moments from the computed geometric moments, with a computational complexity N(4), while the previously proposed algorithm is of order N(6). The error introduced by the proposed approximate algorithms is evaluated in different shapes and the cost-benefit ratio in terms of error, and computational time is analyzed for different moment orders.
FPGA Implementation of Computer Vision Algorithm
Zhou, Zhonghua
2014-01-01
Computer vision algorithms, which play an significant role in vision processing, is widely applied in many aspects such as geology survey, traffic management and medical care, etc.. Most of the situations require the process to be real-timed, in other words, as fast as possible. Field Programmable Gate Arrays (FPGAs) have a advantage of parallelism fabric in programming, comparing to the serial communications of CPUs, which makes FPGA a perfect platform for implementing vision algorithms. The...
Efficient On-the-fly Algorithms for the Analysis of Timed Games
DEFF Research Database (Denmark)
Cassez, Franck; David, Alexandre; Fleury, Emmanuel
2005-01-01
In this paper, we propose the first efficient on-the-fly algorithm for solving games based on timed game automata with respect to reachability and safety properties The algorithm we propose is a symbolic extension of the on-the-fly algorithm suggested by Liu & Smolka [15] for linear-time model-ch...... symbolic algorithm are proposed as well as methods for obtaining time-optimal winning strategies (for reachability games). Extensive evaluation of an experimental implementation of the algorithm yields very encouraging performance results.......In this paper, we propose the first efficient on-the-fly algorithm for solving games based on timed game automata with respect to reachability and safety properties The algorithm we propose is a symbolic extension of the on-the-fly algorithm suggested by Liu & Smolka [15] for linear-time model...
Efficient Fourier-based algorithms for time-periodic unsteady problems
Gopinath, Arathi Kamath
2007-12-01
This dissertation work proposes two algorithms for the simulation of time-periodic unsteady problems via the solution of Unsteady Reynolds-Averaged Navier-Stokes (URANS) equations. These algorithms use a Fourier representation in time and hence solve for the periodic state directly without resolving transients (which consume most of the resources in a time-accurate scheme). In contrast to conventional Fourier-based techniques which solve the governing equations in frequency space, the new algorithms perform all the calculations in the time domain, and hence require minimal modifications to an existing solver. The complete space-time solution is obtained by iterating in a fifth pseudo-time dimension. Various time-periodic problems such as helicopter rotors, wind turbines, turbomachinery and flapping-wings can be simulated using the Time Spectral method. The algorithm is first validated using pitching airfoil/wing test cases. The method is further extended to turbomachinery problems, and computational results verified by comparison with a time-accurate calculation. The technique can be very memory intensive for large problems, since the solution is computed (and hence stored) simultaneously at all time levels. Often, the blade counts of a turbomachine are rescaled such that a periodic fraction of the annulus can be solved. This approximation enables the solution to be obtained at a fraction of the cost of a full-scale time-accurate solution. For a viscous computation over a three-dimensional single-stage rescaled compressor, an order of magnitude savings is achieved. The second algorithm, the reduced-order Harmonic Balance method is applicable only to turbomachinery flows, and offers even larger computational savings than the Time Spectral method. It simulates the true geometry of the turbomachine using only one blade passage per blade row as the computational domain. In each blade row of the turbomachine, only the dominant frequencies are resolved, namely
Associative Algorithms for Computational Creativity
Varshney, Lav R.; Wang, Jun; Varshney, Kush R.
2016-01-01
Computational creativity, the generation of new, unimagined ideas or artifacts by a machine that are deemed creative by people, can be applied in the culinary domain to create novel and flavorful dishes. In fact, we have done so successfully using a combinatorial algorithm for recipe generation combined with statistical models for recipe ranking…
International Nuclear Information System (INIS)
Kalkreuter, T.; Simma, H.
1995-07-01
The low-lying eigenvalues of a (sparse) hermitian matrix can be computed with controlled numerical errors by a conjugate gradient (CG) method. This CG algorithm is accelerated by alternating it with exact diagonalizations in the subspace spanned by the numerically computed eigenvectors. We study this combined algorithm in case of the Dirac operator with (dynamical) Wilson fermions in four-dimensional SU(2) gauge fields. The algorithm is numerically very stable and can be parallelized in an efficient way. On lattices of sizes 4 4 - 16 4 an acceleration of the pure CG method by a factor of 4 - 8 is found. (orig.)
Conjugate Gradient Algorithms For Manipulator Simulation
Fijany, Amir; Scheid, Robert E.
1991-01-01
Report discusses applicability of conjugate-gradient algorithms to computation of forward dynamics of robotic manipulators. Rapid computation of forward dynamics essential to teleoperation and other advanced robotic applications. Part of continuing effort to find algorithms meeting requirements for increased computational efficiency and speed. Method used for iterative solution of systems of linear equations.
Ferraro Petrillo, Umberto; Roscigno, Gianluca; Cattaneo, Giuseppe; Giancarlo, Raffaele
2018-06-01
Information theoretic and compositional/linguistic analysis of genomes have a central role in bioinformatics, even more so since the associated methodologies are becoming very valuable also for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data available now in applications demands to resort to parallel and distributed computing. Indeed, those type of algorithms have been developed to collect k-mer statistics in the realm of genome assembly. However, they are so specialized to this domain that they do not extend easily to the computation of informational and linguistic indices, concurrently on sets of genomes. Following the well-established approach in many disciplines, and with a growing success also in bioinformatics, to resort to MapReduce and Hadoop to deal with 'Big Data' problems, we present KCH, the first set of MapReduce algorithms able to perform concurrently informational and linguistic analysis of large collections of genomic sequences on a Hadoop cluster. The benchmarking of KCH that we provide indicates that it is quite effective and versatile. It is also competitive with respect to the parallel and distributed algorithms highly specialized to k-mer statistics collection for genome assembly problems. In conclusion, KCH is a much needed addition to the growing number of algorithms and tools that use MapReduce for bioinformatics core applications. The software, including instructions for running it over Amazon AWS, as well as the datasets are available at http://www.di-srv.unisa.it/KCH. umberto.ferraro@uniroma1.it. Supplementary data are available at Bioinformatics online.
An Efficiency Analysis of Augmented Reality Marker Recognition Algorithm
Directory of Open Access Journals (Sweden)
Kurpytė Dovilė
2014-05-01
Full Text Available The article reports on the investigation of augmented reality system which is designed for identification and augmentation of 100 different square markers. Marker recognition efficiency was investigated by rotating markers along x and y axis directions in range from −90° to 90°. Virtual simulations of four environments were developed: a an intense source of light, b an intense source of light falling from the left side, c the non-intensive light source falling from the left side, d equally falling shadows. The graphics were created using the OpenGL graphics computer hardware interface; image processing was programmed in C++ language using OpenCV, while augmented reality was developed in Java programming language using NyARToolKit. The obtained results demonstrate that augmented reality marker recognition algorithm is accurate and reliable in the case of changing lighting conditions and rotational angles - only 4 % markers were unidentified. Assessment of marker recognition efficiency let to propose marker classification strategy in order to use it for grouping various markers into distinct markers’ groups possessing similar recognition properties.
Fast fourier algorithms in spectral computation and analysis of vibrating machines
International Nuclear Information System (INIS)
Farooq, U.; Hafeez, T.; Khan, M.Z.; Amir, M.
2001-01-01
In this work we have discussed Fourier and its history series, relationships among various Fourier mappings, Fourier coefficients, transforms, inverse transforms, integrals, analyses, discrete and fast algorithms for data processing and analysis of vibrating systems. The evaluation of magnitude of the source signal at transmission time, related coefficient matrix, intensity, and magnitude at the receiving end (stations). Matrix computation of Fourier transform has been explained, and applications are presented. The fast Fourier transforms, new computational scheme. have been tested with an example. The work also includes digital programs for obtaining the frequency contents of time function. It has been explained that how the fast Fourier algorithms (FFT) has decreased computational work by several order of magnitudes and split the spectrum of a signal into two (even and odd modes) at every successive step. That fast quantitative processing for discrete Fourier transforms' computations as well as signal splitting and combination provides an efficient. and reliable tool for spectral analyses. Fourier series decompose the given variable into a sum of oscillatory functions each having a specific frequency. These frequencies, with their corresponding amplitude and phase angles, constitute the frequency contents of the original time functions. These fast processing achievements, signals decomposition and combination may be carried out by the principle of superposition and convolution for, even, signals of different frequencies. Considerable information about a machine or a structure can be derived from variable speed and frequency tests. (author)
An improved algorithm for connectivity analysis of distribution networks
International Nuclear Information System (INIS)
Kansal, M.L.; Devi, Sunita
2007-01-01
In the present paper, an efficient algorithm for connectivity analysis of moderately sized distribution networks has been suggested. Algorithm is based on generation of all possible minimal system cutsets. The algorithm is efficient as it identifies only the necessary and sufficient conditions of system failure conditions in n-out-of-n type of distribution networks. The proposed algorithm is demonstrated with the help of saturated and unsaturated distribution networks. The computational efficiency of the algorithm is justified by comparing the computational efforts with the previously suggested appended spanning tree (AST) algorithm. The proposed technique has the added advantage as it can be utilized for generation of system inequalities which is useful in reliability estimation of capacitated networks
Fault-tolerant search algorithms reliable computation with unreliable information
Cicalese, Ferdinando
2013-01-01
Why a book on fault-tolerant search algorithms? Searching is one of the fundamental problems in computer science. Time and again algorithmic and combinatorial issues originally studied in the context of search find application in the most diverse areas of computer science and discrete mathematics. On the other hand, fault-tolerance is a necessary ingredient of computing. Due to their inherent complexity, information systems are naturally prone to errors, which may appear at any level - as imprecisions in the data, bugs in the software, or transient or permanent hardware failures. This book pr
Realization of seven-qubit Deutsch-Jozsa algorithm on NMR quantum computer
International Nuclear Information System (INIS)
Wei Daxiu; Yang Xiaodong; Luo Jun; Sun Xianping; Zeng Xizhi; Liu Maili; Ding Shangwu
2002-01-01
Recent years, remarkable progresses in experimental realization of quantum information have been made, especially based on nuclear magnetic resonance (NMR) theory. In all quantum algorithms, Deutsch-Jozsa algorithm has been widely studied. It can be realized on NMR quantum computer and also can be simplified by using the Cirac's scheme. At first the principle of Deutsch-Jozsa quantum algorithm is analyzed, then the authors implement the seven-qubit Deutsch-Jozsa algorithm on NMR quantum computer
Simoncini, Valeria
2016-01-01
Focusing on special matrices and matrices which are in some sense "near" to structured matrices, this volume covers a broad range of topics of current interest in numerical linear algebra. Exploitation of these less obvious structural properties can be of great importance in the design of efficient numerical methods, for example algorithms for matrices with low-rank block structure, matrices with decay, and structured tensor computations. Applications range from quantum chemistry to queuing theory. Structured matrices arise frequently in applications. Examples include banded and sparse matrices, Toeplitz-type matrices, and matrices with semi-separable or quasi-separable structure, as well as Hamiltonian and symplectic matrices. The associated literature is enormous, and many efficient algorithms have been developed for solving problems involving such matrices. The text arose from a C.I.M.E. course held in Cetraro (Italy) in June 2015 which aimed to present this fast growing field to young researchers, exploit...
An algorithm of computing inhomogeneous differential equations for definite integrals
Nakayama, Hiromasa; Nishiyama, Kenta
2010-01-01
We give an algorithm to compute inhomogeneous differential equations for definite integrals with parameters. The algorithm is based on the integration algorithm for $D$-modules by Oaku. Main tool in the algorithm is the Gr\\"obner basis method in the ring of differential operators.
Efficient algorithms and implementations of entropy-based moment closures for rarefied gases
International Nuclear Information System (INIS)
Schaerer, Roman Pascal; Bansal, Pratyuksh; Torrilhon, Manuel
2017-01-01
We present efficient algorithms and implementations of the 35-moment system equipped with the maximum-entropy closure in the context of rarefied gases. While closures based on the principle of entropy maximization have been shown to yield very promising results for moderately rarefied gas flows, the computational cost of these closures is in general much higher than for closure theories with explicit closed-form expressions of the closing fluxes, such as Grad's classical closure. Following a similar approach as Garrett et al. (2015) , we investigate efficient implementations of the computationally expensive numerical quadrature method used for the moment evaluations of the maximum-entropy distribution by exploiting its inherent fine-grained parallelism with the parallelism offered by multi-core processors and graphics cards. We show that using a single graphics card as an accelerator allows speed-ups of two orders of magnitude when compared to a serial CPU implementation. To accelerate the time-to-solution for steady-state problems, we propose a new semi-implicit time discretization scheme. The resulting nonlinear system of equations is solved with a Newton type method in the Lagrange multipliers of the dual optimization problem in order to reduce the computational cost. Additionally, fully explicit time-stepping schemes of first and second order accuracy are presented. We investigate the accuracy and efficiency of the numerical schemes for several numerical test cases, including a steady-state shock-structure problem.
Efficient algorithms and implementations of entropy-based moment closures for rarefied gases
Schaerer, Roman Pascal; Bansal, Pratyuksh; Torrilhon, Manuel
2017-07-01
We present efficient algorithms and implementations of the 35-moment system equipped with the maximum-entropy closure in the context of rarefied gases. While closures based on the principle of entropy maximization have been shown to yield very promising results for moderately rarefied gas flows, the computational cost of these closures is in general much higher than for closure theories with explicit closed-form expressions of the closing fluxes, such as Grad's classical closure. Following a similar approach as Garrett et al. (2015) [13], we investigate efficient implementations of the computationally expensive numerical quadrature method used for the moment evaluations of the maximum-entropy distribution by exploiting its inherent fine-grained parallelism with the parallelism offered by multi-core processors and graphics cards. We show that using a single graphics card as an accelerator allows speed-ups of two orders of magnitude when compared to a serial CPU implementation. To accelerate the time-to-solution for steady-state problems, we propose a new semi-implicit time discretization scheme. The resulting nonlinear system of equations is solved with a Newton type method in the Lagrange multipliers of the dual optimization problem in order to reduce the computational cost. Additionally, fully explicit time-stepping schemes of first and second order accuracy are presented. We investigate the accuracy and efficiency of the numerical schemes for several numerical test cases, including a steady-state shock-structure problem.
Efficient algorithms and implementations of entropy-based moment closures for rarefied gases
Energy Technology Data Exchange (ETDEWEB)
Schaerer, Roman Pascal, E-mail: schaerer@mathcces.rwth-aachen.de; Bansal, Pratyuksh; Torrilhon, Manuel
2017-07-01
We present efficient algorithms and implementations of the 35-moment system equipped with the maximum-entropy closure in the context of rarefied gases. While closures based on the principle of entropy maximization have been shown to yield very promising results for moderately rarefied gas flows, the computational cost of these closures is in general much higher than for closure theories with explicit closed-form expressions of the closing fluxes, such as Grad's classical closure. Following a similar approach as Garrett et al. (2015) , we investigate efficient implementations of the computationally expensive numerical quadrature method used for the moment evaluations of the maximum-entropy distribution by exploiting its inherent fine-grained parallelism with the parallelism offered by multi-core processors and graphics cards. We show that using a single graphics card as an accelerator allows speed-ups of two orders of magnitude when compared to a serial CPU implementation. To accelerate the time-to-solution for steady-state problems, we propose a new semi-implicit time discretization scheme. The resulting nonlinear system of equations is solved with a Newton type method in the Lagrange multipliers of the dual optimization problem in order to reduce the computational cost. Additionally, fully explicit time-stepping schemes of first and second order accuracy are presented. We investigate the accuracy and efficiency of the numerical schemes for several numerical test cases, including a steady-state shock-structure problem.
Cloud computing task scheduling strategy based on improved differential evolution algorithm
Ge, Junwei; He, Qian; Fang, Yiqiu
2017-04-01
In order to optimize the cloud computing task scheduling scheme, an improved differential evolution algorithm for cloud computing task scheduling is proposed. Firstly, the cloud computing task scheduling model, according to the model of the fitness function, and then used improved optimization calculation of the fitness function of the evolutionary algorithm, according to the evolution of generation of dynamic selection strategy through dynamic mutation strategy to ensure the global and local search ability. The performance test experiment was carried out in the CloudSim simulation platform, the experimental results show that the improved differential evolution algorithm can reduce the cloud computing task execution time and user cost saving, good implementation of the optimal scheduling of cloud computing tasks.
Fast algorithms for computing phylogenetic divergence time.
Crosby, Ralph W; Williams, Tiffani L
2017-12-06
The inference of species divergence time is a key step in most phylogenetic studies. Methods have been available for the last ten years to perform the inference, but the performance of the methods does not yet scale well to studies with hundreds of taxa and thousands of DNA base pairs. For example a study of 349 primate taxa was estimated to require over 9 months of processing time. In this work, we present a new algorithm, AncestralAge, that significantly improves the performance of the divergence time process. As part of AncestralAge, we demonstrate a new method for the computation of phylogenetic likelihood and our experiments show a 90% improvement in likelihood computation time on the aforementioned dataset of 349 primates taxa with over 60,000 DNA base pairs. Additionally, we show that our new method for the computation of the Bayesian prior on node ages reduces the running time for this computation on the 349 taxa dataset by 99%. Through the use of these new algorithms we open up the ability to perform divergence time inference on large phylogenetic studies.
Zhu, Ying; Herbert, John M.
2018-01-01
The "real time" formulation of time-dependent density functional theory (TDDFT) involves integration of the time-dependent Kohn-Sham (TDKS) equation in order to describe the time evolution of the electron density following a perturbation. This approach, which is complementary to the more traditional linear-response formulation of TDDFT, is more efficient for computation of broad-band spectra (including core-excited states) and for systems where the density of states is large. Integration of the TDKS equation is complicated by the time-dependent nature of the effective Hamiltonian, and we introduce several predictor/corrector algorithms to propagate the density matrix, one of which can be viewed as a self-consistent extension of the widely used modified-midpoint algorithm. The predictor/corrector algorithms facilitate larger time steps and are shown to be more efficient despite requiring more than one Fock build per time step, and furthermore can be used to detect a divergent simulation on-the-fly, which can then be halted or else the time step modified.
A New Efficient Algorithm for the 2D WLP-FDTD Method Based on Domain Decomposition Technique
Directory of Open Access Journals (Sweden)
Bo-Ao Xu
2016-01-01
Full Text Available This letter introduces a new efficient algorithm for the two-dimensional weighted Laguerre polynomials finite difference time-domain (WLP-FDTD method based on domain decomposition scheme. By using the domain decomposition finite difference technique, the whole computational domain is decomposed into several subdomains. The conventional WLP-FDTD and the efficient WLP-FDTD methods are, respectively, used to eliminate the splitting error and speed up the calculation in different subdomains. A joint calculation scheme is presented to reduce the amount of calculation. Through our work, the iteration is not essential to obtain the accurate results. Numerical example indicates that the efficiency and accuracy are improved compared with the efficient WLP-FDTD method.
Efficient Algorithm and Architecture of Critical-Band Transform for Low-Power Speech Applications
Directory of Open Access Journals (Sweden)
Gan Woon-Seng
2007-01-01
Full Text Available An efficient algorithm and its corresponding VLSI architecture for the critical-band transform (CBT are developed to approximate the critical-band filtering of the human ear. The CBT consists of a constant-bandwidth transform in the lower frequency range and a Brown constant- transform (CQT in the higher frequency range. The corresponding VLSI architecture is proposed to achieve significant power efficiency by reducing the computational complexity, using pipeline and parallel processing, and applying the supply voltage scaling technique. A 21-band Bark scale CBT processor with a sampling rate of 16 kHz is designed and simulated. Simulation results verify its suitability for performing short-time spectral analysis on speech. It has a better fitting on the human ear critical-band analysis, significantly fewer computations, and therefore is more energy-efficient than other methods. With a 0.35 m CMOS technology, it calculates a 160-point speech in 4.99 milliseconds at 234 kHz. The power dissipation is 15.6 W at 1.1 V. It achieves 82.1 power reduction as compared to a benchmark 256-point FFT processor.
Directory of Open Access Journals (Sweden)
Feuerriegel Stefan
2015-12-01
Full Text Available The Lanczos algorithm is among the most frequently used iterative techniques for computing a few dominant eigenvalues of a large sparse non-symmetric matrix. At the same time, it serves as a building block within biconjugate gradient (BiCG and quasi-minimal residual (QMR methods for solving large sparse non-symmetric systems of linear equations. It is well known that, when implemented on distributed-memory computers with a huge number of processes, the synchronization time spent on computing dot products increasingly limits the parallel scalability. Therefore, we propose synchronization-reducing variants of the Lanczos, as well as BiCG and QMR methods, in an attempt to mitigate these negative performance effects. These so-called s-step algorithms are based on grouping dot products for joint execution and replacing time-consuming matrix operations by efficient vector recurrences. The purpose of this paper is to provide a rigorous derivation of the recurrences for the s-step Lanczos algorithm, introduce s-step BiCG and QMR variants, and compare the parallel performance of these new s-step versions with previous algorithms.
Efficient scheduling request algorithm for opportunistic wireless access
Nam, Haewoon
2011-08-01
An efficient scheduling request algorithm for opportunistic wireless access based on user grouping is proposed in this paper. Similar to the well-known opportunistic splitting algorithm, the proposed algorithm initially adjusts (or lowers) the threshold during a guard period if no user sends a scheduling request. However, if multiple users make requests simultaneously and therefore a collision occurs, the proposed algorithm no longer updates the threshold but narrows down the user search space by splitting the users into multiple groups iteratively, whereas the opportunistic splitting algorithm keeps adjusting the threshold until a single user is found. Since the threshold is only updated when no user sends a request, it is shown that the proposed algorithm significantly alleviates the burden of the signaling for the threshold distribution to the users by the scheduler. More importantly, the proposed algorithm requires a less number of mini-slots to make a user selection given a certain scheduling outage probability. © 2011 IEEE.
Highly Efficient Compression Algorithms for Multichannel EEG.
Shaw, Laxmi; Rahman, Daleef; Routray, Aurobinda
2018-05-01
The difficulty associated with processing and understanding the high dimensionality of electroencephalogram (EEG) data requires developing efficient and robust compression algorithms. In this paper, different lossless compression techniques of single and multichannel EEG data, including Huffman coding, arithmetic coding, Markov predictor, linear predictor, context-based error modeling, multivariate autoregression (MVAR), and a low complexity bivariate model have been examined and their performances have been compared. Furthermore, a high compression algorithm named general MVAR and a modified context-based error modeling for multichannel EEG have been proposed. The resulting compression algorithm produces a higher relative compression ratio of 70.64% on average compared with the existing methods, and in some cases, it goes up to 83.06%. The proposed methods are designed to compress a large amount of multichannel EEG data efficiently so that the data storage and transmission bandwidth can be effectively used. These methods have been validated using several experimental multichannel EEG recordings of different subjects and publicly available standard databases. The satisfactory parametric measures of these methods, namely percent-root-mean square distortion, peak signal-to-noise ratio, root-mean-square error, and cross correlation, show their superiority over the state-of-the-art compression methods.
Lyster, Peter M.; Guo, J.; Clune, T.; Larson, J. W.; Atlas, Robert (Technical Monitor)
2001-01-01
The computational complexity of algorithms for Four Dimensional Data Assimilation (4DDA) at NASA's Data Assimilation Office (DAO) is discussed. In 4DDA, observations are assimilated with the output of a dynamical model to generate best-estimates of the states of the system. It is thus a mapping problem, whereby scattered observations are converted into regular accurate maps of wind, temperature, moisture and other variables. The DAO is developing and using 4DDA algorithms that provide these datasets, or analyses, in support of Earth System Science research. Two large-scale algorithms are discussed. The first approach, the Goddard Earth Observing System Data Assimilation System (GEOS DAS), uses an atmospheric general circulation model (GCM) and an observation-space based analysis system, the Physical-space Statistical Analysis System (PSAS). GEOS DAS is very similar to global meteorological weather forecasting data assimilation systems, but is used at NASA for climate research. Systems of this size typically run at between 1 and 20 gigaflop/s. The second approach, the Kalman filter, uses a more consistent algorithm to determine the forecast error covariance matrix than does GEOS DAS. For atmospheric assimilation, the gridded dynamical fields typically have More than 10(exp 6) variables, therefore the full error covariance matrix may be in excess of a teraword. For the Kalman filter this problem can easily scale to petaflop/s proportions. We discuss the computational complexity of GEOS DAS and our implementation of the Kalman filter. We also discuss and quantify some of the technical issues and limitations in developing efficient, in terms of wall clock time, and scalable parallel implementations of the algorithms.
Li, Tiejun; Min, Bin; Wang, Zhiming
2013-03-14
The stochastic integral ensuring the Newton-Leibnitz chain rule is essential in stochastic energetics. Marcus canonical integral has this property and can be understood as the Wong-Zakai type smoothing limit when the driving process is non-Gaussian. However, this important concept seems not well-known for physicists. In this paper, we discuss Marcus integral for non-Gaussian processes and its computation in the context of stochastic energetics. We give a comprehensive introduction to Marcus integral and compare three equivalent definitions in the literature. We introduce the exact pathwise simulation algorithm and give the error analysis. We show how to compute the thermodynamic quantities based on the pathwise simulation algorithm. We highlight the information hidden in the Marcus mapping, which plays the key role in determining thermodynamic quantities. We further propose the tau-leaping algorithm, which advance the process with deterministic time steps when tau-leaping condition is satisfied. The numerical experiments and its efficiency analysis show that it is very promising.
Efficient computation of hashes
International Nuclear Information System (INIS)
Lopes, Raul H C; Franqueira, Virginia N L; Hobson, Peter R
2014-01-01
The sequential computation of hashes at the core of many distributed storage systems and found, for example, in grid services can hinder efficiency in service quality and even pose security challenges that can only be addressed by the use of parallel hash tree modes. The main contributions of this paper are, first, the identification of several efficiency and security challenges posed by the use of sequential hash computation based on the Merkle-Damgard engine. In addition, alternatives for the parallel computation of hash trees are discussed, and a prototype for a new parallel implementation of the Keccak function, the SHA-3 winner, is introduced.
Efficient Record Linkage Algorithms Using Complete Linkage Clustering.
Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar
2016-01-01
Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times.
A recursive algorithm for computing the inverse of the Vandermonde matrix
Directory of Open Access Journals (Sweden)
Youness Aliyari Ghassabeh
2016-12-01
Full Text Available The inverse of a Vandermonde matrix has been used for signal processing, polynomial interpolation, curve fitting, wireless communication, and system identification. In this paper, we propose a novel fast recursive algorithm to compute the inverse of a Vandermonde matrix. The algorithm computes the inverse of a higher order Vandermonde matrix using the available lower order inverse matrix with a computational cost of $ O(n^2 $. The proposed algorithm is given in a matrix form, which makes it appropriate for hardware implementation. The running time of the proposed algorithm to find the inverse of a Vandermonde matrix using a lower order Vandermonde matrix is compared with the running time of the matrix inversion function implemented in MATLAB.
Pap-smear Classification Using Efficient Second Order Neural Network Training Algorithms
DEFF Research Database (Denmark)
Ampazis, Nikolaos; Dounias, George; Jantzen, Jan
2004-01-01
In this paper we make use of two highly efficient second order neural network training algorithms, namely the LMAM (Levenberg-Marquardt with Adaptive Momentum) and OLMAM (Optimized Levenberg-Marquardt with Adaptive Momentum), for the construction of an efficient pap-smear test classifier. The alg......In this paper we make use of two highly efficient second order neural network training algorithms, namely the LMAM (Levenberg-Marquardt with Adaptive Momentum) and OLMAM (Optimized Levenberg-Marquardt with Adaptive Momentum), for the construction of an efficient pap-smear test classifier...
3rd International Conference on Computational Mathematics and Computational Geometry
Ravindran, Anton
2016-01-01
This volume presents original research contributed to the 3rd Annual International Conference on Computational Mathematics and Computational Geometry (CMCGS 2014), organized and administered by Global Science and Technology Forum (GSTF). Computational Mathematics and Computational Geometry are closely related subjects, but are often studied by separate communities and published in different venues. This volume is unique in its combination of these topics. After the conference, which took place in Singapore, selected contributions chosen for this volume and peer-reviewed. The section on Computational Mathematics contains papers that are concerned with developing new and efficient numerical algorithms for mathematical sciences or scientific computing. They also cover analysis of such algorithms to assess accuracy and reliability. The parts of this project that are related to Computational Geometry aim to develop effective and efficient algorithms for geometrical applications such as representation and computati...
Pap-smear Classification Using Efficient Second Order Neural Network Training Algorithms
DEFF Research Database (Denmark)
Ampazis, Nikolaos; Dounias, George; Jantzen, Jan
2004-01-01
In this paper we make use of two highly efficient second order neural network training algorithms, namely the LMAM (Levenberg-Marquardt with Adaptive Momentum) and OLMAM (Optimized Levenberg-Marquardt with Adaptive Momentum), for the construction of an efficient pap-smear test classifier. The alg......In this paper we make use of two highly efficient second order neural network training algorithms, namely the LMAM (Levenberg-Marquardt with Adaptive Momentum) and OLMAM (Optimized Levenberg-Marquardt with Adaptive Momentum), for the construction of an efficient pap-smear test classifier....... The algorithms are methodologically similar, and are based on iterations of the form employed in the Levenberg-Marquardt (LM) method for non-linear least squares problems with the inclusion of an additional adaptive momentum term arising from the formulation of the training task as a constrained optimization...
High-speed computation of the EM algorithm for PET image reconstruction
International Nuclear Information System (INIS)
Rajan, K.; Patnaik, L.M.; Ramakrishna, J.
1994-01-01
The PET image reconstruction based on the EM algorithm has several attractive advantages over the conventional convolution backprojection algorithms. However, two major drawbacks have impeded the routine use of the EM algorithm, namely, the long computational time due to slow convergence and the large memory required for the storage of the image, projection data and the probability matrix. In this study, the authors attempts to solve these two problems by parallelizing the EM algorithm on a multiprocessor system. The authors have implemented an extended hypercube (EH) architecture for the high-speed computation of the EM algorithm using the commercially available fast floating point digital signal processor (DSP) chips as the processing elements (PEs). The authors discuss and compare the performance of the EM algorithm on a 386/387 machine, CD 4360 mainframe, and on the EH system. The results show that the computational speed performance of an EH using DSP chips as PEs executing the EM image reconstruction algorithm is about 130 times better than that of the CD 4360 mainframe. The EH topology is expandable with more number of PEs
DEFF Research Database (Denmark)
Bilardi, Gianfranco; Pietracaprina, Andrea; Pucci, Geppino
2016-01-01
A framework is proposed for the design and analysis of network-oblivious algorithms, namely algorithms that can run unchanged, yet efficiently, on a variety of machines characterized by different degrees of parallelism and communication capabilities. The framework prescribes that a network......-oblivious algorithm be specified on a parallel model of computation where the only parameter is the problem’s input size, and then evaluated on a model with two parameters, capturing parallelism granularity and communication latency. It is shown that for a wide class of network-oblivious algorithms, optimality...... of cache hierarchies, to the realm of parallel computation. Its effectiveness is illustrated by providing optimal network-oblivious algorithms for a number of key problems. Some limitations of the oblivious approach are also discussed....
Efficient computation of smoothing splines via adaptive basis sampling
Ma, Ping
2015-06-24
© 2015 Biometrika Trust. Smoothing splines provide flexible nonparametric regression estimators. However, the high computational cost of smoothing splines for large datasets has hindered their wide application. In this article, we develop a new method, named adaptive basis sampling, for efficient computation of smoothing splines in super-large samples. Except for the univariate case where the Reinsch algorithm is applicable, a smoothing spline for a regression problem with sample size n can be expressed as a linear combination of n basis functions and its computational complexity is generally O(n^{3}). We achieve a more scalable computation in the multivariate case by evaluating the smoothing spline using a smaller set of basis functions, obtained by an adaptive sampling scheme that uses values of the response variable. Our asymptotic analysis shows that smoothing splines computed via adaptive basis sampling converge to the true function at the same rate as full basis smoothing splines. Using simulation studies and a large-scale deep earth core-mantle boundary imaging study, we show that the proposed method outperforms a sampling method that does not use the values of response variables.
Efficient computation of smoothing splines via adaptive basis sampling
Ma, Ping; Huang, Jianhua Z.; Zhang, Nan
2015-01-01
© 2015 Biometrika Trust. Smoothing splines provide flexible nonparametric regression estimators. However, the high computational cost of smoothing splines for large datasets has hindered their wide application. In this article, we develop a new method, named adaptive basis sampling, for efficient computation of smoothing splines in super-large samples. Except for the univariate case where the Reinsch algorithm is applicable, a smoothing spline for a regression problem with sample size n can be expressed as a linear combination of n basis functions and its computational complexity is generally O(n^{3}). We achieve a more scalable computation in the multivariate case by evaluating the smoothing spline using a smaller set of basis functions, obtained by an adaptive sampling scheme that uses values of the response variable. Our asymptotic analysis shows that smoothing splines computed via adaptive basis sampling converge to the true function at the same rate as full basis smoothing splines. Using simulation studies and a large-scale deep earth core-mantle boundary imaging study, we show that the proposed method outperforms a sampling method that does not use the values of response variables.
Evaluation of the efficiency of computer-aided spectra search systems based on information theory
International Nuclear Information System (INIS)
Schaarschmidt, K.
1979-01-01
Application of information theory allows objective evaluation of the efficiency of computer-aided spectra search systems. For this purpose, a significant number of search processes must be analyzed. The amount of information gained by computer application is considered as the difference between the entropy of the data bank and a conditional entropy depending on the proportion of unsuccessful search processes and ballast. The influence of the following factors can be estimated: volume, structure, and quality of the spectra collection stored, efficiency of the encoding instruction and the comparing algorithm, and subjective errors involved in the encoding of spectra. The relations derived are applied to two published storage and retrieval systems for infared spectra. (Auth.)
Janetzke, David C.; Murthy, Durbha V.
1991-01-01
Aeroelastic analysis is multi-disciplinary and computationally expensive. Hence, it can greatly benefit from parallel processing. As part of an effort to develop an aeroelastic capability on a distributed memory transputer network, a parallel algorithm for the computation of aerodynamic influence coefficients is implemented on a network of 32 transputers. The aerodynamic influence coefficients are calculated using a 3-D unsteady aerodynamic model and a parallel discretization. Efficiencies up to 85 percent were demonstrated using 32 processors. The effect of subtask ordering, problem size, and network topology are presented. A comparison to results on a shared memory computer indicates that higher speedup is achieved on the distributed memory system.
Computational Modeling of Teaching and Learning through Application of Evolutionary Algorithms
Directory of Open Access Journals (Sweden)
Richard Lamb
2015-09-01
Full Text Available Within the mind, there are a myriad of ideas that make sense within the bounds of everyday experience, but are not reflective of how the world actually exists; this is particularly true in the domain of science. Classroom learning with teacher explanation are a bridge through which these naive understandings can be brought in line with scientific reality. The purpose of this paper is to examine how the application of a Multiobjective Evolutionary Algorithm (MOEA can work in concert with an existing computational-model to effectively model critical-thinking in the science classroom. An evolutionary algorithm is an algorithm that iteratively optimizes machine learning based computational models. The research question is, does the application of an evolutionary algorithm provide a means to optimize the Student Task and Cognition Model (STAC-M and does the optimized model sufficiently represent and predict teaching and learning outcomes in the science classroom? Within this computational study, the authors outline and simulate the effect of teaching on the ability of a “virtual” student to solve a Piagetian task. Using the Student Task and Cognition Model (STAC-M a computational model of student cognitive processing in science class developed in 2013, the authors complete a computational experiment which examines the role of cognitive retraining on student learning. Comparison of the STAC-M and the STAC-M with inclusion of the Multiobjective Evolutionary Algorithm shows greater success in solving the Piagetian science-tasks post cognitive retraining with the Multiobjective Evolutionary Algorithm. This illustrates the potential uses of cognitive and neuropsychological computational modeling in educational research. The authors also outline the limitations and assumptions of computational modeling.
Efficient computation of argumentation semantics
Liao, Beishui
2013-01-01
Efficient Computation of Argumentation Semantics addresses argumentation semantics and systems, introducing readers to cutting-edge decomposition methods that drive increasingly efficient logic computation in AI and intelligent systems. Such complex and distributed systems are increasingly used in the automation and transportation systems field, and particularly autonomous systems, as well as more generic intelligent computation research. The Series in Intelligent Systems publishes titles that cover state-of-the-art knowledge and the latest advances in research and development in intelligen
Efficient universal computing architectures for decoding neural activity.
Directory of Open Access Journals (Sweden)
Benjamin I Rapoport
Full Text Available The ability to decode neural activity into meaningful control signals for prosthetic devices is critical to the development of clinically useful brain- machine interfaces (BMIs. Such systems require input from tens to hundreds of brain-implanted recording electrodes in order to deliver robust and accurate performance; in serving that primary function they should also minimize power dissipation in order to avoid damaging neural tissue; and they should transmit data wirelessly in order to minimize the risk of infection associated with chronic, transcutaneous implants. Electronic architectures for brain- machine interfaces must therefore minimize size and power consumption, while maximizing the ability to compress data to be transmitted over limited-bandwidth wireless channels. Here we present a system of extremely low computational complexity, designed for real-time decoding of neural signals, and suited for highly scalable implantable systems. Our programmable architecture is an explicit implementation of a universal computing machine emulating the dynamics of a network of integrate-and-fire neurons; it requires no arithmetic operations except for counting, and decodes neural signals using only computationally inexpensive logic operations. The simplicity of this architecture does not compromise its ability to compress raw neural data by factors greater than [Formula: see text]. We describe a set of decoding algorithms based on this computational architecture, one designed to operate within an implanted system, minimizing its power consumption and data transmission bandwidth; and a complementary set of algorithms for learning, programming the decoder, and postprocessing the decoded output, designed to operate in an external, nonimplanted unit. The implementation of the implantable portion is estimated to require fewer than 5000 operations per second. A proof-of-concept, 32-channel field-programmable gate array (FPGA implementation of this portion
F--Ray: A new algorithm for efficient transport of ionizing radiation
Mao, Yi; Zhang, J.; Wandelt, B. D.; Shapiro, P. R.; Iliev, I. T.
2014-04-01
We present a new algorithm for the 3D transport of ionizing radiation, called F2-Ray (Fast Fourier Ray-tracing method). The transfer of ionizing radiation with long mean free path in diffuse intergalactic gas poses a special challenge to standard numerical methods which transport the radiation in position space. Standard methods usually trace each individual ray until it is fully absorbed by the intervening gas. If the mean free path is long, the computational cost and memory load are likely to be prohibitive. We have developed an algorithm that overcomes these limitations and is, therefore, significantly more efficient. The method calculates the transfer of radiation collectively, using the Fast Fourier Transform to convert radiation between position and Fourier spaces, so the computational cost will not increase with the number of ionizing sources. The method also automatically combines parallel rays with the same frequency at the same grid cell, thereby minimizing the memory requirement. The method is explicitly photon-conserving, i.e. the depletion of ionizing photons is guaranteed to equal the photoionizations they caused, and explicitly obeys the periodic boundary condition, i.e. the escape of ionizing photons from one side of a simulation volume is guaranteed to be compensated by emitting the same amount of photons into the volume through the opposite side. Together, these features make it possible to numerically simulate the transfer of ionizing photons more efficiently than previous methods. Since ionizing radiation such as the X-ray is responsible for heating the intergalactic gas when first stars and quasars form at high redshifts, our method can be applied to simulate thermal distribution, in addition to cosmic reionization, in three-dimensional inhomogeneous cosmological density field.
Directory of Open Access Journals (Sweden)
Taneda Akito
2008-12-01
Full Text Available Abstract Background Aligning RNA sequences with low sequence identity has been a challenging problem since such a computation essentially needs an algorithm with high complexities for taking structural conservation into account. Although many sophisticated algorithms for the purpose have been proposed to date, further improvement in efficiency is necessary to accelerate its large-scale applications including non-coding RNA (ncRNA discovery. Results We developed a new genetic algorithm, Cofolga2, for simultaneously computing pairwise RNA sequence alignment and consensus folding, and benchmarked it using BRAliBase 2.1. The benchmark results showed that our new algorithm is accurate and efficient in both time and memory usage. Then, combining with the originally trained SVM, we applied the new algorithm to novel ncRNA discovery where we compared S. cerevisiae genome with six related genomes in a pairwise manner. By focusing our search to the relatively short regions (50 bp to 2,000 bp sandwiched by conserved sequences, we successfully predict 714 intergenic and 1,311 sense or antisense ncRNA candidates, which were found in the pairwise alignments with stable consensus secondary structure and low sequence identity (≤ 50%. By comparing with the previous predictions, we found that > 92% of the candidates is novel candidates. The estimated rate of false positives in the predicted candidates is 51%. Twenty-five percent of the intergenic candidates has supports for expression in cell, i.e. their genomic positions overlap those of the experimentally determined transcripts in literature. By manual inspection of the results, moreover, we obtained four multiple alignments with low sequence identity which reveal consensus structures shared by three species/sequences. Conclusion The present method gives an efficient tool complementary to sequence-alignment-based ncRNA finders.
Fundamentals of natural computing basic concepts, algorithms, and applications
de Castro, Leandro Nunes
2006-01-01
Introduction A Small Sample of Ideas The Philosophy of Natural Computing The Three Branches: A Brief Overview When to Use Natural Computing Approaches Conceptualization General Concepts PART I - COMPUTING INSPIRED BY NATURE Evolutionary Computing Problem Solving as a Search Task Hill Climbing and Simulated Annealing Evolutionary Biology Evolutionary Computing The Other Main Evolutionary Algorithms From Evolutionary Biology to Computing Scope of Evolutionary Computing Neurocomputing The Nervous System Artif
Efficient statistically accurate algorithms for the Fokker-Planck equation in large dimensions
Chen, Nan; Majda, Andrew J.
2018-02-01
Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems is an important and practical issue. However, most traditional methods suffer from the curse of dimensionality and have difficulties in capturing the fat tailed highly intermittent probability density functions (PDFs) of complex systems in turbulence, neuroscience and excitable media. In this article, efficient statistically accurate algorithms are developed for solving both the transient and the equilibrium solutions of Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. The algorithms involve a hybrid strategy that requires only a small number of ensembles. Here, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. Particularly, the parametric method provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace and is therefore computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture. Different from traditional particle methods, each conditional Gaussian distribution here covers a significant portion of the high-dimensional PDF. Therefore a small number of ensembles is sufficient to recover the full PDF, which overcomes the curse of dimensionality. Notably, the mixture distribution has significant skill in capturing the transient behavior with fat tails of the high-dimensional non-Gaussian PDFs, and this facilitates the algorithms in accurately describing the intermittency and extreme events in complex turbulent systems. It is shown in a stringent set of test problems that the method only requires an order of O (100) ensembles to successfully recover the highly non-Gaussian transient PDFs in up to 6
Efficient sequential and parallel algorithms for planted motif search.
Nicolae, Marius; Rajasekaran, Sanguthevar
2014-01-31
Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances. This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/. We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.
An Efficient Algorithm for Modelling Duration in Hidden Markov Models, with a Dramatic Application
DEFF Research Database (Denmark)
Hauberg, Søren; Sloth, Jakob
2008-01-01
For many years, the hidden Markov model (HMM) has been one of the most popular tools for analysing sequential data. One frequently used special case is the left-right model, in which the order of the hidden states is known. If knowledge of the duration of a state is available it is not possible...... to represent it explicitly with an HMM. Methods for modelling duration with HMM's do exist (Rabiner in Proc. IEEE 77(2):257---286, [1989]), but they come at the price of increased computational complexity. Here we present an efficient and robust algorithm for modelling duration in HMM's, and this algorithm...
Quantum entanglement and quantum computational algorithms
Indian Academy of Sciences (India)
Abstract. The existence of entangled quantum states gives extra power to quantum computers over their classical counterparts. Quantum entanglement shows up qualitatively at the level of two qubits. We demonstrate that the one- and the two-bit Deutsch-Jozsa algorithm does not require entanglement and can be mapped ...
Directory of Open Access Journals (Sweden)
Dębski Roman
2014-09-01
Full Text Available Effective, simulation-based trajectory optimization algorithms adapted to heterogeneous computers are studied with reference to the problem taken from alpine ski racing (the presented solution is probably the most general one published so far. The key idea behind these algorithms is to use a grid-based discretization scheme to transform the continuous optimization problem into a search problem over a specially constructed finite graph, and then to apply dynamic programming to find an approximation of the global solution. In the analyzed example it is the minimum-time ski line, represented as a piecewise-linear function (a method of elimination of unfeasible solutions is proposed. Serial and parallel versions of the basic optimization algorithm are presented in detail (pseudo-code, time and memory complexity. Possible extensions of the basic algorithm are also described. The implementation of these algorithms is based on OpenCL. The included experimental results show that contemporary heterogeneous computers can be treated as μ-HPC platforms-they offer high performance (the best speedup was equal to 128 while remaining energy and cost efficient (which is crucial in embedded systems, e.g., trajectory planners of autonomous robots. The presented algorithms can be applied to many trajectory optimization problems, including those having a black-box represented performance measure
DEFF Research Database (Denmark)
Bogdanov, Andrey; Kavun, Elif Bilge; Tischhauser, Elmar
2012-01-01
An accurate estimation of the success probability and data complexity of linear cryptanalysis is a fundamental question in symmetric cryptography. In this paper, we propose an efficient reconfigurable hardware architecture to compute the success probability and data complexity of Matsui's Algorithm...... block lengths ensures that any empirical observations are not due to differences in statistical behavior for artificially small block lengths. Rather surprisingly, we observed in previous experiments a significant deviation between the theory and practice for Matsui's Algorithm 2 for larger block sizes...
Sugisaki, Kenji; Yamamoto, Satoru; Nakazawa, Shigeaki; Toyota, Kazuo; Sato, Kazunobu; Shiomi, Daisuke; Takui, Takeji
2016-08-18
Quantum computers are capable to efficiently perform full configuration interaction (FCI) calculations of atoms and molecules by using the quantum phase estimation (QPE) algorithm. Because the success probability of the QPE depends on the overlap between approximate and exact wave functions, efficient methods to prepare accurate initial guess wave functions enough to have sufficiently large overlap with the exact ones are highly desired. Here, we propose a quantum algorithm to construct the wave function consisting of one configuration state function, which is suitable for the initial guess wave function in QPE-based FCI calculations of open-shell molecules, based on the addition theorem of angular momentum. The proposed quantum algorithm enables us to prepare the wave function consisting of an exponential number of Slater determinants only by a polynomial number of quantum operations.
Abdullahi, Mohammed; Ngadi, Md Asri
2016-01-01
Cloud computing has attracted significant attention from research community because of rapid migration rate of Information Technology services to its domain. Advances in virtualization technology has made cloud computing very popular as a result of easier deployment of application services. Tasks are submitted to cloud datacenters to be processed on pay as you go fashion. Task scheduling is one the significant research challenges in cloud computing environment. The current formulation of task scheduling problems has been shown to be NP-complete, hence finding the exact solution especially for large problem sizes is intractable. The heterogeneous and dynamic feature of cloud resources makes optimum task scheduling non-trivial. Therefore, efficient task scheduling algorithms are required for optimum resource utilization. Symbiotic Organisms Search (SOS) has been shown to perform competitively with Particle Swarm Optimization (PSO). The aim of this study is to optimize task scheduling in cloud computing environment based on a proposed Simulated Annealing (SA) based SOS (SASOS) in order to improve the convergence rate and quality of solution of SOS. The SOS algorithm has a strong global exploration capability and uses fewer parameters. The systematic reasoning ability of SA is employed to find better solutions on local solution regions, hence, adding exploration ability to SOS. Also, a fitness function is proposed which takes into account the utilization level of virtual machines (VMs) which reduced makespan and degree of imbalance among VMs. CloudSim toolkit was used to evaluate the efficiency of the proposed method using both synthetic and standard workload. Results of simulation showed that hybrid SOS performs better than SOS in terms of convergence speed, response time, degree of imbalance, and makespan.
Directory of Open Access Journals (Sweden)
Mohammed Abdullahi
Full Text Available Cloud computing has attracted significant attention from research community because of rapid migration rate of Information Technology services to its domain. Advances in virtualization technology has made cloud computing very popular as a result of easier deployment of application services. Tasks are submitted to cloud datacenters to be processed on pay as you go fashion. Task scheduling is one the significant research challenges in cloud computing environment. The current formulation of task scheduling problems has been shown to be NP-complete, hence finding the exact solution especially for large problem sizes is intractable. The heterogeneous and dynamic feature of cloud resources makes optimum task scheduling non-trivial. Therefore, efficient task scheduling algorithms are required for optimum resource utilization. Symbiotic Organisms Search (SOS has been shown to perform competitively with Particle Swarm Optimization (PSO. The aim of this study is to optimize task scheduling in cloud computing environment based on a proposed Simulated Annealing (SA based SOS (SASOS in order to improve the convergence rate and quality of solution of SOS. The SOS algorithm has a strong global exploration capability and uses fewer parameters. The systematic reasoning ability of SA is employed to find better solutions on local solution regions, hence, adding exploration ability to SOS. Also, a fitness function is proposed which takes into account the utilization level of virtual machines (VMs which reduced makespan and degree of imbalance among VMs. CloudSim toolkit was used to evaluate the efficiency of the proposed method using both synthetic and standard workload. Results of simulation showed that hybrid SOS performs better than SOS in terms of convergence speed, response time, degree of imbalance, and makespan.
Komorkiewicz, Mateusz; Kryjak, Tomasz; Gorgon, Marek
2014-01-01
This article presents an efficient hardware implementation of the Horn-Schunck algorithm that can be used in an embedded optical flow sensor. An architecture is proposed, that realises the iterative Horn-Schunck algorithm in a pipelined manner. This modification allows to achieve data throughput of 175 MPixels/s and makes processing of Full HD video stream (1, 920 × 1, 080 @ 60 fps) possible. The structure of the optical flow module as well as pre- and post-filtering blocks and a flow reliability computation unit is described in details. Three versions of optical flow modules, with different numerical precision, working frequency and obtained results accuracy are proposed. The errors caused by switching from floating- to fixed-point computations are also evaluated. The described architecture was tested on popular sequences from an optical flow dataset of the Middlebury University. It achieves state-of-the-art results among hardware implementations of single scale methods. The designed fixed-point architecture achieves performance of 418 GOPS with power efficiency of 34 GOPS/W. The proposed floating-point module achieves 103 GFLOPS, with power efficiency of 24 GFLOPS/W. Moreover, a 100 times speedup compared to a modern CPU with SIMD support is reported. A complete, working vision system realized on Xilinx VC707 evaluation board is also presented. It is able to compute optical flow for Full HD video stream received from an HDMI camera in real-time. The obtained results prove that FPGA devices are an ideal platform for embedded vision systems. PMID:24526303
Efficient computation of the joint sample frequency spectra for multiple populations.
Kamm, John A; Terhorst, Jonathan; Song, Yun S
2017-01-01
A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity.
Combinatorial algorithms enabling computational science: tales from the front
International Nuclear Information System (INIS)
Bhowmick, Sanjukta; Boman, Erik G; Devine, Karen; Gebremedhin, Assefaw; Hendrickson, Bruce; Hovland, Paul; Munson, Todd; Pothen, Alex
2006-01-01
Combinatorial algorithms have long played a crucial enabling role in scientific and engineering computations. The importance of discrete algorithms continues to grow with the demands of new applications and advanced architectures. This paper surveys some recent developments in this rapidly changing and highly interdisciplinary field
Combinatorial algorithms enabling computational science: tales from the front
Energy Technology Data Exchange (ETDEWEB)
Bhowmick, Sanjukta [Mathematics and Computer Science Division, Argonne National Laboratory (United States); Boman, Erik G [Discrete Algorithms and Math Department, Sandia National Laboratories (United States); Devine, Karen [Discrete Algorithms and Math Department, Sandia National Laboratories (United States); Gebremedhin, Assefaw [Computer Science Department, Old Dominion University (United States); Hendrickson, Bruce [Discrete Algorithms and Math Department, Sandia National Laboratories (United States); Hovland, Paul [Mathematics and Computer Science Division, Argonne National Laboratory (United States); Munson, Todd [Mathematics and Computer Science Division, Argonne National Laboratory (United States); Pothen, Alex [Computer Science Department, Old Dominion University (United States)
2006-09-15
Combinatorial algorithms have long played a crucial enabling role in scientific and engineering computations. The importance of discrete algorithms continues to grow with the demands of new applications and advanced architectures. This paper surveys some recent developments in this rapidly changing and highly interdisciplinary field.
The Efficiency Analysis of the Augmented Reality Algorithm
Directory of Open Access Journals (Sweden)
Dovilė Kurpytė
2013-05-01
Full Text Available The article presents the investigation of the efficiency of augmented reality algorithm that depends on the rotation angles and lighting conditions. The following were the target subject parameters: three degrees of freedom perspective of the rotation and side lighting that forms a shadow. Static parameters of subjects with the ability to change them were as follow: the distance between the marker and the camera, camera, processor, and the distance from the light source. The study is based on an open source Java programming language algorithm, where the algorithm is tested with 10 markers. It was found that the rotation error did not exceed 2%.Article in Lithuanian
Unconventional Algorithms: Complementarity of Axiomatics and Construction
Directory of Open Access Journals (Sweden)
Gordana Dodig Crnkovic
2012-10-01
Full Text Available In this paper, we analyze axiomatic and constructive issues of unconventional computations from a methodological and philosophical point of view. We explain how the new models of algorithms and unconventional computations change the algorithmic universe, making it open and allowing increased flexibility and expressive power that augment creativity. At the same time, the greater power of new types of algorithms also results in the greater complexity of the algorithmic universe, transforming it into the algorithmic multiverse and demanding new tools for its study. That is why we analyze new powerful tools brought forth by local mathematics, local logics, logical varieties and the axiomatic theory of algorithms, automata and computation. We demonstrate how these new tools allow efficient navigation in the algorithmic multiverse. Further work includes study of natural computation by unconventional algorithms and constructive approaches.
Directory of Open Access Journals (Sweden)
Jianfei Zhang
2013-01-01
Full Text Available Graphics processing unit (GPU has obtained great success in scientific computations for its tremendous computational horsepower and very high memory bandwidth. This paper discusses the efficient way to implement polynomial preconditioned conjugate gradient solver for the finite element computation of elasticity on NVIDIA GPUs using compute unified device architecture (CUDA. Sliced block ELLPACK (SBELL format is introduced to store sparse matrix arising from finite element discretization of elasticity with fewer padding zeros than traditional ELLPACK-based formats. Polynomial preconditioning methods have been investigated both in convergence and running time. From the overall performance, the least-squares (L-S polynomial method is chosen as a preconditioner in PCG solver to finite element equations derived from elasticity for its best results on different example meshes. In the PCG solver, mixed precision algorithm is used not only to reduce the overall computational, storage requirements and bandwidth but to make full use of the capacity of the GPU devices. With SBELL format and mixed precision algorithm, the GPU-based L-S preconditioned CG can get a speedup of about 7–9 to CPU-implementation.
The theory of hybrid stochastic algorithms
International Nuclear Information System (INIS)
Duane, S.; Kogut, J.B.
1986-01-01
The theory of hybrid stochastic algorithms is developed. A generalized Fokker-Planck equation is derived and is used to prove that the correct equilibrium distribution is generated by the algorithm. Systematic errors following from the discrete time-step used in the numerical implementation of the scheme are computed. Hybrid algorithms which simulate lattice gauge theory with dynamical fermions are presented. They are optimized in computer simulations and their systematic errors and efficiencies are studied. (orig.)
An algorithm for testing the efficient market hypothesis.
Directory of Open Access Journals (Sweden)
Ioana-Andreea Boboc
Full Text Available The objective of this research is to examine the efficiency of EUR/USD market through the application of a trading system. The system uses a genetic algorithm based on technical analysis indicators such as Exponential Moving Average (EMA, Moving Average Convergence Divergence (MACD, Relative Strength Index (RSI and Filter that gives buying and selling recommendations to investors. The algorithm optimizes the strategies by dynamically searching for parameters that improve profitability in the training period. The best sets of rules are then applied on the testing period. The results show inconsistency in finding a set of trading rules that performs well in both periods. Strategies that achieve very good returns in the training period show difficulty in returning positive results in the testing period, this being consistent with the efficient market hypothesis (EMH.
An algorithm for testing the efficient market hypothesis.
Boboc, Ioana-Andreea; Dinică, Mihai-Cristian
2013-01-01
The objective of this research is to examine the efficiency of EUR/USD market through the application of a trading system. The system uses a genetic algorithm based on technical analysis indicators such as Exponential Moving Average (EMA), Moving Average Convergence Divergence (MACD), Relative Strength Index (RSI) and Filter that gives buying and selling recommendations to investors. The algorithm optimizes the strategies by dynamically searching for parameters that improve profitability in the training period. The best sets of rules are then applied on the testing period. The results show inconsistency in finding a set of trading rules that performs well in both periods. Strategies that achieve very good returns in the training period show difficulty in returning positive results in the testing period, this being consistent with the efficient market hypothesis (EMH).
Griffiths, Thomas L; Lieder, Falk; Goodman, Noah D
2015-04-01
Marr's levels of analysis-computational, algorithmic, and implementation-have served cognitive science well over the last 30 years. But the recent increase in the popularity of the computational level raises a new challenge: How do we begin to relate models at different levels of analysis? We propose that it is possible to define levels of analysis that lie between the computational and the algorithmic, providing a way to build a bridge between computational- and algorithmic-level models. The key idea is to push the notion of rationality, often used in defining computational-level models, deeper toward the algorithmic level. We offer a simple recipe for reverse-engineering the mind's cognitive strategies by deriving optimal algorithms for a series of increasingly more realistic abstract computational architectures, which we call "resource-rational analysis." Copyright © 2015 Cognitive Science Society, Inc.
Parallel conjugate gradient algorithms for manipulator dynamic simulation
Fijany, Amir; Scheld, Robert E.
1989-01-01
Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n sq) on a serial processor. A conjugate gradient algorithms is presented that provide greater efficiency using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log sub 2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log sub 2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
Power-efficient computer architectures recent advances
Själander, Magnus; Kaxiras, Stefanos
2014-01-01
As Moore's Law and Dennard scaling trends have slowed, the challenges of building high-performance computer architectures while maintaining acceptable power efficiency levels have heightened. Over the past ten years, architecture techniques for power efficiency have shifted from primarily focusing on module-level efficiencies, toward more holistic design styles based on parallelism and heterogeneity. This work highlights and synthesizes recent techniques and trends in power-efficient computer architecture.Table of Contents: Introduction / Voltage and Frequency Management / Heterogeneity and Sp
Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning
Fu, QiMing
2016-01-01
To improve the convergence rate and the sample efficiency, two efficient learning methods AC-HMLP and RAC-HMLP (AC-HMLP with ℓ 2-regularization) are proposed by combining actor-critic algorithm with hierarchical model learning and planning. The hierarchical models consisting of the local and the global models, which are learned at the same time during learning of the value function and the policy, are approximated by local linear regression (LLR) and linear function approximation (LFA), respectively. Both the local model and the global model are applied to generate samples for planning; the former is used only if the state-prediction error does not surpass the threshold at each time step, while the latter is utilized at the end of each episode. The purpose of taking both models is to improve the sample efficiency and accelerate the convergence rate of the whole algorithm through fully utilizing the local and global information. Experimentally, AC-HMLP and RAC-HMLP are compared with three representative algorithms on two Reinforcement Learning (RL) benchmark problems. The results demonstrate that they perform best in terms of convergence rate and sample efficiency. PMID:27795704
Fast parallel algorithms that compute transitive closure of a fuzzy relation
Kreinovich, Vladik YA.
1993-01-01
The notion of a transitive closure of a fuzzy relation is very useful for clustering in pattern recognition, for fuzzy databases, etc. The original algorithm proposed by L. Zadeh (1971) requires the computation time O(n(sup 4)), where n is the number of elements in the relation. In 1974, J. C. Dunn proposed a O(n(sup 2)) algorithm. Since we must compute n(n-1)/2 different values s(a, b) (a not equal to b) that represent the fuzzy relation, and we need at least one computational step to compute each of these values, we cannot compute all of them in less than O(n(sup 2)) steps. So, Dunn's algorithm is in this sense optimal. For small n, it is ok. However, for big n (e.g., for big databases), it is still a lot, so it would be desirable to decrease the computation time (this problem was formulated by J. Bezdek). Since this decrease cannot be done on a sequential computer, the only way to do it is to use a computer with several processors working in parallel. We show that on a parallel computer, transitive closure can be computed in time O((log(sub 2)(n))2).
Directory of Open Access Journals (Sweden)
Weizhen Rao
2016-01-01
Full Text Available The classical model of vehicle routing problem (VRP generally minimizes either the total vehicle travelling distance or the total number of dispatched vehicles. Due to the increased importance of environmental sustainability, one variant of VRPs that minimizes the total vehicle fuel consumption has gained much attention. The resulting fuel consumption VRP (FCVRP becomes increasingly important yet difficult. We present a mixed integer programming model for the FCVRP, and fuel consumption is measured through the degree of road gradient. Complexity analysis of FCVRP is presented through analogy with the capacitated VRP. To tackle the FCVRP’s computational intractability, we propose an efficient two-objective hybrid local search algorithm (TOHLS. TOHLS is based on a hybrid local search algorithm (HLS that is also used to solve FCVRP. Based on the Golden CVRP benchmarks, 60 FCVRP instances are generated and tested. Finally, the computational results show that the proposed TOHLS significantly outperforms the HLS.
New preconditioned conjugate gradient algorithms for nonlinear unconstrained optimization problems
International Nuclear Information System (INIS)
Al-Bayati, A.; Al-Asadi, N.
1997-01-01
This paper presents two new predilection conjugate gradient algorithms for nonlinear unconstrained optimization problems and examines their computational performance. Computational experience shows that the new proposed algorithms generally imp lone the efficiency of Nazareth's [13] preconditioned conjugate gradient algorithm. (authors). 16 refs., 1 tab
Desiderata for computable representations of electronic health records-driven phenotype algorithms.
Mo, Huan; Thompson, William K; Rasmussen, Luke V; Pacheco, Jennifer A; Jiang, Guoqian; Kiefer, Richard; Zhu, Qian; Xu, Jie; Montague, Enid; Carrell, David S; Lingren, Todd; Mentch, Frank D; Ni, Yizhao; Wehbe, Firas H; Peissig, Peggy L; Tromp, Gerard; Larson, Eric B; Chute, Christopher G; Pathak, Jyotishman; Denny, Joshua C; Speltz, Peter; Kho, Abel N; Jarvik, Gail P; Bejan, Cosmin A; Williams, Marc S; Borthwick, Kenneth; Kitchner, Terrie E; Roden, Dan M; Harris, Paul A
2015-11-01
Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages. © The Author 2015. Published by Oxford University Press on behalf of the American Medical
Efficient sequential and parallel algorithms for finding edit distance based motifs.
Pal, Soumitra; Xiao, Peng; Rajasekaran, Sanguthevar
2016-08-18
Motif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable and there is a pressing need to develop efficient, exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (l,d) Edit-distance-based Motif Search (EMS) problem: given two integers l,d and n biological strings, find all strings of length l that appear in each input string with atmost d errors of types substitution, insertion and deletion. One popular technique to solve the problem is to explore for each input string the set of all possible l-mers that belong to the d-neighborhood of any substring of the input string and output those which are common for all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in neighborhood which are at a distance exactly d. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm in a multi-core shared memory setting uses arrays for storing and a novel modification of radix-sort for sorting the candidate motifs. The algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and in estimated 3 days solves up to instance (16,3). Our sequential algorithms are more than 20 times faster on (16,3). On other hard instances such as (9,2), (11,3), (13,4), our algorithms are much faster. Our parallel algorithm has more than 600 % scaling performance while using 16 threads. Our algorithms have pushed up the state-of-the-art of EMS solvers and we believe that the techniques introduced in
Quantum computation with classical light: The Deutsch Algorithm
International Nuclear Information System (INIS)
Perez-Garcia, Benjamin; Francis, Jason; McLaren, Melanie; Hernandez-Aranda, Raul I.; Forbes, Andrew; Konrad, Thomas
2015-01-01
We present an implementation of the Deutsch Algorithm using linear optical elements and laser light. We encoded two quantum bits in form of superpositions of electromagnetic fields in two degrees of freedom of the beam: its polarisation and orbital angular momentum. Our approach, based on a Sagnac interferometer, offers outstanding stability and demonstrates that optical quantum computation is possible using classical states of light. - Highlights: • We implement the Deutsh Algorithm using linear optical elements and classical light. • Our qubits are encoded in the polarisation and orbital angular momentum of the beam. • We show that it is possible to achieve quantum computation with two qubits in the classical domain of light
Quantum computation with classical light: The Deutsch Algorithm
Energy Technology Data Exchange (ETDEWEB)
Perez-Garcia, Benjamin [Photonics and Mathematical Optics Group, Tecnológico de Monterrey, Monterrey 64849 (Mexico); University of the Witwatersrand, Private Bag 3, Johannesburg 2050 (South Africa); Francis, Jason [School of Chemistry and Physics, University of KwaZulu-Natal, Private Bag X54001, Durban 4000 (South Africa); McLaren, Melanie [University of the Witwatersrand, Private Bag 3, Johannesburg 2050 (South Africa); Hernandez-Aranda, Raul I. [Photonics and Mathematical Optics Group, Tecnológico de Monterrey, Monterrey 64849 (Mexico); Forbes, Andrew [University of the Witwatersrand, Private Bag 3, Johannesburg 2050 (South Africa); Konrad, Thomas, E-mail: konradt@ukzn.ac.za [School of Chemistry and Physics, University of KwaZulu-Natal, Private Bag X54001, Durban 4000 (South Africa); National Institute of Theoretical Physics, Durban Node, Private Bag X54001, Durban 4000 (South Africa)
2015-08-28
We present an implementation of the Deutsch Algorithm using linear optical elements and laser light. We encoded two quantum bits in form of superpositions of electromagnetic fields in two degrees of freedom of the beam: its polarisation and orbital angular momentum. Our approach, based on a Sagnac interferometer, offers outstanding stability and demonstrates that optical quantum computation is possible using classical states of light. - Highlights: • We implement the Deutsh Algorithm using linear optical elements and classical light. • Our qubits are encoded in the polarisation and orbital angular momentum of the beam. • We show that it is possible to achieve quantum computation with two qubits in the classical domain of light.
Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm
Povitsky, A.
1998-01-01
In this research an efficient parallel algorithm for 3-D directionally split problems is developed. The proposed algorithm is based on a reformulated version of the pipelined Thomas algorithm that starts the backward step computations immediately after the completion of the forward step computations for the first portion of lines This algorithm has data available for other computational tasks while processors are idle from the Thomas algorithm. The proposed 3-D directionally split solver is based on the static scheduling of processors where local and non-local, data-dependent and data-independent computations are scheduled while processors are idle. A theoretical model of parallelization efficiency is used to define optimal parameters of the algorithm, to show an asymptotic parallelization penalty and to obtain an optimal cover of a global domain with subdomains. It is shown by computational experiments and by the theoretical model that the proposed algorithm reduces the parallelization penalty about two times over the basic algorithm for the range of the number of processors (subdomains) considered and the number of grid nodes per subdomain.
Efficient algorithms for conditional independence inference
Czech Academy of Sciences Publication Activity Database
Bouckaert, R.; Hemmecke, R.; Lindner, S.; Studený, Milan
2010-01-01
Roč. 11, č. 1 (2010), s. 3453-3479 ISSN 1532-4435 R&D Projects: GA ČR GA201/08/0539; GA MŠk 1M0572 Institutional research plan: CEZ:AV0Z10750506 Keywords : conditional independence inference * linear programming approach Subject RIV: BA - General Mathematics Impact factor: 2.949, year: 2010 http://library.utia.cas.cz/separaty/2010/MTR/studeny-efficient algorithms for conditional independence inference.pdf
Fibonacci’s Computation Methods vs Modern Algorithms
Directory of Open Access Journals (Sweden)
Ernesto Burattini
2013-12-01
Full Text Available In this paper we discuss some computational procedures given by Leonardo Pisano Fibonacci in his famous Liber Abaci book, and we propose their translation into a modern language for computers (C ++. Among the other we describe the method of “cross” multiplication, we evaluate its computational complexity in algorithmic terms and we show the output of a C ++ code that describes the development of the method applied to the product of two integers. In a similar way we show the operations performed on fractions introduced by Fibonacci. Thanks to the possibility to reproduce on a computer, the Fibonacci’s different computational procedures, it was possible to identify some calculation errors present in the different versions of the original text.
Efficient Parallel Algorithms for Unsteady Incompressible Flows
Guermond, Jean-Luc; Minev, Peter D.
2013-01-01
The objective of this paper is to give an overview of recent developments on splitting schemes for solving the time-dependent incompressible Navier–Stokes equations and to discuss possible extensions to the variable density/viscosity case. A particular attention is given to algorithms that can be implemented efficiently on large parallel clusters.
Computationally efficient algorithms for statistical image processing : implementation in R
Langovoy, M.; Wittich, O.
2010-01-01
In the series of our earlier papers on the subject, we proposed a novel statistical hypothesis testing method for detection of objects in noisy images. The method uses results from percolation theory and random graph theory. We developed algorithms that allowed to detect objects of unknown shapes in
An efficient fractal image coding algorithm using unified feature and DCT
International Nuclear Information System (INIS)
Zhou Yiming; Zhang Chao; Zhang Zengke
2009-01-01
Fractal image compression is a promising technique to improve the efficiency of image storage and image transmission with high compression ratio, however, the huge time consumption for the fractal image coding is a great obstacle to the practical applications. In order to improve the fractal image coding, efficient fractal image coding algorithms using a special unified feature and a DCT coder are proposed in this paper. Firstly, based on a necessary condition to the best matching search rule during fractal image coding, the fast algorithm using a special unified feature (UFC) is addressed, and it can reduce the search space obviously and exclude most inappropriate matching subblocks before the best matching search. Secondly, on the basis of UFC algorithm, in order to improve the quality of the reconstructed image, a DCT coder is combined to construct a hybrid fractal image algorithm (DUFC). Experimental results show that the proposed algorithms can obtain good quality of the reconstructed images and need much less time than the baseline fractal coding algorithm.
Efficient Online Learning Algorithms Based on LSTM Neural Networks.
Ergen, Tolga; Kozat, Suleyman Serdar
2017-09-13
We investigate online nonlinear regression and introduce novel regression structures based on the long short term memory (LSTM) networks. For the introduced structures, we also provide highly efficient and effective online training methods. To train these novel LSTM-based structures, we put the underlying architecture in a state space form and introduce highly efficient and effective particle filtering (PF)-based updates. We also provide stochastic gradient descent and extended Kalman filter-based updates. Our PF-based training method guarantees convergence to the optimal parameter estimation in the mean square error sense provided that we have a sufficient number of particles and satisfy certain technical conditions. More importantly, we achieve this performance with a computational complexity in the order of the first-order gradient-based methods by controlling the number of particles. Since our approach is generic, we also introduce a gated recurrent unit (GRU)-based approach by directly replacing the LSTM architecture with the GRU architecture, where we demonstrate the superiority of our LSTM-based approach in the sequential prediction task via different real life data sets. In addition, the experimental results illustrate significant performance improvements achieved by the introduced algorithms with respect to the conventional methods over several different benchmark real life data sets.
Computational plasticity algorithm for particle dynamics simulations
Krabbenhoft, K.; Lyamin, A. V.; Vignes, C.
2018-01-01
The problem of particle dynamics simulation is interpreted in the framework of computational plasticity leading to an algorithm which is mathematically indistinguishable from the common implicit scheme widely used in the finite element analysis of elastoplastic boundary value problems. This algorithm provides somewhat of a unification of two particle methods, the discrete element method and the contact dynamics method, which usually are thought of as being quite disparate. In particular, it is shown that the former appears as the special case where the time stepping is explicit while the use of implicit time stepping leads to the kind of schemes usually labelled contact dynamics methods. The framing of particle dynamics simulation within computational plasticity paves the way for new approaches similar (or identical) to those frequently employed in nonlinear finite element analysis. These include mixed implicit-explicit time stepping, dynamic relaxation and domain decomposition schemes.
Microscope self-calibration based on micro laser line imaging and soft computing algorithms
Apolinar Muñoz Rodríguez, J.
2018-06-01
A technique to perform microscope self-calibration via micro laser line and soft computing algorithms is presented. In this technique, the microscope vision parameters are computed by means of soft computing algorithms based on laser line projection. To implement the self-calibration, a microscope vision system is constructed by means of a CCD camera and a 38 μm laser line. From this arrangement, the microscope vision parameters are represented via Bezier approximation networks, which are accomplished through the laser line position. In this procedure, a genetic algorithm determines the microscope vision parameters by means of laser line imaging. Also, the approximation networks compute the three-dimensional vision by means of the laser line position. Additionally, the soft computing algorithms re-calibrate the vision parameters when the microscope vision system is modified during the vision task. The proposed self-calibration improves accuracy of the traditional microscope calibration, which is accomplished via external references to the microscope system. The capability of the self-calibration based on soft computing algorithms is determined by means of the calibration accuracy and the micro-scale measurement error. This contribution is corroborated by an evaluation based on the accuracy of the traditional microscope calibration.
An investigation of genetic algorithms
International Nuclear Information System (INIS)
Douglas, S.R.
1995-04-01
Genetic algorithms mimic biological evolution by natural selection in their search for better individuals within a changing population. they can be used as efficient optimizers. This report discusses the developing field of genetic algorithms. It gives a simple example of the search process and introduces the concept of schema. It also discusses modifications to the basic genetic algorithm that result in species and niche formation, in machine learning and artificial evolution of computer programs, and in the streamlining of human-computer interaction. (author). 3 refs., 1 tab., 2 figs
A primer on the energy efficiency of computing
Energy Technology Data Exchange (ETDEWEB)
Koomey, Jonathan G. [Research Fellow, Steyer-Taylor Center for Energy Policy and Finance, Stanford University (United States)
2015-03-30
The efficiency of computing at peak output has increased rapidly since the dawn of the computer age. This paper summarizes some of the key factors affecting the efficiency of computing in all usage modes. While there is still great potential for improving the efficiency of computing devices, we will need to alter how we do computing in the next few decades because we are finally approaching the limits of current technologies.
Devi, D Chitra; Uthariaraj, V Rhymend
2016-01-01
Cloud computing uses the concepts of scheduling and load balancing to migrate tasks to underutilized VMs for effectively sharing the resources. The scheduling of the nonpreemptive tasks in the cloud computing environment is an irrecoverable restraint and hence it has to be assigned to the most appropriate VMs at the initial placement itself. Practically, the arrived jobs consist of multiple interdependent tasks and they may execute the independent tasks in multiple VMs or in the same VM's multiple cores. Also, the jobs arrive during the run time of the server in varying random intervals under various load conditions. The participating heterogeneous resources are managed by allocating the tasks to appropriate resources by static or dynamic scheduling to make the cloud computing more efficient and thus it improves the user satisfaction. Objective of this work is to introduce and evaluate the proposed scheduling and load balancing algorithm by considering the capabilities of each virtual machine (VM), the task length of each requested job, and the interdependency of multiple tasks. Performance of the proposed algorithm is studied by comparing with the existing methods.
Limits on the efficiency of event-based algorithms for Monte Carlo neutron transport
Directory of Open Access Journals (Sweden)
Paul K. Romano
2017-09-01
Full Text Available The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup due to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size to achieve vector efficiency greater than 90%. When the execution times for events are allowed to vary, the vector speedup is also limited by differences in the execution time for events being carried out in a single event-iteration.
A Novel adaptative Discrete Cuckoo Search Algorithm for parameter optimization in computer vision
Directory of Open Access Journals (Sweden)
loubna benchikhi
2017-10-01
Full Text Available Computer vision applications require choosing operators and their parameters, in order to provide the best outcomes. Often, the users quarry on expert knowledge and must experiment many combinations to find manually the best one. As performance, time and accuracy are important, it is necessary to automate parameter optimization at least for crucial operators. In this paper, a novel approach based on an adaptive discrete cuckoo search algorithm (ADCS is proposed. It automates the process of algorithms’ setting and provides optimal parameters for vision applications. This work reconsiders a discretization problem to adapt the cuckoo search algorithm and presents the procedure of parameter optimization. Some experiments on real examples and comparisons to other metaheuristic-based approaches: particle swarm optimization (PSO, reinforcement learning (RL and ant colony optimization (ACO show the efficiency of this novel method.
The algorithmic level is the bridge between computation and brain.
Love, Bradley C
2015-04-01
Every scientist chooses a preferred level of analysis and this choice shapes the research program, even determining what counts as evidence. This contribution revisits Marr's (1982) three levels of analysis (implementation, algorithmic, and computational) and evaluates the prospect of making progress at each individual level. After reviewing limitations of theorizing within a level, two strategies for integration across levels are considered. One is top-down in that it attempts to build a bridge from the computational to algorithmic level. Limitations of this approach include insufficient theoretical constraint at the computation level to provide a foundation for integration, and that people are suboptimal for reasons other than capacity limitations. Instead, an inside-out approach is forwarded in which all three levels of analysis are integrated via the algorithmic level. This approach maximally leverages mutual data constraints at all levels. For example, algorithmic models can be used to interpret brain imaging data, and brain imaging data can be used to select among competing models. Examples of this approach to integration are provided. This merging of levels raises questions about the relevance of Marr's tripartite view. Copyright © 2015 Cognitive Science Society, Inc.
GATE: Improving the computational efficiency
International Nuclear Information System (INIS)
Staelens, S.; De Beenhouwer, J.; Kruecker, D.; Maigne, L.; Rannou, F.; Ferrer, L.; D'Asseler, Y.; Buvat, I.; Lemahieu, I.
2006-01-01
GATE is a software dedicated to Monte Carlo simulations in Single Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET). An important disadvantage of those simulations is the fundamental burden of computation time. This manuscript describes three different techniques in order to improve the efficiency of those simulations. Firstly, the implementation of variance reduction techniques (VRTs), more specifically the incorporation of geometrical importance sampling, is discussed. After this, the newly designed cluster version of the GATE software is described. The experiments have shown that GATE simulations scale very well on a cluster of homogeneous computers. Finally, an elaboration on the deployment of GATE on the Enabling Grids for E-Science in Europe (EGEE) grid will conclude the description of efficiency enhancement efforts. The three aforementioned methods improve the efficiency of GATE to a large extent and make realistic patient-specific overnight Monte Carlo simulations achievable
Birkegård, Anna Camilla; Andersen, Vibe Dalhoff; Halasa, Tariq; Jensen, Vibeke Frøkjær; Toft, Nils; Vigre, Håkan
2017-10-01
Accurate and detailed data on antimicrobial exposure in pig production are essential when studying the association between antimicrobial exposure and antimicrobial resistance. Due to difficulties in obtaining primary data on antimicrobial exposure in a large number of farms, there is a need for a robust and valid method to estimate the exposure using register data. An approach that estimates the antimicrobial exposure in every rearing period during the lifetime of a pig using register data was developed into a computational algorithm. In this approach data from national registers on antimicrobial purchases, movements of pigs and farm demographics registered at farm level are used. The algorithm traces batches of pigs retrospectively from slaughter to the farm(s) that housed the pigs during their finisher, weaner, and piglet period. Subsequently, the algorithm estimates the antimicrobial exposure as the number of Animal Defined Daily Doses for treatment of one kg pig in each of the rearing periods. Thus, the antimicrobial purchase data at farm level are translated into antimicrobial exposure estimates at batch level. A batch of pigs is defined here as pigs sent to slaughter at the same day from the same farm. In this study we present, validate, and optimise a computational algorithm that calculate the lifetime exposure of antimicrobials for slaughter pigs. The algorithm was evaluated by comparing the computed estimates to data on antimicrobial usage from farm records in 15 farm units. We found a good positive correlation between the two estimates. The algorithm was run for Danish slaughter pigs sent to slaughter in January to March 2015 from farms with more than 200 finishers to estimate the proportion of farms that it was applicable for. In the final process, the algorithm was successfully run for batches of pigs originating from 3026 farms with finisher units (77% of the initial population). This number can be increased if more accurate register data can be
A network of spiking neurons for computing sparse representations in an energy-efficient way.
Hu, Tao; Genkin, Alexander; Chklovskii, Dmitri B
2012-11-01
Computing sparse redundant representations is an important problem in both applied mathematics and neuroscience. In many applications, this problem must be solved in an energy-efficient way. Here, we propose a hybrid distributed algorithm (HDA), which solves this problem on a network of simple nodes communicating by low-bandwidth channels. HDA nodes perform both gradient-descent-like steps on analog internal variables and coordinate-descent-like steps via quantized external variables communicated to each other. Interestingly, the operation is equivalent to a network of integrate-and-fire neurons, suggesting that HDA may serve as a model of neural computation. We show that the numerical performance of HDA is on par with existing algorithms. In the asymptotic regime, the representation error of HDA decays with time, t, as 1/t. HDA is stable against time-varying noise; specifically, the representation error decays as 1/√t for gaussian white noise.
Improved Ant Colony Clustering Algorithm and Its Performance Study
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
International Nuclear Information System (INIS)
Liu, Lei; Liu, Shuntao; Yuan, Fuh-Gwo
2012-01-01
A distributed on-board algorithm that is embedded and executed within a group of wireless sensors to locate structural damages in isotropic plates is presented. The algorithm is based on an energy-decay model of Lamb waves and singular value decomposition (SVD) to determine damage locations. A sensor group consists of a small number of sensors, each of which independently collects wave signals and evaluates wave energy upon an external triggering signal sent from a base station. The energy values, usually a few bytes in length, are then sent to the base station to determine the presence and location of damages. In comparison with traditional centralized approaches in which whole datasets are required to be transmitted, the proposed algorithm yields much less wireless communication traffic, yet with a modest amount of computation required within sensors. Experiments have shown that the algorithm is robust to locate damage for isotropic plate structures and is very power efficient, with more than an order-of-magnitude power saving
An exact and efficient first passage time algorithm for reaction–diffusion processes on a 2D-lattice
International Nuclear Information System (INIS)
Bezzola, Andri; Bales, Benjamin B.; Alkire, Richard C.; Petzold, Linda R.
2014-01-01
We present an exact and efficient algorithm for reaction–diffusion–nucleation processes on a 2D-lattice. The algorithm makes use of first passage time (FPT) to replace the computationally intensive simulation of diffusion hops in KMC by larger jumps when particles are far away from step-edges or other particles. Our approach computes exact probability distributions of jump times and target locations in a closed-form formula, based on the eigenvectors and eigenvalues of the corresponding 1D transition matrix, maintaining atomic-scale resolution of resulting shapes of deposit islands. We have applied our method to three different test cases of electrodeposition: pure diffusional aggregation for large ranges of diffusivity rates and for simulation domain sizes of up to 4096×4096 sites, the effect of diffusivity on island shapes and sizes in combination with a KMC edge diffusion, and the calculation of an exclusion zone in front of a step-edge, confirming statistical equivalence to standard KMC simulations. The algorithm achieves significant speedup compared to standard KMC for cases where particles diffuse over long distances before nucleating with other particles or being captured by larger islands
An exact and efficient first passage time algorithm for reaction–diffusion processes on a 2D-lattice
Energy Technology Data Exchange (ETDEWEB)
Bezzola, Andri, E-mail: andri.bezzola@gmail.com [Mechanical Engineering Department, University of California, Santa Barbara, CA 93106 (United States); Bales, Benjamin B., E-mail: bbbales2@gmail.com [Mechanical Engineering Department, University of California, Santa Barbara, CA 93106 (United States); Alkire, Richard C., E-mail: r-alkire@uiuc.edu [Department of Chemical Engineering, University of Illinois, Urbana, IL 61801 (United States); Petzold, Linda R., E-mail: petzold@engineering.ucsb.edu [Mechanical Engineering Department and Computer Science Department, University of California, Santa Barbara, CA 93106 (United States)
2014-01-01
We present an exact and efficient algorithm for reaction–diffusion–nucleation processes on a 2D-lattice. The algorithm makes use of first passage time (FPT) to replace the computationally intensive simulation of diffusion hops in KMC by larger jumps when particles are far away from step-edges or other particles. Our approach computes exact probability distributions of jump times and target locations in a closed-form formula, based on the eigenvectors and eigenvalues of the corresponding 1D transition matrix, maintaining atomic-scale resolution of resulting shapes of deposit islands. We have applied our method to three different test cases of electrodeposition: pure diffusional aggregation for large ranges of diffusivity rates and for simulation domain sizes of up to 4096×4096 sites, the effect of diffusivity on island shapes and sizes in combination with a KMC edge diffusion, and the calculation of an exclusion zone in front of a step-edge, confirming statistical equivalence to standard KMC simulations. The algorithm achieves significant speedup compared to standard KMC for cases where particles diffuse over long distances before nucleating with other particles or being captured by larger islands.
Limits on the Efficiency of Event-Based Algorithms for Monte Carlo Neutron Transport
Energy Technology Data Exchange (ETDEWEB)
Romano, Paul K.; Siegel, Andrew R.
2017-04-16
The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup due to vectorization as a function of two parameters: the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size in order to achieve vector efficiency greater than 90%. When the execution times for events are allowed to vary, however, the vector speedup is also limited by differences in execution time for events being carried out in a single event-iteration. For some problems, this implies that vector effciencies over 50% may not be attainable. While there are many factors impacting performance of an event-based algorithm that are not captured by our model, it nevertheless provides insights into factors that may be limiting in a real implementation.
Iterative concurrent reconstruction algorithms for emission computed tomography
International Nuclear Information System (INIS)
Brown, J.K.; Hasegawa, B.H.; Lang, T.F.
1994-01-01
Direct reconstruction techniques, such as those based on filtered backprojection, are typically used for emission computed tomography (ECT), even though it has been argued that iterative reconstruction methods may produce better clinical images. The major disadvantage of iterative reconstruction algorithms, and a significant reason for their lack of clinical acceptance, is their computational burden. We outline a new class of ''concurrent'' iterative reconstruction techniques for ECT in which the reconstruction process is reorganized such that a significant fraction of the computational processing occurs concurrently with the acquisition of ECT projection data. These new algorithms use the 10-30 min required for acquisition of a typical SPECT scan to iteratively process the available projection data, significantly reducing the requirements for post-acquisition processing. These algorithms are tested on SPECT projection data from a Hoffman brain phantom acquired with a 2 x 10 5 counts in 64 views each having 64 projections. The SPECT images are reconstructed as 64 x 64 tomograms, starting with six angular views. Other angular views are added to the reconstruction process sequentially, in a manner that reflects their availability for a typical acquisition protocol. The results suggest that if T s of concurrent processing are used, the reconstruction processing time required after completion of the data acquisition can be reduced by at least 1/3 T s. (Author)
Energy Efficient and Safe Weighted Clustering Algorithm for Mobile Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Amine Dahane
2015-01-01
Full Text Available The main concern of clustering approaches for mobile wireless sensor networks (WSNs is to prolong the battery life of the individual sensors and the network lifetime. For a successful clustering approach the need of a powerful mechanism to safely elect a cluster head remains a challenging task in many research works that take into account the mobility of the network. The approach based on the computing of the weight of each node in the network is one of the proposed techniques to deal with this problem. In this paper, we propose an energy efficient and safe weighted clustering algorithm (ES-WCA for mobile WSNs using a combination of five metrics. Among these metrics lies the behavioral level metric which promotes a safe choice of a cluster head in the sense where this last one will never be a malicious node. Moreover, the highlight of our work is summarized in a comprehensive strategy for monitoring the network, in order to detect and remove the malicious nodes. We use simulation study to demonstrate the performance of the proposed algorithm.
Efficient Algorithms for a Family of Matroid Intersection Problems
National Research Council Canada - National Science Library
Gabow, Harold N; Tarjan, Robert E
1982-01-01
.... its efficiency is demonstrated by implementations on specific matroids. In all cases but one, the running time matches the best-known algorithm for the problem without the red element constraint...
Efficient Geometric Sound Propagation Using Visibility Culling
Chandak, Anish
2011-07-01
Simulating propagation of sound can improve the sense of realism in interactive applications such as video games and can lead to better designs in engineering applications such as architectural acoustics. In this thesis, we present geometric sound propagation techniques which are faster than prior methods and map well to upcoming parallel multi-core CPUs. We model specular reflections by using the image-source method and model finite-edge diffraction by using the well-known Biot-Tolstoy-Medwin (BTM) model. We accelerate the computation of specular reflections by applying novel visibility algorithms, FastV and AD-Frustum, which compute visibility from a point. We accelerate finite-edge diffraction modeling by applying a novel visibility algorithm which computes visibility from a region. Our visibility algorithms are based on frustum tracing and exploit recent advances in fast ray-hierarchy intersections, data-parallel computations, and scalable, multi-core algorithms. The AD-Frustum algorithm adapts its computation to the scene complexity and allows small errors in computing specular reflection paths for higher computational efficiency. FastV and our visibility algorithm from a region are general, object-space, conservative visibility algorithms that together significantly reduce the number of image sources compared to other techniques while preserving the same accuracy. Our geometric propagation algorithms are an order of magnitude faster than prior approaches for modeling specular reflections and two to ten times faster for modeling finite-edge diffraction. Our algorithms are interactive, scale almost linearly on multi-core CPUs, and can handle large, complex, and dynamic scenes. We also compare the accuracy of our sound propagation algorithms with other methods. Once sound propagation is performed, it is desirable to listen to the propagated sound in interactive and engineering applications. We can generate smooth, artifact-free output audio signals by applying
Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation
Sun, Xiao; Zhang, Tongda; Chai, Yueting; Liu, Yi
2015-01-01
Most of popular clustering methods typically have some strong assumptions of the dataset. For example, the k-means implicitly assumes that all clusters come from spherical Gaussian distributions which have different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions might not be valid anymore. In order to overcome this weakness, we proposed a new clustering algorithm named localized ambient solidity separation (LASS) algorithm, using a new isolation criterion called centroid distance. Compared with other density based isolation criteria, our proposed centroid distance isolation criterion addresses the problem caused by high dimensionality and varying density. The experiment on a designed two-dimensional benchmark dataset shows that our proposed LASS algorithm not only inherits the advantage of the original dissimilarity increments clustering method to separate naturally isolated clusters but also can identify the clusters which are adjacent, overlapping, and under background noise. Finally, we compared our LASS algorithm with the dissimilarity increments clustering method on a massive computer user dataset with over two million records that contains demographic and behaviors information. The results show that LASS algorithm works extremely well on this computer user dataset and can gain more knowledge from it. PMID:26221133
Work-Efficient Parallel Skyline Computation for the GPU
DEFF Research Database (Denmark)
Bøgh, Kenneth Sejdenfaden; Chester, Sean; Assent, Ira
2015-01-01
offers the potential for parallelizing skyline computation across thousands of cores. However, attempts to port skyline algorithms to the GPU have prioritized throughput and failed to outperform sequential algorithms. In this paper, we introduce a new skyline algorithm, designed for the GPU, that uses...... a global, static partitioning scheme. With the partitioning, we can permit controlled branching to exploit transitive relationships and avoid most point-to-point comparisons. The result is a non-traditional GPU algorithm, SkyAlign, that prioritizes work-effciency and respectable throughput, rather than...
Mental Computation or Standard Algorithm? Children's Strategy Choices on Multi-Digit Subtractions
Torbeyns, Joke; Verschaffel, Lieven
2016-01-01
This study analyzed children's use of mental computation strategies and the standard algorithm on multi-digit subtractions. Fifty-eight Flemish 4th graders of varying mathematical achievement level were individually offered subtractions that either stimulated the use of mental computation strategies or the standard algorithm in one choice and two…
Lee, Byungjin; Lee, Young Jae; Sung, Sangkyung
2018-05-01
A novel attitude determination method is investigated that is computationally efficient and implementable in low cost sensor and embedded platform. Recent result on attitude reference system design is adapted to further develop a three-dimensional attitude determination algorithm through the relative velocity incremental measurements. For this, velocity incremental vectors, computed respectively from INS and GPS with different update rate, are compared to generate filter measurement for attitude estimation. In the quaternion-based Kalman filter configuration, an Euler-like attitude perturbation angle is uniquely introduced for reducing filter states and simplifying propagation processes. Furthermore, assuming a small angle approximation between attitude update periods, it is shown that the reduced order filter greatly simplifies the propagation processes. For performance verification, both simulation and experimental studies are completed. A low cost MEMS IMU and GPS receiver are employed for system integration, and comparison with the true trajectory or a high-grade navigation system demonstrates the performance of the proposed algorithm.
Limits on efficient computation in the physical world
Aaronson, Scott Joel
More than a speculative technology, quantum computing seems to challenge our most basic intuitions about how the physical world should behave. In this thesis I show that, while some intuitions from classical computer science must be jettisoned in the light of modern physics, many others emerge nearly unscathed; and I use powerful tools from computational complexity theory to help determine which are which. In the first part of the thesis, I attack the common belief that quantum computing resembles classical exponential parallelism, by showing that quantum computers would face serious limitations on a wider range of problems than was previously known. In particular, any quantum algorithm that solves the collision problem---that of deciding whether a sequence of n integers is one-to-one or two-to-one---must query the sequence O (n1/5) times. This resolves a question that was open for years; previously no lower bound better than constant was known. A corollary is that there is no "black-box" quantum algorithm to break cryptographic hash functions or solve the Graph Isomorphism problem in polynomial time. I also show that relative to an oracle, quantum computers could not solve NP-complete problems in polynomial time, even with the help of nonuniform "quantum advice states"; and that any quantum algorithm needs O (2n/4/n) queries to find a local minimum of a black-box function on the n-dimensional hypercube. Surprisingly, the latter result also leads to new classical lower bounds for the local search problem. Finally, I give new lower bounds on quantum one-way communication complexity, and on the quantum query complexity of total Boolean functions and recursive Fourier sampling. The second part of the thesis studies the relationship of the quantum computing model to physical reality. I first examine the arguments of Leonid Levin, Stephen Wolfram, and others who believe quantum computing to be fundamentally impossible. I find their arguments unconvincing without a "Sure
Algorithms for the Computation of Debris Risk
Matney, Mark J.
2017-01-01
Determining the risks from space debris involve a number of statistical calculations. These calculations inevitably involve assumptions about geometry - including the physical geometry of orbits and the geometry of satellites. A number of tools have been developed in NASA’s Orbital Debris Program Office to handle these calculations; many of which have never been published before. These include algorithms that are used in NASA’s Orbital Debris Engineering Model ORDEM 3.0, as well as other tools useful for computing orbital collision rates and ground casualty risks. This paper presents an introduction to these algorithms and the assumptions upon which they are based.
Algorithm comparison and benchmarking using a parallel spectra transform shallow water model
Energy Technology Data Exchange (ETDEWEB)
Worley, P.H. [Oak Ridge National Lab., TN (United States); Foster, I.T.; Toonen, B. [Argonne National Lab., IL (United States)
1995-04-01
In recent years, a number of computer vendors have produced supercomputers based on a massively parallel processing (MPP) architecture. These computers have been shown to be competitive in performance with conventional vector supercomputers for some applications. As spectral weather and climate models are heavy users of vector supercomputers, it is interesting to determine how these models perform on MPPS, and which MPPs are best suited to the execution of spectral models. The benchmarking of MPPs is complicated by the fact that different algorithms may be more efficient on different architectures. Hence, a comprehensive benchmarking effort must answer two related questions: which algorithm is most efficient on each computer and how do the most efficient algorithms compare on different computers. In general, these are difficult questions to answer because of the high cost associated with implementing and evaluating a range of different parallel algorithms on each MPP platform.
Computational performance of a projection and rescaling algorithm
Pena, Javier; Soheili, Negar
2018-01-01
This paper documents a computational implementation of a {\\em projection and rescaling algorithm} for finding most interior solutions to the pair of feasibility problems \\[ \\text{find} \\; x\\in L\\cap\\mathbb{R}^n_{+} \\;\\;\\;\\; \\text{ and } \\; \\;\\;\\;\\; \\text{find} \\; \\hat x\\in L^\\perp\\cap\\mathbb{R}^n_{+}, \\] where $L$ denotes a linear subspace in $\\mathbb{R}^n$ and $L^\\perp$ denotes its orthogonal complement. The projection and rescaling algorithm is a recently developed method that combines a {\\...
A New Block Processing Algorithm of LLL for Fast High-dimension Ambiguity Resolution
Directory of Open Access Journals (Sweden)
LIU Wanke
2016-02-01
Full Text Available Due to high dimension and precision for the ambiguity vector under GNSS observations of multi-frequency and multi-system, a major problem to limit computational efficiency of ambiguity resolution is the longer reduction time when using conventional LLL algorithm. To address this problem, it is proposed a new block processing algorithm of LLL by analyzing the relationship between the reduction time and the dimensions and precision of ambiguity. The new algorithm reduces the reduction time to improve computational efficiency of ambiguity resolution, which is based on block processing ambiguity variance-covariance matrix that decreased the dimensions of single reduction matrix. It is validated that the new algorithm with two groups of measured data. The results show that the computing efficiency of the new algorithm increased by 65.2% and 60.2% respectively compared with that of LLL algorithm when choosing a reasonable number of blocks.
Computation-aware algorithm selection approach for interlaced-to-progressive conversion
Park, Sang-Jun; Jeon, Gwanggil; Jeong, Jechang
2010-05-01
We discuss deinterlacing results in a computationally constrained and varied environment. The proposed computation-aware algorithm selection approach (CASA) for fast interlaced to progressive conversion algorithm consists of three methods: the line-averaging (LA) method for plain regions, the modified edge-based line-averaging (MELA) method for medium regions, and the proposed covariance-based adaptive deinterlacing (CAD) method for complex regions. The proposed CASA uses two criteria, mean-squared error (MSE) and CPU time, for assigning the method. We proposed a CAD method. The principle idea of CAD is based on the correspondence between the high and low-resolution covariances. We estimated the local covariance coefficients from an interlaced image using Wiener filtering theory and then used these optimal minimum MSE interpolation coefficients to obtain a deinterlaced image. The CAD method, though more robust than most known methods, was not found to be very fast compared to the others. To alleviate this issue, we proposed an adaptive selection approach using a fast deinterlacing algorithm rather than using only one CAD algorithm. The proposed hybrid approach of switching between the conventional schemes (LA and MELA) and our CAD was proposed to reduce the overall computational load. A reliable condition to be used for switching the schemes was presented after a wide set of initial training processes. The results of computer simulations showed that the proposed methods outperformed a number of methods presented in the literature.
Implementation of PHENIX trigger algorithms on massively parallel computers
International Nuclear Information System (INIS)
Petridis, A.N.; Wohn, F.K.
1995-01-01
The event selection requirements of contemporary high energy and nuclear physics experiments are met by the introduction of on-line trigger algorithms which identify potentially interesting events and reduce the data acquisition rate to levels that are manageable by the electronics. Such algorithms being parallel in nature can be simulated off-line using massively parallel computers. The PHENIX experiment intends to investigate the possible existence of a new phase of matter called the quark gluon plasma which has been theorized to have existed in very early stages of the evolution of the universe by studying collisions of heavy nuclei at ultra-relativistic energies. Such interactions can also reveal important information regarding the structure of the nucleus and mandate a thorough investigation of the simpler proton-nucleus collisions at the same energies. The complexity of PHENIX events and the need to analyze and also simulate them at rates similar to the data collection ones imposes enormous computation demands. This work is a first effort to implement PHENIX trigger algorithms on parallel computers and to study the feasibility of using such machines to run the complex programs necessary for the simulation of the PHENIX detector response. Fine and coarse grain approaches have been studied and evaluated. Depending on the application the performance of a massively parallel computer can be much better or much worse than that of a serial workstation. A comparison between single instruction and multiple instruction computers is also made and possible applications of the single instruction machines to high energy and nuclear physics experiments are outlined. copyright 1995 American Institute of Physics
An efficient algorithm for calculation of the Luenberger canonical form.
Jordan, D.; Sridhar, B.
1973-01-01
A new algorithm is presented to obtain the Luenberger canonical form for multivariable systems. A distinct feature of the method is that the canonical form is obtained directly and, if necessary, the similarity transformation can be computed. There is a substantial reduction in the amount of computation compared to Luenberger's method. The reduced computations along with Gaussian techniques lend greater inherent accuracy and the ability to refine the solution with additional computations. An example is presented to illustrate the technique.
Realization of Deutsch-like algorithm using ensemble computing
International Nuclear Information System (INIS)
Wei Daxiu; Luo Jun; Sun Xianping; Zeng Xizhi
2003-01-01
The Deutsch-like algorithm [Phys. Rev. A. 63 (2001) 034101] distinguishes between even and odd query functions using fewer function calls than its possible classical counterpart in a two-qubit system. But the similar method cannot be applied to a multi-qubit system. We propose a new approach for solving Deutsch-like problem using ensemble computing. The proposed algorithm needs an ancillary qubit and can be easily extended to multi-qubit system with one query. Our ensemble algorithm beginning with a easily-prepared initial state has three main steps. The classifications of the functions can be obtained directly from the spectra of the ancilla qubit. We also demonstrate the new algorithm in a four-qubit molecular system using nuclear magnetic resonance (NMR). One hydrogen and three carbons are selected as the four qubits, and one of carbons is ancilla qubit. We choice two unitary transformations, corresponding to two functions (one odd function and one even function), to validate the ensemble algorithm. The results show that our experiment is successfully and our ensemble algorithm for solving the Deutsch-like problem is virtual
Sort-Mid tasks scheduling algorithm in grid computing.
Reda, Naglaa M; Tawfik, A; Marzok, Mohamed A; Khamis, Soheir M
2015-11-01
Scheduling tasks on heterogeneous resources distributed over a grid computing system is an NP-complete problem. The main aim for several researchers is to develop variant scheduling algorithms for achieving optimality, and they have shown a good performance for tasks scheduling regarding resources selection. However, using of the full power of resources is still a challenge. In this paper, a new heuristic algorithm called Sort-Mid is proposed. It aims to maximizing the utilization and minimizing the makespan. The new strategy of Sort-Mid algorithm is to find appropriate resources. The base step is to get the average value via sorting list of completion time of each task. Then, the maximum average is obtained. Finally, the task has the maximum average is allocated to the machine that has the minimum completion time. The allocated task is deleted and then, these steps are repeated until all tasks are allocated. Experimental tests show that the proposed algorithm outperforms almost other algorithms in terms of resources utilization and makespan.
Privacy-Preserving Computation with Trusted Computing via Scramble-then-Compute
Dang Hung; Dinh Tien Tuan Anh; Chang Ee-Chien; Ooi Beng Chin
2017-01-01
We consider privacy-preserving computation of big data using trusted computing primitives with limited private memory. Simply ensuring that the data remains encrypted outside the trusted computing environment is insufficient to preserve data privacy, for data movement observed during computation could leak information. While it is possible to thwart such leakage using generic solution such as ORAM [42], designing efficient privacy-preserving algorithms is challenging. Besides computation effi...
A coordinate descent MM algorithm for fast computation of sparse logistic PCA
Lee, Seokho
2013-06-01
Sparse logistic principal component analysis was proposed in Lee et al. (2010) for exploratory analysis of binary data. Relying on the joint estimation of multiple principal components, the algorithm therein is computationally too demanding to be useful when the data dimension is high. We develop a computationally fast algorithm using a combination of coordinate descent and majorization-minimization (MM) auxiliary optimization. Our new algorithm decouples the joint estimation of multiple components into separate estimations and consists of closed-form elementwise updating formulas for each sparse principal component. The performance of the proposed algorithm is tested using simulation and high-dimensional real-world datasets. © 2013 Elsevier B.V. All rights reserved.
Fast algorithm for automatically computing Strahler stream order
Lanfear, Kenneth J.
1990-01-01
An efficient algorithm was developed to determine Strahler stream order for segments of stream networks represented in a Geographic Information System (GIS). The algorithm correctly assigns Strahler stream order in topologically complex situations such as braided streams and multiple drainage outlets. Execution time varies nearly linearly with the number of stream segments in the network. This technique is expected to be particularly useful for studying the topology of dense stream networks derived from digital elevation model data.
Arbitrated Quantum Signature with Hamiltonian Algorithm Based on Blind Quantum Computation
Shi, Ronghua; Ding, Wanting; Shi, Jinjing
2018-03-01
A novel arbitrated quantum signature (AQS) scheme is proposed motivated by the Hamiltonian algorithm (HA) and blind quantum computation (BQC). The generation and verification of signature algorithm is designed based on HA, which enables the scheme to rely less on computational complexity. It is unnecessary to recover original messages when verifying signatures since the blind quantum computation is applied, which can improve the simplicity and operability of our scheme. It is proved that the scheme can be deployed securely, and the extended AQS has some extensive applications in E-payment system, E-government, E-business, etc.
Combinatorial optimization algorithms and complexity
Papadimitriou, Christos H
1998-01-01
This clearly written, mathematically rigorous text includes a novel algorithmic exposition of the simplex method and also discusses the Soviet ellipsoid algorithm for linear programming; efficient algorithms for network flow, matching, spanning trees, and matroids; the theory of NP-complete problems; approximation algorithms, local search heuristics for NP-complete problems, more. All chapters are supplemented by thought-provoking problems. A useful work for graduate-level students with backgrounds in computer science, operations research, and electrical engineering.
Algorithms for the Computation of Debris Risks
Matney, Mark
2017-01-01
Determining the risks from space debris involve a number of statistical calculations. These calculations inevitably involve assumptions about geometry - including the physical geometry of orbits and the geometry of non-spherical satellites. A number of tools have been developed in NASA's Orbital Debris Program Office to handle these calculations; many of which have never been published before. These include algorithms that are used in NASA's Orbital Debris Engineering Model ORDEM 3.0, as well as other tools useful for computing orbital collision rates and ground casualty risks. This paper will present an introduction to these algorithms and the assumptions upon which they are based.
A general algorithm for computing distance transforms in linear time
Meijster, A.; Roerdink, J.B.T.M.; Hesselink, W.H.; Goutsias, J; Vincent, L; Bloomberg, DS
2000-01-01
A new general algorithm fur computing distance transforms of digital images is presented. The algorithm consists of two phases. Both phases consist of two scans, a forward and a backward scan. The first phase scans the image column-wise, while the second phase scans the image row-wise. Since the
An efficient and accurate 3D displacements tracking strategy for digital volume correlation
Pan, Bing; Wang, Bo; Wu, Dafang; Lubineau, Gilles
2014-07-01
Owing to its inherent computational complexity, practical implementation of digital volume correlation (DVC) for internal displacement and strain mapping faces important challenges in improving its computational efficiency. In this work, an efficient and accurate 3D displacement tracking strategy is proposed for fast DVC calculation. The efficiency advantage is achieved by using three improvements. First, to eliminate the need of updating Hessian matrix in each iteration, an efficient 3D inverse compositional Gauss-Newton (3D IC-GN) algorithm is introduced to replace existing forward additive algorithms for accurate sub-voxel displacement registration. Second, to ensure the 3D IC-GN algorithm that converges accurately and rapidly and avoid time-consuming integer-voxel displacement searching, a generalized reliability-guided displacement tracking strategy is designed to transfer accurate and complete initial guess of deformation for each calculation point from its computed neighbors. Third, to avoid the repeated computation of sub-voxel intensity interpolation coefficients, an interpolation coefficient lookup table is established for tricubic interpolation. The computational complexity of the proposed fast DVC and the existing typical DVC algorithms are first analyzed quantitatively according to necessary arithmetic operations. Then, numerical tests are performed to verify the performance of the fast DVC algorithm in terms of measurement accuracy and computational efficiency. The experimental results indicate that, compared with the existing DVC algorithm, the presented fast DVC algorithm produces similar precision and slightly higher accuracy at a substantially reduced computational cost.
An efficient and accurate 3D displacements tracking strategy for digital volume correlation
Pan, Bing
2014-07-01
Owing to its inherent computational complexity, practical implementation of digital volume correlation (DVC) for internal displacement and strain mapping faces important challenges in improving its computational efficiency. In this work, an efficient and accurate 3D displacement tracking strategy is proposed for fast DVC calculation. The efficiency advantage is achieved by using three improvements. First, to eliminate the need of updating Hessian matrix in each iteration, an efficient 3D inverse compositional Gauss-Newton (3D IC-GN) algorithm is introduced to replace existing forward additive algorithms for accurate sub-voxel displacement registration. Second, to ensure the 3D IC-GN algorithm that converges accurately and rapidly and avoid time-consuming integer-voxel displacement searching, a generalized reliability-guided displacement tracking strategy is designed to transfer accurate and complete initial guess of deformation for each calculation point from its computed neighbors. Third, to avoid the repeated computation of sub-voxel intensity interpolation coefficients, an interpolation coefficient lookup table is established for tricubic interpolation. The computational complexity of the proposed fast DVC and the existing typical DVC algorithms are first analyzed quantitatively according to necessary arithmetic operations. Then, numerical tests are performed to verify the performance of the fast DVC algorithm in terms of measurement accuracy and computational efficiency. The experimental results indicate that, compared with the existing DVC algorithm, the presented fast DVC algorithm produces similar precision and slightly higher accuracy at a substantially reduced computational cost. © 2014 Elsevier Ltd.
Efficiency of free-energy calculations of spin lattices by spectral quantum algorithms
International Nuclear Information System (INIS)
Master, Cyrus P.; Yamaguchi, Fumiko; Yamamoto, Yoshihisa
2003-01-01
Ensemble quantum algorithms are well suited to calculate estimates of the energy spectra for spin-lattice systems. Based on the phase estimation algorithm, these algorithms efficiently estimate discrete Fourier coefficients of the density of states. Their efficiency in calculating the free energy per spin of general spin lattices to bounded error is examined. We find that the number of Fourier components required to bound the error in the free energy due to the broadening of the density of states scales polynomially with the number of spins in the lattice. However, the precision with which the Fourier components must be calculated is found to be an exponential function of the system size
A Line Search Multilevel Truncated Newton Algorithm for Computing the Optical Flow
Directory of Open Access Journals (Sweden)
Lluís Garrido
2015-06-01
Full Text Available We describe the implementation details and give the experimental results of three optimization algorithms for dense optical flow computation. In particular, using a line search strategy, we evaluate the performance of the unilevel truncated Newton method (LSTN, a multiresolution truncated Newton (MR/LSTN and a full multigrid truncated Newton (FMG/LSTN. We use three image sequences and four models of optical flow for performance evaluation. The FMG/LSTN algorithm is shown to lead to better optical flow estimation with less computational work than both the LSTN and MR/LSTN algorithms.
An Adaptive Evolutionary Algorithm for Traveling Salesman Problem with Precedence Constraints
Directory of Open Access Journals (Sweden)
Jinmo Sung
2014-01-01
Full Text Available Traveling sales man problem with precedence constraints is one of the most notorious problems in terms of the efficiency of its solution approach, even though it has very wide range of industrial applications. We propose a new evolutionary algorithm to efficiently obtain good solutions by improving the search process. Our genetic operators guarantee the feasibility of solutions over the generations of population, which significantly improves the computational efficiency even when it is combined with our flexible adaptive searching strategy. The efficiency of the algorithm is investigated by computational experiments.
An improved algorithm for finding all minimal paths in a network
International Nuclear Information System (INIS)
Bai, Guanghan; Tian, Zhigang; Zuo, Ming J.
2016-01-01
Minimal paths (MPs) play an important role in network reliability evaluation. In this paper, we report an efficient recursive algorithm for finding all MPs in two-terminal networks, which consist of a source node and a sink node. A linked path structure indexed by nodes is introduced, which accepts both directed and undirected form of networks. The distance between each node and the sink node is defined, and a simple recursive algorithm is presented for labeling the distance for each node. Based on the distance between each node and the sink node, additional conditions for backtracking are incorporated to reduce the number of search branches. With the newly introduced linked node structure, the distances between each node and the sink node, and the additional backtracking conditions, an improved backtracking algorithm for searching for all MPs is developed. In addition, the proposed algorithm can be adapted to search for all minimal paths for each source–sink pair in networks consisting of multiple source nodes and/or multiple sink nodes. Through computational experiments, it is demonstrated that the proposed algorithm is more efficient than existing algorithms when the network size is not too small. The proposed algorithm becomes more advantageous as the size of the network grows. - Highlights: • A linked path structure indexed by nodes is introduced to represent networks. • Additional conditions for backtracking are proposed based on the distance of each node. • An efficient algorithm is developed to find all MPs for two-terminal networks. • The computational efficiency of the algorithm for two-terminal networks is investigated. • The computational efficiency of the algorithm for multi-terminal networks is investigated.
CSA: An efficient algorithm to improve circular DNA multiple alignment
Directory of Open Access Journals (Sweden)
Pereira Luísa
2009-07-01
Full Text Available Abstract Background The comparison of homologous sequences from different species is an essential approach to reconstruct the evolutionary history of species and of the genes they harbour in their genomes. Several complete mitochondrial and nuclear genomes are now available, increasing the importance of using multiple sequence alignment algorithms in comparative genomics. MtDNA has long been used in phylogenetic analysis and errors in the alignments can lead to errors in the interpretation of evolutionary information. Although a large number of multiple sequence alignment algorithms have been proposed to date, they all deal with linear DNA and cannot handle directly circular DNA. Researchers interested in aligning circular DNA sequences must first rotate them to the "right" place using an essentially manual process, before they can use multiple sequence alignment tools. Results In this paper we propose an efficient algorithm that identifies the most interesting region to cut circular genomes in order to improve phylogenetic analysis when using standard multiple sequence alignment algorithms. This algorithm identifies the largest chain of non-repeated longest subsequences common to a set of circular mitochondrial DNA sequences. All the sequences are then rotated and made linear for multiple alignment purposes. To evaluate the effectiveness of this new tool, three different sets of mitochondrial DNA sequences were considered. Other tests considering randomly rotated sequences were also performed. The software package Arlequin was used to evaluate the standard genetic measures of the alignments obtained with and without the use of the CSA algorithm with two well known multiple alignment algorithms, the CLUSTALW and the MAVID tools, and also the visualization tool SinicView. Conclusion The results show that a circularization and rotation pre-processing step significantly improves the efficiency of public available multiple sequence alignment
Directory of Open Access Journals (Sweden)
D. Chitra Devi
2016-01-01
Full Text Available Cloud computing uses the concepts of scheduling and load balancing to migrate tasks to underutilized VMs for effectively sharing the resources. The scheduling of the nonpreemptive tasks in the cloud computing environment is an irrecoverable restraint and hence it has to be assigned to the most appropriate VMs at the initial placement itself. Practically, the arrived jobs consist of multiple interdependent tasks and they may execute the independent tasks in multiple VMs or in the same VM’s multiple cores. Also, the jobs arrive during the run time of the server in varying random intervals under various load conditions. The participating heterogeneous resources are managed by allocating the tasks to appropriate resources by static or dynamic scheduling to make the cloud computing more efficient and thus it improves the user satisfaction. Objective of this work is to introduce and evaluate the proposed scheduling and load balancing algorithm by considering the capabilities of each virtual machine (VM, the task length of each requested job, and the interdependency of multiple tasks. Performance of the proposed algorithm is studied by comparing with the existing methods.
Portfolios of quantum algorithms.
Maurer, S M; Hogg, T; Huberman, B A
2001-12-17
Quantum computation holds promise for the solution of many intractable problems. However, since many quantum algorithms are stochastic in nature they can find the solution of hard problems only probabilistically. Thus the efficiency of the algorithms has to be characterized by both the expected time to completion and the associated variance. In order to minimize both the running time and its uncertainty, we show that portfolios of quantum algorithms analogous to those of finance can outperform single algorithms when applied to the NP-complete problems such as 3-satisfiability.
Directory of Open Access Journals (Sweden)
Rabha W. Ibrahim
2018-01-01
Full Text Available The maximum min utility function (MMUF problem is an important representative of a large class of cloud computing systems (CCS. Having numerous applications in practice, especially in economy and industry. This paper introduces an effective solution-based search (SBS algorithm for solving the problem MMUF. First, we suggest a new formula of the utility function in term of the capacity of the cloud. We formulate the capacity in CCS, by using a fractional diffeo-integral equation. This equation usually describes the flow of CCS. The new formula of the utility function is modified recent active utility functions. The suggested technique first creates a high-quality initial solution by eliminating the less promising components, and then develops the quality of the achieved solution by the summation search solution (SSS. This method is considered by the Mittag-Leffler sum as hash functions to determine the position of the agent. Experimental results commonly utilized in the literature demonstrate that the proposed algorithm competes approvingly with the state-of-the-art algorithms both in terms of solution quality and computational efficiency.
Directory of Open Access Journals (Sweden)
MEI, G.
2015-05-01
Full Text Available In the calculating of convex hulls for point sets, a preprocessing procedure that is to filter the input points by discarding non-extreme points is commonly used to improve the computational efficiency. We previously proposed a quite straightforward preprocessing approach for accelerating 2D convex hull computation on the GPU. In this paper, we extend that algorithm to being used in 3D cases. The basic ideas behind these two preprocessing algorithms are similar: first, several groups of extreme points are found according to the original set of input points and several rotated versions of the input set; then, a convex polyhedron is created using the found extreme points; and finally those interior points locating inside the formed convex polyhedron are discarded. Experimental results show that: when employing the proposed preprocessing algorithm, it achieves the speedups of about 4x on average and 5x to 6x in the best cases over the cases where the proposed approach is not used. In addition, more than 95 percent of the input points can be discarded in most experimental tests.
Gerjuoy, Edward
2005-06-01
The security of messages encoded via the widely used RSA public key encryption system rests on the enormous computational effort required to find the prime factors of a large number N using classical (conventional) computers. In 1994 Peter Shor showed that for sufficiently large N, a quantum computer could perform the factoring with much less computational effort. This paper endeavors to explain, in a fashion comprehensible to the nonexpert, the RSA encryption protocol; the various quantum computer manipulations constituting the Shor algorithm; how the Shor algorithm performs the factoring; and the precise sense in which a quantum computer employing Shor's algorithm can be said to accomplish the factoring of very large numbers with less computational effort than a classical computer. It is made apparent that factoring N generally requires many successive runs of the algorithm. Our analysis reveals that the probability of achieving a successful factorization on a single run is about twice as large as commonly quoted in the literature.
Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che
2014-01-16
To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high
Robust efficient video fingerprinting
Puri, Manika; Lubin, Jeffrey
2009-02-01
We have developed a video fingerprinting system with robustness and efficiency as the primary and secondary design criteria. In extensive testing, the system has shown robustness to cropping, letter-boxing, sub-titling, blur, drastic compression, frame rate changes, size changes and color changes, as well as to the geometric distortions often associated with camcorder capture in cinema settings. Efficiency is afforded by a novel two-stage detection process in which a fast matching process first computes a number of likely candidates, which are then passed to a second slower process that computes the overall best match with minimal false alarm probability. One key component of the algorithm is a maximally stable volume computation - a three-dimensional generalization of maximally stable extremal regions - that provides a content-centric coordinate system for subsequent hash function computation, independent of any affine transformation or extensive cropping. Other key features include an efficient bin-based polling strategy for initial candidate selection, and a final SIFT feature-based computation for final verification. We describe the algorithm and its performance, and then discuss additional modifications that can provide further improvement to efficiency and accuracy.
Toward efficient computation of the expected relative entropy for nonlinear experimental design
International Nuclear Information System (INIS)
Coles, Darrell; Prange, Michael
2012-01-01
The expected relative entropy between prior and posterior model-parameter distributions is a Bayesian objective function in experimental design theory that quantifies the expected gain in information of an experiment relative to a previous state of knowledge. The expected relative entropy is a preferred measure of experimental quality because it can handle nonlinear data-model relationships, an important fact due to the ubiquity of nonlinearity in science and engineering and its effects on post-inversion parameter uncertainty. This objective function does not necessarily yield experiments that mediate well-determined systems, but, being a Bayesian quality measure, it rigorously accounts for prior information which constrains model parameters that may be only weakly constrained by the optimized dataset. Historically, use of the expected relative entropy has been limited by the computing and storage requirements associated with high-dimensional numerical integration. Herein, a bifocal algorithm is developed that makes these computations more efficient. The algorithm is demonstrated on a medium-sized problem of sampling relaxation phenomena and on a large problem of source–receiver selection for a 2D vertical seismic profile. The method is memory intensive but workarounds are discussed. (paper)
An Algorithm for Fast Computation of 3D Zernike Moments for Volumetric Images
Hosny, Khalid M.; Hafez, Mohamed A.
2012-01-01
An algorithm was proposed for very fast and low-complexity computation of three-dimensional Zernike moments. The 3D Zernike moments were expressed in terms of exact 3D geometric moments where the later are computed exactly through the mathematical integration of the monomial terms over the digital image/object voxels. A new symmetry-based method was proposed to compute 3D Zernike moments with 87% reduction in the computational complexity. A fast 1D cascade algorithm was also employed to add m...
A new fast algorithm for computing a complex number: Theoretic transforms
Reed, I. S.; Liu, K. Y.; Truong, T. K.
1977-01-01
A high-radix fast Fourier transformation (FFT) algorithm for computing transforms over GF(sq q), where q is a Mersenne prime, is developed to implement fast circular convolutions. This new algorithm requires substantially fewer multiplications than the conventional FFT.
Parallel External Memory Graph Algorithms
DEFF Research Database (Denmark)
Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari
2010-01-01
In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one o f the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of Â¿(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....
Signal and image processing algorithm performance in a virtual and elastic computing environment
Bennett, Kelly W.; Robertson, James
2013-05-01
The U.S. Army Research Laboratory (ARL) supports the development of classification, detection, tracking, and localization algorithms using multiple sensing modalities including acoustic, seismic, E-field, magnetic field, PIR, and visual and IR imaging. Multimodal sensors collect large amounts of data in support of algorithm development. The resulting large amount of data, and their associated high-performance computing needs, increases and challenges existing computing infrastructures. Purchasing computer power as a commodity using a Cloud service offers low-cost, pay-as-you-go pricing models, scalability, and elasticity that may provide solutions to develop and optimize algorithms without having to procure additional hardware and resources. This paper provides a detailed look at using a commercial cloud service provider, such as Amazon Web Services (AWS), to develop and deploy simple signal and image processing algorithms in a cloud and run the algorithms on a large set of data archived in the ARL Multimodal Signatures Database (MMSDB). Analytical results will provide performance comparisons with existing infrastructure. A discussion on using cloud computing with government data will discuss best security practices that exist within cloud services, such as AWS.
Sort-Mid tasks scheduling algorithm in grid computing
Directory of Open Access Journals (Sweden)
Naglaa M. Reda
2015-11-01
Full Text Available Scheduling tasks on heterogeneous resources distributed over a grid computing system is an NP-complete problem. The main aim for several researchers is to develop variant scheduling algorithms for achieving optimality, and they have shown a good performance for tasks scheduling regarding resources selection. However, using of the full power of resources is still a challenge. In this paper, a new heuristic algorithm called Sort-Mid is proposed. It aims to maximizing the utilization and minimizing the makespan. The new strategy of Sort-Mid algorithm is to find appropriate resources. The base step is to get the average value via sorting list of completion time of each task. Then, the maximum average is obtained. Finally, the task has the maximum average is allocated to the machine that has the minimum completion time. The allocated task is deleted and then, these steps are repeated until all tasks are allocated. Experimental tests show that the proposed algorithm outperforms almost other algorithms in terms of resources utilization and makespan.
Development of computational algorithms for quantification of pulmonary structures
International Nuclear Information System (INIS)
Oliveira, Marcela de; Alvarez, Matheus; Alves, Allan F.F.; Miranda, Jose R.A.; Pina, Diana R.
2012-01-01
The high-resolution computed tomography has become the imaging diagnostic exam most commonly used for the evaluation of the squeals of Paracoccidioidomycosis. The subjective evaluations the radiological abnormalities found on HRCT images do not provide an accurate quantification. The computer-aided diagnosis systems produce a more objective assessment of the abnormal patterns found in HRCT images. Thus, this research proposes the development of algorithms in MATLAB® computing environment can quantify semi-automatically pathologies such as pulmonary fibrosis and emphysema. The algorithm consists in selecting a region of interest (ROI), and by the use of masks, filter densities and morphological operators, to obtain a quantification of the injured area to the area of a healthy lung. The proposed method was tested on ten HRCT scans of patients with confirmed PCM. The results of semi-automatic measurements were compared with subjective evaluations performed by a specialist in radiology, falling to a coincidence of 80% for emphysema and 58% for fibrosis. (author)
An Efficient MapReduce-Based Parallel Clustering Algorithm for Distributed Traffic Subarea Division
Directory of Open Access Journals (Sweden)
Dawen Xia
2015-01-01
Full Text Available Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs. Since existing methods may not be suitable for big traffic data processing, this paper presents a MapReduce-based Parallel Three-Phase K-Means (Par3PKM algorithm for solving traffic subarea division problem on a widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K-Means and then employ a MapReduce paradigm to redesign the optimized K-Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identifying method to connect the borders of clustering results for each cluster. Finally, we divide traffic subarea of Beijing based on real-world trajectory data sets generated by 12,000 taxis in a period of one month using the proposed approach. Experimental evaluation results indicate that when compared with K-Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, more accuracy, and better scalability and can effectively divide traffic subarea with big taxi trajectory data.
An efficient community detection algorithm using greedy surprise maximization
International Nuclear Information System (INIS)
Jiang, Yawen; Jia, Caiyan; Yu, Jian
2014-01-01
Community detection is an important and crucial problem in complex network analysis. Although classical modularity function optimization approaches are widely used for identifying communities, the modularity function (Q) suffers from its resolution limit. Recently, the surprise function (S) was experimentally proved to be better than the Q function. However, up until now, there has been no algorithm available to perform searches to directly determine the maximal surprise values. In this paper, considering the superiority of the S function over the Q function, we propose an efficient community detection algorithm called AGSO (algorithm based on greedy surprise optimization) and its improved version FAGSO (fast-AGSO), which are based on greedy surprise optimization and do not suffer from the resolution limit. In addition, (F)AGSO does not need the number of communities K to be specified in advance. Tests on experimental networks show that (F)AGSO is able to detect optimal partitions in both simple and even more complex networks. Moreover, algorithms based on surprise maximization perform better than those algorithms based on modularity maximization, including Blondel–Guillaume–Lambiotte–Lefebvre (BGLL), Clauset–Newman–Moore (CNM) and the other state-of-the-art algorithms such as Infomap, order statistics local optimization method (OSLOM) and label propagation algorithm (LPA). (paper)
A Traffic Prediction Algorithm for Street Lighting Control Efficiency
Directory of Open Access Journals (Sweden)
POPA Valentin
2013-01-01
Full Text Available This paper presents the development of a traffic prediction algorithm that can be integrated in a street lighting monitoring and control system. The prediction algorithm must enable the reduction of energy costs and improve energy efficiency by decreasing the light intensity depending on the traffic level. The algorithm analyses and processes the information received at the command center based on the traffic level at different moments. The data is collected by means of the Doppler vehicle detection sensors integrated within the system. Thus, two methods are used for the implementation of the algorithm: a neural network and a k-NN (k-Nearest Neighbor prediction algorithm. For 500 training cycles, the mean square error of the neural network is 9.766 and for 500.000 training cycles the error amounts to 0.877. In case of the k-NN algorithm the error increases from 8.24 for k=5 to 12.27 for a number of 50 neighbors. In terms of a root means square error parameter, the use of a neural network ensures the highest performance level and can be integrated in a street lighting control system.
Asymptotic optimality and efficient computation of the leave-subject-out cross-validation
Xu, Ganggang
2012-12-01
Although the leave-subject-out cross-validation (CV) has been widely used in practice for tuning parameter selection for various nonparametric and semiparametric models of longitudinal data, its theoretical property is unknown and solving the associated optimization problem is computationally expensive, especially when there are multiple tuning parameters. In this paper, by focusing on the penalized spline method, we show that the leave-subject-out CV is optimal in the sense that it is asymptotically equivalent to the empirical squared error loss function minimization. An efficient Newton-type algorithm is developed to compute the penalty parameters that optimize the CV criterion. Simulated and real data are used to demonstrate the effectiveness of the leave-subject-out CV in selecting both the penalty parameters and the working correlation matrix. © 2012 Institute of Mathematical Statistics.
Computationally efficient method for optical simulation of solar cells and their applications
Semenikhin, I.; Zanuccoli, M.; Fiegna, C.; Vyurkov, V.; Sangiorgi, E.
2013-01-01
This paper presents two novel implementations of the Differential method to solve the Maxwell equations in nanostructured optoelectronic solid state devices. The first proposed implementation is based on an improved and computationally efficient T-matrix formulation that adopts multiple-precision arithmetic to tackle the numerical instability problem which arises due to evanescent modes. The second implementation adopts the iterative approach that allows to achieve low computational complexity O(N logN) or better. The proposed algorithms may work with structures with arbitrary spatial variation of the permittivity. The developed two-dimensional numerical simulator is applied to analyze the dependence of the absorption characteristics of a thin silicon slab on the morphology of the front interface and on the angle of incidence of the radiation with respect to the device surface.
Asymptotic optimality and efficient computation of the leave-subject-out cross-validation
Xu, Ganggang; Huang, Jianhua Z.
2012-01-01
Although the leave-subject-out cross-validation (CV) has been widely used in practice for tuning parameter selection for various nonparametric and semiparametric models of longitudinal data, its theoretical property is unknown and solving the associated optimization problem is computationally expensive, especially when there are multiple tuning parameters. In this paper, by focusing on the penalized spline method, we show that the leave-subject-out CV is optimal in the sense that it is asymptotically equivalent to the empirical squared error loss function minimization. An efficient Newton-type algorithm is developed to compute the penalty parameters that optimize the CV criterion. Simulated and real data are used to demonstrate the effectiveness of the leave-subject-out CV in selecting both the penalty parameters and the working correlation matrix. © 2012 Institute of Mathematical Statistics.
International Nuclear Information System (INIS)
Sunaguchi, Naoki; Yuasa, Tetsuya; Gupta, Rajiv; Ando, Masami
2015-01-01
The main focus of this paper is reconstruction of tomographic phase-contrast image from a set of projections. We propose an efficient reconstruction algorithm for differential phase-contrast computed tomography that can considerably reduce the number of projections required for reconstruction. The key result underlying this research is a projection theorem that states that the second derivative of the projection set is linearly related to the Laplacian of the tomographic image. The proposed algorithm first reconstructs the Laplacian image of the phase-shift distribution from the second-derivative of the projections using total variation regularization. The second step is to obtain the phase-shift distribution by solving a Poisson equation whose source is the Laplacian image previously reconstructed under the Dirichlet condition. We demonstrate the efficacy of this algorithm using both synthetically generated simulation data and projection data acquired experimentally at a synchrotron. The experimental phase data were acquired from a human coronary artery specimen using dark-field-imaging optics pioneered by our group. Our results demonstrate that the proposed algorithm can reduce the number of projections to approximately 33% as compared with the conventional filtered backprojection method, without any detrimental effect on the image quality
Directory of Open Access Journals (Sweden)
K. Shalini
2013-01-01
Full Text Available Green computing is all about using computers in a smarter and eco-friendly way. It is the environmentally responsible use of computers and related resources which includes the implementation of energy-efficient central processing units, servers and peripherals as well as reduced resource consumption and proper disposal of electronic waste .Computers certainly make up a large part of many people lives and traditionally are extremely damaging to the environment. Manufacturers of computer and its parts have been espousing the green cause to help protect environment from computers and electronic waste in any way.Research continues into key areas such as making the use of computers as energy-efficient as Possible, and designing algorithms and systems for efficiency-related computer technologies.
DEFF Research Database (Denmark)
Clausen, Johan Christian; Damkilde, Lars; Andersen, Lars
2007-01-01
. The stress return and the formation of the constitutive matrix is carried out in principal stress space. Here the manipulations simplify and rely on geometrical arguments. The singularities arising at the intersection of yield planes are dealt with in a straightforward way also based on geometrical......An efficient return algorithm for stress update in numerical plasticity computations is presented. The yield criterion must be linear in principal stress space and can be composed of any number of yield planes. Each of these yield planes may have an associated or non-associated flow rule...
Efficient Resource Management in Cloud Computing
Rushikesh Shingade; Amit Patil; Shivam Suryawanshi; M. Venkatesan
2015-01-01
Cloud computing, one of the widely used technology to provide cloud services for users who are charged for receiving services. In the aspect of a maximum number of resources, evaluating the performance of Cloud resource management policies are difficult to optimize efficiently. There are different simulation toolkits available for simulation and modelling the Cloud computing environment like GridSim CloudAnalyst, CloudSim, GreenCloud, CloudAuction etc. In proposed Efficient Resource Manage...
Introduction: a brief overview of iterative algorithms in X-ray computed tomography.
Soleimani, M; Pengpen, T
2015-06-13
This paper presents a brief overview of some basic iterative algorithms, and more sophisticated methods are presented in the research papers in this issue. A range of algebraic iterative algorithms are covered here including ART, SART and OS-SART. A major limitation of the traditional iterative methods is their computational time. The Krylov subspace based methods such as the conjugate gradients (CG) algorithm and its variants can be used to solve linear systems of equations arising from large-scale CT with possible implementation using modern high-performance computing tools. The overall aim of this theme issue is to stimulate international efforts to develop the next generation of X-ray computed tomography (CT) image reconstruction software. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
International Nuclear Information System (INIS)
Murase, Kenya; Itoh, Hisao; Mogami, Hiroshi; Ishine, Masashiro; Kawamura, Masashi; Iio, Atsushi; Hamamoto, Ken
1987-01-01
A computer based simulation method was developed to assess the relative effectiveness and availability of various attenuation compensation algorithms in single photon emission computed tomography (SPECT). The effect of the nonuniformity of attenuation coefficient distribution in the body, the errors in determining a body contour and the statistical noise on reconstruction accuracy and the computation time in using the algorithms were studied. The algorithms were classified into three groups: precorrection, post correction and iterative correction methods. Furthermore, a hybrid method was devised by combining several methods. This study will be useful for understanding the characteristics limitations and strengths of the algorithms and searching for a practical correction method for photon attenuation in SPECT. (orig.)
International Nuclear Information System (INIS)
Mellor, R.A.; Harrington, C.L.; Bard, S.T.
1984-01-01
Proposed changes to 10CFR20 combining internal and external exposures will require accurate and precise in vivo bioassay data. One of the many uncertainties in the interpretation of in vivo bioassay data is the imprecise knowledge of the location of any observed radioactivity within the body of an individual. Attempts to minimize this uncertainty have been made by collimating the field of view of a single photon detector to each organ or body system of concern. In each of these cases, full removal of any potential gamma flux from organs other than the desired organ is not achieved. In certain commercially available systems this ''cross talk'' may range from 20 to 40 percent. A computerized algorithm has been developed which resolves this ''cross talk'' for all observed radionuclides in a system composed of two high purity germanium photon detectors separately viewing the lung and GI regions of a subject. The algorithm routinely applies cross talk correction factors and photopeak detection efficiencies to the net spectral photopeak areas determined by a peak search methodology. Separate lung and GI activities, corrected for cross talk, are calculated and reported. The logic utilized in the total software package, as well as the derivation of the cross talk correction factors, will be discussed. Any limitations of the computer algorithm when applied to various radioactivity levels will also be identified. An evaluation of the cross talk factors for potential use in differentiating surface contamination from true organ burdens will be presented. In addition, the capability to efficiently execute this software using a low cost, portable stand-alone computer system will be demonstrated
Implementation and analysis of a Navier-Stokes algorithm on parallel computers
Fatoohi, Raad A.; Grosch, Chester E.
1988-01-01
The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
Autumn Algorithm-Computation of Hybridization Networks for Realistic Phylogenetic Trees.
Huson, Daniel H; Linz, Simone
2018-01-01
A minimum hybridization network is a rooted phylogenetic network that displays two given rooted phylogenetic trees using a minimum number of reticulations. Previous mathematical work on their calculation has usually assumed the input trees to be bifurcating, correctly rooted, or that they both contain the same taxa. These assumptions do not hold in biological studies and "realistic" trees have multifurcations, are difficult to root, and rarely contain the same taxa. We present a new algorithm for computing minimum hybridization networks for a given pair of "realistic" rooted phylogenetic trees. We also describe how the algorithm might be used to improve the rooting of the input trees. We introduce the concept of "autumn trees", a nice framework for the formulation of algorithms based on the mathematics of "maximum acyclic agreement forests". While the main computational problem is hard, the run-time depends mainly on how different the given input trees are. In biological studies, where the trees are reasonably similar, our parallel implementation performs well in practice. The algorithm is available in our open source program Dendroscope 3, providing a platform for biologists to explore rooted phylogenetic networks. We demonstrate the utility of the algorithm using several previously studied data sets.
Directory of Open Access Journals (Sweden)
Ahmad Shokuh Saljoughi
2018-01-01
Full Text Available Today, cloud computing has become popular among users in organizations and companies. Security and efficiency are the two major issues facing cloud service providers and their customers. Since cloud computing is a virtual pool of resources provided in an open environment (Internet, cloud-based services entail security risks. Detection of intrusions and attacks through unauthorized users is one of the biggest challenges for both cloud service providers and cloud users. In the present study, artificial intelligence techniques, e.g. MLP Neural Network sand particle swarm optimization algorithm, were used to detect intrusion and attacks. The methods were tested for NSL-KDD, KDD-CUP datasets. The results showed improved accuracy in detecting attacks and intrusions by unauthorized users.
A genetic algorithm applied to a PWR turbine extraction optimization to increase cycle efficiency
International Nuclear Information System (INIS)
Sacco, Wagner F.; Schirru, Roberto
2002-01-01
In nuclear power plants feedwater heaters are used to heat feedwater from its temperature leaving the condenser to final feedwater temperature using steam extracted from various stages of the turbines. The purpose of this process is to increase cycle efficiency. The determination of the optimal fraction of mass flow rate to be extracted from each stage of the turbines is a complex optimization problem. This kind of problem has been efficiently solved by means of evolutionary computation techniques, such as Genetic Algorithms (GAs). GAs, which are systems based upon principles from biological genetics, have been successfully applied to several combinatorial optimization problems in nuclear engineering, as the nuclear fuel reload optimization problem. We introduce the use of GAs in cycle efficiency optimization by finding an optimal combination of turbine extractions. In order to demonstrate the effectiveness of our approach, we have chosen a typical PWR as case study. The secondary side of the PWR was simulated using PEPSE, which is a modeling tool used to perform integrated heat balances for power plants. The results indicate that the GA is a quite promising tool for cycle efficiency optimization. (author)
A novel computer algorithm for modeling and treating mandibular fractures: A pilot study.
Rizzi, Christopher J; Ortlip, Timothy; Greywoode, Jewel D; Vakharia, Kavita T; Vakharia, Kalpesh T
2017-02-01
To describe a novel computer algorithm that can model mandibular fracture repair. To evaluate the algorithm as a tool to model mandibular fracture reduction and hardware selection. Retrospective pilot study combined with cross-sectional survey. A computer algorithm utilizing Aquarius Net (TeraRecon, Inc, Foster City, CA) and Adobe Photoshop CS6 (Adobe Systems, Inc, San Jose, CA) was developed to model mandibular fracture repair. Ten different fracture patterns were selected from nine patients who had already undergone mandibular fracture repair. The preoperative computed tomography (CT) images were processed with the computer algorithm to create virtual images that matched the actual postoperative three-dimensional CT images. A survey comparing the true postoperative image with the virtual postoperative images was created and administered to otolaryngology resident and attending physicians. They were asked to rate on a scale from 0 to 10 (0 = completely different; 10 = identical) the similarity between the two images in terms of the fracture reduction and fixation hardware. Ten mandible fracture cases were analyzed and processed. There were 15 survey respondents. The mean score for overall similarity between the images was 8.41 ± 0.91; the mean score for similarity of fracture reduction was 8.61 ± 0.98; and the mean score for hardware appearance was 8.27 ± 0.97. There were no significant differences between attending and resident responses. There were no significant differences based on fracture location. This computer algorithm can accurately model mandibular fracture repair. Images created by the algorithm are highly similar to true postoperative images. The algorithm can potentially assist a surgeon planning mandibular fracture repair. 4. Laryngoscope, 2016 127:331-336, 2017. © 2016 The American Laryngological, Rhinological and Otological Society, Inc.
Two efficient label-equivalence-based connected-component labeling algorithms for 3-D binary images.
He, Lifeng; Chao, Yuyan; Suzuki, Kenji
2011-08-01
Whenever one wants to distinguish, recognize, and/or measure objects (connected components) in binary images, labeling is required. This paper presents two efficient label-equivalence-based connected-component labeling algorithms for 3-D binary images. One is voxel based and the other is run based. For the voxel-based one, we present an efficient method of deciding the order for checking voxels in the mask. For the run-based one, instead of assigning each foreground voxel, we assign each run a provisional label. Moreover, we use run data to label foreground voxels without scanning any background voxel in the second scan. Experimental results have demonstrated that our voxel-based algorithm is efficient for 3-D binary images with complicated connected components, that our run-based one is efficient for those with simple connected components, and that both are much more efficient than conventional 3-D labeling algorithms.
Energy Technology Data Exchange (ETDEWEB)
Ahn, Surl-Hee; Grate, Jay W.; Darve, Eric F.
2017-08-21
Molecular dynamics (MD) simulations are useful in obtaining thermodynamic and kinetic properties of bio-molecules but are limited by the timescale barrier, i.e., we may be unable to efficiently obtain properties because we need to run microseconds or longer simulations using femtoseconds time steps. While there are several existing methods to overcome this timescale barrier and efficiently sample thermodynamic and/or kinetic properties, problems remain in regard to being able to sample un- known systems, deal with high-dimensional space of collective variables, and focus the computational effort on slow timescales. Hence, a new sampling method, called the “Concurrent Adaptive Sampling (CAS) algorithm,” has been developed to tackle these three issues and efficiently obtain conformations and pathways. The method is not constrained to use only one or two collective variables, unlike most reaction coordinate-dependent methods. Instead, it can use a large number of collective vari- ables and uses macrostates (a partition of the collective variable space) to enhance the sampling. The exploration is done by running a large number of short simula- tions, and a clustering technique is used to accelerate the sampling. In this paper, we introduce the new methodology and show results from two-dimensional models and bio-molecules, such as penta-alanine and triazine polymer
An Efficient Virtual Machine Consolidation Scheme for Multimedia Cloud Computing.
Han, Guangjie; Que, Wenhui; Jia, Gangyong; Shu, Lei
2016-02-18
Cloud computing has innovated the IT industry in recent years, as it can delivery subscription-based services to users in the pay-as-you-go model. Meanwhile, multimedia cloud computing is emerging based on cloud computing to provide a variety of media services on the Internet. However, with the growing popularity of multimedia cloud computing, its large energy consumption cannot only contribute to greenhouse gas emissions, but also result in the rising of cloud users' costs. Therefore, the multimedia cloud providers should try to minimize its energy consumption as much as possible while satisfying the consumers' resource requirements and guaranteeing quality of service (QoS). In this paper, we have proposed a remaining utilization-aware (RUA) algorithm for virtual machine (VM) placement, and a power-aware algorithm (PA) is proposed to find proper hosts to shut down for energy saving. These two algorithms have been combined and applied to cloud data centers for completing the process of VM consolidation. Simulation results have shown that there exists a trade-off between the cloud data center's energy consumption and service-level agreement (SLA) violations. Besides, the RUA algorithm is able to deal with variable workload to prevent hosts from overloading after VM placement and to reduce the SLA violations dramatically.
Tanaka, T.; Tachikawa, Y.; Ichikawa, Y.; Yorozu, K.
2017-12-01
Flood is one of the most hazardous disasters and causes serious damage to people and property around the world. To prevent/mitigate flood damage through early warning system and/or river management planning, numerical modelling of flood-inundation processes is essential. In a literature, flood-inundation models have been extensively developed and improved to achieve flood flow simulation with complex topography at high resolution. With increasing demands on flood-inundation modelling, its computational burden is now one of the key issues. Improvements of computational efficiency of full shallow water equations are made from various perspectives such as approximations of the momentum equations, parallelization technique, and coarsening approaches. To support these techniques and more improve the computational efficiency of flood-inundation simulations, this study proposes an Automatic Domain Updating (ADU) method of 2-D flood-inundation simulation. The ADU method traces the wet and dry interface and automatically updates the simulation domain in response to the progress and recession of flood propagation. The updating algorithm is as follow: first, to register the simulation cells potentially flooded at initial stage (such as floodplains nearby river channels), and then if a registered cell is flooded, to register its surrounding cells. The time for this additional process is saved by checking only cells at wet and dry interface. The computation time is reduced by skipping the processing time of non-flooded area. This algorithm is easily applied to any types of 2-D flood inundation models. The proposed ADU method is implemented to 2-D local inertial equations for the Yodo River basin, Japan. Case studies for two flood events show that the simulation is finished within two to 10 times smaller time showing the same result as that without the ADU method.
Intelligent cloud computing security using genetic algorithm as a computational tools
Razuky AL-Shaikhly, Mazin H.
2018-05-01
An essential change had occurred in the field of Information Technology which represented with cloud computing, cloud giving virtual assets by means of web yet awesome difficulties in the field of information security and security assurance. Currently main problem with cloud computing is how to improve privacy and security for cloud “cloud is critical security”. This paper attempts to solve cloud security by using intelligent system with genetic algorithm as wall to provide cloud data secure, all services provided by cloud must detect who receive and register it to create list of users (trusted or un-trusted) depend on behavior. The execution of present proposal has shown great outcome.
A Biologically-Inspired Power Control Algorithm for Energy-Efficient Cellular Networks
Directory of Open Access Journals (Sweden)
Hyun-Ho Choi
2016-03-01
Full Text Available Most of the energy used to operate a cellular network is consumed by a base station (BS, and reducing the transmission power of a BS can therefore afford a substantial reduction in the amount of energy used in a network. In this paper, we propose a distributed transmit power control (TPC algorithm inspired by bird flocking behavior as a means of improving the energy efficiency of a cellular network. Just as each bird in a flock attempts to match its velocity with the average velocity of adjacent birds, in the proposed algorithm, each mobile station (MS in a cell matches its rate with the average rate of the co-channel MSs in adjacent cells by controlling the transmit power of its serving BS. We verify that this bio-inspired TPC algorithm using a local rate-average process achieves an exponential convergence and maximizes the minimum rate of the MSs concerned. Simulation results show that the proposed TPC algorithm follows the same convergence properties as the flocking algorithm and also effectively reduces the power consumption at the BSs while maintaining a low outage probability as the inter-cell interference increases; in so doing, it significantly improves the energy efficiency of a cellular network.
A Parallel Prefix Algorithm for Almost Toeplitz Tridiagonal Systems
Sun, Xian-He; Joslin, Ronald D.
1995-01-01
A compact scheme is a discretization scheme that is advantageous in obtaining highly accurate solutions. However, the resulting systems from compact schemes are tridiagonal systems that are difficult to solve efficiently on parallel computers. Considering the almost symmetric Toeplitz structure, a parallel algorithm, simple parallel prefix (SPP), is proposed. The SPP algorithm requires less memory than the conventional LU decomposition and is efficient on parallel machines. It consists of a prefix communication pattern and AXPY operations. Both the computation and the communication can be truncated without degrading the accuracy when the system is diagonally dominant. A formal accuracy study has been conducted to provide a simple truncation formula. Experimental results have been measured on a MasPar MP-1 SIMD machine and on a Cray 2 vector machine. Experimental results show that the simple parallel prefix algorithm is a good algorithm for symmetric, almost symmetric Toeplitz tridiagonal systems and for the compact scheme on high-performance computers.
Madni, Syed Hamid Hussain; Abd Latiff, Muhammad Shafie; Abdullahi, Mohammed; Usman, Mohammed Joda
2017-01-01
Cloud computing infrastructure is suitable for meeting computational needs of large task sizes. Optimal scheduling of tasks in cloud computing environment has been proved to be an NP-complete problem, hence the need for the application of heuristic methods. Several heuristic algorithms have been developed and used in addressing this problem, but choosing the appropriate algorithm for solving task assignment problem of a particular nature is difficult since the methods are developed under different assumptions. Therefore, six rule based heuristic algorithms are implemented and used to schedule autonomous tasks in homogeneous and heterogeneous environments with the aim of comparing their performance in terms of cost, degree of imbalance, makespan and throughput. First Come First Serve (FCFS), Minimum Completion Time (MCT), Minimum Execution Time (MET), Max-min, Min-min and Sufferage are the heuristic algorithms considered for the performance comparison and analysis of task scheduling in cloud computing. PMID:28467505
Madni, Syed Hamid Hussain; Abd Latiff, Muhammad Shafie; Abdullahi, Mohammed; Abdulhamid, Shafi'i Muhammad; Usman, Mohammed Joda
2017-01-01
Cloud computing infrastructure is suitable for meeting computational needs of large task sizes. Optimal scheduling of tasks in cloud computing environment has been proved to be an NP-complete problem, hence the need for the application of heuristic methods. Several heuristic algorithms have been developed and used in addressing this problem, but choosing the appropriate algorithm for solving task assignment problem of a particular nature is difficult since the methods are developed under different assumptions. Therefore, six rule based heuristic algorithms are implemented and used to schedule autonomous tasks in homogeneous and heterogeneous environments with the aim of comparing their performance in terms of cost, degree of imbalance, makespan and throughput. First Come First Serve (FCFS), Minimum Completion Time (MCT), Minimum Execution Time (MET), Max-min, Min-min and Sufferage are the heuristic algorithms considered for the performance comparison and analysis of task scheduling in cloud computing.
Adiabatic quantum search algorithm for structured problems
International Nuclear Information System (INIS)
Roland, Jeremie; Cerf, Nicolas J.
2003-01-01
The study of quantum computation has been motivated by the hope of finding efficient quantum algorithms for solving classically hard problems. In this context, quantum algorithms by local adiabatic evolution have been shown to solve an unstructured search problem with a quadratic speedup over a classical search, just as Grover's algorithm. In this paper, we study how the structure of the search problem may be exploited to further improve the efficiency of these quantum adiabatic algorithms. We show that by nesting a partial search over a reduced set of variables into a global search, it is possible to devise quantum adiabatic algorithms with a complexity that, although still exponential, grows with a reduced order in the problem size
Automated System for Teaching Computational Complexity of Algorithms Course
Directory of Open Access Journals (Sweden)
Vadim S. Roublev
2017-01-01
Full Text Available This article describes problems of designing automated teaching system for “Computational complexity of algorithms” course. This system should provide students with means to familiarize themselves with complex mathematical apparatus and improve their mathematical thinking in the respective area. The article introduces the technique of algorithms symbol scroll table that allows estimating lower and upper bounds of computational complexity. Further, we introduce a set of theorems that facilitate the analysis in cases when the integer rounding of algorithm parameters is involved and when analyzing the complexity of a sum. At the end, the article introduces a normal system of symbol transformations that allows one both to perform any symbol transformations and simplifies the automated validation of such transformations. The article is published in the authors’ wording.
Computational electromagnetics recent advances and engineering applications
2014-01-01
Emerging Topics in Computational Electromagnetics in Computational Electromagnetics presents advances in Computational Electromagnetics. This book is designed to fill the existing gap in current CEM literature that only cover the conventional numerical techniques for solving traditional EM problems. The book examines new algorithms, and applications of these algorithms for solving problems of current interest that are not readily amenable to efficient treatment by using the existing techniques. The authors discuss solution techniques for problems arising in nanotechnology, bioEM, metamaterials, as well as multiscale problems. They present techniques that utilize recent advances in computer technology, such as parallel architectures, and the increasing need to solve large and complex problems in a time efficient manner by using highly scalable algorithms.
Schüller, Anton; Schweitzer, Marc
2017-01-01
The contributions gathered here provide an overview of current research projects and selected software products of the Fraunhofer Institute for Algorithms and Scientific Computing SCAI. They show the wide range of challenges that scientific computing currently faces, the solutions it offers, and its important role in developing applications for industry. Given the exciting field of applied collaborative research and development it discusses, the book will appeal to scientists, practitioners, and students alike. The Fraunhofer Institute for Algorithms and Scientific Computing SCAI combines excellent research and application-oriented development to provide added value for our partners. SCAI develops numerical techniques, parallel algorithms and specialized software tools to support and optimize industrial simulations. Moreover, it implements custom software solutions for production and logistics, and offers calculations on high-performance computers. Its services and products are based on state-of-the-art metho...
Directory of Open Access Journals (Sweden)
Yao-Liang Chung
2016-11-01
Full Text Available The simultaneous aggregation of multiple component carriers (CCs for use by a base station constitutes one of the more promising strategies for providing substantially enhanced bandwidths for packet transmissions in 4th and 5th generation cellular systems. To the best of our knowledge, however, few previous studies have undertaken a thorough investigation of various performance aspects of the use of a simple yet effective packet scheduling algorithm in which multiple CCs are aggregated for transmission in such systems. Consequently, the present study presents an efficient packet scheduling algorithm designed on the basis of the proportional fair criterion for use in multiple-CC systems for downlink transmission. The proposed algorithm includes a focus on providing simultaneous transmission support for both real-time (RT and non-RT traffic. This algorithm can, when applied with sufficiently efficient designs, provide adequate utilization of spectrum resources for the purposes of transmissions, while also improving energy efficiency to some extent. According to simulation results, the performance of the proposed algorithm in terms of system throughput, mean delay, and fairness constitute substantial improvements over those of an algorithm in which the CCs are used independently instead of being aggregated.
On efficient randomized algorithms for finding the PageRank vector
Gasnikov, A. V.; Dmitriev, D. Yu.
2015-03-01
Two randomized methods are considered for finding the PageRank vector; in other words, the solution of the system p T = p T P with a stochastic n × n matrix P, where n ˜ 107-109, is sought (in the class of probability distributions) with accuracy ɛ: ɛ ≫ n -1. Thus, the possibility of brute-force multiplication of P by the column is ruled out in the case of dense objects. The first method is based on the idea of Markov chain Monte Carlo algorithms. This approach is efficient when the iterative process p {/t+1 T} = p {/t T} P quickly reaches a steady state. Additionally, it takes into account another specific feature of P, namely, the nonzero off-diagonal elements of P are equal in rows (this property is used to organize a random walk over the graph with the matrix P). Based on modern concentration-of-measure inequalities, new bounds for the running time of this method are presented that take into account the specific features of P. In the second method, the search for a ranking vector is reduced to finding the equilibrium in the antagonistic matrix game where S n (1) is a unit simplex in ℝ n and I is the identity matrix. The arising problem is solved by applying a slightly modified Grigoriadis-Khachiyan algorithm (1995). This technique, like the Nazin-Polyak method (2009), is a randomized version of Nemirovski's mirror descent method. The difference is that randomization in the Grigoriadis-Khachiyan algorithm is used when the gradient is projected onto the simplex rather than when the stochastic gradient is computed. For sparse matrices P, the method proposed yields noticeably better results.
Unified algorithm for partial differential equations and examples of numerical computation
International Nuclear Information System (INIS)
Watanabe, Tsuguhiro
1999-01-01
A new unified algorithm is proposed to solve partial differential equations which describe nonlinear boundary value problems, eigenvalue problems and time developing boundary value problems. The algorithm is composed of implicit difference scheme and multiple shooting scheme and is named as HIDM (Higher order Implicit Difference Method). A new prototype computer programs for 2-dimensional partial differential equations is constructed and tested successfully to several problems. Extension of the computer programs to 3 or more higher order dimension problems will be easy due to the direct product type difference scheme. (author)
External-Memory Algorithms and Data Structures
DEFF Research Database (Denmark)
Arge, Lars; Zeh, Norbert
2010-01-01
The data sets involved in many modern applications are often too massive to fit in main memory of even the most powerful computers and must therefore reside on disk. Thus communication between internal and external memory, and not actual computation time, becomes the bottleneck in the computation....... This is due to the huge difference in access time of fast internal memory and slower external memory such as disks. The goal of theoretical work in the area of external memory algorithms (also called I/O algorithms or out-of-core algorithms) has been to develop algorithms that minimize the Input...... in parallel and the use of parallel disks has received a lot of theoretical attention. See below for recent surveys of theoretical results in the area of I/O-efficient algorithms. TPIE is designed to bridge the gap between the theory and practice of parallel I/O systems. It is intended to demonstrate all...
Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal
2008-07-01
UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows to significantly improve on current protein family clusterings which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request.
Computation of the efficiency distribution of a multichannel focusing collimator
International Nuclear Information System (INIS)
Balasubramanian, A.; Venkateswaran, T.V.
1977-01-01
This article describes two computer methods of calculating the point source efficiency distribution functions of a focusing collimator with round tapered holes. The first method which computes only the geometric efficiency distribution is adequate for low energy collimators while the second method which computes both geometric and penetration efficiencies can be made use of for medium and high energy collimators. The scatter contribution to the efficiency is not taken into account. In the first method the efficiency distribution of a single cone of the collimator is obtained and the data are used for computing the distribution of the whole collimator. For high energy collimator the entire detector region is imagined to be divided into elemental areas. Efficiency of the elemental area is computed after suitably weighting for the penetration within the collimator septa, which is determined by three dimensional geometric techniques. The method of computing the line source efficiency distribution from point source distribution is also explained. The formulations have been tested by computing the efficiency distribution of several commercial collimators and collimators fabricated by us. (Auth.)
A New Algorithm for System of Integral Equations
Directory of Open Access Journals (Sweden)
Abdujabar Rasulov
2014-01-01
Full Text Available We develop a new algorithm to solve the system of integral equations. In this new method no need to use matrix weights. Beacause of it, we reduce computational complexity considerable. Using the new algorithm it is also possible to solve an initial boundary value problem for system of parabolic equations. To verify the efficiency, the results of computational experiments are given.
Efficient Multi-Party Computation over Rings
DEFF Research Database (Denmark)
Cramer, Ronald; Fehr, Serge; Ishai, Yuval
2003-01-01
Secure multi-party computation (MPC) is an active research area, and a wide range of literature can be found nowadays suggesting improvements and generalizations of existing protocols in various directions. However, all current techniques for secure MPC apply to functions that are represented by ...... the usefulness of the above results by presenting a novel application of MPC over (non-field) rings to the round-efficient secure computation of the maximum function. Basic Research in Computer Science (www.brics.dk), funded by the Danish National Research Foundation.......Secure multi-party computation (MPC) is an active research area, and a wide range of literature can be found nowadays suggesting improvements and generalizations of existing protocols in various directions. However, all current techniques for secure MPC apply to functions that are represented...... by (boolean or arithmetic) circuits over finite fields. We are motivated by two limitations of these techniques: – Generality. Existing protocols do not apply to computation over more general algebraic structures (except via a brute-force simulation of computation in these structures). – Efficiency. The best...
Evaluation of six TPS algorithms in computing entrance and exit doses
Metwaly, Mohamed; Glegg, Martin; Baggarley, Shaun P.; Elliott, Alex
2014-01-01
Entrance and exit doses are commonly measured in in vivo dosimetry for comparison with expected values, usually generated by the treatment planning system (TPS), to verify accuracy of treatment delivery. This report aims to evaluate the accuracy of six TPS algorithms in computing entrance and exit doses for a 6 MV beam. The algorithms tested were: pencil beam convolution (Eclipse PBC), analytical anisotropic algorithm (Eclipse AAA), AcurosXB (Eclipse AXB), FFT convolution (XiO Convolution), multigrid superposition (XiO Superposition), and Monte Carlo photon (Monaco MC). Measurements with ionization chamber (IC) and diode detector in water phantoms were used as a reference. Comparisons were done in terms of central axis point dose, 1D relative profiles, and 2D absolute gamma analysis. Entrance doses computed by all TPS algorithms agreed to within 2% of the measured values. Exit doses computed by XiO Convolution, XiO Superposition, Eclipse AXB, and Monaco MC agreed with the IC measured doses to within 2%‐3%. Meanwhile, Eclipse PBC and Eclipse AAA computed exit doses were higher than the IC measured doses by up to 5.3% and 4.8%, respectively. Both algorithms assume that full backscatter exists even at the exit level, leading to an overestimation of exit doses. Despite good agreements at the central axis for Eclipse AXB and Monaco MC, 1D relative comparisons showed profiles mismatched at depths beyond 11.5 cm. Overall, the 2D absolute gamma (3%/3 mm) pass rates were better for Monaco MC, while Eclipse AXB failed mostly at the outer 20% of the field area. The findings of this study serve as a useful baseline for the implementation of entrance and exit in vivo dosimetry in clinical departments utilizing any of these six common TPS algorithms for reference comparison. PACS numbers: 87.55.‐x, 87.55.D‐, 87.55.N‐, 87.53.Bn PMID:24892349
Arkin, Ethem; Tekinerdogan, Bedir
2016-01-01
Mapping parallel algorithms to parallel computing platforms requires several activities such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, the mapping of the algorithm to the logical configuration platform and the implementation of the
Spatial compression algorithm for the analysis of very large multivariate images
Keenan, Michael R [Albuquerque, NM
2008-07-15
A method for spatially compressing data sets enables the efficient analysis of very large multivariate images. The spatial compression algorithms use a wavelet transformation to map an image into a compressed image containing a smaller number of pixels that retain the original image's information content. Image analysis can then be performed on a compressed data matrix consisting of a reduced number of significant wavelet coefficients. Furthermore, a block algorithm can be used for performing common operations more efficiently. The spatial compression algorithms can be combined with spectral compression algorithms to provide further computational efficiencies.
Efficient predictive algorithms for image compression
Rosário Lucas, Luís Filipe; Maciel de Faria, Sérgio Manuel; Morais Rodrigues, Nuno Miguel; Liberal Pagliari, Carla
2017-01-01
This book discusses efficient prediction techniques for the current state-of-the-art High Efficiency Video Coding (HEVC) standard, focusing on the compression of a wide range of video signals, such as 3D video, Light Fields and natural images. The authors begin with a review of the state-of-the-art predictive coding methods and compression technologies for both 2D and 3D multimedia contents, which provides a good starting point for new researchers in the field of image and video compression. New prediction techniques that go beyond the standardized compression technologies are then presented and discussed. In the context of 3D video, the authors describe a new predictive algorithm for the compression of depth maps, which combines intra-directional prediction, with flexible block partitioning and linear residue fitting. New approaches are described for the compression of Light Field and still images, which enforce sparsity constraints on linear models. The Locally Linear Embedding-based prediction method is in...
Parallel image encryption algorithm based on discretized chaotic map
International Nuclear Information System (INIS)
Zhou Qing; Wong Kwokwo; Liao Xiaofeng; Xiang Tao; Hu Yue
2008-01-01
Recently, a variety of chaos-based algorithms were proposed for image encryption. Nevertheless, none of them works efficiently in parallel computing environment. In this paper, we propose a framework for parallel image encryption. Based on this framework, a new algorithm is designed using the discretized Kolmogorov flow map. It fulfills all the requirements for a parallel image encryption algorithm. Moreover, it is secure and fast. These properties make it a good choice for image encryption on parallel computing platforms
A comparison between physicians and computer algorithms for form CMS-2728 data reporting.
Malas, Mohammed Said; Wish, Jay; Moorthi, Ranjani; Grannis, Shaun; Dexter, Paul; Duke, Jon; Moe, Sharon
2017-01-01
CMS-2728 form (Medical Evidence Report) assesses 23 comorbidities chosen to reflect poor outcomes and increased mortality risk. Previous studies questioned the validity of physician reporting on forms CMS-2728. We hypothesize that reporting of comorbidities by computer algorithms identifies more comorbidities than physician completion, and, therefore, is more reflective of underlying disease burden. We collected data from CMS-2728 forms for all 296 patients who had incident ESRD diagnosis and received chronic dialysis from 2005 through 2014 at Indiana University outpatient dialysis centers. We analyzed patients' data from electronic medical records systems that collated information from multiple health care sources. Previously utilized algorithms or natural language processing was used to extract data on 10 comorbidities for a period of up to 10 years prior to ESRD incidence. These algorithms incorporate billing codes, prescriptions, and other relevant elements. We compared the presence or unchecked status of these comorbidities on the forms to the presence or absence according to the algorithms. Computer algorithms had higher reporting of comorbidities compared to forms completion by physicians. This remained true when decreasing data span to one year and using only a single health center source. The algorithms determination was well accepted by a physician panel. Importantly, algorithms use significantly increased the expected deaths and lowered the standardized mortality ratios. Using computer algorithms showed superior identification of comorbidities for form CMS-2728 and altered standardized mortality ratios. Adapting similar algorithms in available EMR systems may offer more thorough evaluation of comorbidities and improve quality reporting. © 2016 International Society for Hemodialysis.
Automatic computer aided analysis algorithms and system for adrenal tumors on CT images.
Chai, Hanchao; Guo, Yi; Wang, Yuanyuan; Zhou, Guohui
2017-12-04
The adrenal tumor will disturb the secreting function of adrenocortical cells, leading to many diseases. Different kinds of adrenal tumors require different therapeutic schedules. In the practical diagnosis, it highly relies on the doctor's experience to judge the tumor type by reading the hundreds of CT images. This paper proposed an automatic computer aided analysis method for adrenal tumors detection and classification. It consisted of the automatic segmentation algorithms, the feature extraction and the classification algorithms. These algorithms were then integrated into a system and conducted on the graphic interface by using MATLAB Graphic user interface (GUI). The accuracy of the automatic computer aided segmentation and classification reached 90% on 436 CT images. The experiments proved the stability and reliability of this automatic computer aided analytic system.
Uhr, Leonard
1984-01-01
Computer Science and Applied Mathematics: Algorithm-Structured Computer Arrays and Networks: Architectures and Processes for Images, Percepts, Models, Information examines the parallel-array, pipeline, and other network multi-computers.This book describes and explores arrays and networks, those built, being designed, or proposed. The problems of developing higher-level languages for systems and designing algorithm, program, data flow, and computer structure are also discussed. This text likewise describes several sequences of successively more general attempts to combine the power of arrays wi
Directory of Open Access Journals (Sweden)
Eric Chalmers
2016-12-01
Full Text Available The mammalian brain is thought to use a version of Model-based Reinforcement Learning (MBRL to guide goal-directed behavior, wherein animals consider goals and make plans to acquire desired outcomes. However, conventional MBRL algorithms do not fully explain animals’ ability to rapidly adapt to environmental changes, or learn multiple complex tasks. They also require extensive computation, suggesting that goal-directed behavior is cognitively expensive. We propose here that key features of processing in the hippocampus support a flexible MBRL mechanism for spatial navigation that is computationally efficient and can adapt quickly to change. We investigate this idea by implementing a computational MBRL framework that incorporates features inspired by computational properties of the hippocampus: a hierarchical representation of space, forward sweeps through future spatial trajectories, and context-driven remapping of place cells. We find that a hierarchical abstraction of space greatly reduces the computational load (mental effort required for adaptation to changing environmental conditions, and allows efficient scaling to large problems. It also allows abstract knowledge gained at high levels to guide adaptation to new obstacles. Moreover, a context-driven remapping mechanism allows learning and memory of multiple tasks. Simulating dorsal or ventral hippocampal lesions in our computational framework qualitatively reproduces behavioral deficits observed in rodents with analogous lesions. The framework may thus embody key features of how the brain organizes model-based RL to efficiently solve navigation and other difficult tasks.
A novel iris localization algorithm using correlation filtering
Pohit, Mausumi; Sharma, Jitu
2015-06-01
Fast and efficient segmentation of iris from the eye images is a primary requirement for robust database independent iris recognition. In this paper we have presented a new algorithm for computing the inner and outer boundaries of the iris and locating the pupil centre. Pupil-iris boundary computation is based on correlation filtering approach, whereas iris-sclera boundary is determined through one dimensional intensity mapping. The proposed approach is computationally less extensive when compared with the existing algorithms like Hough transform.
Romano, Paul Kollath
Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O( N ) whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes---in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. The model predictions were verified with
ESHOPPS: A COMPUTATIONAL TOOL TO AID THE TEACHING OF SHORTEST PATH ALGORITHMS
Directory of Open Access Journals (Sweden)
S. J. de A. LIMA
2015-07-01
Full Text Available The development of a computational tool called EShoPPS – Environment for Shortest Path Problem Solving, which is used to assist students in understanding the working of Dijkstra, Greedy search and A*(star algorithms is presented in this paper. Such algorithms are commonly taught in graduate and undergraduate courses of Engineering and Informatics and are used for solving many optimization problems that can be characterized as Shortest Path Problem. The EShoPPS is an interactive tool that allows students to create a graph representing the problem and also helps in developing their knowledge of each specific algorithm. Experiments performed with 155 students of undergraduate and graduate courses such as Industrial Engineering, Computer Science and Information Systems have shown that by using the EShoPPS tool students were able to improve their interpretation of investigated algorithms.
Gunnels, John; Lee, Jon; Margulies, Susan
2010-01-01
We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.
Gunnels, John
2010-06-01
We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.
Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng
2018-02-01
De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we proposed a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPU). Then, we designed an imaging point parallel strategy to achieve an optimal parallel computing performance. Afterward, we adopted an asynchronous double buffering scheme for multi-stream to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies of computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significant reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
Heliostat blocking and shadowing efficiency in the video-game era
Energy Technology Data Exchange (ETDEWEB)
Ramos, Alberto [Deutsches Elektronen-Synchrotron (DESY), Zeuthen (Germany). John von Neumann-Inst. fuer Computing NIC; Ramos, Francisco [Nevada Software Informatica S.L., Madrid (Spain)
2014-02-15
Blocking and shadowing is one of the key effects in designing and evaluating a thermal central receiver solar tower plant. Therefore it is convenient to develop efficient algorithms to compute the area of an heliostat blocked or shadowed by the rest of the field. In this paper we explore the possibility of using very efficient clipping algorithms developed for the video game and imaging industry to compute the blocking and shadowing efficiency of a solar thermal plant layout. We propose an algorithm valid for arbitrary position, orientation and size of the heliostats. This algorithm turns out to be very accurate, free of assumptions and fast. We show the feasibility of the use of this algorithm to the optimization of a solar plant by studying a couple of examples in detail.
Heliostat blocking and shadowing efficiency in the video-game era
International Nuclear Information System (INIS)
Ramos, Alberto
2014-02-01
Blocking and shadowing is one of the key effects in designing and evaluating a thermal central receiver solar tower plant. Therefore it is convenient to develop efficient algorithms to compute the area of an heliostat blocked or shadowed by the rest of the field. In this paper we explore the possibility of using very efficient clipping algorithms developed for the video game and imaging industry to compute the blocking and shadowing efficiency of a solar thermal plant layout. We propose an algorithm valid for arbitrary position, orientation and size of the heliostats. This algorithm turns out to be very accurate, free of assumptions and fast. We show the feasibility of the use of this algorithm to the optimization of a solar plant by studying a couple of examples in detail.
Smolyak's algorithm: A powerful black box for the acceleration of scientific computations
Tempone, Raul; Wolfers, Soeren
2017-01-01
We provide a general discussion of Smolyak's algorithm for the acceleration of scientific computations. The algorithm first appeared in Smolyak's work on multidimensional integration and interpolation. Since then, it has been generalized in multiple directions and has been associated with the keywords: sparse grids, hyperbolic cross approximation, combination technique, and multilevel methods. Variants of Smolyak's algorithm have been employed in the computation of high-dimensional integrals in finance, chemistry, and physics, in the numerical solution of partial and stochastic differential equations, and in uncertainty quantification. Motivated by this broad and ever-increasing range of applications, we describe a general framework that summarizes fundamental results and assumptions in a concise application-independent manner.
Smolyak's algorithm: A powerful black box for the acceleration of scientific computations
Tempone, Raul
2017-03-26
We provide a general discussion of Smolyak\\'s algorithm for the acceleration of scientific computations. The algorithm first appeared in Smolyak\\'s work on multidimensional integration and interpolation. Since then, it has been generalized in multiple directions and has been associated with the keywords: sparse grids, hyperbolic cross approximation, combination technique, and multilevel methods. Variants of Smolyak\\'s algorithm have been employed in the computation of high-dimensional integrals in finance, chemistry, and physics, in the numerical solution of partial and stochastic differential equations, and in uncertainty quantification. Motivated by this broad and ever-increasing range of applications, we describe a general framework that summarizes fundamental results and assumptions in a concise application-independent manner.
ELASTIC CLOUD COMPUTING ARCHITECTURE AND SYSTEM FOR HETEROGENEOUS SPATIOTEMPORAL COMPUTING
Directory of Open Access Journals (Sweden)
X. Shi
2017-10-01
Full Text Available Spatiotemporal computation implements a variety of different algorithms. When big data are involved, desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may have different behavior on different computing infrastructure and platforms. Some are perfect for implementation on a cluster of graphics processing units (GPUs, while GPUs may not be useful on certain kind of spatiotemporal computation. This is the same situation in utilizing a cluster of Intel's many-integrated-core (MIC or Xeon Phi, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy efficiency requirement in general computation, Field Programmable Gate Array (FPGA may be a better solution for better energy efficiency when the performance of computation could be similar or better than GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates all of GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.
Elastic Cloud Computing Architecture and System for Heterogeneous Spatiotemporal Computing
Shi, X.
2017-10-01
Spatiotemporal computation implements a variety of different algorithms. When big data are involved, desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may have different behavior on different computing infrastructure and platforms. Some are perfect for implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful on certain kind of spatiotemporal computation. This is the same situation in utilizing a cluster of Intel's many-integrated-core (MIC) or Xeon Phi, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy efficiency requirement in general computation, Field Programmable Gate Array (FPGA) may be a better solution for better energy efficiency when the performance of computation could be similar or better than GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates all of GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.
An Efficient Randomized Algorithm for Real-Time Process Scheduling in PicOS Operating System
Helmy*, Tarek; Fatai, Anifowose; Sallam, El-Sayed
PicOS is an event-driven operating environment designed for use with embedded networked sensors. More specifically, it is designed to support the concurrency in intensive operations required by networked sensors with minimal hardware requirements. Existing process scheduling algorithms of PicOS; a commercial tiny, low-footprint, real-time operating system; have their associated drawbacks. An efficient, alternative algorithm, based on a randomized selection policy, has been proposed, demonstrated, confirmed for efficiency and fairness, on the average, and has been recommended for implementation in PicOS. Simulations were carried out and performance measures such as Average Waiting Time (AWT) and Average Turn-around Time (ATT) were used to assess the efficiency of the proposed randomized version over the existing ones. The results prove that Randomized algorithm is the best and most attractive for implementation in PicOS, since it is most fair and has the least AWT and ATT on average over the other non-preemptive scheduling algorithms implemented in this paper.