cuda-enabled graphics processing: Topics by WorldWideScience.org

Sample records for cuda-enabled graphics processing

CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units

Directory of Open Access Journals (Sweden)

Maskell Douglas L

2009-05-01

Full Text Available Abstract Background: The Smith-Waterman algorithm is one of the most widely used tools for searching biological sequence databases due to its high sensitivity. Unfortunately, the Smith-Waterman algorithm is computationally demanding, a problem compounded by the exponential growth of sequence databases. The recent emergence of many-core architectures, and their associated programming interfaces, provides an opportunity to accelerate sequence database searches using commonly available and inexpensive hardware. Findings: Our CUDASW++ implementation (benchmarked on a single-GPU NVIDIA GeForce GTX 280 graphics card and a dual-GPU GeForce GTX 295 graphics card) provides a significant performance improvement compared to other publicly available implementations, such as SWPS3, CBESW, SW-CUDA, and NCBI-BLAST. CUDASW++ supports query sequences of length up to 59K. For query sequences ranging in length from 144 to 5,478 in Swiss-Prot release 56.6, the single-GPU version achieves an average performance of 9.509 GCUPS with a lowest performance of 9.039 GCUPS and a highest performance of 9.660 GCUPS, and the dual-GPU version achieves an average performance of 14.484 GCUPS with a lowest performance of 10.660 GCUPS and a highest performance of 16.087 GCUPS. Conclusion: CUDASW++ is publicly available open-source software. It provides a significant performance improvement for Smith-Waterman-based protein sequence database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
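To make the Smith-Waterman recurrence and the GCUPS metric in this record concrete, here is a minimal CPU-only sketch in Python. It is not the CUDASW++ implementation: the function names and the scoring parameters (match, mismatch and a linear gap penalty) are assumptions chosen for illustration, whereas protein searches normally use a substitution matrix. GCUPS is taken in its usual sense of billions of dynamic-programming cell updates per second.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score (illustrative scoring scheme)."""
    prev = [0] * (len(subject) + 1)   # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]                    # column 0 is always 0
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def gcups(query_len, database_residues, seconds):
    """Billions of DP cell updates per second for one database search."""
    return query_len * database_residues / (seconds * 1e9)
```

Every cell depends only on its left, upper and upper-left neighbours, so cells on the same anti-diagonal are independent of each other, and so are alignments against different database sequences. That independence is what a many-core implementation can exploit.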
BarraCUDA - a fast short read sequence aligner using graphics processing units

Directory of Open Access Journals (Sweden)

Klus Petr

2012-01-01

Full Text Available Abstract Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers an order-of-magnitude performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions: BarraCUDA is designed to take advantage of the parallelism of the GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net
BarraCUDA - a fast short read sequence aligner using graphics processing units

LENUS (Irish Health Repository)

Klus, Petr

2012-01-13

Abstract Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers an order-of-magnitude performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions: BarraCUDA is designed to take advantage of the parallelism of the GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net
Graphic filter library implemented in CUDA language

OpenAIRE

Peroutková, Hedvika

2009-01-01

This thesis deals with the problem of reducing the computation time of raster image processing by parallel computing on a graphics processing unit. Raster image processing here refers to the application of graphic filters, which can be applied in sequence with different settings. The thesis evaluates the suitability of parallelization on a graphics card for raster image adjustments based on a multi-criteria choice. Filters are implemented for the graphics processing unit in the CUDA language. Opacity ...
Efficient particle-in-cell simulation of auroral plasma phenomena using a CUDA enabled graphics processing unit

Science.gov (United States)

Sewell, Stephen

This thesis introduces a software framework that effectively utilizes low-cost, commercially available Graphics Processing Units (GPUs) to simulate complex scientific plasma phenomena that are modeled using the Particle-In-Cell (PIC) paradigm. The software framework that was developed conforms to the Compute Unified Device Architecture (CUDA), a standard for general-purpose graphics processing that was introduced by NVIDIA Corporation. This framework has been verified for correctness and applied to advance the state of understanding of the electromagnetic aspects of the development of the Aurora Borealis and Aurora Australis. For each phase of the PIC methodology, this research has identified one or more methods to exploit the problem's natural parallelism and effectively map it for execution on the graphics processing unit and its host processor. The sources of overhead that can reduce the effectiveness of parallelization for each of these methods have also been identified. One of the novel aspects of this research was the utilization of particle sorting during the grid interpolation phase. The final representation resulted in simulations that executed about 38 times faster than simulations that were run on a single-core general-purpose processing system. The scalability of this framework to larger problem sizes and future generation systems has also been investigated.
High performance direct gravitational N-body simulations on graphics processing units II: An implementation in CUDA

NARCIS (Netherlands)

Belleman, R.G.; Bédorf, J.; Portegies Zwart, S.F.

2008-01-01

We present the results of gravitational direct N-body simulations using the graphics processing unit (GPU) on a commercial NVIDIA GeForce 8800GTX designed for gaming computers. The force evaluation of the N-body problem is implemented in "Compute Unified Device Architecture" (CUDA) using the GPU to
CUDA/GPU Technology : Parallel Programming For High Performance Scientific Computing

OpenAIRE

YUHENDRA; KUZE, Hiroaki; JOSAPHAT, Tetuko Sri Sumantyo

2009-01-01

Graphics processing units (GPUs), originally designed for computer video cards, have emerged as the most powerful chip in a high-performance workstation. In high-performance computation, GPUs deliver much higher performance than conventional CPUs by means of parallel processing. In 2007, the birth of Compute Unified Device Architecture (CUDA) and CUDA-enabled GPUs by NVIDIA Corporation brought a revolution in the general purpose GPU a...
High Performance Processing and Analysis of Geospatial Data Using CUDA on GPU

Directory of Open Access Journals (Sweden)

STOJANOVIC, N.

2014-11-01

Full Text Available In this paper, the high-performance processing of massive geospatial data on many-core GPU (Graphics Processing Unit) is presented. We use the CUDA (Compute Unified Device Architecture) programming framework to implement parallel processing of common Geographic Information Systems (GIS) algorithms, such as viewshed analysis and map-matching. Experimental evaluation indicates the improvement in performance with respect to CPU-based solutions and shows the feasibility of using GPU and CUDA for parallel implementation of GIS algorithms over large-scale geospatial datasets.
Monte Carlo methods for neutron transport on graphics processing units using Cuda - 015

International Nuclear Information System (INIS)

Nelson, A.G.; Ivanov, K.N.

2010-01-01

This work examined the feasibility of utilizing Graphics Processing Units (GPUs) to accelerate Monte Carlo neutron transport simulations. First, a clean-sheet MC code was written in C++ for an x86 CPU and later ported to run on GPUs using NVIDIA's CUDA programming language. After further optimization, the GPU ran 21 times faster than the CPU code when using single-precision floating point math. This can be further increased with no additional effort if accuracy is sacrificed for speed: using a compiler flag, the speedup was increased to 22x. Further, if double-precision floating point math is desired for neutron tracking through the geometry, a speedup of 11x was obtained. The GPUs have proven to be useful in this study, but the current generation does have limitations: the maximum memory currently available on a single GPU is only 4 GB; the GPU RAM does not provide error-checking and correction; and the optimization required for large speedups can lead to confusing code. (authors)
Fast simulation of Proton Induced X-Ray Emission Tomography using CUDA

Energy Technology Data Exchange (ETDEWEB)

Beasley, D.G., E-mail: dgbeasley@itn.pt; Marques, A.C.; Alves, L.C.; Silva, R.C. da

2013-07-01

A new 3D Proton Induced X-Ray Emission Tomography (PIXE-T) and Scanning Transmission Ion Microscopy Tomography (STIM-T) simulation software has been developed in Java and uses NVIDIA™ Compute Unified Device Architecture (CUDA) to calculate the X-ray attenuation for large detector areas. A challenge with PIXE-T is to get sufficient counts while retaining a small beam spot size. Therefore, a high geometric efficiency is required. However, as the detector solid angle increases, the calculations required for accurate reconstruction of the data increase substantially. To overcome this limitation, the CUDA parallel computing platform was used, which enables general purpose programming of NVIDIA graphics processing units (GPUs) to perform computations traditionally handled by the central processing unit (CPU). For simulation performance evaluation, the results of a CPU- and a CUDA-based simulation of a phantom are presented. Furthermore, a comparison with the simulation code in the PIXE-Tomography reconstruction software DISRA (A. Sakellariou, D.N. Jamieson, G.J.F. Legge, 2001) is also shown. Compared to a CPU implementation, the CUDA-based simulation is approximately 30× faster.
Fast simulation of Proton Induced X-Ray Emission Tomography using CUDA

International Nuclear Information System (INIS)

Beasley, D.G.; Marques, A.C.; Alves, L.C.; Silva, R.C. da

2013-01-01

A new 3D Proton Induced X-Ray Emission Tomography (PIXE-T) and Scanning Transmission Ion Microscopy Tomography (STIM-T) simulation software has been developed in Java and uses NVIDIA™ Compute Unified Device Architecture (CUDA) to calculate the X-ray attenuation for large detector areas. A challenge with PIXE-T is to get sufficient counts while retaining a small beam spot size. Therefore, a high geometric efficiency is required. However, as the detector solid angle increases, the calculations required for accurate reconstruction of the data increase substantially. To overcome this limitation, the CUDA parallel computing platform was used, which enables general purpose programming of NVIDIA graphics processing units (GPUs) to perform computations traditionally handled by the central processing unit (CPU). For simulation performance evaluation, the results of a CPU- and a CUDA-based simulation of a phantom are presented. Furthermore, a comparison with the simulation code in the PIXE-Tomography reconstruction software DISRA (A. Sakellariou, D.N. Jamieson, G.J.F. Legge, 2001) is also shown. Compared to a CPU implementation, the CUDA-based simulation is approximately 30× faster.
Exploration of automatic optimisation for CUDA programming

KAUST Repository

Al-Mouhamed, Mayez; Khan, Ayaz ul Hassan

2014-01-01

© 2014 Taylor & Francis. Writing an optimised compute unified device architecture (CUDA) program for graphic processing units (GPUs) is complex even for experts. We present a design methodology for a restructuring tool that converts C-loops into optimised CUDA kernels based on a three-step algorithm: loop tiling, coalesced memory access and resource optimisation. A method for finding possible loop tiling solutions with coalesced memory access is developed and a simplified algorithm for restructuring C-loops into an efficient CUDA kernel is presented. In the evaluation, we implement matrix multiply (MM), matrix transpose (M-transpose), matrix scaling (M-scaling) and matrix vector multiply (MV) using the proposed algorithm. We present the analysis of the execution time and GPU throughput for the above applications, which compare favourably to other proposals. Evaluation is carried out while scaling the problem size and running under a variety of kernel configurations. The obtained speedup is about 28-35% for M-transpose compared to NVIDIA Software Development Kit, 33% speedup for MV compared to general purpose computation on graphics processing unit compiler, and more than 80% speedup for MM and M-scaling compared to CUDA-lite.
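As an illustration of the loop tiling step named in this record, the sketch below tiles a plain matrix multiply in Python. It is not the restructuring tool described in the abstract and covers neither coalesced memory access nor resource optimisation. The function name `matmul_tiled` and the `tile` parameter are hypothetical.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices a (n x k) and b (k x m) one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # The three outer loops walk over tiles; the three inner loops stay
    # inside one tile, so each tile's data is reused while it is "hot".
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

In a CUDA kernel, one common design maps each (i0, j0) output tile to a thread block and stages the matching tiles of `a` and `b` through shared memory. The tile size then becomes a tuning parameter that trades data reuse against per-block resource usage.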
Exploration of automatic optimisation for CUDA programming

KAUST Repository

Al-Mouhamed, Mayez

2014-09-16

© 2014 Taylor & Francis. Writing an optimised compute unified device architecture (CUDA) program for graphic processing units (GPUs) is complex even for experts. We present a design methodology for a restructuring tool that converts C-loops into optimised CUDA kernels based on a three-step algorithm: loop tiling, coalesced memory access and resource optimisation. A method for finding possible loop tiling solutions with coalesced memory access is developed and a simplified algorithm for restructuring C-loops into an efficient CUDA kernel is presented. In the evaluation, we implement matrix multiply (MM), matrix transpose (M-transpose), matrix scaling (M-scaling) and matrix vector multiply (MV) using the proposed algorithm. We present the analysis of the execution time and GPU throughput for the above applications, which compare favourably to other proposals. Evaluation is carried out while scaling the problem size and running under a variety of kernel configurations. The obtained speedup is about 28-35% for M-transpose compared to NVIDIA Software Development Kit, 33% speedup for MV compared to general purpose computation on graphics processing unit compiler, and more than 80% speedup for MM and M-scaling compared to CUDA-lite.
Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing.

Science.gov (United States)

Li, Hao; Yu, Di; Kumar, Anand; Tu, Yi-Cheng

2014-10-01

A push-based database management system (DBMS) is a new type of data processing software that streams large volumes of data to concurrent query operators. The high data rate of such systems requires large computing power provided by the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogeneous query processing operations and enable resource allocation among such operations. Understanding the performance of operations as a result of resource consumption is thus a premise in the design of G-SDMS. With NVIDIA's CUDA framework as the system implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism named CUDA stream. Specifically, we explore the connection between performance and resource occupancy of compute-bound kernels and develop a model that can predict the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize the main kernel scheduling disciplines in it. Our models and derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels.
CPU and GPU (CUDA) Template Matching Comparison

Directory of Open Access Journals (Sweden)

Evaldas Borcovas

2014-05-01

Full Text Available Image processing, computer vision and other complicated optical information processing algorithms require large resources. It is often desired to execute algorithms in real time. It is hard to fulfill such requirements with a single CPU. NVidia's CUDA technology enables the programmer to use the GPU resources in the computer. The current research was carried out with an Intel Pentium Dual-Core T4500 2.3 GHz processor with 4 GB DDR3 RAM (CPU I), an NVidia GeForce GT320M CUDA-compatible graphics card (GPU I), an Intel Core I5-2500K 3.3 GHz processor with 4 GB DDR3 RAM (CPU II) and an NVidia GeForce GTX 560 CUDA-compatible graphics card (GPU II). The additional libraries OpenCV 2.1 and CUDA-compatible OpenCV 2.4.0 were used for the testing. The main tests were made with the standard function MatchTemplate from the OpenCV libraries. The algorithm uses a main image and a template. The influence of these factors was tested. The main image and template were resized, and the algorithm computing time and performance in Gtpix/s were measured. According to the information obtained from the research, GPU computing using the hardware mentioned earlier is up to 24 times faster when processing a large amount of information. When the images are small, the performance of the CPU and GPU is not significantly different. The choice of the template size influences the calculation with the CPU. The difference in computing time between the GPUs can be explained by the number of cores they have.
Graphics processing unit based computation for NDE applications

Science.gov (United States)

Nahas, C. A.; Rajagopal, Prabhu; Balasubramaniam, Krishnan; Krishnamurthy, C. V.

2012-05-01

Advances in parallel processing in recent years are helping to reduce the cost of numerical simulation. Breakthroughs in Graphical Processing Unit (GPU) based computation now offer the prospect of further drastic improvements. The introduction of 'compute unified device architecture' (CUDA) by NVIDIA (the global technology company based in Santa Clara, California, USA) has made programming GPUs for general purpose computing accessible to the average programmer. Here we use CUDA to develop parallel finite difference schemes as applicable to two problems of interest to the NDE community, namely heat diffusion and elastic wave propagation. The implementations are for two dimensions. Performance improvement of the GPU implementation against serial CPU implementation is then discussed.
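For readers unfamiliar with finite difference schemes, the following Python sketch shows one explicit time step of the two-dimensional heat diffusion equation. The record does not say which scheme the authors used, so the forward-time, centred-space stencil, the fixed boundary values and the names `heat_step`, `alpha`, `dt` and `dx` are all assumptions made for illustration.

```python
def heat_step(u, alpha, dt, dx):
    """Advance a 2D temperature grid by one explicit (FTCS) time step.

    Boundary cells are held fixed. For this scheme to be stable,
    alpha * dt / dx**2 should not exceed 0.25.
    """
    r = alpha * dt / (dx * dx)
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]       # copy so every update reads old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = u[i][j] + r * (
                u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1]
                - 4.0 * u[i][j]
            )
    return new
```

Each interior cell reads only the previous time step, so all cells can be updated independently. On a GPU this usually translates to one thread per grid cell.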
Exploiting graphics processing units for computational biology and bioinformatics.

Science.gov (United States)

Payne, Joshua L; Sinnott-Armstrong, Nicholas A; Moore, Jason H

2010-09-01

Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units (GPUs) that possess more memory bandwidth and computational capability than central processing units (CPUs), the standard workhorses of scientific computing. With the recent release of general-purpose GPUs and NVIDIA's GPU programming language, CUDA, graphics engines are being adopted widely in scientific computing applications, particularly in the fields of computational biology and bioinformatics. The goal of this article is to concisely present an introduction to GPU hardware and programming, aimed at the computational biologist or bioinformaticist. To this end, we discuss the primary differences between GPU and CPU architecture, introduce the basics of the CUDA programming language, and discuss important CUDA programming practices, such as the proper use of coalesced reads, data types, and memory hierarchies. We highlight each of these topics in the context of computing the all-pairs distance between instances in a dataset, a common procedure in numerous disciplines of scientific computing. We conclude with a runtime analysis of the GPU and CPU implementations of the all-pairs distance calculation. We show our final GPU implementation to outperform the CPU implementation by a factor of 1700.
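The all-pairs distance calculation used as the running example in this record can be sketched on the CPU as follows. The abstract does not name the distance metric, so Euclidean distance is an assumption here, and `all_pairs_distance` is a hypothetical name.

```python
import math


def all_pairs_distance(points):
    """Return the symmetric matrix of Euclidean distances between points."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):     # the matrix is symmetric: compute once
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(points[i], points[j])))
            dist[i][j] = dist[j][i] = d
    return dist
```

The work grows quadratically with the number of instances and every (i, j) entry is independent. This is why the problem maps naturally to a grid of GPU threads with one thread per pair.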
cudaMap: a GPU accelerated program for gene expression connectivity mapping.

Science.gov (United States)

McArt, Darragh G; Bankhead, Peter; Dunne, Philip D; Salto-Tellez, Manuel; Hamilton, Peter; Zhang, Shu-Dong

2013-10-11

Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. On a normal desktop PC, it is common for the connectivity mapping task with a single gene signature to take > 2h to complete using sscMap, a popular Java application that runs on standard CPUs (Central Processing Units). Here, we describe new software, cudaMap, which has been implemented using CUDA C/C++ to harness the computational power of NVIDIA GPUs (Graphics Processing Units) to greatly reduce processing times for connectivity mapping. cudaMap can identify candidate therapeutics from the same signature in just over thirty seconds when using an NVIDIA Tesla C2050 GPU. Results from the analysis of multiple gene signatures, which would previously have taken several days, can now be obtained in as little as 10 minutes, greatly facilitating candidate therapeutics discovery with high throughput. We are able to demonstrate dramatic speed differentials between GPU assisted performance and CPU executions as the computational load increases for high accuracy evaluation of statistical significance. Emerging 'omics' technologies are constantly increasing the volume of data and information to be processed in all areas of biomedical research. Embracing the multicore functionality of GPUs represents a major avenue of local accelerated computing. cudaMap will make a strong contribution in the discovery of candidate therapeutics by enabling speedy execution of heavy duty connectivity mapping tasks, which are increasingly required in modern cancer research. cudaMap is open source and can be freely downloaded from http://purl.oclc.org/NET/cudaMap.
Data Sorting Using Graphics Processing Units

Directory of Open Access Journals (Sweden)

M. J. Mišić

2012-06-01

Full Text Available Graphics processing units (GPUs) have been increasingly used for general-purpose computation in recent years. GPU-accelerated applications are found in both scientific and commercial domains. Sorting is considered one of the most important operations in many applications, so its efficient implementation is essential for the overall application performance. This paper represents an effort to analyze and evaluate the implementations of representative sorting algorithms on graphics processing units. Three sorting algorithms (Quicksort, Merge sort, and Radix sort) were evaluated on the Compute Unified Device Architecture (CUDA) platform that is used to execute applications on NVIDIA graphics processing units. Algorithms were tested and evaluated using an automated test environment with input datasets of different characteristics. Finally, the results of this analysis are briefly discussed.
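Of the three algorithms evaluated in this record, Radix sort is the simplest to sketch. The Python version below is a sequential least-significant-digit variant for non-negative integers, not the implementation evaluated in the paper. The name `radix_sort` and the `bits_per_pass` parameter are hypothetical.

```python
def radix_sort(values, bits_per_pass=4):
    """Sort non-negative integers with a least-significant-digit radix sort."""
    data = list(values)
    if not data:
        return data
    if min(data) < 0:
        raise ValueError("this sketch handles non-negative integers only")
    radix = 1 << bits_per_pass
    mask = radix - 1
    largest = max(data)
    shift = 0
    while (largest >> shift) > 0:
        # Stable distribution of the keys by the current digit.
        buckets = [[] for _ in range(radix)]
        for v in data:
            buckets[(v >> shift) & mask].append(v)
        data = [v for bucket in buckets for v in bucket]
        shift += bits_per_pass
    return data
```

Each pass is a stable distribution by a single digit. GPU implementations typically build that step from a per-digit histogram followed by a prefix sum, both of which parallelize well.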
Application of the opportunities of tool system 'CUDA' for graphic processors programming in scientific and technical calculation tasks

International Nuclear Information System (INIS)

Dudnik, V.A.; Kudryavtsev, V.I.; Sereda, T.M.; Us, S.A.; Shestakov, M.V.

2009-01-01

The capabilities of NVIDIA's CUDA technology (Compute Unified Device Architecture, a unified hardware-software solution for parallel computation on GPUs) are described. The main differences between the C programming language for GPUs and standard C are highlighted. Examples are given of using CUDA to accelerate application development and to implement scientific and technical computation algorithms on the graphics processors (GPGPU) of eighth-generation GeForce accelerators. Recommendations are given for optimizing programs that use the GPU.

Benchmarking BarraCUDA on Epigenetic DNA and nVidia Pascal GPUs

OpenAIRE

Langdon, W

2016-01-01

Typically BarraCUDA uses CUDA graphics cards to map DNA reads to the human genome. Previously its software source code was genetically improved for short paired end next generation sequences. On longer, 150 base paired end noisy Cambridge Epigenetix's data, a Pascal GTX 1080 processes about 10000 strings per second, comparable with twin nVidia Tesla K40.
Optimization Solutions for Improving the Performance of the Parallel Reduction Algorithm Using Graphics Processing Units

Directory of Open Access Journals (Sweden)

Ion LUNGU

2012-01-01

Full Text Available In this paper, we research, analyze and develop optimization solutions for the parallel reduction function using graphics processing units (GPUs) that implement the Compute Unified Device Architecture (CUDA), a modern and novel approach for improving the software performance of data processing applications and algorithms. Many of these applications and algorithms make use of the reduction function in their computational steps. After having designed the function and its algorithmic steps in CUDA, we have progressively developed and implemented optimization solutions for the reduction function. In order to confirm, test and evaluate the solutions' efficiency, we have developed a custom-tailored benchmark suite. We have analyzed the obtained experimental results regarding: the comparison of the execution time and bandwidth when using graphic processing units covering the main CUDA architectures (Tesla GT200, Fermi GF100, Kepler GK104) and a central processing unit; the data type influence; the binary operator's influence.
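The access pattern behind a parallel reduction can be emulated sequentially. The Python sketch below is not the authors' optimized function: it shows a generic tree-shaped reduction in which each step halves the number of active elements. `tree_reduce` is a hypothetical name, and the binary operator is assumed to be associative and commutative.

```python
import operator


def tree_reduce(values, op=operator.add):
    """Reduce a sequence in about log2(n) pairwise steps, as a GPU would."""
    data = list(values)
    if not data:
        raise ValueError("cannot reduce an empty sequence")
    n = len(data)
    while n > 1:
        half = (n + 1) // 2
        # Each "thread" i combines element i with element i + half. All
        # combinations within one step are independent of each other.
        for i in range(n - half):
            data[i] = op(data[i], data[i + half])
        n = half
    return data[0]
```

For example, `tree_reduce([3, 1, 4, 1, 5], max)` yields the maximum of the list. On a GPU the inner loop disappears because each index is handled by its own thread, with a synchronization barrier between steps.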
Spectral-element Seismic Wave Propagation on CUDA/OpenCL Hardware Accelerators

Science.gov (United States)

Peter, D. B.; Videau, B.; Pouget, K.; Komatitsch, D.

2015-12-01

Seismic wave propagation codes are essential tools to investigate a variety of wave phenomena in the Earth. Furthermore, they can now be used for seismic full-waveform inversions in regional- and global-scale adjoint tomography. Although these seismic wave propagation solvers are crucial ingredients to improve the resolution of tomographic images to answer important questions about the nature of Earth's internal processes and subsurface structure, their practical application is often limited due to high computational costs. They thus need high-performance computing (HPC) facilities to improve the current state of knowledge. At present, numerous large HPC systems embed many-core architectures such as graphics processing units (GPUs) to enhance numerical performance. Such hardware accelerators can be programmed using either the CUDA programming environment or the OpenCL language standard. CUDA software development targets NVIDIA graphic cards while OpenCL was adopted by additional hardware accelerators, such as AMD graphic cards, ARM-based processors and Intel Xeon Phi coprocessors. For seismic wave propagation simulations using the open-source spectral-element code package SPECFEM3D_GLOBE, we incorporated an automatic source-to-source code generation tool (BOAST) which allows us to use meta-programming of all computational kernels for forward and adjoint runs. Using our BOAST kernels, we generate optimized source code for both CUDA and OpenCL languages within the source code package. Thus, seismic wave simulations are now able to fully utilize CUDA and OpenCL hardware accelerators. We show benchmarks of forward seismic wave propagation simulations using SPECFEM3D_GLOBE on CUDA/OpenCL GPUs, validating results and comparing performances for different simulations and hardware usages.
General purpose graphic processing unit implementation of adaptive pulse compression algorithms

Science.gov (United States)

Cai, Jingxiao; Zhang, Yan

2017-07-01

This study introduces a practical approach to implement real-time signal processing algorithms for general surveillance radar based on NVIDIA graphical processing units (GPUs). The pulse compression algorithms are implemented using compute unified device architecture (CUDA) libraries such as CUDA basic linear algebra subroutines and CUDA fast Fourier transform library, which are adopted from open source libraries and optimized for the NVIDIA GPUs. For more advanced, adaptive processing algorithms such as adaptive pulse compression, customized kernel optimization is needed and investigated. A statistical optimization approach is developed for this purpose without needing much knowledge of the physical configurations of the kernels. It was found that the kernel optimization approach can significantly improve the performance. Benchmark performance is compared with the CPU performance in terms of processing accelerations. The proposed implementation framework can be used in various radar systems including ground-based phased array radar, airborne sense and avoid radar, and aerospace surveillance radar.
Exploration of automatic optimization for CUDA programming

KAUST Repository

Al-Mouhamed, Mayez; Khan, Ayaz ul Hassan

2012-01-01

Graphic processing units (GPUs) are gaining ground in high-performance computing. CUDA (an extension to C) is the most widely used parallel programming framework for general purpose GPU computations. However, the task of writing an optimized CUDA program is complex even for experts. We present a method for restructuring loops into optimized CUDA kernels based on a three-step algorithm: loop tiling, coalesced memory access, and resource optimization. We also establish the relationships between the influencing parameters and propose a method for finding possible tiling solutions with coalesced memory access that best meet the identified constraints. We also present a simplified algorithm for restructuring loops and rewriting them as an efficient CUDA kernel. The execution model of the synthesized kernel consists of uniformly distributing the kernel threads to keep all cores busy while transferring a tailored data locality which is accessed using a coalesced pattern to amortize the long latency of the secondary memory. In the evaluation, we implement some simple applications using the proposed restructuring strategy and evaluate the performance in terms of execution time and GPU throughput. © 2012 IEEE.
Exploration of automatic optimization for CUDA programming

KAUST Repository

Al-Mouhamed, Mayez

2012-12-01

Graphics processing units (GPUs) are gaining ground in high-performance computing. CUDA (an extension to C) is the most widely used parallel programming framework for general-purpose GPU computations. However, writing an optimized CUDA program is complex even for experts. We present a method for restructuring loops into optimized CUDA kernels based on a three-step algorithm: loop tiling, coalesced memory access, and resource optimization. We also establish the relationships between the influencing parameters and propose a method for finding possible tiling solutions with coalesced memory access that best meet the identified constraints. We also present a simplified algorithm for restructuring loops and rewriting them as an efficient CUDA kernel. The execution model of the synthesized kernel consists of uniformly distributing the kernel threads to keep all cores busy while transferring a tailored data locality, which is accessed using a coalesced pattern to amortize the long latency of the secondary memory. In the evaluation, we implement some simple applications using the proposed restructuring strategy and evaluate the performance in terms of execution time and GPU throughput. © 2012 IEEE.
Performance analysis of a parallel Monte Carlo code for simulating solar radiative transfer in cloudy atmospheres using CUDA-enabled NVIDIA GPU

Science.gov (United States)

Russkova, Tatiana V.

2017-11-01

One tool for improving the performance of Monte Carlo methods for numerical simulation of light transport in the Earth's atmosphere is parallel computing technology. A new algorithm oriented to parallel execution on a CUDA-enabled NVIDIA graphics processor is discussed. The efficiency of parallelization is analyzed on the basis of calculating the upward and downward fluxes of solar radiation in both vertically homogeneous and inhomogeneous models of the atmosphere. The results of testing the new code under various atmospheric conditions, including continuous single-layered and multilayered clouds and selective molecular absorption, are presented. The results of testing the code using video cards with different compute capability are analyzed. It is shown that the changeover of computing from conventional PCs to the architecture of graphics processors gives more than a hundredfold increase in performance and fully reveals the capabilities of the technology used.
The Metropolis Monte Carlo method with CUDA enabled Graphic Processing Units

Energy Technology Data Exchange (ETDEWEB)

Hall, Clifford [Computational Materials Science Center, George Mason University, 4400 University Dr., Fairfax, VA 22030 (United States); School of Physics, Astronomy, and Computational Sciences, George Mason University, 4400 University Dr., Fairfax, VA 22030 (United States); Ji, Weixiao [Computational Materials Science Center, George Mason University, 4400 University Dr., Fairfax, VA 22030 (United States); Blaisten-Barojas, Estela, E-mail: blaisten@gmu.edu [Computational Materials Science Center, George Mason University, 4400 University Dr., Fairfax, VA 22030 (United States); School of Physics, Astronomy, and Computational Sciences, George Mason University, 4400 University Dr., Fairfax, VA 22030 (United States)

2014-02-01

We present a CPU–GPU system for runtime acceleration of large molecular simulations using GPU computation and memory swaps. The memory architecture of the GPU can be used both as container for simulation data stored on the graphics card and as floating-point code target, providing an effective means for the manipulation of atomistic or molecular data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform atomistic and molecular simulations are essential. Our system implements a versatile molecular engine, including inter-molecule interactions and orientational variables for performing the Metropolis Monte Carlo (MMC) algorithm, which is one type of Markov chain Monte Carlo. By combining memory objects with floating-point code fragments we have implemented an MMC parallel engine that entirely avoids the communication time of molecular data at runtime. Our runtime acceleration system is a forerunner of a new class of CPU–GPU algorithms exploiting memory concepts combined with threading for avoiding bus bandwidth and communication. The testbed molecular system used here is a condensed phase system of oligopyrrole chains. A benchmark shows a size scaling speedup of 60 for systems with 210,000 pyrrole monomers. Our implementation can easily be combined with MPI to connect in parallel several CPU–GPU duets. -- Highlights: •We parallelize the Metropolis Monte Carlo (MMC) algorithm on one CPU—GPU duet. •The Adaptive Tempering Monte Carlo employs MMC and profits from this CPU—GPU implementation. •Our benchmark shows a size scaling-up speedup of 62 for systems with 225,000 particles. •The testbed involves a polymeric system of oligopyrroles in the condensed phase. •The CPU—GPU parallelization includes dipole—dipole and Mie—Jones classic potentials.
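For readers unfamiliar with the Metropolis Monte Carlo acceptance rule mentioned in this abstract, a minimal serial sketch follows. It assumes a toy one-dimensional harmonic potential and units with k_B = 1; the function names `metropolis_accept` and `sample_harmonic` are hypothetical, and nothing here reproduces the authors' oligopyrrole model or their GPU engine.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng=random):
    """Return True if a trial move with energy change delta_e is accepted."""
    if delta_e <= 0.0:
        return True  # downhill moves are always accepted
    # Uphill moves are accepted with probability exp(-delta_e / T), k_B = 1 assumed.
    return rng.random() < math.exp(-delta_e / temperature)


def sample_harmonic(n_steps, temperature=1.0, step=0.5, seed=0):
    """Sample x from exp(-x^2 / (2 T)) with single-particle trial moves.

    The harmonic energy E(x) = x^2 / 2 is a stand-in chosen for illustration.
    """
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        delta_e = 0.5 * trial * trial - 0.5 * x * x
        if metropolis_accept(delta_e, temperature, rng):
            x = trial
        samples.append(x)  # a rejected move repeats the old state
    return samples
```

In a GPU setting the energy difference for each trial move is the expensive part, which is where a parallel engine of the kind described above would spend its effort.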
The Metropolis Monte Carlo method with CUDA enabled Graphic Processing Units

International Nuclear Information System (INIS)

Hall, Clifford; Ji, Weixiao; Blaisten-Barojas, Estela

2014-01-01

We present a CPU–GPU system for runtime acceleration of large molecular simulations using GPU computation and memory swaps. The memory architecture of the GPU can be used both as container for simulation data stored on the graphics card and as floating-point code target, providing an effective means for the manipulation of atomistic or molecular data on the GPU. To fully take advantage of this mechanism, efficient GPU realizations of algorithms used to perform atomistic and molecular simulations are essential. Our system implements a versatile molecular engine, including inter-molecule interactions and orientational variables for performing the Metropolis Monte Carlo (MMC) algorithm, which is one type of Markov chain Monte Carlo. By combining memory objects with floating-point code fragments we have implemented an MMC parallel engine that entirely avoids the communication time of molecular data at runtime. Our runtime acceleration system is a forerunner of a new class of CPU–GPU algorithms exploiting memory concepts combined with threading for avoiding bus bandwidth and communication. The testbed molecular system used here is a condensed phase system of oligopyrrole chains. A benchmark shows a size scaling speedup of 60 for systems with 210,000 pyrrole monomers. Our implementation can easily be combined with MPI to connect in parallel several CPU–GPU duets. -- Highlights: •We parallelize the Metropolis Monte Carlo (MMC) algorithm on one CPU—GPU duet. •The Adaptive Tempering Monte Carlo employs MMC and profits from this CPU—GPU implementation. •Our benchmark shows a size scaling-up speedup of 62 for systems with 225,000 particles. •The testbed involves a polymeric system of oligopyrroles in the condensed phase. •The CPU—GPU parallelization includes dipole—dipole and Mie—Jones classic potentials.
Swan: A tool for porting CUDA programs to OpenCL

Science.gov (United States)

Harvey, M. J.; De Fabritiis, G.

2011-04-01

The use of modern, high-performance graphical processing units (GPUs) for acceleration of scientific computation has been widely reported. The majority of this work has used the CUDA programming model supported exclusively by GPUs manufactured by NVIDIA. An industry standardisation effort has recently produced the OpenCL specification for GPU programming. This offers the benefits of hardware-independence and reduced dependence on proprietary tool-chains. Here we describe a source-to-source translation tool, "Swan", for facilitating the conversion of an existing CUDA code to use the OpenCL model, as a means to aid programmers experienced with CUDA in evaluating OpenCL and alternative hardware. While the performance of equivalent OpenCL and CUDA code on fixed hardware should be comparable, we find that a real-world CUDA application ported to OpenCL exhibits an overall 50% increase in runtime, a reduction in performance attributable to the immaturity of contemporary compilers. The ported application is shown to have platform independence, running on both NVIDIA and AMD GPUs without modification. We conclude that OpenCL is a viable platform for developing portable GPU applications but that the more mature CUDA tools continue to provide best performance. Program summary: Program title: Swan; Catalogue identifier: AEIH_v1_0; Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIH_v1_0.html ; Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland; Licensing provisions: GNU Public License version 2; No. of lines in distributed program, including test data, etc.: 17 736; No. of bytes in distributed program, including test data, etc.: 131 177; Distribution format: tar.gz; Programming language: C; Computer: PC; Operating system: Linux; RAM: 256 Mbytes; Classification: 6.5; External routines: NVIDIA CUDA, OpenCL; Nature of problem: Graphical Processing Units (GPUs) from NVIDIA are preferentially programmed with the proprietary CUDA programming toolkit. An
CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions

Directory of Open Access Journals (Sweden)

Schmidt Bertil

2010-04-01

Background: Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, the quadratic time complexity of this algorithm makes it highly time-consuming. The exponential growth of biological databases further deteriorates the situation. To accelerate this algorithm, many efforts have been made to develop techniques in high performance architectures, especially the recently emerging many-core architectures and their associated programming models. Findings: This paper describes the latest release of the CUDASW++ software, CUDASW++ 2.0, which makes new contributions to Smith-Waterman protein database searches using compute unified device architecture (CUDA). A parallel Smith-Waterman algorithm is proposed to further optimize the performance of CUDASW++ 1.0 based on the single instruction, multiple thread (SIMT) abstraction. For the first time, we have investigated a partitioned vectorized Smith-Waterman algorithm using CUDA based on the virtualized single instruction, multiple data (SIMD) abstraction. The optimized SIMT and the partitioned vectorized algorithms were benchmarked, and remarkably, have similar performance characteristics. CUDASW++ 2.0 achieves performance improvement over CUDASW++ 1.0 of as much as 1.74 (1.72) times using the optimized SIMT algorithm and up to 1.77 (1.66) times using the partitioned vectorized algorithm, with a performance of up to 17 (30) billion cell updates per second (GCUPS) on a single-GPU GeForce GTX 280 (dual-GPU GeForce GTX 295) graphics card. Conclusions: CUDASW++ 2.0 is publicly available open-source software, written in the CUDA and C++ programming languages. It obtains significant performance improvement over CUDASW++ 1.0 using either the optimized SIMT algorithm or the partitioned vectorized algorithm for Smith-Waterman protein database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
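To make the quadratic cost and the GCUPS unit concrete, the sketch below computes a Smith-Waterman local alignment score serially and converts a run time into GCUPS. It is a simplification under stated assumptions: a fixed match/mismatch score and a linear gap penalty are used, whereas protein searches typically rely on a substitution matrix and affine gaps, and the names `smith_waterman_score` and `gcups` are hypothetical rather than part of CUDASW++.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty (illustrative scoring)."""
    prev = [0] * (len(subject) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for qc in query:
        curr = [0]
        for j, sc in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if qc == sc else mismatch)
            # Each cell update takes the best of: restart, align, or open a gap.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def gcups(query_len, database_residues, seconds):
    """Billions of cell updates per second for one query against a database."""
    return query_len * database_residues / seconds / 1e9
```

Every query residue is compared with every database residue, so the number of cell updates is the product of the two lengths; that product divided by the run time gives the throughput figure quoted in the abstract.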
CUDA-based real time surgery simulation.

Science.gov (United States)

Liu, Youquan; De, Suvranu

2008-01-01

In this paper we present a general software platform that enables real-time surgery simulation on the newly available compute unified device architecture (CUDA) from NVIDIA. CUDA-enabled GPUs harness the power of 128 processors, which allow data-parallel computations. Compared to previous GPGPU approaches, CUDA is significantly more flexible, with a C language interface. We report implementation of both collision detection and consequent deformation computation algorithms. Our test results indicate that CUDA enables a twentyfold speedup for collision detection and about a fifteenfold speedup for deformation computation on an Intel Core 2 Quad 2.66 GHz machine with a GeForce 8800 GTX.
Development of High-speed Visualization System of Hypocenter Data Using CUDA-based GPU computing

Science.gov (United States)

Kumagai, T.; Okubo, K.; Uchida, N.; Matsuzawa, T.; Kawada, N.; Takeuchi, N.

2014-12-01

After the Great East Japan Earthquake on March 11, 2011, intelligent visualization of seismic information has become important for understanding earthquake phenomena. At the same time, the quantity of seismic data has become enormous as high-accuracy observation networks have progressed; many parameters (e.g., positional information, origin time, magnitude, etc.) must be handled to display the seismic information efficiently. Therefore, high-speed processing of data and image information is necessary to handle enormous amounts of seismic data. Recently, the GPU (Graphics Processing Unit) has been used as an acceleration tool for data processing and calculation in various fields of study. This movement is called GPGPU (General-Purpose computing on GPUs). In the last few years, GPU performance has kept improving rapidly. GPU computing provides a high-performance computing environment at a lower cost than before. Moreover, using the GPU is advantageous for visualizing processed data, because the GPU was originally an architecture for graphics processing. In GPU computing, the processed data is always stored in the video memory. Therefore, we can write drawing information directly to the VRAM on the video card by combining CUDA and the graphics API. In this study, we employ CUDA and OpenGL and/or DirectX to realize a full-GPU implementation. This method makes it possible to write drawing information to the VRAM on the video card without PCIe bus data transfer, which enables high-speed processing of seismic data. The present study examines GPU computing-based high-speed visualization and the feasibility of a high-speed visualization system for hypocenter data.
cudaBayesreg: Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis

Directory of Open Access Journals (Sweden)

Adelino R. Ferreira da Silva

2011-10-01

Graphics processing units (GPUs) are rapidly gaining maturity as powerful general parallel computing devices. A key feature in the development of modern GPUs has been the advancement of the programming model and programming tools. Compute Unified Device Architecture (CUDA) is a software platform for massively parallel high-performance computing on Nvidia many-core GPUs. In functional magnetic resonance imaging (fMRI), the volume of the data to be processed and the type of statistical analysis to perform call for high-performance computing strategies. In this work, we present the main features of the R-CUDA package cudaBayesreg, which implements in CUDA the core of a Bayesian multilevel model for the analysis of brain fMRI data. The statistical model implements a Gibbs sampler for multilevel/hierarchical linear models with a normal prior. The main contribution to the increased performance comes from the use of separate threads for fitting the linear regression model at each voxel in parallel. The R-CUDA implementation of the Bayesian model proposed here has been able to significantly reduce the run-time processing of the Markov chain Monte Carlo (MCMC) simulations used in Bayesian fMRI data analyses. Presently, cudaBayesreg is only configured for Linux systems with Nvidia CUDA support.
Accelerating Solution Proposal of AES Using a Graphic Processor

Directory of Open Access Journals (Sweden)

STRATULAT, M.

2011-11-01

The main goal of this work is to analyze the possibility of using a graphics processing unit in non-graphical calculations. Graphics processing units are being used nowadays not only for game engines and movie encoding/decoding, but also for a vast area of applications, such as cryptography. We used the graphics processing unit as a cryptographic coprocessor in order to accelerate the AES algorithm. Our implementation of AES runs on a GPU using the CUDA architecture. The performance results show that the CUDA implementation can reach speeds of 11.95 Gbps. The tests are conducted in two directions: running the tests on small data sizes that are located in memory and on large data that are stored in files on hard drives.
Advanced mathematical on-line analysis in nuclear experiments. Usage of parallel computing CUDA routines in standard root analysis

Science.gov (United States)

Grzeszczuk, A.; Kowalski, S.

2015-04-01

Compute Unified Device Architecture (CUDA) is a parallel computing platform developed by Nvidia to increase the speed of graphics by using parallel processing for calculations. The success of this solution has opened General-Purpose Graphic Processor Unit (GPGPU) technology to applications not coupled with graphics. A GPGPU system can be applied as an effective tool for reducing the huge volume of data in pulse shape analysis measurements, by on-line recalculation or by a very quick compression system. The simplified structure of the CUDA system and the programming model, based on the example of an Nvidia GeForce GTX 580 card, are presented in our poster contribution in a stand-alone version and as a ROOT application.
Optimization Specifications for CUDA Code Restructuring Tool

KAUST Repository

Khan, Ayaz

2017-03-13

In this work we have developed a restructuring software tool (RT-CUDA) following the proposed optimization specifications to bridge the gap between high-level languages and the machine-dependent CUDA environment. RT-CUDA takes a C program and converts it into an optimized CUDA kernel, with user directives in a configuration file guiding the compiler. RT-CUDA also allows transparent invocation of the most optimized external math libraries, such as cuSPARSE and cuBLAS, enabling efficient design of linear algebra solvers. We expect RT-CUDA to be needed by many KSA industries dealing with science and engineering simulation on massively parallel computers like NVIDIA GPUs.
Utilizing General Purpose Graphics Processing Units to Improve Performance of Computer Modelling and Visualization

Science.gov (United States)

Monk, J.; Zhu, Y.; Koons, P. O.; Segee, B. E.

2009-12-01

With the introduction of the G8X series of cards by nVidia, an architecture called CUDA was released, and virtually all subsequent video cards have had CUDA support. With this new architecture, nVidia provided extensions for C/C++ that create an Application Programming Interface (API) allowing code to be executed on the GPU. Since then, the concept of GPGPU (general-purpose graphics processing unit) has been growing: the idea that the GPU is very good at algebra and at running things in parallel, so we should make use of that power for other applications. This is highly appealing in the area of geodynamic modeling, as multiple parallel solutions of the same differential equations at different points in space lead to a large speedup in simulation speed. Another benefit of CUDA is a programmatic method of transferring large amounts of data between the computer's main memory and the dedicated GPU memory located on the video card. In addition to being able to compute and render on the video card, the CUDA framework allows for a large speedup in situations, such as with a tiled display wall, where the rendered pixels are to be displayed in a different location than where they are rendered. A CUDA extension for VirtualGL was developed, allowing for faster read back at high resolutions. This paper examines several aspects of rendering OpenGL graphics on large displays using VirtualGL and VNC. It demonstrates how performance can be significantly improved in rendering on a tiled monitor wall. We present a CUDA-enhanced version of VirtualGL as well as the advantages of having multiple VNC servers. The paper also discusses restrictions caused by read back and blitting rates and how they are affected by different sizes of virtual displays being rendered.
A Block-Asynchronous Relaxation Method for Graphics Processing Units

OpenAIRE

Anzt, H.; Dongarra, J.; Heuveline, Vincent; Tomov, S.

2011-01-01

In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). For this purpose, we developed a set of asynchronous iteration algorithms in CUDA and compared them with a parallel implementation of synchronous relaxation methods on CPU-based systems. For a set of test matrices taken from the University of Florida Matrix Collection we monitor the convergence behavior, the average iteration time and the total time-to-solution time. Analyzing the r...
Rapid data processing for ultrafast X-ray computed tomography using scalable and modular CUDA based pipelines

Science.gov (United States)

Frust, Tobias; Wagner, Michael; Stephan, Jan; Juckeland, Guido; Bieberle, André

2017-10-01

Ultrafast X-ray tomography is an advanced imaging technique for the study of dynamic processes based on the principles of electron beam scanning. A typical application of this technique is the study of multiphase flows, that is, flows of mixtures of substances such as gas-liquid flows in pipelines or chemical reactors. At Helmholtz-Zentrum Dresden-Rossendorf (HZDR) a number of such tomography scanners are operated. Currently, there are two main points limiting their application in some fields. First, after each CT scan sequence the data of the radiation detector must be downloaded from the scanner to a data processing machine. Second, the current data processing is time-consuming compared to the CT scan sequence interval. To enable online observations or use this technique to control actuators in real time, a modular and scalable data processing tool has been developed, consisting of user-definable stages working independently together in a so-called data processing pipeline, that keeps up with the CT scanner's maximal frame rate of up to 8 kHz. The newly developed data processing stages are freely programmable and combinable. In order to achieve the highest processing performance, all relevant data processing steps required for a standard slice image reconstruction were individually implemented in separate stages using Graphics Processing Units (GPUs) and NVIDIA's CUDA programming language. Data processing performance tests on different high-end GPUs (Tesla K20c, GeForce GTX 1080, Tesla P100) showed excellent performance. Program Files doi: http://dx.doi.org/10.17632/65sx747rvm.1 Licensing provisions: LGPLv3 Programming language: C++/CUDA Supplementary material: Test data set, used for the performance analysis. Nature of problem: Ultrafast computed tomography is performed with a scan rate of up to 8 kHz. To obtain cross-sectional images from projection data, computer-based image reconstruction algorithms must be applied. The

Advanced mathematical on-line analysis in nuclear experiments. Usage of parallel computing CUDA routines in standard root analysis

Directory of Open Access Journals (Sweden)

Grzeszczuk A.

2015-01-01

Compute Unified Device Architecture (CUDA) is a parallel computing platform developed by Nvidia to increase the speed of graphics by using parallel processing for calculations. The success of this solution has opened General-Purpose Graphic Processor Unit (GPGPU) technology to applications not coupled with graphics. A GPGPU system can be applied as an effective tool for reducing the huge volume of data in pulse shape analysis measurements, by on-line recalculation or by a very quick compression system. The simplified structure of the CUDA system and the programming model, based on the example of an Nvidia GeForce GTX 580 card, are presented in our poster contribution in a stand-alone version and as a ROOT application.
Real-time radar signal processing using GPGPU (general-purpose graphic processing unit)

Science.gov (United States)

Kong, Fanxing; Zhang, Yan Rockee; Cai, Jingxiao; Palmer, Robert D.

2016-05-01

This study introduces a practical approach to developing a real-time signal processing chain for general phased array radar on NVIDIA GPUs (Graphical Processing Units) using CUDA (Compute Unified Device Architecture) libraries such as cuBLAS and cuFFT, which are adopted from open source libraries and optimized for NVIDIA GPUs. The processed results are rigorously verified against those from the CPUs. Performance, benchmarked in computation time with various input data cube sizes, is compared across GPUs and CPUs. Through the analysis, it will be demonstrated that GPGPU (General-Purpose GPU) real-time processing of the array radar data is possible with relatively low-cost commercial GPUs.
Multi–GPU Implementation of Machine Learning Algorithm using CUDA and OpenCL

Directory of Open Access Journals (Sweden)

Jan Masek

2016-06-01

Using modern graphics processing units (GPUs) has become very useful for computing complex and time-consuming processes. GPUs provide high-performance computation capabilities at a good price. This paper deals with multi-GPU OpenCL and CUDA implementations of the k-Nearest Neighbor (k-NN) algorithm. This work compares the performance of the OpenCL and CUDA implementations, each of which is suitable for a different number of used attributes. The proposed CUDA algorithm achieves acceleration of up to 880x in comparison with a single-thread CPU version. The common k-NN was modified to be faster when a lower number of k neighbors is set. The performance of the algorithm was verified with two dual-core NVIDIA GeForce GTX 690 GPUs and an Intel Core i7 3770 CPU with 4.1 GHz frequency. The speedup results were measured for one GPU, two GPUs, and three and four GPUs. We performed several tests with data sets containing up to 4 million elements with various numbers of attributes.
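The k-NN classifier that this abstract accelerates can be sketched serially in a few lines. This is the textbook brute-force form, not the modified multi-GPU algorithm from the paper; the function name `knn_predict` and the choice of squared Euclidean distance are assumptions made for illustration.

```python
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Squared Euclidean distance is enough for ranking neighbours.
    dists = [
        (sum((p - q) ** 2 for p, q in zip(point, query)), label)
        for point, label in zip(train_points, train_labels)
    ]
    # The distance computations are independent of one another, which is
    # the part a GPU implementation would spread across threads.
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

The cost grows with the number of training elements times the number of attributes per element, which is why both quantities appear as test parameters in the abstract.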
Processing-in-Memory Enabled Graphics Processors for 3D Rendering

Energy Technology Data Exchange (ETDEWEB)

Xie, Chenhao; Song, Shuaiwen; Wang, Jing; Zhang, Weigong; Fu, Xin

2017-02-06

The performance of 3D rendering on a Graphics Processing Unit, which converts a 3D vector stream into 2D frames with 3D image effects, significantly impacts users’ gaming experience on modern computer systems. Due to the high texture throughput in 3D rendering, main memory bandwidth becomes a critical obstacle for improving the overall rendering performance. 3D stacked memory systems such as Hybrid Memory Cube (HMC) provide opportunities to significantly overcome the memory wall by directly connecting logic controllers to DRAM dies. Based on the observation that texel fetches significantly impact off-chip memory traffic, we propose two architectural designs to enable Processing-In-Memory based GPU for efficient 3D rendering.
Elastic Alignment of Microscopic Images Using Parallel Processing on CUDA-Supported Graphics Processor Units

Czech Academy of Sciences Publication Activity Database

Michálek, Jan; Čapek, M.; Janáček, Jiří; Kubínová, Lucie

2010-01-01

Vol. 16, Suppl. 2 (2010), pp. 730-731 ISSN 1431-9276. [Microscopy and Microanalysis 2010. Portland, 01.08.2010-05.08.2010] R&D Projects: GA ČR(CZ) GA102/08/0691; GA ČR(CZ) GA304/09/0733; GA MŠk(CZ) LC06063 Institutional research plan: CEZ:AV0Z50110509 Keywords: elastic alignment * CUDA * confocal microscopy Subject RIV: JD - Computer Applications, Robotics Impact factor: 2.179, year: 2010
Performance comparison of OpenCL and CUDA by benchmarking an optimized perspective backprojection

Energy Technology Data Exchange (ETDEWEB)

Swall, Stefan; Ritschl, Ludwig; Knaup, Michael; Kachelriess, Marc [Erlangen-Nuernberg Univ., Erlangen (Germany). Inst. of Medical Physics (IMP)

2011-07-01

The increase in performance of Graphical Processing Units (GPUs) and the ongoing development of dedicated software tools within the last decade make it possible to transfer performance-demanding computations from the Central Processing Unit (CPU) to the GPU and to speed up certain tasks by utilizing the massively parallel architecture of these devices. The Compute Unified Device Architecture (CUDA) developed by NVIDIA provides an easy yet effective way to develop applications that target NVIDIA GPUs. It has become one of the cardinal software tools for this purpose. Recently the Open Computing Language (OpenCL) became available, which is neither vendor-specific nor limited to GPUs. As the benefits of CUDA-based image reconstruction are well known, we aim to compare the performance that can be achieved with CUDA and with OpenCL by benchmarking the time required to perform a simple but computationally demanding task: the perspective backprojection. (orig.)
3D Tomographic Image Reconstruction using CUDA C

International Nuclear Information System (INIS)

Dominguez, J. S.; Assis, J. T.; Oliveira, L. F. de

2011-01-01

This paper presents the study and implementation of software for three-dimensional reconstruction of images obtained with a tomographic system, using the capabilities of Graphics Processing Units (GPUs). Reconstruction by the filtered back-projection method was developed using CUDA C, for maximum utilization of the processing capabilities of GPUs to solve highly parallelizable problems with large computational cost. The potential of GPUs is discussed and their advantages for solving this kind of problem are shown. The results in terms of runtime will be compared with non-parallelized implementations and must show a great reduction of processing time. (Author)
A high performance GPU implementation of Surface Energy Balance System (SEBS) based on CUDA-C

NARCIS (Netherlands)

Abouali, Mohammad; Timmermans, J.; Castillo, Jose E.; Su, Zhongbo

2013-01-01

This paper introduces a new implementation of the Surface Energy Balance System (SEBS) algorithm harnessing the many cores available on Graphics Processing Units (GPUs). This new implementation uses Compute Unified Device Architecture C (CUDA-C) programming model and is designed to be executed on a
Workflow of the Grover algorithm simulation incorporating CUDA and GPGPU

Science.gov (United States)

Lu, Xiangwen; Yuan, Jiabin; Zhang, Weiwei

2013-09-01

The Grover quantum search algorithm, one of only a few representative quantum algorithms, can speed up many classical algorithms that use search heuristics. No true quantum computer has yet been developed. For the present, simulation is one effective means of verifying the search algorithm. In this work, we focus on the simulation workflow using a compute unified device architecture (CUDA). Two simulation workflow schemes are proposed. These schemes combine the characteristics of the Grover algorithm and the parallelism of general-purpose computing on graphics processing units (GPGPU). We also analyzed the optimization of memory space and memory access from this perspective. We implemented four programs on CUDA to evaluate the performance of schemes and optimization. Through experimentation, we analyzed the organization of threads suited to Grover algorithm simulations, compared the storage costs of the four programs, and validated the effectiveness of optimization. Experimental results also showed that the distinguished program on CUDA outperformed the serial program of libquantum on a CPU with a speedup of up to 23 times (12 times on average), depending on the scale of the simulation.
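A state-vector simulation of Grover's algorithm, the kind of computation this abstract maps onto CUDA, can be sketched serially as follows. This is a minimal illustration and not one of the paper's four programs or its workflow schemes: the function name `grover_search` is hypothetical, a single marked item is assumed, and amplitudes are kept real because the oracle and diffusion steps never introduce complex phases in this setting.

```python
import math


def grover_search(n_qubits, marked):
    """Simulate Grover's algorithm and return the measurement probabilities."""
    size = 2 ** n_qubits
    state = [1.0 / math.sqrt(size)] * size  # uniform superposition
    iterations = int(math.floor(math.pi / 4 * math.sqrt(size)))
    for _ in range(iterations):
        state[marked] = -state[marked]  # oracle: flip the marked amplitude
        mean = sum(state) / size
        # Diffusion: inversion about the mean. Every amplitude is updated
        # independently, which is what makes the step data-parallel.
        state = [2.0 * mean - amp for amp in state]
    return [amp * amp for amp in state]
```

The state vector has 2^n entries for n qubits, so memory use doubles with each added qubit; this is why storage cost and memory access receive attention in the simulation workflow.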
Accelerating VASP electronic structure calculations using graphic processing units

KAUST Repository

Hacene, Mohamed

2012-08-20

We present a way to improve the performance of the electronic structure Vienna Ab initio Simulation Package (VASP) program. We show that high-performance computers equipped with graphics processing units (GPUs) as accelerators may reduce drastically the computation time when offloading these sections to the graphic chips. The procedure consists of (i) profiling the performance of the code to isolate the time-consuming parts, (ii) rewriting these so that the algorithms become better-suited for the chosen graphic accelerator, and (iii) optimizing memory traffic between the host computer and the GPU accelerator. We chose to accelerate VASP with NVIDIA GPU using CUDA. We compare the GPU and original versions of VASP by evaluating the Davidson and RMM-DIIS algorithms on chemical systems of up to 1100 atoms. In these tests, the total time is reduced by a factor between 3 and 8 when running on n (CPU core + GPU) compared to n CPU cores only, without any accuracy loss. © 2012 Wiley Periodicals, Inc.
Accelerating VASP electronic structure calculations using graphic processing units

KAUST Repository

Hacene, Mohamed; Anciaux-Sedrakian, Ani; Rozanska, Xavier; Klahr, Diego; Guignon, Thomas; Fleurat-Lessard, Paul

2012-01-01

We present a way to improve the performance of the electronic structure Vienna Ab initio Simulation Package (VASP) program. We show that high-performance computers equipped with graphics processing units (GPUs) as accelerators may reduce drastically the computation time when offloading these sections to the graphic chips. The procedure consists of (i) profiling the performance of the code to isolate the time-consuming parts, (ii) rewriting these so that the algorithms become better-suited for the chosen graphic accelerator, and (iii) optimizing memory traffic between the host computer and the GPU accelerator. We chose to accelerate VASP with NVIDIA GPU using CUDA. We compare the GPU and original versions of VASP by evaluating the Davidson and RMM-DIIS algorithms on chemical systems of up to 1100 atoms. In these tests, the total time is reduced by a factor between 3 and 8 when running on n (CPU core + GPU) compared to n CPU cores only, without any accuracy loss. © 2012 Wiley Periodicals, Inc.
Application of the Characteristic Basis Function Method Using CUDA

Directory of Open Access Journals (Sweden)

Juan Ignacio Pérez

2014-01-01

Full Text Available The characteristic basis function method (CBFM) is a popular technique for efficiently solving the method of moments (MoM) matrix equations. In this work, we address the adaptation of this method to a relatively new computing infrastructure provided by NVIDIA, the Compute Unified Device Architecture (CUDA), and take into account some of the limitations which appear when the geometry under analysis becomes too big to fit into the Graphics Processing Unit’s (GPU’s) memory.
MALBEC: a new CUDA-C ray-tracer in general relativity

Science.gov (United States)

Quiroga, G. D.

2018-06-01

A new CUDA-C code for tracing orbits around non-charged black holes is presented. This code, named MALBEC, takes advantage of graphics processing units and the CUDA platform for tracking null and timelike test particles in the Schwarzschild and Kerr metrics. Also, a new general set of equations that describe the closed circular orbits of any timelike test particle in the equatorial plane is derived. These equations are extremely important in order to compare the analytical behavior of the orbits with the numerical results and verify the correct implementation of the Runge-Kutta algorithm in MALBEC. Finally, other numerical tests are performed, demonstrating that MALBEC is able to reproduce some well-known results in these metrics in a faster and more efficient way than a conventional CPU implementation.
Fast phase processing in off-axis holography by CUDA including parallel phase unwrapping.

Science.gov (United States)

Backoach, Ohad; Kariv, Saar; Girshovitz, Pinhas; Shaked, Natan T

2016-02-22

We present a parallel processing implementation for rapid extraction of quantitative phase maps from off-axis holograms on the Graphics Processing Unit (GPU) of the computer using compute unified device architecture (CUDA) programming. To obtain an efficient implementation, we parallelized both the wrapped phase map extraction algorithm and the two-dimensional phase unwrapping algorithm. In contrast to previous implementations, we utilized an unweighted least squares phase unwrapping algorithm that better suits parallelism. We compared the run times of the proposed algorithm on the CPU and the GPU of the computer for various sizes of off-axis holograms. Using the GPU implementation, we extracted the unwrapped phase maps from the recorded off-axis holograms at 35 frames per second (fps) for 4 megapixel holograms, and at 129 fps for 1 megapixel holograms, which, to the best of our knowledge, are the fastest processing frame rates obtained so far. We then used common-path off-axis interferometric imaging to quantitatively capture the phase maps of a micro-organism with rapid flagellum movements.
Frequency Domain Image Filtering Using CUDA

Directory of Open Access Journals (Sweden)

Muhammad Awais Rajput

2014-10-01

Full Text Available In this paper, we investigate the implementation of image filtering in frequency domain using NVIDIA's CUDA (Compute Unified Device Architecture). In contrast to signal and image filtering in spatial domain which uses convolution operations and hence is more compute-intensive for filters having larger spatial extent, the frequency domain filtering uses FFT (Fast Fourier Transform) which is much faster and significantly reduces the computational complexity of the filtering. We implement the frequency domain filtering on CPU and GPU respectively and analyze the speed-up obtained from the CUDA's parallel processing paradigm. In order to demonstrate the efficiency of frequency domain filtering on CUDA, we implement three frequency domain filters, i.e., Butterworth, low-pass and Gaussian for processing different sizes of images on CPU and GPU respectively and perform the GPU vs. CPU benchmarks. The results presented in this paper show that the frequency domain filtering with CUDA achieves significant speed-up over the CPU processing in frequency domain with the same level of (output) image quality on both the processing architectures.
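Frequency-domain filtering follows a transform, multiply, inverse-transform pattern, which the following sketch illustrates. It is a one-dimensional simplification written for this listing, not the paper's two-dimensional CUDA implementation; the names `fft` and `gaussian_lowpass` and the parameter `sigma` are assumptions. The recursive radix-2 FFT requires a power-of-two input length.

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def gaussian_lowpass(signal, sigma):
    """Filter a real signal with a Gaussian low-pass in the frequency domain."""
    n = len(signal)
    spectrum = fft([complex(v) for v in signal])
    filtered = []
    for k, coeff in enumerate(spectrum):
        freq = min(k, n - k)  # distance from DC, symmetric for real input
        # Point-wise multiply: each bin is independent of every other bin.
        filtered.append(coeff * math.exp(-(freq * freq) / (2.0 * sigma * sigma)))
    restored = fft(filtered, inverse=True)
    return [v.real / n for v in restored]  # 1/n normalizes the inverse


n = 64
noisy = [1.0 + math.cos(2.0 * math.pi * 20 * i / n) for i in range(n)]
smoothed = gaussian_lowpass(noisy, sigma=3.0)
```

In this example the constant component passes through unchanged while the component at frequency bin 20 is suppressed. The per-bin multiplication is the step that maps most directly onto one GPU thread per frequency sample.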
GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

Science.gov (United States)

Zandevakili, Pooya; Hu, Ming; Qin, Zhaohui

2012-01-01

Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/
GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units.

Directory of Open Access Journals (Sweden)

Pooya Zandevakili

Full Text Available Computational detection of TF binding patterns has become an indispensable tool in functional genomics research. With the rapid advance of new sequencing technologies, large amounts of protein-DNA interaction data have been produced. Analyzing this data can provide substantial insight into the mechanisms of transcriptional regulation. However, the massive amount of sequence data presents daunting challenges. In our previous work, we have developed a novel algorithm called Hybrid Motif Sampler (HMS) that enables more scalable and accurate motif analysis. Despite much improvement, HMS is still time-consuming due to the requirement to calculate matching probabilities position-by-position. Using the NVIDIA CUDA toolkit, we developed a graphics processing unit (GPU)-accelerated motif analysis program named GPUmotif. We proposed a "fragmentation" technique to hide data transfer time between memories. Performance comparison studies showed that commonly-used model-based motif scan and de novo motif finding procedures such as HMS can be dramatically accelerated when running GPUmotif on NVIDIA graphics cards. As a result, energy consumption can also be greatly reduced when running motif analysis using GPUmotif. The GPUmotif program is freely available at http://sourceforge.net/projects/gpumotif/
Survey of using GPU CUDA programming model in medical image analysis

Directory of Open Access Journals (Sweden)

T. Kalaiselvi

2017-01-01

Full Text Available With technological development in the medical industry, the volume of data to be processed is expanding rapidly, and computation time also increases due to many factors such as 3D and 4D treatment planning, the increasing sophistication of MRI pulse sequences and the growing complexity of algorithms. The graphics processing unit (GPU) addresses these problems through features such as high computation throughput, high memory bandwidth, support for floating-point arithmetic and low cost. Compute unified device architecture (CUDA) is a popular GPU programming model introduced by NVIDIA for parallel computing. This review paper briefly discusses the need for GPU CUDA computing in medical image analysis. The GPU performance of existing algorithms is analyzed and the computational gain is discussed. A few open issues, hardware configurations and optimization principles of existing methods are discussed. This survey concludes with a few optimization techniques for medical imaging algorithms on the GPU. Finally, the limitations and future scope of GPU programming are discussed.
Using of opportunities of graphic processors for acceleration of scientific and technical calculations

International Nuclear Information System (INIS)

Dudnik, V.A.; Kudryavtsev, V.I.; Sereda, T.M.; Us, S.A.; Shestakov, M.V.

2009-01-01

The new opportunities offered by modern graphics processors (GPUs) for accelerating scientific and technical calculations by dividing a computational task between the central processor and the GPU are described. The use of NVIDIA CUDA technology to bring the parallel computing capabilities of the GPU into programs for some computationally intensive mathematical tasks is described. Examples comparing the performance of these tasks without a GPU and with NVIDIA CUDA on a GeForce 8800 graphics processor are presented.
Lamb wave propagation modelling and simulation using parallel processing architecture and graphical cards

International Nuclear Information System (INIS)

Paćko, P; Bielak, T; Staszewski, W J; Uhl, T; Spencer, A B; Worden, K

2012-01-01

This paper demonstrates new parallel computation technology and an implementation for Lamb wave propagation modelling in complex structures. A graphical processing unit (GPU) and compute unified device architecture (CUDA), available in low-cost graphical cards in standard PCs, are used for Lamb wave propagation numerical simulations. The local interaction simulation approach (LISA) wave propagation algorithm has been implemented as an example. Other algorithms suitable for parallel discretization can also be used in practice. The method is illustrated using examples related to damage detection. The results demonstrate good accuracy and effective computational performance of very large models. The wave propagation modelling presented in the paper can be used in many practical applications of science and engineering. (paper)

Implementation of RLS-based Adaptive Filters on nVIDIA GeForce Graphics Processing Unit

OpenAIRE

Hirano, Akihiro; Nakayama, Kenji

2011-01-01

This paper presents an efficient implementation of RLS-based adaptive filters with a large number of taps on an nVIDIA GeForce graphics processing unit (GPU) using the CUDA software development environment. Modification of the order and the combination of calculations reduces the number of accesses to slow off-chip memory. Assigning tasks to multiple threads also takes memory access order into account. For a 4096-tap case, a GPU program is almost three times faster than a CPU program.
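For readers unfamiliar with the recursive least squares (RLS) update, the textbook form of the algorithm can be sketched as follows. This is a plain serial illustration, not the reordered GPU scheme from the paper; the function name `rls_identify`, the 3-tap test system and the initialization constant `delta` are assumptions. The update of the matrix `P` costs on the order of taps squared operations per sample, which is why a 4096-tap filter is dominated by memory traffic.

```python
import random


def rls_identify(inputs, desired, num_taps, forgetting=1.0, delta=1e3):
    """Textbook RLS adaptive filter; returns the estimated tap weights."""
    w = [0.0] * num_taps
    # Inverse correlation matrix estimate, initialized to delta * identity.
    P = [[delta if i == j else 0.0 for j in range(num_taps)]
         for i in range(num_taps)]
    x = [0.0] * num_taps  # tap-delay line, newest sample first
    for sample, d in zip(inputs, desired):
        x = [sample] + x[:-1]
        Px = [sum(P[i][j] * x[j] for j in range(num_taps))
              for i in range(num_taps)]
        denom = forgetting + sum(x[i] * Px[i] for i in range(num_taps))
        gain = [v / denom for v in Px]
        err = d - sum(w[i] * x[i] for i in range(num_taps))  # a priori error
        w = [w[i] + gain[i] * err for i in range(num_taps)]
        # P <- (P - gain * (P x)^T) / forgetting; uses the symmetry of P.
        # Each row can be updated independently of the others.
        P = [[(P[i][j] - gain[i] * Px[j]) / forgetting
              for j in range(num_taps)] for i in range(num_taps)]
    return w


true_taps = [0.5, -0.3, 0.1]  # hypothetical unknown system
rng = random.Random(0)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(300)]
desired = [sum(t * inputs[n - k] for k, t in enumerate(true_taps) if n - k >= 0)
           for n in range(len(inputs))]
estimate = rls_identify(inputs, desired, num_taps=3)
```

With noise-free data the estimate converges to the hypothetical system's taps within a few hundred samples.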
Harvesting graphics power for MD simulations

NARCIS (Netherlands)

van Meel, J.A.; Arnold, A.; Frenkel, D.; Portegies Zwart, S.F.; Belleman, R.G.

2008-01-01

We discuss an implementation of molecular dynamics (MD) simulations on a graphic processing unit (GPU) in the NVIDIA CUDA language. We tested our code on a modern GPU, the NVIDIA GeForce 8800 GTX. Results for two MD algorithms suitable for short-ranged and long-ranged interactions, and a
Frequency domain image filtering using cuda

International Nuclear Information System (INIS)

Rajput, M.A.; Khan, U.A.

2014-01-01

In this paper, we investigate the implementation of image filtering in frequency domain using NVIDIA's CUDA (Compute Unified Device Architecture). In contrast to signal and image filtering in spatial domain which uses convolution operations and hence is more compute-intensive for filters having larger spatial extent, the frequency domain filtering uses FFT (Fast Fourier Transform) which is much faster and significantly reduces the computational complexity of the filtering. We implement the frequency domain filtering on CPU and GPU respectively and analyze the speed-up obtained from the CUDA's parallel processing paradigm. In order to demonstrate the efficiency of frequency domain filtering on CUDA, we implement three frequency domain filters, i.e., Butterworth, low-pass and Gaussian for processing different sizes of images on CPU and GPU respectively and perform the GPU vs. CPU benchmarks. The results presented in this paper show that the frequency domain filtering with CUDA achieves significant speed-up over the CPU processing in frequency domain with the same level of (output) image quality on both the processing architectures. (author)
Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

Science.gov (United States)

Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

2017-07-01

Calculation of the matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Therefore, parallelization is needed to speed up a calculation that usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because they assume a square and symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented on the GPU (graphics processing unit).
Exploring Parallel Algorithms for Volumetric Mass-Spring-Damper Models in CUDA

DEFF Research Database (Denmark)

Rasmusson, Allan; Mosegaard, Jesper; Sørensen, Thomas Sangild

2008-01-01

) from Nvidia. This paper investigates multiple implementations of volumetric Mass-Spring-Damper systems in CUDA. The obtained performance is compared to previous implementations utilizing the GPU through the OpenGL graphics API. We find that both performance and optimization strategies differ widely...
Optimized Laplacian image sharpening algorithm based on graphic processing unit

Science.gov (United States)

Ma, Tinghuai; Li, Lu; Ji, Sai; Wang, Xin; Tian, Yuan; Al-Dhelaan, Abdullah; Al-Rodhaan, Mznah

2014-12-01

In classical Laplacian image sharpening, all pixels are processed one by one, which leads to a large amount of computation. Traditional Laplacian sharpening on a CPU is considerably time-consuming, especially for large pictures. In this paper, we propose a parallel implementation of Laplacian sharpening based on Compute Unified Device Architecture (CUDA), which is a computing platform for Graphics Processing Units (GPUs), and analyze the impact of picture size on performance and the relationship between data transfer time and parallel computing time. Further, according to the different features of the different memory types, an improved scheme of our method is developed, which exploits shared memory in the GPU instead of global memory and further increases efficiency. Experimental results prove that the two novel algorithms outperform the traditional sequential method based on OpenCV in computing speed.
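The per-pixel independence that makes Laplacian sharpening suitable for a GPU can be seen in a short serial sketch. This is an illustration under assumed conventions (a 4-neighbour Laplacian kernel, 8-bit clamping, untouched border pixels, and the hypothetical name `laplacian_sharpen`), not the paper's CUDA or OpenCV code.

```python
def laplacian_sharpen(image):
    """Sharpen a grayscale image given as a list of rows of 0..255 values."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # border pixels are copied unchanged
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # 4-neighbour discrete Laplacian at (y, x).
            lap = (image[y - 1][x] + image[y + 1][x]
                   + image[y][x - 1] + image[y][x + 1]
                   - 4 * image[y][x])
            # Subtracting the Laplacian boosts edges; clamp to 8 bits.
            # Each output pixel reads only the input, so all pixels could
            # be computed concurrently, one per GPU thread.
            out[y][x] = min(255, max(0, image[y][x] - lap))
    return out


spot = [[10] * 5 for _ in range(5)]
spot[2][2] = 50
sharpened = laplacian_sharpen(spot)
```

Neighbouring threads read overlapping input pixels, which is the reuse that a shared-memory tile exploits in the improved scheme the abstract mentions.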
Acceleration of the OpenFOAM-based MHD solver using graphics processing units

International Nuclear Information System (INIS)

He, Qingyun; Chen, Hongli; Feng, Jingchao

2015-01-01

Highlights: • A 3D PISO-MHD solver was implemented on Kepler-class graphics processing units (GPUs) using CUDA technology. • A consistent and conservative scheme is used in the code, which was validated by three basic benchmarks in rectangular and round ducts. • Parallelized CPU and GPU acceleration were compared relative to a single-core CPU in MHD and non-MHD problems. • Different preconditioners for the MHD solver were compared, and the results showed that the AMG method is better for these calculations. - Abstract: The pressure-implicit with splitting of operators (PISO) magnetohydrodynamics (MHD) solver for the coupled Navier–Stokes and Maxwell equations was implemented on Kepler-class graphics processing units (GPUs) using CUDA technology. The solver is developed on the open-source code OpenFOAM and is based on a consistent and conservative scheme, which is suitable for simulating MHD flow under a strong magnetic field in a fusion liquid metal blanket with structured or unstructured meshes. We verified the validity of the implementation on several standard cases, including benchmark I (the Shercliff and Hunt cases), benchmark II (fully developed circular pipe MHD flow) and benchmark III (the KIT experimental case). Computational performance of the GPU implementation was examined by comparing its double-precision run times with those of essentially the same algorithms and meshes. The results showed that a GPU (GTX 770) can outperform a server-class 4-core, 8-thread CPU (Intel Core i7-4770k) by a factor of at least 2.
Acceleration of the OpenFOAM-based MHD solver using graphics processing units

Energy Technology Data Exchange (ETDEWEB)

He, Qingyun; Chen, Hongli, E-mail: hlchen1@ustc.edu.cn; Feng, Jingchao

2015-12-15

Highlights: • A 3D PISO-MHD solver was implemented on Kepler-class graphics processing units (GPUs) using CUDA technology. • A consistent and conservative scheme is used in the code, which was validated by three basic benchmarks in rectangular and round ducts. • Parallelized CPU and GPU acceleration were compared relative to a single-core CPU in MHD and non-MHD problems. • Different preconditioners for the MHD solver were compared, and the results showed that the AMG method is better for these calculations. - Abstract: The pressure-implicit with splitting of operators (PISO) magnetohydrodynamics (MHD) solver for the coupled Navier–Stokes and Maxwell equations was implemented on Kepler-class graphics processing units (GPUs) using CUDA technology. The solver is developed on the open-source code OpenFOAM and is based on a consistent and conservative scheme, which is suitable for simulating MHD flow under a strong magnetic field in a fusion liquid metal blanket with structured or unstructured meshes. We verified the validity of the implementation on several standard cases, including benchmark I (the Shercliff and Hunt cases), benchmark II (fully developed circular pipe MHD flow) and benchmark III (the KIT experimental case). Computational performance of the GPU implementation was examined by comparing its double-precision run times with those of essentially the same algorithms and meshes. The results showed that a GPU (GTX 770) can outperform a server-class 4-core, 8-thread CPU (Intel Core i7-4770k) by a factor of at least 2.
QDP-JIT/PTX: A QDP++ Implementation for CUDA-Enabled GPUs

Energy Technology Data Exchange (ETDEWEB)

Winter, Frank T. [JLAB; Edwards, Robert G. [JLAB

2014-11-01

These proceedings describe briefly the QDP-JIT/PTX framework for lattice field theory calculations on the CUDA architecture. The framework generates compute kernels in the PTX assembler language which can be compiled to efficient GPU machine code by the NVIDIA JIT compiler. A comprehensive memory management was added to the framework so that applications, e.g. Chroma, can run unaltered on GPU clusters and supercomputers.
AN APPROACH TO EFFICIENT FEM SIMULATIONS ON GRAPHICS PROCESSING UNITS USING CUDA

Directory of Open Access Journals (Sweden)

Björn Nutti

2014-04-01

Full Text Available The paper presents a highly efficient way of simulating the dynamic behavior of deformable objects by means of the finite element method (FEM) with computations performed on Graphics Processing Units (GPU). The presented implementation reduces bottlenecks related to memory accesses by grouping the necessary data per node pairs, in contrast to the classical way done per element. This strategy reduces the memory access patterns that are not suitable for the GPU memory architecture. Furthermore, the presented implementation takes advantage of the underlying sparse-block-matrix structure, and it has been demonstrated how to avoid potential bottlenecks in the algorithm. To achieve plausible deformational behavior for large local rotations, the objects are modeled by means of a simplified co-rotational FEM formulation.
Optimization Specifications for CUDA Code Restructuring Tool

KAUST Repository

Khan, Ayaz

2017-01-01

and convert it into an optimized CUDA kernel with user directives in a configuration file for guiding the compiler. RTCUDA also allows transparent invocation of the most optimized external math libraries like cuSparse and cuBLAS enabling efficient design
Fast analytical scatter estimation using graphics processing units.

Science.gov (United States)

Ingleby, Harry; Lippuner, Jonas; Rickey, Daniel W; Li, Yue; Elbakri, Idris

2015-01-01

To develop a fast patient-specific analytical estimator of first-order Compton and Rayleigh scatter in cone-beam computed tomography, implemented using graphics processing units. The authors developed an analytical estimator for first-order Compton and Rayleigh scatter in a cone-beam computed tomography geometry. The estimator was coded using NVIDIA's CUDA environment for execution on an NVIDIA graphics processing unit. Performance of the analytical estimator was validated by comparison with high-count Monte Carlo simulations for two different numerical phantoms. Monoenergetic analytical simulations were compared with monoenergetic and polyenergetic Monte Carlo simulations. Analytical and Monte Carlo scatter estimates were compared both qualitatively, from visual inspection of images and profiles, and quantitatively, using a scaled root-mean-square difference metric. Reconstruction of simulated cone-beam projection data of an anthropomorphic breast phantom illustrated the potential of this method as a component of a scatter correction algorithm. The monoenergetic analytical and Monte Carlo scatter estimates showed very good agreement. The monoenergetic analytical estimates showed good agreement for Compton single scatter and reasonable agreement for Rayleigh single scatter when compared with polyenergetic Monte Carlo estimates. For a voxelized phantom with dimensions 128 × 128 × 128 voxels and a detector with 256 × 256 pixels, the analytical estimator required 669 seconds for a single projection, using a single NVIDIA 9800 GX2 video card. Accounting for first order scatter in cone-beam image reconstruction improves the contrast to noise ratio of the reconstructed images. The analytical scatter estimator, implemented using graphics processing units, provides rapid and accurate estimates of single scatter and with further acceleration and a method to account for multiple scatter may be useful for practical scatter correction schemes.
GPU acceleration for digitally reconstructed radiographs using bindless texture objects and CUDA/OpenGL interoperability.

Science.gov (United States)

Abdellah, Marwan; Eldeib, Ayman; Owis, Mohamed I

2015-01-01

This paper features an advanced implementation of the X-ray rendering algorithm that harnesses the giant computing power of the current commodity graphics processors to accelerate the generation of high resolution digitally reconstructed radiographs (DRRs). The presented pipeline exploits the latest features of NVIDIA Graphics Processing Unit (GPU) architectures, mainly bindless texture objects and dynamic parallelism. The rendering throughput is substantially improved by exploiting the interoperability mechanisms between CUDA and OpenGL. The benchmarks of our optimized rendering pipeline reflect its capability of generating DRRs with resolutions of 2048² and 4096² at interactive and semi-interactive frame rates using an NVIDIA GeForce 970 GTX device.
Stochastic first passage time accelerated with CUDA

Science.gov (United States)

Pierro, Vincenzo; Troiano, Luigi; Mejuto, Elena; Filatrella, Giovanni

2018-05-01

Estimating the time for a stochastic trajectory to pass a threshold by numerical integration is of physical interest, for instance in Josephson junctions and atomic force microscopy, where the full trajectory is not accessible. We propose an algorithm suitable for efficient implementation on a graphics processing unit in the CUDA environment. For well-balanced loads, the proposed approach achieves almost perfect scaling with the number of available threads and processors, and allows an acceleration of about 400× with a GTX980 GPU with respect to a standard multicore CPU. With an off-the-shelf GPU, this method makes it possible to tackle problems that are otherwise prohibitive, such as thermal activation in slowly tilted potentials. In particular, we demonstrate that it is possible to simulate the switching current distributions of Josephson junctions on the timescale of actual experiments.
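The idea of estimating a first passage time from many independent stochastic trajectories can be sketched as below. This is a generic Euler-Maruyama integration of a particle with constant drift, chosen here for simplicity; it is not the authors' Josephson junction model, and the function names and parameter values are assumptions.

```python
import math
import random


def first_passage_time(drift, noise, threshold, dt, max_steps, rng):
    """Integrate dx = drift*dt + noise*dW until x reaches the threshold.

    Returns the elapsed time, or None if the threshold is never reached
    within max_steps steps.
    """
    x = 0.0
    for step in range(1, max_steps + 1):
        # Euler-Maruyama step: deterministic drift plus Gaussian increment.
        x += drift * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if x >= threshold:
            return step * dt
    return None


def mean_first_passage_time(num_paths, drift, noise, threshold, dt,
                            max_steps, seed=0):
    """Average the first passage time over independent trajectories."""
    rng = random.Random(seed)
    # Trajectories do not interact, so on a GPU each one could be assigned
    # to its own thread with its own random stream.
    times = [first_passage_time(drift, noise, threshold, dt, max_steps, rng)
             for _ in range(num_paths)]
    hits = [t for t in times if t is not None]
    return sum(hits) / len(hits) if hits else None


mfpt = mean_first_passage_time(2000, drift=1.0, noise=0.5, threshold=1.0,
                               dt=0.01, max_steps=10000)
```

Because trajectories end at different times, threads finish unevenly, which is the load-balancing concern the abstract alludes to.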
Image Quality Improvement on OpenGL-Based Animations by Using CUDA Architecture

Directory of Open Access Journals (Sweden)

Taner UÇKAN

2016-04-01

Full Text Available 2D or 3D rendering technology is used for graphically modelling many physical phenomena occurring in real life by means of the computers. On the other hand, the ever-increasing intensity of the graphics applications require that the image quality of the so-called modellings is enhanced and they are performed more quickly. In this direction, a new software and hardware-based architecture called CUDA has been introduced by Nvidia at the end of 2006. Thanks to this architecture, larger number of graphics processors has started contributing towards the parallel solutions of the general-purpose problems. In this study, this new parallel computing architecture is taken into consideration and an animation application consisting of humanoid robots with different behavioral characteristics is developed using the OpenGL library in C++. This animation is initially implemented on a single serial CPU and then parallelized using the CUDA architecture. Eventually, the serial and the parallel versions of the same animation are compared against each other on the basis of the number of image frames per second. The results reveal that the parallel application is by far the best yielding high quality images.
Graphics Processing Unit Enhanced Parallel Document Flocking Clustering

Energy Technology Data Exchange (ETDEWEB)

Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL; ST Charles, Jesse Lee [ORNL

2010-01-01

Analyzing and clustering documents is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of document clustering is its complexity O(n^2). As the number of documents grows, it becomes increasingly difficult to generate results in a reasonable amount of time. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. In this paper, we have conducted research to exploit this architecture and apply its strengths to the flocking based document clustering problem. Using the CUDA platform from NVIDIA, we developed a document flocking implementation to be run on the NVIDIA GEFORCE GPU. Performance gains ranged from thirty-six to nearly sixty times improvement of the GPU over the CPU implementation.
Cuda Library for T2ournamint

Science.gov (United States)

2016-09-13

NVIDIA Corporation. [3] NVIDIA, “CUDA C Programming Guide,” 2007-2015 NVIDIA Corporation. [4] NVIDIA ...cuFFT User Guide,” 2015, NVIDIA Corporation, Retrieved from http://docs.nvidia.com/cuda/cufft/index.html#axzz49bBKJGXy. [5] NVIDIA, “Whitepaper, NVIDIA TESLA P100,” 2016, NVIDIA Corporation. ...performance and realize acceptable processing speeds we leverage the use of an accelerator, the NVIDIA Tesla K40
Initial Assessment of Parallelization of Monte Carlo Calculation using Graphics Processing Units

International Nuclear Information System (INIS)

Choi, Sung Hoon; Joo, Han Gyu

2009-01-01

Monte Carlo (MC) simulation is an effective tool for calculating neutron transport in complex geometry. However, because Monte Carlo simulates the behavior of each neutron one by one, it takes a very long computing time if enough neutrons are used for a high-precision calculation. Accordingly, methods that reduce the computing time are required. A Monte Carlo code is well suited to parallel calculation because it simulates the behavior of each neutron independently. The parallelization of Monte Carlo codes, however, has so far been done using multiple CPUs. Driven by the global demand for high-quality 3D graphics, the Graphics Processing Unit (GPU) has developed into a highly parallel, multi-core processor. This parallel processing capability of GPUs becomes available to engineering computing once a suitable interface is provided. Recently, NVIDIA introduced CUDA™, a general-purpose parallel computing architecture. CUDA is a software environment that allows developers to manage the GPU using C/C++ or other languages. In this work, a GPU-based Monte Carlo code is developed and an initial assessment of its parallel performance is carried out.
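The independence of particle histories can be made concrete with a toy example. The sketch below estimates the uncollided transmission through a purely absorbing slab, a deliberately simplified problem with a known analytical answer; it is not the code described in the abstract, and the function name `slab_transmission` and its parameters are assumptions.

```python
import math
import random


def slab_transmission(sigma_total, thickness, num_histories, seed=0):
    """Estimate the fraction of neutrons crossing a purely absorbing slab.

    sigma_total is the macroscopic cross section (1/cm) and thickness is
    the slab width (cm). The exact answer is exp(-sigma_total * thickness).
    """
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(num_histories):
        # Sample the distance to the first collision from an exponential
        # distribution; 1 - random() lies in (0, 1], so log() is safe.
        path = -math.log(1.0 - rng.random()) / sigma_total
        # Histories never interact, so each could run on its own GPU
        # thread, followed by a reduction over the tallies.
        if path > thickness:
            transmitted += 1
    return transmitted / num_histories


estimate = slab_transmission(sigma_total=1.0, thickness=1.0,
                             num_histories=20000)
exact = math.exp(-1.0)
```

The statistical error shrinks only as the inverse square root of the number of histories, which is why high-precision runs need the very large neutron counts the abstract refers to.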
Accelerating solidification process simulation for large-sized system of liquid metal atoms using GPU with CUDA

Energy Technology Data Exchange (ETDEWEB)

Jie, Liang [School of Information Science and Engineering, Hunan University, Changshang, 410082 (China); Li, KenLi, E-mail: lkl@hnu.edu.cn [School of Information Science and Engineering, Hunan University, Changshang, 410082 (China); National Supercomputing Center in Changsha, 410082 (China); Shi, Lin [School of Information Science and Engineering, Hunan University, Changshang, 410082 (China); Liu, RangSu [School of Physics and Micro Electronic, Hunan University, Changshang, 410082 (China); Mei, Jing [School of Information Science and Engineering, Hunan University, Changshang, 410082 (China)

2014-01-15

Molecular dynamics simulation is a powerful tool to simulate and analyze complex physical processes and phenomena at the atomic scale by predicting the natural time-evolution of a system of atoms. Precise simulation of physical processes places strong demands on both simulation size and computing timescale. Therefore, finding available computing resources is crucial to accelerate computation. Graphics processing units, a tremendous computational resource, have recently been utilized for general-purpose computing (GPGPU) owing to their high floating-point arithmetic performance, wide memory bandwidth and enhanced programmability. For the most time-consuming components of MD simulations of liquid metal solidification, this paper presents a fine-grained spatial decomposition method that accelerates the neighbor-list update and the interaction force calculation by taking advantage of modern graphics processing units (GPUs), enlarging the scale of the simulation to a system involving 10 000 000 atoms. In addition, a number of evaluations and tests are discussed, ranging from executions on CUDA-enabled versions with different precision, over various types of GPU (NVIDIA 480GTX, 580GTX and M2050), to CPU clusters with different numbers of CPU cores. The experimental results demonstrate that GPU-based calculations are typically 9∼11 times faster than the corresponding sequential execution and approximately 1.5∼2 times faster than implementations on 16-CPU-core clusters. On the basis of the simulated results, the theoretical and experimental results are compared, and good agreement between the two is observed, as are more complete and larger cluster structures in the actual macroscopic materials. Moreover, different nucleation and evolution mechanism of nano-clusters and nano-crystals formed in the processes of metal solidification is observed with large

CELES: CUDA-accelerated simulation of electromagnetic scattering by large ensembles of spheres

Science.gov (United States)

Egel, Amos; Pattelli, Lorenzo; Mazzamuto, Giacomo; Wiersma, Diederik S.; Lemmer, Uli

2017-09-01

CELES is a freely available MATLAB toolbox to simulate light scattering by many spherical particles. Aiming at high computational performance, CELES leverages block-diagonal preconditioning, a lookup-table approach to evaluate costly functions and massively parallel execution on NVIDIA graphics processing units using the CUDA computing platform. The combination of these techniques makes it possible to efficiently address large electrodynamic problems (>10^4 scatterers) on inexpensive consumer hardware. In this paper, we validate near- and far-field distributions against the well-established multi-sphere T-matrix (MSTM) code and discuss the convergence behavior for ensembles of different sizes, including an exemplary system comprising 10^5 particles.
CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU.

Science.gov (United States)

Jiang, Hanyu; Ganesan, Narayan

2016-02-27

The HMMER software suite is widely used for analysis of homologous protein and nucleotide sequences with high sensitivity. The latest version of hmmsearch in HMMER 3.x uses a heuristic pipeline, which consists of the MSV/SSV (Multiple/Single ungapped Segment Viterbi) stage, the P7Viterbi stage and the Forward scoring stage, to accelerate homology detection. Since the latest version is highly optimized for performance on modern multi-core CPUs with SSE capabilities, only a few acceleration attempts report speedup. However, the most compute-intensive tasks within the pipeline (viz., the MSV/SSV and P7Viterbi stages) still stand to benefit from the computational capabilities of massively parallel processors. A Multi-Tiered Parallel Framework (CUDAMPF) implemented on CUDA-enabled GPUs, presented here, offers finer-grained parallelism for the MSV/SSV and Viterbi algorithms. We couple the SIMT (Single Instruction Multiple Threads) mechanism with SIMD (Single Instruction Multiple Data) video instructions and warp-synchronism to achieve high-throughput processing and eliminate thread idling. We also propose a hardware-aware optimal allocation scheme for scarce resources such as on-chip memory and caches in order to boost the performance and scalability of CUDAMPF. In addition, runtime compilation via NVRTC, available with CUDA 7.0, is incorporated into the presented framework; it not only helps unroll the innermost loop to yield up to a 2- to 3-fold speedup over static compilation but also enables dynamic loading and switching of kernels depending on the query model size, in order to achieve optimal performance. CUDAMPF is designed as a hardware-aware parallel framework for accelerating computational hotspots within the hmmsearch pipeline as well as other sequence alignment applications. It achieves significant speedup by exploiting hierarchical parallelism on a single GPU and takes full advantage of limited resources based on their own performance features. In addition to exceeding performance of other
Molecular Monte Carlo Simulations Using Graphics Processing Units: To Waste Recycle or Not?

Science.gov (United States)

Kim, Jihan; Rodgers, Jocelyn M; Athènes, Manuel; Smit, Berend

2011-10-11

In the waste recycling Monte Carlo (WRMC) algorithm, (1) multiple trial states may be simultaneously generated and utilized during Monte Carlo moves to improve the statistical accuracy of the simulations, suggesting that such an algorithm may be well posed for implementation in parallel on graphics processing units (GPUs). In this paper, we implement two waste recycling Monte Carlo algorithms in CUDA (Compute Unified Device Architecture) using uniformly distributed random trial states and trial states based on displacement random-walk steps, and we test the methods on a methane-zeolite MFI framework system to evaluate their utility. We discuss the specific implementation details of the waste recycling GPU algorithm and compare the methods to other parallel algorithms optimized for the framework system. We analyze the relationship between the statistical accuracy of our simulations and the CUDA block size to determine the efficient allocation of the GPU hardware resources. We make comparisons between the GPU and the serial CPU Monte Carlo implementations to assess speedup over conventional microprocessors. Finally, we apply our optimized GPU algorithms to the important problem of determining free energy landscapes, in this case for molecular motion through the zeolite LTA.
Research of the fast data processing method for the Infrared Fourier transform imaging spectrometer based on CUDA architecture

Science.gov (United States)

Yu, Chunchao; Du, Debiao; Xia, Fei; Huang, Xiaobo; Zheng, Weijian; Yan, Min; Lei, Zhenggang

2017-10-01

The windowing static spectrometer has the advantages of high spectral resolution and high flux. Because the spectral data are large and well suited to parallel processing, the spectrometer's reconstruction algorithms were combined with CUDA. Parallel algorithms and programs were studied for cube data access, restructuring, filtering, mirroring and FFT. The results show that, compared with traditional spectral reconstruction algorithms, the CUDA-based implementation greatly speeds up spectral reconstruction.
Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units

Science.gov (United States)

Kemal, Jonathan Yashar

For purposes of optimizing and analyzing turbomachinery and other designs, the unsteady Favre-averaged flow-field differential equations for an ideal compressible gas can be solved in conjunction with the heat conduction equation. We solve all equations using the finite-volume multiple-grid numerical technique, with the dual time-step scheme used for unsteady simulations. Our numerical solver code targets CUDA-capable Graphical Processing Units (GPUs) produced by NVIDIA. Making use of MPI, our solver can run across networked compute nodes, where each MPI process can use either a GPU or a Central Processing Unit (CPU) core for primary solver calculations. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture, and compare our resulting performance against Intel Xeon X5690 CPUs. Solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. We used a conjugate cylinder computational grid and ran a turbulent steady flow simulation using 4 increasingly dense computational grids. Our densest computational grid is divided into 13 blocks each containing 1033x1033 grid points, for a total of 13.87 million grid points or 1.07 million grid points per domain block. To obtain overall speedups, we compare the execution time of the solver's iteration loop, including all resource-intensive GPU-related memory copies. Comparing the performance of 8 GPUs to that of 8 CPUs, we obtain an overall speedup of about 6.0 when using our densest computational grid. This amounts to an 8-GPU simulation running about 39.5 times faster than a single-CPU simulation.
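The grid sizes and speedup figures quoted in this abstract can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, used purely for illustration; the variable names are hypothetical, and the implied 8-CPU versus 1-CPU scaling is an inference from the two reported ratios rather than a result stated in the abstract.

```python
# Check the grid-point counts quoted in the abstract.
blocks = 13
points_per_block = 1033 * 1033             # points in one domain block
total_points = blocks * points_per_block   # points in the densest grid

# Reported speedups (taken directly from the abstract).
speedup_8gpu_vs_8cpu = 6.0
speedup_8gpu_vs_1cpu = 39.5

# Derived quantity (an inference, not a reported result): how much faster
# the 8-CPU run must be than the single-CPU run for both ratios to hold.
implied_8cpu_vs_1cpu = speedup_8gpu_vs_1cpu / speedup_8gpu_vs_8cpu

print(f"{points_per_block / 1e6:.2f} million points per block")
print(f"{total_points / 1e6:.2f} million points in total")
print(f"implied 8-CPU scaling: {implied_8cpu_vs_1cpu:.2f}x")
```

Under these numbers the two reported speedups are mutually consistent only if the 8-CPU run scales somewhat below linearly relative to a single CPU.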
An Optimized Multicolor Point-Implicit Solver for Unstructured Grid Applications on Graphics Processing Units

Science.gov (United States)

Zubair, Mohammad; Nielsen, Eric; Luitjens, Justin; Hammond, Dana

2016-01-01

In the field of computational fluid dynamics, the Navier-Stokes equations are often solved using an unstructured-grid approach to accommodate geometric complexity. Implicit solution methodologies for such spatial discretizations generally require frequent solution of large tightly-coupled systems of block-sparse linear equations. The multicolor point-implicit solver used in the current work typically requires a significant fraction of the overall application run time. In this work, an efficient implementation of the solver for graphics processing units is proposed. Several factors present unique challenges to achieving an efficient implementation in this environment. These include the variable amount of parallelism available in different kernel calls, indirect memory access patterns, low arithmetic intensity, and the requirement to support variable block sizes. In this work, the solver is reformulated to use standard sparse and dense Basic Linear Algebra Subprograms (BLAS) functions. However, numerical experiments show that the performance of the BLAS functions available in existing CUDA libraries is suboptimal for matrices representative of those encountered in actual simulations. Instead, optimized versions of these functions are developed. Depending on block size, the new implementations show performance gains of up to 7x over the existing CUDA library functions.
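The idea behind a multicolor point-implicit sweep can be illustrated with a small serial sketch. It is written in Python for readability and is not the authors' implementation: it uses scalar unknowns instead of blocks, a dense list-of-lists matrix, and a simple greedy coloring, and it assumes the sparsity pattern of the matrix is structurally symmetric. The function names `greedy_coloring` and `multicolor_gauss_seidel` are hypothetical.

```python
def greedy_coloring(neighbors):
    """Assign each unknown the smallest color not used by its neighbors."""
    colors = {}
    for v in sorted(neighbors):
        used = {colors[u] for u in neighbors[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors


def multicolor_gauss_seidel(A, b, sweeps=50):
    """Point-implicit sweeps over a dense list-of-lists matrix A."""
    n = len(b)
    # Unknowns i and j are coupled when A[i][j] is nonzero
    # (assumes a structurally symmetric sparsity pattern).
    neighbors = {i: [j for j in range(n) if j != i and A[i][j] != 0.0]
                 for i in range(n)}
    colors = greedy_coloring(neighbors)
    groups = {}
    for i, c in colors.items():
        groups.setdefault(c, []).append(i)

    x = [0.0] * n
    for _ in range(sweeps):
        for c in sorted(groups):
            # No two unknowns in this group are coupled, so these updates
            # are independent and could run in parallel (one GPU thread each).
            for i in groups[c]:
                off_diag = sum(A[i][j] * x[j] for j in neighbors[i])
                x[i] = (b[i] - off_diag) / A[i][i]
    return x, colors
```

Because unknowns of the same color never reference each other, the amount of parallelism available in each step equals the size of the color group, which is one way to read the "variable amount of parallelism" mentioned in the abstract.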
Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs

Science.gov (United States)

Munekawa, Yuma; Ino, Fumihiko; Hagihara, Kenichi

This paper presents a fast method capable of accelerating the Smith-Waterman algorithm for biological database search on a cluster of graphics processing units (GPUs). Our method is implemented using compute unified device architecture (CUDA), which is available on the nVIDIA GPU. As compared with previous methods, our method has four major contributions. (1) The method efficiently uses on-chip shared memory to reduce the data amount being transferred between off-chip video memory and processing elements in the GPU. (2) It also reduces the number of data fetches by applying a data reuse technique to query and database sequences. (3) A pipelined method is also implemented to overlap GPU execution with database access. (4) Finally, a master/worker paradigm is employed to accelerate hundreds of database searches on a cluster system. In experiments, the peak performance on a GeForce GTX 280 card reaches 8.32 giga cell updates per second (GCUPS). We also find that our method reduces the amount of data fetches to 1/140, achieving approximately three times higher performance than a previous CUDA-based method. Our 32-node cluster version is approximately 28 times faster than a single GPU version. Furthermore, the effective performance reaches 75.6 giga instructions per second (GIPS) using 32 GeForce 8800 GTX cards.
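For readers unfamiliar with the quantities in this abstract, the following minimal sketch shows the Smith-Waterman recurrence and how a GCUPS figure is computed. It is a serial Python illustration under stated assumptions, not the paper's GPU method: it uses a linear gap penalty, and the scoring values and the function names `smith_waterman_score` and `gcups` are illustrative choices.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score (linear gap penalty).

    The scoring values are illustrative defaults, not those of the paper.
    """
    prev = [0] * (len(target) + 1)   # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # gap in the target
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def gcups(query_len, database_residues, seconds):
    """Giga cell updates per second: DP cells filled divided by time."""
    return query_len * database_residues / seconds / 1e9
```

Each cell depends only on its upper, left and upper-left neighbors, and different database sequences are independent of one another, which is what makes the algorithm amenable to parallel execution.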
Fast, multi-channel real-time processing of signals with microsecond latency using graphics processing units

Energy Technology Data Exchange (ETDEWEB)

Rath, N., E-mail: Nikolaus@rath.org; Levesque, J. P.; Mauel, M. E.; Navratil, G. A.; Peng, Q. [Department of Applied Physics and Applied Mathematics, Columbia University, 500 W 120th St, New York, New York 10027 (United States); Kato, S. [Department of Information Engineering, Nagoya University, Nagoya (Japan)

2014-04-15

Fast, digital signal processing (DSP) has many applications. Typical hardware options for performing DSP are field-programmable gate arrays (FPGAs), application-specific integrated DSP chips, or general purpose personal computer systems. This paper presents a novel DSP platform that has been developed for feedback control on the HBT-EP tokamak device. The system runs all signal processing exclusively on a Graphics Processing Unit (GPU) to achieve real-time performance with latencies below 8 μs. Signals are transferred into and out of the GPU using PCI Express peer-to-peer direct-memory-access transfers without involvement of the central processing unit or host memory. Tests were performed on the feedback control system of the HBT-EP tokamak using forty 16-bit floating point inputs and outputs each and a sampling rate of up to 250 kHz. Signals were digitized by a D-TACQ ACQ196 module, processing done on an NVIDIA GTX 580 GPU programmed in CUDA, and analog output was generated by D-TACQ AO32CPCI modules.
Fast, multi-channel real-time processing of signals with microsecond latency using graphics processing units

International Nuclear Information System (INIS)

Rath, N.; Levesque, J. P.; Mauel, M. E.; Navratil, G. A.; Peng, Q.; Kato, S.

2014-01-01

Fast, digital signal processing (DSP) has many applications. Typical hardware options for performing DSP are field-programmable gate arrays (FPGAs), application-specific integrated DSP chips, or general purpose personal computer systems. This paper presents a novel DSP platform that has been developed for feedback control on the HBT-EP tokamak device. The system runs all signal processing exclusively on a Graphics Processing Unit (GPU) to achieve real-time performance with latencies below 8 μs. Signals are transferred into and out of the GPU using PCI Express peer-to-peer direct-memory-access transfers without involvement of the central processing unit or host memory. Tests were performed on the feedback control system of the HBT-EP tokamak using forty 16-bit floating point inputs and outputs each and a sampling rate of up to 250 kHz. Signals were digitized by a D-TACQ ACQ196 module, processing done on an NVIDIA GTX 580 GPU programmed in CUDA, and analog output was generated by D-TACQ AO32CPCI modules
CUDA-Accelerated Geodesic Ray-Tracing for Fiber Tracking

Directory of Open Access Journals (Sweden)

Evert van Aart

2011-01-01

Full Text Available Diffusion Tensor Imaging (DTI) makes it possible to noninvasively measure the diffusion of water in fibrous tissue. By reconstructing the fibers from DTI data using a fiber-tracking algorithm, we can deduce the structure of the tissue. In this paper, we outline an approach to accelerating such a fiber-tracking algorithm using a Graphics Processing Unit (GPU). This algorithm, which is based on the calculation of geodesics, has shown promising results for both synthetic and real data, but is limited in its applicability by its high computational requirements. We present a solution which uses the parallelism offered by modern GPUs, in combination with the CUDA platform by NVIDIA, to significantly reduce the execution time of the fiber-tracking algorithm. Compared to a multithreaded CPU implementation of the same algorithm, our GPU mapping achieves a speedup factor of up to 40 times.
GPU accelerated fuzzy connected image segmentation by using CUDA.

Science.gov (United States)

Zhuge, Ying; Cao, Yong; Miller, Robert W

2009-01-01

Image segmentation techniques using fuzzy connectedness principles have shown their effectiveness in segmenting a variety of objects in several large applications in recent years. However, one problem of these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays commodity graphics hardware provides high parallel computing power. In this paper, we present a parallel fuzzy connected image segmentation algorithm on Nvidia's Compute Unified Device Architecture (CUDA) platform for segmenting large medical image data sets. Our experiments based on three data sets with small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 7.2x, 7.3x, and 14.4x, correspondingly, for the three data sets over the sequential implementation of fuzzy connected image segmentation algorithm on CPU.
High-throughput sequence alignment using Graphics Processing Units

Directory of Open Access Journals (Sweden)

Trapnell Cole

2007-12-01

Full Text Available Abstract Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.
Porting and optimizing MAGFLOW on CUDA

Directory of Open Access Journals (Sweden)

Ciro Del Negro

2011-12-01

Full Text Available The MAGFLOW lava simulation model is a cellular automaton developed by the Sezione di Catania of the Istituto Nazionale di Geofisica e Vulcanologia (INGV) and it represents the peak of the evolution of cell-based models for lava-flow simulation. The accuracy and adherence to reality achieved by the physics-based cell evolution of MAGFLOW comes at the cost of significant computational times for long-running simulations. The present study describes the efforts and results obtained by porting the original serial code to the parallel computational platforms offered by modern video cards, and in particular to the NVIDIA Compute Unified Device Architecture (CUDA). A number of optimization strategies that have been used to achieve optimal performance on a graphics processing unit (GPU) are also discussed. The actual benefits of running on the GPU rather than the central processing unit depend on the extent and duration of the simulated event; for large, long-running simulations, the GPU can be 70 to 80 times faster, while for short-lived eruptions with a small extent the speed improvements obtained are 40 to 50 times.
Massively Parallel Signal Processing using the Graphics Processing Unit for Real-Time Brain-Computer Interface Feature Extraction.

Science.gov (United States)

Wilson, J Adam; Williams, Justin C

2009-01-01

The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
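The first stage of the processing chain described above, the spatial filter, is a plain matrix-matrix product, and the quoted speedup follows directly from the two reported timings. The sketch below is a serial Python illustration only; the function name `spatial_filter` and the example weights and samples are hypothetical and are not data from the study.

```python
def spatial_filter(weights, samples):
    """Multiply a (filters x channels) matrix by a (channels x time) matrix."""
    n_time = len(samples[0])
    return [
        [sum(w * samples[ch][t] for ch, w in enumerate(row))
         for t in range(n_time)]
        for row in weights
    ]


# Hypothetical example: one filter that subtracts the mean of two
# reference channels from the first channel.
weights = [[1.0, -0.5, -0.5]]
samples = [[2.0, 4.0], [1.0, 1.0], [1.0, 3.0]]
filtered = spatial_filter(weights, samples)

# Speedup implied by the timings reported in the abstract (933 ms vs 27 ms).
speedup = 933 / 27
print(filtered, f"{speedup:.1f}x")
```

Every output element is an independent dot product over channels, so the work grows with the channel count while remaining easy to distribute across many GPU threads.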
The application of projected conjugate gradient solvers on graphical processing units

International Nuclear Information System (INIS)

Lin, Youzuo; Renaut, Rosemary

2011-01-01

Graphical processing units introduce the capability for large scale computation at the desktop. Presented numerical results verify that efficiencies and accuracies of basic linear algebra subroutines of all levels when implemented in CUDA and Jacket are comparable. But experimental results demonstrate that the basic linear algebra subroutines of level three offer the greatest potential for improving efficiency of basic numerical algorithms. We consider the solution of the multiple right hand side set of linear equations using Krylov subspace-based solvers. Thus, for the multiple right hand side case, it is more efficient to make use of a block implementation of the conjugate gradient algorithm, rather than to solve each system independently. Jacket is used for the implementation. Furthermore, including projection from one system to another improves efficiency. A relevant example, for which simulated results are provided, is the reconstruction of a three dimensional medical image volume acquired from a positron emission tomography scanner. Efficiency of the reconstruction is improved by using projection across nearby slices.
The application of projected conjugate gradient solvers on graphical processing units

Energy Technology Data Exchange (ETDEWEB)

Lin, Youzuo [Los Alamos National Laboratory; Renaut, Rosemary [ARIZONA STATE UNIV.

2011-01-26

Graphical processing units introduce the capability for large scale computation at the desktop. Presented numerical results verify that efficiencies and accuracies of basic linear algebra subroutines of all levels when implemented in CUDA and Jacket are comparable. But experimental results demonstrate that the basic linear algebra subroutines of level three offer the greatest potential for improving efficiency of basic numerical algorithms. We consider the solution of the multiple right hand side set of linear equations using Krylov subspace-based solvers. Thus, for the multiple right hand side case, it is more efficient to make use of a block implementation of the conjugate gradient algorithm, rather than to solve each system independently. Jacket is used for the implementation. Furthermore, including projection from one system to another improves efficiency. A relevant example, for which simulated results are provided, is the reconstruction of a three dimensional medical image volume acquired from a positron emission tomography scanner. Efficiency of the reconstruction is improved by using projection across nearby slices.
Accelerating the reconstruction of magnetic resonance imaging by three-dimensional dual-dictionary learning using CUDA.

Science.gov (United States)

Jiansen Li; Jianqi Sun; Ying Song; Yanran Xu; Jun Zhao

2014-01-01

An effective way to improve the data acquisition speed of magnetic resonance imaging (MRI) is to use under-sampled k-space data, and the dictionary learning method can be used to maintain the reconstruction quality. A three-dimensional dictionary trains the atoms in the dictionary in the form of blocks, which can exploit the spatial correlation among slices. The dual-dictionary learning method includes a low-resolution dictionary and a high-resolution dictionary, for sparse coding and image updating respectively. However, the amount of data is huge for three-dimensional reconstruction, especially when the number of slices is large. Thus, the procedure is time-consuming. In this paper, we first use NVIDIA Corporation's compute unified device architecture (CUDA) programming model to design parallel algorithms on the graphics processing unit (GPU) to accelerate the reconstruction procedure. The main optimizations are made in the dictionary learning algorithm and the image updating part, such as the orthogonal matching pursuit (OMP) algorithm and the k-singular value decomposition (K-SVD) algorithm. Then we develop another version of the CUDA code with algorithmic optimization. Experimental results show that a speedup of more than 324 times is achieved compared with the CPU-only codes when the number of MRI slices is 24.
Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

Science.gov (United States)

Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

2017-11-01

The current trend in processor manufacturing focuses on multi-core architectures rather than on increasing clock speed to improve performance. Graphics processors have become commodity hardware that provides fast co-processing in computer systems. Developments in IoT, social networking web applications and big data have created huge demand for data processing, and such throughput-intensive applications inherently contain data-level parallelism, which is well suited to SIMD-based GPU architectures. This paper reviews the architectural aspects of multi/many-core processors and graphics processors. Several case studies are used to compare the performance of throughput computing applications using shared-memory programming in OpenMP and CUDA API based programming.
Parallelized Seeded Region Growing Using CUDA

Directory of Open Access Journals (Sweden)

Seongjin Park

2014-01-01

Full Text Available This paper presents a novel method for parallelizing the seeded region growing (SRG) algorithm using Compute Unified Device Architecture (CUDA) technology, with the intention of overcoming the theoretical weakness of the SRG algorithm, namely that its computation time is directly proportional to the size of the segmented region. The segmentation performance of the proposed CUDA-based SRG is compared with SRG implementations on single-core CPUs, quad-core CPUs, and shader language programming, using synthetic datasets and 20 body CT scans. Based on the experimental results, the CUDA-based SRG outperforms the other three implementations, suggesting that it can substantially assist segmentation during massive CT screening tests.
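To make the serial bottleneck concrete, here is a minimal sketch of a conventional, single-threaded seeded region growing loop. It is written in Python and is not the paper's CUDA method; the homogeneity criterion (intensity within a tolerance of the seed value), the 4-connected neighborhood and the function name `seeded_region_growing` are assumptions chosen for illustration.

```python
from collections import deque


def seeded_region_growing(image, seed, tolerance):
    """Return the set of (row, col) pixels grown from `seed`.

    A pixel joins the region when its intensity is within `tolerance`
    of the seed intensity (a simple, assumed homogeneity criterion).
    """
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        # Visit the 4-connected neighbors of the current pixel.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region
```

The loop dequeues one pixel per iteration, so its running time grows with the number of pixels in the final region, which is the proportionality the abstract refers to.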
Online measurement for geometrical parameters of wheel set based on structure light and CUDA parallel processing

Science.gov (United States)

Wu, Kaihua; Shao, Zhencheng; Chen, Nian; Wang, Wenjie

2018-01-01

The degree of wear of the wheel set tread is one of the main factors influencing the safety and stability of a running train. The geometrical parameters mainly include flange thickness and flange height. Line-structured laser light is projected onto the wheel tread surface, and the geometrical parameters can be deduced from the profile image. An online image acquisition system was designed based on asynchronous reset of the CCD and a CUDA parallel processing unit. Image acquisition was performed in hardware interrupt mode. A high-efficiency parallel segmentation algorithm based on CUDA is proposed. The algorithm first divides the image into smaller squares and then extracts the squares containing the target by fusing the k-means and STING clustering image segmentation algorithms. Segmentation time is less than 0.97 ms. A considerable speedup over serial CPU computation was obtained, which greatly improved the real-time image processing capacity. When the wheel set is running at a limited speed, the system, placed along the railway line, can measure the geometrical parameters automatically. The maximum measuring speed is 120 km/h.

CUDA vs. OpenCL: a theoretical and technological comparison

Directory of Open Access Journals (Sweden)

Lauro Cássio Martins de Paula

2014-08-01

graphics cards from NVIDIA®, has been a reference and more used recently. Keywords: CUDA. OpenCL. GPU.
GAMUT: GPU accelerated microRNA analysis to uncover target genes through CUDA-miRanda

Science.gov (United States)

2014-01-01

Background Non-coding sequences such as microRNAs have important roles in disease processes. Computational microRNA target identification (CMTI) is becoming increasingly important since traditional experimental methods for target identification pose many difficulties. These methods are time-consuming, costly, and often need guidance from computational methods to narrow down candidate genes anyway. However, most CMTI methods are computationally demanding, since they need to handle not only several million query microRNA and reference RNA pairs, but also several million nucleotide comparisons within each given pair. Thus, the need to perform microRNA identification at such large scale has increased the demand for parallel computing. Methods Although most CMTI programs (e.g., the miRanda algorithm) are based on a modified Smith-Waterman (SW) algorithm, the existing parallel SW implementations (e.g., CUDASW++ 2.0/3.0, SWIPE) are unable to meet this demand in CMTI tasks. We present CUDA-miRanda, a fast microRNA target identification algorithm that takes advantage of massively parallel computing on Graphics Processing Units (GPU) using NVIDIA's Compute Unified Device Architecture (CUDA). CUDA-miRanda specifically focuses on the local alignment of short (i.e., ≤ 32 nucleotides) sequences against longer reference sequences (e.g., 20K nucleotides). Moreover, the proposed algorithm is able to report multiple alignments (up to 191 top scores) and the corresponding traceback sequences for any given (query sequence, reference sequence) pair. Results Speeds over 5.36 Giga Cell Updates Per Second (GCUPs) are achieved on a server with 4 NVIDIA Tesla M2090 GPUs. Compared to the original miRanda algorithm, which is evaluated on an Intel Xeon E5620@2.4 GHz CPU, the experimental results show up to 166 times performance gains in terms of execution time. In addition, we have verified that the exact same targets were predicted in both CUDA-miRanda and the original mi
GPU accelerated study of heat transfer and fluid flow by lattice Boltzmann method on CUDA

Science.gov (United States)

Ren, Qinlong

The lattice Boltzmann method (LBM) has been developed over the past two decades as a powerful numerical approach for simulating complex fluid flow and heat transfer phenomena. As a mesoscale method based on kinetic theory, LBM has several advantages over traditional numerical methods, such as its physical representation of microscopic interactions, its ability to deal with complex geometries and its highly parallel nature. The lattice Boltzmann method has been applied to a variety of fluid behaviors and heat transfer processes, such as conjugate heat transfer, magnetic and electric fields, diffusion and mixing processes, chemical reactions, multiphase flow, phase change processes, non-isothermal flow in porous media, microfluidics, fluid-structure interactions in biological systems and so on. In addition, as a non-body-conformal grid method, the immersed boundary method (IBM) can be applied to handle complex or moving geometries in the domain. The immersed boundary method can be coupled with the lattice Boltzmann method to study heat transfer and fluid flow problems. Heat transfer and fluid flow are solved on Euler nodes by LBM, while the complex solid geometries are captured by Lagrangian nodes using the immersed boundary method. Parallel computing has been a popular topic for many decades as a way to accelerate computation in engineering and scientific fields. Today, almost all laptops and desktops have central processing units (CPUs) with multiple cores that can be used for parallel computing. However, the cost of CPUs with hundreds of cores is still high, which limits their use for high-performance computing on personal computers. Graphics processing units (GPUs), originally used for computer video cards, have emerged in recent years as the most powerful high-performance workstation hardware. Unlike CPUs, GPUs with thousands of cores are cheap. For example, the GPU (GeForce GTX TITAN) which is used in the current work has 2688 cores and the price is only 1
Acceleration of Linear Finite-Difference Poisson-Boltzmann Methods on Graphics Processing Units.

Science.gov (United States)

Qi, Ruxi; Botello-Smith, Wesley M; Luo, Ray

2017-07-11

Electrostatic interactions play crucial roles in biophysical processes such as protein folding and molecular recognition. Poisson-Boltzmann equation (PBE)-based models have become widely used for modeling these important processes. Though great efforts have been put into developing efficient PBE numerical models, challenges still remain due to the high dimensionality of typical biomolecular systems. In this study, we implemented and analyzed commonly used linear PBE solvers for the ever-improving graphics processing units (GPUs) for biomolecular simulations, including both standard and preconditioned conjugate gradient (CG) solvers with several alternative preconditioners. Our implementation utilizes the standard Nvidia CUDA libraries cuSPARSE, cuBLAS, and CUSP. Extensive tests show that good numerical accuracy can be achieved given that single precision is often used for numerical applications on GPU platforms. The optimal GPU performance was observed with the Jacobi-preconditioned CG solver, with a significant speedup over the standard CG solver on CPU in our diversified test cases. Our analysis further shows that different matrix storage formats also considerably affect the efficiency of different linear PBE solvers on GPU, with the diagonal format best suited for our standard finite-difference linear systems. Further efficiency may be possible with matrix-free operations and integrated grid stencil setup specifically tailored for the banded matrices in PBE-specific linear systems.
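To make the Jacobi-preconditioned CG solver named in the abstract above more concrete, here is a minimal sketch in plain Python. It is not the authors' GPU implementation: it uses a dense list-of-lists matrix instead of cuSPARSE/CUSP storage formats, and the function name `jacobi_pcg` and the tiny test system are assumptions made for illustration only.

```python
def jacobi_pcg(a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A using conjugate
    gradients with a Jacobi (diagonal) preconditioner. Illustrative only."""
    n = len(b)
    inv_diag = [1.0 / a[i][i] for i in range(n)]    # M^-1 = diag(A)^-1
    x = [0.0] * n
    r = list(b)                                     # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]      # preconditioned residual
    p = list(z)                                     # first search direction
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        # Stop once the residual norm is below the tolerance.
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        ap = [sum(a_ij * p_j for a_ij, p_j in zip(row, p)) for row in a]
        alpha = rz / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Hypothetical 1D finite-difference (tridiagonal) system whose solution is [1, 1, 1].
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
solution = jacobi_pcg(A, [1.0, 0.0, 1.0])
```

The matrix-vector product and the dot products are the parts that a GPU version would hand to library kernels; the preconditioner itself is only an element-wise scaling, which is one reason it maps cheaply onto parallel hardware.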
Simple sorting algorithm test based on CUDA

OpenAIRE

Meng, Hongyu; Guo, Fangjin

2015-01-01

With the development of computing technology, CUDA has become a very important tool. In computer programming, sorting algorithms are widely used. There are many simple sorting algorithms, such as enumeration sort, bubble sort and merge sort. In this paper, we test some simple sorting algorithms based on CUDA and draw some useful conclusions.
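As an illustration of why enumeration sort is a natural candidate for CUDA, the sketch below computes each element's final position independently of the others, so that each outer-loop iteration could in principle be one GPU thread. This is plain sequential Python written for this listing, not the code tested in the paper, and the name `enumeration_sort` is an assumption.

```python
def enumeration_sort(values):
    """Rank-based sort: the output index of each element is the number of
    elements that must come before it."""
    n = len(values)
    out = [None] * n
    for i in range(n):              # on a GPU: one thread per element i
        rank = 0
        for j in range(n):
            # Ties are broken by original index so equal values get
            # distinct output positions.
            if values[j] < values[i] or (values[j] == values[i] and j < i):
                rank += 1
        out[rank] = values[i]       # each i writes to a unique slot
    return out


print(enumeration_sort([5, 2, 9, 2, 7]))  # prints [2, 2, 5, 7, 9]
```

The work is quadratic in the input length, but no thread ever depends on another thread's result, which is the trade-off such tests typically examine.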
CUDArray: CUDA-based NumPy

DEFF Research Database (Denmark)

Larsen, Anders Boesen Lindbo

This technical report introduces CUDArray – a CUDA-accelerated subset of the NumPy library. The goal of CUDArray is to combine the ease of development from NumPy with the computational power of Nvidia GPUs in a lightweight and extensible framework. Since the motivation behind CUDArray is to facil...
permGPU: Using graphics processing units in RNA microarray association studies

Directory of Open Access Journals (Sweden)

George Stephen L

2010-06-01

Full Text Available Abstract Background Many analyses of microarray association studies involve permutation, bootstrap resampling and cross-validation, that are ideally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed. Results We have developed a CUDA based implementation, permGPU, that employs graphics processing units in microarray association studies. We illustrate the performance and applicability of permGPU within the context of permutation resampling for a number of test statistics. An extensive simulation study demonstrates a dramatic increase in performance when using permGPU on an NVIDIA GTX 280 card compared to an optimized C/C++ solution running on a conventional Linux server. Conclusions permGPU is available as an open-source stand-alone application and as an extension package for the R statistical environment. It provides a dramatic increase in performance for permutation resampling analysis in the context of microarray association studies. The current version offers six test statistics for carrying out permutation resampling analyses for binary, quantitative and censored time-to-event traits.
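The permutation resampling described above can be sketched as follows. This is a hedged, CPU-only illustration: the difference of group means stands in for the six test statistics that permGPU offers, and the function name, parameters, and sample data are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = mean_diff(pooled)
    hits = 0
    # Each permutation is independent of the others, which is what makes
    # the problem embarrassingly parallel.
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean_diff(pooled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical expression values for two clearly separated groups.
p = permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
```

In a GPU version, the loop over permutations is the part that would be distributed across threads, each with its own random stream.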
Exploiting current-generation graphics hardware for synthetic-scene generation

Science.gov (United States)

Tanner, Michael A.; Keen, Wayne A.

2010-04-01

Increasing seeker frame rate and pixel count, as well as the demand for higher levels of scene fidelity, have driven scene generation software for hardware-in-the-loop (HWIL) and software-in-the-loop (SWIL) testing to higher levels of parallelization. Because modern PC graphics cards provide multiple computational cores (240 shader cores on current NVIDIA Corporation GeForce and Quadro cards), implementation of phenomenology codes on graphics processing units (GPUs) offers significant potential for simultaneous enhancement of simulation frame rate and fidelity. Taking advantage of this potential requires an algorithm implementation that is structured to minimize data transfers between the central processing unit (CPU) and the GPU. In this paper, preliminary methodologies developed at the Kinetic Hardware In-The-Loop Simulator (KHILS) will be presented. Included in this paper will be various language tradeoffs between conventional shader programming, Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), including performance trades and possible pathways for future tool development.
Commercial Off-The-Shelf (COTS) Graphics Processing Board (GPB) Radiation Test Evaluation Report

Science.gov (United States)

Salazar, George A.; Steele, Glen F.

2013-01-01

Large round trip communications latency for deep space missions will require more onboard computational capabilities to enable the space vehicle to undertake many tasks that have traditionally been ground-based, mission control responsibilities. As a result, visual display graphics will be required to provide simpler vehicle situational awareness through graphical representations, as well as provide capabilities never before done in a space mission, such as augmented reality for in-flight maintenance or Telepresence activities. These capabilities will require graphics processors and associated support electronic components for high computational graphics processing. In an effort to understand the performance of commercial graphics card electronics operating in the expected radiation environment, a preliminary test was performed on five commercial off-the-shelf (COTS) graphics cards. This paper discusses the preliminary evaluation test results of five COTS graphics processing cards tested to the International Space Station (ISS) low earth orbit radiation environment. Three of the five graphics cards were tested to a total dose of 6000 rads (Si). The test articles, test configuration, preliminary results, and recommendations are discussed.
Performance evaluation for volumetric segmentation of multiple sclerosis lesions using MATLAB and computing engine in the graphical processing unit (GPU)

Science.gov (United States)

Le, Anh H.; Park, Young W.; Ma, Kevin; Jacobs, Colin; Liu, Brent J.

2010-03-01

Multiple Sclerosis (MS) is a progressive neurological disease affecting myelin pathways in the brain. Multiple lesions in the white matter can cause paralysis and severe motor disabilities of the affected patient. To solve the issue of inconsistency and user-dependency in manual lesion measurement of MRI, we have proposed a 3-D automated lesion quantification algorithm to enable objective and efficient lesion volume tracking. The computer-aided detection (CAD) of MS, written in MATLAB, utilizes the K-Nearest Neighbors (KNN) method to compute the probability of lesions on a per-voxel basis. Despite the highly optimized image processing algorithm that is used in CAD development, MS CAD integration and evaluation in clinical workflow is technically challenging due to the requirement of high computation rates and memory bandwidth in the recursive nature of the algorithm. In this paper, we present the development and evaluation of using a computing engine in the graphical processing unit (GPU) with MATLAB for segmentation of MS lesions. The paper investigates the utilization of a high-end GPU for parallel computing of KNN in the MATLAB environment to improve algorithm performance. The integration is accomplished using NVIDIA's CUDA developmental toolkit for MATLAB. The results of this study will validate the practicality and effectiveness of the prototype MS CAD in a clinical setting. The GPU method may allow MS CAD to rapidly integrate in an electronic patient record or any disease-centric health care system.
Massively parallel signal processing using the graphics processing unit for real-time brain-computer interface feature extraction

Directory of Open Access Journals (Sweden)

J. Adam Wilson

2009-07-01

Full Text Available The clock speeds of modern computer processors have nearly plateaued in the past five years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card (GPU) was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally-intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a CPU-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
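The first step of the processing chain above, the spatial filter, is a matrix-matrix product of a (filters x channels) weight matrix with a (channels x time) block of samples. The sketch below shows that product in plain Python and also reproduces the speedup arithmetic from the quoted timings; the channel data and filter weights are hypothetical, and the function name `spatial_filter` is an assumption.

```python
def spatial_filter(weights, samples):
    """Multiply a (filters x channels) matrix by a (channels x time) matrix."""
    n_time = len(samples[0])
    return [
        [sum(w * samples[c][t] for c, w in enumerate(row)) for t in range(n_time)]
        for row in weights
    ]


# Hypothetical 2-channel recording with 3 time samples per channel.
samples = [[1.0, 2.0, 3.0],
           [0.5, 0.5, 0.5]]
# One filter that subtracts channel 1 from channel 0.
weights = [[1.0, -1.0]]
filtered = spatial_filter(weights, samples)   # [[0.5, 1.5, 2.5]]

# Speedup implied by the timings in the abstract: 933 ms versus 27 ms.
speedup = 933 / 27                            # roughly 34.6
```

Every output element is an independent dot product, which is why this step parallelizes well across GPU threads.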
Study of the acceleration of nuclide burnup calculation using GPU with CUDA

International Nuclear Information System (INIS)

Okui, S.; Ohoka, Y.; Tatsumi, M.

2009-01-01

The computation costs of neutronics calculation codes become higher as physics models and methods become more complicated. The complexity of the models and methods used in neutronics calculations tends to be limited by the available computing power. In order to open a door to the new world, use of GPU for general purpose computing, called GPGPU, has been studied [1]. GPUs have a multi-threaded computing mechanism, enabled by multiple processors, which achieves much higher performance than CPUs. NVIDIA recently released the CUDA language for general purpose computation, which is a C-like programming language. It is relatively easy to learn compared to the conventional ones used for GPGPU, such as OpenGL or Cg. Therefore application of GPU to numerical calculation became much easier. In this paper, we tried to accelerate nuclide burnup calculation, which is important for predicting the time dependence of nuclides in the core, using GPU with CUDA. We chose the 4th-order Runge-Kutta method to solve the nuclide burnup equation. The nuclide burnup calculation and the 4th-order Runge-Kutta method were suitable as a first step in introducing CUDA into numerical calculation because they consist of simple single-precision matrix and vector operations, and the actual codes were written in the C++ language. Our experimental results showed that nuclide burnup calculations with GPU can potentially be sped up by a factor of 100 compared to those with CPU. (authors)
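To show why the 4th-order Runge-Kutta scheme reduces to simple matrix and vector operations, the sketch below integrates a burnup-style system dN/dt = A N in plain Python. The two-nuclide decay chain, the decay constants, and the function names are assumptions chosen for illustration; they are not taken from the paper, which used single precision on the GPU.

```python
import math


def mat_vec(a, x):
    """Dense matrix-vector product."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def rk4_step(a, n, dt):
    """One classical 4th-order Runge-Kutta step for dN/dt = A N."""
    k1 = mat_vec(a, n)
    k2 = mat_vec(a, [ni + 0.5 * dt * ki for ni, ki in zip(n, k1)])
    k3 = mat_vec(a, [ni + 0.5 * dt * ki for ni, ki in zip(n, k2)])
    k4 = mat_vec(a, [ni + dt * ki for ni, ki in zip(n, k3)])
    return [
        ni + dt / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
        for ni, a1, a2, a3, a4 in zip(n, k1, k2, k3, k4)
    ]


def burnup(a, n0, t_end, steps):
    """Integrate from t = 0 to t_end with a fixed number of RK4 steps."""
    dt = t_end / steps
    n = list(n0)
    for _ in range(steps):
        n = rk4_step(a, n, dt)
    return n


# Hypothetical chain: nuclide 1 decays into nuclide 2, which also decays.
lam1, lam2 = 0.1, 0.05
A = [[-lam1, 0.0],
     [lam1, -lam2]]
densities = burnup(A, [1.0, 0.0], t_end=10.0, steps=1000)

# Analytic reference values for the same chain.
exact_1 = math.exp(-lam1 * 10.0)
exact_2 = lam1 / (lam2 - lam1) * (math.exp(-lam1 * 10.0) - math.exp(-lam2 * 10.0))
```

Each step is four matrix-vector products plus a handful of vector updates, which is the kind of workload the abstract describes as well suited to a first CUDA port.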
Analysis of impact of general-purpose graphics processor units in supersonic flow modeling

Science.gov (United States)

Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.

2017-06-01

Computational methods are widely used in prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPU) provide architectures and new programming models that make it possible to harness their large processing power and to design computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities of the use of GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high resolution numerical schemes. CUDA technology is used for programming implementation of parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the results computed are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. Speedup of solution on GPUs with respect to the solution on central processor unit (CPU) is compared. Performance measurements show that the numerical schemes developed achieve a 20-50x speedup on GPU hardware compared to the CPU reference implementation. The results obtained provide promising perspective for designing a GPU-based software framework for applications in CFD.
Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm.

Science.gov (United States)

Khaled, Heba; Faheem, Hossam El Deen Mostafa; El Gohary, Rania

2015-01-01

This paper provides a novel hybrid model for solving the multiple pair-wise sequence alignment problem combining message passing interface and CUDA, the parallel computing platform and programming model invented by NVIDIA. The proposed model targets homogeneous cluster nodes equipped with similar Graphical Processing Unit (GPU) cards. The model consists of the Master Node Dispatcher (MND) and the Worker GPU Nodes (WGN). The MND distributes the workload among the cluster working nodes and then aggregates the results. The WGN performs the multiple pair-wise sequence alignments using the Smith-Waterman algorithm. We also propose a modified implementation to the Smith-Waterman algorithm based on computing the alignment matrices row-wise. The experimental results demonstrate a considerable reduction in the running time by increasing the number of the working GPU nodes. The proposed model achieved a performance of about 12 Giga cell updates per second when we tested against the SWISS-PROT protein knowledge base running on four nodes.
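The idea of filling the alignment matrix row by row can be sketched as below. This is a generic Smith-Waterman scoring routine with a linear gap penalty that keeps only the previous and current rows in memory; it is not the modified implementation proposed in the paper, and the scoring parameters and function name are assumptions.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, computed one matrix row at a time."""
    prev = [0] * (len(target) + 1)      # row i-1 of the score matrix
    best = 0
    for q in query:
        curr = [0]                      # column 0 is always zero
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap          # gap in the target
            left = curr[j - 1] + gap    # gap in the query
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr                     # only two rows are ever stored
    return best


print(smith_waterman_score("ACGT", "ACGT"))  # prints 8
print(smith_waterman_score("AAA", "TTT"))    # prints 0
```

Throughput figures such as Giga cell updates per second count how many `cell` values of this kind are computed per second across all alignments.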
4kUHD H264 Wireless Live Video Streaming Using CUDA

Directory of Open Access Journals (Sweden)

A. O. Adeyemi-Ejeye

2014-01-01

Full Text Available Ultrahigh definition video streaming has been explored in recent years. Most recently the possibility of 4kUHD video streaming over wireless 802.11n was presented, using preencoded video. Live encoding for streaming using x264 has proven to be very slow. The use of parallel encoding has been explored to speed up the process using CUDA. However, there has not been a parallel implementation for video streaming. We therefore present for the first time a novel implementation of 4kUHD live encoding for streaming over a wireless network at low bitrate indoors, using CUDA for parallel H264 encoding. Our experimental results are used to verify our claim.
High speed finite element simulations on the graphics card

Energy Technology Data Exchange (ETDEWEB)

Huthwaite, P.; Lowe, M. J. S. [Department of Mechanical Engineering, Imperial College, London, SW7 2AZ (United Kingdom)

2014-02-18

A software package is developed to perform explicit time domain finite element simulations of ultrasonic propagation on the graphical processing unit, using Nvidia’s CUDA. Of critical importance for this problem is the arrangement of nodes in memory, allowing data to be loaded efficiently and minimising communication between the independently executed blocks of threads. The initial stage of memory arrangement is partitioning the mesh; both a well established ‘greedy’ partitioner and a new, more efficient ‘aligned’ partitioner are investigated. A method is then developed to efficiently arrange the memory within each partition. The technique is compared to a commercial CPU equivalent, demonstrating an overall speedup of at least 100 for a non-destructive testing weld model.
Efficient molecular dynamics simulations with many-body potentials on graphics processing units

Science.gov (United States)

Fan, Zheyong; Chen, Wei; Vierimaa, Ville; Harju, Ari

2017-09-01

Graphics processing units have been extensively used to accelerate classical molecular dynamics simulations. However, there is much less progress on the acceleration of force evaluations for many-body potentials compared to pairwise ones. In the conventional force evaluation algorithm for many-body potentials, the force, virial stress, and heat current for a given atom are accumulated within different loops, which could result in write conflict between different threads in a CUDA kernel. In this work, we provide a new force evaluation algorithm, which is based on an explicit pairwise force expression for many-body potentials derived recently (Fan et al., 2015). In our algorithm, the force, virial stress, and heat current for a given atom can be accumulated within a single thread and is free of write conflicts. We discuss the formulations and algorithms and evaluate their performance. A new open-source code, GPUMD, is developed based on the proposed formulations. For the Tersoff many-body potential, the double precision performance of GPUMD using a Tesla K40 card is equivalent to that of the LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) molecular dynamics code running with about 100 CPU cores (Intel Xeon CPU X5670 @ 2.93 GHz).
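The write-conflict issue mentioned above can be illustrated with a toy gather-style loop in which the work for atom i touches only the output slot of atom i. The sketch uses a hypothetical 1D harmonic pair force rather than a many-body potential such as Tersoff, and it accumulates only the force (not virial stress or heat current), so it shows the access pattern and not the GPUMD algorithm itself.

```python
def forces_gather(positions, neighbors, k=1.0):
    """1D toy model: each atom sums the forces from its own neighbour list
    and performs a single write to its own entry of the output."""
    forces = [0.0] * len(positions)
    for i, x_i in enumerate(positions):        # on a GPU: one thread per atom i
        f_i = 0.0                              # thread-private accumulator
        for j in neighbors[i]:
            f_i += -k * (x_i - positions[j])   # assumed harmonic force on i from j
        forces[i] = f_i                        # only thread i writes forces[i]
    return forces


# Hypothetical three-atom chain with a full neighbour list.
positions = [0.0, 1.0, 3.0]
neighbors = [[1], [0, 2], [1]]
forces = forces_gather(positions, neighbors)   # [1.0, 1.0, -2.0]
```

A scatter-style loop would instead add each pair contribution to both `forces[i]` and `forces[j]`, which on a GPU requires atomic operations or another conflict-resolution scheme.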
Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit

International Nuclear Information System (INIS)

Badal, Andreu; Badano, Aldo

2009-01-01

Purpose: It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: The use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). Methods: A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA programming model (NVIDIA Corporation, Santa Clara, CA). Results: An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed up factor was obtained using a GPU compared to a single core CPU. Conclusions: The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.
Development of a Monte Carlo software to photon transportation in voxel structures using graphic processing units

International Nuclear Information System (INIS)

Bellezzo, Murillo

2014-01-01

As the most accurate method to estimate absorbed dose in radiotherapy, the Monte Carlo Method (MCM) has been widely used in radiotherapy treatment planning. Nevertheless, its efficiency can be improved for clinical routine applications. In this thesis, the CUBMC code is presented, a GPU-based MC photon transport algorithm for dose calculation under the Compute Unified Device Architecture (CUDA) platform. The simulation of physical events is based on the algorithm used in PENELOPE, and the cross section table used is the one generated by the MATERIAL routine, also present in the PENELOPE code. Photons are transported in voxel-based geometries with different compositions. There are two distinct approaches used for transport simulation. The first of them forces the photon to stop at every voxel frontier; the second one is the Woodcock method, where the photon ignores the existence of borders and travels in homogeneous fictitious media. The CUBMC code aims to be an alternative Monte Carlo simulator code that, by using the parallel processing capability of graphics processing units (GPUs), provides high performance simulations on low cost compact machines, and thus can be applied in clinical cases and incorporated in treatment planning systems for radiotherapy. (author)
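The Woodcock method mentioned above can be sketched in one dimension as follows. The photon always samples its free path with the largest attenuation coefficient in the geometry, and each tentative collision is then accepted as real with probability sigma(x)/sigma_max. The 1D ray, the attenuation functions, and all names are assumptions for illustration; the CUBMC code works on 3D voxel geometries with PENELOPE physics.

```python
import math
import random


def woodcock_collision_site(sigma_at, sigma_max, x_max, rng):
    """Return the position of the next real collision along a 1D ray, or
    None if the photon leaves the region [0, x_max) first."""
    x = 0.0
    while True:
        # Free path sampled in the fictitious homogeneous medium.
        x += -math.log(1.0 - rng.random()) / sigma_max
        if x >= x_max:
            return None
        # Real collision with probability sigma(x) / sigma_max; otherwise
        # the collision is virtual and the photon simply continues.
        if rng.random() < sigma_at(x) / sigma_max:
            return x


rng = random.Random(0)
# Vacuum everywhere: every tentative collision is virtual, so the photon escapes.
escaped = woodcock_collision_site(lambda x: 0.0, 1.0, 5.0, rng)
# Homogeneous medium with sigma equal to sigma_max: the first collision is real.
site = woodcock_collision_site(lambda x: 1.0, 1.0, 1e9, rng)
```

Because no voxel boundary crossings are computed, every photon follows the same short loop, which tends to suit GPU threads better than boundary-by-boundary tracking.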
Applying of the NVIDIA CUDA to the video processing in the task of the roundwood volume estimation

Directory of Open Access Journals (Sweden)

Kruglov Artem

2016-01-01

Full Text Available The paper is devoted to parallel computing. The algorithm for roundwood volume estimation had insufficient performance, so it was decided to port its bottleneck part to the GPU. Various GPGPU techniques were analyzed, and the NVIDIA CUDA technology was chosen for the implementation. The results of the research have shown the high potential of the GPU implementation for improving the performance of the computation. The speedup of the algorithm for roundwood volume estimation is more than 300% after porting to the GPU with the CUDA technology. This helps to apply the machine vision algorithm in a real-time system.
The gputools package enables GPU computing in R.

Science.gov (United States)

Buckner, Joshua; Wilson, Justin; Seligman, Mark; Athey, Brian; Watson, Stanley; Meng, Fan

2010-01-01

By default, the R statistical environment does not make use of parallelism. Researchers may resort to expensive solutions such as cluster hardware for large analysis tasks. Graphics processing units (GPUs) provide an inexpensive and computationally powerful alternative. Using R and the CUDA toolkit from Nvidia, we have implemented several functions commonly used in microarray gene expression analysis for GPU-equipped computers. R users can take advantage of the better performance provided by an Nvidia GPU. The package is available from CRAN, the R project's repository of packages, at http://cran.r-project.org/web/packages/gputools. More information about our gputools R package is available at http://brainarray.mbni.med.umich.edu/brainarray/Rgpgpu
Optimasi Rendering Game 2D Asteroids Menggunakan Pemrograman CUDA

Directory of Open Access Journals (Sweden)

Fathony Teguh Irawan

2017-12-01

There are many sources of entertainment, and video games are one of them. Public interest in video games is shown by the large number of video game users. Therefore, video game performance is considered important for expanding the market. One of many ways to improve performance is to use GPU processing. To show that GPU processing is faster than CPU processing for parallel workloads, the results of GPU processing and CPU processing are compared. This paper describes the differences in performance between a video game implemented using a GPU approach and one implemented using a CPU approach. Keywords: games, video game, game development, CPU, GPU, CUDA, optimization, analysis
A TBB-CUDA Implementation for Background Removal in a Video-Based Fire Detection System

Directory of Open Access Journals (Sweden)

Fan Wang

2014-01-01

Full Text Available This paper presents a parallel TBB-CUDA implementation for the acceleration of a single-Gaussian distribution model, which is effective for background removal in a video-based fire detection system. In this framework, TBB mainly deals with the initialization of the estimated Gaussian model running on the CPU, and CUDA performs background removal and adaptation of the model running on the GPU. This implementation can exploit the combined computation power of TBB-CUDA, which can be applied to the real-time environment. Over 220 video sequences are utilized in the experiments. The experimental results illustrate that TBB+CUDA can achieve a higher speedup than both TBB and CUDA. The proposed framework can effectively overcome the disadvantages of limited memory bandwidth and few execution units of CPU, and it reduces data transfer latency and memory latency between CPU and GPU.
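A single-Gaussian background model keeps a running mean and variance per pixel and flags a pixel as foreground when its value lies too far from the mean. The sketch below shows one common form of that per-pixel update; the learning rate `alpha`, the threshold `k`, and the choice to update the model only for background pixels are assumptions, since the abstract does not give the paper's exact update rule.

```python
def update_pixel(mean, var, value, alpha=0.05, k=2.5):
    """Classify one pixel against its Gaussian background model and adapt
    the model. Returns (new_mean, new_var, is_foreground)."""
    diff = value - mean
    # Foreground if the value is more than k standard deviations from the mean.
    is_foreground = diff * diff > (k * k) * var
    if not is_foreground:
        # Exponential running update of the background statistics.
        mean = mean + alpha * diff
        var = (1.0 - alpha) * var + alpha * diff * diff
    return mean, var, is_foreground


# Small change: treated as background and absorbed into the model.
bg_mean, bg_var, bg_flag = update_pixel(100.0, 4.0, 101.0)
# Large change: flagged as foreground, model left unchanged.
fg_mean, fg_var, fg_flag = update_pixel(100.0, 4.0, 120.0)
```

Every pixel is processed independently, so a CUDA version would typically assign one thread per pixel for both classification and model adaptation.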
Graphical Language for Data Processing

Science.gov (United States)

Alphonso, Keith

2011-01-01

A graphical language for processing data allows processing elements to be connected with virtual wires that represent data flows between processing modules. The processing of complex data, such as lidar data, requires many different algorithms to be applied. The purpose of this innovation is to automate the processing of complex data, such as LIDAR, without the need for complex scripting and programming languages. The system consists of a set of user-interface components that allow the user to drag and drop various algorithmic and processing components onto a process graph. By working graphically, the user can completely visualize the process flow and create complex diagrams. This innovation supports the nesting of graphs, such that a graph can be included in another graph as a single step for processing. In addition to the user interface components, the system includes a set of .NET classes that represent the graph internally. These classes provide the internal system representation of the graphical user interface. The system includes a graph execution component that reads the internal representation of the graph (as described above) and executes that graph. The execution of the graph follows the interpreted model of execution in that each node is traversed and executed from the original internal representation. In addition, there are components that allow external code elements, such as algorithms, to be easily integrated into the system, thus making the system infinitely expandable.
Introduction to assembly of finite element methods on graphics processors

International Nuclear Information System (INIS)

Cecka, Cristopher; Lew, Adrian; Darve, Eric

2010-01-01

Recently, graphics processing units (GPUs) have had great success in accelerating numerical computations. We present their application to computations on unstructured meshes such as those in finite element methods. Multiple approaches in assembling and solving sparse linear systems with NVIDIA GPUs and the Compute Unified Device Architecture (CUDA) are presented and discussed. Multiple strategies for efficient use of global, shared, and local memory, methods to achieve memory coalescing, and optimal choice of parameters are introduced. We find that with appropriate preprocessing and arrangement of support data, the GPU coprocessor achieves speedups of 30x or more in comparison to a well optimized serial implementation on the CPU. We also find that the optimal assembly strategy depends on the order of polynomials used in the finite-element discretization.
Graphic Design in Libraries: A Conceptual Process

Science.gov (United States)

Ruiz, Miguel

2014-01-01

Providing successful library services requires efficient and effective communication with users; therefore, it is important that content creators who develop visual materials understand key components of design and, specifically, develop a holistic graphic design process. Graphic design, as a form of visual communication, is the process of…
A Visual Approach to Investigating Shared and Global Memory Behavior of CUDA Kernels

KAUST Repository

Rosen, Paul

2013-01-01

We present an approach to investigate the memory behavior of a parallel kernel executing on thousands of threads simultaneously within the CUDA architecture. Our top-down approach allows for quickly identifying any significant differences between the execution of the many blocks and warps. As interesting warps are identified, we allow further investigation of memory behavior by visualizing the shared memory bank conflicts and global memory coalescence, first with an overview of a single warp with many operations and, subsequently, with a detailed view of a single warp and a single operation. We demonstrate the strength of our approach in the context of a parallel matrix transpose kernel and a parallel 1D Haar Wavelet transform kernel. © 2013 The Author(s) Computer Graphics Forum © 2013 The Eurographics Association and Blackwell Publishing Ltd.
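The shared memory bank conflicts that the visualization targets can be reproduced with simple index arithmetic. The sketch below assumes 32 banks of 4-byte words and a warp in which thread r reads element [r][column] of a row-major tile, as happens in the column-wise phase of a matrix transpose; the function name and tile sizes are hypothetical.

```python
from collections import Counter

NUM_BANKS = 32  # assumption: 32 shared memory banks, one 4-byte word wide


def column_read_conflict_degree(tile_rows, row_pitch, column=0):
    """Largest number of threads that hit the same bank when thread r
    reads tile[r][column] from a row-major tile with the given row pitch."""
    banks = Counter(
        (r * row_pitch + column) % NUM_BANKS for r in range(tile_rows)
    )
    return max(banks.values())


print(column_read_conflict_degree(32, 32))  # prints 32: every thread hits one bank
print(column_read_conflict_degree(32, 33))  # prints 1: padding removes the conflict
```

Padding each row by one word is the usual remedy in transpose kernels, and it is the kind of before-and-after difference a per-warp bank conflict view would make visible.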
Finite Temperature Lattice QCD with GPUs

International Nuclear Information System (INIS)

Cardoso, N.; Cardoso, M.; Bicudo, P.

2011-01-01

Graphics Processing Units (GPUs) are being used in many areas of physics, since the performance versus cost is very attractive. The GPUs can be addressed by CUDA, NVIDIA's parallel computing architecture. It enables dramatic increases in computing performance by harnessing the power of the GPU. We present a performance comparison between the GPU and CPU with single precision and double precision in generating lattice SU(2) configurations. Analyses with single and multiple GPUs, using CUDA and OPENMP, are also presented. We also present SU(2) results for the renormalized Polyakov loop, colour averaged free energy and the string tension as a function of the temperature. (authors)
[Research on fast implementation method of image Gaussian RBF interpolation based on CUDA].

Science.gov (United States)

Chen, Hao; Yu, Haizhong

2014-04-01

Image interpolation is often required during medical image processing and analysis. Although the interpolation method based on the Gaussian radial basis function (GRBF) has high precision, its long calculation time still limits its application in the field of image interpolation. To overcome this problem, a method of two-dimensional and three-dimensional medical image GRBF interpolation based on compute unified device architecture (CUDA) is proposed in this paper. According to the single instruction multiple threads (SIMT) execution model of CUDA, various optimizing measures such as coalesced access and shared memory are adopted in this study. To eliminate the edge distortion of image interpolation, a natural suture algorithm is utilized in overlapping regions while adopting the data space strategy of separating 2D images into blocks or dividing 3D images into sub-volumes. While keeping a high interpolation precision, the 2D and 3D medical image GRBF interpolation achieved great acceleration in each basic computing step. The experiments showed that the efficiency of image GRBF interpolation on the CUDA platform was clearly improved compared with CPU calculation. The present method is of considerable reference value in the application field of image interpolation.
Assembly of finite element methods on graphics processors

KAUST Repository

Cecka, Cris

2010-08-23

Recently, graphics processing units (GPUs) have had great success in accelerating many numerical computations. We present their application to computations on unstructured meshes such as those in finite element methods. Multiple approaches in assembling and solving sparse linear systems with NVIDIA GPUs and the Compute Unified Device Architecture (CUDA) are created and analyzed. Multiple strategies for efficient use of global, shared, and local memory, methods to achieve memory coalescing, and optimal choice of parameters are introduced. We find that with appropriate preprocessing and arrangement of support data, the GPU coprocessor using single-precision arithmetic achieves speedups of 30 or more in comparison to a well optimized double-precision single core implementation. We also find that the optimal assembly strategy depends on the order of polynomials used in the finite element discretization. © 2010 John Wiley & Sons, Ltd.
ACL2 Meets the GPU: Formalizing a CUDA-based Parallelizable All-Pairs Shortest Path Algorithm in ACL2

Directory of Open Access Journals (Sweden)

David S. Hardin

2013-04-01

Full Text Available As Graphics Processing Units (GPUs) have gained in capability and GPU development environments have matured, developers are increasingly turning to the GPU to off-load the main host CPU of numerically-intensive, parallelizable computations. Modern GPUs feature hundreds of cores, and offer programming niceties such as double-precision floating point, and even limited recursion. This shift from CPU to GPU, however, raises the question: how do we know that these new GPU-based algorithms are correct? In order to explore this new verification frontier, we formalized a parallelizable all-pairs shortest path (APSP) algorithm for weighted graphs, originally coded in NVIDIA's CUDA language, in ACL2. The ACL2 specification is written using a single-threaded object (stobj) and tail recursion, as the stobj/tail recursion combination yields the most straightforward translation from imperative programming languages, as well as efficient, scalable executable specifications within ACL2 itself. The ACL2 version of the APSP algorithm can process millions of vertices and edges with little to no garbage generation, and executes at one-sixth the speed of a host-based version of APSP coded in C, a very respectable result for a theorem prover. In addition to formalizing the APSP algorithm (which uses Dijkstra's shortest path algorithm at its core), we have also provided capability that the original APSP code lacked, namely shortest path recovery. Path recovery is accomplished using a secondary ACL2 stobj implementing a LIFO stack, which is proven correct. To conclude the experiment, we ported the ACL2 version of the APSP kernels back to C, resulting in a less than 5% slowdown, and also performed a partial back-port to CUDA, which, surprisingly, yielded a slight performance increase.
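The structure described in this abstract, an all-pairs computation built from Dijkstra's algorithm with path recovery through a LIFO stack, can be sketched in a few lines. This is a sequential illustration only, not the ACL2 specification or the CUDA kernels. The adjacency-list layout and all function names are assumptions, and every vertex is assumed to appear as a key of the graph dictionary.

```python
import heapq


def dijkstra(graph, source):
    """Single-source shortest paths; graph maps vertex -> [(neighbor, weight)]."""
    dist = {v: float("inf") for v in graph}
    pred = {v: None for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, pred


def all_pairs(graph):
    """Each source is independent of the others, which is what makes
    the all-pairs problem parallelizable."""
    return {s: dijkstra(graph, s) for s in graph}


def recover_path(pred, source, target):
    """Walk predecessors onto a LIFO stack, then pop to get source -> target."""
    stack = []
    node = target
    while node is not None:
        stack.append(node)
        node = pred[node]
    if stack[-1] != source:
        return []  # target is unreachable from source
    path = []
    while stack:
        path.append(stack.pop())
    return path


graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": [("a", 1)]}
result = all_pairs(graph)
dist_a, pred_a = result["a"]
print(dist_a["c"], recover_path(pred_a, "a", "c"))
```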
IMPROVING THE PERFORMANCE OF THE LINEAR SYSTEMS SOLVERS USING CUDA

Directory of Open Access Journals (Sweden)

BOGDAN OANCEA

2012-05-01

Full Text Available Parallel computing can offer an enormous advantage regarding the performance for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core processors that can obtain very high FLOP rates. Since the first idea of using GPU for general purpose computing, things have evolved and now there are several approaches to GPU programming: CUDA from NVIDIA and Stream from AMD. CUDA is now a popular programming model for general purpose computations on GPU for C/C++ programmers. A great number of applications were ported to the CUDA programming model and they obtain speedups of orders of magnitude compared to optimized CPU implementations. In this paper we present an implementation of a library for solving linear systems using the CUDA framework. We present the results of performance tests and show that using a GPU one can obtain speedups of approximately 80 times compared with a CPU implementation.
Efficient CUDA Polynomial Preconditioned Conjugate Gradient Solver for Finite Element Computation of Elasticity Problems

Directory of Open Access Journals (Sweden)

Jianfei Zhang

2013-01-01

Full Text Available Graphics processing unit (GPU) has obtained great success in scientific computations for its tremendous computational horsepower and very high memory bandwidth. This paper discusses the efficient way to implement polynomial preconditioned conjugate gradient solver for the finite element computation of elasticity on NVIDIA GPUs using compute unified device architecture (CUDA). Sliced block ELLPACK (SBELL) format is introduced to store sparse matrix arising from finite element discretization of elasticity with fewer padding zeros than traditional ELLPACK-based formats. Polynomial preconditioning methods have been investigated both in convergence and running time. From the overall performance, the least-squares (L-S) polynomial method is chosen as a preconditioner in PCG solver to finite element equations derived from elasticity for its best results on different example meshes. In the PCG solver, a mixed precision algorithm is used not only to reduce the overall computational cost, storage requirements and bandwidth but also to make full use of the capacity of the GPU devices. With SBELL format and mixed precision algorithm, the GPU-based L-S preconditioned CG can get a speedup of about 7–9 to CPU-implementation.
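The padding argument in this abstract can be made concrete with a small count. The sketch below compares plain ELLPACK, where every row is padded to the longest row of the whole matrix, with a sliced variant that pads each group of rows only to its own longest row. It is a simplified stand-in, not the SBELL format of the paper (the block structure is omitted), and the function names, slice height, and example row lengths are assumptions.

```python
def ell_padding(row_lengths):
    """Padding zeros when every row is padded to the longest row."""
    width = max(row_lengths)
    return sum(width - n for n in row_lengths)


def sliced_ell_padding(row_lengths, slice_height):
    """Padding zeros when rows are grouped into slices and each slice
    is padded only to its own longest row."""
    total = 0
    for start in range(0, len(row_lengths), slice_height):
        total += ell_padding(row_lengths[start:start + slice_height])
    return total


def to_ell(rows, width):
    """Pack rows given as lists of (column, value) pairs into ELL arrays.
    Column index -1 marks a padding slot."""
    cols = [[c for c, _ in r] + [-1] * (width - len(r)) for r in rows]
    vals = [[v for _, v in r] + [0.0] * (width - len(r)) for r in rows]
    return cols, vals


def ell_spmv(cols, vals, x):
    """Sparse matrix-vector product y = A x over the ELL arrays."""
    return [sum(v * x[c] for c, v in zip(cr, vr) if c >= 0)
            for cr, vr in zip(cols, vals)]


# Hypothetical nonzero counts per row of a small matrix.
lengths = [2, 1, 2, 1, 6, 5, 6, 6]
print(ell_padding(lengths), sliced_ell_padding(lengths, 4))

cols, vals = to_ell([[(0, 2.0), (1, 1.0)], [(1, 3.0)]], width=2)
y = ell_spmv(cols, vals, [1.0, 2.0])
```

On a GPU each row (or each slice) would typically be handled by its own thread or thread group, so fewer padding slots mean fewer wasted loads and multiplications.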
Flocking-based Document Clustering on the Graphics Processing Unit

Energy Technology Data Exchange (ETDEWEB)

Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL; Patton, Robert M [ORNL; ST Charles, Jesse Lee [ORNL

2008-01-01

Abstract: Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. Each bird represents a single document and flies toward other documents that are similar to it. One limitation of this method of document clustering is its complexity O(n²). As the number of documents grows, it becomes increasingly difficult to receive results in a reasonable amount of time. However, flocking behavior, along with most naturally inspired algorithms such as ant colony optimization and particle swarm optimization, are highly parallel and have found increased performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this new platform. The cost of these high-performance devices is also marginal when compared with the price of cluster machines. In this paper, we have conducted research to exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefit the GPU brings to all naturally inspired algorithms. Using the CUDA platform from NVIDIA®, we developed a document flocking implementation to be run on the NVIDIA® GEFORCE 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to be run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3000 documents. The results of these tests were very significant. Performance gains ranged from three to nearly five times improvement of the GPU over the CPU implementation. This dramatic improvement in runtime makes the GPU a potentially revolutionary platform for document clustering algorithms.
Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

Science.gov (United States)

Qin, Cheng-Zhi; Zhan, Lijun

2012-06-01

As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of computing redundancy. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU
Graphics Processing Unit Accelerated Hirsch-Fye Quantum Monte Carlo

Science.gov (United States)

Moore, Conrad; Abu Asal, Sameer; Rajagoplan, Kaushik; Poliakoff, David; Caprino, Joseph; Tomko, Karen; Thakur, Bhupender; Yang, Shuxiang; Moreno, Juana; Jarrell, Mark

2012-02-01

In Dynamical Mean Field Theory and its cluster extensions, such as the Dynamic Cluster Algorithm, the bottleneck of the algorithm is solving the self-consistency equations with an impurity solver. Hirsch-Fye Quantum Monte Carlo is one of the most commonly used impurity and cluster solvers. This work implements optimizations of the algorithm, such as enabling large data re-use, suitable for the Graphics Processing Unit (GPU) architecture. The GPU's sheer number of concurrent parallel computations and large bandwidth to many shared memories take advantage of the inherent parallelism in the Green function update and measurement routines, and can substantially improve the efficiency of the Hirsch-Fye impurity solver.
A Large Scale, High Resolution Agent-Based Insurgency Model

Science.gov (United States)

2013-09-30

CUDA) is NVIDIA Corporation’s software development model for General Purpose Programming on Graphics Processing Units (GPGPU) ( NVIDIA Corporation ...Conference. Argonne National Laboratory, Argonne, IL, October, 2005. NVIDIA Corporation . NVIDIA CUDA Programming Guide 2.0 [Online]. NVIDIA Corporation

GPU-Based FFT Computation for Multi-Gigabit WirelessHD Baseband Processing

Directory of Open Access Journals (Sweden)

Nicholas Hinitt

2010-01-01

Full Text Available The next generation Graphics Processing Units (GPUs) are being considered for non-graphics applications. Millimeter wave (60 GHz) wireless networks that are capable of multi-gigabit per second (Gbps) transfer rates require a significant baseband throughput. In this work, we consider the baseband of WirelessHD, a 60 GHz communications system, which can provide a data rate of up to 3.8 Gbps over a short range wireless link. Thus, we explore the feasibility of achieving gigabit baseband throughput using the GPUs. One of the most computationally intensive functions commonly used in baseband communications, the Fast Fourier Transform (FFT) algorithm, is implemented on an NVIDIA GPU using their general-purpose computing platform called the Compute Unified Device Architecture (CUDA). The paper, first, investigates the implementation of an FFT algorithm using the GPU hardware and exploiting the computational capability available. It then outlines the limitations discovered and the methods used to overcome these challenges. Finally a new algorithm to compute FFT is proposed, which reduces interprocessor communication. It is further optimized by improving memory access, enabling the processing rate to exceed 4 Gbps, achieving a processing time of a 512-point FFT in less than 200 ns using a two-GPU solution.
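For reference, the textbook radix-2 decimation-in-time FFT that work of this kind starts from can be sketched as follows. This is the standard recursive algorithm, not the reduced-communication algorithm proposed in the paper, and the function names are assumptions. A direct DFT is included only to check the result.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # transform of even-indexed samples
    odd = fft(x[1::2])    # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft(x):
    """Direct O(n^2) discrete Fourier transform, used as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


impulse = fft([1, 0, 0, 0])   # a unit impulse has a flat spectrum
print(impulse)
```

The butterflies within one stage are independent of each other, which is the parallelism a GPU implementation exploits; the data exchanged between stages is the communication such implementations try to reduce.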
Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

Science.gov (United States)

Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

2018-02-01

The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to accumulation of information about a huge amount of DNA sequences. There are a lot of web services which are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. An enormous motif diversity makes finding them challenging. To avoid these difficulties, researchers use different stochastic approaches. Unfortunately, the efficiency of the motif discovery programs dramatically declines as the query set size increases. As a result, only a fraction of the top "peak" ChIP-Seq segments can be analyzed, or the area of analysis must be narrowed. Thus, motif discovery in massive datasets remains a challenging issue. The Argo_CUDA (Compute Unified Device Architecture) web service is designed to process massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in the 15-letter IUPAC code. Argo_CUDA is a full-exhaustive approach based on high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed the motifs which correspond to known transcription factor binding sites.
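A degenerate motif in the 15-letter IUPAC code is a string in which each letter stands for a set of allowed bases. The sketch below shows the matching step only, a plain sliding-window scan; it is not the Argo_CUDA search or scoring procedure, and the function names and the example sequence are assumptions.

```python
# The 15-letter IUPAC nucleotide code: each symbol maps to its allowed bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(motif, window):
    """True if every base of the window is allowed by the motif letter."""
    return len(motif) == len(window) and all(
        base in IUPAC[letter] for letter, base in zip(motif, window))


def find_sites(motif, sequence):
    """Start positions (0-based) where the degenerate motif matches."""
    k = len(motif)
    return [i for i in range(len(sequence) - k + 1)
            if matches(motif, sequence[i:i + k])]


def count_sequences_with_motif(motif, sequences):
    """Number of sequences containing at least one occurrence of the motif."""
    return sum(1 for s in sequences if find_sites(motif, s))


sites = find_sites("TATAWR", "GGTATAAAGCTATATGC")
print(sites)
```

An exhaustive search evaluates a very large number of candidate motifs of a fixed length this way, and each candidate can be scored independently of the others, which is why the problem suits a GPU.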
Accelerating atomistic calculations of quantum energy eigenstates on graphic cards

Science.gov (United States)

Rodrigues, Walter; Pecchia, A.; Lopez, M.; Auf der Maur, M.; Di Carlo, A.

2014-10-01

Electronic properties of nanoscale materials require the calculation of eigenvalues and eigenvectors of large matrices. This bottleneck can be overcome by parallel computing techniques or the introduction of faster algorithms. In this paper we report a custom implementation of the Lanczos algorithm with simple restart, optimized for graphical processing units (GPUs). The whole algorithm has been developed using CUDA and runs entirely on the GPU, with a specialized implementation that spares memory and minimizes machine-to-device data transfers. Furthermore, parallel distribution over several GPUs has been attained using the standard message passing interface (MPI). Benchmark calculations performed on a GaN/AlGaN wurtzite quantum dot with up to 600,000 atoms are presented. The empirical tight-binding (ETB) model with an sp3d5s∗+spin-orbit parametrization has been used to build the system Hamiltonian (H).
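The core Lanczos recurrence reduces a symmetric matrix to a small tridiagonal matrix whose eigenvalues approximate those of the original. The sketch below shows that recurrence on a dense matrix in pure Python. It omits the restart strategy, the sparse tight-binding Hamiltonian, and reorthogonalization, so it is an illustration under those simplifying assumptions rather than the implementation reported here; all names are hypothetical.

```python
import math


def matvec(a, v):
    """Dense matrix-vector product; the dominant cost of each iteration."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def lanczos(a, v0, steps, tol=1e-12):
    """Return the diagonal (alphas) and off-diagonal (betas) entries of the
    tridiagonal matrix built from symmetric matrix a and start vector v0."""
    norm = math.sqrt(dot(v0, v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v)
    beta = 0.0
    alphas, betas = [], []
    for step in range(steps):
        w = matvec(a, v)
        alpha = dot(w, v)
        alphas.append(alpha)
        # Three-term recurrence: remove components along v and v_prev.
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        beta = math.sqrt(dot(w, w))
        if step == steps - 1 or beta < tol:
            break
        betas.append(beta)
        v_prev, v = v, [wi / beta for wi in w]
    return alphas, betas


a = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
alphas, betas = lanczos(a, [1.0, 1.0, 1.0], steps=3)
print(alphas, betas)
```

Only three vectors are live at any time, which is consistent with the abstract's emphasis on sparing memory; the matrix-vector product is the part that a GPU implementation would parallelize.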
Accelerating Calculations of Reaction Dissipative Particle Dynamics in LAMMPS

Science.gov (United States)

2017-05-17

HPC) resources and exploit emerging, heterogeneous architectures (e.g., co-processors and graphics processing units [GPUs]), while enabling EM...2 ODE solvers—CVODE* and RKF45—which we previously developed for NVIDIA Compute Unified Device Architecture (CUDA) GPUs.9 The CPU versions of both...nodes. Half of the accelerator nodes (178) have 2 NVIDIA Kepler K40m GPUs and the remaining 178 accelerator nodes have 2 Intel Xeon Phi 7120P co
CUDA-Sankoff

DEFF Research Database (Denmark)

Sundfeld, Daniel; Havgaard, Jakob H.; Gorodkin, Jan

2017-01-01

In this paper, we propose and evaluate CUDASankoff, a solution to the RNA structural alignment problem based on the Sankoff algorithm in Graphics Processing Units (GPUs). To our knowledge, this is the first time the Sankoff algorithm is implemented in GPU. In our solution, we show how to lineariz...... to 24 times faster than a 16-core CPU solution in the 281 nucleotide Sankoff execution....
IMPLEMENTATION OF OBJECT TRACKING ALGORITHMS ON THE BASIS OF CUDA TECHNOLOGY

Directory of Open Access Journals (Sweden)

B. A. Zalesky

2014-01-01

Full Text Available A fast version of a correlation algorithm to track objects in video sequences made by a nonstabilized camcorder is presented. The algorithm is based on comparison of local correlations of the object image and regions of video frames. The algorithm is implemented using CUDA technology. Application of CUDA made real-time execution of the algorithm attainable. To improve its precision and stability, a robust version of the Kalman filter has been incorporated into the flowchart. Tests showed applicability of the algorithm to practical object tracking.
Iterative Methods for MPC on Graphical Processing Units

DEFF Research Database (Denmark)

Gade-Nielsen, Nicolai Fog; Jørgensen, John Bagterp; Dammann, Bernd

2012-01-01

The high floating point performance and memory bandwidth of Graphical Processing Units (GPUs) make them ideal for a large number of computations which often arise in scientific computing, such as matrix operations. GPUs achieve this performance by utilizing massive parallelism, which requires ree...... as to avoid the use of dense matrices, which may be too large for the limited memory capacity of current graphics cards.
A Relational Reasoning Approach to Text-Graphic Processing

Science.gov (United States)

Danielson, Robert W.; Sinatra, Gale M.

2017-01-01

We propose that research on text-graphic processing could be strengthened by the inclusion of relational reasoning perspectives. We briefly outline four aspects of relational reasoning: "analogies," "anomalies," "antinomies", and "antitheses". Next, we illustrate how text-graphic researchers have been…
Identification of Learning Processes by Means of Computer Graphics.

Science.gov (United States)

Sorensen, Birgitte Holm

1993-01-01

Describes a development project for the use of computer graphics and video in connection with an inservice training course for primary education teachers in Denmark. Topics addressed include research approaches to computers; computer graphics in learning processes; activities relating to computer graphics; the role of the teacher; and student…
Projector and backprojector for iterative CT reconstruction with blobs using CUDA

Energy Technology Data Exchange (ETDEWEB)

Bippus, Rolf-Dieter; Koehler, Thomas; Bergner, Frank; Brendel, Bernhard; Proksa, Roland [Philips Research Laboratories, Hamburg (Germany); Hansis, Eberhard [Philips Healthcare, Nuclear Medicine, San Jose, CA (United States)

2011-07-01

Using blobs allows modeling the CT system's geometry more correctly within an iterative reconstruction framework. However their application comes with an increased computational demand. This led us to use blobs for image representation and a dedicated GPU hardware implementation to counteract their computational demand. Making extensive use of the texture interpolation capabilities of CUDA and implementing an asymmetric projector/backprojector pair we achieve reasonable processing times and good system modeling at the same time. (orig.)
A comparative study of history-based versus vectorized Monte Carlo methods in the GPU/CUDA environment for a simple neutron eigenvalue problem

International Nuclear Information System (INIS)

Liu, T.; Du, X.; Ji, W.; Xu, G.; Brown, F.B.

2013-01-01

For nuclear reactor analysis such as the neutron eigenvalue calculations, the time consuming Monte Carlo (MC) simulations can be accelerated by using graphics processing units (GPUs). However, traditional MC methods are often history-based, and their performance on GPUs is affected significantly by the thread divergence problem. In this paper we describe the development of a newly designed event-based vectorized MC algorithm for solving the neutron eigenvalue problem. The code was implemented using NVIDIA's Compute Unified Device Architecture (CUDA), and tested on a NVIDIA Tesla M2090 GPU card. We found that although the vectorized MC algorithm greatly reduces the occurrence of thread divergence thus enhancing the warp execution efficiency, the overall simulation speed is roughly ten times slower than the history-based MC code on GPUs. Profiling results suggest that the slow speed is probably due to the memory access latency caused by the large amount of global memory transactions. Possible solutions to improve the code efficiency are discussed. (authors)
A comparative study of history-based versus vectorized Monte Carlo methods in the GPU/CUDA environment for a simple neutron eigenvalue problem

Science.gov (United States)

Liu, Tianyu; Du, Xining; Ji, Wei; Xu, X. George; Brown, Forrest B.

2014-06-01

For nuclear reactor analysis such as the neutron eigenvalue calculations, the time consuming Monte Carlo (MC) simulations can be accelerated by using graphics processing units (GPUs). However, traditional MC methods are often history-based, and their performance on GPUs is affected significantly by the thread divergence problem. In this paper we describe the development of a newly designed event-based vectorized MC algorithm for solving the neutron eigenvalue problem. The code was implemented using NVIDIA's Compute Unified Device Architecture (CUDA), and tested on a NVIDIA Tesla M2090 GPU card. We found that although the vectorized MC algorithm greatly reduces the occurrence of thread divergence thus enhancing the warp execution efficiency, the overall simulation speed is roughly ten times slower than the history-based MC code on GPUs. Profiling results suggest that the slow speed is probably due to the memory access latency caused by the large amount of global memory transactions. Possible solutions to improve the code efficiency are discussed.
CUDT: A CUDA Based Decision Tree Algorithm

Directory of Open Access Journals (Sweden)

Win-Tsung Lo

2014-01-01

Full Text Available The decision tree is one of the best-known classification methods in data mining. Many studies have focused on improving the performance of decision trees. However, those algorithms are developed and run on traditional distributed systems. The latency of processing the huge data generated by ubiquitous sensing nodes cannot be improved without the help of new technology. In order to improve data processing latency in huge data mining, in this paper, we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), which is a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We have conducted many experiments to evaluate the system performance of CUDT and made a comparison with the traditional CPU version. The results show that CUDT is 5 to 55 times faster than Weka-j48 and achieves an 18-fold speedup over SPRINT for large data sets.
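The computation that dominates decision tree construction is the evaluation of candidate split points for each attribute. The sketch below scores every threshold of one numeric attribute. The abstract does not say which split criterion CUDT uses, so Gini impurity is an assumption made here for illustration, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (assumed criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'.
    The threshold is None when no split improves on the unsplit node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


threshold, score = best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])
print(threshold, score)
```

Each candidate threshold, and each attribute, can be scored independently, which matches the division of labor the abstract describes: the CPU drives the tree-building loop while the GPU performs the bulk computation.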
FLOCKING-BASED DOCUMENT CLUSTERING ON THE GRAPHICS PROCESSING UNIT [Book Chapter]

Energy Technology Data Exchange (ETDEWEB)

Charles, J S; Patton, R M; Potok, T E; Cui, X

2008-01-01

Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. Each bird represents a single document and flies toward other documents that are similar to it. One limitation of this method of document clustering is its complexity O(n²). As the number of documents grows, it becomes increasingly difficult to receive results in a reasonable amount of time. However, flocking behavior, along with most naturally inspired algorithms such as ant colony optimization and particle swarm optimization, are highly parallel and have experienced improved performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this new platform. The cost of these high-performance devices is also marginal when compared with the price of cluster machines. In this paper, we have conducted research to exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefit the GPU brings to all naturally inspired algorithms. Using the CUDA platform from NVIDIA®, we developed a document flocking implementation to be run on the NVIDIA® GEFORCE 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to be run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3,000 documents. The results of these tests were very significant. Performance gains ranged from three to nearly five times improvement of the GPU over the CPU implementation. This dramatic improvement in runtime makes the GPU a potentially revolutionary platform for document clustering algorithms.
DiVinE-CUDA - A Tool for GPU Accelerated LTL Model Checking

Directory of Open Access Journals (Sweden)

Jiří Barnat

2009-12-01

Full Text Available In this paper we present a tool that performs CUDA accelerated LTL Model Checking. The tool exploits the parallel algorithm MAP, adjusted to the NVIDIA CUDA architecture, in order to efficiently detect the presence of accepting cycles in a directed graph. Accepting cycle detection is the core algorithmic procedure in automata-based LTL Model Checking. We demonstrate that the tool outperforms the non-accelerated version of the algorithm, and we discuss where the limits of the tool are and what we intend to do in the future to avoid them.
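The problem statement itself is simple to write down: an accepting cycle is a cycle in the directed graph that passes through at least one accepting state. The sketch below is a naive sequential baseline that checks whether any accepting state can reach itself; it is deliberately not the MAP algorithm used by the tool. The graph layout and function names are assumptions, and the graph is assumed to contain only states reachable from the initial state.

```python
from collections import deque


def reaches_itself(graph, start):
    """Breadth-first search from the successors of start;
    True if start is encountered again, i.e. start lies on a cycle."""
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        node = queue.popleft()
        if node == start:
            return True
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return False


def has_accepting_cycle(graph, accepting):
    """True if some accepting state lies on a cycle of the graph."""
    return any(reaches_itself(graph, s) for s in accepting)


# Hypothetical state graph: 1 <-> 2 form a cycle, 3 has a self-loop.
graph = {0: [1], 1: [2], 2: [1], 3: [3]}
print(has_accepting_cycle(graph, {2}), has_accepting_cycle(graph, {0}))
```

This baseline runs one search per accepting state, which is quadratic in the worst case; algorithms designed for parallel hardware, such as the one the abstract names, restructure the problem to avoid that depth-first or per-state dependence.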
Software Graphics Processing Unit (sGPU) for Deep Space Applications

Science.gov (United States)

McCabe, Mary; Salazar, George; Steele, Glen

2015-01-01

A graphics processing capability will be required for deep space missions and must include a range of applications, from safety-critical vehicle health status to telemedicine for crew health. However, preliminary radiation testing of commercial graphics processing cards suggests they cannot operate in the deep space radiation environment. Investigation into a Software Graphics Processing Unit (sGPU) comprised of commercial-equivalent radiation hardened/tolerant single board computers, field programmable gate arrays, and safety-critical display software shows promising results. Preliminary performance of approximately 30 frames per second (FPS) has been achieved. Use of multi-core processors may provide a significant increase in performance.
Implementing the lattice Boltzmann model on commodity graphics hardware

International Nuclear Information System (INIS)

Kaufman, Arie; Fan, Zhe; Petkov, Kaloian

2009-01-01

Modern graphics processing units (GPUs) can perform general-purpose computations in addition to the native specialized graphics operations. Due to the highly parallel nature of graphics processing, the GPU has evolved into a many-core coprocessor that supports high data parallelism. Its performance has been growing at a rate of squared Moore's law, and its peak floating point performance exceeds that of the CPU by an order of magnitude. Therefore, it is a viable platform for time-sensitive and computationally intensive applications. The lattice Boltzmann model (LBM) computations are carried out via linear operations at discrete lattice sites, which can be implemented efficiently using a GPU-based architecture. Our simulations produce results comparable to the CPU version while improving performance by an order of magnitude. We have demonstrated that the GPU is well suited for interactive simulations in many applications, including simulating fire, smoke, lightweight objects in wind, jellyfish swimming in water, and heat shimmering and mirage (using the hybrid thermal LBM). We further advocate the use of a GPU cluster for large scale LBM simulations and for high performance computing. The Stony Brook Visual Computing Cluster has been the platform for several applications, including simulations of real-time plume dispersion in complex urban environments and thermal fluid dynamics in a pressurized water reactor. Major GPU vendors have been targeting the high performance computing market with GPU hardware implementations. Software toolkits such as NVIDIA CUDA provide a convenient development platform that abstracts the GPU and allows access to its underlying stream computing architecture. However, software programming for a GPU cluster remains a challenging task. We have therefore developed the Zippy framework to simplify GPU cluster programming. Zippy is based on global arrays combined with the stream programming model and it hides the low-level details of the
A graphics-card implementation of Monte-Carlo simulations for cosmic-ray transport

Science.gov (United States)

Tautz, R. C.

2016-05-01

A graphics card implementation of a test-particle simulation code is presented that is based on the CUDA extension of the C/C++ programming language. The original CPU version has been developed for the calculation of cosmic-ray diffusion coefficients in artificial Kolmogorov-type turbulence. In the new implementation, the magnetic turbulence generation, which is the most time-consuming part, is separated from the particle transport and is performed on a graphics card. In this article, the modification of the basic approach of integrating test particle trajectories to employ the SIMD (single instruction, multiple data) model is presented and verified. The efficiency of the new code is tested and several language-specific accelerating factors are discussed. For the example of isotropic magnetostatic turbulence, sample results are shown and a comparison to the results of the CPU implementation is performed.
Animated GIFs as vernacular graphic design

DEFF Research Database (Denmark)

Gürsimsek, Ödül Akyapi

2016-01-01

Online television audiences create a variety of digital content on the internet. Fans of television production design produce and share such content to express themselves and engage with the objects of their interest. These digital expressions, which exist in the form of graphics, text, videos and often a mix of some of these modes, seem to enable participatory conversations by the audience communities that continue over a period of time. One example of such multimodal digital content is the graphic format called the animated GIF (graphics interchange format). This article focuses on content as design, both in the sense that multimodal meaning making is an act of design and in the sense that web-based graphics are designed graphics that are created through a design process. She specifically focuses on the transmedia television production entitled Lost and analyzes the design of animated GIFs...
A real-time GNSS-R system based on software-defined radio and graphics processing units

Science.gov (United States)

Hobiger, Thomas; Amagai, Jun; Aida, Masanori; Narita, Hideki

2012-04-01

Reflected signals of the Global Navigation Satellite System (GNSS) from the sea or land surface can be utilized to deduce and monitor physical and geophysical parameters of the reflecting area. Unlike most other remote sensing techniques, GNSS-Reflectometry (GNSS-R) operates as a passive radar that takes advantage of the increasing number of navigation satellites that broadcast their L-band signals. Most GNSS-R receiver architectures are based on dedicated hardware solutions. Software-defined radio (SDR) technology has advanced in recent years and enables signal processing in real time, which makes it an ideal candidate for the realization of a flexible GNSS-R system. Additionally, modern commodity graphics cards, which offer massively parallel computing performance, can handle the whole signal processing chain without interfering with the PC's CPU. Thus, this paper describes a GNSS-R system which has been developed on the principles of software-defined radio supported by General Purpose Graphics Processing Units (GPGPUs), and presents results from initial field tests which confirm the anticipated capability of the system.

GPUs, a new tool of acceleration in CFD: efficiency and reliability on smoothed particle hydrodynamics methods.

Directory of Open Access Journals (Sweden)

Alejandro C Crespo

Full Text Available Smoothed Particle Hydrodynamics (SPH) is a numerical method commonly used in Computational Fluid Dynamics (CFD) to simulate complex free-surface flows. Simulations with this mesh-free particle method far exceed the capacity of a single processor. In this paper, as part of a dual-functioning code for either central processing units (CPUs) or Graphics Processor Units (GPUs), a parallelisation using GPUs is presented. The GPU parallelisation technique uses the Compute Unified Device Architecture (CUDA) of nVidia devices. Simulations with more than one million particles on a single GPU card exhibit speedups of up to two orders of magnitude over using a single-core CPU. It is demonstrated that the code achieves different speedups with different CUDA-enabled GPUs. The numerical behaviour of the SPH code is validated with a standard benchmark test case of dam break flow impacting on an obstacle where good agreement with the experimental results is observed. Both the achieved speed-ups and the quantitative agreement with experiments suggest that CUDA-based GPU programming can be used in SPH methods with efficiency and reliability.
Using CUDA Technology for Defining the Stiffness Matrix in the Subspace of Eigenvectors

Directory of Open Access Journals (Sweden)

Yu. V. Berchun

2015-01-01

Full Text Available The aim is to improve the performance of solving a problem of deformable solid mechanics through the use of GPGPU. The paper describes technologies for computing systems using both a central and a graphics processor and provides motivation for using CUDA technology as the efficient one. The paper also analyses methods to solve the problem of defining natural frequencies and design waveforms, i.e. an iteration method in the subspace. The method includes several stages. The paper considers the most resource-hungry stage, which defines the stiffness matrix in the subspace of eigenforms and gives the mathematical interpretation of this stage. The GPU choice as a computing device is justified. The paper presents an algorithm for calculating the stiffness matrix in the subspace of eigenforms taking into consideration the features of input data. The global stiffness matrix is very sparse, and its size can reach tens of millions. Therefore, it is represented as a set of the stiffness matrices of the single elements of a model. The paper analyses methods of data representation in the software and selects the best practices for GPU computing. It describes the software implementation using CUDA technology to calculate the stiffness matrix in the subspace of eigenforms. Due to the input data nature, it is impossible to use the universal libraries of matrix computations (cuSPARSE and cuBLAS) for loading the GPU. For efficient use of GPU resources in the software implementation, the stiffness matrices of elements are built in the block matrices of a special form. The advantages of using shared memory in GPU calculations are described. The transfer to the GPU computations allowed a twentyfold increase in performance (as compared to the multithreaded CPU-implementation) on the model of middle dimensions (degrees of freedom about 2 million). Such an acceleration of one stage speeds up defining the natural frequencies and waveforms by the iteration method in a subspace
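The stage described in the abstract above amounts to forming the reduced matrix K* = Φ^T K Φ, where Φ holds the current subspace vectors. Since the global matrix K is kept as a set of element matrices, the product can be accumulated one element at a time. The following is a minimal pure-Python sketch of that idea, not the authors' CUDA implementation: the function name `project_stiffness`, the `(dofs, k_e)` element layout, and the two-spring example are all assumptions made for illustration.

```python
def project_stiffness(elements, phi, n_modes):
    """Accumulate K* = Phi^T K Phi from element stiffness matrices.

    elements: list of (dofs, k_e); dofs are global DOF indices and k_e is
              the dense element matrix (len(dofs) x len(dofs)).
    phi:      phi[dof][mode], the subspace (eigenvector) basis.
    """
    k_sub = [[0.0] * n_modes for _ in range(n_modes)]
    for dofs, k_e in elements:
        n = len(dofs)
        # Gather only the rows of Phi that this element touches.
        phi_e = [phi[d] for d in dofs]
        # tmp = k_e @ phi_e
        tmp = [[sum(k_e[a][b] * phi_e[b][j] for b in range(n))
                for j in range(n_modes)] for a in range(n)]
        # k_sub += phi_e^T @ tmp
        for i in range(n_modes):
            for j in range(n_modes):
                k_sub[i][j] += sum(phi_e[a][i] * tmp[a][j] for a in range(n))
    return k_sub


# Hypothetical example: two unit springs chained over three DOFs.
spring = [[1.0, -1.0], [-1.0, 1.0]]
elements = [((0, 1), spring), ((1, 2), spring)]
phi = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # basis vectors e0 and e1
k_sub = project_stiffness(elements, phi, 2)
```

Each element contributes independently to `k_sub`, which is the property that would let the per-element products be distributed over GPU thread blocks, with the final accumulation done as a reduction.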
Elastically deformable models based on the finite element method accelerated on graphics hardware using CUDA

NARCIS (Netherlands)

Verschoor, M.; Jalba, A.C.

2012-01-01

Elastically deformable models have found applications in various areas ranging from mechanical sciences and engineering to computer graphics. The method of Finite Elements has been the tool of choice for solving the underlying PDE, when accuracy and stability of the computations are more important
Printing--Graphic Arts--Graphic Communications

Science.gov (United States)

Hauenstein, A. Dean

1975-01-01

Recently, "graphic arts" has shifted from printing skills to a conceptual approach of production processes. "Graphic communications" must embrace the total system of communication through graphic media, to serve broad career education purposes; students taught concepts and principles can be flexible and adaptive. The author…
Interactive and Animated Scalable Vector Graphics and R Data Displays

Directory of Open Access Journals (Sweden)

Deborah Nolan

2012-01-01

Full Text Available We describe an approach to creating interactive and animated graphical displays using R's graphics engine and Scalable Vector Graphics, an XML vocabulary for describing two-dimensional graphical displays. We use the svg() graphics device in R and then post-process the resulting XML documents. The post-processing identifies the elements in the SVG that correspond to the different components of the graphical display, e.g., points, axes, labels, lines. One can then annotate these elements to add interactivity and animation effects. One can also use JavaScript to provide dynamic interactive effects to the plot, enabling rich user interactions and compelling visualizations. The resulting SVG documents can be embedded within HTML documents and can involve JavaScript code that integrates the SVG and HTML objects. The functionality is provided via the SVGAnnotation package and makes static plots generated via R graphics functions available as stand-alone, interactive and animated plots for the Web and other venues.
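The post-processing step described in the abstract above (find the SVG elements that correspond to plot components, then annotate them) can be illustrated outside R. The sketch below uses Python's standard `xml.etree.ElementTree` module and is not the SVGAnnotation API: the function `add_tooltips`, the hand-written `svg_text` stand-in for device output, and the choice of `<title>` children as tooltips are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix

# A tiny stand-in for the XML an SVG graphics device might emit.
svg_text = (
    f'<svg xmlns="{SVG_NS}" width="100" height="100">'
    '<circle cx="20" cy="30" r="3"/>'
    '<circle cx="60" cy="70" r="3"/>'
    '</svg>'
)


def add_tooltips(svg_source, labels):
    """Attach a <title> child to each top-level circle, in document order."""
    root = ET.fromstring(svg_source)
    circles = root.findall(f"{{{SVG_NS}}}circle")
    for circle, label in zip(circles, labels):
        title = ET.SubElement(circle, f"{{{SVG_NS}}}title")
        title.text = label
    return ET.tostring(root, encoding="unicode")


annotated = add_tooltips(svg_text, ["point A", "point B"])
```

Browsers typically render an SVG `<title>` child as a hover tooltip, so the annotated document gains simple interactivity without any script.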
Development of the spent fuel disassembling process by utilizing the 3D graphic design technology

International Nuclear Information System (INIS)

Song, T. K.; Lee, J. Y.; Kim, S. H.; Yun, J. S.

2001-01-01

For developing the spent fuel disassembling process, a 3D graphic simulation has been established by utilizing the 3D graphic design technology which is widely used in the industry. The spent fuel disassembling process consists of a downender, a rod extraction device, a rod cutting device, a pellet extracting device and a skeleton compaction device. In this study, 3D graphical design models of these devices are implemented from the conceptual design, and a virtual workcell is established with kinematics assigned to the motion of each device. By implementing this graphic simulation, all the unit processes involved in the spent fuel disassembling process are analyzed and optimized. The 3D graphical model and the 3D graphic simulation can be effectively used for designing the process equipment, as well as the optimized process and maintenance process
Ratoath Manor Nursing Home, Ratoath, Meath.

LENUS (Irish Health Repository)

Klus, Petr

2012-01-13

Abstract Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers an order-of-magnitude performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net
Heterogeneous Multicore Parallel Programming for Graphics Processing Units

Directory of Open Access Journals (Sweden)

Francois Bodin

2009-01-01

Full Text Available Hybrid parallel multicore architectures based on graphics processing units (GPUs) can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware displays a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows the integration of heterogeneous hardware accelerators in an unintrusive manner while preserving the legacy code.
Discrete-Event Execution Alternatives on General Purpose Graphical Processing Units

International Nuclear Information System (INIS)

Perumalla, Kalyan S.

2006-01-01

Graphics cards, traditionally designed as accelerators for computer graphics, have evolved to support more general-purpose computation. General Purpose Graphical Processing Units (GPGPUs) are now being used as highly efficient, cost-effective platforms for executing certain simulation applications. While most of these applications belong to the category of time-stepped simulations, little is known about the applicability of GPGPUs to discrete event simulation (DES). Here, we identify some of the issues and challenges that the GPGPU stream-based interface raises for DES, and present some possible approaches to moving DES to GPGPUs. Initial performance results on simulation of a diffusion process show that DES-style execution on GPGPU runs faster than DES on CPU and also significantly faster than time-stepped simulations on either CPU or GPGPU.
Accelerating wavelet lifting on graphics hardware using CUDA

NARCIS (Netherlands)

Laan, van der W.J.; Roerdink, J.B.T.M.; Jalba, A.C.

2011-01-01

The Discrete Wavelet Transform (DWT) has a wide range of applications from signal processing to video and image compression. We show that this transform, by means of the lifting scheme, can be performed in a memory and computation-efficient way on modern, programmable GPUs, which can be regarded as
Accelerating Wavelet Lifting on Graphics Hardware Using CUDA

NARCIS (Netherlands)

Laan, Wladimir J. van der; Jalba, Andrei C.; Roerdink, Jos B.T.M.

The Discrete Wavelet Transform (DWT) has a wide range of applications from signal processing to video and image compression. We show that this transform, by means of the lifting scheme, can be performed in a memory and computation-efficient way on modern, programmable GPUs, which can be regarded as
Forward and adjoint spectral-element simulations of seismic wave propagation using hardware accelerators

Science.gov (United States)

Peter, Daniel; Videau, Brice; Pouget, Kevin; Komatitsch, Dimitri

2015-04-01

Improving the resolution of tomographic images is crucial to answer important questions on the nature of Earth's subsurface structure and internal processes. Seismic tomography is the most prominent approach where seismic signals from ground-motion records are used to infer physical properties of internal structures such as compressional- and shear-wave speeds, anisotropy and attenuation. Recent advances in regional- and global-scale seismic inversions move towards full-waveform inversions which require accurate simulations of seismic wave propagation in complex 3D media, providing access to the full 3D seismic wavefields. However, these numerical simulations are computationally very expensive and need high-performance computing (HPC) facilities for further improving the current state of knowledge. During recent years, many-core architectures such as graphics processing units (GPUs) have been added to available large HPC systems. Such GPU-accelerated computing together with advances in multi-core central processing units (CPUs) can greatly accelerate scientific applications. There are mainly two possible choices of language support for GPU cards, the CUDA programming environment and OpenCL language standard. CUDA software development targets NVIDIA graphic cards while OpenCL was adopted mainly by AMD graphic cards. In order to employ such hardware accelerators for seismic wave propagation simulations, we incorporated a code generation tool BOAST into an existing spectral-element code package SPECFEM3D_GLOBE. This allows us to use meta-programming of computational kernels and generate optimized source code for both CUDA and OpenCL languages, running simulations on either CUDA or OpenCL hardware accelerators. We show here applications of forward and adjoint seismic wave propagation on CUDA/OpenCL GPUs, validating results and comparing performances for different simulations and hardware usages.
High performance graphics processor based computed tomography reconstruction algorithms for nuclear and other large scale applications.

Energy Technology Data Exchange (ETDEWEB)

Jimenez, Edward S. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Orr, Laurel J. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thompson, Kyle R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2013-09-01

The goal of this work is to develop a fast computed tomography (CT) reconstruction algorithm based on graphics processing units (GPU) that achieves significant improvement over traditional central processing unit (CPU) based implementations. The main challenge in developing a CT algorithm that is capable of handling very large datasets is parallelizing the algorithm in such a way that data transfer does not hinder performance of the reconstruction algorithm. General Purpose Graphics Processing (GPGPU) is a new technology that the Science and Technology (S&T) community is starting to adopt in many fields where CPU-based computing is the norm. GPGPU programming requires a new approach to algorithm development that utilizes massively multi-threaded environments. Multi-threaded algorithms in general are difficult to optimize, since performance bottlenecks such as memory latencies occur that are non-existent in single-threaded algorithms. If an efficient GPU-based CT reconstruction algorithm can be developed, computational times could be improved by a factor of 20. Additionally, cost benefits will be realized as commodity graphics hardware could potentially replace expensive supercomputers and high-end workstations. This project will take advantage of the CUDA programming environment and attempt to parallelize the task in such a way that multiple slices of the reconstruction volume are computed simultaneously. This work will also take advantage of the GPU memory by utilizing asynchronous memory transfers, GPU texture memory, and (when possible) pinned host memory so that the memory transfer bottleneck inherent to GPGPU is amortized. Additionally, this work will take advantage of GPU-specific hardware (i.e. fast texture memory, pixel-pipelines, hardware interpolators, and varying memory hierarchy) that will allow for additional performance improvements.
Common Graphics Library (CGL). Volume 1: LEZ user's guide

Science.gov (United States)

Taylor, Nancy L.; Hammond, Dana P.; Hofler, Alicia S.; Miner, David L.

1988-01-01

Users are introduced to and instructed in the use of the Langley Easy (LEZ) routines of the Common Graphics Library (CGL). The LEZ routines form an application-independent graphics package which enables the user community to view data quickly and easily, while providing a means of generating scientific charts conforming to the publication and/or viewgraph process. A distinct advantage of using the LEZ routines is that the underlying graphics package may be replaced or modified without requiring the users to change their application programs. The library is written in ANSI FORTRAN 77, and currently uses a CORE-based underlying graphics package, and is therefore machine independent, providing support for centralized and/or distributed computer systems.
Reflector antenna analysis using physical optics on Graphics Processing Units

DEFF Research Database (Denmark)

Borries, Oscar Peter; Sørensen, Hans Henrik Brandenborg; Dammann, Bernd

2014-01-01

The Physical Optics approximation is a widely used asymptotic method for calculating the scattering from electrically large bodies. It requires significant computational work and little memory, and is thus well suited for application on a Graphics Processing Unit. Here, we investigate the perform...
GPU: the biggest key processor for AI and parallel processing

Science.gov (United States)

Baji, Toru

2017-07-01

Two types of processors exist in the market. One is the conventional CPU and the other is the Graphic Processor Unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. The CPU is good for sequential processing, while the GPU is good at accelerating software with heavy parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when GPUs started to adopt general-purpose cores, it was noticed that this architecture can be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs started to be used widely in workstations and supercomputers. Recently, two key technologies have been highlighted in the industry: Artificial Intelligence (AI) and autonomous driving cars. AI requires massively parallel operations to train many-layered neural networks. With the CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization, and path planning processing, and again an SoC with an integrated GPU will play a key role there. In this paper, the evolution of the GPU, which is one of the biggest commercial devices requiring state-of-the-art fabrication technology, will be introduced. An overview of key GPU-demanding applications like the ones described above will also be introduced.
Improving Software Performance in the Compute Unified Device Architecture

Directory of Open Access Journals (Sweden)

Alexandru PIRJAN

2010-01-01

Full Text Available This paper analyzes several aspects regarding the improvement of software performance for applications written in the Compute Unified Device Architecture (CUDA). We address an issue of great importance when programming a CUDA application: the Graphics Processing Unit’s (GPU’s) memory management through transpose kernels. We also benchmark and evaluate the performance of progressively optimizing a matrix transpose application in CUDA. One particular interest was to research how well the optimization techniques, applied to software applications written in CUDA, scale to the latest generation of general-purpose graphics processing units (GPGPUs), like the Fermi architecture implemented in the GTX480 and the previous architecture implemented in the GTX280. Lately, there has been a lot of interest in the literature in this type of optimization analysis, but none of the works so far (to our best knowledge) tried to validate whether the optimizations can apply to a GPU from the latest Fermi architecture and how well the Fermi architecture scales with these software performance improving techniques.
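A common step in optimizing a transpose kernel is to move data through a small tile, so that both the read from the source and the write to the destination walk memory row by row. The sketch below emulates that access pattern in plain Python so it can be run anywhere. It is an illustration of tiling in general, not the paper's benchmarked kernels: `TILE`, `transpose_tiled`, and the flat row-major layout are assumptions, and the local `tile` list only stands in for GPU shared memory.

```python
TILE = 2  # illustrative tile edge; CUDA kernels commonly use 16 or 32


def transpose_tiled(src, rows, cols):
    """Transpose a row-major rows x cols matrix stored in a flat list."""
    dst = [0] * (rows * cols)
    for tile_r in range(0, rows, TILE):
        for tile_c in range(0, cols, TILE):
            # Stage 1: read one tile row by row (stands in for the
            # coalesced global-memory read into shared memory).
            tile = [[src[r * cols + c]
                     for c in range(tile_c, min(tile_c + TILE, cols))]
                    for r in range(tile_r, min(tile_r + TILE, rows))]
            # Stage 2: write the tile transposed, walking each destination
            # row contiguously.
            for c_off in range(len(tile[0])):
                for r_off in range(len(tile)):
                    dst[(tile_c + c_off) * rows + (tile_r + r_off)] = tile[r_off][c_off]
    return dst


matrix = [1, 2, 3,
          4, 5, 6]  # 2 x 3, row-major
transposed = transpose_tiled(matrix, 2, 3)
```

The edge handling with `min(...)` covers matrices whose dimensions are not multiples of the tile size, which a real kernel would handle with bounds checks on the thread indices.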
Functional graphical languages for process control

International Nuclear Information System (INIS)

1996-01-01

A wide variety of safety systems are in use today in the process industries. Most of these systems rely on control software using procedural programming languages. This study investigates the use of functional graphical languages for controls in the process industry. Different vendor proprietary software and languages are investigated and evaluation criteria are outlined based on ability to meet regulatory requirements, reference sites involving applications with similar safety concerns, QA/QC procedures, community of users, type and user-friendliness of the man-machine interface, performance of operational code, and degree of flexibility. (author) 16 refs., 4 tabs
Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

Science.gov (United States)

Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

2017-08-01

We present an application of massively parallel processing of quantitative flow measurement data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150-fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on user parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18. Nature of problem: Speed up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1.8 s for one B-scan (150 × faster in comparison to the CPU
Stochastic Analysis of a Queue Length Model Using a Graphics Processing Unit

Czech Academy of Sciences Publication Activity Database

Přikryl, Jan; Kocijan, J.

2012-01-01

Vol. 5, No. 2 (2012), pp. 55-62 ISSN 1802-971X R&D Projects: GA MŠk(CZ) MEB091015 Institutional support: RVO:67985556 Keywords : graphics processing unit * GPU * Monte Carlo simulation * computer simulation * modeling Subject RIV: BC - Control Systems Theory http://library.utia.cas.cz/separaty/2012/AS/prikryl-stochastic analysis of a queue length model using a graphics processing unit.pdf

Micromagnetic simulations using Graphics Processing Units

International Nuclear Information System (INIS)

Lopez-Diaz, L; Aurelio, D; Torres, L; Martinez, E; Hernandez-Lopez, M A; Gomez, J; Alejos, O; Carpentieri, M; Finocchio, G; Consolo, G

2012-01-01

The methodology for adapting a standard micromagnetic code to run on graphics processing units (GPUs) and exploit the potential for parallel calculations of this platform is discussed. GPMagnet, a general purpose finite-difference GPU-based micromagnetic tool, is used as an example. Speed-up factors of two orders of magnitude can be achieved with GPMagnet with respect to a serial code. This allows for running extensive simulations, nearly inaccessible with a standard micromagnetic solver, at reasonable computational times. (topical review)
Python for Development of OpenMP and CUDA Kernels for Multidimensional Data

International Nuclear Information System (INIS)

Bell, Zane W.; Davidson, Gregory G.; D'Azevedo, Ed F.; Evans, Thomas M.; Joubert, Wayne; Munro, John K. Jr.; Patlolla, Dilip Reddy; Vacaliuc, Bogdan

2011-01-01

Design of data structures for high performance computing (HPC) is one of the principal challenges facing researchers looking to utilize heterogeneous computing machinery. Heterogeneous systems derive cost, power, and speed efficiency by being composed of the appropriate hardware for the task. Yet, each type of processor requires a specific organization of the application state in order to achieve peak performance. Discovering this and refactoring the code can be a challenging and time-consuming task for the researcher, as the data structures and the computational model must be co-designed. We present a methodology that uses Python as the environment in which to explore tradeoffs in both the data structure design and the code executing on the computation accelerator. Our method enables multi-dimensional arrays to be used effectively in any target environment. We have chosen to focus on OpenMP and CUDA environments, thus exploring the development of optimized kernels for the two most common classes of computing hardware available today: multi-core CPU and GPU. Python's large palette of file and network access routines, its associative indexing syntax and support for common HPC environments make it relevant for diverse hardware ranging from laptops through computing clusters to the highest performance supercomputers. Our work enables researchers to accelerate the development of their codes on the computing hardware of their choice.
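One concrete form of the data-organization tradeoff mentioned in the abstract above is the order in which the axes of a multi-dimensional array are laid out in flat memory, since the fastest-varying axis decides which accesses are contiguous on a given processor. The sketch below is an assumption-laden illustration and not the authors' tool: `strides_for` and `flat_index` are hypothetical helper names, and the array shape is arbitrary.

```python
def strides_for(shape, order):
    """Return per-axis strides (in elements) for a dense array.

    order lists the axes from slowest- to fastest-varying, so
    order=(0, 1, 2) is row-major (C) and order=(2, 1, 0) is column-major.
    """
    strides = [0] * len(shape)
    step = 1
    for axis in reversed(order):
        strides[axis] = step
        step *= shape[axis]
    return strides


def flat_index(index, strides):
    """Map a multi-dimensional index to an offset in the flat buffer."""
    return sum(i * s for i, s in zip(index, strides))


shape = (2, 3, 4)
row_major = strides_for(shape, (0, 1, 2))
col_major = strides_for(shape, (2, 1, 0))
```

Keeping the layout in a strides table like this means the kernel's indexing expression stays the same while the layout is varied, which is one way to compare candidate organizations for a CPU and a GPU target.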
A modular CUDA-based framework for scale-space feature detection in video streams

International Nuclear Information System (INIS)

Kinsner, M; Capson, D; Spence, A

2010-01-01

Multi-scale image processing techniques enable extraction of features where the size of a feature is either unknown or changing, but the requirement to process image data at multiple scale levels imposes a substantial computational load. This paper describes the architecture and emerging results from the implementation of a GPGPU-accelerated scale-space feature detection framework for video processing. A discrete scale-space representation is generated for image frames within a video stream, and multi-scale feature detection metrics are applied to detect ridges and Gaussian blobs at video frame rates. A modular structure is adopted, in which common feature extraction tasks such as non-maximum suppression and local extrema search may be reused across a variety of feature detectors. Extraction of ridge and blob features is achieved at faster than 15 frames per second on video sequences from a machine vision system, utilizing an NVIDIA GTX 480 graphics card. By design, the framework is easily extended to additional feature classes through the inclusion of feature metrics to be applied to the scale-space representation, and using common post-processing modules to reduce the required CPU workload. The framework is scalable across multiple and more capable GPUs, and enables previously intractable image processing at video frame rates using commodity computational hardware.
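Non-maximum suppression, named in the abstract above as one of the reusable modules, keeps only those responses that exceed all of their neighbours. The sketch below shows a single-scale, 8-neighbour version in plain Python. It is a simplified illustration, not the framework's GPU module: the function name, the `threshold` parameter, and the sample response map are assumptions.

```python
def non_maximum_suppression(response, threshold=0.0):
    """Return (row, col) positions that are strict maxima of their 8-neighbourhood."""
    rows, cols = len(response), len(response[0])
    peaks = []
    for r in range(1, rows - 1):          # border pixels are skipped
        for c in range(1, cols - 1):
            value = response[r][c]
            if value <= threshold:
                continue
            neighbours = [response[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Hypothetical feature-response map with two isolated peaks.
response = [
    [0, 0, 0, 0, 0],
    [0, 5, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 7, 0],
    [0, 0, 0, 0, 0],
]
peaks = non_maximum_suppression(response)
```

Every pixel is tested independently of the others, so the two loops map naturally to one GPU thread per pixel. A scale-space extrema search would extend the same comparison to the neighbours in the adjacent scale levels.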
Common Graphics Library (CGL). Volume 2: Low-level user's guide

Science.gov (United States)

Taylor, Nancy L.; Hammond, Dana P.; Theophilos, Pauline M.

1989-01-01

The intent is to instruct the users of the Low-Level routines of the Common Graphics Library (CGL). The Low-Level routines form an application-independent graphics package enabling the user community to construct and design scientific charts conforming to the publication and/or viewgraph process. The Low-Level routines allow the user to design unique or unusual report-quality charts from a set of graphics utilities. The features of these routines can be used stand-alone or in conjunction with other packages to enhance or augment their capabilities. This library is written in ANSI FORTRAN 77, and currently uses a CORE-based underlying graphics package, and is therefore machine-independent, providing support for centralized and/or distributed computer systems.
Visualisation for Stochastic Process Algebras: The Graphic Truth

DEFF Research Database (Denmark)

Smith, Michael James Andrew; Gilmore, Stephen

2011-01-01

and stochastic activity networks provide an automaton-based view of the model, which may be easier to visualise, at the expense of portability. In this paper, we argue that we can achieve the benefits of both approaches by generating a graphical view of a stochastic process algebra model, which is synchronised...
iMOSFLM: a new graphical interface for diffraction-image processing with MOSFLM

International Nuclear Information System (INIS)

Battye, T. Geoff G.; Kontogiannis, Luke; Johnson, Owen; Powell, Harold R.; Leslie, Andrew G. W.

2011-01-01

A new graphical user interface to the MOSFLM program has been developed to simplify the processing of macromolecular diffraction data. The interface, iMOSFLM, allows data processing via a series of clearly defined tasks and provides visual feedback on the progress of each stage. iMOSFLM is a graphical user interface to the diffraction data-integration program MOSFLM. It is designed to simplify data processing by dividing the process into a series of steps, which are normally carried out sequentially. Each step has its own display pane, allowing control over parameters that influence that step and providing graphical feedback to the user. Suitable values for integration parameters are set automatically, but additional menus provide a detailed level of control for experienced users. The image display and the interfaces to the different tasks (indexing, strategy calculation, cell refinement, integration and history) are described. The most important parameters for each step and the best way of assessing success or failure are discussed
Development of GPU Based Parallel Computing Module for Solving Pressure Equation in the CUPID Component Thermo-Fluid Analysis Code

International Nuclear Information System (INIS)

Lee, Jin Pyo; Joo, Han Gyu

2010-01-01

In the thermo-fluid analysis code CUPID, a linear system of pressure equations must be solved at each iteration step. The time spent repeatedly solving this system can be significant because large sparse matrices of rank greater than 50,000 are involved and diagonal dominance of the system rarely holds. Parallelization of the linear system solver is therefore essential to reduce the computing time. Meanwhile, Graphics Processing Units (GPUs) have been developed as highly parallel, multi-core processors to meet the global demand for high-quality 3D graphics. Given a suitable interface, GPU parallelization becomes available to engineering computing. NVIDIA provides a Software Development Kit (SDK) named CUDA (Compute Unified Device Architecture) so that developers can program GPUs for parallel computation using the C language. In this research, we implement parallel routines for the linear system solver using CUDA and examine the performance of the parallelization. In the next section, we describe the method of CUDA parallelization for the CUPID code, and then discuss the performance of the CUDA parallelization.
Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System

Directory of Open Access Journals (Sweden)

Yu Liu

2015-01-01

Full Text Available The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.
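The idea of a frequency-distance prefilter can be illustrated with a short sketch. It assumes the usual definition of frequency distance (the larger of the total surplus and total deficit between two residue-count vectors), which is a lower bound on edit distance. The function names and the threshold are hypothetical, and the abstract does not say how CUDA-SWfr relates this bound to Smith-Waterman scores.

```python
from collections import Counter


def frequency_distance(a, b):
    """Lower bound on the edit distance between sequences a and b.

    Each insertion, deletion, or substitution changes the residue counts
    by at most one surplus and one deficit, so max(surplus, deficit)
    can never exceed the true edit distance.
    """
    fa, fb = Counter(a), Counter(b)
    surplus = sum(max(fa[ch] - fb[ch], 0) for ch in fa)  # residues a has in excess
    deficit = sum(max(fb[ch] - fa[ch], 0) for ch in fb)  # residues a is missing
    return max(surplus, deficit)


def prefilter(query, database, max_distance):
    """Keep only database sequences that could be within max_distance edits."""
    return [s for s in database if frequency_distance(query, s) <= max_distance]


candidates = prefilter("ACDE", ["ACDF", "WWWW", "EDCA"], max_distance=1)
```

Because the filter only compares residue counts, it is cheap enough to run on the CPU before the more expensive alignments are dispatched. It can let through sequences that are far apart in edit distance (such as a reversed query), but under the lower-bound assumption it never discards one that is within the threshold.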
Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

Science.gov (United States)

Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

2015-01-01

The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.
A new tool for supervised classification of satellite images available on web servers: Google Maps as a case study

Science.gov (United States)

García-Flores, Agustín.; Paz-Gallardo, Abel; Plaza, Antonio; Li, Jun

2016-10-01

This paper describes a new web platform dedicated to the classification of satellite images called Hypergim. The current implementation of this platform enables users to perform classification of satellite images from any part of the world thanks to the worldwide maps provided by Google Maps. To perform this classification, Hypergim uses unsupervised algorithms like Isodata and K-means. Here, we present an extension of the original platform in which we adapt Hypergim in order to use supervised algorithms to improve the classification results. This involves a significant modification of the user interface, providing the user with a way to obtain samples of classes present in the images to use in the training phase of the classification process. Another main goal of this development is to improve the runtime of the image classification process. To achieve this goal, we use a parallel implementation of the Random Forest classification algorithm. This implementation is a modification of the well-known CURFIL software package. The use of this type of algorithm to perform image classification is widespread today thanks to its precision and ease of training. The actual implementation of Random Forest was developed using the CUDA platform, which enables us to exploit the potential of several models of NVIDIA graphics processing units by using them to execute general-purpose computing tasks such as image classification algorithms. In addition to CUDA, we use other parallel libraries such as Intel Boost, taking advantage of the multithreading capabilities of modern CPUs. To ensure the best possible results, the platform is deployed in a cluster of commodity graphics processing units (GPUs), so that multiple users can use the tool concurrently. The experimental results indicate that this new algorithm widely outperforms the previous unsupervised algorithms implemented in Hypergim, in both runtime and precision of the actual classification of the images.
Spectra processing with computer graphics

International Nuclear Information System (INIS)

Kruse, H.

1979-01-01

A program for processing gamma-ray spectra in rock analysis is described. The peak search was performed by applying a cross-correlation function. The experimental data were approximated by an analytical function represented by the sum of a polynomial and a multiple peak function. The latter is a Gaussian joined on the low-energy side to an exponential. A modified Gauss-Newton algorithm is applied for the purpose of fitting the data to the function. The processing of the values derived from a lunar sample demonstrates the effect of different choices of polynomial orders for approximating the background for various fitting intervals. Observations on applications of interactive graphics are presented. 3 figures, 1 table
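A peak shape of the kind described, a Gaussian joined to an exponential on the low-energy side, might look like the sketch below. The abstract gives no formula, so the parametrization is an assumption: a commonly used form in which the tail takes over at a distance `junction` below the centre and matches both the value and the slope of the Gaussian there. All names are hypothetical.

```python
import math


def peak_shape(x, amplitude, center, sigma, junction):
    """Gaussian peak with an exponential tail on the low-energy side.

    Assumed form: for x >= center - junction the shape is a plain Gaussian;
    below that point it continues as an exponential whose value and first
    derivative match the Gaussian at the junction.
    """
    if x >= center - junction:
        return amplitude * math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))
    return amplitude * math.exp(
        junction * (2.0 * (x - center) + junction) / (2.0 * sigma ** 2)
    )


# Evaluate a hypothetical peak at channel 100 on a few channels.
profile = [peak_shape(ch, 50.0, 100.0, 2.0, 3.0) for ch in range(90, 111)]
```

A full fit would add a polynomial background to a sum of such peaks and adjust the parameters with a Gauss-Newton type iteration, as the abstract describes.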
Spins Dynamics in a Dissipative Environment: Hierarchal Equations of Motion Approach Using a Graphics Processing Unit (GPU).

Science.gov (United States)

Tsuchimoto, Masashi; Tanimura, Yoshitaka

2015-08-11

A system with many energy states coupled to a harmonic oscillator bath is considered. To study quantum non-Markovian system-bath dynamics numerically rigorously and nonperturbatively, we developed a computer code for the reduced hierarchy equations of motion (HEOM) for a graphics processor unit (GPU) that can treat systems as large as 4096 energy states. The code employs a Padé spectrum decomposition (PSD) for the construction of the HEOM and exponential integrators. Dynamics of a quantum spin glass system are studied by calculating the free induction decay signal for the cases of 3 × 2 to 3 × 4 triangular lattices with antiferromagnetic interactions. We found that spins relax faster at lower temperature due to transitions through a quantum coherent state, as represented by the off-diagonal elements of the reduced density matrix, while it has been known that the spins relax slower due to suppression of thermal activation in a classical case. The decay of the spins is qualitatively similar regardless of the lattice sizes. The pathway of spin relaxation is analyzed under a sudden temperature drop condition. The Compute Unified Device Architecture (CUDA) based source code used in the present calculations is provided as Supporting Information.
Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards.

Science.gov (United States)

Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G

2011-07-01

In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses the summed absolute difference (SAD) error criterion and full grid search (FS) for finding the optimal block displacement. In this evaluation we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 for an integer and 1000 for a non-integer search grid. The additional speedup for the non-integer search grid comes from the fact that the GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with the number of cards is achievable. In addition we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU based motion estimation methods, namely the implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and the Simplified Unsymmetrical multi-Hexagon search in the H.264/AVC standard. In these comparisons, the FS GPU implementation still showed modest improvement even though the computational complexity of the FS GPU implementation is substantially higher than that of the non-FS CPU implementations. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards.
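Full-search block matching with the SAD criterion can be sketched on an integer grid as follows. This is a plain CPU illustration with hypothetical function names and a toy 6×6 image, not the authors' CUDA kernel. Each candidate displacement is evaluated independently, which is the property a GPU implementation exploits.

```python
def sad(ref, cur, ry, rx, cy, cx, size):
    """Summed absolute difference between a block in ref and a block in cur."""
    return sum(
        abs(ref[ry + i][rx + j] - cur[cy + i][cx + j])
        for i in range(size)
        for j in range(size)
    )


def full_search(ref, cur, cy, cx, size, radius):
    """Exhaustively test every integer displacement within +/- radius.

    Returns (best_cost, dy, dx) for the block whose top-left corner is
    (cy, cx) in the current frame; candidates falling outside ref are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = cy + dy, cx + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = sad(ref, cur, ry, rx, cy, cx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


# Toy data: the current frame is the reference shifted by (1, 2) with wrap-around.
ref = [[y * 6 + x for x in range(6)] for y in range(6)]
cur = [[ref[(y + 1) % 6][(x + 2) % 6] for x in range(6)] for y in range(6)]
match = full_search(ref, cur, cy=2, cx=1, size=2, radius=2)
```

Here `match` recovers the displacement (1, 2) with zero cost. A non-integer search grid would additionally require interpolating the reference image at fractional positions.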
A low-cost system for graphical process monitoring with colour video symbol display units

International Nuclear Information System (INIS)

Grauer, H.; Jarsch, V.; Mueller, W.

1977-01-01

A system for computer-controlled graphic process supervision using color symbol video displays is described. It has the following characteristics: a compact unit with no external memory for image storage; a simple, problem-oriented descriptive interface to the process program; no restriction on the graphical representation of process variables; and computer and display independence, through implementation of colours and parameterized code creation for the display. (WB)
Energy- and cost-efficient lattice-QCD computations using graphics processing units

Energy Technology Data Exchange (ETDEWEB)

Bach, Matthias

2014-07-01

Quarks and gluons are the building blocks of all hadronic matter, like protons and neutrons. Their interaction is described by Quantum Chromodynamics (QCD), a theory under test by large scale experiments like the Large Hadron Collider (LHC) at CERN and in the future at the Facility for Antiproton and Ion Research (FAIR) at GSI. However, perturbative methods can only be applied to QCD for high energies. Studies from first principles are possible via a discretization onto a Euclidean space-time grid. This discretization of QCD is called Lattice QCD (LQCD) and is the only ab-initio option outside of the high-energy regime. LQCD is extremely compute and memory intensive. In particular, it is by definition always bandwidth limited. Thus, despite the complexity of LQCD applications, it led to the development of several specialized compute platforms and influenced the development of others. However, in recent years General-Purpose computation on Graphics Processing Units (GPGPU) came up as a new means for parallel computing. Contrary to machines traditionally used for LQCD, graphics processing units (GPUs) are a mass-market product. This promises advantages in both the pace at which higher-performing hardware becomes available and its price. CL2QCD is an OpenCL based implementation of LQCD using Wilson fermions that was developed within this thesis. It operates on GPUs by all major vendors as well as on central processing units (CPUs). On the AMD Radeon HD 7970 it provides the fastest double-precision D-slash kernel for a single GPU, achieving 120 GFLOPS. D-slash, the most compute-intensive kernel in LQCD simulations, is commonly used to compare LQCD platforms. This performance is enabled by an in-depth analysis of optimization techniques for bandwidth-limited codes on GPUs. Further, analysis of the communication between GPU and CPU, as well as between multiple GPUs, enables high-performance Krylov space solvers and linear scaling to multiple GPUs within a single system. LQCD
Energy- and cost-efficient lattice-QCD computations using graphics processing units

International Nuclear Information System (INIS)

Bach, Matthias

2014-01-01

Quarks and gluons are the building blocks of all hadronic matter, like protons and neutrons. Their interaction is described by Quantum Chromodynamics (QCD), a theory under test by large scale experiments like the Large Hadron Collider (LHC) at CERN and in the future at the Facility for Antiproton and Ion Research (FAIR) at GSI. However, perturbative methods can only be applied to QCD for high energies. Studies from first principles are possible via a discretization onto a Euclidean space-time grid. This discretization of QCD is called Lattice QCD (LQCD) and is the only ab-initio option outside of the high-energy regime. LQCD is extremely compute and memory intensive. In particular, it is by definition always bandwidth limited. Thus, despite the complexity of LQCD applications, it led to the development of several specialized compute platforms and influenced the development of others. However, in recent years General-Purpose computation on Graphics Processing Units (GPGPU) came up as a new means for parallel computing. Contrary to machines traditionally used for LQCD, graphics processing units (GPUs) are a mass-market product. This promises advantages in both the pace at which higher-performing hardware becomes available and its price. CL2QCD is an OpenCL based implementation of LQCD using Wilson fermions that was developed within this thesis. It operates on GPUs by all major vendors as well as on central processing units (CPUs). On the AMD Radeon HD 7970 it provides the fastest double-precision D-slash kernel for a single GPU, achieving 120 GFLOPS. D-slash, the most compute-intensive kernel in LQCD simulations, is commonly used to compare LQCD platforms. This performance is enabled by an in-depth analysis of optimization techniques for bandwidth-limited codes on GPUs. Further, analysis of the communication between GPU and CPU, as well as between multiple GPUs, enables high-performance Krylov space solvers and linear scaling to multiple GPUs within a single system. LQCD
Significantly reducing registration time in IGRT using graphics processing units

DEFF Research Database (Denmark)

Noe, Karsten Østergaard; Denis de Senneville, Baudouin; Tanderup, Kari

2008-01-01

respiration phases in a free breathing volunteer and 41 anatomical landmark points in each image series. The registration method used is a multi-resolution GPU implementation of the 3D Horn and Schunck algorithm. It is based on the CUDA framework from Nvidia. Results On an Intel Core 2 CPU at 2.4GHz each...... registration took 30 minutes. On an Nvidia Geforce 8800GTX GPU in the same machine this registration took 37 seconds, making the GPU version 48.7 times faster. The nine image series of different respiration phases were registered to the same reference image (full inhale). Accuracy was evaluated on landmark...
On the design of a demo for exhibiting rCUDA

OpenAIRE

Reaño González, Carlos; Pérez López, Ferrán; Silla Jiménez, Federico

2015-01-01

CUDA is a technology developed by NVIDIA which provides a parallel computing platform and programming mo...
Development of a Monte Carlo software to photon transportation in voxel structures using graphic processing units; Desenvolvimento de um software de Monte Carlo para transporte de fotons em estruturas de voxels usando unidades de processamento grafico

Energy Technology Data Exchange (ETDEWEB)

Bellezzo, Murillo

2014-09-01

As the most accurate method to estimate absorbed dose in radiotherapy, the Monte Carlo Method (MCM) has been widely used in radiotherapy treatment planning. Nevertheless, its efficiency can be improved for clinical routine applications. In this thesis, the CUBMC code is presented, a GPU-based MC photon transport algorithm for dose calculation under the Compute Unified Device Architecture (CUDA) platform. The simulation of physical events is based on the algorithm used in PENELOPE, and the cross section table used is the one generated by the MATERIAL routine, also present in the PENELOPE code. Photons are transported in voxel-based geometries with different compositions. Two distinct approaches are used for transport simulation. The first forces the photon to stop at every voxel frontier; the second is the Woodcock method, where the photon ignores the existence of borders and travels in homogeneous fictitious media. The CUBMC code aims to be an alternative Monte Carlo simulator code that, by using the parallel processing capability of graphics processing units (GPUs), provides high-performance simulations on low-cost compact machines, and thus can be applied in clinical cases and incorporated in treatment planning systems for radiotherapy. (author)
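The Woodcock (delta-tracking) idea can be sketched in one dimension. Flight lengths are sampled with a single majorant cross-section for the whole geometry, and each tentative collision is accepted as real with probability equal to the local cross-section divided by the majorant. The two-region medium, the function names, and the numerical values below are assumptions for illustration and are unrelated to the CUBMC or PENELOPE data.

```python
import math
import random


def woodcock_collision_site(sigma_at, sigma_max, rng, x0=0.0):
    """Sample the position of the next real collision along +x.

    sigma_at(x) is the local total cross-section and must never exceed
    sigma_max. Boundaries between materials are never located explicitly.
    """
    x = x0
    while True:
        # Flight length drawn as if the whole medium had cross-section sigma_max.
        x += -math.log(1.0 - rng.random()) / sigma_max
        # Accept as a real collision with probability sigma(x) / sigma_max;
        # otherwise it is a virtual collision and the flight simply continues.
        if rng.random() < sigma_at(x) / sigma_max:
            return x


def two_slab_sigma(x):
    """Hypothetical medium: cross-section 1.0 up to x = 1, then 0.5 beyond."""
    return 1.0 if x < 1.0 else 0.5


rng = random.Random(1)
site = woodcock_collision_site(two_slab_sigma, 1.0, rng)
```

For this medium the probability that the first real collision lies beyond x = 1 is exp(-1), which a large sample reproduces. The trade-off is that regions with a cross-section far below the majorant generate many rejected virtual collisions.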
A Monte Carlo neutron transport code for eigenvalue calculations on a dual-GPU system and CUDA environment

Energy Technology Data Exchange (ETDEWEB)

Liu, T.; Ding, A.; Ji, W.; Xu, X. G. [Nuclear Engineering and Engineering Physics, Rensselaer Polytechnic Inst., Troy, NY 12180 (United States); Carothers, C. D. [Dept. of Computer Science, Rensselaer Polytechnic Inst. RPI (United States); Brown, F. B. [Los Alamos National Laboratory (LANL) (United States)

2012-07-01

Monte Carlo (MC) method is able to accurately calculate eigenvalues in reactor analysis. Its lengthy computation time can be reduced by general-purpose computing on Graphics Processing Units (GPU), one of the latest parallel computing techniques under development. The method of porting a regular transport code to GPU is usually very straightforward due to the 'embarrassingly parallel' nature of MC code. However, the situation becomes different for eigenvalue calculation in that it will be performed on a generation-by-generation basis and the thread coordination should be explicitly taken care of. This paper presents our effort to develop such a GPU-based MC code in Compute Unified Device Architecture (CUDA) environment. The code is able to perform eigenvalue calculation under simple geometries on a multi-GPU system. The specifics of algorithm design, including thread organization and memory management were described in detail. The original CPU version of the code was tested on an Intel Xeon X5660 2.8 GHz CPU, and the adapted GPU version was tested on NVIDIA Tesla M2090 GPUs. Double-precision floating point format was used throughout the calculation. The result showed that a speedup of 7.0 and 33.3 were obtained for a bare spherical core and a binary slab system respectively. The speedup factor was further increased by a factor of approximately 2 on a dual GPU system. The upper limit of device-level parallelism was analyzed, and a possible method to enhance the thread-level parallelism was proposed. (authors)

A Monte Carlo neutron transport code for eigenvalue calculations on a dual-GPU system and CUDA environment

International Nuclear Information System (INIS)

Liu, T.; Ding, A.; Ji, W.; Xu, X. G.; Carothers, C. D.; Brown, F. B.

2012-01-01

Monte Carlo (MC) method is able to accurately calculate eigenvalues in reactor analysis. Its lengthy computation time can be reduced by general-purpose computing on Graphics Processing Units (GPU), one of the latest parallel computing techniques under development. The method of porting a regular transport code to GPU is usually very straightforward due to the 'embarrassingly parallel' nature of MC code. However, the situation becomes different for eigenvalue calculation in that it will be performed on a generation-by-generation basis and the thread coordination should be explicitly taken care of. This paper presents our effort to develop such a GPU-based MC code in Compute Unified Device Architecture (CUDA) environment. The code is able to perform eigenvalue calculation under simple geometries on a multi-GPU system. The specifics of algorithm design, including thread organization and memory management were described in detail. The original CPU version of the code was tested on an Intel Xeon X5660 2.8 GHz CPU, and the adapted GPU version was tested on NVIDIA Tesla M2090 GPUs. Double-precision floating point format was used throughout the calculation. The result showed that a speedup of 7.0 and 33.3 were obtained for a bare spherical core and a binary slab system respectively. The speedup factor was further increased by a factor of ∼2 on a dual GPU system. The upper limit of device-level parallelism was analyzed, and a possible method to enhance the thread-level parallelism was proposed. (authors)
Graphic Arts: Book Three. The Press and Related Processes.

Science.gov (United States)

Farajollahi, Karim; And Others

The third of a three-volume set of instructional materials for a graphic arts course, this manual consists of nine instructional units dealing with presses and related processes. Covered in the units are basic press fundamentals, offset press systems, offset press operating procedures, offset inks and dampening chemistry, preventive maintenance…
The Use of Computer Graphics in the Design Process.

Science.gov (United States)

Palazzi, Maria

This master's thesis examines applications of computer technology to the field of industrial design and ways in which technology can transform the traditional process. Following a statement of the problem, the history and applications of the fields of computer graphics and industrial design are reviewed. The traditional industrial design process…
Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform

NARCIS (Netherlands)

Xu, S.; Xue, W.; Lin, H.X.

2011-01-01

In this article, we discuss the performance modeling and optimization of Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPUs using CUDA. SpMV has a very low computation-data ratio and its performance is mainly bound by the memory bandwidth. We propose optimization of SpMV based on ELLPACK from
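For orientation, the basic ELLPACK layout and the corresponding SpMV loop can be sketched as below. The abstract is cut off before it names the specific ELLPACK-based variant, so this shows only the plain format; the function names are hypothetical and the code is a CPU illustration rather than a CUDA kernel.

```python
def to_ellpack(dense):
    """Convert a dense matrix (list of rows) to padded ELLPACK arrays.

    Every row stores the same number of (value, column) slots, equal to the
    largest nonzero count of any row; short rows are padded with zeros.
    """
    rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]
    width = max((len(r) for r in rows), default=0)
    values = [[v for _, v in r] + [0.0] * (width - len(r)) for r in rows]
    cols = [[j for j, _ in r] + [0] * (width - len(r)) for r in rows]
    return values, cols


def spmv_ellpack(values, cols, x):
    """Compute y = A @ x from ELLPACK arrays.

    On a GPU one thread would typically handle one row; the uniform row
    length keeps the loop regular, at the cost of multiplying padded zeros.
    """
    return [
        sum(v * x[j] for v, j in zip(vrow, crow))
        for vrow, crow in zip(values, cols)
    ]


dense = [
    [4, 0, 1],
    [0, 3, 0],
    [2, 0, 5],
]
values, cols = to_ellpack(dense)
y = spmv_ellpack(values, cols, [1, 2, 3])
```

The padding illustrates why the format suits matrices whose rows have similar nonzero counts: each multiply-add moves an index and a value from memory, so wasted slots translate directly into wasted bandwidth.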
MEDINA: MECCA Development in Accelerators – KPP Fortran to CUDA source-to-source Pre-processor

Directory of Open Access Journals (Sweden)

Michail Alvanos

2017-04-01

Full Text Available The global climate model ECHAM/MESSy Atmospheric Chemistry (EMAC) is a modular global model that simulates climate change and air quality scenarios. The application includes different sub-models for the calculation of chemical species concentrations, their interaction with land and sea, and the human interaction. The paper presents a source-to-source parser that enables support for Graphics Processing Units (GPU) by the Kinetic Pre-Processor (KPP) general purpose open-source software tool. The requirements of the host system are also described. The source code of the source-to-source parser is available under the MIT License.
A Theoretical Analysis of Learning with Graphics--Implications for Computer Graphics Design.

Science.gov (United States)

ChanLin, Lih-Juan

This paper reviews the literature pertinent to learning with graphics. The dual coding theory explains how graphics are stored and processed in semantic memory. The level of processing theory suggests how graphics can be employed in learning to encourage deeper processing. In addition to dual coding theory and level of processing…
Programming Language Software For Graphics Applications

Science.gov (United States)

Beckman, Brian C.

1993-01-01

New approach reduces repetitive development of features common to different applications. High-level programming language and interactive environment with access to graphical hardware and software created by adding graphical commands and other constructs to standardized, general-purpose programming language, "Scheme". Designed for use in developing other software incorporating interactive computer-graphics capabilities into application programs. Provides alternative to programming entire applications in C or FORTRAN, specifically ameliorating design and implementation of complex control and data structures typifying applications with interactive graphics. Enables experimental programming and rapid development of prototype software, and yields high-level programs serving as executable versions of software-design documentation.
Fourier analysis of Solar atmospheric numerical simulations accelerated with GPUs (CUDA).

Science.gov (United States)

Marur, A.

2015-12-01

Solar dynamics from the convection zone creates a variety of waves that may propagate through the solar atmosphere. These waves are important in facilitating the energy transfer between the sun's surface and the corona as well as propagating energy throughout the solar system. How and where these waves are dissipated remains an open question. Advanced 3D numerical simulations have furthered our understanding of the processes involved. Fourier transforms help to characterize the waves by finding their frequency and wavelength throughout the simulated atmosphere, as well as the nature of their propagation and where they are dissipated. In order to analyze the different waves produced by the aforementioned simulations and models, Fast Fourier Transform algorithms will be applied. Since the processing of the multitude of different layers of the simulations (of the order of several 100^3 grid points) would be time intensive and inefficient on a CPU, CUDA, a computing architecture that harnesses the power of the GPU, will be used to accelerate the calculations.
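Finding a wave's frequency from sampled data with a Fast Fourier Transform can be sketched as below. This is a textbook recursive radix-2 transform on a synthetic sine wave, written for clarity rather than speed. The signal, sample rate, and function names are assumptions, and a GPU version would normally call a tuned FFT library instead.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return list(samples)
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def dominant_frequency(samples, sample_rate):
    """Return the frequency of the strongest non-DC bin below Nyquist."""
    spectrum = fft(samples)
    half = len(samples) // 2
    peak_bin = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate / len(samples)


# Synthetic test signal: a 5 Hz sine sampled at 64 Hz for one second.
sample_rate = 64.0
signal = [math.sin(2.0 * math.pi * 5.0 * i / sample_rate) for i in range(64)]
peak = dominant_frequency(signal, sample_rate)
```

Applying the same transform along a spatial axis instead of the time axis yields wavenumbers, from which wavelengths follow.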
Identification of computer graphics objects

Directory of Open Access Journals (Sweden)

Rossinskyi Yu.M.

2016-04-01

Full Text Available The article is devoted to the use of computer graphics methods in creating drawings, charts, drafts, etc. The widespread use of these methods requires the development of efficient algorithms for identifying the objects in drawings. The article analyzes model-making algorithms for this problem and considers the possibility of reducing processing time by using graphics editing operations. Editing consists of operations such as copying, moving and deleting objects of the specified images. These operations allow the use of reliable methods for identifying images of objects. Information on the composition of an object's image should include, along with the object's identity and color, its spatial location and other characteristics (the thickness and style of contour lines, fill style, and so on). To enable pixel-level image analysis to structure this information, it is necessary to encode the objects of the initial image by color. The article presents the results of implementing an algorithm for encoding object identifiers. To simplify the construction of drawings of any kind and reduce the time required, a method of identifying drawing objects is proposed that uses the object's color as its ID information.
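One simple way to use colour as an object identifier is to pack the ID into the three 8-bit channels of a pixel, so that reading a pixel back identifies the object drawn there. The abstract does not describe the authors' encoding, so the 24-bit packing and all names below are assumptions made purely for illustration.

```python
def id_to_rgb(object_id):
    """Pack an object ID (0 .. 2**24 - 1) into an (R, G, B) triple."""
    if not 0 <= object_id < (1 << 24):
        raise ValueError("object_id must fit in 24 bits")
    return ((object_id >> 16) & 0xFF, (object_id >> 8) & 0xFF, object_id & 0xFF)


def rgb_to_id(rgb):
    """Recover the object ID from an (R, G, B) triple."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b


def object_at(id_image, x, y):
    """Identify the object covering pixel (x, y) in a hypothetical ID image."""
    return rgb_to_id(id_image[y][x])


# A 2x2 ID image in which object 7 covers the top row and object 70000 the bottom.
id_image = [
    [id_to_rgb(7), id_to_rgb(7)],
    [id_to_rgb(70000), id_to_rgb(70000)],
]
picked = object_at(id_image, x=1, y=1)
```

With such a scheme, copying, moving, or deleting an object reduces to selecting all pixels that carry its colour, provided no two objects share an identifier and the image is stored without lossy compression or anti-aliasing.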
Measuring Cognitive Load in Test Items: Static Graphics versus Animated Graphics

Science.gov (United States)

Dindar, M.; Kabakçi Yurdakul, I.; Inan Dönmez, F.

2015-01-01

The majority of multimedia learning studies focus on the use of graphics in learning process but very few of them examine the role of graphics in testing students' knowledge. This study investigates the use of static graphics versus animated graphics in a computer-based English achievement test from a cognitive load theory perspective. Three…
CudaFilters: A SignalPlant library for GPU-accelerated FFT and FIR filtering

Czech Academy of Sciences Publication Activity Database

Nejedlý, Petr; Plešinger, Filip; Halámek, Josef; Jurák, Pavel

2018-01-01

Vol. 48, No. 1 (2018), pp. 3-9 ISSN 0038-0644 R&D Projects: GA ČR GA17-13830S; GA MŠk(CZ) LO1212; GA MŠk ED0017/01/01 Institutional support: RVO:68081731 Keywords: CUDA * FFT filter * FIR filter * GPU acceleration * SignalPlant Impact factor: 1.609, year: 2016
Partial wave analysis using graphics processing units

Energy Technology Data Exchange (ETDEWEB)

Berger, Niklaus; Liu Beijiang; Wang Jike, E-mail: nberger@ihep.ac.c [Institute of High Energy Physics, Chinese Academy of Sciences, 19B Yuquan Lu, Shijingshan, 100049 Beijing (China)

2010-04-01

Partial wave analysis is an important tool for determining resonance properties in hadron spectroscopy. For large data samples however, the un-binned likelihood fits employed are computationally very expensive. At the Beijing Spectrometer (BES) III experiment, an increase in statistics compared to earlier experiments of up to two orders of magnitude is expected. In order to allow for a timely analysis of these datasets, additional computing power with short turnover times has to be made available. It turns out that graphics processing units (GPUs) originally developed for 3D computer games have an architecture of massively parallel single instruction multiple data floating point units that is almost ideally suited for the algorithms employed in partial wave analysis. We have implemented a framework for tensor manipulation and partial wave fits called GPUPWA. The user writes a program in pure C++ whilst the GPUPWA classes handle computations on the GPU, memory transfers, caching and other technical details. In conjunction with a recent graphics processor, the framework provides a speed-up of the partial wave fit by more than two orders of magnitude compared to legacy FORTRAN code.
Synergia CUDA: GPU-accelerated accelerator modeling package

International Nuclear Information System (INIS)

Lu, Q; Amundson, J

2014-01-01

Synergia is a parallel, 3-dimensional space-charge particle-in-cell accelerator modeling code. We present our work porting the purely MPI-based version of the code to a hybrid of CPU and GPU computing kernels. The hybrid code uses the CUDA platform in the same framework as the pure MPI solution. We have implemented a lock-free collaborative charge-deposition algorithm for the GPU, as well as other optimizations, including local communication avoidance for GPUs, a customized FFT, and fine-tuned memory access patterns. On a small GPU cluster (up to 4 Tesla C1070 GPUs), our benchmarks exhibit both superior peak performance and better scaling than a CPU cluster with 16 nodes and 128 cores. We also compare the code performance on different GPU architectures, including C1070 Tesla and K20 Kepler.
Accelerating Molecular Dynamic Simulation on Graphics Processing Units

Science.gov (United States)

Friedrichs, Mark S.; Eastman, Peter; Vaidyanathan, Vishal; Houston, Mike; Legrand, Scott; Beberg, Adam L.; Ensign, Daniel L.; Bruns, Christopher M.; Pande, Vijay S.

2009-01-01

We describe a complete implementation of all-atom protein molecular dynamics running entirely on a graphics processing unit (GPU), including all standard force field terms, integration, constraints, and implicit solvent. We discuss the design of our algorithms and important optimizations needed to fully take advantage of a GPU. We evaluate its performance, and show that it can be more than 700 times faster than a conventional implementation running on a single CPU core. PMID:19191337
GPU based numerical simulation of core shooting process

Directory of Open Access Journals (Sweden)

Yi-zhong Zhang

2017-11-01

Full Text Available The core shooting process is the most widely used technique for making sand cores, and it plays an important role in the quality of sand cores. Although numerical simulation can hopefully optimize the core shooting process, research on numerical simulation of the core shooting process is very limited. Based on a two-fluid model (TFM) and a kinetic-friction constitutive correlation, a program for 3D numerical simulation of the core shooting process has been developed and achieved good agreement with in-situ experiments. To match the needs of engineering applications, a graphics processing unit (GPU) has also been used to improve the calculation efficiency. The parallel algorithm based on the Compute Unified Device Architecture (CUDA) platform can significantly decrease computing time by using a multi-threaded GPU. In this work, the program accelerated by the CUDA parallelization method was developed and the accuracy of the calculations was ensured by comparing with in-situ experimental results photographed by a high-speed camera. The design and optimization of the parallel algorithm were discussed. The simulation result of a sand core test-piece indicated the improvement of the calculation efficiency by GPU. The developed program has also been validated by in-situ experiments with a transparent core-box, a high-speed camera, and a pressure measuring system. The computing time of the parallel program was reduced by nearly 95% while the simulation result was still quite consistent with experimental data. The GPU parallelization method can successfully solve the problem of low computational efficiency of the 3D sand shooting simulation program, and thus the developed GPU program is appropriate for engineering applications.
Optimizing Raytracing Algorithm Using CUDA

Directory of Open Access Journals (Sweden)

Sayed Ahmadreza Razian

2017-11-01

The results show that one can generate at least 11 frames per second at HD (720p) resolution on a GPU, using a GT 840M graphics card and the tracing method. If a better graphics card is employed, this algorithm and program can be used to generate real-time animation.
Graphics gems II

CERN Document Server

Arvo, James

1991-01-01

Graphics Gems II is a collection of articles shared by a diverse group of people that reflect ideas and approaches in graphics programming which can benefit other computer graphics programmers. This volume presents techniques for doing well-known graphics operations faster or easier. The book contains chapters devoted to topics on two-dimensional and three-dimensional geometry and algorithms, image processing, frame buffer techniques, and ray tracing techniques. The radiosity approach, matrix techniques, and numerical and programming techniques are likewise discussed. Graphics artists and comput
Parallel Computer System for 3D Visualization Stereo on GPU

Science.gov (United States)

Al-Oraiqat, Anas M.; Zori, Sergii A.

2018-03-01

This paper proposes the organization of a parallel computer system based on a Graphics Processing Unit (GPU) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of tracing rays intersections with scene objects. The system allows a significant increase in productivity for the 3D stereo synthesis of photorealistic quality. The generalized procedure of 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed solutions by GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration in multi-thread implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. Studying the influence of choosing the size and configuration of the computational Compute Unified Device Architecture (CUDA) network on the computational speed shows the importance of their correct selection. The obtained experimental estimations can be significantly improved by new GPUs with a large number of processing cores and multiprocessors, as well as optimized configuration of the computing CUDA network.
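The "size and configuration" of the CUDA network presumably refers to the grid and block dimensions of a kernel launch. As a hedged illustration of why that choice matters, the sketch below computes, in plain Python, how many blocks are needed to cover an image with one thread per pixel and how many launched threads fall outside the image. The function names and the default 16 x 16 block are assumptions, not values from the paper.

```python
def launch_config(width, height, block_x=16, block_y=16):
    """Return (grid, block) sizes that cover a width x height image, one thread per pixel."""
    grid_x = (width + block_x - 1) // block_x   # ceiling division
    grid_y = (height + block_y - 1) // block_y
    return (grid_x, grid_y), (block_x, block_y)

def idle_threads(width, height, block_x=16, block_y=16):
    """Threads launched beyond the image edge; they occupy hardware but do no useful work."""
    (gx, gy), (bx, by) = launch_config(width, height, block_x, block_y)
    return gx * bx * gy * by - width * height

# A 1920 x 1080 frame with two candidate block shapes of 256 threads each.
print(launch_config(1920, 1080), idle_threads(1920, 1080))
print(launch_config(1920, 1080, 32, 8), idle_threads(1920, 1080, 32, 8))
```

Idle threads are only one of several factors; occupancy and memory access patterns also depend on the block shape, which is why such parameters are usually tuned experimentally.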
Accelerating wavelet-based video coding on graphics hardware using CUDA

NARCIS (Netherlands)

Laan, van der W.J.; Roerdink, J.B.T.M.; Jalba, A.C.; Zinterhof, P.; Loncaric, S.; Uhl, A.; Carini, A.

2009-01-01

The Discrete Wavelet Transform (DWT) has a wide range of applications from signal processing to video and image compression. This transform, by means of the lifting scheme, can be performed in a memory and computation efficient way on modern, programmable GPUs, which can be regarded as massively
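The lifting scheme factors a wavelet transform into simple predict and update steps that need no extra working memory, which is what makes it attractive on GPUs. Below is a minimal sketch for one level of the Haar wavelet, chosen only because it is the simplest case; the wavelet actually used in the paper is not stated in this truncated abstract, and the function names are illustrative.

```python
def haar_lifting_forward(signal):
    """One level of Haar lifting on an even-length sequence.

    Returns (approximation, detail) coefficient lists."""
    even = signal[0::2]
    odd = signal[1::2]
    # Predict step: each odd sample is predicted by its even neighbour.
    detail = [o - e for o, e in zip(odd, even)]
    # Update step: adjust even samples so they hold the pairwise averages.
    approx = [e + d / 2 for e, d in zip(even, detail)]
    return approx, detail

def haar_lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order to recover the signal."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    out = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out

approx, detail = haar_lifting_forward([4, 6, 10, 2])
restored = haar_lifting_inverse(approx, detail)
```

Each pair of samples is processed independently of every other pair, so a GPU version could assign one thread per pair.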
Accelerating Wavelet-Based Video Coding on Graphics Hardware using CUDA

NARCIS (Netherlands)

Laan, Wladimir J. van der; Roerdink, Jos B.T.M.; Jalba, Andrei C.; Zinterhof, P; Loncaric, S; Uhl, A; Carini, A

2009-01-01

The Discrete Wavelet Transform (DWT) has a wide range of applications from signal processing to video and image compression. This transform, by means of the lifting scheme, can be performed in a memory and computation efficient way on modern, programmable GPUs, which can be regarded as massively

CUDA Accelerated Multi-domain Volumetric Image Segmentation and Using a Higher Order Level Set Method

DEFF Research Database (Denmark)

Sharma, Ojaswa; Anton, François; Zhang, Qin

2009-01-01

-manding in terms of computation and memory space, we employ a CUDA based fast GPU segmentation and provide accuracy measures compared with an equivalent CPU implementation. Our resulting surfaces are C2-smooth resulting from tri-cubic spline interpolation algorithm. We also provide error bounds...
Distortion correction algorithm for UAV remote sensing image based on CUDA

International Nuclear Information System (INIS)

Wenhao, Zhang; Yingcheng, Li; Delong, Li; Changsheng, Teng; Jin, Liu

2014-01-01

In China, natural disasters are characterized by wide distribution, severe destruction and high impact range, and they cause significant property damage and casualties every year. Following a disaster, timely and accurate acquisition of geospatial information can provide an important basis for disaster assessment, emergency relief, and reconstruction. In recent years, Unmanned Aerial Vehicle (UAV) remote sensing systems have played an important role in major natural disasters, with UAVs becoming an important means of obtaining disaster information. A UAV is equipped with a non-metric digital camera with lens distortion, which results in larger geometric deformation of the acquired images and affects the accuracy of subsequent processing. The slow speed of the traditional CPU-based distortion correction algorithm cannot meet the requirements of disaster emergencies. Therefore, we propose a Compute Unified Device Architecture (CUDA)-based image distortion correction algorithm for UAV remote sensing, which takes advantage of the powerful parallel processing capability of the GPU, greatly improving the efficiency of distortion correction. Our experiments show that, compared with traditional CPU algorithms and excluding image loading and saving times, the maximum acceleration ratio of our proposed algorithm reaches 58 times that of the traditional algorithm. Thus, data processing time can be reduced by one to two hours, thereby considerably improving disaster emergency response capability.
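Distortion correction parallelizes well because every output pixel can be computed independently of the others. The sketch below shows that structure in plain Python under stated assumptions: it uses a one-coefficient radial model and nearest-neighbour sampling, neither of which is confirmed by the abstract, and the parameter names (`k1`, `cx`, `cy`, `focal`) are illustrative.

```python
def undistort(image, k1, cx, cy, focal):
    """Correct radial lens distortion with an assumed one-coefficient model.

    image is a list of rows. For each output pixel the matching location in
    the distorted input is computed on its own, so a GPU port could assign
    one thread per output pixel."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            # Normalized coordinates relative to the principal point.
            x = (u - cx) / focal
            y = (v - cy) / focal
            scale = 1.0 + k1 * (x * x + y * y)
            # Location of this ideal pixel in the distorted input image.
            src_u = int(round(cx + x * scale * focal))
            src_v = int(round(cy + y * scale * focal))
            if 0 <= src_u < width and 0 <= src_v < height:
                out[v][u] = image[src_v][src_u]  # nearest-neighbour sampling
    return out

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
same = undistort(image, 0.0, 1.0, 1.0, 1.0)       # zero distortion: unchanged
corrected = undistort(image, 0.5, 1.0, 1.0, 1.0)  # corners sample outside the input
```

A production version would typically use bilinear interpolation and a fuller distortion model with calibrated coefficients.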
A graphical interface to the TOUGH family of flow simulators

Energy Technology Data Exchange (ETDEWEB)

O'Sullivan, M.J.; Bullivant, D.P. [Univ. of Auckland (New Zealand)

1995-03-01

A graphical interface for the TOUGH family of simulators is presented. The interface allows the user to graphically create or modify a computer model and then to graphically examine the simulation results. The package uses the X Window System, enabling it to be used on many computer platforms.
GPU Boosted CNN Simulator Library for Graphical Flow-Based Programmability

Directory of Open Access Journals (Sweden)

Balázs Gergely Soós

2009-01-01

Full Text Available A graphical environment for CNN algorithm development is presented. The new generation of graphics cards with many general-purpose processing units introduces massively parallel computing into the PC environment. Universal Machine on Flows (UMF)-like notation, highlighting image flows and operations, is a useful tool to describe image processing algorithms. This documentation step can be turned into modeling using our framework backed with MATLAB Simulink and the power of a video card. This latter relatively cheap extension enables a convenient and fast analysis of CNN dynamics and complex algorithms. Comparison with other PC solutions is also presented. For single template execution, our approach yields run times 40x faster than that of the widely used Candy simulator. In the case of simpler algorithms, real-time execution is also possible.
Graphics workflow optimization when editing standard tasks using modern graphics editing programs

OpenAIRE

Khabirova, Maja

2012-01-01

This work focuses on the description and characteristics of common problems which graphic designers face daily when working for advertising agencies. This work describes tasks and organises them according to the type of graphic being processed and the types of output. In addition, this work describes the ways these common tasks can be completed using modern graphics editing software. It also provides a practical definition of a graphic designer and graphic agency. The aim of this work is to m...
Real-Time Simulation of Ship-Structure and Ship-Ship Interaction

DEFF Research Database (Denmark)

Lindberg, Ole; Glimberg, Stefan Lemvig; Bingham, Harry B.

2013-01-01

, because it is simple, easy to implement and computationally efficient. Multiple many-core graphical processing units (GPUs) are used for parallel execution and the model is implemented using a combination of C/C++, CUDA and MPI. Two ship hydrodynamic cases are presented: Kriso Container Carrier at steady...
Ultraviolet Communication for Medical Applications

Science.gov (United States)

2015-06-01

In the previous Phase I effort, Directed Energy Inc.’s (DEI) parent company Imaging Systems Technology (IST) demonstrated feasibility of several key...accurately model high path loss. Custom photon scatter code was rewritten for parallel execution on a graphics processing unit (GPU). The NVidia CUDA
Graphics Processing Units for HEP trigger systems

International Nuclear Information System (INIS)

Ammendola, R.; Bauce, M.; Biagioni, A.; Chiozzi, S.; Cotta Ramusino, A.; Fantechi, R.; Fiorini, M.; Giagu, S.; Gianoli, A.; Lamanna, G.; Lonardo, A.; Messina, A.

2016-01-01

General-purpose computing on GPUs (Graphics Processing Units) is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerator in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughput, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming ripe. We will discuss the use of online parallel computing on GPU for synchronous low level trigger, focusing on CERN NA62 experiment trigger system. The use of GPU in higher level trigger system is also briefly considered.
Graphics Processing Units for HEP trigger systems

Energy Technology Data Exchange (ETDEWEB)

Ammendola, R. [INFN Sezione di Roma “Tor Vergata”, Via della Ricerca Scientifica 1, 00133 Roma (Italy); Bauce, M. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); University of Rome “La Sapienza”, P.lee A.Moro 2, 00185 Roma (Italy); Biagioni, A. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); Chiozzi, S.; Cotta Ramusino, A. [INFN Sezione di Ferrara, Via Saragat 1, 44122 Ferrara (Italy); University of Ferrara, Via Saragat 1, 44122 Ferrara (Italy); Fantechi, R. [INFN Sezione di Pisa, Largo B. Pontecorvo 3, 56127 Pisa (Italy); CERN, Geneve (Switzerland); Fiorini, M. [INFN Sezione di Ferrara, Via Saragat 1, 44122 Ferrara (Italy); University of Ferrara, Via Saragat 1, 44122 Ferrara (Italy); Giagu, S. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); University of Rome “La Sapienza”, P.lee A.Moro 2, 00185 Roma (Italy); Gianoli, A. [INFN Sezione di Ferrara, Via Saragat 1, 44122 Ferrara (Italy); University of Ferrara, Via Saragat 1, 44122 Ferrara (Italy); Lamanna, G., E-mail: gianluca.lamanna@cern.ch [INFN Sezione di Pisa, Largo B. Pontecorvo 3, 56127 Pisa (Italy); INFN Laboratori Nazionali di Frascati, Via Enrico Fermi 40, 00044 Frascati (Roma) (Italy); Lonardo, A. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); Messina, A. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); University of Rome “La Sapienza”, P.lee A.Moro 2, 00185 Roma (Italy); and others

2016-07-11

General-purpose computing on GPUs (Graphics Processing Units) is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerator in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughput, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming ripe. We will discuss the use of online parallel computing on GPU for synchronous low level trigger, focusing on CERN NA62 experiment trigger system. The use of GPU in higher level trigger system is also briefly considered.
Fault tree graphics

International Nuclear Information System (INIS)

Bass, L.; Wynholds, H.W.; Porterfield, W.R.

1975-01-01

Described is an operational system that enables the user, through an intelligent graphics terminal, to construct, modify, analyze, and store fault trees. With this system, complex engineering designs can be analyzed. This paper discusses the system and its capabilities. Included is a brief discussion of fault tree analysis, which represents an aspect of reliability and safety modeling
Enabling Seamless Access to Digital Graphical Contents for Visually Impaired Individuals via Semantic-Aware Processing

Directory of Open Access Journals (Sweden)

Baoxin Li

2007-11-01

Full Text Available Vision is one of the main sources through which people obtain information from the world, but unfortunately, visually-impaired people are partially or completely deprived of this type of information. With the help of computer technologies, people with visual impairment can independently access digital textual information by using text-to-speech and text-to-Braille software. However, in general, there still exists a major barrier for people who are blind to access the graphical information independently in real-time without the help of sighted people. In this paper, we propose a novel multi-level and multi-modal approach aiming at addressing this challenging and practical problem, with the key idea being semantic-aware visual-to-tactile conversion through semantic image categorization and segmentation, and semantic-driven image simplification. An end-to-end prototype system was built based on the approach. We present the details of the approach and the system, report sample experimental results with realistic data, and compare our approach with current typical practice.
Enabling Seamless Access to Digital Graphical Contents for Visually Impaired Individuals via Semantic-Aware Processing

Directory of Open Access Journals (Sweden)

Wang Zheshen

2007-01-01

Full Text Available Vision is one of the main sources through which people obtain information from the world, but unfortunately, visually-impaired people are partially or completely deprived of this type of information. With the help of computer technologies, people with visual impairment can independently access digital textual information by using text-to-speech and text-to-Braille software. However, in general, there still exists a major barrier for people who are blind to access the graphical information independently in real-time without the help of sighted people. In this paper, we propose a novel multi-level and multi-modal approach aiming at addressing this challenging and practical problem, with the key idea being semantic-aware visual-to-tactile conversion through semantic image categorization and segmentation, and semantic-driven image simplification. An end-to-end prototype system was built based on the approach. We present the details of the approach and the system, report sample experimental results with realistic data, and compare our approach with current typical practice.
High-speed optical coherence tomography signal processing on GPU

International Nuclear Information System (INIS)

Li Xiqi; Shi Guohua; Zhang Yudong

2011-01-01

The signal processing speed of spectral domain optical coherence tomography (SD-OCT) has become a bottleneck in many medical applications. Recently, a time-domain interpolation method was proposed. This method not only achieves a better signal-to-noise ratio (SNR) but also a shorter signal processing time for SD-OCT than the widely used zero-padding interpolation method. Furthermore, the re-sampled data is obtained by convolving the acquired data with the coefficients in the time domain. Thus, many interpolations can be performed concurrently, which makes this interpolation method suitable for parallel computing. Ultra-high-speed optical coherence tomography signal processing can be realized by using a graphics processing unit (GPU) with the compute unified device architecture (CUDA). This paper introduces the signal processing steps of SD-OCT on the GPU. An experiment is performed to acquire one frame of SD-OCT data (400 A-lines × 2048 pixels per A-line) and process the data in real time on the GPU. The results show that the processing can be finished in 6.208 milliseconds, which is 37 times faster than on a central processing unit (CPU).
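The resampling step described above reduces to a short dot product per output point, which is why the interpolations can run concurrently. The following plain-Python sketch shows that structure only. The abstract does not give the paper's interpolation coefficients, so two-tap linear interpolation is used as a stand-in, and both function names are hypothetical.

```python
def resample_by_convolution(samples, starts, coeffs):
    """Each output point is a dot product of a few acquired samples with
    precomputed coefficients; no output depends on another output."""
    out = []
    for start, taps in zip(starts, coeffs):
        out.append(sum(c * samples[start + j] for j, c in enumerate(taps)))
    return out

def linear_taps(positions):
    """Build two-tap coefficients for fractional sample positions.

    Stand-in for the paper's kernel. Every position must be smaller than
    len(samples) - 1 so that both taps stay inside the data."""
    starts, coeffs = [], []
    for p in positions:
        i = int(p)
        frac = p - i
        starts.append(i)
        coeffs.append((1.0 - frac, frac))
    return starts, coeffs

samples = [0, 10, 20, 30]
starts, coeffs = linear_taps([0.5, 1.25, 2.0])
resampled = resample_by_convolution(samples, starts, coeffs)
```

Because the start indices and coefficients depend only on the spectrometer calibration, they can be computed once and reused for every A-line.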
Mathematics of shape description a morphological approach to image processing and computer graphics

CERN Document Server

Ghosh, Pijush K

2009-01-01

Image processing problems are often not well defined because real images are contaminated with noise and other uncertain factors. In Mathematics of Shape Description, the authors take a mathematical approach to address these problems using the morphological and set-theoretic approach to image processing and computer graphics by presenting a simple shape model using two basic shape operators called Minkowski addition and decomposition. This book is ideal for professional researchers and engineers in Information Processing, Image Measurement, Shape Description, Shape Representation and Computer Graphics. Post-graduate and advanced undergraduate students in pure and applied mathematics, computer sciences, robotics and engineering will also benefit from this book. Key Features: Explains the fundamental and advanced relationships between algebraic system and shape description through the set-theoretic approach. Promotes interaction of image processing geochronology and mathematics in the field of algebraic geometry. P...
The PC graphics handbook

CERN Document Server

Sanchez, Julio

2003-01-01

Part I - Graphics Fundamentals PC GRAPHICS OVERVIEW History and Evolution Short History of PC Video PS/2 Video Systems SuperVGA Graphics Coprocessors and Accelerators Graphics Applications State-of-the-Art in PC Graphics 3D Application Programming Interfaces POLYGONAL MODELING Vector and Raster Data Coordinate Systems Modeling with Polygons IMAGE TRANSFORMATIONS Matrix-based Representations Matrix Arithmetic 3D Transformations PROGRAMMING MATRIX TRANSFORMATIONS Numeric Data in Matrix Form Array Processing PROJECTIONS AND RENDERING Perspective The Rendering Pipeline LIGHTING AND SHADING Lightin
Impact of memory bottleneck on the performance of graphics processing units

Science.gov (United States)

Son, Dong Oh; Choi, Hong Jun; Kim, Jong Myon; Kim, Cheol Hong

2015-12-01

Recent graphics processing units (GPUs) can process general-purpose applications as well as graphics applications with the help of various user-friendly application programming interfaces (APIs) supported by GPU vendors. Unfortunately, utilizing the hardware resources in the GPU efficiently is a challenging problem, since the GPU architecture is totally different from the traditional CPU architecture. To solve this problem, many studies have focused on techniques for improving system performance using GPUs. In this work, we analyze GPU performance while varying GPU parameters such as the number of cores and clock frequency. According to our simulations, the GPU performance can be improved by 125.8% and 16.2% on average as the number of cores and clock frequency increase, respectively. However, the performance saturates when memory bottleneck problems occur due to the large number of data requests to the memory. The performance of GPUs can be improved as the memory bottleneck is reduced by changing GPU parameters dynamically.
GeoCrystal: graphic-interactive access to geodata archives

Science.gov (United States)

Goebel, Stefan; Haist, Joerg; Jasnoch, Uwe

2002-03-01

Recently, a lot of effort has been spent on establishing information systems and global infrastructures that enable both data suppliers and users to describe (-> eCommerce, metadata) as well as to find appropriate data. Examples of this are metadata information systems, online shops or portals for geodata. The main disadvantages of existing approaches are insufficient methods and mechanisms for leading users to (e.g. spatial) data archives. This affects aspects concerning usability and personalization in general as well as visual feedback techniques in the different steps of the information retrieval process. Several approaches aim at the improvement of graphical user interfaces by using intuitive metaphors, but only some of them offer 3D interfaces in the form of information landscapes or geographic result scenes in the context of information systems for geodata. This paper presents GeoCrystal, whose basic idea is to adopt Venn diagrams to compose complex queries and to visualize search results in a 3D information and navigation space for geodata. These concepts are enhanced with spatial metaphors and 3D information landscapes (library for geodata) wherein users can specify searches for appropriate geodata and can communicate graphic-interactively with search results (book metaphor).
Papaya Tree Detection with UAV Images Using a GPU-Accelerated Scale-Space Filtering Method

Directory of Open Access Journals (Sweden)

Hao Jiang

2017-07-01

Full Text Available The use of unmanned aerial vehicles (UAVs) can allow individual tree detection for forest inventories in a cost-effective way. The scale-space filtering (SSF) algorithm is commonly used and has the capability of detecting trees of different crown sizes. In this study, we made two improvements with regard to the existing method and implementations. First, we incorporated SSF with a Lab color transformation to reduce over-detection problems associated with the original luminance image. Second, we ported four of the most time-consuming processes to the graphics processing unit (GPU) to improve computational efficiency. The proposed method was implemented using PyCUDA, which enabled access to NVIDIA’s compute unified device architecture (CUDA) through high-level scripting of the Python language. Our experiments were conducted using two images captured by the DJI Phantom 3 Professional and a recent NVIDIA GTX1080 GPU. The resulting accuracy was high, with an F-measure larger than 0.94. The speedup achieved by our parallel implementation was 44.77 and 28.54 for the first and second test image, respectively. For each 4000 × 3000 image, the total runtime was less than 1 s, which was sufficient for real-time performance and interactive application.
GPU-Based Cloud Service for Smith-Waterman Algorithm Using Frequency Distance Filtration Scheme

Directory of Open Access Journals (Sweden)

Sheng-Ta Lee

2013-01-01

Full Text Available As the conventional means of analyzing the similarity between a query sequence and database sequences, the Smith-Waterman algorithm is feasible for a database search owing to its high sensitivity. However, this algorithm is still quite time consuming. CUDA programming can improve computations efficiently by using the computational power of massive computing hardware as graphics processing units (GPUs). This work presents a novel Smith-Waterman algorithm with a frequency-based filtration method on GPUs rather than merely accelerating the comparisons yet expending computational resources to handle such unnecessary comparisons. A user friendly interface is also designed for potential cloud server applications with GPUs. Additionally, two data sets, H1N1 protein sequences (query sequence set) and human protein database (database set), are selected, followed by a comparison of CUDA-SW and CUDA-SW with the filtration method, referred to herein as CUDA-SWf. Experimental results indicate that reducing unnecessary sequence alignments can improve the computational time by up to 41%. Importantly, by using CUDA-SWf as a cloud service, this application can be accessed from any computing environment of a device with an Internet connection without time constraints.
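The idea of filtering before aligning can be sketched as follows. This is a CPU-only Python illustration, not the CUDA-SWf implementation: the scoring parameters, the threshold, and the exact form of the filter are assumptions, since the abstract does not specify them. The filter used here is the frequency distance, computed from letter counts alone, which is a lower bound on the edit distance between two sequences.

```python
from collections import Counter

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def frequency_distance(a, b):
    """Lower bound on the edit distance, computed from letter counts only."""
    fa, fb = Counter(a), Counter(b)
    surplus = sum(max(fa[c] - fb[c], 0) for c in fa)
    deficit = sum(max(fb[c] - fa[c], 0) for c in fb)
    return max(surplus, deficit)

def filtered_search(query, database, max_distance):
    """Run the costly alignment only on sequences that pass the cheap filter."""
    hits = {}
    for name, seq in database.items():
        if frequency_distance(query, seq) <= max_distance:
            hits[name] = smith_waterman_score(query, seq)
    return hits

hits = filtered_search("ACGT", {"s1": "ACGT", "s2": "TTTTTTTT"}, max_distance=2)
```

The filter costs time linear in the sequence length, whereas the alignment is quadratic, so every rejected sequence saves most of its alignment cost.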
GPU-based cloud service for Smith-Waterman algorithm using frequency distance filtration scheme.

Science.gov (United States)

Lee, Sheng-Ta; Lin, Chun-Yuan; Hung, Che Lun

2013-01-01

As the conventional means of analyzing the similarity between a query sequence and database sequences, the Smith-Waterman algorithm is feasible for a database search owing to its high sensitivity. However, this algorithm is still quite time consuming. CUDA programming can improve computations efficiently by using the computational power of massive computing hardware as graphics processing units (GPUs). This work presents a novel Smith-Waterman algorithm with a frequency-based filtration method on GPUs rather than merely accelerating the comparisons yet expending computational resources to handle such unnecessary comparisons. A user friendly interface is also designed for potential cloud server applications with GPUs. Additionally, two data sets, H1N1 protein sequences (query sequence set) and human protein database (database set), are selected, followed by a comparison of CUDA-SW and CUDA-SW with the filtration method, referred to herein as CUDA-SWf. Experimental results indicate that reducing unnecessary sequence alignments can improve the computational time by up to 41%. Importantly, by using CUDA-SWf as a cloud service, this application can be accessed from any computing environment of a device with an Internet connection without time constraints.

Graphics gems

CERN Document Server

Heckbert, Paul S

1994-01-01

Graphics Gems IV contains practical techniques for 2D and 3D modeling, animation, rendering, and image processing. The book presents articles on polygons and polyhedra; a mix of formulas, optimized algorithms, and tutorial information on the geometry of 2D, 3D, and n-D space; transformations; and parametric curves and surfaces. The text also includes articles on ray tracing; shading 3D models; and frame buffer techniques. Articles on image processing; algorithms for graphical layout; basic interpolation methods; and subroutine libraries for vector and matrix algebra are also included. Com
Graphics Processing Unit–Enhanced Genetic Algorithms for Solving the Temporal Dynamics of Gene Regulatory Networks

Science.gov (United States)

García-Calvo, Raúl; Guisado, JL; Diaz-del-Rio, Fernando; Córdoba, Antonio; Jiménez-Morales, Francisco

2018-01-01

Understanding the regulation of gene expression is one of the key problems in current biology. A promising method for that purpose is the determination of the temporal dynamics between known initial and ending network states, by using simple acting rules. The huge amount of rule combinations and the nonlinear inherent nature of the problem make genetic algorithms an excellent candidate for finding optimal solutions. As this is a computationally intensive problem that needs long runtimes in conventional architectures for realistic network sizes, it is fundamental to accelerate this task. In this article, we study how to develop efficient parallel implementations of this method for the fine-grained parallel architecture of graphics processing units (GPUs) using the compute unified device architecture (CUDA) platform. An exhaustive and methodical study of various parallel genetic algorithm schemes—master-slave, island, cellular, and hybrid models, and various individual selection methods (roulette, elitist)—is carried out for this problem. Several procedures that optimize the use of the GPU’s resources are presented. We conclude that the implementation that produces better results (both from the performance and the genetic algorithm fitness perspectives) is simulating a few thousands of individuals grouped in a few islands using elitist selection. This model comprises 2 mighty factors for discovering the best solutions: finding good individuals in a short number of generations, and introducing genetic diversity via a relatively frequent and numerous migration. As a result, we have even found the optimal solution for the analyzed gene regulatory network (GRN). In addition, a comparative study of the performance obtained by the different parallel implementations on GPU versus a sequential application on CPU is carried out. In our tests, a multifold speedup was obtained for our optimized parallel implementation of the method on medium class GPU over an equivalent
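The island model with elitist selection and periodic migration described above can be sketched in a few lines. This is a generic, CPU-only Python illustration and not the authors' GPU code: the bit-string genome, one-point crossover, single-bit mutation, ring migration, and all parameter defaults are assumptions chosen for brevity.

```python
import random

def evolve_islands(fitness, genome_len, islands=4, island_size=20,
                   generations=30, migrate_every=5, seed=0):
    """Evolve several independent populations and return the fittest individual."""
    rng = random.Random(seed)
    pops = [[[rng.randint(0, 1) for _ in range(genome_len)]
             for _ in range(island_size)] for _ in range(islands)]
    for gen in range(1, generations + 1):
        for i, pop in enumerate(pops):
            pop.sort(key=fitness, reverse=True)
            elite = pop[: island_size // 2]          # elitist selection: keep the best half
            children = []
            while len(elite) + len(children) < island_size:
                p1, p2 = rng.sample(elite, 2)
                cut = rng.randrange(1, genome_len)   # one-point crossover
                child = p1[:cut] + p2[cut:]
                child[rng.randrange(genome_len)] ^= 1  # single-bit mutation
                children.append(child)
            pops[i] = elite + children
        if gen % migrate_every == 0:
            # Ring migration: each island's best replaces the next island's worst.
            bests = [max(pop, key=fitness) for pop in pops]
            for i, pop in enumerate(pops):
                worst = min(range(island_size), key=lambda k: fitness(pop[k]))
                pop[worst] = list(bests[i - 1])
    return max((ind for pop in pops for ind in pop), key=fitness)

# Toy fitness: number of ones in the bit string.
best = evolve_islands(sum, 16)
```

Islands evolve independently between migrations, so on a GPU each island could map naturally to one thread block, with migration handled through global memory.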
Graphics Processing Unit-Enhanced Genetic Algorithms for Solving the Temporal Dynamics of Gene Regulatory Networks.

Science.gov (United States)

García-Calvo, Raúl; Guisado, J L; Diaz-Del-Rio, Fernando; Córdoba, Antonio; Jiménez-Morales, Francisco

2018-01-01

Understanding the regulation of gene expression is one of the key problems in current biology. A promising method for that purpose is the determination of the temporal dynamics between known initial and ending network states, by using simple acting rules. The huge amount of rule combinations and the nonlinear inherent nature of the problem make genetic algorithms an excellent candidate for finding optimal solutions. As this is a computationally intensive problem that needs long runtimes in conventional architectures for realistic network sizes, it is fundamental to accelerate this task. In this article, we study how to develop efficient parallel implementations of this method for the fine-grained parallel architecture of graphics processing units (GPUs) using the compute unified device architecture (CUDA) platform. An exhaustive and methodical study of various parallel genetic algorithm schemes-master-slave, island, cellular, and hybrid models, and various individual selection methods (roulette, elitist)-is carried out for this problem. Several procedures that optimize the use of the GPU's resources are presented. We conclude that the implementation that produces better results (both from the performance and the genetic algorithm fitness perspectives) is simulating a few thousands of individuals grouped in a few islands using elitist selection. This model comprises 2 mighty factors for discovering the best solutions: finding good individuals in a short number of generations, and introducing genetic diversity via a relatively frequent and numerous migration. As a result, we have even found the optimal solution for the analyzed gene regulatory network (GRN). In addition, a comparative study of the performance obtained by the different parallel implementations on GPU versus a sequential application on CPU is carried out. In our tests, a multifold speedup was obtained for our optimized parallel implementation of the method on medium class GPU over an equivalent
GPU implementation of Bayesian neural network construction for data-intensive applications

International Nuclear Information System (INIS)

Perry, Michelle; Meyer-Baese, Anke; Prosper, Harrison B

2014-01-01

We describe a graphics processing unit (GPU) implementation of the Hybrid Markov Chain Monte Carlo (HMC) method for training Bayesian Neural Networks (BNN). Our implementation uses NVIDIA's parallel computing architecture, CUDA. We briefly review BNNs and the HMC method, describe our implementations, and give preliminary results.
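For readers unfamiliar with HMC, the following Python sketch shows one generic HMC update built on leapfrog integration. It is not the authors' CUDA code. The function names (`leapfrog`, `hmc_step`), the unit mass matrix, and the default step size are assumptions. In a BNN setting, `q` would hold the network weights and `u` the negative log posterior, whose gradient is the expensive part that a GPU would typically accelerate.

```python
import math
import random


def leapfrog(q, p, grad_u, step, n_steps):
    """Leapfrog integration of Hamiltonian dynamics (unit mass is assumed)."""
    q, p = list(q), list(p)
    g = grad_u(q)
    p = [pi - 0.5 * step * gi for pi, gi in zip(p, g)]      # initial half step
    for i in range(n_steps):
        q = [qi + step * pi for qi, pi in zip(q, p)]        # full position step
        g = grad_u(q)
        scale = step if i < n_steps - 1 else 0.5 * step     # final half step
        p = [pi - scale * gi for pi, gi in zip(p, g)]
    return q, p


def hmc_step(q, u, grad_u, rng, step=0.1, n_steps=20):
    """One HMC update: draw momentum, integrate, then accept or reject."""
    p0 = [rng.gauss(0.0, 1.0) for _ in q]
    q_new, p_new = leapfrog(q, p0, grad_u, step, n_steps)
    h_old = u(q) + 0.5 * sum(x * x for x in p0)
    h_new = u(q_new) + 0.5 * sum(x * x for x in p_new)
    if rng.random() < math.exp(min(0.0, h_old - h_new)):    # Metropolis test
        return q_new
    return list(q)


# Toy target (an assumption): standard normal, U(q) = |q|^2 / 2.
def toy_u(q):
    return 0.5 * sum(x * x for x in q)


def toy_grad_u(q):
    return list(q)
```

A call such as `hmc_step([0.0, 0.0], toy_u, toy_grad_u, random.Random(0))` returns the next state of the chain. The accept/reject step corrects for the discretization error of the integrator.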
Personal Supercomputing for Monte Carlo Simulation Using a GPU

Energy Technology Data Exchange (ETDEWEB)

Oh, Jae-Yong; Koo, Yang-Hyun; Lee, Byung-Ho [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

2008-05-15

Since the usability, accessibility, and maintenance of a personal computer (PC) are very good, a PC is a useful computer simulation tool for researchers. With the improved performance of its CPU, a PC has enough calculation power to simulate a small-scale system. However, if a system is large or has a long time scale, we need a cluster computer or supercomputer. Recently, great changes have occurred in the PC calculation environment. A graphics processing unit (GPU) on a graphics card, previously used only to calculate display data, has a calculation capability superior to that of a PC's CPU. This GPU calculation performance matches that of a supercomputer in 2000. Although it has such great calculation potential, it is not easy to program a simulation code for a GPU, because of the difficult programming techniques needed to convert a calculation matrix to a 3D rendering image using graphics APIs. In 2006, NVIDIA provided a Software Development Kit (SDK) and programming environment for NVIDIA's graphics cards, called the Compute Unified Device Architecture (CUDA). It makes programming on the GPU easy without knowledge of the graphics APIs. This paper describes the basic architectures of NVIDIA's GPU and CUDA, and carries out a performance benchmark for the Monte Carlo simulation.
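The abstract does not say which Monte Carlo simulation was benchmarked. As a generic illustration of why Monte Carlo methods suit GPUs, the Python sketch below estimates pi from independent sample batches. Each batch stands in for the work of one GPU thread with its own random stream, and the final sum stands in for a parallel reduction. The function names and the choice of problem are assumptions.

```python
import random


def count_hits(n_samples, seed):
    """Work of one hypothetical thread: count random points inside the unit quarter circle."""
    rng = random.Random(seed)           # independent stream per worker
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_workers=8, samples_per_worker=20000):
    # On a GPU the per-worker counts would be computed concurrently
    # and combined with a reduction; here they run one after another.
    partial = [count_hits(samples_per_worker, seed) for seed in range(n_workers)]
    return 4.0 * sum(partial) / (n_workers * samples_per_worker)
```

No worker needs data from any other worker until the final sum, which is the property that lets this kind of simulation scale across many GPU cores.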
Personal Supercomputing for Monte Carlo Simulation Using a GPU

International Nuclear Information System (INIS)

Oh, Jae-Yong; Koo, Yang-Hyun; Lee, Byung-Ho

2008-01-01

Since the usability, accessibility, and maintenance of a personal computer (PC) are very good, a PC is a useful computer simulation tool for researchers. With the improved performance of its CPU, a PC has enough calculation power to simulate a small-scale system. However, if a system is large or has a long time scale, we need a cluster computer or supercomputer. Recently, great changes have occurred in the PC calculation environment. A graphics processing unit (GPU) on a graphics card, previously used only to calculate display data, has a calculation capability superior to that of a PC's CPU. This GPU calculation performance matches that of a supercomputer in 2000. Although it has such great calculation potential, it is not easy to program a simulation code for a GPU, because of the difficult programming techniques needed to convert a calculation matrix to a 3D rendering image using graphics APIs. In 2006, NVIDIA provided a Software Development Kit (SDK) and programming environment for NVIDIA's graphics cards, called the Compute Unified Device Architecture (CUDA). It makes programming on the GPU easy without knowledge of the graphics APIs. This paper describes the basic architectures of NVIDIA's GPU and CUDA, and carries out a performance benchmark for the Monte Carlo simulation.
Gfargo: Fargo for Gpu

Science.gov (United States)

Masset, Frédéric

2015-09-01

GFARGO is a GPU version of FARGO. It is written in C and C for CUDA and runs only on NVIDIA’s graphics cards. Though it corresponds to the standard, isothermal version of FARGO, not all functionalities of the CPU version have been translated to CUDA. The code is available in single and double precision versions, the latter compatible with Fermi architectures. GFARGO can run on a graphics card connected to the display, allowing the user to see in real time how the fields evolve.
CUDA-accelerated genetic feedforward-ANN training for data mining

International Nuclear Information System (INIS)

Patulea, Catalin; Peace, Robert; Green, James

2010-01-01

We present an implementation of genetic algorithm (GA) training of feedforward artificial neural networks (ANNs) targeting commodity graphics cards (GPUs). By carefully mapping the problem onto the unique GPU architecture, we achieve order-of-magnitude speedup over a conventional CPU implementation. Furthermore, we show that the speedup is consistent across a wide range of data set sizes, making this implementation ideal for large data sets. This performance boost enables the genetic algorithm to search a larger subset of the solution space, which results in more accurate pattern classification. Finally, we demonstrate this method in the context of the 2009 UC San Diego Data Mining Contest, achieving a world-class lift on a data set of 94682 e-commerce transactions.
CUDA-accelerated genetic feedforward-ANN training for data mining

Energy Technology Data Exchange (ETDEWEB)

Patulea, Catalin; Peace, Robert; Green, James, E-mail: cpatulea@sce.carleton.ca, E-mail: rpeace@sce.carleton.ca, E-mail: jrgreen@sce.carleton.ca [School of Systems and Computer Engineering, Carleton University, Ottawa, K1S 5B6 (Canada)

2010-11-01

We present an implementation of genetic algorithm (GA) training of feedforward artificial neural networks (ANNs) targeting commodity graphics cards (GPUs). By carefully mapping the problem onto the unique GPU architecture, we achieve order-of-magnitude speedup over a conventional CPU implementation. Furthermore, we show that the speedup is consistent across a wide range of data set sizes, making this implementation ideal for large data sets. This performance boost enables the genetic algorithm to search a larger subset of the solution space, which results in more accurate pattern classification. Finally, we demonstrate this method in the context of the 2009 UC San Diego Data Mining Contest, achieving a world-class lift on a data set of 94682 e-commerce transactions.
Examination of Speed Contribution of Parallelization for Several Fingerprint Pre-Processing Algorithms

Directory of Open Access Journals (Sweden)

GORGUNOGLU, S.

2014-05-01

Full Text Available In the analysis of minutiae-based fingerprint systems, fingerprints need to be pre-processed. The pre-processing is carried out to enhance the quality of the fingerprint and to obtain more accurate minutiae points. Reducing the pre-processing time is important for identification and verification in real-time systems, especially for databases holding large amounts of fingerprint information. Parallel processing and parallel CPU computing can be considered as the distribution of processes over a multi-core processor. This is done by using parallel programming techniques. Reducing the execution time is the main objective in parallel processing. In this study, pre-processing of a minutiae-based fingerprint system is implemented by parallel processing on multi-core computers using OpenMP and on a graphics processor using CUDA to improve execution time. The execution times and speedup ratios are compared with those of a single-core processor. The results show that by using parallel processing, execution time is substantially improved. The improvement ratios obtained for different pre-processing algorithms allowed us to make suggestions on the more suitable approaches for parallelization.
The Performance Improvement of the Lagrangian Particle Dispersion Model (LPDM) Using Graphics Processing Unit (GPU) Computing

Science.gov (United States)

2017-08-01

used for its GPU computing capability during the experiment. It has Nvidia Tesla K40 GPU accelerators containing 32 GPU nodes consisting of 1024...cores. CUDA is a parallel computing platform and application programming interface (API) model that was created and designed by Nvidia to give direct...Agricultural and Forest Meteorology. 1995:76:277–291, ISSN 0168-1923. 3. GPU vs. CPU? What is GPU computing? Santa Clara (CA): Nvidia Corporation; 2017
A sampler of useful computational tools for applied geometry, computer graphics, and image processing foundations for computer graphics, vision, and image processing

CERN Document Server

Cohen-Or, Daniel; Ju, Tao; Mitra, Niloy J; Shamir, Ariel; Sorkine-Hornung, Olga; Zhang, Hao (Richard)

2015-01-01

A Sampler of Useful Computational Tools for Applied Geometry, Computer Graphics, and Image Processing shows how to use a collection of mathematical techniques to solve important problems in applied mathematics and computer science areas. The book discusses fundamental tools in analytical geometry and linear algebra. It covers a wide range of topics, from matrix decomposition to curvature analysis and principal component analysis to dimensionality reduction. Written by a team of highly respected professors, the book can be used in a one-semester, intermediate-level course in computer science. It
Mechanical properties of bovine cortical bone based on the automated ball indentation technique and graphics processing method.

Science.gov (United States)

Zhang, Airong; Zhang, Song; Bian, Cuirong

2018-02-01

Cortical bone provides the main form of support in humans and other vertebrates against various forces. Thus, capturing its mechanical properties is important. In this study, the mechanical properties of cortical bone were investigated by using automated ball indentation and graphics processing at both the macroscopic and microstructural levels under dry conditions. First, all polished samples were photographed under a metallographic microscope, and the area ratio of the circumferential lamellae and osteons was calculated through the graphics processing method. Second, fully-computer-controlled automated ball indentation (ABI) tests were performed to explore the micro-mechanical properties of the cortical bone at room temperature and a constant indenter speed. The indentation defects were examined with a scanning electron microscope. Finally, the macroscopic mechanical properties of the cortical bone were estimated with the graphics processing method and mixture rule. Combining ABI and graphics processing proved to be an effective tool for obtaining the mechanical properties of the cortical bone, and the indenter size had a significant effect on the measurement. The methods presented in this paper provide an innovative approach to acquiring the macroscopic mechanical properties of cortical bone in a nondestructive manner. Copyright © 2017 Elsevier Ltd. All rights reserved.
Computer graphics visions and challenges: a European perspective.

Science.gov (United States)

Encarnação, José L

2006-01-01

I have briefly described important visions and challenges in computer graphics. They are a personal and therefore subjective selection. But most of these issues have to be addressed and solved--no matter if we call them visions or challenges or something else--if we want to make and further develop computer graphics into a key enabling technology for our IT-based society.
Systems Biology Graphical Notation: Process Description language Level 1 Version 1.3.

Science.gov (United States)

Moodie, Stuart; Le Novère, Nicolas; Demir, Emek; Mi, Huaiyu; Villéger, Alice

2015-09-04

The Systems Biology Graphical Notation (SBGN) is an international community effort for standardized graphical representations of biological pathways and networks. The goal of SBGN is to provide unambiguous pathway and network maps for readers with different scientific backgrounds as well as to support efficient and accurate exchange of biological knowledge between different research communities, industry, and other players in systems biology. Three SBGN languages, Process Description (PD), Entity Relationship (ER) and Activity Flow (AF), allow for the representation of different aspects of biological and biochemical systems at different levels of detail. The SBGN Process Description language represents biological entities and processes between these entities within a network. SBGN PD focuses on the mechanistic description and temporal dependencies of biological interactions and transformations. The nodes (elements) are split into entity nodes describing, e.g., metabolites, proteins, genes and complexes, and process nodes describing, e.g., reactions and associations. The edges (connections) provide descriptions of relationships (or influences) between the nodes, such as consumption, production, stimulation and inhibition. Among all three languages of SBGN, PD is the closest to metabolic and regulatory pathways in biological literature and textbooks, but its well-defined semantics offer a superior precision in expressing biological knowledge.
Policy Process Editor for P3BM Software

Science.gov (United States)

James, Mark; Chang, Hsin-Ping; Chow, Edward T.; Crichton, Gerald A.

2010-01-01

A computer program enables generation, in the form of graphical representations of process flows with embedded natural-language policy statements, input to a suite of policy-, process-, and performance-based management (P3BM) software. This program (1) serves as an interface between users and the Hunter software, which translates the input into machine-readable form; and (2) enables users to initialize and monitor the policy-implementation process. This program provides an intuitive graphical interface for incorporating natural-language policy statements into business-process flow diagrams. Thus, the program enables users who dictate policies to intuitively embed their intended process flows as they state the policies, reducing the likelihood of errors and reducing the time between declaration and execution of policy.
Touch-sensitive graphics terminal applied to process control

International Nuclear Information System (INIS)

Bennion, S.I.; Creager, J.D.; VanHouten, R.D.

1981-01-01

Limited initial demonstrations of the system described took place during September 1980. A single CRT was used as an input device in the control center while operating a furnace and a pellet inspection gage. These two process line devices were completely controlled, despite the longer than desired response times noted, using a single control station located in the control center. The operator could conveniently execute any function from this remote location which could be performed locally at the hard-wired control panels. With the installation of the enhancements, the integrated touchscreen/graphics terminal will provide a preferable alternative to normal keyboard command input devices.
Parallel computing for data science with examples in R, C++ and CUDA

CERN Document Server

Matloff, Norman

2015-01-01

Parallel Computing for Data Science: With Examples in R, C++ and CUDA is one of the first parallel computing books to concentrate exclusively on parallel data structures, algorithms, software tools, and applications in data science. It includes examples not only from the classic "n observations, p variables" matrix format but also from time series, network graph models, and numerous other structures common in data science. The examples illustrate the range of issues encountered in parallel programming. With the main focus on computation, the book shows how to compute on three types of platfor
HPLOT: the graphics interface package for the HBOOK histogramming package

International Nuclear Information System (INIS)

Watkins, H.

1978-01-01

The subroutine package HPLOT described in this report, enables the CERN histogramming package HBOOK to produce high-quality pictures by means of high-resolution devices such as plotters. HPLOT can be implemented on any scientific computing system with a Fortran IV compiler and can be interfaced with any graphics package; spectral routines in addition to the basic ones enable users to embellish their histograms. Examples are also given of the use of HPLOT as a graphics package for plotting simple pictures without histograms. (Auth.)
CUDA accelerated simulation of needle insertions in deformable tissue

International Nuclear Information System (INIS)

Patriciu, Alexandru

2012-01-01

This paper presents a stiff needle-deformable tissue interaction model. The model uses a mesh-less discretization of the continuum, thus avoiding the expensive remeshing required by finite element models. The proposed model can accommodate both linear and nonlinear material characteristics. The needle-deformable tissue interaction is modeled through fundamental boundaries. The forces applied by the needle on the tissue are divided into tangent forces and constraint forces. The constraint forces are adaptively computed such that the material is properly constrained by the needle. The implementation is accelerated using NVIDIA CUDA. We present a detailed analysis of the execution timing in both the serial and parallel cases. The proposed needle insertion model was integrated into custom software that loads DICOM images, generates the deformable model, and can simulate different insertion strategies.

GPU Computing Gems Emerald Edition

CERN Document Server

Hwu, Wen-mei W

2011-01-01

"...the perfect companion to Programming Massively Parallel Processors by Hwu & Kirk." -Nicolas Pinto, Research Scientist at Harvard & MIT, NVIDIA Fellow 2009-2010 Graphics processing units (GPUs) can do much more than render graphics. Scientists and researchers increasingly look to GPUs to improve the efficiency and performance of computationally-intensive experiments across a range of disciplines. GPU Computing Gems: Emerald Edition brings their techniques to you, showcasing GPU-based solutions including: Black hole simulations with CUDA GPU-accelerated computation and interactive display of
Formal Analysis of Graphical Security Models

DEFF Research Database (Denmark)

Aslanyan, Zaruhi

, software components and human actors interacting with each other to form so-called socio-technical systems. The importance of socio-technical systems to modern societies requires verifying their security properties formally, while their inherent complexity makes manual analyses impracticable. Graphical...... models for security offer an unrivalled opportunity to describe socio-technical systems, for they allow to represent different aspects like human behaviour, computation and physical phenomena in an abstract yet uniform manner. Moreover, these models can be assigned a formal semantics, thereby allowing...... formal verification of their properties. Finally, their appealing graphical notations enable to communicate security concerns in an understandable way also to non-experts, often in charge of the decision making. This dissertation argues that automated techniques can be developed on graphical security...
The graphics future in scientific applications-trends and developments in computer graphics

CERN Document Server

Enderle, G

1982-01-01

Computer graphics methods and tools are being used to a great extent in scientific research. The future development in this area will be influenced both by new hardware developments and by software advances. On the hardware sector, the development of the raster technology will lead to the increased use of colour workstations with more local processing power. Colour hardcopy devices for creating plots, slides, or movies will be available at a lower price than today. The first real 3D-workstations will appear on the marketplace. One of the main activities on the software sector is the standardization of computer graphics systems, graphical files, and device interfaces. This will lead to more portable graphical application programs and to a common base for computer graphics education.
Graphics Gems III IBM version

CERN Document Server

Kirk, David

1994-01-01

This sequel to Graphics Gems (Academic Press, 1990), and Graphics Gems II (Academic Press, 1991) is a practical collection of computer graphics programming tools and techniques. Graphics Gems III contains a larger percentage of gems related to modeling and rendering, particularly lighting and shading. This new edition also covers image processing, numerical and programming techniques, modeling and transformations, 2D and 3D geometry and algorithms, ray tracing and radiosity, rendering, and more clever new tools and tricks for graphics programming. Volume III also includes a
Image processing and computer graphics in radiology. Pt. A

International Nuclear Information System (INIS)

Toennies, K.D.

1993-01-01

The reports give a full review of all aspects of digital imaging in radiology which are of significance to image processing and the subsequent picture archiving and communication techniques. The review is strongly practice-oriented and illustrates the various contributions from specialized areas of the computer sciences, such as computer vision, computer graphics, database systems and information and communication systems, man-machine interactions and software engineering. Methods and models available are explained and assessed for their respective performance and value, and basic principles are briefly explained. (DG) [de
Image processing and computer graphics in radiology. Pt. B

International Nuclear Information System (INIS)

Toennies, K.D.

1993-01-01

The reports give a full review of all aspects of digital imaging in radiology which are of significance to image processing and the subsequent picture archiving and communication techniques. The review is strongly practice-oriented and illustrates the various contributions from specialized areas of the computer sciences, such as computer vision, computer graphics, database systems and information and communication systems, man-machine interactions and software engineering. Methods and models available are explained and assessed for their respective performance and value, and basic principles are briefly explained. (DG) [de
High-Performance Pseudo-Random Number Generation on Graphics Processing Units

OpenAIRE

Nandapalan, Nimalan; Brent, Richard P.; Murray, Lawrence M.; Rendell, Alistair

2011-01-01

This work considers the deployment of pseudo-random number generators (PRNGs) on graphics processing units (GPUs), developing an approach based on the xorgens generator to rapidly produce pseudo-random numbers of high statistical quality. The chosen algorithm has configurable state size and period, making it ideal for tuning to the GPU architecture. We present a comparison of both speed and statistical quality with other common parallel, GPU-based PRNGs, demonstrating favourable performance o...
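To illustrate the xorshift family that xorgens belongs to, the Python sketch below implements Marsaglia's basic 32-bit xorshift step. It is deliberately not xorgens itself, which uses a larger configurable state, and it makes no claim about the statistical quality reported in the paper. The `stream` helper, which gives each hypothetical GPU thread its own seed, is an assumption for illustration only.

```python
MASK32 = 0xFFFFFFFF


def xorshift32(state):
    """One step of Marsaglia's 32-bit xorshift generator (shift triple 13, 17, 5)."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state & MASK32


def stream(seed, n):
    """Return n successive values for one hypothetical thread.

    The state must be nonzero, because zero is a fixed point of xorshift.
    """
    s = seed & MASK32
    if s == 0:
        raise ValueError("seed must be nonzero modulo 2**32")
    out = []
    for _ in range(n):
        s = xorshift32(s)
        out.append(s)
    return out
```

The update uses only shifts and XORs on a small state, which is why generators of this family map well onto GPU threads that have few registers to spare.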
COMPUTER GRAPHICS IN ENGINEERING GRAPHICS DEPARTMENT OF MOSCOW AVIATION INSTITUTE EDUCATIONAL PROCESS

OpenAIRE

Ludmila P. Bobrik; Leonid V. Markin

2013-01-01

This paper analyzes the current state of engineering training among technical university students and the place of the “Engineering graphics” course at MAI. It also considers problems of the bachelor's degree and the experience of creating a degree-granting specialty based in the «Engineering graphics» department.
COMPUTER GRAPHICS IN ENGINEERING GRAPHICS DEPARTMENT OF MOSCOW AVIATION INSTITUTE EDUCATIONAL PROCESS

Directory of Open Access Journals (Sweden)

Ludmila P. Bobrik

2013-01-01

Full Text Available This paper analyzes the current state of engineering training among technical university students and the place of the “Engineering graphics” course at MAI. It also considers problems of the bachelor's degree and the experience of creating a degree-granting specialty based in the «Engineering graphics» department.
Energy Level Composite Curves-a new graphical methodology for the integration of energy intensive processes

International Nuclear Information System (INIS)

Anantharaman, Rahul; Abbas, Own Syed; Gundersen, Truls

2006-01-01

Pinch Analysis, Exergy Analysis and Optimization have all been used independently or in combination for the energy integration of process plants. In order to address the issue of energy integration, taking into account composition and pressure effects, the concept of energy level as proposed by [X. Feng, X.X. Zhu, Combining pinch and exergy analysis for process modifications, Appl. Therm. Eng. 17 (1997) 249] has been modified and expanded in this work. We have developed a strategy for energy integration that uses process simulation tools to define the interaction between the various subsystems in the plant and a graphical technique to help the engineer interpret the results of the simulation with physical insights that point towards exploring possible integration schemes to increase energy efficiency. The proposed graphical representation of energy levels of processes is very similar to the Composite Curves of Pinch Analysis; the interpretation of the Energy Level Composite Curves reduces to the Pinch Analysis case when dealing with heat transfer. Other similarities and differences are detailed in this work. Energy integration of a methanol plant is taken as a case study to test the efficacy of this methodology. Potential integration schemes are identified that would have been difficult to visualize without the help of the new graphical representation.
Canvas Pocket Reference Scripted Graphics for HTML5

CERN Document Server

Flanagan, David

2010-01-01

The Canvas element is a revolutionary feature of HTML5 that enables powerful graphics for rich Internet applications, and this pocket reference provides the essentials you need to put this element to work. If you have working knowledge of JavaScript, this book will help you create detailed, interactive, and animated graphics -- from charts to animations to video games -- whether you're a web designer or a programmer interested in graphics. Canvas Pocket Reference provides both a tutorial that covers all of the element's features with plenty of examples and a definitive reference to each of t
A Parallel Supercomputer Implementation of a Biological Inspired Neural Network and its use for Pattern Recognition

International Nuclear Information System (INIS)

De Ladurantaye, Vincent; Lavoie, Jean; Bergeron, Jocelyn; Parenteau, Maxime; Lu Huizhong; Pichevar, Ramin; Rouat, Jean

2012-01-01

A parallel implementation of a large spiking neural network is proposed and evaluated. The neural network implements the binding by synchrony process using the Oscillatory Dynamic Link Matcher (ODLM). Scalability, speed and performance are compared for two implementations: Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA), running on clusters of multicore supercomputers and NVIDIA graphics processing units respectively. A global spiking list that represents the state of the neural network at each instant is described. This list indexes each neuron that fires during the current simulation time so that the influence of their spikes is simultaneously processed on all computing units. Our implementation shows good scalability for very large networks. A complex and large spiking neural network has been implemented in parallel with success, thus paving the road towards real-life applications based on networks of spiking neurons. MPI offers better scalability than CUDA, while the CUDA implementation on a GeForce GTX 285 gives the best cost-to-performance ratio. When running the neural network on the GTX 285, the processing speed is comparable to the MPI implementation on RQCHP's Mammouth parallel with 64 nodes (128 cores).
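The global spiking list can be pictured with a small serial Python sketch. It uses a plain threshold-and-reset neuron rather than the ODLM dynamics of the paper, and the function name, data layout, and threshold are all assumptions. Each entry of `partitions` stands in for the neurons owned by one computing unit, and the merge step stands in for the exchange that an MPI or CUDA implementation would perform.

```python
def simulation_step(potentials, weights, partitions, threshold=1.0):
    """Advance a toy network by one time step using a global spike list.

    potentials: list of membrane potentials, updated in place
    weights:    weights[i][j] is the effect of a spike of neuron i on neuron j
    partitions: list of index lists, one per (hypothetical) computing unit
    """
    # Phase 1: every unit collects the indices of its own firing neurons.
    local_lists = [[i for i in part if potentials[i] >= threshold] for part in partitions]

    # Phase 2: the local lists are merged into one global spike list that
    # every unit can read (an all-gather in a distributed setting).
    global_spikes = sorted(i for lst in local_lists for i in lst)
    fired = set(global_spikes)

    # Phase 3: every unit applies all spikes to the neurons it owns.
    for part in partitions:
        for j in part:
            if j in fired:
                potentials[j] = 0.0                     # reset after firing
            else:
                potentials[j] += sum(weights[i][j] for i in global_spikes)
    return global_spikes
```

Only neuron indices are exchanged between units, so the communication volume grows with the number of spikes per step rather than with the number of synapses.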
Integrating post-Newtonian equations on graphics processing units

Energy Technology Data Exchange (ETDEWEB)

Herrmann, Frank; Tiglio, Manuel [Department of Physics, Center for Fundamental Physics, and Center for Scientific Computation and Mathematical Modeling, University of Maryland, College Park, MD 20742 (United States); Silberholz, John [Center for Scientific Computation and Mathematical Modeling, University of Maryland, College Park, MD 20742 (United States); Bellone, Matias [Facultad de Matematica, Astronomia y Fisica, Universidad Nacional de Cordoba, Cordoba 5000 (Argentina); Guerberoff, Gustavo, E-mail: tiglio@umd.ed [Facultad de Ingenieria, Instituto de Matematica y Estadistica ' Prof. Ing. Rafael Laguardia' , Universidad de la Republica, Montevideo (Uruguay)

2010-02-07

We report on early results of a numerical and statistical study of binary black hole inspirals. The two black holes are evolved using post-Newtonian approximations starting with initially randomly distributed spin vectors. We characterize certain aspects of the distribution shortly before merger. In particular we note the uniform distribution of black hole spin vector dot products shortly before merger and a high correlation between the initial and final black hole spin vector dot products in the equal-mass, maximally spinning case. More than 300 million simulations were performed on graphics processing units, and we demonstrate a speed-up of a factor of 50 over a more conventional CPU implementation. (fast track communication)
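The Python sketch below shows only the statistical starting point of such a study: drawing pairs of randomly oriented unit spin vectors and computing their dot products. It does not implement the post-Newtonian evolution, and the function names and the assumption of isotropic, maximally spinning (unit-length) initial spins are illustrative. For independent isotropic directions, the dot product is uniformly distributed on [-1, 1], so its mean is 0 and its variance is 1/3.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly on the sphere by normalizing a Gaussian triple."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_spin_dot_products(n, seed=0):
    """Dot products of n independent pairs of random unit spin vectors."""
    rng = random.Random(seed)
    return [dot(random_unit_vector(rng), random_unit_vector(rng)) for _ in range(n)]
```

Each pair is independent of every other pair, which is what allows hundreds of millions of such initial configurations to be evolved in parallel, one per GPU thread.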
SraTailor: graphical user interface software for processing and visualizing ChIP-seq data.

Science.gov (United States)

Oki, Shinya; Maehara, Kazumitsu; Ohkawa, Yasuyuki; Meno, Chikara

2014-12-01

Raw data from ChIP-seq (chromatin immunoprecipitation combined with massively parallel DNA sequencing) experiments are deposited in public databases as SRAs (Sequence Read Archives) that are publicly available to all researchers. However, to graphically visualize ChIP-seq data of interest, the corresponding SRAs must be downloaded and converted into BigWig format, a process that involves complicated command-line processing. This task requires users to possess skill with script languages and sequence data processing, a requirement that prevents a wide range of biologists from exploiting SRAs. To address these challenges, we developed SraTailor, a GUI (Graphical User Interface) software package that automatically converts an SRA into a BigWig-formatted file. Simplicity of use is one of the most notable features of SraTailor: entering an accession number of an SRA and clicking the mouse are the only steps required to obtain BigWig-formatted files and to graphically visualize the extents of reads at given loci. SraTailor is also able to make peak calls, generate files of other formats, process users' own data, and accept various command-line-like options. Therefore, this software makes ChIP-seq data fully exploitable by a wide range of biologists. SraTailor is freely available at http://www.devbio.med.kyushu-u.ac.jp/sra_tailor/, and runs on both Mac and Windows machines. © 2014 The Authors Genes to Cells © 2014 by the Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.
A Real-Time Early Cognitive Vision System based on a Hybrid coarse and fine grained Parallel Architecture

DEFF Research Database (Denmark)

Jensen, Lars Baunegaard With

. The current top model GPUs from NVIDIA possess up to 240 homogeneous cores. In the past, GPUs have been hard to program, forcing the programmer to map the algorithm to the graphics processing pipeline and think in terms of vertex and fragment shaders, imposing a limiting factor in the implementation of non......-graphics applications. This, however, has changed with the introduction of the Compute Unified Device Architecture (CUDA) framework from NVIDIA. The EV and ECV stages have different parallel properties. The regular, pixel-based processing of EV fit the GPU architecture very well, and parts of ECV, on the other hand...
Communication: A reduced scaling J-engine based reformulation of SOS-MP2 using graphics processing units

Energy Technology Data Exchange (ETDEWEB)

Maurer, S. A.; Kussmann, J.; Ochsenfeld, C., E-mail: Christian.Ochsenfeld@cup.uni-muenchen.de [Chair of Theoretical Chemistry, Department of Chemistry, University of Munich (LMU), Butenandtstr. 7, D-81377 München (Germany); Center for Integrated Protein Science (CIPSM) at the Department of Chemistry, University of Munich (LMU), Butenandtstr. 5–13, D-81377 München (Germany)

2014-08-07

We present a low-prefactor, cubically scaling scaled-opposite-spin second-order Møller-Plesset perturbation theory (SOS-MP2) method which is highly suitable for massively parallel architectures like graphics processing units (GPUs). The scaling is reduced from O(N⁵) to O(N³) by a reformulation of the MP2 expression in the atomic orbital basis via Laplace transformation and the resolution-of-the-identity (RI) approximation of the integrals in combination with efficient sparse algebra for the 3-center integral transformation. In contrast to previous works that employ GPUs for post-Hartree-Fock calculations, we do not simply employ GPU-based linear algebra libraries to accelerate the conventional algorithm. Instead, our reformulation allows us to replace the rate-determining contraction step with a modified J-engine algorithm that has been proven to be highly efficient on GPUs. Thus, our SOS-MP2 scheme enables us to treat large molecular systems in an accurate and efficient manner on a single GPU server.
Communication: A reduced scaling J-engine based reformulation of SOS-MP2 using graphics processing units.

Science.gov (United States)

Maurer, S A; Kussmann, J; Ochsenfeld, C

2014-08-07

We present a low-prefactor, cubically scaling scaled-opposite-spin second-order Møller-Plesset perturbation theory (SOS-MP2) method which is highly suitable for massively parallel architectures like graphics processing units (GPUs). The scaling is reduced from O(N⁵) to O(N³) by a reformulation of the MP2 expression in the atomic orbital basis via Laplace transformation and the resolution-of-the-identity (RI) approximation of the integrals in combination with efficient sparse algebra for the 3-center integral transformation. In contrast to previous works that employ GPUs for post-Hartree-Fock calculations, we do not simply employ GPU-based linear algebra libraries to accelerate the conventional algorithm. Instead, our reformulation allows us to replace the rate-determining contraction step with a modified J-engine algorithm that has been proven to be highly efficient on GPUs. Thus, our SOS-MP2 scheme enables us to treat large molecular systems in an accurate and efficient manner on a single GPU server.
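The Laplace-transformation step mentioned in the abstract can be illustrated numerically. The sketch below assumes the standard identity 1/x = ∫₀^∞ exp(-x t) dt for x > 0 and approximates it with a plain trapezoidal rule after the substitution t = exp(s). This quadrature, the function names and the orbital energies in the usage note are illustrative assumptions, not the scheme used by the authors.

```python
import math

def laplace_quadrature(h=0.25, s_min=-30.0, s_max=5.0):
    """Return (points, weights) with 1/x ~= sum(w * exp(-x * t)) for x > 0.

    Trapezoidal rule in s after substituting t = exp(s); an illustrative
    choice only, not the quadrature of the original paper.
    """
    n = int(round((s_max - s_min) / h)) + 1
    points = [math.exp(s_min + k * h) for k in range(n)]
    weights = [h * t for t in points]  # dt = exp(s) ds
    return points, weights

def mp2_denominator(eps_i, eps_j, eps_a, eps_b, points, weights):
    """Approximate 1 / (eps_a + eps_b - eps_i - eps_j) as a sum of products."""
    total = 0.0
    for t, w in zip(points, weights):
        # Each factor depends on one orbital energy only, so the four
        # orbital indices decouple at every quadrature point.
        total += (w * math.exp(eps_i * t) * math.exp(eps_j * t)
                    * math.exp(-eps_a * t) * math.exp(-eps_b * t))
    return total
```

With hypothetical occupied energies -0.5 and -0.4 and virtual energies 0.3 and 0.6, `mp2_denominator` reproduces 1/1.8 closely. The point of the factorized form is that the energy denominator no longer couples the indices, which is what makes the atomic-orbital reformulation possible.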
Hierarchical data structures for graphics program languages

International Nuclear Information System (INIS)

Gonauser, M.; Schinner, P.; Weiss, J.

1978-01-01

Graphic data processing with a computer makes exacting demands on the interactive capability of the program language and the management of the graphic data. A description of the structure of a graphics program language which has been shown by initial practical experiments to possess a particularly favorable interactive capability is followed by the evaluation of various data structures (list, tree, ring) with respect to their interactive capability in processing graphics. A practical structure is proposed. (orig.)
THREE-DIMENSIONAL MODELING TOOLS IN THE PROCESS OF FORMATION OF GRAPHIC COMPETENCE OF THE FUTURE BACHELOR OF COMPUTER SCIENCE

Directory of Open Access Journals (Sweden)

Kateryna P. Osadcha

2017-12-01

Full Text Available The article addresses aspects of forming the graphic competence of future bachelors of computer science while teaching the fundamentals of working with three-dimensional modelling tools. It analyses, classifies and systematizes these tools. The aim of the research is to investigate the set of instruments and the classification of three-dimensional modelling tools, and to correlate the skills being formed with those in demand on the labour market, so that the tools can be used in forming graphic competence during the training of future bachelors of computer science. The article outlines the peculiarities of this process by identifying, analysing and systematizing three-dimensional modelling tools and types of three-dimensional graphics at the present stage of development of information technologies. The result of the research is a selection of three-dimensional modelling software for training future bachelors of computer science.
Storyboard dalam Pembuatan Motion Graphic

Directory of Open Access Journals (Sweden)

Satrya Mahardhika

2013-10-01

Full Text Available Motion graphics is a category of animation that builds each component from many design elements. Motion graphics requires a long process that includes preproduction, production, and postproduction. Preproduction plays an important role because it provides guidance and instructions for the production, or animation, stage. Preproduction includes research, making the story, script, screenplay, character and environment design, and storyboards. The storyboard determines camera angles, blocking, sets, and the many supporting roles involved in a scene. It is also useful as a production reference for recording or taping each scene, either in sequence or in an efficient order of priority. The example used is the creation of an advertisement with a motion graphic animation storyboard, which serves as a blueprint for every scene and gives instructions for transition movement, layout, blocking, and camera movement, all of which must be carried out step by step in animation production. Planning before making the animation or motion graphic makes the job more organized, presentable, and efficient.

Significance of Internet in Development of Graphic Communications

OpenAIRE

Tajana Koren

2000-01-01

It is impossible to think of development of graphic communications - even based on traditional principles - without knowledge and application of new technologies, which are enabling a new conception of graphic design. First of all, here is the significant role of Internet, as a means of communications, interactive source of information and a way of expression. New possibilities are urging a new creativity. Social aspects of new technologies should not be neglected. Only permanent education wi...
Storyboard dalam Pembuatan Motion Graphic

OpenAIRE

Satrya Mahardhika; A.F. Choiril Anam Fathoni

2013-01-01

Motion graphics is one category in the animation that makes animation with lots of design elements in each component. Motion graphics needs long process including preproduction, production, and postproduction. Preproduction has an important role so that the next stage may provide guidance or instructions for the production process or the animation process. Preproduction includes research, making the story, script, screenplay, character, environment design and storyboards. The storyboard will ...
Building probabilistic graphical models with Python

CERN Document Server

Karkera, Kiran R

2014-01-01

This is a short, practical guide that allows data scientists to understand the concepts of Graphical models and enables them to try them out using small Python code snippets, without being too mathematically complicated. If you are a data scientist who knows about machine learning and want to enhance your knowledge of graphical models, such as Bayes network, in order to use them to solve real-world problems using Python libraries, this book is for you. This book is intended for those who have some Python and machine learning experience, or are exploring the machine learning field.
Efficient pseudo-random number generation for monte-carlo simulations using graphic processors

Science.gov (United States)

Mohanty, Siddhant; Mohanty, A. K.; Carminati, F.

2012-06-01

A hybrid approach based on the combination of three Tausworthe generators and one linear congruential generator for pseudo-random number generation for GPU programming, as suggested in the NVIDIA CUDA library, has been used for Monte Carlo sampling. On each GPU thread, a random seed is generated on the fly in a simple way using the quick and dirty algorithm, where the mod operation is not performed explicitly because of unsigned integer overflow. Using this hybrid generator, multivariate correlated sampling based on the alias technique has been carried out using both the CUDA and OpenCL languages.
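The combined generator described above can be sketched on the CPU as follows. The shift and mask constants are those commonly quoted for the hybrid Tausworthe generator in NVIDIA's GPU Gems 3, and the linear congruential constants are the "quick and dirty" pair from Numerical Recipes. Whether the authors used exactly these values is an assumption, as are the class name and the seeding scheme. Python integers do not overflow, so the 32-bit wrap-around that a GPU gets for free is emulated with an explicit mask.

```python
MASK32 = 0xFFFFFFFF  # emulates 32-bit unsigned overflow

def lcg_step(z, a=1664525, c=1013904223):
    """Linear congruential step; the modulus 2**32 is implicit in the mask."""
    return (a * z + c) & MASK32

def taus_step(z, s1, s2, s3, m):
    """One step of a single Tausworthe component."""
    b = (((z << s1) & MASK32) ^ z) >> s2
    return (((z & m) << s3) & MASK32) ^ b

class HybridTaus:
    """Three Tausworthe components XOR-ed with one LCG (hypothetical wrapper)."""

    def __init__(self, seed):
        # Derive four state words from one seed with the LCG, standing in
        # for the per-thread seeding described in the abstract (assumption).
        z = seed & MASK32
        state = []
        for _ in range(4):
            z = lcg_step(z)
            state.append(z)
        # Tausworthe states must not be too small; forcing bit 8 is an assumption.
        self.z1, self.z2, self.z3 = (s | 0x100 for s in state[:3])
        self.z4 = state[3]

    def next_u32(self):
        self.z1 = taus_step(self.z1, 13, 19, 12, 0xFFFFFFFE)
        self.z2 = taus_step(self.z2, 2, 25, 4, 0xFFFFFFF8)
        self.z3 = taus_step(self.z3, 3, 11, 17, 0xFFFFFFF0)
        self.z4 = lcg_step(self.z4)
        return self.z1 ^ self.z2 ^ self.z3 ^ self.z4

    def next_float(self):
        """Uniform float in [0, 1)."""
        return self.next_u32() * 2.3283064365387e-10
```

On a GPU each thread would hold its own four state words, so no state is shared between threads.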
Efficient pseudo-random number generation for Monte-Carlo simulations using graphic processors

International Nuclear Information System (INIS)

Mohanty, Siddhant; Mohanty, A K; Carminati, F

2012-01-01

A hybrid approach based on the combination of three Tausworthe generators and one linear congruential generator for pseudo-random number generation for GPU programming, as suggested in the NVIDIA CUDA library, has been used for Monte Carlo sampling. On each GPU thread, a random seed is generated on the fly in a simple way using the quick and dirty algorithm, where the mod operation is not performed explicitly because of unsigned integer overflow. Using this hybrid generator, multivariate correlated sampling based on the alias technique has been carried out using both the CUDA and OpenCL languages.
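The alias technique mentioned above draws from a discrete distribution in constant time per sample. A minimal univariate sketch follows, assuming the common Vose-style table construction. The abstract does not say which variant was used or how it was extended to multivariate correlated sampling, so the function names and construction are illustrative only.

```python
def build_alias_table(probs):
    """Build (prob, alias) tables for a discrete distribution summing to 1."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]   # column s keeps its own mass ...
        alias[s] = l          # ... and is topped up from column l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:   # leftovers are full columns
        prob[i] = 1.0
        alias[i] = i
    return prob, alias

def alias_sample(prob, alias, u1, u2):
    """Map two uniforms in [0, 1) to an outcome index."""
    i = int(u1 * len(prob))
    return i if u2 < prob[i] else alias[i]
```

Each sample costs one table lookup and one comparison regardless of the number of outcomes, which suits GPU threads that each consume their own stream of uniforms.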
Assembly of finite element methods on graphics processors

KAUST Repository

Cecka, Cris; Lew, Adrian J.; Darve, E.

2010-01-01

in assembling and solving sparse linear systems with NVIDIA GPUs and the Compute Unified Device Architecture (CUDA) are created and analyzed. Multiple strategies for efficient use of global, shared, and local memory, methods to achieve memory coalescing
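Finite element assembly amounts to a scatter-add of small element matrices into one sparse global matrix. The sketch below shows that step for the simplest possible case, linear elements on a 1D mesh, and stores the result as a coordinate-keyed dictionary. The 1D problem, the function name and the storage format are assumptions chosen for brevity and are not taken from the paper.

```python
from collections import defaultdict

def assemble_1d_stiffness(node_coords):
    """Assemble the stiffness matrix of -u'' on a 1D mesh of linear elements.

    Returns a dict mapping (row, col) to the accumulated entry.
    """
    K = defaultdict(float)
    for e in range(len(node_coords) - 1):
        h = node_coords[e + 1] - node_coords[e]
        ke = [[1.0 / h, -1.0 / h],
              [-1.0 / h, 1.0 / h]]      # element stiffness matrix
        dofs = (e, e + 1)               # global indices of the element's nodes
        for a in range(2):
            for b in range(2):
                # Scatter-add: neighbouring elements write to the same entry,
                # which is the step a parallel version has to coordinate.
                K[(dofs[a], dofs[b])] += ke[a][b]
    return dict(K)
```

On a GPU the accumulation into shared entries is where the choice between global, shared and local memory matters, because concurrent writes to the same entry must not be lost.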
GPFrontend and GPGraphics: graphical analysis tools for genetic association studies.

Science.gov (United States)

Uebe, Steffen; Pasutto, Francesca; Krumbiegel, Mandy; Schanze, Denny; Ekici, Arif B; Reis, André

2010-09-21

Most software packages for whole genome association studies are non-graphical, purely text based programs originally designed to run with UNIX-like operating systems. Graphical output is often not intended or supposed to be performed with other command line tools, e.g. gnuplot. Using the Microsoft .NET 2.0 platform and Visual Studio 2005, we have created a graphical software package to analyze data from microarray whole genome association studies, both for a DNA-pooling based approach as well as regular single sample data. Part of this package was made to integrate with GenePool 0.8.2, a previously existing software suite for GNU/Linux systems, which we have modified to run in a Microsoft Windows environment. Further modifications cause it to generate some additional data. This enables GenePool to interact with the .NET parts created by us. The programs we developed are GPFrontend, a graphical user interface and frontend to use GenePool and create metadata files for it, and GPGraphics, a program to further analyze and graphically evaluate output of different WGA analysis programs, among them also GenePool. Our programs enable regular MS Windows users without much experience in bioinformatics to easily visualize whole genome data from a variety of sources.
GPFrontend and GPGraphics: graphical analysis tools for genetic association studies

Directory of Open Access Journals (Sweden)

Schanze Denny

2010-09-01

Full Text Available Abstract Background Most software packages for whole genome association studies are non-graphical, purely text based programs originally designed to run with UNIX-like operating systems. Graphical output is often not intended or supposed to be performed with other command line tools, e.g. gnuplot. Results Using the Microsoft .NET 2.0 platform and Visual Studio 2005, we have created a graphical software package to analyze data from microarray whole genome association studies, both for a DNA-pooling based approach as well as regular single sample data. Part of this package was made to integrate with GenePool 0.8.2, a previously existing software suite for GNU/Linux systems, which we have modified to run in a Microsoft Windows environment. Further modifications cause it to generate some additional data. This enables GenePool to interact with the .NET parts created by us. The programs we developed are GPFrontend, a graphical user interface and frontend to use GenePool and create metadata files for it, and GPGraphics, a program to further analyze and graphically evaluate output of different WGA analysis programs, among them also GenePool. Conclusions Our programs enable regular MS Windows users without much experience in bioinformatics to easily visualize whole genome data from a variety of sources.
Comparison between research data processing capabilities of AMD and NVIDIA architecture-based graphic processors

International Nuclear Information System (INIS)

Dudnik, V.A.; Kudryavtsev, V.I.; Us, S.A.; Shestakov, M.V.

2015-01-01

A comparative analysis has been made to describe the potentialities of hardware and software tools of two most widely used modern architectures of graphic processors (AMD and NVIDIA). Special features and differences of GPU architectures are exemplified by fragments of GPGPU programs. Time consumption for the program development has been estimated. Some pieces of advice are given as to the optimum choice of the GPU type for speeding up the processing of scientific research results. Recommendations are formulated for the use of software tools that reduce the time of GPGPU application programming for the given types of graphic processors
The role of graphic design in marketing communications

OpenAIRE

Praznik, Daša

2014-01-01

A well-designed visual image is the first step of advertising that has an impact on our business partners and consumers. In line with the Slovenian proverb that "clothes make the man", we could say the same for graphic design and marketing. Well-executed graphic design and visual communications enable our product to communicate with a customer. This can also be communication with an artistic approach and a special connection. In order to obtain good results, advertisers use techniques to get...
Mathematic-Graphical Formalization of Switch Point Control Circuit Function

Directory of Open Access Journals (Sweden)

Juraj Zdansky

2004-01-01

Full Text Available This article describes a method designed by the authors that enables mathematical-graphical formalization of a system's functional specification. The result of this method is an algebraic system, a finite automaton, that is written as a transition table. This transition table can be rewritten in graphic form (a state diagram) or in mathematical form (a transition function and an output function). The method is illustrated with the example of a switch point control circuit.
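A transition table of this kind maps directly onto a lookup structure. The sketch below encodes a Mealy-style automaton in which each (state, input) pair yields a next state and an output. The states, inputs and outputs are hypothetical placeholders for a switch point drive and are not taken from the article.

```python
# Transition table: (state, input) -> (next state, output).
# All names are illustrative assumptions, not the article's specification.
TRANSITIONS = {
    ("LEFT", "cmd_right"): ("MOVING_RIGHT", "motor_right"),
    ("MOVING_RIGHT", "end_reached"): ("RIGHT", "motor_off"),
    ("RIGHT", "cmd_left"): ("MOVING_LEFT", "motor_left"),
    ("MOVING_LEFT", "end_reached"): ("LEFT", "motor_off"),
}

def step(state, event):
    """Transition and output functions combined; undefined pairs keep the state."""
    return TRANSITIONS.get((state, event), (state, "no_change"))

def run(state, events):
    """Feed a sequence of inputs and collect the outputs."""
    outputs = []
    for event in events:
        state, out = step(state, event)
        outputs.append(out)
    return state, outputs
```

The same dictionary can be read as a state diagram: the keys give the labelled edges leaving each state and the values give their targets and outputs.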
Accessible high performance computing solutions for near real-time image processing for time critical applications

Science.gov (United States)

Bielski, Conrad; Lemoine, Guido; Syryczynski, Jacek

2009-09-01

High Performance Computing (HPC) hardware solutions such as grid computing and general-purpose computing on graphics processing units (GPGPU) are now accessible to users with general computing needs. Grid computing infrastructures in the form of computing clusters or blades are becoming commonplace, and GPGPU solutions that leverage the processing power of the video card are quickly being integrated into personal workstations. Our interest in these HPC technologies stems from the need to produce near real-time maps from a combination of pre- and post-event satellite imagery in support of post-disaster management. Faster processing provides a twofold gain in this situation: 1. critical information can be provided faster and 2. more elaborate automated processing can be performed prior to providing the critical information. In our particular case, we test the use of the PANTEX index, which is based on analysis of image textural measures extracted using anisotropic, rotation-invariant GLCM statistics. The use of this index, applied in a moving window, has been shown to successfully identify built-up areas in remotely sensed imagery. Built-up index image masks are important input to the structuring of damage assessment interpretation because they help optimise the workload. The performance of computing the PANTEX workflow is compared on two different HPC hardware architectures: (1) a blade server with 4 blades, each having dual quad-core CPUs and (2) a CUDA-enabled GPU workstation. The reference platform is a dual-CPU quad-core workstation, and the total computing time of the PANTEX workflow is measured. Furthermore, as part of a qualitative evaluation, the differences in setting up and configuring various hardware solutions and the related software coding effort are presented.
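The textural measure behind such an index can be sketched for a single window. The code below assumes that the index takes the minimum of the GLCM contrast over several displacement directions, which is how the rotation-invariant statistic is commonly described. The displacement set, the function names and the absence of grey-level quantization are simplifying assumptions rather than details from the paper.

```python
def glcm_contrast(img, dx, dy):
    """GLCM contrast for one displacement: mean squared grey-level difference
    over all pixel pairs separated by (dx, dy) inside the window."""
    rows, cols = len(img), len(img[0])
    total, count = 0, 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                total += (img[r][c] - img[r2][c2]) ** 2
                count += 1
    return total / count if count else 0.0

def min_contrast(img, displacements=((1, 0), (0, 1), (1, 1), (1, -1))):
    """Minimum contrast over the displacement set (assumed index definition)."""
    return min(glcm_contrast(img, dx, dy) for dx, dy in displacements)
```

Every window is processed independently of every other window, which is why the moving-window workflow maps well onto both a CPU cluster and a GPU.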
Gamma camera image processing and graphical analysis mutual software system

International Nuclear Information System (INIS)

Wang Zhiqian; Chen Yongming; Ding Ailian; Ling Zhiye; Jin Yongjie

1992-01-01

The GCCS gamma camera image processing and graphical analysis system is a special mutual software system. It is mainly used to analyse various patient data acquired from a gamma camera. The system runs on the IBM PC, PC/XT or PC/AT. It consists of several parts: system management, data management, device management, program package and user programs. The system provides two kinds of user interface: command menu and command characters. The system is easy to change and extend because it is well modularized. The user programs include almost all the clinical protocols currently in use.
Graphical symbol recognition

OpenAIRE

K.C., Santosh; Wendling, Laurent

2015-01-01

The chapter focuses on one of the key issues in document image processing i.e., graphical symbol recognition. Graphical symbol recognition is a sub-field of a larger research domain: pattern recognition. The chapter covers several approaches (i.e., statistical, structural and syntactic) and specially designed symbol recognition techniques inspired by real-world industrial problems. It, in general, contains research problems, state-of-the-art methods that convey basic s...
Computer graphics from basic to application

International Nuclear Information System (INIS)

Kim, Do Hyeong; Mun, Sung Min

1998-04-01

This book covers the concepts of computer graphics; its background and history; its necessity and fields of application such as construction design, image processing, automobile design, fashion design and TV broadcasting; the basic principles of computers; computer graphics hardware; computer graphics software such as the Adobe Illustrator toolbox, Adobe Photoshop and QuarkXPress, including their introduction, application and operating environment; 3D graphics, with a summary of the differences between versions of 3D Studio and its system; and AutoCAD applications.
Computer graphics from basic to application

Energy Technology Data Exchange (ETDEWEB)

Kim, Do Hyeong; Mun, Sung Min

1998-04-15

This book covers the concepts of computer graphics; its background and history; its necessity and fields of application such as construction design, image processing, automobile design, fashion design and TV broadcasting; the basic principles of computers; computer graphics hardware; computer graphics software such as the Adobe Illustrator toolbox, Adobe Photoshop and QuarkXPress, including their introduction, application and operating environment; 3D graphics, with a summary of the differences between versions of 3D Studio and its system; and AutoCAD applications.
Representation stigma: Perceptions of tools and processes for design graphics

Directory of Open Access Journals (Sweden)

David Barbarash

2016-12-01

Full Text Available Practicing designers and design students across multiple fields were surveyed to measure preference and perception of traditional hand and digital tools to determine if common biases for an individual toolset are realized in practice. Significant results were found, primarily with age being a determinant in preference of graphic tools and processes; this finding demonstrates a hard line between generations of designers. Results show that while there are strong opinions in tools and processes, the realities of modern business practice and production gravitate towards digital methods despite a traditional tool preference in more experienced designers. While negative stigmas regarding computers remain, younger generations are more accepting of digital tools and images, which should eventually lead to a paradigm shift in design professions.
Development of the graphic design and control system based on a graphic simulator for the spent fuel dismantling equipment

Energy Technology Data Exchange (ETDEWEB)

Lee, J. Y.; Kim, S. H.; Song, T. G.; Yoon, J. S

2000-06-01

In this study, the graphic design system is developed for designing the spent fuel rod consolidation and the dismantling processes. This system is used throughout the design stages from the conceptual design to the motion analysis. Also, the real-time control system of the rod extracting equipment is developed. This system utilizes the graphic simulator which simulates the motion of the equipment in real time by synchronously connecting the control PC with the graphic server through the TCP/IP network. The developed system is expected to be used as an effective tool in designing the process equipment for the spent fuel management. And the real-time graphic control system can be effectively used to enhance the reliability and safety of the spent fuel handling process by providing the remote monitoring function of the process.
Development of the graphic design and control system based on a graphic simulator for the spent fuel dismantling equipment

International Nuclear Information System (INIS)

Lee, J. Y.; Kim, S. H.; Song, T. G.; Yoon, J. S.

2000-06-01

In this study, the graphic design system is developed for designing the spent fuel rod consolidation and the dismantling processes. This system is used throughout the design stages from the conceptual design to the motion analysis. Also, the real-time control system of the rod extracting equipment is developed. This system utilizes the graphic simulator which simulates the motion of the equipment in real time by synchronously connecting the control PC with the graphic server through the TCP/IP network. The developed system is expected to be used as an effective tool in designing the process equipment for the spent fuel management. And the real-time graphic control system can be effectively used to enhance the reliability and safety of the spent fuel handling process by providing the remote monitoring function of the process
A graphical method for estimating the tunneling factor for mode conversion processes

International Nuclear Information System (INIS)

Swanson, D.G.

1994-01-01

The fundamental parameter characterizing the strength of any mode conversion process is the tunneling parameter, which is typically determined from a model dispersion relation which is transformed into a differential equation. Here a graphical method is described which gives the tunneling parameter from quantities directly measured from a simple graph of the dispersion relation. The accuracy of the estimate depends only on the accuracy of the measurements

[Influence of the recording interval and a graphic organizer on the writing process/product and on other psychological variables].

Science.gov (United States)

García Sánchez, Jesús N; Rodríguez Pérez, Celestino

2007-05-01

An experimental study of the influence of the recording interval and a graphic organizer on the processes of writing composition and on the final product is presented. We studied 326 participants, age 10 to 16 years old, by means of a nested design. Two groups were compared: one group was aided in the writing process with a graphic organizer and the other was not. Each group was subdivided into two further groups: one with a mean recording interval of 45 seconds and the other with approximately 90 seconds recording interval in a writing log. The results showed that the group aided by a graphic organizer obtained better results both in processes and writing product, and that the groups assessed with an average interval of 45 seconds obtained worse results. Implications for educational practice are discussed, and limitations and future perspectives are commented on.
A graphical user interface for diagnostic radiology dosimetry using Monte Carlo (MCNP) simulation

International Nuclear Information System (INIS)

Collins, P.J.; Gorbatkov, D.; Schultz, F.W.

2000-01-01

Monte Carlo methods (for example, MCNP, EGS4) are the 'gold standard' for both external and internal dosimetry in humans. These powerful simulation tools are, however, general-purpose codes and consequently do not provide a simple user interface for specific dosimetry tasks. We have developed a graphical user interface, for external radiation dosimetry (diagnostic radiology) using MCNP and an anthropomorphic mathematical phantom (Adam/Eva), which enables convenient modification and processing of the MCNP input and output files. The input form displays a colour coded, 3D representation of the phantom with a superimposed 'beam' for the required x-ray projection. The phantom can be rotated through 360 degrees and a transverse section at the level of the mid-point of the beam is also displayed. Text fields enable entry of input data (beam dimensions, source position, kVp, total filtration, focus-to-skin distance). A pull-down menu enables the user to select from 22 standard radiographic views. A standard projection can be modified, or new projection data entered if required. The input program modifies the MCNP input file and initiates processing. An output form displays the organ doses, normalised to unit skin entrance dose (with backscatter) (SED). The user can also enter the SED (calculated or measured) for a particular machine, to obtain the effective dose. To validate the program, the results for a PA Chest study (80 kVp, 2.5 mm Al total filtration) were compared with NRPB data (Jones and Wall, 1985). In conclusion, a convenient and reliable graphical user interface has been developed for MCNP, which enables dosimetry calculation for a full range of diagnostic radiological studies. (author)
Critique and Process: Signature Pedagogies in the Graphic Design Classroom

Science.gov (United States)

Motley, Phillip

2017-01-01

Like many disciplines in design and the visual fine arts, critique is a signature pedagogy in the graphic design classroom. It serves as both a formative and summative assessment while also giving students the opportunity to practice the habits of graphic design. Critiques help students become keen observers of relevant disciplinary criteria;…
Accelerated Hierarchical Collision Detection for Simulation using CUDA

DEFF Research Database (Denmark)

Jørgensen, Jimmy Alison; Fugl, Andreas Rune; Petersen, Henrik Gordon

2011-01-01

. The hierarchical nature of the bounding volume structure complicates an efficient implementation on massively parallel architectures such as modern graphics cards and we therefore propose a hybrid method where only box and triangle overlap tests and transformations are offloaded to the graphics card. When...
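The kind of elementary test that such a hybrid scheme would hand to the graphics card can be sketched as a batch of independent box overlap checks. The code below assumes axis-aligned boxes for simplicity. The paper's bounding volumes may be oriented, and the function names and batching interface are illustrative assumptions.

```python
def aabb_overlap(a, b):
    """True if two axis-aligned boxes overlap.

    Each box is ((min_x, min_y, min_z), (max_x, max_y, max_z)).
    """
    return all(a[0][k] <= b[1][k] and b[0][k] <= a[1][k] for k in range(3))

def batch_overlap(pairs):
    """Evaluate many independent box pairs; each test needs no data from the
    others, so a GPU could assign one pair to each thread."""
    return [aabb_overlap(a, b) for a, b in pairs]
```

The hierarchy traversal that decides which pairs to test stays on the host in this picture, and only the flat list of pairs is processed in parallel.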
A DDC Bibliography on Optical or Graphic Information Processing (Information Sciences Series). Volume I.

Science.gov (United States)

Defense Documentation Center, Alexandria, VA.

This unclassified-unlimited bibliography contains 183 references, with abstracts, dealing specifically with optical or graphic information processing. Citations are grouped under three headings: display devices and theory, character recognition, and pattern recognition. Within each group, they are arranged in accession number (AD-number) sequence.…
GRAPHICAL MODELS OF THE AIRCRAFT MAINTENANCE PROCESS

Directory of Open Access Journals (Sweden)

Stanislav Vladimirovich Daletskiy

2017-01-01

Full Text Available The aircraft maintenance is realized by a rapid sequence of maintenance organizational and technical states, its research and analysis are carried out by statistical methods. The maintenance process concludes aircraft technical states connected with the objective patterns of technical qualities changes of the aircraft as a maintenance object and organizational states which determine the subjective organization and planning process of aircraft using. The objective maintenance process is realized in Maintenance and Repair System which does not include maintenance organization and planning and is a set of related elements: aircraft, Maintenance and Repair measures, executors and documentation that sets rules of their interaction for maintaining of the aircraft reliability and readiness for flight. The aircraft organizational and technical states are considered, their characteristics and heuristic estimates of connection in knots and arcs of graphs and of aircraft organizational states during regular maintenance and at technical state failure are given. It is shown that in real conditions of aircraft maintenance, planned aircraft technical state control and maintenance control through it, is only defined by Maintenance and Repair conditions at a given Maintenance and Repair type and form structures, and correspondingly by setting principles of Maintenance and Repair work types to the execution, due to maintenance, by aircraft and all its units maintenance and reconstruction strategies. The realization of planned Maintenance and Repair process determines the one of the constant maintenance component. The proposed graphical models allow to reveal quantitative correlations between graph knots to improve maintenance processes by statistical research methods, what reduces manning, timetable and expenses for providing safe civil aviation aircraft maintenance.
Microarray Я US: a user-friendly graphical interface to Bioconductor tools that enables accurate microarray data analysis and expedites comprehensive functional analysis of microarray results.

Science.gov (United States)

Dai, Yilin; Guo, Ling; Li, Meng; Chen, Yi-Bu

2012-06-08

Microarray data analysis presents a significant challenge to researchers who are unable to use the powerful Bioconductor and its numerous tools due to their lack of knowledge of R language. Among the few existing software programs that offer a graphic user interface to Bioconductor packages, none have implemented a comprehensive strategy to address the accuracy and reliability issue of microarray data analysis due to the well known probe design problems associated with many widely used microarray chips. There is also a lack of tools that would expedite the functional analysis of microarray results. We present Microarray Я US, an R-based graphical user interface that implements over a dozen popular Bioconductor packages to offer researchers a streamlined workflow for routine differential microarray expression data analysis without the need to learn R language. In order to enable a more accurate analysis and interpretation of microarray data, we incorporated the latest custom probe re-definition and re-annotation for Affymetrix and Illumina chips. A versatile microarray results output utility tool was also implemented for easy and fast generation of input files for over 20 of the most widely used functional analysis software programs. Coupled with a well-designed user interface, Microarray Я US leverages cutting edge Bioconductor packages for researchers with no knowledge in R language. It also enables a more reliable and accurate microarray data analysis and expedites downstream functional analysis of microarray results.
MGUPGMA: A Fast UPGMA Algorithm With Multiple Graphics Processing Units Using NCCL

Directory of Open Access Journals (Sweden)

Guan-Jie Hua

2017-10-01

Full Text Available A phylogenetic tree is a visual diagram of the relationship between a set of biological species. Scientists usually use it to analyze many characteristics of the species. The distance-matrix methods, such as Unweighted Pair Group Method with Arithmetic Mean and Neighbor Joining, construct a phylogenetic tree by calculating pairwise genetic distances between taxa. These methods suffer from poor computational performance. Although several new methods with high-performance hardware and frameworks have been proposed, the issue still exists. In this work, a novel parallel Unweighted Pair Group Method with Arithmetic Mean approach on multiple Graphics Processing Units is proposed to construct a phylogenetic tree from an extremely large set of sequences. The experimental results show that the proposed approach on a DGX-1 server with 8 NVIDIA P100 graphic cards achieves approximately 3-fold to 7-fold speedup over the implementation of Unweighted Pair Group Method with Arithmetic Mean on a modern CPU and a single GPU, respectively.
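For reference, the sequential algorithm that the paper parallelizes can be sketched in a few lines. This is a plain, unoptimized UPGMA on a small distance matrix. The function name, the nested-tuple tree representation and the example distances in the usage note are assumptions, and nothing here reflects the multi-GPU or NCCL implementation.

```python
def upgma(names, dist):
    """Cluster taxa with UPGMA.

    names: list of labels; dist: symmetric matrix as a list of lists.
    Returns (tree, heights): a nested tuple and the height of each merge.
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}   # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    heights = []
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])  # closest pair
        ta, na = clusters.pop(a)
        tb, nb = clusters.pop(b)
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            # Size-weighted average keeps the distance equal to the mean
            # over all original pairs between the two clusters.
            d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        del d[(a, b)]
        clusters[next_id] = ((ta, tb), na + nb)
        heights.append(dab / 2)
        next_id += 1
    tree = next(iter(clusters.values()))[0]
    return tree, heights
```

For four hypothetical taxa with distances A-B 2, A-C 6, B-C 6 and 10 from D to each of the others, the merges happen at heights 1, 3 and 5. The search for the closest pair and the distance update dominate the cost, and these are the steps a GPU version would distribute.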
MGUPGMA: A Fast UPGMA Algorithm With Multiple Graphics Processing Units Using NCCL.

Science.gov (United States)

Hua, Guan-Jie; Hung, Che-Lun; Lin, Chun-Yuan; Wu, Fu-Che; Chan, Yu-Wei; Tang, Chuan Yi

2017-01-01

A phylogenetic tree is a visual diagram of the relationships among a set of biological species. Scientists use it to analyze many characteristics of the species. Distance-matrix methods, such as the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and Neighbor Joining, construct a phylogenetic tree by calculating pairwise genetic distances between taxa. These methods suffer from poor computational performance. Although several new methods using high-performance hardware and frameworks have been proposed, the issue persists. In this work, a novel parallel UPGMA approach on multiple Graphics Processing Units is proposed to construct a phylogenetic tree from an extremely large set of sequences. The experimental results show that the proposed approach on a DGX-1 server with 8 NVIDIA P100 graphics cards achieves approximately 3-fold to 7-fold speedup over implementations of UPGMA on a modern CPU and a single GPU, respectively.
The Research and Implementation of MUSER CLEAN Algorithm Based on OpenCL

Science.gov (United States)

Feng, Y.; Chen, K.; Deng, H.; Wang, F.; Mei, Y.; Wei, S. L.; Dai, W.; Yang, Q. P.; Liu, Y. B.; Wu, J. P.

2017-03-01

High-performance data processing on a single machine is urgently needed in the development of astronomical software. However, because machine configurations differ, traditional programming techniques such as multi-threading and CUDA (Compute Unified Device Architecture) + GPU (Graphics Processing Unit) have obvious limitations in portability and seamlessness across operating systems. This work introduces the use of OpenCL (Open Computing Language) in the development of the MUSER (MingantU SpEctral Radioheliograph) data processing system. The Högbom CLEAN algorithm is re-implemented as a parallel CLEAN algorithm using the Python language and the PyOpenCL extension package. The experimental results show that the OpenCL-based CLEAN algorithm has approximately the same operating efficiency as the former CUDA-based CLEAN algorithm. More importantly, the system also achieves high performance when processing data in a CPU-only (Central Processing Unit) environment, which solves the problem of dependence on a CUDA+GPU environment. Overall, the research improves the adaptability of the system, with emphasis on the performance of MUSER image CLEAN computing. Meanwhile, the realization of OpenCL in MUSER demonstrates its usefulness in scientific data processing. In view of the high-performance computing features of OpenCL in heterogeneous environments, it will probably become the preferred technology for future high-performance astronomical software development.
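For readers unfamiliar with the Högbom CLEAN algorithm named above, a minimal one-dimensional serial sketch follows. It is an illustration only, not the MUSER PyOpenCL code; the function name and the `gain`, `threshold` and `max_iter` parameters are assumptions chosen for this example.

```python
def hogbom_clean_1d(dirty, psf, gain=0.1, threshold=1e-3, max_iter=1000):
    """1-D Hogbom CLEAN (illustrative sketch).

    `psf` must have odd length with its peak value 1.0 at the centre.
    Returns (model, residual) as lists of floats.
    """
    residual = list(dirty)
    model = [0.0] * len(dirty)
    half = len(psf) // 2

    for _ in range(max_iter):
        # Locate the brightest remaining pixel in the residual.
        peak = max(range(len(residual)), key=lambda i: abs(residual[i]))
        if abs(residual[peak]) < threshold:
            break
        # Move a fraction (the loop gain) of that peak into the model
        flux = gain * residual[peak]
        model[peak] += flux
        # and subtract the correspondingly scaled, shifted PSF from the residual.
        for k, p in enumerate(psf):
            j = peak + k - half
            if 0 <= j < len(residual):
                residual[j] -= flux * p
    return model, residual
```

In a parallel version, the peak search becomes a reduction and the PSF subtraction an element-wise update, which is what makes the algorithm a reasonable fit for OpenCL or CUDA kernels.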
Fast DRR splat rendering using common consumer graphics hardware

International Nuclear Information System (INIS)

Spoerk, Jakob; Bergmann, Helmar; Wanschitz, Felix; Dong, Shuo; Birkfellner, Wolfgang

2007-01-01

Digitally rendered radiographs (DRR) are a vital part of various medical image processing applications such as 2D/3D registration for patient pose determination in image-guided radiotherapy procedures. This paper presents a technique to accelerate DRR creation by using conventional graphics hardware for the rendering process. DRR computation itself is done by an efficient volume rendering method named wobbled splatting. For programming the graphics hardware, NVIDIA's C for Graphics (Cg) is used. The description of an algorithm used for rendering DRRs on the graphics hardware is presented, together with a benchmark comparing this technique to a CPU-based wobbled splatting program. Results show a reduction of rendering time by about 70%-90% depending on the amount of data. For instance, rendering a volume of 2×10^6 voxels is feasible at an update rate of 38 Hz using the graphics processing unit (GPU) of a conventional graphics adapter, compared to 6 Hz on a common Intel-based PC. In addition, wobbled splatting using graphics hardware for DRR computation provides higher resolution DRRs with comparable image quality due to special processing characteristics of the GPU. We conclude that DRR generation on common graphics hardware using the freely available Cg environment is a major step toward 2D/3D registration in clinical routine.
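The quoted update rates can be checked against the stated 70%-90% range with a short calculation. The helper below is a hypothetical convenience function written for this check, not part of the paper; it only converts two frame rates into a fractional reduction in per-frame rendering time.

```python
def time_reduction(baseline_hz, accelerated_hz):
    """Fractional reduction in per-frame time when going from baseline to accelerated."""
    # Per-frame time is 1/rate, so the reduction is 1 - (1/accelerated) / (1/baseline).
    return 1.0 - baseline_hz / accelerated_hz


reduction = time_reduction(6, 38)   # CPU at 6 Hz, GPU at 38 Hz
speedup = 38 / 6                    # equivalent speedup factor
```

For the 6 Hz and 38 Hz figures this gives a reduction of roughly 84% (a speedup of about 6.3), which falls inside the reported range.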
GRAPHICS PROCESSING UNITS: MORE THAN THE PATHWAY TO REALISTIC VIDEO-GAMES

Directory of Open Access Journals (Sweden)

CARLOS TRUJILLO

2011-01-01

Full Text Available The large video game market has driven rapid progress in hardware and software aimed at achieving more realistic gaming environments. Among these developments are graphics processing units (GPUs), whose purpose is to free the central processing unit (CPU) from the elaborate computations that give "life" to video games. To achieve this, GPUs are equipped with multiple processing cores operating in parallel, which allows them to be used for tasks far more diverse than video game development. This article presents a brief description of the features of Compute Unified Device Architecture (CUDA™), a parallel computing architecture for GPUs. An application of this architecture to the numerical reconstruction of holograms is presented, for which an 11X speedup is reported relative to the performance achieved on a CPU.
Perception in statistical graphics

Science.gov (United States)

VanderPlas, Susan Ruth

There has been quite a bit of research on statistical graphics and visualization, generally focused on new types of graphics, new software to create graphics, interactivity, and usability studies. Our ability to interpret and use statistical graphics hinges on the interface between the graph itself and the brain that perceives and interprets it, and there is substantially less research on the interplay between graph, eye, brain, and mind than is sufficient to understand the nature of these relationships. The goal of the work presented here is to further explore the interplay between a static graph, the translation of that graph from paper to mental representation (the journey from eye to brain), and the mental processes that operate on that graph once it is transferred into memory (mind). Understanding the perception of statistical graphics should allow researchers to create more effective graphs which produce fewer distortions and viewer errors while reducing the cognitive load necessary to understand the information presented in the graph. Taken together, these experiments should lay a foundation for exploring the perception of statistical graphics. There has been considerable research into the accuracy of numerical judgments viewers make from graphs, and these studies are useful, but it is more effective to understand how errors in these judgments occur so that the root cause of the error can be addressed directly. Understanding how visual reasoning relates to the ability to make judgments from graphs allows us to tailor graphics to particular target audiences. In addition, understanding the hierarchy of salient features in statistical graphics allows us to clearly communicate the important message from data or statistical models by constructing graphics which are designed specifically for the perceptual system.
Fundamental Mechanisms of NeuroInformation Processing: Inverse Problems and Spike Processing

Science.gov (United States)

2016-08-04

Neurokernel software using the Python programming language and the PyCUDA interface to NVIDIA's CUDA GPU programming environment to avail ourselves of the...Neuroscience, UCSD. Marius Buibas, Scientist, Brain Corporation, San Diego, California. Gaute T. Einevoll, Department of Mathematical Sciences
FamSeq: a variant calling program for family-based sequencing data using graphics processing units.

Directory of Open Access Journals (Sweden)

Gang Peng

2014-10-01

Full Text Available Various algorithms have been developed for variant calling using next-generation sequencing data, and various methods have been applied to reduce the associated false positive and false negative rates. Few variant calling programs, however, utilize the pedigree information when the family-based sequencing data are available. Here, we present a program, FamSeq, which reduces both false positive and false negative rates by incorporating the pedigree information from the Mendelian genetic model into variant calling. To accommodate variations in data complexity, FamSeq consists of four distinct implementations of the Mendelian genetic model: the Bayesian network algorithm, a graphics processing unit version of the Bayesian network algorithm, the Elston-Stewart algorithm and the Markov chain Monte Carlo algorithm. To make the software efficient and applicable to large families, we parallelized the Bayesian network algorithm that copes with pedigrees with inbreeding loops without losing calculation precision on an NVIDIA graphics processing unit. In order to compare the difference in the four methods, we applied FamSeq to pedigree sequencing data with family sizes that varied from 7 to 12. When there is no inbreeding loop in the pedigree, the Elston-Stewart algorithm gives analytical results in a short time. If there are inbreeding loops in the pedigree, we recommend the Bayesian network method, which provides exact answers. To improve the computing speed of the Bayesian network method, we parallelized the computation on a graphics processing unit. This allowed the Bayesian network method to process the whole genome sequencing data of a family of 12 individuals within two days, which was a 10-fold time reduction compared to the time required for this computation on a central processing unit.
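To make the idea of using pedigree information concrete, the sketch below combines a child's genotype likelihoods with a Mendelian prior derived from the parents. It is a deliberately simplified illustration: it assumes the parental genotypes are known exactly and ignores de novo mutation, whereas FamSeq treats all genotypes as uncertain. All function names are hypothetical.

```python
def transmit(genotype):
    """Probability that a parent passes on the alternate allele.

    `genotype` is the number of alternate alleles the parent carries (0, 1 or 2).
    """
    return genotype / 2.0


def child_prior(father, mother):
    """Mendelian prior over the child's genotype (0, 1 or 2 alternate alleles)."""
    p_f, p_m = transmit(father), transmit(mother)
    return [
        (1 - p_f) * (1 - p_m),              # neither parent transmits the allele
        p_f * (1 - p_m) + (1 - p_f) * p_m,  # exactly one parent transmits it
        p_f * p_m,                          # both parents transmit it
    ]


def child_posterior(likelihoods, father, mother):
    """Bayes' rule: posterior is proportional to read likelihood times Mendelian prior."""
    joint = [l * p for l, p in zip(likelihoods, child_prior(father, mother))]
    total = sum(joint)
    return [j / total for j in joint]
```

For example, with two homozygous-reference parents the prior puts all mass on genotype 0, so sequencing noise that favors a heterozygous call in the child is overridden; this is the mechanism by which pedigree information can reduce false positives.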
Three-dimensional photoacoustic tomography based on graphics-processing-unit-accelerated finite element method.

Science.gov (United States)

Peng, Kuan; He, Ling; Zhu, Ziqiang; Tang, Jingtian; Xiao, Jiaying

2013-12-01

Compared with commonly used analytical reconstruction methods, the frequency-domain finite element method (FEM) based approach has proven to be an accurate and flexible algorithm for photoacoustic tomography. However, the FEM-based algorithm is computationally demanding, especially for three-dimensional cases. To enhance the algorithm's efficiency, in this work a parallel computational strategy is implemented in the framework of the FEM-based reconstruction algorithm using a graphic-processing-unit parallel frame named the "compute unified device architecture." A series of simulation experiments is carried out to test the accuracy and accelerating effect of the improved method. The results obtained indicate that the parallel calculation does not change the accuracy of the reconstruction algorithm, while its computational cost is significantly reduced by a factor of 38.9 with a GTX 580 graphics card using the improved method.
Hybrid compression of video with graphics in DTV communication systems

OpenAIRE

Schaar, van der, M.; With, de, P.H.N.

2000-01-01

Advanced broadcast manipulation of TV sequences and enhanced user interfaces for TV systems have resulted in an increased amount of pre- and post-editing of video sequences, where graphical information is inserted. However, in the current broadcasting chain, there are no provisions for enabling an efficient transmission/storage of these mixed video and graphics signals and, at this emerging stage of DTV systems, introducing new standards is not desired. Nevertheless, in the professional video...
Using of new possibilities of Fermi architecture by development of GPGPU programs

International Nuclear Information System (INIS)

Dudnik, V.A.; Kudryavtsev, V.I.; Us, S.A.; Shestakov, M.V.

2013-01-01

The additional hardware and software functions introduced with NVIDIA's Fermi graphics processor architecture are described. Recommendations are given for their use in implementing scientific and technical computation algorithms on graphics processors. The application of the new capabilities of the Fermi architecture and of NVIDIA's CUDA technology (Compute Unified Device Architecture, a unified hardware-software solution for parallel computation on GPUs) is described, with the aim of reducing the development time of applications that use GPGPU to accelerate data processing.
Three Dimensional Simulation of Ion Thruster Plume-Spacecraft Interaction Based on a Graphic Processor Unit

International Nuclear Information System (INIS)

Ren Junxue; Xie Kan; Qiu Qian; Tang Haibin; Li Juan; Tian Huabing

2013-01-01

Based on the three-dimensional particle-in-cell (PIC) method and Compute Unified Device Architecture (CUDA), a parallel particle simulation code combined with a graphic processor unit (GPU) has been developed for the simulation of charge-exchange (CEX) xenon ions in the plume of an ion thruster. Using the proposed technique, the potential and CEX plasma distribution are calculated for the ion thruster plume surrounding the DS1 spacecraft at different thrust levels. The simulation results are in good agreement with measured CEX ion parameters reported in the literature, and the GPU results are equal to the CPU results. Compared with a single Intel Core 2 E6300 CPU, a 16-processor NVIDIA GeForce 9400 GT GPU shows a speedup factor of 3.6 when the total macro particle number is 1.1×10^6. The simulation results also reveal how the back flow CEX plasma affects the spacecraft floating potential, which indicates that the plume of the ion thruster is indeed able to alleviate the extreme negative floating potentials of spacecraft in geosynchronous orbit.
CUDA based Level Set Method for 3D Reconstruction of Fishes from Large Acoustic Data

DEFF Research Database (Denmark)

Sharma, Ojaswa; Anton, François

2009-01-01

Acoustic images present views of underwater dynamics, even in high depths. With multi-beam echo sounders (SONARs), it is possible to capture series of 2D high resolution acoustic images. 3D reconstruction of the water column and subsequent estimation of fish abundance and fish species identificat...... of suppressing threshold and show its convergence as the evolution proceeds. We also present a GPU based streaming computation of the method using NVIDIA's CUDA framework to handle large volume data-sets. Our implementation is optimised for memory usage to handle large volumes....

Visual Media Reasoning - Terrain-based Geolocation

Science.gov (United States)

2015-06-01

the drawings, specifications, or other data does not license the holder or any other person or corporation; or convey any rights or permission to...3.4 Alternative Metric Investigation This section describes a graphics processor unit (GPU) based implementation in the NVIDIA CUDA programming...utilizing 2 concurrent CPU cores, each controlling a single NVIDIA C2075 Tesla Fermi CUDA card. Figure 22 shows a comparison of the CPU and the GPU powered
The graphics future in scientific applications

International Nuclear Information System (INIS)

Enderle, G.

1982-01-01

Computer graphics methods and tools are being used to a great extent in scientific research. The future development in this area will be influenced both by new hardware developments and by software advances. On the hardware sector, the development of the raster technology will lead to the increased use of colour workstations with more local processing power. Colour hardcopy devices for creating plots, slides, or movies will be available at a lower price than today. The first real 3D-workstations appear on the marketplace. One of the main activities on the software sector is the standardization of computer graphics systems, graphical files, and device interfaces. This will lead to more portable graphical application programs and to a common base for computer graphics education. (orig.)
Optimization Techniques for 3D Graphics Deployment on Mobile Devices

Science.gov (United States)

Koskela, Timo; Vatjus-Anttila, Jarkko

2015-03-01

3D Internet technologies are becoming essential enablers in many application areas including games, education, collaboration, navigation and social networking. The use of 3D Internet applications with mobile devices provides location-independent access and richer use context, but also performance issues. Therefore, one of the important challenges facing 3D Internet applications is the deployment of 3D graphics on mobile devices. In this article, we present an extensive survey on optimization techniques for 3D graphics deployment on mobile devices and qualitatively analyze the applicability of each technique from the standpoints of visual quality, performance and energy consumption. The analysis focuses on optimization techniques related to data-driven 3D graphics deployment, because it supports off-line use, multi-user interaction, user-created 3D graphics and creation of arbitrary 3D graphics. The outcome of the analysis facilitates the development and deployment of 3D Internet applications on mobile devices and provides guidelines for future research.
siGnum: graphical user interface for EMG signal analysis.

Science.gov (United States)

Kaur, Manvinder; Mathur, Shilpi; Bhatia, Dinesh; Verma, Suresh

2015-01-01

Electromyography (EMG) signals, which represent the electrical activity of muscles, can be used for various clinical and biomedical applications. These are complicated and highly varying signals that depend on the anatomical location and physiological properties of the muscles. EMG signals acquired from the muscles require advanced methods for detection, decomposition and processing. This paper proposes a novel Graphical User Interface (GUI), siGnum, developed in MATLAB, that applies efficient and effective techniques to process raw EMG signals and decompose them in a simpler manner. It can be used independently of the MATLAB software by employing a deploy tool. This would enable researchers to gain a good understanding of the EMG signal and its analysis procedures, which can be utilized for more powerful, flexible and efficient applications in the near future.
INTLIB-6, Graphic Device Interface Library for ENDF/B Processing Codes

International Nuclear Information System (INIS)

Dunford, L.

1999-01-01

1 - Description of program or function: The graphic subroutine libraries DISSPLA and GRALIB (USCD1211) generally produce output which is independent of the output graphic device. A set of device dependent interface routines is required to translate the device independent output to the form required for each graphic device available. The interface library INTLIB provides interface routines for the following output formats: TEKTRONIX - LN03 PLUS, - video display terminal; POSTSCRIPT - LN03 PLUS with PostScript, - LaserJet III in PostScript mode, - video display terminal; REGIS - VT240 and VT1200; HPGL - LaserJet III in HPGL mode; FR80 - COMP80 film, fiche and hard copy.
Parallel Block Structured Adaptive Mesh Refinement on Graphics Processing Units

Energy Technology Data Exchange (ETDEWEB)

Beckingsale, D. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Gaudin, W. P. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Hornung, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Gunney, B. T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Gamblin, T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Herdman, J. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Jarvis, S. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom)

2014-11-17

Block-structured adaptive mesh refinement is a technique that can be used when solving partial differential equations to reduce the number of zones necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy, adding complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a native GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an eight-node cluster, and over four thousand nodes of Oak Ridge National Laboratory’s Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87× faster than the CPU-based implementation, and has been scaled to over four thousand GPUs using a combination of MPI and CUDA.
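The coarsen and refine operators mentioned above can be illustrated on a small two-dimensional patch. The sketch below uses the simplest possible choices (block averaging and piecewise-constant injection) in plain Python; the actual library's operators, data layout and GPU kernels are not described in the abstract, so everything here is an assumption for illustration.

```python
def coarsen(fine, ratio=2):
    """Average each ratio x ratio block of fine cells into one coarse cell."""
    ny, nx = len(fine) // ratio, len(fine[0]) // ratio
    return [
        [
            sum(fine[ratio * j + dj][ratio * i + di]
                for dj in range(ratio) for di in range(ratio)) / ratio ** 2
            for i in range(nx)
        ]
        for j in range(ny)
    ]


def refine(coarse, ratio=2):
    """Piecewise-constant injection: copy each coarse value into its fine cells."""
    return [
        [coarse[j // ratio][i // ratio] for i in range(len(coarse[0]) * ratio)]
        for j in range(len(coarse) * ratio)
    ]
```

Every output cell depends only on a small fixed set of input cells, so both operators are data-parallel: on a GPU each output cell could be computed by its own thread.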
Study on availability of GPU for scientific and engineering calculations

International Nuclear Information System (INIS)

Sakamoto, Kensaku; Kobayashi, Seiji

2009-07-01

Recently, the number of scientific and engineering calculations performed on GPUs (Graphics Processing Units) has been increasing. GPUs are said to have much higher peak floating-point performance and memory bandwidth than CPUs (Central Processing Units). We have studied the effectiveness of GPUs by applying them to fundamental scientific and engineering calculations with the CUDA (Compute Unified Device Architecture) development tools. The results show the following: 1) Computations on GPUs are effective for calculations such as matrix operations, FFT (Fast Fourier Transform) and CFD (Computational Fluid Dynamics) in the nuclear research field. 2) Highly advanced programming is required to bring out the high performance of GPUs. 3) Double-precision performance is low, and ECC (Error Correction Code) support in graphics memory systems is lacking. (author)
An X window based graphics user interface for radiation information processing system developed with object-oriented programming technology

International Nuclear Information System (INIS)

Gao Wenhuan; Fu Changqing; Kang Kejun

1993-01-01

X Window is a network-oriented, network-transparent windowing system that is now dominant in the Unix domain. Object-oriented programming technology can remarkably improve the extensibility of a software system. An introduction to graphical user interfaces is given, followed by a brief description of how to develop an application-independent graphical user interface for a radiation information processing system, based on X Window, with object-oriented programming technology.
3D Sensor-Based Obstacle Detection Comparing Octrees and Point clouds Using CUDA

Directory of Open Access Journals (Sweden)

K.B. Kaldestad

2012-10-01

Full Text Available This paper presents adaptable methods for achieving fast collision detection using the GPU and NVIDIA CUDA together with octrees. Earlier related work has focused on serial methods, while this paper presents a parallel solution which shows that there is a great increase in time if the number of operations is large. Two different models of the environment and the industrial robot are presented: the first is octrees at different resolutions, the second is a point cloud representation. The relative merits of the two different world model representations are shown. In particular, the experimental results show the potential of adapting the resolution of the robot and environment models to the task at hand.
Struggling readers learning with graphic-rich digital science text: Effects of a Highlight & Animate Feature and Manipulable Graphics

Science.gov (United States)

Defrance, Nancy L.

Technology offers promise of 'leveling the playing field' for struggling readers. That is, instructional support features within digital texts may enable all readers to learn. This quasi-experimental study examined the effects on learning of two support features, which offered unique opportunities to interact with text. The Highlight & Animate Feature highlighted an important idea in prose, while simultaneously animating its representation in an adjacent graphic. It invited readers to integrate ideas depicted in graphics and prose, using each one to interpret the other. The Manipulable Graphics had parts that the reader could operate to discover relationships among phenomena. It invited readers to test or refine the ideas that they brought to, or gleaned from, the text. Use of these support features was compulsory. Twenty fifth grade struggling readers read a graphic-rich digital science text in a clinical interview setting, under one of two conditions: using either the Highlight & Animate Feature or the Manipulable Graphics. Participants in both conditions made statistically significant gains on a multiple choice measure of knowledge of the topic of the text. While there were no significant differences by condition in the amount of knowledge gained; there were significant differences in the quality of knowledge expressed. Transcripts revealed that understandings about light and vision, expressed by those who used the Highlight & Animate Feature, were more often conceptually and linguistically 'complete.' That is, their understandings included both a description of phenomena as well as an explanation of underlying scientific principles, which participants articulated using the vocabulary of the text. This finding may be attributed to the multiple opportunities to integrate graphics (depicting the behavior of phenomena) and prose (providing the scientific explanation of that phenomena), which characterized the Highlight & Animate Condition. Those who used the
Heuristics for the Variable Sized Bin Packing Problem Using a Hybrid P-System and CUDA Architecture

OpenAIRE

AlEnezi, Qadha'a; AboElFotoh, Hosam; AlBdaiwi, Bader; AlMulla, Mohammad Ali

2016-01-01

The Variable Sized Bin Packing Problem has a wide range of application areas including packing, scheduling, and manufacturing. Given a list of items and variable sized bin types, the objective is to minimize the total size of the used bins. This problem is known to be NP-hard. In this article, we present two new heuristics for solving the problem using a new variation of P systems with active membranes, which we call a hybrid P system, implemented in CUDA. Our hybrid P-system model allows usi...
Simulation Control Graphical User Interface Logging Report

Science.gov (United States)

Hewling, Karl B., Jr.

2012-01-01

One of the many tasks of my project was to revise the code of the Simulation Control Graphical User Interface (SIM GUI) to enable logging functionality to a file. I was also tasked with developing a script that directed the startup and initialization flow of the various LCS software components. This makes sure that a software component will not spin up until all the appropriate dependencies have been configured properly. I was also able to assist hardware modelers in verifying the configuration of models after they had been upgraded to a new software version. I developed some code that analyzes the MDL files to determine whether any errors were generated by the upgrade process. Another project assigned to me was supporting the End-to-End Hardware/Software Daily Tag-up meeting.
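The startup rule described above (no component starts before its dependencies are ready) amounts to a topological ordering of a dependency graph. The sketch below shows one way to compute such an order with Python's standard `graphlib` module (Python 3.9 or later); the report does not say how the actual script works, and the component names in the usage example are invented.

```python
from graphlib import TopologicalSorter


def startup_order(dependencies):
    """Return a start-up order in which every component follows its dependencies.

    `dependencies` maps a component name to the set of components it needs.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical components, for illustration only.
example = {
    "gui": {"logger", "sim_core"},
    "sim_core": {"config"},
    "logger": {"config"},
    "config": set(),
}
order = startup_order(example)
```

Here `config` always comes first and `gui` last; the relative order of `logger` and `sim_core` is not constrained by the dependencies.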
CORDSPW - Windows computer program package for graphical interpretation of CORD-2 data

International Nuclear Information System (INIS)

Slavic, S.; Kromar, M.

2007-01-01

The CORD-2 package, developed at Jozef Stefan Institute, enables determination of the core power distribution and reactivity. Core distributions data generated during the calculation process are stored in CORlib files. CORDSP code, which is a part of the CORD-2 package, displays and compares data contained in CORlib files. Since it runs in the DOS environment, there are several limitations in the presentation of desired data. A CORDSPW package runs in the Windows environment and offers better graphical interpretation of the CORlib data. Core distributions can be displayed, compared, rewritten in the new files and sent to the printer. The user can select the appropriate display of the presented data such as core symmetry, colour and fonts. Core radial and axial distributions can be presented and compared. There are several options to store and print data. The user can choose between standard ASCII and graphical JPG format. (author)
A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG

Science.gov (United States)

Griffiths, M. K.; Fedun, V.; Erdélyi, R.

2015-03-01

Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed successfully the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.
Development of Stand Alone Application Tool for Processing and Quality Measurement of Weld Imperfection Image Captured by μ-Focused Digital Radiography Using MATLAB- Based Graphical User Interface

Directory of Open Access Journals (Sweden)

PZ Nadila

2012-12-01

Full Text Available Digital radiography is increasingly being applied in the fabrication industry. Compared to film-based radiography, digitally radiographed images can be acquired in less time and with fewer exposures. However, noise can easily occur in the digital image, resulting in a low-quality result. Because of this and the system's complexity, parameter sensitivity, and environmental effects, the results can be difficult to interpret, even for a radiographer. Therefore, the need for an application tool to improve and evaluate the image is becoming urgent. In this research, a user-friendly tool for image processing and image quality measurement was developed. The resulting tool contains important components needed by radiograph inspectors in analyzing defects and recording the results. The tool was written using the image processing toolbox, the graphical user interface development environment (GUIDE), and the compiler available in Matrix Laboratory (MATLAB) R2008a. For image processing, contrast adjustment, noise removal, and edge detection were applied. For image quality measurement, mean square error (MSE), peak signal-to-noise ratio (PSNR), modulation transfer function (MTF), normalized signal-to-noise ratio (SNRnorm), sensitivity and unsharpness were used. The graphical user interface (GUI) was then compiled to build a stand-alone Windows application that can be executed independently without installing MATLAB.
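Two of the quality measures listed above, MSE and PSNR, have standard definitions that are easy to state in code. The sketch below is written in plain Python rather than MATLAB and is not taken from the tool itself; it assumes 8-bit greyscale images stored as lists of rows, and the function names are chosen for this example.

```python
import math


def mse(reference, test):
    """Mean square error between two equally sized greyscale images (lists of rows)."""
    n = sum(len(row) for row in reference)
    return sum((r - t) ** 2
               for ref_row, test_row in zip(reference, test)
               for r, t in zip(ref_row, test_row)) / n


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(reference, test)
    return math.inf if err == 0 else 10 * math.log10(max_value ** 2 / err)
```

A lower MSE and a higher PSNR both indicate that the processed image is closer to the reference, so the two measures are usually reported together when comparing noise-removal settings.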
Enabling In-Theater Processes for Indigenous, Recycled, and Reclaimed Material Manufacturing

Science.gov (United States)

2015-12-01

plastic, chemicals, food, cloth, oil, grease, biological materials, animal and agricultural waste, and sludge. It is expected that one of these... ARL-TR-7560 ● DEC 2015 US Army Research Laboratory Enabling In-Theater Processes for Indigenous, Recycled, and Reclaimed Material...
Smoldyn on graphics processing units: massively parallel Brownian dynamics simulations.

Science.gov (United States)

Dematté, Lorenzo

2012-01-01

Space is a very important aspect in the simulation of biochemical systems, and the need for simulation algorithms able to cope with space has recently become more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transportation phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation can be time consuming, especially if we want to capture the system's behavior reliably using stochastic methods in conjunction with a high spatial resolution. To deliver on the promise made by systems biology of understanding a system as a whole, we need to scale up the size of the models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely used algorithm for stochastic simulation of chemical reactions with spatial resolution and single-molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computationally demanding steps (computation of diffusion, unimolecular and bimolecular reactions, and the most common cases of molecule-surface interaction) on the GPU, computing them in parallel on each molecule of the system. The implementation offers good speed-ups and real-time, high-quality graphics output.
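The diffusion step lends itself to per-molecule parallelism because each molecule's displacement is independent of every other molecule. The sketch below shows a generic Brownian dynamics step in plain Python, where each coordinate receives a Gaussian displacement with standard deviation sqrt(2·D·dt). It is not code from Smoldyn or from the GPU implementation; the function name `diffuse`, the tuple-based positions, and the parameter values are assumptions for illustration.

```python
import math
import random


def diffuse(positions, diffusion_coeff, dt, rng):
    """Advance every molecule by one Brownian step.

    Each coordinate gets an independent Gaussian displacement with standard
    deviation sqrt(2 * D * dt). No molecule reads another molecule's state,
    so on a GPU each iteration of the outer loop could be its own thread.
    """
    sigma = math.sqrt(2.0 * diffusion_coeff * dt)
    return [
        tuple(coord + rng.gauss(0.0, sigma) for coord in molecule)
        for molecule in positions
    ]


# Hypothetical usage: 10,000 molecules starting at the origin.
rng = random.Random(0)
molecules = [(0.0, 0.0, 0.0)] * 10000
moved = diffuse(molecules, diffusion_coeff=1.0, dt=0.01, rng=rng)
mean_sq = sum(x * x + y * y + z * z for x, y, z in moved) / len(moved)
```

For free diffusion in three dimensions the mean squared displacement after one step should be close to 6·D·dt, which gives a quick sanity check on `mean_sq`.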
The End of the Rainbow? Color Schemes for Improved Data Graphics

Science.gov (United States)

Light, Adam; Bartlein, Patrick J.

2004-10-01

Modern computer displays and printers enable the widespread use of color in scientific communication, but the expertise for designing effective graphics has not kept pace with the technology for producing them. Historically, even the most prestigious publications have tolerated high defect rates in figures and illustrations, and technological advances that make creating and reproducing graphics easier do not appear to have decreased the frequency of errors. Flawed graphics consequently beget more flawed graphics as authors emulate published examples. Color has the potential to enhance communication, but design mistakes can result in color figures that are less effective than gray scale displays of the same data. Empirical research on human subjects can build a fundamental understanding of visual perception and scientific methods can be used to evaluate existing designs, but creating effective data graphics is a design task and not fundamentally a scientific pursuit. Like writing well, creating good data graphics requires a combination of formal knowledge and artistic sensibility tempered by experience: a combination of "substance, statistics, and design".
A Graphical User Interface to Generalized Linear Models in MATLAB

Directory of Open Access Journals (Sweden)

Peter Dunn

1999-07-01

Full Text Available Generalized linear models unite a wide variety of statistical models in a common theoretical framework. This paper discusses GLMLAB, software that enables such models to be fitted in the popular mathematical package MATLAB. It provides a graphical user interface to the powerful MATLAB computational engine to produce a program that is easy to use but has many features, including offsets, prior weights, and user-defined distributions and link functions. MATLAB's graphical capacities are also utilized in providing a number of simple residual diagnostic plots.
High speed graphic program on a personal computer and its utilization in JIPP T-IIU online data-processing system

International Nuclear Information System (INIS)

Taniguchi, Yoshiyuki; Noda, Nobuaki; Sasao, Mamiko; Sato, Masahiro

1986-01-01

A high-speed graphic program was developed on a PC9801 personal computer. Using this program, one can draw a waveform of successive 16-bit integer data, such as that obtained from an analog-to-digital converter. The program is written in machine language and takes the form of a subroutine that can be called from main programs under N 88 BASIC. The time for drawing one waveform is 4 ms, which is two orders of magnitude faster than the standard graphic routines of the BASIC interpreter. This program is very convenient for the real-time display of plasma-monitoring raw data, such as plasma current, loop voltage, and rf power, in tokamak experiments. The program has been utilized in JIPP T-IIU experiments and makes it possible to display data from an 8-channel ADC within a few seconds, before the system transmits the data from CAMAC to the computer center of the institute. The program and its utilization are presented. (author)

Computation of large covariance matrices by SAMMY on graphical processing units and multicore CPUs

International Nuclear Information System (INIS)

Arbanas, G.; Dunn, M.E.; Wiarda, D.

2011-01-01

The computational power of Graphical Processing Units and multicore CPUs was harnessed by the nuclear data evaluation code SAMMY to speed up computations of large Resonance Parameter Covariance Matrices (RPCMs). This was accomplished by linking SAMMY to vendor-optimized implementations of the matrix-matrix multiplication subroutine of the Basic Linear Algebra Subprograms (BLAS) library to compute the most time-consuming step. The ²³⁵U RPCM computed previously using a triple-nested loop was re-computed using the NVIDIA implementation of the subroutine on a single Tesla Fermi Graphical Processing Unit, and also using Intel's Math Kernel Library implementation on two different multicore CPU systems. A multiplication of two matrices of dimensions 16,000×20,000 that had previously taken days took approximately one minute on the GPU. Comparable performance was achieved on a dual six-core CPU system. The magnitude of the speed-up suggests that these, or similar, combinations of hardware and libraries may be useful for large matrix operations in SAMMY. Uniform interfaces of standard linear algebra libraries make them a promising candidate for a programming framework of a new generation of SAMMY for the emerging heterogeneous computing platforms. (author)
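For orientation, the kind of triple-nested loop that a vendor-optimized matrix-matrix multiplication routine replaces can be written as follows. This is a plain Python sketch for illustration only, not SAMMY's code; the function names are hypothetical, and the operation count at the end assumes that one 16,000×20,000 matrix multiplies the transpose of another, which the abstract does not spell out.

```python
def matmul_triple_loop(a, b):
    """Multiply an n x k matrix by a k x m matrix with three nested loops."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):          # rows of the result
        for j in range(m):      # columns of the result
            acc = 0.0
            for p in range(k):  # dot product of row i of a and column j of b
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def multiply_add_count(n, k, m):
    """Number of multiply-add operations performed by the loop above."""
    return n * k * m


# Assumed shapes: (16,000 x 20,000) times (20,000 x 16,000).
operations = multiply_add_count(16000, 20000, 16000)
```

Under that shape assumption the loop performs about 5.12 × 10¹² multiply-adds, which gives a sense of why an unoptimized loop is slow and why a tuned library routine on parallel hardware helps so much.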
Computation of large covariance matrices by SAMMY on graphical processing units and multicore CPUs

Energy Technology Data Exchange (ETDEWEB)

Arbanas, G.; Dunn, M.E.; Wiarda, D., E-mail: arbanasg@ornl.gov, E-mail: dunnme@ornl.gov, E-mail: wiardada@ornl.gov [Oak Ridge National Laboratory, Oak Ridge, TN (United States)

2011-07-01

The computational power of Graphical Processing Units and multicore CPUs was harnessed by the nuclear data evaluation code SAMMY to speed up computations of large Resonance Parameter Covariance Matrices (RPCMs). This was accomplished by linking SAMMY to vendor-optimized implementations of the matrix-matrix multiplication subroutine of the Basic Linear Algebra Subprograms (BLAS) library to compute the most time-consuming step. The ²³⁵U RPCM computed previously using a triple-nested loop was re-computed using the NVIDIA implementation of the subroutine on a single Tesla Fermi Graphical Processing Unit, and also using Intel's Math Kernel Library implementation on two different multicore CPU systems. A multiplication of two matrices of dimensions 16,000×20,000 that had previously taken days took approximately one minute on the GPU. Comparable performance was achieved on a dual six-core CPU system. The magnitude of the speed-up suggests that these, or similar, combinations of hardware and libraries may be useful for large matrix operations in SAMMY. Uniform interfaces of standard linear algebra libraries make them a promising candidate for a programming framework of a new generation of SAMMY for the emerging heterogeneous computing platforms. (author)
CT applications of medical computer graphics

International Nuclear Information System (INIS)

Rhodes, M.L.

1985-01-01

Few applications of computer graphics show as much promise and early success as that for CT. Unlike electron microscopy, ultrasound, business, military, and animation applications, CT image data are inherently digital. CT pictures can be processed directly by programs well established in the fields of computer graphics and digital image processing. Methods for reformatting digital pictures, enhancing structure shape, reducing image noise, and rendering three-dimensional (3D) scenes of anatomic structures have all become routine at many CT centers. In this chapter, the authors provide a brief introduction to computer graphics terms and techniques commonly applied to CT pictures and, when appropriate, to those showing promise for magnetic resonance images. Topics discussed here are image-processing options that are applied to digital images already constructed. In the final portion of this chapter techniques for "slicing" CT image data are presented, and geometric principles that describe the specification of oblique and curved images are outlined. Clinical examples are included.
GapBlaster-A Graphical Gap Filler for Prokaryote Genomes.

Directory of Open Access Journals (Sweden)

Pablo H C G de Sá

Full Text Available The advent of NGS (Next Generation Sequencing) technologies has resulted in an exponential increase in the number of complete genomes available in biological databases. This advance has allowed the development of several computational tools enabling analyses of large amounts of data in each of the various steps, from processing and quality filtering to gap filling and manual curation. The tools developed for gap closure are very useful as they result in more complete genomes, which will influence downstream analyses of genomic plasticity and comparative genomics. However, the gap filling step remains a challenge for genome assembly, often requiring manual intervention. Here, we present GapBlaster, a graphical application to evaluate and close gaps. GapBlaster was developed in the Java programming language. The software uses contigs obtained in the assembly of the genome to perform an alignment against a draft of the genome/scaffold, using BLAST or Mummer to close gaps. Then, all identified alignments of contigs that extend through the gaps in the draft sequence are presented to the user for further evaluation via the GapBlaster graphical interface. GapBlaster presents significant results compared to other similar software and has the advantage of offering a graphical interface for manual curation of the gaps. The GapBlaster program, the user guide, and the test datasets are freely available at https://sourceforge.net/projects/gapblaster2015/. It requires Sun JDK 8 and Blast or Mummer.
R graphics

CERN Document Server

Murrell, Paul

2005-01-01

R is revolutionizing the world of statistical computing. Powerful, flexible, and best of all free, R is now the program of choice for tens of thousands of statisticians. Destined to become an instant classic, R Graphics presents the first complete, authoritative exposition on the R graphical system. Paul Murrell, widely known as the leading expert on R graphics, has developed an in-depth resource that takes nothing for granted and helps both neophyte and seasoned users master the intricacies of R graphics. After an introductory overview of R graphics facilities, the presentation first focuses
PROFESSIONALLY ORIENTED COURSE OF ENGINEERING-GRAPHICAL TRAINING

Directory of Open Access Journals (Sweden)

Olga V. Zhuykova

2015-01-01

Full Text Available The aim of the article is to present the results of managing competence-oriented self-directed student learning in graphical subjects at Kalashnikov Izhevsk State Technical University. Methods. A technology for self-directed engineering-graphical training of future bachelors is suggested; it is based on an analysis of educational literature and teaching experience and provides individualization and professional education. The method of team expert appraisal was used at all stages of self-directed learning management. This method is one of the main methods in qualimetry (the science concerned with assessing and evaluating the quality of any objects and processes); it makes it possible to reveal the components of engineering-graphical competence, to establish the criteria and markers for determining the level of its development, and to perform expert evaluation of student tasks and estimation procedures. Results. It has been established that revitalizing student self-directed learning through professional education and individualization makes it possible to increase the level of student engineering-graphical competence development. Scientific novelty. Criteria-based evaluation procedures are developed for determining the level of student engineering-graphical competence development in the course of professionally oriented self-directed learning in graphical subjects at a technical university. Practical significance. Professionally focused educational trajectories for independent engineering-graphic preparation of students are designed and filled with content. Such training is currently being carried out at Kalashnikov Izhevsk State Technical University, major «Instrument Engineering».
Can we be more Graphic about Graphic Design?

OpenAIRE

Vienne, Véronique

2012-01-01

Can you objectify a subjective notion? This is the question graphic designers must face when they talk about their work. Even though graphic design artifacts are omnipresent in our culture, graphic design is still an exceptionally ill-defined profession. This is one of the reasons design criticism is still a rudimentary discipline. No one knows for sure what is this thing we sometimes call “graphic communication” for lack of a better word–a technique my Webster’s dictionary describes as “the ...
A New Parallel Approach for Accelerating the GPU-Based Execution of Edge Detection Algorithms.

Science.gov (United States)

Emrani, Zahra; Bateni, Soroosh; Rabbani, Hossein

2017-01-01

Real-time image processing is used in a wide variety of applications, such as those in medical care and industrial processes. In medical care, this technique can display important patient information graphically, which can supplement and help the treatment process. Medical decisions made based on real-time images are more accurate and reliable. According to recent research, graphic processing unit (GPU) programming is a useful method for improving the speed and quality of medical image processing and is one of the ways of achieving real-time image processing. Edge detection is an early stage in most image processing methods for the extraction of features and object segments from a raw image. The Canny method, the Sobel and Prewitt filters, and the Roberts' Cross technique are some examples of edge detection algorithms that are widely used in image processing and machine vision. In this work, these algorithms are implemented using the Compute Unified Device Architecture (CUDA), Open Source Computer Vision (OpenCV), and Matrix Laboratory (MATLAB) platforms. An existing parallel method for the Canny approach has been modified further to run in a fully parallel manner. This has been achieved by replacing the breadth-first search procedure with a parallel method. These algorithms have been compared by testing them on a database of optical coherence tomography images. The comparison of results shows that the proposed implementation of the Canny method on GPU using the CUDA platform improves the speed of execution by 2-100× compared to the central processing unit-based implementation using the OpenCV and MATLAB platforms.
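Of the edge detectors listed, the Sobel filter is the simplest to write down: two 3×3 kernels estimate the horizontal and vertical gradients, and the edge strength is their combined magnitude. The sketch below is a sequential Python illustration of that standard filter, not the paper's CUDA, OpenCV, or MATLAB code; the function name, the list-of-rows image layout, and the choice to leave border pixels at zero are assumptions.

```python
import math

# Standard Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(image):
    """Gradient magnitude for the interior pixels of a grayscale image.

    Border pixels are left at 0.0 (an assumed boundary policy). Every output
    pixel depends only on its own 3x3 input neighbourhood, so the two outer
    loops map naturally onto one GPU thread per pixel.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            out[y][x] = math.hypot(gx, gy)
    return out
```

The Canny method builds on such a gradient estimate and adds non-maximum suppression and hysteresis thresholding; those later stages are not shown here.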
Processing Approaches for DAS-Enabled Continuous Seismic Monitoring

Science.gov (United States)

Dou, S.; Wood, T.; Freifeld, B. M.; Robertson, M.; McDonald, S.; Pevzner, R.; Lindsey, N.; Gelvin, A.; Saari, S.; Morales, A.; Ekblaw, I.; Wagner, A. M.; Ulrich, C.; Daley, T. M.; Ajo Franklin, J. B.

2017-12-01

Distributed Acoustic Sensing (DAS) is creating a "field as laboratory" capability for seismic monitoring of subsurface changes. By providing unprecedented spatial and temporal sampling at a relatively low cost, DAS enables field-scale seismic monitoring to have durations and temporal resolutions that are comparable to those of laboratory experiments. Here we report on seismic processing approaches developed during data analyses of three case studies, all using DAS-enabled seismic monitoring, with applications ranging from shallow permafrost to deep reservoirs: (1) 10-hour downhole monitoring of cement curing at Otway, Australia; (2) 2-month surface monitoring of controlled permafrost thaw at Fairbanks, Alaska; (3) multi-month downhole and surface monitoring of carbon sequestration at Decatur, Illinois. We emphasize the data management and processing components relevant to DAS-based seismic monitoring, which include scalable approaches to data management, pre-processing, denoising, filtering, and wavefield decomposition. DAS has dramatically increased the data volume to the extent that terabyte-per-day data loads are now typical, straining conventional approaches to data storage and processing. To achieve more efficient use of disk space and network bandwidth, we explore improved file structures and data compression schemes. Because the noise floor of DAS measurements is higher than that of conventional sensors, optimal processing workflows involving advanced denoising, deconvolution (of the source signatures), and stacking approaches are being established to maximize the signal content of DAS data. The resulting workflow of data management and processing could accelerate the broader adoption of DAS for continuous monitoring of critical processes.
Modeling And Simulation As The Basis For Hybridity In The Graphic Discipline Learning/Teaching Area

Directory of Open Access Journals (Sweden)

Jana Žiljak Vujić

2009-01-01

Full Text Available Only some fifteen years have passed since the scientific graphics discipline was established. In the transition from the College of Graphics to «Integrated Graphic Technology Studies» to the contemporary Faculty of Graphic Arts at the University of Zagreb, three main periods of development can be noted: digital printing, computer prepress, and automatic procedures in postpress packaging production. Computer technology has enabled a change in the methodology of teaching and studying graphics technology at the level of secondary and higher education. The task has been set to create tools for simulating printing processes so that the program can be mastered through a hybrid system consisting of two separate methods: learning with the help of digital models and checking in the actual real system. We are setting up a hybrid project for teaching because the overall acquired knowledge is the result of completely different methods. The first method works at the level of free programs and functions without consequences. Everything remains as a record in the knowledge database that can be analyzed, statistically processed, and repeated with new parameter values of the system being researched. The second method uses the actual real system, where the results prove the value of new knowledge, and this encourages and stimulates new cycles of hybrid behavior in mastering programs. This is the area where individual learning occurs. The hybrid method makes it possible to study actual situations on a computer model, prove them on an actual real model, and enter the area of learning that envisages future development.
Modeling and Simulation as the Basis for Hybridity in the Graphic Discipline Learning/Teaching Area

Directory of Open Access Journals (Sweden)

Vilko Ziljak

2009-11-01

Full Text Available Only some fifteen years have passed since the scientific graphics discipline was established. In the transition from the College of Graphics to «Integrated Graphic Technology Studies» to the contemporary Faculty of Graphic Arts at the University of Zagreb, three main periods of development can be noted: digital printing, computer prepress, and automatic procedures in postpress packaging production. Computer technology has enabled a change in the methodology of teaching and studying graphics technology at the level of secondary and higher education. The task has been set to create tools for simulating printing processes so that the program can be mastered through a hybrid system consisting of two separate methods: learning with the help of digital models and checking in the actual real system. We are setting up a hybrid project for teaching because the overall acquired knowledge is the result of completely different methods. The first method works at the level of free programs and functions without consequences. Everything remains as a record in the knowledge database that can be analyzed, statistically processed, and repeated with new parameter values of the system being researched. The second method uses the actual real system, where the results prove the value of new knowledge, and this encourages and stimulates new cycles of hybrid behavior in mastering programs. This is the area where individual learning occurs. The hybrid method makes it possible to study actual situations on a computer model, prove them on an actual real model, and enter the area of learning that envisages future development.
Fully 3-D list-mode positron emission tomography image reconstruction on a multi-GPU cluster

Energy Technology Data Exchange (ETDEWEB)

Cui, Jingyu [Stanford Univ., CA (United States). Dept. of Electrical Engineering; Prevrhal, Sven; Shao, Lingxiong [Philips Healthcare, San Jose, CA (United States); Pratx, Guillem [Stanford Univ., CA (United States). Dept. of Radiation Oncology; Levin, Craig S. [Stanford Univ., CA (United States). Dept. of Radiology, Electrical Engineering, and Physics; Stanford Univ., CA (United States). Molecular Imaging Program at Stanford (MIPS); Stanford Univ., CA (United States). School of Medicine

2011-07-01

List-mode processing is an efficient way of dealing with the sparse nature of PET data sets, and is the processing method of choice for time-of-flight (ToF) PET. We present a novel method of computing line projection operations required for list-mode ordered subsets expectation maximization (OSEM) for fully 3-D PET image reconstruction on a graphics processing unit (GPU) using the compute unified device architecture (CUDA) framework. Our method overcomes challenges such as compute thread divergence, and exploits GPU capabilities such as shared memory and atomic operations. When applied to line projection operations for list-mode time-of-flight PET, this new GPU-CUDA reformulation is 188X faster than a single-threaded reference CPU implementation. When embedded in a multi-process environment on a GPU-equipped small cluster, a speedup of 4X was observed over the same configuration but without GPU support. Image quality is preserved with root mean squared (RMS) deviation of 0.05% between CPU and GPU-generated images, which has negligible effect in typical clinical applications. (orig.)
Interactive Graphic Journalism

Directory of Open Access Journals (Sweden)

Laura Schlichting

2016-12-01

Full Text Available This paper examines graphic journalism (GJ) in a transmedial context, and argues that transmedial graphic journalism (TMGJ) is an important and fruitful new form of visual storytelling that will re-invigorate the field of journalism, as it steadily tests out and plays with new media, ultimately leading to new challenges in both the production and reception process. With TMGJ, linear narratives may be broken up, and ethical issues concerning the emotional and entertainment value are raised when it comes to ‘playing the news’. The aesthetic characteristics of TMGJ will be described and interactivity’s influence on non-fiction storytelling will be explored in an analysis of The Nisoor Square Shooting (2011) and Ferguson Firsthand (2015).
Acceleration of Meshfree Radial Point Interpolation Method on Graphics Hardware

International Nuclear Information System (INIS)

Nakata, Susumu

2008-01-01

This article describes a parallel computational technique that uses graphics hardware to accelerate a meshfree method based on the radial point interpolation method (RPIM). RPIM is a meshfree partial differential equation solver that does not require a mesh structure for the analysis target. In the method, the computation is divided into small processes suited to the parallel architecture of the graphics hardware and executed in a single-instruction, multiple-data manner.
Opticks : GPU Optical Photon Simulation for Particle Physics using NVIDIA® OptiX™

Science.gov (United States)

C, Blyth Simon

2017-10-01

Opticks is an open source project that integrates the NVIDIA OptiX GPU ray tracing engine with Geant4 toolkit based simulations. Massive parallelism brings drastic performance improvements, with optical photon simulation speedup expected to exceed 1000 times Geant4 when using workstation GPUs. Optical photon simulation time becomes effectively zero compared to the rest of the simulation. Optical photons from scintillation and Cherenkov processes are allocated, generated and propagated entirely on the GPU, minimizing transfer overheads and allowing CPU memory usage to be restricted to optical photons that hit photomultiplier tubes or other photon detectors. Collecting hits into standard Geant4 hit collections then allows the rest of the simulation chain to proceed unmodified. Optical physics processes of scattering, absorption, scintillator reemission and boundary processes are implemented in CUDA OptiX programs based on the Geant4 implementations. Wavelength-dependent material and surface properties, as well as inverse cumulative distribution functions for reemission, are interleaved into GPU textures, providing fast interpolated property lookup or wavelength generation. Geometry is provided to OptiX in the form of CUDA programs that return bounding boxes for each primitive and ray-geometry intersection positions. Some critical parts of the geometry, such as photomultiplier tubes, have been implemented analytically, with the remainder being tessellated. OptiX handles the creation and application of a choice of acceleration structures, such as bounding volume hierarchies, and the transparent use of multiple GPUs. OptiX supports interoperation with OpenGL and CUDA Thrust, which has enabled unprecedented visualisations of photon propagations to be developed, using OpenGL geometry shaders to provide interactive time scrubbing and CUDA Thrust photon indexing to enable interactive history selection.
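The idea of generating wavelengths from a tabulated inverse cumulative distribution function can be sketched on the CPU: the inverse CDF is sampled once into a table, and each new wavelength is then a single interpolated lookup at a uniform random number. The Python below is a stand-in for illustration, not Opticks code; a GPU texture fetch is imitated by linear interpolation, and the function names, the binned spectrum, and the table size are assumptions.

```python
import bisect
import random


def build_inverse_cdf(edges, bin_weights, table_size=1024):
    """Tabulate wavelength as a function of cumulative probability u in [0, 1].

    edges holds len(bin_weights) + 1 increasing wavelength bin edges, and the
    spectrum is treated as constant within each bin (an assumed model).
    """
    total = float(sum(bin_weights))
    cdf = [0.0]
    for weight in bin_weights:
        cdf.append(cdf[-1] + weight / total)
    cdf[-1] = 1.0  # guard against accumulated rounding error
    table = []
    for i in range(table_size):
        u = i / (table_size - 1)
        # Locate the bin whose CDF interval contains u, then invert linearly.
        k = min(bisect.bisect_right(cdf, u) - 1, len(bin_weights) - 1)
        span = cdf[k + 1] - cdf[k]
        frac = (u - cdf[k]) / span if span > 0 else 0.0
        table.append(edges[k] + frac * (edges[k + 1] - edges[k]))
    return table


def lookup(table, u):
    """Linearly interpolated table lookup, imitating a 1-D texture fetch."""
    pos = u * (len(table) - 1)
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])


# Hypothetical spectrum: bins 400-500 nm and 500-600 nm with weights 1 and 3.
table = build_inverse_cdf([400.0, 500.0, 600.0], [1.0, 3.0])
rng = random.Random(1)
wavelength = lookup(table, rng.random())
```

Building the table costs a search per entry, but that happens once; every subsequent sample is a constant-time lookup, which is the property that makes the approach attractive when millions of photons are generated.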
Graphics processing units accelerated semiclassical initial value representation molecular dynamics

Energy Technology Data Exchange (ETDEWEB)

Tamascelli, Dario; Dambrosio, Francesco Saverio [Dipartimento di Fisica, Università degli Studi di Milano, via Celoria 16, 20133 Milano (Italy); Conte, Riccardo [Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322 (United States); Ceotto, Michele, E-mail: michele.ceotto@unimi.it [Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano (Italy)

2014-05-07

This paper presents a Graphics Processing Units (GPUs) implementation of the Semiclassical Initial Value Representation (SC-IVR) propagator for vibrational molecular spectroscopy calculations. The time-averaging formulation of the SC-IVR for power spectrum calculations is employed. Details about the GPU implementation of the semiclassical code are provided. Four molecules with an increasing number of atoms are considered and the GPU-calculated vibrational frequencies perfectly match the benchmark values. The computational time scaling of two GPUs (NVIDIA Tesla C2075 and Kepler K20), respectively, versus two CPUs (Intel Core i5 and Intel Xeon E5-2687W) and the critical issues related to the GPU implementation are discussed. The resulting reduction in computational time and power consumption is significant and semiclassical GPU calculations are shown to be environmentally friendly.
Enabling Process Alignment for IT Entrepreneurship

Directory of Open Access Journals (Sweden)

Sonia D. Bot

2012-11-01

Full Text Available All firms use information technology (IT). Larger firms have IT organizations whose business function is to supply and manage IT infrastructure and applications to support the firm's business objectives. Regardless of whether the IT function has been outsourced or is resident within a firm, the objectives of the IT organization must be aligned to the strategic needs of the business. It is often a challenge to balance the demand for IT against the available supply within the firm. Most IT organizations have little capacity to carry out activities that go beyond the incremental ones that are needed to run the immediate needs of the business. A process-ambidexterity framework for IT improves the IT organization's entrepreneurial ability, which in turn, better aligns the IT function with the business functions in the firm. Process ambidexterity utilizes both process alignment and process adaptability. This article presents a framework for process alignment in IT. This is useful for understanding how the processes in Business Demand Management, a core component of the process-ambidexterity framework for IT, relate to those in IT Governance and IT Supply Chain Management. The framework is presented through three lenses (governance, business, and technology) along with real-world examples from major firms in the USA. Enabling process alignment in the IT function, and process ambidexterity overall, benefits those who govern IT, the executives who lead IT, as well as their peers in the business functions that depend on IT.
Graphics processing units in bioinformatics, computational biology and systems biology.

Science.gov (United States)

Nobile, Marco S; Cazzaniga, Paolo; Tangherloni, Andrea; Besozzi, Daniela

2017-09-01

Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), which limits their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining increasing attention from the scientific community, as they can considerably reduce the running time required by standard CPU-based software and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and the drawbacks in the use of these parallel architectures. The complete list of GPU-powered tools reviewed here is available at http://bit.ly/gputools. © The Author 2016. Published by Oxford University Press.
Implementation techniques and acceleration of DBPF reconstruction algorithm based on GPGPU for helical cone beam CT

International Nuclear Information System (INIS)

Shen Le; Xing Yuxiang

2010-01-01

The derivative back-projection filtered algorithm for helical cone-beam CT is a newly developed exact reconstruction method. Due to its large computational complexity, the reconstruction is rather slow for practical use. The general-purpose graphics processing unit (GPGPU) is a SIMD parallel hardware architecture with powerful floating-point capacity. In this paper, we propose a new method for PI-line choice and sampling grid, and a parallel PI-line reconstruction algorithm implemented on NVIDIA's Compute Unified Device Architecture (CUDA). Numerical simulation studies are carried out to validate our method. Compared with a conventional CPU implementation, the CUDA-accelerated method provides images of the same quality with a speedup factor of 318. Optimization strategies for the GPU acceleration are presented. Finally, the influence of the parameters of the PI-line samples on the reconstruction speed and image quality is discussed. (authors)
Genetic particle swarm parallel algorithm analysis of optimization arrangement on mistuned blades

Science.gov (United States)

Zhao, Tianyu; Yuan, Huiqun; Yang, Wenjun; Sun, Huagang

2017-12-01

This article introduces a method of mistuned parameter identification which consists of static frequency testing of blades, dichotomy and finite element analysis. A lumped parameter model of an engine bladed-disc system is then set up. A bladed arrangement optimization method, namely the genetic particle swarm optimization algorithm, is presented. It consists of a discrete particle swarm optimization and a genetic algorithm. From this, the local and global search ability is introduced. CUDA-based co-evolution particle swarm optimization, using a graphics processing unit, is presented and its performance is analysed. The results show that using optimization results can reduce the amplitude and localization of the forced vibration response of a bladed-disc system, while optimization based on the CUDA framework can improve the computing speed. This method could provide support for engineering applications in terms of effectiveness and efficiency.

General Purpose Graphics Processing Unit Based High-Rate Rice Decompression and Reed-Solomon Decoding

Energy Technology Data Exchange (ETDEWEB)

Loughry, Thomas A. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2015-02-01

As the volume of data acquired by space-based sensors increases, mission data compression/decompression and forward error correction code processing performance must likewise scale. This competency development effort was explored using the General Purpose Graphics Processing Unit (GPGPU) to accomplish high-rate Rice Decompression and high-rate Reed-Solomon (RS) decoding at the satellite mission ground station. Each algorithm was implemented and benchmarked on a single GPGPU. Distributed processing across one to four GPGPUs was also investigated. The results show that the GPGPU has considerable potential for performing satellite communication Data Signal Processing, with three times or better performance improvements and up to ten times reduction in cost over custom hardware, at least in the case of Rice Decompression and Reed-Solomon Decoding.
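As a point of reference for the Rice decompression mentioned above, here is a minimal sketch of a basic Golomb-Rice code with a fixed parameter `k`. It is not the report's implementation. The function names, the use of text bit strings, and the unary convention (a run of ones ended by a zero) are assumptions made for illustration, and production Rice coders usually choose `k` adaptively per block, which this sketch omits.

```python
def rice_encode(value, k):
    """Encode one non-negative integer as a Golomb-Rice code word (bit string)."""
    quotient = value >> k                # high part, sent in unary
    remainder = value & ((1 << k) - 1)   # low k bits, sent in binary
    bits = "1" * quotient + "0"          # assumed convention: ones ended by a zero
    if k:
        bits += format(remainder, "0{}b".format(k))
    return bits


def rice_decode(bits, k, count):
    """Decode `count` integers from a concatenation of Golomb-Rice code words."""
    values = []
    pos = 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "1":          # read the unary part
            quotient += 1
            pos += 1
        pos += 1                         # skip the terminating zero
        remainder = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append((quotient << k) | remainder)
    return values
```

Each code word has to be read sequentially, so a parallel decoder would more plausibly split the work across independently coded blocks than across bits; that reading is also an assumption rather than something the abstract states.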
Spatial issues in user interface design from a graphic design perspective

Science.gov (United States)

Marcus, Aaron

1989-01-01

The user interface of a computer system is a visual display that provides information about the status of operations on data within the computer and control options to the user that enable adjustments to these operations. From the very beginning of computer technology the user interface was a spatial display, although its spatial features were not necessarily complex or explicitly recognized by the users. All text and nonverbal signs appeared in a virtual space generally thought of as a single flat plane of symbols. Current technology of high performance workstations permits any element of the display to appear as dynamic, multicolor, 3-D signs in a virtual 3-D space. The complexity of appearance and the user's interaction with the display provide significant challenges to the graphic designer of current and future user interfaces. In particular, spatial depiction provides many opportunities for effective communication of objects, structures, processes, navigation, selection, and manipulation. Issues are presented that are relevant to the graphic designer seeking to optimize the user interface's spatial attributes for effective visual communication.
Graphical Model Theory for Wireless Sensor Networks

International Nuclear Information System (INIS)

Davis, William B.

2002-01-01

Information processing in sensor networks, with many small processors, demands a theory of computation that allows the minimization of processing effort, and the distribution of this effort throughout the network. Graphical model theory provides a probabilistic theory of computation that explicitly addresses complexity and decentralization for optimizing network computation. The junction tree algorithm, for decentralized inference on graphical probability models, can be instantiated in a variety of applications useful for wireless sensor networks, including: sensor validation and fusion; data compression and channel coding; expert systems, with decentralized data structures, and efficient local queries; pattern classification, and machine learning. Graphical models for these applications are sketched, and a model of dynamic sensor validation and fusion is presented in more depth, to illustrate the junction tree algorithm
Hybrid compression of video with graphics in DTV communication systems

NARCIS (Netherlands)

Schaar, van der M.; With, de P.H.N.

2000-01-01

Advanced broadcast manipulation of TV sequences and enhanced user interfaces for TV systems have resulted in an increased amount of pre- and post-editing of video sequences, where graphical information is inserted. However, in the current broadcasting chain, there are no provisions for enabling an
Spatial data infrastructures at work analysing the spatial enablement of public sector processes

CERN Document Server

Dessers, Ezra

2013-01-01

In 'Spatial Data Infrastructures at Work', Ezra Dessers introduces spatial enablement as a key concept to describe the realisation of SDI objectives in the context of individual public sector processes. Drawing on four years of research, Dessers argues that it has become essential, even unavoidable, to manage and (re)design inter-organisational process chains in order to further advance the role of SDIs as an enabling platform for a spatially enabled society. Detailed case studies illustrate that the process he describes is the setting in which one can see the SDI at work.
VQone MATLAB toolbox: A graphical experiment builder for image and video quality evaluations: VQone MATLAB toolbox.

Science.gov (United States)

Nuutinen, Mikko; Virtanen, Toni; Rummukainen, Olli; Häkkinen, Jukka

2016-03-01

This article presents VQone, a graphical experiment builder, written as a MATLAB toolbox, developed for image and video quality ratings. VQone contains the main elements needed for the subjective image and video quality rating process. This includes building and conducting experiments and data analysis. All functions can be controlled through graphical user interfaces. The experiment builder includes many standardized image and video quality rating methods. Moreover, it enables the creation of new methods or modified versions from standard methods. VQone is distributed free of charge under the terms of the GNU general public license and allows code modifications to be made so that the program's functions can be adjusted according to a user's requirements. VQone is available for download from the project page (http://www.helsinki.fi/psychology/groups/visualcognition/).
Finite difference numerical method for the superlattice Boltzmann transport equation and case comparison of CPU(C) and GPU(CUDA) implementations

Energy Technology Data Exchange (ETDEWEB)

Priimak, Dmitri

2014-12-01

We present a finite difference numerical algorithm for solving the two-dimensional spatially homogeneous Boltzmann transport equation, which describes electron transport in a semiconductor superlattice subject to crossed time-dependent electric and constant magnetic fields. The algorithm is implemented both in C targeted to the CPU and in CUDA C targeted to a commodity NVIDIA GPU. We compare the performance and merits of the two implementations and discuss various software optimisation techniques.
Finite difference numerical method for the superlattice Boltzmann transport equation and case comparison of CPU(C) and GPU(CUDA) implementations

International Nuclear Information System (INIS)

Priimak, Dmitri

2014-01-01

We present a finite difference numerical algorithm for solving the two-dimensional spatially homogeneous Boltzmann transport equation, which describes electron transport in a semiconductor superlattice subject to crossed time-dependent electric and constant magnetic fields. The algorithm is implemented both in C targeted to the CPU and in CUDA C targeted to a commodity NVIDIA GPU. We compare the performance and merits of the two implementations and discuss various software optimisation techniques.
RELIEVE: A FORTRAN 77 program for numerical and graphical processing of digital topographic maps

International Nuclear Information System (INIS)

Sanchez, J.J.; Gorostiza, C.

1995-01-01

The RELIEVE program was developed for integration with the expert system SIRENAS, within the framework of the Industrial Risks Programme at the CIEMAT center. Building that system required an additional component able to analyze the topography (Spanish: relieve) of the territory in which the site of interest is located; this is the purpose of the RELIEVE program. RELIEVE analyses the digitized data points of a given topographic area around a location of interest. Using numerical techniques from the IMSL library, the program estimates the depth, width, and other geometrical characteristics of the valley involved. Optionally, RELIEVE also produces graphical output, including a 3D representation of the topographic map, contour lines, and cross-sections of interest in the valley, by means of the DISSPLA II library running on the CIEMAT IBM system. (Author) 5 refs
Extending Graphic Statics for User-Controlled Structural Morphogenesis

OpenAIRE

Fivet, Corentin; Zastavni, Denis; Cap, Jean-François; Structural Morphology Group International Seminar 2011

2011-01-01

The first geometrical definitions of any structure are of primary importance when considering pertinence and efficiency in structural design processes. Engineering history has taught us how graphic statics can be a very powerful tool since it allows the designer to take shapes and forces into account simultaneously. However, current and past graphic statics methods are more suitable for analysis than structural morphogenesis. This contribution introduces new graphical methods that can supp...
Knowledge Management Enablers and Process in Hospital Organizations.

Science.gov (United States)

Lee, Hyun-Sook

2017-02-01

This research aimed to investigate the effects of knowledge management enablers, such as organizational structure, leadership, learning, information technology systems, trust, and collaboration, on the knowledge management process of creation, storage, sharing, and application. Using data from self-administered questionnaires in four Korean tertiary hospitals, this survey investigated the main organizational factors affecting the knowledge management process in these organizations. A total of 779 questionnaires were analyzed using SPSS 18.0 and AMOS 18.0. The results showed that organizational factors affect the knowledge management process differently in each hospital organization. From a managerial perspective, the implications of these factors for developing organizational strategies that encourage and foster the knowledge management process are discussed.
Enabling High Performance Large Scale Dense Problems through KBLAS

KAUST Repository

Abdelfattah, Ahmad; Keyes, David E.; Ltaief, Hatem

2014-01-01

KBLAS (KAUST BLAS) is a small library that provides highly optimized BLAS routines on systems accelerated with GPUs. KBLAS is entirely written in CUDA C, and targets NVIDIA GPUs with compute capability 2.0 (Fermi) or higher. The current focus
Rough surface scattering simulations using graphics cards

International Nuclear Information System (INIS)

Klapetek, Petr; Valtr, Miroslav; Poruba, Ales; Necas, David; Ohlidal, Miloslav

2010-01-01

In this article we present results of rough surface scattering calculations using a graphical processing unit implementation of the Finite Difference in Time Domain algorithm. Numerical results are compared to real measurements and computational performance is compared to computer processor implementation of the same algorithm. As a basis for computations, atomic force microscope measurements of surface morphology are used. It is shown that the graphical processing unit capabilities can be used to speedup presented computationally demanding algorithms without loss of precision.
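To illustrate the kind of update loop that the Finite Difference in Time Domain method repeats, the following is a reduced one-dimensional sketch in normalized units. It is not the authors' rough-surface code. The grid size, the Courant number of 0.5, the Gaussian soft source, and the function name are all assumptions chosen for illustration.

```python
import math


def fdtd_1d(n_cells, n_steps, source_cell):
    """Leapfrog update of Ez and Hy on a 1D staggered grid (normalized units)."""
    courant = 0.5                        # assumed value; stable in 1D
    ez = [0.0] * n_cells                 # electric field at cell nodes
    hy = [0.0] * (n_cells - 1)           # magnetic field between nodes
    for step in range(n_steps):
        # Every hy[i] depends only on ez values from the previous half step,
        # so all cells can be updated in parallel (one GPU thread per cell).
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # End nodes are never updated: perfectly conducting boundaries.
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft source: add a Gaussian pulse in time at one cell.
        ez[source_cell] += math.exp(-((step - 30.0) / 10.0) ** 2)
    return ez
```

Both inner loops touch each cell independently, which is the property that makes the method a natural fit for graphics hardware.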
Multi-GPU parallel algorithm design and analysis for improved inversion of probability tomography with gravity gradiometry data

Science.gov (United States)

Hou, Zhenlong; Huang, Danian

2017-09-01

In this paper, we first study the inversion of probability tomography (IPT) with gravity gradiometry data. The spatial resolution of the results is improved by multi-tensor joint inversion, a depth weighting matrix, and other methods. To address the problems posed by large exploration data sets, we present a parallel algorithm, with a performance analysis, that combines Compute Unified Device Architecture (CUDA) with Open Multi-Processing (OpenMP) for Graphics Processing Unit (GPU) acceleration. Tests on a synthetic model and on real data from Vinton Dome give improved results and show that the improved inversion algorithm is effective and feasible. The parallel algorithm we designed performs better than other CUDA-based ones; the maximum speedup can exceed 200. In the performance analysis, multi-GPU speedup and multi-GPU efficiency are applied to analyze the scalability of the multi-GPU programs. The designed parallel algorithm is shown to be able to process larger-scale data, and the new analysis method is practical.
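The abstract does not define its two scalability metrics. A minimal sketch under the common definitions (speedup as a ratio of run times, efficiency as speedup divided by the number of devices) could look as follows; the function names and these definitions are assumptions, not taken from the paper.

```python
def speedup(t_reference, t_parallel):
    """Ratio of a reference run time to a parallel run time."""
    return t_reference / t_parallel


def multi_gpu_efficiency(t_single_gpu, t_multi_gpu, n_gpus):
    """Speedup over a single GPU divided by the number of GPUs (1.0 is ideal)."""
    return speedup(t_single_gpu, t_multi_gpu) / n_gpus
```

For example, with hypothetical timings of 10 s on one GPU and 3 s on four GPUs, the efficiency would be (10 / 3) / 4, roughly 0.83.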
Critical frameworks for graphic design: graphic design and visual culture

OpenAIRE

Dauppe, Michele-Anne

2011-01-01

The paper considers an approach to the study of graphic design which addresses the expanding nature of graphic design in the 21st century and the purposeful application of theory to the subject of graphic design. In recent years graphic design has expanded its domain from the world of print culture (e.g. books, posters) into what is sometimes called screen culture. Everything from a mobile phone to a display in an airport lounge to the A.T.M. carries graphic design. It has become ever more ub...
Algorithms for GPU-based molecular dynamics simulations of complex fluids: Applications to water, mixtures, and liquid crystals.

Science.gov (United States)

Kazachenko, Sergey; Giovinazzo, Mark; Hall, Kyle Wm; Cann, Natalie M

2015-09-15

A custom code for molecular dynamics simulations has been designed to run on CUDA-enabled NVIDIA graphics processing units (GPUs). The double-precision code simulates multicomponent fluids, with intramolecular and intermolecular forces, coarse-grained and atomistic models, holonomic constraints, Nosé-Hoover thermostats, and the generation of distribution functions. Algorithms to compute Lennard-Jones and Gay-Berne interactions, and the electrostatic force using Ewald summations, are discussed. A neighbor list is introduced to improve scaling with respect to system size. Three test systems are examined: SPC/E water; an n-hexane/2-propanol mixture; and a liquid crystal mesogen, 2-(4-butyloxyphenyl)-5-octyloxypyrimidine. Code performance is analyzed for each system. With one GPU, a 33-119 fold increase in performance is achieved compared with the serial code while the use of two GPUs leads to a 69-287 fold improvement and three GPUs yield a 101-377 fold speedup. © 2015 Wiley Periodicals, Inc.
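The role of the neighbor list can be shown with a minimal sketch of a Lennard-Jones force evaluation. It is written in plain Python rather than CUDA and is not the authors' code. The function names, the skin distance, the reduced units, and the absence of periodic boundaries are assumptions made for illustration.

```python
def build_neighbor_list(positions, cutoff, skin):
    """For each particle i, list the particles j > i within cutoff + skin."""
    reach2 = (cutoff + skin) ** 2
    neighbors = [[] for _ in positions]
    for i, (xi, yi, zi) in enumerate(positions):
        for j in range(i + 1, len(positions)):
            xj, yj, zj = positions[j]
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d2 < reach2:
                neighbors[i].append(j)
    return neighbors


def lennard_jones_forces(positions, neighbors, cutoff, epsilon=1.0, sigma=1.0):
    """Forces from U(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6), truncated at cutoff."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    cutoff2 = cutoff * cutoff
    for i, near in enumerate(neighbors):
        for j in near:
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
            if r2 >= cutoff2:
                continue                 # listed because of the skin, but out of range
            sr6 = (sigma * sigma / r2) ** 3
            scale = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += scale * d[k]   # force on i points along d when repulsive
                forces[j][k] -= scale * d[k]   # Newton's third law
    return forces
```

Building the list is still quadratic in this sketch, but the list can be reused for several time steps until some particle has moved farther than half the skin, so the per-step force loop only visits nearby pairs.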
Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

Energy Technology Data Exchange (ETDEWEB)

Ronald Babich, Michael Clark, Balint Joo

2010-11-01

Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This library, interfaced to the QDP++/Chroma framework for LQCD calculations, is currently in production use on the "9g" cluster at the Jefferson Laboratory, enabling unprecedented price/performance for a range of problems in LQCD. Nevertheless, memory constraints on current GPU devices limit the problem sizes that can be tackled. In this contribution we describe the parallelization of the QUDA library onto multiple GPUs using MPI, including strategies for the overlapping of communication and computation. We report on both weak and strong scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in excess of 4 Tflops.
Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

International Nuclear Information System (INIS)

Babich, Ronald; Clark, Michael; Joo, Balint

2010-01-01

Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This library, interfaced to the QDP++/Chroma framework for LQCD calculations, is currently in production use on the '9g' cluster at the Jefferson Laboratory, enabling unprecedented price/performance for a range of problems in LQCD. Nevertheless, memory constraints on current GPU devices limit the problem sizes that can be tackled. In this contribution we describe the parallelization of the QUDA library onto multiple GPUs using MPI, including strategies for the overlapping of communication and computation. We report on both weak and strong scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in excess of 4 Tflops.
Web-based execution of graphical work-flows: a modular platform for multifunctional scientific process automation

International Nuclear Information System (INIS)

De Ley, E.; Jacobs, D.; Ounsy, M.

2012-01-01

The Passerelle process automation suite offers a fundamentally modular solution platform, based on a layered integration of several best-of-breed technologies. It has been successfully applied by Synchrotron Soleil as the sequencer for data acquisition and control processes on its beamlines, integrated with TANGO as a control bus and GlobalScreen(TM) as the SCADA package. Since last year it has been used as the graphical work-flow component for the development of an Eclipse-based Data Analysis Work Bench at ESRF. The top layer of Passerelle exposes an actor-based development paradigm, based on the Ptolemy framework (UC Berkeley). Actors provide explicit reusability and strong decoupling, combined with an inherently concurrent execution model. Actor libraries exist for TANGO integration, web services, database operations, flow control, rules-based analysis, mathematical calculations, launching external scripts, etc. Passerelle's internal architecture is based on OSGi, the major Java framework for modular service-based applications. A large set of modules exists that can be recombined as desired to obtain different features and deployment models. Besides desktop versions of the Passerelle work-flow workbench, there is also the Passerelle Manager. It is a secured web application, including a graphical editor, for centralized design, execution, management and monitoring of process flows, integrating standard Java Enterprise services with OSGi. We will present the internal technical architecture, some interesting application cases and the lessons learnt. (authors)
Integration of rocket turbine design and analysis through computer graphics

Science.gov (United States)

Hsu, Wayne; Boynton, Jim

1988-01-01

An interactive approach with engineering computer graphics is used to integrate the design and analysis processes of a rocket engine turbine into a progressive and iterative design procedure. The processes are interconnected through pre- and postprocessors. The graphics are used to generate the blade profiles, their stacking, finite element generation, and analysis presentation through color graphics. Steps of the design process discussed include pitch-line design, axisymmetric hub-to-tip meridional design, and quasi-three-dimensional analysis. The viscous two- and three-dimensional analysis codes are executed after acceptable designs are achieved and estimates of initial losses are confirmed.

A Graphical Solution for Espaces Verts

CERN Document Server

Skelton, K

1999-01-01

'Espaces Verts' is responsible for the landscaping of the green areas, the cleaning of the roads, pavements, and car parks on the CERN site. This work is carried out by a contracting company. To control the work previously, there was a database of all the areas included in the contract and paper plans of the site. Given the size of the site the ideal solution was considered to be a visual system which integrates the maps and the database. To achieve this, the Surveying Department's graphical information system was used, linking it to the database for Espaces Verts, thus enabling the presentation of graphical thematic queries. This provides a useful management tool, which facilitates the task of ensuring that the contracting company carries out the work according to the agreed planning, and gives precise measurement of the site and thus of the contract. This paper will present how this has been achieved.
FAST CALCULATION OF THE LOMB-SCARGLE PERIODOGRAM USING GRAPHICS PROCESSING UNITS

International Nuclear Information System (INIS)

Townsend, R. H. D.

2010-01-01

I introduce a new code for fast calculation of the Lomb-Scargle periodogram that leverages the computing power of graphics processing units (GPUs). After establishing a background to the newly emergent field of GPU computing, I discuss the code design and narrate key parts of its source. Benchmarking calculations indicate no significant differences in accuracy compared to an equivalent CPU-based code. However, the differences in performance are pronounced; running on a low-end GPU, the code can match eight CPU cores, and on a high-end GPU it is faster by a factor approaching 30. Applications of the code include analysis of long photometric time series obtained by ongoing satellite missions and upcoming ground-based monitoring facilities, and Monte Carlo simulation of periodogram statistical properties.
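For orientation, the quantity being accelerated can be written as a short reference implementation of the classical Lomb-Scargle periodogram. This is a plain-Python sketch of the textbook formula, without variance normalization, and not the code described in the paper; the function names are assumptions.

```python
import math


def lomb_scargle_power(times, values, omega):
    """Classical Lomb-Scargle power at one angular frequency omega > 0."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    # The offset tau makes the result independent of the time origin.
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    yc = ys = cc = ss = 0.0
    for t, y in zip(times, centered):
        c = math.cos(omega * (t - tau))
        s = math.sin(omega * (t - tau))
        yc += y * c
        ys += y * s
        cc += c * c
        ss += s * s
    return 0.5 * (yc * yc / cc + ys * ys / ss)


def periodogram(times, values, omegas):
    """Evaluate the power at every frequency in omegas."""
    # Frequencies do not depend on one another, so on a GPU this loop
    # would map naturally to one thread per frequency.
    return [lomb_scargle_power(times, values, w) for w in omegas]
```

The cost is proportional to the number of samples times the number of frequencies, and the independence of the frequencies is what makes the calculation a good match for a GPU.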
General purpose graphics-processing-unit implementation of cosmological domain wall network evolution.

Science.gov (United States)

Correia, J R C C C; Martins, C J A P

2017-10-01

Topological defects unavoidably form at symmetry breaking phase transitions in the early universe. To probe the parameter space of theoretical models and set tighter experimental constraints (exploiting the recent advances in astrophysical observations), one requires more and more demanding simulations, and therefore more hardware resources and computation time. Improving the speed and efficiency of existing codes is essential. Here we present a general purpose graphics-processing-unit implementation of the canonical Press-Ryden-Spergel algorithm for the evolution of cosmological domain wall networks. This is ported to the Open Computing Language standard, and as a consequence significant speedups are achieved both in two-dimensional (2D) and 3D simulations.
General purpose graphics-processing-unit implementation of cosmological domain wall network evolution

Science.gov (United States)

Correia, J. R. C. C. C.; Martins, C. J. A. P.

2017-10-01

Topological defects unavoidably form at symmetry breaking phase transitions in the early universe. To probe the parameter space of theoretical models and set tighter experimental constraints (exploiting the recent advances in astrophysical observations), one requires more and more demanding simulations, and therefore more hardware resources and computation time. Improving the speed and efficiency of existing codes is essential. Here we present a general purpose graphics-processing-unit implementation of the canonical Press-Ryden-Spergel algorithm for the evolution of cosmological domain wall networks. This is ported to the Open Computing Language standard, and as a consequence significant speedups are achieved both in two-dimensional (2D) and 3D simulations.
An Application of Graphics Processing Units to Geosimulation of Collective Crowd Behaviour

Directory of Open Access Journals (Sweden)

Cjoskāns Jānis

2017-12-01

Full Text Available The goal of the paper is to assess ways of improving the computational performance and efficiency of collective crowd behaviour simulation by using parallel computing methods implemented on a graphics processing unit (GPU). To perform an experimental evaluation of the benefits of parallel computing, a new GPU-based simulator prototype is proposed and its runtime performance is analysed. Based on practical examples of pedestrian dynamics geosimulation, the obtained performance measurements are compared to several other available multiagent simulation tools to determine the efficiency of the proposed simulator, as well as to provide generic guidelines for improving the efficiency of parallel simulation of collective crowd behaviour.
Solution of relativistic quantum optics problems using clusters of graphical processing units

Energy Technology Data Exchange (ETDEWEB)

Gordon, D.F., E-mail: daviel.gordon@nrl.navy.mil; Hafizi, B.; Helle, M.H.

2014-06-15

Numerical solution of relativistic quantum optics problems requires high performance computing due to the rapid oscillations in a relativistic wavefunction. Clusters of graphical processing units are used to accelerate the computation of a time dependent relativistic wavefunction in an arbitrary external potential. The stationary states in a Coulomb potential and uniform magnetic field are determined analytically and numerically, so that they can be used as initial conditions in fully time dependent calculations. Relativistic energy levels in extreme magnetic fields are recovered as a means of validation. The relativistic ionization rate is computed for an ion illuminated by a laser field near the usual barrier suppression threshold, and the ionizing wavefunction is displayed.
Granatum: a graphical single-cell RNA-Seq analysis pipeline for genomics scientists.

Science.gov (United States)

Zhu, Xun; Wolfgruber, Thomas K; Tasato, Austin; Arisdakessian, Cédric; Garmire, David G; Garmire, Lana X

2017-12-05

Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq data are not very accessible to bench scientists as they require a significant amount of bioinformatic skills. We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene-expression normalization, imputation, gene filtering, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein network interaction visualization, and pseudo-time cell series construction. Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at http://garmiregroup.org/granatum/app.
Use of general purpose graphics processing units with MODFLOW

Science.gov (United States)

Hughes, Joseph D.; White, Jeremy T.

2013-01-01

To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
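The two building blocks named in the abstract, compressed sparse row storage and a Jacobi-preconditioned conjugate gradient iteration, can be sketched in a few lines. This is a generic textbook version in plain Python, not the UPCG solver itself. The function names and the stopping rule (residual norm below `tol`) are assumptions; the abstract describes head and flow criteria instead.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix A stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):            # rows are independent of one another
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def jacobi_pcg(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A with Jacobi-preconditioned CG."""
    n = len(b)
    inv_diag = [0.0] * n                 # Jacobi preconditioner: 1 / diagonal of A
    for row in range(n):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            if col_idx[k] == row:
                inv_diag[row] = 1.0 / values[k]
    x = [0.0] * n
    r = list(b)                          # residual for the zero initial guess
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rz / dot(p, ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x
```

Every step inside the loop is a matrix-vector product, a dot product, or an element-wise vector update, which is consistent with the abstract's remark that data needs to cross between CPU and GPGPU only before the first iteration and after the last.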
A graphical user interface (gui) matlab program Synthetic_Ves For ...

African Journals Online (AJOL)

An interactive and robust computer program for 1D forward modeling of Schlumberger Vertical Electrical Sounding (VES) curves for multilayered earth models is presented. The Graphical User Interface (GUI) enabled software, written in MATLAB v.7.12.0.635 (R2011a), accepts user-defined geologic model parameters (i.e. ...
Influence of intrinsic and extrinsic forces on 3D stress distribution using CUDA programming

Science.gov (United States)

Räss, Ludovic; Omlin, Samuel; Podladchikov, Yuri

2013-04-01

In order to have a better understanding of the influence of buoyancy (intrinsic) and boundary (extrinsic) forces in a nonlinear rheology due to a power law fluid, some basics need to be explored through 3D numerical calculation. As a first approach, the already studied Stokes setup of a rising sphere will be used to calibrate the 3D model. Far-field horizontal tectonic stress is applied to the sphere, which generates a buoyancy-driven vertical acceleration. This simple and known setup allows some benchmarking through systematic runs. The relative importance of intrinsic and extrinsic forces producing the wide variety of rates and styles of deformation, including absence of deformation and generating 3D stress patterns, will be determined. The relation between vertical motion and power law exponent will also be explored. The goal of these investigations will be to run models having topography and density structure from geophysical imaging as input, and the 3D stress field as output. The stress distribution in the Swiss Alps and Plateau and its implication for risk analysis is one of the perspectives for this research. In fact, the proximity of the stress to failure is fundamental for risk assessment. The sensitivity of this to accurate topography representation can then be evaluated. The developed 3D numerical codes, tuned for mid-sized clusters, need to be optimized, especially when running at good resolution in full 3D. Therefore, two widely used computing platforms, MATLAB and FORTRAN 90, are explored, starting with an easily adaptable and as-short-as-possible MATLAB code, which is then upgraded in order to reach higher performance in simulation times and resolution. A significant speedup using the rising NVIDIA CUDA technology and resources is also possible. Programming in C-CUDA, creating some synchronization features, and comparing the results with previous runs help us to investigate the new speedup possibilities allowed through GPU parallel computing. These codes
High-performance method of morphological medical image processing

Directory of Open Access Journals (Sweden)

Ryabykh M. S.

2016-07-01

Full Text Available The article presents an implementation of the van Herk/Gil-Werman (vHGW) grayscale morphology algorithm for extracting borders in medical images. Image processing is executed using OpenMP and NVIDIA CUDA technology for images of different resolutions and structuring elements of different sizes.
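The vHGW idea can be illustrated on a single image row. The sketch below is a minimal sequential Python version of 1D grayscale dilation with a flat window, assuming an odd window length and padding with minus infinity at the borders. It is not the article's OpenMP or CUDA code, and the function name is a hypothetical choice.

```python
import math


def vhgw_dilate_1d(signal, k):
    """Grayscale dilation of a 1D signal with a flat window of odd length k."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    n = len(signal)
    r = k // 2
    neg_inf = -math.inf
    # Pad r samples on each side, then extend to a whole number of blocks.
    padded = [neg_inf] * r + list(signal) + [neg_inf] * r
    padded += [neg_inf] * (-len(padded) % k)
    m = len(padded)
    # g: running maximum counted from the start of each block of k samples.
    g = padded[:]
    for i in range(m):
        if i % k != 0:
            g[i] = max(g[i - 1], padded[i])
    # h: running maximum counted from the end of each block.
    h = padded[:]
    for i in range(m - 2, -1, -1):
        if i % k != k - 1:
            h[i] = max(h[i + 1], padded[i])
    # A window [i, i + k - 1] overlaps at most two blocks, so one max suffices.
    return [max(h[i], g[i + k - 1]) for i in range(n)]


print(vhgw_dilate_1d([1, 3, 2, 5, 4, 0, 7], 3))
```

The number of comparisons per sample stays constant as the window grows, which is why the method suits large structuring elements. Erosion follows by replacing max with min and padding with plus infinity.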
Topographic Digital Raster Graphics - USGS DIGITAL RASTER GRAPHICS

Data.gov (United States)

NSGIC Local Govt | GIS Inventory — USGS Topographic Digital Raster Graphics downloaded from LABINS (http://data.labins.org/2003/MappingData/drg/drg_stpl83.cfm). A digital raster graphic (DRG) is a...
Vega-Lite: A Grammar of Interactive Graphics.

Science.gov (United States)

Satyanarayan, Arvind; Moritz, Dominik; Wongsuphasawat, Kanit; Heer, Jeffrey

2017-01-01

We present Vega-Lite, a high-level grammar that enables rapid specification of interactive data visualizations. Vega-Lite combines a traditional grammar of graphics, providing visual encoding rules and a composition algebra for layered and multi-view displays, with a novel grammar of interaction. Users specify interactive semantics by composing selections. In Vega-Lite, a selection is an abstraction that defines input event processing, points of interest, and a predicate function for inclusion testing. Selections parameterize visual encodings by serving as input data, defining scale extents, or by driving conditional logic. The Vega-Lite compiler automatically synthesizes requisite data flow and event handling logic, which users can override for further customization. In contrast to existing reactive specifications, Vega-Lite selections decompose an interaction design into concise, enumerable semantic units. We evaluate Vega-Lite through a range of examples, demonstrating succinct specification of both customized interaction methods and common techniques such as panning, zooming, and linked selection.
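The selection abstraction described in this abstract (event processing, an extent of interest, and an inclusion predicate that drives conditional encoding) can be mimicked in a few lines of Python. This is only a conceptual sketch: the class and function names are hypothetical, it is not Vega-Lite syntax or its API, and it treats an empty selection as selecting nothing, which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class IntervalSelection:
    """Hypothetical stand-in for an interval (brush) selection over one field."""
    field: str
    extent: Optional[Tuple[float, float]] = None  # None means nothing selected

    def on_drag(self, start: float, end: float) -> None:
        # Input event processing: a drag gesture sets the selected extent.
        self.extent = (min(start, end), max(start, end))

    def contains(self, datum: dict) -> bool:
        # Predicate function used for inclusion testing.
        if self.extent is None:
            return False
        lo, hi = self.extent
        return lo <= datum[self.field] <= hi


def color_for(datum, selection, selected="steelblue", fallback="grey"):
    # Conditional encoding: the colour depends on selection membership.
    return selected if selection.contains(datum) else fallback


cars = [{"hp": 90}, {"hp": 150}, {"hp": 210}]
brush = IntervalSelection(field="hp")
brush.on_drag(200, 100)  # dragging right-to-left still yields [100, 200]
print([color_for(d, brush) for d in cars])
```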
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

Science.gov (United States)

Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

2016-04-01

Advances in graphics processing unit technology towards parallel architectures [1], comprising thousands of cores and multiple parallel threads, provide the hardware foundation for the rapid processing of various parallel applications in seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are established and decades of data are compiled together [2]. Yet, many processes in seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, being independent of one another, can be performed in parallel, narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using CUDA C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel for comparison, are: fuzzy k-means clustering with expert knowledge [7] in assigning the overall number of clusters; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, CUDA C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufmann Publishers, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and
Ways of explaining sexual harassment: motivating, enabling and legitimizing processes

OpenAIRE

Diehl, Charlotte

2014-01-01

This dissertation aims to contribute to a comprehensive explanation of sexual harassment by the investigation of three social-psychological processes, which seem to crucially contribute to the etiology of sexual harassment: motivation to sexually harass (e.g., power or sexuality), enabling processes (e.g., through diverse situational cues), and legitimization of sexually harassing behavior (e.g., by applying myths about sexual harassment). By consolidating these three processes into one multi...
Graphics in DAQSIM

International Nuclear Information System (INIS)

Wang, C.C.; Booth, A.W.; Chen, Y.M.; Botlo, M.

1993-06-01

At the Superconducting Super Collider Laboratory (SSCL) a tool called DAQSIM has been developed to study the behavior of Data Acquisition (DAQ) systems. This paper reports and discusses the graphics used in DAQSIM. DAQSIM graphics include a graphical user interface (GUI), animation, debugging, and control facilities. DAQSIM graphics not only provide a convenient DAQ simulation environment but also serve as an efficient manager in simulation development and verification.
SU-E-P-59: A Graphical Interface for XCAT Phantom Configuration, Generation and Processing

International Nuclear Information System (INIS)

Myronakis, M; Cai, W; Dhou, S; Cifter, F; Lewis, J; Hurwitz, M

2015-01-01

Purpose: To design a comprehensive open-source, publicly available, graphical user interface (GUI) to facilitate the configuration, generation, processing and use of the 4D Extended Cardiac-Torso (XCAT) phantom. Methods: The XCAT phantom includes over 9000 anatomical objects as well as respiratory, cardiac and tumor motion. It is widely used for research studies in medical imaging and radiotherapy. The phantom generation process involves the configuration of a text script to parameterize the geometry, motion, and composition of the whole body and objects within it, and to generate simulated PET or CT images. To avoid the need for manual editing or script writing, our MATLAB-based GUI uses slider controls, drop-down lists, buttons and graphical text input to parameterize and process the phantom. Results: Our GUI can be used to: a) generate parameter files; b) generate the voxelized phantom; c) combine the phantom with a lesion; d) display the phantom; e) produce average and maximum intensity images from the phantom output files; f) incorporate irregular patient breathing patterns; and g) generate DICOM files containing phantom images. The GUI provides local help information using tool-tip strings on the currently selected phantom, minimizing the need for external documentation. The DICOM generation feature is intended to simplify the process of importing the phantom images into radiotherapy treatment planning systems or other clinical software. Conclusion: The GUI simplifies and automates the use of the XCAT phantom for imaging-based research projects in medical imaging or radiotherapy. This has the potential to accelerate research conducted with the XCAT phantom, or to ease the learning curve for new users. This tool does not include the XCAT phantom software itself. We would like to acknowledge funding from MRA, Varian Medical Systems Inc.
SU-E-P-59: A Graphical Interface for XCAT Phantom Configuration, Generation and Processing

Energy Technology Data Exchange (ETDEWEB)

Myronakis, M; Cai, W; Dhou, S; Cifter, F; Lewis, J [Brigham and Women’s Hospital, Boston, MA (United States)]; Hurwitz, M [Newton, MA (United States)]

2015-06-15

Purpose: To design a comprehensive open-source, publicly available, graphical user interface (GUI) to facilitate the configuration, generation, processing and use of the 4D Extended Cardiac-Torso (XCAT) phantom. Methods: The XCAT phantom includes over 9000 anatomical objects as well as respiratory, cardiac and tumor motion. It is widely used for research studies in medical imaging and radiotherapy. The phantom generation process involves the configuration of a text script to parameterize the geometry, motion, and composition of the whole body and objects within it, and to generate simulated PET or CT images. To avoid the need for manual editing or script writing, our MATLAB-based GUI uses slider controls, drop-down lists, buttons and graphical text input to parameterize and process the phantom. Results: Our GUI can be used to: a) generate parameter files; b) generate the voxelized phantom; c) combine the phantom with a lesion; d) display the phantom; e) produce average and maximum intensity images from the phantom output files; f) incorporate irregular patient breathing patterns; and g) generate DICOM files containing phantom images. The GUI provides local help information using tool-tip strings on the currently selected phantom, minimizing the need for external documentation. The DICOM generation feature is intended to simplify the process of importing the phantom images into radiotherapy treatment planning systems or other clinical software. Conclusion: The GUI simplifies and automates the use of the XCAT phantom for imaging-based research projects in medical imaging or radiotherapy. This has the potential to accelerate research conducted with the XCAT phantom, or to ease the learning curve for new users. This tool does not include the XCAT phantom software itself. We would like to acknowledge funding from MRA, Varian Medical Systems Inc.
Risk Management Collaboration through Sharing Interactive Graphics

Science.gov (United States)

Slingsby, Aidan; Dykes, Jason; Wood, Jo; Foote, Matthew

2010-05-01

Risk management involves the cooperation of scientists, underwriters and actuaries, all of whom analyse data to support decision-making. Results are often disseminated through static documents with graphics that convey the message the analyst wishes to communicate. Interactive graphics are an increasingly popular means of communicating the results of data analyses because they enable other parties to explore and visually analyse some of the data themselves prior to and during discussion. Discussion around interactive graphics can occur synchronously in face-to-face meetings or with video-conferencing and screen sharing, or asynchronously through web-sites such as ManyEyes, web-based fora, blogs, wikis and email. A limitation of approaches that do not involve screen sharing is the difficulty in sharing the results of insights from interacting with the graphic. Static images with accompanying commentary can be shared, but the images themselves cannot be interacted with, producing a discussion bottleneck (Baker, 2008). We address this limitation by allowing the state and configuration of graphics to be shared (rather than static images) so that a user can reproduce someone else's graphic, interact with it and then share the results accompanied by some commentary. HiVE (Slingsby et al, 2009) is a compact and intuitive text-based language that has been designed for this purpose. We will describe the vizTweets project (a 9-month project funded by JISC) in which we are applying these principles to insurance risk management in the context of the Willis Research Network (the world's largest collaboration between the insurance industry and academia). The project aims to extend HiVE to meet the needs of the sector, to design and implement freely available web services and tools, and to provide case studies. We will present a case study that demonstrates the potential of this approach for collaboration within the Willis Research Network. Baker, D. Towards Transparency in Visualisation Based
Srijan: a graphical toolkit for sensor network macroprogramming

OpenAIRE

Pathak, Animesh; Gowda, Mahanth K.

2009-01-01

International audience; Macroprogramming is an application development technique for wireless sensor networks (WSNs) where the developer specifies the behavior of the system, as opposed to that of the constituent nodes. In this proposed demonstration, we would like to present Srijan, a toolkit that enables application development for WSNs in a graphical manner using data-driven macroprogramming. It can be used in various stages of application development, viz. i) specification of application ...

Smoke simulation for fire engineering using a multigrid method on graphics hardware

DEFF Research Database (Denmark)

Glimberg, Stefan; Erleben, Kenny; Bennetsen, Jens

2009-01-01

interactive physical simulation for engineering purposes has the benefit of reducing production turn-around time. We have measured speed-ups by a factor of up to 350 compared to existing CPU-based solvers. The present CUDA-based solver promises huge potential economic benefits, as well...
Pseudo-random number generators for Monte Carlo simulations on ATI Graphics Processing Units

Science.gov (United States)

Demchik, Vadim

2011-03-01

Basic uniform pseudo-random number generators are implemented on ATI Graphics Processing Units (GPU). The performance results of the implemented generators (multiplicative linear congruential (GGL), XOR-shift (XOR128), RANECU, RANMAR, RANLUX and Mersenne Twister (MT19937)) on CPU and GPU are discussed. The speed-up obtained on the GPU is a factor of hundreds in comparison with the CPU. The RANLUX generator is found to be the most appropriate for use on GPUs in Monte Carlo simulations. A brief review of the pseudo-random number generators used in modern software packages for Monte Carlo simulations in high-energy physics is presented.
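Two of the simpler generators named in this record can be sketched in a few lines of sequential Python. The constants are assumptions, since the abstract does not list them: the usual GGL parameters (multiplier 16807, modulus 2^31 - 1) and Marsaglia's published shift triple and default seeds for XOR128. Neither function reflects the paper's GPU implementation.

```python
MASK32 = 0xFFFFFFFF


def ggl(seed):
    """Multiplicative LCG x -> 16807 * x mod (2**31 - 1), as a generator.

    The seed must lie in [1, 2**31 - 2]; outputs stay in the same range.
    """
    x = seed
    while True:
        x = (16807 * x) % 2147483647
        yield x


def xor128(x=123456789, y=362436069, z=521288629, w=88675123):
    """Marsaglia's 128-bit XOR-shift generator; yields 32-bit unsigned ints."""
    while True:
        t = (x ^ (x << 11)) & MASK32  # keep the left shift within 32 bits
        x, y, z = y, z, w
        w = (w ^ (w >> 19)) ^ (t ^ (t >> 8))
        yield w


lcg = ggl(1)
print(next(lcg), next(lcg))
print(next(xor128()))
```

Both generators keep only a few machine words of state, which makes it straightforward to give every GPU thread its own independent stream.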
Global tree network for computing structures enabling global processing operations

Science.gov (United States)

Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Steinmacher-Burow, Burkhard D.; Takken, Todd E.; Vranas, Pavlos M.

2010-01-19

A system and method for enabling high-speed, low-latency global tree network communications among processing nodes interconnected according to a tree network structure. The global tree network enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations performed include one or more of: broadcast operations downstream from a root node to leaf nodes of a virtual tree, reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node. The global tree network is configurable to provide global barrier and interrupt functionality in an asynchronous or synchronized manner, and is physically and logically partitionable.
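The upstream reduction and downstream broadcast described in this record can be modelled in software. The Python sketch below assumes a binary tree stored in heap order (the children of node i are 2i+1 and 2i+2). The function names and the tree layout are illustrative assumptions; the patent concerns hardware routers and links, which this model does not represent.

```python
def reduce_to_root(values, op):
    """Combine one value per node upstream until the result reaches the root.

    op should be associative and commutative (for example max or addition).
    """
    if not values:
        raise ValueError("the tree needs at least one node")
    partial = list(values)
    # Visit nodes from last to first so children are folded in before parents.
    for i in range(len(partial) - 1, 0, -1):
        parent = (i - 1) // 2
        partial[parent] = op(partial[parent], partial[i])
    return partial[0]


def broadcast_from_root(n_nodes, payload):
    """Send a payload downstream; returns what each node ends up holding."""
    received = [None] * n_nodes
    if n_nodes:
        received[0] = payload
    for i in range(1, n_nodes):
        received[i] = received[(i - 1) // 2]  # each node copies from its parent
    return received


print(reduce_to_root([3, 1, 4, 1, 5, 9, 2], max))
print(broadcast_from_root(5, "go"))
```

In such a tree the number of hops between a leaf and the root grows with the logarithm of the node count, which is the usual reason for choosing a tree for collective operations.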
Study on GPU Computing for SCOPE2 with CUDA

International Nuclear Information System (INIS)

Kodama, Yasuhiro; Tatsumi, Masahiro; Ohoka, Yasunori

2011-01-01

To improve the safety and cost effectiveness of nuclear power plants, the core calculation code SCOPE2 has been developed. It adopts detailed calculation models, such as the multi-group nodal SP3 transport calculation method in three-dimensional pin-by-pin geometry, to achieve high predictability. However, it is difficult to apply the code to loading pattern optimizations since it requires much longer computation time than codes based on the nodal diffusion method, which is widely used in core design calculations. In this study, we examined the possibility of accelerating SCOPE2 with GPU computing, which has been recognized as one of the most promising directions in high performance computing. In the previous study with an experimental programming framework, it required much effort to convert the algorithms into ones that fit GPU computation, and this conversion was found to be tremendously difficult because of the complexity of the algorithms and restrictions in implementation. In this study, to overcome this complexity, we utilized the CUDA programming environment provided by NVIDIA, a versatile and flexible extension to the C/C++ languages. Through test implementation of GPU kernels for neutron diffusion/simplified P3 equation solvers, it was confirmed that high performance can be obtained without degrading maintainability. (author)
oRGB: a practical opponent color space for computer graphics.

Science.gov (United States)

Bratkova, Margarita; Boulos, Solomon; Shirley, Peter

2009-01-01

Designed for computer graphics, oRGB is a new color model based on opponent color theory. It works well for both HSV-style color selection and computational applications such as color transfer. oRGB also enables new applications such as a quantitative cool-to-warm metric, intuitive color manipulation and variations, and simple gamut mapping.
Trends in Continuity and Interpolation for Computer Graphics.

Science.gov (United States)

Gonzalez Garcia, Francisco

2015-01-01

In every computer graphics oriented application today, it is a common practice to texture 3D models as a way to obtain realistic material. As part of this process, mesh texturing, deformation, and visualization are all key parts of the computer graphics field. This PhD dissertation was completed in the context of these three important and related fields in computer graphics. The article presents techniques that improve on existing state-of-the-art approaches related to continuity and interpolation in texture space (texturing), object space (deformation), and screen space (rendering).
An Analysis of OpenACC Programming Model: Image Processing Algorithms as a Case Study

Directory of Open Access Journals (Sweden)

M. J. Mišić

2014-06-01

Full Text Available Graphics processing units and similar accelerators have been intensively used in general purpose computations for several years. In the last decade, GPU architecture and organization changed dramatically to support an ever-increasing demand for computing power. Along with changes in hardware, novel programming models have been proposed, such as NVIDIA’s Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL) by the Khronos Group. Although numerous commercial and scientific applications have been developed using these two models, they still impose a significant challenge for less experienced users. There are users from various scientific and engineering communities who would like to speed up their applications without the need to deeply understand a low-level programming model and underlying hardware. In 2011, the OpenACC programming model was launched. Much like OpenMP for multicore processors, OpenACC is a high-level, directive-based programming model for manycore processors like GPUs. This paper presents an analysis of the OpenACC programming model and its applicability in typical domains like image processing. Three simple image processing algorithms have been implemented for execution on the GPU with OpenACC. The results were compared with their sequential counterparts and are briefly discussed.
Graphics on demand: the automatic data visualization on the WEB

Directory of Open Access Journals (Sweden)

Ramzi Guetari

2017-06-01

Full Text Available Data visualization is an effective tool for communicating the results of opinion surveys, epidemiological studies, statistics on consumer habits, etc. The graphical representation of data usually assists human information processing by reducing demands on attention, working memory, and long-term memory. It allows, among other things, a faster reading of the information (by acting on the forms, directions, colors…), independence from the language (or culture), a better capture of the audience's attention, etc. Data that could be graphically represented may be structured or unstructured. The unstructured data, whose volume grows exponentially, often hide important and even vital information for society and companies. It therefore takes a lot of work to extract valuable information from unstructured data. If it is easier to understand a message through structured data, such as a table, than through a long narrative text, it is even easier to convey a message through a graphic than through a table. In our opinion, it is often very useful to synthesize the unstructured data in the form of graphical representations. In this paper, we present an approach for processing unstructured data containing statistics in order to represent them graphically. This approach allows transforming the unstructured data into structured data which globally conveys the same countable information. The graphical representation of such structured data is then straightforward. This approach deals with both quantitative and qualitative data. It is based on Natural Language Processing techniques and Text Mining. An application that implements this process is also presented in this paper.
Graphics Processing Unit-Accelerated Nonrigid Registration of MR Images to CT Images During CT-Guided Percutaneous Liver Tumor Ablations.

Science.gov (United States)

Tokuda, Junichi; Plishker, William; Torabi, Meysam; Olubiyi, Olutayo I; Zaki, George; Tatli, Servet; Silverman, Stuart G; Shekher, Raj; Hata, Nobuhiko

2015-06-01

Accuracy and speed are essential for the intraprocedural nonrigid magnetic resonance (MR) to computed tomography (CT) image registration in the assessment of tumor margins during CT-guided liver tumor ablations. Although both accuracy and speed can be improved by limiting the registration to a region of interest (ROI), manual contouring of the ROI prolongs the registration process substantially. To achieve accurate and fast registration without the use of an ROI, we combined a nonrigid registration technique on the basis of volume subdivision with hardware acceleration using a graphics processing unit (GPU). We compared the registration accuracy and processing time of GPU-accelerated volume subdivision-based nonrigid registration technique to the conventional nonrigid B-spline registration technique. Fourteen image data sets of preprocedural MR and intraprocedural CT images for percutaneous CT-guided liver tumor ablations were obtained. Each set of images was registered using the GPU-accelerated volume subdivision technique and the B-spline technique. Manual contouring of ROI was used only for the B-spline technique. Registration accuracies (Dice similarity coefficient [DSC] and 95% Hausdorff distance [HD]) and total processing time including contouring of ROIs and computation were compared using a paired Student t test. Accuracies of the GPU-accelerated registrations and B-spline registrations, respectively, were 88.3 ± 3.7% versus 89.3 ± 4.9% (P = .41) for DSC and 13.1 ± 5.2 versus 11.4 ± 6.3 mm (P = .15) for HD. Total processing time of the GPU-accelerated registration and B-spline registration techniques was 88 ± 14 versus 557 ± 116 seconds (P processing time. The GPU-accelerated volume subdivision technique may enable the implementation of nonrigid registration into routine clinical practice. Copyright © 2015 AUR. Published by Elsevier Inc. All rights reserved.
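The Dice similarity coefficient reported in this record is defined as DSC = 2|A ∩ B| / (|A| + |B|) for two segmentations A and B. The Python sketch below computes it for flattened binary masks. The function name, the list-based mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration and do not describe the study's software.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient of two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    overlap = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy masks: one voxel shared out of two marked in each mask.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```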
Bayesian Graphical Models

DEFF Research Database (Denmark)

Jensen, Finn Verner; Nielsen, Thomas Dyhre

2016-01-01

Mathematically, a Bayesian graphical model is a compact representation of the joint probability distribution for a set of variables. The most frequently used type of Bayesian graphical models are Bayesian networks. The structural part of a Bayesian graphical model is a graph consisting of nodes...
PC Graphic file programing

International Nuclear Information System (INIS)

Yang, Jin Seok

1993-04-01

This book describes the basics of graphics and explains how to understand and implement graphic file formats. The first part deals with graphic data, the storage and compression of graphic data, and programming topics such as assembly language, the stack, compiling and linking programs, practice, and debugging. The next part covers graphic file formats such as the MacPaint file, GEM/IMG file, PCX file, GIF file, and TIFF file; hardware considerations such as high-speed monochrome and color screen drivers; the basic concept of dithering; and format conversion.
Computer graphics application in the engineering design integration system

Science.gov (United States)

Glatt, C. R.; Abel, R. W.; Hirsch, G. N.; Alford, G. E.; Colquitt, W. N.; Stewart, W. A.

1975-01-01

The computer graphics aspect of the Engineering Design Integration (EDIN) system and its application to design problems were discussed. Three basic types of computer graphics may be used with the EDIN system for the evaluation of preliminary designs of aerospace vehicles: offline graphics systems using vellum-inking or photographic processes, online graphics systems characterized by direct-coupled, low-cost storage tube terminals with limited interactive capabilities, and a minicomputer-based refresh terminal offering highly interactive capabilities. The offline systems are characterized by high quality (resolution better than 0.254 mm) and slow turnaround (one to four days). The online systems are characterized by low cost, instant visualization of the computer results, slow line speed (300 baud), poor hard copy, and early limitations on vector graphic input capabilities. The recent acquisition of the Adage 330 Graphic Display system has greatly enhanced the potential for interactive computer aided design.
Lattice QCD simulations using the OpenACC platform

International Nuclear Information System (INIS)

Majumdar, Pushan

2016-01-01

In this article we explore the OpenACC platform for programming Graphics Processing Units (GPUs). The OpenACC platform offers a directive-based programming model for GPUs which avoids the detailed data flow control and memory management necessary in a CUDA programming environment. In the OpenACC model, programs can be written in high-level languages with OpenMP-like directives. We present some examples of QCD simulation codes using OpenACC and discuss their performance on the Fermi and Kepler GPUs. (paper)
Full Chain Benchmarking for Open Architecture Airborne ISR Systems: A Case Study for GMTI Radar Applications

Science.gov (United States)

2015-09-15

languages targeting graphics processors [1]. Examples include the CUDA APIs for development on NVIDIA devices, and the more portable OpenCL APIs which...offered by NVIDIA for programming their GPU products. OpenVPX is a switched fabric standard developed specifically for high-performance
Computer graphics as an information means for power plants

International Nuclear Information System (INIS)

Kollmannsberger, J.; Pfadler, H.

1990-01-01

Computer-aided graphics have proved increasingly successful as an aid to process control in large plants. The specific requirements for the system and the methods of planning and implementing graphic systems in power station control rooms are described. Operating experience from completed plants is evaluated. (orig.) [de
The process of patient enablement in general practice nurse consultations: a grounded theory study.

Science.gov (United States)

Desborough, Jane; Banfield, Michelle; Phillips, Christine; Mills, Jane

2017-05-01

The aim of this study was to gain insight into the process of patient enablement in general practice nursing consultations. Enhanced roles for general practice nurses may benefit patients through a range of mechanisms, one of which may be increasing patient enablement. In studies with general practitioners enhanced patient enablement has been associated with increases in self-efficacy and skill development. This study used a constructivist grounded theory design. In-depth interviews were conducted with 16 general practice nurses and 23 patients from 21 general practices between September 2013 - March 2014. Data generation and analysis were conducted concurrently using constant comparative analysis and theoretical sampling focussing on the process and outcomes of patient enablement. Use of the storyline technique supported theoretical coding and integration of the data into a theoretical model. A clearly defined social process that fostered and optimised patient enablement was constructed. The theory of 'developing enabling healthcare partnerships between nurses and patients in general practice' incorporates three stages: triggering enabling healthcare partnerships, tailoring care and the manifestation of patient enablement. Patient enablement was evidenced through: 1. Patients' understanding of their unique healthcare requirements informing their health seeking behaviours and choices; 2. Patients taking an increased lead in their partnership with a nurse and seeking choices in their care and 3. Patients getting health care that reflected their needs, preferences and goals. This theoretical model is in line with a patient-centred model of health care and is particularly suited to patients with chronic disease. © 2016 John Wiley & Sons Ltd.
Business Process Simulation: Requirements for Business and Resource Models

Directory of Open Access Journals (Sweden)

Audrius Rima

2015-07-01

Full Text Available The purpose of Business Process Model and Notation (BPMN) is to provide an easily understandable graphical representation of business processes. BPMN is therefore widely used and applied in various areas, one of them being business process simulation. This paper addresses some problems of BPMN model based business process simulation. The paper formulates requirements for business process and resource models to enable their use in business process simulation.
Development of the data logging and graphical presentation for gamma scanning, trouble shooting and process evaluation in the petroleum refinery column

International Nuclear Information System (INIS)

Saengchantr, Dhanaj; Chueinta Siripone

2009-07-01

Full text: Software for data logging and graphical presentation of gamma scanning was developed for troubleshooting and process evaluation of petroleum refinery columns. With the gamma source and gamma detector set on opposite sides of the column and the radiation transmitted through the column recorded at several elevations, a profile of relative density (gamma intensity) versus vertical elevation can be obtained in graphical form. By comparison with the engineering drawings, physical and process abnormalities can be clearly evaluated during field investigation. The program can also accumulate up to 8 data sets of 1,000 points each, allowing convenient comparison of different operational parameter adjustments while remedying a problem and/or optimizing the process. With this development and other factors, the technological capability of the TINT Service Center to serve petroleum refineries was also enhanced.
Object-oriented graphics programming in C++

CERN Document Server

Stevens, Roger T

2014-01-01

Object-Oriented Graphics Programming in C++ provides programmers with the information needed to produce realistic pictures on a PC monitor screen. The book is comprised of 20 chapters that discuss the aspects of graphics programming in C++. The book starts with a short introduction discussing the purpose of the book. It also includes the basic concepts of programming in C++ and the basic hardware requirements. Subsequent chapters cover related topics in C++ programming such as the various display modes, displaying TGA files, and the vector class. The text also tackles subjects on the processing
Graphical Argument in the Essayist Prose of the Pesquisa FAPESP Journal

Directory of Open Access Journals (Sweden)

Irene Machado

2016-03-01

Full Text Available This article investigates the concept of graphical argumentation as an exercise of essayistic prose developed in the process of writing expansion in printed texts. It is understood that by expanding the scope of the word in the context of visual graphics processes such as drawings, photography and infographics, arguments are achievements much more of diagrammatic reasoning than of rhetorical elaboration. Proof of that are graphic arguments, which have become an inalienable modeling from texts of scientific communication, such as the ones produced in the Pesquisa FAPESP journal.

On the Role of Computer Graphics in Engineering Design Graphics Courses.

Science.gov (United States)

Pleck, Michael H.

The implementation of two- and three-dimensional computer graphics in a freshmen engineering design course at the university level is described. An assessment of the capabilities and limitations of computer graphics is made, along with a presentation of the fundamental role which computer graphics plays in engineering design instruction.…
Write Is Right: Using Graphic Organizers to Improve Student Mathematical Problem Solving

Science.gov (United States)

Zollman, Alan

2012-01-01

Teachers have used graphic organizers successfully in teaching the writing process. This paper describes graphic organizers and their potential mathematics benefits for both students and teachers, elucidates a specific graphic organizer adaptation for mathematical problem solving, and discusses results using the "four-corners-and-a-diamond"…
Graphic Designer/Production Coordinator | IDRC - International ...

International Development Research Centre (IDRC) Digital Library (Canada)

Provides design and graphic services for print- and Web-based publishing; Initiates designs and carries out ... process, ensuring that such suppliers meet appropriate standards of quality and service at reasonable cost; ... Internal Services.
TESLA GPUs versus MPI with OpenMP for the Forward Modeling of Gravity and Gravity Gradient of Large Prisms Ensemble

Directory of Open Access Journals (Sweden)

Carlos Couder-Castañeda

2013-01-01

Full Text Available An implementation with CUDA technology on a single and on several graphics processing units (GPUs) is presented for the forward modeling of gravitational fields from a three-dimensional volumetric ensemble composed of unitary prisms of constant density. We compared the performance results obtained with the GPUs against a previous version coded in OpenMP with MPI, and we analyzed the results on both platforms. Today, the use of GPUs represents a breakthrough in parallel computing, which has led to the development of applications in various fields. Nevertheless, in some applications the decomposition of the tasks is not trivial, as can be appreciated in this paper. Unlike a trivial decomposition of the domain, we proposed to decompose the problem by sets of prisms and use different memory spaces per processing CUDA core, avoiding the performance decay resulting from the constant calls to kernel functions that would be needed in a parallelization by observation points. The design and implementation are the main contributions of this work, because the parallelization scheme implemented is not trivial. The performance results obtained are comparable to those of a small processing cluster.
Nanoscale multireference quantum chemistry: full configuration interaction on graphical processing units.

Science.gov (United States)

Fales, B Scott; Levine, Benjamin G

2015-10-13

Methods based on a full configuration interaction (FCI) expansion in an active space of orbitals are widely used for modeling chemical phenomena such as bond breaking, multiply excited states, and conical intersections in small-to-medium-sized molecules, but these phenomena occur in systems of all sizes. To scale such calculations up to the nanoscale, we have developed an implementation of FCI in which electron repulsion integral transformation and several of the more expensive steps in σ vector formation are performed on graphical processing unit (GPU) hardware. When applied to a 1.7 × 1.4 × 1.4 nm silicon nanoparticle (Si72H64) described with the polarized, all-electron 6-31G** basis set, our implementation can solve for the ground state of the 16-active-electron/16-active-orbital CASCI Hamiltonian (more than 100,000,000 configurations) in 39 min on a single NVidia K40 GPU.
Multithreaded real-time 3D image processing software architecture and implementation

Science.gov (United States)

Ramachandra, Vikas; Atanassov, Kalin; Aleksic, Milivoje; Goma, Sergio R.

2011-03-01

Recently, 3D displays and videos have generated a lot of interest in the consumer electronics industry. To make 3D capture and playback popular and practical, a user-friendly playback interface is desirable. Towards this end, we built a real-time software 3D video player. The 3D video player displays user-captured 3D videos, provides various 3D-specific image processing functions and ensures a pleasant viewing experience. Moreover, the player enables user interactivity by providing digital zoom and pan functionalities. This real-time 3D player was implemented on the GPU using CUDA and OpenGL. The player provides user-interactive 3D video playback. Stereo images are first read by the player from a fast drive and rectified. Further processing of the images determines the optimal convergence point in the 3D scene to reduce eye strain. The rationale for this convergence point selection takes into account scene depth and display geometry. The first step in this processing chain is identifying keypoints by detecting vertical edges within the left image. Regions surrounding reliable keypoints are then located on the right image through the use of block matching. The difference in position between the corresponding regions in the left and right images is then used to calculate disparity. The extrema of the disparity histogram give the scene disparity range. The left and right images are shifted based upon the calculated range, in order to place the desired region of the 3D scene at convergence. All the above computations are performed on one CPU thread which calls CUDA functions. Image upsampling and shifting are performed in response to user zoom and pan. The player also has a CPU display thread, which uses OpenGL rendering (quad buffers). This thread also gathers user input for digital zoom and pan and sends it to the processing thread.
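The block-matching step named in this abstract can be illustrated with a minimal sketch. It assumes rectified grayscale images stored as nested lists and a sum-of-absolute-differences cost; the function names, the block size and the search range are hypothetical and are not taken from the paper, which runs this stage through CUDA.

```python
def sad(left, right, row, col_l, col_r, half):
    """Sum of absolute differences between two square blocks of side 2*half+1."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][col_l + dx] - right[row + dy][col_r + dx])
    return total


def block_match_disparity(left, right, row, col, half=1, max_disp=4):
    """Return the horizontal shift d in 0..max_disp that best aligns the block
    around (row, col) in the left image with a block in the right image."""
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if col - d - half < 0:
            break  # the candidate block would leave the right image
        cost = sad(left, right, row, col, col - d, half)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Collecting the returned shifts for all reliable keypoints and taking the minimum and maximum would give a disparity range of the kind the abstract describes.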
Image flows and one-liner graphical image representation.

Science.gov (United States)

Makhervaks, Vadim; Barequet, Gill; Bruckstein, Alfred

2002-10-01

This paper introduces a novel graphical image representation consisting of a single curve, the one-liner. The first step of the algorithm involves the detection and ranking of image edges. A new edge exploration technique is used to perform both tasks simultaneously. This process is based on image flows. It uses a gradient vector field and a new operator to explore image edges. Estimation of the derivatives of the image is performed by using local Taylor expansions in conjunction with a weighted least-squares method. This process finds all the possible image edges without any pruning, and collects information that allows the edges found to be prioritized. This enables the most important edges to be selected to form a skeleton of the representation sought. The next step connects the selected edges into one continuous curve, the one-liner. It orders the selected edges and determines the curves connecting them. These two problems are solved separately. Since the abstract graph setting of the first problem is NP-complete, we reduce it to a variant of the traveling salesman problem and compute an approximate solution to it. We solve the second problem by using Dijkstra's shortest-path algorithm. The full software implementation for the entire one-liner determination process is available.
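Dijkstra's shortest-path algorithm, which the abstract names for determining the connecting curves, can be sketched as follows. The adjacency-list graph and the function name are illustrative assumptions; how the authors build their graph from image pixels is not stated in the abstract.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from source.

    graph maps each node to a list of (neighbor, weight) pairs
    with nonnegative weights.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```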
High-Throughput Characterization of Porous Materials Using Graphics Processing Units

Energy Technology Data Exchange (ETDEWEB)

Kim, Jihan; Martin, Richard L.; Rübel, Oliver; Haranczyk, Maciej; Smit, Berend

2012-05-08

We have developed a high-throughput graphics processing units (GPU) code that can characterize a large database of crystalline porous materials. In our algorithm, the GPU is utilized to accelerate energy grid calculations where the grid values represent interactions (i.e., Lennard-Jones + Coulomb potentials) between gas molecules (i.e., CH$_{4}$ and CO$_{2}$) and material's framework atoms. Using a parallel flood fill CPU algorithm, inaccessible regions inside the framework structures are identified and blocked based on their energy profiles. Finally, we compute the Henry coefficients and heats of adsorption through statistical Widom insertion Monte Carlo moves in the domain restricted to the accessible space. The code offers significant speedup over a single core CPU code and allows us to characterize a set of porous materials at least an order of magnitude larger than ones considered in earlier studies. For structures selected from such a prescreening algorithm, full adsorption isotherms can be calculated by conducting multiple grand canonical Monte Carlo simulations concurrently within the GPU.
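As a rough illustration of the Widom insertion idea mentioned here, the sketch below averages the Boltzmann factor of a single-site Lennard-Jones probe over random insertion points; the Henry coefficient is proportional to this average. Everything in it is a simplifying assumption rather than the authors' GPU code: reduced units, one set of Lennard-Jones parameters, no Coulomb term, no periodic images, no blocking of inaccessible regions, and hypothetical function names.

```python
import math
import random


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones pair energy at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def widom_average(atoms, box, epsilon, sigma, kT, n_insert, rng):
    """Mean Boltzmann factor <exp(-U/kT)> over random test insertions.

    atoms: list of (x, y, z) framework positions in a cubic box of edge `box`.
    Insertions closer than 0.5 * sigma to an atom are treated as overlaps
    with a Boltzmann factor of zero (an assumption of this sketch).
    """
    total = 0.0
    for _ in range(n_insert):
        p = (rng.uniform(0.0, box), rng.uniform(0.0, box), rng.uniform(0.0, box))
        energy = 0.0
        overlap = False
        for a in atoms:
            r = math.dist(p, a)
            if r < 0.5 * sigma:
                overlap = True
                break
            energy += lennard_jones(r, epsilon, sigma)
        if not overlap:
            total += math.exp(-energy / kT)
    return total / n_insert
```

Each insertion is independent of the others, which is what makes this kind of sampling a natural fit for parallel hardware.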
EBR-II Cover Gas Cleanup System (CGCS) upgrade graphical interface design

International Nuclear Information System (INIS)

Staffon, J.D.; Peters, G.G.

1992-01-01

Technology advances in the past few years have prompted an effort at Argonne National Laboratory to replace existing equipment with high performance digital computers and color graphic displays. Improved operation of process systems can be achieved by utilizing state-of-the-art computer technology in the areas of process control and process monitoring. The Cover Gas Cleanup System (CGCS) at EBR-II is the first system to be upgraded with high performance digital equipment. The upgrade consisted of a main control computer, a distributed control computer, a front end input/output computer, a main graphics interface terminal, and a remote graphics interface terminal. This paper describes the main control computer and the operator interface control software
Graphical Models with R

DEFF Research Database (Denmark)

Højsgaard, Søren; Edwards, David; Lauritzen, Steffen

Graphical models in their modern form have been around since the late 1970s and appear today in many areas of the sciences. Along with the ongoing developments of graphical models, a number of different graphical modeling software programs have been written over the years. In recent years many...... of these software developments have taken place within the R community, either in the form of new packages or by providing an R interface to existing software. This book attempts to give the reader a gentle introduction to graphical modeling using R and the main features of some of these packages. In addition......, the book provides examples of how more advanced aspects of graphical modeling can be represented and handled within R. Topics covered in the seven chapters include graphical models for contingency tables, Gaussian and mixed graphical models, Bayesian networks and modeling high dimensional data...
Graphical user interface concepts for tactical augmented reality

Science.gov (United States)

Argenta, Chris; Murphy, Anne; Hinton, Jeremy; Cook, James; Sherrill, Todd; Snarski, Steve

2010-04-01

Applied Research Associates and BAE Systems are working together to develop a wearable augmented reality system under the DARPA ULTRA-Vis program. Our approach to achieving the objectives of ULTRA-Vis, called iLeader, incorporates a full color 40° field of view (FOV) see-through holographic waveguide integrated with sensors for full position and head tracking to provide an unobtrusive information system for operational maneuvers. iLeader will enable warfighters to mark up the 3D battle-space with symbologic identification of graphical control measures, friendly force positions and enemy/target locations. Our augmented reality display provides dynamic real-time painting of symbols on real objects, a pose-sensitive 360° representation of relevant object positions, and visual feedback for a variety of system activities. The iLeader user interface and situational awareness graphical representations are highly intuitive, nondisruptive, and always tactically relevant. We used best human-factors practices, system engineering expertise, and cognitive task analysis to design effective strategies for presenting real-time situational awareness to the military user without distorting their natural senses and perception. We present requirements identified for presenting information within a see-through display in combat environments, challenges in designing suitable visualization capabilities, and solutions that enable us to bring real-time iconic command and control to the tactical user community.
Porting of the transfer-matrix method for multilayer thin-film computations on graphics processing units

Science.gov (United States)

Limmer, Steffen; Fey, Dietmar

2013-07-01

Thin-film computations are often a time-consuming task during optical design. An efficient way to accelerate these computations with the help of graphics processing units (GPUs) is described. It turned out that significant speed-ups can be achieved. We investigate the circumstances under which the best speed-up values can be expected. Therefore we compare different GPUs among themselves and with a modern CPU. Furthermore, the effect of thickness modulation on the speed-up and the runtime behavior depending on the input data is examined.
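For readers unfamiliar with the transfer-matrix method, a minimal serial sketch for normal incidence is given below. It multiplies the 2x2 characteristic matrices of the layers and derives the reflectance from the product. The function names and the restriction to normal incidence and non-magnetic layers are assumptions of this sketch; the GPU port described in the abstract is not reproduced.

```python
import cmath
import math


def layer_matrix(n, d, wavelength):
    """Characteristic matrix of one homogeneous layer at normal incidence."""
    delta = 2 * math.pi * n * d / wavelength  # phase thickness
    c, s = cmath.cos(delta), cmath.sin(delta)
    return ((c, 1j * s / n), (1j * n * s, c))


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def reflectance(layers, n_in, n_sub, wavelength):
    """Reflectance of a stack; layers is a list of (index, thickness) pairs
    ordered from the incident medium towards the substrate."""
    m = ((1, 0), (0, 1))
    for n, d in layers:
        m = matmul2(m, layer_matrix(n, d, wavelength))
    b = m[0][0] + m[0][1] * n_sub
    c = m[1][0] + m[1][1] * n_sub
    r = (n_in * b - c) / (n_in * b + c)
    return abs(r) ** 2
```

A quarter-wave layer whose index is the geometric mean of the surrounding indices should give a reflectance of essentially zero, which is a convenient sanity check. Because each wavelength can be evaluated independently, the method parallelizes naturally over the spectrum.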
Parallelization and comparison of iterative methods in solving large and sparse linear systems

Directory of Open Access Journals (Sweden)

Lauro Cássio Martins de Paula

2013-11-01

Full Text Available This paper presents a computational performance comparison between iterative methods used to solve linear systems. The goal is to show that the parallel processing provided by a Graphics Processing Unit (GPU) can be a feasible choice, because it enables the fast solution of systems of linear equations, so that large and sparse systems can be solved in a short time. To validate the work, a GPU was used through NVIDIA's Compute Unified Device Architecture (CUDA), and the computational performance of the Jacobi, Gauss-Seidel and BiCGStab iterative methods and of a parallelized BiCGStab(2) was compared in the solution of linear systems of varying sizes. A significant speedup was observed in the tests with the parallelized method, and it grows considerably as the systems become larger. The results showed that applying parallel processing to a robust and efficient method such as BiCGStab(2) often becomes indispensable for simulations to be carried out with quality and in a non-prohibitive time. Keywords: CUDA. GPU. BiCGStab(2).
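Of the methods compared, Jacobi iteration is the simplest to sketch. The serial version below uses dense nested lists and a hypothetical function name; it is not the authors' CUDA code and does not cover BiCGStab(2).

```python
def jacobi(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b by Jacobi iteration; a should be diagonally dominant."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        # Every component of x_new uses only the previous iterate x.
        x_new = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x
```

Because each updated component depends only on the previous iterate, the components can be computed independently, which is why the method maps well onto a GPU.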
Development of a prototype graphic simulation program for severe accident training

International Nuclear Information System (INIS)

Kim, Ko Ryu; Jeong, Kwang Sub; Ha, Jae Joo

2000-05-01

This report describes the development process and related technologies of severe accident graphic simulators, which are required in industrial severe accident management and training. Here, 'a severe accident graphic simulator' means a graphics add-in system for existing calculation codes that shows severe accident phenomena dynamically on computer screens and thereby compensates for one of the main defects of those codes. With graphic simulators it is fairly easy to see the overall behavior of a nuclear power plant, which was very difficult to see from partial numerical information on individual variables. Moreover, the fast processing and control features of a graphic simulator give even a non-expert some opportunity to predict how a severe accident will advance among several possibilities. By using graphic simulators, we expect operators and TSC members to improve their understanding of physical phenomena from the realistic dynamic behavior of plants. We also expect that severe accident training courses can gain better training effects from the control functions and predictive capabilities of a graphic simulator, and therefore that graphic simulators will be effective decision-aid tools both in severe accident training courses and in real severe accident situations. With these in mind, we surveyed related technologies and developed a prototype graphic simulator, and from this development experience we examined the possibility of building a severe accident graphic simulator. The prototype was developed in an IBM PC WinNT environment and is tailored to the Uljin 3 and 4 nuclear power plant. When supplied with an adequate severe accident scenario as input, the prototype can provide graphical simulations of the dynamic behavior of the plant safety systems. The prototype is composed of several modules: a phenomena display module, a MELCOR data interface module and a graphic database interface module. Main functions of
ElectroEncephaloGraphics: Making waves in computer graphics research.

Science.gov (United States)

Mustafa, Maryam; Magnor, Marcus

2014-01-01

Electroencephalography (EEG) is a novel modality for investigating perceptual graphics problems. Until recently, EEG has predominantly been used for clinical diagnosis, in psychology, and by the brain-computer-interface community. Researchers are extending it to help understand the perception of visual output from graphics applications and to create approaches based on direct neural feedback. Researchers have applied EEG to graphics to determine perceived image and video quality by detecting typical rendering artifacts, to evaluate visualization effectiveness by calculating the cognitive load, and to automatically optimize rendering parameters for images and videos on the basis of implicit neural feedback.
Quick plasma equilibrium reconstruction based on GPU

International Nuclear Information System (INIS)

Xiao Bingjia; Huang, Y.; Luo, Z.P.; Yuan, Q.P.; Lao, L.

2014-01-01

A parallel code named P-EFIT, which can complete one equilibrium reconstruction iteration in 250 μs, is described. It is built on the CUDA™ architecture and runs on a Graphical Processing Unit (GPU). The optimization of middle-scale matrix multiplication on the GPU and an algorithm that efficiently solves block tri-diagonal linear systems in parallel are described. A benchmark test is conducted. A static test proves the accuracy of P-EFIT, and a simulation test proves the feasibility of using P-EFIT for real-time reconstruction on 65x65 computation grids. (author)
Development Of 12 Head GAMMA Detection And Graphical Presentation Software Suitable For Industrial Process Investigation By Radiotracer Technique

International Nuclear Information System (INIS)

Saengchantr, Dhanaj; Chueinta, Siripone

2009-07-01

Full text: Data logging software with prompt graphical presentation, accommodating gamma radiation signals from 12 scintillation detectors through a standard RS-232 interface, has been developed. Laboratory testing was conducted by detecting an injected radioactive tracer mixed into a fluid flowing inside a pipe. The radioactive mixed fluid passed the detectors located at several points along the pipe, and the generated signals, corresponding to the mass flow inside the pipe, were recorded. Up to 10,000 data points with a fast (20 millisecond) dwell time could be accumulated. Graphical presentation allowed fast interpretation, while the output data were suitable for more accurate evaluation with standard software, e.g. Residence Time Distribution (RTD) and Computed Tomography Visualization. Further utilization in industry, in conjunction with radiotracer techniques, for troubleshooting and process optimization will be carried out.
Nuclear reactors; graphical symbols

International Nuclear Information System (INIS)

1987-11-01

This standard contains graphical symbols that reveal the type of nuclear reactor and is used to design graphical and technical presentations. Distinguishing features for nuclear reactors are laid down in graphical symbols. (orig.) [de
Upside to downsizing : Acceleware's graphic processor technology propels seismic data processing revolution

Energy Technology Data Exchange (ETDEWEB)

Smith, M.

2009-11-15

Acceleware has developed a graphics processing unit (GPU) technology that is transforming the petroleum industry. The benefits of the technology are its small footprint, low wattage, and high speed. The software brings supercomputing speed to the desktop by leveraging the massively parallel processing capacity of the very latest GPU technology. This article discussed the GPU technology and its emergence as a powerful supercomputing tool. Acceleware's partnership with California-based NVIDIA was also outlined. The advantages of the technology were also discussed, including its smaller footprint. Acceleware's hardware takes up a fraction of the space and uses up to 70 per cent less power than a traditional central processing unit. By combining Acceleware's core knowledge in making complex algorithms run in parallel with an in-house team of seismic industry experts, the company provides software solutions for seismic data processors that access the massively parallel processing capabilities of GPUs. 1 fig.
Graphics processor efficiency for realization of rapid tabular computations

International Nuclear Information System (INIS)

Dudnik, V.A.; Kudryavtsev, V.I.; Us, S.A.; Shestakov, M.V.

2016-01-01

Capabilities of graphics processing units (GPU) and central processing units (CPU) have been investigated for realization of fast-calculation algorithms with the use of tabulated functions. The realization of tabulated functions is exemplified by the GPU/CPU architecture-based processors. Comparison is made between the operating efficiencies of GPU and CPU, employed for tabular calculations at different conditions of use. Recommendations are formulated for the use of graphical and central processors to speed up scientific and engineering computations through the use of tabulated functions
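A tabulated function in the sense used here can be sketched as a precomputed table of samples plus linear interpolation at lookup time. The helper names and the uniform grid are assumptions of this illustration and are not taken from the paper.

```python
import math


def build_table(func, x_min, x_max, n):
    """Sample func at n equally spaced points on [x_min, x_max]."""
    step = (x_max - x_min) / (n - 1)
    return [func(x_min + i * step) for i in range(n)], step


def lookup(table, x_min, step, x):
    """Approximate func(x) by linear interpolation between table entries."""
    pos = (x - x_min) / step
    i = min(max(int(pos), 0), len(table) - 2)  # clamp to a valid interval
    frac = pos - i
    return table[i] + frac * (table[i + 1] - table[i])
```

The trade-off is memory traffic against arithmetic: the table replaces a costly function evaluation with two reads and a multiply-add, so which processor benefits most depends on its memory system.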

Investigating Creativity in Graphic Design Education from Psychological Perspectives

Directory of Open Access Journals (Sweden)

Salman Amur Alhajri

2017-01-01

Full Text Available The role of creativity in graphic design education has been a central aspect of graphic design education. The psychological component of creativity and its role in graphic design education has not been given much importance. The present research would attempt to study ‘creativity in graphic design education from psychological perspectives’. A thorough review of literature would be conducted on graphic design education, creativity and its psychological aspects. Creativity is commonly defined as a ‘problem solving’ feature in design education. Students of graphic design have to involve themselves in the identification of cultural and social elements. Instruction in the field of graphic design must be aimed at enhancing the creative abilities of the student. The notion that creativity is a cultural production is strengthened by the problem solving methods employed in all cultures. Most cultures regard creativity as a process which leads to the creation of something new. Based on this idea, a cross-cultural research was conducted to explore the concept of creativity from Arabic and Western perspective. From a psychological viewpoint, the student’s cognition, thinking patterns and habits also have a role in knowledge acquisition. The field of graphic design is not equipped with a decent framework which necessitates certain modes of instruction; appropriate to the discipline. The results of the study revealed that the psychological aspect of creativity needs to be adequately understood in order to enhance creativity in graphic design education.
Real-time speckle variance swept-source optical coherence tomography using a graphics processing unit.

Science.gov (United States)

Lee, Kenneth K C; Mariampillai, Adrian; Yu, Joe X Z; Cadotte, David W; Wilson, Brian C; Standish, Beau A; Yang, Victor X D

2012-07-01

Advances in swept-source laser technology continue to increase the imaging speed of swept-source optical coherence tomography (SS-OCT) systems. These fast imaging speeds are ideal for microvascular detection schemes, such as speckle variance (SV), where interframe motion can cause severe imaging artifacts and loss of vascular contrast. However, full utilization of the laser scan speed has been hindered by the computationally intensive signal processing required by SS-OCT and SV calculations. Using a commercial graphics processing unit that has been optimized for parallel data processing, we report a complete high-speed SS-OCT platform capable of real-time data acquisition, processing, display, and saving at 108,000 lines per second. Subpixel image registration of structural images was performed in real time prior to SV calculations in order to reduce decorrelation from stationary structures induced by the bulk tissue motion. The viability of the system was successfully demonstrated in a high bulk tissue motion scenario of human fingernail root imaging where SV images (512 × 512 pixels, n = 4) were displayed at 54 frames per second.
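Speckle variance is commonly computed as the per-pixel intensity variance across a small set of consecutive frames (n = 4 in this abstract). The serial sketch below assumes that definition and frames stored as nested lists; the function name is hypothetical, and the registration step and GPU pipeline of the paper are not reproduced.

```python
def speckle_variance(frames):
    """Per-pixel population variance across a list of equally sized frames."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    sv = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [frame[r][c] for frame in frames]
            mean = sum(values) / n
            sv[r][c] = sum((v - mean) ** 2 for v in values) / n
    return sv
```

Static tissue yields values near zero while flowing blood decorrelates between frames and yields high variance; every pixel is independent, so the computation suits one GPU thread per pixel.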
Achromatic hues matching in graphic printing

Directory of Open Access Journals (Sweden)

Martinia Ira Glogar

2015-05-01

Full Text Available Some problems in the reproduction and matching of dark achromatic hues in the graphic industry, where the demands on colour matching are very high, are discussed. Where achromatic hues are concerned, given the high demands on colour parameter matching, on-time production, quick response and high quality standards, production, and even more so reproduction, is subject to many variables and is a manufacturing process of high complexity. The aim is to achieve a graphic reproduction with defined colour parameters and remission characteristics as close as possible to a standard. In this paper, black and grey hues characterized by an average lightness value L* ≤ 20 were analysed. Subjective as well as objective colour evaluations were performed, and the colour differences obtained with two colour difference formulae, CIELAB and CMC(l:c), were compared.
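The CIELAB colour difference referred to here is the Euclidean distance between two colours in L*a*b* space. A minimal sketch follows; the function name and the sample values in the usage note are illustrative, and the CMC(l:c) formula, which adds lightness- and chroma-dependent weighting, is deliberately not reproduced.

```python
import math


def delta_e_cielab(lab1, lab2):
    """CIELAB colour difference between two (L*, a*, b*) triples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))
```

For example, a hypothetical standard at (20, 0, 0) and a sample at (17, 4, 0) differ by 5 units.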
Graphic Storytelling

Science.gov (United States)

Thompson, John

2009-01-01

Graphic storytelling is a medium that allows students to make and share stories, while developing their art communication skills. American comics today are more varied in genre, approach, and audience than ever before. When considering the impact of Japanese manga on the youth, graphic storytelling emerges as a powerful player in pop culture. In…
Object tracking mask-based NLUT on GPUs for real-time generation of holographic videos of three-dimensional scenes.

Science.gov (United States)

Kwon, M-W; Kim, S-C; Yoon, S-E; Ho, Y-S; Kim, E-S

2015-02-09

A new object tracking mask-based novel-look-up-table (OTM-NLUT) method is proposed and implemented on graphics-processing-units (GPUs) for real-time generation of holographic videos of three-dimensional (3-D) scenes. Since the proposed method is designed to be matched with software and memory structures of the GPU, the number of compute-unified-device-architecture (CUDA) kernel function calls and the computer-generated hologram (CGH) buffer size of the proposed method have been significantly reduced. It therefore results in a great increase of the computational speed of the proposed method and enables real-time generation of CGH patterns of 3-D scenes. Experimental results show that the proposed method can generate 31.1 frames of Fresnel CGH patterns with 1,920 × 1,080 pixels per second, on average, for three test 3-D video scenarios with 12,666 object points on three GPU boards of NVIDIA GTX TITAN, and confirm the feasibility of the proposed method in the practical application of electro-holographic 3-D displays.
Graphic Turbulence Guidance

Data.gov (United States)

National Oceanic and Atmospheric Administration, Department of Commerce — Forecast turbulence hazards identified by the Graphical Turbulence Guidance algorithm. The Graphical Turbulence Guidance product depicts mid-level and upper-level...
Monte Carlo method for neutron transport calculations in graphics processing units (GPUs)

International Nuclear Information System (INIS)

Pellegrino, Esteban

2011-01-01

Monte Carlo simulation is well suited for solving the Boltzmann neutron transport equation in an inhomogeneous medium for complicated geometries. However, routine applications require the computation time to be reduced to hours and even minutes on a desktop PC. The interest in adopting Graphics Processing Units (GPUs) for Monte Carlo acceleration is rapidly growing. This is due to the massive parallelism provided by the latest GPU technologies, which is the most promising solution to the challenge of performing full-size reactor core analysis on a routine basis. In this study, Monte Carlo codes for a fixed-source neutron transport problem were developed for GPU environments in order to evaluate issues associated with computational speedup using GPUs. Results obtained in this work suggest that a speedup of several orders of magnitude is possible using the state-of-the-art GPU technologies. (author) [es
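The core sampling step of such a simulation can be sketched for a deliberately simple case: a purely absorbing, monoenergetic slab hit by a normally incident beam, where the distance to the first collision is drawn from an exponential distribution. This toy problem, the function name and the parameters are assumptions chosen for illustration and are not the fixed-source problem studied in the paper.

```python
import math
import random


def slab_transmission(sigma_t, thickness, n_particles, rng):
    """Estimate the fraction of particles crossing a purely absorbing slab.

    sigma_t is the total macroscopic cross section (1/length); each history
    samples a free path s = -ln(xi) / sigma_t and counts as transmitted
    when s exceeds the slab thickness.
    """
    transmitted = 0
    for _ in range(n_particles):
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        path = -math.log(1.0 - rng.random()) / sigma_t
        if path > thickness:
            transmitted += 1
    return transmitted / n_particles
```

The estimate should approach exp(-sigma_t * thickness) as the number of histories grows. Each history is independent, which is the property that GPU implementations exploit.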
Area-delay trade-offs of texture decompressors for a graphics processing unit

Science.gov (United States)

Novoa Súñer, Emilio; Ituero, Pablo; López-Vallejo, Marisa

2011-05-01

Graphics Processing Units have become a booster for the microelectronics industry. However, due to intellectual property issues, there is a serious lack of information on implementation details of the hardware architecture that is behind GPUs. For instance, the way texture is handled and decompressed in a GPU to reduce bandwidth usage has never been dealt with in depth from a hardware point of view. This work addresses a comparative study on the hardware implementation of different texture decompression algorithms for both conventional (PCs and video game consoles) and mobile platforms. Circuit synthesis is performed targeting both a reconfigurable hardware platform and a 90nm standard cell library. Area-delay trade-offs have been extensively analyzed, which allows us to compare the complexity of decompressors and thus determine suitability of algorithms for systems with limited hardware resources.
Graphic Presentation: An Empirical Examination of the Graphic Novel Approach to Communicate Business Concepts

Science.gov (United States)

Short, Jeremy C.; Randolph-Seng, Brandon; McKenny, Aaron F.

2013-01-01

Graphic novels have been increasingly incorporated into business communication forums. Despite potential benefits, little research has examined the merits of the graphic novel approach. In response, we engage in a two-study approach. Study 1 explores the potential of graphic novels to affect learning outcomes and finds that the graphic novel was…
A graphical criterion for working fluid selection and thermodynamic system comparison in waste heat recovery

International Nuclear Information System (INIS)

Xi, Huan; Li, Ming-Jia; He, Ya-Ling; Tao, Wen-Quan

2015-01-01

In the present study, we propose a graphical criterion, called the CE diagram, obtained from the Pareto optimal solutions of the annual cash flow and exergy efficiency. This new graphical criterion enables both working fluid selection and thermodynamic system comparison for waste heat recovery. It improves on the existing criterion based on single-objective optimization because it is graphical and intuitive, taking the form of a diagram. The features of the CE diagram were illustrated by studying 5 examples with different heat-source temperatures (ranging from 100 °C to 260 °C), 26 chlorine-free working fluids and two typical ORC systems: the basic organic Rankine cycle (BORC) and the recuperative organic Rankine cycle (RORC). It is found that the proposed graphical criterion is feasible and can be applied to any closed-loop waste heat recovery thermodynamic system and working fluid. - Highlights: • A graphical method for ORC system comparison/working fluid selection was proposed. • Multi-objective genetic algorithm (MOGA) was applied for optimizing ORC systems. • Application cases were performed to demonstrate the usage of the proposed method.
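As an illustration of what "Pareto optimal solutions" of two objectives means, the following sketch filters a set of candidate designs down to the non-dominated ones. It is a plain brute-force filter, not the genetic algorithm used in the paper, and the candidate values are invented for the example.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Each point is (cash_flow, exergy_efficiency); both are maximized.
    """
    front = []
    for p in points:
        # q dominates p when it is at least as good in both objectives
        # and differs from p (so it is strictly better in at least one).
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)

# Hypothetical candidate designs: (annual cash flow, exergy efficiency).
candidates = [(1.0, 0.30), (2.0, 0.25), (1.5, 0.28), (0.8, 0.29), (2.0, 0.20)]
front = pareto_front(candidates)
print(front)
```

Plotting the surviving points with one objective per axis gives the kind of trade-off curve on which a diagram such as the one described could be based.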
Development of a prototype graphic simulation program for severe accident training

Energy Technology Data Exchange (ETDEWEB)

Kim, Ko Ryu; Jeong, Kwang Sub; Ha, Jae Joo

2000-05-01

This report describes the development process and related technologies of severe accident graphic simulators, which are required in industrial severe accident management and training. Here, 'a severe accident graphic simulator' means a graphics add-in system for existing calculation codes that shows severe accident phenomena dynamically on computer screens and thereby compensates for one of the main shortcomings of those codes. With graphic simulators it is fairly easy to see the overall behavior of a nuclear power plant, which is very difficult to grasp from the numerical values of individual variables alone. Moreover, the fast processing and control features of a graphic simulator give even non-experts an opportunity to predict how a severe accident will progress among several possibilities. By using graphic simulators, we expect operators and TSC members to gain a better understanding of the physical phenomena from the realistic dynamic behavior of plants. We also expect that severe accident training courses can achieve better training results by using the control functions and predictive capabilities of graphic simulators, and therefore that graphic simulators will be effective decision-aid tools both in severe accident training courses and in real severe accident situations. With these in mind, we surveyed related technologies and developed a prototype graphic simulator, and from this development experience we examined the feasibility of building a severe accident graphic simulator. The prototype graphic simulator was developed in an IBM PC WinNT environment and is tailored to the Uljin 3 and 4 nuclear power plant. When supplied with an adequate severe accident scenario as input, the prototype can provide graphical simulations of the dynamic behavior of plant safety systems. The prototype is composed of several different modules, which are phenomena display module, MELCOR data interface module and graphic database
Development of a Semi-Automatic Technique for Flow Estimation using Optical Flow Registration and k-means Clustering on Two Dimensional Cardiovascular Magnetic Resonance Flow Images

DEFF Research Database (Denmark)

Brix, Lau; Christoffersen, Christian P. V.; Kristiansen, Martin Søndergaard

was then categorized into groups by the k-means clustering method. Finally, the cluster containing the vessel under investigation was selected manually by a single mouse click. All calculations were performed on a Nvidia 8800 GTX graphics card using the Compute Unified Device Architecture (CUDA) extension to the C...
Parallel implementation of DNA sequences matching algorithms using PWM on GPU architecture.

Science.gov (United States)

Sharma, Rahul; Gupta, Nitin; Narang, Vipin; Mittal, Ankush

2011-01-01

Positional Weight Matrices (PWMs) are widely used in the representation and detection of Transcription Factor Binding Sites (TFBSs) on DNA. We implement an online PWM search algorithm on a parallel architecture. Large PWM data sets can be processed in parallel on Graphics Processing Unit (GPU) systems, which helps match sequences at a faster rate. Our method makes extensive use of the highly multithreaded architecture and shared memory of multi-core GPUs. Efficient use of shared memory is required to optimise parallel reduction in CUDA. Our optimised method has a speedup of 230-280x over a linear implementation on a GeForce GTX 280 GPU.
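The structure of a PWM search can be sketched on the CPU as follows. This is an illustrative sequential version with a made-up three-column matrix, not the authors' CUDA kernel. Every window is scored independently, which is the part a GPU can assign to separate threads, and the final maximum is the step a parallel reduction would replace.

```python
# Hypothetical 3-column log-odds PWM; rows are bases, columns are positions.
PWM = {
    "A": [1.2, -0.8, -1.0],
    "C": [-0.9, 1.1, -0.7],
    "G": [-1.0, -0.6, 1.3],
    "T": [-0.5, -0.9, -0.8],
}
MOTIF_LEN = 3

def window_score(seq, start):
    """Sum the PWM entries for the window seq[start:start + MOTIF_LEN]."""
    return sum(PWM[seq[start + j]][j] for j in range(MOTIF_LEN))

def scan(seq):
    """Score every window; each score is independent of the others."""
    return [window_score(seq, i) for i in range(len(seq) - MOTIF_LEN + 1)]

def best_hit(seq):
    """Reduce the per-window scores to the best (score, position) pair."""
    scores = scan(seq)
    best_pos = max(range(len(scores)), key=scores.__getitem__)
    return scores[best_pos], best_pos

score, pos = best_hit("TTACGTA")
print(f"best window starts at {pos} with score {score:.1f}")
```

In this toy sequence the window "ACG" at offset 2 matches the highest-weight base in every column, so it wins the reduction.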
Accelerating cardiac bidomain simulations using graphics processing units.

Science.gov (United States)

Neic, A; Liebmann, M; Hoetzl, E; Mitchell, L; Vigmond, E J; Haase, G; Plank, G

2012-08-01

Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are computationally very demanding, which is a limiting factor for a wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging since strongly scalable algorithms are needed to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, benefits in the context of bidomain simulations, where large sparse linear systems have to be solved in parallel with advanced numerical techniques, are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element method (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6-20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation, which engaged 20 GPUs, 476 CPU cores were required on a national supercomputing facility.
THE USING OF GRAPHICAL EDITOR IN THE ENGINEERING GRAPHICS AND THE COURSE DESIGNING

Directory of Open Access Journals (Sweden)

KARPYUK L. V.

2016-08-01

Full Text Available The article describes the problems of teaching students engineering and computer graphics in a course based on computer-aided design (CAD). Examples of training tasks for learning to work in the AutoCAD graphical editor are shown. These examples are needed to produce drawings in Engineering Graphics and for the graphic part of Course Projects for students of mechanical specialties.
Graphics gems V (Macintosh version)

CERN Document Server

Paeth, Alan W

1995-01-01

Graphics Gems V is the newest volume in The Graphics Gems Series. It is intended to provide the graphics community with a set of practical tools for implementing new ideas and techniques, and to offer working solutions to real programming problems. These tools are written by a wide variety of graphics programmers from industry, academia, and research. The books in the series have become essential, time-saving tools for many programmers. Latest collection of graphics tips in The Graphics Gems Series written by the leading programmers in the field. Contains over 50 new gems displaying some of t
Graphical Rasch models

DEFF Research Database (Denmark)

Kreiner, Svend; Christensen, Karl Bang

Rasch models; Partial Credit models; Rating Scale models; Item bias; Differential item functioning; Local independence; Graphical models
Real-time processing for full-range Fourier-domain optical-coherence tomography with zero-filling interpolation using multiple graphic processing units.

Science.gov (United States)

Watanabe, Yuuki; Maeno, Seiya; Aoshima, Kenji; Hasegawa, Haruyuki; Koseki, Hitoshi

2010-09-01

The real-time display of full-range, 2048 axial pixel × 1024 lateral pixel, Fourier-domain optical-coherence tomography (FD-OCT) images is demonstrated. The required speed was achieved by using dual graphic processing units (GPUs) with many stream processors to realize highly parallel processing. We used a zero-filling technique, including a forward Fourier transform, a zero padding to increase the axial data-array size to 8192, an inverse-Fourier transform back to the spectral domain, a linear interpolation from wavelength to wavenumber, a lateral Hilbert transform to obtain the complex spectrum, a Fourier transform to obtain the axial profiles, and a log scaling. The data-transfer time of the frame grabber was 15.73 ms, and the processing time, which includes the data transfer between the GPU memory and the host computer, was 14.75 ms, for a total time shorter than the 36.70 ms frame-interval time using a line-scan CCD camera operated at 27.9 kHz. That is, our OCT system achieved a processed-image display rate of 27.23 frames/s.
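The real-time claim rests on a simple timing budget, which the short calculation below reproduces from the figures quoted in the abstract. The only assumption added here is that one frame corresponds to 1024 camera lines, one per lateral pixel.

```python
# Figures quoted in the abstract.
transfer_ms = 15.73        # frame-grabber data transfer
processing_ms = 14.75      # GPU processing, including host/GPU transfers
frame_interval_ms = 36.70  # time between frames
line_rate_khz = 27.9       # camera line rate (lines per millisecond)
lines_per_frame = 1024     # assumed: one line per lateral pixel

total_ms = transfer_ms + processing_ms
# Frame interval implied by the line rate: lines / (lines per millisecond).
implied_interval_ms = lines_per_frame / line_rate_khz
headroom_ms = frame_interval_ms - total_ms

print(f"transfer + processing = {total_ms:.2f} ms")
print(f"interval implied by line rate = {implied_interval_ms:.2f} ms")
print(f"headroom per frame = {headroom_ms:.2f} ms")
```

Because transfer plus processing fits inside one frame interval, each frame can be finished before the next one arrives, so the display rate is limited by the camera rather than by the GPUs.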
Real time 3D structural and Doppler OCT imaging on graphics processing units

Science.gov (United States)

Sylwestrzak, Marcin; Szlag, Daniel; Szkulmowski, Maciej; Gorczyńska, Iwona; Bukowska, Danuta; Wojtkowski, Maciej; Targowski, Piotr

2013-03-01

In this report, the application of graphics processing unit (GPU) programming to real-time 3D Fourier domain Optical Coherence Tomography (FdOCT) imaging, with implementation of Doppler algorithms for visualization of flows in capillary vessels, is presented. Generally, the time needed to process FdOCT data on the main processor of the computer (CPU) constitutes the main limitation for real-time imaging. Employing additional algorithms, such as Doppler OCT analysis, makes this processing even more time consuming. Recently developed GPUs, which offer very high computational power, provide a solution to this problem. Taking advantage of them for massively parallel data processing allows for real-time imaging in FdOCT. The presented software for structural and Doppler OCT allows for the whole processing with visualization of 2D data consisting of 2000 A-scans generated from 2048-pixel spectra at a frame rate of about 120 fps. The 3D imaging in the same mode of volume data built of 220 × 100 A-scans is performed at a rate of about 8 frames per second. In this paper, the software architecture, the organization of the threads and the optimizations applied are shown. For illustration, screen shots recorded during real-time imaging of a phantom (homogeneous water solution of Intralipid in a glass capillary) and of the human eye in vivo are presented.
A Live-Time Relation: Motion Graphics meets Classical Music

DEFF Research Database (Denmark)

Steijn, Arthur

2014-01-01

In our digital age, we frequently meet fine examples of live performances of classical music with accompanying visuals. Yet, we find very little theoretical or analytical work on the relation between classical music and digital temporal visuals, nor on the process of creating them. In this paper, I present segments of my work toward a working model for the process of design of visuals and motion graphics applied in spatial contexts. I show how various design elements and components: line and shape, tone and colour, time and timing, rhythm and movement interact with conceptualizations of space, liveness and atmosphere. The design model will be a framework for both academic analytical studies as well as for designing time-based narratives and visual concepts involving motion graphics in spatial contexts. I focus on cases in which both pre-rendered, and live generated motion graphics are designed...

Graphics processing unit accelerated intensity-based optical coherence tomography angiography using differential frames with real-time motion correction.

Science.gov (United States)

Watanabe, Yuuki; Takahashi, Yuhei; Numazawa, Hiroshi

2014-02-01

We demonstrate intensity-based optical coherence tomography (OCT) angiography using the squared difference of two sequential frames with bulk-tissue-motion (BTM) correction. This motion correction was performed by minimization of the sum of the pixel values using axial- and lateral-pixel-shifted structural OCT images. We extract the BTM-corrected image from a total of 25 calculated OCT angiographic images. Image processing was accelerated by a graphics processing unit (GPU) with many stream processors to optimize the parallel processing procedure. The GPU processing rate was faster than that of a line scan camera (46.9 kHz). Our OCT system provides the means of displaying structural OCT images and BTM-corrected OCT angiographic images in real time.
Ensuring the principle of visibility when examining graphic disciplines

Directory of Open Access Journals (Sweden)

Tel’noy Viktor Ivanovich

2015-11-01

Full Text Available The article shows the importance of using the didactic principle of visualization in the study of graphic disciplines for a more effective organization of the educational process and for improving the forms, methods and means of education. The authors analyze different approaches to the classification of means of visualization in modern pedagogy. The proposed classification of visual aids for graphic disciplines serves not so much to classify them as to support the full and effective use of their capabilities in the learning process. The article demonstrates structural links between the stages of visualization, the means used, and the ways and rules of their use, leading to successful achievement of the goals of revitalizing the educational process and enhancing the cognitive interest of students. Practical recommendations are given for the integrated use of means of presentation in classes on descriptive geometry, engineering graphics and computer graphics. Special attention is paid to the role of the teacher in the learning process. In addition to professional knowledge, a teacher should possess oratory skills; competently combine rhetorical and psychological techniques; use interactive and effective active forms of training, including workshops; engage students in the learning process; and monitor feedback from the student audience. When conducting different kinds of practice, teachers should know the advantages and disadvantages, strengths and weaknesses, and timely application of every means of presentation for greater impact and effect in the educational process. The effectiveness of using the selected visualization tools is largely determined by the methods and techniques of their use in the classroom. It is important to consider the following factors: location, convenient for review, and approach; the accessibility; the expert support of a demonstration by the review; the duration of the demonstration; training students
The computer graphics metafile

CERN Document Server

Henderson, LR; Shepherd, B; Arnold, D B

1990-01-01

The Computer Graphics Metafile deals with the Computer Graphics Metafile (CGM) standard and covers topics ranging from the structure and contents of a metafile to CGM functionality, metafile elements, and real-world applications of CGM. Binary Encoding, Character Encoding, application profiles, and implementations are also discussed. This book is comprised of 18 chapters divided into five sections and begins with an overview of the CGM standard and how it can meet some of the requirements for storage of graphical data within a graphics system or application environment. The reader is then intr
The computer graphics interface

CERN Document Server

Steinbrugge Chauveau, Karla; Niles Reed, Theodore; Shepherd, B

2014-01-01

The Computer Graphics Interface provides a concise discussion of computer graphics interface (CGI) standards. The title is comprised of seven chapters that cover the concepts of the CGI standard. Figures and examples are also included. The first chapter provides a general overview of CGI; this chapter covers graphics standards, functional specifications, and syntactic interfaces. Next, the book discusses the basic concepts of CGI, such as inquiry, profiles, and registration. The third chapter covers the CGI concepts and functions, while the fourth chapter deals with the concept of graphic obje
3D graphics programming with Mathematica, DrawGraphics, CurvesGraphics, LiveGraphics3D and JavaView

OpenAIRE

Mora Flores, Walter; Instituto Tecnológico de Costa Rica; Figueroa, Geovanni; Instituto Tecnológico de Costa Rica

2015-01-01

We show how to integrate the tools Mathematica (with the DrawGraphics and CurvesGraphics packages), LiveGraphics3D, JavaView and HTML to create 3D figures that can be embedded in standalone Web pages and support interaction.
Interplay of Computer and Paper-Based Sketching in Graphic Design

Science.gov (United States)

Pan, Rui; Kuo, Shih-Ping; Strobel, Johannes

2013-01-01

The purpose of this study is to investigate student designers' attitude and choices towards the use of computers and paper sketches when involved in a graphic design process. 65 computer graphic technology undergraduates participated in this research. A mixed method study with survey and in-depth interviews was applied to answer the research…
Design Application Translates 2-D Graphics to 3-D Surfaces

Science.gov (United States)

2007-01-01

Fabric Images Inc., specializing in the printing and manufacturing of fabric tension architecture for the retail, museum, and exhibit/tradeshow communities, designed software to translate 2-D graphics for 3-D surfaces prior to print production. Fabric Images' fabric-flattening design process models a 3-D surface based on computer-aided design (CAD) specifications. The surface geometry of the model is used to form a 2-D template, similar to a flattening process developed by NASA's Glenn Research Center. This template or pattern is then applied in the development of a 2-D graphic layout. Benefits of this process include 11.5 percent time savings per project, less material wasted, and the ability to improve upon graphic techniques and offer new design services. Partners include Exhibitgroup/Giltspur (end-user client: TAC Air, a division of Truman Arnold Companies Inc.), Jack Morton Worldwide (end-user client: Nickelodeon), as well as 3D Exhibits Inc., and MG Design Associates Corp.
MASSIVELY PARALLEL LATENT SEMANTIC ANALYSES USING A GRAPHICS PROCESSING UNIT

Energy Technology Data Exchange (ETDEWEB)

Cavanagh, J.; Cui, S.

2009-01-01

Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of datasets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. A graphics processing unit (GPU) can solve some highly parallel problems much faster than a traditional sequential processor or central processing unit (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a PC cluster. Due to the GPU’s application-specific architecture, harnessing the GPU’s computational prowess for LSA is a great challenge. We presented a parallel LSA implementation on the GPU, using NVIDIA® Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms software. The performance of this implementation is compared to traditional LSA implementation on a CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000 × 1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits of the GPU for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.
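The abstract does not say whether the authors padded their inputs. The sketch below only illustrates one way a caller could bring matrix dimensions to the multiple of 16 that the reported timings favour. The helper names and the matrix contents are hypothetical. Zero rows and columns leave the nonzero singular values of a matrix unchanged, so padding of this kind would not alter the decomposition that LSA relies on.

```python
TILE = 16  # alignment reported as favourable in the abstract

def round_up(n, multiple=TILE):
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple

def pad_matrix(rows, multiple=TILE):
    """Zero-pad a list-of-lists matrix so both dimensions are aligned."""
    n_rows, n_cols = len(rows), len(rows[0])
    new_rows, new_cols = round_up(n_rows, multiple), round_up(n_cols, multiple)
    # Extend every existing row with zeros, then append all-zero rows.
    padded = [list(r) + [0.0] * (new_cols - n_cols) for r in rows]
    padded += [[0.0] * new_cols for _ in range(new_rows - n_rows)]
    return padded

# Hypothetical 1000 x 1001 term-document matrix filled with ones.
matrix = [[1.0] * 1001 for _ in range(1000)]
padded = pad_matrix(matrix)
print(len(padded), len(padded[0]))
```

Whether the extra work on the padded entries is repaid by the faster aligned path would have to be measured on the target GPU.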
Mathematical structures for computer graphics

CERN Document Server

Janke, Steven J

2014-01-01

A comprehensive exploration of the mathematics behind the modeling and rendering of computer graphics scenes Mathematical Structures for Computer Graphics presents an accessible and intuitive approach to the mathematical ideas and techniques necessary for two- and three-dimensional computer graphics. Focusing on the significant mathematical results, the book establishes key algorithms used to build complex graphics scenes. Written for readers with various levels of mathematical background, the book develops a solid foundation for graphics techniques and fills in relevant grap
Improving aircraft conceptual design - A PHIGS interactive graphics interface for ACSYNT

Science.gov (United States)

Wampler, S. G.; Myklebust, A.; Jayaram, S.; Gelhausen, P.

1988-01-01

A CAD interface has been created for the 'ACSYNT' aircraft conceptual design code that permits the execution and control of the design process via interactive graphics menus. This CAD interface was coded entirely with the new three-dimensional graphics standard, the Programmer's Hierarchical Interactive Graphics System. The CAD/ACSYNT system is designed for use by state-of-the-art high-speed imaging work stations. Attention is given to the approaches employed in modeling, data storage, and rendering.
Probabilistic Inference in General Graphical Models through Sampling in Stochastic Networks of Spiking Neurons

Science.gov (United States)

Pecevski, Dejan; Buesing, Lars; Maass, Wolfgang

2011-01-01

An important open problem of computational neuroscience is the generic organization of computations in networks of neurons in the brain. We show here through rigorous theoretical analysis that inherent stochastic features of spiking neurons, in combination with simple nonlinear computational operations in specific network motifs and dendritic arbors, enable networks of spiking neurons to carry out probabilistic inference through sampling in general graphical models. In particular, it enables them to carry out probabilistic inference in Bayesian networks with converging arrows (“explaining away”) and with undirected loops, that occur in many real-world tasks. Ubiquitous stochastic features of networks of spiking neurons, such as trial-to-trial variability and spontaneous activity, are necessary ingredients of the underlying computational organization. We demonstrate through computer simulations that this approach can be scaled up to neural emulations of probabilistic inference in fairly large graphical models, yielding some of the most complex computations that have been carried out so far in networks of spiking neurons. PMID:22219717
Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons.

Directory of Open Access Journals (Sweden)

Dejan Pecevski

2011-12-01

Full Text Available An important open problem of computational neuroscience is the generic organization of computations in networks of neurons in the brain. We show here through rigorous theoretical analysis that inherent stochastic features of spiking neurons, in combination with simple nonlinear computational operations in specific network motifs and dendritic arbors, enable networks of spiking neurons to carry out probabilistic inference through sampling in general graphical models. In particular, it enables them to carry out probabilistic inference in Bayesian networks with converging arrows ("explaining away") and with undirected loops, that occur in many real-world tasks. Ubiquitous stochastic features of networks of spiking neurons, such as trial-to-trial variability and spontaneous activity, are necessary ingredients of the underlying computational organization. We demonstrate through computer simulations that this approach can be scaled up to neural emulations of probabilistic inference in fairly large graphical models, yielding some of the most complex computations that have been carried out so far in networks of spiking neurons.
Probabilistic inference in general graphical models through sampling in stochastic networks of spiking neurons.

Science.gov (United States)

Pecevski, Dejan; Buesing, Lars; Maass, Wolfgang

2011-12-01

An important open problem of computational neuroscience is the generic organization of computations in networks of neurons in the brain. We show here through rigorous theoretical analysis that inherent stochastic features of spiking neurons, in combination with simple nonlinear computational operations in specific network motifs and dendritic arbors, enable networks of spiking neurons to carry out probabilistic inference through sampling in general graphical models. In particular, it enables them to carry out probabilistic inference in Bayesian networks with converging arrows ("explaining away") and with undirected loops, that occur in many real-world tasks. Ubiquitous stochastic features of networks of spiking neurons, such as trial-to-trial variability and spontaneous activity, are necessary ingredients of the underlying computational organization. We demonstrate through computer simulations that this approach can be scaled up to neural emulations of probabilistic inference in fairly large graphical models, yielding some of the most complex computations that have been carried out so far in networks of spiking neurons.
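To show what "inference through sampling" and "explaining away" mean in the smallest possible case, the sketch below runs a conventional Gibbs sampler on a three-node Bayesian network with converging arrows. It is an ordinary software sampler, not the spiking-network implementation analysed in the paper, and all probabilities are invented for the example.

```python
import random

P_A, P_B = 0.3, 0.3                      # hypothetical priors P(A=1), P(B=1)
P_C = {(0, 0): 0.01, (1, 0): 0.9,        # hypothetical P(C=1 | A, B)
       (0, 1): 0.9, (1, 1): 0.99}

def joint(a, b, c):
    """Joint probability P(A=a, B=b, C=c) with A and B both parents of C."""
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pc = P_C[(a, b)] if c else 1 - P_C[(a, b)]
    return pa * pb * pc

def exact_p_a(c, b=None):
    """Exact P(A=1 | C=c [, B=b]) by enumeration."""
    bs = (0, 1) if b is None else (b,)
    num = sum(joint(1, bb, c) for bb in bs)
    den = sum(joint(aa, bb, c) for aa in (0, 1) for bb in bs)
    return num / den

def gibbs_p_a(c, sweeps=20_000, burn_in=1_000, seed=0):
    """Estimate P(A=1 | C=c) by Gibbs sampling over A and B."""
    rng = random.Random(seed)
    a, b, hits = 1, 1, 0
    for t in range(sweeps + burn_in):
        # Resample A from P(A | B=b, C=c), then B from P(B | A=a, C=c).
        w1, w0 = joint(1, b, c), joint(0, b, c)
        a = 1 if rng.random() < w1 / (w1 + w0) else 0
        w1, w0 = joint(a, 1, c), joint(a, 0, c)
        b = 1 if rng.random() < w1 / (w1 + w0) else 0
        if t >= burn_in:
            hits += a
    return hits / sweeps

p_given_c = exact_p_a(c=1)          # effect observed
p_given_cb = exact_p_a(c=1, b=1)    # effect observed and other cause known
p_sampled = gibbs_p_a(c=1)
print(f"exact P(A=1|C=1)     = {p_given_c:.3f}")
print(f"exact P(A=1|C=1,B=1) = {p_given_cb:.3f}")
print(f"Gibbs P(A=1|C=1)     = {p_sampled:.3f}")
```

Once the alternative cause B is known to be present, the probability of A drops: B "explains away" the observed effect C. The sampler recovers the same posterior approximately, using only local conditional updates.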
Fast ray-tracing of human eye optics on Graphics Processing Units.

Science.gov (United States)

Wei, Qi; Patkar, Saket; Pai, Dinesh K

2014-05-01

We present a new technique for simulating retinal image formation by tracing a large number of rays from objects in three dimensions as they pass through the optic apparatus of the eye to objects. Simulating human optics is useful for understanding basic questions of vision science and for studying vision defects and their corrections. Because of the complexity of computing such simulations accurately, most previous efforts used simplified analytical models of the normal eye. This makes them less effective in modeling vision disorders associated with abnormal shapes of the ocular structures, which are hard to represent precisely with analytical surfaces. We have developed a computer simulator that can simulate ocular structures of arbitrary shapes, for instance represented by polygon meshes. Topographic and geometric measurements of the cornea, lens, and retina from keratometer or medical imaging data can be integrated for individualized examination. We utilize parallel processing using modern Graphics Processing Units (GPUs) to efficiently compute retinal images by tracing millions of rays. A stable retinal image can be generated within minutes. We simulated depth-of-field, accommodation, chromatic aberrations, as well as astigmatism and correction. We also show application of the technique in patient-specific vision correction by incorporating geometric models of the orbit reconstructed from clinical medical images. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
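The basic operation repeated for every ray at every optical surface is refraction. The sketch below shows the standard vector form of Snell's law for a single ray and a single flat interface. It is not taken from the paper's simulator, and the refractive index used for the cornea is a commonly quoted value that is assumed here purely for illustration.

```python
import math

def refract(direction, normal, n1, n2):
    """Refract a unit direction at a surface with unit normal (Snell's law).

    `normal` points against the incoming ray. Returns None on total
    internal reflection.
    """
    cos_i = -sum(d * n for d, n in zip(direction, normal))
    ratio = n1 / n2
    sin2_t = ratio * ratio * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(ratio * d + (ratio * cos_i - cos_t) * n
                 for d, n in zip(direction, normal))

N_AIR, N_CORNEA = 1.0, 1.376   # assumed indices, not taken from the paper

theta_i = math.radians(30.0)
incoming = (math.sin(theta_i), 0.0, -math.cos(theta_i))  # travelling toward -z
surface_normal = (0.0, 0.0, 1.0)                          # facing the ray
outgoing = refract(incoming, surface_normal, N_AIR, N_CORNEA)
theta_t = math.asin(outgoing[0])
print(f"incidence 30.0 deg -> refraction {math.degrees(theta_t):.2f} deg")
```

In a mesh-based simulator the normal would come from the triangle hit by the ray, and the same function would be applied at each surface in turn, independently for each of the millions of rays.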
Docker Containers for Deep Learning Experiments

OpenAIRE

Gerke, Paul K.

2017-01-01

Deep learning is a powerful tool to solve problems in the area of image analysis. The dominant compute platform for deep learning is Nvidia’s proprietary CUDA, which can only be used together with Nvidia graphics cards. The nvidia-docker project allows exposing Nvidia graphics cards to docker containers and thus makes it possible to run deep learning experiments in docker containers. In our department, we use deep learning to solve problems in the area of medical image analysis and use docker ...
Graphical Models with R

CERN Document Server

Højsgaard, Søren; Lauritzen, Steffen

2012-01-01

Graphical models in their modern form have been around since the late 1970s and appear today in many areas of the sciences. Along with the ongoing developments of graphical models, a number of different graphical modeling software programs have been written over the years. In recent years many of these software developments have taken place within the R community, either in the form of new packages or by providing an R interface to existing software. This book attempts to give the reader a gentle introduction to graphical modeling using R and the main features of some of these packages. In add
Performance tuning for CUDA-accelerated neighborhood denoising filters

Energy Technology Data Exchange (ETDEWEB)

Zheng, Ziyi; Mueller, Klaus [Stony Brook Univ., NY (United States). Center for Visual Computing, Computer Science; Xu, Wei

2011-07-01

Neighborhood denoising filters are powerful techniques in image processing and can effectively enhance the image quality in CT reconstructions. In this study, by taking the bilateral filter and the non-local means filter as two examples, we discuss their implementations and perform fine-tuning on the targeted GPU architecture. Experimental results show that the straightforward GPU-based neighborhood filters can be further accelerated by pre-fetching. The optimized GPU-accelerated denoising filters are ready to be plugged into a reconstruction framework to enable fast denoising without compromising image quality. (orig.)
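For readers unfamiliar with neighborhood filters, here is a minimal one-dimensional CPU reference of a bilateral filter. It only illustrates the weighting scheme. It does not reproduce the authors' GPU implementation or the pre-fetching optimization, and the signal and parameter values are hypothetical.

```python
import math

def bilateral_1d(signal, radius, sigma_s, sigma_r):
    """Bilateral filter on a 1D list of floats.

    sigma_s  spatial Gaussian width (samples)
    sigma_r  range Gaussian width (intensity units)
    """
    out = []
    for i, center in enumerate(signal):
        weight_sum = 0.0
        value_sum = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            # Weight falls off with distance and with intensity difference.
            spatial = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            similar = math.exp(-((signal[j] - center) ** 2) / (2.0 * sigma_r ** 2))
            weight = spatial * similar
            weight_sum += weight
            value_sum += weight * signal[j]
        out.append(value_sum / weight_sum)
    return out

# Hypothetical noisy step edge: low plateau, then high plateau.
noisy = [0.1, -0.1, 0.05, -0.05, 0.0, 10.0, 10.1, 9.9, 10.05, 9.95]
smooth = bilateral_1d(noisy, radius=2, sigma_s=1.5, sigma_r=0.5)
print([round(v, 2) for v in smooth])
```

Each output sample reads a small window of neighbouring inputs, and adjacent outputs read overlapping windows. That repeated access to the same data is why memory-access tuning matters for GPU versions of such filters.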
Graphic notation

DEFF Research Database (Denmark)

Bergstrøm-Nielsen, Carl

1992-01-01

Textbook to be used along with training the practice of graphic notation. Describes method; exercises; bibliography; collection of examples. If you can read Danish, please refer to that edition, which is far more up to date.
Graphics gems

CERN Document Server

Glassner, Andrew S

1993-01-01

"The GRAPHICS GEMS Series" was started in 1990 by Andrew Glassner. The vision and purpose of the Series was - and still is - to provide tips, techniques, and algorithms for graphics programmers. All of the gems are written by programmers who work in the field and are motivated by a common desire to share interesting ideas and tools with their colleagues. Each volume provides a new set of innovative solutions to a variety of programming problems.
Simulation of Specular Surface Imaging Based on Computer Graphics: Application on a Vision Inspection System

Directory of Open Access Journals (Sweden)

Seulin Ralph

2002-01-01

Full Text Available This work aims at detecting surface defects on reflecting industrial parts. A machine vision system that detects geometric surface defects is completely described. Defects are revealed by a dedicated lighting device, which has been carefully designed to ensure that defects are imaged. The lighting system greatly simplifies the image processing for defect segmentation, so real-time inspection of reflective products is possible. To help design the imaging conditions, a complete simulation is proposed. The simulation, based on computer graphics, enables the rendering of realistic images. Simulation provides here a very efficient way to perform tests compared to numerous manual experiments.

Graphical models for inference under outcome-dependent sampling

DEFF Research Database (Denmark)

Didelez, V; Kreiner, S; Keiding, N

2010-01-01

We consider situations where data have been collected such that the sampling depends on the outcome of interest and possibly further covariates, as for instance in case-control studies. Graphical models represent assumptions about the conditional independencies among the variables. By including a node for the sampling indicator, assumptions about sampling processes can be made explicit. We demonstrate how to read off such graphs whether consistent estimation of the association between exposure and outcome is possible. Moreover, we give sufficient graphical conditions for testing and estimating...
Interactive Graphic Journalism

NARCIS (Netherlands)

Schlichting, Laura

2016-01-01

This paper examines graphic journalism (GJ) in a transmedial context, and argues that transmedial graphic journalism (TMGJ) is an important and fruitful new form of visual storytelling, that will re-invigorate the field of journalism, as it steadily tests out and plays with new media,
Evaluating Texts for Graphical Literacy Instruction: The Graphic Rating Tool

Science.gov (United States)

Roberts, Kathryn L.; Brugar, Kristy A.; Norman, Rebecca R.

2015-01-01

In this article, we present the Graphical Rating Tool (GRT), which is designed to evaluate the graphical devices that are commonly found in content-area, non-fiction texts, in order to identify books that are well suited for teaching about those devices. We also present a "best of" list of science and social studies books, which includes…
Graphical programming at Sandia National Laboratories

International Nuclear Information System (INIS)

McDonald, M.J.; Palmquist, R.D.; Desjarlais, L.

1993-09-01

Sandia has developed an advanced operational control system approach, called Graphical Programming, to design, program, and operate robotic systems. The Graphical Programming approach produces robot systems that are faster to develop and use, safer in operation, and cheaper overall than alternative teleoperation or autonomous robot control systems. Graphical Programming also provides an efficient and easy-to-use interface to traditional robot systems for use in setup and programming tasks. This paper provides an overview of the Graphical Programming approach and lists key features of Graphical Programming systems. Graphical Programming uses 3-D visualization and simulation software with intuitive operator interfaces for the programming and control of complex robotic systems. Graphical Programming Supervisor software modules allow an operator to command and simulate complex tasks in a graphic preview mode and, when acceptable, command the actual robots and monitor their motions with the graphic system. Graphical Programming Supervisors maintain registration with the real world and allow the robot to perform tasks that cannot be accurately represented with models alone by using a combination of model and sensor-based control.
Accelerating large-scale protein structure alignments with graphics processing units

Directory of Open Access Journals (Sweden)

Pang Bin

2012-02-01

Full Text Available Abstract Background Large-scale protein structure alignment, an indispensable tool for structural bioinformatics, poses a tremendous challenge to computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign can incorporate many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues arising from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using the massively parallel computing power of GPUs.
3D data processing with advanced computer graphics tools

Science.gov (United States)

Zhang, Song; Ekstrand, Laura; Grieve, Taylor; Eisenmann, David J.; Chumbley, L. Scott

2012-09-01

Often, the 3-D raw data coming from an optical profilometer contain spiky noise and lie on an irregular grid, which makes them difficult to analyze and, because of their enormous size, difficult to store. This paper addresses these two issues for an optical profilometer by substantially reducing the spiky noise in the 3-D raw data, and by rapidly re-sampling the raw data into regular grids at any pixel size and any orientation with advanced computer graphics tools. Experimental results will be presented to demonstrate the effectiveness of the proposed approach.
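As a rough illustration of what "reducing spiky noise" can mean, the sketch below despikes a 1-D height profile with a sliding median. This is a stand-in technique chosen for brevity, not the computer-graphics-based method of the paper; the function name `despike`, the window size, and the threshold are all assumptions made for this example.

```python
from statistics import median


def despike(heights, window=5, threshold=3.0):
    """Replace samples that deviate strongly from their local median.

    Hypothetical helper: `window` is the number of neighbouring samples
    considered and `threshold` is the allowed deviation in height units.
    """
    half = window // 2
    cleaned = list(heights)
    for i, h in enumerate(heights):
        lo = max(0, i - half)
        hi = min(len(heights), i + half + 1)
        local = median(heights[lo:hi])
        # A sample far from its neighbourhood median is treated as a spike.
        if abs(h - local) > threshold:
            cleaned[i] = local
    return cleaned


# Made-up profile with a single spike at index 3.
profile = [1.0, 1.1, 0.9, 25.0, 1.0, 1.2, 1.1]
smoothed = despike(profile)
print(smoothed)
```

Only the outlier is replaced; samples close to their local median are left untouched, so genuine surface detail below the threshold survives.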
Transforming Graphical System Models to Graphical Attack Models

DEFF Research Database (Denmark)

Ivanova, Marieta Georgieva; Probst, Christian W.; Hansen, Rene Rydhof

2016-01-01

Manually identifying possible attacks on an organisation is a complex undertaking; many different factors must be considered, and the resulting attack scenarios can be complex and hard to maintain as the organisation changes. System models provide a systematic representation of organisations...... approach to transforming graphical system models to graphical attack models in the form of attack trees. Based on an asset in the model, our transformations result in an attack tree that represents attacks by all possible actors in the model, after which the actor in question has obtained the asset....
Computer graphics at VAX JINR

International Nuclear Information System (INIS)

Balashov, V.K.

1991-01-01

The structure of the software for computer graphics at VAX JINR is described. It consists of the graphical packages GKS and WAND and a set of graphical packages for High Energy Physics applications designed at CERN. 17 refs.; 1 tab
Interactive Learning for Graphic Design Foundations

Science.gov (United States)

Chu, Sauman; Ramirez, German Mauricio Mejia

2012-01-01

One of the biggest problems for students majoring in pre-graphic design is students' inability to apply their knowledge to different design solutions. The purpose of this study is to examine the effectiveness of interactive learning modules in facilitating knowledge acquisition during the learning process and to create interactive learning modules…
Development of a graphic interface for the Ramona-3B code

International Nuclear Information System (INIS)

Maldonado D, D.; Santos O, M.A.

2003-01-01

This work presents a graphical interface that interprets the data of the Ramona-3B code. Ramona-3B is a computer program that takes text files as input and also produces its output as text files. The quantity of information generated is so large that it must always be processed with graphical tools in order to analyze the results of simulations of nuclear power plants with boiling water reactors (BWRs). A modern, versatile tool that automatically translates text into graphics yields an interface that makes it easier to interpret how a BWR nuclear plant behaves. The key to this tool is a program that reads previously specified character strings and stores the data in a file for later use in building the graphical interface. It uses readily accessible software that can handle a large quantity of data and then plot it. Another important function of the interface is to allow the Ramona input file to be modified through graphical displays and online help, without necessarily opening the input data file. For the design of the interface, it was decided first to show the most representative variables of a BWR-type nuclear plant. MATLAB was used as the platform, chosen over several options such as PHP, LabVIEW or C '. The resulting graphs allow the plant to be monitored and selected variables to be controlled. The interface only needs to be told which variable to simulate in order to present the behavior of the BWR-type nuclear plant graphically. This tool is very useful for teaching students who are interested in nuclear topics of this kind. (Author)
The Case for Graphic Novels

Directory of Open Access Journals (Sweden)

Steven Hoover

2012-04-01

Full Text Available Many libraries and librarians have embraced graphic novels. A number of books, articles, and presentations have focused on the history of the medium and offered advice on building and maintaining collections, but very little attention has been given to the question of how to integrate graphic novels into a library’s instructional efforts. This paper will explore the characteristics of graphic novels that make them a valuable resource for librarians who focus on research and information literacy instruction, identify skills and competencies that can be taught by the study of graphic novels, and provide specific examples of how to incorporate graphic novels into instruction.
GRAPHIC AND MEANING IN LOGO DESIGN

Directory of Open Access Journals (Sweden)

ADÎR Victor

2015-06-01

Full Text Available Designing a logo is specialized work. It involves creativity, intelligence, the use of color and, of course, signs and symbols. A logo is the visual personality and signature of an entity. During the working process, a designer has to answer a few questions, such as: What signs, symbols and colors should be used? What are the important things to design in a logo? What logos already exist in the market for the same activity? To answer these questions is to design the logo. The meaning comes mainly from the graphics, which is why it is essential to pay attention to every detail. The paper discusses the connection between graphics and meaning that can create a corporate identity.
Characterizing chemical systems with on-line computers and graphics

International Nuclear Information System (INIS)

Frazer, J.W.; Rigdon, L.P.; Brand, H.R.; Pomernacki, C.L.

1979-01-01

Incorporating computers and graphics on-line to chemical experiments and processes opens up new opportunities for the study and control of complex systems. Systems having many variables can be characterized even when the variable interactions are nonlinear, and the system cannot a priori be represented by numerical methods and models. That is, large sets of accurate data can be rapidly acquired, then modeling and graphic techniques can be used to obtain partial interpretation plus design of further experimentation. The experimenter can thus comparatively quickly iterate between experimentation and modeling to obtain a final solution. We have designed and characterized a versatile computer-controlled apparatus for chemical research, which incorporates on-line instrumentation and graphics. It can be used to determine the mechanism of enzyme-induced reactions or to optimize analytical methods. The apparatus can also be operated as a pilot plant to design control strategies. On-line graphics were used to display conventional plots used by biochemists and three-dimensional response-surface plots
Path planning of master-slave manipulator using graphic simulator

International Nuclear Information System (INIS)

Lee, J. Y.; Kim, S. H.; Song, T. K.; Park, B. S.; Yoon, J. S.

2002-01-01

To remotely handle high-level radioactive materials such as spent fuel, a master-slave manipulator is generally used as remote handling equipment in the hot cell. To analyze its motion and to implement a training system based on virtual reality technology, a computer graphics simulator for the M-S manipulator was developed. The parts are modelled in 3-D graphics and assembled, and kinematics are assigned. The inverse kinematics of the manipulator is defined, and the slave is coupled to the master according to the manipulator's specification. In addition, a virtual work cell matching the real environment is implemented in the graphical environment, and a path planning method that uses collision detection for the manipulator is proposed. This graphic simulator can be used effectively in designing maintenance processes for hot cell equipment and can enhance the reliability of spent fuel management.
Expected Utility Illustrated: A Graphical Analysis of Gambles with More than Two Possible Outcomes

Science.gov (United States)

Chen, Frederick H.

2010-01-01

The author presents a simple geometric method to graphically illustrate the expected utility from a gamble with more than two possible outcomes. This geometric result gives economics students a simple visual aid for studying expected utility theory and enables them to analyze a richer set of decision problems under uncertainty compared to what…
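For readers unfamiliar with the quantity being illustrated, the expected utility of a gamble with outcomes x_i and probabilities p_i is the sum of p_i * u(x_i). The sketch below computes it for a three-outcome gamble; the square-root utility function and the numbers are assumptions chosen for the example and do not come from the article, and the code does not reproduce the article's geometric method.

```python
import math


def expected_utility(outcomes, probabilities, utility):
    """Return sum(p_i * u(x_i)) for a discrete gamble."""
    if len(outcomes) != len(probabilities):
        raise ValueError("outcomes and probabilities must have equal length")
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * utility(x) for x, p in zip(outcomes, probabilities))


# Hypothetical three-outcome gamble with a concave (risk-averse) utility.
outcomes = [0.0, 100.0, 400.0]
probabilities = [0.25, 0.5, 0.25]

eu = expected_utility(outcomes, probabilities, math.sqrt)
expected_value = sum(p * x for x, p in zip(outcomes, probabilities))
certainty_equivalent = eu ** 2  # inverse of the square-root utility

print(eu, expected_value, certainty_equivalent)
```

With a concave utility the certainty equivalent falls below the expected value; the gap between them is the risk premium.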
EASI graphics - Version II

International Nuclear Information System (INIS)

Allensworth, J.A.

1984-04-01

EASI (Estimate of Adversary Sequence Interruption) is an analytical technique for measuring the effectiveness of physical protection systems. EASI Graphics is a computer graphics extension of EASI which provides a capability for performing sensitivity and trade-off analyses of the parameters of a physical protection system. This document reports on the implementation of the Version II of EASI Graphics and illustrates its application with some examples. 5 references, 15 figures, 6 tables
Graphic training materials: Your genie in the lamp

Energy Technology Data Exchange (ETDEWEB)

Hartley, D.; Stroupe, P.

1995-11-01

In the United States, we have overlooked using illustrated narrative materials (comic books) for training. Illustrated narrative training materials have the following benefits: (1) they promote learning by capitalizing on the visual dependency of the American public; (2) they promote retention by reinforcing the written word with graphic illustrations and with job-related stories; (3) they promote efficient transfer of knowledge to those with limited reading skills and those with limited English comprehension skills; and (4) they increase interest and are read! The Japanese have been successfully using graphic texts for education and training for years. Study comics were developed for mathematics, physics, economics, and multi-volume histories of Japan. Our organization decided to capitalize on the popularity and appeal of comic books and develop a graphic text that teaches the On-the-Job Training (OJT) process and good practices.
VACTIV: A graphical dialog based program for an automatic processing of line and band spectra

Science.gov (United States)

Zlokazov, V. B.

2013-05-01

The program VACTIV (Visual ACTIV) has been developed for automatic analysis of spectrum-like distributions, in particular gamma-ray spectra or alpha-spectra, and is a standard graphical dialog based Windows XX application, driven by a menu, mouse and keyboard. On the one hand, it is a conversion of an existing Fortran program, ACTIV [1], to the DELPHI language; on the other hand, it is a transformation of the sequential syntax of Fortran programming to a new object-oriented style, based on the organization of event interactions. New features implemented in the algorithms of both versions are the following: both an analytical function and a graphical curve can be used as the peak model; the peak search algorithm recognizes not only Gauss peaks but also peaks with an irregular form, both narrow peaks (2-4 channels) and broad ones (50-100 channels); and the regularization technique in the fitting guarantees a stable solution in the most complicated cases of strongly overlapping or weak peaks. The graphical dialog interface of VACTIV is much more convenient than the batch mode of ACTIV. [1] V.B. Zlokazov, Computer Physics Communications, 28 (1982) 27-37. NEW VERSION PROGRAM SUMMARY. Program Title: VACTIV. Catalogue identifier: ABAC_v2_0. Licensing provisions: no. Programming language: DELPHI 5-7 Pascal. Computer: IBM PC series. Operating system: Windows XX. RAM: 1 MB. Keywords: Nuclear physics, spectrum decomposition, least squares analysis, graphical dialog, object-oriented programming. Classification: 17.6. Catalogue identifier of previous version: ABAC_v1_0. Journal reference of previous version: Comput. Phys. Commun. 28 (1982) 27. Does the new version supersede the previous version?: Yes. Nature of problem: Program VACTIV is intended for precise analysis of arbitrary spectrum-like distributions, e.g. gamma-ray and X-ray spectra and allows the user to carry out the full cycle of automatic processing of such spectra, i.e. calibration, automatic peak search
The graphics system and the data saving for the SAPHIR experiment

International Nuclear Information System (INIS)

Albold, D.

1990-08-01

Important extensions have been made to the data acquisition system SOS for the SAPHIR experiment at the Bonn ELSA facilities. As support for various online-programs, controlling components of the detector, a graphic system for presenting data was developed. This enables any program in the system to use all graphic devices. Main component is a program serving requests for presentation on a 19 inch color monitor. Window-technique allows a presentation of several graphics on one screen. Equipped with a trackball and using menus, this is an easy to use and powerful tool in controlling the experiment. Other important extensions concern data storage. A huge amount of event data can be stored on 8 mm cassettes by the program Eventsaver. This program can be controlled by a component of the SAPHIR-Online SOL running on a VAX-Computer and using windows and menus. The smaller amount of data, containing parameters and programs, which should be accessible within a small period of time, can be stored on a magnetic disk. A program supporting a file-structure for access to this disk is described. (orig./HSI) [de
A handbook of statistical graphics using SAS ODS

CERN Document Server

Der, Geoff

2014-01-01

An Introduction to Graphics: Good Graphics, Bad Graphics, Catastrophic Graphics and Statistical Graphics; The Challenger Disaster; Graphical Displays; A Little History and Some Early Graphical Displays; Graphical Deception; An Introduction to ODS Graphics; Generating ODS Graphs; ODS Destinations; Statistical Graphics Procedures; ODS Graphs from Statistical Procedures; Controlling ODS Graphics; Controlling Labelling in Graphs; ODS Graphics Editor; Graphs for Displaying the Characteristics of Univariate Data: Horse Racing, Mortality Rates, Forearm Lengths, Survival Times and Geyser Eruptions; Introduction; Pie Chart, Bar Cha
