WorldWideScience

Sample records for units gpus offer

  1. Green smartphone GPUs: Optimizing energy consumption using GPUFreq scaling governors

    KAUST Repository

    Ahmad, Enas M.

    2015-10-19

    Modern smartphones are limited by their short battery life. The advancement of graphical performance is considered one of the main reasons behind the massive battery drainage in smartphones. In this paper we present a novel implementation of the GPUFreq Scaling Governors, a Dynamic Voltage and Frequency Scaling (DVFS) model implemented in the Android Linux kernel for dynamically scaling smartphone Graphical Processing Units (GPUs). The GPUFreq governors offer users multiple variations and alternatives for controlling the power consumption and performance of their GPUs. We implemented and evaluated our model on a smartphone GPU and measured the energy performance using an external power monitor. The results show that the energy consumption of smartphone GPUs can be significantly reduced with only a minor effect on GPU performance.
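
    Why frequency scaling saves energy: to first order, the dynamic power of a CMOS processor follows the standard relation below; this is textbook background, not a formula from the paper:

      $P_{\mathrm{dyn}} \approx \alpha\, C\, V^{2} f$

    where $\alpha$ is the activity factor, $C$ the switched capacitance, $V$ the supply voltage and $f$ the clock frequency. Since the attainable $f$ scales roughly with $V$, power falls roughly as $f^{3}$ while runtime grows only linearly, which is why a governor that lowers the GPU clock under light load saves energy at a modest performance cost.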

  2. Designing scientific applications on GPUs

    CERN Document Server

    Couturier, Raphael

    2013-01-01

    Many of today's complex scientific applications require a vast amount of computational power. General-purpose graphics processing units (GPGPUs) enable researchers in a variety of fields to benefit from the computational power of all the cores available inside graphics cards. Understand the benefits of using GPUs for many scientific applications: Designing Scientific Applications on GPUs shows you how to use GPUs for applications in diverse scientific fields, from physics and mathematics to computer science. The book explains the methods necessary for designing or porting your scientific applications.

  3. GPUs for the realtime low-level trigger of the NA62 experiment at CERN

    CERN Document Server

    Ammendola, R; Biagioni, A; Chiozzi, S; Cotta Ramusino, A; Fantechi, R; Fiorini, M; Gianoli, A; Graverini, E; Lamanna, G; Lonardo, A; Messina, A; Neri, I; Pantaleo, F; Paolucci, P S; Piandani, R; Pontisso, L; Simula, F; Sozzi, M; Vicini, P

    2015-01-01

    A pilot project for the use of GPUs (Graphics Processing Units) in online triggering applications for high energy physics (HEP) experiments is presented. GPUs offer a highly parallel architecture in which most of the chip resources are devoted to computation. Moreover, they make it possible to achieve large computing power within a limited amount of space and power. The application of online parallel computing on GPUs is shown for the synchronous low-level trigger of the NA62 experiment at CERN. Direct GPU communication using an FPGA-based board has been exploited to reduce the data transmission latency, and results from a first field test at CERN are highlighted. This work is part of a wider project named GAP (GPU Application Project), intended to study the use of GPUs in real-time applications in both HEP and medical imaging.

  4. Monte Carlo method for neutron transport calculations in graphics processing units (GPUs)

    International Nuclear Information System (INIS)

    Pellegrino, Esteban

    2011-01-01

    Monte Carlo simulation is well suited for solving the Boltzmann neutron transport equation in inhomogeneous media with complicated geometries. However, routine applications require the computation time to be reduced to hours or even minutes on a desktop PC. Interest in adopting Graphics Processing Units (GPUs) for Monte Carlo acceleration is growing rapidly, owing to the massive parallelism provided by the latest GPU technologies, which is the most promising route to performing full-size reactor core analysis on a routine basis. In this study, Monte Carlo codes for a fixed-source neutron transport problem were developed for GPU environments in order to evaluate issues associated with computational speedup on GPUs. The results obtained in this work suggest that a speedup of several orders of magnitude is possible using state-of-the-art GPU technologies. (author)

  5. High-performance blob-based iterative three-dimensional reconstruction in electron tomography using multi-GPUs

    Directory of Open Access Journals (Sweden)

    Wan Xiaohua

    2012-06-01

    Background: Three-dimensional (3D) reconstruction in electron tomography (ET) has emerged as a leading technique to elucidate the molecular structures of complex biological specimens. Blob-based iterative methods are advantageous reconstruction methods for 3D reconstruction in ET, but demand huge computational costs. Multiple graphics processing units (multi-GPUs) offer an affordable platform to meet these demands. However, a synchronous communication scheme between multi-GPUs leads to idle GPU time, and the weighted matrix involved in iterative methods cannot be loaded into GPUs, especially for large images, due to their limited memory. Results: In this paper we propose a multilevel parallel strategy combined with an asynchronous communication scheme and a blob-ELLR data structure to efficiently perform blob-based iterative reconstructions on multi-GPUs. The asynchronous communication scheme minimizes idle GPU time by overlapping communication with computation. The blob-ELLR data structure needs only about 1/16 of the storage space of the ELLPACK-R (ELLR) data structure and yields significant acceleration. Conclusions: Experimental results indicate that the multilevel parallel scheme combined with the asynchronous communication scheme and the blob-ELLR data structure allows efficient implementations of 3D reconstruction in ET on multi-GPUs.

  6. Finite Temperature Lattice QCD with GPUs

    International Nuclear Information System (INIS)

    Cardoso, N.; Cardoso, M.; Bicudo, P.

    2011-01-01

    Graphics Processing Units (GPUs) are being used in many areas of physics, since their performance versus cost is very attractive. GPUs can be programmed with CUDA, NVIDIA's parallel computing architecture, which enables dramatic increases in computing performance by harnessing the power of the GPU. We present a performance comparison between the GPU and the CPU, in single and double precision, for generating lattice SU(2) configurations. Analyses with single and multiple GPUs, using CUDA and OpenMP, are also presented. We also present SU(2) results for the renormalized Polyakov loop, the colour-averaged free energy and the string tension as a function of temperature. (authors)
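
    For reference, the Polyakov loop measured here has the standard lattice definition (textbook background, not notation specific to this paper):

      $L(\vec{x}) = \frac{1}{N_c}\,\mathrm{Tr}\prod_{t=0}^{N_t-1} U_4(\vec{x},t)$

    the ordered product of the temporal gauge links at spatial site $\vec{x}$, with $N_c = 2$ for SU(2). Its thermal average is related to the free energy $F_q$ of a static quark through $\langle L \rangle \sim e^{-F_q/T}$, which makes it an order parameter for the deconfinement transition studied in finite-temperature lattice QCD.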

  7. Utilizing the Double-Precision Floating-Point Computing Power of GPUs for RSA Acceleration

    Directory of Open Access Journals (Sweden)

    Jiankuo Dong

    2017-01-01

    Asymmetric cryptographic algorithms (e.g., RSA and Elliptic Curve Cryptography) have been implemented on Graphics Processing Units (GPUs) for over a decade. The basic idea of most previous contributions is to exploit the highly parallel GPU architecture and port the integer-based algorithms from general-purpose CPUs to GPUs to obtain high performance. However, the great cryptographic computing potential of GPUs, especially of the more powerful floating-point instructions, has not been comprehensively investigated. In this paper, we fully exploit the floating-point computing power of GPUs through various designs, including a floating-point-based Montgomery multiplication/exponentiation algorithm and a Chinese Remainder Theorem (CRT) implementation on the GPU. For practical use of the proposed algorithm, a new method is introduced to convert the input/output between octet strings and floating-point numbers, fully utilizing GPUs and further improving overall performance by about 5%. The performance of RSA-2048/3072/4096 decryption on an NVIDIA GeForce GTX TITAN reaches 42,211/12,151/5,790 operations per second, respectively, which is 13 times the performance of the previously fastest floating-point-based implementation (published at Eurocrypt 2009). The RSA-4096 decryption exceeds the existing fastest integer-based result by 23%.
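
    The Montgomery multiplication at the core of such implementations has the standard definition below (general background; the paper's contribution is carrying it out in double-precision floating-point limbs rather than integer limbs):

      $\mathrm{MonMul}(a,b) = a\,b\,R^{-1} \bmod N, \qquad R = 2^{k} > N,\ \gcd(R,N) = 1$

    Replacing the division in modular reduction by shifts and multiplications modulo $R$ is what makes the operation efficient on hardware without fast integer division, and it maps well onto the GPU's multiply-add throughput.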

  8. Symplectic multi-particle tracking on GPUs

    Science.gov (United States)

    Liu, Zhicong; Qiang, Ji

    2018-05-01

    A symplectic multi-particle tracking model is implemented on Graphics Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) language. The symplectic tracking model preserves phase-space structure and reduces non-physical effects in long-term simulation, which is important for evaluating beam properties in particle accelerators. Though this model is computationally expensive, it is very suitable for parallelization and can be accelerated significantly using GPUs. In this paper, we optimized the implementation of the symplectic tracking model on both a single GPU and multiple GPUs. Using a single GPU processor, the code achieves a factor of 2-10 speedup for a range of problem sizes compared with a single state-of-the-art Central Processing Unit (CPU) node of similar power consumption and semiconductor technology. It also shows good scalability on a multi-GPU cluster at the Oak Ridge Leadership Computing Facility. In an application to beam dynamics simulation, the GPU implementation reduces total computing time by more than a factor of two compared to the CPU implementation.
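
    A minimal example of a symplectic integrator of the kind such tracking codes build on is the second-order drift-kick (leapfrog) map (generic background; the paper's transfer maps are accelerator-specific):

      $x_{n+1/2} = x_n + \frac{\Delta t}{2}\, p_n, \qquad p_{n+1} = p_n - \Delta t\, \nabla V(x_{n+1/2}), \qquad x_{n+1} = x_{n+1/2} + \frac{\Delta t}{2}\, p_{n+1}$

    Each sub-step is a shear in phase space with unit Jacobian, so the composed map exactly preserves phase-space volume. This is why symplectic schemes avoid the artificial damping or growth that non-symplectic integrators accumulate over the very long runs needed for beam property evaluation.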

  9. A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs

    Directory of Open Access Journals (Sweden)

    Guixia He

    2016-01-01

    Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computations. Compressed sparse row (CSR) is the most frequently used format for storing sparse matrices. However, CSR-based SpMV kernels on graphics processing units (GPUs), for example CSR-scalar and CSR-vector, usually have poor performance due to irregular memory access patterns. This motivates us to propose a perfect CSR-based SpMV on the GPU, called PCSR. PCSR involves two kernels and accesses the CSR arrays in a fully coalesced manner by introducing a middle array, which greatly alleviates the deficiencies of CSR-scalar (rare coalescing) and CSR-vector (partial coalescing). Test results on a single C2050 GPU show that PCSR outperforms CSR-scalar, CSR-vector, and the CSRMV and HYBMV kernels in the vendor-tuned CUSPARSE library, and is comparable with a recently proposed CSR-based algorithm, CSR-Adaptive. Furthermore, we extend PCSR from a single GPU to multiple GPUs. Experimental results on four C2050 GPUs show that, whether or not the communication between GPUs is taken into account, PCSR on multiple GPUs achieves good performance and high parallel efficiency.
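
    For context, the baseline CSR-scalar kernel that PCSR improves upon assigns one thread per matrix row. A minimal sketch follows (illustrative of CSR-scalar in general, not the authors' PCSR code; all names are generic):

      __global__ void csr_scalar_spmv(int n_rows, const int *row_ptr,
                                      const int *col_idx, const double *val,
                                      const double *x, double *y)
      {
          // One thread per matrix row.
          int row = blockIdx.x * blockDim.x + threadIdx.x;
          if (row < n_rows) {
              double sum = 0.0;
              // Neighbouring threads start at unrelated offsets in val and
              // col_idx, so these loads are rarely coalesced -- the weakness
              // PCSR addresses by routing accesses through a middle array.
              for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
                  sum += val[j] * x[col_idx[j]];
              y[row] = sum;
          }
      }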

  10. ECC2K-130 on NVIDIA GPUs

    NARCIS (Netherlands)

    Bernstein, D.J.; Chen, H.-C.; Cheng, C.M.; Lange, T.; Niederhagen, R.F.; Schwabe, P.; Yang, B.Y.; Gong, G.; Gupta, K.C.

    2010-01-01

    A major cryptanalytic computation is currently underway on multiple platforms, including standard CPUs, FPGAs, PlayStations and Graphics Processing Units (GPUs), to break the Certicom ECC2K-130 challenge. This challenge is to compute an elliptic-curve discrete logarithm on a Koblitz curve over GF(2^131).

  11. Numerical computations with GPUs

    CERN Document Server

    Kindratenko, Volodymyr

    2014-01-01

    This book brings together research on numerical methods adapted for Graphics Processing Units (GPUs). It explains recent efforts to adapt classic numerical methods, including the solution of linear equations and the FFT, for massively parallel GPU architectures. The volume consolidates recent research and adaptations, covering widely used methods that are at the core of many scientific and engineering computations. Each chapter is written by authors working on a specific group of methods; these leading experts provide mathematical background, parallel algorithms and implementation details.

  12. GPUs for real-time processing in HEP trigger systems

    CERN Document Server

    Ammendola, R; Deri, L; Fiorini, M; Frezza, O; Lamanna, G; Lo Cicero, F; Lonardo, A; Messina, A; Sozzi, M; Pantaleo, F; Paolucci, Ps; Rossetti, D; Simula, F; Tosoratto, L; Vicini, P

    2014-01-01

    We describe a pilot project (GAP - GPU Application Project) for the use of GPUs (Graphics processing units) for online triggering applications in High Energy Physics experiments. Two major trends can be identified in the development of trigger and DAQ systems for particle physics experiments: the massive use of general-purpose commodity systems such as commercial multicore PC farms for data acquisition, and the reduction of trigger levels implemented in hardware, towards a fully software data selection system ("trigger-less"). The innovative approach presented here aims at exploiting the parallel computing power of commercial GPUs to perform fast computations in software, not only at the high trigger levels but also in early trigger stages. General-purpose computing on GPUs is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughputs, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming very attractive.

  13. Accelerated radiotherapy planners calculated by parallelization with GPUs

    International Nuclear Information System (INIS)

    Reinado, D.; Cozar, J.; Alonso, S.; Chinillach, N.; Cortina, T.; Ricos, B.; Diez, S.

    2011-01-01

    In this paper we have developed and tested a subroutine parallelized for graphics processing unit (GPU) architectures, applied to calculations with standard algorithms in a known code. The experience acquired during these tests will also be applied to Monte Carlo calculations in radiotherapy once the code is available.

  14. Exploiting GPUs in Virtual Machine for BioCloud

    Science.gov (United States)

    Jo, Heeseung; Jeong, Jinkyu; Lee, Myoungho; Choi, Dong Hoon

    2013-01-01

    Recently, biological applications have begun to be reimplemented to exploit the many cores of GPUs for better computational performance. By providing virtualized GPUs to VMs in a cloud computing environment, many biological applications can move into the cloud to enhance their computational performance and exploit effectively unlimited cloud computing resources while reducing the cost of computation. In this paper, we propose a BioCloud system architecture that enables VMs to use GPUs in a cloud environment. Because much of the previous research has focused on mechanisms for sharing GPUs among VMs, it cannot achieve sufficient performance for biological applications, for which computational throughput is more crucial than sharing. The proposed system exploits the pass-through mode of the PCI Express (PCI-E) channel. By allowing each VM to access the underlying GPUs directly, applications achieve almost the same performance as in a native environment. In addition, our scheme multiplexes GPUs by using the hot plug-in/out device features of the PCI-E channel. By adding or removing GPUs from each VM in an on-demand manner, VMs in the same physical host can time-share the GPUs. We implemented the proposed system using the Xen VMM and NVIDIA GPUs and showed that our prototype is highly effective for biological GPU applications in a cloud environment. PMID:23710465

  15. Exploiting GPUs in Virtual Machine for BioCloud

    Directory of Open Access Journals (Sweden)

    Heeseung Jo

    2013-01-01

    Recently, biological applications have begun to be reimplemented to exploit the many cores of GPUs for better computational performance. By providing virtualized GPUs to VMs in a cloud computing environment, many biological applications can move into the cloud to enhance their computational performance and exploit effectively unlimited cloud computing resources while reducing the cost of computation. In this paper, we propose a BioCloud system architecture that enables VMs to use GPUs in a cloud environment. Because much of the previous research has focused on mechanisms for sharing GPUs among VMs, it cannot achieve sufficient performance for biological applications, for which computational throughput is more crucial than sharing. The proposed system exploits the pass-through mode of the PCI Express (PCI-E) channel. By allowing each VM to access the underlying GPUs directly, applications achieve almost the same performance as in a native environment. In addition, our scheme multiplexes GPUs by using the hot plug-in/out device features of the PCI-E channel. By adding or removing GPUs from each VM in an on-demand manner, VMs in the same physical host can time-share the GPUs. We implemented the proposed system using the Xen VMM and NVIDIA GPUs and showed that our prototype is highly effective for biological GPU applications in a cloud environment.

  16. Exploiting GPUs in virtual machine for BioCloud.

    Science.gov (United States)

    Jo, Heeseung; Jeong, Jinkyu; Lee, Myoungho; Choi, Dong Hoon

    2013-01-01

    Recently, biological applications have begun to be reimplemented to exploit the many cores of GPUs for better computational performance. By providing virtualized GPUs to VMs in a cloud computing environment, many biological applications can move into the cloud to enhance their computational performance and exploit effectively unlimited cloud computing resources while reducing the cost of computation. In this paper, we propose a BioCloud system architecture that enables VMs to use GPUs in a cloud environment. Because much of the previous research has focused on mechanisms for sharing GPUs among VMs, it cannot achieve sufficient performance for biological applications, for which computational throughput is more crucial than sharing. The proposed system exploits the pass-through mode of the PCI Express (PCI-E) channel. By allowing each VM to access the underlying GPUs directly, applications achieve almost the same performance as in a native environment. In addition, our scheme multiplexes GPUs by using the hot plug-in/out device features of the PCI-E channel. By adding or removing GPUs from each VM in an on-demand manner, VMs in the same physical host can time-share the GPUs. We implemented the proposed system using the Xen VMM and NVIDIA GPUs and showed that our prototype is highly effective for biological GPU applications in a cloud environment.

  17. A massively parallel method of characteristic neutral particle transport code for GPUs

    International Nuclear Information System (INIS)

    Boyd, W. R.; Smith, K.; Forget, B.

    2013-01-01

    Over the past 20 years, parallel computing has enabled computers to grow ever larger and more powerful while scientific applications have advanced in sophistication and resolution. This trend is being challenged, however, as the power consumption of conventional parallel computing architectures has risen to unsustainable levels and memory limitations have come to dominate compute performance. Heterogeneous computing platforms, such as Graphics Processing Units (GPUs), are an increasingly popular paradigm for addressing these issues. This paper explores the applicability of GPUs to deterministic neutron transport. A 2D method of characteristics (MOC) code, OpenMOC, has been developed with solvers for both shared-memory multi-core platforms and GPUs. The multi-threading and memory locality methodologies for the GPU solver are presented. Performance results for the 2D C5G7 benchmark demonstrate a 25-35x speedup for MOC on the GPU. The lessons learned from this case study will provide the basis for further exploration of MOC on GPUs, as well as design decisions for hardware vendors exploring technologies for the next generation of machines for scientific computing. (authors)
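
    In the method of characteristics, the transport equation is integrated along straight rays; with a flat source in each region the per-segment solution is closed-form (standard MOC background, not specific to OpenMOC):

      $\frac{d\psi(s)}{ds} + \Sigma_t\,\psi(s) = Q \quad\Rightarrow\quad \psi_{\mathrm{out}} = \psi_{\mathrm{in}}\,e^{-\Sigma_t \ell} + \frac{Q}{\Sigma_t}\left(1 - e^{-\Sigma_t \ell}\right)$

    where $\psi$ is the angular flux along the ray, $\Sigma_t$ the total cross section, $Q$ the region source and $\ell$ the segment length. Given the sources, every ray segment is independent, which is what makes the transport sweep amenable to thousands of concurrent GPU threads.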

  18. Comparison of some parallelization strategies of thermalhydraulic codes on GPUs

    International Nuclear Information System (INIS)

    Jendoubi, T.; Bergeaud, V.; Geay, A.

    2013-01-01

    Modern supercomputer architectures are now often based on hybrid concepts combining distributed-memory parallelism, shared-memory parallelism and GPUs (Graphics Processing Units). In this work, we propose a new approach to take advantage of these graphics cards in thermal-hydraulics algorithms. (authors)

  19. A Triply Selective MIMO Channel Simulator Using GPUs

    Directory of Open Access Journals (Sweden)

    R. Carrasco-Alvarez

    2018-01-01

    A methodology for implementing a triply selective multiple-input multiple-output (MIMO) channel simulator based on graphics processing units (GPUs) is presented. The resulting simulator is based on the implementation of multiple doubly selective single-input single-output (SISO) channel generators, where the multiple inputs and the multiple received signals are transformed in order to supply the corresponding spatial correlation of the channel under consideration. A direct consequence of this approach is its flexibility, which allows different propagation statistics to be specified for each SISO channel and thus more complex environments to be replicated. It is shown that under some specific constraints the statistics of the triply selective MIMO simulator match those reported in the state of the art. Simulation results show the computational improvement achieved, up to 650-fold for an 8 × 8 MIMO channel simulator compared with sequential implementations. In addition to the computational improvement, the proposed simulator offers flexibility for testing a variety of scenarios in vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) systems.

  20. Data Acquisition with GPUs: The DAQ for the Muon g-2 Experiment at Fermilab

    Energy Technology Data Exchange (ETDEWEB)

    Gohn, W. [Kentucky U.

    2016-11-15

    Graphical Processing Units (GPUs) have recently become a valuable computing tool for the acquisition of data at high rates and at relatively low cost. The devices work by parallelizing the code into thousands of threads, each executing a simple process, such as identifying pulses from a waveform digitizer. The CUDA programming library can be used to write code that effectively parallelizes such tasks on NVIDIA GPUs, providing a significant performance upgrade over CPU-based acquisition systems. The Muon g-2 experiment at Fermilab relies heavily on GPUs to process its data. The data acquisition system for this experiment must be able to create deadtime-free records from 700 μs muon spills at a raw data rate of 18 GB per second. Data will be collected using 1296 channels of μTCA-based 800 MSPS, 12-bit waveform digitizers and processed in a layered array of networked commodity processors, with 24 GPUs working in parallel to perform fast recording of the muon decays during the spill. The described data acquisition system is currently being constructed and will be fully operational before the start of the experiment in 2017.
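
    A sketch of the kind of per-sample kernel such a DAQ uses for pulse identification follows; the threshold logic, pulse polarity and all names are illustrative assumptions, not the experiment's actual code:

      __global__ void find_pulses(const short *wf, int n_samples,
                                  short threshold, int *flags)
      {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i > 0 && i < n_samples) {
              // Flag the sample where the digitized waveform first crosses
              // below threshold (assuming negative-going calorimeter pulses).
              flags[i] = (wf[i] < threshold && wf[i - 1] >= threshold) ? 1 : 0;
          }
      }

    Because every sample is examined by its own thread, a 700 μs spill digitized at 800 MSPS maps naturally onto the massive thread counts GPUs provide.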

  1. Computation Reduction Oriented Circular Scanning SAR Raw Data Simulation on Multi-GPUs

    Directory of Open Access Journals (Sweden)

    Hu Chen

    2016-08-01

    As a special working mode, circular scanning Synthetic Aperture Radar (SAR) is widely used in earth observation. With the increase of resolution and swath width, the volume of simulated data increases massively, raising new efficiency requirements. By analyzing the redundancy in raw data simulation based on a Graphics Processing Unit (GPU), a fast simulation method that reduces redundant computation is realized on multiple GPUs with the Message Passing Interface (MPI). The results show that with redundancy reduction the efficiency on 4 GPUs doubles while the hardware cost decreases by 50%, and the overall speedup reaches 350 times compared with the traditional CPU simulation.

  2. GPUs, a new tool of acceleration in CFD: efficiency and reliability on smoothed particle hydrodynamics methods.

    Directory of Open Access Journals (Sweden)

    Alejandro C Crespo

    Smoothed Particle Hydrodynamics (SPH) is a numerical method commonly used in Computational Fluid Dynamics (CFD) to simulate complex free-surface flows. Simulations with this mesh-free particle method far exceed the capacity of a single processor. In this paper, as part of a dual-functioning code for either central processing units (CPUs) or graphics processing units (GPUs), a parallelisation using GPUs is presented. The GPU parallelisation technique uses the Compute Unified Device Architecture (CUDA) of NVIDIA devices. Simulations with more than one million particles on a single GPU card exhibit speedups of up to two orders of magnitude over a single-core CPU. It is demonstrated that the code achieves different speedups on different CUDA-enabled GPUs. The numerical behaviour of the SPH code is validated with a standard benchmark test case of dam-break flow impacting an obstacle, where good agreement with the experimental results is observed. Both the achieved speedups and the quantitative agreement with experiments suggest that CUDA-based GPU programming can be used in SPH methods with efficiency and reliability.
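
    At the heart of SPH is the kernel-weighted particle summation; the standard density estimate (textbook SPH, not code from this paper) is:

      $\rho_i = \sum_j m_j\, W(|\mathbf{r}_i - \mathbf{r}_j|, h)$

    where $m_j$ are particle masses, $W$ a smoothing kernel with support radius proportional to $h$, and the sum runs over the neighbours of particle $i$. Each particle's sum is independent, so one GPU thread per particle, combined with a cell-linked neighbour list, parallelizes naturally.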

  3. Cross-Identification of Astronomical Catalogs on Multiple GPUs

    Science.gov (United States)

    Lee, M. A.; Budavári, T.

    2013-10-01

    One of the most fundamental problems in observational astronomy is the cross-identification of sources. Observations are made at different wavelengths, at different times, and from different locations and instruments, resulting in a large set of independent detections. The scientific outcome is often limited by our ability to quickly perform meaningful associations between these detections. The matching, however, is difficult scientifically and statistically as well as computationally. The former two require detailed physical modeling and advanced probabilistic concepts; the latter is due to the large volumes of data and the problem's combinatorial nature. To tackle the computational challenge and to prepare for future surveys, whose measurements will grow exponentially in size past the scale of feasible CPU-based solutions, we developed a new implementation that performs the associations on multiple Graphics Processing Units (GPUs). Our implementation utilizes up to 6 GPUs in combination with the Thrust library to achieve an over 40x speedup versus the previous best implementation running on a multi-CPU SQL Server.
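
    The Thrust library mentioned here provides STL-style parallel primitives on the GPU. A minimal sketch of the kind of step a catalog matcher relies on, sorting detections by a spatial bin key so that candidates in the same sky region become contiguous in memory, might look like this (the zoning scheme and names are assumptions, not the authors' code):

      #include <thrust/device_vector.h>
      #include <thrust/sort.h>

      // Sort detection ids by their spatial zone key on the GPU, so that
      // detections falling in the same zone end up adjacent in memory.
      void sort_by_zone(thrust::device_vector<int>& zone_keys,
                        thrust::device_vector<int>& detection_ids)
      {
          thrust::sort_by_key(zone_keys.begin(), zone_keys.end(),
                              detection_ids.begin());
      }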

  4. Fast network centrality analysis using GPUs

    Directory of Open Access Journals (Sweden)

    Shi Zhiao

    2011-05-01

    Background: With the exploding volume of data generated by continuously evolving high-throughput technologies, biological network analysis problems are growing larger in scale and craving more computational power. General-purpose computation on Graphics Processing Units (GPGPU) provides a cost-effective technology for the study of large-scale biological networks. Designing algorithms that maximize data parallelism is the key to leveraging the power of GPUs. Results: We propose an efficient data-parallel formulation of the All-Pairs Shortest Path problem, which is the key component of shortest-path-based centrality computation. A betweenness centrality algorithm built upon this formulation was developed and benchmarked against the most recent GPU-based algorithm; speedups between 11% and 19% were observed in various simulated scale-free networks. We further designed three algorithms based on this core component to compute closeness centrality, eccentricity centrality and stress centrality. To make all these algorithms available to the research community, we developed the software package gpu-fan (GPU-based Fast Analysis of Networks) for CUDA-enabled GPUs. Speedups of 10-50x compared with CPU implementations were observed for simulated scale-free networks and real-world biological networks. Conclusions: gpu-fan provides a significant performance improvement for centrality computation in large-scale networks. Source code is available under the GNU Public License (GPL) at http://bioinfo.vanderbilt.edu/gpu-fan/.

  5. Massively parallel read mapping on GPUs with the q-group index and PEANUT

    NARCIS (Netherlands)

    J. Köster (Johannes); S. Rahmann (Sven)

    2014-01-01

    We present the q-group index, a novel data structure for read mapping tailored towards graphics processing units (GPUs), with a small memory footprint and efficient parallel algorithms for querying and building. On top of the q-group index we introduce PEANUT, a highly parallel GPU-based read mapper.

  6. A Decade-Long European-Scale Convection-Resolving Climate Simulation on GPUs

    Science.gov (United States)

    Leutwyler, D.; Fuhrer, O.; Ban, N.; Lapillonne, X.; Lüthi, D.; Schar, C.

    2016-12-01

    Convection-resolving models have proven to be very useful tools in numerical weather prediction and climate research. However, due to their extremely demanding computational requirements, they have so far been limited to short simulations and/or small computational domains. Innovations in supercomputing have led to new machine designs that combine conventional multi-core CPUs with accelerators such as graphics processing units (GPUs). One of the first atmospheric models to be fully ported to GPUs is the Consortium for Small-Scale Modeling weather and climate model COSMO. This new version allows us to expand the simulation domain to areas spanning continents and the time period up to a decade. We present results from a decade-long, convection-resolving climate simulation over Europe using the GPU-enabled COSMO version on a computational domain with 1536x1536x60 gridpoints. The simulation is driven by the ERA-Interim reanalysis. The results illustrate how the approach allows for the representation of interactions between synoptic-scale and meso-scale atmospheric circulations at scales ranging from 1000 to 10 km. We discuss some of the advantages and prospects of using GPUs, and focus on the performance of the convection-resolving modeling approach on the European scale. Specifically, we investigate the organization of convective clouds and validate hourly rainfall distributions against various high-resolution data sets.

  7. Exploiting GPUs in Virtual Machine for BioCloud

    OpenAIRE

    Jo, Heeseung; Jeong, Jinkyu; Lee, Myoungho; Choi, Dong Hoon

    2013-01-01

    Recently, biological applications have begun to be reimplemented to exploit the many cores of GPUs for better computational performance. By providing virtualized GPUs to VMs in a cloud computing environment, many biological applications can move into the cloud to enhance their computational performance and exploit effectively unlimited cloud computing resources while reducing the cost of computation. In this paper, we propose a BioCloud system architecture that enables VMs to use GPUs in a cloud environment...

  8. Evaluation of vectorized Monte Carlo algorithms on GPUs for a neutron Eigenvalue problem

    International Nuclear Information System (INIS)

    Du, X.; Liu, T.; Ji, W.; Xu, X. G.; Brown, F. B.

    2013-01-01

    Conventional Monte Carlo (MC) methods for radiation transport computations are 'history-based', which means that one particle history at a time is tracked. Simulations based on such methods suffer from thread divergence on the graphics processing unit (GPU), which severely affects the performance of GPUs. To circumvent this limitation, event-based vectorized MC algorithms can be utilized. A versatile software test-bed, called ARCHER - Accelerated Radiation-transport Computations in Heterogeneous Environments - was used for this study. ARCHER facilitates the development and testing of a MC code based on the vectorized MC algorithm implemented on GPUs by using NVIDIA's Compute Unified Device Architecture (CUDA). The ARCHER GPU code was designed to solve a neutron eigenvalue problem and was tested on a NVIDIA Tesla M2090 Fermi card. We found that although the vectorized MC method significantly reduces the occurrence of divergent branching and enhances the warp execution efficiency, the overall simulation speed is ten times slower than the conventional history-based MC method on GPUs. By analyzing detailed GPU profiling information from ARCHER, we discovered that the main reason was the large amount of global memory transactions, causing severe memory access latency. Several possible solutions to alleviate the memory latency issue are discussed. (authors)

  9. Evaluation of vectorized Monte Carlo algorithms on GPUs for a neutron Eigenvalue problem

    Energy Technology Data Exchange (ETDEWEB)

    Du, X.; Liu, T.; Ji, W.; Xu, X. G. [Nuclear Engineering Program, Rensselaer Polytechnic Institute, Troy, NY 12180 (United States)]; Brown, F. B. [Monte Carlo Codes Group, Los Alamos National Laboratory, Los Alamos, NM 87545 (United States)]

    2013-07-01

    Conventional Monte Carlo (MC) methods for radiation transport computations are 'history-based', which means that one particle history at a time is tracked. Simulations based on such methods suffer from thread divergence on the graphics processing unit (GPU), which severely affects the performance of GPUs. To circumvent this limitation, event-based vectorized MC algorithms can be utilized. A versatile software test-bed, called ARCHER - Accelerated Radiation-transport Computations in Heterogeneous Environments - was used for this study. ARCHER facilitates the development and testing of a MC code based on the vectorized MC algorithm implemented on GPUs by using NVIDIA's Compute Unified Device Architecture (CUDA). The ARCHER GPU code was designed to solve a neutron eigenvalue problem and was tested on a NVIDIA Tesla M2090 Fermi card. We found that although the vectorized MC method significantly reduces the occurrence of divergent branching and enhances the warp execution efficiency, the overall simulation speed is ten times slower than the conventional history-based MC method on GPUs. By analyzing detailed GPU profiling information from ARCHER, we discovered that the main reason was the large amount of global memory transactions, causing severe memory access latency. Several possible solutions to alleviate the memory latency issue are discussed. (authors)

  10. Fast DRR generation for 2D to 3D registration on GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Tornai, Gábor János; Cserey, György [Faculty of Information Technology, Pazmany Peter Catholic University, Prater u. 50/a, H-1083, Budapest (Hungary)]; Pappas, Ion [General Electric Healthcare, Akron u. 2, H-2040, Budaörs (Hungary)]

    2012-08-15

    Purpose: The generation of digitally reconstructed radiographs (DRRs) is the most time-consuming CPU step in intensity-based two-dimensional x-ray to three-dimensional (CT or 3D rotational x-ray) medical image registration, which has application in several image-guided interventions. This work presents optimized DRR rendering on graphics processing units (GPUs) and compares the performance achievable on four commercially available devices. Methods: A ray-cast based DRR rendering was implemented for a 512 × 512 × 72 CT volume. The block size parameter was optimized on four different GPUs for a region of interest (ROI) of 400 × 225 pixels with different sampling ratios (1.1%-9.1% and 100%). Performance was statistically evaluated and compared across the four GPUs. The method and the block size dependence were validated on the latest GPU for several parameter settings with a public gold-standard dataset (512 × 512 × 825 CT) for registration purposes. Results: Depending on the GPU, the full ROI is rendered in 2.7-5.2 ms. If a sampling ratio of 1.1%-9.1% is applied, execution time is in the range of 0.3-7.3 ms. On all GPUs, the mean execution time increased linearly with the number of pixels when sampling was used. Conclusions: The presented results outperform other results from the literature. This indicates that automatic 2D to 3D registration, which typically requires a couple of hundred DRR renderings to converge, can be performed quasi-online, in less than a second or, depending on the application and hardware, in less than a couple of seconds. Accordingly, a whole new field of applications is opened for image-guided interventions, where the registration is continuously performed to match the real-time x-ray.
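
    Each DRR pixel is, in essence, a discretized attenuation line integral through the CT volume (standard ray-casting background, not this paper's exact formulation):

      $\mathrm{DRR}(u,v) = \int_{0}^{L} \mu\!\left(\mathbf{r}_0 + s\,\hat{\mathbf{d}}(u,v)\right) ds \;\approx\; \Delta s \sum_{k} \mu\!\left(\mathbf{r}_0 + k\,\Delta s\,\hat{\mathbf{d}}(u,v)\right)$

    where $\mu$ is the attenuation volume, $\mathbf{r}_0$ the x-ray source position and $\hat{\mathbf{d}}(u,v)$ the ray direction through detector pixel $(u,v)$. Rays are mutually independent, which is why one-thread-per-pixel GPU rendering is so effective here.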

  11. Solution to PDEs using radial basis function finite-differences (RBF-FD) on multiple GPUs

    International Nuclear Information System (INIS)

    Bollig, Evan F.; Flyer, Natasha; Erlebacher, Gordon

    2012-01-01

    This paper presents parallelization strategies for the radial basis function finite-difference (RBF-FD) method. As a generalized finite-differencing scheme, the RBF-FD method functions without the need for underlying meshes to structure the nodes. It offers high-order approximation accuracy and scales as O(N) per time step, with N the total number of nodes. To our knowledge, this is the first implementation of the RBF-FD method to leverage GPU accelerators for the solution of PDEs. Additionally, this implementation is the first to span both multiple CPUs and multiple GPUs. OpenCL kernels target the GPUs, and inter-processor communication and synchronization are managed by the Message Passing Interface (MPI). We verify our implementation of the RBF-FD method on two hyperbolic PDEs on the sphere, and demonstrate up to 9x speedup on a commodity GPU with unoptimized kernel implementations. On a high-performance cluster, the method achieves up to 7x speedup for the maximum problem size of 27,556 nodes.
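
    Like classical finite differences, RBF-FD approximates a differential operator at each node by a weighted combination of nearby nodal values, but the weights come from fitting radial basis functions on a scattered stencil (general RBF-FD background, not this paper's notation):

      $\mathcal{L}u(\mathbf{x}_i) \;\approx\; \sum_{j \in \mathcal{N}_i} w_{ij}\, u(\mathbf{x}_j)$

    where $\mathcal{N}_i$ is the set of nearest neighbours of node $\mathbf{x}_i$ and the weights $w_{ij}$ solve a small dense linear system per node. Applying the operator each time step is then a sparse matrix-vector product, an operation that maps well onto GPUs.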

  12. Musrfit-Real Time Parameter Fitting Using GPUs

    Science.gov (United States)

    Locans, Uldis; Suter, Andreas

    High transverse field μSR (HTF-μSR) experiments typically lead to rather large data sets, since it is necessary to follow the high frequencies present in the positron decay histograms. The analysis of these data sets can be very time consuming, usually due to the limited computational power of the hardware. To overcome limited computing resources, the rotating reference frame (RRF) transformation is often used to reduce the data sets that need to be handled. This comes at a price the μSR community is typically not aware of: (i) due to the RRF transformation, the fitting parameter estimates are of poorer precision, i.e., more extended, expensive beamtime is needed; (ii) RRF introduces systematic errors which hamper the statistical interpretation of χ2 or the maximum log-likelihood. We briefly discuss these issues in a non-exhaustive, practical way. The one and only reason for the RRF transformation is sluggish computing power. Therefore, during this work, GPU (Graphical Processing Unit) based fitting was developed, which makes it possible to perform real-time full data analysis without RRF. GPUs have become increasingly popular in scientific computing in recent years. Due to their highly parallel architecture they provide the opportunity to accelerate many applications at considerably less cost than upgrading the CPU computational power. With the emergence of frameworks such as CUDA and OpenCL, these devices have become more easily programmable. During this work, GPU support was added to Musrfit, a data analysis framework for μSR experiments. The new fitting algorithm uses CUDA or OpenCL to offload the most time-consuming parts of the calculations to NVIDIA or AMD GPUs. With the current CPU implementation in Musrfit, parameter fitting can take hours for certain data sets, while the GPU version allows real-time data analysis on the same data sets. This work describes the challenges that arise in adding GPU support to Musrfit as well as the results obtained.

  13. Parallel iterative solution of the Hermite Collocation equations on GPUs II

    International Nuclear Information System (INIS)

    Vilanakis, N; Mathioudakis, E

    2014-01-01

    Hermite Collocation is a high-order finite element method for Boundary Value Problems (BVPs) modelling applications in several fields of science and engineering. Applying this integration-free numerical solver to linear BVPs results in a large, sparse, general system of algebraic equations, suggesting the use of an efficient iterative solver, especially for realistic simulations. In part I of this work, an efficient parallel algorithm coupling the Schur complement method with the Bi-Conjugate Gradient Stabilized (BiCGSTAB) iterative solver was designed for multicore computing architectures with a Graphics Processing Unit (GPU). In the present work, the proposed algorithm is extended to high-performance computing environments consisting of multiprocessor machines with multiple GPUs. Since this is a distributed-GPU and shared-CPU-memory parallel architecture, a hybrid memory treatment is needed to develop the parallel algorithm. The algorithm was realized on an HP SL390 multiprocessor machine with Tesla M2070 GPUs using the OpenMP and OpenACC standards. Execution time measurements reveal the efficiency of the parallel implementation.

  14. Computation of Galois field expressions for quaternary logic functions on GPUs

    Directory of Open Access Journals (Sweden)

    Gajić Dušan B.

    2014-01-01

    Galois field (GF) expressions are polynomials used as representations of multiple-valued logic (MVL) functions. For this purpose, MVL functions are considered as functions defined over a finite (Galois) field of order p, GF(p). The problem of computing these functional expressions has an important role in areas such as digital signal processing and logic design. The time needed for computing GF expressions increases exponentially with the number of variables in the MVL functions and, as a result, often represents a limiting factor in applications. This paper proposes a method for the accelerated computation of GF(4) expressions for quaternary (four-valued) logic functions using graphics processing units (GPUs). The method is based on the spectral interpretation of GF expressions, permitting the use of fast Fourier transform (FFT)-like algorithms for their computation. These algorithms are then adapted for highly parallel processing on GPUs. The performance of the proposed solutions is compared with reference C/C++ implementations of the same algorithms running on central processing units (CPUs). Experimental results confirm that the presented approach leads to a significant reduction in processing times (up to 10.86 times) compared to CPU processing. The proposed approach therefore widens the set of problem instances which can be efficiently handled in practice. [Projects of the Serbian Ministry of Science, nos. ON174026 and III44006]
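
    In this setting, a GF(4) expression represents a quaternary function as a polynomial with all arithmetic carried out in GF(4) (the general form, stated here as background):

      $f(x_1,\dots,x_n) \;=\; \sum_{(a_1,\dots,a_n)\in\{0,1,2,3\}^n} c_{a_1\cdots a_n}\, x_1^{a_1}\cdots x_n^{a_n}$

    The coefficients $c_{a_1\cdots a_n}$ are obtained from the function's value vector by a Kronecker-structured transform that factors into $n$ stages of 4-point butterflies, analogous to the FFT; this factorization is the spectral interpretation that the GPU implementation exploits.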

  15. The voluntary safeguards offer of the United States

    International Nuclear Information System (INIS)

    Houck, F.S.

    1985-01-01

    During negotiations of the Treaty on the Non-Proliferation of Nuclear Weapons (NPT), concerns were expressed by non-nuclear-weapon States that their acceptance of Agency safeguards would put them at a disadvantage vis-a-vis the nuclear-weapon States. To allay these concerns, the United States and the United Kingdom in December 1967 made voluntary offers to accept Agency safeguards on their peaceful nuclear activities. Subsequently, France made a voluntary offer, the safeguards agreement for which was approved by the IAEA Board of Governors in February 1978, with a view to encouraging acceptance of Agency safeguards by additional States. More recently, in February 1985 the Board approved the safeguards agreement for the voluntary offer of the USSR, made inter alia to encourage further acceptance of Agency safeguards. These safeguards agreements with nuclear-weapon States have two important features in common: namely, they result from voluntary offers to accept safeguards rather than from multilateral or bilateral undertakings, and they give the Agency the right but generally not an obligation to apply its safeguards. The agreements differ in certain respects, the most noteworthy of which is the scope of the nuclear activities covered by each offer. The agreements of the United States and United Kingdom are the broadest, covering all peaceful nuclear activities in each country. The safeguards agreement for the US voluntary offer has been in force since December 1980. Now is an appropriate time to review the experience with the agreement's implementation during its first four years, as well as its history and salient features.

  16. A convolution-superposition dose calculation engine for GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Hissoiny, Sami; Ozell, Benoit; Despres, Philippe [Departement de genie informatique et genie logiciel, Ecole Polytechnique de Montreal, 2500 Chemin de Polytechnique, Montreal, Quebec H3T 1J4 (Canada); Departement de radio-oncologie, CRCHUM-Centre hospitalier de l'Universite de Montreal, 1560 rue Sherbrooke Est, Montreal, Quebec H2L 4M1 (Canada)]

    2010-03-15

    Purpose: Graphics processing units (GPUs) are increasingly used for scientific applications, where their parallel architecture and unprecedented computing power density can be exploited to accelerate calculations. In this paper, a new GPU implementation of a convolution/superposition (CS) algorithm is presented. Methods: This new GPU implementation has been designed from the ground up to use the graphics card's strengths and to avoid its weaknesses. The CS GPU algorithm takes into account beam hardening, off-axis softening and kernel tilting, and relies heavily on ray tracing through patient imaging data. Implementation details are reported, as well as a multi-GPU solution. Results: An overall single-GPU acceleration factor of 908x was achieved when compared to a nonoptimized version of the CS algorithm implemented in PlanUNC in single-threaded central processing unit (CPU) mode, resulting in approximately 2.8 s per beam for a 3D dose computation on a 0.4 cm grid. A comparison to an established commercial system leads to an acceleration factor of approximately 29x, or 0.58 versus 16.6 s per beam in single-threaded mode. An acceleration factor of 46x was obtained for the total energy released per mass (TERMA) calculation and a 943x acceleration factor for the CS calculation compared to PlanUNC. Dose distributions were also obtained for a simple water-lung phantom to verify that the implementation gives accurate results. Conclusions: These results suggest that GPUs are an attractive solution for radiation therapy applications and that careful design, taking the GPU architecture into account, is critical in obtaining significant acceleration factors. These results can potentially have a significant impact on complex dose delivery techniques requiring intensive dose calculations, such as intensity-modulated radiation therapy (IMRT) and arc therapy. They are also relevant for adaptive radiation therapy, where dose results must be obtained rapidly.
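
    Convolution/superposition computes dose as the interaction density (TERMA) convolved with an energy-deposition kernel; a simplified statement of the standard formulation, ignoring the kernel tilting and hardening corrections the paper implements, is:

      $D(\mathbf{r}) = \int T(\mathbf{r}')\, K(\mathbf{r} - \mathbf{r}')\, d^{3}r'$

    where $T$ is the total energy released per unit mass by the primary beam and $K$ describes how that energy spreads to the surrounding tissue. The heavy ray tracing mentioned in the abstract comes from evaluating $T$ and the radiological distances used inside $K$ along many directions per voxel.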

  17. Use of GPUs in Trigger Systems

    Science.gov (United States)

    Lamanna, Gianluca

    In recent years, interest in using graphics processors (GPUs) for general-purpose high-performance computing has been constantly rising. In this paper we discuss the possible use of GPUs to construct a fast and effective real-time trigger system, at both the software and hardware levels. In particular, we study the integration of such a system in the NA62 trigger. The first application of GPUs for ring pattern recognition in the RICH is presented. The results obtained show that there are no showstoppers for trigger systems with relatively low latency. Thanks to the use of off-the-shelf technology, in continuous development for the video game and image processing markets, the architecture described could easily be exported to other experiments to build a versatile and fully customizable online selection.

  18. Iterative Methods for MPC on Graphical Processing Units

    DEFF Research Database (Denmark)

    Gade-Nielsen, Nicolai Fog; Jørgensen, John Bagterp; Dammann, Bernd

    2012-01-01

    The high floating-point performance and memory bandwidth of Graphical Processing Units (GPUs) make them ideal for the large number of computations which often arise in scientific computing, such as matrix operations. GPUs achieve this performance by utilizing massive parallelism, which requires restructuring algorithms so as to avoid the use of dense matrices, which may be too large for the limited memory capacity of current graphics cards.

  19. Deep Packet/Flow Analysis using GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Gong, Qian [Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)]; Wu, Wenji [Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)]; DeMar, Phil [Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States)]

    2017-11-12

    Deep packet inspection (DPI) faces severe performance challenges in high-speed networks (40/100 GE), as it requires a large amount of raw computing power and high I/O throughput. Recently, researchers have begun using GPUs to address these issues and boost the performance of DPI. DPI applications typically involve highly complex operations at both the per-packet and per-flow data level, often in real time. The parallel architecture of GPUs fits exceptionally well for per-packet network traffic processing. However, for stateful network protocols such as TCP, the data streams need to be reconstructed at a per-flow level to deliver consistent content analysis. Since the flow-centric operations are naturally antiparallel and often require large memory space for buffering out-of-sequence packets, they can be problematic for GPUs, whose memory is normally limited to several gigabytes. In this work, we present a highly efficient GPU-based deep packet/flow analysis framework. The proposed design includes purely GPU-implemented flow tracking and TCP stream reassembly. Instead of buffering packets and waiting for them to come into sequence, our framework processes the packets in batches and uses a deterministic finite automaton (DFA) with a prefix-/suffix-tree method to detect patterns across out-of-sequence packets that happen to be located in different batches. Evaluation shows that our code can reassemble and forward tens of millions of packets per second and conduct stateful signature-based deep packet inspection at 55 Gbit/s using an NVIDIA K40 GPU.
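
    A sketch of the table-driven DFA matching at the heart of such signature scanning follows; one thread walks each packet payload through a 256-way transition table. The memory layout and all names are illustrative assumptions, not the framework's actual code:

      __global__ void dfa_scan(const unsigned char *payloads, const int *offsets,
                               int n_packets, const int *dfa_table,
                               const int *accepting, int *match)
      {
          int p = blockIdx.x * blockDim.x + threadIdx.x;
          if (p < n_packets) {
              int state = 0;
              // Feed each payload byte to the DFA; one table row per state.
              for (int i = offsets[p]; i < offsets[p + 1]; ++i)
                  state = dfa_table[state * 256 + payloads[i]];
              match[p] = accepting[state];  // nonzero if a signature matched
          }
      }

    Per-packet scans like this parallelize trivially; the paper's harder contribution is keeping the DFA state consistent across packets of one flow that land in different batches.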

  20. End-to-end plasma bubble PIC simulations on GPUs

    Science.gov (United States)

    Germaschewski, Kai; Fox, William; Matteucci, Jackson; Bhattacharjee, Amitava

    2017-10-01

    Accelerator technologies play a crucial role in eventually achieving exascale computing capabilities. The current and upcoming leadership machines at ORNL (Titan and Summit) employ Nvidia GPUs, which provide vast computational power but also need specifically adapted computational kernels to fully exploit them. In this work, we will show end-to-end particle-in-cell simulations of the formation, evolution and coalescence of laser-generated plasma bubbles. This work showcases the GPU capabilities of the PSC particle-in-cell code, which has been adapted for this problem to support particle injection, a heating operator and a collision operator on GPUs.

  1. GPUs for real-time processing in HEP trigger systems (CHEP2013: 20. international conference on computing in high energy and nuclear physics)

    Energy Technology Data Exchange (ETDEWEB)

    Lamanna, G; Piandani, R [INFN, Pisa (Italy)]; Ammendola, R [INFN, Rome "Tor Vergata" (Italy)]; Bauce, M; Giagu, S; Messina, A [University of Rome "Sapienza" (Italy)]; Biagioni, A; Lonardo, A; Paolucci, P S; Rescigno, M; Simula, F; Vicini, P [INFN, Rome "Sapienza" (Italy)]; Fantechi, R [CERN, Geneve (Switzerland)]; Fiorini, M [University and INFN, Ferrara (Italy)]; Graverini, E; Pantaleo, F; Sozzi, M [University, Pisa (Italy)]

    2014-06-11

    We describe a pilot project for the use of Graphics Processing Units (GPUs) for online triggering applications in High Energy Physics (HEP) experiments. Two major trends can be identified in the development of trigger and DAQ systems for HEP experiments: the massive use of general-purpose commodity systems such as commercial multicore PC farms for data acquisition, and the reduction of trigger levels implemented in hardware, towards a pure software selection system (trigger-less). The very innovative approach presented here aims at exploiting the parallel computing power of commercial GPUs to perform fast computations in software at both low- and high-level trigger stages. General-purpose computing on GPUs is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughputs, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming very attractive. We discuss in detail the use of online parallel computing on GPUs for a synchronous low-level trigger with fixed latency. In particular, we show preliminary results from a first test in the NA62 experiment at CERN. The use of GPUs in high-level triggers is also considered; the ATLAS experiment (and in particular its muon trigger) at CERN is taken as a study case of possible applications.

  2. GPUs for real-time processing in HEP trigger systems (CHEP2013: 20. international conference on computing in high energy and nuclear physics)

    International Nuclear Information System (INIS)

    Lamanna, G; Piandani, R; Ammendola, R (INFN, Rome "Tor Vergata" (Italy)); Bauce, M; Giagu, S; Messina, A (University of Rome "Sapienza" (Italy)); Biagioni, A; Lonardo, A; Paolucci, P S; Rescigno, M; Simula, F; Vicini, P (INFN, Rome "Sapienza" (Italy)); Fantechi, R; Fiorini, M; Graverini, E; Pantaleo, F; Sozzi, M

    2014-01-01

    We describe a pilot project for the use of Graphics Processing Units (GPUs) for online triggering applications in High Energy Physics (HEP) experiments. Two major trends can be identified in the development of trigger and DAQ systems for HEP experiments: the massive use of general-purpose commodity systems such as commercial multicore PC farms for data acquisition, and the reduction of trigger levels implemented in hardware, towards a pure software selection system (trigger-less). The very innovative approach presented here aims at exploiting the parallel computing power of commercial GPUs to perform fast computations in software at both low- and high-level trigger stages. General-purpose computing on GPUs is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughputs, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming very attractive. We discuss in detail the use of online parallel computing on GPUs for a synchronous low-level trigger with fixed latency. In particular, we show preliminary results from a first test in the NA62 experiment at CERN. The use of GPUs in high-level triggers is also considered; the ATLAS experiment (and in particular its muon trigger) at CERN is taken as a study case of possible applications.

  3. GPUs for real-time processing in HEP trigger systems (ACAT2013: 15. international workshop on advanced computing and analysis techniques in physics research)

    International Nuclear Information System (INIS)

    Ammendola, R; Biagioni, A; Frezza, O; Cicero, F Lo; Lonardo, A; Messina, A; Paolucci, PS; Rossetti, D; Simula, F; Tosoratto, L; Vicini, P; Deri, L; Sozzi, M; Pantaleo, F; Fiorini, M; Lamanna, G

    2014-01-01

    We describe a pilot project (GAP – GPU Application Project) for the use of GPUs (Graphics processing units) for online triggering applications in High Energy Physics experiments. Two major trends can be identified in the development of trigger and DAQ systems for particle physics experiments: the massive use of general-purpose commodity systems such as commercial multicore PC farms for data acquisition, and the reduction of trigger levels implemented in hardware, towards a fully software data selection system ("trigger-less"). The innovative approach presented here aims at exploiting the parallel computing power of commercial GPUs to perform fast computations in software not only at high trigger levels but also in early trigger stages. General-purpose computing on GPUs is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughputs, the use of such devices for real-time applications in high energy physics data acquisition and trigger systems is becoming relevant. We discuss in detail the use of online parallel computing on GPUs for synchronous low-level triggers with fixed latency. In particular we show preliminary results on a first test in the CERN NA62 experiment. The use of GPUs in high level triggers is also considered, the CERN ATLAS experiment being taken as a case study of possible applications.

  4. ECC2K-130 on NVIDIA GPUs

    NARCIS (Netherlands)

    Bernstein, D.J.; Chen, H.-C.; Cheng, C.M.; Lange, T.; Niederhagen, R.F.; Schwabe, P.; Yang, B.Y.

    2012-01-01

    [Updated version of paper at Indocrypt 2010] A major cryptanalytic computation is currently underway on multiple platforms, including standard CPUs, FPGAs, PlayStations and GPUs, to break the Certicom ECC2K-130 challenge. This challenge is to compute an elliptic-curve discrete logarithm on a Koblitz curve.

  5. Fast in-memory elastic full-waveform inversion using consumer-grade GPUs

    Science.gov (United States)

    Sivertsen Bergslid, Tore; Birger Raknes, Espen; Arntsen, Børge

    2017-04-01

    Full-waveform inversion (FWI) is a technique to estimate subsurface properties by using the recorded waveform produced by a seismic source and applying inverse theory. This is done through an iterative optimization procedure, where each iteration requires solving the wave equation many times, then trying to minimize the difference between the modeled and the measured seismic data. Having to model many of these seismic sources per iteration means that this is a highly computationally demanding procedure, which usually involves writing a lot of data to disk. We have written code that does forward modeling and inversion entirely in memory. A typical HPC cluster has many more CPUs than GPUs. Since FWI involves modeling many seismic sources per iteration, the obvious approach is to parallelize the code on a source-by-source basis, where each core of the CPU performs one modeling, and do all modelings simultaneously. With this approach, the GPU is already at a major disadvantage in pure numbers. Fortunately, GPUs can more than make up for this hardware disadvantage by performing each modeling much faster than a CPU. Another benefit of parallelizing each individual modeling is that it lets each modeling use a lot more RAM. If one node has 128 GB of RAM and 20 CPU cores, each modeling can use only 6.4 GB of RAM if one is running the node at full capacity with source-by-source parallelization on the CPU. A per-source parallelized code using GPUs can instead use 64 GB of RAM per modeling. Whenever a modeling uses more RAM than is available and has to start using regular disk space, the runtime increases dramatically due to slow file I/O. The extremely high computational speed of the GPUs combined with the large amount of RAM available for each modeling lets us do high-frequency FWI for fairly large models very quickly. For a single modeling, our GPU code outperforms the single-threaded CPU code by a factor of about 75. Successful inversions have been run on data with frequencies up to 40 Hz.

  6. GPUs for real-time processing in HEP trigger systems (ACAT2013: 15. international workshop on advanced computing and analysis techniques in physics research)

    Energy Technology Data Exchange (ETDEWEB)

    Ammendola, R; Biagioni, A; Frezza, O; Cicero, F Lo; Lonardo, A; Messina, A; Paolucci, PS; Rossetti, D; Simula, F; Tosoratto, L; Vicini, P [INFN Roma, P.le A. Moro 2, 00185 Roma (Italy)]; Deri, L; Sozzi, M; Pantaleo, F [Pisa University, Largo B. Pontecorvo 3, 56127 Pisa (Italy)]; Fiorini, M [Ferrara University, Via Saragat 1, 44122 Ferrara (Italy)]; Lamanna, G [INFN Pisa, Largo B. Pontecorvo 3, 56127 Pisa (Italy)]; Collaboration: GAP Collaboration

    2014-06-06

    We describe a pilot project (GAP – GPU Application Project) for the use of GPUs (Graphics processing units) for online triggering applications in High Energy Physics experiments. Two major trends can be identified in the development of trigger and DAQ systems for particle physics experiments: the massive use of general-purpose commodity systems such as commercial multicore PC farms for data acquisition, and the reduction of trigger levels implemented in hardware, towards a fully software data selection system ("trigger-less"). The innovative approach presented here aims at exploiting the parallel computing power of commercial GPUs to perform fast computations in software not only at high trigger levels but also in early trigger stages. General-purpose computing on GPUs is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughputs, the use of such devices for real-time applications in high energy physics data acquisition and trigger systems is becoming relevant. We discuss in detail the use of online parallel computing on GPUs for synchronous low-level triggers with fixed latency. In particular we show preliminary results on a first test in the CERN NA62 experiment. The use of GPUs in high level triggers is also considered, the CERN ATLAS experiment being taken as a case study of possible applications.

  7. TESLA GPUs versus MPI with OpenMP for the Forward Modeling of Gravity and Gravity Gradient of Large Prisms Ensemble

    Directory of Open Access Journals (Sweden)

    Carlos Couder-Castañeda

    2013-01-01

    An implementation using CUDA technology on a single GPU and on several graphics processing units (GPUs) is presented for the calculation of the forward modeling of gravitational fields from a three-dimensional volumetric ensemble composed of unitary prisms of constant density. We compared the performance results obtained with the GPUs against a previous version coded in OpenMP with MPI, and we analyzed the results on both platforms. Today, the use of GPUs represents a breakthrough in parallel computing, which has led to the development of applications in a variety of fields. Nevertheless, in some applications the decomposition of the tasks is not trivial, as can be appreciated in this paper. Unlike a trivial decomposition of the domain, we proposed to decompose the problem by sets of prisms and to use different memory spaces per CUDA processing core, avoiding the performance decay that results from the constant kernel calls that would be needed in a parallelization by observation points. The design and implementation created are the main contributions of this work, because the parallelization scheme implemented is not trivial. The performance results obtained are comparable to those of a small processing cluster.
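
    The decomposition described above can be illustrated with a minimal CUDA sketch that parallelizes over prisms (one thread per prism) and accumulates each prism's contribution into every observation point with atomic additions. The point-mass formula below is a deliberate simplification of the exact prism response used in the paper, and all names are illustrative.

```cuda
#include <cuda_runtime.h>

// One thread per prism: each thread adds its prism's contribution to
// every observation point, following the paper's idea of decomposing by
// prisms rather than by observation points. The exact prism formula is
// replaced by a point-mass approximation for brevity.
__global__ void gravityByPrism(const float3* prismCenter,
                               const float* prismMass,   // density * volume
                               int nPrisms,
                               const float3* station,
                               float* gz,                // accumulated g_z
                               int nStations) {
    const float G = 6.674e-11f;
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPrisms) return;
    float3 c = prismCenter[p];
    float  m = prismMass[p];
    for (int s = 0; s < nStations; ++s) {
        float dx = station[s].x - c.x;
        float dy = station[s].y - c.y;
        float dz = station[s].z - c.z;
        float r2 = dx * dx + dy * dy + dz * dz;
        float r  = sqrtf(r2);
        // Vertical component of the attraction; atomicAdd resolves the
        // write conflicts that prism-wise decomposition introduces.
        atomicAdd(&gz[s], G * m * dz / (r2 * r));
    }
}
```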

  8. Heterogeneous System Architectures from APUs to discrete GPUs

    CERN Multimedia

    CERN. Geneva

    2013-01-01

    We will present the Heterogeneous Systems Architectures that new AMD processors are bringing with the new GCN based GPUs and the new APUs. We will show how together they represent a huge step forward for programming flexibility and performance efficiently for Compute.

  9. Efficient Synchronization Primitives for GPUs

    OpenAIRE

    Stuart, Jeff A.; Owens, John D.

    2011-01-01

    In this paper, we revisit the design of synchronization primitives---specifically barriers, mutexes, and semaphores---and how they apply to the GPU. Previous implementations are insufficient due to discrepancies between the hardware and programming models of the GPU and CPU. We create new implementations in CUDA and analyze the performance of spinning on the GPU, as well as a method of sleeping on the GPU, by running a set of memory-system benchmarks on two of the most common GPUs in use, the Tesla...
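
    A common construction for one of these primitives, a spin mutex built from atomicCAS and atomicExch, is sketched below. This is a generic textbook pattern rather than necessarily the paper's implementation; the comment on warp-level contention points at exactly the kind of CPU/GPU discrepancy the paper discusses.

```cuda
#include <cuda_runtime.h>

// A simple GPU spin mutex. On pre-Volta hardware, threads of the same
// warp must not contend for the lock (lockstep execution can deadlock),
// so only one thread per block takes it here.
__device__ int gpuMutex = 0;

__device__ void lock(int* m) {
    while (atomicCAS(m, 0, 1) != 0) { }
    __threadfence();   // make the previous holder's writes visible
}

__device__ void unlock(int* m) {
    __threadfence();   // publish writes made in the critical section
    atomicExch(m, 0);
}

__global__ void criticalSectionDemo(int* counter) {
    // Only thread 0 of each block contends for the lock.
    if (threadIdx.x == 0) {
        lock(&gpuMutex);
        *counter += 1;   // critical section: serialized across blocks
        unlock(&gpuMutex);
    }
}
```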

  10. Energy Efficient Smartphones: Minimizing the Energy Consumption of Smartphone GPUs using DVFS Governors

    KAUST Repository

    Ahmad, Enas M.

    2013-05-15

    Modern smartphones are being designed with increasing processing power, memory capacity, network communication, and graphics performance. Although all of these features are enriching and expanding the experience of a smartphone user, they are significantly adding an overhead on the limited energy of the battery. This thesis aims at enhancing the energy efficiency of modern smartphones and increasing their battery life by minimizing the energy consumption of the smartphone Graphics Processing Unit (GPU). Smartphone operating systems are becoming fully hardware-accelerated, which implies relying on the GPU power for rendering all application graphics. In addition, the GPUs installed in smartphones are becoming more and more powerful by the day. This raises an energy consumption concern. We present a novel implementation of GPU Scaling Governors, a Dynamic Voltage and Frequency Scaling (DVFS) scheme implemented in the Android kernel to dynamically scale the GPU. The scheme includes four main governors: Performance, Powersave, Ondemand, and Conservative. Unlike previous studies which looked into the power efficiency of mobile GPUs only through simulation and power estimations, we have implemented our approach on a real modern smartphone GPU, and acquired actual energy measurements using an external power monitor. Our results show that the energy consumption of smartphones can be reduced by up to 15% using the Conservative governor in 2D rendering mode, and by up to 9% in 3D rendering mode, with minimal effect on the performance.
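
    On Linux-based systems such as Android, a DVFS governor is typically selected by writing its name to a sysfs node. The sketch below shows the idea in plain C++; the node path is a placeholder, since the actual entry depends on the GPU driver, and writing to it requires root privileges.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Minimal sketch: select a GPU DVFS governor by writing to a sysfs node.
// The path below is hypothetical; the real node depends on the SoC
// (e.g., a devfreq entry for the GPU) and requires root privileges.
int main(int argc, char** argv) {
    const std::string node = "/sys/class/devfreq/gpu/governor"; // placeholder
    const std::string governor = (argc > 1) ? argv[1] : "conservative";

    std::ofstream f(node);
    if (!f) {
        std::cerr << "cannot open " << node << " (need root?)\n";
        return 1;
    }
    f << governor << '\n';   // e.g., performance, powersave, ondemand
    std::cout << "requested governor: " << governor << '\n';
    return 0;
}
```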

  11. A New Data Layout For Set Intersection on GPUs

    DEFF Research Database (Denmark)

    Amossen, Rasmus Resen; Pagh, Rasmus

    2011-01-01

    However, GPUs require highly regular control flow and memory access patterns, and for this reason previous GPU methods for intersecting sets have used a simple bitmap representation. This representation requires excessive space on sparse data sets. In this paper we present a novel data layout, BATMAP...

  12. Decryption-decompression of AES protected ZIP files on GPUs

    Science.gov (United States)

    Duong, Tan Nhat; Pham, Phong Hong; Nguyen, Duc Huu; Nguyen, Thuy Thanh; Le, Hung Duc

    2011-10-01

    AES is a strong encryption system, so decryption-decompression of AES-encrypted ZIP files requires very large computing power and techniques for reducing the password space. This makes implementations on common computing systems impractical. In [1], we reduced the original very large password search space to a much smaller one that surely contains the correct password. Based on the reduced set of passwords, in this paper we parallelize decryption, decompression and plaintext recognition for encrypted ZIP files using CUDA on NVIDIA GeForce GTX 295 graphics cards to find the correct password. The experimental results have shown that the speed of decrypting, decompressing, recognizing plaintext and finding the original password increases by a factor of about 45 to 180 (depending on the number of GPUs) compared to sequential execution on the Intel Core 2 Quad Q8400 2.66 GHz. These results demonstrate the potential applicability of GPUs in this cryptanalysis field.

  13. Batched Tile Low-Rank GEMM on GPUs

    KAUST Repository

    Charara, Ali

    2018-02-01

    Dense General Matrix-Matrix (GEMM) multiplication is a core operation of the Basic Linear Algebra Subroutines (BLAS) library, and therefore often resides at the bottom of the traditional software stack for most scientific applications. In fact, chip manufacturers pay special attention to the GEMM kernel implementation, since this is exactly where most high-performance software libraries extract the hardware performance. With the emergence of big data applications involving large data-sparse, hierarchically low-rank matrices, the off-diagonal tiles can be compressed to reduce the algorithmic complexity and the memory footprint. The resulting tile low-rank (TLR) data format is composed of small data structures, which retain the most significant information for each tile. However, to operate on low-rank tiles, a new GEMM operation and its corresponding API have to be designed on GPUs so that they can exploit the data sparsity structure of the matrix while leveraging the underlying TLR compression format. The main idea consists in aggregating all operations onto a single kernel launch to compensate for their low arithmetic intensities and to mitigate the data transfer overhead on GPUs. The new TLR GEMM kernel outperforms the cuBLAS dense batched GEMM by more than an order of magnitude and creates new opportunities for advanced TLR algorithms.
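
    The saving behind TLR GEMM can be seen in a single-tile sketch. For rank-k tiles A = Ua*Va^T and B = Ub*Vb^T, the product stays rank-k if the middle factors are associated first. The host-side cuBLAS calls below show this for one tile pair; the paper's contribution is batching thousands of such small products into a single kernel launch, which separate cuBLAS calls of this kind do not achieve.

```cpp
#include <cublas_v2.h>

// Multiply two rank-k tiles A = Ua * Va^T and B = Ub * Vb^T (factors are
// n x k, column-major). A*B = Ua * (Va^T * Ub) * Vb^T stays rank-k:
// keep Uc = Ua and compute Vc = Vb * (Va^T * Ub)^T, at O(n k^2) cost
// instead of the O(n^2 k) of forming the dense tiles first.
void tlrGemm(cublasHandle_t h, int n, int k,
             const float* Ua, const float* Va,
             const float* Ub, const float* Vb,
             float* T,    // k x k scratch, device memory
             float* Vc)   // n x k output factor, device memory
{
    const float one = 1.0f, zero = 0.0f;
    // T = Va^T * Ub                              (k x k)
    cublasSgemm(h, CUBLAS_OP_T, CUBLAS_OP_N, k, k, n,
                &one, Va, n, Ub, n, &zero, T, k);
    // Vc = Vb * T^T                              (n x k)
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_T, n, k, k,
                &one, Vb, n, T, k, &zero, Vc, n);
    // Result: A*B = Ua * Vc^T  (Uc = Ua, no further computation needed)
}
```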

  14. Real-time radar signal processing using GPGPU (general-purpose graphic processing unit)

    Science.gov (United States)

    Kong, Fanxing; Zhang, Yan Rockee; Cai, Jingxiao; Palmer, Robert D.

    2016-05-01

    This study introduces a practical approach to developing a real-time signal processing chain for a general phased array radar on NVIDIA GPUs (Graphics Processing Units) using CUDA (Compute Unified Device Architecture) libraries such as cuBLAS and cuFFT, which are adopted from open source libraries and optimized for NVIDIA GPUs. The processed results are rigorously verified against those from the CPUs. Performance, benchmarked as computation time for various input data-cube sizes, is compared across GPUs and CPUs. Through the analysis, it is demonstrated that GPGPU (general-purpose GPU) real-time processing of array radar data is possible with relatively low-cost commercial GPUs.
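
    As a minimal illustration of the cuFFT building block on which such a chain rests, the sketch below plans and executes a batch of 1D complex FFTs, one per pulse, as used in range processing. The sizes are illustrative and not taken from the paper.

```cpp
#include <cufft.h>
#include <cuda_runtime.h>

// Minimal sketch: batched 1D FFTs with cuFFT, a core step of range
// processing (pulse compression) in a phased-array chain.
int main() {
    const int nRange  = 4096;   // samples per pulse (illustrative)
    const int nPulses = 128;    // batch size: one FFT per pulse

    cufftComplex* d;
    cudaMalloc(&d, sizeof(cufftComplex) * nRange * nPulses);

    cufftHandle plan;
    cufftPlan1d(&plan, nRange, CUFFT_C2C, nPulses);  // one plan, many FFTs
    cufftExecC2C(plan, d, d, CUFFT_FORWARD);         // in-place transform
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(d);
    return 0;
}
```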

  15. Graph Processing on GPUs: A Survey

    DEFF Research Database (Denmark)

    Shi, Xuanhua; Zheng, Zhigao; Zhou, Yongluan

    2018-01-01

    hundreds of billions, has attracted much attention in both industry and academia. It still remains a great challenge to process such large-scale graphs. Researchers have been seeking new possible solutions. Because of the massive degree of parallelism and the high memory access bandwidth in GPUs, utilizing GPUs to accelerate graph processing proves to be a promising solution. This article surveys the key issues of graph processing on GPUs, including data layout, memory access pattern, workload mapping, and specific GPU programming. In this article, we summarize the state-of-the-art research on GPU...
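
    The data-layout issues the survey covers become concrete with the compressed sparse row (CSR) format and one level of breadth-first search. The sketch below is a generic formulation, not drawn from any surveyed system; its irregular neighbor loops are precisely what causes the load imbalance and scattered memory accesses discussed.

```cuda
#include <cuda_runtime.h>

// One BFS level over a graph in CSR layout: rowPtr[v]..rowPtr[v+1]
// indexes the neighbors of v in colIdx. One thread per frontier vertex;
// irregular vertex degrees cause the load imbalance and uncoalesced
// accesses that GPU graph frameworks work hard to mitigate.
__global__ void bfsLevel(const int* rowPtr, const int* colIdx,
                         const int* frontier, int frontierSize,
                         int* dist, int level,
                         int* nextFrontier, int* nextSize) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= frontierSize) return;
    int v = frontier[i];
    for (int e = rowPtr[v]; e < rowPtr[v + 1]; ++e) {
        int u = colIdx[e];
        // Claim unvisited neighbors (dist initialized to -1) exactly once.
        if (atomicCAS(&dist[u], -1, level + 1) == -1) {
            nextFrontier[atomicAdd(nextSize, 1)] = u;
        }
    }
}
```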

  16. Green smartphone GPUs: Optimizing energy consumption using GPUFreq scaling governors

    KAUST Repository

    Ahmad, Enas M.; Shihada, Basem

    2015-01-01

    and alternatives in controlling the power consumption and performance of their GPUs. We implemented and evaluated our model on a smartphone GPU and measured the energy performance using an external power monitor. The results show that the energy consumption

  17. Open problems in CEM: Porting an explicit time-domain volume-integral- equation solver on GPUs with OpenACC

    KAUST Repository

    Ergül, Özgür

    2014-04-01

    Graphics processing units (GPUs) are gradually becoming mainstream in high-performance computing, as their capabilities for enhancing the performance of a large spectrum of scientific applications manyfold compared to multi-core CPUs have been clearly identified and proven. In this paper, implementation and performance-tuning details for porting an explicit marching-on-in-time (MOT)-based time-domain volume-integral-equation (TDVIE) solver onto GPUs are described. To this end, a high-level approach, utilizing the OpenACC directive-based parallel programming model, is used to minimize two often-faced challenges in GPU programming: developer productivity and code portability. The MOT-TDVIE solver code, originally developed for CPUs, is annotated with compiler directives to port it to GPUs in a fashion similar to how OpenMP targets multi-core CPUs. In contrast to CUDA and OpenCL, where significant modifications to CPU-based codes are required, this high-level approach requires minimal changes to the codes. In this work, we make use of two available OpenACC compilers, CAPS and PGI. Our experience reveals that different annotations of the code are required for each of the compilers, due to different interpretations of the fairly new standard by the compiler developers. Both versions of the OpenACC-accelerated code achieved significant performance improvements, with up to 30× speedup against the sequential CPU code using recent hardware technology. Moreover, we demonstrated that the GPU-accelerated fully explicit MOT-TDVIE solver achieved energy-consumption gains of the order of 3× against its CPU counterpart. © 2014 IEEE.
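
    The directive-based style referred to above looks roughly as follows: a plain C loop annotated with an OpenACC pragma that the compiler offloads, without the restructuring CUDA would require. This toy axpy example is ours, not an excerpt from the MOT-TDVIE solver.

```cpp
#include <stdio.h>

// Directive-based porting in the OpenACC style: the loop is annotated
// rather than rewritten, and the compiler (e.g., PGI) generates the GPU
// kernel and the data movement.
int main(void) {
    const int n = 1 << 20;
    static float x[1 << 20], y[1 << 20];
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = 2.0f * x[i] + y[i];   // axpy; runs on the GPU when offloaded

    printf("y[0] = %f\n", y[0]);     // expected: 4.0
    return 0;
}
```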

  18. Development of Desktop Computing Applications and Engineering Tools on GPUs

    DEFF Research Database (Denmark)

    Sørensen, Hans Henrik Brandenborg; Glimberg, Stefan Lemvig; Hansen, Toke Jansen

    (GPUs) for high-performance computing applications and software tools in science and engineering, inverse problems, visualization, imaging, dynamic optimization. The goals are to contribute to the development of new state-of-the-art mathematical models and algorithms for maximum throughput performance...

  19. Graphics processing unit based computation for NDE applications

    Science.gov (United States)

    Nahas, C. A.; Rajagopal, Prabhu; Balasubramaniam, Krishnan; Krishnamurthy, C. V.

    2012-05-01

    Advances in parallel processing in recent years are helping to improve the cost of numerical simulation. Breakthroughs in Graphics Processing Unit (GPU) based computation now offer the prospect of further drastic improvements. The introduction of 'compute unified device architecture' (CUDA) by NVIDIA (the global technology company based in Santa Clara, California, USA) has made programming GPUs for general-purpose computing accessible to the average programmer. Here we use CUDA to develop parallel finite difference schemes applicable to two problems of interest to the NDE community, namely heat diffusion and elastic wave propagation. The implementations are two-dimensional. Performance improvement of the GPU implementation against the serial CPU implementation is then discussed.
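
    A single explicit update step of the 2D heat-diffusion problem mentioned above can be written as a CUDA kernel with one thread per grid node. The sketch below follows the standard FTCS discretization and omits the paper's boundary treatment.

```cuda
#include <cuda_runtime.h>

// One explicit finite-difference (FTCS) step of the 2D heat equation,
// u_t = alpha * (u_xx + u_yy), with one thread per interior grid node.
__global__ void heatStep(const float* u, float* uNew, int nx, int ny,
                         float alpha, float dt, float h) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return;
    int id = j * nx + i;
    float lap = (u[id - 1] + u[id + 1] + u[id - nx] + u[id + nx]
                 - 4.0f * u[id]) / (h * h);
    uNew[id] = u[id] + alpha * dt * lap;  // stable if alpha*dt/h^2 <= 1/4
}
```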

  20. Online tracking with GPUs at the PANDA experiment

    Energy Technology Data Exchange (ETDEWEB)

    Bianchi, Ludovico; Herten, Andreas; Ritman, James; Stockmanns, Tobias [Forschungszentrum Juelich (Germany); Collaboration: PANDA-Collaboration

    2015-07-01

    The PANDA experiment is a next generation particle detector planned for operation at the FAIR facility, that will study collisions of antiprotons with beam momenta of 1.5-15 GeV/c on a fixed proton target. Signal and background events at PANDA will look very similar, making a conventional hardware-trigger based approach unfeasible. Instead, data coming from the detector are acquired continuously, and event selection is performed in real time. A rejection factor of up to 1000 is needed to reduce the data rate for offline storage, making the data acquisition system computationally very challenging. Our activity within the PANDA collaboration is centered on the development and implementation of particle tracking algorithms on Graphical Processing Units (GPUs), and on studying the possibility of performing tracking for online event filtering using a multi-GPU architecture. Three algorithms are currently being developed, using information from the PANDA tracking system: a Hough Transform, a Riemann Track Finder, and a Triplet Finder algorithm. This talk presents the algorithms, their performance, and studies for GPU data transfer methods based on so-called message queues for a deeper integration of the algorithms with the FairRoot and PandaRoot frameworks.

  1. Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization

    Energy Technology Data Exchange (ETDEWEB)

    Clark, M. A. [NVIDIA Corp., Santa Clara; Joó, Bálint [Jefferson Lab; Strelchenko, Alexei [Fermilab; Cheng, Michael [Boston U., Ctr. Comp. Sci.; Gambhir, Arjun [William-Mary Coll.; Brower, Richard [Boston U.

    2016-12-22

    The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This has been due both to significant progress in accelerating the iterative linear solvers using multigrid algorithms, and to the throughput improvements brought by GPUs. Deploying hierarchical algorithms optimally on GPUs is non-trivial owing to the lack of parallelism on the coarse grids, and as such, these advances have not proved multiplicative. Using the QUDA library, we demonstrate that by exposing all sources of parallelism that the underlying stencil problem possesses, and through appropriate mapping of this parallelism to the GPU architecture, we can achieve high efficiency even for the coarsest of grids. Results are presented for the Wilson-Clover discretization, where we demonstrate up to 10x speedup over present state-of-the-art GPU-accelerated methods on Titan. Finally, we look to the future, and consider the software implications of our findings.

  2. Accelerating Scientific Applications using High Performance Dense and Sparse Linear Algebra Kernels on GPUs

    KAUST Repository

    Abdelfattah, Ahmad

    2015-01-15

    High performance computing (HPC) platforms are evolving to more heterogeneous configurations to support the workloads of various applications. The current hardware landscape is composed of traditional multicore CPUs equipped with hardware accelerators that can handle high levels of parallelism. Graphics Processing Units (GPUs) are popular high performance hardware accelerators in modern supercomputers. GPU programming has a different model than that for CPUs, which means that many numerical kernels have to be redesigned and optimized specifically for this architecture. GPUs usually outperform multicore CPUs in some compute intensive and massively parallel applications that have regular processing patterns. However, most scientific applications rely on crucial memory-bound kernels and may witness bottlenecks due to the overhead of the memory bus latency. They can still take advantage of the GPU compute power capabilities, provided that an efficient architecture-aware design is achieved. This dissertation presents a uniform design strategy for optimizing critical memory-bound kernels on GPUs. Based on hierarchical register blocking, double buffering and latency hiding techniques, this strategy leverages the performance of a wide range of standard numerical kernels found in dense and sparse linear algebra libraries. The work presented here focuses on matrix-vector multiplication kernels (MVM) as representative and most important memory-bound operations in this context. Each kernel inherits the benefits of the proposed strategies. By exposing a proper set of tuning parameters, the strategy is flexible enough to suit different types of matrices, ranging from large dense matrices to sparse matrices with dense block structures, while high performance is maintained. Furthermore, the tuning parameters are used to maintain the relative performance across different GPU architectures. Multi-GPU acceleration is proposed to scale the performance on several devices.
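
    The flavor of such memory-bound kernels is captured by a simplified dense matrix-vector multiply in which the vector is staged through shared memory. The sketch below omits the register blocking and double buffering the dissertation layers on top, and assumes it is launched with blockDim.x equal to TILE.

```cuda
#include <cuda_runtime.h>

// Dense y = A*x (A is n x n, column-major), one thread per row, with x
// staged through shared memory in tiles. A much-simplified cousin of the
// register-blocked, double-buffered MVM kernels described above.
// Launch with blockDim.x == TILE.
#define TILE 128

__global__ void gemv(const float* A, const float* x, float* y, int n) {
    __shared__ float xs[TILE];
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n; t += TILE) {
        if (t + threadIdx.x < n)
            xs[threadIdx.x] = x[t + threadIdx.x];    // cooperative load of x
        __syncthreads();
        if (row < n)
            for (int k = 0; k < TILE && t + k < n; ++k)
                acc += A[(t + k) * n + row] * xs[k]; // coalesced column reads
        __syncthreads();
    }
    if (row < n) y[row] = acc;
}
```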

  3. Streaming Multiframe Deconvolutions on GPUs

    Science.gov (United States)

    Lee, M. A.; Budavári, T.

    2015-09-01

    Atmospheric turbulence distorts all ground-based observations, which is especially detrimental to faint detections. The point spread function (PSF) defining this blur is unknown for each exposure and varies significantly over time, making image analysis difficult. Lucky imaging and traditional co-adding throw away lots of information. We developed blind deconvolution algorithms that can simultaneously obtain robust solutions for the background image and all the PSFs. This is done in a streaming setting, which makes it practical for large numbers of big images. We implemented a new tool that runs on GPUs and achieves exceptional running times that can scale to the new time-domain surveys. Our code can quickly and effectively recover high-resolution images exceeding the quality of traditional co-adds. We demonstrate the power of the method on the repeated exposures in the Sloan Digital Sky Survey's Stripe 82.

  4. Heterogeneous Multicore Parallel Programming for Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Francois Bodin

    2009-01-01

    Hybrid parallel multicore architectures based on graphics processing units (GPUs) can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware display a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows the integration of heterogeneous hardware accelerators in an unintrusive manner while preserving the legacy code.

  5. SU(2) lattice gauge theory simulations on Fermi GPUs

    International Nuclear Information System (INIS)

    Cardoso, Nuno; Bicudo, Pedro

    2011-01-01

    In this work we explore the performance of CUDA in quenched lattice SU(2) simulations. CUDA, the NVIDIA Compute Unified Device Architecture, is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi) are also presented. In order to obtain high performance, the code must be optimized for the GPU architecture, i.e., an implementation that exploits the memory hierarchy of the CUDA programming model. We produce codes for the Monte Carlo generation of SU(2) lattice gauge configurations, for the mean plaquette, for the Polyakov loop at finite T and for the Wilson loop. We also present results for the potential using many configurations (50,000) without smearing and almost 2000 configurations with APE smearing. With two Fermi GPUs we have achieved an excellent performance of 200x the speed of one CPU, in single precision, around 110 Gflops/s. We also find that, using the Fermi architecture, double precision computations for the static quark-antiquark potential are not much slower (less than 2x slower) than single precision computations.

  6. Accelerating MATLAB with GPU computing a primer with examples

    CERN Document Server

    Suh, Jung W

    2013-01-01

    Beyond simulation and algorithm development, many developers increasingly use MATLAB even for product deployment in computationally heavy fields. This often demands that MATLAB codes run faster by leveraging the distributed parallelism of Graphics Processing Units (GPUs). While MATLAB successfully provides high-level functions as a simulation tool for rapid prototyping, the underlying details and knowledge needed for utilizing GPUs make MATLAB users hesitate to step into it. Accelerating MATLAB with GPUs offers a primer on bridging this gap. Starting with the basics, setting up MATLAB for

  7. Lattice QCD simulations using the OpenACC platform

    International Nuclear Information System (INIS)

    Majumdar, Pushan

    2016-01-01

    In this article we will explore the OpenACC platform for programming Graphics Processing Units (GPUs). The OpenACC platform offers a directive based programming model for GPUs which avoids the detailed data flow control and memory management necessary in a CUDA programming environment. In the OpenACC model, programs can be written in high level languages with OpenMP like directives. We present some examples of QCD simulation codes using OpenACC and discuss their performance on the Fermi and Kepler GPUs. (paper)

  8. Monte Carlo MP2 on Many Graphical Processing Units.

    Science.gov (United States)

    Doran, Alexander E; Hirata, So

    2016-10-11

    In the Monte Carlo second-order many-body perturbation (MC-MP2) method, the long sum-of-product matrix expression of the MP2 energy, whose literal evaluation may be poorly scalable, is recast into a single high-dimensional integral of functions of electron pair coordinates, which is evaluated by the scalable method of Monte Carlo integration. The sampling efficiency is further accelerated by the redundant-walker algorithm, which allows a maximal reuse of electron pairs. Here, a multitude of graphical processing units (GPUs) offers a uniquely ideal platform to expose multilevel parallelism: fine-grain data-parallelism for the redundant-walker algorithm in which millions of threads compute and share orbital amplitudes on each GPU; coarse-grain instruction-parallelism for near-independent Monte Carlo integrations on many GPUs with few and infrequent interprocessor communications. While the efficiency boost by the redundant-walker algorithm on central processing units (CPUs) grows linearly with the number of electron pairs and tends to saturate when the latter exceeds the number of orbitals, on a GPU it grows quadratically before it increases linearly and then eventually saturates at a much larger number of pairs. This is because the orbital constructions are nearly perfectly parallelized on a GPU and thus completed in a near-constant time regardless of the number of pairs. In consequence, an MC-MP2/cc-pVDZ calculation of a benzene dimer is 2700 times faster on 256 GPUs (using 2048 electron pairs) than on two CPUs, each with 8 cores (which can use only up to 256 pairs effectively). We also numerically determine that the cost to achieve a given relative statistical uncertainty in an MC-MP2 energy increases as O(n^3) or better with system size n, which may be compared with the O(n^5) scaling of the conventional implementation of deterministic MP2. We thus establish the scalability of MC-MP2 with both system and computer sizes.

  9. Applying graphics processor units to Monte Carlo dose calculation in radiation therapy

    Directory of Open Access Journals (Sweden)

    Bakhtiari M

    2010-01-01

    We investigate the potential of using a graphics processing unit (GPU) for Monte Carlo (MC) based radiation dose calculations. The percent depth dose (PDD) of photons in a medium with known absorption and scattering coefficients is computed using an MC simulation running on both a standard CPU and a GPU. We demonstrate that the GPU's capability for massively parallel processing provides a significant acceleration of the MC calculation, and offers a significant advantage for distributed stochastic simulations on a single computer. Harnessing this potential of GPUs will help in the early adoption of MC for routine planning in a clinical environment.
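
    A heavily simplified version of such a calculation, pure exponential attenuation with no scattering, fits in one cuRAND kernel with one thread per photon scoring its absorption depth into a histogram. The toy model illustrates only the parallelization pattern, not the paper's physics.

```cuda
#include <curand_kernel.h>

// Toy Monte Carlo for a percent-depth-dose curve: photons enter at z = 0
// and are attenuated with coefficient mu; each thread tracks one photon
// and scores its absorption depth. Scattering is deliberately omitted.
__global__ void depthDose(unsigned long long seed, float mu,
                          float zMax, int nBins,
                          unsigned int* hist, int nPhotons) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nPhotons) return;
    curandState s;
    curand_init(seed, i, 0, &s);
    // Sample the free path from the exponential distribution: z = -ln(u)/mu.
    float z = -logf(curand_uniform(&s)) / mu;
    if (z < zMax) {
        int bin = (int)(z / zMax * nBins);
        atomicAdd(&hist[bin], 1u);   // score the absorption event
    }
}
```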

  10. A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations.

    Directory of Open Access Journals (Sweden)

    ThienLuan Ho

    Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPU warp share data using warp-shuffle operations instead of accessing the shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experimental results on real DNA packages revealed that the proposed algorithm and its implementation achieved speedups of up to 122.64x and 1.53x over a sequential algorithm on a CPU and a previous parallel approximate string matching algorithm on GPUs, respectively.
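
    The warp-shuffle mechanism itself is easy to demonstrate in isolation. The sketch below uses the modern _sync shuffle intrinsics to perform a warp-wide sum entirely in registers; the paper's kernel exchanges dynamic-programming values between neighboring lanes in the same register-to-register fashion.

```cuda
#include <cuda_runtime.h>

// Lanes of a warp exchange register values directly, without shared
// memory. Shown as a warp-wide sum; launch with, e.g., <<<1, 256>>> and
// arrays sized accordingly (one output per warp of 32 threads).
__global__ void warpSum(const int* in, int* out) {
    int lane = threadIdx.x & 31;
    int v = in[threadIdx.x];
    // Butterfly reduction: each step halves the number of partial sums.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    if (lane == 0) out[threadIdx.x >> 5] = v;   // one result per warp
}
```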

  11. Software-as-a-Service Offer Differentiation by Business Unit

    Directory of Open Access Journals (Sweden)

    Islam Balbaa

    2011-01-01

    This article summarizes the author's recent research into the fit between software-as-a-service (SaaS) tools and the requirements of particular business units. First, an overview of SaaS is provided, including a summary of its benefits to users and software vendors. Next, the approach used to gather and analyze data about the SaaS solutions offered on the Force.com AppExchange is outlined. Finally, the article describes the managerial implications of this research.

  12. Three-dimensional discrete ordinates reactor assembly calculations on GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Evans, Thomas M [ORNL; Joubert, Wayne [ORNL; Hamilton, Steven P [ORNL; Johnson, Seth R [ORNL; Turner, John A [ORNL; Davidson, Gregory G [ORNL; Pandya, Tara M [ORNL

    2015-01-01

    In this paper we describe and demonstrate a discrete ordinates sweep algorithm on GPUs. This sweep algorithm is nested within a multilevel communication-based decomposition based on energy. We demonstrated the effectiveness of this algorithm on detailed three-dimensional critical experiments and PWR lattice problems. For these problems we show improvement factors of 4-6x over conventional communication-based, CPU-only sweeps. These sweep kernel speedups resulted in a factor of 2 total time-to-solution improvement.

  13. Accelerating the explicitly restarted Arnoldi method with GPUs using an auto-tuned matrix vector product

    International Nuclear Information System (INIS)

    Dubois, J.; Calvin, Ch.; Dubois, J.; Petiton, S.

    2011-01-01

    This paper presents a parallelized hybrid single-vector Arnoldi algorithm for computing approximations to eigenpairs of a nonsymmetric matrix. We are interested in the use of accelerators and multi-core units to speed up the Arnoldi process. The main goal is to propose a parallel version of the Arnoldi solver, which can efficiently use multiple multi-core processors or multiple graphics processing units (GPUs) in a mixed coarse and fine grain fashion. In the proposed algorithms, this is achieved by auto-tuning of the matrix-vector product before starting the Arnoldi eigensolver, as well as by a reorganization of the data and global communications so that communication time is reduced. The execution time, performance, and scalability are assessed with well-known dense and sparse test matrices on multiple Nehalems, GT200 NVIDIA Tesla, and next generation Fermi Tesla. With one processor, we see a performance speedup of 2 to 3x when using all the physical cores, and a total speedup of 2 to 8x when adding a GPU to this multi-core unit, and hence a speedup of 4 to 24x compared to the sequential solver. (authors)

  14. Optimization of Monte Carlo algorithms and ray tracing on GPUs

    International Nuclear Information System (INIS)

    Bergmann, R.M.; Vujic, J.L.

    2013-01-01

    To take advantage of the computational power of GPUs (Graphical Processing Units), algorithms that work well on CPUs must be modified to conform to the GPU execution model. In this study, typical task-parallel Monte Carlo algorithms have been reformulated in a data-parallel way, and the benefits of doing so are examined. We were able to show that the data-parallel approach greatly improves thread coherency and keeps thread blocks busy, improving GPU utilization compared to the task-parallel approach. Data-parallel does not, however, outperform the task-parallel approach with regard to speedup over the CPU. Regarding the ray-tracing acceleration, OptiX shows promise for providing enough ray tracing speed to be used in a full 3D Monte Carlo neutron transport code for reactor calculations. It is important to note that it is necessary to operate on large datasets of particle histories in order to have good performance in both OptiX and the data-parallel algorithm, since this reduces the impact of latency. Our paper also shows the need to rewrite standard Monte Carlo algorithms in order to take full advantage of these new, powerful processor architectures.

  15. SIRFING: Sparse Image Reconstruction For INterferometry using GPUs

    Science.gov (United States)

    Cranmer, Miles; Garsden, Hugh; Mitchell, Daniel A.; Greenhill, Lincoln

    2018-01-01

    We present a deconvolution code for radio interferometric imaging based on the compressed sensing algorithms in Garsden et al. (2015). Being computationally intensive, compressed sensing is ripe for parallelization over GPUs. Our compressed sensing implementation generates images using wavelets, and we have ported the underlying wavelet library to CUDA, targeting the spline filter reconstruction part of the algorithm. The speedup achieved is almost an order of magnitude. The code is modular but is also being integrated into the calibration and imaging pipeline in use by the LEDA project at the Long Wavelength Array (LWA) as well as by the Murchison Widefield Array (MWA).

  16. Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs

    KAUST Repository

    Charara, Ali; Keyes, David E.; Ltaief, Hatem

    2017-01-01

    Batched dense linear algebra kernels are becoming ubiquitous in scientific applications, ranging from tensor contractions in deep learning to data compression in hierarchical low-rank matrix approximation. Within a single API call, these kernels are capable of simultaneously launching up to thousands of similar matrix computations, removing the expensive overhead of multiple API calls while increasing the occupancy of the underlying hardware. A challenge is that for the existing hardware landscape (x86, GPUs, etc.), only a subset of the required batched operations is implemented by the vendors, with limited support for very small problem sizes. We describe the design and performance of a new class of batched triangular dense linear algebra kernels on very small data sizes using single and multiple GPUs. By deploying two-sided recursive formulations, stressing the register usage, maintaining data locality, reducing thread synchronization and fusing successive kernel calls, the new batched kernels outperform existing state-of-the-art implementations.

  17. Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs

    KAUST Repository

    Charara, Ali

    2017-03-06

    Batched dense linear algebra kernels are becoming ubiquitous in scientific applications, ranging from tensor contractions in deep learning to data compression in hierarchical low-rank matrix approximation. Within a single API call, these kernels are capable of simultaneously launching up to thousands of similar matrix computations, removing the expensive overhead of multiple API calls while increasing the occupancy of the underlying hardware. A challenge is that for the existing hardware landscape (x86, GPUs, etc.), only a subset of the required batched operations is implemented by the vendors, with limited support for very small problem sizes. We describe the design and performance of a new class of batched triangular dense linear algebra kernels on very small data sizes using single and multiple GPUs. By deploying two-sided recursive formulations, stressing the register usage, maintaining data locality, reducing thread synchronization and fusing successive kernel calls, the new batched kernels outperform existing state-of-the-art implementations.

  18. A Weighted Spatial-Spectral Kernel RX Algorithm and Efficient Implementation on GPUs

    Directory of Open Access Journals (Sweden)

    Chunhui Zhao

    2017-02-01

    The kernel RX (KRX) detector proposed by Kwon and Nasrabadi exploits a kernel function to obtain better detection performance. However, it still has two limitations that can be addressed. On the one hand, reasonable integration of spatial-spectral information can be used to further improve its detection accuracy. On the other hand, parallel computing can be used to reduce the processing time of available KRX detectors. Accordingly, this paper presents a novel weighted spatial-spectral kernel RX (WSSKRX) detector and its parallel implementation on graphics processing units (GPUs). The WSSKRX utilizes the spatial neighborhood resources to reconstruct the testing pixels by introducing a spectral factor and a spatial window, thereby effectively reducing the interference of background noise. Then, the kernel function is redesigned as a mapping trick in the KRX detector to implement anomaly detection. In addition, a powerful architecture based on the GPU technique is designed to accelerate WSSKRX. To substantiate the performance of the proposed algorithm, experiments are conducted on both synthetic and real data.

  19. Enhanced static ground power unit based on flying capacitor based h-bridge hybrid active-neutral-point-clamped converter

    DEFF Research Database (Denmark)

    Abarzadeh, Mostafa; Madadi Kojabadi, Hossein; Deng, Fujin

    2016-01-01

    Static power converters have various applications, such as static ground power units (GPUs) for airplanes. This study proposes a new configuration of a static GPU based on a novel nine-level flying capacitor h-bridge active-neutral-point-clamped (FCHB_ANPC) converter. The main advantages...

  20. Jet browser model accelerated by GPUs

    Directory of Open Access Journals (Sweden)

    Forster Richárd

    2016-12-01

    In recent decades, experimental particle physics has developed rapidly, thanks in part to the growing capacity of computers. This has made it possible to probe the structure of matter down to the level of the quark-gluon plasma of the strong interaction. Experimental evidence has confirmed the predicted results of the theory. Since the field's inception, researchers have been interested in track reconstruction. We studied the jet browser model, which was developed for a 4π calorimeter. This method works on the measured data set, which contains the coordinates of the interaction points in the detector space, and it allows the trajectory reconstruction of the final-state particles to be examined. We keep the total energy constant, satisfying Gauss's law. Using GPUs, the evaluation of the model can be drastically accelerated: we were able to achieve up to a 223-fold speedup compared to a CPU-based parallel implementation.

  1. A Strategy for Automatic Performance Tuning of Stencil Computations on GPUs

    Directory of Open Access Journals (Sweden)

    Joseph D. Garvey

    2018-01-01

    We propose and evaluate a novel strategy for tuning the performance of a class of stencil computations on Graphics Processing Units. The strategy uses a machine learning model to predict the optimal way to load data from memory, followed by a heuristic that divides the other optimizations into groups and exhaustively explores one group at a time. We use a set of 104 synthetic OpenCL stencil benchmarks that are representative of many real stencil computations. We first demonstrate the need for auto-tuning by showing that the optimization space is sufficiently complex that simple approaches to determining a high-performing configuration fail. We then demonstrate the effectiveness of our approach on NVIDIA and AMD GPUs. Relative to a random sampling of the space, we find configurations that are 12%/32% faster on the NVIDIA/AMD platform in 71% and 4% less time, respectively. Relative to an expert search, we achieve 5% and 9% better performance on the two platforms in 89% and 76% less time. We also evaluate our strategy for different stencil computational intensities, varying array sizes and shapes, and in combination with expert search.
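
    The exhaustive-search half of such a strategy reduces to timing a kernel over one optimization group at a time and keeping the best configuration. The CUDA sketch below does this for block size only (the machine-learning component is not shown), using event timing over repeated launches.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// A small 1D stencil to tune; any stencil kernel could stand in here.
__global__ void stencil1d(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1)
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
}

// Exhaustive search over one optimization group (here: block size),
// mirroring the group-at-a-time tuning heuristic described above.
int main() {
    const int n = 1 << 22;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);

    int best = 0; float bestMs = 1e30f;
    for (int bs = 32; bs <= 1024; bs *= 2) {
        cudaEventRecord(t0);
        for (int rep = 0; rep < 100; ++rep)
            stencil1d<<<(n + bs - 1) / bs, bs>>>(in, out, n);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms; cudaEventElapsedTime(&ms, t0, t1);
        if (ms < bestMs) { bestMs = ms; best = bs; }
        printf("block %4d: %7.3f ms\n", bs, ms);
    }
    printf("best block size: %d\n", best);
    return 0;
}
```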

  2. A Framework for Lattice QCD Calculations on GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Winter, Frank; Clark, M A; Edwards, Robert G; Joo, Balint

    2014-08-01

    Computing platforms equipped with accelerators like GPUs have proven to provide great computational power. However, exploiting such platforms for existing scientific applications is not a trivial task. Current GPU programming frameworks such as CUDA C/C++ require low-level programming from the developer in order to achieve high performance code. As a result, porting of applications to GPUs is typically limited to time-dominant algorithms and routines, leaving the remainder unaccelerated, which can open a serious Amdahl's law issue. The lattice QCD application Chroma allows us to explore a different porting strategy. The layered structure of the software architecture logically separates the data-parallel from the application layer. The QCD Data-Parallel software layer provides data types and expressions with stencil-like operations suitable for lattice field theory and Chroma implements algorithms in terms of this high-level interface. Thus by porting the low-level layer one can effectively move the whole application in one swing to a different platform. The QDP-JIT/PTX library, the reimplementation of the low-level layer, provides a framework for lattice QCD calculations for the CUDA architecture. The complete software interface is supported and thus applications can be run unaltered on GPU-based parallel computers. This reimplementation was possible due to the availability of a JIT compiler (part of the NVIDIA Linux kernel driver) which translates an assembly-like language (PTX) to GPU code. The expression template technique is used to build PTX code generators and a software cache manages the GPU memory. This reimplementation allows us to deploy an efficient implementation of the full gauge-generation program with dynamical fermions on large-scale GPU-based machines such as Titan and Blue Waters which accelerates the algorithm by more than an order of magnitude.

  3. Mosaic: An Application-Transparent Hardware-Software Cooperative Memory Manager for GPUs

    OpenAIRE

    Ausavarungnirun, Rachata; Landgraf, Joshua; Miller, Vance; Ghose, Saugata; Gandhi, Jayneel; Rossbach, Christopher J.; Mutlu, Onur

    2018-01-01

    Modern GPUs face a trade-off on how the page size used for memory management affects address translation and demand paging. Support for multiple page sizes can help relax the page size trade-off so that address translation and demand paging optimizations work together synergistically. However, existing page coalescing and splintering policies require costly base page migrations that undermine the benefits multiple page sizes provide. In this paper, we observe that GPGPU applications present a...

  4. Accelerating Astronomy & Astrophysics in the New Era of Parallel Computing: GPUs, Phi and Cloud Computing

    Science.gov (United States)

    Ford, Eric B.; Dindar, Saleh; Peters, Jorg

    2015-08-01

    The realism of astrophysical simulations and statistical analyses of astronomical data are set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism, rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphical Processing Units (GPUs) and Intel Xeon Phi. Successfully harnessing these new architectures requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities and network characteristics. I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand what types of problems are likely to yield more than an order of magnitude speed-up and which problems are unlikely to parallelize sufficiently efficiently to be worth the development time and/or costs. I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills. Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer

  5. PuReMD-GPU: A reactive molecular dynamics simulation package for GPUs

    International Nuclear Information System (INIS)

    Kylasa, S.B.; Aktulga, H.M.; Grama, A.Y.

    2014-01-01

    We present an efficient and highly accurate GP-GPU implementation of our community code, PuReMD, for reactive molecular dynamics simulations using the ReaxFF force field. PuReMD and its incorporation into LAMMPS (Reax/C) is used by a large number of research groups worldwide for simulating diverse systems ranging from biomembranes to explosives (RDX) at atomistic level of detail. The sub-femtosecond time-steps associated with ReaxFF strongly motivate significant improvements to per-timestep simulation time through effective use of GPUs. This paper presents, in detail, the design and implementation of PuReMD-GPU, which enables ReaxFF simulations on GPUs, as well as various performance optimization techniques we developed to obtain high performance on state-of-the-art hardware. Comprehensive experiments on model systems (bulk water and amorphous silica) are presented to quantify the performance improvements achieved by PuReMD-GPU and to verify its accuracy. In particular, our experiments show up to 16× improvement in runtime compared to our highly optimized CPU-only single-core ReaxFF implementation. PuReMD-GPU is a unique production code, and is currently available on request from the authors

  6. PuReMD-GPU: A reactive molecular dynamics simulation package for GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Kylasa, S.B., E-mail: skylasa@purdue.edu [Department of Elec. and Comp. Eng., Purdue University, West Lafayette, IN 47907 (United States); Aktulga, H.M., E-mail: hmaktulga@lbl.gov [Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, MS 50F-1650, Berkeley, CA 94720 (United States); Grama, A.Y., E-mail: ayg@cs.purdue.edu [Department of Computer Science, Purdue University, West Lafayette, IN 47907 (United States)

    2014-09-01

    We present an efficient and highly accurate GP-GPU implementation of our community code, PuReMD, for reactive molecular dynamics simulations using the ReaxFF force field. PuReMD and its incorporation into LAMMPS (Reax/C) is used by a large number of research groups worldwide for simulating diverse systems ranging from biomembranes to explosives (RDX) at atomistic level of detail. The sub-femtosecond time-steps associated with ReaxFF strongly motivate significant improvements to per-timestep simulation time through effective use of GPUs. This paper presents, in detail, the design and implementation of PuReMD-GPU, which enables ReaxFF simulations on GPUs, as well as various performance optimization techniques we developed to obtain high performance on state-of-the-art hardware. Comprehensive experiments on model systems (bulk water and amorphous silica) are presented to quantify the performance improvements achieved by PuReMD-GPU and to verify its accuracy. In particular, our experiments show up to 16× improvement in runtime compared to our highly optimized CPU-only single-core ReaxFF implementation. PuReMD-GPU is a unique production code, and is currently available on request from the authors.

  7. Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

    Science.gov (United States)

    Schultz, A.

    2010-12-01

    3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We

  8. TernaryNet: faster deep model inference without GPUs for medical 3D segmentation using sparse and binary convolutions.

    Science.gov (United States)

    Heinrich, Mattias P; Blendowski, Max; Oktay, Ozan

    2018-05-30

    Deep convolutional neural networks (DCNN) are currently ubiquitous in medical imaging. While their versatility and high-quality results for common image analysis tasks including segmentation, localisation and prediction are astonishing, the large representational power comes at the cost of highly demanding computational effort. This limits their practical applications for image-guided interventions and diagnostic (point-of-care) support using mobile devices without graphics processing units (GPU). We propose a new scheme that approximates both trainable weights and neural activations in deep networks by ternary values and tackles the open question of backpropagation when dealing with non-differentiable functions. Our solution enables the removal of the expensive floating-point matrix multiplications throughout any convolutional neural network and replaces them by energy- and time-preserving binary operators and population counts. We evaluate our approach for the segmentation of the pancreas in CT. Here, our ternary approximation within a fully convolutional network leads to more than 90% memory reductions and high accuracy (without any post-processing) with a Dice overlap of 71.0% that comes close to the one obtained when using networks with high-precision weights and activations. We further provide a concept for sub-second inference without GPUs and demonstrate significant improvements in comparison with binary quantisation and without our proposed ternary hyperbolic tangent continuation. We present a key enabling technique for highly efficient DCNN inference without GPUs that will help to bring the advances of deep learning to practical clinical applications. It also has great promise for improving accuracies in large-scale medical data retrieval.

  9. Analysis of impact of general-purpose graphics processor units in supersonic flow modeling

    Science.gov (United States)

    Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.

    2017-06-01

    Computational methods are widely used in the prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPU) provide architectures and new programming models that make it possible to harness their large processing power and to design computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities of using GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high-resolution numerical schemes. CUDA technology is used for the programming implementation of parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the computed results are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. The speedup of the GPU solution with respect to the solution on a central processing unit (CPU) is measured. Performance measurements show that the numerical schemes developed achieve a 20-50x speedup on GPU hardware compared to the CPU reference implementation. The results obtained provide a promising perspective for designing a GPU-based software framework for applications in CFD.
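
    The data-parallel pattern underlying such solvers is one thread per cell. As a hedged illustration (not the authors' code), here is the finite-volume update for 1D linear advection with first-order upwind fluxes; all names are assumptions for the sketch.

        // One CUDA thread updates one finite-volume cell (advection speed a > 0 assumed).
        __global__ void fv_update(const float* u, float* u_new,
                                  int n, float a, float dt_over_dx) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < 1 || i >= n - 1) return;      // boundary cells handled elsewhere
            float flux_left  = a * u[i - 1];      // upwind flux through the left face
            float flux_right = a * u[i];          // upwind flux through the right face
            u_new[i] = u[i] - dt_over_dx * (flux_right - flux_left);
        }

    The Euler and Navier-Stokes fluxes used in the paper are far more involved, but they preserve this per-cell independence, which is what the GPU exploits.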

  10. Impact of memory bottleneck on the performance of graphics processing units

    Science.gov (United States)

    Son, Dong Oh; Choi, Hong Jun; Kim, Jong Myon; Kim, Cheol Hong

    2015-12-01

    Recent graphics processing units (GPUs) can process general-purpose applications as well as graphics applications with the help of various user-friendly application programming interfaces (APIs) supported by GPU vendors. Unfortunately, utilizing the hardware resources in the GPU efficiently is a challenging problem, since the GPU architecture is totally different from the traditional CPU architecture. To solve this problem, many studies have focused on techniques for improving system performance using GPUs. In this work, we analyze GPU performance while varying GPU parameters such as the number of cores and the clock frequency. According to our simulations, GPU performance can be improved by 125.8% and 16.2% on average as the number of cores and the clock frequency increase, respectively. However, performance saturates when memory bottlenecks occur due to the huge volume of data requests to memory. The performance of GPUs can be improved further as the memory bottleneck is reduced by changing GPU parameters dynamically.

  11. Porting of the transfer-matrix method for multilayer thin-film computations on graphics processing units

    Science.gov (United States)

    Limmer, Steffen; Fey, Dietmar

    2013-07-01

    Thin-film computations are often a time-consuming task during optical design. An efficient way to accelerate these computations with the help of graphics processing units (GPUs) is described. It turns out that significant speed-ups can be achieved. We investigate the circumstances under which the best speed-up values can be expected. To that end, we compare different GPUs with one another and with a modern CPU. Furthermore, the effect of thickness modulation on the speed-up, and the runtime behavior depending on the input data, are examined.
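
    For reference, the standard characteristic-matrix formulation behind such computations (textbook form; the paper's exact conventions may differ): each layer j contributes a 2x2 matrix, and the whole stack is characterized by the product of these matrices,

        M_j = \begin{pmatrix} \cos\delta_j & (i/\eta_j)\,\sin\delta_j \\ i\,\eta_j\,\sin\delta_j & \cos\delta_j \end{pmatrix},
        \qquad \delta_j = \frac{2\pi}{\lambda}\, n_j d_j \cos\theta_j,
        \qquad M = \prod_{j=1}^{m} M_j,

    from which reflectance and transmittance follow. Each wavelength (and each thickness variant) requires its own independent matrix chain, and it is this independence that maps naturally onto the many cores of a GPU.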

  12. Memory transfer optimization for a lattice Boltzmann solver on Kepler architecture nVidia GPUs

    Science.gov (United States)

    Mawson, Mark J.; Revell, Alistair J.

    2014-10-01

    The Lattice Boltzmann method (LBM) for solving fluid flow is naturally well suited to an efficient implementation for massively parallel computing, due to the prevalence of local operations in the algorithm. This paper presents and analyses the performance of a 3D lattice Boltzmann solver, optimized for third-generation nVidia GPU hardware, also known as 'Kepler'. We provide a review of previous optimization strategies and analyse data read/write times for different memory types. In LBM, the time propagation step (known as streaming) involves shifting data to adjacent locations and is central to parallel performance; here we examine three approaches which make use of different hardware options. Two of them exploit 'performance-enhancing' features of the GPU: shared memory and the new shuffle instruction found in Kepler-based GPUs. These are compared to a standard transfer of data which relies instead on optimized storage to increase coalesced access. It is shown that the simpler approach is the most efficient; since LBM's need for a large number of registers per thread limits the block size, the efficiency of these special features is reduced. Detailed results are obtained for a D3Q19 LBM solver, which is benchmarked on nVidia K5000M and K20C GPUs. In the latter case the use of a read-only data cache is explored, and peak performance of over 1036 Million Lattice Updates Per Second (MLUPS) is achieved. The appearance of a periodic bottleneck in the solver performance is also reported, believed to be hardware related; spikes in iteration time occur with a frequency of around 11 Hz for both GPUs, independent of the size of the problem.
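
    As a hedged illustration of the 'simple' approach that wins in this study (not the authors' code), a pull-style streaming kernel on a structure-of-arrays layout lets each thread gather the distributions entering its own node, so global-memory writes remain coalesced without shared memory or shuffle instructions; the 2D D2Q9 form below and all names are illustrative.

        // Pull streaming for a D2Q9 lattice on a periodic nx x ny grid.
        // f is stored structure-of-arrays: f[(q * ny + y) * nx + x].
        __global__ void stream_pull(const float* __restrict__ f_in, float* f_out,
                                    int nx, int ny,
                                    const int* cx, const int* cy) {  // 9 lattice velocities
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x >= nx || y >= ny) return;
            for (int q = 0; q < 9; ++q) {
                int xs = (x - cx[q] + nx) % nx;  // upstream node, periodic wrap
                int ys = (y - cy[q] + ny) % ny;
                f_out[(q * ny + y) * nx + x] = f_in[(q * ny + ys) * nx + xs];
            }
        }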

  13. Optimizing strassen matrix multiply on GPUs

    KAUST Repository

    ul Hasan Khan, Ayaz; Al-Mouhamed, Mayez; Fatayer, Allam

    2015-01-01

    © 2015 IEEE. Many-core systems are basically designed for applications having large data parallelism. Strassen Matrix Multiply (MM) can be formulated as a depth-first (DFS) traversal of a recursion tree where all cores work in parallel on computing each of the NxN sub-matrices, which reduces storage at the detriment of large data motion to gather and aggregate the results. We propose Strassen and Winograd algorithms (S-MM and W-MM) based on three optimizations: a set of basic algebra functions to reduce overhead, invoking an efficient library (CUBLAS 5.5), and parameter tuning of parametric kernels to improve resource occupancy. On GPUs, W-MM and S-MM with one recursion level outperform the CUBLAS 5.5 library, running up to twice as fast for large arrays satisfying N>=2048 and N>=3072, respectively. Compared to the NVIDIA SDK library, S-MM and W-MM achieved speedups between 20x and 80x for the above arrays. The proposed approach can be used to enhance the performance of the CUBLAS and MKL libraries.

  14. Optimizing strassen matrix multiply on GPUs

    KAUST Repository

    ul Hasan Khan, Ayaz

    2015-06-01

    © 2015 IEEE. Many-core systems are basically designed for applications having large data parallelism. Strassen Matrix Multiply (MM) can be formulated as a depth-first (DFS) traversal of a recursion tree where all cores work in parallel on computing each of the NxN sub-matrices, which reduces storage at the detriment of large data motion to gather and aggregate the results. We propose Strassen and Winograd algorithms (S-MM and W-MM) based on three optimizations: a set of basic algebra functions to reduce overhead, invoking an efficient library (CUBLAS 5.5), and parameter tuning of parametric kernels to improve resource occupancy. On GPUs, W-MM and S-MM with one recursion level outperform the CUBLAS 5.5 library, running up to twice as fast for large arrays satisfying N>=2048 and N>=3072, respectively. Compared to the NVIDIA SDK library, S-MM and W-MM achieved speedups between 20x and 80x for the above arrays. The proposed approach can be used to enhance the performance of the CUBLAS and MKL libraries.
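
    For reference, the seven products of one Strassen recursion level on a 2x2 block partition (standard formulation):

        M_1 = (A_{11}+A_{22})(B_{11}+B_{22}), \quad M_2 = (A_{21}+A_{22})B_{11}, \quad M_3 = A_{11}(B_{12}-B_{22}), \quad M_4 = A_{22}(B_{21}-B_{11}),
        M_5 = (A_{11}+A_{12})B_{22}, \quad M_6 = (A_{21}-A_{11})(B_{11}+B_{12}), \quad M_7 = (A_{12}-A_{22})(B_{21}+B_{22}),
        C_{11} = M_1+M_4-M_5+M_7, \quad C_{12} = M_3+M_5, \quad C_{21} = M_2+M_4, \quad C_{22} = M_1-M_2+M_3+M_6.

    One level thus trades eight block multiplications for seven, at the cost of the extra block additions and data motion discussed above; on GPUs the seven M_i can be dispatched as independent CUBLAS GEMM calls.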

  15. A 1.5 GFLOPS Reciprocal Unit for Computer Graphics

    DEFF Research Database (Denmark)

    Nannarelli, Alberto; Rasmussen, Morten Sleth; Stuart, Matthias Bo

    2006-01-01

    The reciprocal operation 1/d is a frequent operation performed in graphics processors (GPUs). In this work, we present the design of a radix-16 reciprocal unit based on the algorithm combining the traditional digit-by-digit algorithm and the approximation of the reciprocal by one Newton-Raphson i...
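
    For context, the standard Newton-Raphson refinement for the reciprocal: with relative error ε_k = 1 - d·x_k, the iteration

        x_{k+1} = x_k\,(2 - d\,x_k), \qquad 1 - d\,x_{k+1} = (1 - d\,x_k)^2,

    squares the error at each step, roughly doubling the number of correct bits delivered by the digit-by-digit stage.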

  16. A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG

    Science.gov (United States)

    Griffiths, M. K.; Fedun, V.; Erdélyi, R.

    2015-03-01

    Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently harnessed successfully the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1-3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.

  17. Monte Carlo methods for neutron transport on graphics processing units using Cuda - 015

    International Nuclear Information System (INIS)

    Nelson, A.G.; Ivanov, K.N.

    2010-01-01

    This work examined the feasibility of utilizing Graphics Processing Units (GPUs) to accelerate Monte Carlo neutron transport simulations. First, a clean-sheet MC code was written in C++ for an x86 CPU and later ported to run on GPUs using NVIDIA's CUDA programming language. After further optimization, the GPU ran 21 times faster than the CPU code when using single-precision floating point math. This can be further increased with no additional effort if accuracy is sacrificed for speed: using a compiler flag, the speedup was increased to 22x. Further, if double-precision floating point math is desired for neutron tracking through the geometry, a speedup of 11x was obtained. The GPUs have proven to be useful in this study, but the current generation does have limitations: the maximum memory currently available on a single GPU is only 4 GB; the GPU RAM does not provide error-checking and correction; and the optimization required for large speedups can lead to confusing code. (authors)

  18. Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs

    Science.gov (United States)

    Munekawa, Yuma; Ino, Fumihiko; Hagihara, Kenichi

    This paper presents a fast method capable of accelerating the Smith-Waterman algorithm for biological database search on a cluster of graphics processing units (GPUs). Our method is implemented using compute unified device architecture (CUDA), which is available on the nVIDIA GPU. As compared with previous methods, our method has four major contributions. (1) The method efficiently uses on-chip shared memory to reduce the data amount being transferred between off-chip video memory and processing elements in the GPU. (2) It also reduces the number of data fetches by applying a data reuse technique to query and database sequences. (3) A pipelined method is also implemented to overlap GPU execution with database access. (4) Finally, a master/worker paradigm is employed to accelerate hundreds of database searches on a cluster system. In experiments, the peak performance on a GeForce GTX 280 card reaches 8.32 giga cell updates per second (GCUPS). We also find that our method reduces the amount of data fetches to 1/140, achieving approximately three times higher performance than a previous CUDA-based method. Our 32-node cluster version is approximately 28 times faster than a single GPU version. Furthermore, the effective performance reaches 75.6 giga instructions per second (GIPS) using 32 GeForce 8800 GTX cards.
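
    At the core of such implementations is the Smith-Waterman recurrence; in its linear-gap form (the paper's gap model may differ) it reads

        H_{i,j} = \max\bigl(0,\; H_{i-1,j-1} + s(a_i, b_j),\; H_{i-1,j} - g,\; H_{i,j-1} - g\bigr),

    where s is the substitution score and g the gap penalty. Cells on the same anti-diagonal of H are mutually independent, which is the fine-grained parallelism that the shared-memory and data-reuse optimizations above exploit.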

  19. A high throughput data acquisition and processing model for applications based on GPUs

    International Nuclear Information System (INIS)

    Nieto, J.; Arcas, G. de; Ruiz, M.; Castro, R.; Vega, J.; Guillen, P.

    2015-01-01

    Highlights: • Implementation of a direct communication path between a data acquisition NI FlexRIO device and an NVIDIA GPU device. • Customization of a Linux kernel open driver (NI FlexRIO) and a C API interface to work with NVIDIA RDMA GPUDirect. • Performance evaluation with respect to the traditional model that uses the CPU for buffer data allocation. - Abstract: There is an increasing interest in the use of GPU technologies for real-time analysis in fusion devices. The availability of high-bandwidth interfaces has made them a very cost-effective alternative not only for high-volume data analysis or simulation, and commercial products are available for some areas of interest. However, from the point of view of their application in real-time scenarios, there are still some issues under analysis, such as the possibility of improving the data throughput inside a discrete system consisting of data acquisition (DAQ) devices and GPUs. This paper addresses the possibility of using peer-to-peer data communication between DAQ devices and GPUs sharing the same PCI Express bus to implement continuous real-time acquisition and processing systems where data transfers require minimum CPU intervention. This technology eliminates unnecessary system memory copies and lowers CPU overhead, avoiding bottlenecks when the system uses the main system memory.

  20. A high throughput data acquisition and processing model for applications based on GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Nieto, J., E-mail: jnieto@sec.upm.es [Instrumentation and Applied Acoustic Research Group, Technical University of Madrid (UPM), Madrid (Spain); Arcas, G. de; Ruiz, M. [Instrumentation and Applied Acoustic Research Group, Technical University of Madrid (UPM), Madrid (Spain); Castro, R.; Vega, J. [Data acquisition Group EURATOM/CIEMAT Association for Fusion, Madrid (Spain); Guillen, P. [Instrumentation and Applied Acoustic Research Group, Technical University of Madrid (UPM), Madrid (Spain)

    2015-10-15

    Highlights: • Implementation of a direct communication path between a data acquisition NI FlexRIO device and an NVIDIA GPU device. • Customization of a Linux kernel open driver (NI FlexRIO) and a C API interface to work with NVIDIA RDMA GPUDirect. • Performance evaluation with respect to the traditional model that uses the CPU for buffer data allocation. - Abstract: There is an increasing interest in the use of GPU technologies for real-time analysis in fusion devices. The availability of high-bandwidth interfaces has made them a very cost-effective alternative not only for high-volume data analysis or simulation, and commercial products are available for some areas of interest. However, from the point of view of their application in real-time scenarios, there are still some issues under analysis, such as the possibility of improving the data throughput inside a discrete system consisting of data acquisition (DAQ) devices and GPUs. This paper addresses the possibility of using peer-to-peer data communication between DAQ devices and GPUs sharing the same PCI Express bus to implement continuous real-time acquisition and processing systems where data transfers require minimum CPU intervention. This technology eliminates unnecessary system memory copies and lowers CPU overhead, avoiding bottlenecks when the system uses the main system memory.

  1. Accelerating Multiple Compound Comparison Using LINGO-Based Load-Balancing Strategies on Multi-GPUs.

    Science.gov (United States)

    Lin, Chun-Yuan; Wang, Chung-Hung; Hung, Che-Lun; Lin, Yu-Shiang

    2015-01-01

    Compound comparison is an important task in computational chemistry. From the comparison results, potential inhibitors can be identified and then used in pharmacological experiments. The time complexity of a pairwise compound comparison is O(n^2), where n is the maximal length of compounds. In general, the length of compounds is tens to hundreds, and the computation time is small. However, more and more compounds have been synthesized and extracted, now numbering in the tens of millions. Therefore, comparisons against a large set of compounds remain time-consuming (known as the multiple compound comparison problem, abbreviated to MCC). The intrinsic time complexity of the MCC problem is O(k^2 n^2) with k compounds of maximal length n. In this paper, we propose a GPU-based algorithm for the MCC problem, called CUDA-MCC, on single- and multi-GPUs. Four LINGO-based load-balancing strategies are considered in CUDA-MCC in order to accelerate the computation speed among thread blocks on GPUs. CUDA-MCC was implemented in C+OpenMP+CUDA. In our experiments, CUDA-MCC ran 45 times and 391 times faster than its CPU version on a single NVIDIA Tesla K20m GPU card and a dual-NVIDIA Tesla K20m GPU card, respectively.

  2. Statistical significance estimation of a signal within the GooFit framework on GPUs

    Directory of Open Access Journals (Sweden)

    Cristella Leonardo

    2017-01-01

    Full Text Available In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented both in the ROOT/RooFit and GooFit frameworks with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open data analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on NVIDIA GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performance with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by the CUDA Multi Process Service with a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations in which the Wilks Theorem may or may not apply because its regularity conditions are not satisfied.

  3. Accelerating cardiac bidomain simulations using graphics processing units.

    Science.gov (United States)

    Neic, A; Liebmann, M; Hoetzl, E; Mitchell, L; Vigmond, E J; Haase, G; Plank, G

    2012-08-01

    Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are computationally vastly demanding, which is a limiting factor for wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging, since strongly scalable algorithms are required to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, the benefits in the context of bidomain simulations, where large sparse linear systems have to be solved in parallel with advanced numerical techniques, are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element method (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6-20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation, which engaged 20 GPUs, 476 CPU cores were required on a national supercomputing facility.

  4. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Science.gov (United States)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; Masciovecchio, Mario; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2017-08-01

    For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
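
    For reference, the per-measurement Kalman filter step being parallelized (standard textbook form, with state estimate x, covariance P, transition F, measurement model H, and noise covariances Q and R):

        x_k^- = F\,x_{k-1}, \qquad P_k^- = F P_{k-1} F^{\mathsf T} + Q,
        K_k = P_k^- H^{\mathsf T} (H P_k^- H^{\mathsf T} + R)^{-1},
        x_k = x_k^- + K_k\,(z_k - H x_k^-), \qquad P_k = (I - K_k H)\, P_k^- .

    In track building this update is applied once per detector layer for a large number of track candidates at once, which is the vector- and thread-level parallelism these ports target.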

  5. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Directory of Open Access Journals (Sweden)

    Cerati Giuseppe

    2017-01-01

    Full Text Available For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.

  6. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Cerati, Giuseppe [Fermilab; Elmer, Peter [Princeton U.; Krutelyov, Slava [UC, San Diego; Lantz, Steven [Cornell U.; Lefebvre, Matthieu [Princeton U.; Masciovecchio, Mario [UC, San Diego; McDermott, Kevin [Cornell U.; Riley, Daniel [Cornell U., LNS; Tadel, Matevž [UC, San Diego; Wittich, Peter [Cornell U.; Würthwein, Frank [UC, San Diego; Yagil, Avi [UC, San Diego

    2017-01-01

    For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.

  7. Energy- and cost-efficient lattice-QCD computations using graphics processing units

    Energy Technology Data Exchange (ETDEWEB)

    Bach, Matthias

    2014-07-01

    Quarks and gluons are the building blocks of all hadronic matter, like protons and neutrons. Their interaction is described by Quantum Chromodynamics (QCD), a theory under test by large scale experiments like the Large Hadron Collider (LHC) at CERN and in the future at the Facility for Antiproton and Ion Research (FAIR) at GSI. However, perturbative methods can only be applied to QCD for high energies. Studies from first principles are possible via a discretization onto an Euclidean space-time grid. This discretization of QCD is called Lattice QCD (LQCD) and is the only ab-initio option outside of the high-energy regime. LQCD is extremely compute and memory intensive. In particular, it is by definition always bandwidth limited. Thus - despite the complexity of LQCD applications - it led to the development of several specialized compute platforms and influenced the development of others. However, in recent years General-Purpose computation on Graphics Processing Units (GPGPU) came up as a new means for parallel computing. Contrary to machines traditionally used for LQCD, graphics processing units (GPUs) are a mass-market product. This promises advantages in both the pace at which higher-performing hardware becomes available and its price. CL2QCD is an OpenCL based implementation of LQCD using Wilson fermions that was developed within this thesis. It operates on GPUs by all major vendors as well as on central processing units (CPUs). On the AMD Radeon HD 7970 it provides the fastest double-precision D kernel for a single GPU, achieving 120 GFLOPS. D - the most compute intensive kernel in LQCD simulations - is commonly used to compare LQCD platforms. This performance is enabled by an in-depth analysis of optimization techniques for bandwidth-limited codes on GPUs. Further, analysis of the communication between GPU and CPU, as well as between multiple GPUs, enables high-performance Krylov space solvers and linear scaling to multiple GPUs within a single system. LQCD

  8. Energy- and cost-efficient lattice-QCD computations using graphics processing units

    International Nuclear Information System (INIS)

    Bach, Matthias

    2014-01-01

    Quarks and gluons are the building blocks of all hadronic matter, like protons and neutrons. Their interaction is described by Quantum Chromodynamics (QCD), a theory under test by large scale experiments like the Large Hadron Collider (LHC) at CERN and in the future at the Facility for Antiproton and Ion Research (FAIR) at GSI. However, perturbative methods can only be applied to QCD for high energies. Studies from first principles are possible via a discretization onto an Euclidean space-time grid. This discretization of QCD is called Lattice QCD (LQCD) and is the only ab-initio option outside of the high-energy regime. LQCD is extremely compute and memory intensive. In particular, it is by definition always bandwidth limited. Thus - despite the complexity of LQCD applications - it led to the development of several specialized compute platforms and influenced the development of others. However, in recent years General-Purpose computation on Graphics Processing Units (GPGPU) came up as a new means for parallel computing. Contrary to machines traditionally used for LQCD, graphics processing units (GPUs) are a mass-market product. This promises advantages in both the pace at which higher-performing hardware becomes available and its price. CL2QCD is an OpenCL based implementation of LQCD using Wilson fermions that was developed within this thesis. It operates on GPUs by all major vendors as well as on central processing units (CPUs). On the AMD Radeon HD 7970 it provides the fastest double-precision D kernel for a single GPU, achieving 120 GFLOPS. D - the most compute intensive kernel in LQCD simulations - is commonly used to compare LQCD platforms. This performance is enabled by an in-depth analysis of optimization techniques for bandwidth-limited codes on GPUs. Further, analysis of the communication between GPU and CPU, as well as between multiple GPUs, enables high-performance Krylov space solvers and linear scaling to multiple GPUs within a single system. LQCD

  9. Fast analysis of molecular dynamics trajectories with graphics processing units-Radial distribution function histogramming

    International Nuclear Information System (INIS)

    Levine, Benjamin G.; Stone, John E.; Kohlmeyer, Axel

    2011-01-01

    The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU's memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 s per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.
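
    A hedged sketch of the central pattern (not the VMD code): each block accumulates a private histogram in fast shared memory using atomic operations and flushes it to the global histogram once; the bin count and all names are illustrative.

        #define NBINS 256

        __global__ void rdf_hist(const float* __restrict__ dist,  // precomputed pair distances
                                 unsigned int* global_bins, int npairs, float r_max) {
            __shared__ unsigned int bins[NBINS];
            for (int b = threadIdx.x; b < NBINS; b += blockDim.x) bins[b] = 0;
            __syncthreads();
            int stride = gridDim.x * blockDim.x;
            for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < npairs; i += stride) {
                int b = (int)(dist[i] / r_max * NBINS);
                if (b < NBINS) atomicAdd(&bins[b], 1u);   // fast on-chip atomic
            }
            __syncthreads();
            for (int b = threadIdx.x; b < NBINS; b += blockDim.x)
                if (bins[b]) atomicAdd(&global_bins[b], bins[b]);  // one flush per block
        }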

  10. Accelerating Multiple Compound Comparison Using LINGO-Based Load-Balancing Strategies on Multi-GPUs

    Directory of Open Access Journals (Sweden)

    Chun-Yuan Lin

    2015-01-01

    Full Text Available Compound comparison is an important task in computational chemistry. From the comparison results, potential inhibitors can be identified and then used in pharmacological experiments. The time complexity of a pairwise compound comparison is O(n^2), where n is the maximal length of compounds. In general, the length of compounds is tens to hundreds, and the computation time is small. However, more and more compounds have been synthesized and extracted, now numbering in the tens of millions. Therefore, comparisons against a large set of compounds remain time-consuming (known as the multiple compound comparison problem, abbreviated to MCC). The intrinsic time complexity of the MCC problem is O(k^2 n^2) with k compounds of maximal length n. In this paper, we propose a GPU-based algorithm for the MCC problem, called CUDA-MCC, on single- and multi-GPUs. Four LINGO-based load-balancing strategies are considered in CUDA-MCC in order to accelerate the computation speed among thread blocks on GPUs. CUDA-MCC was implemented in C+OpenMP+CUDA. In our experiments, CUDA-MCC ran 45 times and 391 times faster than its CPU version on a single NVIDIA Tesla K20m GPU card and a dual-NVIDIA Tesla K20m GPU card, respectively.

  11. Graphics processing units accelerated semiclassical initial value representation molecular dynamics

    Energy Technology Data Exchange (ETDEWEB)

    Tamascelli, Dario; Dambrosio, Francesco Saverio [Dipartimento di Fisica, Università degli Studi di Milano, via Celoria 16, 20133 Milano (Italy); Conte, Riccardo [Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322 (United States); Ceotto, Michele, E-mail: michele.ceotto@unimi.it [Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano (Italy)

    2014-05-07

    This paper presents a Graphics Processing Units (GPUs) implementation of the Semiclassical Initial Value Representation (SC-IVR) propagator for vibrational molecular spectroscopy calculations. The time-averaging formulation of the SC-IVR for power spectrum calculations is employed. Details about the GPU implementation of the semiclassical code are provided. Four molecules with an increasing number of atoms are considered, and the GPU-calculated vibrational frequencies perfectly match the benchmark values. The computational time scaling of two GPUs (NVIDIA Tesla C2075 and Kepler K20) versus two CPUs (Intel Core i5 and Intel Xeon E5-2687W), and the critical issues related to the GPU implementation, are discussed. The resulting reduction in computational time and power consumption is significant, and semiclassical GPU calculations are shown to be environmentally friendly.

  12. Prevalence of Salmonella in 11 Spices Offered for Sale from Retail Establishments and in Imported Shipments Offered for Entry to the United States.

    Science.gov (United States)

    Zhang, Guodong; Hu, Lijun; Pouillot, Régis; Tatavarthy, Aparna; Doren, Jane M Van; Kleinmeier, Daria; Ziobro, George C; Melka, David; Wang, Hua; Brown, Eric W; Strain, Errol; Bunning, Vincent K; Musser, Steven M; Hammack, Thomas S

    2017-10-04

    The U.S. Food and Drug Administration conducted a survey to evaluate Salmonella prevalence and aerobic plate counts in packaged (dried) spices offered for sale at retail establishments in the United States. The study included 7,250 retail samples of 11 spice types that were collected during November 2013 to September 2014 and October 2014 to March 2015. No Salmonella-positive samples (based on analysis of 125 g) were found among retail samples of cumin seed (whole or ground), sesame seed (whole, not roasted or toasted, and not black), and white pepper (ground or cracked), for prevalence estimates of 0.00% with 95% Clopper and Pearson's confidence intervals of 0.00 to 0.67%, 0.00 to 0.70%, and 0.00 to 0.63%, respectively. Salmonella prevalence estimates (confidence intervals) for the other eight spice types were 0.19% (0.0048 to 1.1%) for basil leaf (whole, ground, crushed, or flakes), 0.24% (0.049 to 0.69%) for black pepper (whole, ground, or cracked), 0.56% (0.11 to 1.6%) for coriander seed (ground), 0.19% (0.0049 to 1.1%) for curry powder (ground mixture of spices), 0.49% (0.10 to 1.4%) for dehydrated garlic (powder, granules, or flakes), 0.15% (0.0038 to 0.83%) for oregano leaf (whole, ground, crushed, or flakes), 0.25% (0.03 to 0.88%) for paprika (ground or cracked), and 0.64% (0.17 to 1.6%) for red pepper (hot red pepper, e.g., chili, cayenne; ground, cracked, crushed, or flakes). Salmonella isolates were serotyped, and genomes were sequenced. Samples of these same 11 spice types were also examined from shipments of imported spices offered for entry to the United States from 1 October 2011 to 30 September 2015. Salmonella prevalence estimates (based on analysis of two 375-g composite samples) for shipments of imported spices were 1.7 to 18%. The Salmonella prevalence estimates for spices offered for sale at retail establishments for all of the spice types except dehydrated garlic and basil were significantly lower than estimates for shipments of imported spice

  13. Exploiting graphics processing units for computational biology and bioinformatics.

    Science.gov (United States)

    Payne, Joshua L; Sinnott-Armstrong, Nicholas A; Moore, Jason H

    2010-09-01

    Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units (GPUs) that possess more memory bandwidth and computational capability than central processing units (CPUs), the standard workhorses of scientific computing. With the recent release of general-purpose GPUs and NVIDIA's GPU programming language, CUDA, graphics engines are being adopted widely in scientific computing applications, particularly in the fields of computational biology and bioinformatics. The goal of this article is to concisely present an introduction to GPU hardware and programming, aimed at the computational biologist or bioinformaticist. To this end, we discuss the primary differences between GPU and CPU architecture, introduce the basics of the CUDA programming language, and discuss important CUDA programming practices, such as the proper use of coalesced reads, data types, and memory hierarchies. We highlight each of these topics in the context of computing the all-pairs distance between instances in a dataset, a common procedure in numerous disciplines of scientific computing. We conclude with a runtime analysis of the GPU and CPU implementations of the all-pairs distance calculation, showing our final GPU implementation to outperform the CPU implementation by a factor of 1700.
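
    A minimal form of the article's running example, the all-pairs distance computation, might look as follows (an illustrative sketch, not the article's final tuned kernel):

        // One thread per (i, j) pair; n instances with dim features, stored row-major.
        __global__ void all_pairs_dist(const float* __restrict__ data,
                                       float* dist, int n, int dim) {
            int i = blockIdx.y * blockDim.y + threadIdx.y;
            int j = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n || j >= n) return;
            float acc = 0.0f;
            for (int k = 0; k < dim; ++k) {
                float d = data[i * dim + k] - data[j * dim + k];
                acc += d * d;                 // accumulate squared Euclidean distance
            }
            dist[i * n + j] = sqrtf(acc);
        }

    The coalescing and shared-memory tiling practices discussed in the article refine exactly this kind of kernel.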

  14. Real-time capture and reconstruction system with multiple GPUs for a 3D live scene by a generation from 4K IP images to 8K holograms.

    Science.gov (United States)

    Ichihashi, Yasuyuki; Oi, Ryutaro; Senoh, Takanori; Yamamoto, Kenji; Kurita, Taiichiro

    2012-09-10

    We developed a real-time capture and reconstruction system for three-dimensional (3D) live scenes. In previous research, we used integral photography (IP) to capture 3D images and then generated holograms from the IP images to implement a real-time reconstruction system. In this paper, we use a 4K (3,840 × 2,160) camera to capture IP images and 8K (7,680 × 4,320) liquid crystal display (LCD) panels for the reconstruction of holograms. We investigate two methods for enlarging the 4K images that were captured by integral photography to 8K images. One of the methods increases the number of pixels of each elemental image. The other increases the number of elemental images. In addition, we developed a personal computer (PC) cluster system with graphics processing units (GPUs) for the enlargement of IP images and the generation of holograms from the IP images using fast Fourier transform (FFT). We used the Compute Unified Device Architecture (CUDA) as the development environment for the GPUs. The Fast Fourier transform is performed using the CUFFT (CUDA FFT) library. As a result, we developed an integrated system for performing all processing from the capture to the reconstruction of 3D images by using these components and successfully used this system to reconstruct a 3D live scene at 12 frames per second.
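
    For orientation, a minimal CUFFT call sequence for a 2D complex-to-complex transform of the kind used in hologram generation (sizes and names are illustrative):

        #include <cufft.h>

        void forward_fft_2d(cufftComplex* d_field, int ny, int nx) {
            cufftHandle plan;
            cufftPlan2d(&plan, ny, nx, CUFFT_C2C);                // slowest-varying dimension first
            cufftExecC2C(plan, d_field, d_field, CUFFT_FORWARD);  // in-place transform
            cufftDestroy(plan);
        }

    In a multi-GPU PC cluster such as the one described, each GPU would run plans of this kind on its own tile of the hologram.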

  15. Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0

    Science.gov (United States)

    Fuhrer, Oliver; Chadha, Tarun; Hoefler, Torsten; Kwasniewski, Grzegorz; Lapillonne, Xavier; Leutwyler, David; Lüthi, Daniel; Osuna, Carlos; Schär, Christoph; Schulthess, Thomas C.; Vogt, Hannes

    2018-05-01

    The best hope for reducing long-standing global climate model biases is by increasing resolution to the kilometer scale. Here we present results from an ultrahigh-resolution non-hydrostatic climate model for a near-global setup running on the full Piz Daint supercomputer on 4888 GPUs (graphics processing units). The dynamical core of the model has been completely rewritten using a domain-specific language (DSL) for performance portability across different hardware architectures. Physical parameterizations and diagnostics have been ported using compiler directives. To our knowledge this represents the first complete atmospheric model being run entirely on accelerators on this scale. At a grid spacing of 930 m (1.9 km), we achieve a simulation throughput of 0.043 (0.23) simulated years per day and an energy consumption of 596 MWh per simulated year. Furthermore, we propose a new memory usage efficiency (MUE) metric that considers how efficiently the memory bandwidth - the dominant bottleneck of climate codes - is being used.

  16. Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units

    Science.gov (United States)

    Kemal, Jonathan Yashar

    For purposes of optimizing and analyzing turbomachinery and other designs, the unsteady Favre-averaged flow-field differential equations for an ideal compressible gas can be solved in conjunction with the heat conduction equation. We solve all equations using the finite-volume multiple-grid numerical technique, with the dual time-step scheme used for unsteady simulations. Our numerical solver code targets CUDA-capable Graphical Processing Units (GPUs) produced by NVIDIA. Making use of MPI, our solver can run across networked compute nodes, where each MPI process can use either a GPU or a Central Processing Unit (CPU) core for primary solver calculations. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture, and compare our resulting performance against Intel Xeon X5690 CPUs. Solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. We used a conjugate cylinder computational grid and ran a turbulent steady flow simulation using 4 increasingly dense computational grids. Our densest computational grid is divided into 13 blocks each containing 1033x1033 grid points, for a total of 13.87 million grid points or 1.07 million grid points per domain block. To obtain overall speedups, we compare the execution time of the solver's iteration loop, including all resource-intensive GPU-related memory copies. Comparing the performance of 8 GPUs to that of 8 CPUs, we obtain an overall speedup of about 6.0 when using our densest computational grid. This amounts to an 8-GPU simulation running about 39.5 times faster than a single-CPU simulation.

  17. Micromagnetic simulations using Graphics Processing Units

    International Nuclear Information System (INIS)

    Lopez-Diaz, L; Aurelio, D; Torres, L; Martinez, E; Hernandez-Lopez, M A; Gomez, J; Alejos, O; Carpentieri, M; Finocchio, G; Consolo, G

    2012-01-01

    The methodology for adapting a standard micromagnetic code to run on graphics processing units (GPUs) and exploit the potential for parallel calculations of this platform is discussed. GPMagnet, a general purpose finite-difference GPU-based micromagnetic tool, is used as an example. Speed-up factors of two orders of magnitude can be achieved with GPMagnet with respect to a serial code. This allows for running extensive simulations, nearly inaccessible with a standard micromagnetic solver, at reasonable computational times. (topical review)

  18. Object tracking mask-based NLUT on GPUs for real-time generation of holographic videos of three-dimensional scenes.

    Science.gov (United States)

    Kwon, M-W; Kim, S-C; Yoon, S-E; Ho, Y-S; Kim, E-S

    2015-02-09

    A new object tracking mask-based novel-look-up-table (OTM-NLUT) method is proposed and implemented on graphics-processing-units (GPUs) for real-time generation of holographic videos of three-dimensional (3-D) scenes. Since the proposed method is designed to be matched with software and memory structures of the GPU, the number of compute-unified-device-architecture (CUDA) kernel function calls and the computer-generated hologram (CGH) buffer size of the proposed method have been significantly reduced. It therefore results in a great increase of the computational speed of the proposed method and enables real-time generation of CGH patterns of 3-D scenes. Experimental results show that the proposed method can generate 31.1 frames of Fresnel CGH patterns with 1,920 × 1,080 pixels per second, on average, for three test 3-D video scenarios with 12,666 object points on three GPU boards of NVIDIA GTX TITAN, and confirm the feasibility of the proposed method in the practical application of electro-holographic 3-D displays.

  19. Data Sorting Using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    M. J. Mišić

    2012-06-01

    Full Text Available Graphics processing units (GPUs) have been increasingly used for general-purpose computation in recent years. GPU-accelerated applications are found in both scientific and commercial domains. Sorting is considered one of the very important operations in many applications, so its efficient implementation is essential for overall application performance. This paper represents an effort to analyze and evaluate the implementations of representative sorting algorithms on graphics processing units. Three sorting algorithms (Quicksort, Merge sort, and Radix sort) were evaluated on the Compute Unified Device Architecture (CUDA) platform, which is used to execute applications on NVIDIA graphics processing units. The algorithms were tested and evaluated using an automated test environment with input datasets of different characteristics. Finally, the results of this analysis are briefly discussed.
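
    As a point of reference for such evaluations, GPU sorting can be exercised in a few lines of Thrust (illustrative only; the paper benchmarks its own implementations of the three algorithms):

        #include <thrust/host_vector.h>
        #include <thrust/device_vector.h>
        #include <thrust/sort.h>
        #include <cstdlib>

        int main() {
            thrust::host_vector<int> h(1 << 20);
            for (int& x : h) x = rand();       // fill with pseudo-random keys on the host
            thrust::device_vector<int> d = h;  // copy to the GPU
            thrust::sort(d.begin(), d.end());  // Thrust dispatches a radix sort for int keys
            return 0;
        }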

  20. Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression

    KAUST Repository

    Halim Boukaram, Wajih

    2017-09-14

    We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.

  1. Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression

    KAUST Repository

    Halim Boukaram, Wajih; Turkiyyah, George; Ltaief, Hatem; Keyes, David E.

    2017-01-01

    We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.
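
    For reference, one step of the one-sided (Hestenes) Jacobi method used here as the building block: columns a_p and a_q of A are orthogonalized by a plane rotation with

        \zeta = \frac{\|a_q\|^2 - \|a_p\|^2}{2\,a_p^{\mathsf T} a_q}, \qquad
        t = \frac{\operatorname{sign}(\zeta)}{|\zeta| + \sqrt{1+\zeta^2}}, \qquad
        c = \frac{1}{\sqrt{1+t^2}}, \qquad s = c\,t,

    and sweeps over all column pairs are repeated until convergence. Disjoint column pairs within a sweep are independent, which is the inherent parallelism referred to above and what makes the method attractive for batches of small matrices.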

  2. 32 CFR 536.64 - Final offers.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 3 2010-07-01 2010-07-01 true Final offers. 536.64 Section 536.64 National... UNITED STATES Investigation and Processing of Claims § 536.64 Final offers. (a) When claims personnel... less than the amount claimed, a settlement authority will make a written final offer within his or her...

  3. Potenciando el aprendizaje proactivo con ILIAS&WebQuest: aprendiendo a paralelizar algoritmos con GPUs

    OpenAIRE

    Santamaría, J.; Espinilla, M.; Rivera, A. J.; Romero, S.

    2010-01-01

    Computer Architecture is a core subject in the second cycle of the Telecommunication Engineering degree (2004 curriculum) at the University of Jaén, which since the 2009/10 academic year has used a proactive learning methodology to motivate students in their laboratory work. Specifically, the teaching of algorithm parallelization has been approached using the GPUs of conventional graphics cards. In addition, ...

  4. Animação em tempo real de rugas faciais explorando as modernas GPUs

    OpenAIRE

    Clausius Duque Gonçalves Reis

    2010-01-01

    Abstract: The modeling and animation of facial wrinkles have been challenging tasks, owing to the variety of shapes and subtle details that wrinkles can exhibit. This work describes two methods for rendering wrinkles in real time using modern GPUs. Both methods are based on the use of GPU shaders and on a normal-mapping approach to apply wrinkles to virtual models. The first method uses influence areas described by texture maps to ...

  5. GPU-computing in econophysics and statistical physics

    Science.gov (United States)

    Preis, T.

    2011-03-01

    A recent trend in computer science and related fields is general purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction into the field of GPU computing and includes examples. In particular computationally expensive analyses employed in financial market context are coded on a graphics card architecture which leads to a significant reduction of computing time. In order to demonstrate the wide range of possible applications, a standard model in statistical physics - the Ising model - is ported to a graphics card architecture as well, resulting in large speedup values.
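
    As a hedged sketch of how the Ising model is typically parallelized on GPUs (a standard checkerboard Metropolis update, not necessarily the article's exact code; RNG states are assumed to be initialized elsewhere with curand_init):

        #include <curand_kernel.h>

        // Update one sublattice ("color" 0 or 1) of a periodic nx x ny lattice of +/-1 spins.
        __global__ void metropolis_sweep(int* spin, int nx, int ny, float beta,
                                         int color, curandState* rng) {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x >= nx || y >= ny || ((x + y) & 1) != color) return;
            int idx = y * nx + x;
            int s = spin[idx];
            int nb = spin[y * nx + (x + 1) % nx] + spin[y * nx + (x + nx - 1) % nx]
                   + spin[((y + 1) % ny) * nx + x] + spin[((y + ny - 1) % ny) * nx + x];
            float dE = 2.0f * s * nb;   // energy change of flipping s (coupling J = 1)
            if (dE <= 0.0f || curand_uniform(&rng[idx]) < expf(-beta * dE))
                spin[idx] = -s;         // Metropolis acceptance
        }

    Because all sites of one color have neighbors only of the other color, every thread in the kernel can update its spin concurrently without races.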

  6. A Block-Asynchronous Relaxation Method for Graphics Processing Units

    OpenAIRE

    Anzt, H.; Dongarra, J.; Heuveline, Vincent; Tomov, S.

    2011-01-01

    In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). For this purpose, we developed a set of asynchronous iteration algorithms in CUDA and compared them with a parallel implementation of synchronous relaxation methods on CPU-based systems. For a set of test matrices taken from the University of Florida Matrix Collection we monitor the convergence behavior, the average iteration time and the total time-to-solution. Analyzing the r...

  7. High-Performance Pseudo-Random Number Generation on Graphics Processing Units

    OpenAIRE

    Nandapalan, Nimalan; Brent, Richard P.; Murray, Lawrence M.; Rendell, Alistair

    2011-01-01

    This work considers the deployment of pseudo-random number generators (PRNGs) on graphics processing units (GPUs), developing an approach based on the xorgens generator to rapidly produce pseudo-random numbers of high statistical quality. The chosen algorithm has configurable state size and period, making it ideal for tuning to the GPU architecture. We present a comparison of both speed and statistical quality with other common parallel, GPU-based PRNGs, demonstrating favourable performance o...

  8. General purpose graphic processing unit implementation of adaptive pulse compression algorithms

    Science.gov (United States)

    Cai, Jingxiao; Zhang, Yan

    2017-07-01

    This study introduces a practical approach to implement real-time signal processing algorithms for general surveillance radar based on NVIDIA graphical processing units (GPUs). The pulse compression algorithms are implemented using compute unified device architecture (CUDA) libraries such as CUDA basic linear algebra subroutines and CUDA fast Fourier transform library, which are adopted from open source libraries and optimized for the NVIDIA GPUs. For more advanced, adaptive processing algorithms such as adaptive pulse compression, customized kernel optimization is needed and investigated. A statistical optimization approach is developed for this purpose without needing much knowledge of the physical configurations of the kernels. It was found that the kernel optimization approach can significantly improve the performance. Benchmark performance is compared with the CPU performance in terms of processing accelerations. The proposed implementation framework can be used in various radar systems including ground-based phased array radar, airborne sense and avoid radar, and aerospace surveillance radar.

  9. Protein alignment algorithms with an efficient backtracking routine on multiple GPUs

    Directory of Open Access Journals (Sweden)

    Kierzynka Michal

    2011-05-01

    Full Text Available Abstract Background Pairwise sequence alignment methods are widely used in biological research. The increasing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the near future. To overcome this challenge, several GPU (Graphics Processing Unit) computing approaches have been proposed lately. These solutions show the great potential of the GPU platform, but in most cases address the problem of sequence database scanning and compute only the alignment score, whereas the alignment itself is omitted. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch, and Smith-Waterman algorithms with a backtracking procedure, which is needed to construct the alignment. Results In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. Performed tests show that the implementation, with performance up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison to other CPU- and GPU-based solutions. Moreover, multiple-GPU support with load balancing makes the application very scalable. Conclusions The article shows that the backtracking procedure of the sequence alignment algorithms may be designed to fit in with the GPU architecture. Therefore, our algorithm, apart from scores, is able to compute pairwise alignments. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. Performed tests show that the efficiency of the implementation is excellent. Moreover, the speed of our GPU-based algorithms can be almost linearly increased when using more than one graphics card.

  10. Redesigning Triangular Dense Matrix Computations on GPUs

    KAUST Repository

    Charara, Ali

    2016-08-09

    A new implementation of the triangular matrix-matrix multiplication (TRMM) and the triangular solve (TRSM) kernels are described on GPU hardware accelerators. Although part of the Level 3 BLAS family, these highly computationally intensive kernels fail to achieve the percentage of the theoretical peak performance on GPUs that one would expect when running kernels with similar surface-to-volume ratio on hardware accelerators, i.e., the standard matrix-matrix multiplication (GEMM). The authors propose adopting a recursive formulation, which enriches the TRMM and TRSM inner structures with GEMM calls and, therefore, reduces memory traffic while increasing the level of concurrency. The new implementation enables efficient use of the GPU memory hierarchy and mitigates the latency overhead, to run at the speed of the higher cache levels. Performance comparisons show up to eightfold and twofold speedups for large dense matrix sizes, against the existing state-of-the-art TRMM and TRSM implementations from NVIDIA cuBLAS, respectively, across various GPU generations. Once integrated into high-level Cholesky-based dense linear algebra algorithms, the performance impact on the overall applications demonstrates up to fourfold and twofold speedups, against the equivalent native implementations, linked with cuBLAS TRMM and TRSM kernels, respectively. The new TRMM/TRSM kernel implementations are part of the open-source KBLAS software library (http://ecrc.kaust.edu.sa/Pages/Res-kblas.aspx) and are lined up for integration into the NVIDIA cuBLAS library in the upcoming v8.0 release.
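
    The recursive formulation can be sketched, for a lower-triangular A, as the standard splitting (the paper's exact blocking may differ):

        \begin{pmatrix} C_1 \\ C_2 \end{pmatrix}
        = \begin{pmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{pmatrix}
          \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}
        = \begin{pmatrix} A_{11} B_1 \\ A_{21} B_1 + A_{22} B_2 \end{pmatrix},

    where the diagonal blocks A_{11}B_1 and A_{22}B_2 recurse into smaller TRMMs and the off-diagonal block A_{21}B_1 becomes a large GEMM; it is this progressive conversion of triangular work into GEMM calls that recovers GEMM-like throughput.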

  11. Planet-disc interactions with Discontinuous Galerkin Methods using GPUs

    Science.gov (United States)

    Velasco Romero, David A.; Veiga, Maria Han; Teyssier, Romain; Masset, Frédéric S.

    2018-05-01

    We present a two-dimensional Cartesian code based on high-order discontinuous Galerkin methods, implemented to run in parallel over multiple GPUs. A simple planet-disc setup is used to compare the behaviour of our code against that of the FARGO3D code with a polar mesh. We make use of the time dependence of the torque exerted by the disc on the planet as a means to quantify the numerical viscosity of the code. We find that the numerical viscosity of the Keplerian flow can be as low as a few 10⁻⁸ r²Ω, where r and Ω are respectively the local orbital radius and frequency, for fifth-order schemes and a resolution of ∼10⁻² r. Although for a single-disc problem a solution of low numerical viscosity can be obtained at lower computational cost with FARGO3D (which is nearly an order of magnitude faster than a fifth-order method), discontinuous Galerkin methods appear promising for obtaining solutions of low numerical viscosity in more complex situations where the flow cannot be captured on a polar or spherical mesh concentric with the disc.

  12. United Kingdom evidence on the behaviour of the beta or systematic risk of initial public offerings

    OpenAIRE

    Qian, Yi

    2012-01-01

    This study examines the beta or systematic risk of initial public offerings using a sample of newly issued stocks in the United Kingdom market. The findings are threefold. First, the beta risk estimate is found to decline over time. This is consistent with the differential information model, which predicts that the risk of low-information stocks is high, with uncertainty around it, and will decline as the quantity of information increases. The quantity of information, in this case, is represented by time...

  13. Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit

    International Nuclear Information System (INIS)

    Badal, Andreu; Badano, Aldo

    2009-01-01

    Purpose: It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: the use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). Methods: A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA programming model (NVIDIA Corporation, Santa Clara, CA). Results: An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed-up factor was obtained using a GPU compared to a single-core CPU. Conclusions: The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.

  14. Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit

    Energy Technology Data Exchange (ETDEWEB)

    Badal, Andreu; Badano, Aldo [Division of Imaging and Applied Mathematics, OSEL, CDRH, U.S. Food and Drug Administration, Silver Spring, Maryland 20993-0002 (United States)

    2009-11-15

    Purpose: It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: the use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). Methods: A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA programming model (NVIDIA Corporation, Santa Clara, CA). Results: An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed-up factor was obtained using a GPU compared to a single-core CPU. Conclusions: The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.

  15. Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit.

    Science.gov (United States)

    Badal, Andreu; Badano, Aldo

    2009-11-01

    It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: the use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA™ programming model (NVIDIA Corporation, Santa Clara, CA). An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed-up factor was obtained using a GPU compared to a single-core CPU. The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.

  16. Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0

    Directory of Open Access Journals (Sweden)

    O. Fuhrer

    2018-05-01

    Full Text Available The best hope for reducing long-standing global climate model biases is by increasing resolution to the kilometer scale. Here we present results from an ultrahigh-resolution non-hydrostatic climate model for a near-global setup running on the full Piz Daint supercomputer on 4888 GPUs (graphics processing units). The dynamical core of the model has been completely rewritten using a domain-specific language (DSL) for performance portability across different hardware architectures. Physical parameterizations and diagnostics have been ported using compiler directives. To our knowledge this represents the first complete atmospheric model being run entirely on accelerators at this scale. At a grid spacing of 930 m (1.9 km), we achieve a simulation throughput of 0.043 (0.23) simulated years per day and an energy consumption of 596 MWh per simulated year. Furthermore, we propose a new memory usage efficiency (MUE) metric that considers how efficiently the memory bandwidth – the dominant bottleneck of climate codes – is being used.

  17. Modeling Thermally Driven Flow Problems with a Grid-Free Vortex Filament Scheme: Part 1

    Science.gov (United States)

    2018-02-01

    Grid-free representation of turbulent flow via vortex filaments offers a means for large eddy simulations that faithfully and efficiently ... Keywords: particle, Lagrangian, turbulence, grid-free, large eddy simulation, natural convection, thermal bubble.

  18. Study on availability of GPU for scientific and engineering calculations

    International Nuclear Information System (INIS)

    Sakamoto, Kensaku; Kobayashi, Seiji

    2009-07-01

    Recently, the number of scientific and engineering calculations performed on GPUs (Graphics Processing Units) is increasing. GPUs are said to have much higher peak floating-point processing power and memory bandwidth than CPUs (Central Processing Units). We have studied the effectiveness of GPUs by applying them to fundamental scientific and engineering calculations with the CUDA (Compute Unified Device Architecture) development tools. The results have shown the following: 1) Computation on GPUs is effective for calculations such as matrix operations, FFT (Fast Fourier Transform) and CFD (Computational Fluid Dynamics) in the nuclear research field. 2) Highly advanced programming is required to bring out the full performance of GPUs. 3) Double-precision performance is low, and ECC (Error Correction Code) support in graphics memory systems is lacking. (author)

  19. BROCCOLI: Software for Fast fMRI Analysis on Many-Core CPUs and GPUs

    Directory of Open Access Journals (Sweden)

    Anders eEklund

    2014-03-01

    Full Text Available Analysis of functional magnetic resonance imaging (fMRI data is becoming ever more computationally demanding as temporal and spatial resolutions improve, and large, publicly available data sets proliferate. Moreover, methodological improvements in the neuroimaging pipeline, such as non-linear spatial normalization, non-parametric permutation tests and Bayesian Markov Chain Monte Carlo approaches, can dramatically increase the computational burden. Despite these challenges, there do not yet exist any fMRI software packages which leverage inexpensive and powerful graphics processing units (GPUs to perform these analyses. Here, we therefore present BROCCOLI, a free software package written in OpenCL (Open Computing Language that can be used for parallel analysis of fMRI data on a large variety of hardware configurations. BROCCOLI has, for example, been tested with an Intel CPU, an Nvidia GPU and an AMD GPU. These tests show that parallel processing of fMRI data can lead to significantly faster analysis pipelines. This speedup can be achieved on relatively standard hardware, but further, dramatic speed improvements require only a modest investment in GPU hardware. BROCCOLI (running on a GPU can perform non-linear spatial normalization to a 1 mm3 brain template in 4-6 seconds, and run a second level permutation test with 10,000 permutations in about a minute. These non-parametric tests are generally more robust than their parametric counterparts, and can also enable more sophisticated analyses by estimating complicated null distributions. Additionally, BROCCOLI includes support for Bayesian first-level fMRI analysis using a Gibbs sampler. The new software is freely available under GNU GPL3 and can be downloaded from github (https://github.com/wanderine/BROCCOLI/.
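
    The second-level permutation test mentioned in the abstract is simple to state in code; the NumPy sketch below runs it for a single synthetic variable, whereas BROCCOLI's point is to run the same loop for every voxel in parallel on the GPU. Group sizes and the simulated effect are invented.

        # Two-sample permutation test: build the null by relabeling groups.
        import numpy as np

        rng = np.random.default_rng(0)
        group_a = rng.normal(0.5, 1.0, size=20)      # synthetic activation
        group_b = rng.normal(0.0, 1.0, size=20)
        observed = group_a.mean() - group_b.mean()

        pooled = np.concatenate([group_a, group_b])
        n_perm = 10_000
        null = np.empty(n_perm)
        for i in range(n_perm):
            perm = rng.permutation(pooled)           # random relabeling
            null[i] = perm[:20].mean() - perm[20:].mean()

        p = (np.abs(null) >= abs(observed)).mean()   # two-sided empirical p
        print(f"observed diff = {observed:.3f}, permutation p = {p:.4f}")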

  20. 76 FR 55944 - In the Matter of Certain Electronic Devices With Image Processing Systems, Components Thereof...

    Science.gov (United States)

    2011-09-09

    ... having graphics processing units (``GPUs'') supplied by NVIDIA Corporation (``NVIDIA'') infringe any... show the ALJ addressed infringement relating to the NVIDIA GPUs; and (b) the evidence in the record, if any, that accused articles incorporating the NVIDIA GPUs infringe an asserted patent claim. Please...

  1. Edge-preserving image denoising via group coordinate descent on the GPU

    OpenAIRE

    McGaffin, Madison G.; Fessler, Jeffrey A.

    2015-01-01

    Image denoising is a fundamental operation in image processing, and its applications range from the direct (photographic enhancement) to the technical (as a subproblem in image reconstruction algorithms). In many applications, the number of pixels has continued to grow, while the serial execution speed of computational hardware has begun to stall. New image processing algorithms must exploit the power offered by massively parallel architectures like graphics processing units (GPUs). This pape...

  2. Multidimensional upwind hydrodynamics on unstructured meshes using graphics processing units - I. Two-dimensional uniform meshes

    Science.gov (United States)

    Paardekooper, S.-J.

    2017-08-01

    We present a new method for numerical hydrodynamics which uses a multidimensional generalization of the Roe solver and operates on an unstructured triangular mesh. The main advantage over traditional methods based on Riemann solvers, which commonly use one-dimensional flux estimates as building blocks for a multidimensional integration, is its inherently multidimensional nature, and as a consequence its ability to recognize multidimensional stationary states that are not hydrostatic. A second novelty is the focus on graphics processing units (GPUs). By tailoring the algorithms specifically to GPUs, we are able to get speedups of 100-250 compared to a desktop machine. We compare the multidimensional upwind scheme to a traditional, dimensionally split implementation of the Roe solver on several test problems, and we find that the new method significantly outperforms the Roe solver in almost all cases. This comes with increased computational costs per time-step, which makes the new method approximately a factor of 2 slower than a dimensionally split scheme acting on a structured grid.

  3. Monte Carlo electron-photon transport using GPUs as an accelerator: Results for a water-aluminum-water phantom

    Energy Technology Data Exchange (ETDEWEB)

    Su, L.; Du, X.; Liu, T.; Xu, X. G. [Nuclear Engineering Program, Rensselaer Polytechnic Institute, Troy, NY 12180 (United States)

    2013-07-01

    An electron-photon coupled Monte Carlo code ARCHER - Accelerated Radiation-transport Computations in Heterogeneous Environments - is being developed at Rensselaer Polytechnic Institute as a software test bed for emerging heterogeneous high-performance computers that utilize accelerators such as GPUs. In this paper, the preliminary results of code development and testing are presented. The electron transport in media was modeled using the class-II condensed history method. The electron energies considered range from a few hundred keV to 30 MeV. Møller scattering and bremsstrahlung processes above a preset energy were explicitly modeled. Energy loss below that threshold was accounted for using the Continuous Slowing Down Approximation (CSDA). Photon transport was handled using the delta-tracking method. The photoelectric effect, Compton scattering and pair production were modeled. Voxelised geometry was supported. A serial ARCHER-CPU was first written in C++. The code was then ported to the GPU platform using CUDA C. The hardware involved a desktop PC with an Intel Xeon X5660 CPU and six NVIDIA Tesla M2090 GPUs. ARCHER was tested for the case of a 20 MeV electron beam incident perpendicularly on a water-aluminum-water phantom. The depth and lateral dose profiles were found to agree with results obtained from well-tested MC codes. Using six GPU cards, 6×10⁶ electron histories were simulated within 2 seconds. In comparison, the same case run with the EGSnrc and MCNPX codes required 1645 seconds and 9213 seconds, respectively, on a single CPU core. (authors)
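
    The delta-tracking step mentioned in the abstract is compact enough to sketch. The toy Python version below tracks photons through a 1-D voxelized slab with invented attenuation coefficients; the accept/reject trick against a majorant cross section removes per-voxel boundary crossings from the inner loop, which is part of what makes the method attractive on GPUs.

        # Toy Woodcock (delta) tracking through a 1-D voxelized medium.
        import numpy as np

        rng = np.random.default_rng(1)
        mu = np.array([0.2, 1.5, 0.2])               # attenuation per voxel [1/cm]
        edges = np.array([0.0, 2.0, 3.0, 5.0])       # voxel boundaries [cm]
        mu_max = mu.max()                            # majorant cross section

        def track_photon():
            x = 0.0
            while True:
                x += -np.log(rng.random()) / mu_max  # flight to tentative event
                if x >= edges[-1]:
                    return x, "escaped"
                v = np.searchsorted(edges, x) - 1    # voxel containing x
                # real interaction with probability mu(v)/mu_max,
                # otherwise the event is virtual and the photon flies on
                if rng.random() < mu[v] / mu_max:
                    return x, "interacted"

        print([track_photon() for _ in range(3)])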

  4. Communication: A reduced scaling J-engine based reformulation of SOS-MP2 using graphics processing units

    Energy Technology Data Exchange (ETDEWEB)

    Maurer, S. A.; Kussmann, J.; Ochsenfeld, C., E-mail: Christian.Ochsenfeld@cup.uni-muenchen.de [Chair of Theoretical Chemistry, Department of Chemistry, University of Munich (LMU), Butenandtstr. 7, D-81377 München (Germany); Center for Integrated Protein Science (CIPSM) at the Department of Chemistry, University of Munich (LMU), Butenandtstr. 5–13, D-81377 München (Germany)

    2014-08-07

    We present a low-prefactor, cubically scaling scaled-opposite-spin second-order Møller-Plesset perturbation theory (SOS-MP2) method which is highly suitable for massively parallel architectures like graphics processing units (GPUs). The scaling is reduced from O(N⁵) to O(N³) by a reformulation of the MP2 expression in the atomic orbital basis via Laplace transformation and the resolution-of-the-identity (RI) approximation of the integrals, in combination with efficient sparse algebra for the 3-center integral transformation. In contrast to previous works that employ GPUs for post-Hartree-Fock calculations, we do not simply employ GPU-based linear algebra libraries to accelerate the conventional algorithm. Instead, our reformulation allows us to replace the rate-determining contraction step with a modified J-engine algorithm, which has been proven to be highly efficient on GPUs. Thus, our SOS-MP2 scheme enables us to treat large molecular systems in an accurate and efficient manner on a single GPU server.

  5. Communication: A reduced scaling J-engine based reformulation of SOS-MP2 using graphics processing units.

    Science.gov (United States)

    Maurer, S A; Kussmann, J; Ochsenfeld, C

    2014-08-07

    We present a low-prefactor, cubically scaling scaled-opposite-spin second-order Møller-Plesset perturbation theory (SOS-MP2) method which is highly suitable for massively parallel architectures like graphics processing units (GPUs). The scaling is reduced from O(N⁵) to O(N³) by a reformulation of the MP2 expression in the atomic orbital basis via Laplace transformation and the resolution-of-the-identity (RI) approximation of the integrals, in combination with efficient sparse algebra for the 3-center integral transformation. In contrast to previous works that employ GPUs for post-Hartree-Fock calculations, we do not simply employ GPU-based linear algebra libraries to accelerate the conventional algorithm. Instead, our reformulation allows us to replace the rate-determining contraction step with a modified J-engine algorithm, which has been proven to be highly efficient on GPUs. Thus, our SOS-MP2 scheme enables us to treat large molecular systems in an accurate and efficient manner on a single GPU server.

  6. Graphics Processing Units for HEP trigger systems

    International Nuclear Information System (INIS)

    Ammendola, R.; Bauce, M.; Biagioni, A.; Chiozzi, S.; Cotta Ramusino, A.; Fantechi, R.; Fiorini, M.; Giagu, S.; Gianoli, A.; Lamanna, G.; Lonardo, A.; Messina, A.

    2016-01-01

    General-purpose computing on GPUs (Graphics Processing Units) is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies and the increase in link and memory throughput, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming ripe. We discuss the use of online parallel computing on GPUs for the synchronous low-level trigger, focusing on the CERN NA62 experiment trigger system. The use of GPUs in higher-level trigger systems is also briefly considered.

  7. Graphics Processing Units for HEP trigger systems

    Energy Technology Data Exchange (ETDEWEB)

    Ammendola, R. [INFN Sezione di Roma “Tor Vergata”, Via della Ricerca Scientifica 1, 00133 Roma (Italy); Bauce, M. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); University of Rome “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); Biagioni, A. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); Chiozzi, S.; Cotta Ramusino, A. [INFN Sezione di Ferrara, Via Saragat 1, 44122 Ferrara (Italy); University of Ferrara, Via Saragat 1, 44122 Ferrara (Italy); Fantechi, R. [INFN Sezione di Pisa, Largo B. Pontecorvo 3, 56127 Pisa (Italy); CERN, Geneve (Switzerland); Fiorini, M. [INFN Sezione di Ferrara, Via Saragat 1, 44122 Ferrara (Italy); University of Ferrara, Via Saragat 1, 44122 Ferrara (Italy); Giagu, S. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); University of Rome “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); Gianoli, A. [INFN Sezione di Ferrara, Via Saragat 1, 44122 Ferrara (Italy); University of Ferrara, Via Saragat 1, 44122 Ferrara (Italy); Lamanna, G., E-mail: gianluca.lamanna@cern.ch [INFN Sezione di Pisa, Largo B. Pontecorvo 3, 56127 Pisa (Italy); INFN Laboratori Nazionali di Frascati, Via Enrico Fermi 40, 00044 Frascati (Roma) (Italy); Lonardo, A. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); Messina, A. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); University of Rome “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); and others

    2016-07-11

    General-purpose computing on GPUs (Graphics Processing Units) is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies and the increase in link and memory throughput, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming ripe. We discuss the use of online parallel computing on GPUs for the synchronous low-level trigger, focusing on the CERN NA62 experiment trigger system. The use of GPUs in higher-level trigger systems is also briefly considered.

  8. 20 CFR 655.120 - Offered wage rate.

    Science.gov (United States)

    2010-04-01

    ... 20 Employees' Benefits 3 2010-04-01 2010-04-01 false Offered wage rate. 655.120 Section 655.120... the United States (H-2A Workers) Prefiling Procedures § 655.120 Offered wage rate. (a) To comply with... wage that is the highest of the AEWR, the prevailing hourly wage or piece rate, the agreed-upon...

  9. APL on GPUs

    DEFF Research Database (Denmark)

    Henriksen, Troels; Dybdal, Martin; Urms, Henrik

    2016-01-01

    This paper demonstrates translation schemes by which programs written in a functional subset of APL can be compiled to code that runs efficiently on general-purpose graphical processing units (GPGPUs). Furthermore, the generated programs can be straightforwardly interoperated with mainstream programming environments, such as Python, for example for purposes of visualization and user interaction. Finally, empirical evaluation shows that the GPGPU translation achieves speedups of up to hundreds of times over sequential compiled C code.

  10. Experiences with High-Level Programming Directives for Porting Applications to GPUs

    International Nuclear Information System (INIS)

    Ding, Wei; Chapman, Barbara; Sankaran, Ramanan; Graham, Richard L.

    2012-01-01

    HPC systems now exploit GPUs within their compute nodes to accelerate program performance. As a result, high-end application development has become extremely complex at the node level. In addition to restructuring the node code to exploit the cores and specialized devices, the programmer may need to choose a programming model such as OpenMP or CPU threads in conjunction with an accelerator programming model to share and manage the different node resources. This comes at a time when programmer productivity and the ability to produce portable code have been recognized as major concerns. In order to offset the high development cost of creating CUDA or OpenCL kernels, directives have been proposed for programming accelerator devices, but their implications are not well known. In this paper, we evaluate state-of-the-art accelerator directives by programming several application kernels, explore transformations to achieve good performance, and examine the expressiveness and performance penalty of using high-level directives versus CUDA. We also compare our results to OpenMP implementations to understand the benefits of running the kernels on the accelerator versus the CPU cores.
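
    The trade-off the paper examines, annotating existing loops versus writing explicit kernels, can be illustrated in Python with Numba (an analogy chosen here for brevity, not the directive toolchain the paper evaluates). The first version merely annotates a loop; the second manages the GPU index space by hand, and its launch line is left commented out because it needs an NVIDIA GPU at run time.

        # Directive-style annotation vs. an explicit kernel, via Numba.
        import numpy as np
        from numba import njit, prange, cuda

        @njit(parallel=True)        # annotate the existing loop (directive-like)
        def saxpy_annotated(a, x, y):
            for i in prange(x.size):
                y[i] = a * x[i] + y[i]

        @cuda.jit                   # explicit kernel: manage the index space
        def saxpy_kernel(a, x, y):
            i = cuda.grid(1)
            if i < x.size:
                y[i] = a * x[i] + y[i]

        x = np.ones(1 << 20, dtype=np.float32)
        y = np.ones_like(x)
        saxpy_annotated(np.float32(2.0), x, y)   # runs on a multicore CPU
        # threads = 256
        # saxpy_kernel[(x.size + threads - 1) // threads, threads](2.0, x, y)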

  11. GPUs for fast pattern matching in the RICH of the NA62 experiment

    CERN Document Server

    Lamanna, G; Sozzi, M

    2011-01-01

    In rare-decay experiments an effective online selection is a fundamental part of the data acquisition system (DAQ), in order to reduce both the quantity of data written on tape and the bandwidth requirements of the DAQ system. A multilevel architecture is commonly used to achieve a higher reduction factor, exploiting dedicated custom hardware and flexible software in standard computers. In this paper we discuss the possibility of using commercial video card processors (GPUs) to build a fast and effective trigger system, at both the hardware and software level. The computing power of GPUs allows the design of a real-time system in which trigger decisions are taken directly in the video processor with a defined maximum latency. This allows building the lowest trigger levels on standard off-the-shelf PCs with CPU and GPU (instead of the commonly adopted solutions based on custom electronics with FPGAs or ASICs) with enhanced and high-performance computation capabilities, resulting in high rejection power, high effici...

  12. Performance studies of GooFit on GPUs vs RooFit on CPUs while estimating the statistical significance of a new physical signal

    Science.gov (United States)

    Di Florio, Adriano

    2017-10-01

    In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented both in ROOT/RooFit and in the GooFit framework, with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B⁺ → J/ψϕK⁺. GooFit is an open data-analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on NVIDIA GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-ups with respect to the RooFit application parallelized on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by the CUDA Multi Process Service with a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood-ratio test statistic in different situations in which the Wilks theorem may or may not apply because its regularity conditions are not satisfied.
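
    A stripped-down version of the toy Monte Carlo procedure, for a simple Poisson counting experiment rather than an unbinned mass fit, is sketched below; the test statistic and all numbers are illustrative assumptions, not the analysis described above.

        # Null distribution of a likelihood-ratio statistic from toy experiments.
        import numpy as np

        rng = np.random.default_rng(2)
        b, n_obs = 100.0, 130                  # expected background, observed count

        def q0(n, b):
            # profile likelihood-ratio statistic for a signal constrained >= 0
            return 2.0 * (n * np.log(n / b) - (n - b)) if n > b else 0.0

        q_obs = q0(n_obs, b)
        toys = rng.poisson(b, size=200_000)    # background-only pseudo-experiments
        q_null = np.array([q0(n, b) for n in toys])
        p_value = (q_null >= q_obs).mean()
        print(f"q0 = {q_obs:.2f}, toy p-value = {p_value:.1e}")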

  13. Selecting a Benchmark Suite to Profile High-Performance Computing (HPC) Machines

    Science.gov (United States)

    2014-11-01

    Machines now contain central processing units (CPUs), graphics processing units (GPUs), and Many Integrated Core (MIC) architectures. ... evaluate the feasibility and applicability of a new architecture just released to the market. Researchers are often unsure how available resources will ... Having a suite of programs running on different architectures, such as GPUs, MICs, and CPUs, adds complexity and technical challenges.

  14. Pushing Memory Bandwidth Limitations Through Efficient Implementations of Block-Krylov Space Solvers on GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Clark, M. A. [NVIDIA Corp., Santa Clara; Strelchenko, Alexei [Fermilab; Vaquero, Alejandro [Utah U.; Wagner, Mathias [NVIDIA Corp., Santa Clara; Weinberg, Evan [Boston U.

    2017-10-26

    Lattice quantum chromodynamics simulations in nuclear physics have benefited from a tremendous number of algorithmic advances such as multigrid and eigenvector deflation. These improve the time to solution but do not alleviate the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speed up. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. However, practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations. Using the QUDA library, we present an implementation of a block-CG solver on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. We present results for the HISQ discretization, showing a 5x speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.
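
    The complexity reduction described above amounts to batching: the k² inner products a block-Krylov iteration needs can be formed as a single GEMM that streams each vector from memory once, instead of k² separate dot products that re-read the data over and over. A NumPy sketch of the idea (not QUDA code):

        # All pairwise inner products of two vector blocks as one GEMM.
        import numpy as np

        n, k = 200_000, 8
        V = np.random.rand(n, k)
        W = np.random.rand(n, k)

        # naive: k*k dot products, each re-streaming a length-n vector
        G_naive = np.array([[V[:, i] @ W[:, j] for j in range(k)]
                            for i in range(k)])

        # batched: a single (k x n)(n x k) GEMM, one pass over V and W
        G_gemm = V.T @ W

        assert np.allclose(G_naive, G_gemm)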

  15. Acceleration of the OpenFOAM-based MHD solver using graphics processing units

    International Nuclear Information System (INIS)

    He, Qingyun; Chen, Hongli; Feng, Jingchao

    2015-01-01

    Highlights: • A 3D PISO-MHD solver was implemented on Kepler-class graphics processing units (GPUs) using CUDA technology. • A consistent and conservative scheme, validated by three basic benchmarks in rectangular and round ducts, is used in the code. • CPU parallelization and GPU acceleration were compared against a single-core CPU for MHD and non-MHD problems. • Different preconditioners for the MHD solver were compared, and the results showed that the AMG method is better for these calculations. - Abstract: The pressure-implicit with splitting of operators (PISO) magnetohydrodynamics (MHD) solver for the coupled Navier–Stokes and Maxwell equations was implemented on Kepler-class graphics processing units (GPUs) using the CUDA technology. The solver is developed on the open-source code OpenFOAM, based on a consistent and conservative scheme, and is suitable for simulating MHD flow under a strong magnetic field in a fusion liquid-metal blanket with a structured or unstructured mesh. We verified the validity of the implementation on several standard cases, including benchmark I (the Shercliff and Hunt cases), benchmark II (fully developed circular-pipe MHD flow) and benchmark III (the KIT experimental case). The computational performance of the GPU implementation was examined by comparing its double-precision run times with those of essentially the same algorithms and meshes on the CPU. The results showed that a GPU (GTX 770) can outperform a server-class 4-core, 8-thread CPU (Intel Core i7-4770k) by a factor of at least 2.

  16. Acceleration of the OpenFOAM-based MHD solver using graphics processing units

    Energy Technology Data Exchange (ETDEWEB)

    He, Qingyun; Chen, Hongli, E-mail: hlchen1@ustc.edu.cn; Feng, Jingchao

    2015-12-15

    Highlights: • A 3D PISO-MHD solver was implemented on Kepler-class graphics processing units (GPUs) using CUDA technology. • A consistent and conservative scheme, validated by three basic benchmarks in rectangular and round ducts, is used in the code. • CPU parallelization and GPU acceleration were compared against a single-core CPU for MHD and non-MHD problems. • Different preconditioners for the MHD solver were compared, and the results showed that the AMG method is better for these calculations. - Abstract: The pressure-implicit with splitting of operators (PISO) magnetohydrodynamics (MHD) solver for the coupled Navier–Stokes and Maxwell equations was implemented on Kepler-class graphics processing units (GPUs) using the CUDA technology. The solver is developed on the open-source code OpenFOAM, based on a consistent and conservative scheme, and is suitable for simulating MHD flow under a strong magnetic field in a fusion liquid-metal blanket with a structured or unstructured mesh. We verified the validity of the implementation on several standard cases, including benchmark I (the Shercliff and Hunt cases), benchmark II (fully developed circular-pipe MHD flow) and benchmark III (the KIT experimental case). The computational performance of the GPU implementation was examined by comparing its double-precision run times with those of essentially the same algorithms and meshes on the CPU. The results showed that a GPU (GTX 770) can outperform a server-class 4-core, 8-thread CPU (Intel Core i7-4770k) by a factor of at least 2.

  17. Evaluation of Selected Resource Allocation and Scheduling Methods in Heterogeneous Many-Core Processors and Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Ciznicki Milosz

    2014-12-01

    Full Text Available Heterogeneous many-core computing resources are increasingly popular among users due to their improved performance over homogeneous systems. Many developers have realized that heterogeneous systems, e.g. a combination of a shared-memory multi-core CPU machine with massively parallel Graphics Processing Units (GPUs), can provide significant performance opportunities to a wide range of applications. However, the best overall performance can only be achieved if application tasks are efficiently assigned to the different types of processing units over time, taking into account their specific resource requirements. Additionally, one should note that available heterogeneous resources have been designed as general-purpose units, albeit with many built-in features accelerating specific application operations. In other words, the same algorithm or application functionality can be implemented as a different task for a CPU or a GPU. Nevertheless, from the perspective of various evaluation criteria, e.g. the total execution time or energy consumption, we may observe completely different results. Therefore, as tasks can be scheduled and managed in many alternative ways on both many-core CPUs and GPUs, and consequently have a huge impact on overall computing resource performance, there is a need for new and improved resource management techniques. In this paper we discuss results achieved during experimental performance studies of selected task scheduling methods in heterogeneous computing systems. Additionally, we present a new architecture for a resource allocation and task scheduling library which provides a generic application programming interface at the operating system level for improving scheduling policies, taking into account the diversity of tasks and the characteristics of heterogeneous computing resources.
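
    As a minimal illustration of the allocation problem studied above, the sketch below greedily places each task on whichever processing unit would finish it earliest, given per-unit runtime estimates; task names, costs and the unit mix are all invented.

        # Greedy list scheduling onto a heterogeneous CPU/GPU pool.
        tasks = [("fft",   {"cpu": 9.0,  "gpu": 2.0}),
                 ("parse", {"cpu": 1.0,  "gpu": 4.0}),
                 ("gemm",  {"cpu": 12.0, "gpu": 1.5}),
                 ("io",    {"cpu": 2.0,  "gpu": 6.0})]

        # unit id -> (unit kind, time at which it becomes free)
        units = {"cpu0": ("cpu", 0.0), "gpu0": ("gpu", 0.0), "gpu1": ("gpu", 0.0)}

        # longest tasks first, each onto the unit with the earliest finish time
        for name, cost in sorted(tasks, key=lambda t: -min(t[1].values())):
            uid = min(units, key=lambda u: units[u][1] + cost[units[u][0]])
            kind, free = units[uid]
            units[uid] = (kind, free + cost[kind])
            print(f"{name:>5} -> {uid}, finishes at t={units[uid][1]:.1f}")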

  18. Reliability Lessons Learned From GPU Experience With The Titan Supercomputer at Oak Ridge Leadership Computing Facility

    Energy Technology Data Exchange (ETDEWEB)

    Gallarno, George [Christian Brothers University; Rogers, James H [ORNL; Maxwell, Don E [ORNL

    2015-01-01

    The high computational capability of graphics processing units (GPUs) is enabling and driving the scientific discovery process at large scale. The world's second-fastest supercomputer for open science, Titan, has more than 18,000 GPUs that computational scientists use to perform scientific simulations and data analysis. Understanding of GPU reliability characteristics, however, is still in its nascent stage, since GPUs have only recently been deployed at large scale. This paper presents a detailed study of GPU errors and their impact on system operations and applications, describing experiences with the 18,688 GPUs on the Titan supercomputer as well as lessons learned in the process of efficient operation of GPUs at scale. These experiences are helpful to HPC sites which already have large-scale GPU clusters or plan to deploy GPUs in the future.

  19. Numerical simulation of air hypersonic flows with equilibrium chemical reactions

    Science.gov (United States)

    Emelyanov, Vladislav; Karpenko, Anton; Volkov, Konstantin

    2018-05-01

    The finite volume method is applied to solve the unsteady three-dimensional compressible Navier-Stokes equations on unstructured meshes. High-temperature gas effects altering the aerodynamics of vehicles are taken into account. The possibilities of using graphics processing units (GPUs) for the simulation of hypersonic flows are demonstrated. Solutions of some test cases on GPUs are reported, and a comparison between computational results for equilibrium chemically reacting and perfect air flowfields is performed. The speedup of the GPU solution with respect to the solution on central processing units (CPUs) is reported. The results obtained provide a promising perspective for designing a GPU-based software framework for practical applications.

  20. United abominations: Density functional studies of heavy metal chemistry

    Energy Technology Data Exchange (ETDEWEB)

    Schoendorff, George [Iowa State Univ., Ames, IA (United States)

    2012-01-01

    Carbonyl and nitrile addition to uranyl (UO₂²⁺) is studied. The competition between nitrile and water ligands in the formation of uranyl complexes is investigated. The possibility of hypercoordinated uranyl with acetone ligands is examined. Uranyl is studied with diacetone alcohol ligands as a means to explain the apparent hypercoordination. A discussion of the formation of mesityl oxide ligands is also included. A joint theoretical/experimental study of reactions of zwitterionic boratoiridium(I) complexes with oxazoline-based scorpionate ligands is reported. A computational study of the catalytic hydroamination/cyclization of aminoalkenes with zirconium-based catalysts was performed. Techniques are surveyed for programming graphical processing units (GPUs) using Fortran.

  1. Accelerating Calculations of Reaction Dissipative Particle Dynamics in LAMMPS

    Science.gov (United States)

    2017-05-17

    ... high-performance computing (HPC) resources and exploit emerging, heterogeneous architectures (e.g., co-processors and graphics processing units [GPUs]), while enabling ... 2 ODE solvers, CVODE and RKF45, which we previously developed for NVIDIA Compute Unified Device Architecture (CUDA) GPUs. The CPU versions of both ... Half of the accelerator nodes (178) have 2 NVIDIA Kepler K40m GPUs and the remaining 178 accelerator nodes have 2 Intel Xeon Phi 7120P co-processors.

  2. Improved Power Flow Computation Using Graphics Processing Units (Calcul de Flux de Puissance amélioré grâce aux Processeurs Graphiques)

    OpenAIRE

    Marin , Manuel

    2015-01-01

    This thesis addresses the utilization of Graphics Processing Units (GPUs) to improve the Power Flow (PF) analysis of modern power systems. GPUs are powerful vector co-processors that have been very useful in the acceleration of several computationally intensive applications. PF analysis is the steady-state analysis of AC power networks and is widely used for several tasks involved in system operation and planning. Currently, GPUs are challenged by applications exhibiting an irregular computatio...

  3. Turnkey offering a claimed sector 'first'.

    Science.gov (United States)

    Law, Oliver

    2011-01-01

    Trumpf Medical Systems UK, a manufacturer and supplier of LED theatre lights, HD camera systems, video integration technologies, and ceiling support units, and Canute International Medical Services (CIMS), a "logistical services" company whose specialities include providing mobile medical units for diagnostic imaging, have entered into a partnership. Under it, the two companies will offer fully fitted-out modular operating theatres and other medical/clinical buildings incorporating the latest technology and equipment, on a fully project-managed, "turnkey" basis. Oliver Law, managing director of Trumpf Medical Systems UK, explains the background and the new service's anticipated customer benefits.

  4. MPC Toolbox with GPU Accelerated Optimization Algorithms

    DEFF Research Database (Denmark)

    Gade-Nielsen, Nicolai Fog; Jørgensen, John Bagterp; Dammann, Bernd

    2012-01-01

    The introduction of Graphical Processing Units (GPUs) in scientific computing has shown great promise in many different fields. While GPUs are capable of very high floating point performance and memory bandwidth, its massively parallel architecture requires algorithms to be reimplemented to suit...

  5. Proceedings of the GPU computing in high-energy physics conference 2014 GPUHEP2014

    International Nuclear Information System (INIS)

    Bonati, Claudio; D'Elia, Massimo; Lamanna, Gianluca; Sozzi, Marco

    2015-06-01

    The International Conference on GPUs in High-Energy Physics was held from September 10 to 12, 2014 at the University of Pisa, Italy. It represented a larger-scale follow-up to a set of workshops which indicated the rising interest of the HEP community, experimentalists and theorists alike, towards the use of inexpensive and massively parallel computing devices, for very diverse purposes. The conference was organized in plenary sessions of invited and contributed talks, and poster presentations on the following topics: - GPUs in triggering applications - Low-level trigger systems based on GPUs - Use of GPUs in high-level trigger systems - GPUs in tracking and vertexing - Challenges for triggers in future HEP experiments - Reconstruction and Monte Carlo software on GPUs - Software frameworks and tools for GPU code integration - Hard real-time use of GPUs - Lattice QCD simulation - GPUs in phenomenology - GPUs for medical imaging purposes - GPUs in neutron and photon science - Massively parallel computations in HEP - Code parallelization. "GPU computing in High-Energy Physics" attracted 78 registrants to Pisa. The 38 oral presentations included talks on specific topics in experimental and theoretical applications of GPUs, as well as review talks on applications and technology. Five posters were also presented, each introduced by a short plenary oral illustration. A company exhibition was hosted on site. The conference consisted of 12 plenary sessions, together with a social program which included a banquet and guided excursions around Pisa. It was overall an enjoyable experience, offering an opportunity to share ideas and opinions and to get updated on other participants' work in this emerging field, as well as being a valuable introduction for newcomers interested in learning more about the use of GPUs as accelerators for scientific progress on the elementary constituents of matter and energy.

  6. Proceedings of the GPU computing in high-energy physics conference 2014 GPUHEP2014

    Energy Technology Data Exchange (ETDEWEB)

    Bonati, Claudio; D'Elia, Massimo; Lamanna, Gianluca; Sozzi, Marco (eds.)

    2015-06-15

    The International Conference on GPUs in High-Energy Physics was held from September 10 to 12, 2014 at the University of Pisa, Italy. It represented a larger-scale follow-up to a set of workshops which indicated the rising interest of the HEP community, experimentalists and theorists alike, towards the use of inexpensive and massively parallel computing devices, for very diverse purposes. The conference was organized in plenary sessions of invited and contributed talks, and poster presentations on the following topics: - GPUs in triggering applications - Low-level trigger systems based on GPUs - Use of GPUs in high-level trigger systems - GPUs in tracking and vertexing - Challenges for triggers in future HEP experiments - Reconstruction and Monte Carlo software on GPUs - Software frameworks and tools for GPU code integration - Hard real-time use of GPUs - Lattice QCD simulation - GPUs in phenomenology - GPUs for medical imaging purposes - GPUs in neutron and photon science - Massively parallel computations in HEP - Code parallelization. "GPU computing in High-Energy Physics" attracted 78 registrants to Pisa. The 38 oral presentations included talks on specific topics in experimental and theoretical applications of GPUs, as well as review talks on applications and technology. Five posters were also presented, each introduced by a short plenary oral illustration. A company exhibition was hosted on site. The conference consisted of 12 plenary sessions, together with a social program which included a banquet and guided excursions around Pisa. It was overall an enjoyable experience, offering an opportunity to share ideas and opinions and to get updated on other participants' work in this emerging field, as well as being a valuable introduction for newcomers interested in learning more about the use of GPUs as accelerators for scientific progress on the elementary constituents of matter and energy.

  7. GPU-FS-kNN: a software tool for fast and scalable kNN computation using GPUs.

    Directory of Open Access Journals (Sweden)

    Ahmed Shamsul Arefin

    Full Text Available BACKGROUND: The analysis of biological networks has become a major challenge due to the recent development of high-throughput techniques that are rapidly producing very large data sets. The exploding volumes of biological data call for extreme computational power and special computing facilities (i.e., supercomputers). An inexpensive solution, such as general-purpose computation based on graphics processing units (GPGPU), can be adapted to tackle this challenge, but the limitation of the device's internal memory can pose a new problem of scalability. Efficient data and computational parallelism with partitioning is required to provide a fast and scalable solution to this problem. RESULTS: We propose an efficient parallel formulation of the k-Nearest Neighbour (kNN) search problem, which is a popular method for classifying objects in several fields of research, such as pattern recognition, machine learning and bioinformatics. Being very simple and straightforward, the performance of the kNN search degrades dramatically for large data sets, since the task is computationally intensive. The proposed approach is not only fast but also scalable to large-scale instances. Based on our approach, we implemented a software tool GPU-FS-kNN (GPU-based Fast and Scalable k-Nearest Neighbour) for CUDA-enabled GPUs. The basic approach is simple and adaptable to other available GPU architectures. We observed speed-ups of 50-60 times compared with a CPU implementation on a well-known breast microarray study and its associated data sets. CONCLUSION: Our GPU-based Fast and Scalable k-Nearest Neighbour search technique (GPU-FS-kNN) provides a significant performance improvement for nearest-neighbour computation in large-scale networks. Source code and the software tool are available under the GNU Public License (GPL) at https://sourceforge.net/p/gpufsknn/.
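
    The partitioning strategy used to get around limited device memory can be sketched in NumPy: queries are processed in chunks so the full n-by-n distance matrix never materializes, and only each chunk's k nearest neighbours are kept. This illustrates the idea rather than reproducing the GPU-FS-kNN source.

        # Chunked brute-force kNN with squared Euclidean distances.
        import numpy as np

        def knn_chunked(data, k=5, chunk=1024):
            n = data.shape[0]
            sq = (data ** 2).sum(axis=1)
            idx = np.empty((n, k), dtype=np.int64)
            for s in range(0, n, chunk):
                q = data[s:s + chunk]
                d = sq[s:s + chunk, None] + sq[None, :] - 2.0 * (q @ data.T)
                rows = np.arange(q.shape[0])
                d[rows, rows + s] = np.inf               # exclude self-matches
                idx[s:s + chunk] = np.argpartition(d, k, axis=1)[:, :k]
            return idx

        points = np.random.rand(5000, 16).astype(np.float32)
        print(knn_chunked(points)[:3])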

  8. Development and Evaluation of Stereographic Display for Lung Cancer Screening

    Science.gov (United States)

    2008-12-01

    ... Application of GPUs – With the evolution of commodity graphics processing units (GPUs) for accelerating games on personal computers, over the ... units, which are designed for rendering computer games, are readily available and can be programmed to perform the kinds of real-time calculations ...

  9. Performance of Point and Range Queries for In-memory Databases using Radix Trees on GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Alam, Maksudul [ORNL; Yoginath, Srikanth B [ORNL; Perumalla, Kalyan S [ORNL

    2016-01-01

    In in-memory database systems augmented by hardware accelerators, accelerating the index searching operations can greatly increase the runtime performance of database queries. Recently, adaptive radix trees (ART) have been shown to provide very fast index search implementation on the CPU. Here, we focus on an accelerator-based implementation of ART. We present a detailed performance study of our GPU-based adaptive radix tree (GRT) implementation over a variety of key distributions, synthetic benchmarks, and actual keys from music and book data sets. The performance is also compared with other index-searching schemes on the GPU. GRT on modern GPUs achieves some of the highest rates of index searches reported in the literature. For point queries, a throughput of up to 106 million and 130 million lookups per second is achieved for sparse and dense keys, respectively. For range queries, GRT yields 600 million and 1000 million lookups per second for sparse and dense keys, respectively, on a large dataset of 64 million 32-bit keys.

  10. Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs

    KAUST Repository

    Ltaief, Hatem

    2016-06-02

    We present a high-performance comprehensive implementation of a multi-object adaptive optics (MOAO) simulation on multicore architectures with hardware accelerators in the context of computational astronomy. This implementation will be used as an operational testbed for simulating the design of new instruments for the European Extremely Large Telescope project (E-ELT), the world's biggest eye and one of Europe's highest priorities in ground-based astronomy. The simulation corresponds to a multi-step, multi-stage procedure, which is fed, near real-time, by system and turbulence data coming from the telescope environment. Based on the PLASMA library powered by the OmpSs dynamic runtime system, our implementation relies on a task-based programming model to permit an asynchronous out-of-order execution. Using modern multicore architectures associated with the enormous computing power of GPUs, the resulting data-driven compute-intensive simulation of the entire MOAO application, composed of the tomographic reconstructor and the observing sequence, is capable of coping with the aforementioned real-time challenge and stands as a reference implementation for the computational astronomy community.

  11. The ultimatum game: Discrete vs. continuous offers

    Science.gov (United States)

    Dishon-Berkovits, Miriam; Berkovits, Richard

    2014-09-01

    In many experimental setups in the social sciences, psychology and economics, the subjects are requested to accept or dispense monetary compensation which is usually given in discrete units. Using computer and mathematical modeling we show that, in the framework of studying the dynamics of acceptance of proposals in the ultimatum game, the long-time dynamics of acceptance of offers in the game are completely different for discrete vs. continuous offers. For discrete values the dynamics follow an exponential behavior. However, for continuous offers the dynamics are described by a power law. This is shown using an agent-based computer simulation as well as by utilizing an analytical solution of a mean-field equation describing the model. These findings have implications for the design and interpretation of socio-economical experiments beyond the ultimatum game.
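
    A minimal agent-based sketch of the discrete-versus-continuous comparison is given below; the adaptation rule, thresholds and parameters are invented for illustration and are not the authors' model.

        # Proposers adapt offers; responders accept above a private threshold.
        import numpy as np

        rng = np.random.default_rng(3)

        def run(discrete, steps=20_000, agents=100):
            offers = np.full(agents, 0.5)            # each proposer's current offer
            accepted = []
            for _ in range(steps):
                i = rng.integers(agents)
                offer = round(offers[i], 1) if discrete else offers[i]
                threshold = rng.uniform(0.0, 0.5)    # responder's private demand
                if offer >= threshold:
                    accepted.append(offer)
                    offers[i] = max(offers[i] - 0.01, 0.0)   # try keeping more
                else:
                    offers[i] = min(offers[i] + 0.01, 1.0)   # concede after rejection
            return np.mean(accepted)

        print("discrete mean accepted offer:  ", round(run(True), 3))
        print("continuous mean accepted offer:", round(run(False), 3))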

  12. 48 CFR 570.303-3 - Late offers, modifications of offers, and withdrawals of offers.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 4 2010-10-01 2010-10-01 false Late offers, modifications of offers, and withdrawals of offers. 570.303-3 Section 570.303-3 Federal Acquisition Regulations... PROPERTY Contracting Procedures for Leasehold Interests in Real Property 570.303-3 Late offers...

  13. Price-Taker Offering Strategy in Electricity Pay-as-Bid Markets

    DEFF Research Database (Denmark)

    Mazzi, Nicolò; Kazempour, Jalal; Pinson, Pierre

    2017-01-01

    The recent increase in the deployment of renewable energy sources may affect the offering strategy of conventional producers, mainly in the balancing market. The topics of optimal offering strategy and self-scheduling of thermal units have been extensively addressed in the literature. The feasible operating region of such units can be modeled using a mixed-integer linear programming approach, and the trading problem as a linear programming problem. However, the existing models mostly assume a uniform pricing scheme in all market stages, while several European balancing markets (e.g., in Germany and Italy) are settled under a pay-as-bid pricing scheme. The existing tools for solving the trading problem in pay-as-bid electricity markets rely on non-linear optimization models, which, combined with the unit commitment constraints, result in a mixed-integer non-linear programming problem. In contrast...

  14. Three-directional motion-compensation mask-based novel look-up table on graphics processing units for video-rate generation of digital holographic videos of three-dimensional scenes.

    Science.gov (United States)

    Kwon, Min-Woo; Kim, Seung-Cheol; Kim, Eun-Soo

    2016-01-20

    A three-directional motion-compensation mask-based novel look-up table method is proposed and implemented on graphics processing units (GPUs) for video-rate generation of digital holographic videos of three-dimensional (3D) scenes. Since the proposed method is designed to be well matched with the software and memory structures of GPUs, the number of compute-unified-device-architecture kernel function calls can be significantly reduced. This results in a great increase of the computational speed of the proposed method, allowing video-rate generation of the computer-generated hologram (CGH) patterns of 3D scenes. Experimental results reveal that the proposed method can generate 39.8 frames of Fresnel CGH patterns with 1920×1080 pixels per second for the test 3D video scenario with 12,088 object points on dual GPU boards of NVIDIA GTX TITANs, and they confirm the feasibility of the proposed method in the practical application fields of electroholographic 3D displays.

  15. Smoldyn on graphics processing units: massively parallel Brownian dynamics simulations.

    Science.gov (United States)

    Dematté, Lorenzo

    2012-01-01

    Space is a very important aspect in the simulation of biochemical systems; recently, the need for simulation algorithms able to cope with space is becoming more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transportation phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation can be time consuming, especially if we want to capture the system's behavior in a reliable way using stochastic methods in conjunction with a high spatial resolution. In order to deliver on the promise made by systems biology to understand a system as a whole, we need to scale up the size of the models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely used algorithm for stochastic simulation of chemical reactions with spatial resolution and single-molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of graphics processing units (GPUs). The implementation executes the most computationally demanding steps (computation of diffusion, unimolecular and bimolecular reactions, as well as the most common cases of molecule-surface interaction) on the GPU, computing them in parallel for each molecule of the system. The implementation offers good speed-ups and real-time, high-quality graphics output.
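
    The per-molecule parallelism that maps naturally onto a GPU is visible in a vectorized diffusion step (a NumPy illustration, not Smoldyn code): every row below is an independent molecule, exactly the unit of work a GPU thread would own.

        # Vectorized Brownian dynamics step for many molecules at once.
        import numpy as np

        rng = np.random.default_rng(4)
        n, D, dt = 100_000, 1e-12, 1e-6    # molecules, diffusivity [m^2/s], step [s]
        pos = np.zeros((n, 3))

        sigma = np.sqrt(2.0 * D * dt)      # std of each displacement component
        for _ in range(100):
            pos += rng.normal(0.0, sigma, size=(n, 3))

        # mean squared displacement should approach 6*D*t in three dimensions
        msd = (pos ** 2).sum(axis=1).mean()
        print(msd, 6.0 * D * 100 * dt)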

  16. MHD code using multi graphical processing units: SMAUG+

    Science.gov (United States)

    Gyenge, N.; Griffiths, M. K.; Erdélyi, R.

    2018-01-01

    This paper introduces the Sheffield Magnetohydrodynamics Algorithm Using GPUs (SMAUG+), an advanced numerical code for solving magnetohydrodynamic (MHD) problems using multi-GPU systems. Multi-GPU systems facilitate the development of accelerated codes and enable us to investigate larger model sizes and/or more detailed computational domain resolutions. This is a significant advancement over the parent single-GPU MHD code, SMAUG (Griffiths et al., 2015). Here, we demonstrate the validity of the SMAUG+ code, describe the parallelisation techniques and investigate performance benchmarks. The initial configuration of the Orszag-Tang vortex simulations is distributed among 4, 16, 64 and 100 GPUs. Furthermore, different simulation box resolutions are applied: 1000 × 1000, 2044 × 2044, 4000 × 4000 and 8000 × 8000. We also tested the code with the Brio-Wu shock tube simulations with a model size of 800, employing up to 10 GPUs. Based on the test results, we observed speed-ups and slow-downs, depending on the granularity and the communication overhead of certain parallel tasks. The main aim of the code development is to provide a massively parallel code without the memory limitation of a single GPU. By using our code, the applied model size can be significantly increased. We demonstrate that we are able to successfully compute numerically valid and large 2D MHD problems.

  17. The ATLAS Trigger Algorithms for General Purpose Graphics Processor Units

    CERN Document Server

    Tavares Delgado, Ademar; The ATLAS collaboration

    2016-01-01

    We present the ATLAS Trigger algorithms developed to exploit General-Purpose Graphics Processor Units. ATLAS is a particle physics experiment located at the LHC collider at CERN. The ATLAS Trigger system has two levels, the hardware-based Level 1 and the High Level Trigger, implemented in software running on a farm of commodity CPUs. Performing the trigger event selection within the available farm resources presents a significant challenge that will increase with future LHC upgrades. GPUs are being evaluated as a potential solution for trigger algorithm acceleration. Key factors determining the potential benefit of this new technology are the relative execution speedup, the number of GPUs required and the relative financial cost of the selected GPU. We have developed a trigger demonstrator which includes algorithms for reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Cal...

  18. Real time 3D structural and Doppler OCT imaging on graphics processing units

    Science.gov (United States)

    Sylwestrzak, Marcin; Szlag, Daniel; Szkulmowski, Maciej; Gorczyńska, Iwona; Bukowska, Danuta; Wojtkowski, Maciej; Targowski, Piotr

    2013-03-01

    In this report the application of graphics processing unit (GPU) programming for real-time 3D Fourier-domain Optical Coherence Tomography (FdOCT) imaging, with implementation of Doppler algorithms for visualization of flows in capillary vessels, is presented. Generally, the processing time of FdOCT data on the computer's main processor (CPU) constitutes the main limitation for real-time imaging. Employing additional algorithms, such as Doppler OCT analysis, makes this processing even more time consuming. Recently developed GPUs, which offer very high computational power, provide a solution to this problem. Taking advantage of them for massively parallel data processing allows for real-time imaging in FdOCT. The presented software for structural and Doppler OCT allows for complete processing and visualization of 2D data consisting of 2000 A-scans generated from 2048-pixel spectra at a frame rate of about 120 fps. The 3D imaging in the same mode, for volume data built of 220 × 100 A-scans, is performed at a rate of about 8 frames per second. In this paper the software architecture, the organization of the threads and the optimizations applied are shown. For illustration, screen shots recorded during real-time imaging of a phantom (a homogeneous water solution of Intralipid in a glass capillary) and the human eye in vivo are presented.
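
    The core of the FdOCT pipeline, transforming many spectra into A-scans at once, maps onto a single batched FFT call in CUDA's cuFFT library. A minimal sketch with the sizes quoted above (2000 spectra of 2048 samples), omitting the acquisition I/O and the preprocessing the real software performs:

        #include <cufft.h>
        #include <cuda_runtime.h>

        int main()
        {
            const int nSamples = 2048;   // pixels per spectrum
            const int nAscans  = 2000;   // spectra per 2D frame

            cufftComplex* d_spectra;
            cudaMalloc(&d_spectra, sizeof(cufftComplex) * nSamples * nAscans);
            // ... fill d_spectra from acquisition; background subtraction and
            // wavelength-to-wavenumber resampling are omitted in this sketch ...

            cufftHandle plan;
            cufftPlan1d(&plan, nSamples, CUFFT_C2C, nAscans);         // one plan, 2000 FFTs
            cufftExecC2C(plan, d_spectra, d_spectra, CUFFT_FORWARD);  // in place
            cudaDeviceSynchronize();
            // Each transformed spectrum is now one A-scan; magnitude and log
            // scaling for display would follow as small elementwise kernels.

            cufftDestroy(plan);
            cudaFree(d_spectra);
            return 0;
        }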

  19. Performance evaluation of H.264/AVC decoding and visualization using the GPU

    OpenAIRE

    Pieters, Bart; Van Rijsselbergen, Dieter; De Neve, Wesley; Van de Walle, Rik

    2007-01-01

    The coding efficiency of the H.264/AVC standard makes the decoding process computationally demanding. This has limited the availability of cost-effective, high-performance solutions. Modern computers are typically equipped with powerful yet cost-effective Graphics Processing Units (GPUs) to accelerate graphics operations. These GPUs can be addressed by means of a 3-D graphics API such as Microsoft Direct3D or OpenGL, using programmable shaders as generic processing units for vector data. The ...

  20. GPU-Accelerated Stony-Brook University 5-class Microphysics Scheme in WRF

    Science.gov (United States)

    Mielikainen, J.; Huang, B.; Huang, A.

    2011-12-01

    The Weather Research and Forecasting (WRF) model is a next-generation mesoscale numerical weather prediction system. Microphysics plays an important role in weather and climate prediction. Several bulk water microphysics schemes are available within WRF, with different numbers of simulated hydrometeor classes and methods for estimating their size, fall speeds, distributions and densities. The Stony-Brook University scheme (SBU-YLIN) is a 5-class scheme with riming intensity predicted to account for mixed-phase processes. In the past few years, co-processing on Graphics Processing Units (GPUs) has been a disruptive technology in High Performance Computing (HPC). GPUs use the ever-increasing transistor count to add more processor cores. Therefore, GPUs are well suited for massively data-parallel processing with high floating-point arithmetic intensity, and it is imperative to update legacy scientific applications to take advantage of this unprecedented increase in computing power. CUDA is an extension to the C programming language offering direct GPU programming. It is designed so that its constructs allow for natural expression of data-level parallelism. A CUDA program is organized into two parts: a serial program running on the CPU and a CUDA kernel running on the GPU. The CUDA code consists of three computational phases: transmission of data into the global memory of the GPU, execution of the CUDA kernel, and transmission of results from the GPU into the memory of the CPU. CUDA takes a bottom-up view of parallelism in which the thread is the atomic unit of parallelism. Individual threads are part of groups called warps, within which every thread executes exactly the same sequence of instructions. To test SBU-YLIN, we used a CONtinental United States (CONUS) benchmark data set for a 12 km resolution domain for October 24, 2001. A WRF domain is a geographic region of interest discretized into a 2-dimensional grid parallel to the ground. Each grid point has
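
    The three computational phases named above correspond to a handful of API calls. A self-contained toy example following exactly that structure (a generic illustration, not the SBU-YLIN port):

        #include <cuda_runtime.h>
        #include <cstdio>
        #include <cstdlib>

        // Stand-in for a microphysics computation: each thread updates
        // one grid point independently.
        __global__ void saturate(float* q, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) q[i] = fminf(q[i], 1.0f);
        }

        int main()
        {
            const int n = 1 << 20;
            const size_t bytes = n * sizeof(float);
            float* h_q = (float*)malloc(bytes);
            for (int i = 0; i < n; ++i) h_q[i] = 2.0f * i / n;

            float* d_q;
            cudaMalloc(&d_q, bytes);
            cudaMemcpy(d_q, h_q, bytes, cudaMemcpyHostToDevice);   // phase 1: CPU -> GPU
            saturate<<<(n + 255) / 256, 256>>>(d_q, n);            // phase 2: kernel
            cudaMemcpy(h_q, d_q, bytes, cudaMemcpyDeviceToHost);   // phase 3: GPU -> CPU

            printf("q[n-1] = %f\n", h_q[n - 1]);  // expect 1.0 after clamping
            cudaFree(d_q);
            free(h_q);
            return 0;
        }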

  1. Dentists' self-perceived role in offering tobacco cessation services: results from a nationally representative survey, United States, 2010-2011.

    Science.gov (United States)

    Jannat-Khah, Deanna P; McNeely, Jennifer; Pereyra, Margaret R; Parish, Carrigan; Pollack, Harold A; Ostroff, Jamie; Metsch, Lisa; Shelley, Donna R

    2014-11-06

    Dental visits represent an opportunity to identify and help patients quit smoking, yet dental settings remain an untapped venue for treatment of tobacco dependence. The purpose of this analysis was to assess factors that may influence patterns of tobacco-use-related practice among a national sample of dental providers. We surveyed a representative sample of general dentists practicing in the United States (N = 1,802). Multivariable analysis was used to assess correlates of adherence to tobacco use treatment guidelines and to analyze factors that influence providers' willingness to offer tobacco cessation assistance if reimbursed for this service. More than 90% of dental providers reported that they routinely ask patients about tobacco use, 76% counsel patients, and 45% routinely offer cessation assistance, defined as referring patients for cessation counseling, providing a cessation prescription, or both. Results from multivariable analysis indicated that cessation assistance was associated with having a practice with 1 or more hygienists, having a chart system that includes a tobacco use question, having received training on treating tobacco dependence, and having positive attitudes toward treating tobacco use. Providers who did not offer assistance but who reported that they would change their practice patterns if sufficiently reimbursed were more likely to be in a group practice, treat patients insured through Medicaid, and have positive attitudes toward treating tobacco dependence. Findings indicate the potential benefit of increasing training opportunities and promoting system changes to increase involvement of dental providers in conducting tobacco use treatment. Reimbursement models should be tested to assess the effect on dental provider practice patterns.

  2. An evaluation of the potential of GPUs to accelerate tracking algorithms for the ATLAS trigger

    CERN Document Server

    Baines, JTM; The ATLAS collaboration; Emeliyanov, D; Howard, JR; Kama, S; Washbrook, AJ; Wynne, BM

    2014-01-01

    The potential of GPUs has been evaluated as a possible way to accelerate trigger algorithms for the ATLAS experiment located at the Large Hadron Collider (LHC). During LHC Run-1 ATLAS employed a three-level trigger system to progressively reduce the LHC collision rate of 20 MHz to a storage rate of about 600 Hz for offline processing. Reconstruction of charged particles trajectories through the Inner Detector (ID) was performed at the second (L2) and third (EF) trigger levels. The ID contains pixel, silicon strip (SCT) and straw-tube technologies. Prior to tracking, data-preparation algorithms processed the ID raw data producing measurements of the track position at each detector layer. The data-preparation and tracking consumed almost three-quarters of the total L2 CPU resources during 2012 data-taking. Detailed performance studies of a CUDA™ implementation of the L2 pixel and SCT data-preparation and tracking algorithms running on a Nvidia® Tesla C2050 GPU have shown a speed-up by a factor of 12 for the ...

  3. Tinker-OpenMM: Absolute and relative alchemical free energies using AMOEBA on GPUs.

    Science.gov (United States)

    Harger, Matthew; Li, Daniel; Wang, Zhi; Dalby, Kevin; Lagardère, Louis; Piquemal, Jean-Philip; Ponder, Jay; Ren, Pengyu

    2017-09-05

    The capabilities of the polarizable force fields for alchemical free energy calculations have been limited by the high computational cost and complexity of the underlying potential energy functions. In this work, we present a GPU-based general alchemical free energy simulation platform for the polarizable potential AMOEBA. Tinker-OpenMM, the OpenMM implementation of the AMOEBA simulation engine, has been modified to enable both absolute and relative alchemical simulations on GPUs, which leads to a ∼200-fold improvement in simulation speed over a single CPU core. We show that free energy values calculated using this platform agree with the results of Tinker simulations for the hydration of organic compounds and binding of host-guest systems within the statistical errors. In addition to absolute binding, we designed a relative alchemical approach for computing relative binding affinities of ligands to the same host, where a special path was applied to avoid numerical instability due to polarization between the different ligands that bind to the same site. This scheme is general and does not require ligands to have similar scaffolds. We show that relative hydration and binding free energies calculated using this approach match those computed from the absolute free energy approach. © 2017 Wiley Periodicals, Inc.

  4. Optimization Solutions for Improving the Performance of the Parallel Reduction Algorithm Using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Ion LUNGU

    2012-01-01

    In this paper, we research, analyze and develop optimization solutions for the parallel reduction function using graphics processing units (GPUs) that implement the Compute Unified Device Architecture (CUDA), a modern and novel approach for improving the software performance of data processing applications and algorithms. Many of these applications and algorithms make use of the reduction function in their computational steps. After having designed the function and its algorithmic steps in CUDA, we progressively developed and implemented optimization solutions for the reduction function. In order to confirm, test and evaluate the solutions' efficiency, we developed a custom-tailored benchmark suite. We analyzed the experimental results regarding: the comparison of execution time and bandwidth when using graphics processing units covering the main CUDA architectures (Tesla GT200, Fermi GF100, Kepler GK104) and a central processing unit; the influence of the data type; and the influence of the binary operator.
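
    The baseline that such optimization work starts from is the classic shared-memory tree reduction, shown here for the sum operator. This is a generic sketch, not the paper's final optimized variant:

        #include <cuda_runtime.h>
        #include <cstdio>
        #include <cstdlib>

        // Each block reduces a tile of the input to one partial sum in shared
        // memory; each thread first loads and adds two elements, and the loop
        // uses sequential addressing to avoid shared-memory bank conflicts.
        __global__ void reduceSum(const float* in, float* out, int n)
        {
            extern __shared__ float tile[];
            unsigned tid = threadIdx.x;
            unsigned i = blockIdx.x * blockDim.x * 2 + tid;

            float v = (i < n) ? in[i] : 0.0f;
            if (i + blockDim.x < n) v += in[i + blockDim.x];
            tile[tid] = v;
            __syncthreads();

            for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
                if (tid < s) tile[tid] += tile[tid + s];
                __syncthreads();
            }
            if (tid == 0) out[blockIdx.x] = tile[0];
        }

        int main()
        {
            const int n = 1 << 22, threads = 256;
            const int blocks = (n + threads * 2 - 1) / (threads * 2);
            float* h_in = (float*)malloc(n * sizeof(float));
            for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

            float *d_in, *d_part;
            cudaMalloc(&d_in, n * sizeof(float));
            cudaMalloc(&d_part, blocks * sizeof(float));
            cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

            reduceSum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_part, n);

            // Combine the per-block partials on the host for simplicity; a
            // second kernel pass is the usual alternative.
            float* h_part = (float*)malloc(blocks * sizeof(float));
            cudaMemcpy(h_part, d_part, blocks * sizeof(float), cudaMemcpyDeviceToHost);
            double sum = 0.0;
            for (int b = 0; b < blocks; ++b) sum += h_part[b];
            printf("sum = %.0f (expected %d)\n", sum, n);

            cudaFree(d_in); cudaFree(d_part);
            free(h_in); free(h_part);
            return 0;
        }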

  5. Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors.

    Science.gov (United States)

    Han, Bing; Taha, Tarek M

    2010-04-01

    There is currently a strong push in the research community to develop biological-scale implementations of neuron-based vision models. Systems at this scale are computationally demanding and generally utilize more accurate neuron models, such as the Izhikevich and the Hodgkin-Huxley models, rather than the more popular integrate-and-fire model. We examine the feasibility of using graphics processing units (GPUs) to accelerate a spiking neural network based character recognition network to enable such large-scale systems. Two versions of the network utilizing the Izhikevich and Hodgkin-Huxley models are implemented. Three NVIDIA general-purpose (GP) GPU platforms are examined, including the GeForce 9800 GX2, the Tesla C1060, and the Tesla S1070. Our results show that the GPGPUs can provide significant speedup over conventional processors. In particular, the fastest GPGPU utilized, the Tesla S1070, provided speedups of 5.6 and 84.4 over highly optimized implementations on the fastest central processing unit (CPU) tested, a quad-core 2.67 GHz Xeon processor, for the Izhikevich and the Hodgkin-Huxley models, respectively. The CPU implementation utilized all four cores and the vector data parallelism offered by the processor. The results indicate that GPUs are well suited for this application domain.

  6. User embracement and maternal characteristics associated with liquid offer to infants.

    Science.gov (United States)

    Niquini, Roberta Pereira; Bittencourt, Sonia Azevedo; Lacerda, Elisa Maria de Aquino; Couto de Oliveira, Maria Inês; Leal, Maria do Carmo

    2010-08-01

    To identify the maternal characteristics and welcoming actions towards mothers of infants aged less than six months associated with early liquid offer. Cross-sectional study performed in 2007, with a representative sample of mothers of infants aged less than six months (n=1,057), users of Primary Health Care (PHC) units in the city of Rio de Janeiro, Southeastern Brazil. A multivariate logistic regression model was used to estimate the association between explanatory variables and liquid offer, with weighting and design effects and controlled for infant age. Of all mothers, 32% did not receive the welcoming card in the maternity hospital, 47% did not receive guidance on breastfeeding at their first visit to the PHC unit after childbirth, and 55% reported they had offered liquids to their infants. Women without at least six months of previous breastfeeding experience were more likely to offer liquids than those with such experience (OR=1.57; 95% CI: 1.16;2.13). Mothers who had not received guidance on breastfeeding at their first visit to the PHC unit after childbirth were 58% more likely to offer liquids than those who had received it. Liquid offer was positively associated with adolescence among women with a partner (OR=2.17; 95% CI: 1.10;4.30) and negatively associated with adolescence among those without a partner (OR=0.31; 95% CI: 0.11;0.85). Among women with less than eight years of education, those who had not received guidance on breastfeeding after childbirth were 1.8 times more likely to offer liquids than those who had received it. Age, marital status and previous breastfeeding experience are maternal characteristics associated with liquid offer to infants aged less than six months. Receiving early guidance on breastfeeding could reduce liquid offer to infants.

  7. FAST CALCULATION OF THE LOMB-SCARGLE PERIODOGRAM USING GRAPHICS PROCESSING UNITS

    International Nuclear Information System (INIS)

    Townsend, R. H. D.

    2010-01-01

    I introduce a new code for fast calculation of the Lomb-Scargle periodogram that leverages the computing power of graphics processing units (GPUs). After establishing a background to the newly emergent field of GPU computing, I discuss the code design and narrate key parts of its source. Benchmarking calculations indicate no significant differences in accuracy compared to an equivalent CPU-based code. However, the differences in performance are pronounced; running on a low-end GPU, the code can match eight CPU cores, and on a high-end GPU it is faster by a factor approaching 30. Applications of the code include analysis of long photometric time series obtained by ongoing satellite missions and upcoming ground-based monitoring facilities, and Monte Carlo simulation of periodogram statistical properties.
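
    The parallelization is natural because every trial frequency is independent: one thread can evaluate the full Lomb-Scargle power at one frequency. Below is a minimal kernel for the standard formula, not the paper's code; it assumes the input y has had its mean subtracted, and normalization by the variance is omitted:

        #include <cuda_runtime.h>

        // P(w) = 0.5 * [ (sum_i y_i cos w(t_i - tau))^2 / sum_i cos^2 w(t_i - tau)
        //              + (sum_i y_i sin w(t_i - tau))^2 / sum_i sin^2 w(t_i - tau) ]
        // with tan(2 w tau) = sum_i sin(2 w t_i) / sum_i cos(2 w t_i).
        __global__ void lombScargle(const float* t, const float* y, int n,
                                    const float* freq, float* power, int nf)
        {
            int j = blockIdx.x * blockDim.x + threadIdx.x;
            if (j >= nf) return;
            float w = 6.2831853f * freq[j];   // angular frequency

            float s2 = 0.0f, c2 = 0.0f;
            for (int i = 0; i < n; ++i) {
                s2 += sinf(2.0f * w * t[i]);
                c2 += cosf(2.0f * w * t[i]);
            }
            float tau = 0.5f / w * atan2f(s2, c2);

            float cy = 0.0f, sy = 0.0f, cc = 0.0f, ss = 0.0f;
            for (int i = 0; i < n; ++i) {
                float c = cosf(w * (t[i] - tau));
                float s = sinf(w * (t[i] - tau));
                cy += y[i] * c;  sy += y[i] * s;
                cc += c * c;     ss += s * s;
            }
            power[j] = 0.5f * (cy * cy / cc + sy * sy / ss);
        }
        // Launch over frequencies, e.g.:
        //   lombScargle<<<(nf + 255) / 256, 256>>>(d_t, d_y, n, d_f, d_p, nf);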

  8. GAMER: A GRAPHIC PROCESSING UNIT ACCELERATED ADAPTIVE-MESH-REFINEMENT CODE FOR ASTROPHYSICS

    International Nuclear Information System (INIS)

    Schive, H.-Y.; Tsai, Y.-C.; Chiueh Tzihong

    2010-01-01

    We present the newly developed code, GPU-accelerated Adaptive-MEsh-Refinement code (GAMER), which adopts a novel approach in improving the performance of adaptive-mesh-refinement (AMR) astrophysical simulations by a large factor with the use of the graphic processing unit (GPU). The AMR implementation is based on a hierarchy of grid patches with an oct-tree data structure. We adopt a three-dimensional relaxing total variation diminishing scheme for the hydrodynamic solver and a multi-level relaxation scheme for the Poisson solver. Both solvers have been implemented in GPU, by which hundreds of patches can be advanced in parallel. The computational overhead associated with the data transfer between the CPU and GPU is carefully reduced by utilizing the capability of asynchronous memory copies in GPU, and the computing time of the ghost-zone values for each patch is diminished by overlapping it with the GPU computations. We demonstrate the accuracy of the code by performing several standard test problems in astrophysics. GAMER is a parallel code that can be run in a multi-GPU cluster system. We measure the performance of the code by performing purely baryonic cosmological simulations in different hardware implementations, in which detailed timing analyses provide comparison between the computations with and without GPU(s) acceleration. Maximum speed-up factors of 12.19 and 10.47 are demonstrated using one GPU with 4096³ effective resolution and 16 GPUs with 8192³ effective resolution, respectively.

  9. Partial wave analysis using graphics processing units

    Energy Technology Data Exchange (ETDEWEB)

    Berger, Niklaus; Liu Beijiang; Wang Jike, E-mail: nberger@ihep.ac.c [Institute of High Energy Physics, Chinese Academy of Sciences, 19B Yuquan Lu, Shijingshan, 100049 Beijing (China)

    2010-04-01

    Partial wave analysis is an important tool for determining resonance properties in hadron spectroscopy. For large data samples however, the un-binned likelihood fits employed are computationally very expensive. At the Beijing Spectrometer (BES) III experiment, an increase in statistics compared to earlier experiments of up to two orders of magnitude is expected. In order to allow for a timely analysis of these datasets, additional computing power with short turnover times has to be made available. It turns out that graphics processing units (GPUs) originally developed for 3D computer games have an architecture of massively parallel single instruction multiple data floating point units that is almost ideally suited for the algorithms employed in partial wave analysis. We have implemented a framework for tensor manipulation and partial wave fits called GPUPWA. The user writes a program in pure C++ whilst the GPUPWA classes handle computations on the GPU, memory transfers, caching and other technical details. In conjunction with a recent graphics processor, the framework provides a speed-up of the partial wave fit by more than two orders of magnitude compared to legacy FORTRAN code.

  10. Multi–GPU Implementation of Machine Learning Algorithm using CUDA and OpenCL

    Directory of Open Access Journals (Sweden)

    Jan Masek

    2016-06-01

    Using modern Graphics Processing Units (GPUs) has become very useful for computing complex and time-consuming processes. GPUs provide high-performance computation capabilities at a good price. This paper deals with multi-GPU OpenCL and CUDA implementations of the k-Nearest Neighbor (k-NN) algorithm. This work compares the performance of the OpenCL and CUDA implementations, each of which is suitable for a different number of used attributes. The proposed CUDA algorithm achieves an acceleration of up to 880x in comparison with a single-thread CPU version. The common k-NN was modified to be faster when a lower number of k neighbors is set. The performance of the algorithm was verified with two dual-GPU NVIDIA GeForce GTX 690 cards and an Intel Core i7 3770 CPU running at 4.1 GHz. Speed-ups were measured for one, two, three and four GPUs. We performed several tests with data sets containing up to 4 million elements with various numbers of attributes.
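
    The compute-heavy core of any brute-force k-NN is the query-to-reference distance matrix; the k smallest entries per query row are then selected in a separate step. A generic CUDA sketch of that core (not the paper's implementation):

        #include <cuda_runtime.h>

        // One thread per (query, reference) pair on a 2-D grid: writes the
        // squared Euclidean distance into an nQuery x nRef matrix. The loop
        // over `dim` is where a high attribute count dominates the runtime.
        __global__ void pairwiseDist2(const float* query, const float* ref,
                                      float* dist, int nQuery, int nRef, int dim)
        {
            int q = blockIdx.y * blockDim.y + threadIdx.y;
            int r = blockIdx.x * blockDim.x + threadIdx.x;
            if (q >= nQuery || r >= nRef) return;

            float acc = 0.0f;
            for (int d = 0; d < dim; ++d) {
                float diff = query[q * dim + d] - ref[r * dim + d];
                acc += diff * diff;
            }
            dist[q * nRef + r] = acc;
        }
        // Launch, e.g.:
        //   dim3 block(16, 16);
        //   dim3 grid((nRef + 15) / 16, (nQuery + 15) / 16);
        //   pairwiseDist2<<<grid, block>>>(d_q, d_r, d_dist, nQuery, nRef, dim);

    Multi-GPU operation then amounts to splitting the reference set (or the queries) across devices and merging the per-device candidate lists.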

  11. Offers

    CERN Multimedia

    Staff Association

    2012-01-01

    L'Occitane en Provence proposes the following offer: 10 % discount on all products in all L'Occitane shops in Metropolitan France upon presentation of your Staff Association membership card and a valid ID. This offer is valid only for one person, is non-transferable and cannot be combined with other promotions.

  12. Quasi-real-time end-to-end simulations of ELT-scale adaptive optics systems on GPUs

    Science.gov (United States)

    Gratadour, Damien

    2011-09-01

    Our team has started the development of a code dedicated to GPUs for the simulation of AO systems at the E-ELT scale. It uses the CUDA toolkit and an original binding to Yorick (an open source interpreted language) to provide the user with a comprehensive interface. In this paper we present the first performance analysis of our simulation code, showing its ability to provide Shack-Hartmann (SH) images and measurements at the kHz scale for a VLT-sized AO system and in quasi-real time (up to 70 Hz) for ELT-sized systems on a single top-end GPU. The simulation code includes multi-layer atmospheric turbulence generation, ray tracing through these layers, image formation at the focal plane of every sub-aperture of an SH sensor using either natural or laser guide stars, and centroiding on these images using various algorithms. Turbulence is generated on the fly, giving the ability to simulate hours of observations without the need to load extremely large phase screens into global memory. Thanks to its performance, this code additionally provides the unique ability to test real-time controllers for future AO systems under nominal conditions.

  13. GPU-accelerated micromagnetic simulations using cloud computing

    Energy Technology Data Exchange (ETDEWEB)

    Jermain, C.L., E-mail: clj72@cornell.edu [Cornell University, Ithaca, NY 14853 (United States); Rowlands, G.E.; Buhrman, R.A. [Cornell University, Ithaca, NY 14853 (United States); Ralph, D.C. [Cornell University, Ithaca, NY 14853 (United States); Kavli Institute at Cornell, Ithaca, NY 14853 (United States)

    2016-03-01

    Highly parallel graphics processing units (GPUs) can improve the speed of micromagnetic simulations significantly as compared to conventional computing using central processing units (CPUs). We present a strategy for performing GPU-accelerated micromagnetic simulations by utilizing cost-effective GPU access offered by cloud computing services with an open-source Python-based program for running the MuMax3 micromagnetics code remotely. We analyze the scaling and cost benefits of using cloud computing for micromagnetics. - Highlights: • The benefits of cloud computing for GPU-accelerated micromagnetics are examined. • We present the MuCloud software for running simulations on cloud computing. • Simulation run times are measured to benchmark cloud computing performance. • Comparison benchmarks are analyzed between CPU and GPU based solvers.

  14. GPU-accelerated micromagnetic simulations using cloud computing

    International Nuclear Information System (INIS)

    Jermain, C.L.; Rowlands, G.E.; Buhrman, R.A.; Ralph, D.C.

    2016-01-01

    Highly parallel graphics processing units (GPUs) can improve the speed of micromagnetic simulations significantly as compared to conventional computing using central processing units (CPUs). We present a strategy for performing GPU-accelerated micromagnetic simulations by utilizing cost-effective GPU access offered by cloud computing services with an open-source Python-based program for running the MuMax3 micromagnetics code remotely. We analyze the scaling and cost benefits of using cloud computing for micromagnetics. - Highlights: • The benefits of cloud computing for GPU-accelerated micromagnetics are examined. • We present the MuCloud software for running simulations on cloud computing. • Simulation run times are measured to benchmark cloud computing performance. • Comparison benchmarks are analyzed between CPU and GPU based solvers.

  15. 7 CFR 1494.501 - Submission of offers to CCC.

    Science.gov (United States)

    2010-01-01

    ..., DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS EXPORT BONUS PROGRAMS Export Enhancement... contract unit price, delivery terms (e.g., FOB, C&F, etc.); the nature of any arrangements or... CCC bonus; (B) The intention to submit an offer; or (C) The methods or factors used to calculate the...

  16. Offers

    CERN Multimedia

    Staff Association

    2011-01-01

    Special offers for our members       Go Sport in Val Thoiry is offering 15% discount on all purchases made in the shop upon presentation of the Staff Association membership card (excluding promotions, sale items and bargain corner, and excluding purchases using Go Sport  and Kadéos gift cards. Only one discount can be applied to each purchase).  

  17. Graphics processing units in bioinformatics, computational biology and systems biology.

    Science.gov (United States)

    Nobile, Marco S; Cazzaniga, Paolo; Tangherloni, Andrea; Besozzi, Daniela

    2017-09-01

    Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), therefore limiting their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining increasing attention from the scientific community, as they can considerably reduce the running time required by standard CPU-based software and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and the drawbacks of the use of these parallel architectures. The complete list of GPU-powered tools reviewed here is available at http://bit.ly/gputools. © The Author 2016. Published by Oxford University Press.

  18. Fast TPC Online Tracking on GPUs and Asynchronous Data Processing in the ALICE HLT to facilitate Online Calibration

    International Nuclear Information System (INIS)

    Rohr, David; Gorbunov, Sergey; Krzewicki, Mikolaj; Breitner, Timo; Kretz, Matthias; Lindenstruth, Volker

    2015-01-01

    ALICE (A Large Ion Collider Experiment) is one of the four major experiments at the Large Hadron Collider (LHC) at CERN, which is today the most powerful particle accelerator worldwide. The High Level Trigger (HLT) is an online compute farm of about 200 nodes, which reconstructs events measured by the ALICE detector in real time. The HLT uses a custom online data-transport framework to distribute data and workload among the compute nodes. ALICE employs several calibration-sensitive subdetectors, e.g. the TPC (Time Projection Chamber). For a precise reconstruction, the HLT has to perform the calibration online. Online calibration can make certain offline calibration steps obsolete and can thus speed up offline analysis. Looking forward to ALICE Run III starting in 2020, online calibration becomes a necessity. The main detector used for track reconstruction is the TPC. Reconstructing the trajectories in the TPC is the most compute-intensive step during event reconstruction; therefore, a fast tracking implementation is of great importance. Reconstructed TPC tracks build the basis for the calibration, making fast online tracking mandatory. We present several components developed for the ALICE High Level Trigger to perform fast event reconstruction and to provide features required for online calibration. As the first topic, we present our TPC tracker, which employs GPUs to speed up the processing, and which is based on a Cellular Automaton and the Kalman filter. Our TPC tracking algorithm was successfully used in 2011 and 2012 in the lead-lead and proton-lead runs. We have improved it to leverage features of newer GPUs and have ported it to support OpenCL, CUDA, and CPUs with a single common source code, which makes us vendor independent. As the second topic, we present framework extensions required for online calibration. The extensions, however, are generic and can be used for other purposes as well. We have extended the framework to support asynchronous compute chains

  19. Offers

    CERN Multimedia

    Staff Association

    2014-01-01

    New offers : Discover the theater Galpon in Geneva. The Staff Association is happy to offer to its members a discount of 8.00 CHF on a full-price ticket (tickets of 15.00 CHF instead of 22.00 CHF) so do not hesitate anymore (mandatory reservation by phone + 4122 321  21 76 as tickets are quickly sold out!). For further information, please see our website: http://staff-association.web.cern.ch/fr/content/th%C3%A9%C3%A2tre-du-galpon  

  20. 75 FR 52459 - Regulations Governing Agencies for Issue of United States Savings Bonds; Offering of United...

    Science.gov (United States)

    2010-08-26

    ... eliminate the option to purchase paper savings bonds through payroll deductions for United States government... savings bonds purchased through payroll sales; individuals will still be able to purchase paper savings bonds at financial institutions for themselves and as gifts. Payroll savers will be encouraged to...

  1. Special offers

    CERN Multimedia

    Staff Association

    2011-01-01

    Are you a member of the Staff Association? Did you know that as a member you can benefit from the following special offers: BCGE (Banque Cantonale de Genève): personalized banking solutions with preferential conditions. TPG: reduced rates on annual transport passes for active and retired staff. Aquaparc: reduced ticket prices for children and adults at this Swiss waterpark in Le Bouveret. FNAC: 5% reduction on FNAC vouchers. For more information about all these offers, please consult our web site: http://association.web.cern.ch/association/en/OtherActivities/Offers.html

  2. Electricity distribution. Price control, reliability and customer services: response to OFFER

    International Nuclear Information System (INIS)

    1994-02-01

    This document presents the views of the National Consumer Council in response to a recent consultation paper from OFFER, the body responsible for regulation of the United Kingdom electric power industry. The financial performance of the Regional Electricity Companies (RECs) is reviewed by examining how it relates to the prices paid by domestic consumers. A critical analysis is presented of OFFER's proposals for revising the existing price control mechanism for the distribution businesses within the RECs. Standards of performance, debt and consumer disconnection are also examined. (UK)

  3. Performance comparison of OpenCL and CUDA by benchmarking an optimized perspective backprojection

    Energy Technology Data Exchange (ETDEWEB)

    Swall, Stefan; Ritschl, Ludwig; Knaup, Michael; Kachelriess, Marc [Erlangen-Nuernberg Univ., Erlangen (Germany). Inst. of Medical Physics (IMP)

    2011-07-01

    The increase in performance of Graphics Processing Units (GPUs) and the development of dedicated software tools within the last decade allow performance-demanding computations to be transferred from the Central Processing Unit (CPU) to the GPU, speeding up certain tasks by utilizing the massively parallel architecture of these devices. The Compute Unified Device Architecture (CUDA) developed by NVIDIA provides an easy and effective way to develop applications that target NVIDIA GPUs, and it has become one of the cardinal software tools for this purpose. Recently, the Open Computing Language (OpenCL) became available, which is neither vendor-specific nor limited to GPUs only. As the benefits of CUDA-based image reconstruction are well known, we aim at providing a comparison between the performance that can be achieved with CUDA and with OpenCL by benchmarking the time required to perform a simple but computationally demanding task: the perspective backprojection. (orig.)

  4. The "Earth Physics" Workshops Offered by the Earth Science Education Unit

    Science.gov (United States)

    Davies, Stephen

    2012-01-01

    Earth science has a part to play in broadening students' learning experience in physics. The Earth Science Education Unit presents a range of (free) workshops to teachers and trainee teachers, suggesting how Earth-based science activities, which show how we understand and use the planet we live on, can easily be slotted into normal science…

  5. Special Offers

    CERN Multimedia

    Association du personnel

    2011-01-01

    Walibi Rhône-Alpes is open until 31 October. Reduced prices for children and adults at this French attraction park in Les Avenières. For more information about all these offers, please consult our web site: http://association.web.cern.ch/association/en/OtherActivities/Offers.html

  6. Strategic Genco offers in electric energy markets cleared by merit order

    Science.gov (United States)

    Hasan, Ebrahim A. Rahman

    In an electricity market cleared by merit-order economic dispatch we identify necessary and sufficient conditions under which the market outcomes supported by pure strategy Nash equilibria (NE) exist when generating companies (Gencos) game through continuously variable incremental cost (IC) block offers. A Genco may own any number of units, each unit having multiple blocks with each block being offered at a constant IC. Next, a mixed-integer linear programming (MILP) scheme devoid of approximations or iterations is developed to identify all possible NE. The MILP scheme is systematic and general but computationally demanding for large systems. Thus, an alternative significantly faster lambda-iterative approach that does not require the use of MILP was also developed. Once all NE are found, one critical question is to identify the one whose corresponding gaming strategy may be considered by all Gencos as being the most rational. To answer this, this thesis proposes the use of a measure based on the potential profit gain and loss by each Genco for each NE. The most rational offer strategy for each Genco in terms of gaming or not gaming that best meets their risk/benefit expectations is the one corresponding to the NE with the largest gain to loss ratio. The computation of all NE is tested on several systems of up to ninety generating units, each with four incremental cost blocks. These NE are then used to examine how market power is influenced by market parameters, specifically, the number of competing Gencos, their size and true ICs, as well as the level of demand and price cap.

  7. Special Offers

    CERN Multimedia

    Association du personnel

    2011-01-01

    Are you a member of the Staff Association? Did you know that as a member you can benefit from the following special offers: BCGE (Banque Cantonale de Genève): personalized banking solutions with preferential conditions. TPG: reduced rates on annual transport passes for active and retired staff. Aquaparc: reduced ticket prices for children and adults at this Swiss waterpark in Le Bouveret. Walibi: reduced prices for children and adults at this French attraction park in Les Avenières. FNAC: 5% reduction on FNAC vouchers. For more information about all these offers, please consult our web site: http://association.web.cern.ch/association/en/OtherActivities/Offers.html

  8. Innovative gas offers

    International Nuclear Information System (INIS)

    Sala, O.; Mela, P.; Chatelain, F.

    2007-01-01

    New energy offers are progressively made available as the opening of the gas market to competition becomes broader. How are the combined offers of gas, electricity, renewable energies and energy services organized? What marketing strategies are implemented? Three participants at this round table present their offers and answer these questions. (J.S.)

  9. Offer

    CERN Multimedia

    Staff Association

    2016-01-01

    CERN was selected and participated in the ranking "Best Employers" organized by the magazine Bilan. To thank CERN for its collaboration, the magazine offers a reduction on the subscription fee for all employed members of personnel. 25% off the annual subscription: CHF 149.25 instead of CHF 199.— The subscription includes the magazine delivered to your home for a year, every other Wednesday, as well as special editions and access to the e-paper. To benefit from this offer, simply fill out the form provided for this purpose. To get the form, please contact the secretariat of the Staff Association (Staff.Association@cern.ch).

  10. Accelerating VASP electronic structure calculations using graphic processing units

    KAUST Repository

    Hacene, Mohamed

    2012-08-20

    We present a way to improve the performance of the electronic structure Vienna Ab initio Simulation Package (VASP) program. We show that high-performance computers equipped with graphics processing units (GPUs) as accelerators may drastically reduce the computation time when the time-consuming sections of the code are offloaded to the graphics chips. The procedure consists of (i) profiling the performance of the code to isolate the time-consuming parts, (ii) rewriting these so that the algorithms become better suited to the chosen graphics accelerator, and (iii) optimizing memory traffic between the host computer and the GPU accelerator. We chose to accelerate VASP with NVIDIA GPUs using CUDA. We compare the GPU and original versions of VASP by evaluating the Davidson and RMM-DIIS algorithms on chemical systems of up to 1100 atoms. In these tests, the total time is reduced by a factor between 3 and 8 when running on n (CPU core + GPU) pairs compared to n CPU cores only, without any accuracy loss. © 2012 Wiley Periodicals, Inc.
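
    Step (iii), optimizing the memory traffic between host and GPU, typically means pinned host buffers and asynchronous copies overlapped with kernel execution. A generic sketch of that pattern (not VASP code):

        #include <cuda_runtime.h>

        __global__ void work(float* x, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) x[i] *= 2.0f;
        }

        int main()
        {
            const int n = 1 << 20, half = n / 2;
            float *h, *d;
            cudaMallocHost(&h, n * sizeof(float));   // pinned: enables async copies
            cudaMalloc(&d, n * sizeof(float));
            for (int i = 0; i < n; ++i) h[i] = 1.0f;

            cudaStream_t s0, s1;
            cudaStreamCreate(&s0);
            cudaStreamCreate(&s1);

            // Process the data in two chunks: while chunk 1 is still being
            // copied on stream s1, the kernel already runs on chunk 0 in s0.
            cudaMemcpyAsync(d, h, half * sizeof(float), cudaMemcpyHostToDevice, s0);
            work<<<(half + 255) / 256, 256, 0, s0>>>(d, half);
            cudaMemcpyAsync(d + half, h + half, half * sizeof(float),
                            cudaMemcpyHostToDevice, s1);
            work<<<(half + 255) / 256, 256, 0, s1>>>(d + half, half);

            cudaDeviceSynchronize();
            cudaStreamDestroy(s0); cudaStreamDestroy(s1);
            cudaFreeHost(h); cudaFree(d);
            return 0;
        }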

  11. Accelerating VASP electronic structure calculations using graphic processing units

    KAUST Repository

    Hacene, Mohamed; Anciaux-Sedrakian, Ani; Rozanska, Xavier; Klahr, Diego; Guignon, Thomas; Fleurat-Lessard, Paul

    2012-01-01

    We present a way to improve the performance of the electronic structure Vienna Ab initio Simulation Package (VASP) program. We show that high-performance computers equipped with graphics processing units (GPUs) as accelerators may drastically reduce the computation time when the time-consuming sections of the code are offloaded to the graphics chips. The procedure consists of (i) profiling the performance of the code to isolate the time-consuming parts, (ii) rewriting these so that the algorithms become better suited to the chosen graphics accelerator, and (iii) optimizing memory traffic between the host computer and the GPU accelerator. We chose to accelerate VASP with NVIDIA GPUs using CUDA. We compare the GPU and original versions of VASP by evaluating the Davidson and RMM-DIIS algorithms on chemical systems of up to 1100 atoms. In these tests, the total time is reduced by a factor between 3 and 8 when running on n (CPU core + GPU) pairs compared to n CPU cores only, without any accuracy loss. © 2012 Wiley Periodicals, Inc.

  12. Multi-GPU based acceleration of a list-mode DRAMA toward real-time OpenPET imaging

    Energy Technology Data Exchange (ETDEWEB)

    Kinouchi, Shoko [Chiba Univ. (Japan); National Institute of Radiological Sciences, Chiba (Japan); Yamaya, Taiga; Yoshida, Eiji; Tashima, Hideaki [National Institute of Radiological Sciences, Chiba (Japan); Kudo, Hiroyuki [Tsukuba Univ., Ibaraki (Japan); Suga, Mikio [Chiba Univ. (Japan)

    2011-07-01

    OpenPET, which has a physical gap between two detector rings, is our new PET geometry. In order to realize future radiation therapy guided by OpenPET, real-time imaging is required. Therefore we developed a list-mode image reconstruction method using general-purpose graphics processing units (GPUs). For the GPU implementation, the efficiency of acceleration depends on the implementation method, which must avoid conditional statements. Therefore, in our previous study, we developed a new system model suited to GPU implementation. In this paper, we implemented our image reconstruction method using 4 GPUs to obtain further acceleration. We applied the developed reconstruction method to a small OpenPET prototype. The calculation time for the total iteration using 4 GPUs was 3.4 times shorter than with a single GPU. Compared to using a single CPU, we achieved a reconstruction-time speed-up of 142 times using 4 GPUs. (orig.)

  13. Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

    Energy Technology Data Exchange (ETDEWEB)

    Ronald Babich, Michael Clark, Balint Joo

    2010-11-01

    Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This library, interfaced to the QDP++/Chroma framework for LQCD calculations, is currently in production use on the "9g" cluster at the Jefferson Laboratory, enabling unprecedented price/performance for a range of problems in LQCD. Nevertheless, memory constraints on current GPU devices limit the problem sizes that can be tackled. In this contribution we describe the parallelization of the QUDA library onto multiple GPUs using MPI, including strategies for the overlapping of communication and computation. We report on both weak and strong scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in excess of 4 Tflops.
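
    In outline, the overlap of communication and computation mentioned above means launching the kernel for the interior of the local lattice (which needs no remote data) while the halo is exchanged over MPI, then applying a small boundary kernel once the messages have arrived. A heavily simplified 1-D sketch with hypothetical kernels, not QUDA source:

        #include <mpi.h>
        #include <cuda_runtime.h>

        __global__ void interiorOp(float* v, int n)   // needs no remote data
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= 1 && i < n - 1) v[i] = 0.5f * v[i];
        }
        __global__ void boundaryOp(float* v, const float* halo, int n)
        {
            if (threadIdx.x == 0) v[0]     = 0.5f * (v[0] + halo[0]);
            if (threadIdx.x == 1) v[n - 1] = 0.5f * (v[n - 1] + halo[1]);
        }

        int main(int argc, char** argv)
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            cudaSetDevice(rank % 4);          // assume up to 4 GPUs per node

            const int n = 1 << 20;
            float* d_v;
            cudaMalloc(&d_v, n * sizeof(float));
            cudaMemset(d_v, 0, n * sizeof(float));
            float h_send[2] = {0.0f, 0.0f}, h_recv[2] = {0.0f, 0.0f};
            int left  = (rank + size - 1) % size;
            int right = (rank + 1) % size;

            // Start the halo exchange, then compute the interior while the
            // messages are in flight -- this is the overlap.
            MPI_Request req[2];
            MPI_Irecv(h_recv, 2, MPI_FLOAT, left,  0, MPI_COMM_WORLD, &req[0]);
            MPI_Isend(h_send, 2, MPI_FLOAT, right, 0, MPI_COMM_WORLD, &req[1]);
            interiorOp<<<(n + 255) / 256, 256>>>(d_v, n);

            MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
            float* d_halo;
            cudaMalloc(&d_halo, 2 * sizeof(float));
            cudaMemcpy(d_halo, h_recv, 2 * sizeof(float), cudaMemcpyHostToDevice);
            boundaryOp<<<1, 2>>>(d_v, d_halo, n);

            cudaDeviceSynchronize();
            cudaFree(d_v); cudaFree(d_halo);
            MPI_Finalize();
            return 0;
        }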

  14. Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

    International Nuclear Information System (INIS)

    Babich, Ronald; Clark, Michael; Joo, Balint

    2010-01-01

    Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This library, interfaced to the QDP++/Chroma framework for LQCD calculations, is currently in production use on the '9g' cluster at the Jefferson Laboratory, enabling unprecedented price/performance for a range of problems in LQCD. Nevertheless, memory constraints on current GPU devices limit the problem sizes that can be tackled. In this contribution we describe the parallelization of the QUDA library onto multiple GPUs using MPI, including strategies for the overlapping of communication and computation. We report on both weak and strong scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in excess of 4 Tflops.

  15. Area-delay trade-offs of texture decompressors for a graphics processing unit

    Science.gov (United States)

    Novoa Súñer, Emilio; Ituero, Pablo; López-Vallejo, Marisa

    2011-05-01

    Graphics Processing Units have become a booster for the microelectronics industry. However, due to intellectual property issues, there is a serious lack of information on implementation details of the hardware architecture that is behind GPUs. For instance, the way texture is handled and decompressed in a GPU to reduce bandwidth usage has never been dealt with in depth from a hardware point of view. This work addresses a comparative study on the hardware implementation of different texture decompression algorithms for both conventional (PCs and video game consoles) and mobile platforms. Circuit synthesis is performed targeting both a reconfigurable hardware platform and a 90nm standard cell library. Area-delay trade-offs have been extensively analyzed, which allows us to compare the complexity of decompressors and thus determine suitability of algorithms for systems with limited hardware resources.

  16. Special Offers

    CERN Multimedia

    Association du personnel

    2011-01-01

    Are you a member of the Staff Association? Did you know that as a member you can benefit from the following special offers: BCGE (Banque Cantonale de Genève): personalized banking solutions with preferential conditions.     TPG: reduced rates on annual transport passes for active and retired staff.     Aquaparc: reduced ticket prices for children and adults at this Swiss waterpark in Le Bouveret.     Walibi: reduced prices for children and adults at this French attraction park in Les Avenières.       FNAC: 5% reduction on FNAC vouchers.       For more information about all these offers, please consult our web site: http://association.web.cern.ch/association/en/OtherActivities/Offers.html

  17. Special Offers

    CERN Multimedia

    Staff Association

    2011-01-01

    Are you a member of the Staff Association? Did you know that as a member you can benefit from the following special offers: BCGE (Banque Cantonale de Genève): personalized banking solutions with preferential conditions.     TPG: reduced rates on annual transport passes for all active and retired staff.     Aquaparc: reduced ticket prices for children and adults at this Swiss waterpark in Le Bouveret.     Walibi: reduced prices for children and adults at this French attraction park in Les Avenières.       FNAC: 5% reduction on FNAC vouchers.       For more information about all these offers, please consult our web site: http://association.web.cern.ch/association/en/OtherActivities/Offers.html

  18. Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments

    Directory of Open Access Journals (Sweden)

    Jyh-Da Wei

    2017-08-01

    High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing fields over the past decade. These desktop GPU cards must be installed in personal computers or servers with desktop CPUs, and the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA released an embedded board, called Jetson Tegra K1 (TK1), which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture cores (belonging to the Kepler GPU family). Jetson Tegra K1 has several advantages, such as low cost, low power consumption and high applicability, and it has been applied in several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform) was constructed, and that work also proved that Web and mobile services can be implemented in the STK platform with a good cost-performance ratio by comparing the STK platform with desktop CPUs and GPUs. In this work, an embedded GPU cluster platform is constructed with multiple TK1s (MTK platform). Complex system installation and setup are necessary procedures at first. Then, 2 job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk are ported to the MTK platform. The experimental results showed that the speedup ratios achieved 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, comparing 6 TK1s with a single TK1. The MTK platform is proven to be useful for multiple sequence alignments.

  19. Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments.

    Science.gov (United States)

    Wei, Jyh-Da; Cheng, Hui-Jun; Lin, Chun-Yuan; Ye, Jin; Yeh, Kuan-Yu

    2017-01-01

    High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing fields over the past decade. These desktop GPU cards must be installed in personal computers or servers with desktop CPUs, and the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA released an embedded board, called Jetson Tegra K1 (TK1), which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture cores (belonging to the Kepler GPU family). Jetson Tegra K1 has several advantages, such as low cost, low power consumption and high applicability, and it has been applied in several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform) was constructed, and that work also proved that Web and mobile services can be implemented in the STK platform with a good cost-performance ratio by comparing the STK platform with desktop CPUs and GPUs. In this work, an embedded GPU cluster platform is constructed with multiple TK1s (MTK platform). Complex system installation and setup are necessary procedures at first. Then, 2 job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk are ported to the MTK platform. The experimental results showed that the speedup ratios achieved 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, comparing 6 TK1s with a single TK1. The MTK platform is proven to be useful for multiple sequence alignments.

  20. Real-time computation of parameter fitting and image reconstruction using graphical processing units

    Science.gov (United States)

    Locans, Uldis; Adelmann, Andreas; Suter, Andreas; Fischer, Jannis; Lustermann, Werner; Dissertori, Günther; Wang, Qiulin

    2017-06-01

    In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high-performance computing to a wider range of users. However, programming these devices and integrating their use into existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at the Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of μSR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission Tomography) image reconstruction and analysis. Applications currently in use were examined to identify the parts of the algorithms in need of optimization, and efficient GPU kernels were created to speed up the previously identified parts. Benchmarking tests were performed in order to measure the achieved speedup. During this work, we focused on single-GPU systems to show that real-time data analysis of these problems can be achieved without the need for large computing clusters. The results show that the application currently used for parameter fitting, which uses OpenMP to parallelize calculations over multiple CPU cores, can be accelerated around 40 times through the use of a GPU; the speedup may vary depending on the size and complexity of the problem. For PET image analysis, the obtained speedups of the GPU version were more than 40× compared to a single-core CPU implementation. The achieved results show that it is possible to improve the execution time by orders of magnitude.
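
    The inner loop of such a parameter fit, evaluating chi-square over all data points for a trial parameter set, is an embarrassingly parallel transform-reduce. A hedged sketch with Thrust, assuming a simple exponential model with placeholder data (not the PSI or ETH code):

        #include <thrust/device_vector.h>
        #include <thrust/transform_reduce.h>
        #include <thrust/iterator/zip_iterator.h>
        #include <thrust/sequence.h>
        #include <thrust/fill.h>
        #include <cstdio>

        // chi^2 = sum_i ((y_i - a*exp(-b*x_i)) / sigma_i)^2: one functor
        // application per data point, combined by a parallel sum on the GPU.
        struct Residual2
        {
            float a, b;
            Residual2(float a_, float b_) : a(a_), b(b_) {}
            template <typename Tuple>
            __host__ __device__ float operator()(const Tuple& p) const
            {
                float x   = thrust::get<0>(p);
                float y   = thrust::get<1>(p);
                float sig = thrust::get<2>(p);
                float r = (y - a * expf(-b * x)) / sig;
                return r * r;
            }
        };

        int main()
        {
            const int n = 1 << 20;
            thrust::device_vector<float> x(n), y(n), sigma(n, 1.0f);
            thrust::sequence(x.begin(), x.end());   // placeholder data
            thrust::fill(y.begin(), y.end(), 0.0f);

            auto first = thrust::make_zip_iterator(
                thrust::make_tuple(x.begin(), y.begin(), sigma.begin()));
            auto last = thrust::make_zip_iterator(
                thrust::make_tuple(x.end(), y.end(), sigma.end()));

            float chi2 = thrust::transform_reduce(
                first, last, Residual2(1.0f, 0.01f), 0.0f, thrust::plus<float>());
            printf("chi2 = %f\n", chi2);
            return 0;
        }

    A host-side optimizer would then call this evaluation once per trial parameter set, so the whole fit is accelerated by the per-evaluation speedup.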

  1. Efficient computation of k-Nearest Neighbour Graphs for large high-dimensional data sets on GPU clusters.

    Directory of Open Access Journals (Sweden)

    Ali Dashti

    This paper presents an implementation of brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large high-dimensional data clouds. The proposed method uses Graphics Processing Units (GPUs) and is scalable with multiple levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to homogeneous computing clusters with a varying number of nodes and GPUs per node. We achieve a 6-fold speedup in data processing as compared with an optimized method running on a cluster of CPUs, and bring a hitherto impossible k-NNG generation for a dataset of twenty million images with 15k dimensionality into the realm of practical possibility.

  2. Distributed terascale volume visualization using distributed shared virtual memory

    KAUST Repository

    Beyer, Johanna

    2011-10-01

    Table 1 illustrates the impact of different distribution unit sizes, different screen resolutions, and numbers of GPU nodes. We use two and four GPUs (NVIDIA Quadro 5000 with 2.5 GB memory) and a mouse cortex EM dataset (see Figure 2) of resolution 21,494 x 25,790 x 1,850 = 955GB. The size of the virtual distribution units significantly influences the data distribution between nodes. Small distribution units result in a high depth complexity for compositing. Large distribution units lead to a low utilization of GPUs, because in the worst case only a single distribution unit will be in view, which is rendered by only a single node. The choice of an optimal distribution unit size depends on three major factors: the output screen resolution, the block cache size on each node, and the number of nodes. Currently, we are working on optimizing the compositing step and network communication between nodes. © 2011 IEEE.

  3. 2nd INNS Conference on Big Data

    CERN Document Server

    Manolopoulos, Yannis; Iliadis, Lazaros; Roy, Asim; Vellasco, Marley

    2017-01-01

    The book offers a timely snapshot of neural network technologies as a significant component of big data analytics platforms. It promotes new advances and research directions in efficient and innovative algorithmic approaches to analyzing big data (e.g. deep networks, nature-inspired and brain-inspired algorithms); implementations on different computing platforms (e.g. neuromorphic, graphics processing units (GPUs), clouds, clusters); and big data analytics applications to solve real-world problems (e.g. weather prediction, transportation, energy management). The book, which reports on the second edition of the INNS Conference on Big Data, held on October 23–25, 2016, in Thessaloniki, Greece, depicts an interesting collaborative adventure of neural networks with big data and other learning technologies.

  4. GPU Computing Gems Emerald Edition

    CERN Document Server

    Hwu, Wen-mei W

    2011-01-01

    ".the perfect companion to Programming Massively Parallel Processors by Hwu & Kirk." -Nicolas Pinto, Research Scientist at Harvard & MIT, NVIDIA Fellow 2009-2010 Graphics processing units (GPUs) can do much more than render graphics. Scientists and researchers increasingly look to GPUs to improve the efficiency and performance of computationally-intensive experiments across a range of disciplines. GPU Computing Gems: Emerald Edition brings their techniques to you, showcasing GPU-based solutions including: Black hole simulations with CUDA GPU-accelerated computation and interactive display of

  5. Accelerated Numerical Processing API Based on GPU Technology, Phase II

    Data.gov (United States)

    National Aeronautics and Space Administration — The recent performance increases in graphics processing units (GPUs) have made graphics cards an attractive platform for implementing computationally intense...

  6. Accelerated Numerical Processing API Based on GPU Technology, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — The recent performance increases in graphics processing units (GPUs) have made graphics cards an attractive platform for implementing computationally intense...

  7. Communication dated 25 June 2008 received from the Resident Representative of the United Kingdom to the Agency concerning a letter and offer of 12 June 2008 delivered to the Islamic Republic of Iran

    International Nuclear Information System (INIS)

    2008-01-01

    The Director General has received a communication dated 25 June 2008 from the Resident Representative of the United Kingdom, on behalf of the Resident Representatives of China, France, Germany, the Russian Federation, the United Kingdom and the United States of America, and the Secretary General and High Representative of the European Union, attaching the text of a letter and offer of 12 June 2008 delivered to the authorities of the Islamic Republic of Iran by Mr. Javier Solana, together with a summary of remarks made by Mr. Solana on 14 June 2008. The communication and, as requested therein, its attachments, are herewith circulated for information

  8. Offers

    CERN Document Server

    Staff Association

    2015-01-01

    New offer for our members. The CERN Staff Association has recently concluded a framework agreement with AXA Insurance Ltd, General-Guisan-Strasse 40, 8401 Winterthur. This contract allows you to benefit from a preferential tariff and conditions for the following insurances: motor vehicles for passenger cars and motorcycles of the product line STRADA: 10% discount; household insurance (personal liability and household contents) of the product line BOX: 10% discount; travel insurance: 10% discount; buildings: 10% discount; legal protection: 10% discount. AXA is number one on the Swiss insurance market. The product range encompasses all non-life insurance such as insurance of persons, property, civil liability, vehicles, credit and travel, as well as innovative and comprehensive solutions in the field of occupational benefits insurance for individuals and businesses. Finally, the affiliate AXA-ARAG (legal expenses insurance) completes the offer. Armed with your Staff Association CERN card, you can always get the off...

  9. Swan: A tool for porting CUDA programs to OpenCL

    Science.gov (United States)

    Harvey, M. J.; De Fabritiis, G.

    2011-04-01

    The use of modern, high-performance graphical processing units (GPUs) for acceleration of scientific computation has been widely reported. The majority of this work has used the CUDA programming model supported exclusively by GPUs manufactured by NVIDIA. An industry standardisation effort has recently produced the OpenCL specification for GPU programming. This offers the benefits of hardware-independence and reduced dependence on proprietary tool-chains. Here we describe a source-to-source translation tool, "Swan", for facilitating the conversion of an existing CUDA code to use the OpenCL model, as a means to aid programmers experienced with CUDA in evaluating OpenCL and alternative hardware. While the performance of equivalent OpenCL and CUDA code on fixed hardware should be comparable, we find that a real-world CUDA application ported to OpenCL exhibits an overall 50% increase in runtime, a reduction in performance attributable to the immaturity of contemporary compilers. The ported application is shown to have platform independence, running on both NVIDIA and AMD GPUs without modification. We conclude that OpenCL is a viable platform for developing portable GPU applications but that the more mature CUDA tools continue to provide best performance. Program summary. Program title: Swan Catalogue identifier: AEIH_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIH_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU Public License version 2 No. of lines in distributed program, including test data, etc.: 17 736 No. of bytes in distributed program, including test data, etc.: 131 177 Distribution format: tar.gz Programming language: C Computer: PC Operating system: Linux RAM: 256 Mbytes Classification: 6.5 External routines: NVIDIA CUDA, OpenCL Nature of problem: Graphical Processing Units (GPUs) from NVIDIA are preferentially programmed with the proprietary CUDA programming toolkit. An
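
    A large part of such a source-to-source translation is systematic renaming between the CUDA and OpenCL dialects. A toy Python sketch of that renaming step, using standard CUDA-to-OpenCL correspondences (Swan itself does considerably more, e.g. converting host API calls):

        import re

        # A few standard CUDA -> OpenCL correspondences
        RENAMES = {
            r"__global__":        "__kernel",
            r"__shared__":        "__local",
            r"__syncthreads\(\)": "barrier(CLK_LOCAL_MEM_FENCE)",
            r"threadIdx\.x":      "get_local_id(0)",
            r"blockIdx\.x":       "get_group_id(0)",
            r"blockDim\.x":       "get_local_size(0)",
            r"gridDim\.x":        "get_num_groups(0)",
        }

        def cuda_to_opencl(src):
            """Naively rewrite CUDA kernel source into OpenCL C."""
            for pattern, repl in RENAMES.items():
                src = re.sub(pattern, repl, src)
            return src

        print(cuda_to_opencl("__global__ void f(float *x) { x[threadIdx.x] += 1.f; }"))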

  10. Offers

    CERN Multimedia

    Staff Association

    2012-01-01

    FUTUREKIDS proposes the following offer: 15% discount for Staff Association members who enroll their children in summer FUTUREKIDS activities. Extracurricular activities for your children: the FUTUREKIDS Geneva Learning Center is open 6 days a week and offers a selection of after-school extracurricular activities for children and teenagers (ages 5 to 16). In addition to teaching in its Learning Centers, Futurekids collaborates with many private schools in Suisse Romande (Florimont, Moser, Champittet, Ecole Nouvelle, etc.) and with the Département de l'Instruction Publique (DIP) Genève. Courses and camps are usually in French but English groups can be set up on demand. FUTUREKIDS Computer Camps (during school holidays) are a way of having a great time during vacations while learning something useful, possibly discovering a new hobby or even, why not, a future profession. Our computer camps are at the forefront of technology. Themes are diverse and suit all ...

  11. Association of provider recommendation and offer and influenza vaccination among adults aged ≥18 years - United States.

    Science.gov (United States)

    Lu, Peng-Jun; Srivastav, Anup; Amaya, Ashley; Dever, Jill A; Roycroft, Jessica; Kurtz, Marshica Stanley; O'Halloran, Alissa; Williams, Walter W

    2018-02-01

    Influenza vaccination has been recommended for all persons aged ≥6 months since 2010. Data from the 2016 National Internet Flu Survey were analyzed to assess provider vaccination recommendations and early influenza vaccination during the 2016-17 season among adults aged ≥18 years. Predictive marginals from a multivariable logistic regression model were used to identify factors independently associated with early vaccine uptake by provider vaccination recommendation status. Overall, 24.0% visited a provider who both recommended and offered influenza vaccination, 9.0% visited a provider who only recommended but did not offer, 25.1% visited a provider who neither recommended nor offered, and 41.9% did not visit a doctor from July 1 through date of interview. Adults who reported that a provider both recommended and offered vaccine had significantly higher vaccination coverage (66.6%) compared with those who reported that a provider only recommended but did not offer (48.4%), those who neither received recommendation nor offer (32.0%), and those who did not visit a doctor during the vaccination period (28.8%). Results of multivariable logistic regression indicated that having received a provider recommendation, with or without an offer for vaccination, was significantly associated with higher vaccination coverage after controlling for demographic and access-to-care factors. Provider recommendation was significantly associated with influenza vaccination. However, overall, 67.0% of adults did not visit a doctor during the vaccination period or did visit a doctor but did not receive a provider recommendation. Evidence-based strategies such as client reminder/recall, standing orders, provider reminders, or health systems interventions in combination should be undertaken to improve provider recommendation and influenza vaccination coverage. Other factors significantly associated with a higher level of influenza vaccination included age ≥50 years, being Hispanic

  12. THE COMPARATIVE ANALYSIS REGARDING THE SERVICES OFFERED BY THE INTERNATIONAL HOTEL CHAINS FROM ROMANIA

    OpenAIRE

    CRISTINA FLESERIU

    2010-01-01

    The hotel services are a part of the tourism services, or can be defined as an independent offer. In the second place, the most important thing is the need for accommodation (probably also food and other additional services – conference room, recreational facilities, etc.) of those who are traveling for business or for other personal reasons. Accordingly, in Romania, all hospitality business units must offer a range of additional services, with or without pay. Due to the fact tha...

  13. Prevalence, serotype diversity, and antimicrobial resistance of Salmonella in imported shipments of spice offered for entry to the United States, FY2007-FY2009.

    Science.gov (United States)

    Van Doren, Jane M; Kleinmeier, Daria; Hammack, Thomas S; Westerman, Ann

    2013-06-01

    In response to increased concerns about spice safety, the U.S. FDA initiated research to characterize the prevalence of Salmonella in imported spices. Shipments of imported spices offered for entry to the United States were sampled during the fiscal years 2007-2009. The mean shipment prevalence for Salmonella was 0.066 (95% CI 0.057-0.076). A wide diversity of Salmonella serotypes was isolated from spices; no single serotype constituted more than 7% of the isolates. A small percentage of spice shipments (8.3%) were contaminated with antimicrobial-resistant Salmonella strains. Trends in shipment prevalence for Salmonella associated with spice properties, extent of processing, and export country were examined. A larger proportion of shipments of spices derived from the fruit/seeds or leaves of plants were contaminated than of those derived from the bark/flower of spice plants. Salmonella prevalence was larger for shipments of ground/cracked capsicum and coriander than for shipments of their whole spice counterparts. No difference in prevalence was observed between shipments of spice blends and non-blended spices. Some shipments reported to have been subjected to a pathogen reduction treatment prior to being offered for U.S. entry were found contaminated. Statistical differences in Salmonella shipment prevalence were also identified on the basis of export country. Published by Elsevier Ltd.

  14. Offers

    CERN Multimedia

    Staff Association

    2015-01-01

    New season 2015-2016. The new season was revealed in May, and was warmly welcomed by the press, which is especially enthusiastic about the exceptional arrival of Fanny Ardant in September in the framework of the show Cassandre. Discover the programme 2015-2016. The theatre La Comédie proposes different offers to our members. Benefit from a reduction of 20% on a full-price ticket during the whole season: tickets from 23 CHF to 38 CHF instead of 30 CHF to 50 CHF, depending on the show. Buy two season tickets for the price of one (offers valid upon availability, and until 30 September 2015): 2 Cards Libertà for 240 CHF instead of 480 CHF – cruise freely through the season with 8 performances of your choice; these cards are transferable, and can be shared with one or more accompanying persons. 2 Abo Piccolo for 120 CHF instead of 240 CHF – let yourself be surprised by a theatre performance with our discovery season tickets, which include 4 flagship performances of the season. ...

  15. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    SPECIAL OFFER FOR OUR MEMBERS Prices Spring and Summer 2013 Day ticket: same price weekends, public holidays and weekdays: Children from 5 to 15 years old: 30 CHF instead of 39 CHF Adults from 16 years old: 36 CHF instead of 49 CHF Bonus! Free for children under 5 Tickets available at the Staff Association Secretariat.

  16. Offers

    CERN Multimedia

    Association du personnel

    2013-01-01

    SPECIAL OFFER FOR OUR MEMBERS Prices Spring and Summer 2013 Day ticket: same price weekends, public holidays and weekdays: – Children from 5 to 15 years old: 30 CHF instead of 39 CHF – Adults from 16 years old: 36 CHF instead of 49 CHF – Bonus! Free for children under 5 Tickets available at the Staff Association Secretariat.

  17. Fast Shepard interpolation on graphics processing units: potential energy surfaces and dynamics for H + CH4 → H2 + CH3.

    Science.gov (United States)

    Welsch, Ralph; Manthe, Uwe

    2013-04-28

    A strategy for the fast evaluation of Shepard interpolated potential energy surfaces (PESs) utilizing graphics processing units (GPUs) is presented. Speedups of several orders of magnitude are gained for the title reaction on the ZFWCZ PES [Y. Zhou, B. Fu, C. Wang, M. A. Collins, and D. H. Zhang, J. Chem. Phys. 134, 064323 (2011)]. Thermal rate constants are calculated employing the quantum transition state concept and the multi-layer multi-configurational time-dependent Hartree approach. Results for the ZFWCZ PES are compared to rate constants obtained for other ab initio PESs and problems are discussed. A revised PES is presented. Thermal rate constants obtained for the revised PES indicate that an accurate description of the anharmonicity around the transition state is crucial.
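
    Shepard interpolation is a distance-weighted average over reference geometries, which is why it vectorizes well on GPUs: each query point is independent. The PES work uses the modified Shepard form with local Taylor expansions; the basic inverse-distance weighting at its heart looks like this (illustrative sketch, names are not from the paper):

        import numpy as np

        def shepard(x, ref_pts, ref_vals, p=2.0, eps=1e-12):
            """Inverse-distance-weighted (Shepard) interpolation at query point x."""
            d = np.linalg.norm(ref_pts - x, axis=1)
            if d.min() < eps:                  # query coincides with a data point
                return float(ref_vals[d.argmin()])
            w = d ** (-p)                      # weights decay with distance
            return float(w @ ref_vals / w.sum())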

  18. Offers

    CERN Multimedia

    Staff Association

    2012-01-01

    SPECIAL OFFER FOR OUR MEMBERS Single tariff Adult/Child. Tickets “Zone terrestre”: 20 euros instead of 25 euros. Access to Aqualibi: 5 euros instead of 8 euros on presentation of your SA member ticket. Free for children under 3, with limited access to the attractions. More information on our website: http://association.web.cern.ch/association/en/OtherActivities/Walibi.html

  19. 12 CFR 563g.21 - Filing of copies of offering circulars in certain exempt offerings.

    Science.gov (United States)

    2010-01-01

    ... 12 Banks and Banking 5 2010-01-01 2010-01-01 false Filing of copies of offering circulars in certain exempt offerings. 563g.21 Section 563g.21 Banks and Banking OFFICE OF THRIFT SUPERVISION, DEPARTMENT OF THE TREASURY SECURITIES OFFERINGS § 563g.21 Filing of copies of offering circulars in certain...

  20. Offers

    CERN Multimedia

    Staff Association

    2012-01-01

    SPECIAL OFFER FOR OUR MEMBERS Prices Spring and Summer 2012 Half-day ticket: 5 hours, same price weekends, public holidays and weekdays. Children from 5 to 15 years old: 26 CHF instead of 35 CHF Adults from 16 years old: 32 CHF instead of 43 CHF Bonus! Free for children under 5. Aquaparc Les Caraïbes sur Léman 1807 Le Bouveret (VS)

  1. ELASTIC CLOUD COMPUTING ARCHITECTURE AND SYSTEM FOR HETEROGENEOUS SPATIOTEMPORAL COMPUTING

    Directory of Open Access Journals (Sweden)

    X. Shi

    2017-10-01

    Spatiotemporal computation implements a variety of different algorithms. When big data are involved, a desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may behave differently on different computing infrastructures and platforms. Some are perfect for implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful for certain kinds of spatiotemporal computation. The same situation arises in utilizing a cluster of Intel's many-integrated-core (MIC) processors, or Xeon Phi, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy efficiency requirement in general computation, a Field Programmable Gate Array (FPGA) may be a better solution for energy efficiency when the performance of the computation can be similar to or better than GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates all of GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.

  2. Elastic Cloud Computing Architecture and System for Heterogeneous Spatiotemporal Computing

    Science.gov (United States)

    Shi, X.

    2017-10-01

    Spatiotemporal computation implements a variety of different algorithms. When big data are involved, a desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may behave differently on different computing infrastructures and platforms. Some are perfect for implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful for certain kinds of spatiotemporal computation. The same situation arises in utilizing a cluster of Intel's many-integrated-core (MIC) processors, or Xeon Phi, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy efficiency requirement in general computation, a Field Programmable Gate Array (FPGA) may be a better solution for energy efficiency when the performance of the computation can be similar to or better than GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates all of GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.

  3. Graphics Processors in HEP Low-Level Trigger Systems

    International Nuclear Information System (INIS)

    Ammendola, Roberto; Biagioni, Andrea; Chiozzi, Stefano; Ramusino, Angelo Cotta; Cretaro, Paolo; Lorenzo, Stefano Di; Fantechi, Riccardo; Fiorini, Massimiliano; Frezza, Ottorino; Lamanna, Gianluca; Cicero, Francesca Lo; Lonardo, Alessandro; Martinelli, Michele; Neri, Ilaria; Paolucci, Pier Stanislao; Pastorelli, Elena; Piandani, Roberto; Pontisso, Luca; Rossetti, Davide; Simula, Francesco; Sozzi, Marco; Vicini, Piero

    2016-01-01

    Usage of Graphics Processing Units (GPUs) in so-called general-purpose computing is emerging as an effective approach in several fields of science, although so far applications have typically employed GPUs for offline computations. Taking into account the steady performance increase of GPU architectures in terms of computing power and I/O capacity, the real-time applications of these devices can thrive in high-energy physics data acquisition and trigger systems. We will examine the use of online parallel computing on GPUs for the synchronous low-level trigger, focusing on tests performed on the trigger system of the CERN NA62 experiment. To successfully integrate GPUs in such an online environment, the latencies of all components need analysing, networking being the most critical. To keep it under control, we envisioned NaNet, an FPGA-based PCIe Network Interface Card (NIC) enabling GPUDirect connection. Furthermore, it is assessed how specific trigger algorithms can be parallelized and thus benefit from a GPU implementation, in terms of increased execution speed. Such improvements are particularly relevant for the foreseen Large Hadron Collider (LHC) luminosity upgrade, where highly selective algorithms will be essential to maintain sustainable trigger rates with very high pileup

  4. GPUs for fast pattern matching in the RICH of the NA62 experiment

    International Nuclear Information System (INIS)

    Lamanna, Gianluca; Collazuol, Gianmaria; Sozzi, Marco

    2011-01-01

    In rare-decay experiments an effective online selection is a fundamental part of the data acquisition system (DAQ), in order to reduce both the quantity of data written on tape and the bandwidth requirements of the DAQ system. A multilevel architecture is commonly used to achieve a higher reduction factor, exploiting dedicated custom hardware and flexible software in standard computers. In this paper we discuss the possibility of using commercial video card processors (GPUs) to build a fast and effective trigger system, at both the hardware and software level. The computing power of GPUs allows designing a real-time system in which trigger decisions are taken directly in the video processor with a defined maximum latency. This allows building the lowest trigger levels from standard off-the-shelf PCs with CPU and GPU (instead of the commonly adopted solutions based on custom electronics with FPGAs or ASICs) with enhanced and high-performance computation capabilities, resulting in high rejection power, high efficiency and simpler low-level triggers. The ongoing work presented here shows the results achieved for fast pattern matching in the RICH detector of the NA62 experiment at CERN, which aims at measuring the branching ratio of the ultra-rare decay K+ → π+νν̄; this is considered as a use case, although the versatility and customizability of this approach easily allow exporting the concept to different contexts. In particular the application is related to particle identification in the RICH detector of the NA62 experiment, where the rate of events to be analyzed will be around 10 MHz. The results obtained in lab tests are very encouraging for moving towards a working prototype. Due to the use of off-the-shelf technology, in continuous development for other purposes (video games, image editing, ...), the architecture described can easily be exported to other experiments for building powerful, flexible and fully customizable trigger systems.

  5. GPUs for fast pattern matching in the RICH of the NA62 experiment

    Energy Technology Data Exchange (ETDEWEB)

    Lamanna, Gianluca, E-mail: gianluca.lamanna@cern.c [CERN, 1211 Geneve 23 (Switzerland); Collazuol, Gianmaria, E-mail: gianmaria.collazuol@cern.c [INFN Pisa, Largo Pontecorvo 3, 56127 Pisa (Italy); Sozzi, Marco, E-mail: marco.sozzi@cern.c [University and INFN Pisa, Largo Pontecorvo 3, 56127 Pisa (Italy)

    2011-05-21

    In rare-decay experiments an effective online selection is a fundamental part of the data acquisition system (DAQ), in order to reduce both the quantity of data written on tape and the bandwidth requirements of the DAQ system. A multilevel architecture is commonly used to achieve a higher reduction factor, exploiting dedicated custom hardware and flexible software in standard computers. In this paper we discuss the possibility of using commercial video card processors (GPUs) to build a fast and effective trigger system, at both the hardware and software level. The computing power of GPUs allows designing a real-time system in which trigger decisions are taken directly in the video processor with a defined maximum latency. This allows building the lowest trigger levels from standard off-the-shelf PCs with CPU and GPU (instead of the commonly adopted solutions based on custom electronics with FPGAs or ASICs) with enhanced and high-performance computation capabilities, resulting in high rejection power, high efficiency and simpler low-level triggers. The ongoing work presented here shows the results achieved for fast pattern matching in the RICH detector of the NA62 experiment at CERN, which aims at measuring the branching ratio of the ultra-rare decay K+ → π+νν̄; this is considered as a use case, although the versatility and customizability of this approach easily allow exporting the concept to different contexts. In particular the application is related to particle identification in the RICH detector of the NA62 experiment, where the rate of events to be analyzed will be around 10 MHz. The results obtained in lab tests are very encouraging for moving towards a working prototype. Due to the use of off-the-shelf technology, in continuous development for other purposes (video games, image editing, ...), the architecture described can easily be exported to other experiments for building powerful, flexible and fully customizable trigger systems.

  6. GRAPHICS PROCESSING UNITS: MORE THAN THE PATHWAY TO REALISTIC VIDEO-GAMES

    Directory of Open Access Journals (Sweden)

    CARLOS TRUJILLO

    2011-01-01

    The large video game market has driven rapid progress in hardware and software aimed at achieving ever more realistic gaming environments. Among these developments are graphics processing units (GPUs), whose purpose is to relieve the central processing unit (CPU) of the elaborate computations that give video games their "life". To achieve this, GPUs are equipped with multiple processing cores operating in parallel, which allows them to be used for tasks far more diverse than video game development. This article presents a brief description of the characteristics of the Compute Unified Device Architecture (CUDA™), a parallel computing architecture for GPUs. An application of this architecture to the numerical reconstruction of holograms is presented, for which a speedup of 11X is reported with respect to the performance achieved on a CPU.
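
    Numerical hologram reconstruction is dominated by FFT-based field propagation, which maps naturally onto the many cores of a GPU. As a point of reference, here is a CPU sketch of one common propagation scheme, the angular spectrum method (the article does not specify its exact reconstruction algorithm, so this is an assumed stand-in):

        import numpy as np

        def angular_spectrum(field, wavelength, dx, z):
            """Propagate a square complex field by distance z (angular spectrum)."""
            n = field.shape[0]
            fx = np.fft.fftfreq(n, d=dx)                     # spatial frequencies
            fx2 = fx[:, None] ** 2 + fx[None, :] ** 2
            arg = (1.0 / wavelength ** 2 - fx2).astype(complex)
            H = np.exp(2j * np.pi * z * np.sqrt(arg))        # transfer function
            return np.fft.ifft2(np.fft.fft2(field) * H)

    Every frequency-domain multiply and every FFT butterfly is independent, which is exactly the structure CUDA exploits to reach speedups like the reported 11X.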

  7. Offers INTERSOCCER

    CERN Multimedia

    Staff Association

    2014-01-01

      Summer Football camps   New offer to the members of the Staff Association – INTERSOCCER: 12% discount on summer football camps and courses for children (bilingual) so do not hesitate anymore!    

  8. Offer

    CERN Multimedia

    CARLSON WAGONLIT TRAVEL

    2011-01-01

    Special offer   From 14th to 28th February 2011: no CWT service fee! For any new reservation of a holiday package (flight + hotel/apartment) from a catalogue “summer 2011”. For any additional information our staff is at your disposal from Monday to Friday, from 8h30 to 16h30. Phone number 72763 or 72797. Carlson Wagonlit Travel, Agence du CERN

  9. GPU Based Software Correlators - Perspectives for VLBI2010

    Science.gov (United States)

    Hobiger, Thomas; Kimura, Moritaka; Takefuji, Kazuhiro; Oyama, Tomoaki; Koyama, Yasuhiro; Kondo, Tetsuro; Gotoh, Tadahiro; Amagai, Jun

    2010-01-01

    Caused by historical separation and driven by the requirements of the PC gaming industry, Graphics Processing Units (GPUs) have evolved into massively parallel processing systems that have entered the area of non-graphics-related applications. Although a single processing core on the GPU is much slower and provides less functionality than its counterpart on the CPU, the huge number of these small processing entities outperforms the classical processors when the application can be parallelized. Thus, in recent years various radio astronomical projects have started to make use of this technology, either to realize the correlator on this platform or to establish the post-processing pipeline with GPUs. Therefore, the feasibility of GPUs as a choice for a VLBI correlator is being investigated, including pros and cons of this technology. Additionally, a GPU-based software correlator is reviewed with respect to energy consumption per GFlop/s and cost per GFlop/s.
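
    An FX software correlator channelizes each station's voltage stream with an FFT (the 'F' step) and then cross-multiplies and accumulates the spectra (the 'X' step); both steps are independent per channel and per segment, which is what a GPU exploits. A minimal single-baseline sketch (illustrative, not the authors' code):

        import numpy as np

        def fx_correlate(x, y, nchan):
            """Averaged cross-spectrum of two voltage streams (one baseline)."""
            nseg = min(len(x), len(y)) // nchan
            X = np.fft.fft(x[:nseg * nchan].reshape(nseg, nchan), axis=1)  # 'F'
            Y = np.fft.fft(y[:nseg * nchan].reshape(nseg, nchan), axis=1)
            return (X * np.conj(Y)).mean(axis=0)                           # 'X'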

  10. Offer

    CERN Multimedia

    Staff Association

    2010-01-01

      Special offer for members of the Staff Association and their families 10% reduction on all products in the SEPHORA shop (sells perfume, beauty products etc.) in Val Thoiry ALL YEAR ROUND. Plus 20% reduction during their “vente privée”* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * Next “vente privée” from 22th to 29th November 2010

  11. Offer

    CERN Multimedia

    Staff Association

    2011-01-01

      Special offer for members of the Staff Association and their families 10% reduction on all products in the SEPHORA shop (sells perfume, beauty products etc.) in Val Thoiry ALL YEAR ROUND. Plus 20% reduction during their “vente privée”* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * Next “vente privée” from 25th to 27th March 2011  

  12. Exploiting current-generation graphics hardware for synthetic-scene generation

    Science.gov (United States)

    Tanner, Michael A.; Keen, Wayne A.

    2010-04-01

    Increasing seeker frame rate and pixel count, as well as the demand for higher levels of scene fidelity, have driven scene generation software for hardware-in-the-loop (HWIL) and software-in-the-loop (SWIL) testing to higher levels of parallelization. Because modern PC graphics cards provide multiple computational cores (240 shader cores for current NVIDIA GeForce and Quadro cards), implementation of phenomenology codes on graphics processing units (GPUs) offers significant potential for simultaneous enhancement of simulation frame rate and fidelity. Taking advantage of this potential requires algorithm implementations structured to minimize data transfers between the central processing unit (CPU) and the GPU. In this paper, preliminary methodologies developed at the Kinetic Hardware In-The-Loop Simulator (KHILS) are presented, including various language trade-offs between conventional shader programming, the Compute Unified Device Architecture (CUDA) and the Open Computing Language (OpenCL), covering performance trades and possible pathways for future tool development.

  13. Real-time track-less Cherenkov ring fitting trigger system based on Graphics Processing Units

    Science.gov (United States)

    Ammendola, R.; Biagioni, A.; Chiozzi, S.; Cretaro, P.; Cotta Ramusino, A.; Di Lorenzo, S.; Fantechi, R.; Fiorini, M.; Frezza, O.; Gianoli, A.; Lamanna, G.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Neri, I.; Paolucci, P. S.; Pastorelli, E.; Piandani, R.; Piccini, M.; Pontisso, L.; Rossetti, D.; Simula, F.; Sozzi, M.; Vicini, P.

    2017-12-01

    The parallel computing power of commercial Graphics Processing Units (GPUs) is exploited to perform real-time ring fitting at the lowest trigger level using information coming from the Ring Imaging Cherenkov (RICH) detector of the NA62 experiment at CERN. To this purpose, direct GPU communication with a custom FPGA-based board has been used to reduce the data transmission latency. The GPU-based trigger system is currently integrated in the experimental setup of the RICH detector of the NA62 experiment, in order to reconstruct ring-shaped hit patterns. The ring-fitting algorithm running on GPU is fed with raw RICH data only, with no information coming from other detectors, and is able to provide more complex trigger primitives with respect to the simple photodetector hit multiplicity, resulting in a higher selection efficiency. The performance of the system for multi-ring Cherenkov online reconstruction obtained during the NA62 physics run is presented.
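
    Track-less ring reconstruction amounts to fitting circles to photodetector hit positions, and every ring candidate reduces to a small independent linear solve, which suits a one-thread-group-per-candidate GPU layout. As an illustration of the idea (not the tuned NA62 algorithm), the classic algebraic least-squares circle fit:

        import numpy as np

        def fit_ring(x, y):
            """Algebraic least-squares circle fit (Kasa): returns centre, radius."""
            # Model x^2 + y^2 = 2a x + 2b y + c, which is linear in (a, b, c)
            A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
            rhs = x ** 2 + y ** 2
            (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
            return (a, b), np.sqrt(c + a ** 2 + b ** 2)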

  14. 48 CFR 12.205 - Offers.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 1 2010-10-01 2010-10-01 false Offers. 12.205 Section 12... ACQUISITION OF COMMERCIAL ITEMS Special Requirements for the Acquisition of Commercial Items 12.205 Offers. (a) Where technical information is necessary for evaluation of offers, agencies should, as part of market...

  15. 3D Tomographic Image Reconstruction using CUDA C

    International Nuclear Information System (INIS)

    Dominguez, J. S.; Assis, J. T.; Oliveira, L. F. de

    2011-01-01

    This paper presents the study and implementation of software for the three-dimensional reconstruction of images obtained with a tomographic system, using the capabilities of Graphics Processing Units (GPUs). The reconstruction by the filtered back-projection method was developed in CUDA C, for maximum utilization of the processing capabilities of GPUs in solving computational problems that have a large computational cost and are highly parallelizable. The potential of GPUs is discussed and their advantages for solving this kind of problem are shown. The results in terms of runtime are compared with non-parallelized implementations and show a great reduction in processing time. (Author)
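
    Filtered back-projection applies a ramp filter to each projection and then smears the filtered values back over the image grid; both steps are independent per angle and per pixel, which is why the method parallelizes so well. A compact CPU reference sketch for parallel-beam geometry (illustrative; the paper's CUDA C code is not reproduced here):

        import numpy as np

        def fbp(sinogram, thetas):
            """Parallel-beam filtered back-projection; sinogram: (n_angles, n_det)."""
            n_ang, n_det = sinogram.shape
            ramp = np.abs(np.fft.fftfreq(n_det))                   # ramp filter
            filtered = np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1).real
            axis = np.arange(n_det) - n_det / 2.0
            X, Y = np.meshgrid(axis, axis)
            image = np.zeros((n_det, n_det))
            for theta, proj in zip(thetas, filtered):
                # detector coordinate of every pixel for this view
                t = X * np.cos(theta) + Y * np.sin(theta) + n_det / 2.0
                image += np.interp(t.ravel(), np.arange(n_det), proj).reshape(X.shape)
            return image * np.pi / (2 * n_ang)

    On a GPU the per-view loop body becomes a kernel with one thread per output pixel, which is the standard CUDA decomposition for back-projection.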

  16. Offers

    CERN Document Server

    Staff Association

    2011-01-01

    Banque cantonale de Genève (BCGE) The BCGE Business partner programme devised for members of the CERN Staff Association offers personalized banking solutions with preferential conditions. The advantages are linked to salary accounts (free account keeping, internet banking, free Maestro and credit cards, etc.), mortgage lending, retirement planning, investment, credit, etc. The details of the programme and the preferential conditions are available on our website: http://association.web.cern.ch/association/en/OtherActivities/BCGE.html.  

  17. Decoupled Vector-Fetch Architecture with a Scalarizing Compiler

    OpenAIRE

    Lee, Yunsup

    2016-01-01

    As we approach the end of conventional technology scaling, computer architects are forced to incorporate specialized and heterogeneous accelerators into general-purpose processors for greater energy efficiency. Among the prominent accelerators that have recently become more popular are data-parallel processing units, such as classic vector units, SIMD units, and graphics processing units (GPUs). Surveying a wide range of data-parallel architectures and their parallel programming models and ...

  18. Receipt of a pediatric liver offer as the first offer reduces waitlist mortality for adult women.

    Science.gov (United States)

    Ge, Jin; Gilroy, Richard; Lai, Jennifer C

    2018-03-31

    In liver transplantation, adults with small stature have a greater susceptibility to waitlist mortality. This may explain the persistent waitlist mortality disparity that exists for women. We hypothesized that women who receive early offers of pediatric donor livers have improved waitlist survival, and that preferentially offering these organs to women mitigates this sex-based disparity. We analyzed donor liver offers from 2010 to 2014. Adult candidates who received a first offer that ranked within the first three match run positions from the donors' perspective were classified based on gender and whether they received a pediatric versus adult offer. We used competing risks regression to associate first offer type and waitlist mortality. 8,101 waitlist candidates received a first offer that was ranked within the first three match run positions: 5.6% (293/5,202) of men and 6.2% (179/2,899) of women received a pediatric donor liver as their first offer. In multivariable analyses, compared to adult-first men, adult-first women (sHR 1.33, 95% CI 1.17-1.51, p offer had a lower risk of waitlist mortality compared to those who received adult offers. Our data provide a simple approach to mitigating the increased waitlist mortality experienced by women by incorporating donor and recipient size, as variables, into organ allocation. This article is protected by copyright. All rights reserved. © 2018 by the American Association for the Study of Liver Diseases.

  19. 77 FR 66844 - Federal Acquisition Regulation; Submission for OMB Review; Evaluation of Export Offers

    Science.gov (United States)

    2012-11-07

    ... Management and Budget (OMB) a request to review and approve an extension of a previously approved information... posted without change to http://www.regulations.gov , including any personal and/or business confidential... United States (CONUS) ports and offers are solicited on a free onboard (f.o.b.) origin or f.o.b...

  20. 48 CFR 225.503 - Group offers.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 3 2010-10-01 2010-10-01 false Group offers. 225.503... OF DEFENSE SOCIOECONOMIC PROGRAMS FOREIGN ACQUISITION Evaluating Foreign Offers-Supply Contracts 225.503 Group offers. Evaluate group offers in accordance with FAR 25.503, but apply the evaluation...

  1. A GPU-based incompressible Navier-Stokes solver on moving overset grids

    Science.gov (United States)

    Chandar, Dominic D. J.; Sitaraman, Jayanarayanan; Mavriplis, Dimitri J.

    2013-07-01

    In pursuit of obtaining high fidelity solutions to the fluid flow equations in a short span of time, graphics processing units (GPUs) which were originally intended for gaming applications are currently being used to accelerate computational fluid dynamics (CFD) codes. With a high peak throughput of about 1 TFLOPS on a PC, GPUs seem to be favourable for many high-resolution computations. One such computation that involves a lot of number crunching is computing time accurate flow solutions past moving bodies. The aim of the present paper is thus to discuss the development of a flow solver on unstructured and overset grids and its implementation on GPUs. In its present form, the flow solver solves the incompressible fluid flow equations on unstructured/hybrid/overset grids using a fully implicit projection method. The resulting discretised equations are solved using a matrix-free Krylov solver using several GPU kernels such as gradient, Laplacian and reduction. Some of the simple arithmetic vector calculations are implemented using the CU++ approach (an object-oriented framework for computational fluid dynamics applications using graphics processing units; Journal of Supercomputing, 2013, doi:10.1007/s11227-013-0985-9), where GPU kernels are automatically generated at compile time. Results are presented for two- and three-dimensional computations on static and moving grids.
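
    Matrix-free here means the Krylov solver never sees an assembled matrix: it only calls kernels that apply the operator to a vector, which is exactly what the gradient, Laplacian and reduction GPU kernels provide. A minimal SciPy sketch of the pattern on a 1-D Laplacian (an assumed stand-in, not the paper's solver):

        import numpy as np
        from scipy.sparse.linalg import LinearOperator, cg

        n = 200

        def apply_laplacian(u):
            """Matrix-free 1-D Laplacian (Dirichlet boundaries): one 'kernel' call."""
            r = 2.0 * u
            r[:-1] -= u[1:]
            r[1:] -= u[:-1]
            return r

        A = LinearOperator((n, n), matvec=apply_laplacian, dtype=float)
        b = np.ones(n)                     # right-hand side, e.g. a divergence source
        x, info = cg(A, b)                 # the Krylov solver sees only matvec calls
        assert info == 0                   # converged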

  2. ISMB 2016 offers outstanding science, networking, and celebration.

    Science.gov (United States)

    Fogg, Christiana

    2016-01-01

    The annual international conference on Intelligent Systems for Molecular Biology (ISMB) is the major meeting of the International Society for Computational Biology (ISCB). Over the past 23 years the ISMB conference has grown to become the world's largest bioinformatics/computational biology conference. ISMB 2016 will be the year's most important computational biology event globally. The conference provides a multidisciplinary forum for disseminating the latest developments in bioinformatics/computational biology. ISMB brings together scientists from computer science, molecular biology, mathematics, statistics and related fields. Its principal focus is on the development and application of advanced computational methods for biological problems. ISMB 2016 offers the strongest scientific program and the broadest scope of any international bioinformatics/computational biology conference. Building on past successes, the conference is designed to cater to a variety of disciplines within the bioinformatics/computational biology community. ISMB 2016 takes place July 8 - 12 at the Swan and Dolphin Hotel in Orlando, Florida, United States. For two days preceding the conference, additional opportunities including Satellite Meetings, Student Council Symposium, and a selection of Special Interest Group Meetings and Applied Knowledge Exchange Sessions (AKES) are all offered to enable registered participants to learn more on the latest methods and tools within specialty research areas.

  3. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    Special offer for members of the Staff Association and their families 10 % reduction on all products in the SEPHORA shop (sells perfume, beauty products etc.) in Val Thoiry ALL YEAR ROUND. Plus 20 % reduction during their “vente privée”* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * Next “vente privée” from 11th to 23rd November 2013 Please contact the Staff Association Secretariat to get the discount voucher.  

  4. Offers

    CERN Multimedia

    Staff Association

    2014-01-01

    Special offer for members of the Staff Association and their families 10 % reduction on all products in the SEPHORA shop (sells perfume, beauty products etc.) in Val Thoiry ALL YEAR ROUND. Simply present your Staff Association membership card when you make your purchase. Plus 20 % reduction during their “vente privée”* three or four times a year. * Next “vente privée” from 24th September to 6th November 2014 Please contact the Staff Association Secretariat to get the discount voucher.  

  5. Offers

    CERN Multimedia

    Staff Association

    2012-01-01

    Special offer for members of the Staff Association and their families 10 % reduction on all products in the Sephora shop (sells perfume, beauty products etc.) in Val Thoiry all year round. Plus 20 % reduction during their “vente privée”* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * next “vente privée” from 21st November to 1st December 2012 Please contact the Staff Association Secretariat to get the discount voucher.

  6. Offers

    CERN Multimedia

    Staff Association

    2012-01-01

    Special offer for members of the Staff Association and their families 10% reduction on all products in the SEPHORA shop (sells perfume, beauty products etc.) in Val Thoiry ALL YEAR ROUND. Plus 20% reduction during their “vente privée”* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * Next “vente privée” from 21st to 26th May 2012 Please contact the Staff Association Secretariat to get the discount voucher  

  7. 48 CFR 570.306 - Evaluating offers.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 4 2010-10-01 2010-10-01 false Evaluating offers. 570.306... Real Property 570.306 Evaluating offers. (a) You must evaluate offers solely in accordance with the... solicitation. The file must include the basis for evaluation, an analysis of each offer, and a summary of...

  8. Algorithms for GPU-based molecular dynamics simulations of complex fluids: Applications to water, mixtures, and liquid crystals.

    Science.gov (United States)

    Kazachenko, Sergey; Giovinazzo, Mark; Hall, Kyle Wm; Cann, Natalie M

    2015-09-15

    A custom code for molecular dynamics simulations has been designed to run on CUDA-enabled NVIDIA graphics processing units (GPUs). The double-precision code simulates multicomponent fluids, with intramolecular and intermolecular forces, coarse-grained and atomistic models, holonomic constraints, Nosé-Hoover thermostats, and the generation of distribution functions. Algorithms to compute Lennard-Jones and Gay-Berne interactions, and the electrostatic force using Ewald summations, are discussed. A neighbor list is introduced to improve scaling with respect to system size. Three test systems are examined: SPC/E water; an n-hexane/2-propanol mixture; and a liquid crystal mesogen, 2-(4-butyloxyphenyl)-5-octyloxypyrimidine. Code performance is analyzed for each system. With one GPU, a 33-119 fold increase in performance is achieved compared with the serial code while the use of two GPUs leads to a 69-287 fold improvement and three GPUs yield a 101-377 fold speedup. © 2015 Wiley Periodicals, Inc.
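
    Among the interactions listed, the Lennard-Jones term is the simplest to sketch: U(r) = 4ε[(σ/r)^12 − (σ/r)^6], summed over pairs (over a neighbour list in the production code). A NumPy reference version of the all-pairs energy (illustrative; the custom code evaluates such terms in CUDA):

        import numpy as np

        def lj_energy(pos, eps=1.0, sigma=1.0):
            """Total Lennard-Jones energy of an (n, 3) configuration, all pairs."""
            diff = pos[:, None, :] - pos[None, :, :]
            r = np.linalg.norm(diff, axis=-1)
            iu = np.triu_indices(len(pos), k=1)    # each pair counted once
            sr6 = (sigma / r[iu]) ** 6
            return float(np.sum(4.0 * eps * (sr6 ** 2 - sr6)))

    The neighbour list mentioned in the abstract replaces the all-pairs enumeration with a short per-particle list of nearby partners, which is what restores near-linear scaling with system size.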

  9. A comparative study of history-based versus vectorized Monte Carlo methods in the GPU/CUDA environment for a simple neutron eigenvalue problem

    International Nuclear Information System (INIS)

    Liu, T.; Du, X.; Ji, W.; Xu, G.; Brown, F.B.

    2013-01-01

    For nuclear reactor analysis such as the neutron eigenvalue calculations, the time consuming Monte Carlo (MC) simulations can be accelerated by using graphics processing units (GPUs). However, traditional MC methods are often history-based, and their performance on GPUs is affected significantly by the thread divergence problem. In this paper we describe the development of a newly designed event-based vectorized MC algorithm for solving the neutron eigenvalue problem. The code was implemented using NVIDIA's Compute Unified Device Architecture (CUDA), and tested on a NVIDIA Tesla M2090 GPU card. We found that although the vectorized MC algorithm greatly reduces the occurrence of thread divergence thus enhancing the warp execution efficiency, the overall simulation speed is roughly ten times slower than the history-based MC code on GPUs. Profiling results suggest that the slow speed is probably due to the memory access latency caused by the large amount of global memory transactions. Possible solutions to improve the code efficiency are discussed. (authors)
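
    The history-based/event-based contrast is easiest to see in code: an event-based code advances a whole batch of particles through the same event type at once, so threads in a warp stay in step. A toy vectorized random walk in a one-group infinite medium (a schematic illustration, not the authors' CUDA code):

        import numpy as np

        rng = np.random.default_rng(0)
        sigma_t, absorb_prob, n = 1.0, 0.3, 100_000   # total xs, absorption prob.

        alive = np.ones(n, dtype=bool)
        path, collisions = np.zeros(n), np.zeros(n)
        while alive.any():
            # Event 1: all live neutrons sample a flight distance (one kernel)
            path[alive] += rng.exponential(1.0 / sigma_t, size=alive.sum())
            # Event 2: all live neutrons resolve their collision (one kernel)
            collisions[alive] += 1
            absorbed = rng.random(alive.sum()) < absorb_prob
            idx = np.flatnonzero(alive)
            alive[idx[absorbed]] = False       # absorbed histories leave the batch

        print(collisions.mean())               # ~ 1/absorb_prob collisions/history

    The paper's finding is that removing divergence this way is not sufficient on its own: the gather/scatter indexing it requires generates the global-memory traffic blamed for the slowdown.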

  10. GPU-accelerated 3-D model-based tracking

    International Nuclear Information System (INIS)

    Brown, J Anthony; Capson, David W

    2010-01-01

    Model-based approaches to tracking the pose of a 3-D object in video are effective but computationally demanding. While statistical estimation techniques, such as the particle filter, are often employed to minimize the search space, real-time performance remains unachievable on current generation CPUs. Recent advances in graphics processing units (GPUs) have brought massively parallel computational power to the desktop environment and powerful developer tools, such as NVIDIA Compute Unified Device Architecture (CUDA), have provided programmers with a mechanism to exploit it. NVIDIA GPUs' single-instruction multiple-thread (SIMT) programming model is well-suited to many computer vision tasks, particularly model-based tracking, which requires several hundred 3-D model poses to be dynamically configured, rendered, and evaluated against each frame in the video sequence. Using 6 degree-of-freedom (DOF) rigid hand tracking as an example application, this work harnesses consumer-grade GPUs to achieve real-time, 3-D model-based, markerless object tracking in monocular video.
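
    The SIMT fit comes from the particle filter itself: each particle is a candidate pose that is independently perturbed, rendered and scored. A generic sampling-importance-resampling step in NumPy, with the render-and-compare likelihood passed in as a function (hypothetical names; the paper scores poses by rendering the 3-D hand model):

        import numpy as np

        rng = np.random.default_rng(1)

        def pf_step(particles, weights, likelihood, motion_noise=0.01):
            """One sampling-importance-resampling step over candidate poses."""
            n = len(particles)
            # Predict: diffuse every pose hypothesis with a simple motion model
            particles = particles + rng.normal(0.0, motion_noise, particles.shape)
            # Weight: in the GPU tracker this is the render-and-compare step,
            # evaluated for hundreds of poses in parallel
            weights = weights * likelihood(particles)
            weights = weights / weights.sum()
            # Resample: keep hypotheses in proportion to their weights
            idx = rng.choice(n, size=n, p=weights)
            return particles[idx], np.full(n, 1.0 / n)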

  11. A comparative study of history-based versus vectorized Monte Carlo methods in the GPU/CUDA environment for a simple neutron eigenvalue problem

    Science.gov (United States)

    Liu, Tianyu; Du, Xining; Ji, Wei; Xu, X. George; Brown, Forrest B.

    2014-06-01

    For nuclear reactor analysis such as the neutron eigenvalue calculations, the time consuming Monte Carlo (MC) simulations can be accelerated by using graphics processing units (GPUs). However, traditional MC methods are often history-based, and their performance on GPUs is affected significantly by the thread divergence problem. In this paper we describe the development of a newly designed event-based vectorized MC algorithm for solving the neutron eigenvalue problem. The code was implemented using NVIDIA's Compute Unified Device Architecture (CUDA), and tested on a NVIDIA Tesla M2090 GPU card. We found that although the vectorized MC algorithm greatly reduces the occurrence of thread divergence thus enhancing the warp execution efficiency, the overall simulation speed is roughly ten times slower than the history-based MC code on GPUs. Profiling results suggest that the slow speed is probably due to the memory access latency caused by the large amount of global memory transactions. Possible solutions to improve the code efficiency are discussed.

  12. 18th East European Conference on Advances in Databases and Information Systems and Associated Satellite Events

    CERN Document Server

    Ivanovic, Mirjana; Kon-Popovska, Margita; Manolopoulos, Yannis; Palpanas, Themis; Trajcevski, Goce; Vakali, Athena

    2015-01-01

    This volume contains the papers of 3 workshops and the doctoral consortium, which are organized in the framework of the 18th East-European Conference on Advances in Databases and Information Systems (ADBIS’2014). The 3rd International Workshop on GPUs in Databases (GID’2014) is devoted to subjects related to utilization of Graphics Processing Units in database environments. The use of GPUs in databases has not yet received enough attention from the database community. The intention of the GID workshop is to provide a discussion on popularizing the GPUs and providing a forum for discussion with respect to the GID’s research ideas and their potential to achieve high speedups in many database applications. The 3rd International Workshop on Ontologies Meet Advanced Information Systems (OAIS’2014) has a twofold objective to present: new and challenging issues in the contribution of ontologies for designing high quality information systems, and new research and technological developments which use ontologie...

  13. Initial Assessment of Parallelization of Monte Carlo Calculation using Graphics Processing Units

    International Nuclear Information System (INIS)

    Choi, Sung Hoon; Joo, Han Gyu

    2009-01-01

    Monte Carlo (MC) simulation is an effective tool for calculating neutron transport in complex geometry. However, because Monte Carlo simulates each neutron behavior one by one, it takes a very long computing time if enough neutrons are used for high precision of calculation. Accordingly, methods that reduce the computing time are required. A Monte Carlo code is well suited to parallel calculation, since it simulates the behavior of each neutron independently and thus parallel computation is natural. The parallelization of Monte Carlo codes, however, was done using multiple CPUs. Driven by the global demand for high-quality 3D graphics, the Graphics Processing Unit (GPU) has developed into a highly parallel, multi-core processor. This parallel processing capability of GPUs can be made available to engineering computing once a suitable interface is provided. Recently, NVIDIA introduced CUDA™, a general-purpose parallel computing architecture. CUDA is a software environment that allows developers to manage GPUs using C/C++ or other languages. In this work, a GPU-based Monte Carlo code is developed and an initial assessment of its parallel performance is investigated

  14. 48 CFR 25.503 - Group offers.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 1 2010-10-01 2010-10-01 false Group offers. 25.503... PROGRAMS FOREIGN ACQUISITION Evaluating Foreign Offers-Supply Contracts 25.503 Group offers. (a) If the solicitation or an offer specifies that award can be made only on a group of line items or on all line items...

  15. 17 CFR 230.802 - Exemption for offerings in connection with an exchange offer or business combination for the...

    Science.gov (United States)

    2010-04-01

    ... connection with an exchange offer or business combination for the securities of foreign private issuers. 230... Offers and Business Combinations § 230.802 Exemption for offerings in connection with an exchange offer or business combination for the securities of foreign private issuers. Offers and sales in any...

  16. Accept or Decline? An Analytics-Based Decision Tool for Kidney Offer Evaluation.

    Science.gov (United States)

    Bertsimas, Dimitris; Kung, Jerry; Trichakis, Nikolaos; Wojciechowski, David; Vagefi, Parsia A

    2017-12-01

    When a deceased-donor kidney is offered to a waitlisted candidate, the decision to accept or decline the organ relies primarily upon a practitioner's experience and intuition. Such decisions must achieve a delicate balance between estimating the immediate benefit of transplantation and the potential for future higher-quality offers. However, the current experience-based paradigm lacks scientific rigor and is subject to the inaccuracies that plague anecdotal decision-making. A data-driven analytics-based model was developed to predict whether a patient will receive an offer for a deceased-donor kidney at Kidney Donor Profile Index thresholds of 0.2, 0.4, and 0.6, and at timeframes of 3, 6, and 12 months. The model accounted for Organ Procurement Organization, blood group, wait time, DR antigens, and prior offer history to provide accurate and personalized predictions. Performance was evaluated on data sets spanning various lengths of time to understand the adaptability of the method. Using United Network for Organ Sharing match-run data from March 2007 to June 2013, out-of-sample area under the receiver operating characteristic curve was approximately 0.87 for all Kidney Donor Profile Index thresholds and timeframes considered for the 10 most populous Organ Procurement Organizations. As more data becomes available, area under the receiver operating characteristic curve values increase and subsequently level off. The development of a data-driven analytics-based model may assist transplant practitioners and candidates during the complex decision of whether to accept or forgo a current kidney offer in anticipation of a future high-quality offer. The latter holds promise to facilitate timely transplantation and optimize the efficiency of allocation.
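
    Structurally, the tool is a supervised classifier over registry features, evaluated by out-of-sample AUC. A schematic scikit-learn analogue (the features and data here are random placeholders, and the abstract does not disclose the authors' exact model class):

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        # Placeholder columns standing in for OPO, blood group, wait time,
        # DR antigens and prior offer history; labels stand in for
        # "received an offer within the timeframe"
        X = rng.random((5000, 5))
        y = rng.random(5000) < 0.3

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        model = LogisticRegression().fit(X_tr, y_tr)
        p = model.predict_proba(X_te)[:, 1]
        print("out-of-sample AUC:", roc_auc_score(y_te, p))

    With random placeholders the AUC hovers near 0.5; only real match-run features of the kind listed in the abstract could approach the reported 0.87.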

  17. Oil Vulnerabilities and United States Strategy

    Science.gov (United States)

    2007-02-08

    Mazda, Mercedes - Benz , Ford, Mercury, and Nissan offer flexible fuel vehicles in the United States. Ethanol is currently produced in the United States...USAWC STRATEGY RESEARCH PROJECT OIL VULNERABILITIES AND UNITED STATES STRATEGY by Colonel Shawn P. Walsh...Colleges and Schools, 3624 Market Street, Philadelphia, PA 19104, (215) 662-5606. The Commission on Higher Education is an institutional accrediting

  18. Efficient Acceleration of the Pair-HMMs Forward Algorithm for GATK HaplotypeCaller on Graphics Processing Units.

    Science.gov (United States)

    Ren, Shanshan; Bertels, Koen; Al-Ars, Zaid

    2018-01-01

    GATK HaplotypeCaller (HC) is a popular variant caller, which is widely used to identify variants in complex genomes. However, because of its high variant-detection accuracy, it suffers from long execution times. In GATK HC, the pair-HMMs forward algorithm accounts for a large percentage of the total execution time. This article proposes to accelerate the pair-HMMs forward algorithm on graphics processing units (GPUs) to improve the performance of GATK HC. It presents several GPU-based implementations of the pair-HMMs forward algorithm and analyzes their performance bottlenecks on an NVIDIA Tesla K40 card with various data sets. Based on these results and the characteristics of GATK HC, we are able to identify the GPU-based implementations with the highest performance for the various analyzed data sets. Experimental results show that the GPU-based implementations of the pair-HMMs forward algorithm achieve a speedup of up to 5.47× over existing GPU-based implementations.
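
    The pair-HMMs forward algorithm is a dynamic programme over a read-haplotype grid whose anti-diagonal cells are independent, which is what GPU implementations parallelize. For orientation, the standard single-sequence forward recursion, α_t(j) = [Σ_i α_{t−1}(i) A_ij]·B_j(o_t), shares the same structure (a simplified stand-in for the pair-HMM version):

        import numpy as np

        def forward(A, B, pi, obs):
            """Observation-sequence likelihood under an HMM (forward algorithm).

            A: (n, n) transitions, B: (n, m) emissions, pi: (n,) initial dist.
            """
            alpha = pi * B[:, obs[0]]
            for o in obs[1:]:
                alpha = (alpha @ A) * B[:, o]   # one DP step per symbol
            return alpha.sum()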

  19. Performance Analysis of FEM Algorithmson GPU and Many-Core Architectures

    KAUST Repository

    Khurram, Rooh; Kortas, Samuel

    2015-01-01

    CPU-only Exascale systems will be unsustainable, thus accelerators such as graphics processing units (GPUs) and many-integrated-core (MIC) processors will likely be an integral part of the TOP500 (http://www.top500.org/) supercomputers beyond 2020. The emerging supercomputer

  20. Offers

    CERN Multimedia

    Staff Association

    2014-01-01

    Passeport Gourmand   Are you dying for a nice meal? The “Passeport Gourmand” offers discounted prices to members of the Staff Association (available until April 2015 and on sale at the Staff Association Secretariat): Passeport gourmand Ain / Savoie / Haute-Savoie: 56 CHF instead of 79 CHF. Passeport gourmand Geneva / neighbouring France: 72 CHF instead of 95 CHF. Members of the Staff Association benefit from reduced tickets: 10 CHF (instead of 18 CHF at the desk), on sale at the secretariat of the Staff Association, Building 510-R010 (in front of the Printshop).

  1. Barriers to Offering Vasectomy at Publicly Funded Family Planning Organizations in Texas.

    Science.gov (United States)

    White, Kari; Campbell, Anthony; Hopkins, Kristine; Grossman, Daniel; Potter, Joseph E

    2017-05-01

    Few publicly funded family planning clinics in the United States offer vasectomy, but little is known about the reasons this method is not more widely available at these sources of care. Between February 2012 and February 2015, three waves of in-depth interviews were conducted with program administrators at 54 family planning organizations in Texas. Participants described their organization's vasectomy service model and factors that influenced how frequently vasectomy was provided. Interview transcripts were coded and analyzed using a theme-based approach. Service models and barriers to providing vasectomy were compared by organization type (e.g., women's health center, public health clinic) and receipt of Title X funding. Two thirds of organizations did not offer vasectomy on-site or pay for referrals with family planning funding; nine organizations frequently provided vasectomy. Organizations did not widely offer vasectomy because they could not find providers that would accept the low reimbursement for the procedure or because they lacked funding for men's reproductive health care. Respondents often did not perceive men's reproductive health care as a service priority and commented that men, especially Latinos, had limited interest in vasectomy. Although organizations of all types reported barriers, women's health centers and Title X-funded organizations more frequently offered vasectomy by conducting tailored outreach to men and vasectomy providers. A combination of factors operating at the health systems and provider level influence the availability of vasectomy at publicly funded family planning organizations in Texas. Multilevel approaches that address key barriers to vasectomy provision would help organizations offer comprehensive contraceptive services.

  2. Parallel Block Structured Adaptive Mesh Refinement on Graphics Processing Units

    Energy Technology Data Exchange (ETDEWEB)

    Beckingsale, D. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Gaudin, W. P. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Hornung, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Gunney, B. T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Gamblin, T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Herdman, J. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Jarvis, S. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom)

    2014-11-17

    Block-structured adaptive mesh refinement is a technique that can be used when solving partial differential equations to reduce the number of zones necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy and imposes complex communication and data-movement requirements on a simulation. In this paper, we describe the design and implementation of a native GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an eight-node cluster, and over four thousand nodes of Oak Ridge National Laboratory’s Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87× faster than the CPU-based implementation, and has been scaled to over four thousand GPUs using a combination of MPI and CUDA.
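
    The coarsen and refine operators that such a library implements as data-parallel GPU kernels can be sketched in NumPy. The stencils below (piecewise-constant injection and block averaging, refinement ratio 2) are deliberately simple placeholders; production AMR codes typically use higher-order interpolation.

        import numpy as np

        def refine(coarse, r=2):
            # Prolongate a 2-D patch by piecewise-constant injection (ratio r):
            # every coarse cell becomes an r-by-r block of identical fine cells.
            return np.kron(coarse, np.ones((r, r)))

        def coarsen(fine, r=2):
            # Restrict a 2-D patch by averaging each r-by-r block of fine cells.
            nx, ny = fine.shape
            return fine.reshape(nx // r, r, ny // r, r).mean(axis=(1, 3))

        patch = np.arange(16.0).reshape(4, 4)
        assert np.allclose(coarsen(refine(patch)), patch)  # restriction undoes injection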

  3. 48 CFR 225.7703-3 - Evaluating offers.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 3 2010-10-01 2010-10-01 false Evaluating offers. 225... Iraq or Afghanistan 225.7703-3 Evaluating offers. (a) Evaluate offers submitted in response to... Afghanistan, as follows: (1) If the low offer is an offer of a product or service from Iraq or Afghanistan...

  4. Cost-effectiveness of primary offer of IVF vs. primary offer of IUI followed by IVF (for IUI failures) in couples with unexplained or mild male factor subfertility.

    Science.gov (United States)

    Pashayan, Nora; Lyratzopoulos, Georgios; Mathur, Raj

    2006-06-23

    In unexplained and mild male factor subfertility, both intrauterine insemination (IUI) and in-vitro fertilisation (IVF) are indicated as first line treatments. Because the success rate of IUI is low, many couples failing IUI subsequently require IVF treatment. In practice, it is therefore important to examine the comparative outcomes (live birth-producing pregnancy), costs, and cost-effectiveness of primary offer of IVF, compared with primary offer of IUI followed by IVF for couples failing IUI. Mathematical modelling was used to estimate comparative clinical and cost effectiveness of either primary offer of one full IVF cycle (including frozen cycles when applicable) or "IUI + IVF" (defined as primary IUI followed by IVF for IUI failures) to a hypothetical cohort of subfertile couples who are eligible for both treatment strategies. Data used in calculations were derived from the published peer-reviewed literature as well as activity data of local infertility units. Cost-effectiveness ratios for IVF, "unstimulated-IUI (U-IUI) + IVF", and "stimulated IUI (S-IUI) + IVF" were £12,600, £13,100 and £15,100 per live birth-producing pregnancy respectively. For a hypothetical cohort of 100 couples with unexplained or mild male factor subfertility, compared with primary offer of IVF, 6 cycles of "U-IUI + IVF" or of "S-IUI + IVF" would cost an additional £174,200 and £438,000, representing an opportunity cost of 54 and 136 additional IVF cycles and 14 to 35 live birth-producing pregnancies respectively. For couples with unexplained and mild male factor subfertility, primary offer of a full IVF cycle is less costly and more cost-effective than providing IUI (of any modality) followed by IVF.

  5. Cost-effectiveness of primary offer of IVF vs. primary offer of IUI followed by IVF (for IUI failures) in couples with unexplained or mild male factor subfertility

    Directory of Open Access Journals (Sweden)

    Lyratzopoulos Georgios

    2006-06-01

    Full Text Available Abstract Background In unexplained and mild male factor subfertility, both intrauterine insemination (IUI) and in-vitro fertilisation (IVF) are indicated as first line treatments. Because the success rate of IUI is low, many couples failing IUI subsequently require IVF treatment. In practice, it is therefore important to examine the comparative outcomes (live birth-producing pregnancy), costs, and cost-effectiveness of primary offer of IVF, compared with primary offer of IUI followed by IVF for couples failing IUI. Methods Mathematical modelling was used to estimate comparative clinical and cost effectiveness of either primary offer of one full IVF cycle (including frozen cycles when applicable) or "IUI + IVF" (defined as primary IUI followed by IVF for IUI failures) to a hypothetical cohort of subfertile couples who are eligible for both treatment strategies. Data used in calculations were derived from the published peer-reviewed literature as well as activity data of local infertility units. Results Cost-effectiveness ratios for IVF, "unstimulated-IUI (U-IUI) + IVF", and "stimulated IUI (S-IUI) + IVF" were £12,600, £13,100 and £15,100 per live birth-producing pregnancy respectively. For a hypothetical cohort of 100 couples with unexplained or mild male factor subfertility, compared with primary offer of IVF, 6 cycles of "U-IUI + IVF" or of "S-IUI + IVF" would cost an additional £174,200 and £438,000, representing an opportunity cost of 54 and 136 additional IVF cycles and 14 to 35 live birth-producing pregnancies respectively. Conclusion For couples with unexplained and mild male factor subfertility, primary offer of a full IVF cycle is less costly and more cost-effective than providing IUI (of any modality) followed by IVF.

  6. Distributed terascale volume visualization using distributed shared virtual memory

    KAUST Repository

    Beyer, Johanna; Hadwiger, Markus; Schneider, Jens; Jeong, Wonki; Pfister, Hanspeter

    2011-01-01

    Table 1 illustrates the impact of different distribution unit sizes, different screen resolutions, and numbers of GPU nodes. We use two and four GPUs (NVIDIA Quadro 5000 with 2.5 GB memory) and a mouse cortex EM dataset (see Figure 2) of resolution

  7. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    The « Théâtre de Carouge » offers a 5.- CHF discount on all shows (30.- CHF instead of 35.- CHF) and on the season tickets "Premières représentations" (132.- CHF instead of 162.- CHF) and "Classique" (150.- CHF instead of 180.- CHF). Please send your reservation by email to smills@tcag.ch from your professional email address, indicating the date of your reservation, your name and first name, and your telephone number. A confirmation will be sent by email. Your membership card will be requested when you collect the tickets. More information on www.tcag.ch and www.tcag.ch/blog/

  8. Efficient particle-in-cell simulation of auroral plasma phenomena using a CUDA enabled graphics processing unit

    Science.gov (United States)

    Sewell, Stephen

    This thesis introduces a software framework that effectively utilizes low-cost commercially available Graphics Processing Units (GPUs) to simulate complex scientific plasma phenomena that are modeled using the Particle-In-Cell (PIC) paradigm. The software framework conforms to the Compute Unified Device Architecture (CUDA), a standard for general-purpose graphics processing introduced by NVIDIA Corporation. This framework has been verified for correctness and applied to advance the state of understanding of the electromagnetic aspects of the development of the Aurora Borealis and Aurora Australis. For each phase of the PIC methodology, this research has identified one or more methods to exploit the problem's natural parallelism and effectively map it for execution on the graphics processing unit and its host processor. The sources of overhead that can reduce the effectiveness of parallelization for each of these methods have also been identified. One of the novel aspects of this research was the utilization of particle sorting during the grid interpolation phase. The final representation resulted in simulations that executed about 38 times faster than simulations that were run on a single-core general-purpose processing system. The scalability of this framework to larger problem sizes and future generation systems has also been investigated.
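
    The particle-sorting idea can be sketched in a few lines of NumPy for a 1-D cloud-in-cell deposition step (grid size, particle count and weighting below are illustrative). Sorting particles by cell index means that, on a GPU, the scattered atomic updates of the deposition phase hit near-contiguous memory instead of random locations.

        import numpy as np

        rng = np.random.default_rng(0)
        n_cells, n_part = 64, 100_000
        x = rng.uniform(0.0, 1.0, n_part)        # particle positions in [0, 1)

        cell = np.minimum((x * n_cells).astype(int), n_cells - 1)
        order = np.argsort(cell)                 # sort particles by cell index
        x, cell = x[order], cell[order]          # neighbours in memory now deposit
                                                 # to neighbouring grid points
        rho = np.zeros(n_cells + 1)
        w = x * n_cells - cell                   # linear (cloud-in-cell) weights
        np.add.at(rho, cell, 1.0 - w)            # scatter-add, the CPU analogue
        np.add.at(rho, cell + 1, w)              # of a per-particle atomicAdd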

  9. 7 CFR 3560.656 - Incentives offers.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 15 2010-01-01 2010-01-01 false Incentives offers. 3560.656 Section 3560.656... AGRICULTURE DIRECT MULTI-FAMILY HOUSING LOANS AND GRANTS Housing Preservation § 3560.656 Incentives offers. (a) The Agency will offer a borrower, who submits a prepayment request meeting the conditions of § 3560...

  10. Offering Incentives from the Outside

    DEFF Research Database (Denmark)

    Emmanuel, Nikolas G.

    2017-01-01

    Incentives offer a good deal of underexplored opportunities to help manage conflict by encouraging political bargaining. This study has two primary objectives. First, it furthers the discussion of how external third parties can help manage conflicts. Second, it offers a typology of the available ...

  11. 48 CFR 570.203-3 - Soliciting offers.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 4 2010-10-01 2010-10-01 false Soliciting offers. 570.203... 570.203-3 Soliciting offers. (a) Solicit offers by providing each prospective offeror a proposed short..., evaluation procedures and submissions of offers. ...

  12. 43 CFR 12.715 - Evaluating offers.

    Science.gov (United States)

    2010-10-01

    ... 43 Public Lands: Interior 1 2010-10-01 2010-10-01 false Evaluating offers. 12.715 Section 12.715... Act-Supplies § 12.715 Evaluating offers. (a) Unless the head of the grantee organization or a designee at a level no lower than the grantee's designated awarding official determines otherwise, the offered...

  13. 5 CFR 536.104 - Reasonable offer.

    Science.gov (United States)

    2010-01-01

    ... 5 Administrative Personnel 1 2010-01-01 2010-01-01 false Reasonable offer. 536.104 Section 536.104... Provisions § 536.104 Reasonable offer. (a) For the purpose of determining whether grade retention eligibility or entitlement must be terminated under § 536.207 or 536.208, the offer of a position is a reasonable...

  14. CUDA-Accelerated Geodesic Ray-Tracing for Fiber Tracking

    Directory of Open Access Journals (Sweden)

    Evert van Aart

    2011-01-01

    Full Text Available Diffusion Tensor Imaging (DTI) allows the diffusion of water in fibrous tissue to be measured noninvasively. By reconstructing the fibers from DTI data using a fiber-tracking algorithm, we can deduce the structure of the tissue. In this paper, we outline an approach to accelerating such a fiber-tracking algorithm using a Graphics Processing Unit (GPU). This algorithm, which is based on the calculation of geodesics, has shown promising results for both synthetic and real data, but is limited in its applicability by its high computational requirements. We present a solution which uses the parallelism offered by modern GPUs, in combination with the CUDA platform by NVIDIA, to significantly reduce the execution time of the fiber-tracking algorithm. Compared to a multithreaded CPU implementation of the same algorithm, our GPU mapping achieves a speedup factor of up to 40 times.
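
    The inner loop of a fiber-tracking kernel can be sketched as streamline integration; the simplification below follows the principal eigenvector of the diffusion tensor with fixed-step Euler integration, whereas the paper's method integrates geodesic ODEs instead. `tensor_at` is a hypothetical sampler standing in for interpolation of a real DTI volume; each seed point is independent, which is what makes the workload map well onto GPU threads.

        import numpy as np

        def track(seed, tensor_at, step=0.5, n_steps=200):
            # Follow the principal diffusion direction from a seed point.
            p, prev = np.asarray(seed, float), None
            path = [p]
            for _ in range(n_steps):
                w, v = np.linalg.eigh(tensor_at(p))
                d = v[:, np.argmax(w)]           # principal eigenvector
                if prev is not None and d @ prev < 0:
                    d = -d                       # keep a consistent orientation
                p = p + step * d
                path.append(p)
                prev = d
            return np.array(path)

        # Constant tensor whose principal axis is x: the track runs along x.
        path = track([0.0, 0.0, 0.0], lambda p: np.diag([3.0, 1.0, 1.0]))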

  15. State-of-the-art in Heterogeneous Computing

    Directory of Open Access Journals (Sweden)

    Andre R. Brodtkorb

    2010-01-01

    Full Text Available Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field-programmable gate arrays (FPGAs). We present a review of hardware, available software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.

  16. Strategic renewal for business units.

    Science.gov (United States)

    Whitney, J O

    1996-01-01

    Over the past decade, business units have increasingly taken the role of strategy formulation away from corporate headquarters. The change makes sense: business units are closer to customers, competitors, and costs. Nevertheless, business units can fail, just as headquarters once did, by losing their focus on the organization's priorities and capabilities. John Whitney, turnaround expert and professor of management at Columbia University, offers a method for refocusing companies that he calls the strategic-renewal process. The principles behind the process are straightforward, but its execution demands extensive data, rigorous analysis, and the judgment of key decision makers. However, when applied with diligence, it can produce a strategy that yields both growth and profit. To carry out the process, managers must analyze, one by one or in logical groupings, the company's customers, the products it sells, and the services it offers in light of three criteria: strategic importance, significance, and profitability. Does a given customer, product, or service mesh with the organization's goals? Is it significant in terms of current and future revenues? And is it truly profitable when all costs are carefully considered? Customers, products, and services that do not measure up, says the author, must be weeded out relentlessly. Although the process is a painstaking one, the article offers clear thinking on why, and how, to go about it. A series of exhibits takes managers through the questions they need to raise, and two matrices offer Whitney's concentrated wisdom on when to cultivate and when to prune.

  17. 46 CFR 201.144 - Offer of proof.

    Science.gov (United States)

    2010-10-01

    ... 46 Shipping 8 2010-10-01 2010-10-01 false Offer of proof. 201.144 Section 201.144 Shipping... PROCEDURE Evidence (Rule 14) § 201.144 Offer of proof. An offer of proof made in connection with an... accompany the record as the offer of proof. ...

  18. Offers

    CERN Multimedia

    Association du personnel

    2010-01-01

    THEATRE FORUM DE MEYRIN 1, place des Cinq-Continents 1217 Meyrin    Special offer for members of the Staff Association: Reduced ticket prices for the play Love is my sin (in English) from 15 to 17 March at 8.30pm http://www.forum-meyrin.ch/main.php?page=119&s=12   First category: 37 CHF instead of 46 CHF Second category (seats towards the sides): 30 CHF instead of 38 CHF Please present your CERN card and your Staff Association membership card at the ticket office. Ticket reservation: tel. 022 989 34 34 (from Monday to Friday 2pm to 6pm) or e-mail : billetterie@forum-meyrin.ch  

  19. Offer

    CERN Document Server

    Staff Association

    2011-01-01

    DETAILS OF THE AGREEMENT WITH BCGE The BCGE Business partner programme devised for members of the CERN Staff Association offers personalized banking solutions with preferential conditions. The advantages are linked to salary accounts (free account keeping, internet banking, free Maestro and credit cards, etc.), mortgage lending, retirement planning, investment, credit, etc. The details of the programme and the preferential conditions are available on the Staff Association web site and from the secretariat (http://cern.ch/association/en/OtherActivities/BCGE.html). To benefit from these advantages, you will need to fill in the form available on our site, which must then be stamped by the Staff Association as proof that you are a paid-up member.  

  20. New offer for our members

    CERN Document Server

    Staff Association

    2018-01-01

    Evolution 2, your specialist for Outdoor Adventures Be it for a ski lesson, a parachute jump or a mountain bike descent, come and live an unforgettable experience with our outdoor specialists. Benefit from a 10% discount on all activities. The offer is open to Staff Association members and their family members living in the same household, upon presentation of the membership card, and is valid for all bookings made between 1 June 2018 and 30 May 2019, at all Evolution 2 sites. A wide range of summer and winter activities. More information on http://evolution2.com/ Contact and reservation: +33 (0)4.50.02.63.35 management@evolution2.com

  1. A kidney offer acceptance decision tool to inform the decision to accept an offer or wait for a better kidney.

    Science.gov (United States)

    Wey, Andrew; Salkowski, Nicholas; Kremers, Walter K; Schaffhausen, Cory R; Kasiske, Bertram L; Israni, Ajay K; Snyder, Jon J

    2018-04-01

    We developed a kidney offer acceptance decision tool to predict the probability of graft survival and patient survival for first-time kidney-alone candidates after an offer is accepted or declined, and we characterized the effect of restricting the donor pool with a maximum acceptable kidney donor profile index (KDPI). For accepted offers, Cox proportional hazards models estimated these probabilities using transplanted kidneys. For declined offers, these probabilities were estimated by considering the experience of similar candidates who declined offers and the probability that declining would lead to these outcomes. We randomly selected 5000 declined offers and estimated these probabilities 3 years post-offer had the offers been accepted or declined. Predicted outcomes for declined offers were well calibrated. Had the offers been accepted, the probabilities of graft survival and patient survival were typically higher. However, these advantages attenuated or disappeared with higher KDPI, candidate priority, and local donor supply. Donor pool restrictions were associated with worse 3-year outcomes, especially for candidates with high allocation priority. The kidney offer acceptance decision tool could inform offer acceptance by characterizing the potential risk-benefit trade-off associated with accepting or declining an offer. © 2017 The American Society of Transplantation and the American Society of Transplant Surgeons.

  2. A high performance GPU implementation of Surface Energy Balance System (SEBS) based on CUDA-C

    NARCIS (Netherlands)

    Abouali, Mohammad; Timmermans, J.; Castillo, Jose E.; Su, Zhongbo

    2013-01-01

    This paper introduces a new implementation of the Surface Energy Balance System (SEBS) algorithm harnessing the many cores available on Graphics Processing Units (GPUs). This new implementation uses Compute Unified Device Architecture C (CUDA-C) programming model and is designed to be executed on a

  3. Parallel GPGPU Evaluation of Small Angle X-ray Scattering Profiles in a Markov Chain Monte Carlo Framework

    DEFF Research Database (Denmark)

    Antonov, Lubomir Dimitrov; Andreetta, Christian; Hamelryck, Thomas Wim

    2013-01-01

    directly determines the complexity of the systems that can be explored. We present an efficient implementation of the forward model for SAXS with full hardware utilization of Graphics Processor Units (GPUs). The proposed algorithm is orders of magnitude faster than an efficient CPU implementation...

  4. High Resolution Orientation Distribution Function

    DEFF Research Database (Denmark)

    Schmidt, Søren; Gade-Nielsen, Nicolai Fog; Høstergaard, Martin

    2012-01-01

    from the deformed material. The underlying mathematical formalism supports all crystallographic space groups and reduces the problem to solving a (large) set of linear equations. An implementation on multi-core CPUs and Graphical Processing Units (GPUs) is discussed along with an example on simulated...

  5. Impact of the future water value on wind-reversible hydro offering strategies in electricity markets

    International Nuclear Information System (INIS)

    Sánchez de la Nieta, A.A.; Contreras, J.; Catalão, J.P.S.

    2015-01-01

    Highlights: • A stochastic mixed integer linear model is proposed to maximize the profit and the future water value. • Conditional Value at Risk (CVaR) is used for risk-hedging. • The offer strategies analyzed are single and separate, with and without a physical connection. • The effect of considering the future water value of the reservoirs is studied for several time horizons. - Abstract: A coordinated offering strategy between a wind farm and a reversible hydro plant can reduce wind power imbalances, improving the system efficiency whilst decreasing the total imbalances. A stochastic mixed integer linear model is proposed to maximize the profit and the future water value (FWV) of the system, using Conditional Value at Risk (CVaR) for risk-hedging. The offer strategies analyzed are: (i) a single wind-reversible hydro offer with a physical connection between wind and hydro units to store spare wind energy, and (ii) separate wind and reversible hydro offers without a physical connection between them. The effect of considering the FWV of the reservoirs is studied for several time horizons, one week (168 h) and one month (720 h), using an illustrative case study. Conclusions are duly drawn from the case study to show the impact of the FWV on the results.
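
    CVaR at confidence level α, as used here for risk-hedging, is the expected profit over the worst (1 − α) share of scenarios. A minimal sample-based sketch (the scenario profits below are made-up numbers; in the optimization model itself, CVaR enters through auxiliary variables rather than a post-hoc sort):

        import numpy as np

        def cvar(profits, alpha=0.95):
            # Average profit over the worst (1 - alpha) fraction of scenarios.
            profits = np.sort(np.asarray(profits, float))
            k = max(1, int(np.ceil((1.0 - alpha) * profits.size)))
            return profits[:k].mean()

        scenario_profits = np.array([120.0, 95.0, 80.0, -40.0, 60.0, 150.0])
        print(cvar(scenario_profits, alpha=0.8))  # mean of the two worst outcomes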

  6. Real-time autocorrelator for fluorescence correlation spectroscopy based on graphical-processor-unit architecture: method, implementation, and comparative studies

    Science.gov (United States)

    Laracuente, Nicholas; Grossman, Carl

    2013-03-01

    We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphical processor units (GPUs). Recent developments in hardware and software have allowed for general purpose computing with inexpensive GPU hardware. These devices are better suited to emulating hardware autocorrelators than traditional CPU-based software applications, since they emphasize parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core graphics PCI card in a computer with a 3.2 GHz Intel i5 CPU running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College.
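
    The multi-tau scheme evaluates the correlation on a quasi-logarithmic lag grid: a fixed number of lags is computed at full time resolution, then the trace is rebinned 2:1 and only the longer lags are computed at the coarser resolution, so lag times spanning many decades stay cheap. A serial NumPy sketch with illustrative parameters (m = 16 points per level; the GPU version evaluates many lags in parallel):

        import numpy as np

        def multi_tau_autocorr(signal, m=16, levels=6):
            # Normalized autocorrelation g(tau) on a quasi-logarithmic lag grid.
            x = np.asarray(signal, float)
            taus, g, dt = [], [], 1
            lags = range(1, m + 1)               # full set on the first level
            for _ in range(levels):
                for k in lags:
                    n = x.size - k
                    if n < 2:
                        break
                    g.append((x[:n] * x[k:]).mean() / x.mean() ** 2)
                    taus.append(k * dt)
                even = x.size // 2 * 2           # rebin the trace 2:1
                x = 0.5 * (x[:even:2] + x[1:even:2])
                dt *= 2
                lags = range(m // 2 + 1, m + 1)  # only the new, longer lags
            return np.array(taus), np.array(g)

        taus, g = multi_tau_autocorr(np.random.default_rng(1).poisson(5.0, 1 << 14))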

  7. A GPU-accelerated semi-implicit fractional step method for numerical solutions of incompressible Navier-Stokes equations

    Science.gov (United States)

    Ha, Sanghyun; Park, Junshin; You, Donghyun

    2017-11-01

    The utility of the computational power of modern Graphics Processing Units (GPUs) is elaborated for solutions of incompressible Navier-Stokes equations which are integrated using a semi-implicit fractional-step method. Due to its serial and bandwidth-bound nature, the present choice of numerical methods is considered to be a good candidate for evaluating the potential of GPUs for solving Navier-Stokes equations using non-explicit time integration. An efficient algorithm is presented for GPU acceleration of the Alternating Direction Implicit (ADI) and the Fourier-transform-based direct solution method used in the semi-implicit fractional-step method. OpenMP is employed for concurrent collection of turbulence statistics on a CPU while the Navier-Stokes equations are computed on a GPU. Extension to multiple NVIDIA GPUs is implemented using NVLink, supported by the Pascal architecture. Performance of the present method is evaluated on multiple Tesla P100 GPUs and compared with a single-core Xeon E5-2650 v4 CPU in simulations of boundary-layer flow over a flat plate. Supported by the National Research Foundation of Korea (NRF) Grant funded by the Korea government (Ministry of Science, ICT and Future Planning NRF-2016R1E1A2A01939553, NRF-2014R1A2A1A11049599, and Ministry of Trade, Industry and Energy 201611101000230).
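
    Each ADI sweep in such a fractional-step method reduces to large batches of independent tridiagonal solves. The scalar kernel is the Thomas algorithm, sketched below; GPU implementations typically either assign one system per thread or replace this serial recurrence with cyclic reduction.

        import numpy as np

        def thomas(a, b, c, d):
            # Solve a tridiagonal system with sub-diagonal a, diagonal b,
            # super-diagonal c and right-hand side d (forward sweep plus back
            # substitution: O(n), but inherently serial along each line).
            n = len(d)
            cp, dp = np.empty(n), np.empty(n)
            cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
            for i in range(1, n):
                denom = b[i] - a[i] * cp[i - 1]
                cp[i] = c[i] / denom
                dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
            x = np.empty(n)
            x[-1] = dp[-1]
            for i in range(n - 2, -1, -1):
                x[i] = dp[i] - cp[i] * x[i + 1]
            return x

        a = np.array([0.0, -1.0, -1.0, -1.0])    # a[0] is unused
        b = np.full(4, 2.0)
        c = np.array([-1.0, -1.0, -1.0, 0.0])    # c[-1] is unused
        d = np.ones(4)
        A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
        assert np.allclose(A @ thomas(a, b, c, d), d)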

  8. Offering Strategy of a Flexibility Aggregator in a Balancing Market Using Asymmetric Block Offers

    DEFF Research Database (Denmark)

    Bobo, Lucien Ali; Delikaraoglou, Stefanos; Vespermann, Niklas

    2018-01-01

    In order to enable large-scale penetration of renewables with variable generation, new sources of flexibility have to be exploited in the power systems. Allowing asymmetric block offers (including response and rebound blocks) in balancing markets can facilitate the participation of flexibility aggregators and unlock load-shifting flexibility from, e.g., thermostatic loads. In this paper, we formulate an optimal offering strategy for a risk-averse flexibility aggregator participating in such a market. Using a price-taker approach, load flexibility characteristics and balancing market price forecast scenarios are used to find optimal load-shifting offers under uncertainty. The problem is formulated as a stochastic mixed-integer linear program and can be solved with reasonable computational time. This work is taking place in the framework of the real-life demonstration project EcoGrid 2.0, which...

  9. 33 CFR 5.37 - Offer of facilities.

    Science.gov (United States)

    2010-07-01

    ... 33 Navigation and Navigable Waters 1 2010-07-01 2010-07-01 false Offer of facilities. 5.37 Section... GUARD AUXILIARY § 5.37 Offer of facilities. Any member of the Auxiliary desiring to place a vessel... in such communication which facility is offered. Except in emergencies, an offer to the Coast Guard...

  10. 45 CFR 81.85 - Offer of proof.

    Science.gov (United States)

    2010-10-01

    ... 45 Public Welfare 1 2010-10-01 2010-10-01 false Offer of proof. 81.85 Section 81.85 Public Welfare... 80 OF THIS TITLE Hearing Procedures § 81.85 Offer of proof. An offer of proof made in connection with... identification and shall accompany the record as the offer of proof. ...

  11. 49 CFR 604.43 - Offer of proof.

    Science.gov (United States)

    2010-10-01

    ... 49 Transportation 7 2010-10-01 2010-10-01 false Offer of proof. 604.43 Section 604.43..., DEPARTMENT OF TRANSPORTATION CHARTER SERVICE Hearings. § 604.43 Offer of proof. A party whose evidence has... respond to the offer of proof, may offer the evidence on the record when filing an appeal. ...

  12. Special offer

    CERN Multimedia

    Staff Association

    2010-01-01

    Special offer for members of the Staff Association and their families 10% reduction on all products in the SEPHORA shop (perfume, beauty products, etc.) in Val Thoiry ALL YEAR ROUND, plus 20% reduction during their “vente privée”* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * next “vente privée” from 24th to 29th May 2010

  13. Prevalence, level and distribution of Salmonella in shipments of imported capsicum and sesame seed spice offered for entry to the United States: observations and modeling results.

    Science.gov (United States)

    Van Doren, Jane M; Blodgett, Robert J; Pouillot, Régis; Westerman, Ann; Kleinmeier, Daria; Ziobro, George C; Ma, Yinqing; Hammack, Thomas S; Gill, Vikas; Muckenfuss, Martin F; Fabbri, Linda

    2013-12-01

    In response to increased concerns about spice safety, the United States Food and Drug Administration (FDA) initiated research to characterize the prevalence and levels of Salmonella in imported spices. 299 imported dried capsicum shipments and 233 imported sesame seed shipments offered for entry to the United States were sampled. Observed Salmonella shipment prevalence was 3.3% (1500 g examined; 95% CI 1.6-6.1%) for capsicum and 9.9% (1500 g; 95% Confidence Interval (CI) 6.3-14%) for sesame seed. Within-shipment contamination was not inconsistent with a Poisson distribution. Shipment mean Salmonella level estimates among contaminated shipments ranged from 6 × 10⁻⁴ to 0.09 (capsicum) or 6 × 10⁻⁴ to 0.04 (sesame seed) MPN/g. A gamma-Poisson model provided the best fit to observed data for both imported shipments of capsicum and imported shipments of sesame seed sampled in this study, among the six parametric models considered. Shipment mean levels of Salmonella vary widely between shipments; many contaminated shipments contain low levels of contamination. Examination of sampling plan efficacy for identifying contaminated spice shipments from these distributions indicates that the sample size of spice examined is critical. Sampling protocols examining 25 g samples are predicted to identify only a small fraction of contaminated shipments of imported capsicum or sesame seeds. Published by Elsevier Ltd.
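
    Under the Poisson within-shipment model, the probability that a sample of mass m grams contains at least one organism at mean level λ (MPN/g) is P(detect) = 1 − exp(−λm), which makes the sample-size effect easy to see. A quick illustration using mean levels in the range reported above:

        import numpy as np

        def p_detect(level_mpn_per_g, mass_g):
            # Probability that a sample of `mass_g` grams contains at least one
            # organism when contamination is Poisson with mean `level_mpn_per_g`.
            return 1.0 - np.exp(-level_mpn_per_g * mass_g)

        for level in (6e-4, 0.01, 0.09):         # shipment mean levels (MPN/g)
            print(level, p_detect(level, 25.0), p_detect(level, 1500.0))
        # At 6e-4 MPN/g, a 25 g sample detects ~1.5% of the time; 1500 g, ~59%.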

  14. Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems

    International Nuclear Information System (INIS)

    Ammendola, Roberto; Biagioni, Andrea; Frezza, Ottorino; Cicero, Francesca Lo; Paolucci, Pier Stanislao; Lonardo, Alessandro; Rossetti, Davide; Simula, Francesco; Tosoratto, Laura; Vicini, Piero

    2014-01-01

    Modern Graphics Processing Units (GPUs) are now considered accelerators for general purpose computation. A tight interaction between the GPU and the interconnection network is the strategy to express the full potential on capability computing of a multi-GPU system on large HPC clusters; that is the reason why an efficient and scalable interconnect is a key technology to finally deliver GPUs for scientific HPC. In this paper we show the latest architectural and performance improvements of the APEnet+ network fabric, an FPGA-based PCIe board with 6 fully bidirectional off-board links with 34 Gbps of raw bandwidth per direction, and PCIe x8 Gen2 bandwidth towards the host PC. The board implements a Remote Direct Memory Access (RDMA) protocol that leverages the peer-to-peer (P2P) capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain real zero-copy, low-latency GPU-to-GPU transfers. Finally, we report on the development activities for 2013, focusing on the adoption of the latest generation 28 nm FPGAs and the preliminary tests performed on this new platform.

  15. Architectural improvements and 28 nm FPGA implementation of the APEnet+ 3D Torus network for hybrid HPC systems

    Energy Technology Data Exchange (ETDEWEB)

    Ammendola, Roberto [INFN Sezione Roma Tor Vergata (Italy); Biagioni, Andrea; Frezza, Ottorino; Cicero, Francesca Lo; Paolucci, Pier Stanislao; Lonardo, Alessandro; Rossetti, Davide; Simula, Francesco; Tosoratto, Laura; Vicini, Piero [INFN Sezione Roma (Italy)

    2014-06-11

    Modern Graphics Processing Units (GPUs) are now considered accelerators for general purpose computation. A tight interaction between the GPU and the interconnection network is the strategy to express the full potential on capability computing of a multi-GPU system on large HPC clusters; that is the reason why an efficient and scalable interconnect is a key technology to finally deliver GPUs for scientific HPC. In this paper we show the latest architectural and performance improvements of the APEnet+ network fabric, an FPGA-based PCIe board with 6 fully bidirectional off-board links with 34 Gbps of raw bandwidth per direction, and PCIe x8 Gen2 bandwidth towards the host PC. The board implements a Remote Direct Memory Access (RDMA) protocol that leverages the peer-to-peer (P2P) capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain real zero-copy, low-latency GPU-to-GPU transfers. Finally, we report on the development activities for 2013, focusing on the adoption of the latest generation 28 nm FPGAs and the preliminary tests performed on this new platform.

  16. Special offer

    CERN Multimedia

    Staff Association

    2011-01-01

    SPECIAL OFFER FOR OUR MEMBERS Single rate adult/child: entry to the land zone: 19 euros instead of 23 euros; entry to the “land + water zones”: 24 euros instead of 31 euros. Free for children under 3, with limited access to the attractions. Walibi Rhône-Alpes is open daily from 22 June to 31 August, and every weekend from 3 September until 31 October. The water zone closes on 11 September.

  17. Streaming nested data parallelism on multicores

    DEFF Research Database (Denmark)

    Madsen, Frederik Meisner; Filinski, Andrzej

    2016-01-01

    The paradigm of nested data parallelism (NDP) allows a variety of semi-regular computation tasks to be mapped onto SIMD-style hardware, including GPUs and vector units. However, some care is needed to keep down space consumption in situations where the available parallelism may vastly exceed...

  18. Real-Time Simulation of Ship-Structure and Ship-Ship Interaction

    DEFF Research Database (Denmark)

    Lindberg, Ole; Glimberg, Stefan Lemvig; Bingham, Harry B.

    2013-01-01

    , because it is simple, easy to implement and computationally efficient. Multiple many-core graphical processing units (GPUs) are used for parallel execution and the model is implemented using a combination of C/C++, CUDA and MPI. Two ship hydrodynamic cases are presented: Kriso Container Carrier at steady...

  19. GPU Accelerated Surgical Simulators for Complex Morhpology

    DEFF Research Database (Denmark)

    Mosegaard, Jesper; Sørensen, Thomas Sangild

    2005-01-01

    a spring-mass system in order to simulate a complex organ such as the heart. Computations are accelerated by taking advantage of modern graphics processing units (GPUs). Two GPU implementations are presented. They vary in their generality of spring connections and in the speedup factor they achieve
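
    The core of a spring-mass simulator is a data-parallel update of every mass point from the forces of its spring neighbours; GPU versions typically map one thread per mass point. A minimal symplectic-Euler sketch for a 1-D chain (spring constant, damping and time step are illustrative values):

        import numpy as np

        def step(pos, vel, rest=1.0, k=50.0, damping=0.2, dt=1e-3):
            # One integration step for a 1-D chain of unit masses.
            ext = np.diff(pos) - rest            # extension of each spring
            f = k * ext                          # Hooke's law per spring
            force = np.zeros_like(pos)
            force[:-1] += f                      # each spring pulls its two
            force[1:] -= f                       # endpoint masses together
            vel = (vel + dt * force) * (1.0 - damping * dt)
            pos = pos + dt * vel                 # update with the new velocity
            pos[0], vel[0] = 0.0, 0.0            # pin the first mass
            return pos, vel

        pos, vel = np.linspace(0.0, 11.0, 11), np.zeros(11)  # stretched chain
        for _ in range(5000):
            pos, vel = step(pos, vel)
        # The chain relaxes towards its rest spacing of 1.0 between masses.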

  20. Marketing strategy to differentiate the offer

    OpenAIRE

    Miceski, Trajko; Pasovska, Silvana

    2013-01-01

    The marketing strategy of differentiating the offer is an important and widely accepted strategy, especially among larger legal entities. Differentiating the offer leads to greater profit and greater profitability by targeting demand towards the enterprise's product. Vertical differentiation of the offer concerns the quality of the product itself, which is perceived as superior to the competing product, which is perceived as somet...

  1. A GPU-Accelerated Parameter Interpolation Thermodynamic Integration Free Energy Method.

    Science.gov (United States)

    Giese, Timothy J; York, Darrin M

    2018-03-13

    There has been a resurgence of interest in free energy methods motivated by the performance enhancements offered by molecular dynamics (MD) software written for specialized hardware, such as graphics processing units (GPUs). In this work, we exploit the properties of a parameter-interpolated thermodynamic integration (PI-TI) method to connect states by their molecular mechanical (MM) parameter values. This pathway is shown to be better behaved for Mg2+ → Ca2+ transformations than traditional linear alchemical pathways (with and without soft-core potentials). The PI-TI method has the practical advantage that no modification of the MD code is required to propagate the dynamics, and unlike with linear alchemical mixing, only one electrostatic evaluation is needed (e.g., a single call to particle-mesh Ewald), leading to better performance. In the case of AMBER, this enables all the performance benefits of GPU-acceleration to be realized, in addition to unlocking the full spectrum of features available within the MD software, such as Hamiltonian replica exchange (HREM). The TI derivative evaluation can be accomplished efficiently in a post-processing step by reanalyzing the statistically independent trajectory frames in parallel for high throughput. We also show how one can evaluate the particle-mesh Ewald contribution to the TI derivative evaluation without needing to perform two reciprocal space calculations. We apply the PI-TI method with HREM on GPUs in AMBER to predict pKa values in double-stranded RNA molecules and make comparison with experiments. Convergence to under 0.25 units for these systems required 100 ns or more of sampling per window and coupling of windows with HREM. We find that MM charges derived from ab initio QM/MM fragment calculations improve the agreement between calculation and experimental results.
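
    Whatever the pathway, thermodynamic integration recovers the free-energy difference by quadrature over the coupling parameter, ΔG = ∫₀¹ ⟨∂U/∂λ⟩_λ dλ. A hedged post-processing sketch, integrating per-window averages with the trapezoidal rule (the window data below are synthetic, not AMBER output):

        import numpy as np

        def ti_free_energy(lambdas, dudl_samples):
            # dudl_samples[i] holds dU/dlambda samples from window lambdas[i];
            # integrate the window means with the trapezoidal rule.
            means = np.array([np.mean(s) for s in dudl_samples])
            widths = np.diff(lambdas)
            return float(np.sum(0.5 * (means[1:] + means[:-1]) * widths))

        lambdas = np.linspace(0.0, 1.0, 11)
        rng = np.random.default_rng(2)
        samples = [rng.normal(3.0 * l * l, 0.1, 5000) for l in lambdas]
        print(ti_free_energy(lambdas, samples))  # ~1.0, the integral of 3*l^2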

  2. Bridging FPGA and GPU technologies for AO real-time control

    Science.gov (United States)

    Perret, Denis; Lainé, Maxime; Bernard, Julien; Gratadour, Damien; Sevin, Arnaud

    2016-07-01

    Our team has developed a common environment for high performance simulations and real-time control of AO systems based on the use of Graphics Processing Units in the context of the COMPASS project. Such a solution, based on the ability of the real time core in the simulation to provide adequate computing performance, limits the cost of developing AO RTC systems and makes them more scalable. A code developed and validated in the context of the simulation may be injected directly into the system and tested on sky. Furthermore, the use of relatively low cost components also offers significant advantages for the system hardware platform. However, the use of GPUs in an AO loop comes with drawbacks: the traditional way of offloading computation from CPU to GPUs - involving multiple copies and unacceptable overhead in kernel launching - is not well suited to a real-time context. This last application requires the implementation of a solution enabling direct memory access (DMA) to the GPU memory from a third party device, bypassing the operating system. This allows this device to communicate directly with the real-time core of the simulation, feeding it with the WFS camera pixel stream. We show that DMA between a custom FPGA-based frame-grabber and a computation unit (GPU, FPGA, or coprocessor such as Xeon Phi) across PCIe allows us to get latencies compatible with what will be needed on ELTs. As a fine-grained synchronization mechanism is not yet made available by GPU vendors, we propose the use of memory polling to avoid interrupt handling and involvement of a CPU. Network and vision protocols are handled by the FPGA-based Network Interface Card (NIC). We present the results we obtained on a complete AO loop using camera and deformable mirror simulators.

  3. GPU accelerated fully space and time resolved numerical simulations of self-focusing laser beams in SBS-active media

    Energy Technology Data Exchange (ETDEWEB)

    Mauger, Sarah; Colin de Verdière, Guillaume [CEA-DAM, DIF, 91297 Arpajon (France); Bergé, Luc, E-mail: luc.berge@cea.fr [CEA-DAM, DIF, 91297 Arpajon (France); Skupin, Stefan [Max Planck Institute for the Physics of Complex Systems, 01187 Dresden (Germany); Friedrich Schiller University, Institute of Condensed Matter Theory and Optics, 07743 Jena (Germany)

    2013-02-15

    A computer cluster equipped with Graphics Processing Units (GPUs) is used for simulating nonlinear optical wave packets undergoing Kerr self-focusing and stimulated Brillouin scattering in fused silica. We first recall the model equations in full (3+1) dimensions. These consist of two coupled nonlinear Schrödinger equations for counterpropagating optical beams closed with a source equation for light-induced acoustic waves seeded by thermal noise. Compared with simulations on a conventional cluster of Central Processing Units (CPUs), GPU-based computations allow us to use a significant (16 times) larger number of mesh points within similar computation times. Reciprocally, simulations employing the same number of mesh points are between 3 and 20 times faster on GPUs than on the same number of classical CPUs. Performance speedups close to 45 are reported for isolated functions evaluating, e.g., the optical nonlinearities. Since the field intensities may reach the ionization threshold of silica, the action of a defocusing electron plasma is also addressed.

  4. GPU accelerated fully space and time resolved numerical simulations of self-focusing laser beams in SBS-active media

    International Nuclear Information System (INIS)

    Mauger, Sarah; Colin de Verdière, Guillaume; Bergé, Luc; Skupin, Stefan

    2013-01-01

    A computer cluster equipped with Graphics Processing Units (GPUs) is used for simulating nonlinear optical wave packets undergoing Kerr self-focusing and stimulated Brillouin scattering in fused silica. We first recall the model equations in full (3+1) dimensions. These consist of two coupled nonlinear Schrödinger equations for counterpropagating optical beams closed with a source equation for light-induced acoustic waves seeded by thermal noise. Compared with simulations on a conventional cluster of Central Processing Units (CPUs), GPU-based computations allow us to use a significant (16 times) larger number of mesh points within similar computation times. Reciprocally, simulations employing the same number of mesh points are between 3 and 20 times faster on GPUs than on the same number of classical CPUs. Performance speedups close to 45 are reported for isolated functions evaluating, e.g., the optical nonlinearities. Since the field intensities may reach the ionization threshold of silica, the action of a defocusing electron plasma is also addressed

  5. 17 CFR 230.253 - Offering circular.

    Science.gov (United States)

    2010-04-01

    .... Repetition of information should be avoided; cross-referencing of information within the document is... COMPLETENESS OF ANY OFFERING CIRCULAR OR OTHER SELLING LITERATURE. THESE SECURITIES ARE OFFERED PURSUANT TO AN...

  6. Accelerating large-scale protein structure alignments with graphics processing units

    Directory of Open Access Journals (Sweden)

    Pang Bin

    2012-02-01

    Full Text Available Abstract Background Large-scale protein structure alignment, an indispensable tool to structural bioinformatics, poses a tremendous challenge on computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues from protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using the massive parallel computing power of GPUs.

  7. Minneapolis Multi-Ethnic Curriculum Project--Migration Unit.

    Science.gov (United States)

    Minneapolis Public Schools, Minn. Dept. of Intergroup Education.

    The student booklet presents short chapters illustrating the migration unit of the Minneapolis Multi-Ethnic Curriculum Project for secondary schools. Sixteen brief chapters describe migration, immigration, and emigration in the United States. The first six chapters offer first person accounts of immigrants from Norway, Korea, Egypt, Hitler's…

  8. A Robust Optimisation Approach using CVaR for Unit Commitment in a Market with Probabilistic Offers

    DEFF Research Database (Denmark)

    Bukhsh, W. A.; Papakonstantinou, Athanasios; Pinson, Pierre

    2016-01-01

    The large scale integration of renewable energy sources (RES) challenges power system planners and operators alike as it can potentially introduce the need for costly investments in infrastructure. Furthermore, traditional market clearing mechanisms are no longer optimal due to the stochastic...... nature of RES. This paper presents a risk-aware market clearing strategy for a network with significant shares of RES. We propose an electricity market that embeds the uncertainty brought by wind power and other stochastic renewable sources by accepting probabilistic offers and use a risk measure defined...

  9. An Offer You Cannot Refuse: Obtaining Efficiency and Fairness in Preplay Negotiation Games with Conditional Offers

    DEFF Research Database (Denmark)

    Goranko, Valentin; Turrini, Paolo

    2013-01-01

    Such offers transform the payoff matrix of the original game and allow for some degree of cooperation between rational players while preserving the non-cooperative nature of the game. We focus on 2-player negotiation games arising in the preplay phase, when offers for payments are made conditional on a suggested matching offer of the same kind being made in return by the receiver. We study and analyze such bargaining games, obtain results describing their possible solutions, and discuss the degrees of efficiency and fairness that can be achieved in such a negotiation process depending on whether time...

  10. Sustainable Offering Practices Through Stakeholders Engagement

    Directory of Open Access Journals (Sweden)

    Bijay Prasad Kushwaha

    2018-02-01

    Full Text Available Sustainable development is achieved by satisfying current ends without shrinking the means that can serve the needs of society in the future. It has become a global motive, and a responsibility of the present community, to utilize resources in an optimal way with minimal environmental damage. The objective of this paper is to study theoretical frameworks and practical approaches to sustainable offering practices through customer engagement. The study also examines the opportunities and challenges of sustainable offering practices in India. It builds on a previous study, and secondary data have been used for the analysis. The outcome reveals the process for successful sustainable offering practices in the context of Indian consumers. The analysis helps in understanding different practices of sustainable offering through engaging stakeholders.

  11. Postgraduate courses offered to nursing

    Directory of Open Access Journals (Sweden)

    Pedro Jorge Araujo

    2011-07-01

    Full Text Available Aim: To identify the official master's degrees offered by Spanish universities during the 2010/2011 academic year. Material and methods: A descriptive, observational, cross-sectional study, analysing 170 official university master's programmes by means of a 15-question questionnaire developed for this work. Results: 52 of the 75 Spanish universities offered official master's programmes open to nursing graduates during the 2010/2011 academic year. By area, the most frequently offered programmes were in nutrition and food safety. 76.33% of the programmes last one academic year. Almost half have a combined research-professional orientation, and almost 40% a research orientation. 62.65% of the programmes are taught face-to-face. 52.1% do not include external placements, and 86.2% offer continuity to doctoral studies. Conclusions: The number of programmes should be expanded to include other fields of study, contributing to greater specialisation among nursing professionals. A large share of official master's programmes are taught face-to-face, and very few are offered online or by distance learning.

  12. Molecular Monte Carlo Simulations Using Graphics Processing Units: To Waste Recycle or Not?

    Science.gov (United States)

    Kim, Jihan; Rodgers, Jocelyn M; Athènes, Manuel; Smit, Berend

    2011-10-11

    In the waste recycling Monte Carlo (WRMC) algorithm, (1) multiple trial states may be simultaneously generated and utilized during Monte Carlo moves to improve the statistical accuracy of the simulations, suggesting that such an algorithm may be well suited to parallel implementation on graphics processing units (GPUs). In this paper, we implement two waste recycling Monte Carlo algorithms in CUDA (Compute Unified Device Architecture) using uniformly distributed random trial states and trial states based on displacement random-walk steps, and we test the methods on a methane-zeolite MFI framework system to evaluate their utility. We discuss the specific implementation details of the waste recycling GPU algorithm and compare the methods to other parallel algorithms optimized for the framework system. We analyze the relationship between the statistical accuracy of our simulations and the CUDA block size to determine the efficient allocation of the GPU hardware resources. We make comparisons between the GPU and the serial CPU Monte Carlo implementations to assess speedup over conventional microprocessors. Finally, we apply our optimized GPU algorithms to the important problem of determining free energy landscapes, in this case for molecular motion through the zeolite LTA.

  13. 47 CFR 76.1621 - Equipment compatibility offer.

    Science.gov (United States)

    2010-10-01

    ... 47 Telecommunication 4 2010-10-01 2010-10-01 false Equipment compatibility offer. 76.1621 Section... MULTICHANNEL VIDEO AND CABLE TELEVISION SERVICE Notices § 76.1621 Equipment compatibility offer. Cable system... offer to supply each subscriber with special equipment that will enable the simultaneous reception of...

  14. The primary relevance of subconsciously offered attitudes

    DEFF Research Database (Denmark)

    Kristiansen, Tore

    2015-01-01

    consciously (overtly) and subconsciously (covertly) offered attitudes – because subconsciously offered attitudes appear to be a driving force in linguistic variation and change in a way that consciously offered attitudes are not. The argument is based on evidence from empirical investigations of attitudes and use in the ‘...

  15. 48 CFR 619.804-2 - Agency offering.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 4 2010-10-01 2010-10-01 false Agency offering. 619.804-2 Section 619.804-2 Federal Acquisition Regulations System DEPARTMENT OF STATE SOCIOECONOMIC PROGRAMS SMALL... offering. (a) When applicable, this notification shall identify that the offering is in accordance with the...

  16. What is sought from graphic designers? A first thematic analysis of job offers for graphic design positions in the United Kingdom

    OpenAIRE

    Nicoletti Dziobczenski, Paulo; Person, Oscar

    2016-01-01

    An empirically grounded understanding of which knowledge and skills are sought from designers is missing for a number of professional subfields of design. This gap in research challenges i) design educators in planning their educational offerings and ii) design practitioners and students in articulating their contribution to clients and future employers. In this paper, we study the references made to knowledge and skills in job offers for graphic designers in the UK. Based on a f...

  17. Ottawa offers funds for particle accelerator

    International Nuclear Information System (INIS)

    1991-01-01

    The federal government has offered to contribute at least $236 million toward the controversial KAON particle accelerator facility in Vancouver. Justice Minister Kim Campbell says that no deal on the project has been signed, but negotiations with British Columbia are going well. She said Ottawa is prepared to contribute a third of the operating costs. The facility is intended to investigate the basic structure of matter by smashing atoms into their tiniest components known as quarks. It's estimated that operating costs will be in the range of $90 million a year. Campbell said the United States is willing to contribute $100 million toward the project, but did not know what this would be for. Debate about the KAON facility within the scientific community has been raging for years. Many scientists fear KAON would draw money away from other areas of research, which already face chronic financial problems. Campbell insisted that KAON would not distort overall research priorities, but made no firm commitments about increases for other areas of science. She said money for KAON, assuming the project does get final approval, will not be delivered before the 1994 fiscal year and won't affect efforts to reduce the federal deficit

  18. Unit Pricing and Alternatives: Developing an Individualized Shopping Strategy.

    Science.gov (United States)

    Cude, Brenda; Walker, Rosemary

    1985-01-01

    This article offers a new perspective on the teaching of unit pricing in consumer economics classes by identifying ways to teach the costs as well as the benefits of unit pricing and realistic guidelines for suggesting situations in which it is most appropriate. Alternatives to unit pricing will also be explored. (CT)

  19. 48 CFR 852.273-70 - Late offers.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 5 2010-10-01 2010-10-01 false Late offers. 852.273-70... SOLICITATION PROVISIONS AND CONTRACT CLAUSES Texts of Provisions and Clauses 852.273-70 Late offers. As prescribed in 873.110(a), insert the following provision: Late Offers (JAN 2003) This provision replaces...

  20. Innovative gas offers; Les offres gazieres innovantes

    Energy Technology Data Exchange (ETDEWEB)

    Sala, O.; Mela, P. [Gaz de France (GDF), 75 - Paris (France); Chatelain, F. [Primagaz, 75 - Paris (France)

    2007-07-01

    New energy offers are progressively made available as the opening of the gas market to competition broadens. How are the combined offers organized: gas, electricity, renewable energies and energy services? What marketing strategies are implemented? Three participants at this round table present their offers and answer these questions. (J.S.)

  1. When Are Caregivers More Likely to Offer Sugary Drinks and Snacks to Infants? A Qualitative Thematic Synthesis.

    Science.gov (United States)

    Moore, Deborah Anne; Goodwin, Tom Lloyd; Brocklehurst, Paul R; Armitage, Christopher J; Glenny, Anne-Marie

    2017-01-01

    Many children consume more sugar than is recommended, and caregivers often find it difficult to change this habit once established. This thematic synthesis aims to identify the "critical situations" where caregivers may be more likely to offer infants sugary drinks and snacks. This thematic synthesis is reported in accordance with the statement for enhancing transparency in reporting the synthesis of qualitative research (ENTREQ). Our confidence in the findings of our synthesis was assessed using the CERQual (Confidence in the Evidence From Reviews of Qualitative Research Approach). We included 16 studies from the United States, the United Kingdom, Australia, and Denmark. We identified eight "critical situations" when caregivers may be more likely to offer sugary drinks and snacks to infants. Interventions that seek to reduce sugar intake for caries prevention in infants and young children may be more successful if they provide caregivers with practical parenting strategies to replace the nonnutritive functions of sugary foods and drinks, as opposed to taking an information-giving approach. © The Author(s) 2016.

  2. High performance stream computing for particle beam transport simulations

    International Nuclear Information System (INIS)

    Appleby, R; Bailey, D; Higham, J; Salt, M

    2008-01-01

    Understanding modern particle accelerators requires simulating charged particle transport through the machine elements. These simulations can be very time consuming due to the large number of particles and the need to consider many turns of a circular machine. Stream computing offers an attractive way to dramatically improve the performance of such simulations by calculating the simultaneous transport of many particles using dedicated hardware. Modern Graphics Processing Units (GPUs) are powerful and affordable stream computing devices. The results of simulations of particle transport through the booster-to-storage-ring transfer line of the DIAMOND synchrotron light source using an NVidia GeForce 7900 GPU are compared to the standard transport code MAD. It is found that particle transport calculations are suitable for stream processing and large performance increases are possible. The accuracy and potential speed gains are compared and the prospects for future work in the area are discussed
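
    As a minimal illustration of the per-particle parallelism this record describes, translated into today's terms, the CUDA sketch below applies a linear transfer-map step (an illustrative drift of length L) to many particles at once. It is not the authors' GeForce 7900 shader implementation; all names and the drift length are illustrative.

        #include <cuda_runtime.h>

        // One particle in a single transverse plane: position x [m], angle xp [rad].
        struct Particle { float x, xp; };

        // One thread per particle: apply the linear drift map x -> x + L*xp.
        __global__ void driftKernel(Particle* p, int n, float L) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) p[i].x += L * p[i].xp;
        }

        int main() {
            const int n = 1 << 20;                        // ~10^6 particles
            Particle* d_p;
            cudaMalloc(&d_p, n * sizeof(Particle));
            cudaMemset(d_p, 0, n * sizeof(Particle));     // placeholder initial beam
            driftKernel<<<(n + 255) / 256, 256>>>(d_p, n, 0.5f);  // 0.5 m drift
            cudaDeviceSynchronize();
            cudaFree(d_p);
            return 0;
        }

    In a real tracking code the single drift map would be replaced by the full sequence of element maps for the transfer line, applied turn by turn.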

  3. 48 CFR 2825.203 - Evaluating offers.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 6 2010-10-01 2010-10-01 true Evaluating offers. 2825.203 Section 2825.203 Federal Acquisition Regulations System DEPARTMENT OF JUSTICE Socioeconomic Programs FOREIGN ACQUISITION Buy American Act-Construction Materials 2825.203 Evaluating offers. The HCA, or...

  4. 48 CFR 19.804-2 - Agency offering.

    Science.gov (United States)

    2010-10-01

    ... 48 Federal Acquisition Regulations System 1 2010-10-01 2010-10-01 false Agency offering. 19.804-2....804-2 Agency offering. (a) After completing its evaluation, the agency must notify the SBA of the... statement that prior to the offering no solicitation for the specific acquisition has been issued as a small...

  5. 14 CFR 406.155 - Offer of proof.

    Science.gov (United States)

    2010-01-01

    ... 14 Aeronautics and Space 4 2010-01-01 2010-01-01 false Offer of proof. 406.155 Section 406.155... Transportation Adjudications § 406.155 Offer of proof. A party whose evidence has been excluded by a ruling of the administrative law judge may offer the evidence for the record on appeal. ...

  6. 36 CFR 1150.79 - Offer of proof.

    Science.gov (United States)

    2010-07-01

    ... 36 Parks, Forests, and Public Property 3 2010-07-01 2010-07-01 false Offer of proof. 1150.79... BOARD PRACTICE AND PROCEDURES FOR COMPLIANCE HEARINGS Hearing Procedures § 1150.79 Offer of proof. An offer of proof made in connection with an objection taken to a ruling of the judge rejecting or...

  7. Introduction to assembly of finite element methods on graphics processors

    International Nuclear Information System (INIS)

    Cecka, Cristopher; Lew, Adrian; Darve, Eric

    2010-01-01

    Recently, graphics processing units (GPUs) have had great success in accelerating numerical computations. We present their application to computations on unstructured meshes such as those in finite element methods. Multiple approaches in assembling and solving sparse linear systems with NVIDIA GPUs and the Compute Unified Device Architecture (CUDA) are presented and discussed. Multiple strategies for efficient use of global, shared, and local memory, methods to achieve memory coalescing, and optimal choice of parameters are introduced. We find that with appropriate preprocessing and arrangement of support data, the GPU coprocessor achieves speedups of 30x or more in comparison to a well optimized serial implementation on the CPU. We also find that the optimal assembly strategy depends on the order of polynomials used in the finite-element discretization.
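
    The one-thread-per-element scatter-add pattern alluded to in the abstract can be sketched on a toy 1D Poisson problem with linear (P1) elements. This is not the authors' CUDA assembly strategy; it uses a dense global matrix for brevity, where real codes use sparse storage and the coalescing and shared-memory schemes the paper discusses.

        #include <cuda_runtime.h>

        // One thread per element: compute the 2x2 local stiffness (1/h)*[1 -1; -1 1]
        // for element e spanning nodes e and e+1, and scatter-add it into the
        // dense global matrix K (nNodes x nNodes, row-major).
        __global__ void assemble1D(const float* x, int nElem, float* K, int nNodes) {
            int e = blockIdx.x * blockDim.x + threadIdx.x;
            if (e >= nElem) return;
            float k = 1.0f / (x[e + 1] - x[e]);
            int n0 = e, n1 = e + 1;
            atomicAdd(&K[n0 * nNodes + n0],  k);
            atomicAdd(&K[n0 * nNodes + n1], -k);
            atomicAdd(&K[n1 * nNodes + n0], -k);
            atomicAdd(&K[n1 * nNodes + n1],  k);
        }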

  8. Generating Billion-Edge Scale-Free Networks in Seconds: Performance Study of a Novel GPU-based Preferential Attachment Model

    Energy Technology Data Exchange (ETDEWEB)

    Perumalla, Kalyan S. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Alam, Maksudul [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2017-10-01

    A novel parallel algorithm is presented for generating random scale-free networks using the preferential-attachment model. The algorithm, named cuPPA, is custom-designed for single instruction multiple data (SIMD) style of parallel processing supported by modern processors such as graphical processing units (GPUs). To the best of our knowledge, our algorithm is the first to exploit GPUs, and also the fastest implementation available today, to generate scale free networks using the preferential attachment model. A detailed performance study is presented to understand the scalability and runtime characteristics of the cuPPA algorithm. In one of the best cases, when executed on an NVidia GeForce 1080 GPU, cuPPA generates a scale free network of a billion edges in less than 2 seconds.

  9. 14 CFR 16.231 - Offer of proof.

    Science.gov (United States)

    2010-01-01

    ... 14 Aeronautics and Space 1 2010-01-01 2010-01-01 false Offer of proof. 16.231 Section 16.231... PRACTICE FOR FEDERALLY-ASSISTED AIRPORT ENFORCEMENT PROCEEDINGS Hearings § 16.231 Offer of proof. A party whose evidence has been excluded by a ruling of the hearing officer may offer the evidence on the record...

  10. 49 CFR 1503.641 - Offer of proof.

    Science.gov (United States)

    2010-10-01

    ... 49 Transportation 9 2010-10-01 2010-10-01 false Offer of proof. 1503.641 Section 1503.641... Rules of Practice in TSA Civil Penalty Actions § 1503.641 Offer of proof. A party whose evidence has been excluded by a ruling of the ALJ may offer the evidence for the record on appeal. ...

  11. 38 CFR 18b.64 - Offer of proof.

    Science.gov (United States)

    2010-07-01

    ... 38 Pensions, Bonuses, and Veterans' Relief 2 2010-07-01 2010-07-01 false Offer of proof. 18b.64... Procedures § 18b.64 Offer of proof. An offer of proof made in connection with an objection taken to any... record as the offer of proof. ...

  12. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    Do not hesitate to take advantage of the offers from our partners: Théâtre de Carouge Discount of 5 CHF on all shows (30 CHF instead of 35 CHF) and on season tickets « first performance » (132 CHF instead of 162 CHF) and « classical » (150 CHF instead of 180 CHF) upon presentation of your Staff Association membership card before payment. Théâtre La Comédie de Genève 20% off on tickets (full price – also available for partner): from 24 to 32 CHF a ticket instead of 30 to 40 CHF depending on the shows. 40% off on annual subscriptions (access to the best seats, pick up tickets at the last minute): 200 CHF for 9 shows (about 22 CHF a ticket instead of 30 to 40 CHF). Discounted card: 60 CHF and single price ticket of 16 CHF.

  13. 14 CFR 13.225 - Offer of proof.

    Science.gov (United States)

    2010-01-01

    ... 14 Aeronautics and Space 1 2010-01-01 2010-01-01 false Offer of proof. 13.225 Section 13.225... INVESTIGATIVE AND ENFORCEMENT PROCEDURES Rules of Practice in FAA Civil Penalty Actions § 13.225 Offer of proof. A party whose evidence has been excluded by a ruling of the administrative law judge may offer the...

  14. 34 CFR 101.85 - Offer of proof.

    Science.gov (United States)

    2010-07-01

    ... 34 Education 1 2010-07-01 2010-07-01 false Offer of proof. 101.85 Section 101.85 Education... PRACTICE AND PROCEDURE FOR HEARINGS UNDER PART 100 OF THIS TITLE Hearing Procedures § 101.85 Offer of proof. An offer of proof made in connection with an objection taken to any ruling of the presiding officer...

  15. 7 CFR 15.122 - Offer of proof.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 1 2010-01-01 2010-01-01 false Offer of proof. 15.122 Section 15.122 Agriculture..., Decisions and Administrative Review Under the Civil Rights Act of 1964 Hearing Procedures § 15.122 Offer of proof. An offer of proof made in connection with an objection taken to any ruling of the hearing officer...

  16. 43 CFR 4.840 - Offer of proof.

    Science.gov (United States)

    2010-10-01

    ... the Interior-Effectuation of Title VI of the Civil Rights Act of 1964 Hearing § 4.840 Offer of proof. An offer of proof made in connection with an objection taken to any ruling of the administrative law... 43 Public Lands: Interior 1 2010-10-01 2010-10-01 false Offer of proof. 4.840 Section 4.840 Public...

  17. Employer health insurance offerings and employee enrollment decisions.

    Science.gov (United States)

    Polsky, Daniel; Stein, Rebecca; Nicholson, Sean; Bundorf, M Kate

    2005-10-01

    To determine how the characteristics of the health benefits offered by employers affect worker insurance coverage decisions. The 1996-1997 and the 1998-1999 rounds of the nationally representative Community Tracking Study Household Survey. We use multinomial logistic regression to analyze the choice between own-employer coverage, alternative source coverage, and no coverage among employees offered health insurance by their employer. The key explanatory variables are the types of health plans offered and the net premium offered. The models include controls for personal, health plan, and job characteristics. When an employer offers only a health maintenance organization (HMO), married employees are more likely to decline coverage from their employer and take up another offer (odds ratio (OR) = 1.27). The type of health plan coverage an employer offers affects whether its employees take up insurance, but has a smaller effect on overall coverage rates for workers and their families because of the availability of alternative sources of coverage. Relative to offering only a non-HMO plan, employers offering only an HMO may reduce take-up among those with alternative sources of coverage, but increase take-up among those who would otherwise go uninsured. By modeling the possibility of take-up through the health insurance offers from the employer of the spouse, the decline in coverage rates from higher net premiums is less than previous estimates.

  18. 12 CFR 335.501 - Tender offers.

    Science.gov (United States)

    2010-01-01

    ... 12 Banks and Banking 4 2010-01-01 2010-01-01 false Tender offers. 335.501 Section 335.501 Banks and Banking FEDERAL DEPOSIT INSURANCE CORPORATION REGULATIONS AND STATEMENTS OF GENERAL POLICY SECURITIES OF NONMEMBER INSURED BANKS § 335.501 Tender offers. The provisions of the applicable and currently...

  19. 12 CFR 563g.2 - Offering circular requirement.

    Science.gov (United States)

    2010-01-01

    ... 12 Banks and Banking 5 2010-01-01 2010-01-01 false Offering circular requirement. 563g.2 Section 563g.2 Banks and Banking OFFICE OF THRIFT SUPERVISION, DEPARTMENT OF THE TREASURY SECURITIES OFFERINGS § 563g.2 Offering circular requirement. (a) General. No savings association shall offer or sell, directly...

  20. Strategies of persuasion in offers to participate in cancer clinical trials I: Topic placement and topic framing.

    Science.gov (United States)

    Barton, Ellen; Eggly, Susan; Winckles, Andrew; Albrecht, Terrance L

    2014-01-01

    Clinical trials are the gold standard in medical research evaluating new treatments in cancer care; however, in the United States, too few patients enroll in trials, especially patients from minority groups. Offering patients the option of a clinical trial is an ethically-charged communicative event for oncologists. One particularly vexed ethical issue is the use of persuasion in trial offers. Based on a corpus of 22 oncology encounters with Caucasian-American (n = 11) and African-American (n = 11) patients, this discourse analysis describes oncologists' use of two persuasive strategies related to the linguistic structure of trial offers: topic placement and topic framing. Findings are presented in total and by patient race, and discussed in terms of whether these strategies may constitute ethical or unethical persuasion, particularly with respect to the ethical issue of undue influence and the social issue of underrepresentation of minorities in cancer clinical trials.

  1. 12 CFR 563g.4 - Non-public offering.

    Science.gov (United States)

    2010-01-01

    ... 12 Banks and Banking 5 2010-01-01 2010-01-01 false Non-public offering. 563g.4 Section 563g.4 Banks and Banking OFFICE OF THRIFT SUPERVISION, DEPARTMENT OF THE TREASURY SECURITIES OFFERINGS § 563g.4 Non-public offering. Offers and sales of securities by an issuer that satisfy the conditions of...

  2. Karl Marx, Ludwig Wittgenstein, and Black Underachievement in the United States and United Kingdom

    Science.gov (United States)

    Tomlin, Carol; Wright, Cecile; Mocombe, Paul C.

    2013-01-01

    This article synthesizes Marxian conceptions of identity construction within capitalist relations of production with the Wittgensteinian notion of "language games" to offer a more appropriate relational framework within which scholars ought to understand the Black-White academic achievement gap in America, the United Kingdom, and…

  3. 43 CFR 12.815 - Evaluating offers.

    Science.gov (United States)

    2010-10-01

    ... 43 Public Lands: Interior 1 2010-10-01 2010-10-01 false Evaluating offers. 12.815 Section 12.815 Public Lands: Interior Office of the Secretary of the Interior ADMINISTRATIVE AND AUDIT REQUIREMENTS AND... Act-Construction Materials § 12.815 Evaluating offers. (a) The restrictions of the Buy American Act do...

  4. Offers for our members

    CERN Multimedia

    Staff Association

    2013-01-01

    The Courir shops propose the following offer: 15% discount on all articles (not on sales) in the Courir shops (Val Thoiry, Annemasse and Neydens) and 5% discount on sales upon presentation of your Staff Association membership card and an identity card before payment. Summer is here, enjoy our offers for the aquatic parks! Walibi: Tickets "Zone terrestre": 21 € instead of 26 €. Access to Aqualibi: 5 € instead of 8 € on presentation of your SA member ticket. Free for children (3-11 years old) before 12:00. Free for children under 3, with limited access to the attractions. Free car park. * * * * * Aquaparc: Day ticket: – Children: 30 CHF instead of 39 CHF – Adults: 36 CHF instead of 49 CHF. Bonus! Free for children under 5.

  5. Offers

    CERN Multimedia

    Staff Association

    2011-01-01

    At the UN Cultural Kiosk (door C6). This offer is meant for international civil servants and members of diplomatic missions, as well as official delegates, upon presentation of their accreditation card. Matthew Lee & 5 musicians: Blues, Boogie and Rock’n’Roll. 28 October 2011 at 8.30 p.m., Théâtre du Léman, Quai du Mont-Blanc 19, Hôtel Kempinski, Geneva. Matthew Lee is an exciting pianist and singer combining classic Rock’n’Roll with timeless ballads. He revisits the standards, being alternately Jerry Lee Lewis, Chuck Berry, Little Richard and many others... He is a showman with a soulful voice and displays virtuosity during his piano solos. Simply amazing! 20% reduction. Tickets from 32 to 68 CHF. Kiosque Culturel ONU, Palais des Nations, Porte 6, Avenue de la Paix 8-14, 1211 Genève 10. Tél. 022 917 11 11, info@kiosqueonu.ch

  6. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    FUTUREKIDS offers a 15% discount to Staff Association members who enroll their children in FUTUREKIDS activities. New workshop for 12-15 year olds on how to develop applications for Android phones. Easter activities calendar. Extracurricular activities for your children: the FUTUREKIDS Geneva Learning Center is open 6 days a week and offers a selection of after-school extracurricular activities for children and teenagers (ages 5 to 16). In addition to teaching in its Learning Centers, Futurekids collaborates with many private schools in Suisse Romande (Florimont, Moser, Champittet, Ecole Nouvelle, etc.) and with the Département de l'Instruction Publique (DIP) Genève. Courses and camps are usually in French, but English groups can be set up on demand. FUTUREKIDS Computer Camps (during school holidays) are a way of having a great time during vacations while learning something useful, possibly discovering a new hobby or even, why not, a fut...

  7. Offer

    CERN Multimedia

    Staff Association

    2015-01-01

    RRP Communication organizes cultural events such as concerts, shows and sporting events. Members of the Staff Association benefit from a reduction of 10 CHF per ticket. How to proceed: reservations are made by e-mail to info@rrp.ch, giving the following information:
    – name of the show and chosen date;
    – number of tickets and category;
    – name and surname;
    – address;
    – telephone number.
    Mention “offer CERN” and attach a photocopy of your Staff Association membership card. After your reservation, you will be sent a copy with a payslip to the address mentioned above. Once paid, members may either pick up their ticket(s) at the cash register on the evening of the show (opens 1 hour before the show) by showing their membership card, or receive the ticket(s) at the address indicated above by registered mail, subject to an extra cost of 10 CHF. Next show: more information at http://www.rrp.ch/

  8. Offers

    CERN Multimedia

    Staff Association

    2011-01-01

    Special offer for members of the Staff Association and their families: 10% reduction on all products in the SEPHORA shop (perfume, beauty products, etc.) in Val Thoiry ALL YEAR ROUND, plus 20% reduction during their “vente privée” (private sale)* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * Next “vente privée” from 21st to 26th November 2011. New BCGE business partner benefits: as you may remember, thanks to our BCGE business partner agreement you benefit from various advantages, such as a free annual subscription on your Silver or Gold credit card, both for yourself and your partner (joint account). Please be informed that as of October 1st 2011 the features mentioned below will be added to your annual credit card subscription: MasterCard/Visa Silver and Gold: travel cancellation as well as related services such as holiday interruption; best guaranteed price. Only for Ma...

  9. Two schemes for rapid generation of digital video holograms using PC cluster

    Science.gov (United States)

    Park, Hanhoon; Song, Joongseok; Kim, Changseob; Park, Jong-Il

    2017-12-01

    Computer-generated holography (CGH), which is a process of generating digital holograms, is computationally expensive. Recently, several methods/systems of parallelizing the process using graphic processing units (GPUs) have been proposed. Indeed, use of multiple GPUs or a personal computer (PC) cluster (each PC with GPUs) enabled great improvements in the process speed. However, extant literature has less often explored systems involving rapid generation of multiple digital holograms and specialized systems for rapid generation of a digital video hologram. This study proposes a system that uses a PC cluster and is able to more efficiently generate a video hologram. The proposed system is designed to simultaneously generate multiple frames and accelerate the generation by parallelizing the CGH computations across a number of frames, as opposed to separately generating each individual frame while parallelizing the CGH computations within each frame. The proposed system also enables the subprocesses for generating each frame to execute in parallel through multithreading. With these two schemes, the proposed system significantly reduced the data communication time for generating a digital hologram when compared with that of the state-of-the-art system.
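
    The frame-level parallelism described here, generating several frames concurrently rather than parallelizing only within one frame, maps naturally onto CUDA streams on a single device. A structural sketch follows; the CGH kernel body is a placeholder (real CGH sums point-source contributions per hologram pixel), and all names are illustrative rather than taken from the paper.

        #include <cuda_runtime.h>

        // Placeholder CGH kernel: fills one hologram frame.
        __global__ void cghKernel(float* frame, int nPixels, int frameId) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < nPixels) frame[i] = 0.0f;   // placeholder computation
        }

        void generateFrames(float* d_frames, int nPixels, int nFrames) {
            const int nStreams = 4;
            cudaStream_t streams[nStreams];
            for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);
            // Issue each frame on its own stream so several frames are in flight
            // at once; the paper additionally spreads frames across cluster nodes.
            for (int f = 0; f < nFrames; ++f)
                cghKernel<<<(nPixels + 255) / 256, 256, 0, streams[f % nStreams]>>>(
                    d_frames + (size_t)f * nPixels, nPixels, f);
            cudaDeviceSynchronize();
            for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
        }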

  10. GPU-Based Cloud Service for Smith-Waterman Algorithm Using Frequency Distance Filtration Scheme

    Directory of Open Access Journals (Sweden)

    Sheng-Ta Lee

    2013-01-01

    As the conventional means of analyzing the similarity between a query sequence and database sequences, the Smith-Waterman algorithm is feasible for a database search owing to its high sensitivity. However, this algorithm is still quite time consuming. CUDA programming can improve computations efficiently by using the computational power of massive computing hardware such as graphics processing units (GPUs). This work presents a novel Smith-Waterman algorithm with a frequency-based filtration method on GPUs, rather than merely accelerating the comparisons yet expending computational resources to handle unnecessary comparisons. A user-friendly interface is also designed for potential cloud server applications with GPUs. Additionally, two data sets, H1N1 protein sequences (query sequence set) and a human protein database (database set), are selected, followed by a comparison of CUDA-SW and CUDA-SW with the filtration method, referred to herein as CUDA-SWf. Experimental results indicate that reducing unnecessary sequence alignments can improve the computational time by up to 41%. Importantly, by using CUDA-SWf as a cloud service, this application can be accessed from any computing environment of a device with an Internet connection without time constraints.

  11. GPU-based cloud service for Smith-Waterman algorithm using frequency distance filtration scheme.

    Science.gov (United States)

    Lee, Sheng-Ta; Lin, Chun-Yuan; Hung, Che Lun

    2013-01-01

    As the conventional means of analyzing the similarity between a query sequence and database sequences, the Smith-Waterman algorithm is feasible for a database search owing to its high sensitivity. However, this algorithm is still quite time consuming. CUDA programming can improve computations efficiently by using the computational power of massive computing hardware as graphics processing units (GPUs). This work presents a novel Smith-Waterman algorithm with a frequency-based filtration method on GPUs rather than merely accelerating the comparisons yet expending computational resources to handle such unnecessary comparisons. A user friendly interface is also designed for potential cloud server applications with GPUs. Additionally, two data sets, H1N1 protein sequences (query sequence set) and human protein database (database set), are selected, followed by a comparison of CUDA-SW and CUDA-SW with the filtration method, referred to herein as CUDA-SWf. Experimental results indicate that reducing unnecessary sequence alignments can improve the computational time by up to 41%. Importantly, by using CUDA-SWf as a cloud service, this application can be accessed from any computing environment of a device with an Internet connection without time constraints.
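
    Both records describe discarding database sequences by a cheap frequency distance before running the full alignment. A minimal CUDA sketch of such a filter follows, assuming the distance is an L1 distance between 20-bin amino-acid count vectors used as a lower bound on alignment dissimilarity; the papers' exact definition and threshold logic may differ, and all names are hypothetical.

        // One thread per database sequence: compare its residue-count vector with
        // the query's and keep only candidates below the threshold for the full
        // Smith-Waterman pass.
        __global__ void freqFilter(const int* dbFreq,   // nSeq x 20 counts, row-major
                                   const int* qFreq,    // 20 query counts
                                   int nSeq, int threshold, int* keep) {
            int s = blockIdx.x * blockDim.x + threadIdx.x;
            if (s >= nSeq) return;
            int dist = 0;
            for (int a = 0; a < 20; ++a) {
                int d = dbFreq[s * 20 + a] - qFreq[a];
                dist += d < 0 ? -d : d;
            }
            keep[s] = (dist <= threshold);
        }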

  12. CUDA/GPU Technology : Parallel Programming For High Performance Scientific Computing

    OpenAIRE

    YUHENDRA; KUZE, Hiroaki; JOSAPHAT, Tetuko Sri Sumantyo

    2009-01-01

    Graphics processing units (GPUs), originally designed for computer video cards, have emerged as the most powerful chip in a high-performance workstation. In terms of high-performance computation capabilities, GPUs deliver far more powerful performance than conventional CPUs by means of parallel processing. In 2007, the birth of the Compute Unified Device Architecture (CUDA) and CUDA-enabled GPUs by NVIDIA Corporation brought a revolution in the general purpose GPU a...
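
    For readers new to the CUDA model this record introduces, a self-contained SAXPY example (standard CUDA, not taken from the paper) shows the host/device split, thread indexing, and kernel launch syntax:

        #include <cstdio>
        #include <cuda_runtime.h>

        // y = a*x + y, one thread per vector element.
        __global__ void saxpy(int n, float a, const float* x, float* y) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) y[i] = a * x[i] + y[i];
        }

        int main() {
            const int n = 1 << 16;
            float *x, *y;
            cudaMallocManaged(&x, n * sizeof(float));   // unified memory for brevity
            cudaMallocManaged(&y, n * sizeof(float));
            for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
            saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
            cudaDeviceSynchronize();
            printf("y[0] = %f\n", y[0]);                // expect 5.0
            cudaFree(x); cudaFree(y);
            return 0;
        }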

  13. Negotiating for more: the multiple equivalent simultaneous offer.

    Science.gov (United States)

    Heller, Richard E

    2014-02-01

    Whether a doctor, professional baseball manager, or a politician, having successful negotiation skills is a critical part of being a leader. Building upon prior journal articles on negotiation strategy, the author presents the concept of the multiple equivalent simultaneous offer (MESO). The concept of a MESO is straightforward: as opposed to making a single offer, make multiple offers with several variables. Each offer alters the different variables, such that the end result of each offer is equivalent from the perspective of the party making the offer. Research has found several advantages to the use of MESOs. For example, using MESOs, an offer was more likely to be accepted, and the counterparty was more likely to be satisfied with the negotiated deal. Additional benefits have been documented as well, underscoring why a prepared radiology business leader should understand the theory and practice of MESO. Copyright © 2014 American College of Radiology. Published by Elsevier Inc. All rights reserved.

  14. Graphics Processing Unit-Enhanced Genetic Algorithms for Solving the Temporal Dynamics of Gene Regulatory Networks.

    Science.gov (United States)

    García-Calvo, Raúl; Guisado, J L; Diaz-Del-Rio, Fernando; Córdoba, Antonio; Jiménez-Morales, Francisco

    2018-01-01

    Understanding the regulation of gene expression is one of the key problems in current biology. A promising method for that purpose is the determination of the temporal dynamics between known initial and ending network states, by using simple acting rules. The huge amount of rule combinations and the nonlinear inherent nature of the problem make genetic algorithms an excellent candidate for finding optimal solutions. As this is a computationally intensive problem that needs long runtimes in conventional architectures for realistic network sizes, it is fundamental to accelerate this task. In this article, we study how to develop efficient parallel implementations of this method for the fine-grained parallel architecture of graphics processing units (GPUs) using the compute unified device architecture (CUDA) platform. An exhaustive and methodical study of various parallel genetic algorithm schemes (master-slave, island, cellular, and hybrid models) and various individual selection methods (roulette, elitist) is carried out for this problem. Several procedures that optimize the use of the GPU's resources are presented. We conclude that the implementation that produces better results (both from the performance and the genetic algorithm fitness perspectives) is simulating a few thousands of individuals grouped in a few islands using elitist selection. This model comprises two mighty factors for discovering the best solutions: finding good individuals in a short number of generations, and introducing genetic diversity via a relatively frequent and numerous migration. As a result, we have even found the optimal solution for the analyzed gene regulatory network (GRN). In addition, a comparative study of the performance obtained by the different parallel implementations on GPU versus a sequential application on CPU is carried out. In our tests, a multifold speedup was obtained for our optimized parallel implementation of the method on medium class GPU over an equivalent

  15. A Parallel Algebraic Multigrid Solver on Graphics Processing Units

    KAUST Repository

    Haase, Gundolf

    2010-01-01

    The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node Infiniband cluster and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core. © 2010 Springer-Verlag.
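
    The PCG-AMG solver described here rests on a parallel sparse matrix-vector product. The textbook scalar-CSR kernel below, with one thread per matrix row, conveys the basic idea; the paper's actual scheme is more elaborate and tuned for the many-core architecture.

        // y = A*x for a matrix A in CSR format; one thread per row.
        __global__ void spmvCsr(int nRows, const int* rowPtr, const int* colIdx,
                                const double* val, const double* x, double* y) {
            int row = blockIdx.x * blockDim.x + threadIdx.x;
            if (row >= nRows) return;
            double sum = 0.0;
            for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
                sum += val[j] * x[colIdx[j]];
            y[row] = sum;
        }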

  16. N-body simulation for self-gravitating collisional systems with a new SIMD instruction set extension to the x86 architecture, Advanced Vector eXtensions

    Science.gov (United States)

    Tanikawa, Ataru; Yoshikawa, Kohji; Okamoto, Takashi; Nitadori, Keigo

    2012-02-01

    We present a high-performance N-body code for self-gravitating collisional systems accelerated with the aid of a new SIMD instruction set extension of the x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). With one processor core of an Intel Core i7-2600 processor (8 MB cache, 3.40 GHz) based on the Sandy Bridge micro-architecture, we implemented a fourth-order Hermite scheme with an individual timestep scheme (Makino and Aarseth, 1992), and achieved a performance of ~20 giga floating point operations per second (GFLOPS) for double-precision accuracy, which is two times and five times higher than that of the previously developed code implemented with the SSE instructions (Nitadori et al., 2006b), and that of a code implemented without any explicit use of SIMD instructions on the same processor core, respectively. We have parallelized the code by using the so-called NINJA scheme (Nitadori et al., 2006a), and achieved ~90 GFLOPS for a system containing more than N = 8192 particles with 8 MPI processes on four cores. We expect to achieve about 10 tera FLOPS (TFLOPS) for a self-gravitating collisional system with N ~ 10⁵ on massively parallel systems with at most 800 cores with the Sandy Bridge micro-architecture. This performance will be comparable to that of Graphics Processing Unit (GPU) cluster systems, such as the one with about 200 Tesla C1070 GPUs (Spurzem et al., 2010). This paper offers an alternative to collisional N-body simulations with GRAPEs and GPUs.

  17. 49 CFR 594.9 - Fee for reimbursement of bond processing costs and costs for processing offers of cash deposits...

    Science.gov (United States)

    2010-10-01

    ... 49 Transportation 7 2010-10-01 2010-10-01 false Fee for reimbursement of bond processing costs and costs for processing offers of cash deposits or obligations of the United States in lieu of sureties on... indirect costs the agency incurs for receipt, processing, handling, and disbursement of cash deposits or...

  18. Offer

    CERN Multimedia

    Staff Association

    2016-01-01

    The “La Comédie” theatre unveiled its programme for the season 2016–2017 in late May, and it was met with great enthusiasm by the press. Leading names of the European and Swiss theatre scenes, such as director Joël Pommerat who recently won four Molière awards, will make an appearance! We are delighted to share this brand new, rich and varied programme with you. The “La Comédie” theatre offers various discounts to our members. Buy 2 subscriptions for the price of 1:
    – 2 “Libertà” cards for CHF 240.- instead of CHF 480.-. Cruise freely through the season with an 8-entry card valid for the shows of your choice. These cards are transferable and can be shared with one or more accompanying persons.
    – 2 “Piccolo” cards for CHF 120.- instead of CHF 240.-. This card lets you discover 4 shows which are suitable for all audiences (offers valid while stock lasts and until October 31, 20...

  19. 48 CFR 12.602 - Streamlined evaluation of offers.

    Science.gov (United States)

    2010-10-01

    ... offers. 12.602 Section 12.602 Federal Acquisition Regulations System FEDERAL ACQUISITION REGULATION... for Commercial Items 12.602 Streamlined evaluation of offers. (a) When evaluation factors are used... evaluation factors. (b) Offers shall be evaluated in accordance with the criteria contained in the...

  20. A Generic High-performance GPU-based Library for PDE solvers

    DEFF Research Database (Denmark)

    Glimberg, Stefan Lemvig; Engsig-Karup, Allan Peter

    The privilege of high-performance parallel computing is now in principle accessible for many scientific users, no matter their economic resources. Though being highly effective units, GPUs, and parallel architectures in general, pose challenges for software developers to utilize their efficiency. Sequential legacy codes are not always easily parallelized, and the time spent on conversion might not pay off in the end. We present a highly generic C++ library for fast assembling of partial differential equation (PDE) solvers, aiming at utilizing the computational resources of GPUs. The library requires a minimum of GPU computing knowledge, while still offering the possibility to customize user-specific solvers at kernel level if desired. Spatial differential operators are based on matrix-free flexible-order finite difference approximations. These matrix-free operators minimize both memory consumption and main memory access

  1. GBOOST: a GPU-based tool for detecting gene-gene interactions in genome-wide case control studies.

    Science.gov (United States)

    Yung, Ling Sing; Yang, Can; Wan, Xiang; Yu, Weichuan

    2011-05-01

    Collecting millions of genetic variations is feasible with the advanced genotyping technology. With a huge amount of genetic variations data in hand, developing efficient algorithms to carry out the gene-gene interaction analysis in a timely manner has become one of the key problems in genome-wide association studies (GWAS). Boolean operation-based screening and testing (BOOST), a recent work in GWAS, completes gene-gene interaction analysis in 2.5 days on a desktop computer. Compared with central processing units (CPUs), graphic processing units (GPUs) are highly parallel hardware and provide massive computing resources. We are, therefore, motivated to use GPUs to further speed up the analysis of gene-gene interactions. We implement the BOOST method based on a GPU framework and name it GBOOST. GBOOST achieves a 40-fold speedup compared with BOOST. It completes the analysis of Wellcome Trust Case Control Consortium Type 2 Diabetes (WTCCC T2D) genome data within 1.34 h on a desktop computer equipped with Nvidia GeForce GTX 285 display card. GBOOST code is available at http://bioinformatics.ust.hk/BOOST.html#GBOOST.
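
    BOOST-style Boolean screening maps naturally onto GPU bit operations. The sketch below assumes each SNP stores, per genotype class, a bitmask over subjects, so that one cell of a SNP pair's 3x3 contingency table is the popcount of an AND of two masks. The data layout and names are assumptions for illustration, not GBOOST's actual code.

        // snpA, snpB: 3 bitmasks each (genotype classes 0/1/2), packed into
        // `words` 32-bit words over all subjects. cells: 9 zeroed counters.
        __global__ void pairContingency(const unsigned* snpA, const unsigned* snpB,
                                        int words, int* cells) {
            __shared__ int local[9];
            if (threadIdx.x < 9) local[threadIdx.x] = 0;
            __syncthreads();
            // Threads stride over the packed words; __popc counts set bits.
            for (int w = threadIdx.x; w < words; w += blockDim.x)
                for (int g1 = 0; g1 < 3; ++g1)
                    for (int g2 = 0; g2 < 3; ++g2)
                        atomicAdd(&local[g1 * 3 + g2],
                                  __popc(snpA[g1 * words + w] & snpB[g2 * words + w]));
            __syncthreads();
            if (threadIdx.x < 9) atomicAdd(&cells[threadIdx.x], local[threadIdx.x]);
        }

    In a full genome-wide scan one block would be launched per SNP pair, with the resulting tables fed to the BOOST test statistic.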

  2. FARGO3D: A NEW GPU-ORIENTED MHD CODE

    Energy Technology Data Exchange (ETDEWEB)

    Benitez-Llambay, Pablo [Instituto de Astronomía Teórica y Experimental, Observatorio Astronónomico, Universidad Nacional de Córdoba. Laprida 854, X5000BGR, Córdoba (Argentina); Masset, Frédéric S., E-mail: pbllambay@oac.unc.edu.ar, E-mail: masset@icf.unam.mx [Instituto de Ciencias Físicas, Universidad Nacional Autónoma de México (UNAM), Apdo. Postal 48-3,62251-Cuernavaca, Morelos (Mexico)

    2016-03-15

    We present the FARGO3D code, recently publicly released. It is a magnetohydrodynamics code developed with special emphasis on the physics of protoplanetary disks and planet–disk interactions, and parallelized with MPI. The hydrodynamics algorithms are based on finite-difference upwind, dimensionally split methods. The magnetohydrodynamics algorithms consist of the constrained transport method to preserve the divergence-free property of the magnetic field to machine accuracy, coupled to a method of characteristics for the evaluation of electromotive forces and Lorentz forces. Orbital advection is implemented, and an N-body solver is included to simulate planets or stars interacting with the gas. We present our implementation in detail and present a number of widely known tests for comparison purposes. One strength of FARGO3D is that it can run on either graphical processing units (GPUs) or central processing units (CPUs), achieving large speed-up with respect to CPU cores. We describe our implementation choices, which allow a user with no prior knowledge of GPU programming to develop new routines for CPUs, and have them translated automatically for GPUs.

  3. Optimal hydro scheduling and offering strategies considering price uncertainty and risk management

    International Nuclear Information System (INIS)

    Catalão, J.P.S.; Pousinho, H.M.I.; Contreras, J.

    2012-01-01

    Hydro energy represents a priority in the energy policy of Portugal, with the aim of decreasing the dependence on fossil fuels. In this context, optimal hydro scheduling acquires added significance in moving towards a sustainable environment. A mixed-integer nonlinear programming approach is considered to enable optimal hydro scheduling for the short-term time horizon, including the effect of head on power production, start-up costs related to the units, multiple regions of operation, and constraints on discharge variation. As new contributions to the field, market uncertainty is introduced in the model via price scenarios and risk management is included using Conditional Value-at-Risk to limit profit volatility. Moreover, plant scheduling and pool offering by the hydro power producer are simultaneously considered to solve a realistic cascaded hydro system. -- Highlights: ► A mixed-integer nonlinear programming approach is considered for optimal hydro scheduling. ► Market uncertainty is introduced in the model via price scenarios. ► Risk management is included using conditional value-at-risk. ► Plant scheduling and pool offering by the hydro power producer are simultaneously considered. ► A realistic cascaded hydro system is solved.

  4. Information Uncertainty in Electricity Markets: Introducing Probabilistic Offers

    DEFF Research Database (Denmark)

    Papakonstantinou, Athanasios; Pinson, Pierre

    2016-01-01

    We propose a shift from the current paradigm of electricity markets treating stochastic producers similarly to conventional ones in terms of their offers. We argue that the producers’ offers should be probabilistic to reflect the limited predictability of renewable energy generation, while we...... should design market mechanisms to accommodate such offers. We argue that the transition from deterministic offers is a natural next step in electricity markets, by analytically proving our proposal’s equivalence with a two-price conventional market....

  5. 5 CFR 339.302 - Authority to offer examinations.

    Science.gov (United States)

    2010-01-01

    ... 5 Administrative Personnel 1 2010-01-01 2010-01-01 false Authority to offer examinations. 339.302... QUALIFICATION DETERMINATIONS Medical Examinations § 339.302 Authority to offer examinations. An agency may, at its option, offer a medical examination (including a psychiatric evaluation) in any situation where...

  6. CHOLLA: A NEW MASSIVELY PARALLEL HYDRODYNAMICS CODE FOR ASTROPHYSICAL SIMULATION

    International Nuclear Information System (INIS)

    Schneider, Evan E.; Robertson, Brant E.

    2015-01-01

    We present Computational Hydrodynamics On ParaLLel Architectures (Cholla), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256³) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.

  7. CHOLLA: A NEW MASSIVELY PARALLEL HYDRODYNAMICS CODE FOR ASTROPHYSICAL SIMULATION

    Energy Technology Data Exchange (ETDEWEB)

    Schneider, Evan E.; Robertson, Brant E. [Steward Observatory, University of Arizona, 933 North Cherry Avenue, Tucson, AZ 85721 (United States)

    2015-04-15

    We present Computational Hydrodynamics On ParaLLel Architectures (Cholla), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256³) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.
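
    The cell-parallel update pattern underlying such codes can be sketched in a few lines. This is a first-order, one-dimensional finite-volume update with the interface fluxes assumed precomputed; Cholla's actual CTU/PPM machinery is far more involved, and all names here are illustrative.

        // Conservative update U_i^{n+1} = U_i^n - (dt/dx) * (F_{i+1/2} - F_{i-1/2}),
        // one thread per cell; F holds nCells+1 precomputed interface fluxes.
        __global__ void updateCells(double* U, const double* F,
                                    int nCells, double dtOverDx) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < nCells) U[i] -= dtOverDx * (F[i + 1] - F[i]);
        }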

  8. 17 CFR 230.155 - Integration of abandoned offerings.

    Science.gov (United States)

    2010-04-01

    ... offering disclose information about the abandoned private offering, including: (i) The size and nature of... were (or who the issuer reasonably believes were): (i) Accredited investors (as that term is defined in... document used in the private offering discloses any changes in the issuer's business or financial condition...

  9. Decision support for organ offers in liver transplantation.

    Science.gov (United States)

    Volk, Michael L; Goodrich, Nathan; Lai, Jennifer C; Sonnenday, Christopher; Shedden, Kerby

    2015-06-01

    Organ offers in liver transplantation are high-risk medical decisions with a low certainty of whether a better liver offer will come along before death. We hypothesized that decision support could improve the decision to accept or decline. With data from the Scientific Registry of Transplant Recipients, survival models were constructed for 42,857 waiting-list patients and 28,653 posttransplant patients from 2002 to 2008. Daily covariate-adjusted survival probabilities from these 2 models were combined into a 5-year area under the curve to create an individualized prediction of whether an organ offer should be accepted for a given patient. Among 650,832 organ offers from 2008 to 2013, patient survival was compared by whether the clinical decision was concordant or discordant with model predictions. The acceptance benefit (AB), the predicted gain or loss of life by accepting a given organ versus waiting for the next organ, ranged from 3 years to -22 years (harm) and varied geographically; for example, the average benefit of accepting a donation after cardiac death organ ranged from 0.47 to -0.71 years by donation service area. Among organ offers, even when AB was >1 year, the offer was only accepted 10% of the time. Patient survival from the time of the organ offer was better if the model recommendations and the clinical decision were concordant: for offers with AB > 0, the 3-year survival was 80% if the offer was accepted and 66% if it was declined. In conclusion, decision support may improve patient survival in liver transplantation. © 2015 American Association for the Study of Liver Diseases.

  10. High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm

    Directory of Open Access Journals (Sweden)

    Dieter Hendricks

    2016-02-01

    We implement a master-slave parallel genetic algorithm with a bespoke log-likelihood fitness function to identify emergent clusters within price evolutions. We use graphics processing units (GPUs) to implement a parallel genetic algorithm and visualise the results using disjoint minimal spanning trees. We demonstrate that our GPU parallel genetic algorithm, implemented on a commercially available general purpose GPU, is able to recover stock clusters at sub-second speed, based on a subset of stocks in the South African market. This approach represents a pragmatic choice for low-cost, scalable parallel computing and is significantly faster than a prototype serial implementation in an optimised C-based fourth-generation programming language, although the results are not directly comparable because of compiler differences. Combined with fast online intraday correlation matrix estimation from high-frequency data for cluster identification, the proposed implementation offers cost-effective, near-real-time risk assessment for financial practitioners.
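
    In the master-slave pattern the authors use, selection and crossover stay on the host while fitness evaluation, the expensive step, is farmed out to the device with one thread per individual. A structural sketch follows; the objective below is a placeholder, since the paper's actual fitness is a bespoke log-likelihood over cluster configurations, and all names are illustrative.

        // One thread per chromosome: evaluate the fitness of the whole population.
        // The host-side GA loop performs selection, crossover and mutation
        // between kernel launches.
        __global__ void evalFitness(const float* genomes, int popSize, int genomeLen,
                                    float* fitness) {
            int ind = blockIdx.x * blockDim.x + threadIdx.x;
            if (ind >= popSize) return;
            float f = 0.0f;
            for (int g = 0; g < genomeLen; ++g) {
                float v = genomes[ind * genomeLen + g];
                f -= v * v;               // placeholder objective: maximize -sum(v^2)
            }
            fitness[ind] = f;
        }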

  11. 7 CFR 29.57 - Where inspection is offered.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Where inspection is offered. 29.57 Section 29.57... REGULATIONS TOBACCO INSPECTION Regulations Permissive Inspection § 29.57 Where inspection is offered. Tobacco..., samplers, or weighers are available and the tobacco is offered under conditions that permit of its proper...

  12. 7 CFR 3431.17 - VMLRP service agreement offer.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 15 2010-01-01 2010-01-01 false VMLRP service agreement offer. 3431.17 Section 3431... Administration of the Veterinary Medicine Loan Repayment Program § 3431.17 VMLRP service agreement offer. The Secretary will make an offer to successful applicants to enter into an agreement with the Secretary to...

  13. 31 CFR 309.6 - Public notice of offering.

    Science.gov (United States)

    2010-07-01

    ... 31 Money and Finance: Treasury 2 2010-07-01 2010-07-01 false Public notice of offering. 309.6 Section 309.6 Money and Finance: Treasury Regulations Relating to Money and Finance (Continued) FISCAL... Public notice of offering. When Treasury bills are to be offered, tenders therefor will be invited...

  14. 19 CFR 172.32 - Authority to accept offers.

    Science.gov (United States)

    2010-04-01

    ... 19 Customs Duties 2 2010-04-01 2010-04-01 false Authority to accept offers. 172.32 Section 172.32 Customs Duties U.S. CUSTOMS AND BORDER PROTECTION, DEPARTMENT OF HOMELAND SECURITY; DEPARTMENT OF THE....32 Authority to accept offers. The authority to accept offers in compromise, subject to the...

  15. Confabulation Based Real-time Anomaly Detection for Wide-area Surveillance Using Heterogeneous High Performance Computing Architecture

    Science.gov (United States)

    2015-06-01

    The anomaly detection system targets a heterogeneous high-performance computing architecture with processors including graphics processing units (GPUs) and Intel Xeon Phi processors. Experimental results showed significant speedups, which can enable

  16. The Improvement of the Accommodation Offer in Vojvodina (Serbia as a Factor of its Competitiveness on the Market

    Directory of Open Access Journals (Sweden)

    Svetlana Vukosav

    2009-06-01

    The accommodation offer, particularly the hotel industry, in Vojvodina is experiencing significant changes today, both in quality and in quantity, compared with ten years ago. These positive changes and the improvement of the receptive base are a direct consequence of the transition process, ownership transformation and investment in accommodation facilities, which is reflected in the constant increase in foreign tourists and foreign exchange input, as well as in the market share and competitiveness of certain types of accommodation. Investment in the accommodation offer in Vojvodina is one of the priorities in the Strategy of Tourism Development of Serbia, where a significant increase in the number of accommodation units in the Province is expected.

  17. High-testosterone men reject low ultimatum game offers.

    Science.gov (United States)

    Burnham, Terence C

    2007-09-22

    The ultimatum game is a simple negotiation with the interesting property that people frequently reject offers of 'free' money. These rejections contradict the standard view of economic rationality. This divergence between economic theory and human behaviour is important and has no broadly accepted cause. This study examines the relationship between ultimatum game rejections and testosterone. In a variety of species, testosterone is associated with male seeking dominance. If low ultimatum game offers are interpreted as challenges, then high-testosterone men may be more likely to reject such offers. In this experiment, men who reject low offers ($5 out of $40) have significantly higher testosterone levels than those who accept. In addition, high testosterone levels are associated with higher ultimatum game offers, but this second finding is not statistically significant.

  18. 12 CFR 810.2 - Public notice of offering.

    Science.gov (United States)

    2010-01-01

    ... 12 Banks and Banking 6 2010-01-01 2010-01-01 false Public notice of offering. 810.2 Section 810.2 Banks and Banking FEDERAL FINANCING BANK FEDERAL FINANCING BANK BILLS § 810.2 Public notice of offering. On the occasion of an offering of FFB bills, tenders therefor will be invited through public notices...

  19. Utilizing GPUs to Accelerate Turbomachinery CFD Codes

    Science.gov (United States)

    MacCalla, Weylin; Kulkarni, Sameer

    2016-01-01

    GPU computing has established itself as a way to accelerate parallel codes in the high performance computing world. This work focuses on speeding up APNASA, a legacy CFD code used at NASA Glenn Research Center, while also drawing conclusions about the nature of GPU computing and the requirements to make GPGPU worthwhile on legacy codes. Rewriting and restructuring of the source code was avoided to limit the introduction of new bugs. The code was profiled and investigated for parallelization potential, then OpenACC directives were used to indicate parallel parts of the code. The use of OpenACC directives was not able to reduce the runtime of APNASA on either the NVIDIA Tesla discrete graphics card, or the AMD accelerated processing unit. Additionally, it was found that in order to justify the use of GPGPU, the amount of parallel work being done within a kernel would have to greatly exceed the work being done by any one portion of the APNASA code. It was determined that in order for an application like APNASA to be accelerated on the GPU, it should not be modular in nature, and the parallel portions of the code must contain a large portion of the code's computation time.

  20. 42 CFR 417.153 - Offer of HMO alternative.

    Science.gov (United States)

    2010-10-01

    ... 42 Public Health 3 2010-10-01 2010-10-01 false Offer of HMO alternative. 417.153 Section 417.153... § 417.153 Offer of HMO alternative. (a) Basic rule. An employing entity that is subject to this subpart and that elects to include one or more qualified HMOs must offer the HMO alternative in accordance...

  1. Real-time processing for full-range Fourier-domain optical-coherence tomography with zero-filling interpolation using multiple graphic processing units.

    Science.gov (United States)

    Watanabe, Yuuki; Maeno, Seiya; Aoshima, Kenji; Hasegawa, Haruyuki; Koseki, Hitoshi

    2010-09-01

    The real-time display of full-range, 2048 axial pixel × 1024 lateral pixel, Fourier-domain optical-coherence tomography (FD-OCT) images is demonstrated. The required speed was achieved by using dual graphics processing units (GPUs) with many stream processors to realize highly parallel processing. We used a zero-filling technique, including a forward Fourier transform, a zero padding to increase the axial data-array size to 8192, an inverse Fourier transform back to the spectral domain, a linear interpolation from wavelength to wavenumber, a lateral Hilbert transform to obtain the complex spectrum, a Fourier transform to obtain the axial profiles, and a log scaling. The data-transfer time of the frame grabber was 15.73 ms, and the processing time, which includes the data transfer between the GPU memory and the host computer, was 14.75 ms, for a total time shorter than the 36.70 ms frame-interval time using a line-scan CCD camera operated at 27.9 kHz. That is, our OCT system achieved a processed-image display rate of 27.23 frames/s.
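
    One stage of the zero-filling chain, the batched FFT over all A-lines of a frame, can be sketched with cuFFT using the record's 8192 × 1024 frame geometry. The padding, wavelength-to-wavenumber interpolation, Hilbert transform and log-scaling stages are omitted, error checking is elided, and the function name is illustrative.

        #include <cufft.h>

        // Batched 1D FFT over one frame: 1024 A-lines, each zero-padded to 8192
        // complex samples and stored contiguously in d_lines on the device.
        void fftStage(cufftComplex* d_lines) {
            cufftHandle plan;
            cufftPlan1d(&plan, 8192, CUFFT_C2C, 1024);            // 1024-line batch
            cufftExecC2C(plan, d_lines, d_lines, CUFFT_FORWARD);  // in-place FFT
            cufftDestroy(plan);
        }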

  2. Service Offering at Electrical Equipment Manufacturers

    Directory of Open Access Journals (Sweden)

    Lucie Kaňovská

    2015-09-01

    Purpose of the article: The aim of the paper is to uncover ways of managing the service offering provided by electrical equipment manufacturers in the Czech Republic. The segment is extremely important for Czech industry nowadays, especially because many companies are subcontractors for the car industry and mechanical engineering. The producers of electrical equipment comply with the Czech industry classification CZ-NACE 27. Methodology/methods: A questionnaire in the form of a Likert scale was prepared to gather information about customer services. The respondents were usually directors or managers, i.e. employees with a good knowledge of customer services in this particular market. A total of 22 companies were included in the survey. Research was focused on the following industry classifications belonging to CZ-NACE 27: CZ-NACE 27, CZ-NACE 271 and CZ-NACE 273. According to the Czech Statistical Office, the total number of companies belonging to these 3 segments is 136; thus 16.2% of the companies belonging to CZ-NACE 27 participated in our research. Basic statistical methods were used to analyse the complete database. Scientific aim: The paper deals with the problem of the service offering provided by today's manufacturers. Global understanding of the services that manufacturers really develop, sell, deliver and manage is still limited. Findings: Managing the service offering provided by today's manufacturers shows that 1) manufacturers offer not only tangible products, but also a wide range of services and even information and support; 2) new products are designed not only according to company technicians, but also according to their customers: products and services are developed, tested and improved according to customer needs; 3) services provide complex customer care, from product selection to the end of the product's life. Conclusions: Manufacturers of tangible products need to enlarge their product offering to be able to satisfy customers. Therefore

  3. Prevalence of health promotion programs in primary health care units in Brazil

    Science.gov (United States)

    Ramos, Luiz Roberto; Malta, Deborah Carvalho; Gomes, Grace Angélica de Oliveira; Bracco, Mário M; Florindo, Alex Antonio; Mielke, Gregore Iven; Parra, Diana C; Lobelo, Felipe; Simoes, Eduardo J; Hallal, Pedro Curi

    2014-01-01

    OBJECTIVE Assessment of prevalence of health promotion programs in primary health care units within Brazil’s health system. METHODS We conducted a cross-sectional descriptive study based on telephone interviews with managers of primary care units. Of a total 42,486 primary health care units listed in the Brazilian Unified Health System directory, 1,600 were randomly selected. Care units from all five Brazilian macroregions were selected proportionally to the number of units in each region. We examined whether any of the following five different types of health promotion programs was available: physical activity; smoking cessation; cessation of alcohol and illicit drug use; healthy eating; and healthy environment. Information was collected on the kinds of activities offered and the status of implementation of the Family Health Strategy at the units. RESULTS Most units (62.0%) reported having in place three health promotion programs or more and only 3.0% reported having none. Healthy environment (77.0%) and healthy eating (72.0%) programs were the most widely available; smoking and alcohol use cessation were reported in 54.0% and 42.0% of the units. Physical activity programs were offered in less than 40.0% of the units and their availability varied greatly nationwide, from 51.0% in the Southeast to as low as 21.0% in the North. The Family Health Strategy was implemented in most units (61.0%); however, they did not offer more health promotion programs than others did. CONCLUSIONS Our study showed that most primary care units have in place health promotion programs. Public policies are needed to strengthen primary care services and improve training of health providers to meet the goals of the agenda for health promotion in Brazil. PMID:25372175

  4. Fast ray-tracing of human eye optics on Graphics Processing Units.

    Science.gov (United States)

    Wei, Qi; Patkar, Saket; Pai, Dinesh K

    2014-05-01

    We present a new technique for simulating retinal image formation by tracing a large number of rays from objects in three dimensions as they pass through the optic apparatus of the eye to the retina. Simulating human optics is useful for understanding basic questions of vision science and for studying vision defects and their corrections. Because of the complexity of computing such simulations accurately, most previous efforts used simplified analytical models of the normal eye. This makes them less effective in modeling vision disorders associated with abnormal shapes of the ocular structures, which are hard to represent precisely with analytical surfaces. We have developed a computer simulator that can simulate ocular structures of arbitrary shapes, for instance represented by polygon meshes. Topographic and geometric measurements of the cornea, lens, and retina from keratometer or medical imaging data can be integrated for individualized examination. We utilize parallel processing on modern Graphics Processing Units (GPUs) to efficiently compute retinal images by tracing millions of rays. A stable retinal image can be generated within minutes. We simulated depth of field, accommodation, chromatic aberrations, as well as astigmatism and its correction. We also show application of the technique to patient-specific vision correction by incorporating geometric models of the orbit reconstructed from clinical medical images. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
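
    The per-ray parallelism described above maps naturally onto one GPU thread per ray. The following CUDA sketch is our own illustration, not the authors' mesh-based simulator: it refracts one ray per thread at a single analytic spherical surface via Snell's law, with an assumed anterior corneal radius of 7.8 mm and refractive indices 1.000 (air) to 1.376 (cornea); the published technique chains many surfaces and uses measured polygon-mesh geometry instead.

      // Minimal CUDA sketch: one ray per thread, refracted at one spherical
      // surface via Snell's law (illustrative only; the published simulator
      // traces polygon meshes of measured ocular structures).
      #include <cstdio>
      #include <cmath>

      struct Ray { float3 o, d; };                   // origin, unit direction

      __device__ float dot3(float3 a, float3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

      __device__ float3 refract3(float3 d, float3 n, float n1, float n2) {
          float eta = n1 / n2, c = -dot3(d, n);
          float k = 1.0f - eta*eta*(1.0f - c*c);
          if (k < 0.0f) return make_float3(0.f, 0.f, 0.f);  // total internal reflection
          float s = eta*c - sqrtf(k);
          return make_float3(eta*d.x + s*n.x, eta*d.y + s*n.y, eta*d.z + s*n.z);
      }

      __global__ void refractAtCornea(Ray* rays, int n, float R, float n1, float n2) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          Ray r = rays[i];
          float3 c = make_float3(0.f, 0.f, R);       // sphere centre on the optical axis
          float3 oc = make_float3(r.o.x-c.x, r.o.y-c.y, r.o.z-c.z);
          float b = dot3(oc, r.d);
          float disc = b*b - (dot3(oc, oc) - R*R);
          if (disc < 0.f) return;                    // ray misses the surface
          float t = -b - sqrtf(disc);                // first intersection
          r.o = make_float3(r.o.x + t*r.d.x, r.o.y + t*r.d.y, r.o.z + t*r.d.z);
          float3 nrm = make_float3((r.o.x-c.x)/R, (r.o.y-c.y)/R, (r.o.z-c.z)/R);
          r.d = refract3(r.d, nrm, n1, n2);
          rays[i] = r;
      }

      int main() {
          const int N = 1 << 20;                     // a million rays
          Ray* rays; cudaMallocManaged(&rays, N * sizeof(Ray));
          for (int i = 0; i < N; ++i) {
              rays[i].o = make_float3(1e-4f * i, 0.f, -10.f);
              rays[i].d = make_float3(0.f, 0.f, 1.f);
          }
          refractAtCornea<<<(N + 255)/256, 256>>>(rays, N, 7.8f, 1.000f, 1.376f);
          cudaDeviceSynchronize();
          printf("ray 0 direction: (%f, %f, %f)\n", rays[0].d.x, rays[0].d.y, rays[0].d.z);
          cudaFree(rays);
          return 0;
      }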

  5. NVidia Tutorial

    CERN Document Server

    CERN. Geneva; MESSMER, Peter; DEMOUTH, Julien

    2015-01-01

    This tutorial will present Caffe, a powerful deep learning framework with a Python interface for implementing solutions working on CPUs and GPUs, and explain how to use it to build and train Convolutional Neural Networks using NVIDIA GPUs. The session requires no prior experience with GPUs or Caffe.

  6. Hospitalized Patients' Responses to Offers of Prayer.

    Science.gov (United States)

    McMillan, Kathy; Taylor, Elizabeth Johnston

    2018-02-01

    Most Americans pray; many pray about their health. When they are hospitalized, however, do patients want an offer of prayer from a healthcare provider? This project allowed for the measurement of hospitalized patients' responses to massage therapists' offers of a colloquial prayer after a massage. After the intervention, 78 patients completed questionnaires that elicited quantitative data that were analyzed using uni- and bivariate statistical analyses. In this sample, 88% accepted the offer of prayer, 85% found it helpful, and 51% wanted prayer daily. Patients may welcome prayer, as long as the clinician shows "genuine kindness and respect."

  7. The offerings from the Hyperboreans.

    Science.gov (United States)

    Ruck, C A

    1983-08-01

    The ancient Greeks believed that the fruits of agriculture could be harvested only if one first appeased the spirit of the primitive avatars from which the edible crop had been evolved over the centuries through hybridization and cultivation. On occasion, this appeasement was secured through the sacrifice of a human victim, a person who for various reasons could be considered to represent a similar primitivism. By the classical age, this extreme form of sacrificial appeasement appears to have been reserved for times of unusual crisis, such as pestilence or natural disaster, for at such times, the resurgent forces of primitivism seemed to threaten the entire civilization with regression back to its wilder origins. Other forms of appeasement were ordinarily substituted for the actual offering of a human victim. Amongst these was the enactment of puberty rites, for the natural growth and maturation of an individual could be thought to symbolize this same evolutionary process. Each infant is born as a wild creature who must develop into a socialized adult through the metaphoric death of its former self as it assumes the responsibilities of civilized life in crossing the threshold to sexual maturity. A similar symbolic victim was customarily represented by the offering of first fruits. A portion of the cultivated crop was prematurely cut and consecrated to redeem and release the ripening harvest from the dangerous contamination with the spirits of its pre-agricultural precedents. On the island of Delos, a special version of this consecration was performed. Each year, the various Greek cities would send a sheaf of unripened grain to the sanctuary of the god Apollo and his twin sister Artemis. Amongst these annual offerings, there was one that was supposed to have originated from the Hyperboreans, a mythical people who were thought to live in the original homeland of the two gods. This special Hyperborean offering differed from the others, for it was said to contain a

  8. Speeding up IA mechanically-steered multistatic radar scheduling with GP-GPUs

    CSIR Research Space (South Africa)

    Focke, RW

    2016-07-01

    Full Text Available In this paper, the authors investigate speeding up the execution time of Interval Algebra (IA) mechanically-steered multistatic and multisite radar scheduling using a general-purpose graphical processing unit (GP-GPU). Multistatic/multisite radar...

  9. Microlensing observations rapid search for exoplanets: MORSE code for GPUs

    Science.gov (United States)

    McDougall, Alistair; Albrow, Michael D.

    2016-02-01

    The rapid analysis of ongoing gravitational microlensing events has been integral to the successful detection and characterization of cool planets orbiting low-mass stars in the Galaxy. In this paper, we present an implementation of search and fit techniques on graphical processing unit (GPU) hardware. The method allows for the rapid identification of candidate planetary microlensing events and their subsequent follow-up for detailed characterization.

  10. Offer of secondary reserve with a pool of electric vehicles on the German market

    International Nuclear Information System (INIS)

    Jargstorf, Johannes; Wickert, Manuel

    2013-01-01

    This paper analyzes the business case of offering secondary downward reserve for frequency control on the German market with a pool of electric vehicles. Former benchmark studies promised high revenues especially for this case, and such benefits could provide an incentive for customers to buy an electric vehicle. The business case is analyzed for the German market as a case study. Specific regulations for this market, real driving patterns and real market data are taken into account when calculating revenues. Secondary reserve is strictly regulated, requiring a very high level of availability. As a result, simulated revenues are lower than previously assumed: the simulation shows average revenues of less than 5 € per month and vehicle. Fully charged batteries are identified as the major bottleneck for an offer of secondary reserve, and the costs of communication and customer compensation pose an additional issue. Based on the simulation results, it is argued that the market for secondary reserve should not be accessed with these small units; for electric vehicles, more easily accessible markets with lower related costs should be considered instead. -- Highlights: •We analyze a business case of providing reserve power with electric vehicles. •We include legal regulations for providing reserve power in the calculation. •Reserve requirements lead to a significant drop in expected revenues. •Results show that vehicles are not suitable to offer reserve power

  11. Initial Public Offering – Finance Source of Stock

    Directory of Open Access Journals (Sweden)

    Sorin Claudiu Radu

    2013-10-01

    Full Text Available Capital market offers a wide range of options for financing companies, which can be tailored to meet their exact needs. Thus, they have the opportunity of a primary sale of securities (shares and bonds) on the stock exchange, which may take place through a tender, in which case the financial instruments issued by a company are underwritten at the date of issue, or through a secondary offer, in which case they are issued and offered for sale by the issuer. If the public sale offer focuses on shares and aims at transforming the issuing company into a public one, then it bears the name of IPO (Initial Public Offering). The present work traces the evolution of IPO trends on the European market in the aftermath of the global crisis outbreak. The IPO market on the BSE is also analyzed herewith.

  12. Multi-GPU configuration of 4D intensity modulated radiation therapy inverse planning using global optimization

    Science.gov (United States)

    Hagan, Aaron; Sawant, Amit; Folkerts, Michael; Modiri, Arezoo

    2018-01-01

    We report on the design, implementation and characterization of a multi-graphics processing unit (GPU) computational platform for higher-order optimization in radiotherapy treatment planning. In collaboration with a commercial vendor (Varian Medical Systems, Palo Alto, CA), a research prototype GPU-enabled Eclipse (V13.6) workstation was configured. The hardware consisted of dual 8-core Xeon processors, 256 GB RAM and four NVIDIA Tesla K80 general purpose GPUs. We demonstrate the utility of this platform for large radiotherapy optimization problems through the development and characterization of a parallelized particle swarm optimization (PSO) four-dimensional (4D) intensity modulated radiation therapy (IMRT) technique. The PSO engine was coupled to the Eclipse treatment planning system via a vendor-provided scripting interface. Specific challenges addressed in this implementation were (i) data management and (ii) non-uniform memory access (NUMA). For the former, we alternated between parameters over which the computation process was parallelized. For the latter, we reduced the amount of data required to be transferred over the NUMA bridge. The datasets examined in this study were approximately 300 GB in size, including 4D computed tomography images, anatomical structure contours and dose deposition matrices. For evaluation, we created a 4D-IMRT treatment plan for one lung cancer patient and analyzed computation speed while varying several parameters (number of respiratory phases, GPUs, PSO particles, and data matrix sizes). The optimized 4D-IMRT plan enhanced sparing of organs at risk by an average reduction of 26% in maximum dose, compared to the clinically optimized IMRT plan, where the internal target volume was used. We validated our computation time analyses in two additional cases. The computation speed in our implementation did not monotonically increase with the number of GPUs. The optimal number of GPUs (five, in our study) is directly related to the
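
    For readers unfamiliar with PSO, the update that such an engine parallelizes is compact: each particle is pulled toward its own best-known position and toward the swarm's best. The CUDA sketch below is a generic, self-contained illustration with assumed parameters (inertia w, acceleration constants c1 and c2, a toy random-number helper, and a placeholder dimensionality DIM); it is not the vendor-integrated 4D-IMRT engine, and the expensive step, the dose (fitness) evaluation, is omitted.

      // Generic PSO update step, one particle per thread (illustrative sketch).
      #include <cstdio>

      #define DIM 8                                  // assumed problem dimension

      __device__ float frand(unsigned int* s) {      // toy LCG in [0,1)
          *s = (*s) * 1664525u + 1013904223u;
          return (*s >> 8) * (1.0f / 16777216.0f);
      }

      __global__ void psoStep(float* x, float* v, const float* pbest,
                              const float* gbest, int nParticles,
                              float w, float c1, float c2, unsigned int seed) {
          int p = blockIdx.x * blockDim.x + threadIdx.x;
          if (p >= nParticles) return;
          unsigned int s = seed ^ (p * 2654435761u);
          for (int d = 0; d < DIM; ++d) {
              int i = p * DIM + d;
              float r1 = frand(&s), r2 = frand(&s);
              v[i] = w * v[i] + c1 * r1 * (pbest[i] - x[i])   // pull toward own best
                              + c2 * r2 * (gbest[d] - x[i]);  // pull toward swarm best
              x[i] += v[i];
          }
      }

      int main() {
          const int N = 1024;
          float *x, *v, *pb, *gb;
          cudaMallocManaged(&x,  N * DIM * sizeof(float));
          cudaMallocManaged(&v,  N * DIM * sizeof(float));
          cudaMallocManaged(&pb, N * DIM * sizeof(float));
          cudaMallocManaged(&gb, DIM * sizeof(float));
          cudaMemset(x, 0, N * DIM * sizeof(float));  cudaMemset(v, 0, N * DIM * sizeof(float));
          cudaMemset(pb, 0, N * DIM * sizeof(float)); cudaMemset(gb, 0, DIM * sizeof(float));
          // A real planner would evaluate fitness (dose) and update pbest/gbest
          // between successive psoStep launches.
          psoStep<<<(N + 127)/128, 128>>>(x, v, pb, gb, N, 0.7f, 1.5f, 1.5f, 42u);
          cudaDeviceSynchronize();
          printf("x[0] = %f\n", x[0]);
          return 0;
      }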

  13. 31 CFR 342.0 - Offering of notes.

    Science.gov (United States)

    2010-07-01

    ... 31 Money and Finance: Treasury 2 2010-07-01 2010-07-01 false Offering of notes. 342.0 Section 342.0 Money and Finance: Treasury Regulations Relating to Money and Finance (Continued) FISCAL SERVICE... or greater denomination. This offering was effective from May 1, 1967 until the close of business...

  14. A 40-Year History of End-of-Life Offerings in US Medical Schools: 1975-2015.

    Science.gov (United States)

    Dickinson, George E

    2017-07-01

    The purpose of this longitudinal study of US medical schools over a 40-year period was to ascertain their offerings on end-of-life (EOL) issues. At 5-year intervals, beginning in 1975, US medical schools were surveyed via a questionnaire to determine their EOL offerings. Data were reported with frequency distributions. The Institute of Medicine has encouraged more emphasis on EOL issues over the past 2 decades. Findings revealed that the inclusion of EOL topics has clearly expanded over the 40-year period: US undergraduate medical students are currently exposed to death and dying, palliative care, and geriatric medicine in over 90% of programs, with the emphasis on these topics varying among medical programs. Such inclusion should produce future favorable outcomes for undergraduate medical students, patients, and their families.

  15. cudaBayesreg: Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis

    Directory of Open Access Journals (Sweden)

    Adelino R. Ferreira da Silva

    2011-10-01

    Full Text Available Graphics processing units (GPUs) are rapidly gaining maturity as powerful general parallel computing devices. A key feature in the development of modern GPUs has been the advancement of the programming model and programming tools. Compute Unified Device Architecture (CUDA) is a software platform for massively parallel high-performance computing on Nvidia many-core GPUs. In functional magnetic resonance imaging (fMRI), the volume of the data to be processed, and the type of statistical analysis to perform call for high-performance computing strategies. In this work, we present the main features of the R-CUDA package cudaBayesreg which implements in CUDA the core of a Bayesian multilevel model for the analysis of brain fMRI data. The statistical model implements a Gibbs sampler for multilevel/hierarchical linear models with a normal prior. The main contribution for the increased performance comes from the use of separate threads for fitting the linear regression model at each voxel in parallel. The R-CUDA implementation of the Bayesian model proposed here has been able to reduce significantly the run-time processing of Markov chain Monte Carlo (MCMC) simulations used in Bayesian fMRI data analyses. Presently, cudaBayesreg is only configured for Linux systems with Nvidia CUDA support.

  16. Accelerated Computing in Magnetic Resonance Imaging: Real-Time Imaging Using Nonlinear Inverse Reconstruction

    Directory of Open Access Journals (Sweden)

    Sebastian Schaetz

    2017-01-01

    Full Text Available Purpose. To develop generic optimization strategies for image reconstruction using graphical processing units (GPUs) in magnetic resonance imaging (MRI) and to exemplarily report on our experience with a highly accelerated implementation of the nonlinear inversion (NLINV) algorithm for dynamic MRI with high frame rates. Methods. The NLINV algorithm is optimized and ported to run on a multi-GPU single-node server. The algorithm is mapped to multiple GPUs by decomposing the data domain along the channel dimension. Furthermore, the algorithm is decomposed along the temporal domain by relaxing a temporal regularization constraint, allowing the algorithm to work on multiple frames in parallel. Finally, an autotuning method is presented that is capable of combining different decomposition variants to achieve optimal algorithm performance in different imaging scenarios. Results. The algorithm is successfully ported to a multi-GPU system and allows online image reconstruction with high frame rates. Real-time reconstruction with low latency and frame rates up to 30 frames per second is demonstrated. Conclusion. Novel parallel decomposition methods are presented which are applicable to many iterative algorithms for dynamic MRI. Using these methods to parallelize the NLINV algorithm on multiple GPUs, it is possible to achieve online image reconstruction with high frame rates.

  17. 15 CFR 90.13 - Offer of hearing.

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Offer of hearing. 90.13 Section 90.13 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade BUREAU OF THE CENSUS, DEPARTMENT OF COMMERCE PROCEDURE FOR CHALLENGING CERTAIN POPULATION AND INCOME ESTIMATES § 90.13 Offer of...

  18. Accelerating atomistic calculations of quantum energy eigenstates on graphic cards

    Science.gov (United States)

    Rodrigues, Walter; Pecchia, A.; Lopez, M.; Auf der Maur, M.; Di Carlo, A.

    2014-10-01

    Electronic properties of nanoscale materials require the calculation of eigenvalues and eigenvectors of large matrices. This bottleneck can be overcome by parallel computing techniques or the introduction of faster algorithms. In this paper we report a custom implementation of the Lanczos algorithm with simple restart, optimized for graphical processing units (GPUs). The whole algorithm has been developed using CUDA and runs entirely on the GPU, with a specialized implementation that spares memory and keeps host-to-device data transfers to a minimum. Furthermore, parallel distribution over several GPUs has been attained using the standard message passing interface (MPI). Benchmark calculations performed on a GaN/AlGaN wurtzite quantum dot with up to 600,000 atoms are presented. The empirical tight-binding (ETB) model with an sp3d5s∗+spin-orbit parametrization has been used to build the system Hamiltonian (H).

  19. Assembly of finite element methods on graphics processors

    KAUST Repository

    Cecka, Cris

    2010-08-23

    Recently, graphics processing units (GPUs) have had great success in accelerating many numerical computations. We present their application to computations on unstructured meshes such as those in finite element methods. Multiple approaches in assembling and solving sparse linear systems with NVIDIA GPUs and the Compute Unified Device Architecture (CUDA) are created and analyzed. Multiple strategies for efficient use of global, shared, and local memory, methods to achieve memory coalescing, and optimal choice of parameters are introduced. We find that with appropriate preprocessing and arrangement of support data, the GPU coprocessor using single-precision arithmetic achieves speedups of 30 or more in comparison to a well optimized double-precision single core implementation. We also find that the optimal assembly strategy depends on the order of polynomials used in the finite element discretization. © 2010 John Wiley & Sons, Ltd.
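
    One simple assembly strategy of the kind the paper analyzes can be sketched in a few lines: one thread per element computes the local stiffness matrix and scatters it into the global matrix with atomic additions, so that elements sharing a node do not race. The CUDA example below is our reduction of the idea to 1D linear elements and a dense global matrix; the paper targets unstructured meshes, sparse storage, and far more refined uses of shared and local memory.

      // One-thread-per-element assembly with atomic scatter (1D sketch).
      #include <cstdio>

      __global__ void assemble1D(float* K, int nNodes, int nElems, float h) {
          int e = blockIdx.x * blockDim.x + threadIdx.x;
          if (e >= nElems) return;
          // Local stiffness of a linear element of length h: (1/h) [[1,-1],[-1,1]]
          float k = 1.0f / h;
          int n0 = e, n1 = e + 1;                    // the element's two nodes
          atomicAdd(&K[n0 * nNodes + n0],  k);       // neighbouring elements share
          atomicAdd(&K[n0 * nNodes + n1], -k);       // nodes, so the scatter into
          atomicAdd(&K[n1 * nNodes + n0], -k);       // the global matrix must be
          atomicAdd(&K[n1 * nNodes + n1],  k);       // atomic
      }

      int main() {
          const int nElems = 1000, nNodes = nElems + 1;
          float* K;
          cudaMallocManaged(&K, (size_t)nNodes * nNodes * sizeof(float));
          cudaMemset(K, 0, (size_t)nNodes * nNodes * sizeof(float));
          assemble1D<<<(nElems + 255) / 256, 256>>>(K, nNodes, nElems, 1.0f / nElems);
          cudaDeviceSynchronize();
          printf("K[0][0] = %g (expect 1/h = %d)\n", K[0], nElems);
          cudaFree(K);
          return 0;
      }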

  20. High-throughput sequence alignment using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Trapnell Cole

    2007-12-01

    Full Text Available Abstract Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  1. Guide to Graduate Departments of Geography in the United States and Canada 1982-1983.

    Science.gov (United States)

    Association of American Geographers, Washington, DC.

    Information is presented about requirements, course offerings, financial aid, and personnel for 147 graduate departments of geography in the United States and Canada. Seventy-three offer a Ph.D. in geography, and 77 award the Master's degree. Information provided for each institution includes: date founded; degrees offered; number of degrees…

  2. Accelerating image reconstruction in three-dimensional optoacoustic tomography on graphics processing units.

    Science.gov (United States)

    Wang, Kun; Huang, Chao; Kao, Yu-Jiun; Chou, Cheng-Ying; Oraevsky, Alexander A; Anastasio, Mark A

    2013-02-01

    Optoacoustic tomography (OAT) is inherently a three-dimensional (3D) inverse problem. However, most studies of OAT image reconstruction still employ two-dimensional imaging models. One important reason is because 3D image reconstruction is computationally burdensome. The aim of this work is to accelerate existing image reconstruction algorithms for 3D OAT by use of parallel programming techniques. Parallelization strategies are proposed to accelerate a filtered backprojection (FBP) algorithm and two different pairs of projection/backprojection operations that correspond to two different numerical imaging models. The algorithms are designed to fully exploit the parallel computing power of graphics processing units (GPUs). In order to evaluate the parallelization strategies for the projection/backprojection pairs, an iterative image reconstruction algorithm is implemented. Computer simulation and experimental studies are conducted to investigate the computational efficiency and numerical accuracy of the developed algorithms. The GPU implementations improve the computational efficiency by factors of 1000, 125, and 250 for the FBP algorithm and the two pairs of projection/backprojection operators, respectively. Accurate images are reconstructed by use of the FBP and iterative image reconstruction algorithms from both computer-simulated and experimental data. Parallelization strategies for 3D OAT image reconstruction are proposed for the first time. These GPU-based implementations significantly reduce the computational time for 3D image reconstruction, complementing our earlier work on 3D OAT iterative image reconstruction.
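
    The voxel-per-thread pattern used by such backprojection operators can be illustrated with a deliberately simplified delay-and-sum CUDA kernel (nearest-neighbour time sampling, uniform sound speed, no filtering or solid-angle weights; all sizes and names below are our assumptions, and the signal buffer is simply zeroed so the example runs standalone):

      // Simplified 3D delay-and-sum backprojection: one voxel per thread.
      #include <cstdio>
      #include <cmath>

      __global__ void backproject(const float* sig, const float3* det, int nDet,
                                  int nT, float dt, float c0, float* vol,
                                  int nx, int ny, int nz, float dx) {
          int idx = blockIdx.x * blockDim.x + threadIdx.x;
          if (idx >= nx * ny * nz) return;
          int ix = idx % nx, iy = (idx / nx) % ny, iz = idx / (nx * ny);
          float3 r = make_float3(ix * dx, iy * dx, iz * dx);
          float acc = 0.f;
          for (int d = 0; d < nDet; ++d) {           // sum over all transducers
              float3 q = det[d];
              float dist = sqrtf((r.x-q.x)*(r.x-q.x) + (r.y-q.y)*(r.y-q.y)
                               + (r.z-q.z)*(r.z-q.z));
              int t = (int)(dist / (c0 * dt) + 0.5f); // nearest time-of-flight sample
              if (t < nT) acc += sig[d * nT + t];
          }
          vol[idx] = acc;
      }

      int main() {
          const int nDet = 128, nT = 1024, nx = 64, ny = 64, nz = 64;
          float *sig, *vol; float3* det;
          cudaMallocManaged(&sig, nDet * nT * sizeof(float));
          cudaMallocManaged(&vol, nx * ny * nz * sizeof(float));
          cudaMallocManaged(&det, nDet * sizeof(float3));
          cudaMemset(sig, 0, nDet * nT * sizeof(float));
          for (int d = 0; d < nDet; ++d)             // toy ring of detectors
              det[d] = make_float3(32e-3f + 40e-3f * cosf(0.049f * d),
                                   32e-3f + 40e-3f * sinf(0.049f * d), 32e-3f);
          int nVox = nx * ny * nz;
          backproject<<<(nVox + 255)/256, 256>>>(sig, det, nDet, nT, 25e-9f,
                                                 1500.f, vol, nx, ny, nz, 1e-3f);
          cudaDeviceSynchronize();
          printf("vol[0] = %f\n", vol[0]);
          return 0;
      }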

  3. CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions

    Directory of Open Access Journals (Sweden)

    Schmidt Bertil

    2010-04-01

    Full Text Available Abstract Background Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, the quadratic time complexity of this algorithm makes it highly time-consuming. The exponential growth of biological databases further deteriorates the situation. To accelerate this algorithm, many efforts have been made to develop techniques in high performance architectures, especially the recently emerging many-core architectures and their associated programming models. Findings This paper describes the latest release of the CUDASW++ software, CUDASW++ 2.0, which makes new contributions to Smith-Waterman protein database searches using compute unified device architecture (CUDA). A parallel Smith-Waterman algorithm is proposed to further optimize the performance of CUDASW++ 1.0 based on the single instruction, multiple thread (SIMT) abstraction. For the first time, we have investigated a partitioned vectorized Smith-Waterman algorithm using CUDA based on the virtualized single instruction, multiple data (SIMD) abstraction. The optimized SIMT and the partitioned vectorized algorithms were benchmarked, and remarkably, have similar performance characteristics. CUDASW++ 2.0 achieves performance improvement over CUDASW++ 1.0 as much as 1.74 (1.72) times using the optimized SIMT algorithm and up to 1.77 (1.66) times using the partitioned vectorized algorithm, with a performance of up to 17 (30) billion cell updates per second (GCUPS) on a single-GPU GeForce GTX 280 (dual-GPU GeForce GTX 295) graphics card. Conclusions CUDASW++ 2.0 is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant performance improvement over CUDASW++ 1.0 using either the optimized SIMT algorithm or the partitioned vectorized algorithm for Smith-Waterman protein database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.

  4. NDPA: A generalized efficient parallel in-place N-Dimensional Permutation Algorithm

    Directory of Open Access Journals (Sweden)

    Muhammad Elsayed Ali

    2015-09-01

    Full Text Available N-dimensional transpose/permutation is a very important operation in many large-scale data-intensive and scientific applications. These applications include, but are not limited to, the oil industry (i.e. seismic data processing), nuclear medicine, media production, digital signal processing and business intelligence. This paper proposes an efficient in-place N-dimensional permutation algorithm. The algorithm is based on a novel 3D transpose algorithm that was published recently. The proposed algorithm has been tested on 3D, 4D, 5D, 6D and 7D data sets as a proof of concept. Breaking the dimension limitation of the base algorithm is the first contribution of this paper. The suggested algorithm exploits the idea of mixing both logical and physical permutations together. In the logical permutation, the address map is transposed for each data unit access. In the physical permutation, actual data elements are swapped. Both permutation levels exploit the fast on-chip memory bandwidth by transferring large amounts of data and allowing for fine-grained SIMD (Single Instruction, Multiple Data) operations. Thus, the performance is improved, as evident from the experimental results section. The algorithm is implemented on an NVidia GeForce GTS 250 GPU (Graphics Processing Unit) containing 128 cores. The rapid increase in GPU performance, coupled with recent and continuous improvements in programmability, has proved that GPUs are the right choice for computationally demanding tasks; their use here is the second contribution, reflecting how well they fit high-performance workloads. The third contribution is tuning the proposed algorithm's performance to its peak, as discussed in the results section.
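
    The building block underneath any such permutation is an index remap. The CUDA fragment below shows only the simplest out-of-place variant, exchanging the two innermost axes of a 3D array; the paper's actual contribution, performing general N-dimensional permutations in place by combining this physical data movement with logical address-map transposes, is not reproduced here.

      // Out-of-place 3D axis permutation: dst(z, x, y) = src(z, y, x).
      #include <cstdio>

      __global__ void permuteXY(const float* src, float* dst, int nx, int ny, int nz) {
          long long i = blockIdx.x * (long long)blockDim.x + threadIdx.x;
          long long total = (long long)nx * ny * nz;
          if (i >= total) return;
          int x = (int)(i % nx);
          int y = (int)((i / nx) % ny);
          int z = (int)(i / ((long long)nx * ny));
          // Linear index under the permuted layout (x and y exchanged).
          dst[(long long)z * nx * ny + (long long)x * ny + y] = src[i];
      }

      int main() {
          const int nx = 256, ny = 128, nz = 64;
          size_t bytes = (size_t)nx * ny * nz * sizeof(float);
          float *src, *dst;
          cudaMallocManaged(&src, bytes);
          cudaMallocManaged(&dst, bytes);
          long long total = (long long)nx * ny * nz;
          for (long long i = 0; i < total; ++i) src[i] = (float)i;
          permuteXY<<<(int)((total + 255) / 256), 256>>>(src, dst, nx, ny, nz);
          cudaDeviceSynchronize();
          // Element (x=1, y=0, z=0) of src must land at (z=0, x=1, y=0) of dst.
          printf("dst[ny] = %f (expect 1)\n", dst[ny]);
          return 0;
      }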

  5. BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics.

    Science.gov (United States)

    Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A

    2012-01-01

    Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.

  6. Auto‐tuning of level 1 and level 2 BLAS for GPUs

    DEFF Research Database (Denmark)

    Sørensen, Hans Henrik Brandenborg

    2013-01-01

    The target hardware is the most recent Nvidia (Santa Clara, CA, USA) Tesla 20‐series (Fermi architecture), which is designed from the ground up for high‐performance computing. We show that it is essentially a matter of fully utilizing the fine‐grained parallelism of the many‐core graphical processing unit...

  7. 14 CFR 151.29 - Procedures: Offer, amendment, and acceptance.

    Science.gov (United States)

    2010-01-01

    ... resolution or ordinance must, as appropriate under the local law— (1) Set forth the terms of the offer at... 14 Aeronautics and Space 3 2010-01-01 2010-01-01 false Procedures: Offer, amendment, and... § 151.29 Procedures: Offer, amendment, and acceptance. (a) Upon approving a project, the Administrator...

  8. Perceived value creation process: focus on the company offer

    Directory of Open Access Journals (Sweden)

    Irena Pandža Bajs

    2012-12-01

    Full Text Available In the competitive business environment, as the number of rational consumers faced with many choices increases, companies can best achieve dominance by applying consumer-oriented business concepts in order to deliver value which is different from and better than that of their competitors. Among the various products on the market, an educated consumer chooses the offer that provides the greatest value for him or her. Therefore, it is essential for each company to determine how consumers perceive the value of its offer, and which factors determine a high level of perceived value for current and potential consumers. An analysis of these factors provides guidance on how to improve the existing offer and on what the offer to be delivered in the future should be like. That could increase the perceived value of the company's offer and result in a positive impact on consumer satisfaction and on establishing a stronger, long-term relationship with consumers. The process of defining the perceived value of a particular market offer is affected by factors of the respective company's offer as well as by competition factors, consumer factors and buying-process factors. The aim of this paper is to analyze the relevant knowledge about the process of creating the perceived value of the company's market offer and the factors that influence this process. The paper presents a conceptual model of the perceived value creation process in consumers' minds.

  9. The disposal of redundant teletherapy units from NHS hospitals

    International Nuclear Information System (INIS)

    Gaffka, A.P.; Ord, M.A.

    1994-01-01

    The removal and disposal of redundant teletherapy units from NHS hospitals is described, detailing the operational procedures and the transport package background. The Harwell section of the Transport Technology Department has been carrying out these operations since 1991; initially the service was offered only to the NHS, but today this specialist transport service has widened significantly and is now offered to other business sectors. Due to the level of radioactivity found in each teletherapy unit, it was necessary to design a special transport packaging to meet the requirements for shipment of these units. Approval was sought from the Department of Transport to adapt a standard Type B package, as no other packaging could be found to comply with the necessary requirements. All work undertaken on the removal and disposal of these units complied with an approved scheme of work and was carried out in accordance with a Quality Assurance workplan. However, to keep abreast of modern standards in a manner which is cost-effective to customers and acceptable to the general public, the full development of a new Type B packaging is taking place, specifically designed to undertake these removal and disposal duties. (author)

  10. 41 CFR 60-30.19 - Objections; exceptions; offer of proof.

    Science.gov (United States)

    2010-07-01

    ... exceptions to the Administrative Law Judge's recommendations and conclusions. (c) Offer of proof. An offer of...; offer of proof. 60-30.19 Section 60-30.19 Public Contracts and Property Management Other Provisions... EXECUTIVE ORDER 11246 Hearings and Related Matters § 60-30.19 Objections; exceptions; offer of proof. (a...

  11. Accelerating epistasis analysis in human genetics with consumer graphics hardware

    Directory of Open Access Journals (Sweden)

    Cancare Fabio

    2009-07-01

    Full Text Available Abstract Background Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions. Findings We found that using MDR on GPUs consistently increased performance per machine over both a feature rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore this GPU system provides extremely cost effective

  12. Accelerating epistasis analysis in human genetics with consumer graphics hardware.

    Science.gov (United States)

    Sinnott-Armstrong, Nicholas A; Greene, Casey S; Cancare, Fabio; Moore, Jason H

    2009-07-24

    Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions. We found that using MDR on GPUs consistently increased performance per machine over both a feature rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore this GPU system provides extremely cost effective performance while leaving the CPU available for other

  13. Computing the Density Matrix in Electronic Structure Theory on Graphics Processing Units.

    Science.gov (United States)

    Cawkwell, M J; Sanville, E J; Mniszewski, S M; Niklasson, Anders M N

    2012-11-13

    The self-consistent solution of a Schrödinger-like equation for the density matrix is a critical and computationally demanding step in quantum-based models of interatomic bonding. This step was tackled historically via the diagonalization of the Hamiltonian. We have investigated the performance and accuracy of the second-order spectral projection (SP2) algorithm for the computation of the density matrix via a recursive expansion of the Fermi operator in a series of generalized matrix-matrix multiplications. We demonstrate that owing to its simplicity, the SP2 algorithm [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] is exceptionally well suited to implementation on graphics processing units (GPUs). The performance in double and single precision arithmetic of a hybrid GPU/central processing unit (CPU) and a full GPU implementation of the SP2 algorithm exceeds that of a CPU-only implementation of the SP2 algorithm and traditional matrix diagonalization when the dimensions of the matrices exceed about 2000 × 2000. Padding schemes for arrays allocated in the GPU memory that optimize the performance of the CUBLAS implementations of the level 3 BLAS DGEMM and SGEMM subroutines for generalized matrix-matrix multiplications are described in detail. The analysis of the relative performance of the hybrid CPU/GPU and full GPU implementations indicates that the transfer of arrays between the GPU and CPU constitutes only a small fraction of the total computation time. The errors measured in the self-consistent density matrices computed using the SP2 algorithm are generally smaller than those measured in matrices computed via diagonalization. Furthermore, the errors in the density matrices computed using the SP2 algorithm do not exhibit any dependence on system size, whereas the errors increase linearly with the number of orbitals when diagonalization is employed.
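
    The SP2 recursion itself is short; essentially all the runtime is in the GEMM calls that the paper tunes with its padding schemes. Below is our own single-precision sketch of one SP2 step with cuBLAS (the paper also works in double precision): the trace test selects whichever of X^2 and 2X - X^2 moves the occupation toward the electron count Ne, and a toy diagonal starting matrix is used so the example runs standalone (build with: nvcc sp2.cu -lcublas).

      // One SP2 step with cuBLAS: X <- X*X  or  X <- 2X - X*X, chosen by trace.
      #include <cstdio>
      #include <cmath>
      #include <cuda_runtime.h>
      #include <cublas_v2.h>

      float traceOf(const float* dA, int n) {        // sum of the diagonal (host side)
          float tr = 0.f, v;
          for (int i = 0; i < n; ++i) {
              cudaMemcpy(&v, dA + (size_t)i * n + i, sizeof(float), cudaMemcpyDeviceToHost);
              tr += v;
          }
          return tr;
      }

      void sp2Step(cublasHandle_t h, float* dX, float* dX2, int n, float Ne) {
          const float one = 1.f, zero = 0.f, two = 2.f, mone = -1.f;
          cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,   // X2 = X * X
                      &one, dX, n, dX, n, &zero, dX2, n);
          float trX = traceOf(dX, n), trX2 = traceOf(dX2, n);
          if (fabsf(trX2 - Ne) < fabsf(2.f * trX - trX2 - Ne)) {
              cudaMemcpy(dX, dX2, (size_t)n * n * sizeof(float),
                         cudaMemcpyDeviceToDevice);           // X <- X^2
          } else {
              cublasSgeam(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n,  // X <- 2X - X^2 (in place)
                          &two, dX, n, &mone, dX2, n, dX, n);
          }
      }

      int main() {
          const int n = 512; const float Ne = 100.f;
          float *dX, *dX2;
          cudaMalloc(&dX,  (size_t)n * n * sizeof(float));
          cudaMalloc(&dX2, (size_t)n * n * sizeof(float));
          cudaMemset(dX, 0, (size_t)n * n * sizeof(float));
          for (int i = 0; i < n; ++i) {              // toy spectrum: 100 nearly
              float val = (i < 100) ? 0.9f : 0.1f;   // occupied, the rest nearly empty
              cudaMemcpy(dX + (size_t)i * n + i, &val, sizeof(float), cudaMemcpyHostToDevice);
          }
          cublasHandle_t h; cublasCreate(&h);
          for (int it = 0; it < 30; ++it) sp2Step(h, dX, dX2, n, Ne);
          printf("tr(X) = %f (target %f)\n", traceOf(dX, n), Ne);
          cublasDestroy(h); cudaFree(dX); cudaFree(dX2);
          return 0;
      }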

  14. Offers for our members

    CERN Multimedia

    Staff Association

    2017-01-01

    Summer is here, enjoy our offers for the aquatic parcs! Walibi : Tickets "Zone terrestre": 24 € instead of 30 €. Access to Aqualibi: 5 € instead of 6 € on presentation of your SA member ticket. Free for children under 100 cm. Car park free. * * * * * Aquaparc : Day ticket: – Children: 33 CHF instead of 39 CHF – Adults : 33 CHF instead of 49 CHF Bonus! Free for children under 5.

  15. Offers for our members

    CERN Multimedia

    Staff Association

    2017-01-01

    Summer is coming, enjoy our offers for the aquatic parcs! Walibi : Tickets "Zone terrestre": 24 € instead of 30 €. Access to Aqualibi: 5 € instead of 6 € on presentation of your SA member ticket. Free for children under 100 cm. Car park free. * * * * * Aquaparc : Day ticket: – Children: 33 CHF instead of 39 CHF – Adults : 33 CHF instead of 49 CHF Bonus! Free for children under 5.

  16. 17 CFR 240.14e-1 - Unlawful tender offer practices.

    Science.gov (United States)

    2010-04-01

    ... 17 Commodity and Securities Exchanges 3 2010-04-01 2010-04-01 false Unlawful tender offer... Securities Exchange Act of 1934 Regulation 14e § 240.14e-1 Unlawful tender offer practices. As a means... section 14(e) of the Act, no person who makes a tender offer shall: (a) Hold such tender offer open for...

  17. 12 CFR 563g.8 - Use of the offering circular.

    Science.gov (United States)

    2010-01-01

    ... 12 Banks and Banking 5 2010-01-01 2010-01-01 false Use of the offering circular. 563g.8 Section 563g.8 Banks and Banking OFFICE OF THRIFT SUPERVISION, DEPARTMENT OF THE TREASURY SECURITIES OFFERINGS § 563g.8 Use of the offering circular. (a) An offering circular or amendment declared effective by the...

  18. Medical image processing on the GPU - past, present and future.

    Science.gov (United States)

    Eklund, Anders; Dufort, Paul; Forsberg, Daniel; LaConte, Stephen M

    2013-12-01

    Graphics processing units (GPUs) are used today in a wide range of applications, mainly because they can dramatically accelerate parallel computing, are affordable and energy efficient. In the field of medical imaging, GPUs are in some cases crucial for enabling practical use of computationally demanding algorithms. This review presents the past and present work on GPU accelerated medical image processing, and is meant to serve as an overview and introduction to existing GPU implementations. The review covers GPU acceleration of basic image processing operations (filtering, interpolation, histogram estimation and distance transforms), the most commonly used algorithms in medical imaging (image registration, image segmentation and image denoising) and algorithms that are specific to individual modalities (CT, PET, SPECT, MRI, fMRI, DTI, ultrasound, optical imaging and microscopy). The review ends by highlighting some future possibilities and challenges. Copyright © 2013 Elsevier B.V. All rights reserved.

  19. Aerodynamic optimization of supersonic compressor cascade using differential evolution on GPU

    Energy Technology Data Exchange (ETDEWEB)

    Aissa, Mohamed Hasanine; Verstraete, Tom [Von Karman Institute for Fluid Dynamics (VKI) 1640 Sint-Genesius-Rode (Belgium); Vuik, Cornelis [Delft University of Technology 2628 CD Delft (Netherlands)

    2016-06-08

    Differential Evolution (DE) is a powerful stochastic optimization method. Compared to gradient-based algorithms, DE is able to avoid local minima but at the same time requires more function evaluations. In turbomachinery applications, function evaluations are performed with time-consuming CFD simulations, which results in a long, unaffordable design cycle. Modern High Performance Computing systems, especially Graphics Processing Units (GPUs), are able to alleviate this inconvenience by accelerating the design evaluation itself. In this work we present a validated CFD solver running on GPUs, able to accelerate the design evaluation and thus the entire design process. An achieved speedup of 20x to 30x enabled the DE algorithm to run on a high-end computer instead of a costly large cluster. The GPU-enhanced DE was used to optimize the aerodynamics of a supersonic compressor cascade, achieving an aerodynamic loss minimization of 20%.
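
    For orientation, the DE step that each CFD evaluation feeds is itself trivially parallel, one candidate per thread. The sketch below shows a generic DE/rand/1/bin trial-vector construction with a toy random-number helper and an assumed design-vector size; it is our illustration, not the VKI optimizer, and the expensive part, the GPU CFD evaluation of each trial cascade, is deliberately left out.

      // Generic DE/rand/1/bin trial-vector construction, one candidate per thread.
      #include <cstdio>

      #define DIM 16                                 // assumed design-vector size

      __device__ float frand(unsigned int* s) {      // toy LCG in [0,1)
          *s = (*s) * 1664525u + 1013904223u;
          return (*s >> 8) * (1.0f / 16777216.0f);
      }

      __global__ void deTrial(const float* x, float* trial, int np,
                              float F, float CR, unsigned int seed) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= np) return;
          unsigned int s = seed ^ (i * 2654435761u);
          int r1 = (int)(frand(&s) * np), r2 = (int)(frand(&s) * np),
              r3 = (int)(frand(&s) * np);            // production code re-draws until
                                                     // i, r1, r2, r3 are all distinct
          int jrand = (int)(frand(&s) * DIM);        // force at least one mutated gene
          for (int j = 0; j < DIM; ++j) {
              float mutant = x[r1*DIM + j] + F * (x[r2*DIM + j] - x[r3*DIM + j]);
              trial[i*DIM + j] = (frand(&s) < CR || j == jrand) ? mutant : x[i*DIM + j];
          }
      }

      int main() {
          const int np = 256;                        // population size
          float *x, *trial;
          cudaMallocManaged(&x,     np * DIM * sizeof(float));
          cudaMallocManaged(&trial, np * DIM * sizeof(float));
          for (int i = 0; i < np * DIM; ++i) x[i] = 0.001f * i;
          deTrial<<<(np + 127)/128, 128>>>(x, trial, np, 0.8f, 0.9f, 7u);
          cudaDeviceSynchronize();
          // Selection (keep the trial if its CFD-evaluated loss improves) would follow.
          printf("trial[0] = %f\n", trial[0]);
          return 0;
      }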

  20. 14 CFR 151.121 - Procedures: Offer; sponsor assurances.

    Science.gov (United States)

    2010-01-01

    ... 14 Aeronautics and Space 3 2010-01-01 2010-01-01 false Procedures: Offer; sponsor assurances. 151.121 Section 151.121 Aeronautics and Space FEDERAL AVIATION ADMINISTRATION, DEPARTMENT OF... Engineering Proposals § 151.121 Procedures: Offer; sponsor assurances. Each sponsor must adopt the following...

  1. Total body irradiation with a reconditioned cobalt teletherapy unit.

    Science.gov (United States)

    Evans, Michael D C; Larouche, Renée-Xavière; Olivares, Marina; Léger, Pierre; Larkin, Joe; Freeman, Carolyn R; Podgorsak, Ervin B

    2006-01-01

    While the current trend in radiotherapy is to replace cobalt teletherapy units with more versatile and technologically advanced linear accelerators, there remain some useful applications for older cobalt units. The expansion of our radiotherapy department involved the decommissioning of an isocentric cobalt teletherapy unit and the replacement of a column-mounted 4-MV LINAC that has been used for total body irradiation (TBI). To continue offering TBI treatments, we converted the decommissioned cobalt unit into a dedicated fixed-field total body irradiator and installed it in an existing medium-energy LINAC bunker. This article describes the logistical and dosimetric aspects of bringing a reconditioned cobalt teletherapy unit into clinical service as a total body irradiator.

  2. 7 CFR 1494.601 - Acceptance of offers by CCC.

    Science.gov (United States)

    2010-01-01

    ... exporting countries; and the cost effectiveness of the payment of a CCC bonus amount in view of CCC's... exporter's offer by CCC but not later than 10 a.m. of the next business day after the date the offer was... accepted by CCC by 10 a.m. of the next business day after the date for which the offer was submitted for...

  3. CUDA-Sankoff

    DEFF Research Database (Denmark)

    Sundfeld, Daniel; Havgaard, Jakob H.; Gorodkin, Jan

    2017-01-01

    In this paper, we propose and evaluate CUDA-Sankoff, a solution to the RNA structural alignment problem based on the Sankoff algorithm on Graphics Processing Units (GPUs). To our knowledge, this is the first time the Sankoff algorithm has been implemented on a GPU. In our solution, we show how to lineariz...... to 24 times faster than a 16-core CPU solution in the 281 nucleotide Sankoff execution....

  4. Evaluating alternative offering strategies for wind producers in a pool

    International Nuclear Information System (INIS)

    Rahimiyan, Morteza; Morales, Juan M.; Conejo, Antonio J.

    2011-01-01

    Highlights: → Out-of-sample analysis allows comparing diverse offers using real-world data. → Offering the best production forecast is not optimal for a wind producer. → Stochastic programming offers lead to maximum expected profit. → Offering the best production forecast is not generally optimal for risk control. → Stochastic programming offers lead to the best tradeoff profit versus risk. -- Abstract: As wind power technology matures and reaches break-even cost, wind producers find it increasingly attractive to participate in pool markets instead of being paid feed-in tariffs. The key issue is then how a wind producer should offer in the pool markets to achieve maximum profit while controlling the variability of such profit. This paper compares two families of offering strategies based, respectively, on a naive use of wind production forecasts and on stochastic programming models. These strategies are compared through a comprehensive out-of-sample chronological analysis based on real-world data. A number of relevant conclusions are then duly drawn.

  5. Offers for our members

    CERN Multimedia

    Staff Association

    2018-01-01

    Summer is coming, enjoy our offers for the aquatic parcs! Walibi : Tickets "Zone terrestre": 25 € instead of 31 €. Access to Aqualibi: 5 € instead of 8 € on presentation of your Staff Association member ticket. Free for children under 100 cm. Car park free. * * * * * Aquaparc : Day ticket: – Children: 33 CHF instead of 39 CHF – Adults : 33 CHF instead of 49 CHF Bonus! Free for children under 5.  

  6. [Analysis of the web pages of the intensive care units of Spain].

    Science.gov (United States)

    Navarro-Arnedo, J M

    2009-01-01

    In order to determine which intensive care units (ICUs) of Spanish hospitals had a web site, to analyze the information they offered, and to establish what information they should offer according to a sample of ICU nurses, a cross-sectional, observational, descriptive study was carried out between January and September 2008. Each ICU website was analyzed for the information available on the unit and on its care, teaching and research activity in nursing. Simultaneously, based on a sample of intensive care nurses, the information that should be contained on an ICU website was determined. The results, expressed in absolute numbers and percentages, showed that 66 of the 292 hospitals with an ICU (22.6%) had a web site; 50.7% of the sites showed the number of beds, 19.7% the activity report, 11.3% published articles/studies and ongoing research lines, and 9.9% organized training courses. Fourteen websites (19.7%) displayed images of nurses. However, only one (1.4%) offered guides on the procedures followed. No web site offered a navigation section for nursing, the e-mail address of the head of nursing, the nursing documentation used, or whether a nursing model of its own was used. It is concluded that only one-fourth of Spanish hospitals with an ICU have a web site; the number of beds was the data offered by most sites, whereas information on care, educational and research activities was very limited, and nursing was practically omitted from the web pages of intensive care units.

  7. 7 CFR 58.5 - Where service is offered.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 3 2010-01-01 2010-01-01 false Where service is offered. 58.5 Section 58.5 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (Standards... Grading Service § 58.5 Where service is offered. Subject to the provisions of this part, inspection or...

  8. Unit Testing for Command and Control Systems

    Science.gov (United States)

    Alexander, Joshua

    2018-01-01

    Unit tests were created to evaluate the functionality of a Data Generation and Publication tool for a command and control system. These unit tests are developed to constantly evaluate the tool and ensure it functions properly as the command and control system grows in size and scope. Unit tests are a crucial part of testing any software project and are especially instrumental in the development of a command and control system. They save resources, time and costs associated with testing, and catch issues before they become increasingly difficult and costly. The unit tests produced for the Data Generation and Publication tool to be used in a command and control system assure the users and stakeholders of its functionality and offer assurances which are vital in the launching of spacecraft safely.

  9. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    Science.gov (United States)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
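
    The kernel at the heart of such a benchmark is the per-particle density summation. The CUDA version below is a brute-force O(N²) sketch using the common poly6 smoothing kernel and made-up particle data; the implementations compared in the paper add the neighbour-list and memory-layout optimizations that the study is actually about.

      // Brute-force SPH density summation with the poly6 kernel (sketch).
      #include <cstdio>
      #include <cmath>

      __global__ void density(const float3* pos, float* rho, int n, float h, float m) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          float h2 = h * h;
          float coef = 315.0f / (64.0f * 3.14159265f * powf(h, 9.0f));
          float3 pi_ = pos[i];
          float sum = 0.f;
          for (int j = 0; j < n; ++j) {              // real codes restrict this loop
              float3 pj = pos[j];                    // to neighbouring cells
              float dx = pi_.x - pj.x, dy = pi_.y - pj.y, dz = pi_.z - pj.z;
              float r2 = dx*dx + dy*dy + dz*dz;
              if (r2 < h2) { float t = h2 - r2; sum += t * t * t; }
          }
          rho[i] = m * coef * sum;
      }

      int main() {
          const int n = 4096;
          float3* pos; float* rho;
          cudaMallocManaged(&pos, n * sizeof(float3));
          cudaMallocManaged(&rho, n * sizeof(float));
          for (int i = 0; i < n; ++i)                // 16x16x16 lattice, 0.01 m apart
              pos[i] = make_float3(0.01f*(i%16), 0.01f*((i/16)%16), 0.01f*(i/256));
          density<<<(n + 255)/256, 256>>>(pos, rho, n, 0.025f, 0.02f);
          cudaDeviceSynchronize();
          printf("rho[0] = %f\n", rho[0]);
          cudaFree(pos); cudaFree(rho);
          return 0;
      }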

  10. Futhark

    DEFF Research Database (Denmark)

    Henriksen, Troels; Serup, Niels G. W.; Elsman, Martin

    2017-01-01

    Futhark is a purely functional data-parallel array language that offers a machine-neutral programming model and an optimising compiler that generates OpenCL code for GPUs. This paper presents the design and implementation of three key features of Futhark that seek a suitable middle ground......-of-reference optimisations. Finally, an evaluation on 16 benchmarks demonstrates the impact of the language and compiler features and shows application-level performance competitive with hand-written GPU code. Copyright is held by the owner/author(s)....

  11. Analysis of the main characteristics of Initial Public Offerings in the Czech Republic and perspectives of their further development

    Directory of Open Access Journals (Sweden)

    Tomáš Meluzín

    2009-01-01

    Full Text Available Funding the development of a company through an "Initial Public Offering" is widely used globally, unlike in the Czech Republic, and belongs to the traditional methods of raising the funds necessary for business development in developed capital markets. In the United States of America, Japan and the Western European countries, the method of company funding through an IPO has already been applied for several decades. The first public stock offerings began to be applied in these markets in higher volumes at the beginning of the 1960s. Since that period the importance of the IPO has grown globally, and initial public stock offerings are increasingly applied even in the Central and Eastern European countries. Since 2004, several companies that have opted for this form of financing can be found in the Czech Republic as well. The objective of the paper is to analyze the main characteristics of initial public offerings of shares effected on the Czech capital market between 2004 and 2008 and to outline the perspectives of further development in this area.

  12. Prototype unit for the air decontamination

    International Nuclear Information System (INIS)

    Garcia A, J.

    1991-01-01

    The objective of this work is to design and manufacture a unit that makes appropriate use of domestically produced filters, offers broad protection to the maintenance personnel who carry out filter replacement, can be installed in any position, and facilitates installation of the adsorber.

  13. 9 CFR 592.22 - Where service is offered.

    Science.gov (United States)

    2010-01-01

    ... 9 Animals and Animal Products 2 2010-01-01 2010-01-01 false Where service is offered. 592.22 Section 592.22 Animals and Animal Products FOOD SAFETY AND INSPECTION SERVICE, DEPARTMENT OF AGRICULTURE EGG PRODUCTS INSPECTION VOLUNTARY INSPECTION OF EGG PRODUCTS General § 592.22 Where service is offered...

  14. AMITIS: A 3D GPU-Based Hybrid-PIC Model for Space and Plasma Physics

    Science.gov (United States)

    Fatemi, Shahab; Poppe, Andrew R.; Delory, Gregory T.; Farrell, William M.

    2017-05-01

    We have developed, for the first time, an advanced modeling infrastructure in space simulations (AMITIS) with an embedded three-dimensional self-consistent grid-based hybrid model of plasma (kinetic ions and fluid electrons) that runs entirely on graphics processing units (GPUs). The model uses NVIDIA GPUs and their associated parallel computing platform, CUDA, developed for general purpose processing on GPUs. The model uses a single CPU-GPU pair, where the CPU transfers data between the system and GPU memory, executes CUDA kernels, and writes simulation outputs on the disk. All computations, including moving particles, calculating macroscopic properties of particles on a grid, and solving hybrid model equations are processed on a single GPU. We explain various computing kernels within AMITIS and compare their performance with an already existing well-tested hybrid model of plasma that runs in parallel using multi-CPU platforms. We show that AMITIS runs ∼10 times faster than the parallel CPU-based hybrid model. We also introduce an implicit solver for computation of Faraday’s Equation, resulting in an explicit-implicit scheme for the hybrid model equation. We show that the proposed scheme is stable and accurate. We examine the AMITIS energy conservation and show that the energy is conserved with an error < 0.2% after 500,000 timesteps, even when a very low number of particles per cell is used.
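
    The particle mover that dominates hybrid-PIC runtimes is a natural one-thread-per-particle kernel. The sketch below is a generic Boris push in uniform E and B fields with toy proton parameters, our illustration only; AMITIS itself gathers fields from its grid, applies the implicit field solver described above, and overlaps these kernels with host I/O.

      // Generic Boris particle push (uniform E and B), one particle per thread.
      #include <cstdio>

      __device__ float3 add3(float3 a, float3 b) { return make_float3(a.x+b.x, a.y+b.y, a.z+b.z); }
      __device__ float3 scale3(float3 a, float s) { return make_float3(a.x*s, a.y*s, a.z*s); }
      __device__ float3 cross3(float3 a, float3 b) {
          return make_float3(a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x);
      }

      __global__ void borisPush(float3* x, float3* v, int n,
                                float3 E, float3 B, float qm, float dt) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n) return;
          float hq = 0.5f * qm * dt;
          float3 vm = add3(v[i], scale3(E, hq));     // first half electric kick
          float3 t  = scale3(B, hq);                 // magnetic rotation vector
          float t2  = t.x*t.x + t.y*t.y + t.z*t.z;
          float3 vp = add3(vm, cross3(vm, t));
          float3 s  = scale3(t, 2.0f / (1.0f + t2));
          float3 vr = add3(vm, cross3(vp, s));       // rotated velocity
          v[i] = add3(vr, scale3(E, hq));            // second half electric kick
          x[i] = add3(x[i], scale3(v[i], dt));
      }

      int main() {
          const int n = 1 << 18;
          float3 *x, *v;
          cudaMallocManaged(&x, n * sizeof(float3));
          cudaMallocManaged(&v, n * sizeof(float3));
          for (int i = 0; i < n; ++i) {
              x[i] = make_float3(0.f, 0.f, 0.f);
              v[i] = make_float3(1e4f, 0.f, 0.f);    // 10 km/s protons (toy values)
          }
          float3 E = make_float3(0.f, 0.f, 0.f), B = make_float3(0.f, 0.f, 1e-8f);
          for (int step = 0; step < 100; ++step)     // qm: proton charge/mass ratio
              borisPush<<<(n + 255)/256, 256>>>(x, v, n, E, B, 9.58e7f, 0.01f);
          cudaDeviceSynchronize();
          printf("v[0] = (%g, %g, %g)\n", v[0].x, v[0].y, v[0].z);
          return 0;
      }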

  15. Unit bias. A new heuristic that helps explain the effect of portion size on food intake.

    Science.gov (United States)

    Geier, Andrew B; Rozin, Paul; Doros, Gheorghe

    2006-06-01

    People seem to think that a unit of some entity (with certain constraints) is the appropriate and optimal amount. We refer to this heuristic as unit bias. We illustrate unit bias by demonstrating large effects of unit segmentation, a form of portion control, on food intake. Thus, people choose, and presumably eat, much greater weights of Tootsie Rolls and pretzels when offered a large as opposed to a small unit size (and given the option of taking as many units as they choose at no monetary cost). Additionally, they consume substantially more M&M's when the candies are offered with a large as opposed to a small spoon (again with no limits as to the number of spoonfuls to be taken). We propose that unit bias explains why small portion sizes are effective in controlling consumption; in some cases, people served small portions would simply eat additional portions if it were not for unit bias. We argue that unit bias is a general feature in human choice and discuss possible origins of this bias, including consumption norms.

  16. 31 CFR 50.13 - Offer, purchase, and renewal.

    Science.gov (United States)

    2010-07-01

    ... Section 50.13 Money and Finance: Treasury Office of the Secretary of the Treasury TERRORISM RISK INSURANCE PROGRAM Disclosures as Conditions for Federal Payment § 50.13 Offer, purchase, and renewal. An insurer is deemed to be in compliance with the requirement of providing disclosure “at the time of offer, purchase...

  17. 17 CFR Appendix D to Part 30 - Information That a Foreign Board of Trade Should Submit When Seeking No-Action Relief To Offer...

    Science.gov (United States)

    2010-04-01

    ... primary and secondary markets for the component equities, the liquidity of the component stocks, the... applicable, by price in the index as well as the combined weighting of the five highest-weighted stocks in... in evaluating requests by foreign boards of trade to allow the offer and sale within the United...

  18. Marketing Digital Offerings Is Different: Strategies for Teaching about Digital Offerings in the Marketing Classroom

    Science.gov (United States)

    Roberts, Scott D.; Micken, Kathleen S.

    2015-01-01

    Digital offerings represent different challenges for marketers than do traditional goods and services. After reviewing the literature, the authors suggest ways that the marketing of digital goods and services might be better presented to and better understood by students. The well-known four challenges of services marketing model (e.g.,…

  19. Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

    Science.gov (United States)

    Tam, Wing-Kin; Yang, Zhi

    2018-05-01

    Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
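
    As a rough sketch of how peaks can be detected and compacted in parallel (the toolbox's own parallel compact operation is not reproduced here; the kernel name, the local-maximum rule and the atomic-counter compaction are illustrative assumptions):

        #include <cuda_runtime.h>

        // One thread per sample: mark local maxima above a threshold and
        // compact their indices into peaks via an atomic counter.
        __global__ void detect_peaks(const float* x, int n, float thr,
                                     int* peaks, int* count) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i <= 0 || i >= n - 1) return;                  // skip the edges
            if (x[i] > thr && x[i] > x[i - 1] && x[i] >= x[i + 1]) {
                int slot = atomicAdd(count, 1);                // compact step
                peaks[slot] = i;                               // indices come out unordered
            }
        }

    A launch such as detect_peaks<<<(n + 255) / 256, 256>>>(x, n, thr, peaks, count) fills the peaks buffer in no particular order; sort the indices afterwards if ordering matters.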

  20. 20 CFR 655.122 - Contents of job offers.

    Science.gov (United States)

    2010-04-01

    ... less than the same benefits, wages, and working conditions that the employer is offering, intends to... worker on a day during the work contract period is less than the number of hours offered, as specified in... the records for not less than 3 years after the date of the certification. (k) Hours and earnings...

  1. Real-time colouring and filtering with graphics shaders

    Science.gov (United States)

    Vohl, D.; Fluke, C. J.; Barnes, D. G.; Hassan, A. H.

    2017-11-01

    Despite the popularity of the Graphics Processing Unit (GPU) for general purpose computing, one should not forget about the practicality of the GPU for fast scientific visualization. As astronomers have increasing access to three-dimensional (3D) data from instruments and facilities like integral field units and radio interferometers, visualization techniques such as volume rendering offer means to quickly explore spectral cubes as a whole. As most 3D visualization techniques have been developed in fields of research like medical imaging and fluid dynamics, many transfer functions are not optimal for astronomical data. We demonstrate how transfer functions and graphics shaders can be exploited to provide new astronomy-specific explorative colouring methods. We present 12 shaders, including four novel transfer functions specifically designed to produce intuitive and informative 3D visualizations of spectral cube data. We compare their utility to classic colour mapping. The remaining shaders highlight how common computation like filtering, smoothing and line ratio algorithms can be integrated as part of the graphics pipeline. We discuss how this can be achieved by utilizing the parallelism of modern GPUs along with a shading language, letting astronomers apply these new techniques at interactive frame rates. All shaders investigated in this work are included in the open source software shwirl (Vohl 2017).
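
    The shaders themselves run in the graphics pipeline and are written in a shading language; purely to illustrate the transfer-function idea in the one language used for the code sketches in this listing, the CUDA kernel below maps voxel values through an assumed 256-entry RGBA lookup table. All names are hypothetical, not shwirl's API.

        #include <cuda_runtime.h>

        // One thread per voxel: normalise the scalar value, then look up
        // colour and opacity in a 1D transfer-function table (256 entries).
        __global__ void apply_transfer_function(const float* voxels, uchar4* rgba, int n,
                                                const float4* lut, float vmin, float vmax) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float t = (voxels[i] - vmin) / (vmax - vmin);      // normalise to [0, 1]
            t = fminf(fmaxf(t, 0.f), 1.f);
            float4 c = lut[(int)(t * 255.f)];                  // colour + opacity lookup
            rgba[i] = make_uchar4((unsigned char)(255.f * c.x),
                                  (unsigned char)(255.f * c.y),
                                  (unsigned char)(255.f * c.z),
                                  (unsigned char)(255.f * c.w));
        }

    Swapping the lookup table, or replacing the lookup with a small computation (a filter, a smoothing pass, a line ratio), is what distinguishes one shader from another while the per-element parallel structure stays the same.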

  2. Cost Indexing and Unit Price Adjustments for Construction Materials

    Science.gov (United States)

    2012-10-30

    This project focused on assimilating information regarding unit price adjustment clauses, or PACs, that are offered for construction materials at the state Departments of Transportation (DOTs). It is intended to provide the South Carol...

  3. [ANDALIES project: consumption, offer and promotion of healthy eating habits within secondary schools in Andalusia].

    Science.gov (United States)

    González Rodríguez, Angustias; García Padilla, Francisca M; Martos Cerezuela, Ildefonso; Silvano Arranz, Agustina; Fernández Lao, Isabel

    2015-04-01

    The school context stands out as one of the factors influencing the food practices of adolescents. Food consumption during the school day, the food on offer in cafeterias, and the promotional activities proposed by the centers are objects of increasing attention for community health services. Objectives: to describe students' eating habits during the school day; to analyze the food on offer in the cafeterias and surrounding establishments; and to assess whether secondary schools are suitable environments for the promotion of healthy eating habits. Cross-sectional study during the 2010-2012 school years. Sampling units: public secondary schools (95) and students (8,068). Multistage cluster sampling: random and stratified selection by province and habitat size. Selection of students: systematic sampling of classrooms. 77.5% of students have breakfast at home: cereals and a dairy product (40.9%) or a liquid (29.2%); 70.3% eat something at school, and most of them choose a cold meat sandwich. Fruit consumption is infrequent (2.5%), while packed juices are very common (63.3%). 75% eat sweets, a figure that increases significantly in schools with cafeterias. Cafeterias offer a large number of non-recommended products: soft drinks (97.3%), cold meats (91.8%), sweets and chips (89%). Lack of control over the products on offer is common (68.42%); only 28.4% of the managers know the law. 72.5% of the centers undertake isolated activities for the promotion of healthy eating habits. 71.5% of the centers are surrounded by shops that supply the students. Low protection of students' food health is evident, resulting from students' nutritional deficits, the low quality of the food offered by the cafeterias, and the lack of activities to encourage healthy habits. Educational, health and local administrations must therefore accept shared responsibility on this subject. Copyright AULA MEDICA EDICIONES 2014. Published by AULA MEDICA. All rights reserved.

  4. Empirical observations offer improved estimates of forest floor carbon content across the United States

    Science.gov (United States)

    Perry, C. H.; Domke, G. M.; Walters, B. F.; Smith, J. E.; Woodall, C. W.

    2014-12-01

    The Forest Inventory and Analysis (FIA) program of the United States Forest Service reports official estimates of national forest floor carbon (FFC) stocks and stock change to national and international parties, the US Environmental Protection Agency (USEPA) and the United Nations Framework Convention on Climate Change (UNFCCC), respectively. These estimates of national FFC stocks are derived from plot-level predictions of FFC density. We suspect the models used to predict plot-level FFC density are less than ideal for several reasons: (a) they are based upon local studies that may not reflect FFC dynamics at the national scale, (b) they are relatively insensitive to climate change, and (c) they reduce the natural variability of the data leading to misplaced confidence in the estimates. However, FIA has measured forest floor attributes since 2001 on a systematic 1/16th subset of a nation-wide array of inventory plots (7 800 of 125 000 plots). Here we address the efficacy of replacing plot-level model predictions with empirical observations of FFC density while assessing the impact of imputing FFC density values to the full plot network on national stock estimates. First, using an equivalence testing framework, we found model predictions of FFC density to differ significantly from the observations in all regions and forest types; the mean difference across all plots was 21 percent (1.81 Mg·ha-1). Furthermore, the model predictions were biased towards the lower end of extant FFC density observations, underestimating it while greatly truncating the range relative to the observations. Second, the optimal imputation approach (k-Nearest Neighbor, k-NN) resulted in values that were equivalent to observations of FFC density across a range of simulated missingness and maintained the high variability seen in the observations. We used the k-NN approach to impute FFC density values to the 94 percent of FIA inventory plots without soil measurements. Third, using the imputed

  5. Theoretical and Methodological Considerations on the Public Offers

    OpenAIRE

    Claudia Catalina SAVA

    2013-01-01

    This paper describes the most important characteristics of public offers, from both the theoretical and the methodological point of view. The European Union emphasizes clarity and transparency. The author focuses on specific provisions of the European Directive and of Romanian law and regulations related to voluntary and mandatory takeover bids, covering characteristics such as the price, offeror and offeree rights, and the offer timetable.

  6. Visual Thinking, Algebraic Thinking, and a Full Unit-Circle Diagram.

    Science.gov (United States)

    Shear, Jonathan

    1985-01-01

    The study of trigonometric functions in terms of the unit circle offers an example of how students can learn algebraic relations and operations while using visually oriented thinking. Illustrations are included. (MNS)

  7. Offer and Acceptance of Apology in Victim-Offender Mediation

    OpenAIRE

    Dhami, MK; Dhami, MK

    2012-01-01

    Past research on restorative justice (RJ) has highlighted the importance of apology for both victims and offenders and the prevalence of apology during the RJ process. The present study moves this work further by examining the nature of the apologies that are offered during victim-offender mediation, as well as the individual-, case-, and mediation-level factors that can affect the offer and acceptance of apology. In addition, we measure the implications that the offer and acceptance of apolo...

  8. Extracting product offers from e-shop websites

    OpenAIRE

    Horch, Andrea; Kett, Holger; Weisbecker, Anette

    2016-01-01

    On-line retailers as well as e-shoppers are very interested in gathering product records from the Web in order to compare products and prices. Consumers compare products and prices to find the best price for a specific product, or to identify alternatives to a product, whereas on-line retailers need to compare their offers with those of their competitors in order to remain competitive. As there is a huge number and vast array of product offers on the Web, the product dat...

  9. 16 CFR 238.2 - Initial offer.

    Science.gov (United States)

    2010-01-01

    ... § 238.2 Initial offer. (a) No statement or illustration should be used in any advertisement which creates a false impression of the grade, quality, make, value, currency of model, size, color, usability...

  10. Accelerating adaptive inverse distance weighting interpolation algorithm on a graphics processing unit.

    Science.gov (United States)

    Mei, Gang; Xu, Liangliang; Xu, Nengxiong

    2017-09-01

    This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the data points' spatial distribution pattern and achieve more accurate predictions than those predicted by IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version without profiting from the shared memory and the tiled version taking advantage of the shared memory. We also implement the naive version and the tiled version using two data layouts, structure of arrays and array of aligned structures, on both single and double precision. We then evaluate the performance of parallel AIDW by comparing it with its corresponding serial algorithm on three different machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in the computational efficiency when different data layouts are employed; (ii) the tiled version is always slightly faster than the naive version; and (iii) on single precision the achieved speed-up can be up to 763 (on the GPU M5000), while on double precision the obtained highest speed-up is 197 (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available.
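
    The tiled variant can be sketched as follows: data points are staged through shared memory in fixed-size chunks so every thread in a block reuses them. This is a generic tiled-IDW sketch with a fixed power parameter; the adaptive selection of the power that distinguishes AIDW, and the authors' exact data layouts, are omitted.

        #include <cuda_runtime.h>

        #define TILE 256   // launch with blockDim.x == TILE

        __global__ void idw_tiled(const float2* data_xy, const float* data_z, int ndata,
                                  const float2* query_xy, float* query_z, int nquery,
                                  float power) {
            __shared__ float2 sxy[TILE];
            __shared__ float  sz[TILE];
            int q = blockIdx.x * blockDim.x + threadIdx.x;
            float2 p = (q < nquery) ? query_xy[q] : make_float2(0.f, 0.f);
            float num = 0.f, den = 0.f;
            for (int base = 0; base < ndata; base += TILE) {
                int j = base + threadIdx.x;
                if (j < ndata) { sxy[threadIdx.x] = data_xy[j]; sz[threadIdx.x] = data_z[j]; }
                __syncthreads();                       // tile is now resident
                int m = min(TILE, ndata - base);
                for (int k = 0; k < m; ++k) {          // every thread reuses the tile
                    float dx = p.x - sxy[k].x, dy = p.y - sxy[k].y;
                    float w = powf(dx * dx + dy * dy + 1e-12f, -0.5f * power); // 1/d^power
                    num += w * sz[k]; den += w;
                }
                __syncthreads();                       // before the tile is overwritten
            }
            if (q < nquery) query_z[q] = num / den;
        }

    Out-of-range threads still execute the loop so that __syncthreads() is reached by the whole block; only the final write is guarded.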

  11. 14 CFR 152.115 - Grant agreement: Offer, acceptance, and amendment.

    Science.gov (United States)

    2010-01-01

    ... agency's attorney must certify that the acceptance complies with all applicable law, and constitutes a... 14 Aeronautics and Space 3 2010-01-01 2010-01-01 false Grant agreement: Offer, acceptance, and....115 Grant agreement: Offer, acceptance, and amendment. (a) Offer. Upon approving a project for airport...

  12. Sport's offer as an instrument of sports marketing mix

    Directory of Open Access Journals (Sweden)

    Gašović Milan

    2004-01-01

    Full Text Available Starting from the logical postulate that a product is everything that can be offered on the market in order to satisfy the needs, demands or wants of customers, marketing experts must, regarding the core of the sport offer (the product), answer three key questions: What can sports companies, teams or individuals offer to consumers? What needs can sports companies, teams or individuals satisfy? What instruments (techniques and methods) should marketing experts in sports organizations use in order to satisfy identified customer needs?

  13. Parallel Computational Intelligence-Based Multi-Camera Surveillance System

    Directory of Open Access Journals (Sweden)

    Sergio Orts-Escolano

    2014-04-01

    Full Text Available In this work, we present a multi-camera surveillance system based on the use of self-organizing neural networks to represent events on video. The system processes several tasks in parallel using GPUs (graphics processing units). It addresses multiple vision tasks at various levels, such as segmentation, representation or characterization, and analysis and monitoring of movement. These features allow the construction of a robust representation of the environment and the interpretation of the behavior of mobile agents in the scene. It is also necessary to integrate the vision module into a global system that operates in a complex environment by receiving images from multiple acquisition devices at video frequency. To offer relevant information to higher-level systems and to monitor and make decisions in real time, it must meet a set of requirements: time constraints, high availability, robustness, high processing speed and re-configurability. We have built a system able to represent and analyze the motion in video acquired by a multi-camera network and to process multi-source data in parallel on a multi-GPU architecture.

  14. Offers for our members

    CERN Multimedia

    Staff Association

    2013-01-01

    Summer is here, enjoy our offers for the aquatic parks! Walibi: Tickets "Zone terrestre": 21 € instead of 26 €. Access to Aqualibi: 5 € instead of 8 € on presentation of your SA member ticket. Free for children (3-11 years old) before 12 h 00. Free for children under 3, with limited access to the attractions. Car park free. * * * * * Aquaparc: Day ticket: – Children: 30 CHF instead of 39 CHF – Adults: 36 CHF instead of 49 CHF Bonus! Free for children under 5.

  15. Offers for our members

    CERN Multimedia

    Staff Association

    2015-01-01

    Summer is here, enjoy our offers for the aquatic parks! Walibi: Tickets "Zone terrestre": 21,50 € instead of 27 €. Access to Aqualibi: 5 € instead of 6 € on presentation of your SA member ticket. Free for children (3-11 years old) before 12:00 p.m. Free for children under 3, with limited access to the attractions. Car park free. * * * * * Aquaparc: Day ticket: – Children: 33 CHF instead of 39 CHF – Adults: 33 CHF instead of 49 CHF Bonus! Free for children under 5.

  16. Offers for our members

    CERN Multimedia

    Staff Association

    2016-01-01

    Summer is here, enjoy our offers for the aquatic parks! Walibi: Tickets "Zone terrestre": 23 € instead of 29 €. Access to Aqualibi: 5 € instead of 6 € on presentation of your SA member ticket. Free for children (3-11 years old) before 12:00 p.m. Free for children under 3, with limited access to the attractions. Car park free. * * * * * Aquaparc: Day ticket: – Children: 33 CHF instead of 39 CHF – Adults: 33 CHF instead of 49 CHF Bonus! Free for children under 5.

  17. Offers for our members

    CERN Multimedia

    Staff Association

    2017-01-01

    Summer is here, enjoy our offers for the water parks! Walibi: Tickets "Zone terrestre": 24 € instead of 30 €. Access to Aqualibi: 5 € instead of 6 € on presentation of your ticket purchased at the Staff Association. Bonus! Free for children under 100 cm, with limited access to the attractions. Free car park. *  *  *  *  *  *  *  * Aquaparc: Day ticket: -  Children: 33 CHF instead of 39 CHF -  Adults : 33 CHF instead of 49 CHF Bonus! Free for children under 5 years old.

  18. Repetitive flood victims and acceptance of FEMA mitigation offers: an analysis with community-system policy implications.

    Science.gov (United States)

    Kick, Edward L; Fraser, James C; Fulkerson, Gregory M; McKinney, Laura A; De Vries, Daniel H

    2011-07-01

    Of all natural disasters, flooding causes the greatest amount of economic and social damage. The United States' Federal Emergency Management Agency (FEMA) uses a number of hazard mitigation grant programmes for flood victims, including mitigation offers to relocate permanently repetitive flood loss victims. This study examines factors that help to explain the degree of difficulty repetitive flood loss victims experience when they make decisions about relocating permanently after multiple flood losses. Data are drawn from interviews with FEMA officials and a survey of flood victims from eight repetitive flooding sites. The qualitative and quantitative results show the importance of rational choices by flood victims in their mitigation decisions, as they relate to financial variables, perceptions of future risk, attachments to home and community, and the relationships between repetitive flood loss victims and the local flood management officials who help them. The results offer evidence to suggest the value of a more community-system approach to FEMA relocation practices. © 2011 The Author(s). Disasters © Overseas Development Institute, 2011.

  19. Characteristics of U.S. Mental Health Facilities That Offer Suicide Prevention Services.

    Science.gov (United States)

    Kuramoto-Crawford, S Janet; Smith, Kelley E; McKeon, Richard

    2016-01-01

    This study characterized mental health facilities that offer suicide prevention services or outcome follow-up after discharge. The study analyzed data from 8,459 U.S. mental health facilities that participated in the 2010 National Mental Health Services Survey. Logistic regression analyses were used to compare facilities that offered neither of the prevention services with those that offered both or either service. About one-fifth of mental health facilities reported offering neither suicide prevention services nor outcome follow-up. Approximately one-third offered both, 25% offered suicide prevention services only, and 21% offered only outcome follow-up after discharge. Facilities that offered neither service were less likely than facilities that offered either to offer comprehensive support services or special programs for veterans; to offer substance abuse services; and to be accredited, licensed, or certified. Further examination of facilitators and barriers in implementing suicide prevention services in mental health facilities is warranted.

  20. Comprehensive Study of Acute Effects and Recovery After Concussion

    Science.gov (United States)

    2015-10-01

    MCW IRB regarding personality questionnaire changes and head impact sensor company changes • Amendment under review as of Sept 29, 2015 regarding...a large (3Tb) memory system, and four general purpose graphical processing unit (GPU) systems, each with four Nvidia K40 GPUs. Each of these... company changed prior to study implementation and MCW IRB provided oversight for Banyan Biomarkers’ research activities. Approved by MCW IRB Aug 24

  1. Picard Trajectory Approximation Iteration for Efficient Orbit Propagation

    Science.gov (United States)

    2015-07-21

    computing language developed by NVIDIA for use upon their Graphics Processing Units (GPUs); effectively it allows lightweight parallel computation at...Computation Toolbox, and require Matlab 2010 or newer (2011 or newer recommended), and an NVIDIA GPU with compute capability of 1.3 or greater. 3...and Resonances, pp. 216–227, Dordrecht, Holland, 1970. D. Reidel Publishing Company . [4] Zadunaisky, P. E., On the Estimation of Errors Propagated in

  2. A Bitslice Implementation of Anderson's Attack on A5/1

    Science.gov (United States)

    Bulavintsev, Vadim; Semenov, Alexander; Zaikin, Oleg; Kochemazov, Stepan

    2018-03-01

    The A5/1 keystream generator is a part of Global System for Mobile Communications (GSM) protocol, employed in cellular networks all over the world. Its cryptographic resistance was extensively analyzed in dozens of papers. However, almost all corresponding methods either employ a specific hardware or require an extensive preprocessing stage and significant amounts of memory. In the present study, a bitslice variant of Anderson's Attack on A5/1 is implemented. It requires very little computer memory and no preprocessing. Moreover, the attack can be made even more efficient by harnessing the computing power of modern Graphics Processing Units (GPUs). As a result, using commonly available GPUs this method can quite efficiently recover the secret key using only 64 bits of keystream. To test the performance of the implementation, a volunteer computing project was launched. 10 instances of A5/1 cryptanalysis have been successfully solved in this project in a single week.
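
    The bitslice trick can be made concrete on A5/1's 19-bit register R1, whose feedback taps sit at bit positions 13, 16, 17 and 18 in the public description of the cipher: word j holds bit j of 64 independent cipher instances, so a handful of XORs clocks all 64 at once. A minimal sketch, not the paper's implementation (which also covers registers R2/R3, majority clocking and the attack itself):

        #include <cuda_runtime.h>
        #include <stdint.h>

        // Each thread owns 19 words = one bitsliced R1 holding 64 instances.
        __global__ void r1_clock(uint64_t* r1, int nthreads, int nsteps) {
            int t = blockIdx.x * blockDim.x + threadIdx.x;
            if (t >= nthreads) return;
            uint64_t* s = r1 + 19 * t;
            for (int step = 0; step < nsteps; ++step) {
                uint64_t fb = s[13] ^ s[16] ^ s[17] ^ s[18];  // feedback for 64 instances at once
                for (int i = 18; i > 0; --i) s[i] = s[i - 1]; // shift register
                s[0] = fb;
            }
        }

    With hundreds of blocks of such threads, a single GPU steps millions of cipher instances in parallel, which is where the speedup exploited by this class of attack comes from.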

  3. Real-world-time simulation of memory consolidation in a large-scale cerebellar model

    Directory of Open Access Journals (Sweden)

    Masato eGosui

    2016-03-01

    Full Text Available We report development of a large-scale spiking network model of the cerebellum composed of more than 1 million neurons. The model is implemented on graphics processing units (GPUs), which are dedicated hardware for parallel computing. Using 4 GPUs simultaneously, we achieve realtime simulation, in which computer simulation of cerebellar activity for 1 sec completes within 1 sec in the real-world time, with temporal resolution of 1 msec. This allows us to carry out a very long-term computer simulation of cerebellar activity in a practical time with millisecond temporal resolution. Using the model, we carry out computer simulation of long-term gain adaptation of optokinetic response (OKR) eye movements for 5 days aimed to study the neural mechanisms of posttraining memory consolidation. The simulation results are consistent with animal experiments and our theory of posttraining memory consolidation. These results suggest that realtime computing provides a useful means to study a very slow neural process such as memory consolidation in the brain.
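
    The work that must fit into each 1-msec step is, at its core, one state update per neuron. A generic leaky integrate-and-fire update is sketched below as a stand-in for the cerebellar model's actual neuron equations; all names and parameters are illustrative.

        #include <cuda_runtime.h>

        // One thread per neuron: explicit Euler step of a leaky
        // integrate-and-fire cell, with reset on threshold crossing.
        __global__ void lif_step(float* v, const float* i_syn, unsigned char* spiked,
                                 int n, float dt, float tau, float v_rest,
                                 float v_thresh, float v_reset) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float vi = v[i] + dt * ((v_rest - v[i]) / tau + i_syn[i]);
            spiked[i] = (vi >= v_thresh);
            v[i] = spiked[i] ? v_reset : vi;
        }

    At one million neurons this is roughly 4000 blocks of 256 threads per 1-ms step, so realtime operation amounts to sustaining about a thousand such launches (plus the synaptic updates) per wall-clock second.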

  4. A versatile model for soft patchy particles with various patch arrangements.

    Science.gov (United States)

    Li, Zhan-Wei; Zhu, You-Liang; Lu, Zhong-Yuan; Sun, Zhao-Yan

    2016-01-21

    We propose a simple and general mesoscale soft patchy particle model, which can felicitously describe the deformable and surface-anisotropic characteristics of soft patchy particles. This model can be used in dynamics simulations to investigate the aggregation behavior and mechanism of various types of soft patchy particles with tunable number, size, direction, and geometrical arrangement of the patches. To improve the computational efficiency of this mesoscale model in dynamics simulations, we give the simulation algorithm that fits the compute unified device architecture (CUDA) framework of NVIDIA graphics processing units (GPUs). The validation of the model and the performance of the simulations using GPUs are demonstrated by simulating several benchmark systems of soft patchy particles with 1 to 4 patches in a regular geometrical arrangement. Because of its simplicity and computational efficiency, the soft patchy particle model will provide a powerful tool to investigate the aggregation behavior of soft patchy particles, such as patchy micelles, patchy microgels, and patchy dendrimers, over larger spatial and temporal scales.

  5. GPU-based real-time triggering in the NA62 experiment

    CERN Document Server

    Ammendola, R.; Cretaro, P.; Di Lorenzo, S.; Fantechi, R.; Fiorini, M.; Frezza, O.; Lamanna, G.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Neri, I.; Paolucci, P.S.; Pastorelli, E.; Piandani, R.; Pontisso, L.; Rossetti, D.; Simula, F.; Sozzi, M.; Vicini, P.

    2016-01-01

    Over the last few years the GPGPU (General-Purpose computing on Graphics Processing Units) paradigm has represented a remarkable development in the world of computing. Computing for High-Energy Physics is no exception: several works have demonstrated the effectiveness of integrating GPU-based systems in the high level triggers of different experiments. On the other hand, the use of GPUs in low level trigger systems, characterized by stringent real-time constraints such as a tight time budget and high throughput, poses several challenges. In this paper we focus on the low level trigger of the CERN NA62 experiment, investigating the use of real-time computing on GPUs in this synchronous system. Our approach aims at harvesting the GPU computing power to build, in real time, refined physics-related trigger primitives for the RICH detector, as knowledge of the Čerenkov ring parameters allows stringent conditions for data selection to be built at trigger level. Latencies of all components of the trigger chain have...
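
    One hypothetical flavour of such a primitive: estimate a candidate ring's centre as the centroid of its photodetector hits and its radius as the mean hit-to-centre distance, with one thread block per event. Actual ring reconstruction in the NA62 trigger studies is more refined; this sketch only shows the block-per-event reduction pattern, and the hit layout and names are assumptions.

        #include <cuda_runtime.h>

        #define NT 64   // threads per block; one block per event

        // hits holds all events' hit coordinates back to back;
        // first[ev]/count[ev] give each event's slice (count[ev] > 0 assumed).
        __global__ void ring_primitive(const float2* hits, const int* first,
                                       const int* count, float2* centre, float* radius) {
            __shared__ float sx[NT], sy[NT], sr[NT];
            int ev = blockIdx.x, n = count[ev];
            const float2* h = hits + first[ev];
            float x = 0.f, y = 0.f;
            for (int i = threadIdx.x; i < n; i += NT) { x += h[i].x; y += h[i].y; }
            sx[threadIdx.x] = x; sy[threadIdx.x] = y;
            __syncthreads();
            for (int s = NT / 2; s > 0; s >>= 1) {             // tree reduction
                if (threadIdx.x < s) { sx[threadIdx.x] += sx[threadIdx.x + s];
                                       sy[threadIdx.x] += sy[threadIdx.x + s]; }
                __syncthreads();
            }
            float cx = sx[0] / n, cy = sy[0] / n;              // centroid = centre estimate
            float r = 0.f;
            for (int i = threadIdx.x; i < n; i += NT)
                r += sqrtf((h[i].x - cx) * (h[i].x - cx) + (h[i].y - cy) * (h[i].y - cy));
            sr[threadIdx.x] = r;
            __syncthreads();
            for (int s = NT / 2; s > 0; s >>= 1) {
                if (threadIdx.x < s) sr[threadIdx.x] += sr[threadIdx.x + s];
                __syncthreads();
            }
            if (threadIdx.x == 0) { centre[ev] = make_float2(cx, cy); radius[ev] = sr[0] / n; }
        }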

  6. ACL2 Meets the GPU: Formalizing a CUDA-based Parallelizable All-Pairs Shortest Path Algorithm in ACL2

    Directory of Open Access Journals (Sweden)

    David S. Hardin

    2013-04-01

    Full Text Available As Graphics Processing Units (GPUs) have gained in capability and GPU development environments have matured, developers are increasingly turning to the GPU to off-load numerically-intensive, parallelizable computations from the main host CPU. Modern GPUs feature hundreds of cores, and offer programming niceties such as double-precision floating point, and even limited recursion. This shift from CPU to GPU, however, raises the question: how do we know that these new GPU-based algorithms are correct? In order to explore this new verification frontier, we formalized a parallelizable all-pairs shortest path (APSP) algorithm for weighted graphs, originally coded in NVIDIA's CUDA language, in ACL2. The ACL2 specification is written using a single-threaded object (stobj) and tail recursion, as the stobj/tail recursion combination yields the most straightforward translation from imperative programming languages, as well as efficient, scalable executable specifications within ACL2 itself. The ACL2 version of the APSP algorithm can process millions of vertices and edges with little to no garbage generation, and executes at one-sixth the speed of a host-based version of APSP coded in C – a very respectable result for a theorem prover. In addition to formalizing the APSP algorithm (which uses Dijkstra's shortest path algorithm at its core), we have also provided capability that the original APSP code lacked, namely shortest path recovery. Path recovery is accomplished using a secondary ACL2 stobj implementing a LIFO stack, which is proven correct. To conclude the experiment, we ported the ACL2 version of the APSP kernels back to C, resulting in a less than 5% slowdown, and also performed a partial back-port to CUDA, which, surprisingly, yielded a slight performance increase.
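
    The formalized kernel is Dijkstra-based; as a compact GPU stand-in for the same all-pairs problem, the classic Floyd-Warshall relaxation (a different APSP algorithm, named here plainly) fits in a few lines of CUDA:

        #include <cuda_runtime.h>

        // One thread relaxes one (i, j) entry through pivot k. The in-place
        // parallel update is safe because, for non-negative weights, iteration
        // k never changes row k or column k of the distance matrix.
        __global__ void fw_relax(float* d, int n, int k) {
            int i = blockIdx.y * blockDim.y + threadIdx.y;
            int j = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n || j >= n) return;
            float via_k = d[i * n + k] + d[k * n + j];
            if (via_k < d[i * n + j]) d[i * n + j] = via_k;
        }

        // Host side: dim3 block(16, 16), grid((n + 15) / 16, (n + 15) / 16);
        // for (int k = 0; k < n; ++k) fw_relax<<<grid, block>>>(d, n, k);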

  7. Negotiation as a form of persuasion: arguments in first offers.

    Science.gov (United States)

    Maaravi, Yossi; Ganzach, Yoav; Pazy, Asya

    2011-08-01

    In this article we examined aspects of negotiation within a persuasion framework. Specifically, we investigated how the provision of arguments that justified the first offer in a negotiation affected the behavior of the parties, namely, how it influenced counteroffers and settlement prices. In a series of 4 experiments and 2 pilot studies, we demonstrated that when the generation of counterarguments was easy, negotiators who did not add arguments to their first offers achieved superior results compared with negotiators who used arguments to justify their first offer. We hypothesized and provided evidence that adding arguments to a first offer was likely to cause the responding party to search for counterarguments, and this, in turn, led him or her to present counteroffers that were further away from the first offer.

  8. Marketing in Dutch mainline congregations. The importance of what religious organizations offer and how they offer

    NARCIS (Netherlands)

    Sengers, E.

    2010-01-01

    In rational choice theory, the central explanatory term for the vitality of religious organizations is 'cost'. The higher the cost, the more successful the organization is supposed to be. However, as cost and reward are complementary, research should also pay attention to the rewards offered by

  9. Availability of websites offering to sell psilocybin spores and psilocybin.

    Science.gov (United States)

    Lott, Jason P; Marlowe, Douglas B; Forman, Robert F

    2009-09-01

    This study assesses the availability of websites offering to sell psilocybin spores and psilocybin, a powerful hallucinogen contained in Psilocybe mushrooms. Over a 25-month period beginning in March 2003, eight searches were conducted in Google using the term "psilocybin spores." In each search the first 100 nonsponsored links obtained were scored by two independent raters according to standardized criteria to determine whether they offered to sell psilocybin or psilocybin spores. No attempts were made to procure the products offered for sale in order to ascertain whether the marketed psilocybin was in fact "genuine" or "counterfeit." Of the 800 links examined, 58% led to websites offering to sell psilocybin spores. Additionally, evidence that whole Psilocybe mushrooms are offered for sale online was obtained. Psilocybin and psilocybin spores were found to be widely available for sale over the Internet. Online purchase of psilocybin may facilitate illicit use of this potent psychoactive substance. Additional studies are needed to assess whether websites offering to sell psilocybin and psilocybin spores actually deliver their products as advertised.

  10. 45 CFR 2544.115 - Who may offer a donation?

    Science.gov (United States)

    2010-10-01

    ... 45 Public Welfare 4 2010-10-01 2010-10-01 false Who may offer a donation? 2544.115 Section 2544... COMMUNITY SERVICE SOLICITATION AND ACCEPTANCE OF DONATIONS § 2544.115 Who may offer a donation? Anyone... donation to the Corporation. ...

  11. Improved understanding of protein complex offers insight into DNA

    Science.gov (United States)

    A clearer understanding of the origin recognition complex (ORC) - a protein complex that directs DNA replication - through its crystal structure offers new insight into fundamental mechanisms of DNA replication

  12. On feelings as a heuristic for making offers in ultimatum negotiations.

    Science.gov (United States)

    Stephen, Andrew T; Pham, Michel Tuan

    2008-10-01

    This research examined how reliance on emotional feelings as a heuristic influences how offers are made. Results from three experiments using the ultimatum game show that, compared with proposers who do not rely on their feelings, proposers who rely on their feelings make less generous offers in the standard ultimatum game, more generous offers in a variant of the game allowing responders to make counteroffers, and less generous offers in a dictator game in which no responses are allowed. Reliance on feelings triggers a more literal form of play, whereby proposers focus more on how they feel toward the content of the offers than on how they feel toward the possible outcomes of those offers, as if the offers were the final outcomes. Proposers who rely on their feelings also tend to focus on gist-based construals of the negotiation that capture only the essential aspects of the situation.

  13. Dedicated auxiliary power units for Hybrid Electric Vehicles

    NARCIS (Netherlands)

    Mourad, S.; Weijer, C.J.T. van de

    1998-01-01

    The use of a dedicated auxiliary power unit is essential to utilize the potential that hybrid vehicles offer for efficient and ultra-clean transportation. An example of a hybrid project at the TNO Road-Vehicles Research Institute shows the development and the results of a dedicated auxiliary power

  14. Green Power Marketing in the United States: A Status Report (2008 Data)

    Energy Technology Data Exchange (ETDEWEB)

    Bird, L.; Kreycik, C.; Friedman, B.

    2009-09-01

    Voluntary consumer decisions to buy electricity supplied from renewable energy sources represent a powerful market support mechanism for renewable energy development. In the early 1990s, a small number of U.S. utilities began offering 'green power' options to their customers. Since then, these products have become more prevalent, both from traditional utilities and from renewable energy marketers operating in states that have introduced competition into their retail electricity markets or offering renewable energy certificates (RECs) online. Today, more than half of all U.S. electricity customers have an option to purchase some type of green power product directly from a retail electricity provider, while all consumers have the option to purchase RECs. This report documents green power marketing activities and trends in the United States including utility green pricing programs offered in regulated electricity markets; green power marketing activity in competitive electricity markets, as well as green power sold to voluntary purchasers in the form of RECs; and renewable energy sold as greenhouse gas offsets in the United States. These sections are followed by a discussion of key market trends and issues. The final section offers conclusions and observations.

  15. Green Power Marketing in the United States. A Status Report (2008 Data)

    Energy Technology Data Exchange (ETDEWEB)

    Bird, Lori [National Renewable Energy Lab. (NREL), Golden, CO (United States); Kreycik, Claire [National Renewable Energy Lab. (NREL), Golden, CO (United States); Friedman, Barry [National Renewable Energy Lab. (NREL), Golden, CO (United States)

    2009-09-01

    Voluntary consumer decisions to buy electricity supplied from renewable energy sources represent a powerful market support mechanism for renewable energy development. In the early 1990s, a small number of U.S. utilities began offering 'green power' options to their customers. Since then, these products have become more prevalent, both from traditional utilities and from renewable energy marketers operating in states that have introduced competition into their retail electricity markets or offering renewable energy certificates (RECs) online. Today, more than half of all U.S. electricity customers have an option to purchase some type of green power product directly from a retail electricity provider, while all consumers have the option to purchase RECs. This report documents green power marketing activities and trends in the United States including utility green pricing programs offered in regulated electricity markets; green power marketing activity in competitive electricity markets, as well as green power sold to voluntary purchasers in the form of RECs; and renewable energy sold as greenhouse gas offsets in the United States. These sections are followed by a discussion of key market trends and issues. The final section offers conclusions and observations.

  16. A Tenured Faith and an Adjunct Faculty: Successes and Challenges in Instructor Formation at Catholic Colleges that Offer Business Programs in an Accelerated Format

    Science.gov (United States)

    Gambrall, Doug; Newcomb, Mark A.

    2009-01-01

    Many Catholic colleges in the United States offer Business programs in an accelerated format, featuring evening courses for adult learners, with fewer faculty contact hours than traditional classes. Most of these institutions believe in the ideals of Catholic Social Teaching and wish to integrate those principles into their curricula for the sake of…

  17. Development and acceleration of unstructured mesh-based cfd solver

    Science.gov (United States)

    Emelyanov, V.; Karpenko, A.; Volkov, K.

    2017-06-01

    The study was undertaken as part of a larger effort to establish a common computational fluid dynamics (CFD) code for simulation of internal and external flows and involves some basic validation studies. The governing equations are solved with a finite-volume code on unstructured meshes. The computational procedure involves reconstruction of the solution in each control volume and extrapolation of the unknowns to find the flow variables on the faces of the control volume, solution of the Riemann problem for each face of the control volume, and advancement of the solution in time. The nonlinear CFD solver works in an explicit time-marching fashion, based on a three-step Runge-Kutta stepping procedure. Convergence to a steady state is accelerated by the use of a geometric technique and by the application of Jacobi preconditioning for high-speed flows, with a separate low-Mach-number preconditioning method for use with low-speed flows. The CFD code is implemented on graphics processing units (GPUs). Speedup of the solution on GPUs with respect to solution on central processing units (CPUs) is compared with the use of different meshes and different methods of distributing input data into blocks. The results obtained provide a promising perspective for designing a GPU-based software framework for applications in CFD.
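
    The three-step Runge-Kutta marching has a simple GPU shape: each stage is one residual evaluation followed by one linear combination of solution buffers. It is sketched below with the standard SSP-RK3 coefficients; the paper's exact scheme is not given in the record, and compute_residual stands for the reconstruction and Riemann-flux step, which is assumed rather than shown.

        #include <cuda_runtime.h>

        // out = a*a_buf + b*b_buf + dt_b*res, one cell variable per thread.
        __global__ void rk_combine(float* out, const float* a_buf, float a,
                                   const float* b_buf, float b,
                                   const float* res, float dt_b, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) out[i] = a * a_buf[i] + b * b_buf[i] + dt_b * res[i];
        }

        // Host-side stage sequence (u, u1, u2, r are device buffers of length n):
        //   compute_residual(u,  r); rk_combine<<<g,b>>>(u1, u, 1.f,     u,  0.f,     r, dt,           n);
        //   compute_residual(u1, r); rk_combine<<<g,b>>>(u2, u, 0.75f,   u1, 0.25f,   r, 0.25f * dt,   n);
        //   compute_residual(u2, r); rk_combine<<<g,b>>>(u,  u, 1.f/3.f, u2, 2.f/3.f, r, 2.f/3.f * dt, n);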

  18. GPU-Accelerated Parallel FDTD on Distributed Heterogeneous Platform

    Directory of Open Access Journals (Sweden)

    Ronglin Jiang

    2014-01-01

    Full Text Available This paper introduces a finite-difference time-domain (FDTD) code written in Fortran and CUDA for realistic electromagnetic calculations, with parallelization via the Message Passing Interface (MPI) and Open Multiprocessing (OpenMP). Since both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) resources are utilized, a faster execution speed can be reached compared to a traditional pure GPU code. In our experiments, 64 NVIDIA TESLA K20m GPUs and 64 INTEL XEON E5-2670 CPUs are used to carry out the pure CPU, pure GPU, and CPU + GPU tests. Relative to the pure CPU calculations for the same problems, the speedup ratio achieved by CPU + GPU calculations is around 14. Compared to the pure GPU calculations for the same problems, the CPU + GPU calculations show a 7.6%–13.2% performance improvement. Because of the small memory size of GPUs, the FDTD problem size is usually very small. However, this code can enlarge the maximum problem size by 25% without reducing the performance of the traditional pure GPU code. Finally, using this code, a microstrip antenna array with 16×18 elements is calculated and the radiation patterns are compared with those obtained by the method of moments (MoM). Results show good agreement between them.
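
    The computational heart of any such code is the leapfrog field update. For illustration, here is the Ez update of a 2D TMz grid in plain CUDA; the paper's code is Fortran plus CUDA with MPI/OpenMP layered on top, and the grid layout and normalised units here are assumptions.

        #include <cuda_runtime.h>

        // One thread per interior grid point: Ez += c * (curl H), with the
        // matching Hx/Hy updates (same stencil shape) alternating each half step.
        __global__ void update_ez(float* ez, const float* hx, const float* hy,
                                  int nx, int ny, float c) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            int j = blockIdx.y * blockDim.y + threadIdx.y;
            if (i <= 0 || j <= 0 || i >= nx || j >= ny) return;
            ez[j * nx + i] += c * ((hy[j * nx + i] - hy[j * nx + i - 1])
                                 - (hx[j * nx + i] - hx[(j - 1) * nx + i]));
        }

    In a hybrid MPI setup of the kind described, each rank would own a subgrid and exchange halo rows and columns of these field arrays with its neighbours after every update; this is the standard way such codes scale across many GPUs and CPUs.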

  19. A Novel CPU/GPU Simulation Environment for Large-Scale Biologically-Realistic Neural Modeling

    Directory of Open Access Journals (Sweden)

    Roger V Hoang

    2013-10-01

    Full Text Available Computational Neuroscience is an emerging field that provides unique opportunities to study complex brain structures through realistic neural simulations. However, as biological details are added to models, the execution time for the simulation becomes longer. Graphics Processing Units (GPUs) are now being utilized to accelerate simulations due to their ability to perform computations in parallel. As such, they have shown significant improvement in execution time compared to Central Processing Units (CPUs). Most neural simulators utilize either multiple CPUs or a single GPU for better performance, but still show limitations in execution time when biological details are not sacrificed. Therefore, we present a novel CPU/GPU simulation environment for large-scale biological networks, the NeoCortical Simulator version 6 (NCS6). NCS6 is a free, open-source, parallelizable, and scalable simulator, designed to run on clusters of multiple machines, potentially with high performance computing devices in each of them. It has built-in leaky-integrate-and-fire (LIF) and Izhikevich (IZH) neuron models, but users also have the capability to design their own plug-in interface for different neuron types as desired. NCS6 is currently able to simulate one million cells and 100 million synapses in quasi real time by distributing data across these heterogeneous clusters of CPUs and GPUs.
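
    Of the two built-in models, the Izhikevich (IZH) update is compact enough to sketch as a one-thread-per-neuron kernel. The equations are Izhikevich's published two-variable model with a 1-ms step; NCS6's actual data layout and plug-in interface are not reproduced.

        #include <cuda_runtime.h>

        // v' = 0.04v^2 + 5v + 140 - u + I;  u' = a(bv - u);
        // spike at v >= 30 mV, then v <- c and u <- u + d.
        __global__ void izh_step(float* v, float* u, const float* i_in,
                                 unsigned char* fired, int n,
                                 float a, float b, float c, float d) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float vi = v[i], ui = u[i];
            // two 0.5-ms half steps for numerical stability (Izhikevich, 2003)
            vi += 0.5f * (0.04f * vi * vi + 5.f * vi + 140.f - ui + i_in[i]);
            vi += 0.5f * (0.04f * vi * vi + 5.f * vi + 140.f - ui + i_in[i]);
            ui += a * (b * vi - ui);
            fired[i] = (vi >= 30.f);
            v[i] = fired[i] ? c : vi;
            u[i] = fired[i] ? ui + d : ui;
        }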

  20. 17 CFR 230.419 - Offerings by blank check companies.

    Science.gov (United States)

    2010-04-01

    ... derivative securities relating to securities held in the escrow or trust account may be exercised or... other derivative securities issued in the initial offering are exercisable, there is a continuous... 17 Commodity and Securities Exchanges 2 2010-04-01 2010-04-01 false Offerings by blank check...