WorldWideScience

Sample records for units gpus offer

  1. Evaluating Mobile Graphics Processing Units (GPUs) for Real-Time Resource Constrained Applications

    Energy Technology Data Exchange (ETDEWEB)

    Meredith, J; Conger, J; Liu, Y; Johnson, J

    2005-11-11

    Modern graphics processing units (GPUs) can provide tremendous performance boosts for some applications beyond what a single CPU can accomplish, and their performance is growing faster than that of CPUs as well. Mobile GPUs available for laptops have the small form factor and low power requirements suitable for use in embedded processing. We evaluated several desktop and mobile GPUs and CPUs on traditional and non-traditional graphics tasks, as well as on the most time-consuming pieces of a full hyperspectral imaging application. Accuracy remained high despite small differences in arithmetic operations like rounding. Performance improvements are summarized here relative to a desktop Pentium 4 CPU.

  2. Quantum Chemistry for Solvated Molecules on Graphical Processing Units (GPUs) Using Polarizable Continuum Models

    CERN Document Server

    Liu, Fang; Kulik, Heather J; Martínez, Todd J

    2015-01-01

    The conductor-like polarization model (C-PCM) with switching/Gaussian smooth discretization is a widely used implicit solvation model in chemical simulations. However, its application in quantum mechanical calculations of large-scale biomolecular systems can be limited by the computational expense of both the gas phase electronic structure and the solvation interaction. We have previously used graphical processing units (GPUs) to accelerate the first of these steps. Here, we extend the use of GPUs to accelerate electronic structure calculations including C-PCM solvation. Implementation on the GPU leads to significant acceleration of the generation of the required integrals for C-PCM. We further propose two strategies to improve the solution of the required linear equations: a dynamic convergence threshold and a randomized block-Jacobi preconditioner. These strategies are not specific to GPUs and are expected to be beneficial for both CPU and GPU implementations. We benchmark the performance of the new implementation...
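The randomized block-Jacobi idea can be illustrated independently of the GPU implementation. The sketch below is a minimal NumPy illustration under stated assumptions (hypothetical function names, dense matrices, a symmetric positive definite system), not the authors' code: it builds a preconditioner by inverting diagonal blocks over a random permutation of the unknowns, then uses it inside a standard preconditioned conjugate gradient solve.

```python
import numpy as np

def block_jacobi_preconditioner(A, block_size, rng=None):
    """Approximate M^-1 from (randomly permuted) diagonal blocks of A."""
    n = A.shape[0]
    rng = np.random.default_rng(0) if rng is None else rng
    perm = rng.permutation(n)            # randomized grouping of unknowns into blocks
    M_inv = np.zeros_like(A)
    for start in range(0, n, block_size):
        idx = perm[start:start + block_size]
        block = A[np.ix_(idx, idx)]      # principal submatrix: SPD, hence invertible
        M_inv[np.ix_(idx, idx)] = np.linalg.inv(block)
    return M_inv

def pcg(A, b, M_inv, tol=1e-8, max_iter=500):
    """Preconditioned conjugate gradient for a symmetric positive definite A."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv @ r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:      # dynamic thresholds would tighten tol per SCF cycle
            break
        z = M_inv @ r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

The block inverses are independent of each other, which is what makes the construction attractive on both multicore CPUs and GPUs.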

  3. ASAMgpu V1.0 - a moist fully compressible atmospheric model using graphics processing units (GPUs)

    Science.gov (United States)

    Horn, S.

    2012-03-01

    In this work the three-dimensional compressible moist atmospheric model ASAMgpu is presented. The calculations are done using graphics processing units (GPUs). To ensure platform independence, OpenGL and GLSL are used, so that the model runs on any hardware supporting fragment shaders. The MPICH2 library enables interprocess communication, allowing the use of more than one GPU through domain decomposition. Time integration is done with an explicit three-step Runge-Kutta scheme with a time-splitting algorithm for the acoustic waves. The results for four test cases are shown in this paper: a rising dry heat bubble, a cold-bubble-induced density flow, a rising moist heat bubble in a saturated environment, and a DYCOMS-II case.

  4. Fitting Galaxies on GPUs

    CERN Document Server

    Barsdell, Benjamin R; Fluke, Christopher J

    2011-01-01

    Structural parameters are normally extracted from observed galaxies by fitting analytic light profiles to the observations. Obtaining accurate fits to high-resolution images is a computationally expensive task, requiring many model evaluations and convolutions with the imaging point spread function. While these algorithms contain high degrees of parallelism, current implementations do not exploit this property. With ever-growing volumes of observational data, an inability to make use of advances in computing power can act as a constraint on scientific outcomes. This is the motivation behind our work, which aims to implement the model-fitting procedure on a graphics processing unit (GPU). We begin by analysing the algorithms involved in model evaluation with respect to their suitability for modern many-core computing architectures like GPUs, finding them to be well-placed to take advantage of the high memory bandwidth offered by this hardware. Following our analysis, we briefly describe a preliminary implementation...

  5. Designing scientific applications on GPUs

    CERN Document Server

    Couturier, Raphael

    2013-01-01

    Many of today's complex scientific applications now require a vast amount of computational power. General-purpose graphics processing units (GPGPUs) enable researchers in a variety of fields to benefit from the computational power of all the cores available inside graphics cards. Understand the Benefits of Using GPUs for Many Scientific Applications. Designing Scientific Applications on GPUs shows you how to use GPUs for applications in diverse scientific fields, from physics and mathematics to computer science. The book explains the methods necessary for designing or porting your scientific applications...

  6. ASAMgpu V1.0 – a moist fully compressible atmospheric model using graphics processing units (GPUs)

    Directory of Open Access Journals (Sweden)

    S. Horn

    2012-03-01

    Full Text Available In this work the three-dimensional compressible moist atmospheric model ASAMgpu is presented. The calculations are done using graphics processing units (GPUs). To ensure platform independence, OpenGL and GLSL are used, so that the model runs on any hardware supporting fragment shaders. The MPICH2 library enables interprocess communication, allowing the use of more than one GPU through domain decomposition. Time integration is done with an explicit three-step Runge-Kutta scheme with a time-splitting algorithm for the acoustic waves. The results for four test cases are shown in this paper: a rising dry heat bubble, a cold-bubble-induced density flow, a rising moist heat bubble in a saturated environment, and a DYCOMS-II case.

  7. ASAMgpu V1.0 – a moist fully compressible atmospheric model using graphics processing units (GPUs)

    Directory of Open Access Journals (Sweden)

    S. Horn

    2011-10-01

    Full Text Available In this work the three-dimensional compressible moist atmospheric model ASAMgpu is presented. The calculations are done using graphics processing units (GPUs). To ensure platform independence, OpenGL and GLSL are used, so that the model runs on any hardware supporting fragment shaders. The MPICH2 library enables interprocess communication, allowing the use of more than one GPU through domain decomposition. Time integration is done with an explicit three-step Runge-Kutta scheme with a time-splitting algorithm for the acoustic waves. The results for four test cases are shown in this paper: a rising dry heat bubble, a cold-bubble-induced density flow, a rising moist heat bubble in a saturated environment, and a DYCOMS-II case.

  8. Fitting Galaxies on GPUs

    Science.gov (United States)

    Barsdell, B. R.; Barnes, D. G.; Fluke, C. J.

    2011-07-01

    Structural parameters are normally extracted from observed galaxies by fitting analytic light profiles to the observations. Obtaining accurate fits to high-resolution images is a computationally expensive task, requiring many model evaluations and convolutions with the imaging point spread function. While these algorithms contain high degrees of parallelism, current implementations do not exploit this property. With ever-growing volumes of observational data, an inability to make use of advances in computing power can act as a constraint on scientific outcomes. This is the motivation behind our work, which aims to implement the model-fitting procedure on a graphics processing unit (GPU). We begin by analysing the algorithms involved in model evaluation with respect to their suitability for modern many-core computing architectures like GPUs, finding them to be well-placed to take advantage of the high memory bandwidth offered by this hardware. Following our analysis, we briefly describe a preliminary implementation of the model fitting procedure using freely-available GPU libraries. Early results suggest a speed-up of around 10× over a CPU implementation. We discuss the opportunities such a speed-up could provide, including the ability to use more computationally expensive but better-performing fitting routines to increase the quality and robustness of fits.
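The analytic light profiles fitted to galaxy images are typically Sersic profiles, and their per-pixel evaluation is embarrassingly parallel, which is why the workload maps well to GPUs. A minimal NumPy sketch is given below; it uses the common leading-order approximation b_n ≈ 2n − 1/3 and hypothetical parameter names, and is not the authors' implementation.

```python
import numpy as np

def sersic_profile(r, I_e, r_e, n):
    """Sersic surface brightness I(r) = I_e * exp(-b_n * ((r/r_e)^(1/n) - 1)).

    b_n is approximated by its leading term 2n - 1/3, which fixes I(r_e) = I_e.
    r may be a scalar or an array of radii (one evaluation per pixel).
    """
    b_n = 2.0 * n - 1.0 / 3.0
    return I_e * np.exp(-b_n * ((r / r_e) ** (1.0 / n) - 1.0))
```

Evaluating the profile over a whole pixel grid is a single vectorized expression, the same data-parallel pattern a GPU kernel would exploit.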

  9. Numerical computations with GPUs

    CERN Document Server

    Kindratenko, Volodymyr

    2014-01-01

    This book brings together research on numerical methods adapted for Graphics Processing Units (GPUs). It explains recent efforts to adapt classic numerical methods, including solution of linear equations and FFT, for massively parallel GPU architectures. This volume consolidates recent research and adaptations, covering widely used methods that are at the core of many scientific and engineering computations. Each chapter is written by authors working on a specific group of methods; these leading experts provide mathematical background, parallel algorithms and implementation details leading to...

  10. Accelerating QDP++ using GPUs

    CERN Document Server

    Winter, Frank

    2011-01-01

    Graphics Processing Units (GPUs) are getting increasingly important as target architectures in scientific High Performance Computing (HPC). NVIDIA established CUDA as a parallel computing architecture for controlling and making use of the compute power of GPUs. CUDA provides sufficient support for C++ language elements to enable the Expression Template (ET) technique in the device memory domain. QDP++ is a C++ vector class library suited for quantum field theory which provides vector data types and expressions and forms the basis of the lattice QCD software suite Chroma. In this work, QDP++ expression evaluation was successfully accelerated on GPUs by leveraging the ET technique and Just-In-Time (JIT) compilation. The Portable Expression Template Engine (PETE) and the C API for CUDA kernel arguments were used to build the bridge between the host and device memory domains. This makes it possible to offload to the GPU Chroma routines that are typically not subject to special optimisation. In an ...

  11. Green smartphone GPUs: Optimizing energy consumption using GPUFreq scaling governors

    KAUST Repository

    Ahmad, Enas M.

    2015-10-19

    Modern smartphones are limited by their short battery life. The advancement of the graphical performance is considered as one of the main reasons behind the massive battery drainage in smartphones. In this paper we present a novel implementation of the GPUFreq Scaling Governors, a Dynamic Voltage and Frequency Scaling (DVFS) model implemented in the Android Linux kernel for dynamically scaling smartphone Graphical Processing Units (GPUs). The GPUFreq governors offer users multiple variations and alternatives in controlling the power consumption and performance of their GPUs. We implemented and evaluated our model on a smartphone GPU and measured the energy performance using an external power monitor. The results show that the energy consumption of smartphone GPUs can be significantly reduced with a minor effect on the GPU performance.
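The governor concept behind GPUFreq can be illustrated with a toy decision rule. The sketch below is a hypothetical ondemand-style policy written for illustration only (it is not the kernel's GPUFreq code, and the threshold values are assumptions): it jumps straight to the highest frequency under heavy load and steps down one level at a time when the GPU is mostly idle, trading a little performance for energy.

```python
def ondemand_governor(load, freqs, current, up_threshold=0.8, down_threshold=0.3):
    """Return the index of the next frequency in `freqs` (sorted ascending).

    load: recent GPU utilisation in [0, 1].
    current: index of the currently selected frequency.
    """
    if load > up_threshold:
        return len(freqs) - 1        # high load: jump to the maximum frequency
    if load < down_threshold and current > 0:
        return current - 1           # mostly idle: step down one level to save energy
    return current                   # moderate load: keep the current frequency
```

A real governor runs this decision periodically in the kernel with utilisation sampled from the GPU driver; the asymmetric up-fast/down-slow shape is what limits the performance impact while still cutting idle power.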

  12. GPUs as Storage System Accelerators

    CERN Document Server

    Al-Kiswany, Samer; Ripeanu, Matei

    2012-01-01

    Massively multicore processors, such as Graphics Processing Units (GPUs), provide, at a comparable price, a one order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, as any order-of-magnitude drop in the cost per unit of performance for a class of system components, triggers the opportunity to redesign systems and to explore new ways to engineer them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content addressable storage system that facilitates online similarity detection...
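The hashing primitive at the heart of such a system is easy to state on the CPU side. The sketch below is an illustrative Python analogue (hypothetical helper names; the actual prototype offloads this work to the GPU): it fingerprints fixed-size chunks of data and estimates similarity between two objects as the Jaccard overlap of their fingerprint sets, the basic mechanism behind content-addressable similarity detection.

```python
import hashlib

def chunk_fingerprints(data, chunk_size=4096):
    """Hash each fixed-size chunk of `data` (this per-chunk work is what a GPU parallelizes)."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def similarity(a, b, chunk_size=4096):
    """Jaccard overlap of the two chunk-fingerprint sets, in [0, 1]."""
    fa = set(chunk_fingerprints(a, chunk_size))
    fb = set(chunk_fingerprints(b, chunk_size))
    return len(fa & fb) / max(len(fa | fb), 1)
```

Because every chunk hash is independent, the fingerprinting loop is trivially data-parallel, which is exactly the property the storage prototype exploits by offloading it.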

  13. On the use of graphics processing units (GPUs) for molecular dynamics simulation of spherical particles

    NARCIS (Netherlands)

    Hidalgo, R.C.; Kanzaki, T.; Alonso-Marroquin, F.; Luding, S.; Yu, A.; Dong, K.; Yang, R.; Luding, S.

    2013-01-01

    General-purpose computation on Graphics Processing Units (GPUs) on personal computers has recently become an attractive alternative to parallel computing on clusters and supercomputers. We present the GPU implementation of an accurate molecular dynamics algorithm for a system of spheres. The new hybrid...

  14. GPUs for the realtime low-level trigger of the NA62 experiment at CERN

    CERN Document Server

    Ammendola, R; Biagioni, A; Chiozzi, S; Cotta Ramusino, A; Fantechi, R; Fiorini, M; Gianoli, A; Graverini, E; Lamanna, G; Lonardo, A; Messina, A; Neri, I; Pantaleo, F; Paolucci, P S; Piandani, R; Pontisso, L; Simula, F; Sozzi, M; Vicini, P

    2015-01-01

    A pilot project for the use of GPUs (Graphics Processing Units) in online triggering applications for high energy physics (HEP) experiments is presented. GPUs offer a highly parallel architecture, with most of the chip resources devoted to computation. Moreover, they achieve a large computing power within a limited amount of space and power. The application of online parallel computing on GPUs is shown for the synchronous low-level trigger of the NA62 experiment at CERN. Direct GPU communication using an FPGA-based board has been exploited to reduce the data transmission latency, and results of a first field test at CERN are highlighted. This work is part of a wider project named GAP (GPU Application Project), intended to study the use of GPUs in real-time applications in both HEP and medical imaging.

  15. Offers

    CERN Multimedia

    Staff Association

    2011-01-01

    Special offers for our members Go Sport in Val Thoiry is offering a 15% discount on all purchases made in the shop upon presentation of the Staff Association membership card (excluding promotions, sale items and the bargain corner, and excluding purchases made with Go Sport and Kadéos gift cards; only one discount can be applied per purchase).

  16. Offers

    CERN Multimedia

    Staff Association

    2012-01-01

    L'Occitane en Provence proposes the following offer: 10 % discount on all products in all L'Occitane shops in Metropolitan France upon presentation of your Staff Association membership card and a valid ID. This offer is valid only for one person, is non-transferable and cannot be combined with other promotions.

  17. Offers

    CERN Multimedia

    Staff Association

    2014-01-01

    New offers: Discover the Galpon theatre in Geneva. The Staff Association is happy to offer its members a discount of 8 CHF on a full-price ticket (tickets at 15 CHF instead of 22 CHF), so do not hesitate any longer (reservation by phone at +41 22 321 21 76 is mandatory, as tickets sell out quickly!). For further information, please see our website: http://staff-association.web.cern.ch/fr/content/th%C3%A9%C3%A2tre-du-galpon

  18. Offer

    CERN Multimedia

    Staff Association

    2016-01-01

    CERN was selected and participated in the "Best Employers" ranking organized by the magazine Bilan. To thank CERN for its collaboration, the magazine offers a reduced subscription fee to all employed members of the personnel: 25% off the annual subscription, i.e. CHF 149.25 instead of CHF 199.–. The subscription includes the magazine delivered to your home for a year, every other Wednesday, as well as special editions and access to the e-paper. To benefit from this offer, simply fill out the form provided for this purpose. To get the form, please contact the secretariat of the Staff Association (Staff.Association@cern.ch).

  19. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    SPECIAL OFFER FOR OUR MEMBERS Prices Spring and Summer 2013 Day ticket: same price weekends, public holidays and weekdays: Children from 5 to 15 years old: 30 CHF instead of 39 CHF Adults from 16 years old: 36 CHF instead of 49 CHF Bonus! Free for children under 5 Tickets available at the Staff Association Secretariat.

  20. Offers

    CERN Multimedia

    Association du personnel

    2013-01-01

    SPECIAL OFFER FOR OUR MEMBERS Prices Spring and Summer 2013 Day ticket: same price weekends, public holidays and weekdays: – Children from 5 to 15 years old: 30 CHF instead of 39 CHF – Adults from 16 years old: 36 CHF instead of 49 CHF – Bonus! Free for children under 5 Tickets available at the Staff Association Secretariat.

  1. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    The theatre season will start again, so do not hesitate to benefit from our discounts: Théâtre de Carouge: discounts on all shows and on various season tickets. La Comédie: reductions on various tickets, on annual subscriptions and on the discount card. For further information, see our website: http://staff-association.web.cern.ch/sociocultural/offers

  2. Offers

    CERN Multimedia

    Staff Association

    2012-01-01

    FUTUREKIDS proposes the following offer: a 15% discount for Staff Association members who enroll their children in summer FUTUREKIDS activities. Extracurricular Activities For Your Children The FUTUREKIDS Geneva Learning Center is open 6 days a week and offers a selection of after-school extracurricular activities for children and teenagers (ages 5 to 16). In addition to teaching in its Learning Centers, Futurekids collaborates with many private schools in Suisse Romande (Florimont, Moser, Champittet, Ecole Nouvelle, etc.) and with the Département de l'Instruction Publique (DIP) Genève. Courses and camps are usually in French, but English groups can be set up on demand. FUTUREKIDS Computer Camps (during school holidays) FUTUREKIDS Computer Camps are a way of having a great time during vacations while learning something useful, possibly discovering a new hobby or even, why not, a future profession. Our computer camps are at the forefront of technology. Themes are diverse and suit all ...

  3. Offers

    CERN Multimedia

    Staff Association

    2015-01-01

    New offer for our members. The CERN Staff Association has recently concluded a framework agreement with AXA Insurance Ltd, General-Guisan-Strasse 40, 8401 Winterthur. This contract allows you to benefit from preferential tariffs and conditions for the following insurance products: motor vehicle insurance for passenger cars and motorcycles (product line STRADA): 10% discount; household insurance (personal liability and household contents, product line BOX): 10% discount; travel insurance: 10% discount; buildings: 10% discount; legal protection: 10% discount. AXA is number one on the Swiss insurance market. The product range encompasses all non-life insurance such as insurance of persons, property, civil liability, vehicles, credit and travel as well as innovative and comprehensive solutions in the field of occupational benefits insurance for individuals and businesses. Finally, the affiliate AXA-ARAG (legal expenses insurance) completes the range. Armed with your CERN Staff Association card, you can always get the offer...

  4. Offer

    CERN Multimedia

    Staff Association

    2010-01-01

    Special offer for members of the Staff Association and their families 10% reduction on all products in the SEPHORA shop (sells perfume, beauty products etc.) in Val Thoiry ALL YEAR ROUND. Plus 20% reduction during their “vente privée”* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * Next “vente privée” from 22nd to 29th November 2010

  5. Offer

    CERN Multimedia

    Staff Association

    2011-01-01

      Special offer for members of the Staff Association and their families 10% reduction on all products in the SEPHORA shop (sells perfume, beauty products etc.) in Val Thoiry ALL YEAR ROUND. Plus 20% reduction during their “vente privée”* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * Next “vente privée” from 25th to 27th March 2011  

  6. Offer

    CERN Multimedia

    CARLSON WAGONLIT TRAVEL

    2011-01-01

    Special offer From 14th to 28th February 2011: no CWT service fee for any new reservation of a holiday package (flight + hotel/apartment) from a “summer 2011” catalogue. For any additional information, our staff is at your disposal from Monday to Friday, from 8:30 to 16:30. Phone number 72763 or 72797. Carlson Wagonlit Travel, CERN agency

  7. Offers

    CERN Multimedia

    Staff Association

    2012-01-01

    SPECIAL OFFER FOR OUR MEMBERS Prices Spring and Summer 2012 Half-day ticket: 5 hours, same price weekends, public holidays and weekdays. Children from 5 to 15 years old: 26 CHF instead of 35 CHF Adults from 16 years old: 32 CHF instead of 43 CHF Bonus! Free for children under 5. Aquaparc Les Caraïbes sur Léman 1807 Le Bouveret (VS)

  8. Offers

    CERN Document Server

    Staff Association

    2012-01-01

    SPECIAL OFFER FOR OUR MEMBERS Single Adult/Child tariff: “Zone terrestre” tickets 20 euros instead of 25 euros. Access to Aqualibi: 5 euros instead of 8 euros upon presentation of your Staff Association member ticket. Free for children under 3, with limited access to the attractions. More information on our website: http://association.web.cern.ch/association/en/OtherActivities/Walibi.html

  9. Offers

    CERN Multimedia

    Staff Association

    2011-01-01

    Banque cantonale de Genève (BCGE) The BCGE Business partner programme devised for members of the CERN Staff Association offers personalized banking solutions with preferential conditions. The advantages are linked to salary accounts (free account keeping, internet banking, free Maestro and credit cards, etc.), mortgage lending, retirement planning, investment, credit, etc. The details of the programme and the preferential conditions are available on our website: http://association.web.cern.ch/association/en/OtherActivities/BCGE.html.  

  10. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    Special offer for members of the Staff Association and their families 10 % reduction on all products in the SEPHORA shop (sells perfume, beauty products etc.) in Val Thoiry ALL YEAR ROUND. Plus 20 % reduction during their “vente privée”* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * Next “vente privée” from 11th to 23rd November 2013 Please contact the Staff Association Secretariat to get the discount voucher.  

  11. Offers

    CERN Multimedia

    Staff Association

    2012-01-01

    Special offer for members of the Staff Association and their families 10% reduction on all products in the SEPHORA shop (sells perfume, beauty products etc.) in Val Thoiry ALL YEAR ROUND. Plus 20% reduction during their “vente privée”* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * Next “vente privée” from 21st to 26th May 2012 Please contact the Staff Association Secretariat to get the discount voucher  

  12. Offers

    CERN Multimedia

    Staff Association

    2012-01-01

    Special offer for members of the Staff Association and their families 10 % reduction on all products in the Sephora shop (sells perfume, beauty products etc.) in Val Thoiry all year round. Plus 20 % reduction during their “vente privée”* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * next “vente privée” from 21st November to 1st December 2012 Please contact the Staff Association Secretariat to get the discount voucher.

  13. Offers

    CERN Multimedia

    Staff Association

    2014-01-01

    Passeport Gourmand   Are you dying for a nice meal? The “Passeport Gourmand” offers discounted prices to members of the Staff Association (valid until April 2015 and on sale at the Staff Association Secretariat): Passeport gourmand Ain / Savoie / Haute-Savoie: 56 CHF instead of 79 CHF. Passeport gourmand Geneva / neighbouring France: 72 CHF instead of 95 CHF. Members of the Staff Association also benefit from reduced tickets: 10 CHF (instead of 18 CHF at the desk), on sale at the Staff Association Secretariat, Building 510-R010 (in front of the Printshop).

  14. Offers

    CERN Multimedia

    Staff Association

    2014-01-01

    Special offer for members of the Staff Association and their families 10 % reduction on all products in the SEPHORA shop (sells perfume, beauty products etc.) in Val Thoiry ALL YEAR ROUND. Simply present your Staff Association membership card when you make your purchase. Plus 20 % reduction during their “vente privée”* three or four times a year. * Next “vente privée” from 24th September to 6th November 2014 Please contact the Staff Association Secretariat to get the discount voucher.  

  15. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    The « Théâtre de Carouge » offers a 5 CHF discount on all shows (30 CHF instead of 35 CHF) and on the season tickets "Premières représentations" (132 CHF instead of 162 CHF) and "Classique" (150 CHF instead of 180 CHF). Please send your reservation by e-mail to smills@tcag.ch from your professional e-mail address, indicating the date of your reservation, your surname and first name, and your telephone number. A confirmation will be sent by e-mail. You will be asked to show your membership card when you collect the tickets. More information on www.tcag.ch and www.tcag.ch/blog/

  16. Offers

    CERN Multimedia

    Staff Association

    2015-01-01

    New season 2015-2016 The new season was revealed in May and was warmly welcomed by the press, which is especially enthusiastic about the exceptional arrival of Fanny Ardant in September in the show Cassandre. Discover the programme 2015-2016. The theatre La Comédie proposes different offers to our members: Benefit from a 20% reduction on a full-price ticket during the whole season: tickets at 23 CHF to 38 CHF instead of 30 CHF to 50 CHF depending on the show. Buy two season tickets for the price of one (offers valid upon availability, and until 30 September 2015): 2 Cards Libertà for 240 CHF instead of 480 CHF. Cruise freely through the season with 8 performances of your choice per season. These cards are transferable and can be shared with one or more accompanying persons. 2 Abo Piccolo for 120 CHF instead of 240 CHF. Let yourself be surprised by a theatre performance with our discovery season tickets, which include 4 flagship performances of the season. ...

  17. Offer

    CERN Multimedia

    Staff Association

    2015-01-01

    RRP Communication organizes cultural events such as concerts, shows and sporting events. Members of the Staff Association benefit from a reduction of 10 CHF per ticket. How to proceed: reserve your tickets by e-mail to info@rrp.ch, giving the following information: name of the show and chosen date; number of tickets and category; name and surname; address; telephone number. Mention “offer CERN” and attach a photocopy of your Staff Association membership card. After your reservation, you will be sent a confirmation with a payment slip to the address given above. Once payment is made, members can either pick up their ticket(s) from the cash register on the evening of the show (opens 1 hour before the show) by showing their membership card, or receive the ticket(s) at the address indicated above by registered mail, subject to an extra cost of 10 CHF. Next show: more information at http://www.rrp.ch/

  18. Offers

    CERN Document Server

    Staff Association

    2013-01-01

    FUTUREKIDS proposes a 15% discount for Staff Association members who enroll their children in FUTUREKIDS activities. New workshop for 12- to 15-year-olds on how to develop applications for Android phones. Easter activities calendar Extracurricular Activities For Your Children The FUTUREKIDS Geneva Learning Center is open 6 days a week and offers a selection of after-school extracurricular activities for children and teenagers (ages 5 to 16). In addition to teaching in its Learning Centers, Futurekids collaborates with many private schools in Suisse Romande (Florimont, Moser, Champittet, Ecole Nouvelle, etc.) and with the Département de l'Instruction Publique (DIP) Genève. Courses and camps are usually in French, but English groups can be set up on demand. FUTUREKIDS Computer Camps (during school holidays) FUTUREKIDS Computer Camps are a way of having a great time during vacations while learning something useful, possibly discovering a new hobby or even, why not, a future...

  19. Offers

    CERN Multimedia

    Association du personnel

    2010-01-01

    THEATRE FORUM DE MEYRIN 1, place des Cinq-Continents 1217 Meyrin    Special offer for members of the Staff Association: Reduced ticket prices for the play Love is my sin (in English) from 15 to 17 March at 8.30pm http://www.forum-meyrin.ch/main.php?page=119&s=12   First category: 37 CHF instead of 46 CHF Second category (seats towards the sides): 30 CHF instead of 38 CHF Please present your CERN card and your Staff Association membership card at the ticket office. Ticket reservation: tel. 022 989 34 34 (from Monday to Friday 2pm to 6pm) or e-mail : billetterie@forum-meyrin.ch  

  20. Offer

    CERN Multimedia

    Staff Association

    2011-01-01

    DETAILS OF THE AGREEMENT WITH BCGE The BCGE Business partner programme devised for members of the CERN Staff Association offers personalized banking solutions with preferential conditions. The advantages are linked to salary accounts (free account keeping, internet banking, free Maestro and credit cards, etc.), mortgage lending, retirement planning, investment, credit, etc. The details of the programme and the preferential conditions are available on the Staff Association web site and from the secretariat (http://cern.ch/association/en/OtherActivities/BCGE.html). To benefit from these advantages, you will need to fill in the form available on our site, which must then be stamped by the Staff Association as proof that you are a paid-up member.  

  1. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    Do not hesitate to take advantage of the offers from our partners: Théâtre de Carouge: discount of 5 CHF on all shows (30 CHF instead of 35 CHF) and on the season tickets « first performance » (132 CHF instead of 162 CHF) and « classical » (150 CHF instead of 180 CHF), upon presentation of your Staff Association membership card before payment. Théâtre La Comédie de Genève: 20% off tickets (full price; also available for a partner): 24 to 32 CHF a ticket instead of 30 to 40 CHF depending on the show. 40% off annual subscriptions (access to the best seats, pick up tickets at the last minute): 200 CHF for 9 shows (about 22 CHF a ticket instead of 30 to 40 CHF). Discount card: 60 CHF, with a single ticket price of 16 CHF.

  2. Offers

    CERN Document Server

    Staff Association

    2011-01-01

    At the UN Cultural kiosk (door C6). This offer is meant for international civil servants, members of diplomatic missions and official delegates, upon presentation of their accreditation card. Matthew Lee & 5 musicians: Blues, Boogie and Rock’n’Roll, 28 October 2011 at 8:30 p.m., Théâtre du Léman, Quai du Mont-Blanc 19, Hôtel Kempinski, Genève. Matthew Lee is an exciting pianist-singer combining classic Rock’n’Roll with timeless ballads. He revisits the standards, being alternately Jerry Lee Lewis, Chuck Berry, Little Richard and many others... He is a showman with a soulful voice and displays virtuosity during his piano solos. Simply amazing! 20% reduction. Tickets from 32 to 68 CHF. Kiosque Culturel ONU, Palais des Nations, Porte 6, Avenue de la Paix 8-14, 1211 Genève 10. Tél. 022 917 11 11, info@kiosqueonu.ch

  3. Offer

    CERN Multimedia

    Staff Association

    2016-01-01

    The “La Comédie” theatre unveiled its programme for the 2016–2017 season in late May, and it was met with great enthusiasm by the press. Leading names of the European and Swiss theatre scenes, such as director Joël Pommerat, who recently won four Molière awards, will make an appearance! We are delighted to share this brand new, rich and varied programme with you. The “La Comédie” theatre offers various discounts to our members. Buy 2 subscriptions for the price of 1: 2 “Libertà” cards for CHF 240.- instead of CHF 480.- Cruise freely through the season with an 8-entry card valid for the shows of your choice. These cards are transferable and can be shared with one or more accompanying persons. 2 “Piccolo” cards for CHF 120.- instead of CHF 240.- This card lets you discover 4 shows which are suitable for all audiences (offers valid while stock lasts and until October 31, 201...

  4. Offers

    CERN Multimedia

    Staff Association

    2011-01-01

    Special offer for members of the Staff Association and their families: 10% reduction on all products in the SEPHORA shop (perfume, beauty products, etc.) in Val Thoiry ALL YEAR ROUND, plus a 20% reduction during their “vente privée”* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * Next “vente privée” from 21st to 26th November 2011. New BCGE Business partner benefits: as you may remember, thanks to our BCGE business partner agreement you benefit from various advantages, such as a free annual subscription for your Silver or Gold credit card, both for yourself and your partner (joint account). Please be informed that as of 1st October 2011 the features mentioned below will be added to your annual credit card subscription: MasterCard/Visa Silver and Gold: travel cancellation as well as related services such as holiday interruption and best guaranteed price. Only for Ma...

  5. A pilgrimage to gravity on GPUs

    CERN Document Server

    Bédorf, Jeroen

    2012-01-01

    In this short review we present the developments over the last five decades that have led to the use of Graphics Processing Units (GPUs) for astrophysical simulations. Since the introduction of NVIDIA's Compute Unified Device Architecture (CUDA) in 2007, the GPU has become a valuable tool for N-body simulations and is now so popular that almost all papers on high-precision N-body simulations use methods that are accelerated by GPUs. With GPU hardware becoming more advanced and being used for more advanced algorithms such as gravitational tree codes, we see a bright future for GPU-like hardware in computational astrophysics.

  6. United Offers Promotion on Roundtrip Flights to New York

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

      United Air announced on May 25 that passengers can fly roundtrip between Beijing and New York for as low as RMB6350. Passengers who book a roundtrip ticket on United Airlines from Beijing to New York from June 15 to June 30, 2005 can qualify for this promotion. In addition, passengers can also fly to other U.S. cities on the East Coast for only RMB7000.

  8. Simulating Spiking Neural P systems without delays using GPUs

    CERN Document Server

    Cabarle, Francis; Martinez-del-Amor, Miguel A

    2011-01-01

    We present in this paper our work on simulating a type of P system known as a spiking neural P system (SNP system) using graphics processing units (GPUs). GPUs, because of their architectural optimization for parallel computations, are well suited for highly parallelizable problems. Due to the advent of general-purpose GPU computing in recent years, GPUs are no longer limited to graphics and video processing alone, but extend to computationally intensive scientific and mathematical applications as well. Moreover, P systems, including SNP systems, are inherently and maximally parallel computing models whose inspiration is taken from the functioning and dynamics of a living cell. In particular, SNP systems try to give a modest but formal representation of a special type of cell known as the neuron, and of the interactions of neurons with one another. The nature of SNP systems allows their representation as matrices, which is a crucial step in simulating them on highly parallel devices such as GPUs. The highly parallel natu...

  9. GPUs for real-time processing in HEP trigger systems

    Science.gov (United States)

    Ammendola, R.; Biagioni, A.; Deri, L.; Fiorini, M.; Frezza, O.; Lamanna, G.; Lo Cicero, F.; Lonardo, A.; Messina, A.; Sozzi, M.; Pantaleo, F.; Paolucci, Ps; Rossetti, D.; Simula, F.; Tosoratto, L.; Vicini, P.; Gap Collaboration

    2014-06-01

    We describe a pilot project (GAP - GPU Application Project) for the use of GPUs (Graphics processing units) for online triggering applications in High Energy Physics experiments. Two major trends can be identified in the development of trigger and DAQ systems for particle physics experiments: the massive use of general-purpose commodity systems such as commercial multicore PC farms for data acquisition, and the reduction of trigger levels implemented in hardware, towards a fully software data selection system ("trigger-less"). The innovative approach presented here aims at exploiting the parallel computing power of commercial GPUs to perform fast computations in software not only in high level trigger levels but also in early trigger stages. General-purpose computing on GPUs is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughputs, the use of such devices for real-time applications in high energy physics data acquisition and trigger systems is becoming relevant. We discuss in detail the use of online parallel computing on GPUs for synchronous low-level triggers with fixed latency. In particular we show preliminary results on a first test in the CERN NA62 experiment. The use of GPUs in high level triggers is also considered, the CERN ATLAS experiment being taken as a case study of possible applications.

  10. Landau Gauge Fixing on GPUs

    CERN Document Server

    Cardoso, Nuno; Bicudo, Pedro; Oliveira, Orlando

    2012-01-01

    In this paper we present and explore the performance of Landau gauge fixing in GPUs using CUDA. We consider the steepest descent algorithm with Fourier acceleration, and compare the GPU performance with a parallel CPU implementation. Using $32^4$ lattice volumes, we find that the computational power of a single Tesla C2070 GPU is equivalent to approximately 256 CPU cores.

  11. Suitability of NVIDIA GPUs for SKA1-Low

    CERN Document Server

    Magro, Alessio; Clark, Mike; Ord, Steve

    2014-01-01

    In this memo we investigate the applicability of NVIDIA Graphics Processing Units (GPUs) for SKA1-Low station and Central Signal Processing (CSP)-level processing. Station-level processing primarily involves generating a single station beam which will then be correlated with other beams in the CSP. Fine channelisation can be performed either at the station or CSP level, while coarse channelisation is assumed to be performed on FPGA-based Tile Processors, together with A/D conversion, equalisation and other processes. Rough estimates of the number of GPUs required and of the power requirements will also be provided.

  12. GPUs for real-time processing in HEP trigger systems

    CERN Document Server

    Ammendola, R; Deri, L; Fiorini, M; Frezza, O; Lamanna, G; Lo Cicero, F; Lonardo, A; Messina, A; Sozzi, M; Pantaleo, F; Paolucci, Ps; Rossetti, D; Simula, F; Tosoratto, L; Vicini, P

    2014-01-01

    We describe a pilot project (GAP - GPU Application Project) for the use of GPUs (Graphics processing units) for online triggering applications in High Energy Physics experiments. Two major trends can be identified in the development of trigger and DAQ systems for particle physics experiments: the massive use of general-purpose commodity systems such as commercial multicore PC farms for data acquisition, and the reduction of trigger levels implemented in hardware, towards a fully software data selection system ("trigger-less"). The innovative approach presented here aims at exploiting the parallel computing power of commercial GPUs to perform fast computations in software not only in high level trigger levels but also in early trigger stages. General-purpose computing on GPUs is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies, and the incre...

  13. An Efficient Stencil Implementation for Modern GPUs

    Science.gov (United States)

    Krotkiewski, M.; Dabrowski, M.

    2012-04-01

    Efficient solution of Poisson's equation is crucial for many applications in geophysics. We show that modern Graphics Processing Units (GPUs) are very well suited for solving Poisson's equation on structured Cartesian grids using techniques such as the Finite Element Method (FEM) or the Finite Difference Method (FDM). For the homogeneous Poisson problem the discretized differential operator can be computed in every grid point as a stencil. We present an efficient implementation of 7-point and 27-point stencil computation on high-end Nvidia Tesla GPUs. A new method of reading data from the global memory to the shared memory of thread blocks is shown. The method avoids conditional statements and idle threads, and shows good cache reuse of the halo data required by every thread block. Software prefetching is used to overlap arithmetic and memory instructions. We analyze the performance using a memory footprint model that takes into account the actual halo overhead due to the memory transaction size on the GPUs. Detailed performance analysis for single precision and performance results for single and double precision arithmetic on Nvidia Tesla cards are presented. On Tesla C2050 with single and double precision arithmetic our 7-point stencil implementation achieves an average throughput of 11.8 and 6.5 Gpts/s, respectively. The symmetric 27-point stencil implementation sustains a throughput of 10.5 and 5.8 Gpts/s, respectively, which is equivalent to 456 and 164 GFLOP/s. Our stencil implementation is used as a building block of a Geometric Multigrid solver for the Poisson problem. For single precision arithmetic and a grid size of 257^3, Tesla C2050 performs more than 50 V-cycles per second. As an example application we use the developed Multigrid solver in simulations of natural porous convection in a homogeneous medium saturated with incompressible fluid.
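
To make the notion of a stencil concrete, here is a minimal serial Python sketch (our own illustration, not the paper's code): each interior grid point combines its own value with its six axis neighbours, which on a GPU would be evaluated by one thread per point. Function and variable names are ours; the paper's shared-memory and prefetching optimizations are deliberately absent.

```python
# Illustrative sketch of a 7-point Laplacian stencil on an n x n x n grid.
def laplacian_7pt(u, n):
    """Apply the 7-point stencil 6*u[i][j][k] - (sum of 6 axis neighbours)
    at interior points; boundary points of the output stay at zero."""
    out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                out[i][j][k] = (6.0 * u[i][j][k]
                                - u[i - 1][j][k] - u[i + 1][j][k]
                                - u[i][j - 1][k] - u[i][j + 1][k]
                                - u[i][j][k - 1] - u[i][j][k + 1])
    return out

n = 4
u = [[[1.0] * n for _ in range(n)] for _ in range(n)]
res = laplacian_7pt(u, n)
# For a constant field the discrete Laplacian vanishes at interior points.
```

On a GPU the triple loop disappears: every (i, j, k) becomes a thread, and the performance question shifts to how the 6 neighbour loads are serviced from shared memory, which is exactly what the paper analyzes.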

  14. Iterative Reconstruction of Computed Axial Tomography images based on GPUs; Reconstruccion Iterativa de Imagenes TAC basada en GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Vidal, V.; Florez, L. A.; Mayo, P.; Rodenas, F.; Verdu, G.

    2013-07-01

    Although widely used in nuclear medicine (gamma cameras, single photon emission computed tomography (SPECT), positron emission tomography (PET)), iterative image reconstruction is not widespread in computed tomography (CT). The main reason is that the data set required in CT is much larger than in nuclear medicine, and iterative reconstruction is computationally very intensive. Graphics processing units (GPUs) offer the possibility of reducing this high computational cost of reconstruction in an effective way. The objective of this work is to develop an image reconstruction algorithm based on GPUs.

  15. Assessment of CEPH accredited institutions offering Public Health programs in the United States: A Short Report

    Directory of Open Access Journals (Sweden)

    Ashish eJoshi

    2016-01-01

    Aims: Examine the distribution of the CEPH-accredited institutions offering public health educational programs in the United States, and characterize their various attributes. Methods: A search was conducted during June 2014, using the Association of Schools and Programs of Public Health (ASPPH) database and individual university websites, to obtain a complete list of CEPH-accredited institutions offering programs in public health at the certificate, master's, and doctoral levels in the United States. Detailed information was abstracted from the various program offerings, including: school/program information, school type, geographic location, admission cycle, education delivery format, public health concentration, number of credits, presence of a global component, joint programs, and tuition. The data were analyzed in August 2014. Results: A total of 85 CEPH-accredited institutions designated as either Schools of Public Health or individual Programs of Public Health were present in the ASPPH database at the time of data collection (2014). These institutions offer programs in public health at the certificate (61%, n=52), master's (100%, n=85), and doctoral (44%, n=37) levels in the US. More than half of the programs were offered by schools of public health (58%, n=49), which were mostly public universities (75%, n=64), concentrated in the Northeast (22%, n=19), and mainly admitted students during the fall semester. Ninety-three concentrations of public health currently exist, of which 25 are predominant. Conclusion: To the best of our knowledge, this is the first study to examine the distribution of existing CEPH-accredited public health educational programs offered by US institutions. We suggest future areas of research to assess existing public health workforce demands and map them to the curricula and competencies provided by institutions offering public health educational programs in the United States.

  16. APL on GPUs

    DEFF Research Database (Denmark)

    Henriksen, Troels; Dybdal, Martin; Urms, Henrik

    2016-01-01

    This paper demonstrates translation schemes by which programs written in a functional subset of APL can be compiled to code that runs efficiently on general-purpose graphical processing units (GPGPUs). Furthermore, the generated programs can be straightforwardly interoperated with mainstream p...

  17. Fast network centrality analysis using GPUs

    Directory of Open Access Journals (Sweden)

    Shi Zhiao

    2011-05-01

    Background: With the exploding volume of data generated by continuously evolving high-throughput technologies, biological network analysis problems are growing larger in scale and craving more computational power. General-purpose computation on graphics processing units (GPGPU) provides a cost-effective technology for the study of large-scale biological networks. Designing algorithms that maximize data parallelism is the key to leveraging the power of GPUs. Results: We propose an efficient data-parallel formulation of the All-Pairs Shortest Path problem, which is the key component of shortest-path-based centrality computation. A betweenness centrality algorithm built upon this formulation was developed and benchmarked against the most recent GPU-based algorithm. Speedups between 11 and 19% were observed in various simulated scale-free networks. We further designed three algorithms based on this core component to compute closeness centrality, eccentricity centrality and stress centrality. To make all these algorithms available to the research community, we developed a software package, gpu-fan (GPU-based Fast Analysis of Networks), for CUDA-enabled GPUs. Speedups of 10-50× compared with CPU implementations were observed for simulated scale-free networks and real-world biological networks. Conclusions: gpu-fan provides a significant performance improvement for centrality computation in large-scale networks. Source code is available under the GNU Public License (GPL) at http://bioinfo.vanderbilt.edu/gpu-fan/.
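
For unweighted networks, the All-Pairs Shortest Path core mentioned above reduces to a breadth-first search from every vertex. A serial Python sketch of that baseline (our own illustration; gpu-fan's contribution is a data-parallel CUDA formulation of this computation, not this loop):

```python
# Illustrative sketch: APSP on an unweighted graph via repeated BFS.
from collections import deque

def apsp(adj):
    """adj: dict mapping node -> list of neighbours.
    Returns dist[src][dst] = hop count for every reachable pair."""
    dist = {}
    for src in adj:
        d = {src: 0}
        q = deque([src])
        while q:
            v = q.popleft()
            for w in adj[v]:
                if w not in d:          # first visit = shortest distance
                    d[w] = d[v] + 1
                    q.append(w)
        dist[src] = d
    return dist

# A path graph 0 - 1 - 2 - 3:
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
d = apsp(adj)
# d[0][3] == 3 along the path 0-1-2-3
```

Betweenness, closeness, eccentricity and stress centrality are all derived from these per-source distance (and path-count) tables, which is why APSP dominates the running time.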

  18. Simulating Collective Effects on GPUs

    CERN Document Server

    AUTHOR|(CDS)2095754; Arbenz, Peter

    Computer simulations are an important tool to study the dynamics of charged particles in particle accelerators, with new hardware solutions such as GPUs providing a vast increase in computing power. In the accelerator physics domain simulations are used to understand instabilities arising due to collective effects in high intensity beams which limit the accelerator performance. In this thesis PyHEADTAIL, a code to study collective effects in synchrotrons, is ported to GPUs using PyCUDA. The goal is to achieve a significant speedup while at the same time producing a simple interface for users and other developers. A speedup of 6 compared to the CPU version is achieved on a typical simulation study of instabilities in the Large Hadron Collider (LHC) at CERN.

  19. Medical image segmentation on GPUs--a comprehensive review.

    Science.gov (United States)

    Smistad, Erik; Falch, Thomas L; Bozorgi, Mohammadmehdi; Elster, Anne C; Lindseth, Frank

    2015-02-01

    Segmentation of anatomical structures, from modalities like computed tomography (CT), magnetic resonance imaging (MRI) and ultrasound, is a key enabling technology for medical applications such as diagnostics, planning and guidance. More efficient implementations are necessary, as most segmentation methods are computationally expensive, and the amount of medical imaging data is growing. The increased programmability of graphic processing units (GPUs) in recent years have enabled their use in several areas. GPUs can solve large data parallel problems at a higher speed than the traditional CPU, while being more affordable and energy efficient than distributed systems. Furthermore, using a GPU enables concurrent visualization and interactive segmentation, where the user can help the algorithm to achieve a satisfactory result. This review investigates the use of GPUs to accelerate medical image segmentation methods. A set of criteria for efficient use of GPUs are defined and each segmentation method is rated accordingly. In addition, references to relevant GPU implementations and insight into GPU optimization are provided and discussed. The review concludes that most segmentation methods may benefit from GPU processing due to the methods' data parallel structure and high thread count. However, factors such as synchronization, branch divergence and memory usage can limit the speedup. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.

  20. GPUs benchmarking in subpixel image registration algorithm

    Science.gov (United States)

    Sanz-Sabater, Martin; Picazo-Bueno, Jose Angel; Micó, Vicente; Ferrerira, Carlos; Granero, Luis; Garcia, Javier

    2015-05-01

    Image registration techniques are used across different scientific fields, such as medical imaging and optical metrology. The most straightforward way to calculate the shift between two images is to use cross correlation, taking the position of the highest value in the correlation image. The shift resolution is then given in whole pixels, which may not be enough for certain applications. Better results can be achieved by interpolating both images up to the desired resolution and applying the same technique, but the memory needed by the system is significantly higher. To avoid this memory consumption we implement a subpixel shifting method based on the FFT. Starting from the original images, subpixel shifts can be achieved by multiplying their discrete Fourier transforms by linear phases with different slopes. This method is time consuming because evaluating each candidate shift requires new calculations, but the algorithm is highly parallelizable and very well suited for high-performance computing systems. GPU (graphics processing unit) accelerated computing became very popular more than ten years ago because GPUs offer hundreds of computational cores on a reasonably cheap card. In our case, we register the shift between two images by making a first approach via FFT-based correlation, and then refining to subpixel precision using the technique described above; we consider it a 'brute force' method. We therefore present a benchmark of the algorithm consisting of a first approach (pixel resolution) followed by subpixel refinement, decreasing the shifting step in every loop to achieve high resolution in a few steps. The program is executed on three different computers. Finally, we present the results of the computation with different kinds of CPUs and GPUs, checking the accuracy of the method and the time consumed in each computer, and discussing the advantages and disadvantages of the use of GPUs.
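
The linear-phase trick the abstract relies on is the discrete Fourier shift theorem: multiplying the spectrum X[k] by exp(-2*pi*i*k*s/N) circularly shifts the signal by s samples, where s need not be an integer. A 1-D Python sketch with a naive O(N^2) DFT (our own illustration; the actual algorithm works on 2-D images with FFTs):

```python
# Illustrative sketch of the Fourier shift theorem in 1-D.
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def shift(x, s):
    """Circularly shift x by s samples via a linear phase ramp in the
    Fourier domain; for fractional s we keep only the real part."""
    N = len(x)
    X = dft(x)
    X = [X[k] * cmath.exp(-2j * cmath.pi * k * s / N) for k in range(N)]
    return [v.real for v in idft(X)]

y = shift([1.0, 0.0, 0.0, 0.0], 1)   # a whole-sample shift of a delta
```

Each candidate subpixel shift s costs one phase multiplication and one inverse transform, independent per s, which is what makes the brute-force search in the paper so amenable to GPU parallelization.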

  1. The Use of GPUs for Solving the Computed Tomography Problem

    Directory of Open Access Journals (Sweden)

    A.E. Kovtanyuk

    2014-07-01

    Computed tomography (CT) is a widespread method used to study the internal structure of objects. The method has applications in medicine, industry and other fields of human activity. In particular, electronic imaging, as a type of CT, can be used to recover the structure of nanosized objects. Accurate and rapid results are in high demand in modern science; however, there are computational limitations that bound the possible usefulness of CT. On the other hand, the introduction of high-performance calculations using Graphics Processing Units (GPUs) improves the quality and performance of computed tomography investigations. Moreover, parallel computing with GPUs gives significantly higher computation speeds than Central Processing Units (CPUs), because of architectural advantages of the former. In this paper a computed tomography method of recovering the image using parallel computations powered by NVIDIA CUDA technology is considered. The implementation of this approach significantly reduces the time required for solving the CT problem.
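
The abstract does not spell out which reconstruction scheme is parallelized. As a hypothetical stand-in, the classic Kaczmarz (ART) update illustrates the flavour of iterative CT reconstruction: the image vector x is repeatedly projected onto the hyperplane of each measurement equation a_i . x = b_i. All names below are ours, and the tiny system is purely didactic:

```python
# Illustrative sketch (not the authors' algorithm): Kaczmarz / ART sweeps.
def kaczmarz(A, b, sweeps=50):
    """Solve A x = b for a consistent system by cyclically projecting x
    onto the hyperplane of each row a_i: x += (b_i - a_i.x)/|a_i|^2 * a_i."""
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for a_i, b_i in zip(A, b):
            dot = sum(a * xi for a, xi in zip(a_i, x))
            norm2 = sum(a * a for a in a_i)
            lam = (b_i - dot) / norm2
            x = [xi + lam * a for xi, a in zip(x, a_i)]
    return x

# Tiny 2x2 example; the exact solution is x = [1, 2].
A = [[1.0, 0.0], [1.0, 1.0]]
b = [1.0, 3.0]
x = kaczmarz(A, b)
```

In a real CT setting each row of A encodes one ray through the image, so the per-row updates are independent enough to map well onto GPU threads, which is the kind of structure such papers exploit.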

  2. Massively parallelized replica-exchange simulations of polymers on GPUs

    CERN Document Server

    Groß, Jonathan; Bachmann, Michael

    2011-01-01

    We discuss the advantages of parallelization by multithreading on graphics processing units (GPUs) for parallel tempering Monte Carlo computer simulations of an exemplified bead-spring model for homopolymers. Since the sampling of a large ensemble of conformations is a prerequisite for the precise estimation of statistical quantities such as typical indicators for conformational transitions like the peak structure of the specific heat, the advantage of a strong increase in performance of Monte Carlo simulations cannot be overestimated. Employing multithreading and utilizing the massive power of the large number of cores on GPUs, being available in modern but standard graphics cards, we find a rapid increase in efficiency when porting parts of the code from the central processing unit (CPU) to the GPU.
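
The central move of parallel tempering is the exchange step between replicas at neighbouring temperatures: a swap of configurations is accepted with Metropolis probability min(1, exp((beta_i - beta_j)(E_i - E_j))). A small Python sketch of just that criterion (our own illustration, not the paper's GPU code; names are ours):

```python
# Illustrative sketch of the replica-exchange acceptance criterion.
import math
import random

def swap_accepted(beta_i, beta_j, E_i, E_j, rng=random.random):
    """Metropolis criterion for swapping replicas i and j with inverse
    temperatures beta_i, beta_j and current energies E_i, E_j."""
    delta = (beta_i - beta_j) * (E_i - E_j)
    # delta >= 0: the swap lowers the joint "action" and is always taken;
    # otherwise accept with probability exp(delta) < 1.
    return delta >= 0 or rng() < math.exp(delta)

# Colder replica (beta=1.0) holds the higher energy: swap always accepted.
always = swap_accepted(1.0, 0.5, 2.0, 1.0)
```

On a GPU, each replica's Monte Carlo sweep runs in its own block of threads, and only the energies need to be gathered for this cheap exchange test, which is why the method parallelizes so cleanly.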

  3. Baseline data on distance education offerings in deaf education teacher preparation programs in the United States.

    Science.gov (United States)

    Stryker, Deborah S

    2011-01-01

    Given that little is empirically known about the use of distance education within deaf education teacher preparation (DETP) programs, the purpose of the present study was to obtain baseline data on distance education activities in these programs. Using a census of the program coordinators of the 68 DETP programs in the United States, the researcher requested and gathered data by means of an 11-item online questionnaire. A 69% response rate was achieved (N = 47). It was found that more than half of the DETP programs offered distance education courses. Respondents indicated that asynchronous technology was used overwhelmingly more often than synchronous technology, with the Internet listed most often, followed by teleconferencing. Additional results provide information about the current status of distance education within the DETP field.

  4. Redefining the Data Pipeline Using GPUs

    Science.gov (United States)

    Warner, C.; Eikenberry, S. S.; Gonzalez, A. H.; Packham, C.

    2013-10-01

    There are two major challenges facing the next generation of data processing pipelines: 1) handling an ever increasing volume of data as array sizes continue to increase and 2) the desire to process data in near real-time to maximize observing efficiency by providing rapid feedback on data quality. Combining the power of modern graphics processing units (GPUs), relational database management systems (RDBMSs), and extensible markup language (XML) to re-imagine traditional data pipelines will allow us to meet these challenges. Modern GPUs contain hundreds of processing cores, each of which can process hundreds of threads concurrently. Technologies such as Nvidia's Compute Unified Device Architecture (CUDA) platform and the PyCUDA (http://mathema.tician.de/software/pycuda) module for Python allow us to write parallel algorithms and easily link GPU-optimized code into existing data pipeline frameworks. This approach has produced speed gains of over a factor of 100 compared to CPU implementations for individual algorithms and overall pipeline speed gains of a factor of 10-25 compared to traditionally built data pipelines for both imaging and spectroscopy (Warner et al., 2011). However, there are still many bottlenecks inherent in the design of traditional data pipelines. For instance, file input/output of intermediate steps is now a significant portion of the overall processing time. In addition, most traditional pipelines are not designed to be able to process data on-the-fly in real time. We present a model for a next-generation data pipeline that has the flexibility to process data in near real-time at the observatory as well as to automatically process huge archives of past data by using a simple XML configuration file. XML is ideal for describing both the dataset and the processes that will be applied to the data. 
Meta-data for the datasets would be stored using an RDBMS (such as MySQL or PostgreSQL), which could be easily and rapidly queried, and file I/O would be

  5. Mapping the MPM maximum flow algorithm on GPUs

    Science.gov (United States)

    Solomon, Steven; Thulasiraman, Parimala

    2010-11-01

    The GPU offers a high degree of parallelism and computational power that developers can exploit for general-purpose parallel applications. As a result, a significant level of interest has been directed towards GPUs in recent years. Regular applications, however, have traditionally been the focus of work on the GPU; only very recently has there been a growing number of works exploring the potential of irregular applications on the GPU. We present work investigating the feasibility of Malhotra, Pramodh Kumar and Maheshwari's "MPM" maximum flow algorithm on the GPU, achieving an average speedup of 8 when compared to a sequential CPU implementation.

  6. N-Body Simulations on GPUs

    CERN Document Server

    Elsen, Erich; Houston, Mike; Pande, Vijay; Hanrahan, Pat; Darve, Eric

    2007-01-01

    Commercial graphics processors (GPUs) have high compute capacity at very low cost, which makes them attractive for general purpose scientific computing. In this paper we show how graphics processors can be used for N-body simulations to obtain improvements in performance over current generation CPUs. We have developed a highly optimized algorithm for performing the O(N^2) force calculations that constitute the major part of stellar and molecular dynamics simulations. In some of the calculations, we achieve sustained performance of nearly 100 GFlops on an ATI X1900XTX. The performance on GPUs is comparable to specialized processors such as GRAPE-6A and MDGRAPE-3, but at a fraction of the cost. Furthermore, the wide availability of GPUs has significant implications for cluster computing and distributed computing efforts like Folding@Home.
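
The O(N^2) force calculation at the heart of such codes can be sketched in a few lines of serial Python (our own illustration, not the paper's kernel: a 2-D toy with G = 1 and Plummer-style softening eps to avoid the singularity at zero separation; the GPU evaluates the same pairwise sums in parallel):

```python
# Illustrative sketch of the direct-sum O(N^2) gravitational acceleration.
def accelerations(pos, mass, eps=1e-3):
    """pos: list of [x, y]; mass: list of masses. Returns per-body
    acceleration a_i = sum_j m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^1.5."""
    n = len(pos)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r2 = dx * dx + dy * dy + eps * eps
            inv_r3 = r2 ** -1.5
            acc[i][0] += mass[j] * dx * inv_r3
            acc[i][1] += mass[j] * dy * inv_r3
    return acc

# Two equal masses attract each other along the x-axis.
acc = accelerations([[0.0, 0.0], [1.0, 0.0]], [1.0, 1.0])
```

The inner loop over j is identical for every body i, which is exactly the regular, compute-bound structure that lets GPUs reach the near-100 GFlops figures quoted above.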

  7. SFU-Driven Transparent Approximation Acceleration on GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Li, Ang; Song, Shuaiwen; Wijtvliet, Mark; Kumar, Akash; Corporaal, Henk

    2016-06-01

    Approximate computing, the technique that sacrifices a certain amount of accuracy in exchange for a substantial performance boost or power reduction, is one of the most promising solutions to enable power control and performance scaling towards exascale. Although most existing approximation designs target the emerging data-intensive applications that are comparatively more error-tolerant, there is still high demand for the acceleration of traditional scientific applications (e.g., weather and nuclear simulation), which often comprise intensive transcendental function calls and are very sensitive to accuracy loss. To address this challenge, we focus on a very important but often ignored approximation unit on GPUs.

  8. Discovering Matter-Antimatter Asymmetries with GPUs

    CERN Document Server

    Reichert, Stefanie

    2015-01-01

    The search for matter-antimatter asymmetries requires highest-precision analyses and thus very large datasets and intensive computing. This contribution discusses two complementary approaches where GPU systems have been successfully exploited in this area. Both approaches make use of the CUDA Thrust library, which can be used on supported GPUs. The first approach is a generic search for local asymmetries in phase-space distributions of matter and antimatter particle decays. This powerful analysis method has never been used to date due to its high demand in CPU time. The second approach uses the GooFit framework, a generic fitting framework that exploits massive parallelisation on GPUs.

  9. Hardware acceleration of EDA algorithms custom ICS, FPGAs and GPUs

    CERN Document Server

    Khatri, Sunil P

    2010-01-01

    This text covers the acceleration of EDA algorithms using hardware platforms such as FPGAs and GPUs. In it, widely applied CAD algorithms are evaluated and compared for potential acceleration on FPGAs and GPUs.

  10. Accelerating thermal deposition modeling at terahertz frequencies using GPUs

    Science.gov (United States)

    Doroski, Michael; Knight, Michael; Payne, Jason; Grundt, Jessica E.; Ibey, Bennett L.; Thomas, Robert; Roach, William P.; Wilmink, Gerald J.

    2011-03-01

    Finite-difference time-domain (FDTD) methods are widely used to model the propagation of electromagnetic radiation in biological tissues. High-performance central processing units (CPUs) can execute FDTD simulations for complex problems using 3-D geometries and heterogeneous tissue material properties. However, when FDTD simulations are employed at terahertz (THz) frequencies, excessively long processing times are required to account for finer-resolution voxels and larger computational modeling domains. In this study, we developed and tested the performance of 2-D and 3-D FDTD thermal propagation code executed on a graphics processing unit (GPU) device, coded using an extension of the C language referred to as CUDA. In order to examine the speedup provided by GPUs, we compared the performance (speed, accuracy) of simulations executed on a GPU (Tesla C2050), a high-performance CPU (Intel Xeon 5504), and a supercomputer. Simulations were conducted to model the propagation and thermal deposition of THz radiation in biological materials for several in vitro and in vivo THz exposure scenarios. For both the 2-D and 3-D in vitro simulations, we found that the GPU performed 100 times faster than runs executed on a CPU, and maintained accuracy comparable to that provided by the supercomputer. For the in vivo tissue damage studies, we found that the GPU executed simulations 87 times faster than the CPU. Interestingly, for all exposure durations tested, the CPU, GPU, and supercomputer provided comparable predictions for tissue damage thresholds (ED50). Overall, these results suggest that GPUs can provide performance comparable to a supercomputer at speeds significantly faster than those possible with a CPU. Therefore, GPUs are an affordable tool for conducting accurate and fast simulations of computationally intensive modeling problems.
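
As a toy illustration of explicit finite-difference time stepping (deliberately much simpler than the paper's 3-D electromagnetic and thermal FDTD code), consider 1-D heat diffusion u_t = alpha * u_xx. Every voxel is updated from its neighbours at each time step, the per-point pattern that maps one GPU thread to one grid point:

```python
# Illustrative sketch: explicit finite-difference update for 1-D diffusion.
def step(u, alpha, dt, dx):
    """One forward-Euler step of u_t = alpha * u_xx with fixed (Dirichlet)
    boundary values; stability requires r = alpha*dt/dx^2 <= 0.5."""
    n = len(u)
    new = u[:]
    r = alpha * dt / (dx * dx)
    for i in range(1, n - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new

u = [0.0, 0.0, 1.0, 0.0, 0.0]   # a hot spot in the middle of the domain
for _ in range(10):
    u = step(u, alpha=1.0, dt=0.1, dx=1.0)
# the hot spot spreads symmetrically into its neighbours
```

At THz resolutions the real 3-D grids contain vastly more points and many more terms per update, so offloading this embarrassingly parallel inner loop to the GPU is what produces the reported two-orders-of-magnitude speedups.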

  11. Parallel LDPC Decoding on GPUs Using a Stream-Based Computing Approach

    Institute of Scientific and Technical Information of China (English)

Gabriel Falcão; Shinichi Yamagiwa; Vitor Silva; Leonel Sousa

    2009-01-01

Low-Density Parity-Check (LDPC) codes are powerful error correcting codes adopted by recent communication standards. LDPC decoders are based on belief propagation algorithms, which make use of a Tanner graph and very intensive message-passing computation, and usually require hardware-based dedicated solutions. With the exponential increase of the computational power of commodity graphics processing units (GPUs), new opportunities have arisen to develop general purpose processing on GPUs. This paper proposes the use of GPUs for implementing flexible and programmable LDPC decoders. A new stream-based approach is proposed, based on compact data structures to represent the Tanner graph. It is shown that such a challenging application for stream-based computing, because of irregular memory access patterns, memory bandwidth and recursive flow control constraints, can be efficiently implemented on GPUs. The proposal was experimentally evaluated by programming LDPC decoders on GPUs using the Caravela platform, a generic interface tool for managing the kernels' execution regardless of the GPU manufacturer and operating system. Moreover, to comparatively assess the obtained results, we have also implemented LDPC decoders on general purpose processors with Streaming Single Instruction Multiple Data (SIMD) Extensions. Experimental results show that the solution proposed here efficiently decodes several codewords simultaneously, reducing the processing time by one order of magnitude.
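The compact Tanner-graph representation the paper streams to the GPU can be pictured as a per-check adjacency list. The toy below is a hypothetical sketch using the (7,4) Hamming code and hard-decision syndrome lookup, not the paper's stream-based belief-propagation decoder, but it shows the bookkeeping such a structure supports:

```python
# Parity-check matrix of the (7,4) Hamming code, stored compactly as a
# per-check adjacency list (for each check node, the variable nodes it
# touches), the kind of Tanner-graph structure suited to streaming.
H_rows = [
    [0, 1, 2, 4],
    [0, 1, 3, 5],
    [0, 2, 3, 6],
]

def syndrome(word, h_rows):
    """XOR the bits each check node touches; all-zero means every
    parity check is satisfied."""
    return [sum(word[i] for i in row) % 2 for row in h_rows]

def correct_single_error(word, h_rows, n):
    """For a single bit error, the syndrome equals column i of H, so a
    column lookup locates the flipped bit (a hard-decision shortcut,
    not belief propagation)."""
    s = syndrome(word, h_rows)
    if not any(s):
        return list(word)
    cols = [[1 if i in row else 0 for row in h_rows] for i in range(n)]
    word = list(word)
    word[cols.index(s)] ^= 1
    return word
```

A GPU decoder evaluates the message-passing analogue of `syndrome` for many codewords in parallel over this same compact edge list.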

  12. A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs

    Directory of Open Access Journals (Sweden)

    Guixia He

    2016-01-01

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computations. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMVs on graphics processing units (GPUs), for example, CSR-scalar and CSR-vector, usually have poor performance due to irregular memory access patterns. This motivates us to propose a perfect CSR-based SpMV on the GPU, called PCSR. PCSR involves two kernels and accesses CSR arrays in a fully coalesced manner by introducing a middle array, which greatly alleviates the deficiencies of CSR-scalar (rare coalescing) and CSR-vector (partial coalescing). Test results on a single C2050 GPU show that PCSR outperforms CSR-scalar, CSR-vector, and the CSRMV and HYBMV kernels in the vendor-tuned CUSPARSE library, and is comparable with CSR-Adaptive, a recently proposed CSR-based algorithm. Furthermore, we extend PCSR from a single GPU to multiple GPUs. Experimental results on four C2050 GPUs show that, whether or not the communication between GPUs is taken into account, PCSR on multiple GPUs achieves good performance and high parallel efficiency.
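The CSR layout itself is simple: a values array, the matching column indices, and row pointers delimiting each row's nonzeros. The row-wise product that CSR-scalar assigns to one GPU thread per row can be sketched as:

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """Row-wise CSR SpMV: y[i] is the dot product of row i's stored
    nonzeros with the gathered entries of x (the work CSR-scalar
    gives to a single thread)."""
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# The 3x3 matrix [[4, 0, 1], [0, 2, 0], [3, 0, 5]] in CSR form.
values  = np.array([4.0, 1.0, 2.0, 3.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
x = np.array([1.0, 1.0, 1.0])
y = spmv_csr(values, col_idx, row_ptr, x)
```

The irregular gather `x[col_idx[k]]` and the row-length imbalance are exactly where uncoalesced memory traffic arises on GPUs; alleviating that is what PCSR's middle array is designed for.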

  13. Molecular dynamics simulation of complex multiphase flow on a computer cluster with GPUs

    Institute of Scientific and Technical Information of China (English)

    CHEN Fei-Guo; GE Wei; LI Jing-Hai

    2009-01-01

Compute Unified Device Architecture (CUDA) was used to design and implement molecular dynamics (MD) simulations on graphics processing units (GPUs). With an NVIDIA Tesla C870, a 20-60 fold speedup over one core of an Intel Xeon 5430 CPU was achieved, reaching up to 150 Gflops. MD simulation of cavity flow and particle-bubble interaction in liquid was implemented on multiple GPUs using the Message Passing Interface (MPI). Up to 200 GPUs were tested on a special network topology, achieving good scalability. The capability of GPU clusters for large-scale molecular dynamics simulation of meso-scale flow behavior was thus demonstrated.

  15. The "Earth Physics" Workshops Offered by the Earth Science Education Unit

    Science.gov (United States)

    Davies, Stephen

    2012-01-01

    Earth science has a part to play in broadening students' learning experience in physics. The Earth Science Education Unit presents a range of (free) workshops to teachers and trainee teachers, suggesting how Earth-based science activities, which show how we understand and use the planet we live on, can easily be slotted into normal science…

  17. Massively parallel read mapping on GPUs with the q-group index and PEANUT

    NARCIS (Netherlands)

    J. Köster (Johannes); S. Rahmann (Sven)

    2014-01-01

    textabstractWe present the q-group index, a novel data structure for read mapping tailored towards graphics processing units (GPUs) with a small memory footprint and efficient parallel algorithms for querying and building. On top of the q-group index we introduce PEANUT, a highly parallel GPU-based

  18. ASIAN EMERGING ECONOMIES AND UNITED STATES OF AMERICA: DO THEY OFFER A DIVERSIFICATION BENEFIT?

    Directory of Open Access Journals (Sweden)

    Preeti Sharma

    2011-08-01

With the emergence of new capital markets and the liberalization of stock markets in recent years, investors' interest in international diversification has increased. International diversification gives investors a larger basket of foreign securities to choose from as part of their portfolio assets, so as to enhance the reward-to-volatility ratio. This paper therefore studies the co-movement between Asian emerging stock markets and developed economies using the concept of co-integration. Furthermore, it has been observed that interdependence between most developed and emerging markets has been increasing since the 1987 Stock Market Crash, and that this interdependence intensified after the 1997 Asian Financial Crisis. With this increasing co-movement between developed and emerging stock markets, the benefits of international diversification become limited. Stock market behavior is largely random: several studies have shown that stock markets move randomly, and some authors argue that global sentiment and fundamentals do not prove fruitful in explaining stock market movements. Nevertheless, investment bankers and speculators routinely predict the movements of one economy's stock market from those of another, and studies have been conducted to find out the potential for investors to gain from investments in different economies. The paper analyzes the interdependence (if any) of developing or emerging Asian economies and the United States of America; these trends can help investors diversify their portfolios. This study is conducted with the objective of finding the potential for diversification in selected Asian countries and the United States of America by studying correlations in index returns.
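The basic statistic behind such a diversification study is the correlation between index returns: low correlation between two markets is what makes combining them reduce portfolio volatility. A minimal sketch, with hypothetical index levels rather than the study's data:

```python
import numpy as np

def return_correlation(prices_a, prices_b):
    """Pearson correlation of daily log returns of two index series."""
    ra = np.diff(np.log(prices_a))
    rb = np.diff(np.log(prices_b))
    return np.corrcoef(ra, rb)[0, 1]

# Hypothetical index levels: B closely tracks A, C follows its own path.
a = np.array([100.0, 102.0, 101.0, 105.0, 107.0])
b = np.array([200.0, 204.5, 202.0, 210.0, 213.5])
c = np.array([50.0, 49.0, 51.5, 50.5, 50.0])
```

Co-integration testing goes further than correlation (it asks whether a long-run equilibrium relation exists between the price levels themselves), but correlated returns are the first symptom of the co-movement that limits diversification benefits.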

  19. H-BLAST: a fast protein sequence alignment toolkit on heterogeneous computers with GPUs.

    Science.gov (United States)

    Ye, Weicai; Chen, Ying; Zhang, Yongdong; Xu, Yuesheng

    2017-04-15

Sequence alignment is a fundamental problem in bioinformatics. BLAST is a routinely used tool for this purpose, with over 118 000 citations in the past two decades. As the size of bio-sequence databases grows exponentially, the computational speed of alignment software must be improved. We develop heterogeneous BLAST (H-BLAST), a fast parallel search tool for heterogeneous computers that couple CPUs and GPUs, to accelerate BLASTX and BLASTP, basic tools of NCBI-BLAST. H-BLAST employs a locally decoupled seed-extension algorithm for better performance on GPUs, and offers a performance tuning mechanism for better efficiency among various CPU and GPU combinations. H-BLAST produces alignment results identical to NCBI-BLAST and its computational speed is much faster. Speedups achieved by H-BLAST over sequential NCBI-BLASTP (resp. NCBI-BLASTX) range mostly from 4 to 10 (resp. 5 to 7.2). With 2 CPU threads and 2 GPUs, H-BLAST can be faster than 16-threaded NCBI-BLASTX. Furthermore, H-BLAST is 1.5-4 times faster than GPU-BLAST. Availability: https://github.com/Yeyke/H-BLAST.git. Contact: yux06@syr.edu. Supplementary data are available at Bioinformatics online.

  20. Integration of GPUs in the CMS software framework

    CERN Document Server

    Samaras-Tsakiris, Konstantinos

    2015-01-01

The goal of my summer student project was the exploration of prototype systems for the integration of GPUs in the CMS offline software framework. Any proposed solution would have to address the issues of portability in the absence of GPUs, ease of use, and efficiency. My contribution was the creation of a CMSSW service for submitting tasks to the GPUs, thus allowing CUDA code to be executed from within the framework.

  1. Exploiting GPUs in Virtual Machine for BioCloud

    Directory of Open Access Journals (Sweden)

    Heeseung Jo

    2013-01-01

Recently, biological applications have started to be reimplemented to exploit the many cores of GPUs for better computation performance. By providing virtualized GPUs to VMs in a cloud computing environment, many biological applications can therefore move into the cloud to enhance their computation performance and utilize effectively unlimited cloud computing resources while reducing the expense of computation. In this paper, we propose a BioCloud system architecture that enables VMs to use GPUs in a cloud environment. Because much of the previous research has focused on mechanisms for sharing GPUs among VMs, it cannot achieve sufficient performance for biological applications, for which computation throughput is more crucial than sharing. The proposed system exploits the pass-through mode of the PCI Express (PCI-E) channel. By letting each VM access the underlying GPUs directly, applications achieve almost the same performance as in a native environment. In addition, our scheme multiplexes GPUs by using the hot plug-in/out device features of the PCI-E channel. By adding or removing GPUs in each VM on demand, VMs in the same physical host can time-share the GPUs. We implemented the proposed system using the Xen VMM and NVIDIA GPUs and showed that our prototype is highly effective for biological GPU applications in a cloud environment.

  2. Development of Desktop Computing Applications and Engineering Tools on GPUs

    DEFF Research Database (Denmark)

    Sørensen, Hans Henrik Brandenborg; Glimberg, Stefan Lemvig; Hansen, Toke Jansen

GPULab - a competence center and laboratory for research and collaboration within academia and partners in industry - has been established in 2008 at the Section for Scientific Computing, DTU Informatics, Technical University of Denmark. In GPULab we focus on the utilization of Graphics Processing Units (GPUs) for high-performance computing applications and software tools in science and engineering, inverse problems, visualization, imaging, and dynamic optimization. The goals are to contribute to the development of new state-of-the-art mathematical models and algorithms for maximum throughput performance, improved performance profiling tools, and assimilation of results to academic and industrial partners in our network. Our approach calls for multi-disciplinary skills and an understanding of hardware, software development, profiling tools, and tuning techniques, as well as analytical methods for analysis and development.

  3. The Telescope Array Middle Drum fluorescence detector simulation on GPUs

    Science.gov (United States)

    Abu-Zayyad, Tareq; Telescope-Array Collaboration

    2014-06-01

In recent years, the Graphics Processing Unit (GPU) has been recognized and widely used as an accelerator for many scientific calculations. In general, problems amenable to parallelization are the ones that benefit most from the use of GPUs. The Monte Carlo simulation of fluorescence detector response to air showers presents many opportunities for parallelization. In this paper we report on a Monte Carlo program, using GPU acceleration, for the simulation of the Telescope Array Fluorescence Detector located at the Middle Drum site. All of the physics simulation, from shower development, light production, and atmospheric attenuation to the realistic detector optics and electronics, is done on the GPU. A detailed description of the code implementation is given, and results on the accuracy and performance of the simulation are presented as well. Improvements in computational throughput in excess of 50× are reported, and the accuracy of the results is on par with the CPU implementation of the simulation.

  4. A New Data Layout For Set Intersection on GPUs

    DEFF Research Database (Denmark)

    Amossen, Rasmus Resen; Pagh, Rasmus

    2011-01-01

Set intersection is at the core of a variety of problems, e.g. frequent itemset mining and sparse boolean matrix multiplication. It is well-known that large speed gains can, for some computational problems, be obtained by using a graphics processing unit (GPU) as a massively parallel computing device. … However, GPUs require highly regular control flow and memory access patterns, and for this reason previous GPU methods for intersecting sets have used a simple bitmap representation. This representation requires excessive space on sparse data sets. In this paper we present a novel data layout, BATMAP. … The main finding is that our method is able to achieve speedups over both Apriori and FP-growth when the number of distinct items is large, and the density of the problem instance is above 1%. Previous implementations of frequent itemset mining on GPU have not been able to show speedups over the best...

  5. cuLGT: Lattice Gauge Fixing on GPUs

    CERN Document Server

    Vogt, Hannes

    2014-01-01

    We adopt CUDA-capable Graphic Processing Units (GPUs) for Landau, Coulomb and maximally Abelian gauge fixing in 3+1 dimensional SU(3) and SU(2) lattice gauge field theories. A combination of simulated annealing and overrelaxation is used to aim for the global maximum of the gauge functional. We use a fine grained degree of parallelism to achieve the maximum performance: instead of the common 1 thread per site strategy we use 4 or 8 threads per lattice site. Here, we report on an improved version of our publicly available code (www.cuLGT.com and github.com/culgt) which again increases performance and is much easier to include in existing code. On the GeForce GTX 580 we achieve up to 470 GFlops (utilizing 80% of the theoretical peak bandwidth) for the Landau overrelaxation code.

  6. Data Acquisition with GPUs: The DAQ for the Muon $g$-$2$ Experiment at Fermilab

    Energy Technology Data Exchange (ETDEWEB)

    Gohn, W. [Kentucky U.

    2016-11-15

Graphical Processing Units (GPUs) have recently become a valuable computing tool for the acquisition of data at high rates and at a relatively low cost. The devices work by parallelizing the code into thousands of threads, each executing a simple process, such as identifying pulses from a waveform digitizer. The CUDA programming library can be used to effectively write code to parallelize such tasks on Nvidia GPUs, providing a significant upgrade in performance over CPU-based acquisition systems. The muon $g$-$2$ experiment at Fermilab relies heavily on GPUs to process its data. The data acquisition system for this experiment must have the ability to create deadtime-free records from 700 $\mu$s muon spills at a raw data rate of 18 GB per second. Data will be collected using 1296 channels of $\mu$TCA-based 800 MSPS, 12 bit waveform digitizers and processed in a layered array of networked commodity processors with 24 GPUs working in parallel to perform a fast recording of the muon decays during the spill. The described data acquisition system is currently being constructed, and will be fully operational before the start of the experiment in 2017.
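The "simple process" each GPU thread performs, identifying pulses in a digitized waveform, amounts to finding contiguous runs of samples above threshold. A serial sketch of that task (hypothetical, not the experiment's actual kernel):

```python
import numpy as np

def find_pulses(waveform, threshold):
    """Return (start, peak_index, amplitude) for each contiguous run
    of samples above threshold in a digitized waveform."""
    above = waveform > threshold
    edges = np.diff(above.astype(int))
    starts = np.where(edges == 1)[0] + 1   # rising threshold crossings
    ends = np.where(edges == -1)[0] + 1    # falling threshold crossings
    if above[0]:
        starts = np.insert(starts, 0, 0)
    if above[-1]:
        ends = np.append(ends, len(waveform))
    pulses = []
    for s, e in zip(starts, ends):
        peak = s + int(np.argmax(waveform[s:e]))
        pulses.append((s, peak, float(waveform[peak])))
    return pulses

# Two pulses riding on a flat baseline.
wf = np.array([0, 1, 8, 12, 6, 1, 0, 0, 2, 9, 15, 4, 0], dtype=float)
```

In a DAQ setting, thousands of such waveform segments are processed concurrently, one per thread or block, which is what makes deadtime-free operation at 18 GB/s feasible.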

  8. A convolution-superposition dose calculation engine for GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Hissoiny, Sami; Ozell, Benoit; Despres, Philippe [Departement de genie informatique et genie logiciel, Ecole polytechnique de Montreal, 2500 Chemin de Polytechnique, Montreal, Quebec H3T 1J4 (Canada); Departement de radio-oncologie, CRCHUM-Centre hospitalier de l' Universite de Montreal, 1560 rue Sherbrooke Est, Montreal, Quebec H2L 4M1 (Canada)

    2010-03-15

Purpose: Graphic processing units (GPUs) are increasingly used for scientific applications, where their parallel architecture and unprecedented computing power density can be exploited to accelerate calculations. In this paper, a new GPU implementation of a convolution/superposition (CS) algorithm is presented. Methods: This new GPU implementation has been designed from the ground up to use the graphics card's strengths and to avoid its weaknesses. The CS GPU algorithm takes into account beam hardening, off-axis softening, and kernel tilting, and relies heavily on raytracing through patient imaging data. Implementation details are reported, as well as a multi-GPU solution. Results: An overall single-GPU acceleration factor of 908x was achieved when compared to a nonoptimized version of the CS algorithm implemented in PlanUNC in single-threaded central processing unit (CPU) mode, resulting in approximately 2.8 s per beam for a 3D dose computation on a 0.4 cm grid. A comparison to an established commercial system leads to an acceleration factor of approximately 29x, or 0.58 versus 16.6 s per beam in single-threaded mode. An acceleration factor of 46x has been obtained for the total energy released per mass (TERMA) calculation and a 943x acceleration factor for the CS calculation compared to PlanUNC. Dose distributions also have been obtained for a simple water-lung phantom to verify that the implementation gives accurate results. Conclusions: These results suggest that GPUs are an attractive solution for radiation therapy applications and that careful design, taking the GPU architecture into account, is critical in obtaining significant acceleration factors. These results potentially can have a significant impact on complex dose delivery techniques requiring intensive dose calculations, such as intensity-modulated radiation therapy (IMRT) and arc therapy. They also are relevant for adaptive radiation therapy, where dose results must be obtained rapidly.

  9. Accelerating QDP++/Chroma on GPUs

    CERN Document Server

    Winter, Frank

    2011-01-01

Extensions to the C++ implementation of the QCD Data Parallel Interface are provided, enabling acceleration of expression evaluation on NVIDIA GPUs. Single expressions are off-loaded to the device memory and execution domain, leveraging the Portable Expression Template Engine and using just-in-time compilation techniques. Memory management is automated by a software implementation of a cache controlling the GPU's memory. Interoperability with existing Krylov space solvers is demonstrated, and special attention is paid to 'Chroma readiness'. Non-kernel routines in lattice QCD calculations, typically not subject to hand-tuned optimisations, are accelerated, which can reduce the effects otherwise suffered from Amdahl's Law.

  10. Generating SU(Nc) pure gauge lattice QCD configurations on GPUs with CUDA and OpenMP

    CERN Document Server

    Cardoso, Nuno

    2011-01-01

    The starting point of any lattice QCD computation is the generation of a Markov chain of gauge field configurations. Due to the large number of lattice links and due to the matrix multiplications, generating SU(Nc) lattice QCD configurations is a highly demanding computational task, requiring advanced computer parallel architectures such as clusters of several Central Processing Units (CPUs) or Graphics Processing Units (GPUs). In this paper we present and explore the performance of CUDA codes for NVIDIA GPUs to generate SU(Nc) lattice QCD pure gauge configurations. Our implementation in one GPU uses CUDA and in multiple GPUs uses OpenMP and CUDA. We present optimized CUDA codes SU(2), SU(3) and SU(4). We also show a generic SU(Nc) code for Nc$\\,\\geq 4$ and compare it with the optimized version of SU(4). Our codes are publicly available for free use by the lattice QCD community.

  11. Code Optimization on Kepler GPUs and Xeon Phi

    CERN Document Server

    Jang, Yong-Chull; Kim, Jangho; Lee, Weonjong; Pak, Jeonghwan; Chung, Yuree

    2014-01-01

Kepler GTX Titan Black and Kepler Tesla K40 are still the best GPUs for high performance computing, although Maxwell GPUs such as the GTX 980 are available in the market. Hence, we measure the performance of our lattice QCD codes using the Kepler GPUs. We also upgrade our code to use the latest CPS (Columbia Physics System) library along with the most recent QUDA (QCD CUDA) library for lattice QCD. These new libraries improve the performance of our conjugate gradient (CG) inverter so that it runs twice as fast as before. We also investigate the performance of the Xeon Phi 7120P coprocessor. It has computing power similar to the Kepler GPUs in principle. However, its performance for our CG code is significantly inferior to that of the GTX Titan Black GPUs at present.

  12. SIML: a fast SIMD algorithm for calculating LINGO chemical similarities on GPUs and CPUs.

    Science.gov (United States)

    Haque, Imran S; Pande, Vijay S; Walters, W Patrick

    2010-04-26

    LINGOs are a holographic measure of chemical similarity based on text comparison of SMILES strings. We present a new algorithm for calculating LINGO similarities amenable to parallelization on SIMD architectures (such as GPUs and vector units of modern CPUs). We show that it is nearly 3x as fast as existing algorithms on a CPU, and over 80x faster than existing methods when run on a GPU.
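A LINGO profile is just the multiset of length-q substrings of a SMILES string. A minimal sketch of a similarity over such profiles, here a Tanimoto-style min/max ratio that may differ in detail from the exact formula used by SIML:

```python
from collections import Counter

def lingo_similarity(smiles_a, smiles_b, q=4):
    """Tanimoto-style similarity over the multisets of length-q
    substrings (LINGOs) of two SMILES strings: sum of per-LINGO
    minimum counts over sum of maximum counts."""
    la = Counter(smiles_a[i:i + q] for i in range(len(smiles_a) - q + 1))
    lb = Counter(smiles_b[i:i + q] for i in range(len(smiles_b) - q + 1))
    grams = la.keys() | lb.keys()
    inter = sum(min(la[g], lb[g]) for g in grams)
    union = sum(max(la[g], lb[g]) for g in grams)
    return inter / union if union else 0.0
```

Identical strings score 1.0 and strings sharing no q-gram score 0.0; the SIMD trick in the paper is to evaluate many such string comparisons in lock-step across vector lanes or GPU threads.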

  13. GPUs: An Oasis in the Supercomputing Desert

    CERN Document Server

    Kamleh, Waseem

    2012-01-01

    A novel metric is introduced to compare the supercomputing resources available to academic researchers on a national basis. Data from the supercomputing Top 500 and the top 500 universities in the Academic Ranking of World Universities (ARWU) are combined to form the proposed "500/500" score for a given country. Australia scores poorly in the 500/500 metric when compared with other countries with a similar ARWU ranking, an indication that HPC-based researchers in Australia are at a relative disadvantage with respect to their overseas competitors. For HPC problems where single precision is sufficient, commodity GPUs provide a cost-effective means of quenching the computational thirst of otherwise parched Lattice practitioners traversing the Australian supercomputing desert. We explore some of the more difficult terrain in single precision territory, finding that BiCGStab is unreliable in single precision at large lattice sizes. We test the CGNE and CGNR forms of the conjugate gradient method on the normal equa...

  14. Jet browser model accelerated by GPUs

    Directory of Open Access Journals (Sweden)

    Forster Richárd

    2016-12-01

In recent decades, experimental particle physics has developed thanks, among other things, to the growing capacity of computers, which has allowed the structure of matter to be probed down to the level of the quark-gluon plasma in the strong interaction. Experimental evidence has supported the theory by confirming its predicted results. Since its inception, the field has been interested in track reconstruction. We studied the jet browser model, which was developed for a 4π calorimeter. This method works on the measurement data set, which contains the components of the interaction points in the detector space, and allows the trajectory reconstruction of the final-state particles to be examined. We keep the total energy constant, satisfying Gauss's law. Using GPUs, the evaluation of the model can be drastically accelerated: we achieved up to a 223-fold speedup compared to a CPU-based parallel implementation.

  15. Improving the Mapping of Smith-Waterman Sequence Database Searches onto CUDA-Enabled GPUs

    Directory of Open Access Journals (Sweden)

    Liang-Tsung Huang

    2015-01-01

Sequence alignment lies at the heart of bioinformatics. The Smith-Waterman algorithm is one of the key sequence search algorithms and has gained popularity due to improved implementations and rapidly increasing compute power. Recently, the Smith-Waterman algorithm has been successfully mapped onto the emerging general-purpose graphics processing units (GPUs). In this paper, we focused on how to improve the mapping, especially for short query sequences, by better usage of shared memory. We performed and evaluated the proposed method on two different platforms (Tesla C1060 and Tesla K20) and compared it with two classic methods in CUDASW++. Further, the performance on different numbers of threads and blocks has been analyzed. The results showed that the proposed method significantly improves the Smith-Waterman algorithm on CUDA-enabled GPUs with proper allocation of block and thread numbers.
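The dynamic-programming recurrence at the core of Smith-Waterman is compact; the shared-memory tuning in the paper concerns how this score matrix is tiled across threads. A serial sketch with illustrative scoring parameters (not those of CUDASW++):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Fill the Smith-Waterman local-alignment score matrix H and
    return the best local alignment score."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores are clamped at zero
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

Cells on the same anti-diagonal of H depend only on earlier anti-diagonals and so can be computed in parallel, which is the property GPU mappings exploit.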

  16. Computation Reduction Oriented Circular Scanning SAR Raw Data Simulation on Multi-GPUs

    Directory of Open Access Journals (Sweden)

    Hu Chen

    2016-08-01

As a special working mode, circular scanning Synthetic Aperture Radar (SAR) is widely used in earth observation. With increasing resolution and swath width, the volume of simulated raw data grows massively, raising new efficiency requirements. By analyzing the redundancy in Graphics Processing Unit (GPU) based raw data simulation, a fast simulation method that reduces redundant computation is realized on multiple GPUs with the Message Passing Interface (MPI). The results show that redundancy reduction doubles the efficiency of the 4-GPU implementation and cuts the hardware cost by 50%, while the overall speedup reaches 350 times over the traditional CPU simulation.

  18. Coulomb, Landau and Maximally Abelian Gauge Fixing in Lattice QCD with Multi-GPUs

    CERN Document Server

    Schröck, Mario

    2013-01-01

    A lattice gauge theory framework for simulations on graphic processing units (GPUs) using NVIDIA's CUDA is presented. The code comprises template classes that take care of an optimal data pattern to ensure coalesced reading from device memory to achieve maximum performance. In this work we concentrate on applications for lattice gauge fixing in 3+1 dimensional SU(3) lattice gauge field theories. We employ the overrelaxation, stochastic relaxation and simulated annealing algorithms which are perfectly suited to be accelerated by highly parallel architectures like GPUs. The applications support the Coulomb, Landau and maximally Abelian gauges. Moreover, we explore the evolution of the numerical accuracy of the SU(3) valued degrees of freedom over the runtime of the algorithms in single (SP) and double precision (DP). Therefrom we draw conclusions on the reliability of SP and DP simulations and suggest a mixed precision scheme that performs the critical parts of the algorithm in full DP while retaining 80-90% of...

  19. A portable platform for accelerated PIC codes and its application to GPUs using OpenACC

    CERN Document Server

    Hariri, F; Jocksch, A; Lanti, E; Progsch, J; Messmer, P; Brunner, S; Gheller, G; Villard, L

    2016-01-01

    We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphic Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the basic steps of the PIC algorithm and has been designed as a test bed for different algorithmic options and data structures. Among the architectures that this engine can explore, particular attention is given here to systems equipped with GPUs. The study demonstrates that our portable PIC implementation based on the OpenACC programming model can achieve performance closely matching theoretical predictions. Using the Cray XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the one on an Intel Sandybridge ...
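Of the basic PIC steps (charge deposition, field solve, particle push), the deposition scatter is usually the hardest to parallelize, since many particles write to the same grid cell. A minimal cloud-in-cell sketch (hypothetical, not PIC_ENGINE code):

```python
import numpy as np

def deposit_charge(positions, weights, n_cells, dx):
    """Cloud-in-cell charge deposition on a periodic 1-D grid: each
    particle splits its weight linearly between the two nearest grid
    points. This scatter is the classic PIC bottleneck on GPUs, where
    concurrent writes to the same cell need atomic adds."""
    rho = np.zeros(n_cells)
    cell = (positions / dx).astype(int)
    frac = positions / dx - cell
    np.add.at(rho, cell, weights * (1.0 - frac))          # left node
    np.add.at(rho, (cell + 1) % n_cells, weights * frac)  # right node
    return rho / dx
```

On a GPU the two `np.add.at` scatters become atomic adds or a sorting-based reduction; choosing between such strategies per architecture is precisely what a test-bed platform like PIC_ENGINE is designed to explore.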

  20. GPUs, a new tool of acceleration in CFD: efficiency and reliability on smoothed particle hydrodynamics methods.

    Science.gov (United States)

    Crespo, Alejandro C; Dominguez, Jose M; Barreiro, Anxo; Gómez-Gesteira, Moncho; Rogers, Benedict D

    2011-01-01

    Smoothed Particle Hydrodynamics (SPH) is a numerical method commonly used in Computational Fluid Dynamics (CFD) to simulate complex free-surface flows. Simulations with this mesh-free particle method far exceed the capacity of a single processor. In this paper, as part of a dual-functioning code for either central processing units (CPUs) or Graphics Processor Units (GPUs), a parallelisation using GPUs is presented. The GPU parallelisation technique uses the Compute Unified Device Architecture (CUDA) of nVidia devices. Simulations with more than one million particles on a single GPU card exhibit speedups of up to two orders of magnitude over using a single-core CPU. It is demonstrated that the code achieves different speedups with different CUDA-enabled GPUs. The numerical behaviour of the SPH code is validated with a standard benchmark test case of dam break flow impacting on an obstacle where good agreement with the experimental results is observed. Both the achieved speed-ups and the quantitative agreement with experiments suggest that CUDA-based GPU programming can be used in SPH methods with efficiency and reliability.
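For readers unfamiliar with the method, the core of an SPH step is a kernel-weighted sum over neighbouring particles. A minimal density-summation sketch using the standard cubic-spline kernel (the abstract does not state which kernel the code uses, so this choice is an assumption, and the all-pairs loop stands in for the cell lists a GPU code would use):

```python
import numpy as np

def cubic_spline_w(r, h):
    """Standard 3D cubic-spline SPH kernel W(r, h), compact support 2h."""
    sigma = 1.0 / (np.pi * h**3)
    q = np.asarray(r, dtype=float) / h
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
                 np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))
    return sigma * w

def density(positions, masses, h):
    """SPH density summation: rho_i = sum_j m_j W(|r_i - r_j|, h).
    O(N^2) all-pairs here; GPU codes restrict the sum to neighbour cells."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    return (masses[None, :] * cubic_spline_w(d, h)).sum(axis=1)
```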

  1. GPUs, a new tool of acceleration in CFD: efficiency and reliability on smoothed particle hydrodynamics methods.

    Directory of Open Access Journals (Sweden)

    Alejandro C Crespo

    Full Text Available Smoothed Particle Hydrodynamics (SPH) is a numerical method commonly used in Computational Fluid Dynamics (CFD) to simulate complex free-surface flows. Simulations with this mesh-free particle method far exceed the capacity of a single processor. In this paper, as part of a dual-functioning code for either central processing units (CPUs) or Graphics Processor Units (GPUs), a parallelisation using GPUs is presented. The GPU parallelisation technique uses the Compute Unified Device Architecture (CUDA) of nVidia devices. Simulations with more than one million particles on a single GPU card exhibit speedups of up to two orders of magnitude over using a single-core CPU. It is demonstrated that the code achieves different speedups with different CUDA-enabled GPUs. The numerical behaviour of the SPH code is validated with a standard benchmark test case of dam break flow impacting on an obstacle where good agreement with the experimental results is observed. Both the achieved speed-ups and the quantitative agreement with experiments suggest that CUDA-based GPU programming can be used in SPH methods with efficiency and reliability.

  2. Tiling for Performance Tuning on Different Models of GPUs

    CERN Document Server

    Xu, Chang; Jenkins, Samantha

    2010-01-01

    The strategy of using CUDA-compatible GPUs as a parallel computation solution to improve program performance has become increasingly widely adopted in the two years since the CUDA platform was released. Its benefits extend from the graphics domain to many other computationally intensive domains. Tiling, as the most general and important technique, is widely used for optimization in CUDA programs. However, new GPU models with higher compute capabilities, as well as new versions of the CUDA SDK, have since been released. These updated compute capabilities must be considered when optimizing with the tiling technique. In this paper, we implement image interpolation algorithms as a test case to discuss how different tiling strategies affect a program's performance. We especially focus on how different GPU models affect tiling's effectiveness by executing the same program on testing platforms equipped with two different models of GPUs. The results demonstrate that an optimized tiling...
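The tiling decomposition discussed above can be shown host-side: the multiply is broken into tile-sized sub-blocks, exactly the decomposition a CUDA kernel maps to thread blocks staging tiles in shared memory. A NumPy sketch (the `tile` parameter stands in for the CUDA block size; names are illustrative):

```python
import numpy as np

def tiled_matmul(a, b, tile=16):
    """Blocked matrix multiply: accumulate tile x tile sub-products, the same
    loop structure a tiled CUDA kernel uses with shared-memory staging."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2
    c = np.zeros((n, m))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # slicing clips at the edges, so sizes need not divide evenly
                c[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return c
```

On a real GPU the profitable tile size depends on the device's compute capability (registers, shared memory, occupancy), which is precisely the paper's point.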

  3. Accelerating metagenomic read classification on CUDA-enabled GPUs

    National Research Council Canada - National Science Library

    Kobus, Robin; Hundt, Christian; Müller, André; Schmidt, Bertil

    2017-01-01

    ... metagenomic read classification are urgently needed. Results We present cuCLARK, a read-level classifier for CUDA-enabled GPUs, based on the fast and accurate classification of metagenomic sequences using reduced k-mers (CLARK) method...

  4. Accelerating frequency-domain diffuse optical tomographic image reconstruction using graphics processing units.

    Science.gov (United States)

    Prakash, Jaya; Chandrasekharan, Venkittarayan; Upendra, Vishwajith; Yalavarthy, Phaneendra K

    2010-01-01

    Diffuse optical tomographic image reconstruction uses advanced numerical models that are too computationally costly to run in real time. Graphics processing units (GPUs) offer desktop-scale massive parallelization that can accelerate these computations. An open-source GPU-accelerated linear algebra library package is used to compute the most intensive matrix-matrix calculations and matrix decompositions involved in solving the system of linear equations. These open-source functions were integrated into the existing frequency-domain diffuse optical image reconstruction algorithms to evaluate the acceleration capability of the GPUs (NVIDIA Tesla C 1060) with increasing reconstruction problem sizes. These studies indicate that single-precision computations are sufficient for diffuse optical tomographic image reconstruction. The acceleration per iteration can be up to 40x using GPUs compared to traditional CPUs in the case of three-dimensional reconstruction, where the reconstruction problem is more underdetermined, making the GPUs attractive in clinical settings. The current limitation of these GPUs is the available onboard memory (4 GB), which restricts reconstruction to no more than 13,377 optical parameters.
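The single-versus-double precision conclusion can be illustrated with a small experiment: for a reasonably conditioned linear system (a stand-in for the reconstruction's normal equations, which is an assumption), a float32 solve stays close to the float64 answer:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical well-conditioned system standing in for the reconstruction's
# normal equations (the real Jacobian system is much larger).
a = rng.standard_normal((n, n))
a = a @ a.T + n * np.eye(n)          # symmetric positive definite, modest cond
b = rng.standard_normal(n)

x64 = np.linalg.solve(a, b)                                        # double
x32 = np.linalg.solve(a.astype(np.float32), b.astype(np.float32))  # single
rel_err = np.linalg.norm(x32.astype(np.float64) - x64) / np.linalg.norm(x64)
```

The error of the single-precision solve scales roughly as the condition number times float32 machine epsilon, so for well-conditioned iterations the extra float64 bandwidth buys little, consistent with the record's claim.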

  5. A Decade-Long European-Scale Convection-Resolving Climate Simulation on GPUs

    Science.gov (United States)

    Leutwyler, D.; Fuhrer, O.; Ban, N.; Lapillonne, X.; Lüthi, D.; Schar, C.

    2016-12-01

    Convection-resolving models have proven to be very useful tools in numerical weather prediction and in climate research. However, due to their extremely demanding computational requirements, they have so far been limited to short simulations and/or small computational domains. Innovations in the supercomputing domain have led to new supercomputer designs that involve conventional multi-core CPUs and accelerators such as graphics processing units (GPUs). One of the first atmospheric models that has been fully ported to GPUs is the Consortium for Small-Scale Modeling weather and climate model COSMO. This new version allows us to expand the size of the simulation domain to areas spanning continents and the time period up to one decade. We present results from a decade-long, convection-resolving climate simulation over Europe using the GPU-enabled COSMO version on a computational domain with 1536x1536x60 gridpoints. The simulation is driven by the ERA-Interim reanalysis. The results illustrate how the approach allows for the representation of interactions between synoptic-scale and meso-scale atmospheric circulations at scales ranging from 1000 to 10 km. We discuss some of the advantages and prospects of using GPUs, and focus on the performance of the convection-resolving modeling approach on the European scale. Specifically, we investigate the organization of convective clouds and validate hourly rainfall distributions against various high-resolution data sets.

  6. Fast reconstruction of 3D volumes from 2D CT projection data with GPUs.

    Science.gov (United States)

    Leeser, Miriam; Mukherjee, Saoni; Brock, James

    2014-08-30

    Biomedical image reconstruction applications require producing high-fidelity images in, or close to, real time. We have implemented reconstruction of three-dimensional cone-beam computed tomography (CBCT) from two-dimensional projections. The algorithm takes slices of the target, weights and filters them to backproject the data, then creates the final 3D volume. We have implemented the algorithm using several hardware and software approaches and taken advantage of different types of parallelism in modern processors. The two hardware platforms used are a Central Processing Unit (CPU) and a heterogeneous system combining CPU and GPU. On the CPU we implement serial MATLAB, parallel MATLAB, C and parallel C with OpenMP extensions. These codes are compared against the heterogeneous versions written in CUDA-C and OpenCL. Our results show that GPUs are particularly well suited to accelerating CBCT. Relative performance was evaluated on a mathematical phantom as well as on mouse data. Speedups of up to 200x are observed by using an AMD GPU compared to a parallel version in C with OpenMP constructs. In this paper, we have implemented the Feldkamp-Davis-Kress algorithm, compatible with Fessler's image reconstruction toolbox, and tested it on different hardware platforms including a CPU and a combination of CPU and GPU. Both NVIDIA and AMD GPUs have been used for performance evaluation. GPUs provide significant speedup over the parallel CPU version.

  7. Optimizing strassen matrix multiply on GPUs

    KAUST Repository

    ul Hasan Khan, Ayaz

    2015-06-01

    © 2015 IEEE. Many-core systems are designed primarily for applications with large data parallelism. Strassen Matrix Multiply (MM) can be formulated as a depth-first (DFS) traversal of a recursion tree where all cores work in parallel on computing each of the NxN sub-matrices, which reduces storage at the cost of large data motion to gather and aggregate the results. We propose Strassen and Winograd algorithms (S-MM and W-MM) based on three optimizations: a set of basic algebra functions to reduce overhead, invoking the efficient CUBLAS 5.5 library, and parameter tuning of parametric kernels to improve resource occupancy. On GPUs, W-MM and S-MM with one recursion level outperform the CUBLAS 5.5 library, running up to twice as fast for large arrays satisfying N>=2048 and N>=3072, respectively. Compared to the NVIDIA SDK library, S-MM and W-MM achieved speedups of 20x to 80x for the above arrays. The proposed approach can be used to enhance the performance of the CUBLAS and MKL libraries.
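One Strassen recursion level, with the seven leaf products delegated to the library GEMM (NumPy's `@` standing in here for CUBLAS), looks as follows; this mirrors the paper's strategy of trading one multiplication for extra additions:

```python
import numpy as np

def strassen_one_level(a, b):
    """One Strassen recursion level: 7 sub-multiplications instead of 8,
    with the leaf products delegated to the library GEMM (NumPy's @ here,
    CUBLAS in the paper).  Assumes square matrices of even size."""
    n = a.shape[0] // 2
    a11, a12, a21, a22 = a[:n, :n], a[:n, n:], a[n:, :n], a[n:, n:]
    b11, b12, b21, b22 = b[:n, :n], b[:n, n:], b[n:, :n], b[n:, n:]
    m1 = (a11 + a22) @ (b11 + b22)
    m2 = (a21 + a22) @ b11
    m3 = a11 @ (b12 - b22)
    m4 = a22 @ (b21 - b11)
    m5 = (a11 + a12) @ b22
    m6 = (a21 - a11) @ (b11 + b12)
    m7 = (a12 - a22) @ (b21 + b22)
    c = np.empty_like(a)
    c[:n, :n] = m1 + m4 - m5 + m7
    c[:n, n:] = m3 + m5
    c[n:, :n] = m2 + m4
    c[n:, n:] = m1 - m2 + m3 + m6
    return c
```

The extra additions are cheap relative to the saved N/2-sized GEMM once N is large, which is why the paper only sees wins above N >= 2048.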

  8. Redesigning Triangular Dense Matrix Computations on GPUs

    KAUST Repository

    Charara, Ali

    2016-08-09

    New implementations of the triangular matrix-matrix multiplication (TRMM) and triangular solve (TRSM) kernels are described for GPU hardware accelerators. Although part of the Level 3 BLAS family, these highly computationally intensive kernels fail to achieve the percentage of theoretical peak performance on GPUs that one would expect when running kernels with a similar surface-to-volume ratio on hardware accelerators, i.e., the standard matrix-matrix multiplication (GEMM). The authors propose adopting a recursive formulation, which enriches the TRMM and TRSM inner structures with GEMM calls and therefore reduces memory traffic while increasing the level of concurrency. The new implementation enables efficient use of the GPU memory hierarchy and mitigates the latency overhead, to run at the speed of the higher cache levels. Performance comparisons show up to eightfold and twofold speedups for large dense matrix sizes, against the existing state-of-the-art TRMM and TRSM implementations from NVIDIA cuBLAS, respectively, across various GPU generations. Once integrated into high-level Cholesky-based dense linear algebra algorithms, the performance impact on the overall applications demonstrates up to fourfold and twofold speedups, against the equivalent native implementations, linked with cuBLAS TRMM and TRSM kernels, respectively. The new TRMM/TRSM kernel implementations are part of the open-source KBLAS software library (http://ecrc.kaust.edu.sa/Pages/Res-kblas.aspx) and are lined up for integration into the NVIDIA cuBLAS library in the upcoming v8.0 release.
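The recursive formulation described above can be sketched for the lower-triangular TRMM case: the triangle is split into two half-size TRMMs plus one rectangular GEMM, so most of the flops run at GEMM speed. A NumPy sketch (out-of-place for clarity; the KBLAS kernels work in place on the GPU):

```python
import numpy as np

def trmm_recursive(a, b, leaf=64):
    """Recursive lower-triangular TRMM: computes tril(A) @ B by splitting
    [[A11, 0], [A21, A22]] @ [B1; B2] = [A11 B1; A21 B1 + A22 B2],
    i.e. two half-size TRMMs plus one GEMM, bottoming out at a small leaf."""
    n = a.shape[0]
    if n <= leaf:
        return np.tril(a) @ b                     # leaf-level triangular GEMM
    h = n // 2
    top = trmm_recursive(a[:h, :h], b[:h], leaf)              # TRMM on A11
    bot = a[h:, :h] @ b[:h] + trmm_recursive(a[h:, h:], b[h:], leaf)  # GEMM + TRMM
    return np.vstack([top, bot])
```

The design point is the one the record makes: each recursion level converts half of the remaining triangular work into a dense GEMM, which GPUs execute near peak.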

  9. Parallelization Strategies for Ant Colony Optimisation on GPUs

    CERN Document Server

    Cecilia, Jose M; Ujaldon, Manuel; Nisbet, Andy; Amos, Martyn

    2011-01-01

    Ant Colony Optimisation (ACO) is an effective population-based meta-heuristic for the solution of a wide variety of problems. As a population-based algorithm, its computation is intrinsically massively parallel, and it is therefore theoretically well-suited for implementation on Graphics Processing Units (GPUs). The ACO algorithm comprises two main stages: Tour construction and Pheromone update. The former has been previously implemented on the GPU, using a task-based parallelism approach. However, up until now, the latter has always been implemented on the CPU. In this paper, we discuss several parallelisation strategies for both stages of the ACO algorithm on the GPU. We propose an alternative data-based parallelism scheme for Tour construction, which fits better on the GPU architecture. We also describe novel GPU programming strategies for the Pheromone update stage. Our results show a total speed-up exceeding 28x for the Tour construction stage, and 20x for Pheromone update, and suggest that ACO is a po...
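Although the abstract is truncated, the data-parallel tour-construction idea can be sketched: every ant evaluates tau^alpha * eta^beta for all cities at once and draws its next city by roulette wheel. The array layout and parameter values below are illustrative, not the paper's kernels:

```python
import numpy as np

def next_cities(pheromone, inv_dist, current, visited, rng, alpha=1.0, beta=2.0):
    """One vectorised tour-construction step: each ant (a row) picks its next
    city by roulette-wheel selection over tau^alpha * eta^beta, with visited
    cities masked to zero probability.  On the GPU each ant's cumulative sum
    and draw maps naturally to one thread block (data-based parallelism)."""
    weights = pheromone[current] ** alpha * inv_dist[current] ** beta
    weights = np.where(visited, 0.0, weights)        # never revisit a city
    probs = weights / weights.sum(axis=1, keepdims=True)
    cum = np.cumsum(probs, axis=1)
    r = rng.random((len(current), 1))                # one draw per ant
    return (cum < r).sum(axis=1)                     # first index with cum >= r
```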

  10. Online tracking with GPUs at the PANDA experiment

    Energy Technology Data Exchange (ETDEWEB)

    Bianchi, Ludovico; Herten, Andreas; Ritman, James; Stockmanns, Tobias [Forschungszentrum Juelich (Germany); Collaboration: PANDA-Collaboration

    2015-07-01

    The PANDA experiment is a next-generation particle detector planned for operation at the FAIR facility, which will study collisions of antiprotons with beam momenta of 1.5-15 GeV/c on a fixed proton target. Signal and background events at PANDA will look very similar, making a conventional hardware-trigger based approach unfeasible. Instead, data coming from the detector are acquired continuously, and event selection is performed in real time. A rejection factor of up to 1000 is needed to reduce the data rate for offline storage, making the data acquisition system computationally very challenging. Our activity within the PANDA collaboration is centered on the development and implementation of particle tracking algorithms on Graphical Processing Units (GPUs), and on studying the possibility of performing tracking for online event filtering using a multi-GPU architecture. Three algorithms are currently being developed, using information from the PANDA tracking system: a Hough Transform, a Riemann Track Finder, and a Triplet Finder algorithm. This talk presents the algorithms, their performance, and studies for GPU data transfer methods based on so-called message queues for a deeper integration of the algorithms with the FairRoot and PandaRoot frameworks.
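Of the three trackers named above, the Hough Transform is the easiest to sketch: each detector hit votes for every (theta, rho) line passing through it, and peaks in the accumulator are track candidates. A toy 2D version (the binning choices are illustrative; PANDA's implementation uses detector-specific parametrisations):

```python
import numpy as np

def hough_lines(points, n_theta=180, n_rho=101, rho_max=10.0):
    """Toy line Hough transform: each hit votes for all (theta, rho) lines
    through it via rho = x cos(theta) + y sin(theta).  On a GPU the vote
    loop over hits runs as independent threads with atomic increments."""
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, n_rho), dtype=int)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)   # one rho per theta
        idx = np.round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        ok = (idx >= 0) & (idx < n_rho)
        acc[np.nonzero(ok)[0], idx[ok]] += 1
    return acc, thetas
```

Hits on the line y = x all vote into the same bin near theta = 3*pi/4, rho = 0, so the accumulator maximum equals the number of collinear hits.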

  11. Spotting Radio Transients with the help of GPUs

    CERN Document Server

    Barsdell, Benjamin R; Barnes, David G; Fluke, Christopher J

    2011-01-01

    Exploration of the time-domain radio sky has huge potential for advancing our knowledge of the dynamic universe. Past surveys have discovered large numbers of pulsars, rotating radio transients and other transient radio phenomena; however, they have typically relied upon off-line processing to cope with the high data and processing rate. This paradigm rules out the possibility of obtaining high-resolution base-band dumps of significant events or of performing immediate follow-up observations, limiting analysis power to what can be gleaned from detection data alone. To overcome this limitation, real-time processing and detection of transient radio events is required. By exploiting the significant computing power of modern graphics processing units (GPUs), we are developing a transient-detection pipeline that runs in real-time on data from the Parkes radio telescope. In this paper we discuss the algorithms used in our pipeline, the details of their implementation on the GPU and the challenges posed by the prese...

  12. Special Offers

    CERN Multimedia

    Association du personnel

    2011-01-01

    Walibi Rhône-Alpes is open until 31 October. Reduced prices for children and adults at this French attraction park in Les Avenières. For more information about all these offers, please consult our web site: http://association.web.cern.ch/association/en/OtherActivities/Offers.html

  13. Accelerated molecular dynamics simulations of the octopamine receptor using GPUs: discovery of an alternate agonist-binding position.

    Science.gov (United States)

    Kastner, Kevin W; Izaguirre, Jesús A

    2016-10-01

    Octopamine receptors (OARs) perform key biological functions in invertebrates, making this class of G-protein coupled receptors (GPCRs) worth considering for insecticide development. However, no crystal structures and very little research exists for OARs. Furthermore, GPCRs are large proteins, are suspended in a lipid bilayer, and are activated on the millisecond timescale, all of which make conventional molecular dynamics (MD) simulations infeasible, even if run on large supercomputers. However, accelerated Molecular Dynamics (aMD) simulations can reduce this timescale to even hundreds of nanoseconds, while running the simulations on graphics processing units (GPUs) would enable even small clusters of GPUs to have processing power equivalent to hundreds of CPUs. Our results show that aMD simulations run on GPUs can successfully obtain the active and inactive state conformations of a GPCR on this reduced timescale. Furthermore, we discovered a potential alternate active-state agonist-binding position in the octopamine receptor which has yet to be observed and may be a novel GPCR agonist-binding position. These results demonstrate that a complex biological system with an activation process on the millisecond timescale can be successfully simulated on the nanosecond timescale using a simple computing system consisting of a small number of GPUs. Proteins 2016; 84:1480-1489. © 2016 Wiley Periodicals, Inc.
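The "accelerated" in aMD refers to a bias applied to the potential energy surface. The abstract does not give the formula; in its commonly used form (the Hamelberg-style boost, an assumption here), the potential is raised wherever it lies below a threshold energy E, flattening barriers so millisecond-scale transitions occur within accessible simulation time:

```python
import numpy as np

def amd_boost(v, e, alpha):
    """Accelerated-MD bias: for V < E the potential is raised by
    dV = (E - V)^2 / (alpha + E - V); above E it is untouched.  Since
    dV <= E - V, the boosted surface never rises above the threshold E,
    and alpha controls how aggressively minima are flattened."""
    v = np.asarray(v, dtype=float)
    dv = np.zeros_like(v)
    low = v < e
    dv[low] = (e - v[low]) ** 2 / (alpha + e - v[low])
    return v + dv
```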

  14. Special offers

    CERN Multimedia

    Staff Association

    2011-01-01

    Are you a member of the Staff Association? Did you know that as a member you can benefit from the following special offers: BCGE (Banque Cantonale de Genève): personalized banking solutions with preferential conditions. TPG: reduced rates on annual transport passes for active and retired staff. Aquaparc: reduced ticket prices for children and adults at this Swiss waterpark in Le Bouveret. FNAC: 5% reduction on FNAC vouchers. For more information about all these offers, please consult our web site: http://association.web.cern.ch/association/en/OtherActivities/Offers.html

  15. Special Offers

    CERN Multimedia

    Association du personnel

    2011-01-01

    Are you a member of the Staff Association? Did you know that as a member you can benefit from the following special offers: BCGE (Banque Cantonale de Genève): personalized banking solutions with preferential conditions. TPG: reduced rates on annual transport passes for active and retired staff. Aquaparc: reduced ticket prices for children and adults at this Swiss waterpark in Le Bouveret. Walibi: reduced prices for children and adults at this French attraction park in Les Avenières. FNAC: 5% reduction on FNAC vouchers. For more information about all these offers, please consult our web site: http://association.web.cern.ch/association/en/OtherActivities/Offers.html

  16. The Association of Marital Status and Offers of Employer-based Health Insurance for Employed Women Aged 27-64: United States, 2014-2015.

    Science.gov (United States)

    Simpson, Jessica L; Cohen, Robin A

    2017-01-01

    Data from the National Health Interview Survey.
    • Among employed women aged 27-64, unmarried women (72.2%) were more likely than married women (69.3%) to have been offered health insurance by their employer.
    • Among employed married women aged 27-64, 16.8% were offered health insurance only through their spouse's employer.
    • Considering all offers of health insurance (through a woman's employer or her spouse's employer), employed married women aged 27-64 (86.1%) were more likely than employed unmarried women (72.2%) to have had an employer offer of health insurance.
    • Regardless of educational attainment, and for most income and racial groups, employed married women aged 27-64 were more likely than employed unmarried women to have been offered health insurance by their employer or their spouse's employer.
    In 2015, women were less likely than men to have been insured through their own employer and more likely to have been covered as a dependent (1). This report describes the association of marital status and the presence of employer-based health insurance offers among employed women in the United States. Analyses are limited to women aged 27-64 to exclude offers associated with parental employment for those under age 27. An offer of employer-based health insurance includes offers by the woman's employer or her spouse's employer. The presence of an offer does not indicate offer take-up. All material appearing in this report is in the public domain and may be reproduced or copied without permission; citation as to source, however, is appreciated.

  17. Offers INTERSOCCER

    CERN Document Server

    Staff Association

    2014-01-01

      Summer Football camps   New offer for members of the Staff Association – INTERSOCCER: 12% discount on summer football camps and courses for children (bilingual), so do not hesitate any longer!    

  18. Performance of Kepler GTX Titan GPUs and Xeon Phi System

    CERN Document Server

    Jeong, Hwancheol; Pak, Jeonghwan; Choi, Kwang-jong; Park, Sang-Hyun; Yoo, Jun-sik; Kim, Joo Hwan; Lee, Joungjin; Lee, Young Woo

    2013-01-01

    NVIDIA's new architecture, Kepler, improves GPU performance significantly with the new streaming multiprocessor, SMX. Along with the performance, NVIDIA has also introduced many new technologies such as dynamic parallelism, Hyper-Q and GPUDirect with RDMA. Alongside its usual GPUs, NVIDIA also released another Kepler 'GeForce' GPU named GTX Titan. The GeForce GTX Titan is not only good for gaming but also good for high performance computing with CUDA. Nevertheless, it is remarkably cheaper than the Kepler Tesla GPUs. We investigate the performance of the GTX Titan and find out how to optimize a CUDA code appropriately for it. Meanwhile, Intel has launched its new many integrated core (MIC) system, Xeon Phi. A Xeon Phi coprocessor could in theory provide performance similar to NVIDIA Kepler GPUs but, in reality, its performance turns out to be significantly inferior to the GTX Titan.

  20. Special Offers

    CERN Multimedia

    Staff Association

    2011-01-01

    Are you a member of the Staff Association? Did you know that as a member you can benefit from the following special offers: BCGE (Banque Cantonale de Genève): personalized banking solutions with preferential conditions. TPG: reduced rates on annual transport passes for all active and retired staff. Aquaparc: reduced ticket prices for children and adults at this Swiss waterpark in Le Bouveret. Walibi: reduced prices for children and adults at this French attraction park in Les Avenières. FNAC: 5% reduction on FNAC vouchers. For more information about all these offers, please consult our web site: http://association.web.cern.ch/association/en/OtherActivities/Offers.html

  1. Special offer

    CERN Multimedia

    Staff Association

    2010-01-01

    Special offer for members of the Staff Association and their families: 10% reduction on all products in the SEPHORA shop (perfume, beauty products, etc.) in Val Thoiry ALL YEAR ROUND, plus a 20% reduction during their “vente privée” (private sale)* three or four times a year. Simply present your Staff Association membership card when you make your purchase. * next “vente privée” from 24th to 29th May 2010  

  2. Special offer

    CERN Multimedia

    Staff Association

    2011-01-01

    SPECIAL OFFER FOR OUR MEMBERS. Single price for adults and children: entry to the land zone (“Zone terrestre”) 19 euros instead of 23 euros; entry to the land and water zones (“Zone terrestre + aquatique”) 24 euros instead of 31 euros. Free for children under 3, with limited access to the attractions. Walibi Rhône-Alpes is open daily from 22 June to 31 August, and every weekend from 3 September until 31 October. The water zone closes on 11 September.

  4. Computation of Galois field expressions for quaternary logic functions on GPUs

    Directory of Open Access Journals (Sweden)

    Gajić Dušan B.

    2014-01-01

    Full Text Available Galois field (GF) expressions are polynomials used as representations of multiple-valued logic (MVL) functions. For this purpose, MVL functions are considered as functions defined over a finite (Galois) field of order p, GF(p). The problem of computing these functional expressions has an important role in areas such as digital signal processing and logic design. The time needed for computing GF-expressions increases exponentially with the number of variables in MVL functions and, as a result, it often represents a limiting factor in applications. This paper proposes a method for accelerated computation of GF(4)-expressions for quaternary (four-valued) logic functions using graphics processing units (GPUs). The method is based on the spectral interpretation of GF-expressions, permitting the use of fast Fourier transform (FFT)-like algorithms for their computation. These algorithms are then adapted for highly parallel processing on GPUs. The performance of the proposed solutions is compared with reference C/C++ implementations of the same algorithms processed on central processing units (CPUs). Experimental results confirm that the presented approach leads to a significant reduction in processing times (up to 10.86 times) when compared to CPU processing. Therefore, the proposed approach widens the set of problem instances which can be efficiently handled in practice. [Project of the Ministry of Science of the Republic of Serbia, nos. ON174026 and III44006]
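The FFT-like structure mentioned above can be made concrete: an n-variable GF(4) transform is a Kronecker power of a 4x4 base matrix, so it factorises into n passes of 4-point "butterflies", each pass fully parallel, which is what maps well to GPUs. The sketch below verifies the fast factorisation against the direct O(N^2) product; the Vandermonde-style base matrix used in the check is an illustrative stand-in, not necessarily the paper's transform matrix:

```python
import numpy as np

# GF(4) arithmetic: elements {0,1,2,3} as polynomials over GF(2) mod x^2+x+1
# (2 = x, 3 = x+1).  Addition is XOR; multiplication comes from this table.
MUL = np.array([[0, 0, 0, 0],
                [0, 1, 2, 3],
                [0, 2, 3, 1],
                [0, 3, 1, 2]])

def gf4_matvec(t, v):
    """Direct matrix-vector product over GF(4): the O(N^2) reference."""
    out = np.zeros(t.shape[0], dtype=int)
    for j in range(t.shape[1]):
        out ^= MUL[t[:, j], v[j]]          # GF(4) sum of column contributions
    return out

def gf4_kron(a, b):
    """Kronecker product over GF(4), used to build the full transform matrix."""
    p, q = b.shape
    c = np.empty((a.shape[0] * p, a.shape[1] * q), dtype=int)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i * p:(i + 1) * p, j * q:(j + 1) * q] = MUL[a[i, j], b]
    return c

def fast_gf4_transform(t, f, n):
    """FFT-like evaluation of (t kron ... kron t) f for f of length 4^n:
    one pass of 4-point GF(4) butterflies per quaternary digit, O(N log N)
    work; on a GPU every butterfly within a pass is an independent thread."""
    f = np.asarray(f).reshape((4,) * n)
    for axis in range(n):
        g = np.moveaxis(f, axis, 0)        # bring current digit to the front
        out = np.zeros_like(g)
        for i in range(4):
            for j in range(4):
                out[i] ^= MUL[t[i, j], g[j]]
        f = np.moveaxis(out, 0, axis)
    return f.reshape(-1)
```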

  5. Multi-GPUs parallel computation of dendrite growth in forced convection using the phase-field-lattice Boltzmann model

    Science.gov (United States)

    Sakane, Shinji; Takaki, Tomohiro; Rojas, Roberto; Ohno, Munekazu; Shibuta, Yasushi; Shimokawabe, Takashi; Aoki, Takayuki

    2017-09-01

    Melt flow drastically changes dendrite morphology during the solidification of pure metals and alloys. Numerical simulation of dendrite growth in the presence of melt flow is crucial for the accurate prediction and control of the solidification microstructure. However, accurate simulations are difficult because of the large computational costs required. In this study, we develop a parallel computational scheme using multiple graphics processing units (GPUs) for very large-scale three-dimensional phase-field-lattice Boltzmann simulations. In the model, a quantitative phase-field model, which can accurately simulate the dendrite growth of a dilute binary alloy, is coupled with a lattice Boltzmann model for the melt flow to simulate dendrite growth in a flowing melt. By performing very large-scale simulations using the developed scheme, we demonstrate the applicability of multi-GPU parallel computation to systematic large-scale simulations of dendrite growth with melt flow.

  6. Use of Multiple GPUs to Speedup the Execution of a Three-Dimensional Computational Model of the Innate Immune System

    Science.gov (United States)

    Xavier, M. P.; do Nascimento, T. M.; dos Santos, R. W.; Lobosco, M.

    2014-03-01

    The development of computational systems that mimic the physiological response of organs or even the entire body is a complex task. One of the issues that make this task extremely complex is the huge amount of computational resources needed to execute the simulations. For this reason, the use of parallel computing is mandatory. In this work, we focus on the simulation of the temporal and spatial behaviour of some human innate immune system cells and molecules in a small three-dimensional section of a tissue. To perform this simulation, we use multiple Graphics Processing Units (GPUs) in a shared-memory environment. Despite the high initialization and communication costs imposed by the use of GPUs, the techniques used to implement the HIS simulator have proven to be very effective for this purpose.

  7. Numerical characterization of nonlinear dynamical systems using parallel computing: The role of GPUs approach

    Science.gov (United States)

    Fazanaro, Filipe I.; Soriano, Diogo C.; Suyama, Ricardo; Madrid, Marconi K.; Oliveira, José Raimundo de; Muñoz, Ignacio Bravo; Attux, Romis

    2016-08-01

    The characterization of nonlinear dynamical systems and their attractors in terms of invariant measures, basins of attraction and the structure of their vector fields usually outlines a task strongly tied to the underlying computational cost. In this work, the practical aspects of using parallel computing, especially graphics processing units (GPUs) and the Compute Unified Device Architecture (CUDA), are reviewed and discussed in the context of nonlinear dynamical systems characterization. Here such characterization is performed by obtaining both local and global Lyapunov exponents for the classical forced Duffing oscillator. The local divergence measure was employed in the computation of the Lagrangian Coherent Structures (LCSs), revealing the general organization of the flow according to the obtained separatrices, while the global Lyapunov exponents were used to characterize the attractors obtained under one or more bifurcation parameters. These simulation sets also illustrate the required computation time and the speedup gains provided by different parallel computing strategies, justifying the employment and relevance of GPUs and CUDA in such extensive numerical approaches. Finally, more than simply providing an overview supported by a representative set of simulations, this work also aims to be a unified introduction to the use of these parallel computing tools in the context of nonlinear dynamical systems, providing codes and examples to be executed in MATLAB with the CUDA environment, something that is usually fragmented across different scientific communities and restricted to specialists in parallel computing strategies.

  8. Energy Efficient Smartphones: Minimizing the Energy Consumption of Smartphone GPUs using DVFS Governors

    KAUST Repository

    Ahmad, Enas M.

    2013-05-15

    Modern smartphones are being designed with increasing processing power, memory capacity, network communication, and graphics performance. Although all of these features are enriching and expanding the experience of a smartphone user, they significantly add overhead on the limited energy of the battery. This thesis aims at enhancing the energy efficiency of modern smartphones and increasing their battery life by minimizing the energy consumption of smartphone Graphics Processing Units (GPUs). Smartphone operating systems are becoming fully hardware-accelerated, which implies relying on the GPU power for rendering all application graphics. In addition, the GPUs installed in smartphones are becoming more and more powerful by the day. This raises an energy consumption concern. We present a novel implementation of GPU Scaling Governors, a Dynamic Voltage and Frequency Scaling (DVFS) scheme implemented in the Android kernel to dynamically scale the GPU. The scheme includes four main governors: Performance, Powersave, Ondemand, and Conservative. Unlike previous studies which looked into the power efficiency of mobile GPUs only through simulation and power estimations, we have implemented our approach on a real modern smartphone GPU, and acquired actual energy measurements using an external power monitor. Our results show that the energy consumption of smartphones can be reduced by up to 15% using the Conservative governor in 2D rendering mode, and by up to 9% in 3D rendering mode, with minimal effect on the performance.
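The governor logic itself reduces to a simple feedback rule on measured GPU load. The sketch below shows a Conservative-style decision step; the frequency table and thresholds are illustrative values, not the actual Android kernel or driver settings.

```python
# Hypothetical operating-point table (MHz) for an embedded GPU; a real driver
# exposes its own frequency/voltage pairs.
FREQS = [128, 200, 320, 400, 450]

def conservative_step(cur_idx, load, up_threshold=0.80, down_threshold=0.30):
    """One decision of a Conservative-style DVFS governor: move at most one
    frequency step per sampling interval (Ondemand, by contrast, jumps
    straight to the maximum frequency under load)."""
    if load > up_threshold and cur_idx < len(FREQS) - 1:
        return cur_idx + 1   # ramp up gently
    if load < down_threshold and cur_idx > 0:
        return cur_idx - 1   # step down to save energy when idle
    return cur_idx           # inside the hysteresis band: hold frequency

# Simulate a bursty rendering workload: load rises, then the screen goes idle.
idx = 0
for load in [0.9, 0.95, 0.5, 0.2, 0.1]:
    idx = conservative_step(idx, load)
print(FREQS[idx])
```

The one-step-at-a-time policy trades peak responsiveness for smoother power draw, which is why the Conservative governor yields the largest savings in the measurements above.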

  9. A practical implementation of 3D TTI reverse time migration with multi-GPUs

    Science.gov (United States)

    Li, Chun; Liu, Guofeng; Li, Yihang

    2017-05-01

    Tilted transversely isotropic (TTI) media are typical earth anisotropy media in practical observational studies. Accurate anisotropic imaging is recognized as a breakthrough in areas with complex anisotropic structures, and TTI reverse time migration (RTM) is an important method for these areas. However, P and SV waves are coupled together in the pseudo-acoustic wave equation, and the SV wave is regarded as an artifact in RTM of the P wave. We adopt matching of the anisotropy parameters to suppress the SV artifacts. Another problem in the implementation of TTI RTM is instability of the numerical solution for a variably oriented axis of symmetry. We adopt Fletcher's equation, setting a small SV velocity without an acoustic approximation, to stabilize the wavefield propagation. To improve calculation efficiency, we use an NVIDIA graphics processing unit (GPU) with the Compute Unified Device Architecture (CUDA) instead of a traditional CPU architecture. To accomplish this, we introduced a random velocity boundary and an extended homogeneous anisotropic boundary for the remaining four anisotropic parameters in the source propagation. This process avoids large memory storage and I/O requirements, which is important when using a GPU with the limited bandwidth of PCI-E. Furthermore, we extend the single-GPU code to multiple GPUs and present a corresponding highly concurrent strategy with multiple asynchronous streams, which closely achieves the ideal speedup ratio of 2:1 when compared with a single GPU. Synthetic tests validate the correctness and effectiveness of our multi-GPU-based TTI RTM method.

  10. Demonstration of the suitability of GPUs for AO real-time control at ELT scales

    Science.gov (United States)

    Bitenc, Urban; Basden, Alastair G.; Dipper, Nigel A.; Myers, Richard M.

    2016-07-01

    We have implemented the full AO data-processing pipeline on Graphics Processing Units (GPUs), within the framework of the Durham AO Real-time Controller (DARC). The wavefront sensor images are copied from the CPU memory to the GPU memory. The GPU processes the data and the DM commands are copied back to the CPU. For a SCAO system of 80x80 subapertures, the rate achieved on a single GPU is about 700 frames per second (fps). This increases to 1100 fps (1565 fps) if we use two (four) GPUs. Jitter exhibits a distribution with a root-mean-square value of 20 μs-30 μs and a negligible number of outliers. The increase in latency due to copying the pixel data from the CPU to the GPU has been reduced to a minimum by copying the data in parallel with processing them. An alternative solution in which the data would be moved from the camera directly to the GPU, without CPU involvement, could be about 10%-20% faster. We have also implemented the correlation centroiding algorithm, which - when used - reduces the frame rate by about a factor of 2-3.
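For reference, the centroiding stage of such a pipeline reduces, in its simplest form, to a centre-of-gravity estimate per subaperture. The sketch below shows only that baseline; the correlation centroiding mentioned above additionally cross-correlates each spot with a reference image before locating the peak.

```python
import numpy as np

def cog_centroid(img):
    """Centre-of-gravity spot position for one wavefront-sensor subaperture:
    the intensity-weighted mean pixel coordinate along x and y."""
    img = np.asarray(img, dtype=float)
    total = img.sum()
    ys, xs = np.indices(img.shape)
    return (xs * img).sum() / total, (ys * img).sum() / total

# A toy 8x8 subaperture with a single bright pixel at (x=5, y=3).
spot = np.zeros((8, 8))
spot[3, 5] = 1.0
cx, cy = cog_centroid(spot)
print(cx, cy)
```

Because every subaperture is independent, the per-subaperture centroid maps directly onto one GPU thread block, which is what makes this stage scale well across multiple devices.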

  11. A portable platform for accelerated PIC codes and its application to GPUs using OpenACC

    Science.gov (United States)

    Hariri, F.; Tran, T. M.; Jocksch, A.; Lanti, E.; Progsch, J.; Messmer, P.; Brunner, S.; Gheller, C.; Villard, L.

    2016-10-01

    We present a portable platform, called PIC_ENGINE, for accelerating Particle-In-Cell (PIC) codes on heterogeneous many-core architectures such as Graphic Processing Units (GPUs). The aim of this development is efficient simulations on future exascale systems by allowing different parallelization strategies depending on the application problem and the specific architecture. To this end, this platform contains the basic steps of the PIC algorithm and has been designed as a test bed for different algorithmic options and data structures. Among the architectures that this engine can explore, particular attention is given here to systems equipped with GPUs. The study demonstrates that our portable PIC implementation based on the OpenACC programming model can achieve performance closely matching theoretical predictions. Using the Cray XC30 system, Piz Daint, at the Swiss National Supercomputing Centre (CSCS), we show that PIC_ENGINE running on an NVIDIA Kepler K20X GPU can outperform the one on an Intel Sandy Bridge 8-core CPU by a factor of 3.4.
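One of the basic PIC steps such an engine contains is charge deposition. A minimal one-dimensional cloud-in-cell deposit can be sketched serially as follows; the function name and the periodic-grid assumption are illustrative, not taken from PIC_ENGINE.

```python
import numpy as np

def deposit_charge(positions, grid_n, dx=1.0, q=1.0):
    """Cloud-in-cell (linear-weighting) charge deposition: each particle's
    charge is shared between its two nearest grid points, proportionally to
    its distance from them. Assumes a periodic 1-D grid."""
    rho = np.zeros(grid_n)
    for x in positions:
        cell = int(x // dx) % grid_n
        frac = (x % dx) / dx          # fractional position inside the cell
        rho[cell] += q * (1.0 - frac)
        rho[(cell + 1) % grid_n] += q * frac
    return rho

# A particle at x = 2.25 splits its charge 0.75/0.25 between cells 2 and 3.
rho = deposit_charge([2.25], grid_n=8)
print(rho)
```

On GPUs this step is the delicate one: many particles write to the same grid cells, so implementations must choose between atomic updates and data reorderings, which is exactly the kind of algorithmic option the test-bed design above is meant to explore.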

  12. Solving the Ghost-Gluon System of Yang-Mills Theory on GPUs

    CERN Document Server

    Hopfer, Markus; Haase, Gundolf

    2012-01-01

    We solve the ghost-gluon system of Yang-Mills theory using Graphics Processing Units (GPUs). Working in Landau gauge, we use the Dyson-Schwinger formalism for the mathematical description as this approach is well-suited to directly benefit from the computing power of the GPUs. With the help of a Chebyshev expansion for the dressing functions and a subsequent application of a Newton-Raphson method, the non-linear system of coupled integral equations is linearized. The resulting Newton matrix is generated in parallel using OpenMPI and CUDA(TM). Our results show that it is possible to cut down the run time by two orders of magnitude compared to a sequential version of the code. This makes the proposed techniques well-suited for Dyson-Schwinger calculations on more complicated systems where the Yang-Mills sector of QCD serves as a starting point. In addition, the computation of Schwinger functions using GPU devices is studied.
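The linearization strategy can be illustrated on a toy problem: Newton-Raphson turns a coupled nonlinear system into a sequence of linear solves, and it is the generation of the Newton matrix for each solve that parallelizes well on GPUs. The two-equation system below is a stand-in for the discretized integral equations, not the actual Dyson-Schwinger kernel.

```python
import numpy as np

def newton_solve(F, J, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson for a coupled nonlinear system F(x) = 0: at each
    iteration, solve the linearized system J(x) dx = -F(x) and update."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(J(x), -F(x))   # the linear solve per iteration
        x += dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Toy coupled system: x^2 + y^2 = 4 and x*y = 1.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] * v[1] - 1.0])
J = lambda v: np.array([[2 * v[0], 2 * v[1]],
                        [v[1],     v[0]]])
sol = newton_solve(F, J, [2.0, 0.5])
print(sol)
```

In the GPU setting, each entry of the Jacobian involves an independent integral over the Chebyshev-expanded dressing functions, so the matrix assembly is embarrassingly parallel even though the subsequent solve is not.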

  13. A Framework for Lattice QCD Calculations on GPUs

    CERN Document Server

    Winter, F T; Edwards, R G; Joó, B

    2014-01-01

    Computing platforms equipped with accelerators like GPUs have proven to provide great computational power. However, exploiting such platforms for existing scientific applications is not a trivial task. Current GPU programming frameworks such as CUDA C/C++ require low-level programming from the developer in order to achieve high performance code. As a result, porting of applications to GPUs is typically limited to time-dominant algorithms and routines, leaving the remainder not accelerated, which can open a serious Amdahl's law issue. The lattice QCD application Chroma allows us to explore a different porting strategy. The layered structure of the software architecture logically separates the data-parallel from the application layer. The QCD Data-Parallel software layer provides data types and expressions with stencil-like operations suitable for lattice field theory and Chroma implements algorithms in terms of this high-level interface. Thus by porting the low-level layer one can effectively move the whole application in one swing to a different platform.

  14. Employing OpenCL to Accelerate Ab Initio Calculations on Graphics Processing Units.

    Science.gov (United States)

    Kussmann, Jörg; Ochsenfeld, Christian

    2017-06-13

    We present an extension of our graphics processing units (GPU)-accelerated quantum chemistry package to employ OpenCL compute kernels, which can be executed on a wide range of computing devices like CPUs, Intel Xeon Phi, and AMD GPUs. Here, we focus on the use of AMD GPUs and discuss differences as compared to CUDA-based calculations on NVIDIA GPUs. First illustrative timings are presented for hybrid density functional theory calculations using serial as well as parallel compute environments. The results show that AMD GPUs are as fast or faster than comparable NVIDIA GPUs and provide a viable alternative for quantum chemical applications.

  15. Real-Time Use of GPUs in NA62 Experiment

    CERN Document Server

    Pantaleo, F; Innocente, V; Lamanna, G; Sozzi, M

    2012-01-01

    We describe a pilot project for the use of GPUs in a real-time triggering application in the early trigger stages at the CERN NA62 experiment, and the results of the first field tests together with a prototype data acquisition (DAQ) system. This pilot project within NA62 aims at integrating GPUs into the central L0 trigger processor, and also at using them as fast online processors for computing trigger primitives. Several TDC-equipped sub-detectors with sub-nanosecond time resolution will participate in the first-level NA62 trigger (L0), fully integrated with the data-acquisition system, to reduce the readout rate of all sub-detectors to 1 MHz, using multiplicity information asynchronously computed over time frames of a few ns, both for positive sub-detectors and for vetoes. The online use of GPUs would allow the computation of more complex trigger primitives already at this first trigger level. We describe the architectures of the proposed systems, focusing on measuring the performance (both throughput and latency)...

  16. Research and Development of the General-Purpose Computation on GPUs

    Institute of Scientific and Technical Information of China (English)

    林一松; 唐玉华; 唐滔

    2011-01-01

    With the development of semiconductor technology, the number of transistors integrated on a chip keeps increasing, and the computation and memory capacity of graphics processing units improve rapidly. The floating-point computing capacity of GPUs has greatly exceeded that of mainstream CPUs, and the potential of GPUs in non-graphics computing, especially in high-performance computing, has attracted more and more researchers' attention. This paper introduces the principles of general-purpose computation on GPUs and the latest research results on the architecture and programming model of GPGPU, from both the research community and industry.

  17. QCD simulations with staggered fermions on GPUs

    CERN Document Server

    Bonati, C; D'Elia, M; Incardona, P

    2011-01-01

    We report on our implementation of the RHMC algorithm for the simulation of lattice QCD with two staggered flavors on Graphics Processing Units, using the NVIDIA CUDA programming language. The main feature of our code is that the GPU is not used just as an accelerator, but instead the whole Molecular Dynamics trajectory is performed on it. After pointing out the main bottlenecks and how to circumvent them, we discuss the obtained performances. We present some preliminary results regarding OpenCL and multiGPU extensions of our code and discuss future perspectives.

  18. Opinion Poll of Information Unit managers of Central Libraries of Government Universities within Metro-Tehran regarding Services offered by Rose System and Nasim Iman

    Directory of Open Access Journals (Sweden)

    Zahra Rahimi

    2008-04-01

    Full Text Available Using a descriptive survey methodology, data were collected with a questionnaire and through calls on the Information and Acquisition unit managers of 15 government universities in metro-Tehran. Findings indicate that the population studied utilizes the services, databases and e-journals offered by both Rose System and Nasim Iman, and that all partake in consortium purchases. The mean score for Rose System was 1.78 and for Nasim Iman 1.77, indicating relative satisfaction with the services. Among the principal reasons for dissatisfaction were the irregular increases in annual subscription fees for electronic resources and the failure to fully uphold the terms of the contracts between these companies and the universities. To resolve these issues, the companies first need to draft a suitable policy for offering services that fit the needs of the universities, at a reasonable price. Secondly, the relevant authorities must facilitate the entry of third-party agents into the competition or, alternatively, lay the groundwork for universities to directly engage in consortium purchases from foreign publishers.

  19. Triggering events with GPUs at ATLAS

    CERN Document Server

    Kama, Sami; The ATLAS collaboration

    2015-01-01

    The growing complexity of events produced in LHC collisions demands more and more computing power, both for the online selection and for the offline reconstruction of events. In recent years, the explosive performance growth of massively parallel processors like Graphics Processing Units (GPUs), both in computing power and in low energy consumption, has made them extremely attractive for use in a complex high-energy experiment like ATLAS. This new massively parallel paradigm is exploited together with the optimization of reconstruction algorithms. For this purpose a small-scale prototype of the full ATLAS High Level Trigger involving GPUs has been implemented. We discuss the integration procedure of this prototype, the achieved performance, and the prospects for the future.

  20. Optimizing performance per watt on GPUs in High Performance Computing: temperature, frequency and voltage effects

    CERN Document Server

    Price, D C; Barsdell, B R; Babich, R; Greenhill, L J

    2014-01-01

    The magnitude of the real-time digital signal processing challenge attached to large radio astronomical antenna arrays motivates use of high performance computing (HPC) systems. The need for high power efficiency (performance per watt) at remote observatory sites parallels that in HPC broadly, where efficiency is an emerging critical metric. We investigate how the performance per watt of graphics processing units (GPUs) is affected by temperature, core clock frequency and voltage. Our results highlight how the underlying physical processes that govern transistor operation affect power efficiency. In particular, we show experimentally that GPU power consumption grows non-linearly with both temperature and supply voltage, as predicted by physical transistor models. We show that lowering GPU supply voltage and increasing clock frequency while maintaining a low die temperature increases the power efficiency of an NVIDIA K20 GPU by up to 37-48% over default settings when running xGPU, a compute-bound code used in radio astronomy.
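The qualitative behaviour reported above can be captured by a toy power model in which dynamic power scales as C·f·V² while leakage grows exponentially with die temperature. All coefficient values below are illustrative assumptions, not fitted to the K20 measurements.

```python
import math

def gpu_power(freq_ghz, voltage, temp_c, c_eff=25.0, i_leak0=2.0, k_t=0.02):
    """Toy GPU power model (illustrative coefficients): dynamic switching
    power C*f*V^2 plus a leakage term that rises exponentially with the
    die temperature, as transistor models predict."""
    dynamic = c_eff * freq_ghz * voltage**2
    leakage = i_leak0 * voltage * math.exp(k_t * temp_c)
    return dynamic + leakage

# Same clock frequency, but lower supply voltage and a cooler die:
hot_high_v = gpu_power(0.7, 1.1, 80)   # hot die, default voltage
cool_low_v = gpu_power(0.7, 0.9, 50)   # cool die, undervolted
print(round(hot_high_v, 1), round(cool_low_v, 1))
```

Because the V² term dominates dynamic power and leakage compounds with temperature, undervolting while keeping the die cool improves performance per watt even at an unchanged clock, which is the effect the measurements above quantify.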

  1. Planar Near-Field Phase Retrieval Using GPUs for Accurate THz Far-Field Prediction

    Science.gov (United States)

    Junkin, Gary

    2013-04-01

    With a view to using Phase Retrieval to accurately predict Terahertz antenna far-field from near-field intensity measurements, this paper reports on three fundamental advances that achieve very low algorithmic error penalties. The first is a new Gaussian beam analysis that provides accurate initial complex aperture estimates including defocus and astigmatic phase errors, based only on first and second moment calculations. The second is a powerful noise tolerant near-field Phase Retrieval algorithm that combines Anderson's Plane-to-Plane (PTP) with Fienup's Hybrid-Input-Output (HIO) and Successive Over-Relaxation (SOR) to achieve increased accuracy at reduced scan separations. The third advance employs teraflop Graphical Processing Units (GPUs) to achieve practically real time near-field phase retrieval and to obtain the optimum aperture constraint without any a priori information.
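The first advance, recovering an initial beam estimate from first and second intensity moments, can be sketched as follows. This is a simplified illustration: only the centroid and widths are computed, whereas the paper goes on to extract defocus and astigmatic phase terms from the same moments.

```python
import numpy as np

def beam_moments(intensity, x, y):
    """First and second moments of a measured intensity distribution: the
    beam centre (cx, cy) and the 1-sigma widths (sx, sy), from which an
    initial Gaussian-beam aperture estimate can be built."""
    I = intensity / intensity.sum()          # normalize to a probability map
    X, Y = np.meshgrid(x, y)
    cx, cy = (X * I).sum(), (Y * I).sum()    # first moments: centroid
    sx = np.sqrt(((X - cx) ** 2 * I).sum())  # second central moments: widths
    sy = np.sqrt(((Y - cy) ** 2 * I).sum())
    return cx, cy, sx, sy

# Synthetic elliptical Gaussian beam centred at (1.0, -0.5), sigma = (1.0, 1.5).
x = np.linspace(-5, 5, 201)
y = np.linspace(-5, 5, 201)
X, Y = np.meshgrid(x, y)
I = np.exp(-((X - 1.0) ** 2 / 2.0 + (Y + 0.5) ** 2 / 4.5))
cx, cy, sx, sy = beam_moments(I, x, y)
print(round(cx, 2), round(cy, 2))
```

The unequal widths recovered here are what encode the astigmatism: a difference between sx and sy at a given plane constrains the astigmatic phase term of the initial aperture estimate.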

  2. Accelerating Dissipative Particle Dynamics Simulations on GPUs: Algorithms, Numerics and Applications

    CERN Document Server

    Tang, Yu-Hang

    2013-01-01

    We present a scalable dissipative particle dynamics simulation code, fully implemented on Graphics Processing Units (GPUs) using a hybrid CUDA/MPI programming model, which achieves 10-30 times speedup on a single GPU over 16 CPU cores and almost linear weak scaling across a thousand nodes. A unified framework is developed within which the efficient generation of the neighbor list and the maintenance of particle data locality are addressed. Our algorithm generates strictly ordered neighbor lists in parallel, while the construction is deterministic and makes no use of atomic operations or sorting. Such a neighbor list leads to optimal data loading efficiency when combined with a two-level particle reordering scheme. A faster in situ generation scheme for Gaussian random numbers is proposed using precomputed binary signatures. We designed custom transcendental functions that are fast and accurate for evaluating the pairwise interaction. The correctness and accuracy of the code are verified through a set of test cases ...
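For reference, the required output of the neighbor-list stage, a strictly ordered neighbor list per particle, can be produced by the brute-force serial sketch below. The paper's GPU construction reaches the same result via deterministic cell binning without atomics or sorting; the minimum-image periodic convention here is an added assumption for the sketch.

```python
import numpy as np

def neighbor_list(pos, box, rcut):
    """Brute-force O(N^2) reference: for each particle, the sorted list of
    indices of all particles within cutoff rcut, under the minimum-image
    convention in a cubic periodic box."""
    n = len(pos)
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = pos[i] - pos[j]
            d -= box * np.round(d / box)       # minimum-image displacement
            if np.dot(d, d) < rcut * rcut:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return [sorted(v) for v in nbrs]           # strict ordering per particle

pos = np.array([[0.0, 0.0, 0.0],
                [0.5, 0.0, 0.0],
                [3.0, 3.0, 3.0]])
result = neighbor_list(pos, box=10.0, rcut=1.0)
print(result)
```

A production code replaces the double loop with binning particles into cells of side rcut and scanning only the 27 neighboring cells, which is what makes a deterministic, sort-free GPU construction possible.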

  3. Successes and Challenges Porting Weather and Climate Models to GPUs

    Science.gov (United States)

    Govett, M. W.; Middlecoff, J.; Henderson, T. B.; Rosinski, J.; Madden, P.

    2011-12-01

    NOAA ESRL has had significant success parallelizing and running the Non-Hydrostatic Icosahedral Model (NIM) dynamical core on GPUs. A key ingredient in the success was the development of our Fortran-to-CUDA compiler (called F2C-ACC) to convert the model code. Compiler directives, inserted by the user, define regions of code to be run on the GPU, identify where fine-grain parallelism can be exploited, and manage data transfers between CPU and GPU. In 2009, we demonstrated that our compiler, with limited analysis capabilities, was able to produce code that ran the NIM 25x faster on a single GPU than a similar generation CPU. As F2C-ACC matured, fewer hand-translations were required until the GPU parallelization of NIM became fully automatic. The usefulness of F2C-ACC as a language translation tool will diminish as commercial compilers from CAPS, PGI and Cray mature; however, porting codes to GPUs will continue to require significant user involvement due to limited tools to support parallelization. Code inspection and analysis is currently very challenging and requires heavy user involvement to parallelize, debug, and achieve respectable speedup on GPUs. Users must inspect their code to locate fine grain parallelism, determine performance bottlenecks, manage data transfers, identify data dependencies, place inter-GPU communications, and manage a myriad of other issues in porting CPU-based codes to GPU architectures. This talk will describe the F2C-ACC compiler, discuss code porting challenges, and describe further development of the analysis capabilities of F2C-ACC to improve GPU parallelization of Fortran-based, Numerical Weather Prediction codes.

  4. Sapporo: N-body simulation library for GPUs

    Science.gov (United States)

    Gaburov, Evghenii; Harfst, Stefan; Portegies Zwart, Simon

    2012-10-01

    Sapporo mimics the behavior of GRAPE hardware and uses the GPU to perform high-precision gravitational N-body simulations. It makes use of CUDA and therefore only works on NVIDIA GPUs. N-body codes currently running on GRAPE-6 can switch to Sapporo by simply relinking the library. Sapporo's precision is comparable to that of GRAPE-6, even though internally the GPU hardware is limited to single-precision arithmetic. This limitation is effectively overcome by emulating double precision for calculating the distance between particles.
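The precision emulation mentioned above typically relies on error-free transformations that represent a value as an unevaluated sum of two machine numbers ("double-single" arithmetic). The sketch below uses Knuth's two-sum; ordinary Python doubles stand in for the GPU's single-precision registers, so this demonstrates the technique rather than Sapporo's actual kernels.

```python
def two_sum(a, b):
    """Error-free addition: returns (s, e) with s = fl(a + b) and
    a + b = s + e exactly (Knuth's TwoSum)."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def ds_add(x, y):
    """Add two double-single numbers, each stored as a (hi, lo) pair.
    The same construction extends single precision on GPUs that lack
    fast native double-precision units."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]               # fold in the low-order parts
    return two_sum(s, e)           # renormalize back to a (hi, lo) pair

# Accumulate 0.1 ten times: the compensated pair tracks the bits a plain
# floating-point sum loses.
acc = (0.0, 0.0)
for _ in range(10):
    acc = ds_add(acc, (0.1, 0.0))
print(acc[0] + acc[1])
```

The `lo` component carries exactly the rounding error that the `hi` component discards, which is why pairwise distances computed this way approach genuine double-precision accuracy on single-precision hardware.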

  5. Conjugate gradient solvers on Intel Xeon Phi and NVIDIA GPUs

    CERN Document Server

    Kaczmarek, O; Steinbrecher, P; Wagner, M

    2014-01-01

    Lattice Quantum Chromodynamics simulations typically spend most of the runtime in inversions of the Fermion Matrix. This part is therefore frequently optimized for various HPC architectures. Here we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA Tesla GPUs running a conjugate gradient solver. By exposing more parallelism to the accelerator through inverting multiple vectors at the same time, we obtain a performance greater than 300 GFlop/s on both architectures. This more than doubles the performance of the inversions. We also give a short overview of the Knights Corner architecture, discuss some details of the implementation, and describe the effort required to obtain the achieved performance.
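The "multiple vectors at the same time" trick can be sketched in NumPy: several independent right-hand sides share each conjugate-gradient iteration, so every matrix product and reduction operates on a batch. This is a simplified model of the idea, not the actual lattice solver.

```python
import numpy as np

def batched_cg(A, B, tol=1e-10, max_iter=500):
    """Independent conjugate-gradient solves of A x = b for each column of B,
    batched so that each iteration does one matrix product over all
    right-hand sides at once (exposing more parallelism to an accelerator)."""
    X = np.zeros_like(B)
    R = B - A @ X
    P = R.copy()
    rs = np.sum(R * R, axis=0)          # one residual norm per column
    for _ in range(max_iter):
        AP = A @ P                      # single batched matrix product
        alpha = rs / np.sum(P * AP, axis=0)
        X += alpha * P                  # per-column step lengths broadcast
        R -= alpha * AP
        rs_new = np.sum(R * R, axis=0)
        if np.sqrt(rs_new.max()) < tol:
            break
        P = R + (rs_new / rs) * P
        rs = rs_new
    return X

# Symmetric positive-definite test matrix and two right-hand sides.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)
B = rng.standard_normal((50, 2))
X = batched_cg(A, B)
print(np.max(np.abs(A @ X - B)))
```

Each column still follows its own CG trajectory (its own alpha and beta), but the memory traffic of the sparse matrix is amortized across all of them, which is where the reported doubling of throughput comes from.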

  6. Solving Classification Problems Using Genetic Programming Algorithms on GPUs

    Science.gov (United States)

    Cano, Alberto; Zafra, Amelia; Ventura, Sebastián

    Genetic Programming is very efficient at problem solving compared to other approaches, but its performance degrades severely as the size of the data increases. This paper proposes a model for multi-threaded Genetic Programming classification evaluation using the NVIDIA CUDA GPU programming model to parallelize the evaluation phase and reduce computational time. Three different well-known Genetic Programming classification algorithms are evaluated using the proposed parallel evaluation model. Experimental results using UCI Machine Learning data sets compare the performance of the three classification algorithms in single-threaded and multi-threaded Java, C, and CUDA GPU code. Results show that our proposal is much more efficient.

  7. Accelerating Scientific Applications using High Performance Dense and Sparse Linear Algebra Kernels on GPUs

    KAUST Repository

    Abdelfattah, Ahmad

    2015-01-15

    High performance computing (HPC) platforms are evolving to more heterogeneous configurations to support the workloads of various applications. The current hardware landscape is composed of traditional multicore CPUs equipped with hardware accelerators that can handle high levels of parallelism. Graphical Processing Units (GPUs) are popular high performance hardware accelerators in modern supercomputers. GPU programming has a different model than that for CPUs, which means that many numerical kernels have to be redesigned and optimized specifically for this architecture. GPUs usually outperform multicore CPUs in some compute intensive and massively parallel applications that have regular processing patterns. However, most scientific applications rely on crucial memory-bound kernels and may witness bottlenecks due to the overhead of the memory bus latency. They can still take advantage of the GPU compute power capabilities, provided that an efficient architecture-aware design is achieved. This dissertation presents a uniform design strategy for optimizing critical memory-bound kernels on GPUs. Based on hierarchical register blocking, double buffering and latency hiding techniques, this strategy leverages the performance of a wide range of standard numerical kernels found in dense and sparse linear algebra libraries. The work presented here focuses on matrix-vector multiplication kernels (MVM) as representative and most important memory-bound operations in this context. Each kernel inherits the benefits of the proposed strategies. By exposing a proper set of tuning parameters, the strategy is flexible enough to suit different types of matrices, ranging from large dense matrices, to sparse matrices with dense block structures, while high performance is maintained. Furthermore, the tuning parameters are used to maintain the relative performance across different GPU architectures. Multi-GPU acceleration is proposed to scale the performance on several devices. ...

  8. Accelerating metagenomic read classification on CUDA-enabled GPUs.

    Science.gov (United States)

    Kobus, Robin; Hundt, Christian; Müller, André; Schmidt, Bertil

    2017-01-03

    Metagenomic sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, software tools for fast and accurate metagenomic read classification are urgently needed. We present cuCLARK, a read-level classifier for CUDA-enabled GPUs, based on the CLARK method (fast and accurate classification of metagenomic sequences using reduced k-mers). Using the processing power of a single Titan X GPU, cuCLARK can reach classification speeds of up to 50 million reads per minute. Corresponding speedups for species- (genus-)level classification range between 3.2 and 6.6 (3.7 and 6.4) compared to multi-threaded CLARK executed on a 16-core Xeon CPU workstation. cuCLARK can perform metagenomic read classification at superior speeds on CUDA-enabled GPUs. It is free software licensed under GPL and can be downloaded at https://github.com/funatiq/cuclark free of charge.
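The underlying k-mer voting scheme can be illustrated with a toy index. This is a drastically simplified stand-in for CLARK's discriminative reduced k-mer database: the reference names, sequences, and k are arbitrary, and no k-mer reduction is performed.

```python
def build_index(references, k=4):
    """Map every k-mer to the set of reference genomes that contain it
    (a toy version of a k-mer classification index)."""
    index = {}
    for name, seq in references.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index

def classify(read, index, k=4):
    """Assign the read to the reference collecting the most k-mer votes;
    return None when no k-mer of the read is in the index."""
    votes = {}
    for i in range(len(read) - k + 1):
        for name in index.get(read[i:i + k], ()):
            votes[name] = votes.get(name, 0) + 1
    return max(votes, key=votes.get) if votes else None

refs = {"ecoli": "ACGTACGGTTAACCGGTA", "phage": "TTTTGGGGCCCCAAAATT"}
idx = build_index(refs)
print(classify("ACGTACGGTTAA", idx))
```

On the GPU, each read's k-mer lookups and vote tally are independent, so one thread (or warp) per read saturates the device, which is how throughputs of tens of millions of reads per minute become possible.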

  9. A Decade-long Continental-Scale Convection-Resolving Climate Simulation on GPUs

    Science.gov (United States)

    Leutwyler, David; Fuhrer, Oliver; Lapillonne, Xavier; Lüthi, Daniel; Schär, Christoph

    2016-04-01

    The representation of moist convection in climate models represents a major challenge, due to the small scales involved. Convection-resolving models have proven to be very useful tools in numerical weather prediction and in climate research. Using horizontal grid spacings of O(1km), they allow one to explicitly resolve deep convection, leading to an improved representation of the water cycle. However, due to their extremely demanding computational requirements, they have so far been limited to short simulations and/or small computational domains. Innovations in the supercomputing domain have led to new supercomputer designs that involve conventional multicore CPUs and accelerators such as graphics processing units (GPUs). One of the first atmospheric models that has been fully ported to GPUs is the Consortium for Small-Scale Modeling weather and climate model COSMO. This new version allows us to expand the size of the simulation domain to areas spanning continents and the time period up to one decade. We present results from a decade-long, convection-resolving climate simulation using the GPU-enabled COSMO version. The simulation is driven by the ERA-Interim reanalysis. The results illustrate how the approach allows for the representation of interactions between synoptic-scale and meso-scale atmospheric circulations at scales ranging from 1000 to 10 km. We discuss the performance of the convection-resolving modeling approach on the European scale. Specifically we focus on the annual cycle of convection in Europe, on the organization of convective clouds and on the verification of hourly rainfall with various high resolution datasets.

  10. HISQ inverter on Intel Xeon Phi and NVIDIA GPUs

    CERN Document Server

    Kaczmarek, O; Steinbrecher, P; Mukherjee, Swagato; Wagner, M

    2014-01-01

    The runtime of a Lattice QCD simulation is dominated by a small kernel, which calculates the product of a vector by a sparse matrix known as the "Dslash" operator. Therefore, this kernel is frequently optimized for various HPC architectures. In this contribution we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA Tesla GPUs running a conjugate gradient solver. By exposing more parallelism to the accelerator through inverting multiple vectors at the same time, we obtain a performance of 250 GFlop/s on both architectures. This more than doubles the performance of the inversions. We give a short overview of both architectures, discuss some details of the implementation and the effort required to obtain the achieved performance.

  11. A Framework for Lattice QCD Calculations on GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Winter, Frank; Clark, M A; Edwards, Robert G; Joo, Balint

    2014-08-01

    Computing platforms equipped with accelerators like GPUs have proven to provide great computational power. However, exploiting such platforms for existing scientific applications is not a trivial task. Current GPU programming frameworks such as CUDA C/C++ require low-level programming from the developer in order to achieve high performance code. As a result, porting of applications to GPUs is typically limited to time-dominant algorithms and routines, leaving the remainder not accelerated, which can open a serious Amdahl's law issue. The lattice QCD application Chroma allows us to explore a different porting strategy. The layered structure of the software architecture logically separates the data-parallel from the application layer. The QCD Data-Parallel software layer provides data types and expressions with stencil-like operations suitable for lattice field theory and Chroma implements algorithms in terms of this high-level interface. Thus by porting the low-level layer one can effectively move the whole application in one swing to a different platform. The QDP-JIT/PTX library, the reimplementation of the low-level layer, provides a framework for lattice QCD calculations for the CUDA architecture. The complete software interface is supported and thus applications can be run unaltered on GPU-based parallel computers. This reimplementation was possible due to the availability of a JIT compiler (part of the NVIDIA Linux kernel driver) which translates an assembly-like language (PTX) to GPU code. The expression template technique is used to build PTX code generators and a software cache manages the GPU memory. This reimplementation allows us to deploy an efficient implementation of the full gauge-generation program with dynamical fermions on large-scale GPU-based machines such as Titan and Blue Waters, which accelerates the algorithm by more than an order of magnitude.

  12. Towards European-scale convection-resolving climate simulations with GPUs: a study with COSMO 4.19

    Science.gov (United States)

    Leutwyler, David; Fuhrer, Oliver; Lapillonne, Xavier; Lüthi, Daniel; Schär, Christoph

    2016-09-01

    The representation of moist convection in climate models represents a major challenge, due to the small scales involved. Using horizontal grid spacings of O(1km), convection-resolving weather and climate models allow one to explicitly resolve deep convection. However, due to their extremely demanding computational requirements, they have so far been limited to short simulations and/or small computational domains. Innovations in supercomputing have led to new hybrid node designs, mixing conventional multi-core hardware and accelerators such as graphics processing units (GPUs). One of the first atmospheric models that has been fully ported to these architectures is the COSMO (Consortium for Small-scale Modeling) model. Here we present the convection-resolving COSMO model on continental scales using a version of the model capable of using GPU accelerators. The verification of a week-long simulation containing winter storm Kyrill shows that, for this case, convection-parameterizing simulations and convection-resolving simulations agree well. Furthermore, we demonstrate the applicability of the approach to longer simulations by conducting a 3-month-long simulation of the summer season 2006. Its results corroborate the findings found on smaller domains, such as a more credible representation of the diurnal cycle of precipitation in convection-resolving models and a tendency to produce more intensive hourly precipitation events. Both simulations also show how the approach allows for the representation of interactions between synoptic-scale and meso-scale atmospheric circulations at scales ranging from 1000 to 10 km. This includes the formation of sharp cold frontal structures, convection embedded in fronts and small eddies, or the formation and organization of propagating cold pools. Finally, we assess the performance gain from using heterogeneous hardware equipped with GPUs relative to multi-core hardware. With the COSMO model, we now use a weather and climate model that

  13. Energy Efficient Iris Recognition With Graphics Processing Units

    National Research Council Canada - National Science Library

    Rakvic, Ryan; Broussard, Randy; Ngo, Hau

    2016-01-01

    .... In the past few years, however, this growth has slowed for central processing units (CPUs). Instead, there has been a shift to multicore computing, specifically with the general purpose graphic processing units (GPUs...

  14. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Science.gov (United States)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; Masciovecchio, Mario; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2017-08-01

    For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPUs), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward understanding these processors and new developments to port the Kalman filter to NVIDIA GPUs.

  15. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Directory of Open Access Journals (Sweden)

    Cerati Giuseppe

    2017-01-01

    Full Text Available For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPUs), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward understanding these processors and new developments to port the Kalman filter to NVIDIA GPUs.

  16. Implementation Of Decoders for LDPC Block Codes and LDPC Convolutional Codes Based on GPUs

    CERN Document Server

    Zhao, Yue

    2012-01-01

    With the use of the belief propagation (BP) decoding algorithm, low-density parity-check (LDPC) codes can achieve near-Shannon-limit performance. LDPC codes can achieve bit error rates (BERs) as low as $10^{-15}$ even at a small bit-energy-to-noise-power-spectral-density ratio ($E_{b}/N_{0}$). In order to evaluate the error performance of LDPC codes, simulators running on central processing units (CPUs) are commonly used. However, the time taken to evaluate LDPC codes with very good error performance is excessive. For example, assuming 30 decoder iterations, our simulation results have shown that it takes a modern CPU more than 7 days to arrive at a BER of $10^{-6}$ for a code of length 18360. In this paper, efficient LDPC block-code decoders/simulators which run on graphics processing units (GPUs) are proposed. Both the standard BP decoding algorithm and the layered decoding algorithm are used. We also implement the decoder for LDPC convolutional codes (LDPCCC). The LDPCCC is derived from a pre-de...
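    The cost quoted above comes from Monte Carlo BER estimation, whose per-trial work is independent and therefore maps naturally onto GPU threads. As a minimal, hedged illustration of that simulation loop, here in NumPy with a trivial rate-1/3 repetition code standing in for the paper's LDPC decoder (all parameters are illustrative):

```python
import numpy as np

def ber_repetition(ebn0_db, n_bits=100_000, reps=3, seed=0):
    """Monte Carlo BER estimate for a rate-1/reps repetition code on AWGN.

    A stand-in for the far costlier LDPC simulation described above: the
    per-bit trials are independent, so they vectorize (or map onto GPU
    threads) trivially.
    """
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, n_bits)
    # BPSK mapping (0 -> +1, 1 -> -1), each bit repeated `reps` times
    symbols = np.repeat(1 - 2 * bits, reps).astype(float)
    # Noise std for the requested Eb/N0; rate 1/reps spreads Eb over reps symbols
    ebn0 = 10 ** (ebn0_db / 10)
    sigma = np.sqrt(reps / (2 * ebn0))
    rx = symbols + sigma * rng.normal(size=symbols.size)
    # Soft-decision decode: sum the received copies of each bit, take the sign
    decided = (rx.reshape(n_bits, reps).sum(axis=1) < 0).astype(int)
    return float(np.mean(decided != bits))
```

Replacing the repetition decode with iterative BP over a parity-check matrix is where the 7-day CPU cost arises, and where GPU parallelism pays off.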

  17. Implementation of the electron propagator to second order on GPUs to estimate the ionization potentials of confined atoms

    Science.gov (United States)

    García-Hernández, Erwin; Díaz-García, Cecilia; Vargas, Rubicelia; Garza, Jorge

    2014-09-01

    The best way to estimate ionization potentials (I) for confined atoms is by using the same Hamiltonian for the neutral atom and the corresponding hypothetical ionized atom. For this purpose, we have implemented the electron propagator to second order (EP2) using parallel programming techniques on graphics processing units (GPUs). These techniques exploit the GPUs for the evaluation of the two-electron integrals, which are required for the self-consistent process and for the reduction involved in the four-index integral transformation. As an example, we present results for confined helium, beryllium and neon atoms, which are contrasted with previously reported results. Although Koopmans' theorem (KT) provides good estimates of ionization potentials, it is evident that EP2 corrects these estimates. Unfortunately, the correction made by EP2 does not reveal a trend for confined atoms, because for certain confinement regions KT overestimates the ionization potential, whereas for other regions it underestimates it. Orbital crossing between unoccupied orbitals is responsible for this behavior. In particular, if the lowest unoccupied atomic orbital (LUMO) crosses a virtual orbital, the difference $I_{EP2}-I_{KT}$ will change its sign. Thus, the EP2 approximation is required when the ionization potential is estimated for confined atoms.
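    For context, Koopmans' theorem reads $I_{KT} = -\epsilon_{HOMO}$, while EP2 shifts the orbital energy by a frequency-dependent self-energy whose pole is located self-consistently. A hedged sketch of that outer pole search (the self-energy Σ below is a caller-supplied toy function, not the actual EP2 sum over transformed two-electron integrals):

```python
def koopmans_ip(orbital_energies, n_occ):
    """Koopmans' theorem estimate: I_KT = -epsilon_HOMO (energies in hartree)."""
    return -sorted(orbital_energies)[n_occ - 1]

def ep2_pole(eps_p, sigma, tol=1e-10, max_iter=200):
    """Fixed-point search for omega = eps_p + Sigma(omega), the outer loop
    of an EP2 ionization-potential calculation; `sigma` is a stand-in for
    the EP2 self-energy built from two-electron integrals."""
    omega = eps_p
    for _ in range(max_iter):
        new = eps_p + sigma(omega)
        if abs(new - omega) < tol:
            return new
        omega = new
    return omega
```

The GPU work in the paper lives inside Σ: evaluating and transforming the integrals that this sketch abstracts away.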

  18. Real-time radar signal processing using GPGPU (general-purpose graphic processing unit)

    Science.gov (United States)

    Kong, Fanxing; Zhang, Yan Rockee; Cai, Jingxiao; Palmer, Robert D.

    2016-05-01

    This study introduces a practical approach to developing a real-time signal processing chain for a general phased array radar on NVIDIA GPUs (Graphics Processing Units) using CUDA (Compute Unified Device Architecture) libraries such as cuBLAS and cuFFT, which are adopted from open source libraries and optimized for NVIDIA GPUs. The processed results are rigorously verified against those from the CPUs. Performance, benchmarked by computation time for various input data cube sizes, is compared across GPUs and CPUs. Through the analysis, it is demonstrated that real-time GPGPU (general-purpose GPU) processing of the array radar data is possible with relatively low-cost commercial GPUs.
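    The core of such a chain is FFT-based pulse compression: multiply in the frequency domain by the conjugate of the transmit waveform's spectrum, exactly the pattern cuFFT accelerates on the GPU. A hedged NumPy stand-in (a real chain would batch this over range lines and channels):

```python
import numpy as np

def matched_filter_fft(rx, waveform):
    """Pulse compression via FFT: correlate the received samples with the
    transmit waveform. Zero-padding to a power of two avoids circular
    wrap-around, mirroring a typical cuFFT-based implementation."""
    n = len(rx) + len(waveform) - 1
    nfft = 1 << (n - 1).bit_length()
    spec = np.fft.fft(rx, nfft) * np.conj(np.fft.fft(waveform, nfft))
    return np.fft.ifft(spec)[:n]
```

With a chirp embedded at sample 20, the magnitude of the output peaks at lag 20 with value equal to the waveform energy.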

  19. TESLA GPUs versus MPI with OpenMP for the Forward Modeling of Gravity and Gravity Gradient of Large Prisms Ensemble

    Directory of Open Access Journals (Sweden)

    Carlos Couder-Castañeda

    2013-01-01

    Full Text Available An implementation with the CUDA technology on a single and on several graphics processing units (GPUs) is presented for the calculation of the forward modeling of gravitational fields from a three-dimensional volumetric ensemble composed of unitary prisms of constant density. We compared the performance results obtained with the GPUs against a previous version coded in OpenMP with MPI, and we analyzed the results on both platforms. Today, the use of GPUs represents a breakthrough in parallel computing, which has led to the development of applications in various fields. Nevertheless, in some applications the decomposition of the tasks is not trivial, as can be appreciated in this paper. Unlike a trivial decomposition of the domain, we propose to decompose the problem by sets of prisms and to use different memory spaces per processing CUDA core, avoiding the performance decay that would result from the constant kernel calls needed in a parallelization by observation points. The design and implementation are the main contributions of this work, because the parallelization scheme is not trivial. The performance results obtained are comparable to those of a small processing cluster.
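    The underlying computation pairs every observation point with every prism. As a hedged sketch of that prisms-times-observations structure, each prism is collapsed here to a point mass at its center (the paper evaluates the exact analytic prism formula instead; the names and constants below are illustrative):

```python
import numpy as np

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def gz_point_masses(obs, centers, masses):
    """Vertical gravity at M observation points from N prisms approximated
    as point masses at their centers. The (M, N) pairing is the structure
    the CUDA decomposition partitions by sets of prisms."""
    # obs: (M, 3), centers: (N, 3), masses: (N,)
    d = centers[None, :, :] - obs[:, None, :]      # (M, N, 3) separations
    r = np.linalg.norm(d, axis=2)                  # (M, N) distances
    # Sum per-prism vertical contributions G * m * dz / r^3 over the prisms
    return G * np.sum(masses * d[:, :, 2] / r**3, axis=1)
```

A unit mass 100 m above an observer contributes G·100/100³ ≈ 6.674e-15 m/s², a quick sanity check on the formula.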

  20. Coupled Vlasov and two-fluid codes on GPUs

    CERN Document Server

    Rieke, M; Grauer, R

    2014-01-01

    We present a way to combine Vlasov and two-fluid codes for the simulation of a collisionless plasma in large domains while keeping full information of the velocity distribution in localized areas of interest. This is made possible by solving the full Vlasov equation in one region while the remaining area is treated by a 5-moment two-fluid code. In such a treatment, the main challenge of coupling kinetic and fluid descriptions is the interchange of physically correct boundary conditions between the different plasma models. In contrast to other treatments, we do not rely on any specific form of the distribution function, e.g. a Maxwellian type. Instead, we combine an extrapolation of the distribution function and a correction of the moments based on the fluid data. Thus, throughout the simulation both codes provide the necessary boundary conditions for each other. A speed-up factor of around 20 is achieved by using GPUs for the computationally expensive solution of the Vlasov equation and an overall factor of a...
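    At the Vlasov/fluid interface described above, the fluid side needs moments of the distribution function. A minimal sketch of that moment computation for a distribution sampled on a uniform 1D velocity grid (the 5-moment model carries density, momentum and energy; this toy uses plain Riemann sums):

```python
import numpy as np

def five_moments(f, v, m=1.0):
    """Density, bulk velocity and (scalar) energy density of a 1D
    distribution f(v) on a uniform velocity grid -- the kind of
    quantities exchanged at a Vlasov / 5-moment-fluid interface."""
    dv = v[1] - v[0]
    n = np.sum(f) * dv                       # number density
    nu = np.sum(v * f) * dv                  # momentum density / m
    energy = 0.5 * m * np.sum(v**2 * f) * dv # kinetic energy density
    return n, nu / n, energy
```

For a Maxwellian with density n, drift u and thermal speed vt, the energy moment is ½·m·n·(u² + vt²), which the sketch reproduces on a sufficiently wide grid.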

  1. Critical Points Based Register-Concurrency Autotuning for GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Li, Ang; Song, Shuaiwen; Kumar, Akash; Zhang, Eddy; Chavarría-Miranda, Daniel; Corporaal, Henk

    2016-03-14

    The unprecedented prevalence of GPGPU is largely attributed to its abundant on-chip register resources, which allow massively concurrent threads and extremely fast context switching. However, due to internal memory capacity constraints, there is a tradeoff between the per-thread register usage and the overall concurrency. This becomes a design problem in terms of performance tuning, since the performance “sweet spot”, which can be significantly affected by these two factors, is generally unknown beforehand. In this paper, we propose an effective autotuning solution to quickly and efficiently select the optimal number of registers per thread for delivering the best GPU performance. Experiments on three generations of GPUs (Nvidia Fermi, Kepler and Maxwell) demonstrate that our simple strategy can achieve an average performance improvement of 10%, and a maximum of 50%, over the original version without modifying the user program. Additionally, to reduce local cache misses due to register spilling and further improve performance, we explore three optimization schemes (i.e., bypassing L1 for global memory access, enlarging the local L1 cache and spilling into shared memory) and discuss their impact on performance on a Kepler GPU.
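    The register/concurrency tradeoff the autotuner navigates can be made concrete with a back-of-the-envelope occupancy estimate. The limits below are illustrative (roughly Kepler-like), not NVIDIA's exact occupancy model:

```python
def max_resident_warps(regs_per_thread, regfile_per_sm=65536,
                       warp_size=32, hw_warp_limit=64, alloc_gran=256):
    """Warps resident per SM as a function of per-thread register usage.

    Registers are allocated per warp, rounded up to an allocation
    granularity; whichever is smaller -- the hardware warp limit or the
    register-file budget -- caps concurrency."""
    regs_per_warp = ((regs_per_thread * warp_size + alloc_gran - 1)
                     // alloc_gran) * alloc_gran
    return min(hw_warp_limit, regfile_per_sm // regs_per_warp)
```

With these numbers, raising register usage from 32 to 64 per thread halves the resident warps (64 to 32), and 255 registers leaves only 8 warps: exactly the kind of cliff the autotuner's critical points are meant to locate.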

  2. Pragma Directed Shared Memory Centric Optimizations on GPUs

    Institute of Scientific and Technical Information of China (English)

    Jing Li; Lei Liu; Yuan Wu; Xiang-Hua Liu; Yi Gao; Xiao-Bing Feng; Cheng-Yong Wu

    2016-01-01

    GPUs have become a ubiquitous choice as coprocessors because of their excellent ability in concurrent processing. In GPU architectures, shared memory plays a very important role in system performance, as it can largely improve bandwidth utilization and accelerate memory operations. However, even for affine GPU applications that contain regular access patterns, optimizing for shared memory is not easy work. It often requires programmer expertise and nontrivial parameter selection; improper shared memory usage might even underutilize GPU resources. Even with state-of-the-art high-level programming models (e.g., OpenACC and OpenHMPP), it is still hard to utilize shared memory, since they lack inherent support for describing shared memory optimization and selecting suitable parameters, let alone maintaining high resource utilization. Targeting higher productivity for affine applications, we propose a data-centric approach to shared memory optimization on GPUs. We design a pragma extension to OpenACC to convey programmers' data management hints to the compiler. Meanwhile, we devise a compiler framework to automatically select optimal parameters for shared arrays, using the polyhedral model. We further propose optimization techniques to expose higher memory- and instruction-level parallelism. The experimental results show that our shared-memory-centric approaches effectively improve the performance of five typical GPU applications across four widely used platforms by 3.7x on average, and do not burden programmers with lots of pragmas.

  3. A Real-time Single Pulse Detection Algorithm for GPUs

    CERN Document Server

    Adámek, Karel

    2016-01-01

    The detection of non-repeating events in the radio spectrum has become an important area of study in radio astronomy over the last decade due to the discovery of fast radio bursts (FRBs). We have implemented a single pulse detection algorithm for NVIDIA GPUs which uses boxcar filters of varying widths. Our code calculates the standard deviation, performs matched filtering using boxcar filters, and applies thresholding based on the signal-to-noise ratio. We present our parallel implementation of this single pulse detection algorithm. Our GPU algorithm is approximately 17x faster than our current CPU OpenMP code (NVIDIA Titan XP vs Intel E5-2650v3). This code is part of the AstroAccelerate project, a many-core accelerated time-domain signal processing code for radio astronomy. This work allows our AstroAccelerate code to perform a single pulse search on SKA-like data 4.3x faster than real-time.
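    The three steps named in the abstract (noise estimation, boxcar matched filtering, SNR thresholding) can be sketched in a few lines of NumPy; the robust median/MAD normalization and the set of widths are illustrative choices, not necessarily those of AstroAccelerate:

```python
import numpy as np

def single_pulse_snr(x, widths=(1, 2, 4, 8, 16)):
    """Best single-pulse SNR per sample over a set of boxcar widths.

    Normalize the series to unit noise (robustly, via median/MAD), then
    convolve with boxcars scaled by 1/sqrt(w) so the output of each width
    is directly an SNR; keep the maximum across widths."""
    med = np.median(x)
    sigma = 1.4826 * np.median(np.abs(x - med))  # MAD -> std for Gaussian noise
    x = (x - med) / sigma
    best = np.full(x.size, -np.inf)
    for w in widths:
        kernel = np.ones(w) / np.sqrt(w)         # unit-noise normalization
        best = np.maximum(best, np.convolve(x, kernel, mode="same"))
    return best
```

A pulse of width 8 and amplitude 3σ yields an SNR near 3·√8 ≈ 8.5 at the matching boxcar width, well above a typical detection threshold.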

  4. GPUs and Python: A Recipe for Lightning-Fast Data Pipelines

    Science.gov (United States)

    Warner, C.; Packham, C.; Eikenberry, S. S.; Gonzalez, A.

    2012-09-01

    As arrays increase their pixel numbers and mosaics of arrays become more prevalent, the volume of data being produced per night is increasing rapidly. As we look forward to the LSST era, when 30 TB of data per night will be produced, streamlined and rapid data reduction processes are becoming critical. Recent developments in the computer industry have led to the production of Graphics Processing Units (GPUs), which can contain hundreds of processing cores, each of which can process hundreds of threads concurrently. Nvidia's Compute Unified Device Architecture (CUDA) platform has allowed developers to take advantage of these modern GPUs and design massively parallel algorithms which can provide speed-ups of up to around a factor of 100 over CPU implementations. Data pipelines are perfectly suited to reap the benefits of massive parallelization because many of the algorithms in data processing are performed on a per-pixel basis on ever larger sets of images. In addition, the PyCUDA module (http://mathema.tician.de/software/pycuda) and the Python native C API allow CUDA code to be easily integrated into Python code. Python has continued to gain momentum in the astronomical community, particularly as an attractive alternative to IDL or C code for data pipelines. Thus, the ability to link GPU-optimized CUDA code directly into Python allows existing data pipeline frameworks to be reused with new parallel algorithms. We present the initial results of parallelizing many of the more CPU-intensive algorithms in the Florida Analysis Tool Born Of Yearning for high quality scientific data (FATBOY) and discuss the implications for the future of data pipelines. We use an Nvidia 580 GTX GPU for our tests and find that the 580 GTX produces a speed-up of anywhere from a factor of around 10 up to a factor of 300 over CPU implementations for individual routines. We believe that it is possible to obtain an overall pipeline speed gain of a factor of 10-25 over traditionally

  5. A Weighted Spatial-Spectral Kernel RX Algorithm and Efficient Implementation on GPUs

    Directory of Open Access Journals (Sweden)

    Chunhui Zhao

    2017-02-01

    Full Text Available The kernel RX (KRX) detector proposed by Kwon and Nasrabadi exploits a kernel function to obtain a better detection performance. However, it still has two limitations that can be addressed. On the one hand, reasonable integration of spatial-spectral information can be used to further improve its detection accuracy. On the other hand, parallel computing can be used to reduce the processing time of available KRX detectors. Accordingly, this paper presents a novel weighted spatial-spectral kernel RX (WSSKRX) detector and its parallel implementation on graphics processing units (GPUs). The WSSKRX utilizes the spatial neighborhood resources to reconstruct the testing pixels by introducing a spectral factor and a spatial window, thereby effectively reducing the interference of background noise. Then, the kernel function is redesigned as a mapping trick in the KRX detector to implement the anomaly detection. In addition, a powerful architecture based on the GPU technique is designed to accelerate WSSKRX. To substantiate the performance of the proposed algorithm, experiments are conducted on both synthetic and real data.

  6. A Weighted Spatial-Spectral Kernel RX Algorithm and Efficient Implementation on GPUs.

    Science.gov (United States)

    Zhao, Chunhui; Li, Jiawei; Meng, Meiling; Yao, Xifeng

    2017-02-23

    The kernel RX (KRX) detector proposed by Kwon and Nasrabadi exploits a kernel function to obtain a better detection performance. However, it still has two limitations that can be addressed. On the one hand, reasonable integration of spatial-spectral information can be used to further improve its detection accuracy. On the other hand, parallel computing can be used to reduce the processing time of available KRX detectors. Accordingly, this paper presents a novel weighted spatial-spectral kernel RX (WSSKRX) detector and its parallel implementation on graphics processing units (GPUs). The WSSKRX utilizes the spatial neighborhood resources to reconstruct the testing pixels by introducing a spectral factor and a spatial window, thereby effectively reducing the interference of background noise. Then, the kernel function is redesigned as a mapping trick in the KRX detector to implement the anomaly detection. In addition, a powerful architecture based on the GPU technique is designed to accelerate WSSKRX. To substantiate the performance of the proposed algorithm, experiments are conducted on both synthetic and real data.
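    For reference, the classic (non-kernel, global) RX score that KRX and WSSKRX build on is the Mahalanobis distance of each pixel spectrum from the background statistics. A hedged NumPy baseline (global mean/covariance; the paper's variants add a kernel mapping and spatial weighting on top):

```python
import numpy as np

def rx_detector(cube):
    """Global RX anomaly score for a hyperspectral cube of shape (H, W, B):
    squared Mahalanobis distance of each pixel from the scene mean, using
    the scene covariance as the background model."""
    h, w, b = cube.shape
    x = cube.reshape(-1, b).astype(float)
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    inv = np.linalg.inv(cov + 1e-6 * np.eye(b))  # small ridge for stability
    d = x - mu
    scores = np.einsum("ij,jk,ik->i", d, inv, d)  # per-pixel d^T C^-1 d
    return scores.reshape(h, w)
```

On a synthetic Gaussian background with one spectrally offset pixel, the score map peaks at the anomalous pixel.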

  7. Porting marine ecosystem model spin-up using transport matrices to GPUs

    Directory of Open Access Journals (Sweden)

    E. Siewertsen

    2013-01-01

    Full Text Available We have ported an implementation of the spin-up for marine ecosystem models based on transport matrices to graphics processing units (GPUs). The original implementation was designed for distributed-memory architectures and uses the Portable, Extensible Toolkit for Scientific Computation (PETSc) library, which is based on the Message Passing Interface (MPI) standard. The spin-up computes a steady seasonal cycle of ecosystem tracers with climatological ocean circulation data as forcing. Since the transport is linear with respect to the tracers, the resulting operator is represented by matrices. Each iteration of the spin-up involves two matrix-vector multiplications and the evaluation of the biogeochemical model used. The original code was written in C and Fortran. On the GPU, we use the Compute Unified Device Architecture (CUDA) standard, a customized version of PETSc and a commercial CUDA Fortran compiler. We describe the extensions to PETSc and the modifications of the original C and Fortran codes that had to be made. Here we make use of freely available libraries for the GPU. We analyze the computational effort of the main parts of the spin-up for two exemplary ecosystem models and compare the overall computational time to that necessary on different CPUs. The results show that a consumer GPU can compete with a significant number of cluster CPUs without further code optimization.

  8. Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Villa, Oreste; Fatica, Massimiliano; Gawande, Nitin A.; Tumeo, Antonino

    2013-08-26

    In this paper we propose and analyze a set of batched linear solvers for small matrices on Graphics Processing Units (GPUs), evaluating the various alternatives depending on the size of the systems to solve. We discuss three different solutions that operate with different levels of parallelization and different GPU features. The first, exploiting the CUBLAS library, manages matrices of size up to 32x32 and employs warp-level parallelism (one matrix, one warp) and shared memory. The second works at thread-block-level parallelism (one matrix, one thread-block), still exploiting shared memory but managing matrices up to 76x76. The third is thread-level parallel (one matrix, one thread) and can reach sizes up to 128x128, but it does not exploit shared memory and relies only on the high memory bandwidth of the GPU. The first and second solutions support only partial pivoting; the third easily supports both partial and full pivoting, making it attractive for problems that require greater numerical stability. We analyze the trade-offs in terms of performance and power consumption as a function of the size of the linear systems that are simultaneously solved. We execute the three implementations on a Tesla M2090 (Fermi) and on a Tesla K20 (Kepler).
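    A sketch of the per-matrix work each GPU worker performs: LU factorization with partial pivoting, with the substitution phases fused in, plus a batch loop standing in for the warp/thread-block/thread mapping (NumPy, illustrative only; not the paper's CUDA kernels):

```python
import numpy as np

def lu_solve_pp(a, b):
    """Solve a small dense system by LU with partial pivoting.

    Forward substitution is fused into the elimination loop; back
    substitution runs on the resulting upper triangle. This is the
    per-matrix kernel the batched GPU variants replicate."""
    a = a.astype(float).copy()
    b = b.astype(float).copy()
    n = a.shape[0]
    for k in range(n - 1):
        p = k + int(np.argmax(np.abs(a[k:, k])))     # partial pivot row
        if p != k:
            a[[k, p]] = a[[p, k]]
            b[[k, p]] = b[[p, k]]
        a[k + 1:, k] /= a[k, k]                      # store multipliers
        a[k + 1:, k + 1:] -= np.outer(a[k + 1:, k], a[k, k + 1:])
        b[k + 1:] -= a[k + 1:, k] * b[k]             # fused forward solve
    x = np.empty(n)
    for i in range(n - 1, -1, -1):                   # back substitution
        x[i] = (b[i] - a[i, i + 1:] @ x[i + 1:]) / a[i, i]
    return x

def solve_batch(mats, rhss):
    """'One matrix, one worker': a GPU maps this loop across threads,
    warps, or thread-blocks depending on the matrix size."""
    return np.stack([lu_solve_pp(m, r) for m, r in zip(mats, rhss)])
```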

  9. Porting marine ecosystem model spin-up using transport matrices to GPUs

    Directory of Open Access Journals (Sweden)

    E. Siewertsen

    2012-07-01

    Full Text Available We have ported an implementation of the spin-up for marine ecosystem models based on the "Transport Matrix Method" to graphics processing units (GPUs). The original implementation was designed for distributed-memory architectures and uses the PETSc library, which is based on the "Message Passing Interface" (MPI) standard. The spin-up computes a steady seasonal cycle of the ecosystem tracers with climatological ocean circulation data as forcing. Since the transport is linear with respect to the tracers, the resulting operator is represented in so-called "transport matrices". Each iteration of the spin-up involves two matrix-vector multiplications and the evaluation of the biogeochemical model used. The original code was written in C and Fortran. On the GPU, we use the CUDA standard, a specialized version of the PETSc toolkit and a CUDA Fortran compiler. We describe the extensions to PETSc and the modifications of the original C and Fortran codes that had to be made. Here we make use of freely available libraries for the GPU. We analyze the computational effort of the main parts of the spin-up for two exemplary ecosystem models and compare the overall computational time to that necessary on different CPUs. The results show that a consumer GPU can beat a significant number of cluster CPUs without further code optimization.
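    The iteration described above is compact enough to sketch: each spin-up step applies an explicit transport matrix, adds the biogeochemical source term, and applies an implicit transport matrix. A hedged toy version with dense matrices (real transport matrices are sparse and interpolated in time over the seasonal cycle):

```python
import numpy as np

def spin_up(A_expl, A_impl, q, y0, n_steps):
    """Transport Matrix Method spin-up loop: each step is two
    matrix-vector products around the biogeochemical source term q,
    iterated until the tracer field settles into a steady cycle."""
    y = y0.copy()
    for _ in range(n_steps):
        y = A_impl @ (A_expl @ y + q(y))
    return y
```

With contractive toy matrices A_expl = A_impl = 0.5·I and a constant source, the iteration converges to the fixed point y = 2/3, illustrating the steady state the spin-up seeks.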

  10. Accelerating dissipative particle dynamics simulations on GPUs: Algorithms, numerics and applications

    Science.gov (United States)

    Tang, Yu-Hang; Karniadakis, George Em

    2014-11-01

    We present a scalable dissipative particle dynamics simulation code, fully implemented on Graphics Processing Units (GPUs) using a hybrid CUDA/MPI programming model, which achieves 10-30 times speedup on a single GPU over 16 CPU cores and almost linear weak scaling across a thousand nodes. A unified framework is developed within which the efficient generation of the neighbor list and the maintenance of particle data locality are addressed. Our algorithm generates strictly ordered neighbor lists in parallel, while the construction is deterministic and makes no use of atomic operations or sorting. Such a neighbor list leads to optimal data loading efficiency when combined with a two-level particle reordering scheme. A faster in situ generation scheme for Gaussian random numbers is proposed using precomputed binary signatures. We designed custom transcendental functions that are fast and accurate for evaluating the pairwise interaction. The correctness and accuracy of the code are verified through a set of test cases simulating Poiseuille flow and spontaneous vesicle formation. Computer benchmarks demonstrate the speedup of our implementation over the CPU implementation, as well as strong and weak scalability. A large-scale simulation of spontaneous vesicle formation consisting of 128 million particles was conducted to further illustrate the practicality of our code in real-world applications.

  11. Applying graphics processor units to Monte Carlo dose calculation in radiation therapy

    Directory of Open Access Journals (Sweden)

    Bakhtiari M

    2010-01-01

    Full Text Available We investigate the potential of using a graphics processing unit (GPU) for Monte Carlo (MC)-based radiation dose calculations. The percent depth dose (PDD) of photons in a medium with known absorption and scattering coefficients is computed using an MC simulation running on both a standard CPU and a GPU. We demonstrate that the GPU's capability for massively parallel processing provides a significant acceleration of the MC calculation, and offers a significant advantage for distributed stochastic simulations on a single computer. Harnessing this potential of GPUs will help in the early adoption of MC for routine planning in a clinical environment.
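    The reason MC dose maps so well onto GPUs is that photon histories are independent. A deliberately simplified depth-dose sketch (exponential attenuation only, energy deposited locally, no scattering; nothing like a clinical MC code):

```python
import numpy as np

def percent_depth_dose(mu, depth_max, n_bins, n_photons=200_000, seed=1):
    """Toy Monte Carlo percent-depth-dose curve.

    Sample each photon's first-interaction depth from an exponential with
    linear attenuation coefficient mu (1/cm), deposit all energy locally,
    bin by depth, and normalize to the maximum bin. Every photon history
    is independent, which is what maps cleanly onto GPU threads."""
    rng = np.random.default_rng(seed)
    depths = rng.exponential(1.0 / mu, n_photons)
    hist, _ = np.histogram(depths, bins=n_bins, range=(0.0, depth_max))
    return 100.0 * hist / hist.max()
```

Without scattering or electron transport there is no build-up region, so the toy curve simply decays exponentially from the surface; a real PDD for megavoltage photons peaks at a finite depth.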

  12. Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

    Science.gov (United States)

    Schultz, A.

    2010-12-01

    3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency-domain 3D EM problems have not achieved wide-scale adoption, with emphasis on fairly coarse-grained parallelism using MPI and similar approaches. The communications bandwidth, as well as the latency required to send and receive network communication packets, is a limiting factor in implementing fine-grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processing Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general-purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high-speed communication with host CPUs, usually through PCIe-type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine-grained (i.e. massive) parallelization of codes on low-cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency-domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We

  13. Relationship between Training Programs being Offered in State and Federal Penal Institutions and the Unfilled Job Openings in the Major Occupations in the United States.

    Science.gov (United States)

    Torrence, John Thomas

    Excluding military installations, training programs in state and federal penal institutions were surveyed, through a mailed checklist, to test the hypotheses that (1) training programs in penal institutions were not related to the unfilled job openings by major occupations in the United States, and (2) that training programs reported would have a…

  14. Comparison of availability and offer of controlled physical activities for pupils with disability in selected regions of Czech Republic and United Kingdom

    Directory of Open Access Journals (Sweden)

    Zuzana Kornatovská

    2016-11-01

    Full Text Available Background: The European Union and other countries of the world need quality research data, without which they cannot assess how the overall situation of persons with disabilities is developing. Objective: The aim of this paper is to compare the availability of controlled physical activities for pupils with mental, hearing or visual disabilities in selected regions of the Czech Republic and Great Britain (the region of South Bohemia and the region of West Midlands). A secondary aim is to analyse the offer of controlled physical activities for this population of pupils. Methods: We used analytical investigative methods (investigative pentagram). The survey was based on explanation, exploration and direct observation. Another method was the "ArcGIS" tool, evaluating the distance of polygons and "packaging zones" by driving times for the EU - Index of availability. The availability of controlled physical activities for pupils with the observed types of disability was then determined and hypothesis H1 evaluated. The ways of organizing the offer of controlled physical activities for pupils with disabilities in the surveyed regions were also examined. Results: The range of controlled physical activities was found to be greater in the West Midlands region than in the South Bohemian region. It was found that the British region, unlike the South Bohemian region, accentuated the non-confrontational character of controlled physical activities with a health-preventive impact (yoga, swimming) and social integration (dancing, walking and hiking). Conclusions: Hypothesis H1, which assumed that the availability of controlled physical activities for pupils with mental, hearing or visual disabilities is significantly higher in the surveyed region of the UK than in the surveyed region of the Czech Republic, was verified.

  15. Offer/Acceptance Ratio.

    Science.gov (United States)

    Collins, Mimi

    1997-01-01

    Explores how human resource professionals, with above average offer/acceptance ratios, streamline their recruitment efforts. Profiles company strategies with internships, internal promotion, cooperative education programs, and how to get candidates to accept offers. Also discusses how to use the offer/acceptance ratio as a measure of program…

  16. BROCCOLI: Software for Fast fMRI Analysis on Many-Core CPUs and GPUs

    Directory of Open Access Journals (Sweden)

    Anders eEklund

    2014-03-01

    Full Text Available Analysis of functional magnetic resonance imaging (fMRI) data is becoming ever more computationally demanding as temporal and spatial resolutions improve, and large, publicly available data sets proliferate. Moreover, methodological improvements in the neuroimaging pipeline, such as non-linear spatial normalization, non-parametric permutation tests and Bayesian Markov Chain Monte Carlo approaches, can dramatically increase the computational burden. Despite these challenges, there do not yet exist any fMRI software packages which leverage inexpensive and powerful graphics processing units (GPUs) to perform these analyses. Here, we therefore present BROCCOLI, a free software package written in OpenCL (Open Computing Language) that can be used for parallel analysis of fMRI data on a large variety of hardware configurations. BROCCOLI has, for example, been tested with an Intel CPU, an Nvidia GPU and an AMD GPU. These tests show that parallel processing of fMRI data can lead to significantly faster analysis pipelines. This speedup can be achieved on relatively standard hardware, but further, dramatic speed improvements require only a modest investment in GPU hardware. BROCCOLI (running on a GPU) can perform non-linear spatial normalization to a 1 mm³ brain template in 4-6 seconds, and run a second level permutation test with 10,000 permutations in about a minute. These non-parametric tests are generally more robust than their parametric counterparts, and can also enable more sophisticated analyses by estimating complicated null distributions. Additionally, BROCCOLI includes support for Bayesian first-level fMRI analysis using a Gibbs sampler. The new software is freely available under GNU GPL3 and can be downloaded from GitHub (https://github.com/wanderine/BROCCOLI/).

  17. BROCCOLI: Software for fast fMRI analysis on many-core CPUs and GPUs.

    Science.gov (United States)

    Eklund, Anders; Dufort, Paul; Villani, Mattias; Laconte, Stephen

    2014-01-01

    Analysis of functional magnetic resonance imaging (fMRI) data is becoming ever more computationally demanding as temporal and spatial resolutions improve, and large, publicly available data sets proliferate. Moreover, methodological improvements in the neuroimaging pipeline, such as non-linear spatial normalization, non-parametric permutation tests and Bayesian Markov Chain Monte Carlo approaches, can dramatically increase the computational burden. Despite these challenges, there do not yet exist any fMRI software packages which leverage inexpensive and powerful graphics processing units (GPUs) to perform these analyses. Here, we therefore present BROCCOLI, a free software package written in OpenCL (Open Computing Language) that can be used for parallel analysis of fMRI data on a large variety of hardware configurations. BROCCOLI has, for example, been tested with an Intel CPU, an Nvidia GPU, and an AMD GPU. These tests show that parallel processing of fMRI data can lead to significantly faster analysis pipelines. This speedup can be achieved on relatively standard hardware, but further, dramatic speed improvements require only a modest investment in GPU hardware. BROCCOLI (running on a GPU) can perform non-linear spatial normalization to a 1 mm(3) brain template in 4-6 s, and run a second level permutation test with 10,000 permutations in about a minute. These non-parametric tests are generally more robust than their parametric counterparts, and can also enable more sophisticated analyses by estimating complicated null distributions. Additionally, BROCCOLI includes support for Bayesian first-level fMRI analysis using a Gibbs sampler. The new software is freely available under GNU GPL3 and can be downloaded from github (https://github.com/wanderine/BROCCOLI/).
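The second-level permutation test described above can be sketched on toy data. The following Python is illustrative only (not BROCCOLI's OpenCL implementation): it uses random sign flipping of per-subject contrast values, valid under symmetric errors, to build a null distribution for a one-sample t statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sub, n_perm = 16, 10000
# toy second-level data: one contrast estimate per subject (true effect 0.5)
data = 0.5 + rng.standard_normal(n_sub)

t_obs = data.mean() / (data.std(ddof=1) / np.sqrt(n_sub))

# permutation null via random sign flips of each subject's contrast
signs = rng.choice([-1.0, 1.0], size=(n_perm, n_sub))
flipped = signs * data
t_null = flipped.mean(axis=1) / (flipped.std(axis=1, ddof=1) / np.sqrt(n_sub))
p_value = (t_null >= t_obs).mean()       # one-sided permutation p-value
```

Each permutation is independent of the others, which is exactly why this step parallelizes so well on a GPU.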

  18. Adventures in the microlensing cloud: Large datasets, eResearch tools, and GPUs

    Science.gov (United States)

    Vernardos, G.; Fluke, C. J.

    2014-10-01

    As astronomy enters the petascale data era, astronomers are faced with new challenges relating to storage, access and management of data. A shift from the traditional approach of combining data and analysis at the desktop to the use of remote services, pushing the computation to the data, is now underway. In the field of cosmological gravitational microlensing, future synoptic all-sky surveys are expected to bring the number of multiply imaged quasars from the few tens that are currently known to a few thousands. This inflow of observational data, together with computationally demanding theoretical modeling via the production of microlensing magnification maps, requires a new approach. We present our technical solutions to supporting the GPU-Enabled, High Resolution cosmological MicroLensing parameter survey (GERLUMPH). This extensive dataset for cosmological microlensing modeling comprises over 70 000 individual magnification maps and ∼10⁶ related results. We describe our approaches to hosting, organizing, and serving ∼30 TB of data and metadata products. We present a set of online analysis tools developed with PHP, JavaScript and WebGL to support access and analysis of GERLUMPH data in a Web browser. We discuss our use of graphics processing units (GPUs) to accelerate data production, and we release the core of the GPU-D direct inverse ray-shooting code (Thompson et al., 2010, 2014) used to generate the magnification maps. All of the GERLUMPH data and tools are available online from http://gerlumph.swin.edu.au. This project made use of gSTAR, the GPU Supercomputer for Theoretical Astrophysical Research.

  19. Protein alignment algorithms with an efficient backtracking routine on multiple GPUs

    Directory of Open Access Journals (Sweden)

    Kierzynka Michal

    2011-05-01

    Full Text Available Background: Pairwise sequence alignment methods are widely used in biological research. The increasing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the near future. To overcome this challenge, several GPU (Graphics Processing Unit) computing approaches have been proposed lately. These solutions show the great potential of the GPU platform, but in most cases address the problem of sequence database scanning and compute only the alignment score, whereas the alignment itself is omitted. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch, and Smith-Waterman algorithms with a backtracking procedure, which is needed to construct the alignment. Results: In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. Performed tests show that the implementation, with performance up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison to other CPU- and GPU-based solutions. Moreover, multiple-GPU support with load balancing makes the application very scalable. Conclusions: The article shows that the backtracking procedure of the sequence alignment algorithms may be designed to fit in with the GPU architecture. Therefore, our algorithm, apart from scores, is able to compute pairwise alignments. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. Performed tests show that the efficiency of the implementation is excellent. Moreover, the speed of our GPU-based algorithms can be almost linearly increased when using more than one graphics card.
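The backtracking step that this record emphasizes can be illustrated with a minimal CPU-side sketch of Needleman-Wunsch with linear gap penalties (illustrative Python, not the paper's multi-GPU code; the scoring values are arbitrary).

```python
import numpy as np

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with traceback; scoring values are illustrative."""
    n, m = len(a), len(b)
    H = np.zeros((n + 1, m + 1), dtype=int)
    H[:, 0] = gap * np.arange(n + 1)
    H[0, :] = gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i, j] = max(H[i - 1, j - 1] + s,
                          H[i - 1, j] + gap,
                          H[i, j - 1] + gap)
    # backtracking: walk from (n, m) back to (0, 0), emitting the alignment
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and H[i, j] == H[i - 1, j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and H[i, j] == H[i - 1, j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return int(H[n, m]), "".join(reversed(out_a)), "".join(reversed(out_b))

score, al_a, al_b = needleman_wunsch("GATTACA", "GCATGCU")
```

The DP fill parallelizes along anti-diagonals; the traceback is the inherently sequential part that the paper redesigns to fit the GPU architecture.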

  20. Validation of EMMS-based drag model using lattice Boltzmann simulations on GPUs

    Institute of Scientific and Technical Information of China (English)

    Yun Zhang; Wei Ge; Xiaowei Wang; Chaohe Yang

    2011-01-01

    Interphase momentum transport in heterogeneous gas-solid systems with multi-scale structure is of great importance in process engineering. In this article, lattice Boltzmann simulations are performed on graphics processing units (GPUs), the computational power of which exceeds that of CPUs by more than one order of magnitude, to investigate incompressible Newtonian flow in idealized multi-scale particle-fluid systems. The structure consists of a periodic array of clusters, each constructed by a bundle of cylinders. A fixed pressure boundary condition is implemented by applying a constant body force to the flow through the medium. The bounce-back scheme is adopted on the fluid-solid interfaces, which ensures the no-slip boundary condition. The structure is studied under a wide range of particle diameters and packing fractions, and the drag coefficient of the structure is found to be a function of the voidages and fractions of the clusters, besides the traditional Reynolds number and the solid volume fractions. Parameters reflecting multi-scale characteristics are, therefore, demonstrated to be necessary in quantifying the drag force of heterogeneous gas-solid systems. The numerical results in the range 0.1 ≤ Re ≤ 10 and 0 < φ < 0.25 are compared with Wen and Yu's correlation, the Gibilaro equation, the EMMS-based drag model, the Beetstra correlation and the Benyahia correlation, and good agreement is found between the simulations and the EMMS-based drag model for heterogeneous systems.

  1. BROCCOLI: Software for fast fMRI analysis on many-core CPUs and GPUs

    Science.gov (United States)

    Eklund, Anders; Dufort, Paul; Villani, Mattias; LaConte, Stephen

    2014-01-01

    Analysis of functional magnetic resonance imaging (fMRI) data is becoming ever more computationally demanding as temporal and spatial resolutions improve, and large, publicly available data sets proliferate. Moreover, methodological improvements in the neuroimaging pipeline, such as non-linear spatial normalization, non-parametric permutation tests and Bayesian Markov Chain Monte Carlo approaches, can dramatically increase the computational burden. Despite these challenges, there do not yet exist any fMRI software packages which leverage inexpensive and powerful graphics processing units (GPUs) to perform these analyses. Here, we therefore present BROCCOLI, a free software package written in OpenCL (Open Computing Language) that can be used for parallel analysis of fMRI data on a large variety of hardware configurations. BROCCOLI has, for example, been tested with an Intel CPU, an Nvidia GPU, and an AMD GPU. These tests show that parallel processing of fMRI data can lead to significantly faster analysis pipelines. This speedup can be achieved on relatively standard hardware, but further, dramatic speed improvements require only a modest investment in GPU hardware. BROCCOLI (running on a GPU) can perform non-linear spatial normalization to a 1 mm3 brain template in 4–6 s, and run a second level permutation test with 10,000 permutations in about a minute. These non-parametric tests are generally more robust than their parametric counterparts, and can also enable more sophisticated analyses by estimating complicated null distributions. Additionally, BROCCOLI includes support for Bayesian first-level fMRI analysis using a Gibbs sampler. The new software is freely available under GNU GPL3 and can be downloaded from github (https://github.com/wanderine/BROCCOLI/). PMID:24672471

  2. Auto-tuning Dense Vector and Matrix-vector Operations for Fermi GPUs

    DEFF Research Database (Denmark)

    Sørensen, Hans Henrik Brandenborg

    2012-01-01

    In this paper, we consider the automatic performance tuning of dense vector and matrix-vector operations on GPUs. Such operations form the backbone of level 1 and level 2 routines in the Basic Linear Algebra Subroutines (BLAS) library and are therefore of great importance in many scientific...

  3. Open problems in CEM: Porting an explicit time-domain volume-integral- equation solver on GPUs with OpenACC

    KAUST Repository

    Ergül, Özgür

    2014-04-01

    Graphics processing units (GPUs) are gradually becoming mainstream in high-performance computing, as their capability to enhance the performance of a large spectrum of scientific applications manyfold compared to multi-core CPUs has been clearly identified and proven. In this paper, implementation and performance-tuning details for porting an explicit marching-on-in-time (MOT)-based time-domain volume-integral-equation (TDVIE) solver onto GPUs are described in detail. To this end, a high-level approach, utilizing the OpenACC directive-based parallel programming model, is used to minimize two often-faced challenges in GPU programming: developer productivity and code portability. The MOT-TDVIE solver code, originally developed for CPUs, is annotated with compiler directives to port it to GPUs in a fashion similar to how OpenMP targets multi-core CPUs. In contrast to CUDA and OpenCL, where significant modifications to CPU-based codes are required, this high-level approach requires minimal changes to the code. In this work, we make use of two available OpenACC compilers, CAPS and PGI. Our experience reveals that different annotations of the code are required for each of the compilers, due to different interpretations of the fairly new standard by the compiler developers. Both versions of the OpenACC-accelerated code achieved significant performance improvements, with up to 30× speedup over the sequential CPU code using recent hardware technology. Moreover, we demonstrated that the GPU-accelerated fully explicit MOT-TDVIE solver delivered energy-consumption gains of the order of 3× against its CPU counterpart. © 2014 IEEE.

  4. Solving systems of hyperbolic PDEs using multiple GPUs

    OpenAIRE

    Sætra, Martin Lilleeng

    2007-01-01

    This thesis spans several research areas, the main topics being parallel programming based on message-passing, general-purpose computation on graphics processing units (GPGPU), numerical simulations, and domain decomposition. The graphics processing unit (GPU) on modern graphics adapters is an inexpensive source of vast parallel computing power. To harvest this power, general-purpose graphics programming is used. The main agenda of the thesis is to make a case for GPU clusters. Numerica...

  5. Massively parallel Wang-Landau sampling on multiple GPUs

    Science.gov (United States)

    Yin, Junqi; Landau, D. P.

    2012-08-01

    Wang-Landau sampling is implemented on the Graphics Processing Unit (GPU) with the Compute Unified Device Architecture (CUDA). Performance on three different GPU cards, including the new-generation Fermi architecture card, is compared with that on a Central Processing Unit (CPU). The parameters for massively parallel Wang-Landau sampling are tuned in order to achieve fast convergence. For simulations of the water cluster systems, we obtain an average of over 50 times speedup for a given workload.
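Wang-Landau sampling itself is compact enough to sketch on a toy system. The following illustrative Python (not the paper's CUDA code; all parameters are made up for the demonstration) estimates the density of states g(E) of N non-interacting spins, for which the exact answer is the binomial coefficient.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(0)
N = 8                          # non-interacting spins; E = number of up spins
lng = np.zeros(N + 1)          # running estimate of ln g(E)
hist = np.zeros(N + 1)         # visit histogram for the flatness check
lnf = 1.0                      # modification factor, halved after each flat stage
spins = np.zeros(N, dtype=int)
E = 0

for _ in range(100):                       # safety cap on refinement stages
    if lnf <= 1e-3:
        break
    for _ in range(5000):
        k = rng.integers(N)
        E_new = E + (1 - 2 * spins[k])     # a single flip changes E by +/-1
        # Wang-Landau rule: accept with probability min(1, g(E)/g(E_new))
        if np.log(rng.random()) < lng[E] - lng[E_new]:
            spins[k] ^= 1
            E = E_new
        lng[E] += lnf                      # penalize the visited level
        hist[E] += 1
    if hist.min() > 0.8 * hist.mean():     # histogram flat enough: refine f
        lnf *= 0.5
        hist[:] = 0

lng -= lng[0]                  # normalize so that g(E = 0) = 1 exactly
exact = np.array([lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1)
                  for k in range(N + 1)])
```

The massively parallel version in the paper runs many such random walkers concurrently over (sub)windows of the energy range; the per-walker update rule is the same.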

  6. Initial Public Offering

    OpenAIRE

    Veselý, Marek

    2009-01-01

    This thesis describes initial public offerings on stock markets. The basic phases of the process are outlined, and the pros and cons of this source of financing are named. The thesis also recommends other ways to raise capital for a company's business activities, and examines the main conditions for a successful "going public". Initial public offerings of bonds are described as well. The practical part of the thesis concerns IPOs in the Czech Republic -- historical data, IPOs in the past on Pr...

  7. Direct Numerical Simulation and Large Eddy Simulation on a Turbulent Wall-Bounded Flow Using Lattice Boltzmann Method and Multiple GPUs

    Directory of Open Access Journals (Sweden)

    Xian Wang

    2014-01-01

    Full Text Available Direct numerical simulation (DNS) and large eddy simulation (LES) were performed on the wall-bounded flow at Reτ=180 using the lattice Boltzmann method (LBM) and multiple GPUs (Graphic Processing Units). In the DNS, 8 K20M GPUs were adopted. The maximum number of meshes is 6.7×10⁷, which results in a nondimensional mesh size of Δ⁺=1.41 for the whole solution domain. It took 24 hours for the GPU-LBM solver to simulate 3×10⁶ LBM steps. The aspect ratio of the resolution domain was tested to obtain accurate results for the DNS. As a result, both the mean velocity and turbulent variables, such as the Reynolds stress and velocity fluctuations, perfectly agree with the results of Kim et al. (1987) when the aspect ratios in the streamwise and spanwise directions are 8 and 2, respectively. As for the LES, the local grid refinement technique was tested and then used. Using 1.76×10⁶ grids and a Smagorinsky constant Cs=0.13, good results were obtained. The ability and validity of LBM in simulating turbulent flow were verified.

  8. Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs

    KAUST Repository

    Charara, Ali

    2017-03-06

    Batched dense linear algebra kernels are becoming ubiquitous in scientific applications, ranging from tensor contractions in deep learning to data compression in hierarchical low-rank matrix approximation. Within a single API call, these kernels are capable of simultaneously launching up to thousands of similar matrix computations, removing the expensive overhead of multiple API calls while increasing the occupancy of the underlying hardware. A challenge is that for the existing hardware landscape (x86, GPUs, etc.), only a subset of the required batched operations is implemented by the vendors, with limited support for very small problem sizes. We describe the design and performance of a new class of batched triangular dense linear algebra kernels on very small data sizes using single and multiple GPUs. By deploying two-sided recursive formulations, stressing the register usage, maintaining data locality, reducing threads synchronization and fusing successive kernel calls, the new batched kernels outperform existing state-of-the-art implementations.
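The batching idea, launching thousands of small independent problems through one call, can be mimicked on the CPU with NumPy's stacked (batched) linear algebra. This is only an analogy to the GPU kernels described above, not their implementation; the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
batch, n = 1000, 8                          # many very small triangular systems
# well-conditioned lower-triangular factors: strong diagonal, random fill
L = np.tril(rng.standard_normal((batch, n, n)))
L[:, np.arange(n), np.arange(n)] = 2.0 + rng.random((batch, n))
B = rng.standard_normal((batch, n, 3))

# one "batched" call solves all 1000 systems L[i] X[i] = B[i] at once
X = np.linalg.solve(L, B)

# residual check over the whole batch
err = float(np.abs(L @ X - B).max())
```

On a GPU the same grouping amortizes launch overhead and raises occupancy, which is precisely the motivation for the batched triangular kernels in the paper.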

  9. GPUs for statistical data analysis in HEP: a performance study of GooFit on GPUs vs. RooFit on CPUs

    Science.gov (United States)

    Pompili, Alexis; Di Florio, Adriano; CMS Collaboration

    2016-10-01

    In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks, with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B⁺ → J/ψϕK⁺. GooFit is an open data analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on NVIDIA GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performance with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, obtained by comparing concurrent GooFit processes allowed by the CUDA Multi Process Service with a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood-ratio test statistic in different situations, in which the Wilks theorem may apply or may not apply because its regularity conditions are not satisfied.
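The Wilks-theorem behaviour mentioned at the end can be reproduced with a toy Monte Carlo in a regular case (a hypothetical Gaussian-mean model, not the CMS analysis): for nested models whose regularity conditions hold, 2Δln L follows a χ² distribution with one degree of freedom.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(42)
n, toys = 20, 20000
x = rng.standard_normal((toys, n))       # toy datasets generated under the null
# nested models with unit variance: H0 fixes mu = 0, H1 leaves mu free;
# the likelihood ratio statistic reduces to 2*Delta(lnL) = n * xbar^2
lrt = n * x.mean(axis=1) ** 2

# Wilks: in this regular case the statistic should follow chi2 with 1 dof
q95 = 3.841                              # chi2(1) 95% quantile
p_emp = (lrt > q95).mean()               # empirical tail probability
p_chi2 = 1.0 - erf(sqrt(q95 / 2.0))      # chi2(1) survival function at q95
```

When the true value sits on a boundary of the parameter space, as near the kinematical threshold studied here, this χ² correspondence breaks down, which is why the high-statistics GPU toys are needed to map the null distribution empirically.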

  10. VIP Programs Offer More

    Institute of Scientific and Technical Information of China (English)

    Isabel Ding

    2005-01-01

    When choosing a hotel, service standards are a high priority for customers, with the quality of service often reflecting a hotel's standing. While most hotels try to provide the best standard possible to their guests, many also offer special VIP programs that provide value-added services and reward customer loyalty.

  11. Highly Optimized Code Generation for Stencil Codes with Computation Reuse for GPUs

    Institute of Scientific and Technical Information of China (English)

    Wen-Jing Ma; Kan Gao; Guo-Ping Long

    2016-01-01

    Computation reuse is known as an effective optimization technique. However, due to the complexity of modern GPU architectures, there is not yet enough understanding of the intriguing implications that the interplay of computation reuse and hardware specifics has on application performance. In this paper, we propose an automatic code generator for a class of stencil codes with inherent computation reuse on GPUs. For such applications, the proper reuse of intermediate results, combined with careful register and on-chip local memory usage, has profound implications on performance. The current state of the art does not address this problem in depth, partially due to the lack of a good program representation that can expose all potential computation reuse. In this paper, we leverage the computation overlap graph (COG), a simple representation of data dependence and data reuse with an "element view", to expose potential reuse opportunities. Using COG, we propose a portable code generation and tuning framework for GPUs. Compared with current state-of-the-art code generators, our experimental results show up to 56.7% performance improvement on modern GPUs such as the NVIDIA C2050.
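The kind of computation reuse such generators target can be shown on a simple 3×3 mean stencil: the horizontal partial sums are computed once and then shared by all three rows of the vertical pass, cutting redundant loads (illustrative NumPy, not the generated GPU code).

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.random((64, 64))

# naive 3x3 mean stencil on the interior: nine loads per output element
naive = np.zeros_like(a)
for di in (-1, 0, 1):
    for dj in (-1, 0, 1):
        naive[1:-1, 1:-1] += a[1 + di:63 + di, 1 + dj:63 + dj]
naive /= 9.0

# with computation reuse: horizontal partial sums are formed once and
# reused by the vertical pass (3 + 3 additions instead of 8 per point)
row = a[:, :-2] + a[:, 1:-1] + a[:, 2:]
reuse = np.zeros_like(a)
reuse[1:-1, 1:-1] = (row[:-2] + row[1:-1] + row[2:]) / 9.0
```

On a GPU the partial sums would live in registers or on-chip local memory, which is exactly the register/shared-memory trade-off the paper's COG representation is designed to expose.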

  12. MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs

    Institute of Scientific and Technical Information of China (English)

    Yan Li; Yun-Quan Zhang; Yi-Qun Liu; Guo-Ping Long; Hai-Peng Jia

    2013-01-01

    Fourier methods have revolutionized many fields of science and engineering, such as astronomy, medical imaging, seismology and spectroscopy, and the fast Fourier transform (FFT) is a computationally efficient method of generating a Fourier transform. The emerging class of high performance computing architectures, such as GPUs, seeks to achieve much higher performance and efficiency by exposing a hierarchy of distinct memories to software. However, the complexity of GPU programming poses a significant challenge to developers. In this paper, we propose an automatic performance tuning framework for FFT on various OpenCL GPUs, and implement a high performance library named MPFFT based on this framework. For power-of-two length FFTs, our library substantially outperforms the clAmdFft library on AMD GPUs and achieves performance comparable to the CUFFT library on NVIDIA GPUs. Furthermore, our library also supports non-power-of-two sizes. For 3D non-power-of-two FFTs, our library runs 1.5x to 28x faster than FFTW with 4 threads and achieves a 20.01x average speedup over CUFFT 4.0 on Tesla C2050.

  13. Offers for our members

    CERN Multimedia

    Staff Association

    2013-01-01

    The Courir shops propose the following offer: 15% discount on all articles (not on sales) in the Courir shops (Val Thoiry, Annemasse and Neydens) and 5% discount on sales upon presentation of your Staff Association membership card and an identity card before payment. Summer is here, enjoy our offers for the water parks! Walibi : Tickets "Zone terrestre": 21 € instead of 26 €. Access to Aqualibi: 5 € instead of 8 € on presentation of your SA member ticket. Free for children (3-11 years old) before 12 h 00. Free for children under 3, with limited access to the attractions. Car park free. * * * * * Aquaparc : Day ticket: – Children: 30 CHF instead of 39 CHF – Adults : 36 CHF instead of 49 CHF Bonus! Free for children under 5.

  14. Parallel Neutrino Triggers using GPUs for an underwater telescope

    CERN Document Server

    Bouhadef, Bachir; Terreni, Giuseppe

    2014-01-01

    Graphics Processing Units are high-performance co-processors originally intended to improve the use and acceleration of computer graphics applications. Because of their performance, researchers have extended their use beyond the computer graphics scope. We have investigated the possibility of implementing and speeding up online neutrino trigger algorithms in the KM3Net-It experiment using a CPU-GPU system. The results of a neutrino trigger simulation on the NEMO Phase II tower and a KM3-It 14-floor tower are reported.

  15. Offers for our members

    CERN Multimedia

    Staff Association

    2017-01-01

    Summer is coming, enjoy our offers for the water parks! Walibi : Tickets "Zone terrestre": 24 € instead of 30 €. Access to Aqualibi: 5 € instead of 6 € on presentation of your SA member ticket. Free for children under 100 cm. Car park free. * * * * * Aquaparc : Day ticket: – Children: 33 CHF instead of 39 CHF – Adults : 33 CHF instead of 49 CHF Bonus! Free for children under 5.

  16. Offers for our members

    CERN Multimedia

    Staff Association

    2017-01-01

    Summer is here, enjoy our offers for the water parks! Walibi : Tickets "Zone terrestre": 24 € instead of 30 €. Access to Aqualibi: 5 € instead of 6 € on presentation of your SA member ticket. Free for children under 100 cm. Car park free. * * * * * Aquaparc : Day ticket: – Children: 33 CHF instead of 39 CHF – Adults : 33 CHF instead of 49 CHF Bonus! Free for children under 5.

  17. Acceleration of generalized adaptive pulse compression with parallel GPUs

    Science.gov (United States)

    Cai, Jingxiao; Zhang, Yan R.

    2015-05-01

    Supercomputing based on Graphics Processing Units (GPUs) has become a booming field both in research and industry. In this paper, the GPU is applied as the main computing device for traditional RADAR super-resolution algorithms. A comparison is provided between the GPU and the CPU as computing architectures; both MATLAB, as a widely used scientific environment, and a C++ implementation are included to demonstrate the CPU side of the comparison. Fundamental RADAR algorithms such as the matched filter and least square estimation (LSE) are used as standard procedures to measure the efficiency of each implementation. Based on the results in this paper, the GPU shows an enormous potential to expedite traditional RADAR super-resolution applications.
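The two baseline algorithms, matched filtering and least square estimation, can be sketched in a few lines of NumPy (an illustrative 1-D example with made-up signal parameters, not the paper's RADAR code).

```python
import numpy as np

rng = np.random.default_rng(7)
# known pulse (a short chirp-like waveform) received at an unknown delay
pulse = np.sin(0.2 * np.arange(32) ** 1.5)
rx = np.concatenate([np.zeros(100), pulse, np.zeros(100)])
rx = rx + 0.3 * rng.standard_normal(rx.size)      # additive noise

# matched filter: correlate the received signal with the known pulse
mf = np.correlate(rx, pulse, mode="valid")
delay = int(np.argmax(mf))                        # peak location estimates delay

# least square estimation (LSE) of the pulse amplitude at that delay
seg = rx[delay:delay + pulse.size]
coef, *_ = np.linalg.lstsq(pulse[:, None], seg, rcond=None)
amp = float(coef[0])
```

Both steps are dominated by dense correlations and matrix products, which is why they map so naturally onto GPU hardware.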

  18. Offers for our members

    CERN Multimedia

    Staff Association

    2017-01-01

    Summer is here, enjoy our offers for the water parks! Walibi: Tickets "Zone terrestre": 24 € instead of 30 €. Access to Aqualibi: 5 € instead of 6 € on presentation of your ticket purchased at the Staff Association. Bonus! Free for children under 100 cm, with limited access to the attractions. Free car park. *  *  *  *  *  *  *  * Aquaparc: Day ticket: -  Children: 33 CHF instead of 39 CHF -  Adults : 33 CHF instead of 49 CHF Bonus! Free for children under 5 years old.

  19. Offers for our members

    CERN Multimedia

    Staff Association

    2016-01-01

    Summer is here, enjoy our offers for the water parks! Walibi : Tickets "Zone terrestre": 23 € instead of 29 €. Access to Aqualibi: 5 € instead of 6 € on presentation of your SA member ticket. Free for children (3-11 years old) before 12:00 p.m. Free for children under 3, with limited access to the attractions. Car park free. * * * * * Aquaparc : Day ticket: – Children: 33 CHF instead of 39 CHF – Adults : 33 CHF instead of 49 CHF Bonus! Free for children under 5.

  20. Offers for our members

    CERN Multimedia

    Staff Association

    2016-01-01

    Summer is here, enjoy our offers for the water parks! Walibi : Tickets "Zone terrestre": 23 € instead of 29 €. Access to Aqualibi: 5 € instead of 6 € on presentation of your SA member ticket. Free for children (3-11 years old) before 12:00 p.m. Free for children under 3, with limited access to the attractions. Car park free. * * * * * Aquaparc : Day ticket: – Children: 33 CHF instead of 39 CHF – Adults : 33 CHF instead of 49 CHF Bonus! Free for children under 5.

  1. Offers for our members

    CERN Multimedia

    Staff Association

    2015-01-01

    Summer is here, enjoy our offers for the water parks! Walibi : Tickets "Zone terrestre": 21,50 € instead of 27 €. Access to Aqualibi: 5 € instead of 6 € on presentation of your SA member ticket. Free for children (3-11 years old) before 12:00 p.m. Free for children under 3, with limited access to the attractions. Car park free. * * * * * Aquaparc : Day ticket: – Children: 33 CHF instead of 39 CHF – Adults : 33 CHF instead of 49 CHF Bonus! Free for children under 5.

  2. Offers for our members

    CERN Multimedia

    Staff Association

    2013-01-01

    Summer is here, enjoy our offers for the water parks! Walibi : Tickets "Zone terrestre": 21 € instead of 26 €. Access to Aqualibi: 5 € instead of 8 € on presentation of your SA member ticket. Free for children (3-11 years old) before 12 h 00. Free for children under 3, with limited access to the attractions. Car park free. * * * * * Aquaparc : Day ticket: – Children: 30 CHF instead of 39 CHF – Adults : 36 CHF instead of 49 CHF Bonus! Free for children under 5.

  3. Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

    Science.gov (United States)

    Fluke, Christopher J.; Barnes, David G.; Barsdell, Benjamin R.; Hassan, Amr H.

    2011-01-01

    General-purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplifying the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks and make the investment of time and effort to become early adopters of GPGPU in astronomy, stand to reap great benefits.

  4. Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

    CERN Document Server

    Fluke, Christopher J; Barsdell, Benjamin R; Hassan, Amr H

    2010-01-01

    General purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplifying the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best-practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks, and make the investment of time and effort to become early adopters of GPGPU in astronomy, s...

  5. Multi-phase SPH modelling of violent hydrodynamics on GPUs

    Science.gov (United States)

    Mokos, Athanasios; Rogers, Benedict D.; Stansby, Peter K.; Domínguez, José M.

    2015-11-01

    This paper presents the acceleration of multi-phase smoothed particle hydrodynamics (SPH) using a graphics processing unit (GPU) enabling large numbers of particles (10-20 million) to be simulated on just a single GPU card. With novel hardware architectures such as a GPU, the optimum approach to implement a multi-phase scheme presents some new challenges. Many more particles must be included in the calculation and there are very different speeds of sound in each phase with the largest speed of sound determining the time step. This requires efficient computation. To take full advantage of the hardware acceleration provided by a single GPU for a multi-phase simulation, four different algorithms are investigated: conditional statements, binary operators, separate particle lists and an intermediate global function. Runtime results show that the optimum approach needs to employ separate cell and neighbour lists for each phase. The profiler shows that this approach leads to a reduction in both memory transactions and arithmetic operations giving significant runtime gains. The four different algorithms are compared to the efficiency of the optimised single-phase GPU code, DualSPHysics, for 2-D and 3-D simulations which indicate that the multi-phase functionality has a significant computational overhead. A comparison with an optimised CPU code shows a speed up of an order of magnitude over an OpenMP simulation with 8 threads and two orders of magnitude over a single thread simulation. A demonstration of the multi-phase SPH GPU code is provided by a 3-D dam break case impacting an obstacle. This shows better agreement with experimental results than an equivalent single-phase code. The multi-phase GPU code enables a convergence study to be undertaken on a single GPU with a large number of particles that otherwise would have required large high performance computing resources.
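The separate cell and neighbour lists that the profiling identified as optimal can be illustrated with a minimal uniform-grid neighbour search (plain Python, not the DualSPHysics implementation; the cell size is set equal to an assumed smoothing length h, so all interacting pairs lie in adjacent cells).

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(5)
h = 0.1                                   # assumed smoothing length = cell size
pos = rng.random((2000, 2))               # particles in the unit square

# bin every particle into a cell of side h
cells = defaultdict(list)
for idx, p in enumerate(pos):
    cells[(int(p[0] / h), int(p[1] / h))].append(idx)

def neighbours(i):
    """Candidates from the 3x3 block of cells around particle i,
    filtered by the actual distance criterion r < h."""
    cx, cy = int(pos[i, 0] / h), int(pos[i, 1] / h)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in cells.get((cx + dx, cy + dy), []):
                if j != i and ((pos[i] - pos[j]) ** 2).sum() < h * h:
                    found.append(j)
    return found

# the cell list must agree with an all-pairs search for any particle
brute = [j for j in range(len(pos))
         if j != 0 and ((pos[0] - pos[j]) ** 2).sum() < h * h]
```

In the multi-phase GPU code, maintaining one such list per phase reduces both memory transactions and arithmetic, which is the source of the runtime gains reported above.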

  6. Offers for our members

    CERN Multimedia

    Staff Association

    2013-01-01

    The warm weather has arrived: it's time to take advantage of our Walibi and Aquaparc offers! Walibi : Tickets "Zone terrestre": 21 € instead of 26 € Access to Aqualibi: 5 € instead of 8 € on presentation of your SA member ticket. Free for children (3-11 years old) before 12 h 00. Free for children under 3, with limited access to the attractions. Car park free. * * * * * Aquaparc : Half-day ticket (5 hours): – Children: 26 CHF instead of 35 CHF – Adults : 32 CHF instead of 43 CHF Day ticket: – Children: 30 CHF instead of 39 CHF – Adults : 36 CHF instead of 49 CHF Free for children under 5.

  7. Offers for our members

    CERN Multimedia

    Association du personnel

    2013-01-01

    The LCL bank offers members of the Staff Association the following advantages: – a privileged rate scale on mortgage loans; – preferential rates on savings products, notably life insurance; – a preferential rate on consumer loans. In addition, until 30 September 2013, it offers €50 to every new client who is a member of the Staff Association. Summer is here, enjoy our offers for the aquatic parks! Tickets "Zone terrestre": 21 € instead of 26 €. Access to Aqualibi: 5 euros instead of 8 euros on presentation of your SA member ticket. Free for children (3-11 years old) before 12 h 00. Free for children under 3, with limited access to the attractions. Free car park. * * * * * * * Full day ticket: – Children : 30 CHF instead of 39 CHF &...

  8. Using Graphics Processing Units to solve the classical N-body problem in physics and astrophysics

    CERN Document Server

    Spera, Mario

    2014-01-01

    Graphics Processing Units (GPUs) can speed up the numerical solution of various problems in astrophysics, including the dynamical evolution of stellar systems; the performance gain can be more than a factor of 100 compared to using a Central Processing Unit alone. In this work I describe some strategies to speed up the classical N-body problem using GPUs. I show some features of the N-body code HiGPUs as a template code. In this context, I also give some hints on the parallel implementation of a regularization method and introduce the code HiGPUs-R. Although the main application of this work concerns astrophysics, some of the presented techniques are of general validity and can be applied to other branches of physics such as electrodynamics and QCD.
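The computational core that codes like HiGPUs map onto GPU threads is the direct-summation force evaluation. A plain-Python sketch of that O(N²) kernel (with Plummer softening, G = 1 units; an illustration, not the actual HiGPUs code):

```python
import math

# Direct-summation gravitational accelerations with softening eps.
# On a GPU each i-loop iteration would be one thread; here it is plain Python.
def accelerations(pos, mass, eps=1e-3):
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0]**2 + dx[1]**2 + dx[2]**2 + eps**2   # softened distance
            inv_r3 = 1.0 / (math.sqrt(r2) * r2)
            for k in range(3):
                acc[i][k] += mass[j] * dx[k] * inv_r3      # G = 1 units
    return acc
```

Because every pair is evaluated independently, this kernel is embarrassingly parallel, which is exactly why it maps so well onto thousands of GPU cores.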

  9. Performance-Centric Optimization for Racetrack Memory Based Register File on GPUs

    Institute of Scientific and Technical Information of China (English)

    Yun Liang; Shuo Wang

    2016-01-01

    The key to high performance for the GPU architecture lies in its massive threading capability, which drives a large number of cores and enables execution overlapping among threads. In reality, however, the number of threads that can execute simultaneously is often limited by the size of the register file on GPUs. The traditional SRAM-based register file occupies so much chip area that it cannot scale to meet the increasing demands of GPU applications. Racetrack memory (RM) is a promising technology for designing large-capacity register files on GPUs due to its high data storage density. However, without careful deployment of the RM-based register file, the lengthy shift operations of RM may hurt performance. In this paper, we explore RM for designing a high-performance register file for the GPU architecture. High-storage-density RM helps to improve thread-level parallelism (TLP), but if the bits of a register are not aligned to the ports, shift operations are required to move the bits to the access ports before they can be accessed, delaying the read/write operations. We develop an optimization framework for RM-based register files on GPUs, which employs three different optimization techniques at the application, compilation, and architecture levels, respectively. Specifically, we optimize the TLP at the application level, design a register mapping algorithm at the compilation level, and design a preshifting mechanism at the architecture level. Collectively, these optimizations help to determine the TLP without causing cache and register file resource contention, and reduce the shift operation overhead. Experimental results using a variety of representative workloads demonstrate that our optimization framework achieves up to 29% (21% on average) performance improvement.
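The benefit of a shift-aware register mapping can be illustrated with a toy cost model. This is our own illustration, not the paper's formulation: registers sit at positions along a racetrack, and accessing one costs the number of shifts needed to bring it under the nearest access port, so mapping a hot register onto a port position reduces total shifts.

```python
# Toy shift-cost model for a racetrack-memory register file (hypothetical
# positions and ports, for illustration only): each access to a register
# costs the distance from its track position to the nearest access port.
def shift_cost(access_trace, placement, ports):
    """placement: register name -> track position; ports: port positions."""
    return sum(min(abs(placement[reg] - p) for p in ports)
               for reg in access_trace)

trace = ["r0", "r1", "r0", "r2", "r0"]        # access trace: r0 is hot
bad   = {"r0": 7, "r1": 0, "r2": 1}           # hot register far from any port
good  = {"r0": 0, "r1": 7, "r2": 1}           # mapping puts r0 on a port
cost_bad  = shift_cost(trace, bad,  ports=[0, 4])
cost_good = shift_cost(trace, good, ports=[0, 4])
```

A compile-time register mapping algorithm, like the one the paper proposes, is essentially searching for a `placement` that minimizes this kind of cost over the program's expected access trace.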

  10. Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression

    KAUST Repository

    Halim Boukaram, Wajih

    2017-09-14

    We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.
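The one-sided Jacobi building block can be sketched on a single small matrix; the batched GPU version runs many such independent problems in parallel, one per thread block. This is a pure-Python illustration of the algorithm, not the KAUST kernels: columns are repeatedly orthogonalized by plane rotations, and the singular values emerge as the final column norms.

```python
import math

# One-sided Jacobi: rotate column pairs (p, q) until all columns are
# mutually orthogonal; the singular values are then the column norms.
def jacobi_svd_singular_values(A, sweeps=30, tol=1e-12):
    m, n = len(A), len(A[0])
    A = [row[:] for row in A]                    # work on a copy
    for _ in range(sweeps):
        off = 0.0
        for p in range(n - 1):
            for q in range(p + 1, n):
                app = sum(A[i][p] * A[i][p] for i in range(m))
                aqq = sum(A[i][q] * A[i][q] for i in range(m))
                apq = sum(A[i][p] * A[i][q] for i in range(m))
                off = max(off, abs(apq))
                if abs(apq) < tol:
                    continue
                # Rotation angle that zeroes the (p, q) cross term.
                tau = (aqq - app) / (2.0 * apq)
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1 + tau * tau))
                c = 1.0 / math.sqrt(1 + t * t)
                s = c * t
                for i in range(m):
                    aip, aiq = A[i][p], A[i][q]
                    A[i][p] = c * aip - s * aiq
                    A[i][q] = s * aip + c * aiq
        if off < tol:                            # converged: columns orthogonal
            break
    return sorted((math.sqrt(sum(A[i][j] ** 2 for i in range(m)))
                   for j in range(n)), reverse=True)
```

Each rotation touches only two columns, which is the "inherent parallelism" the abstract refers to: independent column pairs can be rotated concurrently by different threads.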

  11. Accelerating Multiple Compound Comparison Using LINGO-Based Load-Balancing Strategies on Multi-GPUs

    Directory of Open Access Journals (Sweden)

    Chun-Yuan Lin

    2015-01-01

    Compound comparison is an important task in computational chemistry. From the comparison results, potential inhibitors can be found and then used for pharmacy experiments. The time complexity of a pairwise compound comparison is O(n²), where n is the maximal length of compounds. In general, the length of compounds is tens to hundreds, and the computation time is small. However, more and more compounds have been synthesized and extracted, now numbering more than tens of millions. Therefore, it is still time-consuming to compare a large number of compounds (seen as a multiple compound comparison problem, abbreviated as MCC). The intrinsic time complexity of the MCC problem is O(k²n²) with k compounds of maximal length n. In this paper, we propose a GPU-based algorithm for the MCC problem, called CUDA-MCC, for single and multiple GPUs. Four LINGO-based load-balancing strategies are considered in CUDA-MCC in order to accelerate the computation speed among thread blocks on GPUs. CUDA-MCC was implemented in C+OpenMP+CUDA. In our experiments, CUDA-MCC ran 45 and 391 times faster than its CPU version on a single NVIDIA Tesla K20m GPU card and a dual-NVIDIA Tesla K20m GPU card, respectively.
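One load-balancing idea in the spirit of the strategies above (a hypothetical sketch, not the paper's actual LINGO scheme) is to assign pairwise comparison tasks to workers so that the estimated cost per worker is even, estimating each pair's cost as the product of the two compound lengths, consistent with the O(n²) pairwise complexity.

```python
import heapq

# Greedy longest-task-first load balancing: repeatedly give the most
# expensive remaining pair to the least-loaded worker (thread block or GPU).
def balance_pairs(pair_lengths, n_workers):
    heap = [(0, w) for w in range(n_workers)]        # (load, worker id)
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    for a, b in sorted(pair_lengths, key=lambda p: -p[0] * p[1]):
        load, w = heapq.heappop(heap)
        assignment[w].append((a, b))
        heapq.heappush(heap, (load + a * b, w))      # pair cost ~ a * b
    return assignment, {w: load for load, w in heap}
```

Greedy longest-first is a classic heuristic for this makespan problem; on GPUs an even split matters because the slowest thread block determines when the kernel finishes.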

  12. Accelerating Multiple Compound Comparison Using LINGO-Based Load-Balancing Strategies on Multi-GPUs.

    Science.gov (United States)

    Lin, Chun-Yuan; Wang, Chung-Hung; Hung, Che-Lun; Lin, Yu-Shiang

    2015-01-01

    Compound comparison is an important task in computational chemistry. From the comparison results, potential inhibitors can be found and then used for pharmacy experiments. The time complexity of a pairwise compound comparison is O(n²), where n is the maximal length of compounds. In general, the length of compounds is tens to hundreds, and the computation time is small. However, more and more compounds have been synthesized and extracted, now numbering more than tens of millions. Therefore, it is still time-consuming to compare a large number of compounds (seen as a multiple compound comparison problem, abbreviated as MCC). The intrinsic time complexity of the MCC problem is O(k²n²) with k compounds of maximal length n. In this paper, we propose a GPU-based algorithm for the MCC problem, called CUDA-MCC, for single and multiple GPUs. Four LINGO-based load-balancing strategies are considered in CUDA-MCC in order to accelerate the computation speed among thread blocks on GPUs. CUDA-MCC was implemented in C+OpenMP+CUDA. In our experiments, CUDA-MCC ran 45 and 391 times faster than its CPU version on a single NVIDIA Tesla K20m GPU card and a dual-NVIDIA Tesla K20m GPU card, respectively.

  13. Parallelization and Performance of the NIM Weather Model Running on GPUs

    Science.gov (United States)

    Govett, Mark; Middlecoff, Jacques; Henderson, Tom; Rosinski, James

    2014-05-01

    The Non-hydrostatic Icosahedral Model (NIM) is a global weather prediction model being developed to run on the GPU and MIC fine-grain architectures. The model dynamics, written in Fortran, was initially parallelized for GPUs in 2009 using the F2C-ACC compiler and demonstrated good results running on a single GPU. Subsequent efforts have focused on (1) running efficiently on multiple GPUs, (2) parallelization of NIM for the Intel MIC using OpenMP, (3) assessing the commercial Fortran GPU compilers now available from Cray, PGI and CAPS, (4) keeping the model up to date with the latest scientific developments while maintaining a single-source, performance-portable code, and (5) parallelization of two physics packages used in the NIM: the physics of the Global Forecast System (GFS), used operationally, and the widely used Weather Research and Forecasting (WRF) model physics. The presentation will touch on each of these efforts, but highlight improvements in the parallel performance of the NIM running on the Titan GPU cluster at ORNL, the ongoing parallelization of model physics, and a recent evaluation of commercial GPU compilers using the F2C-ACC compiler as the baseline.

  14. Statistical significance estimation of a signal within the GooFit framework on GPUs

    Science.gov (United States)

    Cristella, Leonardo; Di Florio, Adriano; Pompili, Alexis

    2017-03-01

    In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open data analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on NVIDIA GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-ups with respect to the RooFit application parallelized on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by the CUDA Multi Process Service with a RooFit/PROOF-Lite process using multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood-ratio test statistic in different situations in which Wilks' theorem may or may not apply because its regularity conditions are not satisfied.
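The toy Monte Carlo idea can be sketched generically. This is an illustration of the method's structure, not the GooFit likelihood fit: generate the test statistic under the background-only hypothesis many times and count how often it exceeds the observed value. As a stand-in null distribution we use χ²(1 dof), i.e. q = z² with z ~ N(0,1), which is the asymptotic form when Wilks' theorem applies; the whole point of toys is to check the distribution when the regularity conditions fail.

```python
import random

# Toy-MC local p-value: fraction of background-only pseudo-experiments whose
# test statistic exceeds the observed one. Each toy would normally involve a
# full fit; here a chi2(1)-distributed stand-in keeps the sketch runnable.
def toy_mc_p_value(observed_q, n_toys, rng):
    exceed = sum(rng.gauss(0.0, 1.0) ** 2 >= observed_q for _ in range(n_toys))
    return exceed / n_toys

rng = random.Random(42)
p = toy_mc_p_value(observed_q=4.0, n_toys=200_000, rng=rng)
# Asymptotically, P(chi2_1 >= 4) = 2 * (1 - Phi(2)) ~ 0.0455.
```

The GPU's role in the real analysis is simply that each toy requires a maximum-likelihood fit, and millions of fits become affordable only when they run in parallel.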

  15. Affordable Wide-field Optical Space Surveillance using sCMOS and GPUs

    Science.gov (United States)

    Zimmer, P.; McGraw, J.; Ackermann, M.

    2016-09-01

    Recent improvements in sCMOS technology allow for affordable, wide-field, and rapid cadence surveillance from LEO to out past GEO using largely off-the-shelf hardware. sCMOS sensors, until very recently, suffered from several shortcomings when compared to CCD sensors - lower sensitivity, smaller physical size and less predictable noise characteristics. Sensors that overcome the first two of these are now available commercially and the principals at J.T. McGraw and Associates (JTMA) have developed observing strategies that minimize the impact of the third, while leveraging the key features of sCMOS, fast readout and low average readout noise. JTMA has integrated a new generation sCMOS sensor into an existing COTS telescope system in order to develop and test new detection techniques designed for uncued optical surveillance across a wide range of apparent object angular rates - from degree per second scale of LEO objects to a few arcseconds per second for objects out past GEO. One further complication arises from this: increased useful frame rate means increased data volume. Fortunately, GPU technology continues to advance at a breakneck pace and we report on the results and performance of our new detection techniques implemented on new generation GPUs. Early results show significance within 20% of the expected theoretical limiting signal-to-noise using commodity GPUs in near real time across a wide range of object parameters, closing the gap in detectivity between moving objects and tracked objects.

  16. Physics based optimization of Particle-in-Cell simulations on GPUs

    Science.gov (United States)

    Abbott, Stephen; D'Azevedo, Ed

    2016-10-01

    We present progress in improving the performance of the gyrokinetic particle-in-cell (PIC) code XGC-1 on NVIDIA GPUs, as well as enhancements made to portability and developer productivity using OpenACC directives. Increasingly simulation codes are required to use heterogeneous accelerator resources on the most powerful supercomputing systems. PIC methods are well suited to these massively parallel accelerator architectures, as particles can largely be advanced independently within a time-step. Their advance must still, however, reference field data on underlying grid structures, which presents a significant performance bottleneck. Even ported to GPUs using CUDA Fortran, the XGC-1 electron push routine accounts for a significant portion of the code execution time. By applying physical insight to the motion of electrons across the device (and therefore field grids) we have developed techniques that increase performance of this kernel by up to 5X, compared to the original CUDA Fortran implementation. Architecture specific optimizations can be isolated in small `leaf' routines, which allows for a portable OpenACC implementation that performs nearly as well as the optimized CUDA.
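One generic locality idea relevant to the field-gather bottleneck described above can be sketched as follows. This is an illustration, not XGC-1's actual scheme: keeping particles ordered by the grid cell they occupy means that threads executing together read neighbouring field data, so the gather step issues fewer scattered memory transactions.

```python
# Sort particles by the 2-D grid cell they occupy so that consecutive
# particles (and hence consecutive GPU threads) reference nearby field data.
def cell_index(p, cell_size, nx):
    ix, iy = int(p[0] // cell_size), int(p[1] // cell_size)
    return iy * nx + ix

def sort_particles_by_cell(particles, cell_size, nx):
    return sorted(particles, key=lambda p: cell_index(p, cell_size, nx))
```

In a production PIC code the sort is done periodically (particles drift slowly between cells), trading an occasional sort cost for coalesced field reads in every push.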

  17. Strong scaling of general-purpose molecular dynamics simulations on GPUs

    CERN Document Server

    Glaser, Jens; Anderson, Joshua A; Lui, Pak; Spiga, Filippo; Millan, Jaime A; Morse, David C; Glotzer, Sharon C

    2014-01-01

    We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, arXiv:1308.5587). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, J. Comp. Phys. 117, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson et al., J. Comp. Phys. 227, 2008). The software supports short-ranged pair force and bond force fields and achieves optimal GPU performance using an autotuning algorithm. We are able to demonstrate equivalent or superior scaling on up to 3,375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD) simulations of up to 108 million particles. GPUDirect RDMA capabilities in recent GPU generations provide better performance in full double precision calculations. For a representative polymer physics application, HOOMD-blue 1.0 provides an effective GPU vs. CPU node speed-up of 12.5x.
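The domain-decomposition idea can be sketched in one dimension. This toy (illustration only, no actual MPI, and not HOOMD-blue's scheme) assigns each rank a slab of the box; after a time step, any particle whose coordinate has left its rank's slab must migrate to the neighbouring owner.

```python
# Toy 1-D slab decomposition in the spirit of MPI-parallel MD codes:
# rank r owns coordinates in [r * slab, (r + 1) * slab).
def owner_rank(x, box_length, n_ranks):
    slab = box_length / n_ranks
    return min(int(x / slab), n_ranks - 1)   # clamp x == box_length

def migrants(xs, my_rank, box_length, n_ranks):
    """Indices of local particles that now belong to another rank."""
    return [i for i, x in enumerate(xs)
            if owner_rank(x, box_length, n_ranks) != my_rank]
```

The GPU-specific difficulty the paper addresses is doing the equivalent selection and packing of migrating particles on the device, so that only the migrant buffers cross the network (ideally via GPUDirect RDMA).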

  18. GPU-S2S: a source-to-source compiler for GPU

    Institute of Scientific and Technical Information of China (English)

    李丹; 曹海军; 董小社; 张保

    2012-01-01

    To address the poor software portability and programmability of graphics processing units (GPUs), and to facilitate the development of parallel programs on GPUs, this study proposes a novel directive-based, compiler-guided approach. GPU-S2S, a prototype tool for automatic source-to-source translation, was implemented by combining automatic mapping with static compilation configuration; it translates a sequential C program annotated with directives into a compute unified device architecture (CUDA) program. The experimental results show that CUDA code generated by GPU-S2S achieves performance comparable to the CUDA benchmarks provided in the NVIDIA CUDA SDK, and delivers significant performance improvements over the original sequential C code executed on the CPU.

  19. GPU-FS-kNN: a software tool for fast and scalable kNN computation using GPUs.

    Directory of Open Access Journals (Sweden)

    Ahmed Shamsul Arefin

    BACKGROUND: The analysis of biological networks has become a major challenge due to the recent development of high-throughput techniques that are rapidly producing very large data sets. The exploding volumes of biological data demand extreme computational power and special computing facilities (i.e., supercomputers). An inexpensive solution, such as general-purpose computation based on graphics processing units (GPGPU), can be adapted to tackle this challenge, but the limitation of the device's internal memory can pose a new problem of scalability. Efficient data and computational parallelism with partitioning is required to provide a fast and scalable solution to this problem. RESULTS: We propose an efficient parallel formulation of the k-Nearest Neighbour (kNN) search problem, which is a popular method for classifying objects in several fields of research, such as pattern recognition, machine learning and bioinformatics. Although very simple and straightforward, the kNN search degrades dramatically in performance for large data sets, since the task is computationally intensive. The proposed approach is not only fast but also scalable to large-scale instances. Based on our approach, we implemented a software tool, GPU-FS-kNN (GPU-based Fast and Scalable k-Nearest Neighbour), for CUDA-enabled GPUs. The basic approach is simple and adaptable to other available GPU architectures. We observed speed-ups of 50-60 times compared with a CPU implementation on a well-known breast microarray study and its associated data sets. CONCLUSION: Our GPU-based Fast and Scalable k-Nearest Neighbour search technique (GPU-FS-kNN) provides a significant performance improvement for nearest-neighbour computation in large-scale networks. Source code and the software tool are available under the GNU Public License (GPL) at https://sourceforge.net/p/gpufsknn/.
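The partitioning idea that lets a kNN search scale past device memory can be sketched on the CPU. This is a minimal chunked brute-force kNN illustrating the structure (process queries in fixed-size chunks, each of which would fit in GPU memory), not the GPU-FS-kNN implementation:

```python
# Chunked brute-force kNN: squared Euclidean distances, queries processed in
# fixed-size chunks so each chunk's distance matrix fits in limited memory.
def knn_chunked(points, queries, k, chunk=256):
    out = []
    for start in range(0, len(queries), chunk):
        for q in queries[start:start + chunk]:
            order = sorted(range(len(points)),
                           key=lambda i: sum((points[i][j] - q[j]) ** 2
                                             for j in range(len(q))))
            out.append(order[:k])
    return out
```

On a GPU, each chunk becomes one kernel launch over a tile of the distance matrix; the chunk size plays the role of the partition size that keeps the working set inside device memory.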

  20. Visualizing 3D/4D Environmental Big Data Using Many-core Compute Unified Device Architecture (CUDA) and Multi-core Central Processing Unit (CPUs)

    Science.gov (United States)

    Li, J.; Jiang, Y.; Yang, C.; Huang, Q.

    2012-12-01

    Visualizing 3D/4D environmental Big Data is critical to understanding and predicting environmental phenomena for relevant decision making. This research explores how best to utilize Graphics Processing Units (GPUs) and Central Processing Units (CPUs) collaboratively to speed up the visualization process. Taking the visualization of dust storms as an example, we developed a systematic visualization framework. To compare the potential speedup of using GPUs versus CPUs, we implemented visualization components based on both multi-core CPUs and many-core GPUs. We found that 1) multi-core CPUs and many-core GPUs can improve the efficiency of mathematical calculations and graphics rendering using multithreading techniques; 2) when increasing the GPU block size for reprojecting, interpolating and rendering the same data, the execution time drops consistently before bottoming out; and 3) GPU-based implementations are faster than CPU-based implementations, although the best rendering performance with GPUs is very close to that with CPUs. Therefore, visualizing 3D/4D environmental data with GPUs is a better solution than with CPUs.

  1. A Performance Comparison of Different Graphics Processing Units Running Direct N-Body Simulations

    CERN Document Server

    Capuzzo-Dolcetta, Roberto

    2013-01-01

    Hybrid computational architectures based on the joint power of Central Processing Units (CPUs) and Graphics Processing Units (GPUs) are becoming popular and powerful hardware tools for a wide range of simulations in biology, chemistry, engineering, physics, etc. In this paper we present a comparison of the performance of various GPUs available on the market when applied to the numerical integration of the classic, gravitational N-body problem. To do this, we developed an OpenCL version of the parallel code (HiGPUs) to use for these tests, because this is the only version able to run on GPUs from different vendors. The main general result is that we confirm the reliability, speed and low cost of GPUs when applied to the examined kind of problems (i.e. when the forces to evaluate depend on the mutual distances, as happens in gravitational physics and molecular dynamics). More specifically, we find that even the inexpensive GPUs built for gaming applications are very performant in terms of computing speed...

  2. Monte Carlo MP2 on Many Graphical Processing Units.

    Science.gov (United States)

    Doran, Alexander E; Hirata, So

    2016-10-11

    In the Monte Carlo second-order many-body perturbation (MC-MP2) method, the long sum-of-product matrix expression of the MP2 energy, whose literal evaluation may be poorly scalable, is recast into a single high-dimensional integral of functions of electron pair coordinates, which is evaluated by the scalable method of Monte Carlo integration. The sampling efficiency is further accelerated by the redundant-walker algorithm, which allows a maximal reuse of electron pairs. Here, a multitude of graphical processing units (GPUs) offers a uniquely ideal platform to expose multilevel parallelism: fine-grain data-parallelism for the redundant-walker algorithm in which millions of threads compute and share orbital amplitudes on each GPU; coarse-grain instruction-parallelism for near-independent Monte Carlo integrations on many GPUs with few and infrequent interprocessor communications. While the efficiency boost by the redundant-walker algorithm on central processing units (CPUs) grows linearly with the number of electron pairs and tends to saturate when the latter exceeds the number of orbitals, on a GPU it grows quadratically before it increases linearly and then eventually saturates at a much larger number of pairs. This is because the orbital constructions are nearly perfectly parallelized on a GPU and thus completed in a near-constant time regardless of the number of pairs. In consequence, an MC-MP2/cc-pVDZ calculation of a benzene dimer is 2700 times faster on 256 GPUs (using 2048 electron pairs) than on two CPUs, each with 8 cores (which can use only up to 256 pairs effectively). We also numerically determine that the cost to achieve a given relative statistical uncertainty in an MC-MP2 energy increases as O(n³) or better with system size n, which may be compared with the O(n⁵) scaling of the conventional implementation of deterministic MP2. We thus establish the scalability of MC-MP2 with both system and computer sizes.
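The core trade the method relies on can be shown with a generic Monte Carlo integration sketch (an illustration of the principle, not the MC-MP2 integrand): the statistical error falls as 1/√N regardless of the dimension of the integral, and the samples are independent, so the work parallelizes trivially across GPUs.

```python
import math
import random

# Plain Monte Carlo estimate of an integral over the unit hypercube, with
# the usual 1/sqrt(N) standard-error estimate from the sample variance.
def mc_integrate(f, dim, n_samples, rng):
    total = sq = 0.0
    for _ in range(n_samples):
        x = [rng.random() for _ in range(dim)]
        v = f(x)
        total += v
        sq += v * v
    mean = total / n_samples
    err = math.sqrt(max(sq / n_samples - mean * mean, 0.0) / n_samples)
    return mean, err

rng = random.Random(0)
# Integral of (x1 + ... + x6) over the 6-D unit cube; the exact value is 3.
est, err = mc_integrate(lambda x: sum(x), dim=6, n_samples=50_000, rng=rng)
```

Halving the statistical uncertainty requires four times the samples, which is exactly why near-independent integrations on many GPUs, with only infrequent communication, scale so well.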

  3. Offers

    CERN Multimedia

    Staff Association

    2014-01-01

    To our members: 5% discount on Fnac vouchers. Vouchers of 50.-, 100.- and 200.- CHF, valid in the 4 shops in Switzerland without restriction on purchases. On sale at the Staff Association Secretariat.

  4. Offers

    CERN Multimedia

    Staff Association

    2014-01-01

    12% discount on football camps and courses for children from 3 to 13 years old, with bilingual coaches.   Now also courses during the autumn holidays! In order to get the discount you need to register online, then send an email to info@intersoccer.ch with a scan of your membership card to receive a refund of the discount.

  5. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    Discover the finest restaurants in French-speaking Switzerland and neighbouring France, with the following discounts on every meal for one year: 50% for 2 people / 40% for 3 people / 30% for 4 people / 20% for 5 to 6 people. How does it work? Choose among the 110 restaurants in your region and book a table for 2, 3, 4, 5 or 6 people. Present your Passeport Gourmand on arrival. Enjoy your meal and benefit from an exceptional discount on your bill (drinks, menu of the day and business lunches excluded). What are your advantages? Benefit from the preferential price for members of the CERN Staff Association: – Passeport Gourmand Geneva: CHF 75.- (instead of CHF 95.-) – Passeport Gourmand Ain/Savoie/Haute-Savoie: CHF 59.- (instead of...

  6. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    Discover the finest restaurants in French-speaking Switzerland and neighbouring France, with the following discounts on every meal for one year: 50% for 2 people, 40% for 3 people, 30% for 4 people, 20% for 5 to 6 people. How does it work? Choose among the 110 restaurants in your region and book a table for 2, 3, 4, 5 or 6 people. Present your Passeport Gourmand on arrival. Enjoy your meal and benefit from an exceptional discount on your bill (drinks, menu of the day and business lunches excluded). What are your advantages? Benefit from the preferential price for members of the CERN Staff Association: – Passeport Gourmand Geneva: CHF 75.- (instead of CHF 95.-) – Passeport Gourmand Ain/Savoie/Haute-Savoie: CHF 59.- (instead of CH...

  7. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    Scoop Music Tour concert at the Walibi park! Friday 12 July. You will find the presentation of the event, and videos of the expected artists with the tracks currently lighting up the radio, on the website http://www.walibi.com/rhone-alpes/fr-fr/evenements/scoop-music-tour. The concert is free and starts at park closing time with a surprise opening act. So enjoy a fine day at the park and finish in style with the concert of the summer!

  8. Offers

    CERN Multimedia

    Association du personnel

    2010-01-01

    LA BÂTIE-FESTIVAL DE GENÈVE. Offer for members of the CERN Staff Association. Presentation of La Bâtie-Festival de Genève: a multidisciplinary contemporary festival, often described as a "tête chercheuse" (trend-spotter), La Bâtie-Festival de Genève offers two weeks to discover more than 40 shows by emblematic artists from here and elsewhere, for adults as well as children (mini-Bâtie). Dance, theatre, music: from 3 to 18 September 2010 we will welcome nearly 300 artists in some twenty venues in Geneva and neighbouring France (Annemasse and Divonne). La Bâtie also offers two places to meet and mingle: Le Tampopo, our restaurant-...

  9. Offers

    CERN Multimedia

    Staff association

    2014-01-01

    Fancy an evening at the theatre? Don't hesitate to take advantage of our offers for our members! Théâtre de Carouge: 5 CHF off all shows (30 CHF instead of 35 CHF). The Théâtre de Carouge presents its new play: La Double Inconstance, from Friday 21 March to Sunday 6 April 2014. By Marivaux, directed by Philippe Mentha. Audio description on Tuesday 1 April and Saturday 5 April 2014. A sweet mixture of revolt and seduction, of cunning and fate, reigns in this Double Inconstance by Marivaux staged by Philippe Mentha, founding member of the Théâtre de Carouge and director for more than thirty years of the Théâtre Kléber-Méleau. ...

  10. Offers

    CERN Multimedia

    Staff Association

    2014-01-01

    Benefit from the special rate of 35 CHF/person (+ 1 accompanying guest) at the Théâtre de Carouge as a member of the Staff Association. Send your booking by email to smills@tcag.ch from your professional email address, indicating the date of your booking, your surname, first name and telephone number. A booking confirmation will be returned to you by email. You will be asked to present your membership card when collecting the tickets. By Molière, directed by Jean Liermier. Argan, a widower remarried to Béline, who is only waiting for her husband's death to inherit, multiplies bloodlettings, purges and other ingested remedies. Angélique, his daughter, ...

  11. Offers

    CERN Multimedia

    Staff Association

    2013-01-01

    Tickets "Zone terrestre": 21.50 € instead of 27 €. Access to Aqualibi: 5 € instead of 8 € on presentation of your SA member ticket. Free for children (3-11 years old) before 12 h 00. Free for children under 3, with limited access to the attractions. Car park free.

  12. Offer

    CERN Multimedia

    Staff Association

    2015-01-01

    The park opens its doors on Saturday 4 April 2015! The Egg Hunt, from 4 to 26 April: in addition to its 25 attractions and shows, the park will invite children aged 3 to 12 to take up the challenge of an egg hunt in a recreated Easter garden! Lots of little eggs to find within a time limit, surrounded by rabbits, hens, flowers and other giant eggs, before leaving with chocolate treats from Revillon Chocolatier. Take advantage of our special offer for our members: single Adult/Child rate, "Zone terrestre" entry: 21.50 euros instead of 27 euros. Access to Aqualibi: 5 euros instead of 8 euros on presentation of the AP-member-rate entry ticket. Free entry for children under 3, with limited access to the attractions. ...

  13. Offers

    CERN Multimedia

    Staff Association

    2015-01-01

    Tickets "Zone terrestre": 21 € instead of 27 €. Access to Aqualibi: 5 € instead of 8 € on presentation of your SA member ticket. Free for children (3-11 years old) before 12 h 00. Free for children under 3, with limited access to the attractions. Car park free.

  14. OFFERS

    CERN Multimedia

    Staff Association

    2014-01-01

    New partner: Joy's Club. Come and enjoy discounts at the Joy's Club / Minigolf in Divonne-les-Bains as a member of the Association! On presentation of your membership card, you get an immediate discount: – Adult round: 6 euros instead of 7 euros – Child round: 4 euros instead of 5 euros – Mini Park: 6 euros instead of 7 euros. For more information, do not hesitate to ask at the Association Secretariat or consult our website: http://staff-association.web.cern.ch/fr/socioculturel/offres

  15. Offers

    CERN Multimedia

    Staff Association

    2012-01-01

    If this offer interests you, please send an email to mh.boulanger@comedie.ch with the details of your booking, from your professional email address. Tickets are collected at the box office on presentation of your Staff Association membership card. For any subscription or discount-card order by post or internet, tick the group rate, indicating the name of the organisation and enclosing proof of membership. For any information, do not hesitate to contact Marie-Hélène Boulanger: – Tel.: 022 809 60 86 – email: mh.boulanger@comedie.ch

  16. Biomolecular Electrostatics Simulation by an FMM-based BEM on 512 GPUs

    CERN Document Server

    Yokota, Rio; Bardhan, Jaydeep P; Knepley, Matthew G; Barba, L A

    2010-01-01

    We present simulations of biomolecular electrostatics at a scale not reached before, thanks to both algorithmic and hardware acceleration. The algorithmic acceleration is achieved with the fast multipole method (FMM) in conjunction with a boundary element method (BEM) formulation of the continuum electrostatic model. The hardware acceleration is achieved through graphics processors (GPUs). We demonstrate the power of our algorithms and software for the calculation of the electrostatic interactions between biological molecules in solution. Computational experiments are presented simulating the electrostatics of protein-drug binding and several multi-million atom systems consisting of hundreds to thousands of copies of the problem; for the largest case, which models over 20 million atoms and has more than six billion unknowns, one iteration step requires only a few minutes on 512 GPU nodes. We achieved a sustained performance of 34.6 TFlops for the entire BEM calculation. We are currently adapting our solver to model the linearized ...

  17. Real-time Vision using FPGAs, GPUs and Multi-core CPUs

    DEFF Research Database (Denmark)

    Kjær-Nielsen, Anders

    the introduction and evolution of a wide variety of powerful hardware architectures have made the developed theory more applicable in performance demanding and real-time applications. Three different architectures have dominated the field due to their parallel capabilities that are often desired when dealing...... processors in the vision community. The introduction of programming languages like CUDA from NVIDIA has made it easier to utilize the high parallel processing powers of the GPU for general purpose computing and thereby realistic to use based on the effort involved with development. The increased clock......-linear filtering processes on FPGAs that has been used for preprocessing images in the context of a bigger Early Cognitive Vision (ECV) system. With the introduction of GPUs for general purpose computing the preprocessing was re-implemented on this architecture and used together with a multi-core CPU to form...

  18. Fully 3D list-mode time-of-flight PET image reconstruction on GPUs using CUDA.

    Science.gov (United States)

    Cui, Jing-Yu; Pratx, Guillem; Prevrhal, Sven; Levin, Craig S

    2011-12-01

    List-mode processing is an efficient way of dealing with the sparse nature of positron emission tomography (PET) data sets and is the processing method of choice for time-of-flight (ToF) PET image reconstruction. However, the massive amount of computation involved in forward projection and backprojection limits the application of list-mode reconstruction in practice, and makes it challenging to incorporate accurate system modeling. The authors present a novel formulation for computing line projection operations on graphics processing units (GPUs) using the compute unified device architecture (CUDA) framework, and apply the formulation to list-mode ordered-subsets expectation maximization (OSEM) image reconstruction. Our method overcomes well-known GPU challenges such as divergence of compute threads, limited bandwidth of global memory, and limited size of shared memory, while exploiting GPU capabilities such as fast access to shared memory and efficient linear interpolation of texture memory. Execution time comparison and image quality analysis of the GPU-CUDA method and the central processing unit (CPU) method are performed on several data sets acquired on a preclinical scanner and a clinical ToF scanner. When applied to line projection operations for non-ToF list-mode PET, this new GPU-CUDA method is >200 times faster than a single-threaded reference CPU implementation. For ToF reconstruction, we exploit a ToF-specific optimization to improve the efficiency of our parallel processing method, resulting in GPU reconstruction >300 times faster than the CPU counterpart. For a typical whole-body scan with 75 × 75 × 26 image matrix, 40.7 million LORs, 33 subsets, and 3 iterations, the overall processing time is 7.7 s for GPU and 42 min for a single-threaded CPU. Image quality and accuracy are preserved for multiple imaging configurations and reconstruction parameters, with normalized root mean squared (RMS) deviation less than 1% between CPU and GPU
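
    To illustrate the line projection operation at the heart of list-mode reconstruction, here is a deliberately simplified CPU sketch in NumPy. It uses nearest-neighbor sampling along each line of response (LOR); the paper's CUDA implementation instead uses texture-memory interpolation and careful thread organization, none of which is reproduced here:

```python
import numpy as np

def forward_project(image, p0, p1, n_samples=200):
    """Approximate the line integral of `image` along the LOR from p0 to p1.

    Simplified sketch: sample the image at uniform points along the line
    (nearest-neighbor lookup) and scale by the step length.
    """
    image = np.asarray(image, dtype=float)
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    t = (np.arange(n_samples) + 0.5) / n_samples     # sample midpoints in (0, 1)
    pts = p0 + t[:, None] * (p1 - p0)                # points along the LOR
    ij = np.clip(np.round(pts).astype(int), 0, np.array(image.shape) - 1)
    step = np.linalg.norm(p1 - p0) / n_samples       # length per sample
    return float(image[ij[:, 0], ij[:, 1]].sum() * step)
```

    Backprojection is the adjoint operation: the same samples scatter a value into the image instead of gathering from it.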

  19. Heterogeneous Multicore Parallel Programming for Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Francois Bodin

    2009-01-01

    Hybrid parallel multicore architectures based on graphics processing units (GPUs) can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware display a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows the integration of heterogeneous hardware accelerators in an unintrusive manner while preserving the legacy code.

  20. Iterative Methods for MPC on Graphical Processing Units

    DEFF Research Database (Denmark)

    2012-01-01

    The high floating-point performance and memory bandwidth of Graphical Processing Units (GPUs) make them ideal for a large number of computations which often arise in scientific computing, such as matrix operations. GPUs achieve this performance by utilizing massive parallelism, which requires...... on their applicability for GPUs. We examine published techniques for iterative methods in interior point methods (IPMs) by applying them to simple test cases, such as a system of masses connected by springs. Iterative methods allow us to deal with the ill-conditioning occurring in the later iterations of the IPM as well...... as to avoid the use of dense matrices, which may be too large for the limited memory capacity of current graphics cards....
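
    A matrix-free iterative method of the kind discussed above can be illustrated with conjugate gradients on the spring-mass test case. The following sketch (our own toy setup, not the thesis code) solves the symmetric positive-definite stiffness system of a chain of masses with fixed ends:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=500):
    """Conjugate gradient for SPD systems A x = b (illustrative sketch).

    Only matrix-vector products with A are needed, which is why CG-type
    methods avoid forming large dense matrices on memory-limited GPUs.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def spring_chain_stiffness(n, k=1.0):
    """SPD tridiagonal stiffness matrix of n masses on springs, fixed ends."""
    return 2.0 * k * np.eye(n) - k * np.eye(n, k=1) - k * np.eye(n, k=-1)
```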

  1. Graphics Processing Units and High-Dimensional Optimization.

    Science.gov (United States)

    Zhou, Hua; Lange, Kenneth; Suchard, Marc A

    2010-08-01

    This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many statistical algorithms. To exploit these devices fully, optimization algorithms should reduce to multiple parallel tasks, each accessing a limited amount of data. These criteria favor EM and MM algorithms that separate parameters and data. To a lesser extent block relaxation and coordinate descent and ascent also qualify. We demonstrate the utility of GPUs in nonnegative matrix factorization, PET image reconstruction, and multidimensional scaling. Speedups of 100 fold can easily be attained. Over the next decade, GPUs will fundamentally alter the landscape of computational statistics. It is time for more statisticians to get on-board.
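
    As a concrete example of an algorithm that "separates parameters and data", the Lee-Seung multiplicative updates for nonnegative matrix factorization are elementwise and embarrassingly parallel. A minimal NumPy sketch (rank, iteration count and seeding are illustrative assumptions):

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Nonnegative matrix factorization V ~ W H via Lee-Seung
    multiplicative updates. Every update is an elementwise product of
    matrix products, the pattern that maps well onto GPUs."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    eps = 1e-12                              # guard against division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

    The updates preserve nonnegativity by construction, since factors are only ever multiplied by nonnegative ratios.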

  2. 26 CFR 301.7430-7 - Qualified offers.

    Science.gov (United States)

    2010-04-01

    ... conference. (5) Remains open. A qualified offer must, by its terms, remain open for acceptance by the United... that would have resulted from the acceptance of E's qualified offer is a reduction in that liability of... 26 Internal Revenue 18 2010-04-01 2010-04-01 false Qualified offers. 301.7430-7 Section...

  3. Turnkey offering a claimed sector 'first'.

    Science.gov (United States)

    Law, Oliver

    2011-01-01

    Manufacturer and supplier of LED theatre lights, HD camera systems, video integration technologies, and ceiling support units, Trumpf Medical Systems UK, and "logistical services" company Canute International Medical Services (CIMS), one of whose specialities is providing mobile medical units for diagnostic imaging, have entered into a partnership that will see the two companies offer fully fitted out modular operating theatres and other medical/clinical buildings incorporating the latest technology and equipment, on a fully project-managed, "turnkey" basis. Oliver Law, Trumpf Medical Systems UK managing director, explains the background, and the new service's anticipated customer benefits.

  4. A 1.5 GFLOPS Reciprocal Unit for Computer Graphics

    DEFF Research Database (Denmark)

    Nannarelli, Alberto; Rasmussen, Morten Sleth; Stuart, Matthias Bo

    2006-01-01

    The reciprocal operation 1/d is a frequent operation performed in graphics processors (GPUs). In this work, we present the design of a radix-16 reciprocal unit based on the algorithm combining the traditional digit-by-digit algorithm and the approximation of the reciprocal by one Newton...
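
    The Newton refinement mentioned above is compact enough to sketch in software: starting from a seed approximation x0 ~ 1/d, each iteration roughly doubles the number of correct bits. The digit-recurrence stage and the table-based seed of the hardware unit are not modeled here:

```python
def reciprocal_newton(d, x0, n_iter=4):
    """Approximate 1/d with the Newton iteration x_{k+1} = x_k * (2 - d * x_k).

    Convergence is quadratic: if the relative error of x0 is e, after k
    iterations it is e**(2**k), so very few iterations are needed when the
    seed comes from a small lookup table.
    """
    x = x0
    for _ in range(n_iter):
        x = x * (2.0 - d * x)
    return x
```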

  5. The ultimatum game: Discrete vs. continuous offers

    Science.gov (United States)

    Dishon-Berkovits, Miriam; Berkovits, Richard

    2014-09-01

    In many experimental setups in the social sciences, psychology and economics, subjects are requested to accept or dispense monetary compensation which is usually given in discrete units. Using computer and mathematical modeling we show that, in the framework of studying the dynamics of acceptance of proposals in the ultimatum game, the long-time dynamics of acceptance of offers in the game are completely different for discrete vs. continuous offers. For discrete values the dynamics follow an exponential behavior. However, for continuous offers the dynamics are described by a power-law. This is shown using an agent-based computer simulation as well as by utilizing an analytical solution of a mean-field equation describing the model. These findings have implications for the design and interpretation of socio-economic experiments beyond the ultimatum game.
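
    A toy agent-based simulation in the spirit of, but much simpler than, the model above can be sketched as follows. The adjustment rule and the responder's fixed aspiration level are our own illustrative assumptions, not the authors' model:

```python
import random

def simulate_ultimatum(n_rounds=2000, unit=1, total=10, seed=42):
    """Toy discrete-offer ultimatum dynamics (illustrative only).

    A proposer adjusts its offer in units of `unit`: it raises the offer
    after a rejection and greedily lowers it after an acceptance. The
    responder accepts any offer at or above a fixed aspiration level.
    Returns the trajectory of offers.
    """
    rng = random.Random(seed)
    aspiration = 4                        # responder's minimum acceptable offer (assumed)
    offer = rng.randrange(0, total + 1)   # random initial proposal
    history = []
    for _ in range(n_rounds):
        if offer >= aspiration:           # accepted: proposer tries to keep more
            offer = max(0, offer - unit)
        else:                             # rejected: proposer concedes one unit
            offer = min(total, offer + unit)
        history.append(offer)
    return history
```

    With discrete units this rule settles into an oscillation around the aspiration level; studying how the acceptance statistics relax toward such a state is what distinguishes the discrete and continuous cases in the paper.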

  6. GPUs for fast pattern matching in the RICH of the NA62 experiment

    CERN Document Server

    Lamanna, G; Sozzi, M

    2011-01-01

    In rare-decay experiments an effective online selection is a fundamental part of the data acquisition system (DAQ), in order to reduce both the quantity of data written on tape and the bandwidth requirements for the DAQ system. A multilevel architecture is commonly used to achieve a higher reduction factor, exploiting dedicated custom hardware and flexible software in standard computers. In this paper we discuss the possibility of using commercial video card processors (GPUs) to build a fast and effective trigger system, at both the hardware and software level. The computing power of GPUs makes it possible to design a real-time system in which trigger decisions are taken directly in the video processor with a defined maximum latency. This allows the lowest trigger levels to be built from standard off-the-shelf PCs with CPU and GPU (instead of the commonly adopted solutions based on custom electronics with FPGAs or ASICs) with enhanced and high-performance computation capabilities, resulting in high rejection power, high effici...

  7. Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs

    CERN Document Server

    Niemeyer, Kyle E

    2014-01-01

    The chemical kinetics ODEs arising from operator-split reactive-flow simulations were solved on GPUs using explicit integration algorithms. Nonstiff chemical kinetics of a hydrogen oxidation mechanism (9 species and 38 irreversible reactions) were computed using the explicit fifth-order Runge-Kutta-Cash-Karp method, and the GPU-accelerated version performed faster than single- and six-core CPU versions by factors of 126 and 25, respectively, for 524,288 ODEs. Moderately stiff kinetics, represented with mechanisms for hydrogen/carbon-monoxide (13 species and 54 irreversible reactions) and methane (53 species and 634 irreversible reactions) oxidation, were computed using the stabilized explicit second-order Runge-Kutta-Chebyshev (RKC) algorithm. The GPU-based RKC implementation demonstrated an increase in performance of nearly 59 and 10 times, for problem sizes consisting of 262,144 ODEs and larger, than the single- and six-core CPU-based RKC algorithms using the hydrogen/carbon-monoxide mechanism. With the met...
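
    The Runge-Kutta-Cash-Karp scheme used for the nonstiff case is an embedded pair: one set of six stage evaluations yields both a fifth-order solution and a fourth-order error estimate. A scalar reference implementation using the standard Cash-Karp tableau (no GPU batching; on a GPU, one such step would be evaluated per thread across many independent ODEs):

```python
def rkck_step(f, t, y, h):
    """One Runge-Kutta-Cash-Karp step for y' = f(t, y).

    Returns the 5th-order solution and the embedded 4th-order error
    estimate (standard Cash-Karp coefficients)."""
    k1 = f(t, y)
    k2 = f(t + h/5, y + h*(k1/5))
    k3 = f(t + 3*h/10, y + h*(3*k1/40 + 9*k2/40))
    k4 = f(t + 3*h/5, y + h*(3*k1/10 - 9*k2/10 + 6*k3/5))
    k5 = f(t + h, y + h*(-11*k1/54 + 5*k2/2 - 70*k3/27 + 35*k4/27))
    k6 = f(t + 7*h/8, y + h*(1631*k1/55296 + 175*k2/512 + 575*k3/13824
                             + 44275*k4/110592 + 253*k5/4096))
    y5 = y + h*(37*k1/378 + 250*k3/621 + 125*k4/594 + 512*k6/1771)
    y4 = y + h*(2825*k1/27648 + 18575*k3/48384 + 13525*k4/55296
                + 277*k5/14336 + k6/4)
    return y5, abs(y5 - y4)
```

    The error estimate drives adaptive step-size control; in a batched GPU integrator, divergent per-thread step sizes are one of the main performance hazards the paper addresses.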

  8. Performance of Point and Range Queries for In-memory Databases using Radix Trees on GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Alam, Maksudul [ORNL; Yoginath, Srikanth B [ORNL; Perumalla, Kalyan S [ORNL

    2016-01-01

    In in-memory database systems augmented by hardware accelerators, accelerating the index searching operations can greatly increase the runtime performance of database queries. Recently, adaptive radix trees (ART) have been shown to provide very fast index search implementation on the CPU. Here, we focus on an accelerator-based implementation of ART. We present a detailed performance study of our GPU-based adaptive radix tree (GRT) implementation over a variety of key distributions, synthetic benchmarks, and actual keys from music and book data sets. The performance is also compared with other index-searching schemes on the GPU. GRT on modern GPUs achieves some of the highest rates of index searches reported in the literature. For point queries, a throughput of up to 106 million and 130 million lookups per second is achieved for sparse and dense keys, respectively. For range queries, GRT yields 600 million and 1000 million lookups per second for sparse and dense keys, respectively, on a large dataset of 64 million 32-bit keys.
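
    A full adaptive radix tree uses adaptively sized nodes and path compression; a much-simplified bitwise trie conveys the core point-lookup idea. This is a stand-in sketch, not GRT, and it covers point queries only:

```python
class RadixTrie:
    """Minimal bitwise trie over fixed-width integer keys.

    Each level consumes one key bit (real ARTs consume whole bytes with
    adaptive node sizes); lookup cost is bounded by the key width, not
    the number of stored keys.
    """

    def __init__(self, bits=32):
        self.bits = bits
        self.root = {}

    def insert(self, key, value):
        node = self.root
        for i in range(self.bits - 1, -1, -1):   # most significant bit first
            node = node.setdefault((key >> i) & 1, {})
        node["value"] = value

    def lookup(self, key):
        node = self.root
        for i in range(self.bits - 1, -1, -1):
            node = node.get((key >> i) & 1)
            if node is None:                      # path absent: key not stored
                return None
        return node.get("value")
```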

  9. Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs

    KAUST Repository

    Ltaief, Hatem

    2016-06-02

    We present a high performance comprehensive implementation of a multi-object adaptive optics (MOAO) simulation on multicore architectures with hardware accelerators in the context of computational astronomy. This implementation will be used as an operational testbed for simulating the design of new instruments for the European Extremely Large Telescope project (E-ELT), the world's biggest eye and one of Europe's highest priorities in ground-based astronomy. The simulation corresponds to a multi-step multi-stage procedure, which is fed, near real-time, by system and turbulence data coming from the telescope environment. Based on the PLASMA library powered by the OmpSs dynamic runtime system, our implementation relies on a task-based programming model to permit an asynchronous out-of-order execution. Using modern multicore architectures associated with the enormous computing power of GPUs, the resulting data-driven compute-intensive simulation of the entire MOAO application, composed of the tomographic reconstructor and the observing sequence, is capable of coping with the aforementioned real-time challenge and stands as a reference implementation for the computational astronomy community.

  10. Stochastic propagators for multi-pion correlation functions in lattice QCD with GPUs

    CERN Document Server

    Giedt, Joel

    2014-01-01

    Motivated by the application of Lüscher's finite volume method to the study of the lightest scalar resonance in the $\pi\pi \to \pi\pi$ isoscalar channel, in this article we describe our studies of multi-pion correlation functions computed using stochastic propagators in quenched lattice QCD, harnessing GPUs for acceleration. We consider two methods for constructing the correlation functions. One "outer product" approach becomes quite expensive at large lattice extent $L$, having an ${\cal O}(L^7)$ scaling. The other "stochastic operator" approach scales as ${\cal O}(N_r^2 L^4)$, where $N_r$ is the number of random sources. It would become more efficient if variance reduction techniques are used and the volume is fairly large. It is also found that correlations between stochastic propagators appearing in the same diagram, when a single set of random source vectors is used, lead to much larger errors than if separate random sources are used for each propagator. The calculations involve states with quantum nu...
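
    The use of random sources to replace exact all-to-all contractions can be illustrated outside lattice QCD with the Hutchinson stochastic trace estimator, which exhibits the same bias-free-but-noisy trade-off. The sample count and test matrix below are illustrative choices:

```python
import numpy as np

def hutchinson_trace(A, n_samples=20000, seed=0):
    """Stochastic trace estimate tr(A) ~ mean_r z_r^T A z_r over Rademacher
    random sources z_r.

    The estimator is unbiased; its statistical error shrinks as
    1/sqrt(n_samples), mirroring how stochastic propagators trade exact
    contractions for averages over random sources.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Z = rng.choice([-1.0, 1.0], size=(n, n_samples))   # independent sources
    est = np.einsum("is,is->s", Z, A @ Z)              # z^T A z per sample
    return float(est.mean())
```

    The observation in the abstract, that reusing one set of sources across propagators inflates the error, corresponds here to the extra covariance terms that appear when samples are not independent.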

  11. Parallel mutual information estimation for inferring gene regulatory networks on GPUs

    Directory of Open Access Journals (Sweden)

    Liu Weiguo

    2011-06-01

    Abstract Background Mutual information is a measure of similarity between two variables. It has been widely used in various application domains including computational biology, machine learning, statistics, image processing, and financial computing. Previously used simple histogram-based mutual information estimators lack precision compared to kernel-based methods. The recently introduced B-spline function based mutual information estimation method is competitive with the kernel-based methods in terms of quality but at a lower computational complexity. Results We present a new approach to accelerate the B-spline function based mutual information estimation algorithm with commodity graphics hardware. To derive an efficient mapping onto this type of architecture, we have used the Compute Unified Device Architecture (CUDA) programming model to design and implement a new parallel algorithm. Our implementation, called CUDA-MI, can achieve speedups of up to 82 using double precision on a single GPU compared to a multi-threaded implementation on a quad-core CPU for large microarray datasets. We have used the results obtained by CUDA-MI to infer gene regulatory networks (GRNs) from microarray data. The comparisons to existing methods including ARACNE and TINGe show that CUDA-MI produces GRNs of higher quality in less time. Conclusions CUDA-MI is publicly available open-source software, written in the CUDA and C++ programming languages. It obtains significant speedup over a sequential multi-threaded implementation by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
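
    A plain histogram estimator, simpler than the B-spline variant above but with the same data-parallel binning-and-summation structure, can be sketched in a few lines of NumPy:

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram-based mutual information estimate in nats.

    Binning, marginalization and the final log-sum are all elementwise
    or reduction operations, which is what makes MI estimation a good
    fit for GPUs.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal of x
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal of y
    nz = p_xy > 0                             # avoid log(0) on empty bins
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])).sum())
```

    The B-spline method of the paper replaces the hard bin assignment with smooth fractional weights, reducing the estimator's bias; the parallel structure is unchanged.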

  12. An evaluation of the potential of GPUs to accelerate tracking algorithms for the ATLAS trigger

    CERN Document Server

    Baines, JTM; The ATLAS collaboration; Emeliyanov, D; Howard, JR; Kama, S; Washbrook, AJ; Wynne, BM

    2014-01-01

    The potential of GPUs has been evaluated as a possible way to accelerate trigger algorithms for the ATLAS experiment located at the Large Hadron Collider (LHC). During LHC Run-1 ATLAS employed a three-level trigger system to progressively reduce the LHC collision rate of 20 MHz to a storage rate of about 600 Hz for offline processing. Reconstruction of charged particles trajectories through the Inner Detector (ID) was performed at the second (L2) and third (EF) trigger levels. The ID contains pixel, silicon strip (SCT) and straw-tube technologies. Prior to tracking, data-preparation algorithms processed the ID raw data producing measurements of the track position at each detector layer. The data-preparation and tracking consumed almost three-quarters of the total L2 CPU resources during 2012 data-taking. Detailed performance studies of a CUDA™ implementation of the L2 pixel and SCT data-preparation and tracking algorithms running on a Nvidia® Tesla C2050 GPU have shown a speed-up by a factor of 12 for the ...

  13. BlazeDEM3D-GPU A Large Scale DEM simulation code for GPUs

    Science.gov (United States)

    Govender, Nicolin; Wilke, Daniel; Pizette, Patrick; Khinast, Johannes

    2017-06-01

    Accurately predicting the dynamics of particulate materials is of importance to numerous scientific and industrial areas, with applications ranging across particle scales from powder flow to ore crushing. Computational discrete element simulations are a viable option to aid in the understanding of particulate dynamics and the design of devices such as mixers, silos and ball mills, as laboratory-scale tests come at a significant cost. However, the computational time required to run an industrial-scale simulation consisting of tens of millions of particles can take months to complete on large CPU clusters, making the Discrete Element Method (DEM) infeasible for industrial applications. Simulations are therefore typically restricted to tens of thousands of particles with highly detailed particle shapes, or a few million particles with often oversimplified particle shapes. However, a number of applications require accurate representation of the particle shape to capture the macroscopic behaviour of the particulate system. In this paper we give an overview of the recent extensions to the open source GPU-based DEM code, BlazeDEM3D-GPU, that can simulate millions of polyhedra and tens of millions of spheres on a desktop computer with a single or multiple GPUs.
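
    The basic per-contact computation in a sphere DEM is a spring-dashpot normal force, evaluated independently for every contact pair, which is why the method parallelizes so well. A minimal sketch (the stiffness and damping values are arbitrary illustrative choices, not BlazeDEM3D-GPU parameters):

```python
import math

def normal_contact_force(x1, r1, x2, r2, v_n=0.0, k=1.0e4, c=5.0):
    """Linear spring-dashpot normal force between two spheres.

    `v_n` is the relative approach speed along the line of centers
    (positive when approaching). Returns the force magnitude on sphere 1
    (positive = repulsive), or 0.0 when the spheres do not overlap.
    """
    d = math.dist(x1, x2)            # center-to-center distance
    overlap = (r1 + r2) - d          # penetration depth
    if overlap <= 0.0:
        return 0.0
    return k * overlap + c * v_n     # elastic repulsion plus viscous damping
```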

  14. Spectral turning bands for efficient Gaussian random fields generation on GPUs and accelerators

    Science.gov (United States)

    Hunger, L.; Cosenza, B.; Kimeswenger, S.; Fahringer, T.

    2015-11-01

    A random field (RF) is a set of correlated random variables associated with different spatial locations. RF generation algorithms are of crucial importance for many scientific areas, such as astrophysics, geostatistics, computer graphics, and many others. Current approaches commonly make use of 3D fast Fourier transform (FFT), which does not scale well for RF bigger than the available memory; they are also limited to regular rectilinear meshes. We introduce random field generation with the turning band method (RAFT), an RF generation algorithm based on the turning band method that is optimized for massively parallel hardware such as GPUs and accelerators. Our algorithm replaces the 3D FFT with a lower-order, one-dimensional FFT followed by a projection step and is further optimized with loop unrolling and blocking. RAFT can easily generate RF on non-regular (non-uniform) meshes and efficiently produce fields with mesh sizes bigger than the available device memory by using a streaming, out-of-core approach. Our algorithm generates RF with the correct statistical behavior and is tested on a variety of modern hardware, such as NVIDIA Tesla, AMD FirePro and Intel Phi. RAFT is faster than the traditional methods on regular meshes and has been successfully applied to two real case scenarios: planetary nebulae and cosmological simulations.
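
    The turning-band idea rests on replacing a single 3D FFT with many one-dimensional spectral syntheses. The 1D building block, generating a Gaussian field by shaping white noise with a power-law spectrum, can be sketched as follows (the spectrum exponent and normalization are illustrative assumptions, not RAFT's):

```python
import numpy as np

def gaussian_field_1d(n, beta=2.0, seed=0):
    """Generate a 1D Gaussian random field by spectral synthesis.

    White Gaussian noise is shaped in Fourier space by a power-law
    spectrum P(k) ~ k**(-beta) and transformed back; the zero mode is
    suppressed so the field has zero mean. The result is normalized to
    unit variance.
    """
    rng = np.random.default_rng(seed)
    k = np.fft.rfftfreq(n)
    amp = np.zeros_like(k)
    amp[1:] = k[1:] ** (-beta / 2.0)     # sqrt of the power spectrum
    coeffs = amp * (rng.standard_normal(k.size) + 1j * rng.standard_normal(k.size))
    field = np.fft.irfft(coeffs, n)
    return field / field.std()
```

    A turning-band generator runs many such 1D lines through the 3D domain and projects each grid point onto them, avoiding the memory footprint of a full 3D FFT.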

  15. The directory of United States coal & technology export resources. Profiles of domestic US corporations, associations and public entities, nationwide, which offer products or services suitable for export, relating to coal and its utilization

    Energy Technology Data Exchange (ETDEWEB)

    1994-01-01

    The purpose of this directory is to provide a listing of available U.S. coal and coal related resources to potential purchasers of those resources abroad. The directory lists business entities within the US which offer coal related resources, products and services for sale on the international market. Each listing is intended to describe the particular business niche or range of product and/or services offered by a particular company. The listing provides addresses, telephones, and telex/fax for key staff in each company committed to the facilitation of international trade. The content of each listing has been formulated especially for this directory and reflects data current as of the date of this edition. The directory listings are divided into four primary classifications: coal resources; technology resources; support services; and financing and resource packaging. The first three of which are subdivided as follows: Coal Resources -- coal derivatives, coal exporters, and coal mining; Technology Resources -- advanced utilization, architects and engineers, boiler equipment, emissions control and waste disposal systems, facility construction, mining equipment, power generation systems, technical publications, and transport equipment; Support Services -- coal transport, facility operations, freight forwarders, sampling services and equipment, and technical consultants. Listings for the directory were solicited on the basis of this industry breakdown. Each of the four sections of this directory begins with a matrix illustrating which companies fall within the particular subclassifications specific to that main classification. A general alphabetical index of companies and an index by product/service classification are provided following the last section of the directory.

  16. Postgraduate courses offered to nursing

    Directory of Open Access Journals (Sweden)

    Pedro Jorge Araujo

    2011-07-01

    Aim: To identify the official master's degrees that Spanish universities offered during the 2010/2011 academic year. Material and methods: A descriptive, observational, cross-sectional study that analysed 170 official university master's degrees using a 15-question questionnaire developed for this work. Results: Of the 75 Spanish universities, 52 offered official master's degrees open to nursing graduates during the 2010/2011 academic year. By field, the most frequently offered master's degrees were in nutrition and food safety. 76.33% of the degrees last one academic year. Almost half have a combined research-professional orientation and almost 40% a research orientation. 62.65% are taught face-to-face. 52.1% do not include external placements, and 86.2% offer continuity into doctoral studies. Conclusions: The range of master's degrees should be expanded to include other fields of study, contributing to greater specialisation among nursing professionals. A large proportion of official master's degrees are taught face-to-face, with very few offered online or by distance learning.

  17. GPUs for fast pattern matching in the RICH of the NA62 experiment

    Science.gov (United States)

    Lamanna, Gianluca; Collazuol, Gianmaria; Sozzi, Marco

    2011-05-01

    In rare-decay experiments an effective online selection is a fundamental part of the data acquisition system (DAQ), in order to reduce both the quantity of data written on tape and the bandwidth requirements for the DAQ system. A multilevel architecture is commonly used to achieve a higher reduction factor, exploiting dedicated custom hardware and flexible software in standard computers. In this paper we discuss the possibility of using commercial video card processors (GPUs) to build a fast and effective trigger system, at both the hardware and software level. The computing power of GPUs makes it possible to design a real-time system in which trigger decisions are taken directly in the video processor with a defined maximum latency. This allows the lowest trigger levels to be built from standard off-the-shelf PCs with CPU and GPU (instead of the commonly adopted solutions based on custom electronics with FPGAs or ASICs) with enhanced and high-performance computation capabilities, resulting in high rejection power, high efficiency and simpler low-level triggers. The ongoing work presented here shows the results achieved for fast pattern matching in the RICH detector of the NA62 experiment at CERN, which aims at measuring the branching ratio of the ultra-rare decay K+→π+νν̄ and is considered as the use case, although the versatility and customizability of this approach easily allow exporting the concept to different contexts. In particular the application is related to particle identification in the RICH detector of the NA62 experiment, where the rate of events to be analyzed will be around 10 MHz. The results obtained in lab tests are very encouraging for moving towards a working prototype. Due to the use of off-the-shelf technology, in continuous development for other purposes (video games, image editing, …), the architecture described could easily be exported to other experiments, to build powerful, flexible and fully customizable trigger systems.
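
    One elementary building block of ring pattern matching, fitting a circle to a set of Cherenkov hits, is small enough to sketch. The algebraic Kasa least-squares fit below is an illustration only, not the trigger's actual algorithm; its appeal for data-parallel hardware is that each hit contributes one independent row of a small linear system:

```python
import numpy as np

def fit_ring(x, y):
    """Algebraic (Kasa) least-squares circle fit.

    Solves the linear system [2x, 2y, 1] [cx, cy, c]^T = x^2 + y^2 and
    recovers the radius from c = r^2 - cx^2 - cy^2.
    Returns (cx, cy, radius).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    b = x**2 + y**2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    return cx, cy, np.sqrt(c + cx**2 + cy**2)
```

    In a GPU trigger, many candidate rings would be fitted concurrently, one per thread block or warp.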

  18. PIC codes for plasma accelerators on emerging computer architectures (GPUS, Multicore/Manycore CPUS)

    Science.gov (United States)

    Vincenti, Henri

    2016-03-01

    The advent of exascale computers will enable 3D simulations of new laser-plasma interaction regimes that were previously out of reach of current petascale computers. However, the paradigm used to write current PIC codes will have to change in order to fully exploit the potential of these new computing architectures. Indeed, achieving exascale computing facilities in the next decade will be a great challenge in terms of energy consumption and will imply hardware developments directly impacting our way of implementing PIC codes. As data movement (from die to network) is by far the most energy-consuming part of an algorithm, future computers will tend to increase memory locality at the hardware level and reduce the energy consumption related to data movement by using more and more cores on each compute node ("fat nodes") with a reduced clock speed to allow for efficient cooling. To compensate for the frequency decrease, CPU vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. GPUs also have a reduced clock speed per core and can process Multiple Instructions on Multiple Data (MIMD). At the software level, Particle-In-Cell (PIC) codes will thus have to achieve both good memory locality and vectorization (for multicore/manycore CPUs) to take full advantage of these upcoming architectures. In this talk, we present the portable solutions we implemented in our high-performance skeleton PIC code PICSAR to achieve both good memory locality and cache reuse as well as good vectorization on SIMD architectures. We also present the portable solutions used to parallelize the pseudo-spectral quasi-cylindrical code FBPIC on GPUs using the Numba Python compiler.
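
    The memory-locality and vectorization point can be made concrete with charge deposition, the classic scatter step of a PIC code. Below is a 1D cloud-in-cell deposit written with vectorized scatter-adds instead of a per-particle loop (our own toy sketch, not PICSAR code):

```python
import numpy as np

def deposit_charge(positions, charges, n_cells):
    """Cloud-in-cell charge deposition on a periodic 1D grid of unit cells.

    Each particle splits its charge linearly between the two nearest grid
    nodes. np.add.at performs an unbuffered scatter-add, the data-access
    pattern that SIMD/GPU PIC codes must reorganize carefully to avoid
    write conflicts.
    """
    grid = np.zeros(n_cells)
    i = np.floor(positions).astype(int)                  # left node index
    frac = positions - i                                 # offset within the cell
    np.add.at(grid, i, charges * (1.0 - frac))           # weight to left node
    np.add.at(grid, (i + 1) % n_cells, charges * frac)   # right node (periodic)
    return grid
```

    On real hardware, the conflict-free equivalent of this scatter (sorting particles by cell, or using atomics) is one of the central design choices the talk describes.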

  19. GPUs for fast pattern matching in the RICH of the NA62 experiment

    Energy Technology Data Exchange (ETDEWEB)

    Lamanna, Gianluca, E-mail: gianluca.lamanna@cern.c [CERN, 1211 Geneve 23 (Switzerland); Collazuol, Gianmaria, E-mail: gianmaria.collazuol@cern.c [INFN Pisa, Largo Pontecorvo 3, 56127 Pisa (Italy); Sozzi, Marco, E-mail: marco.sozzi@cern.c [University and INFN Pisa, Largo Pontecorvo 3, 56127 Pisa (Italy)

    2011-05-21

    In rare-decay experiments an effective online selection is a fundamental part of the data acquisition system (DAQ), in order to reduce both the quantity of data written on tape and the bandwidth requirements for the DAQ system. A multilevel architecture is commonly used to achieve a higher reduction factor, exploiting dedicated custom hardware and flexible software in standard computers. In this paper we discuss the possibility of using commercial video card processors (GPUs) to build a fast and effective trigger system, at both the hardware and software level. The computing power of GPUs makes it possible to design a real-time system in which trigger decisions are taken directly in the video processor with a defined maximum latency. This allows the lowest trigger levels to be built from standard off-the-shelf PCs with CPU and GPU (instead of the commonly adopted solutions based on custom electronics with FPGAs or ASICs) with enhanced and high-performance computation capabilities, resulting in high rejection power, high efficiency and simpler low-level triggers. The ongoing work presented here shows the results achieved for fast pattern matching in the RICH detector of the NA62 experiment at CERN, which aims at measuring the branching ratio of the ultra-rare decay K+→π+νν̄ and is considered as the use case, although the versatility and customizability of this approach easily allow exporting the concept to different contexts. In particular the application is related to particle identification in the RICH detector of the NA62 experiment, where the rate of events to be analyzed will be around 10 MHz. The results obtained in lab tests are very encouraging for moving towards a working prototype. Due to the use of off-the-shelf technology, in continuous development for other purposes (video games, image editing, ...), the architecture described would be easily exported into other experiments, for building powerful, flexible and fully customizable trigger systems.

  20. 31 CFR 316.1 - Offering of bonds.

    Science.gov (United States)

    2010-07-01

    ... 31 Money and Finance: Treasury 2 2010-07-01 2010-07-01 false Offering of bonds. 316.1 Section 316.1 Money and Finance: Treasury Regulations Relating to Money and Finance (Continued) FISCAL SERVICE, DEPARTMENT OF THE TREASURY BUREAU OF THE PUBLIC DEBT OFFERING OF UNITED STATES SAVINGS BONDS, SERIES E §...

  1. Project United Families: psychotherapeutic care offered to a social risk population Projeto Famílias Unidas: atendimento psicoterapêutico à população em situação de risco social

    Directory of Open Access Journals (Sweden)

    Leila Márcia Souza Oliveira

    2010-11-01

    The aim of this report is to encourage reflection on psychotherapeutic care delivered to a community at social risk. From July 2005 to July 2006, counselling sessions were held at a philanthropic institution, the Association for Social Works of St. Michael's Parish in the city of Simões Filho (Bahia, Brazil), as part of the Project United Families implemented by that association. Issues such as self-esteem and the possibility of social reinclusion were discussed in weekly meetings, enabling the participants to introduce changes into their life circumstances and helping them adopt an attitude of active citizenship. This work was conducted following the concepts of "Extended General Practice", which provides a new way of dealing with, and trying to minimize, poverty, and of revitalizing the individual's desires outside the doctor's office, at informal community-based institutions that represent potential resources capable of producing meaning, contractuality and wellbeing.

    This report aims to promote reflection on psychological care in a community at social risk. The care was provided at a philanthropic institution in the municipality of Simões Filho (Bahia), through the Project United Families of the Association for Social Works of St. Michael's Parish, in the period between July 2005 and July 2006. It took the form of weekly meetings addressing self-esteem and the possibility of the subject's social reinsertion, which made possible a change in their life circumstances and a stance directed towards citizenship. The work, carried out along the lines of the “Clínica Ampliada” (Extended Clinic), offers a new way of dealing with poverty, in an attempt to help minimize it, by revitalizing the subject's desire in a clinic outside the consulting room, making use of the community's informal institutions, since they represent potential resources

  2. A Fast MHD Code for Gravitationally Stratified Media using Graphical Processing Units: SMAUG

    Indian Academy of Sciences (India)

    M. K. Griffiths; V. Fedun; R. Erdélyi

    2015-03-01

    Parallelization techniques have been exploited most successfully by the gaming/graphics industry with the adoption of graphical processing units (GPUs), possessing hundreds of processor cores. The opportunity has been recognized by the computational sciences and engineering communities, who have recently and successfully harnessed the numerical performance of GPUs. For example, parallel magnetohydrodynamic (MHD) algorithms are important for numerical modelling of highly inhomogeneous solar, astrophysical and geophysical plasmas. Here, we describe the implementation of SMAUG, the Sheffield Magnetohydrodynamics Algorithm Using GPUs. SMAUG is a 1–3D MHD code capable of modelling magnetized and gravitationally stratified plasma. The objective of this paper is to present the numerical methods and techniques used for porting the code to this novel and highly parallel compute architecture. The methods employed are justified by the performance benchmarks and validation results, demonstrating that the code successfully simulates the physics for a range of test scenarios including a full 3D realistic model of wave propagation in the solar atmosphere.

  3. Magnetohydrodynamics simulations on graphics processing units

    CERN Document Server

    Wong, Hon-Cheng; Feng, Xueshang; Tang, Zesheng

    2009-01-01

    Magnetohydrodynamics (MHD) simulations based on the ideal MHD equations have become a powerful tool for modeling phenomena in a wide range of applications including laboratory, astrophysical, and space plasmas. In general, high-resolution methods for solving the ideal MHD equations are computationally expensive, and Beowulf clusters or even supercomputers are often used to run the codes that implement these methods. With the advent of the Compute Unified Device Architecture (CUDA), modern graphics processing units (GPUs) provide an alternative approach to parallel computing for scientific simulations. In this paper we present, to the authors' knowledge, the first implementation to accelerate computation of MHD simulations on GPUs. Numerical tests have been performed to validate the correctness of our GPU MHD code. Performance measurements show that our GPU-based implementation achieves speedups of 2 (1D problem with 2048 grids), 106 (2D problem with 1024^2 grids), and 43 (3D problem with 128^3 grids), respec...

  4. Graphics Processing Unit Assisted Thermographic Compositing

    Science.gov (United States)

    Ragasa, Scott; McDougal, Matthew; Russell, Sam

    2013-01-01

    Objective: To develop a software application utilizing general-purpose graphics processing units (GPUs) for the analysis of large sets of thermographic data. Background: Over the past few years, an increasing effort among scientists and engineers to utilize the GPU in a more general-purpose fashion has been allowing for supercomputer-level results at individual workstations. As data sets grow, the methods used to work with them must grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU to allow for throughput that was previously reserved for compute clusters. These common computations have high degrees of data parallelism; that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Signal (image) processing is one area where GPUs are being used to greatly increase the performance of certain algorithms and analysis techniques.

  5. Parallelized CCHE2D flow model with CUDA Fortran on Graphics Processing Units

    Science.gov (United States)

    This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...
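    The Parallel Cyclic Reduction (PCR) step of such a solver can be sketched serially: each reduction level is a fully independent sweep over the equations, which is exactly what maps onto one GPU thread per row. Below is a minimal stdlib-Python sketch of PCR for a tridiagonal system, not the CCHE2D implementation.

    ```python
    import math

    def pcr_solve(a, b, c, d):
        """Solve a tridiagonal system with Parallel Cyclic Reduction (PCR).

        a, b, c are the sub-, main- and super-diagonals (a[0] = c[-1] = 0),
        d is the right-hand side. Each level below is embarrassingly
        parallel over i, which makes the method attractive on GPUs."""
        n = len(b)
        a, b, c, d = list(a), list(b), list(c), list(d)
        stride = 1
        for _ in range(math.ceil(math.log2(n))):
            na, nb, nc, nd = a[:], b[:], c[:], d[:]
            for i in range(n):
                lo, hi = i - stride, i + stride
                # Out-of-range neighbours act as identity rows (b=1, a=c=d=0).
                k1 = a[i] / b[lo] if lo >= 0 else 0.0
                k2 = c[i] / b[hi] if hi < n else 0.0
                na[i] = -(a[lo] * k1) if lo >= 0 else 0.0
                nc[i] = -(c[hi] * k2) if hi < n else 0.0
                nb[i] = b[i] - (c[lo] * k1 if lo >= 0 else 0.0) \
                             - (a[hi] * k2 if hi < n else 0.0)
                nd[i] = d[i] - (d[lo] * k1 if lo >= 0 else 0.0) \
                             - (d[hi] * k2 if hi < n else 0.0)
            a, b, c, d = na, nb, nc, nd
            stride *= 2
        # After ceil(log2(n)) levels every equation is decoupled: b[i]*x[i] = d[i]
        return [d[i] / b[i] for i in range(n)]
    ```

    PCR is a direct method: unlike the sequential Thomas algorithm, all rows update simultaneously at each of the O(log n) levels.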

  6. Cerveza platforms offer economic options

    Energy Technology Data Exchange (ETDEWEB)

    Leblanc, L.A.

    1982-08-01

    Two single-piece platforms, Cerveza and Cerveza Ligera, were installed by Union Oil Co. in 925-935 ft of water. The technology and equipment used for the two platforms can be used for units to a depth of 1,400 ft in mild climates and to 1,000 ft in more critical weather areas such as the North Sea. The significant improvements in design and procedures in the construction and installation of the Cerveza Ligera platform are: (1) a four-leg structure, as opposed to eight, requiring less steel; (2) simplified fabrication; and (3) quicker installation. The most significant area of improvement in the Ligera project compared with Cerveza was in communications. Communications between naval architects and onshore launch foremen during loadout, and between surveyors and tug captains during positioning, are cited as examples.

  7. 24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs

    CERN Document Server

    Bédorf, Jeroen; Fujii, Michiko S; Nitadori, Keigo; Ishiyama, Tomoaki; Zwart, Simon Portegies

    2014-01-01

    We have simulated, for the first time, the long term evolution of the Milky Way Galaxy using 51 billion particles on the Swiss Piz Daint supercomputer with our $N$-body gravitational tree-code Bonsai. Herein, we describe the scientific motivation and numerical algorithms. The Milky Way model was simulated for 6 billion years, during which the bar structure and spiral arms were fully formed. This improves upon previous simulations by using 1000 times more particles, and provides a wealth of new data that can be directly compared with observations. We also report the scalability on both the Swiss Piz Daint and the US ORNL Titan. On Piz Daint the parallel efficiency of Bonsai was above 95%. The highest performance was achieved with a 242 billion particle Milky Way model using 18600 GPUs on Titan, thereby reaching a sustained GPU and application performance of 33.49 Pflops and 24.77 Pflops respectively.
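    For context, tree-codes such as Bonsai approximate the far field of the exact O(N^2) direct summation, reducing the cost to roughly O(N log N). A minimal direct-sum reference (plain Python, G = 1, Plummer softening, toy data), of the kind tree-codes are benchmarked against, might look like:

    ```python
    def direct_forces(pos, mass, eps=1e-2):
        """Brute-force O(N^2) gravitational accelerations with Plummer
        softening, in units with G = 1. Tree-codes replace the inner loop
        over distant particles with multipole approximations."""
        n = len(pos)
        acc = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = [pos[j][k] - pos[i][k] for k in range(3)]
                # softened squared distance avoids the r -> 0 singularity
                r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps * eps
                inv_r3 = r2 ** -1.5
                for k in range(3):
                    acc[i][k] += mass[j] * dx[k] * inv_r3
        return acc
    ```

    A quick consistency check is Newton's third law: the mass-weighted sum of all accelerations must vanish.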

  8. Flux tube widening in compact U (1) lattice gauge theory computed at T < Tc with the multilevel method and GPUs

    CERN Document Server

    Amado, A; Bicudo, P

    2013-01-01

    We utilize Polyakov loop correlations to study d=3+1 compact U (1) flux tubes and the static electron-positron potential in lattice gauge theory. With the plaquette field operator, in U(1) lattice gauge theory, we probe directly the components of the electric and magnetic fields. In order to improve the signal-to-noise ratio in the confinement phase, we apply the Lüscher-Weisz multilevel algorithm. Our code is written in CUDA, and we run it on NVIDIA FERMI generation GPUs, in order to achieve the necessary efficiency for our computations. We measure in detail the quantum widening of the flux tube, as a function of the intercharge distance and at different finite temperatures T < Tc . Our results are compatible with the Effective String Theory.

  9. Economic Dispatch of Demand Response Balancing through Asymmetric Block Offers

    DEFF Research Database (Denmark)

    O'Connell, Niamh; Pinson, Pierre; Madsen, Henrik

    2015-01-01

    load to provide a response to the power system and the subsequent need to recover. The conventional system dispatch algorithm is altered to facilitate the dispatch of demand response units alongside generating units using the proposed offer structure. The value of demand response is assessed through... case studies that dispatch flexible supermarket refrigeration loads for the provision of regulating power. The demand resource is described by a set of asymmetric blocks, and a set of four block offers is shown to yield cost savings for the procurement of regulating power in excess of 20%. ... For comparative purposes, the cost savings achievable with a fully observable and controllable demand response resource are evaluated, using a time series model of the refrigeration loads. The fully modeled resource offers greater savings; however the difference is small and potentially insufficient to justify...

  10. Real-space density functional theory on graphical processing units: computational approach and comparison to Gaussian basis set methods

    CERN Document Server

    Andrade, Xavier

    2013-01-01

    We discuss the application of graphical processing units (GPUs) to accelerate real-space density functional theory (DFT) calculations. To make our implementation efficient, we have developed a scheme to expose the data parallelism available in the DFT approach; this is applied to the different procedures required for a real-space DFT calculation. We present results for current-generation GPUs from AMD and Nvidia, which show that our scheme, implemented in the free code OCTOPUS, can reach a sustained performance of up to 90 GFlops for a single GPU, representing an important speed-up when compared to the CPU version of the code. Moreover, for some systems our implementation can outperform a GPU Gaussian basis set code, showing that the real-space approach is a competitive alternative for DFT simulations on GPUs.

  11. Real-Space Density Functional Theory on Graphical Processing Units: Computational Approach and Comparison to Gaussian Basis Set Methods.

    Science.gov (United States)

    Andrade, Xavier; Aspuru-Guzik, Alán

    2013-10-01

    We discuss the application of graphical processing units (GPUs) to accelerate real-space density functional theory (DFT) calculations. To make our implementation efficient, we have developed a scheme to expose the data parallelism available in the DFT approach; this is applied to the different procedures required for a real-space DFT calculation. We present results for current-generation GPUs from AMD and Nvidia, which show that our scheme, implemented in the free code Octopus, can reach a sustained performance of up to 90 GFlops for a single GPU, representing a significant speed-up when compared to the CPU version of the code. Moreover, for some systems, our implementation can outperform a GPU Gaussian basis set code, showing that the real-space approach is a competitive alternative for DFT simulations on GPUs.

  12. High-speed nonlinear finite element analysis for surgical simulation using graphics processing units.

    Science.gov (United States)

    Taylor, Z A; Cheng, M; Ourselin, S

    2008-05-01

    The use of biomechanical modelling, especially in conjunction with finite element analysis, has become common in many areas of medical image analysis and surgical simulation. Clinical employment of such techniques is hindered by conflicting requirements for high fidelity in the modelling approach, and fast solution speeds. We report the development of techniques for high-speed nonlinear finite element analysis for surgical simulation. We use a fully nonlinear total Lagrangian explicit finite element formulation which offers significant computational advantages for soft tissue simulation. However, the key contribution of the work is the presentation of a fast graphics processing unit (GPU) solution scheme for the finite element equations. To the best of our knowledge, this represents the first GPU implementation of a nonlinear finite element solver. We show that the present explicit finite element scheme is well suited to solution via highly parallel graphics hardware, and that even a midrange GPU allows significant solution speed gains (up to 16.8×) compared with equivalent CPU implementations. For the models tested the scheme allows real-time solution of models with up to 16,000 tetrahedral elements. The use of GPUs for such purposes offers a cost-effective high-performance alternative to expensive multi-CPU machines, and may have important applications in medical image analysis and surgical simulation.
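    The appeal of explicit schemes on GPUs is that each nodal update depends only on the previous time levels, so every node can be advanced by its own thread. The following one-degree-of-freedom sketch of a damped central-difference update (illustrative constants, not the authors' soft-tissue formulation) shows the update structure and its convergence to the static solution:

    ```python
    def explicit_relax(k=100.0, m=1.0, c=5.0, f=10.0, dt=1e-3, steps=20000):
        """Damped explicit central-difference integration of
        m*u'' + c*u' + k*u = f for a single degree of freedom.

        Applied node by node, the same leapfrog-style update is the core of
        total Lagrangian explicit FE schemes: each node advances
        independently, mapping naturally onto one GPU thread per node."""
        u_prev, u = 0.0, 0.0
        for _ in range(steps):
            # m*(u_next - 2u + u_prev)/dt^2 + c*(u - u_prev)/dt + k*u = f
            u_next = (f - k * u - c * (u - u_prev) / dt) * dt * dt / m \
                     + 2 * u - u_prev
            u_prev, u = u, u_next
        return u
    ```

    With damping, the displacement relaxes to the static answer u = f/k; the time step must stay below the explicit stability limit (roughly 2/sqrt(k/m) here).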

  13. 48 CFR 52.225-17 - Evaluation of Foreign Currency Offers.

    Science.gov (United States)

    2010-10-01

    ... Clauses 52.225-17 Evaluation of Foreign Currency Offers. As prescribed in 25.1103(c), insert the following provision: Evaluation of Foreign Currency Offers (FEB 2000) If the Government receives offers in more than one currency, the Government will evaluate offers by converting the foreign currency to United...

  14. 77 FR 18707 - USPS Package Intercept-New Product Offerings

    Science.gov (United States)

    2012-03-28

    ... From the Federal Register Online via the Government Publishing Office POSTAL SERVICE 39 CFR Part 111 USPS Package Intercept--New Product Offerings AGENCY: Postal Service™. ACTION: Final rule with comments. SUMMARY: The Postal Service proposes to revise Mailing Standards of the United States Postal...

  15. THE ACCOUNTANT INFORMATION. DEMAND AND OFFER

    OpenAIRE

    Irina CHIRITA; Ioana ZAHEU

    2008-01-01

    The present paper attempts to correlate what demand and offer mean from the economic point of view, which ultimately leads to the demand for and offer of accounting information. The objective of the demand and offer of accounting information is to promote efficient financial communication, an objective that can be reached by confronting the informational offer with users' demands. The information provided by enterprises is the basis of numerous economic and po...

  16. Speeding up IA mechanically-steered multistatic radar scheduling with GP-GPUs

    CSIR Research Space (South Africa)

    Focke, RW

    2016-07-01

    Full Text Available In this paper, the authors investigate speeding up the execution time of Interval Algebra (IA) mechanically-steered multistatic and multisite radar scheduling using a general-purpose graphical processing unit (GP-GPU). Multistatic/multisite radar...

  17. Determinants of High Schools' Advanced Course Offerings

    Science.gov (United States)

    Iatarola, Patrice; Conger, Dylan; Long, Mark C.

    2011-01-01

    This article examines the factors that determine a high school's probability of offering Advanced Placement (AP) and International Baccalaureate (IB) courses. The likelihood that a school offers advanced courses, and the number of sections that it offers, is largely driven by having a critical mass of students who enter high school with…

  18. The primary relevance of subconsciously offered attitudes

    DEFF Research Database (Denmark)

    Kristiansen, Tore

    2015-01-01

    ...between consciously (overtly) and subconsciously (covertly) offered attitudes – because subconsciously offered attitudes appear to be a driving force in linguistic variation and change in a way that consciously offered attitudes are not. The argument is based on evidence from empirical investigations of attitudes and use in the ‘...

  19. Fast TPC Online Tracking on GPUs and Asynchronous Data Processing in the ALICE HLT to facilitate Online Calibration

    Science.gov (United States)

    Rohr, David; Gorbunov, Sergey; Krzewicki, Mikolaj; Breitner, Timo; Kretz, Matthias; Lindenstruth, Volker

    2015-12-01

    ALICE (A Large Ion Collider Experiment) is one of the four major experiments at the Large Hadron Collider (LHC) at CERN, which is today the most powerful particle accelerator worldwide. The High Level Trigger (HLT) is an online compute farm of about 200 nodes, which reconstructs events measured by the ALICE detector in real time. The HLT uses a custom online data-transport framework to distribute data and workload among the compute nodes. ALICE employs several calibration-sensitive subdetectors, e.g. the TPC (Time Projection Chamber). For a precise reconstruction, the HLT has to perform the calibration online. Online calibration can make certain Offline calibration steps obsolete and can thus speed up Offline analysis. Looking forward to ALICE Run III starting in 2020, online calibration becomes a necessity. The main detector used for track reconstruction is the TPC. Reconstructing the trajectories in the TPC is the most compute-intensive step during event reconstruction; therefore, a fast tracking implementation is of great importance. Reconstructed TPC tracks form the basis for the calibration, making fast online tracking mandatory. We present several components developed for the ALICE High Level Trigger to perform fast event reconstruction and to provide features required for online calibration. As the first topic, we present our TPC tracker, which employs GPUs to speed up the processing, and which is based on a Cellular Automaton and on the Kalman filter. Our TPC tracking algorithm was successfully used in 2011 and 2012 in the lead-lead and proton-lead runs. We have improved it to leverage features of newer GPUs, and we have ported it to support OpenCL, CUDA, and CPUs with a single common source code, making us vendor independent. As the second topic, we present framework extensions required for online calibration. The extensions, however, are generic and can be used for other purposes as well. We have extended the framework to support asynchronous compute
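    The Kalman-filter stage of such a tracker can be illustrated with a scalar toy. The sketch below is a generic 1-D constant-velocity filter (with a simplified process noise q added to both diagonal covariance terms), not the ALICE tracker itself; it shows only the predict/update cycle that the real fitter applies per track parameter.

    ```python
    def kalman_track(measurements, dt=1.0, q=1e-3, r=1.0):
        """Minimal 1-D constant-velocity Kalman filter.

        State is [position, velocity]; each measurement z observes the
        position only (H = [1, 0]). q and r are illustrative noise levels."""
        x = [measurements[0], 0.0]           # state estimate
        P = [[1.0, 0.0], [0.0, 1.0]]         # covariance estimate
        for z in measurements[1:]:
            # predict: x <- F x, P <- F P F^T + Q, with F = [[1, dt], [0, 1]]
            x = [x[0] + dt * x[1], x[1]]
            P = [[P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q,
                  P[0][1] + dt * P[1][1]],
                 [P[1][0] + dt * P[1][1],
                  P[1][1] + q]]
            # update with position measurement z
            s = P[0][0] + r                  # innovation variance
            k0, k1 = P[0][0] / s, P[1][0] / s
            innov = z - x[0]
            x = [x[0] + k0 * innov, x[1] + k1 * innov]
            P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                 [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        return x
    ```

    Fed a noiseless linear trajectory, the filter converges to the true position and velocity within a few tens of steps.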

  20. Application Performance Analysis and Efficient Execution on Systems with multi-core CPUs, GPUs and MICs: A Case Study with Microscopy Image Analysis.

    Science.gov (United States)

    Teodoro, George; Kurc, Tahsin; Andrade, Guilherme; Kong, Jun; Ferreira, Renato; Saltz, Joel

    2017-01-01

    We carry out a comparative performance study of multi-core CPUs, GPUs and the Intel Xeon Phi (Many Integrated Core, MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core operations of the application. We correlate the observed performance with the characteristics of the computing devices and with the data access patterns, computational complexities, and parallelization forms of the operations. The results show a significant variability in the performance of operations with respect to the device used. For operations with regular data access, performance on a MIC is comparable to, and sometimes better than, that on a GPU. GPUs are more efficient than MICs for operations that access data irregularly, because of the lower bandwidth of the MIC for random data accesses. We propose new performance-aware scheduling strategies that consider variabilities in operation speedups. Our scheduling strategies significantly improve application performance compared to classic strategies in hybrid configurations.

  1. A comparison between parallelization approaches in molecular dynamics simulations on GPUs.

    Science.gov (United States)

    Rovigatti, Lorenzo; Sulc, Petr; Reguly, István Z; Romano, Flavio

    2015-01-05

    We test the relative performances of two different approaches to the computation of forces for molecular dynamics simulations on graphics processing units. A "vertex-based" approach, where a computing thread is started per particle, is compared to an "edge-based" approach, where a thread is started per each potentially non-zero interaction. We find that the former is more efficient for systems with many simple interactions per particle while the latter is more efficient if the system has more complicated interactions or fewer of them. By comparing computation times on more and less recent graphics processing unit technology, we predict that, if the current trend of increasing the number of processing cores--as opposed to their computing power--remains, the "edge-based" approach will gradually become the most efficient choice in an increasing number of cases.
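    The two layouts compared above can be mocked up in a few lines. The toy 1-D harmonic pair force below is purely illustrative (not the paper's potentials), but it shows the structural difference, and why both layouts must agree on the resulting forces:

    ```python
    from itertools import combinations

    def pair_force(xi, xj, k=1.0):
        # toy linear (harmonic) pair force along the 1-D separation
        return k * (xj - xi)

    def forces_vertex_based(pos, cutoff=1.5):
        """One "thread" per particle: each particle scans all others,
        as a per-particle GPU thread would scan its neighbour list."""
        n = len(pos)
        f = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i != j and abs(pos[j] - pos[i]) < cutoff:
                    f[i] += pair_force(pos[i], pos[j])
        return f

    def forces_edge_based(pos, cutoff=1.5):
        """One "thread" per interacting pair: each pair writes to both
        endpoints (on a GPU this requires atomics or a reduction)."""
        n = len(pos)
        f = [0.0] * n
        for i, j in combinations(range(n), 2):
            if abs(pos[j] - pos[i]) < cutoff:
                fij = pair_force(pos[i], pos[j])
                f[i] += fij
                f[j] -= fij   # Newton's third law
        return f
    ```

    The edge-based version halves the force evaluations but introduces write contention, which is exactly the trade-off the paper's benchmarks quantify.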

  2. Efficient parallelisation techniques for applications running on GPUs using the CUDA framework

    OpenAIRE

    Ottesen, Alexander

    2009-01-01

    Modern graphics processing units (GPUs) are powerful parallel multi-core processing devices that are found in most computers today. The increase in the processing resources of the GPU, coupled with improvements in and the flexibility of the programming frameworks, has increased the interest in general-purpose programming on the GPU (GPGPU). In this thesis, we investigate how the GPU architecture and its processing capabilities can be utilised in general-purpose applications using the NVIDIA c...

  3. Graphics processing units accelerated semiclassical initial value representation molecular dynamics

    Energy Technology Data Exchange (ETDEWEB)

    Tamascelli, Dario; Dambrosio, Francesco Saverio [Dipartimento di Fisica, Università degli Studi di Milano, via Celoria 16, 20133 Milano (Italy); Conte, Riccardo [Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322 (United States); Ceotto, Michele, E-mail: michele.ceotto@unimi.it [Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano (Italy)

    2014-05-07

    This paper presents a Graphics Processing Units (GPUs) implementation of the Semiclassical Initial Value Representation (SC-IVR) propagator for vibrational molecular spectroscopy calculations. The time-averaging formulation of the SC-IVR for power spectrum calculations is employed. Details about the GPU implementation of the semiclassical code are provided. Four molecules with an increasing number of atoms are considered and the GPU-calculated vibrational frequencies perfectly match the benchmark values. The computational time scaling of two GPUs (NVIDIA Tesla C2075 and Kepler K20), respectively, versus two CPUs (Intel Core i5 and Intel Xeon E5-2687W) and the critical issues related to the GPU implementation are discussed. The resulting reduction in computational time and power consumption is significant and semiclassical GPU calculations are shown to be environmentally friendly.

  4. Accelerating finite-rate chemical kinetics with coprocessors: comparing vectorization methods on GPUs, MICs, and CPUs

    CERN Document Server

    Stone, Christopher P

    2016-01-01

    Efficient ordinary differential equation solvers for chemical kinetics must take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and nonstiff Runge-Kutta solver are implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms with OpenCL. The performances of these parallel implementations were measured with three chemical kinetic models across several multicore and many-core platforms. Two runtime benchmarks were conducted to clearly determine any performance advantage offered by either method: evaluating the right-hand-side source terms in parallel, and integrating a series of constant-pressure homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three ti...

  5. Fat versus Thin Threading Approach on GPUs: Application to Stochastic Simulation of Chemical Reactions

    KAUST Repository

    Klingbeil, Guido

    2012-02-01

    We explore two different threading approaches on a graphics processing unit (GPU) exploiting two different characteristics of the current GPU architecture. The fat thread approach tries to minimize data access time by relying on shared memory and registers, potentially sacrificing parallelism. The thin thread approach maximizes parallelism and tries to hide access latencies. We apply these two approaches to the parallel stochastic simulation of chemical reaction systems using the stochastic simulation algorithm (SSA) by Gillespie [14]. In these cases, the proposed thin thread approach shows comparable performance while eliminating the limitation of the reaction system's size. © 2006 IEEE.
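    The SSA inner loop that each GPU thread executes can be sketched for the simplest possible system, irreversible decay A → ∅ with a single reaction channel (the rate constant and population below are hypothetical):

    ```python
    import math
    import random

    def ssa_decay(n0=100, k=1.0, seed=42):
        """One Gillespie direct-method trajectory for the decay A -> 0.

        With a single channel the channel-selection step is trivial and
        only the exponential waiting time remains; in the thin-thread
        layout every GPU thread runs one such independent trajectory."""
        rng = random.Random(seed)
        t, n, times = 0.0, n0, [0.0]
        while n > 0:
            a0 = k * n                                # total propensity
            # exponential waiting time; 1 - u maps [0,1) onto (0,1]
            t += -math.log(1.0 - rng.random()) / a0
            n -= 1                                    # the single channel fires
            times.append(t)
        return times, n
    ```

    A full SSA adds a second uniform draw to pick which reaction fires in proportion to its propensity; the memory needed for the propensity table is what the fat/thin trade-off is about.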

  6. Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining

    CERN Document Server

    Yang, Xintian; Sadayappan, Ponnuswamy

    2011-01-01

    Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPU) has been at the heart of numerous studies in both academia and industry. In this article we present a novel non-parametric, self-tunable, approach to data representation for computing this kernel, particularly targeting sparse matrices representing power-law graphs. Using real data, we show how our representation scheme, coupled with a novel tiling algorithm, can yield significant benefits over the current state-of-the-art GPU efforts on a number of core data mining algorithms such as PageRank, HITS and Random Walk with Restart.
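    In its row-wise CSR form, the kernel in question is only a few lines. The sketch below is the textbook scalar formulation, not the paper's tiled representation:

    ```python
    def csr_spmv(indptr, indices, data, x):
        """y = A @ x for a matrix A stored in CSR form.

        indptr[row] .. indptr[row+1] delimits row's nonzeros in
        indices/data. On a GPU one typically maps a thread (or warp) to
        each row; rows of a power-law graph vary wildly in length, which
        is exactly the load imbalance tiling schemes target."""
        y = []
        for row in range(len(indptr) - 1):
            s = 0.0
            for k in range(indptr[row], indptr[row + 1]):
                s += data[k] * x[indices[k]]
            y.append(s)
        return y
    ```

    Iterative graph algorithms such as PageRank reduce to repeating this product, which is why SpMV throughput dominates their running time.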

  7. Marketing strategy to differentiate the offer

    OpenAIRE

    Miceski, Trajko; Pasovska, Silvana

    2013-01-01

    The marketing strategy of differentiating the offer is an important and widely accepted strategy, especially among larger legal entities. Differentiation of the offer leads to greater profit and profitability in operation, by directing demand towards the enterprise's product. Vertical differentiation of the offer is directed towards the quality of the product itself, which is perceived as superior to the competing product, which is perceived as somet...

  8. Tourists’ expectations towards the agritourism farms’ offer

    Directory of Open Access Journals (Sweden)

    Iwona Wilk

    2013-06-01

    Full Text Available Agritourism plays an important role in multifunctional agriculture. Its development depends on identifying agritourists' needs with respect to the desired components of the agritourism offer, which contributes to improving those components in the market activity of agritourism farms. The aim of the study was to determine customers' preferences regarding the offer of agritourism farms in the Lodz region. The study was carried out on a sample of 120 respondents in 2011 (July-August) and revealed that agritourists expect an offer composed of various options provided by the agritourism farm and matched to their individual needs.

  9. Fast and accurate protein substructure searching with simulated annealing and GPUs

    Directory of Open Access Journals (Sweden)

    Stivala Alex D

    2010-09-01

    Full Text Available Abstract Background Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif searching. Results We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing, that is as fast or faster and comparable in accuracy, with some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU. Conclusions The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.
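    The accept/reject core of simulated annealing is independent of the tableau representation. A generic sketch with geometric cooling is shown below; the integer state space, neighbourhood and parameters are toy choices for illustration, not the paper's search space.

    ```python
    import math
    import random

    def anneal(energy, start, neighbours, steps=5000, t0=10.0, seed=1):
        """Generic simulated-annealing loop with geometric cooling.

        energy(x) scores a state, neighbours(x) lists candidate moves.
        Running many such chains in parallel, one per GPU thread, is the
        standard way to parallelise annealing-based search."""
        rng = random.Random(seed)
        x, e = start, energy(start)
        best, best_e = x, e
        for step in range(steps):
            t = t0 * 0.999 ** step                 # cooling schedule
            cand = rng.choice(neighbours(x))
            ce = energy(cand)
            # Metropolis criterion: always accept downhill, sometimes uphill
            if ce <= e or rng.random() < math.exp((e - ce) / t):
                x, e = cand, ce
                if e < best_e:
                    best, best_e = x, e
        return best, best_e
    ```

    Example use: minimising (x - 7)^2 over the integers with ±1 moves drives the chain to the neighbourhood of x = 7.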

  10. Online Tracking Algorithms on GPUs for the P̅ANDA Experiment at FAIR

    Science.gov (United States)

    Bianchi, L.; Herten, A.; Ritman, J.; Stockmanns, T.; Adinetz, A.; Kraus, J.; Pleiter, D.

    2015-12-01

    P̅ANDA is a future hadron and nuclear physics experiment at the FAIR facility under construction in Darmstadt, Germany. In contrast to the majority of current experiments, P̅ANDA's strategy for data acquisition is based on event reconstruction from free-streaming data, performed in real time entirely by software algorithms using global detector information. This paper reports the status of the development of algorithms for the reconstruction of charged particle tracks, optimized for online data processing, using general-purpose graphics processing units (GPUs). Two track-finding algorithms, the Triplet Finder and the Circle Hough, are described, and details of their GPU implementations are highlighted. Average track reconstruction times of less than 100 ns are obtained running the Triplet Finder on state-of-the-art GPU cards. In addition, a proof-of-concept system for the dispatch of data to tracking algorithms using message queues is presented.

  11. Parallel algorithm for real-time contouring from grid DEM on modern GPUs

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    A real-time algorithm for constructing contour maps from grid DEM data is presented. It runs completely within the programmable 3D visualization pipeline. The interpolation is parallelized by the rasterizer units in the graphics card, and contour line extraction is parallelized by the pixel shader. During each frame of the rendering, we first make an elevation gradient map out of the original terrain vertex data, then figure out the final contour lines with image-space processing, and directly blend the results onto the original scene with alpha-blending to obtain a final scene with a contour map. We implement this method in our global 3D digital-earth system with the Direct3D 9.0c API and test it on some consumer-level PC platforms. For an arbitrary scene with a certain LOD level, the process takes less than 10 ms, giving topologically correct, anti-aliased contour lines.
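    The per-cell contour extraction that such a shader performs reduces to linear interpolation along cell edges, as in marching squares. A CPU sketch for a single unambiguous cell is shown below (unit-cell coordinates and the corner ordering are assumptions for illustration, not the paper's implementation):

    ```python
    def contour_segments(cell, level):
        """Extract iso-line crossings from one grid cell (marching squares).

        cell = (v00, v10, v11, v01): scalar values at the four corners,
        listed counter-clockwise from the lower-left of a unit cell. The
        per-edge linear interpolation below is the arithmetic a pixel
        shader performs per fragment."""
        corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
        crossings = []
        for i in range(4):
            a, b = cell[i], cell[(i + 1) % 4]
            if (a < level) != (b < level):        # edge crossed by the iso-line
                t = (level - a) / (b - a)         # linear interpolation weight
                (xa, ya), (xb, yb) = corners[i], corners[(i + 1) % 4]
                crossings.append((xa + t * (xb - xa), ya + t * (yb - ya)))
        # unambiguous cells yield 0 or 2 crossings; pair them into a segment
        return [tuple(crossings)] if len(crossings) == 2 else []
    ```

    The ambiguous saddle case (4 crossings) needs an extra disambiguation rule and is omitted from this sketch.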

  12. 16 CFR 502.101 - Introductory offers.

    Science.gov (United States)

    2010-01-01

    ... FAIR PACKAGING AND LABELING ACT Retail Sale Price Representations § 502.101 Introductory offers. (a... retail sale at a price lower than the anticipated ordinary and customary retail sale price. (b) The... duration in excess of 6 months. (4) At the time of making the introductory offer promotion, the...

  13. 16 CFR 238.2 - Initial offer.

    Science.gov (United States)

    2010-01-01

    ... 16 Commercial Practices 1 2010-01-01 2010-01-01 false Initial offer. 238.2 Section 238.2 Commercial Practices FEDERAL TRADE COMMISSION GUIDES AND TRADE PRACTICE RULES GUIDES AGAINST BAIT ADVERTISING § 238.2 Initial offer. (a) No statement or illustration should be used in any advertisement...

  14. 7 CFR 3560.656 - Incentives offers.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 15 2010-01-01 2010-01-01 false Incentives offers. 3560.656 Section 3560.656... AGRICULTURE DIRECT MULTI-FAMILY HOUSING LOANS AND GRANTS Housing Preservation § 3560.656 Incentives offers. (a....653(d), incentives to agree to the restrictive-use period in § 3560.662 if the following conditions...

  15. An offer you can’t refuse

    OpenAIRE

    Fletcher, Roland

    2001-01-01

    A valid contract generally requires an offer, acceptance, consideration, intention, capacity and, if necessary, the correct formation, e.g., whether the contract has to be in writing. The focus of this article is on offer, acceptance, consideration and the invitation to treat when dealing with contracts concluded during an auction.

  16. 17 CFR 230.252 - Offering statement.

    Science.gov (United States)

    2010-04-01

    ..., language and pagination. The requirements for offering statements are the same as those specified in § 230... Officer, a majority of the members of its board of directors or other governing body, and each selling... that contains the following language: This offering statement shall become qualified on the...

  17. Home-care companies' offerings take off.

    Science.gov (United States)

    Lutz, S

    1991-06-03

    Some home infusion therapy companies have been the beneficiaries of cash infusions thanks to the bullish reception of public offerings this year. The lucrative industry, reimbursed primarily by private payers and one of the fastest growing in healthcare, has long been a favorite on Wall Street. The companies plan to use proceeds from the successful offerings to pay off debt and finance expansion.

  18. Feature-Adaptive Rendering of Loop Subdivision Surfaces on Modern GPUs

    Institute of Scientific and Technical Information of China (English)

    黄韵岑; 冯结青; 崔元敏; 杨宝光

    2014-01-01

    We present a novel approach for real-time rendering Loop subdivision surfaces on modern graphics hardware. Our algorithm evaluates both positions and normals accurately, thus providing the true Loop subdivision surface. The core idea is to recursively refine irregular patches using a GPU compute kernel. All generated regular patches are then directly evaluated and rendered using the hardware tessellation unit. Our approach handles triangular control meshes of arbitrary topologies and incorporates common subdivision surface features such as semi-sharp creases and hierarchical edits. While surface rendering is accurate up to machine precision, we also enforce a consistent bitwise evaluation of positions and normals at patch boundaries. This is particularly useful in the context of displacement mapping which strictly requires matching surface normals. Furthermore, we incorporate efficient level-of-detail rendering where subdivision depth and tessellation density can be adjusted on-the-fly. Overall, our algorithm provides high-quality results at real-time frame rates, thus being ideally suited to interactive rendering applications such as video games or authoring tools.

  19. Harnessing the Power of GPUs to Speed Up Feature Selection for Outlier Detection

    Institute of Scientific and Technical Information of China (English)

    Fatemeh Azmandian; Ayse Yilmazer; Jennifer G Dy; Javed A Aslam; David R Kaeli

    2014-01-01

    Acquiring a set of features that emphasize the differences between normal data points and outliers can drastically facilitate the task of identifying outliers. In our work, we present a novel non-parametric evaluation criterion for filter-based feature selection which has an eye towards the final goal of outlier detection. The proposed method seeks the subset of features that represent the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm compared with popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and perform well on highly imbalanced datasets. Furthermore, due to the highly parallelizable nature of the feature selection, we implement the algorithm on a graphics processing unit (GPU) to gain significant speedup over the serial version. The benefits of the GPU implementation are two-fold, as its performance scales very well in terms of the number of features, as well as the number of data points.

  20. An evaluation of GPUs for use in an upgraded ATLAS High Level Trigger

    CERN Document Server

    Tavares Delgado, Ademar; The ATLAS collaboration; Bold, Tomasz; Augusto, Jose; Kama, Sami; Negrini, Matteo; Sidoti, Antonio; Rinaldi, Lorenzo; Tupputi, Salvatore; Baines, John; Bauce, Matteo; Messina, Andrea; Emeliyanov, Dmitry; Elliott, Aaron Kyle; Greenwood, Zeno Dixon; Laosooksathit, Supada

    2015-01-01

ATLAS is a general purpose particle physics experiment located on the LHC collider at CERN. The ATLAS Trigger system consists of two levels, the first level (L1) implemented in hardware and the High Level Trigger (HLT) implemented in software running on a farm of commodity CPUs. The HLT reduces the trigger rate from the 100 kHz L1 accept rate to 1 kHz for recording, requiring an average per-event processing time of ~250 ms for this task. The HLT selection is based on reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Calorimeter. Performing this reconstruction within the available HLT farm resources presents a significant challenge that will grow after future LHC upgrades, which will result in higher detector occupancies. General purpose Graphics Processor Units (GPGPU) are being evaluated for possible future inclusion in an upgraded HLT farm. We report on a demonstrator that has been developed consisting of GPGPU implementations of the Calorimeter...

  1. Reionization simulations powered by GPUs I: the structure of the Ultraviolet radiation field

    CERN Document Server

    Aubert, Dominique

    2010-01-01

We present a set of cosmological simulations with radiative transfer in order to model the reionization history of the Universe. Galaxy formation and the associated star formation are followed self-consistently with gas and dark matter dynamics using the RAMSES code, while radiative transfer is performed as a post-processing step using a moment-based method with the M1 closure relation in the ATON code. The latter has been ported to a multiple Graphics Processing Units (GPU) architecture using CUDA + MPI, resulting in an overall acceleration (x80) that allows us to tackle radiative transfer problems at a resolution of 1024^3 + 2 levels of refinement for the hydro adaptive grid and 1024^3 for the RT cartesian grid. We observe a good convergence between our different resolution runs as long as the effects of finite resolution on the star formation history are properly taken into account. We also show that the neutral fraction depends on the total mass density, in a way close to the predictions of photoionization equi...

  2. An evaluation of GPUs for use in an upgraded ATLAS High Level Trigger

    CERN Document Server

    Tavares Delgado, Ademar; The ATLAS collaboration; Pastore, Francesca; Conde Muino, Patricia; Augusto, Jose; Kama, Sami; Negrini, Matteo; Sidoti, Antonio; Rinaldi, Lorenzo; Tupputi, Salvatore; Baines, John; Bauce, Matteo; Messina, Andrea; Emeliyanov, Dmitry; Elliott, Aaron Kyle; Greenwood Jr, Dick; Laosooksathit, Supada

    2015-01-01

ATLAS is a general purpose particle physics experiment located on the LHC collider at CERN. The ATLAS Trigger system consists of two levels, the first level (L1) implemented in hardware and the High Level Trigger (HLT) implemented in software running on a farm of commodity CPUs. The HLT reduces the trigger rate from the 100 kHz L1 accept rate to 1 kHz for recording, requiring an average per-event processing time of ~250 ms for this task. The HLT selection is based on reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Calorimeter. Performing this reconstruction within the available HLT farm resources presents a significant challenge that will grow after future LHC upgrades, which will result in higher detector occupancies. General purpose Graphics Processor Units (GPGPU) are being evaluated for possible future inclusion in an upgraded HLT farm. We report on a demonstrator that has been developed consisting of GPGPU implementations of the Calorimeter...

  3. Molecular dynamics simulations with many-body potentials on multiple GPUs - the implementation, package and performance

    CERN Document Server

    Hou, Qing; Zhou, Yulu; Cui, Jiechao; Cui, Zhenguo; Wang, Jun

    2013-01-01

    Molecular dynamics (MD) is an important research tool extensively applied in materials science. Running MD on a graphics processing unit (GPU) is an attractive new approach for accelerating MD simulations. Currently, GPU implementations of MD usually run in a one-host-process-one-GPU (OHPOG) scheme. This scheme may pose a limitation on the system size that an implementation can handle due to the small device memory relative to the host memory. In this paper, we present a one-host-process-multiple-GPU (OHPMG) implementation of MD with embedded-atom-model or semi-empirical tight-binding many-body potentials. Because more device memory is available in an OHPMG process, the system size that can be handled is increased to a few million or more atoms. In comparison with the CPU implementation, in which Newton's third law is applied to improve the computational efficiency, our OHPMG implementation has achieved a 28.9x~86.0x speedup in double precision, depending on the system size, the cut-off ranges and the number ...

  4. Efficient magnetohydrodynamic simulations on graphics processing units with CUDA

    Science.gov (United States)

    Wong, Hon-Cheng; Wong, Un-Hong; Feng, Xueshang; Tang, Zesheng

    2011-10-01

Magnetohydrodynamic (MHD) simulations based on the ideal MHD equations have become a powerful tool for modeling phenomena in a wide range of applications including laboratory, astrophysical, and space plasmas. In general, high-resolution methods for solving the ideal MHD equations are computationally expensive, and Beowulf clusters or even supercomputers are often used to run codes that implement these methods. With the advent of the Compute Unified Device Architecture (CUDA), modern graphics processing units (GPUs) provide an alternative approach to parallel computing for scientific simulations. In this paper we present, to the best of the authors' knowledge, the first implementation of MHD simulations entirely on GPUs with CUDA, named GPU-MHD, to accelerate the simulation process. GPU-MHD supports both single and double precision computations. A series of numerical tests have been performed to validate the correctness of our code. Accuracy evaluation by comparing single and double precision computation results is also given. Performance measurements of both single and double precision are conducted on both the NVIDIA GeForce GTX 295 (GT200 architecture) and GTX 480 (Fermi architecture) graphics cards. These measurements show that our GPU-based implementation achieves between one and two orders of magnitude of improvement, depending on the graphics card used, the problem size, and the precision, when compared to the original serial CPU MHD implementation. In addition, we extend GPU-MHD to support the visualization of the simulation results and thus the whole MHD simulation and visualization process can be performed entirely on GPUs.

  5. Vacancy Duration, Wage Offers, and Job Requirements

    DEFF Research Database (Denmark)

    Eriksson, Tor Viking; Chen, Long-Hwa

Besides wage offers, credentials like education, work experience and skill requirements are key screening tools for firms in their recruitment of new employees. This paper adds some new evidence to a relatively tiny literature on firms' recruitment behaviour. In particular, our analysis is concerned with how vacancy durations vary with firms' minimum wage offers and minimum job requirements (regarding education, skills, age, gender and earlier work experience). The empirical analysis is based on ten employer surveys carried out by the DGBAS on Taiwan during the period 1996-2006. We estimate logistic discrete hazard models with a rich set of job and firm characteristics as explanatory variables. The results show that vacancies associated with higher wage offers take, ceteris paribus, longer to be filled. The impact of firms' wage offers and credential requirements does not vary over...

  6. M.S. Offered in Industrial Chemistry

    Science.gov (United States)

    Chemical and Engineering News, 1975

    1975-01-01

    Describes graduate training geared specifically to prepare students for work in industry. Reports on schools offering such a program, and outlines the major characteristics of each school's curriculum. (GS)

  7. Mylan to Offer Generic EpiPen

    Science.gov (United States)

Manufacturer responds to mounting criticism about price hikes ... cheaper generic version of the emergency allergy treatment EpiPen will be made available within the next few ...

  8. Offering Spiritual Support for Family or Friends

    Science.gov (United States)

People who are very ill often ask spiritual questions, in seeking comfort, meaning and hope. While clergy, chaplains and other spiritual leaders may play an important role in spiritual ...

  9. Fast crustal deformation computing method for multiple computations accelerated by a graphics processing unit cluster

    Science.gov (United States)

    Yamaguchi, Takuma; Ichimura, Tsuyoshi; Yagi, Yuji; Agata, Ryoichiro; Hori, Takane; Hori, Muneo

    2017-08-01

    As high-resolution observational data become more common, the demand for numerical simulations of crustal deformation using 3-D high-fidelity modelling is increasing. To increase the efficiency of performing numerical simulations with high computation costs, we developed a fast solver using heterogeneous computing, with graphics processing units (GPUs) and central processing units, and then used the solver in crustal deformation computations. The solver was based on an iterative solver and was devised so that a large proportion of the computation was calculated more quickly using GPUs. To confirm the utility of the proposed solver, we demonstrated a numerical simulation of the coseismic slip distribution estimation, which requires 360 000 crustal deformation computations with 82 196 106 degrees of freedom.
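The kind of iterative solver described above spends most of its time in repeated matrix-vector products, which is exactly the kernel a heterogeneous CPU/GPU solver offloads. As a hedged illustration (a textbook conjugate gradient on a toy SPD system, not the authors' solver):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Textbook conjugate gradient for symmetric positive-definite systems.
    The dominant per-iteration cost is the matrix-vector product A @ p,
    the part a heterogeneous CPU/GPU solver computes on the GPU."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Small SPD test system; CG converges in at most 2 iterations here.
M = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(M, b)
print(np.round(x, 4))
```

For the 82 196 106-degree-of-freedom problems in the record, the same structure applies; only the matrix storage and the device placement of the products change.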

  10. Parallel computing for simultaneous iterative tomographic imaging by graphics processing units

    Science.gov (United States)

    Bello-Maldonado, Pedro D.; López, Ricardo; Rogers, Colleen; Jin, Yuanwei; Lu, Enyue

    2016-05-01

In this paper, we address the problem of accelerating inversion algorithms for nonlinear acoustic tomographic imaging by parallel computing on graphics processing units (GPUs). Nonlinear inversion algorithms for tomographic imaging often rely on iterative algorithms for solving an inverse problem, and are thus computationally intensive. We study the simultaneous iterative reconstruction technique (SIRT) for the multiple-input-multiple-output (MIMO) tomography algorithm, which enables parallel computation over the grid points as well as the parallel execution of multiple source excitations. Using graphics processing units (GPUs) and the Compute Unified Device Architecture (CUDA) programming model, an overall improvement of 26.33x was achieved when combining both approaches compared with sequential algorithms. Furthermore, we propose an adaptive iterative relaxation factor and the use of non-uniform weights to improve the overall convergence of the algorithm. Using these techniques, fast computations can be performed in parallel without the loss of image quality during the reconstruction process.
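SIRT's structure explains why it parallelizes over grid points: every unknown is updated simultaneously from all ray residuals. A minimal sketch on a toy system (the row/column-sum scaling shown is the standard SIRT form; the paper's adaptive relaxation and non-uniform weights are not reproduced here):

```python
import numpy as np

def sirt(A, b, iterations=200):
    """Simultaneous Iterative Reconstruction Technique:
    x <- x + C * (A.T @ (R * (b - A @ x))), where R and C hold the
    inverse row and column sums of A. Every component of x updates
    independently each sweep, which makes the method GPU-friendly."""
    R = 1.0 / A.sum(axis=1)          # inverse row sums (per ray)
    C = 1.0 / A.sum(axis=0)          # inverse column sums (per pixel)
    x = np.zeros(A.shape[1])
    for _ in range(iterations):
        x += C * (A.T @ (R * (b - A @ x)))
    return x

# Toy "tomography" system: 4 rays through a 3-pixel object.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
x_true = np.array([1.0, 2.0, 3.0])
x = sirt(A, A @ x_true)
print(np.round(x, 3))
```

With consistent data and a full-rank system, the iteration recovers the true image; the adaptive relaxation factor in the paper accelerates exactly this convergence.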

  11. Cigarette promotional offers: who takes advantage?

    Science.gov (United States)

    White, Victoria M; White, Martha M; Freeman, Karen; Gilpin, Elizabeth A; Pierce, John P

    2006-03-01

    Promotional offers on cigarettes (e.g., dollar-off, multipack discounts) composed the largest share of tobacco industry marketing expenditures, totaling $8.9 billion, or 72% of the total budget in 2002. Internal industry documents indicate that young adults, potential quitters, and other price-sensitive groups are the targets of these marketing tactics. How effective they are in actually reaching these groups in the general population of smokers has not yet been investigated. Data were from 4618 current smokers responding to the large, random-digit-dialed population-based 2002 California Tobacco Survey. The characteristics were identified of smokers who reported that they used these offers "every time I see one." Thirty-five percent of smokers used promotional offers every time they saw one. Multivariate analyses identified young adults, women, African Americans, those with higher daily cigarette consumption, and those worried about cigarette costs as more likely to use promotional offers at every opportunity. Smokers most committed to quitting were no more likely to use promotional offers than those with no intention to quit. Cigarette brand was highly correlated with age and race/ethnicity, and therefore was not included in the multivariate analysis. Those who smoked menthol cigarettes and Camels, more often young adults and African Americans, were much more likely than those of other brands to use promotional offers. With the exception of smokers intending to quit, cigarette promotional offers are effectively reaching most industry-targeted groups. Importantly, young adults, who have the greatest long-term customer potential, are responding.

  12. Vacancy Duration, Wage Offers, and Job Requirements

    DEFF Research Database (Denmark)

    Eriksson, Tor Viking; Chen, Long-Hwa

Besides wage offers, credentials like education, work experience and skill requirements are key screening tools for firms in their recruitment of new employees. This paper adds some new evidence to a relatively tiny literature on firms' recruitment behaviour. In particular, our analysis is concerned with how vacancy durations vary with firms' minimum wage offers and minimum job requirements (regarding education, skills, age, gender and earlier work experience). The empirical analysis is based on ten employer surveys carried out by the DGBAS on Taiwan during the period 1996-2006. We estimate logistic discrete hazard models with a rich set of job and firm characteristics as explanatory variables. The results show that vacancies associated with higher wage offers take, ceteris paribus, longer to be filled. The impact of firms' wage offers and credential requirements does not vary over the business cycle. However, firms vary their skills requirements over the business cycle: our empirical analysis shows that, for a given wage offer, requirements are stricter in recessions and downturns. Separating between reasons for posting vacancies turned out important in explaining differences in vacancy...

  13. Product Offerings Testing through Customer Satisfaction

    Directory of Open Access Journals (Sweden)

    Tina Vukasovic

    2014-09-01

Full Text Available Consumer satisfaction is imperative to a successful business, which explains the choice of topic for this paper. Market changes have given consumers enormous power, which many companies have recognized by adapting their business to meet those expectations. Adaptation, however, also creates the need for constant measurement and evaluation. Accordingly, this paper measures consumer satisfaction with the product offer of the drugstore chain X. The survey results show that X's offer has not completely met the expectations of a small number of interviewees. Across the defined ranks of consumer satisfaction, the greatest number of consumers turned out to be satisfied with the product offer, whereas the percentage of those who find it excellent is smaller than the percentage of those who assess it as average.

  14. Wind offering in energy and reserve markets

    DEFF Research Database (Denmark)

    Soares, Tiago; Pinson, Pierre; Morais, Hugo

    2016-01-01

    their revenue, since currently wind turbine/farm technologies allow them to provide ancillary services. Thus, wind power producers are to develop offering strategies for participation in both energy and reserve markets, accounting for market rules, while ensuring optimal revenue. We consider a proportional...... offering strategy to optimally decide upon participation in both markets by maximizing expected revenue from day-ahead decisions while accounting for estimated regulation costs for failing to provide the services. An evaluation of considering the same proportional splitting of energy and reserve in both...

  15. Hair-offerings: an enigmatic Egyptian custom

    Directory of Open Access Journals (Sweden)

    G. J. Tassie

    1996-11-01

Full Text Available The Egyptians did not record the reasons that lay behind the offering of hair. Using a holistic approach, which combines both ethnographic and ethnohistoric evidence, insights may be gained into the ancient remains of these rituals and practices.

  16. Developing an integrated offer for sustainable renovations

    NARCIS (Netherlands)

    Cré, J.; Mlecnik, E.; Kondratenko, I.; Degraeve, P.; Van der Have, J.A.; Vrijders, J.; Van Dessel, J.; Haavik, T.; Aabrekk, S.; Paiho, S.; Stenlund, O.; Svendsen, S.; Vanhoutteghem, L.; Hansen, S.

    2012-01-01

Within an ERANET-ERACOBUILD project, this study investigates the opportunities and barriers to establish a "one stop shop" with an integrated supply side, to counteract the fragmented offer in sustainable renovation of single-family houses and to increase the level of knowledge, skills and innovation.

  17. The timing of initial public offerings

    NARCIS (Netherlands)

    Benninga, Simon; Helmantel, Mark; Sarig, Oded

    2001-01-01

In this paper, we study the dynamics of initial public offerings (IPOs) by examining the tradeoff between an entrepreneur's private benefits, which are lost whenever the firm is publicly traded, versus the advantages from diversification. We characterize the timing dimension of the decision

  18. Data Sorting Using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    M. J. Mišić

    2012-06-01

    Full Text Available Graphics processing units (GPUs have been increasingly used for general-purpose computation in recent years. The GPU accelerated applications are found in both scientific and commercial domains. Sorting is considered as one of the very important operations in many applications, so its efficient implementation is essential for the overall application performance. This paper represents an effort to analyze and evaluate the implementations of the representative sorting algorithms on the graphics processing units. Three sorting algorithms (Quicksort, Merge sort, and Radix sort were evaluated on the Compute Unified Device Architecture (CUDA platform that is used to execute applications on NVIDIA graphics processing units. Algorithms were tested and evaluated using an automated test environment with input datasets of different characteristics. Finally, the results of this analysis are briefly discussed.
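Of the three algorithms evaluated, Radix sort is the one whose per-digit counting passes map most naturally onto GPU threads. A minimal CPU sketch of LSD radix sort (an illustration of the algorithm family, not the CUDA implementations benchmarked in the paper):

```python
def radix_sort(values, bits=32, radix_bits=8):
    """LSD radix sort on non-negative integers: one stable
    counting-sort pass per 8-bit digit. On a GPU, the histogram
    and scatter steps of each pass parallelize over the input."""
    mask = (1 << radix_bits) - 1
    for shift in range(0, bits, radix_bits):
        buckets = [[] for _ in range(mask + 1)]
        for v in values:                          # histogram/scatter pass
            buckets[(v >> shift) & mask].append(v)
        values = [v for b in buckets for v in b]  # gather in digit order
    return values

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

The stability of each pass is what lets later (more significant) digits preserve the ordering established by earlier ones.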

  19. Porting LAMMPS to GPUs.

    Energy Technology Data Exchange (ETDEWEB)

    Brown, William Michael; Plimpton, Steven James; Wang, Peng; Agarwal, Pratul K.; Hampton, Scott; Crozier, Paul Stewart

    2010-03-01

    LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.

  20. Service Offering at Electrical Equipment Manufacturers

    Directory of Open Access Journals (Sweden)

    Lucie Kaňovská

    2015-09-01

Full Text Available Purpose of the article: The aim of the paper is to uncover ways of managing the service offering provided by electrical equipment manufacturers in the Czech Republic. The segment is extremely important for Czech industry today, especially because many of these companies are subcontractors for the car industry and mechanical engineering. The producers of electrical equipment fall under the Czech industry classification CZ-NACE 27. Methodology/methods: A questionnaire in the form of a Likert scale was prepared to gather information about customer services. The respondents were usually directors or managers, i.e. employees with deep knowledge of customer services in this particular market. A total of 22 companies were included in the survey. Research focused on the following industry classifications belonging to CZ-NACE 27: CZ-NACE 27, CZ-NACE 271 and CZ-NACE 273. According to the Czech Statistical Office, the total number of companies belonging to these 3 segments is 136, which means that 16.2% of the companies belonging to CZ-NACE 27 participated in our research. Basic statistical methods were used to analyse the complete database. Scientific aim: The paper deals with the problem of the service offering provided by today's manufacturers. Global understanding of the services that manufacturers really develop, sell, deliver and manage is still limited. Findings: Managing the service offering provided by today's manufacturers shows that: (1) Manufacturers offer not only tangible products, but also a wide range of services and even information and support. (2) New products are designed not only according to company technicians, but also according to their customers; products and services are developed, tested and improved according to customer needs. (3) Services provide complex customer care from the time of product selection to its end. Conclusions: Manufacturers of tangible products need to enlarge their product offering to be able to satisfy customers. Therefore ...

  1. Designing and Implementing an OVERFLOW Reader for ParaView and Comparing Performance Between Central Processing Units and Graphical Processing Units

    Science.gov (United States)

    Chawner, David M.; Gomez, Ray J.

    2010-01-01

In the Applied Aerosciences and CFD branch at Johnson Space Center, computational simulations are run that face many challenges, two of which are the need to customize software for specialized tasks and the need to run simulations as fast as possible. Many different tools are used for running these simulations, each with its own pros and cons. Once the simulations are run, software is needed that can visualize the results in an appealing manner. Some of this software is open source, meaning that anyone can modify the source code and distribute the modifications to other users in a future release. This is very useful, especially in a branch where many different tools are in use: file readers can be written to load any file format into a program, easing the bridge from one tool to another. Programming such a reader requires knowledge of the file format being read as well as the equations necessary to obtain the derived values after loading. These CFD simulations load extremely large files and compute many derived values, and usually take a few hours to complete, even on the fastest machines. Graphics processing units (GPUs) have traditionally handled graphics rendering for computers; in recent years, however, GPUs have been applied to more general computation because of their speed. Applications run on GPUs have been known to run up to forty times faster than they would on conventional central processing units (CPUs). If these CFD programs were extended to run on GPUs, they would require much less time to complete, allowing more simulations to be run in the same amount of time and possibly enabling more complex computations.

  2. FOREIGN LANGUAGE PROGRAMS OFFERED IN TURKISH UNIVERSITIES

    Directory of Open Access Journals (Sweden)

    Bengül CETINTAS

    2016-10-01

    Full Text Available n this study, the departments of philology and teaching, which take place in higher education programs in Turkey and give education in foreign language, have been examined. 23 different languages are offered to philology students who wants to attend to faculty of literature. Students can prefer classical languages besides modern languages. However, English, German, French, Arabic and Japanese are offered to the students of teaching department. To teach another foreign language, pedagogical formation is also required.This study focuses on the departments of German Language Teaching and German Language and Literature. From this point, the place and the importance of other philology and foreign language teaching departments in Turkish higher education have been examined.

  3. What's in Those Boxes Anyway? An Analysis of School Book Club Offerings (Rapid Research Report).

    Science.gov (United States)

    Strickland, Dorothy S.; And Others

    1996-01-01

    Investigates the nature and range of materials available through school book clubs in the United States. Finds that the clubs offered a wide range of books that extend beyond contemporary realistic fiction. (SR)

  5. Real-Time Computation of Parameter Fitting and Image Reconstruction Using Graphical Processing Units

    CERN Document Server

    Locans, Uldis; Suter, Andreas; Fischer, Jannis; Lustermann, Werner; Dissertori, Gunther; Wang, Qiulin

    2016-01-01

    In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of muSR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission Tomography) image reconstruction and analysis. Applications currently in use were examined to identify parts of the algorithms in need of optimization. Efficient GPU kernels were created in order to allow applications to use a GPU, to speed up the previously identified parts. Benchmarking tests were performed in order to measure the ...

  6. Pulse shape analysis for segmented germanium detectors implemented in graphics processing units

    Energy Technology Data Exchange (ETDEWEB)

    Calore, Enrico, E-mail: enrico.calore@lnl.infn.it [INFN Laboratori Nazionali di Legnaro, Viale Dell' Università 2, I-35020 Legnaro, Padova (Italy); Bazzacco, Dino, E-mail: dino.bazzacco@pd.infn.it [INFN Sezione di Padova, Via Marzolo 8, I-35131 Padova (Italy); Recchia, Francesco, E-mail: francesco.recchia@pd.infn.it [INFN Sezione di Padova, Via Marzolo 8, I-35131 Padova (Italy); Dipartimento di Fisica e Astronomia dell' Università di Padova, Via Marzolo 8, I-35131 Padova (Italy)

    2013-08-11

    Position sensitive highly segmented germanium detectors constitute the state-of-the-art of the technology employed for γ-spectroscopy studies. The operation of large spectrometers composed of tens to hundreds of such detectors demands enormous amounts of computing power for the digital treatment of the signals. The use of Graphics Processing Units (GPUs) has been evaluated as a cost-effective solution to meet such requirements. Different implementations and the hardware constraints limiting the performance of the system are examined. -- Highlights: • We implemented the grid-search algorithm in OpenCL in order to be run on GPUs. • We compared its performances in respect to an optimized CPU implementation in C++. • We analyzed the results highlighting the most probable factors limiting their speed. • We propose some solutions to overcome their speed limits.
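The grid-search algorithm mentioned in the highlights scores each measured trace against a library of simulated signals for candidate interaction positions and keeps the best match. A hedged numpy sketch of that idea (the random "signal library" and the least-squares score are illustrative assumptions, not the detector's actual basis or figure of merit):

```python
import numpy as np

def grid_search(trace, basis):
    """Pick the grid point whose simulated signal best matches the
    measured trace (minimum sum of squared residuals). On a GPU,
    each candidate position is scored by an independent work-item."""
    residuals = basis - trace[None, :]     # (n_points, n_samples)
    chi2 = np.sum(residuals ** 2, axis=1)  # one score per grid point
    return int(np.argmin(chi2)), float(chi2.min())

rng = np.random.default_rng(0)
basis = rng.normal(size=(100, 56))              # simulated signal library
trace = basis[42] + 0.01 * rng.normal(size=56)  # noisy "measured" signal
best, score = grid_search(trace, basis)
print(best)
```

Because every candidate position is scored independently, the search is embarrassingly parallel, which is what makes an OpenCL/GPU port attractive.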

  7. Discontinuous Galerkin methods on graphics processing units for nonlinear hyperbolic conservation laws

    CERN Document Server

    Fuhry, Martin; Krivodonova, Lilia

    2016-01-01

    We present a novel implementation of the modal discontinuous Galerkin (DG) method for hyperbolic conservation laws in two dimensions on graphics processing units (GPUs) using NVIDIA's Compute Unified Device Architecture (CUDA). Both flexible and highly accurate, DG methods map well onto parallel architectures, as their discontinuous nature produces element-local approximations. GPUs, in turn, suit high performance scientific computing well: these powerful, massively parallel, cost-effective devices have recently gained support for double-precision floating point numbers. Computed examples for the Euler equations over unstructured triangular meshes demonstrate the effectiveness of our implementation on an NVIDIA GTX 580 device. Profiling of our method reveals performance comparable to an existing nodal DG-GPU implementation for linear problems.
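    The element-locality the abstract relies on is visible even in the degenerate piecewise-constant (p = 0) case, where modal DG reduces to first-order upwind finite volume: each cell update touches only itself and one neighbor. A 1D linear-advection sketch (illustrative only, not the authors' 2D Euler solver):

```python
import numpy as np

def dg0_advection_step(u, a, dx, dt):
    """One forward-Euler step of piecewise-constant (p = 0) DG, i.e.
    first-order upwind finite volume, for u_t + a*u_x = 0 with periodic
    boundaries. Each cell reads only itself and its upwind neighbor,
    so the update maps onto one independent GPU thread per cell."""
    assert a > 0.0
    flux_in = a * np.roll(u, 1)   # upwind flux entering each cell
    flux_out = a * u              # flux leaving each cell
    return u - (dt / dx) * (flux_out - flux_in)

u0 = np.array([0.0, 1.0, 0.0, 0.0])
u1 = dg0_advection_step(u0, a=1.0, dx=1.0, dt=0.5)
```

Higher-order modal DG adds per-element mass-matrix solves and face fluxes, but the data dependence stays element-plus-neighbors, which is what makes the GPU mapping work.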

  8. Speedup for quantum optimal control from automatic differentiation based on graphics processing units

    Science.gov (United States)

    Leung, Nelson; Abdelhafez, Mohamed; Koch, Jens; Schuster, David

    2017-04-01

    We implement a quantum optimal control algorithm based on automatic differentiation and harness the acceleration afforded by graphics processing units (GPUs). Automatic differentiation allows us to specify advanced optimization criteria and incorporate them in the optimization process with ease. We show that the use of GPUs can speed up calculations by more than an order of magnitude. Our strategy facilitates efficient numerical simulations on affordable desktop computers and exploration of a host of optimization constraints and system parameters relevant to real-life experiments. We demonstrate optimization of quantum evolution based on fine-grained evaluation of performance at each intermediate time step, thus enabling more intricate control of the evolution path, suppression of departures from the truncated model subspace, as well as minimization of the physical time needed to perform high-fidelity state preparation and unitary gates.
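    The core loop the abstract describes - differentiate a fidelity measure with respect to the controls and follow the gradient - can be sketched on a one-parameter toy problem: tune the angle of an X-axis rotation until it realizes an X gate. A central finite difference stands in below for the reverse-mode automatic differentiation the paper uses; everything here is illustrative:

```python
import numpy as np

def infidelity(theta):
    """Gate infidelity of Rx(theta) relative to a target X gate; for this
    one-parameter toy problem it reduces analytically to 1 - |sin(theta/2)|."""
    return 1.0 - abs(np.sin(theta / 2.0))

def grad(f, x, eps=1e-6):
    # central finite difference standing in for reverse-mode autodiff
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

theta = 1.0  # initial guess for the control parameter
for _ in range(200):
    theta -= 0.5 * grad(infidelity, theta)  # plain gradient descent
```

The optimum is theta = pi (a full X rotation). In the real setting theta becomes thousands of pulse amplitudes and the gradient comes from autodiff through the time-stepped propagator, which is where the GPU speedup pays off.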

  9. The Timing and Function of Offers in U.S. and Japanese Negotiations

    Science.gov (United States)

    Adair, Wendi L.; Weingart, Laurie; Brett, Jeanne

    2007-01-01

    The authors examined the function of offers in U.S. and Japanese integrative negotiations. They proposed that early 1st offers begin information sharing and generate joint gains in Japan but have an anchoring effect that hinders joint gains in the United States. The data from the negotiation transcripts of 20 U.S. and 20 Japanese dyads supported 2…

  10. 31 CFR 316.13 - Reservation as to terms of offer.

    Science.gov (United States)

    2010-07-01

    … Title 31, Money and Finance: Treasury, § 316.13: Reservation as to terms of offer. Regulations Relating to Money and Finance (Continued), Fiscal Service, Department of the Treasury, Bureau of the Public Debt, Offering of United States...

  11. Analysis of impact of general-purpose graphics processor units in supersonic flow modeling

    Science.gov (United States)

    Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.

    2017-06-01

    Computational methods are widely used in the prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPUs) provide architectures and new programming models that make it possible to harness their large processing power and to run computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities for the use of GPUs in the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve the three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high-resolution numerical schemes. CUDA technology is used for the programming implementation of the parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the computed results are compared with experimental and computational data. Approaches to optimizing the CFD code with respect to the use of different types of memory are considered. Performance measurements show that the numerical schemes developed achieve a 20-50× speedup on GPU hardware compared to the central processing unit (CPU) reference implementation. The results obtained provide a promising perspective for designing a GPU-based software framework for applications in CFD.
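    The per-interface numerical flux evaluations that dominate such finite volume codes are mutually independent, which is what makes them GPU-friendly. A sketch using the simple Rusanov (local Lax-Friedrichs) flux for the 1D Euler equations - an illustrative stand-in, not the scheme of the paper:

```python
import numpy as np

GAMMA = 1.4  # ratio of specific heats for air

def euler_flux(q):
    """Physical flux of the 1D Euler equations; q = [rho, rho*u, E]."""
    rho, mom, E = q
    u = mom / rho
    p = (GAMMA - 1.0) * (E - 0.5 * rho * u * u)
    return np.array([mom, mom * u + p, (E + p) * u])

def max_wave_speed(q):
    rho, mom, E = q
    u = mom / rho
    p = (GAMMA - 1.0) * (E - 0.5 * rho * u * u)
    return abs(u) + np.sqrt(GAMMA * p / rho)  # |u| + sound speed

def rusanov_flux(qL, qR):
    """Local Lax-Friedrichs interface flux; every interface is independent,
    so a GPU can assign one thread per interface."""
    s = max(max_wave_speed(qL), max_wave_speed(qR))
    return 0.5 * (euler_flux(qL) + euler_flux(qR)) - 0.5 * s * (qR - qL)

# still air at rho = 1, u = 0, p = 1  =>  E = p / (GAMMA - 1) = 2.5
q = np.array([1.0, 0.0, 2.5])
```

For a uniform state the numerical flux must reduce to the physical flux, here [0, p, 0] = [0, 1, 0].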

  12. Shipping Firm to Offer Funeral Service

    Institute of Scientific and Technical Information of China (English)

    黄菊妹

    2001-01-01

    Japan’s largest shipping company, Nippon Yusen KK, plans to offer a new funeral ash-scattering service in a bid to capitalize on demand fuelled by the high cost of cemeteries in Japan, a financial daily said Sunday.

  13. Fast tone mapping operator implementation based on GPUs

    Institute of Scientific and Technical Information of China (English)

    钱银玲

    2013-01-01

    To improve the efficiency of tone mapping computation, this paper designed a fast implementation algorithm based on GPUs. First, combining a basic reduction algorithm with the parallel computing characteristics of GPUs, the algorithm used two kernel functions to compute the maximum luminance. It then computed the pixel-centered area average luminance by sharing intermediate area results. In addition, for video stream processing, it proposed a texture pool to resolve the mismatch between the speed at which the CPU reads data and the speed at which the GPU processes it, together with a heuristic method that updates the global maximum luminance according to the maximum luminance of a pixel subset. Experiments show a 4-5× speedup relative to CPU implementations of the same algorithm. The algorithm takes full advantage of the GPU's parallel computing capability, eliminates much duplicate computation, is fast enough for real-time rendering, and adapts well to textures of different scales.
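    The two-kernel maximum-luminance computation described above is a standard two-stage parallel reduction: the first kernel emits one partial maximum per thread block, the second reduces the partials to the global maximum. A sequential NumPy imitation (the block size is hypothetical):

```python
import numpy as np

def two_stage_max(lum, block=256):
    """Stage 1 ("kernel 1"): one partial maximum per block of pixels.
    Stage 2 ("kernel 2"): reduce the partial maxima to the global maximum."""
    partials = [lum[i:i + block].max() for i in range(0, len(lum), block)]
    return max(partials)

rng = np.random.default_rng(seed=7)
lum = rng.random(10_000)  # stand-in for a luminance texture
```

On a GPU each block's partial maximum is itself computed by a tree reduction in shared memory; the loop above only mimics the data flow between the two kernels.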

  14. CURRICULAR OFFER INFLUENCING STUDENTS’ SATISFACTION: COMPARATIVE STUDY

    Directory of Open Access Journals (Sweden)

    Oana DUMITRASCU

    2014-11-01

    Full Text Available The main objective of the study is to determine students’ satisfaction with curricular activities. The study combines qualitative and quantitative research, drawing on a bibliographic study and on various secondary and primary sources. It is built around a marketing research survey in which 699 students from four universities were questioned. In a comparative study, the University of Applied Sciences Worms, the University of Applied Sciences Wiesbaden Rüsselsheim, the University of Applied Sciences Frankfurt am Main and Nürtingen-Geislingen University were analysed and their similarities and differences identified. The collected data, based on the established sample, are evaluated through univariate and bivariate analysis. For each region, specific gaps in the curricular offer of the analysed universities are identified. As a result of the study, recommendations are presented for the University of Applied Sciences Worms regarding students’ satisfaction with the curricular offer.

  15. Wind offering in energy and reserve markets

    Science.gov (United States)

    Soares, T.; Pinson, P.; Morais, H.

    2016-09-01

    The increasing penetration of wind generation in power systems, needed to fulfil the ambitious European targets, will make wind power producers play an even more important role in the future power system. Wind power producers are being incentivized to participate in reserve markets to increase their revenue, since current wind turbine and wind farm technologies allow them to provide ancillary services. Thus, wind power producers must develop offering strategies for participation in both energy and reserve markets that account for market rules while ensuring optimal revenue. We consider a proportional offering strategy that optimally decides on participation in both markets by maximizing the expected revenue from day-ahead decisions while accounting for the estimated regulation costs of failing to provide the services. We also evaluate keeping the same proportional split of energy and reserve in both the day-ahead and balancing markets. A set of numerical examples illustrates the behavior of this strategy. An important conclusion is that the optimal split of the available wind power between energy and reserve depends strongly on prices and penalties on both market trading floors.
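    The proportional strategy can be sketched as a one-dimensional search over the split factor alpha: offer alpha of the expected wind as energy and (1 - alpha) as reserve, and penalize undelivered quantities scenario by scenario. All prices, penalties and wind scenarios below are invented for illustration and are not the paper's model:

```python
import numpy as np

def expected_revenue(alpha, scenarios, p_energy, p_reserve, penalty):
    """Proportional strategy: offer alpha * E[W] in the energy market and
    (1 - alpha) * E[W] in the reserve market, then pay a penalty on any
    quantity not covered by the realized wind w in each scenario."""
    offer = scenarios.mean()
    e_offer, r_offer = alpha * offer, (1.0 - alpha) * offer
    total = 0.0
    for w in scenarios:
        e_del = min(e_offer, alpha * w)          # energy actually delivered
        r_del = min(r_offer, (1.0 - alpha) * w)  # reserve actually available
        shortfall = (e_offer - e_del) + (r_offer - r_del)
        total += p_energy * e_del + p_reserve * r_del - penalty * shortfall
    return total / len(scenarios)

# illustrative numbers only: three equiprobable wind scenarios (MWh)
scenarios = np.array([50.0, 80.0, 110.0])
alphas = np.linspace(0.0, 1.0, 101)
best = max(alphas, key=lambda a: expected_revenue(a, scenarios, 40.0, 25.0, 60.0))
```

With the energy price above the reserve price and a symmetric penalty, the search pushes the whole offer into energy (best = 1.0), which illustrates the abstract's conclusion that the optimal split hinges on prices and penalties.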

  16. Offer - La Comédie theatre

    CERN Multimedia

    Staff Association

    2017-01-01

    The “La Comédie” theatre unveiled its programme for the season 2017–2018. We are delighted to share this brand new, rich and varied programme with you. The “La Comédie” theatre has various discounts for our members Buy 2 subscriptions for the price of 1 : 2 cards “Libertà” for CHF 240.- instead of CHF 480.- Cruise freely through the season with an 8-entry card valid for the shows of your choice. These cards are transferable and can be shared with one or more accompanying persons. 2 cards “Piccolo” for CHF 120 instead of CHF 240.- This card lets you discover 4 shows which are suitable for all audiences (offers valid while stock lasts) Benefit from a reduction of 20 % on a full price ticket during all the season: from CHF 40.- to CHF 24.- ticket instead of CHF 50.- to CHF 30.- depending on the show (Also valid for one accompanying person). Interested in one of these offers? Create an ac...

  17. Presentation tourism offer on the Internet

    Directory of Open Access Journals (Sweden)

    Ćurčić Nevena

    2006-01-01

    Full Text Available The Internet has become basic infrastructure for many businesses, a medium for representing them, and an increasingly influential source of information. It has therefore developed into a significant distribution, communication and trade channel in the tourism and catering industry throughout the world, including this region. This paper is a short survey of the various possibilities of the Internet, with special emphasis on an analysis of Serbian tourism’s presence online, by means of the tourism and catering offer in cyber companies’ systems of services. The aim of the paper is to partially systematize the present tourist offer of Serbia on the Internet, with a brief rating and analysis of the websites, to point out the presence of Serbian tourist and catering services online, the country’s restricting factors in e-business, and the importance of further promotional activities for Serbian tourism on the global network.

  19. Drug offers as a context for violence perpetration and victimization.

    Science.gov (United States)

    Helm, Susana; Okamoto, Scott; Kaliades, Alexis; Giroux, Danielle

    2014-01-01

    Drug use has been linked empirically with aggression and violence among youth in national and State of Hawai'i samples. However, the nature of this link and its implications for prevention are unclear. This article therefore explores the intersection of drugs with aggression and violence by using the drug offer context as the unit of analysis. Native Hawaiian youth were sampled because substance use rates tend to be higher, and onset earlier, for them than for their non-Hawaiian peers. Fourteen sex-specific focus group discussions were held with rural Native Hawaiian middle school students (N = 64). Students discussed what they think they would do, in terms of drug refusal strategies, in a variety of drug offer contexts. Although aggression and violence were perceived to be socially inappropriate, students nonetheless felt drug use would be even less socially competent. Narrative analyses indicated that aggression and violence were thought to function as potential drug refusal strategies. As proximal drug resistance, aggression and violence perpetration served as an immediate deterrent to the drug offerer and thus to drug use. As distal drug resistance, victimization served as a rationale for avoiding drug-using contexts. Implications are discussed in terms of prevention policy and practice, specifically a school-based prevention curriculum. Future research on Hawaiian epistemology and gendered approaches is warranted.

  20. An Offer You Cannot Refuse: Obtaining Efficiency and Fairness in Preplay Negotiation Games with Conditional Offers

    DEFF Research Database (Denmark)

    Goranko, Valentin; Turrini, Paolo

    2013-01-01

    … Such offers transform the payoff matrix of the original game and allow for some degree of cooperation between rational players while preserving the non-cooperative nature of the game. We focus on 2-player negotiation games arising in the preplay phase when offers for payments are made conditional… on a suggested matching offer of the same kind being made in return by the receiver. We study and analyze such bargaining games, obtain results describing their possible solutions, and discuss the degrees of efficiency and fairness that can be achieved in such a negotiation process depending on whether time…

  1. Porting a Hall MHD Code to a Graphic Processing Unit

    Science.gov (United States)

    Dorelli, John C.

    2011-01-01

    We present our experience porting a Hall MHD code to a Graphics Processing Unit (GPU). The code is a 2nd order accurate MUSCL-Hancock scheme which makes use of an HLL Riemann solver to compute numerical fluxes and second-order finite differences to compute the Hall contribution to the electric field. The divergence of the magnetic field is controlled with Dedner's hyperbolic divergence cleaning method. Preliminary benchmark tests indicate a speedup (relative to a single Nehalem core) of 58x for a double precision calculation. We discuss scaling issues which arise when distributing work across multiple GPUs in a CPU-GPU cluster.
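    The HLL Riemann solver used for the numerical fluxes takes the left and right states, their physical fluxes, and estimates of the slowest and fastest wave speeds; the three-branch formula below is the standard textbook form, written generically rather than for the paper's Hall MHD system:

```python
import numpy as np

def hll_flux(qL, qR, fL, fR, sL, sR):
    """Standard HLL interface flux from left/right conserved states (qL, qR),
    their physical fluxes (fL, fR) and slowest/fastest wave-speed estimates."""
    if sL >= 0.0:   # all waves move right: upwind on the left state
        return fL
    if sR <= 0.0:   # all waves move left: upwind on the right state
        return fR
    # subsonic case: wave-speed-weighted average plus a dissipative jump term
    return (sR * fL - sL * fR + sL * sR * (qR - qL)) / (sR - sL)

# sanity data: identical states on both sides of the interface
q = np.array([1.0, 0.5])
f = np.array([0.5, 0.75])
```

For equal left and right states the formula collapses to the physical flux in every branch, a basic consistency property any Riemann solver must satisfy.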

  2. INTERCULTURAL ISSUES OF THE TOURIST OFFER

    Directory of Open Access Journals (Sweden)

    Cristina Elena ALBU

    2015-03-01

    Full Text Available Tourism is an opportunity for people from different cultures to meet and interact, to exchange ideas, traditions and ways of thinking. In this industry it is very easy to judge people after taking just an overall look at them and their general behavior. The aim of this article is to determine the intercultural issues that can show up during the tourist act, when people belonging to different cultures interact. The research method used for this article is documentary study. The tourist offer should be adapted to different types of tourists, taking into consideration aspects such as tourists’ behavior, the type of tourism they prefer and, above all, the culture they belong to and the culture of the people in the places they visit. Despite the immense diversity of our minds, there is a structure that can serve as a basis for mutual understanding.

  3. Virtual Reality System Offers a Wide Perspective

    Science.gov (United States)

    2008-01-01

    Robot Systems Technology Branch engineers at Johnson Space Center created the remotely controlled Robonaut for use as an additional "set of hands" in extravehicular activities (EVAs) and to allow exploration of environments that would be too dangerous or difficult for humans. One of the problems Robonaut developers encountered was that the robot s interface offered an extremely limited field of vision. Johnson robotics engineer, Darby Magruder, explained that the 40-degree field-of-view (FOV) in initial robotic prototypes provided very narrow tunnel vision, which posed difficulties for Robonaut operators trying to see the robot s surroundings. Because of the narrow FOV, NASA decided to reach out to the private sector for assistance. In addition to a wider FOV, NASA also desired higher resolution in a head-mounted display (HMD) with the added ability to capture and display video.

  4. Environmental Chemistry and Environmental Science: A Survey of Courses Offered in U.S. Colleges and Universities.

    Science.gov (United States)

    Aram, Roberta J.; Manahan, Stanley E.

    1995-01-01

    Conducted a survey to determine the variety and availability of college-level environmental chemistry course offerings in the United States. Reports that 45% of all responding institutions offered an environmental chemistry course within the chemistry department, 24% were contemplating offering such a course, and 53% of responding institutions…

  5. Self-contained cable systems offer advantages

    Energy Technology Data Exchange (ETDEWEB)

    Morello, A.S.; Occhini, E.

    1977-05-01

    Low-pressure oil-filled (LPOF) cable systems, while seldom used in this country, have several advantages of interest to engineers. Less oil and insulating paper are required for low-pressure than for the more commonly used high-pressure systems because of the single core. The lower pressure offers safety features in the event of accidental oil loss. LPOF cables are used internationally because of their good thermal characteristics. Applications besides ac transmission lines include underground and submarine cables, such as those connecting islands with mainland facilities. Cooling can be accomplished either by circulating oil inside the central duct or by circulating water through parallel nonmetallic pipes. Forced cooling of LPOF cables is less complicated, which allows them to have higher current ratings and makes them more adaptable to thermal transients. Conductor cooling, which increases capacity but prohibits overloads in LPOF cables, is the only system available for high-voltage direct current (HVDC) cables. Several experimental and demonstration cable systems are described. (DCK)

  6. What Has Literature to Offer Computer Science?

    Directory of Open Access Journals (Sweden)

    Mark Dougherty

    2004-04-01

    Full Text Available In this paper I ask the question: what has literature to offer computer science? Can a bilateral programme of research be started with the aim of discovering the same kind of deep intertwining of ideas between computer science and literature, as already exists between computer science and linguistics? What practical use could such results yield? I begin by studying a classic forum for some of the most unintelligible pieces of prose ever written, the computer manual. Why are these books so hard to understand? Could a richer diet of metaphor and onomatopoeia help me get my laser printer working? I then dig down a little deeper and explore computer programs themselves as literature. Do they exhibit aesthetics, emotion and all the other multifarious aspects of true literature? If so, does this support their purpose and understandability? Finally I explore the link between computer code and the human writer. Rather than write large amounts of code directly, we encourage students to write algorithms as pseudo-code as a first step. Pseudo-code tells a story within a semi-formalised framework of conventions. Is this the intertwining we should be looking for?

  7. Industrial Heritage in Tuzla Canton Tourist Offer

    Directory of Open Access Journals (Sweden)

    Edin Jahić

    2014-04-01

    Full Text Available Industrial heritage has great importance in the development of tourism in Tuzla Canton, because this is a region which had a well-developed industry in the past. A major part of this industry has been destroyed and can now be used for touristic purposes. Besides this function, industrial plants can be used for the development of culture, education, etc., and there are already such positive examples in wealthier European countries. The aim of the survey was to examine the opinion of tourist agencies, which are providers of tourist services, on the further development of tourism in the region of Tuzla Canton, with special emphasis on industrial tourism, because tourist agencies are one of the key factors in the creation of tourism development. The methods used for data collection, processing and analysis are: historical, descriptive, comparative, case study and survey (SPSS version 20). Elements that need improving and further development are highlighted. The research results can help the tourist destination management of Tuzla Canton, and all segments of its tourism industry, improve their offer and communication with the potential tourism market.

  8. Cycad mutualist offers more than pollen transport.

    Science.gov (United States)

    Marler, Thomas E

    2010-05-01

    Specialist insects share obligate mutualisms with some contemporary cycad species whereby the insect's pollination services are rewarded with a nursery in which the insect's larvae consume the postdispersal male cone. I prevented visits of the pollinator moth Anatrachyntis sp. to male Cycas micronesica (Cycadaceae) cones to show that consumption of the cone tissue by the mutualist hastened initiation of the plant's subsequent reproductive event. This is the first documented case where removal of a postdispersal cycad pollination organ speeds up subsequent reproductive events, and the current paradigm that the offering of cone tissue as a nursery is a sacrifice by the plant in return for the pollination services is therefore inaccurate. In C. micronesica, the herbivory stage of pollination mutualism confers a cryptic benefit of cone tissue disposal, which translates into an increase in ultimate lifetime reproductive effort. The plant population relies on the pollinator for moving gametes, as well as for increasing the number of male coning events. The dual benefits afforded to the plant by associating with this pollinator shows that mutualism can operate simultaneously on very different traits.

  9. PROBLEMA ANOMALI DALAM INITIAL PUBLIC OFFERING (IPO

    Directory of Open Access Journals (Sweden)

    Sautma Ronni Basana

    2003-01-01

    Full Text Available This study of Initial Public Offerings (IPOs) showed that IPO stocks on average were underpriced, underperformed in the long-run aftermarket, and exhibited a Hot and Cold market cycle. These phenomena can be explained by, among others, Asymmetric Information, the Winner’s Curse, the Traditional-Ibbotson view, and the Signalling Equilibrium phenomenon. However, none of these explanations is satisfactory, because they are all based on the price observed in the secondary market (the Underpriced Theory). The Withdrawn IPO (WIPO) theory tries to explain the IPO anomaly differently, so that the anomaly can be accounted for fully. Keywords: Initial Public Offering (IPO), Underpriced, Withdrawn IPO (WIPO).

  10. New Science Opportunities Offered by MUSE

    Science.gov (United States)

    Bacon, R.; Bauer, S.; Brau-Nogué, S.; Caillier, P.; Capoani, L.; Carollo, M.; Contini, T.; Daguisé, E.; Delabre, B.; Dreizler, S.; Dubois, J. P.; Dupieux, M.; Dupin, J.; Emsellem, E.; Ferruit, P.; Francois, M.; Franx, M.; Gallou, G.; Gerssen, J.; Guiderdoni, B.; Hansali, G.; Hofmann, D.; Jarno, A.; Kelz, A.; Koehler, C.; Kollatschny, W.; Kosmalski, J.; Laurent, F.; Lilly, S.; Lizon, J.; Loupias, M.; Monstein, C.; Moultaka, J.; Nicklas, H.; Parés, L.; Pasquini, L.; Pecontal, A.; Pello, R.; Petit, C.; Manescau, A.; Reiss, R.; Remillieux, A.; Renault, E.; Roth, M.; Schaye, J.; Steinmetz, M.; Ströbele, S.; Stuik, R.; Weilbacher, P.; Wisotzki, L.; Wozniak, H.

    The Multi Unit Spectroscopic Explorer MUSE [MUSE public web site: http://muse.univ-lyon1.fr] is one of the second generation VLT instruments. MUSE is a wide-field optical integral field spectrograph operating in the visible wavelength range with improved spatial resolution. The MUSE Consortium consists of groups at Lyon (PI institute, CRAL), Göttingen (IAG), Potsdam (AIP), Leiden (NOVA), Toulouse (LATT), Zurich (ETH) and ESO. The project is currently in its final design phase. Manufacturing, assembly and integration will start after the Final Design Review, which is foreseen for late 2008. Preliminary acceptance in Europe is scheduled for mid-2011 and the instrument shall be in operation at Paranal in 2012.

  11. What One Physicist Has to Offer

    Science.gov (United States)

    Ross, Marc

    2004-05-01

    I was a particle theorist. In the early 1970s I began to analyze energy and its use in society. My theme is: What can physicists offer on a societal issue like energy? I have four topics: 1) Traffic safety and vehicle mass. The measurements are the record of some 40,000 deaths per year, vehicle characterizations and registrations. The statistical record is good, but information is lacking on physical processes in serious crashes. Our insight: while driver behavior is critical to safety, so is vehicle quality and design. Although one cannot definitively separate the injury impacts associated with momentum transfer from those due to intrusion, mass as such is not critical to safety. 2) Prospects for improving the energy efficiency of industrial processes. Our "measurements" were planning documents and interviews enabling us to analyze which "energy projects" were undertaken and which not. Insight: capital for projects was not allocated according to textbook economics; instead it was rationed. 3) Energy use by cars. Based on dynamometer studies motivated by the 1990 Clean Air Act Amendments, we created models of energy consumption that enable evaluation of modifications such as adopting a small engine while supplementing its capability for power. Insight: Vehicles could be designed to use much less fuel; but the gain for society is offset by low interest by new-car-buyers and manufacturers. 4) The effectiveness of automotive emissions controls. In addition to laboratory studies, we had surveys in "non-attainment" areas. Insight: Controls installed by original manufacturers are more robust and effective than repairs. Of the four, this is the one success for society. Conclusions: There are fascinating and solvable analytical challenges everywhere you look. But applications are hampered by the lack of a heritage and the close coupling between theorists and experimenters we know in physics.

  12. Energy- and cost-efficient lattice-QCD computations using graphics processing units

    Energy Technology Data Exchange (ETDEWEB)

    Bach, Matthias

    2014-07-01

    Quarks and gluons are the building blocks of all hadronic matter, like protons and neutrons. Their interaction is described by Quantum Chromodynamics (QCD), a theory under test by large scale experiments like the Large Hadron Collider (LHC) at CERN and in the future at the Facility for Antiproton and Ion Research (FAIR) at GSI. However, perturbative methods can only be applied to QCD at high energies. Studies from first principles are possible via a discretization onto a Euclidean space-time grid. This discretization of QCD is called Lattice QCD (LQCD) and is the only ab-initio option outside of the high-energy regime. LQCD is extremely compute and memory intensive. In particular, it is by definition always bandwidth limited. Thus - despite the complexity of LQCD applications - it led to the development of several specialized compute platforms and influenced the development of others. In recent years, however, General-Purpose computation on Graphics Processing Units (GPGPU) came up as a new means for parallel computing. Contrary to machines traditionally used for LQCD, graphics processing units (GPUs) are a mass-market product. This promises advantages in both the pace at which higher-performing hardware becomes available and its price. CL2QCD is an OpenCL based implementation of LQCD using Wilson fermions that was developed within this thesis. It operates on GPUs by all major vendors as well as on central processing units (CPUs). On the AMD Radeon HD 7970 it provides the fastest double-precision Dslash kernel for a single GPU, achieving 120 GFLOPS. Dslash - the most compute intensive kernel in LQCD simulations - is commonly used to compare LQCD platforms. This performance is enabled by an in-depth analysis of optimization techniques for bandwidth-limited codes on GPUs. Further, analysis of the communication between GPU and CPU, as well as between multiple GPUs, enables high-performance Krylov space solvers and linear scaling to multiple GPUs within a single system. LQCD
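    The statement that LQCD is "by definition always bandwidth limited" can be made quantitative with the roofline model: attainable throughput is the lesser of the peak arithmetic rate and the memory bandwidth times the kernel's arithmetic intensity. The device numbers below are illustrative, not measurements from the thesis:

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Attainable performance under the roofline model; kernels whose
    arithmetic intensity sits left of the ridge point are bandwidth bound."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

# hypothetical GPU: ~1000 GFLOPS double-precision peak, ~264 GB/s memory
# bandwidth, and a kernel moving 2 bytes per flop (0.5 flops/byte)
attainable = roofline_gflops(1000.0, 264.0, 0.5)
```

At 0.5 flops per byte the kernel sits far left of the ridge point, so extra arithmetic throughput would not help; only higher bandwidth or reduced memory traffic would raise the ceiling.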

  13. Unitals in Projective Planes

    CERN Document Server

    Barwick, Susan

    2008-01-01

    Unitals are key structures in projective planes and have connections with other structures in algebra. This book is a monograph on unitals embedded in finite projective planes, offering a survey of the research literature on embedded unitals. It is suitable for graduate students and researchers who want to learn about this topic

  14. Marketing Digital Offerings Is Different: Strategies for Teaching about Digital Offerings in the Marketing Classroom

    Science.gov (United States)

    Roberts, Scott D.; Micken, Kathleen S.

    2015-01-01

    Digital offerings represent different challenges for marketers than do traditional goods and services. After reviewing the literature, the authors suggest ways that the marketing of digital goods and services might be better presented to and better understood by students. The well-known four challenges of services marketing model (e.g.,…

  15. Marketing Digital Offerings Is Different: Strategies for Teaching about Digital Offerings in the Marketing Classroom

    Science.gov (United States)

    Roberts, Scott D.; Micken, Kathleen S.

    2015-01-01

    Digital offerings represent different challenges for marketers than do traditional goods and services. After reviewing the literature, the authors suggest ways that the marketing of digital goods and services might be better presented to and better understood by students. The well-known four challenges of services marketing model (e.g.,…

  16. Simposium 19: Teaching Offers Many Possibilities

    Directory of Open Access Journals (Sweden)

    Denise Vaz Macedo

    2014-08-01

    Full Text Available K-Education (Portuguese). Chair: V. Trindade. Bayardo Torres; Clovis Wannmacher; Denise Macedo.  Teaching Offers Many Possibilities. Denise Vaz Macedo, Biochemistry Department, Biology Institute, Unicamp, Campinas, Brazil.   In recent years my research lines have been maintained exclusively through my biochemistry teaching activities in graduation and specialization courses (360 h). The teaching methodology used was developed over these 20 years of classroom research. It is based on five practical activities carried out initially by the students themselves, who monitor the effects of different physical activity situations by measuring some plasma metabolites on point-of-care devices. After instruction, the students perform the exercises, collect and tabulate the data generated, and document all the doubts that arise. The educational goal at this point is to show that the theory related to muscle contraction and the ATP-producing metabolic pathways is linked to their profession. At appropriate moments each group presents to the whole class the practical activity carried out, the data, and the doubts produced. After a full discussion the students are able to relate the data to the theory studied, and the initial doubts are clarified. A questionnaire applied before and after the discipline indicates the learning effectiveness of this method. Some other results: the students who have demonstrated special interest in the classroom normally join the lab, where they are simultaneously prepared for teaching activity. Demand for the specialization course is greater than the supply. The financial resources generated are substantial and are administered by the University Foundation. They are fully applied to the purchase of permanent and consumable materials and to the payment of occasional scholarships for lab researchers. Publication in indexed journals has been constant and regular, and the experimental results obtained always return to the

  17. Graphics Processing Units for HEP trigger systems

    Science.gov (United States)

    Ammendola, R.; Bauce, M.; Biagioni, A.; Chiozzi, S.; Cotta Ramusino, A.; Fantechi, R.; Fiorini, M.; Giagu, S.; Gianoli, A.; Lamanna, G.; Lonardo, A.; Messina, A.; Neri, I.; Paolucci, P. S.; Piandani, R.; Pontisso, L.; Rescigno, M.; Simula, F.; Sozzi, M.; Vicini, P.

    2016-07-01

    General-purpose computing on GPUs (Graphics Processing Units) is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughput, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming ripe. We will discuss the use of online parallel computing on GPUs for a synchronous low-level trigger, focusing on the CERN NA62 experiment trigger system. The use of GPUs in higher-level trigger systems is also briefly considered.

  18. Graphics Processing Units for HEP trigger systems

    Energy Technology Data Exchange (ETDEWEB)

    Ammendola, R. [INFN Sezione di Roma “Tor Vergata”, Via della Ricerca Scientifica 1, 00133 Roma (Italy); Bauce, M. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); University of Rome “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); Biagioni, A. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); Chiozzi, S.; Cotta Ramusino, A. [INFN Sezione di Ferrara, Via Saragat 1, 44122 Ferrara (Italy); University of Ferrara, Via Saragat 1, 44122 Ferrara (Italy); Fantechi, R. [INFN Sezione di Pisa, Largo B. Pontecorvo 3, 56127 Pisa (Italy); CERN, Geneve (Switzerland); Fiorini, M. [INFN Sezione di Ferrara, Via Saragat 1, 44122 Ferrara (Italy); University of Ferrara, Via Saragat 1, 44122 Ferrara (Italy); Giagu, S. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); University of Rome “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); Gianoli, A. [INFN Sezione di Ferrara, Via Saragat 1, 44122 Ferrara (Italy); University of Ferrara, Via Saragat 1, 44122 Ferrara (Italy); Lamanna, G., E-mail: gianluca.lamanna@cern.ch [INFN Sezione di Pisa, Largo B. Pontecorvo 3, 56127 Pisa (Italy); INFN Laboratori Nazionali di Frascati, Via Enrico Fermi 40, 00044 Frascati (Roma) (Italy); Lonardo, A. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); Messina, A. [INFN Sezione di Roma “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); University of Rome “La Sapienza”, P.le A. Moro 2, 00185 Roma (Italy); and others

    2016-07-11

    General-purpose computing on GPUs (Graphics Processing Units) is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughput, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming ripe. We will discuss the use of online parallel computing on GPUs for a synchronous low-level trigger, focusing on the CERN NA62 experiment trigger system. The use of GPUs in higher-level trigger systems is also briefly considered.

  19. Exploiting graphics processing units for computational biology and bioinformatics.

    Science.gov (United States)

    Payne, Joshua L; Sinnott-Armstrong, Nicholas A; Moore, Jason H

    2010-09-01

    Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units (GPUs) that possess more memory bandwidth and computational capability than central processing units (CPUs), the standard workhorses of scientific computing. With the recent release of general-purpose GPUs and NVIDIA's GPU programming language, CUDA, graphics engines are being adopted widely in scientific computing applications, particularly in the fields of computational biology and bioinformatics. The goal of this article is to concisely present an introduction to GPU hardware and programming, aimed at the computational biologist or bioinformaticist. To this end, we discuss the primary differences between GPU and CPU architecture, introduce the basics of the CUDA programming language, and discuss important CUDA programming practices, such as the proper use of coalesced reads, data types, and memory hierarchies. We highlight each of these topics in the context of computing the all-pairs distance between instances in a dataset, a common procedure in numerous disciplines of scientific computing. We conclude with a runtime analysis of the GPU and CPU implementations of the all-pairs distance calculation. We show our final GPU implementation to outperform the CPU implementation by a factor of 1700.
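The all-pairs distance computation used as the running example in this record is compact enough to sketch. The NumPy version below mirrors the regular, data-parallel access pattern that makes the problem map well onto a GPU (one output cell per thread in a CUDA kernel); it is an illustration of the computation itself, not the authors' GPU code.

```python
import numpy as np

def all_pairs_distance(X):
    """Euclidean distance between every pair of rows in X (n x d).

    Uses the identity |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, so the whole
    distance matrix falls out of one matrix multiply plus broadcasting,
    the same uniform memory pattern a coalesced GPU kernel exploits.
    """
    sq = np.sum(X * X, axis=1)                       # |row|^2 for each row
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)                      # clamp rounding negatives
    return np.sqrt(d2)

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = all_pairs_distance(X)
print(D)   # D[0, 1] == 5.0, D[0, 2] == 10.0
```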

  20. 7 CFR 1410.31 - Acceptability of offers.

    Science.gov (United States)

    2010-01-01

    ... Administrator for the area offered. Acceptance or rejection of any offer, however, shall be in the sole... 7 Agriculture 10 2010-01-01 2010-01-01 false Acceptability of offers. 1410.31 Section 1410.31... Acceptability of offers. (a) Except as provided in paragraph (c) of this section, producers may submit...

  1. 19 CFR 172.33 - Acceptance of offers in compromise.

    Science.gov (United States)

    2010-04-01

    ... 19 Customs Duties 2 2010-04-01 2010-04-01 false Acceptance of offers in compromise. 172.33 Section... OF THE TREASURY (CONTINUED) CLAIMS FOR LIQUIDATED DAMAGES; PENALTIES SECURED BY BONDS Offers in Compromise § 172.33 Acceptance of offers in compromise. An offer in compromise will be considered...

  2. 19 CFR 171.32 - Acceptance of offers in compromise.

    Science.gov (United States)

    2010-04-01

    ... 19 Customs Duties 2 2010-04-01 2010-04-01 false Acceptance of offers in compromise. 171.32 Section... OF THE TREASURY (CONTINUED) FINES, PENALTIES, AND FORFEITURES Offers in Compromise § 171.32 Acceptance of offers in compromise. An offer in compromise will be considered accepted only when the...

  3. b-Bit Minwise Hashing in Practice: Large-Scale Batch and Online Learning and Using GPUs for Fast Preprocessing with Simple Hash Functions

    CERN Document Server

    Li, Ping; Konig, Arnd Christian

    2012-01-01

    In this paper, we study several critical issues which must be tackled before one can apply b-bit minwise hashing to the volumes of data often used in industrial applications, especially in the context of search. 1. (b-bit) Minwise hashing requires an expensive preprocessing step that computes k (e.g., 500) minimal values after applying the corresponding permutations for each data vector. We developed a parallelization scheme using GPUs and observed that the preprocessing time can be reduced by a factor of 20-80 and becomes substantially smaller than the data loading time. 2. One major advantage of b-bit minwise hashing is that it can substantially reduce the amount of memory required for batch learning. However, as online algorithms become increasingly popular for large-scale learning in the context of search, it is not clear if b-bit minwise hashing yields significant improvements for them. This paper demonstrates that b-bit minwise hashing provides an effective data size/dimension reduction scheme and hence it can d...

  4. Star River Second Phase Offering with Low Profile and Higher Quality

    Institute of Scientific and Technical Information of China (English)

    Li Yinghong

    2006-01-01

    The widely expected phase-two project of Beijing Star River will open its phase-two model units to the public on Oct. 19, and by then Beijing Star River, which has long been regarded as a paradigm of good residence, will unveil its long-awaited phase-two offering. The reporter was among the first to visit the phase-two model units recently. The fully renovated phase-two products are still ready-made residences, and the quality is even higher.

  5. Monte Carlo-based fluorescence molecular tomography reconstruction method accelerated by a cluster of graphic processing units.

    Science.gov (United States)

    Quan, Guotao; Gong, Hui; Deng, Yong; Fu, Jianwei; Luo, Qingming

    2011-02-01

    High-speed fluorescence molecular tomography (FMT) reconstruction for 3-D heterogeneous media is still one of the most challenging problems in diffusive optical fluorescence imaging. In this paper, we propose a fast FMT reconstruction method that is based on Monte Carlo (MC) simulation and accelerated by a cluster of graphics processing units (GPUs). Based on the Message Passing Interface standard, we modified the MC code for fast FMT reconstruction, and different Green's functions representing the flux distribution in media are calculated simultaneously by different GPUs in the cluster. A load-balancing method was also developed to increase the computational efficiency. By applying the Fréchet derivative, a Jacobian matrix is formed to reconstruct the distribution of the fluorochromes using the calculated Green's functions. Phantom experiments have shown that only 10 min are required to get reconstruction results with a cluster of 6 GPUs, rather than 6 h with a cluster of multiple dual-Opteron CPU nodes. Because of the MC simulation's advantages of high accuracy and suitability for 3-D heterogeneous media with refractive-index-unmatched boundaries, the GPU cluster-accelerated method provides a reliable approach to high-speed reconstruction for FMT imaging.

  6. Parallelized multi-graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy.

    Science.gov (United States)

    Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P

    2014-07-01

    Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6  mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.

  7. Parallelized multi–graphics processing unit framework for high-speed Gabor-domain optical coherence microscopy

    Science.gov (United States)

    Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.

    2014-01-01

    Abstract. Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6  mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868

  8. [HPV vaccination: active offer in an Italian region].

    Science.gov (United States)

    Terracciano, Elisa; D'Alò, Gian Loreto; Aquilani, Silvia; Aversa, Anna Maria; Bartolomei, Giuseppina; Calenda, Maria Gabriella; Catapano, Raffaele; Compagno, Silvio; Della Rovere, Piera; Fraioli, Angelo; Ieraci, Roberto; Reggiani, Daniela; Sgricia, Stefano; Spadea, Antonietta; Zaratti, Laura; Franco, Elisabetta

    2017-01-01

    Human Papillomavirus is responsible for 4.8% of cancers, and is the main cause of cervical cancer. Cervical cancer can be reduced by means of secondary prevention (PAP test, HPV-DNA test), while through primary prevention (anti-HPV vaccine) the incidence of other HPV-attributable cancers can also be reduced. In Italy, anti-HPV vaccination has been part of the immunization schedule for girls since 2008, and in 2017 it was extended to boys. However, vaccine coverage is decreasing nationwide. This study aims to examine anti-HPV vaccination practices in the health care services of the Lazio Region, Italy. Questionnaires were sent or administered directly to those in charge of vaccinations. Data, collected from 11/12 (92%) Lazio Local Health Units and from 116 vaccination centers, show a remarkable diversity in the offer: 41% of the centers open only 1-2 days/week, 42% only in the morning, and only 7% are open on Saturday. Vaccination is available by reservation only in 62% of the centers, while vaccines are not administered to subjects aged ≥18 years in 33%; 93% of the centers actively call the girls in the target cohort, while 70% and 94% recall the patients who had not received the first or the second dose of vaccine, respectively. Collaboration with family physicians and/or pediatricians was declared by 80% of the centers. Vaccine coverage could probably be improved by addressing the highlighted critical issues and applying best practices widely.

  9. 48 CFR 619.804 - Evaluation, offering, and acceptance.

    Science.gov (United States)

    2010-10-01

    ....804 Evaluation, offering, and acceptance. ... 48 Federal Acquisition Regulations System 4 2010-10-01 2010-10-01 false Evaluation, offering, and acceptance. 619.804 Section 619.804 Federal Acquisition Regulations System DEPARTMENT OF STATE...

  10. 48 CFR 19.804 - Evaluation, offering, and acceptance.

    Science.gov (United States)

    2010-10-01

    ...) Program) 19.804 Evaluation, offering, and acceptance. ... 48 Federal Acquisition Regulations System 1 2010-10-01 2010-10-01 false Evaluation, offering, and acceptance. 19.804 Section 19.804 Federal Acquisition Regulations System FEDERAL ACQUISITION...

  11. FDA Offers Guidance on Fish Intake for Kids, Pregnant Women

    Science.gov (United States)

    ... page: https://medlineplus.gov/news/fullstory_163113.html FDA Offers Guidance on Fish Intake for Kids, Pregnant ... is far less than the recommended amount, the FDA said. Fish offers nutritional benefits important for growth ...

  12. Certification of School Social Workers and Curriculum Content of Programs Offering Training in School Social Work

    Science.gov (United States)

    Mumm, Ann Marie; Bye, Lynn

    2011-01-01

    This article examines the status of certification requirements for school social workers across the United States and the policy context in which certification is embedded. The article also details findings of a study on the curriculum available at various schools of social work offering training in school social work. The article makes a case for…

  13. 32 CFR 644.87 - Preparation and execution of offers.

    Science.gov (United States)

    2010-07-01

    ... the first and third lines, respectively, on page 2 of the offer form when title is being acquired free... applicable or can be fully complied with without the use of an Offer to Sell. Pages 1 and 2 of ENG Form 2970... Form 42, Offer to Sell Real Property, is required in all authorized projects, except in those cases...

  14. Final-Offer Arbitration: "Sudden Death" in Eugene

    Science.gov (United States)

    Long, Gary; Feuille, Peter

    1974-01-01

    A case study on final offer arbitration experiences in Eugene, Oregon, is presented and discussed. Basic criticisms leveled against the final-offer system are opposed by the authors and evidence is given in support of the use of final-offer arbitration. (DS)

  15. 7 CFR 1494.501 - Submission of offers to CCC.

    Science.gov (United States)

    2010-01-01

    ... before the time the offer is to be considered by CCC, unless otherwise required by law; (viii) No attempt... required certifications, unless CCC determines that acceptance of the offer would be in the best interests... 7 Agriculture 10 2010-01-01 2010-01-01 false Submission of offers to CCC. 1494.501 Section...

  16. 7 CFR 1494.601 - Acceptance of offers by CCC.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 10 2010-01-01 2010-01-01 false Acceptance of offers by CCC. 1494.601 Section 1494... Program Operations § 1494.601 Acceptance of offers by CCC. (a) Establishment of acceptable sales prices... that becomes available to CCC. (b) Acceptance of offers for a CCC bonus on a competitive basis....

  17. 26 CFR 601.203 - Offers in compromise.

    Science.gov (United States)

    2010-04-01

    ... penalty and the District Counsel concurs in the acceptance of the offer, or (iv) Recommend to the Regional Commissioner the acceptance of the offer if it involves a civil liability of $100,000 or over. (2)(i) If the... acceptance or rejection of the offer together with the examining officer's report of the investigation....

  18. 33 CFR 25.129 - Acceptance of offer of settlement.

    Science.gov (United States)

    2010-07-01

    ... 33 Navigation and Navigable Waters 1 2010-07-01 2010-07-01 false Acceptance of offer of settlement. 25.129 Section 25.129 Navigation and Navigable Waters COAST GUARD, DEPARTMENT OF HOMELAND SECURITY GENERAL CLAIMS General § 25.129 Acceptance of offer of settlement. Claimant's acceptance of an offer...

  19. 26 CFR 300.3 - Offer to compromise fee.

    Science.gov (United States)

    2010-04-01

    ... taxpayer if the offer is accepted, rejected, withdrawn, or returned as nonprocessable after acceptance for... 26 Internal Revenue 18 2010-04-01 2010-04-01 false Offer to compromise fee. 300.3 Section 300.3... ADMINISTRATION USER FEES § 300.3 Offer to compromise fee. (a) Applicability. This section applies to...

  20. 31 CFR 50.13 - Offer, purchase, and renewal.

    Science.gov (United States)

    2010-07-01

    ... 31 Money and Finance: Treasury 1 2010-07-01 2010-07-01 false Offer, purchase, and renewal. 50.13... PROGRAM Disclosures as Conditions for Federal Payment § 50.13 Offer, purchase, and renewal. An insurer is deemed to be in compliance with the requirement of providing disclosure “at the time of offer,...

  1. Uncontracted Rys Quadrature Implementation of up to G Functions on Graphical Processing Units.

    Science.gov (United States)

    Asadchev, Andrey; Allada, Veerendra; Felder, Jacob; Bode, Brett M; Gordon, Mark S; Windus, Theresa L

    2010-03-09

    An implementation is presented of an uncontracted Rys quadrature algorithm for electron repulsion integrals, including up to g functions on graphical processing units (GPUs). The general GPU programming model, the challenges associated with implementing the Rys quadrature on these highly parallel emerging architectures, and a new approach to implementing the quadrature are outlined. The performance of the implementation is evaluated for single and double precision on two different types of GPU devices. The performance obtained is on par with the matrix-vector routine from the CUDA basic linear algebra subroutines (CUBLAS) library.

  2. Efficient neighbor list calculation for molecular simulation of colloidal systems using graphics processing units

    Science.gov (United States)

    Howard, Michael P.; Anderson, Joshua A.; Nikoubashman, Arash; Glotzer, Sharon C.; Panagiotopoulos, Athanassios Z.

    2016-06-01

    We present an algorithm based on linear bounding volume hierarchies (LBVHs) for computing neighbor (Verlet) lists using graphics processing units (GPUs) for colloidal systems characterized by large size disparities. We compare this to a GPU implementation of the current state-of-the-art CPU algorithm based on stenciled cell lists. We report benchmarks for both neighbor list algorithms in a Lennard-Jones binary mixture with synthetic interaction-range disparity and in a realistic colloid solution. LBVHs outperformed the stenciled cell lists for systems with moderate or large size disparity and dilute or semidilute fractions of large particles, conditions typical of colloidal systems.
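For readers unfamiliar with Verlet lists, the reference computation is simple to state: record every pair of particles closer than a cutoff (plus a skin buffer) under periodic boundaries. The brute-force O(N^2) builder below defines that output; the cell lists and LBVH traversals benchmarked in this record are ways of producing the same list in roughly O(N) time. Function names and parameters here are illustrative, not taken from the paper's code.

```python
import itertools

def verlet_list(positions, box, cutoff, skin=0.3):
    """Brute-force neighbor (Verlet) list with periodic boundaries.

    Returns, for each particle index, the indices of all other particles
    within cutoff + skin, using the minimum-image convention in each
    dimension. Reference implementation; cell lists or BVHs give the
    same answer without visiting all N*(N-1)/2 pairs.
    """
    r_list = cutoff + skin
    neighbors = [[] for _ in positions]
    for i, j in itertools.combinations(range(len(positions)), 2):
        d2 = 0.0
        for a, b, L in zip(positions[i], positions[j], box):
            dx = a - b
            dx -= L * round(dx / L)      # wrap to nearest periodic image
            d2 += dx * dx
        if d2 < r_list * r_list:
            neighbors[i].append(j)
            neighbors[j].append(i)
    return neighbors

pos = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (4.9, 0.0, 0.0)]
nl = verlet_list(pos, box=(5.0, 5.0, 5.0), cutoff=0.6)
print(nl)   # all three particles are mutual neighbors (0 and 2 via wrap-around)
```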

  3. Negotiating for more: the multiple equivalent simultaneous offer.

    Science.gov (United States)

    Heller, Richard E

    2014-02-01

    Whether one is a doctor, a professional baseball manager, or a politician, successful negotiation skills are a critical part of being a leader. Building upon prior journal articles on negotiation strategy, the author presents the concept of the multiple equivalent simultaneous offer (MESO). The concept of a MESO is straightforward: as opposed to making a single offer, make multiple offers with several variables. Each offer alters the different variables, such that the end result of each offer is equivalent from the perspective of the party making the offer. Research has found several advantages to the use of MESOs. For example, using MESOs, an offer was more likely to be accepted, and the counterparty was more likely to be satisfied with the negotiated deal. Additional benefits have been documented as well, underscoring why a prepared radiology business leader should understand the theory and practice of MESO. Copyright © 2014 American College of Radiology. Published by Elsevier Inc. All rights reserved.

  4. The ATLAS Trigger Algorithms for General Purpose Graphics Processor Units

    CERN Document Server

    Tavares Delgado, Ademar; The ATLAS collaboration

    2016-01-01

    We present the ATLAS Trigger algorithms developed to exploit General-Purpose Graphics Processor Units. ATLAS is a particle physics experiment located at the LHC collider at CERN. The ATLAS Trigger system has two levels: the hardware-based Level 1 and the High Level Trigger, implemented in software running on a farm of commodity CPUs. Performing the trigger event selection within the available farm resources presents a significant challenge that will increase with future LHC upgrades. GPUs are being evaluated as a potential solution for trigger algorithm acceleration. Key factors determining the potential benefit of this new technology are the relative execution speedup, the number of GPUs required, and the relative financial cost of the selected GPU. We have developed a trigger demonstrator which includes algorithms for reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Cal...

  5. CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions.

    Science.gov (United States)

    Liu, Yongchao; Schmidt, Bertil; Maskell, Douglas L

    2010-04-06

    Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, the quadratic time complexity of this algorithm makes it highly time-consuming. The exponential growth of biological databases further deteriorates the situation. To accelerate this algorithm, many efforts have been made to develop techniques for high-performance architectures, especially the recently emerging many-core architectures and their associated programming models. This paper describes the latest release of the CUDASW++ software, CUDASW++ 2.0, which makes new contributions to Smith-Waterman protein database searches using compute unified device architecture (CUDA). A parallel Smith-Waterman algorithm is proposed to further optimize the performance of CUDASW++ 1.0 based on the single instruction, multiple thread (SIMT) abstraction. For the first time, we have investigated a partitioned vectorized Smith-Waterman algorithm using CUDA based on the virtualized single instruction, multiple data (SIMD) abstraction. The optimized SIMT and the partitioned vectorized algorithms were benchmarked, and remarkably, have similar performance characteristics. CUDASW++ 2.0 achieves performance improvement over CUDASW++ 1.0 as much as 1.74 (1.72) times using the optimized SIMT algorithm and up to 1.77 (1.66) times using the partitioned vectorized algorithm, with a performance of up to 17 (30) billion cell updates per second (GCUPS) on a single-GPU GeForce GTX 280 (dual-GPU GeForce GTX 295) graphics card. CUDASW++ 2.0 is publicly available open-source software, written in the CUDA and C++ programming languages. It obtains significant performance improvement over CUDASW++ 1.0 using either the optimized SIMT algorithm or the partitioned vectorized algorithm for Smith-Waterman protein database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.
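The Smith-Waterman recurrence that CUDASW++ parallelizes is compact, and the scalar Python sketch below shows the quadratic-time dynamic program (score only, linear gap penalty). The scoring parameters are arbitrary illustrations rather than the BLOSUM substitution matrices and affine gaps the actual software uses, and the GPU versions differ by evaluating anti-diagonals of the score matrix in parallel.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score between strings a and b.

    H[i][j] is the best score of an alignment ending at a[i-1], b[j-1];
    the 0 in the max is what makes the alignment local (a bad prefix can
    always be discarded). Runs in O(len(a) * len(b)) time.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            best = max(best, H[i][j])
    return best

print(smith_waterman("AAB", "AB"))   # 4: the substring "AB" aligns exactly
```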

  6. CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions

    Directory of Open Access Journals (Sweden)

    Schmidt Bertil

    2010-04-01

    Full Text Available Abstract Background Due to its high sensitivity, the Smith-Waterman algorithm is widely used for biological database searches. Unfortunately, the quadratic time complexity of this algorithm makes it highly time-consuming. The exponential growth of biological databases further deteriorates the situation. To accelerate this algorithm, many efforts have been made to develop techniques for high-performance architectures, especially the recently emerging many-core architectures and their associated programming models. Findings This paper describes the latest release of the CUDASW++ software, CUDASW++ 2.0, which makes new contributions to Smith-Waterman protein database searches using compute unified device architecture (CUDA). A parallel Smith-Waterman algorithm is proposed to further optimize the performance of CUDASW++ 1.0 based on the single instruction, multiple thread (SIMT) abstraction. For the first time, we have investigated a partitioned vectorized Smith-Waterman algorithm using CUDA based on the virtualized single instruction, multiple data (SIMD) abstraction. The optimized SIMT and the partitioned vectorized algorithms were benchmarked, and remarkably, have similar performance characteristics. CUDASW++ 2.0 achieves performance improvement over CUDASW++ 1.0 as much as 1.74 (1.72) times using the optimized SIMT algorithm and up to 1.77 (1.66) times using the partitioned vectorized algorithm, with a performance of up to 17 (30) billion cell updates per second (GCUPS) on a single-GPU GeForce GTX 280 (dual-GPU GeForce GTX 295) graphics card. Conclusions CUDASW++ 2.0 is publicly available open-source software, written in the CUDA and C++ programming languages. It obtains significant performance improvement over CUDASW++ 1.0 using either the optimized SIMT algorithm or the partitioned vectorized algorithm for Smith-Waterman protein database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.

  7. Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units

    Science.gov (United States)

    Kemal, Jonathan Yashar

    For purposes of optimizing and analyzing turbomachinery and other designs, the unsteady Favre-averaged flow-field differential equations for an ideal compressible gas can be solved in conjunction with the heat conduction equation. We solve all equations using the finite-volume multiple-grid numerical technique, with the dual time-step scheme used for unsteady simulations. Our numerical solver code targets CUDA-capable Graphical Processing Units (GPUs) produced by NVIDIA. Making use of MPI, our solver can run across networked compute nodes, where each MPI process can use either a GPU or a Central Processing Unit (CPU) core for primary solver calculations. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture, and compare our resulting performance against Intel Xeon X5690 CPUs. Solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. We used a conjugate cylinder computational grid and ran a turbulent steady flow simulation using 4 increasingly dense computational grids. Our densest computational grid is divided into 13 blocks each containing 1033x1033 grid points, for a total of 13.87 million grid points or 1.07 million grid points per domain block. To obtain overall speedups, we compare the execution time of the solver's iteration loop, including all resource-intensive GPU-related memory copies. Comparing the performance of 8 GPUs to that of 8 CPUs, we obtain an overall speedup of about 6.0 when using our densest computational grid. This amounts to an 8-GPU simulation running about 39.5 times faster than a single-CPU simulation.

  8. Sport's offer as an instrument of sports marketing mix

    Directory of Open Access Journals (Sweden)

    Gašović Milan

    2004-01-01

    Full Text Available Taking the logical postulate that a product is everything that can be offered on the market in order to satisfy the needs, demands or wants of customers, and regarding the core of the sport's offer (the product), marketing experts must answer three key questions: What can sports companies, teams or individuals offer to consumers? What needs can sports companies, teams or individuals satisfy? What instruments (techniques and methods) should marketing experts in sports organizations use in order to satisfy identified customer needs?

  9. Input-aware Runtime Scheduling Support for Fast Clustering of Radar Reflectivity Data on GPUs

    Institute of Scientific and Technical Information of China (English)

    周伟; 安虹; 刘谷; 李小强; 吴石磊

    2012-01-01

    As a classic algorithm in data mining, clustering is often adopted in the analysis of radar reflectivity data. However, it is time-consuming when facing datasets of large scale and high dimension. Recently, several studies have made efforts to parallelize or optimize clustering algorithms on GPUs. Although these studies have shown promising results, one important factor in the optimization, the program inputs, is ignored. We took the program inputs into consideration as a factor when optimizing the clustering algorithm on GPUs. By observing the distribution of the input radar reflectivity data, we found that the ability to adapt to inputs is important for our application to achieve the best performance on GPUs: a lightweight runtime module observes the incoming echo data, gathers its distribution information, and uses it to guide the scheduling of thread blocks when the clustering executes on the GPU, while the overhead of the module itself is very small. The results show that this input-aware scheduling support greatly reduces the GPU's computational load and gains a 20%~40% performance improvement over previous parallel GPU code, which satisfies the requirements of real-time applications well.
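The idea can be sketched independently of CUDA: inspect the input distribution first, then schedule work only where data actually falls. The toy Python/NumPy version below (hypothetical function and parameter names, not the authors' code) bins echo points into tiles and returns the non-empty tiles that a scheduler would actually assign thread blocks to:

```python
import numpy as np

def active_tiles(points, grid_shape=(8, 8), bounds=((0.0, 1.0), (0.0, 1.0))):
    """Bin 2D echo points into tiles and report which tiles are non-empty.

    Illustrative only: an input-aware scheduler would launch work units
    for the non-empty tiles instead of the entire uniform grid.
    """
    (x0, x1), (y0, y1) = bounds
    gx, gy = grid_shape
    ix = np.clip(((points[:, 0] - x0) / (x1 - x0) * gx).astype(int), 0, gx - 1)
    iy = np.clip(((points[:, 1] - y0) / (y1 - y0) * gy).astype(int), 0, gy - 1)
    counts = np.zeros(grid_shape, dtype=int)
    np.add.at(counts, (ix, iy), 1)           # histogram of points per tile
    return counts, np.argwhere(counts > 0)   # per-tile load and the work list
```

For skewed radar echoes, most tiles come back empty, which is where the reported reduction in GPU computational load would come from.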

  10. Educational Programs Offered by Colleges of Pharmacy and Drug Information Centers within the United States.

    Science.gov (United States)

    Kirschenbaum, Harold L.; Rosenberg, Jack M.

    1984-01-01

    Surveys mailed to institutions known to be active in disseminating drug information as well as colleges of pharmacy indicated that many of today's pharmacy students may not be receiving sufficient drug information training to respond to the drug information needs of other health professionals and the public. (Author/MLW)

  12. Triple Play Service and IPTV Services Offered within it

    Directory of Open Access Journals (Sweden)

    Dagmar Pajdusakova

    2008-01-01

    Full Text Available This paper deals with the Triple Play multimedia service and outlines its architecture. Triple Play offers voice, video and data services together over one customer connection. Within it, the IPTV (Internet Protocol Television) service is offered, which also encompasses the Video on Demand service and various other additional services. The paper also describes a classification of Video on Demand services.

  13. 21 CFR 1270.42 - Human tissue offered for import.

    Science.gov (United States)

    2010-04-01

    ... 21 Food and Drugs 8 2010-04-01 2010-04-01 false Human tissue offered for import. 1270.42 Section...) REGULATIONS UNDER CERTAIN OTHER ACTS ADMINISTERED BY THE FOOD AND DRUG ADMINISTRATION HUMAN TISSUE INTENDED FOR TRANSPLANTATION Inspection of Tissue Establishments § 1270.42 Human tissue offered for import. (a...

  14. 45 CFR 2544.115 - Who may offer a donation?

    Science.gov (United States)

    2010-10-01

    ... 45 Public Welfare 4 2010-10-01 2010-10-01 false Who may offer a donation? 2544.115 Section 2544... COMMUNITY SERVICE SOLICITATION AND ACCEPTANCE OF DONATIONS § 2544.115 Who may offer a donation? Anyone... donation to the Corporation....

  15. 48 CFR 219.804 - Evaluation, offering, and acceptance.

    Science.gov (United States)

    2010-10-01

    ... Business Administration (The 8(a) Program) 219.804 Evaluation, offering, and acceptance. When processing... 48 Federal Acquisition Regulations System 3 2010-10-01 2010-10-01 false Evaluation, offering, and acceptance. 219.804 Section 219.804 Federal Acquisition Regulations System DEFENSE ACQUISITION...

  16. 14 CFR 151.29 - Procedures: Offer, amendment, and acceptance.

    Science.gov (United States)

    2010-01-01

    ... § 151.29 Procedures: Offer, amendment, and acceptance. (a) Upon approving a project, the Administrator... 14 Aeronautics and Space 3 2010-01-01 2010-01-01 false Procedures: Offer, amendment, and acceptance. 151.29 Section 151.29 Aeronautics and Space FEDERAL AVIATION ADMINISTRATION, DEPARTMENT...

  17. 48 CFR 52.247-51 - Evaluation of Export Offers.

    Science.gov (United States)

    2010-10-01

    ... Government bill of lading. (1) Offers shall be evaluated and awards made on the basis of the lowest laid down...) Unless offers are applicable only to f.o.b. origin delivery under Government bills of lading (see... authorities of the St. Lawrence Seaway (normal period is between April 15 and November 30 annually). All...

  18. 47 CFR 76.1621 - Equipment compatibility offer.

    Science.gov (United States)

    2010-10-01

    ... 47 Telecommunication 4 2010-10-01 2010-10-01 false Equipment compatibility offer. 76.1621 Section... MULTICHANNEL VIDEO AND CABLE TELEVISION SERVICE Notices § 76.1621 Equipment compatibility offer. Cable system... individual compatibility problems. (d) Cable operators shall provide such equipment at the request...

  19. Perceived value creation process: focus on the company offer

    Directory of Open Access Journals (Sweden)

    Irena Pandža Bajs

    2012-12-01

    Full Text Available In the competitive business environment, as the number of rational consumers faced with many choices increases, companies can achieve their dominance best by applying the business concepts oriented to consumers in order to deliver a value which is different and better than that of their competitors. Among the various products on the market, an educated consumer chooses the offer that provides the greatest value for him/her. Therefore, it is essential for each company to determine how consumers perceive the value of its offer, and which factors determine the high level of perceived value for current and potential consumers. An analysis of these factors provides guidance on how to improve the existing offer and what the offer to be delivered in the future should be like. That could increase the perceived value of the company offer and result in a positive impact on consumer satisfaction and on establishing a stronger, long-term relationship with consumers. The process of defining the perceived value of a particular market offer is affected by the factors of the respective company’s offer as well as by competition factors, consumer factors and buying process factors. The aim of this paper is to analyze the relevant knowledge about the process of creating the perceived value of the company’s market offer and the factors that influence this process. The paper presents a conceptual model of the perceived value creation process in consumers’ minds.

  20. Service offerings and agreements a guide for ITIL exam candidates

    CERN Document Server

    Griffiths, Richard

    2014-01-01

    By implementing good practice in service offerings and agreements, IT departments can achieve customer satisfaction. Providing clarification and expansion of the core ITIL® texts, this new edition reflects the current thinking from ITIL and is aligned to the latest syllabus for the Intermediate Certificate in Service Offerings and Agreements.

  1. Nonprofit Groups Offer Genetic Testing for Jewish Students

    Science.gov (United States)

    Supiano, Beckie

    2008-01-01

    This article describes how nonprofit organizations like Hillel are offering free genetic testing for Jewish college students. A growing number of colleges, including Pittsburgh, Brandeis University, and Columbia University, are offering students free or reduced-cost screenings for diseases common to the Jewish population. Genetic diseases common to…

  2. 12 CFR 16.4 - Communications not deemed an offer.

    Science.gov (United States)

    2010-01-01

    ... Section 16.4 Banks and Banking COMPTROLLER OF THE CURRENCY, DEPARTMENT OF THE TREASURY SECURITIES OFFERING...) Subsequent to the filing of a registration statement, any notice, circular, advertisement, letter, or other... CFR 230.134); (3) Subsequent to the filing of a registration statement, any oral offer of...

  4. Information Uncertainty in Electricity Markets: Introducing Probabilistic Offers

    DEFF Research Database (Denmark)

    Papakonstantinou, Athanasios; Pinson, Pierre

    2016-01-01

    We propose a shift from the current paradigm of electricity markets treating stochastic producers similarly to conventional ones in terms of their offers. We argue that the producers’ offers should be probabilistic to reflect the limited predictability of renewable energy generation, while we...

  5. Negotiation as a form of persuasion: arguments in first offers.

    Science.gov (United States)

    Maaravi, Yossi; Ganzach, Yoav; Pazy, Asya

    2011-08-01

    In this article we examined aspects of negotiation within a persuasion framework. Specifically, we investigated how the provision of arguments that justified the first offer in a negotiation affected the behavior of the parties, namely, how it influenced counteroffers and settlement prices. In a series of 4 experiments and 2 pilot studies, we demonstrated that when the generation of counterarguments was easy, negotiators who did not add arguments to their first offers achieved superior results compared with negotiators who used arguments to justify their first offer. We hypothesized and provided evidence that adding arguments to a first offer was likely to cause the responding party to search for counterarguments, and this, in turn, led him or her to present counteroffers that were further away from the first offer.

  6. Decision support for organ offers in liver transplantation.

    Science.gov (United States)

    Volk, Michael L; Goodrich, Nathan; Lai, Jennifer C; Sonnenday, Christopher; Shedden, Kerby

    2015-06-01

    Organ offers in liver transplantation are high-risk medical decisions with a low certainty of whether a better liver offer will come along before death. We hypothesized that decision support could improve the decision to accept or decline. With data from the Scientific Registry of Transplant Recipients, survival models were constructed for 42,857 waiting-list patients and 28,653 posttransplant patients from 2002 to 2008. Daily covariate-adjusted survival probabilities from these 2 models were combined into a 5-year area under the curve to create an individualized prediction of whether an organ offer should be accepted for a given patient. Among 650,832 organ offers from 2008 to 2013, patient survival was compared by whether the clinical decision was concordant or discordant with model predictions. The acceptance benefit (AB)--the predicted gain or loss of life by accepting a given organ versus waiting for the next organ--ranged from 3 to -22 years (harm) and varied geographically; for example, the average benefit of accepting a donation after cardiac death organ ranged from 0.47 to -0.71 years by donation service area. Among organ offers, even when AB was >1 year, the offer was only accepted 10% of the time. Patient survival from the time of the organ offer was better if the model recommendations and the clinical decision were concordant: for offers with AB > 0, the 3-year survival was 80% if the offer was accepted and 66% if it was declined. In conclusion, decision support may improve patient survival in liver transplantation. © 2015 American Association for the Study of Liver Diseases.
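The acceptance benefit is, in essence, a difference of restricted mean survival times: the area under the accept-the-offer survival curve minus the area under the wait-for-the-next-offer curve over a 5-year horizon. A sketch of that arithmetic (the covariate-adjusted survival models themselves are not reproduced; function and parameter names here are hypothetical):

```python
import numpy as np

def acceptance_benefit(surv_accept, surv_wait, horizon_years=5.0):
    """Acceptance benefit (AB) in years: area under the accept-offer survival
    curve minus area under the wait survival curve, over the horizon.
    Both inputs are survival probabilities sampled on the same uniform grid."""
    surv_accept = np.asarray(surv_accept, dtype=float)
    surv_wait = np.asarray(surv_wait, dtype=float)
    t = np.linspace(0.0, horizon_years, surv_accept.size)

    def auc(y):
        # trapezoidal area under a survival curve, in years of expected life
        return float(np.sum((y[1:] + y[:-1]) * np.diff(t) / 2.0))

    return auc(surv_accept) - auc(surv_wait)
```

With flat curves of 80% and 66% survival, the benefit works out to 5 x 0.14 = 0.7 years, matching the intuition that AB is positive when accepting dominates waiting over the whole horizon.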

  7. Enhanced static ground power unit based on flying capacitor based h-bridge hybrid active-neutral-point-clamped converter

    DEFF Research Database (Denmark)

    Abarzadeh, Mostafa; Madadi Kojabadi, Hossein; Deng, Fujin

    2016-01-01

    Static power converters have various applications, such as static ground power units (GPUs) for airplanes. This study proposes a new configuration of a static GPU based on a novel nine-level flying capacitor h-bridge active-neutral-point-clamped (FCHB_ANPC) converter. The main advantages...... improvement in GPU dynamic performance. This progress is achieved by applying the proposed FCHB converter to an ANPC converter and using the suggested modulation method, which diminishes the size and cost and enhances the feasibility and reliability of the converter. Applying the proposed modulation...

  8. Can Protein in Common Skin Bacteria Offer Disease Protection?

    Science.gov (United States)

    Swedish researchers report that Propionibacterium acnes secretes a protein called RoxP that protects against bacteria that are ...

  9. ISLAND DESTINATIONS' TOURISM OFFER - TOURISTS' VS. RESIDENTS' ATTITUDES

    National Research Council Canada - National Science Library

    Daniela Soldic Frleta

    2014-01-01

      The intent of this paper is to provide empirical insights into tourists' and residents' attitudes regarding island tourism and its offer, using the Kvarner Bay islands (Losinj and Rab) as a case study...

  10. 'Groundbreaking' Research Offers Clues to Cause of Dyslexia

    Science.gov (United States)

    Brain scans revealed that those with the reading ... 2016 (HealthDay News) -- People with the reading disability dyslexia may have brain differences that are surprisingly wide- ...

  11. New Therapies Offer Valuable Options for Patients with Melanoma

    Science.gov (United States)

    Two phase III clinical trials of new therapies for patients with metastatic melanoma presented in June at the 2011 ASCO conference confirmed that vemurafenib and ipilimumab (Yervoy™) offer valuable new options for the disease.

  12. French scientists offered time to set up companies

    CERN Multimedia

    Butler, D

    1999-01-01

    The French minister of national education, research and technology announced that French researchers working for public research institutes and universities are to be offered up to six years sabbatical leave to set up their own companies (11 para)

  13. Gene Therapy Offers Hope to Some Hemophilia Patients

    Science.gov (United States)

    Small, preliminary trial suggests it may free hemophilia B patients from transfusions.

  14. Simulating Lattice Spin Models on Graphics Processing Units

    CERN Document Server

    Levy, Tal; Rabani, Eran; 10.1021/ct100385b

    2012-01-01

    Lattice spin models are useful for studying critical phenomena and allow the extraction of equilibrium and dynamical properties. Simulations of such systems are usually based on Monte Carlo (MC) techniques, and the main difficulty is often the large computational effort needed when approaching critical points. In this work, it is shown how such simulations can be accelerated with the use of NVIDIA graphics processing units (GPUs) using the CUDA programming architecture. We have developed two different algorithms for lattice spin models, the first useful for equilibrium properties near a second-order phase transition point and the second for dynamical slowing down near a glass transition. The algorithms are based on parallel MC techniques, and speedups from 70- to 150-fold over conventional single-threaded computer codes are obtained using consumer-grade hardware.
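One standard way such parallel MC updates are organized (the general pattern GPU Ising codes use, not necessarily the authors' exact scheme) is a checkerboard sweep: sites of one colour have no nearest neighbours of the same colour, so an entire sublattice can be updated simultaneously. A serial NumPy model of that decomposition:

```python
import numpy as np

def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of the 2D Ising model (J=1, periodic boundaries),
    done as two half-sweeps over the black/white checkerboard sublattices.
    Within a half-sweep every updated site is independent of the others,
    which is the independence GPU implementations exploit."""
    ii, jj = np.indices(spins.shape)
    for parity in (0, 1):
        nbr = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0) +
               np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        dE = 2.0 * spins * nbr                          # energy cost of flipping each spin
        accept = rng.random(spins.shape) < np.exp(-beta * dE)
        spins = np.where(((ii + jj) % 2 == parity) & accept, -spins, spins)
    return spins
```

At low temperature (large beta) an ordered lattice stays ordered, as expected below the critical point; near criticality this is where the quoted 70- to 150-fold GPU speedups matter most.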

  15. Accelerating Radio Astronomy Cross-Correlation with Graphics Processing Units

    CERN Document Server

    Clark, M A; Greenhill, L J

    2011-01-01

    We present a highly parallel implementation of the cross-correlation of time-series data using graphics processing units (GPUs), which is scalable to hundreds of independent inputs and suitable for the processing of signals from "Large-N" arrays of many radio antennas. The computational part of the algorithm, the X-engine, is implementated efficiently on Nvidia's Fermi architecture, sustaining up to 79% of the peak single precision floating-point throughput. We compare performance obtained for hardware- and software-managed caches, observing significantly better performance for the latter. The high performance reported involves use of a multi-level data tiling strategy in memory and use of a pipelined algorithm with simultaneous computation and transfer of data from host to device memory. The speed of code development, flexibility, and low cost of the GPU implementations compared to ASIC and FPGA implementations have the potential to greatly shorten the cycle of correlator development and deployment, for case...
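The X-engine computation itself is conceptually small: for every pair of antennas, multiply one stream by the conjugate of the other and average over time. A toy NumPy reference (a sketch of the operation, not the paper's tiled GPU kernel; real correlators do this per frequency channel after an F-engine/FFT stage):

```python
import numpy as np

def x_engine(samples):
    """Toy X-engine: samples is an (n_antennas, n_time) complex array of
    aligned voltage streams; returns the (N, N) time-averaged visibility
    matrix V[i, j] = <x_i * conj(x_j)>."""
    n_time = samples.shape[1]
    return samples @ samples.conj().T / n_time
```

The O(N^2) pair count is why "Large-N" arrays push this stage onto GPUs, and why memory tiling (reusing each loaded antenna sample across many baselines) dominates the optimization the abstract describes.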

  16. Air pollution modelling using a graphics processing unit with CUDA

    CERN Document Server

    Molnar, Ferenc; Meszaros, Robert; Lagzi, Istvan; 10.1016/j.cpc.2009.09.008

    2010-01-01

    The Graphics Processing Unit (GPU) is a powerful tool for parallel computing. In the past years the performance and capabilities of GPUs have increased, and the Compute Unified Device Architecture (CUDA) - a parallel computing architecture - has been developed by NVIDIA to utilize this performance in general purpose computations. Here we show for the first time a possible application of GPU for environmental studies serving as a basis for decision-making strategies. A stochastic Lagrangian particle model has been developed on CUDA to estimate the transport and the transformation of the radionuclides from a single point source during an accidental release. Our results show that parallel implementation achieves typical acceleration values in the order of 80-120 times compared to CPU using a single-threaded implementation on a 2.33 GHz desktop computer. Only very small differences have been found between the results obtained from GPU and CPU simulations, which are comparable with the effect of stochastic tran...
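The core of a stochastic Lagrangian particle step is simple enough to show: deterministic advection by the wind plus a Gaussian random walk standing in for turbulent diffusion. The sketch below (illustrative, with hypothetical names; not the paper's CUDA code) makes clear why the model parallelizes so well, since every particle is updated independently:

```python
import numpy as np

def lagrangian_step(pos, wind, dt, diffusivity, rng):
    """Advance particle positions (n_particles, n_dims) one time step:
    advection by the wind vector plus a Gaussian random displacement with
    variance 2*K*dt, the standard random-walk model of eddy diffusion."""
    sigma = np.sqrt(2.0 * diffusivity * dt)
    return pos + wind * dt + rng.normal(0.0, sigma, size=pos.shape)
```

On a GPU, one thread per particle applies exactly this update, which is why the abstract's 80-120x accelerations are attainable with little inter-thread communication.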

  17. Significantly reducing registration time in IGRT using graphics processing units

    DEFF Research Database (Denmark)

    Noe, Karsten Østergaard; Denis de Senneville, Baudouin; Tanderup, Kari

    2008-01-01

    Purpose/Objective For online IGRT, rapid image processing is needed. Fast parallel computations using graphics processing units (GPUs) have recently been made more accessible through general purpose programming interfaces. We present a GPU implementation of the Horn and Schunck method...... respiration phases in a free breathing volunteer and 41 anatomical landmark points in each image series. The registration method used is a multi-resolution GPU implementation of the 3D Horn and Schunck algorithm. It is based on the CUDA framework from Nvidia. Results On an Intel Core 2 CPU at 2.4GHz each...... registration took 30 minutes. On an Nvidia Geforce 8800GTX GPU in the same machine this registration took 37 seconds, making the GPU version 48.7 times faster. The nine image series of different respiration phases were registered to the same reference image (full inhale). Accuracy was evaluated on landmark...
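The Horn and Schunck method referenced here has a compact fixed-point iteration. The paper runs a multi-resolution 3D CUDA variant; the 2D textbook form below is only an illustration of the core update (every pixel's update reads neighbour averages from the previous iterate, so all pixels can be updated in parallel, which is what makes it GPU-friendly):

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, n_iter=100):
    """Textbook 2D Horn-Schunck optical flow: returns flow components (u, v)
    balancing brightness constancy against alpha^2 times a smoothness term."""
    I1 = np.asarray(I1, dtype=float)
    I2 = np.asarray(I2, dtype=float)
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iter):
        # 4-neighbour averages of the current flow estimate
        u_bar = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                 np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0
        v_bar = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
                 np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 4.0
        common = (Ix * u_bar + Iy * v_bar + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_bar - Ix * common
        v = v_bar - Iy * common
    return u, v
```

Identical input images yield exactly zero flow, a quick sanity check on any implementation of this update.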

  18. Polymer Field-Theory Simulations on Graphics Processing Units

    CERN Document Server

    Delaney, Kris T

    2012-01-01

    We report the first CUDA graphics-processing-unit (GPU) implementation of the polymer field-theoretic simulation framework for determining fully fluctuating expectation values of equilibrium properties for periodic and select aperiodic polymer systems. Our implementation is suitable both for self-consistent field theory (mean-field) solutions of the field equations, and for fully fluctuating simulations using the complex Langevin approach. Running on NVIDIA Tesla T20 series GPUs, we find double-precision speedups of up to 30x compared to single-core serial calculations on a recent reference CPU, while single-precision calculations proceed up to 60x faster than those on the single CPU core. Due to intensive communications overhead, an MPI implementation running on 64 CPU cores remains two times slower than a single GPU.

  19. United abominations: Density functional studies of heavy metal chemistry

    Energy Technology Data Exchange (ETDEWEB)

    Schoendorff, George [Iowa State Univ., Ames, IA (United States)

    2012-01-01

    Carbonyl and nitrile addition to uranyl (UO2^2+) is studied. The competition between nitrile and water ligands in the formation of uranyl complexes is investigated. The possibility of hypercoordinated uranyl with acetone ligands is examined. Uranyl is studied with diacetone alcohol ligands as a means to explain the apparent hypercoordination. A discussion of the formation of mesityl oxide ligands is also included. A joint theory/experimental study of reactions of zwitterionic boratoiridium(I) complexes with oxazoline-based scorpionate ligands is reported. A computational study was done of the catalytic hydroamination/cyclization of aminoalkenes with zirconium-based catalysts. Techniques are surveyed for programming for graphical processing units (GPUs) using Fortran.

  20. Analysis of the offer of chosen tour operator

    OpenAIRE

    Boušová, Ivana

    2012-01-01

    Bachelor thesis is focused on the analysis of tour operator Victoria offer. In theoretical part there are defined basic terms of tourism industry, such as tour operator, tour agent and holiday package. Next part analyzes the offer of the tour operator, which is based on Victoria's profile and services, which are provided by the tour operator. The third part contains important information about the sale. There are mentioned terms of payment, discount programmes, commission sale and forms of pr...

  1. Pricing Multi-play Offers under Uncertainty and Competition

    OpenAIRE

    Hélène, Le Cadre

    2007-01-01

    In a mature market, telecommunication operators try to differentiate themselves by marketing bundles offers. In this highly competitive context, operators should anticipate the strategies of their adversaries and guess the consumers' tastes, to maximize their benefits. To price their offers, operators have to deal with deep uncertainties on the other operators' cost structures, strategies, and on the consumers' preferences. We segment the market and estimate the consumers' subjective prices, ...

  2. Offer alternation at local market: Case study Terranova Serbia

    Directory of Open Access Journals (Sweden)

    Jovanović Aleksandar

    2013-01-01

    Full Text Available The primary goal of contemporary companies requires monitoring customer needs and responding to them appropriately, in order to achieve a significant competitive advantage in terms of profit growth and increased market share. In this sense, it can be concluded that adapting the offer to consumer needs is a very significant component of increased competitiveness and overall business performance. This is achieved by adjusting the goods offered, taking care of consumers' needs, and ensuring the efficient circulation of information from local vendors, who are in direct contact with customers, to senior decision-making management levels. In this paper, the authors analyze offer adjustment to the local market on the example of TEDDY S.p.A. at the Serbian market, identify the importance and role of project management in defining what the organization offers, and analyze the project management of offer adjustment. Through the selected case study, an example of the process of defining a new company offer in accordance with the characteristics of the local market is presented, as well as its impact on profitability growth. Furthermore, the role of offer adjustment in creating the market position of the organization is presented, as well as the necessity of implementing such a project at a time of pronounced differences in the needs of consumers across different local markets.

  3. Availability of websites offering to sell psilocybin spores and psilocybin.

    Science.gov (United States)

    Lott, Jason P; Marlowe, Douglas B; Forman, Robert F

    2009-09-01

    This study assesses the availability of websites offering to sell psilocybin spores and psilocybin, a powerful hallucinogen contained in Psilocybe mushrooms. Over a 25-month period beginning in March 2003, eight searches were conducted in Google using the term "psilocybin spores." In each search the first 100 nonsponsored links obtained were scored by two independent raters according to standardized criteria to determine whether they offered to sell psilocybin or psilocybin spores. No attempts were made to procure the products offered for sale in order to ascertain whether the marketed psilocybin was in fact "genuine" or "counterfeit." Of the 800 links examined, 58% led to websites offering to sell psilocybin spores. Additionally, evidence that whole Psilocybe mushrooms are offered for sale online was obtained. Psilocybin and psilocybin spores were found to be widely available for sale over the Internet. Online purchase of psilocybin may facilitate illicit use of this potent psychoactive substance. Additional studies are needed to assess whether websites offering to sell psilocybin and psilocybin spores actually deliver their products as advertised.

  4. Adolescents' responses to peer smoking offers: the role of sensation seeking and self-esteem.

    Science.gov (United States)

    Greene, Kathryn; Banerjee, Smita C

    2008-01-01

    This article deals with an important topic (youth smoking) and makes a contribution to the literature by validating existing research and extending our understanding of smoking resistance strategies. This study classified adolescent reports of their responses to cigarette smoking offers utilizing four drug refusal strategies of refuse, explain, avoid, and leave (REAL) and explored how personality factors explain adolescents' use of cigarette refusal strategies. Participants were predominantly Hispanic junior high students (6th-8th grades) from schools in the Northeast United States who participated in a survey design (N = 260). The strategy of explain was reported most frequently for initial and follow-up smoking offers. Adolescents with a greater number of friends who smoked were more likely to use the avoid strategy for initial smoking offers. Sensation seeking was positively related to the use of leave and avoid strategies for initial smoking offers and leave strategy for follow-up smoking offers. No association was found between self-esteem and use of smoking refusal strategies. Implications and directions for future research are discussed.

  5. Density functional theory calculation on many-cores hybrid central processing unit-graphic processing unit architectures.

    Science.gov (United States)

    Genovese, Luigi; Ospici, Matthieu; Deutsch, Thierry; Méhaut, Jean-François; Neelov, Alexey; Goedecker, Stefan

    2009-07-21

    We present the implementation of a full electronic structure calculation code on a hybrid parallel architecture with graphic processing units (GPUs). This implementation is performed on a free software code based on Daubechies wavelets. Such code shows very good performances, systematic convergence properties, and an excellent efficiency on parallel computers. Our GPU-based acceleration fully preserves all these properties. In particular, the code is able to run on many cores which may or may not have a GPU associated, and thus on parallel and massively parallel hybrid machines. With double precision calculations, we may achieve considerable speedup, between a factor of 20 for some operations and a factor of 6 for the whole density functional theory code.

  6. Optimization Solutions for Improving the Performance of the Parallel Reduction Algorithm Using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Ion LUNGU

    2012-01-01

    Full Text Available In this paper, we research, analyze and develop optimization solutions for the parallel reduction function using graphics processing units (GPUs) that implement the Compute Unified Device Architecture (CUDA), a modern and novel approach for improving the software performance of data processing applications and algorithms. Many of these applications and algorithms make use of the reduction function in their computational steps. After having designed the function and its algorithmic steps in CUDA, we have progressively developed and implemented optimization solutions for the reduction function. In order to confirm, test and evaluate the solutions' efficiency, we have developed a custom tailored benchmark suite. We have analyzed the obtained experimental results regarding: the comparison of the execution time and bandwidth when using graphics processing units covering the main CUDA architectures (Tesla GT200, Fermi GF100, Kepler GK104) and a central processing unit; the influence of the data type; the influence of the binary operator.
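The algorithmic core of a CUDA block reduction is a pairwise tree: at every step the first half of the active buffer adds in the second half, halving the active size until one value remains (log2(n) "parallel" steps). A serial Python model of that access pattern, illustrative only and not the paper's CUDA code:

```python
import numpy as np

def tree_reduce_sum(values):
    """Tree reduction of a sum using the sequential-addressing pattern of a
    CUDA block reduction. Each while-loop iteration corresponds to one
    synchronized parallel step on the GPU."""
    buf = np.asarray(values, dtype=float)
    n = 1
    while n < buf.size:                          # pad length up to a power of two
        n *= 2
    buf = np.concatenate([buf, np.zeros(n - buf.size)])
    stride = n // 2
    while stride > 0:
        buf[:stride] += buf[stride:2 * stride]   # one parallel step: contiguous, bank-conflict-free
        stride //= 2
    return float(buf[0])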

  7. Care closer to home - what does it offer? [PhD abstract

    DEFF Research Database (Denmark)

    Overgaard, Charlotte

    2012-01-01

    The thesis represents an empirical study of safety and quality of maternity care in freestanding midwifery units in Denmark. It is publication-based and consists of a thesis overview and three peer-reviewed publications. A multidisciplinary and mixed-methods approach is applied. In high-income countries, obstetric units (OU) have become the primary setting for birth, also for low-risk women. This model of care is dominated by a medical and technological perspective that has led some to question the ability of OUs to meet the needs of all birthing women. While OUs have given increased attention to women’s autonomy and the “humanisation” of care, midwifery units have emerged as an alternative to OU care for low-risk women, offering low-technological, individualised, and patient-centred care.

  8. Investor Reaction to Mandatory Offers on the Warsaw Stock Exchange

    Directory of Open Access Journals (Sweden)

    Szymon Okoń

    2012-06-01

    Full Text Available The following paper aims to assess investor reaction to mandatory offers on the Warsaw Stock Exchange, which is important because knowledge about these reactions can be used to make better investment decisions. This paper highlights the importance of the procedure for making a mandatory offer and its grounds in the Polish legal system. Additionally, it presents empirical research on the reactions of investors to mandatory offers on the Warsaw Stock Exchange. It has been shown that mandatory offers have a significant impact on the price of a company’s shares listed on the Warsaw Stock Exchange. Knowledge about the reactions of investors to a mandatory offer may be used when selecting securities for an investment portfolio. The findings may provide guidance in deciding whether to begin or end investment in the company, both for individual and institutional investors. The event study methodology approach used in the paper is regarded as valuable and can be the basis for further research in other areas of capital market research, especially in the context of information efficiency.

  9. Lattice QCD simulations using the OpenACC platform

    Science.gov (United States)

    Majumdar, Pushan

    2016-10-01

    In this article we explore the OpenACC platform for programming Graphics Processing Units (GPUs). The OpenACC platform offers a directive-based programming model for GPUs which avoids the detailed data-flow control and memory management necessary in a CUDA programming environment. In the OpenACC model, programs can be written in high-level languages with OpenMP-like directives. We present some examples of QCD simulation codes using OpenACC and discuss their performance on Fermi and Kepler GPUs.

  10. Accelerating MATLAB with GPU computing a primer with examples

    CERN Document Server

    Suh, Jung W

    2013-01-01

    Beyond simulation and algorithm development, many developers increasingly use MATLAB even for product deployment in computationally heavy fields. This often demands that MATLAB code run faster by leveraging the parallelism of Graphics Processing Units (GPUs). While MATLAB successfully provides high-level functions as a simulation tool for rapid prototyping, the underlying details and knowledge needed to utilize GPUs make MATLAB users hesitate to take the step. Accelerating MATLAB with GPU Computing offers a primer on bridging this gap, starting with the basics of setting up MATLAB for…

  11. BUSINESS NEEDS AND GRADUATE BUSINESS SCHOOL OFFERINGS IN MARKETING.

    Science.gov (United States)

    Thams, Meg; Glueck, Deborah

    2007-04-01

    The purpose of this study was to determine whether a gap exists between the skills and knowledge businesses require of marketing employees and what Association to Advance Collegiate Schools of Business (AACSB)-accredited schools actually provide. In this quantitative study, two sets of data were collected and compared, and a gap analysis conducted. A questionnaire was used to obtain data from members of the Business Marketing Association (BMA) regarding the course preferences that would best prepare students for positions in marketing. A records analysis was then undertaken of the marketing course offerings of AACSB-accredited MBA programs offering an emphasis in marketing. The gap analysis was conducted by applying a test of difference to the results of the two data collections. Results of the study suggest that some misalignment between school offerings and business needs exists.

  12. Studi Empiris Tingkat Underpricing pada Initial Public Offering

    Directory of Open Access Journals (Sweden)

    Daniel Sugama Stephanus

    2015-12-01

    Full Text Available At Initial Public Offering (IPO), underpricing still occurs, due in part to asymmetric information between firms and underwriters. This research analyzes the accounting and non-accounting factors that affect the level of underpricing. It is a quantitative study using regression analysis; the data are companies that conducted an IPO in 2010–2014. The accounting factors are ROA, DER, CR, and SIZE; the non-accounting factors are OFFER, AGE, auditor’s reputation, and underwriter’s reputation. Results show that only underwriter’s reputation affects underpricing, indicating that underwriter’s reputation is very important for firms seeking to reduce underpricing in an IPO.

  13. Viscoelastic Finite Difference Modeling Using Graphics Processing Units

    Science.gov (United States)

    Fabien-Ouellet, G.; Gloaguen, E.; Giroux, B.

    2014-12-01

    Full waveform seismic modeling requires a huge amount of computing power that still challenges today's technology, which limits the applicability of powerful processing approaches in seismic exploration such as full-waveform inversion. This paper explores the use of Graphics Processing Units (GPUs) to compute a time-domain finite-difference solution to the viscoelastic wave equation. The aim is to investigate whether adopting GPU technology can significantly reduce the computing time of simulations. The code presented herein is based on the freely accessible 2D software of Bohlen (2002), provided under the GNU General Public License (GPL). The implementation uses a second-order centred-difference scheme to approximate time derivatives and staggered-grid schemes with centred differences of order 2, 4, 6, 8, and 12 for spatial derivatives. The code is fully parallel and written using the Message Passing Interface (MPI), and it thus supports simulations of vast seismic models on a cluster of CPUs. To port the code of Bohlen (2002) to GPUs, the OpenCL framework was chosen for its ability to work on both CPUs and GPUs and its adoption by most GPU manufacturers. In our implementation, OpenCL works in conjunction with MPI, which allows computations on a cluster of GPUs for large-scale model simulations. We tested our code for model sizes between 100² and 6000² elements. Comparison shows a decrease in computation time of more than two orders of magnitude between the GPU implementation run on an AMD Radeon HD 7950 and the CPU implementation run on a 2.26 GHz Intel Xeon Quad-Core. The speed-up varies with the order of the finite-difference approximation and generally increases for higher orders. Increasing speed-ups are also obtained for increasing model sizes, as kernel overheads and the delays introduced by memory transfers to and from the GPU through the PCI-E bus are amortized. Those tests indicate that the GPU memory size…
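
    The staggered-grid velocity–stress update at the core of codes like the one above can be sketched in one dimension. This is a serial NumPy stand-in for the per-element GPU kernels; the grid size, time step, and medium parameters are illustrative, not values from the paper.

    ```python
    import numpy as np

    # 1-D velocity-stress staggered grid, 2nd order in time and space:
    # velocities live on integer nodes, stresses on half nodes.
    nx, dx, dt = 200, 5.0, 0.0005
    rho, vp = 2000.0, 3000.0          # density and wave speed (illustrative)
    mu = rho * vp**2                  # elastic modulus
    v = np.zeros(nx)                  # particle velocity
    s = np.zeros(nx - 1)              # stress

    def step(v, s):
        # velocity update from the stress gradient (interior nodes only)
        v[1:-1] += dt / rho * (s[1:] - s[:-1]) / dx
        # stress update from the velocity gradient
        s += dt * mu * (v[1:] - v[:-1]) / dx
        return v, s

    v[nx // 2] = 1.0                  # impulsive source at the grid centre
    for _ in range(100):              # CFL number vp*dt/dx = 0.3, stable
        v, s = step(v, s)
    ```

    On a GPU each array slice becomes one thread per grid element, which is why the method parallelizes so well.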

  14. Accelerating glassy dynamics using graphics processing units

    CERN Document Server

    Colberg, Peter H

    2009-01-01

    Modern graphics hardware offers peak performances close to 1 Tflop/s, and NVIDIA's CUDA provides a flexible and convenient programming interface to exploit these immense computing resources. We demonstrate the ability of GPUs to perform high-precision molecular dynamics simulations for nearly a million particles running stably over many days. Particular emphasis is put on the numerical long-time stability in terms of energy and momentum conservation. Floating point precision is a crucial issue here, and sufficient precision is maintained by double-single emulation of the floating point arithmetic. As a demanding test case, we have reproduced the slow dynamics of a binary Lennard-Jones mixture close to the glass transition. The improved numerical accuracy permits us to follow the relaxation dynamics of a large system over 4 non-trivial decades in time. Further, our data provide evidence for a negative power-law decay of the velocity autocorrelation function with exponent 5/2 in the close vicinity of the transi...
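
    The "double-single" emulation mentioned above rests on error-free transformations such as Knuth's two-sum, which splits a floating-point sum into a rounded result and its exact residual. The sketch below demonstrates the idea with Python's IEEE doubles standing in for the GPU's single-precision pairs; the test values are illustrative.

    ```python
    # Knuth's error-free two-sum: s = fl(a + b) and a + b = s + e exactly.
    # Carrying e in a second word is the basis of double-single arithmetic.

    def two_sum(a, b):
        s = a + b
        bp = s - a
        e = (a - (s - bp)) + (b - bp)
        return s, e

    def compensated_sum(xs):
        """Sum xs while accumulating the rounding error in a second word."""
        s, err = 0.0, 0.0
        for x in xs:
            s, e = two_sum(s, x)
            err += e
        return s + err

    # 1.0 followed by many tiny terms: naive summation loses the tail entirely,
    # while the compensated sum recovers it.
    xs = [1.0] + [1e-17] * 1000
    print(sum(xs))
    print(compensated_sum(xs))
    ```

    The same trick, applied to every add and multiply, is what keeps energy and momentum conserved over long runs on single-precision hardware.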

  15. 44 CFR 19.415 - Access to course offerings.

    Science.gov (United States)

    2010-10-01

    ... 44 Emergency Management and Assistance 1 2010-10-01 2010-10-01 false Access to course offerings. 19.415 Section 19.415 Emergency Management and Assistance FEDERAL EMERGENCY MANAGEMENT AGENCY... activity of which involves bodily contact. (4) Where use of a single standard of measuring skill...

  16. Offering-level strategy formulation in health service organizations.

    Science.gov (United States)

    Pointer, D D

    1990-01-01

    One of six different strategies must be selected for a health service offering to provide consumers with distinctive value and achieve sustainable competitive advantage in a market or market segment. Decisions must be made regarding objectives sought, market segmentation, market scope, and the customer-value proposition that will be pursued.

  17. Special offer from the Carlson Wagonlit Travel agency

    CERN Multimedia

    Carlson Wagonlit Travel

    2011-01-01

    Special offer 4th – 28th April Reduced CWT service fee for any new reservation of a holiday package (flight + hotel/apartment). Reserve for 1, 2, 3 or more travelers – pay the service fee for one person only. Valid at your CWT CERN agency.

  18. 36 CFR 1150.79 - Offer of proof.

    Science.gov (United States)

    2010-07-01

    ... Section 1150.79 Parks, Forests, and Public Property ARCHITECTURAL AND TRANSPORTATION BARRIERS COMPLIANCE... documentary or written form or refers to documents or records, a copy of the evidence shall be marked for identification and shall accompany the record as the offer of proof. ...

  19. Unique Refresher Course Offered in U.K.

    Science.gov (United States)

    Chemical and Engineering News, 1979

    1979-01-01

    The Advanced Diploma in Chemical Sciences is a new course being offered at the University of East Anglia in Norwich, England. It is billed as a refresher course in modern organic chemistry for senior industrial scientists and is aimed at Ph.D.'s out of school for at least ten years. (Author/BB)

  20. Are Universities Reaping the Available Benefits Internship Programs Offer?

    Science.gov (United States)

    Weible, Rick

    2010-01-01

    Many research studies have examined the benefits student internships offer students and employers, but few looked at the benefits internships might lend to educational institutions. A survey instrument was developed and sent to 619 deans of all U.S. business programs. In all, 29% replied. The results indicate some institutions are gaining the…

  1. Sustainable Development: Does the Capability Approach have Anything to Offer?

    DEFF Research Database (Denmark)

    Crabtree, Andrew

    2013-01-01

    Although the sustainability of development is one of the most important problems facing the world, it has received little attention from the capability approach. This article asks whether the capability approach has anything to offer the debate that has continued for over a quarter of a century...

  2. What serious video games can offer child obesity prevention

    Science.gov (United States)

    Childhood obesity is a worldwide issue, and effective methods encouraging children to adopt healthy diet and physical activity behaviors are needed. This viewpoint addresses the promise of serious video games, and why they may offer one method for helping children eat healthier and become more physi...

  3. 43 CFR 41.415 - Access to course offerings.

    Science.gov (United States)

    2010-10-01

    ... OF SEX IN EDUCATION PROGRAMS OR ACTIVITIES RECEIVING FEDERAL FINANCIAL ASSISTANCE Discrimination on the Basis of Sex in Education Programs or Activities Prohibited § 41.415 Access to course offerings...) Where use of a single standard of measuring skill or progress in a physical education class has...

  4. 28 CFR 54.415 - Access to course offerings.

    Science.gov (United States)

    2010-07-01

    ... SEX IN EDUCATION PROGRAMS OR ACTIVITIES RECEIVING FEDERAL FINANCIAL ASSISTANCE Discrimination on the Basis of Sex in Education Programs or Activities Prohibited § 54.415 Access to course offerings. (a) A...) Where use of a single standard of measuring skill or progress in a physical education class has...

  5. What serious video games can offer child obesity prevention.

    Science.gov (United States)

    Thompson, Debbe

    2014-07-16

    Childhood obesity is a worldwide issue, and effective methods encouraging children to adopt healthy diet and physical activity behaviors are needed. This viewpoint addresses the promise of serious video games, and why they may offer one method for helping children eat healthier and become more physically active. Lessons learned are provided, as well as examples gleaned from personal experiences.

  6. 29 CFR 36.415 - Access to course offerings.

    Science.gov (United States)

    2010-07-01

    ... 29 Labor 1 2010-07-01 2010-07-01 true Access to course offerings. 36.415 Section 36.415 Labor Office of the Secretary of Labor NONDISCRIMINATION ON THE BASIS OF SEX IN EDUCATION PROGRAMS OR..., industrial, business, vocational, technical, home economics, music, and adult education courses. (b)(1)...

  7. Negotiating compensation packages. Bargaining to get the best offer.

    Science.gov (United States)

    Cejka, S

    2001-01-01

    You worked hard to search for a job--sending out resumes, networking with your peers and contacting recruiters. But when the offer finally comes, your work isn't done. Now it's time to negotiate the best salary and employment contract. Take a look at today's standard compensation packages, including bonuses, stock options, relocation expenses and other features. What's negotiable and what's not?

  8. Instagram Photos May Offer Snapshot of Mental Health

    Science.gov (United States)

    Social media posts seem to give clues to the presence of mental health conditions: signs may show up in the photos people post on social media sites like Facebook or Instagram and can track with users' history of mental health. The team wound up collecting almost 44,000 photos.

  9. Social media offers many ways to grow business

    Energy Technology Data Exchange (ETDEWEB)

    Madison, Alison L.

    2011-03-13

    Facebook and Twitter and YouTube, oh my! For some of us the added communications channels offered by social media can feel overwhelming in a hurry. But whether we like it or not, they’re proving to be valuable marketing tools that small and large companies alike are jumping on board to explore.

  10. General Education in Occupational Education Programs Offered by Junior Colleges.

    Science.gov (United States)

    Wiegman, Robert R.

    This report, directed toward junior college board members, presidents, deans, department heads, and teachers, as well as legislators, attempts to stimulate thought and action to improve general education in occupational programs offered by junior colleges. Following a review of the unsatisfactory status of present curricula, a rationale and…

  11. Changes in hotel offer of Belgrade driven by tourist demand

    Directory of Open Access Journals (Sweden)

    Šimičević Dario

    2015-01-01

    Full Text Available This paper focuses on changes observed in Belgrade's hotel offer during the first decade of the 21st century. It shows that these changes are associated with the characteristics of tourism demand, tourists' reasons for visiting, and their behavior during the stay. The changes are reflected not only in a rising number of beds but also in the structure of Belgrade's hotel offer. As one of the major destinations in Serbia, attracting the majority of foreign guests and a large share of domestic ones, Belgrade has had to change its hotel offer quantitatively as well as qualitatively. The paper also focuses on the influence of business tourism development on the hotel offer: business tourism is closely related to the changes identified and presented in this paper, and business tourists constitute the largest group of tourists in Belgrade, which is expected to remain the case in the following years. Belgrade's tourism development strategy and growing tourism investment support these findings.

  12. Evaluating the Effectiveness of Selected Continuing Education Offerings

    Science.gov (United States)

    Deets, Carol; Blume, Dorothy

    1977-01-01

    This paper presented at the 1976 National Conference on Continuing Education in Nursing describes evaluation methodology used to determine the effectiveness of different continuing education offerings in nursing. The evaluation design, workshops for inservice directors, findings and problems, and examples of three evaluation forms used are…

  13. 24 CFR 3.415 - Access to course offerings.

    Science.gov (United States)

    2010-04-01

    ... 24 Housing and Urban Development 1 2010-04-01 2010-04-01 false Access to course offerings. 3.415 Section 3.415 Housing and Urban Development Office of the Secretary, Department of Housing and Urban..., business, vocational, technical, home economics, music, and adult education courses. (b)(1) With respect to...

  14. The private rejection of unfair offers and emotional commitment.

    Science.gov (United States)

    Yamagishi, Toshio; Horita, Yutaka; Takagishi, Haruto; Shinada, Mizuho; Tanida, Shigehito; Cook, Karen S

    2009-07-14

    In a series of experiments, we demonstrate that certain players of an economic game reject unfair offers even when this behavior increases rather than decreases inequity. A substantial proportion (30-40%, compared with 60-70% in the standard ultimatum game) of those who responded rejected unfair offers even when rejection reduced only their own earnings to 0, while not affecting the earnings of the person who proposed the unfair split (in an impunity game). Furthermore, even when the responders were not able to communicate their anger to the proposers by rejecting unfair offers in a private impunity game, a similar rate of rejection was observed. The rejection of unfair offers that increases inequity cannot be explained by the social preference for inequity aversion or reciprocity; however, it does provide support for the model of emotion as a commitment device. In this view, emotions such as anger or moral disgust lead people to disregard the immediate consequences of their behavior, committing them to behave consistently to preserve integrity and maintain a reputation over time as someone who is reliably committed to this behavior.

  15. Challenges of Offering a MOOC from an LMIC

    Science.gov (United States)

    Pasha, Aamna; Abidi, Syed Hani; Ali, Syed

    2016-01-01

    Massive open online courses (MOOCs) were initiated in the early 2000s by certain leading American and European universities. An integral part of the MOOC philosophy has been to provide open access to online learning. Despite their potential advantages to local audiences, faculty and institutions, the number of MOOCs offered from low and middle…

  16. Why do patients decline participation in offered pulmonary rehabilitation?

    DEFF Research Database (Denmark)

    Mathar, Helle; Fastholm, Pernille; Lange, Peter

    2017-01-01

    … pulmonary rehabilitation. Each category was named using a content-characteristic word. RESULTS: This study shows that some patients do not remember or recall that they have been offered pulmonary rehabilitation during hospitalization. Especially the oldest patients perceive themselves to be too frail from…

  17. Classroom "Cupcake" Celebrations: Observations of Foods Offered and Consumed

    Science.gov (United States)

    Isoldi, Kathy K.; Dalton, Sharron; Rodriguez, Desiree P.; Nestle, Marion

    2012-01-01

    Objective: To describe food and beverage types offered and consumed during classroom celebrations at an elementary school in a low-income, urban community. In addition, to report student intake of fresh fruit provided alongside other party foods. Methods: Observations held during 4 classroom celebrations. Food and beverage items were measured and…

  18. Meal Pattern Requirements and Offer versus Serve Manual.

    Science.gov (United States)

    Food and Nutrition Service (USDA), Washington, DC.

    This manual contains information on federal policy regarding meal-pattern requirements for school-nutrition programs. It also describes the Offer Versus Serve (OVS) provision, which allows students to decline either one or two food items they do not intend to eat in order to reduce food waste. The manual explains food components, gives examples of…

  19. 31 CFR 309.6 - Public notice of offering.

    Science.gov (United States)

    2010-07-01

    ... 31 Money and Finance: Treasury 2 2010-07-01 2010-07-01 false Public notice of offering. 309.6... SERVICE, DEPARTMENT OF THE TREASURY BUREAU OF THE PUBLIC DEBT ISSUE AND SALE OF TREASURY BILLS § 309.6... Reserve Banks and branches and at the Bureau of the Public Debt, Washington, DC 20226, and the date...

  20. 7 CFR 3431.17 - VMLRP service agreement offer.

    Science.gov (United States)

    2010-01-01

    ....17 Agriculture Regulations of the Department of Agriculture (Continued) COOPERATIVE STATE RESEARCH, EDUCATION, AND EXTENSION SERVICE, DEPARTMENT OF AGRICULTURE VETERINARY MEDICINE LOAN REPAYMENT PROGRAM Administration of the Veterinary Medicine Loan Repayment Program § 3431.17 VMLRP service agreement offer. The...

  1. Epoxy adhesives offer broad choice for challenging applications

    CERN Multimedia

    Stevens, P

    2006-01-01

    "While they do not offer quite the same fascination as the "instant" adhesives that deliver great strength from a single drop, epoxy adhesives are now available with a remarkably wide range of material properties. As such, grades can be specified for use on almost any material type in most industries"

  2. Offer as a creative foundation of direct marketing

    Directory of Open Access Journals (Sweden)

    Kocić Milan

    2010-01-01

    Full Text Available Investments in the communications mix are specific because, besides significant financial resources, they also involve creative marketing ideas born of imagination and experience. The application of the direct marketing communication mix is like the tip of an iceberg: the visible aspect, presented through promotional content, is only a small part of a whole process involving experts from different fields. The process of creating a new direct marketing communication mix must make full use of the possibilities opened by sophisticated technology. Direct marketing supported by new technology (cable television, telemarketing and the Internet) can substantially increase the competitive advantage of enterprises in attracting their customers' attention and creating their loyalty. Companies use these media ever more frequently to create offers that strengthen the loyalty of existing customers while simultaneously attracting customers who previously purchased from direct competitors. Hence an offer, as a creatively designed marketing vehicle, proposes a number of potential benefits to a client; the client can claim those benefits only by taking the action suggested by the communicated message. These are, in essence, 'the rules of the game' that regulate the relationship between company and client in direct marketing. Creating the offers that guide the direct marketing process falls within the domain of customer relationship management. Without an attractive offer, consumers will not produce any measurable response, and relations with clients will never emerge. On the other hand, if a company fails to continuously follow clients' needs and wants, those engaged in direct sales will not be able to create adequate offers that satisfy clients and motivate repeat purchases.

  3. 20 CFR 655.1308 - Offered wage rate.

    Science.gov (United States)

    2010-04-01

    ... Employees' Benefits EMPLOYMENT AND TRAINING ADMINISTRATION, DEPARTMENT OF LABOR TEMPORARY EMPLOYMENT OF FOREIGN WORKERS IN THE UNITED STATES Labor Certification Process for Temporary Agricultural Employment in... for the occupation, skill level, and geographical area from the Bureau of Labor Statistics...

  4. Optical diagnostics of a single evaporating droplet using fast parallel computing on graphics processing units

    Science.gov (United States)

    Jakubczyk, D.; Migacz, S.; Derkachov, G.; Woźniak, M.; Archer, J.; Kolwas, K.

    2016-09-01

    We report on the first application of graphics processing unit (GPU) accelerated computing to improve the performance of numerical methods used for the optical characterization of evaporating microdroplets. Single microdroplets of various liquids with different volatility and molecular weight (glycerine, glycols, water, etc.), as well as mixtures of liquids and diverse suspensions, evaporate inside an electrodynamic trap under a chosen temperature and composition of the atmosphere. The series of scattering patterns recorded from the evaporating microdroplets are processed by fitting complete Mie theory predictions with a gradientless lookup-table method. We show that computations on GPUs can be effectively applied to inverse scattering problems. In particular, our technique accelerated Mie-theory calculations more than 800-fold relative to a single-core processor in a Matlab environment and almost 100-fold relative to the corresponding code in C. Additionally, we overcame the problem of time-consuming data post-processing when some parameters of an investigated liquid (particularly the refractive index) are uncertain. Our program allows us to track the parameters characterizing the evaporating droplet nearly simultaneously with the progress of evaporation.
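
    The gradientless lookup-table idea can be sketched as follows: precompute forward-model patterns over a parameter grid, then pick the grid entry closest to the measurement. The forward model here is a toy oscillatory function standing in for the full Mie computation, and all grid values are invented for the example.

    ```python
    import numpy as np

    # Lookup-table inversion sketch: nearest precomputed pattern wins.

    def toy_pattern(radius, angles):
        # Toy stand-in forward model (NOT Mie theory): angular intensity pattern.
        return np.cos(radius * angles) ** 2

    angles = np.linspace(0.1, 1.0, 64)            # detection angles
    radii = np.linspace(1.0, 10.0, 901)           # parameter grid, step 0.01
    table = np.array([toy_pattern(r, angles) for r in radii])

    measured = toy_pattern(4.37, angles)          # synthetic "measurement"
    best = radii[np.argmin(np.sum((table - measured) ** 2, axis=1))]
    print(best)
    ```

    Because every table row can be compared to the measurement independently, this search maps directly onto GPU threads, which is what makes the approach fast in practice.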

  5. Parallel design of JPEG-LS encoder on graphics processing units

    Science.gov (United States)

    Duan, Hao; Fang, Yong; Huang, Bormin

    2012-01-01

    With recent technical advances in graphics processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth, and many successful GPU applications to high performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve the compression ratio. However, adaptive context modeling causes data dependency among adjacent pixels, and the run-length coding has to be performed sequentially. Hence, using JPEG-LS to compress large-volume hyperspectral image data is quite time-consuming. We implement an efficient parallel JPEG-LS encoder for lossless hyperspectral compression on an NVIDIA GPU using the compute unified device architecture (CUDA) programming technology. We use a block-parallel strategy, as well as such CUDA techniques as coalesced global memory access, parallel prefix sum, and asynchronous data transfer. We also show the relation between GPU speedup and AVIRIS block size, as well as the relation between compression ratio and AVIRIS block size. When AVIRIS images are divided into blocks, each with 64×64 pixels, we gain the best GPU performance with a 26.3x speedup over the original CPU code.
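
    One of the CUDA techniques named above, the work-efficient parallel prefix sum, follows Blelloch's up-sweep/down-sweep pattern. The sketch below simulates it serially; each inner loop corresponds to one parallel step across the threads of a block.

    ```python
    # Work-efficient (Blelloch) exclusive prefix sum, simulated serially.
    # Input length is assumed to be a power of two for clarity.

    def exclusive_scan(a):
        n = len(a)
        x = list(a)
        d = 1
        while d < n:                   # up-sweep (reduce) phase
            for i in range(d * 2 - 1, n, d * 2):
                x[i] += x[i - d]
            d *= 2
        x[n - 1] = 0                   # clear the root before descending
        d = n // 2
        while d >= 1:                  # down-sweep phase
            for i in range(d * 2 - 1, n, d * 2):
                t = x[i - d]
                x[i - d] = x[i]
                x[i] += t
            d //= 2
        return x

    print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
    ```

    In the encoder, a scan like this turns per-pixel output lengths into write offsets, so each thread can emit its compressed bytes without contention.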

  6. Practical Implementation of Prestack Kirchhoff Time Migration on a General Purpose Graphics Processing Unit

    Directory of Open Access Journals (Sweden)

    Liu Guofeng

    2016-08-01

    Full Text Available In this study, we present a practical implementation of prestack Kirchhoff time migration (PSTM) on a general-purpose graphics processing unit. First, we consider the three main optimizations of the PSTM GPU code: designing a configuration based on reasonable execution parameters, using texture memory for velocity interpolation, and applying an intrinsic function in device code. This approach achieves a speedup of nearly 45 times on an NVIDIA GTX 680 GPU compared with CPU code when a larger imaging space is used, where the PSTM output is a common reflection point gather stored as I[nx][ny][nh][nt] in matrix format. However, this method requires more memory, so the limited imaging space cannot fully exploit the GPU resources. To overcome this problem, we designed a PSTM scheme with multiple GPUs that images different seismic data on different GPUs by offset value. This achieves the peak speedup of the GPU PSTM code and greatly increases the efficiency of the calculations, without changing the imaging result.

  7. Practical Implementation of Prestack Kirchhoff Time Migration on a General Purpose Graphics Processing Unit

    Science.gov (United States)

    Liu, Guofeng; Li, Chun

    2016-08-01

    In this study, we present a practical implementation of prestack Kirchhoff time migration (PSTM) on a general-purpose graphics processing unit. First, we consider the three main optimizations of the PSTM GPU code: designing a configuration based on reasonable execution parameters, using texture memory for velocity interpolation, and applying an intrinsic function in device code. This approach achieves a speedup of nearly 45 times on an NVIDIA GTX 680 GPU compared with CPU code when a larger imaging space is used, where the PSTM output is a common reflection point gather stored as I[nx][ny][nh][nt] in matrix format. However, this method requires more memory, so the limited imaging space cannot fully exploit the GPU resources. To overcome this problem, we designed a PSTM scheme with multiple GPUs that images different seismic data on different GPUs by offset value. This achieves the peak speedup of the GPU PSTM code and greatly increases the efficiency of the calculations, without changing the imaging result.
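
    The Kirchhoff summation at the core of PSTM can be sketched as follows: each image sample accumulates input-trace amplitude at the diffraction traveltime. This is a minimal zero-offset illustration; the geometry, constant velocity, and trace contents are invented for the example, and real code would add interpolation and amplitude weighting.

    ```python
    import numpy as np

    # Minimal Kirchhoff time-migration sketch (zero offset, constant velocity).
    nt, dt, v = 500, 0.004, 2000.0
    xs = np.arange(0.0, 2000.0, 50.0)        # surface trace positions
    traces = np.zeros((len(xs), nt))

    def migrate_point(ximg, timg):
        """Sum trace samples along the diffraction curve of one image point."""
        z = v * timg / 2.0                   # depth of the image point
        out = 0.0
        for ix, x in enumerate(xs):
            t = 2.0 * np.hypot(x - ximg, z) / v   # two-way traveltime
            it = int(round(t / dt))
            if it < nt:
                out += traces[ix, it]
        return out

    # Put a point-diffractor response into the data, then image it back.
    for ix, x in enumerate(xs):
        t = 2.0 * np.hypot(x - 1000.0, 400.0) / v
        it = int(round(t / dt))
        if it < nt:
            traces[ix, it] = 1.0

    print(migrate_point(1000.0, 2 * 400.0 / v))
    ```

    On a GPU, each image point (or each output sample of the gather) becomes one thread, which is why the method scales so well across devices.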

  8. Graphics Processing Unit (GPU) Acceleration of the Goddard Earth Observing System Atmospheric Model

    Science.gov (United States)

    Putnam, William

    2011-01-01

    The Goddard Earth Observing System 5 (GEOS-5) is the atmospheric model used by the Global Modeling and Assimilation Office (GMAO) for a variety of applications, from long-term climate prediction at relatively coarse resolution, to data assimilation and numerical weather prediction, to very high-resolution cloud-resolving simulations. GEOS-5 is being ported to a graphics processing unit (GPU) cluster at the NASA Center for Climate Simulation (NCCS). By utilizing GPU co-processor technology, we expect to increase the throughput of GEOS-5 by at least an order of magnitude and accelerate the process of scientific exploration across all scales of global modeling, including: large-scale, high-end, non-hydrostatic, global, cloud-resolving modeling at 10- to 1-kilometer (km) global resolutions; intermediate-resolution seasonal climate and weather prediction at 50- to 25-km on small clusters of GPUs; and long-range, coarse-resolution climate modeling, enabled on a small box of GPUs for the individual researcher. After being ported to the GPU cluster, the primary physics components and the dynamical core of GEOS-5 have demonstrated a potential speedup of 15-40 times over conventional processor cores. Performance improvements of this magnitude reduce the required scalability of 1-km, global, cloud-resolving models from an unfathomable 6 million cores to an attainable 200,000 GPU-enabled cores.

  9. Exploring Graphics Processing Unit (GPU) Resource Sharing Efficiency for High Performance Computing

    Directory of Open Access Journals (Sweden)

    Teng Li

    2013-11-01

    Full Text Available The increasing incorporation of Graphics Processing Units (GPUs as accelerators has been one of the forefront High Performance Computing (HPC trends and provides unprecedented performance; however, the prevalent adoption of the Single-Program Multiple-Data (SPMD programming model brings with it challenges of resource underutilization. In other words, under SPMD, every CPU needs GPU capability available to it. However, since CPUs generally outnumber GPUs, the asymmetric resource distribution gives rise to overall computing resource underutilization. In this paper, we propose to efficiently share the GPU under SPMD and formally define a series of GPU sharing scenarios. We provide performance-modeling analysis for each sharing scenario with accurate experimentation validation. With the modeling basis, we further conduct experimental studies to explore potential GPU sharing efficiency improvements from multiple perspectives. Both further theoretical and experimental GPU sharing performance analysis and results are presented. Our results not only demonstrate the significant performance gain for SPMD programs with the proposed efficient GPU sharing, but also the further improved sharing efficiency with the optimization techniques based on our accurate modeling.

  10. Large scale neural circuit mapping data analysis accelerated with the graphical processing unit (GPU)

    Science.gov (United States)

    Shi, Yulin; Veidenbaum, Alexander V.; Nicolau, Alex; Xu, Xiangmin

    2014-01-01

    Background: Modern neuroscience research demands computing power. Neural circuit mapping studies such as those using laser scanning photostimulation (LSPS) produce large amounts of data and require intensive computation for post-hoc processing and analysis. New method: Here we report on the design and implementation of a cost-effective desktop computer system for accelerated experimental data processing with recent GPU computing technology. A new version of Matlab software with GPU-enabled functions is used to develop programs that run on Nvidia GPUs to harness their parallel computing power. Results: We evaluated both the central processing unit (CPU) and GPU-enabled computational performance of our system in benchmark testing and practical applications. The experimental results show that the GPU-CPU co-processing of simulated data and actual LSPS experimental data clearly outperformed the multi-core CPU with up to a 22x speedup, depending on computational tasks. Further, we present a comparison of numerical accuracy between GPU and CPU computation to verify the precision of GPU computation. In addition, we show how GPUs can be effectively adapted to improve the performance of commercial image processing software such as Adobe Photoshop. Comparison with existing method(s): To our best knowledge, this is the first demonstration of GPU application in neural circuit mapping and electrophysiology-based data processing. Conclusions: Together, GPU-enabled computation enhances our ability to process large-scale data sets derived from neural circuit mapping studies, allowing for increased processing speeds while retaining data precision. PMID:25277633

  11. Performance and scalability of Fourier domain optical coherence tomography acceleration using graphics processing units.

    Science.gov (United States)

    Li, Jian; Bloch, Pavel; Xu, Jing; Sarunic, Marinko V; Shannon, Lesley

    2011-05-01

    Fourier domain optical coherence tomography (FD-OCT) provides faster line rates, better resolution, and higher sensitivity for noninvasive, in vivo biomedical imaging compared to traditional time domain OCT (TD-OCT). However, because the signal processing for FD-OCT is computationally intensive, real-time FD-OCT applications demand powerful computing platforms to deliver acceptable performance. Graphics processing units (GPUs) have been used as coprocessors to accelerate FD-OCT by leveraging their relatively simple programming model to exploit thread-level parallelism. Unfortunately, GPUs do not "share" memory with their host processors, requiring additional data transfers between the GPU and CPU. In this paper, we implement a complete FD-OCT accelerator on a consumer grade GPU/CPU platform. Our data acquisition system uses spectrometer-based detection and a dual-arm interferometer topology with numerical dispersion compensation for retinal imaging. We demonstrate that the maximum line rate is dictated by the memory transfer time and not the processing time due to the GPU platform's memory model. Finally, we discuss how the performance trends of GPU-based accelerators compare to the expected future requirements of FD-OCT data rates.
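    The core FD-OCT processing step this abstract refers to, turning a spectral interferogram into a depth profile, can be conveyed in a few lines. The sketch below uses hypothetical parameters and a single simulated reflector; it is an illustration of the principle, not the authors' GPU pipeline. A reflector at a given optical depth modulates the spectrum with a fringe, and an inverse FFT of the background-subtracted spectrum recovers a peak at the corresponding depth bin.

    ```python
    import numpy as np

    n_samples = 1024
    k = np.arange(n_samples)          # linearized wavenumber axis (arbitrary units)
    depth_bin = 100                   # hypothetical reflector depth, in FFT bins

    # Simulated spectrometer output: DC background plus one interference fringe.
    spectrum = 1.0 + 0.5 * np.cos(2 * np.pi * depth_bin * k / n_samples)

    # A-scan: remove the DC term, then inverse-FFT to the depth domain.
    a_scan = np.abs(np.fft.ifft(spectrum - spectrum.mean()))
    print(int(np.argmax(a_scan[:n_samples // 2])))  # 100
    ```

    In a real system this per-A-scan FFT is repeated thousands of times per second, which is why the paper's bottleneck analysis centres on GPU memory transfers rather than the arithmetic itself.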

  12. Multidimensional upwind hydrodynamics on unstructured meshes using graphics processing units - I. Two-dimensional uniform meshes

    Science.gov (United States)

    Paardekooper, S.-J.

    2017-08-01

    We present a new method for numerical hydrodynamics which uses a multidimensional generalization of the Roe solver and operates on an unstructured triangular mesh. The main advantage over traditional methods based on Riemann solvers, which commonly use one-dimensional flux estimates as building blocks for a multidimensional integration, is its inherently multidimensional nature, and as a consequence its ability to recognize multidimensional stationary states that are not hydrostatic. A second novelty is the focus on graphics processing units (GPUs). By tailoring the algorithms specifically to GPUs, we are able to get speedups of 100-250 compared to a desktop machine. We compare the multidimensional upwind scheme to a traditional, dimensionally split implementation of the Roe solver on several test problems, and we find that the new method significantly outperforms the Roe solver in almost all cases. This comes with increased computational costs per time-step, which makes the new method approximately a factor of 2 slower than a dimensionally split scheme acting on a structured grid.

  13. A Block-Asynchronous Relaxation Method for Graphics Processing Units

    Energy Technology Data Exchange (ETDEWEB)

    Anzt, Hartwig [Karlsruhe Inst. of Technology (KIT) (Germany); Tomov, Stanimire [Univ. of Tennessee, Knoxville, TN (United States); Dongarra, Jack [Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Univ. of Manchester (United Kingdom); Heuveline, Vincent [Karlsruhe Inst. of Technology (KIT) (Germany)

    2011-11-30

    In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). For this purpose, we developed a set of asynchronous iteration algorithms in CUDA and compared them with a parallel implementation of synchronous relaxation methods on CPU-based systems. For a set of test matrices taken from the University of Florida Matrix Collection we monitor the convergence behavior, the average iteration time and the total time-to-solution. Analyzing the results, we observe that even our most basic asynchronous relaxation scheme, despite its expectedly lower convergence rate compared to Gauss-Seidel relaxation, is still able, running on GPUs, to provide solution approximations of a certain accuracy in considerably shorter time than Gauss-Seidel running on CPUs. Hence, it overcompensates for the slower convergence by exploiting the scalability and the good fit of asynchronous schemes to the highly parallel GPU architecture. Further, by enhancing the most basic asynchronous approach with hybrid schemes (using multiple iterations within the "subdomain" handled by a GPU thread block and Jacobi-like asynchronous updates across the "boundaries", subject to tuning various parameters), we manage not only to recover the loss of global convergence but often to accelerate convergence by up to two times (compared to the effective but difficult-to-parallelize Gauss-Seidel type of schemes), while keeping the execution time of a global iteration practically the same. This shows the high potential of asynchronous methods not only as stand-alone numerical solvers for linear systems of equations fulfilling certain convergence conditions but, more importantly, as smoothers in multigrid methods. Due to the explosion of parallelism in today's architecture designs, the significance of and need for asynchronous methods such as the ones described in this work are expected to grow.
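    The trade-off this abstract describes, Jacobi-style updates that parallelize freely versus Gauss-Seidel updates that converge faster but serialize each sweep, can be illustrated on the CPU. The following is a minimal NumPy sketch on a small diagonally dominant system, not the authors' CUDA implementation:

    ```python
    import numpy as np

    def jacobi(A, b, iters=200):
        # Jacobi: every update uses only values from the previous sweep,
        # so all components can be computed independently (GPU-friendly).
        D = np.diag(A)
        R = A - np.diagflat(D)
        x = np.zeros_like(b)
        for _ in range(iters):
            x = (b - R @ x) / D
        return x

    def gauss_seidel(A, b, iters=200):
        # Gauss-Seidel: each update immediately uses the newest values,
        # typically converging in fewer sweeps but serializing each one.
        n = len(b)
        x = np.zeros_like(b)
        for _ in range(iters):
            for i in range(n):
                x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        return x

    # A small diagonally dominant system, for which both methods converge.
    A = np.array([[ 4.0, -1.0,  0.0],
                  [-1.0,  4.0, -1.0],
                  [ 0.0, -1.0,  4.0]])
    b = np.array([2.0, 4.0, 10.0])
    x_exact = np.linalg.solve(A, b)
    print(np.allclose(jacobi(A, b), x_exact))        # True
    print(np.allclose(gauss_seidel(A, b), x_exact))  # True
    ```

    The asynchronous schemes in the paper go further by letting GPU thread blocks update at their own pace, trading some convergence rate for the removal of global synchronization.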

  14. Strategic Genco offers in electric energy markets cleared by merit order

    Science.gov (United States)

    Hasan, Ebrahim A. Rahman

    In an electricity market cleared by merit-order economic dispatch we identify necessary and sufficient conditions under which the market outcomes supported by pure strategy Nash equilibria (NE) exist when generating companies (Gencos) game through continuously variable incremental cost (IC) block offers. A Genco may own any number of units, each unit having multiple blocks with each block being offered at a constant IC. Next, a mixed-integer linear programming (MILP) scheme devoid of approximations or iterations is developed to identify all possible NE. The MILP scheme is systematic and general but computationally demanding for large systems. Thus, an alternative significantly faster lambda-iterative approach that does not require the use of MILP was also developed. Once all NE are found, one critical question is to identify the one whose corresponding gaming strategy may be considered by all Gencos as being the most rational. To answer this, this thesis proposes the use of a measure based on the potential profit gain and loss by each Genco for each NE. The most rational offer strategy for each Genco in terms of gaming or not gaming that best meets their risk/benefit expectations is the one corresponding to the NE with the largest gain to loss ratio. The computation of all NE is tested on several systems of up to ninety generating units, each with four incremental cost blocks. These NE are then used to examine how market power is influenced by market parameters, specifically, the number of competing Gencos, their size and true ICs, as well as the level of demand and price cap.
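    The merit-order clearing mechanism underlying this dispatch problem is simple to sketch: stack all offered blocks in order of increasing incremental cost and fill demand, with the marginal block setting the uniform price. The toy example below uses hypothetical offer blocks; the thesis' Nash-equilibrium computation is, of course, far more involved.

    ```python
    # Each offer block: (genco, quantity_MW, incremental_cost $/MWh)
    def merit_order_dispatch(blocks, demand):
        """Clear the market by stacking blocks in order of increasing IC."""
        dispatched = []
        price = 0.0
        remaining = demand
        for genco, qty, ic in sorted(blocks, key=lambda blk: blk[2]):
            if remaining <= 0:
                break
            take = min(qty, remaining)
            dispatched.append((genco, take, ic))
            price = ic          # the marginal block sets the uniform price
            remaining -= take
        return dispatched, price

    blocks = [("G1", 50, 10.0), ("G1", 30, 14.0),
              ("G2", 40, 12.0), ("G2", 40, 18.0)]
    schedule, price = merit_order_dispatch(blocks, demand=100)
    print(schedule)  # [('G1', 50, 10.0), ('G2', 40, 12.0), ('G1', 10, 14.0)]
    print(price)     # 14.0
    ```

    Gaming arises precisely because a Genco can raise the IC of a block it expects to be marginal, lifting the uniform price earned by all of its dispatched blocks.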

  15. 17 CFR 230.802 - Exemption for offerings in connection with an exchange offer or business combination for the...

    Science.gov (United States)

    2010-04-01

    ... chapter) by the first business day after publication or dissemination. If the offeror is a foreign company... journalists for publications with a general circulation in the United States to offshore press...

  16. Loan Products Included in the Offer of Commercial Banks

    Directory of Open Access Journals (Sweden)

    Vasile Dedu

    2009-04-01

    Full Text Available A bank loan is the main form of economic credit. It serves corporate activities (medium and large companies) and retail activities (small companies and individuals). The conditions for credit mainly depend on the quality of the customers, that is, their ability to run a profitable activity and to pay back the credits. For reasons mainly connected to marketing, banking practice has developed a large range of credit names, trying to emphasize certain features of the products or to exploit some competitive advantages in relation to competitors' products. We attempt to organize the offer of bank loans into a typology that takes into account the law, banking-sector rules and the main technical features of the offered products.

  17. Assessment 100% Supported by ICT: Possibilities Offered and Risks

    Directory of Open Access Journals (Sweden)

    Plinio del Carmen Teherán Sermeño

    2010-09-01

    Full Text Available After the launch of a Fundamentals of Physics course offered for different campuses of the National University of Colombia in blended-learning mode, various experiences were collected with virtual assessment supported 100% by ICT. We implemented an evaluation system consisting of different categories: traditional examinations, partial examinations, final examinations, workshops, quizzes, attendance, duties and forums, all evaluated online. In particular, examinations used random blocks of questions drawn from large databases built especially for this purpose, on the order of 100 questions per chapter. We comment on the results achieved with the implementation of the proposed evaluation system, in addition to the possibilities it offers and the risks it presents.

  18. CORPORATE MOTIVES FOR PUBLIC SHARES OFFERING DURING THE FINANCIAL CRISIS

    Directory of Open Access Journals (Sweden)

    Mihaela Grubisic Seba

    2015-06-01

    Full Text Available Despite greater constraints on obtaining bank loans, public share offerings have ceased in the SEE region since the onset of the financial crisis in 2008. With scarce IPOs and SEOs as well as debt offerings, the Croatian capital market stands as a prime example of the application of a mandatory share-listing rule. Surveys of CFOs on going-public versus staying-private decisions are rare even in developed countries and are mostly conducted during hot IPO markets. In this paper the motives for share issuance are compared between publicly and privately held companies during the financial crisis. The research results showed that companies would not issue shares to the public to raise funds for their investments and growth.

  19. Unique Offerings of the ISS as an Earth Observing Platform

    Science.gov (United States)

    Cooley, Victor M.

    2013-01-01

    The International Space Station offers unique capabilities for Earth remote sensing. An established Earth-orbiting platform with abundant power, data and commanding infrastructure, the ISS has been in operation for twelve years as a crew-occupied science laboratory and offers low-cost and expedited concept-to-operation paths for new sensing technologies. Plug-in modularity on external platforms equipped with structural, power and data interfaces standardizes and streamlines integration and minimizes risk and start-up difficulties. Data dissemination is also standardized. Emerging sensor technologies and instruments tailored for sensing of regional dynamics may not be worthy of dedicated platforms and launch vehicles, but may well be worthy of ISS deployment, hitching a ride on one of a variety of government or commercial visiting vehicles. As global acceptance of the urgent need to understand climate change continues to grow, the value of the ISS, orbiting in low Earth orbit, in complementing airborne, sun-synchronous polar, geosynchronous and other platform remote sensing will also grow.

  20. Offer of outgoing volunteer tourism in the Czech Republic

    OpenAIRE

    Bryndová, Karolína

    2015-01-01

    The bachelor thesis describes outgoing volunteer tourism in the Czech Republic through volunteer organizations that offer projects mainly for young people as a meaningful leisure activity, but also organizations that are involved in rescue operations during various natural disasters and other emergencies. It then also describes types of projects as well as positive and negative impacts, benefits, and problems of international volunteering. The final survey identifies a profile of participants of these p...

  1. Offer and Acceptance under the Russian Civil Code

    Directory of Open Access Journals (Sweden)

    Valery Musin

    2013-01-01

    Full Text Available The article deals with the procedure for entering into a contract under Russian civil law in both the domestic and foreign markets. An offer and an acceptance are considered in the light of relevant provisions of the Russian Civil Codes of 1922, 1964 and the one currently in effect, as compared with the rules of the UN Convention on Contracts for the International Sale of Goods 1980 and the UNIDROIT Principles of International Commercial Contracts 2010.

  2. Offer and Acceptance under the Russian Civil Code

    OpenAIRE

    Valery Musin

    2013-01-01

    The article deals with the procedure for entering into a contract under Russian civil law in both the domestic and foreign markets. An offer and an acceptance are considered in the light of relevant provisions of the Russian Civil Codes of 1922, 1964 and the one currently in effect, as compared with the rules of the UN Convention on Contracts for the International Sale of Goods 1980 and the UNIDROIT Principles of International Commercial Contracts 2010.

  3. Virginia Tech to offer intensive English language refresher courses

    OpenAIRE

    2003-01-01

    Speaking and writing professionally with confidence are difficult for virtually everyone, but for those using a second language, these tasks pose a unique challenge. Virginia Tech's English Language Institute will address these needs by offering two intensive English language skills refresher classes during the semester break, from Jan. 5 to Jan. 16, 2004. Designed for graduate students, post-doctoral fellows, visiting scholars, and members of the international research community, this two-we...

  4. Influence of the support offered to breastfeeding by maternity hospitals

    OpenAIRE

    Adriana Passanha; Maria Helena D'Aquino Benício; Sônia Isoyama Venâncio; Márcia Cristina Guerreiro dos Reis

    2015-01-01

    ABSTRACT OBJECTIVE To evaluate whether the support offered by maternity hospitals is associated with higher prevalences of exclusive and predominant breastfeeding. METHODS This is a cross-sectional study including a representative sample of 916 infants less than six months who were born in maternity hospitals, in Ribeirao Preto, Sao Paulo, Southeastern Brazil, 2011. The maternity hospitals were evaluated in relation to their fulfillment of the Ten Steps to Successful Breastfeeding. Data were ...

  5. Influence of the support offered to breastfeeding by maternity hospitals.

    Science.gov (United States)

    Passanha, Adriana; Benício, Maria Helena D'Aquino; Venâncio, Sônia Isoyama; Reis, Márcia Cristina Guerreiro dos

    2015-01-01

    To evaluate whether the support offered by maternity hospitals is associated with higher prevalences of exclusive and predominant breastfeeding. This is a cross-sectional study including a representative sample of 916 infants less than six months who were born in maternity hospitals, in Ribeirao Preto, Sao Paulo, Southeastern Brazil, 2011. The maternity hospitals were evaluated in relation to their fulfillment of the Ten Steps to Successful Breastfeeding. Data were collected regarding breastfeeding patterns, the birth hospital and other characteristics. The individualized effect of the study factor on exclusive and predominant breastfeeding was analyzed using Poisson multiple regression with robust variance. Predominant breastfeeding tended to be more prevalent when the number of fulfilled steps was higher (p of linear trend = 0.057). The step related to not offering artificial teats or pacifiers to breastfed infants and that related to encouraging the establishment of breastfeeding support groups were associated, respectively, to a higher prevalence of exclusive (PR = 1.26; 95%CI 1.04;1.54) and predominant breastfeeding (PR = 1.55; 95%CI 1.01;2.39), after an adjustment was performed for confounding variables. We observed a positive association between support offered by maternity hospitals and prevalences of exclusive and predominant breastfeeding. These results can be useful to other locations with similar characteristics (cities with hospitals that fulfill the Ten Steps to Successful Breastfeeding) to provide incentive to breastfeeding, by means of promoting, protecting and supporting breastfeeding in maternity hospitals.

  6. Influence of the support offered to breastfeeding by maternity hospitals

    Directory of Open Access Journals (Sweden)

    Adriana Passanha

    2015-01-01

    Full Text Available ABSTRACT OBJECTIVE To evaluate whether the support offered by maternity hospitals is associated with higher prevalences of exclusive and predominant breastfeeding. METHODS This is a cross-sectional study including a representative sample of 916 infants less than six months who were born in maternity hospitals, in Ribeirao Preto, Sao Paulo, Southeastern Brazil, 2011. The maternity hospitals were evaluated in relation to their fulfillment of the Ten Steps to Successful Breastfeeding. Data were collected regarding breastfeeding patterns, the birth hospital and other characteristics. The individualized effect of the study factor on exclusive and predominant breastfeeding was analyzed using Poisson multiple regression with robust variance. RESULTS Predominant breastfeeding tended to be more prevalent when the number of fulfilled steps was higher (p of linear trend = 0.057). The step related to not offering artificial teats or pacifiers to breastfed infants and that related to encouraging the establishment of breastfeeding support groups were associated, respectively, to a higher prevalence of exclusive (PR = 1.26; 95%CI 1.04;1.54) and predominant breastfeeding (PR = 1.55; 95%CI 1.01;2.39), after an adjustment was performed for confounding variables. CONCLUSIONS We observed a positive association between support offered by maternity hospitals and prevalences of exclusive and predominant breastfeeding. These results can be useful to other locations with similar characteristics (cities with hospitals that fulfill the Ten Steps to Successful Breastfeeding) to provide incentive to breastfeeding, by means of promoting, protecting and supporting breastfeeding in maternity hospitals.

  7. Mesh-particle interpolations on graphics processing units and multicore central processing units.

    Science.gov (United States)

    Rossinelli, Diego; Conti, Christian; Koumoutsakos, Petros

    2011-06-13

    Particle-mesh interpolations are fundamental operations for particle-in-cell codes, as implemented in vortex methods, plasma dynamics and electrostatics simulations. In these simulations, the mesh is used to solve the field equations and the gradients of the fields are used in order to advance the particles. The time integration of particle trajectories is performed through an extensive resampling of the flow field at the particle locations. The computational performance of this resampling turns out to be limited by the memory bandwidth of the underlying computer architecture. We investigate how mesh-particle interpolation can be efficiently performed on graphics processing units (GPUs) and multicore central processing units (CPUs), and we present two implementation techniques. The single-precision results for the multicore CPU implementation show an acceleration of 45-70×, depending on system size, and an acceleration of 85-155× for the GPU implementation over an efficient single-threaded C++ implementation. In double precision, we observe a performance improvement of 30-40× for the multicore CPU implementation and 20-45× for the GPU implementation. With respect to the 16-threaded standard C++ implementation, the present CPU technique leads to a performance increase of roughly 2.8-3.7× in single precision and 1.7-2.4× in double precision, whereas the GPU technique leads to an improvement of 9× in single precision and 2.2-2.8× in double precision.
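    The linear particle-to-mesh deposit at the heart of such interpolations can be sketched in one dimension. The following is a minimal cloud-in-cell (CIC) kernel with periodic boundaries, written for clarity rather than bandwidth; the paper's optimized GPU/CPU kernels are far more elaborate.

    ```python
    import numpy as np

    def deposit_cic(positions, weights, n_cells, length):
        """Deposit particle weights onto a 1-D mesh with a linear (CIC) kernel."""
        h = length / n_cells
        grid = np.zeros(n_cells)
        for x, w in zip(positions, weights):
            s = x / h - 0.5                      # position in cell-centre coordinates
            i = int(np.floor(s))
            f = s - i                            # fractional distance to the next centre
            grid[i % n_cells] += w * (1.0 - f)   # split the weight between the two
            grid[(i + 1) % n_cells] += w * f     # nearest cell centres (periodic wrap)
        return grid

    # A unit-weight particle sitting exactly on the centre of cell 5.
    grid = deposit_cic(np.array([0.55]), np.array([1.0]), n_cells=10, length=1.0)
    print(round(grid[5], 2))  # 1.0
    ```

    The kernel conserves total weight by construction (the two fractions sum to one), and the scattered, weight-splitting memory accesses are exactly what makes this operation bandwidth-bound on both CPUs and GPUs.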

  8. Offering model for a virtual power plant based on stochastic programming

    DEFF Research Database (Denmark)

    PandŽić, Hrvoje; Morales González, Juan Miguel; Conejo, Antonio J.

    2013-01-01

    A virtual power plant aggregates various local production/consumption units that act in the market as a single entity. This paper considers a virtual power plant consisting of an intermittent source, a storage facility, and a dispatchable power plant. The virtual power plant sells and purchases electricity in both the day-ahead and the balancing markets, seeking to maximize its expected profit. The offering problem is cast as a two-stage stochastic mixed-integer linear programming model which maximizes the virtual power plant's expected profit. Such a model is mathematically rigorous, yet computationally efficient.
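    The flavour of a two-stage stochastic offering problem can be conveyed with a deliberately tiny example. All prices, probabilities and scenarios below are hypothetical, and the model is solved by enumeration rather than mixed-integer programming; the paper's model adds storage, a dispatchable unit and full market detail.

    ```python
    # First stage: choose the day-ahead offer q before the uncertain output is
    # known. Second stage: any surplus is sold cheap and any shortfall must be
    # bought back dear in the balancing market.
    scenarios = [(0.6, 80.0), (0.4, 40.0)]   # (probability, actual output MWh)
    DA_PRICE, SURPLUS_PRICE, SHORTFALL_PRICE = 50.0, 30.0, 70.0

    def expected_profit(q):
        total = 0.0
        for prob, actual in scenarios:
            surplus = max(actual - q, 0.0)
            shortfall = max(q - actual, 0.0)
            total += prob * (DA_PRICE * q + SURPLUS_PRICE * surplus
                             - SHORTFALL_PRICE * shortfall)
        return total

    # Enumerate candidate offers; the optimum hedges against the low scenario.
    best_q = max(range(101), key=expected_profit)
    print(best_q, expected_profit(best_q))  # 80 2880.0
    ```

    Even this toy case shows the defining feature of such models: the optimal first-stage offer is shaped by the asymmetry of the balancing prices across scenarios, not by any single scenario alone.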

  9. Price-Taker Offering Strategy in Electricity Pay-as-Bid Markets

    DEFF Research Database (Denmark)

    Mazzi, Nicolò; Kazempour, Jalal; Pinson, Pierre

    2017-01-01

    ... operating region of such units can be modeled using a mixed-integer linear programming approach, and the trading problem as a linear programming problem. However, the existing models mostly assume a uniform pricing scheme in all market stages, while several European balancing markets (e.g., in Germany and Italy) are settled under a pay-as-bid pricing scheme. The existing tools for solving the trading problem in pay-as-bid electricity markets rely on non-linear optimization models, which, combined with the unit commitment constraints, result in a mixed-integer non-linear programming problem. In contrast, we provide a linear formulation for that trading problem. Then, we extend the proposed approach by formulating a two-stage stochastic problem for optimal offering in a two-settlement electricity market with a pay-as-bid pricing scheme at the balancing stage. The resulting model is mixed...
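    The difference between the two settlement schemes this abstract contrasts is easy to state numerically: under uniform pricing every accepted block earns the clearing price, while under pay-as-bid each accepted block earns its own offer price. A toy example with hypothetical accepted blocks:

    ```python
    # Accepted blocks: (quantity MWh, offer price $/MWh)
    accepted = [(50, 10.0), (40, 12.0), (10, 14.0)]
    clearing_price = 14.0  # set by the marginal (most expensive accepted) block

    # Uniform pricing: all accepted energy is paid the clearing price.
    uniform = sum(q * clearing_price for q, _ in accepted)
    # Pay-as-bid: each block is paid what it offered.
    pay_as_bid = sum(q * p for q, p in accepted)
    print(uniform, pay_as_bid)  # 1400.0 1120.0
    ```

    This gap is why offering strategies differ so sharply between the two schemes: under pay-as-bid, revenue depends directly on the offer prices themselves, which is what pushes naive formulations of the trading problem into non-linearity.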

  10. Pediatricians Offer Heads-Up for Preventing Soccer Injuries

    Science.gov (United States)

    SATURDAY, Jan. 14, 2017 (HealthDay News) -- As children's soccer has become more popular in the United States, ... running, twisting, shooting and landing, the AAP explained. Children who are injured while playing soccer most often sustain sprains and strains. Bruises are ...

  11. 20 CFR 655.122 - Contents of job offers.

    Science.gov (United States)

    2010-04-01

    ... for use of the public housing units directly to the housing's management. (5) Family housing. When it... the work contract and all extensions thereof are in effect. (iii) Therefore, if, for example, a work... basis, the employer must use the worker's average hourly piece rate earnings or the required hourly...

  12. SERVE and Project ISU Offer New Career Options

    Science.gov (United States)

    Scheibe, Jim; Tolonen, Howard

    1973-01-01

    Described are individualized study units (self-contained teaching and learning materials designed to teach a single concept and structured for individual or independent use) which were prepared as curriculum materials for vocationally oriented educable and trainable mentally retarded, learning disabled, or emotionally disturbed students in…

  13. Airports offer unrealized potential for alternative energy production.

    Science.gov (United States)

    DeVault, Travis L; Belant, Jerrold L; Blackwell, Bradley F; Martin, James A; Schmidt, Jason A; Wes Burger, L; Patterson, James W

    2012-03-01

    Scaling up for alternative energy such as solar, wind, and biofuel raises a number of environmental issues, notably changes in land use and adverse effects on wildlife. Airports offer one of the few land uses where reductions in wildlife abundance and habitat quality are necessary and socially acceptable, due to risk of wildlife collisions with aircraft. There are several uncertainties and limitations to establishing alternative energy production at airports, such as ensuring these facilities do not create wildlife attractants or other hazards. However, with careful planning, locating alternative energy projects at airports could help mitigate many of the challenges currently facing policy makers, developers, and conservationists.

  14. Study: Sun May Offer Cheap Way to Treat Cancer

    Institute of Scientific and Technical Information of China (English)

    Cheyenne; Hopkins; 杨流芳

    2003-01-01

    One of the effective current methods of treating cancer is laser irradiation. Israeli scientists have attempted to use solar energy to generate cancer-treating laser light, which is not only inexpensive but also highly effective. Those involved are keeping a deliberately low profile about the invention; as one put it: "I do not wish to project the impression that we're offering some universally applicable solution." The reason for this is something astute readers will readily grasp.

  15. Virtual computers offer to the modern educational solutions

    Directory of Open Access Journals (Sweden)

    Tadeusz Wilusz

    2010-12-01

    Full Text Available During the first decade of the 21st century, more and more production ICT environments switched to virtualized solutions. This still-ongoing process offers many benefits, but at the same time makes it harder to understand the relations between the physical environment and virtualized systems. The article starts with a short survey of problems related to the effective usage of IT capabilities in modern educational environments, with conclusions on how they can be solved using computer virtualization techniques in a way that may result in a better understanding of the invisible physical processes in virtualized e-business systems.

  16. Fielding The Automated Container Offering System: An interim report

    Energy Technology Data Exchange (ETDEWEB)

    Dixon, B. (EG and G Idaho, Inc., Idaho Falls, ID (USA)); Rochette, D. (Army Artificial Intelligence Center, Washington, DC (USA)); Crandell, J. (Military Traffic Management Command, Falls Church, VA (USA))

    1990-01-01

    The Automated Container Offering System (TACOS) is a cargo booking assistant currently being fielded in the International Traffic Directorate of the Military Traffic Management Command (MTMC). The expert system automates the selection process for type and size of SEAVAN containers, ports, carrier, and ship for containerized military cargo moving from the continental US to Europe. It is designed to perform all processing on simple cases and provide assistance to the human booker on complex cases. MTMC processes requests for approximately 1000 containers per week on these routes. This paper is a case history describing the factors that guided the development of TACOS, illustrating several themes that recur in other (military) logistics expert system projects.

  17. The Effect of Subsidies on the Offer of Sea Transport

    Directory of Open Access Journals (Sweden)

    Drago Pupavac

    2017-01-01

    Full Text Available The main goal of this academic discussion is to study the effect of subsidies on the offer of sea transport. Research results are based on the method of microeconomic analysis. The knowledge obtained through this academic discussion may prove to be of assistance to managers in the area of sea transport in deliberating on more efficient and market-oriented business models. The results of this work reveal that subsidies in sea transport make sense if they contribute to the improvement of the quality of transport or are of help to those for whom they are intended.

  18. Technical Training: WBTechT offers online training

    CERN Multimedia

    Monique Duval

    2005-01-01

    CERN Technical Training 2005 The 2005 CERN Web-Based Technical Training (WBTechT) portal is a computer-skills site offering multimedia learning. Visit http://www.course-source.net/sites/cern/ to self-register, consult the available programmes or request a course via EDH. A self-directed online course costs 50.- CHF for desktop applications and 90.- CHF for technical applications for three months' unlimited access. Visit the WBTechT portal or http://www.cern.ch/TechnicalTraining, and contact Technical.Training@cern.ch or your DTO to find out more information. ENSEIGNEMENT TECHNIQUE TECHNICAL TRAINING Monique Duval 74924 technical.training@cern.ch

  19. Technical Training: WBTechT offers online training

    CERN Document Server

    Monique Duval

    2005-01-01

    CERN Technical Training 2005 The 2005 CERN Web-Based Technical Training portal is a computer-skills site offering multimedia learning. Visit http://www.course-source.net/sites/cern/ to self-register, consult the available programmes or request a course via EDH. A self-directed online course costs 50.- CHF for desktop applications and 90.- CHF for technical applications for three months' unlimited access. Visit the WBTechT portal or http://www.cern.ch/TechnicalTraining, and contact Technical.Training@cern.ch or your DTOs to find out more information. ENSEIGNEMENT TECHNIQUE TECHNICAL TRAINING Monique Duval 74924 technical.training@cern.ch

  20. Offering memorable patient experience through creative, dynamic marketing strategy

    Science.gov (United States)

    Raţiu, M; Purcărea, T

    2008-01-01

    Creative, dynamic strategies are the ones that identify new and better ways of uniquely offering the target customers what they want or need. A business can achieve competitive advantage if it chooses a marketing strategy that sets the business apart from everyone else. Healthcare services companies have to understand that the customer should be placed at the centre of all specific marketing operations. The brand message should reflect the focus on the patient. The healthcare products and services offered must represent exactly the solutions that customers expect. The touchpoints with patients must be well mastered in order to convince them to accept the proposed solutions. Healthcare service providers must be capable of looking beyond a customer's behaviour or product and healthcare service acquisition. This will demand proactive and far-reaching changes, including focusing specifically on customer preference, quality, and technological interfaces; rewiring strategy to find new value from existing and unfamiliar sources; disintegrating and radically reassembling operational processes; and restructuring the organization to accommodate new types of work and skill. PMID:20108466