WorldWideScience

Sample records for unit cpu keyboard

  1. Operator-system interaction language for a nuclear power unit. 4. Keyboards and menu

    International Nuclear Information System (INIS)

    Chachko, A.G.

    1996-01-01

    The article is devoted to a functional-linguistic analysis of the keyboards placed on NPP control panels. The front panels of keyboards are considered, and problems of information management, role distribution between operator and machine, and keyboard convenience for operators are discussed.

  2. Warning: This keyboard will deconstruct--the role of the keyboard in skilled typewriting.

    Science.gov (United States)

    Crump, Matthew J C; Logan, Gordon D

    2010-06-01

    Skilled actions are commonly assumed to be controlled by precise internal schemas or cognitive maps. We challenge these ideas in the context of skilled typing, where prominent theories assume that typing is controlled by a well-learned cognitive map that plans finger movements without feedback. In two experiments, we demonstrate that online physical interaction with the keyboard critically mediates typing skill. Typists performed single-word and paragraph typing tasks on a regular keyboard, a laser-projection keyboard, and two deconstructed keyboards, made by removing successive layers of a regular keyboard. Averaged over the laser and deconstructed keyboards, response times for the first keystroke increased by 37%, the interval between keystrokes increased by 120%, and error rate increased by 177%, relative to those of the regular keyboard. A schema view predicts no influence of external motor feedback, because actions could be planned internally with high precision. We argue that the expert knowledge mediating action control emerges during online interaction with the physical environment.

  3. Mobile phones and computer keyboards: unlikely reservoirs of multidrug-resistant organisms in the tertiary intensive care unit.

    Science.gov (United States)

    Smibert, O C; Aung, A K; Woolnough, E; Carter, G P; Schultz, M B; Howden, B P; Seemann, T; Spelman, D; McGloughlin, S; Peleg, A Y

    2018-03-02

    Few studies have used molecular epidemiological methods to study transmission links to clinical isolates in intensive care units. Ninety-four multidrug-resistant organisms (MDROs) cultured from routine specimens from intensive care unit (ICU) patients over 13 weeks were stored (11 meticillin-resistant Staphylococcus aureus (MRSA), two vancomycin-resistant enterococci and 81 Gram-negative bacteria). Medical staff personal mobile phones, departmental phones, and ICU keyboards were swabbed and cultured for MDROs; MRSA was isolated from two phones. Environmental and patient isolates of the same genus were selected for whole genome sequencing. On whole genome sequencing, the mobile phone isolates had a pairwise single nucleotide polymorphism (SNP) distance of 183. However, >15,000 core genome SNPs separated the mobile phone and clinical isolates. In a low-endemic setting, mobile phones and keyboards appear unlikely to contribute to hospital-acquired MDROs. Copyright © 2018 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.

  4. Keyboards: from Typewriters to Tablet Computers

    Directory of Open Access Journals (Sweden)

    Gintautas Grigas

    2014-06-01

    The evolution of Lithuanian keyboards is reviewed. Keyboards are divided into three categories according to the flexibility of their adaptation for typing Lithuanian texts: (1) mechanical typewriter keyboards (hardly adaptable), (2) electromechanical desktop or laptop computer keyboards, and (3) programmable touch-screen tablet computer keyboards (easily adaptable). How they were adapted for the Lithuanian language is discussed, and solutions in other languages are compared. Both successful and unsuccessful solutions are discussed. The reasons for failures, as well as their negative impact on writing culture and the formation of bad habits in work with computers, are analyzed. Recommendations on how to improve the current situation are presented.

  5. Cumulative keyboard strokes: a possible risk factor for carpal tunnel syndrome

    Directory of Open Access Journals (Sweden)

    Eleftheriou Andreas

    2012-08-01

    Abstract. Background: Contradictory reports have been published regarding the association between Carpal Tunnel Syndrome (CTS) and the use of computer keyboards. Previous studies did not take into account the cumulative exposure to keyboard strokes among computer workers. The aim of the present study was to investigate the association between cumulative keyboard use (keyboard strokes) and CTS. Methods: Employees (461) from a governmental data entry and processing unit agreed to participate (response rate: 84.1%) in a cross-sectional study. A questionnaire was distributed to the participants to obtain information on socio-demographics and risk factors for CTS. The participants were examined for signs and symptoms related to CTS and were asked if they had a previous history of, or surgery for, CTS. The cumulative amount of keyboard strokes per worker per year was calculated by use of the payroll registry. Two case definitions for CTS were used. The first included subjects with a personal history of or surgery for CTS, while the second included subjects belonging to the first case definition plus those participants identified through clinical examination. Results: Multivariate analysis used for both case definitions indicated that employees with high cumulative exposure to keyboard strokes were at increased risk of CTS (case definition A: OR = 2.23, 95% CI = 1.09-4.52; case definition B: OR = 2.41, 95% CI = 1.36-4.25). A dose-response pattern between cumulative exposure to keyboard strokes and CTS was revealed (p < 0.05). Conclusions: The present study indicated a possible association between cumulative exposure to keyboard strokes and development of CTS. Cumulative exposure to keyboard strokes should be taken into account as an exposure indicator in the exposure assessment of computer workers. Further research is needed in order to test the results of the current study and assess causality between cumulative keyboard strokes and CTS.

  6. IBM model M keyboard

    CERN Multimedia

    1985-01-01

    In 1985, the IBM Model M keyboard was created. This timeless classic was a hit. IBM came out with several variants of the Model M. They had the 104-key Space Saver, which is the one most often seen today, along with many international versions of it as well. The second and rarest type is the 122-key Model M, which has 24 extra keys at the very top, dubbed the "programmer's keyboard". IBM manufactured these keyboards until 1991. The Model M features "caps" over the actual keys that can be taken off one at a time for cleaning, or replaced with colored keys or keys for another language; this was a very cost-effective way of shipping keyboards internationally.

  7. KEYBOARD MONITORING BASED UPON THE IMMUNOLOGIC CLONING

    Directory of Open Access Journals (Sweden)

    Yu. A. Bryukhomitsky

    2016-12-01

    A biometric keyboard monitoring system is presented. It is intended for permanent, text-independent control and analysis of the keystroke dynamics of users of automated information systems. A keyboard monitoring method is suggested that combines the idea and advantages of the threaded method of keyboard parameter representation with an immunological approach to its realization, based upon a detector cloning model. The suggested method potentially possesses pinpoint accuracy, a higher convergence rate in solving classification problems, and the ability to learn on exemplars of the "own" class only.
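
    The abstract gives no implementation details, but the detector-cloning idea it names is close to the classic negative-selection scheme from artificial immune systems, which also learns from "own"-class exemplars only. Below is a minimal, illustrative sketch of that scheme applied to keystroke hold times; the feature layout, radius, and data are invented for the example and are not taken from the paper.

```python
import random

# Negative selection over keystroke hold times (illustrative only).
# "Self" = hold-time profiles of the legitimate user; detectors are random
# profiles pruned so that none of them matches any self sample.

def matches(detector, sample, radius=0.05):
    # A detector "matches" a sample if every hold time is within `radius` s.
    return all(abs(d - s) <= radius for d, s in zip(detector, sample))

def train_detectors(self_samples, n_detectors=500, dim=3):
    detectors = []
    while len(detectors) < n_detectors:
        candidate = [random.uniform(0.05, 0.30) for _ in range(dim)]  # hold times (s)
        if not any(matches(candidate, s) for s in self_samples):      # negative selection
            detectors.append(candidate)
    return detectors

def is_anomalous(detectors, sample):
    # Any surviving detector firing means the rhythm does not look like "self".
    return any(matches(d, sample) for d in detectors)

if __name__ == "__main__":
    user = [[0.12, 0.10, 0.14]] * 20                  # enrolled rhythm (toy data)
    dets = train_detectors(user)
    print(is_anomalous(dets, [0.12, 0.10, 0.14]))     # False: looks like self
    print(is_anomalous(dets, [0.25, 0.22, 0.28]))     # True: intruder-like rhythm
```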

  8. Keyboarding, Language Arts, and the Elementary School Child.

    Science.gov (United States)

    Balajthy, Ernest

    1988-01-01

    Discusses benefits of keyboarding instruction for elementary school students, emphasizing the integration of keyboarding with language arts instruction. Traditional typing and computer-assisted instruction are discussed, six software packages for adapting keyboarding instruction to the classroom are reviewed, and suggestions for software selection…

  9. Personality identified self-powering keyboard

    Science.gov (United States)

    Wang, Zhong Lin; Zhu, Guang; Chen, Jun

    2018-02-06

    A keyboard for converting keystrokes into electrical signals is disclosed. The keyboard includes a plurality of keys. At least one of the keys includes two electrodes and a member that generates triboelectric charges upon skin contact. The member is adjacent to one of the electrodes to affect a flow of electrons between the two electrodes when a distance between the member and the skin varies.

  10. Kernel Korner : The Linux keyboard driver

    NARCIS (Netherlands)

    Brouwer, A.E.

    1995-01-01

    Our Kernel Korner series continues with an article describing the Linux keyboard driver. This article is not for "Kernel Hackers" only--in fact, it will be most useful to those who wish to use their own keyboard to its fullest potential, and those who want to write programs to take advantage of the

  11. A keyboard control method for loop measurement

    International Nuclear Information System (INIS)

    Gao, Z.W.

    1994-01-01

    This paper describes a keyboard control mode based on the DEC VAX computer. A method was developed by which VAX keyboard codes can be detected while a program is running. During a loop measurement or multitask operation, a keyboard code can be recognized to stop the current operation or to transfer to another operation while the previous information is retained. Combining this mode, the author successfully used one-key control of a loop measurement to test the Dual Input Memory module used in the rearranged Energy Trigger system for LEP 8-bunch operation.
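
    The record describes polling the keyboard inside a measurement loop so that a single key can stop or redirect the run without losing the data gathered so far. A rough modern analogue of that pattern, written as a POSIX-terminal Python sketch (the original was VAX-specific; on Windows, msvcrt.kbhit would play the role of select):

```python
import select
import sys
import time

# Non-blocking keyboard check inside a measurement loop (POSIX terminals).
# One key ends the loop, another transfers to a different task, and the
# readings collected so far are preserved either way.

def key_pressed():
    # select() with a zero timeout returns immediately; stdin is readable
    # only after a key was hit (terminals are line-buffered: key + Enter).
    ready, _, _ = select.select([sys.stdin], [], [], 0)
    return sys.stdin.readline().strip() if ready else None

def take_reading():
    return 42.0   # stub: stands in for reading the instrument

def other_task(results):
    print(f"switching task, {len(results)} readings kept")

def measurement_loop():
    results = []                          # preserved across interruptions
    while True:
        results.append(take_reading())
        time.sleep(0.1)
        key = key_pressed()
        if key == "q":                    # one key stops the measurement...
            break
        elif key == "t":                  # ...another transfers to a new task
            other_task(results)
    return results

# measurement_loop()  # run interactively; press q then Enter to stop
```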

  12. A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection.

    Directory of Open Access Journals (Sweden)

    Chun-Liang Lee

    The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms.
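
    The essence of HPMA as described is a cheap CPU pre-filter that lets most benign packets through and forwards only suspects to the expensive exact matcher (the GPU stage in the paper). A hedged sketch of that division of labor; the signatures and the 2-byte-prefix filter are invented for illustration and are not the paper's actual pre-filtering algorithm:

```python
# CPU pre-filter + full-match stage, in the spirit of HPMA (illustrative).

SIGNATURES = [b"/etc/passwd", b"cmd.exe", b"SELECT * FROM"]

# Pre-filter table: the set of 2-byte prefixes of all signatures.
PREFIXES = {sig[:2] for sig in SIGNATURES}

def cpu_prefilter(packet: bytes) -> bool:
    """Cheap CPU pass: does any signature prefix occur in the payload?"""
    return any(packet[i:i + 2] in PREFIXES for i in range(len(packet) - 1))

def full_match(packet: bytes):
    """Exact matching stage (performed on the GPU in the paper)."""
    return [sig for sig in SIGNATURES if sig in packet]

def inspect(packets):
    for p in packets:
        if cpu_prefilter(p):          # most benign traffic stops here
            hits = full_match(p)      # only suspects reach the expensive stage
            if hits:
                print(f"ALERT: {hits} in {p[:20]!r}")

inspect([b"GET /index.html", b"GET /etc/passwd HTTP/1.1"])
```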

  13. Length-Bounded Hybrid CPU/GPU Pattern Matching Algorithm for Deep Packet Inspection

    Directory of Open Access Journals (Sweden)

    Yi-Shan Lin

    2017-01-01

    Since frequent communication between applications takes place in high-speed networks, deep packet inspection (DPI) plays an important role in network application awareness. The signature-based network intrusion detection system (NIDS) contains a DPI technique that examines the incoming packet payloads by employing a pattern matching algorithm that dominates the overall inspection performance. Existing studies focused on implementing efficient pattern matching algorithms by parallel programming on software platforms because of the advantages of lower cost and higher scalability. Either the central processing unit (CPU) or the graphic processing unit (GPU) was involved. Our studies focused on designing a pattern matching algorithm based on the cooperation between both CPU and GPU. In this paper, we present an enhanced design for our previous work, a length-bounded hybrid CPU/GPU pattern matching algorithm (LHPMA). In the preliminary experiment, the performance and a comparison with the previous work are presented, and the experimental results show that the LHPMA can achieve not only effective CPU/GPU cooperation but also higher throughput than the previous method.

  14. Handwriting versus Keyboard Writing: Effect on Word Recall

    Directory of Open Access Journals (Sweden)

    Anne Mangen

    2015-10-01

    The objective of this study was to explore effects of writing modality on word recall and recognition. The following three writing modalities were used: handwriting with pen on paper; typewriting on a conventional laptop keyboard; and typewriting on an iPad touch keyboard. Thirty-six females aged 19-54 years participated in a fully counterbalanced within-subjects experimental design. Using a wordlist paradigm, participants were instructed to write down words (one list per writing modality) read out loud to them, in the three writing modalities. Memory for words written using handwriting, a conventional keyboard and a virtual iPad keyboard was assessed using oral free recall and recognition. The data was analyzed using non-parametric statistics. Results show that there was an omnibus effect of writing modality and follow-up analyses showed that, for the free recall measure, participants had significantly better free recall of words written in the handwriting condition, compared to both keyboard writing conditions. There was no effect of writing modality in the recognition condition. This indicates that, with respect to aspects of word recall, there may be certain cognitive benefits to handwriting which may not be fully retained in keyboard writing. Cognitive and educational implications of this finding are discussed.

  15. Heterogeneous Gpu&Cpu Cluster For High Performance Computing In Cryptography

    Directory of Open Access Journals (Sweden)

    Michał Marks

    2012-01-01

    This paper addresses issues associated with distributed computing systems and the application of mixed GPU&CPU technology to data encryption and decryption algorithms. We describe a heterogeneous cluster HGCC formed by two types of nodes: Intel processor with NVIDIA graphics processing unit and AMD processor with AMD graphics processing unit (formerly ATI), and a novel software framework that hides the heterogeneity of our cluster and provides tools for solving complex scientific and engineering problems. Finally, we present the results of numerical experiments. The considered case study is concerned with parallel implementations of selected cryptanalysis algorithms. The main goal of the paper is to show the wide applicability of the GPU&CPU technology to large scale computation and data processing.

  16. Creating a single South African keyboard layout to promote language

    African Journals Online (AJOL)

    Not only are problems such as researching the orthographies, key placement and keyboard input options examined, but strategic objectives such as ensuring its wide adoption and creating a multilingual keyboard for all South African languages are also discussed. The result is a keyboard that furthers multilingualism and ...

  17. A low-cost MRI compatible keyboard

    DEFF Research Database (Denmark)

    Jensen, Martin Snejbjerg; Heggli, Ole Adrian; Alves da Mota, Patricia

    2017-01-01

    , presenting a challenging environment for playing an instrument. Here, we present an MRI-compatible polyphonic keyboard with a materials cost of 850 $, designed and tested for safe use in 3T (three Tesla) MRI-scanners. We describe design considerations, and prior work in the field. In addition, we provide...... recommendations for future designs and comment on the possibility of using the keyboard in magnetoencephalography (MEG) systems. Preliminary results indicate a comfortable playing experience with no disturbance of the imaging process....

  18. The Heritage of the Future: Historical Keyboards, Technology, and Modernism

    OpenAIRE

    Ng, Tiffany Kwan

    2015-01-01

    This dissertation examines modernist twentieth-century applications of the pipe organ and the carillon in the United States and in the Netherlands. These keyboard instruments, historically owned by religious or governmental entities, served an exceptionally diverse variety of political, technological, social, and urban planning functions. Their powerful simultaneous associations with historicism and innovation enabled those who built and played them to anchor the instruments’ novel uses in th...

  19. GeantV: from CPU to accelerators

    Science.gov (United States)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Arora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Sehgal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While modern CPU architectures are being targeted first, resources such as GPGPU, Intel® Xeon Phi, Atom or ARM cannot be ignored anymore by HEP CPU-bound applications. The proof-of-concept GeantV prototype has been mainly engineered for CPUs having vector units, but we have foreseen from early stages a bridge to arbitrary accelerators. A software layer consisting of architecture/technology-specific backends currently supports this concept. This approach allows us to abstract out the basic types such as scalar/vector, but also to formalize generic computation kernels transparently using library- or device-specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, as a bonus it comes with the insulation of the core application and algorithms from the technology layer. This allows our application to be long-term maintainable and versatile to changes at the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel® Xeon Phi KNC architecture. We present the scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We also describe the current work and preliminary results for using the GeantV transport kernel on GPUs.
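
    The backend layer described here, one generic kernel written against an abstract scalar/vector type with the concrete type supplied by Vc, CUDA, or plain scalars, can be illustrated with a small sketch. GeantV itself is C++ templates; the following Python stand-in (NumPy playing the vector backend) only mirrors the shape of the idea:

```python
import numpy as np

# One generic kernel, two interchangeable "backends": a scalar backend and a
# vector (NumPy) backend standing in for Vc/CUDA. The kernel never names the
# concrete type, mirroring the backend layer described in the abstract.

class ScalarBackend:
    @staticmethod
    def sqrt(x):
        return x ** 0.5

class VectorBackend:
    @staticmethod
    def sqrt(x):
        return np.sqrt(x)

def momentum_magnitude(backend, px, py, pz):
    # Generic kernel: |p| computed with whatever arithmetic the backend supplies.
    return backend.sqrt(px * px + py * py + pz * pz)

print(momentum_magnitude(ScalarBackend, 1.0, 2.0, 2.0))    # one track: 3.0
print(momentum_magnitude(VectorBackend,                    # one call, many tracks
      np.array([1.0, 0.0]), np.array([2.0, 3.0]), np.array([2.0, 4.0])))  # [3. 5.]
```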

  20. GeantV: from CPU to accelerators

    International Nuclear Information System (INIS)

    Amadio, G; Bianchini, C; Iope, R; Ananya, A; Arora, A; Apostolakis, J; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S; Lima, G; Duhem, L

    2016-01-01

    The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While modern CPU architectures are being targeted first, resources such as GPGPU, Intel® Xeon Phi, Atom or ARM cannot be ignored anymore by HEP CPU-bound applications. The proof-of-concept GeantV prototype has been mainly engineered for CPUs having vector units, but we have foreseen from early stages a bridge to arbitrary accelerators. A software layer consisting of architecture/technology-specific backends currently supports this concept. This approach allows us to abstract out the basic types such as scalar/vector, but also to formalize generic computation kernels transparently using library- or device-specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, as a bonus it comes with the insulation of the core application and algorithms from the technology layer. This allows our application to be long-term maintainable and versatile to changes at the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel® Xeon Phi KNC architecture. We present the scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We also describe the current work and preliminary results for using the GeantV transport kernel on GPUs. (paper)

  1. Older Amateur Keyboard Players Learning for Self-Fulfilment

    Science.gov (United States)

    Taylor, Angela

    2011-01-01

    This article investigates self-reported music learning experiences of 21 older amateur pianists and electronic keyboard players. Significant changes in their lives and the encouragement of friends were catalysts for returning to or taking up a keyboard instrument as an adult, although not all returners had positive memories of learning a keyboard…

  2. Keyboard with Universal Communication Protocol Applied to CNC Machine

    Directory of Open Access Journals (Sweden)

    Mejía-Ugalde Mario

    2014-04-01

    This article describes the use of a universal communication protocol for a microcontroller-based industrial keyboard applied to a computer numerically controlled (CNC) machine. The main difference among keyboard manufacturers is that each manufacturer has its own source code, producing a different communication protocol and generating improper interpretations of the established functions. This results in commercial industrial keyboards that are expensive and incompatible in their connection with different machines. In the present work, the protocol allows connecting the designed universal keyboard and the standard PC keyboard at the same time; it is compatible with all computers through USB, AT or PS/2 communications, for use in CNC machines, with extension to other machines such as robots, blow molding machines, injection molding machines and others. The advantages of this design include its easy reprogramming, decreased costs, manipulation of various machine functions and easy expansion of input and output signals. The results obtained from performance tests were satisfactory: each key can be programmed and reprogrammed in different ways, generating codes for different functions depending on the application in which it is to be used.

  3. 76 FR 21847 - Defense Federal Acquisition Regulation Supplement (DFARS), Alternative Line-Item Structure (DFARS...

    Science.gov (United States)

    2011-04-19

    ... with CPU, 20 EA Monitor, Keyboard and Mouse. Alternative line-item structure offer where monitors are... CPU, 20 EA Keyboard and Mouse. 0002 Monitor 20 EA (End of provision)] [FR Doc. 2011-8966 Filed 4-18-11... problems in the receipt and acceptance phase for contract deliverables and payments. This group determined...

  4. The Interwoven Evolution of the Early Keyboard and Baroque Culture

    Directory of Open Access Journals (Sweden)

    Rachel Stevenson

    2016-04-01

    The purpose of this paper is to analyze the impact that Baroque society had on the development of the early keyboard. While the main timeframe is the Baroque, a few references are made to the late Medieval period in determining why the keyboard emerged more prominently in the musical scene. As Baroque society developed and new genres were formed, different keyboard instruments served vital roles unique to their construction. These new roles also affected the way music was written for the keyboard. Advantages and disadvantages of each instrument are discussed, providing an analysis of what would have been either accepted or rejected by Baroque culture. While music is the main focus, other fine arts are mentioned, including architecture, poetry, politics, and others. My research includes primary and secondary resources retrieved from databases provided by Cedarville University. By demonstrating the relationship between Baroque society and early keyboard development, roles and music, this paper will be a helpful source in furthering the pianist's understanding of the instrument he or she plays. It also serves pedagogical purposes in its analysis of context, helping a student interpret a piece written during this time period for these early keyboard instruments.

  5. A Study of Exploring the Potentiality of Keyboards into Preschool Music Education

    OpenAIRE

    深見, 友紀子; FUKAMI, Yukiko; 冨田, 芳正; TOMITA, Yoshimasa; 横山, 七佳; YOKOYAMA, Nanaka

    2006-01-01

    Nowadays, electronic keyboards are widely used, either as a substitute for the piano or as "a toy with a keyboard". The aim of this study is to explore the potential of adopting these keyboards into music education in kindergartens and nursery schools, and of utilizing them in rhythmic activities and musical play, since they can be handled by ordinary child-care workers. In the study, we focused on four subjects: "rhythm", "tone", "optical navigation", and "electronic keyboards+α". "Rhyt...

  6. Perception-Based Tactile Soft Keyboard for the Touchscreen of Tablets

    Directory of Open Access Journals (Sweden)

    Kwangtaek Kim

    2018-01-01

    Most mobile devices equipped with touchscreens provide an on-screen soft keyboard as an input method. However, many users experience discomfort due to the lack of physical feedback, which causes slow and error-prone typing compared to a physical keyboard. To solve the problem, a platform-independent haptic soft keyboard suitable for tablet-sized touchscreens was proposed and developed. The platform-independent haptic soft keyboard was verified on both Android and Windows. In addition, a psychophysical experiment was conducted to find an optimal strength of key-click feedback on touchscreens, and the perception result was applied to produce uniform tactile forces on touchscreens. The developed haptic soft keyboard can be easily integrated with existing tablets with minimal effort. The evaluation results confirm platform independency, fast tactile key-click feedback, and uniform tactile force distribution on the touchscreen using only two piezoelectric actuators. The proposed system was developed on a commercial tablet (Mu Pad) that has dual platforms (Android and Windows).

  7. MIDI Keyboards: Memory Skills and Building Values toward School.

    Science.gov (United States)

    Marcinkiewicz, Henryk R.; And Others

    This document summarizes the results of a study which evaluated whether school instruction with Musical Instrument Digital Interface (MIDI) keyboards improves memory skill and whether school instruction with MIDI keyboards improves sentiments toward school and instructional media. Pupils in early elementary grades at five schools were evaluated…

  8. A data-driven design evaluation tool for handheld device soft keyboards.

    Directory of Open Access Journals (Sweden)

    Matthieu B Trudeau

    Thumb interaction is a primary technique used to operate small handheld devices such as smartphones. Despite the different techniques involved in operating a handheld device compared to a personal computer, the keyboard layouts for both devices are similar. A handheld device keyboard that considers the physical capabilities of the thumb may improve user experience. We developed and applied a design evaluation tool for different geometries of the QWERTY keyboard using a performance evaluation model. The model utilizes previously collected data on thumb motor performance and posture for different tap locations and thumb movement directions. We calculated a performance index (PITOT; 0 is worst and 2 is best) for 663 designs consisting of different combinations of three variables: the keyboard's radius of curvature R (mm), orientation O (°), and vertical location on the screen L. The current standard keyboard performed poorly (PITOT = 0.28) compared to other designs considered. Keyboard location (L) contributed the greatest variability in performance of the three design variables, suggesting that designers should modify this variable first. Performance was greatest for designs in the middle keyboard location. In addition, having a slightly upward curve (R = -20 mm) and being oriented perpendicular to the thumb's long axis (O = -20°) improved performance to PITOT = 1.97. The poorest performances were associated with placement of the keyboard's spacebar in the bottom right corner of the screen (the worst was R = 20 mm, O = 40°, L = Bottom, with PITOT = 0.09). While this evaluation tool can be used in the design process as an ergonomic reference to promote user motor performance, other design variables such as visual access and usability still remain unexplored.
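
    As a hedged illustration of how such a design sweep works, the sketch below enumerates combinations of the three variables and ranks them with a toy scoring function. The scoring function is invented for the example; the paper's PITOT is derived from measured thumb motor-performance data that is not reproduced here.

```python
import itertools

# Enumerate candidate keyboard designs (R, O, L) and rank them by a toy
# performance index. Grids and weights are illustrative assumptions.

RADII = [-20, 0, 20]                    # mm; negative = curved slightly upward
ORIENTATIONS = [-40, -20, 0, 20, 40]    # degrees relative to the thumb axis
LOCATIONS = ["Top", "Middle", "Bottom"]

def performance_index(r, o, loc):
    score = {"Middle": 2.0, "Top": 1.2, "Bottom": 0.6}[loc]  # location dominates
    score -= 0.01 * abs(o - (-20))      # toy assumption: best near O = -20 deg
    score -= 0.01 * abs(r - (-20))      # toy assumption: best near R = -20 mm
    return max(score, 0.0)

designs = list(itertools.product(RADII, ORIENTATIONS, LOCATIONS))
best = max(designs, key=lambda d: performance_index(*d))
worst = min(designs, key=lambda d: performance_index(*d))
print("best:", best, "worst:", worst)   # best lands at (-20, -20, 'Middle')
```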

  9. Comparison of dysgraphia impairments across writing-by-hand and two keyboard modalities

    Directory of Open Access Journals (Sweden)

    Lisa A Edmonds

    2015-05-01

    Computer use is essential for tasks such as e-mail, banking and social networking and is important for communication and independence in persons with aphasia. However, most evaluations of dysgraphia have investigated handwriting exclusively. Buchwald and Rapp (2009) evaluated dysgraphia in handwriting and described dissociated distinctions between orthographic long-term memory (O-LTM) and working memory (WM). Greater word-level errors (e.g., semantic and frequency effects) were indicative of O-LTM impairment. Greater nonword-level errors and length effects indicated a WM-level impairment, where the graphemic buffer was less able to maintain the orthographic representation for correct order and letter production. Cameron, Cubelli, and Della Sala (2002) posit a common orthographic buffer for handwriting and typing, where the orthographic buffer supports a single allographic system with subsystems for handwriting and typing. Thus, a buffer-level impairment should in principle affect writing and typing similarly. However, one additional consideration in keyboard use is the potential impact of divided attention (between keyboard and screen) and visual search. Alternatively, the availability of letters may potentially aid in letter activation and/or selection. To examine these questions, this study compares writing-by-hand (WBH) and typing on QWERTY and ABC keyboards. Lexical or buffer-level impairments should result in similar accuracy across modalities. However, participants with buffer-level impairments may show increased or decreased keyboard performance depending on how keyboard use interacts with their impairments. Potential differential effects across keyboards could also be seen, since the QWERTY keyboard could recruit procedural memory in previously proficient users like those included in this study, while the ABC keyboard could provide a strategy for letter search. Methods: Seven English-speaking participants with chronic aphasia due to stroke

  10. BrailleEasy: One-handed Braille Keyboard for Smartphones.

    Science.gov (United States)

    Šepić, Barbara; Ghanem, Abdurrahman; Vogel, Stephan

    2015-01-01

    The evolution of mobile technology is moving at a very fast pace. Smartphones are currently considered a primary communication platform where people exchange voice calls, text messages and emails. The human-smartphone interaction, however, is generally optimized for sighted people through the use of visual cues on the touchscreen, e.g., typing text by tapping on a visual keyboard. Unfortunately, this interaction scheme renders smartphone technology largely inaccessible to visually impaired people as it results in slow typing and higher error rates. Apple and some third party applications provide solutions specific to blind people which enable them to use Braille on smartphones. These applications usually require both hands for typing. However, Brailling with both hands while holding the phone is not very comfortable. Furthermore, two-handed Brailling is not possible on smartwatches, which will be used more pervasively in the future. Therefore, we developed a platform for one-handed Brailling consisting of a custom keyboard called BrailleEasy to input Arabic or English Braille codes within any application, and a BrailleTutor application for practicing. Our platform currently supports Braille grade 1, and will be extended to support contractions, spelling correction, and more languages. Preliminary analysis of user studies for blind participants showed that after less than two hours of practice, participants were able to type significantly faster with the BrailleEasy keyboard than with the standard QWERTY keyboard.
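
    As an illustration of the decoding side of such a keyboard, the sketch below maps six-dot Braille grade 1 cells (letters a-j only) to characters, assuming a two-stroke one-handed entry scheme (dots 1-3 first, then dots 4-6). The two-stroke scheme is our assumption for the example, not necessarily BrailleEasy's actual input method:

```python
# Decode one-handed, two-stroke Braille input (illustrative assumption).
# Dot numbering: 1-2-3 down the left column of the cell, 4-5-6 down the right.

BRAILLE = {                        # frozenset of raised dots -> letter (a-j only)
    frozenset({1}): "a",           frozenset({1, 2}): "b",
    frozenset({1, 4}): "c",        frozenset({1, 4, 5}): "d",
    frozenset({1, 5}): "e",        frozenset({1, 2, 4}): "f",
    frozenset({1, 2, 4, 5}): "g",  frozenset({1, 2, 5}): "h",
    frozenset({2, 4}): "i",        frozenset({2, 4, 5}): "j",
}

def decode(stroke_left, stroke_right):
    """stroke_left: dots pressed from {1,2,3}; stroke_right: from {4,5,6}."""
    cell = frozenset(stroke_left) | frozenset(stroke_right)
    return BRAILLE.get(cell, "?")

# 'h' = dots 1,2,5: the first stroke presses dots 1 and 2, the second dot 5.
print(decode({1, 2}, {5}))    # -> h
print(decode({1}, {4, 5}))    # -> d
```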

  11. Physical Interactions with Digital Strings - A hybrid approach to a digital keyboard instrument

    DEFF Research Database (Denmark)

    Dahlstedt, Palle

    2017-01-01

    of stopping and muting the strings at arbitrary positions. The parameters of the string model are controlled through TouchKeys multitouch sensors on each key, combined with MIDI data and acoustic signals from the digital keyboard frame, using a novel mapping. The instrument is evaluated from a performing...... of control. The contributions are two-fold. First, the use of acoustic sounds from a physical keyboard for excitations and resonances results in a novel hybrid keyboard instrument in itself. Second, the digital model of "inside piano" playing, using multitouch keyboard data, allows for performance techniques...

  12. Privacy Enhancing Keyboard: Design, Implementation, and Usability Testing

    Directory of Open Access Journals (Sweden)

    Zhen Ling

    2017-01-01

    To protect users from numerous password inference attacks, we invented a novel context-aware privacy enhancing keyboard (PEK) for Android touch-based devices. Usually PEK shows a QWERTY keyboard when users input text such as an email or a message. Nevertheless, whenever users enter a password in an input box on their touch-enabled device, a keyboard is shown to them with the positions of the characters shuffled at random. PEK has been available on Google Play since 2014. However, the number of installations has not lived up to our expectations. For the purpose of usable security and privacy, we designed a two-stage usability test and performed two rounds of iterative usability testing in the summers of 2016 and 2017 with continuous improvements of PEK. The observations from the usability testing are educational: (1) convenience plays a critical role when users select an input method; (2) people think the attacks that PEK prevents are remote from them.
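
    The core mechanism is easy to sketch: keep QWERTY for ordinary text fields, but serve a freshly shuffled layout whenever a password field gains focus. A minimal illustration, not PEK's actual Android implementation:

```python
import random

# Context-aware layout selection: QWERTY for plain text, a randomly shuffled
# layout for password fields, so observed touch positions reveal nothing.

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def layout_for(field_is_password: bool):
    if not field_is_password:
        return QWERTY_ROWS
    keys = list("".join(QWERTY_ROWS))
    random.shuffle(keys)                      # new random positions per focus event
    rows, i = [], 0
    for row in QWERTY_ROWS:                   # keep the 10/9/7 row shape
        rows.append("".join(keys[i:i + len(row)]))
        i += len(row)
    return rows

print(layout_for(False))   # ['qwertyuiop', 'asdfghjkl', 'zxcvbnm']
print(layout_for(True))    # the same 26 keys in a fresh random arrangement
```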

  13. Forget about switching keyboard layouts with the "Compose Key"

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    Growing up with a Spanish keyboard was not an easy childhood (using Shift+7 (/) to search in vim, or having to type AltGr+[ to actually have an opening bracket), so at some point in my life I switched to an American keyboard. At the beginning I was happy switching layouts to either do some coding or talk to my mum (I am not a fan of the classical excuse "sorry for my typos, I don't have the 'ñ' on my keyboard"). Things got much worse when I started to need French characters (ç, è) to interact with some services at CERN, or some Slovak letters (č, đ) to talk to Robert, my Slovak colleague. Then I discovered the Compose Key and my life has been different ever since.
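
    The mechanism behind the talk is a small state machine: the Compose key arms a pending state, the next keystrokes are buffered, and the completed sequence maps to a single character. A toy model with a hand-picked table (the sequences mirror common X11 Compose defaults, but the table is illustrative, not a dump of any real Compose file):

```python
# Toy compose-key state machine: no layout switching, just sequences.

COMPOSE = {
    ("~", "n"): "ñ",
    (",", "c"): "ç",
    ("`", "e"): "è",
    ("<", "c"): "č",
    ("-", "d"): "đ",
}

def type_keys(keys):
    out, pending, composing = [], [], False
    for k in keys:
        if k == "COMPOSE":               # the Compose key arms the state machine
            composing, pending = True, []
        elif composing:
            pending.append(k)
            if len(pending) == 2:        # all sequences here are two keys long
                out.append(COMPOSE.get(tuple(pending), "?"))
                composing = False
        else:
            out.append(k)
    return "".join(out)

print(type_keys(["COMPOSE", "~", "n"]))                             # ñ
print(type_keys(list("ma") + ["COMPOSE", "~", "n"] + list("ana")))  # mañana
```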

  14. Thermoelectric mini cooler coupled with micro thermosiphon for CPU cooling system

    International Nuclear Information System (INIS)

    Liu, Di; Zhao, Fu-Yun; Yang, Hong-Xing; Tang, Guang-Fa

    2015-01-01

    In the present study, a thermoelectric mini cooler coupled with a micro thermosiphon cooling system has been proposed for the purpose of CPU cooling. A mathematical model of heat transfer, based on a one-dimensional treatment of thermal and electric power, is first established for the thermoelectric module. Analytical results demonstrate the relationship between the maximal COP (coefficient of performance) and Qc and the figure of merit. Full-scale experiments have been conducted to investigate the effect of thermoelectric operating voltage, power input of the heat source, and thermoelectric module number on the performance of the cooling system. Experimental results indicated that the cooling production increases as the thermoelectric operating voltage rises. The surface temperature of the CPU heat source increases linearly with increasing power input, and its maximum value reached 70 °C when the prototype CPU power input was equivalent to 84 W. Insulation between the air and the heat source surface can prevent condensation due to low surface temperature. In addition, the thermal performance of this cooling system could be enhanced when the total dimension of the thermoelectric module matched well with the dimension of the CPU. This research could benefit the design of thermal dissipation for electronic chips and CPU units. - Highlights: • A cooling system coupled with thermoelectric module and loop thermosiphon is developed. • Thermoelectric module coupled with loop thermosiphon can achieve high heat-transfer efficiency. • A mathematical model of thermoelectric cooling is built. • An analysis of modeling results for design and experimental data is presented. • The influence of power input and operating voltage on the cooling system is researched.
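
    The record names the COP-figure-of-merit relationship without stating it. For a single-stage module, the textbook relations are COPmax = (Tc/(Th-Tc))·(√(1+Z·Tm) - Th/Tc)/(√(1+Z·Tm) + 1) with Tm = (Th+Tc)/2, and Qc = αITc - I²R/2 - KΔT. A small sketch evaluating the first of these with invented module parameters; the formulas are standard thermoelectric theory, not taken from the paper:

```python
from math import sqrt

# Standard single-stage thermoelectric relations (textbook, not the paper's):
#   cooling power  Qc     = alpha*I*Tc - I**2 * R / 2 - K * dT
#   maximal COP    COPmax = Tc/dT * (sqrt(1+Z*Tm) - Th/Tc) / (sqrt(1+Z*Tm) + 1)

def cop_max(t_c, t_h, z):
    t_m = (t_c + t_h) / 2.0
    m = sqrt(1.0 + z * t_m)
    return t_c / (t_h - t_c) * (m - t_h / t_c) / (m + 1.0)

def cooling_power(alpha, i, t_c, r, k, dt):
    return alpha * i * t_c - 0.5 * i ** 2 * r - k * dt

# Illustrative numbers: Z = 2.5e-3 1/K (Bi2Te3-class), 300 K hot side, 285 K cold.
print(f"COPmax = {cop_max(285.0, 300.0, 2.5e-3):.2f}")   # about 2.2
```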

  15. Studies on hand-held visual communication device for the deaf and speech-impaired 2. Keyboard design.

    Science.gov (United States)

    Thurlow, W R

    1980-01-01

    Experiments with keyboard arrangements of letters show that simple alphabetic letter-key sequences with 4 to 5 letters in a row lead to the most rapid visual search performance. Such arrangements can be used on keyboards operated by the index finger of one hand. Arrangement of letters in words offers a promising alternative, because these arrangements can be readily memorized and can result in small interletter distances on the keyboard for frequently occurring letter sequences. Experiments on the operation of keyboards show that a space or shift key operated by the left hand (which also holds the communication device) results in faster keyboard operation than when space or shift keys on the front of the keyboard (operated by the right hand) are used. Special problems of the deaf-blind are discussed. Keyboard arrangements are investigated, and matching tactual codes are suggested.

  16. USING SPYWARE TO MONITOR KEYBOARD ACTIVITY IN A MICROSOFT WINDOWS NETWORK

    Directory of Open Access Journals (Sweden)

    Mulki Indana Zulfa

    2015-03-01

    Supervision of the use of information technology is increasingly necessary, especially as knowledge about creating viruses, worms, and spyware keeps growing. Installing antivirus software can be a solution to prevent viruses from entering a network or computer system, but antivirus software cannot monitor user activity, for example keyboard activity. A keylogger is software capable of recording all keyboard activity. A keylogger must first be installed on the target (client) computer whose keyboard activity is to be recorded. Then, to retrieve the recorded log file, one must have physical access to that computer, which becomes a problem when many target computers are to be monitored. A control keylogger-spy agent method exploiting spyware technology is the solution to this problem. The spy agent actively records a person's keyboard activity. The resulting log file is stored in its cache, so it raises no user suspicion, and no physical access is needed to retrieve the log file. The control keylogger can contact whichever spy agent's log file is to be retrieved. Successfully retrieved log files are stored safely on the server computer. In tests, all five computers targeted as spy agents were able to deliver their log files to the control keylogger.

  17. A combined PLC and CPU approach to multiprocessor control

    International Nuclear Information System (INIS)

    Harris, J.J.; Broesch, J.D.; Coon, R.M.

    1995-10-01

    A sophisticated multiprocessor control system has been developed for use in the E-Power Supply System Integrated Control (EPSSIC) on the DIII-D tokamak. EPSSIC provides control and interlocks for the ohmic heating coil power supply and its associated systems. Of particular interest is the architecture of this system: both a Programmable Logic Controller (PLC) and a Central Processor Unit (CPU) have been combined on a standard VME bus. The PLC and CPU input and output signals are routed through signal conditioning modules, which provide the necessary voltage and ground isolation. Additionally these modules adapt the signal levels to that of the VME I/O boards. One set of I/O signals is shared between the two processors. The resulting multiprocessor system provides a number of advantages: redundant operation for mission critical situations, flexible communications using conventional TCP/IP protocols, the simplicity of ladder logic programming for the majority of the control code, and an easily maintained and expandable non-proprietary system

  18. Creating a Single South African Keyboard Layout to Promote Language

    Directory of Open Access Journals (Sweden)

    Dwayne Bailey

    2011-10-01

    Abstract: In this case study, a description is given of a keyboard layout designed to address the input needs of South African languages, specifically Venda, a language which would otherwise be impossible to type on a computer. In creating this keyboard, the designer, Translate.org.za, uses a practical intervention that transforms technology from a means harming a language into one ensuring the creation and preservation of good language resources for minority languages. The study first looks at the implications and consequences of this missing keyboard, and then follows the process from conception, strategy, research and design to the final user response. Not only are problems such as researching the orthographies, key placement and keyboard input options examined, but strategic objectives such as ensuring its wide adoption and creating a multilingual keyboard for all South African languages are also discussed. The result is a keyboard that furthers multilingualism and ensures the capturing of good data for future research. Finally it is a tool helping to boost and bolster the vitality of a language.

    Keywords: KEYBOARD, MULTILINGUALISM, VENDA, AFRIKAANS, TSWANA, NORTH-ERN SOTHO, ZULU, SOURCE, FREE SOFTWARE, LAYOUT

  19. Algorithm for personal identification in distance learning system based on registration of keyboard rhythm

    Science.gov (United States)

    Nikitin, P. V.; Savinov, A. N.; Bazhenov, R. I.; Sivandaev, S. V.

    2018-05-01

    The article describes a method of identifying a person in distance learning systems based on keyboard rhythm. An algorithm for the organization of access control is proposed which implements authentication, identification and verification of a person using the keyboard rhythm. Authentication methods based on biometric personal parameters, including those based on keyboard rhythm, can provide improved accuracy, non-repudiation of authorship, and convenience for operators of automated systems compared with other methods of conformity checking, because biometric characteristics cannot exist apart from a particular person. Permanent hidden keyboard monitoring makes it possible to detect the substitution of a student and to block the system.
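
    A minimal sketch of the enrollment/verification loop such a system needs; the hold-time features and the z-score acceptance rule are illustrative assumptions, not the authors' algorithm:

```python
import statistics

# Keystroke-rhythm verification: enroll per-user statistics of key hold
# times, then score new sessions against the enrolled profile.

def enroll(sessions):
    """sessions: list of hold-time vectors (seconds), one per training session."""
    cols = list(zip(*sessions))
    return [(statistics.mean(c), statistics.stdev(c)) for c in cols]

def verify(profile, sample, z_limit=3.0):
    """Accept when every hold time is within z_limit std devs of the profile."""
    return all(abs(x - mu) <= z_limit * sd for x, (mu, sd) in zip(sample, profile))

training = [
    [0.11, 0.13, 0.09, 0.12],
    [0.12, 0.14, 0.10, 0.11],
    [0.10, 0.12, 0.09, 0.13],
]
profile = enroll(training)
print(verify(profile, [0.11, 0.13, 0.10, 0.12]))  # True: matches the enrolled rhythm
print(verify(profile, [0.25, 0.30, 0.22, 0.28]))  # False: likely a different person
```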

  20. Pointright: a system to redirect mouse and keyboard control among multiple machines

    Science.gov (United States)

    Johanson, Bradley E [Palo Alto, CA; Winograd, Terry A [Stanford, CA; Hutchins, Gregory M [Mountain View, CA

    2008-09-30

    The present invention provides a software system, PointRight, that allows for smooth and effortless control of pointing and input devices among multiple displays. With PointRight, a single free-floating mouse and keyboard can be used to control multiple screens. When the cursor reaches the edge of a screen it seamlessly moves to the adjacent screen and keyboard control is simultaneously redirected to the appropriate machine. Laptops may also redirect their keyboard and pointing device, and multiple pointers are supported simultaneously. The system automatically reconfigures itself as displays go on, go off, or change the machine they display.
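
    The patent text describes the behavior, not code. A toy sketch of the edge-crossing logic under assumed equal-width screens, with an invented topology table and a print stub in place of the real network redirection:

```python
# Redirect pointer control when the cursor crosses a screen edge.

TOPOLOGY = {                      # edges of each display -> neighbouring machine
    "desk-pc":      {"right": "wall-display"},
    "wall-display": {"left": "desk-pc", "right": "laptop"},
    "laptop":       {"left": "wall-display"},
}
WIDTH = 1920   # assume equal-width screens for simplicity

def route(machine, x):
    """Return (machine, x) after applying any edge crossing."""
    if x < 0 and "left" in TOPOLOGY[machine]:
        return TOPOLOGY[machine]["left"], x + WIDTH     # enter from the right edge
    if x >= WIDTH and "right" in TOPOLOGY[machine]:
        return TOPOLOGY[machine]["right"], x - WIDTH    # enter from the left edge
    return machine, max(0, min(x, WIDTH - 1))           # clamp at dead edges

def send(machine, event):
    print(f"-> {machine}: {event}")   # stand-in for the network redirection

machine, x = route("desk-pc", 1900 + 40)   # cursor moves 40 px right, crosses edge
send(machine, ("mouse-move", x))           # events now go to wall-display
```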

  1. Differences in typing forces, muscle activity, comfort, and typing performance among virtual, notebook, and desktop keyboards.

    Science.gov (United States)

    Kim, Jeong Ho; Aulck, Lovenoor; Bartha, Michael C; Harper, Christy A; Johnson, Peter W

    2014-11-01

    The present study investigated whether there were physical exposure and typing productivity differences between a virtual keyboard with no tactile feedback and two conventional keyboards where key travel and tactile feedback are provided by mechanical switches under the keys. The key size and layout were the same across all the keyboards. Typing forces, finger and shoulder muscle activity, self-reported comfort, and typing productivity were measured from 19 subjects while typing on a virtual (0 mm key travel), notebook (1.8 mm key travel), and desktop keyboard (4 mm key travel). When typing on the virtual keyboard, subjects typed with less force (p's < 0.05), but the lower typing forces and finger muscle activity came at the expense of a 60% reduction in typing productivity (p < 0.05). For prolonged typing sessions, or when typing productivity is at a premium, conventional keyboards with tactile feedback may be the more suitable interface. Copyright © 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  2. A low-cost MRI compatible keyboard

    DEFF Research Database (Denmark)

    Jensen, Martin Snejbjerg; Heggli, Ole Adrian; Alves da Mota, Patricia

    2017-01-01

    , presenting a challenging environment for playing an instrument. Here, we present an MRI-compatible polyphonic keyboard with a materials cost of 850 $, designed and tested for safe use in 3T (three Tesla) MRI-scanners. We describe design considerations, and prior work in the field. In addition, we provide...

  3. CPU and GPU (Cuda Template Matching Comparison

    Directory of Open Access Journals (Sweden)

    Evaldas Borcovas

    2014-05-01

    Image processing, computer vision, and other complicated optical information processing algorithms require large resources. It is often desired to execute algorithms in real time, and it is hard to fulfill such requirements with a single CPU processor. The CUDA technology proposed by NVidia enables the programmer to use the GPU resources in the computer. The current research was made with an Intel Pentium Dual-Core T4500 2.3 GHz processor with 4 GB RAM DDR3 (CPU I), an NVidia GeForce GT320M CUDA-capable graphics card (GPU I), an Intel Core I5-2500K 3.3 GHz processor with 4 GB RAM DDR3 (CPU II), and an NVidia GeForce GTX 560 CUDA-compatible graphics card (GPU II). Additional libraries, OpenCV 2.1 and the CUDA-capable OpenCV 2.4.0, were used for the testing. The main tests were made with the standard function MatchTemplate from the OpenCV libraries. The algorithm uses a main image and a template, and the influence of these factors was tested: the main image and template were resized, and the algorithm computing time and performance in Gtpix/s were measured. According to the information obtained from the research, GPU computing using the hardware mentioned earlier is up to 24 times faster when processing a large amount of information. When the images are small, the performance of CPU and GPU is not significantly different. The choice of the template size influences the CPU calculation. The difference in computing time between the GPUs can be explained by the number of cores they have.
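
    For reference, the CPU path the study times is a single OpenCV call. A small self-contained sketch on synthetic data (cv2.matchTemplate is the routine named in the abstract; in CUDA builds of OpenCV the GPU counterpart is exposed via cv2.cuda.createTemplateMatching):

```python
import time

import cv2
import numpy as np

# CPU template matching with the OpenCV routine benchmarked in the study.
# Synthetic data stands in for their test images.

image = np.random.randint(0, 256, (2048, 2048), dtype=np.uint8)
template = image[500:564, 700:764].copy()      # 64x64 patch cut from the image

t0 = time.perf_counter()
result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
elapsed = time.perf_counter() - t0

_, max_val, _, max_loc = cv2.minMaxLoc(result)   # best match position and score
pixels = image.shape[0] * image.shape[1]
print(f"found at {max_loc} (score {max_val:.3f}) "
      f"in {elapsed:.3f}s ~ {pixels / elapsed / 1e9:.2f} Gpix/s")
```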

  4. A CFD Heterogeneous Parallel Solver Based on Collaborating CPU and GPU

    Science.gov (United States)

    Lai, Jianqi; Tian, Zhengyu; Li, Hua; Pan, Sha

    2018-03-01

    Since the Graphic Processing Unit (GPU) has a strong ability for floating-point computation and high memory bandwidth for data parallelism, it has been widely used in common computing areas such as molecular dynamics (MD), computational fluid dynamics (CFD) and so on. The emergence of the compute unified device architecture (CUDA), which reduces the complexity of compiling programs, brings great opportunities to CFD. There are three different modes for the parallel solution of the NS equations: a parallel solver based on the CPU, a parallel solver based on the GPU, and a heterogeneous parallel solver based on collaborating CPU and GPU. As we can see, GPUs are relatively rich in compute capacity but poor in memory capacity, and CPUs are the opposite. We need to make full use of the GPUs and CPUs, so a CFD heterogeneous parallel solver based on collaborating CPU and GPU has been established. Three cases are presented to analyse the solver's computational accuracy and heterogeneous parallel efficiency. The numerical results agree well with experimental results, which demonstrates that the heterogeneous parallel solver has high computational precision. The speedup on a single GPU is more than 40 for laminar flow; it decreases for turbulent flow, but can still reach more than 20. What's more, the speedup increases as the grid size becomes larger.

  5. Detecting Cognitive Stress from Keyboard and Mouse Dynamics during Mental Arithmetic

    OpenAIRE

    Lim, Yee Mei; Ayesh, Aladdin, 1972-; Stacey, Martin

    2014-01-01

    Much research has been done to detect human emotion using various computational methods, such as physiological measures and facial expression recognition. These methods are effective, but they can be expensive or intrusive, as special equipment setups are needed. Some researchers have utilized nonintrusive methods by using mouse or keyboard analyses and presented comparable effectiveness in detecting human emotion. This paper investigates how both keyboard and mouse features can be combined

  6. STEM image simulation with hybrid CPU/GPU programming

    International Nuclear Information System (INIS)

    Yao, Y.; Ge, B.H.; Shen, X.; Wang, Y.G.; Yu, R.C.

    2016-01-01

    STEM image simulation is achieved via hybrid CPU/GPU programming under parallel algorithm architecture to speed up calculation on a personal computer (PC). To utilize the calculation power of a PC fully, the simulation is performed using the GPU core and multi-CPU cores at the same time to significantly improve efficiency. GaSb and an artificial GaSb/InAs interface with atom diffusion have been used to verify the computation. - Highlights: • STEM image simulation is achieved by hybrid CPU/GPU programming under parallel algorithm architecture to speed up the calculation in the personal computer (PC). • In order to fully utilize the calculation power of the PC, the simulation is performed by GPU core and multi-CPU cores at the same time so efficiency is improved significantly. • GaSb and artificial GaSb/InAs interface with atom diffusion have been used to verify the computation. The results reveal some unintuitive phenomena about the contrast variation with the atom numbers.

  7. STEM image simulation with hybrid CPU/GPU programming

    Energy Technology Data Exchange (ETDEWEB)

    Yao, Y., E-mail: yaoyuan@iphy.ac.cn; Ge, B.H.; Shen, X.; Wang, Y.G.; Yu, R.C.

    2016-07-15

    STEM image simulation is achieved via hybrid CPU/GPU programming under parallel algorithm architecture to speed up calculation on a personal computer (PC). To utilize the calculation power of a PC fully, the simulation is performed using the GPU core and multi-CPU cores at the same time to significantly improve efficiency. GaSb and an artificial GaSb/InAs interface with atom diffusion have been used to verify the computation. - Highlights: • STEM image simulation is achieved by hybrid CPU/GPU programming under parallel algorithm architecture to speed up the calculation in the personal computer (PC). • In order to fully utilize the calculation power of the PC, the simulation is performed by GPU core and multi-CPU cores at the same time so efficiency is improved significantly. • GaSb and artificial GaSb/InAs interface with atom diffusion have been used to verify the computation. The results reveal some unintuitive phenomena about the contrast variation with the atom numbers.

  8. Impact of keyboard typing on the morphological changes of the median nerve.

    Science.gov (United States)

    Yeap Loh, Ping; Liang Yeoh, Wen; Nakashima, Hiroki; Muraki, Satoshi

    2017-09-28

    The primary objective was to investigate the effects of continuous typing on median nerve changes at the carpal tunnel region at two different keyboard slopes (0° and 20°). The secondary objective was to investigate the differences in wrist kinematics and the changes in wrist anthropometric measurements when typing at the two different keyboard slopes. Fifteen healthy right-handed young men were recruited. A randomized sequence of the conditions (control, typing I, and typing II) was assigned to each participant. Wrist anthropometric measurements, wrist kinematics data collection, and an ultrasound examination of the median nerve were performed at designated time blocks. Typing activity and time block did not cause significant changes to the wrist anthropometric measurements; the wrist measurements remained similar across all the time blocks in the three conditions. Subsequently, the wrist extensions and ulnar deviations were significantly higher in both the typing I and typing II conditions than in the control condition for both wrists (p < 0.05). The median nerve cross-sectional area (MNCSA) was significantly greater in the typing I and typing II conditions after the typing task than before the typing task. The MNCSA significantly decreased in the recovery phase after the typing task. This study demonstrated the immediate changes in the median nerve after continuous keyboard typing. Changes in the median nerve were greater during typing on a keyboard tilted at 20° than on a keyboard tilted at 0°. The main findings suggest that a wrist posture near the neutral position caused smaller changes in the median nerve.

  9. ITCA: Inter-Task Conflict-Aware CPU accounting for CMP

    OpenAIRE

    Luque, Carlos; Moreto Planas, Miquel; Cazorla Almeida, Francisco Javier; Gioiosa, Roberto; Valero Cortés, Mateo

    2010-01-01

    Chip-MultiProcessors (CMP) introduce complexities when accounting CPU utilization to processes, because the progress done by a process during an interval of time highly depends on the activity of the other processes it is co-scheduled with. We propose a new hardware CPU accounting mechanism to improve the accuracy when measuring the CPU utilization in CMPs and compare it with previous accounting mechanisms. Our results show that currently known mechanisms lead to a 16% average error when accounting CPU utilization in CMPs.

  10. Differential effects of type of keyboard playing task and tempo on surface EMG amplitudes of forearm muscles

    Directory of Open Access Journals (Sweden)

    Hyun Ju eChong

    2015-09-01

    Despite increasing interest in keyboard playing as a strategy for repetitive finger exercises in fine motor skill development and hand rehabilitation, comparative analysis of task-specific finger movements relevant to keyboard playing has been less extensive. This study examined whether there were differences in the surface EMG activity levels of forearm muscles associated with different keyboard playing tasks. Results demonstrated higher muscle activity with sequential keyboard playing in a random pattern compared to individuated playing or sequential playing in a successive pattern. Also, the speed of finger movements was found to be a factor that affects muscle activity levels, demonstrating that a faster tempo elicited significantly greater muscle activity than a self-paced tempo. The results inform our understanding of the types of finger movements involved in different types of keyboard playing at different tempi, so as to consider the efficacy and fatigue level of keyboard playing as an intervention for amateur pianists or individuals with impaired fine motor skills.

  11. Differential effects of type of keyboard playing task and tempo on surface EMG amplitudes of forearm muscles

    Science.gov (United States)

    Chong, Hyun Ju; Kim, Soo Ji; Yoo, Ga Eul

    2015-01-01

    Despite increasing interest in keyboard playing as a strategy for repetitive finger exercises in fine motor skill development and hand rehabilitation, comparative analysis of task-specific finger movements relevant to keyboard playing has been less extensive. This study examined whether there were differences in the surface EMG activity levels of forearm muscles associated with different keyboard playing tasks. Results demonstrated higher muscle activity with sequential keyboard playing in a random pattern compared to individuated playing or sequential playing in a successive pattern. Also, the speed of finger movements was found to be a factor that affects muscle activity levels, demonstrating that a faster tempo elicited significantly greater muscle activity than a self-paced tempo. The results inform our understanding of the type of finger movements involved in different types of keyboard playing at different tempi. This helps in considering the efficacy and fatigue level of keyboard playing tasks when they are used as an intervention for amateur pianists or individuals with impaired fine motor skills. PMID:26388798

  12. Online performance evaluation of RAID 5 using CPU utilization

    Science.gov (United States)

    Jin, Hai; Yang, Hua; Zhang, Jiangling

    1998-09-01

    Redundant arrays of independent disks (RAID) technology is an efficient way to relieve the bottleneck between CPU processing ability and the I/O subsystem. From the system point of view, the most important metric of on-line performance is CPU utilization. This paper first presents a way to calculate the CPU utilization of a system connected to a RAID level 5 subsystem, using a statistical average method. The simulation results of CPU utilization show that using multiple disks as an array to access data in parallel is an efficient way to enhance the on-line performance of a disk storage system, and that using high-end disk drives to compose the array is key to enhancing the on-line performance of the system.
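
    As a minimal sketch of the statistical-average idea (illustrative numbers, not the paper's data), CPU utilization can be estimated as the mean fraction of each observation window that the CPU spends busy:

    ```python
    # Estimate CPU utilization as busy time averaged over sample windows.
    def mean_cpu_utilization(busy_seconds_per_window, window_s=1.0):
        return sum(busy_seconds_per_window) / (len(busy_seconds_per_window) * window_s)

    # Parallel access across array disks shortens I/O stalls, leaving the
    # CPU more time for useful work within each window:
    single_disk = [0.35, 0.40, 0.38, 0.36]   # CPU-busy seconds per window
    raid5_array = [0.55, 0.60, 0.58, 0.57]
    print(f"single disk: {mean_cpu_utilization(single_disk):.0%}, "
          f"RAID 5: {mean_cpu_utilization(raid5_array):.0%}")
    ```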

  13. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing

    Directory of Open Access Journals (Sweden)

    Fan Zhang

    2016-04-01

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially GPU based methods. In the classical GPU based imaging algorithm, the GPU is employed to accelerate image processing by massively parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO). However, the computing capability of the CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the CPU parallel imaging part, the advanced vector extension (AVX) method is first introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only are the bottlenecks of memory limitation and frequent data transfers overcome, but various optimization strategies are also applied, such as streaming and parallel pipelining. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging by 270 times relative to a single-core CPU and realizes real-time imaging, in that the imaging rate outperforms the raw data generation rate.

  14. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing.

    Science.gov (United States)

    Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin

    2016-04-07

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially GPU based methods. In the classical GPU based imaging algorithm, the GPU is employed to accelerate image processing by massively parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO). However, the computing capability of the CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the CPU parallel imaging part, the advanced vector extension (AVX) method is first introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only are the bottlenecks of memory limitation and frequent data transfers overcome, but various optimization strategies are also applied, such as streaming and parallel pipelining. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging by 270 times relative to a single-core CPU and realizes real-time imaging, in that the imaging rate outperforms the raw data generation rate.
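
    The partition-and-schedule idea can be sketched schematically (an assumed structure in Python, not the authors' CUDA/AVX code): split the range lines of the raw data into a CPU share and a GPU share and process them concurrently:

    ```python
    # Schematic CPU/GPU task partitioning for collaborative imaging.
    from concurrent.futures import ThreadPoolExecutor

    def process_on_cpu(chunk):
        # stand-in for AVX-vectorized, multi-core focusing of `chunk`
        return [line * 2 for line in chunk]

    def process_on_gpu(chunk):
        # stand-in for CUDA-based focusing with streamed transfers
        return [line * 2 for line in chunk]

    lines = list(range(1000))        # raw-data range lines (dummy values)
    split = int(len(lines) * 0.3)    # fraction assigned to the CPU side
    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_part = pool.submit(process_on_cpu, lines[:split])
        gpu_part = pool.submit(process_on_gpu, lines[split:])
        image = cpu_part.result() + gpu_part.result()
    ```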

  15. Toward Optimization of Gaze-Controlled Human-Computer Interaction: Application to Hindi Virtual Keyboard for Stroke Patients.

    Science.gov (United States)

    Meena, Yogesh Kumar; Cecotti, Hubert; Wong-Lin, Kongfatt; Dutta, Ashish; Prasad, Girijesh

    2018-04-01

    Virtual keyboard applications and alternative communication devices provide new means of communication to assist disabled people. To date, virtual keyboard optimization schemes based on script-specific information, along with multimodal input access facilities, are limited. In this paper, we propose a novel method for optimizing the position of the displayed items for gaze-controlled tree-based menu selection systems by considering a combination of letter frequency and command selection time. The optimized graphical user interface layout has been designed for a Hindi language virtual keyboard based on a menu wherein 10 commands provide access to type 88 different characters, along with additional text editing commands. The system can be controlled in two different modes: eye-tracking alone and eye-tracking with an access soft-switch. Five different keyboard layouts have been presented and evaluated with ten healthy participants. Furthermore, the two best performing keyboard layouts have been evaluated with eye-tracking alone on ten stroke patients. The overall performance analysis demonstrated significantly superior typing performance, high usability (SUS score of 87%), and low workload (NASA TLX score of 17) for the letter frequency and time-based organization with script-specific arrangement design. This paper presents the first optimized gaze-controlled Hindi virtual keyboard, which can be extended to other languages.
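
    The cost model behind the layout optimization can be sketched simply (letter frequencies and selection times below are invented, not the paper's Hindi data): frequent letters are assigned to the fastest-to-select menu positions, reducing the layout's weighted selection cost:

    ```python
    # Greedy frequency/selection-time layout optimization sketch.
    freq = {'a': 0.12, 'b': 0.02, 'c': 0.04, 'd': 0.03, 'e': 0.13}
    select_time = [0.8, 1.0, 1.3, 1.7, 2.2]   # seconds per menu position

    # Most frequent letter -> fastest position.
    layout = sorted(freq, key=freq.get, reverse=True)
    cost = sum(freq[ch] * t for ch, t in zip(layout, select_time))
    print(layout, f"weighted selection cost: {cost:.3f}")
    ```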

  16. Low-power grating detection system chip for high-speed low-cost length and angle precision measurement

    Science.gov (United States)

    Hou, Ligang; Luo, Rengui; Wu, Wuchen

    2006-11-01

    This paper presents a low-power grating detection chip (EYAS) for length and angle precision measurement. Traditional grating detection methods, such as resistor-chain divider or phase-locked divider circuits, are difficult to design and tune. The need for an additional CPU for control and display makes these methods' implementation more complex and costly, and they also suffer from low sampling speed because of the complex divider circuitry and CPU software compensation. EYAS is an application specific integrated circuit (ASIC). It integrates a micro controller unit (MCU), a power management unit (PMU), an LCD controller, a keyboard interface, a grating detection unit and other peripherals. Working at 10 MHz, EYAS affords a 5 MHz internal sampling rate and can handle a 1.25 MHz orthogonal signal from the grating sensor. Through a simple keyboard-based control interface, the sensor parameters, data processing and system working mode can be configured. Two LCD controllers, which form the output interface, can drive either dot-array or segment LCDs. The PMU switches the system between working and standby modes by clock gating to save power. EYAS consumes 0.9 mW in test mode (where system activity is more frequent than in real-world use) and 0.2 mW in real-world use. EYAS achieves the whole grating detection system function, with high-speed orthogonal signal handling, in a single chip with very low power consumption.
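
    The "orthogonal signal" from a grating sensor is a quadrature (A/B) pair; a hedged sketch of the standard 4x-counting decode table that such a detection unit implements in hardware:

    ```python
    # Quadrature (A/B) decoding: accumulate +/-1 per valid state transition.
    TRANSITIONS = {  # (previous AB state, new AB state) -> position delta
        (0b00, 0b01): +1, (0b01, 0b11): +1, (0b11, 0b10): +1, (0b10, 0b00): +1,
        (0b00, 0b10): -1, (0b10, 0b11): -1, (0b11, 0b01): -1, (0b01, 0b00): -1,
    }

    def decode(samples):
        """Accumulate counts from successive 2-bit A/B samples."""
        position, prev = 0, samples[0]
        for state in samples[1:]:
            position += TRANSITIONS.get((prev, state), 0)  # 0: no change/glitch
            prev = state
        return position

    print(decode([0b00, 0b01, 0b11, 0b10, 0b00]))  # one full cycle -> +4
    ```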

  17. Metronome LKM: An open source virtual keyboard driver to measure experiment software latencies.

    Science.gov (United States)

    Garaizar, Pablo; Vadillo, Miguel A

    2017-10-01

    Experiment software is often used to measure reaction times gathered with keyboards or other input devices. In previous studies, the accuracy and precision of time stamps has been assessed through several means: (a) generating accurate square wave signals from an external device connected to the parallel port of the computer running the experiment software, (b) triggering the typematic repeat feature of some keyboards to get an evenly separated series of keypress events, or (c) using a solenoid handled by a microcontroller to press the input device (keyboard, mouse button, touch screen) that will be used in the experimental setup. Despite the advantages of these approaches in some contexts, none of them can isolate the measurement error caused by the experiment software itself. Metronome LKM provides a virtual keyboard to assess an experiment's software. Using this open source driver, researchers can generate keypress events using high-resolution timers and compare the time stamps collected by the experiment software with those gathered by Metronome LKM (with nanosecond resolution). Our software is highly configurable (in terms of keys pressed, intervals, SysRq activation) and runs on 2.6-4.8 Linux kernels.
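
    Metronome LKM itself is a Linux kernel module; as a rough user-space analogue (assuming the third-party python-evdev package and access to /dev/uinput), one can inject evenly spaced key events and record nanosecond-resolution send times for comparison with the experiment software's timestamps:

    ```python
    # Inject synthetic keypresses from a virtual keyboard and log send times.
    import time
    from evdev import UInput, ecodes  # pip install evdev; needs /dev/uinput

    ui = UInput()                     # virtual keyboard device
    send_times_ns = []
    for _ in range(10):
        send_times_ns.append(time.perf_counter_ns())
        ui.write(ecodes.EV_KEY, ecodes.KEY_A, 1)   # key press
        ui.write(ecodes.EV_KEY, ecodes.KEY_A, 0)   # key release
        ui.syn()
        time.sleep(0.5)               # nominal 500 ms interval
    ui.close()
    ```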

  18. Semiempirical Quantum Chemical Calculations Accelerated on a Hybrid Multicore CPU-GPU Computing Platform.

    Science.gov (United States)

    Wu, Xin; Koslowski, Axel; Thiel, Walter

    2012-07-10

    In this work, we demonstrate that semiempirical quantum chemical calculations can be accelerated significantly by leveraging the graphics processing unit (GPU) as a coprocessor on a hybrid multicore CPU-GPU computing platform. Semiempirical calculations using the MNDO, AM1, PM3, OM1, OM2, and OM3 model Hamiltonians were systematically profiled for three types of test systems (fullerenes, water clusters, and solvated crambin) to identify the most time-consuming sections of the code. The corresponding routines were ported to the GPU and optimized employing both existing library functions and a GPU kernel that carries out a sequence of noniterative Jacobi transformations during pseudodiagonalization. The overall computation times for single-point energy calculations and geometry optimizations of large molecules were reduced by one order of magnitude for all methods, as compared to runs on a single CPU core.
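
    The pseudodiagonalization step is built on Jacobi rotations; a compact NumPy sketch of the textbook Jacobi idea (not the authors' GPU kernel) that repeatedly zeroes the largest off-diagonal element of a symmetric matrix:

    ```python
    # Classical Jacobi diagonalization of a symmetric matrix.
    import numpy as np

    def jacobi_eigenvalues(a, tol=1e-10, max_sweeps=100):
        a = a.copy()
        n = a.shape[0]
        for _ in range(max_sweeps):
            off = np.abs(a - np.diag(np.diag(a)))
            p, q = np.unravel_index(np.argmax(off), a.shape)
            if off[p, q] < tol:
                break
            theta = 0.5 * np.arctan2(2 * a[p, q], a[q, q] - a[p, p])
            g = np.eye(n)
            g[p, p] = g[q, q] = np.cos(theta)
            g[p, q], g[q, p] = np.sin(theta), -np.sin(theta)
            a = g.T @ a @ g   # rotate to zero the (p, q) element
        return np.diag(a)

    print(jacobi_eigenvalues(np.array([[2.0, 1.0], [1.0, 2.0]])))  # ~[1, 3]
    ```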

  19. Examining the Impact of L2 Proficiency and Keyboarding Skills on Scores on TOEFL-iBT Writing Tasks

    Science.gov (United States)

    Barkaoui, Khaled

    2014-01-01

    A major concern with computer-based (CB) tests of second-language (L2) writing is that performance on such tests may be influenced by test-taker keyboarding skills. Poor keyboarding skills may force test-takers to focus their attention and cognitive resources on motor activities (i.e., keyboarding) and, consequently, other processes and aspects of…

  20. A new concept of assistive virtual keyboards based on a systematic review of text entry optimization techniques

    Directory of Open Access Journals (Sweden)

    Renato de Sousa Gomide

    Introduction: Due to the increasing popularization of computers and the expansion of the internet, Alternative and Augmentative Communication technologies have been employed to restore the ability to communicate in people with aphasia and tetraplegia. Virtual keyboards are one of the most basic mechanisms for alternative text entry and play a very important role in accomplishing this task. However, text entry with this kind of keyboard is much slower than through its physical counterpart. Many techniques and layouts have been proposed to improve the typing performance of virtual keyboards, each one concerning a different issue or solving a specific problem. However, not all of them are suitable to assist people with severe motor impairment. Methods: In order to develop an assistive virtual keyboard with improved typing performance, we performed a systematic review on scientific databases. Results: We found 250 related papers, 52 of which were selected to compose the review. From these, we identified eight essential virtual keyboard features, five methods to optimize data entry performance, and five metrics to assess typing performance. Conclusion: Based on this review, we introduce a concept of an assistive, optimized, compact and adaptive virtual keyboard that gathers a set of suitable techniques, such as: a new ambiguous keyboard layout, disambiguation algorithms, dynamic scan techniques, static text prediction of letters and words and, finally, the use of phonetic and similarity algorithms to reduce the user's typing error rate.
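
    The ambiguous-layout plus disambiguation combination works like a T9 keypad: several letters share one key, and a word-frequency dictionary ranks the candidates for a key sequence. A minimal sketch with an invented 5-key grouping and invented frequencies:

    ```python
    # Ambiguous keyboard encoding and frequency-based disambiguation.
    GROUPS = ["abcde", "fghij", "klmno", "pqrst", "uvwxyz"]
    KEY_OF = {ch: k for k, group in enumerate(GROUPS) for ch in group}
    WORD_FREQ = {"can": 200, "bam": 5, "cat": 150}

    def encode(word):
        """Key sequence a user presses for `word` on the ambiguous layout."""
        return tuple(KEY_OF[ch] for ch in word)

    def disambiguate(key_seq):
        """Candidate words for a key sequence, most frequent first."""
        hits = [w for w in WORD_FREQ if encode(w) == tuple(key_seq)]
        return sorted(hits, key=WORD_FREQ.get, reverse=True)

    print(disambiguate(encode("can")))   # ['can', 'bam'] share one sequence
    ```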

  1. Design improvement of FPGA and CPU based digital circuit cards to solve timing issues

    International Nuclear Information System (INIS)

    Lee, Dongil; Lee, Jaeki; Lee, Kwang-Hyun

    2016-01-01

    The digital circuit cards installed at NPPs (Nuclear Power Plants) are mostly composed of a CPU (Central Processing Unit) and a PLD (Programmable Logic Device; these include FPGAs (Field Programmable Gate Arrays) and CPLDs (Complex Programmable Logic Devices)). This type of structure is typical and is maintained using digital circuit cards, and as a structure it poses no major problems. However, signal delay causes many problems when various ICs (Integrated Circuits) and several circuit cards are connected to the BUS of the backplane in the BUS design. This paper suggests a structure to improve the BUS signal timing problems in a circuit card consisting of a CPU and an FPGA. Nowadays, as the structure of circuit cards has become complex and large amounts of data are communicated at high speed through the BUS, data integrity is the most important issue. The conventional design does not consider the delay and synchronicity of signals, and this causes many problems in data processing. In order to solve these problems, it is important to isolate the BUS controller from the CPU and maintain a constant signal delay by using a PLD.

  2. Design improvement of FPGA and CPU based digital circuit cards to solve timing issues

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Dongil; Lee, Jaeki; Lee, Kwang-Hyun [KHNP CRI, Daejeon (Korea, Republic of)

    2016-10-15

    The digital circuit cards installed at NPPs (Nuclear Power Plants) are mostly composed of a CPU (Central Processing Unit) and a PLD (Programmable Logic Device; these include FPGAs (Field Programmable Gate Arrays) and CPLDs (Complex Programmable Logic Devices)). This type of structure is typical and is maintained using digital circuit cards, and as a structure it poses no major problems. However, signal delay causes many problems when various ICs (Integrated Circuits) and several circuit cards are connected to the BUS of the backplane in the BUS design. This paper suggests a structure to improve the BUS signal timing problems in a circuit card consisting of a CPU and an FPGA. Nowadays, as the structure of circuit cards has become complex and large amounts of data are communicated at high speed through the BUS, data integrity is the most important issue. The conventional design does not consider the delay and synchronicity of signals, and this causes many problems in data processing. In order to solve these problems, it is important to isolate the BUS controller from the CPU and maintain a constant signal delay by using a PLD.

  3. Piano Crossing – Walking on a Keyboard

    Directory of Open Access Journals (Sweden)

    Bojan Kverh

    2010-11-01

    Piano Crossing is an interactive art installation which turns a pedestrian crossing marked with white stripes into a piano keyboard, so that pedestrians can generate music by walking over it. Matching tones are created when a pedestrian steps on a particular stripe or key. A digital camera is directed at the crossing from above. A special computer vision application was developed, which maps the stripes of the pedestrian crossing to piano keys and detects, from the image, over which key the center of gravity of each pedestrian is located at any given moment. Black stripes represent the black piano keys. The application consists of two parts: (1) initialization, where the model of the abstract piano keyboard is mapped to the image of the pedestrian crossing, and (2) the detection of pedestrians at the crossing, so that musical tones can be generated according to their locations. The art installation Piano Crossing was presented to the public for the first time during the 51st Jazz Festival in Ljubljana in July 2010.

  4. Piano Crossing – Walking on a Keyboard

    Directory of Open Access Journals (Sweden)

    Franc Solina

    2010-04-01

    Piano Crossing is an interactive art installation which turns a pedestrian crossing marked with white stripes into a piano keyboard, so that pedestrians can generate music by walking over it. Matching tones are created when a pedestrian steps on a particular stripe or key. A digital camera is directed at the crossing from above. A special computer vision application was developed, which maps the stripes of the pedestrian crossing to piano keys and detects, from the image, over which key the center of gravity of each pedestrian is located at any given moment. Black stripes represent the black piano keys. The application consists of two parts: (1) initialization, where the model of the abstract piano keyboard is mapped to the image of the pedestrian crossing, and (2) the detection of pedestrians at the crossing, so that musical tones can be generated according to their locations. The art installation Piano Crossing was presented to the public for the first time during the 51st Jazz Festival in Ljubljana in July 2010.
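
    The mapping step described above reduces to projecting a pedestrian's center of gravity onto a stripe index; a hedged sketch with invented calibration values:

    ```python
    # Map an image x-coordinate to a crossing stripe and a MIDI note.
    CROSSING_LEFT_PX = 80    # image x of the first stripe (calibration)
    STRIPE_WIDTH_PX = 55     # width of one stripe in pixels
    BASE_MIDI_NOTE = 60      # note of the first stripe (middle C)
    NUM_STRIPES = 12         # assumed number of stripes

    def key_for_position(x_px):
        idx = int((x_px - CROSSING_LEFT_PX) // STRIPE_WIDTH_PX)
        return idx if 0 <= idx < NUM_STRIPES else None

    def midi_note(x_px):
        idx = key_for_position(x_px)
        return None if idx is None else BASE_MIDI_NOTE + idx

    print(midi_note(200))    # center of gravity at x=200 px -> note 62
    ```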

  5. Reconstruction of the neutron spectrum using an artificial neural network (ANN) in CPU and GPU

    Energy Technology Data Exchange (ETDEWEB)

    Hernandez D, V. M.; Moreno M, A.; Ortiz L, M. A. [Universidad de Cordoba, 14002 Cordoba (Spain); Vega C, H. R.; Alonso M, O. E., E-mail: vic.mc68010@gmail.com [Universidad Autonoma de Zacatecas, 98000 Zacatecas, Zac. (Mexico)

    2016-10-15

    Computing power in personal computers has been increasing steadily; computers now have several processors in the CPU and, in addition, multiple CUDA cores in the graphics processing unit (GPU). Both systems can be used individually or combined to perform scientific computation without resorting to processor clusters or supercomputers. The Bonner sphere spectrometer is the most commonly used multi-element system for neutron detection and for obtaining the associated spectrum. Each sphere-detector combination gives a particular response that depends on the energy of the neutrons, and the total set of these responses is known as the response matrix Rφ(E). Thus, the counting rates obtained with each sphere and the neutron spectrum are related through the Fredholm equation in its discrete version. The reconstruction of the spectrum involves a system of poorly conditioned equations with an infinite number of solutions; to find the appropriate solution, the use of artificial intelligence through neural networks, on both CPU and GPU platforms, has been proposed. (Author)

  6. 78 FR 6835 - Certain Mobile Handset Devices and Related Touch Keyboard Software; Institution of Investigation

    Science.gov (United States)

    2013-01-31

    ... INTERNATIONAL TRADE COMMISSION [Investigation No. 337-TA-864] Certain Mobile Handset Devices and... importation of certain mobile handset devices and related touch keyboard software by reason of infringement of... certain mobile handset devices and related touch keyboard software that infringe one or more of claims 36...

  7. Born to Conquer: The Fortepiano’s Revolution of Keyboard Technique and Style

    Directory of Open Access Journals (Sweden)

    Rachel A. Lowrance

    2014-06-01

    The fortepiano had a rough beginning. In 1709 it entered a world that was not quite ready for it; a world that was very comfortable with the earlier keyboard instruments, especially the harpsichord. Pianists and composers were used to harpsichord technique and style, which are drastically different from those of the piano, because the harpsichord was in fact a very different instrument, as this paper explains. The paper traces the history of the piano's rise to dominance over the harpsichord, and how its unique hammer action began creating an idiomatic piano style. The piano also revolutionized keyboard repertoire, taking some genres from the harpsichord and also creating completely new genres of composition. Despite its slow start in the early eighteenth century, the piano completely revolutionized the musical world into which it was born. The rise of the fortepiano throughout the late eighteenth and nineteenth centuries transformed traditional keyboard technique, style and compositions.

  8. Validity of questionnaire self-reports on computer, mouse and keyboard usage during a four-week period

    DEFF Research Database (Denmark)

    Mikkelsen, S.; Vilstrup, Imogen; Lassen, C. F.

    2007-01-01

    OBJECTIVE: To examine the validity and potential biases in self-reports of computer, mouse and keyboard usage times, compared with objective recordings. METHODS: A study population of 1211 people was asked in a questionnaire to estimate the average time they had worked with computer, mouse and keyboard during the past four working weeks. During the same period, a software program recorded these activities objectively. The study was part of a one-year follow-up study from 2000-1 of musculoskeletal outcomes among Danish computer workers. RESULTS: Self-reports on computer, mouse and keyboard usage times were positively associated with objectively measured activity, but the validity was low. Self-reports explained only between a quarter and a third of the variance of objectively measured activity, and even less for one measure (keyboard time). Self-reports overestimated usage times...

  9. A New Keyboard for the Bohlen-Pierce Scale

    CERN Document Server

    Nassar, Antonio

    2011-01-01

    The study of harmonic scales of musical instruments is discussed in all introductory physics texts devoted to the science of sound. In this paper, we present a new piano keyboard to make the so-called Bohlen-Pierce scale more functional and pleasing for composition and performance.

  10. On Assisting a Visual-Facial Affect Recognition System with Keyboard-Stroke Pattern Information

    Science.gov (United States)

    Stathopoulou, I.-O.; Alepis, E.; Tsihrintzis, G. A.; Virvou, M.

    Towards realizing a multimodal affect recognition system, we are considering the advantages of assisting a visual-facial expression recognition system with keyboard-stroke pattern information. Our work is based on the assumption that the visual-facial and keyboard modalities are complementary to each other and that their combination can significantly improve the accuracy in affective user models. Specifically, we present and discuss the development and evaluation process of two corresponding affect recognition subsystems, with emphasis on the recognition of 6 basic emotional states, namely happiness, sadness, surprise, anger and disgust as well as the emotion-less state which we refer to as neutral. We find that emotion recognition by the visual-facial modality can be aided greatly by keyboard-stroke pattern information and the combination of the two modalities can lead to better results towards building a multimodal affect recognition system.
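
    Keyboard-stroke pattern information is typically reduced to timing features before classification; a minimal sketch (conventional dwell/flight features, not necessarily the authors' exact set):

    ```python
    # Compute dwell (hold) and flight (between-key) times from key events.
    def keystroke_features(events):
        """events: list of (key, press_time_s, release_time_s) tuples."""
        dwell = [rel - prs for _, prs, rel in events]
        flight = [events[i + 1][1] - events[i][2]
                  for i in range(len(events) - 1)]
        mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
        return {"mean_dwell": mean(dwell), "mean_flight": mean(flight)}

    sample = [("h", 0.00, 0.09), ("i", 0.25, 0.33), ("!", 0.61, 0.70)]
    print(keystroke_features(sample))
    ```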

  11. Thermally-aware composite run-time CPU power models

    OpenAIRE

    Walker, Matthew J.; Diestelhorst, Stephan; Hansson, Andreas; Balsamo, Domenico; Merrett, Geoff V.; Al-Hashimi, Bashir M.

    2016-01-01

    Accurate and stable CPU power modelling is fundamental in modern system-on-chips (SoCs) for two main reasons: 1) they enable significant online energy savings by providing a run-time manager with reliable power consumption data for controlling CPU energy-saving techniques; 2) they can be used as accurate and trusted reference models for system design and exploration. We begin by showing the limitations in typical performance monitoring counter (PMC) based power modelling approaches and illust...
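
    A plain PMC-based model of the kind the abstract critiques can be sketched as a least-squares fit of per-event weights (counter names and data are illustrative; the paper's thermally-aware composite model additionally accounts for temperature, which this sketch ignores):

    ```python
    # Fit a linear power model P = w . [PMC readings, 1] by least squares.
    import numpy as np

    # rows: observations; columns: [cycles, instructions, cache_misses, 1]
    pmc = np.array([[1.0e9, 0.8e9, 2.0e6, 1.0],
                    [1.5e9, 1.4e9, 1.0e6, 1.0],
                    [0.7e9, 0.3e9, 8.0e6, 1.0]])
    watts = np.array([3.2, 4.1, 2.9])        # measured CPU power

    weights, *_ = np.linalg.lstsq(pmc, watts, rcond=None)
    print(pmc @ weights)   # predictions track `watts` on the training data
    ```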

  12. The relationship between keyboarding skills and self-regulated ...

    African Journals Online (AJOL)

    Erna Kinsey

    record thoughts and ideas, communicate and solve problems (Zimmerman ... Keyboarding skill, as a motor skill, is defined as the ability of learners to key in information ... ly, the forethought phase, the performance or volitional control phase, and the ..... annual meeting of the American Educational Research Association, San ...

  13. The Effect of NUMA Tunings on CPU Performance

    Science.gov (United States)

    Hollowell, Christopher; Caramarcu, Costin; Strecker-Kellogg, William; Wong, Antonio; Zaytsev, Alexandr

    2015-12-01

    Non-Uniform Memory Access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems where each processor is directly connected to separate memory. Indirect access to other CPU's (remote) RAM is still possible, but such requests are slower as they must also pass through that memory's controlling CPU. In concert with a NUMA-aware operating system, the NUMA hardware architecture can help eliminate the memory performance reductions generally seen in SMP systems when multiple processors simultaneously attempt to access memory. The x86 CPU architecture has supported NUMA for a number of years. Modern operating systems such as Linux support NUMA-aware scheduling, where the OS attempts to schedule a process to the CPU directly attached to the majority of its RAM. In Linux, it is possible to further manually tune the NUMA subsystem using the numactl utility. With the release of Red Hat Enterprise Linux (RHEL) 6.3, the numad daemon became available in this distribution. This daemon monitors a system's NUMA topology and utilization, and automatically makes adjustments to optimize locality. As the number of cores in x86 servers continues to grow, efficient NUMA mappings of processes to CPUs/memory will become increasingly important. This paper gives a brief overview of NUMA, and discusses the effects of manual tunings and numad on the performance of the HEPSPEC06 benchmark, and ATLAS software.
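
    Beyond numad's automatic adjustments, locality can also be controlled manually; a small user-space illustration (the CPU ids for "node 0" are assumed, so query the real topology first), roughly the CPU-affinity half of `numactl --cpunodebind=0 --membind=0 app`:

    ```python
    # Pin the current process to the CPUs of one NUMA node (Linux only).
    import os

    NODE0_CPUS = {0, 1, 2, 3}            # assumed CPUs attached to node 0

    os.sched_setaffinity(0, NODE0_CPUS)  # pid 0 means the calling process
    print("now restricted to CPUs:", os.sched_getaffinity(0))
    ```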

  14. Rater reliability and concurrent validity of the Keyboard Personal Computer Style instrument (K-PeCS).

    Science.gov (United States)

    Baker, Nancy A; Cook, James R; Redfern, Mark S

    2009-01-01

    This paper describes the inter-rater and intra-rater reliability, and the concurrent validity, of an observational instrument, the Keyboard Personal Computer Style instrument (K-PeCS), which assesses stereotypical postures and movements associated with computer keyboard use. Three trained raters independently rated the video clips of 45 computer keyboard users to ascertain inter-rater reliability, and then re-rated a sub-sample of 15 video clips to ascertain intra-rater reliability. Concurrent validity was assessed by comparing the ratings obtained using the K-PeCS to scores developed from a 3D motion analysis system. The overall K-PeCS had excellent reliability [inter-rater: intra-class correlation coefficients (ICC)=.90; intra-rater: ICC=.92]. Most individual items on the K-PeCS had good to excellent reliability, although six items fell below ICC=.75. Those K-PeCS items that were assessed for concurrent validity compared favorably to the motion analysis data for all but two items. These results suggest that most items on the K-PeCS can be used to reliably document computer keyboarding style.

  15. In Early Education, Why Teach Handwriting before Keyboarding?

    Science.gov (United States)

    Stevenson, Nancy C.; Just, Carol

    2014-01-01

    Legible written communication is essential for students to share knowledge (Rogers and Case-Smith 2002). If students lack proficiency in written communication, their composition skills will suffer, which can affect their self-esteem and grades. Whether or not this proficiency is in handwriting or keyboarding is a question worthy of discussion. In…

  16. The Effect of NUMA Tunings on CPU Performance

    International Nuclear Information System (INIS)

    Hollowell, Christopher; Caramarcu, Costin; Strecker-Kellogg, William; Wong, Antonio; Zaytsev, Alexandr

    2015-01-01

    Non-Uniform Memory Access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems where each processor is directly connected to separate memory. Indirect access to other CPU's (remote) RAM is still possible, but such requests are slower as they must also pass through that memory's controlling CPU. In concert with a NUMA-aware operating system, the NUMA hardware architecture can help eliminate the memory performance reductions generally seen in SMP systems when multiple processors simultaneously attempt to access memory. The x86 CPU architecture has supported NUMA for a number of years. Modern operating systems such as Linux support NUMA-aware scheduling, where the OS attempts to schedule a process to the CPU directly attached to the majority of its RAM. In Linux, it is possible to further manually tune the NUMA subsystem using the numactl utility. With the release of Red Hat Enterprise Linux (RHEL) 6.3, the numad daemon became available in this distribution. This daemon monitors a system's NUMA topology and utilization, and automatically makes adjustments to optimize locality. As the number of cores in x86 servers continues to grow, efficient NUMA mappings of processes to CPUs/memory will become increasingly important. This paper gives a brief overview of NUMA, and discusses the effects of manual tunings and numad on the performance of the HEPSPEC06 benchmark, and ATLAS software. (paper)

  17. Creating a Single South African Keyboard Layout to Promote ...

    African Journals Online (AJOL)

    R.B. Ruthven

    examined, but strategic objectives such as ensuring its wide adoption and creating ... keyboard for all South African languages are also discussed. .... ease of use and adoption, considering the different types of users (touch ... guage pride, challenge negative language perceptions and build a ... 3.2 Revisiting orthographies.

  18. Is Writing Performance Related to Keyboard Type? An Investigation from Examinees' Perspectives on the TOEFL IBT

    Science.gov (United States)

    Ling, Guangming

    2017-01-01

    To investigate whether the type of keyboard used in exams introduces any construct-irrelevant variance to the TOEFL iBT Writing scores, we surveyed 17,040 TOEFL iBT examinees from 24 countries on their keyboard-related perceptions and preferences and analyzed the survey responses together with their test scores. Results suggest that controlling…

  19. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System

    Directory of Open Access Journals (Sweden)

    Yu Liu

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.

  20. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    Science.gov (United States)

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.
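
    For reference, the plain-CPU baseline that the intratask GPU version and the FDFS pre-filter accelerate is the textbook Smith-Waterman recurrence (linear gap penalty shown here; scoring values are illustrative):

    ```python
    # Textbook Smith-Waterman local alignment score (linear gap penalty).
    def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
        rows, cols = len(a) + 1, len(b) + 1
        h = [[0] * cols for _ in range(rows)]
        best = 0
        for i in range(1, rows):
            for j in range(1, cols):
                diag = h[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                h[i][j] = max(0, diag, h[i-1][j] + gap, h[i][j-1] + gap)
                best = max(best, h[i][j])
        return best   # score of the best local alignment

    print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))
    ```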

  1. ITCA: Inter-Task Conflict-Aware CPU accounting for CMPs

    OpenAIRE

    Luque, Carlos; Moreto Planas, Miquel; Cazorla, Francisco; Gioiosa, Roberto; Buyuktosunoglu, Alper; Valero Cortés, Mateo

    2009-01-01

    Chip-MultiProcessor (CMP) architectures are becoming more and more popular as an alternative to traditional processors that only extract instruction-level parallelism from an application. CMPs introduce complexities when accounting CPU utilization, because the progress made by an application during an interval of time depends heavily on the activity of the other applications it is co-scheduled with. In this paper, we identify how an inaccurate measurement of the CPU ut...

  2. Reconstruction of the neutron spectrum using an artificial neural network in CPU and GPU

    International Nuclear Information System (INIS)

    Hernandez D, V. M.; Moreno M, A.; Ortiz L, M. A.; Vega C, H. R.; Alonso M, O. E.

    2016-10-01

    Computing power in personal computers has been increasing steadily; computers now have several processors in the CPU and, in addition, multiple CUDA cores in the graphics processing unit (GPU). Both systems can be used individually or combined to perform scientific computation without resorting to processor clusters or supercomputers. The Bonner sphere spectrometer is the most commonly used multi-element system for neutron detection and for obtaining the associated spectrum. Each sphere-detector combination gives a particular response that depends on the energy of the neutrons, and the total set of these responses is known as the response matrix Rφ(E). Thus, the counting rates obtained with each sphere and the neutron spectrum are related through the Fredholm equation in its discrete version. The reconstruction of the spectrum involves a system of poorly conditioned equations with an infinite number of solutions; to find the appropriate solution, the use of artificial intelligence through neural networks, on both CPU and GPU platforms, has been proposed. (Author)
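
    The discrete Fredholm problem has the form C = Rφ, with R the response matrix; a tiny iterative sketch (a projected-gradient stand-in for the neural-network unfolding the authors propose; all numbers are dummies):

    ```python
    # Nonnegative iterative unfolding of counting rates C = R @ phi.
    import numpy as np

    rng = np.random.default_rng(0)
    R = rng.uniform(0.1, 1.0, size=(7, 15))   # 7 spheres x 15 energy bins
    phi_true = rng.uniform(0.0, 1.0, 15)      # "unknown" spectrum
    C = R @ phi_true                          # measured counting rates

    phi = np.ones(15)                         # initial guess
    for _ in range(5000):                     # projected gradient descent
        grad = R.T @ (R @ phi - C)
        phi = np.maximum(phi - 1e-3 * grad, 0.0)   # keep bins nonnegative

    print(np.round(phi, 2))   # ill-posed: one of infinitely many solutions
    ```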

  3. Conserved-peptide upstream open reading frames (CPuORFs) are associated with regulatory genes in angiosperms

    Directory of Open Access Journals (Sweden)

    Richard A Jorgensen

    2012-08-01

    Upstream open reading frames (uORFs) are common in eukaryotic transcripts, but those that encode conserved peptides (CPuORFs) occur in less than 1% of transcripts. The peptides encoded by three plant CPuORF families are known to control translation of the downstream ORF in response to a small signal molecule (sucrose, polyamines and phosphocholine, respectively). In flowering plants, transcription factors are statistically over-represented among genes that possess CPuORFs, and in general it appeared that many CPuORF genes also had other regulatory functions, though the significance of this suggestion was uncertain (Hayden and Jorgensen, 2007). Five years later, the literature provides much more information on the functions of many CPuORF genes. Here we reassess the functions of 27 known CPuORF gene families and find that 22 of these families play a variety of different regulatory roles, from transcriptional control to protein turnover, and from small signal molecules to signal transduction kinases. Clearly, then, there is indeed a strong association of CPuORFs with regulatory genes. In addition, 16 of these families play key roles in a variety of different biological processes. Most strikingly, the core sucrose response network includes three different CPuORFs, creating the potential for sophisticated balancing of the network in response to three different molecular inputs. We propose that the function of most CPuORFs is to modulate translation of a downstream major ORF (mORF) in response to a signal molecule recognized by the conserved peptide, and that, because the mORFs of CPuORF genes generally encode regulatory proteins, many of them centrally important in the biology of plants, CPuORFs play key roles in balancing such regulatory networks.

  4. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures

    Energy Technology Data Exchange (ETDEWEB)

    Souris, Kevin, E-mail: kevin.souris@uclouvain.be; Lee, John Aldo [Center for Molecular Imaging and Experimental Radiotherapy, Institut de Recherche Expérimentale et Clinique, Université catholique de Louvain, Avenue Hippocrate 54, 1200 Brussels, Belgium and ICTEAM Institute, Université catholique de Louvain, Louvain-la-Neuve 1348 (Belgium); Sterpin, Edmond [Center for Molecular Imaging and Experimental Radiotherapy, Institut de Recherche Expérimentale et Clinique, Université catholique de Louvain, Avenue Hippocrate 54, 1200 Brussels, Belgium and Department of Oncology, Katholieke Universiteit Leuven, O& N I Herestraat 49, 3000 Leuven (Belgium)

    2016-04-15

    Purpose: Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However, the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. Methods: A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Results: Comparisons with GATE/GEANT4 for various geometries show deviations within 2%–1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. Conclusions: MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus also be used for in vivo range verification.

  5. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures

    International Nuclear Information System (INIS)

    Souris, Kevin; Lee, John Aldo; Sterpin, Edmond

    2016-01-01

    Purpose: Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However, the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. Methods: A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Results: Comparisons with GATE/GEANT4 for various geometries show deviations within 2%–1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. Conclusions: MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus also be used for in vivo range verification.

  6. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures.

    Science.gov (United States)

    Souris, Kevin; Lee, John Aldo; Sterpin, Edmond

    2016-04-01

    Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However, the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Comparisons with GATE/GEANT4 for various geometries show deviations within 2%-1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus also be used for in vivo range verification.
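
    The overall structure of a condensed-history particle loop can be conveyed with a deliberately toy sketch (invented energy-loss numbers; real codes such as MCsquare use proper stopping powers, straggling and nuclear models):

    ```python
    # Toy Monte Carlo: protons lose a noisy energy amount per step until
    # they stop; stopping depths approximate a (toy) range distribution.
    import random

    def proton_stopping_depth(energy_mev=200.0, step_cm=0.1):
        depth = 0.0
        while energy_mev > 0.0:
            # crude ~1 MeV/mm loss with a straggling-like random spread
            energy_mev -= random.gauss(1.0, 0.1) * step_cm * 10.0
            depth += step_cm
        return depth

    depths = [proton_stopping_depth() for _ in range(10_000)]
    print(f"mean toy range: {sum(depths) / len(depths):.1f} cm")
    ```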

  7. Functions and requirements for a cesium demonstration unit

    International Nuclear Information System (INIS)

    Howden, G.F.

    1994-04-01

    Westinghouse Hanford Company is investigating alternative means to pretreat the wastes in the Hanford radioactive waste storage tanks. Alternatives include (but are not limited to) in-tank pretreatment, use of above-ground transportable compact processing units (CPUs) located adjacent to a tank farm, and fixed processing facilities. This document provides the functions and requirements for a CPU to remove cesium from tank waste as a demonstration of the CPU concept; it is therefore identified as the Cesium Demonstration Unit (CDU).

  8. The PAMELA storage and control unit

    International Nuclear Information System (INIS)

    Casolino, M.; Altamura, F.; Basili, A.; De Pascale, M.P.; Minori, M.; Nagni, M.; Picozza, P.; Sparvoli, R.; Adriani, O.; Papini, P.; Spillantini, P.; Castellini, G.; Boezio, M.

    2007-01-01

    The PAMELA Storage and Control Unit (PSCU) comprises a Central Processing Unit (CPU) and a Mass Memory (MM). The CPU of the experiment is based on an ERC-32 architecture (a SPARC v7 implementation) running a real time operating system (RTEMS). The main purpose of the CPU is to handle slow control and acquisition, and to store data on a 2 GB MM. Communications between PAMELA and the satellite are done via a 1553B bus. Data acquisition from the sub-detectors is performed via a 2 MB/s interface. Download from the PAMELA MM towards the satellite main storage unit is handled by a 16 MB/s bus. The maximum daily amount of data transmitted to ground is about 20 GB.

  9. The PAMELA storage and control unit

    Energy Technology Data Exchange (ETDEWEB)

    Casolino, M. [INFN, Structure of Rome II, Physics Department, University of Rome II ' Tor Vergata' , I-00133 Rome (Italy)]. E-mail: Marco.Casolino@roma2.infn.it; Altamura, F. [INFN, Structure of Rome II, Physics Department, University of Rome II ' Tor Vergata' , I-00133 Rome (Italy); Basili, A. [INFN, Structure of Rome II, Physics Department, University of Rome II ' Tor Vergata' , I-00133 Rome (Italy); De Pascale, M.P. [INFN, Structure of Rome II, Physics Department, University of Rome II ' Tor Vergata' , I-00133 Rome (Italy); Minori, M. [INFN, Structure of Rome II, Physics Department, University of Rome II ' Tor Vergata' , I-00133 Rome (Italy); Nagni, M. [INFN, Structure of Rome II, Physics Department, University of Rome II ' Tor Vergata' , I-00133 Rome (Italy); Picozza, P. [INFN, Structure of Rome II, Physics Department, University of Rome II ' Tor Vergata' , I-00133 Rome (Italy); Sparvoli, R. [INFN, Structure of Rome II, Physics Department, University of Rome II ' Tor Vergata' , I-00133 Rome (Italy); Adriani, O. [INFN, Structure of Florence, Physics Department, University of Florence, I-50019 Sesto Fiorentino (Italy); Papini, P. [INFN, Structure of Florence, Physics Department, University of Florence, I-50019 Sesto Fiorentino (Italy); Spillantini, P. [INFN, Structure of Florence, Physics Department, University of Florence, I-50019 Sesto Fiorentino (Italy); Castellini, G. [CNR-Istituto di Fisica Applicata ' Nello Carrara' , I-50127 Florence (Italy); Boezio, M. [INFN, Structure of Trieste, Physics Department, University of Trieste, I-34147 Trieste (Italy)

    2007-03-01

    The PAMELA Storage and Control Unit (PSCU) comprises a Central Processing Unit (CPU) and a Mass Memory (MM). The CPU of the experiment is based on an ERC-32 architecture (a SPARC v7 implementation) running a real time operating system (RTEMS). The main purpose of the CPU is to handle slow control and acquisition, and to store data on a 2 GB MM. Communications between PAMELA and the satellite are done via a 1553B bus. Data acquisition from the sub-detectors is performed via a 2 MB/s interface. Download from the PAMELA MM towards the satellite main storage unit is handled by a 16 MB/s bus. The maximum daily amount of data transmitted to ground is about 20 GB.

  10. Control of a visual keyboard using an electrocorticographic brain-computer interface.

    Science.gov (United States)

    Krusienski, Dean J; Shih, Jerry J

    2011-05-01

    Brain-computer interfaces (BCIs) are devices that enable severely disabled people to communicate and interact with their environments using their brain waves. Most studies investigating BCI in humans have used scalp EEG as the source of electrical signals and focused on motor control of prostheses or computer cursors on a screen. The authors hypothesize that the use of brain signals obtained directly from the cortical surface will more effectively control a communication/spelling task compared to scalp EEG. A total of 6 patients with medically intractable epilepsy were tested for the ability to control a visual keyboard using electrocorticographic (ECOG) signals. ECOG data collected during a P300 visual task paradigm were preprocessed and used to train a linear classifier to subsequently predict the intended target letters. The classifier was able to predict the intended target character at or near 100% accuracy using fewer than 15 stimulation sequences in 5 of the 6 people tested. ECOG data from electrodes outside the language cortex contributed to the classifier and enabled participants to write words on a visual keyboard. This is a novel finding because previous invasive BCI research in humans used signals exclusively from the motor cortex to control a computer cursor or prosthetic device. These results demonstrate that ECOG signals from electrodes both overlying and outside the language cortex can reliably control a visual keyboard to generate language output without voice or limb movements.
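
    The classification step can be sketched schematically (invented data shapes; the paper's exact preprocessing and classifier details may differ): a linear classifier is trained on target/nontarget epochs, and the candidate whose flashes score highest on average is selected:

    ```python
    # P300-style selection with a linear discriminant on epoch features.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(1)
    X_train = rng.normal(size=(300, 64))   # 300 epochs x 64 features
    y_train = rng.integers(0, 2, 300)      # 1 = target flash, 0 = nontarget
    clf = LinearDiscriminantAnalysis().fit(X_train, y_train)

    # Average the classifier score over repeated flashes of each candidate
    # letter and pick the highest-scoring one:
    scores = {letter: clf.decision_function(rng.normal(size=(15, 64))).mean()
              for letter in "ABC"}
    print(max(scores, key=scores.get))
    ```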

  11. Alphabet Writing and Allograph Selection as Predictors of Spelling in Sentences Written by Spanish-Speaking Children Who Are Poor or Good Keyboarders.

    Science.gov (United States)

    Peake, Christian; Diaz, Alicia; Artiles, Ceferino

    This study examined the relationship and degree of predictability that the fluency of writing the alphabet from memory and the selection of allographs have on measures of fluency and accuracy of spelling in a free-writing sentence task when keyboarding. The Test Estandarizado para la Evaluación de la Escritura con Teclado ("Spanish Keyboarding Writing Test"; Jiménez, 2012) was used as the assessment tool. A sample of 986 children from Grades 1 through 3 were classified according to transcription skills measured by keyboard ability (poor vs. good) across the grades. Results demonstrated that fluency in writing the alphabet and selecting allographs mediated the differences in spelling between good and poor keyboarders in the free-writing task. Execution in the allograph selection task and writing alphabet from memory had different degrees of predictability in each of the groups in explaining the level of fluency and spelling in the free-writing task sentences, depending on the grade. These results suggest that early assessment of writing by means of the computer keyboard can provide clues and guidelines for intervention and training to strengthen specific skills to improve writing performance in the early primary grades in transcription skills by keyboarding.

  12. Procedural Memory Consolidation in the Performance of Brief Keyboard Sequences

    Science.gov (United States)

    Duke, Robert A.; Davis, Carla M.

    2006-01-01

    Using two sequential key press sequences, we tested the extent to which subjects' performance on a digital piano keyboard changed between the end of training and retest on subsequent days. We found consistent, significant improvements attributable to sleep-based consolidation effects, indicating that learning continued after the cessation of…

  13. Internal Structure and Development of Keyboard Skills in Spanish-Speaking Primary-School Children With and Without LD in Writing.

    Science.gov (United States)

    Jiménez, Juan E; Marco, Isaac; Suárez, Natalia; González, Desirée

    This study had two purposes: examining the internal structure of the Test Estandarizado para la Evaluación Inicial de la Escritura con Teclado (TEVET; Spanish Keyboarding Writing Test), and analyzing the development of keyboarding skills in Spanish elementary school children with and without learning disabilities (LD) in writing. A group of 1,168 elementary school children carried out the following writing tasks: writing the alphabet in order from memory, allograph selection, word copying, writing dictated words with inconsistent spelling, writing pseudowords from dictation, and independent composition of a sentence. For this purpose, exploratory factor analysis of the TEVET was conducted. Principal component analysis with a varimax rotation identified three factors with eigenvalues greater than 1.0. Based on the factorial analysis, we analyzed keyboarding skills across grades in Spanish elementary school children with and without LD (i.e., poor handwriters, poor spellers, and mixed writers, each compared with typically achieving writers). The results indicated that poor handwriters did not differ from typically achieving writers in the phonological processing, visual-orthographic processing, and sentence production components of keyboarding. The educational implications of the findings are analyzed with regard to the acquisition of keyboarding skills in children with and without LD in transcription.

  14. Internal Structure and Development of Keyboard Skills in Spanish-Speaking Primary-School Children with and without LD in Writing

    Science.gov (United States)

    Jiménez, Juan E.; Marco, Isaac; Suárez, Natalia; González, Desirée

    2017-01-01

    This study had two purposes: examining the internal structure of the "Test Estandarizado para la Evaluación Inicial de la Escritura con Teclado" (TEVET; Spanish Keyboarding Writing Test), and analyzing the development of keyboarding skills in Spanish elementary school children with and without learning disabilities (LD) in writing. A…

  15. Do you know where your fingers have been? Explicit knowledge of the spatial layout of the keyboard in skilled typists.

    Science.gov (United States)

    Liu, Xianyun; Crump, Matthew J C; Logan, Gordon D

    2010-06-01

    Two experiments evaluated skilled typists' ability to report knowledge about the layout of keys on a standard keyboard. In Experiment 1, subjects judged the relative direction of letters on the computer keyboard. One group of subjects was asked to imagine the keyboard, one group was allowed to look at the keyboard, and one group was asked to type the letter pair before judging relative direction. The imagine group had larger angular error and longer response time than both the look and touch groups. In Experiment 2, subjects placed one key relative to another. Again, the imagine group had larger angular error, larger distance error, and longer response time than the other groups. The two experiments suggest that skilled typists have poor explicit knowledge of key locations. The results are interpreted in terms of a model with two hierarchical parts in the system controlling typewriting.

  16. The CMSSW benchmarking suite: Using HEP code to measure CPU performance

    International Nuclear Information System (INIS)

    Benelli, G

    2010-01-01

    The demanding computing needs of the CMS experiment require thoughtful planning and management of its computing infrastructure. A key factor in this process is the use of realistic benchmarks when assessing the computing power of the different architectures available. In recent years a discrepancy has been observed between the CPU performance estimates given by the reference benchmark for HEP computing (SPECint) and the actual performance of HEP code. Making use of the CPU performance tools from the CMSSW performance suite, comparative CPU performance studies have been carried out on several architectures. A benchmarking suite has been developed and integrated in the CMSSW framework to allow computing centers and interested third parties to benchmark architectures directly with CMSSW. The CMSSW benchmarking suite can be used out of the box to test and compare several machines in terms of CPU performance and to report the different benchmarking scores (e.g., by processing step) at the desired level of detail. In this talk we briefly describe the CMSSW software performance suite and, in more detail, the CMSSW benchmarking suite client/server design, the performance data analysis, and the available CMSSW benchmark scores. The experience in using HEP code for benchmarking is discussed and CMSSW benchmark results are presented.

  17. Thermoeconomic cost analysis of CO_2 compression and purification unit in oxy-combustion power plants

    International Nuclear Information System (INIS)

    Jin, Bo; Zhao, Haibo; Zheng, Chuguang

    2015-01-01

    Highlights: • Thermoeconomic cost analysis for the CO_2 compression and purification unit is conducted. • Exergy costs and thermoeconomic costs arise in the flash separation and mixing processes. • Unit exergy costs for the flash separator and multi-stream heat exchanger are identical. • The multi-stage CO_2 compressor contributes to the minimum unit exergy cost. • Thermoeconomic performance of the optimized CPU is enhanced. - Abstract: High-purity CO_2 products can be obtained from oxy-combustion power plants through a CO_2 compression and purification unit (CPU) based on the phase separation method. To identify the cost formation process and potential energy savings for the CPU, a detailed thermoeconomic cost analysis based on the structural theory of thermoeconomics is applied to an optimized CPU (with double flash separators). It is found that the largest unit exergy cost occurs in the first separation process, while the multi-stage CO_2 compressor contributes to the minimum unit exergy cost. In the two flash separation processes, the unit exergy costs for the flash separator and multi-stream heat exchanger are identical, but their unit thermoeconomic costs differ once the monetary cost of each device is considered. The cost inefficiency of the CPU derives mainly from the large exergy and thermoeconomic costs of the flash separation and mixing processes. Compared with an unoptimized CPU, the thermoeconomic performance of the optimized CPU is enhanced, and a maximum reduction of 5.18% in thermoeconomic cost is attained. To achieve cost-effective operation, measures should be taken to improve the flash separation and mixing processes.
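
    As a companion to the analysis above, the sketch below shows the generic cost-balance bookkeeping used in exergoeconomics; the function names, device values, and rates are illustrative assumptions, not figures from the paper.

```python
def unit_exergy_cost(k_fuel, exergy_fuel, exergy_product):
    """k_P = k_F * E_F / E_P (exergy units consumed per unit of product)."""
    return k_fuel * exergy_fuel / exergy_product

def unit_thermoeconomic_cost(fuel_cost_rate, capital_rate, exergy_product):
    """c_P = (C_F + Z) / E_P: identical unit exergy costs can still differ in
    money once the device's own monetary rate Z is added, as the abstract notes."""
    return (fuel_cost_rate + capital_rate) / exergy_product

# Illustrative device: 1200 kW of fuel exergy in, 1000 kW of product out.
k_P = unit_exergy_cost(k_fuel=1.0, exergy_fuel=1200.0, exergy_product=1000.0)
c_P = unit_thermoeconomic_cost(fuel_cost_rate=60.0, capital_rate=6.0,
                               exergy_product=1000.0)
print(f"unit exergy cost: {k_P:.2f}, unit thermoeconomic cost: {c_P:.3f} $/kWh")
```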

  18. SU-E-J-60: Efficient Monte Carlo Dose Calculation On CPU-GPU Heterogeneous Systems

    Energy Technology Data Exchange (ETDEWEB)

    Xiao, K; Chen, D. Z; Hu, X. S [University of Notre Dame, Notre Dame, IN (United States); Zhou, B [Altera Corp., San Jose, CA (United States)

    2014-06-01

    Purpose: It is well known that the performance of GPU-based Monte Carlo dose calculation implementations is bounded by memory bandwidth. One major cause of this bottleneck is the random memory-writing pattern of dose deposition, which leads to several memory-efficiency issues on the GPU, such as un-coalesced writes and atomic operations. We propose a new method to alleviate these issues on CPU-GPU heterogeneous systems, achieving an overall performance improvement for Monte Carlo dose calculation. Methods: Dose deposition accumulates dose into the voxels of a dose volume along the trajectories of radiation rays. Our idea is to partition this procedure into the following three steps, each fine-tuned for the CPU or GPU: (1) each GPU thread writes dose results with location information to a buffer in GPU memory, which achieves fully coalesced and atomic-free memory transactions; (2) the dose results in the buffer are transferred to CPU memory; (3) the dose volume is constructed from the dose buffer on the CPU. We organize the processing of all radiation rays into streams. Since the steps within a stream use different hardware resources (i.e., GPU, DMA, CPU), we can overlap the execution of these steps for different streams by pipelining. Results: We evaluated our method using a Monte Carlo Convolution Superposition (MCCS) program and tested our implementation for various clinical cases on a heterogeneous system containing an Intel i7 quad-core CPU and an NVIDIA TITAN GPU. Compared with a straightforward MCCS implementation on the same system (using both CPU and GPU for radiation ray tracing), our method gained a 2-5X speedup without losing dose calculation accuracy. Conclusion: The results show that our new method improves the effective memory bandwidth and overall performance of MCCS on CPU-GPU systems. Our proposed method can also be applied to accelerate other Monte Carlo dose calculation approaches. This research was supported in part by NSF under Grants CCF
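
    A minimal CPU-only sketch of the three-step deposition split described above, assuming synthetic (voxel, dose) records; np.add.at plays the role of the host-side accumulation that replaces atomic adds on the GPU.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_records = 1_000_000, 5_000_000

# Step 1 (GPU in the paper): threads emit coalesced (voxel, dose) records
# into a buffer instead of atomically updating the dose volume.
voxel_idx = rng.integers(0, n_voxels, size=n_records)
dose_vals = rng.random(n_records)

# Step 2: the buffer would be DMA-transferred to host memory (a no-op here).

# Step 3 (CPU): fold the buffer into the volume; np.add.at correctly
# accumulates repeated voxel indices, replacing the GPU-side atomics.
dose_volume = np.zeros(n_voxels)
np.add.at(dose_volume, voxel_idx, dose_vals)
print(f"deposited {dose_vals.sum():.1f}, volume holds {dose_volume.sum():.1f}")
```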

  19. An experimental Dutch keyboard-to-speech system for the speech impaired

    NARCIS (Netherlands)

    Deliege, R.J.H.

    1989-01-01

    An experimental Dutch keyboard-to-speech system has been developed to explore the possibilities and limitations of Dutch speech synthesis in a communication aid for the speech impaired. The system uses diphones and a formant synthesizer chip for speech synthesis. Input to the system is in…

  20. Psychomotor Impairment Detection via Finger Interactions with a Computer Keyboard During Natural Typing

    Science.gov (United States)

    Giancardo, L.; Sánchez-Ferro, A.; Butterworth, I.; Mendoza, C. S.; Hooker, J. M.

    2015-04-01

    Modern digital devices and appliances are capable of monitoring the timing of button presses, or finger interactions in general, with sub-millisecond accuracy. However, the massive amount of high-resolution temporal information that these devices could collect is currently being discarded. Multiple studies have shown that the act of pressing a button triggers well-defined brain areas that are known to be affected by motor-compromised conditions. In this study, we demonstrate that daily interaction with a computer keyboard can be employed as a means to observe and potentially quantify psychomotor impairment. We induced psychomotor impairment via a sleep inertia paradigm in 14 healthy subjects, and it is detected by our classifier with an area under the ROC curve (AUC) of 0.93/0.91. The detection relies on novel features derived from key-hold times acquired on standard computer keyboards during an uncontrolled typing task. These features correlate with the progression of psychomotor impairment (p < 0.001) regardless of the content and language of the text typed, and perform consistently with different keyboards. The ability to acquire longitudinal measurements of subtle motor changes from a digital device without altering its functionality may allow for early screening and follow-up of motor-compromised neurodegenerative conditions, psychological disorders, or intoxication at negligible cost in the general population.
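
    The following is a hedged sketch, not the authors' pipeline: it derives key-hold-time summary features from synthetic (press, release) event streams and scores a simple classifier by AUC; all distributions and feature choices are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

def hold_time_features(press, release):
    """Summarize one typing bout by statistics of its key-hold times (s)."""
    hold = release - press
    return [hold.mean(), hold.std(), np.median(hold), np.percentile(hold, 90)]

# Synthetic bouts: "impaired" bouts get longer, more variable hold times.
X, y = [], []
for label, (mu, sd) in enumerate([(0.085, 0.02), (0.110, 0.04)]):
    for _ in range(200):
        press = np.cumsum(rng.exponential(0.25, size=60))
        release = press + rng.normal(mu, sd, size=60).clip(0.02)
        X.append(hold_time_features(press, release))
        y.append(label)

X_tr, X_te, y_tr, y_te = train_test_split(np.array(X), np.array(y),
                                          random_state=0, stratify=y)
clf = LogisticRegression().fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```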

  1. Joint Optimized CPU and Networking Control Scheme for Improved Energy Efficiency in Video Streaming on Mobile Devices

    Directory of Open Access Journals (Sweden)

    Sung-Woong Jo

    2017-01-01

    Full Text Available Video streaming is one of the most popular applications for mobile users. However, mobile video streaming consumes a great deal of energy, resulting in reduced battery life; this is a critical problem that degrades the user's quality of experience (QoE). Therefore, in this paper, a joint optimization scheme that controls both the central processing unit (CPU) and the wireless networking of the video streaming process is proposed to improve energy efficiency on mobile devices. For this purpose, the energy consumption of the network interface and CPU is analyzed, and, based on the energy consumption profile, a joint optimization problem is formulated to maximize the energy efficiency of the mobile device. The proposed algorithm adaptively adjusts the number of chunks to be downloaded and decoded in each packet. Simulation results show that the proposed algorithm effectively improves energy efficiency compared with existing algorithms.
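
    A toy model of the joint control idea, under the assumption that each radio wake-up and each decode burst carries a fixed energy cost that batching amortizes, while very large batches incur a quality-of-experience penalty; all constants are invented for illustration.

```python
def joules_per_chunk(batch,
                     radio_wakeup_j=1.2,     # fixed cost per radio wake-up
                     radio_per_chunk_j=0.15,
                     cpu_base_j=0.4,         # fixed cost per decode burst
                     cpu_per_chunk_j=0.25,
                     stall_risk_j=0.02):     # QoE penalty grows with batch
    fixed = radio_wakeup_j + cpu_base_j
    linear = (radio_per_chunk_j + cpu_per_chunk_j) * batch
    risk = stall_risk_j * batch ** 2         # large batches risk rebuffering
    return (fixed + linear + risk) / batch

best = min(range(1, 33), key=joules_per_chunk)
print(f"energy-optimal batch: {best} chunks "
      f"({joules_per_chunk(best):.3f} J per chunk)")
```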

  2. Use of general purpose graphics processing units with MODFLOW

    Science.gov (United States)

    Hughes, Joseph D.; White, Jeremy T.

    2013-01-01

    To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete lower-upper (LU) factorization, modified incomplete LU factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates that GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates that GPGPU speedups exceed the parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
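
    A small sketch of the solver configuration described above, using SciPy stand-ins: conjugate gradients on a compressed-sparse-row matrix with a Jacobi (diagonal) preconditioner. The model matrix is a synthetic SPD stencil, not a MODFLOW system.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg

n = 10_000                                   # "active grid cells" (illustrative)
A = sp.diags([-np.ones(n - 1), 4.0 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format="csr")       # synthetic SPD CSR matrix
b = np.ones(n)

inv_diag = 1.0 / A.diagonal()                # Jacobi preconditioner M^{-1}
M = LinearOperator((n, n), matvec=lambda v: inv_diag * v)

x, info = cg(A, b, M=M)
print("cg info:", info, "residual:", np.linalg.norm(b - A @ x))
```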

  3. Keyboard Instruments and Instrumentalists in Manila (1581-1798

    Directory of Open Access Journals (Sweden)

    Irving, David

    2005-12-01

    Full Text Available While countless keyboard instruments were made in the Philippine Islands or imported there during the Spanish colonial period (1565-1898, most of those which survive date from the nineteenth century. It should be noted, however, that keyboard music flourished in the Philippines from the early years of Spanish presence. Instruments began to arrive in the late-sixteenth century, and over the following two centuries, numerous organs were manufactured in mission schools throughout the archipelago. The final years of the eighteenth century were a high point in colonial instrument-building, moreover, with the arrival of the Recollect missionary and instrument-builder Diego Cera de la Virgen del Carmen in 1792 and the establishment of his workshop. While some evidence remains fragmentary, there is still a great deal of archival information to be pieced together, and a comprehensive survey of keyboard instruments and instrumentalists present in colonial Manila remains to be undertaken. This article attempts to address this lacuna in part, covering the period between 1581 and 1798.

    Although many keyboard instruments were made in the Philippines or imported to those islands during the Spanish colonial period (1565-1898), most of those that still exist date from the nineteenth century. It should be borne in mind, however, that keyboard music flourished in the Philippines from the very beginning of the Spanish presence. Various instruments began to arrive toward the end of the sixteenth century, and over the following two centuries several organs were built in the mission schools of the archipelago. At the end of the eighteenth century many instruments were built in the colonies, above all after the arrival of the missionary and instrument builder Diego Cera de la Virgen del Carmen in 1792 and the establishment of his workshop. Although only incomplete information is available, there is still much…

  4. Inhibition of CPU0213, a Dual Endothelin Receptor Antagonist, on Apoptosis via Nox4-Dependent ROS in HK-2 Cells

    Directory of Open Access Journals (Sweden)

    Qing Li

    2016-06-01

    Full Text Available Background/Aims: Our previous studies have indicated that the novel endothelin receptor antagonist CPU0213 effectively normalizes renal function in diabetic nephropathy. However, the molecular mechanisms mediating the nephroprotective role of CPU0213 remain unknown. Methods and Results: In the present study, we first examined the effect of CPU0213 on apoptosis in human renal tubular epithelial cells (HK-2). High glucose significantly increased Bax protein expression and decreased Bcl-2 protein in HK-2 cells, an effect that was reversed by CPU0213. The percentage of HK-2 cells showing Annexin V-FITC binding was markedly suppressed by CPU0213, confirming its inhibitory effect on apoptosis. Given the regulation of oxidative stress by the endothelin (ET) system, we examined the role of redox signaling in the action of CPU0213 on apoptosis. The production of superoxide (O2·−) was substantially attenuated by CPU0213 treatment in HK-2 cells. We further found that CPU0213 dramatically inhibited the expression of Nox4 protein, and Nox4 gene silencing mimicked the effect of CPU0213 on apoptosis under high-glucose stimulation. Finally, we examined the effect of CPU0213 on ET-1 receptors and found that the high glucose-induced protein expression of endothelin A and B receptors was dramatically inhibited by CPU0213. Conclusion: Taken together, these results suggest that Nox4-dependent O2·− production is critical for the apoptosis of HK-2 cells under high glucose. The endothelin receptor antagonist CPU0213 exerts an anti-apoptotic effect through Nox4-dependent O2·− production, which underlies the nephroprotective role of CPU0213 in diabetic nephropathy.

  5. An efficient implementation of 3D high-resolution imaging for large-scale seismic data with GPU/CPU heterogeneous parallel computing

    Science.gov (United States)

    Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng

    2018-02-01

    De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, computational efficiency is still the main problem in processing 3D, high-resolution images from real large-scale seismic data. In this paper, we propose a division method for large-scale 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPUs). We then design an imaging-point parallel strategy to achieve optimal parallel computing performance, and adopt an asynchronous double-buffering scheme for multi-stream GPU/CPU parallel computing. Moreover, several key optimization strategies for computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFU), greatly improved the efficiency. A numerical example employing real large-scale 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.
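
    The asynchronous double-buffering idea can be sketched with two rotating buffers and a worker thread standing in for the GPU stream; everything below (buffer sizes, the compute stand-in) is illustrative.

```python
import queue
import threading
import numpy as np

free_bufs, ready_bufs = queue.Queue(), queue.Queue()
for _ in range(2):                      # "double" buffering: two buffers rotate
    free_bufs.put(np.empty(1_000_000))

def device_worker():                    # stands in for the GPU stream
    while True:
        buf = ready_bufs.get()
        if buf is None:
            break
        _ = buf.sum()                   # stand-in for the migration kernel
        free_bufs.put(buf)              # recycle the buffer back to the host

t = threading.Thread(target=device_worker)
t.start()
for shot in range(8):                   # host side: stream shot gathers
    buf = free_bufs.get()               # blocks only when both buffers are busy
    buf[:] = float(shot)                # stand-in for loading seismic traces
    ready_bufs.put(buf)                 # hand off; host immediately refills next
ready_bufs.put(None)                    # sentinel: no more work
t.join()
print("streamed 8 shot gathers through 2 rotating buffers")
```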

  6. SAFARI digital processing unit: performance analysis of the SpaceWire links in case of a LEON3-FT based CPU

    Science.gov (United States)

    Giusi, Giovanni; Liu, Scige J.; Di Giorgio, Anna M.; Galli, Emanuele; Pezzuto, Stefano; Farina, Maria; Spinoglio, Luigi

    2014-08-01

    SAFARI (SpicA FAR infrared Instrument) is a far-infrared imaging Fourier Transform Spectrometer for the SPICA mission. The Digital Processing Unit (DPU) of the instrument implements the functions of controlling the overall instrument and of performing the science data compression and packing. The DPU design is based on the use of a LEON family processor. In SAFARI, all instrument components are connected to the central DPU via SpaceWire links. On these links, science data, housekeeping and command flows are in some cases multiplexed; therefore the interface control shall be able to cope with variable throughput needs. The effective data transfer workload can be an issue for overall system performance and becomes a critical parameter for the on-board software design, both at the application-layer level and at lower, more HW-related, levels. To analyze the system behavior in the presence of the expected, demanding SAFARI science data flow, we carried out a series of performance tests using the standard GR-CPCI-UT699 LEON3-FT Development Board, provided by Aeroflex/Gaisler, connected to the emulator of the SAFARI science data links in a point-to-point topology. Two different communication protocols were used in the tests: the ECSS-E-ST-50-52C RMAP protocol and an internally defined one, the SAFARI internal data handling protocol. An incremental approach was adopted to measure the system performance at different levels of communication protocol complexity. In all cases the performance was evaluated by measuring the CPU workload and the bus latencies. The tests were executed initially in a custom low-level execution environment and finally using the Real-Time Executive for Multiprocessor Systems (RTEMS), which has been selected as the operating system to be used onboard SAFARI. The preliminary results of the performance analysis confirmed the possibility of using a LEON3 CPU processor in the SAFARI DPU, but pointed out, in agreement

  7. Using the CPU and GPU for real-time video enhancement on a mobile computer

    CSIR Research Space (South Africa)

    Bachoo, AK

    2010-09-01

    Full Text Available In this paper, the current advances in mobile CPU and GPU hardware are used to implement video enhancement algorithms in a new way on a mobile computer. Both the CPU and GPU are used effectively to achieve real-time performance for complex image enhancement...

  8. An evaluation of touchscreen versus keyboard/mouse interaction for large screen process control displays.

    Science.gov (United States)

    Noah, Benjamin; Li, Jingwen; Rothrock, Ling

    2017-10-01

    The objectives of this study were to test the effect of interaction device on performance in a process control task (managing a tank farm). The study compared two conditions: (a) a 4K-resolution 55″ screen with a 21″ touchscreen versus (b) a 4K-resolution 55″ screen with keyboard/mouse. The touchscreen acted both as an interaction device for data entry and navigation and as an additional source of information. A within-subject experiment was conducted among 20 college engineering students. A primary task of preventing tanks from overfilling and a secondary task of manual logging with situation awareness questions were designed for the study. The measures were primary task performance (tank level at discharge, number of tanks discharged, and performance score), secondary task performance (tank log count and performance score), system interaction times, subjective workload, a situation awareness questionnaire, and a user experience survey on usability and condition comparison. Parametric data yielded statistically different means between the two conditions on two metrics: the 4K-keyboard condition resulted in faster detection + navigation time than the 4K-touchscreen condition, by about 2 s, while participants in the 4K-touchscreen condition were about 2 s faster in data entry than in the 4K-keyboard condition. No significant results were found for performance on the secondary task, situation awareness, or workload, and no clear significant differences were found in the non-parametric data analysis. However, participants showed a slight preference for the 4K-touchscreen condition over the 4K-keyboard condition in subjective comparisons of the conditions. Introducing the touchscreen as an additional/alternative input device had an effect on interaction times, which suggests that proper design considerations need to be made. While having values shown on the interaction device

  9. LHCb: Statistical Comparison of CPU performance for LHCb applications on the Grid

    CERN Multimedia

    Graciani, R

    2009-01-01

    The usage of CPU resources by LHCb on the Grid is dominated by two different applications: Gauss and Brunel. Gauss is the application performing the Monte Carlo simulation of proton-proton collisions. Brunel is the application responsible for the reconstruction of the signals recorded by the detector, converting them into objects that can be used for later physics analysis of the data (tracks, clusters, …). Both applications are based on the Gaudi and LHCb software frameworks. Gauss uses Pythia and Geant as underlying libraries for the simulation of the collision and the subsequent passage of the generated particles through the LHCb detector, while Brunel makes use of LHCb-specific code to process the data from each sub-detector. Both applications are CPU bound. Large Monte Carlo productions or data reconstructions running on the Grid are an ideal benchmark for comparing the performance of different CPU models for each case. Since the processed events are only statistically comparable, only statistical comparison of the...

  10. An Experimental Study of a Six Key Handprint Chord Keyboard.

    Science.gov (United States)

    1986-05-01

    … The three dependent measures for analysis (sequence time, list time, and errors) are best divided by group of tests, beginning or ending; this division forms a logical outline. … Subjects were not accomplished pianists; due to the limited amount of time at the keyboard that volunteers were willing to endure, asymptotic behavior was not reached. …

  11. CPU and cache efficient management of memory-resident databases

    NARCIS (Netherlands)

    Pirk, H.; Funke, F.; Grund, M.; Neumann, T.; Leser, U.; Manegold, S.; Kemper, A.; Kersten, M.L.

    2013-01-01

    Memory-Resident Database Management Systems (MRDBMS) have to be optimized for two resources: CPU cycles and memory bandwidth. To optimize for bandwidth in mixed OLTP/OLAP scenarios, the hybrid or Partially Decomposed Storage Model (PDSM) has been proposed. However, in current implementations,

  12. CPU and Cache Efficient Management of Memory-Resident Databases

    NARCIS (Netherlands)

    H. Pirk (Holger); F. Funke; M. Grund; T. Neumann (Thomas); U. Leser; S. Manegold (Stefan); A. Kemper (Alfons); M.L. Kersten (Martin)

    2013-01-01

    Memory-Resident Database Management Systems (MRDBMS) have to be optimized for two resources: CPU cycles and memory bandwidth. To optimize for bandwidth in mixed OLTP/OLAP scenarios, the hybrid or Partially Decomposed Storage Model (PDSM) has been proposed. However, in current

  13. The relationship among CPU utilization, temperature, and thermal power for waste heat utilization

    International Nuclear Information System (INIS)

    Haywood, Anna M.; Sherbeck, Jon; Phelan, Patrick; Varsamopoulos, Georgios; Gupta, Sandeep K.S.

    2015-01-01

    Highlights: • This work graphs the triad relationship among CPU utilization, temperature, and power. • Using a custom-built cold plate, we were able to capture high-quality CPU-generated heat. • The work takes a radical approach, using mineral oil to directly cool CPUs. • We found that it is possible to use CPU waste energy to power an absorption chiller. - Abstract: This work addresses the significant datacenter issues of growth in the number of computer servers and the subsequent electricity expenditure by proposing, analyzing, and testing the idea of recycling the highest-quality waste heat generated by datacenter servers. The aim was to provide a renewable and sustainable energy source for use in cooling the datacenter. The work incorporates novel approaches to waste heat usage, graphs CPU temperature, power, and utilization simultaneously, and presents a mineral oil experimental design and implementation. The work investigates and illustrates the quantity and quality of heat that can be captured from a variably tasked liquid-cooled microprocessor on a datacenter server blade. The trials examine the feasibility of using the thermal energy from a CPU to drive a cooling process. Results indicate that 123 servers encapsulated in mineral oil can power a 10-ton chiller with a design point of 50.2 kW_th. Compared with water-cooling experiments, the mineral oil experiment reduced the temperature drop between the heat source and the discharge line by up to 81%. In addition, due to this reduction in temperature drop, the heat quality in the oil discharge line was on average up to 12.3 °C higher than in the water-cooled experiments. Furthermore, mineral oil cooling holds the potential to eliminate the 50% cooling expenditure that initially motivated this project
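
    A one-line arithmetic check of the sizing claim, using only the numbers quoted in the abstract:

```python
chiller_design_kw_th = 50.2   # chiller thermal design point quoted above
servers = 123                 # oil-immersed servers quoted above
# Each server must deliver roughly 50.2 kW / 123 ≈ 0.41 kW of usable heat.
print(f"required usable heat per server: {chiller_design_kw_th / servers * 1000:.0f} W")
```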

  14. High performance technique for database applicationsusing a hybrid GPU/CPU platform

    KAUST Repository

    Zidan, Mohammed A.

    2012-07-28

    Many database applications, such as sequence comparison, sequence searching, and sequence matching, process large database sequences. We introduce a novel and efficient technique to improve the performance of database applications by using a hybrid GPU/CPU platform. In particular, our technique solves the problem of the low efficiency resulting from running short-length sequences in a database on a GPU. To verify our technique, we applied it to the widely used Smith-Waterman algorithm. The experimental results show that our hybrid GPU/CPU technique improves the average performance by a factor of 2.2 and the peak performance by a factor of 2.8 when compared to earlier implementations. Copyright © 2011 by ASME.

  15. Computerized control and warning system for uranium mines

    International Nuclear Information System (INIS)

    Sheeran, C.T.; Franklin, J.C.

    1982-01-01

    A commercially available microprocessor-based system capable of monitoring 512 channels has been interfaced with monitors for radon, working level, air velocity, and fan power. The basic system utilizes both Z80 and 8080 microprocessors in a desktop central processing unit (CPU). The CPU hardware includes a keyboard, video display (CRT), and disk drive; a printer is used to keep permanent data records. Signals from all channels are transmitted to the computer in digital form, where they are processed for alarm status. Software developed for the system provides audiovisual alarms in the event of low or high readings, rate change, change of state, or communication failure. Up to six channels can be continuously displayed on the CRT for current readings. Shift reports and trend logs may be generated to help the ventilation engineer determine average working levels and ventilation effectiveness. Additional software permits the operator to program command sequences that may be used to automatically restart fans after a power outage
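
    A hedged sketch of the alarm logic described above; the channel names, limits, rate thresholds, and the read_channel() stub are hypothetical, not the actual system interfaces.

```python
import random
import time

LIMITS = {"radon": (0.0, 0.3), "working_level": (0.0, 0.25),
          "air_velocity": (0.5, 10.0), "fan_power": (5.0, 100.0)}
MAX_RATE = {"radon": 0.05, "working_level": 0.05,
            "air_velocity": 2.0, "fan_power": 20.0}

def read_channel(name):          # stub standing in for the digital telemetry
    lo, hi = LIMITS[name]
    return random.uniform(lo * 0.8, hi * 1.1)

last = {}
for _ in range(3):               # a few polling sweeps over all channels
    for name in LIMITS:
        value = read_channel(name)
        lo, hi = LIMITS[name]
        if not lo <= value <= hi:                          # low/high alarm
            print(f"ALARM {name}: reading {value:.2f} outside [{lo}, {hi}]")
        if name in last and abs(value - last[name]) > MAX_RATE[name]:
            print(f"ALARM {name}: rate change {abs(value - last[name]):.2f}")
        last[name] = value
    time.sleep(0.1)
```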

  16. Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

    Science.gov (United States)

    Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

    2010-10-01

    Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel Core 2 Quad Q6600 CPU and a GeForce 8800GT GPU, with software support from OpenMP and CUDA. It was tested in three parallelization setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus one core of the CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setup (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setup (a), 16.8 in setup (b), and 20.0 in setup (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
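
    The load-prediction dynamic scheduling idea can be sketched as follows: split each batch between two devices in proportion to their predicted throughput and refresh the prediction from measured execution times. The "devices" below are plain functions and all speeds are invented.

```python
import time

def run_kernel(speed, items):             # stand-in for a CPU or GPU kernel
    time.sleep(items / speed / 1e4)       # scaled down so the demo is quick
    return items

true_speed = {"cpu": 400.0, "gpu": 6400.0}   # hidden device speeds (items/s)
predicted = {"cpu": 1.0, "gpu": 1.0}         # initial throughput guesses

for step in range(5):
    total = 10_000
    gpu_share = predicted["gpu"] / (predicted["gpu"] + predicted["cpu"])
    work = {"gpu": int(total * gpu_share)}
    work["cpu"] = total - work["gpu"]
    for dev, items in work.items():
        t0 = time.perf_counter()
        run_kernel(true_speed[dev], items)
        dt = time.perf_counter() - t0
        # exponential smoothing of measured throughput = the load prediction
        predicted[dev] = 0.5 * predicted[dev] + 0.5 * (items / dt)
    print(f"step {step}: split {work}")
```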

  17. Keyboarding Instruction at NABTE Institutions: Are We Teaching Techniques to Reduce CTD Incidence?

    Science.gov (United States)

    Blaszczynski, Carol; Joyce, Marguerite Shane

    1996-01-01

    Responses from 157 of 193 business teachers who teach keyboarding indicated that 78.7% were aware of cumulative trauma disorder and 22% had experienced it. Only 13% of classrooms were equipped with wrist rests. About 53% teach techniques to reduce incidence, but 20% did not know whether they taught preventive measures. (SK)

  18. Integrating Piano Keyboarding into the Elementary Classroom: Effects on Memory Skills and Sentiment Toward School.

    Science.gov (United States)

    Marcinkiewicz, Henryk R.; And Others

    1995-01-01

    Discovered that the introduction of piano keyboarding into elementary school music instruction produced a positive effect regarding children's sentiment towards school. No discernible effect was revealed concerning memory skills. Includes statistical data and description of survey questionnaires. (MJP)

  19. Turbo Charge CPU Utilization in Fork/Join Using the ManagedBlocker

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Fork/Join is a framework for parallelizing calculations using recursive decomposition, also called divide and conquer. These algorithms occasionally end up duplicating work, especially at the beginning of the run. We can reduce wasted CPU cycles by implementing a reserved caching scheme. Before a task starts its calculation, it tries to reserve an entry in the shared map. If it is successful, it immediately begins. If not, it blocks until the other thread has finished its calculation. Unfortunately this might result in a significant number of blocked threads, decreasing CPU utilization. In this talk we will demonstrate this issue and offer a solution in the form of the ManagedBlocker. Combined with Fork/Join, it can keep parallelism at the desired level.
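
    A sketch of the reserve-then-compute-or-wait idea in Python rather than Java (the talk itself concerns the Fork/Join ManagedBlocker): the first task to reserve a key computes it, and later tasks block on the reservation instead of duplicating the work. Those blocked waiters are precisely why, on a bounded Fork/Join pool, Java needs a ManagedBlocker to keep parallelism up.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

cache: dict[int, Future] = {}          # shared map of reservations
lock = threading.Lock()

def fib(n: int) -> int:                # deliberately naive shared work
    if n < 2:
        return n
    with lock:                         # try to reserve the entry for n
        fut = cache.get(n)
        if fut is None:
            cache[n] = fut = Future()
            owner = True
        else:
            owner = False
    if owner:                          # winner computes exactly once
        fut.set_result(fib(n - 1) + fib(n - 2))
    return fut.result()                # losers block here instead of recomputing

with ThreadPoolExecutor(max_workers=8) as pool:
    print(sorted(pool.map(fib, [30, 30, 31, 31, 32, 32])))
```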

  20. Information security governance simplified from the boardroom to the keyboard

    CERN Document Server

    Fitzgerald, Todd

    2011-01-01

    Security practitioners must be able to build cost-effective security programs while also complying with government regulations. Information Security Governance Simplified: From the Boardroom to the Keyboard lays out these regulations in simple terms and explains how to use control frameworks to build an air-tight information security (IS) program and governance structure. Defining the leadership skills required by IS officers, the book examines the pros and cons of different reporting structures and highlights the various control frameworks available. It details the functions of the security d

  1. A high performance image processing platform based on a CPU-GPU heterogeneous cluster with parallel image reconstruction for micro-CT

    International Nuclear Information System (INIS)

    Ding Yu; Qi Yujin; Zhang Xuezhu; Zhao Cuilan

    2011-01-01

    In this paper, we report the development of a high-performance image processing platform based on a CPU-GPU heterogeneous cluster. Currently, it consists of Dell Precision T7500 and HP XW8600 workstations with a parallel programming and runtime environment using the message-passing interface (MPI) and CUDA (Compute Unified Device Architecture). We succeeded in developing parallel image processing techniques for 3D image reconstruction in X-ray micro-CT imaging. The results show that a GPU is about 194 times faster than a single CPU, and the CPU-GPU cluster is about 46 times faster than the CPU cluster. These meet the requirements of rapid 3D image reconstruction and real-time image display. In conclusion, the use of a CPU-GPU heterogeneous cluster is an effective way to build a high-performance image processing platform. (authors)

  2. 78 FR 70320 - Certain Mobile Handset Devices and Related Touch Keyboard Software; Commission Determination Not...

    Science.gov (United States)

    2013-11-25

    ... INTERNATIONAL TRADE COMMISSION [Investigation No. 337-TA-864] Certain Mobile Handset Devices and Related Touch Keyboard Software; Commission Determination Not To Review an Initial Determination... and Personal Communications Devices, LLC (``PCD'') of Hauppauge, New York as respondents. PCD has been...

  3. Development of automatic nuclear plate analyzing system equipped with TV measuring unit and its application to analysis of elementary particle reaction, 1

    International Nuclear Information System (INIS)

    Ushida, Noriyuki

    1987-01-01

    Various improvements are made on an analysis system which was previously reported. Twenty five emulsion plates, each with a decreased size of 3 cm x 3 cm, are mounted on a single acrylic resin sheet to reduce the required measurement time. An interface called New DOMS (digitized on-line microscope) is designed to reduce the analysis time and to improve the reliability of the analysis. The newly developed analysis system consists of five blocks: a stage block (with a measuring range of 170 mm along the x and y axes and 2 mm along the z axis and an accuracy of 1 μm for each axis), a DG-M10 host computer (with external storage of a 15M byte hard disk and a 368k byte minifloppy disk), the DOMS interface (for control of the stage, operation of the graphic image and control of the CCD TV measuring unit), the CCD TV measuring unit (equipped with a CCD TV camera to display the observed emulsion on a TV monitor for measuring the grain position), and a measurement terminal (consisting of a picture monitor, video terminal module and keyboards). This report also shows a DOMS system function block diagram (crate controller and I/O, phase converter, motor controller, sub CPU for display, graphic memory, ROM writer, power supply), describes the CCD TV measuring unit hardware (CCD TV camera, sync. separator, window generator, darkest point detector, mixer, focus counter), and outlines the connections among the components. (Nogami, K.)

  4. An Investigation of the Performance of the Colored Gauss-Seidel Solver on CPU and GPU

    International Nuclear Information System (INIS)

    Yoon, Jong Seon; Choi, Hyoung Gwon; Jeon, Byoung Jin

    2017-01-01

    The performance of the colored Gauss–Seidel solver on CPU and GPU was investigated for two- and three-dimensional heat conduction problems using different mesh sizes. The heat conduction equation was discretized by the finite difference method and the finite element method. The CPU yielded good performance for small problems but deteriorated when the total memory required for computing was larger than the cache memory for large problems. In contrast, the GPU performed better as the mesh size increased because of the latency-hiding technique. Further, GPU computation with the colored Gauss–Seidel solver was approximately 7 times faster than that with a single CPU. Furthermore, the colored Gauss–Seidel solver was found to be approximately twice as fast as the Jacobi solver when parallel computing was conducted on the GPU.
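
    A compact sketch of what "colored" means here: a red-black Gauss–Seidel sweep for a 2D heat-conduction grid, in which cells of one color depend only on cells of the other color, so each half-sweep can be updated in bulk (and hence in parallel). Grid size and boundary values are illustrative.

```python
import numpy as np

n = 128
u = np.zeros((n, n))
u[0, :] = 1.0                              # fixed boundary temperature

red = np.fromfunction(lambda i, j: (i + j) % 2 == 0, (n, n))
interior = np.zeros((n, n), bool)
interior[1:-1, 1:-1] = True

for sweep in range(500):
    for color in (red, ~red):              # two half-sweeps per iteration
        mask = color & interior
        # every cell of this color sees only fresh values of the other color
        avg = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                      + np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u[mask] = avg[mask]

print("center temperature:", round(u[n // 2, n // 2], 4))
```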

  5. An Investigation of the Performance of the Colored Gauss-Seidel Solver on CPU and GPU

    Energy Technology Data Exchange (ETDEWEB)

    Yoon, Jong Seon; Choi, Hyoung Gwon [Seoul Nat’l Univ. of Science and Technology, Seoul (Korea, Republic of); Jeon, Byoung Jin [Yonsei Univ., Seoul (Korea, Republic of)

    2017-02-15

    The performance of the colored Gauss–Seidel solver on CPU and GPU was investigated for two- and three-dimensional heat conduction problems using different mesh sizes. The heat conduction equation was discretized by the finite difference method and the finite element method. The CPU yielded good performance for small problems but deteriorated when the total memory required for computing was larger than the cache memory for large problems. In contrast, the GPU performed better as the mesh size increased because of the latency-hiding technique. Further, GPU computation with the colored Gauss–Seidel solver was approximately 7 times faster than that with a single CPU. Furthermore, the colored Gauss–Seidel solver was found to be approximately twice as fast as the Jacobi solver when parallel computing was conducted on the GPU.

  6. A Programming Framework for Scientific Applications on CPU-GPU Systems

    Energy Technology Data Exchange (ETDEWEB)

    Owens, John

    2013-03-24

    At a high level, my research interests center around designing, programming, and evaluating computer systems that use new approaches to solve interesting problems. The rapid change of technology allows a variety of different architectural approaches to computationally difficult problems, and a constantly shifting set of constraints and trends makes the solutions to these problems both challenging and interesting. One of the most important recent trends in computing has been the move to commodity parallel architectures. This sea change is motivated by the industry's inability to continue to profitably increase performance on a single processor and its move instead to multiple parallel processors. In the period of review, my most significant work has been leading a research group looking at the use of the graphics processing unit (GPU) as a general-purpose processor. GPUs can potentially deliver performance superior to their CPU counterparts on a broad range of problems, but effectively mapping complex applications to a parallel programming model with an emerging programming environment is a significant and important research problem.

  7. Liquid Cooling System for CPU by Electroconjugate Fluid

    Directory of Open Access Journals (Sweden)

    Yasuo Sakurai

    2014-06-01

    Full Text Available The power dissipated by CPUs in personal computers has increased as their performance has improved. Therefore, liquid cooling systems have been employed in some personal computers in order to improve their cooling performance. Electroconjugate fluid (ECF) is a functional fluid with a remarkable property: a strong jet flow is generated between electrodes when a high voltage is applied to the ECF through the electrodes. By using this strong jet flow, an ECF pump with a simple structure, no sliding parts, no noise, and no vibration appears feasible, and with such a pump a new ECF-based liquid cooling system could be realized. In this study, to realize this system, an ECF pump is proposed and fabricated, and its basic characteristics are investigated experimentally. Next, utilizing the ECF pump, a model of an ECF-based liquid cooling system is manufactured, and experiments are carried out to investigate its performance. As a result, with this system the temperature of a 50 W heat source is kept at 60°C or less. In general, a CPU is usually operated at this temperature or less.

  8. Online transcranial Doppler ultrasonographic control of an onscreen keyboard

    Directory of Open Access Journals (Sweden)

    Jie eLu

    2014-04-01

    Full Text Available Brain-computer interface (BCI) systems exploit brain activity to generate control commands and may be used by individuals with severe motor disabilities as an alternative means of communication. An emerging brain monitoring modality for BCI development is transcranial Doppler ultrasonography (TCD), which facilitates the tracking of cerebral blood flow velocities associated with mental tasks. However, TCD-BCI studies to date have been exclusively offline. The feasibility of a TCD-based BCI system hinges on its online performance. In this paper, an online TCD-BCI system was implemented, bilaterally tracking blood flow velocities in the middle cerebral arteries for system-paced control of a scanning keyboard. Target letters or words were selected by repetitively rehearsing the spelling while imagining the writing of the intended word, a left-lateralized task. Undesired letters or words were bypassed by performing visual tracking, a non-lateralized task. The keyboard scanning period was 15 s. With 10 able-bodied, right-handed young adults, the two mental tasks were differentiated online using a naive Bayes classification algorithm and a set of time-domain, user-dependent features. The system achieved an average specificity and sensitivity of 81.44 ± 8.35% and 82.30 ± 7.39%, respectively. The level of agreement between the intended and machine-predicted selections was moderate (κ = 0.60). The average information transfer rate was 0.87 bits/min, with an average throughput of 0.31 ± 0.12 characters/min. These findings suggest that an online TCD-BCI can achieve reasonable accuracy with an intuitive language task, but with modest throughput. Future interface and signal classification enhancements are required to improve the communication rate.
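
    A hedged sketch of the classification step only: time-domain features from synthetic bilateral velocity epochs fed to a Gaussian naive Bayes classifier. The signal model and feature set are invented for illustration, not taken from the study.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)

def features(left, right):
    """Simple lateralization-sensitive time-domain features of one epoch."""
    return [left.mean(), right.mean(), left.mean() - right.mean(),
            left.std(), right.std()]

X, y = [], []
for label, left_gain in [(0, 1.00), (1, 1.06)]:   # 1 = left-lateralized task
    for _ in range(150):
        base = 60 + 5 * rng.standard_normal(300)  # cm/s, 15 s epoch at 20 Hz
        left = base * left_gain                   # task raises left MCA flow
        right = base + rng.standard_normal(300)
        X.append(features(left, right))
        y.append(label)

print("5-fold accuracy:",
      round(cross_val_score(GaussianNB(), np.array(X), np.array(y), cv=5).mean(), 3))
```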

  9. CPU time reduction strategies for the Lambda modes calculation of a nuclear power reactor

    Energy Technology Data Exchange (ETDEWEB)

    Vidal, V.; Garayoa, J.; Hernandez, V. [Universidad Politecnica de Valencia (Spain). Dept. de Sistemas Informaticos y Computacion; Navarro, J.; Verdu, G.; Munoz-Cobo, J.L. [Universidad Politecnica de Valencia (Spain). Dept. de Ingenieria Quimica y Nuclear; Ginestar, D. [Universidad Politecnica de Valencia (Spain). Dept. de Matematica Aplicada

    1997-12-01

    In this paper, we present two strategies to reduce the CPU time spent in the lambda modes calculation for a realistic nuclear power reactor. The discretization of the multigroup neutron diffusion equation has been made using a nodal collocation method, solving the associated eigenvalue problem with two different techniques: the Subspace Iteration Method and Arnoldi's Method. The CPU time reduction is based on a coarse-grain parallelization approach together with a multistep algorithm to adequately initialize the solution. (author). 9 refs., 6 tabs.
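
    Once the diffusion operator is discretized, the lambda-modes step reduces to a sparse eigenvalue problem; the sketch below uses SciPy's ARPACK wrapper as a stand-in for the Arnoldi/subspace-iteration solvers, and a supplied starting vector as a stand-in for the multistep initialization. The matrix is random, not a reactor model.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigs

n = 2000
A = sp.random(n, n, density=5 / n, random_state=3, format="csr")
A = (A + 2.0 * sp.eye(n)).tocsr()     # shift to keep the spectrum separated

v0 = np.ones(n)                       # cheap initial guess for Arnoldi
vals, vecs = eigs(A, k=4, which="LM", v0=v0)
print("dominant eigenvalues:", np.round(vals.real, 4))
```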

  10. Research on the Prediction Model of CPU Utilization Based on ARIMA-BP Neural Network

    Directory of Open Access Journals (Sweden)

    Wang Jina

    2016-01-01

    Full Text Available The dynamic deployment of virtual machines is one of the current research focuses in cloud computing. Traditional methods mainly react after service performance has already degraded, and therefore lag. To solve this problem, a new prediction model of CPU utilization is constructed in this paper. The model provides a reference for the VM dynamic deployment process, allowing deployment to finish before service performance degrades; this not only ensures quality of service but also improves server performance and resource utilization. The new prediction method, based on an ARIMA-BP neural network, mainly comprises four parts: preprocessing the collected data, building the ARIMA-BP neural network predictive model, correcting the nonlinear residuals of the time series with the BP prediction algorithm, and obtaining the prediction results by analyzing the above data comprehensively.
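
    A hedged sketch of the ARIMA-BP hybrid as described: fit ARIMA to a synthetic CPU-utilization series, then train a small feed-forward network (standing in for the BP network) on the residuals and add its correction to the linear forecast. Series, orders, and network size are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
t = np.arange(600)
cpu = (40 + 10 * np.sin(2 * np.pi * t / 48)          # daily-like cycle
       + 5 * np.sin(2 * np.pi * t / 7) ** 3          # nonlinear component
       + rng.normal(0, 1.5, t.size))                 # noise, in percent

train, test = cpu[:500], cpu[500:]
arima = ARIMA(train, order=(2, 0, 2)).fit()          # linear part
resid = arima.resid

# BP stand-in: predict the next residual from a window of past residuals.
win = 24
Xr = np.lib.stride_tricks.sliding_window_view(resid[:-1], win)
yr = resid[win:]
nn = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                  random_state=0).fit(Xr, yr)

linear_fc = arima.forecast(steps=1)[0]
nn_corr = nn.predict(resid[-win:].reshape(1, -1))[0]
print(f"next-step forecast: {linear_fc + nn_corr:.2f}%  (actual {test[0]:.2f}%)")
```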

  11. Keyboards for inputting Japanese language -A study based on US patents

    OpenAIRE

    Mishra, Umakant

    2013-01-01

    The most commonly used Japanese alphabets are Kanji, Hiragana and Katakana. The Kanji alphabet includes pictographs or ideographic characters that were adopted from the Chinese alphabet. Hiragana is used to spell words of Japanese origin, while Katakana is used to spell words of western or other foreign origin. Two methods are commonly used to input Japanese to the computer. One, the 'kana input method' that uses a keyboard having 46 Japanese iroha (or kana) letter keys. The other method is '...

  12. Heterogeneous CPU-GPU moving targets detection for UAV video

    Science.gov (United States)

    Li, Maowen; Tang, Linbo; Han, Yuqi; Yu, Chunlei; Zhang, Chao; Fu, Huiquan

    2017-07-01

    Moving target detection is gaining popularity in civilian and military applications. On some motion-detection monitoring platforms, low-resolution stationary cameras are being replaced by moving HD cameras mounted on UAVs. Moving targets occupy only a minority of the pixels in HD video taken by a UAV, and the background of the frame is usually moving because of the motion of the UAV. The high computational cost of the algorithm prevents running it at the full frame resolution. Hence, to solve the problem of moving target detection in UAV video, we propose a heterogeneous CPU-GPU moving target detection algorithm. More specifically, we use background registration to eliminate the impact of the moving background and frame differencing to detect small moving targets. To achieve real-time processing, we design a heterogeneous CPU-GPU framework for our method. The experimental results show that our method can detect the main moving targets in HD video taken by a UAV, with an average processing time of 52.16 ms per frame, which is fast enough to solve the problem.
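
    The registration-plus-differencing idea can be sketched with NumPy alone: estimate the global camera translation by phase correlation, undo it, and difference the frames so a small mover stands out. The synthetic frames and threshold are illustrative.

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Integer (dy, dx) such that np.roll(a, (dy, dx), (0, 1)) best matches b."""
    F = np.conj(np.fft.fft2(a)) * np.fft.fft2(b)
    corr = np.fft.ifft2(F / (np.abs(F) + 1e-12)).real
    dy, dx = np.unravel_index(corr.argmax(), corr.shape)
    h, w = a.shape
    return (dy - h if dy > h // 2 else dy), (dx - w if dx > w // 2 else dx)

rng = np.random.default_rng(5)
bg = rng.random((240, 320))                      # textured ground seen from above
prev = bg
curr = np.roll(bg, (3, 5), (0, 1)).copy()        # camera moved by (3, 5) pixels
curr[100:104, 200:204] += 1.0                    # a small moving target

dy, dx = phase_correlation_shift(prev, curr)     # background registration
diff = np.abs(curr - np.roll(prev, (dy, dx), (0, 1)))  # frame difference
ys, xs = np.where(diff > 0.5)
print("camera shift:", (dy, dx), "target rows", (ys.min(), ys.max()),
      "cols", (xs.min(), xs.max()))
```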

  13. CPU0213, a novel endothelin type A and type B receptor antagonist, protects against myocardial ischemia/reperfusion injury in rats

    Directory of Open Access Journals (Sweden)

    Z.Y. Wang

    2011-11-01

    Full Text Available The efficacy of endothelin receptor antagonists in protecting against myocardial ischemia/reperfusion (I/R) injury is controversial, and the mechanisms remain unclear. The aim of this study was to investigate the effects of CPU0213, a novel endothelin type A and type B receptor antagonist, on myocardial I/R injury and to explore the mechanisms involved. Male Sprague-Dawley rats weighing 200-250 g were randomized to three groups (6-7 per group): group 1, sham; group 2, I/R + vehicle, in which rats were subjected to in vivo myocardial I/R injury by ligation of the left anterior descending coronary artery and 0.5% sodium carboxymethyl cellulose (1 mL/kg) was injected intraperitoneally immediately prior to coronary occlusion; and group 3, I/R + CPU0213, in which rats were subjected to identical surgical procedures and CPU0213 (30 mg/kg) was injected intraperitoneally immediately prior to coronary occlusion. Infarct size, cardiac function and biochemical changes were measured. CPU0213 pretreatment reduced infarct size as a percentage of the ischemic area by 44.5% (I/R + vehicle: 61.3 ± 3.2% vs I/R + CPU0213: 34.0 ± 5.5%; P < 0.05) and improved ejection fraction by 17.2% (I/R + vehicle: 58.4 ± 2.8% vs I/R + CPU0213: 68.5 ± 2.2%; P < 0.05) compared to vehicle-treated animals. This protection was associated with inhibition of myocardial inflammation and oxidative stress. Moreover, the reduction in Akt (protein kinase B) and endothelial nitric oxide synthase (eNOS) phosphorylation induced by myocardial I/R injury was limited by CPU0213 (P < 0.05). These data suggest that CPU0213, a non-selective endothelin receptor antagonist, has protective effects against myocardial I/R injury in rats, which may be related to the Akt/eNOS pathway.

  14. Improving the Performance of CPU Architectures by Reducing the Operating System Overhead (Extended Version

    Directory of Open Access Journals (Sweden)

    Zagan Ionel

    2016-07-01

    Full Text Available Predictable CPU architectures that run hard real-time tasks must execute them in isolation in order to provide timing-analyzable execution for real-time systems. The major problems for real-time operating systems stem from excessive jitter, introduced mainly through task switching, which can violate deadline requirements and, consequently, the predictability of hard real-time tasks. New requirements also arise for real-time operating systems used in mixed-criticality systems, where the execution of hard real-time applications requires timing predictability. The present article discusses several solutions to improve the performance of CPU architectures and eventually overcome the overhead of the operating system. This paper focuses on the innovative CPU implementation named nMPRA-MT, designed for small real-time applications. This implementation uses replication and remapping techniques for the program counter, general-purpose registers, and pipeline registers, enabling multiple threads to share a single pipeline assembly line. To increase predictability, the proposed architecture partially removes hazard situations at the expense of larger execution latency per instruction.

  15. High performance technique for database applicationsusing a hybrid GPU/CPU platform

    KAUST Repository

    Zidan, Mohammed A.; Bonny, Talal; Salama, Khaled N.

    2012-01-01

    Hybrid GPU/CPU platform. In particular, our technique solves the problem of the low efficiency result- ing from running short-length sequences in a database on a GPU. To verify our technique, we applied it to the widely used Smith-Waterman algorithm

  16. Design Patterns for Sparse-Matrix Computations on Hybrid CPU/GPU Platforms

    Directory of Open Access Journals (Sweden)

    Valeria Cardellini

    2014-01-01

    Full Text Available We apply object-oriented software design patterns to develop code for scientific software involving sparse matrices. Design patterns arise when multiple independent developments produce similar designs which converge onto a generic solution. We demonstrate how to use design patterns to implement an interface for sparse matrix computations on NVIDIA GPUs starting from PSBLAS, an existing sparse matrix library, and from existing sets of GPU kernels for sparse matrices. We also compare the throughput of the PSBLAS sparse matrix–vector multiplication on two platforms exploiting the GPU with that obtained by a CPU-only PSBLAS implementation. Our experiments exhibit encouraging results regarding the comparison between CPU and GPU executions in double precision, obtaining a speedup of up to 35.35 on NVIDIA GTX 285 with respect to AMD Athlon 7750, and up to 10.15 on NVIDIA Tesla C2050 with respect to Intel Xeon X5650.
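
    For reference, this is the kernel whose CPU/GPU throughput is being compared: a hand-rolled CSR (compressed sparse row) matrix-vector product, checked against SciPy on a random matrix. The sizes are illustrative.

```python
import numpy as np
import scipy.sparse as sp

A = sp.random(1000, 1000, density=0.01, random_state=6, format="csr")
x = np.random.default_rng(6).random(1000)

def csr_matvec(indptr, indices, data, x):
    """y = A @ x from the three CSR arrays: one dot product per row."""
    y = np.zeros(len(indptr) - 1)
    for row in range(len(y)):
        lo, hi = indptr[row], indptr[row + 1]   # this row's nonzero slice
        y[row] = data[lo:hi] @ x[indices[lo:hi]]
    return y

y = csr_matvec(A.indptr, A.indices, A.data, x)
print("max abs error vs SciPy:", np.abs(y - A @ x).max())
```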

  17. Using Mouse and Keyboard Dynamics to Detect Cognitive Stress During Mental Arithmetic

    OpenAIRE

    Ayesh, Aladdin, 1972-; Stacey, Martin; Lim, Yee Mei

    2015-01-01

    To build a personalized e-learning system that can deliver adaptive learning content based on a student's cognitive effort and efficiency, it is important to develop a construct that can help measure perceived mental states, such as stress and cognitive load. The construct must be quantifiable, computerized and automated. Our research investigates how mouse and keyboard dynamics analyses could be used to detect cognitive stress, which is induced by high mental arithmetic demand with t...

  18. DSM vs. NSM: CPU Performance Tradeoffs in Block-Oriented Query Processing

    NARCIS (Netherlands)

    M. Zukowski (Marcin); N.J. Nes (Niels); P.A. Boncz (Peter)

    2008-01-01

    Comparisons between the merits of row-wise storage (NSM) and columnar storage (DSM) are typically made with respect to the persistent storage layer of database systems. In this paper, however, we focus on the CPU efficiency tradeoffs of tuple representations inside the query
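
    The tradeoff can be demonstrated in miniature: scanning one attribute of a row-wise (NSM) table walks strided records, while the columnar (DSM) layout scans one contiguous, cache-friendly array. Sizes below are illustrative.

```python
import time
import numpy as np

n = 5_000_000
# NSM: one record stores all four attributes together.
rows = np.zeros(n, dtype=[("a", "f8"), ("b", "f8"), ("c", "f8"), ("d", "f8")])
rows["a"] = np.random.default_rng(7).random(n)
# DSM: attribute "a" lives in its own contiguous array.
col_a = rows["a"].copy()

t0 = time.perf_counter()
s_nsm = rows["a"].sum()          # strided scan across full 32-byte records
t1 = time.perf_counter()
s_dsm = col_a.sum()              # contiguous scan touches a quarter of the bytes
t2 = time.perf_counter()
print(f"NSM scan: {t1 - t0:.3f} s, DSM scan: {t2 - t1:.3f} s, "
      f"sums equal: {np.isclose(s_nsm, s_dsm)}")
```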

  19. 76 FR 58138 - Defense Federal Acquisition Regulation Supplement (DFARS); Alternative Line Item Structure (DFARS...

    Science.gov (United States)

    2011-09-20

    ... DoD published a proposed rule in the Federal Register at 76 FR 21847 on April 19, 2011, to add DFARS..., the contract line item may be for a desktop computer, but the actual items delivered, invoiced, and..., Desktop with 20 EA CPU, Monitor, Keyboard and Mouse. Alternative line-item structure offer where monitors...

  20. Performance of the OVERFLOW-MLP and LAURA-MLP CFD Codes on the NASA Ames 512 CPU Origin System

    Science.gov (United States)

    Taft, James R.

    2000-01-01

    The shared-memory Multi-Level Parallelism (MLP) technique, developed last year at NASA Ames, has been very successful in dramatically improving the performance of important NASA CFD codes. This new and very simple parallel programming technique was first inserted into the OVERFLOW production CFD code in FY 1998. The OVERFLOW-MLP code's parallel performance scaled linearly to 256 CPUs on the NASA Ames 256-CPU Origin 2000 system (steger). Overall performance exceeded 20.1 GFLOP/s, or about 4.5x the performance of a dedicated 16-CPU C90 system. All of this was achieved without any major modification to the original vector-based code. The OVERFLOW-MLP code is now in production on the in-house Origin systems as well as being used offsite at commercial aerospace companies. Partially as a result of this work, NASA Ames has purchased a new 512-CPU Origin 2000 system to further test the limits of parallel performance for NASA codes of interest. This paper presents the performance obtained from the latest optimization efforts on this machine for the LAURA-MLP and OVERFLOW-MLP codes. The Langley Aerothermodynamics Upwind Relaxation Algorithm (LAURA) code is a key simulation tool in the development of the next-generation shuttle, interplanetary reentry vehicles, and nearly all "X" plane development. This code sustains about 4-5 GFLOP/s on a dedicated 16-CPU C90. At this rate, expected workloads would require over 100 C90 CPU-years of computing over the next few calendar years. It is not feasible to expect that this would be affordable or available to the user community. Dramatic performance gains on cheaper systems are needed. This code is expected to be perhaps the largest consumer of NASA Ames compute cycles per run in the coming year. The OVERFLOW CFD code is extensively used in the government and commercial aerospace communities to evaluate new aircraft designs. It is one of the largest consumers of NASA supercomputing cycles and large simulations of highly resolved full

  1. HEP specific benchmarks of virtual machines on multi-core CPU architectures

    International Nuclear Information System (INIS)

    Alef, M; Gable, I

    2010-01-01

    Virtualization technologies such as Xen can be used in order to satisfy the disparate and often incompatible system requirements of different user groups in shared-use computing facilities. This capability is particularly important for HEP applications, which often have restrictive requirements. The use of virtualization adds flexibility, however, it is essential that the virtualization technology place little overhead on the HEP application. We present an evaluation of the practicality of running HEP applications in multiple Virtual Machines (VMs) on a single multi-core Linux system. We use the benchmark suite used by the HEPiX CPU Benchmarking Working Group to give a quantitative evaluation relevant to the HEP community. Benchmarks are packaged inside VMs and then the VMs are booted onto a single multi-core system. Benchmarks are then simultaneously executed on each VM to simulate highly loaded VMs running HEP applications. These techniques are applied to a variety of multi-core CPU architectures and VM configurations.

  2. GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores

    Directory of Open Access Journals (Sweden)

    Wang Kai

    2011-05-01

    Abstract. Background: Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) have hundreds of cores and have recently been used to implement faster scientific software. However, there are currently no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis of binary traits. Findings: Here we present a novel software package, GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes in parallel: (1) the interactions of SNPs within it, and (2) the interactions between the SNPs of the current fragment and those of the other fragments. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. Conclusions: GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
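
    The fragment-based decomposition described above (within-fragment SNP pairs plus cross-fragment pairs, each batch processed in parallel) can be made concrete with a short sketch. The Python example below is only an illustration of that partitioning idea using the standard multiprocessing module; it is not GENIE's implementation, and the toy interaction statistic, data layout, and all names are assumptions.

        import itertools
        import numpy as np
        from multiprocessing import Pool

        def interaction_score(pair, genotypes, phenotype):
            # Placeholder pairwise statistic: correlation of a SNP-SNP
            # product term with the binary phenotype (illustrative only,
            # not the test GENIE actually uses).
            i, j = pair
            term = genotypes[:, i] * genotypes[:, j]
            return i, j, abs(np.corrcoef(term, phenotype)[0, 1])

        def all_pairs(n_snps, fragment_size):
            # Enumerate within-fragment and cross-fragment SNP pairs,
            # mirroring the fragment decomposition in the abstract.
            fragments = [range(s, min(s + fragment_size, n_snps))
                         for s in range(0, n_snps, fragment_size)]
            for a, frag_a in enumerate(fragments):
                yield from itertools.combinations(frag_a, 2)    # within
                for frag_b in fragments[a + 1:]:                # across
                    yield from itertools.product(frag_a, frag_b)

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            genotypes = rng.integers(0, 3, size=(200, 40))  # 200 subjects, 40 SNPs
            phenotype = rng.integers(0, 2, size=200)        # binary trait
            with Pool() as pool:                            # one task per CPU core
                results = pool.starmap(
                    interaction_score,
                    ((p, genotypes, phenotype) for p in all_pairs(40, 10)))
            print(max(results, key=lambda r: r[2]))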

  3. The Influence of Emotion on Keyboard Typing: An Experimental Study Using Auditory Stimuli.

    Directory of Open Access Journals (Sweden)

    Po-Ming Lee

    In recent years, a novel approach to emotion recognition based on keystroke dynamics has been reported. The advantages of this approach are that the data are non-intrusive and easy to obtain. However, previous studies included only limited investigations of the phenomenon itself. Hence, this study aimed to examine the source of variance in keyboard typing patterns caused by emotions. A controlled experiment was conducted to collect subjects' keystroke data in different emotional states induced by the International Affective Digitized Sounds (IADS). Two-way Valence (3) x Arousal (3) ANOVAs were used to examine the collected dataset. The results of the experiment indicate that the effect of arousal is significant in keystroke duration (p < .05) and keystroke latency (p < .01), but not in the accuracy rate of keyboard typing. The size of the emotional effect is small compared to the individual variability. Our findings support the conclusion that keystroke duration and latency are influenced by arousal. The finding about the size of the effect suggests that the accuracy of emotion recognition technology could be further improved if personalized models are utilized. Notably, the experiment was conducted using standard instruments and hence is expected to be highly reproducible.

  4. Effects of Optimizing the Scan-Path on Scanning Keyboards with QWERTY-Layout for English Text.

    Science.gov (United States)

    Sandnes, Frode Eika; Medola, Fausto Orsi

    2017-01-01

    Scanning keyboards can be essential tools for individuals with reduced motor function. However, most research addresses layout optimization. Learning new layouts is time-consuming. This study explores the familiar QWERTY layout with alternative scanning paths intended for English text. The results show that carefully designed scan-paths can help QWERTY nearly match optimized layouts in performance.

  5. Enhancing Leakage Power in CPU Cache Using Inverted Architecture

    OpenAIRE

    Bilal A. Shehada; Ahmed M. Serdah; Aiman Abu Samra

    2013-01-01

    Power consumption is an increasingly pressing problem in modern processor design. Since the on-chip caches usually consume a significant amount of power, power and energy consumption have become some of the most important design constraints, and the cache is one of the most attractive targets for power reduction. This paper presents an approach to reducing the dynamic power consumption of the CPU cache using an inverted cache architecture. Our approach tries to reduce dynamic write power dissipatio...

  6. Performance analysis of the FDTD method applied to holographic volume gratings: Multi-core CPU versus GPU computing

    Science.gov (United States)

    Francés, J.; Bleda, S.; Neipp, C.; Márquez, A.; Pascual, I.; Beléndez, A.

    2013-03-01

    The finite-difference time-domain (FDTD) method allows electromagnetic field distribution analysis as a function of time and space. The method is applied to analyze holographic volume gratings (HVGs) for the near-field distribution at optical wavelengths. Usually, this application requires the simulation of wide areas, which implies more memory and processing time. In this work, we propose a specific implementation of the FDTD method, including several add-ons, for a precise simulation of optical diffractive elements. Values in the near-field region are computed considering the illumination of the grating by means of a plane wave for different angles of incidence, and including absorbing boundaries as well. We compare the results obtained by FDTD with those obtained using a matrix method (MM) applied to diffraction gratings. In addition, we have developed two optimized versions of the algorithm, for both CPU and GPU, in order to analyze the improvement of the new NVIDIA Fermi GPU architecture versus a highly tuned multi-core CPU as a function of simulation size. In particular, the optimized CPU implementation takes advantage of the arithmetic and data-transfer streaming SIMD (single instruction multiple data) extensions (SSE) included explicitly in the code, and also of multi-threading by means of OpenMP directives. A good agreement between the results obtained using both the FDTD and MM methods is obtained, thus validating our methodology. Moreover, the performance of the GPU is compared to the SSE+OpenMP CPU implementation, and it is quantitatively determined that a highly optimized CPU program can be competitive for a wide range of simulation sizes, whereas GPU computing becomes more powerful for large-scale simulations.
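
    As a point of reference for readers unfamiliar with the method, the kernel that both the SSE+OpenMP and GPU versions accelerate is a simple leapfrog stencil. The NumPy sketch below shows a minimal 1D FDTD update loop in normalized units; it is a generic textbook scheme, not the authors' implementation, and the grid size, Courant number, and source are assumptions.

        import numpy as np

        # Minimal 1D FDTD (normalized units, Courant number 0.5). This
        # illustrates the leapfrog stencil that vectorized CPU (SSE/OpenMP)
        # and GPU kernels both accelerate; real HVG simulations are 2D/3D
        # with material models and absorbing boundaries.
        nx, nt, courant = 400, 1000, 0.5
        ez = np.zeros(nx)   # electric field
        hy = np.zeros(nx)   # magnetic field

        for n in range(nt):
            hy[:-1] += courant * (ez[1:] - ez[:-1])         # H update (vectorized)
            ez[1:]  += courant * (hy[1:] - hy[:-1])         # E update (vectorized)
            ez[nx // 2] += np.exp(-((n - 30) / 10.0) ** 2)  # soft Gaussian source

        print("field energy ~", float(np.sum(ez**2 + hy**2)))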

  7. A Novel CPU/GPU Simulation Environment for Large-Scale Biologically-Realistic Neural Modeling

    Directory of Open Access Journals (Sweden)

    Roger V Hoang

    2013-10-01

    Computational Neuroscience is an emerging field that provides unique opportunities to study complex brain structures through realistic neural simulations. However, as biological details are added to models, the execution time for the simulation becomes longer. Graphics Processing Units (GPUs) are now being utilized to accelerate simulations due to their ability to perform computations in parallel. As such, they have shown significant improvement in execution time compared to Central Processing Units (CPUs). Most neural simulators utilize either multiple CPUs or a single GPU for better performance, but still show limitations in execution time when biological details are not sacrificed. Therefore, we present a novel CPU/GPU simulation environment for large-scale biological networks, the NeoCortical Simulator version 6 (NCS6). NCS6 is a free, open-source, parallelizable, and scalable simulator, designed to run on clusters of multiple machines, potentially with high-performance computing devices in each of them. It has built-in leaky-integrate-and-fire (LIF) and Izhikevich (IZH) neuron models, but users also have the capability to design their own plug-in interface for different neuron types as desired. NCS6 is currently able to simulate one million cells and 100 million synapses in quasi real time by distributing data across these heterogeneous clusters of CPUs and GPUs.

  8. Electronic keyboard instruments as a helping tool in the process of teaching music

    OpenAIRE

    Rosiński, Adam

    2012-01-01

    The following article shows the use of new technology in broadly understood music teaching in general-profile schools. As research and observations have shown, the innovative use of electronic keyboard instruments in music lessons significantly expands children's and teenagers' musicality and musical sensitivity. The use of new tools by an educator influences the quality of the lessons delivered, so that they can meet the criteria that support the course of the lesson. Ch...

  9. Designing of Vague Logic Based 2-Layered Framework for CPU Scheduler

    Directory of Open Access Journals (Sweden)

    Supriya Raheja

    2016-01-01

    Fuzzy-based CPU schedulers have attracted great interest from operating system designers because of their ability to handle the imprecise information associated with tasks. This paper extends the fuzzy-based round robin scheduler to a Vague Logic Based Round Robin (VBRR) scheduler. The VBRR scheduler works on a 2-layered framework. At the first layer, the scheduler has a vague inference system, which has the ability to handle the impreciseness of tasks using vague logic. At the second layer, the VBRR scheduling algorithm schedules the tasks. The VBRR scheduler has a learning capability, based on which it intelligently adapts an optimum length for the time quantum. An optimum time quantum reduces the overhead on the scheduler by eliminating unnecessary context switches, which improves the overall performance of the system. The work is simulated using MATLAB and compared with the conventional round robin scheduler and two other fuzzy-based approaches to CPU scheduling. The simulation analysis and results prove the effectiveness and efficiency of the VBRR scheduler.
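
    The benefit of an adaptive time quantum is easy to demonstrate with a toy simulator. The sketch below is not the VBRR algorithm (no vague logic is involved); it merely contrasts a fixed quantum with a naive adaptive rule (quantum = median of the remaining bursts) by counting context switches. The burst values and the heuristic are assumptions.

        from collections import deque
        from statistics import median

        def round_robin(bursts, quantum=None):
            # Simulate round robin; if quantum is None, re-derive it each
            # dispatch from the median remaining burst (a simple adaptive rule).
            queue = deque(enumerate(bursts))
            remaining = list(bursts)
            switches = 0
            while queue:
                q = quantum or max(1, int(median(r for r in remaining if r > 0)))
                pid, _ = queue.popleft()
                run = min(q, remaining[pid])
                remaining[pid] -= run
                if remaining[pid] > 0:
                    queue.append((pid, remaining[pid]))
                    switches += 1   # preemption causes a context switch
            return switches

        bursts = [24, 3, 3, 17, 9]
        print("fixed quantum=4 :", round_robin(bursts, 4), "switches")
        print("adaptive median :", round_robin(bursts), "switches")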

  10. A Brain-Computer Interface (BCI) system to use arbitrary Windows applications by directly controlling mouse and keyboard.

    Science.gov (United States)

    Spuler, Martin

    2015-08-01

    A Brain-Computer Interface (BCI) allows a user to control a computer by brain activity alone, without the need for muscle control. In this paper, we present an EEG-based BCI system based on code-modulated visual evoked potentials (c-VEPs) that enables the user to work with arbitrary Windows applications. Other BCI systems, like the P300 speller or BCI-based browsers, allow control of one dedicated application designed for use with a BCI. In contrast, the system presented in this paper does not consist of one dedicated application, but enables the user to control mouse cursor and keyboard input at the level of the operating system, thereby making it possible to use arbitrary applications. As the c-VEP BCI method has been shown to enable very fast communication speeds (writing more than 20 error-free characters per minute), the presented system is the next step in replacing the traditional mouse and keyboard and enabling complete brain-based control of a computer.

  11. Understanding What It Means for Older Students to Learn Basic Musical Skills on a Keyboard Instrument

    Science.gov (United States)

    Taylor, Angela; Hallam, Susan

    2008-01-01

    Although many adults take up or return to instrumental and vocal tuition every year, we know very little about how they experience it. As part of ongoing case study research, eight older learners with modest keyboard skills explored what their musical skills meant to them during conversation-based repertory grid interviews. The data were…

  12. Fast CPU-based Monte Carlo simulation for radiotherapy dose calculation

    Science.gov (United States)

    Ziegenhein, Peter; Pirner, Sven; Kamerling, Cornelis Ph; Oelfke, Uwe

    2015-08-01

    Monte Carlo (MC) simulations are considered to be the most accurate method for calculating dose distributions in radiotherapy. Their clinical application, however, is still limited by the long runtimes that conventional implementations of MC algorithms require to deliver sufficiently accurate results on high-resolution imaging data. In order to overcome this obstacle we developed the software package PhiMC, which is capable of computing precise dose distributions in a sub-minute time frame by leveraging the potential of modern many- and multi-core CPU-based computers. PhiMC is based on the well-verified dose planning method (DPM). We demonstrate that PhiMC delivers dose distributions which are in excellent agreement with DPM. The multi-core implementation of PhiMC scales well across different computer architectures and achieves a speed-up of up to 37x compared to the original DPM code executed on a modern system. Furthermore, our CPU-based implementation on a modern workstation is between 1.25x and 1.95x faster than a well-known GPU implementation of the same simulation method on an NVIDIA Tesla C2050. Since CPUs can work on several hundred gigabytes of RAM, the typical GPU memory limitation does not apply to our implementation, and high-resolution clinical plans can be calculated.

  13. Exploiting graphics processing units for computational biology and bioinformatics.

    Science.gov (United States)

    Payne, Joshua L; Sinnott-Armstrong, Nicholas A; Moore, Jason H

    2010-09-01

    Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units (GPUs) that possess more memory bandwidth and computational capability than central processing units (CPUs), the standard workhorses of scientific computing. With the recent release of general-purpose GPUs and NVIDIA's GPU programming language, CUDA, graphics engines are being adopted widely in scientific computing applications, particularly in the fields of computational biology and bioinformatics. The goal of this article is to concisely present an introduction to GPU hardware and programming, aimed at the computational biologist or bioinformaticist. To this end, we discuss the primary differences between GPU and CPU architecture, introduce the basics of the CUDA programming language, and discuss important CUDA programming practices, such as the proper use of coalesced reads, data types, and memory hierarchies. We highlight each of these topics in the context of computing the all-pairs distance between instances in a dataset, a common procedure in numerous disciplines of scientific computing. We conclude with a runtime analysis of the GPU and CPU implementations of the all-pairs distance calculation. We show our final GPU implementation to outperform the CPU implementation by a factor of 1700.
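
    The article's running example, the all-pairs distance computation, can be sketched in a few lines. The NumPy version below only illustrates the computation being benchmarked, not the article's CUDA kernel; a common porting route is to swap NumPy for a GPU array library with a compatible API (e.g. CuPy).

        import numpy as np

        def all_pairs_distances(X):
            # Pairwise Euclidean distances via the expansion
            # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so the heavy work
            # is a single matrix multiply -- the operation GPUs excel at.
            sq = np.sum(X * X, axis=1)
            d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
            return np.sqrt(np.maximum(d2, 0.0))  # clamp tiny negatives

        rng = np.random.default_rng(1)
        X = rng.standard_normal((1000, 64))   # 1000 instances, 64 features
        D = all_pairs_distances(X)
        print(D.shape, float(D[0, 1]))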

  14. A Robust Ultra-Low Voltage CPU Utilizing Timing-Error Prevention

    OpenAIRE

    Hiienkari, Markus; Teittinen, Jukka; Koskinen, Lauri; Turnquist, Matthew; Mäkipää, Jani; Rantala, Arto; Sopanen, Matti; Kaltiokallio, Mikko

    2015-01-01

    To minimize energy consumption of a digital circuit, logic can be operated at sub- or near-threshold voltage. Operation at this region is challenging due to device and environment variations, and resulting performance may not be adequate to all applications. This article presents two variants of a 32-bit RISC CPU targeted for near-threshold voltage. Both CPUs are placed on the same die and manufactured in 28 nm CMOS process. They employ timing-error prevention with clock stretching to enable ...

  15. The “Chimera”: An Off-The-Shelf CPU/GPGPU/FPGA Hybrid Computing Platform

    Directory of Open Access Journals (Sweden)

    Ra Inta

    2012-01-01

    The nature of modern astronomy means that a number of interesting problems exhibit a substantial computational bound, and this situation is gradually worsening. Scientists fighting for valuable resources on conventional high-performance computing (HPC) facilities—often with a limited customizable user environment—are increasingly looking to hardware acceleration solutions. We describe here a heterogeneous CPU/GPGPU/FPGA desktop computing system (the "Chimera"), built with commercial off-the-shelf components. We show that this platform may be a viable alternative solution to many common computationally bound problems found in astronomy, though not without significant challenges. The most significant bottleneck in pipelines involving real data is most likely to be the interconnect (in this case the PCI Express bus residing on the CPU motherboard). Finally, we speculate on the merits of our Chimera system for the entire landscape of parallel computing, through the analysis of representative problems from UC Berkeley's "Thirteen Dwarves."

  16. CPU/GPU Computing for an Implicit Multi-Block Compressible Navier-Stokes Solver on a Heterogeneous Platform

    Science.gov (United States)

    Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin

    2016-06-01

    CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software onto a heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove a lot of redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely "one-thread-one-point" and "one-thread-one-line", to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern, MPI-OpenMP-CUDA, that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on the heterogeneous platform.
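
    The overlap strategy mentioned in the last step (hiding MPI communication behind computation) follows a standard non-blocking pattern. The mpi4py sketch below is a generic halo-exchange illustration under assumed buffer names and a trivial interior-versus-boundary split; it is not taken from the paper's MPI-OpenMP-CUDA code.

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        left, right = (rank - 1) % size, (rank + 1) % size

        field = np.full(1024, float(rank))
        send_l, send_r = field[:1].copy(), field[-1:].copy()
        recv_l, recv_r = np.empty(1), np.empty(1)

        # Post non-blocking halo exchanges first ...
        reqs = [comm.Isend(send_l, dest=left),   comm.Isend(send_r, dest=right),
                comm.Irecv(recv_l, source=left), comm.Irecv(recv_r, source=right)]

        interior = field[1:-1] * 0.5           # ... interior work overlaps transfers
        MPI.Request.Waitall(reqs)              # halos arrived while we computed
        edges = (recv_l[0] + recv_r[0]) * 0.5  # boundary work needs the halos
        print(rank, float(interior.mean()), edges)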

  17. Acceleration of stereo-matching on multi-core CPU and GPU

    OpenAIRE

    Tian, Xu; Cockshott, Paul; Oehler, Susanne

    2014-01-01

    This paper presents an accelerated version of a dense stereo-correspondence algorithm for two different parallelism-enabled architectures, multi-core CPU and GPU. The algorithm is part of the vision system developed for a binocular robot-head in the context of the CloPeMa research project. This research project focuses on the conception of a new clothes-folding robot with real-time and high-resolution requirements for the vision system. The performance analysis shows th...

  18. Improvement of CPU time of Linear Discriminant Function based on MNM criterion by IP

    Directory of Open Access Journals (Sweden)

    Shuichi Shinmura

    2014-05-01

    Revised IP-OLDF (optimal linear discriminant function by integer programming) is a linear discriminant function that minimizes the number of misclassifications (NM) of training samples by integer programming (IP). However, IP requires much computation (CPU) time. In this paper, we propose a way to reduce CPU time by using linear programming (LP). In the first phase, Revised LP-OLDF is applied to all cases, and all cases are categorized into two groups: those that are classified correctly and those that are not classified by support vectors (SVs). In the second phase, Revised IP-OLDF is applied to the cases misclassified by SVs. This method is called Revised IPLP-OLDF. In this research, we evaluate whether the NM of Revised IPLP-OLDF is a good estimate of the minimum number of misclassifications (MNM) obtained by Revised IP-OLDF. Four kinds of real data—Iris data, Swiss banknote data, student data, and CPD data—are used as training samples, and four kinds of 20,000 re-sampled cases generated from these data are used as evaluation samples. There are a total of 149 models of all combinations of independent variables for these data. The NMs and CPU times of the 149 models are compared between Revised IPLP-OLDF and Revised IP-OLDF. The following results are obtained: (1) Revised IPLP-OLDF significantly improves CPU time. (2) For the training samples, all 149 NMs of Revised IPLP-OLDF are equal to the MNM of Revised IP-OLDF. (3) For the evaluation samples, most NMs of Revised IPLP-OLDF are equal to the NM of Revised IP-OLDF. (4) The generalization abilities of both discriminant functions are concluded to be high, because the differences between the error rates of the training and evaluation samples are almost within 2%. Therefore, Revised IPLP-OLDF is recommended for the analysis of big data instead of Revised IP-OLDF. Next, Revised IPLP-OLDF is compared with LDF and logistic regression by 100-fold cross validation using 100 re-sampling samples. Means of error rates of

  19. Study on efficiency of time computation in x-ray imaging simulation base on Monte Carlo algorithm using graphics processing unit

    International Nuclear Information System (INIS)

    Setiani, Tia Dwi; Suprijadi; Haryanto, Freddy

    2016-01-01

    Monte Carlo (MC) is one of the most powerful techniques for simulation in x-ray imaging. The MC method can simulate radiation transport within matter with high accuracy and provides a natural way to simulate radiation transport in complex systems. One widely used MC-based code for radiographic image simulation is MC-GPU, a code developed by Andreu Badal. This study aimed to investigate the computation time of x-ray imaging simulation on a GPU (Graphics Processing Unit) compared to a standard CPU (Central Processing Unit). Furthermore, the effect of physical parameters on the quality of radiographic images, and a comparison of the image quality resulting from simulation on the GPU and CPU, are evaluated in this paper. The simulations were run on a CPU in serial condition, and on two GPUs with 384 cores and 2304 cores. In the GPU simulation, each core calculates one photon, so a large number of photons are calculated simultaneously. Results show that the simulation times on the GPU were significantly accelerated compared to the CPU. The simulations on the 2304-core GPU were performed about 64-114 times faster than on the CPU, while the simulations on the 384-core GPU were performed about 20-31 times faster than on a single core of the CPU. Another result shows that the optimum image quality was obtained with the number of histories starting from 10^8 and energies from 60 keV to 90 keV. Analyzed by a statistical approach, the quality of the GPU and CPU images is essentially the same.
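
    The "one photon per core" mapping noted above is the essence of GPU-style MC. The sketch below is a deliberately simplified illustration: it samples exponential free paths for a whole batch of photons through a homogeneous absorbing slab (Beer-Lambert statistics), one array lane per photon. It bears no relation to MC-GPU's actual physics; the attenuation coefficient and geometry are assumptions.

        import numpy as np

        # Toy photon-transport MC: sample free paths for many photons at
        # once, as a GPU would assign one photon per core. Homogeneous
        # slab, absorption only (no scattering, no spectrum).
        rng = np.random.default_rng(42)
        mu = 0.2          # assumed linear attenuation coefficient [1/cm]
        thickness = 5.0   # assumed slab thickness [cm]
        n_photons = 10**6

        free_paths = -np.log(rng.random(n_photons)) / mu  # Beer-Lambert sampling
        transmitted = np.count_nonzero(free_paths > thickness)

        print("MC transmission    :", transmitted / n_photons)
        print("analytic exp(-mu*t):", np.exp(-mu * thickness))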

  20. Study on efficiency of time computation in x-ray imaging simulation base on Monte Carlo algorithm using graphics processing unit

    Energy Technology Data Exchange (ETDEWEB)

    Setiani, Tia Dwi, E-mail: tiadwisetiani@gmail.com [Computational Science, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung Jalan Ganesha 10 Bandung, 40132 (Indonesia); Suprijadi [Computational Science, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung Jalan Ganesha 10 Bandung, 40132 (Indonesia); Nuclear Physics and Biophysics Reaserch Division, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung Jalan Ganesha 10 Bandung, 40132 (Indonesia); Haryanto, Freddy [Nuclear Physics and Biophysics Reaserch Division, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung Jalan Ganesha 10 Bandung, 40132 (Indonesia)

    2016-03-11

    Monte Carlo (MC) is one of the most powerful techniques for simulation in x-ray imaging. The MC method can simulate radiation transport within matter with high accuracy and provides a natural way to simulate radiation transport in complex systems. One widely used MC-based code for radiographic image simulation is MC-GPU, a code developed by Andreu Badal. This study aimed to investigate the computation time of x-ray imaging simulation on a GPU (Graphics Processing Unit) compared to a standard CPU (Central Processing Unit). Furthermore, the effect of physical parameters on the quality of radiographic images, and a comparison of the image quality resulting from simulation on the GPU and CPU, are evaluated in this paper. The simulations were run on a CPU in serial condition, and on two GPUs with 384 cores and 2304 cores. In the GPU simulation, each core calculates one photon, so a large number of photons are calculated simultaneously. Results show that the simulation times on the GPU were significantly accelerated compared to the CPU. The simulations on the 2304-core GPU were performed about 64-114 times faster than on the CPU, while the simulations on the 384-core GPU were performed about 20-31 times faster than on a single core of the CPU. Another result shows that the optimum image quality was obtained with the number of histories starting from 10^8 and energies from 60 keV to 90 keV. Analyzed by a statistical approach, the quality of the GPU and CPU images is essentially the same.

  1. Promise of a low power mobile CPU based embedded system in artificial leg control.

    Science.gov (United States)

    Hernandez, Robert; Zhang, Fan; Zhang, Xiaorong; Huang, He; Yang, Qing

    2012-01-01

    This paper presents the design and implementation of a low-power embedded system using mobile processor technology (the Intel Atom™ Z530 processor) specifically tailored for a neural-machine interface (NMI) for artificial limbs. This embedded system effectively performs our previously developed NMI algorithm based on neuromuscular-mechanical fusion and phase-dependent pattern classification. The analysis shows that the NMI embedded system can meet real-time constraints with high accuracy in recognizing the user's locomotion mode. Our implementation utilizes the mobile processor efficiently to allow a power consumption of 2.2 watts and low CPU utilization (less than 4.3%) while executing the complex NMI algorithm. Our experiments have shown that the highly optimized C implementation on the embedded system has clear advantages over existing PC implementations in MATLAB. The study results suggest that a mobile-CPU-based embedded system is promising for implementing advanced control for powered lower limb prostheses.

  2. CPU SIM: A Computer Simulator for Use in an Introductory Computer Organization-Architecture Class.

    Science.gov (United States)

    Skrein, Dale

    1994-01-01

    CPU SIM, an interactive low-level computer simulation package that runs on the Macintosh computer, is described. The program is designed for instructional use in the first or second year of undergraduate computer science, to teach various features of typical computer organization through hands-on exercises. (MSE)

  3. Lodovico Giustini and the Emergence of the Keyboard Sonata in Italy

    Directory of Open Access Journals (Sweden)

    Freeman, Daniel E.

    2003-12-01

    The twelve keyboard sonatas, Op. 1, of Lodovico Giustini (1685-1743) constitute the earliest music explicitly indicated for performance on the pianoforte. They are attractive compositions in early classic style that exhibit an interesting mixture of influences from Italian keyboard music, the Italian violin sonata, and French harpsichord music. Their unusual format of dances, contrapuntal excursions, and novelties in four or five movements appears to have been inspired by the Op. 1 violin sonatas of Francesco Veracini, a fellow Tuscan. Although the only source of the sonatas is a print dated Florence, 1732, it is clear that the print could only have appeared between 1734 and 1740. It was probably disseminated out of Lisbon, not Florence, as a result of the patronage of the Infante Antonio of Portugal and Dom João de Seixas, a prominent courtier in Lisbon during the late 1730s.


  4. Der ATLAS LVL2-Trigger mit FPGA-Prozessoren : Entwicklung, Aufbau und Funktionsnachweis des hybriden FPGA/CPU-basierten Prozessorsystems ATLANTIS

    CERN Document Server

    Singpiel, Holger

    2000-01-01

    This thesis describes the conception and implementation of the hybrid FPGA/CPU-based processing system ATLANTIS as a trigger processor for the proposed ATLAS experiment at CERN. CompactPCI provides the close coupling of a multi-FPGA system and a standard CPU. The system is scalable in computing power and flexible in use due to its partitioning into dedicated FPGA boards for computation, I/O tasks, and private communication. The research activities based on the ATLANTIS system focus on two areas of the second level trigger (LVL2). First, the major aim is the acceleration of time-critical B-physics trigger algorithms. The execution of the full-scan TRT algorithm on ATLANTIS, which was used as a demonstrator, results in a speedup of 5.6 compared to a standard CPU. Second, the ATLANTIS system is used as a hardware platform for research work in conjunction with the ATLAS readout systems. For further studies, a permanent installation of the ATLANTIS system in the LVL2 application testbed is f...

  5. Fluency and Accuracy in Alphabet Writing by Keyboarding: A Cross-Sectional Study in Spanish-Speaking Children With and Without Learning Disabilities

    NARCIS (Netherlands)

    Bisschop, Elaine; Morales, Celia; Gil, Verónica; Jiménez-Suárez, Elizabeth

    2017-01-01

    The aim of this study was to analyze whether children with and without difficulties in handwriting, spelling, or both differed in alphabet writing when using a keyboard. The total sample consisted of 1,333 children from Grades 1 through 3. Scores on the spelling and handwriting factors from the

  6. A Voice-Detecting Sensor and a Scanning Keyboard Emulator to Support Word Writing by Two Boys with Extensive Motor Disabilities

    Science.gov (United States)

    Lancioni, Giulio E.; Singh, Nirbhay N.; O'Reilly, Mark F.; Sigafoos, Jeff; Green, Vanessa; Chiapparino, Claudia; Stasolla, Fabrizio; Oliva, Doretta

    2009-01-01

    The present study assessed the use of a voice-detecting sensor interfaced with a scanning keyboard emulator to allow two boys with extensive motor disabilities to write. Specifically, the study (a) compared the effects of the voice-detecting sensor with those of a familiar pressure sensor on the boys' writing time, (b) checked which of the sensors…

  7. A validation study of the Keyboard Personal Computer Style instrument (K-PeCS) for use with children.

    Science.gov (United States)

    Green, Dido; Meroz, Anat; Margalit, Adi Edit; Ratzon, Navah Z

    2012-11-01

    This study examines a potential instrument for the measurement of children's typing postures. The paper describes the inter-rater reliability, test-retest reliability, and concurrent validity of the Keyboard Personal Computer Style instrument (K-PeCS), an observational measure of postures and movements during keyboarding, for use with children. Two trained raters independently rated videos of 24 children (aged 7-10 years). Six children returned one week later to establish test-retest reliability. Concurrent validity was assessed by comparing ratings obtained using the K-PeCS with scores from a 3D motion analysis system. Inter-rater reliability was moderate to high for 12 out of 16 items (Kappa: 0.46 to 1.00; correlation coefficients: 0.77-0.95), and test-retest reliability varied across items (Kappa: 0.25 to 0.67; correlation coefficients: r = 0.20 to r = 0.95). Concurrent validity compared favourably across arm pathlength, wrist extension, and ulnar deviation. In light of the limitations of other tools, the K-PeCS offers a fairly affordable, reliable, and valid instrument that addresses the gap in the measurement of children's typing styles, despite the shortcomings of some items. However, further research is required to refine the instrument for use in evaluating typing among children. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  8. Energy consumption optimization of the total-FETI solver by changing the CPU frequency

    Science.gov (United States)

    Horak, David; Riha, Lubomir; Sojka, Radim; Kruzik, Jakub; Beseda, Martin; Cermak, Martin; Schuchart, Joseph

    2017-07-01

    The energy consumption of supercomputers is one of the critical problems for the upcoming exascale supercomputing era. Awareness of power and energy consumption is required on both the software and hardware side. This paper deals with the energy-consumption evaluation of the Finite Element Tearing and Interconnect (FETI) based solvers of linear systems, an established method for solving real-world engineering problems. We have evaluated the effect of the CPU frequency on the energy consumption of the FETI solver using a linear elasticity 3D cube synthetic benchmark. For this problem, we have evaluated the effect of frequency tuning on the energy consumption of the essential processing kernels of the FETI method. The paper provides results for two types of frequency tuning: (1) static tuning and (2) dynamic tuning. For the static-tuning experiments, the frequency is set before execution and kept constant during the runtime. For dynamic tuning, the frequency is changed during program execution to adapt the system to the actual needs of the application. The paper shows that static tuning brings up to 12% energy savings compared to the default CPU settings (the highest clock rate). Dynamic tuning improves this further by up to 3%.
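
    On Linux, the static-tuning setup (fixing the clock before a run) can be approximated through the standard cpufreq sysfs interface, as in the sketch below. This is a generic illustration, not the paper's tooling; it assumes root privileges and a driver that exposes the "userspace" governor (intel_pstate systems, for example, do not), and the target frequency is an arbitrary assumption.

        import glob

        # Pin every core to a fixed frequency via the Linux cpufreq sysfs
        # interface (requires root; the "userspace" governor must be
        # available for scaling_setspeed to be writable).
        TARGET_KHZ = "2400000"  # assumed frequency in kHz

        for cpu in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq"):
            with open(f"{cpu}/scaling_governor", "w") as f:
                f.write("userspace")      # hand clock control to user space
            with open(f"{cpu}/scaling_setspeed", "w") as f:
                f.write(TARGET_KHZ)       # request the fixed clock rate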

  9. A data input controller for an alphanumeric and function keyboard with ports to the CAMAC-dataway or the serial plasma display controller

    International Nuclear Information System (INIS)

    Zahn, J.; Komor, Z.; Geldmeyer, H.J.

    1976-01-01

    A data input controller has been developed to allow the data transfer from an alphanumeric and function keyboard to the CAMAC-dataway or via the plasma display controller SIG-8AS/S and a serial transmission line to the TTY-/V.24-port of a computer. (orig.)

  10. Simulation of small-angle scattering patterns using a CPU-efficient algorithm

    Science.gov (United States)

    Anitas, E. M.

    2017-12-01

    Small-angle scattering (of neutrons, x-rays or light; SAS) is a well-established experimental technique for the structural analysis of disordered systems at nano and micro scales. For complex systems, such as supramolecular assemblies or protein molecules, analytic solutions of the SAS intensity are generally not available. Thus, a frequent approach to simulating the corresponding patterns is to use a CPU-efficient version of the Debye formula. For this purpose, in this paper we implement the well-known DALAI algorithm in Mathematica software. We present calculations for a series of 2D Sierpinski gaskets and, respectively, of pentaflakes, obtained from chaos game representation.
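
    For reference, the Debye formula that such simulations evaluate is I(q) = sum_i sum_j f_i f_j sin(q r_ij) / (q r_ij). A direct NumPy rendering follows; note that this is the naive O(N^2) sum, not the CPU-efficient DALAI algorithm the paper implements, and the toy point set is an assumption.

        import numpy as np

        def debye_intensity(points, q_values, f=1.0):
            # Direct O(N^2) Debye sum: I(q) = sum_ij f_i f_j sinc(q r_ij).
            # np.sinc(x) = sin(pi x)/(pi x), hence the division by pi.
            diffs = points[:, None, :] - points[None, :, :]
            r = np.linalg.norm(diffs, axis=-1)           # all pair distances
            return np.array([f * f * np.sum(np.sinc(q * r / np.pi))
                             for q in q_values])

        rng = np.random.default_rng(3)
        pts = rng.random((200, 2))           # toy 2D point set (e.g. a fractal slice)
        qs = np.linspace(0.1, 50.0, 8)       # assumed scattering-vector grid
        print(debye_intensity(pts, qs))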

  11. MASSIVELY PARALLEL LATENT SEMANTIC ANALYSES USING A GRAPHICS PROCESSING UNIT

    Energy Technology Data Exchange (ETDEWEB)

    Cavanagh, J.; Cui, S.

    2009-01-01

    Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of datasets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. A graphics processing unit (GPU) can solve some highly parallel problems much faster than a traditional sequential processor or central processing unit (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a PC cluster. Due to the GPU's application-specific architecture, harnessing the GPU's computational prowess for LSA is a great challenge. We presented a parallel LSA implementation on the GPU, using NVIDIA® Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms software. The performance of this implementation is compared to a traditional LSA implementation on a CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1,000 x 1,000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits of the GPU for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.
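
    To make the kernel under discussion concrete, the core LSA step (a rank-k truncated SVD of the term-document matrix) can be sketched in NumPy as follows. The matrix contents and rank are assumptions; the GPU variant benchmarked above would replace the underlying linear algebra with CUDA/CUBLAS-backed routines.

        import numpy as np

        def lsa(term_doc, k):
            # Latent Semantic Analysis: rank-k truncated SVD of the
            # term-document matrix. The SVD dominates the runtime and is
            # the part a GPU BLAS accelerates.
            U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
            return U[:, :k], s[:k], Vt[:k, :]   # terms, strengths, documents

        rng = np.random.default_rng(7)
        A = rng.random((1000, 300))             # toy 1000-term x 300-doc matrix
        U_k, s_k, Vt_k = lsa(A, k=50)
        doc_vectors = (np.diag(s_k) @ Vt_k).T   # documents in latent space
        print(doc_vectors.shape)                # (300, 50)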

  12. Using Interpretative Phenomenological Analysis in a Mixed Methods Research Design to Explore Music in the Lives of Mature Age Amateur Keyboard Players

    Science.gov (United States)

    Taylor, Angela

    2015-01-01

    This article discusses the use of interpretative phenomenological analysis (IPA) in a mixed methods research design with reference to five recent publications about music in the lives of mature age amateur keyboard players. It explores the links between IPA and the data-gathering methods of "Rivers of Musical Experience",…

  13. Evaluation unit of X-ray spectrometer

    International Nuclear Information System (INIS)

    Polivka, V.

    1986-01-01

    The evaluation unit is designed as a CAMAC modular system. It processes analog signals from the detector, amplifies them, digitizes them, stores them, and displays them. The analog data collection system consists of a high-voltage supply, a linear amplifier, and an analog-to-digital converter. The digital part of the data collection system consists of a data memory and a mapping unit. The control and calculation system, consisting of a controller, a memory, an expandable working memory, a floppy disk controller, a parallel input and output for the terminal, and a controller for block transfer, provides control of the entire spectrometer and the calculations for qualitative and quantitative analyses. It also provides the connection to the peripherals: the disk operating system, the graphics terminal with keyboard, and the mosaic printer. (M.D.)

  14. The Linguistics of Keyboard-to-Screen Communication: A New Terminological Framework

    Directory of Open Access Journals (Sweden)

    Andreas H. Jucker

    2012-01-01

    New forms of communication that have recently developed in the context of Web 2.0 make it necessary to reconsider some of the analytical tools of linguistic analysis. In the context of keyboard-to-screen communication (KSC), as we shall call it, a range of old dichotomies have become blurred or have ceased to be useful altogether, e.g. "asynchronous" versus "synchronous", "written" versus "spoken", "monologic" versus "dialogic", and in particular "text" versus "utterance". We propose alternative terminology ("communicative act" and "communicative act sequence") that is more adequate to describe the new realities of online communication and can usefully be applied to such diverse entities as weblog entries, tweets, status updates on social network sites, comments on other postings, and sequences of such entities. Furthermore, in the context of social network sites, different forms of communication that were traditionally separate (i.e. blog, chat, email and so on) seem to converge. We illustrate and discuss these phenomena with data from Twitter and Facebook.

  15. Pseudo-random number generators for Monte Carlo simulations on ATI Graphics Processing Units

    Science.gov (United States)

    Demchik, Vadim

    2011-03-01

    Basic uniform pseudo-random number generators are implemented on ATI Graphics Processing Units (GPUs). The performance results of the implemented generators (multiplicative linear congruential (GGL), XOR-shift (XOR128), RANECU, RANMAR, RANLUX and Mersenne Twister (MT19937)) on CPU and GPU are discussed. The obtained speed-up factor is hundreds of times in comparison with the CPU. The RANLUX generator is found to be the most appropriate for use on the GPU in Monte Carlo simulations. A brief review of the pseudo-random number generators used in modern software packages for Monte Carlo simulations in high-energy physics is presented.
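
    Of the generators listed, the XOR-shift family is the simplest to show. The sketch below is Marsaglia's xorshift128 written in plain Python so the algorithm is explicit; a GPU implementation would keep one such four-word state per thread. The seed values are Marsaglia's canonical ones, and the generator-per-thread framing is an assumption of this illustration.

        MASK = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic

        def xorshift128(x, y, z, w):
            # Marsaglia's xorshift128: period 2**128 - 1, only a few
            # XORs and shifts per output -- why it suits per-thread
            # generation on GPUs.
            while True:
                t = (x ^ (x << 11)) & MASK
                x, y, z = y, z, w
                w = (w ^ (w >> 19) ^ t ^ (t >> 8)) & MASK
                yield w

        gen = xorshift128(123456789, 362436069, 521288629, 88675123)
        print([next(gen) for _ in range(5)])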

  16. Overtaking CPU DBMSes with a GPU in whole-query analytic processing with parallelism-friendly execution plan optimization

    NARCIS (Netherlands)

    A. Agbaria (Adnan); D. Minor (David); N. Peterfreund (Natan); E. Rozenberg (Eyal); O. Rosenberg (Ofer); Huawei Research

    2016-01-01

    Existing work on accelerating analytic DB query processing with (discrete) GPUs fails to fully realize their potential for speedup through parallelism: published results do not achieve significant speedup over more performant CPU-only DBMSes when processing complete queries. This

  17. Discrepancy Between Clinician and Research Assistant in TIMI Score Calculation (TRIAGED CPU)

    Directory of Open Access Journals (Sweden)

    Taylor, Brian T.

    2014-11-01

    Introduction: Several studies have attempted to demonstrate that the Thrombolysis in Myocardial Infarction (TIMI) risk score has the ability to risk-stratify emergency department (ED) patients with potential acute coronary syndromes (ACS). Most of the studies we reviewed relied on trained research investigators to determine TIMI risk scores rather than ED providers functioning in their normal work capacity. We assessed whether TIMI risk scores obtained by ED providers in the setting of a busy ED differed from those obtained by trained research investigators. Methods: This was an ED-based prospective observational cohort study comparing TIMI scores obtained by 49 ED providers admitting patients to an ED chest pain unit (CPU) with scores generated by a team of trained research investigators. We examined provider type, patient gender, and TIMI elements for their effects on TIMI risk score discrepancy. Results: Of the 501 adult patients enrolled in the study, 29.3% of TIMI risk scores determined by ED providers and trained research investigators were generated using identical TIMI risk score variables. In our low-risk population the majority of TIMI risk score differences were small; however, 12% of TIMI risk scores differed by two or more points. Conclusion: TIMI risk scores determined by ED providers in the setting of a busy ED frequently differ from scores generated by trained research investigators who complete them while not under the same pressure as an ED provider. [West J Emerg Med. 2015;16(1):24-33.]
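
    For context, the TIMI risk score for unstable angina/NSTEMI is a sum of seven binary elements, one point each. The sketch below encodes the commonly cited elements for illustration only; it is not the study's instrument, and any clinical use would require a validated implementation.

        def timi_ua_nstemi(age_ge_65, cad_risk_factors_ge_3, known_cad_stenosis_ge_50,
                           aspirin_last_7_days, severe_angina_ge_2_in_24h,
                           st_deviation_ge_0_5mm, elevated_cardiac_markers):
            # One point per positive element; the total ranges from 0 to 7.
            return sum(map(bool, (age_ge_65, cad_risk_factors_ge_3,
                                  known_cad_stenosis_ge_50, aspirin_last_7_days,
                                  severe_angina_ge_2_in_24h, st_deviation_ge_0_5mm,
                                  elevated_cardiac_markers)))

        # Example: two discrepant assessments of the same hypothetical patient
        provider = timi_ua_nstemi(True, False, False, True, False, False, False)
        research = timi_ua_nstemi(True, True, False, True, False, False, False)
        print(provider, research, "discrepancy:", abs(provider - research))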

  18. Alphabet Writing and Allograph Selection as Predictors of Spelling in Sentences Written by Spanish-Speaking Children Who Are Poor or Good Keyboarders

    Science.gov (United States)

    Peake, Christian; Diaz, Alicia; Artiles, Ceferino

    2017-01-01

    This study examined the relationship and degree of predictability that the fluency of writing the alphabet from memory and the selection of allographs have on measures of fluency and accuracy of spelling in a free-writing sentence task when keyboarding. The "Test Estandarizado para la Evaluación de la Escritura con Teclado"…

  19. Don’t Interrupt Me While I Type: Inferring Text Entered Through Gesture Typing on Android Keyboards

    Directory of Open Access Journals (Sweden)

    Simon Laurent

    2016-07-01

    We present a new side-channel attack against soft keyboards that support gesture typing on Android smartphones. An application without any special permissions can observe the number and timing of the screen hardware interrupts and system-wide software interrupts generated during user input, and analyze this information to make inferences about the text being entered by the user. System-wide information is usually considered less sensitive than app-specific information, but we provide concrete evidence that this may be mistaken. Our attack applies to all Android versions, including Android M, where the SELinux policy is tightened.

  20. Design of a Message Passing Model for Use in a Heterogeneous CPU-NFP Framework for Network Analytics

    CSIR Research Space (South Africa)

    Pennefather, S

    2017-09-01

    ... of applications written in the Go programming language to be executed on a Network Flow Processor (NFP) for enhanced performance. This paper explores the need for and feasibility of implementing a message passing model for data transmission between the NFP and CPU...

  1. Discrete-Event Execution Alternatives on General Purpose Graphical Processing Units

    International Nuclear Information System (INIS)

    Perumalla, Kalyan S.

    2006-01-01

    Graphics cards, traditionally designed as accelerators for computer graphics, have evolved to support more general-purpose computation. General Purpose Graphical Processing Units (GPGPUs) are now being used as highly efficient, cost-effective platforms for executing certain simulation applications. While most of these applications belong to the category of time-stepped simulations, little is known about the applicability of GPGPUs to discrete event simulation (DES). Here, we identify some of the issues and challenges that the GPGPU stream-based interface raises for DES, and present some possible approaches to moving DES to GPGPUs. Initial performance results on simulation of a diffusion process show that DES-style execution on GPGPU runs faster than DES on CPU and also significantly faster than time-stepped simulations on either CPU or GPGPU.

  2. A Robust Ultra-Low Voltage CPU Utilizing Timing-Error Prevention

    Directory of Open Access Journals (Sweden)

    Markus Hiienkari

    2015-04-01

    To minimize the energy consumption of a digital circuit, logic can be operated at sub- or near-threshold voltage. Operation in this region is challenging due to device and environment variations, and the resulting performance may not be adequate for all applications. This article presents two variants of a 32-bit RISC CPU targeted for near-threshold voltage. Both CPUs are placed on the same die and manufactured in a 28 nm CMOS process. They employ timing-error prevention with clock stretching to enable operation with minimal safety margins while maximizing performance and energy efficiency at a given operating point. Measurements show a minimum energy of 3.15 pJ/cycle at 400 mV, which corresponds to a 39% energy saving compared to operation based on static signoff timing.

  3. Computing the Density Matrix in Electronic Structure Theory on Graphics Processing Units.

    Science.gov (United States)

    Cawkwell, M J; Sanville, E J; Mniszewski, S M; Niklasson, Anders M N

    2012-11-13

    The self-consistent solution of a Schrödinger-like equation for the density matrix is a critical and computationally demanding step in quantum-based models of interatomic bonding. This step was tackled historically via the diagonalization of the Hamiltonian. We have investigated the performance and accuracy of the second-order spectral projection (SP2) algorithm for the computation of the density matrix via a recursive expansion of the Fermi operator in a series of generalized matrix-matrix multiplications. We demonstrate that owing to its simplicity, the SP2 algorithm [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] is exceptionally well suited to implementation on graphics processing units (GPUs). The performance in double and single precision arithmetic of a hybrid GPU/central processing unit (CPU) and a full GPU implementation of the SP2 algorithm exceeds that of a CPU-only implementation of the SP2 algorithm and traditional matrix diagonalization when the dimensions of the matrices exceed about 2000 x 2000. Padding schemes for arrays allocated in the GPU memory that optimize the performance of the CUBLAS implementations of the level 3 BLAS DGEMM and SGEMM subroutines for generalized matrix-matrix multiplications are described in detail. The analysis of the relative performance of the hybrid CPU/GPU and full GPU implementations indicates that the transfer of arrays between the GPU and CPU constitutes only a small fraction of the total computation time. The errors measured in the self-consistent density matrices computed using the SP2 algorithm are generally smaller than those measured in matrices computed via diagonalization. Furthermore, the errors in the density matrices computed using the SP2 algorithm do not exhibit any dependence on system size, whereas the errors increase linearly with the number of orbitals when diagonalization is employed.
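
    A compact NumPy rendering of the SP2 recursion may help readers: starting from the Hamiltonian rescaled so its spectrum lies in [0, 1], each step squares the matrix or applies 2X - X^2, whichever moves the trace toward the electron count. The sketch below follows that published recursion, but the toy Hamiltonian, spectral bounds, and convergence test are assumptions of this illustration; the paper's GPU version replaces the matrix multiply with CUBLAS GEMM calls.

        import numpy as np

        def sp2_density_matrix(H, n_occ, e_min, e_max, tol=1e-9, max_iter=100):
            # Second-order spectral projection: purify X toward the density
            # matrix using only matrix-matrix multiplies (GEMM), the kernel
            # that maps so well onto GPUs.
            n = H.shape[0]
            X = (e_max * np.eye(n) - H) / (e_max - e_min)  # spectrum -> [0, 1]
            for _ in range(max_iter):
                X2 = X @ X
                tr_x, tr_x2 = np.trace(X), np.trace(X2)
                # Pick the projection that moves trace(X) toward n_occ.
                if abs(tr_x2 - n_occ) < abs(2.0 * tr_x - tr_x2 - n_occ):
                    X = X2
                else:
                    X = 2.0 * X - X2
                if abs(tr_x - tr_x2) < tol:   # idempotency reached
                    break
            return X

        H = np.diag([-2.0, -1.0, 0.5, 1.5])   # toy Hamiltonian, 2 occupied states
        P = sp2_density_matrix(H, n_occ=2, e_min=-2.0, e_max=1.5)
        print(np.round(np.diag(P), 3), "trace:", round(float(np.trace(P)), 3))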

  4. Intensity-based segmentation and visualization of cells in 3D microscopic images using the GPU

    Science.gov (United States)

    Kang, Mi-Sun; Lee, Jeong-Eom; Jeon, Woong-ki; Choi, Heung-Kook; Kim, Myoung-Hee

    2013-02-01

    3D microscopy images contain astronomical amounts of data, rendering 3D microscopy image processing time-consuming and laborious on a central processing unit (CPU). To work around these problems, many people crop a region of interest (ROI) of the input image to a small size. Although this reduces cost and time, there are drawbacks at the image processing level; e.g., the selected ROI strongly depends on the user, and original image information is lost. To mitigate these problems, we developed a 3D microscopy image processing tool on a graphics processing unit (GPU). Our tool provides efficient and varied automatic thresholding methods to achieve intensity-based segmentation of 3D microscopy images, and users can select the algorithm to be applied. Further, the tool provides visualization of the segmented volume data, whose scale, translation, etc. can be set using a keyboard and mouse. However, the 3D objects visualized this quickly still need to be analyzed to obtain information useful to biologists, which requires quantitative data about the images. Therefore, we label the segmented 3D objects within all 3D microscopic images and obtain quantitative information on each labeled object; this information can be used as classification features. A user can select an object to be analyzed, and our tool displays the selected object in a new window so that more details of the object can be observed. Finally, we validate the effectiveness of our tool by comparing CPU and GPU processing times under matched specification and configuration.

  5. Book Review: Placing the Suspect behind the Keyboard: Using Digital Forensics and Investigative Techniques to Identify Cybercrime Suspects

    Directory of Open Access Journals (Sweden)

    Thomas Nash

    2013-06-01

    Shavers, B. (2013). Placing the Suspect behind the Keyboard: Using Digital Forensics and Investigative Techniques to Identify Cybercrime Suspects. Waltham, MA: Elsevier, 290 pages, ISBN 978-1-59749-985-9, US$51.56. Includes bibliographical references and index. Reviewed by Detective Corporal Thomas Nash (tnash@bpdvt.org), Burlington Vermont Police Department, Internet Crimes Against Children Task Force; Adjunct Instructor, Champlain College, Burlington, VT. In this must-read for any aspiring novice cybercrime investigator, as well as for the seasoned professional computer guru alike, Brett Shavers takes the reader into the ever-changing and dynamic world of cybercrime investigation. Shavers, an experienced criminal investigator, lays out the details and intricacies of a computer-related crime investigation in a clear and concise manner in his new easy-to-read publication, Placing the Suspect behind the Keyboard: Using Digital Forensics and Investigative Techniques to Identify Cybercrime Suspects. Shavers takes the reader from start to finish through each step of the investigative process in well-organized and easy-to-follow sections, with real case file examples, to reach the ultimate goal of any investigation: identifying the suspect and proving their guilt in the crime. Do not be fooled by the title: this excellent, easily accessible reference is beneficial to both criminal and civil investigations and should be in every investigator's library regardless of their respective criminal or civil investigative responsibilities. (See PDF for full review.)

  6. Arkansas' Curriculum Guide. Competency Based Typewriting.

    Science.gov (United States)

    Arkansas State Dept. of Education, Little Rock. Div. of Vocational, Technical and Adult Education.

    This guide contains the essential parts of a total curriculum for a one-year typewriting course at the secondary school level. Addressed in the individual units of the guide are the following topics: alphabetic keyboarding, numeric keyboarding, basic symbol keyboarding, skill development, problem typewriting, ten-key numeric pads, production…

  7. FAST CALCULATION OF THE LOMB-SCARGLE PERIODOGRAM USING GRAPHICS PROCESSING UNITS

    International Nuclear Information System (INIS)

    Townsend, R. H. D.

    2010-01-01

    I introduce a new code for fast calculation of the Lomb-Scargle periodogram that leverages the computing power of graphics processing units (GPUs). After establishing a background to the newly emergent field of GPU computing, I discuss the code design and narrate key parts of its source. Benchmarking calculations indicate no significant differences in accuracy compared to an equivalent CPU-based code. However, the differences in performance are pronounced; running on a low-end GPU, the code can match eight CPU cores, and on a high-end GPU it is faster by a factor approaching 30. Applications of the code include analysis of long photometric time series obtained by ongoing satellite missions and upcoming ground-based monitoring facilities, and Monte Carlo simulation of periodogram statistical properties.
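
    For reference, the classical Lomb-Scargle periodogram evaluated by such codes can be written directly in NumPy, as below. This is the textbook formulation, not the paper's GPU kernel; the irregular sampling, test signal, and frequency grid are assumptions.

        import numpy as np

        def lomb_scargle(t, y, omegas):
            # Classical Lomb-Scargle periodogram: for each angular frequency,
            # compute the time offset tau, then the two normalized sums.
            y = y - y.mean()
            power = np.empty_like(omegas)
            for k, w in enumerate(omegas):
                tau = np.arctan2(np.sum(np.sin(2*w*t)),
                                 np.sum(np.cos(2*w*t))) / (2*w)
                c, s = np.cos(w*(t - tau)), np.sin(w*(t - tau))
                power[k] = 0.5 * ((np.dot(y, c)**2) / np.dot(c, c)
                                  + (np.dot(y, s)**2) / np.dot(s, s))
            return power

        rng = np.random.default_rng(5)
        t = np.sort(rng.uniform(0, 100, 300))              # irregular sampling
        y = np.sin(2*np.pi*0.17*t) + 0.5*rng.standard_normal(300)
        omegas = 2*np.pi*np.linspace(0.01, 0.5, 500)
        peak = omegas[np.argmax(lomb_scargle(t, y, omegas))] / (2*np.pi)
        print("peak frequency ~", peak)                    # expect ~0.17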

  8. 77 FR 76517 - Notice of Receipt of Complaint; Solicitation of Comments Relating to the Public Interest

    Science.gov (United States)

    2012-12-28

    ... Certain Mobile Handset Devices and Related Touch Keyboard Software Technology, DN 2923; the Commission is.... 1337) in the importation into the United States, the sale for importation, and the sale within the United States after importation of certain mobile handset devices and related touch keyboard software...

  9. Evaluation of the CPU time for solving the radiative transfer equation with high-order resolution schemes applying the normalized weighting-factor method

    Science.gov (United States)

    Xamán, J.; Zavala-Guillén, I.; Hernández-López, I.; Uriarte-Flores, J.; Hernández-Pérez, I.; Macías-Melo, E. V.; Aguilar-Castro, K. M.

    2018-03-01

    In this paper, we evaluated the convergence rate (CPU time) of a new mathematical formulation for the numerical solution of the radiative transfer equation (RTE) with several High-Order (HO) and High-Resolution (HR) schemes. In computational fluid dynamics, this procedure is known as the Normalized Weighting-Factor (NWF) method, and it is adopted here. The NWF method is used to incorporate the high-order resolution schemes in the discretized RTE. The NWF method is compared, in terms of the computer time needed to obtain a converged solution, with the widely used deferred-correction (DC) technique for the calculation of a two-dimensional cavity with emitting-absorbing-scattering gray media using the discrete ordinates method. Six parameters, viz. the grid size, the order of quadrature, the absorption coefficient, the emissivity of the boundary surface, the under-relaxation factor, and the scattering albedo, are considered to evaluate ten schemes. The results showed that, using the DC method, the scheme with the lowest CPU time is in general the SOU. In contrast with the results of the DC procedure, the CPU times for the DIAMOND and QUICK schemes using the NWF method are shown to be between 3.8 and 23.1% and between 12.6 and 56.1% faster, respectively. However, the other schemes are more time-consuming when the NWF is used instead of the DC method. Additionally, a second test case was presented, and the results showed that, depending on the problem under consideration, the NWF procedure may be computationally faster or slower than the DC method. As an example, the CPU times for the QUICK and SMART schemes are 61.8% and 203.7% slower, respectively, when the NWF formulation is used for the second test case. Finally, future research to explore the computational cost of the NWF method in more complex problems is required.

  10. First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC

    CERN Document Server

    Halyo, V.; Lujan, P.; Karpusenko, V.; Vladimirov, A.

    2014-04-07

    Recent innovations focused around parallel processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA's Tesla Graphics Processing Unit (GPU) or Intel's Xeon Phi, in the High Level Trigger. These accelerators have the potential to provide faster or more energy efficient event selection, thus opening up possibilities for new complex triggers that were not previously feasible. At the same time, it is crucial to explore the performance limits achievable on the latest generation multicore CPUs with the use of the best software optimization methods. In this article, a new tracking algorithm based on the Hough transform will be evaluated for the first time on a multi-core Intel Xeon E5-2697v2 CPU, an NVIDIA Tesla K20c GPU, and an Intel Xeon Phi...
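
    To make the idea concrete, here is a toy NumPy sketch of a (rho, theta) Hough-transform vote for straight lines in a plane; real LHC track finding uses detector-specific parameterizations, so this is only a conceptual stand-in with invented points.

        import numpy as np

        def hough_lines(points, n_theta=180, n_rho=200, rho_max=10.0):
            """Vote each (x, y) point into a (theta, rho) accumulator; peaks = lines."""
            thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
            acc = np.zeros((n_theta, n_rho), dtype=np.int32)
            for x, y in points:
                rho = x*np.cos(thetas) + y*np.sin(thetas)   # one rho per theta
                idx = np.round((rho + rho_max) / (2*rho_max) * (n_rho - 1)).astype(int)
                ok = (idx >= 0) & (idx < n_rho)
                acc[np.arange(n_theta)[ok], idx[ok]] += 1   # cast the votes
            return acc, thetas

        # hits lying on the line y = 0.5*x + 1 (illustrative only)
        xs = np.linspace(-5.0, 5.0, 50)
        acc, thetas = hough_lines(np.column_stack([xs, 0.5*xs + 1.0]))
        i, j = np.unravel_index(acc.argmax(), acc.shape)
        print("peak theta (rad):", thetas[i])

    The accumulator fill is embarrassingly parallel over hits, which is what makes the method attractive for GPUs and Xeon Phi alike.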

  11. VMware vSphere performance designing CPU, memory, storage, and networking for performance-intensive workloads

    CERN Document Server

    Liebowitz, Matt; Spies, Rynardt

    2014-01-01

    Covering the latest VMware vSphere software, an essential book aimed at solving vSphere performance problems before they happen. VMware vSphere is the industry's most widely deployed virtualization solution. However, if you improperly deploy vSphere, performance problems occur. Aimed at VMware administrators and engineers and written by a team of VMware experts, this resource provides guidance on common CPU, memory, storage, and network-related problems. Plus, step-by-step instructions walk you through techniques for solving problems and shed light on possible causes behind the problems.

  12. Monitoring of mass flux of catalyst FCC in a Cold Pilot Unit by gamma radiation transmission

    International Nuclear Information System (INIS)

    Brito, Marcio Fernando Paixao de

    2014-01-01

    This work proposes a model for monitoring the mass flow of FCC (Fluid Catalytic Cracking) catalyst in a CPU (Cold Pilot Unit), driven by the injection of air and solids, using gamma radiation transmission. The CPU simplifies the FCC process, represented by the catalyst cycle, and was built in acrylic so that the flow can be visualized. The CPU consists of a riser, a separation chamber, and a return column, and simulates the riser reactor of the FCC process. The catalyst is injected from the return column into the base of the riser, an inclined tube, where compressed air fluidizes it along the riser. When the catalyst reaches the separation chamber, the solid phase is sent to the return column, and the gas phase exits the system through one of the four cyclones at the top of the separation chamber. Gamma transmission measurements are made in three test sections, each with a shielded source and detector. Pressure drops in the riser are measured with three pressure gauges positioned along it. The source used was Am-241, emitting gamma rays of 60 keV, and the detector was a 2 x 2 in. NaI(Tl) scintillator. Mass flow measurements are made by varying the catalyst seal and the solids density in the riser, since combining these measurements allows the catalyst velocity in the riser to be determined. The results show that gamma transmission is a suitable technique for monitoring the catalyst flow, that the flow regime in the CPU is annular, that third-generation tomography is the most appropriate technique for studying the CPU, and that the density of the circulating solids in the CPU decreases linearly with increasing air flow. (author)
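
    The measurement principle is Beer-Lambert attenuation: the denser the solids in the beam path, the fewer 60 keV photons reach the detector. A hedged sketch of the inversion, with every numerical value an illustrative assumption rather than data from this work:

        import math

        # Beer-Lambert: I = I0 * exp(-mu_m * rho * x)  =>  rho = ln(I0/I) / (mu_m * x)
        mu_m = 0.15     # assumed mass attenuation coefficient at 60 keV, cm^2/g
        x    = 10.0     # assumed beam path length through the riser, cm
        I0   = 12000.0  # counts with the riser empty (illustrative)
        I    = 9500.0   # counts with catalyst circulating (illustrative)

        rho = math.log(I0 / I) / (mu_m * x)
        print(f"apparent solids density: {rho:.4f} g/cm^3")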

  13. Accelerating Molecular Dynamic Simulation on Graphics Processing Units

    Science.gov (United States)

    Friedrichs, Mark S.; Eastman, Peter; Vaidyanathan, Vishal; Houston, Mike; Legrand, Scott; Beberg, Adam L.; Ensign, Daniel L.; Bruns, Christopher M.; Pande, Vijay S.

    2009-01-01

    We describe a complete implementation of all-atom protein molecular dynamics running entirely on a graphics processing unit (GPU), including all standard force field terms, integration, constraints, and implicit solvent. We discuss the design of our algorithms and important optimizations needed to fully take advantage of a GPU. We evaluate its performance, and show that it can be more than 700 times faster than a conventional implementation running on a single CPU core. PMID:19191337

  14. Optimized 4-bit Quantum Reversible Arithmetic Logic Unit

    Science.gov (United States)

    Ayyoub, Slimani; Achour, Benslama

    2017-08-01

    Reversible logic has received great attention in recent years due to its ability to reduce power dissipation. The main goals in designing reversible logic are to decrease the quantum cost, the depth of the circuits, and the number of garbage outputs. The arithmetic logic unit (ALU) is an important part of the central processing unit (CPU), serving as its execution unit. This paper presents a complete design of a new reversible arithmetic logic unit (ALU) that can be part of a programmable reversible computing device such as a quantum computer. The proposed ALU is based on a reversible low-power control unit and a full adder with small performance parameters built from double Peres gates. The presented ALU can produce the largest number (28) of arithmetic and logic functions and has the lowest quantum cost and delay compared with existing designs.
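
    A Peres gate maps (a, b, c) to (a, a XOR b, (a AND b) XOR c), so with c = 0 it directly yields the sum and carry of a half adder. A small truth-table sketch of that behavior (our own illustration, not the paper's circuit):

        def peres(a, b, c):
            """Peres gate: (a, b, c) -> (a, a^b, (a&b)^c); a reversible 3-bit gate."""
            return a, a ^ b, (a & b) ^ c

        def half_adder(a, b):
            """With c = 0 the Peres gate acts as a half adder: returns (sum, carry)."""
            _, s, carry = peres(a, b, 0)
            return s, carry

        # reversibility: the 8 outputs are a permutation of the 8 inputs
        assert len({peres(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)}) == 8

        for a in (0, 1):
            for b in (0, 1):
                s, carry = half_adder(a, b)
                print(f"{a} + {b} -> sum={s}, carry={carry}")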

  15. Acceleration of the OpenFOAM-based MHD solver using graphics processing units

    International Nuclear Information System (INIS)

    He, Qingyun; Chen, Hongli; Feng, Jingchao

    2015-01-01

    Highlights: • A 3D PISO-MHD solver was implemented on Kepler-class graphics processing units (GPUs) using CUDA technology. • A consistent and conservative scheme is used in the code, validated by three basic benchmarks in rectangular and round ducts. • Parallel CPU and GPU acceleration were compared against a single-core CPU for MHD and non-MHD problems. • Different preconditioners for the MHD solver were compared, and the results showed that the AMG method is better for these calculations. - Abstract: The pressure-implicit with splitting of operators (PISO) magnetohydrodynamics (MHD) solver of the coupled Navier–Stokes and Maxwell equations was implemented on Kepler-class graphics processing units (GPUs) using the CUDA technology. The solver is developed on the open source code OpenFOAM based on a consistent and conservative scheme, which is suitable for simulating MHD flow under a strong magnetic field in a fusion liquid metal blanket with structured or unstructured meshes. We verified the validity of the implementation on several standard cases, including benchmark I (Shercliff's and Hunt's cases), benchmark II (fully developed circular pipe MHD flow), and benchmark III (the KIT experimental case). Computational performance of the GPU implementation was examined by comparing its double precision run times with those of essentially the same algorithms and meshes on the CPU. The results showed that a GPU (GTX 770) can outperform a server-class 4-core, 8-thread CPU (Intel Core i7-4770k) by a factor of at least 2.

  16. Acceleration of the OpenFOAM-based MHD solver using graphics processing units

    Energy Technology Data Exchange (ETDEWEB)

    He, Qingyun; Chen, Hongli, E-mail: hlchen1@ustc.edu.cn; Feng, Jingchao

    2015-12-15

    Highlights: • A 3D PISO-MHD solver was implemented on Kepler-class graphics processing units (GPUs) using CUDA technology. • A consistent and conservative scheme is used in the code, validated by three basic benchmarks in rectangular and round ducts. • Parallel CPU and GPU acceleration were compared against a single-core CPU for MHD and non-MHD problems. • Different preconditioners for the MHD solver were compared, and the results showed that the AMG method is better for these calculations. - Abstract: The pressure-implicit with splitting of operators (PISO) magnetohydrodynamics (MHD) solver of the coupled Navier–Stokes and Maxwell equations was implemented on Kepler-class graphics processing units (GPUs) using the CUDA technology. The solver is developed on the open source code OpenFOAM based on a consistent and conservative scheme, which is suitable for simulating MHD flow under a strong magnetic field in a fusion liquid metal blanket with structured or unstructured meshes. We verified the validity of the implementation on several standard cases, including benchmark I (Shercliff's and Hunt's cases), benchmark II (fully developed circular pipe MHD flow), and benchmark III (the KIT experimental case). Computational performance of the GPU implementation was examined by comparing its double precision run times with those of essentially the same algorithms and meshes on the CPU. The results showed that a GPU (GTX 770) can outperform a server-class 4-core, 8-thread CPU (Intel Core i7-4770k) by a factor of at least 2.

  17. Enhanced round robin CPU scheduling with burst time based time quantum

    Science.gov (United States)

    Indusree, J. R.; Prabadevi, B.

    2017-11-01

    Process scheduling is a very important function of an operating system. The best-known process-scheduling algorithms are the First Come First Serve (FCFS), Round Robin (RR), Priority scheduling, and Shortest Job First (SJF) algorithms. Compared to its peers, the Round Robin (RR) algorithm has the advantage of giving a fair share of the CPU to the processes already in the ready queue. The effectiveness of the RR algorithm depends greatly on the chosen time quantum value. In this paper, we propose an enhanced algorithm called Enhanced Round Robin with Burst-time based Time Quantum (ERRBTQ), which calculates the time quantum from the burst times of the processes already in the ready queue. The experimental results and analysis of the ERRBTQ algorithm clearly indicate improved performance compared with conventional RR and its variants.
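
    The paper's exact quantum rule is not reproduced here; the sketch below assumes one plausible burst-time-based variant in which the quantum is recomputed each round as the mean remaining burst time of the queued processes.

        from collections import deque
        from statistics import mean

        def round_robin_dynamic(bursts):
            """Round robin whose per-round quantum is the mean remaining burst time
            (an assumed burst-time-based variant, for illustration only)."""
            remaining = dict(enumerate(bursts))
            queue, clock, finish = deque(remaining), 0, {}
            while queue:
                quantum = max(1, round(mean(remaining[p] for p in queue)))
                for _ in range(len(queue)):          # one pass over the current round
                    p = queue.popleft()
                    run = min(quantum, remaining[p])
                    clock += run
                    remaining[p] -= run
                    if remaining[p] == 0:
                        finish[p] = clock            # process completes at this instant
                    else:
                        queue.append(p)              # back of the queue for next round
            return [finish[p] for p in sorted(finish)]

        print(round_robin_dynamic([24, 3, 3]))       # turnaround time per process

    Adapting the quantum to the queued burst times avoids many of the context switches that a fixed, badly chosen quantum would cause.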

  18. 75 FR 27798 - Notice of Issuance of Final Determination Concerning Certain Commodity-Based Clustered Storage Units

    Science.gov (United States)

    2010-05-18

    ...) with instructions on it that allows it to perform certain functions of preventing piracy of software... and HDD canisters usually include a disk array controller frame which effects the interface between the subsystem's storage units and a CPU. In this case, the software effects the interconnection...

  19. Comparison between dynamic programming and genetic algorithm for hydro unit economic load dispatch

    Directory of Open Access Journals (Sweden)

    Bin Xu

    2014-10-01

    Full Text Available The hydro unit economic load dispatch (ELD) is of great importance for energy conservation and emission reduction. Dynamic programming (DP) and genetic algorithms (GA) are two representative approaches for solving ELD problems. The goal of this study was to examine the performance of DP and GA when applied to ELD. We set up numerical experiments to compare DP and GA under two schemes: comparing the CPU time of the algorithms when they reach the same solution quality, and comparing the solution quality when they are given the same CPU time. The numerical experiments were applied to the Three Gorges Reservoir in China, which is equipped with 26 hydro generation units. Through these experiments we established the relation between algorithm performance and the number of units. The results show that GA is adept at searching for optimal solutions in low-dimensional cases. In some cases, such as when the number of units is less than 10, GA's performance is superior to that of a coarse-grid DP. However, GA loses its superiority in high-dimensional cases. DP is powerful in obtaining stable and high-quality solutions, and its performance can be maintained even when searching over a large solution space. Nevertheless, due to its exhaustive enumerating nature, it costs excessive time in low-dimensional cases.

  20. Invasive treatment of NSTEMI patients in German Chest Pain Units - Evidence for a treatment paradox.

    Science.gov (United States)

    Schmidt, Frank P; Schmitt, Claus; Hochadel, Matthias; Giannitsis, Evangelos; Darius, Harald; Maier, Lars S; Schmitt, Claus; Heusch, Gerd; Voigtländer, Thomas; Mudra, Harald; Gori, Tommaso; Senges, Jochen; Münzel, Thomas

    2018-03-15

    Patients with non-ST-segment elevation myocardial infarction (NSTEMI) represent the largest fraction of patients with acute coronary syndrome in German Chest Pain Units. Recent evidence on early vs. selective percutaneous coronary intervention (PCI) is ambiguous with respect to effects on mortality, myocardial infarction (MI) and recurrent angina. With the present study we sought to investigate the prognostic impact of PCI and its timing in German Chest Pain Unit (CPU) NSTEMI patients. Data from 1549 patients whose leading diagnosis was NSTEMI were retrieved from the German CPU registry for the interval between 3/2010 and 3/2014. Follow-up was available at a median of 167 days after discharge. The patients were grouped into a higher-risk (Group A) and a lower-risk (Group B) group according to GRACE score and additional criteria on admission. Group A had higher Killip classes, higher BNP levels, reduced ejection fraction, and significantly more triple-vessel disease, yet received early invasive treatment less often in German Chest Pain Units. This treatment paradox may worsen prognosis in patients who could derive the largest benefit from early revascularization. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  1. Design and Implementation of a Simple Microcontroller-Based Electronic Keyboard

    Institute of Scientific and Technical Information of China (English)

    章丹

    2014-01-01

    The electronic keyboard, a product of the union of modern electronics and music, is a new type of keyboard instrument that plays an important role in modern music. The single-chip microcontroller, with its powerful control functions and flexible programmability, has become an irreplaceable part of modern life. The main content of this paper is the design of a simple electronic keyboard using the 8253 chip as the core control element. By pressing keys 1-7 in area G6 of the STAR ES598PCI single-board computer, the tone is selected through the on-board 8255A chip; the 8253 chip then generates square waves of different frequencies, which are output to the buzzer in area D1 of the board, so that keys 1-7 of area G6 produce the musical scale 1-7 from low to high. The 8255A chip also controls the working state of the 8253 chip, switching the buzzer on and off, and thereby implements the playing function of the simple electronic keyboard. In addition, a piece of music can be played back from a preset "score", providing a playback function, and a passage previously played by the user can be replayed. The user selects among playback, replay, and live playing through a menu under DOS.
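
    The tones come from the 8253's square-wave mode: the chip divides its input clock by a programmed 16-bit count, so each key only needs a different divisor. A hedged sketch of the arithmetic (the actual ES598PCI clock frequency is an assumption here):

        # Assumed 8253 input clock; the real STAR ES598PCI value may differ.
        CLOCK_HZ = 1_193_182   # the classic PC-style 8253 clock, used for illustration

        # Approximate frequencies of the scale C4..B4 for keys 1-7
        NOTES = {1: 261.63, 2: 293.66, 3: 329.63, 4: 349.23,
                 5: 392.00, 6: 440.00, 7: 493.88}

        for key, freq in NOTES.items():
            divisor = round(CLOCK_HZ / freq)   # 16-bit count loaded into the 8253
            actual = CLOCK_HZ / divisor        # frequency the counter really produces
            print(f"key {key}: divisor={divisor:5d}, output={actual:7.2f} Hz")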

  2. Accelerating VASP electronic structure calculations using graphic processing units

    KAUST Repository

    Hacene, Mohamed

    2012-08-20

    We present a way to improve the performance of the electronic structure Vienna Ab initio Simulation Package (VASP) program. We show that high-performance computers equipped with graphics processing units (GPUs) as accelerators may drastically reduce the computation time when the time-consuming sections of the code are offloaded to the graphics chips. The procedure consists of (i) profiling the performance of the code to isolate the time-consuming parts, (ii) rewriting these so that the algorithms become better-suited for the chosen graphic accelerator, and (iii) optimizing memory traffic between the host computer and the GPU accelerator. We chose to accelerate VASP with NVIDIA GPUs using CUDA. We compare the GPU and original versions of VASP by evaluating the Davidson and RMM-DIIS algorithms on chemical systems of up to 1100 atoms. In these tests, the total time is reduced by a factor between 3 and 8 when running on n (CPU core + GPU) pairs compared to n CPU cores only, without any accuracy loss. © 2012 Wiley Periodicals, Inc.

  3. Accelerating VASP electronic structure calculations using graphic processing units

    KAUST Repository

    Hacene, Mohamed; Anciaux-Sedrakian, Ani; Rozanska, Xavier; Klahr, Diego; Guignon, Thomas; Fleurat-Lessard, Paul

    2012-01-01

    We present a way to improve the performance of the electronic structure Vienna Ab initio Simulation Package (VASP) program. We show that high-performance computers equipped with graphics processing units (GPUs) as accelerators may drastically reduce the computation time when the time-consuming sections of the code are offloaded to the graphics chips. The procedure consists of (i) profiling the performance of the code to isolate the time-consuming parts, (ii) rewriting these so that the algorithms become better-suited for the chosen graphic accelerator, and (iii) optimizing memory traffic between the host computer and the GPU accelerator. We chose to accelerate VASP with NVIDIA GPUs using CUDA. We compare the GPU and original versions of VASP by evaluating the Davidson and RMM-DIIS algorithms on chemical systems of up to 1100 atoms. In these tests, the total time is reduced by a factor between 3 and 8 when running on n (CPU core + GPU) pairs compared to n CPU cores only, without any accuracy loss. © 2012 Wiley Periodicals, Inc.

  4. Effects of transcription ability and transcription mode on translation: Evidence from written compositions, language bursts and pauses when students in grades 4 to 9, with and without persisting dyslexia or dysgraphia, compose by pen or by keyboard

    Directory of Open Access Journals (Sweden)

    Scott F. Beers

    2017-06-01

    Full Text Available This study explored the effects of transcription on the translation products and processes of adolescent students in grades 4 to 9 with and without persisting specific language disabilities in written language (SLDs-WL). To operationalize transcription ability (handwriting and spelling) and transcription mode (by pen on digital tablet or by standard US keyboard), diagnostic groups contrasting in patterns of transcription ability were compared while composing autobiographical (personal) narratives by handwriting or by keyboarding: typically developing students (n=15), students with dyslexia (impaired word reading and spelling; n=20), and students with dysgraphia (impaired handwriting; n=19). They were compared on seven outcomes: total words composed, total composing time, words per minute, percent of spelling errors, average length of pauses, average number of pauses per minute, and average length of language bursts. They were also compared on automaticity of transcription modes (writing the alphabet from memory by handwriting or keyboarding; they could look at keys). Mixed ANOVAs yielded main effects for diagnostic group on percent of spelling errors, words per minute, and length of language bursts. Main effects for transcription mode were found for automaticity of writing modes, total words composed, words per minute, and length of language bursts; there were no significant interactions. Regardless of mode, the dyslexia group had more spelling errors, showed a slower rate of composing, and produced shorter language bursts than the typical group. The total number of words, total time composing, words composed per minute, and pauses per minute were greater for keyboarding than handwriting, but the length of language bursts was greater for handwriting. Implications of these results for conceptual models of composing and educational assessment practices are discussed.

  5. Comparison of the CPU and memory performance of StatPatternRecognitions (SPR) and Toolkit for MultiVariate Analysis (TMVA)

    International Nuclear Information System (INIS)

    Palombo, G.

    2012-01-01

    High Energy Physics data sets are often characterized by a huge number of events. Therefore, it is extremely important to use statistical packages able to efficiently analyze these unprecedented amounts of data. We compare the performance of the statistical packages StatPatternRecognition (SPR) and Toolkit for MultiVariate Analysis (TMVA). We focus on how CPU time and memory usage of the learning process scale versus data set size. As classifiers, we consider Random Forests, Boosted Decision Trees and Neural Networks only, each with specific settings. For our tests, we employ a data set widely used in the machine learning community, “Threenorm” data set, as well as data tailored for testing various edge cases. For each data set, we constantly increase its size and check CPU time and memory needed to build the classifiers implemented in SPR and TMVA. We show that SPR is often significantly faster and consumes significantly less memory. For example, the SPR implementation of Random Forest is by an order of magnitude faster and consumes an order of magnitude less memory than TMVA on Threenorm data.

  6. hybrid\\scriptsize{{MANTIS}}: a CPU-GPU Monte Carlo method for modeling indirect x-ray detectors with columnar scintillators

    Science.gov (United States)

    Sharma, Diksha; Badal, Andreu; Badano, Aldo

    2012-04-01

    The computational modeling of medical imaging systems often requires obtaining a large number of simulated images with low statistical uncertainty, which translates into prohibitive computing times. We describe a novel hybrid approach for Monte Carlo simulations that maximizes utilization of CPUs and GPUs in modern workstations. We apply the method to the modeling of indirect x-ray detectors using a new and improved version of the code MANTIS, an open source software tool used for the Monte Carlo simulations of indirect x-ray imagers. We first describe a GPU implementation of the physics and geometry models in fastDETECT2 (the optical transport model) and a serial CPU version of the same code. We discuss its new features, such as on-the-fly column geometry and columnar crosstalk, in relation to the MANTIS code, and point out areas where our model provides more flexibility for the modeling of realistic columnar structures in large area detectors. Second, we modify PENELOPE (the open source software package that handles the x-ray and electron transport in MANTIS) to allow direct output of the location and energy deposited during x-ray and electron interactions occurring within the scintillator. This information is then handled by optical transport routines in fastDETECT2. A load balancer dynamically allocates optical transport showers to the GPU and CPU computing cores. Our hybridMANTIS approach achieves a significant speed-up factor of 627 when compared to MANTIS and of 35 when compared to the same code running only in a CPU instead of a GPU. Using hybridMANTIS, we successfully hide hours of optical transport time by running it in parallel with the x-ray and electron transport, thus shifting the computational bottleneck from optical to x-ray transport. The new code requires much less memory than MANTIS and, as a result

  7. A Block-Asynchronous Relaxation Method for Graphics Processing Units

    OpenAIRE

    Anzt, H.; Dongarra, J.; Heuveline, Vincent; Tomov, S.

    2011-01-01

    In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). For this purpose, we developed a set of asynchronous iteration algorithms in CUDA and compared them with a parallel implementation of synchronous relaxation methods on CPU-based systems. For a set of test matrices taken from the University of Florida Matrix Collection we monitor the convergence behavior, the average iteration time and the total time-to-solution. Analyzing the r...

  8. Remote Maintenance Design Guide for Compact Processing Units

    Energy Technology Data Exchange (ETDEWEB)

    Draper, J.V.

    2000-07-13

    Oak Ridge National Laboratory (ORNL) Robotics and Process Systems Division (RPSD) personnel have extensive experience working with remotely operated and maintained systems. These systems require expert knowledge in teleoperation, human factors, telerobotics, and other robotic devices so that remote equipment may be manipulated, operated, serviced, surveyed, and moved about in a hazardous environment. The RPSD staff has a wealth of experience in this area, including knowledge in the broad topics of human factors, modular electronics, modular mechanical systems, hardware design, and specialized tooling. Examples of projects that illustrate and highlight RPSD's unique experience in remote systems design and application include the following: (1) design of remote shear and remote dissolver systems in support of U.S. Department of Energy (DOE) fuel recycling research and nuclear power missions; (2) building remotely operated mobile systems for metrology and characterizing hazardous facilities in support of remote operations within those facilities; (3) construction of modular robotic arms, including the Laboratory Telerobotic Manipulator, which was designed for the National Aeronautics and Space Administration (NASA), and the Advanced ServoManipulator, which was designed for the DOE; (4) design of remotely operated laboratories, including chemical analysis and biochemical processing laboratories; (5) construction of remote systems for environmental clean-up and characterization, including underwater, buried waste, underground storage tank (UST) and decontamination and dismantlement (D&D) applications. Remote maintenance has played a significant role in fuel reprocessing because of combined chemical and radiological contamination. Furthermore, remote maintenance is expected to play a strong role in future waste remediation. The compact processing units (CPUs) being designed for use in underground waste storage tank remediation are examples of improvements in systems

  9. High-throughput sequence alignment using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Trapnell Cole

    2007-12-01

    Full Text Available Abstract Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  10. Porting AMG2013 to Heterogeneous CPU+GPU Nodes

    Energy Technology Data Exchange (ETDEWEB)

    Samfass, Philipp [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-01-26

    LLNL's future advanced technology system SIERRA will feature heterogeneous compute nodes that consist of IBM POWER9 CPUs and NVIDIA Volta GPUs. Conceptually, the motivation for such an architecture is quite straightforward: while GPUs are optimized for throughput on massively parallel workloads, CPUs strive to minimize latency for rather sequential operations. Yet, making optimal use of heterogeneous architectures raises new challenges for the development of scalable parallel software, e.g., with respect to work distribution. Porting LLNL's parallel numerical libraries to upcoming heterogeneous CPU+GPU architectures is therefore a critical factor for ensuring LLNL's future success in fulfilling its national mission. One of these libraries, called HYPRE, provides parallel solvers and preconditioners for large, sparse linear systems of equations. In the context of this internship project, I consider AMG2013, which is a proxy application for major parts of HYPRE that implements a benchmark for setting up and solving different systems of linear equations. In the following, I describe in detail how I ported multiple parts of AMG2013 to the GPU (Section 2) and present results for different experiments that demonstrate a successful parallel implementation on the heterogeneous machines surface and ray (Section 3). In Section 4, I give guidelines on how my code should be used. Finally, I conclude and give an outlook on future work (Section 5).

  11. Analysis of impact of general-purpose graphics processor units in supersonic flow modeling

    Science.gov (United States)

    Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.

    2017-06-01

    Computational methods are widely used in the prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPUs) provide architectures and new programming models that make it possible to harness their large processing power and to design computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities of the use of GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high resolution numerical schemes. CUDA technology is used for the programming implementation of parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the computed results are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. The speedup of the solution on GPUs with respect to the solution on a central processing unit (CPU) is compared. Performance measurements show that the numerical schemes developed achieve a 20-50x speedup on GPU hardware compared to the CPU reference implementation. The results obtained provide a promising perspective for designing a GPU-based software framework for applications in CFD.

  12. The Influence of Emotion on Keyboard Typing: An Experimental Study Using Auditory Stimuli.

    Science.gov (United States)

    Lee, Po-Ming; Tsui, Wei-Hsuan; Hsiao, Tzu-Chien

    2015-01-01

    In recent years, a novel approach for emotion recognition has been reported, based on keystroke dynamics. The advantages of this approach are that the data used are rather non-intrusive and easy to obtain. However, previous studies offered only limited investigations of the phenomenon itself. Hence, this study aimed to examine the source of variance in keyboard typing patterns caused by emotions. A controlled experiment was conducted to collect subjects' keystroke data in different emotional states induced by the International Affective Digitized Sounds (IADS). Two-way Valence (3) x Arousal (3) ANOVAs were used to examine the collected dataset. The results of the experiment indicate that the effect of arousal on keystroke duration is significant, although the emotional effect is small compared to the individual variability. Our findings support the conclusion that keystroke duration and latency are influenced by arousal. The finding about the size of the effect suggests that the accuracy of emotion recognition technology could be further improved if personalized models are utilized. Notably, the experiment was conducted using standard instruments and hence is expected to be highly reproducible.

  13. Computation of large covariance matrices by SAMMY on graphical processing units and multicore CPUs

    International Nuclear Information System (INIS)

    Arbanas, G.; Dunn, M.E.; Wiarda, D.

    2011-01-01

    Computational power of Graphical Processing Units and multicore CPUs was harnessed by the nuclear data evaluation code SAMMY to speed up computations of large Resonance Parameter Covariance Matrices (RPCMs). This was accomplished by linking SAMMY to vendor-optimized implementations of the matrix-matrix multiplication subroutine of the Basic Linear Algebra Library to compute the most time-consuming step. The 235U RPCM computed previously using a triple-nested loop was re-computed using the NVIDIA implementation of the subroutine on a single Tesla Fermi Graphical Processing Unit, and also using Intel's Math Kernel Library implementation on two different multicore CPU systems. A multiplication of two matrices of dimensions 16,000×20,000 that had previously taken days took approximately one minute on the GPU. Comparable performance was achieved on a dual six-core CPU system. The magnitude of the speed-up suggests that these, or similar, combinations of hardware and libraries may be useful for large matrix operations in SAMMY. Uniform interfaces of standard linear algebra libraries make them a promising candidate for a programming framework of a new generation of SAMMY for the emerging heterogeneous computing platforms. (author)
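
    The order-of-magnitude gain reported comes from replacing a triple-nested multiplication loop with a vendor-optimized matrix-matrix multiply. A small NumPy timing sketch of that substitution (with sizes cut down from 16,000 x 20,000 so the naive loop finishes):

        import time
        import numpy as np

        n = 150                            # tiny compared to the RPCM, illustrative only
        A = np.random.rand(n, n)
        B = np.random.rand(n, n)

        t0 = time.perf_counter()
        C_loop = np.zeros((n, n))
        for i in range(n):                 # the slow triple-nested loop being replaced
            for j in range(n):
                s = 0.0
                for p in range(n):
                    s += A[i, p] * B[p, j]
                C_loop[i, j] = s
        t_loop = time.perf_counter() - t0

        t0 = time.perf_counter()
        C_fast = A @ B                     # dispatched to an optimized BLAS gemm
        t_fast = time.perf_counter() - t0

        print(f"loop: {t_loop:.2f} s, BLAS: {t_fast:.5f} s, "
              f"max diff: {np.abs(C_loop - C_fast).max():.2e}")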

  14. Computation of large covariance matrices by SAMMY on graphical processing units and multicore CPUs

    Energy Technology Data Exchange (ETDEWEB)

    Arbanas, G.; Dunn, M.E.; Wiarda, D., E-mail: arbanasg@ornl.gov, E-mail: dunnme@ornl.gov, E-mail: wiardada@ornl.gov [Oak Ridge National Laboratory, Oak Ridge, TN (United States)

    2011-07-01

    Computational power of Graphical Processing Units and multicore CPUs was harnessed by the nuclear data evaluation code SAMMY to speed up computations of large Resonance Parameter Covariance Matrices (RPCMs). This was accomplished by linking SAMMY to vendor-optimized implementations of the matrix-matrix multiplication subroutine of the Basic Linear Algebra Library to compute the most time-consuming step. The 235U RPCM computed previously using a triple-nested loop was re-computed using the NVIDIA implementation of the subroutine on a single Tesla Fermi Graphical Processing Unit, and also using Intel's Math Kernel Library implementation on two different multicore CPU systems. A multiplication of two matrices of dimensions 16,000×20,000 that had previously taken days took approximately one minute on the GPU. Comparable performance was achieved on a dual six-core CPU system. The magnitude of the speed-up suggests that these, or similar, combinations of hardware and libraries may be useful for large matrix operations in SAMMY. Uniform interfaces of standard linear algebra libraries make them a promising candidate for a programming framework of a new generation of SAMMY for the emerging heterogeneous computing platforms. (author)

  15. Physician discretion is safe and may lower stress test utilization in emergency department chest pain unit patients.

    Science.gov (United States)

    Napoli, Anthony M; Arrighi, James A; Siket, Matthew S; Gibbs, Frantz J

    2012-03-01

    Chest pain unit (CPU) observation with defined stress-utilization protocols is a common management option for low-risk emergency department patients. We sought to evaluate the safety of a jointly staffed emergency medicine and cardiology CPU. A prospective observational trial of consecutive patients admitted to an emergency department CPU was conducted. A standard 6-hour observation protocol was followed by cardiology consultation, with stress utilization largely at their discretion. Included patients were at low/intermediate risk by American Heart Association criteria, had nondiagnostic electrocardiograms, and had a normal initial troponin. Excluded patients were those with an acute comorbidity, age >75, a history of coronary artery disease, or a coexistent problem precluding 24-hour observation. The primary outcome was 30-day major adverse cardiovascular events, defined as death, nonfatal acute myocardial infarction, revascularization, or out-of-hospital cardiac arrest. A total of 1063 patients were enrolled over 8 months. The mean age of the patients was 52.8 ± 11.8 years, and 51% (95% confidence interval [CI], 48-54) were female. The mean thrombolysis in myocardial infarction and Diamond & Forrester scores were 0.6 (95% CI, 0.51-0.62) and 33% (95% CI, 31-35), respectively. In all, 51% (95% CI, 48-54) received stress testing (52% nuclear stress, 39% stress echocardiogram, 5% exercise, 4% other). In all, 0.9% of patients (n = 10, 95% CI, 0.4-1.5) were diagnosed with a non-ST elevation myocardial infarction and 2.2% (n = 23, 95% CI, 1.3-3) with acute coronary syndrome. There was 1 (95% CI, 0%-0.3%) case of 30-day major adverse cardiovascular events. The 51% stress test utilization rate was less than the range reported in previous CPU studies (P < 0.05). Joint emergency medicine and cardiology management of patients within a CPU protocol is safe and efficacious, and may safely reduce stress testing rates.

  16. Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers

    Science.gov (United States)

    Oyarzun, Guillermo; Borrell, Ricard; Gorobets, Andrey; Oliva, Assensi

    2017-10-01

    Nowadays, high performance computing (HPC) systems experience a disruptive moment with a variety of novel architectures and frameworks, without any clarity of which one is going to prevail. In this context, the portability of codes across different architectures is of major importance. This paper presents a portable implementation model based on an algebraic operational approach for direct numerical simulation (DNS) and large eddy simulation (LES) of incompressible turbulent flows using unstructured hybrid meshes. The strategy proposed consists in representing the whole time-integration algorithm using only three basic algebraic operations: sparse matrix-vector product, a linear combination of vectors and dot product. The main idea is based on decomposing the nonlinear operators into a concatenation of two SpMV operations. This provides high modularity and portability. An exhaustive analysis of the proposed implementation for hybrid CPU/GPU supercomputers has been conducted with tests using up to 128 GPUs. The main objective consists in understanding the challenges of implementing CFD codes on new architectures.
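
    The portability argument can be made concrete: if every step of the time integration is expressed with only an SpMV, an axpy-style vector combination, and a dot product, the driver code never changes when the backend does. A schematic SciPy sketch, with an invented 1-D diffusion operator standing in for the CFD operators:

        import numpy as np
        import scipy.sparse as sp

        # The three portable kernels; each backend (CPU, GPU, ...) supplies its own.
        spmv = lambda A, x: A @ x            # sparse matrix-vector product
        axpy = lambda a, x, y: a * x + y     # linear combination of vectors
        dot  = lambda x, y: float(x @ y)     # dot product

        # Illustrative 1-D diffusion operator (stand-in for the real discretization)
        n = 100
        L = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n), format="csr")

        u, dt = np.ones(n), 0.4
        for _ in range(50):                  # explicit time stepping, kernels only
            u = axpy(dt, spmv(L, u), u)      # u <- u + dt * L u
        print("solution norm:", dot(u, u) ** 0.5)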

  17. High-Throughput Characterization of Porous Materials Using Graphics Processing Units

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jihan; Martin, Richard L.; Rübel, Oliver; Haranczyk, Maciej; Smit, Berend

    2012-05-08

    We have developed a high-throughput graphics processing unit (GPU) code that can characterize a large database of crystalline porous materials. In our algorithm, the GPU is utilized to accelerate energy grid calculations, where the grid values represent interactions (i.e., Lennard-Jones + Coulomb potentials) between gas molecules (i.e., CH4 and CO2) and the material's framework atoms. Using a parallel flood fill CPU algorithm, inaccessible regions inside the framework structures are identified and blocked based on their energy profiles. Finally, we compute the Henry coefficients and heats of adsorption through statistical Widom insertion Monte Carlo moves in the domain restricted to the accessible space. The code offers significant speedup over a single core CPU code and allows us to characterize a set of porous materials at least an order of magnitude larger than those considered in earlier studies. For structures selected by such a prescreening algorithm, full adsorption isotherms can be calculated by conducting multiple grand canonical Monte Carlo simulations concurrently within the GPU.
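
    A reduced sketch of the energy-grid idea: evaluate a Lennard-Jones plus Coulomb guest-host interaction at every point of a regular grid over the unit cell. Geometry and parameters below are invented for illustration, not taken from the paper.

        import numpy as np

        # Two framework atoms in an 8 A box; LJ and charge parameters are illustrative.
        atoms = np.array([[2.0, 2.0, 2.0], [6.0, 6.0, 6.0]])
        sigma, eps, q_host, q_guest = 3.0, 0.1, 0.2, -0.2
        COULOMB = 332.06                   # kcal*A/(mol*e^2) conversion constant

        g = np.linspace(0.0, 8.0, 32)
        X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
        grid = np.stack([X, Y, Z], axis=-1)          # shape (32, 32, 32, 3)

        energy = np.zeros(grid.shape[:3])
        for pos in atoms:
            r = np.linalg.norm(grid - pos, axis=-1)
            r = np.maximum(r, 0.5)                   # guard the r -> 0 singularity
            sr6 = (sigma / r) ** 6
            energy += 4.0*eps*(sr6**2 - sr6) + COULOMB*q_host*q_guest/r

        print("minimum grid energy (kcal/mol):", energy.min())

    Every grid point is independent, so on the GPU each thread simply owns one point; the flood fill and Widom insertions then operate on the finished grid.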

  18. Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit

    International Nuclear Information System (INIS)

    Badal, Andreu; Badano, Aldo

    2009-01-01

    Purpose: It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: The use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). Methods: A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA programming model (NVIDIA Corporation, Santa Clara, CA). Results: An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed up factor was obtained using a GPU compared to a single core CPU. Conclusions: The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.
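
    The kernel of such a simulation is a short sampling loop repeated for millions of independent photon histories, which is what the GPU parallelizes. A toy one-dimensional sketch with invented coefficients (the real code uses PENELOPE physics in a voxelized geometry):

        import numpy as np

        rng = np.random.default_rng(1)
        mu_total, mu_absorb = 0.2, 0.15   # assumed interaction coefficients, 1/cm
        slab = 5.0                        # slab thickness, cm

        def photon_transmitted():
            """Follow one photon through a homogeneous slab (1-D random walk)."""
            x, direction = 0.0, 1.0
            while True:
                x += direction * rng.exponential(1.0 / mu_total)  # free flight
                if x >= slab:
                    return True                      # escaped downstream
                if x < 0.0:
                    return False                     # backscattered out of the slab
                if rng.random() < mu_absorb / mu_total:
                    return False                     # absorbed at the interaction site
                direction = rng.choice([-1.0, 1.0])  # scatter: new 1-D direction

        n = 100_000
        print("transmitted fraction:", sum(photon_transmitted() for _ in range(n)) / n)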

  19. Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit

    Energy Technology Data Exchange (ETDEWEB)

    Badal, Andreu; Badano, Aldo [Division of Imaging and Applied Mathematics, OSEL, CDRH, U.S. Food and Drug Administration, Silver Spring, Maryland 20993-0002 (United States)

    2009-11-15

    Purpose: It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: The use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). Methods: A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA programming model (NVIDIA Corporation, Santa Clara, CA). Results: An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed up factor was obtained using a GPU compared to a single core CPU. Conclusions: The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.

  20. Accelerating Monte Carlo simulations of photon transport in a voxelized geometry using a massively parallel graphics processing unit.

    Science.gov (United States)

    Badal, Andreu; Badano, Aldo

    2009-11-01

    It is a known fact that Monte Carlo simulations of radiation transport are computationally intensive and may require long computing times. The authors introduce a new paradigm for the acceleration of Monte Carlo simulations: The use of a graphics processing unit (GPU) as the main computing device instead of a central processing unit (CPU). A GPU-based Monte Carlo code that simulates photon transport in a voxelized geometry with the accurate physics models from PENELOPE has been developed using the CUDA™ programming model (NVIDIA Corporation, Santa Clara, CA). An outline of the new code and a sample x-ray imaging simulation with an anthropomorphic phantom are presented. A remarkable 27-fold speed up factor was obtained using a GPU compared to a single core CPU. The reported results show that GPUs are currently a good alternative to CPUs for the simulation of radiation transport. Since the performance of GPUs is currently increasing at a faster pace than that of CPUs, the advantages of GPU-based software are likely to be more pronounced in the future.

  1. Event- and Time-Driven Techniques Using Parallel CPU-GPU Co-processing for Spiking Neural Networks.

    Science.gov (United States)

    Naveros, Francisco; Garrido, Jesus A; Carrillo, Richard R; Ros, Eduardo; Luque, Niceto R

    2017-01-01

    Modeling and simulating the neural structures which make up our central neural system is instrumental for deciphering the computational neural cues beneath. Higher levels of biological plausibility usually impose higher levels of complexity in mathematical modeling, from neural to behavioral levels. This paper focuses on overcoming the simulation problems (accuracy and performance) derived from using higher levels of mathematical complexity at a neural level. This study proposes different techniques for simulating neural models that hold incremental levels of mathematical complexity: leaky integrate-and-fire (LIF), adaptive exponential integrate-and-fire (AdEx), and Hodgkin-Huxley (HH) neural models (ranged from low to high neural complexity). The studied techniques are classified into two main families depending on how the neural-model dynamic evaluation is computed: the event-driven or the time-driven families. Whilst event-driven techniques pre-compile and store the neural dynamics within look-up tables, time-driven techniques compute the neural dynamics iteratively during the simulation time. We propose two modifications for the event-driven family: a look-up table recombination to better cope with the incremental neural complexity together with a better handling of the synchronous input activity. Regarding the time-driven family, we propose a modification in computing the neural dynamics: the bi-fixed-step integration method. This method automatically adjusts the simulation step size to better cope with the stiffness of the neural model dynamics running in CPU platforms. One version of this method is also implemented for hybrid CPU-GPU platforms. Finally, we analyze how the performance and accuracy of these modifications evolve with increasing levels of neural complexity. We also demonstrate how the proposed modifications which constitute the main contribution of this study systematically outperform the traditional event- and time-driven techniques under
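
    The event-driven/time-driven contrast is easiest to see on the simplest model mentioned, the LIF neuron: between input spikes the membrane potential decays analytically, so it can either be stepped iteratively or read from a precomputed look-up table. A sketch under assumed parameters:

        import numpy as np

        tau, v_rest = 20.0, 0.0   # assumed membrane time constant (ms) and rest level

        def decay_time_driven(v0, t, dt=0.1):
            """Time-driven: iterate the dynamics with a fixed step (forward Euler)."""
            v = v0
            for _ in range(int(t / dt)):
                v += dt * (v_rest - v) / tau
            return v

        # Event-driven: pre-compile the decay into a look-up table, then jump
        # directly across the whole inter-spike interval in one step.
        table_t = np.linspace(0.0, 100.0, 1001)
        table_decay = np.exp(-table_t / tau)

        def decay_event_driven(v0, t):
            return v_rest + (v0 - v_rest) * np.interp(t, table_t, table_decay)

        print(decay_time_driven(1.0, 15.0))    # ~ exp(-15/20) = 0.4724
        print(decay_event_driven(1.0, 15.0))

    For HH-style models the analytic jump no longer exists in closed form, which is why the look-up-table recombination and bi-fixed-step techniques proposed in the paper matter.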

  2. A Case Study of MasterMind Chess: Comparing Mouse/Keyboard Interaction with Kinect-Based Gestural Interface

    Directory of Open Access Journals (Sweden)

    Gabriel Alves Mendes Vasiljevic

    2016-01-01

    Full Text Available As gestural interfaces emerged as a new type of user interface, their use has been vastly explored by the entertainment industry to better immerse the player in games. Although such interfaces are mainly used in dance and sports games, little use has been made of gestural interaction in slower-paced genres, such as board games. In this work, we present a Kinect-based gestural interface for an online multiplayer chess game and describe a case study with users with different playing skill levels. Comparing mouse/keyboard interaction with gesture-based interaction, the results of the activity were synthesized into lessons learned regarding general usability and the design of game control mechanisms. These results could be applied to slow-paced board games like chess. Our findings indicate that gestural interfaces may not be suitable for competitive chess matches, yet they can be fun to use in casual matches.

  3. Low-cost general purpose spectral display unit using an IBM PC

    International Nuclear Information System (INIS)

    Robinson, S.L.

    1985-10-01

    Many physics experiments require acquisition and analysis of spectral data. Commercial minicomputer-based multichannel analyzers collect detected counts at various energies, create a histogram of the counts in memory, and display the resultant spectra. They acquire data and provide the user-to-display interface. The system discussed separates functions into the three modular components of data acquisition, storage, and display. This decoupling of functions allows the experimenter to use any number of detectors for data collection before forwarding up to 64 spectra to the display unit, thereby increasing data throughput over that available with commercial systems. An IBM PC was chosen for the low-cost, general purpose display unit. Up to four spectra may be displayed simultaneously in different colors. The histogram saves 1024 channels per detector, 640 of which may be distinctly displayed per spectrum. The IEEE-488 standard provides the data path between the IBM PC and the data collection unit. Data is sent to the PC under interrupt control, using direct memory access. Display manipulations available via keyboard are also discussed

  4. Monte Carlo methods for neutron transport on graphics processing units using Cuda - 015

    International Nuclear Information System (INIS)

    Nelson, A.G.; Ivanov, K.N.

    2010-01-01

    This work examined the feasibility of utilizing Graphics Processing Units (GPUs) to accelerate Monte Carlo neutron transport simulations. First, a clean-sheet MC code was written in C++ for an x86 CPU and later ported to run on GPUs using NVIDIA's CUDA programming language. After further optimization, the GPU ran 21 times faster than the CPU code when using single-precision floating point math. This can be further increased with no additional effort if accuracy is sacrificed for speed: using a compiler flag, the speedup was increased to 22x. Further, if double-precision floating point math is desired for neutron tracking through the geometry, a speedup of 11x was obtained. The GPUs have proven to be useful in this study, but the current generation does have limitations: the maximum memory currently available on a single GPU is only 4 GB; the GPU RAM does not provide error-checking and correction; and the optimization required for large speedups can lead to confusing code. (authors)

  5. Effective electron-density map improvement and structure validation on a Linux multi-CPU web cluster: The TB Structural Genomics Consortium Bias Removal Web Service.

    Science.gov (United States)

    Reddy, Vinod; Swanson, Stanley M; Segelke, Brent; Kantardjieff, Katherine A; Sacchettini, James C; Rupp, Bernhard

    2003-12-01

    Anticipating a continuing increase in the number of structures solved by molecular replacement in high-throughput crystallography and drug-discovery programs, a user-friendly web service for automated molecular replacement, map improvement, bias removal and real-space correlation structure validation has been implemented. The service is based on an efficient bias-removal protocol, Shake&wARP, and implemented using EPMR and the CCP4 suite of programs, combined with various shell scripts and Fortran90 routines. The service returns improved maps, converted data files and real-space correlation and B-factor plots. User data are uploaded through a web interface and the CPU-intensive iteration cycles are executed on a low-cost Linux multi-CPU cluster using the Condor job-queuing package. Examples of map improvement at various resolutions are provided and include model completion and reconstruction of absent parts, sequence correction, and ligand validation in drug-target structures.

  6. Finite difference numerical method for the superlattice Boltzmann transport equation and case comparison of CPU(C) and GPU(CUDA) implementations

    Energy Technology Data Exchange (ETDEWEB)

    Priimak, Dmitri

    2014-12-01

    We present a finite difference numerical algorithm for solving two dimensional spatially homogeneous Boltzmann transport equation which describes electron transport in a semiconductor superlattice subject to crossed time dependent electric and constant magnetic fields. The algorithm is implemented both in C language targeted to CPU and in CUDA C language targeted to commodity NVidia GPU. We compare performances and merits of one implementation versus another and discuss various software optimisation techniques.

  7. Finite difference numerical method for the superlattice Boltzmann transport equation and case comparison of CPU(C) and GPU(CUDA) implementations

    International Nuclear Information System (INIS)

    Priimak, Dmitri

    2014-01-01

    We present a finite difference numerical algorithm for solving two dimensional spatially homogeneous Boltzmann transport equation which describes electron transport in a semiconductor superlattice subject to crossed time dependent electric and constant magnetic fields. The algorithm is implemented both in C language targeted to CPU and in CUDA C language targeted to commodity NVidia GPU. We compare performances and merits of one implementation versus another and discuss various software optimisation techniques.

  8. Classification of hyperspectral imagery using MapReduce on a NVIDIA graphics processing unit (Conference Presentation)

    Science.gov (United States)

    Ramirez, Andres; Rahnemoonfar, Maryam

    2017-04-01

    A hyperspectral image provides a multidimensional picture rich in data, consisting of hundreds of spectral dimensions. Analyzing the spectral and spatial information of such an image with linear and non-linear algorithms results in high computational time. In order to overcome this problem, this research presents a system using a MapReduce-Graphics Processing Unit (GPU) model that can help analyze a hyperspectral image through the usage of parallel hardware and a parallel programming model, which is simpler to handle than other low-level parallel programming models. Additionally, Hadoop was used as an open-source implementation of the MapReduce parallel programming model. This research compared classification accuracy and timing results between the Hadoop and GPU systems and tested them against the following cases: a combined CPU and GPU test case, a CPU-only test case, and a test case where no dimensionality reduction was applied.
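
    Independent of Hadoop or the GPU backend, the MapReduce model itself fits in a few lines; a minimal word-count sketch (a stand-in for the per-pixel spectral work):

        from collections import defaultdict

        def map_phase(records, mapper):
            """Apply the mapper to every record, emitting (key, value) pairs."""
            for record in records:
                yield from mapper(record)

        def reduce_phase(pairs, reducer):
            """Group values by key, then fold each group with the reducer."""
            groups = defaultdict(list)
            for key, value in pairs:
                groups[key].append(value)
            return {key: reducer(values) for key, values in groups.items()}

        docs = ["hyperspectral pixels", "pixels and more pixels"]
        mapper = lambda doc: ((word, 1) for word in doc.split())
        print(reduce_phase(map_phase(docs, mapper), sum))   # {'pixels': 3, ...}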

  9. Conceptual design of the X-IFU Instrument Control Unit on board the ESA Athena mission

    Science.gov (United States)

    Corcione, L.; Ligori, S.; Capobianco, V.; Bonino, D.; Valenziano, L.; Guizzo, G. P.

    2016-07-01

    Athena is one of the L-class missions selected in the ESA Cosmic Vision 2015-2025 program for the science theme of the Hot and Energetic Universe. The Athena model payload includes the X-ray Integral Field Unit (X-IFU), an advanced actively shielded X-ray microcalorimeter spectrometer for high spectral resolution imaging, utilizing cooled Transition Edge Sensors. This paper describes the preliminary architecture of the Instrument Control Unit (ICU), which is aimed at operating all of X-IFU's subsystems and at implementing the main functional interfaces of the instrument with the S/C control unit. The ICU functions include TC/TM management with the S/C, science data formatting and transmission to the S/C Mass Memory, housekeeping data handling, time distribution for synchronous operations, and the management of the X-IFU components (i.e. CryoCoolers, Filter Wheel, Detector Readout Electronics Event Processor, Power Distribution Unit). The baseline implementation of the ICU functions for the phase-A study foresees the usage of standard, space-qualified components from the heritage of past and current space missions (e.g. Gaia, Euclid), which currently encompasses Leon2/Leon3-based CPU boards and standard space-qualified interfaces for the exchange of commands and data between the ICU and the X-IFU subsystems. An alternative architecture, arranged around a powerful PowerPC-based CPU, is also briefly presented, with the aim of endowing the system with enhanced hardware resources and processing power for handling control and science data processing tasks not yet defined at this stage of the mission study.

  10. Layout de teclado para uma prancha de comunicação alternativa e ampliada Keyboard layout for an augmentative and alternative communication board

    Directory of Open Access Journals (Sweden)

    Luciane Aparecida Liegel

    2008-12-01

    Full Text Available The aim of this article is to describe and discuss a novel keyboard layout especially designed for an augmentative and alternative communication board with mechanical and remote activation, to be used by people with cerebral palsy whose cognitive skills are preserved. In order to design the layout of the augmentative and alternative communication keyboard, a study involving the position and content of the keys was undertaken. Eleven volunteers participated in the study: five special education teachers, four pedagogues specialized in special education and two speech and language therapists. The layout is made up of 95 keys arranged in groups: alphabetic keys, accented-letter keys, numeric keys, function keys, and augmentative and alternative communication keys. The communication keys contain icons associated with words or phrases; the icons belong to a Brazilian visual communication language currently under development. To help users locate keys, differentiated key and character sizes as well as distinct key background colors were used. The accented-letter keys and the communication keys are intended to ease and speed up message typing, thereby reducing typing time and, consequently, the occurrence of muscular fatigue.

  11. A low-cost general purpose spectral display unit using an IBM PC

    International Nuclear Information System (INIS)

    Robinson, S.L.

    1986-01-01

    Many physics experiments require acquisition and analysis of spectral data. Commercial minicomputer-based multichannel analyzers collect detected counts at various energies, create a histogram of the counts in memory, and display the resultant spectra. They acquire data and provide the user-to-display interface. The system discussed separates these functions into the three modular components of data acquisition, storage, and display. This decoupling of functions allows the experimenter to use any number of detectors for data collection before forwarding up to 64 spectra to the display unit, thereby increasing data throughput over that available with commercial systems. An IBM PC was chosen for the low-cost, general purpose display unit. Up to four spectra may be displayed simultaneously in different colors. The histogram saves 1024 channels per detector, 640 of which may be distinctly displayed per spectrum. The IEEE-488 standard provides the data path between the IBM PC and the data collection unit. Data are sent to the PC under interrupt control, using direct memory access. Display manipulations available via the keyboard are also discussed.

  12. Deployment of 464XLAT (RFC6877) alongside IPv6-only CPU resources at WLCG sites

    Science.gov (United States)

    Froy, T. S.; Traynor, D. P.; Walker, C. J.

    2017-10-01

    IPv4 is now officially deprecated by the IETF. A significant amount of effort has already been expended by the HEPiX IPv6 Working Group on testing dual-stacked hosts and IPv6-only CPU resources. Dual-stack adds complexity and administrative overhead to sites that may already be starved of resources. This has resulted in a very slow uptake of IPv6 from WLCG sites. 464XLAT (RFC6877) is intended for IPv6 single-stack environments that require the ability to communicate with IPv4-only endpoints. This paper presents a deployment strategy for 464XLAT, operational experiences of using 464XLAT in production at a WLCG site, and important information to consider prior to deploying 464XLAT.
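
    As a concrete illustration of the address-family translation involved, the sketch below synthesizes and decomposes NAT64 addresses with the well-known 64:ff9b::/96 prefix of RFC 6052, on which 464XLAT builds; it is not taken from the paper, and the example addresses are arbitrary.

      # Sketch of RFC 6052 address synthesis as used by 464XLAT's CLAT/PLAT:
      # an IPv4 destination is embedded in the well-known NAT64 prefix
      # 64:ff9b::/96 so IPv6-only hosts can reach IPv4-only endpoints.
      import ipaddress

      NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")

      def synthesize(ipv4: str) -> ipaddress.IPv6Address:
          """Embed an IPv4 address in the low 32 bits of the NAT64 prefix."""
          prefix_int = int(NAT64_PREFIX.network_address)
          return ipaddress.IPv6Address(prefix_int | int(ipaddress.IPv4Address(ipv4)))

      def extract(ipv6: ipaddress.IPv6Address) -> ipaddress.IPv4Address:
          """Recover the original IPv4 address on the translator side."""
          return ipaddress.IPv4Address(int(ipv6) & 0xFFFFFFFF)

      addr6 = synthesize("192.0.2.33")
      print(addr6)            # 64:ff9b::c000:221
      print(extract(addr6))   # 192.0.2.33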

  13. Air compressor multi-pattern smart monitor

    Science.gov (United States)

    Zhao, Qiancheng; Qin, Yejun; Dai, Juchuan; Huang, Geng

    2011-12-01

    The device is controlled by a TMS320F2812 microprocessor. It mainly includes a signal acquisition circuit, keyboard circuit, Chinese/English LCD display circuit, calendar clock circuit, alarm circuit, relay output circuit, communication interface circuit, DI/DO circuit, power circuit and CPU circuit. In addition, the device integrates a sensor transmission circuit, so it can connect directly to temperature and pressure sensors to achieve high-precision measurement and monitoring. According to user needs, it can work in different modes without an additional controller. Units can communicate with each other over CAN bus or RS485. The device mainly realizes control and analysis of equipment status, failure prediction and diagnosis, and information management.

  14. Implementation of RLS-based Adaptive Filters on nVIDIA GeForce Graphics Processing Unit

    OpenAIRE

    Hirano, Akihiro; Nakayama, Kenji

    2011-01-01

    This paper presents an efficient implementation of RLS-based adaptive filters with a large number of taps on an nVIDIA GeForce graphics processing unit (GPU) using the CUDA software development environment. Modifying the order and the combination of calculations reduces the number of accesses to slow off-chip memory. The assignment of tasks to multiple threads also takes memory access order into account. For a 4096-tap case, the GPU program is almost three times faster than a CPU program.
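
    The record includes no source code; the following minimal NumPy sketch shows the standard RLS update loop that such a GPU port parallelizes, with tap count, forgetting factor and the system-identification set-up chosen for illustration only.

      # Minimal NumPy reference of the standard RLS update loop (the GPU
      # version in the record reorders these operations to cut memory traffic).
      import numpy as np

      def rls(x, d, taps=64, lam=0.999, delta=1e2):
          w = np.zeros(taps)                 # filter weights
          P = np.eye(taps) * delta           # inverse input-correlation estimate
          y = np.zeros(len(x))
          for n in range(taps - 1, len(x)):
              u = x[n - taps + 1:n + 1][::-1]      # newest sample first
              k = P @ u / (lam + u @ P @ u)        # gain vector
              y[n] = w @ u
              e = d[n] - y[n]                      # a priori error
              w = w + k * e
              P = (P - np.outer(k, u @ P)) / lam
          return w, y

      rng = np.random.default_rng(0)
      x = rng.standard_normal(4096)
      h = rng.standard_normal(64) * 0.1            # unknown FIR system
      d = np.convolve(x, h)[:len(x)]               # desired signal
      w, _ = rls(x, d)
      print(np.max(np.abs(w - h)))                 # small once converged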

  15. Graphics processing unit based computation for NDE applications

    Science.gov (United States)

    Nahas, C. A.; Rajagopal, Prabhu; Balasubramaniam, Krishnan; Krishnamurthy, C. V.

    2012-05-01

    Advances in parallel processing in recent years are helping to reduce the cost of numerical simulation. Breakthroughs in Graphics Processing Unit (GPU) based computation now offer the prospect of further drastic improvements. The introduction of 'compute unified device architecture' (CUDA) by NVIDIA (the global technology company based in Santa Clara, California, USA) has made programming GPUs for general purpose computing accessible to the average programmer. Here we use CUDA to develop parallel finite difference schemes as applicable to two problems of interest to the NDE community, namely heat diffusion and elastic wave propagation. The implementations are in two dimensions. The performance improvement of the GPU implementation over a serial CPU implementation is then discussed.
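
    For orientation, here is a serial NumPy sketch of the kind of explicit two-dimensional heat-diffusion stencil such a CUDA port accelerates; the grid size, coefficients and boundary handling are illustrative, not taken from the paper.

      # Serial sketch of the explicit 2-D heat-diffusion finite-difference
      # stencil; on a GPU, each grid point maps naturally to one thread.
      import numpy as np

      def heat_step(T, alpha, dt, dx):
          """One explicit finite-difference time step (5-point Laplacian)."""
          lap = (np.roll(T, 1, 0) + np.roll(T, -1, 0) +
                 np.roll(T, 1, 1) + np.roll(T, -1, 1) - 4.0 * T) / dx**2
          T_new = T + alpha * dt * lap
          T_new[0, :] = T_new[-1, :] = T_new[:, 0] = T_new[:, -1] = 0.0  # Dirichlet
          return T_new

      n, alpha, dx = 256, 1.0, 1.0
      dt = 0.2 * dx**2 / alpha            # stable: dt <= dx^2 / (4*alpha)
      T = np.zeros((n, n))
      T[n // 2, n // 2] = 100.0           # initial hot spot
      for _ in range(500):
          T = heat_step(T, alpha, dt, dx)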

  16. Fast data reconstructed method of Fourier transform imaging spectrometer based on multi-core CPU

    Science.gov (United States)

    Yu, Chunchao; Du, Debiao; Xia, Zongze; Song, Li; Zheng, Weijian; Yan, Min; Lei, Zhenggang

    2017-10-01

    An imaging spectrometer acquires a two-dimensional spatial image and a one-dimensional spectrum at the same time, which is highly useful in color and spectral measurements, true-color image synthesis, military reconnaissance and so on. In order to realize fast reconstruction of Fourier transform imaging spectrometer data, this paper designs an optimized reconstruction algorithm using OpenMP parallel computing, which was further applied to the HyperSpectral Imager of the Chinese 'HJ-1' satellite. The results show that the method based on multi-core parallel computing can exploit multi-core CPU hardware resources effectively and significantly improve the efficiency of spectrum reconstruction. If the technique is applied to workstations with more cores, it will be possible to complete real-time data processing of a Fourier transform imaging spectrometer with a single computer.
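
    A rough multi-process stand-in for the paper's OpenMP reconstruction loop, assuming each pixel's interferogram is apodized and Fourier-transformed independently; the cube dimensions and Hanning apodization are illustrative.

      # Spectrum reconstruction parallelized over image rows: each pixel's
      # interferogram is independent, so rows can be farmed out to workers.
      import numpy as np
      from multiprocessing import Pool

      def reconstruct_row(row):
          """Interferogram -> spectrum for every pixel in one image row."""
          window = np.hanning(row.shape[-1])          # apodization
          return np.abs(np.fft.rfft(row * window, axis=-1))

      if __name__ == "__main__":
          cube = np.random.rand(512, 512, 256)        # y, x, OPD samples
          with Pool() as pool:                        # one worker per core
              spectra = np.stack(pool.map(reconstruct_row, cube))
          print(spectra.shape)                        # (512, 512, 129)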

  17. An FPGA Based Multiprocessing CPU for Beam Synchronous Timing in CERN's SPS and LHC

    CERN Document Server

    Ballester, F J; Gras, J J; Lewis, J; Savioz, J J; Serrano, J

    2003-01-01

    The Beam Synchronous Timing system (BST) will be used around the LHC and its injector, the SPS, to broadcast timing messages and synchronize actions with the beam in different receivers. To achieve beam synchronization, the BST Master card encodes messages using the bunch clock, with a nominal value of 40.079 MHz for the LHC. These messages are produced by a set of tasks every revolution period, which is every 89 μs for the LHC and every 23 μs for the SPS, thereby imposing a hard real-time constraint on the system. To achieve determinism, the BST Master uses a dedicated CPU inside its main Field Programmable Gate Array (FPGA) featuring zero-delay hardware task switching and a reduced instruction set. This paper describes the BST Master card, stressing the main FPGA design, as well as the associated software, including the LynxOS driver and the tailor-made assembler.

  18. Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Xu, Chuanfu, E-mail: xuchuanfu@nudt.edu.cn [College of Computer Science, National University of Defense Technology, Changsha 410073 (China); Deng, Xiaogang; Zhang, Lilun [College of Computer Science, National University of Defense Technology, Changsha 410073 (China); Fang, Jianbin [Parallel and Distributed Systems Group, Delft University of Technology, Delft 2628CD (Netherlands); Wang, Guangxue; Jiang, Yi [State Key Laboratory of Aerodynamics, P.O. Box 211, Mianyang 621000 (China); Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua [College of Computer Science, National University of Defense Technology, Changsha 410073 (China)

    2014-12-01

    Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact finite difference schemes, WCNS and HDCS, that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we collaborate CPU and GPU for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×; meanwhile, the collaborative approach improves performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using some advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To the best of our knowledge, these are the largest-scale CPU–GPU collaborative simulations reported.

  19. Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

    International Nuclear Information System (INIS)

    Xu, Chuanfu; Deng, Xiaogang; Zhang, Lilun; Fang, Jianbin; Wang, Guangxue; Jiang, Yi; Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua

    2014-01-01

    Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact finite difference schemes, WCNS and HDCS, that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we collaborate CPU and GPU for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×; meanwhile, the collaborative approach improves performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using some advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To the best of our knowledge, these are the largest-scale CPU–GPU collaborative simulations reported.

  20. A Bit String Content Aware Chunking Strategy for Reduced CPU Energy on Cloud Storage

    Directory of Open Access Journals (Sweden)

    Bin Zhou

    2015-01-01

    Full Text Available In order to achieve energy savings and reduce the total cost of ownership, green storage has become the first priority for data centers. Detecting and deleting redundant data is key to reducing CPU energy consumption, and a high-performance, stable chunking strategy provides the groundwork for detecting redundant data. Existing chunking algorithms greatly reduce system performance when confronted with big data, wasting a lot of energy. Factors affecting chunking performance are analyzed and discussed in the paper, and a new fingerprint signature calculation is implemented. Furthermore, a Bit String Content Aware Chunking Strategy (BCCS) is put forward. This strategy reduces the cost of signature computation in the chunking process to improve system performance and cut down the energy consumption of the cloud storage data center. On the basis of the test scenarios and test data of this paper, the advantages of the chunking strategy are verified.
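
    A minimal sketch of content-defined chunking with a byte-string rolling-style hash, the general technique the record builds on; the window size, mask and hash function are illustrative stand-ins, not the actual BCCS.

      # Content-defined chunking: a boundary is declared where hash & mask == 0,
      # so boundaries survive insertions/deletions and duplicate chunks can be
      # detected by fingerprint and skipped instead of stored again.
      import hashlib

      WINDOW, MASK = 48, 0x1FFF          # ~8 KiB expected average chunk size

      def chunks(data: bytes):
          h, start = 0, 0
          for i, b in enumerate(data):
              h = ((h << 1) + b) & 0xFFFFFFFF     # cheap rolling-style hash
              if i - start >= WINDOW and (h & MASK) == 0:
                  yield data[start:i + 1]
                  start, h = i + 1, 0
          if start < len(data):
              yield data[start:]

      data = bytes(range(256)) * 1000
      seen = set()
      for c in chunks(data):
          fp = hashlib.sha1(c).hexdigest()        # chunk fingerprint
          if fp in seen:
              continue                            # redundant chunk: not stored
          seen.add(fp)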

  1. Design of a memory-access controller with 3.71-times-enhanced energy efficiency for Internet-of-Things-oriented nonvolatile microcontroller unit

    Science.gov (United States)

    Natsui, Masanori; Hanyu, Takahiro

    2018-04-01

    In realizing a nonvolatile microcontroller unit (MCU) for sensor nodes in Internet-of-Things (IoT) applications, it is important to solve the data-transfer bottleneck between the central processing unit (CPU) and the nonvolatile memory constituting the MCU. As a circuit-oriented approach to this problem, we propose a memory-access minimization technique for magnetoresistive-random-access-memory (MRAM)-embedded nonvolatile MCUs. In addition to multiplexing and prefetching of memory accesses, the proposed technique realizes efficient instruction fetch by eliminating redundant memory accesses while considering the code length of the instruction to be fetched and the transition of the memory address to be accessed. As a result, the performance of the MCU can be improved while relaxing the performance requirement for the embedded MRAM, and a compact, low-power implementation is possible compared with a conventional cache-based design. Through an evaluation using a system consisting of a general purpose 32-bit CPU and embedded MRAM, it is demonstrated that the proposed technique increases the peak efficiency of the system by up to 3.71 times, while achieving a 2.29-fold area reduction compared with the cache-based design.

  2. Integration in a nuclear physics experiment of a visualization unit managed by a microprocessor

    International Nuclear Information System (INIS)

    Lefebvre, M.

    1976-01-01

    A microprocessor (Intel 8080) is introduced into the equipment controlling the (e,e'p) experiment that will take place at the linear accelerator operating on the premises of the CEA (Orme des Merisiers, Gif-sur-Yvette, France). The purpose of the microprocessor is to handle the visualization tasks necessary for continuous control of the experiment. By doing so, more time and more memory are left for data processing by the computing unit. In a future version of the system, control of the helium level in the target might also be assigned to the microprocessor. This work is divided into 7 main parts: 1) a presentation of the linear accelerator and its experimental facilities, 2) the Intel 8080 microprocessor and its programming, 3) the implementation of the microprocessor in the electronic system, 4) the management of the memory, 5) data acquisition, 6) the keyboard, and 7) the visualization unit.

  3. Integrating post-Newtonian equations on graphics processing units

    Energy Technology Data Exchange (ETDEWEB)

    Herrmann, Frank; Tiglio, Manuel [Department of Physics, Center for Fundamental Physics, and Center for Scientific Computation and Mathematical Modeling, University of Maryland, College Park, MD 20742 (United States); Silberholz, John [Center for Scientific Computation and Mathematical Modeling, University of Maryland, College Park, MD 20742 (United States); Bellone, Matias [Facultad de Matematica, Astronomia y Fisica, Universidad Nacional de Cordoba, Cordoba 5000 (Argentina); Guerberoff, Gustavo, E-mail: tiglio@umd.edu [Facultad de Ingenieria, Instituto de Matematica y Estadistica 'Prof. Ing. Rafael Laguardia', Universidad de la Republica, Montevideo (Uruguay)

    2010-02-07

    We report on early results of a numerical and statistical study of binary black hole inspirals. The two black holes are evolved using post-Newtonian approximations starting with initially randomly distributed spin vectors. We characterize certain aspects of the distribution shortly before merger. In particular we note the uniform distribution of black hole spin vector dot products shortly before merger and a high correlation between the initial and final black hole spin vector dot products in the equal-mass, maximally spinning case. More than 300 million simulations were performed on graphics processing units, and we demonstrate a speed-up of a factor of 50 over a more conventional CPU implementation. (fast track communication)

  4. Real-time autocorrelator for fluorescence correlation spectroscopy based on graphical-processor-unit architecture: method, implementation, and comparative studies

    Science.gov (United States)

    Laracuente, Nicholas; Grossman, Carl

    2013-03-01

    We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphical processor units (GPUs). Recent developments in hardware and software have allowed for general purpose computing with inexpensive GPU hardware. These devices are better suited to emulating hardware autocorrelators than traditional CPU-based software applications, as they emphasize parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core graphics PCI card in a computer with a 3.2 GHz Intel i5 CPU running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College.
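
    A simplified NumPy sketch of the multi-tau binning scheme mentioned in the record: lag resolution is coarsened and the signal rebinned at each level, as in hardware correlators. The bin count, level count and Poisson test signal are illustrative.

      # Simplified multi-tau autocorrelator: m lags per level, then the
      # signal is rebinned 2x and the lag spacing doubled for the next level.
      import numpy as np

      def multi_tau(intensity, m=16, levels=8):
          taus, g2 = [], []
          x, dt = np.asarray(intensity, float), 1
          for _ in range(levels):
              mean2 = x.mean() ** 2
              for k in range(1, m + 1):
                  if k >= len(x):
                      return np.array(taus), np.array(g2)
                  g2.append((x[:-k] * x[k:]).mean() / mean2)   # g2(k*dt)
                  taus.append(k * dt)
              n2 = len(x) // 2 * 2
              x = 0.5 * (x[0:n2:2] + x[1:n2:2])                # rebin 2x
              dt *= 2
          return np.array(taus), np.array(g2)

      photons = np.random.poisson(5.0, 1 << 16)   # fake photon-count trace
      taus, g2 = multi_tau(photons)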

  5. Three Dimensional Simulation of Ion Thruster Plume-Spacecraft Interaction Based on a Graphic Processor Unit

    International Nuclear Information System (INIS)

    Ren Junxue; Xie Kan; Qiu Qian; Tang Haibin; Li Juan; Tian Huabing

    2013-01-01

    Based on the three-dimensional particle-in-cell (PIC) method and the Compute Unified Device Architecture (CUDA), a parallel particle simulation code for a graphics processing unit (GPU) has been developed for the simulation of charge-exchange (CEX) xenon ions in the plume of an ion thruster. Using the proposed technique, the potential and CEX plasma distribution are calculated for the ion thruster plume surrounding the DS1 spacecraft at different thrust levels. The simulation results are in good agreement with measured CEX ion parameters reported in the literature, and the GPU results match the CPU's. Compared with a single-CPU Intel Core 2 E6300, a 16-processor NVIDIA GeForce 9400 GT GPU yields a speedup factor of 3.6 when the total macro-particle number is 1.1×10^6. The simulation results also reveal how the back-flow CEX plasma affects the spacecraft floating potential, indicating that the plume of the ion thruster is indeed able to alleviate the extreme negative floating potentials of spacecraft in geosynchronous orbit.

  6. Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units

    Science.gov (United States)

    Kemal, Jonathan Yashar

    For purposes of optimizing and analyzing turbomachinery and other designs, the unsteady Favre-averaged flow-field differential equations for an ideal compressible gas can be solved in conjunction with the heat conduction equation. We solve all equations using the finite-volume multiple-grid numerical technique, with the dual time-step scheme used for unsteady simulations. Our numerical solver code targets CUDA-capable Graphical Processing Units (GPUs) produced by NVIDIA. Making use of MPI, our solver can run across networked compute nodes, where each MPI process can use either a GPU or a Central Processing Unit (CPU) core for primary solver calculations. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture, and compare our resulting performance against Intel Xeon X5690 CPUs. Solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. We used a conjugate cylinder computational grid and ran a turbulent steady flow simulation using 4 increasingly dense computational grids. Our densest computational grid is divided into 13 blocks each containing 1033x1033 grid points, for a total of 13.87 million grid points or 1.07 million grid points per domain block. To obtain overall speedups, we compare the execution time of the solver's iteration loop, including all resource-intensive GPU-related memory copies. Comparing the performance of 8 GPUs to that of 8 CPUs, we obtain an overall speedup of about 6.0 when using our densest computational grid. This amounts to an 8-GPU simulation running about 39.5 times faster than a single-CPU simulation.

  7. Monitoring of mass flux of catalyst FCC in a Cold Pilot Unit by gamma radiation transmission; Monitoramento da taxa de fluxo do catalisador FCC em uma unidade piloto a frio por medicao de transmissao gama

    Energy Technology Data Exchange (ETDEWEB)

    Brito, Marcio Fernando Paixao de

    2014-09-01

    This paper proposes a model for monitoring the mass flow of FCC (Fluid Catalytic Cracking) catalyst in a CPU (Cold Pilot Unit) under air and solid injection, using gamma radiation transmission. The CPU simplifies the FCC process, represented by the catalyst cycle, and was constructed of acrylic so that the flow can be visualized. The CPU consists of a riser, a separation chamber and a return column, and simulates the riser reactor of the FCC process. The catalyst is injected from the return column into the base of the riser, an inclined tube, where compressed air fluidizes it along the riser. When the catalyst reaches the separation chamber, the solid phase is sent to the return column, and the gas phase exits the system through one of the four cyclones at the top of the separation chamber. Gamma transmission measurements are made at three test sections, each with a shielded source and detector. Riser pressure drop measurements are made with three pressure gauges positioned along the riser. The source was Am-241, emitting 60 keV gamma rays, and the detector was a 2" x 2" NaI(Tl) scintillator. Mass flow measurements are made by varying the catalyst seal and the solid density in the riser, since the combination of these measurements determines the catalyst velocity in the riser. The results show that gamma transmission is a suitable technique for monitoring catalyst flow, that the flow regime in the CPU is annular, that third-generation tomography is more appropriate for studying the CPU, and that the circulating density in the CPU decreases linearly with increasing air flow. (author)
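
    The density measurement rests on the Beer-Lambert attenuation law, I = I0·exp(-μ·ρ·x); a toy calculation under assumed values for the mass attenuation coefficient and beam path length (both illustrative, not taken from the thesis) looks like this:

      # Solid density on the source-detector chord from gamma transmission,
      # inverting Beer-Lambert: rho = ln(I_empty / I_measured) / (mu * x).
      import math

      MU_MASS = 0.2     # cm^2/g, assumed mass attenuation coefficient at 60 keV
      PATH = 10.0       # cm, assumed beam path length across the riser

      def density(i_measured: float, i_empty: float) -> float:
          """Average solid density (g/cm^3) along the beam path."""
          return math.log(i_empty / i_measured) / (MU_MASS * PATH)

      # counts with flowing catalyst vs. counts through the empty riser
      print(density(i_measured=8200.0, i_empty=10000.0))   # ~0.1 g/cm^3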

  8. Porting of the transfer-matrix method for multilayer thin-film computations on graphics processing units

    Science.gov (United States)

    Limmer, Steffen; Fey, Dietmar

    2013-07-01

    Thin-film computations are often a time-consuming task during optical design. An efficient way to accelerate these computations with the help of graphics processing units (GPUs) is described. It turns out that significant speed-ups can be achieved. We investigate the circumstances under which the best speed-up values can be expected, comparing different GPUs among themselves and with a modern CPU. Furthermore, the effect of thickness modulation on the speed-up, and the runtime behavior depending on the input data, are examined.
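
    A minimal normal-incidence transfer-matrix sketch showing the per-wavelength 2×2 complex matrix products that dominate such computations; the layer stack and refractive indices below are illustrative.

      # Transfer-matrix method for a multilayer stack at normal incidence:
      # one 2x2 characteristic matrix per layer, multiplied per wavelength.
      import numpy as np

      def reflectance(n_layers, d_layers, n_in, n_out, wavelength):
          M = np.eye(2, dtype=complex)
          for n, d in zip(n_layers, d_layers):
              delta = 2 * np.pi * n * d / wavelength     # phase thickness
              eta = n                                    # optical admittance
              M = M @ np.array([[np.cos(delta), 1j * np.sin(delta) / eta],
                                [1j * eta * np.sin(delta), np.cos(delta)]])
          B, C = M @ np.array([1.0, n_out])
          r = (n_in * B - C) / (n_in * B + C)            # amplitude reflectance
          return abs(r) ** 2

      # quarter-wave MgF2 / TiO2 pair on glass at 550 nm (lengths in nm)
      print(reflectance([1.38, 2.35], [550 / (4 * 1.38), 550 / (4 * 2.35)],
                        n_in=1.0, n_out=1.52, wavelength=550.0))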

  9. Evaluation of Selected Resource Allocation and Scheduling Methods in Heterogeneous Many-Core Processors and Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Ciznicki Milosz

    2014-12-01

    Full Text Available Heterogeneous many-core computing resources are increasingly popular among users due to their improved performance over homogeneous systems. Many developers have realized that heterogeneous systems, e.g. a combination of a shared-memory multi-core CPU machine with massively parallel Graphics Processing Units (GPUs), can provide significant performance opportunities to a wide range of applications. However, the best overall performance can only be achieved if application tasks are efficiently assigned to different types of processor units in time, taking into account their specific resource requirements. Additionally, one should note that available heterogeneous resources have been designed as general purpose units, but with many built-in features accelerating specific application operations. In other words, the same algorithm or application functionality can be implemented as a different task for a CPU or a GPU. Nevertheless, from the perspective of various evaluation criteria, e.g. total execution time or energy consumption, we may observe completely different results. Therefore, as tasks can be scheduled and managed in many alternative ways on both many-core CPUs and GPUs, and consequently have a huge impact on overall computing resource performance, there is a need for new and improved resource management techniques. In this paper we discuss results achieved during experimental performance studies of selected task scheduling methods in heterogeneous computing systems. Additionally, we present a new architecture for a resource allocation and task scheduling library which provides a generic application programming interface at the operating system level for improving scheduling policies, taking into account the diversity of tasks and the characteristics of heterogeneous computing resources.

  10. Graphics processor efficiency for realization of rapid tabular computations

    International Nuclear Information System (INIS)

    Dudnik, V.A.; Kudryavtsev, V.I.; Us, S.A.; Shestakov, M.V.

    2016-01-01

    The capabilities of graphics processing units (GPUs) and central processing units (CPUs) have been investigated for the realization of fast-calculation algorithms using tabulated functions. The realization of tabulated functions is exemplified on GPU/CPU architecture-based processors. Comparison is made between the operating efficiencies of the GPU and CPU when employed for tabular calculations under different conditions of use. Recommendations are formulated for the use of graphics and central processors to speed up scientific and engineering computations through the use of tabulated functions.
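
    A small sketch of the tabulated-function pattern under discussion, precomputing a table once and evaluating by linear interpolation; the table size and the sample function are illustrative.

      # Fast evaluation via a precomputed table plus linear interpolation,
      # trading a little accuracy for a cheap, branch-free lookup.
      import numpy as np

      XMIN, XMAX, N = 0.0, 10.0, 4096
      GRID = np.linspace(XMIN, XMAX, N)
      TABLE = np.exp(-GRID) * np.sin(GRID)        # any expensive f(x)

      def f_table(x):
          """Vectorized table lookup with linear interpolation."""
          t = (np.clip(x, XMIN, XMAX) - XMIN) / (XMAX - XMIN) * (N - 1)
          i = np.minimum(t.astype(int), N - 2)
          frac = t - i
          return TABLE[i] * (1 - frac) + TABLE[i + 1] * frac

      x = np.random.uniform(XMIN, XMAX, 1_000_000)
      err = np.max(np.abs(f_table(x) - np.exp(-x) * np.sin(x)))
      print(err)      # interpolation error, roughly 1e-6 for this table size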

  11. Efficient molecular dynamics simulations with many-body potentials on graphics processing units

    Science.gov (United States)

    Fan, Zheyong; Chen, Wei; Vierimaa, Ville; Harju, Ari

    2017-09-01

    Graphics processing units have been extensively used to accelerate classical molecular dynamics simulations. However, there is much less progress on the acceleration of force evaluations for many-body potentials compared to pairwise ones. In the conventional force evaluation algorithm for many-body potentials, the force, virial stress, and heat current for a given atom are accumulated within different loops, which can result in write conflicts between different threads in a CUDA kernel. In this work, we provide a new force evaluation algorithm, which is based on an explicit pairwise force expression for many-body potentials derived recently (Fan et al., 2015). In our algorithm, the force, virial stress, and heat current for a given atom can be accumulated within a single thread, free of write conflicts. We discuss the formulations and algorithms and evaluate their performance. A new open-source code, GPUMD, is developed based on the proposed formulations. For the Tersoff many-body potential, the double precision performance of GPUMD using a Tesla K40 card is equivalent to that of the LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) molecular dynamics code running with about 100 CPU cores (Intel Xeon CPU X5670 @ 2.93 GHz).

  12. Bradykinesia-akinesia incoordination test: validating an online keyboard test of upper limb function.

    Science.gov (United States)

    Noyce, Alastair J; Nagy, Anna; Acharya, Shami; Hadavi, Shahrzad; Bestwick, Jonathan P; Fearnley, Julian; Lees, Andrew J; Giovannoni, Gavin

    2014-01-01

    The Bradykinesia Akinesia Incoordination (BRAIN) test is a computer keyboard-tapping task that was developed for use in assessing the effect of symptomatic treatment on motor function in Parkinson's disease (PD). An online version has now been designed for use in a wider clinical context and the research setting. Validation of the online BRAIN test was undertaken in 58 patients with Parkinson's disease (PD) and 93 age-matched, non-neurological controls. Kinesia scores (KS30, number of key taps in 30 seconds), akinesia times (AT30, mean dwell time on each key in milliseconds), incoordination scores (IS30, variance of travelling time between key presses) and dysmetria scores (DS30, accuracy of key presses) were compared between groups. These parameters were correlated against total motor scores and sub-scores from the Unified Parkinson's Disease Rating Scale (UPDRS). Mean KS30, AT30 and IS30 were significantly different between PD patients and controls (p≤0.0001). Sensitivity for 85% specificity was 50% for KS30, 40% for AT30 and 29% for IS30. KS30, AT30 and IS30 correlated significantly with UPDRS total motor scores (r = -0.53, r = 0.27 and r = 0.28 respectively) and motor UPDRS sub-scores. The reliability of KS30, AT30 and DS30 was good on repeated testing. The BRAIN test is a reliable, convenient test of upper limb motor function that can be used routinely in the outpatient clinic, at home and in clinical trials. In addition, it can be used as an objective longitudinal measurement of emerging motor dysfunction for the prediction of PD in at-risk cohorts.
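
    A sketch of how the three tapping parameters could be computed from a 30-second keystroke log, following the definitions given in the abstract; the event-log format and sample data are hypothetical, not the BRAIN test's actual implementation.

      # KS30: key taps in 30 s; AT30: mean dwell time per key (ms);
      # IS30: variance of travelling time between consecutive key presses.
      import statistics

      def brain_scores(events):
          """events: list of (key, press_time_ms, release_time_ms) tuples."""
          presses = [t for _, t, _ in events]
          dwells = [r - t for _, t, r in events]           # ms on each key
          travels = [b - a for a, b in zip(presses, presses[1:])]
          ks30 = len(events)                               # taps in 30 s
          at30 = statistics.mean(dwells)                   # mean dwell time
          is30 = statistics.variance(travels)              # travel variance
          return ks30, at30, is30

      log = [("s", 1000 * i, 1000 * i + 90) for i in range(30)]  # 1 tap/s
      print(brain_scores(log))    # -> (30, 90, 0) for this regular tapper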

  13. Bradykinesia-akinesia incoordination test: validating an online keyboard test of upper limb function.

    Directory of Open Access Journals (Sweden)

    Alastair J Noyce

    Full Text Available The Bradykinesia Akinesia Incoordination (BRAIN) test is a computer keyboard-tapping task that was developed for use in assessing the effect of symptomatic treatment on motor function in Parkinson's disease (PD). An online version has now been designed for use in a wider clinical context and the research setting. Validation of the online BRAIN test was undertaken in 58 patients with Parkinson's disease (PD) and 93 age-matched, non-neurological controls. Kinesia scores (KS30, number of key taps in 30 seconds), akinesia times (AT30, mean dwell time on each key in milliseconds), incoordination scores (IS30, variance of travelling time between key presses) and dysmetria scores (DS30, accuracy of key presses) were compared between groups. These parameters were correlated against total motor scores and sub-scores from the Unified Parkinson's Disease Rating Scale (UPDRS). Mean KS30, AT30 and IS30 were significantly different between PD patients and controls (p≤0.0001). Sensitivity for 85% specificity was 50% for KS30, 40% for AT30 and 29% for IS30. KS30, AT30 and IS30 correlated significantly with UPDRS total motor scores (r = -0.53, r = 0.27 and r = 0.28 respectively) and motor UPDRS sub-scores. The reliability of KS30, AT30 and DS30 was good on repeated testing. The BRAIN test is a reliable, convenient test of upper limb motor function that can be used routinely in the outpatient clinic, at home and in clinical trials. In addition, it can be used as an objective longitudinal measurement of emerging motor dysfunction for the prediction of PD in at-risk cohorts.

  14. Deployment of IPv6-only CPU resources at WLCG sites

    Science.gov (United States)

    Babik, M.; Chudoba, J.; Dewhurst, A.; Finnern, T.; Froy, T.; Grigoras, C.; Hafeez, K.; Hoeft, B.; Idiculla, T.; Kelsey, D. P.; López Muñoz, F.; Martelli, E.; Nandakumar, R.; Ohrenberg, K.; Prelz, F.; Rand, D.; Sciabà, A.; Tigerstedt, U.; Traynor, D.

    2017-10-01

    The fraction of Internet traffic carried over IPv6 continues to grow rapidly. IPv6 support from network hardware vendors and carriers is pervasive and becoming mature. A network infrastructure upgrade often offers sites an excellent window of opportunity to configure and enable IPv6. There is a significant overhead when setting up and maintaining dual-stack machines, so where possible sites would like to upgrade their services directly to IPv6 only. In doing so, they are also expediting the transition process towards its desired completion. While the LHC experiments accept there is a need to move to IPv6, it is currently not directly affecting their work. Sites are unwilling to upgrade if they will be unable to run LHC experiment workflows. This has resulted in a very slow uptake of IPv6 from WLCG sites. For several years the HEPiX IPv6 Working Group has been testing a range of WLCG services to ensure they are IPv6 compliant. Several sites are now running many of their services as dual-stack. The working group, driven by the requirements of the LHC VOs to be able to use IPv6-only opportunistic resources, continues to encourage wider deployment of dual-stack services to make the use of such IPv6-only clients viable. This paper presents the working group’s plan and progress so far to allow sites to deploy IPv6-only CPU resources. This includes making experiment central services dual-stack as well as a number of storage services. The monitoring, accounting and information services that are used by jobs also need to be upgraded. Finally the VO testing that has taken place on hosts connected via IPv6-only is reported.

  15. Applying graphics processor units to Monte Carlo dose calculation in radiation therapy

    Directory of Open Access Journals (Sweden)

    Bakhtiari M

    2010-01-01

    Full Text Available We investigate the potential of using a graphics processing unit (GPU) for Monte-Carlo (MC) based radiation dose calculations. The percent depth dose (PDD) of photons in a medium with known absorption and scattering coefficients is computed using an MC simulation running on both a standard CPU and a GPU. We demonstrate that the GPU's capability for massively parallel processing provides a significant acceleration in the MC calculation, and offers a significant advantage for distributed stochastic simulations on a single computer. Harnessing this potential of GPUs will help in the early adoption of MC for routine planning in a clinical environment.
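
    A toy Monte-Carlo depth-dose sketch in the spirit of the record, sampling exponential first-interaction depths and histogramming them; it omits scatter transport, and the attenuation coefficient and geometry are illustrative.

      # Toy percent-depth-dose: photons deposit energy at their first
      # interaction depth, sampled from an exponential distribution.
      import numpy as np

      MU = 0.05                  # 1/cm, assumed linear attenuation coefficient
      N, DEPTH, BINS = 1_000_000, 30.0, 60

      depths = np.random.exponential(1.0 / MU, N)      # interaction depths
      hist, edges = np.histogram(depths, bins=BINS, range=(0.0, DEPTH))
      pdd = 100.0 * hist / hist.max()                  # percent depth dose

      for z, d in list(zip(edges, pdd))[:5]:           # print shallow bins
          print(f"{z:5.1f} cm  {d:6.1f} %")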

  16. Endoplasmic reticulum stress mediating downregulated StAR and 3-beta-HSD and low plasma testosterone caused by hypoxia is attenuated by CPU86017-RS and nifedipine

    Directory of Open Access Journals (Sweden)

    Liu Gui-Lai

    2012-01-01

    Full Text Available Abstract. Background: Hypoxia exposure initiates low serum testosterone levels that could be attributed to downregulated androgen biosynthesizing genes such as StAR (steroidogenic acute regulatory protein) and 3-beta-HSD (3-beta-hydroxysteroid dehydrogenase) in the testis. It was hypothesized that these testicular abnormalities under hypoxia are associated with oxidative stress and an increase in chaperones of endoplasmic reticulum stress (ER stress), and that ER stress could be modulated by a reduction in calcium influx. Therefore, we verified whether an application of CPU86017-RS (simplified as RS), a derivative of berberine, could alleviate the ER stress, the depressed gene expression of StAR and 3-beta-HSD, and the low plasma testosterone in hypoxic rats, in comparison with nifedipine. Methods: Adult male Sprague-Dawley rats were randomly divided into control, hypoxia for 28 days, and hypoxia treated (mg/kg, p.o.) during the last 14 days with nifedipine (Nif, 10) or one of three doses of RS (20, 40, 80), plus normal rats treated with RS isomer (80). Serum testosterone (T) and luteinizing hormone (LH) were measured. The testicular expression of biomarkers including StAR, 3-beta-HSD, immunoglobulin heavy chain binding protein (Bip), double-strand RNA-activated protein kinase-like ER kinase (PERK) and the pro-apoptotic transcription factor C/EBP homologous protein (CHOP) was measured. Results: In hypoxic rats, serum testosterone levels decreased, and the mRNA and protein expression of the testosterone biosynthesis related genes StAR and 3-beta-HSD was downregulated. These changes were linked to an increase in oxidants, upregulated ER stress chaperones Bip, PERK and CHOP, and a distorted histological structure of the seminiferous tubules in the testis. These abnormalities were attenuated significantly by CPU86017-RS and nifedipine. Conclusion: Downregulated StAR and 3-beta-HSD significantly contribute to low testosterone in hypoxic rats and are associated with ER stress.

  17. A heterogeneous CPU+GPU Poisson solver for space charge calculations in beam dynamics studies

    Energy Technology Data Exchange (ETDEWEB)

    Zheng, Dawei; Rienen, Ursula van [University of Rostock, Institute of General Electrical Engineering (Germany)

    2016-07-01

    In beam dynamics studies in accelerator physics, space charge plays a central role in the low energy regime of an accelerator. Numerical space charge calculations are required both in the design phase and in the operation of the machines. Due to its efficiency, the Particle-In-Cell (PIC) method is mostly chosen for the space charge calculation. The solution of Poisson's equation for the charge distribution in the rest frame is then the most prominent part of the solution process. The Poisson solver directly affects the accuracy of the self-field applied to the charged particles when the equation of motion is solved in the laboratory frame. As the Poisson solver consumes the major part of the computing time in most simulations, it has to be as fast as possible, since it is carried out once per time step. In this work, we demonstrate a novel heterogeneous CPU+GPU routine for the Poisson solver. The novel solver also benefits from our new research results on the utilization of a discrete cosine transform within the classical Hockney and Eastwood convolution routine.

  18. The ATLAS Trigger Algorithms for General Purpose Graphics Processor Units

    CERN Document Server

    Tavares Delgado, Ademar; The ATLAS collaboration

    2016-01-01

    We present the ATLAS Trigger algorithms developed to exploit General-Purpose Graphics Processor Units. ATLAS is a particle physics experiment located on the LHC collider at CERN. The ATLAS Trigger system has two levels: hardware-based Level 1 and the High Level Trigger, implemented in software running on a farm of commodity CPUs. Performing the trigger event selection within the available farm resources presents a significant challenge that will increase with future LHC upgrades. GPUs are being evaluated as a potential solution for trigger algorithm acceleration. Key factors determining the potential benefit of this new technology are the relative execution speedup, the number of GPUs required and the relative financial cost of the selected GPU. We have developed a trigger demonstrator which includes algorithms for reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Calorimeters.

  19. Accelerating the SCE-UA Global Optimization Method Based on Multi-Core CPU and Many-Core GPU

    Directory of Open Access Journals (Sweden)

    Guangyuan Kan

    2016-01-01

    Full Text Available The SCE-UA global optimization method, which has been widely used in the field of environmental model parameter calibration, is effective and robust. However, the SCE-UA method has a high computational load, which prohibits its application to high-dimensional and complex problems. In recent years, computer hardware such as multi-core CPUs and many-core GPUs has improved significantly. This much more powerful hardware and its software ecosystem provide an opportunity to accelerate the SCE-UA method. In this paper, we propose two parallel SCE-UA methods and implement them on an Intel multi-core CPU and an NVIDIA many-core GPU using OpenMP and CUDA Fortran, respectively. The Griewank benchmark function was adopted to test and compare the performance of the serial and parallel SCE-UA methods. Based on the results of the comparison, some useful advice is given on how to properly use the parallel SCE-UA methods.
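
    The Griewank benchmark named in the record, together with a parallel population evaluation across CPU cores as a rough stand-in for the paper's OpenMP variant; the population size and dimensionality are illustrative.

      # Griewank function and a parallel fitness evaluation of a candidate
      # population - the kind of embarrassingly parallel step SCE-UA exposes.
      import numpy as np
      from multiprocessing import Pool

      def griewank(x):
          x = np.asarray(x)
          i = np.arange(1, len(x) + 1)
          return 1.0 + np.sum(x**2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))

      if __name__ == "__main__":
          dim, pop = 30, 2000
          population = [np.random.uniform(-600, 600, dim) for _ in range(pop)]
          with Pool() as pool:                  # parallel fitness evaluation
              fitness = pool.map(griewank, population)
          print(min(fitness))                   # best candidate so far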

  20. Leveraging the checkpoint-restart technique for optimizing CPU efficiency of ATLAS production applications on opportunistic platforms

    CERN Document Server

    Cameron, David; The ATLAS collaboration

    2017-01-01

    Data processing applications of the ATLAS experiment, such as event simulation and reconstruction, spend a considerable amount of time in the initialization phase. This phase includes loading a large number of shared libraries, reading detector geometry and condition data from external databases, building a transient representation of the detector geometry and initializing various algorithms and services. In some cases the initialization step can take as long as 10-15 minutes. Such slow initialization, being inherently serial, has a significant negative impact on the overall CPU efficiency of the production job, especially when the job is executed on opportunistic, often short-lived, resources such as commercial clouds or volunteer computing. In order to improve this situation, we can take advantage of the fact that ATLAS runs large numbers of production jobs with similar configuration parameters (e.g. jobs within the same production task). This allows us to checkpoint one job at the end of its configuration step...

  1. GPScheDVS: A New Paradigm of the Autonomous CPU Speed Control for Commodity-OS-based General-Purpose Mobile Computers with a DVS-friendly Task Scheduling

    OpenAIRE

    Kim, Sookyoung

    2008-01-01

    This dissertation studies the problem of increasing battery life-time and reducing CPU heat dissipation without degrading system performance in commodity-OS-based general-purpose (GP) mobile computers using the dynamic voltage scaling (DVS) function of modern CPUs. The dissertation especially focuses on the impact of task scheduling on the effectiveness of DVS in achieving this goal. The task scheduling mechanism used in most contemporary general-purpose operating systems (GPOS) prioritizes t...

  2. Accelerating cardiac bidomain simulations using graphics processing units.

    Science.gov (United States)

    Neic, A; Liebmann, M; Hoetzl, E; Mitchell, L; Vigmond, E J; Haase, G; Plank, G

    2012-08-01

    Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are computationally vastly demanding, which is a limiting factor for a wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging since strongly scalable algorithms are necessitated to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, benefits in the context of bidomain simulations where large sparse linear systems have to be solved in parallel with advanced numerical techniques are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element methods (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6-20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation which engaged 20 GPUs, 476 CPU cores were required on a national supercomputing facility.

  3. Part-task simulator of a WWER-440 type nuclear power plant unit

    International Nuclear Information System (INIS)

    Palecek, P.

    1990-01-01

    In the present paper the design of a part-task simulator for WWER-440 type nuclear power plant units by the CEZ (Czech Power Works) Concern is reported. This part-task simulator has been designed for the training of NPP operating personnel. It includes a central computer that is coupled with the training work places and the trainer place. Interchange of information is performed by functional keyboards and semigraphical colour displays. The process is simulated, also in real time, on the basis of dynamic models. In addition to the precision of the models used, great importance has primarily been attached to the plasticity of information presentation. The part-task simulator may be applied to simulation and demonstration as well as to teaching purposes. The paper presents the achieved state of implementation of the part-task simulator and points out further stages of its evolution. (author)

  4. Coronary Artery Calcium as an Independent Surrogate Marker in the Risk Assessment of Patients With Atrial Fibrillation and an Intermediate Pretest Likelihood for Coronary Artery Disease Admitted to a German Chest Pain Unit.

    Science.gov (United States)

    Breuckmann, Frank; Olligs, Jan; Hinrichs, Liane; Koopmann, Matthias; Lichtenberg, Michael; Böse, Dirk; Fischer, Dieter; Eckardt, Lars; Waltenberger, Johannes; Garvey, J Lee

    2016-03-01

    About 10% of patients admitted to a chest pain unit (CPU) exhibit atrial fibrillation (AF). To determine whether calcium scores (CS) are superior to common risk scores for coronary artery disease (CAD) in patients presenting with atypical chest pain, newly diagnosed AF, and intermediate pretest probability for CAD within the CPU. In 73 subjects, CS was related to the following risk scores: Global Registry of Acute Coronary Events (GRACE) score, including a new model of a frequency-normalized approach; Thrombolysis In Myocardial Infarction score; European Society of Cardiology Systematic Coronary Risk Evaluation (SCORE); Framingham risk score; and Prospective Cardiovascular Münster Study score. Revascularization rates during index stay were assessed. Median CS was 77 (interquartile range, 1-270), with higher values in men and the left anterior descending artery. Only the modified GRACE (ρ = 0.27; P = 0.02) and the SCORE (ρ = 0.39) correlated significantly with CS; correlations between the remaining risk scores and calcium burden, as well as revascularization rates during index stay, were low. By contrast, the determination of CS may be used as an additional surrogate marker in risk stratification in AF patients with intermediate pretest likelihood for CAD admitted to a CPU. © 2016 Wiley Periodicals, Inc.

  5. Mission: Define Computer Literacy. The Illinois-Wisconsin ISACS Computer Coordinators' Committee on Computer Literacy Report (May 1985).

    Science.gov (United States)

    Computing Teacher, 1985

    1985-01-01

    Defines computer literacy and describes a computer literacy course which stresses ethics, hardware, and disk operating systems throughout. Core units on keyboarding, word processing, graphics, database management, problem solving, algorithmic thinking, and programing are outlined, together with additional units on spreadsheets, simulations,…

  6. System for processing an encrypted instruction stream in hardware

    Science.gov (United States)

    Griswold, Richard L.; Nickless, William K.; Conrad, Ryan C.

    2016-04-12

    A system and method of processing an encrypted instruction stream in hardware is disclosed. Main memory stores the encrypted instruction stream and unencrypted data. A central processing unit (CPU) is operatively coupled to the main memory. A decryptor is operatively coupled to the main memory and located within the CPU. The decryptor decrypts the encrypted instruction stream upon receipt of an instruction fetch signal from a CPU core. Unencrypted data is passed through to the CPU core without decryption upon receipt of a data fetch signal.

  7. Interactive dose shaping - efficient strategies for CPU-based real-time treatment planning

    International Nuclear Information System (INIS)

    Ziegenhein, P; Kamerling, C P; Oelfke, U

    2014-01-01

    Conventional intensity modulated radiation therapy (IMRT) treatment planning is based on the traditional concept of iterative optimization using an objective function specified by dose volume histogram constraints for pre-segmented VOIs. This indirect approach suffers from unavoidable shortcomings: i) The control of local dose features is limited to segmented VOIs. ii) Any objective function is a mathematical measure of plan quality, i.e., it is not able to define the clinically optimal treatment plan. iii) Adapting an existing plan to changed patient anatomy as detected by IGRT procedures is difficult. To overcome these shortcomings, we introduce the method of Interactive Dose Shaping (IDS) as a new paradigm for IMRT treatment planning. IDS allows for the direct and interactive manipulation of local dose features in real time. The key element driving the IDS process is a two-step Dose Modification and Recovery (DMR) strategy: a local dose modification is initiated by the user, which translates into modified fluence patterns. This also affects existing desired dose features elsewhere, which is compensated for by a heuristic recovery process. The IDS paradigm was implemented together with a CPU-based ultra-fast dose calculation and a 3D GUI for dose manipulation and visualization. A local dose feature can be implemented via the DMR strategy within 1-2 seconds. By imposing a series of local dose features, plan qualities equal to those of conventional planning could be achieved for prostate and head and neck cases within 1-2 minutes. The idea of Interactive Dose Shaping for treatment planning has been introduced and first applications of this concept have been realized.

  8. The Research and Test of Fast Radio Burst Real-time Search Algorithm Based on GPU Acceleration

    Science.gov (United States)

    Wang, J.; Chen, M. Z.; Pei, X.; Wang, Z. Q.

    2017-03-01

    In order to satisfy the research needs of the Nanshan 25 m radio telescope of Xinjiang Astronomical Observatory (XAO) and to study key technology for the planned QiTai radio Telescope (QTT), the receiver group of XAO developed a GPU (Graphics Processing Unit) based real-time FRB search algorithm from the original CPU (Central Processing Unit) based search algorithm, and built a real-time FRB search system. A comparison of the GPU and CPU systems shows that, while maintaining search accuracy, the GPU-accelerated algorithm is 35-45 times faster than the CPU algorithm.
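
    A sketch of the incoherent dedispersion step at the heart of such a search: each frequency channel is shifted by the cold-plasma dispersion delay before summing. The filterbank dimensions, DM and sampling time are illustrative; this is not XAO's code.

      # Incoherent dedispersion: undo the nu^-2 dispersion delay per channel,
      # then sum channels; on a GPU each (DM, channel) pair maps to a thread.
      import numpy as np

      K_DM = 4.148808e3          # MHz^2 pc^-1 cm^3 s, dispersion constant

      def dedisperse(data, freqs_mhz, dm, dt):
          """data: (n_chan, n_samp) filterbank; returns DM-corrected series."""
          delays = K_DM * dm * (freqs_mhz**-2 - freqs_mhz.max()**-2)  # seconds
          shifts = np.round(delays / dt).astype(int)
          out = np.zeros(data.shape[1])
          for chan, s in zip(data, shifts):
              out[:data.shape[1] - s] += chan[s:]       # align, then sum
          return out

      freqs = np.linspace(1100.0, 1500.0, 256)          # MHz channel centers
      data = np.random.rand(256, 4096)                  # fake filterbank block
      series = dedisperse(data, freqs, dm=300.0, dt=1e-3)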

  9. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards.

    Science.gov (United States)

    Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G

    2011-07-01

    In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses the summed absolute difference (SAD) error criterion and full grid search (FS) for finding the optimal block displacement. In this evaluation we compared the execution time of GPU and CPU implementations for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 for an integer and 1000 for a non-integer search grid. The additional speedup for the non-integer search grid comes from the fact that the GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with the number of cards is achievable. In addition, we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU-based motion estimation methods, namely the implementation of the Pyramidal Lucas Kanade Optical Flow algorithm in OpenCV and the Simplified Unsymmetrical multi-Hexagon search in the H.264/AVC standard. In these comparisons, the FS GPU implementation still showed modest improvement even though the computational complexity of the FS GPU implementation is substantially higher than that of the non-FS CPU implementations. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames per second using two NVIDIA C1060 Tesla GPU cards.
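
    A serial reference of the full-search SAD block matching the record parallelizes (on a GPU each candidate displacement maps to a thread); block size, search range and test frames are illustrative.

      # Full-search block matching with an SAD criterion on an integer grid.
      import numpy as np

      def full_search(ref, cur, by, bx, block=16, rng=8):
          """Best (dy, dx) for the current-frame block at (by, bx)."""
          target = cur[by:by + block, bx:bx + block].astype(np.int32)
          best, best_sad = (0, 0), np.inf
          for dy in range(-rng, rng + 1):
              for dx in range(-rng, rng + 1):
                  y, x = by + dy, bx + dx
                  if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                      continue                       # candidate out of frame
                  sad = np.abs(ref[y:y + block, x:x + block].astype(np.int32)
                               - target).sum()
                  if sad < best_sad:
                      best, best_sad = (dy, dx), sad
          return best, best_sad

      ref = np.random.randint(0, 256, (480, 720), np.uint8)
      cur = np.roll(ref, (2, -3), axis=(0, 1))      # shift content down 2, left 3
      print(full_search(ref, cur, 160, 320))        # -> ((-2, 3), 0)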

  10. Security central processing unit applications in the protection of nuclear facilities

    International Nuclear Information System (INIS)

    Goetzke, R.E.

    1987-01-01

    New or upgraded electronic security systems protecting nuclear facilities or complexes will be heavily computer dependent. Proper planning for new systems and the employment of new state-of-the-art 32-bit processors in the processing of subsystem reports are key elements in effective security systems. The processing of subsystem reports represents only a small segment of system overhead. In selecting a security system to meet the current and future needs of nuclear security applications, the central processing unit (CPU) applied in the system architecture is the critical element in system performance. New 32-bit technology eliminates the need for program overlays while providing system programmers with well-documented program tools to develop effective systems that operate in all phases of nuclear security applications.

  11. Graphics processing units in bioinformatics, computational biology and systems biology.

    Science.gov (United States)

    Nobile, Marco S; Cazzaniga, Paolo; Tangherloni, Andrea; Besozzi, Daniela

    2017-09-01

    Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), therefore limiting their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining increasing attention from the scientific community, as they can considerably reduce the running time required by standard CPU-based software and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and the drawbacks in the use of these parallel architectures. The complete list of GPU-powered tools reviewed here is available at http://bit.ly/gputools. © The Author 2016. Published by Oxford University Press.

  12. GPU-Accelerated Parallel FDTD on Distributed Heterogeneous Platform

    Directory of Open Access Journals (Sweden)

    Ronglin Jiang

    2014-01-01

    Full Text Available This paper introduces a (finite difference time domain FDTD code written in Fortran and CUDA for realistic electromagnetic calculations with parallelization methods of Message Passing Interface (MPI and Open Multiprocessing (OpenMP. Since both Central Processing Unit (CPU and Graphics Processing Unit (GPU resources are utilized, a faster execution speed can be reached compared to a traditional pure GPU code. In our experiments, 64 NVIDIA TESLA K20m GPUs and 64 INTEL XEON E5-2670 CPUs are used to carry out the pure CPU, pure GPU, and CPU + GPU tests. Relative to the pure CPU calculations for the same problems, the speedup ratio achieved by CPU + GPU calculations is around 14. Compared to the pure GPU calculations for the same problems, the CPU + GPU calculations have 7.6%–13.2% performance improvement. Because of the small memory size of GPUs, the FDTD problem size is usually very small. However, this code can enlarge the maximum problem size by 25% without reducing the performance of traditional pure GPU code. Finally, using this code, a microstrip antenna array with 16×18 elements is calculated and the radiation patterns are compared with the ones of MoM. Results show that there is a well agreement between them.
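
    The record does not reproduce the authors' Fortran/CUDA kernels; as a rough illustration of the update step such a code parallelizes, the sketch below shows hypothetical 1-D FDTD field-update kernels and a host-side time step. In the hybrid MPI scheme described above, each rank would additionally exchange slab-boundary fields with its neighbors between the E and H updates. All names and coefficients are assumptions.

```cuda
// Minimal 1-D FDTD update kernels (illustrative only; the paper's code is
// Fortran/CUDA with MPI+OpenMP on a full 3-D grid).
#include <cuda_runtime.h>

__global__ void updateE(float* ez, const float* hy, float coeff, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= 1 && i < n)                       // skip the boundary node
        ez[i] += coeff * (hy[i] - hy[i - 1]);  // curl of H drives E
}

__global__ void updateH(const float* ez, float* hy, float coeff, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n - 1)
        hy[i] += coeff * (ez[i + 1] - ez[i]);  // curl of E drives H
}

// One time step on the host; in the hybrid scheme each MPI rank would
// exchange its slab-boundary field values here between the two updates.
void step(float* ez, float* hy, float ce, float ch, int n)
{
    int threads = 256, blocks = (n + threads - 1) / threads;
    updateE<<<blocks, threads>>>(ez, hy, ce, n);
    updateH<<<blocks, threads>>>(ez, hy, ch, n);
    cudaDeviceSynchronize();
}
```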

  13. Brain morphometry shows effects of long-term musical practice in middle-aged keyboard players

    Directory of Open Access Journals (Sweden)

    Hanna eGärtner

    2013-09-01

    Full Text Available To what extent does musical practice change the structure of the brain? In order to understand how long-lasting musical training changes brain structure, 20 male right-handed, middle-aged professional musicians and 19 matched controls were investigated. Among the musicians, 13 were pianists or organists with intensive practice regimes. The others were either music teachers at schools or string instrumentalists, who had studied the piano at least as a subsidiary subject and practiced less intensively. The study was based on T1-weighted MR images, which were analyzed using Deformation Field Morphometry. Cytoarchitectonic probabilistic maps of cortical areas and subcortical nuclei as well as myeloarchitectonic maps of fiber tracts were used as regions of interest to compare volume differences in the brains of musicians and controls. In addition, maps of voxel-wise volume differences were computed and analyzed. Musicians showed significantly better symmetric motor performance as well as a greater capability of controlling hand independence than controls. Structural MRI data revealed significant volumetric differences between the brains of keyboard players who practiced intensively and controls in right sensorimotor areas and the corticospinal tract, as well as in the entorhinal cortex and the left superior parietal lobule. Moreover, they also showed larger volumes in a comparable set of regions than the less intensively practicing musicians. The structural changes in the sensory and motor systems correspond well to the behavioral results, and can be interpreted in terms of plasticity as a result of intensive motor training. Areas of the superior parietal lobule and the entorhinal cortex might be enlarged in musicians due to their special skills in sight-playing and memorizing of scores. In conclusion, intensive and specific musical training seems to have an impact on brain structure, not only during the sensitive period of childhood but throughout

  14. Acceleration of PIC simulation with GPU

    International Nuclear Information System (INIS)

    Suzuki, Junya; Shimazu, Hironori; Fukazawa, Keiichiro; Den, Mitsue

    2011-01-01

    Particle-in-cell (PIC) is a simulation technique for plasma physics. The large number of particles in high-resolution plasma simulation increases the volume of computation required, making it vital to increase computation speed. In this study, we attempt to accelerate computation on graphics processing units (GPUs) using KEMPO, a PIC simulation code package. We perform two benchmark tests, with small and large grid sizes. In these tests, we run the KEMPO1 code using a CPU only, both a CPU and a GPU, and a GPU only. The results showed that performance using only a GPU was twice that of using a CPU alone. Meanwhile, execution time using both a CPU and a GPU was merely comparable to the CPU-only tests, because of the significant bottleneck in communication between the CPU and GPU. (author)
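
    The abstract does not detail the KEMPO kernels; as a generic illustration of the particle-push step that dominates PIC codes, the sketch below advances particles with a leapfrog update on a 1-D periodic grid. Keeping particles resident in GPU memory across time steps is exactly what avoids the CPU-GPU communication bottleneck noted in the results. All structures and parameters are hypothetical.

```cuda
// Minimal electrostatic particle push (leapfrog) sketch, not the KEMPO code.
#include <cuda_runtime.h>

struct Particle { float x, v; };

__global__ void pushParticles(Particle* p, const float* efield,
                              float qm, float dt, float dx, int np, int ng)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= np) return;

    // Gather E at the particle position (linear weighting between the two
    // nearest grid points).
    float xg = p[i].x / dx;
    int j = (int)xg;
    float w = xg - (float)j;
    float e = (1.0f - w) * efield[j] + w * efield[(j + 1) % ng];

    // Leapfrog update, then wrap into the periodic box [0, L).
    p[i].v += qm * e * dt;
    p[i].x += p[i].v * dt;
    float L = ng * dx;
    if (p[i].x < 0.0f) p[i].x += L;
    if (p[i].x >= L)   p[i].x -= L;
}
```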

  15. Parallel direct solver for finite element modeling of manufacturing processes

    DEFF Research Database (Denmark)

    Nielsen, Chris Valentin; Martins, P.A.F.

    2017-01-01

    The central processing unit (CPU) time is of paramount importance in finite element modeling of manufacturing processes. Because the most significant part of the CPU time is consumed in solving the main system of equations resulting from finite element assemblies, different approaches have been...

  16. Graphics Processing Unit Enhanced Parallel Document Flocking Clustering

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL; ST Charles, Jesse Lee [ORNL

    2010-01-01

    Analyzing and clustering documents is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of document clustering is its complexity O(n²). As the number of documents grows, it becomes increasingly difficult to generate results in a reasonable amount of time. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. In this paper, we have conducted research to exploit this architecture and apply its strengths to the flocking based document clustering problem. Using the CUDA platform from NVIDIA, we developed a document flocking implementation to be run on the NVIDIA GEFORCE GPU. Performance gains ranged from thirty-six to nearly sixty times improvement of the GPU over the CPU implementation.

  17. An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

    KAUST Repository

    Bonny, Talal

    2012-07-28

    Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in the development of bioinformatics. Sequence alignment algorithms must process large amounts of data, which may take a long time. Here, we introduce our Adaptive Hybrid Multiprocessor technique to accelerate the implementation of the Smith-Waterman algorithm. Our technique utilizes both the graphics processing unit (GPU) and the central processing unit (CPU). It adapts the implementation according to the number of CPUs given as input by efficiently distributing the workload between the processing units. Using existing resources (GPU and CPU) in an efficient way is a novel approach. The peak performance achieved for the platforms GPU + CPU, GPU + 2CPUs, and GPU + 3CPUs is 10.4 GCUPS, 13.7 GCUPS, and 18.6 GCUPS, respectively (for a query length of 511 amino acids). © 2010 IEEE.
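
    The abstract does not specify how the workload is divided; a plausible host-side skeleton of such an adaptive split is sketched below, assigning the GPU a share of the target database proportional to an assumed GPU-to-CPU throughput ratio and aligning both parts concurrently. The worker functions are stubs, and every name and ratio is a hypothetical assumption, not the authors' scheme.

```cuda
// Hypothetical host-side workload split for a hybrid GPU + n-CPU aligner.
#include <algorithm>
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

// Stubs: a real implementation would launch a CUDA Smith-Waterman kernel
// here (GPU) or run a serial alignment loop (CPU).
void alignOnGpu(std::vector<std::string> targets) {
    std::printf("GPU aligns %zu sequences\n", targets.size());
}
void alignOnCpu(std::vector<std::string> targets) {
    std::printf("CPU core aligns %zu sequences\n", targets.size());
}

void hybridAlign(const std::vector<std::string>& db, int numCpus,
                 double gpuSpeedFactor)  // assumed GPU throughput vs. one core
{
    // The GPU share grows with its relative speed and shrinks as more CPU
    // cores become available.
    double gpuShare = gpuSpeedFactor / (gpuSpeedFactor + numCpus);
    size_t split = (size_t)(gpuShare * db.size());

    std::thread gpuWorker(alignOnGpu,
        std::vector<std::string>(db.begin(), db.begin() + split));

    // Remaining targets are divided evenly among the CPU cores.
    std::vector<std::thread> cpuWorkers;
    size_t rest = db.size() - split;
    size_t chunk = (rest + numCpus - 1) / numCpus;
    for (int c = 0; c < numCpus; ++c) {
        size_t lo = split + std::min((size_t)c * chunk, rest);
        size_t hi = split + std::min((size_t)(c + 1) * chunk, rest);
        cpuWorkers.emplace_back(alignOnCpu,
            std::vector<std::string>(db.begin() + lo, db.begin() + hi));
    }
    gpuWorker.join();
    for (auto& w : cpuWorkers) w.join();
}
```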

  18. Pre-design safety analyses of cesium ion-exchange compact processing unit

    International Nuclear Information System (INIS)

    Richmond, W.G.; Ballinger, M.Y.

    1993-11-01

    This report describes an innovative radioactive waste pretreatment concept. This cost-effective, highly flexible processing approach is based on the use of Compact Processing Units (CPUs) to treat highly radioactive tank wastes in proximity to the tanks themselves. The units will be designed to treat tank wastes at rates of 8 to 20 liters per minute and have the capacity to remove cesium, and ultimately other radionuclides, from 4,000 cubic meters of waste per year. This new concept is being integrated into Hanford's tank farm management plans by a team of PNL and Westinghouse Hanford Company scientists and engineers. The first CPU to be designed and deployed will be used to remove cesium from Hanford double-shell tank (DST) supernatant waste. Separating Cs from the waste would be a major step toward lowering the radioactivity in the bulk of the waste, allowing it to be disposed of as a low-level solid waste form (e.g., grout), while concentrating the more highly radioactive material for processing as high-level solid waste.

  19. Optimized Laplacian image sharpening algorithm based on graphic processing unit

    Science.gov (United States)

    Ma, Tinghuai; Li, Lu; Ji, Sai; Wang, Xin; Tian, Yuan; Al-Dhelaan, Abdullah; Al-Rodhaan, Mznah

    2014-12-01

    In classical Laplacian image sharpening, all pixels are processed one by one, which leads to a large amount of computation. Traditional Laplacian sharpening on a CPU is considerably time-consuming, especially for large images. In this paper, we propose a parallel implementation of Laplacian sharpening based on the Compute Unified Device Architecture (CUDA), a computing platform for Graphics Processing Units (GPUs), and analyze the impact of image size on performance as well as the relationship between data transfer time and parallel computing time. Further, according to the different features of the different memory types, an improved scheme is developed which exploits shared memory on the GPU instead of global memory and further increases efficiency. Experimental results prove that the two novel algorithms outperform the traditional sequential method based on OpenCV in terms of computing speed.
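
    To illustrate the shared-memory scheme described above (the authors' code is not given in the record), the sketch below stages a 16×16 image tile plus a one-pixel halo in shared memory, so the four neighbor reads of the Laplacian stencil hit fast on-chip memory instead of global memory. Tile size and all names are assumptions.

```cuda
// Minimal CUDA Laplacian sharpening kernel using shared memory (sketch).
#include <cuda_runtime.h>

#define TILE 16

__global__ void laplacianSharpen(const unsigned char* in, unsigned char* out,
                                 int width, int height)
{
    __shared__ unsigned char tile[TILE + 2][TILE + 2];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    int tx = threadIdx.x + 1, ty = threadIdx.y + 1;

    // Clamped global load helper (replicates border pixels).
    auto clampLoad = [&](int gx, int gy) {
        gx = min(max(gx, 0), width - 1);
        gy = min(max(gy, 0), height - 1);
        return in[gy * width + gx];
    };

    // Each thread loads its center pixel; edge threads also load the halo.
    // Corner halo cells are not needed by the 4-neighbor stencil.
    tile[ty][tx] = clampLoad(x, y);
    if (threadIdx.x == 0)        tile[ty][0]        = clampLoad(x - 1, y);
    if (threadIdx.x == TILE - 1) tile[ty][TILE + 1] = clampLoad(x + 1, y);
    if (threadIdx.y == 0)        tile[0][tx]        = clampLoad(x, y - 1);
    if (threadIdx.y == TILE - 1) tile[TILE + 1][tx] = clampLoad(x, y + 1);
    __syncthreads();

    if (x >= width || y >= height) return;

    // Sharpened pixel = center + Laplacian (4-neighbor stencil), clamped.
    int lap = 4 * tile[ty][tx] - tile[ty - 1][tx] - tile[ty + 1][tx]
            - tile[ty][tx - 1] - tile[ty][tx + 1];
    int v = tile[ty][tx] + lap;
    out[y * width + x] = (unsigned char)min(max(v, 0), 255);
}
```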

  20. Comparing GPU and CPU in OLAP Cubes Creation

    Science.gov (United States)

    Kaczmarski, Krzysztof

    GPGPU (General Purpose Graphical Processing Unit) programming has recently been receiving more attention because of the enormous computational speed-up offered by this technology. GPGPU is applied in many branches of science and industry, not excluding databases, even if this is not the primary field of expected benefits.

  1. Architecture of security management unit for safe hosting of multiple agents

    Science.gov (United States)

    Gilmont, Tanguy; Legat, Jean-Didier; Quisquater, Jean-Jacques

    1999-04-01

    In such growing areas as remote applications in large public networks, electronic commerce, digital signatures, intellectual property and copyright protection, and even operating system extensibility, the hardware security level offered by existing processors is insufficient. They lack protection mechanisms that prevent the user from tampering with critical data owned by those applications. Some devices are exceptions, but have neither enough processing power nor enough memory to stand up to such applications (e.g., smart cards). This paper proposes a secure processor architecture in which the classical memory management unit is extended into a new security management unit. It allows ciphered code execution and ciphered data processing. An internal permanent memory can store cipher keys and critical data for several client agents simultaneously. The ordinary supervisor privilege scheme is replaced by a privilege inheritance mechanism that is better suited to operating system extensibility. The result is a secure processor that has hardware support for extensible multitask operating systems, and can be used for both general applications and critical applications needing strong protection. The security management unit and the internal permanent memory can be added to an existing CPU core without loss of performance, and do not require it to be modified.

  2. Flocking-based Document Clustering on the Graphics Processing Unit

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL; Patton, Robert M [ORNL; ST Charles, Jesse Lee [ORNL

    2008-01-01

    Abstract: Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. Each bird represents a single document and flies toward other documents that are similar to it. One limitation of this method of document clustering is its complexity O(n²). As the number of documents grows, it becomes increasingly difficult to receive results in a reasonable amount of time. However, flocking behavior, along with most naturally inspired algorithms such as ant colony optimization and particle swarm optimization, is highly parallel, and such algorithms have found increased performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this new platform. The cost of these high-performance devices is also marginal when compared with the price of cluster machines. In this paper, we have conducted research to exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefit the GPU brings to all naturally inspired algorithms. Using the CUDA platform from NVIDIA®, we developed a document flocking implementation to be run on the NVIDIA® GEFORCE 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to be run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3000 documents. The results of these tests were very significant. Performance gains ranged from three to nearly five times improvement of the GPU over the CPU implementation. This dramatic improvement in runtime makes the GPU a potentially revolutionary platform for document clustering algorithms.
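
    The ORNL implementation is not reproduced in the record; the toy kernel below only sketches one flocking step of the kind the abstract describes: each thread owns one document "bird" and steers toward the centroid of documents that are both nearby on the 2-D canvas and sufficiently similar. The O(n²) neighbor scan is exactly the cost the abstract refers to; the similarity matrix and all parameters are stand-in assumptions.

```cuda
// Toy document-flocking step (cohesion term only; a full boids model adds
// separation and alignment). Positions are double-buffered to avoid races.
#include <cuda_runtime.h>

__global__ void flockStep(const float2* pos, float2* posNew, float2* vel,
                          const float* sim,   // n x n document similarity
                          int n, float radius2, float simThresh, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float2 center = make_float2(0.0f, 0.0f);
    int neighbors = 0;
    for (int j = 0; j < n; ++j) {              // the O(n^2) neighbor scan
        if (j == i) continue;
        float dx = pos[j].x - pos[i].x, dy = pos[j].y - pos[i].y;
        if (dx * dx + dy * dy < radius2 && sim[i * n + j] > simThresh) {
            center.x += pos[j].x; center.y += pos[j].y;
            ++neighbors;
        }
    }
    if (neighbors > 0) {
        // Steer toward the centroid of similar neighbors.
        vel[i].x += (center.x / neighbors - pos[i].x) * dt;
        vel[i].y += (center.y / neighbors - pos[i].y) * dt;
    }
    posNew[i].x = pos[i].x + vel[i].x * dt;
    posNew[i].y = pos[i].y + vel[i].y * dt;
}
```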

  3. Hybrid GPU-CPU adaptive precision ray-triangle intersection tests for robust high-performance GPU dosimetry computations

    International Nuclear Information System (INIS)

    Perrotte, Lancelot; Bodin, Bruno; Chodorge, Laurent

    2011-01-01

    Before an intervention on a nuclear site, it is essential to study different scenarios to identify the least dangerous one for the operator. It is therefore mandatory to have an efficient dosimetry simulation code that produces accurate results. One classical method in radiation protection is the straight-line attenuation method with build-up factors. In the case of 3D industrial scenes composed of meshes, the computational cost lies in the fast computation of all of the intersections between the rays and the triangles of the scene. Efficient GPU algorithms have already been proposed that enable dosimetry calculation for a huge scene (800,000 rays, 800,000 triangles) in a fraction of a second. But these algorithms are not robust: because of the rounding caused by floating-point arithmetic, the numerical results of the ray-triangle intersection tests can differ from the expected mathematical results. In the worst case, this can lead to a computed dose rate dramatically lower than the real dose rate to which the operator is exposed. In this paper, we present a hybrid GPU-CPU algorithm to manage adaptive-precision floating-point arithmetic. This algorithm allows robust ray-triangle intersection tests, with very small loss of performance (less than 5% overhead), and without any need for scene-dependent tuning. (author)
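
    The paper's adaptive-precision predicates are not reproduced in the record; the sketch below illustrates the general hybrid pattern with a hypothetical Möller-Trumbore test that returns UNCERTAIN whenever a comparison falls inside an epsilon band, so that only the flagged ray-triangle pairs need an exact re-test on the CPU. The epsilon, data layout, and names are assumptions.

```cuda
// GPU-side filtered ray-triangle test; a kernel would call this per pair
// and append UNCERTAIN results to a list for exact CPU re-testing.
#include <cuda_runtime.h>

enum HitResult { MISS = 0, HIT = 1, UNCERTAIN = 2 };

__device__ float3 sub(float3 a, float3 b) { return make_float3(a.x-b.x, a.y-b.y, a.z-b.z); }
__device__ float3 cross3(float3 a, float3 b) {
    return make_float3(a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x);
}
__device__ float dot3(float3 a, float3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

__device__ int intersectRayTri(float3 orig, float3 dir,
                               float3 v0, float3 v1, float3 v2, float eps)
{
    float3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    float3 p = cross3(dir, e2);
    float det = dot3(e1, p);
    if (fabsf(det) < eps) return UNCERTAIN;       // near-parallel: let CPU decide

    float inv = 1.0f / det;
    float3 t = sub(orig, v0);
    float u = dot3(t, p) * inv;
    float3 q = cross3(t, e1);
    float v = dot3(dir, q) * inv;

    // Clearly outside: safe MISS. Near an edge: not trusted.
    if (u < -eps || u > 1.0f + eps || v < -eps || u + v > 1.0f + eps)
        return MISS;
    if (u < eps || v < eps || u + v > 1.0f - eps)
        return UNCERTAIN;                          // grazing a triangle edge
    return (dot3(e2, q) * inv > eps) ? HIT : UNCERTAIN;
}
```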

  4. First Update of the Criteria for Certification of Chest Pain Units in Germany: Facelift or New Model?

    Science.gov (United States)

    Breuckmann, Frank; Rassaf, Tienush

    2016-03-01

    In an effort to provide a systematic and specific standard of care for patients with acute chest pain, the German Cardiac Society introduced criteria for certification of specialized chest pain units (CPUs) in 2008, which have been replaced by a recent update published in 2015. We reviewed the development of CPU establishment in Germany during the past 7 years and compared and commented on the current update of the certification criteria. As of October 2015, 228 CPUs in Germany have been successfully certified by the German Cardiac Society; 300 CPUs are needed for full coverage, closing gaps in rural regions. The current changes to the criteria mainly affect guideline-adherent adaptations of diagnostic work-ups, therapeutic strategies, risk stratification, in-hospital timing and education, and quality measures, whereas the overall structure remained unchanged. Benchmarking by participation in the German CPU registry is encouraged. Even though the history is short, the concept of certified CPUs in Germany is accepted and successful, as underlined by its recent implementation in national and international guidelines. First registry data demonstrated a high standard of quality of care. The current update provides rational adaptations to new guidelines and developments without raising the bar for successful certification. A periodic release of fast-track updates with shorter time frames and an increase of minimum requirements should be considered.

  5. Development of efficient GPU parallelization of WRF Yonsei University planetary boundary layer scheme

    Directory of Open Access Journals (Sweden)

    M. Huang

    2015-09-01

    Full Text Available The planetary boundary layer (PBL) is the lowest part of the atmosphere, and its character is directly affected by contact with the underlying planetary surface. The PBL is responsible for vertical sub-grid-scale fluxes due to eddy transport in the whole atmospheric column. It determines the flux profiles within the well-mixed boundary layer and the more stable layer above. It thus provides an evolutionary model of atmospheric temperature, moisture (including clouds), and horizontal momentum in the entire atmospheric column. For such purposes, several PBL models have been proposed and employed in the weather research and forecasting (WRF) model, of which the Yonsei University (YSU) scheme is one. To expedite weather research and prediction, we have put tremendous effort into developing an accelerated implementation of the entire WRF model using graphics processing unit (GPU) massively parallel computing architecture whilst maintaining its accuracy as compared to its central processing unit (CPU) based implementation. This paper presents our efficient GPU-based design of the WRF YSU PBL scheme. Using one NVIDIA Tesla K40 GPU, the GPU-based YSU PBL scheme achieves a speedup of 193× with respect to its CPU counterpart running on one CPU core, whereas the speedup for one CPU socket (4 cores) with respect to 1 CPU core is only 3.5×. We can even boost the speedup to 360× with respect to 1 CPU core when two K40 GPUs are applied.

  6. Remote Working Level Monitor. Final report

    International Nuclear Information System (INIS)

    1977-01-01

    The Remote Working Level Monitor (RWLM) is an instrument used to remotely monitor Rn-daughter concentrations and the Working Level (WL). It is an AC-powered, microprocessor-based instrument which multiplexes two independent detector units to a single central processing unit (CPU). The CPU controls the actuation of the detector units and processes and outputs the data received from these remote detector units. The remote detector units are fully automated and require no manual operation once they are set up. They detect and separate the alpha emitters RaA and RaC' as well as detecting the beta emitters RaB and RaC. The resultant pulses from these detected radioisotopes are transmitted to the CPU for processing. The programmed microprocessor performs the mathematical manipulations necessary to output accurate Rn-daughter concentrations and the WL. A special subroutine within the program enables the RWLM to run and output a calibration procedure on command. The data resulting from this request can then be processed in a separate program on most computers capable of BASIC programming. The calibration program derives the calibrated coefficients and beta efficiencies.

  7. High-performance computing on GPUs for resistivity logging of oil and gas wells

    Science.gov (United States)

    Glinskikh, V.; Dudaev, A.; Nechaev, O.; Surodina, I.

    2017-10-01

    We developed and implemented in software an algorithm for high-performance simulation of electrical logs from oil and gas wells using high-performance heterogeneous computing. The numerical solution of the 2D forward problem is based on the finite-element method and the Cholesky decomposition for solving a system of linear algebraic equations (SLAE). Software implementations of the algorithm were made using NVIDIA CUDA technology and computing libraries, allowing us to perform the decomposition of the SLAE and find its solution on the central processing unit (CPU) and the graphics processing unit (GPU). The calculation time is analyzed as a function of the matrix size and the number of its non-zero elements. We estimated the computing speed on CPU and GPU, including high-performance heterogeneous CPU-GPU computing. Using the developed algorithm, we simulated resistivity data in realistic models.
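
    The record does not name the libraries used; as one hedged illustration of a GPU Cholesky solve, the sketch below calls cuSOLVER's dense routines on a symmetric positive definite system. Note the paper factors a sparse finite-element SLAE, for which a sparse solver would be used in practice; the dense form here is purely for illustration, and error checking is omitted for brevity.

```cuda
// Solve A x = b on the GPU for symmetric positive definite A
// (device pointers, column-major); A is overwritten by its Cholesky factor.
#include <cuda_runtime.h>
#include <cusolverDn.h>

void choleskySolve(double* dA, double* dB, int n)
{
    cusolverDnHandle_t handle;
    cusolverDnCreate(&handle);

    int lwork = 0, *dInfo;
    double* dWork;
    cusolverDnDpotrf_bufferSize(handle, CUBLAS_FILL_MODE_LOWER, n, dA, n, &lwork);
    cudaMalloc(&dWork, sizeof(double) * lwork);
    cudaMalloc(&dInfo, sizeof(int));

    // Factor A = L * L^T, then solve the two triangular systems.
    cusolverDnDpotrf(handle, CUBLAS_FILL_MODE_LOWER, n, dA, n, dWork, lwork, dInfo);
    cusolverDnDpotrs(handle, CUBLAS_FILL_MODE_LOWER, n, 1, dA, n, dB, n, dInfo);

    cudaFree(dWork);
    cudaFree(dInfo);
    cusolverDnDestroy(handle);
}
```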

  8. SpaceCubeX: A Framework for Evaluating Hybrid Multi-Core CPU FPGA DSP Architectures

    Science.gov (United States)

    Schmidt, Andrew G.; Weisz, Gabriel; French, Matthew; Flatley, Thomas; Villalpando, Carlos Y.

    2017-01-01

    The SpaceCubeX project is motivated by the need for high-performance, modular, and scalable on-board processing to help scientists answer critical 21st-century questions about global climate change, air quality, ocean health, and ecosystem dynamics, while adding new capabilities such as low-latency data products for extreme event warnings. These goals translate into on-board processing throughput requirements on the order of 100-1,000 times those of previous Earth Science missions for standard processing, compression, storage, and downlink operations. To study possible future architectures that achieve these performance requirements, the SpaceCubeX project provides an evolvable testbed and framework that enables a focused design space exploration of candidate hybrid CPU/FPGA/DSP processing architectures. The framework includes ArchGen, an architecture generator tool populated with candidate architecture components, performance models, and IP cores, that allows an end user to specify the type, number, and connectivity of a hybrid architecture. The framework requires minimal extensions to integrate new processors, such as the anticipated High Performance Spaceflight Computer (HPSC), reducing the time to initiate benchmarking by months. To evaluate the framework, we leverage a wide suite of high-performance embedded computing benchmarks and Earth science scenarios to ensure robust architecture characterization. We report on our project's Year 1 efforts and demonstrate the capabilities across four simulation testbed models: a baseline SpaceCube 2.0 system, a dual ARM A9 processor system, a hybrid quad ARM A53 and FPGA system, and a hybrid quad ARM A53 and DSP system.

  9. Redundancy scheme for multi-layered accelerator control system

    International Nuclear Information System (INIS)

    Chauhan, Amit; Fatnani, Pravin

    2009-01-01

    The control system for SRS Indus-2 has a three-layered architecture. There are VMEbus-based stations at the lower two layers, each controlled by its own CPU board. The 'Profibus' fieldbus standard is used for communication between these VME stations distributed in the field. There is a Profibus controller board at each station to implement the communication protocol. The mode of communication is master-slave (command-response) type. This paper proposes a scheme to implement redundancy at the lower two layers, namely Layer-2 (Supervisory Layer / Profibus-master) and Layer-3 (Equipment Unit Interface Layer / Profibus-slave). The redundancy covers both the CPU board and the communication board. The scheme uses two CPU boards and two Profi controller boards at each L-3 station. This helps in decreasing any downtime resulting from faults in either the CPUs or the communication boards placed in the field area. Redundancy of Profi boards provides two active communication channels between the stations that can be used in different ways, thereby increasing the availability of a communication link. Redundancy of CPU boards provides a certain level of automatic fault recovery: one CPU remains active and the other remains in standby mode, taking over control of the VMEbus in case of any fault in the main CPU. (author)

  10. A μp based automation system for Raman and Rayleigh spectrometers

    International Nuclear Information System (INIS)

    Kesavamoorthy, R.; Arora, A.K.; Vasumathi, D.

    1988-01-01

    A μp-based data acquisition cum automation system for Raman and Rayleigh spectrometers is described. The experiments require simultaneous acquisition of different digital data in two separate counters, their storage, and rotation of the grating through a stepper motor in a repetitive cycle. Various modes of operation are selected through a function keyboard. The current status of the experiment is also displayed using a 7-segment, 12-element display unit. The input parameters are fed through a hexadecimal keyboard before the start of the experiment. The stored data can be sent to a printer/terminal or to a PC through a serial port after the completion of the experiment. (author)

  11. You can't touch this: touch-free navigation through radiological images.

    Science.gov (United States)

    Ebert, Lars C; Hatch, Gary; Ampanozi, Garyfalia; Thali, Michael J; Ross, Steffen

    2012-09-01

    Keyboards, mice, and touch screens are a potential source of infection or contamination in operating rooms, intensive care units, and autopsy suites. The authors present a low-cost prototype of a system which allows for touch-free control of a medical image viewer. This touch-free navigation system consists of a computer system (iMac, OS X 10.6, Apple, USA) with a medical image viewer (OsiriX, OsiriX Foundation, Switzerland) and a depth camera (Kinect, Microsoft, USA). They implemented software that translates the data delivered by the camera, together with voice recognition software, into keyboard and mouse commands, which are then passed to OsiriX. In this feasibility study, the authors introduced 10 medical professionals to the system and asked them to re-create 12 images from a CT data set. They evaluated response times and usability of the system compared with standard mouse/keyboard control. Users felt comfortable with the system after approximately 10 minutes. Response time was 120 ms. Users required 1.4 times more time to re-create an image with gesture control. Users with OsiriX experience were significantly faster using the mouse/keyboard and faster than users without prior experience. They rated the system 3.4 out of 5 for ease of use in comparison to the mouse/keyboard. The touch-free, gesture-controlled system performs favorably and removes a potential vector for infection, protecting both patients and staff. Because the camera can be quickly and easily integrated into existing systems, requires no calibration, and is low cost, the barriers to using this technology are low.

  12. Handwriting or Typewriting? The Influence of Pen- or Keyboard-Based Writing Training on Reading and Writing Performance in Preschool Children.

    Science.gov (United States)

    Kiefer, Markus; Schuler, Stefanie; Mayer, Carmen; Trumpp, Natalie M; Hille, Katrin; Sachse, Steffi

    2015-01-01

    Digital writing devices associated with the use of computers, tablet PCs, or mobile phones are increasingly replacing writing by hand. How writing modes influence reading and writing performance in children at the start of literacy is, however, a matter of controversy. On the one hand, the ease of typing on digital devices may accelerate reading and writing in young children, who have less developed sensory-motor skills. On the other hand, the meaningful coupling between action and perception during handwriting, which establishes sensory-motor memory traces, could facilitate written language acquisition. In order to decide between these theoretical alternatives, for the present study we developed an intense training program for preschool children attending German kindergarten, with 16 training sessions. Using closely matched letter learning games, eight letters of the German alphabet were trained either by handwriting with a pen on a sheet of paper or by typing on a computer keyboard. Letter recognition, naming, and writing performance as well as word reading and writing performance were assessed. Results did not indicate a superiority of typing training over handwriting training in any of these tasks. In contrast, handwriting training was superior to typing training in word writing and, as a tendency, in word reading. The results of our study therefore support theories of action-perception coupling assuming a facilitatory influence of sensory-motor representations established during handwriting on reading and writing.

  13. A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations.

    Directory of Open Access Journals (Sweden)

    ThienLuan Ho

    Full Text Available Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPU warp share data using warp-shuffle operations instead of accessing the shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experimental results for real DNA packages revealed that the proposed algorithm and its implementation achieved speedups of up to 122.64 and 1.53 times over the sequential algorithm on a CPU and a previous parallel approximate string matching algorithm on GPUs, respectively.
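
    The paper's kernels are not reproduced in the record; the toy kernel below merely illustrates the warp-shuffle mechanism it relies on: each thread obtains its left neighbor's register value with __shfl_up_sync, the dependency pattern that arises in edit-distance style recurrences, without touching shared memory. The recurrence shown is deliberately simplified.

```cuda
// Warp-shuffle neighbor exchange demo (not the paper's kernel).
#include <cuda_runtime.h>

__global__ void shuffleNeighborDemo(const int* prevRow, int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int mine = (i < n) ? prevRow[i] : 0;

    // Value held by the thread one lane to the left; no shared memory needed.
    int left = __shfl_up_sync(0xFFFFFFFFu, mine, 1);
    if ((threadIdx.x & 31) == 0) left = 0;       // lane 0 has no left neighbor

    if (i < n) out[i] = min(mine + 1, left + 1); // toy recurrence step
}
```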

  14. Fast Parallel Image Registration on CPU and GPU for Diagnostic Classification of Alzheimer's Disease

    Directory of Open Access Journals (Sweden)

    Denis P Shamonin

    2014-01-01

    Full Text Available Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e., for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: (i) parallelization on the CPU, to speed up the cost function derivative calculation; (ii) parallelization on the GPU building on and extending the OpenCL framework from ITKv4, to speed up the Gaussian pyramid computation and the image resampling step; (iii) exploitation of certain properties of the B-spline transformation model; (iv) further software optimizations. The accelerated registration tool is employed in a study on diagnostic classification of Alzheimer's disease and cognitively normal controls based on T1-weighted MRI. We selected 299 participants from the publicly available Alzheimer's Disease Neuroimaging Initiative database. Classification is performed with a support vector machine based on gray matter volumes as a marker for atrophy. We evaluated two types of strategies (voxel-wise and region-wise) that heavily rely on nonrigid image registration. Parallelization and optimization resulted in an acceleration factor of 4-5x on an 8-core machine. Using OpenCL a speedup factor of ~2 was realized for computation of the Gaussian pyramids, and 15-60 for the resampling step, for larger images. The voxel-wise and the region-wise classification methods had an area under the receiver operator characteristic curve of 88% and 90%, respectively, both for standard and accelerated registration. We conclude that the image registration package elastix was substantially accelerated, with nearly identical results to the non-optimized version. The new functionality will become available in the next release of elastix as open source under the BSD license.

  15. FLOCKING-BASED DOCUMENT CLUSTERING ON THE GRAPHICS PROCESSING UNIT [Book Chapter

    Energy Technology Data Exchange (ETDEWEB)

    Charles, J S; Patton, R M; Potok, T E; Cui, X

    2008-01-01

    Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. Each bird represents a single document and flies toward other documents that are similar to it. One limitation of this method of document clustering is its complexity O(n²). As the number of documents grows, it becomes increasingly difficult to receive results in a reasonable amount of time. However, flocking behavior, along with most naturally inspired algorithms such as ant colony optimization and particle swarm optimization, are highly parallel and have experienced improved performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this new platform. The cost of these high-performance devices is also marginal when compared with the price of cluster machines. In this paper, we have conducted research to exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefit the GPU brings to all naturally inspired algorithms. Using the CUDA platform from NVIDIA®, we developed a document flocking implementation to be run on the NVIDIA® GEFORCE 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to be run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3,000 documents. The results of these tests were very significant. Performance gains ranged from three to nearly five times improvement of the GPU over the CPU implementation. This dramatic improvement in runtime makes the GPU a potentially revolutionary platform for document clustering algorithms.

  16. Application of GPU to computational multiphase fluid dynamics

    International Nuclear Information System (INIS)

    Nagatake, T; Kunugi, T

    2010-01-01

    The MARS (Multi-interfaces Advection and Reconstruction Solver) [1] is one of the surface volume tracking methods for multi-phase flows. Nowadays, the performance of the GPU (Graphics Processing Unit) is much higher than that of the CPU (Central Processing Unit). In this study, the GPU was applied to the MARS in order to accelerate the computation of multi-phase flows (GPU-MARS), and the performance of GPU-MARS is discussed. From the performance of the interface tracking method on a one-directional advection problem, it is found that the computing time on the GPU (single GTX280) was around 4 times faster than that on the CPU (Xeon 5040, 4 threads parallelized). For the Poisson solver using the algorithm developed in this study, the GPU was around 30 times faster than the CPU. Finally, it is confirmed that the GPU provides a large acceleration of the fluid flow computation (GPU-MARS) compared to the CPU. However, it is also found that the computation on the GPU must be performed in double precision to maintain very high accuracy.

  17. Real-time computation of parameter fitting and image reconstruction using graphical processing units

    Science.gov (United States)

    Locans, Uldis; Adelmann, Andreas; Suter, Andreas; Fischer, Jannis; Lustermann, Werner; Dissertori, Günther; Wang, Qiulin

    2017-06-01

    In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high-performance computing to a wider range of users. However, programming these devices and integrating their use into existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at the Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of μSR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission Tomography) image reconstruction and analysis. Applications currently in use were examined to identify the parts of the algorithms in need of optimization. Efficient GPU kernels were created in order to allow the applications to use a GPU and to speed up the previously identified parts. Benchmarking tests were performed in order to measure the achieved speedup. During this work, we focused on single-GPU systems to show that real-time data analysis of these problems can be achieved without the need for large computing clusters. The results show that the currently used application for parameter fitting, which uses OpenMP to parallelize calculations over multiple CPU cores, can be accelerated around 40 times through the use of a GPU. The speedup may vary depending on the size and complexity of the problem. For PET image analysis, the obtained speedups of the GPU version were more than 40× compared to a single-core CPU implementation. The achieved results show that it is possible to improve the execution time by orders of magnitude.

  18. Introduction of Parallel GPGPU Acceleration Algorithms for the Solution of Radiative Transfer

    Science.gov (United States)

    Godoy, William F.; Liu, Xu

    2011-01-01

    General-purpose computing on graphics processing units (GPGPU) is a recent technique that allows the parallel graphics processing unit (GPU) to accelerate calculations performed sequentially by the central processing unit (CPU). To introduce GPGPU to radiative transfer, the Gauss-Seidel solution of the well-known expressions for 1-D and 3-D homogeneous, isotropic media is selected as a test case. Different algorithms are introduced to balance memory use and GPU-CPU communication, critical aspects of GPGPU. Results show that speed-ups of one to two orders of magnitude are obtained when compared to sequential solutions. The underlying value of GPGPU is its potential extension to radiative solvers (e.g., Monte Carlo, discrete ordinates) with a minimal learning curve.
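
    The abstract does not describe the kernel design; a common way to expose parallelism in a Gauss-Seidel sweep on a GPU is red-black (odd-even) ordering, sketched below for a generic 1-D model problem: points of one parity depend only on points of the other parity, so each half-sweep is fully parallel. The stencil here is an illustrative placeholder, not the radiative transfer expressions.

```cuda
// Red-black Gauss-Seidel half-sweeps on a 1-D model problem (sketch).
#include <cuda_runtime.h>

__global__ void gaussSeidelColor(float* u, const float* rhs, int n, int color)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Update only points of one parity; their neighbors all have the other
    // parity, so every update in this launch is independent.
    if (i > 0 && i < n - 1 && (i & 1) == color)
        u[i] = 0.5f * (u[i - 1] + u[i + 1] + rhs[i]);
}

void gaussSeidelSweep(float* u, const float* rhs, int n)
{
    int threads = 256, blocks = (n + threads - 1) / threads;
    gaussSeidelColor<<<blocks, threads>>>(u, rhs, n, 0);  // "red" points
    gaussSeidelColor<<<blocks, threads>>>(u, rhs, n, 1);  // "black" points
}
```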

  19. Human factors engineering of interfaces for speech and text in the office

    NARCIS (Netherlands)

    Nes, van F.L.

    1986-01-01

    Current data-processing equipment almost exclusively uses one input medium: the keyboard, and one output medium: the visual display unit. An alternative to typing would be welcome in view of the effort needed to become proficient in typing; speech may provide this alternative if a proper spee

  20. Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease.

    Science.gov (United States)

    Shamonin, Denis P; Bron, Esther E; Lelieveldt, Boudewijn P F; Smits, Marion; Klein, Stefan; Staring, Marius

    2013-01-01

    Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e., for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: (i) parallelization on the CPU, to speed up the cost function derivative calculation; (ii) parallelization on the GPU building on and extending the OpenCL framework from ITKv4, to speed up the Gaussian pyramid computation and the image resampling step; (iii) exploitation of certain properties of the B-spline transformation model; (iv) further software optimizations. The accelerated registration tool is employed in a study on diagnostic classification of Alzheimer's disease and cognitively normal controls based on T1-weighted MRI. We selected 299 participants from the publicly available Alzheimer's Disease Neuroimaging Initiative database. Classification is performed with a support vector machine based on gray matter volumes as a marker for atrophy. We evaluated two types of strategies (voxel-wise and region-wise) that heavily rely on nonrigid image registration. Parallelization and optimization resulted in an acceleration factor of 4-5x on an 8-core machine. Using OpenCL a speedup factor of 2 was realized for computation of the Gaussian pyramids, and 15-60 for the resampling step, for larger images. The voxel-wise and the region-wise classification methods had an area under the receiver operator characteristic curve of 88 and 90%, respectively, both for standard and accelerated registration. We conclude that the image registration package elastix was substantially accelerated, with nearly identical results to the non-optimized version. The new functionality will become available in the next release of elastix as open source under the BSD license.

  1. Impact of memory bottleneck on the performance of graphics processing units

    Science.gov (United States)

    Son, Dong Oh; Choi, Hong Jun; Kim, Jong Myon; Kim, Cheol Hong

    2015-12-01

    Recent graphics processing units (GPUs) can process general-purpose applications as well as graphics applications with the help of various user-friendly application programming interfaces (APIs) supported by GPU vendors. Unfortunately, utilizing the hardware resources of the GPU efficiently is a challenging problem, since the GPU architecture is totally different from the traditional CPU architecture. To solve this problem, many studies have focused on techniques for improving system performance using GPUs. In this work, we analyze GPU performance while varying GPU parameters such as the number of cores and the clock frequency. According to our simulations, GPU performance can be improved by 125.8% and 16.2% on average as the number of cores and the clock frequency increase, respectively. However, performance saturates when memory bottlenecks arise due to huge data requests to the memory. The performance of GPUs can be improved further as the memory bottleneck is reduced by changing GPU parameters dynamically.

  2. General purpose graphic processing unit implementation of adaptive pulse compression algorithms

    Science.gov (United States)

    Cai, Jingxiao; Zhang, Yan

    2017-07-01

    This study introduces a practical approach to implementing real-time signal processing algorithms for general surveillance radar based on NVIDIA graphical processing units (GPUs). The pulse compression algorithms are implemented using Compute Unified Device Architecture (CUDA) libraries such as the CUDA basic linear algebra subroutines and the CUDA fast Fourier transform library, which are adopted from open source libraries and optimized for NVIDIA GPUs. For more advanced, adaptive processing algorithms such as adaptive pulse compression, customized kernel optimization is needed and investigated. A statistical optimization approach is developed for this purpose without needing much knowledge of the physical configurations of the kernels. It was found that the kernel optimization approach can significantly improve performance. Benchmark performance is compared with the CPU performance in terms of processing acceleration. The proposed implementation framework can be used in various radar systems, including ground-based phased array radar, airborne sense-and-avoid radar, and aerospace surveillance radar.
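
    As a hedged illustration of the frequency-domain pulse compression step mentioned above (the paper's customized kernels are not reproduced here), the sketch below applies a matched filter with cuFFT: forward FFT of the received samples, multiplication by the precomputed conjugate spectrum of the transmit waveform, and an inverse FFT. Buffer names are assumptions.

```cuda
// Frequency-domain pulse compression (matched filter) with cuFFT (sketch).
#include <cuda_runtime.h>
#include <cufft.h>

__global__ void multiplyConjugate(cufftComplex* x, const cufftComplex* ref, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // x[i] *= conj(ref[i]), with 1/n folded in to normalize the IFFT.
        float a = x[i].x, b = x[i].y;
        float c = ref[i].x, d = -ref[i].y;
        x[i].x = (a * c - b * d) / n;
        x[i].y = (a * d + b * c) / n;
    }
}

void pulseCompress(cufftComplex* dSignal, const cufftComplex* dRefSpectrum, int n)
{
    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);

    cufftExecC2C(plan, dSignal, dSignal, CUFFT_FORWARD);
    int threads = 256, blocks = (n + threads - 1) / threads;
    multiplyConjugate<<<blocks, threads>>>(dSignal, dRefSpectrum, n);
    cufftExecC2C(plan, dSignal, dSignal, CUFFT_INVERSE);

    cufftDestroy(plan);
}
```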

  3. Handwriting or Typewriting? The Influence of Pen- or Keyboard-Based Writing Training on Reading and Writing Performance in Preschool Children

    Science.gov (United States)

    Kiefer, Markus; Schuler, Stefanie; Mayer, Carmen; Trumpp, Natalie M.; Hille, Katrin; Sachse, Steffi

    2015-01-01

    Digital writing devices associated with the use of computers, tablet PCs, or mobile phones are increasingly replacing writing by hand. How writing modes influence reading and writing performance in children at the start of literacy is, however, a matter of controversy. On the one hand, the ease of typing on digital devices may accelerate reading and writing in young children, who have less developed sensory-motor skills. On the other hand, the meaningful coupling between action and perception during handwriting, which establishes sensory-motor memory traces, could facilitate written language acquisition. In order to decide between these theoretical alternatives, for the present study we developed an intense training program for preschool children attending German kindergarten, with 16 training sessions. Using closely matched letter learning games, eight letters of the German alphabet were trained either by handwriting with a pen on a sheet of paper or by typing on a computer keyboard. Letter recognition, naming, and writing performance as well as word reading and writing performance were assessed. Results did not indicate a superiority of typing training over handwriting training in any of these tasks. In contrast, handwriting training was superior to typing training in word writing and, as a tendency, in word reading. The results of our study therefore support theories of action-perception coupling assuming a facilitatory influence of sensory-motor representations established during handwriting on reading and writing. PMID:26770286

  4. Using the Computer in Special Vocational Programs. Inservice Activities.

    Science.gov (United States)

    Lane, Kenneth; Ward, Raymond

    This inservice manual is intended to assist vocational education teachers in using the techniques of computer-assisted instruction in special vocational education programs. Addressed in the individual units are the following topics: the basic principles of computer-assisted instruction (TRS-80 computers and typing on a computer keyboard); money…

  5. A hybrid CPU-GPU accelerated framework for fast mapping of high-resolution human brain connectome.

    Directory of Open Access Journals (Sweden)

    Yu Wang

    Full Text Available Recently, a combination of non-invasive neuroimaging techniques and graph theoretical approaches has provided a unique opportunity for understanding the patterns of the structural and functional connectivity of the human brain (referred to as the human brain connectome). A very large amount of brain imaging data has now been collected, and high-resolution connectome research places very high demands on computational capability. In this paper, we propose a hybrid CPU-GPU framework to accelerate the computation of the human brain connectome. We applied this framework to a publicly available resting-state functional MRI dataset from 197 participants. For each subject, we first computed Pearson's correlation coefficient between all pairs of the time series of gray-matter voxels, and then we constructed unweighted undirected brain networks with 58k nodes and a sparsity range from 0.02% to 0.17%. Next, graph properties of the functional brain networks were quantified, analyzed, and compared with those of 15 corresponding random networks. With our proposed accelerating framework, the above process for each network took 80-150 minutes, depending on the network sparsity. Further analyses revealed that high-resolution functional brain networks have efficient small-world properties, significant modular structure, a power-law degree distribution, and highly connected nodes in the medial frontal and parietal cortical regions. These results are largely compatible with previous human brain network studies. Taken together, our proposed framework can substantially enhance the applicability and efficacy of high-resolution (voxel-based) brain network analysis, and has the potential to accelerate the mapping of the human brain connectome in normal and disease states.
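
    The authors' framework is not reproduced in the record; the kernel below only illustrates the correlation step: if each voxel time series is pre-standardized to zero mean and unit variance, Pearson's r between two voxels reduces to a dot product over time divided by the series length. With roughly 58k voxels there are over 1.6 billion pairs, which is why the pair list would in practice be tiled between the GPU and CPU. All names are assumptions.

```cuda
// One thread computes Pearson's r for one voxel pair (standardized series).
#include <cuda_runtime.h>

__global__ void pearsonPairs(const float* ts,    // [nVoxels x T], standardized
                             const int2* pairs,  // voxel index pairs to score
                             float* r, int nPairs, int T)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= nPairs) return;

    const float* a = ts + (size_t)pairs[p].x * T;
    const float* b = ts + (size_t)pairs[p].y * T;
    float acc = 0.0f;
    for (int t = 0; t < T; ++t)
        acc += a[t] * b[t];       // dot product of the two time series
    r[p] = acc / T;               // Pearson's r for standardized series
}
```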

  6. The stringing of Italian keyboard instruments c.1500- c.1650. Part One: Discussion and bibliography

    Science.gov (United States)

    Wraight, Ralph Denzil

    1997-12-01

    The problem of deciding which stringing materials were used on Italian string keyboard instruments is approached in two ways, by examination of documentary evidence and through the evidence of the instruments. Information on 748 instruments is presented in a catalogue which examines and describes the original condition. 89 new attributions of instruments are presented. From this comprehensive pool of information on the compasses and string lengths of the instruments made, it is argued that previous authors worked with too little information to enable accurate conclusions to be drawn. Documentary evidence alone is held to be inconclusive in showing which string material was used for particular instruments at specific periods, and not as useful as argued by some previous authors. The scaling design of instruments is considered and conclusions are advanced that most virginals were designed to be strung with iron wire. It is also argued that most 16th-century harpsichords were intended to be strung with iron wire. A change to brass-scaled designs took place from about 1600-1650, and this also coincided with a loss of popularity of designs employing a 4' stop. The orientation of the first and last notes of Italian compasses on c and f notes is argued to be a consequence of music theory and not a sure indication of pitch level. There were some 16th-century harpsichords made for a pitch a fourth lower than other 8' instruments, but it is argued that there were not two groups a fourth apart in pitch. The string lengths used in the cities of Venice, Florence, Rome, Milan, and Naples are listed and show that a normal 8' range covering a whole tone was in use in all areas at various times. The evidence of the string lengths suggests that instrument makers organised the pitches of instruments into 1/3 tone steps, a scheme which may be related to the apparent use of a 1/3 comma meantone tuning system in an organ of 1494 and clavichord of 1543, before it was described in print.

  7. MGUPGMA: A Fast UPGMA Algorithm With Multiple Graphics Processing Units Using NCCL

    Directory of Open Access Journals (Sweden)

    Guan-Jie Hua

    2017-10-01

    Full Text Available A phylogenetic tree is a visual diagram of the relationships between a set of biological species. Scientists use it to analyze many characteristics of the species. The distance-matrix methods, such as the Unweighted Pair Group Method with Arithmetic Mean and Neighbor Joining, construct a phylogenetic tree by calculating pairwise genetic distances between taxa. These methods suffer from a computational performance issue. Although several new methods based on high-performance hardware and frameworks have been proposed, the issue still exists. In this work, a novel parallel Unweighted Pair Group Method with Arithmetic Mean approach on multiple Graphics Processing Units is proposed to construct a phylogenetic tree from an extremely large set of sequences. The experimental results show that the proposed approach on a DGX-1 server with 8 NVIDIA P100 graphics cards achieves approximately 3-fold to 7-fold speedup over the implementation of the Unweighted Pair Group Method with Arithmetic Mean on a modern CPU and a single GPU, respectively.

  8. MGUPGMA: A Fast UPGMA Algorithm With Multiple Graphics Processing Units Using NCCL.

    Science.gov (United States)

    Hua, Guan-Jie; Hung, Che-Lun; Lin, Chun-Yuan; Wu, Fu-Che; Chan, Yu-Wei; Tang, Chuan Yi

    2017-01-01

    A phylogenetic tree is a visual diagram of the relationships between a set of biological species. Scientists use it to analyze many characteristics of the species. The distance-matrix methods, such as the Unweighted Pair Group Method with Arithmetic Mean and Neighbor Joining, construct a phylogenetic tree by calculating pairwise genetic distances between taxa. These methods suffer from a computational performance issue. Although several new methods based on high-performance hardware and frameworks have been proposed, the issue still exists. In this work, a novel parallel Unweighted Pair Group Method with Arithmetic Mean approach on multiple Graphics Processing Units is proposed to construct a phylogenetic tree from an extremely large set of sequences. The experimental results show that the proposed approach on a DGX-1 server with 8 NVIDIA P100 graphics cards achieves approximately 3-fold to 7-fold speedup over the implementation of the Unweighted Pair Group Method with Arithmetic Mean on a modern CPU and a single GPU, respectively.

  9. Evaluation of Pb and Cu contents of selected component parts of ...

    African Journals Online (AJOL)

    Thirty-five (35) units of waste computer central processing units (CPUs) and 24 units of waste computer monitors of different brands, manufacturers, years of manufacture, and models were collected from different electronic repairers' shops in Ibadan, South-western Nigeria and investigated for their lead and copper contents.

  10. Adaptive real-time methodology for optimizing energy-efficient computing

    Science.gov (United States)

    Hsu, Chung-Hsing [Los Alamos, NM; Feng, Wu-Chun [Blacksburg, VA

    2011-06-28

    Dynamic voltage and frequency scaling (DVFS) is an effective way to reduce energy and power consumption in microprocessor units. Current implementations of DVFS suffer from inaccurate modeling of power requirements and usage, and from inaccurate characterization of the relationships between the applicable variables. A system and method are proposed that adjust CPU frequency and voltage based on run-time calculations of the workload processing time, as well as a calculation of performance sensitivity with respect to CPU frequency. The system and method are processor independent, and can be applied either to an entire system as a unit or individually to each process running on the system.
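
    The patent's exact model is not given in the record; the host-side sketch below illustrates one common way such a governor can be framed, under the assumption that execution time splits into a frequency-dependent CPU-bound share and a frequency-independent memory-bound share, t(f) = beta·t·(f0/f) + (1-beta)·t. The governor then picks the lowest available frequency whose predicted slowdown stays within a user bound. The model, names, and numbers are all illustrative assumptions.

```cuda
// Hypothetical sensitivity-based frequency selection (host-side sketch).
#include <cstdio>

double pickFrequency(double fCurrent,
                     double beta,          // measured CPU-bound share in [0,1]
                     double maxSlowdown,   // e.g. 0.05 allows a 5% slowdown
                     const double* freqs, int nFreqs)
{
    double best = fCurrent;
    for (int i = 0; i < nFreqs; ++i) {
        // Predicted slowdown at candidate frequency f relative to fCurrent:
        // t(f)/t(fCurrent) = beta * fCurrent / f + (1 - beta)
        double scale = beta * fCurrent / freqs[i] + (1.0 - beta);
        if (scale <= 1.0 + maxSlowdown && freqs[i] < best)
            best = freqs[i];   // lowest frequency meeting the bound
    }
    return best;
}

int main() {
    const double freqs[] = {0.8e9, 1.2e9, 1.6e9, 2.0e9};
    // A memory-bound workload (beta = 0.1) tolerates a large frequency drop.
    double f = pickFrequency(2.0e9, 0.1, 0.05, freqs, 4);
    std::printf("selected frequency: %.1f GHz\n", f / 1e9);  // prints 1.6 GHz
    return 0;
}
```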

  11. GPU-accelerated Gibbs ensemble Monte Carlo simulations of Lennard-Jonesium

    Science.gov (United States)

    Mick, Jason; Hailat, Eyad; Russo, Vincent; Rushaidat, Kamel; Schwiebert, Loren; Potoff, Jeffrey

    2013-12-01

    This work describes an implementation of canonical and Gibbs ensemble Monte Carlo simulations on graphics processing units (GPUs). The pair-wise energy calculations, which consume the majority of the computational effort, are parallelized using the energetic decomposition algorithm. While energetic decomposition is relatively inefficient for traditional CPU-bound codes, the algorithm is ideally suited to the architecture of the GPU. The performance of the CPU and GPU codes is assessed for a variety of CPU and GPU combinations for systems containing between 512 and 131,072 particles. For a system of 131,072 particles, the GPU-enabled canonical and Gibbs ensemble codes were 10.3 and 29.1 times faster (GTX 480 GPU vs. i5-2500K CPU), respectively, than an optimized serial CPU-bound code. Due to overhead from memory transfers from system RAM to the GPU, the CPU code was slightly faster than the GPU code for simulations containing fewer than 600 particles. The critical temperature Tc∗ = 1.312(2) and density ρc∗ = 0.316(3) were determined for the tail-corrected Lennard-Jones potential from simulations of 10,000-particle systems, and found to be in exact agreement with prior mixed-field finite-size scaling calculations [J.J. Potoff, A.Z. Panagiotopoulos, J. Chem. Phys. 109 (1998) 10914].
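
    As a point of reference for the energetic-decomposition discussion, a minimal serial NumPy sketch of the O(N²) Lennard-Jones pair-energy sum and a single Metropolis trial move is given below. The parameter values and the naive full-energy recomputation are our simplifications, not the paper's GPU algorithm.

        import numpy as np

        def lj_energy(pos, box, eps=1.0, sigma=1.0, rcut=2.5):
            """Total Lennard-Jones energy, minimum-image convention.
            This O(N^2) pair sum is the part offloaded to the GPU."""
            e = 0.0
            for i in range(len(pos) - 1):
                d = pos[i + 1:] - pos[i]
                d -= box * np.round(d / box)            # minimum image
                r2 = np.sum(d * d, axis=1)
                r2 = r2[r2 < rcut * rcut]               # apply cutoff
                s6 = (sigma * sigma / r2) ** 3
                e += np.sum(4.0 * eps * (s6 * s6 - s6))
            return e

        def metropolis_move(pos, box, beta, delta=0.1, rng=None):
            """One canonical-ensemble displacement trial (naive full recompute)."""
            rng = rng or np.random.default_rng()
            trial = pos.copy()
            k = rng.integers(len(pos))
            trial[k] = (trial[k] + rng.uniform(-delta, delta, 3)) % box
            dE = lj_energy(trial, box) - lj_energy(pos, box)
            return trial if rng.random() < np.exp(-beta * dE) else pos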

  12. Improving Maintenance Data Collection Via Point-Of-Maintenance (POMX) Implementation

    Science.gov (United States)

    2006-03-01

    accurate documentation, (3) identifying and correcting the root causes for poor data integrity, and (4) educating the unit on the critical need for data ... the validity of the results. The data in this study were analyzed using the SAS JMP 6.0 statistical software package. The results for the tests ... traditional keyboard data entry methods at a computer terminal. These terminals are typically located in the aircraft maintenance unit (AMU) facility, away

  13. Army Communicator. Volume 27, Number 1, Spring 2002

    Science.gov (United States)

    2002-01-01

    master’s degrees (MPA and MBA) and a PhD in economics,” he said. ACRONYM QUICKSCAN FM – frequency modulation PDF – Panamanian Defense Force PML ... support vehicles. Technological inserts include the multiprocessor unit and the keyboard-video-mouse switch unit. The MPU is ideally suited for ... vintage and the computers have different operating systems). The MPU provides a versatile, configurable platform that consolidates up to six

  14. Different microprocessor controlled devices for ITU TRIGA Mark II reactor

    International Nuclear Information System (INIS)

    Can, B.; Omuz, S.; Uzun, S.; Apan, H.

    1990-01-01

    In this paper, the design of a period meter and multichannel thermometer controlled by a microprocessor, for use at the ITU TRIGA Mark-II Reactor, is presented. The system works as a simple microcomputer, which includes a CPU, an EPROM, a RAM, a CTC, a PIO, a PIA, a keyboard and displays, and is programmed in assembly language. The period meter can work either with a pulse signal or with an analog signal, depending on the demand of the user. The period is calculated by software and its range is -99.9 s to +2.1 s. When the period drops below +3 s, the system gives an alarm by illuminating a LED. The multichannel thermometer has eight temperature channels, which can be selected manually or automatically, and the channel selection time can be adjusted. The thermometer gives an alarm by illuminating a LED when the temperature rises to 600 °C. Temperature data are stored in the RAM and shown on a display. This system allows four spare thermocouples in the reactor to be used. (orig.)

  15. Visual Media Reasoning - Terrain-based Geolocation

    Science.gov (United States)

    2015-06-01

    the drawings, specifications, or other data does not license the holder or any other person or corporation; or convey any rights or permission to ... 3.4 Alternative Metric Investigation This section describes a graphics processor unit (GPU) based implementation in the NVIDIA CUDA programming ... utilizing 2 concurrent CPU cores, each controlling a single Nvidia C2075 Tesla Fermi CUDA card. Figure 22 shows a comparison of the CPU and the GPU powered

  16. Towards 100,000 CPU Cycle-Scavenging by Genetic Algorithms

    Science.gov (United States)

    Globus, Al; Biegel, Bryan A. (Technical Monitor)

    2001-01-01

    We examine a web-centric design using standard tools such as web servers, web browsers, PHP, and mySQL. We also consider the applicability of Information Power Grid tools such as the Globus (no relation to the author) Toolkit. We intend to implement this architecture with JavaGenes running on at least two cycle-scavengers: Condor and United Devices. JavaGenes, a genetic algorithm code written in Java, will be used to evolve multi-species reactive molecular force field parameters.

  17. GPU's for event reconstruction in the FairRoot framework

    International Nuclear Information System (INIS)

    Al-Turany, M; Uhlig, F; Karabowicz, R

    2010-01-01

    FairRoot is the simulation and analysis framework used by the CBM and PANDA experiments at FAIR/GSI. The use of graphics processing units (GPUs) for event reconstruction in FairRoot will be presented. The fact that CUDA (Nvidia's Compute Unified Device Architecture) development tools work alongside the conventional C/C++ compiler makes it possible to mix GPU code with general-purpose code for the host CPU; based on this, some of the reconstruction tasks can be sent to the graphics cards. Moreover, tasks that run on the GPUs can also run in emulation mode on the host CPU, which has the advantage that the same code is used on both CPU and GPU.

  18. CPU Server

    CERN Multimedia

    The CERN computer centre has hundreds of racks like these. They are over a million times more powerful than our first computer in the 1960s. This tray is a 'dual-core' server. This means it effectively has two CPUs in it (e.g. two of your home computers minimised to fit into a single box). Also note the copper cooling fins, which help dissipate the heat.

  19. Clinical implementation of a GPU-based simplified Monte Carlo method for a treatment planning system of proton beam therapy

    International Nuclear Information System (INIS)

    Kohno, R; Hotta, K; Nishioka, S; Matsubara, K; Tansho, R; Suzuki, T

    2011-01-01

    We implemented the simplified Monte Carlo (SMC) method on a graphics processing unit (GPU) architecture under the compute unified device architecture platform developed by NVIDIA. The GPU-based SMC was clinically applied to four patients with head and neck, lung, or prostate cancer. The results were compared with those obtained by a traditional CPU-based SMC with respect to computation time and discrepancy. In the CPU- and GPU-based SMC calculations, the estimated mean statistical errors of the calculated doses in the planning target volume region were within 0.5% rms. The dose distributions calculated by the GPU- and CPU-based SMCs were similar, within statistical errors. The GPU-based SMC showed 12.30-16.00 times faster performance than the CPU-based SMC. The computation time per beam arrangement using the GPU-based SMC for the clinical cases ranged from 9 to 67 s. The results demonstrate the successful application of the GPU-based SMC to clinical proton treatment planning. (note)

  20. Parallelizing ATLAS Reconstruction and Simulation: Issues and Optimization Solutions for Scaling on Multi- and Many-CPU Platforms

    International Nuclear Information System (INIS)

    Leggett, C; Jackson, K; Tatarkhanov, M; Yao, Y; Binet, S; Levinthal, D

    2011-01-01

    Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance to producing chip designs with multi- and many-core architectures. Furthermore, the cores themselves can run multiple threads with a zero-overhead context switch, allowing low-level resource sharing (Intel Hyper-Threading). To maximize bandwidth and minimize memory latency, memory access has become non-uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the ATLAS event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores by means of event-based parallelism and final-stage I/O synchronization. However, initial studies on 8- and 16-core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware-based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which, due to its size, places huge burdens on the memory infrastructure of today's processors.
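
    Event-based parallelism of the kind AthenaMP uses can be illustrated generically: independent events fan out to worker processes, and results are gathered for a final I/O stage. This toy Python sketch is an assumption-laden stand-in, not ATLAS code; the worker body and process count are illustrative.

        from multiprocessing import Pool

        def reconstruct(event_id):
            """Stand-in for per-event reconstruction work (independent per event)."""
            return {"event": event_id, "tracks": event_id % 7}

        if __name__ == "__main__":
            # Events fan out to worker processes; output is merged at the end,
            # mirroring event-based parallelism with final-stage I/O synchronization.
            with Pool(processes=8) as pool:
                results = pool.map(reconstruct, range(10_000), chunksize=256)
            print(len(results), "events reconstructed")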

  1. The role of heat pipes in intensified unit operations

    International Nuclear Information System (INIS)

    Reay, David; Harvey, Adam

    2013-01-01

    Heat pipes are heat transfer devices that rely, most commonly, on the evaporation and condensation of a working fluid contained within them, with passive pumping of the condensate back to the evaporator. They are sometimes referred to as ‘thermal superconductors’ because of their exceptionally high effective thermal conductivity (substantially higher than any metal). This, together with several other characteristics, makes them attractive for a range of intensified unit operations, particularly reactors. The majority of modern computers deploy heat pipes for cooling of the CPU. The application areas of heat pipes fall within a number of broad groups, each of which describes a property of the heat pipe. The ones particularly relevant to chemical reactors are: (i) separation of heat source and sink; (ii) temperature flattening, or isothermalisation; (iii) temperature control. Chemical reactors, as a heat pipe application area, highlight the benefits of the heat pipe as an isothermalisation/temperature-flattening device and as a highly effective heat transfer unit. Temperature control, done passively, is also relevant. Heat pipe technology offers a number of potential benefits to reactor performance and operation. The aim of an increased yield of high-purity, high-added-value chemicals means less waste and higher profitability. Other intensified unit operations, such as those employing sorption processes, can also profit from heat pipe technology. This paper describes several variants of heat pipe and the opportunities for their use in intensified plant, and gives some current examples. -- Highlights: ► Heat pipes – thermal superconductors – can lead to improved chemical reactor performance. ► Isothermalisation within a reactor vessel is an ideal application. ► The variable conductance heat pipe can control reaction temperatures within close limits. ► Heat pipes can be beneficial in intensified reactors

  2. Fast simulation of Proton Induced X-Ray Emission Tomography using CUDA

    Energy Technology Data Exchange (ETDEWEB)

    Beasley, D.G., E-mail: dgbeasley@itn.pt; Marques, A.C.; Alves, L.C.; Silva, R.C. da

    2013-07-01

    A new 3D Proton Induced X-Ray Emission Tomography (PIXE-T) and Scanning Transmission Ion Microscopy Tomography (STIM-T) simulation software has been developed in Java; it uses NVIDIA™ Compute Unified Device Architecture (CUDA) to calculate the X-ray attenuation for large detector areas. A challenge with PIXE-T is to obtain sufficient counts while retaining a small beam spot size; therefore, a high geometric efficiency is required. However, as the detector solid angle increases, the calculations required for accurate reconstruction of the data increase substantially. To overcome this limitation, the CUDA parallel computing platform was used, which enables general-purpose programming of NVIDIA graphics processing units (GPUs) to perform computations traditionally handled by the central processing unit (CPU). For simulation performance evaluation, the results of a CPU- and a CUDA-based simulation of a phantom are presented. Furthermore, a comparison with the simulation code in the PIXE-Tomography reconstruction software DISRA (A. Sakellariou, D.N. Jamieson, G.J.F. Legge, 2001) is also shown. Compared to a CPU implementation, the CUDA-based simulation is approximately 30× faster.

  3. Fast simulation of Proton Induced X-Ray Emission Tomography using CUDA

    International Nuclear Information System (INIS)

    Beasley, D.G.; Marques, A.C.; Alves, L.C.; Silva, R.C. da

    2013-01-01

    A new 3D Proton Induced X-Ray Emission Tomography (PIXE-T) and Scanning Transmission Ion Microscopy Tomography (STIM-T) simulation software has been developed in Java; it uses NVIDIA™ Compute Unified Device Architecture (CUDA) to calculate the X-ray attenuation for large detector areas. A challenge with PIXE-T is to obtain sufficient counts while retaining a small beam spot size; therefore, a high geometric efficiency is required. However, as the detector solid angle increases, the calculations required for accurate reconstruction of the data increase substantially. To overcome this limitation, the CUDA parallel computing platform was used, which enables general-purpose programming of NVIDIA graphics processing units (GPUs) to perform computations traditionally handled by the central processing unit (CPU). For simulation performance evaluation, the results of a CPU- and a CUDA-based simulation of a phantom are presented. Furthermore, a comparison with the simulation code in the PIXE-Tomography reconstruction software DISRA (A. Sakellariou, D.N. Jamieson, G.J.F. Legge, 2001) is also shown. Compared to a CPU implementation, the CUDA-based simulation is approximately 30× faster.

  4. Parallel Sequential Monte Carlo for Efficient Density Combination: The Deco Matlab Toolbox

    DEFF Research Database (Denmark)

    Casarin, Roberto; Grassi, Stefano; Ravazzolo, Francesco

    This paper presents the Matlab package DeCo (Density Combination), which is based on the paper by Billio et al. (2013), where a constructive Bayesian approach is presented for combining predictive densities originating from different models or other sources of information. The combination weights … for standard CPU computing and for Graphics Processing Unit (GPU) parallel computing. For the GPU implementation we use the Matlab parallel computing toolbox and show how to use general-purpose GPU computing almost effortlessly. This GPU implementation comes with a speed-up of the execution time of up to seventy … times compared to a standard CPU Matlab implementation on a multicore CPU. We show the use of the package and the computational gain of the GPU version through some simulation experiments and empirical applications.

  5. Degradation of the quality of energy in a computer center; Degradacion de la calidad de la energia en un Centro de Computos

    Energy Technology Data Exchange (ETDEWEB)

    Suarez, J. A.; Dimenna, C.; Mauro, R. di; Mauro, G. di; Anaut, D. [Universidad Nacional Mar del Plata (Argentina). Fac. de Ingenieria. Grupo LAT], e-mail: lat@mdp.edu.ar

    2009-07-01

    The aim of this study is to analyze the impact that the simultaneous connection of a large number of central processing units (CPUs) has on the distortion of the supply current. Combinations of peripherals (monitors, printers) and the same units working under different operating regimes are studied.

  6. Daily Arrests

    Data.gov (United States)

    Montgomery County of Maryland — This dataset provides the public with arrest information from the Montgomery County Central Processing Unit (CPU) systems. The data presented is derived from every...

  7. 32 CFR 286.29 - Collection of fees and fee rates.

    Science.gov (United States)

    2010-07-01

    ... support, operator, programmer, database administrator, or action officer). (ii) Machine time. Machine time involves only direct costs of the Central Processing Unit (CPU), input/output devices, and memory capacity...

  8. Transportable GPU (General Processor Units) chip set technology for standard computer architectures

    Science.gov (United States)

    Fosdick, R. E.; Denison, H. C.

    1982-11-01

    The USAFR-developed GPU Chip Set has been utilized by Tracor to implement both USAF and Navy Standard 16-Bit Airborne Computer Architectures. Both configurations are currently being delivered into DOD full-scale development programs. Leadless hermetic chip carrier packaging has facilitated implementation of both architectures on single 4½ x 5 substrates. The CMOS and CMOS/SOS implementations of the GPU Chip Set have allowed both CPU implementations to use less than 3 watts of power each. Recent efforts by Tracor for the USAF have included the definition of a next-generation GPU Chip Set that will retain the application-proven architecture of the current chip set while offering the added cost advantages of transportability across ISO-CMOS and CMOS/SOS processes and across numerous semiconductor manufacturers using a newly-defined set of common design rules. The Enhanced GPU Chip Set will increase speed by an approximate factor of 3 while significantly reducing chip counts and costs of standard CPU implementations.

  9. Human brain as the model of a new computer system. II

    Energy Technology Data Exchange (ETDEWEB)

    Holtz, K; Langheld, E

    1981-12-09

    For Pt. I see ibid., vol. 29, no. 22, p. 13 (1981). The authors describe the self-generating system of connections of a self-teaching, no-program associative computer. The self-generating systems of connections are regarded as simulation models of the human brain and compared with the brain structure. The system hardware comprises a microprocessor, a PROM, memory, a VDU and a keyboard unit.

  10. Validation of GPU based TomoTherapy dose calculation engine.

    Science.gov (United States)

    Chen, Quan; Lu, Weiguo; Chen, Yu; Chen, Mingli; Henderson, Douglas; Sterpin, Edmond

    2012-04-01

    The graphics processing unit (GPU) based TomoTherapy convolution/superposition (C/S) dose engine (GPU dose engine) achieves a dramatic performance improvement over the traditional CPU-cluster based TomoTherapy dose engine (CPU dose engine). Besides the architecture difference between the GPU and CPU, there are several algorithm changes from the CPU dose engine to the GPU dose engine. These changes make the GPU dose slightly different from the CPU-cluster dose. For the commercial release of the GPU dose engine, its accuracy had to be validated. Thirty-eight TomoTherapy phantom plans and 19 patient plans were calculated with both dose engines to evaluate the equivalency between the two dose engines. Gamma indices (Γ) were used for the equivalency evaluation. The GPU dose was further verified against absolute point dose measurements with an ion chamber and film measurements for the phantom plans. Monte Carlo calculation was used as a reference for both dose engines in the accuracy evaluation in a heterogeneous phantom and actual patients. The GPU dose engine showed excellent agreement with the current CPU dose engine: the majority of cases had over 99.99% of voxels with Γ(1%, 1 mm) < 1. The GPU dose engine also showed a similar degree of accuracy in heterogeneous media as the current TomoTherapy dose engine. It is verified and validated that the ultrafast TomoTherapy GPU dose engine can safely replace the existing TomoTherapy cluster-based dose engine without degradation in dose accuracy.
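
    The Γ(1%, 1 mm) criterion combines a dose-difference test with a distance-to-agreement test. A minimal 1-D sketch of a global gamma pass-rate calculation is shown below; the 1-D restriction, the brute-force search and the function names are ours, not the paper's implementation.

        import numpy as np

        def gamma_pass_rate(dose_ref, dose_eval, spacing_mm, dd=0.01, dta_mm=1.0):
            """Fraction of reference points with global gamma <= 1.
            dd: dose criterion as a fraction of the maximum reference dose;
            dta_mm: distance-to-agreement criterion in millimetres."""
            x = np.arange(len(dose_ref)) * spacing_mm
            d_norm = dd * dose_ref.max()
            gammas = np.empty(len(dose_ref))
            for i in range(len(dose_ref)):
                dose_term = (dose_eval - dose_ref[i]) / d_norm
                dist_term = (x - x[i]) / dta_mm
                gammas[i] = np.sqrt(dist_term ** 2 + dose_term ** 2).min()
            return np.mean(gammas <= 1.0)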

  11. Computer hardware for radiologists: Part I

    International Nuclear Information System (INIS)

    Indrajit, IK; Alam, A

    2010-01-01

    Computers are an integral part of modern radiology practice. They are used in different radiology modalities to acquire, process, and postprocess imaging data. They have had a dramatic influence on contemporary radiology practice. Their impact has extended further with the emergence of Digital Imaging and Communications in Medicine (DICOM), Picture Archiving and Communication System (PACS), Radiology Information System (RIS) technology, and Teleradiology. A basic overview of computer hardware relevant to radiology practice is presented here. The key hardware components in a computer are the motherboard, central processing unit (CPU), the chipset, the random access memory (RAM), the memory modules, bus, storage drives, and ports. The personal computer (PC) has a rectangular case that contains important components called hardware, many of which are integrated circuits (ICs). The fiberglass motherboard is the main printed circuit board and has a variety of important hardware mounted on it, which are connected by electrical pathways called “buses”. The CPU is the largest IC on the motherboard and contains millions of transistors. Its principal function is to execute “programs”. A Pentium® 4 CPU has transistors that execute a billion instructions per second. The chipset is completely different from the CPU in design and function; it controls data and interaction of buses between the motherboard and the CPU. Memory (RAM) is fundamentally semiconductor chips storing data and instructions for access by a CPU. RAM is classified by storage capacity, access speed, data rate, and configuration.

  12. Computer hardware for radiologists: Part I

    Directory of Open Access Journals (Sweden)

    Indrajit I

    2010-01-01

    Full Text Available Computers are an integral part of modern radiology practice. They are used in different radiology modalities to acquire, process, and postprocess imaging data. They have had a dramatic influence on contemporary radiology practice. Their impact has extended further with the emergence of Digital Imaging and Communications in Medicine (DICOM), Picture Archiving and Communication System (PACS), Radiology Information System (RIS) technology, and Teleradiology. A basic overview of computer hardware relevant to radiology practice is presented here. The key hardware components in a computer are the motherboard, central processing unit (CPU), the chipset, the random access memory (RAM), the memory modules, bus, storage drives, and ports. The personal computer (PC) has a rectangular case that contains important components called hardware, many of which are integrated circuits (ICs). The fiberglass motherboard is the main printed circuit board and has a variety of important hardware mounted on it, which are connected by electrical pathways called "buses". The CPU is the largest IC on the motherboard and contains millions of transistors. Its principal function is to execute "programs". A Pentium® 4 CPU has transistors that execute a billion instructions per second. The chipset is completely different from the CPU in design and function; it controls data and interaction of buses between the motherboard and the CPU. Memory (RAM) is fundamentally semiconductor chips storing data and instructions for access by a CPU. RAM is classified by storage capacity, access speed, data rate, and configuration.

  13. 32 CFR 518.20 - Collection of fees and fee rates.

    Science.gov (United States)

    2010-07-01

    ..., programmer, database administrator, or action officer). (ii) Machine time. Machine time involves only direct costs of the Central Processing Unit (CPU), input/output devices, and memory capacity used in the actual...

  14. Multi-Threaded Algorithms for General Purpose Graphics Processor Units in the ATLAS High Level Trigger

    CERN Document Server

    Conde Muiño, Patricia; The ATLAS collaboration

    2016-01-01

    General purpose Graphics Processor Units (GPGPU) are being evaluated for possible future inclusion in an upgraded ATLAS High Level Trigger farm. We have developed a demonstrator including GPGPU implementations of Inner Detector and Muon tracking and Calorimeter clustering within the ATLAS software framework. ATLAS is a general purpose particle physics experiment located on the LHC collider at CERN. The ATLAS Trigger system consists of two levels, with level 1 implemented in hardware and the High Level Trigger implemented in software running on a farm of commodity CPUs. The High Level Trigger reduces the trigger rate from the 100 kHz level 1 acceptance rate to 1 kHz for recording, requiring an average per-event processing time of ~250 ms for this task. The selection in the high level trigger is based on reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Calorimeter. Performing this reconstruction within the available farm resources presents a significant ...

  15. GRAPHICS PROCESSING UNITS: MORE THAN THE PATHWAY TO REALISTIC VIDEO-GAMES

    Directory of Open Access Journals (Sweden)

    CARLOS TRUJILLO

    2011-01-01

    Full Text Available The large video-game market has driven rapid progress in hardware and software aimed at achieving ever more realistic gaming environments. Among these developments are graphics processing units (GPUs), whose purpose is to free the central processing unit (CPU) from the elaborate computations that give "life" to video games. To achieve this, GPUs are equipped with multiple processing cores operating in parallel, which allows them to be used for tasks far more diverse than video-game development. This article presents a brief description of the features of compute unified device architecture (CUDA™), a parallel computing architecture for GPUs. An application of this architecture to the numerical reconstruction of holograms is presented, for which a speedup of 11X is reported with respect to the performance achieved on a CPU.

  16. SU-E-T-423: Fast Photon Convolution Calculation with a 3D-Ideal Kernel On the GPU

    Energy Technology Data Exchange (ETDEWEB)

    Moriya, S; Sato, M [Komazawa University, Setagaya, Tokyo (Japan)]; Tachibana, H [National Cancer Center Hospital East, Kashiwa, Chiba (Japan)]

    2015-06-15

    Purpose: The calculation time is the trade-off for improving the accuracy of convolution dose calculation with fine calculation spacing of the KERMA kernel. We investigated accelerating the convolution calculation using an ideal kernel on graphics processing units (GPUs). Methods: The calculation was performed on AMD Dual FirePro D700 graphics hardware, and our algorithm was implemented using Aparapi, which converts Java bytecode to OpenCL. The dose calculation process was separated into TERMA and KERMA steps, and the dose deposited at each coordinate (x, y, z) was determined in the process. In the dose calculation running on the central processing unit (CPU), an Intel Xeon E5, the calculation loops were performed over all calculation points. In the GPU computation, all of the calculation processes for the points were sent to the GPU and computed with multiple threads. In this study, the dose calculation was performed in a water-equivalent homogeneous phantom with 150³ voxels (2 mm calculation grid); the calculation speed on the GPU was compared to that on the CPU, and the accuracy of the PDD was evaluated. Results: The calculation times for the GPU and the CPU were 3.3 s and 4.4 h, respectively; the GPU was thus 4800 times faster than the CPU. The PDD curve for the GPU matched that for the CPU perfectly. Conclusion: The convolution calculation with the ideal kernel on the GPU was clinically acceptable in terms of computation time and may be more accurate in inhomogeneous regions. Intensity-modulated arc therapy needs dose calculations for different gantry angles at many control points, so it would be more practical for the kernel to use a coarse-spacing technique if the calculation is faster while keeping similar accuracy to a current treatment planning system.
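
    For readers unfamiliar with convolution dose calculation, the core operation is a convolution of the TERMA grid with an energy-deposition kernel. The NumPy sketch below is a hedged illustration, not the authors' Aparapi/OpenCL code: it uses FFT-based convolution, which assumes a spatially invariant kernel with its origin at the first voxel.

        import numpy as np

        def convolve_dose(terma, kernel):
            """Dose = TERMA convolved with an energy-deposition kernel,
            computed via zero-padded FFTs and cropped to the TERMA grid."""
            shape = [t + k - 1 for t, k in zip(terma.shape, kernel.shape)]
            dose = np.fft.irfftn(np.fft.rfftn(terma, shape) *
                                 np.fft.rfftn(kernel, shape), shape)
            return dose[:terma.shape[0], :terma.shape[1], :terma.shape[2]]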

  17. Optimization of Selected Remote Sensing Algorithms for Embedded NVIDIA Kepler GPU Architecture

    Science.gov (United States)

    Riha, Lubomir; Le Moigne, Jacqueline; El-Ghazawi, Tarek

    2015-01-01

    This paper evaluates the potential of the embedded graphics processing unit in Nvidia's Tegra K1 for onboard processing. The performance is compared to a general-purpose multi-core CPU and a full-fledged GPU accelerator. This study uses two algorithms: Wavelet Spectral Dimension Reduction of Hyperspectral Imagery and the Automated Cloud-Cover Assessment (ACCA) Algorithm. Tegra K1 achieved 51 for the ACCA algorithm and 20 for the dimension reduction algorithm, as compared to the performance of the high-end 8-core server Intel Xeon CPU, which has 13.5 times higher power consumption.

  18. Accumulo/Hadoop, MongoDB, and Elasticsearch Performance for Semi Structured Intrusion Detection (IDS) Data

    Science.gov (United States)

    2016-11-01

    RHEL patches were applied: 6 • 1 Dell PowerEdge R710 server ○ One 2.26-GHz Xeon 4-core central processing unit (CPU) ○ Two 250-GB, 7,200-RPM ... Hadoop, and Elasticsearch “master” server • 4 Dell PowerEdge R420 servers ○ Two 2.2-GHz Xeon E5-2430 6-core CPUs ○ Four 2-TB, 7,200-RPM SATA drives ...

  19. Application of GPU to Multi-interfaces Advection and Reconstruction Solver (MARS)

    International Nuclear Information System (INIS)

    Nagatake, Taku; Takase, Kazuyuki; Kunugi, Tomoaki

    2010-01-01

    In the nuclear engineering field, a high-performance computer system is necessary to perform large-scale computations. Recently, the Graphics Processing Unit (GPU) has been developed as a rendering computational system in order to reduce the Central Processing Unit (CPU) load. In graphics processing, high-performance computing is needed to render high-quality 3D objects in video games, so the GPU consists of many processing units and has a wide memory bandwidth. In this study, the Multi-interfaces Advection and Reconstruction Solver (MARS), which is one of the interface volume tracking methods for multi-phase flows, has been implemented on the GPU. Multi-phase flow computation is very important for nuclear reactors and other engineering fields. The MARS consists of two computing parts: the interface tracking part and the fluid motion computing part. In the interface tracking part, the performance of the GPU (GTX280) was 6 times faster than that of the CPU (Dual-Xeon 5040), and in the fluid motion computing part the Poisson solver on the GPU (GTX285) was 22 times faster than that on the CPU (Core i7). For the dam-breaking problem, the result of GPU-MARS differed slightly from the experimental result. Because GPU-MARS was developed using single-precision GPU arithmetic, it can be considered that round-off error might have accumulated. (author)
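
    The Poisson solve mentioned for the fluid-motion part maps naturally onto a GPU because every grid point updates independently. A minimal CPU-side Jacobi sketch (our illustration, not the MARS code) makes that data parallelism visible:

        import numpy as np

        def jacobi_poisson(rhs, h, iters=1000):
            """Solve the Poisson equation (Laplacian p = rhs) on a uniform 3-D
            grid with Jacobi sweeps; boundary values stay fixed at zero. Every
            interior point updates independently from the previous iterate,
            which is why this solver maps well onto many GPU cores."""
            p = np.zeros_like(rhs)
            for _ in range(iters):
                # The full right-hand side is evaluated before assignment,
                # so this is a true Jacobi (not Gauss-Seidel) update.
                p[1:-1, 1:-1, 1:-1] = (
                    p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1] +
                    p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1] +
                    p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2] -
                    h * h * rhs[1:-1, 1:-1, 1:-1]) / 6.0
            return p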

  20. Does computer use affect the incidence of distal arm pain? A one-year prospective study using objective measures of computer use

    DEFF Research Database (Denmark)

    Mikkelsen, S.; Lassen, C. F.; Vilstrup, Imogen

    2012-01-01

    PURPOSE: To study how objectively recorded mouse and keyboard activity affects distal arm pain among computer workers. METHODS: Computer activities were recorded among 2,146 computer workers. For 52 weeks, mouse and keyboard time, sustained activity, speed and micropauses were recorded with a software program installed on the participants' computers. Participants reported weekly pain scores via the software program for elbow, forearm and wrist/hand, as well as in a questionnaire at baseline and 1-year follow-up. Associations between pain development and computer work were examined for three pain … were not risk factors for acute pain, nor did they modify the effects of mouse or keyboard time. Computer usage parameters were not associated with prolonged or chronic pain. A major limitation of the study was low keyboard times. CONCLUSION: Computer work was not related to the development …

  1. Password Cracking on Graphics Processing Unit Based Systems

    OpenAIRE

    N. Gopalakrishna Kini; Ranjana Paleppady; Akshata K. Naik

    2015-01-01

    Password authentication is one of the most widely used methods for authenticating legitimate users of computers and defending against attackers. There are many different ways to authenticate users of a system, and many password cracking methods have also been developed. This paper proposes how password cracking can best be performed on a CPU-GPGPU based system. The main objective of this work is to show how quickly a password can be cracked with some knowledge about the ...

  2. The Point Lepreau Desktop Simulator

    International Nuclear Information System (INIS)

    MacLean, M.; Hogg, J.; Newman, H.

    1997-01-01

    The Point Lepreau Desktop Simulator runs plant process modeling software on a 266 MHz single CPU DEC Alpha computer. This same Alpha also runs the plant control computer software on an SSCI 125 emulator. An adjacent Pentium PC runs the simulator's Instructor Facility software, and communicates with the Alpha through an Ethernet. The Point Lepreau Desktop simulator is constructed to be as similar as possible to the Point Lepreau full scope training simulator. This minimizes total maintenance costs and enhances the benefits of the desktop simulator. Both simulators have the same modeling running on a single CPU in the same schedule of calculations. Both simulators have the same Instructor Facility capable of developing and executing the same lesson plans, doing the same monitoring and control of simulations, inserting all the same malfunctions, performing all the same overrides, capable of making and restoring all the same storepoints. Both simulators run the same plant control computer software - the same assembly language control programs as the power plant uses for reactor control, heat transport control, annunciation, etc. This is a higher degree of similarity between a desktop simulator and a full scope training simulator than previously reported for a computer controlled nuclear plant. The large quantity of control room hardware missing from the desktop simulator is replaced by software. The Instructor Facility panel override software of the training simulator provides the means by which devices (switches, controllers, windows, etc.) on the control room panels can be controlled and monitored in the desktop simulator. The CRT of the Alpha provides a mouse operated DCC keyboard mimic for controlling the plant control computer emulation. Two emulated RAMTEK display channels appear as windows for monitoring anything of interest on plant DCC displays, including one channel for annunciation. (author)

  3. Facile synthesis of silver nanoparticles and its antibacterial activity against Escherichia coli and unknown bacteria on mobile phone touch surfaces/computer keyboards

    Science.gov (United States)

    Reddy, T. Ranjeth Kumar; Kim, Hyun-Joong

    2016-07-01

    In recent years, there has been significant interest in the development of novel metallic nanoparticles using various top-down and bottom-up synthesis techniques. Kenaf is a huge biomass product and a potential component for industrial applications. In this work, we investigated the green synthesis of silver nanoparticles (AgNPs) using kenaf (Hibiscus cannabinus) cellulose extract and sucrose, which act as stabilizing and reducing agents in solution. With this method, by changing the pH of the solution as a function of time, we studied the optical, morphological and antibacterial properties of the synthesized AgNPs. In addition, these nanoparticles were characterized by ultraviolet-visible spectroscopy, transmission electron microscopy (TEM), field-emission scanning electron microscopy, Fourier transform infrared (FTIR) spectroscopy and energy-dispersive X-ray spectroscopy (EDX). As the pH of the solution varies, the surface plasmon resonance peak also varies. A faster rate of reaction at pH 10 compared with that at pH 5 was identified. TEM micrographs confirm that the shapes of the particles are spherical and polygonal. Furthermore, the average size of the nanoparticles synthesized at pH 5, pH 8 and pH 10 is 40.26, 28.57 and 24.57 nm, respectively. The structure of the synthesized AgNPs was identified as face-centered cubic (fcc) by X-ray diffraction (XRD). The compositional analysis was determined by EDX. FTIR confirms that the kenaf cellulose extract and sucrose act as stabilizing and reducing agents for the silver nanoparticles. Meanwhile, these AgNPs exhibited size-dependent antibacterial activity against Escherichia coli (E. coli) and two other unknown bacteria from mobile phone screens and computer keyboard surfaces.

  4. A TBB-CUDA Implementation for Background Removal in a Video-Based Fire Detection System

    Directory of Open Access Journals (Sweden)

    Fan Wang

    2014-01-01

    Full Text Available This paper presents a parallel TBB-CUDA implementation for the acceleration of a single-Gaussian distribution model, which is effective for background removal in a video-based fire detection system. In this framework, TBB mainly deals with the initialization of the estimated Gaussian model running on the CPU, and CUDA performs background removal and adaptation of the model running on the GPU. This implementation exploits the combined computational power of TBB and CUDA and can be applied in real-time environments. Over 220 video sequences were utilized in the experiments. The experimental results illustrate that TBB+CUDA achieves a higher speedup than either TBB or CUDA alone. The proposed framework effectively overcomes the disadvantages of the CPU's limited memory bandwidth and few execution units, and it reduces data transfer latency and memory latency between the CPU and GPU.
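
    A single-Gaussian background model keeps a per-pixel running mean and variance, flags pixels that deviate strongly, and adapts the model otherwise. The NumPy sketch below is a serial stand-in for the CUDA kernel described here; alpha and k are illustrative parameters, not the paper's values.

        import numpy as np

        def update_background(frame, mean, var, alpha=0.02, k=2.5):
            """One step of a running single-Gaussian background model.
            Pixels within k standard deviations of the mean count as
            background; the model then adapts toward the new frame. Every
            pixel is independent, which is the property exploited by
            per-pixel GPU threads (with TBB handling initialization)."""
            diff = frame - mean
            foreground = diff * diff > k * k * var          # per-pixel test
            mean = np.where(foreground, mean,
                            (1 - alpha) * mean + alpha * frame)
            var = np.where(foreground, var,
                           (1 - alpha) * var + alpha * diff * diff)
            return foreground, mean, np.maximum(var, 1e-6)  # keep var positive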

  5. Le rendement dans l’utilisation du clavier d’ordinateur pour écrire chez les personnes ayant un trouble du spectre de l’autisme / Performance using a computer keyboard for writing among persons with an autism spectrum disorder

    Directory of Open Access Journals (Sweden)

    Claire Dumont

    2015-03-01

    Full Text Available Persons with an autism spectrum disorder (ASD) frequently have graphomotor problems. These difficulties in turn have repercussions on numerous tasks, on academic performance and on personal development. In this situation, writing using a keyboard can be a relevant alternative. This study aims to describe the performance of children with an ASD in performing tasks using a computer keyboard and to compare it with that of children whose development is typical. The Performance Using a Computer test was administered to a sample of 53 children with an ASD, aged 6 to 15. Findings suggest that they can offer an equal, superior or inferior performance depending on their characteristics and on the type of task performed. The test also allows the observation of several characteristics in their learning processes. Suggestions for the improvement of the knowledge base and practices follow.

  6. Ultra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU.

    Science.gov (United States)

    Arefan, D; Talebpour, A; Ahmadinejhad, N; Kamali Asl, A

    2015-06-01

    Digital Breast Tomosynthesis (DBT) is a technology that creates three-dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If the image reconstruction time is of the order of seconds, Tomosynthesis systems can be used to perform Tomosynthesis-guided interventional procedures. This research was designed to study an ultra-fast image reconstruction technique for Tomosynthesis mammography systems using the graphics processing unit (GPU). First, projections of Tomosynthesis mammography were simulated: a 3D breast phantom was designed from empirical MRI data in its natural form, and projections were created from the 3D breast phantom. The image reconstruction algorithm, based on FBP, was programmed in C++ in two versions, one using the central processing unit (CPU) and one using the graphics processing unit (GPU), and the image reconstruction time was measured for both implementations.

  7. Ultra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU

    Directory of Open Access Journals (Sweden)

    Arefan D

    2015-06-01

    Full Text Available Digital Breast Tomosynthesis (DBT) is a technology that creates three-dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If the image reconstruction time is of the order of seconds, Tomosynthesis systems can be used to perform Tomosynthesis-guided interventional procedures. This research was designed to study an ultra-fast image reconstruction technique for Tomosynthesis mammography systems using the graphics processing unit (GPU). First, projections of Tomosynthesis mammography were simulated: a 3D breast phantom was designed from empirical MRI data in its natural form, and projections were created from the 3D breast phantom. The image reconstruction algorithm, based on FBP, was programmed in C++ in two versions, one using the central processing unit (CPU) and one using the graphics processing unit (GPU), and the image reconstruction time was measured for both implementations.
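
    Both versions of this record reconstruct with filtered back-projection (FBP). As a hedged illustration of the method (not the authors' C++ code), a toy parallel-beam 2-D FBP is sketched below; tomosynthesis uses a limited-angle geometry, but the filter-then-backproject structure is the same.

        import numpy as np

        def fbp_slice(sinogram, angles_deg):
            """Toy parallel-beam FBP: ramp-filter each projection, then smear
            it back across the image along its acquisition angle."""
            n_det, n_ang = sinogram.shape
            # Ramp filter applied in the Fourier domain along the detector axis.
            ramp = np.abs(np.fft.fftfreq(n_det))[:, None]
            filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=0) * ramp,
                                           axis=0))
            grid = np.arange(n_det) - n_det / 2
            xx, yy = np.meshgrid(grid, grid)
            image = np.zeros((n_det, n_det))
            for k, theta in enumerate(np.deg2rad(angles_deg)):
                # Detector coordinate of every image pixel for this view.
                t = xx * np.cos(theta) + yy * np.sin(theta) + n_det / 2
                image += np.interp(t.ravel(), np.arange(n_det),
                                   filtered[:, k]).reshape(n_det, n_det)
            return image * np.pi / (2 * n_ang)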

  8. Development of a computer system at La Hague center

    International Nuclear Information System (INIS)

    Mimaud, Robert; Malet, Georges; Ollivier, Francis; Fabre, J.-C.; Valois, Philippe; Desgranges, Patrick; Anfossi, Gilbert; Gentizon, Michel; Serpollet, Roger.

    1977-01-01

    The U.P.2 plant, built at the La Hague Center, is intended mainly for the reprocessing of spent fuels coming from graphite-gas reactors (as metal) and from light-water, heavy-water and breeder reactors (as oxide). In each of the five large nuclear units, the digital processing of measurements was handled until 1974 by CAE 3030 data processors. During 1974-1975 a modern industrial computer system was set up. This system, equipped with T 2000/20 hardware from the Telemecanique company, consists of five measurement acquisition devices (for a total of 1500 lines processed) and two central processing units (CPUs). The connection of these two CPUs (hardware and software) enables the system to be switched automatically to either the first CPU or the second one. The system covers, at present, data processing, threshold monitoring, alarm systems, display devices, periodical listing, and specific calculations concerning the process (balances, etc.), and, at a later stage, automatic control of certain units of the process. [fr]

  9. Analysis of Small Muscle Movement Effects on EEG Signals

    Science.gov (United States)

    2016-12-22

    different conditions are recorded in this experiment. These conditions are the resting state, left finger keyboard press, right finger keyboard ... Right and Left Finger Keyboard Press Conditions ... Detection of Hand ... Gamma (30 Hz and higher): blending of multiple brain functions; muscle-related artifacts ... EEG Artifacts: EEG recordings are intended to

  10. Development of software for the microsimulator for the KO-RI nuclear power plant unit 2

    International Nuclear Information System (INIS)

    Seok, H.; No, H.C.; Cho, S.J.; Park, S.D.; Jun, H.Y.; Lee, Y.K.

    1994-01-01

    A workstation-based real-time simulator for two-loop pressurized water reactor plants is developed for classroom training in support of a full-scale simulator, on-site transient analysis, and engineering studies. The present simulator consists of three functional modules: a plant module, a graphic module, and a man-machine interaction module. The plant module includes models for the core kinetics, reactor coolant system, steam generator, main steam line, balance of plant, and control and protection system. Each of the models is optimized to achieve real-time simulation. The graphic module is designed to provide the user with more information at a glance by dynamically displaying schematic diagrams of the systems, symbols indicating the operating status of each component, trend curves, and the main control room. As tools for the man-machine interface, the man-machine interaction module uses a color cathode-ray-tube monitor, a standard keyboard, and a mouse. The interactive communication module receives parameters from the user via the keyboard and mouse and transfers them to the plant module so as to enable the user to perform specific actions. This module provides the user with various initiating events (malfunctions and manual controls) through the SYSTEM, CONTROL ROOM, and ACCIDENTS menus, so that a wide range of nuclear steam supply system transients can be easily simulated. FISA-2/WS is verified through comparisons with analytical solutions, separate and integral tests, and predictions by RETRAN-2 and RELAP5/MOD3.

  11. Optimization of the coherence function estimation for multi-core central processing unit

    Science.gov (United States)

    Cheremnov, A. G.; Faerman, V. A.; Avramchuk, V. S.

    2017-02-01

    The paper considers the use of parallel processing on a multi-core central processing unit to optimize the evaluation of the coherence function arising in digital signal processing. The coherence function, along with other methods of spectral analysis, is commonly used for vibration diagnosis of rotating machinery and its particular nodes. An algorithm is given for evaluating the function for signals represented as digital samples. The algorithm is analyzed with respect to its software implementation and computational problems. Optimization measures are described, including algorithmic, architecture and compiler optimization, and their results are assessed for multi-core processors from different manufacturers. Speed-up of the parallel execution with respect to sequential execution was studied, and results are presented for Intel Core i7-4720HQ and AMD FX-9590 processors. The results show the comparatively high efficiency of the optimization measures taken. In particular, acceleration indicators and average CPU utilization were significantly improved, showing a high degree of parallelism in the constructed calculating functions. The developed software underwent state registration and will be used as part of a software and hardware solution for rotating machinery fault diagnosis and pipeline leak location with the acoustic correlation method.
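
    The quantity being optimized is the magnitude-squared coherence C_xy(f) = |P_xy(f)|² / (P_xx(f) P_yy(f)). A compact Welch-style NumPy sketch (ours; the segment length and Hann window are illustrative choices) shows the per-segment FFTs that can be spread across CPU cores:

        import numpy as np

        def coherence(x, y, nperseg=256):
            """Magnitude-squared coherence with spectra averaged over
            Hann-windowed segments (Welch's method). Averaging over several
            segments is what makes the estimate informative: a single
            segment gives coherence identically equal to 1. Each segment's
            FFT is independent, hence easy to parallelize across cores."""
            win = np.hanning(nperseg)
            segs = len(x) // nperseg
            pxx = pyy = pxy = 0.0
            for s in range(segs):
                seg = slice(s * nperseg, (s + 1) * nperseg)
                fx = np.fft.rfft(win * x[seg])
                fy = np.fft.rfft(win * y[seg])
                pxx = pxx + fx * np.conj(fx)
                pyy = pyy + fy * np.conj(fy)
                pxy = pxy + fy * np.conj(fx)
            return np.abs(pxy) ** 2 / (np.real(pxx) * np.real(pyy))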

  12. Accelerating Electrostatic Surface Potential Calculation with Multiscale Approximation on Graphics Processing Units

    Science.gov (United States)

    Anandakrishnan, Ramu; Scogland, Tom R. W.; Fenley, Andrew T.; Gordon, John C.; Feng, Wu-chun; Onufriev, Alexey V.

    2010-01-01

    Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is not always possible in general. Specifically, the electrostatic potential computation, using an analytical linearized Poisson-Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multiscale method and parallelized on an ATI Radeon 4870 graphics processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040-atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. PMID:20452792

  13. Space Object Collision Probability via Monte Carlo on the Graphics Processing Unit

    Science.gov (United States)

    Vittaldev, Vivek; Russell, Ryan P.

    2017-09-01

    Fast and accurate collision probability computations are essential for protecting space assets. Monte Carlo (MC) simulation is the most accurate but most computationally intensive method. A Graphics Processing Unit (GPU) is used to parallelize the computation and reduce the overall runtime. Using MC techniques to compute the collision probability is common in the literature as the benchmark. An optimized implementation on the GPU, however, is a challenging problem and is the main focus of the current work. The MC simulation takes samples from the uncertainty distributions of the Resident Space Objects (RSOs) at any time during a time window of interest and outputs the separations at closest approach. Therefore, any uncertainty propagation method may be used, and the collision probability is automatically computed as a function of RSO collision radii. Integration using a fixed time step and a quartic interpolation after every Runge-Kutta step ensures that no close approaches are missed. Two orders of magnitude speedup over a serial CPU implementation is shown, and speedups improve moderately with higher-fidelity dynamics. The tool makes the MC approach tractable on a single workstation, and can be used as a final product, or for verifying surrogate and analytical collision probability methods.
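
    Stripped to a single epoch, the MC estimate reduces to sampling both object states from their Gaussian uncertainties and counting how often the miss distance falls below the combined hard-body radius. The NumPy sketch below is our simplification; the paper's tool additionally propagates each sample through a full time window with close-approach interpolation, and every sample is independent, hence the GPU parallelization.

        import numpy as np

        def collision_probability(mu1, cov1, mu2, cov2, hbr_km, n=1_000_000,
                                  rng=np.random.default_rng(0)):
            """Single-epoch Monte Carlo collision probability: the fraction
            of sampled state pairs whose separation is below the combined
            hard-body radius hbr_km. mu/cov are 3-D position means (km) and
            covariances for the two objects."""
            p1 = rng.multivariate_normal(mu1, cov1, size=n)
            p2 = rng.multivariate_normal(mu2, cov2, size=n)
            miss = np.linalg.norm(p1 - p2, axis=1)
            return np.mean(miss < hbr_km)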

  14. Accelerating large-scale protein structure alignments with graphics processing units

    Directory of Open Access Journals (Sweden)

    Pang Bin

    2012-02-01

    Full Text Available Abstract Background Large-scale protein structure alignment, an indispensable tool for structural bioinformatics, poses a tremendous challenge to computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign could take many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues of protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using the massive parallel computing power of GPUs.

  15. Does computer use affect the incidence of distal arm pain? A one-year prospective study using objective measures of computer use

    DEFF Research Database (Denmark)

    Mikkelsen, Sigurd; Lassen, Christina Funch; Vilstrup, Imogen

    2012-01-01

    PURPOSE: To study how objectively recorded mouse and keyboard activity affects distal arm pain among computer workers. METHODS: Computer activities were recorded among 2,146 computer workers. For 52 weeks, mouse and keyboard time, sustained activity, speed and micropauses were recorded with a software program installed on the participants' computers. Participants reported weekly pain scores via the software program for elbow, forearm and wrist/hand, as well as in a questionnaire at baseline and 1-year follow-up. Associations between pain development and computer work were examined for three pain … were not risk factors for acute pain, nor did they modify the effects of mouse or keyboard time. Computer usage parameters were not associated with prolonged or chronic pain. A major limitation of the study was low keyboard times. CONCLUSION: Computer work was not related to the development …

  16. A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems

    Energy Technology Data Exchange (ETDEWEB)

    Ha, Woo Seok; Kim, Soo Mee; Park, Min Jae; Lee, Dong Soo; Lee, Jae Sung [Seoul National University, Seoul (Korea, Republic of)]

    2009-10-15

    The maximum likelihood-expectation maximization (ML-EM) is the statistical reconstruction algorithm derived from a probabilistic model of the emission and detection processes. Although the ML-EM has many advantages in accuracy and utility, its use is limited by the computational burden of iterative processing on a CPU (central processing unit). In this study, we developed a parallel computing technique on the GPU (graphics processing unit) for the ML-EM algorithm. Using a GeForce 9800 GTX+ graphics card and CUDA (compute unified device architecture), the projection and backprojection in the ML-EM algorithm were parallelized with NVIDIA's technology. The time delays on computations for projection, errors between measured and estimated data, and backprojection in an iteration were measured. Total time included the latency in data transmission between RAM and GPU memory. The total computation times of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 s, respectively; in this case, the computing speed was improved about 15 times on the GPU. When the number of iterations increased to 1,024, the CPU- and GPU-based computations took 18 min and 8 s in total, respectively. The improvement was about 135-fold and was caused by delays in CPU-based computing after a certain number of iterations. On the other hand, the GPU-based computation showed very little variation in time delay per iteration due to the use of shared memory. The GPU-based parallel computation for ML-EM significantly improved the computing speed and stability. The developed GPU-based ML-EM algorithm could be easily modified for some other imaging geometries.

  17. A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems

    International Nuclear Information System (INIS)

    Ha, Woo Seok; Kim, Soo Mee; Park, Min Jae; Lee, Dong Soo; Lee, Jae Sung

    2009-01-01

    The maximum likelihood-expectation maximization (ML-EM) is a statistical reconstruction algorithm derived from a probabilistic model of the emission and detection processes. Although the ML-EM has many advantages in accuracy and utility, its use is limited by the computational burden of iterative processing on a CPU (central processing unit). In this study, we developed a parallel computing technique on the GPU (graphics processing unit) for the ML-EM algorithm. Using a GeForce 9800 GTX+ graphics card and NVIDIA's CUDA (compute unified device architecture), the projection and backprojection steps of the ML-EM algorithm were parallelized. The times per iteration for computing the projection, the errors between measured and estimated data, and the backprojection were measured. The total time included the latency of data transmission between RAM and GPU memory. The total computation times of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 sec, respectively; in this case, the computing speed was improved about 15 times on the GPU. When the number of iterations increased to 1024, the CPU- and GPU-based computations took 18 min and 8 sec in total, respectively. The improvement grew to about 135 times because the CPU-based computation slowed down after a certain number of iterations, whereas the GPU-based computation showed very little variation in time per iteration owing to the use of shared memory. The GPU-based parallel computation significantly improved the computing speed and stability of ML-EM, and the developed GPU-based ML-EM algorithm can easily be modified for other imaging geometries.

  18. A real-time GNSS-R system based on software-defined radio and graphics processing units

    Science.gov (United States)

    Hobiger, Thomas; Amagai, Jun; Aida, Masanori; Narita, Hideki

    2012-04-01

    Reflected signals of the Global Navigation Satellite System (GNSS) from the sea or land surface can be utilized to deduce and monitor physical and geophysical parameters of the reflecting area. Unlike most other remote sensing techniques, GNSS-Reflectometry (GNSS-R) operates as a passive radar that takes advantage of the increasing number of navigation satellites that broadcast their L-band signals. To date, most GNSS-R receiver architectures have been based on dedicated hardware solutions. Software-defined radio (SDR) technology has advanced in recent years and enabled signal processing in real time, which makes it an ideal candidate for the realization of a flexible GNSS-R system. Additionally, modern commodity graphics cards, which offer massive parallel computing performance, allow the whole signal-processing chain to be handled without interfering with the PC's CPU. Thus, this paper describes a GNSS-R system developed on the principles of software-defined radio supported by General Purpose Graphics Processing Units (GPGPUs), and presents results from initial field tests which confirm the anticipated capability of the system.
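
    The heart of any GNSS-R processor is the correlation of the sampled baseband signal against a delayed replica of the satellite's ranging code. The sketch below shows that one operation as a CUDA kernel under simplifying assumptions (carrier already removed, atomicAdd instead of a tuned parallel reduction); all names are illustrative, not the authors' implementation.

        // Accumulate the complex correlation of n samples against a +/-1 code
        // replica shifted by `delay` samples.
        __global__ void correlate(const float2* samples, const float* code,
                                  int n, int delay, float2* acc)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) {
                float c = code[(i + delay) % n];       // delayed PRN chip
                atomicAdd(&acc->x, samples[i].x * c);  // in-phase sum
                atomicAdd(&acc->y, samples[i].y * c);  // quadrature sum
            }
        }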

  19. Burn Injury Caused by Laptop Computers

    African Journals Online (AJOL)

    generated in central processing unit (CPU), graphics processing unit, hard drive, internal ... change its position. Discussion ... Suzuki, et al. reported that the critical temperature for superficial burn was 37.8°C, for deep dermal burns 41.9°C and ... The laptop should be placed on a hard surface and not on soft surfaces like.

  20. On developing B-spline registration algorithms for multi-core processors

    International Nuclear Information System (INIS)

    Shackleford, J A; Kandasamy, N; Sharp, G C

    2010-01-01

    Spline-based deformable registration methods are quite popular within the medical-imaging community due to their flexibility and robustness. However, they require a large amount of computing time to obtain adequate results. This paper makes two contributions towards accelerating B-spline-based registration. First, we propose a grid-alignment scheme and associated data structures that greatly reduce the complexity of the registration algorithm. Based on this grid-alignment scheme, we then develop highly data parallel designs for B-spline registration within the stream-processing model, suitable for implementation on multi-core processors such as graphics processing units (GPUs). Particular attention is focused on an optimal method for performing analytic gradient computations in a data parallel fashion. CPU and GPU versions are validated for execution time and registration quality. Performance results on large images show that our GPU algorithm achieves a speedup of 15 times over the single-threaded CPU implementation whereas our multi-core CPU algorithm achieves a speedup of 8 times over the single-threaded implementation. The CPU and GPU versions achieve near-identical registration quality in terms of RMS differences between the generated vector fields.

  1. Timing comparison of two-dimensional discrete-ordinates codes for criticality calculations

    International Nuclear Information System (INIS)

    Miller, W.F. Jr.; Alcouffe, R.E.; Bosler, G.E.; Brinkley, F.W. Jr.; O'dell, R.D.

    1979-01-01

    The authors compare two-dimensional discrete-ordinates neutron transport computer codes to solve reactor criticality problems. The fundamental interest is in determining which code requires the minimum Central Processing Unit (CPU) time for a given numerical model of a reasonably realistic fast reactor core and peripherals. The computer codes considered are the most advanced available and, in three cases, are not officially released. The conclusion, based on the study of four fast reactor core models, is that for this class of problems the diffusion synthetic accelerated version of TWOTRAN, labeled TWOTRAN-DA, is superior to the other codes in terms of CPU requirements

  2. Computer Workstations: Keyboards

    Science.gov (United States)


  3. A GPU-based calculation using the three-dimensional FDTD method for electromagnetic field analysis.

    Science.gov (United States)

    Nagaoka, Tomoaki; Watanabe, Soichi

    2010-01-01

    Numerical simulations with numerical human models using the finite-difference time domain (FDTD) method have recently been performed frequently in a number of fields in biomedical engineering. However, the FDTD calculation runs too slowly. We focus, therefore, on general-purpose programming on the graphics processing unit (GPGPU). The three-dimensional FDTD method was implemented on the GPU using the Compute Unified Device Architecture (CUDA). In this study, we used the NVIDIA Tesla C1060 as a GPGPU board. The performance of the GPU is evaluated in comparison with the performance of a conventional CPU and a vector supercomputer. The results indicate that three-dimensional FDTD calculations using a GPU can significantly reduce run time in comparison with a conventional CPU, even with a naive GPU implementation of the three-dimensional FDTD method, while the GPU/CPU speed ratio varies with the calculation domain and thread block size.
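
    As a concrete illustration of why FDTD parallelizes so well, each grid cell's field update depends only on neighboring values from the previous half-step, so one CUDA thread can own one cell. The hedged sketch below updates only the Ez component of a Yee grid; the grid size, array names, and the coefficient Cb (which folds the time step and permittivity) are assumptions for illustration.

        constexpr int NX = 128, NY = 128, NZ = 128;       // illustrative grid
        #define IDX(i, j, k) ((i) + NX * ((j) + NY * (k)))

        __global__ void update_ez(float* Ez, const float* Hx, const float* Hy,
                                  float Cb, float inv_dx, float inv_dy)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            int j = blockIdx.y * blockDim.y + threadIdx.y;
            int k = blockIdx.z * blockDim.z + threadIdx.z;
            if (i > 0 && i < NX && j > 0 && j < NY && k < NZ) {
                float curl_h = (Hy[IDX(i,j,k)] - Hy[IDX(i-1,j,k)]) * inv_dx
                             - (Hx[IDX(i,j,k)] - Hx[IDX(i,j-1,k)]) * inv_dy;
                Ez[IDX(i,j,k)] += Cb * curl_h;            // Yee-scheme Ez update
            }
        }

    The abstract's observation that the GPU/CPU speed ratio varies with thread block size corresponds to how the block dimensions chosen for this 3D launch map onto memory-coalescing patterns.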

  4. Implementation and Optimization of GPU-Based Static State Security Analysis in Power Systems

    Directory of Open Access Journals (Sweden)

    Yong Chen

    2017-01-01

    Full Text Available Static state security analysis (SSSA is one of the most important computations to check whether a power system is in normal and secure operating state. It is a challenge to satisfy real-time requirements with CPU-based concurrent methods due to the intensive computations. A sensitivity analysis-based method with Graphics processing unit (GPU is proposed for power systems, which can reduce calculation time by 40% compared to the execution on a 4-core CPU. The proposed method involves load flow analysis and sensitivity analysis. In load flow analysis, a multifrontal method for sparse LU factorization is explored on GPU through dynamic frontal task scheduling between CPU and GPU. The varying matrix operations during sensitivity analysis on GPU are highly optimized in this study. The results of performance evaluations show that the proposed GPU-based SSSA with optimized matrix operations can achieve a significant reduction in computation time.

  5. Noniterative Multireference Coupled Cluster Methods on Heterogeneous CPU-GPU Systems

    Energy Technology Data Exchange (ETDEWEB)

    Bhaskaran-Nair, Kiran; Ma, Wenjing; Krishnamoorthy, Sriram; Villa, Oreste; van Dam, Hubertus JJ; Apra, Edoardo; Kowalski, Karol

    2013-04-09

    A novel parallel algorithm for non-iterative multireference coupled cluster (MRCC) theories, which merges recently introduced reference-level parallelism (RLP) [K. Bhaskaran-Nair, J. Brabec, E. Aprà, H.J.J. van Dam, J. Pittner, K. Kowalski, J. Chem. Phys. 137, 094112 (2012)] with the possibility of accelerating numerical calculations using graphics processing units (GPUs), is presented. We discuss the performance of this algorithm on the example of the MRCCSD(T) method (iterative singles and doubles and perturbative triples), where the corrections due to triples are added to the diagonal elements of the MRCCSD (iterative singles and doubles) effective Hamiltonian matrix. The performance of the combined RLP/GPU algorithm is illustrated on the example of the Brillouin-Wigner (BW) and Mukherjee (Mk) state-specific MRCCSD(T) formulations.

  6. HP 9816

    CERN Multimedia

    1982-01-01

    The 9816 was introduced in late 1982. This was the low-cost model in the 200 Series range. It only had two expansion slots and featured a monitor integrated with the system unit and a modular keyboard and mass storage (usually a 9121 dual 3.5 inch floppy drive). The monitor was nine inches diagonally with a 400 by 300 dot resolution. The HP 9816 was also designated as the HP 9000 216. It did not include any disk drives but it had a built-in 9 inch monochrome monitor, built-in HP-IB and RS-232 ports and 2 expansion slots. The standard keyboard for the 9816 is an itty-bitty number. The 9816 A came with 128K bytes of memory. The 9816 S included all of the above plus disk-based BASIC and a card containing an additional 256K of memory, bringing the total memory to 512K but leaving only one expansion slot open.

  7. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.; Bonny, Talal; Salama, Khaled N.

    2012-01-01

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.
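
    A minimal host-side sketch of the splitting idea in this patent record: sort the database by sequence length, then send everything below the split point to the CPU and the rest to the GPU. The stubs alignOnCpu and alignOnGpu are hypothetical placeholders, not an API from the disclosure.

        #include <algorithm>
        #include <string>
        #include <vector>

        // Hypothetical scoring stubs standing in for real alignment kernels.
        static void alignOnCpu(const std::string&, const std::string&) {}
        static void alignOnGpu(const std::string&, const std::string&) {}

        void dispatch(const std::string& query, std::vector<std::string> db,
                      double ratio)   // predetermined splitting ratio in [0,1]
        {
            std::sort(db.begin(), db.end(),            // shortest sequences first
                      [](const std::string& a, const std::string& b) {
                          return a.size() < b.size();
                      });
            size_t split = static_cast<size_t>(ratio * db.size());
            for (size_t i = 0; i < split; ++i)
                alignOnCpu(query, db[i]);               // short sequences: CPU
            for (size_t i = split; i < db.size(); ++i)
                alignOnGpu(query, db[i]);               // long sequences: GPU
        }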

  8. Adaptive Processing for Sequence Alignment

    KAUST Repository

    Zidan, Mohammed A.

    2012-01-26

    Disclosed are various embodiments for adaptive processing for sequence alignment. In one embodiment, among others, a method includes obtaining a query sequence and a plurality of database sequences. A first portion of the plurality of database sequences is distributed to a central processing unit (CPU) and a second portion of the plurality of database sequences is distributed to a graphical processing unit (GPU) based upon a predetermined splitting ratio associated with the plurality of database sequences, where the database sequences of the first portion are shorter than the database sequences of the second portion. A first alignment score for the query sequence is determined with the CPU based upon the first portion of the plurality of database sequences and a second alignment score for the query sequence is determined with the GPU based upon the second portion of the plurality of database sequences.

  9. Molecular Monte Carlo Simulations Using Graphics Processing Units: To Waste Recycle or Not?

    Science.gov (United States)

    Kim, Jihan; Rodgers, Jocelyn M; Athènes, Manuel; Smit, Berend

    2011-10-11

    In the waste recycling Monte Carlo (WRMC) algorithm, (1) multiple trial states may be simultaneously generated and utilized during Monte Carlo moves to improve the statistical accuracy of the simulations, suggesting that such an algorithm may be well posed for implementation in parallel on graphics processing units (GPUs). In this paper, we implement two waste recycling Monte Carlo algorithms in CUDA (Compute Unified Device Architecture) using uniformly distributed random trial states and trial states based on displacement random-walk steps, and we test the methods on a methane-zeolite MFI framework system to evaluate their utility. We discuss the specific implementation details of the waste recycling GPU algorithm and compare the methods to other parallel algorithms optimized for the framework system. We analyze the relationship between the statistical accuracy of our simulations and the CUDA block size to determine the efficient allocation of the GPU hardware resources. We make comparisons between the GPU and the serial CPU Monte Carlo implementations to assess speedup over conventional microprocessors. Finally, we apply our optimized GPU algorithms to the important problem of determining free energy landscapes, in this case for molecular motion through the zeolite LTA.
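
    The parallelism the authors exploit comes from evaluating many trial states at once: every trial's Boltzmann factor is useful, whether or not that trial is accepted. A hedged sketch of that evaluation step follows; energy_of is a toy placeholder for the real methane-zeolite potential.

        // Evaluate the Boltzmann weight of each trial state, one per thread.
        __device__ float energy_of(float3 p) {     // toy potential (assumption)
            float r2 = p.x*p.x + p.y*p.y + p.z*p.z + 0.5f;
            float inv6 = 1.0f / (r2 * r2 * r2);
            return 4.0f * (inv6 * inv6 - inv6);
        }

        __global__ void eval_trials(const float3* trial_pos, float beta,
                                    float* weight, int n_trials)
        {
            int t = blockIdx.x * blockDim.x + threadIdx.x;
            if (t < n_trials)
                weight[t] = expf(-beta * energy_of(trial_pos[t]));
        }

    On the host, the next state is drawn with probability weight[t]/sum(weight), while observables are averaged over all trials with those same weights, which is the waste-recycling estimator.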

  10. Acoustic reverse-time migration using GPU card and POSIX thread based on the adaptive optimal finite-difference scheme and the hybrid absorbing boundary condition

    Science.gov (United States)

    Cai, Xiaohui; Liu, Yang; Ren, Zhiming

    2018-06-01

    Reverse-time migration (RTM) is a powerful tool for imaging geologically complex structures such as steep-dip and subsalt features. However, its implementation is quite computationally expensive. Recently, as a low-cost solution, the graphics processing unit (GPU) was introduced to improve the efficiency of RTM. In this paper, we develop three ameliorative strategies to implement RTM on a GPU card. First, given the high accuracy and efficiency of the adaptive optimal finite-difference (FD) method based on least squares (LS) on the central processing unit (CPU), we study the optimal LS-based FD method on the GPU. Second, we extend the CPU-based hybrid absorbing boundary condition (ABC) to a GPU-based one by addressing two issues that arise when the former is moved to a GPU card: high time consumption and chaotic threads. Third, for large-scale data, a combined strategy of optimal checkpointing and efficient boundary storage is introduced to trade off memory against recomputation. To hide the time of communication between host and disk, a portable operating system interface (POSIX) thread is utilized to occupy another CPU core at the checkpoints. Applications of the three strategies in RTM, implemented on a GPU with the compute unified device architecture (CUDA) programming language, demonstrate their efficiency and validity.
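
    The POSIX-thread strategy mentioned above amounts to overlapping disk I/O with GPU compute: at each checkpoint the wavefield is copied to a host buffer and handed to a writer thread, and kernel launches resume immediately. A hedged sketch, with the file naming and buffer layout as assumptions:

        #include <pthread.h>
        #include <stdio.h>

        struct CkptJob { const float* data; size_t count; const char* path; };

        static void* write_ckpt(void* arg)             // runs on a spare CPU core
        {
            CkptJob* job = (CkptJob*)arg;
            FILE* f = fopen(job->path, "wb");
            if (f) {
                fwrite(job->data, sizeof(float), job->count, f);
                fclose(f);
            }
            return NULL;
        }

        // After cudaMemcpy of the checkpointed boundary data into job->data:
        void checkpoint_async(pthread_t* tid, CkptJob* job)
        {
            pthread_create(tid, NULL, write_ckpt, job); // join before buffer reuse
        }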

  11. Computer hardware for radiologists: Part 2

    International Nuclear Information System (INIS)

    Indrajit, IK; Alam, A

    2010-01-01

    Computers are an integral part of modern radiology equipment. In the first half of this two-part article, we dwelt upon some fundamental concepts regarding computer hardware, covering components like motherboard, central processing unit (CPU), chipset, random access memory (RAM), and memory modules. In this article, we describe the remaining computer hardware components that are of relevance to radiology. “Storage drive” is a term describing a “memory” hardware used to store data for later retrieval. Commonly used storage drives are hard drives, floppy drives, optical drives, flash drives, and network drives. The capacity of a hard drive is dependent on many factors, including the number of disk sides, number of tracks per side, number of sectors on each track, and the amount of data that can be stored in each sector. “Drive interfaces” connect hard drives and optical drives to a computer. The connections of such drives require both a power cable and a data cable. The four most popular “input/output devices” used commonly with computers are the printer, monitor, mouse, and keyboard. The “bus” is a built-in electronic signal pathway in the motherboard to permit efficient and uninterrupted data transfer. A motherboard can have several buses, including the system bus, the PCI express bus, the PCI bus, the AGP bus, and the (outdated) ISA bus. “Ports” are the location at which external devices are connected to a computer motherboard. All commonly used peripheral devices, such as printers, scanners, and portable drives, need ports. A working knowledge of computers is necessary for the radiologist if the workflow is to realize its full potential and, besides, this knowledge will prepare the radiologist for the coming innovations in the ‘ever increasing’ digital future

  12. Computer hardware for radiologists: Part 2

    Directory of Open Access Journals (Sweden)

    Indrajit I

    2010-01-01

    Full Text Available Computers are an integral part of modern radiology equipment. In the first half of this two-part article, we dwelt upon some fundamental concepts regarding computer hardware, covering components like motherboard, central processing unit (CPU), chipset, random access memory (RAM), and memory modules. In this article, we describe the remaining computer hardware components that are of relevance to radiology. "Storage drive" is a term describing a "memory" hardware used to store data for later retrieval. Commonly used storage drives are hard drives, floppy drives, optical drives, flash drives, and network drives. The capacity of a hard drive is dependent on many factors, including the number of disk sides, number of tracks per side, number of sectors on each track, and the amount of data that can be stored in each sector. "Drive interfaces" connect hard drives and optical drives to a computer. The connections of such drives require both a power cable and a data cable. The four most popular "input/output devices" used commonly with computers are the printer, monitor, mouse, and keyboard. The "bus" is a built-in electronic signal pathway in the motherboard to permit efficient and uninterrupted data transfer. A motherboard can have several buses, including the system bus, the PCI express bus, the PCI bus, the AGP bus, and the (outdated) ISA bus. "Ports" are the location at which external devices are connected to a computer motherboard. All commonly used peripheral devices, such as printers, scanners, and portable drives, need ports. A working knowledge of computers is necessary for the radiologist if the workflow is to realize its full potential and, besides, this knowledge will prepare the radiologist for the coming innovations in the 'ever increasing' digital future.

  13. CPU time optimization and precise adjustment of the Geant4 physics parameters for a VARIAN 2100 C/D gamma radiotherapy linear accelerator simulation using GAMOS

    Science.gov (United States)

    Arce, Pedro; Lagares, Juan Ignacio

    2018-02-01

    We have verified the GAMOS/Geant4 simulation model of a 6 MV VARIAN Clinac 2100 C/D linear accelerator by the procedure of adjusting the initial beam parameters to fit the percentage depth dose and cross-profile dose experimental data at different depths in a water phantom. Thanks to the use of a wide range of field sizes, from 2  ×  2 cm2 to 40  ×  40 cm2, a small phantom voxel size and high statistics, fine precision in the determination of the beam parameters has been achieved. This precision has allowed us to make a thorough study of the different physics models and parameters that Geant4 offers. The three Geant4 electromagnetic physics sets of models, i.e. Standard, Livermore and Penelope, have been compared to the experiment, testing the four different models of angular bremsstrahlung distributions as well as the three available multiple-scattering models, and optimizing the most relevant Geant4 electromagnetic physics parameters. Before the fitting, a comprehensive CPU time optimization has been done, using several of the Geant4 efficiency improvement techniques plus a few more developed in GAMOS.

  14. Non-invasive measuring instrument of kVp, R/M and exposure time

    International Nuclear Information System (INIS)

    Laan, Flavio T. van der; Elbern, Alwin W.

    1996-01-01

    The development of an instrument for fast measurement of essential parameters related to quality control of X-ray equipment is described. The unit is designed around an 80C31 microcontroller, a function keyboard, an alphanumeric display and a probe with PV diodes. Testing and calibration of this non-invasive instrument were carried out on the X-ray equipment of the Santa Rita Hospital in Porto Alegre, Rio Grande do Sul State, Brazil.

  15. Research on control law accelerator of digital signal process chip TMS320F28035 for real-time data acquisition and processing

    Science.gov (United States)

    Zhao, Shuangle; Zhang, Xueyi; Sun, Shengli; Wang, Xudong

    2017-08-01

    The TI C2000 series of digital signal processing (DSP) chips has been widely used in electrical engineering, measurement and control, communications and other professional fields, and the DSP TMS320F28035 is one of the most representative of the family. A DSP program typically needs both data acquisition and data processing; with conventional sequential C or assembly programming, the analogue-to-digital (AD) converter cannot acquire data in real time and many samples are missed. The control law accelerator (CLA) coprocessor runs in parallel with the main central processing unit (CPU), at the same clock frequency as the main CPU, and supports floating-point operations. Therefore, the CLA coprocessor is used in the program: the CLA core is responsible for data processing, while the main CPU is responsible for the AD conversion. The advantage of this method is that it reduces the data-processing time and achieves real-time data acquisition.

  16. On localization attacks against cloud infrastructure

    Science.gov (United States)

    Ge, Linqiang; Yu, Wei; Sistani, Mohammad Ali

    2013-05-01

    One of the key characteristics of cloud computing is device and location independence, which enables the user to access systems regardless of their location. Because cloud computing relies heavily on resource sharing, it is vulnerable to cyber attacks. In this paper, we investigate a localization attack that enables the adversary to leverage central processing unit (CPU) resources to localize the physical location of the server used by victims. By increasing and reducing CPU usage through a malicious virtual machine (VM), the response time from the victim VM will increase and decrease correspondingly. In this way, by embedding a probing signal into the CPU usage and correlating the same pattern in the response time from the victim VM, the adversary can find the location of the victim VM. To determine attack accuracy, we investigate features in both the time and frequency domains. We conduct both theoretical and experimental studies to demonstrate the effectiveness of such an attack.
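
    The correlation step described above is ordinary signal processing: the adversary compares the probe pattern it embedded in CPU usage with the response-time series it observes. A hedged host-side sketch using a plain Pearson correlation (generic code, not the authors' implementation):

        #include <cmath>
        #include <vector>

        double pearson(const std::vector<double>& probe,
                       const std::vector<double>& response)
        {
            size_t n = std::min(probe.size(), response.size());
            if (n == 0) return 0.0;
            double mx = 0, my = 0;
            for (size_t i = 0; i < n; ++i) { mx += probe[i]; my += response[i]; }
            mx /= n; my /= n;
            double sxy = 0, sxx = 0, syy = 0;
            for (size_t i = 0; i < n; ++i) {
                sxy += (probe[i] - mx) * (response[i] - my);
                sxx += (probe[i] - mx) * (probe[i] - mx);
                syy += (response[i] - my) * (response[i] - my);
            }
            return sxy / std::sqrt(sxx * syy);  // near 1 suggests co-location
        }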

  17. Near-Zero Emissions Oxy-Combustion Flue Gas Purification

    Energy Technology Data Exchange (ETDEWEB)

    Minish Shah; Nich Degenstein; Monica Zanfir; Rahul Solunke; Ravi Kumar; Jennifer Bugayong; Ken Burgers

    2012-06-30

    The objectives of this project were to carry out an experimental program to enable development and design of a near zero emissions (NZE) CO{sub 2} processing unit (CPU) for oxy-combustion plants burning high and low sulfur coals and to perform a commercial viability assessment. The NZE CPU was proposed to produce high purity CO{sub 2} from the oxy-combustion flue gas, to achieve > 95% CO{sub 2} capture rate and to achieve near zero atmospheric emissions of criteria pollutants. Two SOx/NOx removal technologies were proposed depending on the SOx levels in the flue gas. The activated carbon process was proposed for power plants burning low sulfur coal and the sulfuric acid process was proposed for power plants burning high sulfur coal. For plants burning high sulfur coal, the sulfuric acid process would convert SOx and NOx into commercial grade sulfuric and nitric acid by-products, thus reducing operating costs associated with SOx/NOx removal. For plants burning low sulfur coal, investment in separate FGD and SCR equipment for producing high purity CO{sub 2} would not be needed. To achieve high CO{sub 2} capture rates, a hybrid process that combines a cold box and VPSA (vacuum pressure swing adsorption) was proposed. In the proposed hybrid process, up to 90% of the CO{sub 2} in the cold box vent stream would be recovered by CO{sub 2} VPSA and then recycled and mixed with the flue gas stream upstream of the compressor. The overall recovery from the process will be > 95%. The activated carbon process was able to achieve simultaneous SOx and NOx removal in a single step. The removal efficiencies were >99.9% for SOx and >98% for NOx, thus exceeding the performance targets of >99% and >95%, respectively. The process was also found to be suitable for power plants burning both low and high sulfur coals. The sulfuric acid process did not meet the performance expectations. Although it could achieve high SOx (>99%) and NOx (>90%) removal efficiencies, it could not produce by-product acids of commercial grade.

  18. A new nonlinear conjugate gradient coefficient under strong Wolfe-Powell line search

    Science.gov (United States)

    Mohamed, Nur Syarafina; Mamat, Mustafa; Rivaie, Mohd

    2017-08-01

    A nonlinear conjugate gradient (CG) method plays an important role in solving large-scale unconstrained optimization problems. This method is widely used due to its simplicity. The method is known to possess the sufficient descent condition and global convergence properties. In this paper, a new nonlinear CG coefficient βk is presented, employing the strong Wolfe-Powell inexact line search. The performance of the new βk is tested in terms of the number of iterations and central processing unit (CPU) time, using MATLAB software on an Intel Core i7-3470 CPU. Numerical experimental results show that the new βk converges rapidly compared to other classical CG methods.
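
    The abstract does not state the formula of the new βk, so it is not reproduced here; for reference, the strong Wolfe-Powell conditions that such inexact line searches impose on the step length αk along the search direction dk are the standard ones:

        % Strong Wolfe-Powell conditions, with g_k = \nabla f(x_k) and 0 < \delta < \sigma < 1:
        f(x_k + \alpha_k d_k) \le f(x_k) + \delta \, \alpha_k \, g_k^{T} d_k
        \qquad\text{and}\qquad
        \lvert g(x_k + \alpha_k d_k)^{T} d_k \rvert \le \sigma \, \lvert g_k^{T} d_k \rvert .

    The CG iteration itself is x_{k+1} = x_k + αk dk with d_{k+1} = -g_{k+1} + βk dk, and the choice of βk is what distinguishes the classical methods from the new coefficient tested here.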

  19. Acceleration of Linear Finite-Difference Poisson-Boltzmann Methods on Graphics Processing Units.

    Science.gov (United States)

    Qi, Ruxi; Botello-Smith, Wesley M; Luo, Ray

    2017-07-11

    Electrostatic interactions play crucial roles in biophysical processes such as protein folding and molecular recognition. Poisson-Boltzmann equation (PBE)-based models have become widely used for modeling these important processes. Though great efforts have been put into developing efficient PBE numerical models, challenges still remain due to the high dimensionality of typical biomolecular systems. In this study, we implemented and analyzed commonly used linear PBE solvers for the ever-improving graphics processing units (GPUs) for biomolecular simulations, including both standard and preconditioned conjugate gradient (CG) solvers with several alternative preconditioners. Our implementation utilizes the standard Nvidia CUDA libraries cuSPARSE, cuBLAS, and CUSP. Extensive tests show that good numerical accuracy can be achieved given that single precision is often used for numerical applications on GPU platforms. The optimal GPU performance was observed with the Jacobi-preconditioned CG solver, with a significant speedup over the standard CG solver on CPU in our diversified test cases. Our analysis further shows that different matrix storage formats also considerably affect the efficiency of different linear PBE solvers on GPU, with the diagonal format best suited for our standard finite-difference linear systems. Further efficiency may be possible with matrix-free operations and integrated grid stencil setup specifically tailored for the banded matrices in PBE-specific linear systems.
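
    The abstract's finding that the diagonal (DIA) storage format suits finite-difference PBE systems follows from the 7-point stencil producing a banded matrix, so a sparse matrix-vector product needs only the band offsets. A hedged sketch of a DIA-format SpMV kernel follows (layout and names are assumptions; libraries such as CUSP provide the production version):

        // y = A * x with A stored as `ndiag` dense bands of length n.
        __global__ void spmv_dia(const float* bands, const int* offsets,
                                 const float* x, float* y, int n, int ndiag)
        {
            int row = blockIdx.x * blockDim.x + threadIdx.x;
            if (row < n) {
                float sum = 0.0f;
                for (int d = 0; d < ndiag; ++d) {
                    int col = row + offsets[d];  // e.g. {-nx*ny,-nx,-1,0,1,nx,nx*ny}
                    if (col >= 0 && col < n)
                        sum += bands[d * n + row] * x[col];  // coalesced band reads
                }
                y[row] = sum;
            }
        }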

  20. Mult-Pollutant Control Through Novel Approaches to Oxygen Enhanced Combustion

    Energy Technology Data Exchange (ETDEWEB)

    Richard Axelbaum; Pratim Biswas

    2009-02-28

    Growing concerns about global climate change have focused efforts on identifying approaches to stabilizing carbon dioxide levels in the atmosphere. One approach utilizes oxy-fuel combustion to produce a concentrated flue gas that will enable economical CO{sub 2} capture by direct methods. Oxy-fuel combustion requires an Air Separation Unit (ASU) to provide a high-purity stream of oxygen as well as a Compression and Purification Unit (CPU) to clean and compress the CO{sub 2} for long-term storage. Overall plant efficiency will suffer from the parasitic load of both the ASU and CPU, and researchers are investigating techniques to enhance other aspects of the combustion and gas cleanup processes to improve the benefit-to-cost ratio. This work examines the influence of oxy-fuel combustion and non-carbon based sorbents on the formation and fate of multiple combustion pollutants, both numerically and experimentally.

  1. Personalized keystroke dynamics for self-powered human-machine interfacing.

    Science.gov (United States)

    Chen, Jun; Zhu, Guang; Yang, Jin; Jing, Qingshen; Bai, Peng; Yang, Weiqing; Qi, Xuewei; Su, Yuanjie; Wang, Zhong Lin

    2015-01-27

    The computer keyboard is one of the most common, reliable, accessible, and effective tools used for human-machine interfacing and information exchange. Although keyboards have been used for hundreds of years for advancing human civilization, studying human behavior by keystroke dynamics using smart keyboards remains a great challenge. Here we report a self-powered, non-mechanical-punching keyboard enabled by contact electrification between human fingers and keys, which converts mechanical stimuli applied to the keyboard into local electronic signals without applying an external power. The intelligent keyboard (IKB) can not only sensitively trigger a wireless alarm system once gentle finger tapping occurs but also trace and record typed content by detecting both the dynamic time intervals between and during the inputting of letters and the force used for each typing action. Such features hold promise for its use as a smart security system that can realize detection, alert, recording, and identification. Moreover, the IKB is able to identify personal characteristics from different individuals, assisted by the behavioral biometric of keystroke dynamics. Furthermore, the IKB can effectively harness typing motions for electricity to charge commercial electronics at arbitrary typing speeds greater than 100 characters per min. Given the above features, the IKB can be potentially applied not only to self-powered electronics but also to artificial intelligence, cyber security, and computer or network access control.

  2. Space shuttle general purpose computers (GPCs) (current and future versions)

    Science.gov (United States)

    1988-01-01

    Current and future versions of general purpose computers (GPCs) for space shuttle orbiters are represented in this frame. The two boxes on the left (AP101B) represent the current GPC configuration, with the input-output processor at far left and the central processing unit (CPU) at its side. The upgraded version combines both elements in a single unit (far right, AP101S).

  3. August Dvorak (1894-1975): Early expressions of applied behavior analysis and precision teaching

    Science.gov (United States)

    Joyce, Bonnie; Moxley, Roy A.

    1988-01-01

    August Dvorak is best known for his development of the Dvorak keyboard. However, Dvorak also adapted and applied many behavioral and scientific management techniques to the field of education. Taken collectively, these techniques are representative of many of the procedures currently used in applied behavior analysis, in general, and especially in precision teaching. The failure to consider Dvorak's instructional methods may explain some of the discrepant findings in studies which compare the efficiency of the Dvorak to the standard keyboard. This article presents a brief background on the development of the standard (QWERTY) and Dvorak keyboards, describes parallels between Dvorak's teaching procedures and those used in precision teaching, reviews some of the comparative research on the Dvorak keyboard, and suggests some implications for further research in applying the principles of behavior analysis. PMID:22477993

  4. Efficient GPU-based texture interpolation using uniform B-splines

    NARCIS (Netherlands)

    Ruijters, D.; Haar Romenij, ter B.M.; Suetens, P.

    2008-01-01

    This article presents uniform B-spline interpolation, completely contained on the graphics processing unit (GPU). This implies that the CPU does not need to compute any lookup tables or B-spline basis functions. The cubic interpolation can be decomposed into several linear interpolations [Sigg and
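
    The decomposition referenced above can be made concrete: a 1D cubic B-spline reconstruction at x collapses into two linear interpolations with shifted coordinates, which GPU texture hardware provides for free. The hedged host-side demo below verifies the identity on toy data; on the GPU, the lin() calls would be hardware texture fetches.

        #include <cmath>
        #include <cstdio>

        float data[8] = {0, 1, 4, 9, 16, 25, 36, 49};   // toy samples

        float lin(float x) {                   // linear interpolation of data[]
            int i = (int)std::floor(x);
            float a = x - i;
            return (1 - a) * data[i] + a * data[i + 1];
        }

        int main() {
            float x = 3.3f;                    // sample position
            int i = (int)std::floor(x);
            float a = x - i;
            float w0 = (1-a)*(1-a)*(1-a) / 6.0f;           // cubic B-spline
            float w1 = (3*a*a*a - 6*a*a + 4) / 6.0f;       // weights at offset a
            float w2 = (-3*a*a*a + 3*a*a + 3*a + 1) / 6.0f;
            float w3 = a*a*a / 6.0f;
            float g0 = w0 + w1, g1 = w2 + w3;  // fold 4 taps into 2 fetches
            float h0 = (i - 1) + w1 / g0;      // coordinate of left fetch
            float h1 = (i + 1) + w3 / g1;      // coordinate of right fetch
            float fast   = g0 * lin(h0) + g1 * lin(h1);
            float direct = w0*data[i-1] + w1*data[i] + w2*data[i+1] + w3*data[i+2];
            printf("two-tap %.5f vs direct %.5f\n", fast, direct);  // identical
            return 0;
        }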

  5. Syncope prevalence in the ED compared to general practice and population: a strong selection process

    NARCIS (Netherlands)

    Olde Nordkamp, Louise R. A.; van Dijk, Nynke; Ganzeboom, Karin S.; Reitsma, Johannes B.; Luitse, Jan S. K.; Dekker, Lukas R. C.; Shen, Win-Kuang; Wieling, Wouter

    2009-01-01

    Objective: We assessed the prevalence and distribution of the different causes of transient loss of consciousness (TLOC) in the emergency department (ED) and chest pain unit (CPU) and estimated the proportion of persons with syncope in the general population who seek medical attention from either

  6. Parallel GPGPU Evaluation of Small Angle X-ray Scattering Profiles in a Markov Chain Monte Carlo Framework

    DEFF Research Database (Denmark)

    Antonov, Lubomir Dimitrov; Andreetta, Christian; Hamelryck, Thomas Wim

    2013-01-01

    directly determines the complexity of the systems that can be explored. We present an efficient implementation of the forward model for SAXS with full hardware utilization of Graphics Processor Units (GPUs). The proposed algorithm is orders of magnitude faster than an efficient CPU implementation...

  7. Graphics Processing Unit-Enhanced Genetic Algorithms for Solving the Temporal Dynamics of Gene Regulatory Networks.

    Science.gov (United States)

    García-Calvo, Raúl; Guisado, J L; Diaz-Del-Rio, Fernando; Córdoba, Antonio; Jiménez-Morales, Francisco

    2018-01-01

    Understanding the regulation of gene expression is one of the key problems in current biology. A promising method for that purpose is the determination of the temporal dynamics between known initial and ending network states, by using simple acting rules. The huge number of rule combinations and the inherent nonlinear nature of the problem make genetic algorithms an excellent candidate for finding optimal solutions. As this is a computationally intensive problem that needs long runtimes in conventional architectures for realistic network sizes, it is fundamental to accelerate this task. In this article, we study how to develop efficient parallel implementations of this method for the fine-grained parallel architecture of graphics processing units (GPUs) using the compute unified device architecture (CUDA) platform. An exhaustive and methodical study of various parallel genetic algorithm schemes (master-slave, island, cellular, and hybrid models) and various individual selection methods (roulette, elitist) is carried out for this problem. Several procedures that optimize the use of the GPU's resources are presented. We conclude that the implementation that produces better results (both from the performance and the genetic algorithm fitness perspectives) is simulating a few thousands of individuals grouped in a few islands using elitist selection. This model combines two powerful factors for discovering the best solutions: finding good individuals in a short number of generations, and introducing genetic diversity via a relatively frequent and numerous migration. As a result, we have even found the optimal solution for the analyzed gene regulatory network (GRN). In addition, a comparative study of the performance obtained by the different parallel implementations on GPU versus a sequential application on CPU is carried out. In our tests, a multifold speedup was obtained for our optimized parallel implementation of the method on a medium-class GPU over an equivalent sequential CPU application.

  8. Accelerating electrostatic surface potential calculation with multi-scale approximation on graphics processing units.

    Science.gov (United States)

    Anandakrishnan, Ramu; Scogland, Tom R W; Fenley, Andrew T; Gordon, John C; Feng, Wu-chun; Onufriev, Alexey V

    2010-06-01

    Tools that compute and visualize biomolecular electrostatic surface potential have been used extensively for studying biomolecular function. However, determining the surface potential for large biomolecules on a typical desktop computer can take days or longer using currently available tools and methods. Two commonly used techniques to speed-up these types of electrostatic computations are approximations based on multi-scale coarse-graining and parallelization across multiple processors. This paper demonstrates that for the computation of electrostatic surface potential, these two techniques can be combined to deliver significantly greater speed-up than either one separately, something that is in general not always possible. Specifically, the electrostatic potential computation, using an analytical linearized Poisson-Boltzmann (ALPB) method, is approximated using the hierarchical charge partitioning (HCP) multi-scale method, and parallelized on an ATI Radeon 4870 graphical processing unit (GPU). The implementation delivers a combined 934-fold speed-up for a 476,040 atom viral capsid, compared to an equivalent non-parallel implementation on an Intel E6550 CPU without the approximation. This speed-up is significantly greater than the 42-fold speed-up for the HCP approximation alone or the 182-fold speed-up for the GPU alone. Copyright (c) 2010 Elsevier Inc. All rights reserved.

  9. Sturm und Drang na música para teclado de Wilhelm Friedemann Bach: evidências reveladas na Polonaise No.4 em Ré menor Sturm und Drang in the keyboard music of Wilhelm Friedemann Bach: detected evidences in Polonaise N.4, in D minor

    Directory of Open Access Journals (Sweden)

    Stella Almeida Rosa

    2011-12-01

    Full Text Available This paper intends to point out contextual and musical elements, especially those relative to expressiveness, that bring Wilhelm Friedemann Bach's keyboard works close to the German Sturm und Drang movement of the beginning of the second half of the eighteenth century, through the identification of the literary and musical procedures involved and the analysis of the Polonaise No. 4 in D minor as a representative work of this style.

  10. GPU accelerated manifold correction method for spinning compact binaries

    Science.gov (United States)

    Ran, Chong-xi; Liu, Song; Zhong, Shuang-ying

    2018-04-01

    The graphics processing unit (GPU) acceleration of the manifold correction algorithm, based on the compute unified device architecture (CUDA) technology, is designed to simulate the dynamic evolution of the Post-Newtonian (PN) Hamiltonian formulation of spinning compact binaries. The feasibility and efficiency of parallel computation on the GPU have been confirmed by various numerical experiments. The numerical comparisons show that the results of the manifold correction method executed on the GPU agree well with those obtained purely on the central processing unit (CPU). The acceleration increases enormously through the use of shared memory and register optimization techniques without additional hardware costs: the speedup is nearly 13 times over the CPU code for a phase-space scan (including 314 × 314 orbits). In addition, the GPU-accelerated manifold correction method is used to numerically study how the dynamics are affected by the spin-induced quadrupole-monopole interaction for a black hole binary system.

  11. Assembly For Moving a Robotic Device Along Selected Axes

    Science.gov (United States)

    Nowlin, Brentley Craig (Inventor); Koch, Lisa Danielle (Inventor)

    2001-01-01

    An assembly for moving a robotic device along selected axes includes a programmable logic controller (PLC) for controlling movement of the device along selected axes to effect movement of the device to a selected disposition. The PLC includes a plurality of single axis motion control modules, and a central processing unit (CPU) in communication with the motion control modules. A human-machine interface is provided for operator selection of configurations of device movements and is in communication with the CPU. A motor drive is in communication with each of the motion control modules and is operable to effect movement of the device along the selected axes to obtain movement of the device to the selected disposition.

  12. GPU-based high performance Monte Carlo simulation in neutron transport

    Energy Technology Data Exchange (ETDEWEB)

    Heimlich, Adino; Mol, Antonio C.A.; Pereira, Claudio M.N.A. [Instituto de Engenharia Nuclear (IEN/CNEN-RJ), Rio de Janeiro, RJ (Brazil). Lab. de Inteligencia Artificial Aplicada], e-mail: cmnap@ien.gov.br

    2009-07-01

    Graphics Processing Units (GPU) are high performance co-processors intended, originally, to improve the use and quality of computer graphics applications. Since researchers and practitioners realized the potential of using GPU for general purpose, their application has been extended to other fields out of computer graphics scope. The main objective of this work is to evaluate the impact of using GPU in neutron transport simulation by Monte Carlo method. To accomplish that, GPU- and CPU-based (single and multicore) approaches were developed and applied to a simple, but time-consuming problem. Comparisons demonstrated that the GPU-based approach is about 15 times faster than a parallel 8-core CPU-based approach also developed in this work. (author)

  13. GPU-based high performance Monte Carlo simulation in neutron transport

    International Nuclear Information System (INIS)

    Heimlich, Adino; Mol, Antonio C.A.; Pereira, Claudio M.N.A.

    2009-01-01

    Graphics Processing Units (GPU) are high performance co-processors intended, originally, to improve the use and quality of computer graphics applications. Since researchers and practitioners realized the potential of using GPU for general purpose, their application has been extended to other fields out of computer graphics scope. The main objective of this work is to evaluate the impact of using GPU in neutron transport simulation by Monte Carlo method. To accomplish that, GPU- and CPU-based (single and multicore) approaches were developed and applied to a simple, but time-consuming problem. Comparisons demonstrated that the GPU-based approach is about 15 times faster than a parallel 8-core CPU-based approach also developed in this work. (author)

  14. An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms

    Directory of Open Access Journals (Sweden)

    B. Jayashree

    2007-01-01

    Full Text Available The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level.

  15. A Parallel Algebraic Multigrid Solver on Graphics Processing Units

    KAUST Repository

    Haase, Gundolf

    2010-01-01

    The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen node Infiniband cluster and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core. © 2010 Springer-Verlag.

  16. A primary study on the increasing of efficiency in the computer cooling system by means of external air

    Energy Technology Data Exchange (ETDEWEB)

    Kim, S. H.; Kim, M. H. [Silla University, Busan (Korea, Republic of)

    2009-07-01

    In recent years, as personal computers have continually grown in capability, with higher performance and high-quality, high-resolution graphics, the computer system's components have come to produce large amounts of heat during operation. This study investigates the ability and efficiency of the cooling system inside the computer by means of the Central Processing Unit (CPU) and power-supply cooling fans. The research aimed to increase the capability of the cooling system inside the computer by means of a structure which produces different air pressures in an air inflow tube. Consequently, when compared with a general personal computer, the temperatures of the tested CPU, the room inside the computer and the heat sink were lower by 5 °C, 2.5 °C and 7 °C, respectively. In addition, the fan speed was as low as 250 RPM after 1 hour of operation. This research explored the possibility of enhancing the effective cooling of high-performance computer systems.

  17. Child–Computer Interaction at the Beginner Stage of Music Learning: Effects of Reflexive Interaction on Children’s Musical Improvisation

    Science.gov (United States)

    Addessi, Anna Rita; Anelli, Filomena; Benghi, Diber; Friberg, Anders

    2017-01-01

    In this article children’s musical improvisation is investigated through the “reflexive interaction” paradigm. We used a particular system, the MIROR-Impro, implemented in the framework of the MIROR project (EC-FP7), which is able to reply to the child playing a keyboard by a “reflexive” output, mirroring (with repetitions and variations) her/his inputs. The study was conducted in a public primary school, with 47 children, aged 6–7. The experimental design used the convergence procedure, based on three sample groups allowing us to verify if the reflexive interaction using the MIROR-Impro is necessary and/or sufficient to improve the children’s abilities to improvise. The following conditions were used as independent variables: to play only the keyboard, the keyboard with the MIROR-Impro but with a not-reflexive reply, and the keyboard with the MIROR-Impro with a reflexive reply. As dependent variables we estimated the children’s ability to improvise in solos and in duets. Each child carried out a training program consisting of 5 weekly individual 12 min sessions. The control group played the complete package of independent variables; Experimental Group 1 played the keyboard and the keyboard with the MIROR-Impro with a not-reflexive reply; Experimental Group 2 played only the keyboard with the reflexive system. One week after, the children were asked to improvise a musical piece on the keyboard alone (Solo task), and in pairs with a friend (Duet task). Three independent judges assessed the Solo and the Duet tasks by means of a grid based on the TAI-Test for Ability to Improvise rating scale. The EG2, which trained only with the reflexive system, reached the highest average results, and the difference with EG1, which did not use the reflexive system, is statistically significant when the children improvise in a duet. The results indicate that in the sample of participants the reflexive interaction alone could be sufficient to increase the improvisational skills.

  18. Child-Computer Interaction at the Beginner Stage of Music Learning: Effects of Reflexive Interaction on Children's Musical Improvisation.

    Science.gov (United States)

    Addessi, Anna Rita; Anelli, Filomena; Benghi, Diber; Friberg, Anders

    2017-01-01

    In this article children's musical improvisation is investigated through the "reflexive interaction" paradigm. We used a particular system, the MIROR-Impro, implemented in the framework of the MIROR project (EC-FP7), which is able to reply to the child playing a keyboard by a "reflexive" output, mirroring (with repetitions and variations) her/his inputs. The study was conducted in a public primary school, with 47 children, aged 6-7. The experimental design used the convergence procedure, based on three sample groups allowing us to verify if the reflexive interaction using the MIROR-Impro is necessary and/or sufficient to improve the children's abilities to improvise. The following conditions were used as independent variables: to play only the keyboard, the keyboard with the MIROR-Impro but with a not-reflexive reply, and the keyboard with the MIROR-Impro with a reflexive reply. As dependent variables we estimated the children's ability to improvise in solos and in duets. Each child carried out a training program consisting of 5 weekly individual 12 min sessions. The control group played the complete package of independent variables; Experimental Group 1 played the keyboard and the keyboard with the MIROR-Impro with a not-reflexive reply; Experimental Group 2 played only the keyboard with the reflexive system. One week after, the children were asked to improvise a musical piece on the keyboard alone (Solo task), and in pairs with a friend (Duet task). Three independent judges assessed the Solo and the Duet tasks by means of a grid based on the TAI-Test for Ability to Improvise rating scale. The EG2, which trained only with the reflexive system, reached the highest average results, and the difference with EG1, which did not use the reflexive system, is statistically significant when the children improvise in a duet. The results indicate that in the sample of participants the reflexive interaction alone could be sufficient to increase the improvisational skills, and necessary

  19. AMITIS: A 3D GPU-Based Hybrid-PIC Model for Space and Plasma Physics

    Science.gov (United States)

    Fatemi, Shahab; Poppe, Andrew R.; Delory, Gregory T.; Farrell, William M.

    2017-05-01

    We have developed, for the first time, an advanced modeling infrastructure in space simulations (AMITIS) with an embedded three-dimensional self-consistent grid-based hybrid model of plasma (kinetic ions and fluid electrons) that runs entirely on graphics processing units (GPUs). The model uses NVIDIA GPUs and their associated parallel computing platform, CUDA, developed for general purpose processing on GPUs. The model uses a single CPU-GPU pair, where the CPU transfers data between the system and GPU memory, executes CUDA kernels, and writes simulation outputs on the disk. All computations, including moving particles, calculating macroscopic properties of particles on a grid, and solving hybrid model equations are processed on a single GPU. We explain various computing kernels within AMITIS and compare their performance with an already existing well-tested hybrid model of plasma that runs in parallel using multi-CPU platforms. We show that AMITIS runs ∼10 times faster than the parallel CPU-based hybrid model. We also introduce an implicit solver for computation of Faraday’s Equation, resulting in an explicit-implicit scheme for the hybrid model equation. We show that the proposed scheme is stable and accurate. We examine the AMITIS energy conservation and show that the energy is conserved with an error < 0.2% after 500,000 timesteps, even when a very low number of particles per cell is used.

  20. Efficient Scalable Median Filtering Using Histogram-Based Operations.

    Science.gov (United States)

    Green, Oded

    2018-05-01

    Median filtering is a smoothing technique for noise removal in images. While there are various implementations of median filtering for a single-core CPU, there are few implementations for accelerators and multi-core systems. Many parallel implementations of median filtering use a sorting algorithm for rearranging the values within a filtering window and taking the median of the sorted values. While using sorting algorithms allows for simple parallel implementations, the cost of the sorting becomes prohibitive as the filtering windows grow. This makes such algorithms, sequential and parallel alike, inefficient. In this work, we introduce the first software parallel median filtering that is non-sorting-based. The new algorithm uses efficient histogram-based operations. These reduce the computational requirements of the new algorithm while also accessing the image fewer times. We show an implementation of our algorithm for both the CPU and NVIDIA's CUDA supported graphics processing unit (GPU). The new algorithm is compared with several other leading CPU and GPU implementations. The CPU implementation has near perfect linear scaling with a speedup on a quad-core system. The GPU implementation is several orders of magnitude faster than the other GPU implementations for mid-size median filters. For small kernels, comparison-based approaches are preferable as fewer operations are required. Lastly, the new algorithm is open-source and can be found in the OpenCV library.
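
    The histogram idea can be shown in its simplest 1D form (Huang's classic sliding-window scheme): as the window slides, one value enters and one leaves a 256-bin histogram, and the median is re-found by a prefix count, so no sorting is ever done. This hedged sketch is only the scalar seed of the approach; the paper's contribution is the 2D, parallel generalization of such histogram operations.

        #include <cstdint>
        #include <vector>

        std::vector<uint8_t> median1d(const std::vector<uint8_t>& in, int radius)
        {
            int n = (int)in.size(), w = 2 * radius + 1;
            std::vector<uint8_t> out(n);
            int hist[256] = {0};
            auto clampi = [&](int i) { return i < 0 ? 0 : (i >= n ? n - 1 : i); };
            for (int k = -radius; k <= radius; ++k)
                hist[in[clampi(k)]]++;                     // seed first window
            for (int i = 0; i < n; ++i) {
                int count = 0, m = 0;
                while (count + hist[m] <= w / 2)           // prefix count up to
                    count += hist[m++];                    // the median bin
                out[i] = (uint8_t)m;
                hist[in[clampi(i - radius)]]--;            // slide the window:
                hist[in[clampi(i + radius + 1)]]++;        // one out, one in
            }
            return out;
        }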

  1. Bayer image parallel decoding based on GPU

    Science.gov (United States)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In photoelectrical tracking systems, Bayer images are traditionally decompressed with a CPU-based method. However, this is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processor Unit (GPU), which supports the CUDA architecture. The decoding procedure can be divided into three parts: the first is a serial part, the second is a task-parallelism part, and the last is a data-parallelism part including inverse quantization, inverse discrete wavelet transform (IDWT) and image post-processing. To reduce the execution time, the task-parallelism part is optimized with OpenMP techniques. The data-parallelism part gains its efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization and texture memory optimization. In particular, rewriting the 2D (two-dimensional) serial IDWT into a 1D parallel IDWT significantly speeds it up. In experiments with a 1K×1K×16bit Bayer image, the data-parallelism part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the serial CPU method.

  2. GPU based acceleration of first principles calculation

    International Nuclear Information System (INIS)

    Tomono, H; Tsumuraya, K; Aoki, M; Iitaka, T

    2010-01-01

    We present Graphics Processing Unit (GPU) accelerated simulations of first-principles electronic structure calculations. The FFT, which is the most time-consuming part, is accelerated by about a factor of 10. As a result, the total computation time of a first-principles calculation is reduced to 15 percent of that of the CPU.
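
    For context, offloading a 3D FFT of this kind to the GPU takes only a few lines with NVIDIA's cuFFT library. The sketch below (buffer name and grid sizes are placeholders, not from the paper) transforms an nx×ny×nz double-complex grid in place.

    ```cuda
    #include <cuda_runtime.h>
    #include <cufft.h>

    // In-place forward 3D FFT of a device buffer of nx*ny*nz
    // double-complex values (e.g. a wavefunction on the real-space grid).
    void fft3d_forward(cufftDoubleComplex *data, int nx, int ny, int nz)
    {
        cufftHandle plan;
        cufftPlan3d(&plan, nx, ny, nz, CUFFT_Z2Z);      // 3D complex-to-complex plan
        cufftExecZ2Z(plan, data, data, CUFFT_FORWARD);  // in-place forward transform
        cudaDeviceSynchronize();                        // wait for the transform to finish
        cufftDestroy(plan);
    }
    ```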

  3. AcEST: BP915442 [AcEST

    Lifescience Database Archive (English)

    Full Text Available BlastX result (TrEMBL): hit B8CPU2, tr|B8CPU2|B8CPU2_9GAMM, GGDEF domain protein, OS=Shewanella piezotolerans WP3. Sequences producing significant alignments: tr|B8CPU2|B8CPU2_9GAMM GGDEF domain protein OS=Shewanella piezotolerans WP3, score 33 bits, E-value 5.5.

  4. The effect of handedness on spatial and motor representation of pitch patterns in pianists.

    Directory of Open Access Journals (Sweden)

    Eline Adrianne Smit

    Full Text Available This study investigated the effect of handedness on pianists' ability to adjust their keyboard performance skills to new spatial and motor mappings. Left- and right-handed pianists practiced simple melodies on a regular MIDI piano keyboard (practice) and were then asked to perform them with modified melodic contours (the same or a reversed melodic contour, causing a change of fingering) and on a reversed MIDI piano keyboard (test). The difference in performance duration between the practice and test phases, as well as the number of errors played, were used as test measures. Overall, a stronger effect was observed for modified melodic contours than for the reversed keyboard. Furthermore, we observed a trend for left-handed pianists to be quicker and more accurate in playing melodies when reversing their fingering with reversed contours in their left-hand performances. This suggests that handedness may influence pianists' skill in adjusting to new spatial and motor mappings.

  5. Using GPU to calculate electron dose for hybrid pencil beam model

    International Nuclear Information System (INIS)

    Guo Chengjun; Li Xia; Hou Qing; Wu Zhangwen

    2011-01-01

    The hybrid pencil beam model (HPBM) offers an efficient approach to calculating the three-dimensional dose distribution from a clinical electron beam. Still, clinical radiation treatment demands an even faster treatment-planning process. Our work presents a fast implementation of HPBM-based electron dose calculation using a graphics processing unit (GPU). The HPBM algorithm was implemented in Compute Unified Device Architecture (CUDA) running on the GPU and in C running on the CPU, respectively. Several tests with various sizes of the field, beamlet, and voxel were used to evaluate our implementation. On an NVIDIA GeForce GTX470 GPU card, we achieved speedup factors of 2.18-98.23 with acceptable accuracy, compared with the results from a Pentium E5500 2.80 GHz dual-core CPU. (authors)

  6. Paediatric dose display

    International Nuclear Information System (INIS)

    Griffin, D.W.; Derges, S.; Hesslewood, S.

    1984-01-01

    A compact, inexpensive unit, based on an 8085 microprocessor, has been designed for calculating doses of intravenous radioactive injections for children. It has been used successfully for over a year. The dose is calculated from the body surface area and the result displayed in MBq. The operator can obtain the required dose on a twelve character alphanumeric display by entering the age of the patient and the adult dose using a hexadecimal keyboard. Circuit description, memory map and input/output, and firmware are dealt with. (U.K.)
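
    For illustration, the body-surface-area scaling that such a unit implements can be written as below. This is a generic sketch of the common rule, paediatric dose = adult dose × BSA / 1.73 m²; the actual age-to-BSA table and rounding used by the 8085 firmware are not given in the record, so the function takes BSA directly.

    ```cuda
    // Generic BSA scaling rule (illustrative; not the device's firmware).
    // bsa_m2 would come from the unit's internal age-to-BSA table.
    float paediatric_dose_mbq(float adult_dose_mbq, float bsa_m2)
    {
        const float ADULT_BSA_M2 = 1.73f;                 // reference adult BSA
        return adult_dose_mbq * bsa_m2 / ADULT_BSA_M2;    // scaled activity in MBq
    }
    ```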

  7. Construction of cost effective homebuilt spin coater for coating ...

    African Journals Online (AJOL)

    We report the construction of a cost-effective, low-power-consumption spin coater from a direct current (DC) brushless motor. The DC mechanical component is widely available in central processing unit (CPU) coolers. This setup permits simple operation where the DC voltage can be controlled manually in order to ...

  8. Development and Evaluation of Educational Materials for Embedded Systems to Increase the Learning Motivation

    Science.gov (United States)

    Koshino, Makoto; Kojima, Yuki; Kanedera, Noboru

    2013-01-01

    Educational materials for embedded systems are currently used in many educational institutions. However, they have difficulty arousing the interest of students. One of the reasons is that the low-performance CPU (central processing unit) in current materials cannot execute multimedia processing. In order to make the…

  9. Real time 3D structural and Doppler OCT imaging on graphics processing units

    Science.gov (United States)

    Sylwestrzak, Marcin; Szlag, Daniel; Szkulmowski, Maciej; Gorczyńska, Iwona; Bukowska, Danuta; Wojtkowski, Maciej; Targowski, Piotr

    2013-03-01

    In this report the application of graphics processing unit (GPU) programming to real-time 3D Fourier domain Optical Coherence Tomography (FdOCT) imaging, with implementation of Doppler algorithms for visualization of flow in capillary vessels, is presented. Generally, the processing time of FdOCT data on the computer's main processor (CPU) constitutes the main limitation for real-time imaging. Employing additional algorithms, such as Doppler OCT analysis, makes this processing even more time consuming. Recently developed GPUs, which offer very high computational power, provide a solution to this problem. Taking advantage of their massively parallel data processing allows real-time imaging in FdOCT. The presented software for structural and Doppler OCT performs complete processing and visualization of 2D data consisting of 2000 A-scans generated from 2048-pixel spectra at a frame rate of about 120 fps. 3D imaging in the same mode, for volume data built of 220 × 100 A-scans, is performed at a rate of about 8 frames per second. In this paper the software architecture, organization of the threads, and the optimizations applied are described. For illustration, screen shots recorded during real-time imaging of a phantom (a homogeneous water solution of Intralipid in a glass capillary) and the human eye in vivo are presented.

  10. Micron system for automatization and analysis of measurements in nuclear photoemulsion

    International Nuclear Information System (INIS)

    Dajon, M.I.; Kotel'nikov, K.A.; Martynov, A.G.; Rappoport, V.M.; Smirnitskij, V.A.; Ozerskij, M.A.

    1987-01-01

    The automated ''Micron'' system, designed for measuring, processing and analyzing events in nuclear photoemulsion, is described. The flowsheets of the device and program packages for searching for neutrino interactions in nuclear photoemulsion and plotting target diagrams in X-ray emulsion chambers are presented. The ''Micron'' system consists of the following functional units: a three-coordinate measuring microscope MPEh-11 combined with a coordinate recording unit, designed for measuring coordinates of grains in the emulsion and displaying them on a peripheral; a control unit based on an ''Elektronika-60'' microcomputer; a controller KK-60 for connecting the CAMAC highway; and an analog-to-digital display with a keyboard. The PDP-11/70 is the basic computer. The production of a charmed Λc+ baryon followed by the Λc+ → Σ+π+π− decay, observed in nuclear photoemulsion, is described.

  11. 16-Bit RISC Processor Design for Convolution Application

    OpenAIRE

    Anand Nandakumar Shardul

    2013-01-01

    In this project, we propose a 16-bit non-pipelined RISC processor for signal processing applications. The processor consists of the following blocks: program counter, clock control unit, ALU, IDU, and registers. Advantageous architectural modifications have been made to the incrementer circuit used in the program counter and to the carry-select adder unit of the ALU in the RISC CPU core. Furthermore, a high-speed, low-power modified multiplier has been designed and introduced in ...

  12. The Ambiguity of Musical Expression Marks and the Challenges of ...

    African Journals Online (AJOL)

    UJAH: Unizik Journal of Arts and Humanities ... Piano-keyboard education is still a 'tender' art in Nigerian higher institutions where most learners start at a relatively very late age (17-30 yrs) and so, it becomes burdensome and sometimes unproductive to encumber the undergraduate piano-keyboard student with a plethora ...

  13. The Photon Shell Game and the Quantum von Neumann Architecture with Superconducting Circuits

    Science.gov (United States)

    Mariantoni, Matteo

    2012-02-01

    Superconducting quantum circuits have made significant advances over the past decade, allowing more complex and integrated circuits that perform with good fidelity. We have recently implemented a machine comprising seven quantum channels, with three superconducting resonators, two phase qubits, and two zeroing registers. I will explain the design and operation of this machine, first showing how a single microwave photon |1⟩ can be prepared in one resonator and coherently transferred between the three resonators. I will also show how more exotic states such as double photon states |2⟩ and superposition states |0⟩ + |1⟩ can be shuffled among the resonators as well [1]. I will then demonstrate how this machine can be used as the quantum-mechanical analog of the von Neumann computer architecture, which for a classical computer comprises a central processing unit and a memory holding both instructions and data. The quantum version comprises a quantum central processing unit (quCPU) that exchanges data with a quantum random-access memory (quRAM) integrated on one chip, with instructions stored on a classical computer. I will also present a proof-of-concept demonstration of a code that involves all seven quantum elements: (1) preparing an entangled state in the quCPU, (2) writing it to the quRAM, (3) preparing a second state in the quCPU, (4) zeroing it, and (5) reading out the first state stored in the quRAM [2]. Finally, I will demonstrate that the quantum von Neumann machine provides one unit cell of a two-dimensional qubit-resonator array that can be used for surface code quantum computing. This will allow the realization of a scalable, fault-tolerant quantum processor with the most forgiving error rates to date. [1] M. Mariantoni et al., Nature Physics 7, 287-293 (2011). [2] M. Mariantoni et al., Science 334, 61-65 (2011).

  14. Microbiological contamination in digital radiography: evaluation at the radiology clinic of an educational institution.

    Science.gov (United States)

    Malta, Cristiana P; Damasceno, Naiana Nl; Ribeiro, Rosangela A; Silva, Carolina Sf; Devito, Karina L

    2016-12-01

    The aim of this study was to evaluate the contamination rate of intra- and extraoral digital X-ray equipment in a dental radiology clinic at a public educational institution. Samples were collected on three different days, at two times each day: in the morning, before attending patients, and at the end of the day, after appointment hours and before cleaning and disinfection procedures. Samples were collected from the periapical X-ray machine (tube head, positioning device, control panel and activator button), the panoramic X-ray machine (temporal support, bite block, control panel and activator button), the intraoral digital system (sensor), and the digital system computers (keyboard and mouse). The samples were seeded in different culture media and incubated, and colony-forming units (CFU/mL) were counted. Biochemical tests were performed for suspected colonies of Staphylococcus, Streptococcus and Gram-negative bacilli (GNB). Fungi were visually differentiated into filamentous fungi and yeasts. The results indicated the growth of fungi and Staphylococcus from all sampling locations. GNB growth was observed at all sites sampled from the intraoral X-ray equipment. On the panoramic unit, GNB growth was observed in samples from the activator button, keyboard and mouse. In general, a higher number of CFU/mL was present before use. It can be concluded that more stringent protocols are needed to control infection and prevent X-ray exams from acting as a vehicle for cross-contamination. Sociedad Argentina de Investigación Odontológica.

  15. Symplectic multi-particle tracking on GPUs

    Science.gov (United States)

    Liu, Zhicong; Qiang, Ji

    2018-05-01

    A symplectic multi-particle tracking model has been implemented on Graphics Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) language. The symplectic tracking model preserves phase-space structure and reduces non-physical effects in long-term simulation, which is important for beam property evaluation in particle accelerators. Though this model is computationally expensive, it is very suitable for parallelization and can be accelerated significantly using GPUs. In this paper, we optimized the implementation of the symplectic tracking model on both single and multiple GPUs. Using a single GPU processor, the code achieves a factor of 2-10 speedup for a range of problem sizes compared with the time on a single state-of-the-art Central Processing Unit (CPU) node with similar power consumption and semiconductor technology. It also shows good scalability on a multi-GPU cluster at the Oak Ridge Leadership Computing Facility. In an application to beam dynamics simulation, the GPU implementation saves more than a factor of two in total computing time compared to the CPU implementation.
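
    The structure-preserving property comes from composing exactly symplectic sub-maps. A minimal single-particle sketch is a drift followed by a thin-lens kick, each thread owning one macro-particle (hypothetical names; the paper's maps involve the full 6D dynamics, and this fragment only illustrates the drift-kick pattern).

    ```cuda
    // One drift-kick step in a 1D transverse phase space (x, px).
    // Each sub-map is an exact symplectic shear, so their composition
    // preserves phase-space area by construction.
    __global__ void drift_kick(int n, float L, float k, float *x, float *px)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        x[i]  += L * px[i];     // drift of length L
        px[i] += -k * x[i];     // thin-lens focusing kick at the drifted position
    }
    ```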

  16. GPU accelerated generation of digitally reconstructed radiographs for 2-D/3-D image registration.

    Science.gov (United States)

    Dorgham, Osama M; Laycock, Stephen D; Fisher, Mark H

    2012-09-01

    Recent advances in programming languages for graphics processing units (GPUs) provide developers with a convenient way of implementing applications which can be executed on the CPU and GPU interchangeably. GPUs are becoming relatively cheap, powerful, and widely available hardware components which can be used to perform intensive calculations. The last decade of hardware performance development shows that GPU-based computation is progressing significantly faster than CPU-based computation, particularly if one considers the execution of highly parallelisable algorithms, and predictions indicate that this trend is likely to continue. In this paper, we introduce a way of accelerating 2-D/3-D image registration by developing a hybrid system which executes on the CPU and utilizes the GPU for parallelizing the generation of digitally reconstructed radiographs (DRRs). Given the advancement of the GPU over the CPU, it is timely to exploit the benefits of many-core GPU technology by developing algorithms for DRR generation. Although some previous work has investigated the rendering of DRRs using the GPU, this paper investigates approximations which reduce the computational overhead while still maintaining a quality consistent with that needed for 2-D/3-D registration with sufficient accuracy to be clinically acceptable in certain applications of radiation oncology. Furthermore, by comparing implementations of 2-D/3-D registration on the CPU and GPU, we investigate current performance and propose an optimal framework for PC implementations addressing the rigid registration problem. Using this framework, we are able to render DRR images from a 256×256×133 CT volume in ~24 ms using an NVIDIA GeForce 8800 GTX and in ~2 ms using an NVIDIA GeForce GTX 580. In addition to applications requiring fast automatic patient setup, these levels of performance suggest that image-guided radiation therapy at video frame rates is technically feasible using a relatively low-cost PC.
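
    The parallelism that makes DRR generation GPU-friendly is one-thread-per-detector-pixel ray marching. The kernel below is a simplified sketch, not the paper's optimized renderer: the geometry is idealized (detector plane just behind the volume, nearest-neighbour sampling instead of interpolation) and all names are hypothetical.

    ```cuda
    // Each thread integrates CT values along one source-to-pixel ray and
    // applies Beer-Lambert attenuation to produce the DRR intensity.
    __global__ void render_drr(const float *ct, int nx, int ny, int nz,
                               float *drr, int w, int h,
                               float sx, float sy, float sz,   // source position (voxel units)
                               float step, int nsteps, float mu_scale)
    {
        int px = blockIdx.x * blockDim.x + threadIdx.x;
        int py = blockIdx.y * blockDim.y + threadIdx.y;
        if (px >= w || py >= h) return;

        // Ray direction from the source towards this detector pixel (z = nz plane)
        float dx = px - sx, dy = py - sy, dz = nz - sz;
        float inv = rsqrtf(dx * dx + dy * dy + dz * dz);
        dx *= inv; dy *= inv; dz *= inv;

        float sum = 0.0f;
        for (int s = 0; s < nsteps; ++s) {
            int ix = (int)(sx + s * step * dx);
            int iy = (int)(sy + s * step * dy);
            int iz = (int)(sz + s * step * dz);
            if (ix >= 0 && ix < nx && iy >= 0 && iy < ny && iz >= 0 && iz < nz)
                sum += ct[(iz * ny + iy) * nx + ix];   // accumulate along the ray
        }
        drr[py * w + px] = expf(-mu_scale * step * sum);  // attenuated intensity
    }
    ```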

  17. Optimizing The Performance of Streaming Numerical Kernels On The IBM Blue Gene/P PowerPC 450

    KAUST Repository

    Malas, Tareq

    2011-07-01

    Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the solution of partial differential equations, represents a formidable challenge despite the regularity of memory access. Sophisticated optimization techniques beyond the capabilities of modern compilers are required to fully utilize the Central Processing Unit (CPU). The aim of the work presented here is to improve the performance of streaming numerical kernels on high performance architectures by developing efficient algorithms to utilize the vectorized floating point units. The importance of development time demands the creation of tools to enable simple yet direct development in assembly to utilize the power-efficient cores featuring in-order execution and multiple-issue units. We implement several stencil kernels for a variety of cached memory scenarios using our Python instruction simulation and generation tool. Our technique simplifies the development of efficient assembly code for the IBM Blue Gene/P supercomputer's PowerPC 450. This enables us to perform high-level design, construction, verification, and simulation on a subset of the CPU's instruction set. Our framework has the capability to implement streaming numerical kernels on current and future high performance architectures. Finally, we present several automatically generated implementations, including a 27-point stencil achieving a 1.7x speedup over the best previously published results.

  18. Performance Analysis of FEM Algorithms on GPU and Many-Core Architectures

    KAUST Repository

    Khurram, Rooh

    2015-04-27

    The roadmaps of the leading supercomputer manufacturers are based on hybrid systems, which consist of a mix of conventional processors and accelerators. This trend is mainly due to the fact that the power consumption cost of future CPU-only exascale systems would be unsustainable; thus accelerators such as graphics processing units (GPUs) and many-integrated-core (MIC) processors will likely be an integral part of the TOP500 (http://www.top500.org/) supercomputers beyond 2020. The emerging supercomputer architecture will bring new challenges for code developers. Continuum mechanics codes will be particularly affected, because the traditional synchronous implicit solvers will probably not scale on hybrid exascale machines. In a previous study [1], we reported on the performance of a conjugate gradient based mesh motion algorithm [2] on Sandy Bridge, Xeon Phi, and K20c. In the present study we report on a comparative study of finite element codes, using PETSc and AmgX solvers on CPUs and GPUs, respectively [3,4]. We believe this study will be a good starting point for FEM code developers who are contemplating a CPU-to-accelerator transition.

  19. BarraCUDA - a fast short read sequence aligner using graphics processing units

    Directory of Open Access Journals (Sweden)

    Klus Petr

    2012-01-01

    Full Text Available Abstract Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General-purpose computing on graphics processing units (GPGPU) extracts computing power from the hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of its massive parallelism. As a result, BarraCUDA offers an order-of-magnitude performance boost in alignment throughput compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of the GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net

  20. GAMER: A GRAPHIC PROCESSING UNIT ACCELERATED ADAPTIVE-MESH-REFINEMENT CODE FOR ASTROPHYSICS

    International Nuclear Information System (INIS)

    Schive, H.-Y.; Tsai, Y.-C.; Chiueh Tzihong

    2010-01-01

    We present the newly developed code, GPU-accelerated Adaptive-MEsh-Refinement code (GAMER), which adopts a novel approach in improving the performance of adaptive-mesh-refinement (AMR) astrophysical simulations by a large factor with the use of the graphic processing unit (GPU). The AMR implementation is based on a hierarchy of grid patches with an oct-tree data structure. We adopt a three-dimensional relaxing total variation diminishing scheme for the hydrodynamic solver and a multi-level relaxation scheme for the Poisson solver. Both solvers have been implemented in GPU, by which hundreds of patches can be advanced in parallel. The computational overhead associated with the data transfer between the CPU and GPU is carefully reduced by utilizing the capability of asynchronous memory copies in GPU, and the computing time of the ghost-zone values for each patch is diminished by overlapping it with the GPU computations. We demonstrate the accuracy of the code by performing several standard test problems in astrophysics. GAMER is a parallel code that can be run in a multi-GPU cluster system. We measure the performance of the code by performing purely baryonic cosmological simulations in different hardware implementations, in which detailed timing analyses provide comparison between the computations with and without GPU(s) acceleration. Maximum speed-up factors of 12.19 and 10.47 are demonstrated using one GPU with 4096³ effective resolution and 16 GPUs with 8192³ effective resolution, respectively.

  2. BPU Simulator

    DEFF Research Database (Denmark)

    Rehr, Martin; Skovhede, Kenneth; Vinter, Brian

    2013-01-01

    in that process. Our goal is to support all execution platforms, and in this work we introduce the Bohrium Processing Unit (BPU), which will be the FPGA backend for Bohrium. The BPU is modeled as a PyCSP application, and the clear advantages of using CSP for simulating a new CPU are described. The current Py...

  3. Typewriting 10-20-30. Senior High School Teacher Resource Manual.

    Science.gov (United States)

    Alberta Dept. of Education, Edmonton. Curriculum Branch.

    This manual is designed to assist typewriting teachers in the implementation of the Alberta, Canada, Typewriting 10-20-30 Curriculum (1985). Many ideas contained in the handbook can also be used in curricula that address keyboarding or the skill of typing in computer keyboarding, dictaphone typing, or word processing. The manual is organized in…

  4. Speech Recognition Technology for Disabilities Education

    Science.gov (United States)

    Tang, K. Wendy; Kamoua, Ridha; Sutan, Victor; Farooq, Omer; Eng, Gilbert; Chu, Wei Chern; Hou, Guofeng

    2005-01-01

    Speech recognition is an alternative to traditional methods of interacting with a computer, such as textual input through a keyboard. An effective system can replace or reduce reliance on standard keyboard and mouse input. This can especially assist dyslexic students who have problems with character or word use and manipulation in a textual…

  5. Neural Point-and-Click Communication by a Person With Incomplete Locked-In Syndrome.

    Science.gov (United States)

    Bacher, Daniel; Jarosiewicz, Beata; Masse, Nicolas Y; Stavisky, Sergey D; Simeral, John D; Newell, Katherine; Oakley, Erin M; Cash, Sydney S; Friehs, Gerhard; Hochberg, Leigh R

    2015-06-01

    A goal of brain-computer interface research is to develop fast and reliable means of communication for individuals with paralysis and anarthria. We evaluated the ability of an individual with incomplete locked-in syndrome, enrolled in the BrainGate Neural Interface System pilot clinical trial, to communicate using neural point-and-click control. A general-purpose interface was developed to provide control of a computer cursor in tandem with one of two on-screen virtual keyboards. The novel BrainGate Radial Keyboard was compared to a standard QWERTY keyboard in a balanced copy-spelling task. The Radial Keyboard yielded a significant improvement in typing accuracy and speed, enabling typing rates over 10 correct characters per minute. The participant used this interface to communicate face-to-face with research staff by using text-to-speech conversion, and remotely using an internet chat application. This study demonstrates the first use of an intracortical brain-computer interface for neural point-and-click communication by an individual with incomplete locked-in syndrome. © The Author(s) 2014.

  6. Factors influencing hand/eye synchronicity in the computer age.

    Science.gov (United States)

    Grant, A H

    1992-09-01

    In using a computer, the relation of vision to hand/finger-actuated keyboard usage in performing fine motor-coordinated functions is influenced by the physical location, size, and collective placement of the keys. Traditional nonprehensile flat/rectangular keyboard layouts usually require a high and nearly constant level of visual attention. Biometrically shaped keyboards would allow for prehensile hand-posturing, thus affording better tactile familiarity with the keys, requiring a less intense and less constant level of visual attention to the task, and providing a greater measure of freedom from having to visualize the key(s). Work pace and related physiological changes, aging, onset of monocularization (intermittent lapsing of binocularity for near vision) that accompanies presbyopia, tool colors, and background contrast are factors affecting constancy of visual attention to task performance. Capitis extension, excessive excyclotorsion, and repetitive strain injuries (such as carpal tunnel syndrome) are common and debilitating concomitants of computer usage. These problems can be remedied by improved keyboard design. The salutary role of mnemonics in minimizing visual dependency is discussed.

  7. Significantly reducing registration time in IGRT using graphics processing units

    DEFF Research Database (Denmark)

    Noe, Karsten Østergaard; Denis de Senneville, Baudouin; Tanderup, Kari

    2008-01-01

    respiration phases in a free breathing volunteer and 41 anatomical landmark points in each image series. The registration method used is a multi-resolution GPU implementation of the 3D Horn and Schunck algorithm. It is based on the CUDA framework from NVIDIA. Results On an Intel Core 2 CPU at 2.4 GHz each...... registration took 30 minutes. On an NVIDIA GeForce 8800 GTX GPU in the same machine this registration took 37 seconds, making the GPU version 48.7 times faster. The nine image series of different respiration phases were registered to the same reference image (full inhale). Accuracy was evaluated on landmark
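
    For reference, the per-point Horn and Schunck update that such a GPU implementation parallelizes has the form below (a 2D, single-resolution sketch with hypothetical names; the paper's version is 3D and multi-resolution). ix, iy, it are the image gradients and ubar, vbar the local averages of the current flow estimate.

    ```cuda
    // One Jacobi-style Horn-Schunck iteration, one thread per pixel.
    __global__ void hs_update(int n, float alpha2,
                              const float *ix, const float *iy, const float *it,
                              const float *ubar, const float *vbar,
                              float *u, float *v)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        // How strongly the brightness-constancy residual at this point
        // pushes the flow away from the local average.
        float r = (ix[i] * ubar[i] + iy[i] * vbar[i] + it[i])
                / (alpha2 + ix[i] * ix[i] + iy[i] * iy[i]);
        u[i] = ubar[i] - ix[i] * r;
        v[i] = vbar[i] - iy[i] * r;
    }
    ```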

  8. Fast-GPU-PCC: A GPU-Based Technique to Compute Pairwise Pearson's Correlation Coefficients for Time Series Data-fMRI Study.

    Science.gov (United States)

    Eslami, Taban; Saeed, Fahad

    2018-04-20

    Functional magnetic resonance imaging (fMRI) is a non-invasive brain imaging technique which has been used regularly for studying the brain's functional activities in the past few years. A well-used measure for capturing functional associations in the brain is Pearson's correlation coefficient, which is widely used for constructing functional networks and studying dynamic functional connectivity of the brain. These are useful measures for understanding the effects of brain disorders on connectivity among brain regions. fMRI scanners produce a huge number of voxels, and using traditional central processing unit (CPU)-based techniques for computing pairwise correlations is very time consuming, especially when a large number of subjects are being studied. In this paper, we propose a graphics processing unit (GPU)-based algorithm called Fast-GPU-PCC for computing pairwise Pearson's correlation coefficients. Based on the symmetric property of Pearson's correlation, this approach returns the N(N−1)/2 correlation coefficients located in the strictly upper-triangular part of the correlation matrix. Storing the correlations in a one-dimensional array, in the order proposed in this paper, is useful for further processing. Our experiments on real and synthetic fMRI data for different numbers of voxels and varying lengths of time series show that the proposed approach outperformed state-of-the-art GPU-based techniques as well as the sequential CPU-based versions. We show that Fast-GPU-PCC runs 62 times faster than the CPU-based version and about 2 to 3 times faster than two other state-of-the-art GPU-based methods.
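
    The one-dimensional layout described above has a closed-form index: for a pair (i, j) with i < j in an n×n matrix, rows 0..i−1 contribute (n−1) + (n−2) + ... entries, after which (j−i−1) is the offset within row i. A sketch of the mapping (names are ours, not the paper's):

    ```cuda
    // Flat index of pair (i, j), i < j, in strictly-upper-triangular
    // storage of an n x n symmetric matrix (array length n*(n-1)/2).
    __host__ __device__ inline long long pair_index(long long i, long long j, long long n)
    {
        return i * n - i * (i + 1) / 2 + (j - i - 1);
    }
    ```

    A useful design note: once each time series is z-normalized, Pearson's correlation of a pair reduces to a dot product of the two series, which is why dense GPU matrix kernels map onto this problem so well.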

  9. Development of parallel GPU based algorithms for problems in nuclear area

    International Nuclear Information System (INIS)

    Almeida, Adino Americo Heimlich

    2009-01-01

    Graphics Processing Units (GPUs) are high-performance co-processors originally intended to improve the use and quality of computer graphics applications. Once researchers and practitioners realized the potential of using GPUs for general-purpose computation, their application was extended to fields beyond computer graphics. The main objective of this work is to evaluate the impact of using GPUs in two typical problems of the nuclear area: neutron transport simulation using the Monte Carlo method, and the solution of the heat equation in a two-dimensional domain by the finite-difference method. To achieve this, we developed parallel algorithms for GPU and CPU for the two problems described above. The comparison showed that the GPU-based approach is faster than the CPU on a computer with two quad-core processors, without loss of precision. (author)
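
    The second test problem maps naturally to one thread per grid point. Below is a minimal sketch of an explicit finite-difference step for the 2D heat equation (a generic textbook kernel, not the thesis code), where r = k·Δt/h² must satisfy r ≤ 0.25 for stability.

    ```cuda
    // One explicit time step of the 2D heat equation on an nx*ny grid;
    // boundary points are left untouched (fixed boundary conditions).
    __global__ void heat_step(int nx, int ny, float r,
                              const float *t_old, float *t_new)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if (i <= 0 || i >= nx - 1 || j <= 0 || j >= ny - 1) return;
        int c = j * nx + i;
        t_new[c] = t_old[c] + r * (t_old[c - 1] + t_old[c + 1]
                                 + t_old[c - nx] + t_old[c + nx] - 4.0f * t_old[c]);
    }
    ```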

  11. Bioinspired, Mobile Robots With High Stability, Functionality and Low Cost

    Science.gov (United States)

    2014-02-19

    system. We pressurized the actuators at a flow rate (0.07614 ml/s) slow enough to avoid dynamic pressure effects (i.e., quasi-static behavior) to... "Mary Had a Little Lamb" on a keyboard (Fig 11A). The four actuators are fixed to a keyboard using Velcro and actuated with 15 PSI controlled by

  12. The Performance Improvement of the Lagrangian Particle Dispersion Model (LPDM) Using Graphics Processing Unit (GPU) Computing

    Science.gov (United States)

    2017-08-01

    used for its GPU computing capability during the experiment. It has NVIDIA Tesla K40 GPU accelerators containing 32 GPU nodes consisting of 1024... cores. CUDA is a parallel computing platform and application programming interface (API) model created and designed by NVIDIA to give direct...

  13. Second-generation 1024-channel portable gamma-ray spectrometer

    International Nuclear Information System (INIS)

    McGibbon, A.L.

    1976-01-01

    Following the successful design in 1974 of a 256-channel battery-powered pulse-height analyzer system, we have completed a second-generation analyzer with advanced features, lighter weight, and more rugged construction. The 17-kg analyzer includes a NaI detector and is packaged as a small suitcase; it has high stability and accuracy to allow use over the temperature range from −30 to +70 °C. The waterproof unit has many features not found on any commercial unit, allowing sophisticated analysis by non-electronics-oriented personnel. Its 36-button keyboard allows manipulation of multiple spectra, integrations, and an expanded energy scale with readout in keV. If its self-contained SX70 display camera is not sufficient for record keeping, the unit will telemeter all data onto analog tape or send them to a remote computer via phone coupler.

  14. Personal risk factors for carpal tunnel syndrome in female visual display unit workers

    Directory of Open Access Journals (Sweden)

    Matteo Riccò

    2016-12-01

    Full Text Available Objectives: Carpal tunnel syndrome (CTS) is the most common nerve entrapment syndrome, which since the beginning of the seventies has been linked to the keyboard and visual display unit (VDU). The objective of this study was to investigate the prevalence and personal factors associated with CTS in female VDU workers in Italy. Material and Methods: Participants in this study were female adult subjects, working ≥ 20 h/week (N = 631, mean age 38.14±7.81 years, mean working age 12.9±7.24 years). Signs and symptoms were collected during compulsory occupational medical surveillance. Binary logistic regression was used to estimate adjusted odds ratios for the factors of interest. Results: Diagnosis of CTS was reported in 48 cases (7.6%, 11 of them or 1.7% after a surgical correction) for an incidence of 5.94/1000 person-years. In general, signs and symptoms of CTS were associated with the following demographic factors: previous trauma of the upper limb (adjusted odds ratio (ORa) = 8.093, 95% confidence interval (CI): 2.347–27.904), history (> 5 years) of oral contraceptives therapy/hormone replacement therapy (ORa = 3.77, 95% CI: 1.701–8.354) and cervical spine signs/symptoms (ORa = 4.565, 95% CI: 2.281–9.136). Conclusions: The prevalence of CTS was similar to the estimates for the general population of Italy. Among personal risk factors, hormone therapy, previous trauma of the upper limb and signs/symptoms of the cervical spine appeared to be associated with a higher risk of CTS. Eventually, the results reinforce the interpretation of CTS in VDU workers as a work-related musculoskeletal disorder rather than a classical occupational disease. Int J Occup Med Environ Health 2016;29(6):927–936

  15. Personal risk factors for carpal tunnel syndrome in female visual display unit workers.

    Science.gov (United States)

    Riccò, Matteo; Cattani, Silvia; Signorelli, Carlo

    2016-11-18

    Carpal tunnel syndrome (CTS) is the most common nerve entrapment syndrome, which since the beginning of the seventies has been linked to the keyboard and visual display unit (VDU). The objective of this study was to investigate the prevalence and personal factors associated with CTS in female VDU workers in Italy. Participants in this study were female adult subjects, working ≥ 20 h/week (N = 631, mean age 38.14±7.81 years, mean working age 12.9±7.24 years). Signs and symptoms were collected during compulsory occupational medical surveillance. The binary logistic regression was used to estimate adjusted odds ratios for the factors of interest. Diagnosis of CTS was reported in 48 cases (7.6%, 11 of them or 1.7% after a surgical correction) for the incidence of 5.94/1000 person-years. In general, signs and symptoms of CTS were associated with the following demographic factors: previous trauma of upper limb (adjusted odds ratio (ORa) = 8.093, 95% confidence interval (CI): 2.347-27.904), history (> 5 years) of oral contraceptives therapy/hormone replacement therapy (ORa = 3.77, 95% CI: 1.701-8.354) and cervical spine signs/symptoms (ORa = 4.565, 95% CI: 2.281-9.136). The prevalence of CTS was similar to the estimates for the general population of Italy. Among personal risk factors, hormone therapy, previous trauma of the upper limb and signs/symptoms of the cervical spine appeared to be associated with a higher risk of CTS syndrome. Eventually, the results reinforce interpretation of CTS in VDU workers as a work-related musculoskeletal disorder rather than a classical occupational disease. Int J Occup Med Environ Health 2016;29(6):927-936. This work is available in Open Access model and licensed under a CC BY-NC 3.0 PL license.

  16. Graphics processing unit accelerated three-dimensional model for the simulation of pulsed low-temperature plasmas

    Energy Technology Data Exchange (ETDEWEB)

    Fierro, Andrew, E-mail: andrew.fierro@ttu.edu; Dickens, James; Neuber, Andreas [Center for Pulsed Power and Power Electronics, Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, Texas 79409 (United States)

    2014-12-15

    A 3-dimensional particle-in-cell/Monte Carlo collision simulation that is fully implemented on a graphics processing unit (GPU) is described and used to determine low-temperature plasma characteristics at high reduced electric field, E/n, in nitrogen gas. Details of implementation on the GPU using the NVIDIA Compute Unified Device Architecture framework are discussed with respect to efficient code execution. The software is capable of tracking around 10 × 10⁶ particles with dynamic weighting and a total mesh size larger than 10⁸ cells. Verification of the simulation is performed by comparing the electron energy distribution function and plasma transport parameters to known Boltzmann Equation (BE) solvers. Under the assumption of a uniform electric field and neglecting the build-up of positive ion space charge, the simulation agrees well with the BE solvers. The model is utilized to calculate plasma characteristics of a pulsed, parallel plate discharge. A photoionization model provides the simulation with additional electrons after the initial seeded electron density has drifted towards the anode. Comparison of the performance benefits between the GPU-implementation versus a CPU-implementation is considered, and a speed-up factor of 13 for a 3D relaxation Poisson solver is obtained. Furthermore, a factor 60 speed-up is realized for parallelization of the electron processes.
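
    The Monte Carlo collision stage of a PIC/MCC step illustrates the per-electron work being parallelized. The hedged sketch below (hypothetical names; cross sections assumed precomputed per particle, RNG states assumed initialized elsewhere with curand_init) tests each electron for a collision during dt with probability P = 1 − exp(−n·σ·v·dt), using cuRAND for the random draw.

    ```cuda
    #include <curand_kernel.h>

    // Null-collision style test: sigma_v[i] holds sigma(v)*v for particle i,
    // n_gas is the neutral gas density, and collided[i] flags a collision.
    __global__ void collision_test(int n, float dt, float n_gas,
                                   const float *sigma_v,
                                   curandState *rng, int *collided)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float p = 1.0f - expf(-n_gas * sigma_v[i] * dt);   // collision probability in dt
        collided[i] = (curand_uniform(&rng[i]) < p) ? 1 : 0;
    }
    ```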

  17. The AMchip04 and the processing unit prototype for the FastTracker

    International Nuclear Information System (INIS)

    Andreani, A; Alberti, F; Stabile, A; Annovi, A; Beretta, M; Volpi, G; Bogdan, M; Shochet, M; Tang, J; Tompkins, L; Citterio, M; Giannetti, P; Lanza, A; Magalotti, D; Piendibene, M

    2012-01-01

    Modern experiments search for extremely rare processes hidden in much larger background levels. As the experiment's complexity, accelerator backgrounds, and luminosity increase, we need increasingly complex and exclusive event selection. We present the first prototype of a new Processing Unit (PU), the core of the FastTracker processor (FTK). FTK is a real-time tracking device for the ATLAS experiment's trigger upgrade. The computing power of the PU is such that a few hundred of them will be able to reconstruct all the tracks with transverse momentum above 1 GeV/c in ATLAS events up to Phase II instantaneous luminosities (3 × 10³⁴ cm⁻² s⁻¹) with an event input rate of 100 kHz and a latency below a hundred microseconds. The PU provides massive computing power to minimize the online execution time of complex tracking algorithms. The time-consuming pattern recognition problem, generally referred to as the ''combinatorial challenge'', is solved by the Associative Memory (AM) technology, which exploits parallelism to the maximum extent: it compares the event to all pre-calculated ''expectations'' or ''patterns'' (pattern matching) simultaneously, looking for candidate tracks called ''roads''. This approach reduces the typical exponential complexity of CPU-based algorithms to linear behavior. Pattern recognition is completed by the time data are loaded into the AM devices. We report on the design of the first Processing Unit prototypes. The design had to address the most challenging aspects of this technology: a huge number of detector clusters (''hits'') must be distributed at high rate with very large fan-out to all patterns (10 million patterns will be located on 128 chips placed on a single board), and a huge number of roads must be collected and sent back to the FTK post-pattern-recognition functions. A network of high-speed serial links is used to solve the data distribution problem.

  18. GPU-accelerated automatic identification of robust beam setups for proton and carbon-ion radiotherapy

    International Nuclear Information System (INIS)

    Ammazzalorso, F; Jelen, U; Bednarz, T

    2014-01-01

    We demonstrate acceleration on graphic processing units (GPU) of automatic identification of robust particle therapy beam setups, minimizing negative dosimetric effects of Bragg peak displacement caused by treatment-time patient positioning errors. Our particle therapy research toolkit, RobuR, was extended with OpenCL support and used to implement calculation on GPU of the Port Homogeneity Index, a metric scoring irradiation port robustness through analysis of tissue density patterns prior to dose optimization and computation. Results were benchmarked against an independent native CPU implementation. Numerical results were in agreement between the GPU implementation and native CPU implementation. For 10 skull base cases, the GPU-accelerated implementation was employed to select beam setups for proton and carbon ion treatment plans, which proved to be dosimetrically robust, when recomputed in presence of various simulated positioning errors. From the point of view of performance, average running time on the GPU decreased by at least one order of magnitude compared to the CPU, rendering the GPU-accelerated analysis a feasible step in a clinical treatment planning interactive session. In conclusion, selection of robust particle therapy beam setups can be effectively accelerated on a GPU and become an unintrusive part of the particle therapy treatment planning workflow. Additionally, the speed gain opens new usage scenarios, like interactive analysis manipulation (e.g. constraining of some setup) and re-execution. Finally, through OpenCL portable parallelism, the new implementation is suitable also for CPU-only use, taking advantage of multiple cores, and can potentially exploit types of accelerators other than GPUs.

  20. GPU: the biggest key processor for AI and parallel processing

    Science.gov (United States)

    Baji, Toru

    2017-07-01

    Two types of processors exist in the market. One is the conventional CPU and the other is the Graphics Processing Unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. The CPU is good for sequential processing, while the GPU is good at accelerating software with heavy parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when GPUs started to adopt general-purpose cores, it was noticed that this architecture could be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs came into wide use in workstations and supercomputers. Recently two key technologies are highlighted in the industry: artificial intelligence (AI) and autonomous driving cars. AI requires massively parallel operations to train many layers of neural networks. With the CPU alone, it was impossible to finish the training in a practical time; the latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization and path-planning processing, and again SoCs with integrated GPUs will play a key role there. In this paper, the evolution of the GPU, one of the biggest commercial devices requiring state-of-the-art fabrication technology, is introduced, together with an overview of the key applications demanding GPUs described above.
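
    The CPU/GPU contrast drawn here is conventionally illustrated with the canonical CUDA example (a generic sketch, not from the paper): the body of a sequential loop becomes a kernel, and each of the thousands of GPU cores handles one element.

    ```cuda
    // Element-wise vector addition: thread i computes one output element,
    // replacing the CPU's "for (i = 0; i < n; i++)" loop.
    __global__ void vec_add(int n, const float *a, const float *b, float *c)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    // Host-side launch: one thread per element, 256 threads per block.
    // vec_add<<<(n + 255) / 256, 256>>>(n, a, b, c);
    ```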

  1. 32-Bit FASTBUS computer

    International Nuclear Information System (INIS)

    Blossom, J.M.; Hong, J.P.; Kellner, R.G.

    1985-01-01

    Los Alamos National Laboratory is building a 32-bit FASTBUS computer using the NATIONAL SEMICONDUCTOR 32032 central processing unit (CPU) and containing 16 million bytes of memory. The board can act both as a FASTBUS master and as a FASTBUS slave. It contains a custom direct memory access (DMA) channel which can perform 80 million bytes per second block transfers across the FASTBUS

  2. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    Science.gov (United States)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summary: Program title: SWsolver; Catalogue identifier: AEGY_v1_0; Program summary URL

  3. Towards a Unified Sentiment Lexicon Based on Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Liliana Ibeth Barbosa-Santillán

    2014-01-01

    Full Text Available This paper presents an approach to create what we have called a Unified Sentiment Lexicon (USL). This approach aims at aligning, unifying, and expanding the set of sentiment lexicons which are available on the web in order to increase their robustness of coverage. One problem related to the task of the automatic unification of different scores of sentiment lexicons is that there are multiple lexical entries for which the classification of positive, negative, or neutral {P, N, Z} depends on the unit of measurement used in the annotation methodology of the source sentiment lexicon. Our USL approach computes the unified strength of polarity of each lexical entry based on the Pearson correlation coefficient, which measures how correlated lexical entries are, with a value between 1 and −1, where 1 indicates that the lexical entries are perfectly correlated, 0 indicates no correlation, and −1 means they are perfectly inversely correlated; the UnifiedMetrics procedure is implemented for both CPU and GPU. Another problem is the high processing time required for computing all the lexical entries in the unification task. Thus, the USL approach computes a subset of lexical entries in each of the 1344 GPU cores and uses parallel processing in order to unify 155,802 lexical entries. The analysis conducted using the USL approach shows that the USL has 95,430 lexical entries, of which 35,201 are considered positive, 22,029 negative, and 38,200 neutral. Finally, the runtime was 10 minutes for the 95,430 lexical entries, a threefold reduction in computing time for UnifiedMetrics.

  4. A washable, stretchable, and self-powered human-machine interfacing Triboelectric nanogenerator for wireless communications and soft robotics pressure sensor arrays

    KAUST Repository

    Ahmed, Abdelsalam

    2017-01-20

    Flexible and stretchable human-machine interfacing devices have attracted great attention due to the need for portable, ergonomic, and geometrically compatible devices in the new era of computer technology. Triboelectric nanogenerators (TENGs) have shown promising potential for self-powered human-machine interacting devices. In this paper, a flexible, stretchable and self-powered keyboard is developed based on a vertical contact-separation mode TENG. The keyboard is fabricated using urethane, silicone rubbers and carbon nanotube (CNT) electrodes. The structure shows highly flexible, stretchable, and mechanically durable behavior and can be conformal on different surfaces. The keyboard is capable of converting the mechanical energy of finger tapping to electrical energy based on contact electrification, which eliminates the need for an external power source. The device can be utilized for wireless communication with computers owing to the self-powering mechanism. The keyboard also demonstrates consistent behavior in generating voltage signals regardless of the touching object's material and environmental effects like humidity. In addition, the proposed system can be used for keystroke-dynamics-based authentication. Therefore, highly secure access to computers can be achieved owing to the keyboard's high sensitivity and accurate selectivity of different users.

  5. The QWERTY effect: how typing shapes the meanings of words.

    Science.gov (United States)

    Jasmin, Kyle; Casasanto, Daniel

    2012-06-01

    The QWERTY keyboard mediates communication for millions of language users. Here, we investigated whether differences in the way words are typed correspond to differences in their meanings. Some words are spelled with more letters on the right side of the keyboard and others with more letters on the left. In three experiments, we tested whether asymmetries in the way people interact with keys on the right and left of the keyboard influence their evaluations of the emotional valence of the words. We found the predicted relationship between emotional valence and QWERTY key position across three languages (English, Spanish, and Dutch). Words with more right-side letters were rated as more positive in valence, on average, than words with more left-side letters: the QWERTY effect. This effect was strongest in new words coined after QWERTY was invented and was also found in pseudowords. Although these data are correlational, the discovery of a similar pattern across languages, which was strongest in neologisms, suggests that the QWERTY keyboard is shaping the meanings of words as people filter language through their fingers. Widespread typing introduces a new mechanism by which semantic changes in language can arise.

  6. A washable, stretchable, and self-powered human-machine interfacing Triboelectric nanogenerator for wireless communications and soft robotics pressure sensor arrays

    KAUST Repository

    Ahmed, Abdelsalam; Zhang, Steven L.; Hassan, Islam; Saadatnia, Zia; Zi, Yunlong; Zu, Jean; Wang, Zhong Lin

    2017-01-01

    Flexible and stretchable human-machine interfacing devices have attracted great attention due to the need for portable, ergonomic, and geometrically compatible devices in the new era of computer technology. Triboelectric nanogenerators (TENGs) have shown promising potential for self-powered human-machine interacting devices. In this paper, a flexible, stretchable, and self-powered keyboard is developed based on a vertical contact-separation mode TENG. The keyboard is fabricated from urethane and silicone rubbers with carbon nanotube (CNT) electrodes. The structure is highly flexible, stretchable, and mechanically durable, and can conform to different surfaces. The keyboard converts the mechanical energy of finger tapping to electrical energy through contact electrification, which eliminates the need for an external power source. Owing to this self-powering mechanism, the device can be used for wireless communication with computers. The keyboard also generates consistent voltage signals regardless of the touching object's material and of environmental effects such as humidity. In addition, the proposed system can be used for keystroke-dynamics-based authentication, so highly secure access to computers can be achieved owing to the keyboard's high sensitivity and accurate discrimination between different users.

  7. Using Arduino microcontroller boards to measure response latencies.

    Science.gov (United States)

    Schubert, Thomas W; D'Ausilio, Alessandro; Canto, Rosario

    2013-12-01

    Latencies of buttonpresses are a staple of cognitive science paradigms. Often keyboards are employed to collect buttonpresses, but their imprecision and variability decrease test power and increase the risk of false positives. Response boxes and data acquisition cards are precise, but expensive and inflexible, alternatives. We propose using open-source Arduino microcontroller boards as an inexpensive and flexible alternative. These boards connect to standard experimental software using a USB connection and a virtual serial port, or by emulating a keyboard. In our solution, an Arduino measures response latencies after being signaled the start of a trial, and communicates the latency and response back to the PC over a USB connection. We demonstrated the reliability, robustness, and precision of this communication in six studies. Test measures confirmed that the error added to the measurement had an SD of less than 1 ms. Alternatively, emulation of a keyboard results in similarly precise measurement. The Arduino performs as well as a serial response box, and better than a keyboard. In addition, our setup allows for the flexible integration of other sensors, and even actuators, to extend the cognitive science toolbox.
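
    The authors' serial protocol is their own and is not given in the abstract; below is a hedged host-side sketch using pyserial, assuming hypothetical firmware that, on receiving 'T', starts a trial, times the next buttonpress, and replies with a "key,latency_ms" line (the port name varies by OS):

      import serial  # pyserial

      # Open the Arduino's virtual serial port (device name is illustrative).
      ser = serial.Serial("/dev/ttyACM0", baudrate=115200, timeout=5)

      ser.write(b"T")  # signal the start of a trial to the board
      line = ser.readline().decode().strip()  # e.g. "K1,342"
      key, latency_ms = line.split(",")
      print(f"response {key} after {latency_ms} ms")
      ser.close()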

  8. Design of eight-channel ADC card for GHz signal conversion

    CERN Document Server

    Habib, Samer Bou; Jalmuzna, Wojciech; Jezynski, Tomasz

    2011-01-01

    This paper describes the design of an eight-channel ATCA card suited for direct analog-to-digital conversion of 1.3 GHz signals with a maximum ADC clock frequency of 500 MHz. Undersampling is used for the signal conversion. The card was designed for the needs of the LLRF system of the FLASH and XFEL accelerators. The designed module consists of a main ATCA board with eight ADCs, an FPGA unit, memory, power supply, and diagnostic circuits. The main ATCA card allows connecting a daughter board with IPMI, CPU, and fast interfaces for communication purposes. The paper covers issues such as the system organization that allows data acquisition at such high rates, circuit synchronization by high-quality clock signals, CPU and connectivity features, the 20-layer PCB design, and techniques used for high-frequency signal transmission and matching.

  9. Free-Space Optical Interconnect Employing VCSEL Diodes

    Science.gov (United States)

    Simons, Rainee N.; Savich, Gregory R.; Torres, Heidi

    2009-01-01

    Sensor signal processing is widely used on aircraft and spacecraft. The scheme employs multiple input/output (I/O) nodes for data acquisition and CPU (central processing unit) nodes for data processing. To connect I/O nodes and CPU nodes, scalable interconnections such as backplanes are desired, because the number of nodes depends on the requirements of each mission. An optical backplane consisting of vertical-cavity surface-emitting lasers (VCSELs), VCSEL drivers, photodetectors, and transimpedance amplifiers is the preferred approach, since it can handle several hundred megabits per second of data throughput. The next generation of satellite-borne systems will require transceivers and processors that can handle several Gb/s of data. Optical interconnects have been praised for both their speed and functionality, with hopes that light can relieve the electrical bottleneck predicted for the near future. Optoelectronic interconnects provide a factor-of-ten improvement over electrical interconnects.

  10. Parallel Computer System for 3D Visualization Stereo on GPU

    Science.gov (United States)

    Al-Oraiqat, Anas M.; Zori, Sergii A.

    2018-03-01

    This paper proposes the organization of a parallel computer system based on Graphics Processing Units (GPUs) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search for intersections of traced rays with scene objects. The system allows a significant increase in productivity for 3D stereo synthesis of photorealistic quality. A generalized procedure for 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration of the multithreaded implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. A study of the influence of the size and configuration of the Compute Unified Device Architecture (CUDA) grid on the computational speed shows the importance of their correct selection. The obtained experimental estimates can be significantly improved by newer GPUs with a larger number of processing cores and multiprocessors, as well as by an optimized configuration of the CUDA grid.

  11. Modeling and Simulation of the Economics of Mining in the Bitcoin Market.

    Science.gov (United States)

    Cocco, Luisanna; Marchesi, Michele

    2016-01-01

    On January 3, 2009, Satoshi Nakamoto gave rise to the "Bitcoin Blockchain", creating the first block of the chain by hashing on his computer's central processing unit (CPU). Since then, the hash calculations to mine Bitcoin have been getting more and more complex, and consequently the mining hardware has evolved to adapt to this increasing difficulty. Three generations of mining hardware have followed the CPU generation: the GPU, FPGA, and ASIC generations. This work presents an agent-based artificial market model of the Bitcoin mining process and of Bitcoin transactions. The goal of this work is to model the economy of the mining process, starting from the GPU generation, the first with economic significance. The model reproduces some "stylized facts" found in real price series and some core aspects of the mining business. In particular, the computational experiments performed can reproduce the unit-root property, the fat-tail phenomenon, and the volatility clustering of Bitcoin price series. In addition, under proper assumptions, they can reproduce the generation of Bitcoins, the hashing capability, the power consumption, and the mining hardware and electrical energy expenditures of the Bitcoin network.
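
    The model's equations are not reproduced in the abstract; as a back-of-envelope sketch of the miner economics it simulates (a hashrate share of the block rewards minus electricity cost), with all numbers purely illustrative:

      def daily_profit(hashrate_ths, network_ths, block_reward_btc,
                       btc_price_usd, power_kw, electricity_usd_kwh):
          """Expected daily mining profit: revenue share minus electricity cost."""
          blocks_per_day = 144  # one block roughly every 10 minutes
          revenue = (hashrate_ths / network_ths) * blocks_per_day \
                    * block_reward_btc * btc_price_usd
          cost = power_kw * 24 * electricity_usd_kwh
          return revenue - cost

      # Illustrative figures only; real parameters change constantly.
      print(f"{daily_profit(100, 5e8, 3.125, 60000, 3.0, 0.10):+.2f} USD/day")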

  12. Modeling and Simulation of the Economics of Mining in the Bitcoin Market.

    Directory of Open Access Journals (Sweden)

    Luisanna Cocco

    Full Text Available On January 3, 2009, Satoshi Nakamoto gave rise to the "Bitcoin Blockchain", creating the first block of the chain by hashing on his computer's central processing unit (CPU). Since then, the hash calculations to mine Bitcoin have been getting more and more complex, and consequently the mining hardware has evolved to adapt to this increasing difficulty. Three generations of mining hardware have followed the CPU generation: the GPU, FPGA, and ASIC generations. This work presents an agent-based artificial market model of the Bitcoin mining process and of Bitcoin transactions. The goal of this work is to model the economy of the mining process, starting from the GPU generation, the first with economic significance. The model reproduces some "stylized facts" found in real price series and some core aspects of the mining business. In particular, the computational experiments performed can reproduce the unit-root property, the fat-tail phenomenon, and the volatility clustering of Bitcoin price series. In addition, under proper assumptions, they can reproduce the generation of Bitcoins, the hashing capability, the power consumption, and the mining hardware and electrical energy expenditures of the Bitcoin network.

  13. An efficient automated parameter tuning framework for spiking neural networks.

    Science.gov (United States)

    Carlson, Kristofor D; Nageswaran, Jayram Moorkanikara; Dutt, Nikil; Krichmar, Jeffrey L

    2014-01-01

    As the desire for biologically realistic spiking neural networks (SNNs) increases, tuning the enormous number of open parameters in these models becomes a difficult challenge. SNNs have been used to successfully model complex neural circuits that explore various neural phenomena such as neural plasticity, vision systems, auditory systems, neural oscillations, and many other important topics of neural function. Additionally, SNNs are particularly well-adapted to run on neuromorphic hardware that will support biological brain-scale architectures. Although the inclusion of realistic plasticity equations, neural dynamics, and recurrent topologies has increased the descriptive power of SNNs, it has also made the task of tuning these biologically realistic SNNs difficult. To meet this challenge, we present an automated parameter tuning framework capable of tuning SNNs quickly and efficiently using evolutionary algorithms (EA) and inexpensive, readily accessible graphics processing units (GPUs). A sample SNN with 4104 neurons was tuned to give V1 simple cell-like tuning curve responses and produce self-organizing receptive fields (SORFs) when presented with a random sequence of counterphase sinusoidal grating stimuli. A performance analysis comparing the GPU-accelerated implementation to a single-threaded central processing unit (CPU) implementation was carried out and showed a speedup of 65× of the GPU implementation over the CPU implementation, or 0.35 h per generation for GPU vs. 23.5 h per generation for CPU. Additionally, the parameter value solutions found in the tuned SNN were studied and found to be stable and repeatable. The automated parameter tuning framework presented here will be of use to both the computational neuroscience and neuromorphic engineering communities, making the process of constructing and tuning large-scale SNNs much quicker and easier.
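
    The framework's API is not shown in the abstract; the following is a generic sketch of the evolutionary loop it describes, with a stand-in fitness function in place of the actual SNN simulation and tuning-curve scoring:

      import random

      def fitness(params):
          """Stand-in for simulating the SNN and scoring its responses."""
          return -sum((p - 0.5) ** 2 for p in params)  # toy target: all at 0.5

      def evolve(pop_size=32, n_params=8, generations=50, sigma=0.05):
          pop = [[random.random() for _ in range(n_params)] for _ in range(pop_size)]
          for _ in range(generations):
              pop.sort(key=fitness, reverse=True)
              elite = pop[: pop_size // 4]              # selection
              pop = [list(p) for p in elite]
              while len(pop) < pop_size:                # Gaussian mutation
                  parent = random.choice(elite)
                  pop.append([g + random.gauss(0, sigma) for g in parent])
          return max(pop, key=fitness)

      print(evolve())  # on a GPU, the fitness evaluations run in parallel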

  14. Keyboard Improvisation: A Phenomenological Study

    Science.gov (United States)

    Kingscott, John; Durrant, Colin

    2010-01-01

    The purpose of this study was to explore the phenomenon of musical improvisation within two contrasting musical genres--jazz piano and liturgical and concert organ. While improvisation is well documented in both genres, there is little literature relating the two forms and, in particular, the process of improvisation. The aim of this study is to…

  15. SECURITY MEASURES OF RANDVUL KEYBOARD

    OpenAIRE

    RADHA DAMODARAM; Dr. M.L. VALARMATHI

    2010-01-01

    Phishing is a “con trick” by which consumers are sent email purporting to originate from legitimate services like banks or other financial institutions. Phishing can be thought of as the marriage of social engineering and technology. The goal of a phisher is typically to learn information that allows him to access resources belonging to his victims. The most common type of phishing attack aims to obtain account numbers and passwords used for online banking, in order to either steal money from ...

  16. CPU architecture for a fast and energy-saving calculation of convolution neural networks

    Science.gov (United States)

    Knoll, Florian J.; Grelcke, Michael; Czymmek, Vitali; Holtorf, Tim; Hussmann, Stephan

    2017-06-01

    One of the most difficult problems in the use of artificial neural networks is the required computational capacity. Although large search-engine companies own specially developed hardware to provide the necessary computing power, the conventional user is left with the state-of-the-art method, which is the use of a graphics processing unit (GPU) as the computational basis. Although these processors are well suited for large matrix computations, they consume massive amounts of energy. Therefore a new processor based on a field programmable gate array (FPGA) has been developed and optimized for deep-learning applications. This processor is presented in this paper. The processor can be adapted to a particular application (in this paper, an organic farming application). Its power consumption is only a fraction of that of a GPU implementation, and it should therefore be well suited for energy-saving applications.

  17. The association between problematic cellular phone use and risky behaviors and low self-esteem among Taiwanese adolescents.

    Science.gov (United States)

    Yang, Yuan-Sheng; Yen, Ju-Yu; Ko, Chih-Hung; Cheng, Chung-Ping; Yen, Cheng-Fang

    2010-04-28

    Cellular phone use (CPU) is an important part of life for many adolescents. However, problematic CPU may complicate physiological and psychological problems. The aim of our study was to examine the associations between problematic CPU and a series of risky behaviors and low self-esteem in Taiwanese adolescents. A total of 11,111 adolescent students in Southern Taiwan were randomly selected into this study. We used the Problematic Cellular Phone Use Questionnaire to identify the adolescents with problematic CPU. Meanwhile, a series of risky behaviors and self-esteem were evaluated. Multilevel logistic regression analyses were employed to examine the associations between problematic CPU and risky behaviors and low self-esteem with regard to gender and age. The results indicated positive associations between problematic CPU and aggression, insomnia, smoking cigarettes, suicidal tendencies, and low self-esteem in all groups across sexes and ages. However, gender and age differences existed in the associations between problematic CPU and suspension from school, criminal records, tattooing, short nocturnal sleep duration, unprotected sex, illicit drug use, drinking alcohol, and chewing betel nuts. There were positive associations between problematic CPU and a series of risky behaviors and low self-esteem in Taiwanese adolescents. It is worthwhile for parents and mental health professionals to pay attention to adolescents' problematic CPU.

  18. NUI framework based on real-time head pose estimation and hand gesture recognition

    Directory of Open Access Journals (Sweden)

    Kim Hyunduk

    2016-01-01

    Full Text Available The natural user interface (NUI) provides a natural-motion interface that needs no device or tool such as mice, keyboards, pens, or markers. In this paper, we develop a natural user interface framework based on two recognition modules. The first is a real-time head pose estimation module using random forests; the second is a hand gesture recognition module, named the Hand gesture Key Emulation Toolkit (HandGKET). Using the head pose estimation module, we can know where the user is looking and what the user's focus of attention is. Moreover, using the hand gesture recognition module, we can control the computer with the user's hand gestures, without a mouse or keyboard. In the proposed framework, the user's head direction and hand gestures are mapped into mouse and keyboard events, respectively.
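
    The framework's event mapping is not specified beyond the description above; a minimal sketch of the idea, with hypothetical gesture labels and a simple linear head-pose-to-cursor mapping:

      # Hypothetical recognizer labels -> emulated keyboard events.
      GESTURE_TO_KEY = {"swipe_left": "LEFT", "swipe_right": "RIGHT",
                        "fist": "ENTER", "open_palm": "ESC"}

      def head_pose_to_cursor(yaw_deg, pitch_deg, screen_w=1920, screen_h=1080,
                              fov_deg=30.0):
          """Map head yaw/pitch (degrees) linearly onto screen coordinates."""
          x = int((yaw_deg / fov_deg + 0.5) * screen_w)
          y = int((pitch_deg / fov_deg + 0.5) * screen_h)
          return max(0, min(screen_w - 1, x)), max(0, min(screen_h - 1, y))

      print(head_pose_to_cursor(5.0, -3.0))    # cursor position from head pose
      print(GESTURE_TO_KEY.get("fist"))        # key event from a hand gesture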

  19. Manufacture of Platform Prototype for Digital Safety System

    International Nuclear Information System (INIS)

    Lee, S. Y.; Kim, J. S.; Kim, J. M.

    2010-01-01

    The unit controller is the basic unit of the digital safety system platform prototype. A typical unit controller comprises a CPB (CPU board), CMB (communication board), AIB (analog input board), AOB (analog output board), CIB (contact input board), COB (contact output board), and a subrack. It is developed according to the H/W development procedure and the S/W development life cycle. A digital safety system (for example, a plant protection system) is an assembly of unit controllers. The CPB performs the function of each system; a DSP (digital signal processor) is built into the CPB. The CMB is responsible for communication between unit controllers. The NSD (Network Switching Device) exchanges data between the unit controllers; each unit controller of the platform is connected to the NSD through its CMB. Reliability analyses of the unit controller and NSD were performed, and these reliability data are used as input for technical validation

  20. CUDA-Sankoff

    DEFF Research Database (Denmark)

    Sundfeld, Daniel; Havgaard, Jakob H.; Gorodkin, Jan

    2017-01-01

    In this paper, we propose and evaluate CUDA-Sankoff, a solution to the RNA structural alignment problem based on the Sankoff algorithm on Graphics Processing Units (GPUs). To our knowledge, this is the first time the Sankoff algorithm has been implemented on a GPU. In our solution, we show how to lineariz... to 24 times faster than a 16-core CPU solution in the 281-nucleotide Sankoff execution.
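
    The truncated abstract does not show the linearization itself; the standard trick for GPU-friendly dynamic-programming storage is to flatten multi-dimensional cell indices into one array index, sketched here for a 2D matrix:

      def lin(i, j, n_cols):
          """Map a 2D dynamic-programming cell (i, j) to a flat 1D index."""
          return i * n_cols + j

      def unlin(idx, n_cols):
          return divmod(idx, n_cols)  # back to (i, j)

      n_rows, n_cols = 281, 281  # e.g., one cell per pair of sequence positions
      flat = [0.0] * (n_rows * n_cols)
      flat[lin(3, 7, n_cols)] = 1.5
      assert unlin(lin(3, 7, n_cols), n_cols) == (3, 7)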

  1. Turbine Control System Replacement at NPP NEK; System Specifics, Project Experience and Lessons Learned

    International Nuclear Information System (INIS)

    Mandic, D.; Zilavy, M. J.

    2010-01-01

    The main intention of this paper is to present feedback from the implementation of the new Turbine Control System (TCS) replacement project at Nuclear Power Plant (NPP) NEK - Krsko. From plant construction and the first plant start-up in 1981, the NPP NEK TG (turbine-generator) set was controlled and monitored by the DEH (Digital Electro Hydraulic) Mod II control system, designed in the 1970s and based on a P2500 CPU and a number of I/O controllers and modules. The P2500 CPU and associated controllers were built with discrete TTL components (TTL logic chips), and the P2500 CPU had 64K 16-bit words of ferrite-core memory. For its time, DEH Mod II had a sophisticated MCR (Main Control Room) HMI (Human-Machine Interface) based on digital function keyboards, one alphanumeric black-and-white CRT monitor, and a printer. After twenty-eight years of operation, and for several other reasons explained in the paper, NEK decided to replace the old DEH Mod II control system with a new Emerson Ovation based DCS (Distributed Control System) on a redundant platform for the control and monitoring of secondary plant systems in the NPP Krsko (NEK); the new system was named the PDEH (Programmable Digital Electro Hydraulic) TCS. In May 2007, NEK signed a turn-key contract with Westinghouse Electric Company (WEC) for the replacement of the TCS, the Turbine Emergency Trip System (ETS), the Moisture Separator Reheater (MSR) control, and some other control and monitoring functions. WEC subcontracted a number of other companies for equipment delivery, AE (Architect Engineering Design) activities, specific software development tasks (changes to the KFSS, Krsko Full Scope Simulator, and to the PIS, Process Information System, interface), and field installation activities. The project encompassed implementation of the PDEH system on three application platforms: BG KFSS (Background KFSS), FG KFSS (Foreground KFSS), and the PDEH system installed in the plant. The HMI for the BG KFSS platform

  2. Emotional communication through improvisation with percussion sounds

    OpenAIRE

    菊地, 正 (Kikuchi, Tadashi); 生駒, 忍 (Ikoma, Shinobu)

    2009-01-01

    For emotional communication by improvisation, happiness was conveyed most successfully when a snare drum was used, but sadness was conveyed well when a keyboard was used. Which factor causes this difference: the timbre (percussive vs. piano) or the physical properties of the instruments (played with sticks vs. fingers)? We conducted an experiment using a MIDI keyboard with a percussive timbre. Thirty-nine participants performed an emotion detection task for recorded improvisations, which expressed happine...

  3. Online versus offline: The Web as a medium for response time data collection.

    Science.gov (United States)

    Chetverikov, Andrey; Upravitelev, Philipp

    2016-09-01

    The Internet provides a convenient environment for data collection in psychology. Modern Web programming languages, such as JavaScript or Flash (ActionScript), facilitate complex experiments without the necessity of experimenter presence. Yet there is always a question of how much noise is added due to the differences between the setups used by participants and whether it is compensated for by increased ecological validity and larger sample sizes. This is especially a problem for experiments that measure response times (RTs), because they are more sensitive (and hence more susceptible to noise) than, for example, choices per se. We used a simple visual search task with different set sizes to compare laboratory performance with Web performance. The results suggest that although the locations (means) of RT distributions are different, other distribution parameters are not. Furthermore, the effect of experiment setting does not depend on set size, suggesting that task difficulty is not important in the choice of a data collection method. We also collected an additional online sample to investigate the effects of hardware and software diversity on the accuracy of RT data. We found that the high diversity of browsers, operating systems, and CPU performance may have a detrimental effect, though it can partly be compensated for by increased sample sizes and trial numbers. In sum, the findings show that Web-based experiments are an acceptable source of RT data, comparable to a common keyboard-based setup in the laboratory.

  4. The association between problematic cellular phone use and risky behaviors and low self-esteem among Taiwanese adolescents

    Directory of Open Access Journals (Sweden)

    Ko Chih-Hung

    2010-04-01

    Full Text Available Abstract. Background: Cellular phone use (CPU) is an important part of life for many adolescents. However, problematic CPU may complicate physiological and psychological problems. The aim of our study was to examine the associations between problematic CPU and a series of risky behaviors and low self-esteem in Taiwanese adolescents. Methods: A total of 11,111 adolescent students in Southern Taiwan were randomly selected into this study. We used the Problematic Cellular Phone Use Questionnaire to identify the adolescents with problematic CPU. Meanwhile, a series of risky behaviors and self-esteem were evaluated. Multilevel logistic regression analyses were employed to examine the associations between problematic CPU and risky behaviors and low self-esteem regarding gender and age. Results: Positive associations were found between problematic CPU and aggression, insomnia, smoking cigarettes, suicidal tendencies, and low self-esteem in all groups with different sexes and ages. However, gender and age differences existed in the associations between problematic CPU and suspension from school, criminal records, tattooing, short nocturnal sleep duration, unprotected sex, illicit drug use, drinking alcohol, and chewing betel nuts. Conclusions: There were positive associations between problematic CPU and a series of risky behaviors and low self-esteem in Taiwanese adolescents. It is worthwhile for parents and mental health professionals to pay attention to adolescents' problematic CPU.

  5. Using Intel's Knight Landing Processor to Accelerate Global Nested Air Quality Prediction Modeling System (GNAQPMS) Model

    Science.gov (United States)

    Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.

    2016-12-01

    The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled with a mercury transport module to investigate mercury pollution. In this study, we present our work of porting the GNAQPMS model to the Intel Xeon Phi processor, Knights Landing (KNL), to accelerate the model. KNL is the second-generation product adopting the Many Integrated Core (MIC) architecture. Compared with the first generation, Knights Corner (KNC), KNL has more new hardware features; in particular, it can be used as a standalone processor as well as a coprocessor alongside another CPU. Using the Vtune tool, the high-overhead modules in the GNAQPMS model were identified: the CBMZ gas chemistry, the advection and convection module, and the wet deposition module. These high-overhead modules were accelerated by optimizing the code and using new features of KNL. The following optimization measures were taken: 1) changing the pure MPI parallel mode to a hybrid parallel mode with MPI and OpenMP; 2) vectorizing the code to use the 512-bit wide vector computation unit; 3) reducing unnecessary memory accesses and calculation; 4) reducing Thread Local Storage (TLS) for common variables within each OpenMP thread in CBMZ; and 5) changing global communication from file writing and reading to MPI functions. After optimization, the performance of GNAQPMS is greatly increased on both the CPU and KNL platforms: the single-node test showed that the optimized version has a 2.6x speedup on a two-socket CPU platform and a 3.3x speedup on a one-socket KNL platform compared with the baseline code, which means KNL has a 1.29x speedup relative to the two-socket CPU platform.

  6. Distributed GPU Computing in GIScience

    Science.gov (United States)

    Jiang, Y.; Yang, C.; Huang, Q.; Li, J.; Sun, M.

    2013-12-01

    Geoscientists strive to discover potential principles and patterns hidden inside ever-growing Big Data for scientific discoveries. To better achieve this objective, more capable computing resources are required to process, analyze, and visualize Big Data (Ferreira et al., 2003; Li et al., 2013). Current CPU-based computing techniques cannot promptly meet the computing challenges caused by the increasing amount of datasets from different domains, such as social media, earth observation, and environmental sensing (Li et al., 2013). Meanwhile, CPU-based computing resources structured as clusters or supercomputers are costly. In the past several years, with GPU-based technology matured in both capability and performance, GPU-based computing has emerged as a new computing paradigm. Compared to the traditional microprocessor, the modern GPU, as a compelling alternative, offers outstanding parallel processing capability with cost-effectiveness and efficiency (Owens et al., 2008), although it was initially designed for graphical rendering in the visualization pipeline. This presentation reports a distributed GPU computing framework for integrating GPU-based computing within a distributed environment. Within this framework, 1) for each single computer, both GPU-based and CPU-based computing resources can be fully utilized to improve the performance of visualizing and processing Big Data; 2) within a network environment, a variety of computers can be used to build up a virtual supercomputer to support CPU-based and GPU-based computing in a distributed computing environment; and 3) GPUs, as graphics-targeted devices, are used to greatly improve the rendering efficiency in distributed geo-visualization, especially for 3D/4D visualization. Key words: Geovisualization, GIScience, Spatiotemporal Studies. Reference: 1. Ferreira de Oliveira, M. C., & Levkowitz, H. (2003). From visual data exploration to visual data mining: A survey. IEEE Transactions on Visualization and Computer Graphics.

  7. Design of a continuous duty cryopump

    International Nuclear Information System (INIS)

    Sedgley, D.W.

    1985-05-01

    A continuous duty cryopump system was designed and developed that comprises a self-contained cryopump for installation into a vacuum chamber, and a microprocessor controller for automatic operation. This deuterium pump has two units in a single housing, arranged so that one is pumping while the other is being regenerated. Liquid helium-cooled, finned sections in each unit pump deuterium by condensation, and a third pump integral within the cryopump housing collects the regenerated gas. A microprocessor unit controls distribution of liquid and gaseous helium, used for conditioning the pumping units, and operates remote actuators for the regeneration. Software provides fully automatic, timed sequencing of the repetitive cryopump events which include: cooldown of the pumping units, opening of the louvers isolating the unit from the vacuum chamber, closing of the louvers, and warming up of the unit for regeneration. Default values in the software can be reprogrammed by the operator through the keyboard in response to prompts displayed on the computer. An override allows the operator to control the cryopump manually by activating switches on a control panel. Interlocks to prevent cryogen lockup are included in the software

  8. A Framework for Dynamically-Loaded Hardware Library (HLL) in FPGA Acceleration

    DEFF Research Database (Denmark)

    Cardarilli, Gian Carlo; Di Carlo, Leonardo; Nannarelli, Alberto

    2016-01-01

    Hardware acceleration is often used to address the need for speed and computing power in embedded systems. FPGAs have always represented a good solution for HW acceleration and, recently, new SoC platforms have extended the flexibility of FPGAs by combining on a single chip both high-performance CPUs and FPGA fabric. The aim of this work is the implementation of hardware accelerators for these new SoCs. The innovative feature of these accelerators is the on-the-fly reconfiguration of the hardware to dynamically adapt the accelerator's functionalities to the current CPU workload. The realization of the accelerators preliminarily requires profiling of both the SW (ARM CPU + NEON units) and HW (FPGA) performance, an evaluation of the partial reconfiguration times, and the development of an application-specific IP-core library. This paper focuses on the profiling aspect of both the SW and HW...

  9. Braille Touch : Mobile Touchscreen Text Entry for the Visually Impaired

    OpenAIRE

    Southern, Caleb; Clawson, James; Frey, Brian; Abowd, Gregory; Romero, Mario

    2012-01-01

    We present a demonstration of BrailleTouch, an accessible keyboard for blind users on a touchscreen smartphone (see Figure 1). Based on the standard Perkins Brailler, BrailleTouch implements a six-key chorded braille soft keyboard [1]. We will briefly introduce audience members to the braille code, and then allow them to hold the BrailleTouch prototype and enter text, with the aid of a visual chart of the braille alphabet.

  10. Hardware characteristic and application

    International Nuclear Information System (INIS)

    Gu, Dong Hyeon

    1990-03-01

    This book covers the system board (memory, performance, the system timer, the system clock, and specifications); the coprocessor, including its programming interface and hardware interface; the power supply (input and output, protection for DC outputs, and the Power Good signal); the 84-key and 101/102-key keyboards; the BIOS system; the 80286 instruction set and the 80287 coprocessor; characters, keystrokes, and colors; and, regarding application development, the communication and compatibility of the IBM personal computer, multitasking, and code for distinguishing between systems.

  11. Evaluation of Mobile Phones for Large Display Interaction

    OpenAIRE

    Bauer, Jens; Thelen, Sebastian; Ebert, Achim

    2012-01-01

    Large displays have become more and more common in the last few years. While interaction with these displays can be conducted using standard methods such as computer mouse and keyboard, this approach causes issues in multi-user environments, where the various conditions for providing multiple keyboards and mice, together with the facilities to employ them, cannot be met. To solve this problem, interaction using mobile phones was proposed by several authors. Previous solutions were specialized...

  12. Energy- and cost-efficient lattice-QCD computations using graphics processing units

    Energy Technology Data Exchange (ETDEWEB)

    Bach, Matthias

    2014-07-01

    Quarks and gluons are the building blocks of all hadronic matter, like protons and neutrons. Their interaction is described by Quantum Chromodynamics (QCD), a theory under test by large-scale experiments like the Large Hadron Collider (LHC) at CERN and, in the future, at the Facility for Antiproton and Ion Research (FAIR) at GSI. However, perturbative methods can only be applied to QCD at high energies. Studies from first principles are possible via a discretization onto an Euclidean space-time grid. This discretization of QCD is called Lattice QCD (LQCD) and is the only ab-initio option outside of the high-energy regime. LQCD is extremely compute and memory intensive. In particular, it is by definition always bandwidth limited. Thus - despite the complexity of LQCD applications - it led to the development of several specialized compute platforms and influenced the development of others. However, in recent years General-Purpose computation on Graphics Processing Units (GPGPU) came up as a new means for parallel computing. Contrary to machines traditionally used for LQCD, graphics processing units (GPUs) are a mass-market product. This promises advantages in both the pace at which higher-performing hardware becomes available and its price. CL2QCD is an OpenCL-based implementation of LQCD using Wilson fermions that was developed within this thesis. It operates on GPUs by all major vendors as well as on central processing units (CPUs). On the AMD Radeon HD 7970 it provides the fastest double-precision D kernel for a single GPU, achieving 120 GFLOPS. D - the most compute-intensive kernel in LQCD simulations - is commonly used to compare LQCD platforms. This performance is enabled by an in-depth analysis of optimization techniques for bandwidth-limited codes on GPUs. Further, analysis of the communication between GPU and CPU, as well as between multiple GPUs, enables high-performance Krylov space solvers and linear scaling to multiple GPUs within a single system. LQCD
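
    The thesis's bandwidth analysis is not reproduced here, but the roofline arithmetic behind "by definition always bandwidth limited" is easy to sketch; the peak, bandwidth, and intensity figures below are illustrative ballpark numbers for an HD 7970-class GPU, not values taken from the thesis:

      def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
          """Roofline model: a bandwidth-limited kernel is capped by B * I."""
          return min(peak_gflops, bandwidth_gbs * flops_per_byte)

      peak = 947.0      # double-precision peak, GFLOP/s (illustrative)
      bw = 264.0        # memory bandwidth, GB/s (illustrative)
      intensity = 0.5   # FLOPs per byte moved (illustrative)
      print(attainable_gflops(peak, bw, intensity), "GFLOP/s attainable")
      # -> 132.0, the same order as the 120 GFLOPS quoted for the D kernel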

  13. Energy- and cost-efficient lattice-QCD computations using graphics processing units

    International Nuclear Information System (INIS)

    Bach, Matthias

    2014-01-01

    Quarks and gluons are the building blocks of all hadronic matter, like protons and neutrons. Their interaction is described by Quantum Chromodynamics (QCD), a theory under test by large-scale experiments like the Large Hadron Collider (LHC) at CERN and, in the future, at the Facility for Antiproton and Ion Research (FAIR) at GSI. However, perturbative methods can only be applied to QCD at high energies. Studies from first principles are possible via a discretization onto an Euclidean space-time grid. This discretization of QCD is called Lattice QCD (LQCD) and is the only ab-initio option outside of the high-energy regime. LQCD is extremely compute and memory intensive. In particular, it is by definition always bandwidth limited. Thus - despite the complexity of LQCD applications - it led to the development of several specialized compute platforms and influenced the development of others. However, in recent years General-Purpose computation on Graphics Processing Units (GPGPU) came up as a new means for parallel computing. Contrary to machines traditionally used for LQCD, graphics processing units (GPUs) are a mass-market product. This promises advantages in both the pace at which higher-performing hardware becomes available and its price. CL2QCD is an OpenCL-based implementation of LQCD using Wilson fermions that was developed within this thesis. It operates on GPUs by all major vendors as well as on central processing units (CPUs). On the AMD Radeon HD 7970 it provides the fastest double-precision D kernel for a single GPU, achieving 120 GFLOPS. D - the most compute-intensive kernel in LQCD simulations - is commonly used to compare LQCD platforms. This performance is enabled by an in-depth analysis of optimization techniques for bandwidth-limited codes on GPUs. Further, analysis of the communication between GPU and CPU, as well as between multiple GPUs, enables high-performance Krylov space solvers and linear scaling to multiple GPUs within a single system. LQCD

  14. MIT CSAIL and Lincoln Laboratory Task Force Report

    Science.gov (United States)

    2016-08-01

    ... funded through MIT LL, most notably by the MIT LL Technology Office (TO). TO funding has come through a diverse set of venues: ASD (R&E) Line ... Many and varied opportunities to participate could better cohere research interests at each institution and between them as well. ... Near-data processing is about migrating selected processing operations from the central processing unit (CPU) of traditional computer ...

  15. Portable microcomputer for the analysis of plutonium gamma-ray spectra. Volume II. Software description and listings

    International Nuclear Information System (INIS)

    Ruhter, W.D.

    1984-05-01

    A portable microcomputer has been developed and programmed for the International Atomic Energy Agency (IAEA) to perform in-field analysis of plutonium gamma-ray spectra. The unit includes a 16-bit LSI-11/2 microprocessor, 32K words of memory, a 20-character display for user prompting, a numeric keyboard for user responses, and a 20-character thermal printer for hard-copy output of results. The unit weighs 11 kg and has dimensions of 33.5 x 30.5 x 23.0 cm. This compactness allows the unit to be stored under an airline seat. Only the positions of the 148-keV 241Pu and 208-keV 237U peaks are required for the spectral analysis that gives plutonium isotopic ratios and weight-percent abundances. Volume I of this report provides a detailed description of the data analysis methodology, operating instructions, hardware, and maintenance and troubleshooting. Volume II describes the software and provides software listings

  16. Automatic speech recognition for radiological reporting

    International Nuclear Information System (INIS)

    Vidal, B.

    1991-01-01

    Large-vocabulary speech recognition, its techniques, and its software and hardware technology are being developed, aimed at providing the office user with a tool that could significantly improve both the quantity and quality of his work: the dictation machine, which allows memos and documents to be input using voice and a microphone instead of fingers and a keyboard. The IBM Rome Science Center, together with the IBM Research Division, has built a prototype recognizer that accepts natural-language sentences from a 20,000-word Italian vocabulary. The unit runs on a personal computer equipped with special hardware capable of providing the necessary computing power. The first laboratory experiments yielded very interesting results and revealed system characteristics that make its use possible in operational environments. To this purpose, the dictation of medical reports was considered a suitable application. In cooperation with the 2nd Radiology Department of S. Maria della Misericordia Hospital (Udine, Italy), the system was tested by radiology department doctors during their everyday work. The doctors were able to dictate their reports directly to the unit. The text appeared immediately on the screen, and any errors could be corrected either by voice or by using the keyboard. At the end of dictation, the doctors could both print and archive the text. The report could also be forwarded to the hospital information system, where available. Our results have been very encouraging: the system proved to be robust, simple to use, and accurate (over 95% average recognition rate). The experiment yielded valuable suggestions and comments, and its results are useful for system evolution towards improved management and efficiency

  17. Control programs of multichannel pulse height analyzer with CAMAC system using FACOM U-200 mini-computer

    International Nuclear Information System (INIS)

    Yamagishi, Kojiro

    1978-02-01

    The 4096-channel Pulse Height Analyzer (PHA) assembled with CAMAC plug-in units has been developed at JAERI. The PHA consists of an ADC unit, a CRT display unit, and CAMAC plug-in units, namely a memory controller, an MCA timer, a 4K-word RAM memory, and a CRT driver. The system is connected on-line to a FACOM U-200 minicomputer through the CAMAC interface unit's crate controller. The software for on-line data acquisition has been developed: four utility programs written in FORTRAN and two program packages written in the assembler language FASP, namely the CAMAC Program Package and the Basic Input/Output Program Package. The CAMAC Program Package has 18 subroutine programs for controlling CAMAC plug-in units from the FACOM U-200 minicomputer, and the Basic Input/Output Program Package has 26 subroutine programs for data input/output to and from a typewriter, keyboard, cassette magnetic tape, and open-reel magnetic tape. These subroutine programs are all FORTRAN-callable. The PHA with the CAMAC system is first outlined, and then the usage of the four utility programs, the CAMAC Program Package, and the Basic Input/Output Program Package is described in detail. (auth.)

  18. Fast analysis of molecular dynamics trajectories with graphics processing units-Radial distribution function histogramming

    International Nuclear Information System (INIS)

    Levine, Benjamin G.; Stone, John E.; Kohlmeyer, Axel

    2011-01-01

    The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU's memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 s per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.
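
    A serial sketch of the rate-limiting histogramming step (without the RDF's shell-volume normalization), assuming two selections of coordinates in a non-periodic box; the GPU kernels tile exactly this computation and merge per-block histograms with atomic adds:

      import numpy as np

      def distance_histogram(coords_a, coords_b, r_max, n_bins):
          """Histogram of pairwise distances between two atom selections."""
          # Full pairwise distance matrix (fine for small N; GPU code tiles it).
          d = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
          return np.histogram(d, bins=n_bins, range=(0.0, r_max))

      rng = np.random.default_rng(0)
      a = rng.uniform(0.0, 10.0, size=(500, 3))  # selection 1 coordinates
      b = rng.uniform(0.0, 10.0, size=(500, 3))  # selection 2 coordinates
      hist, edges = distance_histogram(a, b, r_max=10.0, n_bins=100)
      print(hist.sum(), "pair distances binned")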

  19. Ab Initio factorized LCAO calculations of the electronic band structure of ZnSe, ZnS, and the (ZnSe)1(ZnS)1 strained-layer superlattice

    International Nuclear Information System (INIS)

    Marshall, T.S.; Wilson, T.M.

    1992-01-01

    The authors report the results of electronic band structure calculations for bulk ZnSe, bulk ZnS, and the (ZnSe)1(ZnS)1 strained-layer superlattice (SLS) using the ab initio factorized linear combination of atomic orbitals method. The bulk calculations were done using the standard primitive nonrectangular 2-atom zinc-blende unit cell, while the SLS calculation was done using a primitive tetragonal 4-atom unit cell modeled on the CuAu I structure. The analytic fit to the SLS crystalline potential was determined by using the nonlinear coefficients from the bulk fits. The CPU time saved by factorizing the energy matrix integrals and using a rectangular unit cell is discussed

  20. GPU-based Branchless Distance-Driven Projection and Backprojection.

    Science.gov (United States)

    Liu, Rui; Fu, Lin; De Man, Bruno; Yu, Hengyong

    2017-12-01

    Projection and backprojection operations are essential in a variety of image reconstruction and physical correction algorithms in CT. The distance-driven (DD) projection and backprojection are widely used for their highly sequential memory access pattern and low arithmetic cost. However, a typical DD implementation has an inner loop that adjusts the calculation depending on the relative position between voxel and detector cell boundaries. The irregularity of the branch behavior makes it inefficient to implement on massively parallel computing devices such as graphics processing units (GPUs). Such irregular branch behavior can be eliminated by factorizing the DD operation into three branchless steps: integration, linear interpolation, and differentiation, all of which are highly amenable to massive vectorization. In this paper, we implement and evaluate a highly parallel branchless DD algorithm for 3D cone-beam CT. The algorithm utilizes the texture memory and hardware interpolation on GPUs to achieve fast computational speed. The developed branchless DD algorithm achieved a 137-fold speedup for forward projection and a 188-fold speedup for backprojection relative to a single-thread CPU implementation. Compared with a state-of-the-art 32-thread CPU implementation, the proposed branchless DD achieved an 8-fold acceleration for forward projection and a 10-fold acceleration for backprojection. The GPU-based branchless DD method was evaluated with iterative reconstruction algorithms on both simulated and real datasets. It obtained visually identical images to those of the CPU reference algorithm.
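
    A 1D sketch of the three branchless steps named above (integration, linear interpolation, differentiation), resampling source-bin integrals onto destination cells without any boundary-dependent branching; the edges and values are illustrative:

      import numpy as np

      def branchless_resample(values, src_edges, dst_edges):
          """Resample bin integrals via integrate -> interpolate -> differentiate."""
          # 1) Integration: cumulative sum gives the antiderivative at source edges.
          integral = np.concatenate(([0.0], np.cumsum(values)))
          # 2) Linear interpolation of the antiderivative at destination boundaries.
          at_dst = np.interp(dst_edges, src_edges, integral)
          # 3) Differentiation recovers per-destination-cell integrals.
          return np.diff(at_dst)

      src_edges = np.arange(0.0, 6.0)                    # 5 source bins
      values = np.array([1.0, 2.0, 4.0, 2.0, 1.0])
      dst_edges = np.array([0.0, 1.25, 2.5, 3.75, 5.0])  # 4 destination cells
      print(branchless_resample(values, src_edges, dst_edges))  # total preserved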

  1. Mobile Geospatial Information Systems for Land Force Operations: Analysis of Operational Needs and Research Opportunities

    Science.gov (United States)

    2010-03-01

    ... road barriers (e.g., dragon's teeth) and searches vehicles for weapons and proper license plates. The rifleman also escorts VIPs after conveying the ... Are there automated systems that know that in scenario X, operator Y would want to see Z, or is there an exhaustive list of options that the operator ... directional) or a trackball (moves in any direction); selection is made by depressing the wheel/ball. • Keyboard - The size of the keyboard can ...

  2. Inventions on GUI for Touch Sensitive Screens

    OpenAIRE

    Mishra, Umakant

    2014-01-01

    A touch sensitive screen displays information on the screen and also receives input by sensing a user's touch on the same screen. This mechanism facilitates system interaction directly through the screen, without needing a mouse or keyboard, and has the advantage of making the system compact by removing the keyboard, mouse, and similar interaction devices. However, there are certain difficulties in implementing a touch screen interface. The display screens of portable devices are becoming ...

  3. Security Inference from Noisy Data

    Science.gov (United States)

    2008-04-08

    ... and RFID chips introduce new ways of communicating and sharing data. For example, the Nike+iPod Sport Kit is a new wireless accessory for the iPod ... Agrawal show: • A wide variety of keyboards (e.g., different keyboards of the same model, different models, different brands) have keys with distinct ... grammar level and spelling level in this case) are built into a single model. Algorithms to maximize global joint probability may improve the ...

  4. Synthesis and characterization of conductive, biodegradable, elastomeric polyurethanes for biomedical applications.

    Science.gov (United States)

    Xu, Cancan; Yepez, Gerardo; Wei, Zi; Liu, Fuqiang; Bugarin, Alejandro; Hong, Yi

    2016-09-01

    Biodegradable conductive polymers are currently of significant interest in tissue repair and regeneration, drug delivery, and bioelectronics. However, biodegradable materials exhibiting both conductive and elastic properties have rarely been reported to date. To that end, an electrically conductive polyurethane (CPU) was synthesized from polycaprolactone diol, hexadiisocyanate, and aniline trimer and subsequently doped with (1S)-(+)-10-camphorsulfonic acid (CSA). All CPU films showed good elasticity within a 30% strain range. The electrical conductivity of the CPU films, as enhanced with increasing amounts of CSA, ranged from 2.7 ± 0.9 × 10^-10 to 4.4 ± 0.6 × 10^-7 S/cm in the dry state and 4.2 ± 0.5 × 10^-8 to 7.3 ± 1.5 × 10^-5 S/cm in the wet state. The redox peaks of a CPU1.5 film (molar ratio CSA:aniline trimer = 1.5:1) in the cyclic voltammogram confirmed the desired good electroactivity. The doped CPU film exhibited good electrical stability (87% of initial conductivity after 150 hours of charge) as measured in a cell culture medium. The degradation rates of CPU films increased with increasing CSA content in both phosphate-buffered solution (PBS) and lipase/PBS solutions. After 7 days of enzymatic degradation, the conductivity of all CSA-doped CPU films had decreased to that of the undoped CPU film. Mouse 3T3 fibroblasts proliferated and spread on all CPU films. This developed biodegradable CPU with good elasticity, electrical stability, and biocompatibility may find potential applications in tissue engineering, smart drug release, and electronics. © 2016 Wiley Periodicals, Inc. J Biomed Mater Res Part A: 104A: 2305-2314, 2016.

  5. A low-cost multichannel pulse-height analyzer PHA 256 using single-chip microcomputer

    International Nuclear Information System (INIS)

    Koehler, M.; Meiling, W.

    1985-01-01

    The PHA 256 multichannel analyzer, based on the U8820 single-chip microcomputer and applied in radiation measurements, for example in monitoring systems with scintillation detectors, is described. The analyzer contains a power supply unit and 7 boards, namely the processor board; data and program memory; an 8-bit analog-to-digital converter; a display driver; a keyboard with 23 function keys; a pulse amplifier; and a high-voltage supply (up to 2 kV). The software provides preprocessing of spectra supported by the following functions: addition and subtraction of different spectra, spectrum monitoring using a 5-point algorithm, and calculation of peak areas with linearly interpolated background

  6. Development of a portable instantaneous soil radon measurement instrument

    International Nuclear Information System (INIS)

    Wang Yushuang; Ge Liangquan; Jiang Haijing; Lin Yanchang

    2007-01-01

    A dual-channel instantaneous soil radon measurement instrument based on the electrostatic collection method is designed. It features small size, low cost, and high sensitivity. A single-chip microcomputer is adopted as the data processing and control unit. The radon concentration can be reported in the field, and the result is corrected by the pressure sensing system. A dual-channel discriminator is used so that the detector can eliminate interference from the progenies of radon other than RaA. An LCD and an MCU-based encoding keyboard give users a friendly interface, making operation and function setting easy. (authors)

  7. [A magnetic therapy apparatus with an adaptable electromagnetic spectrum for the treatment of prostatitis and gynecopathies].

    Science.gov (United States)

    Kuz'min, A A; Meshkovskiĭ, D V; Filist, S A

    2008-01-01

    Problems of engineering and algorithm development of magnetic therapy apparatuses with pseudo-random radiation spectrum within the audio range for treatment of prostatitis and gynecopathies are considered. A typical design based on a PIC 16F microcontroller is suggested. It includes a keyboard, LCD indicator, audio amplifier, inducer, and software units. The problem of pseudo-random signal generation within the audio range is considered. A series of rectangular pulses is generated on a random-length interval on the basis of a three-component random vector. This series provides the required spectral characteristics of the therapeutic magnetic field and their adaptation to the therapeutic conditions and individual features of the patient.

  8. Generalized Load Sharing for Homogeneous Networks of Distributed Environment

    Directory of Open Access Journals (Sweden)

    A. Satheesh

    2008-01-01

    Full Text Available We propose a method for job migration policies that considers effective usage of global memory in addition to CPU load sharing in distributed systems. When a node is identified as lacking sufficient memory space to serve jobs, one or more jobs of the node will be migrated to remote nodes with low memory allocations. If the memory space is sufficiently large, the jobs will be scheduled by a CPU-based load sharing policy. Following the principle of sharing both CPU and memory resources, we present several load sharing alternatives. Our objective is to reduce the number of page faults caused by unbalanced memory allocations for jobs among distributed nodes, so that the overall performance of a distributed system can be significantly improved. We have conducted trace-driven simulations to compare CPU-based load sharing policies with our policies. We show that our load sharing policies not only improve the performance of memory-bound jobs, but also maintain the same load sharing quality as the CPU-based policies for CPU-bound jobs. Regarding remote execution and preemptive migration strategies, our experiments indicate that strategy selection in load sharing depends on the memory demand of the jobs: remote execution is more effective for memory-bound jobs, and preemptive migration is more effective for CPU-bound jobs. Our CPU-memory-based policy, using either the high-performance or the high-throughput approach together with the remote execution strategy, performs the best for both CPU-bound and memory-bound jobs in homogeneous networks in a distributed environment.
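
    The paper's policies are described only qualitatively here; below is a minimal sketch of a CPU-memory-based placement decision, with a hypothetical node table and memory threshold:

      def place_job(job, nodes, mem_threshold=0.9):
          """Use a CPU policy when memory fits; migrate on memory pressure."""
          home = nodes[job["home"]]
          fits = home["mem_used"] + job["mem"] <= mem_threshold * home["mem_total"]
          if fits:
              # Enough memory at home: fall back to CPU-based load sharing.
              return min(nodes, key=lambda n: nodes[n]["cpu_load"])
          # Memory pressure: pick the node with the most free memory to avoid paging.
          return max(nodes, key=lambda n: nodes[n]["mem_total"] - nodes[n]["mem_used"])

      nodes = {
          "n0": {"cpu_load": 0.9, "mem_used": 7.5, "mem_total": 8.0},
          "n1": {"cpu_load": 0.4, "mem_used": 2.0, "mem_total": 8.0},
      }
      print(place_job({"home": "n0", "mem": 1.0}, nodes))  # -> 'n1'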

  9. The effects of office ergonomic training on musculoskeletal complaints, sickness absence, and psychological well-being: a cluster randomized control trial.

    Science.gov (United States)

    Mahmud, Norashikin; Kenny, Dianna T; Md Zein, Raemy; Hassan, Siti Nurani

    2015-03-01

    This study explored whether musculoskeletal complaints can be reduced by the provision of ergonomics education. A cluster randomized controlled trial was conducted in which 3 units were randomized to the intervention and received training, and 3 units were given a leaflet. The effects of the intervention on knowledge, workstation practices, musculoskeletal complaints, sickness absence, and psychological well-being were assessed at 6 and 12 months. Although there was no increase in knowledge among workers, significant improvements in workstation practices in the use of the monitor, keyboard, and chair were observed. There were significant reductions in neck and upper and lower back complaints among workers, but these did not translate into fewer days lost from work. Workers' stress was found to be significantly reduced across the study. In conclusion, office ergonomics training can be beneficial in reducing musculoskeletal risks and stress among workers. © 2011 APJPH.

  10. Fast Occlusion and Shadow Detection for High Resolution Remote Sensing Image Combined with LIDAR Point Cloud

    Science.gov (United States)

    Hu, X.; Li, X.

    2012-08-01

    The orthophoto is an important component of a GIS database and has been applied in many fields. But occlusion and shadow cause the loss of feature information, which has a great effect on the quality of images. One of the critical steps in true-orthophoto generation is the detection of occlusion and shadow. Nowadays, LiDAR can obtain the digital surface model (DSM) directly; combined with this technology, image occlusion and shadow can be detected automatically. In this paper, the Z-Buffer is applied for occlusion detection. Shadow detection can be regarded as the same problem as occlusion detection, considering the angle between the sun and the camera. However, the Z-Buffer algorithm is computationally expensive, and the volume of scanned data and remote sensing images is very large, so an efficient algorithm is another challenge. The modern graphics processing unit (GPU) is much more powerful than the central processing unit (CPU). We introduce this technology to speed up the Z-Buffer algorithm and obtain a 7-times increase in speed compared with the CPU. The results of the experiments demonstrate that the Z-Buffer algorithm performs well in occlusion and shadow detection combined with a high-density point cloud, and that the GPU can speed up the computation significantly.
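
    A point-cloud Z-Buffer sketch of the occlusion test described above, assuming points have already been projected to integer pixel coordinates with depths; shadow detection would rerun the same test with rays from the sun's direction:

      import numpy as np

      def zbuffer_occlusion(px, py, depth, width, height):
          """Mark points occluded when a nearer point projects to the same pixel."""
          zbuf = np.full((height, width), np.inf)
          # Pass 1: keep the nearest depth per pixel.
          for x, y, z in zip(px, py, depth):
              if z < zbuf[y, x]:
                  zbuf[y, x] = z
          # Pass 2: a point is occluded if something strictly nearer won its pixel.
          eps = 1e-6
          return np.array([z > zbuf[y, x] + eps for x, y, z in zip(px, py, depth)])

      px = np.array([10, 10, 42]); py = np.array([5, 5, 7])
      depth = np.array([3.0, 8.0, 2.0])  # two points share pixel (10, 5)
      print(zbuffer_occlusion(px, py, depth, 64, 64))  # [False  True False]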

  11. A fast and accurate image reconstruction using GPU for OpenPET prototype

    International Nuclear Information System (INIS)

    Kinouchi, Shoko; Suga, Mikio; Yamaya, Taiga; Yoshida, Eiji

    2010-01-01

    The OpenPET (positron emission tomography) scanner, which has a physically open space between two detector rings, is our new geometry intended to enable PET imaging during radiation therapy once a real-time imaging system is realized. In this paper, therefore, we developed a list-mode image reconstruction method using general-purpose graphics processing units (GPUs). We used the list-mode dynamic row-action maximum likelihood algorithm (DRAMA). For a GPU implementation, the efficiency of the acceleration depends on avoiding conditional statements. We therefore developed a system model in which each element of the system matrix is calculated as the value of a detector response function (DRF) evaluated at the distance between the center of a voxel and a line of response (LOR). This system model is well suited to GPU implementation because it allows each element of the system matrix to be calculated with a reduced number of conditional statements. We applied the developed method to a small OpenPET prototype, built as a proof of concept, and measured a micro-Derenzo phantom placed in the gap. The results showed that reconstructed images of the same quality were obtained on the GPU as on the central processing unit (CPU), and that the calculation on the GPU was 35.5 times faster. (author)
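
    The key property of such a system model is that an element reduces to branch-free arithmetic: a point-to-line distance followed by a DRF evaluation. A sketch assuming a Gaussian DRF (the paper's actual DRF shape and width are not specified here):

        import numpy as np

        def system_matrix_element(voxel_center, p1, p2, sigma=2.0):
            """System-matrix element as a DRF of the voxel-to-LOR distance.
            p1, p2: the two detector positions defining the LOR (mm).
            A Gaussian DRF with width sigma is an illustrative assumption."""
            d = p2 - p1
            # Perpendicular distance from the voxel center to the LOR.
            dist = np.linalg.norm(np.cross(voxel_center - p1, d)) / np.linalg.norm(d)
            # No conditionals: the same arithmetic runs in every GPU thread.
            return np.exp(-0.5 * (dist / sigma) ** 2)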

  12. Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors.

    Science.gov (United States)

    Han, Bing; Taha, Tarek M

    2010-04-01

    There is currently a strong push in the research community to develop biological-scale implementations of neuron-based vision models. Systems at this scale are computationally demanding and generally utilize more accurate neuron models, such as the Izhikevich and the Hodgkin-Huxley models, in place of the more popular integrate-and-fire model. We examine the feasibility of using graphics processing units (GPUs) to accelerate a spiking neural network based character recognition network to enable such large-scale systems. Two versions of the network, utilizing the Izhikevich and Hodgkin-Huxley models, are implemented. Three NVIDIA general-purpose (GP) GPU platforms are examined: the GeForce 9800 GX2, the Tesla C1060, and the Tesla S1070. Our results show that the GPGPUs can provide significant speedup over conventional processors. In particular, the fastest GPGPU utilized, the Tesla S1070, provided speedups of 5.6 and 84.4 times over highly optimized implementations on the fastest central processing unit (CPU) tested, a quad-core 2.67 GHz Xeon processor, for the Izhikevich and the Hodgkin-Huxley models, respectively. The CPU implementation utilized all four cores and the vector data parallelism offered by the processor. The results indicate that GPUs are well suited for this application domain.
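
    The Izhikevich model is attractive for GPU acceleration because each neuron's update is a few multiply-adds applied uniformly across the population. A NumPy sketch of one Euler step (standard regular-spiking parameters; dt in ms):

        import numpy as np

        def izhikevich_step(v, u, I, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
            """One Euler step for arrays of membrane potentials v (mV) and
            recovery variables u, driven by input currents I."""
            fired = v >= 30.0                 # spike threshold
            v = np.where(fired, c, v)         # reset fired neurons
            u = np.where(fired, u + d, u)
            dv = 0.04 * v**2 + 5.0 * v + 140.0 - u + I
            du = a * (b * v - u)
            return v + dt * dv, u + dt * du, fired

    On a GPU the same update maps naturally onto one thread per neuron, which is consistent with the larger speedups reported above for the costlier Hodgkin-Huxley arithmetic.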

  13. FAST OCCLUSION AND SHADOW DETECTION FOR HIGH RESOLUTION REMOTE SENSING IMAGE COMBINED WITH LIDAR POINT CLOUD

    Directory of Open Access Journals (Sweden)

    X. Hu

    2012-08-01

    Full Text Available The orthophoto is an important component of a GIS database and has been applied in many fields, but occlusion and shadow cause a loss of feature information that greatly affects image quality. One of the critical steps in true orthophoto generation is therefore the detection of occlusion and shadow. Nowadays LiDAR can provide the digital surface model (DSM) directly; combined with this technology, image occlusion and shadow can be detected automatically. In this paper, the Z-Buffer is applied for occlusion detection. Shadow detection can be regarded as the same problem as occlusion detection, with the sun taking the place of the camera. However, the Z-Buffer algorithm is computationally expensive, and the volume of scanned data and remote sensing imagery is very large, so an efficient algorithm is a further challenge. A modern graphics processing unit (GPU) is much more powerful than a central processing unit (CPU); we exploit this technology to speed up the Z-Buffer algorithm and obtain a 7-fold speed increase over the CPU. The results of the experiments demonstrate that the Z-Buffer algorithm performs well for occlusion and shadow detection when combined with a high-density point cloud, and that the GPU speeds up the computation significantly.

  14. TU-FG-BRB-07: GPU-Based Prompt Gamma Ray Imaging From Boron Neutron Capture Therapy

    Energy Technology Data Exchange (ETDEWEB)

    Kim, S; Suh, T; Yoon, D; Jung, J; Shin, H; Kim, M [The Catholic University of Korea, Seoul (Korea, Republic of)

    2016-06-15

    Purpose: The purpose of this research is to perform fast reconstruction of a prompt gamma ray image using graphics processing unit (GPU) computation from boron neutron capture therapy (BNCT) simulations. Methods: To evaluate the accuracy of the reconstructed image, a phantom including four boron uptake regions (BURs) was used in the simulation. After the Monte Carlo simulation of the BNCT, a modified ordered-subset expectation maximization (OS-EM) reconstruction algorithm using GPU computation was used to reconstruct the images with fewer projections. The computation times for image reconstruction were compared between the GPU and the central processing unit (CPU). The accuracy of the reconstructed image was also evaluated by a receiver operating characteristic (ROC) curve analysis. Results: The image reconstruction time using the GPU was 196 times faster than the conventional reconstruction time using the CPU. For the four BURs, the area-under-the-curve values from the ROC analysis were 0.6726 (A-region), 0.6890 (B-region), 0.7384 (C-region), and 0.8009 (D-region). Conclusion: The tomographic image of the prompt gamma ray events from the BNCT simulation was acquired using GPU computation in order to perform a fast reconstruction during treatment. The authors verified the feasibility of prompt gamma ray reconstruction using GPU computation for BNCT simulations.
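
    OS-EM accelerates plain EM by cycling a multiplicative update over subsets of the projection data, which is why it tolerates fewer projections. A dense-matrix sketch of one iteration (the interleaved subset choice and dense system matrix are illustrative simplifications of the paper's reconstruction):

        import numpy as np

        def osem_iteration(x, A, y, n_subsets=4, eps=1e-10):
            """One OS-EM pass. A: (n_bins, n_voxels) system matrix,
            y: measured projections, x: current image estimate."""
            for s in range(n_subsets):
                rows = np.arange(s, len(y), n_subsets)   # interleaved subset
                As, ys = A[rows], y[rows]
                fp = As @ x                              # forward projection
                ratio = ys / np.maximum(fp, eps)         # measured / estimated
                sens = As.T @ np.ones(len(rows))         # subset sensitivity
                x = x * (As.T @ ratio) / np.maximum(sens, eps)
            return x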

  15. Environmental working level monitor. Final report

    International Nuclear Information System (INIS)

    Keefe, D.; McDowell, W.P.; Groer, P.G.

    1978-01-01

    The Environmental Working Level Monitor (EWLM) is an instrument used to automatically monitor airborne Rn-daughter concentrations and the Working Level (WL). It is an AC-powered, microprocessor-based instrument with an external inverter provided for DC operation if desired. The microprocessor's central processing unit (CPU) controls the actuation of the detector assembly and processes its output signals to yield measurements in the proper units. The detectors are fully automated and require no manual operations once the instrument is programmed. They detect and separate the alpha emitters RaA and RaC' as well as the beta emitters RaB and RaC. The resultant pulses from these detected radioisotopes are transmitted to the CPU, and the programmed microprocessor performs the mathematical manipulations necessary to output accurate Rn-daughter concentrations and the WL. A special subroutine within the system program enables the EWLM to run a calibration procedure on command, which yields calibration data. These data can then be processed by a separate program on most computers capable of BASIC programming. The calibration program derives the coefficients and beta efficiencies required by the main system program to assure proper calibration of the individual EWLMs.
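
    The WL arithmetic the CPU performs is, at its core, a weighted sum of the three daughter concentrations. A sketch using the standard potential-alpha-energy weighting factors (an instrument's calibrated coefficients would replace these nominal values):

        def working_level(c_raa, c_rab, c_rac):
            """Working Level from Rn-daughter concentrations in pCi/L.
            Weights are the standard potential-alpha-energy factors
            (1 WL = 1.3e5 MeV of potential alpha energy per litre of air)."""
            return 0.00103 * c_raa + 0.00507 * c_rab + 0.00373 * c_rac

        # Daughters in equilibrium at 100 pCi/L give approximately 1 WL.
        print(working_level(100, 100, 100))   # ~0.98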

  16. PERFORMANCE EVALUATION OF OR1200 PROCESSOR WITH EVOLUTIONARY PARALLEL HPRC USING GEP

    Directory of Open Access Journals (Sweden)

    R. Maheswari

    2012-04-01

    Full Text Available In this era of fast computing, most embedded systems require more computing power to complete complex functions/tasks in less time. One way to achieve this is to boost processor performance, allowing the processor core to run faster. This paper presents a novel technique for increasing performance by placing parallel HPRC (High Performance Reconfigurable Computing) in the CPU/DSP (Digital Signal Processor) unit of the OR1200 (Open RISC 1200), driven by Gene Expression Programming (GEP), an evolutionary programming model. The OR1200 is a soft-core Reduced Instruction Set Computer (RISC) processor, available as an Intellectual Property core, that can efficiently run any modern operating system. In this design, the parallel HPRC is placed inside the Integer Execution Pipeline unit of the CPU/DSP core to increase performance. The GEP parallel HPRC is activated/deactivated by triggering the signals (i) HPRC_Gene_Start and (ii) HPRC_Gene_End. In the first part of the work, a Verilog HDL (Hardware Description Language) functional model of the GEP parallel HPRC was developed and synthesised using Xilinx ISE; in the second part, the CoreMark processor benchmark was used to test the performance of the OR1200 soft core. The results of the implementation show an overall speed-up of 20.59% from the GEP-based parallel HPRC in the execution unit of the OR1200.
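
    GEP individuals are fixed-length strings (K-expressions in Karva notation) decoded breadth-first into expression trees, a regularity that makes them convenient to evaluate in a pipelined hardware unit. A software sketch of the decode-and-evaluate step (the gene and terminal set are illustrative; the paper's hardware encoding is not specified here):

        import operator

        OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}
        ARITY = {'+': 2, '-': 2, '*': 2}

        def eval_kexpression(gene, env):
            """Decode a Karva expression level by level, then evaluate it."""
            nodes = [gene[0]]
            levels, pos = [nodes], 1
            while any(s in OPS for s in nodes):
                width = sum(ARITY.get(s, 0) for s in nodes)  # children needed
                nodes = list(gene[pos:pos + width])
                levels.append(nodes)
                pos += width
            values = [env[s] for s in levels[-1]]            # leaf level
            for level in reversed(levels[:-1]):              # fold upwards
                nxt = iter(values)
                values = [OPS[s](next(nxt), next(nxt)) if s in OPS else env[s]
                          for s in level]
            return values[0]

        # '*+a-abba' decodes to ((b - b) + a) * a = a**2.
        print(eval_kexpression('*+a-abba', {'a': 3.0, 'b': 2.0}))   # 9.0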

  17. Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments

    Directory of Open Access Journals (Sweden)

    Jyh-Da Wei

    2017-08-01

    Full Text Available High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing fields over the past decade. These desktop GPU cards must be installed in personal computers/servers with desktop CPUs, and the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA released an embedded board, called the Jetson Tegra K1 (TK1), which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture cores (belonging to the Kepler GPU family). The Jetson Tegra K1 has several advantages, such as low cost, low power consumption, and high applicability, and it has been applied to several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform) was constructed; that work also showed that Web and mobile services can be implemented on the STK platform with a good cost-performance ratio, by comparing the STK platform with desktop CPUs and GPUs. In this work, an embedded GPU cluster platform is constructed with multiple TK1s (MTK platform). Complex system installation and setup procedures are needed first. Then, 2 job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk are ported to the MTK platform. The experimental results showed that the speedup ratios achieved 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, when comparing 6 TK1s with a single TK1. The MTK platform is proven to be useful for multiple sequence alignments.
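
    The abstract does not spell out its two job assignment modes; a static round-robin spread and a dynamic least-loaded dispatch are plausible stand-ins, sketched below for a hypothetical six-node MTK cluster:

        import itertools

        class Dispatcher:
            def __init__(self, nodes):
                self.nodes = nodes                    # e.g. ["tk1-0", ..., "tk1-5"]
                self.pending = {n: 0 for n in nodes}  # queued jobs per node
                self._rr = itertools.cycle(nodes)

            def assign(self, mode="round_robin"):
                if mode == "round_robin":             # mode 1: static, even spread
                    node = next(self._rr)
                else:                                 # mode 2: least-loaded node
                    node = min(self.pending, key=self.pending.get)
                self.pending[node] += 1
                return node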

  18. Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments.

    Science.gov (United States)

    Wei, Jyh-Da; Cheng, Hui-Jun; Lin, Chun-Yuan; Ye, Jin; Yeh, Kuan-Yu

    2017-01-01

    High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing fields over the past decade. These desktop GPU cards must be installed in personal computers/servers with desktop CPUs, and the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA released an embedded board, called the Jetson Tegra K1 (TK1), which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture cores (belonging to the Kepler GPU family). The Jetson Tegra K1 has several advantages, such as low cost, low power consumption, and high applicability, and it has been applied to several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform) was constructed; that work also showed that Web and mobile services can be implemented on the STK platform with a good cost-performance ratio, by comparing the STK platform with desktop CPUs and GPUs. In this work, an embedded GPU cluster platform is constructed with multiple TK1s (MTK platform). Complex system installation and setup procedures are needed first. Then, 2 job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk are ported to the MTK platform. The experimental results showed that the speedup ratios achieved 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, when comparing 6 TK1s with a single TK1. The MTK platform is proven to be useful for multiple sequence alignments.

  19. High performance image acquisition and processing architecture for fast plant system controllers based on FPGA and GPU

    International Nuclear Information System (INIS)

    Nieto, J.; Sanz, D.; Guillén, P.; Esquembri, S.; Arcas, G. de; Ruiz, M.; Vega, J.; Castro, R.

    2016-01-01

    Highlights: • To test an image acquisition and processing system for Camera Link devices based on an FPGA, compliant with ITER fast controllers. • To move data acquired from the NI 1483-NI PXIe-7966R set directly to an NVIDIA GPU using NVIDIA GPUDirect RDMA technology. • To obtain a methodology for including GPU processing in ITER Fast Plant Controllers, using EPICS integration through Nominal Device Support (NDS). - Abstract: The two dominant technologies used in real-time image processing are the Field Programmable Gate Array (FPGA) and the Graphics Processing Unit (GPU), owing to their algorithm parallelization capabilities. However, little work has been done to standardize how these technologies can be integrated into data acquisition systems where control and supervisory requirements are in place, such as ITER (International Thermonuclear Experimental Reactor). This work proposes an architecture, and a development methodology, for image acquisition and processing systems based on FPGAs and GPUs that are compliant with ITER fast controller solutions. A use case based on a Camera Link device connected to an FPGA DAQ device (National Instruments FlexRIO technology) and an NVIDIA Tesla series GPU card has been developed and tested. The proposed architecture is designed to optimize system performance by minimizing data transfer operations and CPU intervention through the use of NVIDIA GPUDirect RDMA and DMA technologies. These allow data to be moved directly between the hardware elements (FPGA DAQ, GPU, CPU), avoiding CPU intervention and therefore the use of intermediate CPU memory buffers. A special effort has been made to provide a development methodology that, while maintaining the highest possible abstraction from low-level implementation details, yields solutions that conform to CODAC Core System standards by providing EPICS and Nominal Device Support.

  20. High performance image acquisition and processing architecture for fast plant system controllers based on FPGA and GPU

    Energy Technology Data Exchange (ETDEWEB)

    Nieto, J., E-mail: jnieto@sec.upm.es [Grupo de Investigación en Instrumentación y Acústica Aplicada, Universidad Politécnica de Madrid, Crta. Valencia Km-7, Madrid 28031 (Spain); Sanz, D.; Guillén, P.; Esquembri, S.; Arcas, G. de; Ruiz, M. [Grupo de Investigación en Instrumentación y Acústica Aplicada, Universidad Politécnica de Madrid, Crta. Valencia Km-7, Madrid 28031 (Spain); Vega, J.; Castro, R. [Asociación EURATOM/CIEMAT para Fusión, Madrid (Spain)

    2016-11-15

    Highlights: • To test an image acquisition and processing system for Camera Link devices based on an FPGA, compliant with ITER fast controllers. • To move data acquired from the NI 1483-NI PXIe-7966R set directly to an NVIDIA GPU using NVIDIA GPUDirect RDMA technology. • To obtain a methodology for including GPU processing in ITER Fast Plant Controllers, using EPICS integration through Nominal Device Support (NDS). - Abstract: The two dominant technologies used in real-time image processing are the Field Programmable Gate Array (FPGA) and the Graphics Processing Unit (GPU), owing to their algorithm parallelization capabilities. However, little work has been done to standardize how these technologies can be integrated into data acquisition systems where control and supervisory requirements are in place, such as ITER (International Thermonuclear Experimental Reactor). This work proposes an architecture, and a development methodology, for image acquisition and processing systems based on FPGAs and GPUs that are compliant with ITER fast controller solutions. A use case based on a Camera Link device connected to an FPGA DAQ device (National Instruments FlexRIO technology) and an NVIDIA Tesla series GPU card has been developed and tested. The proposed architecture is designed to optimize system performance by minimizing data transfer operations and CPU intervention through the use of NVIDIA GPUDirect RDMA and DMA technologies. These allow data to be moved directly between the hardware elements (FPGA DAQ, GPU, CPU), avoiding CPU intervention and therefore the use of intermediate CPU memory buffers. A special effort has been made to provide a development methodology that, while maintaining the highest possible abstraction from low-level implementation details, yields solutions that conform to CODAC Core System standards by providing EPICS and Nominal Device Support.