WorldWideScience

Sample records for unit cpu time

  1. Combustion Power Unit--400: CPU-400.

    Science.gov (United States)

    Combustion Power Co., Palo Alto, CA.

    Aerospace technology may have led to a unique basic unit for processing solid wastes and controlling pollution. The Combustion Power Unit--400 (CPU-400) is designed as a turboelectric generator plant that will use municipal solid wastes as fuel. The baseline configuration is a modular unit that is designed to utilize 400 tons of refuse per day…

  2. Proposed Fuzzy CPU Scheduling Algorithm (PFCS) for Real Time Operating Systems

    Directory of Open Access Journals (Sweden)

    Prerna Ajmani

    2013-12-01

    In the era of supercomputers, the multiprogramming operating system has emerged. A multiprogramming operating system allows more than one ready-to-execute process to be loaded into memory. CPU scheduling is the process of selecting one of the processes in memory that are ready to execute and allocating processor (CPU) time to it. Many conventional algorithms have been proposed for CPU scheduling, such as FCFS, shortest job first (SJF) and priority scheduling, but no algorithm is absolutely ideal in terms of increased throughput, decreased waiting time, decreased turnaround time, etc. In this paper, a new fuzzy-logic-based CPU scheduling algorithm is proposed to overcome the drawbacks of conventional algorithms and to utilize the CPU efficiently.
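
    The abstract names the inputs a fuzzy scheduler weighs (burst length, waiting time, priority) but not the membership functions or rule base. The sketch below is a minimal, illustrative fuzzification-and-ranking step in Python; the ramp shapes, thresholds and weights are assumptions, not the PFCS rules.

    def ramp_up(x, a, b):
        """Membership degree rising linearly from 0 at a to 1 at b, then saturating."""
        if x <= a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)

    def ramp_down(x, a, b):
        """Membership degree falling linearly from 1 at a to 0 at b."""
        return 1.0 - ramp_up(x, a, b)

    def fuzzy_score(burst_est, waited, static_prio):
        """Combine 'short burst', 'waited long' and 'urgent priority' degrees into one
        crisp score (higher = dispatched first); thresholds and weights are illustrative."""
        short = ramp_down(burst_est, 5, 20)     # bursts under ~5 ms count as fully 'short'
        starved = ramp_up(waited, 10, 50)       # waits over ~50 ms count as fully 'long'
        urgent = ramp_down(static_prio, 0, 5)   # numerically low priority = urgent
        return 0.5 * short + 0.3 * starved + 0.2 * urgent

    # ready set: pid -> (estimated burst in ms, time waited in ms, static priority)
    ready = {"P1": (8, 12, 1), "P2": (3, 40, 3), "P3": (20, 5, 0)}
    print(max(ready, key=lambda p: fuzzy_score(*ready[p])))   # process dispatched next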

  3. Self-referral to chest pain units: results of the German CPU-registry.

    Science.gov (United States)

    Nowak, Bernd; Giannitsis, Evangelos; Riemer, Thomas; Münzel, Thomas; Haude, Michael; Maier, Lars S; Schmitt, Claus; Schumacher, Burghard; Mudra, Harald; Hamm, Christian; Senges, Jochen; Voigtländer, Thomas

    2012-12-01

    Chest pain units (CPUs) are increasingly established in emergency cardiology services. With improved visibility of CPUs in the population, patients may refer themselves directly to these units, obviating emergency medical services (EMS). Little is known about characteristics and outcomes of self-referred patients, as compared with those referred by EMS. Therefore, we described self-referral patients enrolled in the CPU-registry of the German Cardiac Society and compared them with those referred by EMS. From 2008 until 2010, the prospective CPU-registry enrolled 11,581 consecutive patients. Of those, 3789 (32.7%) were self-referrals (SRs), while 7792 (67.3%) were referred by EMS. SR-patients were significantly younger (63.6 vs. 70.1 years), had less prior myocardial infarction or coronary artery bypass surgery, but more previous percutaneous coronary interventions (PCIs). Acute coronary syndromes were diagnosed less frequently in the SR-patients (30.3 vs. 46.9%). Patients presenting to a CPU as self-referrals are younger, less severely ill and have more non-coronary problems than those calling an emergency medical service. Nevertheless, 30% of self-referral patients had an acute coronary syndrome.

  4. CPU timing routines for a CONVEX C220 computer system

    Science.gov (United States)

    Bynum, Mary Ann

    1989-01-01

    The timing routines available on the CONVEX C220 computer system in the Structural Mechanics Division (SMD) at NASA Langley Research Center are examined. The function of the timing routines, the use of the timing routines in sequential, parallel, and vector code, and the interpretation of the results from the timing routines with respect to the CONVEX model of computing are described. The timing routines available on the SMD CONVEX fall into two groups. The first group includes standard timing routines generally available with UNIX 4.3 BSD operating systems, while the second group includes routines unique to the SMD CONVEX. The standard timing routines described in this report are /bin/csh time, /bin/time, etime, and ctime. The routines unique to the SMD CONVEX are getinfo, second, cputime, toc, and a parallel profiling package made up of palprof, palinit, and palsum.
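
    The routines listed above report per-process CPU time as opposed to elapsed wall-clock time, which is the distinction that matters when timing sequential, parallel, and vector code. The short Python sketch below illustrates that distinction only; it is not the CONVEX- or SMD-specific routines described in the report.

    import time

    def busy(n=2_000_000):
        s = 0
        for i in range(n):
            s += i * i
        return s

    wall0, cpu0 = time.perf_counter(), time.process_time()
    busy()
    time.sleep(0.5)   # sleeping consumes wall-clock time but almost no CPU time
    wall1, cpu1 = time.perf_counter(), time.process_time()

    print(f"wall-clock: {wall1 - wall0:.3f} s")   # includes the 0.5 s sleep
    print(f"CPU time:   {cpu1 - cpu0:.3f} s")     # roughly just the busy loop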

  5. A hybrid CPU-GPGPU approach for real-time elastography.

    Science.gov (United States)

    Yang, Xu; Deka, Sthiti; Righetti, Raffaella

    2011-12-01

    Ultrasound elastography is becoming a widely available clinical imaging tool. In recent years, several real-time elastography algorithms have been proposed; however, most of these algorithms achieve real-time frame rates through compromises in elastographic image quality. Cross-correlation-based elastographic techniques are known to provide high-quality elastographic estimates, but they are computationally intense and usually not suitable for real-time clinical applications. Recently, the use of massively parallel general purpose graphics processing units (GPGPUs) for accelerating computationally intense operations in biomedical applications has received great interest. In this study, we investigate the use of the GPGPU to speed up generation of cross-correlation-based elastograms and achieve real-time frame rates while preserving elastographic image quality. We propose and statistically analyze performance of a new hybrid model of computation suitable for elastography applications in which sequential code is executed on the CPU and parallel code is executed on the GPGPU. Our results indicate that the proposed hybrid approach yields optimal results and adequately addresses the trade-off between speed and quality.
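
    Cross-correlation-based elastography estimates local tissue displacement by finding the lag that maximizes the correlation between pre- and post-compression RF windows, which is the computationally intense kernel that the hybrid CPU-GPGPU model accelerates. Below is a schematic 1-D sketch of that estimation in Python/NumPy; the simulated signals, window size and search range are assumptions, and the authors' actual pipeline is not reproduced.

    import numpy as np

    rng = np.random.default_rng(0)
    pre = rng.standard_normal(4096)        # simulated pre-compression RF line
    true_shift = 7
    post = np.roll(pre, true_shift)        # post-compression = shifted copy of it

    def estimate_shift(a, b, max_lag=20):
        """Return the integer lag in [-max_lag, max_lag] that maximizes correlation."""
        lags = list(range(-max_lag, max_lag + 1))
        scores = [np.dot(a[max_lag:-max_lag], np.roll(b, -lag)[max_lag:-max_lag])
                  for lag in lags]
        return lags[int(np.argmax(scores))]

    print(estimate_shift(pre, post))       # recovers the applied shift (7)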

  6. Improvement of CPU time of Linear Discriminant Function based on MNM criterion by IP

    Directory of Open Access Journals (Sweden)

    Shuichi Shinmura

    2014-05-01

    Revised IP-OLDF (optimal linear discriminant function by integer programming) is a linear discriminant function that minimizes the number of misclassifications (NM) of training samples by integer programming (IP). However, IP requires large computation (CPU) time. This paper proposes how to reduce CPU time by using linear programming (LP). In the first phase, Revised LP-OLDF is applied to all cases, and the cases are categorized into two groups: those classified correctly and those misclassified by support vectors (SVs). In the second phase, Revised IP-OLDF is applied to the cases misclassified by SVs. This method is called Revised IPLP-OLDF. In this research, it is evaluated whether the NM of Revised IPLP-OLDF is a good estimate of the minimum number of misclassifications (MNM) obtained by Revised IP-OLDF. Four kinds of real data—Iris data, Swiss bank note data, student data, and CPD data—are used as training samples, and four kinds of 20,000 re-sampling cases generated from these data are used as evaluation samples. In total, 149 models covering all combinations of independent variables are built from these data. The NMs and CPU times of the 149 models are compared between Revised IPLP-OLDF and Revised IP-OLDF. The following results are obtained: 1) Revised IPLP-OLDF significantly improves CPU time. 2) For the training samples, all 149 NMs of Revised IPLP-OLDF are equal to the MNM of Revised IP-OLDF. 3) For the evaluation samples, most NMs of Revised IPLP-OLDF are equal to the NM of Revised IP-OLDF. 4) The generalization abilities of both discriminant functions are concluded to be high, because the difference between the error rates of the training and evaluation samples is almost within 2%. Therefore, Revised IPLP-OLDF is recommended for the analysis of big data instead of Revised IP-OLDF. Next, Revised IPLP-OLDF is compared with LDF and logistic regression by 100-fold cross validation using 100 re-sampling samples. Means of error rates of
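
    The core computational object above is a linear discriminant chosen to minimize the number of misclassifications (NM) by integer programming. A minimal big-M sketch of that idea on toy data is given below using the PuLP modeller; the data, variable bounds, big-M constant and unit margin are assumptions, and the exact Revised IP-OLDF / IPLP-OLDF formulations are not reproduced.

    import pulp

    X = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 3.0)]   # two illustrative features
    y = [1, 1, -1, -1]                                      # class labels
    M = 100.0                                               # big-M constant (assumed)

    prob = pulp.LpProblem("mnm_discriminant", pulp.LpMinimize)
    w = [pulp.LpVariable(f"w{j}", -10, 10) for j in range(2)]
    b = pulp.LpVariable("b", -10, 10)
    e = [pulp.LpVariable(f"e{i}", cat="Binary") for i in range(len(X))]  # 1 = misclassified

    prob += pulp.lpSum(e)                                   # minimize the number of errors
    for i, (xi, yi) in enumerate(zip(X, y)):
        # If e[i] = 0 the case must satisfy the margin; if e[i] = 1 the constraint relaxes.
        prob += yi * (pulp.lpSum(w[j] * xi[j] for j in range(2)) + b) >= 1 - M * e[i]

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print("misclassifications:", int(pulp.value(prob.objective)))   # 0 for this separable toy set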

  7. Chest pain unit (CPU) in the management of low to intermediate risk acute coronary syndrome: a tertiary hospital experience from New Zealand.

    Science.gov (United States)

    Mazhar, J; Killion, B; Liang, M; Lee, M; Devlin, G

    2013-02-01

    A chest pain unit (CPU) for management of patients with chest pain at low to intermediate risk for acute coronary syndrome (ACS) appears safe and cost-effective. We report our experience with a CPU from March 2005 to July 2009: a prospective audit of patients presenting with chest pain suggestive of ACS but without high-risk features, managed using a CPU, which included serial cardiac troponins, electrocardiography, and an exercise tolerance test (ETT) if indicated. Outcomes assessed included the three-month readmission rate and one-year mortality. 2358 patients were managed in the CPU. Mean age was 56 years (range 17-96 years), 59% were men, and the median stay was 22 h (IQR 17-26 h). 1933 (82%) were diagnosed with non-cardiac chest pain. 1741 (74%) patients had an ETT; median time from triage to ETT was 21 h (IQR 16-24 h). 64 (2.7%) were readmitted within three months; the majority of readmissions, 39 (61%), were for a non-cardiac cause, and twenty patients (1%) were readmitted with ACS. There were no cardiac deaths within one year among patients discharged with non-cardiac chest pain. This study confirms that a CPU with high usage of predischarge ETT is a safe and effective way of excluding ACS in patients without high-risk features in a New Zealand setting. Copyright © 2012 Australian and New Zealand Society of Cardiac and Thoracic Surgeons (ANZSCTS) and the Cardiac Society of Australia and New Zealand (CSANZ). Published by Elsevier B.V. All rights reserved.

  8. An FPGA Based Multiprocessing CPU for Beam Synchronous Timing in CERN's SPS and LHC

    CERN Document Server

    Ballester, F J; Gras, J J; Lewis, J; Savioz, J J; Serrano, J

    2003-01-01

    The Beam Synchronous Timing system (BST) will be used around the LHC and its injector, the SPS, to broadcast timing messages and synchronize actions with the beam in different receivers. To achieve beam synchronization, the BST Master card encodes messages using the bunch clock, with a nominal value of 40.079 MHz for the LHC. These messages are produced by a set of tasks every revolution period, which is every 89 µs for the LHC and every 23 µs for the SPS, therefore imposing a hard real-time constraint on the system. To achieve determinism, the BST Master uses a dedicated CPU inside its main Field Programmable Gate Array (FPGA) featuring zero-delay hardware task switching and a reduced instruction set. This paper describes the BST Master card, stressing the main FPGA design, as well as the associated software, including the LynxOS driver and the tailor-made assembler.

  9. A Robust Ultra-Low Voltage CPU Utilizing Timing-Error Prevention

    Directory of Open Access Journals (Sweden)

    Markus Hiienkari

    2015-04-01

    To minimize the energy consumption of a digital circuit, logic can be operated at sub- or near-threshold voltage. Operation in this region is challenging due to device and environment variations, and the resulting performance may not be adequate for all applications. This article presents two variants of a 32-bit RISC CPU targeted for near-threshold voltage. Both CPUs are placed on the same die and manufactured in a 28 nm CMOS process. They employ timing-error prevention with clock stretching to enable operation with minimal safety margins while maximizing performance and energy efficiency at a given operating point. Measurements show a minimum energy of 3.15 pJ/cyc at 400 mV, which corresponds to a 39% energy saving compared to operation based on static signoff timing.

  10. A Spiking Neural Simulator Integrating Event-Driven and Time-Driven Computation Schemes Using Parallel CPU-GPU Co-Processing: A Case Study.

    Science.gov (United States)

    Naveros, Francisco; Luque, Niceto R; Garrido, Jesús A; Carrillo, Richard R; Anguita, Mancia; Ros, Eduardo

    2015-07-01

    Time-driven simulation methods in traditional CPU architectures perform well and precisely when simulating small-scale spiking neural networks. Nevertheless, they still have drawbacks when simulating large-scale systems. Conversely, event-driven simulation methods in CPUs and time-driven simulation methods in graphic processing units (GPUs) can outperform CPU time-driven methods under certain conditions. With this performance improvement in mind, we have developed an event-and-time-driven spiking neural network simulator suitable for a hybrid CPU-GPU platform. Our neural simulator is able to efficiently simulate bio-inspired spiking neural networks consisting of different neural models, which can be distributed heterogeneously in both small layers and large layers or subsystems. For the sake of efficiency, the low-activity parts of the neural network can be simulated in CPU using event-driven methods while the high-activity subsystems can be simulated in either CPU (a few neurons) or GPU (thousands or millions of neurons) using time-driven methods. In this brief, we have undertaken a comparative study of these different simulation methods. For benchmarking the different simulation methods and platforms, we have used a cerebellar-inspired neural-network model consisting of a very dense granular layer and a Purkinje layer with a smaller number of cells (according to biological ratios). Thus, this cerebellar-like network includes a dense diverging neural layer (increasing the dimensionality of its internal representation and sparse coding) and a converging neural layer (integration) similar to many other biologically inspired and also artificial neural networks.
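
    The trade-off described above is between time-driven updates (step every neuron on a fixed grid) and event-driven updates (advance a neuron analytically only when an input spike arrives). The toy Python sketch below contrasts the two for a single leaky integrate-and-fire neuron; the parameters and the exponential-decay shortcut are generic textbook choices, not the models or the CPU-GPU co-processing scheme of the simulator described here.

    import math

    tau, v_rest, v_thresh, w_syn = 20.0, 0.0, 1.0, 0.3   # ms and arbitrary units
    spike_times = [5.0, 12.0, 13.0, 14.0]                 # input spike times (ms)

    def time_driven(dt=0.1, t_end=30.0):
        """Advance the membrane potential on a fixed time grid (Euler leak steps)."""
        v, t, pending = v_rest, 0.0, list(spike_times)
        while t < t_end:
            v += dt * (-(v - v_rest) / tau)               # leak
            while pending and pending[0] <= t + dt:
                pending.pop(0)
                v += w_syn                                # synaptic kick
            if v >= v_thresh:
                return t + dt                             # first output spike time
            t += dt
        return None

    def event_driven():
        """Jump from one input event to the next using closed-form exponential decay."""
        v, t_last = v_rest, 0.0
        for t in spike_times:
            v = v_rest + (v - v_rest) * math.exp(-(t - t_last) / tau)
            v += w_syn
            if v >= v_thresh:
                return t
            t_last = t
        return None

    print(time_driven(), event_driven())   # both fire around t = 14 ms for these inputs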

  11. Comparative Performance Analysis of Best Performance Round Robin Scheduling Algorithm (BPRR) using Dynamic Time Quantum with Priority Based Round Robin (PBRR) CPU Scheduling Algorithm in Real Time Systems

    OpenAIRE

    Pallab Banerjee; Talat Zabin; ShwetaKumai; Pushpa Kumari

    2015-01-01

    The Round Robin scheduling algorithm is designed especially for real-time operating systems (RTOS). It is a preemptive CPU scheduling algorithm which switches between processes when a static time quantum expires. The existing Round Robin CPU scheduling algorithm cannot be implemented in real-time operating systems due to its high context-switch rate, large waiting time, large response time, large turnaround time and low throughput. In this paper a new algorithm is presented called Best Performan...
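
    The quantities the comparison turns on (waiting time, turnaround time, and context switches as a function of the quantum) come straight out of a round-robin timeline. A minimal simulation is sketched below; because the abstract is truncated before it states the BPRR dynamic-quantum rule, a fixed quantum is used here purely for illustration.

    from collections import deque

    def round_robin(bursts, quantum):
        """bursts: {pid: burst_time}; all jobs are assumed to arrive at t = 0."""
        remaining = dict(bursts)
        queue = deque(bursts)
        t, finish, dispatches = 0, {}, 0
        while queue:
            pid = queue.popleft()
            run = min(quantum, remaining[pid])
            t += run
            remaining[pid] -= run
            dispatches += 1
            if remaining[pid] == 0:
                finish[pid] = t
            else:
                queue.append(pid)
        turnaround = {p: finish[p] for p in bursts}           # arrival time is 0
        waiting = {p: finish[p] - bursts[p] for p in bursts}
        return waiting, turnaround, dispatches

    w, ta, d = round_robin({"P1": 24, "P2": 3, "P3": 3}, quantum=4)
    print(w, ta, d)   # classic textbook case: waiting times {P1: 6, P2: 4, P3: 7}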

  12. Predictable CPU Architecture Designed for Small Real-Time Application - Concept and Theory of Operation

    Directory of Open Access Journals (Sweden)

    Nicoleta Cristina GAITAN

    2015-04-01

    The purpose of this paper is to describe a predictable CPU architecture based on a five-stage pipeline assembly line and a hardware scheduler engine. We aim at developing a fine-grained multithreading implementation, named nMPRA-MT. The newly proposed architecture uses replication and remapping techniques for the program counter, the register file, and the pipeline registers, and is implemented with an FPGA device. An original implementation of a MIPS processor with a thread-interleaved pipeline is obtained, using dynamic scheduling of hard real-time tasks and interrupts. In terms of interrupt handling, the architecture uses a particular method consisting of assigning interrupts to tasks, which ensures efficient control of both the context switch and the system's real-time behavior. The originality of the approach resides in the predictability and spatial isolation of the hard real-time tasks, executed every two clock cycles. The nMPRA-MT architecture is enabled by an innovative predictable scheduling scheme, without stalling the pipeline assembly line.

  13. The German CPU Registry: Dyspnea independently predicts negative short-term outcome in patients admitted to German Chest Pain Units.

    Science.gov (United States)

    Hellenkamp, Kristian; Darius, Harald; Giannitsis, Evangelos; Erbel, Raimund; Haude, Michael; Hamm, Christian; Hasenfuss, Gerd; Heusch, Gerd; Mudra, Harald; Münzel, Thomas; Schmitt, Claus; Schumacher, Burghard; Senges, Jochen; Voigtländer, Thomas; Maier, Lars S

    2015-02-15

    While dyspnea is a common symptom in patients admitted to Chest Pain Units (CPUs), little is known about the impact of dyspnea on their outcome. The purpose of this study was to evaluate the impact of dyspnea on the short-term outcome of CPU patients. We analyzed data from a total of 9169 patients admitted to one of the 38 participating CPUs in this registry between December 2008 and January 2013. Only patients who underwent coronary angiography for suspected ACS were included. 2601 patients (28.4%) presented with dyspnea. Patients with dyspnea at admission were older and frequently had a wide range of comorbidities compared to patients without dyspnea. Heart failure symptoms in particular were more common in patients with dyspnea (21.0% vs. 5.3%). Dyspnea independently predicted a negative short-term outcome in CPU patients: our data show that dyspnea is associated with a fourfold higher 3-month mortality, which is underestimated by the established ACS risk scores. To improve their predictive value we therefore propose to add dyspnea as an item to common risk scores. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  14. Identification of a site critical for kinase regulation on the central processing unit (CPU) helix of the aspartate receptor.

    Science.gov (United States)

    Trammell, M A; Falke, J J

    1999-01-05

    Ligand binding to the homodimeric aspartate receptor of Escherichia coli and Salmonella typhimurium generates a transmembrane signal that regulates the activity of a cytoplasmic histidine kinase, thereby controlling cellular chemotaxis. This receptor also senses intracellular pH and ambient temperature and is covalently modified by an adaptation system. A specific helix in the cytoplasmic domain of the receptor, helix alpha6, has been previously implicated in the processing of these multiple input signals. While the solvent-exposed face of helix alpha6 possesses adaptive methylation sites known to play a role in kinase regulation, the functional significance of its buried face is less clear. This buried region lies at the subunit interface where helix alpha6 packs against its symmetric partner, helix alpha6'. To test the role of the helix alpha6-helix alpha6' interface in kinase regulation, the present study introduces a series of 13 side-chain substitutions at the Gly 278 position on the buried face of helix alpha6. The substitutions are observed to dramatically alter receptor function in vivo and in vitro, yielding effects ranging from kinase superactivation (11 examples) to complete kinase inhibition (one example). Moreover, four hydrophobic, branched side chains (Val, Ile, Phe, and Trp) lock the kinase in the superactivated state regardless of whether the receptor is occupied by ligand. The observation that most side-chain substitutions at position 278 yield kinase superactivation, combined with evidence that such facile superactivation is rare at other receptor positions, identifies the buried Gly 278 residue as a regulatory hotspot where helix packing is tightly coupled to kinase regulation. Together, helix alpha6 and its packing interactions function as a simple central processing unit (CPU) that senses multiple input signals, integrates these signals, and transmits the output to the signaling subdomain where the histidine kinase is bound. Analogous CPU

  15. Who gets admitted to the Chest Pain Unit (CPU) and how do we manage them? Improving the use of the CPU in Waikato DHB, New Zealand.

    Science.gov (United States)

    Jade, Judith; Huggan, Paul; Stephenson, Douglas

    2015-01-01

    Chest pain is a commonly encountered presentation in the emergency department (ED). The chest pain unit at Waikato DHB is designed for patients with likely stable angina who are at low risk of acute coronary syndrome (ACS), with a normal ECG and Troponin T, and who have a history highly suggestive of coronary artery disease (CAD). Two issues were identified with patient care on the unit: (1) the number of inappropriate admissions and (2) the number of inappropriate exercise tolerance tests. A baseline study showed that 73% of admissions did not fulfil the criteria and that the majority of patients (72%) had an exercise tolerance test (ETT) irrespective of the clinical picture. We delivered educational presentations to key stakeholders and implemented a new fast-track chest pain pathway for discharging patients directly from the ED. There was an improvement in the number of patients inappropriately admitted, which fell to 61%. However, the number of inappropriate ETTs did not decrease; ETTs were still performed on 76.9% of patients.

  16. Robotic goalie with 3 ms reaction time at 4% CPU load using event-based dynamic vision sensor.

    Science.gov (United States)

    Delbruck, Tobi; Lang, Manuel

    2013-01-01

    Conventional vision-based robotic systems that must operate quickly require high video frame rates and consequently high computational costs. Visual response latencies are lower-bound by the frame period, e.g., 20 ms for 50 Hz frame rate. This paper shows how an asynchronous neuromorphic dynamic vision sensor (DVS) silicon retina is used to build a fast self-calibrating robotic goalie, which offers high update rates and low latency at low CPU load. Independent and asynchronous per pixel illumination change events from the DVS signify moving objects and are used in software to track multiple balls. Motor actions to block the most "threatening" ball are based on measured ball positions and velocities. The goalie also sees its single-axis goalie arm and calibrates the motor output map during idle periods so that it can plan open-loop arm movements to desired visual locations. Blocking capability is about 80% for balls shot from 1 m from the goal even with the fastest shots, and approaches 100% accuracy when the ball does not beat the limits of the servo motor to move the arm to the necessary position in time. Running with standard USB buses under a standard preemptive multitasking operating system (Windows), the goalie robot achieves median update rates of 550 Hz, with latencies of 2.2 ± 2 ms from ball movement to motor command at a peak CPU load of less than 4%. Practical observations and measurements of USB device latency are provided.

  17. Robotic Goalie with 3ms Reaction Time at 4% CPU Load Using Event-Based Dynamic Vision Sensor

    Directory of Open Access Journals (Sweden)

    Tobi eDelbruck

    2013-11-01

    Conventional vision-based robotic systems that must operate quickly require high video frame rates and consequently high computational costs. Visual response latencies are lower-bound by the frame period, e.g. 20 ms for a 50 Hz frame rate. This paper shows how an asynchronous neuromorphic dynamic vision sensor (DVS) silicon retina is used to build a fast self-calibrating robotic goalie, which offers high update rates and low latency at low CPU load. Independent and asynchronous per-pixel illumination change events from the DVS signify moving objects and are used in software to track multiple balls. Motor actions to block the most threatening ball are based on measured ball positions and velocities. The goalie also sees its single-axis goalie arm and calibrates the motor output map during idle periods so that it can plan open-loop arm movements to desired visual locations. Blocking capability is about 80% for balls shot from 1 m from the goal even with the fastest shots, and approaches 100% accuracy when the ball does not beat the limits of the servo motor to move the arm to the necessary position in time. Running with standard USB buses under a standard preemptive multitasking operating system (Windows), the goalie robot achieves median update rates of 550 Hz, with latencies of 2.2 ± 2 ms from ball movement to motor command at a peak CPU load of less than 4%. Practical observations and measurements of USB device latency are provided.

  18. Event- and Time-Driven Techniques Using Parallel CPU-GPU Co-processing for Spiking Neural Networks

    Science.gov (United States)

    Naveros, Francisco; Garrido, Jesus A.; Carrillo, Richard R.; Ros, Eduardo; Luque, Niceto R.

    2017-01-01

    Modeling and simulating the neural structures which make up our central neural system is instrumental for deciphering the computational neural cues beneath. Higher levels of biological plausibility usually impose higher levels of complexity in mathematical modeling, from neural to behavioral levels. This paper focuses on overcoming the simulation problems (accuracy and performance) derived from using higher levels of mathematical complexity at a neural level. This study proposes different techniques for simulating neural models that hold incremental levels of mathematical complexity: leaky integrate-and-fire (LIF), adaptive exponential integrate-and-fire (AdEx), and Hodgkin-Huxley (HH) neural models (ranged from low to high neural complexity). The studied techniques are classified into two main families depending on how the neural-model dynamic evaluation is computed: the event-driven or the time-driven families. Whilst event-driven techniques pre-compile and store the neural dynamics within look-up tables, time-driven techniques compute the neural dynamics iteratively during the simulation time. We propose two modifications for the event-driven family: a look-up table recombination to better cope with the incremental neural complexity together with a better handling of the synchronous input activity. Regarding the time-driven family, we propose a modification in computing the neural dynamics: the bi-fixed-step integration method. This method automatically adjusts the simulation step size to better cope with the stiffness of the neural model dynamics running in CPU platforms. One version of this method is also implemented for hybrid CPU-GPU platforms. Finally, we analyze how the performance and accuracy of these modifications evolve with increasing levels of neural complexity. We also demonstrate how the proposed modifications which constitute the main contribution of this study systematically outperform the traditional event- and time-driven techniques under

  19. Event- and Time-Driven Techniques Using Parallel CPU-GPU Co-processing for Spiking Neural Networks.

    Science.gov (United States)

    Naveros, Francisco; Garrido, Jesus A; Carrillo, Richard R; Ros, Eduardo; Luque, Niceto R

    2017-01-01

    Modeling and simulating the neural structures which make up our central neural system is instrumental for deciphering the computational neural cues beneath. Higher levels of biological plausibility usually impose higher levels of complexity in mathematical modeling, from neural to behavioral levels. This paper focuses on overcoming the simulation problems (accuracy and performance) derived from using higher levels of mathematical complexity at a neural level. This study proposes different techniques for simulating neural models that hold incremental levels of mathematical complexity: leaky integrate-and-fire (LIF), adaptive exponential integrate-and-fire (AdEx), and Hodgkin-Huxley (HH) neural models (ranged from low to high neural complexity). The studied techniques are classified into two main families depending on how the neural-model dynamic evaluation is computed: the event-driven or the time-driven families. Whilst event-driven techniques pre-compile and store the neural dynamics within look-up tables, time-driven techniques compute the neural dynamics iteratively during the simulation time. We propose two modifications for the event-driven family: a look-up table recombination to better cope with the incremental neural complexity together with a better handling of the synchronous input activity. Regarding the time-driven family, we propose a modification in computing the neural dynamics: the bi-fixed-step integration method. This method automatically adjusts the simulation step size to better cope with the stiffness of the neural model dynamics running in CPU platforms. One version of this method is also implemented for hybrid CPU-GPU platforms. Finally, we analyze how the performance and accuracy of these modifications evolve with increasing levels of neural complexity. We also demonstrate how the proposed modifications which constitute the main contribution of this study systematically outperform the traditional event- and time-driven techniques under

  20. HD fisheye video stream real-time correction system based on embedded CPU-GPU

    Institute of Scientific and Technical Information of China (English)

    公维理

    2016-01-01

    In the field of security video surveillance, a real-time monitoring system is needed to achieve high-definition, real-time monitoring of full 360°×180° panoramic areas. Existing fisheye correction systems suffer from high cost, poor flexibility and, in particular, low definition and poor real-time performance. To address the real-time problem, this article proposes a CPU-GPU high-speed communication protocol based on the embedded platform STiH418 and a CPU-GPU memory-sharing method based on a programmable shader, and implements an HD panoramic fisheye video real-time correction system on the CPU-GPU using texture-mapping technology. Compared with related correction systems, experimental results show that the system balances algorithm efficiency, correction quality and image integrity, and can fully meet HD panoramic 360°×180° (4 million pixels, 2048×2048p30) real-time correction. The embedded STiH418 system reduces overall system cost compared with a PC, the correction map is generated and updated in software by the ARM CPU, and virtual PTZ improves system flexibility and stability, so the system has high practical value in the security video surveillance market.

  1. CPU Server

    CERN Multimedia

    The CERN computer centre has hundreds of racks like these. They are over a million times more powerful than our first computer in the 1960s. This tray is a 'dual-core' server. This means it effectively has two CPUs in it (e.g. two of your home computers minimised to fit into a single box). Also note the copper cooling fins, to help dissipate the heat.

  2. Online real-time reconstruction of adaptive TSENSE with commodity CPU / GPU hardware

    DEFF Research Database (Denmark)

    Roujol, Sebastien; de Senneville, Baudouin; Vahala, E.;

    2009-01-01

    A real-time reconstruction for adaptive TSENSE is presented that is optimized for MR-guidance of interventional procedures. The proposed method allows high frame-rate imaging with low image latencies, even when large coil arrays are employed, and can be implemented on affordable commodity hardware.

  3. Online real-time reconstruction of adaptive TSENSE with commodity CPU / GPU hardware

    DEFF Research Database (Denmark)

    Roujol, Sebastien; de Senneville, Baudouin Denis; Vahalla, Erkki

    2009-01-01

    Adaptive temporal sensitivity encoding (TSENSE) has been suggested as a robust parallel imaging method suitable for MR guidance of interventional procedures. However, in practice, the reconstruction of adaptive TSENSE images obtained with large coil arrays leads to long reconstruction times and latencies and thus hampers its use for applications such as MR-guided thermotherapy or cardiovascular catheterization. Here, we demonstrate a real-time reconstruction pipeline for adaptive TSENSE with low image latencies and high frame rates on affordable commodity personal computer hardware. For typical image sizes used in interventional imaging (128 × 96, 16 channels, sensitivity encoding (SENSE) factor 2-4), the pipeline is able to reconstruct adaptive TSENSE images with image latencies below 90 ms at frame rates of up to 40 images/s, rendering the MR performance in practice limited by the constraints of the MR acquisition.

  4. SAFARI digital processing unit: performance analysis of the SpaceWire links in case of a LEON3-FT based CPU

    Science.gov (United States)

    Giusi, Giovanni; Liu, Scige J.; Di Giorgio, Anna M.; Galli, Emanuele; Pezzuto, Stefano; Farina, Maria; Spinoglio, Luigi

    2014-08-01

    SAFARI (SpicA FAR infrared Instrument) is a far-infrared imaging Fourier Transform Spectrometer for the SPICA mission. The Digital Processing Unit (DPU) of the instrument implements the functions of controlling the overall instrument and implementing the science data compression and packing. The DPU design is based on the use of a LEON family processor. In SAFARI, all instrument components are connected to the central DPU via SpaceWire links. On these links science data, housekeeping and command flows are in some cases multiplexed, therefore the interface control shall be able to cope with variable throughput needs. The effective data transfer workload can be an issue for the overall system performance and becomes a critical parameter for the on-board software design, both at application layer level and at lower, and more HW related, levels. To analyze the system behavior in the presence of the expected demanding SAFARI science data flow, we carried out a series of performance tests using the standard GR-CPCI-UT699 LEON3-FT Development Board, provided by Aeroflex/Gaisler, connected to the emulator of the SAFARI science data links, in a point-to-point topology. Two different communication protocols have been used in the tests, the ECSS-E-ST-50-52C RMAP protocol and an internally defined one, the SAFARI internal data handling protocol. An incremental approach has been adopted to measure the system performance at different levels of the communication protocol complexity. In all cases the performance has been evaluated by measuring the CPU workload and the bus latencies. The tests have been executed initially in a custom low level execution environment and finally using the Real-Time Executive for Multiprocessor Systems (RTEMS), which has been selected as the operating system to be used onboard SAFARI. The preliminary results of the performance analysis carried out confirmed the possibility of using a LEON3 CPU processor in the SAFARI DPU, but pointed out, in agreement

  5. Online real-time reconstruction of adaptive TSENSE with commodity CPU/GPU hardware.

    Science.gov (United States)

    Roujol, Sébastien; de Senneville, Baudouin Denis; Vahala, Erkki; Sørensen, Thomas Sangild; Moonen, Chrit; Ries, Mario

    2009-12-01

    Adaptive temporal sensitivity encoding (TSENSE) has been suggested as a robust parallel imaging method suitable for MR guidance of interventional procedures. However, in practice, the reconstruction of adaptive TSENSE images obtained with large coil arrays leads to long reconstruction times and latencies and thus hampers its use for applications such as MR-guided thermotherapy or cardiovascular catheterization. Here, we demonstrate a real-time reconstruction pipeline for adaptive TSENSE with low image latencies and high frame rates on affordable commodity personal computer hardware. For typical image sizes used in interventional imaging (128 x 96, 16 channels, sensitivity encoding (SENSE) factor 2-4), the pipeline is able to reconstruct adaptive TSENSE images with image latencies below 90 ms at frame rates of up to 40 images/s, rendering the MR performance in practice limited by the constraints of the MR acquisition. Its performance is demonstrated by the online reconstruction of in vivo MR images for rapid temperature mapping of the kidney and for cardiac catheterization. (c) 2009 Wiley-Liss, Inc.

  6. CPU Scheduling Algorithms: A Survey

    Directory of Open Access Journals (Sweden)

    Imran Qureshi

    2014-01-01

    Scheduling is a fundamental function of an operating system. For scheduling, the resources of the system are shared among the processes that are to be executed. CPU scheduling is the technique by which processes are allocated to the CPU for a specific time quantum. In this paper, different scheduling algorithms are reviewed against different parameters, such as running time, burst time and waiting time. The reviewed algorithms are First Come First Serve, Shortest Job First, Round Robin, and Priority scheduling.
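
    Of the criteria listed (running time, burst time, waiting time), average waiting time is the one most directly affected by the choice between FCFS and SJF ordering. The few lines below compute it for a classic textbook burst set; the numbers are illustrative only.

    def avg_waiting(bursts):
        """Waiting time of each job is the sum of the bursts scheduled before it
        (all jobs assumed to arrive at t = 0)."""
        waits, elapsed = [], 0
        for b in bursts:
            waits.append(elapsed)
            elapsed += b
        return sum(waits) / len(waits)

    bursts = [24, 3, 3]
    print("FCFS:", avg_waiting(bursts))          # 17.0
    print("SJF: ", avg_waiting(sorted(bursts)))  # 3.0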

  7. Disease distribution and outcome in troponin-positive patients with or without revascularization in a chest pain unit: results of the German CPU-Registry.

    Science.gov (United States)

    Illmann, Alexander; Riemer, Thomas; Erbel, Raimund; Giannitsis, Evangelos; Hamm, Christian; Haude, Michael; Heusch, Gerd; Maier, Lars S; Münzel, Thomas; Schmitt, Claus; Schumacher, Burghard; Senges, Jochen; Voigtländer, Thomas; Mudra, Harald

    2014-01-01

    The aim of this analysis was to compare troponin-positive patients presenting to a chest pain unit (CPU) and undergoing coronary angiography with or without subsequent revascularization. Leading diagnosis, disease distribution, and short-term outcomes were evaluated. Chest pain units are increasingly implemented to promptly clarify acute chest pain of uncertain origin, including patients with suspected acute coronary syndrome (ACS). A total of 11,753 patients were prospectively enrolled into the German CPU-Registry of the German Cardiac Society between December 2008 and April 2011. All patients with elevated troponin undergoing a coronary angiography were selected. Three months after discharge a follow-up was performed. A total of 2,218 patients were included. 1,613 troponin-positive patients (72.7 %) underwent a coronary angiography with subsequent PCI or CABG and had an ACS in 96.0 %. In contrast, 605 patients (27.3 %) underwent a coronary angiography without revascularization and had an ACS in 79.8 %. The most frequent non-coronary diagnoses in non-revascularized patients were acute arrhythmias (13.4 %), pericarditis/myocarditis (4.5 %), decompensated congestive heart failure (3.7 %), Takotsubo cardiomyopathy (2.7 %), hypertensive crisis (2.4 %), and pulmonary embolism (0.3 %). During the 3-month follow-up, patients without revascularization had a higher mortality (12.1 vs. 4.5 %). Patients presenting to a CPU with elevated troponin levels mostly suffer from ACS, while in a smaller proportion a variety of different diseases are responsible. The short-term outcome in troponin-positive patients with or without an ACS not undergoing a revascularization was worse, indicating that these patients were more seriously ill than patients with revascularization of the culprit lesion. Therefore, an adequate diagnostic evaluation and improved treatment strategies are warranted.

  8. CPU-GPU mixed implementation of virtual node method for real-time interactive cutting of deformable objects using OpenCL.

    Science.gov (United States)

    Jia, Shiyu; Zhang, Weizhong; Yu, Xiaokang; Pan, Zhenkuan

    2015-09-01

    Surgical simulators need to simulate interactive cutting of deformable objects in real time. The goal of this work was to design an interactive cutting algorithm that eliminates traditional cutting state classification and can work simultaneously with real-time GPU-accelerated deformation without affecting its numerical stability. A modified virtual node method for cutting is proposed. Deformable object is modeled as a real tetrahedral mesh embedded in a virtual tetrahedral mesh, and the former is used for graphics rendering and collision, while the latter is used for deformation. Cutting algorithm first subdivides real tetrahedrons to eliminate all face and edge intersections, then splits faces, edges and vertices along cutting tool trajectory to form cut surfaces. Next virtual tetrahedrons containing more than one connected real tetrahedral fragments are duplicated, and connectivity between virtual tetrahedrons is updated. Finally, embedding relationship between real and virtual tetrahedral meshes is updated. Co-rotational linear finite element method is used for deformation. Cutting and collision are processed by CPU, while deformation is carried out by GPU using OpenCL. Efficiency of GPU-accelerated deformation algorithm was tested using block models with varying numbers of tetrahedrons. Effectiveness of our cutting algorithm under multiple cuts and self-intersecting cuts was tested using a block model and a cylinder model. Cutting of a more complex liver model was performed, and detailed performance characteristics of cutting, deformation and collision were measured and analyzed. Our cutting algorithm can produce continuous cut surfaces when traditional minimal element creation algorithm fails. Our GPU-accelerated deformation algorithm remains stable with constant time step under multiple arbitrary cuts and works on both NVIDIA and AMD GPUs. GPU-CPU speed ratio can be as high as 10 for models with 80,000 tetrahedrons. Forty to sixty percent real-time

  9. A heterogeneous system based on GPU and multi-core CPU for real-time fluid and rigid body simulation

    Science.gov (United States)

    da Silva Junior, José Ricardo; Gonzalez Clua, Esteban W.; Montenegro, Anselmo; Lage, Marcos; Dreux, Marcelo de Andrade; Joselli, Mark; Pagliosa, Paulo A.; Kuryla, Christine Lucille

    2012-03-01

    Computational fluid dynamics in simulation has become an important field not only for physics and engineering areas but also for simulation, computer graphics, virtual reality and even video game development. Many efficient models have been developed over the years, but when many contact interactions must be processed, most models present difficulties or cannot achieve real-time results when executed. The advent of parallel computing has enabled the development of many strategies for accelerating the simulations. Our work proposes a new system which uses some successful algorithms already proposed, as well as a data structure organisation based on a heterogeneous architecture using CPUs and GPUs, in order to process the simulation of the interaction of fluids and rigid bodies. This successfully results in a two-way interaction between them and their surrounding objects. As far as we know, this is the first work that presents a computational collaborative environment which makes use of two different paradigms of hardware architecture for this specific kind of problem. Since our method achieves real-time results, it is suitable for virtual reality, simulation and video game fluid simulation problems.

  10. Feasibility Analysis of Low Cost Graphical Processing Units for Electromagnetic Field Simulations by Finite Difference Time Domain Method

    CERN Document Server

    Choudhari, A V; Gupta, M R

    2013-01-01

    Among several techniques available for solving Computational Electromagnetics (CEM) problems, the Finite Difference Time Domain (FDTD) method is one of the best suited approaches when a parallelized hardware platform is used. In this paper we investigate the feasibility of implementing the FDTD method using the NVIDIA GT 520, a low cost Graphical Processing Unit (GPU), for solving the differential form of Maxwell's equations in the time domain. Initially a generalized benchmarking problem of a bandwidth test and another benchmarking problem of 'matrix left division' are discussed for understanding the correlation between the problem size and the performance on the CPU and the GPU respectively. This is followed by the discussion of the FDTD method, again implemented on both the CPU and the GT 520 GPU. For both of the above comparisons, the CPU used is an Intel E5300, a low cost dual core CPU.
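
    The FDTD method advances interleaved electric and magnetic field grids with a purely local leapfrog stencil, which is exactly the regular, data-parallel structure that maps well onto a GPU. A minimal 1-D version in Python/NumPy is sketched below; the grid size, Courant number and Gaussian source are arbitrary choices, and no GPU offload is shown.

    import numpy as np

    nx, nt, c = 400, 600, 0.5          # cells, time steps, Courant number (normalized units)
    ez = np.zeros(nx)                  # electric field on integer grid points
    hy = np.zeros(nx - 1)              # magnetic field on staggered half-cells

    for n in range(nt):
        hy += c * np.diff(ez)                              # H update from the curl of E
        ez[1:-1] += c * np.diff(hy)                        # E update from the curl of H
        ez[nx // 2] += np.exp(-((n - 40) / 12.0) ** 2)     # soft Gaussian source at the center

    print(float(np.max(np.abs(ez))))   # peak field amplitude after nt steps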

  11. Evaluating Mobile Graphics Processing Units (GPUs) for Real-Time Resource Constrained Applications

    Energy Technology Data Exchange (ETDEWEB)

    Meredith, J; Conger, J; Liu, Y; Johnson, J

    2005-11-11

    Modern graphics processing units (GPUs) can provide tremendous performance boosts for some applications beyond what a single CPU can accomplish, and their performance is growing at a rate faster than CPUs as well. Mobile GPUs available for laptops have the small form factor and low power requirements suitable for use in embedded processing. We evaluated several desktop and mobile GPUs and CPUs on traditional and non-traditional graphics tasks, as well as on the most time consuming pieces of a full hyperspectral imaging application. Accuracy remained high despite small differences in arithmetic operations like rounding. Performance improvements are summarized here relative to a desktop Pentium 4 CPU.

  12. Dynamic Quantum Allocation and Swap-Time Variability in Time-Sharing Operating Systems.

    Science.gov (United States)

    Bhat, U. Narayan; Nance, Richard E.

    The effects of dynamic quantum allocation and swap-time variability on central processing unit (CPU) behavior are investigated using a model that allows both quantum length and swap-time to be state-dependent random variables. Effective CPU utilization is defined to be the proportion of a CPU busy period that is devoted to program processing, i.e.…
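
    The definition quoted above can be written out directly: effective utilization is the processing time divided by the total busy time (processing plus swap overhead). The symbols in the small function below are mine, since the article's own notation is cut off in this abstract.

    def effective_cpu_utilization(processing_time, swap_time):
        """Proportion of a CPU busy period devoted to program processing."""
        busy = processing_time + swap_time
        return processing_time / busy if busy else 0.0

    print(effective_cpu_utilization(processing_time=80.0, swap_time=20.0))  # 0.8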

  13. CPU and GPU (Cuda Template Matching Comparison

    Directory of Open Access Journals (Sweden)

    Evaldas Borcovas

    2014-05-01

    Image processing, computer vision and other complicated optical information processing algorithms require large resources. It is often desired to execute algorithms in real time. It is hard to fulfill such requirements with a single CPU processor. NVidia's CUDA technology enables the programmer to use the GPU resources in the computer. The current research was made with an Intel Pentium Dual-Core T4500 2.3 GHz processor with 4 GB RAM DDR3 (CPU I), an NVidia GeForce GT320M CUDA-compatible graphics card (GPU I), an Intel Core i5-2500K 3.3 GHz processor with 4 GB RAM DDR3 (CPU II), and an NVidia GeForce GTX 560 CUDA-compatible graphics card (GPU II). Additional libraries, OpenCV 2.1 and the CUDA-compatible OpenCV 2.4.0, were used for the testing. The main tests were made with the standard function MatchTemplate from the OpenCV libraries. The algorithm uses a main image and a template. The influence of these factors was tested: the main image and the template were resized, and the algorithm computing time and performance in Gtpix/s were measured. According to the information obtained from the research, GPU computing using the hardware mentioned earlier is up to 24 times faster when processing a big amount of information. When the images are small, the performance of the CPU and GPU is not significantly different. The choice of template size influences the CPU computation. The difference in computing time between the GPUs can be explained by the number of cores they have.
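
    In the Python binding of OpenCV, the MatchTemplate benchmark above boils down to a single call plus a peak search. The sketch below shows that CPU-side call only; the file names and the normalized-correlation method are assumptions, and the CUDA-accelerated path used for the GPU measurements is not shown.

    import cv2

    # Placeholder file names; substitute your own main image and template.
    image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
    template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

    # Slide the template over the image and score every position.
    result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    print(f"best match at {max_loc} with score {max_val:.3f}")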

  14. CPU Efficiency Enhancement through Offload

    National Research Council Canada - National Science Library

    Naeem Akhter; Iqra Sattar

    2017-01-01

      There are several causes of slowness in personal computers. While working on a PC to regularly execute jobs of similar nature, it is essential to be aware of the reasons of slowness to achieve the optimal CPU speed...

  15. Unit 6 Nap Time

    Institute of Scientific and Technical Information of China (English)

    2004-01-01

    Story: After a busy morning, it is time for a little rest. "It's nap time. Take off your shoes," says Miss Grant. She helps everyone get their sleeping bags and makes sure that everyone has put their shoes outside. Before everyone falls asleep they ask the teacher, "Read us a story, please."

  16. The UMIDUS program release 2.1 and the CPU time reduction for long term simulations; O programa UMIDUS 2.1 e a reducao de tempo de CPU em simulacoes de longos periodos

    Energy Technology Data Exchange (ETDEWEB)

    Mendes, Nathan [Pontificia Universidade Catolica do Parana, Curitiba, PR (Brazil). Lab. de Sistemas Termicos]. E-mail: nmemdes@cdet.pucpr.br; Lamberts, Roberto; Philippi, Paulo C. [Santa Catarina Univ., Florianopolis, SC (Brazil). Lab. de Eficiencia Energetica de Edificacoes; Santa Catarina Univ., Florianopolis, SC (Brazil). Lab. de Meios Porosos e Propriedades Termofisicas de Edificacoes]. E-mail: lamberts@ecv.ufsc.br; philippi@lmpt.ufsc.br

    2000-07-01

    This paper presents the UMIDUS program release 2.1, which has been developed to model coupled heat and moisture transfer within porous media, in order to analyze the hygrothermal performance of building elements when subjected to any kind of climate conditions. The model predicts moisture and temperature profiles within multi-layer building elements for any time step and calculates heat and mass transfer. We also present the new UMIDUS 2.1 capability of speeding up simulations with no loss of accuracy in the determination of temperature and moisture content profiles. (author)

  17. Importance of Explicit Vectorization for CPU and GPU Software Performance

    CERN Document Server

    Dickson, Neil G; Hamze, Firas

    2010-01-01

    Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed additionally, are less frequently discussed. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and the equivalent, explicit memory coalescing, on the GPU are found to be critical to achieving good performance of this algorithm in both environments. The fully-optimized CPU version achieves a 9x to 12x speedup over the original CPU version, in addition to speedup from multi-threading. This is 2x faster than the fully-optimized GPU version.
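
    The point above is that exposing data parallelism explicitly, rather than leaving it to the compiler, is what recovers the missing performance on both the CPU (SIMD vectorization) and the GPU (memory coalescing). As a loose analogy only, the Python/NumPy fragment below contrasts a scalar loop with a vectorized expression over the same array; it is not the authors' Metropolis Monte Carlo kernel, and NumPy vectorization is not the same mechanism as compiler-level SIMD.

    import numpy as np

    # Toy 1-D Ising-style energy; the spin chain and field strength are arbitrary.
    spins = np.random.default_rng(1).choice([-1.0, 1.0], size=200_000)
    field = 0.1

    def scalar_energy(s, h):
        """Element-by-element loop: one multiply-add at a time."""
        total = 0.0
        for i in range(len(s) - 1):
            total += -s[i] * s[i + 1] - h * s[i]
        return total

    def vector_energy(s, h):
        """Whole-array expression: the same arithmetic exposed as bulk operations."""
        return float(np.sum(-s[:-1] * s[1:]) - h * np.sum(s[:-1]))

    assert np.isclose(scalar_energy(spins, field), vector_energy(spins, field))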

  18. A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection.

    Directory of Open Access Journals (Sweden)

    Chun-Liang Lee

    The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms.

  19. A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection.

    Science.gov (United States)

    Lee, Chun-Liang; Lin, Yi-Shan; Chen, Yaw-Chung

    2015-01-01

    The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms.
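
    The division of labour described above (a cheap CPU pre-filter that passes most packets untouched, with full signature matching reserved for the suspicious remainder) is sketched below in Python. The two-byte-prefix screen and the plain substring search are simple stand-ins for HPMA's actual pre-filtering algorithm and its Aho-Corasick GPU stage; the signatures and packets are made up.

    # Hypothetical signatures and packets for illustration only.
    signatures = [b"/etc/passwd", b"cmd.exe", b"<script>"]
    prefixes = {sig[:2] for sig in signatures}

    def pre_filter(packet: bytes) -> bool:
        """CPU stage: flag the packet if any signature prefix occurs anywhere in it."""
        return any(p in packet for p in prefixes)

    def full_match(packet: bytes):
        """Stand-in for the heavyweight (GPU) stage: exact search for every full signature."""
        return [sig for sig in signatures if sig in packet]

    packets = [b"GET /index.html", b"GET /../etc/passwd HTTP/1.1", b"ping cmd.exe"]
    suspicious = [p for p in packets if pre_filter(p)]      # only these reach the full matcher
    print([(p, full_match(p)) for p in suspicious])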

  20. GeantV: from CPU to accelerators

    Science.gov (United States)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Arora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Sehgal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While modern CPU architectures are being targeted first, resources such as GPGPUs, Intel® Xeon Phi, Atom or ARM cannot be ignored anymore by HEP CPU-bound applications. The proof-of-concept GeantV prototype has been mainly engineered for CPUs having vector units, but we have foreseen from early stages a bridge to arbitrary accelerators. A software layer consisting of architecture/technology-specific backends currently supports this concept. This approach allows us to abstract out the basic types such as scalar/vector, but also to formalize generic computation kernels using transparently library- or device-specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, as a bonus it comes with the insulation of the core application and algorithms from the technology layer. This allows our application to be long-term maintainable and versatile to changes at the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel® Xeon Phi KNC architecture. We present the scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We also describe the current work and preliminary results for using the GeantV transport kernel on GPUs.

  1. Novel hybrid GPU-CPU implementation of parallelized Monte Carlo parametric expectation maximization estimation method for population pharmacokinetic data analysis.

    Science.gov (United States)

    Ng, C M

    2013-10-01

    The development of a population PK/PD model, an essential component for model-based drug development, is both time- and labor-intensive. A graphical-processing unit (GPU) computing technology has been proposed and used to accelerate many scientific computations. The objective of this study was to develop a hybrid GPU-CPU implementation of parallelized Monte Carlo parametric expectation maximization (MCPEM) estimation algorithm for population PK data analysis. A hybrid GPU-CPU implementation of the MCPEM algorithm (MCPEMGPU) and identical algorithm that is designed for the single CPU (MCPEMCPU) were developed using MATLAB in a single computer equipped with dual Xeon 6-Core E5690 CPU and a NVIDIA Tesla C2070 GPU parallel computing card that contained 448 stream processors. Two different PK models with rich/sparse sampling design schemes were used to simulate population data in assessing the performance of MCPEMCPU and MCPEMGPU. Results were analyzed by comparing the parameter estimation and model computation times. Speedup factor was used to assess the relative benefit of parallelized MCPEMGPU over MCPEMCPU in shortening model computation time. The MCPEMGPU consistently achieved shorter computation time than the MCPEMCPU and can offer more than 48-fold speedup using a single GPU card. The novel hybrid GPU-CPU implementation of parallelized MCPEM algorithm developed in this study holds a great promise in serving as the core for the next-generation of modeling software for population PK/PD analysis.

  2. A Fusion Model for CPU Load Prediction in Cloud Computing

    Directory of Open Access Journals (Sweden)

    Dayu Xu

    2013-11-01

    Load prediction plays a key role in cost-optimal resource allocation and datacenter energy saving. In this paper, we use real-world traces from a Cloud platform and propose a fusion model to forecast future CPU loads. First, long CPU load time series are divided into short sequences of the same length, based on the cloud control cycle, from the historical data. Then we use a kernel fuzzy c-means clustering algorithm to put the subsequences into different clusters. For each cluster, with the current load sequence, a genetic-algorithm-optimized wavelet Elman neural network prediction model is exploited to predict the CPU load in the next time interval. Finally, we obtain the optimal cloud computing CPU load prediction results from the cluster and its corresponding predictor with the minimum forecasting error. Experimental results show that our algorithm performs better than other models reported in previous works.
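
    The first two stages above (cutting the load trace into fixed-length windows aligned with the control cycle, then clustering the windows) can be sketched in a few lines. In the sketch below, scikit-learn's KMeans stands in for the paper's kernel fuzzy c-means, and the per-cluster wavelet Elman network is replaced by a trivial cluster-centre lookup; the synthetic trace, window length and cluster count are assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    load = np.clip(0.5 + 0.3 * np.sin(np.arange(2000) / 50.0)
                   + 0.1 * rng.standard_normal(2000), 0.0, 1.0)   # synthetic CPU load trace

    win = 20                                                      # assumed control-cycle length
    segments = np.array([load[i:i + win] for i in range(0, len(load) - win, win)])

    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(segments)

    current = load[-win:]                            # most recent window
    cluster = km.predict(current.reshape(1, -1))[0]  # which regime is the system in?
    prediction = km.cluster_centers_[cluster][-1]    # stand-in for the per-cluster predictor
    print(f"window assigned to cluster {cluster}, predicted next load ~ {prediction:.2f}")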

  3. Significantly reducing registration time in IGRT using graphics processing units

    DEFF Research Database (Denmark)

    Noe, Karsten Østergaard; Denis de Senneville, Baudouin; Tanderup, Kari

    2008-01-01

    Purpose/Objective For online IGRT, rapid image processing is needed. Fast parallel computations using graphics processing units (GPUs) have recently been made more accessible through general purpose programming interfaces. We present a GPU implementation of the Horn and Schunck method...... respiration phases in a free breathing volunteer and 41 anatomical landmark points in each image series. The registration method used is a multi-resolution GPU implementation of the 3D Horn and Schunck algorithm. It is based on the CUDA framework from Nvidia. Results On an Intel Core 2 CPU at 2.4GHz each...... registration took 30 minutes. On an Nvidia Geforce 8800GTX GPU in the same machine this registration took 37 seconds, making the GPU version 48.7 times faster. The nine image series of different respiration phases were registered to the same reference image (full inhale). Accuracy was evaluated on landmark...

  4. Plasma carboxypeptidase U (CPU, CPB2, TAFIa) generation during in vitro clot lysis and its interplay between coagulation and fibrinolysis.

    Science.gov (United States)

    Leenaerts, Dorien; Aernouts, Jef; Van Der Veken, Pieter; Sim, Yani; Lambeir, Anne-Marie; Hendriks, Dirk

    2017-07-26

    Carboxypeptidase U (CPU, CPB2, TAFIa) is a basic carboxypeptidase that is able to attenuate fibrinolysis. The inactive precursor procarboxypeptidase U is converted to its active form by thrombin, the thrombin-thrombomodulin complex or plasmin. The aim of this study was to investigate and characterise the time course of CPU generation in healthy individuals. In plasma of 29 healthy volunteers, CPU generation was monitored during in vitro clot lysis. CPU activity was measured by means of an enzymatic assay that uses the specific substrate Bz-o-cyano-Phe-Arg. An algorithm was written to plot the CPU generation curve and calculate the parameters that define it. In all individuals, CPU generation was biphasic. Marked inter-individual differences were present and a reference range was determined. The endogenous CPU generation potential is the composite effect of multiple factors. With respect to the first CPU activity peak characteristics, we found correlations with baseline proCPU concentration, proCPU Thr325Ile polymorphism, time to clot initiation and the clot lysis time. The second CPU peak related with baseline proCPU levels and with the maximum turbidity of the clot lysis profile. In conclusion, our method offers a technique to determine the endogenous CPU generation potential of an individual. The parameters obtained by the method quantitatively describe the different mechanisms that influence CPU generation during the complex interplay between coagulation and fibrinolysis, which are in line with the threshold hypothesis.

  5. Semiempirical Quantum Chemical Calculations Accelerated on a Hybrid Multicore CPU-GPU Computing Platform.

    Science.gov (United States)

    Wu, Xin; Koslowski, Axel; Thiel, Walter

    2012-07-10

    In this work, we demonstrate that semiempirical quantum chemical calculations can be accelerated significantly by leveraging the graphics processing unit (GPU) as a coprocessor on a hybrid multicore CPU-GPU computing platform. Semiempirical calculations using the MNDO, AM1, PM3, OM1, OM2, and OM3 model Hamiltonians were systematically profiled for three types of test systems (fullerenes, water clusters, and solvated crambin) to identify the most time-consuming sections of the code. The corresponding routines were ported to the GPU and optimized employing both existing library functions and a GPU kernel that carries out a sequence of noniterative Jacobi transformations during pseudodiagonalization. The overall computation times for single-point energy calculations and geometry optimizations of large molecules were reduced by one order of magnitude for all methods, as compared to runs on a single CPU core.

  6. Quantification of speed-up and accuracy of multi-CPU computational flow dynamics simulations of hemodynamics in a posterior communicating artery aneurysm of complex geometry.

    Science.gov (United States)

    Karmonik, Christof; Yen, Christopher; Gabriel, Edgar; Partovi, Sasan; Horner, Marc; Zhang, Yi J; Klucznik, Richard P; Diaz, Orlando; Grossman, Robert G

    2013-11-01

    Towards the translation of computational fluid dynamics (CFD) techniques into the clinical workflow, performance increases achieved with parallel multi-central processing unit (CPU) pulsatile CFD simulations in a patient-derived model of a bilobed posterior communicating artery aneurysm were evaluated while simultaneously monitoring changes in the accuracy of the solution. Simulations were performed using 2, 4, 6, 8, 10 and 12 processors. In addition, a baseline simulation was obtained with a dual-core dual CPU computer of similar computational power to clinical imaging workstations. Parallel performance indices including computation speed-up, efficiency (speed-up divided by number of processors), computational cost (computation time × number of processors) and accuracy (velocity at four distinct locations: proximal and distal to the aneurysm, in the aneurysm ostium and aneurysm dome) were determined from the simulations and compared. Total computation time decreased from 9 h 10 min (baseline) to 2 h 34 min (10 CPU). Speed-up relative to baseline increased from 1.35 (2 CPU) to 3.57 (maximum at 10 CPU) while efficiency decreased from 0.65 to 0.35 with increasing cost (33.013 to 92.535). Relative velocity component deviations were less than 0.0073% and larger for 12 CPU than for 2 CPU (0.004 ± 0.002%, not statistically significant, p=0.07). Without compromising accuracy, parallel multi-CPU simulation reduces computing time for the simulation of hemodynamics in a model of a cerebral aneurysm by up to a factor of 3.57 (10 CPUs) to 2 h 34 min compared with a workstation with computational power similar to clinical imaging workstations.
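
    The parallel performance indices used above can be checked directly from the reported timings. The following short C++ snippet reproduces the speed-up and efficiency figures from the quoted wall-clock times; the cost is printed in CPU-minutes, which need not match the (unspecified) units of the cost values quoted in the abstract.

        // Parallel performance indices from the reported timings.
        #include <iostream>

        int main() {
            const double t_baseline_min = 9 * 60 + 10;  // 9 h 10 min baseline
            const double t_parallel_min = 2 * 60 + 34;  // 2 h 34 min on 10 CPUs
            const int    n_cpu          = 10;

            const double speedup    = t_baseline_min / t_parallel_min;  // ~3.57
            const double efficiency = speedup / n_cpu;                  // ~0.36
            const double cost       = t_parallel_min * n_cpu;           // time x processors

            std::cout << "speedup "      << speedup
                      << ", efficiency " << efficiency
                      << ", cost "       << cost << " CPU-min\n";
        }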

  7. Study on efficiency of time computation in x-ray imaging simulation base on Monte Carlo algorithm using graphics processing unit

    Science.gov (United States)

    Setiani, Tia Dwi; Suprijadi, Haryanto, Freddy

    2016-03-01

    Monte Carlo (MC) is one of the most powerful techniques for simulation in x-ray imaging. The MC method can simulate radiation transport within matter with high accuracy and provides a natural way to simulate radiation transport in complex systems. One of the MC-based codes widely used for radiographic image simulation is MC-GPU, a code developed by Andreu Badal. This study investigated the computation time of x-ray imaging simulation on a GPU (Graphics Processing Unit) compared to a standard CPU (Central Processing Unit). Furthermore, the effect of physical parameters on the quality of radiographic images, and a comparison of the image quality resulting from simulation on the GPU and CPU, are evaluated in this paper. The simulations were run serially on a CPU and on two GPUs with 384 cores and 2304 cores. In the GPU simulation, each core calculates one photon, so a large number of photons are calculated simultaneously. Results show that the simulation times on the GPU were significantly shorter than on the CPU. Simulations on the 2304-core GPU ran about 64-114 times faster than on the CPU, while simulations on the 384-core GPU ran about 20-31 times faster than on a single CPU core. Another result shows that the optimum image quality was obtained from 10^8 histories onward and for energies from 60 keV to 90 keV. Analyzed statistically, the quality of the GPU and CPU images is essentially the same.

  8. Using all of your CPU's in HIPE

    Science.gov (United States)

    Jacobson, J. D.; Fadda, D.

    2012-09-01

    Modern computer architectures increasingly feature multi-core CPUs. For example, the MacBook Pro features the Intel quad-core i7 processor. Through the use of hyper-threading, where each core can execute two threads simultaneously, the quad-core i7 can support eight simultaneous processing threads. All this on your laptop! This CPU power can now be put into service by scientists to perform data reduction tasks, but only if the software has been designed to take advantage of the multiple-processor architectures. Up to now, the software written for Herschel data reduction (HIPE), in Jython and Java, has been single-threaded and can only utilize a single processor. Users of HIPE do not get any advantage from the additional processors. Why not put all of the CPU resources to work reducing your data? We present a multi-threaded software application that corrects long-term transients in the signal from the PACS unchopped spectroscopy line scan mode. In this poster, we present a multi-threaded software framework to achieve performance improvements from parallel execution. We will show how a task to correct transients in the PACS Spectroscopy Pipeline for the un-chopped line scan mode has been threaded. This computation-intensive task uses either a one-parameter or a three-parameter exponential function to characterize the transient. The task uses a Java implementation of Minpack, translated from C (Moshier) and IDL (Markwardt) by the authors, to optimize the correction parameters. We also explain how to determine if a task can benefit from threading (Amdahl's Law), and if it is safe to thread. The design and implementation, using the Java concurrency package's completion service, are described. Pitfalls, timing bugs, thread safety, resource control, testing and performance improvements are described and plotted.
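
    The Amdahl's-law check mentioned above can be written down in a few lines. The sketch below is illustrative only; the parallel fraction p of 0.90 is an assumption, not a figure from the PACS transient-correction task.

        // Amdahl's law: estimated speedup for a task whose fraction p parallelizes.
        #include <iostream>

        double amdahl_speedup(double p, int n_threads) {
            return 1.0 / ((1.0 - p) + p / n_threads);
        }

        int main() {
            const double p = 0.90;                 // assumed parallel fraction
            for (int n : {2, 4, 8}) {              // e.g. a hyper-threaded quad-core i7
                std::cout << n << " threads -> speedup "
                          << amdahl_speedup(p, n) << "\n";  // 1.82, 3.08, 4.71
            }
        }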

  9. A multi-core CPU pipeline architecture for virtual environments.

    Science.gov (United States)

    Acosta, Eric; Liu, Alan; Sieck, Jennifer; Muniz, Gilbert; Bowyer, Mark; Armonda, Rocco

    2009-01-01

    Physically-based virtual environments (VEs) provide realistic interactions and behaviors for computer-based medical simulations. Limited CPU resources have traditionally forced VEs to be simplified for real-time performance. Multi-core processors greatly increase the computational capacity of computers and are quickly becoming standard. However, developing non-application specific methods to fully utilize all available CPU cores for processing VEs is difficult. The paper describes a pipeline VE architecture designed for multi-core CPU systems. The architecture enables development of VEs that leverage the computational resources of all CPU cores for VE simulation. A VE's workload is dynamically distributed across the available CPU cores. A VE can be developed once and scale efficiently with the number of cores. The described pipeline architecture makes it possible to develop complex physically-based VEs for medical simulations. Initial results for a craniotomy simulator being developed have shown super-linear and near-linear speedups when tested with up to four cores.

  10. On Modeling CPU Utilization of MapReduce Applications

    CERN Document Server

    Rizvandi, Nikzad Babaii; Zomaya, Albert Y

    2012-01-01

    In this paper, we present an approach to predict the total CPU utilization, in terms of CPU clock ticks, of applications running on the MapReduce framework. Our approach has two key phases: profiling and modeling. In the profiling phase, an application is run several times with different sets of MapReduce configuration parameters to profile the total CPU clock ticks of the application on a given platform. In the modeling phase, multiple linear regression is used to map the sets of MapReduce configuration parameters (number of Mappers, number of Reducers, size of the File System (HDFS) and the size of the input file) to the total CPU clock ticks of the application. The derived model can then be used to predict the total CPU requirements of the same application when using the MapReduce framework on the same platform. Our approach aims to eliminate error-prone manual processes and presents a fully automated solution. Three standard applications (WordCount, Exim Mainlog parsing and Terasort) are used to evaluate our modeling technique on pseu...
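
    The modeling phase described above amounts to an ordinary least-squares fit. The sketch below, using the Eigen library, shows the shape of such a fit; the profiled configurations and clock-tick values are made up for illustration and are not the paper's data.

        // Least-squares fit of CPU clock ticks against MapReduce parameters.
        #include <Eigen/Dense>
        #include <iostream>

        int main() {
            // Rows = profiling runs; columns = intercept, #mappers, #reducers,
            // HDFS block size (MB), input file size (GB). Values are illustrative.
            Eigen::MatrixXd X(6, 5);
            Eigen::VectorXd y(6);   // total CPU clock ticks (arbitrary units)
            X << 1,  4, 2,  64, 1,
                 1,  8, 2, 128, 2,
                 1,  8, 4,  64, 4,
                 1, 16, 4, 256, 4,
                 1, 16, 8, 128, 8,
                 1, 32, 8, 256, 8;
            y << 10.2, 18.9, 21.5, 36.8, 39.4, 71.0;

            Eigen::VectorXd beta = X.colPivHouseholderQr().solve(y);  // regression coefficients
            std::cout << "coefficients:\n" << beta << "\n";

            Eigen::VectorXd q(5);
            q << 1, 12, 4, 128, 3;                  // a new, unprofiled configuration
            std::cout << "predicted CPU ticks: " << beta.dot(q) << "\n";
        }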

  11. STEM image simulation with hybrid CPU/GPU programming.

    Science.gov (United States)

    Yao, Y; Ge, B H; Shen, X; Wang, Y G; Yu, R C

    2016-07-01

    STEM image simulation is achieved via hybrid CPU/GPU programming under parallel algorithm architecture to speed up calculation on a personal computer (PC). To utilize the calculation power of a PC fully, the simulation is performed using the GPU core and multi-CPU cores at the same time to significantly improve efficiency. GaSb and an artificial GaSb/InAs interface with atom diffusion have been used to verify the computation. Copyright © 2016 Elsevier B.V. All rights reserved.

  12. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    Science.gov (United States)

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.

  13. [Applying graphics processing unit in real-time signal processing and visualization of ophthalmic Fourier-domain OCT system].

    Science.gov (United States)

    Liu, Qiaoyan; Li, Yuejie; Xu, Qiujing; Zhao, Jincheng; Wang, Liwei; Gao, Yonghe

    2013-01-01

    This investigation introduces GPU (Graphics Processing Unit)-based CUDA (Compute Unified Device Architecture) technology into the signal processing of an ophthalmic FD-OCT (Fourier-Domain Optical Coherence Tomography) imaging system. It realizes parallel data processing, using CUDA to optimize the relevant operations and algorithms, in order to remove the technical bottlenecks that currently limit real-time ophthalmic OCT imaging. Laboratory results showed that, with the GPU as a general-purpose parallel computing processor, imaging data processing in GPU+CPU mode is dozens of times faster than the traditional serial computing and imaging mode on a CPU-only platform for the same data processing task, which meets the clinical requirements for two-dimensional real-time imaging.

  14. GPU/CPU Algorithm for Generalized Born/Solvent-Accessible Surface Area Implicit Solvent Calculations.

    Science.gov (United States)

    Tanner, David E; Phillips, James C; Schulten, Klaus

    2012-07-10

    Molecular dynamics methodologies comprise a vital research tool for structural biology. Molecular dynamics has benefited from technological advances in computing, such as multi-core CPUs and graphics processing units (GPUs), but harnessing the full power of hybrid GPU/CPU computers remains difficult. The generalized Born/solvent-accessible surface area implicit solvent model (GB/SA) stands to benefit from hybrid GPU/CPU computers, employing the GPU for the GB calculation and the CPU for the SA calculation. Here, we explore the computational challenges facing GB/SA calculations on hybrid GPU/CPU computers and demonstrate how NAMD, a parallel molecular dynamics program, is able to efficiently utilize GPUs and CPUs simultaneously for fast GB/SA simulations. The hybrid computation principles demonstrated here are generally applicable to parallel applications employing hybrid GPU/CPU calculations.

  15. Performance Analysis of CPU Scheduling Algorithms with Novel OMDRRS Algorithm

    Directory of Open Access Journals (Sweden)

    Neetu Goel

    2016-01-01

    Full Text Available CPU scheduling is one of the most fundamental and essential parts of any operating system. It prioritizes processes to execute user requests efficiently and helps in choosing the appropriate process for execution. Round Robin (RR) and Priority Scheduling (PS) are among the most widely used and accepted CPU scheduling algorithms, but their performance degrades with respect to turnaround time, waiting time and context switching with each recurrence. A new scheduling algorithm, OMDRRS, is developed to improve on the performance of the RR and priority scheduling algorithms. The new algorithm performs better than the popular existing algorithms; drastic improvement is seen in waiting time, turnaround time, response time and context switching. A comparative analysis of turnaround time (TAT), waiting time (WT) and response time (RT) is shown with the help of ANOVA and t-tests.
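
    For reference, the metrics compared above are computed as in the plain Round Robin baseline sketched below (this is the classic algorithm, not the OMDRRS variant; burst times and quantum are illustrative).

        // Baseline Round Robin: average waiting and turnaround time, dispatch count.
        #include <algorithm>
        #include <iostream>
        #include <queue>
        #include <vector>

        int main() {
            std::vector<int> burst = {10, 5, 8};    // CPU bursts; all jobs arrive at t = 0
            const int quantum = 4;

            std::vector<int> remaining = burst, finish(burst.size(), 0);
            std::queue<int> ready;
            for (std::size_t i = 0; i < burst.size(); ++i) ready.push(static_cast<int>(i));

            int t = 0, dispatches = 0;
            while (!ready.empty()) {
                int p = ready.front(); ready.pop();
                int run = std::min(quantum, remaining[p]);
                t += run;
                remaining[p] -= run;
                if (remaining[p] > 0) ready.push(p); else finish[p] = t;
                ++dispatches;                        // each dispatch implies a context switch
            }

            double wt = 0, tat = 0;
            for (std::size_t i = 0; i < burst.size(); ++i) {
                tat += finish[i];                    // turnaround = finish - arrival (arrival = 0)
                wt  += finish[i] - burst[i];         // waiting = turnaround - burst
            }
            std::cout << "avg waiting "      << wt  / burst.size()
                      << ", avg turnaround " << tat / burst.size()
                      << ", dispatches "     << dispatches << "\n";
        }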

  16. CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications.

    Science.gov (United States)

    Lei, Guoqing; Dou, Yong; Wan, Wen; Xia, Fei; Li, Rongchun; Ma, Meng; Zou, Dan

    2012-01-01

    Prediction of ribonucleic acid (RNA) secondary structure remains one of the most important research areas in bioinformatics. The Zuker algorithm is one of the most popular methods of free energy minimization for RNA secondary structure prediction. Thus far, few studies have been reported on the acceleration of the Zuker algorithm on general-purpose processors or on extra accelerators such as Field Programmable Gate-Array (FPGA) and Graphics Processing Units (GPU). To the best of our knowledge, no implementation combines both CPU and extra accelerators, such as GPUs, to accelerate the Zuker algorithm applications. In this paper, a CPU-GPU hybrid computing system that accelerates Zuker algorithm applications for RNA secondary structure prediction is proposed. The computing tasks are allocated between CPU and GPU for parallel cooperate execution. Performance differences between the CPU and the GPU in the task-allocation scheme are considered to obtain workload balance. To improve the hybrid system performance, the Zuker algorithm is optimally implemented with special methods for CPU and GPU architecture. Speedup of 15.93× over optimized multi-core SIMD CPU implementation and performance advantage of 16% over optimized GPU implementation are shown in the experimental results. More than 14% of the sequences are executed on CPU in the hybrid system. The system combining CPU and GPU to accelerate the Zuker algorithm is proven to be promising and can be applied to other bioinformatics applications.

  17. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing.

    Science.gov (United States)

    Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin

    2016-04-07

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially the GPU based methods. In the classical GPU based imaging algorithm, GPU is employed to accelerate image processing by massive parallel computing, and CPU is only used to perform the auxiliary work such as data input/output (IO). However, the computing capability of CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed tasks partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the part of CPU parallel imaging, the advanced vector extension (AVX) method is firstly introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only the bottlenecks of memory limitation and frequent data transferring are broken, but also kinds of optimized strategies are applied, such as streaming, parallel pipeline and so on. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging on single-core CPU by 270 times and realizes the real-time imaging in that the imaging rate outperforms the raw data generation rate.

  18. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing

    Directory of Open Access Journals (Sweden)

    Fan Zhang

    2016-04-01

    Full Text Available With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially the GPU based methods. In the classical GPU based imaging algorithm, GPU is employed to accelerate image processing by massive parallel computing, and CPU is only used to perform the auxiliary work such as data input/output (IO). However, the computing capability of CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed tasks partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the part of CPU parallel imaging, the advanced vector extension (AVX) method is firstly introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only the bottlenecks of memory limitation and frequent data transferring are broken, but also kinds of optimized strategies are applied, such as streaming, parallel pipeline and so on. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging on single-core CPU by 270 times and realizes the real-time imaging in that the imaging rate outperforms the raw data generation rate.

  19. Design and Implementation of Interface Circuit Between CPU and GPU

    Institute of Scientific and Technical Information of China (English)

    石茉莉; 蒋林; 刘有耀

    2013-01-01

    In building collaborative computing between a central processing unit (CPU) and a graphics processing unit (GPU), or between a CPU and other devices, the GPU is connected to the CPU through the Peripheral Component Interconnect (PCI) bus and takes on the parallel computing tasks. To address the asynchronous transmission and timing matching between the PCI interface chip and the GPU chip, which run on different clock systems, this paper designs a timing-matched interface circuit between the CPU and the GPU, based on the PCI bus specification and the timing specification of the GPU chip and using methods for handling signals that cross clock domains. Simulation verifies the correctness of the circuit and shows that it can operate at a frequency of 252 MHz, meeting the rate and bandwidth requirements of the CPU-GPU interface and enabling high-speed data transmission between the GPU and the CPU.

  20. FUPS-DV: Full-Time Preemption CPU Scheduling for Desktop Virtualization

    Institute of Scientific and Technical Information of China (English)

    夏虞斌; 杨春; 程旭

    2011-01-01

    Desktop virtualization usually runs mixed workloads and is more sensitive to interactive performance, two demands that current virtual machine schedulers cannot meet well. This paper presents a full-time preemption CPU scheduling algorithm that uses grey-box techniques to probe information inside the virtual machine in support of scheduling decisions, and that is further optimized for the workload characteristics of remote desktops. The evaluation shows that, with five Windows XP virtual machines concurrently running mixed workloads, the optimization reduces the display latency of slide presentations by at least 60%.

  1. An Optimized Round Robin Scheduling Algorithm for CPU Scheduling

    Directory of Open Access Journals (Sweden)

    Ajit Singh

    2010-10-01

    Full Text Available The main objective of this paper is to develop a new approach to round robin scheduling that helps to improve CPU efficiency in real-time and time-sharing operating systems. There are many algorithms available for CPU scheduling, but they cannot be implemented in real-time operating systems because of high context-switch rates, large waiting times, large response times, large turnaround times and low throughput. The proposed algorithm addresses all these drawbacks of the simple round robin architecture. The author also gives a comparative analysis of the proposed algorithm against simple round robin scheduling. The author therefore strongly feels that the proposed architecture solves all the problems encountered in the simple round robin architecture by decreasing the performance parameters to a desirable extent and thereby increasing system throughput.

  2. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures.

    Science.gov (United States)

    Souris, Kevin; Lee, John Aldo; Sterpin, Edmond

    2016-04-01

    Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However, the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the latest generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable for MC methods. The class-II condensed history algorithm of MCsquare provides a fast yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked against the GATE/Geant4 Monte Carlo application for homogeneous and heterogeneous geometries. Comparisons with GATE/Geant4 for various geometries show deviations within 2%-1 mm. In spite of the limited memory bandwidth of the coprocessor, the simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus also be used for in vivo range verification.

  3. Workload-Aware and CPU Frequency Scaling for Optimal Energy Consumption in VM Allocation

    Directory of Open Access Journals (Sweden)

    Zhen Liu

    2014-01-01

    Full Text Available In the problem of VM consolidation for cloud energy saving, different workloads ask for different resources, so a VM placement solution that takes workload characteristics into account is more reasonable. In the real world, a workload runs at varying CPU utilization over its execution time according to the characteristics of its tasks, which means that energy consumption is related to both the CPU utilization and the CPU frequency; evaluating energy consumption from a model of CPU frequency alone is therefore insufficient. This paper theoretically verifies that, for a given CPU utilization, there is a CPU frequency that yields the minimum energy consumption. Based on this deduction, we put forward a heuristic CPU frequency scaling algorithm, VP-FS (virtual machine placement with frequency scaling). To carry out the experiments, we implemented three typical greedy algorithms for VM placement and simulated three groups of VM tasks. Our results show that different workloads affect the VM allocation results: each group of workloads has its most suitable algorithm when considering the minimum number of physical machines used. And thanks to CPU frequency scaling, VP-FS has the best results on total energy consumption compared with the other three algorithms under any of the three groups of workloads.
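
    The existence of a utilization-dependent optimal frequency can be illustrated with a brute-force sweep under a simple assumed power model, P(f) = P_static + a * u * f^3, where finishing a fixed workload W at frequency f takes W / f seconds. This model and all constants below are assumptions for illustration, not the model used in the paper.

        // Sweep CPU frequencies to find the energy-minimizing one per utilization.
        #include <iostream>

        int main() {
            const double P_static = 20.0;   // W, static power (assumed)
            const double a        = 5.0;    // W per GHz^3, dynamic coefficient (assumed)
            const double W        = 100.0;  // workload, in GHz-seconds of cycles

            for (double u : {0.3, 0.6, 0.9}) {               // CPU utilizations to compare
                double best_f = 0.0, best_e = 1e30;
                for (double f = 0.8; f <= 3.2; f += 0.01) {  // sweep 0.8 - 3.2 GHz
                    double energy = P_static * W / f + a * u * W * f * f;
                    if (energy < best_e) { best_e = energy; best_f = f; }
                }
                std::cout << "u = " << u << "  best f = " << best_f
                          << " GHz  energy = " << best_e << " J\n";
            }
        }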

  4. The Creation of a CPU Timer for High Fidelity Programs

    Science.gov (United States)

    Dick, Aidan A.

    2011-01-01

    Using the C and C++ programming languages, a tool was developed that measures the efficiency of a program by recording the amount of CPU time that various functions consume. By inserting the tool between lines of code in the program, one receives a detailed report of the absolute and relative time consumption associated with each section. After adapting the generic tool for MAVERIC, a high-fidelity launch vehicle simulation program, the components of a frequently used function called "derivatives()" were measured. Out of the 34 sub-functions in "derivatives()", the top 8 sub-functions were found to make up 83.1% of the total time spent. To decrease the overall run time of MAVERIC, a change was implemented in the sub-function "Event_Controller()". Reformatting "Event_Controller()" led to a 36.9% decrease in the total CPU time spent by that sub-function, and a 3.2% decrease in the total CPU time spent by the overarching function "derivatives()".
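
    A minimal sketch of this kind of section timer is shown below (an illustration of the idea, not the MAVERIC tool itself): CPU time is accumulated per named section with std::clock, and the report gives absolute seconds and relative shares.

        // Minimal named-section CPU timer.
        #include <ctime>
        #include <iostream>
        #include <map>
        #include <string>

        class CpuTimer {
            std::map<std::string, double> total_;          // CPU seconds per section
            std::map<std::string, std::clock_t> started_;
        public:
            void start(const std::string& name) { started_[name] = std::clock(); }
            void stop(const std::string& name) {
                total_[name] += double(std::clock() - started_[name]) / CLOCKS_PER_SEC;
            }
            void report() const {
                double sum = 0;
                for (const auto& kv : total_) sum += kv.second;
                for (const auto& kv : total_)
                    std::cout << kv.first << ": " << kv.second << " s ("
                              << 100.0 * kv.second / sum << "%)\n";
            }
        };

        int main() {
            CpuTimer timer;
            timer.start("work");
            volatile double x = 0;
            for (long i = 0; i < 50000000; ++i) x += i * 1e-9;  // stand-in workload
            timer.stop("work");
            timer.report();
        }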

  5. Efficient simulation of diffusion-based choice RT models on CPU and GPU.

    Science.gov (United States)

    Verdonck, Stijn; Meers, Kristof; Tuerlinckx, Francis

    2016-03-01

    In this paper, we present software for the efficient simulation of a broad class of linear and nonlinear diffusion models for choice RT, using either CPU or graphical processing unit (GPU) technology. The software is readily accessible from the popular scripting languages MATLAB and R (both 64-bit). The speed obtained on a single high-end GPU is comparable to that of a small CPU cluster, bringing standard statistical inference of complex diffusion models to the desktop platform.

  6. CPU and memory allocation optimization using fuzzy logic

    Science.gov (United States)

    Zalevsky, Zeev; Gur, Eran; Mendlovic, David

    2002-12-01

    The allocation of CPU time and memory resources is a well-known problem in organizations with a large number of users and a single mainframe. Usually the amount of resources given to a single user is based on that user's own statistics, not on the statistics of the entire organization; therefore patterns are not well identified and the allocation system is prodigal. In this work the authors suggest a fuzzy-logic-based algorithm to optimize the CPU and memory distribution between the users based on the users' history. The algorithm works separately on heavy users and light users, since they have different patterns to be observed. The result is a set of rules, generated by the fuzzy-logic inference engine, that allows the system to use its computing ability in an optimized manner. Test results on data taken from the Faculty of Engineering in Tel Aviv University demonstrate the abilities of the new algorithm.

  7. Fuzzy-logic optical optimization of mainframe CPU and memory

    Science.gov (United States)

    Zalevsky, Zeev; Gur, Eran; Mendlovic, David

    2006-07-01

    The allocation of CPU time and memory resources is a familiar problem in organizations with a large number of users and a single mainframe. Usually the amount of resources allocated to a single user is based on the user's own statistics not on the statistics of the entire organization, therefore patterns are not well identified and the allocation system is prodigal. A fuzzy-logic-based algorithm to optimize the CPU and memory distribution among users based on their history is suggested. The algorithm works on heavy and light users separately since they present different patterns to be observed. The result is a set of rules generated by the fuzzy-logic inference engine that will allow the system to use its computing ability in an optimized manner. Test results on data taken from the Faculty of Engineering of Tel Aviv University demonstrate the capabilities of the new algorithm.

  8. Fuzzy-logic optical optimization of mainframe CPU and memory.

    Science.gov (United States)

    Zalevsky, Zeev; Gur, Eran; Mendlovic, David

    2006-07-01

    The allocation of CPU time and memory resources is a familiar problem in organizations with a large number of users and a single mainframe. Usually the amount of resources allocated to a single user is based on the user's own statistics not on the statistics of the entire organization, therefore patterns are not well identified and the allocation system is prodigal. A fuzzy-logic-based algorithm to optimize the CPU and memory distribution among users based on their history is suggested. The algorithm works on heavy and light users separately since they present different patterns to be observed. The result is a set of rules generated by the fuzzy-logic inference engine that will allow the system to use its computing ability in an optimized manner. Test results on data taken from the Faculty of Engineering of Tel Aviv University demonstrate the capabilities of the new algorithm.

  9. Delay Time Analysis of Reconfigurable Firewall Unit

    Directory of Open Access Journals (Sweden)

    Tomoaki Sato

    2012-10-01

    Full Text Available A firewall function is indispensable for mobile devices, and it demands low-power operation. To meet this demand, the authors have developed a firewall unit with a reconfigurable device. The firewall unit needs a large number of registers for the timing adjustment of packets, and these registers are a major source of power consumption. In this paper, to solve the problem of power consumption, the firewall unit is developed using the wave-pipelining technique, and the detailed delay time of this technique is analyzed.

  10. Revisiting Molecular Dynamics on a CPU/GPU system: Water Kernel and SHAKE Parallelization.

    Science.gov (United States)

    Ruymgaart, A Peter; Elber, Ron

    2012-11-13

    We report Graphics Processing Unit (GPU) and OpenMP parallel implementations of water-specific force calculations and of bond constraints for use in Molecular Dynamics simulations. We focus on a typical laboratory computing environment in which a CPU with a few cores is attached to a GPU. We discuss the design of the code in detail and illustrate performance comparable to highly optimized codes such as GROMACS. Besides speed, our code shows excellent energy conservation. Utilization of water-specific lists allows the efficient calculation of non-bonded interactions that include water molecules and results in a speed-up factor of more than 40 on the GPU compared to code optimized on a single CPU core for systems larger than 20,000 atoms. This is up four-fold from the factor of 10 reported for our initial GPU implementation, which did not include water-specific code. Another optimization is the implementation of constrained dynamics entirely on the GPU. The routine, which enforces constraints on all bonds, runs in parallel on multiple OpenMP cores or entirely on the GPU. It is based on a Conjugate Gradient solution of the Lagrange multipliers (CG SHAKE). The GPU implementation is partially in double precision and requires no communication with the CPU during the execution of the SHAKE algorithm. The (parallel) implementation of SHAKE allows an increase of the time step to 2.0 fs while maintaining excellent energy conservation. Interestingly, CG SHAKE is faster than the usual bond relaxation algorithm even on a single core if high accuracy is expected. The significant speedup of the optimized components transfers the computational bottleneck of the MD calculation to the reciprocal part of Particle Mesh Ewald (PME).

  11. An OpenCL implementation for the solution of TDSE on GPU and CPU architectures

    CERN Document Server

    O'Broin, Cathal

    2012-01-01

    Open Computing Language (OpenCL) is a parallel processing language that is ideally suited for running parallel algorithms on Graphics Processing Units (GPUs). In the present work we report the development of a generic parallel single-GPU code for the numerical solution of a system of first-order ordinary differential equations (ODEs) based on the OpenCL model. We have applied the code to the time-dependent Schrödinger equation of atomic hydrogen in a strong laser field and studied its performance on the two basic kinds of compute units (GPUs and CPUs). We found excellent scalability and a significant speed-up of the GPU over the CPU device, tending to a value of about 40.

  12. CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions.

    Science.gov (United States)

    Liu, Yongchao; Wirawan, Adrianto; Schmidt, Bertil

    2013-04-04

    The maximal sensitivity for local alignments makes the Smith-Waterman algorithm a popular choice for protein sequence database search based on pairwise alignment. However, the algorithm is compute-intensive due to its quadratic time complexity. Corresponding runtimes are further compounded by the rapid growth of sequence databases. We present CUDASW++ 3.0, a fast Smith-Waterman protein database search algorithm, which couples CPU and GPU SIMD instructions and carries out concurrent CPU and GPU computations. For the CPU computation, this algorithm employs SSE-based vector execution units as accelerators. For the GPU computation, we have investigated for the first time a GPU SIMD parallelization, which employs CUDA PTX SIMD video instructions to gain more data parallelism beyond the SIMT execution model. Moreover, sequence alignment workloads are automatically distributed over CPUs and GPUs based on their respective compute capabilities. Evaluation on the Swiss-Prot database shows that CUDASW++ 3.0 gains a performance improvement over CUDASW++ 2.0 of up to 2.9 and 3.2 times, with a maximum performance of 119.0 and 185.6 GCUPS, on a single-GPU GeForce GTX 680 and a dual-GPU GeForce GTX 690 graphics card, respectively. In addition, our algorithm has demonstrated significant speedups over other top-performing tools: SWIPE and BLAST+. CUDASW++ 3.0 is written in CUDA C++ and PTX assembly languages, targeting GPUs based on the Kepler architecture. This algorithm obtains significant speedups over its predecessor, CUDASW++ 2.0, by benefiting from the use of CPU and GPU SIMD instructions as well as the concurrent execution on CPUs and GPUs. The source code and the simulated data are available at http://cudasw.sourceforge.net.
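
    The capability-proportional distribution mentioned above can be pictured with a tiny sketch: sequences are split between CPU and GPU in proportion to their measured throughputs. The GCUPS figures below are placeholders, not benchmark results.

        // Split a database between CPU and GPU in proportion to their throughput.
        #include <cstddef>
        #include <iostream>

        int main() {
            const double cpu_gcups = 30.0;            // assumed CPU SIMD throughput
            const double gpu_gcups = 120.0;           // assumed GPU throughput
            const std::size_t n_sequences = 500000;

            std::size_t gpu_share = static_cast<std::size_t>(
                n_sequences * gpu_gcups / (cpu_gcups + gpu_gcups));
            std::size_t cpu_share = n_sequences - gpu_share;

            std::cout << "GPU batch: " << gpu_share
                      << " sequences, CPU batch: " << cpu_share
                      << " (both run concurrently)\n";
        }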

  13. Real-world comparison of CPU and GPU implementations of SNPrank: a network analysis tool for GWAS.

    Science.gov (United States)

    Davis, Nicholas A; Pandey, Ahwan; McKinney, B A

    2011-01-15

    Bioinformatics researchers have a variety of programming languages and architectures at their disposal, and recent advances in graphics processing unit (GPU) computing have added a promising new option. However, many performance comparisons inflate the actual advantages of GPU technology. In this study, we carry out a realistic performance evaluation of SNPrank, a network centrality algorithm that ranks single nucleotide polymorphisms (SNPs) based on their importance in the context of a phenotype-specific interaction network. Our goal is to identify the best computational engine for the SNPrank web application and to provide a variety of well-tested implementations of SNPrank for Bioinformaticists to integrate into their research. Using SNP data from the Wellcome Trust Case Control Consortium genome-wide association study of Bipolar Disorder, we compare multiple SNPrank implementations, including Python, Matlab and Java as well as CPU versus GPU implementations. When compared with naïve, single-threaded CPU implementations, the GPU yields a large improvement in the execution time. However, with comparable effort, multi-threaded CPU implementations negate the apparent advantage of GPU implementations. The SNPrank code is open source and available at http://insilico.utulsa.edu/snprank.

  14. SATA Controller Into a Space CPU

    Science.gov (United States)

    De Nino, M.; Titomanlio, D.; Calvanese, R.; Capuano, G.; Rovatti, M.

    2014-08-01

    This paper is dedicated to the presentation of a project, funded by ESA, named "SATA Controller into a Space CPU" aimed at starting a development activity to spin- in the SATA technology to the space market.Space applications could benefit from the adoption of the SATA protocol as interface layer between the host controller and the mass memory module. Currently no space-proven implementation of the SATA specification exists.

  15. Dosimetric comparison of helical tomotherapy treatment plans for total marrow irradiation created using GPU and CPU dose calculation engines.

    Science.gov (United States)

    Nalichowski, Adrian; Burmeister, Jay

    2013-07-01

    To compare optimization characteristics, plan quality, and treatment delivery efficiency between total marrow irradiation (TMI) plans using the new TomoTherapy graphic processing unit (GPU) based dose engine and CPU/cluster based dose engine. Five TMI plans created on an anthropomorphic phantom were optimized and calculated with both dose engines. The planning treatment volume (PTV) included all the bones from head to mid femur except for upper extremities. Evaluated organs at risk (OAR) consisted of lung, liver, heart, kidneys, and brain. The following treatment parameters were used to generate the TMI plans: field widths of 2.5 and 5 cm, modulation factors of 2 and 2.5, and pitch of either 0.287 or 0.43. The optimization parameters were chosen based on the PTV and OAR priorities and the plans were optimized with a fixed number of iterations. The PTV constraint was selected to ensure that at least 95% of the PTV received the prescription dose. The plans were evaluated based on D80 and D50 (dose to 80% and 50% of the OAR volume, respectively) and hotspot volumes within the PTVs. Gamma indices (Γ) were also used to compare planar dose distributions between the two modalities. The optimization and dose calculation times were compared between the two systems. The treatment delivery times were also evaluated. The results showed very good dosimetric agreement between the GPU and CPU calculated plans for any of the evaluated planning parameters indicating that both systems converge on nearly identical plans. All D80 and D50 parameters varied by less than 3% of the prescription dose with an average difference of 0.8%. A gamma analysis Γ(3%, 3 mm) CPU plan. The average number of voxels meeting the Γ CPU/cluster based system was 579 vs 26.8 min for the GPU based system. There was no difference in the calculated treatment delivery time per fraction. Beam-on time varied based on field width and pitch and ranged between 15 and 28 min. The TomoTherapy GPU based dose engine

  16. Optimization strategies for parallel CPU and GPU implementations of a meshfree particle method

    CERN Document Server

    Domínguez, Jose M; Gómez-Gesteira, Moncho

    2011-01-01

    Much of the current focus in high performance computing (HPC) for computational fluid dynamics (CFD) deals with grid-based methods. However, parallel implementations of newer meshfree particle methods such as Smoothed Particle Hydrodynamics (SPH) are less studied. In this work, we present optimizations of an SPH method for both the central processing unit (CPU) and the graphics processing unit (GPU). These optimization strategies can be further applied to many other meshfree methods. The obtained performance for each architecture and a comparison between the most efficient implementations for CPU and GPU are shown.

  17. Pharmacokinetics of CPU0213, a novel endothelin receptor antagonist, after intravenous administration in mice

    Institute of Scientific and Technical Information of China (English)

    Li GUAN; Yu FENG; Min JI; De-zai DAI

    2006-01-01

    Aim: To determine the pharmacokinetics associated with acute toxic doses of CPU0213, a novel endothelin receptor antagonist, in mice after a single intravenous administration. Methods: Serum concentrations and the pharmacokinetic parameters of CPU0213 were assayed by high pressure liquid chromatography (HPLC) following a single intravenous bolus of CPU0213 at doses of 25, 50, and 100 mg/kg in mice. The acute intravenous toxicity of CPU0213 was also assessed in mice. Results: A simple, sensitive and selective HPLC method was developed for the quantitative determination of CPU0213 in mouse serum. The concentration-time data conform to a two-compartment model after iv administration of CPU0213 at doses of 25, 50, and 100 mg/kg. The corresponding distribution half-lives (T1/2α) were 3.6, 4.2, and 1.1 min, and the elimination half-lives (T1/2β) were 39.4, 70.3, and 61.9 min. C0 increased linearly in proportion to dose, as did AUC0-t and AUC0-∞. AUC0-t and AUC0-∞ were 4.511, 13.070, 23.666 g·min·L-1 and 4.596, 13.679, 24.115 g·min·L-1, respectively. The intravenous LD50 was 315.5 mg/kg. Conclusion: First-order pharmacokinetics were observed for CPU0213 within the range of doses used, and the acute toxicity of CPU0213 is mild.
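
    For readers unfamiliar with the notation, the standard two-compartment disposition after an i.v. bolus has the textbook form below (general pharmacokinetic relations, not equations taken from the paper); the half-lives and AUC quoted above follow from the exponents and coefficients.

        C(t) = A e^{-\alpha t} + B e^{-\beta t}, \qquad C_0 = A + B
        t_{1/2,\alpha} = \frac{\ln 2}{\alpha}, \qquad t_{1/2,\beta} = \frac{\ln 2}{\beta}
        \mathrm{AUC}_{0-\infty} = \frac{A}{\alpha} + \frac{B}{\beta}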

  18. Ultra-fast hybrid CPU-GPU multiple scatter simulation for 3-D PET.

    Science.gov (United States)

    Kim, Kyung Sang; Son, Young Don; Cho, Zang Hee; Ra, Jong Beom; Ye, Jong Chul

    2014-01-01

    Scatter correction is very important in 3-D PET reconstruction due to a large scatter contribution in measurements. Currently, one of the most popular methods is the so-called single scatter simulation (SSS), which considers single Compton scattering contributions from many randomly distributed scatter points. The SSS enables a fast calculation of scattering with relatively high accuracy; however, the accuracy of SSS depends on the accuracy of the tail fitting used to find a correct scaling factor, which is often difficult in low photon count measurements. To overcome this drawback, as well as to improve the accuracy of scatter estimation by incorporating the multiple scattering contribution, we propose a multiple scatter simulation (MSS) based on a simplified Monte Carlo (MC) simulation that considers photon migration and interactions due to photoelectric absorption and Compton scattering. Unlike the SSS, the MSS calculates a scaling factor by comparing simulated prompt data with the measured data in the whole volume, which enables a more robust estimation of the scaling factor. Even though the proposed MSS is based on MC, a significant acceleration of the computational time is possible by using a virtual detector array with a larger pitch, exploiting the fact that the scatter distribution varies slowly in the spatial domain. Furthermore, our MSS implementation lends itself nicely to a parallel implementation on a graphics processing unit (GPU). In particular, we exploit a hybrid CPU-GPU technique using open multiprocessing and the compute unified device architecture, which is 128.3 times faster than using a single CPU. Overall, the computational time of MSS is 9.4 s for a high-resolution research tomograph (HRRT) system. The performance of the proposed MSS is validated through actual experiments using an HRRT.

  19. Graphic processing unit accelerated real-time partially coherent beam generator

    Science.gov (United States)

    Ni, Xiaolong; Liu, Zhi; Chen, Chunyi; Jiang, Huilin; Fang, Hanhan; Song, Lujun; Zhang, Su

    2016-07-01

    A method of using liquid crystals (LCs) to generate a partially coherent beam in real time is described. An expression for generating a partially coherent beam is given and calculated using a graphics processing unit (GPU), i.e., the GeForce GTX 680. A liquid-crystal on silicon (LCOS) device with 256 × 256 pixels is used as the partially coherent beam generator (PCBG). An optimizing method with partition convolution is used to improve the generation speed of our LC PCBG. The total time needed to generate a random phase map with a coherence width ranging from 0.015 mm to 1.5 mm is less than 2.4 ms for calculation and readout with the GPU; adding the time needed for the CPU to read and send data to the LCOS and the response time of the LC PCBG, the real-time partially coherent beam (PCB) generation frequency of our LC PCBG is up to 312 Hz. To our knowledge, it is the first real-time partially coherent beam generator. A series of experiments based on double-pinhole interference were performed. The results show that, to generate laser beams with coherence widths of 0.9 mm and 1.5 mm with a mean error of approximately 1%, the required RMS values were 0.021306 and 0.020883 and the PV values were 0.073576 and 0.072998, respectively.

  20. A Novel CPU/GPU Simulation Environment for Large-Scale Biologically-Realistic Neural Modeling

    Directory of Open Access Journals (Sweden)

    Roger V Hoang

    2013-10-01

    Full Text Available Computational Neuroscience is an emerging field that provides unique opportunities to study complex brain structures through realistic neural simulations. However, as biological details are added to models, the execution time for the simulation becomes longer. Graphics Processing Units (GPUs) are now being utilized to accelerate simulations due to their ability to perform computations in parallel. As such, they have shown significant improvement in execution time compared to Central Processing Units (CPUs). Most neural simulators utilize either multiple CPUs or a single GPU for better performance, but still show limitations in execution time when biological details are not sacrificed. Therefore, we present a novel CPU/GPU simulation environment for large-scale biological networks, the NeoCortical Simulator version 6 (NCS6). NCS6 is a free, open-source, parallelizable, and scalable simulator, designed to run on clusters of multiple machines, potentially with high performance computing devices in each of them. It has built-in leaky-integrate-and-fire (LIF) and Izhikevich (IZH) neuron models, but users also have the capability to design their own plug-in interface for different neuron types as desired. NCS6 is currently able to simulate one million cells and 100 million synapses in quasi real time by distributing data across these heterogeneous clusters of CPUs and GPUs.

  1. A novel CPU/GPU simulation environment for large-scale biologically realistic neural modeling.

    Science.gov (United States)

    Hoang, Roger V; Tanna, Devyani; Jayet Bray, Laurence C; Dascalu, Sergiu M; Harris, Frederick C

    2013-01-01

    Computational Neuroscience is an emerging field that provides unique opportunities to study complex brain structures through realistic neural simulations. However, as biological details are added to models, the execution time for the simulation becomes longer. Graphics Processing Units (GPUs) are now being utilized to accelerate simulations due to their ability to perform computations in parallel. As such, they have shown significant improvement in execution time compared to Central Processing Units (CPUs). Most neural simulators utilize either multiple CPUs or a single GPU for better performance, but still show limitations in execution time when biological details are not sacrificed. Therefore, we present a novel CPU/GPU simulation environment for large-scale biological networks, the NeoCortical Simulator version 6 (NCS6). NCS6 is a free, open-source, parallelizable, and scalable simulator, designed to run on clusters of multiple machines, potentially with high performance computing devices in each of them. It has built-in leaky-integrate-and-fire (LIF) and Izhikevich (IZH) neuron models, but users also have the capability to design their own plug-in interface for different neuron types as desired. NCS6 is currently able to simulate one million cells and 100 million synapses in quasi real time by distributing data across eight machines with each having two video cards.
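
    The built-in LIF model mentioned above can be sketched in a few lines in its textbook forward-Euler form; the parameters below are illustrative and are not NCS6's defaults.

        // Textbook leaky-integrate-and-fire neuron, forward-Euler integration.
        #include <iostream>
        #include <vector>

        int main() {
            const double dt = 0.1;        // ms
            const double tau = 10.0;      // membrane time constant, ms
            const double v_rest = -65.0;  // mV
            const double v_thresh = -50.0, v_reset = -70.0;
            const double r_m = 10.0;      // membrane resistance, MOhm
            const double i_inj = 2.0;     // injected current, nA

            double v = v_rest;
            std::vector<double> spike_times;
            for (int step = 0; step < 1000; ++step) {           // 100 ms of simulation
                v += dt * (-(v - v_rest) + r_m * i_inj) / tau;  // LIF membrane update
                if (v >= v_thresh) {                            // threshold crossing: spike
                    spike_times.push_back(step * dt);
                    v = v_reset;
                }
            }
            std::cout << spike_times.size() << " spikes in 100 ms\n";
        }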

  2. Evaluation Model for the Degree of Independent Production of Domestic CPUs

    Institute of Scientific and Technical Information of China (English)

    朱帅; 吴玲达; 郭静

    2016-01-01

    With the continuous improvement of domestic CPU design and manufacturing technology in China, domestically produced CPUs have entered mass production and are seeing initial application in the military and in government departments. To determine whether a domestic CPU is produced truly independently, and to eliminate the security risks caused by implanted backdoors, this paper analyzes the general process of CPU design and production and establishes an indicator system for the degree of independent CPU production. Based on the AHP (Analytic Hierarchy Process) and Delphi methods, an evaluation model for the degree of independent production of domestic CPUs is designed, the strategy for determining indicator weights and the scoring basis are made explicit, and the degree of independent production of the CPU under evaluation is obtained.

  3. Length-Bounded Hybrid CPU/GPU Pattern Matching Algorithm for Deep Packet Inspection

    Directory of Open Access Journals (Sweden)

    Yi-Shan Lin

    2017-01-01

    Full Text Available Since frequent communication between applications takes place in high speed networks, deep packet inspection (DPI) plays an important role in network application awareness. The signature-based network intrusion detection system (NIDS) contains a DPI technique that examines the incoming packet payloads by employing a pattern matching algorithm that dominates the overall inspection performance. Existing studies focused on implementing efficient pattern matching algorithms by parallel programming on software platforms because of the advantages of lower cost and higher scalability, involving either the central processing unit (CPU) or the graphics processing unit (GPU). Our studies focused on designing a pattern matching algorithm based on the cooperation between both the CPU and the GPU. In this paper, we present an enhanced design of our previous work, a length-bounded hybrid CPU/GPU pattern matching algorithm (LHPMA). In the preliminary experiment, the performance and a comparison with the previous work are displayed, and the experimental results show that the LHPMA can achieve not only effective CPU/GPU cooperation but also higher throughput than the previous method.

  4. An Investigation of the Performance of the Colored Gauss-Seidel Solver on CPU and GPU

    Energy Technology Data Exchange (ETDEWEB)

    Yoon, Jong Seon; Choi, Hyoung Gwon [Seoul Nat’l Univ. of Science and Technology, Seoul (Korea, Republic of); Jeon, Byoung Jin [Yonsei Univ., Seoul (Korea, Republic of)

    2017-02-15

    The performance of the colored Gauss–Seidel solver on the CPU and GPU was investigated for two- and three-dimensional heat conduction problems using different mesh sizes. The heat conduction equation was discretized by the finite difference method and the finite element method. The CPU yielded good performance for small problems but deteriorated when the total memory required for computing was larger than the cache memory for large problems. In contrast, the GPU performed better as the mesh size increased because of the latency-hiding technique. Further, GPU computation with the colored Gauss–Seidel solver was approximately 7 times faster than that with a single CPU. Furthermore, the colored Gauss–Seidel solver was found to be approximately twice as fast as the Jacobi solver when parallel computing was conducted on the GPU.
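
    The "colored" ordering that makes this solver parallel-friendly is the classic red-black sweep: points of one color depend only on points of the other color, so every update within a color can proceed independently, which is what maps well to a GPU. A minimal serial sketch for the 2-D Laplace/heat-conduction problem follows; the grid size and iteration count are illustrative.

        // Red-black (colored) Gauss-Seidel sweeps for the 2-D Laplace problem.
        #include <iostream>
        #include <vector>

        int main() {
            const int n = 64;                                  // grid is n x n
            std::vector<double> u(n * n, 0.0);
            for (int j = 0; j < n; ++j) u[j] = 100.0;          // hot top boundary row

            auto idx = [n](int i, int j) { return i * n + j; };

            for (int iter = 0; iter < 500; ++iter) {
                for (int color = 0; color < 2; ++color) {      // 0 = red, 1 = black
                    for (int i = 1; i < n - 1; ++i)
                        for (int j = 1; j < n - 1; ++j)
                            if ((i + j) % 2 == color)          // update one color at a time
                                u[idx(i, j)] = 0.25 * (u[idx(i - 1, j)] + u[idx(i + 1, j)]
                                                     + u[idx(i, j - 1)] + u[idx(i, j + 1)]);
                }
            }
            std::cout << "center value: " << u[idx(n / 2, n / 2)] << "\n";
        }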

  5. A Mechanism That Bounds Execution Performance for Process Group for Mitigating CPU Abuse

    Science.gov (United States)

    Yamauchi, Toshihiro; Hara, Takayuki; Taniguchi, Hideo

Secure OS has been the focus of several studies. However, CPU resources, which are important resources for executing a program, are not the object of access control. For preventing the abuse of CPU resources, we had earlier proposed a new type of execution resource that controls the maximum CPU usage [5,6]. The previously proposed mechanism can control only one process at a time. Because most services involve multiple processes, the mechanism should control all the processes in each service. In this paper, we propose an improved mechanism that helps to achieve a bound on the execution performance of a process group, in order to limit unnecessary processor usage. We report the results of an evaluation of our proposed mechanism.

  6. Use of a graphics processing unit (GPU) to facilitate real-time 3D graphic presentation of the patient skin-dose distribution during fluoroscopic interventional procedures.

    Science.gov (United States)

    Rana, Vijay; Rudin, Stephen; Bednarek, Daniel R

    2012-02-23

    We have developed a dose-tracking system (DTS) that calculates the radiation dose to the patient's skin in real-time by acquiring exposure parameters and imaging-system-geometry from the digital bus on a Toshiba Infinix C-arm unit. The cumulative dose values are then displayed as a color map on an OpenGL-based 3D graphic of the patient for immediate feedback to the interventionalist. Determination of those elements on the surface of the patient 3D-graphic that intersect the beam and calculation of the dose for these elements in real time demands fast computation. Reducing the size of the elements results in more computation load on the computer processor and therefore a tradeoff occurs between the resolution of the patient graphic and the real-time performance of the DTS. The speed of the DTS for calculating dose to the skin is limited by the central processing unit (CPU) and can be improved by using the parallel processing power of a graphics processing unit (GPU). Here, we compare the performance speed of GPU-based DTS software to that of the current CPU-based software as a function of the resolution of the patient graphics. Results show a tremendous improvement in speed using the GPU. While an increase in the spatial resolution of the patient graphics resulted in slowing down the computational speed of the DTS on the CPU, the speed of the GPU-based DTS was hardly affected. This GPU-based DTS can be a powerful tool for providing accurate, real-time feedback about patient skin-dose to physicians while performing interventional procedures.

  7. New Multithreaded Hybrid CPU/GPU Approach to Hartree-Fock.

    Science.gov (United States)

    Asadchev, Andrey; Gordon, Mark S

    2012-11-13

In this article, a new multithreaded Hartree-Fock CPU/GPU method is presented which utilizes automatically generated code and modern C++ techniques to achieve a significant improvement in memory usage and computer time. In particular, the newly implemented Rys Quadrature and Fock Matrix algorithms, implemented as a stand-alone C++ library with C and Fortran bindings, provide up to 40% improvement over the traditional Fortran Rys Quadrature. The C++ GPU HF code provides approximately a factor of 17.5 improvement over the corresponding C++ CPU code.

  8. Use of a graphics processing unit (GPU) to facilitate real-time 3D graphic presentation of the patient skin-dose distribution during fluoroscopic interventional procedures

    Science.gov (United States)

    Rana, Vijay; Rudin, Stephen; Bednarek, Daniel R.

    2012-03-01

We have developed a dose-tracking system (DTS) that calculates the radiation dose to the patient's skin in real-time by acquiring exposure parameters and imaging-system-geometry from the digital bus on a Toshiba Infinix C-arm unit. The cumulative dose values are then displayed as a color map on an OpenGL-based 3D graphic of the patient for immediate feedback to the interventionalist. Determination of those elements on the surface of the patient 3D-graphic that intersect the beam and calculation of the dose for these elements in real time demands fast computation. Reducing the size of the elements results in more computation load on the computer processor and therefore a tradeoff occurs between the resolution of the patient graphic and the real-time performance of the DTS. The speed of the DTS for calculating dose to the skin is limited by the central processing unit (CPU) and can be improved by using the parallel processing power of a graphics processing unit (GPU). Here, we compare the performance speed of GPU-based DTS software to that of the current CPU-based software as a function of the resolution of the patient graphics. Results show a tremendous improvement in speed using the GPU. While an increase in the spatial resolution of the patient graphics resulted in slowing down the computational speed of the DTS on the CPU, the speed of the GPU-based DTS was hardly affected. This GPU-based DTS can be a powerful tool for providing accurate, real-time feedback about patient skin-dose to physicians while performing interventional procedures.

  9. "Units of Comparison" across Languages, across Time

    Science.gov (United States)

    Thomas, Margaret

    2009-01-01

    Lardiere's keynote article adverts to a succession of "units of comparison" that have been employed in the study of cross-linguistic differences, including mid-twentieth-century structural patterns, generative grammar's parameters, and (within contemporary Minimalism) features. This commentary expands on the idea of units of cross-linguistic…

  10. Accelerated event-by-event Monte Carlo microdosimetric calculations of electrons and protons tracks on a multi-core CPU and a CUDA-enabled GPU.

    Science.gov (United States)

    Kalantzis, Georgios; Tachibana, Hidenobu

    2014-01-01

    For microdosimetric calculations event-by-event Monte Carlo (MC) methods are considered the most accurate. The main shortcoming of those methods is the extensive requirement for computational time. In this work we present an event-by-event MC code of low projectile energy electron and proton tracks for accelerated microdosimetric MC simulations on a graphic processing unit (GPU). Additionally, a hybrid implementation scheme was realized by employing OpenMP and CUDA in such a way that both GPU and multi-core CPU were utilized simultaneously. The two implementation schemes have been tested and compared with the sequential single threaded MC code on the CPU. Performance comparison was established on the speed-up for a set of benchmarking cases of electron and proton tracks. A maximum speedup of 67.2 was achieved for the GPU-based MC code, while a further improvement of the speedup up to 20% was achieved for the hybrid approach. The results indicate the capability of our CPU-GPU implementation for accelerated MC microdosimetric calculations of both electron and proton tracks without loss of accuracy. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  11. Comparative substrate specificity study of carboxypeptidase U (TAFIa) and carboxypeptidase N: development of highly selective CPU substrates as useful tools for assay development.

    Science.gov (United States)

    Willemse, Johan L; Polla, Magnus; Olsson, Thomas; Hendriks, Dirk F

    2008-01-01

Measurement of procarboxypeptidase U (TAFI) in plasma by activity-based assays is complicated by the presence of plasma carboxypeptidase N (CPN). Accurate blank measurements, correcting for this interfering CPN activity, should therefore be performed. A selective CPU substrate would make proCPU determination much less time-consuming. We searched for selective and sensitive CPU substrates by kinetic screening of different Bz-Xaa-Arg (Xaa = a naturally occurring amino acid) substrates using a novel kinetic assay. The presence of an aromatic amino acid (Phe, Tyr, Trp) resulted in a fairly high selectivity for CPU, which was most pronounced with Bz-Trp-Arg, showing a 56-fold higher kcat/Km value for CPU compared to CPN. Next we performed chemical modifications on the structure of those aromatic amino acids. This approach resulted in a fully selective CPU substrate with a 2.5-fold increase in kcat value compared to the commonly used Hip-Arg (Bz-Gly-Arg). We demonstrated significant differences in substrate specificity between CPU and CPN that were previously not fully appreciated. The selective CPU substrate presented in this paper will allow straightforward determination of proCPU in plasma in the future.

  12. Comparing and Analyzing the Similarities and Differences between CPU Hyper-Threading and Dual-Core Technologies

    Institute of Scientific and Technical Information of China (English)

    林杰; 余建坤

    2011-01-01

Hyper-threading and dual-core are two important technologies in the evolution of the CPU. Hyper-threading technology presents one physical processor as two "virtual" processors in order to reduce the idle time of the execution units and other resources, thus increasing CPU utilization. Dual-core technology encapsulates two physical processing cores in one CPU to improve program performance. The paper describes the basic model of the CPU, analyzes the principles of hyper-threading and dual-core technology, and compares their similarities and differences from the three perspectives of system architecture, degree of parallelism, and efficiency gains.

  13. The PAMELA storage and control unit

    Energy Technology Data Exchange (ETDEWEB)

Casolino, M. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy)]. E-mail: Marco.Casolino@roma2.infn.it; Altamura, F. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); Basili, A. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); De Pascale, M.P. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); Minori, M. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); Nagni, M. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); Picozza, P. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); Sparvoli, R. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); Adriani, O. [INFN, Structure of Florence, Physics Department, University of Florence, I-50019 Sesto Fiorentino (Italy); Papini, P. [INFN, Structure of Florence, Physics Department, University of Florence, I-50019 Sesto Fiorentino (Italy); Spillantini, P. [INFN, Structure of Florence, Physics Department, University of Florence, I-50019 Sesto Fiorentino (Italy); Castellini, G. [CNR-Istituto di Fisica Applicata 'Nello Carrara', I-50127 Florence (Italy); Boezio, M. [INFN, Structure of Trieste, Physics Department, University of Trieste, I-34147 Trieste (Italy)

    2007-03-01

The PAMELA Storage and Control Unit (PSCU) comprises a Central Processing Unit (CPU) and a Mass Memory (MM). The CPU of the experiment is based on the ERC-32 architecture (a SPARC v7 implementation) running a real-time operating system (RTEMS). The main purpose of the CPU is to handle slow control and data acquisition and to store data in a 2 GB MM. Communications between PAMELA and the satellite are done via a 1553B bus. Data acquisition from the sub-detectors is performed via a 2 MB/s interface. Downloads from the PAMELA MM to the satellite's main storage unit are handled by a 16 MB/s bus. The maximum daily amount of data transmitted to ground is about 20 GB.

  14. Practical Implementation of Prestack Kirchhoff Time Migration on a General Purpose Graphics Processing Unit

    Directory of Open Access Journals (Sweden)

    Liu Guofeng

    2016-08-01

Full Text Available In this study, we present a practical implementation of prestack Kirchhoff time migration (PSTM) on a general purpose graphics processing unit. First, we consider the three main optimizations of the PSTM GPU code, i.e., designing a configuration based on a reasonable execution, using the texture memory for velocity interpolation, and the application of an intrinsic function in device code. This approach can achieve a speedup of nearly 45 times on an NVIDIA GTX 680 GPU compared with the CPU code when a larger imaging space is used, where the PSTM output is a common reflection point gather stored as I[nx][ny][nh][nt] in matrix format. However, this method requires more memory space, so a limited imaging space cannot fully exploit the GPU resources. To overcome this problem, we designed a PSTM scheme with multiple GPUs for imaging different seismic data on different GPUs using an offset value. This process can achieve the peak speedup of the GPU PSTM code and it greatly increases the efficiency of the calculations, but without changing the imaging result.

  15. Practical Implementation of Prestack Kirchhoff Time Migration on a General Purpose Graphics Processing Unit

    Science.gov (United States)

    Liu, Guofeng; Li, Chun

    2016-08-01

In this study, we present a practical implementation of prestack Kirchhoff time migration (PSTM) on a general purpose graphics processing unit. First, we consider the three main optimizations of the PSTM GPU code, i.e., designing a configuration based on a reasonable execution, using the texture memory for velocity interpolation, and the application of an intrinsic function in device code. This approach can achieve a speedup of nearly 45 times on an NVIDIA GTX 680 GPU compared with the CPU code when a larger imaging space is used, where the PSTM output is a common reflection point gather stored as I[nx][ny][nh][nt] in matrix format. However, this method requires more memory space, so a limited imaging space cannot fully exploit the GPU resources. To overcome this problem, we designed a PSTM scheme with multiple GPUs for imaging different seismic data on different GPUs using an offset value. This process can achieve the peak speedup of the GPU PSTM code and it greatly increases the efficiency of the calculations, but without changing the imaging result.

  16. Heterogeneous Gpu&Cpu Cluster For High Performance Computing In Cryptography

    Directory of Open Access Journals (Sweden)

    Michał Marks

    2012-01-01

Full Text Available This paper addresses issues associated with distributed computing systems and the application of mixed GPU&CPU technology to data encryption and decryption algorithms. We describe a heterogeneous cluster HGCC formed by two types of nodes: an Intel processor with an NVIDIA graphics processing unit and an AMD processor with an AMD graphics processing unit (formerly ATI), and a novel software framework that hides the heterogeneity of our cluster and provides tools for solving complex scientific and engineering problems. Finally, we present the results of numerical experiments. The considered case study is concerned with parallel implementations of selected cryptanalysis algorithms. The main goal of the paper is to show the wide applicability of the GPU&CPU technology to large scale computation and data processing.

  17. Hybrid CPU/GPU Integral Engine for Strong-Scaling Ab Initio Methods.

    Science.gov (United States)

    Kussmann, Jörg; Ochsenfeld, Christian

    2017-07-11

We present a parallel integral algorithm for two-electron contributions occurring in Hartree-Fock and hybrid density functional theory that allows for a strong-scaling parallelization on inhomogeneous compute clusters. With a particular focus on graphics processing units, we show that our approach allows an efficient simultaneous use of CPUs and graphics processing units (GPUs), although the different architectures demand conflicting strategies in order to ensure efficient program execution. Furthermore, we present a general strategy to use large basis sets like quadruple-ζ split valence on GPUs and investigate the balance between CPUs and GPUs depending on the l-quantum numbers of the corresponding basis functions. Finally, we present first illustrative calculations using a hybrid CPU/GPU environment and demonstrate the strong-scaling performance of our parallelization strategy also for pure CPU-based calculations.

  18. Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

    Science.gov (United States)

    Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

    2010-10-01

    Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
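
    The load-prediction idea can be sketched generically as follows (illustrative only; the function names, the 50/50 starting split, and the dummy workers are assumptions, not the authors' algorithm): the work split for the next step is predicted from the throughputs measured on the current step, so the faster device receives more work. For brevity the two devices are timed sequentially here; a real implementation would launch them concurrently.

```python
# Sketch of load-prediction dynamic scheduling between a CPU and a GPU worker.
import time

def run_chunk(worker, items):
    """Run `items` work units on `worker` and return the measured throughput."""
    start = time.perf_counter()
    worker(items)                                   # stands in for the real kernel call
    elapsed = max(time.perf_counter() - start, 1e-9)
    return items / elapsed

def dynamic_schedule(cpu_worker, gpu_worker, items_per_step, n_steps):
    gpu_share = 0.5                                 # initial guess: even split
    for _ in range(n_steps):
        gpu_items = int(items_per_step * gpu_share)
        cpu_items = items_per_step - gpu_items
        gpu_rate = run_chunk(gpu_worker, gpu_items)
        cpu_rate = run_chunk(cpu_worker, cpu_items)
        gpu_share = gpu_rate / (gpu_rate + cpu_rate)   # predicted split for the next step
    return gpu_share

# Dummy workers whose per-item cost mimics a slower CPU and a faster GPU.
final_share = dynamic_schedule(lambda n: time.sleep(n * 1e-5),
                               lambda n: time.sleep(n * 2e-6),
                               items_per_step=5000, n_steps=10)
print(f"GPU receives about {final_share:.0%} of the work")
```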

  19. Improving the Performance of CPU Architectures by Reducing the Operating System Overhead (Extended Version

    Directory of Open Access Journals (Sweden)

    Zagan Ionel

    2016-07-01

Full Text Available Predictable CPU architectures that run hard real-time tasks must execute them in isolation in order to provide a timing-analyzable execution for real-time systems. The major problems for real-time operating systems are caused by excessive jitter, introduced mainly through task switching, which can violate deadline requirements and, consequently, the predictability of hard real-time tasks. New requirements also arise for a real-time operating system used in mixed-criticality systems, when the execution of hard real-time applications requires timing predictability. The present article discusses several solutions to improve the performance of CPU architectures and overcome the overhead introduced by the operating system. This paper focuses on the innovative CPU implementation named nMPRA-MT, designed for small real-time applications. This implementation uses replication and remapping techniques for the program counter, general purpose registers, and pipeline registers, enabling multiple threads to share a single pipeline assembly line. In order to increase predictability, the proposed architecture partially removes hazard situations at the expense of larger execution latency per instruction.

  20. Numerical Study of Geometric Multigrid Methods on CPU--GPU Heterogeneous Computers

    CERN Document Server

    Feng, Chunsheng; Xu, Jinchao; Zhang, Chen-Song

    2012-01-01

    The geometric multigrid method (GMG) is one of the most efficient solving techniques for discrete algebraic systems arising from many types of partial differential equations. GMG utilizes a hierarchy of grids or discretizations and reduces the error at a number of frequencies simultaneously. Graphics processing units (GPUs) have recently burst onto the scientific computing scene as a technology that has yielded substantial performance and energy-efficiency improvements. A central challenge in implementing GMG on GPUs, though, is that computational work on coarse levels cannot fully utilize the capacity of a GPU. In this work, we perform numerical studies of GMG on CPU--GPU heterogeneous computers. Furthermore, we compare our implementation with an efficient CPU implementation of GMG and with the most popular fast Poisson solver, Fast Fourier Transform, in the cuFFT library developed by NVIDIA.

  1. cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on CPU+GPU.

    Science.gov (United States)

    Zhang, Jing; Wang, Hao; Feng, Wu-Chun

    2017-01-01

BLAST, short for Basic Local Alignment Search Tool, is a ubiquitous tool used in the life sciences for pairwise sequence search. However, with the advent of next-generation sequencing (NGS), whether at the outset or downstream from NGS, the exponential growth of sequence databases is outstripping our ability to analyze the data. While recent studies have utilized the graphics processing unit (GPU) to speed up the BLAST algorithm for searching protein sequences (i.e., BLASTP), these studies use coarse-grained parallelism, where one sequence alignment is mapped to only one thread. Such an approach does not efficiently utilize the capabilities of a GPU, particularly due to the irregularity of BLASTP in both execution paths and memory-access patterns. To address the above shortcomings, we present a fine-grained approach to parallelize BLASTP, where each individual phase of sequence search is mapped to many threads on a GPU. This approach, which we refer to as cuBLASTP, reorders data-access patterns and reduces divergent branches of the most time-consuming phases (i.e., hit detection and ungapped extension). In addition, cuBLASTP optimizes the remaining phases (i.e., gapped extension and alignment with trace back) on a multicore CPU and overlaps their execution with the phases running on the GPU.

  2. Interactive physically-based X-ray simulation: CPU or GPU?

    Science.gov (United States)

    Vidal, Franck P; John, Nigel W; Guillemot, Romain M

    2007-01-01

    Interventional Radiology (IR) procedures are minimally invasive, targeted treatments performed using imaging for guidance. Needle puncture using ultrasound, x-ray, or computed tomography (CT) images is a core task in the radiology curriculum, and we are currently developing a training simulator for this. One requirement is to include support for physically-based simulation of x-ray images from CT data sets. In this paper, we demonstrate how to exploit the capability of today's graphics cards to efficiently achieve this on the Graphics Processing Unit (GPU) and compare performance with an efficient software only implementation using the Central Processing Unit (CPU).

  3. GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores.

    Science.gov (United States)

    Chikkagoudar, Satish; Wang, Kai; Li, Mingyao

    2011-05-26

    Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
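
    The fragment-partitioning scheme described above can be illustrated with a short sketch (names, fragment size, and the use of multiprocessing are assumptions for illustration, and the actual interaction test is omitted; this is not the GENIE source).

```python
# Sketch: split SNPs into non-overlapping fragments and enumerate within-fragment
# and between-fragment SNP pairs, the units of work distributed across cores.
from itertools import combinations
from multiprocessing import Pool

def make_fragments(n_snps, fragment_size):
    """Split SNP indices 0..n_snps-1 into non-overlapping fragments."""
    return [list(range(i, min(i + fragment_size, n_snps)))
            for i in range(0, n_snps, fragment_size)]

def pairs_for_fragment(args):
    """All SNP pairs within one fragment plus its pairs against later fragments."""
    frag_id, fragments = args
    pairs = list(combinations(fragments[frag_id], 2))        # within-fragment pairs
    for other in fragments[frag_id + 1:]:                    # between-fragment pairs
        pairs.extend((a, b) for a in fragments[frag_id] for b in other)
    return pairs  # each pair would be scored by the interaction test in real code

if __name__ == "__main__":
    fragments = make_fragments(n_snps=1000, fragment_size=250)
    with Pool() as pool:  # one task per fragment, spread over multiple CPU cores
        all_pairs = pool.map(pairs_for_fragment,
                             [(i, fragments) for i in range(len(fragments))])
```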

  4. An Improved Round Robin Scheduling Algorithm for CPU scheduling

    Directory of Open Access Journals (Sweden)

    Rakesh Kumar yadav

    2010-07-01

Full Text Available An operating system provides many functions, such as process management, memory management, file management, input/output management, networking, a protection system, and a command interpreter. Among these, process management is the most important, because at run time processes interact directly with the hardware. Improving CPU efficiency therefore requires managing all processes, and processes are managed using scheduling algorithms. Many algorithms are available for CPU scheduling, but each has its own deficiencies and limitations. In this paper, I propose a new approach to the round robin scheduling algorithm that helps to improve the efficiency of the CPU.
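
    For context, the sketch below shows the classic round robin policy that such proposals build on (a textbook illustration, not the paper's improved variant): every process receives the CPU for at most one time quantum per turn, and the average waiting time is reported.

```python
# Minimal round robin scheduling simulation (all processes arrive at time 0).
from collections import deque

def round_robin(burst_times, quantum):
    """burst_times: list of CPU bursts; returns the average waiting time."""
    n = len(burst_times)
    remaining = list(burst_times)
    waiting = [0] * n
    ready = deque(range(n))
    while ready:
        p = ready.popleft()
        run = min(quantum, remaining[p])
        for q in ready:            # everyone still queued waits while p runs
            waiting[q] += run
        remaining[p] -= run
        if remaining[p] > 0:
            ready.append(p)        # not finished: back to the tail of the queue
    return sum(waiting) / n

# Hypothetical example: three processes with bursts of 10, 5 and 8 time units.
print(round_robin([10, 5, 8], quantum=2))
```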

  5. A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer.

    Science.gov (United States)

    Peng, Shaoliang; Yang, Shunyun; Su, Wenhe; Zhang, Xiaoyu; Zhang, Tenglilang; Liu, Weiguo; Zhao, Xingming

    2017-06-16

Molecular Dynamics (MD) is the simulation of the dynamic behavior of atoms and molecules. As the most popular software for molecular dynamics, GROMACS cannot work on large-scale data because of limited computing resources. In this paper, we propose a CPU and Intel® Xeon Phi Many Integrated Core (MIC) collaborated parallel framework to accelerate GROMACS using the offload mode on a MIC coprocessor, with which the performance of GROMACS is improved significantly, especially on the Tianhe-2 supercomputer. Furthermore, we optimize GROMACS so that it can run on both the CPU and MIC at the same time. In addition, we accelerate multi-node GROMACS so that it can be used in practice. Benchmarking on real data, our accelerated GROMACS performs very well and reduces computation time significantly. Source code: https://github.com/tianhe2/gromacs-mic.

  6. Multi-core CPU or GPU-accelerated Multiscale Modeling for Biomolecular Complexes.

    Science.gov (United States)

    Liao, Tao; Zhang, Yongjie; Kekenes-Huskey, Peter M; Cheng, Yuhui; Michailova, Anushka; McCulloch, Andrew D; Holst, Michael; McCammon, J Andrew

    2013-07-01

Multi-scale modeling plays an important role in understanding the structure and biological functionalities of large biomolecular complexes. In this paper, we present an efficient computational framework to construct multi-scale models from atomic resolution data in the Protein Data Bank (PDB), which is accelerated by multi-core CPU and programmable Graphics Processing Units (GPU). A multi-level summation of Gaussian kernel functions is employed to generate implicit models for biomolecules. The coefficients in the summation are designed as functions of the structure indices, which specify the structures at a certain level and enable a local resolution control on the biomolecular surface. A method called neighboring search is adopted to locate the grid points close to the expected biomolecular surface and reduce the number of grids to be analyzed. For a specific grid point, a KD-tree or bounding volume hierarchy is applied to search for the atoms contributing to its density computation, and faraway atoms are ignored due to the decay of Gaussian kernel functions. In addition to density map construction, three modes are also employed and compared during mesh generation and quality improvement to generate high quality tetrahedral meshes: CPU sequential, multi-core CPU parallel, and GPU parallel. We have applied our algorithm to several large proteins and obtained good results.
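
    The core of the density-map construction can be sketched as follows (an illustrative Python version under assumed parameters such as sigma and cutoff, not the paper's code): Gaussian kernels centred on atom positions are summed onto grid points, and a KD-tree query restricts each sum to nearby atoms, mirroring the cutoff justified by the kernel decay.

```python
# Sketch: Gaussian-kernel density map on a grid with KD-tree neighbor search.
import numpy as np
from scipy.spatial import cKDTree

def density_map(atom_xyz, grid_points, sigma=1.5, cutoff=4.5):
    """atom_xyz: (n_atoms, 3); grid_points: (n_grid, 3); returns (n_grid,) densities."""
    tree = cKDTree(atom_xyz)
    density = np.zeros(len(grid_points))
    for g, point in enumerate(grid_points):
        # atoms beyond the cutoff are ignored because the Gaussian has decayed
        for a in tree.query_ball_point(point, cutoff):
            r2 = np.sum((atom_xyz[a] - point) ** 2)
            density[g] += np.exp(-r2 / (2.0 * sigma ** 2))
    return density

# Hypothetical usage: 100 random "atoms" sampled onto a small 10x10x10 grid.
atoms = np.random.rand(100, 3) * 10.0
axes = np.linspace(0.0, 10.0, 10)
grid = np.array(np.meshgrid(axes, axes, axes)).reshape(3, -1).T
rho = density_map(atoms, grid)
```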

  7. Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications

    Science.gov (United States)

    Francés, J.; Otero, B.; Bleda, S.; Gallego, S.; Neipp, C.; Márquez, A.; Beléndez, A.

    2015-06-01

The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to the study of the propagation of longitudinal and transversal waves in stratified media. The potential of the scheme and the relevance of each acceleration strategy for massive FDTD computations are demonstrated in this work. In this paper, we propose two new specific implementations of the two-dimensional FDTD scheme using multi-CPU and multi-GPU, respectively. In the first implementation, an open source message passing interface (OMPI) has been included in order to massively exploit the resources of a biprocessor station with two Intel Xeon processors. Moreover, in the CPU code version, the streaming SIMD extensions (SSE) and the advanced vector extensions (AVX) have been included, together with shared-memory approaches that take advantage of multi-core platforms. On the other hand, the second implementation, called the multi-GPU code version, is based on peer-to-peer communications available in CUDA on two GPUs (NVIDIA GTX 670). Subsequently, this paper presents an accurate analysis of the influence of the different code versions, including shared-memory approaches, vector instructions and multi-processors (both CPU and GPU), and compares them in order to delimit the degree of improvement obtained with distributed solutions based on multi-CPU and multi-GPU. The performance of both approaches was analysed, and it was demonstrated that adding shared-memory schemes to CPU computing substantially improves the performance of vector instructions, enlarging the simulation sizes that use the CPU cache memory efficiently. In this case GPU computing is roughly twice as fast as the finely tuned CPU version for both one and two nodes. However, for massive computations, explicit vector instructions are not worthwhile, since memory bandwidth is the limiting factor and the performance tends to be the same as that of the sequential version.

  8. Temperature of the Central Processing Unit

    Directory of Open Access Journals (Sweden)

    Ivan Lavrov

    2016-10-01

Full Text Available Heat is inevitably generated in semiconductors during operation. Cooling in a computer, and in its main part, the Central Processing Unit (CPU), is crucial for proper functioning without overheating, malfunction, and damage. In order to estimate the temperature as a function of time, it is important to solve the differential equations describing the heat flow and to understand how it depends on the physical properties of the system. This project aims to answer these questions by considering a simplified model of the CPU + heat sink. A similarity with an electrical circuit and certain methods from electrical circuit analysis are discussed.
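
    The electrical analogy referred to above is, in its simplest lumped form, an RC circuit: the CPU plus heat sink is a thermal capacitance coupled to ambient through a thermal resistance. The sketch below integrates that model with forward Euler; the parameter values are illustrative guesses, not taken from the paper.

```python
# Lumped RC thermal model: dT/dt = (P - (T - T_ambient)/R_th) / C_th.

def cpu_temperature(power_w=65.0, r_th=0.5, c_th=50.0, t_ambient=25.0,
                    t_end=600.0, dt=0.1):
    """Forward-Euler integration; returns (times, temperatures)."""
    temps, times = [t_ambient], [0.0]
    t = t_ambient
    steps = int(t_end / dt)
    for k in range(1, steps + 1):
        t += dt * (power_w - (t - t_ambient) / r_th) / c_th
        times.append(k * dt)
        temps.append(t)
    return times, temps

times, temps = cpu_temperature()
# Steady state should approach T_ambient + P * R_th = 25 + 65 * 0.5 = 57.5 degC.
print(f"temperature after 10 minutes: {temps[-1]:.1f} degC")
```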

  9. Use of general purpose graphics processing units with MODFLOW.

    Science.gov (United States)

    Hughes, Joseph D; White, Jeremy T

    2013-01-01

To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and the GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized. Published 2013. This article is a U.S. Government work and is in the public domain in the USA.
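
    As a point of reference for the solver components named above, the sketch below runs a Jacobi-preconditioned conjugate gradient solve on a compressed sparse row (CSR) matrix with SciPy on the CPU. The test matrix and tolerances are illustrative and unrelated to MODFLOW itself.

```python
# Jacobi-preconditioned CG on a CSR matrix (simplest of the preconditioners listed).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator

# Hypothetical system: 1D Poisson matrix (tridiagonal, SPD) in CSR format.
n = 1000
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Jacobi preconditioner: approximate A^-1 by the inverse of its diagonal.
inv_diag = 1.0 / A.diagonal()
M = LinearOperator((n, n), matvec=lambda r: inv_diag * r)

x, info = cg(A, b, M=M)   # info == 0 means the iteration converged
print("converged:", info == 0)
```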

  10. MOIL-opt: Energy-Conserving Molecular Dynamics on a GPU/CPU system.

    Science.gov (United States)

    Ruymgaart, A Peter; Cardenas, Alfredo E; Elber, Ron

    2011-08-26

We report an optimized version of the molecular dynamics program MOIL that runs on a shared memory system with OpenMP and exploits the power of a Graphics Processing Unit (GPU). The model is of a heterogeneous computing system on a single node with several cores sharing the same memory and a GPU. This is a typical laboratory tool, which provides excellent performance at minimal cost. Besides performance, emphasis is placed on the accuracy and stability of the algorithm, probed by energy conservation for explicit-solvent, atomically detailed models. Especially for long simulations, energy conservation is critical due to the phenomenon known as "energy drift", in which energy errors accumulate linearly as a function of simulation time. To achieve long-time dynamics with acceptable accuracy the drift must be particularly small. We identify several means of controlling long-time numerical accuracy while maintaining excellent speedup. To maintain a high level of energy conservation, SHAKE and the Ewald reciprocal summation are run in double precision. Double-precision summation of real-space non-bonded interactions improves energy conservation. In our best option, using a 1 fs time step while constraining the distances of all bonds, the energy drift is undetectable in a 10 ns simulation of solvated DHFR (dihydrofolate reductase). Faster options, shaking only bonds with hydrogen atoms, are also very well behaved and have drifts of less than 1 kcal/mol per nanosecond for the same system. CPU/GPU implementations require changes in programming models. We consider the use of a list of neighbors and quadratic versus linear interpolation in lookup tables of different sizes. Quadratic interpolation with a smaller number of grid points is faster than linear lookup tables (with finer representation) without loss of accuracy. Atomic neighbor lists were found to be the most efficient. Typical speedups are about a factor of 10 compared to a single-core single-precision code.
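
    The lookup-table point can be illustrated with a small experiment (an illustrative sketch, not the MOIL source; the potential, table size, and test range are assumptions): quadratic interpolation of a coarsely tabulated pairwise potential is compared against linear interpolation, showing that the three-point scheme tolerates a coarser table for comparable accuracy.

```python
# Linear vs. quadratic interpolation of a tabulated pairwise potential.
import numpy as np

def lj(r):                       # illustrative Lennard-Jones potential (sigma = eps = 1)
    return 4.0 * (r**-12 - r**-6)

r_grid = np.linspace(0.9, 3.0, 64)        # deliberately coarse lookup table
v_grid = lj(r_grid)

def linear_lookup(r):
    return np.interp(r, r_grid, v_grid)

def quadratic_lookup(r):
    """Three-point Lagrange interpolation around the nearest table entries."""
    i = np.clip(np.searchsorted(r_grid, r) - 1, 1, len(r_grid) - 2)
    x0, x1, x2 = r_grid[i - 1], r_grid[i], r_grid[i + 1]
    y0, y1, y2 = v_grid[i - 1], v_grid[i], v_grid[i + 1]
    return (y0 * (r - x1) * (r - x2) / ((x0 - x1) * (x0 - x2)) +
            y1 * (r - x0) * (r - x2) / ((x1 - x0) * (x1 - x2)) +
            y2 * (r - x0) * (r - x1) / ((x2 - x0) * (x2 - x1)))

r_test = np.random.uniform(1.0, 2.9, 10000)
print("max error, linear   :", np.max(np.abs(linear_lookup(r_test) - lj(r_test))))
print("max error, quadratic:", np.max(np.abs(quadratic_lookup(r_test) - lj(r_test))))
```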

  11. Promise of a Low Power Mobile CPU based Embedded System in Artificial Leg Control

    Science.gov (United States)

    Hernandez, Robert; Zhang, Fan; Zhang, Xiaorong; Huang, He; Yang, Qing

    2013-01-01

    This paper presents the design and implementation of a low power embedded system using mobile processor technology (Intel Atom™ Z530 Processor) specifically tailored for a neural-machine interface (NMI) for artificial limbs. This embedded system effectively performs our previously developed NMI algorithm based on neuromuscular-mechanical fusion and phase-dependent pattern classification. The analysis shows that NMI embedded system can meet real-time constraints with high accuracies for recognizing the user's locomotion mode. Our implementation utilizes the mobile processor efficiently to allow a power consumption of 2.2 watts and low CPU utilization (less than 4.3%) while executing the complex NMI algorithm. Our experiments have shown that the highly optimized C program implementation on the embedded system has superb advantages over existing PC implementations on MATLAB. The study results suggest that mobile-CPU-based embedded system is promising for implementing advanced control for powered lower limb prostheses. PMID:23367113

  12. Promise of a low power mobile CPU based embedded system in artificial leg control.

    Science.gov (United States)

    Hernandez, Robert; Zhang, Fan; Zhang, Xiaorong; Huang, He; Yang, Qing

    2012-01-01

    This paper presents the design and implementation of a low power embedded system using mobile processor technology (Intel Atom™ Z530 Processor) specifically tailored for a neural-machine interface (NMI) for artificial limbs. This embedded system effectively performs our previously developed NMI algorithm based on neuromuscular-mechanical fusion and phase-dependent pattern classification. The analysis shows that NMI embedded system can meet real-time constraints with high accuracies for recognizing the user's locomotion mode. Our implementation utilizes the mobile processor efficiently to allow a power consumption of 2.2 watts and low CPU utilization (less than 4.3%) while executing the complex NMI algorithm. Our experiments have shown that the highly optimized C program implementation on the embedded system has superb advantages over existing PC implementations on MATLAB. The study results suggest that mobile-CPU-based embedded system is promising for implementing advanced control for powered lower limb prostheses.

  13. Pipelined CPU Design with FPGA in Teaching Computer Architecture

    Science.gov (United States)

    Lee, Jong Hyuk; Lee, Seung Eun; Yu, Heon Chang; Suh, Taeweon

    2012-01-01

    This paper presents a pipelined CPU design project with a field programmable gate array (FPGA) system in a computer architecture course. The class project is a five-stage pipelined 32-bit MIPS design with experiments on the Altera DE2 board. For proper scheduling, milestones were set every one or two weeks to help students complete the project on…

  14. The Effect of NUMA Tunings on CPU Performance

    Science.gov (United States)

    Hollowell, Christopher; Caramarcu, Costin; Strecker-Kellogg, William; Wong, Antonio; Zaytsev, Alexandr

    2015-12-01

Non-Uniform Memory Access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems where each processor is directly connected to separate memory. Indirect access to another CPU's (remote) RAM is still possible, but such requests are slower as they must also pass through that memory's controlling CPU. In concert with a NUMA-aware operating system, the NUMA hardware architecture can help eliminate the memory performance reductions generally seen in SMP systems when multiple processors simultaneously attempt to access memory. The x86 CPU architecture has supported NUMA for a number of years. Modern operating systems such as Linux support NUMA-aware scheduling, where the OS attempts to schedule a process to the CPU directly attached to the majority of its RAM. In Linux, it is possible to further manually tune the NUMA subsystem using the numactl utility. With the release of Red Hat Enterprise Linux (RHEL) 6.3, the numad daemon became available in this distribution. This daemon monitors a system's NUMA topology and utilization, and automatically makes adjustments to optimize locality. As the number of cores in x86 servers continues to grow, efficient NUMA mappings of processes to CPUs/memory will become increasingly important. This paper gives a brief overview of NUMA, and discusses the effects of manual tunings and numad on the performance of the HEPSPEC06 benchmark and ATLAS software.

  15. CPU and Cache Efficient Management of Memory-Resident Databases

    NARCIS (Netherlands)

    H. Pirk (Holger); F. Funke; M. Grund; T. Neumann (Thomas); U. Leser; S. Manegold (Stefan); A. Kemper; M.L. Kersten (Martin)

    2013-01-01

Memory-Resident Database Management Systems (MRDBMS) have to be optimized for two resources: CPU cycles and memory bandwidth. To optimize for bandwidth in mixed OLTP/OLAP scenarios, the hybrid or Partially Decomposed Storage Model (PDSM) has been proposed. However, in current implementat

  16. Pipelined CPU Design with FPGA in Teaching Computer Architecture

    Science.gov (United States)

    Lee, Jong Hyuk; Lee, Seung Eun; Yu, Heon Chang; Suh, Taeweon

    2012-01-01

    This paper presents a pipelined CPU design project with a field programmable gate array (FPGA) system in a computer architecture course. The class project is a five-stage pipelined 32-bit MIPS design with experiments on the Altera DE2 board. For proper scheduling, milestones were set every one or two weeks to help students complete the project on…

  17. CPU and cache efficient management of memory-resident databases

    NARCIS (Netherlands)

    Pirk, H.; Funke, F.; Grund, M.; Neumann, T.; Leser, U.; Manegold, S.; Kemper, A.; Kersten, M.L.

    2013-01-01

    Memory-Resident Database Management Systems (MRDBMS) have to be optimized for two resources: CPU cycles and memory bandwidth. To optimize for bandwidth in mixed OLTP/OLAP scenarios, the hybrid or Partially Decomposed Storage Model (PDSM) has been proposed. However, in current implementations, bandwi

  18. Mean shifts, unit roots and forecasting seasonal time series

    OpenAIRE

    Franses, Philip Hans; Paap, Richard; Hoek, Henk

    1997-01-01

Examples of descriptive models for changing seasonal patterns in economic time series are autoregressive models with seasonal unit roots or with deterministic seasonal mean shifts. In this paper we show through a forecasting comparison for three macroeconomic time series (for which tests indicate the presence of seasonal unit roots) that allowing for possible seasonal mean shifts can improve forecast performance. Next, by means of simulation we demonstrate the impact of imposing a...

  19. On the cost of approximating and recognizing a noise perturbed straight line or a quadratic curve segment in the plane. [central processing units

    Science.gov (United States)

    Cooper, D. B.; Yalabik, N.

    1975-01-01

    Approximation of noisy data in the plane by straight lines or elliptic or single-branch hyperbolic curve segments arises in pattern recognition, data compaction, and other problems. The efficient search for and approximation of data by such curves were examined. Recursive least-squares linear curve-fitting was used, and ellipses and hyperbolas are parameterized as quadratic functions in x and y. The error minimized by the algorithm is interpreted, and central processing unit (CPU) times for estimating parameters for fitting straight lines and quadratic curves were determined and compared. CPU time for data search was also determined for the case of straight line fitting. Quadratic curve fitting is shown to require about six times as much CPU time as does straight line fitting, and curves relating CPU time and fitting error were determined for straight line fitting. Results are derived on early sequential determination of whether or not the underlying curve is a straight line.
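
    The recursive least-squares idea referred to above can be sketched in a few lines (a generic textbook RLS update for a straight line, not the 1975 implementation): the estimate of slope and intercept is refined one noisy point at a time, so the cost per point is constant.

```python
# Recursive least-squares fit of y = a*x + b, one observation at a time.
import numpy as np

def rls_line_fit(xs, ys):
    """Sequentially fit y = a*x + b; returns the estimate [a, b]."""
    theta = np.zeros(2)                      # parameter estimate [a, b]
    P = np.eye(2) * 1e6                      # large initial covariance (weak prior)
    for x, y in zip(xs, ys):
        phi = np.array([x, 1.0])             # regressor for this observation
        k = P @ phi / (1.0 + phi @ P @ phi)  # gain vector
        theta = theta + k * (y - phi @ theta)
        P = P - np.outer(k, phi @ P)         # covariance update
    return theta

# Hypothetical noisy data from the line y = 2x + 1.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
print(rls_line_fit(x, y))   # should be close to [2.0, 1.0]
```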

  20. Research on the Prediction Model of CPU Utilization Based on ARIMA-BP Neural Network

    Directory of Open Access Journals (Sweden)

    Wang Jina

    2016-01-01

Full Text Available Dynamic deployment of virtual machines is one of the current research focuses in cloud computing. Traditional methods mainly act after the service performance has already degraded, and therefore lag behind. To solve this problem, a new prediction model of CPU utilization is constructed in this paper. The model provides a reference for the dynamic VM deployment process, so that deployment can be completed before the service performance degrades. In this way it not only ensures the quality of services but also improves server performance and resource utilization. The new prediction method of CPU utilization based on the ARIMA-BP neural network consists of four parts: preprocessing the collected data, building the ARIMA-BP neural network prediction model, correcting the nonlinear residuals of the time series with the BP prediction algorithm, and obtaining the prediction results by analyzing the above data comprehensively.
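
    The hybrid scheme can be sketched generically as follows (illustrative only; the synthetic series, ARIMA order, lag window, and network size are assumptions, not the paper's exact model): an ARIMA model captures the linear part of the CPU-utilization series, a small BP (multi-layer perceptron) network is trained on the ARIMA residuals, and the two forecasts are added.

```python
# ARIMA forecast plus BP-network correction of the nonlinear residuals.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
cpu_util = 50 + 10 * np.sin(np.arange(300) / 10.0) + rng.normal(scale=3, size=300)

# 1) Linear part: ARIMA captures the autocorrelated component.
arima = ARIMA(cpu_util, order=(2, 0, 1)).fit()
residuals = arima.resid

# 2) Nonlinear part: BP network learns structure left in the residuals
#    from a window of lagged residuals.
lags = 5
X = np.array([residuals[i - lags:i] for i in range(lags, len(residuals))])
y = residuals[lags:]
bp = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0).fit(X, y)

# 3) Combined one-step-ahead forecast = ARIMA forecast + predicted residual.
next_linear = arima.forecast(steps=1)[0]
next_residual = bp.predict(residuals[-lags:].reshape(1, -1))[0]
print("predicted CPU utilization:", next_linear + next_residual)
```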

  1. Time Zones of the United States - Direct Download

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This map layer portrays the time zones of the United States, Puerto Rico, and the U.S. Virgin Islands. Also included are the Greenwich Mean Time offset, and areas in...

  2. Mean shifts, unit roots and forecasting seasonal time series

    NARCIS (Netherlands)

    Ph.H.B.F. Franses (Philip Hans); R. Paap (Richard); H. Hoek (Henk)

    1997-01-01

Examples of descriptive models for changing seasonal patterns in economic time series are autoregressive models with seasonal unit roots or with deterministic seasonal mean shifts. In this paper we show through a forecasting comparison for three macroeconomic time series (for which tests

  3. Mean shifts, unit roots and forecasting seasonal time series

    NARCIS (Netherlands)

    Ph.H.B.F. Franses (Philip Hans); R. Paap (Richard); H. Hoek (Henk)

    1997-01-01

Examples of descriptive models for changing seasonal patterns in economic time series are autoregressive models with seasonal unit roots or with deterministic seasonal mean shifts. In this paper we show through a forecasting comparison for three macroeconomic time series (for which tests

  4. OSPRay - A CPU Ray Tracing Framework for Scientific Visualization.

    Science.gov (United States)

    Wald, I; Johnson, G P; Amstutz, J; Brownlee, C; Knoll, A; Jeffers, J; Gunther, J; Navratil, P

    2017-01-01

    Scientific data is continually increasing in complexity, variety and size, making efficient visualization and specifically rendering an ongoing challenge. Traditional rasterization-based visualization approaches encounter performance and quality limitations, particularly in HPC environments without dedicated rendering hardware. In this paper, we present OSPRay, a turn-key CPU ray tracing framework oriented towards production-use scientific visualization which can utilize varying SIMD widths and multiple device backends found across diverse HPC resources. This framework provides a high-quality, efficient CPU-based solution for typical visualization workloads, which has already been integrated into several prevalent visualization packages. We show that this system delivers the performance, high-level API simplicity, and modular device support needed to provide a compelling new rendering framework for implementing efficient scientific visualization workflows.

  5. Development of a GPU and multi-CPU accelerated non-isothermal, multiphase, incompressible Navier-Stokes solver with phase-change

    Science.gov (United States)

    Forster, Christopher J.; Glezer, Ari; Smith, Marc K.

    2012-11-01

    Accurate 3D boiling simulations often use excessive computational resources - in many cases taking several weeks or months to solve. To alleviate this problem, a parallelized, multiphase fluid solver using a particle level-set (PLS) method was implemented. The PLS method offers increased accuracy in interface location tracking, the ability to capture sharp interfacial features with minimal numerical diffusion, and significantly improved mass conservation. The independent nature of the particles is amenable to parallelization using graphics processing unit (GPU) and multi-CPU implementations, since each particle can be updated simultaneously. The present work will explore the speedup provided by GPU and multi-CPU implementations and determine the effectiveness of PLS for accurately capturing sharp interfacial features. The numerical model will be validated by comparison to experimental data for vibration-induced droplet atomization. Further development will add the physics of boiling in the presence of acoustic fields. It is hoped that the resultant boiling simulations will be sufficiently improved to allow for optimization studies of various boiling configurations to be performed in a timely manner. Supported by ONR.

  6. Parallel Implementation of AVS Video Encoder Based on CPU+GPU

    Institute of Scientific and Technical Information of China (English)

    邹彬彬; 梁凡

    2013-01-01

The video standard of the audio video coding standard (AVS) has high compression performance and good network adaptability, which makes it suitable for widespread digital video applications. Accelerating AVS encoding so that the encoder runs in real time is therefore an important issue. A parallel AVS video encoding method based on the CPU and GPU is proposed, in which motion estimation, integer transform, and quantization are computed on the GPU. Experimental results show that the proposed method achieves real-time encoding for 1920×1080 video sequences.

  7. 47 CFR 15.32 - Test procedures for CPU boards and computer power supplies.

    Science.gov (United States)

    2010-10-01

Title 47 (Telecommunication), Radio Frequency Devices, General, § 15.32 Test procedures for CPU boards and computer power supplies: Power supplies and CPU boards used with personal computers and for which separate authorizations are required to be...

  8. Performance of the OVERFLOW-MLP and LAURA-MLP CFD Codes on the NASA Ames 512 CPU Origin System

    Science.gov (United States)

    Taft, James R.

    2000-01-01

aircraft are routinely undertaken. Typical large problems might require hundreds of Cray C90 CPU hours to complete. The dramatic performance gains with the 256-CPU Steger system are exciting. Obtaining results in hours instead of months is revolutionizing the way in which aircraft manufacturers are looking at future aircraft simulation work. Figure 2 below is a current state-of-the-art plot of OVERFLOW-MLP performance on the 512-CPU Lomax system. As can be seen, the chart indicates that OVERFLOW-MLP continues to scale linearly with CPU count up to 512 CPUs on a large 35-million-point full-aircraft RANS simulation. At this point performance is such that a fully converged simulation of 2500 time steps is completed in less than 2 hours of elapsed time. Further work over the next few weeks will improve the performance of this code even further. The LAURA code has been converted to the MLP format as well. This code is currently being optimized for the 512-CPU system. Performance statistics indicate that the goal of 100 GFLOP/s will be achieved by year's end. This amounts to 20x the 16-CPU C90 result and strongly demonstrates the viability of the new parallel systems in rapidly solving very large simulations in a production environment.

  9. Survey of CPU/GPU Synergetic Parallel Computing%CPU/GPU协同并行计算研究综述

    Institute of Scientific and Technical Information of China (English)

    卢风顺; 宋君强; 银福康; 张理论

    2011-01-01

With the features of tremendous computing capability, high performance/price ratio, and low power, heterogeneous hybrid CPU/GPU parallel systems have become new high performance computing platforms. However, the architectural complexity of the hybrid system poses many challenges for the design of parallel algorithms on this infrastructure. CPU/GPU synergetic parallel computing is an emerging research field and an open topic. According to the scale of the computational resources involved in the synergetic parallel computing, we classify recent research into three categories, detail the motivations, methodologies, and applications of several projects, and discuss some ongoing research issues in this direction, in the hope that domain experts can gain useful information about synergetic parallel computing from this work.

  10. Performance analysis of the FDTD method applied to holographic volume gratings: Multi-core CPU versus GPU computing

    Science.gov (United States)

    Francés, J.; Bleda, S.; Neipp, C.; Márquez, A.; Pascual, I.; Beléndez, A.

    2013-03-01

The finite-difference time-domain method (FDTD) allows electromagnetic field distribution analysis as a function of time and space. The method is applied to analyze holographic volume gratings (HVGs) for the near-field distribution at optical wavelengths. Usually, this application requires the simulation of wide areas, which implies more memory and processing time. In this work, we propose a specific implementation of the FDTD method including several add-ons for a precise simulation of optical diffractive elements. Values in the near-field region are computed considering the illumination of the grating by means of a plane wave for different angles of incidence and including absorbing boundaries as well. We compare the results obtained by FDTD with those obtained using a matrix method (MM) applied to diffraction gratings. In addition, we have developed two optimized versions of the algorithm, for both CPU and GPU, in order to analyze the improvement of using the new NVIDIA Fermi GPU architecture versus a highly tuned multi-core CPU as a function of the simulation size. In particular, the optimized CPU implementation takes advantage of the arithmetic and data transfer streaming SIMD (single instruction multiple data) extensions (SSE) included explicitly in the code and also of multi-threading by means of OpenMP directives. A good agreement between the results obtained using both FDTD and MM methods is obtained, thus validating our methodology. Moreover, the performance of the GPU is compared to the SSE+OpenMP CPU implementation, and it is quantitatively determined that a highly optimized CPU program can be competitive for a wider range of simulation sizes, whereas GPU computing becomes more powerful for large-scale simulations.

  11. Fast computation of myelin maps from MRI T₂ relaxation data using multicore CPU and graphics card parallelization.

    Science.gov (United States)

    Yoo, Youngjin; Prasloski, Thomas; Vavasour, Irene; MacKay, Alexander; Traboulsee, Anthony L; Li, David K B; Tam, Roger C

    2015-03-01

    To develop a fast algorithm for computing myelin maps from multiecho T2 relaxation data using parallel computation with multicore CPUs and graphics processing units (GPUs). Using an existing MATLAB (MathWorks, Natick, MA) implementation with basic (nonalgorithm-specific) parallelism as a guide, we developed a new version to perform the same computations but using C++ to optimize the hybrid utilization of multicore CPUs and GPUs, based on experimentation to determine which algorithmic components would benefit from CPU versus GPU parallelization. Using 32-echo T2 data of dimensions 256 × 256 × 7 from 17 multiple sclerosis patients and 18 healthy subjects, we compared the two methods in terms of speed, myelin values, and the ability to distinguish between the two patient groups using Student's t-tests. The new method was faster than the MATLAB implementation by 4.13 times for computing a single map and 14.36 times for batch-processing 10 scans. The two methods produced very similar myelin values, with small and explainable differences that did not impact the ability to distinguish the two patient groups. The proposed hybrid multicore approach represents a more efficient alternative to MATLAB, especially for large-scale batch processing. © 2014 Wiley Periodicals, Inc.

  12. CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment.

    Science.gov (United States)

    Chen, Xi; Wang, Chen; Tang, Shanjiang; Yu, Ce; Zou, Quan

    2017-06-24

    The multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time at an acceptable level. Although there is a lot of work on MSA problems, the approaches are either insufficient or contain implicit assumptions that limit the generality of usage. First, the information about users' sequences, including the sizes of the datasets and the lengths of the sequences, can take arbitrary values and is generally unknown before submission, which is unfortunately ignored by previous work. Second, the center star strategy is suited for aligning similar sequences, but its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given the heterogeneous CPU/GPU platform, prior studies consider MSA parallelization on GPU devices only, leaving the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by enabling the workload computation on both CPU and GPU simultaneously. This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn^2) to O(mn). The experimental results show that CMSA achieves up to an 11× speedup and outperforms the state-of-the-art software. CMSA focuses on the multiple similar RNA/DNA sequence alignment and proposes a novel bitmap based algorithm to improve the center star strategy. We can conclude that harvesting the high performance of modern GPU is a promising approach to
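
    For readers unfamiliar with the center star strategy, the sketch below shows a generic way to pick the center sequence from shared k-mer counts, which is roughly linear in the total sequence length; it is only an illustration of the idea and is not the bitmap-based algorithm CMSA actually proposes:

    ```cpp
    // Generic center-sequence selection sketch: score each sequence by how many
    // of its k-mers also occur in the other sequences, then pick the best one.
    #include <string>
    #include <unordered_map>
    #include <vector>
    #include <cstddef>

    std::size_t pick_center(const std::vector<std::string>& seqs, std::size_t k = 8)
    {
        std::unordered_map<std::string, long long> kmer_count;   // k-mer -> occurrences over all sequences
        for (const auto& s : seqs)
            for (std::size_t i = 0; i + k <= s.size(); ++i)
                ++kmer_count[s.substr(i, k)];

        std::size_t best = 0;
        long long best_score = -1;
        for (std::size_t idx = 0; idx < seqs.size(); ++idx) {
            long long score = 0;                                 // shared k-mer occurrences with the rest
            const auto& s = seqs[idx];
            for (std::size_t i = 0; i + k <= s.size(); ++i)
                score += kmer_count[s.substr(i, k)] - 1;
            if (score > best_score) { best_score = score; best = idx; }
        }
        return best;  // index of the sequence most similar to the others
    }
    ```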

  13. Changing Times: The Use of Reduced Work Time Options in the United States.

    Science.gov (United States)

    Olmsted, Barney

    1983-01-01

    This article addresses the increase in voluntary reduced work time arrangements that have developed in the United States in response to growing interest in alternatives to the standardized approach to scheduling. Permanent part-time employment, job sharing, and voluntary reduced work time plans are defined, described and, to a limited extent,…

  14. Self-Reconfiguration of CPU- Enhancement in the Performance

    Directory of Open Access Journals (Sweden)

    Prashant Singh Yadav

    2012-03-01

    This article presents the initial steps toward a distributed system that can optimize its performance by learning to reconfigure CPU and memory resources in reaction to the current workload. We present a learning framework that uses standard system-monitoring tools to identify preferable configurations and their quantitative performance effects. The framework requires no instrumentation of the middleware or of the operating system. Using results from an implementation of the TPC Benchmark™ W (TPC-W) online transaction-processing benchmark, we demonstrate a significant performance benefit to reconfiguration in response to workload changes.

  15. Parametric time delay modeling for floating point units

    Science.gov (United States)

    Fahmy, Hossam A. H.; Liddicoat, Albert A.; Flynn, Michael J.

    2002-12-01

    A parametric time delay model to compare floating point unit implementations is proposed. This model is used to compare a previously proposed floating point adder using a redundant number representation with other high-performance implementations. The operand width, the fan-in of the logic gates and the radix of the redundant format are used as parameters to the model. The comparison is done over a range of operand widths, fan-in and radices to show the merits of each implementation.

  16. Joint Optimized CPU and Networking Control Scheme for Improved Energy Efficiency in Video Streaming on Mobile Devices

    Directory of Open Access Journals (Sweden)

    Sung-Woong Jo

    2017-01-01

    Video streaming is one of the most popular applications for mobile users. However, mobile video streaming services consume a lot of energy, resulting in a reduced battery life. This is a critical problem that results in a degraded user quality of experience (QoE). Therefore, in this paper, a joint optimization scheme that controls both the central processing unit (CPU) and the wireless networking of the video streaming process for improved energy efficiency on mobile devices is proposed. For this purpose, the energy consumption of the network interface and CPU is analyzed, and based on the energy consumption profile a joint optimization problem is formulated to maximize the energy efficiency of the mobile device. The proposed algorithm adaptively adjusts the number of chunks to be downloaded and decoded in each packet. Simulation results show that the proposed algorithm can effectively improve the energy efficiency when compared with the existing algorithms.

  17. Molecular Dynamics Simulation of Macromolecules Using Graphics Processing Unit

    CERN Document Server

    Xu, Ji; Ge, Wei; Yu, Xiang; Yang, Xiaozhen; Li, Jinghai

    2010-01-01

    Molecular dynamics (MD) simulation is a powerful computational tool to study the behavior of macromolecular systems. But many simulations in this field are limited in spatial or temporal scale by the available computational resources. In recent years, the graphics processing unit (GPU) has provided unprecedented computational power for scientific applications. Many MD algorithms suit the multithreaded nature of the GPU. In this paper, MD algorithms for macromolecular systems that run entirely on the GPU are presented. Compared to MD simulation with the free software GROMACS on a single CPU core, our codes achieve about 10 times speed-up on a single GPU. For validation, we have performed MD simulations of polymer crystallization on the GPU, and the observed results agree perfectly with computations on the CPU. Therefore, our single-GPU codes already provide an inexpensive alternative for macromolecular simulations on traditional CPU clusters and they can also be used as a basis to develop parallel GPU programs to further spee...

  18. Polymer Field-Theory Simulations on Graphics Processing Units

    CERN Document Server

    Delaney, Kris T

    2012-01-01

    We report the first CUDA graphics-processing-unit (GPU) implementation of the polymer field-theoretic simulation framework for determining fully fluctuating expectation values of equilibrium properties for periodic and select aperiodic polymer systems. Our implementation is suitable both for self-consistent field theory (mean-field) solutions of the field equations, and for fully fluctuating simulations using the complex Langevin approach. Running on NVIDIA Tesla T20 series GPUs, we find double-precision speedups of up to 30x compared to single-core serial calculations on a recent reference CPU, while single-precision calculations proceed up to 60x faster than those on the single CPU core. Due to intensive communications overhead, an MPI implementation running on 64 CPU cores remains two times slower than a single GPU.

  19. Liquid Cooling System for CPU by Electroconjugate Fluid

    Directory of Open Access Journals (Sweden)

    Yasuo Sakurai

    2014-06-01

    The power dissipated by the CPU of a personal computer has increased as performance has grown. Therefore, a liquid cooling system has been employed in some personal computers in order to improve their cooling performance. Electroconjugate fluid (ECF) is one of the functional fluids. ECF has a remarkable property: a strong jet flow is generated between electrodes when a high voltage is applied to the ECF through the electrodes. By using this strong jet flow, it appears possible to develop an ECF-pump with a simple structure, no sliding parts, no noise, and no vibration. Then, by using such an ECF-pump, a new ECF-based liquid cooling system could be realized. In this study, to realize this system, an ECF-pump is proposed and fabricated, and its basic characteristics are investigated experimentally. Next, using the ECF-pump, a model of an ECF-based liquid cooling system is manufactured and experiments are carried out to investigate the performance of this system. As a result, with this system the temperature of a 50 W heat source is kept at 60°C or less, which is within the usual operating temperature range of a CPU.

  20. Designing of Vague Logic Based 2-Layered Framework for CPU Scheduler

    Directory of Open Access Journals (Sweden)

    Supriya Raheja

    2016-01-01

    Fuzzy-based CPU schedulers have attracted great interest in operating systems because of their ability to handle the imprecise information associated with tasks. This paper introduces an extension of the fuzzy-based round robin scheduler to a Vague Logic Based Round Robin (VBRR) scheduler. The VBRR scheduler works on a 2-layered framework. At the first layer, the scheduler has a vague inference system which is able to handle the impreciseness of a task using vague logic. At the second layer, the Vague Logic Based Round Robin (VBRR) scheduling algorithm schedules the tasks. The VBRR scheduler has a learning capability based on which it intelligently adapts an optimum length for the time quantum. An optimum time quantum reduces the overhead on the scheduler by eliminating unnecessary context switches, which improves the overall performance of the system. The work is simulated using MATLAB and compared with the conventional round robin scheduler and two other fuzzy-based approaches to CPU scheduling. The simulation analysis and results prove the effectiveness and efficiency of the VBRR scheduler.
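
    The round robin skeleton with an adaptive time quantum can be sketched as below; here the quantum is simply the median of the remaining burst times, a common heuristic, and the vague-logic inference layer of the VBRR scheduler is deliberately not reproduced:

    ```cpp
    // Round robin with a quantum recomputed from the current ready queue.
    #include <algorithm>
    #include <deque>
    #include <iostream>
    #include <vector>

    struct Task { int id; int remaining; };

    int adaptive_quantum(const std::deque<Task>& ready)
    {
        std::vector<int> bursts;
        for (const auto& t : ready) bursts.push_back(t.remaining);
        std::sort(bursts.begin(), bursts.end());
        return std::max(1, bursts[bursts.size() / 2]);   // median burst as quantum
    }

    int main()
    {
        std::deque<Task> ready = {{1, 24}, {2, 3}, {3, 7}};
        int time = 0, context_switches = 0;
        while (!ready.empty()) {
            int q = adaptive_quantum(ready);             // recomputed each dispatch
            Task t = ready.front(); ready.pop_front();
            int run = std::min(q, t.remaining);
            time += run;
            t.remaining -= run;
            if (t.remaining > 0) { ready.push_back(t); ++context_switches; }
            else std::cout << "task " << t.id << " finished at t=" << time << "\n";
        }
        std::cout << "context switches: " << context_switches << "\n";
    }
    ```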

  1. Influence of the compiler on multi-CPU performance of WRFv3

    Directory of Open Access Journals (Sweden)

    T. Langkamp

    2011-07-01

    The Weather Research and Forecasting system version 3 (WRFv3) is an open source, state-of-the-art numerical regional climate model used in climate-related sciences. In recent years the model has been successfully optimized on a wide variety of clustered compute nodes connected with high speed interconnects. This is currently the most used hardware architecture for high-performance computing (Shainer et al., 2009). As such, understanding the influence of hardware like the CPU or its interconnects, and of the software, on WRF's performance is crucial for saving computing time. This is important because computing time in general is scarce, resource intensive, and hence very expensive.

    This paper evaluates the influence of different compilers on WRF's performance, which was found to differ by up to 26 %. The paper also evaluates the performance of different Message Passing Interface library versions, software which is needed for multi-CPU runs, and of different WRF versions. Both showed no significant influence on the performance for this test case on the High Performance Cluster (HPC) hardware used.

    Emphasis is also laid on the applied non-standard method of performance measuring, which was required because of performance fluctuations between identical runs on the HPC used. Those are caused by contention for network resources, a phenomenon examined for many HPCs (Wright et al., 2009).

  2. Influence of the compiler on multi-CPU performance of WRFv3

    Directory of Open Access Journals (Sweden)

    T. Langkamp

    2011-03-01

    The Weather Research and Forecasting system version 3 (WRFv3) is an open source, state-of-the-art numerical regional climate model used in climate-related sciences. Over the years the model has been successfully optimized on a wide variety of clustered compute nodes connected with high speed interconnects. This is currently the most used hardware architecture for high-performance computing. As such, understanding WRF's dependency on the various hardware elements like the CPU and its interconnects, and on the software, is crucial for saving computing time. This is important because computing time in general is scarce, resource intensive, and hence very expensive.

    This paper evaluates the influence of different compilers on WRF's performance, which was found to differ by up to 26%. The paper also evaluates the performance of different Message Passing Interface library versions, software which is needed for multi-CPU runs, and of different WRF versions. Both showed no significant influence on the performance for this test case on the High Performance Cluster (HPC) hardware used.

    Some emphasis is also laid on the applied non-standard method of performance measuring, which was required because of performance fluctuations between identical runs on the used HPC. Those are caused by contention for network resources, a phenomenon examined for many HPCs.

  3. Deferred High Level Trigger in LHCb: A Boost to CPU Resource Utilization

    CERN Document Server

    Frank, Markus; v.Herwijnen, E; Jost, B; Neufeld, N

    2014-01-01

    The LHCb experiment at the LHC accelerator at CERN collects collisions of particle bunches at 40 MHz. After a first level of hardware trigger with an output rate of 1 MHz, the physically interesting collisions are selected by running dedicated trigger algorithms in the High Level Trigger (HLT) computing farm. This farm consists of up to roughly 25000 CPU cores in roughly 1600 physical nodes, each equipped with at least 1 TB of local storage space. This work describes the architecture used to treble the available CPU power of the HLT farm, given that the LHC collider in previous years delivered stable physics beams about 30% of the time. The gain is achieved by splitting the event selection process into two stages: a first stage reduces the data taken during stable beams and buffers the preselected particle collisions locally; a second processing stage, running constantly at lower priority, then finalizes the event filtering process and benefits fully from the time when LHC does not deliver stable beams, e.g. while preparing a ne...

  4. Adaptive real-time methodology for optimizing energy-efficient computing

    Science.gov (United States)

    Hsu, Chung-Hsing; Feng, Wu-Chun

    2013-01-29

    Dynamic voltage and frequency scaling (DVFS) is an effective way to reduce energy and power consumption in microprocessor units. Current implementations of DVFS suffer from inaccurate modeling of power requirements and usage, and from inaccurate characterization of the relationships between the applicable variables. A system and method are proposed that adjust CPU frequency and voltage based on run-time calculations of the workload processing time, as well as a calculation of performance sensitivity with respect to CPU frequency. The system and method are processor independent, and can be applied either to an entire system as a unit, or individually to each process running on the system.
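
    A heavily simplified sketch of such a run-time governor is shown below: it scales the requested frequency with the observed utilization of the last control period. The utilization-only rule and the parameter names are illustrative assumptions; the actual method additionally weights the decision by a measured sensitivity of performance to frequency, which is not modeled here:

    ```cpp
    // Pick the next CPU frequency from the last interval's utilization. The
    // caller would apply the result through a platform-specific hook
    // (e.g. a cpufreq governor interface).
    #include <algorithm>

    struct Sample { double busy_time; double interval; };   // measured per control period

    double next_frequency(const Sample& s, double cur_mhz,
                          double min_mhz, double max_mhz,
                          double target_util = 0.8)
    {
        double util = s.busy_time / s.interval;              // observed utilization in [0,1]
        // Scale so that at the new frequency the CPU would sit near the target
        // utilization; clamp to the hardware's available frequency range.
        double wanted = cur_mhz * util / target_util;
        return std::clamp(wanted, min_mhz, max_mhz);
    }
    ```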

  5. Blockade of L-type calcium channel in myocardium and calcium-induced contractions of vascular smooth muscle by CPU 86017.

    Science.gov (United States)

    Dai, De-zai; Hu, Hui-juan; Zhao, Jing; Hao, Xue-mei; Yang, Dong-mei; Zhou, Pei-ai; Wu, Cai-hong

    2004-04-01

    To assess the blockade by CPU 86017 of the L-type calcium channels in the myocardium and of the Ca(2+)-related contractions of vascular smooth muscle. The whole-cell patch-clamp was applied to investigate the blocking effect of CPU 86017 on the L-type calcium current in isolated guinea pig myocytes, and contractions induced by KCl or phenylephrine (Phe) in isolated rat tail arteries were measured. Suppression of the L-type current of the isolated myocytes by CPU 86017 was moderate, in a time- and concentration-dependent manner and with no influence on the activation and inactivation curves. The IC(50) was 11.5 micromol/L. Suppressive effects of CPU 86017 on vaso-contractions induced by KCl 100 mmol/L or phenylephrine 1 micromol/L in KH solution (phase 1), in Ca(2+)-free KH solution (phase 2), and on addition of CaCl(2) into Ca(2+)-free KH solution (phase 3) were observed. The IC(50) to suppress vaso-contractions by calcium entry via the receptor-operated channel (ROC) and the voltage-dependent channel (VDC) was 0.324 micromol/L and 16.3 micromol/L, respectively. The relative potency of CPU 86017 to suppress vascular tone by Ca(2+) entry through ROC and VDC is 1/187 of prazosin and 1/37 of verapamil, respectively. The blocking effects of CPU 86017 on the L-type calcium channel of myocardium and vessel are moderate and non-selective. CPU 86017 is approximately 50 times more potent in inhibiting ROC than VDC.

  6. Blockade of L-type calcium channel in myocardium and calcium-induced contractions of vascular smooth muscle by CPU 86017

    Institute of Scientific and Technical Information of China (English)

    De-zai DAI; Hui-juan HU; Jing ZHAO; Xue-mei HAO; Dong-mei YANG; Pei-ai ZHOU; Cai-hong WU

    2004-01-01

    AIM: To assess the blockade by CPU 86017 of the L-type calcium channels in the myocardium and of the Ca2+-related contractions of vascular smooth muscle. METHODS: The whole-cell patch-clamp was applied to investigate the blocking effect of CPU 86017 on the L-type calcium current in isolated guinea pig myocytes, and contractions induced by KCl or phenylephrine (Phe) in isolated rat tail arteries were measured. RESULTS: Suppression of the L-type current of the isolated myocytes by CPU 86017 was moderate, in a time- and concentration-dependent manner and with no influence on the activation and inactivation curves. The IC50 was 11.5 μmol/L. Suppressive effects of CPU 86017 on vaso-contractions induced by KCl 100 mmol/L or phenylephrine 1 μmol/L in KH solution (phase 1), in Ca2+-free KH solution (phase 2), and on addition of CaCl2 into Ca2+-free KH solution (phase 3) were observed. The IC50 to suppress vaso-contractions by calcium entry via the receptor-operated channel (ROC) and the voltage-dependent channel (VDC) was 0.324 μmol/L and 16.3 μmol/L, respectively. The relative potency of CPU 86017 to suppress vascular tone by Ca2+ entry through ROC and VDC is 1/187 of prazosin and 1/37 of verapamil, respectively. CONCLUSION: The blocking effects of CPU 86017 on the L-type calcium channel of myocardium and vessel are moderate and non-selective. CPU 86017 is approximately 50 times more potent in inhibiting ROC than VDC.

  7. Providing Source Code Level Portability Between CPU and GPU with MapCG

    Institute of Scientific and Technical Information of China (English)

    Chun-Tao Hong; De-Hao Chen; Yu-Bei Chen; Wen-Guang Chen; Wei-Min Zheng; Hai-Bo Lin

    2012-01-01

    Graphics processing units (GPU) have taken an important role in the general purpose computing market in recent years. At present, the common approach to programming GPU units is to write GPU-specific code with low level GPU APIs such as CUDA. Although this approach can achieve good performance, it creates serious portability issues as programmers are required to write a specific version of the code for each potential target architecture. This results in high development and maintenance costs. We believe it is desirable to have a programming model which provides source code portability between CPUs and GPUs, as well as different GPUs. This would allow programmers to write one version of the code, which can be compiled and executed on either CPUs or GPUs efficiently without modification. In this paper, we propose MapCG, a MapReduce framework to provide source code level portability between CPUs and GPUs. In contrast to other approaches such as OpenCL, our framework, based on MapReduce, provides a high level programming model and makes programming much easier. We describe the design of MapCG, including the MapReduce-style high-level programming framework and the runtime system on the CPU and GPU. A prototype of the MapCG runtime, supporting multi-core CPUs and NVIDIA GPUs, was implemented. Our experimental results show that this implementation can execute the same source code efficiently on multi-core CPU platforms and GPUs, achieving an average speedup of 1.6-2.5x over previous implementations of MapReduce on eight commonly used applications.
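
    The programming model being argued for can be illustrated with a minimal MapReduce-style word count: the user writes map and reduce functions once, and the runtime decides where they execute. The skeleton below is a CPU-only illustration of that style, not MapCG's actual API:

    ```cpp
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>

    using KV = std::pair<std::string, int>;

    std::vector<KV> map_fn(const std::string& line)                   // user-supplied map
    {
        std::vector<KV> out;
        std::istringstream in(line);
        for (std::string w; in >> w; ) out.push_back({w, 1});
        return out;
    }

    int reduce_fn(const std::string&, const std::vector<int>& vals)   // user-supplied reduce
    {
        int sum = 0;
        for (int v : vals) sum += v;
        return sum;
    }

    int main()
    {
        std::vector<std::string> input = {"gpu cpu gpu", "cpu"};
        std::map<std::string, std::vector<int>> groups;               // shuffle/group phase
        for (const auto& line : input)
            for (const auto& kv : map_fn(line)) groups[kv.first].push_back(kv.second);
        for (const auto& g : groups)
            std::cout << g.first << ": " << reduce_fn(g.first, g.second) << "\n";
    }
    ```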

  8. Hybrid computing: CPU+GPU co-processing and its application to tomographic reconstruction

    Energy Technology Data Exchange (ETDEWEB)

    Agulleiro, J.I.; Vazquez, F.; Garzon, E.M. [Supercomputing and Algorithms Group, Associated Unit CSIC-UAL, University of Almeria, 04120 Almeria (Spain); Fernandez, J.J., E-mail: JJ.Fernandez@csic.es [National Centre for Biotechnology, National Research Council (CNB-CSIC), Campus UAM, C/Darwin 3, Cantoblanco, 28049 Madrid (Spain)

    2012-04-15

    Modern computers are equipped with powerful computing engines like multicore processors and GPUs. The 3DEM community has rapidly adapted to this scenario and many software packages now make use of high performance computing techniques to exploit these devices. However, the implementations thus far are purely focused on either GPUs or CPUs. This work presents a hybrid approach that collaboratively combines the GPUs and CPUs available in a computer and applies it to the problem of tomographic reconstruction. Proper orchestration of workload in such a heterogeneous system is an issue. Here we use an on-demand strategy whereby the computing devices request a new piece of work to do when idle. Our hybrid approach thus takes advantage of the whole computing power available in modern computers and further reduces the processing time. This CPU+GPU co-processing can be readily extended to other image processing tasks in 3DEM. -- Highlights: ► Hybrid computing allows full exploitation of the power (CPU+GPU) in a computer. ► Proper orchestration of workload is managed by an on-demand strategy. ► Total number of threads running in the system should be limited to the number of CPUs.
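
    The on-demand strategy mentioned above can be sketched with a shared atomic work counter: idle workers pull the next slice index, so faster devices naturally process more slices. In the sketch below plain CPU threads stand in for both device types; in the real system one host thread per GPU would launch kernels instead, and reconstruct_slice() is a placeholder name:

    ```cpp
    #include <atomic>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main()
    {
        const int n_slices = 64;
        std::atomic<int> next{0};                 // shared work counter

        auto worker = [&]() {
            for (;;) {
                int s = next.fetch_add(1);        // idle worker requests the next slice
                if (s >= n_slices) break;
                // reconstruct_slice(s);          // placeholder for the per-slice work
            }
        };

        std::vector<std::thread> pool;
        for (int i = 0; i < 4; ++i) pool.emplace_back(worker);   // CPU workers
        pool.emplace_back(worker);                               // one more host thread would drive a GPU
        for (auto& t : pool) t.join();
        std::cout << "all " << n_slices << " slices processed\n";
    }
    ```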

  9. Multithreading Image Processing in Single-core and Multi-core CPU using Java

    Directory of Open Access Journals (Sweden)

    Alda Kika

    2013-10-01

    Multithreading has been shown to be a powerful approach for boosting system performance. One good example of an application that benefits from multithreading is image processing. Image processing requires many resources and a long processing run time because the calculations are often done on a matrix of pixels. The programming language Java supports multithreaded programming as part of the language itself instead of treating threads through the operating system. In this paper we explore the performance of Java image processing applications designed with a multithreading approach. In order to test how multithreading influences the performance of the program, we tested several image processing algorithms implemented in the Java language using both a sequential single-threaded and a multithreaded approach on single-core and multi-core CPUs. The experiments were based not only on different platforms and algorithms that differ from each other in level of complexity, but also on changing the sizes of the images and the number of threads when the multithreading approach is applied. Performance is increased on single-core and multi-core CPUs in different ways in relation to image size, the complexity of the algorithm, and the platform.

  10. Fast CPU-based Monte Carlo simulation for radiotherapy dose calculation.

    Science.gov (United States)

    Ziegenhein, Peter; Pirner, Sven; Ph Kamerling, Cornelis; Oelfke, Uwe

    2015-08-07

    Monte Carlo (MC) simulations are considered to be the most accurate method for calculating dose distributions in radiotherapy. Their clinical application, however, is still limited by the long runtimes conventional implementations of MC algorithms require to deliver sufficiently accurate results on high resolution imaging data. In order to overcome this obstacle we developed the software package PhiMC, which is capable of computing precise dose distributions in a sub-minute time frame by leveraging the potential of modern many- and multi-core CPU-based computers. PhiMC is based on the well verified dose planning method (DPM). We could demonstrate that PhiMC delivers dose distributions which are in excellent agreement with DPM. The multi-core implementation of PhiMC scales well between different computer architectures and achieves a speed-up of up to 37× compared to the original DPM code executed on a modern system. Furthermore, we could show that our CPU-based implementation on a modern workstation is between 1.25× and 1.95× faster than a well-known GPU implementation of the same simulation method on a NVIDIA Tesla C2050. Since CPUs can work with several hundred GB of RAM, the typical GPU memory limitation does not apply to our implementation, and high resolution clinical plans can be calculated.

  11. Fast hybrid CPU- and GPU-based CT reconstruction algorithm using air skipping technique.

    Science.gov (United States)

    Lee, Byeonghun; Lee, Ho; Shin, Yeong Gil

    2010-01-01

    This paper presents a fast hybrid CPU- and GPU-based CT reconstruction algorithm that reduces the amount of back-projection operations using an air skipping technique involving polygon clipping. The algorithm easily and rapidly selects air areas, which have significantly higher contrast in each projection image, by applying the K-means clustering method on the CPU, and then generates boundary tables for verifying the valid region using the segmented air areas. Based on these boundary tables of each projection image, a clipped polygon that indicates the active region when the back-projection operation is performed on the GPU is determined on each volume slice. This polygon clipping process makes it possible to back-project a smaller number of voxels, which leads to a faster GPU-based reconstruction method. This approach has been applied to a clinical data set and Shepp-Logan phantom data sets having various ratios of air region for quantitative and qualitative comparison and analysis of our method and conventional GPU-based reconstruction methods. The algorithm has been proved to reduce computational time by half without losing any diagnostic information, compared to conventional GPU-based approaches.

  12. A hybrid stepping motor system with dual CPU

    Institute of Scientific and Technical Information of China (English)

    高晗璎; 赵克; 孙力

    2004-01-01

    An indirect method of measuring the rotor position based on the magnetic reluctance variation is presented in the paper. A single-chip microprocessor 80C196KC is utilized to compensate the phase shift produced by the processing of the position signals. At the same time, a DSP (Digital Signal Processor) unit is used to realize the speed and current closed loops of the hybrid stepping motor system. Finally, experimental results show that the control system has excellent static and dynamic characteristics.

  13. Design and implementation of a low power mobile CPU based embedded system for artificial leg control.

    Science.gov (United States)

    Hernandez, Robert; Yang, Qing; Huang, He; Zhang, Fan; Zhang, Xiaorong

    2013-01-01

    This paper presents the design and implementation of a new neural-machine interface (NMI) for the control of artificial legs. The requirements of high accuracy, real-time processing, low power consumption, and mobility of the NMI place great challenges on the computation engine of the system. By utilizing the architectural features of a mobile embedded CPU, we are able to implement our decision-making algorithm, based on neuromuscular phase-dependent support vector machines (SVM), with exceptional accuracy and processing speed. To demonstrate the superiority of our NMI, real-time experiments were performed on an able-bodied subject with a 20 ms window increment. The 20 ms testing yielded accuracies of 99.94% while executing our algorithm efficiently with less than 11% processor load.

  14. 3D Kirchhoff depth migration algorithm: A new scalable approach for parallelization on multicore CPU based cluster

    Science.gov (United States)

    Rastogi, Richa; Londhe, Ashutosh; Srivastava, Abhishek; Sirasala, Kirannmayi M.; Khonde, Kiran

    2017-03-01

    In this article, a new scalable 3D Kirchhoff depth migration algorithm is presented on a state-of-the-art multicore CPU based cluster. Parallelization of 3D Kirchhoff depth migration is challenging due to its high demand for compute time, memory, storage and I/O, along with the need for their effective management. The most resource intensive modules of the algorithm are traveltime calculations and migration summation, which exhibit an inherent trade-off between compute time and other resources. The parallelization strategy of the algorithm largely depends on the storage of calculated traveltimes and its feeding mechanism to the migration process. The presented work is an extension of our previous work, wherein a 3D Kirchhoff depth migration application for a multicore CPU based parallel system had been developed. Recently, we have worked on improving the parallel performance of this application by re-designing the parallelization approach. The new algorithm is capable of efficiently migrating both prestack and poststack 3D data. It exhibits flexibility for migrating a large number of traces within the available node memory and with minimal requirements for storage, I/O and inter-node communication. The resultant application is tested using 3D Overthrust data on PARAM Yuva II, which is a Xeon E5-2670 based multicore CPU cluster with 16 cores/node and 64 GB shared memory. The parallel performance of the algorithm is studied using different numerical experiments and the scalability results show striking improvement over its previous version. An impressive 49.05X speedup with 76.64% efficiency is achieved for 3D prestack data and 32.00X speedup with 50.00% efficiency for 3D poststack data, using 64 nodes. The results also demonstrate the effectiveness and robustness of the improved algorithm with high scalability and efficiency on a multicore CPU cluster.

  15. Parallelism for cryo-EM 3D reconstruction on CPU-GPU heterogeneous system

    Institute of Scientific and Technical Information of China (English)

    李兴建; 李临川; 谭光明; 张佩珩

    2011-01-01

    It is a challenge to efficiently utilize massive parallelism, both in applications and in architectures, on heterogeneous systems. A practice of accelerating a cryo-EM 3D reconstruction program is presented, showing how to exploit and orchestrate the parallelism of the application to take advantage of the underlying parallelism exposed at the architecture level. All possible parallelism in cryo-EM 3D was exploited, and a self-adaptive dynamic scheduling algorithm was leveraged to efficiently implement the parallelism mapping between the application and the architecture. The experiment on a part of the Dawning Nebulae system (32 nodes) confirms that hierarchical parallelism is an efficient pattern of parallel programming to utilize the capabilities of both CPU and GPU on a heterogeneous system. The hybrid CPU-GPU program improves performance by 2.4 times over the best CPU-only one for certain problem sizes.

  16. A Scientific Trigger Unit for Space-Based Real-Time Gamma Ray Burst Detection, II - Data Processing Model and Benchmarks

    CERN Document Server

    Provost, Hervé Le; Flouzat, Christophe; Kestener, Pierre; Chaminade, Thomas; Donati, Modeste; Château, Frédéric; Daly, François; Fontignie, Jean

    2014-01-01

    The Scientific Trigger Unit (UTS) is a satellite equipment designed to detect Gamma Ray Bursts (GRBs) observed by the onboard 6400-pixel camera ECLAIRs. It is foreseen to equip the low-Earth orbit French-Chinese satellite SVOM and acts as the GRB trigger unit for the mission. The UTS analyses in real time and in great detail the onboard camera data in order to select the GRBs, to trigger a spacecraft slew re-centering each GRB for the narrow field-of-view instruments, and to alert the ground telescope network for GRB follow-up observations. A few GRBs per week are expected to be observed by the camera; the UTS targets a close to 100% trigger efficiency, while being selective enough to avoid fake alerts. This is achieved by running the complex scientific algorithms on radiation-tolerant hardware, based on an FPGA data pre-processor and a CPU with a Real-Time Operating System. The UTS is a scientific software, firmware and hardware co-development. A Data Processing Model (DPM) has been developed to fully val...

  17. 47 CFR 15.102 - CPU boards and power supplies used in personal computers.

    Science.gov (United States)

    2010-10-01

    ... computers. 15.102 Section 15.102 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL RADIO FREQUENCY DEVICES Unintentional Radiators § 15.102 CPU boards and power supplies used in personal computers. (a... modifications that must be made to a personal computer, peripheral device, CPU board or power supply during...

  18. The Research and Test of Fast Radio Burst Real-time Search Algorithm Based on GPU Acceleration

    Science.gov (United States)

    Wang, J.; Chen, M. Z.; Pei, X.; Wang, Z. Q.

    2017-03-01

    In order to satisfy the research needs of the Nanshan 25 m radio telescope of Xinjiang Astronomical Observatory (XAO) and to study the key technology of the planned QiTai radio Telescope (QTT), the receiver group of XAO studied a GPU (Graphics Processing Unit) based real-time FRB search algorithm, developed from the original CPU (Central Processing Unit) based FRB search algorithm, and built an FRB real-time search system. The comparison of the GPU system and the CPU system shows that, while maintaining the accuracy of the search, the GPU-accelerated algorithm is 35-45 times faster than the CPU algorithm.

  19. [Real-time safety audits in a neonatal unit].

    Science.gov (United States)

    Bergon-Sendin, Elena; Perez-Grande, María Del Carmen; Lora-Pablos, David; Melgar-Bonis, Ana; Ureta-Velasco, Noelia; Moral-Pumarega, María Teresa; Pallas-Alonso, Carmen Rosa

    2017-09-01

    Random audits are a safety tool to help in the prevention of adverse events, but they have not been widely used in hospitals. The aim of the study was to determine, through random safety audits, whether the information and material required for resuscitation were available for each patient in a neonatal intensive care unit, and to determine whether factors related to the patient, time or location affect the implementation of the recommendations. Prospective observational study conducted in a level III-C neonatal intensive care unit during the year 2012. The evaluation of the written information on the endotracheal tube, the mask and ambu bag prepared for each patient, and the laryngoscopes of the emergency trolley was included within a broader audit of technological resources and study procedures. The technological resources and procedures were randomly selected twice a week for audit. Appropriate overall use was defined when all evaluated variables were correctly programmed in the same procedure. A total of 296 audits were performed. The kappa coefficient of inter-observer agreement was 0.93. The rate of appropriate overall use of the written information and material required for resuscitation was 62.50% (185/296). A mask and ambu bag prepared for each patient was the variable with the best compliance (97.3%, P=.001). Significant differences were found, with improved usage during weekends versus working days (73.97 vs. 58.74%, P=.01), and during the rest of the year versus the third quarter (66.06 vs. 52%, P=.02). In only 62.5% of cases were the information and the material necessary to attend urgently to a critical situation easily available. Opportunities for improvement were identified through the audits. Copyright © 2016 Asociación Española de Pediatría. Publicado por Elsevier España, S.L.U. All rights reserved.

  20. High Speed 3D Tomography on CPU, GPU, and FPGA

    Directory of Open Access Journals (Sweden)

    GAC Nicolas

    2008-01-01

    Back-projection (BP) is a costly computational step in tomography image reconstruction such as positron emission tomography (PET). To reduce the computation time, this paper presents a pipelined, prefetch, and parallelized architecture for PET BP (3PA-PET). The key feature of this architecture is its original memory access strategy, masking the high latency of the external memory. Indeed, the pattern of the memory references to the acquired data hinders the processing unit. The memory access bottleneck is overcome by an efficient use of the intrinsic temporal and spatial locality of the BP algorithm. A loop reordering allows an efficient use of general purpose processors' caches, for software implementation, as well as of the 3D predictive and adaptive cache (3D-AP cache), when considering hardware implementations. Parallel hardware pipelines are also efficient thanks to a hierarchical 3D-AP cache: each pipeline performs a memory reference in about one clock cycle to reach a computational throughput close to 100%. The 3PA-PET architecture is prototyped on a system on programmable chip (SoPC) to validate the system and to measure its expected performance. Time performance is compared with a desktop PC, a workstation, and a graphics processing unit (GPU).
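
    The loop reordering idea is easiest to see in a simplified 2D parallel-beam back-projection: processing the image in cache-sized tiles keeps the sinogram values needed by neighbouring pixels resident in cache. The sketch below is a generic illustration of that reordering, not the 3D-AP cache hardware itself; grid sizes and the tile width are illustrative assumptions:

    ```cpp
    // Tiled (blocked) 2D back-projection: one tile of pixels at a time, so the
    // detector bins they touch stay cache-resident across angles.
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    void backproject(const std::vector<float>& sino, std::size_t n_angles, std::size_t n_bins,
                     std::vector<float>& image, std::size_t n, std::size_t tile = 32)
    {
        const float c = n / 2.0f;
        for (std::size_t ty = 0; ty < n; ty += tile)
            for (std::size_t tx = 0; tx < n; tx += tile)          // one cache-sized tile at a time
                for (std::size_t a = 0; a < n_angles; ++a) {
                    const float th = 3.14159265f * a / n_angles;
                    const float ct = std::cos(th), st = std::sin(th);
                    for (std::size_t y = ty; y < std::min(ty + tile, n); ++y)
                        for (std::size_t x = tx; x < std::min(tx + tile, n); ++x) {
                            long bin = static_cast<long>((x - c) * ct + (y - c) * st + n_bins / 2.0f);
                            if (bin >= 0 && bin < static_cast<long>(n_bins))
                                image[y * n + x] += sino[a * n_bins + static_cast<std::size_t>(bin)];
                        }
                }
    }
    ```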

  1. High Speed 3D Tomography on CPU, GPU, and FPGA

    Directory of Open Access Journals (Sweden)

    Dominique Houzet

    2009-02-01

    Back-projection (BP) is a costly computational step in tomography image reconstruction such as positron emission tomography (PET). To reduce the computation time, this paper presents a pipelined, prefetch, and parallelized architecture for PET BP (3PA-PET). The key feature of this architecture is its original memory access strategy, masking the high latency of the external memory. Indeed, the pattern of the memory references to the acquired data hinders the processing unit. The memory access bottleneck is overcome by an efficient use of the intrinsic temporal and spatial locality of the BP algorithm. A loop reordering allows an efficient use of general purpose processors' caches, for software implementation, as well as of the 3D predictive and adaptive cache (3D-AP cache), when considering hardware implementations. Parallel hardware pipelines are also efficient thanks to a hierarchical 3D-AP cache: each pipeline performs a memory reference in about one clock cycle to reach a computational throughput close to 100%. The 3PA-PET architecture is prototyped on a system on programmable chip (SoPC) to validate the system and to measure its expected performance. Time performance is compared with a desktop PC, a workstation, and a graphics processing unit (GPU).

  2. Hybrid computing: CPU+GPU co-processing and its application to tomographic reconstruction.

    Science.gov (United States)

    Agulleiro, J I; Vázquez, F; Garzón, E M; Fernández, J J

    2012-04-01

    Modern computers are equipped with powerful computing engines like multicore processors and GPUs. The 3DEM community has rapidly adapted to this scenario and many software packages now make use of high performance computing techniques to exploit these devices. However, the implementations thus far are purely focused on either GPUs or CPUs. This work presents a hybrid approach that collaboratively combines the GPUs and CPUs available in a computer and applies it to the problem of tomographic reconstruction. Proper orchestration of workload in such a heterogeneous system is an issue. Here we use an on-demand strategy whereby the computing devices request a new piece of work to do when idle. Our hybrid approach thus takes advantage of the whole computing power available in modern computers and further reduces the processing time. This CPU+GPU co-processing can be readily extended to other image processing tasks in 3DEM. Copyright © 2012 Elsevier B.V. All rights reserved.

  3. Porting AMG2013 to Heterogeneous CPU+GPU Nodes

    Energy Technology Data Exchange (ETDEWEB)

    Samfass, Philipp [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-01-26

    LLNL's future advanced technology system SIERRA will feature heterogeneous compute nodes that consist of IBM POWER9 CPUs and NVIDIA Volta GPUs. Conceptually, the motivation for such an architecture is quite straightforward: while GPUs are optimized for throughput on massively parallel workloads, CPUs strive to minimize latency for rather sequential operations. Yet, making optimal use of heterogeneous architectures raises new challenges for the development of scalable parallel software, e.g., with respect to work distribution. Porting LLNL's parallel numerical libraries to upcoming heterogeneous CPU+GPU architectures is therefore a critical factor for ensuring LLNL's future success in fulfilling its national mission. One of these libraries, called HYPRE, provides parallel solvers and preconditioners for large, sparse linear systems of equations. In the context of this internship project, I consider AMG2013, which is a proxy application for major parts of HYPRE that implements a benchmark for setting up and solving different systems of linear equations. In the following, I describe in detail how I ported multiple parts of AMG2013 to the GPU (Section 2) and present results for different experiments that demonstrate a successful parallel implementation on the heterogeneous machines surface and ray (Section 3). In Section 4, I give guidelines on how my code should be used. Finally, I conclude and give an outlook for future work (Section 5).

  4. Obesity, diabetes, and length of time in the United States

    Science.gov (United States)

    Tsujimoto, Tetsuro; Kajio, Hiroshi; Sugiyama, Takehiro

    2016-01-01

    Obesity prevalence remains high in the United States (US), and is rising in most other countries. This is a repeated cross-sectional study using a nationally representative sample of the National Health and Nutrition Examination Survey 1999 to 2012. Multivariate logistic regression analyses were separately performed for adults (n = 37,639) and children/adolescents (n = 28,282) to assess the associations between the length of time in the US and the prevalences of obesity and diabetes. In foreign-born adults, the prevalences of both obesity and diabetes increased with the length of time in the US, and ≥20 years in the US was associated with significantly higher rates of obesity (adjusted odds ratio [aOR] 2.32, 95% confidence interval [CI] 1.22–4.40, P = 0.01) and diabetes (aOR 4.22, 95% CI 1.04–17.08, P = 0.04) compared with shorter lengths of stay. Obesity prevalence was also significantly higher in those born in the US than in those who had been in the US for a shorter time, and obesity prevalence was significantly higher in US-born than in foreign-born adults from 1999 to 2012. On the other hand, the gap in obesity prevalence between US-born and foreign-born children/adolescents decreased from 1999 to 2011 due to a rapid increase in obesity prevalence among the foreign-born population, until there was no significant difference in 2011 to 2012. This study revealed that the risks of obesity and diabetes have increased in foreign-born US residents with time living in the US. However, the obesity gap between US-born and foreign-born populations is closing. PMID:27583867

  5. United States Forest Disturbance Trends Observed Using Landsat Time Series

    Science.gov (United States)

    Masek, Jeffrey G.; Goward, Samuel N.; Kennedy, Robert E.; Cohen, Warren B.; Moisen, Gretchen G.; Schleeweis, Karen; Huang, Chengquan

    2013-01-01

    Disturbance events strongly affect the composition, structure, and function of forest ecosystems; however, existing U.S. land management inventories were not designed to monitor disturbance. To begin addressing this gap, the North American Forest Dynamics (NAFD) project has examined a geographic sample of 50 Landsat satellite image time series to assess trends in forest disturbance across the conterminous United States for 1985-2005. The geographic sample design used a probability-based scheme to encompass major forest types and maximize geographic dispersion. For each sample location disturbance was identified in the Landsat series using the Vegetation Change Tracker (VCT) algorithm. The NAFD analysis indicates that, on average, 2.77 Mha/yr of forests were disturbed annually, representing 1.09%/yr of US forestland. These satellite-based national disturbance rate estimates tend to be lower than those derived from land management inventories, reflecting both methodological and definitional differences. In particular the VCT approach used with a biennial time step has limited sensitivity to low-intensity disturbances. Unlike prior satellite studies, our biennial forest disturbance rates vary by nearly a factor of two between high and low years. High western US disturbance rates were associated with active fire years and insect activity, while variability in the east is more strongly related to harvest rates in managed forests. We note that generating a geographic sample based on representing forest type and variability may be problematic since the spatial pattern of disturbance does not necessarily correlate with forest type. We also find that the prevalence of diffuse, non-stand clearing disturbance in US forests makes the application of a biennial geographic sample problematic. Future satellite-based studies of disturbance at regional and national scales should focus on wall-to-wall analyses with an annual time step for improved accuracy.

  6. Transport and uptake characteristics of a new derivative of berberine (CPU-86017) by human intestinal epithelial cell line: Caco-2

    Institute of Scientific and Technical Information of China (English)

    杨海涛; 王广基

    2003-01-01

    AIM: The characteristics of transepithelial transport and uptake of CPU-86017 [7-(4-chlorobenzyl)-7,8,13,13α-tetrahydroberberine chloride, CTHB], a new antiarrhythmic agent and a new derivative of berberine, were investigated in the epithelial cell line Caco-2 to further understand the absorption mechanism of berberine and its derivatives. METHODS: Caco-2 cells were used. RESULTS: 1) The permeability coefficient of CPU-86017 from the apical (AP) to the basolateral (BL) side was approximately 5 times higher than that of BL-to-AP transport. The effects of a P-glycoprotein (P-gp) inhibitor (cyclosporin A), some surfactants, and lower pH on the transepithelial transport of CPU-86017 were also observed. Cyclosporin A at 7.5 mg/L had no effect on the transepithelial electrical resistance (TEER); an about 4-fold enhancement of the transepithelial transport of CPU-86017 was observed. Some surfactants (sodium citrate, sodium deoxycholate, and sodium dodecyl sulfate) at 100 μmol/L and low pH (pH=6.0) induced a reversible decrease of TEER; enhancements of the transepithelial transport of CPU-86017 were also observed with some surfactants; 2) In the process of uptake of CPU-86017, the initial uptake rates of CPU-86017 were saturable with a Vmax of (250±39) μg·min⁻¹·g⁻¹ (protein) and a Km of (0.90±0.12) mmol/L. This process was enhanced by cyclosporin A (7.5 mg/L) with a Vmax of (588±49) μg·min⁻¹·g⁻¹ (protein) and a Km of (0.42±0.08) mmol/L. CONCLUSION: Some surfactants and P-gp inhibitors can be considered as enhancers of its transepithelial transport and uptake.

  7. Attributes for NHDPlus Catchments (Version 1.1)for the Conterminous United States: Contact Time, 2002

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set represents the average contact time, in units of days, compiled for every catchment of NHDPlus for the conterminous United States. Contact time, as...

  8. Tempest: GPU-CPU computing for high-throughput database spectral matching.

    Science.gov (United States)

    Milloy, Jeffrey A; Faherty, Brendan K; Gerber, Scott A

    2012-07-06

    Modern mass spectrometers are now capable of producing hundreds of thousands of tandem (MS/MS) spectra per experiment, making the translation of these fragmentation spectra into peptide matches a common bottleneck in proteomics research. When coupled with experimental designs that enrich for post-translational modifications such as phosphorylation and/or include isotopically labeled amino acids for quantification, additional burdens are placed on this computational infrastructure by shotgun sequencing. To address this issue, we have developed a new database searching program that utilizes the massively parallel compute capabilities of a graphical processing unit (GPU) to produce peptide spectral matches in a very high throughput fashion. Our program, named Tempest, combines efficient database digestion and MS/MS spectral indexing on a CPU with fast similarity scoring on a GPU. In our implementation, the entire similarity score, including the generation of full theoretical peptide candidate fragmentation spectra and its comparison to experimental spectra, is conducted on the GPU. Although Tempest uses the classical SEQUEST XCorr score as a primary metric for evaluating similarity for spectra collected at unit resolution, we have developed a new "Accelerated Score" for MS/MS spectra collected at high resolution that is based on a computationally inexpensive dot product but exhibits scoring accuracy similar to that of the classical XCorr. In our experience, Tempest provides compute-cluster level performance in an affordable desktop computer.
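
    The essence of a dot-product style score such as the Accelerated Score can be sketched as follows: both spectra are binned onto a common m/z grid and the score is a normalized inner product of the binned intensities. The bin width and normalization here are illustrative assumptions, not Tempest's exact parameters:

    ```cpp
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Peak { double mz; double intensity; };

    // Bin a peak list onto a fixed-width m/z grid, keeping the strongest peak per bin.
    std::vector<double> bin_spectrum(const std::vector<Peak>& peaks,
                                     double max_mz, double bin_width = 0.02)
    {
        std::vector<double> bins(static_cast<std::size_t>(max_mz / bin_width) + 1, 0.0);
        for (const auto& p : peaks) {
            std::size_t b = static_cast<std::size_t>(p.mz / bin_width);
            if (b < bins.size()) bins[b] = std::max(bins[b], p.intensity);
        }
        return bins;
    }

    // Normalized dot product between two binned spectra (cosine similarity).
    double dot_score(const std::vector<double>& a, const std::vector<double>& b)
    {
        double s = 0.0, na = 0.0, nb = 0.0;
        for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
            s  += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na > 0 && nb > 0) ? s / std::sqrt(na * nb) : 0.0;
    }
    ```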

  9. An hybrid CPU-GPU framework for quantitative follow-up of abdominal aortic aneurysm volume by CT angiography

    Science.gov (United States)

    Kauffmann, Claude; Tang, An; Therasse, Eric; Soulez, Gilles

    2010-03-01

    We developed a hybrid CPU-GPU framework enabling semi-automated segmentation of abdominal aortic aneurysm (AAA) on Computed Tomography Angiography (CTA) examinations. AAA maximal diameter (D-max) and volume measurements, and their progression between 2 examinations, can be generated by this software, improving patient follow-up. In order to improve workflow efficiency, some segmentation tasks were implemented and executed on the graphics processing unit (GPU). A GPU-based algorithm is used to automatically segment the lumen of the aneurysm within a short computing time. In a second step, the user interacted with the software to validate the boundaries of the intra-luminal thrombus (ILT) on GPU-based curved image reformations. Automatic computation of D-max and volume was performed on the 3D AAA model. Clinical validation was conducted on 34 patients having 2 consecutive MDCT examinations within a minimum interval of 6 months. The AAA segmentation was performed twice by an experienced radiologist (reference standard) and once by 3 unsupervised technologists on all 68 MDCT examinations. The ICC for intra-observer reproducibility was 0.992 (>=0.987) for D-max and 0.998 (>=0.994) for volume measurement. The ICC for inter-observer reproducibility was 0.985 (0.977-0.990) for D-max and 0.998 (0.996-0.999) for volume measurement. Semi-automated AAA segmentation for volume follow-up was more than twice as sensitive as D-max follow-up, while providing an equivalent reproducibility.

  10. CPU 86017, p-chlorobenzyltetrahydroberberine chloride, attenuates monocrotaline-induced pulmonary hypertension by suppressing endothelin pathway.

    Science.gov (United States)

    Zhang, Tian-tai; Cui, Bing; Dai, De-zai; Su, Wei

    2005-11-01

    To elucidate the involvement of the endothelin (ET) pathway in the pathogenesis of monocrotaline (MCT)-induced pulmonary arterial hypertension (PAH) and the therapeutic effect of CPU 86017 (p-chlorobenzyltetrahydroberberine chloride) in rats. Rats were injected with a single dose (60 mg/kg, sc) of MCT and given CPU 86017 (20, 40, and 80 mg·kg⁻¹·d⁻¹, po) or saline for 28 d. The hemodynamics, mRNA expression, and vascular activity were evaluated. Right ventricular systolic pressure and central venous pressure were elevated markedly in the PAH model and decreased by CPU 86017. In the PAH group, endothelin-1 (ET-1) in serum and lungs was dramatically increased, by 54% (79.9 pg/mL). CPU 86017 decreased the content of ET-1 to the normal level in lung tissue, but was less effective in serum. The level of NO was significantly increased in tissue in the CPU 86017 80 and 40 mg·kg⁻¹·d⁻¹ groups, whereas the difference in serum was not significant. A significant reduction in MDA production and an increase in SOD activity in the serum and lungs were observed in all three CPU 86017 groups. CPU 86017 80 mg·kg⁻¹·d⁻¹ po increased the activity of cNOS by 33%, and preproET-1 mRNA abundance was also reduced notably in the CPU 86017 80 mg·kg⁻¹·d⁻¹ group vs the PAH group. The KCl-induced vasoconstrictions in the calcium-free medium decreased markedly in the PAH group but recovered partially after CPU 86017 intervention. The constrictions in the presence of Ca(2+) were not improved by CPU 86017. The phenylephrine-induced vasoconstrictions in the calcium-free medium decreased markedly in the PAH group but did not recover after CPU 86017 intervention. The constrictions in the presence of Ca(2+) completely returned to normal after CPU 86017 intervention. CPU 86017 suppressed MCT-induced PAH mainly through an indirect suppression of the ET-1 system, which is involved in the pathogenesis of the disease.

  11. A Survey of Techniques of CPU-GPGPU Heterogeneous Architecture

    Institute of Scientific and Technical Information of China (English)

    徐新海; 林宇斐; 易伟

    2009-01-01

    GPU has better performance than CPU in both computing ability and memory bandwidth thanks to the fast development of GPU technology. Therefore, general-purpose computing on GPU has become increasingly popular, bringing forth an emerging CPU-GPGPU heterogeneous architecture. Although the new architecture demonstrates high performance and is currently a highlight of both academia and industry, how to write and execute programs on it efficiently still remains a big challenge. This paper summarizes the techniques of programmability, reliability and low power for GPU, and discusses the development trend of the CPU-GPGPU heterogeneous architecture.

  12. Real-time electroholography using a multiple-graphics processing unit cluster system with a single spatial light modulator and the InfiniBand network

    Science.gov (United States)

    Niwase, Hiroaki; Takada, Naoki; Araki, Hiromitsu; Maeda, Yuki; Fujiwara, Masato; Nakayama, Hirotaka; Kakue, Takashi; Shimobaba, Tomoyoshi; Ito, Tomoyoshi

    2016-09-01

    Parallel calculations of large-pixel-count computer-generated holograms (CGHs) are suitable for multiple-graphics processing unit (multi-GPU) cluster systems. However, it is not easy for a multi-GPU cluster system to accomplish fast CGH calculations when CGH transfers between PCs are required. In these cases, the CGH transfer between the PCs becomes a bottleneck. Usually, this problem occurs only in multi-GPU cluster systems with a single spatial light modulator. To overcome this problem, we propose a simple method using the InfiniBand network. The computational speed of the proposed method using 13 GPUs (NVIDIA GeForce GTX TITAN X) was more than 3000 times faster than that of a CPU (Intel Core i7 4770) when the number of three-dimensional (3-D) object points exceeded 20,480. In practice, we achieved ˜40 tera floating point operations per second (TFLOPS) when the number of 3-D object points exceeded 40,960. Our proposed method was able to reconstruct a real-time movie of a 3-D object comprising 95,949 points.

  13. Performance of Basic Geodynamic Solvers on BG/p and on Modern Mid-sized CPU Clusters

    Science.gov (United States)

    Omlin, S.; Keller, V.; Podladchikov, Y.

    2012-04-01

    We optimize CPU cache use: we avoid random memory access and repeated reads of the same data by rearranging mutually independent computations. More precisely, we group together operations that act on the same small parts of the data as much as possible, ensuring that these small data parts fit into the CPU cache; reading from CPU cache takes nearly no time compared to reading from memory. We also optimize the technical programming needed to run the solvers in parallel on a computer cluster. Parallelizing a solver requires a spatial decomposition of the computational domain; each processor then solves the problem for one subdomain, synchronizing the subdomain's boundaries with those of its neighbours at every iteration. We optimize boundary synchronization between processors by developing optimal methods based on the full range of advanced MPI features (MPI is the standard interface for developing parallel applications on CPU clusters with distributed memory). A geodynamic solver solves a system of equations at every iteration. This can be done implicitly, using a direct solver, or explicitly, by updating all variables in the system of equations according to an update rule derived from the system. We compare the performance of implicit and explicit solving for our applications.
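
    As a concrete illustration of the cache-use and explicit-update ideas above, the sketch below (in C++, with illustrative grid and material parameters that are not taken from the record) fuses the stencil evaluation and the field update into a single row-major sweep, so each grid point is touched once while its neighbours are still in cache; in a distributed run, each subdomain would exchange its one-cell halo with neighbouring MPI ranks after every such step.

        #include <cstddef>
        #include <vector>

        // Explicit diffusion update on a regular 2D grid (illustrative sketch).
        // The 5-point Laplacian and the update are computed together, so each point
        // is read and written while it is still resident in the CPU cache.
        void explicit_step(const std::vector<double>& T, std::vector<double>& Tnew,
                           std::size_t nx, std::size_t ny,
                           double kappa, double dt, double dx) {
          const double c = kappa * dt / (dx * dx);
          for (std::size_t j = 1; j + 1 < ny; ++j) {      // row-major: unit-stride inner loop
            for (std::size_t i = 1; i + 1 < nx; ++i) {
              const std::size_t k = j * nx + i;
              const double lap = T[k - 1] + T[k + 1] + T[k - nx] + T[k + nx] - 4.0 * T[k];
              Tnew[k] = T[k] + c * lap;                   // explicit update rule
            }
          }
          // In the parallel version, the subdomain boundary rows/columns would now be
          // exchanged with neighbouring ranks (e.g. MPI_Isend/MPI_Irecv) before the next step.
        }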

  14. Kernel density estimation using graphical processing unit

    Science.gov (United States)

    Sunarko, Su'ud, Zaki

    2015-09-01

    Kernel density estimation for particles distributed over a 2-dimensional space is calculated using a single graphics processing unit (GTX 660Ti GPU) and the CUDA-C language. Parallel calculations are done for particles having a bivariate normal distribution, by assigning the calculations for equally spaced node points to the scalar processors of the GPU. The numbers of particles, blocks, and threads are varied to identify a favorable configuration. Comparisons are obtained by performing the same calculation using 1, 2, and 4 processors on a 3.0 GHz CPU using MPICH 2.0 routines. Speedups attained with the GPU are in the range of 88 to 349 times compared to the multiprocessor CPU. Blocks of 128 threads are found to be the optimum configuration for this case.
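
    A minimal CUDA sketch of the node-wise evaluation described above: one thread (scalar processor) handles one equally spaced node point and sums a product-Gaussian kernel over all particles. Array names, the bandwidth h, and the launch configuration are illustrative assumptions, not the record's code.

        #include <cuda_runtime.h>
        #include <math.h>

        // One thread per node point; each thread accumulates the kernel contribution
        // of every particle and writes the normalized density for its node.
        __global__ void kde2d(const float* px, const float* py, int n_particles,
                              const float* nx, const float* ny, float* density,
                              int n_nodes, float h) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i >= n_nodes) return;
          const float inv_h2 = 1.0f / (h * h);
          const float norm   = 1.0f / (2.0f * 3.14159265f * h * h * n_particles);
          float sum = 0.0f;
          for (int p = 0; p < n_particles; ++p) {
            float dx = nx[i] - px[p];
            float dy = ny[i] - py[p];
            sum += expf(-0.5f * (dx * dx + dy * dy) * inv_h2);  // bivariate Gaussian kernel
          }
          density[i] = norm * sum;
        }
        // Launch example with the 128-thread blocks found favourable in the record:
        // kde2d<<<(n_nodes + 127) / 128, 128>>>(px, py, n_particles, nx, ny, density, n_nodes, h);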

  15. Comparison of methods for estimating motor unit firing rate time series from firing times.

    Science.gov (United States)

    Liu, Lukai; Bonato, Paolo; Clancy, Edward A

    2016-12-01

    The central nervous system regulates recruitment and firing of motor units to modulate muscle tension. Estimation of the firing rate time series is typically performed by decomposing the electromyogram (EMG) into its constituent firing times, then lowpass filtering a constituent train of impulses. Little research has examined the performance of different estimation methods, particularly in the inevitable presence of decomposition errors. The study of electrocardiogram (ECG) and electroneurogram (ENG) firing rate time series presents a similar problem, and has applied novel simulation models and firing rate estimators. Herein, we adapted an ENG/ECG simulation model to generate realistic EMG firing times derived from known rates, and assessed various firing rate time series estimation methods. ENG/ECG-inspired rate estimation worked exceptionally well when EMG decomposition errors were absent, but degraded unacceptably with decomposition error rates of ⩾1%. Typical EMG decomposition error rates-even after expert manual review-are 3-5%. At realistic decomposition error rates, more traditional EMG smoothing approaches performed best, when optimal smoothing window durations were selected. This optimal window was often longer than the 400ms duration that is commonly used in the literature. The optimal duration decreased as the modulation frequency of firing rate increased, average firing rate increased and decomposition errors decreased. Examples of these rate estimation methods on physiologic data are also provided, demonstrating their influence on measures computed from the firing rate estimate. Copyright © 2016 Elsevier Ltd. All rights reserved.
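
    The "traditional smoothing" estimator discussed above can be sketched in a few lines: the firing times are binned into an impulse train and smoothed with a sliding window whose duration is the tunable parameter the study optimizes. The Hann window and the 0.4 s default below are illustrative choices, not the authors' code.

        #include <cmath>
        #include <vector>

        // Estimate a firing rate time series (spikes/second) from firing times by
        // lowpass filtering an impulse train with a window of duration win_s.
        std::vector<double> firing_rate(const std::vector<double>& spike_times_s,
                                        double duration_s, double fs = 1000.0, double win_s = 0.4) {
          const double pi = 3.141592653589793;
          const std::size_t n = static_cast<std::size_t>(duration_s * fs);
          std::vector<double> impulses(n, 0.0);
          for (double t : spike_times_s) {                       // impulse train: one unit per firing
            std::size_t k = static_cast<std::size_t>(t * fs);
            if (k < n) impulses[k] += 1.0;
          }
          std::size_t w = static_cast<std::size_t>(win_s * fs);  // window length in samples (odd, >= 3)
          if (w % 2 == 0) ++w;
          if (w < 3) w = 3;
          std::vector<double> hann(w), rate(n, 0.0);
          double wsum = 0.0;
          for (std::size_t j = 0; j < w; ++j) {
            hann[j] = 0.5 - 0.5 * std::cos(2.0 * pi * j / (w - 1));
            wsum += hann[j];
          }
          for (std::size_t i = 0; i < n; ++i) {                  // window-weighted spike count per second
            double acc = 0.0;
            const long center = static_cast<long>(i);
            for (long j = 0; j < static_cast<long>(w); ++j) {
              const long k = center + j - static_cast<long>(w) / 2;
              if (k >= 0 && k < static_cast<long>(n)) acc += hann[j] * impulses[k];
            }
            rate[i] = acc * fs / wsum;
          }
          return rate;
        }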

  16. Parallel particle swarm optimization on a graphics processing unit with application to trajectory optimization

    Science.gov (United States)

    Wu, Q.; Xiong, F.; Wang, F.; Xiong, Y.

    2016-10-01

    In order to reduce the computational time, a fully parallel implementation of the particle swarm optimization (PSO) algorithm on a graphics processing unit (GPU) is presented. Instead of being executed on the central processing unit (CPU) sequentially, PSO is executed in parallel via the GPU on the compute unified device architecture (CUDA) platform. The processes of fitness evaluation, updating of velocity and position of all particles are all parallelized and introduced in detail. Comparative studies on the optimization of four benchmark functions and a trajectory optimization problem are conducted by running PSO on the GPU (GPU-PSO) and CPU (CPU-PSO). The impact of design dimension, number of particles and size of the thread-block in the GPU and their interactions on the computational time is investigated. The results show that the computational time of the developed GPU-PSO is much shorter than that of CPU-PSO, with comparable accuracy, which demonstrates the remarkable speed-up capability of GPU-PSO.
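
    A minimal CUDA sketch of the parallel update step described above: one thread updates one (particle, dimension) entry of the velocity and position arrays. The inertia and acceleration coefficients and the use of pre-generated random numbers are illustrative assumptions, not the paper's implementation.

        // Velocity and position update of all particles in parallel; pbest holds each
        // particle's personal best, gbest the swarm's global best (one entry per dimension),
        // r1/r2 are uniform random numbers generated beforehand (e.g. with cuRAND).
        __global__ void pso_update(float* x, float* v, const float* pbest, const float* gbest,
                                   const float* r1, const float* r2,
                                   int n_particles, int dim, float w, float c1, float c2) {
          int idx = blockIdx.x * blockDim.x + threadIdx.x;
          if (idx >= n_particles * dim) return;
          int d = idx % dim;                                   // dimension handled by this thread
          v[idx] = w * v[idx]
                 + c1 * r1[idx] * (pbest[idx] - x[idx])        // cognitive pull toward personal best
                 + c2 * r2[idx] * (gbest[d]   - x[idx]);       // social pull toward global best
          x[idx] += v[idx];
        }
        // Fitness evaluation is launched as a separate kernel and the global best is found
        // with a parallel reduction, so the whole PSO iteration stays on the GPU.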

  17. Fast Clustering of Radar Reflectivity Data Using GPU-CPU Pipeline Scheme%基于GPU-CPU流水线的雷达回波快速聚类

    Institute of Scientific and Technical Information of China (English)

    周伟; 施宁; 王健; 汪群山

    2012-01-01

    In our meteorological application, a clustering algorithm is used to analyze and process radar reflectivity data. Faced with a large dataset and a high-dimensional feature vector, the clustering algorithm is too time-consuming to satisfy the real-time constraints of our application. This paper proposes a parallelized clustering algorithm using a GPU-CPU pipeline scheme to solve this problem. Our method exploits the asynchronous execution of the GPU and CPU and organizes the steps of the clustering process into a pipeline, which exposes much of the parallelism in the algorithm. The experimental results show that our GPU-CPU pipeline-based parallel clustering algorithm outperforms a conventionally parallelized CUDA clustering algorithm without the GPU-CPU pipeline by 38%. Compared to the serial code on the CPU, our approach achieves a 47x speedup, which satisfies the real-time requirements of meteorological analysis applications.
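
    The pipelining idea can be sketched with CUDA streams and events: while the GPU clusters chunk i, the CPU post-processes chunk i-1, so the two stages overlap in time. Function and buffer names below are illustrative assumptions, not the authors' code.

        #include <cuda_runtime.h>
        #include <vector>

        __global__ void gpu_cluster_stage(const float* in, int* labels, int n);  // GPU stage of one chunk
        void cpu_merge_stage(const int* labels, int n);                          // CPU stage of one chunk

        // Two-stage GPU->CPU pipeline over data chunks. Host buffers are assumed to be
        // pinned (cudaHostAlloc) so the asynchronous copies can overlap with computation.
        void pipelined_clustering(float** h_in, float** d_in, int** d_lab, int** h_lab,
                                  int n_chunks, int n) {
          cudaStream_t s;
          cudaStreamCreate(&s);
          std::vector<cudaEvent_t> done(n_chunks);
          for (int i = 0; i < n_chunks; ++i) cudaEventCreate(&done[i]);

          for (int i = 0; i < n_chunks; ++i) {
            cudaMemcpyAsync(d_in[i], h_in[i], n * sizeof(float), cudaMemcpyHostToDevice, s);
            gpu_cluster_stage<<<(n + 255) / 256, 256, 0, s>>>(d_in[i], d_lab[i], n);
            cudaMemcpyAsync(h_lab[i], d_lab[i], n * sizeof(int), cudaMemcpyDeviceToHost, s);
            cudaEventRecord(done[i], s);
            if (i > 0) {                           // overlap: CPU handles the previous chunk
              cudaEventSynchronize(done[i - 1]);
              cpu_merge_stage(h_lab[i - 1], n);
            }
          }
          cudaEventSynchronize(done[n_chunks - 1]);
          cpu_merge_stage(h_lab[n_chunks - 1], n);  // drain the pipeline
          for (int i = 0; i < n_chunks; ++i) cudaEventDestroy(done[i]);
          cudaStreamDestroy(s);
        }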

  18. Reconstruction of the neutron spectrum using an artificial neural network in CPU and GPU; Reconstruccion del espectro de neutrones usando una red neuronal artificial (RNA) en CPU y GPU

    Energy Technology Data Exchange (ETDEWEB)

    Hernandez D, V. M.; Moreno M, A.; Ortiz L, M. A. [Universidad de Cordoba, 14002 Cordoba (Spain); Vega C, H. R.; Alonso M, O. E., E-mail: vic.mc68010@gmail.com [Universidad Autonoma de Zacatecas, 98000 Zacatecas, Zac. (Mexico)

    2016-10-15

    The computing power of personal computers has been increasing steadily; computers now have several processor cores in the CPU and, in addition, multiple CUDA cores in the graphics processing unit (GPU). Both systems can be used individually or combined to perform scientific computation without resorting to multiprocessor arrangements or supercomputers. The Bonner sphere spectrometer is the most commonly used multi-element system for neutron detection and for determining the associated neutron spectrum. Each sphere-detector combination gives a particular response that depends on the energy of the neutrons, and the total set of these responses is known as the response matrix Rφ(E). The counting rates obtained with each sphere and the neutron spectrum are thus related through the discrete version of the Fredholm equation. Reconstructing the spectrum therefore requires solving a poorly conditioned system of equations with an infinite number of solutions, and to find an appropriate solution the use of artificial intelligence through neural networks, implemented on both CPU and GPU platforms, has been proposed. (Author)

  19. Multi-threaded acceleration of ORBIT code on CPU and GPU with minimal modifications

    Science.gov (United States)

    Qu, Ante; Ethier, Stephane; Feibush, Eliot; White, Roscoe

    2013-10-01

    The guiding center code ORBIT was originally developed 30 years ago to study the drift orbit effects of charged particles in the strong equilibrium magnetic fields of tokamaks. Today, ORBIT remains a very active tool in magnetic confinement fusion research and continues to adapt to the latest toroidal devices, such as the NSTX-Upgrade, for which it plays a very important role in the study of energetic particle effects. Although the capabilities of ORBIT have improved throughout the years, the code still remains a serial application, which has now become an impediment to the lengthy simulations required for the NSTX-U project. In this work, multi-threaded parallelism is introduced in the core of the code with the goal of achieving the largest performance improvement while minimizing changes made to the source code. To that end, we introduce preprocessor directives in the most compute-intensive parts of the code, which constitute the stable core that seldom changes. Standard OpenMP directives are used for shared-memory CPU multi-threading while newly developed OpenACC (www.openacc.org) directives are used for GPU (Graphical Processing Unit) multi-threading. Implementation details and performance results are presented.
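
    A minimal sketch of the directive-based approach described above: the same (illustrative) particle-push loop carries a standard OpenMP directive for shared-memory CPU threading and an OpenACC directive for GPU offload, so the numerical core itself is left untouched. The loop body is a placeholder, not ORBIT source.

        // Compiled with -fopenmp the loop runs on CPU threads; compiled with an OpenACC
        // compiler (e.g. -acc) it is offloaded to the GPU. Only the directives differ.
        void push_particles(double* x, double* v, const double* a, double dt, int n) {
        #ifdef _OPENACC
          #pragma acc parallel loop copy(x[0:n], v[0:n]) copyin(a[0:n])
        #else
          #pragma omp parallel for
        #endif
          for (int i = 0; i < n; ++i) {   // particles are independent, so the loop parallelizes cleanly
            v[i] += a[i] * dt;
            x[i] += v[i] * dt;
          }
        }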

  20. Discrepancy Between Clinician and Research Assistant in TIMI Score Calculation (TRIAGED CPU)

    Directory of Open Access Journals (Sweden)

    Taylor, Brian T.

    2014-11-01

    Introduction: Several studies have attempted to demonstrate that the Thrombolysis in Myocardial Infarction (TIMI) risk score has the ability to risk stratify emergency department (ED) patients with potential acute coronary syndromes (ACS). Most of the studies we reviewed relied on trained research investigators to determine TIMI risk scores rather than ED providers functioning in their normal work capacity. We assessed whether TIMI risk scores obtained by ED providers in the setting of a busy ED differed from those obtained by trained research investigators. Methods: This was an ED-based prospective observational cohort study comparing TIMI scores obtained by 49 ED providers admitting patients to an ED chest pain unit (CPU) to scores generated by a team of trained research investigators. We examined provider type, patient gender, and TIMI elements for their effects on TIMI risk score discrepancy. Results: Of the 501 adult patients enrolled in the study, 29.3% of TIMI risk scores determined by ED providers and trained research investigators were generated using identical TIMI risk score variables. In our low-risk population the majority of TIMI risk score differences were small; however, 12% of TIMI risk scores differed by two or more points. Conclusion: TIMI risk scores determined by ED providers in the setting of a busy ED frequently differ from scores generated by trained research investigators who complete them while not under the same pressure as an ED provider. [West J Emerg Med. 2015;16(1):24–33.]

  1. Discrepancy between clinician and research assistant in TIMI score calculation (TRIAGED CPU).

    Science.gov (United States)

    Taylor, Brian T; Mancini, Michelino

    2015-01-01

    Several studies have attempted to demonstrate that the Thrombolysis in Myocardial Infarction (TIMI) risk score has the ability to risk stratify emergency department (ED) patients with potential acute coronary syndromes (ACS). Most of the studies we reviewed relied on trained research investigators to determine TIMI risk scores rather than ED providers functioning in their normal work capacity. We assessed whether TIMI risk scores obtained by ED providers in the setting of a busy ED differed from those obtained by trained research investigators. This was an ED-based prospective observational cohort study comparing TIMI scores obtained by 49 ED providers admitting patients to an ED chest pain unit (CPU) to scores generated by a team of trained research investigators. We examined provider type, patient gender, and TIMI elements for their effects on TIMI risk score discrepancy. Of the 501 adult patients enrolled in the study, 29.3% of TIMI risk scores determined by ED providers and trained research investigators were generated using identical TIMI risk score variables. In our low-risk population the majority of TIMI risk score differences were small; however, 12% of TIMI risk scores differed by two or more points. TIMI risk scores determined by ED providers in the setting of a busy ED frequently differ from scores generated by trained research investigators who complete them while not under the same pressure of an ED provider.
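
    For readers unfamiliar with the score being compared above, the sketch below computes a TIMI risk score from its seven binary elements (one point each, total 0-7), as commonly described for unstable angina/NSTEMI; it is an illustration, not the study's data-collection instrument.

        // Seven binary TIMI elements, one point each; a two-point discrepancy of the kind
        // reported above corresponds to two elements being judged differently by the
        // ED provider and the research investigator.
        struct TimiElements {
          bool age_65_or_older;
          bool three_or_more_cad_risk_factors;   // e.g. family history, HTN, high cholesterol, diabetes, smoking
          bool known_cad_stenosis_50_percent;
          bool aspirin_use_last_7_days;
          bool severe_angina_2_episodes_24h;
          bool st_deviation_0_5mm;
          bool elevated_cardiac_markers;
        };

        int timi_score(const TimiElements& e) {
          return e.age_65_or_older + e.three_or_more_cad_risk_factors +
                 e.known_cad_stenosis_50_percent + e.aspirin_use_last_7_days +
                 e.severe_angina_2_episodes_24h + e.st_deviation_0_5mm +
                 e.elevated_cardiac_markers;
        }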

  2. CPU--Constructing Physics Understanding[TM]. [CD-ROM].

    Science.gov (United States)

    2000

    This CD-ROM consists of simulation software that allows students to conduct countless experiments using 20 Java simulators and curriculum units that explore light and color, forces and motion, sound and waves, static electricity and magnetism, current electricity, the nature of matter, and a unit on underpinnings. Setups can be designed by the…

  3. Deployment and optimization of biomacromolecule molecular dynamics simulation process on a hundred-trillion-flops cluster%生物大分子的分子动力学模拟过程在百万亿次集群上的部署优化

    Institute of Scientific and Technical Information of China (English)

    潘龙强; 耿存亮; 慕宇光; 刘鑫; 胡毅; 潘景山; 周亚滨; 龚斌; 王禄山

    2012-01-01

    As a novel tool for studying the functions and properties of biomacromolecules, molecular dynamics has been extensively used to investigate the dynamic behavior of proteins and nucleic acids. However, the time scale accessible to conventional molecular dynamics simulation remains relatively short and cannot reach the range needed for valid sampling of biomacromolecular dynamics. Temperature replica exchange molecular dynamics runs multiple independent simulations synchronously and markedly extends the accessible simulation time, but it requires thousands to tens of thousands of compute cores, and the simulation systems reported in the published literature have so far used comparatively small computing resources. Using the hundred-trillion-flops Sunway 4000A cluster at the National Supercomputing Center in Jinan, we first ran single-replica molecular dynamics and then ran temperature replica exchange molecular dynamics with up to 128 replicas on the catalytic domain of an exocellulase (about 50,000 atoms). A single simulation task successfully used up to 6,720 CPU cores simultaneously, and the total computing speed accumulated to 2,274 ns/d, which offers a new approach to running molecular dynamics simulations on tens of thousands of cores.

  4. Improving the execution performance of FreeSurfer : a new scheduled pipeline scheme for optimizing the use of CPU and GPU resources.

    Science.gov (United States)

    Delgado, J; Moure, J C; Vives-Gilabert, Y; Delfino, M; Espinosa, A; Gómez-Ansón, B

    2014-07-01

    A scheme to significantly speed up the processing of MRI with FreeSurfer (FS) is presented. The scheme is aimed at maximizing the productivity (number of subjects processed per unit time) for the use case of research projects with datasets involving many acquisitions. The scheme combines the already existing GPU-accelerated version of the FS workflow with a task-level parallel scheme supervised by a resource scheduler. This allows for an optimum utilization of the computational power of a given hardware platform while avoiding problems with shortages of platform resources. The scheme can be executed on a wide variety of platforms, as its implementation only involves the script that orchestrates the execution of the workflow components and the FS code itself requires no modifications. The scheme has been implemented and tested on a commodity platform within the reach of most research groups (a personal computer with four cores and an NVIDIA GeForce 480 GTX graphics card). Using the scheduled task-level parallel scheme, a productivity above 0.6 subjects per hour is achieved on the test platform, corresponding to a speedup of over six times compared to the default CPU-only serial FS workflow.

  5. hybridMANTIS: a CPU-GPU Monte Carlo method for modeling indirect x-ray detectors with columnar scintillators.

    Science.gov (United States)

    Sharma, Diksha; Badal, Andreu; Badano, Aldo

    2012-04-21

    The computational modeling of medical imaging systems often requires obtaining a large number of simulated images with low statistical uncertainty which translates into prohibitive computing times. We describe a novel hybrid approach for Monte Carlo simulations that maximizes utilization of CPUs and GPUs in modern workstations. We apply the method to the modeling of indirect x-ray detectors using a new and improved version of the code MANTIS, an open source software tool used for the Monte Carlo simulations of indirect x-ray imagers. We first describe a GPU implementation of the physics and geometry models in fastDETECT2 (the optical transport model) and a serial CPU version of the same code. We discuss its new features like on-the-fly column geometry and columnar crosstalk in relation to the MANTIS code, and point out areas where our model provides more flexibility for the modeling of realistic columnar structures in large area detectors. Second, we modify PENELOPE (the open source software package that handles the x-ray and electron transport in MANTIS) to allow direct output of location and energy deposited during x-ray and electron interactions occurring within the scintillator. This information is then handled by optical transport routines in fastDETECT2. A load balancer dynamically allocates optical transport showers to the GPU and CPU computing cores. Our hybridMANTIS approach achieves a significant speed-up factor of 627 when compared to MANTIS and of 35 when compared to the same code running only in a CPU instead of a GPU. Using hybridMANTIS, we successfully hide hours of optical transport time by running it in parallel with the x-ray and electron transport, thus shifting the computational bottleneck from optical to x-ray transport. The new code requires much less memory than MANTIS and, as a result, allows us to efficiently simulate large area detectors.

  6. Parental Involvement and Work Schedules: Time with Children in the United States, Germany, Norway, and the United Kingdom.

    Science.gov (United States)

    Hook, Jennifer L; Wolfe, Christina M

    2013-06-01

    We examine variation in parents' time with children by work schedule in two-parent families, utilizing time use surveys from the United States (2003), Germany (2001), Norway (2000), and the United Kingdom (2000) (N = 6,835). We find that American fathers working the evening shift spend more time alone with children regardless of mothers' employment status, whereas this association is conditional on mothers' employment in the United Kingdom and Germany. We find no evidence that Norwegian fathers working the evening shift spend more time alone with children. We conclude that a consequence of evening work often viewed as positive for children - fathers spending more time with children - is sensitive to both household employment arrangements and country context.

  7. Comparing the Consumption of CPU Hours with Scientific Output for the Extreme Science and Engineering Discovery Environment (XSEDE).

    Science.gov (United States)

    Knepper, Richard; Börner, Katy

    2016-01-01

    This paper presents the results of a study that compares resource usage with publication output using data about the consumption of CPU cycles from the Extreme Science and Engineering Discovery Environment (XSEDE) and resulting scientific publications for 2,691 institutions/teams. Specifically, the datasets comprise a total of 5,374,032,696 central processing unit (CPU) hours run in XSEDE during July 1, 2011 to August 18, 2015 and 2,882 publications that cite the XSEDE resource. Three types of studies were conducted: a geospatial analysis of XSEDE providers and consumers, co-authorship network analysis of XSEDE publications, and bi-modal network analysis of how XSEDE resources are used by different research fields. Resulting visualizations show that a diverse set of consumers make use of XSEDE resources, that users of XSEDE publish together frequently, and that the users of XSEDE with the highest resource usage tend to be "traditional" high-performance computing (HPC) community members from astronomy, atmospheric science, physics, chemistry, and biology.

  8. Comparing the Consumption of CPU Hours with Scientific Output for the Extreme Science and Engineering Discovery Environment (XSEDE).

    Directory of Open Access Journals (Sweden)

    Richard Knepper

    This paper presents the results of a study that compares resource usage with publication output using data about the consumption of CPU cycles from the Extreme Science and Engineering Discovery Environment (XSEDE) and resulting scientific publications for 2,691 institutions/teams. Specifically, the datasets comprise a total of 5,374,032,696 central processing unit (CPU) hours run in XSEDE during July 1, 2011 to August 18, 2015 and 2,882 publications that cite the XSEDE resource. Three types of studies were conducted: a geospatial analysis of XSEDE providers and consumers, co-authorship network analysis of XSEDE publications, and bi-modal network analysis of how XSEDE resources are used by different research fields. Resulting visualizations show that a diverse set of consumers make use of XSEDE resources, that users of XSEDE publish together frequently, and that the users of XSEDE with the highest resource usage tend to be "traditional" high-performance computing (HPC) community members from astronomy, atmospheric science, physics, chemistry, and biology.

  9. Comparison of GPU- and CPU-implementations of mean-firing rate neural networks on parallel hardware.

    Science.gov (United States)

    Dinkelbach, Helge Ülo; Vitay, Julien; Beuth, Frederik; Hamker, Fred H

    2012-01-01

    Modern parallel hardware such as multi-core processors (CPUs) and graphics processing units (GPUs) have a high computational power which can be greatly beneficial to the simulation of large-scale neural networks. Over the past years, a number of efforts have focused on developing parallel algorithms and simulators best suited for the simulation of spiking neural models. In this article, we aim at investigating the advantages and drawbacks of the CPU and GPU parallelization of mean-firing rate neurons, widely used in systems-level computational neuroscience. By comparing OpenMP, CUDA and OpenCL implementations towards a serial CPU implementation, we show that GPUs are better suited than CPUs for the simulation of very large networks, but that smaller networks would benefit more from an OpenMP implementation. As this performance strongly depends on data organization, we analyze the impact of various factors such as data structure, memory alignment and floating precision. We then discuss the suitability of the different hardware depending on the networks' size and connectivity, as random or sparse connectivities in mean-firing rate networks tend to break parallel performance on GPUs due to the violation of coalescence.

  10. Model Predictive Obstacle Avoidance and Wheel Allocation Control of Mobile Robots Using Embedded CPU

    Science.gov (United States)

    Takahashi, Naoki; Nonaka, Kenichiro

    In this study, we propose a real-time model predictive control method for leg/wheel mobile robots that simultaneously achieves obstacle avoidance and flexible wheel placement. The proposed method generates both the obstacle avoidance path and dynamic wheel positions, and controls the heading angle according to the slope of the predicted path so that the robot keeps a good balance between stability and mobility in narrow and complex spaces such as indoor environments. Moreover, we reduce the computational effort of the algorithm by eliminating calls to mathematical library functions in the repetitive numerical computation, so the proposed real-time optimization method can be applied to the low-speed on-board CPUs used in commercially produced vehicles. We conducted experiments to verify the efficacy and feasibility of a real-time implementation of the proposed method, using a leg/wheel mobile robot equipped with two laser range finders for obstacle detection and an embedded CPU whose clock speed is only 80 MHz. The experiments indicate that the proposed method achieves improved obstacle avoidance compared with the previous method, in the sense that it generates an avoidance path with balanced allocation of the right- and left-side wheels.
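
    The point about removing mathematical library calls from the repetitive computation can be illustrated with a precomputed trigonometric table: the expensive sin/cos evaluations happen once at start-up, and the prediction loop only performs table lookups. Table size and the wrapping scheme are illustrative assumptions, not the authors' implementation.

        #include <cmath>
        #include <vector>

        // Precomputed sine/cosine table: library calls occur once in the constructor,
        // so the repeated evaluations inside the control loop reduce to array reads.
        class TrigTable {
         public:
          explicit TrigTable(int n = 1024) : n_(n), sin_(n), cos_(n) {
            const double two_pi = 6.283185307179586;
            for (int i = 0; i < n; ++i) {
              sin_[i] = std::sin(two_pi * i / n);
              cos_[i] = std::cos(two_pi * i / n);
            }
          }
          double fast_sin(double angle) const { return sin_[index(angle)]; }
          double fast_cos(double angle) const { return cos_[index(angle)]; }
         private:
          int index(double angle) const {               // map angle onto [0, 2*pi) and pick a slot
            double t = angle / 6.283185307179586;
            t -= std::floor(t);
            return static_cast<int>(t * n_) % n_;
          }
          int n_;
          std::vector<double> sin_, cos_;
        };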

  11. SPILADY: A parallel CPU and GPU code for spin-lattice magnetic molecular dynamics simulations

    Science.gov (United States)

    Ma, Pui-Wai; Dudarev, S. L.; Woo, C. H.

    2016-10-01

    Spin-lattice dynamics generalizes molecular dynamics to magnetic materials, where dynamic variables describing an evolving atomic system include not only coordinates and velocities of atoms but also directions and magnitudes of atomic magnetic moments (spins). Spin-lattice dynamics simulates the collective time evolution of spins and atoms, taking into account the effect of non-collinear magnetism on interatomic forces. Applications of the method include atomistic models for defects, dislocations and surfaces in magnetic materials, thermally activated diffusion of defects, magnetic phase transitions, and various magnetic and lattice relaxation phenomena. Spin-lattice dynamics retains all the capabilities of molecular dynamics, adding to them the treatment of non-collinear magnetic degrees of freedom. The spin-lattice dynamics time integration algorithm uses symplectic Suzuki-Trotter decomposition of atomic coordinate, velocity and spin evolution operators, and delivers highly accurate numerical solutions of dynamic evolution equations over extended intervals of time. The code is parallelized in coordinate and spin spaces, and is written in OpenMP C/C++ for CPU and in CUDA C/C++ for Nvidia GPU implementations. Temperatures of atoms and spins are controlled by Langevin thermostats. Conduction electrons are treated by coupling the discrete spin-lattice dynamics equations for atoms and spins to the heat transfer equation for the electrons. Worked examples include simulations of thermalization of ferromagnetic bcc iron, the dynamics of laser pulse demagnetization, and collision cascades.

  12. Research on Programming Based on CPU/GPU Heterogeneous Computing Cluster%基于CPU/GPU集群的编程的研究

    Institute of Scientific and Technical Information of China (English)

    刘钢锋

    2013-01-01

    随着微处理器技术的发展,GPU/CPU的混合计算已经成为是科学计算的主流趋势.本文从编程的层面,介绍了如何利用已有的并行编程语言来,调度GPU的计算功能,主要以MPI(一种消息传递编程模型)与基于GPU的CUDA(统一计算设备架构)编程模型相结合的方式进行GPU集群程序的测试,并分析了CPU/GPU集群并行环境下的运行特点.从分析的特点中总结出GPU集群较优策略,从而为提高CPU/GPU并行程序性能提供科学依据.%With the fast development in computer and microprocessor, Scientific Computing using CPU/GPU hybrid computing cluster has become a tendency. In this paper, from programming point of view, we propose the method of GPU scheduling to improve calculation efficiency. The main methods are through the combination of MPI (Message Passing Interface) and CUDA (Compute Unified Device Architecture) based on GPU to program. According to running condition of the parallel program, the characteristic of CPU/GPU hybrid computing cluster is analyzed. From the characteristic, the optimization strategy of parallel programs is found. So, the strategy will provide basis for improving the CPU/GPU parallel program.

  13. Inhibition of CPU0213, a Dual Endothelin Receptor Antagonist, on Apoptosis via Nox4-Dependent ROS in HK-2 Cells

    Directory of Open Access Journals (Sweden)

    Qing Li

    2016-06-01

    Background/Aims: Our previous studies have indicated that a novel endothelin receptor antagonist, CPU0213, effectively normalized renal function in diabetic nephropathy. However, the molecular mechanisms mediating the nephroprotective role of CPU0213 remain unknown. Methods and Results: In the present study, we first examined the effect of CPU0213 on apoptosis in human renal tubular epithelial cells (HK-2). High glucose significantly increased the protein expression of Bax and decreased Bcl-2 protein in HK-2 cells, and this was reversed by CPU0213. The percentage of HK-2 cells showing Annexin V-FITC binding was markedly suppressed by CPU0213, which confirmed its inhibitory effect on apoptosis. Given the regulatory link between the endothelin (ET) system and oxidative stress, we determined the role of redox signaling in the effect of CPU0213 on apoptosis. The production of superoxide (O2·-) was substantially attenuated by CPU0213 treatment in HK-2 cells. We further found that CPU0213 dramatically inhibited the expression of Nox4 protein, and Nox4 gene silencing mimicked the effect of CPU0213 on apoptosis under high glucose stimulation. We finally examined the effect of CPU0213 on ET-1 receptors and found that high glucose-induced protein expression of endothelin A and B receptors was dramatically inhibited by CPU0213. Conclusion: Taken together, these results suggest that Nox4-dependent O2·- production is critical for the apoptosis of HK-2 cells in high glucose. The endothelin receptor antagonist CPU0213 has an anti-apoptotic effect through Nox4-dependent O2·- production, which underlies the nephroprotective role of CPU0213 in diabetic nephropathy.

  14. Mesh-particle interpolations on graphics processing units and multicore central processing units.

    Science.gov (United States)

    Rossinelli, Diego; Conti, Christian; Koumoutsakos, Petros

    2011-06-13

    Particle-mesh interpolations are fundamental operations for particle-in-cell codes, as implemented in vortex methods, plasma dynamics and electrostatics simulations. In these simulations, the mesh is used to solve the field equations and the gradients of the fields are used in order to advance the particles. The time integration of particle trajectories is performed through an extensive resampling of the flow field at the particle locations. The computational performance of this resampling turns out to be limited by the memory bandwidth of the underlying computer architecture. We investigate how mesh-particle interpolation can be efficiently performed on graphics processing units (GPUs) and multicore central processing units (CPUs), and we present two implementation techniques. The single-precision results for the multicore CPU implementation show an acceleration of 45-70×, depending on system size, and an acceleration of 85-155× for the GPU implementation over an efficient single-threaded C++ implementation. In double precision, we observe a performance improvement of 30-40× for the multicore CPU implementation and 20-45× for the GPU implementation. With respect to the 16-threaded standard C++ implementation, the present CPU technique leads to a performance increase of roughly 2.8-3.7× in single precision and 1.7-2.4× in double precision, whereas the GPU technique leads to an improvement of 9× in single precision and 2.2-2.8× in double precision.
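
    The particle-to-mesh half of the interpolation discussed above can be sketched with a 2D cloud-in-cell scheme: each particle's quantity is spread onto the four surrounding mesh nodes with bilinear weights. Grid spacing and array names are illustrative assumptions, and the scattered writes in the loop are exactly the memory-bandwidth-bound accesses that the record's GPU and multicore implementations reorganize.

        #include <cmath>
        #include <cstddef>
        #include <vector>

        // 2D cloud-in-cell particle-to-mesh deposition (illustrative sketch).
        // mesh is a row-major nx*ny array; h is the grid spacing.
        void particle_to_mesh(const std::vector<double>& xp, const std::vector<double>& yp,
                              const std::vector<double>& qp, std::vector<double>& mesh,
                              int nx, int ny, double h) {
          for (std::size_t p = 0; p < xp.size(); ++p) {
            double gx = xp[p] / h, gy = yp[p] / h;              // position in grid units
            int i = static_cast<int>(std::floor(gx));
            int j = static_cast<int>(std::floor(gy));
            if (i < 0 || j < 0 || i + 1 >= nx || j + 1 >= ny) continue;
            double fx = gx - i, fy = gy - j;                    // fractional offsets inside the cell
            mesh[j * nx + i]           += qp[p] * (1 - fx) * (1 - fy);
            mesh[j * nx + i + 1]       += qp[p] * fx       * (1 - fy);
            mesh[(j + 1) * nx + i]     += qp[p] * (1 - fx) * fy;
            mesh[(j + 1) * nx + i + 1] += qp[p] * fx       * fy;
          }
          // GPU and multithreaded CPU versions avoid write conflicts on these scattered
          // updates with per-thread buffers or atomic additions.
        }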

  15. Full domain-decomposition scheme for diffuse optical tomography of large-sized tissues with a combined CPU and GPU parallelization.

    Science.gov (United States)

    Yi, Xi; Wang, Xin; Chen, Weiting; Wan, Wenbo; Zhao, Huijuan; Gao, Feng

    2014-05-01

    The common approach to diffuse optical tomography is to solve a nonlinear and ill-posed inverse problem using a linearized iteration process that involves repeated use of the forward and inverse solvers on an appropriately discretized domain of interest. This scheme normally brings severe computation and storage burdens to its applications on large-sized tissues, such as breast tumor diagnosis and brain functional imaging, and prevents from using the matrix-fashioned linear inversions for improved image quality. To cope with the difficulties, we propose in this paper a parallelized full domain-decomposition scheme, which divides the whole domain into several overlapped subdomains and solves the corresponding subinversions independently within the framework of the Schwarz-type iterations, with the support of a combined multicore CPU and multithread graphics processing unit (GPU) parallelization strategy. The numerical and phantom experiments both demonstrate that the proposed method can effectively reduce the computation time and memory occupation for the large-sized problem and improve the quantitative performance of the reconstruction.

  16. Two Multivehicle Routing Problems with Unit-Time Windows

    CERN Document Server

    Frederickson, Greg N

    2011-01-01

    Two multivehicle routing problems are considered in the framework that a visit to a location must take place during a specific time window in order to be counted and all time windows are the same length. In the first problem, the goal is to visit as many locations as possible using a fixed number of vehicles. In the second, the goal is to visit all locations using the smallest number of vehicles possible. For the first problem, we present an approximation algorithm whose output path collects a reward within a constant factor of optimal for any fixed number of vehicles. For the second problem, our algorithm finds a 6-approximation to the problem on a tree metric, whenever a single vehicle could visit all locations during their time windows.

  17. United States forest disturbance trends observed with landsat time series

    Science.gov (United States)

    Jeffrey G. Masek; Samuel N. Goward; Robert E. Kennedy; Warren B. Cohen; Gretchen G. Moisen; Karen Schleweiss; Chengquan. Huang

    2013-01-01

    Disturbance events strongly affect the composition, structure, and function of forest ecosystems; however, existing US land management inventories were not designed to monitor disturbance. To begin addressing this gap, the North American Forest Dynamics (NAFD) project has examined a geographic sample of 50 Landsat satellite image time series to assess trends in forest...

  18. Using hybrid GPU/CPU kernel splitting to accelerate spherical convolutions

    Science.gov (United States)

    Sutter, P. M.; Wandelt, B. D.; Elsner, F.

    2015-06-01

    We present a general method for accelerating by more than an order of magnitude the convolution of pixelated functions on the sphere with a radially-symmetric kernel. Our method splits the kernel into a compact real-space component and a compact spherical harmonic space component. These components can then be convolved in parallel using an inexpensive commodity GPU and a CPU. We provide models for the computational cost of both real-space and Fourier space convolutions and an estimate for the approximation error. Using these models we can determine the optimum split that minimizes the wall clock time for the convolution while satisfying the desired error bounds. We apply this technique to the problem of simulating a cosmic microwave background (CMB) anisotropy sky map at the resolution typical of the high resolution maps produced by the Planck mission. For the main Planck CMB science channels we achieve a speedup of over a factor of ten, assuming an acceptable fractional rms error of order 10⁻⁵ in the power spectrum of the output map.

  19. The effect of family structure on parents' child care time in the United States and the United Kingdom

    OpenAIRE

    Kalenkoski, Charlene Marie; Ribar, David C.; Stratton, Leslie Sundt

    2006-01-01

    We use time-diary data from the 2003 and 2004 American Time Use Surveys and the 2000 United Kingdom Time Use Study to estimate the effect of family structure on the time mothers and fathers spend on primary and passive child care and on market work, using a system of correlated Tobit equations and family structure equations. Estimates from these models indicate that single parents in both countries spend more time in child care than married or cohabiting parents. There are differences, howeve...

  20. Air pollution modelling using a graphics processing unit with CUDA

    CERN Document Server

    Molnar, Ferenc; Meszaros, Robert; Lagzi, Istvan; 10.1016/j.cpc.2009.09.008

    2010-01-01

    The Graphics Processing Unit (GPU) is a powerful tool for parallel computing. In the past years the performance and capabilities of GPUs have increased, and the Compute Unified Device Architecture (CUDA) - a parallel computing architecture - has been developed by NVIDIA to utilize this performance in general purpose computations. Here we show for the first time a possible application of GPU for environmental studies serving as a basis for decision-making strategies. A stochastic Lagrangian particle model has been developed on CUDA to estimate the transport and the transformation of the radionuclides from a single point source during an accidental release. Our results show that the parallel implementation achieves typical acceleration values in the order of 80-120 times compared to CPU using a single-threaded implementation on a 2.33 GHz desktop computer. Only very small differences have been found between the results obtained from GPU and CPU simulations, which are comparable with the effect of stochastic tran...

  1. The Impact of STOV (Standort Verwaltung) on Unit Time Utilization

    Science.gov (United States)

    1981-05-01

    U.S. Army Research Institute for the Behavioral and Social Sciences, Alexandria, Virginia, May 1981. Certain activities were identified both as primary foci of STOV support and as significant distractors from mission-oriented activities; they included (1) building custodial and repair ... time. The assignment of a task, unless strictly monitored, may have less to do with what is actually done than ... (Table 5, Comparison of Diary and ...)

  2. The effect of light curing units, curing time, and veneering materials on resin cement microhardness

    Directory of Open Access Journals (Sweden)

    Nurcan Ozakar Ilday

    2013-06-01

    Conclusion: Light-curing units, curing time, and veneering materials are important factors for achieving adequate dual cure resin composite microhardness. High-intensity light and longer curing times resulted in the highest microhardness values.

  3. 12 CFR 561.54 - United States Treasury Time Deposit Open Account.

    Science.gov (United States)

    2010-01-01

    12 CFR 561.54 (2010-01-01): United States Treasury Time Deposit Open Account. Title 12, Banks and Banking; Office of Thrift Supervision, Department of the Treasury; Definitions for Regulations Affecting All Savings Associations. § 561.54 United States Treasury Time...

  4. SwiftLink: parallel MCMC linkage analysis using multicore CPU and GPU.

    Science.gov (United States)

    Medlar, Alan; Głowacka, Dorota; Stanescu, Horia; Bryson, Kevin; Kleta, Robert

    2013-02-15

    Linkage analysis remains an important tool in elucidating the genetic component of disease and has become even more important with the advent of whole exome sequencing, enabling the user to focus on only those genomic regions co-segregating with Mendelian traits. Unfortunately, methods to perform multipoint linkage analysis scale poorly with either the number of markers or with the size of the pedigree. Large pedigrees with many markers can only be evaluated with Markov chain Monte Carlo (MCMC) methods that are slow to converge and, as no attempts have been made to exploit parallelism, massively underuse available processing power. Here, we describe SWIFTLINK, a novel application that performs MCMC linkage analysis by spreading the computational burden between multiple processor cores and a graphics processing unit (GPU) simultaneously. SWIFTLINK was designed around the concept of explicitly matching the characteristics of an algorithm with the underlying computer architecture to maximize performance. We implement our approach using existing Gibbs samplers redesigned for parallel hardware. We applied SWIFTLINK to a real-world dataset, performing parametric multipoint linkage analysis on a highly consanguineous pedigree with EAST syndrome, containing 28 members, where a subset of individuals were genotyped with single nucleotide polymorphisms (SNPs). In our experiments with a four core CPU and GPU, SWIFTLINK achieves an 8.5× speed-up over the single-threaded version and a 109× speed-up over the popular linkage analysis program SIMWALK. SWIFTLINK is available at https://github.com/ajm/swiftlink. All source code is licensed under GPLv3.

  5. Experimental and Transient Thermal Analysis of Heat Sink Fin for CPU processor for better performance

    Science.gov (United States)

    Ravikumar, S.; Subash Chandra, Parisaboina; Harish, Remella; Sivaji, Tallapaneni

    2017-05-01

    The use of digital computers is advancing and their utilization is increasing rapidly, but the reliability of electronic components is critically affected by the temperature at which the junction operates. Designers are forced to shorten overall system dimensions while still extracting the heat and controlling the temperature, which is the focus of electronic cooling studies. In this project, thermal analysis is carried out with the commercial package ANSYS, and the geometric variables and design of the heat sink are varied experimentally to improve thermal performance. The project uses thermal analysis to identify a cooling solution for a desktop computer with a 5 W CPU and shows that the heat sink joined to the CPU is adequate to cool the whole system. The work considers circular cylindrical pin fin and rectangular plate heat sink fin designs with an aluminium base plate, together with the control of CPU heat sink processes.

  6. Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems.

    Science.gov (United States)

    Teodoro, George; Kurc, Tahsin M; Pan, Tony; Cooper, Lee A D; Kong, Jun; Widener, Patrick; Saltz, Joel H

    2012-05-01

    The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches.

  7. Unit root tests in time series, v.1 key concepts and problems

    CERN Document Server

    Patterson, Kerry, Professor

    2011-01-01

    Testing for a unit root is now an essential part of time series analysis. Indeed no time series study in economics, and other disciplines that use time series observations, can ignore the crucial issue of nonstationarity caused by a unit root. However, the literature on the topic is large and often technical, making it difficult to understand the key practical issues. This volume provides an accessible introduction and a critical overview of tests for a unit root in time series, with extensive practical examples and illustrations using simulation analysis. It presents the concepts that enable

  8. Finite differences numerical method for two-dimensional superlattice Boltzmann transport equation and case comparison of CPU(C) and GPGPU(CUDA) implementations

    CERN Document Server

    Priimak, Dmitri

    2014-01-01

    We present a finite-difference numerical algorithm for solving the 2D spatially homogeneous Boltzmann transport equation for semiconductor superlattices (SL) subject to a time-dependent electric field along the SL axis and a constant perpendicular magnetic field. The algorithm is implemented in the C language targeted to CPUs and in CUDA C targeted to commodity NVidia GPUs. We compare the performance and merits of the two implementations and discuss various methods of optimization.

  9. Evaluating Access to Eye Care in the Contiguous United States by Calculated Driving Time in the United States Medicare Population.

    Science.gov (United States)

    Lee, Cecilia S; Morris, Aneesha; Van Gelder, Russell N; Lee, Aaron Y

    2016-12-01

    To quantify the proximity to eye care in the contiguous United States by calculating driving routes and driving time using a census-based approach. Cross-sectional study based on United States (US) census data, Medicare payment data, and OpenStreetMap. 2010 US census survey respondents older than 65 years. For each state in the United States, the addresses of all practicing ophthalmologists and optometrists were obtained from the 2012 Medicare Provider Utilization and Payment Data from the Centers for Medicare and Medicaid Services (CMS). The US census data from 2010 then were used to calculate the geolocation of the US population at the block group level and the number of people older than 65 years in each location. Geometries and driving speed limits of every road, street, and highway in the United States from the OpenStreetMap project were used to calculate the exact driving distance and driving time to the nearest eye care provider. Driving time and driving distance to the nearest optometrist and ophthalmologist per state. Driving times for 3.79×10⁷ persons were calculated using a total of 3.88×10⁷ available roads for the 25 508 optometrists and 17 071 ophthalmologists registered with the CMS. Nationally, the median driving times to the nearest optometrist and ophthalmologist were 2.91 and 4.52 minutes, respectively. Ninety percent of the population lives within a 13.66- and 25.21-minute drive, respectively, to the nearest optometrist and ophthalmologist. While there are regional variations, overall more than 90% of the US Medicare beneficiary population lives within a 30-minute drive of an ophthalmologist and within 15 minutes of an optometrist. Copyright © 2016 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  10. Conserved-peptide upstream open reading frames (CPuORFs) are associated with regulatory genes in angiosperms

    Directory of Open Access Journals (Sweden)

    Richard A Jorgensen

    2012-08-01

    Upstream open reading frames (uORFs) are common in eukaryotic transcripts, but those that encode conserved peptides (CPuORFs) occur in less than 1% of transcripts. The peptides encoded by three plant CPuORF families are known to control translation of the downstream ORF in response to a small signal molecule (sucrose, polyamines and phosphocholine). In flowering plants, transcription factors are statistically over-represented among genes that possess CPuORFs, and in general it appeared that many CPuORF genes also had other regulatory functions, though the significance of this suggestion was uncertain (Hayden and Jorgensen, 2007). Five years later the literature provides much more information on the functions of many CPuORF genes. Here we reassess the functions of 27 known CPuORF gene families and find that 22 of these families play a variety of different regulatory roles, from transcriptional control to protein turnover, and from small signal molecules to signal transduction kinases. Clearly then, there is indeed a strong association of CPuORFs with regulatory genes. In addition, 16 of these families play key roles in a variety of different biological processes. Most strikingly, the core sucrose response network includes three different CPuORFs, creating the potential for sophisticated balancing of the network in response to three different molecular inputs. We propose that the function of most CPuORFs is to modulate translation of a downstream major ORF (mORF) in response to a signal molecule recognized by the conserved peptide and that because the mORFs of CPuORF genes generally encode regulatory proteins, many of them centrally important in the biology of plants, CPuORFs play key roles in balancing such regulatory networks.

  11. Toward Performance Portability of the FV3 Weather Model on CPU, GPU and MIC Processors

    Science.gov (United States)

    Govett, Mark; Rosinski, James; Middlecoff, Jacques; Schramm, Julie; Stringer, Lynd; Yu, Yonggang; Harrop, Chris

    2017-04-01

    The U.S. National Weather Service has selected the FV3 (Finite Volume cubed) dynamical core to become part of the its next global operational weather prediction model. While the NWS is preparing to run FV3 operationally in late 2017, NOAA's Earth System Research Laboratory is adapting the model to be capable of running on next-generation GPU and MIC processors. The FV3 model was designed in the 1990s, and while it has been extensively optimized for traditional CPU chips, some code refactoring has been required to expose sufficient parallelism needed to run on fine-grain GPU processors. The code transformations must demonstrate bit-wise reproducible results with the original CPU code, and between CPU, GPU and MIC processors. We will describe the parallelization and performance while attempting to maintain performance portability between CPU, GPU and MIC with the Fortran source code. Performance results will be shown using NOAA's new Pascal based fine-grain GPU system (800 GPUs), and for the Knights Landing processor on the National Science Foundation (NSF) Stampede-2 system.

  12. CPU SIM: A Computer Simulator for Use in an Introductory Computer Organization-Architecture Class.

    Science.gov (United States)

    Skrein, Dale

    1994-01-01

    CPU SIM, an interactive low-level computer simulation package that runs on the Macintosh computer, is described. The program is designed for instructional use in the first or second year of undergraduate computer science, to teach various features of typical computer organization through hands-on exercises. (MSE)

  13. DSM vs. NSM: CPU Performance Tradeoffs in Block-Oriented Query Processing

    NARCIS (Netherlands)

    M. Zukowski (Marcin); N.J. Nes (Niels); P.A. Boncz (Peter)

    2008-01-01

    Comparisons between the merits of row-wise storage (NSM) and columnar storage (DSM) are typically made with respect to the persistent storage layer of database systems. In this paper, however, we focus on the CPU efficiency tradeoffs of tuple representations inside the query execution en

  14. Computational performance comparison of wavefront reconstruction algorithms for the European Extremely Large Telescope on multi-CPU architecture.

    Science.gov (United States)

    Feng, Lu; Fedrigo, Enrico; Béchet, Clémentine; Brunner, Elisabeth; Pirani, Werther

    2012-06-01

    The European Southern Observatory (ESO) is studying the next generation giant telescope, called the European Extremely Large Telescope (E-ELT). With a 42 m diameter primary mirror, it is a significant step from currently existing telescopes. Therefore, the E-ELT with its instruments poses new challenges in terms of cost and computational complexity for the control system, including its adaptive optics (AO). Since the conventional matrix-vector multiplication (MVM) method successfully used so far for AO wavefront reconstruction cannot be efficiently scaled to the size of the AO systems on the E-ELT, faster algorithms are needed. Among the recently developed wavefront reconstruction algorithms, three are studied in this paper from the point of view of design, implementation, and absolute speed on three multicore multi-CPU platforms. We focus on a single-conjugate AO system for the E-ELT. The algorithms are the MVM, the Fourier transform reconstructor (FTR), and the fractal iterative method (FRiM). This study examines the scaling of these algorithms as the number of CPUs involved in the computation increases. We discuss implementation strategies, depending on various CPU architecture constraints, and we present the first quantitative execution times so far at the E-ELT scale. MVM suffers from a large computational burden, making the current computing platform undersized to reach timings short enough for AO wavefront reconstruction. In our study, the FTR currently provides the fastest reconstruction. FRiM is a recently developed algorithm, and several strategies are investigated and presented here in order to implement it for real-time AO wavefront reconstruction and to optimize its execution time. The difficulty of parallelizing the algorithm on such an architecture is highlighted. We also show that FRiM can provide interesting scalability using a sparse matrix approach.

  15. SpaceCubeX: A Framework for Evaluating Hybrid Multi-Core CPU FPGA DSP Architectures

    Science.gov (United States)

    Schmidt, Andrew G.; Weisz, Gabriel; French, Matthew; Flatley, Thomas; Villalpando, Carlos Y.

    2017-01-01

    The SpaceCubeX project is motivated by the need for high performance, modular, and scalable on-board processing to help scientists answer critical 21st century questions about global climate change, air quality, ocean health, and ecosystem dynamics, while adding new capabilities such as low-latency data products for extreme event warnings. These goals translate into on-board processing throughput requirements that are on the order of 100-1,000 times those of previous Earth Science missions for standard processing, compression, storage, and downlink operations. To study possible future architectures to achieve these performance requirements, the SpaceCubeX project provides an evolvable testbed and framework that enables a focused design space exploration of candidate hybrid CPU/FPGA/DSP processing architectures. The framework includes ArchGen, an architecture generator tool populated with candidate architecture components, performance models, and IP cores, that allows an end user to specify the type, number, and connectivity of a hybrid architecture. The framework requires minimal extensions to integrate new processors, such as the anticipated High Performance Spaceflight Computer (HPSC), reducing the time to initiate benchmarking by months. To evaluate the framework, we leverage a wide suite of high performance embedded computing benchmarks and Earth science scenarios to ensure robust architecture characterization. We report on our project's Year 1 efforts and demonstrate the capabilities across four simulation testbed models: a baseline SpaceCube 2.0 system, a dual ARM A9 processor system, a hybrid quad ARM A53 and FPGA system, and a hybrid quad ARM A53 and DSP system.

  16. Fast Parallel Image Registration on CPU and GPU for Diagnostic Classification of Alzheimer's Disease

    Directory of Open Access Journals (Sweden)

    Denis P Shamonin

    2014-01-01

    Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e. for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: (i) parallelization on the CPU, to speed up the cost function derivative calculation; (ii) parallelization on the GPU building on and extending the OpenCL framework from ITKv4, to speed up the Gaussian pyramid computation and the image resampling step; (iii) exploitation of certain properties of the B-spline transformation model; (iv) further software optimizations. The accelerated registration tool is employed in a study on diagnostic classification of Alzheimer's disease and cognitively normal controls based on T1-weighted MRI. We selected 299 participants from the publicly available Alzheimer's Disease Neuroimaging Initiative database. Classification is performed with a support vector machine based on gray matter volumes as a marker for atrophy. We evaluated two types of strategies (voxel-wise and region-wise) that heavily rely on nonrigid image registration. Parallelization and optimization resulted in an acceleration factor of 4-5x on an 8-core machine. Using OpenCL a speedup factor of ~2 was realized for computation of the Gaussian pyramids, and 15-60 for the resampling step, for larger images. The voxel-wise and the region-wise classification methods had an area under the receiver operator characteristic curve of 88% and 90%, respectively, both for standard and accelerated registration. We conclude that the image registration package elastix was substantially accelerated, with nearly identical results to the non-optimized version. The new functionality will become available in the next release of elastix as open source under the BSD license.

  17. Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease.

    Science.gov (United States)

    Shamonin, Denis P; Bron, Esther E; Lelieveldt, Boudewijn P F; Smits, Marion; Klein, Stefan; Staring, Marius

    2013-01-01

    Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e., for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: (i) parallelization on the CPU, to speed up the cost function derivative calculation; (ii) parallelization on the GPU building on and extending the OpenCL framework from ITKv4, to speed up the Gaussian pyramid computation and the image resampling step; (iii) exploitation of certain properties of the B-spline transformation model; (iv) further software optimizations. The accelerated registration tool is employed in a study on diagnostic classification of Alzheimer's disease and cognitively normal controls based on T1-weighted MRI. We selected 299 participants from the publicly available Alzheimer's Disease Neuroimaging Initiative database. Classification is performed with a support vector machine based on gray matter volumes as a marker for atrophy. We evaluated two types of strategies (voxel-wise and region-wise) that heavily rely on nonrigid image registration. Parallelization and optimization resulted in an acceleration factor of 4-5x on an 8-core machine. Using OpenCL a speedup factor of 2 was realized for computation of the Gaussian pyramids, and 15-60 for the resampling step, for larger images. The voxel-wise and the region-wise classification methods had an area under the receiver operator characteristic curve of 88 and 90%, respectively, both for standard and accelerated registration. We conclude that the image registration package elastix was substantially accelerated, with nearly identical results to the non-optimized version. The new functionality will become available in the next release of elastix as open source under the BSD license.
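
    The CPU-side acceleration in (i) parallelizes the cost-function derivative over image samples. The sketch below is a minimal, hypothetical C++/OpenMP illustration of that pattern, not the elastix code; the function sampleDerivative is an invented placeholder. Each thread accumulates partial gradient contributions for a subset of samples, and the partial sums are then merged.

    // Minimal OpenMP sketch of parallelizing a cost-function derivative over image
    // samples; illustrative only, not the elastix implementation.
    #include <cstddef>
    #include <vector>

    // Invented placeholder for the per-sample contribution to the metric derivative.
    static double sampleDerivative(std::size_t sample, std::size_t param) {
        return 0.001 * static_cast<double>((sample + param) % 7);
    }

    std::vector<double> costFunctionDerivative(std::size_t numSamples, std::size_t numParams) {
        std::vector<double> gradient(numParams, 0.0);
        #pragma omp parallel
        {
            std::vector<double> partial(numParams, 0.0);   // thread-private partial sums
            #pragma omp for nowait
            for (long long s = 0; s < static_cast<long long>(numSamples); ++s)
                for (std::size_t p = 0; p < numParams; ++p)
                    partial[p] += sampleDerivative(static_cast<std::size_t>(s), p);
            #pragma omp critical                           // merge the partial gradients
            for (std::size_t p = 0; p < numParams; ++p)
                gradient[p] += partial[p];
        }
        return gradient;
    }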

  18. Permeability of CPU-86017 through the BBB after Intravenous and Intracerebroventricular Injection in Mice

    Institute of Scientific and Technical Information of China (English)

    查玛拉; 林生; 戴德哉

    2003-01-01

    AIM: To investigate the bi-directional penetration of CPU-86017 across the BBB (blood brain barrier) following iv and icv (intracerebroventricular) administration in mice. METHOD: The levels of CPU-86017 (p-chlorobenzyltetrahydroberberine hydrochloride) in the brain, heart, kidney and blood of mice after acute administration of 3.0 mg/kg of CPU-86017 were measured by a validated HPLC assay at several time points: 5, 10, 20, 30 and 60 minutes. RESULT: The maximum concentrations of CPU-86017 in the brain, heart, kidney and plasma, achieved at 10 minutes by both routes of administration, were 0.83±0.335, 25.13±4.17, 56.0±19.69 and 2.23±0.97 μg/ml in the iv group and 23.68±4.2, 15.9±10.24, 7.93±4.68 and 3.32±2.3 μg/ml in the icv group, respectively. The decline in concentrations was rapid in plasma. The highest concentration of CPU-86017 after iv administration was found in the kidney (56.0±19.69 μg/g), and after icv administration in the brain (23.68±4.2 μg/g). At 60 minutes after iv administration there was no significant difference between the concentrations of CPU-86017 in the kidney and the brain. After icv administration, the concentrations of CPU-86017 reaching the peripheral tissues and plasma changed from one 5-minute time point to the next, whereas after iv administration the drug concentration in the brain could still not be detected at 20, 30 and 60 minutes. CONCLUSION: CPU-86017 penetrates the BBB in both directions, from the blood circulation into the brain and from the brain into the systemic circulation; however, the two routes differ considerably, and penetration from the blood into the brain is more difficult than penetration from the brain to the periphery.

  19. Asymptotic fitting optimization technology for source-to-source compile system on CPU-GPU architecture

    Institute of Scientific and Technical Information of China (English)

    魏洪昌; 朱正东; 董小社; 宁洁

    2016-01-01

    Aiming at the problem of inadequate performance optimization after applications are developed for and ported to CPU-GPU heterogeneous parallel systems, a new approach for CPU-GPU systems is proposed which combines asymptotic fitting optimization with source-to-source compiling techniques. This approach can translate C code annotated with directives into CUDA code and profile the generated code several times. Moreover, the approach can carry out the source-to-source compilation and optimization of the generated code automatically, and a prototype system based on the approach is also realized in this paper. Functionality and performance evaluations of the prototype show that the generated CUDA code is functionally equivalent to the original C code while its performance is significantly improved. When compared with CUDA benchmarks, the performance of the generated CUDA code is clearly better than that of code generated by other source-to-source compiling techniques.

  20. Isoproterenol disperses distribution of NADPH oxidase, MMP-9, and pPKCε in the heart, which are mitigated by endothelin receptor antagonist CPU0213

    Institute of Scientific and Technical Information of China (English)

    Yusi CHENG; De-zai DAI; Yin DAI

    2009-01-01

    Aim: Spatial dispersion of bioactive substances in the myocardium could serve as a pathological basis for arrhythmogenesis and cardiac impairment by β-adrenoceptor stimulation. We hypothesized that the dispersion of NADPH oxidase, protein kinase Cε (PKCε), early response gene (ERG), and matrix metalloproteinase 9 (MMP-9) across the heart by isoproterenol (ISO) medication might be mediated by the endothelin (ET) - ROS pathway. We aimed to verify whether the ISO-induced spatially heterogeneous distribution of pPKCε, NADPH oxidase, MMP-9 and ERG could be mitigated by either the ET receptor antagonist CPU0213 or the iNOS inhibitor aminoguanidine. Methods: Rats were treated with ISO (1 mg/kg sc) for 10 days, and drug interventions (mg/kg), either CPU0213 (30 sc) or aminoguanidine (100 ip), were administered on days 8-10. Expression of NADPH oxidase, MMP-9, ERG, and PKCε in the left and right ventricle (LV, RV) and septum (S) was measured separately. Results: Ventricular hypertrophy was found in the LV, S, and RV, in association with dispersed QTc and oxidative stress in ISO-treated rats. mRNA and protein expression of MMP-9, PKCε, NADPH oxidase and ERG in the LV, S, and RV were obviously dispersed, with augmented expression mainly in the LV and S. Dispersed parameters were re-harmonized by either CPU0213 or aminoguanidine. Conclusion: We found for the first time that ISO induced a dispersed distribution of pPKCε, NADPH oxidase, MMP-9, and ERG in the LV, S, and RV of the heart, which was suppressed by either CPU0213 or aminoguanidine. This indicates that the ET-ROS pathway plays a role in the dispersed distribution of bioactive substances following sustained β-receptor stimulation.

  1. The Magnitude and Time Course of Muscle Cross-section Decrease in Intensive Care Unit Patients.

    Science.gov (United States)

    Ten Haaf, Dianne; Hemmen, Bea; van de Meent, Henk; BovendʼEerdt, Thamar J H

    2017-09-01

    Bedriddenness and immobilization of patients at an intensive care unit may result in muscle atrophy and devaluation in quality of life. The exact effect of immobilization on intensive care unit patients is not known. The aim of this study was to investigate the magnitude and time course of muscle cross-section decrease in acute critically ill patients admitted to the intensive care unit. An observational pilot study was performed in intensive care unit patients. Data of bilateral ultrasound muscle cross-section measurements of the knee extensors and the elbow flexors were collected. Thirty-four intensive care unit patients were included in this study; data are presented from 14 patients who were measured at least three times. Repeated measures analysis of variance shows a significant decrease in muscle cross-section over time (F(1,13) = 80.40, P ≤ 0.001). The decrease in muscle cross-section of the arms was significantly higher (F(1,13) = 5.38, P = 0.037) than the decrease of the legs. Four weeks after intensive care unit admission, the muscle cross-section decrease had not reached an asymptote yet. The muscle cross-section decrease in bedridden intensive care unit patients is significant for a time of 2 to 4 weeks. The decrease in muscle cross-section of the arms is greater than the decrease of the legs.

  2. Plasma levels of carboxypeptidase U (CPU, CPB2 or TAFIa) are elevated in patients with acute myocardial infarction.

    Science.gov (United States)

    Leenaerts, D; Bosmans, J M; van der Veken, P; Sim, Y; Lambeir, A M; Hendriks, D

    2015-12-01

    Two decades after its discovery, carboxypeptidase U (CPU, CPB2 or TAFIa) has become a compelling drug target in thrombosis research. However, given the difficulty of measuring CPU in the blood circulation and the demanding sample collection requirements, previous clinical studies focused mainly on measuring its inactive precursor, proCPU (proCPB2 or TAFI). Using a sensitive and specific enzymatic assay, we investigated plasma CPU levels in patients presenting with acute myocardial infarction (AMI) and in controls. In this case-control study, peripheral arterial blood samples were collected from 45 patients with AMI (25 with ST segment elevation myocardial infarction [STEMI], 20 with non-ST segment elevation myocardial infarction [NSTEMI]) and 42 controls. Additionally, intracoronary blood samples were collected from 11 STEMI patients during thrombus aspiration. Subsequently, proCPU and CPU plasma concentrations in all samples were measured by means of an activity-based assay, using Bz-o-cyano-Phe-Arg as a selective substrate. CPU activity levels were higher in patients with AMI (median LOD-LOQ, range 0-1277 mU L(-1) ) than in controls (median CPU levels and AMI type (NSTEMI [median between LOD-LOQ, range 0-465 mU L(-1) ] vs. STEMI [median between LOD-LOQ, range 0-1277 mU L(-1) ]). Intracoronary samples (median 109 mU L(-1) , range 0-759 mU L(-1) ) contained higher CPU levels than did peripheral samples (median between LOD-LOQ, range 0-107 mU L(-1) ), indicating increased local CPU generation. With regard to proCPU, we found lower levels in AMI patients (median 910 U L(-1) , range 706-1224 U L(-1) ) than in controls (median 1010 U L(-1) , range 753-1396 U L(-1) ). AMI patients have higher plasma CPU levels and lower proCPU levels than controls. This finding indicates in vivo generation of functionally active CPU in patients with AMI. © 2015 International Society on Thrombosis and Haemostasis.

  3. The ATLAS LVL2 Trigger with FPGA Processors: Development, Construction and Functional Demonstration of the Hybrid FPGA/CPU-Based Processor System ATLANTIS

    CERN Document Server

    Singpiel, Holger

    2000-01-01

    This thesis describes the conception and implementation of the hybrid FPGA/CPU based processing system ATLANTIS as a trigger processor for the proposed ATLAS experiment at CERN. CompactPCI provides the close coupling of a multi-FPGA system and a standard CPU. The system is scalable in computing power and flexible in use due to its partitioning into dedicated FPGA boards for computation, I/O tasks and a private communication. The research activities based on the ATLANTIS system focus on two areas of the second level trigger (LVL2). First, the acceleration of time-critical B physics trigger algorithms is the major aim. The execution of the full scan TRT algorithm on ATLANTIS, which has been used as a demonstrator, results in a speedup of 5.6 compared to a standard CPU. Next, the ATLANTIS system is used as a hardware platform for research work in conjunction with the ATLAS readout systems. For further studies a permanent installation of the ATLANTIS system in the LVL2 application testbed is f...

  4. Rapid learning-based video stereolization using graphic processing unit acceleration

    Science.gov (United States)

    Sun, Tian; Jung, Cheolkon; Wang, Lei; Kim, Joongkyu

    2016-09-01

    Video stereolization has received much attention in recent years due to the lack of stereoscopic three-dimensional (3-D) contents. Although video stereolization can enrich stereoscopic 3-D contents, it is hard to achieve automatic two-dimensional-to-3-D conversion with low computational cost. We proposed rapid learning-based video stereolization using graphic processing unit (GPU) acceleration. We first generated an initial depth map based on learning from examples. Then, we refined the depth map using saliency and cross-bilateral filtering to make object boundaries clear. Finally, we performed depth-image-based rendering to generate stereoscopic 3-D views. To accelerate the computation of video stereolization, we provided a parallelizable hybrid GPU-central processing unit (CPU) solution suitable for running on a GPU. Experimental results demonstrate that the proposed method is nearly 180 times faster than CPU-based processing and achieves performance comparable to state-of-the-art methods.

  5. Leveraging the checkpoint-restart technique for optimizing CPU efficiency of ATLAS production applications on opportunistic platforms

    CERN Document Server

    Cameron, David; The ATLAS collaboration

    2017-01-01

    Data processing applications of the ATLAS experiment, such as event simulation and reconstruction, spend a considerable amount of time in the initialization phase. This phase includes loading a large number of shared libraries, reading detector geometry and condition data from external databases, building a transient representation of the detector geometry and initializing various algorithms and services. In some cases the initialization step can take as long as 10-15 minutes. Such slow initialization, being inherently serial, has a significant negative impact on the overall CPU efficiency of a production job, especially when the job is executed on opportunistic, often short-lived, resources such as commercial clouds or volunteer computing. In order to improve this situation, we can take advantage of the fact that ATLAS runs large numbers of production jobs with similar configuration parameters (e.g. jobs within the same production task). This allows us to checkpoint one job at the end of its configuration step a...

  6. A time study of physicians' work in a German university eye hospital to estimate unit costs.

    Directory of Open Access Journals (Sweden)

    Jan Wolff

    Full Text Available Technical efficiency of hospital services is debated since performance has been heterogeneous. Staff time represents the main resource in patient care and its inappropriate allocation has been identified as a key factor of inefficiency. The aim of this study was to analyse the utilisation of physicians' work time stratified by staff groups, tasks and places of work. A further aim was to use these data to estimate resource use per unit of output. A self-reporting work-sampling study was carried out over 14 days at a University Eye Hospital. Staff costs of physicians per unit of output were calculated at the wards, the operating rooms and the outpatient unit. Forty per cent of total work time was spent in contact with the patient. Thirty per cent was spent on documentation tasks. Time spent on documentation tasks declined monotonically with increasing seniority of staff. Unit costs were 56 € per patient day at the wards, 77 € and 20 € per intervention at the operating rooms for inpatients and outpatients, respectively, and 33 € per contact at the outpatient unit. Substantial differences in resources directly dedicated to the patient were found between these locations. The presented data provide unprecedented unit costs in inpatient Ophthalmology. Future research should focus on analysing factors that influence differences in time allocation, such as types of patients, organisation of care processes and composition of staff.

  7. Performance of heterogeneous computing with graphics processing unit and many integrated core for hartree potential calculations on a numerical grid.

    Science.gov (United States)

    Choi, Sunghwan; Kwon, Oh-Kyoung; Kim, Jaewook; Kim, Woo Youn

    2016-09-15

    We investigated the performance of heterogeneous computing with graphics processing units (GPUs) and many integrated core (MIC) with 20 CPU cores (20×CPU). As a practical example toward large scale electronic structure calculations using grid-based methods, we evaluated the Hartree potentials of silver nanoparticles with various sizes (3.1, 3.7, 4.9, 6.1, and 6.9 nm) via a direct integral method supported by the sinc basis set. The so-called work stealing scheduler was used for efficient heterogeneous computing via the balanced dynamic distribution of workloads between all processors on a given architecture without any prior information on their individual performances. 20×CPU + 1GPU was up to ∼1.5 and ∼3.1 times faster than 1GPU and 20×CPU, respectively. 20×CPU + 2GPU was ∼4.3 times faster than 20×CPU. The performance enhancement by CPU + MIC was considerably lower than expected because of the large initialization overhead of MIC, although its theoretical performance is similar with that of CPU + GPU. © 2016 Wiley Periodicals, Inc.
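
    The balanced dynamic distribution described above can be illustrated with a much simpler self-scheduling scheme: each worker (CPU, GPU or MIC backend) repeatedly claims the next unprocessed chunk from a shared atomic counter, so faster devices naturally take more chunks without any prior knowledge of their relative speeds. The C++ sketch below is only an illustration of that idea; it is not the work-stealing scheduler used in the paper (true work stealing keeps per-worker queues and steals from them), and all names are invented.

    // Dynamic self-scheduling between heterogeneous workers: each worker claims the
    // next unprocessed chunk from a shared atomic counter, so faster devices simply
    // end up processing more chunks. Illustrative sketch with invented names.
    #include <atomic>
    #include <cstddef>
    #include <functional>
    #include <thread>
    #include <vector>

    void runBalanced(std::size_t numChunks,
                     const std::vector<std::function<void(std::size_t)>>& workers) {
        std::atomic<std::size_t> next{0};
        std::vector<std::thread> threads;
        for (const auto& processChunk : workers) {          // one worker per device backend
            threads.emplace_back([&, processChunk] {
                for (;;) {
                    std::size_t chunk = next.fetch_add(1);  // claim the next chunk
                    if (chunk >= numChunks) break;          // no work left
                    processChunk(chunk);                    // CPU, GPU or MIC implementation
                }
            });
        }
        for (auto& t : threads) t.join();
    }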

  8. MASSIVELY PARALLEL LATENT SEMANTIC ANALYSES USING A GRAPHICS PROCESSING UNIT

    Energy Technology Data Exchange (ETDEWEB)

    Cavanagh, J.; Cui, S.

    2009-01-01

    Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of datasets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. A graphics processing unit (GPU) can solve some highly parallel problems much faster than a traditional sequential processor or central processing unit (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a PC cluster. Due to the GPU’s application-specific architecture, harnessing the GPU’s computational prowess for LSA is a great challenge. We presented a parallel LSA implementation on the GPU, using NVIDIA® Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms software. The performance of this implementation is compared to traditional LSA implementation on a CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000x1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits of the GPU for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.
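
    The sensitivity to dimensions divisible by 16 reported above suggests a simple, commonly used workaround that the abstract does not evaluate: zero-pad the matrix up to the next multiple of 16 before the GPU computation and ignore the padded rows and columns afterwards. A hedged C++ sketch of such padding (all names invented) is:

    // Zero-pad an m x n row-major matrix so both dimensions become multiples of 16
    // before handing it to a GPU kernel; the padded rows/columns are ignored afterwards.
    #include <cstddef>
    #include <vector>

    inline std::size_t roundUpTo16(std::size_t x) { return (x + 15) / 16 * 16; }

    std::vector<double> padTo16(const std::vector<double>& a, std::size_t m, std::size_t n,
                                std::size_t& mPad, std::size_t& nPad) {
        mPad = roundUpTo16(m);
        nPad = roundUpTo16(n);
        std::vector<double> padded(mPad * nPad, 0.0);       // extra entries stay zero
        for (std::size_t i = 0; i < m; ++i)
            for (std::size_t j = 0; j < n; ++j)
                padded[i * nPad + j] = a[i * n + j];
        return padded;
    }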

  9. Unit 03 - Introduction to Computers

    OpenAIRE

    Unit 74, CC in GIS; National Center for Geographic Information and Analysis

    1990-01-01

    This unit provides a brief introduction to computer hardware and software. It discusses binary notation, the ASCII coding system and hardware components including the central processing unit (CPU), memory, peripherals and storage media. Software, including operating systems, word processors, database packages, spreadsheets and statistical packages, is briefly described.

  10. High performance technique for database applicationsusing a hybrid GPU/CPU platform

    KAUST Repository

    Zidan, Mohammed A.

    2012-07-28

    Many database applications, such as sequence comparison, sequence searching, and sequence matching, process large database sequences. We introduce a novel and efficient technique to improve the performance of database applications by using a hybrid GPU/CPU platform. In particular, our technique solves the problem of the low efficiency resulting from running short-length sequences in a database on a GPU. To verify our technique, we applied it to the widely used Smith-Waterman algorithm. The experimental results show that our hybrid GPU/CPU technique improves the average performance by a factor of 2.2, and improves the peak performance by a factor of 2.8 when compared to earlier implementations. Copyright © 2011 by ASME.
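
    The low GPU efficiency on short sequences suggests one natural realization of the hybrid idea: route short sequences to the CPU and long ones to the GPU. The abstract does not spell out the exact mechanism, so the C++ sketch below (threshold and names invented) is only an illustration of such a length-based split.

    // Route database sequences by length: short sequences (inefficient on the GPU)
    // stay on the CPU, long sequences go to the GPU. Threshold and names are invented.
    #include <cstddef>
    #include <string>
    #include <utility>
    #include <vector>

    std::pair<std::vector<std::string>, std::vector<std::string>>
    splitByLength(const std::vector<std::string>& sequences, std::size_t threshold) {
        std::vector<std::string> forCpu, forGpu;
        for (const auto& seq : sequences)
            (seq.size() < threshold ? forCpu : forGpu).push_back(seq);
        return {forCpu, forGpu};   // each batch is then aligned on its own device
    }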

  11. Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems

    CERN Document Server

    Kreutzer, Moritz; Wellein, Gerhard; Pieper, Andreas; Alvermann, Andreas; Fehske, Holger

    2014-01-01

    The Kernel Polynomial Method (KPM) is a well-established scheme in quantum physics and quantum chemistry to determine the eigenvalue density and spectral properties of large sparse matrices. In this work we demonstrate the high optimization potential and feasibility of peta-scale heterogeneous CPU-GPU implementations of the KPM. At the node level we show that it is possible to decouple the sparse matrix problem posed by KPM from main memory bandwidth both on CPU and GPU. To alleviate the effects of scattered data access we combine loosely coupled outer iterations with tightly coupled block sparse matrix multiple vector operations, which enables pure data streaming. All optimizations are guided by a performance analysis and modelling process that indicates how the computational bottlenecks change with each optimization step. Finally we use the optimized node-level KPM with a hybrid-parallel framework to perform large scale heterogeneous electronic structure calculations for novel topological materials on a pet...

  12. Design Patterns for Sparse-Matrix Computations on Hybrid CPU/GPU Platforms

    Directory of Open Access Journals (Sweden)

    Valeria Cardellini

    2014-01-01

    Full Text Available We apply object-oriented software design patterns to develop code for scientific software involving sparse matrices. Design patterns arise when multiple independent developments produce similar designs which converge onto a generic solution. We demonstrate how to use design patterns to implement an interface for sparse matrix computations on NVIDIA GPUs starting from PSBLAS, an existing sparse matrix library, and from existing sets of GPU kernels for sparse matrices. We also compare the throughput of the PSBLAS sparse matrix–vector multiplication on two platforms exploiting the GPU with that obtained by a CPU-only PSBLAS implementation. Our experiments exhibit encouraging results regarding the comparison between CPU and GPU executions in double precision, obtaining a speedup of up to 35.35 on NVIDIA GTX 285 with respect to AMD Athlon 7750, and up to 10.15 on NVIDIA Tesla C2050 with respect to Intel Xeon X5650.

  13. Turbo Charge CPU Utilization in Fork/Join Using the ManagedBlocker

    CERN Document Server

    CERN. Geneva

    2017-01-01

    Fork/Join is a framework for parallelizing calculations using recursive decomposition, also called divide and conquer. These algorithms occasionally end up duplicating work, especially at the beginning of the run. We can reduce wasted CPU cycles by implementing a reserved caching scheme. Before a task starts its calculation, it tries to reserve an entry in the shared map. If it is successful, it immediately begins. If not, it blocks until the other thread has finished its calculation. Unfortunately this might result in a significant number of blocked threads, decreasing CPU utilization. In this talk we will demonstrate this issue and offer a solution in the form of the ManagedBlocker. Combined with the Fork/Join, it can keep parallelism at the desired level.
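
    The reservation scheme described above, compute a key once and make every other task wait for the result, can be sketched outside Java as well. The C++ sketch below (invented names; it does not model the ManagedBlocker itself, which is specific to Java's ForkJoinPool) shows the reserve-or-wait logic using a shared map of futures.

    // Reserve-or-wait cache: the first caller to reserve a key computes the value;
    // every other caller blocks on the shared future instead of duplicating the work.
    #include <functional>
    #include <future>
    #include <mutex>
    #include <unordered_map>

    template <typename K, typename V>
    class ReservedCache {
    public:
        V getOrCompute(const K& key, const std::function<V()>& compute) {
            std::promise<V> myPromise;
            std::shared_future<V> result;
            bool owner = false;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                auto it = entries_.find(key);
                if (it == entries_.end()) {
                    result = myPromise.get_future().share();
                    entries_.emplace(key, result);   // reserve the entry for this task
                    owner = true;
                } else {
                    result = it->second;             // already reserved by another task
                }
            }
            if (owner) myPromise.set_value(compute());  // compute exactly once
            return result.get();                         // non-owners block here
        }

    private:
        std::mutex mutex_;
        std::unordered_map<K, std::shared_future<V>> entries_;
    };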

  14. Screening methods for linear-scaling short-range hybrid calculations on CPU and GPU architectures

    Science.gov (United States)

    Beuerle, Matthias; Kussmann, Jörg; Ochsenfeld, Christian

    2017-04-01

    We present screening schemes that allow for efficient, linear-scaling short-range exchange calculations employing Gaussian basis sets for both CPU and GPU architectures. They are based on the LinK [C. Ochsenfeld et al., J. Chem. Phys. 109, 1663 (1998)] and PreLinK [J. Kussmann and C. Ochsenfeld, J. Chem. Phys. 138, 134114 (2013)] methods, but account for the decay introduced by the attenuated Coulomb operator in short-range hybrid density functionals. Furthermore, we discuss the implementation of short-range electron repulsion integrals on GPUs. The introduction of our screening methods allows for speedups of up to a factor 7.8 as compared to the underlying linear-scaling algorithm, while retaining full numerical control over the accuracy. With the increasing number of short-range hybrid functionals, our new schemes will allow for significant computational savings on CPU and GPU architectures.

  15. DOE SBIR Phase-1 Report on Hybrid CPU-GPU Parallel Development of the Eulerian-Lagrangian Barracuda Multiphase Program

    Energy Technology Data Exchange (ETDEWEB)

    Dr. Dale M. Snider

    2011-02-28

    This report gives the result from the Phase-1 work on demonstrating greater than 10x speedup of the Barracuda computer program using parallel methods and GPU processors (General-Purpose Graphics Processing Unit or Graphics Processing Unit). Phase-1 demonstrated a 12x speedup on a typical Barracuda function using the GPU processor. The problem test case used about 5 million particles and 250,000 Eulerian grid cells. The relative speedup, compared to a single CPU, increases with an increased number of particles, giving greater than 12x speedup. Phase-1 work provided a path for data structure modifications that give good parallel performance while keeping a friendly environment for new physics development and code maintenance. The implementation of data structure changes will be in Phase-2. Phase-1 laid the groundwork for the complete parallelization of Barracuda in Phase-2, with the caveat that the parallel programming practices implemented in Phase-1 already give an immediate speedup in the current serial Barracuda code. The Phase-1 tasks were completed successfully, laying the framework for Phase-2. The detailed results of Phase-1 are within this document. In general, the speedup of one function would be expected to be higher than the speedup of the entire code because of I/O functions and communication between the algorithms. However, because one of the most difficult Barracuda algorithms was parallelized in Phase-1 and because advanced parallelization methods and proposed parallelization optimization techniques identified in Phase-1 will be used in Phase-2, an overall Barracuda code speedup (relative to a single CPU) is expected to be greater than 10x. This means that a job which takes 30 days to complete will be done in 3 days. Tasks completed in Phase-1 are: Task 1: Profile the entire Barracuda code and select which subroutines are to be parallelized (See Section Choosing a Function to Accelerate) Task 2: Select a GPU consultant company and

  16. LHCb: Statistical Comparison of CPU performance for LHCb applications on the Grid

    CERN Multimedia

    Graciani, R

    2009-01-01

    The usage of CPU resources by LHCb on the Grid is dominated by two different applications: Gauss and Brunel. Gauss is the application performing the Monte Carlo simulation of proton-proton collisions. Brunel is the application responsible for the reconstruction of the signals recorded by the detector, converting them into objects that can be used for later physics analysis of the data (tracks, clusters, …). Both applications are based on the Gaudi and LHCb software frameworks. Gauss uses Pythia and Geant as underlying libraries for the simulation of the collision and the later passage of the generated particles through the LHCb detector, while Brunel makes use of LHCb-specific code to process the data from each sub-detector. Both applications are CPU bound. Large Monte Carlo productions or data reconstructions running on the Grid are an ideal benchmark to compare the performance of the different CPU models for each case. Since the processed events are only statistically comparable, only statistical comparison of the...

  17. Exploring Heterogeneous NoC Design Space in Heterogeneous GPU-CPU Architectures

    Institute of Scientific and Technical Information of China (English)

    方娟; 姚治成; 冷镇宇; 隋秀峰; 刘思彤

    2015-01-01

    Computer architecture is transitioning from the multicore era into the heterogeneous era, in which heterogeneous architectures use on-chip networks to access shared resources, and how a network is configured will likely have a significant impact on overall performance and power consumption. Recently, heterogeneous network on chip (NoC) has been proposed not only to achieve performance comparable to that of NoCs with buffered routers but also to reduce buffer cost and energy consumption. However, heterogeneous NoC design for heterogeneous GPU-CPU architectures has not been studied in depth. This paper first evaluates the performance and power consumption of a variety of static hot-potato based heterogeneous NoCs with different buffered and bufferless router placements, which is helpful for exploring the design space for heterogeneous GPU-CPU interconnection. Then it proposes Unidirectional Flow Control (UFC), a simple credit-based flow control mechanism for heterogeneous NoC in GPU-CPU architectures to control network congestion. UFC can guarantee that there are always unoccupied entries in buffered routers to receive flits coming from adjacent bufferless routers. Our evaluations show that when compared to hot-potato routing, UFC improves performance by an average of 14.1%, with energy increased by an average of only 5.3%.
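
    The guarantee that buffered routers always have free entries for incoming flits is the classic credit-based flow-control idea. The C++ sketch below (a software illustration with invented names, not the hardware design evaluated in the paper) shows the essential bookkeeping: an upstream bufferless router may only forward a flit while the downstream buffered router advertises at least one credit.

    // Credit bookkeeping behind the UFC idea: an upstream (bufferless) router may
    // forward a flit only while the downstream buffered router advertises a free
    // buffer entry, so arriving flits always find an empty slot. Software sketch only.
    #include <queue>

    struct Flit { int payload; };

    class BufferedRouter {
    public:
        explicit BufferedRouter(int capacity) : credits_(capacity) {}
        bool hasCredit() const { return credits_ > 0; }
        void receive(const Flit& f) { buffer_.push(f); --credits_; }
        bool drainOne() {                  // a flit leaves the buffer downstream
            if (buffer_.empty()) return false;
            buffer_.pop();
            ++credits_;                    // the freed entry becomes a credit again
            return true;
        }
    private:
        std::queue<Flit> buffer_;
        int credits_;
    };

    // A bufferless (hot-potato) router checks for a credit before sending; if none
    // is available it must deflect the flit instead of dropping it.
    bool trySend(BufferedRouter& next, const Flit& flit) {
        if (!next.hasCredit()) return false;
        next.receive(flit);
        return true;
    }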

  18. Novel automatic mapping technology on CPU-GPU heterogeneous systems

    Institute of Scientific and Technical Information of China (English)

    朱正东; 刘袁; 魏洪昌; 颜康; 王寅峰; 董小社

    2015-01-01

    Aiming at the difficulty of developing and porting GPU-based applications, a mapping approach is proposed which converts serial source code into equivalent parallel source code. The approach acquires the hierarchies of parallelizable loops from the serial sources, establishes the correspondence between loop structures and GPU threads, and generates the core (kernel) function code for the GPU; CPU-side control code is generated according to the read/write attributes of variable references. A compiler prototype is implemented based on this approach, which translates C code into CUDA code automatically. Functionality and performance evaluations of the prototype show that the generated CUDA code is functionally equivalent to the original C code, with a significant improvement in performance, which to some extent overcomes the difficulty of porting compute-intensive applications to CPU-GPU heterogeneous multicore systems.

  19. Dynamic real-time hierarchical heuristic search for pathfinding.

    OpenAIRE

    Naveed, Munir; Kitchin, Diane E.; Crampton, Andrew

    2009-01-01

    Movement of Units in Real-Time Strategy (RTS) Games is a non-trivial and challenging task mainly due to three factors which are constraints on CPU and memory usage, dynamicity of the game world, and concurrency. In this paper, we are focusing on finding a novel solution for solving the pathfinding problem in RTS Games for the units which are controlled by the computer. The novel solution combines two AI Planning approaches: Hierarchical Task Network (HTN) and Real-Time Heuristic Search (RHS)....

  20. Using the CPU and GPU for real-time video enhancement on a mobile computer

    CSIR Research Space (South Africa)

    Bachoo, AK

    2010-09-01


  1. Surrogate Structures for Computationally Expensive Optimization Problems With CPU-Time Correlated Functions

    Science.gov (United States)

    2007-06-01

    The second transformation is a space mapping function P that handles the change in variable dimensions (see Bandler et al. [11]).

  2. Blood sugar control in the intensive care unit: time to relook ...

    African Journals Online (AJOL)

    Blood sugar control in the intensive care unit: time to relook. ... and without an increase in hospital cost, and thus change existing blood sugar control protocols in the ICU?

  3. Fast Monte Carlo simulations of ultrasound-modulated light using a graphics processing unit.

    Science.gov (United States)

    Leung, Terence S; Powell, Samuel

    2010-01-01

    Ultrasound-modulated optical tomography (UOT) is based on "tagging" light in turbid media with focused ultrasound. In comparison to diffuse optical imaging, UOT can potentially offer a better spatial resolution. The existing Monte Carlo (MC) model for simulating ultrasound-modulated light is central processing unit (CPU) based and has been employed in several UOT related studies. We reimplemented the MC model with a graphics processing unit [(GPU), Nvidia GeForce 9800] that can execute the algorithm up to 125 times faster than its CPU (Intel Core Quad) counterpart for a particular set of optical and acoustic parameters. We also show that the incorporation of ultrasound propagation in photon migration modeling increases the computational time considerably, by a factor of at least 6, in one case, even with a GPU. With slight adjustment to the code, MC simulations were also performed to demonstrate the effect of ultrasonic modulation on the speckle pattern generated by the light model (available as animation). This was computed in 4 s with our GPU implementation as compared to 290 s using the CPU.

  4. ON-LINE SCHEDULING OF UNIT TIME JOBS WITH REJECTION ON UNIFORM MACHINES

    Institute of Scientific and Technical Information of China (English)

    Shoupeng LIU; Yuzhong ZHANG

    2008-01-01

    The authors consider the problem of on-line scheduling of unit execution time jobs on uniform machines with rejection penalty. The jobs arrive one by one and can either be accepted and scheduled, or be rejected. The objective is to minimize the total completion time of the accepted jobs plus the total penalty of the rejected jobs. The authors propose an on-line algorithm and prove that its competitive ratio is 1/2 (2 + √3) ≈ 1.86602.
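
    The abstract does not reproduce the algorithm itself, so the C++ sketch below is only a generic illustration of the accept-or-reject decision such problems involve: a unit-length job is accepted if the best completion time currently available on the uniform machines does not exceed its rejection penalty, and rejected otherwise. This greedy rule and all names are invented for illustration; it is not claimed to achieve the competitive ratio stated above.

    // Generic accept-or-reject rule for unit-length jobs arriving online on uniform
    // machines: accept a job if its best achievable completion time does not exceed
    // its rejection penalty. Illustration only; not the algorithm from the paper.
    #include <cstddef>
    #include <limits>
    #include <utility>
    #include <vector>

    struct UniformMachines {
        std::vector<double> speed;       // machine speeds
        std::vector<double> finishTime;  // when each machine becomes free

        explicit UniformMachines(std::vector<double> s)
            : speed(std::move(s)), finishTime(speed.size(), 0.0) {}

        // Returns the cost incurred by this job: its completion time if accepted,
        // or its penalty if rejected.
        double handleJob(double penalty) {
            std::size_t best = 0;
            double bestCompletion = std::numeric_limits<double>::infinity();
            for (std::size_t i = 0; i < speed.size(); ++i) {
                double completion = finishTime[i] + 1.0 / speed[i];  // unit-length job
                if (completion < bestCompletion) { bestCompletion = completion; best = i; }
            }
            if (bestCompletion <= penalty) {   // accept and schedule on the best machine
                finishTime[best] = bestCompletion;
                return bestCompletion;
            }
            return penalty;                    // reject and pay the penalty
        }
    };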

  5. Optimisation of an exemplar oculomotor model using multi-objective genetic algorithms executed on a GPU-CPU combination.

    Science.gov (United States)

    Avramidis, Eleftherios; Akman, Ozgur E

    2017-03-24

    Parameter optimisation is a critical step in the construction of computational biology models. In eye movement research, computational models are increasingly important to understanding the mechanistic basis of normal and abnormal behaviour. In this study, we considered an existing neurobiological model of fast eye movements (saccades), capable of generating realistic simulations of: (i) normal horizontal saccades; and (ii) infantile nystagmus - pathological ocular oscillations that can be subdivided into different waveform classes. By developing appropriate fitness functions, we optimised the model to existing experimental saccade and nystagmus data, using a well-established multi-objective genetic algorithm. This algorithm required the model to be numerically integrated for very large numbers of parameter combinations. To address this computational bottleneck, we implemented a master-slave parallelisation, in which the model integrations were distributed across the compute units of a GPU, under the control of a CPU. While previous nystagmus fitting has been based on reproducing qualitative waveform characteristics, our optimisation protocol enabled us to perform the first direct fits of a model to experimental recordings. The fits to normal eye movements showed that although saccades of different amplitudes can be accurately simulated by individual parameter sets, a single set capable of fitting all amplitudes simultaneously cannot be determined. The fits to nystagmus oscillations systematically identified the parameter regimes in which the model can reproduce a number of canonical nystagmus waveforms to a high accuracy, whilst also identifying some waveforms that the model cannot simulate. Using a GPU to perform the model integrations yielded a speedup of around 20 compared to a high-end CPU. The results of both optimisation problems enabled us to quantify the predictive capacity of the model, suggesting specific modifications that could expand its repertoire of

  6. CUDA Based Performance Evaluation of the Computational Efficiency of the DCT Image Compression Technique on Both the CPU and GPU

    Directory of Open Access Journals (Sweden)

    Kgotlaetsile Mathews Modieginyane

    2013-06-01

    Full Text Available Recent advances in computing such as the massively parallel GPUs (Graphical Processing Units), coupled with the need to store and deliver large quantities of digital data, especially images, have brought a number of challenges for Computer Scientists, the research community and other stakeholders. These challenges, such as prohibitively large costs to manipulate the digital data, amongst others, have been the focus of the research community in recent years and have led to the investigation of image compression techniques that can achieve excellent results. One such technique is the Discrete Cosine Transform, which helps separate an image into parts of differing frequencies and has the advantage of excellent energy compaction. This paper investigates the use of the Compute Unified Device Architecture (CUDA) programming model to implement the DCT-based Cordic-based Loeffler algorithm for efficient image compression. The computational efficiency is analyzed and evaluated under both the CPU and GPU. The PSNR (Peak Signal to Noise Ratio) is used to evaluate image reconstruction quality in this paper. The results are presented and discussed.
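
    The DCT referred to above separates an image block into frequency components. For reference, a plain C++ implementation of the 2-D DCT-II by its direct O(N^4) definition is shown below; it only states what is being computed and is not the fast Cordic-based Loeffler algorithm evaluated in the paper.

    // Reference 2-D DCT-II of an N x N block by its direct O(N^4) definition.
    #include <cmath>
    #include <vector>

    std::vector<double> dct2d(const std::vector<double>& block, int N) {
        const double pi = std::acos(-1.0);
        std::vector<double> out(N * N, 0.0);
        for (int u = 0; u < N; ++u) {
            for (int v = 0; v < N; ++v) {
                double sum = 0.0;
                for (int x = 0; x < N; ++x)
                    for (int y = 0; y < N; ++y)
                        sum += block[x * N + y] *
                               std::cos((2 * x + 1) * u * pi / (2.0 * N)) *
                               std::cos((2 * y + 1) * v * pi / (2.0 * N));
                const double au = (u == 0) ? std::sqrt(1.0 / N) : std::sqrt(2.0 / N);
                const double av = (v == 0) ? std::sqrt(1.0 / N) : std::sqrt(2.0 / N);
                out[u * N + v] = au * av * sum;   // frequency coefficient (u, v)
            }
        }
        return out;
    }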

  7. Massively Parallel Latent Semantic Analyzes using a Graphics Processing Unit

    Energy Technology Data Exchange (ETDEWEB)

    Cavanagh, Joseph M [ORNL; Cui, Xiaohui [ORNL

    2009-01-01

    Latent Semantic Indexing (LSA) aims to reduce the dimensions of large Term-Document datasets using Singular Value Decomposition. However, with the ever expanding size of data sets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel problems much faster than the traditional sequential processor (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a computer cluster. Due to the GPU's application-specific architecture, harnessing the GPU's computational prowess for LSA is a great challenge. We present a parallel LSA implementation on the GPU, using NVIDIA Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms. The performance of this implementation is compared to traditional LSA implementation on CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000x1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits the GPU has for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.

  8. Demonstration of the suitability of GPUs for AO real-time control at ELT scales

    Science.gov (United States)

    Bitenc, Urban; Basden, Alastair G.; Dipper, Nigel A.; Myers, Richard M.

    2016-07-01

    We have implemented the full AO data-processing pipeline on Graphics Processing Units (GPUs), within the framework of Durham AO Real-time Controller (DARC). The wavefront sensor images are copied from the CPU memory to the GPU memory. The GPU processes the data and the DM commands are copied back to the CPU. For a SCAO system of 80x80 subapertures, the rate achieved on a single GPU is about 700 frames per second (fps). This increases to 1100 fps (1565 fps) if we use two (four) GPUs. Jitter exhibits a distribution with the root-mean-square value of 20 μs-30 μs and a negligible number of outliers. The increase in latency due to the pixel data copying from the CPU to the GPU has been reduced to the minimum by copying the data in parallel to processing them. An alternative solution in which the data would be moved from the camera directly to the GPU, without CPU involvement, could be about 10%-20% faster. We have also implemented the correlation centroiding algorithm, which - when used - reduces the frame rate by about a factor of 2-3.

  9. Accelerating hyper-spectral data processing on the multi-CPU and multi-GPU heterogeneous computing platform

    Science.gov (United States)

    Zhang, Lei; Gao, Jiao Bo; Hu, Yu; Wang, Ying Hui; Sun, Ke Feng; Cheng, Juan; Sun, Dan Dan; Li, Yu

    2017-02-01

    During the research of hyper-spectral imaging spectrometers, how to process the huge amount of image data is a difficult problem for all researchers. The amount of image data is on the order of several hundred megabytes per second. The only way to solve this problem is parallel computing technology. With the development of multi-core CPUs and GPUs, parallel computing on multi-core CPUs or GPUs is increasingly applied in large-scale data processing. In this paper, we propose a new parallel computing solution for hyper-spectral data processing which is based on a multi-CPU and multi-GPU heterogeneous computing platform. We use OpenMP technology to control the multi-core CPUs, and CUDA to schedule the parallel computing on the multiple GPUs. Experimental results show that the speed of hyper-spectral data processing on the multi-CPU and multi-GPU heterogeneous computing platform is considerably faster than the traditional serial algorithm running on a single-core CPU. Our research has significant meaning for the engineering application of the windowing Fourier transform imaging spectrometer.

  10. Time-dependent measurement of base pressure in a blowdown tunnel with varying unit Reynolds number

    Science.gov (United States)

    Kangovi, S.; Rao, D. M.

    1978-01-01

    An operational characteristic of blowdown-type of wind tunnels is the drop in the stagnation temperature with time and the accompanying change in the test-section unit Reynolds number at constant stagnation pressure and Mach number. This apparent disadvantage can be turned to advantage in some cases where a Reynolds number scan is desired in order to study the effect of unit Reynolds number variation on a particular viscous flow phenomenon. This note presents such an instance arising from recent investigations on base pressure at transonic speeds conducted in the NAL 1-ft tunnel.

  11. Unit roots and structural breakpoints in China's macroeconomic and financial time series

    Institute of Scientific and Technical Information of China (English)

    LIANG Qi; TENG Jianzhou

    2006-01-01

    This paper applies unit-root tests to 10 Chinese macroeconomic and financial time series that allow for the possibility of up to two endogenous structural breaks.We found that 6 of the series,i.e.,GDP,GDP per capita,employment,bank credit,deposit liabilities and investment,can be more accurately characterized as a segmented trend stationarity process around one or two structural breakpoints as opposed to a stochastic unit root process.Our findings have important implications for policy-makers to formulate long-term growth strategy and short-run stabilization policies,as well as causality analysis among the series.

  12. Dynamic Resource Reservation and Connectivity Tracking to Support Real-Time Communication among Mobile Units

    Directory of Open Access Journals (Sweden)

    Almeida Luis

    2005-01-01

    Full Text Available Wireless communication technology is spreading quickly in almost all the information technology areas as a consequence of a gradual enhancement in quality and security of the communication, together with a decrease in the related costs. This facilitates the development of relatively low-cost teams of autonomous (robotic) mobile units that cooperate to achieve a common goal. Providing real-time communication among the team units is highly desirable for guaranteeing a predictable behavior in those applications in which the robots have to operate autonomously in unstructured environments. This paper proposes a MAC protocol for wireless communication that supports dynamic resource reservation and topology management for relatively small networks of cooperative units (10–20 units). The protocol uses a slotted time-triggered medium access transmission control that is collision-free, even in the presence of hidden nodes. The transmissions are scheduled according to the earliest deadline first scheduling policy. An adequate admission control guarantees the timing constraints of the team communication requirements, including when new nodes dynamically join or leave the team. The paper describes the protocol focusing on the consensus procedure that supports coherent changes in the global system. We also introduce a distributed connectivity tracking mechanism that is used to detect network partition and absent or crashed nodes. Finally, a set of simulation results are shown that illustrate the effectiveness of the proposed approaches.

  13. Time series of vegetation indices and the modifiable temporal unit problem

    Directory of Open Access Journals (Sweden)

    R. de Jong

    2011-08-01

    Full Text Available Time series of vegetation indices (VI) derived from satellite imagery provide a consistent monitoring system for terrestrial plant systems. They enable detection and quantification of gradual changes within the time frame covered, which are of crucial importance in global change studies, for example. However, VI time series typically contain a strong seasonal signal which complicates change detection. Commonly, trends are quantified using linear regression methods, while the effect of serial autocorrelation is remediated by temporal aggregation over bins having a fixed width. Aggregating the data in this way produces temporal units which are modifiable. Analogous to the well-known Modifiable Area Unit Problem (MAUP), the way in which these temporal units are defined may influence the fitted model parameters and therefore the amount of change detected. This paper illustrates the effect of this Modifiable Temporal Unit Problem (MTUP) on a synthetic data set and a real VI data set. Large variation in detected changes was found for aggregation over bins that mismatched full lengths of vegetative cycles, which demonstrates that aperiodicity in the data may influence model results. Using 26 yr of VI data and aggregation over full-length periods, deviations in VI gains of less than 1 % were found for annual periods, while deviations (with respect to seasonally adjusted data) increased up to 24 % for aggregation windows of 5 yr. This demonstrates that temporal aggregation needs to be carried out with care in order to avoid spurious model results.
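
    The interaction between bin width and the fitted trend can be made concrete with a few lines of code. The C++ sketch below (invented helper name) aggregates a series into consecutive fixed-width bins and fits an ordinary least-squares slope to the bin means; running it with different bin widths on the same series illustrates how the detected change depends on the temporal unit chosen.

    // Aggregate a series into consecutive bins of 'binWidth' samples and fit an
    // ordinary least-squares slope to the bin means; comparing slopes for different
    // bin widths illustrates the modifiable temporal unit problem.
    #include <cstddef>
    #include <vector>

    double trendAfterAggregation(const std::vector<double>& series, std::size_t binWidth) {
        std::vector<double> means;
        for (std::size_t start = 0; start + binWidth <= series.size(); start += binWidth) {
            double sum = 0.0;
            for (std::size_t i = 0; i < binWidth; ++i) sum += series[start + i];
            means.push_back(sum / static_cast<double>(binWidth));
        }
        if (means.size() < 2) return 0.0;               // not enough bins for a trend
        double n = static_cast<double>(means.size());
        double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
        for (std::size_t t = 0; t < means.size(); ++t) {
            double x = static_cast<double>(t);
            sx += x; sy += means[t]; sxx += x * x; sxy += x * means[t];
        }
        return (n * sxy - sx * sy) / (n * sxx - sx * sx);   // slope per bin
    }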

  14. Improvement of heat pipe performance through integration of a coral biomaterial wick structure into the heat pipe of a CPU cooling system

    Science.gov (United States)

    Putra, Nandy; Septiadi, Wayan Nata

    2016-08-01

    The very high heat flux dissipated by a Central Processing Unit (CPU) can no longer be handled by a conventional, single-phased cooling system. Thermal management of a CPU is now moving towards two-phase systems to maintain CPUs below their maximum temperature. A heat pipe is one of the emerging cooling systems to address this issue because of its superior efficiency and energy input independence. The goal of this research is to improve the performance of a heat pipe by integrating a biomaterial as the wick structure. In this work, the heat pipe was made from copper pipe and the biomaterial wick structure was made from tabulate coral with a mean pore diameter of 52.95 μm. For comparison purposes, the wick structure was fabricated from sintered Cu-powder with a mean pore diameter of 58.57 µm. The working fluid for this experiment was water. The experiment was conducted using a processor as the heat source and a plate simulator to measure the heat flux. The utilization of coral as the wick structure can improve the performance of a heat pipe and can decrease the temperature of a simulator plate by as much as 38.6 % at the maximum heat load compared to a conventional copper heat sink. This method also decreased the temperature of the simulator plate by as much as 44.25 °C compared to a heat pipe composed of a sintered Cu-powder wick.

  15. Improvement of heat pipe performance through integration of a coral biomaterial wick structure into the heat pipe of a CPU cooling system

    Science.gov (United States)

    Putra, Nandy; Septiadi, Wayan Nata

    2017-04-01

    The very high heat flux dissipated by a Central Processing Unit (CPU) can no longer be handled by a conventional, single-phased cooling system. Thermal management of a CPU is now moving towards two-phase systems to maintain CPUs below their maximum temperature. A heat pipe is one of the emerging cooling systems to address this issue because of its superior efficiency and energy input independence. The goal of this research is to improve the performance of a heat pipe by integrating a biomaterial as the wick structure. In this work, the heat pipe was made from copper pipe and the biomaterial wick structure was made from tabulate coral with a mean pore diameter of 52.95 μm. For comparison purposes, the wick structure was fabricated from sintered Cu-powder with a mean pore diameter of 58.57 µm. The working fluid for this experiment was water. The experiment was conducted using a processor as the heat source and a plate simulator to measure the heat flux. The utilization of coral as the wick structure can improve the performance of a heat pipe and can decrease the temperature of a simulator plate by as much as 38.6 % at the maximum heat load compared to a conventional copper heat sink. This method also decreased the temperature of the simulator plate by as much as 44.25 °C compared to a heat pipe composed of a sintered Cu-powder wick.

  16. Breastfeeding protection, promotion, and support in the United States: a time to nudge, a time to measure.

    Science.gov (United States)

    Pérez-Escamilla, Rafael; Chapman, Donna J

    2012-05-01

    Strong evidence-based advocacy efforts have now translated into high-level political support and concrete goals for improving breastfeeding outcomes among women in the United States. In spite of this, major challenges remain for promoting, supporting and especially for protecting breastfeeding in the country. The goals of this commentary are to argue in favor of: A) changes in the default social and environmental systems that would allow women to implement their right to breastfeed their infants; B) a multi-level and comprehensive monitoring system to measure process and outcome indicators in the country. Evidence-based commentary. Breastfeeding rates in the United States can improve based on a well-coordinated social marketing framework. This approach calls for innovative promotion through mass media, appropriate facility-based and community-based support (e.g., Baby Friendly Hospital Initiative, WIC-coordinated community-based peer counseling), and adequate protection for working women (e.g., longer paid maternity leave, breastfeeding or breast milk extraction breaks during the working day) and women at large by adhering to and enforcing the WHO ethics Code for the Marketing of Breast Milk Substitutes. Sound infant feeding practices monitoring systems, which include WIC administrative food package data, are needed. Given the current high level of political support to improve breastfeeding in the United States, a window of opportunity has been opened. Establishing breastfeeding as the social norm in the USA will take time, but the global experience indicates that it can be done.

  17. Commodity CPU-GPU System for Low-Cost , High-Performance Computing

    Science.gov (United States)

    Wang, S.; Zhang, S.; Weiss, R. M.; Barnett, G. A.; Yuen, D. A.

    2009-12-01

    We have put together a desktop computer system for under 2.5 K dollars from commodity components consisting of one quad-core CPU (Intel Core 2 Quad Q6600 Kentsfield 2.4GHz) and two high-end GPUs (nVidia's GeForce GTX 295 and Tesla C1060). A 1200 watt power supply is required. On this commodity system, we have constructed an easy-to-use hybrid computing environment, in which Message Passing Interface (MPI) is used for managing the workloads, for transferring the data among different GPU devices, and for minimizing the need for CPU memory. The test runs using the MAGMA (Matrix Algebra on GPU and Multicore Architectures) library show that the speedups for double precision calculations can be greater than 10 (GPU vs. CPU) and they are bigger (> 20) for single precision calculations. In addition we have enabled the combination of Matlab with CUDA for interactive visualization through MPI, i.e., two GPU devices are used for simulation and one GPU device is used for visualizing the computing results as the simulation runs. Our experience with this commodity system has shown that running multiple applications on one GPU device or running one application across multiple GPU devices can be done as conveniently as on CPUs. With NVIDIA CEO Jen-Hsun Huang's claim that over the next 6 years GPU processing power will increase by 570x compared to the 3x for CPUs, future low-cost commodity computers such as ours may be a remedy for the long wait queues of the world's supercomputers, especially for small- and mid-scale computation. Our goal here is to explore the limits and capabilities of this emerging technology and to get ourselves ready to run large-scale simulations on the next generation of computing environment, which we believe will hybridize CPU and GPU architectures.
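
    The MPI-based division of roles mentioned above (simulation on some GPU devices, visualization on another) can be sketched at the host level. The minimal C++/MPI program below is only an illustration of that rank-to-role assignment; device-side calls are indicated by comments because they depend on the GPU toolkit actually used, and the role names are invented.

    // Minimal MPI role split: rank 0 acts as the visualization process, the other
    // ranks act as simulation processes, each intended to drive one GPU device.
    // Device-side calls are only indicated by comments.
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, size = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            std::printf("rank %d: visualization role\n", rank);
            // receive partial results from the simulation ranks and render them
        } else {
            std::printf("rank %d of %d: simulation role\n", rank, size);
            // select GPU device (rank - 1), run a simulation step, send results to rank 0
        }
        MPI_Finalize();
        return 0;
    }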

  18. Real-time radar signal processing using GPGPU (general-purpose graphic processing unit)

    Science.gov (United States)

    Kong, Fanxing; Zhang, Yan Rockee; Cai, Jingxiao; Palmer, Robert D.

    2016-05-01

    This study introduces a practical approach to develop real-time signal processing chain for general phased array radar on NVIDIA GPUs(Graphical Processing Units) using CUDA (Compute Unified Device Architecture) libraries such as cuBlas and cuFFT, which are adopted from open source libraries and optimized for the NVIDIA GPUs. The processed results are rigorously verified against those from the CPUs. Performance benchmarked in computation time with various input data cube sizes are compared across GPUs and CPUs. Through the analysis, it will be demonstrated that GPGPUs (General Purpose GPU) real-time processing of the array radar data is possible with relatively low-cost commercial GPUs.

  19. The Feasibility of Using OpenCL Instead of OpenMP for Parallel CPU Programming

    OpenAIRE

    Karimi, Kamran

    2015-01-01

    OpenCL, along with CUDA, is one of the main tools used to program GPGPUs. However, it allows running the same code on multi-core CPUs too, making it a rival for the long-established OpenMP. In this paper we compare OpenCL and OpenMP when developing and running compute-heavy code on a CPU. Both ease of programming and performance aspects are considered. Since, unlike a GPU, no memory copy operation is involved, our comparisons measure the code generation quality, as well as thread management e...
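
    For reference, the OpenMP side of such a comparison can be as simple as the following compute-heavy loop; the artificial workload and problem size are illustrative only, and the matching OpenCL kernel (not shown) would require separate kernel source and device setup.

```cpp
// Minimal OpenMP sketch of a compute-heavy CPU loop of the kind compared in
// the paper; the workload (repeated transcendental evaluations) is illustrative.
#include <cmath>
#include <cstdio>
#include <vector>
#include <omp.h>

int main() {
    const std::size_t n = 1 << 22;
    std::vector<double> data(n, 1.0);

    double t0 = omp_get_wtime();
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < static_cast<long>(n); ++i) {
        double x = data[i];
        for (int k = 0; k < 50; ++k)      // artificial arithmetic intensity
            x = std::sin(x) * std::cos(x) + 1.0;
        data[i] = x;
    }
    double t1 = omp_get_wtime();

    std::printf("threads=%d time=%.3fs sample=%.6f\n",
                omp_get_max_threads(), t1 - t0, data[0]);
    return 0;
}
```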

  20. UniMAT Releases the UN CPU224 DC/DC/DC

    Institute of Scientific and Technical Information of China (English)

    2011-01-01

    In December 2011, UniMAT officially released the UN CPU224 DC/DC/DC, order number UN214-1AD23-0XB0. This 200-series own-brand PLC controller module was developed by UniMAT's R&D department over the course of a year, drawing on many years of market experience with PLC modules. While remaining compatible with Siemens PLC functionality, it adds a number of optimizations and functional enhancements.

  1. VMware vSphere performance designing CPU, memory, storage, and networking for performance-intensive workloads

    CERN Document Server

    Liebowitz, Matt; Spies, Rynardt

    2014-01-01

    Covering the latest VMware vSphere software, this essential book is aimed at solving vSphere performance problems before they happen. VMware vSphere is the industry's most widely deployed virtualization solution; however, if you improperly deploy vSphere, performance problems occur. Aimed at VMware administrators and engineers and written by a team of VMware experts, this resource provides guidance on common CPU, memory, storage, and network-related problems. Plus, step-by-step instructions walk you through techniques for solving problems and shed light on possible causes behind the problems.

  2. Brownian dynamics simulations on CPU and GPU with BD_BOX.

    Science.gov (United States)

    Długosz, Maciej; Zieliński, Paweł; Trylska, Joanna

    2011-09-01

    There has been growing interest in simulating biological processes under in vivo conditions due to recent advances in experimental techniques dedicated to study single particle behavior in crowded environments. We have developed a software package, BD_BOX, for multiscale Brownian dynamics simulations. BD_BOX can simulate either single molecules or multicomponent systems of diverse, interacting molecular species using flexible, coarse-grained bead models. BD_BOX is written in C and employs modern computer architectures and technologies; these include MPI for distributed-memory architectures, OpenMP for shared-memory platforms, NVIDIA CUDA framework for GPGPU, and SSE vectorization for CPU. Copyright © 2011 Wiley Periodicals, Inc.

  3. Influence of timing variability between motor unit potentials on M-wave characteristics.

    Science.gov (United States)

    Rodriguez-Falces, Javier; Malanda, Armando; Latasa, Iban; Lavilla-Oiz, Ana; Navallas, Javier

    2016-10-01

    The transient enlargement of the compound muscle action potential (M wave) after a conditioning contraction is referred to as potentiation. It has been recently shown that the potentiation of the first and second phases of a monopolar M wave differed drastically; namely, the first phase remained largely unchanged, whereas the second phase underwent a marked enlargement and shortening. This dissimilar potentiation of the first and second phases has been suggested to be attributed to a transient increase in conduction velocity after the contraction. Here, we present a series of simulations to test if changes in the timing variability between motor unit potentials (MUPs) can be responsible for the unequal potentiation (and shortening) of the first and the second M-wave phases. We found that an increase in the mean motor unit conduction velocity resulted in a marked enlargement and narrowing of both the first and second M-wave phases. The enlargement of the first phase caused by a global increase in motor unit conduction velocities was apparent even for the electrode located over the innervation zone and became more pronounced with increasing distance to the innervation zone, whereas the potentiation of the second phase was largely independent of electrode position. Our simulations indicate that it is unlikely that an increase in motor unit conduction velocities (accompanied or not by changes in their distribution) could account for the experimental observation that only the second phase of a monopolar M wave, but not the first, is enlarged after a brief contraction. However, the combination of an increase in the motor unit conduction velocities and a spreading of the motor unit activation times could potentially explain the asymmetric potentiation of the M-wave phases.

  4. Vectorized K-Means Algorithm on Heterogeneous CPU/MIC Architecture

    Institute of Scientific and Technical Information of China (English)

    谭郁松; 伍复慧; 吴庆波; 陈微; 孙晓利

    2014-01-01

    In the context of the big data era, K-Means is an important cluster analysis algorithm in data mining. Massive high-dimensional data processing places strong performance demands on K-Means implementations. The newly proposed MIC (many integrated core) architecture provides both thread-level parallelism between cores and instruction-level parallelism within each core, which makes MIC a good choice for algorithm acceleration. This paper first describes the basic K-Means algorithm and analyzes its bottleneck, then proposes a novel vectorized K-Means algorithm with an optimized vector data layout strategy that achieves higher parallel performance. Moreover, it implements the vectorized algorithm on a CPU/MIC heterogeneous platform and explores MIC optimization strategies in non-traditional HPC (high performance computing) applications. The experimental results show that the vectorized K-Means algorithm has excellent performance and scalability.
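
    A minimal sketch of the vectorization idea, assuming a generic CPU with OpenMP SIMD support rather than the MIC platform used in the paper: laying the point coordinates out as a structure of arrays makes the innermost loop of the K-Means assignment step unit-stride and therefore vectorizable.

```cpp
// Sketch of the K-Means assignment step with a structure-of-arrays layout so
// the innermost loop over points can be vectorized; this is a generic
// illustration, not the paper's CPU/MIC implementation.
#include <cstdio>
#include <vector>

int main() {
    const int nPoints = 8, k = 2;
    // Structure-of-arrays layout: each dimension is contiguous (unit stride).
    std::vector<float> x = {1, 1, 2, 2, 8, 8, 9, 9};      // dimension 0
    std::vector<float> y = {1, 2, 1, 2, 8, 9, 8, 9};      // dimension 1
    float cx[k] = {0.f, 10.f}, cy[k] = {0.f, 10.f};       // initial centroids
    std::vector<float> best(nPoints, 1e30f);
    std::vector<int>   label(nPoints, -1);

    // Assignment step: centroid loop outside, vectorizable point loop inside.
    for (int c = 0; c < k; ++c) {
        #pragma omp simd
        for (int i = 0; i < nPoints; ++i) {
            float dx = x[i] - cx[c];
            float dy = y[i] - cy[c];
            float d  = dx * dx + dy * dy;
            if (d < best[i]) { best[i] = d; label[i] = c; }
        }
    }
    for (int i = 0; i < nPoints; ++i)
        std::printf("point %d -> cluster %d (d2=%.1f)\n", i, label[i], best[i]);
    return 0;
}
```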

  5. Nursing time study for the administration of a PRN oral analgesic on an orthopedic postoperative unit.

    Science.gov (United States)

    Pizzi, Lois J; Chelly, Jacques E; Marlin, Vanessa

    2014-09-01

    As needed (PRN) oral opioid analgesics are an integral part of many orthopedic postoperative multimodal pain management regimens. However, the unpredictable nature of this dosing method can lead to disruptions in the process of administering the medication, as well as be an interruption to regular nursing activities. This IRB approved quantitative time study tested the hypothesis that a significant amount of nursing time is required in the administration of PRN oral opioid analgesics on a postoperative orthopedic nursing unit. The purpose of this study is to evaluate the time necessary to complete the required steps related to the administration of PRN oral analgesics. Nurses from 28 nursing shifts used a personal digital assistant (PDA) to record the time needed to complete these steps. We determined that 10.9 minutes is the mean time required to administer PRN oral analgesics on this unit. Other time studies have evaluated the medication administration process as a whole. No time studies related to PRN oral analgesic administration have been reported. In phase I of our project, the data were summarized and will be used as a baseline comparison for phase II, in which we will evaluate an oral PCA medication administration system.

  6. Monte Carlo standardless approach for laser induced breakdown spectroscopy based on massive parallel graphic processing unit computing

    Science.gov (United States)

    Demidov, A.; Eschlböck-Fuchs, S.; Kazakov, A. Ya.; Gornushkin, I. B.; Kolmhofer, P. J.; Pedarnig, J. D.; Huber, N.; Heitz, J.; Schmid, T.; Rössler, R.; Panne, U.

    2016-11-01

    An improved Monte-Carlo (MC) method for standard-less analysis in laser induced breakdown spectroscopy (LIBS) is presented. Concentrations in MC LIBS are found by fitting model-generated synthetic spectra to experimental spectra. The current version of MC LIBS is based on graphic processing unit (GPU) computation and reduces the analysis time to several seconds per spectrum/sample. The previous version of MC LIBS, which was based on central processing unit (CPU) computation, required unacceptably long analysis times of tens of minutes per spectrum/sample. The reduction of the computational time is achieved through massively parallel computing on the GPU, which embeds thousands of co-processors. It is shown that the number of iterations on the GPU exceeds that on the CPU by a factor > 1000 for the 5-dimensional parameter space and yet requires a > 10-fold shorter computational time. The improved GPU-MC LIBS outperforms the CPU-MC LIBS in terms of accuracy, precision, and analysis time. The performance is tested on LIBS spectra obtained from pelletized powders of metal oxides consisting of CaO, Fe2O3, MgO, and TiO2 that simulate by-products of the steel industry, steel slags. It is demonstrated that GPU-based MC LIBS is capable of rapid multi-element analysis with relative errors between 1 and a few tens of percent, which is sufficient for industrial applications (e.g. steel slag analysis). The results of the improved GPU-based MC LIBS compare favorably to those of the CPU-based MC LIBS as well as to the results of standard calibration-free (CF) LIBS based on the Boltzmann plot method.

  7. A Qualitative and Quantitative Analysis of Multi-core CPU Power and Performance Impact on Server Virtualization for Enterprise Cloud Data Centers

    Directory of Open Access Journals (Sweden)

    S. Suresh

    2015-02-01

    Full Text Available Cloud computing is an on-demand service provisioning technique that uses virtualization as the underlying technology for managing and improving the utilization of data and computing center resources through server consolidation. Even though virtualization is a software technology, it has the effect of making hardware more important for achieving a high consolidation ratio. Performance and energy efficiency are among the most important issues for large-scale server systems in current and future cloud data centers. As improved performance is pushing the migration to multi-core processors, this study presents an analytic and simulation study of the multi-core impact on server virtualization for new levels of performance and energy efficiency in cloud data centers. To this end, the study develops a system model of a virtualized server cluster and validates it with respect to the CPU core impact on performance and power consumption, in terms of mean response time (mean delay) versus offered cloud load. Analytic and simulation results show that the multi-core virtualized model yields the best results (smallest mean delays) over a single fat CPU processor (faster clock speed) for diverse cloud workloads. For the given application, multiple cores, by sharing the processing load, improve overall system performance under all workload conditions, whereas the fat single-CPU model is only best suited for lighter loads. In addition, multi-core processors do not consume more power or generate more heat than a single-core processor, which gives users more processing power without the drawbacks typically associated with such increases. Therefore, cloud data centers today rely almost exclusively on multi-core systems.
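
    As a hedged illustration of how mean response time can be compared between one fast core and several slower cores, the following uses the textbook M/M/c queueing formula; this is not the paper's model, and the arrival and service rates are arbitrary.

```cpp
// Textbook M/M/c sketch (not the paper's model): mean response time of a
// virtualized server modeled as c cores, each with service rate mu, under
// Poisson arrivals with rate lambda (requires lambda < c*mu).
#include <cstdio>

double mmcResponseTime(double lambda, double mu, int c) {
    double a = lambda / mu;                 // offered load in Erlangs
    double rho = a / c;                     // per-core utilization
    double sum = 0.0, term = 1.0;           // term = a^k / k!
    for (int k = 0; k < c; ++k) { sum += term; term *= a / (k + 1); }
    double erlangC = term / (term + (1.0 - rho) * sum);      // P(job waits)
    return 1.0 / mu + erlangC / (c * mu - lambda);            // W = service + wait
}

int main() {
    double lambda = 3.0;                                      // jobs per second
    // One fast core (rate 4) vs. four slower cores (rate 1 each), equal capacity.
    std::printf("1 x fast core : W = %.3f s\n", mmcResponseTime(lambda, 4.0, 1));
    std::printf("4 x slow cores: W = %.3f s\n", mmcResponseTime(lambda, 1.0, 4));
    return 0;
}
```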

  8. UNIT-RATE COMPLEX ORTHOGONAL SPACE-TIME BLOCK CODE CONCATENATED WITH TURBO CODING

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Space-Time Block (STB) code has been an effective transmit diversity technique for combating fading due to its orthogonal design, simple decoding and high diversity gains. In this paper, a unit-rate complex orthogonal STB code for multiple antennas in Time Division Duplex (TDD) mode is proposed. Meanwhile, Turbo Coding (TC) is employed to improve the performance of proposed STB code further by utilizing its good ability to combat the burst error of fading channel. Compared with full-diversity multiple antennas STB codes, the proposed code can implement unit rate and partial diversity; and it has much smaller computational complexity under the same system throughput. Moreover, the application of TC can effectively make up for the performance loss due to partial diversity. Simulation results show that on the condition of same system throughput and concatenation of TC, the proposed code has lower Bit Error Rate (BER) than those full-diversity codes.

  9. Strategies that delay or prevent the timely availability of affordable generic drugs in the United States.

    Science.gov (United States)

    Jones, Gregory H; Carrier, Michael A; Silver, Richard T; Kantarjian, Hagop

    2016-03-17

    High cancer drug prices are influenced by the availability of generic cancer drugs in a timely manner. Several strategies have been used to delay the availability of affordable generic drugs into the United States and world markets. These include reverse payment or pay-for-delay patent settlements, authorized generics, product hopping, lobbying against cross-border drug importation, buying out the competition, and others. In this forum, we detail these strategies and how they can be prevented.

  10. CPU-12, a novel synthesized oxazolo[5,4-d]pyrimidine derivative, showed superior anti-angiogenic activity.

    Science.gov (United States)

    Liu, Jiping; Deng, Ya-Hui; Yang, Ling; Chen, Yijuan; Lawali, Manzo; Sun, Li-Ping; Liu, Yu

    2015-09-01

    Angiogenesis is a crucial requirement for malignant tumor growth, progression and metastasis. Tumor-derived factors stimulate the formation of new blood vessels, which actively support tumor growth and spread. Various drugs have been applied to inhibit tumor angiogenesis. CPU-12, 4-chloro-N-(4-((2-(4-methoxyphenyl)-5-methyloxazolo[5,4-d] pyrimidin-7-yl)amino)phenyl)benzamide, is a novel oxazolo[5,4-d]pyrimidine derivative that showed potent activity in inhibiting VEGF-induced angiogenesis in vitro and ex vivo. In cell toxicity experiments, CPU-12 significantly inhibited human umbilical vein endothelial cell (HUVEC) proliferation in a dose-dependent manner with a low IC50 value of 9.30 ± 1.24 μM. In vitro, CPU-12 remarkably inhibited HUVEC migration, chemotactic invasion and capillary-like tube formation in a dose-dependent manner. Ex vivo, CPU-12 effectively inhibited new microvessels sprouting from the rat aortic ring. In addition, the downstream signalings of vascular endothelial growth factor receptor-2 (VEGFR-2), including the phosphorylation of PI3K, ERK1/2 and p38 MAPK, were effectively down-regulated by CPU-12. This evidence suggests that the angiogenic response mediated by VEGFR through distinct signal transduction pathways regulating proliferation, migration and tube formation of endothelial cells was significantly inhibited by the novel small molecule compound CPU-12 in vitro and ex vivo. In conclusion, CPU-12 showed superior anti-angiogenic activity in vitro. Copyright © 2015 The Authors. Production and hosting by Elsevier B.V. All rights reserved.

  11. Metamodel-assisted evolutionary algorithms for the unit commitment problem with probabilistic outages

    Energy Technology Data Exchange (ETDEWEB)

    Georgopoulou, Chariklia A.; Giannakoglou, Kyriakos C. [National Technical University of Athens, School of Mechanical Engineering, Lab. of Thermal Turbomachines, Parallel CFD and Optimization Unit, P.O. Box 64069, Athens 157 10 (Greece)

    2010-05-15

    An efficient method for solving power generating unit commitment (UC) problems with probabilistic unit outages is proposed. It is based on a two-level evolutionary algorithm (EA) minimizing the expected total operating cost (TOC) of a system of power generating units over a scheduling period, with known failure and repair rates of each unit. To compute the cost function value of each EA population member, namely a candidate UC schedule, a Monte Carlo simulation must be carried out. Some thousands of replicates are generated according to the units' outage and repair rates and the corresponding probabilities. Each replicate is represented by a series of randomly generated availability and unavailability periods for each unit, and the UC schedule under consideration is adjusted accordingly. The expected TOC is the average of the TOCs of all Monte Carlo replicates. Therefore, the CPU cost per Monte Carlo evaluation increases noticeably and so does the CPU cost of running the EA. To reduce it, the use of a metamodel-assisted EA (MAEA) with on-line trained surrogate evaluation models or metamodels (namely, radial-basis function networks) is proposed. A novelty of this method is that the metamodels are trained on a few "representative" unit outage scenarios selected among the Monte Carlo replicates generated once during the optimization and, then, used to predict the expected TOC. Based on this low-cost, approximate pre-evaluation, only a few top individuals within each generation undergo Monte Carlo simulations. The proposed MAEA is demonstrated on test problems and shown to drastically reduce the CPU cost, compared to EAs which are exclusively based on Monte Carlo simulations. (author)
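
    The Monte Carlo evaluation step can be sketched as follows; the unit data, the per-hour Bernoulli outage model and the cost terms are illustrative simplifications of the availability/unavailability periods and TOC model described in the abstract.

```cpp
// Hedged sketch of the Monte Carlo evaluation of a candidate unit-commitment
// schedule: the expected total operating cost (TOC) is estimated by averaging
// the cost over random outage replicates. All numbers are illustrative.
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const int nUnits = 3, nHours = 24, nReplicates = 2000;
    const double demand = 150.0;                        // MW, constant for brevity
    const double capacity[nUnits]   = {100.0, 80.0, 60.0};
    const double costPerMWh[nUnits] = {20.0, 30.0, 45.0};
    const double outageProb[nUnits] = {0.02, 0.05, 0.04};   // per hour (simplified)
    const double penaltyPerMWh = 500.0;                  // unserved energy penalty
    // Candidate schedule: unit u committed in hour h (here: all units always on).
    std::vector<std::vector<int>> committed(nUnits, std::vector<int>(nHours, 1));

    std::mt19937 rng(42);
    std::uniform_real_distribution<double> u01(0.0, 1.0);

    double totalCost = 0.0;
    for (int r = 0; r < nReplicates; ++r) {
        double cost = 0.0;
        for (int h = 0; h < nHours; ++h) {
            double served = 0.0, remaining = demand;
            for (int u = 0; u < nUnits && remaining > 0.0; ++u) {
                bool available = committed[u][h] && u01(rng) > outageProb[u];
                if (!available) continue;
                double gen = std::min(capacity[u], remaining);
                cost += gen * costPerMWh[u];
                served += gen;
                remaining -= gen;
            }
            cost += (demand - served) * penaltyPerMWh;   // unserved demand penalty
        }
        totalCost += cost;
    }
    std::printf("expected TOC over %d replicates: %.0f\n",
                nReplicates, totalCost / nReplicates);
    return 0;
}
```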

  12. Perceptions of mentoring of full-time occupational therapy faculty in the United States.

    Science.gov (United States)

    Falzarano, Mary; Zipp, Genevieve Pinto

    2012-09-01

    The purpose of this study was to describe the occurrence, nature and perception of the influence of mentoring for full-time occupational therapy faculty members who are on the tenure track or eligible for re-appointment in the United States. An online survey was sent in September 2010, at the beginning of the academic year, to all 818 potential participants in United States entry-level and doctoral programmes. Fifty-six of 107 participants who met the criteria reported being in a mentoring relationship and positively rated their perception of the influence of mentoring on academic success and academic socialization. The response of all participants to open-ended questions describes preferred mentoring characteristics (providing information, support), benefits (having someone to go to, easing the stress) and challenges (not enough time, mentoring not valued). Findings inform current and potential faculty of the current state of mentoring. Administrators can use this information when designing mentoring opportunities, educating mentors and mentees about the mentoring process, arranging mentor/mentee release time for engaging in the mentoring process and, finally, managing the mentor/mentee needs. The cross-sectional survey of United States occupational therapy faculty limits generalizability yet paves the way for future studies to explore retention and recruitment of mentored faculty across countries.

  13. Variation in voxel value distribution and effect of time between exposures in six CBCT units.

    Science.gov (United States)

    Spin-Neto, R; Gotfredsen, E; Wenzel, A

    2014-01-01

    The aim of this study is to assess the variation in voxel value distribution in volumetric data sets obtained by six cone beam CT (CBCT) units, and the effect of time between exposures. Six CBCT units [Cranex(®) 3D (CRAN; Soredex Oy, Tuusula, Finland), Scanora(®) 3D (SCAN; Soredex Oy), NewTom™ 5G (NEWT; QR Srl, Verona, Italy), Promax(®) Dimax 3 (Planmeca Oy, Helsinki, Finland), i-CAT (Imaging Sciences International, Hatfield, PA) and 3D Accuitomo FPD80 (Morita, Kyoto, Japan)] were tested. Two volumetric data sets of a dry human skull embedded in acrylic were acquired by each CBCT unit in two sessions on separate days. Each session consisted of 20 exposures: 10 acquired with 30 min between exposures and 10 acquired immediately one after the other. CBCT data were exported as digital imaging and communications in medicine (DICOM) files and converted to text files. The text files were re-organized to contain x-, y- and z-position and grey shade for each voxel. The files were merged to contain 1 record per voxel position, including the voxel values from the 20 exposures in a session. For each voxel, subtractions were performed between Data Set 1 and the remaining 19 data sets (1 - 2, 1 - 3, etc) in a session. Means, medians, ranges and standard deviations for grey shade variation in the subtraction data sets were calculated for each unit and session. For all CBCT units, variation in voxel values was observed throughout the 20 exposures. A "fingerprint" for the grey shade variation was observed for CRAN, SCAN and NEWT. For the other units, the variation was (apparently) randomly distributed. Large discrepancies in voxel value distribution are seen in CBCT images. This variation should be considered in studies that assess minute changes in CBCT images.

  14. CPU Cooling of Desktop PC by Closed-end Oscillating Heat-pipe (CEOHP)

    Directory of Open Access Journals (Sweden)

    S. Rittidech

    2005-01-01

    Full Text Available The CEOHP cooling module consisted of two main parts, i.e., the aluminum housing and the CEOHP. The housing was designed to suit the CEOHP and was drilled so the CEOHP could be inserted. The CEOHP design employed copper tubes: two sets of capillary tubes with an inner diameter of 0.002 m, an evaporator length of 0.05 m and a condenser length of 0.16 m, each of which has six meandering turns. The evaporator section was embedded in the aluminum housing and attached to the thermal pad of a Pentium 4 CPU, model SL 6 PB, 2.26 GHz, while the condenser section was embedded in the cooling fin housing and cooled by forced convection. R134a was used as the working fluid with a filling ratio of 50%. In the experiment, the CPU chip with a power of 58 W was at 70°C, and fan speeds of 2000 and 4000 rpm were tested. It was found that the cooling performance increases as the fan speed increases. The CEOHP cooling module had better thermal performance than a conventional heat sink.

  15. The “Chimera”: An Off-The-Shelf CPU/GPGPU/FPGA Hybrid Computing Platform

    Directory of Open Access Journals (Sweden)

    Ra Inta

    2012-01-01

    Full Text Available The nature of modern astronomy means that a number of interesting problems exhibit a substantial computational bound, and this situation is gradually worsening. Scientists, increasingly fighting for valuable resources on conventional high-performance computing (HPC) facilities—often with a limited customizable user environment—are increasingly looking to hardware acceleration solutions. We describe here a heterogeneous CPU/GPGPU/FPGA desktop computing system (the “Chimera”), built with commercial-off-the-shelf components. We show that this platform may be a viable alternative solution to many common computationally bound problems found in astronomy; however, it is not without significant challenges. The most significant bottleneck in pipelines involving real data is most likely to be the interconnect (in this case the PCI Express bus residing on the CPU motherboard). Finally, we speculate on the merits of our Chimera system on the entire landscape of parallel computing, through the analysis of representative problems from UC Berkeley’s “Thirteen Dwarves.”

  16. Energy consumption optimization of the total-FETI solver by changing the CPU frequency

    Science.gov (United States)

    Horak, David; Riha, Lubomir; Sojka, Radim; Kruzik, Jakub; Beseda, Martin; Cermak, Martin; Schuchart, Joseph

    2017-07-01

    The energy consumption of supercomputers is one of the critical problems for the upcoming Exascale supercomputing era. Awareness of power and energy consumption is required on both the software and hardware side. This paper deals with the energy consumption evaluation of the Finite Element Tearing and Interconnect (FETI) based solvers of linear systems, which is an established method for solving real-world engineering problems. We have evaluated the effect of the CPU frequency on the energy consumption of the FETI solver using a linear elasticity 3D cube synthetic benchmark. In this problem, we have evaluated the effect of frequency tuning on the energy consumption of the essential processing kernels of the FETI method. The paper provides results for two types of frequency tuning: (1) static tuning and (2) dynamic tuning. For static tuning experiments, the frequency is set before execution and kept constant during the runtime. For dynamic tuning, the frequency is changed during the program execution to adapt the system to the actual needs of the application. The paper shows that static tuning brings up to 12% energy savings when compared to default CPU settings (the highest clock rate). Dynamic tuning improves this further by up to 3%.
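
    On Linux systems, static tuning of the kind described above can be approximated by capping the maximum core frequency through the cpufreq sysfs interface before launching the solver; this sketch assumes root privileges and a cpufreq-enabled kernel, and the frequency value is a placeholder rather than anything used in the paper.

```cpp
// Hedged sketch of "static tuning": pinning a core's maximum frequency through
// the Linux cpufreq sysfs interface. Requires root privileges, cpufreq support,
// and a frequency the driver actually advertises; paths and values are assumptions.
#include <fstream>
#include <iostream>
#include <string>

bool setMaxFrequencyKHz(int cpu, long khz) {
    std::string path = "/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                       "/cpufreq/scaling_max_freq";
    std::ofstream f(path);
    if (!f) return false;           // no cpufreq support or insufficient rights
    f << khz << '\n';
    return static_cast<bool>(f);
}

int main() {
    const long targetKHz = 2100000;             // e.g. 2.1 GHz (illustrative)
    for (int cpu = 0; cpu < 4; ++cpu)           // first four cores only
        std::cout << "cpu" << cpu << ": "
                  << (setMaxFrequencyKHz(cpu, targetKHz) ? "ok" : "failed")
                  << '\n';
    return 0;
}
```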

  17. The impact of financial and nonfinancial incentives on business-unit outcomes over time.

    Science.gov (United States)

    Peterson, Suzanne J; Luthans, Fred

    2006-01-01

    Unlike previous behavior management research, this study used a quasi-experimental, control group design to examine the impact of financial and nonfinancial incentives on business-unit (21 stores in a fast-food franchise corporation) outcomes (profit, customer service, and employee turnover) over time. The results showed that both types of incentives had a significant impact on all measured outcomes. The financial incentive initially had a greater effect on all 3 outcomes, but over time, the financial and nonfinancial incentives had an equally significant impact except in terms of employee turnover.

  18. Dynamic Agricultural Land Unit Profile Database Generation using Landsat Time Series Images

    Science.gov (United States)

    Torres-Rua, A. F.; McKee, M.

    2012-12-01

    Agriculture requires continuous supply of inputs to production, while providing final or intermediate outputs or products (food, forage, industrial uses, etc.). Government and other economic agents are interested in the continuity of this process and make decisions based on the available information about current conditions within the agriculture area. From a government point of view, it is important that the input-output chain in agriculture for a given area be enhanced in time, while any possible abrupt disruption be minimized or be constrained within the variation tolerance of the input-output chain. The stability of the exchange of inputs and outputs becomes of even more important in disaster-affected zones, where government programs will look for restoring the area to equal or enhanced social and economical conditions before the occurrence of the disaster. From an economical perspective, potential and existing input providers require up-to-date, precise information of the agriculture area to determine present and future inputs and stock amounts. From another side, agriculture output acquirers might want to apply their own criteria to sort out present and future providers (farmers or irrigators) based on the management done during the irrigation season. In the last 20 years geospatial information has become available for large areas in the globe, providing accurate, unbiased historical records of actual agriculture conditions at individual land units for small and large agricultural areas. This data, adequately processed and stored in any database format, can provide invaluable information for government and economic interests. Despite the availability of the geospatial imagery records, limited or no geospatial-based information about past and current farming conditions at the level of individual land units exists for many agricultural areas in the world. The absence of this information challenges the work of policy makers to evaluate previous or current

  19. MODEL-BASED DEVELOPMENT OF REAL-TIME SOFTWARE SYSTEM FOR ELECTRONIC UNIT PUMP SYSTEM

    Institute of Scientific and Technical Information of China (English)

    YU Shitao; YANG Shiwei; YANG Lin; GONG Yuanming; ZHUO Bin

    2007-01-01

    A real-time operating system (RTOS), also referred to simply as the OS, is designed based on the MC68376 hardware platform and implemented in the electronic control system for the unit pump in a diesel engine. A parallel, time-based task division method is introduced and a multi-task software architecture is built in the software system for the electronic unit pump (EUP) system. The V-model software development process is used for the control algorithm of each task. The simulation results of the hardware-in-the-loop simulation system (HILSS) and the engine experimental results show that the OS is an efficient real-time kernel and can meet the real-time demands of the EUP system; the multi-task software system is real-time, deterministic and reliable. V-model development is a good development process for control algorithms of the EUP system: the control precision of the control system can be ensured, and the development cycle and cost are also decreased.

  20. Comparison of the CPU and memory performance of StatPatternRecognition (SPR) and Toolkit for MultiVariate Analysis (TMVA)

    CERN Document Server

    Palombo, Giulio

    2011-01-01

    High Energy Physics data sets are often characterized by a huge number of events. Therefore, it is extremely important to use statistical packages able to efficiently analyze these unprecedented amounts of data. We compare the performance of the statistical packages StatPatternRecognition (SPR) and Toolkit for MultiVariate Analysis (TMVA). We focus on how CPU time and memory usage of the learning process scale versus data set size. As classifiers, we consider Random Forests, Boosted Decision Trees and Neural Networks. For our tests, we employ a data set widely used in the machine learning community, the "Threenorm" data set, as well as data tailored for testing various edge cases. For each data set, we constantly increase its size and check the CPU time and memory needed to build the classifiers implemented in SPR and TMVA. We show that SPR is often significantly faster and consumes significantly less memory. For example, the SPR implementation of Random Forest is an order of magnitude faster and consumes an order...

  1. Effect of implementation of Quiet Time Protocol on sleep quality of patients in Intensive Care Unit

    Directory of Open Access Journals (Sweden)

    Chamanzari Hamid

    2016-06-01

    Full Text Available Background and Objective: Sleep disorder is considered one of the major challenges in the intensive care unit, and psychological and physical environmental factors are involved in its development. The adjustment of these factors to meet this need is essential. The current study was conducted to determine the effect of implementation of a Quiet Time Protocol on the sleep quality of patients in the intensive care unit. Materials and Method: In this clinical trial study, the study population was patients hospitalized in the surgical intensive care unit of Ghaem Hospital of Mashhad in 2013. 60 patients were selected by convenience sampling and assigned to intervention and control groups. The quiet time protocol was implemented in the intervention group for 3 consecutive nights from 7 pm to 5 am. The data were gathered through a researcher-made questionnaire on sleep quality in the first, second and third nights. Data analysis was done with Fisher's exact test, chi-square, independent t-test and repeated measures ANOVA in SPSS 21. Results: The mean score of sleep quality in the effectiveness aspect in the intervention group was higher than in the control group on all three nights (p<0.001). The mean in the sleep disorders aspect after the intervention was significantly reduced in the intervention group on the first (p=0.002), second and third nights (p<0.001) compared with the control group. Conclusion: According to the results, implementation of a quiet time protocol is effective in improving the sleep quality of patients in the surgical intensive care unit. Nurses can use this protocol to improve the quality of sleep in patients.

  2. Massively parallel signal processing using the graphics processing unit for real-time brain-computer interface feature extraction

    Directory of Open Access Journals (Sweden)

    J. Adam Wilson

    2009-07-01

    Full Text Available The clock speeds of modern computer processors have nearly plateaued in the past five years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card (GPU) was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a CPU-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
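
    The first step of that chain, the spatial filter, is an ordinary matrix-matrix product; the following plain C++ reference (with illustrative sizes and weights) shows the operation that was offloaded to the GPU.

```cpp
// Plain C++ reference for the spatial-filter step described above:
// (filters x channels) * (channels x samples). Sizes and weights are illustrative.
#include <cstdio>
#include <vector>

int main() {
    const int nFilters = 2, nChannels = 4, nSamples = 6;
    // Row-major matrices.
    std::vector<float> W(nFilters * nChannels, 0.25f);    // spatial filter weights
    std::vector<float> X(nChannels * nSamples);            // raw signals
    for (int i = 0; i < nChannels * nSamples; ++i) X[i] = static_cast<float>(i);
    std::vector<float> Y(nFilters * nSamples, 0.0f);        // filtered signals

    for (int f = 0; f < nFilters; ++f)
        for (int c = 0; c < nChannels; ++c)
            for (int s = 0; s < nSamples; ++s)
                Y[f * nSamples + s] += W[f * nChannels + c] * X[c * nSamples + s];

    for (int f = 0; f < nFilters; ++f) {
        for (int s = 0; s < nSamples; ++s)
            std::printf("%6.1f ", Y[f * nSamples + s]);
        std::printf("\n");
    }
    return 0;
}
```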

  3. Containment closure time following loss of cooling under shutdown conditions of YGN units 3 and 4

    Energy Technology Data Exchange (ETDEWEB)

    Seul, Kwang Won; Bang, Young Seok; Kim, Se Won; Kim, Hho Jung [Korea Institute of Nuclear Safety, Taejon (Korea, Republic of)

    1998-12-31

    The YGN Units 3 and 4 plant conditions during shutdown operation were reviewed to identify the possible event scenarios following the loss of shutdown cooling. The thermal hydraulic analyses were performed for the five cases of RCS configurations under the worst event scenario, unavailable secondary cooling and no RCS inventory makeup, using the RELAP5/MOD3.2 code to investigate the plant behavior. From the analyses results, times to boil, times to core uncovery and times to core heat up were estimated to determine the containment closure time to prevent the uncontrolled release of fission products to atmosphere. These data provide useful information to the abnormal procedure to cope with the event. 6 refs., 7 figs., 2 tabs. (Author)

  4. Development of a Real-time Personal Dosimeter System and its Application to Hanul Unit-4

    Energy Technology Data Exchange (ETDEWEB)

    Kang, Kidoo; Cho, Moonhyung; Son, Jungkwon [Korea Hydro Nuclear Power Co., Seoul (Korea, Republic of)

    2013-10-15

    The main reasons to adopt the system are to minimize unnecessary exposure, to calculate individual doses faster, and to provide a possible alternative to personnel such as the radiation safety manager. The KHNP Remote radiation Monitoring System (KRMS) is integrated, less bulky and lighter compared with existing instruments, while offering the combined functions of real-time dosimetry and voice communication. After laboratory tests in the Central Research Institute (CRI) and field tests in Hanbit units 3 and 4, KRMS was applied to the main radiation works in Hanul unit 4. KHNP-CRI has developed a real-time personal dose monitoring system and applied it to the Hanul overhaul, which included a steam generator replacement. It took 5 days to install the system in the reactor building, and the optimal locations for the repeaters were 3 points at 122 ft and 3 points at 100 ft. Owing to the optimization of the repeaters and the high-sensitivity antenna, there was no shaded area of the wireless network and no loss of dose data in spite of workers wearing lead jackets. The average deviation between the personal dose recorded by KRMS and the existing ADR is about 2%, which shows good agreement. The lessons learned at Hanul unit 4 are that the operating system needs simplification and that a function is required to check the battery level from a remote area.

  5. Computing the Density Matrix in Electronic Structure Theory on Graphics Processing Units.

    Science.gov (United States)

    Cawkwell, M J; Sanville, E J; Mniszewski, S M; Niklasson, Anders M N

    2012-11-13

    The self-consistent solution of a Schrödinger-like equation for the density matrix is a critical and computationally demanding step in quantum-based models of interatomic bonding. This step was tackled historically via the diagonalization of the Hamiltonian. We have investigated the performance and accuracy of the second-order spectral projection (SP2) algorithm for the computation of the density matrix via a recursive expansion of the Fermi operator in a series of generalized matrix-matrix multiplications. We demonstrate that owing to its simplicity, the SP2 algorithm [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] is exceptionally well suited to implementation on graphics processing units (GPUs). The performance in double and single precision arithmetic of a hybrid GPU/central processing unit (CPU) and a full GPU implementation of the SP2 algorithm exceeds that of a CPU-only implementation of the SP2 algorithm and traditional matrix diagonalization when the dimensions of the matrices exceed about 2000 × 2000. Padding schemes for arrays allocated in the GPU memory that optimize the performance of the CUBLAS implementations of the level 3 BLAS DGEMM and SGEMM subroutines for generalized matrix-matrix multiplications are described in detail. The analysis of the relative performance of the hybrid CPU/GPU and full GPU implementations indicates that the transfer of arrays between the GPU and CPU constitutes only a small fraction of the total computation time. The errors measured in the self-consistent density matrices computed using the SP2 algorithm are generally smaller than those measured in matrices computed via diagonalization. Furthermore, the errors in the density matrices computed using the SP2 algorithm do not exhibit any dependence on system size, whereas the errors increase linearly with the number of orbitals when diagonalization is employed.
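
    The SP2 recursion itself is compact enough to sketch with naive dense matrix products; the Hamiltonian, occupation number and Gershgorin spectral bounds below are illustrative, and this CPU-only sketch stands in for the CUBLAS-based implementation discussed in the abstract.

```cpp
// Hedged CPU-only sketch of the SP2 recursion (Niklasson, Phys. Rev. B 2002)
// with naive dense matrix products; the Hamiltonian, occupation number and
// Gershgorin spectral bounds are illustrative.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

using Mat = std::vector<double>;   // row-major n x n

Mat multiply(const Mat& a, const Mat& b, int n) {
    Mat c(n * n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

double trace(const Mat& a, int n) {
    double t = 0.0;
    for (int i = 0; i < n; ++i) t += a[i * n + i];
    return t;
}

int main() {
    const int n = 3;
    const double nOcc = 1.0;                      // number of occupied orbitals
    Mat h = {1.0, 0.5, 0.0,
             0.5, 2.0, 0.5,
             0.0, 0.5, 3.0};

    // Gershgorin estimates of the spectral bounds of H.
    double eMin = 1e300, eMax = -1e300;
    for (int i = 0; i < n; ++i) {
        double r = 0.0;
        for (int j = 0; j < n; ++j) if (j != i) r += std::fabs(h[i * n + j]);
        eMin = std::min(eMin, h[i * n + i] - r);
        eMax = std::max(eMax, h[i * n + i] + r);
    }

    // X0 = (eMax*I - H) / (eMax - eMin) maps the spectrum of H into [0, 1].
    Mat x(n * n);
    for (int i = 0; i < n * n; ++i) x[i] = -h[i] / (eMax - eMin);
    for (int i = 0; i < n; ++i) x[i * n + i] += eMax / (eMax - eMin);

    for (int iter = 0; iter < 100; ++iter) {
        Mat xsq = multiply(x, x, n);
        double trX = trace(x, n), trXsq = trace(xsq, n);
        if (std::fabs(trX - nOcc) < 1e-10 && std::fabs(trX - trXsq) < 1e-10) break;
        // Pick the projection that drives the trace towards nOcc.
        if (std::fabs(trXsq - nOcc) <= std::fabs(2.0 * trX - trXsq - nOcc))
            x = xsq;                                               // X <- X^2
        else
            for (int i = 0; i < n * n; ++i) x[i] = 2.0 * x[i] - xsq[i];  // X <- 2X - X^2
    }
    std::printf("trace of density matrix: %.6f (target %.1f)\n", trace(x, n), nOcc);
    return 0;
}
```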

  6. Adaptive sampling for real-time rendering of large terrain based on B-spline wavelet

    Science.gov (United States)

    Kalem, Sid Ali; Kourgli, Assia

    2017-05-01

    This paper describes a central processing unit (CPU)-based technique for terrain geometry rendering that could relieve the graphics processing unit (GPU) of processing the appropriate level of detail (LOD) of the geometric surface. The proposed approach alleviates the computational load on the CPU and approaches GPU-based efficiency. As the datasets of realistic terrains are usually huge for real-time rendering, we suggest using a training stage to handle a large tiled QuadTree terrain representation. The training stage is based on multiresolution wavelet decomposition and is used to limit the region of error control inside the tile. Maximum approximation errors are then calculated for each tile at different resolutions. The maximum world-space errors of the tile at different resolutions permit selection of the appropriate downsampling resolution that will represent the tile at run time. Tests and experiments demonstrate that B-spline 0 and B-spline 1 wavelets, well known for their properties of localization and their compact support, are suitable for fast and accurate localization of the maximum approximation error. The experimental results demonstrate that the proposed approach drastically reduces computation time in the CPU. Such a technique should also be used on low/medium-end PCs and embedded systems that are not equipped with the latest models of graphics hardware.
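
    A one-dimensional sketch of the training stage, assuming a Haar (B-spline 0) analysis of a synthetic height profile: each coarser level is formed by pairwise averaging, and the maximum world-space error of that level against the original samples is recorded, mirroring the per-tile, per-resolution error tables described above.

```cpp
// 1D sketch of the training stage: a Haar (B-spline 0) analysis of a height
// profile gives, for each resolution level, the maximum approximation error
// committed when the tile is represented at that level. The height data are
// synthetic; real tiles would be 2D and stored per tile in a quadtree.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // Synthetic height profile; length must be a power of two.
    std::vector<double> heights(64);
    for (std::size_t i = 0; i < heights.size(); ++i)
        heights[i] = 10.0 * std::sin(0.3 * i) + 2.0 * std::sin(1.7 * i);

    std::vector<double> approx = heights;
    int level = 0;
    while (approx.size() > 1) {
        // One Haar analysis step: pairwise averages form the coarser level.
        std::vector<double> coarser(approx.size() / 2);
        for (std::size_t i = 0; i < coarser.size(); ++i)
            coarser[i] = 0.5 * (approx[2 * i] + approx[2 * i + 1]);

        // Reconstruct to full resolution by piecewise-constant upsampling and
        // record the maximum world-space error against the original profile.
        double maxErr = 0.0;
        std::size_t blockLen = heights.size() / coarser.size();
        for (std::size_t i = 0; i < heights.size(); ++i)
            maxErr = std::max(maxErr, std::fabs(heights[i] - coarser[i / blockLen]));

        std::printf("level %d (%zu samples): max error = %.3f\n",
                    ++level, coarser.size(), maxErr);
        approx = coarser;
    }
    return 0;
}
```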

  7. Modifiable temporal unit problem (MTUP) and its effect on space-time cluster detection.

    Directory of Open Access Journals (Sweden)

    Tao Cheng

    Full Text Available BACKGROUND: When analytical techniques are used to understand and analyse geographical events, adjustments to the datasets (e.g. aggregation, zoning, segmentation, etc.) in both the spatial and temporal dimensions are often carried out for various reasons. The 'Modifiable Areal Unit Problem' (MAUP), which is a consequence of adjustments in the spatial dimension, has been widely researched. However, its temporal counterpart is generally ignored, especially in space-time analysis. METHODS: In analogy to MAUP, the Modifiable Temporal Unit Problem (MTUP) is defined as consisting of three temporal effects (aggregation, segmentation and boundary). The effects of MTUP on the detection of space-time clusters of crime datasets of Central London are examined using Space-Time Scan Statistics (STSS). RESULTS AND CONCLUSION: The case study reveals that MTUP has significant effects on the space-time clusters detected. The attributes of the clusters, i.e. temporal duration, spatial extent (size) and significance value (p-value), vary as the aggregation, segmentation and boundaries of the datasets change. Aggregation could be used to find the significant clusters much more quickly than at lower scales; segmentation could be used to understand the cyclic patterns of crime types. The consistencies of the clusters appearing at different temporal scales could help in identifying strong or 'true' clusters.

  8. Microcontroller based resonance tracking unit for time resolved continuous wave cavity-ringdown spectroscopy measurements.

    Science.gov (United States)

    Votava, Ondrej; Mašát, Milan; Parker, Alexander E; Jain, Chaithania; Fittschen, Christa

    2012-04-01

    We present in this work new tracking servoloop electronics for continuous wave cavity-ringdown absorption spectroscopy (cw-CRDS) and its application to time-resolved cw-CRDS measurements by coupling the system with a pulsed laser photolysis set-up. The tracking unit significantly increases the repetition rate of the CRDS events and thus improves the effective time resolution (and/or the signal-to-noise ratio) in kinetics studies with cw-CRDS in a given data acquisition time. The tracking servoloop uses a novel strategy to track the cavity resonances that results in fast relocking (a few ms) after the loss of tracking due to an external disturbance. The microcontroller-based design is highly flexible, so advanced tracking strategies are easy to implement by firmware modification without the need to modify the hardware. We believe that the performance of many existing cw-CRDS experiments, not only time-resolved, can be improved with such a tracking unit without any additional modification to the experiment. © 2012 American Institute of Physics.

  9. 16-bit CPU Design and FPGA Implementation

    Institute of Scientific and Technical Information of China (English)

    白广治; 陈泉根

    2007-01-01

    To develop a central processing unit (CPU) independently, a 16-bit CPU was studied. A decode-and-execute scheme that minimizes the number of execution cycles was proposed, the design followed a top-down methodology, and the code was written in the hardware description language Verilog. The CPU code was then verified by simulation and on a field programmable gate array (FPGA). The results show that the CPU's execution efficiency is considerably higher than that of general-purpose CPUs such as Intel's. This CPU can be used as an IP core in FPGA applications and also in SoC designs.

  10. Reorganizing nursing work on surgical units: a time-and-motion study.

    Science.gov (United States)

    Desjardins, France; Cardinal, Linda; Belzile, Eric; McCusker, Jane

    2008-01-01

    A time-and-motion study was conducted in response to perceptions that the surgical nursing staff at a Montreal hospital was spending an excessive amount of time on non-nursing care. A sample of 30 nurse shifts was observed by trained observers who timed nurses' activities for their entire working shift using a hand-held Personal Digital Assistant. Activities were grouped into four main categories: direct patient care, indirect patient care, non-nursing and personal activities. Break and meal times were excluded from the denominator of total worked hours. A total of 201 working hours were observed, an average of 6 hours, 42 minutes per nurse shift. The mean proportions of each nurse shift spent on the main activity categories were: direct care 32.8%, indirect care 55.7%, non-nursing tasks 9.0% and personal 2.5%. Three activities (communication among health professionals, medication verification/preparation and documentation) comprised 78.9% of indirect care time. Greater time on indirect care was associated with work on night shifts and on the short-stay surgical unit. Subsequent work reorganization focused on reducing time spent on communication and medications. The authors conclude that time-and-motion studies are a useful method of monitoring appropriate use of nursing staff, and may provide results that assist in restructuring nursing tasks.

  11. Domestically Produced CPU SW-1600 Oriented Vector Regroup

    Institute of Scientific and Technical Information of China (English)

    魏帅; 赵荣彩; 姚远

    2011-01-01

    Since vectorized regroup instructions are comparatively complex and different instructions incur different delays, it is hard to find a uniform and efficient vector regroup algorithm. This paper analyzes the shift and insertion/extraction instructions provided by the domestically produced CPU SW-1600, presents an optimal algorithm that relies only on shift or insertion/extraction instructions to realize vector regrouping, and an efficient algorithm that combines the two types of instructions. Finally, experiments show that the algorithms can vectorize programs well: the speedup ratio for integer data reaches 7.31, while that for complex double-precision floating-point programs reaches 1.83.

  12. Multiple time scales in modeling the incidence of infections acquired in intensive care units

    Directory of Open Access Journals (Sweden)

    Martin Wolkewitz

    2016-09-01

    Full Text Available Abstract Background When patients are admitted to an intensive care unit (ICU), their risk of getting an infection depends strongly on the length of stay at risk in the ICU. In addition, the risk of infection is likely to vary over calendar time as a result of fluctuations in the prevalence of the pathogen on the ward. Hence the risk of infection is expected to depend on two time scales (time in ICU and calendar time) as well as on competing events (discharge or death) and their spatial location. The purpose of this paper is to develop and apply appropriate statistical models for the risk of ICU-acquired infection accounting for multiple time scales, competing risks and the spatial clustering of the data. Methods A multi-center database from a Spanish surveillance network was used to study the occurrence of an infection due to Methicillin-resistant Staphylococcus aureus (MRSA). The analysis included 84,843 patient admissions between January 2006 and December 2011 from 81 ICUs. Stratified Cox models were used to study multiple time scales while accounting for spatial clustering of the data (patients within ICUs) and for death or discharge as competing events for MRSA infection. Results Both time scales, time in ICU and calendar time, are highly associated with the MRSA hazard rate and cumulative risk. When using only one basic time scale, the interpretation and magnitude of several patient-individual risk factors differed. Risk factors concerning the severity of illness were more pronounced when using only calendar time. These differences disappeared when using both time scales simultaneously. Conclusions The time-dependent dynamics of infections is complex and should be studied with models allowing for multiple time scales. For patient-individual risk factors we recommend stratified Cox regression models for competing events with ICU time as the basic time scale and calendar time as a covariate. The inclusion of calendar time and stratification by ICU

  13. Publishing Time-Frame Evaluation for Doctoral Students in United Kingdom

    Directory of Open Access Journals (Sweden)

    Andrada Elena URDA-CÎMPEAN

    2014-09-01

    Full Text Available The first objective of the study was to compute the time to completion and publication of original scientific publications for medical doctoral students in the UK. A second objective was to evaluate if PhD theses format (monograph or publication-based) can influence the time to completion and publication of original scientific publications. We assessed a small sample of free full text medical doctoral theses from universities in the United Kingdom (mostly from the University of Manchester), which have produced at least 2 original scientific publications by the end of the doctoral studies. The time elapsed between 2 consecutive publications from the same thesis was considered an approximation of the time to completion and publication of the second publication. In the case of prospective theses, the median time to completion and publication of original scientific publications from medical doctoral theses was 10.17 months. We found that there was a statistically significant difference between the time (to completion and publication) medians of the publications from traditional theses format and of the publications from publication-based theses format. Time to completion and publication of original scientific publications for medical doctoral students needs to be further evaluated on a larger scale, based on more theses from several medical faculties in the UK.

  14. Time-resolved and time-averaged stereo-PIV measurements of a unit-ratio cavity

    Science.gov (United States)

    Immer, Marc; Allegrini, Jonas; Carmeliet, Jan

    2016-06-01

    An experimental setup was developed to perform wind tunnel measurements on a unit-ratio, 2D open cavity under perpendicular incident flow. The open cavity is characterized by a mixing layer at the cavity top, that divides the flow field into a boundary layer flow and a cavity flow. Instead of precisely replicating a specific type of inflow, such as a turbulent flat plate boundary layer or an atmospheric boundary layer, the setup is capable of simulating a wide range of inflow profiles. This is achieved by using triangular spires as upstream turbulence generators, which can modify the otherwise laminar inflow boundary layer to be moderately turbulent and stationary, or heavily turbulent and intermittent. Measurements were performed by means of time-resolved stereo PIV. The cavity shear layer is analyzed in detail using flow statistics, spectral analysis, and space-time plots. The ability of the setup to generate typical cavity flow cases is demonstrated for characteristic inflow boundary layers, laminar and turbulent. Each case is associated with a distinct shear layer flow phenomena, self-sustained oscillations for the former and Kelvin-Helmholtz instabilities for the latter. Additionally, large spires generate a highly turbulent wake flow, resulting in a significantly different cavity flow. Large turbulent sweep and ejection events in the wake flow suppress the typical shear layer and sporadic near wall sweep events generate coherent vortices at the upstream edge.

  15. New approximate solutions per unit of time for periodically checked systems with different lifetime distributions

    Directory of Open Access Journals (Sweden)

    J. Rodrigues Dias

    2006-11-01

    Full Text Available Systems with different lifetime distributions, associated with increasing, decreasing, constant, and bathtub-shaped hazard rates, are examined in this paper. It is assumed that a failure is only detected if systems are inspected. New approximate solutions for the inspection period and for the expected duration of hidden faults are presented, on the basis of the assumption that only periodic and perfect inspections are carried out. By minimizing total expected cost per unit of time, on the basis of numerical results and a range of comparisons, the conclusion is drawn that these new approximate solutions are extremely useful and simple to put into practice.

  16. High Performance Commodity Networking in a 512-CPU Teraflop Beowulf Cluster for Computational Astrophysics

    CERN Document Server

    Dubinski, J; Pen, U L; Loken, C; Martin, P; Dubinski, John; Humble, Robin; Loken, Chris; Martin, Peter; Pen, Ue-Li

    2003-01-01

    We describe a new 512-CPU Beowulf cluster with Teraflop performance dedicated to problems in computational astrophysics. The cluster incorporates a cubic network topology based on inexpensive commodity 24-port gigabit switches and point to point connections through the second gigabit port on each Linux server. This configuration has network performance competitive with more expensive cluster configurations and is scaleable to much larger systems using other network topologies. Networking represents only about 9% of our total system cost of USD$561K. The standard Top 500 HPL Linpack benchmark rating is 1.202 Teraflops on 512 CPUs so computing costs by this measure are $0.47/Megaflop. We also describe 4 different astrophysical applications using complex parallel algorithms for studying large-scale structure formation, galaxy dynamics, magnetohydrodynamic flows onto blackholes and planet formation currently running on the cluster and achieving high parallel performance. The MHD code achieved a sustained speed of...

  17. Accelerating mesh-based Monte Carlo method on modern CPU architectures.

    Science.gov (United States)

    Fang, Qianqian; Kaeli, David R

    2012-12-01

    In this report, we discuss the use of contemporary ray-tracing techniques to accelerate 3D mesh-based Monte Carlo photon transport simulations. Single Instruction Multiple Data (SIMD) based computation and branch-less design are exploited to accelerate ray-tetrahedron intersection tests and yield a 2-fold speed-up for ray-tracing calculations on a multi-core CPU. As part of this work, we have also studied SIMD-accelerated random number generators and math functions. The combination of these techniques achieved an overall improvement of 22% in simulation speed as compared to using a non-SIMD implementation. We applied this new method to analyze a complex numerical phantom and both the phantom data and the improved code are available as open-source software at http://mcx.sourceforge.net/mmc/.

  18. A Bit String Content Aware Chunking Strategy for Reduced CPU Energy on Cloud Storage

    Directory of Open Access Journals (Sweden)

    Bin Zhou

    2015-01-01

    Full Text Available In order to achieve energy savings and reduce the total cost of ownership, green storage has become the first priority for data centers. Detecting and deleting redundant data are the key factors in reducing the energy consumption of the CPU, while a high-performance, stable chunking strategy provides the groundwork for detecting redundant data. Existing chunking algorithms greatly reduce system performance when confronted with big data and waste a lot of energy. Factors affecting chunking performance are analyzed and discussed in the paper, and a new fingerprint signature calculation is implemented. Furthermore, a Bit String Content Aware Chunking Strategy (BCCS) is put forward. This strategy reduces the cost of signature computation in the chunking process to improve system performance and cut down the energy consumption of the cloud storage data center. On the basis of the test scenarios and test data of this paper, the advantages of the chunking strategy are verified.
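
    Since the abstract does not give the details of BCCS itself, the following is a generic content-defined chunking sketch: a rolling hash over a sliding window declares a chunk boundary whenever its low bits match a fixed pattern, subject to minimum and maximum chunk sizes; the window size, mask and limits are illustrative.

```cpp
// Generic content-defined chunking sketch (not the BCCS algorithm itself):
// boundaries depend on the data content rather than on fixed offsets, so
// identical regions chunk identically and duplicates can be detected.
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

std::vector<std::size_t> chunkBoundaries(const std::string& data) {
    const std::size_t window = 16, minChunk = 32, maxChunk = 256;
    const std::uint32_t mask = 0x3F;           // ~1/64 boundary probability
    std::vector<std::size_t> boundaries;
    std::uint32_t hash = 0;
    std::size_t chunkStart = 0;

    for (std::size_t i = 0; i < data.size(); ++i) {
        // Simple additive rolling hash over the last `window` bytes.
        hash += static_cast<unsigned char>(data[i]);
        if (i >= window) hash -= static_cast<unsigned char>(data[i - window]);

        std::size_t chunkLen = i - chunkStart + 1;
        bool atPattern = (hash & mask) == 0;
        if ((chunkLen >= minChunk && atPattern) || chunkLen >= maxChunk) {
            boundaries.push_back(i + 1);        // chunk ends after byte i
            chunkStart = i + 1;
        }
    }
    if (chunkStart < data.size()) boundaries.push_back(data.size());
    return boundaries;
}

int main() {
    std::string data(4096, 'a');
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] = 'a' + (i * 31 + i / 7) % 26;   // synthetic content
    std::size_t prev = 0;
    for (std::size_t b : chunkBoundaries(data)) {
        std::printf("chunk [%zu, %zu) length %zu\n", prev, b, b - prev);
        prev = b;
    }
    return 0;
}
```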

  19. Using real time process measurements to reduce catheter related bloodstream infections in the intensive care unit

    Science.gov (United States)

    Wall, R; Ely, E; Elasy, T; Dittus, R; Foss, J; Wilkerson, K; Speroff, T

    2005-01-01

    

Problem: Measuring a process of care in real time is essential for continuous quality improvement (CQI). Our inability to measure the process of central venous catheter (CVC) care in real time prevented CQI efforts aimed at reducing catheter related bloodstream infections (CR-BSIs) from these devices. Design: A system was developed for measuring the process of CVC care in real time. We used these new process measurements to continuously monitor the system, guide CQI activities, and deliver performance feedback to providers. Setting: Adult medical intensive care unit (MICU). Key measures for improvement: Measured process of CVC care in real time; CR-BSI rate and time between CR-BSI events; and performance feedback to staff. Strategies for change: An interdisciplinary team developed a standardized, user friendly nursing checklist for CVC insertion. Infection control practitioners scanned the completed checklists into a computerized database, thereby generating real time measurements for the process of CVC insertion. Armed with these new process measurements, the team optimized the impact of a multifaceted intervention aimed at reducing CR-BSIs. Effects of change: The new checklist immediately provided real time measurements for the process of CVC insertion. These process measures allowed the team to directly monitor adherence to evidence-based guidelines. Through continuous process measurement, the team successfully overcame barriers to change, reduced the CR-BSI rate, and improved patient safety. Two years after the introduction of the checklist the CR-BSI rate remained at a historic low. Lessons learnt: Measuring the process of CVC care in real time is feasible in the ICU. When trying to improve care, real time process measurements are an excellent tool for overcoming barriers to change and enhancing the sustainability of efforts. To continually improve patient safety, healthcare organizations should continually measure their key clinical processes in real

  20. Route to One Atomic Unit of Time: Development of a Broadband Attosecond Streak Camera

    Science.gov (United States)

    Zhao, Kun; Zhang, Qi; Chini, Michael; Chang, Zenghu

    A new attosecond streak camera based on a three-meter-long magnetic-bottle time-of-flight electron spectrometer (MBES) is developed. The temporal resolution of the photoelectron detection system is measured to be better than 250 ps, which is sufficient to achieve an energy resolution of 0.5 eV at 150 eV photoelectron energy. In preliminary experiments, a 94-as isolated XUV pulse was generated and characterized. With a new algorithm to retrieve the amplitude and phase of XUV pulses (PROOF—phase retrieval by omega oscillation filtering), the attosecond streak camera will be able to characterize isolated attosecond pulses as short as one atomic unit of time (25 as).

  1. Practical Testing and Performance Analysis of Phasor Measurement Unit Using Real Time Digital Simulator (RTDS)

    DEFF Research Database (Denmark)

    Liu, Leo; Rather, Zakir Hussain; Stearn, Nathen

    2012-01-01

    Wide Area Measurement Systems (WAMS) and Wide Area Monitoring, Protection and Control Systems (WAMPACS) have evolved rapidly over the last two decades [1]. This fast emerging technology enables real-time synchronized monitoring of power systems. Presently, WAMS are mainly used for real-time visualisation and post-event analysis of power systems. It is expected, however, that through integration with traditional Supervisory Control and Data Acquisition (SCADA) systems, closed-loop control applications will be possible. Phasor Measurement Units (PMUs) are fundamental components of WAMS. Large WAMS may support PMUs from multiple manufacturers, and therefore it is important that there is a way of standardising the measurement performance of these devices. Currently the IEEE Standard C37.118 is used to quantify the measurement performance of PMUs. While standard specifications are also available...

  2. Software architecture for a multi-purpose real-time control unit for research purposes

    Science.gov (United States)

    Epple, S.; Jung, R.; Jalba, K.; Nasui, V.

    2017-05-01

    A new, freely programmable, scalable control system for academic research purposes was developed. The intention was to have a control unit capable of handling multiple PT1000 temperature sensors over a reasonable accuracy and temperature range, as well as digital input signals, and of providing powerful output signals. To take full advantage of the system, control loops are run in real time. The whole eight-bit system, with very limited memory, runs independently of a personal computer. The two on-board RS232 connectors allow further units or other equipment to be connected, as required, in real time. This paper describes the software architecture of the third prototype, which now provides stable measurements and improved accuracy compared to the previous designs. As a test case, a thermal solar system that produces hot tap water and assists the heating of a single-family house was implemented. The solar fluid pump was power-controlled, and several temperatures at different points in the hydraulic system were measured and used in the control algorithms. The software architecture proved suitable for testing several different control strategies and their corresponding algorithms for the thermal solar system.

  3. Conformational characteristics of dimeric subunits of RNA from energy minimization studies. Mixed sugar-puckered ApG, ApU, CpG, and CpU.

    Science.gov (United States)

    Thiyagarajan, P; Ponnuswamy, P K

    1981-09-01

    Following the procedure described in the preceding article, the low energy conformations located for the four dimeric subunits of RNA, ApG, ApU, CpG, and CpU are presented. The A-RNA type and Watson-Crick type helical conformations and a number of different kinds of loop promoting ones were identified as low energy in all the units. The 3E-3E and 3E-2E pucker sequences are found to be more or less equally preferred; the 2E-2E sequence is occasionally preferred, while the 2E-3E is highly prohibited in all the units. A conformation similar to the one observed in the drug-dinucleoside monophosphate complex crystals becomes a low energy case only for the CpG unit. The low energy conformations obtained for the four model units were used to assess the stability of the conformational states of the dinucleotide segments in the four crystal models of the tRNAPhe molecule. Information on the occurrence of the less preferred sugar-pucker sequences in the various loop regions in the tRNAPhe molecule has been obtained. A detailed comparison of the conformational characteristics of DNA and RNA subunits at the dimeric level is presented on the basis of the results.

  4. The United Kingdom 2009 Swine Flu Outbreak As Recorded in Real Time by General Practitioners

    Directory of Open Access Journals (Sweden)

    Hershel Jick

    2011-01-01

    Full Text Available Background. Initially the course of the 2009 swine flu pandemic was uncertain and impossible to predict with any confidence. An effective prospective data resource exists in the United Kingdom (UK) that could have been utilized to describe the scope and extent of the swine flu outbreak as it unfolded. We describe the 2009 swine flu outbreak in the UK as recorded daily by general practitioners and the potential use of this database for real-time tracking of flu outbreaks. Methods. Using the General Practice Research Database, a real-time electronic general practice database, we estimated influenza incidence from July 1998 to September 2009 according to age, region, and calendar time. Results. From 1998 to 2008, influenza outbreaks regularly occurred yearly from October to March, but did not typically occur from April to September until the swine flu outbreak began in April 2009. The weekly incidence rose gradually, peaking at the end of July, and the outbreak had largely dissipated by early September. Conclusions. The UK swine flu outbreak, recorded in real time by a large group of general practitioners, was mild and limited in time. Simultaneous online access seemed feasible and could have provided additional clinical-based evidence at an early planning stage of the outbreak.

  5. Fully 3D list-mode time-of-flight PET image reconstruction on GPUs using CUDA.

    Science.gov (United States)

    Cui, Jing-Yu; Pratx, Guillem; Prevrhal, Sven; Levin, Craig S

    2011-12-01

    List-mode processing is an efficient way of dealing with the sparse nature of positron emission tomography (PET) data sets and is the processing method of choice for time-of-flight (ToF) PET image reconstruction. However, the massive amount of computation involved in forward projection and backprojection limits the application of list-mode reconstruction in practice, and makes it challenging to incorporate accurate system modeling. The authors present a novel formulation for computing line projection operations on graphics processing units (GPUs) using the compute unified device architecture (CUDA) framework, and apply the formulation to list-mode ordered-subsets expectation maximization (OSEM) image reconstruction. Our method overcomes well-known GPU challenges such as divergence of compute threads, limited bandwidth of global memory, and limited size of shared memory, while exploiting GPU capabilities such as fast access to shared memory and efficient linear interpolation of texture memory. Execution time comparison and image quality analysis of the GPU-CUDA method and the central processing unit (CPU) method are performed on several data sets acquired on a preclinical scanner and a clinical ToF scanner. When applied to line projection operations for non-ToF list-mode PET, this new GPU-CUDA method is >200 times faster than a single-threaded reference CPU implementation. For ToF reconstruction, we exploit a ToF-specific optimization to improve the efficiency of our parallel processing method, resulting in GPU reconstruction >300 times faster than the CPU counterpart. For a typical whole-body scan with 75 × 75 × 26 image matrix, 40.7 million LORs, 33 subsets, and 3 iterations, the overall processing time is 7.7 s for GPU and 42 min for a single-threaded CPU. Image quality and accuracy are preserved for multiple imaging configurations and reconstruction parameters, with normalized root mean squared (RMS) deviation less than 1% between CPU and GPU

  6. [Perception of night-time sleep by the surgical patients in an intensive care unit].

    Science.gov (United States)

    Nicolás, A; Aizpitarte, E; Iruarrizaga, A; Vázquez, M; Margall, M A; Asiain, M C

    2002-01-01

    Night-time rest of patients hospitalized in intensive care is a very important feature of the health/disease process, since it has a direct repercussion on their adequate recovery. The objectives of this investigation are: 1) to describe how surgical patients perceive their night-time sleep in the polyvalent intensive care unit; and 2) to compare the subjective perception of the patients with the nursing record in the care plan and analyze the degree of agreement between both assessments. Night-time sleep was studied in 104 patients; emergency surgery patients, intubated patients, and patients with previous psychiatric treatment, sleep apnea, a drinking habit, or inability to communicate adequately were not included. To measure the patients' perception, the five-item sleep questionnaire of Richards-Campbell was used, together with the nurse's assessment of sleep and the remaining variables included in a computerized care plan. The total mean sleep score on the first post-operative night was 51.42 mm. When the scores obtained for each of the questionnaire items are analyzed, the sleep profile of these patients is seen to be characterized by light sleep, with frequent awakenings and generally little difficulty going back to sleep when they woke up or were awakened. The nurse's assessment of night-time sleep coincides with the patients' perception on many occasions, and where there is discrepancy, the nurse has overestimated the patient's sleep.

  7. Overtaking CPU DBMSes with a GPU in whole-query analytic processing with parallelism-friendly execution plan optimization

    NARCIS (Netherlands)

    A. Agbaria (Adnan); D. Minor (David); N. Peterfreund (Natan); E. Rozenberg (Eyal); O. Rosenberg (Ofer); Huawei Research

    2016-01-01

    textabstractExisting work on accelerating analytic DB query processing with (discrete) GPUs fails to fully realize their potential for speedup through parallelism: Published results do not achieve significant speedup over more performant CPU-only DBMSes when processing complete queries. This paper p

  8. Transit times of water particles in the vadose zone across catchment states and catchments functional units

    Science.gov (United States)

    Sprenger, Matthias; Weiler, Markus

    2014-05-01

    Understanding water movement in the vadose zone and its associated transport of solutes is of major interest for reducing nutrient leaching, pollutant transport, and other risks to water quality. Soil physical models are widely used to assess such transport processes, while the site-specific parameterization of these models remains challenging. Inverse modeling is a common method to adjust the soil physical parameters so that the observed water movement or soil water dynamics are reproduced by the simulation. We have shown that the pore water stable isotope concentration can serve as an additional fitting target when simulating solute transport and the water balance in the unsaturated zone. In the presented study, the Mualem-van Genuchten parameters for the Richards equation and the diffusivity parameter for the convection-dispersion equation were estimated by inverse modeling with Hydrus-1D for 46 experimental sites of different land use, topography, pedology and geology in the Attert basin in Luxembourg. With the best parameter set we simulated the transport of a conservative solute introduced via a pulse input at different points in time. Thus, the transit times in the upper 2 m of the soil for different catchment states could be inferred for each location. It was shown that the time a particle needs to pass the -2 m depth plane varies strongly with the state and the forcing of the system during and after infiltration of that particle. Differences in transit times among the study sites within the Attert basin were investigated with regard to their governing factors, to test the concept of functional units. The study shows the potential of pore water stable isotope concentrations for residence time and transport analyses in the unsaturated zone, leading to a better understanding of the time-variable subsurface processes across the catchment.

  9. Effects of Selection and Training on Unit-Level Performance over Time: A Latent Growth Modeling Approach

    Science.gov (United States)

    Van Iddekinge, Chad H.; Ferris, Gerald R.; Perrewe, Pamela L.; Perryman, Alexa A.; Blass, Fred R.; Heetderks, Thomas D.

    2009-01-01

    Surprisingly few data exist concerning whether and how utilization of job-related selection and training procedures affects different aspects of unit or organizational performance over time. The authors used longitudinal data from a large fast-food organization (N = 861 units) to examine how change in use of selection and training relates to…

  10. Integrating Smartphone Technology at the Time of Discharge from a Child and Adolescent Inpatient Psychiatry Unit.

    Science.gov (United States)

    Gregory, Jonathan M; Sukhera, Javeed; Taylor-Gates, Melissa

    2017-01-01

    As smartphone technology becomes an increasingly important part of youth mental health, there has been little to no examination of how to effectively integrate smartphone-based safety planning with inpatient care. Our study sought to examine whether or not we could effectively integrate smartphone-based safety planning into the discharge process on a child and adolescent inpatient psychiatry unit. Staff members completed a survey to determine the extent of smartphone ownership in a population of admitted child and adolescent inpatients. In addition to quantifying smartphone ownership, the survey also tracked whether youth would integrate their previously-established safety plan with a specific safety planning application on their smartphone (Be Safe) at the time of discharge. Sixty-six percent (50/76) of discharged youth owned a smartphone, which is consistent with prior reports of high smartphone ownership in adult psychiatric populations. A minority of youth (18%) downloaded the Be Safe app prior to discharge, with most (68%) suggesting they would download the app after discharge. Notably, all patients who downloaded the app prior to discharge were on their first admission to a psychiatric inpatient unit. Child and adolescent psychiatric inpatients have a clear interest in smartphone-based safety planning. Our results suggest that integrating smartphone-related interventions earlier in an admission might improve access before discharge. This highlights the tension between restricting and incorporating smartphone access for child and adolescent inpatients and may inform future study in this area.

  11. Brainstem Monitoring in the Neurocritical Care Unit: A Rationale for Real-Time, Automated Neurophysiological Monitoring.

    Science.gov (United States)

    Stone, James L; Bailes, Julian E; Hassan, Ahmed N; Sindelar, Brian; Patel, Vimal; Fino, John

    2017-02-01

    Patients with severe traumatic brain injury or large intracranial space-occupying lesions (spontaneous cerebral hemorrhage, infarction, or tumor) commonly present to the neurocritical care unit with an altered mental status. Many experience progressive stupor and coma from mass effects and transtentorial brain herniation compromising the ascending arousal (reticular activating) system. Yet, little progress has been made in the practicality of bedside, noninvasive, real-time, automated, neurophysiological brainstem, or cerebral hemispheric monitoring. In this critical review, we discuss the ascending arousal system, brain herniation, and shortcomings of our current management including the neurological exam, intracranial pressure monitoring, and neuroimaging. We present a rationale for the development of nurse-friendly-continuous, automated, and alarmed-evoked potential monitoring, based upon the clinical and experimental literature, advances in the prognostication of cerebral anoxia, and intraoperative neurophysiological monitoring.

  12. Space-Time Unit-Level EBLUP for Large Data Sets

    Directory of Open Access Journals (Sweden)

    D’Aló Michele

    2017-03-01

    Full Text Available Most of the important large-scale surveys carried out by national statistical institutes are of the repeated type, typically intended to produce estimates for several parameters of the whole population as well as parameters related to some subpopulations. Small area estimation techniques are becoming more and more important for the production of official statistics where direct estimators are not able to produce reliable estimates. In order to exploit data from different survey cycles, unit-level linear mixed models with area and time random effects can be considered. However, the large amount of data to be processed may cause computational problems. To overcome the computational issues, a reformulation of the predictors and the corresponding mean cross-product estimator is given. The R code based on the new formulation is able to process about 7.2 million data records in a matter of minutes.

  13. A hybrid CPU-GPU accelerated framework for fast mapping of high-resolution human brain connectome.

    Science.gov (United States)

    Wang, Yu; Du, Haixiao; Xia, Mingrui; Ren, Ling; Xu, Mo; Xie, Teng; Gong, Gaolang; Xu, Ningyi; Yang, Huazhong; He, Yong

    2013-01-01

    Recently, a combination of non-invasive neuroimaging techniques and graph theoretical approaches has provided a unique opportunity for understanding the patterns of the structural and functional connectivity of the human brain (referred to as the human brain connectome). Currently, there is a very large amount of brain imaging data that have been collected, and there are very high requirements for the computational capabilities that are used in high-resolution connectome research. In this paper, we propose a hybrid CPU-GPU framework to accelerate the computation of the human brain connectome. We applied this framework to a publicly available resting-state functional MRI dataset from 197 participants. For each subject, we first computed Pearson's Correlation coefficient between any pairs of the time series of gray-matter voxels, and then we constructed unweighted undirected brain networks with 58 k nodes and a sparsity range from 0.02% to 0.17%. Next, graphic properties of the functional brain networks were quantified, analyzed and compared with those of 15 corresponding random networks. With our proposed accelerating framework, the above process for each network cost 80∼150 minutes, depending on the network sparsity. Further analyses revealed that high-resolution functional brain networks have efficient small-world properties, significant modular structure, a power law degree distribution and highly connected nodes in the medial frontal and parietal cortical regions. These results are largely compatible with previous human brain network studies. Taken together, our proposed framework can substantially enhance the applicability and efficacy of high-resolution (voxel-based) brain network analysis, and have the potential to accelerate the mapping of the human brain connectome in normal and disease states.
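
    Stripped of the GPU machinery, the per-subject pipeline above has three steps: correlate all pairs of voxel time series, binarize the matrix at a target sparsity, and compute graph metrics. The NumPy sketch below walks through those steps at toy size; the array dimensions and sparsity value are placeholders, not the study's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_timepoints = 500, 200         # toy size; the paper uses ~58,000 nodes
ts = rng.standard_normal((n_voxels, n_timepoints))

# 1. Pearson correlation between every pair of voxel time series.
corr = np.corrcoef(ts)
np.fill_diagonal(corr, 0.0)

# 2. Keep only the strongest correlations so the network has a target sparsity.
sparsity = 0.01                            # fraction of possible edges kept
n_edges = int(sparsity * n_voxels * (n_voxels - 1) / 2)
triu = np.triu_indices(n_voxels, k=1)
threshold = np.sort(corr[triu])[-n_edges]
adj = (corr >= threshold).astype(np.uint8)
adj = np.triu(adj, 1)
adj = adj + adj.T                          # unweighted, undirected adjacency matrix

# 3. A first graph property: the degree distribution.
degree = adj.sum(axis=1)
print("mean degree:", degree.mean())
```

    At full resolution both the correlation step and the graph metrics become the bottlenecks, which is what the hybrid CPU-GPU framework in the paper addresses.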

  14. A hybrid CPU-GPU accelerated framework for fast mapping of high-resolution human brain connectome.

    Directory of Open Access Journals (Sweden)

    Yu Wang

    Full Text Available Recently, a combination of non-invasive neuroimaging techniques and graph theoretical approaches has provided a unique opportunity for understanding the patterns of the structural and functional connectivity of the human brain (referred to as the human brain connectome). Currently, there is a very large amount of brain imaging data that have been collected, and there are very high requirements for the computational capabilities that are used in high-resolution connectome research. In this paper, we propose a hybrid CPU-GPU framework to accelerate the computation of the human brain connectome. We applied this framework to a publicly available resting-state functional MRI dataset from 197 participants. For each subject, we first computed Pearson's Correlation coefficient between any pairs of the time series of gray-matter voxels, and then we constructed unweighted undirected brain networks with 58 k nodes and a sparsity range from 0.02% to 0.17%. Next, graphic properties of the functional brain networks were quantified, analyzed and compared with those of 15 corresponding random networks. With our proposed accelerating framework, the above process for each network cost 80∼150 minutes, depending on the network sparsity. Further analyses revealed that high-resolution functional brain networks have efficient small-world properties, significant modular structure, a power law degree distribution and highly connected nodes in the medial frontal and parietal cortical regions. These results are largely compatible with previous human brain network studies. Taken together, our proposed framework can substantially enhance the applicability and efficacy of high-resolution (voxel-based) brain network analysis, and have the potential to accelerate the mapping of the human brain connectome in normal and disease states.

  15. Accelerated space object tracking via graphic processing unit

    Science.gov (United States)

    Jia, Bin; Liu, Kui; Pham, Khanh; Blasch, Erik; Chen, Genshe

    2016-05-01

    In this paper, a hybrid Monte Carlo Gauss mixture Kalman filter is proposed for the continuous orbit estimation problem. Specifically, the graphic processing unit (GPU) aided Monte Carlo method is used to propagate the uncertainty of the estimation when the observation is not available and the Gauss mixture Kalman filter is used to update the estimation when the observation sequences are available. A typical space object tracking problem using the ground radar is used to test the performance of the proposed algorithm. The performance of the proposed algorithm is compared with the popular cubature Kalman filter (CKF). The simulation results show that the ordinary CKF diverges in 5 observation periods. In contrast, the proposed hybrid Monte Carlo Gauss mixture Kalman filter achieves satisfactory performance in all observation periods. In addition, by using the GPU, the computational time is over 100 times less than that using the conventional central processing unit (CPU).

  16. Real-time Graphics Processing Unit Based Fourier Domain Optical Coherence Tomography and Surgical Applications

    Science.gov (United States)

    Zhang, Kang

    2011-12-01

    In this dissertation, real-time Fourier domain optical coherence tomography (FD-OCT) capable of multi-dimensional micrometer-resolution imaging targeted specifically for microsurgical intervention applications was developed and studied. As a part of this work several ultra-high speed real-time FD-OCT imaging and sensing systems were proposed and developed. A real-time 4D (3D+time) OCT system platform using the graphics processing unit (GPU) to accelerate OCT signal processing, the imaging reconstruction, visualization, and volume rendering was developed. Several GPU based algorithms such as non-uniform fast Fourier transform (NUFFT), numerical dispersion compensation, and multi-GPU implementation were developed to improve the impulse response, SNR roll-off and stability of the system. Full-range complex-conjugate-free FD-OCT was also implemented on the GPU architecture to achieve doubled image range and improved SNR. These technologies overcome the imaging reconstruction and visualization bottlenecks widely exist in current ultra-high speed FD-OCT systems and open the way to interventional OCT imaging for applications in guided microsurgery. A hand-held common-path optical coherence tomography (CP-OCT) distance-sensor based microsurgical tool was developed and validated. Through real-time signal processing, edge detection and feed-back control, the tool was shown to be capable of track target surface and compensate motion. The micro-incision test using a phantom was performed using a CP-OCT-sensor integrated hand-held tool, which showed an incision error less than +/-5 microns, comparing to >100 microns error by free-hand incision. The CP-OCT distance sensor has also been utilized to enhance the accuracy and safety of optical nerve stimulation. Finally, several experiments were conducted to validate the system for surgical applications. One of them involved 4D OCT guided micro-manipulation using a phantom. Multiple volume renderings of one 3D data set were

  17. Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines.

    Science.gov (United States)

    Teodoro, George; Pan, Tony; Kurc, Tahsin; Kong, Jun; Cooper, Lee; Saltz, Joel

    2013-04-01

    We address the problem of efficient execution of a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. We develop and evaluate strategies for efficient computation and propagation of wavefronts using a multi-level queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU accelerated machine (equipped with 3 GPUs and 2 multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and euclidean distance transform. Our results show significant performance improvements on GPUs. The use of multiple CPUs and GPUs cooperatively attains speedups of 50× and 85× with respect to single core CPU executions for morphological reconstruction and euclidean distance transform, respectively.
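
    Morphological reconstruction, one of the two operations evaluated above, is a concrete instance of the IWPP: pixels on the wavefront push values to their neighbours through a queue until the propagation condition fails everywhere. The snippet below is a single-threaded Python sketch of that queue-based propagation (grayscale reconstruction by dilation, 4-connectivity); it only illustrates the pattern and has none of the multi-level queue or CPU-GPU tiling described in the paper.

```python
import numpy as np
from collections import deque

def reconstruct_dilation(marker, mask):
    """Grayscale morphological reconstruction by dilation (4-connectivity),
    written as an irregular wavefront propagation with a FIFO queue."""
    marker = np.minimum(marker, mask).astype(np.int64)
    mask = mask.astype(np.int64)
    rows, cols = marker.shape
    queue = deque((r, c) for r in range(rows) for c in range(cols))
    neigh = ((1, 0), (-1, 0), (0, 1), (0, -1))
    while queue:
        r, c = queue.popleft()
        for dr, dc in neigh:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                # propagation condition: the wave can raise the neighbour's value
                cand = min(marker[r, c], mask[rr, cc])
                if cand > marker[rr, cc]:
                    marker[rr, cc] = cand
                    queue.append((rr, cc))   # the neighbour joins the wavefront
    return marker
```

    The irregularity is visible here: how much work each pixel generates depends entirely on the data, which is why the paper's multi-level queues and careful use of fast GPU memory matter.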

  18. Monte Carlo Simulations of Random Frustrated Systems on Graphics Processing Units

    Science.gov (United States)

    Feng, Sheng; Fang, Ye; Hall, Sean; Papke, Ariane; Thomasson, Cade; Tam, Ka-Ming; Moreno, Juana; Jarrell, Mark

    2012-02-01

    We study the implementation of classical Monte Carlo simulation for random frustrated models using the multithreaded computing environment provided by the Compute Unified Device Architecture (CUDA) on modern Graphics Processing Units (GPUs) with hundreds of cores and high memory bandwidth. The key to optimizing the performance of GPU computing is the proper handling of the data structure. Utilizing multi-spin coding, we obtain an efficient GPU implementation of parallel tempering Monte Carlo simulation for the Edwards-Anderson spin glass model. In typical simulations, we find a speed-up of over two thousand times compared with the single-threaded CPU implementation.
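
    Parallel tempering (replica exchange) runs one copy of the system per temperature and periodically proposes swapping configurations between neighbouring temperatures with acceptance probability min(1, exp[(β_i − β_j)(E_i − E_j)]). The sketch below is a small, single-threaded Python version for a toy 2D Edwards-Anderson model; it shows only the algorithmic skeleton, not the multi-spin coding or GPU threading the abstract refers to, and the lattice size, temperatures, and sweep counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 16                                    # toy lattice size
Jx = rng.choice([-1, 1], size=(L, L))     # random couplings to the right neighbour
Jy = rng.choice([-1, 1], size=(L, L))     # and to the lower neighbour (periodic)
betas = np.linspace(0.2, 2.0, 8)          # one replica per inverse temperature
spins = rng.choice([-1, 1], size=(len(betas), L, L))

def energy(s):
    # E = -sum_bonds J_ij s_i s_j over nearest-neighbour bonds
    return -(np.sum(Jx * s * np.roll(s, -1, axis=1)) +
             np.sum(Jy * s * np.roll(s, -1, axis=0)))

def metropolis_sweep(s, beta):
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        h = (Jx[i, j] * s[i, (j + 1) % L] + Jx[i, (j - 1) % L] * s[i, (j - 1) % L] +
             Jy[i, j] * s[(i + 1) % L, j] + Jy[(i - 1) % L, j] * s[(i - 1) % L, j])
        dE = 2 * s[i, j] * h
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            s[i, j] = -s[i, j]

for sweep in range(200):
    for k, beta in enumerate(betas):
        metropolis_sweep(spins[k], beta)
    for k in range(len(betas) - 1):       # replica-exchange (parallel tempering) step
        accept = np.exp(min(0.0, (betas[k + 1] - betas[k]) *
                                 (energy(spins[k + 1]) - energy(spins[k]))))
        if rng.random() < accept:
            spins[[k, k + 1]] = spins[[k + 1, k]]
```

    On a GPU each replica (and, with multi-spin coding, many disorder realizations packed into the bits of a word) is updated by its own set of threads, which is where the large speed-ups come from.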

  19. Graphics processing unit-based quantitative second-harmonic generation imaging.

    Science.gov (United States)

    Kabir, Mohammad Mahfuzul; Jonayat, A S M; Patel, Sanjay; Toussaint, Kimani C

    2014-09-01

    We adapt a graphics processing unit (GPU) to dynamic quantitative second-harmonic generation imaging. We demonstrate the temporal advantage of the GPU-based approach by computing the number of frames analyzed per second from SHG image videos showing varying fiber orientations. In comparison to our previously reported CPU-based approach, our GPU-based image analysis results in ∼10× improvement in computational time. This work can be adapted to other quantitative, nonlinear imaging techniques and provides a significant step toward obtaining quantitative information from fast in vivo biological processes.

  20. Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Xu, Chuanfu, E-mail: xuchuanfu@nudt.edu.cn [College of Computer Science, National University of Defense Technology, Changsha 410073 (China); Deng, Xiaogang; Zhang, Lilun [College of Computer Science, National University of Defense Technology, Changsha 410073 (China); Fang, Jianbin [Parallel and Distributed Systems Group, Delft University of Technology, Delft 2628CD (Netherlands); Wang, Guangxue; Jiang, Yi [State Key Laboratory of Aerodynamics, P.O. Box 211, Mianyang 621000 (China); Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua [College of Computer Science, National University of Defense Technology, Changsha 410073 (China)

    2014-12-01

    Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when CPUs and accelerators must collaborate to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact finite difference schemes, WCNS and HDCS, that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we make the CPU and GPU collaborate for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×; meanwhile, the collaborative approach improves performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using some advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To the best of our knowledge, these are the largest-scale CPU–GPU collaborative simulations

  1. Cumulative Time Series Representation for Code Blue prediction in the Intensive Care Unit.

    Science.gov (United States)

    Salas-Boni, Rebeca; Bai, Yong; Hu, Xiao

    2015-01-01

    Patient monitors in hospitals generate a high number of false alarms that compromise patient care and burden clinicians. In our previous work, we attempted to alleviate this problem by finding combinations of monitor alarms and laboratory tests that were predictive of code blue events, called SuperAlarms. In our current work, a novel time series representation that accounts for both cumulative effects and temporality is developed and applied to code blue prediction in the intensive care unit (ICU). The health status of patients is represented both by a term frequency approach, TF, often used in natural language processing, and by our novel cumulative approach, which we call the "weighted accumulated occurrence representation", or WAOR. These two representations are fed into an L1-regularized logistic regression classifier and used to predict code blue events. Performance was assessed online on an independent set. We report the sensitivity of our algorithm at different time windows prior to the code blue event, as well as the work-up to detection ratio and the proportion of false code blue detections divided by the number of false monitor alarms. We obtained better performance with our cumulative representation, retaining a sensitivity close to that of our previous work while improving the other metrics.
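
    As a rough illustration of the two feature families, the sketch below builds a plain term-frequency vector of alarm codes in a look-back window and a recency-weighted cumulative vector, then feeds both into an L1-regularized logistic regression. The exponential weighting is a stand-in chosen for the example and is not the paper's WAOR definition; the event streams are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

n_codes = 20                                  # toy alarm/lab-code vocabulary

def featurize(events, now, half_life=6.0):
    """events: list of (time_in_hours, code). Returns [TF counts | weighted counts]."""
    tf = np.zeros(n_codes)                    # plain term-frequency counts
    cum = np.zeros(n_codes)                   # recency-weighted cumulative counts
    for t, code in events:
        if t <= now:
            tf[code] += 1
            cum[code] += 0.5 ** ((now - t) / half_life)   # illustrative weighting only
    return np.concatenate([tf, cum])

rng = np.random.default_rng(0)

def fake_patient(label):
    # label-1 "patients" pile up a few specific codes; label-0 events are spread out
    n = int(rng.integers(5, 40))
    codes = rng.integers(0, 5, size=n) if label else rng.integers(0, n_codes, size=n)
    times = np.sort(rng.uniform(0, 48, size=n))
    return list(zip(times, codes)), label

data = [fake_patient(int(rng.integers(0, 2))) for _ in range(300)]
X = np.array([featurize(events, now=48.0) for events, _ in data])
y = np.array([label for _, label in data])

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
print("training accuracy:", clf.score(X, y))
```

    The L1 penalty drives most feature weights to zero, which is how a small set of predictive alarm/lab combinations can be read off the fitted model.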

  2. Real-time blood flow visualization using the graphics processing unit.

    Science.gov (United States)

    Yang, Owen; Cuccia, David; Choi, Bernard

    2011-01-01

    Laser speckle imaging (LSI) is a technique in which coherent light incident on a surface produces a reflected speckle pattern that is related to the underlying movement of optical scatterers, such as red blood cells, indicating blood flow. Image-processing algorithms can be applied to produce speckle flow index (SFI) maps of relative blood flow. We present a novel algorithm that employs the NVIDIA Compute Unified Device Architecture (CUDA) platform to perform laser speckle image processing on the graphics processing unit. Software written in C was integrated with CUDA and incorporated into a LabVIEW Virtual Instrument (VI) that is interfaced with a monochrome CCD camera able to acquire high-resolution raw speckle images at nearly 10 fps. With the CUDA code integrated into the LabVIEW VI, the processing and display of SFI images were also performed at ∼10 fps. We present three video examples depicting real-time flow imaging during a reactive hyperemia maneuver, fluid flow through an in vitro phantom, and a demonstration of real-time LSI during laser surgery of a port wine stain birthmark.
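
    Per pixel, LSI processing reduces to a local speckle contrast K = σ/μ computed over a small sliding window, followed by a flow index that grows as contrast drops; one commonly used simplified form is SFI ∝ 1/(T·K²), with T the camera exposure time. The NumPy sketch below shows that per-frame computation on the CPU; the GPU version in the paper performs the same arithmetic in CUDA, and the window size and constants here are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def speckle_flow_index(raw, window=7, exposure_s=10e-3):
    """Local speckle contrast K = sigma/mu and a simplified flow index for one frame."""
    raw = raw.astype(np.float64)
    mean = uniform_filter(raw, size=window)
    mean_sq = uniform_filter(raw * raw, size=window)
    var = np.clip(mean_sq - mean * mean, 0.0, None)
    contrast = np.sqrt(var) / np.maximum(mean, 1e-12)
    # simplified flow index; the exact constant varies across the LSI literature
    sfi = 1.0 / (2.0 * exposure_s * np.maximum(contrast, 1e-6) ** 2)
    return contrast, sfi
```

    Because every pixel's window is independent, the computation maps naturally onto GPU threads, which is what allows frame-rate SFI display in the reported system.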

  3. Real-time speckle variance swept-source optical coherence tomography using a graphics processing unit

    Science.gov (United States)

    Lee, Kenneth K. C.; Mariampillai, Adrian; Yu, Joe X. Z.; Cadotte, David W.; Wilson, Brian C.; Standish, Beau A.; Yang, Victor X. D.

    2012-01-01

    Advances in swept-source laser technology continue to increase the imaging speed of swept-source optical coherence tomography (SS-OCT) systems. These fast imaging speeds are ideal for microvascular detection schemes, such as speckle variance (SV), where interframe motion can cause severe imaging artifacts and loss of vascular contrast. However, full utilization of the laser scan speed has been hindered by the computationally intensive signal processing required by SS-OCT and SV calculations. Using a commercial graphics processing unit that has been optimized for parallel data processing, we report a complete high-speed SS-OCT platform capable of real-time data acquisition, processing, display, and saving at 108,000 lines per second. Subpixel image registration of structural images was performed in real time prior to SV calculations in order to reduce decorrelation from stationary structures induced by the bulk tissue motion. The viability of the system was successfully demonstrated in a high bulk tissue motion scenario of human fingernail root imaging where SV images (512 × 512 pixels, n = 4) were displayed at 54 frames per second. PMID:22808428
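
    The speckle variance step itself is a small per-pixel computation: the variance of the OCT intensity across N consecutive, registered B-scans from the same location, with static tissue giving low variance and flowing blood giving high variance. A CPU-side NumPy sketch of just that calculation, ignoring the acquisition and GPU pipeline described above:

```python
import numpy as np

def speckle_variance(frames):
    """frames: (N, rows, cols) OCT intensity B-scans from the same location
    (already registered). Returns the per-pixel interframe variance image."""
    frames = np.asarray(frames, dtype=np.float64)
    return frames.var(axis=0)
```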

  4. Using demographic and time series physiological features to classify sepsis in the intensive care unit.

    Science.gov (United States)

    Gunnarsdottir, Kristin; Sadashivaiah, Vijay; Kerr, Matthew; Santaniello, Sabato; Sarma, Sridevi V

    2016-08-01

    Sepsis, a systemic inflammatory response to infection, is a major health care problem that affects millions of patients every year in intensive care units (ICUs) worldwide. Despite the fact that ICU patients are heavily instrumented with physiological sensors, early sepsis detection remains challenging, perhaps because clinicians identify sepsis by (i) using static scores derived from bedside measurements individually, and (ii) deriving these scores at a much slower rate than the rate at which patient data are collected. In this study, we construct a generalized linear model (GLM) for the probability that an ICU patient has sepsis as a function of demographics and bedside measurements. Specifically, models were trained on 29 patient recordings from the MIMIC II database and evaluated on a different test set including 8 patient recordings. A classification accuracy of 62.5% was achieved using demographic measures as features. Adding physiological time series features to the model increased the classification accuracy to 75%. Although very preliminary, these results suggest that using generalized linear models incorporating real-time physiological signals may be useful for early detection of sepsis, thereby improving the chances of a successful treatment.

  5. Timing and locations of reef fish spawning off the southeastern United States

    Science.gov (United States)

    Heyman, William D.; Karnauskas, Mandy; Kobara, Shinichi; Smart, Tracey I.; Ballenger, Joseph C.; Reichert, Marcel J. M.; Wyanski, David M.; Tishler, Michelle S.; Lindeman, Kenyon C.; Lowerre-Barbieri, Susan K.; Switzer, Theodore S.; Solomon, Justin J.; McCain, Kyle; Marhefka, Mark; Sedberry, George R.

    2017-01-01

    Managed reef fish in the Atlantic Ocean of the southeastern United States (SEUS) support a multi-billion dollar industry. There is a broad interest in locating and protecting spawning fish from harvest, to enhance productivity and reduce the potential for overfishing. We assessed spatiotemporal cues for spawning for six species from four reef fish families, using data on individual spawning condition collected by over three decades of regional fishery-independent reef fish surveys, combined with a series of predictors derived from bathymetric features. We quantified the size of spawning areas used by reef fish across many years and identified several multispecies spawning locations. We quantitatively identified cues for peak spawning and generated predictive maps for Gray Triggerfish (Balistes capriscus), White Grunt (Haemulon plumierii), Red Snapper (Lutjanus campechanus), Vermilion Snapper (Rhomboplites aurorubens), Black Sea Bass (Centropristis striata), and Scamp (Mycteroperca phenax). For example, Red Snapper peak spawning was predicted in 24.7–29.0°C water prior to the new moon at locations with high curvature in the 24–30 m depth range off northeast Florida during June and July. External validation using scientific and fishery-dependent data collections strongly supported the predictive utility of our models. We identified locations where reconfiguration or expansion of existing marine protected areas would protect spawning reef fish. We recommend increased sampling off southern Florida (south of 27° N), during winter months, and in high-relief, high current habitats to improve our understanding of timing and location of reef fish spawning off the southeastern United States. PMID:28264006

  6. Accelerating VASP electronic structure calculations using graphic processing units

    KAUST Repository

    Hacene, Mohamed

    2012-08-20

    We present a way to improve the performance of the electronic structure Vienna Ab initio Simulation Package (VASP) program. We show that high-performance computers equipped with graphics processing units (GPUs) as accelerators may reduce drastically the computation time when offloading these sections to the graphic chips. The procedure consists of (i) profiling the performance of the code to isolate the time-consuming parts, (ii) rewriting these so that the algorithms become better-suited for the chosen graphic accelerator, and (iii) optimizing memory traffic between the host computer and the GPU accelerator. We chose to accelerate VASP with NVIDIA GPU using CUDA. We compare the GPU and original versions of VASP by evaluating the Davidson and RMM-DIIS algorithms on chemical systems of up to 1100 atoms. In these tests, the total time is reduced by a factor between 3 and 8 when running on n (CPU core + GPU) compared to n CPU cores only, without any accuracy loss. © 2012 Wiley Periodicals, Inc.

  7. [Analysis of cost and efficiency of a medical nursing unit using time-driven activity-based costing].

    Science.gov (United States)

    Lim, Ji Young; Kim, Mi Ja; Park, Chang Gi

    2011-08-01

    Time-driven activity-based costing was applied to analyze the nursing activity cost and efficiency of a medical unit. Data were collected at a medical unit of a general hospital. Nursing activities were measured using a nursing activities inventory and classified into 6 domains using the Easley-Storfjell Instrument. Descriptive statistics were used to identify general characteristics of the unit, nursing activities, and activity time, and a stochastic frontier model was adopted to estimate true activity time. The average efficiency of the medical unit using theoretical resource capacity was 77%, whereas the efficiency using practical resource capacity was 96%. Accordingly, the portion of non-value-added time was estimated at 23% and 4%, respectively. Total nursing activity costs were estimated at 109,860,977 won with traditional activity-based costing and 84,427,126 won with time-driven activity-based costing; the difference between the two costing methods was 25,433,851 won. These results indicate that time-driven activity-based costing provides useful and more realistic information about the efficiency of unit operation than traditional activity-based costing, so it is recommended as a performance evaluation framework for cost management in nursing departments.
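
    Time-driven activity-based costing needs only two estimates per resource group: the capacity cost rate (total cost of supplied capacity divided by practical capacity in minutes) and the minutes each activity consumes. The sketch below shows the arithmetic with entirely made-up figures; the won amounts, staffing numbers, and domain names are placeholders, not the study's data.

```python
# Capacity cost rate = cost of supplied capacity / practical capacity in minutes
monthly_staff_cost_won = 50_000_000             # hypothetical unit payroll
theoretical_minutes = 20 * 8 * 60 * 22          # e.g. 20 nurses, 8 h/day, 22 days
practical_minutes = 0.8 * theoretical_minutes   # ~80% of the theoretical capacity
rate_won_per_min = monthly_staff_cost_won / practical_minutes

# Minutes consumed per activity domain (hypothetical figures)
activity_minutes = {
    "direct care": 90_000,
    "indirect care": 40_000,
    "documentation": 20_000,
    "unit management": 10_000,
}
activity_cost_won = {k: v * rate_won_per_min for k, v in activity_minutes.items()}

used = sum(activity_minutes.values())
print(activity_cost_won)
print("non-value-added (unused capacity) share:", 1 - used / practical_minutes)
```

    The gap between theoretical and practical capacity is exactly what separates the 77% and 96% efficiency figures reported in the abstract.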

  8. Design of the IBM RISC System/6000 floating-point execution unit

    Energy Technology Data Exchange (ETDEWEB)

    Montoye, R.K.; Hokenek, E. (International Business Machines Corp., Yorktown Heights, NY (USA). Thomas J. Watson Research Center); Runyon, S.L. (IBM Advanced Workstations Div., Austin, TX (US))

    1990-01-01

    The IBM RISC System/6000 (RS/6000) floating-point unit (FPU) exemplifies a second-generation RISC CPU architecture and an implementation which greatly increases floating-point performance and accuracy. The key feature of the FPU is a unified floating-point multiply-add-fused unit (MAF) which performs the accumulate operation (A × B) + C as an indivisible operation. This single functional unit reduces the latency for chained floating-point operations, as well as rounding errors and chip busing. It also reduces the number of adders/normalizers by combining the addition required for fast multiplication with accumulation. The MAF unit is made practical by a unique fast-shifter, which eases the overlap of multiplication and addition, and a leading-zero/one anticipator, which eases overlap of normalization and addition. The accumulate instruction required by this architecture reduces the instruction path length by combining two instructions into one. Additionally, the RS/6000 FPU is tightly coupled to the rest of the CPU, unlike typical floating-point coprocessor chips.

  9. GPU/CPU co-processing parallel computation for seismic data processing in oil and gas exploration

    Institute of Scientific and Technical Information of China (English)

    刘国峰; 刘钦; 李博; 佟小龙; 刘洪

    2009-01-01

    As the Graphics Processing Unit (GPU) matures for general-purpose computation, GPU/CPU co-processing parallel computing can be applied to seismic data processing in oil and gas exploration, bringing major improvements to many large-scale, computation-critical steps. This paper explains the idea, architecture, and programming environment of the co-processing parallel machine, with emphasis on why its computational efficiency is so greatly improved. Taking prestack time migration and Gazdag depth migration in seismic data processing as entry points, the test results of a prototype machine are shown as images. In production practice one often has to compromise between algorithm accuracy and computing speed; the high degree of parallelism of GPU/CPU co-processing gives broad room for optimizing that trade-off. The results show clear advantages of the prototype over a common PC cluster, and the authors suggest that the desktop co-processing parallel-machine design and architecture described here may serve as a basis for a new choice of high-performance computing configuration in geophysics.

  10. Optimized Laplacian image sharpening algorithm based on graphic processing unit

    Science.gov (United States)

    Ma, Tinghuai; Li, Lu; Ji, Sai; Wang, Xin; Tian, Yuan; Al-Dhelaan, Abdullah; Al-Rodhaan, Mznah

    2014-12-01

    In classical Laplacian image sharpening, all pixels are processed one by one, which leads to a large amount of computation. Traditional Laplacian sharpening on the CPU is considerably time-consuming, especially for large images. In this paper, we propose a parallel implementation of Laplacian sharpening based on the Compute Unified Device Architecture (CUDA), a computing platform for Graphics Processing Units (GPUs), and analyze the impact of image size on performance as well as the relationship between data transfer time and parallel computing time. Further, according to the characteristics of the different GPU memories, an improved scheme of our method is developed that exploits shared memory instead of global memory and further increases efficiency. Experimental results show that the two new algorithms outperform the traditional sequential method based on OpenCV in terms of computing speed.
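
    For reference, the per-pixel operation being parallelized is just a 3×3 Laplacian convolution combined with the original image. A CPU NumPy/SciPy version of the filter that the paper moves onto CUDA (kernel choice and clipping range are the usual textbook ones, not necessarily the paper's exact settings):

```python
import numpy as np
from scipy.ndimage import convolve

LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=np.float64)

def laplacian_sharpen(img, strength=1.0):
    """Classical Laplacian sharpening: subtract the Laplacian response
    (with this kernel sign) so that edges are boosted."""
    img = img.astype(np.float64)
    lap = convolve(img, LAPLACIAN, mode="nearest")
    return np.clip(img - strength * lap, 0, 255)
```

    Each output pixel depends only on a small neighbourhood, so on the GPU one thread per pixel (with the neighbourhood staged in shared memory) is the natural mapping the paper exploits.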

  11. Design and research of an electrosurgical controller based on dual CPU + PSD

    Institute of Scientific and Technical Information of China (English)

    包晔峰; 张强; 蒋永锋; 赵虎成; 陈俊生

    2011-01-01

    An electrosurgical controller based on a dual CPU + PSD architecture is presented. Direct digital frequency synthesis is used to generate an output waveform with adjustable frequency and pulse width, and the master and slave processors run in parallel through shared RAM. Dual feedback circuits for output current and voltage were designed; based on the feedback current and voltage signals, the detection function effectively controls the output energy. An incremental PID algorithm, with voltage and current as feedback variables, realizes constant-current, constant-voltage, and constant-power control. Tests on a high-frequency electrosurgical unit show that the system achieves high output accuracy.
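
    The incremental (velocity-form) PID mentioned above computes a change in the control output from the last three errors, which suits output-power regulation because the actuator value is only nudged each control cycle. A generic Python sketch of that form; the gains, limits, and set-point below are arbitrary illustration values, not the controller's actual parameters.

```python
class IncrementalPID:
    """Velocity-form PID:
    u_k = u_{k-1} + Kp*(e_k - e_{k-1}) + Ki*e_k + Kd*(e_k - 2*e_{k-1} + e_{k-2})."""

    def __init__(self, kp, ki, kd, u_min=0.0, u_max=100.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.u_min, self.u_max = u_min, u_max
        self.e1 = self.e2 = 0.0            # e_{k-1} and e_{k-2}
        self.u = 0.0

    def update(self, setpoint, measurement):
        e = setpoint - measurement
        du = (self.kp * (e - self.e1) + self.ki * e +
              self.kd * (e - 2.0 * self.e1 + self.e2))
        self.u = min(self.u_max, max(self.u_min, self.u + du))
        self.e2, self.e1 = self.e1, e
        return self.u

# e.g. constant-power mode: nudge the RF drive toward a 40 W set-point each cycle
pid = IncrementalPID(kp=0.8, ki=0.2, kd=0.05)
measured_power = 36.5                      # hypothetical reading from the feedback path
drive = pid.update(setpoint=40.0, measurement=measured_power)
```

    Switching between constant-current, constant-voltage, and constant-power modes then amounts to choosing which feedback quantity is fed into update().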

  12. Real-time display on Fourier domain optical coherence tomography system using a graphics processing unit.

    Science.gov (United States)

    Watanabe, Yuuki; Itagaki, Toshiki

    2009-01-01

    Fourier domain optical coherence tomography (FD-OCT) requires resampling of spectrally resolved depth information from wavelength to wave number, followed by application of the inverse Fourier transform. The display rates of OCT images are much slower than the image acquisition rates due to processing speed limitations on most computers. We demonstrate real-time display of processed OCT images using a linear-in-wave-number (linear-k) spectrometer and a graphics processing unit (GPU). We use a linear-k spectrometer, combining a diffraction grating with 1200 lines/mm and an F2 equilateral prism in the 840-nm spectral region, to avoid the resampling calculation. The fast Fourier transform (FFT) calculations are accelerated by the GPU's many stream processors, which realize highly parallel processing. A display rate of 27.9 frames/s for processed images (2048-point FFT × 1000 lateral A-scans) is achieved in our OCT system using a line-scan CCD camera operated at 27.9 kHz.
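
    The resampling-plus-FFT step that the linear-k spectrometer removes looks, for a single A-scan, like the sketch below: interpolate the spectrum from evenly spaced wavelengths onto evenly spaced wavenumbers (k = 2π/λ), apply a window, and take the inverse FFT magnitude. The spectrum here is synthetic, and dispersion compensation and background subtraction are omitted.

```python
import numpy as np

n = 2048
wavelengths = np.linspace(800e-9, 880e-9, n)           # spectrometer pixels (840 nm band)
spectrum = np.random.default_rng(0).normal(size=n) + 1.0  # stand-in for a measured fringe

# 1. Resample from linear-in-wavelength to linear-in-wavenumber.
k = 2 * np.pi / wavelengths                             # decreasing with wavelength
k_lin = np.linspace(k.min(), k.max(), n)
spectrum_k = np.interp(k_lin, k[::-1], spectrum[::-1])  # np.interp needs ascending x

# 2. Window and Fourier transform to obtain the depth profile (A-scan).
a_scan = np.abs(np.fft.ifft(spectrum_k * np.hanning(n)))[: n // 2]
```

    With a linear-k spectrometer step 1 disappears entirely, leaving only the FFT, which is what makes the GPU pipeline fast enough to keep pace with the 27.9 kHz line rate.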

  13. The Effect of Increasing Meeting Time on the Physiological Indices of Patients Admitted to the Intensive Care Unit

    OpenAIRE

    2016-01-01

    Background: Most hospitals have restricted visitation time in intensive care units (ICUs) for various reasons. Objectives: Given the advantages of family presence and the positive effect of emotional touch, talking, and smiling on nervous system stimulation and patients' vital signs, the present study aimed to determine the effect of increased visitation time on the physiological indices of patients hospitalized in ICUs. ...

  14. High-throughput sequence alignment using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Trapnell Cole

    2007-12-01

    Full Text Available Abstract Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  15. Application of PCI9052 in a CPU Unit

    Institute of Scientific and Technical Information of China (English)

    韩洪丽

    2009-01-01

    This article describes the application of the PCI9052 chip in the CPU unit of the H20-20 switching system, explaining in detail the working principle and implementation of using the PCI9052 to translate PCI accesses into ISA accesses within the switch CPU unit. The use of this chip made the upgrade of the H20-20 switch CPU possible, greatly improving the processing speed and performance of the switch CPU unit, providing a platform for developing value-added switch services, and strengthening the market competitiveness of the H20-20 switch product.

  16. High-throughput Analysis of Large Microscopy Image Datasets on CPU-GPU Cluster Platforms.

    Science.gov (United States)

    Teodoro, George; Pan, Tony; Kurc, Tahsin M; Kong, Jun; Cooper, Lee A D; Podhorszki, Norbert; Klasky, Scott; Saltz, Joel H

    2013-05-01

    Analysis of large pathology image datasets offers significant opportunities for the investigation of disease morphology, but the resource requirements of analysis pipelines limit the scale of such studies. Motivated by a brain cancer study, we propose and evaluate a parallel image analysis application pipeline for high throughput computation of large datasets of high resolution pathology tissue images on distributed CPU-GPU platforms. To achieve efficient execution on these hybrid systems, we have built runtime support that allows us to express the cancer image analysis application as a hierarchical data processing pipeline. The application is implemented as a coarse-grain pipeline of stages, where each stage may be further partitioned into another pipeline of fine-grain operations. The fine-grain operations are efficiently managed and scheduled for computation on CPUs and GPUs using performance aware scheduling techniques along with several optimizations, including architecture aware process placement, data locality conscious task assignment, data prefetching, and asynchronous data copy. These optimizations are employed to maximize the utilization of the aggregate computing power of CPUs and GPUs and minimize data copy overheads. Our experimental evaluation shows that the cooperative use of CPUs and GPUs achieves significant improvements on top of GPU-only versions (up to 1.6×) and that the execution of the application as a set of fine-grain operations provides more opportunities for runtime optimizations and attains better performance than coarser-grain, monolithic implementations used in other works. An implementation of the cancer image analysis pipeline using the runtime support was able to process an image dataset consisting of 36,848 4Kx4K-pixel image tiles (about 1.8TB uncompressed) in less than 4 minutes (150 tiles/second) on 100 nodes of a state-of-the-art hybrid cluster system.

  17. Gridded design rule scaling: taking the CPU toward the 16nm node

    Science.gov (United States)

    Bencher, Christopher; Dai, Huixiong; Chen, Yongmei

    2009-03-01

    The Intel 45nm Penryn™ CPU was a landmark design, not only for its implementation of high-K metal gate materials [1], but also for the adoption of a nearly gridded design rule (GDR) layout architecture for the poly-silicon gate layer [2]. One key advantage of using gridded design rules is the reduction of design rules and the ease of 1-dimensional scaling compared to complex random 2-dimensional layouts. In this paper, we demonstrate the scaling capability of GDR to the 16nm and 22nm logic nodes. Copying the design of published images for the Intel 45nm Penryn™ poly-silicon layer [2], we created a mask set designed to duplicate those patterns targeting a final pitch of 64nm and 52nm using Sidewall Spacer Double Patterning for the extreme pitch shrinking, and performed exploratory work at a final pitch of 44nm. Mask sets were made in both tones to enable demonstration of both damascene (dark field) patterning and poly-silicon gate layer (clear field) GDR layouts, although the results discussed focus primarily on poly-silicon gate layer scaling. The paper discusses the line-and-cut double patterning technique for generating GDR structures, the use of sidewall spacer double patterning for scaling parallel lines, and the lithographic process window (CD and alignment) for applying cut masks. Through the demonstration, we highlight process margin issues and suggest corrective actions to be implemented in future demonstrations and more advanced studies. Overall, the process window is quite large and the technique has strong manufacturing possibilities.

  18. Time-To-Treatment of Acute Coronary Syndrome and Unit of First Contact in the ERICO Study

    Science.gov (United States)

    dos Santos, Rafael Caire de Oliveira; Goulart, Alessandra Carvalho; Kisukuri, Alan Loureiro Xavier; Brandão, Rodrigo Martins; Sitnik, Debora; Staniak, Henrique Lane; Bittencourt, Marcio Sommer; Lotufo, Paulo Andrade; Bensenor, Isabela Martins; Santos, Itamar de Souza

    2016-01-01

    Background To the best of our knowledge, there are no studies evaluating the influence of the unit of the first contact on the frequency and time of pharmacological treatment during an acute coronary syndrome (ACS) event. Objectives The main objective was to investigate if the unit of first contact influenced the frequency and time of aspirin treatment in the Strategy of Registry of Acute Coronary Syndrome (ERICO) study. Methods We analyzed the pharmacological treatment time in 830 ERICO participants - 700 individuals for whom the hospital was the unit of first contact and 130 who initially sought primary care units. We built logistic regression models to study whether the unit of first contact was associated with a treatment time of less than three hours. Results Individuals who went to primary care units received the first aspirin dose in those units in 75.6% of the cases. The remaining 24.4% received aspirin at the hospital. Despite this finding, individuals from primary care still had aspirin administered within three hours more frequently than those who went to the hospital (76.8% vs 52.6%; p<0.001 and 100% vs. 70.7%; p=0.001 for non ST-elevation ACS and ST-elevation myocardial infarction, respectively). In adjusted models, individuals coming from primary care were more likely to receive aspirin more quickly (odds ratio: 3.66; 95% confidence interval: 2.06-6.51). Conclusions In our setting, individuals from primary care were more likely to receive aspirin earlier. Enhancing the ability of primary care units to provide early treatment and safe transportation may be beneficial in similar settings. PMID:27849262

  19. Cpu/gpu Computing for AN Implicit Multi-Block Compressible Navier-Stokes Solver on Heterogeneous Platform

    Science.gov (United States)

    Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin

    2016-06-01

    CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double-precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software to a heterogeneous platform. First, we implement a full GPU version of the ADI solver to eliminate redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern, MPI-OpenMP-CUDA, that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU compared with two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on the heterogeneous platform.
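
    The computation/communication overlap described above can be illustrated with a minimal sketch; this is not the authors' CUDA/MPI code but a Python/mpi4py analogue of the same idea, with an illustrative slab size and stencil, in which non-blocking halo exchanges are posted before the interior update so the messages travel while the interior is being computed.

      # Sketch only: overlap interior computation with non-blocking halo exchange (requires mpi4py, NumPy).
      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()
      n = 256                                    # illustrative local slab size
      u = np.random.rand(n + 2, n)               # local slab with one ghost row on each side
      left, right = (rank - 1) % size, (rank + 1) % size

      # Post the halo exchanges first ...
      reqs = [comm.Isend(u[1].copy(),  dest=left,  tag=0),
              comm.Isend(u[-2].copy(), dest=right, tag=1),
              comm.Irecv(u[0],  source=left,  tag=1),
              comm.Irecv(u[-1], source=right, tag=0)]

      # ... then update the interior rows, which need no ghost data, while messages are in flight.
      interior = 0.25 * (u[1:-3] + u[3:-1]
                         + np.roll(u[2:-2], 1, axis=1) + np.roll(u[2:-2], -1, axis=1))

      MPI.Request.Waitall(reqs)                  # ghost rows are now valid
      # Only now are the boundary rows that depend on the ghosts updated.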

  20. Real-time image processing for non-contact monitoring of dynamic displacements using smartphone technologies

    Science.gov (United States)

    Min, Jae-Hong; Gelo, Nikolas J.; Jo, Hongki

    2016-04-01

    The newly developed smartphone application, named RINO, in this study allows measuring absolute dynamic displacements and processing them in real time using state-of-the-art smartphone technologies, such as a high-performance graphics processing unit (GPU), in addition to an already powerful CPU and memory, an embedded high-speed, high-resolution camera, and open-source computer vision libraries. A carefully designed color-patterned target and a user-adjustable crop filter enable accurate and fast image processing, allowing up to 240 fps for complete displacement calculation and real-time display. The performance of the developed smartphone application is experimentally validated, showing accuracy comparable to that of a conventional laser displacement sensor.

  1. 38 CFR 36.4232 - Allowable fees and charges; manufactured home unit.

    Science.gov (United States)

    2010-07-01

    ... operator-assisted telephone, terminal entry, or central processing unit-to-central processing unit (CPU-to... charges; manufactured home unit. 36.4232 Section 36.4232 Pensions, Bonuses, and Veterans' Relief... Manufactured Homes and Lots, Including Site Preparation Financing Manufactured Home Units § 36.4232 Allowable...

  2. Toward Optimal Computation of Ultrasound Image Reconstruction Using CPU and GPU.

    Science.gov (United States)

    Techavipoo, Udomchai; Worasawate, Denchai; Boonleelakul, Wittawat; Keinprasit, Rachaporn; Sunpetchniyom, Treepop; Sugino, Nobuhiko; Thajchayapong, Pairash

    2016-11-24

    An ultrasound image is reconstructed from echo signals received by array elements of a transducer. The time of flight of the echo depends on the distance from the focus to the array elements. The received echo signals have to be delayed to make their wave fronts and phase coherent before summing the signals. In digital beamforming, the delays are not always located at the sampled points. Generally, the values of the delayed signals are estimated by the values of the nearest samples. This method is fast and easy but inaccurate. There are other methods available for increasing the accuracy of the delayed signals and, consequently, the quality of the beamformed signals; for example, the in-phase (I)/quadrature (Q) interpolation, which is more time consuming but provides more accurate values than the nearest samples. This paper compares the signals after dynamic receive beamforming, in which the echo signals are delayed using two methods, the nearest sample method and the I/Q interpolation method. The comparisons of the visual qualities of the reconstructed images and the qualities of the beamformed signals are reported. Moreover, the computational speeds of these methods are also optimized by reorganizing the data processing flow and by applying the graphics processing unit (GPU). The use of single and double precision floating-point formats of the intermediate data is also considered. The speeds with and without these optimizations are also compared.
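
    The two delay strategies compared above can be sketched in a few lines of NumPy; the sampling rate, signal and fractional delay below are illustrative and not taken from the paper.

      import numpy as np

      fs = 40e6                                   # illustrative sampling rate (Hz)
      t = np.arange(256) / fs
      iq = np.exp(2j * np.pi * 5e6 * t)           # toy complex (I/Q) echo signal
      delay = 3.37                                # fractional delay in samples from the beamforming geometry

      # Nearest-sample delay: cheap, but quantizes the delay to the sample grid.
      nearest = np.roll(iq, int(round(delay)))

      # Interpolating between the two neighbouring samples approximates the true
      # delayed value more closely, at the cost of extra arithmetic per channel.
      k, frac = int(np.floor(delay)), delay - np.floor(delay)
      interpolated = (1.0 - frac) * np.roll(iq, k) + frac * np.roll(iq, k + 1)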

  3. Toward Optimal Computation of Ultrasound Image Reconstruction Using CPU and GPU

    Science.gov (United States)

    Techavipoo, Udomchai; Worasawate, Denchai; Boonleelakul, Wittawat; Keinprasit, Rachaporn; Sunpetchniyom, Treepop; Sugino, Nobuhiko; Thajchayapong, Pairash

    2016-01-01

    An ultrasound image is reconstructed from echo signals received by array elements of a transducer. The time of flight of the echo depends on the distance from the focus to the array elements. The received echo signals have to be delayed to make their wave fronts and phase coherent before summing the signals. In digital beamforming, the delays are not always located at the sampled points. Generally, the values of the delayed signals are estimated by the values of the nearest samples. This method is fast and easy but inaccurate. There are other methods available for increasing the accuracy of the delayed signals and, consequently, the quality of the beamformed signals; for example, the in-phase (I)/quadrature (Q) interpolation, which is more time consuming but provides more accurate values than the nearest samples. This paper compares the signals after dynamic receive beamforming, in which the echo signals are delayed using two methods, the nearest sample method and the I/Q interpolation method. The comparisons of the visual qualities of the reconstructed images and the qualities of the beamformed signals are reported. Moreover, the computational speeds of these methods are also optimized by reorganizing the data processing flow and by applying the graphics processing unit (GPU). The use of single and double precision floating-point formats of the intermediate data is also considered. The speeds with and without these optimizations are also compared. PMID:27886149

  4. Toward Optimal Computation of Ultrasound Image Reconstruction Using CPU and GPU

    Directory of Open Access Journals (Sweden)

    Udomchai Techavipoo

    2016-11-01

    An ultrasound image is reconstructed from echo signals received by array elements of a transducer. The time of flight of the echo depends on the distance from the focus to the array elements. The received echo signals have to be delayed to make their wave fronts and phase coherent before summing the signals. In digital beamforming, the delays are not always located at the sampled points. Generally, the values of the delayed signals are estimated by the values of the nearest samples. This method is fast and easy but inaccurate. There are other methods available for increasing the accuracy of the delayed signals and, consequently, the quality of the beamformed signals; for example, the in-phase (I)/quadrature (Q) interpolation, which is more time consuming but provides more accurate values than the nearest samples. This paper compares the signals after dynamic receive beamforming, in which the echo signals are delayed using two methods, the nearest sample method and the I/Q interpolation method. The comparisons of the visual qualities of the reconstructed images and the qualities of the beamformed signals are reported. Moreover, the computational speeds of these methods are also optimized by reorganizing the data processing flow and by applying the graphics processing unit (GPU). The use of single and double precision floating-point formats of the intermediate data is also considered. The speeds with and without these optimizations are also compared.

  5. Perception of night-time sleep by surgical patients in an intensive care unit.

    Science.gov (United States)

    Nicolás, Ana; Aizpitarte, Eva; Iruarrizaga, Angélica; Vázquez, Mónica; Margall, Angeles; Asiain, Carmen

    2008-01-01

    The night-time sleep of patients hospitalized in intensive care is a very important feature within the health or disease process, as it has a direct repercussion on their adequate recovery. (1) To describe how surgical patients perceive their sleep in the intensive care unit; (2) to compare the subjective perception of patients with the nursing records and analyse these for the degree of agreement. Descriptive research. One hundred and four surgical patients were recruited to the study. Patients completed the Richards-Campbell Sleep Questionnaire, a five-item visual analogue scale, to subjectively measure their perceived level of sleep (range 0-100 mm). The observation of patient sleep by nurses, demographic data, nursing care during the night and use of specific pharmacological treatments were also collected from the nursing records. The total mean score of sleep on the first post-operative night was 51.42 mm; 28% of patients had a good sleep, 46% a regular sleep and 26% a bad sleep. The sleep profile of these patients was characterized by a light sleep, with frequent awakenings and generally little difficulty going back to sleep after the awakenings. The agreement between the nurses' and the patients' perceptions of sleep was tested by means of one-factor analysis of variance. Regarding nurse-patient agreement, we obtained 44% total agreement and 56% disagreement. When discrepancy was found, the nurse generally overestimated the patients' perception. Surgical patients' perceptions of their sleep in the ICU suggest that this is inadequate. Nurses' perceptions of patients' sleep partially coincide with the latter's perception, but we have also found that the former frequently overestimate patients' sleep.

  6. Barriers over time to full implementation of health information exchange in the United States.

    Science.gov (United States)

    Kruse, Clemens Scott; Regier, Verna; Rheinboldt, Kurt T

    2014-09-30

    Although health information exchanges (HIE) have existed since their introduction by President Bush in his 2004 State of the Union Address, and despite monetary incentives earmarked in 2009 by the health information technology for economic and clinical health (HITECH) Act, adoption of HIE has been sparse in the United States. Research has been conducted to explore the concept of HIE and its benefit to patients, but viable business plans for their existence are rare, and so far, no research has been conducted on the dynamic nature of barriers over time. The aim of this study is to map the barriers mentioned in the literature to illustrate the effect, if any, of barriers discussed with respect to the HITECH Act from 2009 to the early months of 2014. We conducted a systematic literature review from CINAHL, PubMed, and Google Scholar. The search criteria primarily focused on studies. Each article was read by at least two of the authors, and a final set was established for evaluation (n=28). The 28 articles identified 16 barriers. Cost and efficiency/workflow accounted for 15% and 13%, respectively, of all instances of barriers mentioned in the literature. The years 2010 and 2011 were the most plentiful years when barriers were discussed, with 75% and 69% of all barriers listed, respectively. The frequency of barriers mentioned in the literature demonstrates the mindfulness of users, developers, and both local and national government. The broad conclusion is that public policy masks the effects of some barriers, while revealing others. However, a deleterious effect can be inferred when the public funds are exhausted. Public policy will need to lever incentives to overcome many of the barriers such as cost and impediments to competition. Process improvement managers need to optimize the efficiency of current practices at the point of care. Developers will need to work with users to ensure tools that use HIE resources work into existing workflows.

  7. Varying likelihood of Megafire across space and time in the western contiguous United States

    Science.gov (United States)

    Stavros, E.; Abatzoglou, J. T.; Larkin, N. K.; McKenzie, D.; Steel, E.

    2013-12-01

    Studies project that a warming climate will likely increase wildfire activity. These analyses, however, are based on aggregate statistics of annual area burned; to anticipate future events, especially those of particular concern like megafires, we need more fire-specific projections. Megafires account for a disproportionate amount of damage and are defined quantitatively here as fires that burn >20,234 ha (~50,000 ac). Megafires account for the top two percent of all fires and represent 33% of all area burned in the western contiguous United States from 1984 to 2010. Multiple megafires often occur in one region during a single fire season, suggesting that regional climate is a driver. Therefore, we used composite records of climate and fire to investigate the spatial and temporal variability of the megafire climate space. We then developed logistic regression models to predict the probability that a megafire will occur in a given week. Accuracy was good (AUC > 0.80) for all models. These analyses provide a coarse-scale assessment for operationally defined regions of megafire risk, which can be projected to determine how the likelihood of megafire varies across space and time using the Intergovernmental Panel on Climate Change representative concentration pathways (RCPs) 4.5 and 8.5. In general, with the exception of Northern California (NCAL), Southern California, and the Western Great Basin, there is increasing proportional change over time in the probability of a megafire. There was a significant (p≤0.05) difference between the historical modeled ensemble mean probability of a megafire occurrence from 1979 to 2010 and both RCP 4.5 and 8.5 means during 2031 to 2060. Generally, with the exception of the Southwest and NCAL, there are higher probabilities of megafire occurrence more frequently and for longer periods both throughout the fire season and from year to year, with more pronounced patterns under RCP 8.5 than RCP 4.5. Our results provide a quantitative
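
    The per-week logistic regression described above can be sketched as follows; the predictors and data are hypothetical stand-ins, not the study's climate variables, and serve only to show the shape of the model.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      # Hypothetical weekly predictors for one region (e.g., fuel dryness, temperature
      # anomaly, wind); y = 1 if a megafire (>20,234 ha) started during that week.
      X = rng.normal(size=(1400, 3))
      y = (X @ np.array([1.2, 0.8, 0.5]) + rng.normal(scale=1.5, size=1400) > 2.5).astype(int)

      model = LogisticRegression().fit(X, y)
      weekly_megafire_probability = model.predict_proba(X)[:, 1]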

  8. Application of Time Transfer Function to McVittie Spacetime: Gravitational Time Delay and Secular Increase in Astronomical Unit

    CERN Document Server

    Arakida, Hideyoshi

    2011-01-01

    We attempt to calculate the gravitational time delay in a time-dependent gravitational field, especially in McVittie spacetime, which can be considered as the spacetime around a gravitating body such as the Sun, embedded in the FLRW (Friedmann-Lemaître-Robertson-Walker) cosmological background metric. To this end, we adopt the time transfer function method proposed by Le Poncin-Lafitte et al. (Class. Quant. Grav. 21:4463, 2004) and Teyssandier and Le Poncin-Lafitte (Class. Quant. Grav. 25:145020, 2008), which is based on Synge's world function $\Omega(x_A, x_B)$ and makes it possible to circumvent the integration of the null geodesic equation. We re-examine the global cosmological effect on light propagation in the solar system. The round-trip time of a light ray/signal is given as a function not only of the spatial coordinates but also of the emission or reception time of the light ray/signal, which characterizes the time dependence of the solutions. We also apply the obtained results to the secular increase in the astronomical unit.
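
    For orientation, the static (Schwarzschild) limit against which the cosmological correction is measured is the classical Shapiro round-trip delay; the expression below is the standard textbook form and is quoted here only as background rather than taken from the paper:

      \[
        \Delta t \;\simeq\; \frac{4GM}{c^{3}}\,
        \ln\!\left(\frac{r_{A} + r_{B} + R_{AB}}{r_{A} + r_{B} - R_{AB}}\right),
      \]

      where $r_A$ and $r_B$ are the radial distances of the emitter and receiver from the central mass $M$, and $R_{AB}$ is the distance between them.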

  9. Use of time-subsidence data during pumping to characterize specific storage and hydraulic conductivity of semi-confining units

    Science.gov (United States)

    Burbey, T. J.

    2003-09-01

    A new graphical technique is developed that takes advantage of time-subsidence data collected from either traditional extensometer installations or from newer technologies such as fixed-station global positioning systems or interferometric synthetic aperture radar imagery, to accurately estimate storage properties of the aquifer and vertical hydraulic conductivity of semi-confining units. Semi-log plots of time-compaction data are highly diagnostic, with the straight-line portion of the plot reflecting the specific storage of the semi-confining unit. Calculation of compaction during one log cycle of time from these plots can be used in a simple analytical expression based on the Cooper-Jacob technique to accurately calculate specific storage of the semi-confining units. In addition, these semi-log plots can be used to identify when the pressure transient has migrated through the confining layer into the unpumped aquifer, precluding the need for additional piezometers within the unpumped aquifer or within the semi-confining units as is necessary in the Neuman and Witherspoon method. Numerical simulations are used to evaluate the accuracy of the new technique. The technique was applied to time-drawdown and time-compaction data collected near Franklin, Virginia, within the Potomac aquifers of the Coastal Plain, and shows that the method can be easily applied to estimate the inelastic skeletal specific storage of this aquifer system.
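
    The analytical expression mentioned above builds on the classical Cooper-Jacob straight-line approximation for drawdown, reproduced here only as background (the paper adapts the analogous one-log-cycle idea to compaction data rather than drawdown):

      \[
        s \;\approx\; \frac{2.3\,Q}{4\pi T}\,
        \log_{10}\!\left(\frac{2.25\,T\,t}{r^{2}S}\right),
        \qquad
        \Delta s_{\text{per log cycle}} \;=\; \frac{2.3\,Q}{4\pi T},
      \]

      where $Q$ is the pumping rate, $T$ the transmissivity, $S$ the storativity, $r$ the distance to the pumping well, and $t$ the time since pumping began.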

  10. Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems.

    Science.gov (United States)

    Andrade, G; Ferreira, R; Teodoro, George; Rocha, Leonardo; Saltz, Joel H; Kurc, Tahsin

    2014-10-01

    High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and the Intel Xeon Phi (MIC). These processors have made a tremendous amount of computing power available at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of their resources in real-world applications is a complex problem. Most current applications deployed to these machines are still executed on a single processor, leaving the other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks which are allocated to nodes of a distributed memory machine in coarse grain, but each of them may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance-aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperform other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time (HEFT), in cooperative executions using CPUs, GPUs, and MICs. We also show experimentally that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales.
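
    The general idea of performance-aware device scheduling can be sketched as a greedy earliest-finish-time assignment; this is only a simplified illustration under assumed per-device runtime estimates, not the scheduling algorithm proposed in the paper.

      # Each task goes to whichever device (CPU, GPU or MIC) would finish it earliest,
      # given a performance model that estimates the task's runtime per device.
      def schedule(tasks, devices, est_runtime):
          busy_until = {d: 0.0 for d in devices}
          placement = {}
          for task in tasks:
              best = min(devices, key=lambda d: busy_until[d] + est_runtime(task, d))
              busy_until[best] += est_runtime(task, best)
              placement[task] = best
          return placement, max(busy_until.values())   # task-to-device map and makespan

      # Toy usage with hypothetical runtimes (seconds) for two pipeline stages.
      runtimes = {("segmentation", "cpu"): 4.0, ("segmentation", "gpu"): 0.8, ("segmentation", "mic"): 1.5,
                  ("features", "cpu"): 2.0, ("features", "gpu"): 1.6, ("features", "mic"): 1.2}
      plan, makespan = schedule(["segmentation", "features"], ["cpu", "gpu", "mic"],
                                lambda t, d: runtimes[(t, d)])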

  11. Contaminated Coastal Sediments in the Northeastern United States: Changing Sources Over Time

    Science.gov (United States)

    Buchholtz ten Brink, M. R.; Bothner, M. H.; Mecray, E. L.

    2001-05-01

    Regional studies of coastal sediments in the northeastern United States, conducted by the U.S. Geological Survey, show that trace metal contamination from land-based activities has occurred near all major urban centers. Concentrations of metals, such as Cu, Pb, Zn, Hg, and Ag, are 2-5 times background levels in sediments of Boston Harbor, Long Island Sound (LIS), offshore of Gulf of Maine coastal cities, and in the New York Bight (NYB). Contaminant accumulations are strongly influenced by sediment lithology and sediment transport properties in local areas, in addition to proximity to pollutant sources. Inventories are greatest in muddy depo-centers of the NYB, western LIS, and Boston Harbor. Based on sediment cores, the onset of metal contamination in the northeast occurs in the mid-1800s, with inputs increasing in the mid-1900s and decreasing (20-50%) from the 1970s to present. The increases correlate with local population growth and abundance of a bacterial sewage indicator, Clostridium perfringens. Increases of N and Corg in cores also reflect population growth and changing wastewater treatment practices. Corg values reach a high of 6% in buried sediments near the NYB disposal sites. Cores from western LIS have increasing values of C, N, and P in the most recently deposited sediments, in contrast to metal concentrations that have decreased in recent years. Cessation of sludge disposal and reduction of chemical discharges have been effective at reducing inputs; however, contaminated sediment deposits remain in rivers (e.g., the Charles), floodplains (e.g., the Housatonic), and coastal sediments. In the future, high concentrations of metal contaminants stored in buried sediments of marine and fluvial systems are likely to be a lingering and significant source of pollution to coastal environments. Until more effective source-reduction occurs, land-use and industrial practices associated with population growth in the northeast will remain dominant factors for

  12. Grammatical Planning Units during Real-Time Sentence Production in Speakers with Agrammatic Aphasia and Healthy Speakers

    Science.gov (United States)

    Lee, Jiyeon; Yoshida, Masaya; Thompson, Cynthia K.

    2015-01-01

    Purpose: Grammatical encoding (GE) is impaired in agrammatic aphasia; however, the nature of such deficits remains unclear. We examined grammatical planning units during real-time sentence production in speakers with agrammatic aphasia and control speakers, testing two competing models of GE. We queried whether speakers with agrammatic aphasia…

  13. Grammatical Planning Units during Real-Time Sentence Production in Speakers with Agrammatic Aphasia and Healthy Speakers

    Science.gov (United States)

    Lee, Jiyeon; Yoshida, Masaya; Thompson, Cynthia K.

    2015-01-01

    Purpose: Grammatical encoding (GE) is impaired in agrammatic aphasia; however, the nature of such deficits remains unclear. We examined grammatical planning units during real-time sentence production in speakers with agrammatic aphasia and control speakers, testing two competing models of GE. We queried whether speakers with agrammatic aphasia…

  14. CPU0213, a novel endothelin type A and type B receptor antagonist, protects against myocardial ischemia/reperfusion injury in rats

    Directory of Open Access Journals (Sweden)

    Z.Y. Wang

    2011-11-01

    The efficacy of endothelin receptor antagonists in protecting against myocardial ischemia/reperfusion (I/R) injury is controversial, and the mechanisms remain unclear. The aim of this study was to investigate the effects of CPU0213, a novel endothelin type A and type B receptor antagonist, on myocardial I/R injury and to explore the mechanisms involved. Male Sprague-Dawley rats weighing 200-250 g were randomized to three groups (6-7 per group): group 1, Sham; group 2, I/R + vehicle, in which rats were subjected to in vivo myocardial I/R injury by ligation of the left anterior descending coronary artery and 0.5% sodium carboxymethyl cellulose (1 mL/kg) was injected intraperitoneally immediately prior to coronary occlusion; and group 3, I/R + CPU0213, in which rats were subjected to identical surgical procedures and CPU0213 (30 mg/kg) was injected intraperitoneally immediately prior to coronary occlusion. Infarct size, cardiac function and biochemical changes were measured. CPU0213 pretreatment reduced infarct size as a percentage of the ischemic area by 44.5% (I/R + vehicle: 61.3 ± 3.2% vs I/R + CPU0213: 34.0 ± 5.5%, P < 0.05) and improved ejection fraction by 17.2% (I/R + vehicle: 58.4 ± 2.8% vs I/R + CPU0213: 68.5 ± 2.2%, P < 0.05) compared to vehicle-treated animals. This protection was associated with inhibition of myocardial inflammation and oxidative stress. Moreover, the reduction in Akt (protein kinase B) and endothelial nitric oxide synthase (eNOS) phosphorylation induced by myocardial I/R injury was limited by CPU0213 (P < 0.05). These data suggest that CPU0213, a non-selective antagonist, has protective effects against myocardial I/R injury in rats, which may be related to the Akt/eNOS pathway.

  15. ASSESSMENT OF INFLUENCE OF CUTTING TOOL BREAKAGE ON DRIVE LIFE TIME OF CUTTING UNIT OF HEADING MACHINE

    Directory of Open Access Journals (Sweden)

    О.Е. SHABAEV

    2014-01-01

    This work establishes the need for means of technical diagnostics of cutting tool performance that do not require stopping the heading machine. It is demonstrated theoretically that prolonged operation of a heading machine with a broken cutting tool can substantially reduce the life time of the transmission elements of the cutting unit. The influence of a cutting tool breakage on the life time of the transmission elements was found to depend on the position of the broken tool on the cutting head according to the assembly drawing.

  16. Graphics Processing Unit Enhanced Parallel Document Flocking Clustering

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL; ST Charles, Jesse Lee [ORNL

    2010-01-01

    Analyzing and clustering documents is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of document clustering is its complexity, O(n²). As the number of documents grows, it becomes increasingly difficult to generate results in a reasonable amount of time. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly parallel and semi-parallel problems much faster than the traditional sequential processor. In this paper, we have conducted research to exploit this architecture and apply its strengths to the flocking-based document clustering problem. Using the CUDA platform from NVIDIA, we developed a document flocking implementation to be run on the NVIDIA GeForce GPU. Performance gains of the GPU over the CPU implementation ranged from thirty-six to nearly sixty times.

  17. Second Language Learners' Contiguous and Discontiguous Multi-Word Unit Use over Time

    Science.gov (United States)

    Yuldashev, Aziz; Fernandez, Julieta; Thorne, Steven L.

    2013-01-01

    Research has described the key role of formulaic language use in both written and spoken communication (Schmitt, 2004; Wray, 2002), as well as in relation to L2 learning (Ellis, Simpson--Vlach, & Maynard, 2008). Relatively few studies have examined related fixed and semi-fixed multi-word units (MWUs), which comprise fixed parts with the potential…

  18. Second Language Learners' Contiguous and Discontiguous Multi-Word Unit Use Over Time

    NARCIS (Netherlands)

    Yuldashev, Aziz; Fernandez, Julieta; Thorne, Steven L.

    Research has described the key role of formulaic language use in both written and spoken communication (Schmitt, 2004; Wray, 2002), as well as in relation to L2 learning (Ellis, Simpson-Vlach, & Maynard, 2008). Relatively few studies have examined related fixed and semifixed multi-word units (MWUs),

  19. Second Language Learners' Contiguous and Discontiguous Multi-Word Unit Use over Time

    Science.gov (United States)

    Yuldashev, Aziz; Fernandez, Julieta; Thorne, Steven L.

    2013-01-01

    Research has described the key role of formulaic language use in both written and spoken communication (Schmitt, 2004; Wray, 2002), as well as in relation to L2 learning (Ellis, Simpson--Vlach, & Maynard, 2008). Relatively few studies have examined related fixed and semi-fixed multi-word units (MWUs), which comprise fixed parts with the potential…

  20. Second Language Learners' Contiguous and Discontiguous Multi-Word Unit Use Over Time

    NARCIS (Netherlands)

    Yuldashev, Aziz; Fernandez, Julieta; Thorne, Steven L.

    2013-01-01

    Research has described the key role of formulaic language use in both written and spoken communication (Schmitt, 2004; Wray, 2002), as well as in relation to L2 learning (Ellis, Simpson-Vlach, & Maynard, 2008). Relatively few studies have examined related fixed and semifixed multi-word units (MWUs),

  1. Evaluation of medical devices in thoracic radiograms in intensive care unit - time to pay attention!

    Science.gov (United States)

    Moreira, Ana Sofia Linhares; Afonso, Maria da Graça Alves; Dinis, Mónica Ribeiro dos Santos Alves; dos Santos, Maria Cristina Granja Teixeira

    2016-01-01

    Objective To identify and evaluate the correct positioning of the most commonly used medical devices as visualized in thoracic radiograms of patients in the intensive care unit of our center. Methods A literature search was conducted for the criteria used to evaluate the correct positioning of medical devices on thoracic radiograms. All the thoracic radiograms performed in the intensive care unit of our center over an 18-month period were analyzed. All admissions in which at least one thoracic radiogram was performed in the intensive care unit and in which at least one medical device was identifiable in the thoracic radiogram were included. One radiogram per admission was selected for analysis. The radiograms were evaluated by an independent observer. Results Out of the 2,312 thoracic radiograms analyzed, 568 were included in this study. Several medical devices were identified, including monitoring leads, endotracheal and tracheostomy tubes, central venous catheters, pacemakers and prosthetic cardiac valves. Of the central venous catheters that were identified, 33.6% of the subclavian and 23.8% of the jugular were malpositioned. Of the endotracheal tubes, 19.9% were malpositioned, while all the tracheostomy tubes were correctly positioned. Conclusion Malpositioning of central venous catheters and endotracheal tubes is frequently identified in radiograms of patients in an intensive care unit. This is relevant because malpositioned devices may be related to adverse events. In future studies, an association between malpositioning and adverse events should be investigated. PMID:27737432

  2. Phasor Measurement Unit and Phasor Data Concentrator test with Real Time Digital Simulator

    DEFF Research Database (Denmark)

    Diakos, Konstantinos; Wu, Qiuwei; Nielsen, Arne Hejde

    2014-01-01

    network to a more reliable, secure and economic operation. The implementation of these devices, though, demands a guarantee of secure operation and high-accuracy performance. This paper describes the procedure of establishing a PMU (Phasor Measurement Unit)–PDC (Phasor Data Concentrator) platform...

  3. Acceleration of the OpenFOAM-based MHD solver using graphics processing units

    Energy Technology Data Exchange (ETDEWEB)

    He, Qingyun; Chen, Hongli, E-mail: hlchen1@ustc.edu.cn; Feng, Jingchao

    2015-12-15

    Highlights: • A 3D PISO-MHD solver was implemented on Kepler-class graphics processing units (GPUs) using CUDA technology. • A consistent and conservative scheme is used in the code, validated by three basic benchmarks in rectangular and round ducts. • Parallel CPU and GPU acceleration were compared against a single-core CPU for MHD and non-MHD problems. • Different preconditioners for the MHD solver were compared, and the results showed that the AMG method is better for these calculations. - Abstract: The pressure-implicit with splitting of operators (PISO) magnetohydrodynamics (MHD) solver for the coupled Navier–Stokes and Maxwell equations was implemented on Kepler-class graphics processing units (GPUs) using the CUDA technology. The solver is developed on the open source code OpenFOAM based on a consistent and conservative scheme which is suitable for simulating MHD flow under a strong magnetic field in a fusion liquid metal blanket with structured or unstructured mesh. We verified the validity of the implementation on several standard cases, including benchmark I (the Shercliff and Hunt cases), benchmark II (fully developed circular pipe MHD flow) and benchmark III (the KIT experimental case). Computational performance of the GPU implementation was examined by comparing its double-precision run times with those of essentially the same algorithms and meshes on the CPU. The results showed that a GPU (GTX 770) can outperform a server-class 4-core, 8-thread CPU (Intel Core i7-4770K) by a factor of at least 2.

  4. Research on control law accelerator of digital signal process chip TMS320F28035 for real-time data acquisition and processing

    Science.gov (United States)

    Zhao, Shuangle; Zhang, Xueyi; Sun, Shengli; Wang, Xudong

    2017-08-01

    The TI C2000 series of digital signal processing (DSP) chips has been widely used in electrical engineering, measurement and control, communications and other professional fields; the DSP TMS320F28035 is one of the most representative devices. DSP applications require both data acquisition and data processing, and if ordinary C or assembly language programming is used, the program runs sequentially, so the analogue-to-digital (AD) converter cannot acquire data in real time and a lot of data are often missed. The control law accelerator (CLA) coprocessor can run in parallel with the main central processing unit (CPU), operates at the same frequency as the main CPU, and supports floating-point operations. Therefore, the CLA coprocessor is used in the program: the CLA core is responsible for data processing, while the main CPU is responsible for the AD conversion. The advantage of this method is that it reduces the data processing time and achieves real-time data acquisition.

  5. Timing Effects on Divorce: 20th Century Experience in the United States

    Science.gov (United States)

    Schoen, Robert; Canudas-Romo, Vladimir

    2006-01-01

    Period divorce measures can misrepresent the underlying behavior of birth cohorts as changes in cohort timing produce changes in period probabilities of divorce. Building on methods used to adjust period fertility and marriage measures, we adjust U.S. period divorce rates for timing effects, calculating a timing index for every year between 1910…

  6. Effects of age at cordotomy and subsequent exercise on contraction times of motor units in the cat.

    Science.gov (United States)

    Smith, L A; Eldred, E; Edgerton, V R

    1993-12-01

    The contraction times (CTs) of functionally isolated motor units (MUs) in the soleus (SOL) and medial gastrocnemius (MG) muscles were determined in cats that had been spinalized at ages 2 (n = 15) or 12 (n = 9) wk and then either subjected to exercise on a treadmill or simply given manipulative care of the hindlimbs. The MUs were tested approximately 12 wk after the low-thoracic cordotomy, and comparisons were made with data from control animals. The CT of 50.9 ms obtained for SOL units (n = 163) in the spinal cats was 22% shorter than the mean of 65.0 ms for MUs (n = 57) from control cats (n = 4). Contrary to expectation, the CT in animals spinalized at 12 wk was significantly shorter than that in the 2-wk group. The CT for MG units (n = 105) in spinal cats was also significantly shorter (11%) than that in control cats (n = 66, 6 cats), and those units identified by their high fatigue index as being of slow or fatigue-resistant type had a shorter CT than units with a low index. No distinction in CT of exercised and nonexercised groups was detected for either muscle. These findings are discussed in relation to the bearing that influences of supraspinal and segmental origin have on CT duration in SOL and MG muscles during growth of the kitten. A slight, significant decrease (6%) in the fatigue index of SOL MUs (n = 144) was detected, but the values remained high (mean 0.87).

  7. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards

    Science.gov (United States)

    Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G.

    2011-07-01

    We describe and evaluate a fast implementation of a classical block-matching motion estimation algorithm for multiple graphical processing units (GPUs) using the compute unified device architecture (CUDA) computing engine. The implemented block-matching algorithm uses the sum of absolute differences (SAD) error criterion and full grid search (FS) for finding the optimal block displacement. In this evaluation, we compared the execution time of GPU and CPU implementations for images of various sizes, using integer and noninteger search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 for an integer and 1000 for a noninteger search grid. The additional speedup for a noninteger search grid comes from the fact that the GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with the number of cards is achievable. In addition, we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU-based motion estimation methods, namely the implementation of the pyramidal Lucas-Kanade optical flow algorithm in OpenCV and the simplified unsymmetrical multi-hexagon search in the H.264/AVC standard. In these comparisons, the FS GPU implementation still showed a modest improvement even though its computational complexity is substantially higher than that of the non-FS CPU implementations. We also demonstrated that for an image sequence of 720 × 480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames per second using two NVIDIA C1060 Tesla GPU cards.
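
    The SAD full-grid-search step at the core of the algorithm can be sketched on the CPU in a few lines; block size, search range and data below are illustrative, and this is not the paper's CUDA implementation (integer displacements only).

      import numpy as np

      def best_displacement(ref_block, target, top, left, search=8):
          # Exhaustively test every integer (dy, dx) in the search window and keep
          # the displacement with the smallest sum of absolute differences (SAD).
          b = ref_block.shape[0]
          best, best_sad = (0, 0), np.inf
          for dy in range(-search, search + 1):
              for dx in range(-search, search + 1):
                  y, x = top + dy, left + dx
                  if y < 0 or x < 0 or y + b > target.shape[0] or x + b > target.shape[1]:
                      continue
                  sad = np.abs(ref_block.astype(np.int32)
                               - target[y:y + b, x:x + b].astype(np.int32)).sum()
                  if sad < best_sad:
                      best, best_sad = (dy, dx), sad
          return best, best_sad

      # Toy usage: match a 16x16 block from one frame against the next frame.
      prev_frame = np.random.randint(0, 256, (480, 720), dtype=np.uint8)
      next_frame = np.roll(prev_frame, (2, -3), axis=(0, 1))     # known shift for checking
      motion, _ = best_displacement(prev_frame[100:116, 200:216], next_frame, 100, 200)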

  8. Analytical Call Center Model with Voice Response Unit and Wrap-Up Time

    Directory of Open Access Journals (Sweden)

    Petr Hampl

    2015-01-01

    The last twenty years of computer integration have significantly changed the service process in call center systems. The basic building modules of classical call centers – a switching system and a group of human agents – were extended with other special modules, such as a skills-based routing module, an automatic call distribution module, an interactive voice response module and others, to minimize customer waiting time and wage costs. A calling customer of a modern call center is served in the first stage by the interactive voice response module without any human interaction. If the customer requirements are not satisfied in the first stage, the service continues to the second stage, realized by the group of human agents. The service time of the second stage – the average handle time – is divided into a conversation time and a wrap-up time. During the conversation time, the agent answers customer questions and collects their requirements, and during the wrap-up time (administrative time) the agent completes the task without any customer interaction. The analytical model presented in this contribution is solved under the condition of statistical equilibrium and takes into account the interactive voice response module service time, the conversation time and the wrap-up time.

  9. Performance and scalability of Fourier domain optical coherence tomography acceleration using graphics processing units.

    Science.gov (United States)

    Li, Jian; Bloch, Pavel; Xu, Jing; Sarunic, Marinko V; Shannon, Lesley

    2011-05-01

    Fourier domain optical coherence tomography (FD-OCT) provides faster line rates, better resolution, and higher sensitivity for noninvasive, in vivo biomedical imaging compared to traditional time domain OCT (TD-OCT). However, because the signal processing for FD-OCT is computationally intensive, real-time FD-OCT applications demand powerful computing platforms to deliver acceptable performance. Graphics processing units (GPUs) have been used as coprocessors to accelerate FD-OCT by leveraging their relatively simple programming model to exploit thread-level parallelism. Unfortunately, GPUs do not "share" memory with their host processors, requiring additional data transfers between the GPU and CPU. In this paper, we implement a complete FD-OCT accelerator on a consumer grade GPU/CPU platform. Our data acquisition system uses spectrometer-based detection and a dual-arm interferometer topology with numerical dispersion compensation for retinal imaging. We demonstrate that the maximum line rate is dictated by the memory transfer time and not the processing time due to the GPU platform's memory model. Finally, we discuss how the performance trends of GPU-based accelerators compare to the expected future requirements of FD-OCT data rates.

  10. Airborne SAR Real-time Imaging Algorithm Design and Implementation with CUDA on NVIDIA GPU

    Directory of Open Access Journals (Sweden)

    Meng Da-di

    2013-12-01

    Synthetic Aperture Radar (SAR) image processing requires a huge amount of computation. Traditionally, this task runs on workstations or servers based on the Central Processing Unit (CPU) and is rather time-consuming, so real-time processing of SAR data is impossible. Based on Compute Unified Device Architecture (CUDA) technology, a new design for a SAR imaging algorithm running on the NVIDIA Graphics Processing Unit (GPU) is proposed. The new design makes it possible for the data processing procedure and CPU/GPU data exchange to execute concurrently, especially when the SAR data size exceeds the total GPU global memory size. Multi-GPU configurations are suitably supported by the new design and all computational resources are fully exploited. Experiments on an NVIDIA K20C and an Intel E5645 show that the proposed solution accelerates SAR data processing by tens of times. Consequently, a GPU-based SAR processing system with the proposed solution embedded is much more power-saving and portable, which qualifies it as a real-time SAR data processing system. Experiments show that 36 megapoints of SAR data can be processed per second in real time by the K20C with the new solution.

  11. Coarse-grained and fine-grained parallel optimization for real-time en-face OCT imaging

    Science.gov (United States)

    Kapinchev, Konstantin; Bradu, Adrian; Barnes, Frederick; Podoleanu, Adrian

    2016-03-01

    This paper presents parallel optimizations in the en-face (C-scan) optical coherence tomography (OCT) display. Compared with the cross-sectional (B-scan) imagery, the production of en-face images is more computationally demanding, due to the increased size of the data handled by the digital signal processing (DSP) algorithms. A sequential implementation of the DSP leads to a limited number of real-time generated en-face images. There are OCT applications, where simultaneous production of large number of en-face images from multiple depths is required, such as real-time diagnostics and monitoring of surgery and ablation. In sequential computing, this requirement leads to a significant increase of the time to process the data and to generate the images. As a result, the processing time exceeds the acquisition time and the image generation is not in real-time. In these cases, not producing en-face images in real-time makes the OCT system ineffective. Parallel optimization of the DSP algorithms provides a solution to this problem. Coarse-grained central processing unit (CPU) based and fine-grained graphics processing unit (GPU) based parallel implementations of the conventional Fourier domain (CFD) OCT method and the Master-Slave Interferometry (MSI) OCT method are studied. In the coarse-grained CPU implementation, each parallel thread processes the whole OCT frame and generates a single en-face image. The corresponding fine-grained GPU implementation launches one parallel thread for every data point from the OCT frame and thus achieves maximum parallelism. The performance and scalability of the CPU-based and GPU-based parallel approaches are analyzed and compared. The quality and the resolution of the images generated by the CFD method and the MSI method are also discussed and compared.
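
    The coarse-grained decomposition described above (one worker per en-face image, each processing the whole frame) can be sketched with Python's standard library; the frame size, depths and FFT-based processing are illustrative stand-ins for the paper's CFD/MSI pipelines, and the fine-grained GPU variant would instead launch one thread per data point.

      import numpy as np
      from concurrent.futures import ProcessPoolExecutor

      def en_face_at_depth(args):
          # One coarse-grained task: process the whole raw frame and keep a single depth.
          raw, depth = args                       # raw: (n_x, n_y, n_samples) spectral data
          a_scans = np.abs(np.fft.fft(raw, axis=-1))
          return a_scans[:, :, depth]             # one en-face image

      if __name__ == "__main__":
          raw = np.random.rand(64, 64, 512)       # small illustrative raw frame
          depths = range(0, 256, 16)
          with ProcessPoolExecutor() as pool:     # coarse-grained: one worker per en-face image
              images = list(pool.map(en_face_at_depth, [(raw, d) for d in depths]))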

  12. Economic and Sociological Correlates of Suicides: Multilevel Analysis of the Time Series Data in the United Kingdom.

    Science.gov (United States)

    Sun, Bruce Qiang; Zhang, Jie

    2016-03-01

    For the effects of social integration on suicides, there have been different and even contradictory conclusions. In this study, selected economic and social risk factors for suicide for different age groups and genders in the United Kingdom were identified and their effects were estimated by multilevel time series analyses. To our knowledge, there are no previous studies that estimated a dynamic model of suicides on time series data together with multilevel analysis and autoregressive distributed lags. The investigation indicated that the unemployment rate, inflation rate, and divorce rate are all significantly and positively related to the national suicide rates in the United Kingdom from 1981 to 2011. Furthermore, the suicide rates of almost all groups above 40 years are significantly associated with the risk factors of unemployment and inflation rate, in comparison with the younger groups.

  13. The opportunity costs of informal elder-care in the United States: new estimates from the American Time Use Survey.

    Science.gov (United States)

    Chari, Amalavoyal V; Engberg, John; Ray, Kristin N; Mehrotra, Ateev

    2015-06-01

    To provide nationally representative estimates of the opportunity costs of informal elder-care in the United States. Data from the 2011 and 2012 American Time Use Survey. Wage is used as the measure of an individual's value of time (opportunity cost), with wages being imputed for nonworking individuals using a selection-corrected regression methodology. The total opportunity costs of informal elder-care amount to $522 billion annually, while the costs of replacing this care by unskilled and skilled paid care are $221 billion and $642 billion, respectively. Informal caregiving remains a significant phenomenon in the United States with a high opportunity cost, although it remains more economical (in the aggregate) than skilled paid care. © Health Research and Educational Trust.

  14. Real-time reconstruction of sensitivity encoded radial magnetic resonance imaging using a graphics processing unit.

    Science.gov (United States)

    Sørensen, Thomas Sangild; Atkinson, David; Schaeffter, Tobias; Hansen, Michael Schacht

    2009-12-01

    A barrier to the adoption of non-Cartesian parallel magnetic resonance imaging for real-time applications has been the time required for the image reconstructions, which has exceeded the underlying acquisition time, thus preventing real-time display of the acquired images. We present a reconstruction algorithm for commodity graphics hardware (GPUs) to enable real-time reconstruction of sensitivity-encoded radial imaging (radial SENSE). We demonstrate that a radial profile order based on the golden ratio facilitates reconstruction from an arbitrary number of profiles. This allows the temporal resolution to be adjusted on the fly. A user-adaptable regularization term is also included and, particularly for highly undersampled data, used to interactively improve the reconstruction quality. Each reconstruction is fully self-contained from the profile stream, i.e., the required coil sensitivity profiles, sampling density compensation weights, regularization terms, and noise estimates are computed in real time from the acquisition data itself. The reconstruction implementation is verified using a steady state free precession (SSFP) pulse sequence and quantitatively evaluated. Three applications are demonstrated: real-time imaging with (1) real-time SENSE or (2) k-t SENSE reconstructions, and (3) offline reconstruction with interactive adjustment of reconstruction settings.
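
    The golden-ratio profile ordering referred to above amounts to a very small amount of code; the profile count below is arbitrary and this sketch covers only the angle schedule, not the reconstruction.

      import numpy as np

      golden_angle = 180.0 * (np.sqrt(5.0) - 1.0) / 2.0    # ≈ 111.246 degrees
      n_profiles = 377                                      # any number of acquired profiles
      angles = np.mod(np.arange(n_profiles) * golden_angle, 180.0)
      # Any contiguous subset of `angles` covers k-space nearly uniformly, which is
      # why the number of profiles per reconstructed frame (the temporal resolution)
      # can be chosen on the fly from the same stream.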

  15. General purpose parallel programing using new generation graphic processors: CPU vs GPU comparative analysis and opportunities research

    Directory of Open Access Journals (Sweden)

    Donatas Krušna

    2013-03-01

    OpenCL, a modern programming language for parallel heterogeneous systems, enables problems to be partitioned and executed on modern CPU and GPU hardware, which increases the performance of such applications considerably. Since GPUs are optimized for, and specialize in, floating-point and vector operations, they greatly outperform general-purpose CPUs in this field. The language greatly simplifies the creation of applications for such heterogeneous systems since it is cross-platform, vendor-independent and embeddable, letting it be used from any general-purpose programming language via libraries. More and more tools are being developed that are aimed at low-level programmers as well as scientists and engineers who develop applications or libraries for today's CPUs and GPUs and for other heterogeneous platforms. The tendency today is to increase the number of cores or CPUs in the hope of increasing performance; however, the increasing difficulty of parallelizing applications for such systems and the ever-increasing overhead of communication and synchronization limit the potential performance. This means that there is a point at which adding cores or CPUs no longer increases application performance, and can even diminish it. Even though parallel programming and GPUs with stream computing capabilities have decreased the need for communication and synchronization (since only the final result needs to be committed to memory), this is still a weak link in developing such applications.

  16. Implementation Framework for Binary Stream Pattern Extraction under CPU/GPU

    Institute of Scientific and Technical Information of China (English)

    章一超; 陈凯; 梁阿磊; 白英彩; 管海兵

    2012-01-01

    As a stream processor, the GPU is widely used in general high-performance computation and is no longer limited to the image processing area. NVIDIA CUDA and AMD Stream SDK are popular stream programming environments for general-purpose computation on GPUs (GPGPU). However, both of them have shortcomings and limitations, the biggest being the lack of binary compatibility across different GPUs and the large cost of rewriting existing source code. By using binary analysis and dynamic binary translation technology, this article implements an automatic execution framework, GxBit, which offers a method to extract stream patterns from x86 binary programs and map them into the NVIDIA CUDA programming environment. Validated with multiple programs from the CUDA SDK samples and the Parboil Benchmark Suite, the framework improves performance by an average factor of more than 10.

  17. Real-time resampling in Fourier domain optical coherence tomography using a graphics processing unit.

    Science.gov (United States)

    Van der Jeught, Sam; Bradu, Adrian; Podoleanu, Adrian Gh

    2010-01-01

    Fourier domain optical coherence tomography (FD-OCT) requires either a linear-in-wavenumber spectrometer or a computationally heavy software algorithm to recalibrate the acquired optical signal from wavelength to wavenumber. The first method is sensitive to the position of the prism in the spectrometer, while the second method drastically slows down the system speed when it is implemented on a serially oriented central processing unit. We implement the full resampling process on a commercial graphics processing unit (GPU), distributing the necessary calculations to many stream processors that operate in parallel. A comparison between several recalibration methods is made in terms of performance and image quality. The GPU is also used to accelerate the fast Fourier transform (FFT) and to remove the background noise, thereby achieving full GPU-based signal processing without the need for extra resampling hardware. A display rate of 25 frames/s is achieved for processed images (1,024 x 1,024 pixels) using a line-scan charge-coupled device (CCD) camera operating at 25.6 kHz.
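
    The wavelength-to-wavenumber recalibration described above can be sketched on the CPU with NumPy (on the GPU each spectrum would be handled by many threads in parallel); the spectral range and data are illustrative.

      import numpy as np

      n = 1024
      wavelength = np.linspace(800e-9, 880e-9, n)      # spectrometer pixels, linear in wavelength
      spectrum = np.random.rand(n)                     # one acquired spectral line (stand-in data)

      k_nonuniform = 2.0 * np.pi / wavelength          # wavenumber is nonlinear in pixel index
      k_uniform = np.linspace(k_nonuniform.min(), k_nonuniform.max(), n)

      # np.interp needs an increasing abscissa, and wavenumber decreases with wavelength,
      # so both arrays are flipped before resampling onto the uniform wavenumber grid.
      resampled = np.interp(k_uniform, k_nonuniform[::-1], spectrum[::-1])
      a_scan = np.abs(np.fft.fft(resampled))           # depth profile after recalibration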

  18. C Language Extensions for Hybrid CPU/GPU Programming with StarPU

    OpenAIRE

    Courtès, Ludovic

    2013-01-01

    Modern platforms used for high-performance computing (HPC) include machines with both general-purpose CPUs and "accelerators", often in the form of graphical processing units (GPUs). StarPU is a C library to exploit such platforms. It provides users with ways to define "tasks" to be executed on CPUs or GPUs, along with the dependencies among them, and it automatically schedules them over all the available processing units. In doing so, it also relieves programmers from the need to know the ...

  19. A Maximum Time Difference Pipelined Arithmetic Unit Based on CMOS Gate Array

    Institute of Scientific and Technical Information of China (English)

    唐志敏; 夏培肃

    1995-01-01

    This paper describes a maximum time difference pipelined arithmetic chip, a 36-bit adder and subtractor, based on a 1.5 μm CMOS gate array. The chip can operate at 60 MHz and consumes less than 0.5 W. The results are also studied, and a more precise model of the delay time difference is proposed.

  20. Using real time process measurements to reduce catheter related bloodstream infections in the intensive care unit

    OpenAIRE

    Wall, R; Ely, E; Elasy, T; Dittus, R; Foss, J.; Wilkerson, K; Speroff, T

    2005-01-01

    

Problem: Measuring a process of care in real time is essential for continuous quality improvement (CQI). Our inability to measure the process of central venous catheter (CVC) care in real time prevented CQI efforts aimed at reducing catheter related bloodstream infections (CR-BSIs) from these devices.

  1. Fast calculation of HELAS amplitudes using graphics processing unit (GPU)

    CERN Document Server

    Hagiwara, K; Okamura, N; Rainwater, D L; Stelzer, T

    2009-01-01

    We use the graphics processing unit (GPU) for fast calculations of helicity amplitudes of physics processes. As our first attempt, we compute $u\overline{u}\to n\gamma$ ($n=2$ to 8) processes in $pp$ collisions at $\sqrt{s} = 14$ TeV by transferring the MadGraph generated HELAS amplitudes (FORTRAN) into newly developed HEGET (HELAS Evaluation with GPU Enhanced Technology) codes written in CUDA, a C-platform developed by NVIDIA for general purpose computing on the GPU. Compared with the usual CPU programs, we obtain 40-150 times better performance on the GPU.

  2. Implementing wide baseline matching algorithms on a graphics processing unit.

    Energy Technology Data Exchange (ETDEWEB)

    Rothganger, Fredrick H.; Larson, Kurt W.; Gonzales, Antonio Ignacio; Myers, Daniel S.

    2007-10-01

    Wide baseline matching is the state of the art for object recognition and image registration problems in computer vision. Though effective, the computational expense of these algorithms limits their application to many real-world problems. The performance of wide baseline matching algorithms may be improved by using a graphical processing unit as a fast multithreaded co-processor. In this paper, we present an implementation of the difference of Gaussian feature extractor, based on the CUDA system of GPU programming developed by NVIDIA, and implemented on their hardware. For a 2000x2000 pixel image, the GPU-based method executes nearly thirteen times faster than a comparable CPU-based method, with no significant loss of accuracy.
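
    The difference-of-Gaussian feature response mentioned above can be sketched on the CPU with SciPy (the paper's version runs in CUDA); the image and the scale values below are illustrative.

      import numpy as np
      from scipy.ndimage import gaussian_filter

      image = np.random.rand(512, 512).astype(np.float32)   # stand-in for a grayscale image
      sigma = 1.6                                            # illustrative base scale
      k = np.sqrt(2.0)                                       # scale step between levels

      # Subtracting two Gaussian blurs gives a band-pass response (an approximation to the
      # Laplacian of Gaussian); extrema of this response over position and scale are the
      # candidate keypoints used by wide-baseline matching.
      dog = gaussian_filter(image, k * sigma) - gaussian_filter(image, sigma)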

  3. Solving Time of Least Square Systems in Sigma-Pi Unit Networks

    CERN Document Server

    Courrieu, Pierre

    2008-01-01

    Solving least-squares systems is a useful operation in neurocomputational modeling of learning, pattern matching, and pattern recognition. In these last two cases, the solution must be obtained on-line, thus the time required to solve a system in a plausible neural architecture is critical. This paper presents a recurrent network of Sigma-Pi neurons whose solving time increases at most like the logarithm of the system size, and of its condition number, which provides plausible computation times for biological systems.
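
    For comparison, the conventional (non-neural) route to the same least-squares solution is a single library call; the system size below is arbitrary, and the singular values it returns give the condition number that, per the abstract, also governs the network's solving time.

      import numpy as np

      rng = np.random.default_rng(1)
      A = rng.normal(size=(200, 50))             # overdetermined system A x ≈ b
      b = rng.normal(size=200)

      x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
      condition_number = sv[0] / sv[-1]          # singular values are returned in decreasing order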

  4. Real-Time Computation of Parameter Fitting and Image Reconstruction Using Graphical Processing Units

    CERN Document Server

    Locans, Uldis; Suter, Andreas; Fischer, Jannis; Lustermann, Werner; Dissertori, Gunther; Wang, Qiulin

    2016-01-01

    In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of muSR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission Tomography) image reconstruction and analysis. Applications currently in use were examined to identify parts of the algorithms in need of optimization. Efficient GPU kernels were created in order to allow applications to use a GPU, to speed up the previously identified parts. Benchmarking tests were performed in order to measure the ...

  5. EnviroAtlas - Commute Time to Work by Census Block Group for the Conterminous United States

    Data.gov (United States)

    U.S. Environmental Protection Agency — This EnviroAtlas dataset portrays the commute time of workers to their workplace for each Census Block Group (CBG) during 2008-2012. Data were compiled from the...

  6. Effects of CPU 86017 (chlorobenzyltetrahydroberberine chloride) and its enantiomers on thyrotoxicosis-induced overactive endothelin-1 system and oxidative stress in rat testes.

    Science.gov (United States)

    Tang, XiaoYun; Qi, MinYou; Dai, DeZai; Zhang, Can

    2006-08-01

    To study the effects of CPU 86017, a berberine derivative, and its four enantiomers on thyrotoxicosis-induced oxidative stress and the excessive endothelin-1 system in rat testes. Adult male SD rats were given high-dose L-thyroxin (0.2 mg/kg subcutaneously) once daily for 10 days to develop thyrotoxicosis. Subsets of the rats were treated with CPU 86017 or its four enantiomers (SR, SS, RS, and RR) once daily from day 6 to day 10. The alterations of redox, nitric oxide synthase, and endothelin-1 system in testes were examined by spectrophotometry and reverse transcriptase-polymerase chain reaction assay. After 10 days of high-dose L-thyroxin administration, increased mRNA expression of prepro-endothelin-1 and endothelin-converting enzyme was observed in the rat testes, accompanied by an elevated inducible nitric oxide synthase activity and oxidative stress. CPU 86017 and its enantiomer SR significantly improved these abnormalities. High-dose L-thyroxin results in an overactive endothelin-1 system and oxidative stress in adult rat testis. CPU 86017 and its enantiomer SR suppressed the excessive ET-1 system by improving oxidative stress, and SR exhibited more potent efficacy than CPU 86017 and other enantiomers.

  7. Dual-CPU Communication Design Program Based on 3G Smart Phone (一种3G智能手机中双CPU通信方案设计)

    Institute of Scientific and Technical Information of China (English)

    黄俊伟; 向伟

    2012-01-01

    The move from simple feature phones to smart phones made the two-chip (dual-CPU) architecture an important milestone, but data communication and power control between the two chips remain major challenges for this architecture. A dual-CPU communication scheme is designed in this paper, covering control, data transmission, and low-power design. A detailed block diagram of the system implementation is given, together with solutions for data transmission and for the low-power requirements. The scheme has achieved good results in practical application.

  8. But science is international! Finding time and space to encourage intercultural learning in a content-driven physiology unit.

    Science.gov (United States)

    Etherington, Sarah J

    2014-06-01

    Internationalization of the curriculum is central to the strategic direction of many modern universities and has widespread benefits for student learning. However, these clear aspirations for internationalization of the curriculum have not been widely translated into more internationalized course content and teaching methods in the classroom, particularly in scientific disciplines. This study addressed one major challenge to promoting intercultural competence among undergraduate science students: finding time to scaffold such learning within the context of content-heavy, time-poor units. Small changes to enhance global and intercultural awareness were incorporated into existing assessments and teaching activities within a second-year biomedical physiology unit. Interventions were designed to start a conversation about global and intercultural perspectives on physiology, to embed the development of global awareness into the assessment and to promote cultural exchanges through peer interactions. In student surveys, 40% of domestic and 60% of international student respondents articulated specific learning about interactions in cross-cultural groups resulting from unit activities. Many students also identified specific examples of how cultural beliefs would impact on the place of biomedical physiology within the global community. In addition, staff observed more widespread benefits for student engagement and learning. It is concluded that a significant development of intercultural awareness and a more global perspective on scientific understanding can be supported among undergraduates with relatively modest, easy to implement adaptations to course content.

  9. Storage dynamics in hydropedological units control hillslope connectivity, runoff generation, and the evolution of catchment transit time distributions.

    Science.gov (United States)

    Tetzlaff, D; Birkel, C; Dick, J; Geris, J; Soulsby, C

    2014-02-01

    We examined the storage dynamics and isotopic composition of soil water over 12 months in three hydropedological units in order to understand runoff generation in a montane catchment. The units form classic catena sequences from freely draining podzols on steep upper hillslopes through peaty gleys in shallower lower slopes to deeper peats in the riparian zone. The peaty gleys and peats remained saturated throughout the year, while the podzols showed distinct wetting and drying cycles. In this region, most precipitation events are 80% of flow, even in large events, reflecting the displacement of water from the riparian soils that has been stored in the catchment for >2 years. These riparian areas are the key zone where different source waters mix. Our study is novel in showing that they act as "isostats," not only regulating the isotopic composition of stream water, but also integrating the transit time distribution for the catchment. Key points: hillslope connectivity is controlled by small storage changes in soil units; different catchment source waters mix in large riparian wetland storage; isotopes show riparian wetlands set the catchment transit time distribution.

  10. Length of time to first job for immigrants in the United Kingdom: An exploratory analysis

    Directory of Open Access Journals (Sweden)

    JuYin (Helen) Wong

    2013-05-01

    Full Text Available This study explores whether ethnicity affects immigrants’ time to first employment. Many studies on labour/social inequalities focus on modeling cross-sectional or panel data when comparing ethnic minority to majority groups in terms of their employment patterns. Results from these models, however, do not measure the degree of transition-duration penalties experienced by immigrant groups. Because time itself is an important variable, and to bridge the gap between literature and methodology, a lifecourse perspective and a duration model are employed to examine the length of transition that immigrants require to find first employment.

  11. Influenza mortality in the United States, 2009 pandemic: burden, timing and age distribution.

    Directory of Open Access Journals (Sweden)

    Ann M Nguyen

    Full Text Available BACKGROUND: In April 2009, the most recent pandemic of influenza A began. We present the first estimates of pandemic mortality based on the newly-released final data on deaths in 2009 and 2010 in the United States. METHODS: We obtained data on influenza and pneumonia deaths from the National Center for Health Statistics (NCHS). Age- and sex-specific death rates, and age-standardized death rates, were calculated. Using negative binomial Serfling-type methods, excess mortality was calculated separately by sex and age groups. RESULTS: In many age groups, observed pneumonia and influenza cause-specific mortality rates in October and November 2009 broke month-specific records since 1959 when the current series of detailed US mortality data began. Compared to the typical pattern of seasonal flu deaths, the 2009 pandemic age-specific mortality, as well as influenza-attributable (excess) mortality, skewed much younger. We estimate 2,634 excess pneumonia and influenza deaths in 2009-10; the excess death rate in 2009 was 0.79 per 100,000. CONCLUSIONS: Pandemic influenza mortality skews younger than seasonal influenza. This can be explained by a protective effect due to antigenic cycling. When older cohorts have been previously exposed to a similar antigen, immune memory results in lower death rates at older ages. Age-targeted vaccination of younger people should be considered in future pandemics.

  12. Influence of the light-curing unit, storage time and shade of a dental composite resin on the fluorescence

    Science.gov (United States)

    Queiroz, R. S.; Bandéca, M. C.; Calixto, L. R.; Gaiao, U.; Cuin, A.; Porto-Neto, S. T.

    2010-07-01

    The aim of this study was to determine the influence of three light-curing units, storage times and colors of the dental composite resin on the fluorescence. The specimens (diameter 10.0 ± 0.1 mm, thickness 1.0 ± 0.1 mm) were made using a stainless steel mold. The mold was filled with the microhybrid composite resin and a polyethylene film covered each side of the mold. After this, a glass slide was placed on the top of the mold. To standardize the top surface of the specimens, a circular weight (1 kg) with an orifice to pass the light tip of the LCU was placed on the top surface and the specimen was photo-activated for 40 s. Five specimens were made for each group. The specimens were divided into 9 groups according to the LCUs (one QTH and two LEDs), storage times (immediately after curing, 24 hours, 7 and 30 days) and colors (shades: A2E, A2D, and TC) of the composite resin. After photo-activation, the specimens were stored in artificial saliva for the storage times proposed for each group at 37°C and 100% humidity. The analysis of variance (ANOVA) and Tukey’s post hoc tests showed no significant difference between storage times (immediately, 24 hours and 30 days) (P > 0.05). The mean fluorescence differed significantly with the color and the light-curing unit used for all storage periods (P < 0.05).

  13. Trends in No Leisure-Time Physical Activity--United States, 1988-2010

    Science.gov (United States)

    Moore, Latetia V.; Harris, Carmen D.; Carlson, Susan A.; Kruger, Judy; Fulton, Janet E.

    2012-01-01

    Purpose: The aim of this study was to examine trends in the prevalence of no leisure-time physical activity (LTPA) from 1988 to 2010. Method: Using the Behavioral Risk Factor Surveillance System data, 35 states and the District of Columbia reported information on no LTPA from 1988 to 1994; all states reported no LTPA from 1996 to 2010. Results: No…

  14. Clocking in: The Organization of Work Time and Health in the United States

    Science.gov (United States)

    Kleiner, Sibyl; Pavalko, Eliza K.

    2010-01-01

    This article assesses the health implications of emerging patterns in the organization of work time. Using data from the National Longitudinal Survey of Youth 1979, we examine general mental and physical health (SF-12 scores), psychological distress (CESD score), clinical levels of obesity, and the presence of medical conditions, at age 40.…

  15. Uniting Mandelbrot’s Noah and Joseph Effects in Toy Models of Natural Hazard Time Series

    Science.gov (United States)

    Credgington, D.; Watkins, N. W.; Chapman, S. C.; Rosenberg, S. J.; Sanchez, R.

    2009-12-01

    The forecasting of extreme events is a highly topical, cross-disciplinary problem. One aspect which is potentially tractable even when the events themselves are stochastic is the probability of a “burst” of a given size and duration, defined as the area between a time series and a constant threshold. Many natural time series depart from the simplest, Brownian, case and in the 1960s Mandelbrot developed the use of fractals to describe these departures. In particular he proposed two kinds of fractal model to capture the way in which natural data is often persistent in time (his “Joseph effect”, common in hydrology and exemplified by fractional Brownian motion) and/or prone to heavy-tailed jumps (the “Noah effect”, typical of economic index time series, for which he gave Levy flights as an exemplar). Much of the earlier modelling, however, has emphasised one of the Noah and Joseph parameters (the tail exponent mu and one derived from the temporal behaviour such as power spectral beta) at the other one's expense. I will describe work [1] in which we applied a simple self-affine stable model, linear fractional stable motion (LFSM), which unifies both effects to better describe natural data, in this case from space physics. I will show how we have resolved some contradictions seen in earlier work, where purely Joseph or Noah descriptions had been sought. I will also show recent work [2] using numerical simulations of LFSM and simple analytic scaling arguments to study the problem of the area between a fractional Levy model time series and a threshold. [1] Watkins et al, Space Science Reviews [2005] [2] Watkins et al, Physical Review E [2009]

  16. Annihilating time and space: The electrification of the United States Army, 1875--1920

    Science.gov (United States)

    Brown, Shannon Allen

    2000-10-01

    The United States Army embraced electrical technology in the 1870s as part of a wider initiative to meet the challenge of the coastal defense mission. As commercial power storage, generation, and transmission technology improved and the army came to recognize the value of the energy source as a means and method of improving command and control, localized electrical networks were integrated into the active service of the military. New vulnerabilities emerged as the army became ever more reliant upon electric power, however, and electrification---the institutional adoption and adaptation of electrical technologies---emerged as a very expensive and contentious process guided by technical, political, and economic pressures, and influenced by conflicting personalities within the service. This study considers the institutional evolution of the U.S. Army before and during World War I with respect to the adoption and application of electrical technology. The changing relationships between the military and electrical manufacturing and utilities industries during the period 1875--1920 are also explored. Using a combination of military archival sources and published primary materials, this study traces the effects of electrification on the army. In the end, this study proves that electrification was, at first, a symptom of, and later, a partial solution to the army's struggle to modernize and centralize during the period under consideration. Electrification produced a set of conditions that encouraged a new maturity within the ranks of the army, in technical, doctrinal, and administrative terms. This growth eventually led to the development of new capabilities, new forms of military organization, new missions, and new approaches to warfare.

  17. The rise of global warming skepticism: exploring affective image associations in the United States over time.

    Science.gov (United States)

    Smith, Nicholas; Leiserowitz, Anthony

    2012-06-01

    This article explores how affective image associations to global warming have changed over time. Four nationally representative surveys of the American public were conducted between 2002 and 2010 to assess public global warming risk perceptions, policy preferences, and behavior. Affective images (positive or negative feelings and cognitive representations) were collected and content analyzed. The results demonstrate a large increase in "naysayer" associations, indicating extreme skepticism about the issue of climate change. Multiple regression analyses found that holistic affect and "naysayer" associations were more significant predictors of global warming risk perceptions than cultural worldviews or sociodemographic variables, including political party and ideology. The results demonstrate the important role affective imagery plays in judgment and decision-making processes, how these variables change over time, and how global warming is currently perceived by the American public. © 2012 Society for Risk Analysis.

  18. Design of High-Precision Frequency Measure System Based on CPLD Time Delay Unit

    Energy Technology Data Exchange (ETDEWEB)

    Feng Qian; Ding Wei; Wang Hao, E-mail: fengqian@eqhb.gov.cn [Institute of Seismology, China Earthquake Administration, 40 Hongshan Road, Wuchang District, Wuhan (China)

    2011-02-01

    A method for high-precision frequency measurement is introduced that does not require a complicated measurement-control environment. A CPLD is used to improve the measurement precision by means of quantization time-delay. A high-precision, frequency-adjustable module based on this method has been used in a photoelectric data acquisition system. The frequency accuracy is -8.306×10^-10, which meets the requirements of the instrument.

  19. Deposition times in the northeastern United States during the Holocene: establishing valid priors for Bayesian age models

    Science.gov (United States)

    Goring, S.; Williams, J. W.; Blois, J. L.; Jackson, S. T.; Paciorek, C. J.; Booth, R. K.; Marlon, J. R.; Blaauw, M.; Christen, J. A.

    2012-08-01

    Age-depth relationships in sedimentary archives such as lakes, wetlands and bogs are non-linear with irregular probability distributions associated with calibrated radiocarbon dates. Bayesian approaches are thus well-suited to understanding relationships between age and depth for use in paleoecological studies. Bayesian models for the accumulation of sediment and organic matter within basins combine dated material from one or more records with prior information about the behavior of deposition times (yr/cm) based on expert knowledge. Well-informed priors are essential to good modeling of the age-depth relationship, but are particularly important in cases where data may be sparse (e.g., few radiocarbon dates), or unclear (e.g., age-reversals, coincident dates, age offsets, outliers and dates within a radiocarbon plateau). Here we assessed Holocene deposition times using 204 age-depth models obtained from the Neotoma Paleoecology Database (www.neotomadb.org) for both lacustrine and palustrine environments across the northeastern United States. These age-depth models were augmented using biostratigraphic events identifiable within pollen records from the northeastern United States during the Holocene and late-Pleistocene. Deposition times are significantly related to depositional environment (palustrine and lacustrine), sediment age, and sediment depth. Spatial variables had non-significant relationships with deposition time when site effects were considered. The best-fit model was a generalized additive mixed model that relates deposition time to age, stratified by depositional environment with site as a random factor. The best-fit model accounts for 63.3% of the total deviance in deposition times. The strongly increasing accumulation rates of the last 500-1000 years indicate that gamma distributions describing lacustrine deposition times (α = 1.08, β = 18.28) and palustrine deposition times (α = 1.23, β = 22.32) for the entire Holocene may be insufficient for
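
    The reported gamma parameters can be turned into an explicit prior on deposition time; the sketch below assumes α is the shape and β the scale parameter (the abstract does not state the parameterization), so the resulting numbers are indicative only.

```python
# Sketch of using the reported gamma parameters as a prior on deposition
# time (yr/cm). Assumption: alpha is the shape and beta the scale parameter.
from scipy import stats

priors = {
    "lacustrine": stats.gamma(a=1.08, scale=18.28),
    "palustrine": stats.gamma(a=1.23, scale=22.32),
}

for env, prior in priors.items():
    mean = prior.mean()                       # expected yr/cm under the prior
    lo, hi = prior.ppf([0.025, 0.975])        # 95% prior interval
    print(f"{env}: mean {mean:.1f} yr/cm, 95% interval [{lo:.1f}, {hi:.1f}]")
```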

  20. An Algorithm of Traffic Perception of DDoS Attacks against SOA Based on Time United Conditional Entropy

    Directory of Open Access Journals (Sweden)

    Yuntao Zhao

    2016-01-01

    Full Text Available DDoS attacks can prevent legitimate users from accessing a service by consuming the resources of the target nodes, exposing the availability of the network and its services to a significant threat. Therefore, DDoS traffic perception is the premise and foundation of whole-system security. In this paper, a method of DDoS traffic perception for SOA networks based on time united conditional entropy is proposed. According to the many-to-one mapping between source and destination IP addresses in DDoS attacks, traffic characteristics of services are analyzed based on conditional entropy. The algorithm gains the ability to perceive DDoS attacks on SOA services by introducing the time dimension. Simulation results show that the novel method can realize DDoS traffic perception by analyzing abrupt variations of conditional entropy in the time dimension.
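
    The core quantity, a conditional entropy of destination addresses given source addresses evaluated window by window in time, can be sketched as follows; the field names and the drop threshold are illustrative assumptions, not the paper's algorithm verbatim.

```python
# Hedged sketch: window-wise conditional entropy H(dst | src) over packet
# records; an abrupt drop can indicate many sources converging on one target.
import math
from collections import Counter, defaultdict

def conditional_entropy(pairs):
    """H(dst | src) in bits for a list of (src, dst) pairs."""
    total = len(pairs)
    by_src = defaultdict(Counter)
    for src, dst in pairs:
        by_src[src][dst] += 1
    h = 0.0
    for src, dsts in by_src.items():
        n_src = sum(dsts.values())
        p_src = n_src / total
        h_dst_given_src = -sum((c / n_src) * math.log2(c / n_src)
                               for c in dsts.values())
        h += p_src * h_dst_given_src
    return h

def detect(windows, drop_threshold=0.5):
    """Flag a time window whose entropy falls below `drop_threshold` of the
    previous window's value (the time dimension of the detection)."""
    prev = None
    for t, pairs in enumerate(windows):
        h = conditional_entropy(pairs)
        if prev is not None and prev > 0 and h < drop_threshold * prev:
            print(f"window {t}: possible DDoS (H={h:.2f}, previous {prev:.2f})")
        prev = h

# Toy example: normal traffic, then many sources all hitting one victim.
normal = [("s%d" % (i % 10), "d%d" % (i % 7)) for i in range(100)]
attack = [("s%d" % i, "victim") for i in range(100)]
detect([normal, normal, attack])
```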

  1. Towards 100,000 CPU Cycle-Scavenging by Genetic Algorithms

    Science.gov (United States)

    Globus, Al; Biegel, Bryan A. (Technical Monitor)

    2001-01-01

    We examine a web-centric design using standard tools such as web servers, web browsers, PHP, and mySQL. We also consider the applicability of Information Power Grid tools such as the Globus (no relation to the author) Toolkit. We intend to implement this architecture with JavaGenes running on at least two cycle-scavengers: Condor and United Devices. JavaGenes, a genetic algorithm code written in Java, will be used to evolve multi-species reactive molecular force field parameters.

  2. Utilizing Graphics Processing Units for Network Anomaly Detection

    Science.gov (United States)

    2012-09-13

    ...matching system using deterministic finite automata and extended finite automata, resulting in a speedup of 9x over the CPU implementation [SGO09]. ... pages 14–18, 2009. [Kov10] Nicholas S. Kovach. Accelerating malware detection via a graphics processing unit, 2010. http://www.dtic.mil/dtic/tr

  3. A comparison of methods to predict historical daily streamflow time series in the southeastern United States

    Science.gov (United States)

    Farmer, William H.; Archfield, Stacey A.; Over, Thomas M.; Hay, Lauren E.; LaFontaine, Jacob H.; Kiang, Julie E.

    2015-01-01

    Effective and responsible management of water resources relies on a thorough understanding of the quantity and quality of available water. Streamgages cannot be installed at every location where streamflow information is needed. As part of its National Water Census, the U.S. Geological Survey is planning to provide streamflow predictions for ungaged locations. In order to predict streamflow at a useful spatial and temporal resolution throughout the Nation, efficient methods need to be selected. This report examines several methods used for streamflow prediction in ungaged basins to determine the best methods for regional and national implementation. A pilot area in the southeastern United States was selected to apply 19 different streamflow prediction methods and evaluate each method by a wide set of performance metrics. Through these comparisons, two methods emerged as the most generally accurate streamflow prediction methods: the nearest-neighbor implementations of nonlinear spatial interpolation using flow duration curves (NN-QPPQ) and standardizing logarithms of streamflow by monthly means and standard deviations (NN-SMS12L). It was nearly impossible to distinguish between these two methods in terms of performance. Furthermore, neither of these methods requires significantly more parameterization in order to be applied: NN-SMS12L requires 24 regional regressions—12 for monthly means and 12 for monthly standard deviations. NN-QPPQ, in the application described in this study, required 27 regressions of particular quantiles along the flow duration curve. Despite this finding, the results suggest that an optimal streamflow prediction method depends on the intended application. Some methods are stronger overall, while some methods may be better at predicting particular statistics. The methods of analysis presented here reflect a possible framework for continued analysis and comprehensive multiple comparisons of methods of prediction in ungaged basins (PUB
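
    The SMS12L idea of standardizing logarithms of streamflow by monthly means and standard deviations can be sketched as below; the donor-gage transfer shown is a simplified illustration with assumed variable names, not the USGS implementation.

```python
# Illustrative sketch of the SMS12L idea: standardize log-streamflow at a
# donor gage by monthly mean/std, then rescale with the (regression-estimated)
# monthly moments of the ungaged site. All names are assumptions.
import numpy as np
import pandas as pd

def predict_ungaged(donor_flow, target_monthly_mean, target_monthly_std):
    """donor_flow: pd.Series of daily flows with a DatetimeIndex.
    target_monthly_*: length-12 arrays of moments of log10(flow) at the
    ungaged site, e.g. from the 24 regional regressions mentioned above."""
    log_q = np.log10(donor_flow)
    month = log_q.index.month
    donor_mean = log_q.groupby(month).transform("mean")
    donor_std = log_q.groupby(month).transform("std")
    z = (log_q - donor_mean) / donor_std                       # standardized series
    log_pred = z * target_monthly_std[month - 1] + target_monthly_mean[month - 1]
    return 10 ** log_pred

# Example with synthetic data standing in for a donor record and regressions.
idx = pd.date_range("2010-01-01", periods=730, freq="D")
donor = pd.Series(np.exp(np.random.randn(730) * 0.3 + 2.0), index=idx)
pred = predict_ungaged(donor, target_monthly_mean=np.full(12, 1.8),
                       target_monthly_std=np.full(12, 0.25))
print(pred.head())
```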

  4. Optimization of a Superconducting Magnetic Energy Storage Device via a CPU-Efficient Semi-Analytical Simulation

    CERN Document Server

    Dimitrov, I K; Solovyov, V F; Chubar, O; Li, Qiang

    2014-01-01

    Recent advances in second generation (YBCO) high temperature superconducting wire could potentially enable the design of super high performance energy storage devices that combine the high energy density of chemical storage with the high power of superconducting magnetic storage. However, the high aspect ratio and considerable filament size of these wires requires the concomitant development of dedicated optimization methods that account for both the critical current density and ac losses in type II superconductors. Here, we report on the novel application and results of a CPU-efficient semi-analytical computer code based on the Radia 3D magnetostatics software package. Our algorithm is used to simulate and optimize the energy density of a superconducting magnetic energy storage device model, based on design constraints, such as overall size and number of coils. The rapid performance of the code is pivoted on analytical calculations of the magnetic field based on an efficient implementation of the Biot-Savart...
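
    As a rough illustration of filament-based field evaluation of the Biot-Savart kind (the Radia package itself is not used here), the snippet below discretizes a single circular loop into straight segments and recovers the analytic on-axis field.

```python
# Field of a circular current loop by summing Biot-Savart contributions of
# short straight segments; a CPU-side NumPy toy, not the paper's method.
import numpy as np

MU0 = 4e-7 * np.pi

def loop_field(point, radius=1.0, current=1.0, n_seg=720):
    """Magnetic flux density (T) at `point` from a loop in the z=0 plane."""
    phi = np.linspace(0, 2 * np.pi, n_seg, endpoint=False)
    src = np.stack([radius * np.cos(phi), radius * np.sin(phi),
                    np.zeros_like(phi)], axis=1)              # segment midpoints
    dphi = 2 * np.pi / n_seg
    dl = np.stack([-np.sin(phi), np.cos(phi), np.zeros_like(phi)],
                  axis=1) * radius * dphi                     # segment vectors
    r = np.asarray(point) - src
    r_norm = np.linalg.norm(r, axis=1, keepdims=True)
    dB = MU0 * current / (4 * np.pi) * np.cross(dl, r) / r_norm ** 3
    return dB.sum(axis=0)

# On-axis check against the analytic value mu0*I/(2*R) ~ 6.28e-7 T in z.
print(loop_field([0.0, 0.0, 0.0]))
```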

  5. Design and Development of a Vector Control System of Induction Motor Based on Dual CPU for Electric Vehicle

    Institute of Scientific and Technical Information of China (English)

    孙逢春; 翟丽; 张承宁; 彭连云

    2003-01-01

    A vector control system for electric vehicle (EV) induction motor drive system is designed and developed. Its hardware system based on a dual CPU (microcomputer 80C196KC and DSP TMS320F2407) is implemented. The fundamental mathematics equations of induction motor in the general synchronously rotating reference frame (M-T frame) used for vector control are achieved by coordinate transformation. Rotor flux equation and torque equation are deduced. According to these equations, an induction motor mathematical model and rotor flux observer model are built separately. The rotor flux field-oriented vector control method is implemented based on these models in system software, and some of the simulation results with Matlab/Simulink are given. The simulation results show that the vector control system for EV induction motor drive system has better static and dynamic performance, and the rotor flux field-oriented vector control method was practically verified.
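
    The coordinate transformation into the synchronously rotating M-T frame mentioned above amounts to a Clarke transform followed by a Park rotation; the sketch below is a generic, amplitude-invariant illustration rather than the 80C196KC/TMS320F2407 firmware.

```python
# Sketch of the transformations behind rotor-flux-oriented vector control:
# three-phase currents -> stationary alpha/beta frame (Clarke) ->
# synchronously rotating M-T frame (Park). Amplitude-invariant scaling assumed.
import numpy as np

def clarke(ia, ib, ic):
    """abc -> alpha/beta (amplitude-invariant form)."""
    i_alpha = (2.0 / 3.0) * (ia - 0.5 * ib - 0.5 * ic)
    i_beta = (2.0 / 3.0) * (np.sqrt(3) / 2.0) * (ib - ic)
    return i_alpha, i_beta

def park(i_alpha, i_beta, theta):
    """alpha/beta -> M-T frame rotating at the rotor-flux angle theta."""
    i_m = i_alpha * np.cos(theta) + i_beta * np.sin(theta)   # flux-producing
    i_t = -i_alpha * np.sin(theta) + i_beta * np.cos(theta)  # torque-producing
    return i_m, i_t

# Balanced sinusoidal currents map to constant (i_m, i_t) in the M-T frame.
t = np.linspace(0, 0.02, 5)
w = 2 * np.pi * 50
ia, ib, ic = np.cos(w * t), np.cos(w * t - 2 * np.pi / 3), np.cos(w * t + 2 * np.pi / 3)
print(park(*clarke(ia, ib, ic), theta=w * t))
```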

  6. Leisure-time physical activity among older adults. United States, 1990.

    Science.gov (United States)

    Yusuf, H R; Croft, J B; Giles, W H; Anda, R F; Casper, M L; Caspersen, C J; Jones, D A

    1996-06-24

    To investigate the prevalence and selected correlates of leisure-time physical activity in a nationally representative sample of persons aged 65 years or older. Data from 2783 older male and 5018 older female respondents to the 1990 National Health Interview Survey were used. Regular physical activity was defined as participation in leisure-time physical activities 3 times or more per week for 30 minutes or more during the previous 2 weeks. Odds ratios (ORs) were estimated from multivariate logistic regression analysis. Prevalence of regular physical activity was 37% among older men and 24% among older women. Correlates of regular physical activity included the perception of excellent to good health (men: OR, 1.5; 95% confidence interval [CI], 1.1-1.9; women: OR, 1.6; 95% CI, 1.3-1.9), correct exercise knowledge (men: OR, 2.4; 95% CI, 1.9-3.1; women: OR, 2.7; 95% CI, 2.2-3.4), no activity limitations (men: OR, 1.3; 95% CI, 1.0-1.6; women: OR, 1.7; 95% CI, 1.4-2.0) and not perceiving "a lot" of stress during the previous 2 weeks (men: OR, 1.7; 95% CI, 1.2-2.4; women: OR, 1.3; 95% CI, 1.0-1.6). Among those who had been told at least twice that they had high blood pressure, physician's advice to exercise was associated with regular physical activity (men: OR, 1.6; 95% CI, 1.2-2.3; women: OR, 1.5; 95% CI, 1.2-1.9). The 2 major activities among active older adults were walking (men, 69%; women, 75%) and gardening (men, 45%; women, 35%). Prevalence of regular physical activity is low among older Americans. Identifying the correlates of physical activity will help to formulate strategies to increase physical activity in this age group.

  7. Comparative Performance Analysis of Intel Xeon Phi, GPU, and CPU: A Case Study from Microscopy Image Analysis.

    Science.gov (United States)

    Teodoro, George; Kurc, Tahsin; Kong, Jun; Cooper, Lee; Saltz, Joel

    2014-05-01

    We study and characterize the performance of operations in an important class of applications on GPUs and Many Integrated Core (MIC) architectures. Our work is motivated by applications that analyze low-dimensional spatial datasets captured by high resolution sensors, such as image datasets obtained from whole slide tissue specimens using microscopy scanners. Common operations in these applications involve the detection and extraction of objects (object segmentation), the computation of features of each extracted object (feature computation), and characterization of objects based on these features (object classification). In this work, we identify the data access and computation patterns of operations in the object segmentation and feature computation categories. We systematically implement and evaluate the performance of these operations on modern CPUs, GPUs, and MIC systems for a microscopy image analysis application. Our results show that the performance on a MIC of operations that perform regular data access is comparable or sometimes better than that on a GPU. On the other hand, GPUs are significantly more efficient than MICs for operations that access data irregularly. This is a result of the low performance of MICs when it comes to random data access. We have also examined the coordinated use of MICs and CPUs. Our experiments show that using a performance-aware task strategy for scheduling application operations improves performance about 1.29× over a first-come-first-served strategy. This allows applications to obtain high performance efficiency on CPU-MIC systems - the example application attained an efficiency of 84% on 192 nodes (3072 CPU cores and 192 MICs).

  8. An Algorithm of Traffic Perception of DDoS Attacks against SOA Based on Time United Conditional Entropy

    OpenAIRE

    Yuntao Zhao; Hengchi Liu; Yongxin Feng

    2016-01-01

    DDoS attacks can prevent legitimate users from accessing the service by consuming resource of the target nodes, whose availability of network and service is exposed to a significant threat. Therefore, DDoS traffic perception is the premise and foundation of the whole system security. In this paper the method of DDoS traffic perception for SOA network based on time united conditional entropy was proposed. According to many-to-one relationship mapping between the source IP address and destinati...

  9. Dual Management of Real-Time and Interactive Jobs in Smartphones

    National Research Council Canada - National Science Library

    LEE, Eunji; KIM, Youngsun; BAHN, Hyokyung

    2014-01-01

    .... To this end, high performance NVRAM is adopted as storage of real-time applications, and a dual purpose CPU scheduler, in which one core is exclusively used for real-time applications, is proposed...

  10. Noniterative Multireference Coupled Cluster Methods on Heterogeneous CPU-GPU Systems.

    Science.gov (United States)

    Bhaskaran-Nair, Kiran; Ma, Wenjing; Krishnamoorthy, Sriram; Villa, Oreste; van Dam, Hubertus J J; Aprà, Edoardo; Kowalski, Karol

    2013-04-09

    A novel parallel algorithm for noniterative multireference coupled cluster (MRCC) theories, which merges recently introduced reference-level parallelism (RLP) [Bhaskaran-Nair, K.; Brabec, J.; Aprà, E.; van Dam, H. J. J.; Pittner, J.; Kowalski, K. J. Chem. Phys.2012, 137, 094112] with the possibility of accelerating numerical calculations using graphics processing units (GPUs) is presented. We discuss the performance of this approach applied to the MRCCSD(T) method (iterative singles and doubles and perturbative triples), where the corrections due to triples are added to the diagonal elements of the MRCCSD effective Hamiltonian matrix. The performance of the combined RLP/GPU algorithm is illustrated on the example of the Brillouin-Wigner (BW) and Mukherjee (Mk) state-specific MRCCSD(T) formulations.

  11. Mixed precision numerical weather prediction on hybrid GPU-CPU supercomputers

    Science.gov (United States)

    Lapillonne, Xavier; Osuna, Carlos; Spoerri, Pascal; Osterried, Katherine; Charpilloz, Christophe; Fuhrer, Oliver

    2017-04-01

    A new version of the climate and weather model COSMO that runs faster on traditional high performance computing systems with CPUs as well as on heterogeneous architectures using graphics processing units (GPUs) has been developed. The model was additionally adapted to be able to run in "single precision" mode. After discussing the key changes introduced in this new model version and the tools used in the porting approach, we present 3 applications, namely the MeteoSwiss operational weather prediction system, COSMO-LEPS and the CALMO project, which already take advantage of the performance improvement, up to a factor of 4, by running on GPU systems and using the single-precision mode. We discuss how the code changes open new perspectives for scientific research and can enable researchers to get access to a new class of supercomputers.

  12. Productive Large Scale Personal Computing: Fast Multipole Methods on GPU/CPU Systems Project

    Data.gov (United States)

    National Aeronautics and Space Administration — To be used naturally in design optimization, parametric study and achieve quick total time-to-solution, simulation must naturally and personally be available to the...

  13. Harmonization of Bordetella pertussis real-time PCR diagnostics in the United States in 2012.

    Science.gov (United States)

    Williams, Margaret M; Taylor, Thomas H; Warshauer, David M; Martin, Monte D; Valley, Ann M; Tondella, M Lucia

    2015-01-01

    Real-time PCR (rt-PCR) is an important diagnostic tool for the identification of Bordetella pertussis, Bordetella holmesii, and Bordetella parapertussis. Most U.S. public health laboratories (USPHLs) target IS481, present in 218 to 238 copies in the B. pertussis genome and 32 to 65 copies in B. holmesii. The CDC developed a multitarget PCR assay to differentiate B. pertussis, B. holmesii, and B. parapertussis and provided protocols and training to 19 USPHLs. The 2012 performance exercise (PE) assessed the capability of USPHLs to detect these three Bordetella species in clinical samples. Laboratories were recruited by the Wisconsin State Proficiency Testing program through the Association of Public Health Laboratories, in partnership with the CDC. Spring and fall PE panels contained 12 samples each of viable Bordetella and non-Bordetella species in saline. Fifty and 53 USPHLs participated in the spring and fall PEs, respectively, using a variety of nucleic acid extraction methods, PCR platforms, and assays. Ninety-six percent and 94% of laboratories targeted IS481 in spring and fall, respectively, in either singleplex or multiplex assays. In spring and fall, respectively, 72% and 79% of USPHLs differentiated B. pertussis and B. holmesii and 68% and 72% identified B. parapertussis. IS481 cycle threshold (CT) values for B. pertussis samples had coefficients of variation (CV) ranging from 10% to 28%. Of the USPHLs that differentiated B. pertussis and B. holmesii, sensitivity was 96% and specificity was 95% for the combined panels. The 2012 PE demonstrated increased harmonization of rt-PCR Bordetella diagnostic protocols in USPHLs compared to that of the previous survey.

  14. Common Mental Disorders at the Time of Deportation: A Survey at the Mexico-United States Border.

    Science.gov (United States)

    Bojorquez, Ietza; Aguilera, Rosa M; Ramírez, Jacobo; Cerecero, Diego; Mejía, Silvia

    2015-12-01

    Deportations from the United States (US) to Mexico increased substantially during the last decade. Considering deportation as a stressful event with potential consequences on mental health, we aimed to (1) estimate the prevalence of common mental disorders (CMD) among deported migrants; and (2) explore the association between migratory experience, social support and psychological variables, and CMD in this group. In repatriation points along the border, a probability sample of deportees responded to the Self Reporting Questionnaire (SRQ). The prevalence of CMD was 16.0% (95% CI 12.3, 20.6). There was a U-shaped association between time in the US and SRQ score. Times returned to Mexico, having a spouse in the US, number of persons in household, less social support, anxiety as a personality trait, and avoidant coping style were directly associated with SRQ score. Public health policies should address the need for mental health care among deported migrants.

  15. Simulating Photon Mapping for Real-time Applications

    DEFF Research Database (Denmark)

    Larsen, Bent Dalgaard; Christensen, Niels Jørgen

    2004-01-01

    GPU accelerated final gathering method and the illumination is then stored in light maps. Caustic photons are traced on the CPU and then drawn using points in the framebuffer, and finally filtered using the GPU. Both diffuse and non-diffuse surfaces can be handled by calculating the direct...... illumination on the GPU and the photon tracing on the CPU. We achieve real-time frame rates for dynamic scenes....

  16. Influenza epidemics in Iceland over 9 decades: changes in timing and synchrony with the United States and Europe.

    Science.gov (United States)

    Weinberger, Daniel M; Krause, Tyra Grove; Mølbak, Kåre; Cliff, Andrew; Briem, Haraldur; Viboud, Cécile; Gottfredsson, Magnus

    2012-10-01

    Influenza epidemics exhibit a strongly seasonal pattern, with winter peaks that occur with similar timing across temperate areas of the Northern Hemisphere. This synchrony could be influenced by population movements, environmental factors, host immunity, and viral characteristics. The historical isolation of Iceland and subsequent increase in international contacts make it an ideal setting to study epidemic timing. The authors evaluated changes in the timing and regional synchrony of influenza epidemics using mortality and morbidity data from Iceland, North America, and Europe during the period from 1915 to 2007. Cross-correlations and wavelet analyses highlighted 2 major changes in influenza epidemic patterns in Iceland: first was a shift from nonseasonal epidemics prior to the 1930s to a regular winter-seasonal pattern, and second was a change in the early 1990s when a 1-month lag between Iceland and the United States and Europe was no longer detectable with monthly data. There was a moderate association between increased synchrony and the number of foreign visitors to Iceland, providing a plausible explanation for the second shift in epidemic timing. This suggests that transportation might have a minor effect on epidemic timing, but efforts to restrict air travel during influenza epidemics would likely have a limited impact, even for island populations.

  17. Genetic Algorithm Supported by Graphical Processing Unit Improves the Exploration of Effective Connectivity in Functional Brain Imaging

    Directory of Open Access Journals (Sweden)

    Lawrence Wing Chi Chan

    2015-05-01

    Full Text Available Brain regions of human subjects exhibit certain levels of associated activation upon specific environmental stimuli. Functional Magnetic Resonance Imaging (fMRI) detects regional signals, based on which we could infer the direct or indirect neuronal connectivity between the regions. Structural Equation Modeling (SEM) is an appropriate mathematical approach for analyzing the effective connectivity using fMRI data. A maximum likelihood (ML) discrepancy function is minimized against some constrained coefficients of a path model. The minimization is an iterative process. The computing time is very long as the number of iterations increases geometrically with the number of path coefficients. Using a regular Quad-Core Central Processing Unit (CPU) platform, duration up to three months is required for the iterations from 0 to 30 path coefficients. This study demonstrates the application of the Graphical Processing Unit (GPU) with the parallel Genetic Algorithm (GA) that replaces the Powell minimization in the standard program code of the analysis software package. It was found in the same example that GA under GPU reduced the duration to 20 hours and provided more accurate solution when compared with standard program code under CPU.
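
    A genetic algorithm minimizing a discrepancy function over path coefficients can be sketched in a few lines; the CPU-only toy below uses a quadratic stand-in for the ML discrepancy and makes no attempt to reproduce the paper's GPU parallelization.

```python
# Minimal CPU-side sketch of a genetic algorithm minimizing a discrepancy
# function over path coefficients. The quadratic objective is a placeholder
# (an assumption), not the SEM maximum-likelihood discrepancy itself.
import numpy as np

rng = np.random.default_rng(1)
target = rng.normal(size=8)                      # "true" path coefficients

def discrepancy(theta):
    return float(np.sum((theta - target) ** 2))  # stand-in for F_ML

def ga(n_pop=60, n_gen=200, sigma=0.3):
    pop = rng.normal(size=(n_pop, target.size))
    for _ in range(n_gen):
        fitness = np.array([discrepancy(ind) for ind in pop])
        parents = pop[np.argsort(fitness)[: n_pop // 2]]            # selection
        children = parents + sigma * rng.normal(size=parents.shape)  # mutation
        pop = np.vstack([parents, children])
    return pop[np.argmin([discrepancy(ind) for ind in pop])]

print(np.round(ga() - target, 2))  # residuals should be near zero
```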

  18. Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease

    NARCIS (Netherlands)

    D.P. Shamonin (Denis); E.E. Bron (Esther); B.P.F. Lelieveldt (Boudewijn); M. Smits (Marion); S.K. Klein (Stefan); M. Staring (Marius)

    2014-01-01

    Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e., for atlas-based segmentation or template construction. Faster image registration routines would therefore be

  19. Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease

    NARCIS (Netherlands)

    Shamonin, D.P.; Bron, E.E.; Lelieveldt, B.P.F.; Smits, M.; Klein, S.; Staring, M.

    2014-01-01

    Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e., for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial.

  20. Real-time 3D video conference on generic hardware

    Science.gov (United States)

    Desurmont, X.; Bruyelle, J. L.; Ruiz, D.; Meessen, J.; Macq, B.

    2007-02-01

    Nowadays, video-conferencing tends to be more and more advantageous because of the economic and ecological cost of transport. Several platforms exist. The goal of the TIFANIS immersive platform is to let users interact as if they were physically together. Unlike previous teleimmersion systems, TIFANIS uses generic hardware to achieve an economically realistic implementation. The basic functions of the system are to capture the scene, transmit it through digital networks to other partners, and then render it according to each partner's viewing characteristics. The image processing part should run in real-time. We propose to analyze the whole system. It can be split into different services like central processing unit (CPU), graphical rendering, direct memory access (DMA), and communications through the network. Most of the processing is done by the CPU. It is composed of the 3D reconstruction and the detection and tracking of faces from the video stream. However, the processing needs to be parallelized in several threads that have as few dependencies as possible. In this paper, we present these issues, and the way we deal with them.

  1. A novel synthetic oleanolic acid derivative (CPU-II2) attenuates liver fibrosis in mice through regulating the function of hepatic stellate cells.

    Science.gov (United States)

    Wu, Li-Mei; Wu, Xing-Xin; Sun, Yang; Kong, Xiang-Wen; Zhang, Yi-Hua; Xu, Qiang

    2008-03-01

    Regulation of the function of hepatic stellate cells (HSCs) is one of the proposed therapeutic approaches to liver fibrosis. In the present study, we examined the in vitro and in vivo effects of CPU-II2, a novel synthetic oleanolic acid (OLA) derivative with nitrate, on hepatic fibrosis. This compound alleviated CCl4-induced hepatic fibrosis in mice with a decrease in hepatic hydroxyproline (Hyp) content and histological changes. CPU-II2 also attenuated the mRNA expression of alpha-smooth muscle actin (alpha-SMA) and tissue inhibitor of metalloproteinase type 1 (TIMP-1) induced by CCl4 in mice and reduced both mRNA and protein levels of alpha-SMA in HSC-T6 cells. Interestingly, CPU-II2 did not affect the survival of HSC-T6 cells but decreased the expression of procollagen-alpha1 (I) in HSC-T6 cells through down-regulating the phosphorylation of p38 MAPK. CPU-II2 attenuates the development of liver fibrosis by regulating the function of HSCs through the p38 MAPK pathway rather than by damaging the stellate cells.

  2. Rapid gains in segmenting fluent speech when words match the rhythmic unit: evidence from infants acquiring syllable-timed languages

    Directory of Open Access Journals (Sweden)

    Laura eBosch

    2013-03-01

    Full Text Available The ability to extract word-forms from sentential contexts represents an initial step in infants’ process towards lexical acquisition. By age 6 months the ability is just emerging and evidence of it is restricted to certain testing conditions. Most research has been developed with infants acquiring stress-timed languages (English, but also German and Dutch) whose rhythmic unit is not the syllable. Data from infants acquiring syllable-timed languages are still scarce and limited to French (European and Canadian), partially revealing some discrepancies with English regarding the age at which word segmentation ability emerges. Research reported here aims at broadening this cross-linguistic perspective by presenting first data on the early ability to segment monosyllabic word-forms by infants acquiring Spanish and Catalan. Three different language groups (two monolingual and one bilingual) and two different age groups (8- and 6-month-old infants) were tested using natural language and a modified version of the HPP with familiarization to passages and testing on words. Results revealed positive evidence of word segmentation in all groups at both ages, but critically, the pattern of preference differed by age. A novelty preference was obtained in the older groups, while the expected familiarity preference was only found at the younger age tested, suggesting more advanced segmentation ability with an increase in age. These results offer first evidence of an early ability for monosyllabic word segmentation in infants acquiring syllable-timed languages such as Spanish or Catalan, not previously described in the literature. Data show no impact of bilingual exposure in the emergence of this ability and results suggest rapid gains in early segmentation for words that match the rhythm unit of the native language.

  3. Rapid gains in segmenting fluent speech when words match the rhythmic unit: evidence from infants acquiring syllable-timed languages.

    Science.gov (United States)

    Bosch, Laura; Figueras, Melània; Teixidó, Maria; Ramon-Casas, Marta

    2013-01-01

    The ability to extract word-forms from sentential contexts represents an initial step in infants' process toward lexical acquisition. By age 6 months the ability is just emerging and evidence of it is restricted to certain testing conditions. Most research has been developed with infants acquiring stress-timed languages (English, but also German and Dutch) whose rhythmic unit is not the syllable. Data from infants acquiring syllable-timed languages are still scarce and limited to French (European and Canadian), partially revealing some discrepancies with English regarding the age at which word segmentation ability emerges. Research reported here aims at broadening this cross-linguistic perspective by presenting first data on the early ability to segment monosyllabic word-forms by infants acquiring Spanish and Catalan. Three different language groups (two monolingual and one bilingual) and two different age groups (8- and 6-month-old infants) were tested using natural language and a modified version of the HPP with familiarization to passages and testing on words. Results revealed positive evidence of word segmentation in all groups at both ages, but critically, the pattern of preference differed by age. A novelty preference was obtained in the older groups, while the expected familiarity preference was only found at the younger age tested, suggesting more advanced segmentation ability with an increase in age. These results offer first evidence of an early ability for monosyllabic word segmentation in infants acquiring syllable-timed languages such as Spanish or Catalan, not previously described in the literature. Data show no impact of bilingual exposure in the emergence of this ability and results suggest rapid gains in early segmentation for words that match the rhythm unit of the native language.

  4. Massively Parallel Signal Processing using the Graphics Processing Unit for Real-Time Brain-Computer Interface Feature Extraction.

    Science.gov (United States)

    Wilson, J Adam; Williams, Justin C

    2009-01-01

    The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
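
    The first two stages of the signal chain, a spatial filter applied as a matrix-matrix multiply and a per-channel autoregressive power spectrum, can be sketched on the CPU as below; the common-average-reference filter, AR order and array sizes are assumptions for illustration.

```python
# Hedged CPU-side sketch of the two stages the paper offloads to the GPU:
# (1) a spatial filter as a matrix-matrix multiply, (2) a per-channel
# autoregressive (Yule-Walker) power spectrum estimate.
import numpy as np
from scipy.linalg import solve_toeplitz

def ar_psd(x, order=16, nfreq=128):
    """Yule-Walker AR fit of one channel, evaluated as a power spectrum."""
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:] / len(x)   # autocovariance
    a = solve_toeplitz(r[:order], r[1:order + 1])               # AR coefficients
    sigma2 = r[0] - a @ r[1:order + 1]                          # driving-noise variance
    w = np.linspace(0, np.pi, nfreq)
    denom = np.abs(1 - np.exp(-1j * np.outer(w, np.arange(1, order + 1))) @ a) ** 2
    return sigma2 / denom

channels, samples = 64, 250                  # e.g. 250 ms at 1 kHz (assumed)
data = np.random.randn(channels, samples)
spatial = np.eye(channels) - 1.0 / channels  # common-average-reference filter
filtered = spatial @ data                    # the matrix-matrix multiply
features = np.array([ar_psd(ch) for ch in filtered])
print(features.shape)                        # (channels, nfreq)
```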

  5. Rapid gains in segmenting fluent speech when words match the rhythmic unit: evidence from infants acquiring syllable-timed languages

    Science.gov (United States)

    Bosch, Laura; Figueras, Melània; Teixidó, Maria; Ramon-Casas, Marta

    2013-01-01

    The ability to extract word-forms from sentential contexts represents an initial step in infants' process toward lexical acquisition. By age 6 months the ability is just emerging and evidence of it is restricted to certain testing conditions. Most research has been developed with infants acquiring stress-timed languages (English, but also German and Dutch) whose rhythmic unit is not the syllable. Data from infants acquiring syllable-timed languages are still scarce and limited to French (European and Canadian), partially revealing some discrepancies with English regarding the age at which word segmentation ability emerges. Research reported here aims at broadening this cross-linguistic perspective by presenting first data on the early ability to segment monosyllabic word-forms by infants acquiring Spanish and Catalan. Three different language groups (two monolingual and one bilingual) and two different age groups (8- and 6-month-old infants) were tested using natural language and a modified version of the HPP with familiarization to passages and testing on words. Results revealed positive evidence of word segmentation in all groups at both ages, but critically, the pattern of preference differed by age. A novelty preference was obtained in the older groups, while the expected familiarity preference was only found at the younger age tested, suggesting more advanced segmentation ability with an increase in age. These results offer first evidence of an early ability for monosyllabic word segmentation in infants acquiring syllable-timed languages such as Spanish or Catalan, not previously described in the literature. Data show no impact of bilingual exposure in the emergence of this ability and results suggest rapid gains in early segmentation for words that match the rhythm unit of the native language. PMID:23467921

  6. Assessment of full-time faculty preceptors by colleges and schools of pharmacy in the United States and Puerto Rico.

    Science.gov (United States)

    Kirschenbaum, Harold L; Zerilli, Tina

    2012-10-12

    To identify the manner in which colleges and schools of pharmacy in the United States and Puerto Rico assess full-time faculty preceptors. Directors of pharmacy practice (or equivalent title) were invited to complete an online, self-administered questionnaire. Seventy of the 75 respondents (93.3%) confirmed that their college or school assessed full-time pharmacy faculty members based on activities related to precepting students at a practice site. The most commonly reported assessment components were summative student evaluations (98.5%), type of professional service provided (92.3%), scholarly accomplishments (86.2%), and community service (72.3%). Approximately 42% of respondents indicated that a letter of evaluation provided by a site-based supervisor was included in their assessment process. Some colleges and schools also conducted onsite assessment of faculty members. Most colleges and schools of pharmacy assess full-time faculty-member preceptors via summative student assessments, although other strategies are used. Given the important role of preceptors in ensuring students are prepared for pharmacy practice, colleges and schools of pharmacy should review their assessment strategies for full-time faculty preceptors, keeping in mind the methodologies used by other institutions.

  7. Multi-CPU plasma fluid turbulence calculations on a CRAY Y-MP C90

    Energy Technology Data Exchange (ETDEWEB)

    Lynch, V.E.; Carreras, B.A.; Leboeuf, J.N. [Oak Ridge National Lab., TN (United States); Curtis, B.C.; Troutman, R.L. [National Energy Research Supercomputer Center, Livermore, CA (United States)

    1993-06-01

    Significant improvements in real-time efficiency have been obtained for plasma fluid turbulence calculations by microtasking the nonlinear fluid code KITE in which they are implemented on the CRAY Y-MP C90 at the National Energy Research Supercomputer Center (NERSC). The number of processors accessed concurrently scales linearly with problem size. Close to six concurrent processors have so far been obtained with a three-dimensional nonlinear production calculation at the currently allowed memory size of 80 Mword. With a calculation size corresponding to the maximum allowed memory of 200 Mword in the next system configuration, we expect to be able to access close to nine processors of the C90 concurrently with a commensurate improvement in real-time efficiency. These improvements in performance are comparable to those expected from a massively parallel implementation of the same calculations on the Intel Paragon.

  8. Fast data preprocessing with Graphics Processing Units for inverse problem solving in light-scattering measurements

    Science.gov (United States)

    Derkachov, G.; Jakubczyk, T.; Jakubczyk, D.; Archer, J.; Woźniak, M.

    2017-07-01

    Utilising the Compute Unified Device Architecture (CUDA) platform for Graphics Processing Units (GPUs) enables a significant reduction of computation time at moderate cost by means of parallel computing. In the paper [Jakubczyk et al., Opto-Electron. Rev., 2016] we reported using a GPU for Mie scattering inverse problem solving (up to 800-fold speed-up). Here we report the development of two subroutines utilising the GPU at the data preprocessing stages of the inversion procedure: (i) a subroutine, based on ray tracing, for finding the spherical aberration correction function, and (ii) a subroutine performing the conversion of an image to a 1D distribution of light intensity versus azimuth angle (i.e. a scattering diagram), fed from a movie-reading CPU subroutine running in parallel. All subroutines are incorporated in the PikeReader application, which we make available in a GitHub repository. PikeReader returns a sequence of intensity distributions versus a common azimuth angle vector, corresponding to the recorded movie. We obtained an overall ∼400-fold speed-up of calculations at the data preprocessing stages using CUDA code running on the GPU compared with single-threaded MATLAB-only code running on the CPU.
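
    The image-to-scattering-diagram conversion described above is straightforward to prototype on the CPU before porting it to CUDA. The following minimal NumPy sketch is not the PikeReader implementation; the function name, the default image centre and the bin count are illustrative assumptions. It averages pixel intensities per azimuth bin around the image centre to produce a 1D scattering diagram for one frame.

        import numpy as np

        def azimuthal_profile(image, centre=None, n_bins=360):
            """Average pixel intensity per azimuth bin around the image centre."""
            h, w = image.shape
            cy, cx = centre if centre is not None else (h / 2.0, w / 2.0)
            yy, xx = np.mgrid[0:h, 0:w]
            azimuth = np.arctan2(yy - cy, xx - cx)                  # -pi..pi for every pixel
            bins = np.linspace(-np.pi, np.pi, n_bins + 1)
            idx = np.clip(np.digitize(azimuth.ravel(), bins) - 1, 0, n_bins - 1)
            sums = np.bincount(idx, weights=image.ravel(), minlength=n_bins)
            counts = np.bincount(idx, minlength=n_bins)
            profile = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
            angles = 0.5 * (bins[:-1] + bins[1:])                   # bin-centre azimuth angles
            return angles, profile

        if __name__ == "__main__":
            frame = np.random.default_rng(0).random((480, 640))     # stand-in for one movie frame
            angles, diagram = azimuthal_profile(frame)
            print(angles.shape, diagram.shape)                      # (360,) (360,)

    On a GPU, the per-pixel angle computation and the binned accumulation can be mapped onto one thread per pixel with atomic additions into the histogram, which is one natural way to parallelise this stage.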

  9. High-speed nonlinear finite element analysis for surgical simulation using graphics processing units.

    Science.gov (United States)

    Taylor, Z A; Cheng, M; Ourselin, S

    2008-05-01

    The use of biomechanical modelling, especially in conjunction with finite element analysis, has become common in many areas of medical image analysis and surgical simulation. Clinical employment of such techniques is hindered by conflicting requirements for high fidelity in the modelling approach, and fast solution speeds. We report the development of techniques for high-speed nonlinear finite element analysis for surgical simulation. We use a fully nonlinear total Lagrangian explicit finite element formulation which offers significant computational advantages for soft tissue simulation. However, the key contribution of the work is the presentation of a fast graphics processing unit (GPU) solution scheme for the finite element equations. To the best of our knowledge, this represents the first GPU implementation of a nonlinear finite element solver. We show that the present explicit finite element scheme is well suited to solution via highly parallel graphics hardware, and that even a midrange GPU allows significant solution speed gains (up to 16.8 x) compared with equivalent CPU implementations. For the models tested the scheme allows real-time solution of models with up to 16,000 tetrahedral elements. The use of GPUs for such purposes offers a cost-effective high-performance alternative to expensive multi-CPU machines, and may have important applications in medical image analysis and surgical simulation.

  10. Calculation of HELAS amplitudes for QCD processes using graphics processing unit (GPU)

    CERN Document Server

    Hagiwara, K; Okamura, N; Rainwater, D L; Stelzer, T

    2009-01-01

    We use a graphics processing unit (GPU) for fast calculations of helicity amplitudes of quark and gluon scattering processes in massless QCD. New HEGET (HELAS Evaluation with GPU Enhanced Technology) codes for gluon self-interactions are introduced, and a C++ program to convert the MadGraph-generated FORTRAN codes into HEGET codes in CUDA (a C-platform for general-purpose computing on GPU) is created. Because of the proliferation of the number of Feynman diagrams and the number of independent color amplitudes, the maximum number of final-state jets we can evaluate on a GPU is limited to 4 for pure gluon processes (gg → 4g), or 5 for processes with one or more quark lines such as qq̄ → 5g and qq → qq + 3g. Compared with the usual CPU-based programs, we obtain 60-100 times better performance on the GPU, except for 5-jet production processes and the gg → 4g processes, for which the GPU gain over the CPU is about 20.

  11. School Start Times for Middle School and High School Students - United States, 2011-12 School Year.

    Science.gov (United States)

    Wheaton, Anne G; Ferro, Gabrielle A; Croft, Janet B

    2015-08-07

    Adolescents who do not get enough sleep are more likely to be overweight; not engage in daily physical activity; suffer from depressive symptoms; engage in unhealthy risk behaviors such as drinking, smoking tobacco, and using illicit drugs; and perform poorly in school. However, insufficient sleep is common among high school students, with less than one third of U.S. high school students sleeping at least 8 hours on school nights. In a policy statement published in 2014, the American Academy of Pediatrics (AAP) urged middle and high schools to modify start times as a means to enable students to get adequate sleep and improve their health, safety, academic performance, and quality of life. AAP recommended that "middle and high schools should aim for a starting time of no earlier than 8:30 a.m." To assess state-specific distributions of public middle and high school start times and establish a pre-recommendation baseline, CDC and the U.S. Department of Education analyzed data from the 2011-12 Schools and Staffing Survey (SASS). Among an estimated 39,700 public middle, high, and combined schools in the United States, the average start time was 8:03 a.m. Overall, only 17.7% of these public schools started school at 8:30 a.m. or later. The percentage of schools with 8:30 a.m. or later start times varied greatly by state, ranging from 0% in Hawaii, Mississippi, and Wyoming to more than three quarters of schools in Alaska (76.8%) and North Dakota (78.5%). A school system start time policy of 8:30 a.m. or later provides teenage students the opportunity to achieve the 8.5-9.5 hours of sleep recommended by AAP and the 8-10 hours recommended by the National Sleep Foundation.

  12. The Scales of Time, Length, Mass, Energy, and Other Fundamental Physical Quantities in the Atomic World and the Use of Atomic Units in Quantum Mechanical Calculations

    Science.gov (United States)

    Teo, Boon K.; Li, Wai-Kee

    2011-01-01

    This article is divided into two parts. In the first part, the atomic unit (au) system is introduced and the scales of time, space (length), and speed, as well as those of mass and energy, in the atomic world are discussed. In the second part, the utility of atomic units in quantum mechanical and spectroscopic calculations is illustrated with…

  13. Fast, multi-channel real-time processing of signals with microsecond latency using graphics processing units.

    Science.gov (United States)

    Rath, N; Kato, S; Levesque, J P; Mauel, M E; Navratil, G A; Peng, Q

    2014-04-01

    Fast, digital signal processing (DSP) has many applications. Typical hardware options for performing DSP are field-programmable gate arrays (FPGAs), application-specific integrated DSP chips, or general purpose personal computer systems. This paper presents a novel DSP platform that has been developed for feedback control on the HBT-EP tokamak device. The system runs all signal processing exclusively on a Graphics Processing Unit (GPU) to achieve real-time performance with latencies below 8 μs. Signals are transferred into and out of the GPU using PCI Express peer-to-peer direct-memory-access transfers without involvement of the central processing unit or host memory. Tests were performed on the feedback control system of the HBT-EP tokamak using forty 16-bit floating point inputs and outputs each and a sampling rate of up to 250 kHz. Signals were digitized by a D-TACQ ACQ196 module, processing done on an NVIDIA GTX 580 GPU programmed in CUDA, and analog output was generated by D-TACQ AO32CPCI modules.

  14. Fast, multi-channel real-time processing of signals with microsecond latency using graphics processing units

    Science.gov (United States)

    Rath, N.; Kato, S.; Levesque, J. P.; Mauel, M. E.; Navratil, G. A.; Peng, Q.

    2014-04-01

    Fast, digital signal processing (DSP) has many applications. Typical hardware options for performing DSP are field-programmable gate arrays (FPGAs), application-specific integrated DSP chips, or general purpose personal computer systems. This paper presents a novel DSP platform that has been developed for feedback control on the HBT-EP tokamak device. The system runs all signal processing exclusively on a Graphics Processing Unit (GPU) to achieve real-time performance with latencies below 8 μs. Signals are transferred into and out of the GPU using PCI Express peer-to-peer direct-memory-access transfers without involvement of the central processing unit or host memory. Tests were performed on the feedback control system of the HBT-EP tokamak using forty 16-bit floating point inputs and outputs each and a sampling rate of up to 250 kHz. Signals were digitized by a D-TACQ ACQ196 module, processing done on an NVIDIA GTX 580 GPU programmed in CUDA, and analog output was generated by D-TACQ AO32CPCI modules.

  15. Study on finite deformation finite element analysis algorithm of turbine blade based on CPU+GPU heterogeneous parallel computation

    Directory of Open Access Journals (Sweden)

    Liu Tian-Yuan

    2016-01-01

    Full Text Available Blade is one of the core components of turbine machinery. The reliability of blade is directly related to the normal operation of plant unit. However, with the increase of blade length and flow rate, non-linear effects such as finite deformation must be considered in strength computation to guarantee enough accuracy. Parallel computation is adopted to improve the efficiency of classical nonlinear finite element method and shorten the blade design period. So it is of extraordinary importance for engineering practice. In this paper, the dynamic partial differential equations and the finite element method forms for turbine blades under centrifugal load and flow load are given firstly. Then, according to the characteristics of turbine blade model, the classical method is optimized based on central processing unit + graphics processing unit heterogeneous parallel computation. Finally, the numerical experiment validations are performed. The computation speed of the algorithm proposed in this paper is compared with the speed of ANSYS. For the rectangle plate model with mesh number of 10 k to 4000 k, a maximum speed-up of 4.31 can be obtained. For the real blade-rim model with mesh number of 500 k, the speed-up of 4.54 times can be obtained.

  16. Dynamic modelling of a 3-CPU parallel robot via screw theory

    Directory of Open Access Journals (Sweden)

    L. Carbonari

    2013-04-01

    Full Text Available The article describes the dynamic modelling of I.Ca.Ro., a novel Cartesian parallel robot recently designed and prototyped by the robotics research group of the Polytechnic University of Marche. By means of screw theory and virtual work principle, a computationally efficient model has been built, with the final aim of realising advanced model based controllers. Then a dynamic analysis has been performed in order to point out possible model simplifications that could lead to a more efficient run time implementation.

  17. Conceptual design of the X-IFU Instrument Control Unit on board the ESA Athena mission

    Science.gov (United States)

    Corcione, L.; Ligori, S.; Capobianco, V.; Bonino, D.; Valenziano, L.; Guizzo, G. P.

    2016-07-01

    Athena is one of the L-class missions selected in the ESA Cosmic Vision 2015-2025 program for the science theme of the Hot and Energetic Universe. The Athena model payload includes the X-ray Integral Field Unit (X-IFU), an advanced actively shielded X-ray microcalorimeter spectrometer for high-spectral-resolution imaging, utilizing cooled Transition Edge Sensors. This paper describes the preliminary architecture of the Instrument Control Unit (ICU), which is aimed at operating all of X-IFU's subsystems as well as at implementing the main functional interfaces of the instrument with the S/C control unit. The ICU functions include TC/TM management with the S/C, science data formatting and transmission to the S/C Mass Memory, housekeeping data handling, time distribution for synchronous operations, and the management of the X-IFU components (i.e. CryoCoolers, Filter Wheel, Detector Readout Electronics Event Processor, Power Distribution Unit). The baseline ICU implementation for the phase-A study foresees the use of standard, space-qualified components from the heritage of past and current space missions (e.g. Gaia, Euclid), currently encompassing a Leon2/Leon3-based CPU board and standard space-qualified interfaces for the exchange of commands and data between the ICU and the X-IFU subsystems. An alternative architecture, arranged around a more powerful PowerPC-based CPU, is also briefly presented, with the aim of endowing the system with enhanced hardware resources and processing capability for handling control and science data processing tasks not yet defined at this stage of the mission study.

  18. Flocking-based Document Clustering on the Graphics Processing Unit

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL; Patton, Robert M [ORNL; ST Charles, Jesse Lee [ORNL

    2008-01-01

    Analyzing and grouping documents by content is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds: each bird represents a single document and flies toward other documents that are similar to it. One limitation of this method of document clustering is its O(n^2) complexity; as the number of documents grows, it becomes increasingly difficult to obtain results in a reasonable amount of time. However, flocking behavior, along with most naturally inspired algorithms such as ant colony optimization and particle swarm optimization, is highly parallel and has shown increased performance on expensive cluster computers. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly parallel and semi-parallel problems much faster than the traditional sequential processor. Some applications see a huge increase in performance on this platform, and the cost of these high-performance devices is marginal compared with the price of cluster machines. In this paper, we have conducted research to exploit this architecture and apply its strengths to the document flocking problem. Our results highlight the potential benefit the GPU brings to all naturally inspired algorithms. Using the CUDA platform from NVIDIA, we developed a document flocking implementation to run on the NVIDIA GeForce 8800. Additionally, we developed a similar but sequential implementation of the same algorithm to run on a desktop CPU. We tested the performance of each on groups of news articles ranging in size from 200 to 3000 documents. The results were very significant: performance gains ranged from three to nearly five times improvement of the GPU over the CPU implementation. This dramatic improvement in runtime makes the GPU a potentially revolutionary platform for document clustering algorithms.
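
    A compact way to see why this method is O(n^2) per step, and why it parallelises so well, is the sequential update below. It is a toy CPU sketch of the flocking idea (each document-bird steers toward similar documents and away from dissimilar ones), not the ORNL CUDA code; all constants and the random similarity matrix are illustrative assumptions.

        import numpy as np

        def flocking_step(pos, vel, sim, attract=0.05, repel=0.05, max_speed=1.0):
            """One flocking update for n documents: pos, vel are (n, 2); sim is (n, n) in [0, 1]."""
            n = pos.shape[0]
            for i in range(n):                                  # O(n^2): every bird looks at every other bird
                diff = pos - pos[i]
                dist = np.linalg.norm(diff, axis=1) + 1e-9
                direction = diff / dist[:, None]
                # similar documents attract, dissimilar documents repel
                force = (attract * sim[i][:, None] - repel * (1.0 - sim[i])[:, None]) * direction
                force[i] = 0.0
                vel[i] += force.sum(axis=0)
                speed = np.linalg.norm(vel[i])
                if speed > max_speed:
                    vel[i] *= max_speed / speed
            return pos + vel, vel

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            n = 300
            pos, vel = rng.random((n, 2)) * 100.0, np.zeros((n, 2))
            sim = rng.random((n, n)); sim = (sim + sim.T) / 2.0  # stand-in for document similarity
            for _ in range(50):
                pos, vel = flocking_step(pos, vel, sim)

    On a GPU the outer loop becomes one thread (or block) per document, which is why this class of algorithm maps so well onto CUDA hardware.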

  19. The Hessenberg Reduction of a Matrix Using a CPU-GPU Hybrid System

    Institute of Scientific and Technical Information of China (English)

    沈聪; 曹婷; 宋金文; 高火涛

    2015-01-01

    The first step in solving the eigenvalue problem of a nonsymmetric matrix is to reduce the matrix to upper Hessenberg form. A concrete scheme is designed for performing this reduction on the GPU. For the CPU-GPU hybrid system, the work of the blocked Hessenberg reduction algorithm is split into several tasks; the computational complexity of each task in each loop is analyzed, and a more reasonable scheduling strategy is then presented. Numerical experiments show that the algorithm using the proposed hybrid scheduling obtains about 47% average performance improvement over the original port that uses only CUBLAS, and achieves a speedup of more than 7 times over the current CPU BLAS library.
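
    For reference, the computation being accelerated is the reduction of a general matrix to upper Hessenberg form by Householder similarity transformations. The unblocked NumPy sketch below shows the algorithm itself, not the paper's blocked CUBLAS/CUDA version; names are illustrative. The hybrid scheme splits the costly trailing-submatrix updates between CPU and GPU.

        import numpy as np

        def hessenberg(A):
            """Reduce a square matrix to upper Hessenberg form with Householder reflectors."""
            H = np.array(A, dtype=float)
            n = H.shape[0]
            for k in range(n - 2):
                x = H[k + 1:, k]
                v = x.copy()
                v[0] += np.copysign(np.linalg.norm(x), x[0])
                nv = np.linalg.norm(v)
                if nv == 0.0:
                    continue
                v /= nv
                # apply P = I - 2 v v^T from the left and the right (similarity transform)
                H[k + 1:, k:] -= 2.0 * np.outer(v, v @ H[k + 1:, k:])
                H[:, k + 1:] -= 2.0 * np.outer(H[:, k + 1:] @ v, v)
            return H

        if __name__ == "__main__":
            A = np.random.default_rng(0).random((6, 6))
            H = hessenberg(A)
            assert np.allclose(np.tril(H, -2), 0.0)            # zero below the first subdiagonal
            assert np.isclose(np.trace(H), np.trace(A))        # similarity preserves the trace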

  20. An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms

    Directory of Open Access Journals (Sweden)

    B. Jayashree

    2007-01-01

    Full Text Available The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs. In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT, the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level.

  1. An integrated pipeline of open source software adapted for multi-CPU architectures: use in the large-scale identification of single nucleotide polymorphisms.

    Science.gov (United States)

    Jayashree, B; Hanspal, Manindra S; Srinivasan, Rajgopal; Vigneshwaran, R; Varshney, Rajeev K; Spurthi, N; Eshwar, K; Ramesh, N; Chandra, S; Hoisington, David A

    2007-01-01

    The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level.

  2. Summary of the Second Workshop on Liquid Argon Time Projection Chamber Research and Development in the United States

    Energy Technology Data Exchange (ETDEWEB)

    Acciarri, R. [Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States); et al.

    2015-04-21

    The second workshop to discuss the development of liquid argon time projection chambers (LArTPCs) in the United States was held at Fermilab on July 8-9, 2014. The workshop was organized under the auspices of the Coordinating Panel for Advanced Detectors, a body that was initiated by the American Physical Society Division of Particles and Fields. All presentations at the workshop were made in six topical plenary sessions: i) Argon Purity and Cryogenics, ii) TPC and High Voltage, iii) Electronics, Data Acquisition and Triggering, iv) Scintillation Light Detection, v) Calibration and Test Beams, and vi) Software. This document summarizes the current efforts in each of these areas. It primarily focuses on the work in the US, but also highlights work done elsewhere in the world.

  3. Park availability and physical activity, TV time, and overweight and obesity among women: Findings from Australia and the United States.

    Science.gov (United States)

    Veitch, Jenny; Abbott, Gavin; Kaczynski, Andrew T; Wilhelm Stanis, Sonja A; Besenyi, Gina M; Lamb, Karen E

    2016-03-01

    This study examined relationships between three measures of park availability and self-reported physical activity (PA), television viewing (TV) time, and overweight/obesity among women from Australia and the United States. Having more parks near home was the only measure of park availability associated with an outcome. Australian women (n=1848) with more parks near home had higher odds of meeting PA recommendations and lower odds of being overweight/obese. In the US sample (n=489), women with more parks near home had lower odds of watching >4h TV per day. A greater number of parks near home was associated with lower BMI among both Australian and US women. Evidence across diverse contexts provides support to improve park availability to promote PA and other health behaviors.

  4. Phenological Classification of the United States: A Geographic Framework for Extending Multi-Sensor Time-Series Data

    Directory of Open Access Journals (Sweden)

    Willem J. D. van Leeuwen

    2010-02-01

    Full Text Available This study introduces a new geographic framework, phenological classification, for the conterminous United States based on Moderate Resolution Imaging Spectroradiometer (MODIS Normalized Difference Vegetation Index (NDVI time-series data and a digital elevation model. The resulting pheno-class map is comprised of 40 pheno-classes, each having unique phenological and topographic characteristics. Cross-comparison of the pheno-classes with the 2001 National Land Cover Database indicates that the new map contains additional phenological and climate information. The pheno-class framework may be a suitable basis for the development of an Advanced Very High Resolution Radiometer (AVHRR-MODIS NDVI translation algorithm and for various biogeographic studies.

  5. Summary of the Second Workshop on Liquid Argon Time Projection Chamber Research and Development in the United States

    CERN Document Server

    Acciarri, R; Artrip, D; Baller, B; Bromberg, C; Cavanna, F; Carls, B; Chen, H; Deptuch, G; Epprecht, L; Dharmapalan, R; Foreman, W; Hahn, A; Johnson, M; Jones, B J P; Junk, T; Lang, K; Lockwitz, S; Marchionni, A; Mauger, C; Montanari, C; Mufson, S; Nessi, M; Back, H Olling; Petrillo, G; Pordes, S; Raaf, J; Rebel, B; Sinins, G; Soderberg, M; Spooner, N J C; Stancari, M; Strauss, T; Terao, K; Thorn, C; Tope, T; Toups, M; Urheim, J; Van de Water, R; Wang, H; Wasserman, R; Weber, M; Whittington, D; Yang, T

    2015-01-01

    The second workshop to discuss the development of liquid argon time projection chambers (LArTPCs) in the United States was held at Fermilab on July 8-9, 2014. The workshop was organized under the auspices of the Coordinating Panel for Advanced Detectors, a body that was initiated by the American Physical Society Division of Particles and Fields. All presentations at the workshop were made in six topical plenary sessions: i) Argon Purity and Cryogenics, ii) TPC and High Voltage, iii) Electronics, Data Acquisition and Triggering, iv) Scintillation Light Detection, v) Calibration and Test Beams, and vi) Software. This document summarizes the current efforts in each of these areas. It primarily focuses on the work in the US, but also highlights work done elsewhere in the world.

  6. Comparison of open and closed suction on safety, efficacy and nursing time in a paediatric intensive care unit.

    Science.gov (United States)

    Evans, Janine; Syddall, Sophie; Butt, Warwick; Kinney, Sharon

    2014-05-01

    Endotracheal suctioning (ETS) is one of the most common procedures performed in paediatric intensive care. The two methods of endotracheal suctioning used are known as open and closed suction, but neither method has been shown to be superior in the Paediatric Intensive Care Unit (PICU). The primary purpose was to compare open and closed suction methods from a physiological, safety and staff resource perspective. All paediatric intensive care patients with an endotracheal tube were included. Between June and September 2011, alternate months were nominated as open or closed suction months. Data were prospectively collected including suction events, staff involved, time taken, use of saline, and change from pre-suction baseline in heart rate (HR), mean arterial pressure (MAP) and oxygen saturation (SpO2). Blocked or dislodged ETTs were recorded as adverse events. Closed suction was performed significantly more often per day (7.2 vs 6.0), with further significant differences per suction event (5 vs 3) and between the suction groups (18% vs 40%). Open suction demonstrated a greater reduction in SpO2 and nearly three times the incidence of increases in HR and MAP compared to closed suction. Reductions in MAP or HR were comparable across the two methods. In conclusion, closed suction could be performed with less staffing time and fewer nurses, less physiological disturbance to patients, and no significant increase in adverse events. Copyright © 2014. Published by Elsevier Ltd.

  7. Effect of nocturnal sound reduction on the incidence of delirium in intensive care unit patients: An interrupted time series analysis.

    Science.gov (United States)

    van de Pol, Ineke; van Iterson, Mat; Maaskant, Jolanda

    2017-08-01

    Delirium in critically-ill patients is a common multifactorial disorder that is associated with various negative outcomes. It is assumed that sleep disturbances can result in an increased risk of delirium. This study hypothesized that implementing a protocol that reduces overall nocturnal sound levels improves quality of sleep and reduces the incidence of delirium in Intensive Care Unit (ICU) patients. This interrupted time series study was performed in an adult mixed medical and surgical 24-bed ICU. A pre-intervention group of 211 patients was compared with a post-intervention group of 210 patients after implementation of a nocturnal sound-reduction protocol. Primary outcome measures were incidence of delirium, measured by the Intensive Care Delirium Screening Checklist (ICDSC), and quality of sleep, measured by the Richards-Campbell Sleep Questionnaire (RCSQ). Secondary outcome measures were use of sleep-inducing medication, delirium treatment medication, and patient-perceived nocturnal noise. A significant difference in slope in the percentage of delirium was observed between the pre- and post-intervention periods (-3.7% per time period, p=0.02). Quality of sleep was unaffected (0.3 per time period, p=0.85). The post-intervention group used significantly less sleep-inducing medication. The incidence of delirium in ICU patients was significantly reduced after implementation of a nocturnal sound-reduction protocol; however, reported sleep quality did not improve. Copyright © 2017. Published by Elsevier Ltd.

  8. Reducing patient wait times and improving resource utilization at British Columbia Cancer Agency's ambulatory care unit through simulation.

    Science.gov (United States)

    Santibáñez, Pablo; Chow, Vincent S; French, John; Puterman, Martin L; Tyldesley, Scott

    2009-12-01

    We consider an ambulatory care unit (ACU) in a large cancer centre, where operational and resource utilization challenges led to overcrowding, excessive delays, and concerns regarding safety of critical patient care duties. We use simulation to analyze the simultaneous impact of operations, scheduling, and resource allocation on patient wait time, clinic overtime, and resource utilization. The impact of these factors has been studied before, but usually in isolation. Further, our model considers multiple clinics operating concurrently, and includes the extra burden of training residents and medical students during patient consults. Through scenario analyses we found that the best outcomes were obtained when not one but multiple changes were implemented simultaneously. We developed configurations that achieve a reduction of up to 70% in patient wait times and 25% in physical space requirements, with the same appointment volume. The key findings of the study are the importance of on time clinic start, the need for improved patient scheduling; and the potential improvements from allocating examination rooms flexibly and dynamically among individual clinics within each of the oncology programs. These findings are currently being evaluated for implementation by senior management.

  9. Real-time photoacoustic and ultrasound dual-modality imaging system facilitated with graphics processing unit and code parallel optimization.

    Science.gov (United States)

    Yuan, Jie; Xu, Guan; Yu, Yao; Zhou, Yu; Carson, Paul L; Wang, Xueding; Liu, Xiaojun

    2013-08-01

    Photoacoustic tomography (PAT) offers structural and functional imaging of living biological tissue with highly sensitive optical absorption contrast and excellent spatial resolution comparable to medical ultrasound (US) imaging. We report the development of a fully integrated PAT and US dual-modality imaging system, which performs signal scanning, image reconstruction, and display for both photoacoustic (PA) and US imaging all in a truly real-time manner. The back-projection (BP) algorithm for PA image reconstruction is optimized to reduce the computational cost and facilitate parallel computation on a state of the art graphics processing unit (GPU) card. For the first time, PAT and US imaging of the same object can be conducted simultaneously and continuously, at a real-time frame rate, presently limited by the laser repetition rate of 10 Hz. Noninvasive PAT and US imaging of human peripheral joints in vivo were achieved, demonstrating the satisfactory image quality realized with this system. Another experiment, simultaneous PAT and US imaging of contrast agent flowing through an artificial vessel, was conducted to verify the performance of this system for imaging fast biological events. The GPU-based image reconstruction software code for this dual-modality system is open source and available for download from http://sourceforge.net/projects/patrealtime.
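
    The back-projection (BP) step that dominates the reconstruction cost is essentially a delay-and-sum: every image pixel accumulates, from every detector element, the sample whose acoustic time of flight matches the pixel-to-element distance. The plain NumPy sketch below illustrates that idea only; the geometry, sampling rate and variable names are assumptions, and it is not the GPU code described above.

        import numpy as np

        def backproject(rf, elem_xy, pix_xy, fs, c=1540.0):
            """rf: (n_elem, n_samples) PA signals; positions in metres; fs in Hz; c in m/s."""
            n_elem, n_samples = rf.shape
            image = np.zeros(pix_xy.shape[0])
            for e in range(n_elem):
                dist = np.linalg.norm(pix_xy - elem_xy[e], axis=1)   # pixel-to-element distance
                idx = np.round(dist / c * fs).astype(int)            # time-of-flight sample index
                valid = idx < n_samples
                image[valid] += rf[e, idx[valid]]
            return image

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            elems = np.stack([np.linspace(-0.01, 0.01, 64), np.zeros(64)], axis=1)
            xs, zs = np.meshgrid(np.linspace(-0.01, 0.01, 100), np.linspace(0.005, 0.03, 100))
            pixels = np.stack([xs.ravel(), zs.ravel()], axis=1)
            img = backproject(rng.standard_normal((64, 2048)), elems, pixels, fs=40e6).reshape(100, 100)

    Because every pixel is independent, the pixel loop maps directly onto one GPU thread per pixel, which is what makes a real-time frame rate achievable.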

  10. Real-time geo-registration of imagery using COTS graphics processors

    Science.gov (United States)

    Flath, Laurence M.; Kartz, Michael W.

    2009-06-30

    A method of performing real-time geo-registration of high-resolution digital imagery using existing graphics processing units (GPUs) already found in current personal computers, rather than the main central processing unit (CPU). Digital image data captured by a camera (along with inertial navigation system (INS) data associated with the image data) is transferred to and processed by the GPU to perform the calculations involved in transforming the captured image into a geo-rectified, nadir-looking image. By using the GPU, the order-of-magnitude increase in throughput over conventional software techniques makes real-time geo-registration possible without the significant cost of custom hardware solutions.

  11. Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems.

    Science.gov (United States)

    Wang, Kaibo; Huai, Yin; Lee, Rubao; Wang, Fusheng; Zhang, Xiaodong; Saltz, Joel H

    2012-07-01

    As an important application of spatial databases in pathology imaging analysis, cross-comparing the spatial boundaries of a huge amount of segmented micro-anatomic objects demands extremely data- and compute-intensive operations, requiring high throughput at an affordable cost. However, the performance of spatial database systems has not been satisfactory since their implementations of spatial operations cannot fully utilize the power of modern parallel hardware. In this paper, we provide a customized software solution that exploits GPUs and multi-core CPUs to accelerate spatial cross-comparison in a cost-effective way. Our solution consists of an efficient GPU algorithm and a pipelined system framework with task migration support. Extensive experiments with real-world data sets demonstrate the effectiveness of our solution, which improves the performance of spatial cross-comparison by over 18 times compared with a parallelized spatial database approach.

  12. Procedural generation and real-time rendering of a marine ecosystem

    Institute of Scientific and Technical Information of China (English)

    Rong LI; Xin DING; Jun-hao YU; Tian-yi GAO; Wen-ting ZHENG; Rui WANG; Hu-jun BAO

    2014-01-01

    Underwater scene is one of the most marvelous environments in the world. In this study, we present an efficient procedural modeling and rendering system to generate marine ecosystems for swim-through graphic applications. To produce realistic and natural underwater scenes, several techniques and algorithms have been presented and introduced. First, to distribute sealife naturally on a seabed, we employ an ecosystem simulation that considers the influence of the underwater environment. Second, we propose a two-level procedural modeling system to generate sealife with unique biological features. At the base level, a series of grammars are designed to roughly represent underwater sealife on a central processing unit (CPU). Then at the fine level, additional details of the sealife are created and rendered using graphic processing units (GPUs). Such a hybrid CPU-GPU framework best adopts sequential and parallel computation in modeling a marine ecosystem, and achieves a high level of performance. Third, the proposed system integrates dynamic simulations in the proposed procedural modeling process to support dynamic interactions between sealife and the underwater environment, where interactions and physical factors of the environment are formulated into parameters and control the geometric generation at the fine level. Results demonstrate that this system is capable of generating and rendering scenes with massive corals and sealife in real time.

  13. Adaptation of a Multi-Block Structured Solver for Effective Use in a Hybrid CPU/GPU Massively Parallel Environment

    Science.gov (United States)

    Gutzwiller, David; Gontier, Mathieu; Demeulenaere, Alain

    2014-11-01

    Multi-block structured solvers hold many advantages over their unstructured counterparts, such as a smaller memory footprint and efficient serial performance. Historically, multi-block structured solvers have not been easily adapted for use in a High Performance Computing (HPC) environment, and the recent trend towards hybrid GPU/CPU architectures has further complicated the situation. This paper will elaborate on developments and innovations applied to the NUMECA FINE/Turbo solver that have allowed near-linear scalability with real-world problems on over 250 hybrid CPU/GPU cluster nodes. Discussion will focus on the implementation of virtual partitioning and load balancing algorithms using a novel meta-block concept. This implementation is transparent to the user, allowing all pre- and post-processing steps to be performed using a simple, unpartitioned grid topology. Additional discussion will elaborate on developments that have improved parallel performance, including fully parallel I/O with the ADIOS API and the GPU porting of the computationally heavy CPUBooster convergence acceleration module.

  14. Accelerating the SCE-UA Global Optimization Method Based on Multi-Core CPU and Many-Core GPU

    Directory of Open Access Journals (Sweden)

    Guangyuan Kan

    2016-01-01

    The well-known SCE-UA global optimization method, widely used in the field of environmental model parameter calibration, is effective and robust. However, the SCE-UA method has a high computational load, which prohibits its application to high-dimensional and complex problems. In recent years, computer hardware such as multi-core CPUs and many-core GPUs has improved significantly. This much more powerful hardware and its software ecosystems provide an opportunity to accelerate the SCE-UA method. In this paper, we propose two parallel SCE-UA methods and implement them on an Intel multi-core CPU and an NVIDIA many-core GPU using OpenMP and CUDA Fortran, respectively. The Griewank benchmark function is adopted to test and compare the performance of the serial and parallel SCE-UA methods. Based on the comparison, some useful advice is given on how to properly use the parallel SCE-UA methods.
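
    The expensive part that both parallel variants target is the repeated evaluation of the objective function over many sampled parameter sets. A minimal CPU analogue of that idea, using the Griewank benchmark mentioned above and Python multiprocessing instead of OpenMP or CUDA Fortran, is sketched below; the worker count and population size are illustrative assumptions.

        import numpy as np
        from multiprocessing import Pool

        def griewank(x):
            """Griewank benchmark: 1 + sum(x_i^2)/4000 - prod(cos(x_i / sqrt(i)))."""
            x = np.asarray(x, dtype=float)
            i = np.arange(1, x.size + 1)
            return 1.0 + np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))

        def evaluate_population(pop, workers=4):
            """Evaluate every candidate parameter set in parallel; pop is (n_points, n_dims)."""
            with Pool(processes=workers) as pool:
                return np.array(pool.map(griewank, list(pop)))

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            population = rng.uniform(-600.0, 600.0, size=(1024, 30))   # SCE-UA-style sample points
            costs = evaluate_population(population)
            print(costs.min())

    Since the complexes in SCE-UA evolve independently between shuffling steps, the same pattern extends to evolving whole complexes in parallel, which is the kind of structure parallel SCE-UA implementations typically exploit.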

  15. High temperature transformations of waste printed circuit boards from computer monitor and CPU: Characterisation of residues and kinetic studies.

    Science.gov (United States)

    Rajagopal, Raghu Raman; Rajarao, Ravindra; Sahajwalla, Veena

    2016-11-01

    This paper investigates the high-temperature transformation, specifically the kinetic behaviour, of waste printed circuit boards (WPCBs) derived from computer monitors (single-sided/SSWPCB) and computer processing boards - CPU (multi-layered/MLWPCB) using Thermo-Gravimetric Analyser (TGA) and Vertical Thermo-Gravimetric Analyser (VTGA) techniques under a nitrogen atmosphere. Furthermore, the resulting WPCB residues were characterised using X-ray Fluorescence spectrometry (XRF), a Carbon Analyser, X-ray Photoelectron Spectroscopy (XPS) and Scanning Electron Microscopy (SEM). To analyse the material degradation of WPCB, TGA was performed from 40°C to 700°C at heating rates of 10°C, 20°C and 30°C per minute, and VTGA at 700°C, 900°C and 1100°C. The data obtained were analysed on the basis of first-order reaction kinetics. The experiments show a substantial difference between SSWPCB and MLWPCB in their decomposition levels, kinetic behaviour and structural properties. The calculated activation energy (EA) of SSWPCB is found to be lower than that of MLWPCB. Elemental analysis shows that SSWPCB has a high carbon content, whereas MLWPCB is ceramic-rich; these differences in material properties have a significant influence on the kinetics and the physicochemical properties. These high-temperature transformation studies and associated analytical investigations provide a fundamental understanding of different WPCBs and their major variations. Copyright © 2015 Elsevier Ltd. All rights reserved.
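
    The activation-energy comparison rests on first-order (Arrhenius) kinetics: with rate constants k determined at several temperatures, EA follows from a linear fit of ln k against 1/T. The sketch below shows only that fitting step; the rate constants are made-up illustrative numbers, not the paper's data.

        import numpy as np

        R = 8.314  # gas constant, J/(mol K)

        def activation_energy(T_kelvin, k):
            """Fit ln k = ln A - EA/(R T); return EA in kJ/mol and the pre-exponential factor A."""
            slope, intercept = np.polyfit(1.0 / np.asarray(T_kelvin, dtype=float),
                                          np.log(np.asarray(k, dtype=float)), 1)
            return -slope * R / 1000.0, np.exp(intercept)

        if __name__ == "__main__":
            T = [973.15, 1173.15, 1373.15]     # 700, 900 and 1100 degC, as in the VTGA runs
            k = [2.1e-3, 9.5e-3, 2.8e-2]       # illustrative first-order rate constants, 1/s
            EA, A = activation_energy(T, k)
            print(f"EA ~ {EA:.0f} kJ/mol, A ~ {A:.2e} 1/s")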

  16. Parallelizing ATLAS Reconstruction and Simulation: Issues and Optimization Solutions for Scaling on Multi- and Many-CPU Platforms

    Science.gov (United States)

    Leggett, C.; Binet, S.; Jackson, K.; Levinthal, D.; Tatarkhanov, M.; Yao, Y.

    2011-12-01

    Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance, to producing chip designs with multi- and many-core architectures. Further, the cores themselves can run multiple threads as a zero-overhead context switch, allowing low-level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non-uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the ATLAS event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores by means of event-based parallelism and final-stage I/O synchronization. However, initial studies on 8- and 16-core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware-based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level, and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which, due to its size, places huge burdens on the memory infrastructure of today's processors.

  17. Multidisciplinary Simulation Acceleration using Multiple Shared-Memory Graphical Processing Units

    Science.gov (United States)

    Kemal, Jonathan Yashar

    For purposes of optimizing and analyzing turbomachinery and other designs, the unsteady Favre-averaged flow-field differential equations for an ideal compressible gas can be solved in conjunction with the heat conduction equation. We solve all equations using the finite-volume multiple-grid numerical technique, with the dual time-step scheme used for unsteady simulations. Our numerical solver code targets CUDA-capable Graphical Processing Units (GPUs) produced by NVIDIA. Making use of MPI, our solver can run across networked compute nodes, where each MPI process can use either a GPU or a Central Processing Unit (CPU) core for primary solver calculations. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture, and compare our resulting performance against Intel Xeon X5690 CPUs. Solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. We used a conjugate cylinder computational grid and ran a turbulent steady flow simulation using 4 increasingly dense computational grids. Our densest computational grid is divided into 13 blocks each containing 1033x1033 grid points, for a total of 13.87 million grid points or 1.07 million grid points per domain block. To obtain overall speedups, we compare the execution time of the solver's iteration loop, including all resource-intensive GPU-related memory copies. Comparing the performance of 8 GPUs to that of 8 CPUs, we obtain an overall speedup of about 6.0 when using our densest computational grid. This amounts to an 8-GPU simulation running about 39.5 times faster than a single-CPU simulation.

  18. The Effect of Increasing Meeting Time on the Physiological Indices of Patients Admitted to the Intensive Care Unit

    Directory of Open Access Journals (Sweden)

    Mahmoudi

    2016-04-01

    Background: Most hospitals have restricted visitation times in intensive care units (ICUs) for various reasons, despite the advantages of family presence and the positive effect of emotional touch, talking and smiling on nervous system stimulation and the vital signs of patients. Objectives: The present study aimed to determine the effect of increased visitation time on the physiological indices of patients hospitalized in ICUs. Materials and Methods: This clinical trial was conducted in the ICUs of Vali-e-Asr hospital in Arak city, Iran. A total of 60 subjects were randomly assigned to the intervention and control groups, with visitation times of 10 minutes 3 times a day and 10 minutes once a day, respectively. The patients' physiological indices were measured before, during, and 10 and 30 minutes after the hospital visiting hours. Data were analyzed using SPSS version 20. Results: There were no statistically significant differences among mean values of any physiological index measured before, during, and 10 and 30 minutes after the visitation times in the control group (P > 0.05). In the intervention group, by contrast, reductions were observed 30 minutes after visitation in systolic blood pressure (SBP) at 9:00 AM (previous mean: 126.9, 30 minutes later: 111.9), 12:00 PM (126.9 to 114.9) and 3:00 PM (125.2 to 105.8); in diastolic blood pressure (DBP) at 9:00 AM (87.4 to 83.2), 12:00 PM (86.6 to 81.7) and 3:00 PM (87.1 to 85.0); in heart rate (HR) at 9:00 AM (90 to 78.4), 12:00 PM (89.8 to 78.6) and 3:00 PM (89.3 to 78.3); and in respiratory rate (RR) at 9:00 AM (20.9 to 15.0), 12:00 PM (20.6 to 15.4) and 3:00 PM (21.0 to 15)

  19. Time course of changes in passive properties of the gastrocnemius muscle-tendon unit during 5 min of static stretching.

    Science.gov (United States)

    Nakamura, Masatoshi; Ikezoe, Tome; Takeno, Yohei; Ichihashi, Noriaki

    2013-06-01

    The minimum time required for Static stretching (SS) to change the passive properties of the muscle-tendon unit (MTU), as well as the association between these passive properties, remains unclear. This study investigated the time course of changes in the passive properties of gastrocnemius MTU during 5 min of SS. The subjects comprised 20 healthy males (22.0 ± 1.8 years). Passive torque as an index of MTU resistance and myotendinous junction (MTJ) displacement as an index of muscle extensibility were assessed using ultrasonography and dynamometer during 5 min of SS. Significant differences before and every 1 min during SS were determined using Scheffé's post hoc test. Relationships between passive torque and MTJ displacement for each subject were determined using Pearson's product-moment correlation coefficient. Although gradual changes in both passive torque and MTJ displacement were demonstrated over every minute, these changes became statistically significant after 2, 3, 4, and 5 min of SS compared with the values before SS. In addition, passive torque after 5 min SS was significantly lower than that after 2 min SS. Similarly, MTJ displacement after 5 min SS was significantly higher than that after 2 min SS. A strong correlation was observed between passive torque and MTJ displacement for each subject (r = -0.886 to -0.991). These results suggest that SS for more than 2 min effectively increases muscle extensibility, which in turn decreases MTU resistance. Copyright © 2012 Elsevier Ltd. All rights reserved.

  20. The meaning of postponed motherhood for women in the United States and Sweden: aspects of feminism and radical timing strategies.

    Science.gov (United States)

    Welles-Nyström, B

    1997-01-01

    This exploratory study considered certain psychosocial, medical, and cultural aspects of the phenomenon of postponed motherhood for one cohort of white women born between 1947 and 1953 in Sweden and the United States. A cross-cultural comparison was made of the experience of pregnancy and the early perinatal period in 15 American and 16 Swedish women to find out (a) whether timing decisions reflected the influence of feminist ideology toward a reproductive strategy radically different from the conventional one, and (b) whether the pattern of delayed motherhood was culture specific. Results indicated that the patterns of delayed motherhood were culture specific. Feminist ideology clearly influenced the timing of the American women's first birth but was not evident in Sweden. Women in the U.S. exhibited more nonconventional behaviors and attitudes, whereas Swedish women were more conventional. However, the husbands in both groups were remarkably similar in infant caretaking behaviors, regardless of culture and level of educational attainment. These findings indicate that postponed motherhood has different meanings in the cultural contexts of these two Western industrialized societies.

  1. Advanced Investigation and Comparative Study of Graphics Processing Unit-queries Countered

    Directory of Open Access Journals (Sweden)

    A. Baskar

    2014-10-01

    The GPU, or Graphics Processing Unit, is the buzzword ruling the market these days. What it is and how it has gained such importance are the questions addressed in this research work. The study is constructed around answering the following questions: What is a GPU? How is it different from a CPU? How good or bad is it computationally compared with a CPU? Can the GPU replace the CPU, or is that a daydream? How significant is the arrival of the APU (Accelerated Processing Unit) in the market? What tools are needed to make a GPU work? What are the improvement and focus areas for the GPU to stand in the market? All of the above questions are discussed and answered in this study with relevant explanations.

  2. Application of the graphics processing unit in general-purpose computation

    Institute of Scientific and Technical Information of China (English)

    张健; 陈瑞

    2009-01-01

    Based on the Compute Unified Device Architecture (CUDA) of the graphics processing unit (GPU), the principles and methods of using the GPU for general-purpose computation are described. A matrix multiplication experiment was carried out on a GeForce 8800 GT. The results show that, as the matrix order increases, processing slows down on both the GPU and the CPU; however, after the data size was increased 100-fold, the computation time on the GPU grew by only a factor of 3.95, whereas on the CPU it grew by a factor of 216.66.
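
    The measurement itself is easy to reproduce on current hardware. The sketch below times an n-by-n matrix multiplication on the CPU with NumPy and, assuming the optional CuPy package and a CUDA device are available, on the GPU; it illustrates the methodology only and is not the original GeForce 8800 GT experiment.

        import time
        import numpy as np

        def time_cpu_matmul(n, repeats=3):
            a = np.random.random((n, n)).astype(np.float32)
            b = np.random.random((n, n)).astype(np.float32)
            start = time.perf_counter()
            for _ in range(repeats):
                _ = a @ b
            return (time.perf_counter() - start) / repeats

        def time_gpu_matmul(n, repeats=3):
            import cupy as cp                      # assumption: CuPy is installed and a GPU is present
            a = cp.random.random((n, n), dtype=cp.float32)
            b = cp.random.random((n, n), dtype=cp.float32)
            _ = a @ b
            cp.cuda.Device().synchronize()         # warm-up, then time the synchronized runs
            start = time.perf_counter()
            for _ in range(repeats):
                _ = a @ b
            cp.cuda.Device().synchronize()         # GPU launches are asynchronous; wait before stopping the clock
            return (time.perf_counter() - start) / repeats

        if __name__ == "__main__":
            for n in (256, 1024, 4096):
                print(n, f"CPU {time_cpu_matmul(n):.4f} s")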

  3. Wound Botulism in Injection Drug Users: Time to Antitoxin Correlates with Intensive Care Unit Length of Stay

    Directory of Open Access Journals (Sweden)

    Offerman, Steven R

    2009-11-01

    Objectives: We sought to identify factors associated with the need for mechanical ventilation (MV), length of intensive care unit (ICU) stay, length of hospital stay, and poor outcome in injection drug users (IDUs) with wound botulism (WB). Methods: This is a retrospective review of WB patients admitted between 1991-2005. IDUs were included if they had symptoms of WB and diagnostic confirmation. Primary outcome variables were the need for MV, length of ICU stay, length of hospital stay, hospital-related complications, and death. Results: Twenty-nine patients met inclusion criteria. Twenty-two (76%) admitted to heroin use only and seven (24%) admitted to heroin and methamphetamine use. Chief complaints on initial presentation included visual changes, 13 (45%); weakness, nine (31%); and difficulty swallowing, seven (24%). Skin wounds were documented in 22 (76%). Twenty-one (72%) patients underwent mechanical ventilation. Antitoxin (AT) was administered to 26 (90%) patients, but only two received antitoxin in the emergency department (ED). The time from ED presentation to AT administration was associated with increased length of ICU stay (regression coefficient = 2.5; 95% CI 0.45, 4.5). The time from ED presentation to wound drainage was also associated with increased length of ICU stay (regression coefficient = 13.7; 95% CI 2.3, 25.2). There was no relationship between time to antibiotic administration and length of ICU stay. Conclusion: MV and prolonged ICU stays are common in patients identified with WB. Early AT administration and wound drainage are recommended as these measures may decrease ICU length of stay. [West J Emerg Med. 2009;10(4):251-256.]

  4. Effective electron-density map improvement and structure validation on a Linux multi-CPU web cluster: The TB Structural Genomics Consortium Bias Removal Web Service.

    Science.gov (United States)

    Reddy, Vinod; Swanson, Stanley M; Segelke, Brent; Kantardjieff, Katherine A; Sacchettini, James C; Rupp, Bernhard

    2003-12-01

    Anticipating a continuing increase in the number of structures solved by molecular replacement in high-throughput crystallography and drug-discovery programs, a user-friendly web service for automated molecular replacement, map improvement, bias removal and real-space correlation structure validation has been implemented. The service is based on an efficient bias-removal protocol, Shake&wARP, and implemented using EPMR and the CCP4 suite of programs, combined with various shell scripts and Fortran90 routines. The service returns improved maps, converted data files and real-space correlation and B-factor plots. User data are uploaded through a web interface and the CPU-intensive iteration cycles are executed on a low-cost Linux multi-CPU cluster using the Condor job-queuing package. Examples of map improvement at various resolutions are provided and include model completion and reconstruction of absent parts, sequence correction, and ligand validation in drug-target structures.

  5. Design of CPU with Cache and Precise Interruption Response%带Cache和精确中断响应的CPU设计

    Institute of Scientific and Technical Information of China (English)

    刘秋菊; 李飞; 刘书伦

    2012-01-01

    This paper proposes the design of a CPU with cache and precise interrupt response. Fifteen instructions from the MIPS instruction set were selected as the basic instructions of the CPU. Using a basic five-stage pipeline, the instruction cache, data cache and precise interrupt response were designed and implemented. Test results show that the scheme meets the design requirements.

  6. Energy-efficient optical network units for OFDM PON based on time-domain interleaved OFDM technique.

    Science.gov (United States)

    Hu, Xiaofeng; Cao, Pan; Zhang, Liang; Jiang, Lipeng; Su, Yikai

    2014-06-02

    We propose and experimentally demonstrate a new scheme to reduce the energy consumption of optical network units (ONUs) in orthogonal frequency division multiplexing passive optical networks (OFDM PONs) by using time-domain interleaved OFDM (TI-OFDM) technique. In a conventional OFDM PON, each ONU has to process the complete downstream broadcast OFDM signal with a high sampling rate and a large FFT size to retrieve its required data, even if it employs a portion of OFDM subcarriers. However, in our scheme, the ONU only needs to sample and process one data group from the downlink TI-OFDM signal, effectively reducing the sampling rate and the FFT size of the ONU. Thus, the energy efficiency of ONUs in OFDM PONs can be greatly improved. A proof-of-concept experiment is conducted to verify the feasibility of the proposed scheme. Compared to the conventional OFDM PON, our proposal can save 17.1% and 26.7% energy consumption of ONUs by halving and quartering the sampling rate and the FFT size of ONUs with the use of the TI-OFDM technology.

  7. Analysis and Modeling of Operation Time Items and Time Utilization Rate of a Harvest Unit

    Institute of Scientific and Technical Information of China (English)

    乔金友; 韩兆桢; 李传磊; 陈海涛; 衣佳忠; 姜岩; 黄超; 张东光

    2016-01-01

    Harvesting is one of the key links in the grain production process, and selecting suitable harvesting machinery to complete the harvest in good time is an important guarantee of a good grain yield, so improving the efficiency of agricultural harvest units matters. Based on data measured during actual harvest unit operation, the composition of the operation time items of a typical combine harvester is defined, and mathematical models are established for the individual time items such as pure working time, turning time and unloading time. Mathematical models are established for three unloading modes - single-side unloading, double-side unloading and full-hopper unloading - and the time utilization rates of the three modes are analysed and compared; a John Deere 9660 combine was selected for the experimental study.
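
    A simple version of the time-utilization calculation that the models above feed into is sketched below: the utilization rate is the share of pure working time in the total of working, turning and unloading time. The function and all parameter values are illustrative assumptions, not the paper's measured models.

        def time_utilization(passes, work_time_per_pass, turn_time, grain_per_pass,
                             hopper_capacity, unload_time):
            """Share of pure working time in total field time for one simple unloading mode."""
            work = passes * work_time_per_pass
            turns = max(passes - 1, 0) * turn_time
            unloads = (passes * grain_per_pass / hopper_capacity) * unload_time
            return work / (work + turns + unloads)

        if __name__ == "__main__":
            # e.g. 20 passes, 4 min working per pass, 0.5 min per headland turn,
            # 0.9 t of grain per pass, a 7 t hopper and 3 min per unloading stop
            print(f"{time_utilization(20, 4.0, 0.5, 0.9, 7.0, 3.0):.1%}")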

  8. Identifying modules of coexpressed transcript units and their organization of Saccharopolyspora erythraea from time series gene expression profiles.

    Directory of Open Access Journals (Sweden)

    Xiao Chang

    Full Text Available BACKGROUND: The Saccharopolyspora erythraea genome sequence was released in 2007. In order to look at gene regulation at the whole-transcriptome level, an expression microarray was specifically designed on the S. erythraea strain NRRL 2338 genome sequence. Based on these data, we set out to investigate the potential transcriptional regulatory networks and their organization. METHODOLOGY/PRINCIPAL FINDINGS: In view of the hierarchical structure of bacterial transcriptional regulation, we constructed a hierarchical coexpression network at the whole-transcriptome level. A total of 27 modules were identified from 1255 differentially expressed transcript units (TUs) across the time course, which were further classified into four groups. Functional enrichment analysis indicated the biological significance of our hierarchical network. It was indicated that primary metabolism is activated in the first rapid growth phase (phase A), and secondary metabolism is induced when the growth is slowed down (phase B). Among the 27 modules, two are highly correlated to erythromycin production. One contains all genes in the erythromycin-biosynthetic (ery) gene cluster and the other seems to be associated with erythromycin production by sharing common intermediate metabolites. Non-concomitant correlation between production and expression regulation was observed. Especially, by calculating the partial correlation coefficients and building the network based on a Gaussian graphical model, intrinsic associations between modules were found, and the association between those two erythromycin production-correlated modules was included as expected. CONCLUSIONS: This work created a hierarchical model clustering transcriptome data into coordinated modules, and modules into groups across the time course, giving insight into the concerted transcriptional regulation, especially the regulation corresponding to erythromycin production of S. erythraea. This strategy may be extendable to studies
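
For readers unfamiliar with the Gaussian-graphical-model step mentioned above, a minimal sketch of computing pairwise partial correlations from the precision (inverse covariance) matrix is shown below; the data shape and the use of a pseudo-inverse are assumptions for illustration, not details taken from the study.

```python
import numpy as np

def partial_correlations(data):
    """data: samples x variables matrix (e.g. module expression profiles).
    Returns the matrix of pairwise partial correlations, computed from the
    precision matrix P as rho_ij = -P_ij / sqrt(P_ii * P_jj)."""
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.pinv(cov)          # pseudo-inverse for robustness
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Hypothetical usage: 10 time points x 5 modules of random data
rng = np.random.default_rng(0)
print(partial_correlations(rng.normal(size=(10, 5))).round(2))
```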

  9. Real-time PCR strategy for the identification of Trypanosoma cruzi discrete typing units directly in chronically infected human blood.

    Science.gov (United States)

    Muñoz-San Martín, Catalina; Apt, Werner; Zulantay, Inés

    2017-04-01

    The protozoan Trypanosoma cruzi is the causative agent of Chagas disease, a major public health problem in Latin America. This parasite has a complex population structure comprising six or seven major evolutionary lineages (discrete typing units, or DTUs) TcI-TcVI and TcBat, some of which have apparently resulted from ancient hybridization events. Because of the significant biological differences between these lineages, strain characterization methods have been essential to study T. cruzi in its different vectors and hosts. However, available methods can be laborious and costly, or limited in resolution or sensitivity. In this study, a new genotyping strategy by real-time PCR to identify each of the six DTUs in clinical blood samples has been developed and evaluated. Two nuclear (SL-IR and 18S rDNA) and two mitochondrial genes (COII and ND1) were selected to develop original primers. The method was evaluated with eight genomic DNAs from T. cruzi populations belonging to the six DTUs, one genomic DNA of Trypanosoma rangeli, and 53 blood samples from individuals with chronic Chagas disease. The assays had an analytical sensitivity of 1-25 fg of DNA per reaction tube depending on the DTU analyzed. The selectivity trials with 20 fg/μL of genomic DNA identified each DTU, excluding non-target DTUs in every test. The method was able to characterize 67.9% of the chronically infected clinical samples, with TcII detected most frequently, followed by TcI. With the proposed genotyping methodology, each DTU was established with high sensitivity after a single real-time PCR assay. This novel protocol reduces carryover contamination, enables detection of each DTU independently and, in the future, the quantification of each DTU in clinical blood samples.

  10. Experimental Study on the Heat Dissipation of a CPU Water-Cooling System

    Institute of Scientific and Technical Information of China (English)

    吕玉坤; 刘海峰; 徐国涛

    2012-01-01

    Experiments on the heat transfer and flow resistance characteristics of the CPU heat-absorbing box in a desktop computer water-cooling system show that the pressure drop across the CPU heat-absorbing box varies with the square of the inlet velocity, and that the heat exchanged first increases and then decreases with increasing flow rate. Resistance and heat transfer performance tests were then carried out for different pipeline layouts; the layout with the north-bridge heat-absorbing box and the graphics-card heat-absorbing box connected in parallel proved optimal, giving a total resistance 2.4% lower than the series arrangement and a 21% increase in the heat exchanged by the CPU heat-absorbing box. Formulas for the total drag coefficient and the resistance loss of the pipeline (excluding the CPU heat-absorbing box piping) are also derived.

  11. Abrupt changes in FKBP12.6 and SERCA2a expression contribute to sudden occurrence of ventricular fibrillation on reperfusion and are prevented by CPU86017

    Institute of Scientific and Technical Information of China (English)

    Tao NA; Zhi-jiang HUANG; De-zai DAI; Yuan ZHANG; Yin DAI

    2007-01-01

    Aim: The occurrence of ventricular fibrillation (VF) is dependent on the deterioration of channelopathy in the myocardium. It is interesting to investigate molecular changes in relation to the abrupt appearance of VF on reperfusion. We aimed to study whether changes in the expression of FKBP12.6 and SERCA2a and the endothelin (ET) system on reperfusion against ischemia were related to the rapid occurrence of VF, and whether CPU86017, a class III antiarrhythmic agent which blocks IKr, IKs, and ICa.L, suppressed VF by correcting the molecular changes on reperfusion. Methods: Cardiomyopathy (CM) was produced by 0.4 mg/kg sc L-thyroxin for 10 d in rats, which were subjected to 10 min coronary artery ligation/reperfusion on d 11. Expression of the Ca2+ handling and ET systems and calcium transients were measured, and CPU86017 was injected (4 mg/kg, sc) on d 6-10. Results: A high incidence of VF was found on reperfusion of the rat CM hearts, but there was no VF before reperfusion. The elevation of diastolic calcium was significant in the CM myocytes and exhibited abnormality of the Ca2+ handling system. Rapid downregulation of the mRNA and protein expression of FKBP12.6 and SERCA2a was found on reperfusion, in association with upregulation of the expression of the endothelin-converting enzyme (ECE) and protein kinase A (PKA); in contrast, no change in the ryanodine type 2 receptor (RyR2), phospholamban (PLB), endothelin A receptor (ETAR), or iNOS was found. CPU86017 removed these changes and suppressed VF. Conclusion: Abrupt changes in the expression of FKBP12.6, SERCA2a, PKA, and ECE on reperfusion against ischemia, which are responsible for the rapid occurrence of VF, have been observed. These changes are effectively prevented by CPU86017.

  12. CPU0213,a novel endothelin receptor antagonist,ameliorates septic renal lesion by suppressing ET system and NF-κB in rats

    Institute of Scientific and Technical Information of China (English)

    Haibo HE; De-zai DAI; Yin DAI

    2006-01-01

    Aim: To examine whether a novel endothelin receptor antagonist, CPU0213, is effective in relieving the acute renal failure (ARF) of septic shock by suppressing the activated endothelin-reactive oxygen species (ET-ROS) pathway and nuclear factor kappa B (NF-κB). Methods: The cecum was ligated and punctured in rats under anesthesia. CPU0213 (30 mg·kg-1·d-1, bid, sc × 3 d) was administered 8 h after the surgical operation. Results: In the untreated septic shock group, the mean arterial pressure and survival rate were markedly decreased (P<0.01), and heart rate, weight index of the kidney, serum creatinine and blood urea nitrogen, and 24 h urinary protein and creatinine were significantly increased (P<0.01). The levels of ET-1, total NO synthase (tNOS), inducible nitric oxide synthase (iNOS), nitric oxide (NO), and ROS in serum and the renal cortex were markedly increased (P<0.01). The upregulation of the mRNA levels of preproET-1, endothelin converting enzyme, ETA, ETB, iNOS, and tumor necrosis factor-alpha in the renal cortex was significant (P<0.01). The protein amount of activated NF-κB was significantly increased (P<0.01) in comparison with the sham operation group. All of these changes were significantly reversed after CPU0213 administration. Conclusion: Upregulation of the ET signaling pathway and NF-κB plays an important role in the ARF of septic shock. Amelioration of renal lesions was achieved by suppressing the ETA and ETB receptors in the renal cortex following CPU0213 medication.

  13. A GPU-based Real-time Software Correlation System for the Murchison Widefield Array Prototype

    Science.gov (United States)

    Wayth, Randall B.; Greenhill, Lincoln J.; Briggs, Frank H.

    2009-08-01

    Modern graphics processing units (GPUs) are inexpensive commodity hardware that offer Tflop/s theoretical computing capacity. GPUs are well suited to many compute-intensive tasks including digital signal processing. We describe the implementation and performance of a GPU-based digital correlator for radio astronomy. The correlator is implemented using the NVIDIA CUDA development environment. We evaluate three design options on two generations of NVIDIA hardware. The different designs utilize the internal registers, shared memory, and multiprocessors in different ways. We find that optimal performance is achieved with the design that minimizes global memory reads on recent generations of hardware. The GPU-based correlator outperforms a single-threaded CPU equivalent by a factor of 60 for a 32-antenna array, and runs on commodity PC hardware. The extra compute capability provided by the GPU maximizes the correlation capability of a PC while retaining the fast development time associated with using standard hardware, networking, and programming languages. In this way, a GPU-based correlation system represents a middle ground in design space between high performance, custom-built hardware, and pure CPU-based software correlation. The correlator was deployed at the Murchison Widefield Array 32-antenna prototype system where it ran in real time for extended periods. We briefly describe the data capture, streaming, and correlation system for the prototype array.
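
The record does not spell out the correlator's internals; as a conceptual stand-in, the following NumPy sketch shows the channelize-then-cross-multiply structure of an FX-style software correlator for a small array. The array sizes and the FX decomposition are assumptions, and this CPU sketch omits the CUDA-specific memory optimizations the paper discusses.

```python
import numpy as np

def fx_correlate(voltages, n_chan):
    """voltages: (n_ant, n_samples) complex voltage streams.
    Channelize each antenna with an FFT (the 'F' stage), then cross-multiply
    and accumulate over time for all antenna pairs per channel (the 'X' stage)."""
    n_ant, n_samp = voltages.shape
    n_spectra = n_samp // n_chan
    spectra = np.fft.fft(
        voltages[:, :n_spectra * n_chan].reshape(n_ant, n_spectra, n_chan), axis=2)
    # visibilities[i, j, c] = sum over time of S_i(t, c) * conj(S_j(t, c))
    vis = np.einsum("itc,jtc->ijc", spectra, np.conj(spectra))
    return vis

# Hypothetical usage: 4 antennas, 4096 complex samples, 64 channels
rng = np.random.default_rng(1)
v = rng.normal(size=(4, 4096)) + 1j * rng.normal(size=(4, 4096))
print(fx_correlate(v, n_chan=64).shape)   # (4, 4, 64)
```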

  14. Real-time maximum a-posteriori image reconstruction for fluorescence microscopy

    Directory of Open Access Journals (Sweden)

    Anwar A. Jabbar

    2015-08-01

    Full Text Available Rapid reconstruction of multidimensional image is crucial for enabling real-time 3D fluorescence imaging. This becomes a key factor for imaging rapidly occurring events in the cellular environment. To facilitate real-time imaging, we have developed a graphics processing unit (GPU) based real-time maximum a-posteriori (MAP) image reconstruction system. The parallel processing capability of the GPU device that consists of a large number of tiny processing cores and the adaptability of the image reconstruction algorithm to parallel processing (that employ multiple independent computing modules called threads) results in high temporal resolution. Moreover, the proposed quadratic potential based MAP algorithm effectively deconvolves the images as well as suppresses the noise. The multi-node multi-threaded GPU and the Compute Unified Device Architecture (CUDA) efficiently execute the iterative image reconstruction algorithm that is ≈200-fold faster (for large datasets) when compared to existing CPU based systems.

  15. Real-time maximum a-posteriori image reconstruction for fluorescence microscopy

    Science.gov (United States)

    Jabbar, Anwar A.; Dilipkumar, Shilpa; C K, Rasmi; Rajan, K.; Mondal, Partha P.

    2015-08-01

    Rapid reconstruction of multidimensional image is crucial for enabling real-time 3D fluorescence imaging. This becomes a key factor for imaging rapidly occurring events in the cellular environment. To facilitate real-time imaging, we have developed a graphics processing unit (GPU) based real-time maximum a-posteriori (MAP) image reconstruction system. The parallel processing capability of GPU device that consists of a large number of tiny processing cores and the adaptability of image reconstruction algorithm to parallel processing (that employ multiple independent computing modules called threads) results in high temporal resolution. Moreover, the proposed quadratic potential based MAP algorithm effectively deconvolves the images as well as suppresses the noise. The multi-node multi-threaded GPU and the Compute Unified Device Architecture (CUDA) efficiently execute the iterative image reconstruction algorithm that is ≈200-fold faster (for large dataset) when compared to existing CPU based systems.
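
To make the quadratic-potential MAP idea concrete, here is a hedged CPU/NumPy sketch of one common form of such a reconstruction: gradient descent on a Gaussian-likelihood data term plus a quadratic (Laplacian) smoothness penalty. The PSF, step size, penalty weight and exact update form are assumptions; the authors' GPU implementation may differ in detail.

```python
import numpy as np
from scipy.ndimage import convolve, laplace

def map_deconvolve(blurred, psf, n_iter=50, step=0.5, lam=0.01):
    """Gradient-descent MAP estimate for a Gaussian likelihood with a quadratic
    (Laplacian) smoothness prior: minimize ||h*x - y||^2 + lam*||Lx||^2
    (constant factors are absorbed into the step size)."""
    psf_flip = psf[::-1, ::-1]                 # adjoint of the convolution
    x = blurred.copy()
    for _ in range(n_iter):
        residual = convolve(x, psf) - blurred
        grad = convolve(residual, psf_flip) + lam * laplace(laplace(x))
        x -= step * grad
    return x

# Hypothetical usage with a small Gaussian-like PSF and a random test image
psf = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], float)
psf /= psf.sum()
y = np.random.default_rng(2).random((64, 64))
print(map_deconvolve(y, psf).shape)
```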

  16. Effects of light intensity and curing time of the newest LED Curing units on the diametral tensile strength of microhybrid composite resins

    Science.gov (United States)

    Ariani, D.; Herda, E.; Eriwati, Y. K.

    2017-08-01

    The aim of this study was to evaluate the influence of light intensity and curing time of the latest LED curing units on the diametral tensile strength of microhybrid composite resins. Sixty-three specimens from three brands (Polofil Supra, Filtek Z250, and Solare X) were divided into two test groups and one control group. The test groups were polymerized with a Flashmax P3 LED curing unit for one or three seconds. The control group was polymerized with a Ledmax 450 curing unit with the curing time based on the resin manufacturer’s instructions. A higher light intensity and shorter curing time did not influence the diametral tensile strength of microhybrid composite resins.

  17. Evaluation of Selected Resource Allocation and Scheduling Methods in Heterogeneous Many-Core Processors and Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Ciznicki Milosz

    2014-12-01

    Full Text Available Heterogeneous many-core computing resources are increasingly popular among users due to their improved performance over homogeneous systems. Many developers have realized that heterogeneous systems, e.g. a combination of a shared memory multi-core CPU machine with massively parallel Graphics Processing Units (GPUs), can provide significant performance opportunities to a wide range of applications. However, the best overall performance can only be achieved if application tasks are efficiently assigned to different types of processor units in time, taking into account their specific resource requirements. Additionally, one should note that available heterogeneous resources have been designed as general purpose units, however, with many built-in features accelerating specific application operations. In other words, the same algorithm or application functionality can be implemented as a different task for CPU or GPU. Nevertheless, from the perspective of various evaluation criteria, e.g. the total execution time or energy consumption, we may observe completely different results. Therefore, as tasks can be scheduled and managed in many alternative ways on both many-core CPUs or GPUs and consequently have a huge impact on the overall computing resources performance, there is a need for new and improved resource management techniques. In this paper we discuss results achieved during experimental performance studies of selected task scheduling methods in heterogeneous computing systems. Additionally, we present a new architecture for a resource allocation and task scheduling library which provides a generic application programming interface at the operating system level for improving scheduling policies, taking into account the diversity of tasks and heterogeneous computing resources characteristics.
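
As a toy illustration of the kind of assignment decision discussed above (not one of the scheduling methods evaluated in the paper), the sketch below greedily places each task on the processing unit that would finish it earliest, given assumed per-device execution-time estimates.

```python
def schedule_greedy(tasks, devices):
    """tasks: list of dicts {"name": ..., "cost": {"cpu": t1, "gpu": t2}}.
    devices: list of device names. Assign each task to the device that would
    finish it earliest (greedy earliest-completion-time heuristic)."""
    ready_time = {d: 0.0 for d in devices}
    plan = []
    for task in tasks:
        best = min(devices, key=lambda d: ready_time[d] + task["cost"][d])
        start = ready_time[best]
        ready_time[best] = start + task["cost"][best]
        plan.append((task["name"], best, start, ready_time[best]))
    return plan

# Hypothetical tasks with assumed CPU/GPU execution-time estimates (seconds)
tasks = [
    {"name": "fft",    "cost": {"cpu": 4.0, "gpu": 0.5}},
    {"name": "parse",  "cost": {"cpu": 1.0, "gpu": 3.0}},
    {"name": "matmul", "cost": {"cpu": 6.0, "gpu": 0.8}},
]
for row in schedule_greedy(tasks, ["cpu", "gpu"]):
    print(row)
```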

  18. [Results obtained by a mobile handicap-prevention unit at the Institut de Léprologie de Dakar].

    Science.gov (United States)

    Hirzel, C; Grauwin, M Y; Mane, I; Cartel, J L

    1995-01-01

    Of 584 leprosy patients known at the Institut de Léprologie Appliquée de Dakar because they suffered a nerve lesion with or without chronic plantar ulcer (CPU), 242 (41%) could be followed up over a mean period of 8.2 years (range: 5 to 10 years) by the mobile disability-prevention team (health education, medical care and shoe workshop). Every two months the patients were visited in their home towns to assess whether they could actually put into practice the foot and hand care they had been trained in. At the same time, further advice and encouragement were given to the patients. Adapted footwear was brought to the patients, at reduced fee, the foot prints and special moulds having been taken during the previous visit. The local health workers were responsible for light surgical care. Among the 242 followed-up patients: of 107 without CPU at the beginning, 90 (84%) remained so; of 135 with CPU at the beginning, 57 (42%) were cured; of 135 with CPU at the beginning, 74 (55%) remained stable (no worsening); the last 21, of whom 17 showed severe foot deformities but without CPU, worsened (all presented one or more CPU at the last control). Of the 242 patients, 221 (91%) remained stable or showed substantial improvement. Therefore, it must be emphasized that careful follow-up of patients is essential to ensure the improvement or care of CPU as well as to prevent the onset, worsening or reappearance of CPU. Such follow-up must consist of care, health education and special shoe wearing.

  19. An Ecosystem Assessment of Carbon Storage and Fluxes Over Space and Time in the Conterminous United States

    Science.gov (United States)

    Zhu, Z.; Bergamaschi, B. A.; Hawbaker, T.; Liu, S.; Reed, B.; Sleeter, B. M.; Sohl, T.; Stackpoole, S. M.

    2013-12-01

    Ecosystem carbon stock, sequestration, and greenhouse gas (GHG) flux were estimated for the conterminous United States (CONUS) in two time periods: baseline (annual average of 2001-2005) and future projection (annual average of 2006-2050). Major input data for baseline estimates included national resource inventories (such as forest and agricultural inventories and data from a national stream gage network), land use and land cover (LULC) maps, and soil carbon from national soil databases. The assessment covered 7.88 million km2 of land and water areas. Major input data for projected carbon estimates included future LULC scenarios developed in a framework consistent with the Intergovernmental Panel on Climate Change's future climate projections, and the future climate projection data. Estimated carbon stock and net ecosystem carbon balance for all major pools (live biomass, dead biomass, and soil organic matter) and terrestrial ecosystems (forests, agriculture, wetlands, and grasslands) were produced using ecosystem models (Table 1). Emission from wildfires of the CONUS was evaluated based on remote sensing methods and fire behavior modeling. Emission from inland water bodies (including rivers, lakes, and reservoirs), carbon transport by riverine systems, and carbon burial in sediments of lakes and reservoirs in the CONUS were estimated using input data from available aquatic measurements in a national water information system, water areas, and empirical methods (Table 2). Details of the methods used, and effects of drivers (both natural and anthropogenic processes), will be presented in the poster. Uncertainties from the assessment remained high, as indicated by the major results shown above. Sources of uncertainties included scarcity of input data, structural differences of the methods and models used, and parameterization and assumptions made in the modeling process. Table 1. Estimated carbon stock and net ecosystem carbon balance (NECB) of the major ecosystems by two time

  20. Encryption and Decryption Subsystem Based on an Embedded CPU

    Institute of Scientific and Technical Information of China (English)

    王剑非; 马德; 熊东亮; 陈亮; 黄凯; 葛海通

    2014-01-01

    To improve the efficiency of System-on-Chip (SoC) integration and verification across different information-security applications and security levels, a complete, pre-verified encryption and decryption subsystem based on an embedded CPU is proposed. The subsystem includes cryptography modules such as RSA, DES and AES, and can be configured in hardware to satisfy applications with different security-level requirements. The embedded CPU is a low-power, high-performance core acting as a coprocessor for the main CPU in the SoC; it controls the operation of the cryptography modules, reducing accesses by the main CPU and thereby lowering power consumption. Integrating the pre-verified subsystem into the SoC as a whole enables subsystem reuse, reduces SoC design and integration effort, and lowers the difficulty of SoC verification. Gated-clock technology, which manages the clocks of the cryptography modules according to their working states, further reduces the power of the subsystem. Using the CKSoC design-integration method, encryption and decryption subsystems based on the embedded CPU can be rapidly integrated in different configurations on an SoC integration tool platform. Experimental results show that the SoC design and verification workload is clearly reduced after the subsystem is constructed, improving work efficiency.

  1. A real-time autostereoscopic display method based on partial sub-pixel by general GPU processing

    Science.gov (United States)

    Chen, Duo; Sang, Xinzhu; Cai, Yuanfa

    2013-08-01

    With the progress of 3D technology, real-time autostereoscopic display demands a huge computing capacity. Because sub-pixel allocation is complicated, masks holding pre-arranged sub-pixels are usually fabricated to reduce real-time computation, but binary masks have inherent drawbacks. To address these problems, weighted masks are used for display based on partial sub-pixels. The corresponding computation grows tremendously and becomes unbearable for a CPU, so Graphics Processing Unit (GPU) processing with parallel computing capability is adopted. The principle of the partial sub-pixel approach is presented, and the texture array of Direct3D 10 is used to increase the number of textures that can be processed. For an HD display with multiple viewpoints, a low-end GPU can still deliver fluent real-time display, whereas the performance of a high-end CPU is not acceptable. Moreover, with the texture array, Direct3D 10 can be two and sometimes three times faster than Direct3D 9. The proposed method offers good portability, low overhead and good stability, and the GPU display system could also be used for future Ultra HD autostereoscopic displays.
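
A minimal NumPy sketch of the weighted-mask blending implied by the partial sub-pixel approach is given below: each output sub-pixel is a weighted sum of the corresponding sub-pixels of the viewpoint images. The mask values are random placeholders rather than the paper's actual sub-pixel allocation, and the real system performs this per frame on the GPU.

```python
import numpy as np

def blend_views(views, masks):
    """views: (n_views, H, W, 3) viewpoint images.
    masks: (n_views, H, W, 3) per-sub-pixel weights summing to 1 over views.
    Returns the interleaved autostereoscopic frame."""
    return np.einsum("vhwc,vhwc->hwc", masks.astype(float), views.astype(float))

# Hypothetical 4-view example with random images and normalized random masks
n_views, H, W = 4, 8, 8
rng = np.random.default_rng(3)
views = rng.random((n_views, H, W, 3))
masks = rng.random((n_views, H, W, 3))
masks /= masks.sum(axis=0, keepdims=True)     # normalize weights per sub-pixel
print(blend_views(views, masks).shape)        # (8, 8, 3)
```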

  2. Real-time Kinematics Base Station and Survey Unit Setup Method for the Synchronous Impulse Reconstruction (SIRE) Radar

    Science.gov (United States)

    2012-12-01

    A laptop is connected to the GPS receiver with a universal serial bus (USB) or RS232 cable (#3). The Control and Display Unit (CDU) program is then run on the laptop to identify the correct COM port and connect to the GPS receiver. Figure 4 (Novatel CDU software) shows a typical display of the CDU window; Figure 5 shows the survey unit.

  3. Time from accident to admission to a burn intensive care unit: how long does it actually take? A 25-year retrospective data analysis from a german burn center.

    Science.gov (United States)

    Schiefer, J L; Alischahi, A; Perbix, W; Grigutsch, D; Graeff, I; Zinser, M; Demir, E; Fuchs, P C; Schulz, A

    2016-03-31

    Severe burn injuries often require specialized treatment at a burn center. It is known that prompt admission to an intensive care unit is essential for achieving good outcome. Nevertheless, very little is known about the duration of time before a patient is admitted to a specialized center after a burn injury in Germany, and whether the situation has improved over time. We retrospectively analyzed time from burn injury to admission to the burn intensive care unit in the Cologne-Merheim Medical Center - one of Germany's specialized burn centers - over the last 25 years. Moreover, we analyzed the data based on differences according to time of injury and day of the week, as well as severity of the burn injury. There was no weekend effect with regard to transfer time; instead transfer time was particularly short on a Monday or on Sundays. Furthermore, patients with severe burn injuries of 40-89% total body surface area (TBSA) showed the least differences in transfer time. Interestingly, the youngest and the oldest patients arrived at the burn intensive care unit (BICU) the fastest. This study should help elucidate published knowledge regarding transfer time from the scene of the accident to admission to a BICU in Germany.

  4. Response times of ambulances to calls from Midwife Obstetric Units of the Peninsula Maternal and Neonatal Service (PMNS in Cape Town

    Directory of Open Access Journals (Sweden)

    J.K. Marcus

    2009-09-01

    Full Text Available Response times of ambulances to calls from Midwife Obstetric Units, although varied, are perceived as slow. Delays in transporting women experiencing complications during or after their pregnancies to higher levels of care may have negative consequences such as fetal, neonatal or maternal morbidity or death. An exploratory descriptive study was undertaken to investigate the response times of ambulances of the Western Cape Emergency Medical Services to calls from midwife obstetric units (MOUs) in the Peninsula Maternal and Neonatal Services (PMNS) in Cape Town. Response times were calculated from data collected in specific MOUs using a specifically developed instrument. Recorded data included the time of the call placed requesting transfer, the diagnosis or reason for transfer, the priority of the call and the time of arrival of the ambulance at the requesting facility. Mean, median and range of response times, in minutes, to various MOUs and priorities of calls were calculated. These were then compared using the Kruskal-Wallis test. A comparison was then made between the recorded and analysed response times and the national norms and recommendations for ambulance response times and maternal transfer response times, respectively. A wide range of response times was noted for the whole sample. Median response times across all priorities of calls and to all MOUs in the sample fell short of national norms and recommendations. No statistical differences were noted between various priorities of calls and MOUs. The perception of delayed response times of ambulances to MOUs in the PMNS was confirmed in this pilot study.

  5. A Multi-Agent Mah Jong Playing System: Towards Real-Time Recognition of Graphic Units in Graphic Representations

    Directory of Open Access Journals (Sweden)

    H. Achten

    2003-01-01

    Full Text Available In architectural design, sketching is an important means to explore the first conceptual developments in the design process. It is necessary to understand the conventions of depiction and encoding in sketches and drawings if we want to support the architect in the sketching activity. The theory of graphic units provides a comprehensive list of conventions of depiction and encoding that are widely used among architects. These graphic units form useful building blocks to understand design drawings. We investigate whether it is possible to build a system that can recognize graphic units. The technology we are looking at is multi-agent systems. It was chosen for the following reasons: agents can specialize in graphic units, a multi-agent system can deal with ambiguity through negotiation and conflict resolution, and multi-agent systems function in dynamically changing environments. Currently there is no general approach or technology available for multi-agent systems. Therefore, in our research we first set out to make such a multi-agent system. In order to keep the complexity low, we first aim to make a system that can do something simple: playing Mah Jong solitary. The Mah Jong solitary system shares the following important features with a multi-agent system that can recognize graphic units: (1) specialized agents for moves; (2) negotiation between agents to establish the best move; (3) a dynamically changing environment; and (4) search activity for more advanced strategies. The paper presents the theoretical basis of graphic units and multi-agent systems, followed by a description of the multi-agent framework and its implementation. A number of systems that can play Mah Jong at various degrees of competence, and accordingly degrees of complexity of the multi-agent system, are distinguished. Finally, the paper demonstrates how the findings are informative for a system that can recognize graphic units.

  6. Real-time Virtual Environment Signal Extraction and Denoising Using Programmable Graphics Hardware

    Institute of Scientific and Technical Information of China (English)

    Yang Su; Zhi-Jie Xu; Xiang-Qian Jiang

    2009-01-01

    The sense of being within a three-dimensional (3D) space and interacting with virtual 3D objects in a computer-generated virtual environment (VE) often requires essential image, vision and sensor signal processing techniques such as differentiating and denoising. This paper describes novel implementations of the Gaussian filtering for characteristic signal extraction and wavelet-based image denoising algorithms that run on the graphics processing unit (GPU). While significant acceleration over standard CPU implementations is obtained through exploiting data parallelism provided by the modern programmable graphics hardware, the CPU can be freed up to run other computations more efficiently such as artificial intelligence (AI) and physics. The proposed GPU-based Gaussian filtering can extract surface information from a real object and provide its material features for rendering and illumination. The wavelet-based signal denoising for large size digital images realized in this project provided better realism for VE visualization without sacrificing real-time and interactive performances of an application.
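
The paper implements these operations as GPU shaders; purely as a conceptual stand-in, the CPU sketch below performs the two steps mentioned (Gaussian filtering for characteristic signal extraction and soft-threshold wavelet shrinkage for denoising) with SciPy and PyWavelets. The wavelet choice, decomposition level and threshold value are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
import pywt

def smooth(image, sigma=2.0):
    """Gaussian low-pass filtering (characteristic/surface signal extraction)."""
    return gaussian_filter(image, sigma=sigma)

def wavelet_denoise(image, wavelet="db2", level=2, threshold=0.1):
    """Soft-threshold wavelet shrinkage; the threshold value is an assumption."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    coeffs = [coeffs[0]] + [
        tuple(pywt.threshold(c, threshold, mode="soft") for c in detail)
        for detail in coeffs[1:]
    ]
    return pywt.waverec2(coeffs, wavelet)

noisy = np.random.default_rng(4).normal(size=(128, 128))
print(smooth(noisy).shape, wavelet_denoise(noisy).shape)
```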

  7. Validation of columnar CsI x-ray detector responses obtained with hybridMANTIS, a CPU-GPU Monte Carlo code for coupled x-ray, electron, and optical transport.

    Science.gov (United States)

    Sharma, Diksha; Badano, Aldo

    2013-03-01

    hybridMANTIS is a Monte Carlo package for modeling indirect x-ray imagers using columnar geometry based on a hybrid concept that maximizes the utilization of available CPU and graphics processing unit processors in a workstation. The authors compare hybridMANTIS x-ray response simulations to previously published MANTIS and experimental data for four cesium iodide scintillator screens. These screens have a variety of reflective and absorptive surfaces with different thicknesses. The authors analyze hybridMANTIS results in terms of modulation transfer function and calculate the root mean square difference and Swank factors from simulated and experimental results. The comparison suggests that hybridMANTIS better matches the experimental data as compared to MANTIS, especially at high spatial frequencies and for the thicker screens. hybridMANTIS simulations are much faster than MANTIS with speed-ups up to 5260. hybridMANTIS is a useful tool for improved description and optimization of image acquisition stages in medical imaging systems and for modeling the forward problem in iterative reconstruction algorithms.
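
For reference, the Swank factor mentioned above is conventionally defined from the moments of the pulse-height (optical-quanta-per-interaction) distribution; this is the textbook definition, not a formula quoted from the paper:

```latex
I_{\mathrm{Swank}} = \frac{M_1^{2}}{M_0\, M_2},
\qquad
M_k = \sum_{n} n^{k}\, p(n),
```

where p(n) is the probability that an absorbed x ray yields n detected optical quanta; I approaches 1 as the pulse-height distribution narrows.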

  8. Attributes for MRB_E2RF1 Catchments in Selected Major River Basins of the Conterminous United States: Contact Time, 2002

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This tabular data set represents the average contact time, in units of days, compiled for every MRB_E2RF1 catchment of Major River Basins (MRBs, Crawford and others,...

  9. Use of Jigsaw Technique to Teach the Unit "Science within Time" in Secondary 7th Grade Social Sciences Course and Students' Views on This Technique

    Science.gov (United States)

    Yapici, Hakki

    2016-01-01

    The aim of this study is to apply the jigsaw technique in Social Sciences teaching and to unroll the effects of this technique on learning. The unit "Science within Time" in the secondary 7th grade Social Sciences text book was chosen for the research. It is aimed to compare the jigsaw technique with the traditional teaching method in…

  10. Graphics processing units in bioinformatics, computational biology and systems biology.

    Science.gov (United States)

    Nobile, Marco S; Cazzaniga, Paolo; Tangherloni, Andrea; Besozzi, Daniela

    2016-07-08

    Several studies in Bioinformatics, Computational Biology and Systems Biology rely on the definition of physico-chemical or mathematical models of biological systems at different scales and levels of complexity, ranging from the interaction of atoms in single molecules up to genome-wide interaction networks. Traditional computational methods and software tools developed in these research fields share a common trait: they can be computationally demanding on Central Processing Units (CPUs), therefore limiting their applicability in many circumstances. To overcome this issue, general-purpose Graphics Processing Units (GPUs) are gaining an increasing attention by the scientific community, as they can considerably reduce the running time required by standard CPU-based software, and allow more intensive investigations of biological systems. In this review, we present a collection of GPU tools recently developed to perform computational analyses in life science disciplines, emphasizing the advantages and the drawbacks in the use of these parallel architectures. The complete list of GPU-powered tools here reviewed is available at http://bit.ly/gputools. © The Author 2016. Published by Oxford University Press.

  11. Using SI Units in Astronomy

    Science.gov (United States)

    Dodd, Richard

    2011-12-01

    1. Introduction; 2. An introduction to SI units; 3. Dimensional analysis; 4. Unit of angular measure (radian); 5. Unit of time (second); 6. Unit of length (metre); 7. Unit of mass (kilogram); 8. Unit of luminous intensity (candela); 9. Unit of thermodynamic temperature (kelvin); 10. Unit of electric current (ampere); 11. Unit of amount of substance (mole); 12. Astronomical taxonomy; Index.

  12. Novel web-based real-time dashboard to optimize recycling and use of red cell units at a large multi-site transfusion service

    Directory of Open Access Journals (Sweden)

    Christopher Sharpe

    2014-01-01

    Full Text Available Background: Effective blood inventory management reduces outdates of blood products. Multiple strategies have been employed to reduce the rate of red blood cell (RBC) unit outdate. We designed an automated real-time web-based dashboard interfaced with our laboratory information system to effectively recycle red cell units. The objective of our approach is to decrease RBC outdate rates within our transfusion service. Methods: The dashboard was deployed in August 2011 and is accessed by a shortcut that was placed on the desktops of all blood transfusion service computers in the Capital District Health Authority region. It was designed to refresh automatically every 10 min. The dashboard provides all vital information on RBC units, and implemented a color coding scheme to indicate an RBC unit's proximity to expiration. Results: The overall RBC unit outdate rate in the 7-month period following implementation of the dashboard (September 2011-March 2012) was 1.24% (123 units outdated/9763 units received), compared to similar periods in 2010-2011 and 2009-2010: 2.03% (188/9395) and 2.81% (261/9220), respectively. The odds ratio of an RBC unit outdate postdashboard (2011-2012) compared with 2010-2011 was 0.625 (95% confidence interval: 0.497-0.786; P < 0.0001). Conclusion: Our dashboard system is an inexpensive and novel blood inventory management system which was associated with a significant reduction in RBC unit outdate rates at our institution over a period of 7 months. This system, or components of it, could be a useful addition to existing RBC management systems at other institutions.
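
As a toy sketch of the colour-coding logic such a dashboard might apply (the day thresholds below are invented placeholders, not the rules used at the authors' institution):

```python
from datetime import date

def expiry_colour(unit_expiry: date, today: date) -> str:
    """Classify an RBC unit by days remaining until expiry.
    Thresholds are hypothetical placeholders."""
    days_left = (unit_expiry - today).days
    if days_left < 0:
        return "expired"
    if days_left <= 7:
        return "red"      # prioritize for transfer/recycling
    if days_left <= 14:
        return "yellow"
    return "green"

print(expiry_colour(date(2012, 3, 10), date(2012, 3, 5)))   # 'red'
```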

  13. A note on self-normalized Dickey-Fuller test for unit root in autoregressive time series with GARCH errors

    Institute of Scientific and Technical Information of China (English)

    YANG Xiao-rong; ZHANG Li-xin

    2008-01-01

    In this article, the unit root test for the AR(p) model with GARCH errors is considered. The Dickey-Fuller test statistics are rewritten in the form of self-normalized sums, and the asymptotic distribution of the test statistics is derived under weak conditions.
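
For orientation, the textbook Dickey-Fuller setup that the article rewrites in self-normalized form is sketched below; this is the standard formulation, not the article's own statistic:

```latex
y_t = \rho\, y_{t-1} + \varepsilon_t,
\qquad
\widehat{\rho}_n = \frac{\sum_{t=2}^{n} y_{t-1} y_t}{\sum_{t=2}^{n} y_{t-1}^{2}},
\qquad
DF = \frac{\widehat{\rho}_n - 1}{\operatorname{se}(\widehat{\rho}_n)},
```

with the unit-root null H0: ρ = 1; rewriting such statistics as self-normalized sums is what allows the limiting distribution to be obtained under weak conditions when the errors follow a GARCH process.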

  14. Using simulated historical time series to prioritize fuel treatments on landscapes across the United States: The LANDFIRE prototype project

    Science.gov (United States)

    Robert E. Keane; Matthew Rollins; Zhi-Liang Zhu

    2007-01-01

    Canopy and surface fuels in many fire-prone forests of the United States have increased over the last 70 years as a result of modern fire exclusion policies, grazing, and other land management activities. The Healthy Forest Restoration Act and National Fire Plan establish a national commitment to reduce fire hazard and restore fire-adapted ecosystems across the USA....

  15. Bridging FPGA and GPU technologies for AO real-time control

    Science.gov (United States)

    Perret, Denis; Lainé, Maxime; Bernard, Julien; Gratadour, Damien; Sevin, Arnaud

    2016-07-01

    Our team has developed a common environment for high performance simulations and real-time control of AO systems based on the use of Graphics Processing Units in the context of the COMPASS project. Such a solution, based on the ability of the real-time core in the simulation to provide adequate computing performance, limits the cost of developing AO RTC systems and makes them more scalable. A code developed and validated in the context of the simulation may be injected directly into the system and tested on sky. Furthermore, the use of relatively low cost components also offers significant advantages for the system hardware platform. However, the use of GPUs in an AO loop comes with drawbacks: the traditional way of offloading computation from CPU to GPUs - involving multiple copies and unacceptable overhead in kernel launching - is not well suited in a real-time context. This last application requires the implementation of a solution enabling direct memory access (DMA) to the GPU memory from a third party device, bypassing the operating system. This allows this device to communicate directly with the real-time core of the simulation, feeding it with the WFS camera pixel stream. We show that DMA between a custom FPGA-based frame-grabber and a computation unit (GPU, FPGA, or Coprocessor such as Xeon-phi) across PCIe allows us to get latencies compatible with what will be needed on ELTs. As a fine-grained synchronization mechanism is not yet made available by GPU vendors, we propose the use of memory polling to avoid interrupt handling and involvement of a CPU. Network and Vision protocols are handled by the FPGA-based Network Interface Card (NIC). We present the results we obtained on a complete AO loop using camera and deformable mirror simulators.

  16. Reversal of isoproterenol-induced downregulation of phospholamban and FKBP12.6 by CPU0213-mediated antagonism of endothelin receptors

    Institute of Scientific and Technical Information of China (English)

    Yu FENG; Xiao-yun TANG; De-zai DAI; Yin DAI

    2007-01-01

    Aim: The downregulation of phospholamban (PLB) and FKBP12.6 as a result of β-receptor activation is involved in the pathway(s) of congestive heart failure. We hypothesized that the endothelin (ET)-1 system may be linked to downregulated PLB and FKBP12.6. Methods: Rats were subjected to ischemia/reperfusion (I/R) to cause heart failure (HF). 1 mg/kg isoproterenol (ISO) was injected subcutaneously (sc) for 10 d to worsen HF. 30 mg/kg CPU0213 (sc), a dual ET receptor (ETAR/ETBR) antagonist, was given from d 6 to d 10. On d 11, cardiac function was assessed together with determination of the mRNA levels of ryanodine receptor 2, calstabin-2 (FKBP12.6), PLB, and sarcoplasmic reticulum Ca2+-ATPase. Isolated adult rat ventricular myocytes were incubated with ISO at 1×10-6 mol/L to set up an in vitro model of HF. Propranolol (PRO), CPU0213, and darusentan (DAR, an ETAR antagonist) were incubated with cardiomyocytes at 1×10-5 mol/L or 1×10-6 mol/L in the presence of ISO (1×10-6 mol/L). Immunocytochemistry and Western blotting were applied for measuring the protein levels of PLB and FKBP12.6. Results: The worsened hemodynamics produced by I/R were exacerbated by ISO pretreatment. The significant downregulation of the gene expression of PLB and FKBP12.6 and the worsened cardiac function caused by ISO were reversed by CPU0213. In vitro, ISO at 1×10-6 mol/L produced a sharp decline of PLB and FKBP12.6 proteins relative to the control. The downregulation of the protein expression was significantly reversed by the ET receptor antagonist CPU0213 or DAR, comparable to that achieved by PRO. Conclusion: This study demonstrates a role of ET in mediating the downregulation of cardiac Ca2+-handling proteins by ISO.

  17. Operable Units

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset consists of operable unit data from multiple Superfund sites in U.S. EPA Region 8. These data were acquired from multiple sources at different times and...

  18. Open problems in CEM: Porting an explicit time-domain volume-integral- equation solver on GPUs with OpenACC

    KAUST Repository

    Ergül, Özgür

    2014-04-01

    Graphics processing units (GPUs) are gradually becoming mainstream in high-performance computing, as their capabilities for enhancing the performance of a large spectrum of scientific applications manyfold compared to multi-core CPUs have been clearly identified and proven. In this paper, implementation and performance-tuning details for porting an explicit marching-on-in-time (MOT)-based time-domain volume-integral-equation (TDVIE) solver onto GPUs are described in detail. To this end, a high-level approach, utilizing the OpenACC directive-based parallel programming model, is used to minimize two often-faced challenges in GPU programming: developer productivity and code portability. The MOT-TDVIE solver code, originally developed for CPUs, is annotated with compiler directives to port it to GPUs in a fashion similar to how OpenMP targets multi-core CPUs. In contrast to CUDA and OpenCL, where significant modifications to CPU-based codes are required, this high-level approach therefore requires minimal changes to the codes. In this work, we make use of two available OpenACC compilers, CAPS and PGI. Our experience reveals that different annotations of the code are required for each of the compilers, due to different interpretations of the fairly new standard by the compiler developers. Both versions of the OpenACC accelerated code achieved significant performance improvements, with up to 30× speedup against the sequential CPU code using recent hardware technology. Moreover, we demonstrated that the GPU-accelerated fully explicit MOT-TDVIE solver leveraged energy-consumption gains of the order of 3× against its CPU counterpart. © 2014 IEEE.

  19. Fast Data Reconstruction Method for a Fourier Transform Imaging Spectrometer Based on Multi-core CPU

    Institute of Scientific and Technical Information of China (English)

    杨智雄; 余春超; 严敏; 郑为建; 雷正刚; 粟宇路

    2014-01-01

    An imaging spectrometer can acquire a two-dimensional spatial image and a one-dimensional spectrum at the same time, which makes it highly useful in color and spectral measurement, true-color image synthesis, military reconnaissance and so on. In order to realize fast reconstruction of Fourier transform imaging spectrometer data, an optimized reconstruction algorithm was designed with OpenMP parallel computing technology and applied to the processing of data from the Hyper Spectral Imager on the Chinese 'HJ-1' satellite. The results show that the method based on multi-core parallel computing can fully exploit the hardware resources of a multi-core CPU and significantly enhance the efficiency of the spectrum reconstruction processing. If the technology is applied to workstations with more cores for parallel computing, it will become possible to complete real-time processing of Fourier transform imaging spectrometer data with a single computer.
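
The paper's implementation uses OpenMP in compiled code; as a language-neutral sketch of the same parallelization idea (splitting the per-pixel interferogram-to-spectrum transforms across CPU cores), here is a Python multiprocessing stand-in. The data layout, the chunking, and the bare-FFT "reconstruction" (no apodization or phase correction) are simplifying assumptions.

```python
import numpy as np
from multiprocessing import Pool

def reconstruct_rows(interferogram_rows):
    """Recover spectra from a block of interferogram rows via FFT
    (simplified: no apodization or phase correction)."""
    return np.abs(np.fft.rfft(interferogram_rows, axis=-1))

def parallel_reconstruct(cube, n_workers=4):
    """cube: (n_pixels, n_opd_samples) interferograms. Split row blocks across
    worker processes, mimicking an OpenMP parallel-for over pixels."""
    chunks = np.array_split(cube, n_workers, axis=0)
    with Pool(n_workers) as pool:
        return np.vstack(pool.map(reconstruct_rows, chunks))

if __name__ == "__main__":
    cube = np.random.default_rng(5).random((1024, 256))
    print(parallel_reconstruct(cube).shape)   # (1024, 129)
```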

  20. Effects of distance from tip of LED light-curing unit and curing time on surface hardness of nano-filled composite resin

    Science.gov (United States)

    Shafadilla, V. A.; Usman, M.; Margono, A.

    2017-08-01

    Polymerization process depends on several variables, including the hue, thickness, and translucency of the composite resin, the size of the filler particles, the duration of exposure to light (the curing time), the intensity of the light, and the distance from the light. This study aimed to analyze the effects of the distance from the tip of the light-emitting diode (LED) light-curing unit and of curing time on the surface hardness of nano-filled composite resin. 60 specimens were prepared in a mold and divided into 6 groups based on various curing distances and times: 2 mm, 5 mm, and 8 mm and 20 seconds and 40 seconds. The highest surface hardness was seen in the group both closest to the tip and having the longest curing time, while the lowest hardness was seen in the group both farthest from the tip and having the shortest curing time. Significant differences were seen among the various tip distances, except for in the two groups that had 8-mm tip distances, which had no significant differences due to curing time. Both decreased distance from the tip of the LED light-curing unit and increased curing time increase the surface hardness of nano-filled composite resin. However, curing time increases the surface hardness only if the tip distance is ≤ 5 mm.