WorldWideScience

Sample records for total cpu time

  1. Thermally-aware composite run-time CPU power models

    OpenAIRE

    Walker, Matthew J.; Diestelhorst, Stephan; Hansson, Andreas; Balsamo, Domenico; Merrett, Geoff V.; Al-Hashimi, Bashir M.

    2016-01-01

    Accurate and stable CPU power models are fundamental in modern system-on-chips (SoCs) for two main reasons: 1) they enable significant online energy savings by providing a run-time manager with reliable power consumption data for controlling CPU energy-saving techniques; 2) they can be used as accurate and trusted reference models for system design and exploration. We begin by showing the limitations of typical performance monitoring counter (PMC) based power modelling approaches and illust...
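    PMC-based power models of the kind this record critiques are typically linear regressions from counter activity to measured power. The following sketch illustrates that baseline approach only; it is not the authors' composite thermally-aware model, and the counter names and numbers are hypothetical.

```python
import numpy as np

# Hypothetical training data: rows are observations, columns are PMC event
# rates (e.g. instructions, cache misses, bus accesses); power is measured.
pmc = np.array([[1.2e9, 3.1e6, 0.8e6],
                [0.6e9, 1.0e6, 0.2e6],
                [2.0e9, 5.5e6, 1.9e6],
                [1.5e9, 2.2e6, 1.1e6]])
power_watts = np.array([3.8, 2.1, 5.9, 4.3])

# Fit power ~ w0 + w1*PMC1 + ... by ordinary least squares.
X = np.column_stack([np.ones(len(pmc)), pmc])
w, *_ = np.linalg.lstsq(X, power_watts, rcond=None)

# Run-time estimate for a new counter sample.
sample = np.array([1.0, 1.1e9, 2.5e6, 0.9e6])
print("estimated power: %.2f W" % (sample @ w))
```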

  2. Improvement of CPU time of Linear Discriminant Function based on MNM criterion by IP

    Directory of Open Access Journals (Sweden)

    Shuichi Shinmura

    2014-05-01

    Revised IP-OLDF (optimal linear discriminant function by integer programming) is a linear discriminant function that minimizes the number of misclassifications (NM) of training samples by integer programming (IP). However, IP requires large computation (CPU) time. In this paper, it is proposed how to reduce CPU time by using linear programming (LP). In the first phase, Revised LP-OLDF is applied to all cases, and all cases are categorized into two groups: those that are classified correctly and those that are not classified by support vectors (SVs). In the second phase, Revised IP-OLDF is applied to the cases misclassified by SVs. This method is called Revised IPLP-OLDF. In this research, it is evaluated whether the NM of Revised IPLP-OLDF is a good estimate of the minimum number of misclassifications (MNM) obtained by Revised IP-OLDF. Four kinds of real data (Iris data, Swiss banknote data, student data, and CPD data) are used as training samples. Four kinds of 20,000 re-sampled cases generated from these data are used as the evaluation samples. There are a total of 149 models of all combinations of independent variables from these data. The NMs and CPU times of the 149 models are compared between Revised IPLP-OLDF and Revised IP-OLDF. The following results are obtained: 1) Revised IPLP-OLDF significantly improves CPU time. 2) For the training samples, all 149 NMs of Revised IPLP-OLDF are equal to the MNM of Revised IP-OLDF. 3) For the evaluation samples, most NMs of Revised IPLP-OLDF are equal to the NM of Revised IP-OLDF. 4) The generalization abilities of both discriminant functions are concluded to be high, because the difference between the error rates of the training and evaluation samples is almost within 2%. Therefore, Revised IPLP-OLDF is recommended for the analysis of big data instead of Revised IP-OLDF. Next, Revised IPLP-OLDF is compared with LDF and logistic regression by 100-fold cross validation using 100 re-sampled samples. Means of error rates of
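    For concreteness, the MNM criterion itself can be posed as a big-M mixed-integer program in which a binary variable marks each misclassified case and the objective minimizes their sum. The sketch below, written with the PuLP modelling library, is a generic illustration of that formulation, not the authors' Revised IP-OLDF or the two-phase IPLP variant.

```python
import pulp

def mnm_discriminant(X, y, big_m=1e4):
    """Minimize the number of misclassifications (MNM criterion).

    X: list of feature vectors; y: labels in {-1, +1}.
    """
    n, d = len(X), len(X[0])
    prob = pulp.LpProblem("MNM", pulp.LpMinimize)
    w = [pulp.LpVariable(f"w{j}", -10, 10) for j in range(d)]
    b = pulp.LpVariable("b", -10, 10)
    z = [pulp.LpVariable(f"z{i}", cat="Binary") for i in range(n)]
    prob += pulp.lpSum(z)  # objective: number of misclassified cases
    for i in range(n):
        margin = pulp.lpSum(w[j] * X[i][j] for j in range(d)) + b
        # If z[i] = 0 the case must satisfy the margin; z[i] = 1 relaxes it.
        prob += y[i] * margin >= 1 - big_m * z[i]
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [v.value() for v in w], b.value(), int(sum(v.value() for v in z))

weights, bias, nm = mnm_discriminant([[1, 2], [2, 1], [4, 5], [5, 4]],
                                     [-1, -1, 1, 1])
print(weights, bias, nm)
```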

  3. CPU time reduction strategies for the Lambda modes calculation of a nuclear power reactor

    Energy Technology Data Exchange (ETDEWEB)

    Vidal, V.; Garayoa, J.; Hernandez, V. [Universidad Politecnica de Valencia (Spain). Dept. de Sistemas Informaticos y Computacion; Navarro, J.; Verdu, G.; Munoz-Cobo, J.L. [Universidad Politecnica de Valencia (Spain). Dept. de Ingenieria Quimica y Nuclear; Ginestar, D. [Universidad Politecnica de Valencia (Spain). Dept. de Matematica Aplicada

    1997-12-01

    In this paper, we present two strategies to reduce the CPU time spent in the lambda modes calculation for a realistic nuclear power reactor. The discretization of the multigroup neutron diffusion equation has been made using a nodal collocation method, solving the associated eigenvalue problem with two different techniques: the Subspace Iteration Method and Arnoldi's Method. CPU time reduction is based on a coarse-grain parallelization approach together with a multistep algorithm to adequately initialize the solution. (author). 9 refs., 6 tabs.
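    Arnoldi's method, one of the two techniques mentioned, is available off the shelf: SciPy's eigs wraps ARPACK's implicitly restarted Arnoldi iteration. The sketch below extracts a few dominant modes of a generic sparse operator; the matrix is a stand-in, not the nodal collocation diffusion operator of the paper, and the v0 seeding merely hints at the multistep initialization idea.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigs

# Stand-in sparse operator; in the paper this would come from the nodal
# collocation discretization of the multigroup diffusion equation.
n = 2000
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

# Arnoldi iteration (ARPACK) for the 4 largest-magnitude eigenpairs.
# v0 is the starting vector; a multistep strategy as in the paper would
# seed it with a solution interpolated from a coarser problem.
v0 = np.ones(n)
vals, vecs = eigs(A, k=4, which="LM", v0=v0)
print(np.real(vals))
```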

  4. Enhanced round robin CPU scheduling with burst time based time quantum

    Science.gov (United States)

    Indusree, J. R.; Prabadevi, B.

    2017-11-01

    Process scheduling is a very important function of an operating system. The best-known process-scheduling algorithms are the First Come First Serve (FCFS) algorithm, the Round Robin (RR) algorithm, the Priority scheduling algorithm, and the Shortest Job First (SJF) algorithm. Compared to its peers, the Round Robin (RR) algorithm has the advantage that it gives a fair share of the CPU to the processes already in the ready queue. The effectiveness of the RR algorithm greatly depends on the chosen time quantum value. In this paper, we propose an enhanced algorithm called Enhanced Round Robin with Burst-time based Time Quantum (ERRBTQ), a process-scheduling algorithm which calculates the time quantum from the burst times of the processes already in the ready queue. The experimental results and analysis of the ERRBTQ algorithm clearly indicate improved performance compared with conventional RR and its variants.
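    The abstract does not give ERRBTQ's exact quantum formula, so the sketch below simulates a round-robin scheduler whose quantum is recomputed at each dispatch as the mean remaining burst time of the ready queue. It is an illustrative variant under that assumption, not the published algorithm.

```python
from collections import deque

def round_robin_burst_quantum(bursts):
    """Round robin with a quantum derived from remaining burst times.

    bursts: dict mapping process id -> CPU burst (time units).
    Returns the waiting time per process (all arrivals assumed at t=0).
    """
    remaining = dict(bursts)
    queue = deque(remaining)
    clock, finish = 0, {}
    while queue:
        # Recompute the quantum as the mean remaining burst (assumption).
        quantum = max(1, sum(remaining[p] for p in queue) // len(queue))
        pid = queue.popleft()
        run = min(quantum, remaining[pid])
        clock += run
        remaining[pid] -= run
        if remaining[pid] == 0:
            finish[pid] = clock
        else:
            queue.append(pid)
    # waiting time = turnaround - burst, with arrival at t=0
    return {p: finish[p] - bursts[p] for p in bursts}

print(round_robin_burst_quantum({"P1": 24, "P2": 3, "P3": 3}))
```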

  5. A Robust Ultra-Low Voltage CPU Utilizing Timing-Error Prevention

    OpenAIRE

    Hiienkari, Markus; Teittinen, Jukka; Koskinen, Lauri; Turnquist, Matthew; Mäkipää, Jani; Rantala, Arto; Sopanen, Matti; Kaltiokallio, Mikko

    2015-01-01

    To minimize the energy consumption of a digital circuit, logic can be operated at sub- or near-threshold voltage. Operation in this region is challenging due to device and environment variations, and the resulting performance may not be adequate for all applications. This article presents two variants of a 32-bit RISC CPU targeted for near-threshold voltage. Both CPUs are placed on the same die and manufactured in a 28 nm CMOS process. They employ timing-error prevention with clock stretching to enable ...

  6. CPU and GPU (CUDA) Template Matching Comparison

    Directory of Open Access Journals (Sweden)

    Evaldas Borcovas

    2014-05-01

    Image processing, computer vision and other complicated optical information processing algorithms require large resources. It is often desired to execute algorithms in real time, and it is hard to fulfill such requirements with a single CPU. NVidia's CUDA technology enables the programmer to use the GPU resources in the computer. The current research was made with an Intel Pentium Dual-Core T4500 2.3 GHz processor with 4 GB DDR3 RAM (CPU I), an NVidia GeForce GT320M CUDA-compatible graphics card (GPU I), an Intel Core i5-2500K 3.3 GHz processor with 4 GB DDR3 RAM (CPU II), and an NVidia GeForce GTX 560 CUDA-compatible graphics card (GPU II). Additional libraries, OpenCV 2.1 and the CUDA-compatible OpenCV 2.4.0, were used for the testing. The main tests were made with the standard function MatchTemplate from the OpenCV libraries. The algorithm uses a main image and a template, and the influence of these factors was tested. The main image and template were resized, and the algorithm computing time and performance in Gtpix/s were measured. According to the information obtained from the research, GPU computing using the hardware mentioned earlier is up to 24 times faster when processing a big amount of information. When the images are small, the performance of the CPU and GPU is not significantly different. The choice of template size influences the CPU computation time. The difference in computing time between the GPUs can be explained by the number of cores they have.
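    For reference, the CPU side of such a benchmark reduces to timing OpenCV's matchTemplate over images and templates of varying size. The sketch below shows a minimal version of that measurement on synthetic data; the CUDA path (available through cv2.cuda in contrib builds of OpenCV) is not exercised here.

```python
import time
import cv2
import numpy as np

# Synthetic main image and template stand in for the test images.
image = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
template = image[500:564, 900:964].copy()  # 64x64 patch of the image

t0 = time.perf_counter()
result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
elapsed = time.perf_counter() - t0

_, max_val, _, max_loc = cv2.minMaxLoc(result)
pixels = result.shape[0] * result.shape[1]
print(f"match at {max_loc}, score {max_val:.3f}, "
      f"{pixels / elapsed / 1e9:.3f} Gtpix/s")
```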

  7. Design improvement of FPGA and CPU based digital circuit cards to solve timing issues

    International Nuclear Information System (INIS)

    Lee, Dongil; Lee, Jaeki; Lee, Kwang-Hyun

    2016-01-01

    The digital circuit cards installed at NPPs (Nuclear Power Plants) are mostly composed of a CPU (Central Processing Unit) and a PLD (Programmable Logic Device; this includes FPGAs (Field Programmable Gate Arrays) and CPLDs (Complex Programmable Logic Devices)). This structure is typical of digital circuit cards and, as a structure, poses no major problems. However, signal delay causes many problems when various ICs (Integrated Circuits) and several circuit cards are connected to the BUS of the backplane. This paper suggests a structure to resolve the BUS signal timing problems in a circuit card consisting of a CPU and an FPGA. Nowadays, as circuit cards have become complex and mass data is communicated at high speed through the BUS, data integrity is the most important issue. The conventional design does not consider the delay and synchronicity of signals, and this causes many problems in data processing. In order to solve these problems, it is important to isolate the BUS controller from the CPU and maintain a constant signal delay by using a PLD.

  8. Design improvement of FPGA and CPU based digital circuit cards to solve timing issues

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Dongil; Lee, Jaeki; Lee, Kwang-Hyun [KHNP CRI, Daejeon (Korea, Republic of)

    2016-10-15

    The digital circuit cards installed at NPPs (Nuclear Power Plants) are mostly composed of a CPU (Central Processing Unit) and a PLD (Programmable Logic Device; this includes FPGAs (Field Programmable Gate Arrays) and CPLDs (Complex Programmable Logic Devices)). This structure is typical of digital circuit cards and, as a structure, poses no major problems. However, signal delay causes many problems when various ICs (Integrated Circuits) and several circuit cards are connected to the BUS of the backplane. This paper suggests a structure to resolve the BUS signal timing problems in a circuit card consisting of a CPU and an FPGA. Nowadays, as circuit cards have become complex and mass data is communicated at high speed through the BUS, data integrity is the most important issue. The conventional design does not consider the delay and synchronicity of signals, and this causes many problems in data processing. In order to solve these problems, it is important to isolate the BUS controller from the CPU and maintain a constant signal delay by using a PLD.

  9. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing

    Directory of Open Access Journals (Sweden)

    Fan Zhang

    2016-04-01

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high-performance computing (HPC) methods have been proposed to accelerate SAR imaging, especially GPU-based methods. In the classical GPU-based imaging algorithm, the GPU is employed to accelerate image processing by massive parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO). However, the computing capability of the CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPUs/GPUs is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated by deep collaborative multi-CPU/GPU computing. For the CPU parallel imaging part, the advanced vector extension (AVX) method is introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only are the bottlenecks of memory limitation and frequent data transfers overcome, but several optimization strategies, such as streaming and parallel pipelining, are also applied. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method improves the efficiency of SAR imaging by 270 times over a single-core CPU and realizes real-time imaging, in that the imaging rate outperforms the raw data generation rate.

  10. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing.

    Science.gov (United States)

    Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin

    2016-04-07

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high-performance computing (HPC) methods have been proposed to accelerate SAR imaging, especially GPU-based methods. In the classical GPU-based imaging algorithm, the GPU is employed to accelerate image processing by massive parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO). However, the computing capability of the CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPUs/GPUs is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated by deep collaborative multi-CPU/GPU computing. For the CPU parallel imaging part, the advanced vector extension (AVX) method is introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only are the bottlenecks of memory limitation and frequent data transfers overcome, but several optimization strategies, such as streaming and parallel pipelining, are also applied. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method improves the efficiency of SAR imaging by 270 times over a single-core CPU and realizes real-time imaging, in that the imaging rate outperforms the raw data generation rate.
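    The collaborative idea in these two records, keeping the CPU pool and the GPU busy on different slices of the same image, can be sketched generically. Below, a thread pool feeds one block of range lines to a CPU worker and another to a (simulated) GPU worker concurrently; both worker bodies and the 3:1 split are placeholders, not the paper's AVX kernels, CUDA streams, or scheduling strategy.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def cpu_focus(block):
    # Placeholder for the multi-core/AVX focusing kernel.
    return np.fft.ifft(np.fft.fft(block, axis=1), axis=1)

def gpu_focus(block):
    # Placeholder for the CUDA kernel; a real version would launch
    # streams on the device (e.g. via CuPy) instead of using NumPy.
    return np.fft.ifft(np.fft.fft(block, axis=1), axis=1)

raw = np.random.randn(4096, 2048) + 1j * np.random.randn(4096, 2048)

# Static partition: give the GPU the larger share (assumed 3:1 split).
split = raw.shape[0] // 4
with ThreadPoolExecutor(max_workers=2) as pool:
    cpu_part = pool.submit(cpu_focus, raw[:split])
    gpu_part = pool.submit(gpu_focus, raw[split:])
    image = np.vstack([cpu_part.result(), gpu_part.result()])
print(image.shape)
```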

  11. STEM image simulation with hybrid CPU/GPU programming

    International Nuclear Information System (INIS)

    Yao, Y.; Ge, B.H.; Shen, X.; Wang, Y.G.; Yu, R.C.

    2016-01-01

    STEM image simulation is achieved via hybrid CPU/GPU programming under parallel algorithm architecture to speed up calculation on a personal computer (PC). To utilize the calculation power of a PC fully, the simulation is performed using the GPU core and multi-CPU cores at the same time to significantly improve efficiency. GaSb and an artificial GaSb/InAs interface with atom diffusion have been used to verify the computation. - Highlights: • STEM image simulation is achieved by hybrid CPU/GPU programming under parallel algorithm architecture to speed up the calculation in the personal computer (PC). • In order to fully utilize the calculation power of the PC, the simulation is performed by GPU core and multi-CPU cores at the same time so efficiency is improved significantly. • GaSb and artificial GaSb/InAs interface with atom diffusion have been used to verify the computation. The results reveal some unintuitive phenomena about the contrast variation with the atom numbers.

  12. STEM image simulation with hybrid CPU/GPU programming

    Energy Technology Data Exchange (ETDEWEB)

    Yao, Y., E-mail: yaoyuan@iphy.ac.cn; Ge, B.H.; Shen, X.; Wang, Y.G.; Yu, R.C.

    2016-07-15

    STEM image simulation is achieved via hybrid CPU/GPU programming under parallel algorithm architecture to speed up calculation on a personal computer (PC). To utilize the calculation power of a PC fully, the simulation is performed using the GPU core and multi-CPU cores at the same time to significantly improve efficiency. GaSb and an artificial GaSb/InAs interface with atom diffusion have been used to verify the computation. - Highlights: • STEM image simulation is achieved by hybrid CPU/GPU programming under parallel algorithm architecture to speed up the calculation in the personal computer (PC). • In order to fully utilize the calculation power of the PC, the simulation is performed by GPU core and multi-CPU cores at the same time so efficiency is improved significantly. • GaSb and artificial GaSb/InAs interface with atom diffusion have been used to verify the computation. The results reveal some unintuitive phenomena about the contrast variation with the atom numbers.

  13. A Robust Ultra-Low Voltage CPU Utilizing Timing-Error Prevention

    Directory of Open Access Journals (Sweden)

    Markus Hiienkari

    2015-04-01

    To minimize the energy consumption of a digital circuit, logic can be operated at sub- or near-threshold voltage. Operation in this region is challenging due to device and environment variations, and the resulting performance may not be adequate for all applications. This article presents two variants of a 32-bit RISC CPU targeted for near-threshold voltage. Both CPUs are placed on the same die and manufactured in a 28 nm CMOS process. They employ timing-error prevention with clock stretching to enable operation with minimal safety margins while maximizing performance and energy efficiency at a given operating point. Measurements show a minimum energy of 3.15 pJ/cyc at 400 mV, which corresponds to a 39% energy saving compared to operation based on static signoff timing.

  14. ITCA: Inter-Task Conflict-Aware CPU accounting for CMP

    OpenAIRE

    Luque, Carlos; Moreto Planas, Miquel; Cazorla Almeida, Francisco Javier; Gioiosa, Roberto; Valero Cortés, Mateo

    2010-01-01

    Chip-MultiProcessors (CMPs) introduce complexities when attributing CPU utilization to processes, because the progress made by a process during an interval of time highly depends on the activity of the other processes it is co-scheduled with. We propose a new hardware CPU accounting mechanism to improve the accuracy of measuring CPU utilization in CMPs and compare it with previous accounting mechanisms. Our results show that currently known mechanisms lead to a 16% average error when it com...

  15. Using the CPU and GPU for real-time video enhancement on a mobile computer

    CSIR Research Space (South Africa)

    Bachoo, AK

    2010-09-01

    In this paper, the current advances in mobile CPU and GPU hardware are used to implement video enhancement algorithms in a new way on a mobile computer. Both the CPU and GPU are used effectively to achieve real-time performance for complex image enhancement...

  16. An Investigation of the Performance of the Colored Gauss-Seidel Solver on CPU and GPU

    International Nuclear Information System (INIS)

    Yoon, Jong Seon; Choi, Hyoung Gwon; Jeon, Byoung Jin

    2017-01-01

    The performance of the colored Gauss–Seidel solver on CPU and GPU was investigated for two- and three-dimensional heat conduction problems using different mesh sizes. The heat conduction equation was discretized by the finite difference method and the finite element method. The CPU yielded good performance for small problems, but its performance deteriorated for large problems, where the total memory required for the computation exceeded the cache size. In contrast, the GPU performed better as the mesh size increased because of its latency-hiding technique. Further, GPU computation with the colored Gauss–Seidel solver was approximately 7 times faster than that with a single CPU. Furthermore, the colored Gauss–Seidel solver was found to be approximately twice as fast as the Jacobi solver when parallel computing was conducted on the GPU.
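    Coloring is what makes Gauss-Seidel parallel: with a red-black ordering of the grid, every point of one color depends only on points of the other color, so each half-sweep can update all of its points simultaneously (on a GPU, one thread per point). The NumPy sketch below shows such a sweep for a 2D steady heat problem; it illustrates the ordering only, not the paper's finite element discretization or GPU kernels.

```python
import numpy as np

def red_black_gauss_seidel(u, f, h, iters=100):
    """Colored Gauss-Seidel sweeps for the 2D steady heat equation
    laplacian(u) = f on a uniform grid (Dirichlet values kept in u's rim).
    All points of one color are independent, hence GPU-friendly."""
    for _ in range(iters):
        for color in (0, 1):  # 0: "red" points, 1: "black" points
            for i in range(1, u.shape[0] - 1):
                j0 = 1 + (i + color) % 2  # first column of this color in row i
                u[i, j0:-1:2] = 0.25 * (
                    u[i - 1, j0:-1:2] + u[i + 1, j0:-1:2]
                    + u[i, j0 - 1:-2:2] + u[i, j0 + 1::2]
                    - h * h * f[i, j0:-1:2])
    return u

n = 65
u = np.zeros((n, n)); u[0, :] = 1.0   # hot top edge as boundary condition
f = np.zeros((n, n))
u = red_black_gauss_seidel(u, f, h=1.0 / (n - 1))
print(u[n // 2, n // 2])
```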

  17. An Investigation of the Performance of the Colored Gauss-Seidel Solver on CPU and GPU

    Energy Technology Data Exchange (ETDEWEB)

    Yoon, Jong Seon; Choi, Hyoung Gwon [Seoul Nat’l Univ. of Science and Technology, Seoul (Korea, Republic of); Jeon, Byoung Jin [Yonsei Univ., Seoul (Korea, Republic of)

    2017-02-15

    The performance of the colored Gauss–Seidel solver on CPU and GPU was investigated for two- and three-dimensional heat conduction problems using different mesh sizes. The heat conduction equation was discretized by the finite difference method and the finite element method. The CPU yielded good performance for small problems, but its performance deteriorated for large problems, where the total memory required for the computation exceeded the cache size. In contrast, the GPU performed better as the mesh size increased because of its latency-hiding technique. Further, GPU computation with the colored Gauss–Seidel solver was approximately 7 times faster than that with a single CPU. Furthermore, the colored Gauss–Seidel solver was found to be approximately twice as fast as the Jacobi solver when parallel computing was conducted on the GPU.

  18. Energy consumption optimization of the total-FETI solver by changing the CPU frequency

    Science.gov (United States)

    Horak, David; Riha, Lubomir; Sojka, Radim; Kruzik, Jakub; Beseda, Martin; Cermak, Martin; Schuchart, Joseph

    2017-07-01

    The energy consumption of supercomputers is one of the critical problems for the upcoming Exascale supercomputing era. Awareness of power and energy consumption is required on both the software and hardware sides. This paper deals with the energy consumption evaluation of solvers based on the Finite Element Tearing and Interconnect (FETI) method, an established approach for solving real-world engineering problems. We have evaluated the effect of the CPU frequency on the energy consumption of the FETI solver using a linear elasticity 3D cube synthetic benchmark. For this problem, we have evaluated the effect of frequency tuning on the energy consumption of the essential processing kernels of the FETI method. The paper provides results for two types of frequency tuning: (1) static tuning and (2) dynamic tuning. For the static tuning experiments, the frequency is set before execution and kept constant during runtime. For dynamic tuning, the frequency is changed during program execution to adapt the system to the actual needs of the application. The paper shows that static tuning brings up to 12% energy savings when compared to the default CPU settings (the highest clock rate). Dynamic tuning improves this further by up to 3%.
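    On Linux, the static tuning described here can be reproduced by fixing the CPU frequency before a run, for example with the cpupower utility. The wrapper below is a rough sketch of such an experiment driver (it needs root and a CPU driver that allows fixed frequencies); the benchmark binary name is a placeholder.

```python
import subprocess
import time

def run_at_frequency(mhz, cmd):
    """Pin the CPUs to a fixed frequency, run cmd, and return wall time.

    Uses the standard `cpupower frequency-set -f` interface (needs root);
    systems without fixed-frequency support will reject the call.
    """
    subprocess.run(["cpupower", "frequency-set", "-f", f"{mhz}MHz"],
                   check=True)
    t0 = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - t0

# Hypothetical static-tuning sweep for a placeholder solver binary.
for mhz in (1200, 1800, 2400):
    seconds = run_at_frequency(mhz, ["./feti_benchmark"])
    print(f"{mhz} MHz: {seconds:.1f} s")
```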

  19. ITCA: Inter-Task Conflict-Aware CPU accounting for CMPs

    OpenAIRE

    Luque, Carlos; Moreto Planas, Miquel; Cazorla, Francisco; Gioiosa, Roberto; Buyuktosunoglu, Alper; Valero Cortés, Mateo

    2009-01-01

    Chip-MultiProcessor (CMP) architectures are becoming more and more popular as an alternative to traditional processors that only extract instruction-level parallelism from an application. CMPs introduce complexities when accounting for CPU utilization, due to the fact that the progress made by an application during an interval of time highly depends on the activity of the other applications it is co-scheduled with. In this paper, we identify how an inaccurate measurement of the CPU ut...

  20. Evaluation of the CPU time for solving the radiative transfer equation with high-order resolution schemes applying the normalized weighting-factor method

    Science.gov (United States)

    Xamán, J.; Zavala-Guillén, I.; Hernández-López, I.; Uriarte-Flores, J.; Hernández-Pérez, I.; Macías-Melo, E. V.; Aguilar-Castro, K. M.

    2018-03-01

    In this paper, we evaluated the convergence rate (CPU time) of a new mathematical formulation for the numerical solution of the radiative transfer equation (RTE) with several High-Order (HO) and High-Resolution (HR) schemes. In computational fluid dynamics, this procedure is known as the Normalized Weighting-Factor (NWF) method, and it is adopted here. The NWF method is used to incorporate the high-order resolution schemes into the discretized RTE. The NWF method is compared, in terms of the computer time needed to obtain a converged solution, with the widely used deferred-correction (DC) technique for the calculation of a two-dimensional cavity with emitting-absorbing-scattering gray media using the discrete ordinates method. Six parameters, viz. the grid size, the order of quadrature, the absorption coefficient, the emissivity of the boundary surface, the under-relaxation factor, and the scattering albedo, are considered to evaluate ten schemes. The results showed that, using the DC method, the scheme with the lowest CPU time is in general the SOU. In contrast with the results of the DC procedure, the CPU times for the DIAMOND and QUICK schemes using the NWF method are shown to be between 3.8 and 23.1% faster and between 12.6 and 56.1% faster, respectively. However, the other schemes are more time consuming when the NWF is used instead of the DC method. Additionally, a second test case was presented, and the results showed that, depending on the problem under consideration, the NWF procedure may be computationally faster or slower than the DC method. As an example, the CPU times for the QUICK and SMART schemes are 61.8% and 203.7% slower, respectively, when the NWF formulation is used for the second test case. Finally, future research is required to explore the computational cost of the NWF method in more complex problems.

  1. A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection.

    Directory of Open Access Journals (Sweden)

    Chun-Liang Lee

    The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection on various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphics processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms.
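    The division of labor described here, a cheap CPU pre-filter that forwards only suspicious payloads to the expensive matcher, can be illustrated compactly. The sketch below checks payloads for any two-byte pattern prefix before running a full multi-pattern match; it mimics the structure of HPMA, not its actual pre-filtering algorithm, and the heavy stage runs on the CPU here for simplicity.

```python
PATTERNS = [b"cmd.exe", b"/etc/passwd", b"<script>"]
PREFIXES = {p[:2] for p in PATTERNS}  # cheap 2-byte pre-filter table

def prefilter(payload: bytes) -> bool:
    """Fast CPU-side pass: does any 2-byte pattern prefix occur at all?"""
    return any(payload[i:i + 2] in PREFIXES for i in range(len(payload) - 1))

def full_match(payload: bytes):
    """Stand-in for the heavy matcher (AC automaton / GPU kernel)."""
    return [p for p in PATTERNS if p in payload]

def inspect(packets):
    suspicious = [pkt for pkt in packets if prefilter(pkt)]        # CPU stage
    return {pkt: full_match(pkt) for pkt in suspicious}            # "GPU" stage

print(inspect([b"GET /index.html", b"GET /../../etc/passwd"]))
```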

  2. Event- and Time-Driven Techniques Using Parallel CPU-GPU Co-processing for Spiking Neural Networks.

    Science.gov (United States)

    Naveros, Francisco; Garrido, Jesus A; Carrillo, Richard R; Ros, Eduardo; Luque, Niceto R

    2017-01-01

    Modeling and simulating the neural structures which make up our central neural system is instrumental for deciphering the computational neural cues beneath. Higher levels of biological plausibility usually impose higher levels of complexity in mathematical modeling, from neural to behavioral levels. This paper focuses on overcoming the simulation problems (accuracy and performance) derived from using higher levels of mathematical complexity at a neural level. This study proposes different techniques for simulating neural models that hold incremental levels of mathematical complexity: leaky integrate-and-fire (LIF), adaptive exponential integrate-and-fire (AdEx), and Hodgkin-Huxley (HH) neural models (ranged from low to high neural complexity). The studied techniques are classified into two main families depending on how the neural-model dynamic evaluation is computed: the event-driven or the time-driven families. Whilst event-driven techniques pre-compile and store the neural dynamics within look-up tables, time-driven techniques compute the neural dynamics iteratively during the simulation time. We propose two modifications for the event-driven family: a look-up table recombination to better cope with the incremental neural complexity together with a better handling of the synchronous input activity. Regarding the time-driven family, we propose a modification in computing the neural dynamics: the bi-fixed-step integration method. This method automatically adjusts the simulation step size to better cope with the stiffness of the neural model dynamics running in CPU platforms. One version of this method is also implemented for hybrid CPU-GPU platforms. Finally, we analyze how the performance and accuracy of these modifications evolve with increasing levels of neural complexity. We also demonstrate how the proposed modifications which constitute the main contribution of this study systematically outperform the traditional event- and time-driven techniques under
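    One plausible reading of the bi-fixed-step method is that it alternates between two fixed step sizes: a coarse step while a neuron is far from threshold and a fine step where the dynamics stiffen near a spike. The LIF sketch below illustrates that switch with forward Euler integration; the parameter values and the switching rule are assumptions, and this is not the simulator implementation evaluated in the paper.

```python
# Leaky integrate-and-fire parameters (illustrative values).
TAU, V_REST, V_TH, V_RESET, R = 20.0, -65.0, -50.0, -70.0, 10.0  # ms, mV, Mohm
DT_COARSE, DT_FINE = 1.0, 0.1  # ms: the two fixed step sizes

def simulate_lif(i_input, t_end=200.0):
    v, t, spikes = V_REST, 0.0, []
    while t < t_end:
        # Switch to the fine step when close to threshold (assumed rule).
        dt = DT_FINE if v > V_TH - 5.0 else DT_COARSE
        dv = (-(v - V_REST) + R * i_input) / TAU  # LIF membrane equation
        v += dt * dv
        t += dt
        if v >= V_TH:
            spikes.append(t)
            v = V_RESET
    return spikes

print(len(simulate_lif(i_input=2.0)), "spikes")
```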

  3. Thermoelectric mini cooler coupled with micro thermosiphon for CPU cooling system

    International Nuclear Information System (INIS)

    Liu, Di; Zhao, Fu-Yun; Yang, Hong-Xing; Tang, Guang-Fa

    2015-01-01

    In the present study, a thermoelectric mini cooler coupled with a micro thermosiphon cooling system has been proposed for CPU cooling. A mathematical model of heat transfer, based on a one-dimensional treatment of the thermal and electric fields, is first established for the thermoelectric module. Analytical results demonstrate the relationship between the maximal COP (Coefficient of Performance) and the cooling capacity Q_c with respect to the figure of merit. Full-scale experiments have been conducted to investigate the effect of the thermoelectric operating voltage, the power input of the heat source, and the number of thermoelectric modules on the performance of the cooling system. Experimental results indicated that the cooling production increases as the thermoelectric operating voltage rises. The surface temperature of the CPU heat source increases linearly with the power input, reaching a maximum of 70 °C at a prototype CPU power input of 84 W. Insulation between the air and the heat source surface can prevent condensate water due to low surface temperature. In addition, the thermal performance of this cooling system can be enhanced when the total dimension of the thermoelectric module matches the dimension of the CPU. This research could benefit the design of thermal dissipation for electronic chips and CPUs. - Highlights: • A cooling system coupled with a thermoelectric module and a loop thermosiphon is developed. • The thermoelectric module coupled with a loop thermosiphon can achieve high heat-transfer efficiency. • A mathematical model of thermoelectric cooling is built. • An analysis of modeling results for design and experimental data is presented. • The influence of power input and operating voltage on the cooling system is investigated
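    The standard one-dimensional thermoelectric model behind this kind of analysis expresses a module's cooling capacity Q_c and COP through its Seebeck coefficient, electrical resistance, and thermal conductance. The sketch below implements those textbook relations with illustrative parameter values; it is not the authors' specific model or module data.

```python
def te_performance(alpha, R, K, I, t_c, t_h):
    """One-dimensional thermoelectric module model (textbook relations).

    alpha: Seebeck coefficient [V/K], R: electrical resistance [ohm],
    K: thermal conductance [W/K], I: current [A], temperatures in K.
    """
    q_c = alpha * I * t_c - 0.5 * I**2 * R - K * (t_h - t_c)  # cooling capacity
    p_in = alpha * I * (t_h - t_c) + I**2 * R                 # electrical input
    return q_c, (q_c / p_in if p_in > 0 else float("nan"))    # Q_c and COP

# Illustrative module parameters; sweep current to locate the best COP.
for amps in (1.0, 2.0, 3.0, 4.0):
    q_c, cop = te_performance(alpha=0.05, R=2.0, K=0.5, I=amps,
                              t_c=300.0, t_h=320.0)
    print(f"I={amps:.0f} A: Q_c={q_c:.1f} W, COP={cop:.2f}")
```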

  4. An FPGA Based Multiprocessing CPU for Beam Synchronous Timing in CERN's SPS and LHC

    CERN Document Server

    Ballester, F J; Gras, J J; Lewis, J; Savioz, J J; Serrano, J

    2003-01-01

    The Beam Synchronous Timing system (BST) will be used around the LHC and its injector, the SPS, to broadcast timing messages and synchronize actions with the beam in different receivers. To achieve beam synchronization, the BST Master card encodes messages using the bunch clock, with a nominal value of 40.079 MHz for the LHC. These messages are produced by a set of tasks every revolution period, which is every 89 µs for the LHC and every 23 µs for the SPS, therefore imposing a hard real-time constraint on the system. To achieve determinism, the BST Master uses a dedicated CPU inside its main Field Programmable Gate Array (FPGA) featuring zero-delay hardware task switching and a reduced instruction set. This paper describes the BST Master card, stressing the main FPGA design, as well as the associated software, including the LynxOS driver and the tailor-made assembler.

  5. Improving the Performance of CPU Architectures by Reducing the Operating System Overhead (Extended Version)

    Directory of Open Access Journals (Sweden)

    Zagan Ionel

    2016-07-01

    Predictable CPU architectures that run hard real-time tasks must execute them in isolation in order to provide timing-analyzable execution for real-time systems. The major problems for real-time operating systems are caused by excessive jitter, introduced mainly through task switching. This can violate deadline requirements and, consequently, the predictability of hard real-time tasks. New requirements also arise for real-time operating systems used in mixed-criticality systems, when the execution of hard real-time applications requires timing predictability. The present article discusses several solutions to improve the performance of CPU architectures and eventually overcome the overhead drawbacks of operating systems. This paper focuses on the innovative CPU implementation named nMPRA-MT, designed for small real-time applications. This implementation uses replication and remapping techniques for the program counter, general-purpose registers, and pipeline registers, enabling multiple threads to share a single pipeline assembly line. In order to increase predictability, the proposed architecture partially removes hazard situations at the expense of a larger execution latency per instruction.

  6. A high performance image processing platform based on CPU-GPU heterogeneous cluster with parallel image reconstructions for micro-CT

    International Nuclear Information System (INIS)

    Ding Yu; Qi Yujin; Zhang Xuezhu; Zhao Cuilan

    2011-01-01

    In this paper, we report the development of a high-performance image processing platform based on a CPU-GPU heterogeneous cluster. Currently, it consists of Dell Precision T7500 and HP XW8600 workstations with a parallel programming and runtime environment using the message-passing interface (MPI) and CUDA (Compute Unified Device Architecture). We succeeded in developing parallel image processing techniques for 3D image reconstruction in X-ray micro-CT imaging. The results show that a GPU computes about 194 times faster than a single CPU, and the CPU-GPU cluster computes about 46 times faster than the CPU cluster. These results meet the requirements of rapid 3D image reconstruction and real-time image display. In conclusion, the use of a CPU-GPU heterogeneous cluster is an effective way to build a high-performance image processing platform. (authors)
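    A cluster like this typically distributes reconstruction work across nodes with MPI while each node's GPU runs the heavy kernels. The fragment below sketches only the MPI side using mpi4py (scatter projection chunks, gather partial results); the reconstruction function is a placeholder, not the authors' micro-CT code.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    projections = np.random.rand(size * 8, 256, 256)  # fake sinogram stack
    chunks = np.array_split(projections, size)        # one chunk per rank
else:
    chunks = None

local = comm.scatter(chunks, root=0)

def reconstruct(chunk):
    # Placeholder for the per-node GPU back-projection kernel.
    return chunk.mean(axis=0)

result = comm.gather(reconstruct(local), root=0)
if rank == 0:
    print(len(result), "partial volumes gathered")
```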

  7. Online performance evaluation of RAID 5 using CPU utilization

    Science.gov (United States)

    Jin, Hai; Yang, Hua; Zhang, Jiangling

    1998-09-01

    Redundant arrays of independent disks (RAID) technology is an efficient way to solve the bottleneck problem between CPU processing ability and the I/O subsystem. From the system point of view, the most important metric of on-line performance is the utilization of the CPU. This paper first shows how to calculate the CPU utilization of a system connected to a RAID level 5 subsystem using a statistical averaging method. The simulation results for the CPU utilization of a system connected to a RAID level 5 subsystem show that using multiple disks as an array to access data in parallel is an efficient way to enhance the on-line performance of a disk storage system. Using high-end disk drives to compose the disk array is the key to enhancing the on-line performance of the system.

  8. Heterogeneous CPU-GPU moving targets detection for UAV video

    Science.gov (United States)

    Li, Maowen; Tang, Linbo; Han, Yuqi; Yu, Chunlei; Zhang, Chao; Fu, Huiquan

    2017-07-01

    Moving target detection is gaining popularity in civilian and military applications. On some motion-detection monitoring platforms, low-resolution stationary cameras are being replaced by moving HD cameras mounted on UAVs. The pixels of moving targets in HD video taken by a UAV are always in the minority, and the background of the frame is usually moving because of the motion of the UAV. The high computational cost of the algorithm prevents running it at higher frame resolutions. Hence, to solve the problem of moving target detection in UAV video, we propose a heterogeneous CPU-GPU moving target detection algorithm. More specifically, we use background registration to eliminate the impact of the moving background and frame differencing to detect small moving targets. In order to achieve real-time processing, we design a heterogeneous CPU-GPU framework for our method. The experimental results show that our method can detect the main moving targets in HD video taken by a UAV, with an average processing time of 52.16 ms per frame, which is fast enough to solve the problem.

  9. The Effect of NUMA Tunings on CPU Performance

    Science.gov (United States)

    Hollowell, Christopher; Caramarcu, Costin; Strecker-Kellogg, William; Wong, Antonio; Zaytsev, Alexandr

    2015-12-01

    Non-Uniform Memory Access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems where each processor is directly connected to separate memory. Indirect access to another CPU's (remote) RAM is still possible, but such requests are slower as they must also pass through that memory's controlling CPU. In concert with a NUMA-aware operating system, the NUMA hardware architecture can help eliminate the memory performance reductions generally seen in SMP systems when multiple processors simultaneously attempt to access memory. The x86 CPU architecture has supported NUMA for a number of years. Modern operating systems such as Linux support NUMA-aware scheduling, where the OS attempts to schedule a process to the CPU directly attached to the majority of its RAM. In Linux, it is possible to further manually tune the NUMA subsystem using the numactl utility. With the release of Red Hat Enterprise Linux (RHEL) 6.3, the numad daemon became available in this distribution. This daemon monitors a system's NUMA topology and utilization, and automatically makes adjustments to optimize locality. As the number of cores in x86 servers continues to grow, efficient NUMA mappings of processes to CPUs/memory will become increasingly important. This paper gives a brief overview of NUMA, and discusses the effects of manual tunings and numad on the performance of the HEPSPEC06 benchmark and ATLAS software.
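    The manual tuning examined in this record is typically applied by pinning a process and its allocations to one node with numactl. The snippet below shows such a binding being launched from Python; the workload path is a placeholder, and the flags shown are the standard numactl options.

```python
import subprocess

# Inspect the machine's NUMA topology (numactl --hardware prints the
# nodes, their CPU lists, and per-node memory).
subprocess.run(["numactl", "--hardware"], check=True)

# Pin both execution and allocations to NUMA node 0 for a benchmark run.
# --cpunodebind restricts scheduling; --membind restricts page allocation.
subprocess.run(
    ["numactl", "--cpunodebind=0", "--membind=0", "./hepspec_workload"],
    check=True)
```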

  10. The Effect of NUMA Tunings on CPU Performance

    International Nuclear Information System (INIS)

    Hollowell, Christopher; Caramarcu, Costin; Strecker-Kellogg, William; Wong, Antonio; Zaytsev, Alexandr

    2015-01-01

    Non-Uniform Memory Access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems where each processor is directly connected to separate memory. Indirect access to another CPU's (remote) RAM is still possible, but such requests are slower as they must also pass through that memory's controlling CPU. In concert with a NUMA-aware operating system, the NUMA hardware architecture can help eliminate the memory performance reductions generally seen in SMP systems when multiple processors simultaneously attempt to access memory. The x86 CPU architecture has supported NUMA for a number of years. Modern operating systems such as Linux support NUMA-aware scheduling, where the OS attempts to schedule a process to the CPU directly attached to the majority of its RAM. In Linux, it is possible to further manually tune the NUMA subsystem using the numactl utility. With the release of Red Hat Enterprise Linux (RHEL) 6.3, the numad daemon became available in this distribution. This daemon monitors a system's NUMA topology and utilization, and automatically makes adjustments to optimize locality. As the number of cores in x86 servers continues to grow, efficient NUMA mappings of processes to CPUs/memory will become increasingly important. This paper gives a brief overview of NUMA, and discusses the effects of manual tunings and numad on the performance of the HEPSPEC06 benchmark and ATLAS software. (paper)

  11. GeantV: from CPU to accelerators

    Science.gov (United States)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Arora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Sehgal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While modern CPU architectures are being targeted first, resources such as GPGPUs, Intel Xeon Phi, Atom or ARM cannot be ignored anymore by HEP CPU-bound applications. The proof-of-concept GeantV prototype has been mainly engineered for CPUs with vector units, but we have foreseen from the early stages a bridge to arbitrary accelerators. A software layer consisting of architecture/technology-specific backends currently supports this concept. This approach makes it possible to abstract out basic types such as scalar/vector, and also to formalize generic computation kernels using, transparently, library- or device-specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, as a bonus it comes with the insulation of the core application and algorithms from the technology layer. This allows our application to be maintainable in the long term and versatile to changes on the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel Xeon Phi KNC architecture. We present the scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We also describe the current work and preliminary results for using the GeantV transport kernel on GPUs.

  12. GeantV: from CPU to accelerators

    International Nuclear Information System (INIS)

    Amadio, G; Bianchini, C; Iope, R; Ananya, A; Arora, A; Apostolakis, J; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S; Lima, G; Duhem, L

    2016-01-01

    The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While modern CPU architectures are being targeted first, resources such as GPGPUs, Intel Xeon Phi, Atom or ARM cannot be ignored anymore by HEP CPU-bound applications. The proof-of-concept GeantV prototype has been mainly engineered for CPUs with vector units, but we have foreseen from the early stages a bridge to arbitrary accelerators. A software layer consisting of architecture/technology-specific backends currently supports this concept. This approach makes it possible to abstract out basic types such as scalar/vector, and also to formalize generic computation kernels using, transparently, library- or device-specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, as a bonus it comes with the insulation of the core application and algorithms from the technology layer. This allows our application to be maintainable in the long term and versatile to changes on the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel Xeon Phi KNC architecture. We present the scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We also describe the current work and preliminary results for using the GeantV transport kernel on GPUs. (paper)

  13. Performance analysis of the FDTD method applied to holographic volume gratings: Multi-core CPU versus GPU computing

    Science.gov (United States)

    Francés, J.; Bleda, S.; Neipp, C.; Márquez, A.; Pascual, I.; Beléndez, A.

    2013-03-01

    The finite-difference time-domain method (FDTD) allows electromagnetic field distribution analysis as a function of time and space. The method is applied to analyze holographic volume gratings (HVGs) for the near-field distribution at optical wavelengths. Usually, this application requires the simulation of wide areas, which implies more memory and processing time. In this work, we propose a specific implementation of the FDTD method including several add-ons for a precise simulation of optical diffractive elements. Values in the near-field region are computed considering the illumination of the grating by means of a plane wave for different angles of incidence, including absorbing boundaries as well. We compare the results obtained by FDTD with those obtained using a matrix method (MM) applied to diffraction gratings. In addition, we have developed two optimized versions of the algorithm, for both CPU and GPU, in order to analyze the improvement of the new NVIDIA Fermi GPU architecture versus a highly tuned multi-core CPU as a function of simulation size. In particular, the optimized CPU implementation takes advantage of the arithmetic and data-transfer streaming SIMD (single instruction multiple data) extensions (SSE) included explicitly in the code, and also of multi-threading by means of OpenMP directives. A good agreement between the results obtained using both the FDTD and MM methods is obtained, thus validating our methodology. Moreover, the performance of the GPU is compared to the SSE+OpenMP CPU implementation, and it is quantitatively determined that a highly optimized CPU program can be competitive for a wide range of simulation sizes, whereas GPU computing becomes more powerful for large-scale simulations.
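    The heart of an FDTD code is a leapfrog update over interleaved field arrays, exactly the kind of loop that SSE, OpenMP, or a GPU accelerates. The NumPy sketch below shows a vectorized 1D version of that update in free space with normalized units; it is a minimal illustration, not the authors' 2D grating simulator.

```python
import numpy as np

# 1D FDTD in free space, normalized units (Courant number = 1).
n_cells, n_steps = 400, 1000
ez = np.zeros(n_cells)      # electric field samples
hy = np.zeros(n_cells - 1)  # magnetic field, staggered half a cell

for t in range(n_steps):
    # Leapfrog updates: each line is a fully vectorized sweep, the part
    # that SSE intrinsics / OpenMP / CUDA parallelize in the paper.
    hy += ez[1:] - ez[:-1]
    ez[1:-1] += hy[1:] - hy[:-1]
    ez[50] += np.exp(-((t - 30) / 10.0) ** 2)  # soft Gaussian source

print("peak |Ez| =", np.abs(ez).max())
```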

  14. Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

    Science.gov (United States)

    Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

    2010-10-01

    Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for the computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel Core 2 Quad Q6600 CPU and a GeForce 8800GT GPU, with software support from OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus one core of the CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setup (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setup (a), 16.8 in setup (b), and 20.0 in setup (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies.

  15. Reconstruction of the neutron spectrum using an artificial neural network in CPU and GPU; Reconstruccion del espectro de neutrones usando una red neuronal artificial (RNA) en CPU y GPU

    Energy Technology Data Exchange (ETDEWEB)

    Hernandez D, V. M.; Moreno M, A.; Ortiz L, M. A. [Universidad de Cordoba, 14002 Cordoba (Spain); Vega C, H. R.; Alonso M, O. E., E-mail: vic.mc68010@gmail.com [Universidad Autonoma de Zacatecas, 98000 Zacatecas, Zac. (Mexico)

    2016-10-15

    The computing power of personal computers has been steadily increasing; computers now have several processors in the CPU and, in addition, multiple CUDA cores in the graphics processing unit (GPU). Both systems can be used individually or combined to perform scientific computation without resorting to processor or supercomputing arrangements. The Bonner sphere spectrometer is the most commonly used multi-element system for detecting neutrons and their associated spectrum. Each sphere-detector combination gives a particular response that depends on the energy of the neutrons, and the total set of these responses is known as the response matrix Rφ(E). Thus, the counting rates obtained with each sphere and the neutron spectrum are related through the Fredholm equation in its discrete version. The reconstruction of the spectrum involves a system of poorly conditioned equations with an infinite number of solutions; to find the appropriate solution, the use of artificial intelligence through neural networks, on both CPU and GPU platforms, has been proposed. (Author)

  16. Research on the Prediction Model of CPU Utilization Based on ARIMA-BP Neural Network

    Directory of Open Access Journals (Sweden)

    Wang Jina

    2016-01-01

    Dynamic deployment of virtual machines is one of the current research focuses in cloud computing. Traditional methods mainly act after service performance has already degraded, and therefore usually lag. To solve this problem, a new prediction model based on CPU utilization is constructed in this paper. The new CPU utilization prediction model provides a reference for the VM dynamic deployment process, allowing deployment to finish before service performance degrades. This method not only ensures the quality of services but also improves server performance and resource utilization. The new prediction method for CPU utilization based on the ARIMA-BP neural network mainly includes four parts: preprocess the collected data, build the ARIMA-BP neural network prediction model, correct the nonlinear residuals of the time series with the BP prediction algorithm, and obtain the prediction results by analyzing the above data comprehensively.
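    The hybrid scheme described here follows a standard decomposition: fit an ARIMA model to the utilization series, train a neural network on the residuals that the linear model cannot explain, and add the two forecasts. The sketch below follows that recipe with statsmodels and scikit-learn's MLPRegressor standing in for the BP network; the data, model orders, and lag count are illustrative assumptions.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.neural_network import MLPRegressor

# Illustrative CPU-utilization series: periodic load cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(500)
series = 50 + 20 * np.sin(2 * np.pi * t / 48) + rng.normal(0, 3, 500)

# 1) Linear part: ARIMA model and its residuals.
arima = ARIMA(series, order=(2, 0, 2)).fit()
residuals = arima.resid

# 2) Nonlinear part: BP-style network trained on lagged residuals.
lags = 4
X = np.column_stack([residuals[i:len(residuals) - lags + i]
                     for i in range(lags)])
y = residuals[lags:]
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                   random_state=0).fit(X, y)

# 3) Combined one-step-ahead forecast.
linear = arima.forecast(steps=1)[0]
nonlinear = mlp.predict(residuals[-lags:].reshape(1, -1))[0]
print(f"predicted CPU utilization: {linear + nonlinear:.1f}%")
```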

  17. Semiempirical Quantum Chemical Calculations Accelerated on a Hybrid Multicore CPU-GPU Computing Platform.

    Science.gov (United States)

    Wu, Xin; Koslowski, Axel; Thiel, Walter

    2012-07-10

    In this work, we demonstrate that semiempirical quantum chemical calculations can be accelerated significantly by leveraging the graphics processing unit (GPU) as a coprocessor on a hybrid multicore CPU-GPU computing platform. Semiempirical calculations using the MNDO, AM1, PM3, OM1, OM2, and OM3 model Hamiltonians were systematically profiled for three types of test systems (fullerenes, water clusters, and solvated crambin) to identify the most time-consuming sections of the code. The corresponding routines were ported to the GPU and optimized employing both existing library functions and a GPU kernel that carries out a sequence of noniterative Jacobi transformations during pseudodiagonalization. The overall computation times for single-point energy calculations and geometry optimizations of large molecules were reduced by one order of magnitude for all methods, as compared to runs on a single CPU core.

  18. Designing of Vague Logic Based 2-Layered Framework for CPU Scheduler

    Directory of Open Access Journals (Sweden)

    Supriya Raheja

    2016-01-01

    Fuzzy-based CPU schedulers have attracted great interest in operating systems because of their ability to handle the imprecise information associated with tasks. This paper extends the fuzzy-based round robin scheduler to a Vague Logic Based Round Robin (VBRR) scheduler. The VBRR scheduler works on a 2-layered framework. At the first layer, the scheduler has a vague inference system that handles the impreciseness of tasks using vague logic. At the second layer, the Vague Logic Based Round Robin (VBRR) scheduling algorithm schedules the tasks. The VBRR scheduler has a learning capability based on which it intelligently adapts an optimum length for the time quantum. An optimum time quantum reduces the overhead on the scheduler by reducing unnecessary context switches, which improves the overall performance of the system. The work is simulated using MATLAB and compared with the conventional round robin scheduler and two other fuzzy-based approaches to CPU scheduling. The simulation analysis and results prove the effectiveness and efficiency of the VBRR scheduler.

  19. Conserved-peptide upstream open reading frames (CPuORFs) are associated with regulatory genes in angiosperms

    Directory of Open Access Journals (Sweden)

    Richard A Jorgensen

    2012-08-01

    Upstream open reading frames (uORFs) are common in eukaryotic transcripts, but those that encode conserved peptides (CPuORFs) occur in less than 1% of transcripts. The peptides encoded by three plant CPuORF families are known to control translation of the downstream ORF in response to a small signal molecule (sucrose, polyamines, and phosphocholine). In flowering plants, transcription factors are statistically over-represented among genes that possess CPuORFs, and in general it appeared that many CPuORF genes also had other regulatory functions, though the significance of this suggestion was uncertain (Hayden and Jorgensen, 2007). Five years later, the literature provides much more information on the functions of many CPuORF genes. Here we reassess the functions of 27 known CPuORF gene families and find that 22 of these families play a variety of different regulatory roles, from transcriptional control to protein turnover, and from small signal molecules to signal transduction kinases. Clearly then, there is indeed a strong association of CPuORFs with regulatory genes. In addition, 16 of these families play key roles in a variety of different biological processes. Most strikingly, the core sucrose response network includes three different CPuORFs, creating the potential for sophisticated balancing of the network in response to three different molecular inputs. We propose that the function of most CPuORFs is to modulate translation of a downstream major ORF (mORF) in response to a signal molecule recognized by the conserved peptide, and that, because the mORFs of CPuORF genes generally encode regulatory proteins, many of them centrally important in the biology of plants, CPuORFs play key roles in balancing such regulatory networks.

  20. Reconstruction of the neutron spectrum using an artificial neural network in CPU and GPU

    International Nuclear Information System (INIS)

    Hernandez D, V. M.; Moreno M, A.; Ortiz L, M. A.; Vega C, H. R.; Alonso M, O. E.

    2016-10-01

    The computing power of personal computers has been steadily increasing; computers now have several processors in the CPU and, in addition, multiple CUDA cores in the graphics processing unit (GPU). Both systems can be used individually or combined to perform scientific computation without resorting to processor or supercomputing arrangements. The Bonner sphere spectrometer is the most commonly used multi-element system for detecting neutrons and their associated spectrum. Each sphere-detector combination gives a particular response that depends on the energy of the neutrons, and the total set of these responses is known as the response matrix Rφ(E). Thus, the counting rates obtained with each sphere and the neutron spectrum are related through the Fredholm equation in its discrete version. The reconstruction of the spectrum involves a system of poorly conditioned equations with an infinite number of solutions; to find the appropriate solution, the use of artificial intelligence through neural networks, on both CPU and GPU platforms, has been proposed. (Author)

  1. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System

    Directory of Open Access Journals (Sweden)

    Yu Liu

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted graphics cards with Graphics Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, using the GPU capability only to do the SW computations one by one. Hence, in this paper, we propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search, using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on the GPU, a procedure is applied on the CPU using the frequency distance filtration scheme (FDFS) to eliminate unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.

  2. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    Science.gov (United States)

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphics card with Graphics Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only used the GPU capability to do the SW computations one by one. Hence, in this paper, we propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on the GPU, a procedure is applied on the CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.

  3. Performance of the OVERFLOW-MLP and LAURA-MLP CFD Codes on the NASA Ames 512 CPU Origin System

    Science.gov (United States)

    Taft, James R.

    2000-01-01

    aircraft are routinely undertaken. Typical large problems might require hundreds of Cray C90 CPU hours to complete. The dramatic performance gains with the 256-CPU Steger system are exciting. Obtaining results in hours instead of months is revolutionizing the way in which aircraft manufacturers are looking at future aircraft simulation work. Figure 2 below is a current state-of-the-art plot of OVERFLOW-MLP performance on the 512-CPU Lomax system. As can be seen, the chart indicates that OVERFLOW-MLP continues to scale linearly with CPU count up to 512 CPUs on a large 35-million-point full aircraft RANS simulation. At this point performance is such that a fully converged simulation of 2500 time steps is completed in less than 2 hours of elapsed time. Further work over the next few weeks will improve the performance of this code even further. The LAURA code has been converted to the MLP format as well. This code is currently being optimized for the 512-CPU system. Performance statistics indicate that the goal of 100 GFLOP/s will be achieved by year's end. This amounts to 20x the 16-CPU C90 result and strongly demonstrates the viability of the new parallel systems rapidly solving very large simulations in a production environment.

  4. Length-Bounded Hybrid CPU/GPU Pattern Matching Algorithm for Deep Packet Inspection

    Directory of Open Access Journals (Sweden)

    Yi-Shan Lin

    2017-01-01

    Full Text Available Since frequent communication between applications takes place in high-speed networks, deep packet inspection (DPI) plays an important role in network application awareness. The signature-based network intrusion detection system (NIDS) contains a DPI technique that examines the incoming packet payloads by employing a pattern matching algorithm that dominates the overall inspection performance. Existing studies focused on implementing efficient pattern matching algorithms by parallel programming on software platforms because of the advantages of lower cost and higher scalability. Either the central processing unit (CPU) or the graphics processing unit (GPU) was involved. Our studies focused on designing a pattern matching algorithm based on the cooperation between both CPU and GPU. In this paper, we present an enhanced design for our previous work, a length-bounded hybrid CPU/GPU pattern matching algorithm (LHPMA). In the preliminary experiment, the performance and comparison with the previous work are displayed, and the experimental results show that the LHPMA can achieve not only effective CPU/GPU cooperation but also higher throughput than the previous method.
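
    The length-bounded cooperation can be pictured as a simple dispatcher: payloads below a length bound are matched on the CPU, longer ones are batched for the GPU. A schematic C sketch follows; the cutoff value and the two handler functions are placeholders, not the LHPMA algorithm:

        #include <stdio.h>
        #include <string.h>

        #define LENGTH_BOUND 128   /* placeholder cutoff, tuned in a real system */

        /* Hypothetical hooks: a CPU matcher for short payloads and a batch
         * queue for payloads handed to a GPU kernel. */
        static void cpu_match(const char *payload, int len) {
            (void)payload;
            printf("CPU matches %d-byte payload\n", len);
        }
        static void enqueue_for_gpu(const char *payload, int len) {
            (void)payload;
            printf("queued %d-byte payload for GPU batch\n", len);
        }

        static void dispatch(const char *payload, int len) {
            if (len < LENGTH_BOUND)
                cpu_match(payload, len);       /* short packets: avoid transfer cost */
            else
                enqueue_for_gpu(payload, len); /* long packets: amortize GPU launch */
        }

        int main(void) {
            const char small[] = "GET / HTTP/1.1";
            char large[512]; memset(large, 'A', sizeof large);
            dispatch(small, (int)strlen(small));
            dispatch(large, (int)sizeof large);
            return 0;
        }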

  5. Interactive dose shaping - efficient strategies for CPU-based real-time treatment planning

    International Nuclear Information System (INIS)

    Ziegenhein, P; Kamerling, C P; Oelfke, U

    2014-01-01

    Conventional intensity modulated radiation therapy (IMRT) treatment planning is based on the traditional concept of iterative optimization using an objective function specified by dose volume histogram constraints for pre-segmented VOIs. This indirect approach suffers from unavoidable shortcomings: i) the control of local dose features is limited to segmented VOIs; ii) any objective function is a mathematical measure of plan quality, i.e., it is not able to define the clinically optimal treatment plan; iii) adapting an existing plan to changed patient anatomy as detected by IGRT procedures is difficult. To overcome these shortcomings, we introduce the method of Interactive Dose Shaping (IDS) as a new paradigm for IMRT treatment planning. IDS allows for direct and interactive manipulation of local dose features in real time. The key element driving the IDS process is a two-step Dose Modification and Recovery (DMR) strategy: a local dose modification is initiated by the user, which translates into modified fluence patterns; this also affects existing desired dose features elsewhere, which is compensated for by a heuristic recovery process. The IDS paradigm was implemented together with a CPU-based ultra-fast dose calculation and a 3D GUI for dose manipulation and visualization. A local dose feature can be implemented via the DMR strategy within 1-2 seconds. By imposing a series of local dose features, plan qualities equal to those of conventional planning could be achieved for prostate and head-and-neck cases within 1-2 minutes. The idea of Interactive Dose Shaping for treatment planning has been introduced and first applications of this concept have been realized.

  6. The CMSSW benchmarking suite: Using HEP code to measure CPU performance

    International Nuclear Information System (INIS)

    Benelli, G

    2010-01-01

    The demanding computing needs of the CMS experiment require thoughtful planning and management of its computing infrastructure. A key factor in this process is the use of realistic benchmarks when assessing the computing power of the different architectures available. In recent years a discrepancy has been observed between the CPU performance estimates given by the reference benchmark for HEP computing (SPECint) and the actual performance of HEP code. Making use of the CPU performance tools from the CMSSW performance suite, comparative CPU performance studies have been carried out on several architectures. A benchmarking suite has been developed and integrated in the CMSSW framework to allow computing centers and interested third parties to benchmark architectures directly with CMSSW. The CMSSW benchmarking suite can be used out of the box to test and compare several machines in terms of CPU performance and to report the different benchmarking scores (e.g., by processing step) and results at the desired level of detail. In this talk we briefly describe the CMSSW software performance suite, and in detail the CMSSW benchmarking suite client/server design, the performance data analysis and the available CMSSW benchmark scores. The experience in the use of HEP code for benchmarking will be discussed and CMSSW benchmark results presented.

  7. SU-E-J-60: Efficient Monte Carlo Dose Calculation On CPU-GPU Heterogeneous Systems

    Energy Technology Data Exchange (ETDEWEB)

    Xiao, K; Chen, D. Z; Hu, X. S [University of Notre Dame, Notre Dame, IN (United States); Zhou, B [Altera Corp., San Jose, CA (United States)

    2014-06-01

    Purpose: It is well known that the performance of GPU-based Monte Carlo dose calculation implementations is bounded by memory bandwidth. One major cause of this bottleneck is the random memory writing pattern in dose deposition, which leads to several memory efficiency issues on the GPU such as un-coalesced writing and atomic operations. We propose a new method to alleviate such issues on CPU-GPU heterogeneous systems, which achieves an overall performance improvement for Monte Carlo dose calculation. Methods: Dose deposition accumulates dose into the voxels of a dose volume along the trajectories of radiation rays. Our idea is to partition this procedure into the following three steps, each fine-tuned for the CPU or the GPU: (1) each GPU thread writes dose results with location information to a buffer in GPU memory, which achieves fully-coalesced and atomic-free memory transactions; (2) the dose results in the buffer are transferred to CPU memory; (3) the dose volume is constructed from the dose buffer on the CPU. We organize the processing of all radiation rays into streams. Since the steps within a stream use different hardware resources (i.e., GPU, DMA, CPU), we can overlap the execution of these steps for different streams by pipelining. Results: We evaluated our method using a Monte Carlo Convolution Superposition (MCCS) program and tested our implementation for various clinical cases on a heterogeneous system containing an Intel i7 quad-core CPU and an NVIDIA TITAN GPU. Compared with a straightforward MCCS implementation on the same system (using both CPU and GPU for radiation ray tracing), our method gained a 2-5X speedup without losing dose calculation accuracy. Conclusion: The results show that our new method improves the effective memory bandwidth and overall performance for MCCS on CPU-GPU systems. Our proposed method can also be applied to accelerate other Monte Carlo dose calculation approaches. This research was supported in part by NSF under Grants CCF
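
    A minimal C sketch of the buffered-deposition idea in step (3): rather than scattering atomic adds into the dose grid on the GPU, each deposition is recorded as a (voxel, dose) record and the transferred buffer is accumulated serially on the CPU. The record layout here is hypothetical:

        #include <stdio.h>

        #define NVOXELS 8

        /* One buffered deposition event: voxel index plus deposited dose.
         * In the scheme described above, the GPU writes such records
         * sequentially (coalesced, no atomics); the CPU then accumulates. */
        typedef struct { int voxel; double dose; } Deposit;

        static void accumulate(double *volume, const Deposit *buf, int n) {
            for (int i = 0; i < n; i++)
                volume[buf[i].voxel] += buf[i].dose;  /* serial, race-free on CPU */
        }

        int main(void) {
            double volume[NVOXELS] = {0};
            Deposit buf[] = {{2, 0.5}, {2, 0.25}, {5, 1.0}, {7, 0.1}};
            accumulate(volume, buf, 4);
            for (int v = 0; v < NVOXELS; v++)
                printf("voxel %d: %.2f\n", v, volume[v]);
            return 0;
        }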

  8. Acceleration of stereo-matching on multi-core CPU and GPU

    OpenAIRE

    Tian, Xu; Cockshott, Paul; Oehler, Susanne

    2014-01-01

    This paper presents an accelerated version of a dense stereo-correspondence algorithm for two different parallelism-enabled architectures, multi-core CPU and GPU. The algorithm is part of the vision system developed for a binocular robot-head in the context of the CloPeMa 1 research project. This research project focuses on the conception of a new clothes folding robot with real-time and high resolution requirements for the vision system. The performance analysis shows th...

  9. Fast CPU-based Monte Carlo simulation for radiotherapy dose calculation

    Science.gov (United States)

    Ziegenhein, Peter; Pirner, Sven; Kamerling, Cornelis Ph; Oelfke, Uwe

    2015-08-01

    Monte-Carlo (MC) simulations are considered to be the most accurate method for calculating dose distributions in radiotherapy. Their clinical application, however, is still limited by the long runtimes that conventional implementations of MC algorithms require to deliver sufficiently accurate results on high-resolution imaging data. In order to overcome this obstacle we developed the software package PhiMC, which is capable of computing precise dose distributions in a sub-minute time frame by leveraging the potential of modern many- and multi-core CPU-based computers. PhiMC is based on the well-verified dose planning method (DPM). We could demonstrate that PhiMC delivers dose distributions in excellent agreement with DPM. The multi-core implementation of PhiMC scales well between different computer architectures and achieves a speed-up of up to 37× compared to the original DPM code executed on a modern system. Furthermore, we could show that our CPU-based implementation on a modern workstation is between 1.25× and 1.95× faster than a well-known GPU implementation of the same simulation method on an NVIDIA Tesla C2050. Since CPUs can work with several hundred gigabytes of RAM, the typical GPU memory limitation does not apply to our implementation, and high-resolution clinical plans can be calculated.
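
    To illustrate the kind of multi-core scaling exploited by such codes (a toy sketch, not PhiMC itself), a Monte Carlo loop over independent particle histories parallelizes naturally with OpenMP, using a per-thread RNG state and a reduction over the accumulator; compile with -fopenmp:

        #include <stdio.h>
        #include <stdlib.h>
        #include <omp.h>

        /* Toy Monte Carlo: estimate the mean depth at which particles are
         * absorbed in a slab, absorbing with probability p at each step.
         * Histories are independent, so the loop parallelizes trivially. */
        int main(void) {
            const long histories = 1000000;
            const double p_absorb = 0.1;
            double total_depth = 0.0;

            #pragma omp parallel reduction(+:total_depth)
            {
                unsigned int seed = 1234u + (unsigned int)omp_get_thread_num();
                #pragma omp for
                for (long h = 0; h < histories; h++) {
                    int depth = 0;
                    while ((double)rand_r(&seed) / RAND_MAX > p_absorb)
                        depth++;
                    total_depth += depth;
                }
            }
            printf("mean absorption depth: %.3f steps\n", total_depth / histories);
            return 0;
        }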

  10. Inhibition of CPU0213, a Dual Endothelin Receptor Antagonist, on Apoptosis via Nox4-Dependent ROS in HK-2 Cells

    Directory of Open Access Journals (Sweden)

    Qing Li

    2016-06-01

    Full Text Available Background/Aims: Our previous studies have indicated that the novel endothelin receptor antagonist CPU0213 effectively normalized renal function in diabetic nephropathy. However, the molecular mechanisms mediating the nephroprotective role of CPU0213 remain unknown. Methods and Results: In the present study, we first examined the effect of CPU0213 on apoptosis in a human renal tubular epithelial cell line (HK-2). It was shown that high glucose significantly increased the protein expression of Bax and decreased Bcl-2 protein in HK-2 cells, which was reversed by CPU0213. The percentage of HK-2 cells that showed Annexin V-FITC binding was markedly suppressed by CPU0213, which confirmed the inhibitory role of CPU0213 in apoptosis. Given the link between the endothelin (ET) system and oxidative stress, we determined the role of redox signaling in the regulation of apoptosis by CPU0213. It was demonstrated that the production of superoxide (O2·−) was substantially attenuated by CPU0213 treatment in HK-2 cells. We further found that CPU0213 dramatically inhibited the expression of Nox4 protein, whose gene silencing mimicked the effect of CPU0213 on apoptosis under high-glucose stimulation. We finally examined the effect of CPU0213 on ET-1 receptors and found that the high glucose-induced protein expression of endothelin A and B receptors was dramatically inhibited by CPU0213. Conclusion: Taken together, these results suggest that Nox4-dependent O2·− production is critical for the apoptosis of HK-2 cells in high glucose. The endothelin receptor antagonist CPU0213 has an anti-apoptotic role through Nox4-dependent O2·− production, which underlies the nephroprotective role of CPU0213 in diabetic nephropathy.

  11. Promise of a low power mobile CPU based embedded system in artificial leg control.

    Science.gov (United States)

    Hernandez, Robert; Zhang, Fan; Zhang, Xiaorong; Huang, He; Yang, Qing

    2012-01-01

    This paper presents the design and implementation of a low-power embedded system using mobile processor technology (Intel Atom™ Z530 processor) specifically tailored for a neural-machine interface (NMI) for artificial limbs. This embedded system effectively performs our previously developed NMI algorithm based on neuromuscular-mechanical fusion and phase-dependent pattern classification. The analysis shows that the NMI embedded system can meet real-time constraints with high accuracy in recognizing the user's locomotion mode. Our implementation utilizes the mobile processor efficiently, allowing a power consumption of 2.2 watts and low CPU utilization (less than 4.3%) while executing the complex NMI algorithm. Our experiments have shown that the highly optimized C implementation on the embedded system has clear advantages over existing PC-based MATLAB implementations. The study results suggest that a mobile-CPU-based embedded system is promising for implementing advanced control for powered lower-limb prostheses.

  12. Heterogeneous GPU&CPU Cluster for High Performance Computing in Cryptography

    Directory of Open Access Journals (Sweden)

    Michał Marks

    2012-01-01

    Full Text Available This paper addresses issues associated with distributed computing systems and the application of mixed GPU&CPU technology to data encryption and decryption algorithms. We describe a heterogeneous cluster HGCC formed by two types of nodes: Intel processor with NVIDIA graphics processing unit and AMD processor with AMD graphics processing unit (formerly ATI), and a novel software framework that hides the heterogeneity of our cluster and provides tools for solving complex scientific and engineering problems. Finally, we present the results of numerical experiments. The considered case study is concerned with parallel implementations of selected cryptanalysis algorithms. The main goal of the paper is to show the wide applicability of the GPU&CPU technology to large scale computation and data processing.

  13. LHCb: Statistical Comparison of CPU performance for LHCb applications on the Grid

    CERN Multimedia

    Graciani, R

    2009-01-01

    The usage of CPU resources by LHCb on the Grid is dominated by two different applications: Gauss and Brunel. Gauss is the application performing the Monte Carlo simulation of proton-proton collisions. Brunel is the application responsible for the reconstruction of the signals recorded by the detector, converting them into objects that can be used for later physics analysis of the data (tracks, clusters, …). Both applications are based on the Gaudi and LHCb software frameworks. Gauss uses Pythia and Geant as underlying libraries for the simulation of the collision and the subsequent passage of the generated particles through the LHCb detector, while Brunel makes use of LHCb-specific code to process the data from each sub-detector. Both applications are CPU bound. Large Monte Carlo productions or data reconstructions running on the Grid are an ideal benchmark to compare the performance of the different CPU models for each case. Since the processed events are only statistically comparable, only statistical comparison of the...

  14. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures

    Energy Technology Data Exchange (ETDEWEB)

    Souris, Kevin, E-mail: kevin.souris@uclouvain.be; Lee, John Aldo [Center for Molecular Imaging and Experimental Radiotherapy, Institut de Recherche Expérimentale et Clinique, Université catholique de Louvain, Avenue Hippocrate 54, 1200 Brussels, Belgium and ICTEAM Institute, Université catholique de Louvain, Louvain-la-Neuve 1348 (Belgium); Sterpin, Edmond [Center for Molecular Imaging and Experimental Radiotherapy, Institut de Recherche Expérimentale et Clinique, Université catholique de Louvain, Avenue Hippocrate 54, 1200 Brussels, Belgium and Department of Oncology, Katholieke Universiteit Leuven, O& N I Herestraat 49, 3000 Leuven (Belgium)

    2016-04-15

    Purpose: Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However, the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. Methods: A new Monte Carlo code, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually, while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Results: Comparisons with GATE/GEANT4 for various geometries show deviations within 2%–1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. Conclusions: MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus also be used for in vivo range verification.
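
    The hard/soft split at the heart of a class-II condensed history scheme can be sketched in a few lines; the sampling function and the 50 keV threshold below are illustrative placeholders, not MCsquare internals:

        #include <stdio.h>
        #include <stdlib.h>

        #define HARD_THRESHOLD_KEV 50.0  /* placeholder user-specified threshold */

        /* Placeholder sampler standing in for a real physics model. */
        static double sample_energy_loss_kev(void) {
            return 200.0 * rand() / (double)RAND_MAX; /* fake 0-200 keV losses */
        }

        int main(void) {
            double soft_accum = 0.0; /* soft losses grouped into one condensed step */
            int hard_events = 0;

            for (int i = 0; i < 20; i++) {
                double de = sample_energy_loss_kev();
                if (de > HARD_THRESHOLD_KEV) {
                    hard_events++;   /* simulate this ionization individually */
                    printf("hard event: %.1f keV (followed explicitly)\n", de);
                } else {
                    soft_accum += de; /* fold into the multiple-scattering step */
                }
            }
            printf("%d hard events; %.1f keV handled as one condensed soft step\n",
                   hard_events, soft_accum);
            return 0;
        }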

  15. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures

    International Nuclear Information System (INIS)

    Souris, Kevin; Lee, John Aldo; Sterpin, Edmond

    2016-01-01

    Purpose: Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However, the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. Methods: A new Monte Carlo code, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually, while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Results: Comparisons with GATE/GEANT4 for various geometries show deviations within 2%–1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. Conclusions: MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus also be used for in vivo range verification.

  16. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures.

    Science.gov (United States)

    Souris, Kevin; Lee, John Aldo; Sterpin, Edmond

    2016-04-01

    Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However, the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. A new Monte Carlo code, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually, while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Comparisons with GATE/GEANT4 for various geometries show deviations within 2%-1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus also be used for in vivo range verification.

  17. CPU and cache efficient management of memory-resident databases

    NARCIS (Netherlands)

    Pirk, H.; Funke, F.; Grund, M.; Neumann, T.; Leser, U.; Manegold, S.; Kemper, A.; Kersten, M.L.

    2013-01-01

    Memory-Resident Database Management Systems (MRDBMS) have to be optimized for two resources: CPU cycles and memory bandwidth. To optimize for bandwidth in mixed OLTP/OLAP scenarios, the hybrid or Partially Decomposed Storage Model (PDSM) has been proposed. However, in current implementations,

  18. CPU and Cache Efficient Management of Memory-Resident Databases

    NARCIS (Netherlands)

    H. Pirk (Holger); F. Funke; M. Grund; T. Neumann (Thomas); U. Leser; S. Manegold (Stefan); A. Kemper (Alfons); M.L. Kersten (Martin)

    2013-01-01

    Memory-Resident Database Management Systems (MRDBMS) have to be optimized for two resources: CPU cycles and memory bandwidth. To optimize for bandwidth in mixed OLTP/OLAP scenarios, the hybrid or Partially Decomposed Storage Model (PDSM) has been proposed. However, in current

  19. The relationship among CPU utilization, temperature, and thermal power for waste heat utilization

    International Nuclear Information System (INIS)

    Haywood, Anna M.; Sherbeck, Jon; Phelan, Patrick; Varsamopoulos, Georgios; Gupta, Sandeep K.S.

    2015-01-01

    Highlights: • This work graphs the three-way relationship among CPU utilization, temperature and power. • Using a custom-built cold plate, we were able to capture high-quality CPU-generated heat. • The work undertakes a radical approach using mineral oil to directly cool CPUs. • We found that it is possible to use CPU waste energy to power an absorption chiller. - Abstract: This work addresses the significant datacenter issues of growth in the number of computer servers and the subsequent electricity expenditure by proposing, analyzing and testing a unique idea of recycling the highest-quality waste heat generated by datacenter servers. The aim was to provide a renewable and sustainable energy source for use in cooling the datacenter. The work incorporates novel approaches in waste heat usage, graphing CPU temperature, power and utilization simultaneously, and a mineral oil experimental design and implementation. The work presented investigates and illustrates the quantity and quality of heat that can be captured from a variably tasked liquid-cooled microprocessor on a datacenter server blade. It undertakes a radical approach using mineral oil. The trials examine the feasibility of using the thermal energy from a CPU to drive a cooling process. Results indicate that 123 servers encapsulated in mineral oil can power a 10-ton chiller with a design point of 50.2 kW_th. Compared with water-cooling experiments, the mineral oil experiment mitigated the temperature drop between the heat source and discharge line by up to 81%. In addition, due to this reduction in temperature drop, the heat quality in the oil discharge line was up to 12.3 °C higher on average than for the water-cooled experiments. Furthermore, mineral oil cooling holds the potential to eliminate the 50% cooling expenditure which initially motivated this project.
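
    For orientation, the recoverable thermal power in such a cooling loop follows the standard sensible-heat relation (generic textbook physics, not a formula quoted from the record):

        \dot{Q} = \dot{m} \, c_p \, \Delta T

    where \dot{m} is the coolant mass flow rate, c_p its specific heat capacity, and \Delta T the inlet-to-discharge temperature rise; the 12.3 °C higher oil discharge temperature therefore translates directly into higher-grade heat for driving the absorption chiller at a given flow rate.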

  20. First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC

    CERN Document Server

    Halyo, V.; Lujan, P.; Karpusenko, V.; Vladimirov, A.

    2014-04-07

    Recent innovations focused around parallel processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA's Tesla Graphics Processing Unit (GPU) or Intel's Xeon Phi, in the High Level Trigger. These accelerators have the potential to provide faster or more energy efficient event selection, thus opening up possibilities for new complex triggers that were not previously feasible. At the same time, it is crucial to explore the performance limits achievable on the latest generation of multicore CPUs with the use of the best software optimization methods. In this article, a new tracking algorithm based on the Hough transform will be evaluated for the first time on a multi-core Intel Xeon E5-2697v2 CPU, an NVIDIA Tesla K20c GPU, and an Intel Xeon Phi coprocessor.

  1. A combined PLC and CPU approach to multiprocessor control

    International Nuclear Information System (INIS)

    Harris, J.J.; Broesch, J.D.; Coon, R.M.

    1995-10-01

    A sophisticated multiprocessor control system has been developed for use in the E-Power Supply System Integrated Control (EPSSIC) on the DIII-D tokamak. EPSSIC provides control and interlocks for the ohmic heating coil power supply and its associated systems. Of particular interest is the architecture of this system: both a Programmable Logic Controller (PLC) and a Central Processor Unit (CPU) have been combined on a standard VME bus. The PLC and CPU input and output signals are routed through signal conditioning modules, which provide the necessary voltage and ground isolation. Additionally, these modules adapt the signal levels to those of the VME I/O boards. One set of I/O signals is shared between the two processors. The resulting multiprocessor system provides a number of advantages: redundant operation for mission-critical situations, flexible communications using conventional TCP/IP protocols, the simplicity of ladder-logic programming for the majority of the control code, and an easily maintained and expandable non-proprietary system.

  2. High performance technique for database applicationsusing a hybrid GPU/CPU platform

    KAUST Repository

    Zidan, Mohammed A.

    2012-07-28

    Many database applications, such as sequence comparison, sequence searching, and sequence matching, process large database sequences. We introduce a novel and efficient technique to improve the performance of database applications by using a hybrid GPU/CPU platform. In particular, our technique solves the problem of the low efficiency resulting from running short-length sequences in a database on a GPU. To verify our technique, we applied it to the widely used Smith-Waterman algorithm. The experimental results show that our hybrid GPU/CPU technique improves the average performance by a factor of 2.2, and improves the peak performance by a factor of 2.8 when compared to earlier implementations. Copyright © 2011 by ASME.

  3. A CFD Heterogeneous Parallel Solver Based on Collaborating CPU and GPU

    Science.gov (United States)

    Lai, Jianqi; Tian, Zhengyu; Li, Hua; Pan, Sha

    2018-03-01

    Since the Graphics Processing Unit (GPU) has strong floating-point computation ability and high memory bandwidth for data parallelism, it has been widely used in general-purpose computing areas such as molecular dynamics (MD), computational fluid dynamics (CFD) and so on. The emergence of the compute unified device architecture (CUDA), which reduces the complexity of GPU programming, brings great opportunities to CFD. There are three different modes for the parallel solution of the NS equations: a parallel solver based on the CPU, a parallel solver based on the GPU, and a heterogeneous parallel solver based on collaborating CPU and GPU. GPUs are relatively rich in compute capacity but poor in memory capacity, and CPUs are the opposite. To make full use of both GPUs and CPUs, a CFD heterogeneous parallel solver based on collaborating CPU and GPU has been established. Three cases are presented to analyse the solver's computational accuracy and heterogeneous parallel efficiency. The numerical results agree well with experimental results, which demonstrates that the heterogeneous parallel solver has high computational precision. The speedup on a single GPU is more than 40 for laminar flow; it decreases for turbulent flow, but it can still reach more than 20. Moreover, the speedup increases as the grid size becomes larger.

  4. The ATLAS LVL2 trigger with FPGA processors: development, construction and demonstration of the hybrid FPGA/CPU-based processor system ATLANTIS

    CERN Document Server

    Singpiel, Holger

    2000-01-01

    This thesis describes the conception and implementation of the hybrid FPGA/CPU-based processing system ATLANTIS as a trigger processor for the proposed ATLAS experiment at CERN. CompactPCI provides the close coupling of a multi-FPGA system and a standard CPU. The system is scalable in computing power and flexible in use due to its partitioning into dedicated FPGA boards for computation, I/O tasks, and private communication. The research activities based on the ATLANTIS system focus on two areas of the second level trigger (LVL2). First, the major aim is the acceleration of time-critical B-physics trigger algorithms. The execution of the full-scan TRT algorithm on ATLANTIS, which has been used as a demonstrator, results in a speedup of 5.6 compared to a standard CPU. Second, the ATLANTIS system is used as a hardware platform for research work in conjunction with the ATLAS readout systems. For further studies a permanent installation of the ATLANTIS system in the LVL2 application testbed is f...

  5. Turbo Charge CPU Utilization in Fork/Join Using the ManagedBlocker

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Fork/Join is a framework for parallelizing calculations using recursive decomposition, also called divide and conquer. These algorithms occasionally end up duplicating work, especially at the beginning of a run. We can reduce wasted CPU cycles by implementing a reserved caching scheme: before a task starts its calculation, it tries to reserve an entry in a shared map. If it succeeds, it immediately begins. If not, it blocks until the other thread has finished its calculation. Unfortunately this can leave a significant number of threads blocked, decreasing CPU utilization. In this talk we demonstrate this issue and offer a solution in the form of the ManagedBlocker. Combined with Fork/Join, it can keep parallelism at the desired level.

  6. Liquid Cooling System for CPU by Electroconjugate Fluid

    Directory of Open Access Journals (Sweden)

    Yasuo Sakurai

    2014-06-01

    Full Text Available The power dissipated by the CPU in a personal computer has increased as performance has risen. Therefore, a liquid cooling system has been employed in some personal computers in order to improve their cooling performance. Electroconjugate fluid (ECF) is one of the functional fluids. ECF has the remarkable property that a strong jet flow is generated between electrodes when a high voltage is applied to the ECF through the electrodes. By using this strong jet flow, an ECF pump with a simple structure, no sliding parts, no noise, and no vibration can likely be developed. Using such an ECF pump, a new ECF-based liquid cooling system appears feasible. In this study, to realize this system, an ECF pump is proposed and fabricated, and its basic characteristics are investigated experimentally. Next, utilizing the ECF pump, a model of an ECF-based liquid cooling system is manufactured and some experiments are carried out to investigate the performance of this system. As a result, using this system, the temperature of a 50 W heat source is kept at or below 60°C. In general, a CPU is usually operated at or below this temperature.

  7. CPU0213, a novel endothelin type A and type B receptor antagonist, protects against myocardial ischemia/reperfusion injury in rats

    Directory of Open Access Journals (Sweden)

    Z.Y. Wang

    2011-11-01

    Full Text Available The efficacy of endothelin receptor antagonists in protecting against myocardial ischemia/reperfusion (I/R) injury is controversial, and the mechanisms remain unclear. The aim of this study was to investigate the effects of CPU0213, a novel endothelin type A and type B receptor antagonist, on myocardial I/R injury and to explore the mechanisms involved. Male Sprague-Dawley rats weighing 200-250 g were randomized to three groups (6-7 per group): group 1, sham; group 2, I/R + vehicle, in which rats were subjected to in vivo myocardial I/R injury by ligation of the left anterior descending coronary artery and 0.5% sodium carboxymethyl cellulose (1 mL/kg) was injected intraperitoneally immediately prior to coronary occlusion; and group 3, I/R + CPU0213, in which rats were subjected to identical surgical procedures and CPU0213 (30 mg/kg) was injected intraperitoneally immediately prior to coronary occlusion. Infarct size, cardiac function and biochemical changes were measured. CPU0213 pretreatment reduced infarct size as a percentage of the ischemic area by 44.5% (I/R + vehicle: 61.3 ± 3.2% vs I/R + CPU0213: 34.0 ± 5.5%, P < 0.05) and improved ejection fraction by 17.2% (I/R + vehicle: 58.4 ± 2.8% vs I/R + CPU0213: 68.5 ± 2.2%, P < 0.05) compared to vehicle-treated animals. This protection was associated with inhibition of myocardial inflammation and oxidative stress. Moreover, the reduction in Akt (protein kinase B) and endothelial nitric oxide synthase (eNOS) phosphorylation induced by myocardial I/R injury was limited by CPU0213 (P < 0.05). These data suggest that CPU0213, a non-selective antagonist, has protective effects against myocardial I/R injury in rats, which may be related to the Akt/eNOS pathway.

  8. High performance technique for database applicationsusing a hybrid GPU/CPU platform

    KAUST Repository

    Zidan, Mohammed A.; Bonny, Talal; Salama, Khaled N.

    2012-01-01

    Hybrid GPU/CPU platform. In particular, our technique solves the problem of the low efficiency resulting from running short-length sequences in a database on a GPU. To verify our technique, we applied it to the widely used Smith-Waterman algorithm

  9. Design Patterns for Sparse-Matrix Computations on Hybrid CPU/GPU Platforms

    Directory of Open Access Journals (Sweden)

    Valeria Cardellini

    2014-01-01

    Full Text Available We apply object-oriented software design patterns to develop code for scientific software involving sparse matrices. Design patterns arise when multiple independent developments produce similar designs which converge onto a generic solution. We demonstrate how to use design patterns to implement an interface for sparse matrix computations on NVIDIA GPUs starting from PSBLAS, an existing sparse matrix library, and from existing sets of GPU kernels for sparse matrices. We also compare the throughput of the PSBLAS sparse matrix–vector multiplication on two platforms exploiting the GPU with that obtained by a CPU-only PSBLAS implementation. Our experiments exhibit encouraging results regarding the comparison between CPU and GPU executions in double precision, obtaining a speedup of up to 35.35 on NVIDIA GTX 285 with respect to AMD Athlon 7750, and up to 10.15 on NVIDIA Tesla C2050 with respect to Intel Xeon X5650.
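
    For context, the kernel whose throughput is being compared here, sparse matrix-vector multiplication in CSR (compressed sparse row) format, is compact enough to show in full; this is a generic serial C sketch, not PSBLAS code:

        #include <stdio.h>

        /* y = A*x for a CSR matrix: row_ptr delimits each row's entries in
         * col_idx/val. Example matrix:
         *   | 4 0 1 |
         *   | 0 3 0 |
         *   | 2 0 5 |
         */
        static void spmv_csr(int nrows, const int *row_ptr, const int *col_idx,
                             const double *val, const double *x, double *y) {
            for (int i = 0; i < nrows; i++) {
                double sum = 0.0;
                for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                    sum += val[k] * x[col_idx[k]];
                y[i] = sum;
            }
        }

        int main(void) {
            const int row_ptr[] = {0, 2, 3, 5};
            const int col_idx[] = {0, 2, 1, 0, 2};
            const double val[]  = {4, 1, 3, 2, 5};
            const double x[] = {1, 1, 1};
            double y[3];
            spmv_csr(3, row_ptr, col_idx, val, x, y);
            printf("y = [%g, %g, %g]\n", y[0], y[1], y[2]);
            return 0;
        }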

  10. DSM vs. NSM: CPU Performance Tradeoffs in Block-Oriented Query Processing

    NARCIS (Netherlands)

    M. Zukowski (Marcin); N.J. Nes (Niels); P.A. Boncz (Peter)

    2008-01-01

    Comparisons between the merits of row-wise storage (NSM) and columnar storage (DSM) are typically made with respect to the persistent storage layer of database systems. In this paper, however, we focus on the CPU efficiency tradeoffs of tuple representations inside the query
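
    The tradeoff can be made concrete with the two in-memory layouts; this schematic C sketch (not the paper's code) shows why a single-column scan touches fewer cache lines under DSM than under NSM:

        #include <stdio.h>

        #define N 4

        /* NSM (row-wise): one struct per tuple; a scan over 'price' strides
         * across unrelated fields, pulling them into cache needlessly. */
        typedef struct { int id; double price; int qty; } RowTuple;

        /* DSM (columnar): one contiguous array per attribute; a scan over
         * 'price' reads only the bytes it needs. */
        typedef struct { int id[N]; double price[N]; int qty[N]; } ColumnStore;

        int main(void) {
            RowTuple rows[N] = {{1, 9.5, 3}, {2, 4.0, 1}, {3, 7.25, 8}, {4, 2.5, 2}};
            ColumnStore cols = {{1, 2, 3, 4}, {9.5, 4.0, 7.25, 2.5}, {3, 1, 8, 2}};

            double sum_nsm = 0, sum_dsm = 0;
            for (int i = 0; i < N; i++) sum_nsm += rows[i].price;  /* strided access */
            for (int i = 0; i < N; i++) sum_dsm += cols.price[i];  /* sequential access */
            printf("sum(price): NSM %.2f, DSM %.2f\n", sum_nsm, sum_dsm);
            return 0;
        }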

  11. HEP specific benchmarks of virtual machines on multi-core CPU architectures

    International Nuclear Information System (INIS)

    Alef, M; Gable, I

    2010-01-01

    Virtualization technologies such as Xen can be used in order to satisfy the disparate and often incompatible system requirements of different user groups in shared-use computing facilities. This capability is particularly important for HEP applications, which often have restrictive requirements. The use of virtualization adds flexibility; however, it is essential that the virtualization technology place little overhead on the HEP application. We present an evaluation of the practicality of running HEP applications in multiple Virtual Machines (VMs) on a single multi-core Linux system. We use the benchmark suite used by the HEPiX CPU Benchmarking Working Group to give a quantitative evaluation relevant to the HEP community. Benchmarks are packaged inside VMs and then the VMs are booted onto a single multi-core system. Benchmarks are then simultaneously executed on each VM to simulate highly loaded VMs running HEP applications. These techniques are applied to a variety of multi-core CPU architectures and VM configurations.

  12. Comparison of the CPU and memory performance of StatPatternRecognition (SPR) and Toolkit for MultiVariate Analysis (TMVA)

    International Nuclear Information System (INIS)

    Palombo, G.

    2012-01-01

    High Energy Physics data sets are often characterized by a huge number of events. Therefore, it is extremely important to use statistical packages able to efficiently analyze these unprecedented amounts of data. We compare the performance of the statistical packages StatPatternRecognition (SPR) and Toolkit for MultiVariate Analysis (TMVA), focusing on how the CPU time and memory usage of the learning process scale with data set size. As classifiers, we consider only Random Forests, Boosted Decision Trees and Neural Networks, each with specific settings. For our tests, we employ a data set widely used in the machine learning community, the “Threenorm” data set, as well as data tailored for testing various edge cases. For each data set, we steadily increase its size and check the CPU time and memory needed to build the classifiers implemented in SPR and TMVA. We show that SPR is often significantly faster and consumes significantly less memory. For example, the SPR implementation of Random Forest is an order of magnitude faster and consumes an order of magnitude less memory than TMVA on the Threenorm data.

  13. An efficient implementation of 3D high-resolution imaging for large-scale seismic data with GPU/CPU heterogeneous parallel computing

    Science.gov (United States)

    Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng

    2018-02-01

    De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, thereby making it an effective tool for 3D, high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM and yields 3D stationary-phase QPSTM, the associated computational efficiency is still the main problem in the processing of 3D, high-resolution images for real large-scale seismic data. In the current paper, we propose a division method for large-scale, 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPUs). We then design an imaging-point parallel strategy to achieve optimal parallel computing performance, and adopt an asynchronous double buffering scheme for multiple streams to perform the GPU/CPU parallel computing. Moreover, several key optimization strategies for computation and storage based on the compute unified device architecture (CUDA) were adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the implementation of the key optimization steps, including thread optimization, shared memory optimization, register optimization and special function units (SFUs), greatly improved the efficiency. A numerical example employing real large-scale, 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging for large-scale seismic data.

  14. Enhancing Leakage Power in CPU Cache Using Inverted Architecture

    OpenAIRE

    Bilal A. Shehada; Ahmed M. Serdah; Aiman Abu Samra

    2013-01-01

    Power consumption is an increasingly pressing problem in modern processor design. Since the on-chip caches usually consume a significant amount of power, power and energy consumption have become among the most important design constraints, and the cache is one of the most attractive targets for power reduction. This paper presents an approach to improve the dynamic power consumption of the CPU cache using an inverted cache architecture. Our assumption tries to reduce dynamic write power dissipatio...

  15. Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

    Science.gov (United States)

    Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

    2017-08-01

    We present an application of massively parallel processing of quantitative flow measurement data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150-fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on user's parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18. Nature of problem: Speed up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1.8 s for one B-scan (150 × faster in comparison to the CPU

  16. hybridMANTIS: a CPU-GPU Monte Carlo method for modeling indirect x-ray detectors with columnar scintillators

    Science.gov (United States)

    Sharma, Diksha; Badal, Andreu; Badano, Aldo

    2012-04-01

    The computational modeling of medical imaging systems often requires obtaining a large number of simulated images with low statistical uncertainty, which translates into prohibitive computing times. We describe a novel hybrid approach for Monte Carlo simulations that maximizes utilization of CPUs and GPUs in modern workstations. We apply the method to the modeling of indirect x-ray detectors using a new and improved version of the code MANTIS, an open source software tool used for the Monte Carlo simulations of indirect x-ray imagers. We first describe a GPU implementation of the physics and geometry models in fastDETECT2 (the optical transport model) and a serial CPU version of the same code. We discuss its new features, like on-the-fly column geometry and columnar crosstalk, in relation to the MANTIS code, and point out areas where our model provides more flexibility for the modeling of realistic columnar structures in large area detectors. Second, we modify PENELOPE (the open source software package that handles the x-ray and electron transport in MANTIS) to allow direct output of location and energy deposited during x-ray and electron interactions occurring within the scintillator. This information is then handled by optical transport routines in fastDETECT2. A load balancer dynamically allocates optical transport showers to the GPU and CPU computing cores. Our hybridMANTIS approach achieves a significant speed-up factor of 627 when compared to MANTIS and of 35 when compared to the same code running only in a CPU instead of a GPU. Using hybridMANTIS, we successfully hide hours of optical transport time by running it in parallel with the x-ray and electron transport, thus shifting the computational bottleneck from optical to x-ray transport. The new code requires much less memory than MANTIS and, as a result

  17. The “Chimera”: An Off-The-Shelf CPU/GPGPU/FPGA Hybrid Computing Platform

    Directory of Open Access Journals (Sweden)

    Ra Inta

    2012-01-01

    Full Text Available The nature of modern astronomy means that a number of interesting problems exhibit a substantial computational bound and this situation is gradually worsening. Scientists, increasingly fighting for valuable resources on conventional high-performance computing (HPC) facilities, often with a limited customizable user environment, are increasingly looking to hardware acceleration solutions. We describe here a heterogeneous CPU/GPGPU/FPGA desktop computing system (the “Chimera”), built with commercial-off-the-shelf components. We show that this platform may be a viable alternative solution to many common computationally bound problems found in astronomy, however, not without significant challenges. The most significant bottleneck in pipelines involving real data is most likely to be the interconnect (in this case the PCI Express bus residing on the CPU motherboard). Finally, we speculate on the merits of our Chimera system on the entire landscape of parallel computing, through the analysis of representative problems from UC Berkeley’s “Thirteen Dwarves.”

  18. Cpu/gpu Computing for AN Implicit Multi-Block Compressible Navier-Stokes Solver on Heterogeneous Platform

    Science.gov (United States)

    Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin

    2016-06-01

    CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double precision alternating direction implicit (ADI) solver for the three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software onto a heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove a lot of redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern, MPI-OpenMP-CUDA, that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on heterogeneous platforms.

  19. Finite difference numerical method for the superlattice Boltzmann transport equation and case comparison of CPU(C) and GPU(CUDA) implementations

    Energy Technology Data Exchange (ETDEWEB)

    Priimak, Dmitri

    2014-12-01

    We present a finite difference numerical algorithm for solving the two-dimensional, spatially homogeneous Boltzmann transport equation which describes electron transport in a semiconductor superlattice subject to crossed time-dependent electric and constant magnetic fields. The algorithm is implemented both in C, targeted at CPUs, and in CUDA C, targeted at commodity NVIDIA GPUs. We compare the performance and merits of one implementation versus the other and discuss various software optimisation techniques.
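
    As a generic illustration of the finite-difference approach (a one-dimensional upwind advection step with a relaxation term, using invented parameters rather than the authors' discretization of the superlattice transport equation):

        #include <stdio.h>

        #define NK 16          /* momentum-space grid points */

        /* One explicit finite-difference step of df/dt = -a*df/dk - (f - feq)/tau,
         * a toy transport-plus-relaxation equation on a periodic grid. */
        static void step(double *f, const double *feq, double a, double tau,
                         double dk, double dt) {
            double fnew[NK];
            for (int i = 0; i < NK; i++) {
                int im = (i + NK - 1) % NK;                 /* periodic upwind neighbor */
                double dfdk = (f[i] - f[im]) / dk;
                fnew[i] = f[i] - dt * (a * dfdk + (f[i] - feq[i]) / tau);
            }
            for (int i = 0; i < NK; i++) f[i] = fnew[i];
        }

        int main(void) {
            double f[NK] = {0}, feq[NK];
            f[0] = 1.0;                                     /* initial spike */
            for (int i = 0; i < NK; i++) feq[i] = 1.0 / NK; /* uniform equilibrium */
            for (int n = 0; n < 100; n++)
                step(f, feq, 1.0, 5.0, 1.0, 0.1);
            printf("f[0] after 100 steps: %.4f\n", f[0]);
            return 0;
        }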

  20. Finite difference numerical method for the superlattice Boltzmann transport equation and case comparison of CPU(C) and GPU(CUDA) implementations

    International Nuclear Information System (INIS)

    Priimak, Dmitri

    2014-01-01

    We present a finite difference numerical algorithm for solving the two-dimensional, spatially homogeneous Boltzmann transport equation which describes electron transport in a semiconductor superlattice subject to crossed time-dependent electric and constant magnetic fields. The algorithm is implemented both in C, targeted at CPUs, and in CUDA C, targeted at commodity NVIDIA GPUs. We compare the performance and merits of one implementation versus the other and discuss various software optimisation techniques.

  1. Fast data reconstructed method of Fourier transform imaging spectrometer based on multi-core CPU

    Science.gov (United States)

    Yu, Chunchao; Du, Debiao; Xia, Zongze; Song, Li; Zheng, Weijian; Yan, Min; Lei, Zhenggang

    2017-10-01

    An imaging spectrometer can acquire a two-dimensional spatial image and a one-dimensional spectrum at the same time, which shows high utility in color and spectral measurements, true-color image synthesis, military reconnaissance and so on. In order to realize fast reconstruction of Fourier transform imaging spectrometer data, this paper designed an optimized reconstruction algorithm with OpenMP parallel computing technology, which was further used for the optimization process for the HyperSpectral Imager of the Chinese `HJ-1' satellite. The results show that the method based on multi-core parallel computing technology can fully exploit the multi-core CPU hardware resources and significantly enhance the efficiency of the spectrum reconstruction processing. If the technology is applied to workstations with more cores in parallel, it should be possible to complete real-time data processing for a Fourier transform imaging spectrometer with a single computer.
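
    The parallelization pattern described, independent spectra reconstructed row by row, maps directly onto an OpenMP loop. In this sketch a naive discrete cosine transform stands in for the real interferogram-to-spectrum step (compile with -fopenmp -lm):

        #include <math.h>
        #include <stdio.h>
        #include <omp.h>

        #define NROWS 64       /* spatial rows, each carrying one interferogram */
        #define NPTS  128      /* samples per interferogram */

        /* Stand-in spectral reconstruction: naive DCT of one interferogram.
         * A real pipeline would apply apodization, FFT and phase correction. */
        static void reconstruct_row(const double *igm, double *spec) {
            for (int k = 0; k < NPTS; k++) {
                double s = 0.0;
                for (int n = 0; n < NPTS; n++)
                    s += igm[n] * cos(M_PI * k * (n + 0.5) / NPTS);
                spec[k] = s;
            }
        }

        int main(void) {
            static double igm[NROWS][NPTS], spec[NROWS][NPTS];
            for (int r = 0; r < NROWS; r++)
                for (int n = 0; n < NPTS; n++)
                    igm[r][n] = cos(2.0 * M_PI * 5 * n / NPTS); /* synthetic fringe */

            #pragma omp parallel for   /* rows are independent: ideal for OpenMP */
            for (int r = 0; r < NROWS; r++)
                reconstruct_row(igm[r], spec[r]);

            printf("row 0, bin 5: %.2f\n", spec[0][5]);
            return 0;
        }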

  2. A Bit String Content Aware Chunking Strategy for Reduced CPU Energy on Cloud Storage

    Directory of Open Access Journals (Sweden)

    Bin Zhou

    2015-01-01

    Full Text Available In order to achieve energy savings and reduce the total cost of ownership, green storage has become the first priority for the data center. Detecting and deleting redundant data are the key factors in reducing the energy consumption of the CPU, while a high-performance, stable chunking strategy provides the groundwork for detecting redundant data. Existing chunking algorithms greatly reduce system performance when confronted with big data, and they waste a lot of energy. Factors affecting chunking performance are analyzed and discussed in the paper and a new fingerprint signature calculation is implemented. Furthermore, a Bit String Content Aware Chunking Strategy (BCCS) is put forward. This strategy reduces the cost of signature computation in the chunking process to improve system performance and cut down the energy consumption of the cloud storage data center. On the basis of the relevant test scenarios and test data of this paper, the advantages of the chunking strategy are verified.
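
    Content-defined chunking, the groundwork referred to here, can be sketched with a rolling fingerprint: a chunk boundary is declared wherever the low bits of the windowed hash match a fixed pattern, so boundaries survive insertions elsewhere in the stream. The hash, window size and mask below are illustrative, not the BCCS fingerprint:

        #include <stdio.h>
        #include <string.h>
        #include <stdint.h>

        #define WINDOW 8
        #define BASE   257u
        #define MASK   0x1Fu   /* boundary on low 5 bits -> ~32-byte average chunks */

        int main(void) {
            const char *data =
                "the quick brown fox jumps over the lazy dog and then naps by the "
                "quick brown river while the lazy dog dreams of chunk boundaries";
            size_t n = strlen(data), start = 0;

            /* Precompute BASE^(WINDOW-1) for removing the byte leaving the window
             * (all arithmetic is implicitly mod 2^32). */
            uint32_t pow_hi = 1;
            for (int i = 0; i < WINDOW - 1; i++) pow_hi *= BASE;

            uint32_t h = 0;
            for (size_t i = 0; i < n; i++) {
                if (i >= WINDOW)
                    h -= (uint8_t)data[i - WINDOW] * pow_hi; /* slide window */
                h = h * BASE + (uint8_t)data[i];
                if ((i + 1 >= WINDOW && (h & MASK) == MASK) || i == n - 1) {
                    printf("chunk [%zu..%zu] (%zu bytes)\n", start, i, i - start + 1);
                    start = i + 1;                           /* content-defined cut */
                }
            }
            return 0;
        }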

  3. CPU SIM: A Computer Simulator for Use in an Introductory Computer Organization-Architecture Class.

    Science.gov (United States)

    Skrein, Dale

    1994-01-01

    CPU SIM, an interactive low-level computer simulation package that runs on the Macintosh computer, is described. The program is designed for instructional use in the first or second year of undergraduate computer science, to teach various features of typical computer organization through hands-on exercises. (MSE)

  4. Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers

    Science.gov (United States)

    Oyarzun, Guillermo; Borrell, Ricard; Gorobets, Andrey; Oliva, Assensi

    2017-10-01

    Nowadays, high performance computing (HPC) systems are experiencing a disruptive moment, with a variety of novel architectures and frameworks and no clarity as to which one is going to prevail. In this context, the portability of codes across different architectures is of major importance. This paper presents a portable implementation model based on an algebraic operational approach for direct numerical simulation (DNS) and large eddy simulation (LES) of incompressible turbulent flows using unstructured hybrid meshes. The strategy proposed consists of representing the whole time-integration algorithm using only three basic algebraic operations: the sparse matrix-vector product, the linear combination of vectors and the dot product. The main idea is based on decomposing the nonlinear operators into a concatenation of two SpMV operations. This provides high modularity and portability. An exhaustive analysis of the proposed implementation for hybrid CPU/GPU supercomputers has been conducted, with tests using up to 128 GPUs. The main objective is to understand the challenges of implementing CFD codes on new architectures.
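
    The three building blocks named in the abstract are small enough to state directly; a serial reference version of each is sketched below (a CSR matrix layout is an assumption). Porting a code written only in terms of these kernels then amounts to supplying CUDA/OpenCL versions of these three functions.

```c
/* The three algebraic kernels of the portable implementation model. */
typedef struct { int n; int *row_ptr; int *col_idx; double *val; } CSR;

/* y = A*x: sparse matrix-vector product, CSR layout */
void spmv(const CSR *A, const double *x, double *y) {
    for (int i = 0; i < A->n; i++) {
        double s = 0.0;
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; k++)
            s += A->val[k] * x[A->col_idx[k]];
        y[i] = s;
    }
}

/* y = a*x + b*y: linear combination of vectors */
void axpby(int n, double a, const double *x, double b, double *y) {
    for (int i = 0; i < n; i++) y[i] = a * x[i] + b * y[i];
}

/* dot product */
double dot(int n, const double *x, const double *y) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i] * y[i];
    return s;
}
```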

  5. A Novel CPU/GPU Simulation Environment for Large-Scale Biologically-Realistic Neural Modeling

    Directory of Open Access Journals (Sweden)

    Roger V Hoang

    2013-10-01

    Full Text Available Computational Neuroscience is an emerging field that provides unique opportunities to study complex brain structures through realistic neural simulations. However, as biological details are added to models, the execution time for the simulation becomes longer. Graphics Processing Units (GPUs) are now being utilized to accelerate simulations due to their ability to perform computations in parallel. As such, they have shown significant improvement in execution time compared to Central Processing Units (CPUs). Most neural simulators utilize either multiple CPUs or a single GPU for better performance, but still show limitations in execution time when biological details are not sacrificed. Therefore, we present a novel CPU/GPU simulation environment for large-scale biological networks, the NeoCortical Simulator version 6 (NCS6). NCS6 is a free, open-source, parallelizable, and scalable simulator, designed to run on clusters of multiple machines, potentially with high performance computing devices in each of them. It has built-in leaky-integrate-and-fire (LIF) and Izhikevich (IZH) neuron models, but users also have the capability to design their own plug-in interface for different neuron types as desired. NCS6 is currently able to simulate one million cells and 100 million synapses in quasi real time by distributing data across these heterogeneous clusters of CPUs and GPUs.
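
    As a hedged illustration of the kind of arithmetic such simulators parallelise, below is a textbook forward-Euler leaky integrate-and-fire update in C; the parameter names are generic and this is not NCS6 source code.

```c
/* Textbook leaky integrate-and-fire (LIF) update, forward Euler. */
#include <stddef.h>

typedef struct {
    double v;      /* membrane potential */
    double i_syn;  /* summed synaptic input for this step */
} Neuron;

void lif_step(Neuron *nrn, size_t n, double dt,
              double tau_m, double v_rest, double v_thresh, double v_reset,
              unsigned char *spiked)
{
    for (size_t i = 0; i < n; i++) {
        /* dv/dt = (v_rest - v)/tau_m + i_syn  (Euler step) */
        nrn[i].v += dt * ((v_rest - nrn[i].v) / tau_m + nrn[i].i_syn);
        spiked[i] = nrn[i].v >= v_thresh;
        if (spiked[i])
            nrn[i].v = v_reset;   /* fire and reset */
        nrn[i].i_syn = 0.0;       /* inputs are re-accumulated each step */
    }
}
```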

  6. An Experimental Evaluation of Real-Time DVFS Scheduling Algorithms

    OpenAIRE

    Saha, Sonal

    2011-01-01

    Dynamic voltage and frequency scaling (DVFS) is an extensively studied energy management technique, which aims to reduce the energy consumption of computing platforms by dynamically scaling the CPU frequency. Real-Time DVFS (RT-DVFS) is a branch of DVFS which reduces CPU energy consumption through DVFS while at the same time ensuring that task time constraints are satisfied by constructing appropriate real-time task schedules. The literature presents numerous RT-DVFS scheduling algorithms...
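
    The common core of many RT-DVFS schemes can be stated in a few lines: run at the slowest frequency that still meets the deadline. The sketch below assumes a small set of discrete P-states and a known worst-case remaining cycle count; it is a generic illustration, not a specific algorithm from the dissertation.

```c
/* Pick the slowest CPU frequency that still meets the deadline. */
#include <stdint.h>

/* Available P-state frequencies in Hz, ascending (illustrative values). */
static const uint64_t freqs[] = { 800000000ull, 1600000000ull, 2400000000ull };
#define NFREQ (sizeof freqs / sizeof freqs[0])

uint64_t pick_frequency(uint64_t remaining_cycles, double seconds_to_deadline)
{
    for (unsigned i = 0; i < NFREQ; i++)
        if ((double)remaining_cycles / (double)freqs[i] <= seconds_to_deadline)
            return freqs[i];            /* slowest frequency that fits */
    return freqs[NFREQ - 1];            /* overload: run flat out */
}
```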

  7. Hybrid Computational Architecture for Multi-Scale Modeling of Materials and Devices

    Science.gov (United States)

    2016-01-03

    Single node performance on a 20-core node (40 logical cores with hyper-threading (HT)):

    Node      # of cores      Total CPU time   User CPU time   System CPU time   Elapsed time
    INTEL20   40 (with HT)    534.785          529.984         4.800             541.179
    INTEL20   20              468.873          466.119         2.754             476.878
    INTEL20   10              671.798          669.653         2.145             680.510
    INTEL20   8               772.269          770.256         2.013             ...

  8. Timing the total reflection of light

    International Nuclear Information System (INIS)

    Chauvat, Dominique; Bonnet, Christophe; Dunseath, Kevin; Emile, Olivier; Le Floch, Albert

    2005-01-01

    We have identified for the first time the absolute delay at total reflection, envisioned by Newton. We show that there are in fact two divergent Wigner delays, depending on the polarisation of the incident light. These measurements give new insight into the passage from total reflection to refraction.

  9. Simulation of small-angle scattering patterns using a CPU-efficient algorithm

    Science.gov (United States)

    Anitas, E. M.

    2017-12-01

    Small-angle scattering (of neutrons, x-rays or light; SAS) is a well-established experimental technique for structural analysis of disordered systems at nano and micro scales. For complex systems, such as super-molecular assemblies or protein molecules, analytic solutions of the SAS intensity are generally not available. Thus, a frequent approach to simulating the corresponding patterns is to use a CPU-efficient version of the Debye formula. For this purpose, in this paper we implement the well-known DALAI algorithm in Mathematica. We present calculations for a series of 2D Sierpinski gaskets and of pentaflakes obtained from chaos game representation.
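
    For reference, the Debye formula itself is compact; a direct O(N²) C version for identical scatterers is sketched below. CPU-efficient schemes such as DALAI avoid this full pair sum (e.g. via binned pair distances); the sketch only fixes what is being computed.

```c
/* Direct Debye formula, identical scatterers (f = 1):
 *   I(q) = sum_i sum_j sin(q r_ij) / (q r_ij)                */
#include <math.h>
#include <stddef.h>

double debye_intensity(const double (*pos)[3], size_t n, double q)
{
    double I = (double)n;                     /* i == j terms: sinc(0) = 1 */
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++) {
            double dx = pos[i][0] - pos[j][0];
            double dy = pos[i][1] - pos[j][1];
            double dz = pos[i][2] - pos[j][2];
            double qr = q * sqrt(dx * dx + dy * dy + dz * dz);
            I += 2.0 * (qr > 0.0 ? sin(qr) / qr : 1.0);
        }
    return I;
}
```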

  10. Overtaking CPU DBMSes with a GPU in whole-query analytic processing with parallelism-friendly execution plan optimization

    NARCIS (Netherlands)

    A. Agbaria (Adnan); D. Minor (David); N. Peterfreund (Natan); E. Rozenberg (Eyal); O. Rosenberg (Ofer); Huawei Research

    2016-01-01

    Existing work on accelerating analytic DB query processing with (discrete) GPUs fails to fully realize their potential for speedup through parallelism: published results do not achieve significant speedup over more performant CPU-only DBMSes when processing complete queries.

  11. CPU time optimization and precise adjustment of the Geant4 physics parameters for a VARIAN 2100 C/D gamma radiotherapy linear accelerator simulation using GAMOS

    Science.gov (United States)

    Arce, Pedro; Lagares, Juan Ignacio

    2018-02-01

    We have verified the GAMOS/Geant4 simulation model of a 6 MV VARIAN Clinac 2100 C/D linear accelerator by the procedure of adjusting the initial beam parameters to fit the percentage depth dose and cross-profile dose experimental data at different depths in a water phantom. Thanks to the use of a wide range of field sizes, from 2 × 2 cm² to 40 × 40 cm², a small phantom voxel size and high statistics, fine precision in the determination of the beam parameters has been achieved. This precision has allowed us to make a thorough study of the different physics models and parameters that Geant4 offers. The three Geant4 electromagnetic physics sets of models, i.e. Standard, Livermore and Penelope, have been compared to the experiment, testing the four different models of angular bremsstrahlung distributions as well as the three available multiple-scattering models, and optimizing the most relevant Geant4 electromagnetic physics parameters. Before the fitting, a comprehensive CPU time optimization has been done, using several of the Geant4 efficiency improvement techniques plus a few more developed in GAMOS.

  12. Design of a Message Passing Model for Use in a Heterogeneous CPU-NFP Framework for Network Analytics

    CSIR Research Space (South Africa)

    Pennefather, S

    2017-09-01

    Full Text Available of applications written in the Go programming language to be executed on a Network Flow Processor (NFP) for enhanced performance. This paper explores the need and feasibility of implementing a message passing model for data transmission between the NFP and CPU...

  13. OpenMP GNU and Intel Fortran programs for solving the time-dependent Gross-Pitaevskii equation

    Science.gov (United States)

    Young-S., Luis E.; Muruganandam, Paulsamy; Adhikari, Sadhan K.; Lončar, Vladimir; Vudragović, Dušan; Balaž, Antun

    2017-11-01

    reduce the execution time cannot be overemphasized. To address this issue, we provide here such OpenMP Fortran programs, optimized for both Intel and GNU Fortran compilers and capable of using all available CPU cores, which can significantly reduce the execution time. Summary of revisions: Previous Fortran programs [1] for solving the time-dependent GP equation in 1d, 2d, and 3d with different trap symmetries have been parallelized using the OpenMP interface to reduce the execution time on multi-core processors. There are six different trap symmetries considered, resulting in six programs for imaginary-time propagation and six for real-time propagation, totaling 12 programs included in the BEC-GP-OMP-FOR software package. All input data (number of atoms, scattering length, harmonic oscillator trap length, trap anisotropy, etc.) are conveniently placed at the beginning of each program, as before [2]. The present programs introduce a new input parameter, designated Number_of_Threads, which defines the number of CPU cores of the processor to be used in the calculation. If one sets the value 0 for this parameter, all available CPU cores will be used. For the most efficient calculation it is advisable to leave one CPU core unused for the system's background jobs. For example, on the 20-core machine we used for testing, it is advisable to use up to 19 CPU cores. However, the total number of used CPU cores can be divided over more than one job. For instance, one can run three simulations simultaneously using 10, 4, and 5 CPU cores, respectively, totaling 19 used CPU cores on a 20-core computer. The Fortran source programs are located in the directory src, and can be compiled by the make command using the makefile in the root directory BEC-GP-OMP-FOR of the software package. Examples of produced output files can be found in the directory output, although some large density files are omitted to save space. The programs calculate the values of

  14. A heterogeneous CPU+GPU Poisson solver for space charge calculations in beam dynamics studies

    Energy Technology Data Exchange (ETDEWEB)

    Zheng, Dawei; Rienen, Ursula van [University of Rostock, Institute of General Electrical Engineering (Germany)

    2016-07-01

    In beam dynamics studies in accelerator physics, space charge plays a central role in the low-energy regime of an accelerator. Numerical space charge calculations are required both in the design phase and in the operation of the machines. Due to its efficiency, the Particle-In-Cell (PIC) method is usually chosen for the space charge calculation. The solution of Poisson's equation for the charge distribution in the rest frame is then the most prominent part of the solution process. The Poisson solver directly affects the accuracy of the self-field applied to the charged particles when the equation of motion is solved in the laboratory frame. As the Poisson solver consumes the major part of the computing time in most simulations, it has to be as fast as possible, since it is carried out once per time step. In this work, we demonstrate a novel heterogeneous CPU+GPU routine for the Poisson solver. The novel solver also benefits from our new research results on the use of a discrete cosine transform within the classical Hockney and Eastwood convolution routine.
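
    A minimal sketch of a DCT-based spectral Poisson solve, in the spirit of (but far simpler than) the routine described, is given below using FFTW's real-to-real interface; the boundary conditions, normalisation and grid layout are illustrative assumptions, not the paper's method.

```c
/* Spectral Poisson solve with a DCT: forward DCT of the source, divide each
 * mode by the Laplacian eigenvalue, inverse DCT. Neumann-type boundary
 * conditions assumed. Link with -lfftw3 -lm. */
#include <math.h>
#include <fftw3.h>

void poisson_dct(double *rho, double *phi, int n0, int n1, double h)
{
    fftw_plan fwd = fftw_plan_r2r_2d(n0, n1, rho, phi,
                                     FFTW_REDFT10, FFTW_REDFT10, FFTW_ESTIMATE);
    fftw_plan inv = fftw_plan_r2r_2d(n0, n1, phi, phi,
                                     FFTW_REDFT01, FFTW_REDFT01, FFTW_ESTIMATE);
    fftw_execute(fwd);
    for (int i = 0; i < n0; i++)
        for (int j = 0; j < n1; j++) {
            /* eigenvalue of the 5-point Laplacian for these cosine modes */
            double lam = (2.0 * cos(M_PI * i / n0) - 2.0
                        + 2.0 * cos(M_PI * j / n1) - 2.0) / (h * h);
            double scale = 4.0 * (double)n0 * n1;   /* FFTW r2r normalisation */
            phi[i * n1 + j] = (i || j) ? phi[i * n1 + j] / (lam * scale)
                                       : 0.0;       /* fix the free zero mode */
        }
    fftw_execute(inv);
    fftw_destroy_plan(fwd);
    fftw_destroy_plan(inv);
}
```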

  15. Joint Optimized CPU and Networking Control Scheme for Improved Energy Efficiency in Video Streaming on Mobile Devices

    Directory of Open Access Journals (Sweden)

    Sung-Woong Jo

    2017-01-01

    Full Text Available Video streaming is one of the most popular applications for mobile users. However, mobile video streaming services consume a lot of energy, resulting in reduced battery life. This is a critical problem that degrades the user's quality of experience (QoE). Therefore, in this paper, a joint optimization scheme that controls both the central processing unit (CPU) and the wireless networking of the video streaming process for improved energy efficiency on mobile devices is proposed. For this purpose, the energy consumption of the network interface and CPU is analyzed, and based on the energy consumption profile a joint optimization problem is formulated to maximize the energy efficiency of the mobile device. The proposed algorithm adaptively adjusts the number of chunks to be downloaded and decoded in each packet. Simulation results show that the proposed algorithm can effectively improve energy efficiency compared with the existing algorithms.

  16. GPScheDVS: A New Paradigm of the Autonomous CPU Speed Control for Commodity-OS-based General-Purpose Mobile Computers with a DVS-friendly Task Scheduling

    OpenAIRE

    Kim, Sookyoung

    2008-01-01

    This dissertation studies the problem of increasing battery life-time and reducing CPU heat dissipation without degrading system performance in commodity-OS-based general-purpose (GP) mobile computers using the dynamic voltage scaling (DVS) function of modern CPUs. The dissertation especially focuses on the impact of task scheduling on the effectiveness of DVS in achieving this goal. The task scheduling mechanism used in most contemporary general-purpose operating systems (GPOS) prioritizes t...

  17. The Research and Test of Fast Radio Burst Real-time Search Algorithm Based on GPU Acceleration

    Science.gov (United States)

    Wang, J.; Chen, M. Z.; Pei, X.; Wang, Z. Q.

    2017-03-01

    In order to satisfy the research needs of the Nanshan 25 m radio telescope of Xinjiang Astronomical Observatory (XAO) and to study key technology for the planned QiTai radio Telescope (QTT), the receiver group of XAO developed a GPU (Graphics Processing Unit) based real-time FRB search algorithm from the original CPU (Central Processing Unit) based FRB search algorithm, and built an FRB real-time search system. The comparison of the GPU system and the CPU system shows that, while preserving the accuracy of the search, the GPU-accelerated algorithm is 35-45 times faster than the CPU algorithm.
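
    The kernel at the heart of such searches is incoherent dedispersion; a serial C reference is sketched below (the GPU version parallelises over trial-DM/time pairs). The delay constant is the standard cold-plasma value, but the data layout and names are assumptions, not the XAO pipeline.

```c
/* Incoherent dedispersion: sum frequency channels after a DM-dependent
 * time shift. Channels are assumed in ascending frequency; data[c][t]. */
#include <stddef.h>

void dedisperse(const float *data, int nchan, int nt,
                const double *freq_mhz, double dt, double dm, float *series)
{
    for (int t = 0; t < nt; t++) series[t] = 0.0f;
    for (int c = 0; c < nchan; c++) {
        /* cold-plasma delay relative to the highest channel, in seconds:
         * 4.148808e3 s * DM[pc cm^-3] * (f^-2 - f_top^-2), f in MHz    */
        double delay_s = 4.148808e3 * dm *
            (1.0 / (freq_mhz[c] * freq_mhz[c]) -
             1.0 / (freq_mhz[nchan - 1] * freq_mhz[nchan - 1]));
        int shift = (int)(delay_s / dt + 0.5);      /* delay in samples */
        for (int t = 0; t + shift < nt; t++)
            series[t] += data[(size_t)c * nt + t + shift];
    }
}
```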

  18. Study on efficiency of time computation in x-ray imaging simulation base on Monte Carlo algorithm using graphics processing unit

    International Nuclear Information System (INIS)

    Setiani, Tia Dwi; Suprijadi; Haryanto, Freddy

    2016-01-01

    Monte Carlo (MC) is one of the most powerful techniques for simulation in x-ray imaging. The MC method can simulate radiation transport within matter with high accuracy and provides a natural way to simulate radiation transport in complex systems. One of the MC-based codes widely used for radiographic image simulation is MC-GPU, a code developed by Andreu Badal. This study was aimed at investigating the computation time of x-ray imaging simulation on a GPU (Graphics Processing Unit) compared to a standard CPU (Central Processing Unit). Furthermore, the effect of physical parameters on the quality of the radiographic images and a comparison of the image quality resulting from simulation on the GPU and CPU are evaluated in this paper. The simulations were run on a CPU in serial, and on two GPUs with 384 cores and 2304 cores. In the GPU simulations, each core tracks one photon, so a large number of photons are computed simultaneously. Results show that the simulations on the GPU were significantly accelerated compared to the CPU. Simulations on the 2304-core GPU performed about 64-114 times faster than on the CPU, while simulations on the 384-core GPU performed about 20-31 times faster than on a single CPU core. Another result shows that the optimum image quality was obtained for histories starting from 10⁸ and energies from 60 keV to 90 keV. Analyzed by a statistical approach, the quality of the GPU and CPU images is relatively the same.

  19. Study on efficiency of time computation in x-ray imaging simulation base on Monte Carlo algorithm using graphics processing unit

    Energy Technology Data Exchange (ETDEWEB)

    Setiani, Tia Dwi, E-mail: tiadwisetiani@gmail.com [Computational Science, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung Jalan Ganesha 10 Bandung, 40132 (Indonesia); Suprijadi [Computational Science, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung Jalan Ganesha 10 Bandung, 40132 (Indonesia); Nuclear Physics and Biophysics Reaserch Division, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung Jalan Ganesha 10 Bandung, 40132 (Indonesia); Haryanto, Freddy [Nuclear Physics and Biophysics Reaserch Division, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung Jalan Ganesha 10 Bandung, 40132 (Indonesia)

    2016-03-11

    Monte Carlo (MC) is one of the most powerful techniques for simulation in x-ray imaging. The MC method can simulate radiation transport within matter with high accuracy and provides a natural way to simulate radiation transport in complex systems. One of the MC-based codes widely used for radiographic image simulation is MC-GPU, a code developed by Andreu Badal. This study was aimed at investigating the computation time of x-ray imaging simulation on a GPU (Graphics Processing Unit) compared to a standard CPU (Central Processing Unit). Furthermore, the effect of physical parameters on the quality of the radiographic images and a comparison of the image quality resulting from simulation on the GPU and CPU are evaluated in this paper. The simulations were run on a CPU in serial, and on two GPUs with 384 cores and 2304 cores. In the GPU simulations, each core tracks one photon, so a large number of photons are computed simultaneously. Results show that the simulations on the GPU were significantly accelerated compared to the CPU. Simulations on the 2304-core GPU performed about 64-114 times faster than on the CPU, while simulations on the 384-core GPU performed about 20-31 times faster than on a single CPU core. Another result shows that the optimum image quality was obtained for histories starting from 10⁸ and energies from 60 keV to 90 keV. Analyzed by a statistical approach, the quality of the GPU and CPU images is relatively the same.

  20. GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores

    Directory of Open Access Journals (Sweden)

    Wang Kai

    2011-05-01

    Full Text Available Abstract Background Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) have hundreds of cores and have recently been used to implement faster scientific software. However, there are currently no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis of binary traits. Findings Here we present a novel software package, GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes (1) the interaction of SNPs within it in parallel, and (2) the interaction between the SNPs of the current fragment and those of other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode. Conclusions GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
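
    The within-fragment pass reduces to a parallel sweep over SNP pairs; a minimal OpenMP analogue is sketched below, with the hypothetical test_pair() standing in for whatever interaction statistic (e.g. a logistic-regression term for a binary trait) is actually computed.

```c
/* Parallel sweep over all SNP pairs inside one fragment. */
#include <omp.h>
#include <float.h>

extern double test_pair(int a, int b);  /* hypothetical interaction statistic */

/* Returns the strongest interaction signal in the fragment [first, last). */
double scan_fragment(int first, int last)
{
    double best = -DBL_MAX;
    /* dynamic schedule: the inner loop length varies with i */
    #pragma omp parallel for schedule(dynamic) reduction(max:best)
    for (int i = first; i < last; i++)
        for (int j = i + 1; j < last; j++) {
            double s = test_pair(i, j);
            if (s > best) best = s;
        }
    return best;
}
```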

  1. Simulating Photon Mapping for Real-time Applications

    DEFF Research Database (Denmark)

    Larsen, Bent Dalgaard; Christensen, Niels Jørgen

    2004-01-01

    This paper introduces a novel method for simulating photon mapping for real-time applications. First we introduce a new method for selectively redistributing photons. Then we describe a method for selectively updating the indirect illumination. The indirect illumination is calculated using a new GPU accelerated final gathering method and the illumination is then stored in light maps. Caustic photons are traced on the CPU and then drawn using points in the framebuffer, and finally filtered using the GPU. Both diffuse and non-diffuse surfaces can be handled by calculating the direct illumination on the GPU and the photon tracing on the CPU. We achieve real-time frame rates for dynamic scenes.

  2. VMware vSphere performance designing CPU, memory, storage, and networking for performance-intensive workloads

    CERN Document Server

    Liebowitz, Matt; Spies, Rynardt

    2014-01-01

    Covering the latest VMware vSphere software, an essential book aimed at solving vSphere performance problems before they happen. VMware vSphere is the industry's most widely deployed virtualization solution. However, if you improperly deploy vSphere, performance problems occur. Aimed at VMware administrators and engineers and written by a team of VMware experts, this resource provides guidance on common CPU, memory, storage, and network-related problems. Plus, step-by-step instructions walk you through techniques for solving problems and shed light on possible causes behind the problems.

  3. Leveraging the checkpoint-restart technique for optimizing CPU efficiency of ATLAS production applications on opportunistic platforms

    CERN Document Server

    Cameron, David; The ATLAS collaboration

    2017-01-01

    Data processing applications of the ATLAS experiment, such as event simulation and reconstruction, spend considerable amount of time in the initialization phase. This phase includes loading a large number of shared libraries, reading detector geometry and condition data from external databases, building a transient representation of the detector geometry and initializing various algorithms and services. In some cases the initialization step can take as long as 10-15 minutes. Such slow initialization, being inherently serial, has a significant negative impact on overall CPU efficiency of the production job, especially when the job is executed on opportunistic, often short-lived, resources such as commercial clouds or volunteer computing. In order to improve this situation, we can take advantage of the fact that ATLAS runs large numbers of production jobs with similar configuration parameters (e.g. jobs within the same production task). This allows us to checkpoint one job at the end of its configuration step a...

  4. Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Xu, Chuanfu, E-mail: xuchuanfu@nudt.edu.cn [College of Computer Science, National University of Defense Technology, Changsha 410073 (China); Deng, Xiaogang; Zhang, Lilun [College of Computer Science, National University of Defense Technology, Changsha 410073 (China); Fang, Jianbin [Parallel and Distributed Systems Group, Delft University of Technology, Delft 2628CD (Netherlands); Wang, Guangxue; Jiang, Yi [State Key Laboratory of Aerodynamics, P.O. Box 211, Mianyang 621000 (China); Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua [College of Computer Science, National University of Defense Technology, Changsha 410073 (China)

    2014-12-01

    Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact finite difference schemes, WCNS and HDCS, that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we collaborate CPU and GPU for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×; meanwhile the collaborative approach improves performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To the best of our knowledge, these are the largest-scale CPU–GPU collaborative simulations.

  5. Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

    International Nuclear Information System (INIS)

    Xu, Chuanfu; Deng, Xiaogang; Zhang, Lilun; Fang, Jianbin; Wang, Guangxue; Jiang, Yi; Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua

    2014-01-01

    Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact finite difference schemes, WCNS and HDCS, that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we collaborate CPU and GPU for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×; meanwhile the collaborative approach improves performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To the best of our knowledge, these are the largest-scale CPU–GPU collaborative simulations.

  6. Application of queueing models to multiprogrammed computer systems operating in a time-critical environment

    Science.gov (United States)

    Eckhardt, D. E., Jr.

    1979-01-01

    A model of a central processor (CPU) which services background applications in the presence of time-critical activity is presented. The CPU is viewed as an M/M/1 queueing system subject to periodic interrupts by a deterministic, time-critical process. The Laplace transform of the distribution of service times for the background applications is developed. The use of state-of-the-art queueing models for studying the background processing capability of time-critical computer systems is discussed, and the results of a model validation study which support this application of queueing models are presented.
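
    For readers without the paper at hand, the baseline M/M/1 quantities the model builds on are standard; the interrupted-service estimate in the closing comment is a first-order editorial approximation, not the paper's transform result.

```latex
% Standard M/M/1 relations: arrival rate \lambda, service rate \mu,
% utilisation \rho = \lambda/\mu < 1.
\[
  \rho = \frac{\lambda}{\mu}, \qquad
  L = \frac{\rho}{1-\rho}, \qquad
  W = \frac{1}{\mu - \lambda}
\]
% If a fraction f of each period is consumed by the time-critical task,
% the background work effectively sees capacity (1-f)\mu, giving the
% first-order estimate W \approx 1 / ((1-f)\mu - \lambda).
```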

  7. Test methods of total dose effects in very large scale integrated circuits

    International Nuclear Information System (INIS)

    He Chaohui; Geng Bin; He Baoping; Yao Yujuan; Li Yonghong; Peng Honglun; Lin Dongsheng; Zhou Hui; Chen Yusheng

    2004-01-01

    A test method for total dose effects (TDEs) in very large scale integrated circuits (VLSI) is presented. The supply current of the devices is measured while the functional parameters of the devices (or circuits) are measured. The relation between data errors and supply current can then be analysed, and the mechanism of TDEs in VLSI proposed. Experimental results of ⁶⁰Co γ TDE tests are given for SRAMs, EEPROMs, FLASH ROMs and a kind of CPU.

  8. Adaptive real-time methodology for optimizing energy-efficient computing

    Science.gov (United States)

    Hsu, Chung-Hsing [Los Alamos, NM; Feng, Wu-Chun [Blacksburg, VA

    2011-06-28

    Dynamic voltage and frequency scaling (DVFS) is an effective way to reduce energy and power consumption in microprocessor units. Current implementations of DVFS suffer from inaccurate modeling of power requirements and usage, and from inaccurate characterization of the relationships between the applicable variables. A system and method is proposed that adjusts CPU frequency and voltage based on run-time calculations of the workload processing time, as well as a calculation of performance sensitivity with respect to CPU frequency. The system and method are processor independent, and can be applied to either an entire system as a unit, or individually to each process running on a system.
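
    One way to formalise the "performance sensitivity with respect to CPU frequency" that the method computes is the normalised derivative below; this is an illustrative definition, not necessarily the patented formula.

```latex
% Illustrative definition of performance sensitivity s: the normalised
% change in execution time T per normalised change in frequency f.
\[
  s \;=\; -\,\frac{\partial T / T}{\partial f / f}
\]
% s close to 1: CPU-bound, lowering f costs time proportionally;
% s close to 0: memory- or I/O-bound, f can be lowered almost for free,
% which is when DVFS saves energy with little performance loss.
```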

  9. A Programming Framework for Scientific Applications on CPU-GPU Systems

    Energy Technology Data Exchange (ETDEWEB)

    Owens, John

    2013-03-24

    At a high level, my research interests center around designing, programming, and evaluating computer systems that use new approaches to solve interesting problems. The rapid change of technology allows a variety of different architectural approaches to computationally difficult problems, and a constantly shifting set of constraints and trends makes the solutions to these problems both challenging and interesting. One of the most important recent trends in computing has been a move to commodity parallel architectures. This sea change is motivated by the industry’s inability to continue to profitably increase performance on a single processor and instead to move to multiple parallel processors. In the period of review, my most significant work has been leading a research group looking at the use of the graphics processing unit (GPU) as a general-purpose processor. GPUs can potentially deliver superior performance on a broad range of problems than their CPU counterparts, but effectively mapping complex applications to a parallel programming model with an emerging programming environment is a significant and important research problem.

  10. Porting AMG2013 to Heterogeneous CPU+GPU Nodes

    Energy Technology Data Exchange (ETDEWEB)

    Samfass, Philipp [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-01-26

    LLNL's future advanced technology system SIERRA will feature heterogeneous compute nodes that consist of IBM POWER9 CPUs and NVIDIA Volta GPUs. Conceptually, the motivation for such an architecture is quite straightforward: while GPUs are optimized for throughput on massively parallel workloads, CPUs strive to minimize latency for rather sequential operations. Yet, making optimal use of heterogeneous architectures raises new challenges for the development of scalable parallel software, e.g., with respect to work distribution. Porting LLNL's parallel numerical libraries to upcoming heterogeneous CPU+GPU architectures is therefore a critical factor for ensuring LLNL's future success in fulfilling its national mission. One of these libraries, called HYPRE, provides parallel solvers and preconditioners for large, sparse linear systems of equations. In the context of this internship project, I consider AMG2013, which is a proxy application for major parts of HYPRE that implements a benchmark for setting up and solving different systems of linear equations. In the following, I describe in detail how I ported multiple parts of AMG2013 to the GPU (Section 2) and present results for different experiments that demonstrate a successful parallel implementation on the heterogeneous machines surface and ray (Section 3). In Section 4, I give guidelines on how my code should be used. Finally, I conclude and give an outlook for future work (Section 5).

  11. Benchmarking hardware architecture candidates for the NFIRAOS real-time controller

    Science.gov (United States)

    Smith, Malcolm; Kerley, Dan; Herriot, Glen; Véran, Jean-Pierre

    2014-07-01

    As a part of the trade study for the Narrow Field Infrared Adaptive Optics System, the adaptive optics system for the Thirty Meter Telescope, we investigated the feasibility of performing real-time control computation using a Linux operating system and Intel Xeon E5 CPUs. We also investigated a Xeon Phi based architecture which allows higher levels of parallelism. This paper summarizes both the CPU based real-time controller architecture and the Xeon Phi based RTC. The Intel Xeon E5 CPU solution meets the requirements and performs the computation for one AO cycle in an average of 767 microseconds. The Xeon Phi solution did not meet the 1200 microsecond time requirement and also suffered from unpredictable execution times. More detailed benchmark results are reported for both architectures.

  12. Design and development of a diversified real time computer for future FBRs

    International Nuclear Information System (INIS)

    Sujith, K.R.; Bhattacharyya, Anindya; Behera, R.P.; Murali, N.

    2014-01-01

    The current safety related computer system of the Prototype Fast Breeder Reactor (PFBR) under construction in Kalpakkam consists of two redundant Versa Module Europa (VME) bus based Real Time Computer systems with a Switch Over Logic Circuit (SOLC). Since both VME systems are identical, the dual redundant system is prone to common cause failure (CCF). The probability of CCF can be reduced by adopting diversity. Design diversity has long been used to protect redundant systems against common-mode failures. The conventional notion of diversity relies on 'independent' generation of 'different' implementations. This paper discusses the design and development of a diversified Real Time Computer which will replace one of the computer systems in the dual redundant architecture. Compact PCI (cPCI) bus systems are widely used in safety critical applications such as avionics, railways and defence, and use diverse electrical signaling and logical specifications; cPCI was hence chosen for development of the diversified system. Towards the initial development, a CPU card based on an ARM-9 processor, a 16-channel Relay Output (RO) card and a 30-channel Analog Input (AI) card were developed. All the cards mentioned support hot-swap and geographic addressing capability. In order to mitigate the component obsolescence problem, the 32-bit PCI target controller and associated glue logic for the slave I/O cards were indigenously developed using VHDL. U-Boot was selected as the boot loader and ARM Linux 2.6 as the preliminary operating system for the CPU card. Board-specific initialization code for the CPU card was written in ARM assembly language and serial port initialization was written in C. The boot loader, along with the Linux 2.6 kernel and a JFFS2 file system, was flashed onto the CPU card. Test applications written in C were used to test the various peripherals of the CPU card. Device drivers for the AI and RO cards were developed as Linux kernel modules and an application library was also...

  13. Saving time and energy with oversubscription and semi-direct Møller-Plesset second order perturbation methods.

    Science.gov (United States)

    Fought, Ellie L; Sundriyal, Vaibhav; Sosonkina, Masha; Windus, Theresa L

    2017-04-30

    In this work, the effect of oversubscription is evaluated, via calling 2n, 3n, or 4n processes for n physical cores, on semi-direct MP2 energy and gradient calculations and RI-MP2 energy calculations with the cc-pVTZ basis using NWChem. Results indicate that on both Intel and AMD platforms, oversubscription reduces total time to solution on average for semi-direct MP2 energy calculations by 25-45% and reduces total energy consumed by the CPU and DRAM on average by 10-15% on the Intel platform. Semi-direct gradient time to solution is shortened on average by 8-15% and energy consumption is decreased by 5-10%. Linear regression analysis shows a strong correlation between time to solution and total energy consumed. Oversubscribing during RI-MP2 calculations results in performance degradations of 30-50% at the 4n level. © 2017 Wiley Periodicals, Inc.

  14. Application of total care time and payment per unit time model for physician reimbursement for common general surgery operations.

    Science.gov (United States)

    Chatterjee, Abhishek; Holubar, Stefan D; Figy, Sean; Chen, Lilian; Montagne, Shirley A; Rosen, Joseph M; Desimone, Joseph P

    2012-06-01

    The relative value unit system relies on subjective measures of physician input in the care of patients. A payment per unit time model incorporates surgeon reimbursement to the total care time spent in the operating room, postoperative in-house, and clinic time to define payment per unit time. We aimed to compare common general surgery operations by using the total care time and payment per unit time method in order to demonstrate a more objective measurement for physician reimbursement. Average total physician payment per case was obtained for 5 outpatient operations and 4 inpatient operations in general surgery. Total care time was defined as the sum of operative time, 30 minutes per hospital day, and 30 minutes per office visit for each operation. Payment per unit time was calculated by dividing the physician reimbursement per case by the total care time. Total care time, physician payment per case, and payment per unit time for each type of operation demonstrated that an average payment per time spent for inpatient operations was $455.73 and slightly more at $467.51 for outpatient operations. Partial colectomy with primary anastomosis had the longest total care time (8.98 hours) and the least payment per unit time ($188.52). Laparoscopic gastric bypass had the highest payment per time ($707.30). The total care time and payment per unit time method can be used as an adjunct to compare reimbursement among different operations on an institutional level as well as on a national level. Although many operations have similar payment trends based on time spent by the surgeon, payment differences using this methodology are seen and may be in need of further review. Copyright © 2012 American College of Surgeons. Published by Elsevier Inc. All rights reserved.
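
    In formula form, the accounting described above is as follows; the closing example back-computes an illustrative payment from the colectomy figures quoted in the abstract.

```latex
% Total care time and payment per unit time as defined in the abstract
% (30 min = 0.5 h per hospital day and per office visit):
\[
  T_{\mathrm{total}} = t_{\mathrm{OR}} + 0.5\,n_{\mathrm{days}} + 0.5\,n_{\mathrm{visits}},
  \qquad
  \mathrm{PPUT} = \frac{\text{payment per case}}{T_{\mathrm{total}}}
\]
% Back-computed check from the quoted figures: a partial colectomy with
% T_total = 8.98 h and PPUT = \$188.52/h implies a payment per case of
% 8.98 \times 188.52 \approx \$1693.
```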

  15. Total sitting time, leisure time physical activity and risk of hospitalization due to low back pain

    DEFF Research Database (Denmark)

    Balling, Mie; Holmberg, Teresa; Petersen, Christina B

    2018-01-01

    AIMS: This study aimed to test the hypotheses that a high total sitting time and vigorous physical activity in leisure time increase the risk of low back pain and herniated lumbar disc disease. METHODS: A total of 76,438 adults answered questions regarding their total sitting time and physical...... activity during leisure time in the Danish Health Examination Survey 2007-2008. Information on low back pain diagnoses up to 10 September 2015 was obtained from The National Patient Register. The mean follow-up time was 7.4 years. Data were analysed using Cox regression analysis with adjustment...... disc disease. However, moderate or vigorous physical activity, as compared to light physical activity, was associated with increased risk of low back pain (HR = 1.16, 95% CI: 1.03-1.30 and HR = 1.45, 95% CI: 1.15-1.83). Moderate, but not vigorous physical activity was associated with increased risk...

  16. Relative performance of priority rules for hybrid flow shop scheduling with setup times

    Directory of Open Access Journals (Sweden)

    Helio Yochihiro Fuchigami

    2015-12-01

    Full Text Available This paper addresses the hybrid flow shop scheduling problem with explicit and sequence-independent setup times. This production environment is a multistage system with a unidirectional flow of jobs, wherein each stage may contain multiple machines available for processing. The optimized measure was the total time to complete the schedule (makespan). The aim was to propose new priority rules to support scheduling and to evaluate their relative performance in the production system considered, by the percentage of success, relative deviation, standard deviation of relative deviation, and average CPU time. Computational experiments indicated that the rules using ascending order of the sum of the processing and setup times of the first stage (SPT1 and SPT1_ERD) performed best, together reaching more than 56% success.
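
    The winning SPT1 rule is simple to state in code: sort jobs by the sum of first-stage processing and setup times. A comparator sketch follows; the field names are assumptions, not the paper's notation.

```c
/* SPT1 rule: ascending order of (stage-1 processing time + setup time). */
#include <stdlib.h>

typedef struct { double p1, s1; /* stage-1 processing and setup times */ } Job;

static int spt1_cmp(const void *a, const void *b)
{
    double ka = ((const Job *)a)->p1 + ((const Job *)a)->s1;
    double kb = ((const Job *)b)->p1 + ((const Job *)b)->s1;
    return (ka > kb) - (ka < kb);   /* -1, 0 or 1 without overflow */
}

void order_by_spt1(Job *jobs, size_t n)
{
    qsort(jobs, n, sizeof *jobs, spt1_cmp);
}
```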

  17. Newmark local time stepping on high-performance computing architectures

    KAUST Repository

    Rietmann, Max

    2016-11-25

    In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strong element-size contrasts (more than 100×). Capable of running on large CPU and GPU clusters, we present both synthetic validation examples and large scale, realistic application examples to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.
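
    The CFL reasoning behind LTS can be summarised in one line (a standard statement, with C the Courant constant, h_e the element size and c_e the local wave speed; not the paper's derivation):

```latex
% Explicit stability bound per element and the global step it forces:
\[
  \Delta t_e \le C\,\frac{h_e}{c_e}, \qquad
  \Delta t_{\mathrm{global}} = \min_e \Delta t_e
\]
% One tiny element throttles the whole mesh; LTS instead advances a refined
% region with p sub-steps of size \Delta t / p, with p set per refinement
% level from the local CFL bound.
```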

  18. Newmark local time stepping on high-performance computing architectures

    KAUST Repository

    Rietmann, Max; Grote, Marcus; Peter, Daniel; Schenk, Olaf

    2016-01-01

    In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strong element-size contrasts (more than 100×). Capable of running on large CPU and GPU clusters, we present both synthetic validation examples and large scale, realistic application examples to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.

  19. Newmark local time stepping on high-performance computing architectures

    Energy Technology Data Exchange (ETDEWEB)

    Rietmann, Max, E-mail: max.rietmann@erdw.ethz.ch [Institute for Computational Science, Università della Svizzera italiana, Lugano (Switzerland); Institute of Geophysics, ETH Zurich (Switzerland); Grote, Marcus, E-mail: marcus.grote@unibas.ch [Department of Mathematics and Computer Science, University of Basel (Switzerland); Peter, Daniel, E-mail: daniel.peter@kaust.edu.sa [Institute for Computational Science, Università della Svizzera italiana, Lugano (Switzerland); Institute of Geophysics, ETH Zurich (Switzerland); Schenk, Olaf, E-mail: olaf.schenk@usi.ch [Institute for Computational Science, Università della Svizzera italiana, Lugano (Switzerland)

    2017-04-01

    In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strong element-size contrasts (more than 100x). Capable of running on large CPU and GPU clusters, we present both synthetic validation examples and large scale, realistic application examples to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.

  20. Single machine total completion time minimization scheduling with a time-dependent learning effect and deteriorating jobs

    Science.gov (United States)

    Wang, Ji-Bo; Wang, Ming-Zheng; Ji, Ping

    2012-05-01

    In this article, we consider a single-machine scheduling problem with a time-dependent learning effect and deteriorating jobs. By the effects of time-dependent learning and deterioration, we mean that the job processing time is defined by a function of its starting time and the total normal processing time of the jobs in front of it in the sequence. The objective is to determine an optimal schedule so as to minimize the total completion time. This problem remained open for the case of -1 < a < 0, where a denotes the learning index; we show that an optimal schedule of the problem is V-shaped with respect to the job normal processing times. Three heuristic algorithms utilising the V-shaped property are proposed, and computational experiments show that the last heuristic algorithm performs effectively and efficiently in obtaining near-optimal solutions.

  1. Effective electron-density map improvement and structure validation on a Linux multi-CPU web cluster: The TB Structural Genomics Consortium Bias Removal Web Service.

    Science.gov (United States)

    Reddy, Vinod; Swanson, Stanley M; Segelke, Brent; Kantardjieff, Katherine A; Sacchettini, James C; Rupp, Bernhard

    2003-12-01

    Anticipating a continuing increase in the number of structures solved by molecular replacement in high-throughput crystallography and drug-discovery programs, a user-friendly web service for automated molecular replacement, map improvement, bias removal and real-space correlation structure validation has been implemented. The service is based on an efficient bias-removal protocol, Shake&wARP, and implemented using EPMR and the CCP4 suite of programs, combined with various shell scripts and Fortran90 routines. The service returns improved maps, converted data files and real-space correlation and B-factor plots. User data are uploaded through a web interface and the CPU-intensive iteration cycles are executed on a low-cost Linux multi-CPU cluster using the Condor job-queuing package. Examples of map improvement at various resolutions are provided and include model completion and reconstruction of absent parts, sequence correction, and ligand validation in drug-target structures.

  2. Whole blood coagulation time, haematocrit, haemoglobin and total ...

    African Journals Online (AJOL)

    The study was carried out to determine the values of whole blood coagulation time (WBCT), haematocrit (HM), haemaglobin (HB) and total protein (TP) of one hundred and eighteen apparently healthy turkeys reared under an extensive management system in Zaria. The mean values for WBCT, HM, HB and TP were 1.12 ...

  3. Does antegrade JJ stenting affect the total operative time during laparoscopic pyeloplasty?

    Science.gov (United States)

    Bolat, Mustafa Suat; Çınar, Önder; Akdeniz, Ekrem

    2017-12-01

    We aimed to show the effect of preoperative retrograde JJ stenting and intraoperative antegrade JJ stenting techniques on operative time in patients who underwent laparoscopic pyeloplasty. A total of 34 patients (15 male and 19 female) with ureteropelvic junction obstruction were retrospectively investigated. Fifteen patients were stented retrogradely at the beginning of the procedure (Group 1), and 19 were stented antegradely during the procedure (Group 2). A transperitoneal dismembered pyeloplasty technique was performed in all patients. The two groups were retrospectively compared in terms of complications, mean total operative time, and mean stenting time. The mean ages of the patients were 31.5±15.5 and 33.2±15.5 years (p=0.09), and the mean body mass indexes were 25.8±5.6 and 26.2±8.4 kg/m² in Group 1 and Group 2, respectively. The mean total operative times were 128.9±38.9 min and 112.7±21.9 min (p=0.04); the mean stenting times were 12.6±5.4 min and 3.5±2.4 min (p=0.02); and the mean ratios of catheterization time to total surgery time were 0.1 and 0.03 (p=0.01) in Groups 1 and 2, respectively. The mean hospital stays and the mean anastomosis times were similar between the two groups (p>0.05). Antegrade JJ stenting during laparoscopic pyeloplasty significantly decreased the total operative time.

  4. Timing comparison of two-dimensional discrete-ordinates codes for criticality calculations

    International Nuclear Information System (INIS)

    Miller, W.F. Jr.; Alcouffe, R.E.; Bosler, G.E.; Brinkley, F.W. Jr.; O'dell, R.D.

    1979-01-01

    The authors compare two-dimensional discrete-ordinates neutron transport computer codes to solve reactor criticality problems. The fundamental interest is in determining which code requires the minimum Central Processing Unit (CPU) time for a given numerical model of a reasonably realistic fast reactor core and peripherals. The computer codes considered are the most advanced available and, in three cases, are not officially released. The conclusion, based on the study of four fast reactor core models, is that for this class of problems the diffusion synthetic accelerated version of TWOTRAN, labeled TWOTRAN-DA, is superior to the other codes in terms of CPU requirements

  5. Deployment of 464XLAT (RFC6877) alongside IPv6-only CPU resources at WLCG sites

    Science.gov (United States)

    Froy, T. S.; Traynor, D. P.; Walker, C. J.

    2017-10-01

    IPv4 is now officially deprecated by the IETF. A significant amount of effort has already been expended by the HEPiX IPv6 Working Group on testing dual-stacked hosts and IPv6-only CPU resources. Dual-stack adds complexity and administrative overhead to sites that may already be starved of resources. This has resulted in a very slow uptake of IPv6 at WLCG sites. 464XLAT (RFC6877) is intended for IPv6 single-stack environments that require the ability to communicate with IPv4-only endpoints. This paper will present a deployment strategy for 464XLAT, operational experience of using 464XLAT in production at a WLCG site, and important information to consider prior to deploying 464XLAT.
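
    The address synthesis underlying 464XLAT follows RFC 6052: the IPv4 address is embedded in the low 32 bits of a /96 IPv6 prefix. A self-contained sketch using the well-known prefix 64:ff9b::/96 (one of several deployment options) is shown below.

```c
/* RFC 6052 IPv4-embedded IPv6 address synthesis, well-known /96 prefix. */
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>

/* Build the NAT64 representation of an IPv4 address. */
void synthesize(const struct in_addr *v4, struct in6_addr *v6)
{
    static const unsigned char wkp[12] =
        { 0x00, 0x64, 0xff, 0x9b, 0, 0, 0, 0, 0, 0, 0, 0 };  /* 64:ff9b::/96 */
    memcpy(v6->s6_addr, wkp, 12);
    memcpy(v6->s6_addr + 12, &v4->s_addr, 4);   /* IPv4 in the low 32 bits */
}

int main(void)
{
    struct in_addr v4; struct in6_addr v6; char buf[INET6_ADDRSTRLEN];
    inet_pton(AF_INET, "192.0.2.33", &v4);      /* RFC 6052's own example */
    synthesize(&v4, &v6);
    printf("%s\n", inet_ntop(AF_INET6, &v6, buf, sizeof buf));
    /* prints 64:ff9b::c000:221 */
    return 0;
}
```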

  6. Objectively Measured Total and Occupational Sedentary Time in Three Work Settings

    Science.gov (United States)

    van Dommelen, Paula; Coffeng, Jennifer K.; van der Ploeg, Hidde P.; van der Beek, Allard J.; Boot, Cécile R. L.; Hendriksen, Ingrid J. M.

    2016-01-01

    Background Sedentary behaviour increases the risk for morbidity. Our primary aim is to determine the proportion and factors associated with objectively measured total and occupational sedentary time in three work settings. Secondary aim is to study the proportion of physical activity and prolonged sedentary bouts. Methods Data were obtained using ActiGraph accelerometers from employees of: 1) a financial service provider (n = 49 men, 31 women), 2) two research institutes (n = 30 men, 57 women), and 3) a construction company (n = 38 men). Total (over the whole day) and occupational sedentary time, physical activity and prolonged sedentary bouts (lasting ≥30 minutes) were calculated by work setting. Linear regression analyses were performed to examine general, health and work-related factors associated with sedentary time. Results The employees of the financial service provider and the research institutes spent 76–80% of their occupational time in sedentary behaviour, 18–20% in light intensity physical activity and 3–5% in moderate-to-vigorous intensity physical activity. Occupational time in prolonged sedentary bouts was 27–30%. Total time was less sedentary (64–70%), and had more light intensity physical activity (26–33%). The employees of the construction company spent 44% of their occupational time in sedentary behaviour, 49% in light, and 7% in moderate intensity physical activity, and spent 7% in sedentary bouts. Total time spent in sedentary behavior was 56%, 40% in light, and 4% in moderate intensity physical behaviour, and 12% in sedentary bouts. For women, low to intermediate education was the only factor that was negatively associated with occupational sedentary time. Conclusions Sedentary behaviour is high among white-collar employees, especially in highly educated women. A relatively small proportion of sedentary time was accrued in sedentary bouts. It is recommended that worksite health promotion efforts should focus on reducing sedentary

  7. A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems

    Energy Technology Data Exchange (ETDEWEB)

    Ha, Woo Seok; Kim, Soo Mee; Park, Min Jae; Lee, Dong Soo; Lee, Jae Sung [Seoul National University, Seoul (Korea, Republic of)

    2009-10-15

    The maximum likelihood-expectation maximization (ML-EM) is the statistical reconstruction algorithm derived from a probabilistic model of the emission and detection processes. Although ML-EM has many advantages in accuracy and utility, its use is limited by the computational burden of iterating on a CPU (central processing unit). In this study, we developed a parallel computing technique on the GPU (graphics processing unit) for the ML-EM algorithm. Using a GeForce 9800 GTX+ graphics card and CUDA (compute unified device architecture), the projection and backprojection in the ML-EM algorithm were parallelized using NVIDIA's technology. The computation times per iteration for projection, for the errors between measured and estimated data, and for backprojection were measured. Total time included the latency of data transmission between RAM and GPU memory. The total computation times of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 sec, respectively. In this case, the computing speed was improved about 15 times on the GPU. When the number of iterations increased to 1024, the CPU- and GPU-based computing took 18 min and 8 sec in total, respectively. The improvement was about 135 times and was caused by delays in CPU-based computing after a certain number of iterations. On the other hand, the GPU-based computation showed very little variation in time per iteration due to the use of shared memory. GPU-based parallel computation for ML-EM significantly improved the computing speed and stability. The developed GPU-based ML-EM algorithm could easily be modified for some other imaging geometries.
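
    For reference, one ML-EM iteration in its textbook multiplicative form is sketched below with a dense system matrix; real emission-tomography code, including GPU versions like the one above, uses sparse or on-the-fly projectors, so this serial version only pins down the algorithm.

```c
/* One ML-EM iteration:
 *   x_j <- x_j / (sum_i a_ij) * sum_i a_ij * y_i / (A x)_i
 * A is nbins x nvox (dense here for clarity); fp is scratch of size nbins. */
#include <stddef.h>

void mlem_iter(const double *A, const double *y, double *x,
               size_t nbins, size_t nvox, double *fp)
{
    for (size_t i = 0; i < nbins; i++) {          /* forward projection Ax */
        double s = 0.0;
        for (size_t j = 0; j < nvox; j++) s += A[i * nvox + j] * x[j];
        fp[i] = s > 0.0 ? y[i] / s : 0.0;         /* measured / estimated */
    }
    for (size_t j = 0; j < nvox; j++) {           /* backproject the ratios */
        double num = 0.0, norm = 0.0;
        for (size_t i = 0; i < nbins; i++) {
            num  += A[i * nvox + j] * fp[i];
            norm += A[i * nvox + j];              /* sensitivity term */
        }
        if (norm > 0.0)
            x[j] *= num / norm;                   /* multiplicative update */
    }
}
```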

  8. A Study on GPU-based Iterative ML-EM Reconstruction Algorithm for Emission Computed Tomographic Imaging Systems

    International Nuclear Information System (INIS)

    Ha, Woo Seok; Kim, Soo Mee; Park, Min Jae; Lee, Dong Soo; Lee, Jae Sung

    2009-01-01

    The maximum likelihood-expectation maximization (ML-EM) algorithm is a statistical reconstruction method derived from a probabilistic model of the emission and detection processes. Although ML-EM has many advantages in accuracy and utility, its use is limited by the computational burden of iterative processing on a CPU (central processing unit). In this study, we developed a parallel computing technique on the GPU (graphics processing unit) for the ML-EM algorithm. Using a GeForce 9800 GTX+ graphics card and CUDA (compute unified device architecture), NVIDIA's parallel computing technology, the projection and backprojection steps of the ML-EM algorithm were parallelized. The computation times per iteration for the projection, for the errors between measured and estimated data, and for the backprojection were measured. Total time included the latency of data transfers between RAM and GPU memory. The total computation times of the CPU- and GPU-based ML-EM with 32 iterations were 3.83 and 0.26 sec, respectively; in this case the computing speed was improved about 15 times on the GPU. When the number of iterations increased to 1,024, the CPU- and GPU-based computations took 18 min and 8 sec in total, respectively. The improvement was about 135 times, owing to growing delays in the CPU-based computation after a certain number of iterations; the GPU-based computation, by contrast, showed very little variation in time per iteration thanks to its use of shared memory. The GPU-based parallel computation significantly improved the computing speed and stability of ML-EM. The developed GPU-based ML-EM algorithm could easily be modified for some other imaging geometries

  9. Total Work, Gender and Social Norms in EU and US Time Use

    OpenAIRE

    Burda , Michael C; Hamermesh , Daniel S; Weil , Philippe

    2008-01-01

    Using time-diary data from 27 countries, we demonstrate a negative relationship between real GDP per capita and the female-male difference in total work time--the sum of work for pay and work at home. We also show that in rich non-Catholic countries on four continents men and women do the same amount of total work on average. Our survey results demonstrate that labor economists, macroeconomists, sociologists and the general public consistently believe that women perform more total work. The f...

  10. PROCESS INNOVATION: HOLISTIC SCENARIOS TO REDUCE TOTAL LEAD TIME

    Directory of Open Access Journals (Sweden)

    Alin POSTEUCĂ

    2015-11-01

    Full Text Available The globalization of markets requires the continuous development of holistic business scenarios to ensure the flexibility needed to satisfy customers. Continuous improvement of the supply chain presupposes continuous improvement of material and product lead times and flows, of stocks of materials and finished products, and an increase in the number of suppliers located as close by as possible. The contribution of our study is to present holistic scenarios for total lead time improvement and innovation through the implementation of supply chain policy.

  11. Research on control law accelerator of digital signal process chip TMS320F28035 for real-time data acquisition and processing

    Science.gov (United States)

    Zhao, Shuangle; Zhang, Xueyi; Sun, Shengli; Wang, Xudong

    2017-08-01

    The TI C2000 series of digital signal processing (DSP) chips has been widely used in electrical engineering, measurement and control, communications and other professional fields, and the TMS320F28035 is one of its most representative members. A DSP program needs both data acquisition and data processing; if ordinary C or assembly language programming is used, the program runs sequentially, the analogue-to-digital (AD) converter cannot acquire data in real time, and much data is often missed. The control law accelerator (CLA) is a coprocessor that runs in parallel with the main central processing unit (CPU), at the same clock frequency as the main CPU, and supports floating-point operations. Therefore, the CLA coprocessor is used in the program: the CLA kernel is responsible for data processing, while the main CPU is responsible for the AD conversion. The advantage of this method is that it reduces data processing time and achieves real-time data acquisition.
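
    The division of labour described here — one core acquiring, the other processing in parallel — can be mimicked on a PC; a hedged Python sketch of the producer/consumer split (the real CLA is programmed in a C subset on the F28035, so this is only an architectural analogy with illustrative names):

        from multiprocessing import Process, Queue

        def acquire(q, n_samples):
            """Stand-in for the main CPU's job: trigger AD conversions and
            hand raw samples off without stopping to process them."""
            for i in range(n_samples):
                q.put(i * 0.001)          # hypothetical ADC reading
            q.put(None)                   # sentinel: acquisition finished

        def process(q):
            """Stand-in for the CLA's job: consume and process samples in
            parallel with the ongoing acquisition."""
            total = 0.0
            while (sample := q.get()) is not None:
                total += sample           # placeholder for real filtering/math
            print("processed sum:", round(total, 3))

        if __name__ == "__main__":
            q = Queue()
            worker = Process(target=process, args=(q,))
            worker.start()                # processing runs concurrently
            acquire(q, 1000)              # acquisition never blocks on math
            worker.join()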

  12. Total and domain-specific sitting time among employees in desk-based work settings in Australia.

    Science.gov (United States)

    Bennie, Jason A; Pedisic, Zeljko; Timperio, Anna; Crawford, David; Dunstan, David; Bauman, Adrian; van Uffelen, Jannique; Salmon, Jo

    2015-06-01

    To describe the total and domain-specific daily sitting time among a sample of Australian office-based employees. In April 2010, paper-based surveys were provided to desk-based employees (n=801) in Victoria, Australia. Total daily and domain-specific (work, leisure-time and transport-related) sitting time (minutes/day) were assessed by validated questionnaires. Differences in sitting time were examined across socio-demographic (age, sex, occupational status) and lifestyle characteristics (physical activity levels, body mass index [BMI]) using multiple linear regression analyses. The median (95% confidence interval [CI]) total daily sitting time was 540 (531-557) minutes/day. Insufficiently active adults (median=578 minutes/day [95%CI: 564-602]) and younger adults aged 18-29 years (median=561 minutes/day [95%CI: 540-577]) reported the highest total daily sitting times. Occupational sitting time accounted for almost 60% of total daily sitting time. In multivariate analyses, total daily sitting time was negatively associated with age (unstandardised regression coefficient [B]=-1.58) and with physical activity (minutes/week; B=-0.03). Employees reported that more than half of their total daily sitting time was accrued in the work setting. Given the high contribution of occupational sitting to total daily sitting time among desk-based employees, interventions should focus on the work setting. © 2014 Public Health Association of Australia.
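
    As an aside, the "unstandardised regression coefficient B" reported above is an ordinary least-squares coefficient; a hedged sketch with made-up numbers (all values illustrative, not the study's data):

        import numpy as np

        # Hypothetical toy design: intercept, age (years), activity (min/week).
        X = np.column_stack([np.ones(6),
                             [22, 29, 35, 41, 52, 60],
                             [300, 220, 150, 90, 60, 30]])
        y = np.array([620, 600, 580, 545, 520, 470.0])  # sitting minutes/day

        # Unstandardised coefficients B, as in the abstract's regression model.
        B, *_ = np.linalg.lstsq(X, y, rcond=None)
        print(dict(zip(["intercept", "age", "activity"], np.round(B, 2))))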

  13. CPU Server

    CERN Multimedia

    The CERN computer centre has hundreds of racks like these. They are over a million times more powerful than our first computer in the 1960s. This tray is a 'dual-core' server, meaning it effectively contains two CPUs (e.g. two of your home computers miniaturised to fit into a single box). Also note the copper cooling fins, which help dissipate the heat.

  14. Control of total voltage in the large distributed RF system of LEP

    CERN Document Server

    Ciapala, Edmond

    1995-01-01

    The LEP RF system is made up of a large number of independent RF units situated around the ring near the interaction points. These have different available RF voltages depending on their type and they may be inactive or unable to provide full voltage for certain periods. The original RF voltage control system was based on local RF unit voltage function generators pre-loaded with individual tables for energy ramping. This was replaced this year by a more flexible global RF voltage control system. A central controller in the main control room has direct access to the units over the LEP TDM system via multiplexers and local serial links. It continuously checks the state of all the units and adjusts their voltages to maintain the desired total voltage under all conditions. This voltage is distributed among the individual units to reduce the adverse effects of RF voltage asymmetry around the machine as far as possible. The central controller is a VME system with 68040 CPU and real time multitasking operating syste...

  15. Online real-time reconstruction of adaptive TSENSE with commodity CPU / GPU hardware

    DEFF Research Database (Denmark)

    Roujol, Sebastien; de Senneville, Baudouin; Vahala, E.

    2009-01-01

    A real-time reconstruction for adaptive TSENSE is presented that is optimized for MR-guidance of interventional procedures. The proposed method allows high frame-rate imaging with low image latencies, even when large coil arrays are employed and can be implemented on affordable commodity hardware....

  16. Developing infrared array controller with software real time operating system

    Science.gov (United States)

    Sako, Shigeyuki; Miyata, Takashi; Nakamura, Tomohiko; Motohara, Kentaro; Uchimoto, Yuka Katsuno; Onaka, Takashi; Kataza, Hirokazu

    2008-07-01

    Real-time capabilities are required for a controller of a large-format array to reduce the dead time attributable to readout and data transfer. Real-time processing has been achieved with dedicated processors, including DSP, CPLD, and FPGA devices. However, dedicated processors have problems with memory resources, inflexibility, and high cost. Meanwhile, a recent PC has sufficient CPU and memory resources to control an infrared array and to process a large amount of frame data in real time. In this study, we have developed an infrared array controller with a software real-time operating system (RTOS) instead of dedicated processors. A Linux PC equipped with the RTAI extension and a dual-core CPU is used as the main computer, and one of the CPU cores is allocated to real-time processing. A digital I/O board with DMA functions is used as the I/O interface. The signal-processing cores are integrated into the OS kernel as a real-time driver module, which is composed of two virtual devices: the clock-processor and frame-processor tasks. The array controller with the RTOS realizes complicated operations easily, flexibly, and at low cost.

  17. Event processing time prediction at the CMS experiment of the Large Hadron Collider

    International Nuclear Information System (INIS)

    Cury, Samir; Gutsche, Oliver; Kcira, Dorian

    2014-01-01

    Physics event reconstruction is one of the biggest challenges for the computing of the LHC experiments. Among the different tasks that the computing systems of the CMS experiment perform, reconstruction takes most of the available CPU resources. The reconstruction time of single collisions varies according to event complexity. Measurements were made to quantify this correlation, providing the means to predict reconstruction time from the data-taking conditions of the input samples. Currently, the data processing system splits tasks into groups with the same number of collisions and does not account for variations in processing time. These variations can be large and can lead to a considerable increase in the time it takes CMS workflows to finish. The goal of this study was to use estimates of processing time to split workflows into jobs more efficiently. By considering the CPU time needed for each job, the spread of the job-length distribution in a workflow is reduced.
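
    The splitting idea reduces to greedy grouping by predicted time rather than by event count; a hedged sketch (illustrative names and numbers, not the CMS production code):

        def split_events(event_times, target_seconds):
            """Greedily group events so each job's predicted CPU time stays
            near a target, instead of using a fixed event count.

            event_times : predicted per-event processing times (seconds)
            """
            jobs, current, load = [], [], 0.0
            for t in event_times:
                if current and load + t > target_seconds:
                    jobs.append(current)          # close the current job
                    current, load = [], 0.0
                current.append(t)
                load += t
            if current:
                jobs.append(current)
            return jobs

        # Events of varying complexity: fixed-size jobs would be unbalanced.
        times = [1.2, 0.4, 3.0, 0.3, 0.2, 2.5, 0.9, 1.1]
        for i, job in enumerate(split_events(times, target_seconds=3.0)):
            print(f"job {i}: {len(job)} events, {sum(job):.1f} s predicted")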

  18. SpaceCubeX: A Framework for Evaluating Hybrid Multi-Core CPU FPGA DSP Architectures

    Science.gov (United States)

    Schmidt, Andrew G.; Weisz, Gabriel; French, Matthew; Flatley, Thomas; Villalpando, Carlos Y.

    2017-01-01

    The SpaceCubeX project is motivated by the need for high performance, modular, and scalable on-board processing to help scientists answer critical 21st century questions about global climate change, air quality, ocean health, and ecosystem dynamics, while adding new capabilities such as low-latency data products for extreme event warnings. These goals translate into on-board processing throughput requirements that are on the order of 100-1,000 times those of previous Earth Science missions for standard processing, compression, storage, and downlink operations. To study possible future architectures that achieve these performance requirements, the SpaceCubeX project provides an evolvable testbed and framework that enables a focused design space exploration of candidate hybrid CPU/FPGA/DSP processing architectures. The framework includes ArchGen, an architecture generator tool populated with candidate architecture components, performance models, and IP cores, which allows an end user to specify the type, number, and connectivity of a hybrid architecture. The framework requires minimal extensions to integrate new processors, such as the anticipated High Performance Spaceflight Computer (HPSC), reducing the time to initiate benchmarking by months. To evaluate the framework, we leverage a wide suite of high performance embedded computing benchmarks and Earth science scenarios to ensure robust architecture characterization. We report on our project's Year 1 efforts and demonstrate the capabilities across four simulation testbed models: a baseline SpaceCube 2.0 system, a dual ARM A9 processor system, a hybrid quad ARM A53 and FPGA system, and a hybrid quad ARM A53 and DSP system.

  19. Television viewing, computer use and total screen time in Canadian youth.

    Science.gov (United States)

    Mark, Amy E; Boyce, William F; Janssen, Ian

    2006-11-01

    Research has linked excessive television viewing and computer use in children and adolescents to a variety of health and social problems. Current recommendations are that screen time in children and adolescents should be limited to no more than 2 h per day. To determine the percentage of Canadian youth meeting the screen time guideline recommendations. The representative study sample consisted of 6942 Canadian youth in grades 6 to 10 who participated in the 2001/2002 World Health Organization Health Behaviour in School-Aged Children survey. Only 41% of girls and 34% of boys in grades 6 to 10 watched 2 h or less of television per day. Once the time of leisure computer use was included and total daily screen time was examined, only 18% of girls and 14% of boys met the guidelines. The prevalence of those meeting the screen time guidelines was higher in girls than boys. Fewer than 20% of Canadian youth in grades 6 to 10 met the total screen time guidelines, suggesting that increased public health interventions are needed to reduce the number of leisure time hours that Canadian youth spend watching television and using the computer.

  20. Real-Time Generic Face Tracking in the Wild with CUDA

    NARCIS (Netherlands)

    Cheng, Shiyang; Asthana, Akshay; Asthana, Ashish; Zafeiriou, Stefanos; Shen, Jie; Pantic, Maja

    We present a robust real-time face tracking system based on the Constrained Local Models framework by adopting the novel regression-based Discriminative Response Map Fitting (DRMF) method. By exploiting the algorithm's potential parallelism, we present a hybrid CPU-GPU implementation capable of

  1. Surgical time and complications of total transvaginal (total-NOTES, single-port laparoscopic-assisted and conventional ovariohysterectomy in bitches

    Directory of Open Access Journals (Sweden)

    M.A.M. Silva

    2015-06-01

    Full Text Available The recently developed minimally invasive techniques of ovariohysterectomy (OVH) have been studied in dogs in order to optimize their benefits and decrease risks to the patients. The purpose of this study was to compare surgical time, complications and technical difficulties of transvaginal total-NOTES, single-port laparoscopic-assisted and conventional OVH in bitches. Twelve bitches underwent total-NOTES (NOTES group), while 13 underwent single-port laparoscopic-assisted (SPLA group) and 15 underwent conventional OVH (OPEN group). The intra-operative period was divided into 7 stages: (1) access to the abdominal cavity; (2) pneumoperitoneum; approach to the (3) right and (4) left ovarian pedicle and (5) uterine body; (6) abdominal or vaginal synthesis, performed in 6 out of 12 patients of the NOTES group; (7) inoperative time. Overall and per-stage operative times, intra- and postoperative complications and technical difficulties were compared among groups. Mean overall surgical times in the NOTES (25.7±6.8 minutes) and SPLA (23.1±4.0 minutes) groups were shorter than in the OPEN group (34.0±6.4 minutes) (P<0.05). The intraoperative stage that required the longest time was the approach to the uterine body in the NOTES group and the abdominal and cutaneous sutures in the OPEN group. There was no difference in the rates of complications. Major complications included postoperative bleeding requiring reoperation in one bitch in the OPEN group, while minor complications included mild vaginal discharge in four patients in the NOTES group and seroma in three bitches in the SPLA group. In conclusion, total-NOTES and SPLA OVH were less time-consuming than conventional OVH in bitches. All techniques presented complications, which were properly managed.

  2. Timing of Re-Transfusion Drain Removal Following Total Knee Replacement

    Science.gov (United States)

    Leeman, MF; Costa, ML; Costello, E; Edwards, D

    2006-01-01

    INTRODUCTION The use of postoperative drains following total knee replacement (TKR) has recently been modified by the introduction of re-transfusion drains. The aim of our study was to investigate the optimal time for removal of re-transfusion drains following TKR. PATIENTS AND METHODS The medical records of 66 patients who had a TKR performed between October 2003 and October 2004 were reviewed; the volume of blood drained in the first 6 h and the total volume of blood drained were recorded. RESULTS A total of 56 patients had complete records of postoperative drainage. The mean volume of blood collected in the drain in the first 6 h was 442 ml. The mean total volume of blood in the drain was 595 ml. Therefore, 78% of the blood drained was available for transfusion. CONCLUSION Re-transfusion drains should be removed after 6 h, when no further re-transfusion is permissible. PMID:16551400

  3. Total sleep time severely drops during adolescence.

    Directory of Open Access Journals (Sweden)

    Damien Leger

    Full Text Available UNLABELLED: Restricted sleep duration among young adults and adolescents has been shown to increase the risk of morbidities such as obesity, diabetes or accidents. However, there are few epidemiological studies of normal total sleep time (TST) in representative groups of teenagers that would provide normative data. PURPOSE: To explore perceived total sleep time on schooldays (TSTS) and non-schooldays (TSTN) and the prevalence of sleep-initiating insomnia among a nationally representative sample of teenagers. METHODS: Data from 9,251 children aged 11 to 15 years old, 50.7% of them boys, collected as part of the cross-national 2011 HBSC study, were analyzed. Self-completion questionnaires were administered in classrooms. Estimates of TSTS and TSTN (week-ends and vacations) were calculated from a specifically designed sleep-habits report. Sleep deprivation was estimated by a TSTN - TSTS difference >2 hours. Sleep-initiating insomnia was assessed according to the International Classification of Sleep Disorders (ICSD-2). Children who reported sleeping 7 hours or less per night were considered short sleepers. RESULTS: A serious drop in TST was observed between 11 and 15 years of age, both on schooldays (9 hours 26 minutes vs. 7 h 55 min.; p<0.001) and, to a lesser extent, on week-ends (10 h 17 min. vs. 9 h 44 min.; p<0.001). Sleep deprivation concerned 16.0% of children aged 11 vs. 40.5% of those aged 15 (p<0.001). Too-short sleep was reported by 2.6% of the 11-year-olds vs. 24.6% of the 15-year-olds (p<0.001). CONCLUSION: Despite the obvious need for sleep in adolescence, TST drastically decreases with age among children from 11 to 15 years old, creating a significant sleep debt that increases with age.

  4. SAFARI digital processing unit: performance analysis of the SpaceWire links in case of a LEON3-FT based CPU

    Science.gov (United States)

    Giusi, Giovanni; Liu, Scige J.; Di Giorgio, Anna M.; Galli, Emanuele; Pezzuto, Stefano; Farina, Maria; Spinoglio, Luigi

    2014-08-01

    SAFARI (SpicA FAR infrared Instrument) is a far-infrared imaging Fourier transform spectrometer for the SPICA mission. The Digital Processing Unit (DPU) of the instrument implements the functions of controlling the overall instrument and of science data compression and packing. The DPU design is based on the use of a LEON-family processor. In SAFARI, all instrument components are connected to the central DPU via SpaceWire links. On these links, science data, housekeeping and command flows are in some cases multiplexed; the interface control must therefore be able to cope with variable throughput needs. The effective data-transfer workload can be an issue for overall system performance and becomes a critical parameter for the on-board software design, both at the application layer and at lower, more hardware-related, levels. To analyze the system behavior in the presence of the expected, demanding SAFARI science data flow, we carried out a series of performance tests using the standard GR-CPCI-UT699 LEON3-FT Development Board, provided by Aeroflex/Gaisler, connected to the emulator of the SAFARI science data links in a point-to-point topology. Two different communication protocols were used in the tests: the ECSS-E-ST-50-52C RMAP protocol and an internally defined one, the SAFARI internal data handling protocol. An incremental approach was adopted to measure the system performance at different levels of communication-protocol complexity. In all cases the performance was evaluated by measuring the CPU workload and the bus latencies. The tests were executed initially in a custom low-level execution environment and finally using the Real-Time Executive for Multiprocessor Systems (RTEMS), which has been selected as the operating system to be used on board SAFARI. The preliminary results of the performance analysis confirmed the possibility of using a LEON3 CPU processor in the SAFARI DPU, but pointed out, in agreement

  5. Working time intervals and total work time on nursing positions in Poland

    Directory of Open Access Journals (Sweden)

    Danuta Kunecka

    2015-06-01

    Full Text Available Background: For the last few years, the topic of overwork in nursing positions has given rise to strong discussion. The author set herself the goal of answering the question of whether this results from real overwork in this particular profession or rather from the commonly assumed frustration of this professional group. The aim of this paper is to analyze working time in selected nursing positions in relation to the time used as intervals (breaks) in the course of standard professional activities during one working day. Material and Methods: The research material consisted of work-time documentation for selected nursing workplaces, compiled between 2007 and 2012 within the framework of a nursing course at the Pomeranian Medical University in Szczecin. A photograph of the working day was used as the method of measurement. Measurements were performed in institutions located in 6 voivodeships in Poland. Results: The results suggest that only 6.5% of the surveyed representatives of the nursing profession spend the proper amount of time (i.e., the time set by the applicable standards) on work intervals during a working day. Conclusions: The scale of the phenomenon indicates an excessive workload in nursing positions which, over a longer period of time and with longer working hours, may decrease work efficiency and the quality of provided services. Med Pr 2015;66(2):165–172

  6. Multi-GPU based acceleration of a list-mode DRAMA toward real-time OpenPET imaging

    Energy Technology Data Exchange (ETDEWEB)

    Kinouchi, Shoko [Chiba Univ. (Japan); National Institute of Radiological Sciences, Chiba (Japan); Yamaya, Taiga; Yoshida, Eiji; Tashima, Hideaki [National Institute of Radiological Sciences, Chiba (Japan); Kudo, Hiroyuki [Tsukuba Univ., Ibaraki (Japan); Suga, Mikio [Chiba Univ. (Japan)

    2011-07-01

    OpenPET, which has a physical gap between two detector rings, is our new PET geometry. In order to realize future radiation therapy guided by OpenPET, real-time imaging is required. We therefore developed a list-mode image reconstruction method using general-purpose graphics processing units (GPUs). For a GPU implementation, the efficiency of the acceleration depends on avoiding conditional statements. Therefore, in our previous study, we developed a new system model suited to GPU implementation. In this paper, we implemented our image reconstruction method using 4 GPUs to achieve further acceleration. We applied the developed reconstruction method to a small OpenPET prototype. The total iteration time using 4 GPUs was 3.4 times faster than with a single GPU. Compared to a single CPU, we achieved a reconstruction-time speed-up of 142 times using 4 GPUs. (orig.)

  7. Real-time image reconstruction and display system for MRI using a high-speed personal computer.

    Science.gov (United States)

    Haishi, T; Kose, K

    1998-09-01

    A real-time NMR image reconstruction and display system was developed using a high-speed personal computer and optimized for the 32-bit multitasking Microsoft Windows 95 operating system. The system was operated at various CPU clock frequencies by changing the motherboard clock frequency and the processor/bus frequency ratio. When a Pentium CPU was used at a 200 MHz clock frequency, the reconstruction time for one 128 x 128 pixel image was 48 ms, and the display time on an enlarged 256 x 256 pixel window was about 8 ms. NMR imaging experiments were performed with three fast imaging sequences (FLASH, multishot EPI, and one-shot EPI) to demonstrate the ability of the real-time system. It was concluded that in most cases a high-speed PC is the best choice for an image reconstruction and display system for real-time MRI. Copyright 1998 Academic Press.
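
    The per-frame budget described here can be reproduced in spirit with a basic Cartesian reconstruction — an inverse 2-D FFT of k-space; a hedged sketch for timing such a step (not the paper's Windows 95 implementation; sizes and data are illustrative):

        import numpy as np
        import time

        def reconstruct(kspace):
            """Basic Cartesian MRI reconstruction: inverse 2-D FFT of
            k-space, magnitude image for display."""
            return np.abs(np.fft.ifft2(np.fft.ifftshift(kspace)))

        # Random complex "k-space" standing in for acquired data.
        kspace = np.random.randn(128, 128) + 1j * np.random.randn(128, 128)

        t0 = time.perf_counter()
        for _ in range(100):
            img = reconstruct(kspace)
        dt = (time.perf_counter() - t0) / 100
        print(f"mean reconstruction time, 128 x 128 frame: {dt * 1e3:.2f} ms")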

  8. Fast Parallel Image Registration on CPU and GPU for Diagnostic Classification of Alzheimer's Disease

    Directory of Open Access Journals (Sweden)

    Denis P Shamonin

    2014-01-01

    Full Text Available Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e. for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: (i) parallelization on the CPU, to speed up the cost function derivative calculation; (ii) parallelization on the GPU building on and extending the OpenCL framework from ITKv4, to speed up the Gaussian pyramid computation and the image resampling step; (iii) exploitation of certain properties of the B-spline transformation model; (iv) further software optimizations. The accelerated registration tool is employed in a study on diagnostic classification of Alzheimer's disease and cognitively normal controls based on T1-weighted MRI. We selected 299 participants from the publicly available Alzheimer's Disease Neuroimaging Initiative database. Classification is performed with a support vector machine based on gray matter volumes as a marker for atrophy. We evaluated two types of strategies (voxel-wise and region-wise) that heavily rely on nonrigid image registration. Parallelization and optimization resulted in an acceleration factor of 4-5x on an 8-core machine. Using OpenCL a speedup factor of ~2 was realized for computation of the Gaussian pyramids, and 15-60 for the resampling step, for larger images. The voxel-wise and the region-wise classification methods had an area under the receiver operator characteristic curve of 88% and 90%, respectively, both for standard and accelerated registration. We conclude that the image registration package elastix was substantially accelerated, with nearly identical results to the non-optimized version. The new functionality will become available in the next release of elastix as open source under the BSD license.

  9. Aggressive time step selection for the time asymptotic velocity diffusion problem

    International Nuclear Information System (INIS)

    Hewett, D.W.; Krapchev, V.B.; Hizanidis, K.; Bers, A.

    1984-12-01

    An aggressive time step selector for an ADI algorithm is presented and applied to the linearized 2-D Fokker-Planck equation including an externally imposed quasilinear diffusion term. This method provides a reduction in CPU requirements by factors of two or three compared to standard ADI. More importantly, the robustness of the procedure greatly reduces the workload of the user. The procedure selects a nearly optimal Δt with a minimum of intervention by the user, thus relieving the need to supervise the algorithm. In effect, the algorithm does its own supervision by discarding time steps made with Δt too large
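
    The control logic of such a selector can be sketched as follows; a hedged illustration of the grow-on-success, shrink-and-retry policy (the acceptance criterion and the growth/shrink factors are assumptions, not the paper's):

        def adaptive_march(step_ok, t_end, dt0, grow=1.5, shrink=0.5, dt_min=1e-8):
            """Drive a time-stepper with an aggressive dt selector: enlarge
            dt after every accepted step, cut it back and retry on failure.

            step_ok(t, dt) -> True if the implicit solve over [t, t+dt] is
            acceptable (e.g. the ADI iteration converged without distortion).
            """
            t, dt, n_steps = 0.0, dt0, 0
            while t < t_end - 1e-12:
                if step_ok(t, dt):
                    t += dt
                    n_steps += 1
                    dt = min(grow * dt, max(t_end - t, dt_min))  # be aggressive
                else:
                    dt *= shrink                    # discard the step, retry smaller
                    if dt < dt_min:
                        raise RuntimeError("dt underflow: step keeps failing")
            return n_steps

        # Toy acceptance rule: pretend only dt < 0.1 + t succeeds (hypothetical).
        print(adaptive_march(lambda t, dt: dt < 0.1 + t, t_end=1.0, dt0=0.01))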

  10. Real-time unmanned aircraft systems surveillance video mosaicking using GPU

    Science.gov (United States)

    Camargo, Aldo; Anderson, Kyle; Wang, Yi; Schultz, Richard R.; Fevig, Ronald A.

    2010-04-01

    Digital video mosaicking from Unmanned Aircraft Systems (UAS) is being used for many military and civilian applications, including surveillance, target recognition, border protection, forest fire monitoring, traffic control on highways, and monitoring of transmission lines, among others. Additionally, NASA is using digital video mosaicking to explore the moon and planets such as Mars. In order to compute a "good" mosaic from video captured by a UAS, the algorithm must deal with motion blur, frame-to-frame jitter associated with an imperfectly stabilized platform, and perspective changes as the camera tilts in flight, as well as a number of other factors. The most suitable algorithms use SIFT (Scale-Invariant Feature Transform) to detect features consistent between video frames. Utilizing these features, the next step is to estimate the homography between two consecutive video frames, perform warping to properly register the image data, and finally blend the video frames, resulting in a seamless video mosaic. All this processing takes a great deal of CPU resources, so it is almost impossible to compute a real-time video mosaic on a single processor. Modern graphics processing units (GPUs) offer computational performance that far exceeds current CPU technology, allowing for real-time operation. This paper presents the development of a GPU-accelerated digital video mosaicking implementation and compares it with CPU performance. Our tests are based on two sets of real video captured by a small UAS aircraft, from infrared (IR) and electro-optical (EO) cameras. Our results show that we can obtain a speed-up of more than 50 times using GPU technology, so real-time operation at a video capture rate of 30 frames per second is feasible.
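
    The per-pair registration step described above maps onto a few OpenCV calls; a hedged CPU sketch (ratio-test threshold and mosaic size are illustrative; the paper's contribution is moving this work onto the GPU):

        import cv2
        import numpy as np

        def register_pair(prev_frame, frame):
            """Estimate the homography mapping `frame` onto `prev_frame`
            from matched SIFT features (ratio test + RANSAC)."""
            sift = cv2.SIFT_create()
            kp1, des1 = sift.detectAndCompute(prev_frame, None)
            kp2, des2 = sift.detectAndCompute(frame, None)
            matches = cv2.BFMatcher().knnMatch(des2, des1, k=2)
            good = [m for m, n in matches if m.distance < 0.75 * n.distance]
            if len(good) < 4:
                raise ValueError("not enough matches for a homography")
            src = np.float32([kp2[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
            dst = np.float32([kp1[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
            H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
            return H

        # Warping the new frame into the mosaic plane (size illustrative):
        # mosaic = cv2.warpPerspective(frame, H, (2000, 1500))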

  11. [Determination of total and segmental colonic transit time in constipated children].

    Science.gov (United States)

    Zhang, Shu-cheng; Wang, Wei-lin; Bai, Yu-zuo; Yuan, Zheng-wei; Wang, Wei

    2003-03-01

    To determine the total and segmental colonic transit time of normal Chinese children and to explore its value in childhood constipation. The subjects involved in this study were divided into 2 groups. One was the control group of 33 healthy children (21 males and 12 females) aged 2 - 13 years (mean 5 years). The other was the constipation group of 25 patients (15 males and 10 females) aged 3 - 14 years (mean 7 years) with constipation according to Benninga's criteria. Written informed consent was obtained from the parents of each subject. In this study the simplified radio-opaque marker method was used to determine the total gastrointestinal transit time and segmental colonic transit time of the normal and constipated children, and in some of these patients X-ray defecography was also used. The total gastrointestinal transit time (TGITT), right colonic transit time (RCTT), left colonic transit time (LCTT) and rectosigmoid colonic transit time (RSTT) of the normal children were 28.7 +/- 7.7 h, 7.5 +/- 3.2 h, 6.5 +/- 3.8 h and 13.4 +/- 5.6 h, respectively. In the constipated children, the TGITT, LCTT and RSTT were significantly longer than in controls (92.2 +/- 55.5 h vs 28.7 +/- 7.7 h, P < 0.001; 16.9 +/- 12.6 h vs 6.5 +/- 3.8 h, P < 0.01; 61.5 +/- 29.0 h vs 13.4 +/- 5.6 h, P < 0.001), while the RCTT showed no significant difference. X-ray defecography demonstrated one rectocele, one perineal descent syndrome and one puborectal muscle syndrome. With the segmental colonic transit times, constipation can be divided into four types: slow-transit constipation, outlet obstruction, mixed type and normal-transit constipation. X-ray defecography can demonstrate anatomical or dynamic abnormalities within the anorectal area, with which constipation can be further divided into different subtypes, and

  12. Novel crystal timing calibration method based on total variation

    Science.gov (United States)

    Yu, Xingjian; Isobe, Takashi; Watanabe, Mitsuo; Liu, Huafeng

    2016-11-01

    A novel crystal timing calibration method based on total variation (TV), abbreviated as 'TV merge', has been developed for a high-resolution positron emission tomography (PET) system. The proposed method was developed for a system with a large number of crystals and can provide timing calibration at the crystal level. In the proposed method, the timing calibration process was formulated as a linear problem. To robustly optimize the timing resolution, a TV constraint was added to the linear equation. Moreover, to solve the computer memory problem associated with calculating the timing calibration factors for systems with a large number of crystals, a merge component was used to obtain the crystal-level timing calibration values. Compared with other conventional methods, data measured from a standard cylindrical phantom filled with a radioisotope solution were sufficient for performing a high-precision crystal-level timing calibration. In this paper, both simulation and experimental studies were performed to demonstrate the effectiveness and robustness of the TV merge method. We compare the timing resolutions of a 22Na point source, which was located in the field of view (FOV) of the brain PET system, with various calibration techniques. After implementing the TV merge method, the timing resolution improved from 3.34 ns at full width at half maximum (FWHM) to 2.31 ns FWHM.
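
    The "linear problem with a TV constraint" can be written compactly; a hedged LaTeX reconstruction with illustrative symbols (t: per-crystal timing offsets, A: matrix selecting the crystal pair of each coincidence, b: measured arrival-time differences), not the paper's own notation:

        \hat{t} \;=\; \arg\min_{t}\; \lVert A\,t - b \rVert_2^2
                \;+\; \lambda\,\mathrm{TV}(t),
        \qquad
        \mathrm{TV}(t) \;=\; \sum_{k} \lvert t_{k+1} - t_k \rvert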

  13. A new approach for global synchronization in hierarchical scheduled real-time systems

    NARCIS (Netherlands)

    Behnam, M.; Nolte, T.; Bril, R.J.

    2009-01-01

    We present our ongoing work to improve an existing synchronization protocol SIRAP for hierarchically scheduled real-time systems. A less pessimistic schedulability analysis is presented which can make the SIRAP protocol more efficient in terms of calculated CPU resource needs. In addition and for

  14. Online real-time reconstruction of adaptive TSENSE with commodity CPU / GPU hardware

    DEFF Research Database (Denmark)

    Roujol, Sebastien; de Senneville, Baudouin Denis; Vahalla, Erkki

    2009-01-01

    Adaptive temporal sensitivity encoding (TSENSE) has been suggested as a robust parallel imaging method suitable for MR guidance of interventional procedures. However, in practice, the reconstruction of adaptive TSENSE images obtained with large coil arrays leads to long reconstruction times...... image sizes used in interventional imaging (128 × 96, 16 channels, sensitivity encoding (SENSE) factor 2-4), the pipeline is able to reconstruct adaptive TSENSE images with image latencies below 90 ms at frame rates of up to 40 images/s, rendering the MR performance in practice limited...... by the constraints of the MR acquisition. Its performance is demonstrated by the online reconstruction of in vivo MR images for rapid temperature mapping of the kidney and for cardiac catheterization....

  15. Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease.

    Science.gov (United States)

    Shamonin, Denis P; Bron, Esther E; Lelieveldt, Boudewijn P F; Smits, Marion; Klein, Stefan; Staring, Marius

    2013-01-01

    Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e., for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: (i) parallelization on the CPU, to speed up the cost function derivative calculation; (ii) parallelization on the GPU building on and extending the OpenCL framework from ITKv4, to speed up the Gaussian pyramid computation and the image resampling step; (iii) exploitation of certain properties of the B-spline transformation model; (iv) further software optimizations. The accelerated registration tool is employed in a study on diagnostic classification of Alzheimer's disease and cognitively normal controls based on T1-weighted MRI. We selected 299 participants from the publicly available Alzheimer's Disease Neuroimaging Initiative database. Classification is performed with a support vector machine based on gray matter volumes as a marker for atrophy. We evaluated two types of strategies (voxel-wise and region-wise) that heavily rely on nonrigid image registration. Parallelization and optimization resulted in an acceleration factor of 4-5x on an 8-core machine. Using OpenCL a speedup factor of 2 was realized for computation of the Gaussian pyramids, and 15-60 for the resampling step, for larger images. The voxel-wise and the region-wise classification methods had an area under the receiver operator characteristic curve of 88 and 90%, respectively, both for standard and accelerated registration. We conclude that the image registration package elastix was substantially accelerated, with nearly identical results to the non-optimized version. The new functionality will become available in the next release of elastix as open source under the BSD license.

  16. A hybrid CPU-GPU accelerated framework for fast mapping of high-resolution human brain connectome.

    Directory of Open Access Journals (Sweden)

    Yu Wang

    Full Text Available Recently, a combination of non-invasive neuroimaging techniques and graph theoretical approaches has provided a unique opportunity for understanding the patterns of the structural and functional connectivity of the human brain (referred to as the human brain connectome). Currently, a very large amount of brain imaging data has been collected, and there are very high requirements for the computational capabilities used in high-resolution connectome research. In this paper, we propose a hybrid CPU-GPU framework to accelerate the computation of the human brain connectome. We applied this framework to a publicly available resting-state functional MRI dataset from 197 participants. For each subject, we first computed Pearson's correlation coefficient between any pair of the time series of gray-matter voxels, and then constructed unweighted, undirected brain networks with 58 k nodes and a sparsity range from 0.02% to 0.17%. Next, graph properties of the functional brain networks were quantified, analyzed and compared with those of 15 corresponding random networks. With our proposed accelerating framework, the above process for each network cost 80∼150 minutes, depending on the network sparsity. Further analyses revealed that high-resolution functional brain networks have efficient small-world properties, significant modular structure, a power-law degree distribution and highly connected nodes in the medial frontal and parietal cortical regions. These results are largely compatible with previous human brain network studies. Taken together, our proposed framework can substantially enhance the applicability and efficacy of high-resolution (voxel-based) brain network analysis, and has the potential to accelerate the mapping of the human brain connectome in normal and disease states.
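
    The core GPU step — all-pairs Pearson correlation followed by sparsity thresholding — is easy to state in NumPy; a hedged, downsized sketch (200 toy "voxels" instead of 58 k, illustrative names):

        import numpy as np

        def functional_network(ts, sparsity=0.001):
            """Build an unweighted functional network from voxel time series.

            ts : (n_voxels, n_timepoints) array of gray-matter voxel signals.
            Keeps the strongest correlations so that the fraction of possible
            edges retained equals `sparsity`.
            """
            r = np.corrcoef(ts)                      # all-pairs Pearson r
            iu = np.triu_indices_from(r, k=1)        # unique voxel pairs
            n_edges = int(sparsity * iu[0].size)
            thresh = np.sort(r[iu])[-n_edges]        # keep the top correlations
            adj = r >= thresh
            np.fill_diagonal(adj, False)             # no self-loops
            return adj

        ts = np.random.randn(200, 120)               # toy: 200 voxels, 120 volumes
        adj = functional_network(ts, sparsity=0.01)
        print("edges kept:", adj.sum() // 2)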

  17. Accuracy and computational time of a hierarchy of growth rate definitions for breeder reactor fuel

    International Nuclear Information System (INIS)

    Maudlin, P.J.; Borg, R.C.; Ott, K.O.

    1979-01-01

    For a hierarchy of four logically different definitions for calculating the asymptotic growth of fast breeder reactor fuel, an investigation is performed concerning the comparative accuracy and computational effort associated with each definition. The definition based on detailed calculation of the accumulating fuel in an expanding park of reactors asymptotically yields the most accurate value of the infinite-time growth rate, γ^∞, which is used as a reference value. The computational effort involved with the park definition is very large. The definition based on the single-reactor calculation of the equilibrium surplus production rate and fuel inventory gives a value for γ^∞ of accuracy comparable to the park definition and uses significantly less central processor unit (CPU) time. The third definition is based on a continuous treatment of the reactor fuel cycle for a single reactor and gives a value for γ^∞ that accurately approximates the second definition. The continuous definition requires very little CPU time. The fourth definition employs the isotopic breeding worths, w_i^*, for a projection of the asymptotic growth rate. The CPU time involved in this definition is practically nil if its calculation is based on the few-cycle depletion calculation normally performed for core design and critical enrichment evaluations. The small inaccuracy (approx. 1%) of the breeding-worth-based definition is well within the inaccuracy range that results unavoidably from other sources such as nuclear cross sections, group constants, and flux calculations. This fully justifies the use of this approach in routine calculations
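
    The second definition suggests a one-line formula; a hedged LaTeX sketch with illustrative symbols (not the paper's notation): the growth rate as equilibrium surplus production over inventory, which has the required dimension of inverse time.

        \gamma^{\infty} \;\approx\;
            \frac{\dot{M}_{\mathrm{surplus}}}{M_{\mathrm{inv}}}

    where \dot{M}_{\mathrm{surplus}} is the equilibrium surplus fissile production rate of a single reactor and M_{\mathrm{inv}} its fuel inventory.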

  18. Objectively measured total and occupational sedentary time in three work settings

    NARCIS (Netherlands)

    Dommelen, P. van; Coffeng, J. K.; Ploeg, H.P. van der; Beek, A.J. van der; Boot, C.R.; Hendriksen, I.J.

    2016-01-01

    Background. Sedentary behaviour increases the risk for morbidity. Our primary aim is to determine the proportion and factors associated with objectively measured total and occupational sedentary time in three work settings. Secondary aim is to study the proportion of physical activity and prolonged

  19. Real-time global illumination on mobile device

    Science.gov (United States)

    Ahn, Minsu; Ha, Inwoo; Lee, Hyong-Euk; Kim, James D. K.

    2014-02-01

    We propose a novel method for real-time global illumination on mobile devices. Our approach is based on instant radiosity, which uses a sequence of virtual point lights in order to represent the effect of indirect illumination. Our rendering process consists of three stages. With the primary light, the first stage generates a local illumination with the shadow map on the GPU. The second stage of the global illumination uses the reflective shadow map on the GPU and generates the sequence of virtual point lights on the CPU. Finally, we use the splatting method of Dachsbacher et al. [1] and add the indirect illumination to the local illumination on the GPU. With the limited computing resources in mobile devices, only a small number of virtual point lights are allowed for real-time rendering. Our approach uses a multi-resolution sampling method over 3D geometry and attributes simultaneously to reduce the total number of virtual point lights. We also use a hybrid strategy, which collaboratively combines the CPUs and GPUs available in a mobile SoC, due to the limited computing resources in mobile devices. Experimental results demonstrate the global illumination performance of the proposed method.

  20. Real Time Deconvolution of In-Vivo Ultrasound Images

    DEFF Research Database (Denmark)

    Jensen, Jørgen Arendt

    2013-01-01

    and two wavelengths. This can be improved by deconvolution, which increases the bandwidth and equalizes the phase to increase resolution under the constraint of the electronic noise in the received signal. A fixed-interval Kalman filter based deconvolution routine written in C is employed. It uses a state...... resolution has been determined from the in-vivo liver image using the auto-covariance function. From the envelope of the estimated pulse, the axial resolution at Full-Width-Half-Max is 0.581 mm, corresponding to 1.13 λ at 3 MHz. The algorithm increases the resolution to 0.116 mm or 0.227 λ, corresponding...... to a factor of 5.1. The basic pulse can be estimated in roughly 0.176 seconds on a single core of an Intel i5 CPU running at 1.8 GHz. An in-vivo image consisting of 100 lines of 1600 samples can be processed in roughly 0.1 seconds, making it possible to perform real-time deconvolution on ultrasound data...
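
    A full fixed-interval Kalman smoother is beyond a short sketch, but the same goal — equalizing the pulse spectrum subject to the electronic noise — can be illustrated with Wiener deconvolution; a hedged stand-in, plainly not the paper's Kalman routine (all sizes and the SNR are assumptions):

        import numpy as np

        def wiener_deconvolve(rf_line, pulse, snr=100.0):
            """Frequency-domain Wiener deconvolution of one RF line: divide
            out the pulse spectrum, regularized by 1/snr against noise."""
            n = len(rf_line)
            P = np.fft.fft(pulse, n)                 # zero-padded pulse spectrum
            Y = np.fft.fft(rf_line)
            H = np.conj(P) / (np.abs(P) ** 2 + 1.0 / snr)
            return np.real(np.fft.ifft(H * Y))

        # Toy: sparse reflectivity convolved with a short pulse, then restored.
        rng = np.random.default_rng(1)
        refl = np.zeros(1600)
        refl[[200, 210, 800]] = [1.0, -0.6, 0.8]
        pulse = np.sin(2 * np.pi * np.arange(32) / 8) * np.hanning(32)
        rf = np.convolve(refl, pulse)[:1600] + 0.01 * rng.standard_normal(1600)
        est = wiener_deconvolve(rf, pulse)
        print("strongest restored reflector near sample:",
              int(np.argmax(np.abs(est))))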

  1. Musrfit-Real Time Parameter Fitting Using GPUs

    Science.gov (United States)

    Locans, Uldis; Suter, Andreas

    High transverse field μSR (HTF-μSR) experiments typically lead to rather large data sets, since it is necessary to follow the high frequencies present in the positron decay histograms. The analysis of these data sets can be very time consuming, usually due to the limited computational power of the hardware. To overcome limited computing resources, a rotating reference frame (RRF) transformation is often used to reduce the data sets that need to be handled. This comes at a price the μSR community is typically not aware of: (i) due to the RRF transformation, the fitting parameter estimates are of poorer precision, i.e., more extended, expensive beamtime is needed; (ii) RRF introduces systematic errors which hamper the statistical interpretation of χ2 or the maximum log-likelihood. We will briefly discuss these issues in a non-exhaustive, practical way. The one and only reason for the RRF transformation is sluggish computing power. Therefore, during this work, GPU (Graphics Processing Unit) based fitting was developed, which allows one to perform real-time full data analysis without RRF. GPUs have become increasingly popular in scientific computing in recent years. Due to their highly parallel architecture, they provide the opportunity to accelerate many applications at considerably less cost than upgrading the CPU computational power. With the emergence of frameworks such as CUDA and OpenCL, these devices have become more easily programmable. During this work, GPU support was added to Musrfit — a data analysis framework for μSR experiments. The new fitting algorithm uses CUDA or OpenCL to offload the most time-consuming parts of the calculations to Nvidia or AMD GPUs. Using the current CPU implementation in Musrfit, parameter fitting can take hours for certain data sets, while the GPU version allows real-time data analysis on the same data sets. This work describes the challenges that arise in adding GPU support to Musrfit as well as the results obtained.

  2. Deployment of IPv6-only CPU resources at WLCG sites

    Science.gov (United States)

    Babik, M.; Chudoba, J.; Dewhurst, A.; Finnern, T.; Froy, T.; Grigoras, C.; Hafeez, K.; Hoeft, B.; Idiculla, T.; Kelsey, D. P.; López Muñoz, F.; Martelli, E.; Nandakumar, R.; Ohrenberg, K.; Prelz, F.; Rand, D.; Sciabà, A.; Tigerstedt, U.; Traynor, D.

    2017-10-01

    The fraction of Internet traffic carried over IPv6 continues to grow rapidly. IPv6 support from network hardware vendors and carriers is pervasive and becoming mature. A network infrastructure upgrade often offers sites an excellent window of opportunity to configure and enable IPv6. There is a significant overhead when setting up and maintaining dual-stack machines, so where possible sites would like to upgrade their services directly to IPv6 only. In doing so, they are also expediting the transition process towards its desired completion. While the LHC experiments accept there is a need to move to IPv6, it is currently not directly affecting their work. Sites are unwilling to upgrade if they will be unable to run LHC experiment workflows. This has resulted in a very slow uptake of IPv6 from WLCG sites. For several years the HEPiX IPv6 Working Group has been testing a range of WLCG services to ensure they are IPv6 compliant. Several sites are now running many of their services as dual-stack. The working group, driven by the requirements of the LHC VOs to be able to use IPv6-only opportunistic resources, continues to encourage wider deployment of dual-stack services to make the use of such IPv6-only clients viable. This paper presents the working group’s plan and progress so far to allow sites to deploy IPv6-only CPU resources. This includes making experiment central services dual-stack as well as a number of storage services. The monitoring, accounting and information services that are used by jobs also need to be upgraded. Finally the VO testing that has taken place on hosts connected via IPv6-only is reported.

  3. Increased Total Anesthetic Time Leads to Higher Rates of Surgical Site Infections in Spinal Fusions.

    Science.gov (United States)

    Puffer, Ross C; Murphy, Meghan; Maloney, Patrick; Kor, Daryl; Nassr, Ahmad; Freedman, Brett; Fogelson, Jeremy; Bydon, Mohamad

    2017-06-01

    A retrospective review of a consecutive series of spinal fusions comparing patient and procedural characteristics of patients who developed surgical site infections (SSIs) after spinal fusion. It is known that increased surgical time (incision to closure) is associated with a higher rate of postoperative SSIs. We sought to determine whether increased total anesthetic time (intubation to extubation) is also a factor in the development of SSIs. In spine surgery for deformity and degenerative disease, SSI has been associated with operative time, with a nearly 10-fold increase in SSI rates in prolonged surgery. Surgical time is associated with infections in other surgical disciplines as well. No studies have reported whether total anesthetic time (intubation to extubation) has an association with SSIs. Surgical records were searched in a retrospective fashion to identify all spine fusion procedures performed between January 2010 and July 2012. All SSIs during that timeframe were recorded and compared with the list of cases performed between 2010 and 2012 in a case-control design. There were 20 (1.7%) SSIs in this fusion cohort. On univariate analyses of operative factors, there was a significant association between SSI and both total anesthetic time (infection 7.6 ± 0.5 hrs vs. no infection 6.0 ± 0.1 hrs) and operative time (infection 5.5 ± 0.4 hrs vs. no infection 4.4 ± 0.06 hrs), whereas level of pathology and emergent surgery were not significant. On multivariate logistic analysis, BMI and total anesthetic time remained independent predictors of SSI, whereas ASA status and operative time did not. Increasing BMI and total anesthetic time were independent predictors of SSIs in this cohort of over 1000 consecutive spinal fusions. Level of evidence: 3.

  4. Endoplasmic reticulum stress mediating downregulated StAR and 3-beta-HSD and low plasma testosterone caused by hypoxia is attenuated by CPU86017-RS and nifedipine

    Directory of Open Access Journals (Sweden)

    Liu Gui-Lai

    2012-01-01

    Full Text Available Abstract Background Hypoxia exposure initiates low serum testosterone levels that could be attributed to downregulated androgen-biosynthesizing genes such as StAR (steroidogenic acute regulatory protein) and 3-beta-HSD (3-beta-hydroxysteroid dehydrogenase) in the testis. It was hypothesized that these testicular abnormalities under hypoxia are associated with oxidative stress and an increase in chaperones of endoplasmic reticulum stress (ER stress), and that ER stress could be modulated by a reduction in calcium influx. We therefore verified whether an application of CPU86017-RS (simplified as RS), a derivative of berberine, could alleviate the ER stress, the depressed gene expression of StAR and 3-beta-HSD, and the low plasma testosterone in hypoxic rats, in comparison with nifedipine. Methods Adult male Sprague-Dawley rats were randomly divided into control, hypoxia for 28 days, and hypoxia treated (mg/kg, p.o., during the last 14 days) with nifedipine (Nif, 10) or one of three doses of RS (20, 40, 80), plus normal rats treated with the RS isomer (80). Serum testosterone (T) and luteinizing hormone (LH) were measured. The testicular expression of biomarkers including StAR, 3-beta-HSD, immunoglobulin heavy chain binding protein (Bip), double-strand RNA-activated protein kinase-like ER kinase (PERK) and the pro-apoptotic transcription factor C/EBP homologous protein (CHOP) was measured. Results In hypoxic rats, serum testosterone levels decreased, and the mRNA and protein expression of the testosterone biosynthesis related genes StAR and 3-beta-HSD was downregulated. These changes were linked to an increase in oxidants, upregulated ER stress chaperones Bip, PERK and CHOP, and a distorted histological structure of the seminiferous tubules in the testis. These abnormalities were attenuated significantly by CPU86017-RS and nifedipine. Conclusion Downregulated StAR and 3-beta-HSD contribute significantly to low testosterone in hypoxic rats and are associated with ER stress

  5. Accelerating the SCE-UA Global Optimization Method Based on Multi-Core CPU and Many-Core GPU

    Directory of Open Access Journals (Sweden)

    Guangyuan Kan

    2016-01-01

    Full Text Available The well-known SCE-UA global optimization method, which has been widely used for environmental model parameter calibration, is effective and robust. However, the SCE-UA method has a high computational load, which prohibits its application to high-dimensional and complex problems. In recent years, computer hardware such as multi-core CPUs and many-core GPUs has improved significantly. This much more powerful new hardware and its software ecosystems provide an opportunity to accelerate the SCE-UA method. In this paper, we propose two parallel SCE-UA methods and implement them on an Intel multi-core CPU and an NVIDIA many-core GPU using OpenMP and CUDA Fortran, respectively. The Griewank benchmark function was adopted to test and compare the performance of the serial and parallel SCE-UA methods. Based on the results of the comparison, some useful advice is given on how to properly use the parallel SCE-UA methods.
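
    The part of SCE-UA that parallelizes naturally is the objective evaluation over the population; a hedged Python sketch using the Griewank benchmark named above (CPU multiprocessing stands in for the paper's OpenMP/CUDA Fortran implementations; this is not the full SCE-UA algorithm):

        import numpy as np
        from multiprocessing import Pool

        def griewank(x):
            """Griewank benchmark:
            f(x) = 1 + sum(x_i^2)/4000 - prod(cos(x_i / sqrt(i)))."""
            i = np.arange(1, len(x) + 1)
            return 1.0 + np.sum(x**2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            # A toy "complex population": 1000 candidate parameter sets.
            population = [rng.uniform(-600, 600, size=10) for _ in range(1000)]
            with Pool() as pool:             # parallel objective evaluation
                fitness = pool.map(griewank, population)
            print("best candidate value:", min(fitness))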

  6. Real-time Global Illumination by Simulating Photon Mapping

    DEFF Research Database (Denmark)

    Larsen, Bent Dalgaard

    2004-01-01

    This thesis introduces a new method for simulating photon mapping in real time. The method uses a variety of both CPU- and GPU-based algorithms for speeding up the different elements of global illumination. The idea behind the method is to calculate each illumination element individually in a progressive and efficient manner. This has been done by analyzing the photon mapping method and by selecting efficient methods, either CPU-based or GPU-based, to replace the original photon mapping algorithms. We have chosen to focus on the indirect illumination and the caustics. In our method we first...... divide the photon map into several photon maps in order to make local updates possible. Then indirect illumination is added using light maps that are selectively updated by selective photon tracing on the CPU. The final gathering step is calculated by using fragment programs and GPU-based...

  7. Evaluation and comparison of SN and Monte-Carlo charged particle transport calculations

    International Nuclear Information System (INIS)

    Hadad, K.

    2000-01-01

    A study was done to evaluate a 3-D S_N charged particle transport code called SMARTEPANTS [1] and a 3-D Monte Carlo code called the Integrated Tiger Series, ITS [2]. The evaluation of the SMARTEPANTS code was based on angular discretization and reflected-boundary sensitivity, whilst the evaluation of ITS was based on CPU time and variance reduction. The comparison of the two codes was based on energy and charge deposition calculations in a block of gallium arsenide with embedded gold cylinders. The results of the evaluation tests show that an S_8 calculation maintains both accuracy and speed, and that calculations with reflected-boundary geometry produce fully symmetrical results. As expected for the ITS evaluation, CPU time and variance trade off against each other up to a point beyond which increasing the number of histories increases the CPU time without further reducing the variance. The comparison test problem showed excellent agreement in total energy deposition calculations

  8. Discrepancy Between Clinician and Research Assistant in TIMI Score Calculation (TRIAGED CPU

    Directory of Open Access Journals (Sweden)

    Taylor, Brian T.

    2014-11-01

    Introduction: Several studies have attempted to demonstrate that the Thrombolysis in Myocardial Infarction (TIMI) risk score has the ability to risk stratify emergency department (ED) patients with potential acute coronary syndromes (ACS). Most of the studies we reviewed relied on trained research investigators to determine TIMI risk scores rather than ED providers functioning in their normal work capacity. We assessed whether TIMI risk scores obtained by ED providers in the setting of a busy ED differed from those obtained by trained research investigators. Methods: This was an ED-based prospective observational cohort study comparing TIMI scores obtained by 49 ED providers admitting patients to an ED chest pain unit (CPU) to scores generated by a team of trained research investigators. We examined provider type, patient gender, and TIMI elements for their effects on TIMI risk score discrepancy. Results: Of the 501 adult patients enrolled in the study, 29.3% of TIMI risk scores determined by ED providers and trained research investigators were generated using identical TIMI risk score variables. In our low-risk population the majority of TIMI risk score differences were small; however, 12% of TIMI risk scores differed by two or more points. Conclusion: TIMI risk scores determined by ED providers in the setting of a busy ED frequently differ from scores generated by trained research investigators who complete them while not under the same pressure as an ED provider. [West J Emerg Med. 2015;16(1):24–33.]

  9. Total sleep time, alcohol consumption, and the duration and severity of alcohol hangover

    NARCIS (Netherlands)

    van Schrojenstein Lantman, Marith; Mackus, Marlou; Roth, Thomas; Verster, Joris C|info:eu-repo/dai/nl/241442702

    2017-01-01

    INTRODUCTION: An evening of alcohol consumption often occurs at the expense of sleep time. The aim of this study was to determine the relationship between total sleep time and the duration and severity of the alcohol hangover. METHODS: A survey was conducted among Dutch University students to

  10. The influence of tourniquet use and operative time on the incidence of deep vein thrombosis in total knee arthroplasty.

    Science.gov (United States)

    Hernandez, Arnaldo José; Almeida, Adriano Marques de; Fávaro, Edmar; Sguizzato, Guilherme Turola

    2012-09-01

    To evaluate the association between tourniquet time and total operative time during total knee arthroplasty and the occurrence of deep vein thrombosis. Seventy-eight consecutive patients from our institution underwent cemented total knee arthroplasty for degenerative knee disorders. The pneumatic tourniquet time and total operative time were recorded in minutes. Total tourniquet time was divided into four categories and total operative time into three, the longest categories exceeding 120 and 150 minutes, respectively. Between 7 and 12 days after surgery, the patients underwent ascending venography to evaluate the presence of distal or proximal deep vein thrombosis. We evaluated the association between the tourniquet time and total operative time and the occurrence of deep vein thrombosis after total knee arthroplasty. In total, 33 cases (42.3%) were positive for deep vein thrombosis; 13 (16.7%) cases involved the proximal type. We found no statistically significant difference in tourniquet time or operative time between patients with or without deep vein thrombosis. We did observe a higher frequency of proximal deep vein thrombosis in patients who underwent surgery lasting longer than 120 minutes. The mean total operative time was also higher in patients with proximal deep vein thrombosis. The tourniquet time did not significantly differ in these patients. We concluded that surgery lasting longer than 120 minutes increases the risk of proximal deep vein thrombosis.

  11. Real time control of the SSC string magnets

    International Nuclear Information System (INIS)

    Calvo, O.; Flora, R.; MacPherson, M.

    1987-01-01

    The system described in this paper, called SECAR, was designed to control the excitation of a test string of magnets for the proposed Superconducting Super Collider (SSC) and will be used to upgrade the present Tevatron Excitation, Control and Regulation (TECAR) hardware and software. It resides in a VME crate and is controlled by a 68020/68881-based CPU running the application software under a real-time operating system named VRTX.

  12. Asymptotic behavior of total times for jobs that must start over if a failure occurs

    DEFF Research Database (Denmark)

    Asmussen, Søren; Fiorini, Pierre; Lipsky, Lester

    the ready queue, or it may restart the task. The behavior of systems under the first two scenarios is well documented, but the third (RESTART) has resisted detailed analysis. In this paper we derive tight asymptotic relations between the distribution of task times without failures and the total time when including failures, for any failure distribution. In particular, we show that if the task time distribution has an unbounded support then the total time distribution H is always heavy-tailed. Asymptotic expressions are given for the tail of H in various scenarios. The key ingredients of the analysis...

  13. Asymptotic behaviour of total times for jobs that must start over if a failure occurs

    DEFF Research Database (Denmark)

    Asmussen, Søren; Fiorini, Pierre; Lipsky, Lester

    2008-01-01

    the ready queue, or it may restart the task. The behavior of systems under the first two scenarios is well documented, but the third (RESTART) has resisted detailed analysis. In this paper we derive tight asymptotic relations between the distribution of task times without failures and the total time when including failures, for any failure distribution. In particular, we show that if the task-time distribution has an unbounded support, then the total-time distribution H is always heavy-tailed. Asymptotic expressions are given for the tail of H in various scenarios. The key ingredients of the analysis...

  14. Instruction timing for the CDC 7600 computer

    International Nuclear Information System (INIS)

    Lipps, H.

    1975-01-01

    This report provides timing information for all instructions of the Control Data 7600 computer, except for instructions of type 01X, to enable the optimization of 7600 programs. The timing rules serve as background information for timing charts which are produced by a program (TIME76) of the CERN Program Library. The rules that co-ordinate the different sections of the CPU are stated in as much detail as is necessary to time the flow of instructions for a given sequence of code. Instruction fetch, instruction issue, and access to small core memory are treated at length, since details are not available from the computer manuals. Annotated timing charts are given for 24 examples, chosen to display the full range of timing considerations. (Author)

  15. Total donor ischemic time: relationship to early hemodynamics and intensive care morbidity in pediatric cardiac transplant recipients.

    Science.gov (United States)

    Rodrigues, Warren; Carr, Michelle; Ridout, Deborah; Carter, Katherine; Hulme, Sara Louise; Simmonds, Jacob; Elliott, Martin; Hoskote, Aparna; Burch, Michael; Brown, Kate L

    2011-11-01

    Single-center studies have failed to link modest increases in total donor ischemic time to mortality after pediatric orthotopic heart transplant. We aimed to investigate whether prolonged total donor ischemic time is linked to pediatric intensive care morbidity after orthotopic heart transplant. Retrospective cohort review. Tertiary pediatric transplant center in the United Kingdom. Ninety-three pediatric orthotopic heart transplants between 2002 and 2006. Total donor ischemic time was investigated for association with early post-orthotopic heart transplant hemodynamics and intensive care unit morbidities. Of 43 males and 50 females with median age 7.2 (interquartile range 2.2, 13.0) yrs, 62 (68%) had dilated cardiomyopathy, 20 (22%) had congenital heart disease, and nine (10%) had restrictive cardiomyopathy. The mean total donor ischemic time was 225.9 (sd 65.6) mins. In the first 24 hrs after orthotopic heart transplant, age-adjusted mean arterial blood pressure increased significantly. Longer total donor ischemic time was significantly associated with lower mean arterial blood pressure, longer stay in the intensive care unit (p = .004), and longer post-orthotopic heart transplant stay in hospital (p = .02). Total donor ischemic time was not related to levels of mean pulmonary arterial pressure (p = .62), left atrial pressure (p = .38), or central venous pressure (p = .76) early after orthotopic heart transplant. Prolonged total donor ischemic time has an adverse effect on the donor organ, contributing to lower mean arterial blood pressure, as well as more prolonged ventilation and intensive care unit and hospital stays post-orthotopic heart transplant, reflecting increased morbidity.

  16. GPU based acceleration of first principles calculation

    International Nuclear Information System (INIS)

    Tomono, H; Tsumuraya, K; Aoki, M; Iitaka, T

    2010-01-01

    We present Graphics Processing Unit (GPU)-accelerated simulations of first-principles electronic structure calculations. The FFT, which is the most time-consuming part, is accelerated about 10 times. As a result, the total computation time of a first-principles calculation is reduced to 15 percent of that of the CPU.
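
    A minimal sketch of the kind of offload described, using the CuPy library as a stand-in for the authors' GPU code (an assumption, not their implementation): the dominant 3-D FFT is dispatched to the GPU when one is available, with a NumPy fallback so the sketch still runs on a CPU. The grid size is illustrative.

        import numpy as np
        try:
            import cupy as xp      # GPU path (requires an NVIDIA GPU + CuPy)
        except ImportError:
            xp = np                # CPU fallback keeps the sketch runnable

        grid = xp.asarray(np.random.rand(128, 128, 128))  # illustrative plane-wave grid
        spectrum = xp.fft.fftn(grid)                      # the dominant cost per SCF step
        density = xp.abs(xp.fft.ifftn(spectrum)) ** 2     # back-transform, e.g. for a density
        print(float(density.sum()))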

  17. Acoustic reverse-time migration using GPU card and POSIX thread based on the adaptive optimal finite-difference scheme and the hybrid absorbing boundary condition

    Science.gov (United States)

    Cai, Xiaohui; Liu, Yang; Ren, Zhiming

    2018-06-01

    Reverse-time migration (RTM) is a powerful tool for imaging geologically complex structures such as steep dips and subsalt. However, its implementation is quite computationally expensive. Recently, as a low-cost solution, the graphics processing unit (GPU) was introduced to improve the efficiency of RTM. In this paper, we develop three ameliorative strategies to implement RTM on a GPU card. First, given the high accuracy and efficiency of the adaptive optimal finite-difference (FD) method based on least squares (LS) on the central processing unit (CPU), we study the optimal LS-based FD method on the GPU. Second, we extend the CPU-based hybrid absorbing boundary condition (ABC) to a GPU-based one by addressing two issues that arise when the former is ported to a GPU card: long computation time and chaotic threads. Third, for large-scale data, a combinatorial strategy of optimal checkpointing and efficient boundary storage is introduced to trade off memory against recomputation. To save the time of communication between host and disk, a portable operating system interface (POSIX) thread is utilized to occupy another CPU core at the checkpoints. Applications of the three strategies on the GPU with the compute unified device architecture (CUDA) programming language in RTM demonstrate their efficiency and validity.

  18. Time Stamp Synchronization of PEFP Distributed Control Systems

    International Nuclear Information System (INIS)

    Song, Young Gi; An, Eun Mi; Kwon, Hyeok Jung; Cho, Yong Sub

    2010-01-01

    The Proton Engineering Frontier Project (PEFP) proton linac consists of several types of control systems, such as soft Input Output Controllers (IOCs) and embedded IOCs based on the Experimental Physics and Industrial Control System (EPICS), for each subsection of the PEFP facility. One important requirement is that the IOCs' time clocks be synchronized. Synchronized time and time stamps can be achieved with the Network Time Protocol (NTP) and the EPICS time stamp record, without dedicated timing hardware. The required time accuracy of the IOCs is less than 1 second. The main objective of this study is to configure a master clock and produce Process Variable (PV) time stamps using the local CPU time synchronized from the master clock. The distributed control systems are attached to the PEFP control network.

  19. Complexities of the storm-time characteristics of ionospheric total electron content

    International Nuclear Information System (INIS)

    Kane, R.P.

    1982-01-01

    The complexities of the storm-time variations of the ionospheric total electron content are briefly reviewed. It is suggested that large variations from storm to storm may be due to irregular flows from the auroral region towards the equator. A proper study of such flows needs an elaborate network of TEC measuring instruments. The need for planning and organizing such a network is emphasized.

  20. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

    Science.gov (United States)

    Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

    2014-01-01

    It is very time consuming to solve fractional differential equations. The computational complexity of the two-dimensional time-fractional diffusion equation (2D-TFDE) solved with an iterative implicit finite difference method is O(M_x M_y N^2). In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of this algorithm. A task distribution model and a data layout with virtual boundaries are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed-memory cluster system. We do think that parallel computing technology will become a very basic method for computationally intensive fractional applications in the near future.
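
    The O(M_x M_y N^2) cost comes from the memory of the fractional time derivative: every new time level is coupled to all earlier ones through a fixed set of weights. A minimal sketch of those Grünwald-Letnikov weights via their standard recurrence; the fractional order 0.8 is illustrative, and this is a generic ingredient of such schemes, not the authors' full solver.

        import numpy as np

        def gl_weights(alpha, n):
            """Grünwald-Letnikov weights w_k = (-1)^k * C(alpha, k), computed by
            the standard recurrence w_0 = 1, w_k = (1 - (alpha + 1)/k) * w_{k-1}."""
            w = np.empty(n + 1)
            w[0] = 1.0
            for k in range(1, n + 1):
                w[k] = (1.0 - (alpha + 1.0) / k) * w[k - 1]
            return w

        # Each time level sums over all previous levels with these weights,
        # which is why the serial time cost grows quadratically in N.
        print(gl_weights(0.8, 5))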

  1. Total sitting time, leisure time physical activity and risk of hospitalization due to low back pain: The Danish Health Examination Survey cohort 2007-2008.

    Science.gov (United States)

    Balling, Mie; Holmberg, Teresa; Petersen, Christina B; Aadahl, Mette; Meyrowitsch, Dan W; Tolstrup, Janne S

    2018-02-01

    This study aimed to test the hypotheses that a high total sitting time and vigorous physical activity in leisure time increase the risk of low back pain and herniated lumbar disc disease. A total of 76,438 adults answered questions regarding their total sitting time and physical activity during leisure time in the Danish Health Examination Survey 2007-2008. Information on low back pain diagnoses up to 10 September 2015 was obtained from The National Patient Register. The mean follow-up time was 7.4 years. Data were analysed using Cox regression analysis with adjustment for potential confounders. Multiple imputations were performed for missing values. During the follow-up period, 1796 individuals were diagnosed with low back pain, of whom 479 were diagnosed with herniated lumbar disc disease. Total sitting time was not associated with low back pain or herniated lumbar disc disease. However, moderate or vigorous physical activity, as compared to light physical activity, was associated with increased risk of low back pain (HR = 1.16, 95% CI: 1.03-1.30 and HR = 1.45, 95% CI: 1.15-1.83). Moderate, but not vigorous physical activity was associated with increased risk of herniated lumbar disc disease. The results suggest that total sitting time is not associated with low back pain, but moderate and vigorous physical activity is associated with increased risk of low back pain compared with light physical activity.
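
    A minimal sketch of the kind of model fit the abstract reports, using the lifelines package on synthetic stand-in data; every column name, coding and distribution below is hypothetical, not the actual survey data, and the covariate set is truncated to keep the example short.

        import numpy as np
        import pandas as pd
        from lifelines import CoxPHFitter  # assumes the lifelines package is installed

        rng = np.random.default_rng(1)
        n = 1000
        df = pd.DataFrame({
            "sitting_h_day":  rng.uniform(2, 14, n),             # exposure (hypothetical coding)
            "vigorous_pa":    rng.integers(0, 2, n),             # 1 = vigorous leisure activity
            "age":            rng.uniform(20, 70, n),
            "followup_years": rng.exponential(7.4, n).clip(0.1, 8.4),
            "lbp_event":      rng.integers(0, 2, n),             # 1 = hospitalized for low back pain
        })
        cph = CoxPHFitter()
        cph.fit(df, duration_col="followup_years", event_col="lbp_event")
        cph.print_summary()  # hazard ratios with 95% CIs, as reported in the abstract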

  2. Lot-Order Assignment Applying Priority Rules for the Single-Machine Total Tardiness Scheduling with Nonnegative Time-Dependent Processing Times

    Directory of Open Access Journals (Sweden)

    Jae-Gon Kim

    2015-01-01

    Lot-order assignment is the assignment of items in lots being processed to orders in order to fulfill the orders. It is usually performed periodically to meet the due dates of orders, especially in a manufacturing industry with a long production cycle time such as the semiconductor manufacturing industry. In this paper, we consider the lot-order assignment problem (LOAP) with the objective of minimizing the total tardiness of the orders with distinct due dates. We show that we can solve the LOAP optimally by finding an optimal sequence for the single-machine total tardiness scheduling problem with nonnegative time-dependent processing times (SMTTSP-NNTDPT). Also, we address how the priority rules for the SMTTSP can be modified to those for the SMTTSP-NNTDPT to solve the LOAP. In computational experiments, we discuss the performance of the suggested priority rules and show that the proposed approach outperforms a commercial optimization software package.

  3. Comparison of Iterative Methods for Computing the Pressure Field in a Dynamic Network Model

    DEFF Research Database (Denmark)

    Mogensen, Kristian; Stenby, Erling Halfdan; Banerjee, Srilekha

    1999-01-01

    In dynamic network models, the pressure map (the pressure in the pores) must be evaluated at each time step. This calculation involves the solution of a large number of nonlinear algebraic systems of equations and accounts for more than 80% of the total CPU time. Each nonlinear system requires...

  4. On the Laws of Total Local Times for h-Paths and Bridges of Symmetric Lévy Processes

    Directory of Open Access Journals (Sweden)

    Masafumi Hayashi

    2013-01-01

    The joint law of the total local times at two levels for h-paths of symmetric Lévy processes is shown to admit an explicit representation in terms of the laws of the squared Bessel processes of dimensions two and zero. The law of the total local time at a single level for bridges is also discussed.

  5. Acceleration for 2D time-domain elastic full waveform inversion using a single GPU card

    Science.gov (United States)

    Jiang, Jinpeng; Zhu, Peimin

    2018-05-01

    Full waveform inversion (FWI) is a challenging procedure due to the high computational cost of the modeling, especially in the elastic case. The graphics processing unit (GPU) has become a popular device for high-performance computing (HPC). To reduce the long computation time, we design and implement a GPU-based 2D elastic FWI (EFWI) in the time domain using a single GPU card. We parallelize the forward modeling and gradient calculations using the CUDA programming language. To overcome the limitation of the relatively small global memory on the GPU, a boundary-saving strategy is exploited to reconstruct the forward wavefield. Moreover, the L-BFGS optimization method used in the inversion improves the convergence of the misfit function. A multiscale inversion strategy is performed in the workflow to obtain accurate inversion results. In our tests, the GPU-based implementations using a single GPU device achieve >15 times speedup in forward modeling, and about 12 times speedup in the gradient calculation, compared with eight-core CPU implementations optimized by OpenMP. The results from the GPU implementations are verified to be sufficiently accurate by comparison with the results obtained from the CPU implementations.

  6. Fast-GPU-PCC: A GPU-Based Technique to Compute Pairwise Pearson's Correlation Coefficients for Time Series Data-fMRI Study.

    Science.gov (United States)

    Eslami, Taban; Saeed, Fahad

    2018-04-20

    Functional magnetic resonance imaging (fMRI) is a non-invasive brain imaging technique, which has been regularly used for studying the brain's functional activities in the past few years. A widely used measure for capturing functional associations in the brain is Pearson's correlation coefficient. Pearson's correlation is widely used for constructing functional networks and studying the dynamic functional connectivity of the brain. These are useful measures for understanding the effects of brain disorders on connectivities among brain regions. fMRI scanners produce a huge number of voxels, and using traditional central processing unit (CPU)-based techniques for computing pairwise correlations is very time consuming, especially when a large number of subjects are being studied. In this paper, we propose a graphics processing unit (GPU)-based algorithm called Fast-GPU-PCC for computing pairwise Pearson's correlation coefficients. Based on the symmetric property of Pearson's correlation, this approach returns the N(N-1)/2 correlation coefficients located in the strictly upper triangular part of the correlation matrix. Storing the correlations in a one-dimensional array in the order proposed in this paper is useful for further processing. Our experiments on real and synthetic fMRI data for different numbers of voxels and varying lengths of time series show that the proposed approach outperforms state-of-the-art GPU-based techniques as well as the sequential CPU-based versions. We show that Fast-GPU-PCC runs 62 times faster than the CPU-based version and about 2 to 3 times faster than two other state-of-the-art GPU-based methods.
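
    A CPU reference in NumPy for the output layout the paper specifies, not the GPU kernel itself: all N(N-1)/2 coefficients of the strictly upper triangle, flattened row by row into a one-dimensional array. It assumes each time series has nonzero variance; the array sizes are illustrative.

        import numpy as np

        def pairwise_pcc_upper(ts):
            """ts: (N, T) array of N voxel time series of length T.
            Returns the N*(N-1)/2 Pearson coefficients of the strictly
            upper triangle, flattened row by row."""
            z = ts - ts.mean(axis=1, keepdims=True)
            z /= np.linalg.norm(z, axis=1, keepdims=True)  # unit-norm rows
            corr = z @ z.T                                 # full correlation matrix
            iu = np.triu_indices(ts.shape[0], k=1)         # row-major upper triangle
            return corr[iu]

        r = pairwise_pcc_upper(np.random.rand(100, 200))
        print(r.shape)  # (4950,) == 100*99/2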

  7. Correlates of occupational, leisure and total sitting time in working adults: results from the Singapore multi-ethnic cohort.

    Science.gov (United States)

    Uijtdewilligen, Léonie; Yin, Jason Dean-Chen; van der Ploeg, Hidde P; Müller-Riemenschneider, Falk

    2017-12-13

    Evidence on the health risks of sitting is accumulating. However, research identifying factors influencing sitting time in adults is limited, especially in Asian populations. This study aimed to identify socio-demographic and lifestyle correlates of occupational, leisure and total sitting time in a sample of Singapore working adults. Data were collected between 2004 and 2010 from participants of the Singapore Multi Ethnic Cohort (MEC). Medical exclusion criteria for cohort participation were cancer, heart disease, stroke, renal failure and serious mental illness. Participants who were not working over the past 12 months and without data on sitting time were excluded from the analyses. Multivariable regression analyses were used to examine cross-sectional associations of self-reported age, gender, ethnicity, marital status, education, smoking, caloric intake and moderate-to-vigorous leisure time physical activity (LTPA) with self-reported occupational, leisure and total sitting time. Correlates were also studied separately for Chinese, Malays and Indians. The final sample comprised 9384 participants (54.8% male): 50.5% were Chinese, 24.0% Malay, and 25.5% Indian. For the total sample, mean occupational sitting time was 2.71 h/day, mean leisure sitting time was 2.77 h/day and mean total sitting time was 5.48 h/day. Sitting time in all domains was highest among Chinese. Age, gender, education, and caloric intake were associated with higher occupational sitting time, while ethnicity, marital status and smoking were associated with lower occupational sitting time. Marital status, smoking, caloric intake and LTPA were associated with higher leisure sitting time, while age, gender and ethnicity were associated with lower leisure sitting time. Gender, marital status, education, caloric intake and LTPA were associated with higher total sitting time, while ethnicity was associated with lower total sitting time. Stratified analyses revealed different associations within

  8. A Real-Time Programmer's Tour of General-Purpose L4 Microkernels

    OpenAIRE

    Ruocco Sergio

    2008-01-01

    L4-embedded is a microkernel successfully deployed in mobile devices with soft real-time requirements. It now faces the challenges of tightly integrated systems, in which user interface, multimedia, OS, wireless protocols, and even software-defined radios must run on a single CPU. In this paper we discuss the pros and cons of L4-embedded for real-time systems design, focusing on the issues caused by the extreme speed optimisations it inherited from its general-purpose ancestors. Since...

  9. A Real-Time Programmer's Tour of General-Purpose L4 Microkernels

    OpenAIRE

    Sergio Ruocco

    2008-01-01

    L4-embedded is a microkernel successfully deployed in mobile devices with soft real-time requirements. It now faces the challenges of tightly integrated systems, in which user interface, multimedia, OS, wireless protocols, and even software-defined radios must run on a single CPU. In this paper we discuss the pros and cons of L4-embedded for real-time systems design, focusing on the issues caused by the extreme speed optimisations it inherited from its general-purpose ancestors. Since these i...

  10. Minimizing Total Completion Time For Preemptive Scheduling With Release Dates And Deadline Constraints

    Directory of Open Access Journals (Sweden)

    He Cheng

    2014-02-01

    It is known that the single-machine preemptive scheduling problem of minimizing total completion time with release date and deadline constraints is NP-hard. Du and Leung solved some special cases by the generalized Baker's algorithm and the generalized Smith's algorithm in O(n^2) time. In this paper we give an O(n^2) algorithm for the special case where the processing times and deadlines are agreeable. Moreover, for the case where the processing times and deadlines are disagreeable, we present two properties which could enable us to reduce the range of the enumeration algorithm.

  11. Objectively Measured Total and Occupational Sedentary Time in Three Work Settings

    OpenAIRE

    van Dommelen, Paula; Coffeng, Jennifer K.; van der Ploeg, Hidde P.; van der Beek, Allard J.; Boot, C?cile R. L.; Hendriksen, Ingrid J. M.

    2016-01-01

    Background. Sedentary behaviour increases the risk for morbidity. Our primary aim is to determine the proportion and factors associated with objectively measured total and occupational sedentary time in three work settings. Secondary aim is to study the proportion of physical activity and prolonged sedentary bouts. Methods. Data were obtained using ActiGraph accelerometers from employees of: 1) a financial service provider (n = 49 men, 31 women), 2) two research institutes (n = 30 men, 57 wom...

  12. A polynomial time algorithm for checking regularity of totally normed process algebra

    NARCIS (Netherlands)

    Yang, F.; Huang, H.

    2015-01-01

    A polynomial algorithm for the regularity problem of weak and branching bisimilarity on totally normed process algebra (PA) processes is given. Its time complexity is O(n^3 + mn), where n is the number of transition rules and m is the maximal length of the rules. The algorithm works for...

  13. Real-Time Agent-Based Modeling Simulation with in-situ Visualization of Complex Biological Systems: A Case Study on Vocal Fold Inflammation and Healing.

    Science.gov (United States)

    Seekhao, Nuttiiya; Shung, Caroline; JaJa, Joseph; Mongeau, Luc; Li-Jessen, Nicole Y K

    2016-05-01

    We present an efficient and scalable scheme for implementing agent-based modeling (ABM) simulation with In Situ visualization of large complex systems on heterogeneous computing platforms. The scheme is designed to make optimal use of the resources available on a heterogeneous platform consisting of a multicore CPU and a GPU, resulting in minimal to no resource idle time. Furthermore, the scheme was implemented under a client-server paradigm that enables remote users to visualize and analyze simulation data as it is being generated at each time step of the model. Performance of a simulation case study of vocal fold inflammation and wound healing with 3.8 million agents shows 35× and 7× speedup in execution time over single-core and multi-core CPU respectively. Each iteration of the model took less than 200 ms to simulate, visualize and send the results to the client. This enables users to monitor the simulation in real-time and modify its course as needed.

  14. Real-time digital control, data acquisition, and analysis system for the DIII-D multipulse Thomson scattering diagnostic

    International Nuclear Information System (INIS)

    Greenfield, C.M.; Campbell, G.L.; Carlstrom, T.N.; DeBoo, J.C.; Hsieh, C.; Snider, R.T.; Trost, P.K.

    1990-01-01

    A VME-based real-time computer system for laser control, data acquisition, and analysis for the DIII-D multipulse Thomson scattering diagnostic is described. The laser control task requires precise timing of up to eight Nd:YAG lasers, each with an average firing rate of 20 Hz. A CPU module in a real-time multiprocessing computer system will operate the lasers with evenly staggered laser pulses or in a ''burst mode,'' where all available (fully charged) lasers can be fired at 50--100 μs intervals upon receipt of an external event trigger signal. One or more CPU modules, along with a LeCroy FERA (fast encoding and readout ADC) system, will perform real-time data acquisition and analysis. Partial electron temperature and density profiles will be available for plasma feedback control within 1 ms following each laser pulse. The VME-based computer system consists of two or more target processor modules (25 MHz Motorola 68030) running the VMEexec real-time operating system connected to a Unix-based host system (also a 68030). All real-time software is fully interrupt driven to maximize system efficiency. Operator interaction and (non-real-time) data analysis take place on a MicroVAX 3400 connected via DECnet.

  15. Just In Time Value Chain Total Quality Management Part Of Technical Strategic Management Accounting

    Directory of Open Access Journals (Sweden)

    Lesi Hertati

    2015-08-01

    This article aims to present Just In Time, the value chain, and Total Quality Management (TQM) as techniques in strategic management accounting. The aim of the Just In Time value chain, or value chain Total Quality Management (TQM), is long-term customer satisfaction, obtained from information. Quality information is the way to continuous improvement, increasing the company's financial performance in the long term and strengthening its competitive advantage. The strategic management accounting process gathers competitor information, explores opportunities to reduce costs, and integrates accounting with an emphasis on the company's strategic position against the competition. An overall strategic plan is interrelated and serves as the basis for achieving future targets or goals.

  16. First passage times in homogeneous nucleation: Dependence on the total number of particles

    International Nuclear Information System (INIS)

    Yvinec, Romain; Bernard, Samuel; Pujo-Menjouet, Laurent; Hingant, Erwan

    2016-01-01

    Motivated by nucleation and molecular aggregation in physical, chemical, and biological settings, we present an extension to a thorough analysis of the stochastic self-assembly of a fixed number of identical particles in a finite volume. We study the statistics of times required for maximal clusters to be completed, starting from a pure-monomeric particle configuration. For finite volumes, we extend previous analytical approaches to the case of arbitrary size-dependent aggregation and fragmentation kinetic rates. For larger volumes, we develop a scaling framework to study the first assembly time behavior as a function of the total quantity of particles. We find that the mean time to first completion of a maximum-sized cluster may have a surprisingly weak dependence on the total number of particles. We highlight how higher statistics (variance, distribution) of the first passage time may nevertheless help to infer key parameters, such as the size of the maximum cluster. Finally, we present a framework to quantify formation of macroscopic sized clusters, which are (asymptotically) very unlikely and occur as a large deviation phenomenon from the mean-field limit. We argue that this framework is suitable to describe phase transition phenomena, as inherent infrequent stochastic processes, in contrast to classical nucleation theory

  17. Productive Large Scale Personal Computing: Fast Multipole Methods on GPU/CPU Systems, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — To be used naturally in design optimization, parametric study and achieve quick total time-to-solution, simulation must naturally and personally be available to the...

  18. Batch Scheduling for Hybrid Assembly Differentiation Flow Shop to Minimize Total Actual Flow Time

    Science.gov (United States)

    Maulidya, R.; Suprayogi; Wangsaputra, R.; Halim, A. H.

    2018-03-01

    A hybrid assembly differentiation flow shop is a three-stage flow shop consisting of machining, assembly and differentiation stages and producing different types of products. In the machining stage, parts are processed in batches on different (unrelated) machines. In the assembly stage, the different parts are assembled into an assembly product. Finally, the assembled products are further processed into different types of final products in the differentiation stage. In this paper, we develop a batch scheduling model for a hybrid assembly differentiation flow shop to minimize the total actual flow time, defined as the total time parts spend on the shop floor from their arrival times until their due dates. We also propose a heuristic algorithm for solving the problem. The proposed algorithm is tested using a set of hypothetical data. The solution shows that the algorithm can solve the problem effectively.

  19. Real-time autocorrelator for fluorescence correlation spectroscopy based on graphical-processor-unit architecture: method, implementation, and comparative studies

    Science.gov (United States)

    Laracuente, Nicholas; Grossman, Carl

    2013-03-01

    We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphics processing units (GPUs). Recent developments in hardware and software have allowed for general-purpose computing with inexpensive GPU hardware. These devices are better suited to emulating hardware autocorrelators than traditional CPU-based software applications, as they emphasize parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core graphics PCI card in a 3.2 GHz Intel i5 CPU-based computer running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College.
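
    A minimal sketch of the multi-tau scheme the abstract describes, in NumPy rather than the authors' GPU kernels: each level keeps a fixed number of lag channels and the signal is coarsened by a factor of two between levels, so the lag spacing grows geometrically. The points-per-bin value, number of levels and normalization are illustrative, and a strictly positive mean signal (photon counts) is assumed.

        import numpy as np

        def multi_tau_autocorr(x, m=16, levels=5):
            """Normalized autocorrelation g(tau) on a multi-tau lag grid:
            m lag channels per level, signal averaged pairwise (coarsened 2x)
            between levels, as in hardware correlators."""
            x = np.asarray(x, dtype=float)
            taus, g = [], []
            dt = 1
            for level in range(levels):
                n = len(x)
                start = 0 if level == 0 else m // 2  # low lags already done at finer levels
                for k in range(start, m):
                    if k >= n:
                        break
                    taus.append(k * dt)
                    g.append(np.mean(x[:n - k] * x[k:]) / np.mean(x) ** 2)
                x = 0.5 * (x[:(n // 2) * 2:2] + x[1:(n // 2) * 2:2])  # coarsen 2x
                dt *= 2
            return np.array(taus), np.array(g)

        rng = np.random.default_rng(0)
        tau, g2 = multi_tau_autocorr(rng.poisson(5.0, 100_000))  # fake photon counts
        print(tau[:5], g2[:3])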

  20. A real-time digital control, data acquisition and analysis system for the DIII-D multipulse Thomson scattering diagnostic

    International Nuclear Information System (INIS)

    Greenfield, C.M.; Campbell, G.L.; Carlstrom, T.N.; DeBoo, J.C.; Hsieh, C.-L.; Snider, R.T.; Trost, P.K.

    1990-10-01

    A VME-based real-time computer system for laser control, data acquisition and analysis for the DIII-D multipulse Thomson scattering diagnostic is described. The laser control task requires precise timing of up to 8 Nd:YAG lasers, each with an average firing rate of 20 Hz. A CPU module in a real-time multiprocessing computer system will operate the lasers with evenly staggered laser pulses or in a ''burst mode'', where all available (fully charged) lasers can be fired at 50--100 μsec intervals upon receipt of an external event trigger signal. One or more CPU modules, along with a LeCroy FERA (Fast Encoding and Readout ADC) system, will perform real-time data acquisition and analysis. Partial electron temperature and density profiles will be available for plasma feedback control within 1 msec following each laser pulse. The VME-based computer system consists of 2 or more target processor modules (25 MHz Motorola 68030) running the VMEexec real-time operating system connected to a Unix-based host system (also a 68030). All real-time software is fully interrupt driven to maximize system efficiency. Operator interaction and (non-real-time) data analysis take place on a MicroVAX 3400 connected via DECnet. 17 refs., 1 fig

  1. Estimation of total bacteria by real-time PCR in patients with periodontal disease.

    Science.gov (United States)

    Brajović, Gavrilo; Popović, Branka; Puletić, Miljan; Kostić, Marija; Milasin, Jelena

    2016-01-01

    Periodontal diseases are associated with the presence of elevated levels of bacteria within the gingival crevice. The aim of this study was to evaluate the total amount of bacteria in subgingival plaque samples from patients with periodontal disease. A quantitative evaluation of the total bacterial amount using quantitative real-time polymerase chain reaction (qRT-PCR) was performed on 20 samples from patients with ulceronecrotic periodontitis and on 10 samples from healthy subjects. The estimation of the total bacterial amount was based on the gene copy number for 16S rRNA, which was determined by comparison with the Ct values/gene copy numbers of the standard curve. A statistically significant difference between the average gene copy number of total bacteria in periodontal patients (2.55 x 10⁷) and healthy controls (2.37 x 10⁶) was found (p = 0.01). Also, a trend toward higher gene copy numbers in deeper periodontal lesions (> 7 mm) was indicated by a positive correlation coefficient (r = 0.073). The quantitative estimation of total bacteria based on gene copy number could be an important additional tool in diagnosing periodontitis.
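
    A minimal sketch of the standard-curve arithmetic behind such estimates: the curve Ct = slope * log10(copies) + intercept is inverted to turn a measured Ct value into a 16S rRNA gene copy number. The slope and intercept below are illustrative placeholders, not the study's fitted values.

        import numpy as np

        def copies_from_ct(ct, slope=-3.32, intercept=38.0):
            """Invert the standard curve Ct = slope*log10(copies) + intercept.
            slope = -3.32 corresponds to 100% PCR efficiency; both parameters
            here are illustrative, to be replaced by a fitted standard curve."""
            return 10.0 ** ((np.asarray(ct) - intercept) / slope)

        # e.g. a Ct lower by ~1 cycle means roughly 2x more gene copies
        print(copies_from_ct([24.0, 25.0]))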

  2. Associations of Total and Domain-Specific Sedentary Time With Type 2 Diabetes in Taiwanese Older Adults

    Directory of Open Access Journals (Sweden)

    Ming-Chun Hsueh

    2016-07-01

    Background: The increasing prevalence of type 2 diabetes in older adults has become a public health concern. We investigated the associations of total and domain-specific sedentary time with risk of type 2 diabetes in older adults. Methods: The sample comprised 1046 older people (aged ≥65 years). Analyses were performed using cross-sectional data collected via computer-assisted telephone-based interviews in 2014. Data on six self-reported domains of sedentary time (Measure of Older Adults' Sedentary Time), type 2 diabetes status, and sociodemographic variables were included in the study. Binary logistic regression analysis was performed to calculate the adjusted odds ratios (ORs) and 95% confidence intervals (CIs) for total and individual sedentary behavior components and the likelihood of type 2 diabetes. Results: A total of 17.5% of the participants reported type 2 diabetes. No significant associations were found between total sitting time and risk of type 2 diabetes after controlling for confounding factors. After total sedentary behavior was stratified into six domains, only watching television for more than 2 hours per day was associated with higher odds of type 2 diabetes (OR 1.56; 95% CI, 1.10–2.21), but no significant associations were found between the other domains of sedentary behavior (computer use, reading, socializing, transport, and hobbies) and risk of type 2 diabetes. Conclusions: These findings suggest that, among domain-specific sedentary behaviors, excessive television viewing might increase the risk of type 2 diabetes among older adults more than other forms of sedentary behavior.

  3. What are the important manoeuvres for beginners to minimize surgical time in primary total knee arthroplasty?

    Science.gov (United States)

    Harato, Kengo; Maeno, Shinichi; Tanikawa, Hidenori; Kaneda, Kazuya; Morishige, Yutaro; Nomoto, So; Niki, Yasuo

    2016-08-01

    It was hypothesized that the surgical time of beginners would be much longer than that of experts. Our purpose was to investigate and clarify the important manoeuvres for beginners to minimize surgical time in primary total knee arthroplasty (TKA) as a multicentre study. A total of 300 knees in 248 patients (average age 74.6 years) were enrolled. All TKAs were done using the same instruments and the same measured resection technique at 14 facilities by 25 orthopaedic surgeons. The surgeons were divided into three groups (four experts, nine medium-volume surgeons and 12 beginners). The surgical technique was divided into five phases. The detailed surgical time and the ratio of the time in each phase to the overall surgical time were recorded and compared among the groups in each phase. A total of 62, 119, and 119 TKAs were done by beginners, medium-volume surgeons, and experts, respectively. Significant differences in surgical time among the groups were seen in each phase. Concerning the ratio of the time, experts and medium-volume surgeons appeared cautious in fixation of the permanent component compared to other phases. Interestingly, even in terms of this ratio, beginners and medium-volume surgeons took more time in exposure of soft tissue compared to experts (0.14 in beginners, 0.13 in medium-volume surgeons, 0.11 in experts). Beginners and medium-volume surgeons also took more time in exposure and closure of soft tissue compared to experts. Improvement in basic technique is essential to minimize surgical time among beginners. First of all, surgical instructors should teach basic techniques in primary TKA to beginners. Therapeutic studies, Level IV.

  4. Implicit time-dependent finite difference algorithm for quench simulation

    International Nuclear Information System (INIS)

    Koizumi, Norikiyo; Takahashi, Yoshikazu; Tsuji, Hiroshi

    1994-12-01

    A magnet in a fusion machine faces many difficulties in its application because of the requirements of a large operating current, a high operating field and a high breakdown voltage. A cable-in-conduit (CIC) conductor is the best candidate to overcome these difficulties. However, uncertainty remained about quench events in the cable-in-conduit conductor because of the difficulty of analyzing the fluid dynamics equation. Several scientists therefore developed numerical codes for quench simulation. However, most of these were based on an explicit time-dependent finite difference scheme, in which the discrete time increment is strictly restricted by the CFL (Courant-Friedrichs-Lewy) condition. Therefore, long CPU times were consumed in quench simulation. The authors therefore developed a new quench simulation code, POCHI1, which is based on an implicit time-dependent scheme. In POCHI1, the fluid dynamics equation is linearized according to a procedure applied by Beam and Warming, which yields a tridiagonal system. Therefore, no iteration is necessary to solve the fluid dynamics equation, leading to a great reduction in CPU time. Also, POCHI1 can cope with non-linear boundary conditions. In this study, a comparison with experimental results was carried out. The normal zone propagation behavior was investigated in two samples of CIC conductors which had different hydraulic diameters. The measured and simulated normal zone propagation lengths showed relatively good agreement. However, the behavior of the normal voltage showed some disagreement. These results indicate the need to improve the treatment of the heat transfer coefficient in the turbulent flow region and of the electric resistivity of the copper stabilizer in the high-temperature, high-field region. (author)
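
    The practical payoff of the Beam and Warming linearization is that each implicit step reduces to one tridiagonal solve, which the Thomas algorithm performs directly in O(n), with no iteration and no CFL limit on the step size. A minimal generic sketch of that solve (not the POCHI1 code):

        import numpy as np

        def thomas(a, b, c, d):
            """Solve a tridiagonal system: a = sub-, b = main-, c = super-diagonal,
            d = right-hand side. Direct O(n) solve; no iteration needed, which is
            the source of the CPU-time saving over explicit CFL-limited stepping."""
            n = len(d)
            cp, dp = np.empty(n), np.empty(n)
            cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
            for i in range(1, n):
                m = b[i] - a[i] * cp[i - 1]
                cp[i] = c[i] / m if i < n - 1 else 0.0
                dp[i] = (d[i] - a[i] * dp[i - 1]) / m
            x = np.empty(n)
            x[-1] = dp[-1]
            for i in range(n - 2, -1, -1):
                x[i] = dp[i] - cp[i] * x[i + 1]
            return x

        # quick check against a dense solve on a small tridiag(-1, 4, -1) system
        n = 5
        a = np.r_[0.0, -np.ones(n - 1)]; b = 4.0 * np.ones(n); c = np.r_[-np.ones(n - 1), 0.0]
        d = np.arange(1.0, n + 1)
        A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
        print(np.allclose(thomas(a, b, c, d), np.linalg.solve(A, d)))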

  5. Integration of MDSplus in real-time systems

    International Nuclear Information System (INIS)

    Luchetta, A.; Manduchi, G.; Taliercio, C.

    2006-01-01

    RFX-mod makes extensive use of real-time systems for feedback control and uses MDSplus to interface them to the main data acquisition system. For this purpose, the core of MDSplus has been ported to VxWorks, the operating system used for real-time control in RFX. Using this approach, it is possible to integrate real-time systems, but MDSplus is used only for non-real-time tasks, i.e. those tasks which are executed before and after the pulse and whose performance does not affect the system time constraints. More extensive use of MDSplus in real-time systems is foreseen, and a real-time layer for MDSplus is under development, which will provide access to memory-mapped pulse files shared by the tasks running on the same CPU. Real-time communication will also be integrated in the MDSplus core to provide support for distributed memory-mapped pulse files

  6. I/O Management Controller for Time and Space Partitioning Architectures

    Science.gov (United States)

    Lachaize, Jerome; Deredempt, Marie-Helene; Galizzi, Julien

    2015-09-01

    Integrated Modular Avionics (IMA) has been industrialized in the aeronautical domain to enable the independent qualification of different application software from different suppliers on the same generic computer, this computer being a single terminal in a deterministic network. The concept makes it possible to distribute the different applications efficiently and transparently across the network and to size accurately the hardware to be embedded on the aircraft, through the configuration of the virtual computers and the virtual network. The concept has been studied for the space domain and requirements have been issued [D04], [D05]. Experiments in the space domain have been carried out at the computer level through ESA and CNES initiatives [D02], [D03]. One possible IMA implementation may use Time and Space Partitioning (TSP) technology. Studies on Time and Space Partitioning [D02] for controlling access to resources such as the CPU and memories, together with studies on hardware/software interface standardization [D01], showed that for space-domain technologies, where I/O components (or IPs) do not provide advanced features such as buffering, descriptors or virtualization, the CPU performance overhead is mainly due to shared interface management in the execution platform and to the high frequency of I/O accesses, the latter leading to a large number of context switches. This paper will present a solution to reduce this execution overhead with an open, modular and configurable controller.

  7. Embedded real-time operating system micro kernel design

    Science.gov (United States)

    Cheng, Xiao-hui; Li, Ming-qiang; Wang, Xin-zheng

    2005-12-01

    Embedded systems usually require real-time behaviour. Based on an 8051 microcontroller, an embedded real-time operating system micro kernel is proposed consisting of six parts: critical section processing, task scheduling, interrupt handling, semaphore and message mailbox communication, clock management and memory management. CPU time and other resources are distributed rationally among tasks according to their importance and urgency. The design proposed here provides the position, definition, function and principle of the micro kernel. The kernel runs on the platform of an ATMEL AT89C51 microcontroller. Simulation results show that the designed micro kernel is stable and reliable and responds quickly while operating in an application system.

  8. Physiotherapy Exercise After Fast-Track Total Hip and Knee Arthroplasty: Time for Reconsideration?

    DEFF Research Database (Denmark)

    Bandholm, Thomas; Kehlet, Henrik

    2012-01-01

    Bandholm T, Kehlet H. Physiotherapy exercise after fast-track total hip and knee arthroplasty: time for reconsideration? Major surgery, including total hip arthroplasty (THA) and total knee arthroplasty (TKA), is followed by a convalescence period, during which the loss of muscle strength ... fast-track methodology or enhanced recovery programs. It is the nature of this methodology to systematically and scientifically optimize all perioperative care components, with the overall goal of enhancing recovery. This is also the case for the care component "physiotherapy exercise" after THA and TKA. The two latest meta-analyses on the effectiveness of physiotherapy exercise after THA and TKA generally conclude that physiotherapy exercise after THA and TKA either does not work or is not very effective. The reason for this may be that the "pill" of physiotherapy exercise typically offered after THA and TKA does...

  9. Joint association of physical activity in leisure and total sitting time with metabolic syndrome amongst 15,235 Danish adults

    DEFF Research Database (Denmark)

    Petersen, Christina Bjørk; Nielsen, Asser Jon; Bauman, Adrian

    2014-01-01

    BACKGROUND: Recent studies suggest that physical inactivity as well as sitting time are associated with metabolic syndrome. Our aim was to examine joint associations of leisure time physical activity and total daily sitting time with metabolic syndrome. METHODS: Leisure time physical activity and total daily sitting time were assessed by self-report in 15,235 men and women in the Danish Health Examination Survey 2007-2008. Associations between leisure time physical activity, total sitting time and metabolic syndrome were investigated in logistic regression analysis. RESULTS: Adjusted odds ratios (OR) for metabolic syndrome were 2.14 (95% CI: 1.88-2.43) amongst participants who were inactive in leisure time compared to the most active, and 1.42 (95% CI: 1.26-1.61) amongst those who sat for ≥10 h/day compared to those who sat the least ... physical activity, sitting time...

  10. An adaptive time-stepping strategy for solving the phase field crystal model

    International Nuclear Information System (INIS)

    Zhang, Zhengru; Ma, Yuan; Qiao, Zhonghua

    2013-01-01

    In this work, we propose an adaptive time step method for simulating the dynamics of the phase field crystal (PFC) model. The numerical simulation of the PFC model needs a long time to reach steady state, so a large time-stepping method is necessary. Unconditionally energy stable schemes are used to solve the PFC model. The time steps are adaptively determined based on the time derivative of the corresponding energy. It is found that the proposed time step adaptivity can resolve not only the steady-state solution but also the dynamical development of the solution, efficiently and accurately. The numerical experiments demonstrate that the CPU time is significantly reduced for long-time simulations
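
    A minimal sketch of an energy-based step selector of the kind the abstract describes; the functional form dt = max(dt_min, dt_max / sqrt(1 + alpha * |dE/dt|^2)) and all constants are assumptions for illustration, chosen so that steps shrink while the free energy changes quickly and grow toward dt_max near steady state.

        import numpy as np

        def next_dt(E_prev, E_curr, dt, dt_min=1e-3, dt_max=1.0, alpha=1e5):
            """Pick the next step from the discrete energy derivative (assumed form):
            small steps while |dE/dt| is large, approaching dt_max at steady state."""
            dEdt = (E_curr - E_prev) / dt
            return max(dt_min, dt_max / np.sqrt(1.0 + alpha * dEdt**2))

        print(next_dt(E_prev=-1.00, E_curr=-1.05, dt=0.01))  # fast decay -> small step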

  11. Cross-sectional associations of total sitting and leisure screen time with cardiometabolic risk in adults. Results from the HUNT Study, Norway.

    Science.gov (United States)

    Chau, Josephine Y; Grunseit, Anne; Midthjell, Kristian; Holmen, Jostein; Holmen, Turid L; Bauman, Adrian E; van der Ploeg, Hidde P

    2014-01-01

    To examine associations of total sitting time, TV viewing and leisure-time computer use with cardiometabolic risk biomarkers in adults. Population-based cross-sectional study. Waist circumference, BMI, total cholesterol, HDL cholesterol, blood pressure, non-fasting glucose, gamma-glutamyltransferase (GGT) and triglycerides were measured in 48,882 adults aged 20 years or older from the Nord-Trøndelag Health Study 2006-2008 (HUNT3). Adjusted multiple regression models were used to test for associations between these biomarkers and self-reported total sitting time, TV viewing and leisure-time computer use in the whole sample and by cardiometabolic disease status sub-groups. In the whole sample, reporting total sitting time ≥10 h/day was associated with poorer BMI, waist circumference, total cholesterol, HDL cholesterol, diastolic blood pressure, systolic blood pressure, non-fasting glucose, GGT and triglyceride levels compared to reporting less total sitting time. Leisure-time computer use ≥1 h/day was associated with poorer BMI, total cholesterol, diastolic blood pressure, GGT and triglycerides compared with no leisure-time computer use. Sub-group analyses by cardiometabolic disease status showed similar patterns in participants free of cardiometabolic disease, while similar albeit non-significant patterns were observed in those with cardiometabolic disease. Total sitting time, TV viewing and leisure-time computer use are associated with poorer cardiometabolic risk profiles in adults. Reducing sedentary behaviour throughout the day and limiting TV viewing and leisure-time computer use may have health benefits. Copyright © 2013 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.

  12. Can a surgery-first orthognathic approach reduce the total treatment time?

    Science.gov (United States)

    Jeong, Woo Shik; Choi, Jong Woo; Kim, Do Yeon; Lee, Jang Yeol; Kwon, Soon Man

    2017-04-01

    Although pre-surgical orthodontic treatment has been accepted as a necessary process for stable orthognathic correction in the traditional orthognathic approach, recent advances in the application of miniscrews and in the pre-surgical simulation of orthodontic management using dental models have shown that it is possible to perform a surgery-first orthognathic approach without pre-surgical orthodontic treatment. This prospective study investigated the surgical outcomes of patients with diagnosed skeletal class III dentofacial deformities who underwent orthognathic surgery between December 2007 and December 2014. Cephalometric landmark data for patients undergoing the surgery-first approach were analyzed in terms of postoperative changes in vertical and horizontal skeletal pattern, dental pattern, and soft tissue profile. Forty-five consecutive Asian patients with skeletal class III dentofacial deformities who underwent surgery-first orthognathic surgery and 52 patients who underwent conventional two-jaw orthognathic surgery were included. The analysis revealed that the total treatment period for the surgery-first approach averaged 14.6 months, compared with 22.0 months for the orthodontics-first approach. Comparisons between the immediate postoperative and preoperative and between the postoperative and immediate postoperative cephalometric data revealed factors that correlated with the total treatment duration. The surgery-first orthognathic approach can dramatically reduce the total treatment time, with no major complications. Copyright © 2016 International Association of Oral and Maxillofacial Surgeons. Published by Elsevier Ltd. All rights reserved.

  13. Optimizing Ship Speed to Minimize Total Fuel Consumption with Multiple Time Windows

    Directory of Open Access Journals (Sweden)

    Jae-Gon Kim

    2016-01-01

    We study the ship speed optimization problem with the objective of minimizing the total fuel consumption. We consider multiple time windows for each port call as constraints and formulate the problem as a nonlinear mixed integer program. We derive intrinsic properties of the problem and develop an exact algorithm based on these properties. Computational experiments show that the suggested algorithm is very efficient in finding an optimal solution.
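
    A minimal sketch of the structure such algorithms exploit, under the common cubic fuel-rate assumption f(v) = k*v^3: per-leg fuel k*d*v^2 increases with speed, so within a single feasible arrival window the optimum is the slowest speed that still meets the deadline. All constants are illustrative; the multi-window, multi-leg case is what the paper's exact algorithm handles.

        def leg_fuel(distance_nm, speed_kn, k=1e-4):
            """Cubic fuel-rate assumption f(v) = k*v**3 [t/h]; a leg of length d
            sailed at speed v takes d/v hours, so its fuel is k*d*v**2 [t]."""
            return k * distance_nm * speed_kn**2

        def slowest_feasible_speed(distance_nm, latest_arrival_h, v_min=8.0, v_max=25.0):
            """Fuel increases with v, so for a single time window the optimum is
            the slowest speed that still arrives by the window's close."""
            v = distance_nm / latest_arrival_h
            return min(max(v, v_min), v_max)

        v = slowest_feasible_speed(2400.0, 150.0)  # illustrative leg and deadline
        print(v, leg_fuel(2400.0, v))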

  14. Reduced Operating Time but Not Blood Loss With Cruciate Retaining Total Knee Arthroplasty

    Science.gov (United States)

    Vermesan, Dinu; Trocan, Ilie; Prejbeanu, Radu; Poenaru, Dan V; Haragus, Horia; Gratian, Damian; Marrelli, Massimo; Inchingolo, Francesco; Caprio, Monica; Cagiano, Raffaele; Tatullo, Marco

    2015-01-01

    Background: There is no consensus regarding the use of cruciate-retaining or cruciate-substituting implants for patients with limited deformity who undergo a total knee replacement. The scope of this paper is to evaluate whether a cruciate-sparing total knee replacement could have a reduced operating time compared to a posterior-stabilized implant. Methods: For this purpose, we performed a randomized study on 50 subjects. All procedures were performed by a single surgeon under the same conditions to minimize bias, and only knees with less than 20° varus deviation and/or a maximum 15° fixed flexion contracture were included. Results: Surgery time was significantly shorter with the cruciate-retaining implant (P = 0.0037). The mean duration for the Vanguard implant was 68.9 (14.7) minutes and for the NexGen II Legacy 80.2 (11.3) minutes. A higher range of motion, but no significant difference in Knee Society Scores, was observed at 6 months follow-up. Conclusions: Both implants had the potential to assure great outcomes. However, if a decision has to be made, choosing a cruciate-retaining procedure could significantly reduce the surgical time. When performed under tourniquet, this gain does not lead to reduced blood loss. PMID:25584102

  15. Independent and combined associations of total sedentary time and television viewing time with food intake patterns of 9- to 11-year-old Canadian children.

    Science.gov (United States)

    Borghese, Michael M; Tremblay, Mark S; Leduc, Genevieve; Boyer, Charles; Bélanger, Priscilla; LeBlanc, Allana G; Francis, Claire; Chaput, Jean-Philippe

    2014-08-01

    The relationships among sedentary time, television viewing time, and dietary patterns in children are not fully understood. The aim of this paper was to determine whether self-reported television viewing time or objectively measured total sedentary time is the better correlate of the frequency of consumption of healthy and unhealthy foods. A cross-sectional study was conducted of 9- to 11-year-old children (n = 523; 57.1% female) from Ottawa, Ontario, Canada. Accelerometers were used to determine total sedentary time, and questionnaires were used to determine the number of hours of television watching and the frequency of consumption of foods per week. Television viewing was negatively associated with the frequency of consumption of fruits, vegetables, and green vegetables, and positively associated with the frequency of consumption of sweets, soft drinks, diet soft drinks, pastries, potato chips, French fries, fruit juices, ice cream, fried foods, and fast food. Except for diet soft drinks and fruit juices, these associations were independent of covariates, including sedentary time. Total sedentary time was negatively associated with the frequency of consumption of sports drinks, independent of covariates, including television viewing. In combined sedentary time and television viewing analyses, children watching >2 h of television per day consumed several unhealthy food items more frequently than did children watching ≤2 h of television, regardless of sedentary time. In conclusion, this paper provides evidence to suggest that television viewing time is more strongly associated with unhealthy dietary patterns than is total sedentary time. Future research should focus on reducing television viewing time, as a means of improving dietary patterns and potentially reducing childhood obesity.

  16. Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling

    Directory of Open Access Journals (Sweden)

    Eric R. Edelman

    2017-06-01

    Full Text Available For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related benefits.

  17. Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling.

    Science.gov (United States)

    Edelman, Eric R; van Kuijk, Sander M J; Hamaekers, Ankie E W; de Korte, Marcel J M; van Merode, Godefridus G; Buhre, Wolfgang F F A

    2017-01-01

    For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related benefits.
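
    A hedged illustration of the comparison described above, contrasting the 1.33 × eSCT fixed-ratio baseline with a linear regression on eSCT plus the categorical predictors. Column names and the tiny dataset are invented; the paper's actual database and model selection are far larger:

```python
# Hypothetical sketch of fixed-ratio vs. regression TPT prediction.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "eSCT": [1.5, 2.0, 0.8, 3.1, 1.1],                        # hours (invented)
    "operation_type": ["hip", "knee", "hernia", "cardiac", "knee"],
    "asa_class": [2, 3, 1, 3, 2],
    "anesthesia_type": ["general", "general", "spinal", "general", "spinal"],
    "TPT": [2.1, 2.8, 1.2, 4.3, 1.5],                         # observed, hours
})

# baseline: ACT approximated as 33% of SCT, so TPT ~ 1.33 * eSCT
baseline = 1.33 * df["eSCT"]

# regression on eSCT plus one-hot encoded categorical predictors
pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"),
      ["operation_type", "asa_class", "anesthesia_type"])],
    remainder="passthrough")                                  # eSCT passes through
model = make_pipeline(pre, LinearRegression())
X = df[["operation_type", "asa_class", "anesthesia_type", "eSCT"]]
model.fit(X, df["TPT"])

print("fixed-ratio MAE:", mean_absolute_error(df["TPT"], baseline))
print("regression MAE: ", mean_absolute_error(df["TPT"], model.predict(X)))
```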

  18. Hybrid GPU-CPU adaptive precision ray-triangle intersection tests for robust high-performance GPU dosimetry computations

    International Nuclear Information System (INIS)

    Perrotte, Lancelot; Bodin, Bruno; Chodorge, Laurent

    2011-01-01

    Before an intervention on a nuclear site, it is essential to study different scenarios to identify the least dangerous one for the operator. It is therefore mandatory to have an efficient dosimetry simulation code with accurate results. One classical method in radiation protection is the straight-line attenuation method with build-up factors. In the case of 3D industrial scenes composed of meshes, the computational cost is dominated by the fast computation of all of the intersections between the rays and the triangles of the scene. Efficient GPU algorithms have already been proposed that enable dosimetry calculation for a huge scene (800,000 rays, 800,000 triangles) in a fraction of a second. But these algorithms are not robust: because of the rounding caused by floating-point arithmetic, the numerical results of the ray-triangle intersection tests can differ from the expected mathematical results. In the worst case, this can lead to a computed dose rate dramatically lower than the real dose rate to which the operator is exposed. In this paper, we present a hybrid GPU-CPU algorithm to manage adaptive precision floating-point arithmetic. This algorithm allows robust ray-triangle intersection tests, with very small loss of performance (less than 5% overhead), and without any need for scene-dependent tuning. (author)
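
    A toy illustration of the adaptive-precision idea (not the authors' GPU kernel): run a fast floating-point Möller-Trumbore-style test first, and only when the determinant falls under an error bound rerun the whole test in exact rational arithmetic, standing in for the CPU-side exact path:

```python
# Toy hybrid adaptive-precision ray-triangle intersection test.
from fractions import Fraction

EPS = 1e-9  # heuristic bound below which the float result is untrusted

def sub(a, b):   return (a[0]-b[0], a[1]-b[1], a[2]-b[2])
def dot(a, b):   return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]
def cross(a, b): return (a[1]*b[2] - a[2]*b[1],
                         a[2]*b[0] - a[0]*b[2],
                         a[0]*b[1] - a[1]*b[0])

def intersect(orig, dirn, tri, num=float):
    """Return the ray parameter t >= 0 of the hit, or None for a miss."""
    O, D = tuple(map(num, orig)), tuple(map(num, dirn))
    v0, v1, v2 = (tuple(map(num, p)) for p in tri)
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(D, e2)
    det = dot(e1, p)
    if num is float and abs(det) < EPS:
        # float result untrusted: fall back to exact rational arithmetic
        return intersect(orig, dirn, tri, num=Fraction)
    if det == 0:
        return None                     # ray parallel to the triangle plane
    t_vec = sub(O, v0)
    u = dot(t_vec, p) / det
    if u < 0 or u > 1:
        return None
    q = cross(t_vec, e1)
    v = dot(D, q) / det
    if v < 0 or u + v > 1:
        return None
    t = dot(e2, q) / det
    return t if t >= 0 else None

# hits the unit triangle at t = 1
print(intersect((0, 0, -1), (0, 0, 1), ((0, 0, 0), (1, 0, 0), (0, 1, 0))))
```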

  19. Cache timing attacks on recent microarchitectures

    DEFF Research Database (Denmark)

    Andreou, Alexandres; Bogdanov, Andrey; Tischhauser, Elmar Wolfgang

    2017-01-01

    Cache timing attacks have been known for a long time; however, since the rise of cloud computing and shared hardware resources, such attacks have found new, potentially devastating applications. One prominent example is S$A (presented by Irazoqui et al. at S&P 2015), which is a cache timing attack against AES or similar algorithms in virtualized environments. This paper applies variants of this cache timing attack to Intel's latest generation of microprocessors. It enables a spy process to recover cryptographic keys, interacting with the victim processes only over TCP. The threat model is a logically separated but CPU co-located attacker with root privileges. We report successful and practically verified applications of this attack against a wide range of microarchitectures, from a two-core Nehalem processor (i5-650) to two-core Haswell (i7-4600M) and four-core Skylake processors (i7-6700).

  20. Near-real-time Estimation and Forecast of Total Precipitable Water in Europe

    Science.gov (United States)

    Bartholy, J.; Kern, A.; Barcza, Z.; Pongracz, R.; Ihasz, I.; Kovacs, R.; Ferencz, C.

    2013-12-01

    Information about the amount and spatial distribution of atmospheric water vapor (or total precipitable water) is essential for understanding weather and the environment, including the greenhouse effect, the climate system with its feedbacks, and the hydrological cycle. Numerical weather prediction (NWP) models need accurate estimations of water vapor content to provide realistic forecasts, including the representation of clouds and precipitation. In the present study we introduce our research activity for the estimation and forecast of atmospheric water vapor in Central Europe using both observations and models. The Eötvös Loránd University (Hungary) has operated a polar orbiting satellite receiving station in Budapest since 2002. This station receives Earth observation data from polar orbiting satellites, including the MODerate resolution Imaging Spectroradiometer (MODIS) Direct Broadcast (DB) data stream from the satellites Terra and Aqua. The received DB MODIS data are automatically processed using freely distributed software packages. Using the IMAPP Level2 software, total precipitable water (TPW) is calculated operationally using two different methods. The quality of the TPW estimations is a crucial question for further application of the results; thus, a validation of the remotely sensed total precipitable water fields against radiosonde data is presented. In a current research project in Hungary we aim to compare different estimations of atmospheric water vapor content. Within the frame of the project we use an NWP model (DBCRAS; Direct Broadcast CIMSS Regional Assimilation System numerical weather prediction software developed by the University of Wisconsin, Madison) to forecast TPW. DBCRAS uses near-real-time Level2 products from the MODIS data processing chain. From the wide range of derived Level2 products, it uses the MODIS TPW parameter found within the so-called mod07 results (Atmospheric Profiles Product) and the cloud top pressure and cloud effective emissivity parameters from the so

  1. Open problems in CEM: Porting an explicit time-domain volume-integral-equation solver on GPUs with OpenACC

    KAUST Repository

    Ergül, Özgür

    2014-04-01

    Graphics processing units (GPUs) are gradually becoming mainstream in high-performance computing, as their capability to enhance the performance of a large spectrum of scientific applications many-fold compared to multi-core CPUs has been clearly identified and proven. In this paper, implementation and performance-tuning details for porting an explicit marching-on-in-time (MOT)-based time-domain volume-integral-equation (TDVIE) solver onto GPUs are described. To this end, a high-level approach, utilizing the OpenACC directive-based parallel programming model, is used to minimize two often-faced challenges in GPU programming: developer productivity and code portability. The MOT-TDVIE solver code, originally developed for CPUs, is annotated with compiler directives to port it to GPUs in a fashion similar to how OpenMP targets multi-core CPUs. In contrast to CUDA and OpenCL, where significant modifications to CPU-based codes are required, this high-level approach requires minimal changes to the code. In this work, we make use of two available OpenACC compilers, CAPS and PGI. Our experience reveals that different annotations of the code are required for each of the compilers, due to different interpretations of the fairly new standard by the compiler developers. Both versions of the OpenACC-accelerated code achieved significant performance improvements, with up to 30× speedup over the sequential CPU code on recent hardware. Moreover, we demonstrated that the GPU-accelerated fully explicit MOT-TDVIE solver achieved energy-consumption gains of the order of 3× over its CPU counterpart.

  2. Where Does the Time Go in Software DSMs?--Experiences with JIAJIA

    Institute of Scientific and Technical Information of China (English)

    SHI Weisong; HU Weiwu; TANG Zhimin

    1999-01-01

    The performance gap between software DSM systems and message passing platforms has greatly prevented the prevalence of software DSM systems, though great efforts have been devoted to this area in the past decade. In this paper, we take up the challenge of finding where we should focus our efforts in future designs. The components of the total system overhead of software DSM systems are first analyzed in detail. Based on a state-of-the-art software DSM system, JIAJIA, we measure these components on the Dawning parallel system and draw five important conclusions, which differ from some traditional viewpoints. (1) The performance of the JIAJIA software DSM system is acceptable. For four of eight applications, the parallel efficiency achieved by JIAJIA is about 80%, while for two others, 70% efficiency can be obtained. (2) 40.94% of interrupt service time is overlapped with waiting time. (3) Encoding and decoding diffs do not cost much time (<1%), so using hardware support to encode/decode diffs and send/receive messages is not worthwhile. (4) Great endeavours should be put into reducing the data miss penalty and optimizing synchronization operations, which occupy 11.75% and 13.65% of total execution time respectively. (5) Communication hardware overhead occupies 66.76% of the whole communication time in the experimental environment, and communication software overhead does not take as much time as expected. Moreover, by studying the effect of CPU speed on system overhead, we find that the common speedup formula for distributed memory systems does not hold for software DSM systems. Therefore, we design a new speedup formula specific to software DSM systems, and point out that when the CPU speed increases the speedup can increase too, even if the network speed is fixed, which is impossible in message passing systems. Finally, we argue that the JIAJIA system has the desired scalability.

  3. Time-based analysis of total cost of patient episodes: a case study of hip replacement.

    Science.gov (United States)

    Peltokorpi, Antti; Kujala, Jaakko

    2006-01-01

    Healthcare in the public and private sectors is facing increasing pressure to become more cost-effective. Time-based competition and work-in-progress have been used successfully to measure and improve the efficiency of industrial manufacturing, and this paper seeks to apply the same idea to healthcare. It presents a framework for time-based management of the total cost of a patient episode and applies it within the Six Sigma DMAIC process development approach. The framework is used to analyse hip replacement patient episodes in the Päijät-Häme Hospital District in Finland, which has a catchment area of 210,000 inhabitants and performs an average of 230 hip replacements per year. The work-in-progress concept is found to be applicable to healthcare, and notably the DMAIC process development approach can be used to analyse the total cost of patient episodes. The paper concludes that a framework combining the patient-in-process and DMAIC development approaches can be used not only to analyse the total cost of a patient episode but also to improve patient process efficiency.

  4. Time-dependent density functional theory description of total photoabsorption cross sections

    Science.gov (United States)

    Tenorio, Bruno Nunes Cabral; Nascimento, Marco Antonio Chaer; Rocha, Alexandre Braga

    2018-02-01

    The time-dependent version of density functional theory (TDDFT) has been used to calculate the total photoabsorption cross sections of a number of molecules, namely benzene, pyridine, furan, pyrrole, thiophene, phenol, naphthalene, and anthracene. The discrete electronic pseudo-spectra, obtained in an L² basis-set calculation, were used in an analytic continuation procedure to obtain the photoabsorption cross sections. The ammonia molecule was chosen as a model system to compare the results obtained with TDDFT to those obtained with the linear response coupled cluster approach, in order to make a link with our previous work and establish benchmarks.

  5. Total variation regularization for a backward time-fractional diffusion problem

    International Nuclear Information System (INIS)

    Wang, Liyan; Liu, Jijun

    2013-01-01

    Consider a two-dimensional backward problem for a time-fractional diffusion process, which can be considered as image de-blurring where the blurring process is assumed to be slow diffusion. In order to avoid the over-smoothing effect for object images with edges, and to construct a fast reconstruction scheme, the total variation regularizing term and the data residual error in the frequency domain are coupled to construct the cost functional. The well-posedness of this optimization problem is studied. The minimizer is sought approximately using an iteration process for a series of optimization problems with the Bregman distance as a penalty term. This iterative reconstruction scheme is essentially a new regularizing scheme, with the coupling parameter in the cost functional and the iteration stopping time as two regularizing parameters. We give the choice strategy for the regularizing parameters in terms of the noise level of the measurement data, which yields the optimal error estimate on the iterative solution. The series of optimization problems is solved by alternating iteration with an explicit exact solution, and therefore the amount of computation is much reduced. Numerical implementations are given to support our theoretical analysis on the convergence rate and to show the significant reconstruction improvements. (paper)
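
    The abstract leaves the cost functional implicit; a generic, hedged form consistent with its description (a frequency-domain data residual coupled with a total variation penalty, minimized via Bregman iteration) is

    $$ J_\alpha(u) = \tfrac{1}{2}\,\bigl\|\mathcal{F}(Ku) - \mathcal{F}(g^\delta)\bigr\|_2^2 + \alpha\,|u|_{TV}, \qquad |u|_{TV} = \int_\Omega |\nabla u|\,dx, $$

    where $K$ denotes the time-fractional forward (blurring) operator and $g^\delta$ the noisy data, with the Bregman-distance iteration

    $$ u^{k+1} = \arg\min_u \Bigl\{ \tfrac{1}{2}\,\bigl\|\mathcal{F}(Ku) - \mathcal{F}(g^\delta)\bigr\|_2^2 + \alpha\, D_{TV}^{p^k}(u, u^k) \Bigr\}, \qquad p^{k+1} \in \partial\,|u^{k+1}|_{TV}, $$

    so that the coupling parameter $\alpha$ and the stopping index $k$ play the role of the two regularizing parameters mentioned above; the paper's exact construction may differ in detail.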

  6. Does the brake response time of the right leg change after left total knee arthroplasty? A prospective study.

    Science.gov (United States)

    Marques, Carlos J; Barreiros, João; Cabri, Jan; Carita, Ana I; Friesecke, Christian; Loehr, Jochen F

    2008-08-01

    Patients undergoing total knee arthroplasty often ask when they can safely resume car driving. There is little evidence available on which physicians can rely when advising patients on this issue. In a prospective study we assessed the brake response time of 24 patients admitted to the clinic for left total knee arthroplasty, preoperatively and then 10 days after surgery. On each measurement day the patients performed two tasks, a simple and a complex brake response time task, in a car simulator. Ten days after left TKA the brake response time for the simple task had decreased by 3.6% (p=0.24), the reaction time by 3.1% (p=0.34), and the movement time by 6.6% (p=0.07); however, these performance improvements were not statistically significant. Task complexity increased brake response time at both time points; the 5.8% increase was significant (p=0.01) 10 days after surgery. Based on our results, we suggest that patients who have undergone left total knee arthroplasty may resume car driving 10 days after surgery, as long as they drive a car with automatic transmission.

  7. Seismic response of three-dimensional topographies using a time-domain boundary element method

    Science.gov (United States)

    Janod, François; Coutant, Olivier

    2000-08-01

    We present a time-domain implementation of a boundary element method (BEM) to compute the diffraction of seismic waves by 3-D topographies overlying a homogeneous half-space. This implementation is chosen to overcome the memory limitations arising when solving the boundary conditions with a frequency-domain approach. The formulation is flexible because it allows an adaptive use of the Green's function time-translation properties: the scheme for solving the boundary conditions can be chosen as a trade-off between memory and CPU requirements. We explore here an explicit method of solution that requires little memory but a high CPU cost in order to run on a workstation computer. We obtain good results with a discretization of four points per minimum wavelength for various topographies and plane-wave excitations. This implementation can be used for two different aims: the time-domain approach allows an easier implementation of the BEM in hybrid methods (e.g. coupling with finite differences), and it also allows one to run simple BEM models with reasonable computer requirements. In order to keep reasonable computation times, we do not introduce any interface and we only consider homogeneous models. Results are shown for different configurations: an explosion near a flat free surface, a plane wave vertically incident on a Gaussian hill and on a hemispherical cavity, and an explosion point below the surface of a Gaussian hill. Comparison is made with other numerical methods, such as finite difference methods (FDMs) and spectral elements.

  8. Total Quality and Total Mobility

    Directory of Open Access Journals (Sweden)

    Giuseppe Trieste

    2010-05-01

    Full Text Available FIABA ONLUS (Italian Fund for the Elimination of Architectural Barriers) was founded in 2000 with the aim of promoting a culture of equal opportunities and, above all, has as its main goal to involve public and private institutions in creating a truly accessible and usable environment for everyone. Total accessibility, total usability and total mobility are key indicators for defining quality of life within cities. A supportive environment that is free of architectural, cultural and psychological barriers allows everyone to live with ease and universality. In fact, people who access goods and services in the urban context can use time and space to their advantage, so they can carry out their activities and maintain the relationships that are significant for their social life. The main aim of urban accessibility is to raise the comfort of space for citizens, eliminating all barriers that discriminate against people and prevent equality of opportunity. “FIABA FUND - City of ... for the removal of architectural barriers” is an idea of FIABA that has already involved many regions of Italy, such as Lazio, Lombardy, Campania, Abruzzi and Calabria. It is a national project which provides for opening a bank account in each participating city, into which, for the first time, individuals and private and public institutions can all make donations to fund initiatives for the removal of architectural barriers within their own territory, for real and effective total accessibility. Last February the fund was launched in Rome with the aim of achieving a capital without barriers and a European model city of accessibility and usability. Urban mobility is a prerequisite for access to goods and services and for organizing the activities of daily life. FIABA promotes the concept of sustainable mobility for all, supported by the European Commission's White Paper. We need a cultural change in the management and organization of public transport, which might focus on

  9. Empirical forecast of quiet time ionospheric Total Electron Content maps over Europe

    Science.gov (United States)

    Badeke, Ronny; Borries, Claudia; Hoque, Mainul M.; Minkwitz, David

    2018-06-01

    An accurate forecast of the atmospheric Total Electron Content (TEC) is helpful for investigating space weather influences on the ionosphere and on technical applications like satellite-receiver radio links. The purpose of this work is to compare four empirical methods for a 24-h forecast of vertical TEC maps over Europe under geomagnetically quiet conditions. TEC map data are obtained from the Space Weather Application Center Ionosphere (SWACI) and the Universitat Politècnica de Catalunya (UPC). The time-series methods Standard Persistence Model (SPM), a 27-day median model (MediMod), and a Fourier Series Expansion are compared against maps for the entire year of 2015. As a representative of the climatological coefficient models, the forecast performance of the Global Neustrelitz TEC model (NTCM-GL) is also investigated. Time periods of magnetic storms, which are identified with the Dst index, are excluded from the validation. By calculating the TEC values from the most recent maps, the time-series methods perform slightly better than the coefficient model NTCM-GL. The benefit of NTCM-GL is its independence from observational TEC data. Amongst the time-series methods mentioned, MediMod delivers the best overall performance regarding accuracy and data-gap handling. Quiet-time SWACI maps can be forecast accurately and in real time by the MediMod time-series approach.
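
    A minimal numpy sketch of the median-based idea, assuming MediMod takes, for each map pixel and forecast epoch, the median over the preceding 27 days at the same UT; the exact details are the paper's, and the data below are synthetic:

```python
# Sketch of a 27-day median forecast for TEC maps, assuming an array
# `tec` of shape (days, epochs_per_day, nlat, nlon) of past maps.
import numpy as np

rng = np.random.default_rng(0)
tec = 10 + 5 * rng.random((30, 24, 36, 72))   # 30 days of hourly synthetic maps

def medimod_forecast(tec, window=27):
    """Forecast tomorrow's maps: per-pixel median over the last `window`
    days, taken separately per UT epoch to preserve the diurnal cycle."""
    recent = tec[-window:]                    # (window, 24, nlat, nlon)
    return np.median(recent, axis=0)          # (24, nlat, nlon)

forecast = medimod_forecast(tec)
print(forecast.shape)                         # (24, 36, 72)
```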

  10. GPU acceleration towards real-time image reconstruction in 3D tomographic diffractive microscopy

    Science.gov (United States)

    Bailleul, J.; Simon, B.; Debailleul, M.; Liu, H.; Haeberlé, O.

    2012-06-01

    Phase microscopy techniques have regained interest as they allow for the observation of unprepared specimens with excellent temporal resolution. Tomographic diffractive microscopy is an extension of holographic microscopy which permits 3D observations with a finer resolution than incoherent light microscopes. Specimens are imaged by a series of 2D holograms: their accumulation progressively fills the range of frequencies of the specimen in Fourier space. A 3D inverse FFT eventually provides a spatial image of the specimen. Consequently, acquisition and then reconstruction must both complete before an image is available, which precludes real-time monitoring of the observed specimen. The MIPS Laboratory has built a tomographic diffractive microscope with an unsurpassed 130 nm resolution but a low imaging speed - no less than one minute. Afterwards, a high-end PC reconstructs the 3D image in 20 seconds. We now aim for an interactive system providing preview images during the acquisition for monitoring purposes. We first present a prototype implementing this solution on the CPU: acquisition and reconstruction are tied in a producer-consumer scheme, sharing common data in CPU memory. Then we present a prototype dispatching some reconstruction tasks to the GPU in order to take advantage of SIMD parallelization for FFTs and higher bandwidth for filtering operations. The CPU scheme takes 6 seconds for a 3D image update, while the GPU scheme can go down to 2 seconds, or even below 1 second, depending on the GPU class. This opens opportunities for 4D imaging of living organisms or crystallization processes. We also consider the relevance of GPUs for 3D image interaction under our specific conditions.
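
    A schematic of the CPU producer-consumer scheme described above: an acquisition thread inserts each 2D hologram spectrum into the 3D Fourier volume while a reconstruction thread periodically runs the 3D inverse FFT to refresh the preview. Names, shapes, and the frequency-cap mapping are illustrative stand-ins; the GPU variant would move the FFT and filtering steps to the device:

```python
# Schematic producer-consumer preview loop for tomographic reconstruction.
import numpy as np
import queue, threading

N = 64
fourier_volume = np.zeros((N, N, N), dtype=complex)
holograms = queue.Queue()

def producer(n_views=32):
    rng = np.random.default_rng(1)
    for _ in range(n_views):
        spectrum = np.fft.fft2(rng.random((N, N)))  # stand-in for a hologram
        kz = int(rng.integers(0, N))                # stand-in for cap mapping
        holograms.put((kz, spectrum))
    holograms.put(None)                             # end-of-acquisition marker

def consumer():
    done = False
    while not done:
        item = holograms.get()
        if item is None:
            done = True
        else:
            kz, spectrum = item
            fourier_volume[kz] += spectrum          # accumulate frequencies
        if done or holograms.empty():
            preview = np.abs(np.fft.ifftn(fourier_volume))  # spatial preview
            print("preview updated, max =", preview.max())

t1, t2 = threading.Thread(target=producer), threading.Thread(target=consumer)
t1.start(); t2.start(); t1.join(); t2.join()
```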

  11. When is it safe to resume driving after total hip and total knee arthroplasty? a meta-analysis of literature on post-operative brake reaction times.

    Science.gov (United States)

    van der Velden, C A; Tolk, J J; Janssen, R P A; Reijman, M

    2017-05-01

    The aim of this study was to assess the current available evidence about when patients might resume driving after elective, primary total hip (THA) or total knee arthroplasty (TKA) undertaken for osteoarthritis (OA). In February 2016, EMBASE, MEDLINE, Web of Science, Scopus, Cochrane, PubMed Publisher, CINAHL, EBSCO and Google Scholar were searched for clinical studies reporting on 'THA', 'TKA', 'car driving', 'reaction time' and 'brake response time'. Two researchers (CAV and JJT) independently screened the titles and abstracts for eligibility and assessed the risk of bias. Both fixed and random effects were used to pool data and calculate mean differences (MD) and 95% confidence intervals (CI) between pre- and post-operative total brake response time (TBRT). A total of 19 studies were included. The assessment of the risk of bias showed that one study was at high risk, six studies at moderate risk and 12 studies at low risk. Meta-analysis of TBRT showed an MD decrease of 25.54 ms (95% CI -32.02 to 83.09) two weeks after right-sided THA, and of 18.19 ms (95% CI -6.13 to 42.50) four weeks after a right-sided TKA, when compared with the pre-operative value. The TBRT returned to baseline two weeks after a right-sided THA and four weeks after a right-sided TKA. These results may serve as guidelines for orthopaedic surgeons when advising patients when to resume driving. However, the advice should be individualised. Cite this article: Bone Joint J 2017;99-B:566-76.

  12. Real-time analysis of total, elemental, and total speciated mercury

    International Nuclear Information System (INIS)

    Schlager, R.J.; Wilson, K.G.; Sappey, A.D.

    1995-01-01

    ADA Technologies, Inc., is developing a continuous emissions monitoring system that measures the concentrations of mercury in flue gas. Mercury is emitted as an air pollutant from a number of industrial processes. The largest contributors of these emissions are coal and oil combustion, municipal waste combustion, medical waste combustion, and the thermal treatment of hazardous materials. It is difficult, time consuming, and expensive to measure mercury emissions using current testing methods. Part of the difficulty lies in the fact that mercury is emitted from sources in several different forms, such as elemental mercury and mercuric chloride. The ADA analyzer measures these emissions in real time, thus providing a number of advantages over existing test methods: (1) it will provide a real-time measure of emission rates, (2) it will assure facility operators, regulators, and the public that emissions control systems are working at peak efficiency, and (3) it will provide information as to the nature of the emitted mercury (elemental mercury or speciated compounds). This update presents an overview of the CEM and describes features of key components of the monitoring system--the mercury detector, a mercury species converter, and the analyzer calibration system

  13. Real-time analysis of total, elemental, and total speciated mercury

    Energy Technology Data Exchange (ETDEWEB)

    Schlager, R.J.; Wilson, K.G.; Sappey, A.D. [ADA Technologies, Inc., Englewood, CO (United States)

    1995-11-01

    ADA Technologies, Inc., is developing a continuous emissions monitoring system that measures the concentrations of mercury in flue gas. Mercury is emitted as an air pollutant from a number of industrial processes. The largest contributors of these emissions are coal and oil combustion, municipal waste combustion, medical waste combustion, and the thermal treatment of hazardous materials. It is difficult, time consuming, and expensive to measure mercury emissions using current testing methods. Part of the difficulty lies in the fact that mercury is emitted from sources in several different forms, such as elemental mercury and mercuric chloride. The ADA analyzer measures these emissions in real time, thus providing a number of advantages over existing test methods: (1) it will provide a real-time measure of emission rates, (2) it will assure facility operators, regulators, and the public that emissions control systems are working at peak efficiency, and (3) it will provide information as to the nature of the emitted mercury (elemental mercury or speciated compounds). This update presents an overview of the CEM and describes features of key components of the monitoring system--the mercury detector, a mercury species converter, and the analyzer calibration system.

  14. Time-gated scintillator imaging for real-time optical surface dosimetry in total skin electron therapy

    Science.gov (United States)

    Bruza, Petr; Gollub, Sarah L.; Andreozzi, Jacqueline M.; Tendler, Irwin I.; Williams, Benjamin B.; Jarvis, Lesley A.; Gladstone, David J.; Pogue, Brian W.

    2018-05-01

    The purpose of this study was to measure surface dose by remote time-gated imaging of plastic scintillators. A novel technique for time-gated, intensified camera imaging of scintillator emission was demonstrated, and key parameters influencing the signal were analyzed, including distance, angle and thickness. A set of scintillator samples was calibrated by using thermo-luminescence detector response as reference. Examples of use in total skin electron therapy are described. The data showed excellent room light rejection (signal-to-noise ratio of scintillation SNR  ≈  470), ideal scintillation dose response linearity, and 2% dose rate error. Individual sample scintillation response varied by 7% due to sample preparation. Inverse square distance dependence correction and lens throughput error (8% per meter) correction were needed. At scintillator-to-source angle and observation angle  <50°, the radiant energy fluence error was smaller than 1%. The achieved standard error of the scintillator cumulative dose measurement compared to the TLD dose was 5%. The results from this proof-of-concept study documented the first use of small scintillator targets for remote surface dosimetry in ambient room lighting. The measured dose accuracy renders our method to be comparable to thermo-luminescent detector dosimetry, with the ultimate realization of accuracy likely to be better than shown here. Once optimized, this approach to remote dosimetry may substantially reduce the time and effort required for surface dosimetry.

  15. Heterogeneous real-time computing in radio astronomy

    Science.gov (United States)

    Ford, John M.; Demorest, Paul; Ransom, Scott

    2010-07-01

    Modern computer architectures suited for general purpose computing are often not the best choice for either I/O-bound or compute-bound problems. Sometimes the best choice is not to choose a single architecture, but to take advantage of the best characteristics of different computer architectures to solve your problems. This paper examines the tradeoffs between using computer systems based on the ubiquitous X86 Central Processing Units (CPUs), Field Programmable Gate Array (FPGA) based signal processors, and Graphics Processing Units (GPUs). We will show how a heterogeneous system can be produced that blends the best of each of these technologies into a real-time signal processing system. FPGAs tightly coupled to analog-to-digital converters connect the instrument to the telescope and supply the first level of computing to the system. These FPGAs are coupled to other FPGAs to continue to provide highly efficient processing power. Data is then packaged up and shipped over fast networks to a cluster of general purpose computers equipped with GPUs, which are used for floating-point intensive computation. Finally, the data is handled by the CPU and written to disk, or further processed. Each of the elements in the system has been chosen for its specific characteristics and the role it can play in creating a system that does the most for the least, in terms of power, space, and money.

  16. ELT-scale Adaptive Optics real-time control with the Intel Xeon Phi Many Integrated Core Architecture

    Science.gov (United States)

    Jenkins, David R.; Basden, Alastair; Myers, Richard M.

    2018-05-01

    We propose a solution to the increased computational demands of Extremely Large Telescope (ELT) scale adaptive optics (AO) real-time control with the Intel Xeon Phi Knights Landing (KNL) Many Integrated Core (MIC) Architecture. The computational demands of an AO real-time controller (RTC) scale with the fourth power of telescope diameter and so the next generation ELTs require orders of magnitude more processing power for the RTC pipeline than existing systems. The Xeon Phi contains a large number (≥64) of low power x86 CPU cores and high bandwidth memory integrated into a single socketed server CPU package. The increased parallelism and memory bandwidth are crucial to providing the performance for reconstructing wavefronts with the required precision for ELT scale AO. Here, we demonstrate that the Xeon Phi KNL is capable of performing ELT scale single conjugate AO real-time control computation at over 1.0kHz with less than 20μs RMS jitter. We have also shown that with a wavefront sensor camera attached the KNL can process the real-time control loop at up to 966Hz, the maximum frame-rate of the camera, with jitter remaining below 20μs RMS. Future studies will involve exploring the use of a cluster of Xeon Phis for the real-time control of the MCAO and MOAO regimes of AO. We find that the Xeon Phi is highly suitable for ELT AO real time control.

  17. Different but Equal: Total Work, Gender and Social Norms in EU and US Time Use

    OpenAIRE

    Daniel S Hamermesh; Michael C Burda; Philippe Weil

    2008-01-01

    Using time-diary data from 27 countries, we demonstrate a negative relationship between real GDP per capita and the female-male difference in total work time—the sum of work for pay and work at home. We also show that in rich non-Catholic countries on four continents, men and women do the same amount of total work on average. Our survey results demonstrate that labor economists, macroeconomists, sociologists and the general public consistently believe that women perform more total work.

  18. Management of Virtual Machine as an Energy Conservation in Private Cloud Computing System

    Directory of Open Access Journals (Sweden)

    Fauzi Akhmad

    2016-01-01

    Full Text Available Cloud computing is a service model in which basic computing resources are packaged so that they can be accessed through the Internet on demand, hosted in a data center. Data center architectures in cloud computing environments are heterogeneous and distributed, composed of a cluster of network servers with different computing capacities in different physical servers. Fluctuating demand and availability of cloud services in the data center can be managed through abstraction with virtualization technology. A virtual machine (VM) is a representation of the available computing resources that can be dynamically allocated and reallocated on demand. This study examines VM consolidation as an energy-conservation measure in private cloud computing systems, targeting the optimization of the VM selection policy and the migration of VMs during the consolidation procedure. In a cloud data center, each VM instance hosting a particular type of service or application requires a different level of computing resources. Unbalanced use of computing resources across physical servers can be reduced by live VM migration to achieve workload balancing. A practical approach is developed for an OpenStack-based cloud computing environment by integrating the VM selection and VM placement procedures using OpenStack Neat VM consolidation. The value of CPU time is used as input to obtain the average CPU utilization in MHz within a specific time period: the average CPU utilization of a VM is obtained from the current CPU time minus the CPU time from the previous data retrieval, multiplied by the maximum frequency of the CPU and divided by the elapsed time (in milliseconds) between the two readings.
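
    A minimal sketch of that utilization estimate; the units (cumulative CPU time and wall-clock timestamps in milliseconds, maximum CPU frequency in MHz) are assumptions for illustration:

```python
# Average CPU utilization of a VM over a sampling interval, as described
# above. Units are assumed: cpu_time and timestamps in milliseconds,
# f_max_mhz in MHz.
def avg_cpu_utilization_mhz(cpu_time_now, cpu_time_prev,
                            t_now, t_prev, f_max_mhz):
    busy_fraction = (cpu_time_now - cpu_time_prev) / (t_now - t_prev)
    return busy_fraction * f_max_mhz

# e.g. a VM that consumed 600 ms of CPU time over a 1000 ms interval on a
# 2600 MHz core averages 1560 MHz:
print(avg_cpu_utilization_mhz(10600, 10000, 51000, 50000, 2600.0))
```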

  19. Time related total lactic acid bacteria population diversity and ...

    African Journals Online (AJOL)

    The total lactic acid bacterial community involved in the spontaneous fermentation of malted cowpea fortified cereal weaning food was investigated by phenotypic and cultivation-independent methods. A total of 74 out of the 178 isolated strains were Lactobacillus plantarum, 32 were Pediococcus acidilactici and over 60% ...

  20. The Impact of Total Ischemic Time, Donor Age and the Pathway of Donor Death on Graft Outcomes After Deceased Donor Kidney Transplantation.

    Science.gov (United States)

    Wong, Germaine; Teixeira-Pinto, Armando; Chapman, Jeremy R; Craig, Jonathan C; Pleass, Henry; McDonald, Stephen; Lim, Wai H

    2017-06-01

    Prolonged ischemia is a known risk factor for delayed graft function (DGF), and its interaction with donor characteristics, the pathways of donor death, and graft outcomes may have important implications for allocation policies. Using data from the Australian and New Zealand Dialysis and Transplant registry (1994-2013), we examined the relationship between total ischemic time and graft outcomes among recipients who received their first deceased donor kidney transplants. Total ischemic time (in hours) was defined as the time of the donor renal artery interruption or aortic clamp, until the time of release of the clamp on the renal artery in the recipient. A total of 7542 recipients were followed up over a median follow-up time of 5.3 years (interquartile range of 8.2 years). Of these, 1823 (24.6%) experienced DGF and 2553 (33.9%) experienced allograft loss. Recipients with total ischemic time of 14 hours or longer experienced increased odds of DGF compared with those with total ischemic time less than 14 hours. This effect was most marked among those with older donors (P value for interaction = 0.01). There was a significant interaction between total ischemic time, donor age, and graft loss (P value for interaction = 0.03). There was, on average, a 9% increase in the overall risk of graft loss per hour increase in the total ischemic time (adjusted hazard ratio, 1.09; 95% confidence interval, 1.01-1.18; P = 0.02) in recipients with older donation-after-circulatory-death grafts. There is a clinically important interaction between donor age, the pathway of donor death, and total ischemic time on graft outcomes, such that the duration of ischemic time has the greatest impact on graft survival in recipients with older donation-after-circulatory-death kidneys.

  1. Smoking is associated with earlier time to revision of total knee arthroplasty.

    Science.gov (United States)

    Lim, Chin Tat; Goodman, Stuart B; Huddleston, James I; Harris, Alex H S; Bhowmick, Subhrojyoti; Maloney, William J; Amanatullah, Derek F

    2017-10-01

    Smoking is associated with early postoperative complications, increased length of hospital stay, and an increased risk of revision after total knee arthroplasty (TKA). However, the effect of smoking on time to revision TKA is unknown. A total of 619 primary TKAs referred to an academic tertiary center for revision TKA were retrospectively stratified according to patient smoking status. Smoking status was then analyzed for associations with time to revision TKA using a chi-square test. The association was also analyzed according to the indication for revision TKA. Smokers (37/41, 90%) have an increased risk of earlier revision for any reason compared to non-smokers (274/357, 77%, p=0.031). Smokers (37/41, 90%) have an increased risk of earlier revision for any reason compared to ex-smokers (168/221, 76%, p=0.028). Subgroup analysis did not reveal a difference in indication for revision TKA (p>0.05). Smokers are at increased risk of earlier revision TKA when compared to non-smokers and ex-smokers. The risk for ex-smokers was similar to that of non-smokers. Smoking appears to have an all-or-none effect on earlier revision TKA, as patients who smoked more did not have a higher risk of early revision TKA. These results highlight the need for clinicians to urge patients not to begin smoking and to encourage smokers to quit smoking prior to primary TKA.

  2. On the Feasibility and Limitations of Just-in-Time Instruction Set Extension for FPGA-Based Reconfigurable Processors

    Directory of Open Access Journals (Sweden)

    Mariusz Grad

    2012-01-01

    Full Text Available Reconfigurable instruction set processors provide the possibility of tailoring the instruction set of a CPU to a particular application. While this customization process could be performed during runtime in order to adapt the CPU to the currently executed workload, this use case has hardly been investigated. In this paper, we study the feasibility of moving the customization process to runtime and evaluate the relation between the expected speedups and the associated overheads. To this end, we present a tool flow that is tailored to the requirements of this just-in-time ASIP specialization scenario. We evaluate our methods by targeting our previously introduced Woolcano reconfigurable ASIP architecture for a set of applications from the SPEC2006, SPEC2000, MiBench, and SciMark2 benchmark suites. Our results show that just-in-time ASIP specialization is promising for embedded computing applications, where average speedups of 5x can be achieved by spending 50 minutes on custom instruction identification and hardware generation. These overheads will be compensated if the applications execute for more than 2 hours. For the scientific computing benchmarks, the achievable speedup is only 1.2x, which requires significant execution times in the order of days to amortize the overheads.

  3. The association between problematic cellular phone use and risky behaviors and low self-esteem among Taiwanese adolescents.

    Science.gov (United States)

    Yang, Yuan-Sheng; Yen, Ju-Yu; Ko, Chih-Hung; Cheng, Chung-Ping; Yen, Cheng-Fang

    2010-04-28

    Cellular phone use (CPU) is an important part of life for many adolescents. However, problematic CPU may complicate physiological and psychological problems. The aim of our study was to examine the associations between problematic CPU and a series of risky behaviors and low self-esteem in Taiwanese adolescents. A total of 11,111 adolescent students in Southern Taiwan were randomly selected for this study. We used the Problematic Cellular Phone Use Questionnaire to identify the adolescents with problematic CPU. Meanwhile, a series of risky behaviors and self-esteem were evaluated. Multilevel logistic regression analyses were employed to examine the associations between problematic CPU and risky behaviors and low self-esteem with respect to gender and age. The results indicated positive associations between problematic CPU and aggression, insomnia, smoking cigarettes, suicidal tendencies, and low self-esteem across all gender and age groups. However, gender and age differences existed in the associations between problematic CPU and suspension from school, criminal records, tattooing, short nocturnal sleep duration, unprotected sex, illicit drug use, drinking alcohol and chewing betel nuts. There were positive associations between problematic CPU and a series of risky behaviors and low self-esteem in Taiwanese adolescents. It is worthwhile for parents and mental health professionals to pay attention to adolescents' problematic CPU.

  4. Determinantal Representation of the Time-Dependent Stationary Correlation Function for the Totally Asymmetric Simple Exclusion Model

    Directory of Open Access Journals (Sweden)

    Nikolay M. Bogoliubov

    2009-04-01

    Full Text Available The basic model of non-equilibrium low-dimensional physics, the so-called totally asymmetric exclusion process, is related to the 'crystalline limit' (q → ∞) of the SU_q(2) quantum algebra. Using the quantum inverse scattering method, we obtain the exact expression for the time-dependent stationary correlation function of the totally asymmetric simple exclusion process on a one-dimensional lattice with periodic boundary conditions.

  5. Increased control and data acquisition capabilities via microprocessor-based timed reading and time plot CAMAC modules

    International Nuclear Information System (INIS)

    Barsotti, E.J.; Purvis, D.M.; Loveless, R.L.; Hance, R.D.

    1977-01-01

    By implementing a microprocessor-based CAMAC module capable of being programmed to function as a time-plot or a timed-reading controller, the capabilities of the experimental area serial CAMAC control and data acquisition system at Fermilab have been extensively increased. These modules provide real-time data gathering and pre-processing functions synchronized to the main accelerator cycle clock, while adding only a minimal amount to the host computer's CPU time and memory requirements. Critical data requiring a fast system response can be read by the host computer immediately following the request for this data. The vast majority of data, being non-critical, can be read via a block transfer during a non-busy time in the main accelerator cycle. Each of Fermilab's experimental areas, Meson, Neutrino and Proton, is controlled primarily by a Lockheed MAC-16 computer. Each of these three minicomputers is linked to a larger Digital Equipment Corporation PDP-11/50 computer. The PDP-11 computers are used primarily for data analysis and reduction. Presently two PDP-11s are linked to the three MAC-16 computers.

  6. A closed-form solution to predict the total melting time of an ablating slab in contact with a plasma

    International Nuclear Information System (INIS)

    Yeh, F.-B.

    2007-01-01

    An exact melt-through time is derived for a one-dimensional heated slab in contact with a plasma when the melted material is immediately removed. The plasma is composed of a collisionless presheath and sheath on a slab, which partially reflects and secondarily emits ions and electrons. The energy transport from plasma to surface, accounting for the presheath and sheath, is determined from a kinetic analysis. This work proposes a semi-analytical model to calculate the total melting time of a slab, based on a direct integration of the unsteady heat conduction equation, and provides quantitative results applicable to controlling the total melting time of the slab. The total melting time as a function of the plasma parameters and the thermophysical properties of the slab is obtained. The predicted energy transmission factor as a function of dimensionless wall potential agrees well with the experimental data. The effects of the reflectivities of the ions and electrons on the wall, the electron-to-ion source temperature ratio at the presheath edge, the charge number, the ion-to-electron mass ratio, the ionization energy, the plasma-flow-work-to-heat-conduction ratio, the Stefan number, the melting temperature, the Biot number and the bias voltage on the total melting time of the slab are quantitatively provided in this work.

  7. Total vaginectomy and urethral lengthening at time of neourethral prelamination in transgender men.

    Science.gov (United States)

    Medina, Carlos A; Fein, Lydia A; Salgado, Christopher J

    2017-11-29

    For transgender men (TGM), gender-affirmation surgery (GAS) is often the final stage of their gender transition. GAS involves creating a neophallus, typically using tissue remote from the genital region, such as radial forearm free-flap phalloplasty. Essential to this process is vaginectomy. Complexity of vaginal fascial attachments, atrophy due to testosterone use, and need to preserve integrity of the vaginal epithelium for tissue rearrangement add to the intricacy of the procedure during GAS. We designed the technique presented here to minimize complications and contribute to overall success of the phalloplasty procedure. After obtaining approval from the Institutional Review Board, our transgender (TG) database at the University of Miami Hospital was reviewed to identify cases with vaginectomy and urethral elongation performed at the time of radial forearm free-flap phalloplasty prelamination. Surgical technique for posterior vaginectomy and anterior vaginal wall-flap harvest with subsequent urethral lengthening is detailed. Six patients underwent total vaginectomy and urethral elongation at the time of radial forearm free-flap phalloplasty prelamination. Mean estimated blood loss (EBL) was 290 ± 199.4 ml for the vaginectomy and urethral elongation, and no one required transfusion. There were no intraoperative complications (cystotomy, ureteral obstruction, enterotomy, proctotomy, or neurological injury). One patient had a urologic complication (urethral stricture) in the neobulbar urethra. Total vaginectomy and urethral lengthening procedures at the time of GAS are relatively safe procedures, and using the described technique provides excellent tissue for urethral prelamination and a low complication rate in both the short and long term.

  8. Using a pruned, nondirect product basis in conjunction with the multi-configuration time-dependent Hartree (MCTDH) method

    Energy Technology Data Exchange (ETDEWEB)

    Wodraszka, Robert, E-mail: Robert.Wodraszka@chem.queensu.ca; Carrington, Tucker, E-mail: Tucker.Carrington@queensu.ca [Department of Chemistry, Queen’s University, Kingston, Ontario K7L 3N6 (Canada)

    2016-07-28

    In this paper, we propose a pruned, nondirect product multi-configuration time-dependent Hartree (MCTDH) method for solving the Schrödinger equation. MCTDH uses optimized 1D basis functions, called single-particle functions, but the size of the standard direct product MCTDH basis scales exponentially with D, the number of coordinates. We compare the pruned approach to standard MCTDH calculations for basis sizes small enough that the latter are possible, and demonstrate that pruning the basis reduces the CPU cost of computing vibrational energy levels of acetonitrile (D = 12) by more than two orders of magnitude. Using the pruned method, it is possible to do calculations with larger bases, for which the cost of standard MCTDH calculations is prohibitive. Pruning the basis complicates the evaluation of matrix-vector products. In this paper, they are done term by term for a sum-of-products Hamiltonian. When no attempt is made to exploit the fact that the matrices representing some of the factors of a term are identity matrices, one needs only to constrain indices carefully. In this paper, we develop new ideas that make it possible to further reduce the CPU time by exploiting identity matrices.
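
    A small numpy illustration of a term-by-term matrix-vector product for a sum-of-products operator H = Σ_t ⊗_d h_t^(d) on a full direct-product grid; the pruned method would additionally restrict the retained index combinations, and the sizes here are toy:

```python
# Term-by-term matrix-vector product for a sum-of-products operator,
# applied to a coefficient tensor without ever forming the full matrix.
import numpy as np

rng = np.random.default_rng(2)
dims = (4, 5, 3)                                  # basis size per coordinate
terms = [[rng.random((n, n)) for n in dims] for _ in range(2)]  # 2 SOP terms
v = rng.random(dims)                              # coefficient tensor

def sop_matvec(terms, v):
    out = np.zeros_like(v)
    for term in terms:
        w = v
        for d, h in enumerate(term):
            # contract h with axis d, then move the new axis back to slot d
            w = np.moveaxis(np.tensordot(h, w, axes=(1, d)), 0, d)
        out += w
    return out

# check against the explicitly built operator
H = sum(np.kron(np.kron(t[0], t[1]), t[2]) for t in terms)
assert np.allclose(sop_matvec(terms, v).ravel(), H @ v.ravel())
```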

  9. Bridging FPGA and GPU technologies for AO real-time control

    Science.gov (United States)

    Perret, Denis; Lainé, Maxime; Bernard, Julien; Gratadour, Damien; Sevin, Arnaud

    2016-07-01

    Our team has developed a common environment for high-performance simulations and real-time control of AO systems based on the use of Graphics Processing Units in the context of the COMPASS project. Such a solution, based on the ability of the real-time core of the simulation to provide adequate computing performance, limits the cost of developing AO RTC systems and makes them more scalable. A code developed and validated in the context of the simulation may be injected directly into the system and tested on sky. Furthermore, the use of relatively low-cost components also offers significant advantages for the system hardware platform. However, the use of GPUs in an AO loop comes with drawbacks: the traditional way of offloading computation from CPU to GPUs - involving multiple copies and unacceptable overhead in kernel launching - is not well suited to a real-time context. The latter requires the implementation of a solution enabling direct memory access (DMA) to the GPU memory from a third-party device, bypassing the operating system. This allows the device to communicate directly with the real-time core of the simulation, feeding it with the WFS camera pixel stream. We show that DMA between a custom FPGA-based frame grabber and a computation unit (GPU, FPGA, or coprocessor such as the Xeon Phi) across PCIe allows us to obtain latencies compatible with what will be needed on ELTs. As a fine-grained synchronization mechanism is not yet made available by GPU vendors, we propose the use of memory polling to avoid interrupt handling and the involvement of a CPU. Network and vision protocols are handled by the FPGA-based Network Interface Card (NIC). We present the results we obtained on a complete AO loop using camera and deformable mirror simulators.

  10. GPU accelerated real-time confocal fluorescence lifetime imaging microscopy (FLIM) based on the analog mean-delay (AMD) method

    Science.gov (United States)

    Kim, Byungyeon; Park, Byungjun; Lee, Seungrag; Won, Youngjae

    2016-01-01

    We demonstrated GPU-accelerated real-time confocal fluorescence lifetime imaging microscopy (FLIM) based on the analog mean-delay (AMD) method. Our algorithm was verified for various fluorescence lifetimes and photon numbers. The GPU processing time was faster than the physical scanning time for images up to 800 × 800 pixels, and more than 149 times faster than a single-core CPU. The frame rate of our system was demonstrated to be 13 fps for a 200 × 200 pixel image when observing maize vascular tissue. This system can be utilized for observing dynamic biological reactions, medical diagnosis, and real-time industrial inspection. PMID:28018724
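
    The core of the AMD method is that, for a single-exponential decay, the lifetime equals the centroid (mean delay) of the measured fluorescence pulse minus the centroid of the instrument response function (IRF), which is cheap enough to evaluate in real time. A minimal NumPy sketch of this estimator, with synthetic data and illustrative parameters:

```python
import numpy as np

def mean_delay(t, signal):
    """Centroid (mean delay) of a sampled waveform."""
    return np.sum(t * signal) / np.sum(signal)

def amd_lifetime(t, fluorescence, irf):
    """AMD lifetime estimate: centroid shift of the pulse vs. the IRF."""
    return mean_delay(t, fluorescence) - mean_delay(t, irf)

# Synthetic check: narrow Gaussian IRF at 1 ns, decay with tau = 2.5 ns.
t = np.linspace(0, 50, 5000)                     # ns
irf = np.exp(-0.5 * ((t - 1.0) / 0.05) ** 2)
decay = np.exp(-t / 2.5)
pulse = np.convolve(irf, decay)[: t.size] * (t[1] - t[0])
print(amd_lifetime(t, pulse, irf))               # ~2.5 ns
```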

  11. Dynamic Allocation of SPM Based on Time-Slotted Cache Conflict Graph for System Optimization

    Science.gov (United States)

    Wu, Jianping; Ling, Ming; Zhang, Yang; Mei, Chen; Wang, Huan

    This paper proposes a novel dynamic Scratch-pad Memory (SPM) allocation strategy to optimize the energy consumption of the memory sub-system. First, the whole program execution process is sliced into several time slots along the temporal dimension; thereafter, a Time-Slotted Cache Conflict Graph (TSCCG) is introduced to model the behavior of Data Cache (D-Cache) conflicts within each time slot. Then, Integer Nonlinear Programming (INP), which avoids a time-consuming linearization process, is applied to select the most profitable data pages. The Virtual Memory System (VMS) is adopted to remap those data pages that cause severe cache conflicts within a time slot to the SPM. In order to minimize the swapping overhead of dynamic SPM allocation, a novel SPM controller with a tightly coupled DMA is introduced to issue the swapping operations without CPU intervention. Finally, this paper quantitatively discusses the fluctuation of the system energy profit for different MMU page sizes and time slot durations. According to our design space exploration, the proposed method can optimize all of the data segments, including global, heap and stack data, and reduce the total energy consumption by 27.28% on average, up to 55.22%, with a marginal performance improvement. Compared with a conventional static CCG (Cache Conflict Graph), our approach obtains 24.7% energy profit on average, up to 30.5%, with a slight boost in performance.

  12. Design of FPGA based high-speed data acquisition and real-time data processing system on J-TEXT tokamak

    International Nuclear Information System (INIS)

    Zheng, W.; Liu, R.; Zhang, M.; Zhuang, G.; Yuan, T.

    2014-01-01

    Highlights: • A data acquisition system for the polarimeter–interferometer diagnostic on the J-TEXT tokamak based on FPGA and PXIe devices. • The system provides powerful data acquisition and real-time data processing performance. • Users can implement different data processing applications on the FPGA in a short time. • The system supports EPICS and has been integrated into the J-TEXT CODAC system. - Abstract: Tokamak experiments require high-speed data acquisition and processing systems. In traditional data acquisition systems, the sampling rate, number of channels and processing speed are limited by bus throughput and CPU speed. This paper presents a data acquisition and processing system based on FPGA. The data can be processed in real time before it is passed to the CPU. It provides processing ability for more channels with higher sampling rates than a traditional data acquisition system while ensuring deterministic real-time performance. A working prototype has been developed for the newly built polarimeter–interferometer diagnostic system on the Joint Texas Experimental Tokamak (J-TEXT). It provides 16 channels with a 120 MHz maximum sampling rate and 16-bit resolution. The onboard FPGA is able to calculate the plasma electron density and the Faraday rotation angle. A RAID 5 storage device providing 700 MB/s read–write speed is adopted to buffer the data to the hard disk continuously for better performance.

  13. Cross-sectional associations of total sitting and leisure screen time with cardiometabolic risk in adults. Results from the HUNT Study, Norway

    NARCIS (Netherlands)

    Chau, J.Y.; Grunseit, A.; Midthjell, K.; Holmen, J.; Holmen, T.L.; Bauman, A.E.; van der Ploeg, H.P.

    2014-01-01

    Objectives: To examine associations of total sitting time, TV-viewing and leisure-time computer use with cardiometabolic risk biomarkers in adults. Design: Population based cross-sectional study. Methods: Waist circumference, BMI, total cholesterol, HDL cholesterol, blood pressure, non-fasting

  14. Accessible high performance computing solutions for near real-time image processing for time critical applications

    Science.gov (United States)

    Bielski, Conrad; Lemoine, Guido; Syryczynski, Jacek

    2009-09-01

    High Performance Computing (HPC) hardware solutions such as grid computing and General Processing on a Graphics Processing Unit (GPGPU) are now accessible to users with general computing needs. Grid computing infrastructures in the form of computing clusters or blades are becoming commonplace, and GPGPU solutions that leverage the processing power of the video card are quickly being integrated into personal workstations. Our interest in these HPC technologies stems from the need to produce near real-time maps from a combination of pre- and post-event satellite imagery in support of post-disaster management. Faster processing provides a twofold gain in this situation: (1) critical information can be provided faster and (2) more elaborate automated processing can be performed prior to providing the critical information. In our particular case, we test the use of the PANTEX index, which is based on analysis of image textural measures extracted using anisotropic, rotation-invariant GLCM statistics. The use of this index, applied in a moving window, has been shown to successfully identify built-up areas in remotely sensed imagery. Built-up index image masks are important input to the structuring of damage assessment interpretation because they help optimise the workload. The performance of computing the PANTEX workflow is compared on two different HPC hardware architectures: (1) a blade server with 4 blades, each having dual quad-core CPUs, and (2) a CUDA-enabled GPU workstation. The reference platform is a dual-CPU, quad-core workstation, and the total computing time of the PANTEX workflow is measured. Furthermore, as part of a qualitative evaluation, the differences in setting up and configuring the various hardware solutions and the related software coding effort are presented.
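
    As a sketch of the kind of computation involved, the following fragment evaluates a PANTEX-like built-up index: within each moving window, GLCM contrast is computed for several displacements and the minimum over directions is kept, making the measure rotation invariant. It uses scikit-image's graycomatrix/graycoprops; window size, grey-level count and displacement set are illustrative, not the exact operational PANTEX definition.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def pantex_like(image, win=9, levels=32):
    """Min-over-directions GLCM contrast in a moving window (illustrative)."""
    img = (image / image.max() * (levels - 1)).astype(np.uint8)
    half = win // 2
    out = np.zeros(img.shape, dtype=float)
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    for r in range(half, img.shape[0] - half):
        for c in range(half, img.shape[1] - half):
            patch = img[r - half:r + half + 1, c - half:c + half + 1]
            glcm = graycomatrix(patch, distances=[1, 2], angles=angles,
                                levels=levels, symmetric=True, normed=True)
            # Anisotropic GLCMs fused by taking the minimum contrast.
            out[r, c] = graycoprops(glcm, "contrast").min()
    return out
```

    The nested per-pixel loop is exactly the embarrassingly parallel workload that maps well onto blades or a GPGPU, which is why the abstract compares those platforms.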

  15. The association between problematic cellular phone use and risky behaviors and low self-esteem among Taiwanese adolescents

    Directory of Open Access Journals (Sweden)

    Ko Chih-Hung

    2010-04-01

    Background: Cellular phone use (CPU) is an important part of life for many adolescents. However, problematic CPU may be associated with physiological and psychological problems. The aim of our study was to examine the associations between problematic CPU and a series of risky behaviors and low self-esteem in Taiwanese adolescents. Methods: A total of 11,111 adolescent students in Southern Taiwan were randomly selected for this study. We used the Problematic Cellular Phone Use Questionnaire to identify adolescents with problematic CPU. Meanwhile, a series of risky behaviors and self-esteem were evaluated. Multilevel logistic regression analyses were employed to examine the associations between problematic CPU and risky behaviors and low self-esteem with regard to gender and age. Results: Positive associations were found between problematic CPU and aggression, insomnia, smoking cigarettes, suicidal tendencies, and low self-esteem in all groups regardless of sex and age. However, gender and age differences existed in the associations between problematic CPU and suspension from school, criminal records, tattooing, short nocturnal sleep duration, unprotected sex, illicit drug use, drinking alcohol and chewing betel nuts. Conclusions: There were positive associations between problematic CPU and a series of risky behaviors and low self-esteem in Taiwanese adolescents. Parents and mental health professionals should pay attention to adolescents' problematic CPU.

  16. An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms

    Directory of Open Access Journals (Sweden)

    B. Jayashree

    2007-01-01

    The large amounts of EST sequence data available from a single species of an organism, as well as for several species within a genus, provide an easy source for the identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion, and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software, extended to run on multiple CPU architectures, that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, developing CAPS assays for SNP genotyping, and confirming the restriction digestion pattern at the sequence level.

  17. A Real-Time Programmer's Tour of General-Purpose L4 Microkernels

    Directory of Open Access Journals (Sweden)

    Ruocco Sergio

    2008-01-01

    L4-embedded is a microkernel successfully deployed in mobile devices with soft real-time requirements. It now faces the challenges of tightly integrated systems, in which user interface, multimedia, OS, wireless protocols, and even software-defined radios must run on a single CPU. In this paper we discuss the pros and cons of L4-embedded for real-time systems design, focusing on the issues caused by the extreme speed optimisations it inherited from its general-purpose ancestors. Since these issues can be addressed with a minimal performance loss, we conclude that, overall, the design of real-time systems based on L4-embedded is possible, and facilitated by a number of design features unique to microkernels and the L4 family.

  19. Real-Time Adaptive Lossless Hyperspectral Image Compression using CCSDS on Parallel GPGPU and Multicore Processor Systems

    Science.gov (United States)

    Hopson, Ben; Benkrid, Khaled; Keymeulen, Didier; Aranki, Nazeeh; Klimesh, Matt; Kiely, Aaron

    2012-01-01

    The proposed CCSDS (Consultative Committee for Space Data Systems) Lossless Hyperspectral Image Compression Algorithm was designed to facilitate a fast hardware implementation. This paper analyses that algorithm with regard to available parallelism and describes fast parallel implementations in software for GPGPU and Multicore CPU architectures. We show that careful software implementation, using hardware acceleration in the form of GPGPUs or even just multicore processors, can exceed the performance of existing hardware and software implementations by up to 11x and break the real-time barrier for the first time for a typical test application.

  20. Time and Space Partitioning the EagleEye Reference Mission

    Science.gov (United States)

    Bos, Victor; Mendham, Peter; Kauppinen, Panu; Holsti, Niklas; Crespo, Alfons; Masmano, Miguel; de la Puente, Juan A.; Zamorano, Juan

    2013-08-01

    We discuss experiences gained by porting a Software Validation Facility (SVF) and a satellite Central Software (CSW) to a platform with support for Time and Space Partitioning (TSP). The SVF and CSW are part of the EagleEye Reference mission of the European Space Agency (ESA). As a reference mission, EagleEye is a perfect candidate to evaluate practical aspects of developing satellite CSW for and on TSP platforms. The specific TSP platform we used consists of a simulated LEON3 CPU controlled by the XtratuM separation micro-kernel. On top of this, we run five separate partitions. Each partition runs its own real-time operating system or Ada run-time kernel, which in turn are running the application software of the CSW. We describe issues related to partitioning; inter-partition communication; scheduling; I/O; and fault-detection, isolation, and recovery (FDIR).

  1. A Block-Asynchronous Relaxation Method for Graphics Processing Units

    OpenAIRE

    Anzt, H.; Dongarra, J.; Heuveline, Vincent; Tomov, S.

    2011-01-01

    In this paper, we analyze the potential of asynchronous relaxation methods on Graphics Processing Units (GPUs). For this purpose, we developed a set of asynchronous iteration algorithms in CUDA and compared them with a parallel implementation of synchronous relaxation methods on CPU-based systems. For a set of test matrices taken from the University of Florida Matrix Collection we monitor the convergence behavior, the average iteration time and the total time to solution. Analyzing the r...

  2. Accelerating VASP electronic structure calculations using graphic processing units

    KAUST Repository

    Hacene, Mohamed; Anciaux-Sedrakian, Ani; Rozanska, Xavier; Klahr, Diego; Guignon, Thomas; Fleurat-Lessard, Paul

    2012-08-20

    We present a way to improve the performance of the electronic structure Vienna Ab initio Simulation Package (VASP) program. We show that high-performance computers equipped with graphics processing units (GPUs) as accelerators may drastically reduce the computation time when time-consuming sections are offloaded to the graphics chips. The procedure consists of (i) profiling the performance of the code to isolate the time-consuming parts, (ii) rewriting these so that the algorithms become better suited for the chosen graphics accelerator, and (iii) optimizing memory traffic between the host computer and the GPU accelerator. We chose to accelerate VASP with NVIDIA GPUs using CUDA. We compare the GPU and original versions of VASP by evaluating the Davidson and RMM-DIIS algorithms on chemical systems of up to 1100 atoms. In these tests, the total time is reduced by a factor between 3 and 8 when running on n (CPU core + GPU) compared to n CPU cores only, without any accuracy loss. © 2012 Wiley Periodicals, Inc.

  4. Effect of temperature, time, and milling process on yield, flavonoid, and total phenolic content of Zingiber officinale water extract

    Science.gov (United States)

    Andriyani, R.; Kosasih, W.; Ningrum, D. R.; Pudjiraharti, S.

    2017-03-01

    Several parameters, such as temperature, time of extraction, and size of simplicia, play a significant role in medicinal herb extraction. This study aimed to investigate the effect of those parameters on extract yield, flavonoid, and total phenolic content in water extract of Zingiber officinale. The temperatures used were 50, 70 and 90°C and the extraction times were 30, 60 and 90 min. Z. officinale in the form of powder and chips was used to study the effect of the milling treatment. The correlation among those variables was analysed using two-way ANOVA without replication. The results showed that time and temperature did not influence the extract yield of the powdered simplicia. However, extraction time did influence the yield from simplicia not subjected to milling. On the other hand, flavonoid and total phenolic content were not influenced by temperature, time, or milling treatment.

  5. Time-driven Activity-based Cost of Fast-Track Total Hip and Knee Arthroplasty

    DEFF Research Database (Denmark)

    Andreasen, Signe E; Holm, Henriette B; Jørgensen, Mira

    2017-01-01

    this between 2 departments with different logistical set-ups. METHODS: Prospective data collection was analyzed using the time-driven activity-based costing method (TDABC) on time consumed by different staff members involved in patient treatment in the perioperative period of fast-track THA and TKA in 2 Danish orthopedic departments with standardized fast-track settings, but different logistical set-ups. RESULTS: Length of stay was median 2 days in both departments. TDABC revealed minor differences in the perioperative settings between departments, but the total cost excluding the prosthesis was similar at USD......-track methodology, the result could be a more cost-effective pathway altogether. As THA and TKA are potentially costly procedures and the numbers are increasing in an economically limited environment, the aim of this study is to present baseline detailed economic calculations of fast-track THA and TKA and compare

  6. An efficient compression scheme for bitmap indices

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Kesheng; Otoo, Ekow J.; Shoshani, Arie

    2004-04-13

    When using an out-of-core indexing method to answer a query, it is generally assumed that the I/O cost dominates the overall query response time. Because of this, most research on indexing methods concentrates on reducing the sizes of indices. For bitmap indices, compression has been used for this purpose. However, in most cases, operations on these compressed bitmaps, mostly bitwise logical operations such as AND, OR, and NOT, spend more time in the CPU than in I/O. To speed up these operations, a number of specialized bitmap compression schemes have been developed, the best known of which is the byte-aligned bitmap code (BBC). They are usually faster in performing logical operations than the general purpose compression schemes, but the time spent in the CPU still dominates the total query response time. To reduce the query response time, we designed a CPU-friendly scheme named the word-aligned hybrid (WAH) code. In this paper, we prove that the sizes of WAH compressed bitmap indices are about two words per row for a large range of attributes. This size is smaller than typical sizes of commonly used indices, such as a B-tree. Therefore, WAH compressed indices are appropriate not only for low cardinality attributes but also for high cardinality attributes. In the worst case, the time to operate on compressed bitmaps is proportional to the total size of the bitmaps involved. The total size of the bitmaps required to answer a query on one attribute is proportional to the number of hits. These results indicate that WAH compressed bitmap indices are optimal. To verify their effectiveness, we generated bitmap indices for four different datasets and measured the response time of many range queries. Tests confirm that sizes of compressed bitmap indices are indeed smaller than B-tree indices, and query processing with WAH compressed indices is much faster than with BBC compressed indices, projection indices and B-tree indices. In addition, we also verified that the average query response time
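
    The word-aligned idea is easy to illustrate. The sketch below (an assumption-laden toy, not the authors' library) encodes a bitmap into 32-bit words that are either literals (31 raw bits) or fills (a run of identical 31-bit groups), so that logical operations can later proceed word-at-a-time rather than bit-at-a-time:

```python
def wah_compress(bits):
    """Toy WAH-style encoder for a list of 0/1 bits into 32-bit words.

    Assumed word layouts, mirroring the published scheme:
      literal: MSB = 0, lower 31 bits hold raw bitmap bits
      fill:    MSB = 1, bit 30 = fill value, bits 0-29 = run length
               counted in 31-bit groups
    """
    words = []
    for i in range(0, len(bits), 31):
        group = bits[i:i + 31]
        group += [0] * (31 - len(group))          # pad the final group
        value = sum(b << (30 - j) for j, b in enumerate(group))
        if value in (0, (1 << 31) - 1):           # all-zero or all-one group
            fill = 1 if value else 0
            prev = words[-1] if words else 0
            if prev >> 31 == 1 and (prev >> 30) & 1 == fill:
                words[-1] += 1                    # extend the current run
            else:
                words.append((1 << 31) | (fill << 30) | 1)
        else:
            words.append(value)                   # literal word, MSB = 0
    return words

# 93 zeros then 31 ones -> a 3-group zero fill and a 1-group one fill.
print([hex(w) for w in wah_compress([0] * 93 + [1] * 31)])
```

    A bitwise AND of two such compressed bitmaps then walks both word lists and consumes whole fills at once, which is where the CPU saving over byte-aligned schemes comes from.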

  7. Timing of urinary catheter removal after uncomplicated total abdominal hysterectomy: a prospective randomized trial.

    Science.gov (United States)

    Ahmed, Magdy R; Sayed Ahmed, Waleed A; Atwa, Khaled A; Metwally, Lobna

    2014-05-01

    To assess whether immediate (0 h), intermediate (after 6 h) or delayed (after 24 h) removal of an indwelling urinary catheter after uncomplicated abdominal hysterectomy can affect the rate of re-catheterization due to urinary retention, rate of urinary tract infection, ambulation time and length of hospital stay. Prospective randomized controlled trial conducted at Suez Canal University Hospital, Egypt. Two hundred and twenty-one women underwent total abdominal hysterectomy for benign gynecological diseases and were randomly allocated into three groups. Women in group A (73 patients) had their urinary catheter removed immediately after surgery. Group B (81 patients) had the catheter removed 6 h post-operatively while in group C (67 patients) the catheter was removed after 24 h. The main outcome measures were the frequency of urinary retention, urinary tract infections, ambulation time and length of hospital stay. There was a significantly higher number of urinary retention episodes requiring re-catheterization in the immediate removal group compared to the intermediate and delayed removal groups (16.4% versus 2.5% and 0% respectively). Delayed urinary catheter removal was associated with a higher incidence of urinary tract infections (15%), delayed ambulation time (10.3 h) and longer hospital stay (5.6 days) compared to the early (1.4%, 4.1 h and 3.2 days respectively) and intermediate (3.7%, 6.8 h and 3.4 days respectively) removal groups. Removal of the urinary catheter 6 h postoperatively appears to be more advantageous than early or late removal in cases of uncomplicated total abdominal hysterectomy. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  8. Real-time generation of kd-trees for ray tracing using DirectX 11

    OpenAIRE

    Säll, Martin; Cronqvist, Fredrik

    2017-01-01

    Context. Ray tracing has always been a simple but effective way to create a photorealistic scene, but at a greater cost when expanding the scene. Recent improvements in GPU and CPU hardware have made ray tracing faster, making more complex scenes possible with the same amount of time needed to process the scene. Despite the improvements in hardware, ray tracing is still rarely run at an interactive speed. Objectives. The aim of this experiment was to implement a new kd-tree generation algorithm us...

  9. Wake force computation in the time domain for long structures

    International Nuclear Information System (INIS)

    Bane, K.; Weiland, T.

    1983-07-01

    One is often interested in calculating the wake potentials for short bunches in long structures using TBCI. For ultra-relativistic particles it is sufficient to solve for the fields only over a window containing the bunch and moving along with it. This technique reduces both the memory and the running time required by a factor that equals the ratio of the structure length to the window length. For example, for a bunch with σ_z of one picosecond traversing a single SLAC cell this improvement factor is 15. It is thus possible to solve for the wakefields in very long structures: for a given problem, increasing the structure length will not change the memory required while only adding linearly to the CPU time needed.
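
    The windowed scheme is easy to picture: fields are stored only on a short mesh that slides along the structure together with the bunch, so memory scales with the window rather than the structure. A schematic 1D illustration (generic, not TBCI itself) of how such a window advances:

```python
import numpy as np

length, window = 1500, 100      # cells; memory saving factor = 1500/100 = 15
field = np.zeros(window)        # fields exist only inside the moving window

def advance(field, entering_cell_value=0.0):
    """Shift the window one cell downstream, following the bunch."""
    field[:-1] = field[1:]      # discard the cell the bunch has left behind
    field[-1] = entering_cell_value  # initialize the cell entering the window
    # ... the explicit field update within the window would go here ...
    return field

for step in range(length - window):
    field = advance(field)
```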

  10. Total and segmental colon transit time in constipated children assessed by scintigraphy with 111In-DTPA given orally.

    Science.gov (United States)

    Vattimo, A; Burroni, L; Bertelli, P; Messina, M; Meucci, D; Tota, G

    1993-12-01

    Serial colon scintigraphy using 111In-DTPA (2 MBq) given orally was performed in 39 children referred for constipation, and the total and segmental colon transit times were measured. The bowel movements during the study were recorded and the intervals between defecations (ID) were calculated. This method proved able to identify children with normal colon morphology (no. = 32) and those with dolichocolon (no. = 7). Normal children were not included for ethical reasons, and we used the normal range determined by others using x-ray methods (29 +/- 4 hours). Total and segmental colon transit times were found to be prolonged in all children with dolichocolon (TC: 113.55 +/- 41.20 hours; RC: 39.85 +/- 26.39 hours; LC: 43.05 +/- 18.30 hours; RS: 30.66 +/- 26.89 hours). In the group of children with a normal colon shape, 13 presented total and segmental colon transit times within the reported normal values (TC: 27.79 +/- 4.10 hours; RC: 9.11 +/- 2.53 hours; LC: 9.80 +/- 3.50 hours; RS: 8.88 +/- 4.09 hours) and normal bowel function (ID: 23.37 +/- 5.93 hours). Of the remaining children, 5 presented prolonged retention in the rectum (RS: 53.36 +/- 29.66 hours), and 14 a prolonged transit time in all segments. A good correlation was found between the transit time and bowel function. From the point of view of radiation dosimetry, the most heavily irradiated organs were the lower large intestine and the ovaries, and the level of radiation burden depended on the colon transit time. We can conclude that the described method is safe, accurate and fully diagnostic.

  11. Time delay and duration of ionospheric total electron content responses to geomagnetic disturbances

    Directory of Open Access Journals (Sweden)

    J. Liu

    2010-03-01

    Although positive and negative signatures of ionospheric storms have been reported many times, global characteristics such as the time of occurrence, time delay and duration, as well as their relations to the intensity of the ionospheric storms, have not received enough attention. Ten years of global ionosphere maps (GIMs) of total electron content (TEC) retrieved at the Jet Propulsion Laboratory (JPL) were used to conduct a statistical study of the time delay of the ionospheric responses to geomagnetic disturbances. Our results show that the time delays between geomagnetic disturbances and TEC responses depend on season, magnetic local time and magnetic latitude. In the summer hemisphere at mid- and high latitudes, the negative storm effects can propagate to the low latitudes in the post-midnight to morning sector with a time delay of 4–7 h. As the Earth rotates into sunlight, the negative phase retreats to higher latitudes and starts to extend to lower latitudes toward the midnight sector. In the winter hemisphere during the daytime and after sunset at mid- and low latitudes, the appearance of the negative phase is delayed by 1–10 h, depending on the local time, latitude and storm intensity, compared to the same area in the summer hemisphere. A quick positive-phase response can be observed in the auroral region on the night side of the winter hemisphere. At the low latitudes in the dawn-noon sector, the ionospheric negative phase responds quickly, with time delays of 5–7 h in both equinoctial and solsticial months.

    Our results also show that there is a positive correlation between the intensity of geomagnetic disturbances and the time duration of both the positive phase and the negative phase. The durations of both the negative phase and the positive phase have a clear latitudinal, seasonal and magnetic local time (MLT) dependence. In the winter hemisphere, long durations for the positive phase are 8–11 h and 12–14 h during the daytime at middle latitudes

  13. Usability of a new multiple high-speed pulse time data registration, processing and real-time display system for pulse time interval analysis

    International Nuclear Information System (INIS)

    Yawata, Takashi; Sakaue, Hisanobu; Hashimoto, Tetsuo; Itou, Shigeki

    2006-01-01

    A new high-speed system for multiple pulse time data registration, processing and real-time display for time interval analysis (TIA) was developed for counting either β-α or α-α correlated decay events. The TIA method had so far been limited to the selective extraction of successive α-α decay events on the millisecond time scale owing to the use of the original electronic hardware. In the present pulse-processing system, three different high-speed α/β(γ) pulses can be fed to an original 32-bit PCI board (ZN-HTS2) within 1 μs. This PCI board consists of a timing-control IC (HTS-A) and a 28-bit counting IC (HTS-B). All channel and pulse time data are stored in FIFO RAM and then transferred into temporary CPU RAM (32 MB) by DMA. Data registration (into main RAM (200 MB)) and the calculation of pulse time intervals, together with the real-time display of the TIA distribution, are processed simultaneously using two dedicated software programs. The present system has proven capable of real-time display of the TIA distribution spectrum even when pulses at 1.6×10⁵ cps from a pulse generator were fed to the system. By using this new system combined with a liquid scintillation counting (LSC) apparatus, both natural microsecond-order β-α correlated decay events and millisecond-order α-α correlated decay events could be selectively extracted from a mixture of natural radionuclides. (author)
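
    The essence of time-interval analysis can be sketched in a few lines: given sorted timestamps of candidate parent and daughter events, pair each parent with the first daughter falling inside a delay window and histogram the delays. Channel names and the window are illustrative, not the authors' parameters.

```python
import numpy as np

def correlated_pairs(parent_t, daughter_t, t_min, t_max):
    """Pair each parent event with the first daughter event whose delay
    falls in (t_min, t_max]; both timestamp arrays must be sorted."""
    pairs, j = [], 0
    for tp in parent_t:
        while j < len(daughter_t) and daughter_t[j] < tp + t_min:
            j += 1                      # skip daughters that arrive too early
        if j < len(daughter_t) and daughter_t[j] <= tp + t_max:
            pairs.append((tp, daughter_t[j]))
    return np.array(pairs)

# Delays of the accepted pairs feed the TIA distribution display.
pairs = correlated_pairs(np.array([0.0, 5.0, 9.0]),
                         np.array([0.3, 5.1, 20.0]), 1e-3, 1.0)
delays = pairs[:, 1] - pairs[:, 0]      # histogram these for the TIA spectrum
```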

  14. Optimum filters with time width constraints for liquid argon total-absorption detectors

    International Nuclear Information System (INIS)

    Gatti, E.; Radeka, V.

    1977-10-01

    Optimum filter responses are found for triangular current input pulses occurring in liquid argon ionization chambers used as total absorption detectors. The filters considered are subject to the following constraints: finite width of the output pulse having a prescribed ratio to the width of the triangular input current pulse, and zero area of a bipolar antisymmetrical pulse or of a three-lobe pulse, as required for high event rates. The feasibility of pulse shaping giving an output equal to, or shorter than, the input one is demonstrated. It is shown that the signal-to-noise ratio remains constant for the chamber interelectrode gap which gives an input pulse width (i.e., electron drift time) greater than one third of the required output pulse width.

  15. Some optimizations of the animal code

    International Nuclear Information System (INIS)

    Fletcher, W.T.

    1975-01-01

    Optimizing techniques were performed on a version of the ANIMAL code (MALAD1B) at the source-code (FORTRAN) level. Sample optimizing techniques and operations used in MALADOP--the optimized version of the code--are presented, along with a critique of some standard CDC 7600 optimizing techniques. The statistical analysis of total CPU time required for MALADOP and MALAD1B shows a run-time saving of 174 msec (almost 3 percent) in the code MALADOP during one time step.

  16. VERSE - Virtual Equivalent Real-time Simulation

    Science.gov (United States)

    Zheng, Yang; Martin, Bryan J.; Villaume, Nathaniel

    2005-01-01

    Distributed real-time simulations provide important timing validation and hardware-in-the-loop results for the spacecraft flight software development cycle. Occasionally, the need for higher fidelity modeling and more comprehensive debugging capabilities - combined with a limited amount of computational resources - calls for a non-real-time simulation environment that mimics the real-time environment. By creating a non-real-time environment that accommodates simulations and flight software designed for a multi-CPU real-time system, we can save development time, cut mission costs, and reduce the likelihood of errors. This paper presents such a solution: Virtual Equivalent Real-time Simulation Environment (VERSE). VERSE turns the real-time operating system RTAI (Real-time Application Interface) into an event-driven simulator that runs in virtual real time. Designed to keep the original RTAI architecture as intact as possible, and therefore inheriting RTAI's many capabilities, VERSE was implemented with remarkably little change to the RTAI source code. This small footprint together with use of the same API allows users to easily run the same application in both real-time and virtual-time environments. VERSE has been used to build a workstation testbed for NASA's Space Interferometry Mission (SIM PlanetQuest) instrument flight software. With its flexible simulation controls and inexpensive setup and replication costs, VERSE will become an invaluable tool in future mission development.

  17. Terahertz time-domain attenuated total reflection spectroscopy applied to the rapid discrimination of the botanical origin of honeys

    Science.gov (United States)

    Liu, Wen; Zhang, Yuying; Yang, Si; Han, Donghai

    2018-05-01

    A new technique to identify the botanical origin of honeys is in demand. Terahertz time-domain attenuated total reflection spectroscopy combined with chemometric methods was applied to discriminate different categories (Medlar honey, Vitex honey, and Acacia honey). Principal component analysis (PCA), cluster analysis (CA) and partial least squares-discriminant analysis (PLS-DA) were used to extract information on the botanical origins of the honeys. The spectral range was also examined to increase the precision of the PLS-DA model. An accuracy of 88.46% for the validation set was obtained using the PLS-DA model in the 0.5-1.5 THz range. This work indicates that terahertz time-domain attenuated total reflection spectroscopy is a viable approach for rapidly evaluating the quality of honey.
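
    As an illustration of the chemometric step, the following scikit-learn sketch performs PLS-DA on a spectral matrix: class labels are one-hot encoded, a PLS regression is fitted, and samples are assigned to the class with the largest predicted response. The data files, band selection and component count are hypothetical.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

# Hypothetical inputs: rows of X are THz absorbance spectra restricted to
# the chosen band (e.g. 0.5-1.5 THz); y holds integer class labels 0..2.
X = np.load("thz_spectra.npy")          # shape (n_samples, n_frequencies)
y = np.load("honey_labels.npy")         # 0 = Medlar, 1 = Vitex, 2 = Acacia

Y = np.eye(3)[y]                        # one-hot targets make this PLS-DA
X_tr, X_te, Y_tr, Y_te, y_tr, y_te = train_test_split(
    X, Y, y, test_size=0.3, random_state=0, stratify=y)

pls = PLSRegression(n_components=8).fit(X_tr, Y_tr)
pred = pls.predict(X_te).argmax(axis=1)  # class with largest response
print("validation accuracy:", (pred == y_te).mean())
```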

  18. A contribution to the numerical calculation of static electromagnetic fields in unbounded domains

    International Nuclear Information System (INIS)

    Krawczyk, F.

    1990-11-01

    The numerical calculation of static electromagnetic fields for arbitrarily shaped three-dimensional structures, especially in unbounded domains, is very memory and cpu-time consuming. In this thesis several schemes that reduce memory and cpu-time consumption have been developed or introduced. The memory needed can be reduced by a special simulation of boundaries towards open space and by the use of a scalar potential for the field description. Known disadvantages of the use of such a potential are avoided by an improved formulation of the algorithms used. The cpu-time for the calculations can be reduced considerably in many cases by using a multigrid solution scheme including a defect correction. A computer code has been written that uses these algorithms. With the help of this program it has been demonstrated that, using these algorithms, distinct improvements in terms of computer memory, cpu-time consumption and accuracy can be achieved. (orig.)

  19. Total testosterone levels are often more than three times elevated in patients with androgen-secreting tumours

    DEFF Research Database (Denmark)

    Glintborg, Dorte; Lambaa Altinok, Magda; Petersen, Kresten Rubeck

    2015-01-01

    surgery. Terminal hair growth on lip and chin gradually increases after menopause, which complicates distinction from normal physiological variation. Precise testosterone assays have just recently become available in the daily clinic. We present three women diagnosed with testosterone-producing tumours...... when total testosterone levels are above three times the upper reference limit....

  20. Parallel direct solver for finite element modeling of manufacturing processes

    DEFF Research Database (Denmark)

    Nielsen, Chris Valentin; Martins, P.A.F.

    2017-01-01

    The central processing unit (CPU) time is of paramount importance in finite element modeling of manufacturing processes. Because the most significant part of the CPU time is consumed in solving the main system of equations resulting from finite element assemblies, different approaches have been...

  1. Significantly reducing registration time in IGRT using graphics processing units

    DEFF Research Database (Denmark)

    Noe, Karsten Østergaard; Denis de Senneville, Baudouin; Tanderup, Kari

    2008-01-01

    respiration phases in a free breathing volunteer and 41 anatomical landmark points in each image series. The registration method used is a multi-resolution GPU implementation of the 3D Horn and Schunck algorithm. It is based on the CUDA framework from Nvidia. Results On an Intel Core 2 CPU at 2.4GHz each...... registration took 30 minutes. On an Nvidia Geforce 8800GTX GPU in the same machine this registration took 37 seconds, making the GPU version 48.7 times faster. The nine image series of different respiration phases were registered to the same reference image (full inhale). Accuracy was evaluated on landmark...

  2. Clinical responses after total body irradiation by over permissible dose of γ-rays in one time

    International Nuclear Information System (INIS)

    Jiang Benrong; Wang Guilin; Liu Huilan; Tang Xingsheng; Ai Huisheng

    1990-01-01

    The clinical responses of patients after total body irradiation with γ-rays above the permissible dose were observed and analysed. The results showed that when the dose was above 5 cGy, there was some immunological depression, but no significant change in hematopoietic functions. Five cases showed some transient changes of ECG, perhaps due to vagotonia caused by psychological imbalance. One case vomited 3-4 times after 28 cGy irradiation; this suggested that a few episodes of vomiting have no significance in the estimation of the dose received, and the whole clinical picture must be analysed concretely.

  3. Real-time computation of parameter fitting and image reconstruction using graphical processing units

    Science.gov (United States)

    Locans, Uldis; Adelmann, Andreas; Suter, Andreas; Fischer, Jannis; Lustermann, Werner; Dissertori, Günther; Wang, Qiulin

    2017-06-01

    In recent years graphical processing units (GPUs) have become a powerful tool in scientific computing. Their potential to speed up highly parallel applications brings the power of high performance computing to a wider range of users. However, programming these devices and integrating their use in existing applications is still a challenging task. In this paper we examined the potential of GPUs for two different applications. The first application, created at the Paul Scherrer Institut (PSI), is used for parameter fitting during data analysis of μSR (muon spin rotation, relaxation and resonance) experiments. The second application, developed at ETH, is used for PET (Positron Emission Tomography) image reconstruction and analysis. Applications currently in use were examined to identify parts of the algorithms in need of optimization. Efficient GPU kernels were created in order to allow applications to use a GPU, to speed up the previously identified parts. Benchmarking tests were performed in order to measure the achieved speedup. During this work, we focused on single-GPU systems to show that real-time data analysis of these problems can be achieved without the need for large computing clusters. The results show that the currently used application for parameter fitting, which uses OpenMP to parallelize calculations over multiple CPU cores, can be accelerated around 40 times through the use of a GPU. The speedup may vary depending on the size and complexity of the problem. For PET image analysis, the obtained speedups of the GPU version were more than 40× compared to a single-core CPU implementation. The achieved results show that it is possible to improve the execution time by orders of magnitude.

  4. Study of time-critical diagnostic method for emergency operation of nuclear power plant

    International Nuclear Information System (INIS)

    Gofuku, A.; Yoshikawa, H.; Itoh, K.; Wakabayashi, J.

    1986-01-01

    In order to support the emergency operation of nuclear power plants, a method for a time-critical diagnostic plant analyzer has been investigated. The concept of an emergency operation support center is proposed, and two types of plant analyzer may be installed in this center. One analyzer is a real-time tracking simulation code using the observed signals, and the other is a fast trend-prediction code. A real-time tracking code, TOKRAC, has been developed for analyzing the PWR primary loop thermo-hydraulics during SBLOCA, and the applicability of this code was examined by numerical experiments for the initial-phase transient of both the TMI-2 accident and a 6% cold-leg SBLOCA of a Westinghouse-type PWR plant. The results showed that fairly good tracking was achieved by TOKRAC. The CPU time of TOKRAC was about 12-14 percent of real time.

  5. Time-driven activity based costing of total knee replacement surgery at a London teaching hospital.

    Science.gov (United States)

    Chen, Alvin; Sabharwal, Sanjeeve; Akhtar, Kashif; Makaram, Navnit; Gupte, Chinmay M

    2015-12-01

    The aim of this study was to conduct a time-driven activity-based costing (TDABC) analysis of the clinical pathway for total knee replacement (TKR) and to determine where the major cost drivers lay. The in-patient pathway was prospectively mapped utilising a TDABC model, following 20 TKRs. The mean age for these patients was 73.4 years. All patients were ASA grade I or II and their mean BMI was 30.4. The 14 varus knees had a mean deformity of 5.32° and the six valgus knees had a mean deformity of 10.83°. Timings were prospectively collected as each patient was followed through the TKR pathway. Pre-operative costs, including pre-assessment and joint school, were £ 163. Total staff costs for admission and the operating theatre were £ 658. Consumables costs for the operating theatre were £ 1862. The average length of stay was 5.25 days at a total cost of £ 910. Trust overheads contributed £ 1651. The overall institutional cost of a 'non-complex' TKR in patients without substantial medical co-morbidities was estimated to be £ 5422, representing a profit of £ 1065 based on a best practice tariff of £ 6487. The major cost drivers in the TKR pathway were determined to be theatre consumables, corporate overheads, overall ward cost and operating theatre staffing costs. Appropriate discounting of implant costs, reduction in length of stay by adopting an enhanced recovery programme and control of corporate overheads through the use of elective orthopaedic treatment centres are proposed approaches for reducing the overall cost of treatment. Copyright © 2015 Elsevier B.V. All rights reserved.
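
    The TDABC arithmetic itself is simple: each pathway step consumes staff or facility time, costed at a capacity cost rate (cost per available minute), and the step costs are summed with direct costs such as consumables and overheads. A minimal sketch with purely hypothetical figures (the study's underlying rates are not given in the abstract):

```python
# Each tuple: (pathway step, minutes consumed, capacity cost rate in GBP/min).
# All numbers are hypothetical illustrations, not the study's data.
steps = [
    ("pre-assessment and joint school",   45, 1.10),
    ("admission and theatre staff",      110, 5.20),
    ("recovery room",                     60, 1.80),
    ("ward stay (5.25 days)",           7560, 0.12),
]

staff_and_facility = sum(minutes * rate for _, minutes, rate in steps)
direct_costs = 1862 + 1651          # consumables and overheads, per abstract
total = staff_and_facility + direct_costs
print(f"TDABC pathway cost: £{total:.0f}")
```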

  6. Influence of different maceration time and temperatures on total phenols, colour and sensory properties of Cabernet Sauvignon wines.

    Science.gov (United States)

    Şener, Hasan; Yildirim, Hatice Kalkan

    2013-12-01

    Maceration and fermentation time and temperatures are important factors affecting wine quality. In this study, different maceration times (3 and 6 days) and temperatures (15 °C and 25 °C) during production of red wine (Vitis vinifera L. Cabernet Sauvignon) were investigated. In all wines, standard wine chemical parameters and some specific parameters such as total phenols, tartaric esters, total flavonols and colour parameters (CD, CI, T, dA%, %Y, %R, %B, CIELAB values) were determined. Sensory evaluation was performed by descriptive sensory analysis. The results demonstrated not only the importance of skin contact time and temperature during maceration but also the effects of transition temperatures (different maceration and fermentation temperatures) on wine quality as a whole. The results of the sensory descriptive analyses revealed that temperature significantly affected the aroma and flavour attributes of the wines. The highest scores for 'cassis', 'clove', 'fresh fruity' and 'rose' characters were obtained in wines produced with low-temperature (15 °C) maceration (6 days) and fermentation.

  7. Parallelizing ATLAS Reconstruction and Simulation: Issues and Optimization Solutions for Scaling on Multi- and Many-CPU Platforms

    International Nuclear Information System (INIS)

    Leggett, C; Jackson, K; Tatarkhanov, M; Yao, Y; Binet, S; Levinthal, D

    2011-01-01

    Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance, to producing chip designs with multi- and many-core architectures. Further, the cores themselves can run multiple threads as a zero-overhead context switch allowing low-level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non-uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the ATLAS event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores by means of event-based parallelism and final-stage I/O synchronization. However, initial studies on 8 and 16 core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware-based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which, due to its size, places huge burdens on the memory infrastructure of today's processors.

  8. Brake response time is significantly impaired after total knee arthroplasty: investigation of performing an emergency stop while driving a car.

    Science.gov (United States)

    Jordan, Maurice; Hofmann, Ulf-Krister; Rondak, Ina; Götze, Marco; Kluba, Torsten; Ipach, Ingmar

    2015-09-01

    The objective of this study was to investigate whether total knee arthroplasty (TKA) impairs the ability to perform an emergency stop. An automatic-transmission brake simulator was developed to evaluate total brake response time. A prospective repeated-measures design was used. Forty patients (20 left/20 right) were measured 8 days and 6, 12, and 52 wks after surgery. Eight days postoperatively, total brake response time had increased significantly by 30% in right TKA and insignificantly by 2% in left TKA. Brake force significantly decreased by 35% in right TKA and by 25% in left TKA during this period. Baseline values were reached at week 12 in right TKA; the impairment of outcome measures, however, was no longer significant at week 6 compared with preoperative values. Total brake response time and brake force in left TKA fell below baseline values at weeks 6 and 12. Brake force in left TKA was the only outcome measure significantly impaired 8 days postoperatively. This study highlights that categorical statements cannot be provided. This study's findings on automatic-transmission driving suggest that right TKA patients may resume driving 6 wks postoperatively. Fitness to drive in left TKA is not fully recovered 8 days postoperatively. If testing is not available, patients should refrain from driving until they return from rehabilitation.

  9. Numerical method improvement for a subchannel code

    Energy Technology Data Exchange (ETDEWEB)

    Ding, W.J.; Gou, J.L.; Shan, J.Q. [Xi' an Jiaotong Univ., Shaanxi (China). School of Nuclear Science and Technology

    2016-07-15

    Previous studies showed that subchannel codes spend most of their CPU time solving the matrix formed by the conservation equations. Traditional matrix-solving methods, such as Gaussian elimination and the Gauss-Seidel iteration method, cannot meet the requirement of computational efficiency. Therefore, a new algorithm for solving the block penta-diagonal matrix is designed based on Stone's incomplete LU (ILU) decomposition method. In the new algorithm, the original block penta-diagonal matrix is decomposed into a block upper triangular matrix and a block lower triangular matrix, as well as a small nonzero remainder matrix. After that, the LU algorithm is applied to solve the matrix until convergence. In order to compare the computational efficiency, the newly designed algorithm is applied to the ATHAS code in this paper. The calculation results show that more than 80% of the total CPU time can be saved with the new ILU algorithm for a 324-channel PWR assembly problem, compared with the original ATHAS code.
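
    The general principle can be reproduced with standard tools: factor the sparse matrix incompletely and use the factorization as a preconditioner for an iterative solve. The SciPy sketch below does this for a scalar penta-diagonal test matrix; it illustrates the idea only, since ATHAS uses a custom block penta-diagonal variant of Stone's method rather than SciPy.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu, LinearOperator, gmres

# Build a penta-diagonal test system (main, +/-1 and +/-10 diagonals).
n = 2000
A = sp.diags(
    [np.full(n, 4.0), np.full(n - 1, -1.0), np.full(n - 1, -1.0),
     np.full(n - 10, -0.5), np.full(n - 10, -0.5)],
    [0, 1, -1, 10, -10], format="csc")
b = np.ones(n)

# Incomplete LU factorization: A ~= L*U with limited fill-in, used as a
# preconditioner so the Krylov solver converges in few iterations.
ilu = spilu(A, drop_tol=1e-4, fill_factor=10)
M = LinearOperator(A.shape, matvec=ilu.solve)
x, info = gmres(A, b, M=M)
print(info, np.linalg.norm(A @ x - b))   # info == 0 means converged
```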

  10. Acceleration of PIC simulation with GPU

    International Nuclear Information System (INIS)

    Suzuki, Junya; Shimazu, Hironori; Fukazawa, Keiichiro; Den, Mitsue

    2011-01-01

    Particle-in-cell (PIC) is a simulation technique for plasma physics. The large number of particles in high-resolution plasma simulations increases the volume of computation required, making it vital to increase computation speed. In this study, we attempt to accelerate computation speed on graphics processing units (GPUs) using KEMPO, a PIC simulation code package. We perform two tests for benchmarking, with small and large grid sizes. In these tests, we run the KEMPO1 code using a CPU only, both a CPU and a GPU, and a GPU only. The results showed that performance using only a GPU was twice that of using a CPU alone, while the execution time when using both a CPU and a GPU was comparable to that of the CPU-only tests because of the significant bottleneck in communication between the CPU and GPU. (author)

  11. Real-Time Incompressible Fluid Simulation on the GPU

    Directory of Open Access Journals (Sweden)

    Xiao Nie

    2015-01-01

    We present a parallel framework for simulating incompressible fluids with predictive-corrective incompressible smoothed particle hydrodynamics (PCISPH) on the GPU in real time. To this end, we propose an efficient GPU streaming pipeline to map the entire computational task onto the GPU, fully exploiting the massive computational power of state-of-the-art GPUs. In PCISPH-based simulations, neighbor search is the major performance obstacle because this process is performed several times at each time step. To eliminate this bottleneck, an efficient parallel sorting method for this time-consuming step is introduced. Moreover, we discuss several optimization techniques, including using fast on-chip shared memory to avoid global memory bandwidth limitations and thus further improve performance on modern GPU hardware. With our framework, the realism of real-time fluid simulation is significantly improved, since our method enforces the incompressibility constraint, which is typically ignored for efficiency reasons in previous GPU-based SPH methods. The performance results illustrate that our approach can efficiently simulate realistic incompressible fluid in real time and results in a speed-up factor of up to 23 on a high-end NVIDIA GPU in comparison to a single-threaded CPU-based implementation.
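
    The sorting-based neighbor search referred to above is usually organized as follows: hash each particle to a uniform grid cell, sort particles by cell key (a parallel radix sort on the GPU), and then look up neighbors by scanning only the surrounding cells. A NumPy stand-in for the parallel primitives, with an illustrative spatial hash:

```python
import numpy as np

def build_cell_index(pos, h):
    """Sort particles by grid-cell key so each cell's particles are contiguous.

    pos: (N, 3) particle positions; h: grid spacing (the smoothing length).
    Returns the permutation, the sorted keys, and each cell's start offset.
    """
    cells = np.floor(pos / h).astype(np.int64)
    # Classic spatial hash (Teschner et al.); any injective-enough key works.
    keys = (cells[:, 0] * 73856093) ^ (cells[:, 1] * 19349663) \
         ^ (cells[:, 2] * 83492791)
    order = np.argsort(keys, kind="stable")     # GPU: parallel radix sort
    sorted_keys = keys[order]
    starts = {int(k): i for i, k in enumerate(sorted_keys)
              if i == 0 or sorted_keys[i - 1] != k}
    return order, sorted_keys, starts

# Neighbors of a particle are then gathered by probing the keys of the
# 27 cells around it in `starts` and scanning their contiguous ranges.
```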

  12. Concentration and flux of total and dissolved phosphorus, total nitrogen, chloride, and total suspended solids for monitored tributaries of Lake Champlain, 1990-2012

    Science.gov (United States)

    Medalie, Laura

    2014-01-01

    Annual and daily concentrations and fluxes of total and dissolved phosphorus, total nitrogen, chloride, and total suspended solids were estimated for 18 monitored tributaries to Lake Champlain by using the Weighted Regressions on Time, Discharge, and Season (WRTDS) regression model. Estimates were made for 21 or 23 years, depending on data availability, for the purpose of providing timely and accessible summary reports as stipulated in the 2010 update to the Lake Champlain “Opportunities for Action” management plan. Estimates of concentration and flux were provided for each tributary based on (1) observed daily discharges and (2) a flow-normalizing procedure, which removed the random fluctuations of climate-related variability. The flux bias statistic, an indicator of the ability of the WRTDS regression models to provide accurate representations of flux, showed acceptable bias (less than ±10 percent) for 68 out of 72 models for total and dissolved phosphorus, total nitrogen, and chloride. Six out of 18 models for total suspended solids had moderate bias (between 10 and 30 percent), an expected result given the frequently nonlinear relation between total suspended solids and discharge. One model for total suspended solids with a very high bias was influenced by a single extreme value; however, removal of that value, although reducing the bias substantially, had little effect on annual fluxes.
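
    The flux bias statistic quoted above compares estimated and observed fluxes summed over the days with samples; values within ±10 percent were treated as acceptable and 10-30 percent as moderate. A minimal sketch of the computation (illustrative numbers, not the study's data):

```python
import numpy as np

def flux_bias(estimated, observed):
    """Relative difference of summed fluxes on sampled days."""
    return (np.sum(estimated) - np.sum(observed)) / np.sum(observed)

est = np.array([12.0, 8.5, 30.2, 4.1])   # modeled daily flux, metric tons/day
obs = np.array([11.1, 9.0, 27.8, 4.4])   # flux from measured concentrations
print(f"flux bias = {flux_bias(est, obs):+.1%}")   # +4.8% here: acceptable
```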

  13. Neither pre-operative education or a minimally invasive procedure have any influence on the recovery time after total hip replacement.

    Science.gov (United States)

    Biau, David Jean; Porcher, Raphael; Roren, Alexandra; Babinet, Antoine; Rosencher, Nadia; Chevret, Sylvie; Poiraudeau, Serge; Anract, Philippe

    2015-08-01

    The purpose of this study was to evaluate pre-operative education versus no education, and mini-invasive surgery versus standard surgery, with respect to reaching complete functional independence. We conducted a four-arm randomized controlled trial of 209 patients. The primary outcome criterion was the time to reach complete functional independence. Secondary outcomes included the operative time, the estimated total blood loss, the pain level, the dose of morphine, and the time to discharge. There was no significant effect of either education (HR: 1.1; P = 0.77) or mini-invasive surgery (HR: 1.0; P = 0.96) on the time to reach complete independence. Mini-invasive surgery significantly reduced the total estimated blood loss (P = 0.0035) and the dose of morphine necessary for titration in recovery (P = 0.035). Neither pre-operative education nor mini-invasive surgery reduces the time to reach complete functional independence. Mini-invasive surgery significantly reduces blood loss and the need for morphine.

  14. Stochastic first passage time accelerated with CUDA

    Science.gov (United States)

    Pierro, Vincenzo; Troiano, Luigi; Mejuto, Elena; Filatrella, Giovanni

    2018-05-01

    The time needed to pass a threshold, estimated by numerical integration of stochastic trajectories, is an interesting physical quantity, for instance in Josephson junctions and atomic force microscopy, where the full trajectory is not accessible. We propose an algorithm suitable for efficient implementation on graphics processing units in the CUDA environment. For well-balanced loads, the proposed approach achieves almost perfect scaling with the number of available threads and processors, and allows an acceleration of about 400× with a GTX980 GPU with respect to a standard multicore CPU. This method allows off-the-shelf GPUs to tackle problems that are otherwise prohibitive, such as thermal activation in slowly tilted potentials. In particular, we demonstrate that it is possible to simulate the switching-current distributions of Josephson junctions on the timescale of actual experiments.
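
    A serial NumPy sketch of the underlying computation: Euler-Maruyama integration of many independent overdamped trajectories in a tilted washboard potential, recording first-passage times past a threshold. In the CUDA version each thread owns one trajectory, which is why the scaling is nearly perfect; all parameter values here are illustrative.

```python
import numpy as np

def first_passage_times(n_traj, dt, n_steps, threshold, force, noise, rng):
    """Euler-Maruyama integration of dx = force(x) dt + noise sqrt(dt) xi,
    recording the first time each trajectory crosses the threshold."""
    x = np.zeros(n_traj)
    t_fp = np.full(n_traj, np.nan)            # NaN = threshold not yet passed
    alive = np.ones(n_traj, dtype=bool)
    for step in range(1, n_steps + 1):
        xi = rng.standard_normal(alive.sum())
        x[alive] += force(x[alive]) * dt + noise * np.sqrt(dt) * xi
        crossed = alive & (x >= threshold)
        t_fp[crossed] = step * dt
        alive &= ~crossed
        if not alive.any():
            break
    return t_fp

# Washboard potential U(x) = -F x - cos(x), i.e. force(x) = F - sin(x),
# loosely analogous to a tilted Josephson junction (illustrative numbers).
rng = np.random.default_rng(0)
fpt = first_passage_times(10_000, dt=1e-2, n_steps=500_000, threshold=20.0,
                          force=lambda x: 0.8 - np.sin(x), noise=1.0, rng=rng)
```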

  15. Observed and simulated time evolution of HCl, ClONO2, and HF total column abundances

    Directory of Open Access Journals (Sweden)

    B.-M. Sinnhuber

    2012-04-01

    Full Text Available Time series of total column abundances of hydrogen chloride (HCl), chlorine nitrate (ClONO2), and hydrogen fluoride (HF) were determined from ground-based Fourier transform infrared (FTIR) spectra recorded at 17 sites belonging to the Network for the Detection of Atmospheric Composition Change (NDACC) and located between 80.05° N and 77.82° S. By providing such a near-global overview of ground-based measurements of the two major stratospheric chlorine reservoir species, HCl and ClONO2, the present study is able to confirm the decrease of the atmospheric inorganic chlorine abundance during the last few years. This decrease is expected following the 1987 Montreal Protocol and its amendments and adjustments, where restrictions and a subsequent phase-out of the prominent anthropogenic chlorine source gases (solvents, chlorofluorocarbons) were agreed upon to enable a stabilisation and recovery of the stratospheric ozone layer. The atmospheric fluorine content is expected to be influenced by the Montreal Protocol, too, because most of the banned anthropogenic gases also represent important fluorine sources. However, many of the substitutes for the banned gases also contain fluorine, so the HF total column abundance is expected to have continued to increase during the last few years. The measurements are compared with calculations from five different models: the two-dimensional Bremen model, the two chemistry-transport models KASIMA and SLIMCAT, and the two chemistry-climate models EMAC and SOCOL. Thereby, the ability of the models to reproduce the absolute total column amounts, the seasonal cycles, and the temporal evolution found in the FTIR measurements is investigated and inter-compared. This is especially interesting because the models have different architectures. The overall agreement between the measurements and models for the total column abundances and the seasonal cycles is good. Linear trends of HCl, ClONO2, and HF are calculated from both

  16. Feasibility study of helical tomotherapy for total body or total marrow irradiation

    International Nuclear Information System (INIS)

    Hui, Susanta K.; Kapatoes, Jeff; Fowler, Jack; Henderson, Douglas; Olivera, Gustavo; Manon, Rafael R.; Gerbi, Bruce; Mackie, T. R.; Welsh, James S.

    2005-01-01

    Total body irradiation (TBI) has been used for many years as a preconditioning agent before bone marrow transplantation, but many side effects still plague its use. We investigated the planning and delivery of TBI and selective total marrow irradiation (TMI) with a reduced radiation dose to sensitive structures using image-guided helical tomotherapy. To assess the feasibility of using helical tomotherapy, (A) we studied variations in pitch, field width, and modulation factor for total body and total marrow helical tomotherapy treatments; we varied these parameters to provide a uniform dose along with treatment times similar to conventional TBI (15-30 min). (B) We also investigated limited (head, chest, and pelvis) megavoltage CT (MVCT) scanning for pretreatment setup verification, rather than total body MVCT scanning, to shorten the overall treatment time per fraction. (C) We placed thermoluminescent detectors (TLDs) inside a Rando phantom to measure the dose at seven anatomical sites, including the lungs. A simulated TBI treatment showed homogeneous dose coverage (±10%) of the whole body. Doses to the sensitive organs were reduced by 35%-70% of the target dose. TLD measurements on the Rando phantom showed accurate dose delivery (±7%) to the target and critical organs. In the TMI study, the dose was delivered conformally to the bone marrow only. The TBI and TMI treatment delivery time was reduced by 50% by increasing the field width from 2.5 to 5.0 cm in the inferior-superior direction. A limited MVCT reduced the target localization time by 60% compared to whole-body MVCT. MVCT image-guided helical tomotherapy offers a novel method to deliver a precise, homogeneous radiation dose to the whole-body target while significantly reducing the dose to all critical organs. A judicious selection of pitch, modulation factor, and field size is required to produce a homogeneous dose distribution along with an acceptable treatment time. In

  17. A two-level real-time vision machine combining coarse and fine grained parallelism

    DEFF Research Database (Denmark)

    Jensen, Lars Baunegaard With; Kjær-Nielsen, Anders; Pauwels, Karl

    2010-01-01

    In this paper, we describe a real-time vision machine having a stereo camera as input, generating visual information on two different levels of abstraction. The system provides visual low-level and mid-level information in terms of dense stereo and optical flow, egomotion, indicating areas...... a factor of 90 and a reduction in latency of a factor of 26 compared to processing on a single CPU core. Since the vision machine provides generic visual information, it can be used in many contexts. Currently it is used in a driver assistance context as well as in two robotic applications....

  18. A heterogeneous system based on GPU and multi-core CPU for real-time fluid and rigid body simulation

    Science.gov (United States)

    da Silva Junior, José Ricardo; Gonzalez Clua, Esteban W.; Montenegro, Anselmo; Lage, Marcos; Dreux, Marcelo de Andrade; Joselli, Mark; Pagliosa, Paulo A.; Kuryla, Christine Lucille

    2012-03-01

    Computational fluid dynamics simulation has become an important field not only in physics and engineering but also in computer graphics, virtual reality and even video game development. Many efficient models have been developed over the years, but when many contact interactions must be processed, most models present difficulties or cannot achieve real-time results. The advent of parallel computing has enabled the development of many strategies for accelerating the simulations. Our work proposes a new system which uses some successful algorithms already proposed, as well as a data structure organisation based on a heterogeneous architecture using CPUs and GPUs, in order to simulate the interaction of fluids and rigid bodies. This successfully results in a two-way interaction between them and their surrounding objects. As far as we know, this is the first work that presents a computational collaborative environment which makes use of two different paradigms of hardware architecture for this specific kind of problem. Since our method achieves real-time results, it is suitable for virtual reality, simulation and video game fluid simulation problems.

  19. Billing the CPU Time Used by System Components on Behalf of VMs

    OpenAIRE

    Djomgwe Teabe , Boris; Tchana , Alain-Bouzaïde; Hagimont , Daniel

    2016-01-01

    International audience; Nowadays, virtualization is present in almost all cloud infrastructures. In virtualized cloud, virtual machines (VMs) are the basis for allocating resources. A VM is launched with a fixed allocated computing capacity that should be strictly provided by the hosting system scheduler. Unfortunately, this allocated capacity is not always respected, due to mechanisms provided by the virtual machine monitoring system (also known as hypervisor). For instance, we observe that ...

  1. Integer batch scheduling problems for a single-machine with simultaneous effect of learning and forgetting to minimize total actual flow time

    Directory of Open Access Journals (Sweden)

    Rinto Yusriski

    2015-09-01

    Full Text Available This research discusses integer batch scheduling problems for a single machine with position-dependent batch processing times due to the simultaneous effect of learning and forgetting. The decision variables are the number of batches, the batch sizes, and the sequence of the resulting batches. The objective is to minimize total actual flow time, defined as the total interval time between the arrival times of parts in all respective batches and their common due date. Two algorithms are proposed to solve the problems. The first is developed using the Integer Composition method and produces an optimal solution. Since the first algorithm solves the problems with a worst-case time complexity of O(n·2^(n-1)), this research proposes a second, heuristic algorithm based on the Lagrange Relaxation method. Numerical experiments show that the heuristic algorithm gives outstanding results.
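
    To make the Integer Composition idea concrete, the sketch below enumerates every ordered batch-size composition of n jobs and scores each with a caller-supplied objective; the paper's actual flow-time formula and learning/forgetting model are not reproduced here, so the objective is a placeholder.

```python
from itertools import combinations

def compositions(n):
    """All integer compositions of n: every ordered tuple of positive batch
    sizes summing to n. There are 2**(n-1) of them, which is where the
    O(n * 2**(n-1)) worst case quoted above comes from."""
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            yield tuple(b - a for a, b in zip(bounds, bounds[1:]))

def best_batching(n, total_actual_flow_time):
    """Exhaustive search over batch compositions; the objective function
    (e.g. the paper's learning/forgetting model) is supplied by the caller."""
    return min(compositions(n), key=total_actual_flow_time)
```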

  2. Clinical implementation of a GPU-based simplified Monte Carlo method for a treatment planning system of proton beam therapy

    International Nuclear Information System (INIS)

    Kohno, R; Hotta, K; Nishioka, S; Matsubara, K; Tansho, R; Suzuki, T

    2011-01-01

    We implemented the simplified Monte Carlo (SMC) method on a graphics processing unit (GPU) architecture under the Compute Unified Device Architecture (CUDA) platform developed by NVIDIA. The GPU-based SMC was clinically applied to four patients with head and neck, lung, or prostate cancer. The results were compared to those obtained by a traditional CPU-based SMC with respect to computation time and discrepancy. In the CPU- and GPU-based SMC calculations, the estimated mean statistical errors of the calculated doses in the planning target volume region were within 0.5% rms. The dose distributions calculated by the GPU- and CPU-based SMCs were similar, within statistical errors. The GPU-based SMC showed 12.30-16.00 times faster performance than the CPU-based SMC. The computation time per beam arrangement using the GPU-based SMC for the clinical cases ranged from 9 to 67 s. The results demonstrate the successful application of the GPU-based SMC to clinical proton treatment planning. (note)

  3. Criterion-based laparoscopic training reduces total training time

    NARCIS (Netherlands)

    Brinkman, W.M.; Buzink, S.N.; Alevizos, L.; De Hingh, I.H.J.T.; Jakimowicz, J.J.

    2011-01-01

    The benefits of criterion-based laparoscopic training over time-oriented training are unclear. The purpose of this study is to compare these types of training based on training outcome and time efficiency. Methods During four training sessions within 1 week (one session per day) 34 medical interns

  4. Real-time image registration and fusion in a FPGA architecture (Ad-FIRE)

    Science.gov (United States)

    Waters, T.; Swan, L.; Rickman, R.

    2011-06-01

    Real-time Image Registration is a key processing requirement of Waterfall Solutions' image fusion system, Ad-FIRE, which combines the attributes of high resolution visible imagery with the spectral response of low resolution thermal sensors in a single composite image. Implementing image fusion at video frame rates typically requires a high bandwidth video processing capability which, within a standard CPU-type processing architecture, necessitates bulky, high power components. Field Programmable Gate Arrays (FPGAs) offer the prospect of low power/heat dissipation combined with highly efficient processing architectures for use in portable, battery-powered, passively cooled applications, such as Waterfall Solutions' hand-held or helmet-mounted Ad-FIRE system.

  5. Decay-usage scheduling in multiprocessors

    NARCIS (Netherlands)

    Epema, D.H.J.

    1998-01-01

    Decay-usage scheduling is a priority-aging time-sharing scheduling policy capable of dealing with a workload of both interactive and batch jobs by decreasing the priority of a job when it acquires CPU time, and by increasing its priority when it does not use the CPU. In this article we deal with
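
    A toy illustration of the policy just described, assuming the classic UNIX-style rule in which a job's effective priority value grows with its accumulated CPU usage and the usage decays geometrically each period; the parameters and the exact decay rule are illustrative, not Epema's model.

```python
def decay_usage_schedule(jobs, ticks, decay=0.5, period=10):
    """Toy decay-usage scheduler: each tick, run the job with the lowest
    priority value (base + usage / 2); every `period` ticks, decay the
    accumulated CPU usage so jobs that waited regain priority."""
    base = dict(jobs)                        # jobs: name -> base priority
    usage = {j: 0.0 for j in jobs}
    trace = []
    for t in range(ticks):
        running = min(jobs, key=lambda j: base[j] + usage[j] / 2)
        usage[running] += 1.0                # it consumed this CPU tick
        trace.append(running)
        if (t + 1) % period == 0:
            for j in usage:                  # periodic geometric decay
                usage[j] *= decay
    return trace

# Two equal-priority jobs end up sharing the CPU tick by tick.
print(decay_usage_schedule({"interactive": 0, "batch": 0}, ticks=20))
```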

  6. Suppression of numerical dispersion using FD modified operators; Atarashii sabunho no enzanshi wo mochiita suchi bunsan no yokusei

    Energy Technology Data Exchange (ETDEWEB)

    Takeuchi, N; Geller, R [The University of Tokyo, Tokyo (Japan). Faculty of Science

    1996-05-01

    The authors have developed a formal theory for evaluating errors in numerical solutions and, on the basis of this theory, derived the conditions that an error-minimizing modified operator should satisfy. Using this error evaluation theory, a modified operator was derived for a finite-difference calculation in the time domain. In this study, a modified operator was derived for O(2,2) finite differencing in time, and the old and new methods were used in calculations on one-dimensional inhomogeneous media and compared quantitatively in CPU time and calculation accuracy. The calculation used 500 space grids and 5000 time grids. With the ratio of the time-grid spacing to the space-grid spacing kept constant, both CPU time and calculation accuracy were proportional to the square of the number of grids. The results show that the new method needs only approximately 1/20 of the CPU time of the old method for calculations of the same precision, and that it achieves approximately 20 times higher accuracy in the same CPU time. 4 refs., 2 figs., 1 tab.

  7. Impact of operative time on early joint infection and deep vein thrombosis in primary total hip arthroplasty.

    Science.gov (United States)

    Wills, B W; Sheppard, E D; Smith, W R; Staggers, J R; Li, P; Shah, A; Lee, S R; Naranje, S M

    2018-03-22

    Infections and deep vein thrombosis (DVT) after total hip arthroplasty (THA) are challenging problems for both the patient and the surgeon. Previous studies have identified numerous risk factors for infections and DVT after THA but have often been limited by sample size. We aimed to evaluate the effect of operative time on early postoperative infection as well as DVT rates following THA. We hypothesized that an increase in operative time would result in increased odds of acquiring an infection as well as a DVT. We conducted a retrospective analysis of prospectively collected data using the American College of Surgeons National Surgical Quality Improvement Program (NSQIP) database from 2006 to 2015 for all patients undergoing primary THA. Associations between operative time and infection or DVT were evaluated with multivariable logistic regressions controlling for demographics and several known risk factors for infection. Three different types of infections were evaluated: (1) superficial surgical site infection (SSI), an infection involving the skin or subcutaneous tissue; (2) deep SSI, an infection involving the muscle or fascial layers beneath the subcutaneous tissue; and (3) organ/space infection, an infection involving any part of the anatomy manipulated during surgery other than the incisional components. In total, 103,044 patients who underwent THA were included in our study. Our results suggested a significant association between superficial SSIs and operative time. Specifically, the adjusted odds of suffering a superficial SSI increased by 6% (CI=1.04-1.08) for every 10-minute increase in operative time. When using dichotomized operative time (>90 minutes), the adjusted odds of suffering a superficial SSI were 56% higher for patients with prolonged operative time (CI=1.05-2.32, p=0.0277). The adjusted odds of suffering a deep SSI increased by 7% for every 10-minute increase in operative time (CI=1.01-1.14, p=0.0335). No significant associations were detected between organ/space infection, wound

  8. Minimizing total weighted tardiness for the single machine scheduling problem with dependent setup time and precedence constraints

    Directory of Open Access Journals (Sweden)

    Hamidreza Haddad

    2012-04-01

    Full Text Available This paper tackles the single machine scheduling problem with dependent setup times and precedence constraints. The primary objective is the minimization of total weighted tardiness. Since the resulting problem is NP-hard, we use a metaheuristic to solve it. The proposed method uses a genetic algorithm to solve the problem in a reasonable amount of time. Because of the high sensitivity of the GA to its initial parameter values, a Taguchi approach is presented to calibrate its parameters. Computational experiments validate the effectiveness and capability of the proposed method.
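
    A compact sketch of the kind of method described above: a permutation-encoded genetic algorithm with order crossover and swap mutation for single-machine total weighted tardiness, here assuming sequence-dependent setup times given as a matrix. The precedence constraints and the Taguchi calibration of the paper are omitted, and all parameter values are illustrative.

```python
import random

def twt(seq, p, d, w, setup):
    """Total weighted tardiness of a job sequence: processing times p,
    due dates d, weights w, setup[i][j] = setup time between jobs i and j."""
    t, prev, cost = 0, None, 0
    for j in seq:
        t += (setup[prev][j] if prev is not None else 0) + p[j]
        cost += w[j] * max(0, t - d[j])
        prev = j
    return cost

def order_crossover(a, b):
    """OX: copy a random slice of parent a, fill the rest in b's order."""
    n = len(a)
    i, j = sorted(random.sample(range(n), 2))
    hole = set(a[i:j])
    rest = [x for x in b if x not in hole]
    return rest[:i] + a[i:j] + rest[i:]

def ga(p, d, w, setup, pop_size=50, gens=200, pm=0.2):
    n = len(p)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda s: twt(s, p, d, w, setup))
        elite = pop[: pop_size // 5]         # keep the best 20%
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = random.sample(elite, 2)
            c = order_crossover(a, b)
            if random.random() < pm:         # swap mutation
                i, j = random.sample(range(n), 2)
                c[i], c[j] = c[j], c[i]
            children.append(c)
        pop = elite + children
    return min(pop, key=lambda s: twt(s, p, d, w, setup))
```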

  9. Objectively measured physical environmental neighbourhood factors are not associated with accelerometer-determined total sedentary time in adults

    OpenAIRE

    Compernolle, Sofie; De Cocker, Katrien; Mackenbach, Joreintje D.; Van Nassau, Femke; Lakerveld, Jeroen; Cardon, Greet; De Bourdeaudhuij, Ilse

    2017-01-01

    Background: The physical neighbourhood environment may influence adults' sedentary behaviour. Yet, most studies examining the association between the physical neighbourhood environment and sedentary behaviour rely on self-reported data of either the physical neighbourhood environment and/or sedentary behaviour. The aim of this study was to investigate the associations between objectively measured physical environmental neighbourhood factors and accelerometer-determined total sedentary time in...

  10. Extending DIII-D Neutral Beam Modulated Operations with a Camac Based Total on Time Interlock

    International Nuclear Information System (INIS)

    Baggest, D.S.; Broesch, J.D.; Phillips, J.C.

    1999-01-01

    A new total-on-time interlock has increased the operational time limits of the Neutral Beam systems at DIII-D. The interlock, called the Neutral Beam On-Time-Limiter (NBOTL), is a custom built CAMAC module utilizing a Xilinx 9572 Complex Programmable Logic Device (CPLD) as its primary circuit. The Neutral Beam Injection Systems are the primary source of auxiliary heating for DIII-D plasma discharges and contain eight sources capable of delivering 20MW of power. The delivered power is typically limited to 3.5 s per source to protect beam-line components, while a DIII-D plasma discharge usually exceeds 5 s. Implemented as a hardware interlock within the neutral beam power supplies, the NBOTL limits the beam injection time. With a continuing emphasis on modulated beam injections, the NBOTL guards against command faults and allows the beam injection to be safely spread over a longer plasma discharge time. The NBOTL design is an example of incorporating modern circuit design techniques (CPLD) within an established format (CAMAC). The CPLD is the heart of the NBOTL and contains 90% of the circuitry, including a loadable, 1 MHz, 28 bit, BCD count down timer, buffers, and CAMAC communication circuitry. This paper discusses the circuit design and implementation. Of particular interest is the melding of flexible modern programmable logic devices with the CAMAC format

  11. A new system of computer-assisted navigation leading to reduction in operating time in uncemented total hip replacement in a matched population.

    Science.gov (United States)

    Chaudhry, Fouad A; Ismail, Sanaa Z; Davis, Edward T

    2018-05-01

    Computer-assisted navigation techniques are used to optimise component placement and alignment in total hip replacement (THR). The technology has developed over the last 10 years, but despite its advantages only 0.3% of all total hip replacements in England and Wales are done using computer navigation. One of the reasons for this is that computer-assisted technology increases operative time. A new method of pelvic registration has been developed without the need to register the anterior pelvic plane (BrainLab hip 6.0), which has been shown to improve the accuracy of THR. The purpose of this study was to find out whether the new method reduces the operating time. This was a retrospective analysis comparing operating time in computer-navigated primary uncemented total hip replacement using two methods of registration. Group 1 included 128 cases performed using BrainLab versions 2.1-5.1; these versions relied on the acquisition of the anterior pelvic plane for registration. Group 2 included 128 cases performed using the newest navigation software, BrainLab hip 6.0 (registration possible with the patient in the lateral decubitus position). The operating time was 65.79 (40-98) minutes using the old method of registration and 50.87 (33-74) minutes using the new method. This difference was statistically significant. The body mass index (BMI) was comparable in both groups. The study supports the use of the new method of registration to improve operating time in computer-navigated primary uncemented total hip replacements.

  12. The FAIR timing master: a discussion of performance requirements and architectures for a high-precision timing system

    International Nuclear Information System (INIS)

    Kreider, M.

    2012-01-01

    Production chains in a particle accelerator are complex structures with many inter-dependencies and multiple paths to consider. This ranges from system initialization and synchronization of numerous machines to interlock handling and appropriate contingency measures like beam dump scenarios. The FAIR facility will employ White Rabbit, a time-based system which delivers an instruction and a corresponding execution time to a machine. In order to meet the deadlines in any given production chain, instructions need to be sent out ahead of time. For this purpose, code execution and message delivery times need to be known in advance. The FAIR Timing Master needs to be reliably capable of satisfying these timing requirements as well as being fault tolerant. Event sequences of recorded production chains indicate that low reaction times to internal and external events and fast, parallel execution are required. This suggests a slim architecture, especially devised for this purpose. Using the thread model of an OS or other high-level programs on a generic CPU would be counterproductive when trying to achieve deterministic processing times. This paper deals with the analysis of said requirements, as well as a comparison of known processor and virtual machine architectures and the possibilities of parallelization in programmable hardware. In addition, existing proposals at GSI will be checked against these findings. The final goal will be to determine the best instruction set for modeling any given production chain and devising a suitable architecture to execute these models. (authors)

  13. Toward real-time diffuse optical tomography: accelerating light propagation modeling employing parallel computing on GPU and CPU

    Science.gov (United States)

    Doulgerakis, Matthaios; Eggebrecht, Adam; Wojtkiewicz, Stanislaw; Culver, Joseph; Dehghani, Hamid

    2017-12-01

    Parameter recovery in diffuse optical tomography is a computationally expensive algorithm, especially when used for large and complex volumes, as in the case of human brain functional imaging. The modeling of light propagation, also known as the forward problem, is the computational bottleneck of the recovery algorithm, whereby the lack of a real-time solution is impeding practical and clinical applications. The objective of this work is the acceleration of the forward model, within a diffusion approximation-based finite-element modeling framework, employing parallelization to expedite the calculation of light propagation in realistic adult head models. The proposed methodology is applicable for modeling both continuous wave and frequency-domain systems with the results demonstrating a 10-fold speed increase when GPU architectures are available, while maintaining high accuracy. It is shown that, for a very high-resolution finite-element model of the adult human head with ∼600,000 nodes, consisting of heterogeneous layers, light propagation can be calculated at ∼0.25 s/excitation source.

  14. Wait time management strategies for total joint replacement surgery: sustainability and unintended consequences.

    Science.gov (United States)

    Pomey, Marie-Pascale; Clavel, Nathalie; Amar, Claudia; Sabogale-Olarte, Juan Carlos; Sanmartin, Claudia; De Coster, Carolyn; Noseworthy, Tom

    2017-09-07

    In Canada, long waiting times for core specialized services have consistently been identified as a key barrier to access. Governments and organizations have responded with strategies for better access management, notably for total joint replacement (TJR) of the hip and knee. While wait time management strategies (WTMS) are promising, the factors which influence their sustainable implementation at the organizational level are understudied. Consequently, this study examined organizational and systemic factors that made it possible to sustain waiting times for TJR within federally established limits for at least 18 months. The research design is a multiple case study of WTMS implementation. Five cases were selected across five Canadian provinces. Three success levels were pre-defined: (1) the WTMS maintained compliance with requirements for more than 18 months; (2) the WTMS met requirements for 18 months but could not sustain the level thereafter; (3) the WTMS never met requirements. For each case, we collected documents and interviewed key informants. We analyzed systemic and organizational factors, with particular attention to governance and leadership, culture, resources, methods, and tools. We found that successful organizations had specific characteristics: (1) management of the whole care continuum; (2) strong clinical leadership; (3) dedicated committees to coordinate and sustain the strategy; (4) a culture based on trust and innovation. All strategies led to relatively similar unintended consequences. The main negative consequence was an initial increase in waiting times for TJR, and the main positive consequence was operational enhancement of other areas of specialization based on the TJR model. This study highlights important differences in factors which help to achieve and sustain waiting times. To be sustainable, a WTMS needs to generate greater synergies between contextual-level strategy (provincial or regional) and organizational objectives and

  15. IMU-based Real-time Pose Measurement system for Anterior Pelvic Plane in Total Hip Replacement Surgeries.

    Science.gov (United States)

    Zhe Cao; Shaojie Su; Hao Tang; Yixin Zhou; Zhihua Wang; Hong Chen

    2017-07-01

    With the aging of the population, the number of total hip replacement (THR) surgeries increases year by year. In THR, inaccurate positioning of the implanted prosthesis may lead to failure of the operation. In order to reduce the failure rate and acquire the real-time pose of the anterior pelvic plane (APP), we propose a measurement system in this paper. The measurement system includes two parts: an Initial Pose Measurement Instrument (IPMI) and a Real-time Pose Measurement Instrument (RPMI). IPMI is used to acquire the initial pose of the APP, and RPMI is used to estimate the real-time pose of the APP. Both are composed of an Inertial Measurement Unit (IMU) and magnetometer sensors. To estimate the attitude of the measurement system, the Extended Kalman Filter (EKF) is adopted in this paper. The real-time pose of the APP can be acquired together with the algorithm designed in the paper. The experimental results show that the Root Mean Square Error (RMSE) is within 1.6 degrees, which meets the requirement of THR operations.

  17. Minimizing the Total Service Time of Discrete Dynamic Berth Allocation Problem by an Iterated Greedy Heuristic

    Directory of Open Access Journals (Sweden)

    Shih-Wei Lin

    2014-01-01

    Full Text Available Berth allocation is the forefront operation performed when ships arrive at a port and is a critical task in container port optimization. Minimizing the time ships spend at berths constitutes an important objective of berth allocation problems. This study focuses on the discrete dynamic berth allocation problem (discrete DBAP), which aims to minimize total service time, and proposes an iterated greedy (IG) algorithm to solve it. The proposed IG algorithm is tested on three benchmark problem sets. Experimental results show that it obtains optimal solutions for all test instances of the first and second problem sets and outperforms the best-known solutions for 35 out of 90 test instances of the third problem set.
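
    The abstract does not spell out the IG operators, so the sketch below uses the canonical iterated-greedy loop (destroy a few assignments, rebuild by best-insertion, keep the better solution) on a simplified discrete DBAP in which handle[s][b] is the handling time of ship s at berth b; the modelling details are assumptions, not the paper's exact formulation.

```python
import random

def total_service_time(assign, arrival, handle):
    """assign: berth -> ordered list of ships. A ship's service time is
    (completion - arrival); handling starts when ship and berth are ready."""
    total = 0.0
    for berth, ships in assign.items():
        free = 0.0
        for s in ships:
            start = max(free, arrival[s])
            free = start + handle[s][berth]
            total += free - arrival[s]
    return total

def greedy_insert(assign, ship, arrival, handle):
    """Insert ship at the berth/position with the smallest resulting cost."""
    best = None
    for berth, ships in assign.items():
        for pos in range(len(ships) + 1):
            ships.insert(pos, ship)
            cost = total_service_time(assign, arrival, handle)
            ships.pop(pos)
            if best is None or cost < best[0]:
                best = (cost, berth, pos)
    _, berth, pos = best
    assign[berth].insert(pos, ship)

def iterated_greedy(n_ships, berths, arrival, handle, iters=200, d=2):
    assign = {b: [] for b in berths}
    for s in range(n_ships):                       # initial greedy solution
        greedy_insert(assign, s, arrival, handle)
    best = {b: list(v) for b, v in assign.items()}
    for _ in range(iters):
        removed = set(random.sample(range(n_ships), d))   # destruction
        for b in assign:
            assign[b] = [s for s in assign[b] if s not in removed]
        for s in removed:                                 # reconstruction
            greedy_insert(assign, s, arrival, handle)
        if total_service_time(assign, arrival, handle) <= \
           total_service_time(best, arrival, handle):
            best = {b: list(v) for b, v in assign.items()}
        else:                                             # revert to best
            assign = {b: list(v) for b, v in best.items()}
    return best
```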

  18. A real time multi-server multi-client coherent database for a new high voltage system

    International Nuclear Information System (INIS)

    Gorbics, M.; Green, M.

    1995-01-01

    A high voltage system has been designed to allow multiple users (clients) access to a database of measured values and settings. This database is actively maintained in real time for a given mainframe containing multiple modules, each having its own database. With limited CPU and memory resources, the mainframe system provides a data coherency scheme for multiple clients which (1) allows a client to determine when and what values need to be updated, (2) allows changes from one client to be detected by another client, and (3) does not depend on the mainframe system tracking client accesses

  19. Predictions of first passage times in sparse discrete fracture networks using graph-based reductions

    Science.gov (United States)

    Hyman, J.; Hagberg, A.; Srinivasan, G.; Mohd-Yusof, J.; Viswanathan, H. S.

    2017-12-01

    We present a graph-based methodology to reduce the computational cost of obtaining first passage times through sparse fracture networks. We derive graph representations of generic three-dimensional discrete fracture networks (DFNs) using the DFN topology and flow boundary conditions. Subgraphs corresponding to the union of the k shortest paths between the inflow and outflow boundaries are identified and transport on their equivalent subnetworks is compared to transport through the full network. The number of paths included in the subgraphs is based on the scaling behavior of the number of edges in the graph with the number of shortest paths. First passage times through the subnetworks are in good agreement with those obtained in the full network, both for individual realizations and in distribution. Accurate estimates of first passage times are obtained with an order of magnitude reduction of CPU time and mesh size using the proposed method.
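
    The reduction lends itself to a very small sketch with networkx, shown here on a generic weighted graph (the paper derives its graphs and weights from the DFN topology and flow boundary conditions); transport would then be solved on the subnetwork corresponding to the returned subgraph.

```python
import itertools
import networkx as nx

def k_shortest_path_subgraph(G, source, target, k):
    """Union of the k shortest simple paths between the inflow (source)
    and outflow (target) boundaries, returned as an edge subgraph that
    preserves the original edge attributes."""
    paths = itertools.islice(
        nx.shortest_simple_paths(G, source, target, weight="weight"), k)
    edges = set()
    for path in paths:
        edges.update(zip(path, path[1:]))
    return G.edge_subgraph(edges)
```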

  20. Decreasing Postanesthesia Care Unit to Floor Transfer Times to Facilitate Short Stay Total Joint Replacements.

    Science.gov (United States)

    Sibia, Udai S; Grover, Jennifer; Turcotte, Justin J; Seanger, Michelle L; England, Kimberly A; King, Jennifer L; King, Paul J

    2018-04-01

    We describe a process for studying and improving baseline postanesthesia care unit (PACU)-to-floor transfer times after total joint replacements, a quality improvement project using lean methodology. Phase I of the investigational process involved collection of baseline data, Phase II involved developing targeted solutions to improve throughput, and Phase III measured project sustainability. Phase I investigations revealed that patients spent an additional 62 minutes waiting in the PACU after being designated ready for transfer. Five to 16 telephone calls were needed between the PACU and the unit to facilitate each patient transfer. The most common reason for delay was unavailability of the unit nurse, who was attending to another patient (58%). Phase II interventions resulted in transfer times decreasing to 13 minutes (a 79% reduction), an approach that may be applicable to care at other institutions.

  1. FAST: a three-dimensional time-dependent FEL simulation code

    International Nuclear Information System (INIS)

    Saldin, E.L.; Schneidmiller, E.A.; Yurkov, M.V.

    1999-01-01

    In this report we briefly describe the three-dimensional, time-dependent FEL simulation code FAST. The equations of motion of the particles and Maxwell's equations are solved simultaneously taking into account the slippage effect. Radiation fields are calculated using an integral solution of Maxwell's equations. A special technique has been developed for fast calculations of the radiation field, drastically reducing the required CPU time. As a result, the developed code allows one to use a personal computer for time-dependent simulations. The code allows one to simulate the radiation from the electron bunch of any transverse and longitudinal bunch shape; to simulate simultaneously an external seed with superimposed noise in the electron beam; to take into account energy spread in the electron beam and the space charge fields; and to simulate a high-gain, high-efficiency FEL amplifier with a tapered undulator. It is important to note that there are no significant memory limitations in the developed code and an electron bunch of any length can be simulated

  2. Real-Time Model and Simulation Architecture for Half- and Full-Bridge Modular Multilevel Converters

    Science.gov (United States)

    Ashourloo, Mojtaba

    This work presents an equivalent model and simulation architecture for real-time electromagnetic transient analysis of either half-bridge or full-bridge modular multilevel converters (MMCs) with 400 sub-modules (SMs) per arm. The proposed CPU/FPGA-based architecture is optimized for the parallel implementation of the presented MMC model on the FPGA and benefits from a high-throughput floating-point computational engine. The developed real-time simulation architecture is capable of simulating MMCs with 400 SMs per arm at a time step of 825 nanoseconds. To address the difficulties of implementing the sorting process, a modified Odd-Even Bubble sort is presented in this work. The comparison of results under various test scenarios reveals that the proposed real-time simulator reproduces the system responses of its corresponding off-line counterpart obtained from the PSCAD/EMTDC program.
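
    For reference, the unmodified odd-even (bubble) transposition sort looks as follows; every compare-exchange within a phase is independent of the others, which is what makes the algorithm attractive for parallel FPGA implementation. The paper's modified variant is not reproduced here.

```python
def odd_even_sort(a):
    """Odd-even transposition sort: n alternating phases of independent
    compare-exchange pairs; each phase can run fully in parallel."""
    a = list(a)
    n = len(a)
    for phase in range(n):
        start = phase % 2                  # even phases: pairs (0,1),(2,3)...
        for i in range(start, n - 1, 2):   # odd phases: pairs (1,2),(3,4)...
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_sort([5, 1, 4, 2, 3]))      # -> [1, 2, 3, 4, 5]
```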

  3. Criterion-based laparoscopic training reduces total training time

    OpenAIRE

    Brinkman, Willem M.; Buzink, Sonja N.; Alevizos, Leonidas; de Hingh, Ignace H. J. T.; Jakimowicz, Jack J.

    2011-01-01

    Introduction The benefits of criterion-based laparoscopic training over time-oriented training are unclear. The purpose of this study is to compare these types of training based on training outcome and time efficiency. Methods During four training sessions within 1 week (one session per day) 34 medical interns (no laparoscopic experience) practiced on two basic tasks on the Simbionix LAP Mentor virtual-reality (VR) simulator: ‘clipping and grasping’ and ‘cutting’. Group C (criterion-based) (N...

  4. Automatic Optimization for Large-Scale Real-Time Coastal Water Simulation

    Directory of Open Access Journals (Sweden)

    Shunli Wang

    2016-01-01

    Full Text Available We introduce an automatic optimization approach for the simulation of large-scale coastal water. To solve the singularity problem of water waves obtained with the traditional model, a hybrid deep-shallow-water model is estimated by using an automatic coupling algorithm. It can handle arbitrary water depth and different underwater terrain. As a characteristic feature of coastal terrain, the coastline is detected with collision detection technology. Then, unnecessary water grid cells are simplified by the automatic simplification algorithm according to the depth. Finally, the model is calculated on the Central Processing Unit (CPU) and the simulation is implemented on the Graphics Processing Unit (GPU). We show the effectiveness of our method with various results which achieve real-time rendering on a consumer-level computer.

  5. Real-time Tsunami Inundation Prediction Using High Performance Computers

    Science.gov (United States)

    Oishi, Y.; Imamura, F.; Sugawara, D.

    2014-12-01

    Recently, off-shore tsunami observation stations based on cabled ocean-bottom pressure gauges are actively being deployed, especially in Japan. These cabled systems are designed to provide real-time tsunami data before tsunamis reach coastlines for disaster mitigation purposes. To realize the benefits of these observations, real-time analysis techniques that make effective use of the data are necessary. A representative study by Tsushima et al. (2009) proposed a method to provide instant tsunami source prediction based on acquired tsunami waveform data. As time passes, the prediction is improved by using updated waveform data. After a tsunami source is predicted, tsunami waveforms are synthesized from pre-computed tsunami Green functions of linear long wave equations. Tsushima et al. (2014) updated the method by combining the tsunami waveform inversion with an instant inversion of coseismic crustal deformation and improved the prediction accuracy and speed in the early stages. For disaster mitigation purposes, real-time predictions of tsunami inundation are also important. In this study, we discuss the possibility of real-time tsunami inundation predictions, which require faster-than-real-time tsunami inundation simulation in addition to instant tsunami source analysis. Although the computational amount needed to solve the non-linear shallow water equations for inundation predictions is large, it has become executable through recent developments in high-performance computing technologies. We conducted parallel computations of tsunami inundation and achieved 6.0 TFLOPS by using 19,000 CPU cores. We employed a leap-frog finite difference method with nested staggered grids whose resolutions range from 405 m to 5 m. The resolution ratio of each nested domain was 1/3. The total number of grid points was 13 million, and the time step was 0.1 seconds. Tsunami sources of the 2011 Tohoku-oki earthquake were tested. The inundation prediction up to 2 hours after the

  7. Application of Stochastic Automata Networks for Creation of Continuous Time Markov Chain Models of Voltage Gating of Gap Junction Channels

    Directory of Open Access Journals (Sweden)

    Mindaugas Snipas

    2015-01-01

    Full Text Available The primary goal of this work was to study the advantages of numerical methods used for the creation of continuous time Markov chain (CTMC) models of voltage gating of gap junction (GJ) channels composed of connexin protein. This task was accomplished by describing the gating of GJs using the formalism of stochastic automata networks (SANs), which allowed for very efficient building and storing of the infinitesimal generator of the CTMC and produced model matrices with a distinct block structure. All of this allowed us to develop efficient numerical methods for the steady-state solution of CTMC models and to reduce the CPU time necessary to solve them by a factor of ∼20.
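
    A minimal sketch of the steady-state computation these methods accelerate: solve pi Q = 0 subject to sum(pi) = 1 by replacing one balance equation with the normalization constraint. This dense toy version forms Q explicitly; the paper's point is that the SAN block structure avoids ever doing so.

```python
import numpy as np

def ctmc_steady_state(Q):
    """Steady-state distribution pi of a CTMC with infinitesimal generator Q:
    solve pi Q = 0 with sum(pi) = 1 (one balance equation is redundant and is
    replaced by the normalization constraint)."""
    n = Q.shape[0]
    A = Q.T.copy()
    A[-1, :] = 1.0            # replace last equation with sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# Toy two-state open/closed gate: opening rate 2/s, closing rate 1/s.
Q = np.array([[-2.0, 2.0], [1.0, -1.0]])
print(ctmc_steady_state(Q))   # -> [1/3, 2/3]
```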

  9. Use of general purpose graphics processing units with MODFLOW

    Science.gov (United States)

    Hughes, Joseph D.; White, Jeremy T.

    2013-01-01

    To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized.
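
    As an illustration of the solver family described above, here is a Jacobi-preconditioned conjugate gradient on a CSR matrix in SciPy; the diagonal preconditioner is the simplest of the UPCG options and maps well to GPUs because applying M^-1 is element-wise. This is a sketch, not the MODFLOW code.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator

def jacobi_pcg(A_csr, b):
    """Conjugate gradient with a Jacobi (diagonal) preconditioner:
    M^-1 r is just an element-wise scaling by 1/diag(A)."""
    inv_diag = 1.0 / A_csr.diagonal()
    M = LinearOperator(A_csr.shape, matvec=lambda r: inv_diag * r)
    x, info = cg(A_csr, b, M=M)
    if info != 0:
        raise RuntimeError(f"CG did not converge (info={info})")
    return x

# 1-D Poisson test problem in CSR format.
n = 1000
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
x = jacobi_pcg(A, np.ones(n))
```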

  10. Symplectic multi-particle tracking on GPUs

    Science.gov (United States)

    Liu, Zhicong; Qiang, Ji

    2018-05-01

    A symplectic multi-particle tracking model is implemented on Graphics Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) language. The symplectic tracking model can preserve phase space structure and reduce non-physical effects in long term simulation, which is important for beam property evaluation in particle accelerators. Though this model is computationally expensive, it is well suited to parallelization and can be accelerated significantly by using GPUs. In this paper, we optimized the implementation of the symplectic tracking model on both a single GPU and multiple GPUs. Using a single GPU processor, the code achieves a factor of 2-10 speedup for a range of problem sizes compared with the time on a single state-of-the-art Central Processing Unit (CPU) node with similar power consumption and semiconductor technology. It also shows good scalability on a multi-GPU cluster at the Oak Ridge Leadership Computing Facility. In an application to beam dynamics simulation, the GPU implementation saves more than a factor of two in total computing time in comparison to the CPU implementation.
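
    The essence of symplectic tracking is composing exactly area-preserving maps. Below is a toy drift-kick-drift tracker, vectorized over particles the way a GPU would assign one thread per particle; the "lattice" is a single thin-lens element and all parameters are illustrative, not the paper's model.

```python
import numpy as np

def track(x, px, n_turns, k, L):
    """Second-order symplectic (drift-kick-drift) tracking: each update is
    an exact symplectic map, so their composition preserves phase-space
    structure over long simulations."""
    for _ in range(n_turns):
        x += 0.5 * L * px          # half drift
        px += -k * x               # thin-lens kick
        x += 0.5 * L * px          # half drift
    return x, px

rng = np.random.default_rng(1)
x, px = track(rng.normal(size=100_000), rng.normal(size=100_000),
              n_turns=1000, k=0.1, L=1.0)
```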

  11. Polydrug use among college students in Brazil: a nationwide survey.

    Science.gov (United States)

    Oliveira, Lúcio Garcia de; Alberghini, Denis Guilherme; Santos, Bernardo dos; Andrade, Arthur Guerra de

    2013-01-01

    To estimate the frequency of polydrug use (alcohol and illicit drugs) among college students and its associations with gender and age group. A nationwide sample of 12,544 college students was asked to complete a questionnaire on their use of drugs according to three time parameters (lifetime, past 12 months, and last 30 days). The co-use of drugs was investigated as concurrent polydrug use (CPU) and simultaneous polydrug use (SPU), a subcategory of CPU that involves the use of drugs at the same time or in close temporal proximity. Almost 26% of college students reported having engaged in CPU in the past 12 months. Among these students, 37% had engaged in SPU. In the past 30 days, 17% college students had engaged in CPU. Among these, 35% had engaged in SPU. Marijuana was the illicit drug mostly frequently used with alcohol (either as CPU or SPU), especially among males. Among females, the most commonly reported combination was alcohol and prescribed medications. A high proportion of Brazilian college students may be engaging in polydrug use. College administrators should keep themselves informed to be able to identify such use and to develop educational interventions to prevent such behavior.

  12. An Implementation of Parallel and Networked Computing Schemes for the Real-Time Image Reconstruction Based on Electrical Tomography

    International Nuclear Information System (INIS)

    Park, Sook Hee

    2001-02-01

    This thesis implements and analyzes parallel and networked computing libraries based on multiprocessor computer architectures as well as networked computers, aiming at improving the computation speed of ET (Electrical Tomography) systems, which require enormous CPU time to reconstruct the unknown internal state of the target object. As an instance of typical tomography technology, ET partitions the cross-section of the target object into tiny elements and calculates their resistivity from signal values measured at the boundary electrodes surrounding the surface of the object after injecting a predetermined current pattern through the object. The number of elements is determined considering the trade-off between the accuracy of the reconstructed image and the computation time. As the elements become finer, the number of elements increases and the system can obtain a better image. However, the reconstruction time increases polynomially with the number of partitioned elements, since the procedure consists of a number of time-consuming matrix operations such as multiplication, inversion, pseudo-inversion, the Jacobian and so on. Consequently, the demand for improving computation speed via multiple processors grows. Moreover, currently released PCs can be equipped with up to 4 CPUs interconnected to shared memory, while some operating systems enable an application process to benefit from such computers by allocating threaded jobs to each CPU, resulting in concurrent processing. In addition, a networked computing or cluster computing environment is commonly available to almost every computer that supports a communication protocol and is connected to a local or global network. After partitioning the given job (numerical operation), each CPU or computer calculates the partial result independently, and the results are merged via common memory to produce the final result. It is desirable to adopt the commonly used library such as Matlab to
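
    The partition-compute-merge pattern described above in a minimal Python form, using a row-partitioned matrix product as the stand-in workload (the thesis's actual ET matrix operations and Matlab-based libraries are not reproduced here).

```python
from multiprocessing import Pool
import numpy as np

def partial_product(args):
    """Worker: multiply one horizontal slice of A by B."""
    A_slice, B = args
    return A_slice @ B

def parallel_matmul(A, B, n_workers=4):
    """Partition-compute-merge: split A into row blocks, compute the partial
    products independently on each worker, then merge the results."""
    blocks = np.array_split(A, n_workers, axis=0)
    with Pool(n_workers) as pool:
        parts = pool.map(partial_product, [(blk, B) for blk in blocks])
    return np.vstack(parts)

if __name__ == "__main__":
    A, B = np.random.rand(400, 300), np.random.rand(300, 200)
    assert np.allclose(parallel_matmul(A, B), A @ B)
```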

  13. Multiresource allocation and scheduling for periodic soft real-time applications

    Science.gov (United States)

    Gopalan, Kartik; Chiueh, Tzi-cker

    2001-12-01

    Real-time applications that utilize multiple system resources, such as CPU, disks, and network links, require coordinated scheduling of these resources in order to meet their end-to-end performance requirements. Most state-of-the-art operating systems support independent resource allocation and deadline-driven scheduling but lack coordination among multiple heterogeneous resources. This paper describes the design and implementation of an Integrated Real-time Resource Scheduler (IRS) that performs coordinated allocation and scheduling of multiple heterogeneous resources on the same machine for periodic soft real-time applications. The principal feature of IRS is a heuristic multi-resource allocation algorithm that reserves multiple resources for real-time applications in a manner that can maximize the number of applications admitted into the system in the long run. At run-time, a global scheduler dispatches the tasks of the soft real-time application to individual resource schedulers according to the precedence constraints between tasks. The individual resource schedulers, which could be any deadline-based schedulers, can make scheduling decisions locally and yet collectively satisfy a real-time application's performance requirements. The tightness of overall timing guarantees is ultimately determined by the properties of individual resource schedulers. However, IRS maximizes overall system resource utilization efficiency by coordinating deadline assignment across multiple tasks in a soft real-time application.

  14. Design of a memory-access controller with 3.71-times-enhanced energy efficiency for Internet-of-Things-oriented nonvolatile microcontroller unit

    Science.gov (United States)

    Natsui, Masanori; Hanyu, Takahiro

    2018-04-01

    In realizing a nonvolatile microcontroller unit (MCU) for sensor nodes in Internet-of-Things (IoT) applications, it is important to solve the data-transfer bottleneck between the central processing unit (CPU) and the nonvolatile memory constituting the MCU. As one circuit-oriented approach to solving this problem, we propose a memory access minimization technique for magnetoresistive-random-access-memory (MRAM)-embedded nonvolatile MCUs. In addition to multiplexing and prefetching of memory access, the proposed technique realizes efficient instruction fetch by eliminating redundant memory access while considering the code length of the instruction to be fetched and the transition of the memory address to be accessed. As a result, the performance of the MCU can be improved while relaxing the performance requirement for the embedded MRAM, and compact and low-power implementation can be performed as compared with the conventional cache-based one. Through the evaluation using a system consisting of a general purpose 32-bit CPU and embedded MRAM, it is demonstrated that the proposed technique increases the peak efficiency of the system up to 3.71 times, while a 2.29-fold area reduction is achieved compared with the cache-based one.

  15. On developing B-spline registration algorithms for multi-core processors

    International Nuclear Information System (INIS)

    Shackleford, J A; Kandasamy, N; Sharp, G C

    2010-01-01

    Spline-based deformable registration methods are quite popular within the medical-imaging community due to their flexibility and robustness. However, they require a large amount of computing time to obtain adequate results. This paper makes two contributions towards accelerating B-spline-based registration. First, we propose a grid-alignment scheme and associated data structures that greatly reduce the complexity of the registration algorithm. Based on this grid-alignment scheme, we then develop highly data parallel designs for B-spline registration within the stream-processing model, suitable for implementation on multi-core processors such as graphics processing units (GPUs). Particular attention is focused on an optimal method for performing analytic gradient computations in a data parallel fashion. CPU and GPU versions are validated for execution time and registration quality. Performance results on large images show that our GPU algorithm achieves a speedup of 15 times over the single-threaded CPU implementation whereas our multi-core CPU algorithm achieves a speedup of 8 times over the single-threaded implementation. The CPU and GPU versions achieve near-identical registration quality in terms of RMS differences between the generated vector fields.
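
    The inner loop that both the GPU and multi-core designs parallelize is evaluation of the uniform cubic B-spline kernel at every voxel; below is a 1-D illustrative sketch (the paper's grid-aligned data structures and analytic gradient computations are not reproduced, and the control-point indexing convention is an assumption).

```python
import numpy as np

def bspline_basis(u):
    """The four uniform cubic B-spline basis weights for a fractional
    offset u in [0, 1); the weights sum to 1."""
    return np.array([
        (1 - u) ** 3 / 6.0,
        (3 * u ** 3 - 6 * u ** 2 + 4) / 6.0,
        (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6.0,
        u ** 3 / 6.0,
    ])

def deformation_1d(x, coeffs, spacing):
    """1-D deformation at point x from control-point coefficients laid out
    on a grid aligned with the volume, so the four supporting control
    points are found by a single integer division."""
    g = x / spacing
    i = int(np.floor(g))
    w = bspline_basis(g - i)
    return float(w @ coeffs[i : i + 4])
```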

  16. Time-based Cellular Automaton track finder for the CBM experiment

    International Nuclear Information System (INIS)

    Akishina, Valentina; Kisel, Ivan

    2015-01-01

    The future heavy-ion experiment CBM (FAIR/GSI, Darmstadt, Germany) will focus on the measurement of rare probes at interaction rates up to 10 MHz with a data flow of up to 1 TB/s. The beam will provide a free stream of particles without bunch structure. That requires full online event reconstruction and selection not only in space, but also in time, so-called 4D event building and selection. This is a task of the First-Level Event Selection (FLES) package. The FLES reconstruction and selection package consists of several modules: track finding, track fitting, short-lived particle finding, event building and event selection. The input data are distributed within the FLES farm in the form of so-called time-slices, whose time length is proportional to the compute power of a processing node. A time-slice is reconstructed in parallel across cores within a CPU, thus minimising communication between CPUs. After all tracks of the whole time-slice are found and fitted, they are collected into clusters of tracks originating from common primary vertices. After that, short-lived particles are found and the full event-building process is finished. (paper)

  17. A Fast and Accurate Algorithm for l1 Minimization Problems in Compressive Sampling (Preprint)

    Science.gov (United States)

    2013-01-22

    However, updating u^(k+1) via the formulation of Step 2 in Algorithm 1 can be implemented through the use of the component-wise Gauss-Seidel iteration, which ... may accelerate the rate of convergence of the algorithm and therefore reduce the total CPU time consumed. The efficiency of component-wise Gauss-Seidel ... Micchelli, L. Shen, and Y. Xu, A proximity algorithm accelerated by Gauss-Seidel iterations for L1/TV denoising models, Inverse Problems, 28 (2012).
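
    For readers unfamiliar with the component-wise Gauss-Seidel update this record refers to, the following Python sketch solves a generic linear system A x = b. It is a textbook illustration under the usual diagonal-dominance assumption, not the paper's l1-minimization algorithm.

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Component-wise Gauss-Seidel iteration for A x = b.
    Each component x[i] is updated in place, so the newest values are
    reused within the same sweep, which is what can accelerate
    convergence relative to Jacobi-style simultaneous updates."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old, ord=np.inf) < tol:
            break
    return x

A = np.array([[4.0, 1.0], [2.0, 3.0]])   # diagonally dominant example
b = np.array([1.0, 2.0])
print(gauss_seidel(A, b))                 # ~ [0.1, 0.6]
```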

  18. Three-dimensional discrete ordinates reactor assembly calculations on GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Evans, Thomas M [ORNL; Joubert, Wayne [ORNL; Hamilton, Steven P [ORNL; Johnson, Seth R [ORNL; Turner, John A [ORNL; Davidson, Gregory G [ORNL; Pandya, Tara M [ORNL

    2015-01-01

    In this paper we describe and demonstrate a discrete ordinates sweep algorithm on GPUs. This sweep algorithm is nested within a multilevel communication-based decomposition based on energy. We demonstrate the effectiveness of this algorithm on detailed three-dimensional critical experiments and PWR lattice problems. For these problems we show improvement factors of 4-6 over conventional communication-based, CPU-only sweeps. These sweep kernel speedups resulted in a factor of 2 total time-to-solution improvement.

  19. Molecular Dynamics Simulations and Kinetic Measurements to Estimate and Predict Protein-Ligand Residence Times.

    Science.gov (United States)

    Mollica, Luca; Theret, Isabelle; Antoine, Mathias; Perron-Sierra, Françoise; Charton, Yves; Fourquez, Jean-Marie; Wierzbicki, Michel; Boutin, Jean A; Ferry, Gilles; Decherchi, Sergio; Bottegoni, Giovanni; Ducrot, Pierre; Cavalli, Andrea

    2016-08-11

    Ligand-target residence time is emerging as a key drug discovery parameter because it can reliably predict drug efficacy in vivo. Experimental approaches to binding and unbinding kinetics are nowadays available, but we still lack reliable computational tools for predicting kinetics and residence time. Most attempts have been based on brute-force molecular dynamics (MD) simulations, which are CPU-demanding and not yet particularly accurate. We recently reported a new scaled-MD-based protocol, which showed potential for residence time prediction in drug discovery. Here, we further challenged our procedure's predictive ability by applying our methodology to a series of glucokinase activators that could be useful for treating type 2 diabetes mellitus. We combined scaled MD with experimental kinetics measurements and X-ray crystallography, promptly checking the protocol's reliability by directly comparing computational predictions and experimental measures. The good agreement highlights the potential of our scaled-MD-based approach as an innovative method for computationally estimating and predicting drug residence times.

  20. SAFCM: A Security-Aware Feedback Control Mechanism for Distributed Real-Time Embedded Systems

    DEFF Research Database (Denmark)

    Ma, Yue; Jiang, Wei; Sang, Nan

    2012-01-01

    Distributed Real-time Embedded (DRE) systems are facing great challenges in networked, unpredictable and especially unsecured environments. In such systems, there is a strong need to enforce security on distributed computing nodes in order to guard against potential threats, while satisfying......-time systems, a multi-input multi-output feedback loop is designed and a model predictive controller is deployed based on an equation model that describes the dynamic behavior of the DRE systems. This control loop uses security level scaling to globally control the CPU utilization and security performance...... for the whole system. We propose a "security level" metric based on an evolution of cryptography algorithms used in embedded systems. Experimental results demonstrate that SAFCM not only has excellent adaptivity compared to an open-loop mechanism, but also a better overall performance than PID control...

  1. Reconstruction of MODIS total suspended matter time series maps by DINEOF and validation with autonomous platform data

    Science.gov (United States)

    Nechad, Bouchra; Alvera-Azcaràte, Aida; Ruddick, Kevin; Greenwood, Naomi

    2011-08-01

    In situ measurements of total suspended matter (TSM) over the period 2003-2006, collected with two autonomous platforms from the Centre for Environment, Fisheries and Aquatic Sciences (Cefas) measuring the optical backscatter (OBS) in the southern North Sea, are used to assess the accuracy of TSM time series extracted from satellite data. Since there are gaps in the remote sensing (RS) data, due mainly to cloud cover, the Data Interpolating Empirical Orthogonal Functions (DINEOF) method is used to fill in the TSM time series and build a continuous daily "recoloured" dataset. The RS datasets consist of TSM maps derived from MODIS imagery using the bio-optical model of Nechad et al. (Rem Sens Environ 114: 854-866, 2010). In this study, the DINEOF time series are compared to the in situ OBS measured in moderately to very turbid waters at West Gabbard and Warp Anchorage, respectively, in the southern North Sea. The discrepancies between instantaneous RS, DINEOF-filled RS data and Cefas data are analysed in terms of TSM algorithm uncertainties, space-time variability and DINEOF reconstruction uncertainty.

  2. Real-time Vision using FPGAs, GPUs and Multi-core CPUs

    DEFF Research Database (Denmark)

    Kjær-Nielsen, Anders

    the introduction and evolution of a wide variety of powerful hardware architectures have made the developed theory more applicable in performance demanding and real-time applications. Three different architectures have dominated the field due to their parallel capabilities that are often desired when dealing...... processors in the vision community. The introduction of programming languages like CUDA from NVIDIA has made it easier to utilize the high parallel processing powers of the GPU for general purpose computing and thereby realistic to use based on the effort involved with development. The increased clock...... frequencies and number of Configurable Logic Blocks (CLBs) of the FPGAs, as well as the introduction of dedicated hardware implementations like multipliers, Digital Signal Processing (DSP) slices and even embedded hard-core CPU implementations have made them more applicable for general purpose computing...

  3. Raspberry Pi Eclipse Experiments

    Science.gov (United States)

    Chizek Frouard, Malynda

    2018-01-01

    The 21 August 2017 solar eclipse was an excellent opportunity for electronics and science enthusiasts to collect data during a fascinating phenomenon. With my recent personal interest in Raspberry Pis, I thought measuring how much the temperature and illuminance change during a total solar eclipse would be fun and informative. Previous observations of total solar eclipses have remarked on the temperature drop during totality. Illuminance (ambient light) varies over 7 orders of magnitude from day to night and is highly dependent on the relative positions of Sun, Earth, and Moon. I wondered whether totality was really as dark as night. Using a Raspberry Pi Zero W, a Pimoroni Enviro pHAT, and a portable USB charger, I collected environmental temperature; CPU temperature (because the environmental temperature sensor sat very near the CPU on the Raspberry Pi); barometric pressure; ambient light; R, G, and B colors; and x, y, and z acceleration (for marking times when I moved the sensor) at a ~15-second cadence from about 5 am until 1:30 pm from my eclipse observation site in Glendo, WY. Totality occurred from 11:45 to 11:47 am, lasting about 2 minutes and 30 seconds. The Raspberry Pi recorded a >20 degree F drop in temperature during the eclipse, and the illuminance during totality was equivalent to twilight measurements earlier in the day. A limitation in the ambient light sensor prevented accurate measurements of broad daylight and most of the partial phase of the eclipse, but an alternate ambient light sensor combined with the Raspberry Pi setup would make this a cost-efficient set-up for illuminance studies. I will present data from the ambient light sensor, temperature sensor, and color sensor, noting caveats from my experiments, lessons learned for next time, and suggestions for anyone who wants to perform similar experiments for themselves or with a classroom.
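
    A minimal logging loop in the spirit of the described setup might look like the sketch below. It assumes the Pimoroni envirophat Python library and the standard Raspberry Pi OS sysfs path for the CPU temperature; the sensor call names follow that library and are not taken from the abstract, and pressure units may depend on the library version.

```python
import time, csv

from envirophat import weather, light, motion  # Pimoroni Enviro pHAT library

def cpu_temperature():
    # Standard sysfs path on Raspberry Pi OS; value is in millidegrees C.
    with open("/sys/class/thermal/thermal_zone0/temp") as f:
        return int(f.read()) / 1000.0

with open("eclipse_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "temp_c", "cpu_temp_c", "pressure",
                     "light", "r", "g", "b", "ax", "ay", "az"])
    while True:
        r, g, b = light.rgb()
        ax, ay, az = motion.accelerometer()
        writer.writerow([time.time(), weather.temperature(), cpu_temperature(),
                         weather.pressure(), light.light(), r, g, b, ax, ay, az])
        f.flush()
        time.sleep(15)   # ~15-second cadence, as in the observations above
```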

  4. RURAL EXTENSION EPISTEMOLOGY AND THE TIME OF TOTAL EXTENSION

    Directory of Open Access Journals (Sweden)

    Silvio Calgaro Neto

    2016-09-01

    This article explores the field of knowledge related to rural extension. Three complementary perspectives are used as the theoretical strategy for this epistemological study. The first seeks to accomplish a brief archaeology of rural extension, identifying its remarkable historical passages. The second looks at some theoretical models through the modern epistemological platform. Finally, the third presents a methodological proposal that contemplates these epistemic characteristics, relating them to the contemporary transformations observed in knowledge construction and technology transfer for rural development. Keywords: Total institutions. University.

  5. Exact diagonalization of quantum lattice models on coprocessors

    Science.gov (United States)

    Siro, T.; Harju, A.

    2016-10-01

    We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
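
    The operation benchmarked in this record, a single Lanczos step, is dominated by one matrix-vector product. The following NumPy sketch shows that step on a dense symmetric test matrix; in the actual application the Hamiltonian would be sparse, and the sizes and names here are illustrative assumptions.

```python
import numpy as np

def lanczos_step(H, v_curr, v_prev, beta_prev):
    """One step of the Lanczos iteration: the operation whose execution
    time is benchmarked above. H is the Hamiltonian (dense here for
    simplicity; in practice a sparse matrix-vector product dominates)."""
    w = H @ v_curr - beta_prev * v_prev
    alpha = v_curr @ w          # diagonal element of the tridiagonal matrix
    w -= alpha * v_curr
    beta = np.linalg.norm(w)    # off-diagonal element
    return alpha, beta, w / beta, v_curr

rng = np.random.default_rng(0)
n = 200
H = rng.standard_normal((n, n)); H = (H + H.T) / 2   # symmetric test matrix
v = rng.standard_normal(n); v /= np.linalg.norm(v)
alpha, beta, v_next, v_prev = lanczos_step(H, v, np.zeros(n), 0.0)
print(alpha, beta)
```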

  6. Gender differences in total cholesterol levels in patients with acute heart failure and its importance for short and long time prognosis.

    Science.gov (United States)

    Spinarova, Lenka; Spinar, Jindrich; Vitovec, Jiri; Linhart, Ales; Widimsky, Petr; Fedorco, Marian; Malek, Filip; Cihalik, Cestmir; Miklik, Roman; Dusek, Ladislav; Zidova, Klaudia; Jarkovsky, Jiri; Littnerova, Simona; Parenica, Jiri

    2012-03-01

    The purpose of this study was to evaluate whether there are gender differences in total cholesterol levels in patients with acute heart failure and whether this parameter is associated with short- and long-term mortality. The AHEAD MAIN registry is a database conducted in 7 university hospitals, all with 24 h cath lab service, in 4 cities in the Czech Republic. The database included 4 153 patients hospitalised for acute heart failure in the period 2006-2009. 2 384 patients had a complete record of their total cholesterol levels; 946 females and 1 437 males were included in this analysis. According to the admission total cholesterol levels, patients were divided into 5 groups, the highest (group E) comprising levels above 6.0 mmol/l. The median total cholesterol level was 4.24 mmol/l in males and 4.60 mmol/l in females. A higher percentage of women than men had total cholesterol levels above 6 mmol/l, and a lower percentage were in the group below 4.5 mmol/l. In all total cholesterol categories, women were older than men. Total cholesterol levels are important for in-hospital mortality and long-term survival of patients admitted for acute heart failure.

  7. TAGUCHI METHOD FOR THREE-STAGE ASSEMBLY FLOW SHOP SCHEDULING PROBLEM WITH BLOCKING AND SEQUENCE-DEPENDENT SET UP TIMES

    Directory of Open Access Journals (Sweden)

    AREF MALEKI-DARONKOLAEI

    2013-10-01

    This article considers a three-stage assembly flowshop scheduling problem minimizing the weighted sum of mean completion time and makespan, with sequence-dependent setup times at the first stage and blocking times between stages. To tackle this NP-hard problem, two meta-heuristic algorithms are presented. The novelty of our approach is to develop a variable neighborhood search algorithm (VNS) and a well-known simulated annealing (SA) for the problem. Furthermore, to enhance the performance of SA, its parameters are optimized by the Taguchi method; VNS has only one parameter, which was set without Taguchi tuning. The computational results show that the proposed VNS is better than SA in mean and standard deviation for all problem sizes, whereas SA outperforms VNS in CPU time.
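
    For reference, a generic simulated annealing skeleton for a permutation scheduling problem is sketched below; the swap neighbourhood, cooling schedule and toy objective are placeholder assumptions, not the paper's three-stage assembly flowshop model or its Taguchi-tuned parameters.

```python
import math, random

def simulated_annealing(cost, n_jobs, t0=100.0, cooling=0.95, iters=2000):
    """Generic SA skeleton for permutation scheduling: swap two jobs,
    accept worse moves with probability exp(-delta/T), cool geometrically."""
    seq = list(range(n_jobs))
    random.shuffle(seq)
    cur_cost = cost(seq)
    best, best_cost = seq[:], cur_cost
    temp = t0
    for _ in range(iters):
        i, j = random.sample(range(n_jobs), 2)
        seq[i], seq[j] = seq[j], seq[i]
        new_cost = cost(seq)
        delta = new_cost - cur_cost
        if delta < 0 or random.random() < math.exp(-delta / temp):
            cur_cost = new_cost                     # accept the move
            if new_cost < best_cost:
                best, best_cost = seq[:], new_cost
        else:
            seq[i], seq[j] = seq[j], seq[i]         # undo rejected move
        temp = max(temp * cooling, 1e-9)
    return best, best_cost

# Toy objective: a completion-time proxy over random processing times.
proc = [random.uniform(1, 10) for _ in range(8)]
print(simulated_annealing(lambda s: sum((k + 1) * proc[j]
                                        for k, j in enumerate(s)), 8))
```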

  8. Estimating time-based instantaneous total mortality rate based on the age-structured abundance index

    Science.gov (United States)

    Wang, Yingbin; Jiao, Yan

    2015-05-01

    The instantaneous total mortality rate (Z) of a fish population is one of the important parameters in fisheries stock assessment. The estimation of Z is crucial to fish population dynamics analysis, abundance and catch forecast, and fisheries management. A catch-curve-based method for estimating time-based Z and its change trend from catch per unit effort (CPUE) data of multiple cohorts is developed. Unlike the traditional catch-curve method, the method developed here does not assume that Z is constant over the whole period; instead, the Z values within each window of n consecutive years are assumed constant, and the Z values for different n-year windows are estimated using the age-based CPUE data within those years. The results of the simulation analyses show that the trends of the estimated time-based Z are consistent with the trends of the true Z, and the estimated rates of change from this approach are close to the true change rates (the relative differences between the change rates of the estimated Z and the true Z are smaller than 10%). Variations of both Z and recruitment can affect the estimates of Z and the trend of Z. The most appropriate value of n can differ depending on the effects of different factors; therefore, the appropriate value of n for a given fishery should be determined through a simulation analysis as we demonstrated in this study. Further analyses suggested that selectivity and age estimation are also two factors that can affect the estimated Z values if there is error in either of them, but the estimated change rates of Z are still close to the true change rates. We also applied this approach to the Atlantic cod (Gadus morhua) fishery of eastern Newfoundland and Labrador from 1983 to 1997, and obtained reasonable estimates of time-based Z.
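
    As a rough illustration of the catch-curve idea underlying this record, the sketch below recovers Z from synthetic CPUE-at-age data for a single cohort by regressing ln(CPUE) on age; in the paper's method this regression is applied within each window of n consecutive years. The numbers and function names are assumptions for demonstration only.

```python
import numpy as np

def estimate_z(ages, cpue):
    """Catch-curve estimate of total mortality Z for one cohort:
    under constant Z, CPUE-at-age decays as exp(-Z * age), so Z is
    minus the slope of ln(CPUE) against age (fully selected ages only)."""
    slope, _ = np.polyfit(ages, np.log(cpue), 1)
    return -slope

# Synthetic cohort: true Z = 0.4, with lognormal observation noise.
rng = np.random.default_rng(1)
ages = np.arange(3, 10)
true_z = 0.4
cpue = 1000 * np.exp(-true_z * ages) * rng.lognormal(0, 0.1, ages.size)
print(f"estimated Z = {estimate_z(ages, cpue):.2f} (true {true_z})")
```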

  9. A rapid infusion protocol is safe for total dose iron polymaltose: time for change.

    Science.gov (United States)

    Garg, M; Morrison, G; Friedman, A; Lau, A; Lau, D; Gibson, P R

    2011-07-01

    Intravenous correction of iron deficiency by total dose iron polymaltose is inexpensive and safe, but current protocols entail prolonged administration over more than 4 h. This results in reduced patient acceptance and hospital resource strain. We aimed to assess prospectively the safety of a rapid intravenous protocol and compare this with historical controls. Consecutive patients in whom intravenous iron replacement was indicated were invited to have up to 1.5 g iron polymaltose by a 58-min infusion protocol after an initial 15-min test dose without pre-medication. Infusion-related adverse events (AE) and delayed AE over the ensuing 5 days were also prospectively documented and graded as mild, moderate or severe. One hundred patients, 63 female, mean age 54 (range 18-85) years were studied. Thirty-four infusion-related AE to iron polymaltose occurred in a total of 24 patients--25 mild, 8 moderate and 1 severe; higher than previously reported for a slow protocol iron infusion. Thirty-one delayed AE occurred in 26 patients--26 mild, 3 moderate and 2 severe; similar to previously reported. All but five patients reported they would prefer iron replacement through the rapid protocol again. The presence of inflammatory bowel disease (IBD) predicted infusion-related reactions (54% vs 14% without IBD). The rapid protocol offers cost, resource utilization and time benefits for the patient and hospital system. © 2011 The Authors. Internal Medicine Journal © 2011 Royal Australasian College of Physicians.

  10. FAMOUS, faster: using parallel computing techniques to accelerate the FAMOUS/HadCM3 climate model with a focus on the radiative transfer algorithm

    Directory of Open Access Journals (Sweden)

    P. Hanappe

    2011-09-01

    We have optimised the atmospheric radiation algorithm of the FAMOUS climate model on several hardware platforms. The optimisation involved translating the Fortran code to C and restructuring the algorithm around the computation of a single air column. Instead of the existing MPI-based domain decomposition, we used a task queue and a thread pool to schedule the computation of individual columns on the available processors. Finally, four air columns are packed together in a single data structure and computed simultaneously using Single Instruction Multiple Data operations.

    The modified algorithm runs more than 50 times faster on the CELL's Synergistic Processing Element than on its main PowerPC processing element. On Intel-compatible processors, the new radiation code runs 4 times faster. On the tested graphics processor, using OpenCL, we find a speed-up of more than 2.5 times as compared to the original code on the main CPU. Because the radiation code takes more than 60% of the total CPU time, FAMOUS executes more than twice as fast. Our version of the algorithm returns bit-wise identical results, which demonstrates the robustness of our approach. We estimate that this project required around two and a half man-years of work.
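
    The task-queue scheduling and four-column packing described above can be caricatured in a few lines of Python. The sketch below uses a thread pool as the task queue and a NumPy array axis as a stand-in for the SIMD lanes; the physics is a placeholder, and Python threads illustrate only the scheduling structure, not the reported speedups.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

N_LEVELS, N_COLUMNS, BATCH = 40, 1024, 4   # BATCH mirrors the 4-column packing

def radiate_batch(cols):
    """Placeholder for the per-column radiation computation: operating on
    a (levels, 4) array lets NumPy vectorize across the packed columns,
    analogous to the SIMD scheme described above."""
    return np.cumsum(cols * 0.99, axis=0)   # stand-in physics

atmosphere = np.random.rand(N_LEVELS, N_COLUMNS)
batches = [atmosphere[:, i:i + BATCH] for i in range(0, N_COLUMNS, BATCH)]

# Task queue + thread pool: each batch of packed columns is one work item.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(radiate_batch, batches))
heating = np.concatenate(results, axis=1)
print(heating.shape)   # (40, 1024)
```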

  11. Enhanced responses to tumor immunization following total body irradiation are time-dependent.

    Directory of Open Access Journals (Sweden)

    Adi Diab

    The development of successful cancer vaccines is contingent on the ability to induce effective and persistent anti-tumor immunity against self-antigens that do not typically elicit immune responses. In this study, we examine the effects of a non-myeloablative dose of total body irradiation on the ability of tumor-naïve mice to respond to DNA vaccines against melanoma. We demonstrate that irradiation followed by lymphocyte infusion results in a dramatic increase in responsiveness to tumor vaccination, with augmentation of T cell responses to tumor antigens and tumor eradication. In irradiated mice, infused CD8(+) T cells expand in an environment that is relatively depleted in regulatory T cells, and this correlates with improved CD8(+) T cell functionality. We also observe an increase in the frequency of dendritic cells displaying an activated phenotype within lymphoid organs in the first 24 hours after irradiation. Intriguingly, both the relative decrease in regulatory T cells and increase in activated dendritic cells correspond with a brief window of augmented responsiveness to immunization. After this 24 hour window, the numbers of dendritic cells decline, as does the ability of mice to respond to immunizations. When immunizations are initiated within the period of augmented dendritic cell activation, mice develop anti-tumor responses that show increased durability as well as magnitude, and this approach leads to improved survival in experiments with mice bearing established tumors as well as in a spontaneous melanoma model. We conclude that irradiation can produce potent immune adjuvant effects independent of its ability to induce tumor ablation, and that the timing of immunization and lymphocyte infusion in the irradiated host are crucial for generating optimal anti-tumor immunity. Clinical strategies using these approaches must therefore optimize such parameters, as the correct timing of infusion and vaccination may mean the difference

  12. Total quality management in orthodontic practice.

    Science.gov (United States)

    Atta, A E

    1999-12-01

    Quality is the buzz word for the new Millennium. Patients demand it, and we must serve it. Yet one must identify it. Quality is not imaging or public relations; it is a business process. This short article presents quality as a balance of three critical notions: core clinical competence, perceived values that our patients seek and want, and the cost of quality. Customer satisfaction is a variable that must be identified for each practice. In my practice, patients perceive quality as communication and time, be it treatment or waiting time. Time is a value and cost that must be managed effectively. Total quality management is a business function; it involves diagnosis, design, implementation, and measurement of the process, the people, and the service. Kaizen is a function that reduces non-value services, eliminates waste, and manages time and cost in the process. Total quality management is a total commitment for continuous improvement.

  13. Stability and Scalability of the CMS Global Pool: Pushing HTCondor and GlideinWMS to New Limits

    Energy Technology Data Exchange (ETDEWEB)

    Balcas, J. [Caltech; Bockelman, B. [Nebraska U.; Hufnagel, D. [Fermilab; Hurtado Anampa, K. [Notre Dame U.; Aftab Khan, F. [NCP, Islamabad; Larson, K. [Fermilab; Letts, J. [UC, San Diego; Marra da Silva, J. [Sao Paulo, IFT; Mascheroni, M. [Fermilab; Mason, D. [Fermilab; Perez-Calero Yzquierdo, A. [Madrid, CIEMAT; Tiradani, A. [Fermilab

    2017-11-22

    The CMS Global Pool, based on HTCondor and glideinWMS, is the main computing resource provisioning system for all CMS workflows, including analysis, Monte Carlo production, and detector data reprocessing activities. The total resources at Tier-1 and Tier-2 grid sites pledged to CMS exceed 100,000 CPU cores, while another 50,000 to 100,000 CPU cores are available opportunistically, pushing the needs of the Global Pool to higher scales each year. These resources are becoming more diverse in their accessibility and configuration over time. Furthermore, the challenge of stably running at higher and higher scales while introducing new modes of operation such as multi-core pilots, as well as the chaotic nature of physics analysis workflows, places huge strains on the submission infrastructure. This paper details some of the most important challenges to scalability and stability that the CMS Global Pool has faced since the beginning of the LHC Run II and how they were overcome.

  14. A real-time monitoring and assessment method for calculation of total amounts of indoor air pollutants emitted in subway stations.

    Science.gov (United States)

    Oh, TaeSeok; Kim, MinJeong; Lim, JungJin; Kang, OnYu; Shetty, K Vidya; SankaraRao, B; Yoo, ChangKyoo; Park, Jae Hyung; Kim, Jeong Tai

    2012-05-01

    Subway systems are considered a main public transportation facility in developed countries. Time spent by people indoors, such as in underground spaces, subway stations, and indoor buildings, has gradually increased in the recent past. In particular, operators or elderly people who stay in indoor environments more than 15 hr per day are usually influenced to a greater extent by indoor air pollutants. Hence, regulations on indoor air pollutants are needed to ensure good health. Therefore, in this study, a new cumulative calculation method for estimating the total amounts of indoor air pollutants emitted inside a subway station is proposed, based on integrating the cumulative amounts of indoor air pollutants. The minimum concentration of an individual air pollutant that naturally exists in an indoor space is referred to as the base concentration and can be found from the collected data. After subtracting the base concentration from each data point of an indoor air pollutant data set, the primary quantity of emitted air pollutant is calculated; after integration is carried out with these values, adding the base concentration to the integrated quantity gives the total amount of the indoor air pollutant emitted. Moreover, the values of a new index for cumulative indoor air quality over one day are calculated using the cumulative air quality index (CAI). A cumulative comprehensive indoor air quality index (CCIAI) is also proposed to compare the cumulative concentrations of indoor air pollutants. The results show that the cumulative assessment approach to indoor air quality (IAQ) is useful for monitoring the total amounts of indoor air pollutants emitted, in the case of exposure to indoor air pollutants over a long time. Also, the values of CCIAI are influenced most by the concentration of NO2, which is released by the use of air conditioners and the combustion of fuel. The results obtained in
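
    The cumulative calculation described in this record (subtract the base concentration, integrate the excess over time, then add the base back) can be sketched directly, as below. The NO2-like concentration profile and all parameter values are synthetic assumptions for illustration.

```python
import numpy as np

def cumulative_emission(times_hr, conc):
    """Cumulative-amount calculation as described above: take the minimum
    observed level as the base concentration, integrate the excess over
    time (trapezoidal rule), then add the base back in."""
    base = conc.min()                      # naturally occurring background
    excess = conc - base
    integrated = np.trapz(excess, times_hr)
    return integrated + base

t = np.linspace(0, 24, 97)                 # one day, 15-minute samples
no2 = 20 + 15 * np.exp(-((t - 8) ** 2) / 2) + 10 * np.exp(-((t - 18) ** 2) / 2)
print(f"cumulative NO2 index: {cumulative_emission(t, no2):.1f}")
```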

  15. First evaluation of the CPU, GPGPU and MIC architectures for real time particle tracking based on Hough transform at the LHC

    International Nuclear Information System (INIS)

    Halyo, V; LeGresley, P; Lujan, P; Karpusenko, V; Vladimirov, A

    2014-01-01

    Recent innovations focused around parallel processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA's Tesla Graphics Processing Unit (GPU) or Intel's Xeon Phi, in the High Level Trigger. These accelerators have the potential to provide faster or more energy efficient event selection, thus opening up possibilities for new complex triggers that were not previously feasible. At the same time, it is crucial to explore the performance limits achievable on the latest generation multicore CPUs with the use of the best software optimization methods. In this article, a new tracking algorithm based on the Hough transform will be evaluated for the first time on multi-core Intel i7-3770 and Intel Xeon E5-2697v2 CPUs, an NVIDIA Tesla K20c GPU, and an Intel Xeon Phi 7120 coprocessor. Preliminary time performance will be presented.
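
    For orientation, the core kernel being ported to these architectures is Hough-transform vote accumulation. The following NumPy sketch shows a minimal straight-line version on synthetic hits; it is a generic textbook implementation, not the tracking algorithm evaluated in the paper.

```python
import numpy as np

def hough_lines(xs, ys, n_theta=180, n_rho=256):
    """Minimal straight-line Hough transform: every hit votes for all
    (theta, rho) lines passing through it; peaks in the accumulator are
    track candidates. This vote accumulation is the parallelizable kernel."""
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    rho_max = np.hypot(xs, ys).max()
    acc = np.zeros((n_theta, n_rho), dtype=np.int32)
    for x, y in zip(xs, ys):
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        bins = ((rhos + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        acc[np.arange(n_theta), bins] += 1
    return acc, thetas

# Hits along the line y = 0.5 x + 1, plus random noise hits.
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 30)
xs = np.concatenate([x, rng.uniform(0, 10, 20)])
ys = np.concatenate([0.5 * x + 1, rng.uniform(0, 10, 20)])
acc, thetas = hough_lines(xs, ys)
i, j = np.unravel_index(acc.argmax(), acc.shape)
print(f"peak votes: {acc[i, j]} at theta = {np.degrees(thetas[i]):.0f} deg")
```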

  16. Comparison of two accelerators for Monte Carlo radiation transport calculations, Nvidia Tesla M2090 GPU and Intel Xeon Phi 5110p coprocessor: A case study for X-ray CT imaging dose calculation

    International Nuclear Information System (INIS)

    Liu, T.; Xu, X.G.; Carothers, C.D.

    2015-01-01

    Highlights: • A new Monte Carlo photon transport code ARCHER-CT for CT dose calculations is developed to execute on the GPU and coprocessor. • ARCHER-CT is verified against MCNP. • The GPU code on an Nvidia M2090 GPU is 5.15–5.81 times faster than the parallel CPU code on an Intel X5650 6-core CPU. • The coprocessor code on an Intel Xeon Phi 5110p coprocessor is 3.30–3.38 times faster than the CPU code. - Abstract: Hardware accelerators are currently becoming increasingly important in boosting high performance computing systems. In this study, we tested the performance of two accelerator models, Nvidia Tesla M2090 GPU and Intel Xeon Phi 5110p coprocessor, using a new Monte Carlo photon transport package called ARCHER-CT we have developed for fast CT imaging dose calculation. The package contains three components, ARCHER-CT(CPU), ARCHER-CT(GPU) and ARCHER-CT(COP), designed to be run on the multi-core CPU, GPU and coprocessor architectures respectively. A detailed GE LightSpeed Multi-Detector Computed Tomography (MDCT) scanner model and a family of voxel patient phantoms are included in the code to calculate absorbed dose to radiosensitive organs under user-specified scan protocols. The results from ARCHER agree well with those from the production code Monte Carlo N-Particle eXtended (MCNPX). It is found that all the code components are significantly faster than the parallel MCNPX run on 12 MPI processes, and that the GPU and coprocessor codes are 5.15–5.81 and 3.30–3.38 times faster than the parallel ARCHER-CT(CPU), respectively. The M2090 GPU performs better than the 5110p coprocessor in our specific test. Besides, the heterogeneous computation mode in which the CPU and the hardware accelerator work concurrently can increase the overall performance by 13–18%

  17. Application of GPU to computational multiphase fluid dynamics

    International Nuclear Information System (INIS)

    Nagatake, T; Kunugi, T

    2010-01-01

    The MARS (Multi-interfaces Advection and Reconstruction Solver) [1] is one of the surface volume tracking methods for multi-phase flows. Nowadays, the performance of the GPU (Graphics Processing Unit) is much higher than that of the CPU (Central Processing Unit). In this study, the GPU was applied to the MARS in order to accelerate the computation of multi-phase flows (GPU-MARS), and the performance of the GPU-MARS was discussed. From the performance of the interface tracking method on a one-directional advection problem, it is found that the GPU (a single GTX280) was around 4 times faster than the CPU (Xeon 5040, 4 threads parallelized). For the Poisson solver using the algorithm developed in this study, the GPU showed around 30 times faster performance than the CPU. Finally, it is confirmed that the GPU provides a large acceleration of the fluid flow computation (GPU-MARS) compared to the CPU. However, it is also found that double-precision computation on the GPU must be performed when very high precision is required.

  18. Electronic eye occluder with time-counting and reflection control

    Science.gov (United States)

    Karitans, V.; Ozolinsh, M.; Kuprisha, G.

    2008-09-01

    In pediatric ophthalmology, 2-3% of all children are affected by a visual pathology: amblyopia. It develops if a clear image is not presented to the retina during an early stage of the development of the visual system. A common way of treating this pathology is to cover the better-seeing eye to force the "lazy" eye to learn to see. However, children are often reluctant to wear such an occluder, because they are ashamed or simply find it inconvenient. This makes it necessary to track the occlusion regime, because poor results of occlusion hint that the actual regime is not what the optometrist recommended. We designed an electronic eye occluder that allows the occlusion regime to be tracked. We employ a real-time clock (DS1302) providing time information from seconds to years. Data are stored in the internal memory (EEPROM) of the MCU (PIC16F676), which switches on only if a mechanical switch is closed and the temperature has reached a satisfactory level. An occlusion interval is registered between the time moments when the infrared signal appears and disappears.

  19. A combined time-of-flight and depth-of-interaction detector for total-body positron emission tomography

    Energy Technology Data Exchange (ETDEWEB)

    Berg, Eric, E-mail: eberg@ucdavis.edu; Roncali, Emilie; Du, Junwei; Cherry, Simon R. [Department of Biomedical Engineering, University of California, Davis, One Shields Avenue, Davis, California 95616 (United States); Kapusta, Maciej [Molecular Imaging, Siemens Healthcare, Knoxville, Tennessee 37932 (United States)

    2016-02-15

    Purpose: In support of a project to build a total-body PET scanner with an axial field-of-view of 2 m, the authors are developing simple, cost-effective block detectors with combined time-of-flight (TOF) and depth-of-interaction (DOI) capabilities. Methods: This work focuses on investigating the potential of phosphor-coated crystals with conventional PMT-based block detector readout to provide DOI information while preserving timing resolution. The authors explored a variety of phosphor-coating configurations with single crystals and crystal arrays. Several pulse shape discrimination techniques were investigated, including decay time, delayed charge integration (DCI), and average signal shapes. Results: Pulse shape discrimination based on DCI provided the lowest DOI positioning error: 2 mm DOI positioning error was obtained with single phosphor-coated crystals while 3–3.5 mm DOI error was measured with the block detector module. Minimal timing resolution degradation was observed with single phosphor-coated crystals compared to uncoated crystals, and a timing resolution of 442 ps was obtained with phosphor-coated crystals in the block detector compared to 404 ps without phosphor coating. Flood maps showed a slight degradation in crystal resolvability with phosphor-coated crystals; however, all crystals could be resolved. Energy resolution was degraded by 3%–7% with phosphor-coated crystals compared to uncoated crystals. Conclusions: These results demonstrate the feasibility of obtaining TOF–DOI capabilities with simple block detector readout using phosphor-coated crystals.
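
    The delayed charge integration (DCI) discrimination described above can be illustrated with a toy calculation: the fraction of pulse charge arriving after a fixed delay grows with the scintillation decay time. The decay constants and window length below are illustrative assumptions, not the paper's measured values.

```python
import numpy as np

def dci_ratio(pulse, dt_ns=1.0, delay_ns=100.0):
    """Delayed charge integration: the fraction of total charge arriving
    after a fixed delay grows with the scintillation decay time, so this
    ratio can separate slower (phosphor-delayed) from faster pulses."""
    t = np.arange(pulse.size) * dt_ns
    total = np.trapz(pulse, t)
    delayed = np.trapz(pulse[t >= delay_ns], t[t >= delay_ns])
    return delayed / total

t = np.arange(0, 600.0, 1.0)
fast = np.exp(-t / 40.0)                               # fast decay (illustrative)
slow = np.exp(-t / 40.0) + 0.3 * np.exp(-t / 200.0)   # added phosphor-like tail
print(f"fast DCI = {dci_ratio(fast):.3f}, slow DCI = {dci_ratio(slow):.3f}")
```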

  20. A combined time-of-flight and depth-of-interaction detector for total-body positron emission tomography

    International Nuclear Information System (INIS)

    Berg, Eric; Roncali, Emilie; Du, Junwei; Cherry, Simon R.; Kapusta, Maciej

    2016-01-01

    Purpose: In support of a project to build a total-body PET scanner with an axial field-of-view of 2 m, the authors are developing simple, cost-effective block detectors with combined time-of-flight (TOF) and depth-of-interaction (DOI) capabilities. Methods: This work focuses on investigating the potential of phosphor-coated crystals with conventional PMT-based block detector readout to provide DOI information while preserving timing resolution. The authors explored a variety of phosphor-coating configurations with single crystals and crystal arrays. Several pulse shape discrimination techniques were investigated, including decay time, delayed charge integration (DCI), and average signal shapes. Results: Pulse shape discrimination based on DCI provided the lowest DOI positioning error: 2 mm DOI positioning error was obtained with single phosphor-coated crystals while 3–3.5 mm DOI error was measured with the block detector module. Minimal timing resolution degradation was observed with single phosphor-coated crystals compared to uncoated crystals, and a timing resolution of 442 ps was obtained with phosphor-coated crystals in the block detector compared to 404 ps without phosphor coating. Flood maps showed a slight degradation in crystal resolvability with phosphor-coated crystals; however, all crystals could be resolved. Energy resolution was degraded by 3%–7% with phosphor-coated crystals compared to uncoated crystals. Conclusions: These results demonstrate the feasibility of obtaining TOF–DOI capabilities with simple block detector readout using phosphor-coated crystals

  1. Real-time fusion of coronary CT angiography with x-ray fluoroscopy during chronic total occlusion PCI.

    Science.gov (United States)

    Ghoshhajra, Brian B; Takx, Richard A P; Stone, Luke L; Girard, Erin E; Brilakis, Emmanouil S; Lombardi, William L; Yeh, Robert W; Jaffer, Farouc A

    2017-06-01

    The purpose of this study was to demonstrate the feasibility of real-time fusion of coronary computed tomography angiography (CTA) centreline and arterial wall calcification with x-ray fluoroscopy during chronic total occlusion (CTO) percutaneous coronary intervention (PCI). Patients undergoing CTO PCI were prospectively enrolled. Pre-procedural CT scans were integrated with conventional coronary fluoroscopy using prototype software. We enrolled 24 patients who underwent CTO PCI using the prototype CT fusion software, and 24 consecutive CTO PCI patients without CT guidance served as a control group. Mean age was 66 ± 11 years, and 43/48 patients were men. Real-time CTA fusion during CTO PCI provided additional information regarding coronary arterial calcification and tortuosity that generated new insights into antegrade wiring, antegrade dissection/reentry, and retrograde wiring during CTO PCI. Overall CTO success rates and procedural outcomes remained similar between the two groups, despite a trend toward higher complexity in the fusion CTA group. This study demonstrates that real-time automated co-registration of coronary CTA centreline and calcification onto live fluoroscopic images is feasible and provides new insights into CTO PCI, and in particular, antegrade dissection reentry-based CTO PCI. • Real-time semi-automated fusion of CTA/fluoroscopy is feasible during CTO PCI. • CTA fusion data can be toggled on/off as desired during CTO PCI. • Real-time CT calcium and centreline overlay could benefit antegrade dissection/reentry-based CTO PCI.

  2. Real-time fusion of coronary CT angiography with X-ray fluoroscopy during chronic total occlusion PCI

    Energy Technology Data Exchange (ETDEWEB)

    Ghoshhajra, Brian B.; Takx, Richard A.P. [Harvard Medical School, Cardiac MR PET CT Program, Massachusetts General Hospital, Department of Radiology and Division of Cardiology, Boston, MA (United States); Stone, Luke L.; Yeh, Robert W.; Jaffer, Farouc A. [Harvard Medical School, Cardiac Cathetrization Laboratory, Cardiology Division, Massachusetts General Hospital, Boston, MA (United States); Girard, Erin E. [Siemens Healthcare, Princeton, NJ (United States); Brilakis, Emmanouil S. [Cardiology Division, Dallas VA Medical Center and UT Southwestern Medical Center, Dallas, TX (United States); Lombardi, William L. [University of Washington, Cardiology Division, Seattle, WA (United States)

    2017-06-15

    The purpose of this study was to demonstrate the feasibility of real-time fusion of coronary computed tomography angiography (CTA) centreline and arterial wall calcification with X-ray fluoroscopy during chronic total occlusion (CTO) percutaneous coronary intervention (PCI). Patients undergoing CTO PCI were prospectively enrolled. Pre-procedural CT scans were integrated with conventional coronary fluoroscopy using prototype software. We enrolled 24 patients who underwent CTO PCI using the prototype CT fusion software, and 24 consecutive CTO PCI patients without CT guidance served as a control group. Mean age was 66 ± 11 years, and 43/48 patients were men. Real-time CTA fusion during CTO PCI provided additional information regarding coronary arterial calcification and tortuosity that generated new insights into antegrade wiring, antegrade dissection/reentry, and retrograde wiring during CTO PCI. Overall CTO success rates and procedural outcomes remained similar between the two groups, despite a trend toward higher complexity in the fusion CTA group. This study demonstrates that real-time automated co-registration of coronary CTA centreline and calcification onto live fluoroscopic images is feasible and provides new insights into CTO PCI, and in particular, antegrade dissection reentry-based CTO PCI. (orig.)

  3. Multithreaded real-time 3D image processing software architecture and implementation

    Science.gov (United States)

    Ramachandra, Vikas; Atanassov, Kalin; Aleksic, Milivoje; Goma, Sergio R.

    2011-03-01

    Recently, 3D displays and videos have generated a lot of interest in the consumer electronics industry. To make 3D capture and playback popular and practical, a user friendly playback interface is desirable. Towards this end, we built a real-time software 3D video player. The 3D video player displays user captured 3D videos, provides various 3D-specific image processing functions and ensures a pleasant viewing experience. Moreover, the player enables user interactivity by providing digital zoom and pan functionalities. This real-time 3D player was implemented on the GPU using CUDA and OpenGL. The player provides user interactive 3D video playback. Stereo images are first read by the player from a fast drive and rectified. Further processing of the images determines the optimal convergence point in the 3D scene to reduce eye strain. The rationale for this convergence point selection takes into account scene depth and display geometry. The first step in this processing chain is identifying keypoints by detecting vertical edges within the left image. Regions surrounding reliable keypoints are then located on the right image through the use of block matching. The difference in the positions between the corresponding regions in the left and right images is then used to calculate disparity. The extrema of the disparity histogram give the scene disparity range. The left and right images are shifted based upon the calculated range, in order to place the desired region of the 3D scene at convergence. All the above computations are performed on one CPU thread which calls CUDA functions. Image upsampling and shifting is performed in response to user zoom and pan. The player also consists of a CPU display thread, which uses OpenGL rendering (quad buffers). This also gathers user input for digital zoom and pan and sends them to the processing thread.
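
    A heavily condensed Python version of the convergence-point chain described above (vertical-edge keypoints, block matching, disparity histogram, shift selection) is sketched below on synthetic images; the thresholds, block sizes and sampling are arbitrary assumptions, and the real player performs these steps in CUDA.

```python
import numpy as np

def convergence_shift(left, right, block=8, max_disp=32):
    """Condensed version of the chain described above: sample disparities
    by block matching at strong vertical edges in the left image, build a
    disparity histogram, and derive a shift from its extrema."""
    gx = np.abs(np.diff(left.astype(float), axis=1))     # vertical edges
    ys, xs = np.where(gx > np.percentile(gx, 99))
    disps = []
    for y, x in zip(ys[::50], xs[::50]):                 # sparse keypoints
        if y < block or y + block >= left.shape[0] or x < block:
            continue
        ref = left[y-block:y+block, x-block:x+block].astype(float)
        errs = [np.abs(ref - right[y-block:y+block,
                                   x-d-block:x-d+block].astype(float)).sum()
                for d in range(max_disp) if x - d - block >= 0]
        if errs:
            disps.append(int(np.argmin(errs)))
    hist, _ = np.histogram(disps, bins=max_disp, range=(0, max_disp))
    d_min, d_max = np.flatnonzero(hist)[[0, -1]]         # disparity range
    return (d_min + d_max) // 2           # place mid-range at convergence

left = np.random.randint(0, 255, (240, 320), dtype=np.uint8)
right = np.roll(left, -6, axis=1)         # synthetic 6-pixel disparity
print(convergence_shift(left, right))
```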

  4. Parallel Sequential Monte Carlo for Efficient Density Combination: The Deco Matlab Toolbox

    DEFF Research Database (Denmark)

    Casarin, Roberto; Grassi, Stefano; Ravazzolo, Francesco

    This paper presents the Matlab package DeCo (Density Combination), which is based on the paper by Billio et al. (2013), where a constructive Bayesian approach is presented for combining predictive densities originating from different models or other sources of information. The combination weights...... for standard CPU computing and for Graphical Process Unit (GPU) parallel computing. For the GPU implementation we use the Matlab parallel computing toolbox and show how to use General Purpose GPU computing almost effortlessly. This GPU implementation comes with a speed-up of the execution time of up to seventy...... times compared to a standard CPU Matlab implementation on a multicore CPU. We show the use of the package and the computational gain of the GPU version through some simulation experiments and empirical applications....

  5. Robust real-time extraction of respiratory signals from PET list-mode data.

    Science.gov (United States)

    Salomon, Andre; Zhang, Bin; Olivier, Patrick; Goedicke, Andreas

    2018-05-01

    Respiratory motion, which typically cannot simply be suspended during PET image acquisition, affects lesions' detection and quantitative accuracy inside or in close vicinity to the lungs. Some motion compensation techniques address this issue via pre-sorting ("binning") of the acquired PET data into a set of temporal gates, where each gate is assumed to be minimally affected by respiratory motion. Tracking respiratory motion is typically realized using dedicated hardware (e.g. using respiratory belts and digital cameras). Extracting respiratory signals directly from the acquired PET data simplifies the clinical workflow as it avoids handling additional signal measurement equipment. We introduce a new data-driven method "Combined Local Motion Detection" (CLMD). It uses the Time-of-Flight (TOF) information provided by state-of-the-art PET scanners in order to enable real-time respiratory signal extraction without additional hardware resources. CLMD applies center-of-mass detection in overlapping regions based on simple back-positioned TOF event sets acquired in short time frames. Following a signal filtering and quality-based pre-selection step, the remaining extracted individual position information over time is then combined to generate a global respiratory signal. The method is evaluated using 7 measured FDG studies from single and multiple scan positions of the thorax region, and it is compared to other software-based methods regarding quantitative accuracy and statistical noise stability. Correlation coefficients around 90% between the reference and the extracted signal have been found for those PET scans where motion-affected features such as tumors or hot regions were present in the PET field-of-view. For PET scans with a quarter of typically applied radiotracer doses, the CLMD method still provides similarly high correlation coefficients, which indicates its robustness to noise. Each CLMD processing needed less than 0.4 s in total on a standard multi-core CPU
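
    The center-of-mass step at the heart of CLMD can be mimicked on synthetic list-mode data, as below: events are binned into short frames and the axial centre of mass per frame serves as the motion surrogate. This omits the overlapping regions, filtering and quality-based pre-selection of the actual method, and all numbers are synthetic assumptions.

```python
import numpy as np

def respiratory_signal(event_times, event_z, frame_s=0.1):
    """Essence of the center-of-mass step described above: bin list-mode
    events into short frames and track the axial centre of mass of their
    back-projected positions; breathing shifts activity and thus the COM."""
    n_frames = int(event_times.max() / frame_s) + 1
    frames = (event_times / frame_s).astype(int)
    sums = np.bincount(frames, weights=event_z, minlength=n_frames)
    counts = np.bincount(frames, minlength=n_frames).clip(min=1)
    return sums / counts                   # COM per frame

# Synthetic 60 s list-mode stream: activity oscillating axially at 0.25 Hz.
rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0, 60, 200_000))
z = 5.0 * np.sin(2 * np.pi * 0.25 * t) + rng.normal(0, 20, t.size)
signal = respiratory_signal(t, z)
print(signal[:10])
```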

  6. Robust real-time extraction of respiratory signals from PET list-mode data

    Science.gov (United States)

    Salomon, André; Zhang, Bin; Olivier, Patrick; Goedicke, Andreas

    2018-06-01

    Respiratory motion, which typically cannot simply be suspended during PET image acquisition, affects lesions’ detection and quantitative accuracy inside or in close vicinity to the lungs. Some motion compensation techniques address this issue via pre-sorting (‘binning’) of the acquired PET data into a set of temporal gates, where each gate is assumed to be minimally affected by respiratory motion. Tracking respiratory motion is typically realized using dedicated hardware (e.g. using respiratory belts and digital cameras). Extracting respiratory signals directly from the acquired PET data simplifies the clinical workflow as it avoids handling additional signal measurement equipment. We introduce a new data-driven method ‘combined local motion detection’ (CLMD). It uses the time-of-flight (TOF) information provided by state-of-the-art PET scanners in order to enable real-time respiratory signal extraction without additional hardware resources. CLMD applies center-of-mass detection in overlapping regions based on simple back-positioned TOF event sets acquired in short time frames. Following a signal filtering and quality-based pre-selection step, the remaining extracted individual position information over time is then combined to generate a global respiratory signal. The method is evaluated using seven measured FDG studies from single and multiple scan positions of the thorax region, and it is compared to other software-based methods regarding quantitative accuracy and statistical noise stability. Correlation coefficients around 90% between the reference and the extracted signal have been found for those PET scans where motion affected features such as tumors or hot regions were present in the PET field-of-view. For PET scans with a quarter of typically applied radiotracer doses, the CLMD method still provides similar high correlation coefficients which indicates its robustness to noise. Each CLMD processing needed less than 0.4 s in total on a standard

  7. Parallel algorithm of real-time infrared image restoration based on total variation theory

    Science.gov (United States)

    Zhu, Ran; Li, Miao; Long, Yunli; Zeng, Yaoyuan; An, Wei

    2015-10-01

    Image restoration is a necessary preprocessing step for infrared remote sensing applications. Traditional methods allow us to remove the noise but penalize too much the gradients corresponding to edges. Image restoration techniques based on variational approaches can solve this over-smoothing problem owing to their well-defined mathematical modeling of the restoration procedure. The total variation (TV) of the infrared image is introduced as an L1 regularization term added to the objective energy functional. This converts the restoration process into an optimization problem for a functional involving a fidelity term to the image data plus a regularization term. Infrared image restoration technology with the TV-L1 model exploits the remote sensing data sufficiently and preserves information at edges caused by clouds. The numerical implementation algorithm is presented in detail. Analysis indicates that the structure of this algorithm can be easily parallelized. Therefore a parallel implementation of the TV-L1 filter based on a multicore architecture with shared memory is proposed for infrared real-time remote sensing systems. The massive computation on image data is performed in parallel by cooperating threads running simultaneously on multiple cores. Several groups of synthetic infrared image data are used to validate the feasibility and effectiveness of the proposed parallel algorithm. A quantitative analysis of the restored image quality compared to the input image is presented. Experimental results show that the TV-L1 filter can restore the varying background image reasonably, and that its performance can achieve the requirement of real-time image processing.
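
    As a minimal illustration of TV-regularized restoration, the sketch below runs explicit gradient descent on a smoothed TV-L1 energy for a synthetic image; the paper's parallel numerical scheme and parameter choices are not reproduced here, and the step size, smoothing and weights are arbitrary assumptions.

```python
import numpy as np

def tv_l1_denoise(f, lam=1.0, step=0.1, iters=200, eps=1e-3):
    """Gradient descent on a smoothed TV-L1 energy
    E(u) = sum sqrt((u-f)^2 + eps) + lam * sum sqrt(|grad u|^2 + eps).
    A simple explicit scheme for illustration only."""
    u = f.copy()
    for _ in range(iters):
        ux = np.diff(u, axis=1, append=u[:, -1:])
        uy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(ux**2 + uy**2 + eps)
        px, py = ux / mag, uy / mag
        # divergence of (grad u / |grad u|): the (negative) TV gradient
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        fid = (u - f) / np.sqrt((u - f)**2 + eps)   # smoothed L1 fidelity
        u -= step * (fid - lam * div)
    return u

clean = np.zeros((64, 64)); clean[24:40, 24:40] = 1.0      # a hot target
noisy = clean + np.random.default_rng(4).normal(0, 0.3, clean.shape)
print(np.abs(tv_l1_denoise(noisy, lam=0.8) - clean).mean())
```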

  8. Laparoscopic total pancreatectomy

    Science.gov (United States)

    Wang, Xin; Li, Yongbin; Cai, Yunqiang; Liu, Xubao; Peng, Bing

    2017-01-01

    Abstract Rationale: Laparoscopic total pancreatectomy is a complicated surgical procedure that has rarely been reported. This study was conducted to investigate the safety and feasibility of laparoscopic total pancreatectomy. Patients and Methods: Three patients underwent laparoscopic total pancreatectomy between May 2014 and August 2015. We reviewed their general demographic data, perioperative details, and short-term outcomes. General morbidity was assessed using the Clavien–Dindo classification and delayed gastric emptying (DGE) was evaluated by the International Study Group of Pancreatic Surgery (ISGPS) definition. Diagnosis and Outcomes: The indications for laparoscopic total pancreatectomy were intraductal papillary mucinous neoplasm (IPMN) (n = 2) and pancreatic neuroendocrine tumor (PNET) (n = 1). All patients underwent laparoscopic pylorus- and spleen-preserving total pancreatectomy; the mean operative time was 490 minutes (range 450–540 minutes), and the mean estimated blood loss was 266 mL (range 100–400 mL); 2 patients suffered from postoperative complications. All the patients recovered uneventfully with conservative treatment and were discharged with a mean hospital stay of 18 days (range 8–24 days). The short-term (from 108 to 600 days) follow-up demonstrated that all 3 patients had normal and consistent glycated hemoglobin (HbA1c) levels with acceptable quality of life. Lessons: Laparoscopic total pancreatectomy is feasible and safe in selected patients, and the pylorus- and spleen-preserving technique should be considered. Further prospective randomized studies are needed to obtain a comprehensive understanding of the role of the laparoscopic technique in total pancreatectomy. PMID:28099344

  9. Simplified neural networks for solving linear least squares and total least squares problems in real time.

    Science.gov (United States)

    Cichocki, A; Unbehauen, R

    1994-01-01

    In this paper a new class of simplified low-cost analog artificial neural networks with on-chip adaptive learning algorithms is proposed for solving linear systems of algebraic equations in real time. The proposed learning algorithms for linear least squares (LS), total least squares (TLS) and data least squares (DLS) problems can be considered as modifications and extensions of well-known algorithms: the row-action projection (Kaczmarz) algorithm and/or the LMS (Adaline) Widrow-Hoff algorithms. The algorithms can be applied to any problem which can be formulated as a linear regression problem. The correctness and high performance of the proposed neural networks are illustrated by extensive computer simulation results.
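
    The row-action projection (Kaczmarz) update that this record cites as a basis for its learning rules is compact enough to sketch directly: each step projects the current estimate onto the hyperplane defined by one equation. The example below is the classical method in NumPy, not the proposed analog network.

```python
import numpy as np

def kaczmarz(A, b, sweeps=50):
    """Row-action projection (Kaczmarz) method: each update projects the
    current estimate onto the hyperplane a_i . x = b_i, cycling through
    the rows of the system."""
    x = np.zeros(A.shape[1])
    for _ in range(sweeps):
        for a_i, b_i in zip(A, b):
            x += (b_i - a_i @ x) / (a_i @ a_i) * a_i
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0], [2.0, -1.0]])
x_true = np.array([1.0, -2.0])
print(kaczmarz(A, A @ x_true))   # converges toward [1, -2]
```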

  10. Simulating Real-Time Aspects of Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Christian Nastasi

    2010-01-01

    Wireless Sensor Network (WSN) technology has mainly been used in applications with low-frequency sampling and little computational complexity. Recently, new classes of WSN-based applications with different characteristics are being considered, including process control, industrial automation and visual surveillance. Such new applications usually involve relatively heavy computations and also present real-time requirements such as bounded end-to-end delay and guaranteed Quality of Service. It then becomes necessary to employ proper resource management policies, not only for communication resources but also jointly for computing resources, in the design and development of such WSN-based applications. In this context, simulation can play a critical role, together with analytical models, for validating a system design against the parameters of Quality of Service demanded. In this paper, we present RTNS, a publicly available free simulation tool which includes Operating System aspects in wireless distributed applications. RTNS extends the well-known NS-2 simulator with models of the CPU, the Real-Time Operating System and the application tasks, to take into account delays due to computation in addition to communication. We demonstrate the benefits of RTNS by presenting our simulation study for a complex WSN-based multi-view vision system for real-time event detection.

  11. Total volume versus bouts

    DEFF Research Database (Denmark)

    Chinapaw, Mai; Klakk, Heidi; Møller, Niels Christian

    2018-01-01

    BACKGROUND/OBJECTIVES: Examine the prospective relationship of total volume versus bouts of sedentary behaviour (SB) and moderate-to-vigorous physical activity (MVPA) with cardiometabolic risk in children. In addition, the moderating effects of weight status and MVPA were explored. SUBJECTS..../METHODS: Longitudinal study including 454 primary school children (mean age 10.3 years). Total volume and bouts (i.e. ≥10 consecutive minutes) of MVPA and SB were assessed by accelerometry in Nov 2009/Jan 2010 (T1) and Aug/Oct 2010 (T2). Triglycerides, total cholesterol/HDL cholesterol ratio (TC:HDLC ratio......, with or without mutual adjustments between MVPA and SB. The moderating effects of weight status and MVPA (for SB only) were examined by adding interaction terms. RESULTS: Children engaged daily in about 60 min of total MVPA and 0-15 min/week in MVPA bouts. Mean total sedentary time was around 7 h/day with over 3...

  12. Accelerating next generation sequencing data analysis with system level optimizations.

    Science.gov (United States)

    Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid

    2017-08-22

    Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer and CPU frequency scaling are some of the hardware features in modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system-level parameters before running the application. We studied GATK HaplotypeCaller, which is part of common NGS workflows and consumes more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system-level parameters, which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer, and (iv) over-clocking the default 'on-demand' CPU frequency mode by using 'performance' mode to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of the NGS pipeline was reduced by 70.60% and 34.14% for GATK 3.3 and GATK 3.7 respectively.

  13. Modulation of Total Sleep Time by Transcranial Direct Current Stimulation (tDCS).

    Science.gov (United States)

    Frase, Lukas; Piosczyk, Hannah; Zittel, Sulamith; Jahn, Friederike; Selhausen, Peter; Krone, Lukas; Feige, Bernd; Mainberger, Florian; Maier, Jonathan G; Kuhn, Marion; Klöppel, Stefan; Normann, Claus; Sterr, Annette; Spiegelhalder, Kai; Riemann, Dieter; Nitsche, Michael A; Nissen, Christoph

    2016-09-01

    Arousal and sleep are fundamental physiological processes, and their modulation is of high clinical significance. This study tested the hypothesis that total sleep time (TST) in humans can be modulated by the non-invasive brain stimulation technique transcranial direct current stimulation (tDCS) targeting a 'top-down' cortico-thalamic pathway of sleep-wake regulation. Nineteen healthy participants underwent a within-subject, repeated-measures protocol across five nights in the sleep laboratory with polysomnographic monitoring (adaptation, baseline, three experimental nights). tDCS was delivered via bi-frontal target electrodes and bi-parietal return electrodes before sleep (anodal 'activation', cathodal 'deactivation', and sham stimulation). Bi-frontal anodal stimulation significantly decreased TST, compared with cathodal and sham stimulation. This effect was location specific. Bi-frontal cathodal stimulation did not significantly increase TST, potentially due to ceiling effects in good sleepers. Exploratory resting-state EEG analyses before and after the tDCS protocols were consistent with the notion of increased cortical arousal after anodal stimulation and decreased cortical arousal after cathodal stimulation. The study provides proof-of-concept that TST can be decreased by non-invasive bi-frontal anodal tDCS in healthy humans. Further elucidating the 'top-down' pathway of sleep-wake regulation is expected to increase knowledge on the fundamentals of sleep-wake regulation and to contribute to the development of novel treatments for clinical conditions of disturbed arousal and sleep.

  14. A design of a computer complex including vector processors

    International Nuclear Information System (INIS)

    Asai, Kiyoshi

    1982-12-01

    We, members of the Computing Center of the Japan Atomic Energy Research Institute (JAERI), have been engaged for the past six years in research on the adaptability of vector processing to large-scale nuclear codes. The research has been done in collaboration with researchers and engineers of JAERI and a computer manufacturer. In this research, forty large-scale nuclear codes were investigated from the viewpoint of vectorization. Among them, twenty-six codes were actually vectorized and executed. As a result of the investigation, it is now estimated that about seventy percent of nuclear codes, and seventy percent of our total CPU time at JAERI, are highly vectorizable. Based on the data obtained by the investigation, (1) the currently vectorizable CPU time, (2) the necessary number of vector processors, (3) the necessary manpower for vectorization of nuclear codes, (4) the computing speed, memory size, number of parallel I/O paths, and size and speed of the I/O buffer of a vector processor suitable for our applications, and (5) the necessary software and operational policy for the use of vector processors are discussed, and finally (6) a computer complex including vector processors is presented in this report. (author)

  15. Coupled Heuristic Prediction of Long Lead-Time Accumulated Total Inflow of a Reservoir during Typhoons Using Deterministic Recurrent and Fuzzy Inference-Based Neural Network

    Directory of Open Access Journals (Sweden)

    Chien-Lin Huang

    2015-11-01

    Full Text Available This study applies a Real-Time Recurrent Learning Neural Network (RTRLNN) and an Adaptive Network-based Fuzzy Inference System (ANFIS) with novel heuristic techniques to develop an advanced prediction model of the accumulated total inflow of a reservoir, in order to cope with the highly varied uncertainty of long lead times during typhoon attacks while using a real-time forecast. To improve the temporal-spatial forecast precision, the following original specialized heuristic inputs were coupled: observed-predicted inflow increase/decrease (OPIID) rate, total precipitation, and the duration from the current time to the times of maximum precipitation and direct runoff ending (DRE). This study also investigated the temporal-spatial forecast error characteristics to assess the feasibility of the developed models, and analyzed the output sensitivity of both single and combined heuristic inputs to determine whether the heuristic model is susceptible to the impact of future forecast uncertainty/errors. Validation results showed that the long lead-time prediction accuracy and stability of the RTRLNN-based accumulated total inflow model are better than those of the ANFIS-based model because of the real-time recurrent deterministic routing mechanism of the RTRLNN. Simulations show that the RTRLNN-based model with coupled heuristic inputs (RTRLNN-CHI; average error percentage (AEP)/average forecast lead-time (AFLT): 6.3%/49 h) can achieve better prediction than the models with non-heuristic inputs (AEP of RTRLNN-NHI and ANFIS-NHI: 15.2%/31.8%) because of the full consideration of real-time hydrological initial/boundary conditions. Besides, the RTRLNN-CHI model can extend the forecast lead time beyond 49 h with an AEP below 10%, overcoming the previous limits of a 6-h AFLT with an AEP above 20%–40%.

  16. Implementation and Optimization of GPU-Based Static State Security Analysis in Power Systems

    Directory of Open Access Journals (Sweden)

    Yong Chen

    2017-01-01

    Full Text Available Static state security analysis (SSSA) is one of the most important computations for checking whether a power system is in a normal and secure operating state. It is a challenge to satisfy real-time requirements with CPU-based concurrent methods due to the intensive computations. A sensitivity analysis-based method using the graphics processing unit (GPU) is proposed for power systems, which can reduce calculation time by 40% compared to execution on a 4-core CPU. The proposed method involves load flow analysis and sensitivity analysis. In load flow analysis, a multifrontal method for sparse LU factorization is explored on the GPU through dynamic frontal task scheduling between the CPU and GPU. The varying matrix operations during sensitivity analysis on the GPU are highly optimized in this study. The results of performance evaluations show that the proposed GPU-based SSSA with optimized matrix operations can achieve a significant reduction in computation time.

  17. Development of embedded real-time and high-speed vision platform

    Science.gov (United States)

    Ouyang, Zhenxing; Dong, Yimin; Yang, Hua

    2015-12-01

    Currently, high-speed vision platforms are widely used in many applications, such as robotics and the automation industry. However, traditional high-speed vision platforms depend on a personal computer (PC) for human-computer interaction, whose bulk makes them unsuitable for compact systems. This paper therefore develops an embedded real-time and high-speed vision platform, ER-HVP Vision, which is able to work entirely without a PC. In this new platform, an embedded CPU-based board is designed as a substitute for the PC, and a DSP and FPGA board is developed to implement image-parallel algorithms in the FPGA and image-sequential algorithms in the DSP. The whole platform thus fits in a compact 320 mm x 250 mm x 87 mm enclosure. Experimental results indicate that real-time detection and counting of a moving target at a frame rate of 200 fps at 512 x 512 pixels are feasible on this newly developed vision platform.

  18. GPU-Accelerated Parallel FDTD on Distributed Heterogeneous Platform

    Directory of Open Access Journals (Sweden)

    Ronglin Jiang

    2014-01-01

    Full Text Available This paper introduces a finite difference time domain (FDTD) code written in Fortran and CUDA for realistic electromagnetic calculations, with parallelization via the Message Passing Interface (MPI) and Open Multiprocessing (OpenMP). Since both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) resources are utilized, a faster execution speed can be reached compared to a traditional pure GPU code. In our experiments, 64 NVIDIA TESLA K20m GPUs and 64 INTEL XEON E5-2670 CPUs are used to carry out the pure CPU, pure GPU, and CPU + GPU tests. Relative to the pure CPU calculations for the same problems, the speedup ratio achieved by CPU + GPU calculations is around 14. Compared to the pure GPU calculations for the same problems, the CPU + GPU calculations show a 7.6%–13.2% performance improvement. Because of the small memory size of GPUs, the FDTD problem size is usually very small. However, this code can enlarge the maximum problem size by 25% without reducing the performance of a traditional pure GPU code. Finally, using this code, a microstrip antenna array with 16×18 elements is calculated and the radiation patterns are compared with those of MoM. Results show good agreement between them.
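
    For readers unfamiliar with the method, the kernel being parallelized here is a leapfrog update of staggered electric and magnetic fields. A minimal serial 1D sketch in normalized units (grid size, Courant factor, and source are illustrative, not taken from the paper); in the hybrid scheme described above, the grid would be decomposed into slabs advanced by MPI ranks, OpenMP threads, and CUDA kernels:

```python
import numpy as np

nx, nt = 400, 1000
ez = np.zeros(nx)        # electric field samples
hy = np.zeros(nx - 1)    # magnetic field, staggered half a cell

for n in range(nt):
    # update H from the spatial derivative of E (half step)
    hy += 0.5 * (ez[1:] - ez[:-1])
    # update interior E from the spatial derivative of H
    ez[1:-1] += 0.5 * (hy[1:] - hy[:-1])
    # soft source: inject a Gaussian pulse at the grid centre
    ez[nx // 2] += np.exp(-((n - 30) / 10.0) ** 2)
```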

  19. GPU-accelerated Gibbs ensemble Monte Carlo simulations of Lennard-Jonesium

    Science.gov (United States)

    Mick, Jason; Hailat, Eyad; Russo, Vincent; Rushaidat, Kamel; Schwiebert, Loren; Potoff, Jeffrey

    2013-12-01

    This work describes an implementation of canonical and Gibbs ensemble Monte Carlo simulations on graphics processing units (GPUs). The pair-wise energy calculations, which consume the majority of the computational effort, are parallelized using the energetic decomposition algorithm. While energetic decomposition is relatively inefficient for traditional CPU-bound codes, the algorithm is ideally suited to the architecture of the GPU. The performance of the CPU and GPU codes are assessed for a variety of CPU and GPU combinations for systems containing between 512 and 131,072 particles. For a system of 131,072 particles, the GPU-enabled canonical and Gibbs ensemble codes were 10.3 and 29.1 times faster (GTX 480 GPU vs. i5-2500K CPU), respectively, than an optimized serial CPU-bound code. Due to overhead from memory transfers from system RAM to the GPU, the CPU code was slightly faster than the GPU code for simulations containing less than 600 particles. The critical temperature Tc∗=1.312(2) and density ρc∗=0.316(3) were determined for the tail corrected Lennard-Jones potential from simulations of 10,000 particle systems, and found to be in exact agreement with prior mixed field finite-size scaling calculations [J.J. Potoff, A.Z. Panagiotopoulos, J. Chem. Phys. 109 (1998) 10914].
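
    The energetic decomposition mentioned above assigns the energy of one particle against all others to a single work unit, which is what maps naturally onto GPU threads. A serial NumPy sketch of that per-particle energy in reduced Lennard-Jones units (cutoff, box size, and particle count are illustrative):

```python
import numpy as np

def particle_energy(i, pos, box, rcut=2.5):
    """LJ energy of particle i with all others, minimum-image
    convention in a cubic periodic box, truncated at rcut."""
    d = pos - pos[i]
    d -= box * np.round(d / box)      # minimum image
    r2 = np.sum(d * d, axis=1)
    r2[i] = np.inf                    # exclude self-interaction
    inv6 = (1.0 / r2[r2 < rcut**2]) ** 3
    return np.sum(4.0 * (inv6**2 - inv6))

rng = np.random.default_rng(0)
box = 10.0
pos = rng.uniform(0.0, box, size=(512, 3))
# each pair is visited twice, hence the factor 1/2
total = 0.5 * sum(particle_energy(i, pos, box) for i in range(len(pos)))
```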

  20. Total Productive Maintenance at Paccar INC

    OpenAIRE

    Ştefan Farkas

    2010-01-01

    This paper reports the application of the total productive maintenance method at the Paccar Inc. truck plant in Victoria, Australia. The total productive maintenance method and the total productive maintenance house are presented. The global equipment effectiveness is computed and exemplified. The production structure and the organisation of maintenance are presented. Results of the variation of global equipment effectiveness and autonomous maintenance over a two-week period are reported.
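
    For reference, the global (overall) equipment effectiveness figure that the paper computes is conventionally the product of the availability, performance and quality rates; a minimal sketch with invented numbers:

```python
def overall_equipment_effectiveness(availability, performance, quality):
    """OEE as the product of the three standard rates."""
    return availability * performance * quality

# e.g. 90% uptime, 95% of ideal speed, 99% good parts (made-up values)
oee = overall_equipment_effectiveness(0.90, 0.95, 0.99)
print(f"OEE = {oee:.1%}")  # -> OEE = 84.6%
```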

  1. Cataract incidence after total-body irradiation

    International Nuclear Information System (INIS)

    Zierhut, D.; Lohr, F.; Schraube, P.; Huber, P.; Haas, R.; Hunstein, W.; Wannenmacher, M.

    1997-01-01

    Purpose: The aim of this retrospective study was to evaluate cataract incidence in a homogeneous group of patients after total-body irradiation followed by autologous bone marrow transplantation or peripheral blood stem cell transplantation. Method and Materials: Between 11/1982 and 6/1994, a total of 260 patients received total-body irradiation in our hospital for treatment of haematological malignancy. In 1996, 96 of these 260 patients were still alive. Of these, 85 surviving patients (52 men, 33 women) returned an evaluable questionnaire and could be examined ophthalmologically. The median age of these patients was 38.5 years (15 - 59 years) at the time of total-body irradiation. Radiotherapy was applied as hyperfractionated total-body irradiation with a median dose of 14.4 Gy in 12 fractions over 4 days. The minimum time between fractions was 4 hours, photons with an energy of 23 MeV were used, and the dose rate was 7 - 18 cGy/min. Results: Median follow-up is now 5.8 years (1.7 - 13 years). Cataract occurred in 28 of 85 patients after a median time of 47 months (1 - 104 months). In 6 of these 28 patients who developed a cataract, cataract surgery was performed. Whole-brain irradiation prior to total-body irradiation was more frequent in the group of patients developing a cataract (14.3% vs. 10.7% in the group without cataract). Conclusion: Cataract is a common side effect of total-body irradiation. The cataract incidence found in our patients is comparable to the results of other centres using a fractionated regimen for total-body irradiation. The hyperfractionated regimen used in our hospital obviously does not result in an even lower cataract incidence. In contrast to acute and late toxicity in other organs/organ systems, hyperfractionation of total-body irradiation does not further reduce toxicity for the eye lens. Dose rate may have more influence on cataract incidence.

  2. Optimization of scan time in MRI for total hip prostheses. SEMAC tailoring for prosthetic implants containing different types of metals

    Energy Technology Data Exchange (ETDEWEB)

    Deligianni, X. [University of Basel Hospital, Basel (Switzerland). Div. of Radiological Physics; Merian Iselin Klinik, Basel (Switzerland). Inst. of Radiology; Bieri, O. [University of Basel Hospital, Basel (Switzerland). Div. of Radiological Physics; Elke, R. [Orthomerian, Basel (Switzerland); Wischer, T.; Egelhof, T. [Merian Iselin Klinik, Basel (Switzerland). Inst. of Radiology

    2015-12-15

    Magnetic resonance imaging (MRI) of soft tissues after total hip arthroplasty is of clinical interest for the diagnosis of various pathologies that are usually invisible with other imaging modalities. As a result, considerable effort has been put into the development of metal artifact reduction MRI strategies, such as slice encoding for metal artifact correction (SEMAC). Generally, the degree of metal artifact reduction with SEMAC directly relates to the overall time spent for acquisition, but there is no specific consensus about the most efficient sequence setup depending on the implant material. The aim of this article is to suggest material-tailored SEMAC protocol settings. Five of the most common total hip prostheses (1. Revision prosthesis (S-Rom), 2. Titanium alloy, 3. Mueller type (CoNiCRMo alloy), 4. Old Charnley prosthesis (Exeter/Stryker), 5. MS-30 stem (stainless-steel)) were scanned on a 1.5 T MRI clinical scanner with a SEMAC sequence with a range of artifact-resolving slice encoding steps (SES: 2 - 23) along the slice direction (yielding a total variable scan time ranging from 1 to 10 min). The reduction of the artifact volume in comparison with maximal artifact suppression was evaluated both quantitatively and qualitatively in order to establish a recommended number of steps for each case. The number of SES that reduced the artifact volume below approximately 300 mm³ ranged from 3 to 13, depending on the material. Our results showed that although 3 SES steps can be sufficient for artifact reduction for titanium prostheses, at least 11 SES should be used for prostheses made of materials such as certain alloys of stainless steel. Tailoring SES to the implant material and to the desired degree of metal artifact reduction represents a simple tool for workflow optimization of SEMAC imaging near total hip arthroplasty in a clinical setting.

  3. Optimization of scan time in MRI for total hip prostheses. SEMAC tailoring for prosthetic implants containing different types of metals

    International Nuclear Information System (INIS)

    Deligianni, X.; Wischer, T.; Egelhof, T.

    2015-01-01

    Magnetic resonance imaging (MRI) of soft tissues after total hip arthroplasty is of clinical interest for the diagnosis of various pathologies that are usually invisible with other imaging modalities. As a result, considerable effort has been put into the development of metal artifact reduction MRI strategies, such as slice encoding for metal artifact correction (SEMAC). Generally, the degree of metal artifact reduction with SEMAC directly relates to the overall time spent for acquisition, but there is no specific consensus about the most efficient sequence setup depending on the implant material. The aim of this article is to suggest material-tailored SEMAC protocol settings. Five of the most common total hip prostheses (1. Revision prosthesis (S-Rom), 2. Titanium alloy, 3. Mueller type (CoNiCRMo alloy), 4. Old Charnley prosthesis (Exeter/Stryker), 5. MS-30 stem (stainless-steel)) were scanned on a 1.5 T MRI clinical scanner with a SEMAC sequence with a range of artifact-resolving slice encoding steps (SES: 2 - 23) along the slice direction (yielding a total variable scan time ranging from 1 to 10 min). The reduction of the artifact volume in comparison with maximal artifact suppression was evaluated both quantitatively and qualitatively in order to establish a recommended number of steps for each case. The number of SES that reduced the artifact volume below approximately 300 mm³ ranged from 3 to 13, depending on the material. Our results showed that although 3 SES steps can be sufficient for artifact reduction for titanium prostheses, at least 11 SES should be used for prostheses made of materials such as certain alloys of stainless steel. Tailoring SES to the implant material and to the desired degree of metal artifact reduction represents a simple tool for workflow optimization of SEMAC imaging near total hip arthroplasty in a clinical setting.

  4. Utilizing a multiprocessor architecture - The performance of MIDAS

    International Nuclear Information System (INIS)

    Maples, C.; Logan, D.; Meng, J.; Rathbun, W.; Weaver, D.

    1983-01-01

    The MIDAS architecture organizes multiple CPUs into clusters called distributed subsystems. Each subsystem consists of an array of processors controlled by a supervisory CPU. The multiprocessor array is composed of commercial CPUs (with floating point hardware) and specialized processing elements. Interprocessor communication within the array may occur either through switched memory modules or common shared memory. The architecture permits multiple processors to be focused on single problems. A distributed subsystem has been constructed and tested. It currently consists of a supervisor CPU; 16 blocks of independently switchable memory; 9 general purpose, VAX-class CPUs; and 2 specialized pipelined processors to handle I/O. Results on a variety of problems indicate that the subsystem performs 8 to 15 times faster than a standard computer with an identical CPU. The difference in performance represents the effect of differing CPU and I/O requirements.

  5. Simulation of time-dependent free-surface Navier-Stokes flows

    International Nuclear Information System (INIS)

    Muldowney, G.P.

    1989-01-01

    Two numerical methods for simulation of time-dependent free-surface Navier-Stokes flows are developed. Both techniques are based on semi-implicit time advancement of the momentum equations, integral formulation of the spatial problem at each timestep, and spectral-element discretization to solve the resulting integral equation. Central to each algorithm is a boundary-specific solution step which permits the spatial treatment in two dimensions to be performed in O(N³) operations per timestep despite the presence of deforming geometry. The first approach is a domain-integral formulation involving integrals over the entire flow domain of kernel functions which arise in time-differencing the Navier-Stokes equations. The second is a particular-solution formulation which replaces domain integration with an iterative scheme to generate particular velocity and pressure fields on individual elements, followed by a patching step to produce a particular solution continuous over the full domain. Two of the most difficult aspects of viscous free-surface flow simulations, namely time-dependent geometry and nontrivial boundary conditions, are well accommodated by these integral equation techniques. In addition the methods offer spectral accuracy in space and admit arbitrarily high-order discretization in time. For large-scale computations and/or long-term time advancement the domain-integral algorithm must be executed on a supercomputer to deliver results in reasonable processing time. A detailed simulation of gas-liquid flow with full resolution of the free phase boundary requires approximately five CPU hours at 80 megaflops.

  6. Exact and Heuristic Solutions to Minimize Total Waiting Time in the Blood Products Distribution Problem

    Directory of Open Access Journals (Sweden)

    Amir Salehipour

    2012-01-01

    Full Text Available This paper presents a novel application of operations research to support decision making in blood distribution management. The rapidly and dynamically increasing demand, the criticality of the product, storage, handling, and distribution requirements, and the different geographical locations of hospitals and medical centers have made blood distribution a complex and important problem. In this study, a real blood distribution problem involving 24 hospitals was tackled by the authors, and an exact approach is presented. The objective of the problem is to distribute blood and its products among hospitals and medical centers such that the total waiting time of those requiring the product is minimized. Following the exact solution, a hybrid heuristic algorithm is proposed. Computational experiments showed that optimal solutions could be obtained for medium-size instances, while for larger instances the proposed hybrid heuristic is very competitive.
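
    As a toy illustration of the objective (not the authors' model), with a single delivery vehicle and deterministic service times, sequencing requests in shortest-time-first order minimizes the sum of waiting times, the single-machine SPT rule:

```python
def total_waiting_time(order, service):
    """Sum of waiting times when requests are served back to back."""
    waited, clock = 0, 0
    for hospital in order:
        clock += service[hospital]   # request fulfilled at `clock`
        waited += clock
    return waited

service = {"H1": 40, "H2": 10, "H3": 25}      # invented service times
spt = sorted(service, key=service.get)        # ['H2', 'H3', 'H1']
print(spt, total_waiting_time(spt, service))  # 120, the minimum
```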

  7. Total Productive Maintenance at Paccar INC

    Directory of Open Access Journals (Sweden)

    Ştefan Farkas

    2010-06-01

    Full Text Available This paper reports the application of the total productive maintenance method at the Paccar Inc. truck plant in Victoria, Australia. The total productive maintenance method and the total productive maintenance house are presented. The global equipment effectiveness is computed and exemplified. The production structure and the organisation of maintenance are presented. Results of the variation of global equipment effectiveness and autonomous maintenance over a two-week period are reported.

  8. Total employment effect of biofuels

    International Nuclear Information System (INIS)

    Stridsberg, S.

    1998-08-01

    The study examined the total employment effect of both direct production of biofuel and energy conversion to heat and electricity, as well as the indirect employment effect arising from investments and other activities in conjunction with the production organization. A secondary effect due to the increased capital flow is also included in the final result. The scenarios are based on two periods, 1993-2005 and 2005-2020. In the present study, the different fuels and the different applications have been analyzed individually with regard to direct and indirect employment within each separate sector. The greatest employment effect in the production chain is shown for logging residues, with 290 full-time jobs/TWh, whereas other biofuels range between 80 and 280 full-time jobs/TWh. In the processing chain, the corresponding range is 200-300 full-time jobs per each additional TWh. On top of this come secondary effects, giving a total of 650 full-time jobs/TWh. Together with the predicted increase, this suggests that unprocessed fuel will provide an additional 16,000 annual full-time jobs, and that fuel processing will contribute a further 5,000 full-time jobs. Energy production from the fuels will provide an additional 13,000 full-time jobs. The total figure of 34,000 annual full-time jobs must then be reduced by about 4,000 on account of lost jobs, mainly in the oil sector and to some extent in imports of biofuel. In addition, the anticipated increase in capital turnover within the biofuel sector will increase full-time jobs up to year 2020. Finally, the accomplishment of the programmes anticipated by the scenario is discussed, where it is noted that processing of biofuel into wafers, pellets or powder places major demands on access to raw material of good quality, and that agrarian fuels must be given priority if they are to enter the system sufficiently fast. Straw is already a resource but is still not accepted by ...

  9. Brake response time before and after total knee arthroplasty: a prospective cohort study

    Directory of Open Access Journals (Sweden)

    Niederseer David

    2010-11-01

    Full Text Available Abstract Background Although the numbers of total knee arthroplasty (TKA) are increasing, there are only a small number of studies investigating driving safety after TKA. The parameter 'Brake Response Time' (BRT) is one of the most important criteria for driving safety and was therefore chosen for investigation. The present study was conducted to test the hypotheses that patients with right- or left-sided TKA show a significant increase in BRT from pre-operative (pre-op, 1 day before surgery) to post-operative (post-op, 2 weeks post surgery), and a significant decrease in BRT from post-op to the follow-up investigation (FU, 8 weeks post surgery). Additionally, it was hypothesized that the BRT of patients after TKA is significantly higher than that of healthy controls. Methods 31 of 70 consecutive patients (mean age 65.7 +/- 10.2 years) receiving TKA were tested for their BRT pre-op, post-op and at FU. BRT was assessed using a custom-made driving simulator. We used normative BRT data from 31 healthy controls for comparison. Results There were no significant increases between pre-op and post-op BRT values for patients who had undergone left- or right-sided TKA. Even the proportion of patients above a BRT threshold of 700 ms was not significantly increased post-op. Controls had a BRT which was significantly better than the BRT of patients with right- or left-sided TKA at all three time points. Conclusion The present study showed a small and insignificant postoperative increase in the BRT of patients who had undergone right- or left-sided TKA. Therefore, we believe it is not justified to impair the patient's quality of social and occupational life post-surgery by imposing restrictions on driving motor vehicles beyond an interval of two weeks after surgery.

  10. A New Generation of Real-Time Systems in the JET Tokamak

    Science.gov (United States)

    Alves, Diogo; Neto, Andre C.; Valcarcel, Daniel F.; Felton, Robert; Lopez, Juan M.; Barbalace, Antonio; Boncagni, Luca; Card, Peter; De Tommasi, Gianmaria; Goodyear, Alex; Jachmich, Stefan; Lomas, Peter J.; Maviglia, Francesco; McCullen, Paul; Murari, Andrea; Rainford, Mark; Reux, Cedric; Rimini, Fernanda; Sartori, Filippo; Stephen, Adam V.; Vega, Jesus; Vitelli, Riccardo; Zabeo, Luca; Zastrow, Klaus-Dieter

    2014-04-01

    Recently, a new recipe for developing and deploying real-time systems has become increasingly adopted in the JET tokamak. Powered by the advent of x86 multi-core technology and the reliability of JET's well established Real-Time Data Network (RTDN) to handle all real-time I/O, an official Linux vanilla kernel has been demonstrated to be able to provide real-time performance to user-space applications that are required to meet stringent timing constraints. In particular, a careful rearrangement of the Interrupt ReQuest (IRQ) affinities together with the kernel's CPU isolation mechanism allows one to obtain either soft or hard real-time behavior depending on the synchronization mechanism adopted. Finally, the Multithreaded Application Real-Time executor (MARTe) framework is used for building applications particularly optimised for exploiting multi-core architectures. In the past year, four new systems based on this philosophy have been installed and are now part of JET's routine operation. The focus of the present work is on the configuration aspects that enable these new systems' real-time capability. Details are given about the common real-time configuration of these systems, followed by a brief description of each system together with results regarding their real-time performance. A cycle-time jitter analysis of a user-space MARTe based application synchronizing over a network is also presented. The goal is to compare its deterministic performance while running on a vanilla and on a Messaging Real-time Grid (MRG) Linux kernel.
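
    The CPU-isolation recipe sketched above amounts to reserving cores at boot (e.g. via the isolcpus= kernel parameter), steering IRQ affinities away from them, and pinning the real-time task onto them. A minimal Linux-only sketch, with the core number as an assumption:

```python
import os

ISOLATED_CORE = 3  # assumed to be listed in isolcpus= at boot

# Pin the calling thread to the isolated core; with IRQ affinities
# steered to other cores, little else can preempt the control loop.
os.sched_setaffinity(0, {ISOLATED_CORE})

for cycle in range(1000):   # stand-in for the real-time cycle loop
    pass                    # read inputs, compute, publish outputs
```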

  11. Near real-time digital holographic microscope based on GPU parallel computing

    Science.gov (United States)

    Zhu, Gang; Zhao, Zhixiong; Wang, Huarui; Yang, Yan

    2018-01-01

    A transmission near real-time digital holographic microscope with in-line and off-axis light paths is presented, in which parallel computing technology based on the compute unified device architecture (CUDA) is combined with digital holographic microscopy. Compared to other holographic microscopes, which have to perform reconstruction in multiple focal planes and are therefore time-consuming, the reconstruction speed of the near real-time digital holographic microscope can be greatly improved with CUDA-based parallel computing, so it is especially suitable for measurements of particle fields at the micrometer and nanometer scale. Simulations and experiments show that the proposed transmission digital holographic microscope can accurately measure and display the velocity of a particle field at the micrometer scale, with an average velocity error lower than 10%. With the graphics processing unit (GPU), the computing time for the 100 reconstruction planes (512×512 grids) is lower than 120 ms, whereas it is 4.9 s using the traditional CPU-based reconstruction method. The reconstruction speed has thus been raised by a factor of 40. In other words, the system can handle holograms at 8.3 frames per second, realizing near real-time measurement and display of the particle velocity field. Real-time three-dimensional reconstruction of the particle velocity field is expected to be achieved by further optimization of software and hardware.
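
    The per-plane workload that CUDA accelerates here is essentially two FFTs and a pointwise multiply. A single-plane angular-spectrum sketch in NumPy (wavelength, pixel pitch, and propagation distance are illustrative assumptions; the GPU version would repeat this for each of the 100 planes):

```python
import numpy as np

def angular_spectrum(u0, wavelength, dx, z):
    """Propagate a complex field u0 a distance z (all units metres)."""
    n = u0.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    fxx, fyy = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - fxx**2 - fyy**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))  # drop evanescent
    return np.fft.ifft2(np.fft.fft2(u0) * np.exp(1j * kz * z))

hologram = np.random.rand(512, 512)   # stand-in for a recorded frame
field = angular_spectrum(hologram, 632.8e-9, 3.45e-6, 5e-3)
intensity = np.abs(field) ** 2
```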

  12. Computing the Density Matrix in Electronic Structure Theory on Graphics Processing Units.

    Science.gov (United States)

    Cawkwell, M J; Sanville, E J; Mniszewski, S M; Niklasson, Anders M N

    2012-11-13

    The self-consistent solution of a Schrödinger-like equation for the density matrix is a critical and computationally demanding step in quantum-based models of interatomic bonding. This step was tackled historically via diagonalization of the Hamiltonian. We have investigated the performance and accuracy of the second-order spectral projection (SP2) algorithm for the computation of the density matrix via a recursive expansion of the Fermi operator in a series of generalized matrix-matrix multiplications. We demonstrate that owing to its simplicity, the SP2 algorithm [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] is exceptionally well suited to implementation on graphics processing units (GPUs). The performance in double and single precision arithmetic of a hybrid GPU/central processing unit (CPU) and a full GPU implementation of the SP2 algorithm exceeds that of a CPU-only implementation of the SP2 algorithm and traditional matrix diagonalization when the dimensions of the matrices exceed about 2000 × 2000. Padding schemes for arrays allocated in the GPU memory that optimize the performance of the CUBLAS implementations of the level 3 BLAS DGEMM and SGEMM subroutines for generalized matrix-matrix multiplications are described in detail. The analysis of the relative performance of the hybrid CPU/GPU and full GPU implementations indicates that the transfer of arrays between the GPU and CPU constitutes only a small fraction of the total computation time. The errors measured in the self-consistent density matrices computed using the SP2 algorithm are generally smaller than those measured in matrices computed via diagonalization. Furthermore, the errors in the density matrices computed using the SP2 algorithm do not exhibit any dependence on system size, whereas the errors increase linearly with the number of orbitals when diagonalization is employed.
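
    A compact NumPy rendering of the SP2 recursion described above, with the CUBLAS GEMMs replaced by the @ operator and Gershgorin circles supplying the spectral bounds. This is a sketch of the published algorithm, not the authors' code:

```python
import numpy as np

def sp2_density_matrix(h, n_occ, tol=1e-10, max_iter=100):
    """Density matrix from Hamiltonian h via SP2 purification."""
    radii = np.sum(np.abs(h), axis=1) - np.abs(np.diag(h))
    e_min = np.min(np.diag(h) - radii)   # Gershgorin lower bound
    e_max = np.max(np.diag(h) + radii)   # Gershgorin upper bound
    # Linear map: occupied eigenvalues near 1, virtual near 0.
    x = (e_max * np.eye(h.shape[0]) - h) / (e_max - e_min)
    for _ in range(max_iter):
        x2 = x @ x                       # the GEMM dominating the cost
        tr, tr2 = np.trace(x), np.trace(x2)
        # choose the branch that moves the trace toward n_occ
        if abs(tr2 - n_occ) < abs(2.0 * tr - tr2 - n_occ):
            x = x2
        else:
            x = 2.0 * x - x2
        if abs(np.trace(x) - n_occ) < tol:
            break
    return x
```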

  13. Stability analysis by ERATO code

    International Nuclear Information System (INIS)

    Tsunematsu, Toshihide; Takeda, Tatsuoki; Matsuura, Toshihiko; Azumi, Masafumi; Kurita, Gen-ichi

    1979-12-01

    Problems in MHD stability calculations with the ERATO code are described, concerning the convergence properties of results, equilibrium codes, and machine optimization of the ERATO code. It is concluded that irregularity in a convergence curve is not due to a fault of the ERATO code itself but to an inappropriate choice of the equilibrium calculation meshes. Also described are a code to calculate an equilibrium as a quasi-inverse problem and a code to calculate an equilibrium as the result of a transport process. Optimization of the code with respect to I/O operations reduced both CPU time and I/O time considerably. With the FACOM230-75 APU/CPU multiprocessor system, the performance is about 6 times as high as with the FACOM230-75 CPU alone, showing the effectiveness of a vector processing computer for this kind of MHD computation. This report is a summary of the material presented at the ERATO workshop 1979 (ORNL), supplemented with some details. (author)

  14. On localization attacks against cloud infrastructure

    Science.gov (United States)

    Ge, Linqiang; Yu, Wei; Sistani, Mohammad Ali

    2013-05-01

    One of the key characteristics of cloud computing is device and location independence, which enables users to access systems regardless of their location. Because cloud computing relies heavily on resource sharing, it is vulnerable to cyber attacks. In this paper, we investigate a localization attack that enables the adversary to leverage central processing unit (CPU) resources to localize the physical location of the server used by victims. By increasing and reducing CPU usage through a malicious virtual machine (VM), the response time from the victim VM will increase and decrease correspondingly. In this way, by embedding a probing signal into the CPU usage and correlating the same pattern in the response time from the victim VM, the adversary can find the location of the victim VM. To determine attack accuracy, we investigate features in both the time and frequency domains. We conduct both theoretical and experimental studies to demonstrate the effectiveness of such an attack.
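
    The detection step reduces to correlating the embedded load pattern with the observed response-time series. A toy time-domain sketch, with invented signal shapes and noise level:

```python
import numpy as np

rng = np.random.default_rng(1)
probe = np.tile([1.0] * 20 + [0.0] * 20, 10)   # embedded load pattern
# victim response times pick up the pattern only on shared hardware
response = 5.0 + 0.8 * probe + rng.normal(0.0, 0.3, probe.size)

# normalized cross-correlation at zero lag: near 1 if co-located
p = (probe - probe.mean()) / probe.std()
r = (response - response.mean()) / response.std()
print(f"correlation = {np.mean(p * r):.2f}")
```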

  15. Caffe con Troll: Shallow Ideas to Speed Up Deep Learning.

    Science.gov (United States)

    Hadjis, Stefan; Abuzaid, Firas; Zhang, Ce; Ré, Christopher

    2015-01-01

    We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve a 4.5× throughput improvement over Caffe on popular networks like CaffeNet. Moreover, with these improvements, the end-to-end training time for CNNs is directly proportional to the FLOPS delivered by the CPU, which enables us to efficiently train hybrid CPU-GPU systems for CNNs.

  16. Massively Parallel and Scalable Implicit Time Integration Algorithms for Structural Dynamics

    Science.gov (United States)

    Farhat, Charbel

    1997-01-01

    Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because of the following additional facts: (a) explicit schemes are easier to parallelize than implicit ones, and (b) explicit schemes induce short range interprocessor communications that are relatively inexpensive, while the factorization methods used in most implicit schemes induce long range interprocessor communications that often ruin the sought-after speed-up. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet be offset by the speed of the currently available parallel hardware. Therefore, it is essential to develop efficient alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating the low-frequency dynamics of aerospace structures.

  17. A moving image system for cardiovascular nuclear medicine. A dedicated auxiliary device for the total capacity imaging system for multiple plane dynamic colour display

    International Nuclear Information System (INIS)

    Iio, M.; Toyama, H.; Murata, H.; Takaoka, S.

    1981-01-01

    The recent device of the authors, a dedicated multiplane dynamic colour image display system for nuclear medicine, is discussed. This new device is a hardware-based auxiliary moving image system (AMIS) attached to the total capacity image processing system of the authors' department. The major purpose of this study is to develop a dedicated device so that cardiovascular nuclear medicine and other dynamic studies gain the ability to perform delicate real-time processing such as colour selection, edge detection and phase analysis. The auxiliary system consists of an interface for image transfer, four IC refresh memories of 64x64 matrix with 10-bit count depth, a digital 20-in colour TV monitor, a control keyboard and a control panel with potentiometers. The system has five major functions for colour display: (1) a microcomputer board can select any one of 40 different colour tables preset in the colour transformation RAM; this key also provides edge detection at a given count level by leaving the optional colour and setting the rest of the levels to 0 (black); (2) the arithmetic processing circuit performs the fundamental arithmetic operations, permitting arithmetic processing of two images; (3) the colour level control circuit is operated independently by four potentiometers for the four refresh image memories, so that the gain and offset of the colour level can be manually and visually controlled to the satisfaction of the operator; (4) simultaneous CRT display of up to four images, with or without cinematic motion, is possible; (5) the real-time movie interval is also adjustable in hardware, and selected frames can be frozen with overlapping of the dynamic frames. Since AMIS is linked with the whole capacity image processing system (CPU size 128 kwords), clinical applications are not limited to cardiovascular nuclear medicine. (author)

  18. A massively parallel GPU-accelerated model for analysis of fully nonlinear free surface waves

    DEFF Research Database (Denmark)

    Engsig-Karup, Allan Peter; Madsen, Morten G.; Glimberg, Stefan Lemvig

    2011-01-01

    -storage flexible-order accurate finite difference method that is known to be efficient and scalable on a CPU core (single thread). To achieve parallel performance of the relatively complex numerical model, we investigate a new trend in high-performance computing where many-core GPUs are utilized as high-throughput co-processors to the CPU. We describe and demonstrate how this approach makes it possible to do fast desktop computations for large nonlinear wave problems in numerical wave tanks (NWTs) with close to 50/100 million total grid points in double/single precision with 4 GB global device memory available. A new code base has been developed in C++ and compute unified device architecture C and is found to improve the runtime by more than an order of magnitude in double precision arithmetic for the same accuracy over an existing CPU (single thread) Fortran 90 code when executed on a single modern GPU ...

  19. A Flexible Job Shop Scheduling Problem with Controllable Processing Times to Optimize Total Cost of Delay and Processing

    Directory of Open Access Journals (Sweden)

    Hadi Mokhtari

    2015-11-01

    Full Text Available In this paper, the flexible job shop scheduling problem with machine flexibility and controllable processing times is studied. The main idea is that the processing times of operations may be controlled by the consumption of additional resources. The purpose of this paper is to find the best trade-off between processing cost and delay cost in order to minimize total costs. The proposed model, flexible job shop scheduling with controllable processing times (FJCPT), is formulated as an integer non-linear programming (INLP) model and then converted into an integer linear programming (ILP) model. Due to the NP-hardness of FJCPT, conventional analytic optimization methods are not efficient. Hence, in order to solve the problem, a Scatter Search (SS), an efficient metaheuristic method, is developed. To show the effectiveness of the proposed method, numerical experiments are conducted. The efficiency of the proposed algorithm is compared with that of a genetic algorithm (GA) available in the literature for solving the FJSP. The results showed that the proposed SS provides better solutions than the existing GA.
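
    The cost trade-off being optimized can be made concrete with a toy evaluation function: compressing processing times consumes paid resources but reduces tardiness. All numbers and the linear compression-cost model below are invented for illustration:

```python
def total_cost(ops, compression, due, delay_penalty):
    """ops: (normal_time, unit_compression_cost) per operation;
    compression: time units removed from each operation."""
    clock, processing_cost = 0, 0.0
    for (t, c), x in zip(ops, compression):
        clock += t - x                 # compressed processing time
        processing_cost += c * x       # resource cost of compression
    tardiness = max(0, clock - due)
    return processing_cost + delay_penalty * tardiness

ops = [(5, 2.0), (7, 1.5), (4, 3.0)]   # three operations of one job
print(total_cost(ops, [0, 0, 0], due=12, delay_penalty=4.0))  # 16.0
print(total_cost(ops, [2, 2, 0], due=12, delay_penalty=4.0))  # 7.0
```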

  20. SU-E-T-423: Fast Photon Convolution Calculation with a 3D-Ideal Kernel On the GPU

    Energy Technology Data Exchange (ETDEWEB)

    Moriya, S; Sato, M [Komazawa University, Setagaya, Tokyo (Japan); Tachibana, H [National Cancer Center Hospital East, Kashiwa, Chiba (Japan)

    2015-06-15

    Purpose: Calculation time is the trade-off for improving the accuracy of convolution dose calculation with fine calculation spacing of the KERMA kernel. We investigated accelerating the convolution calculation using an ideal kernel on Graphics Processing Units (GPUs). Methods: The calculation was performed on AMD Dual FirePro D700 graphics hardware, and our algorithm was implemented using Aparapi, which converts Java bytecode to OpenCL. The dose calculation process was separated into TERMA and KERMA steps, in which the dose deposited at each coordinate (x, y, z) is determined. In the dose calculation running on the central processing unit (CPU), an Intel Xeon E5, the calculation loops were performed over all calculation points. In the GPU computation, all of the calculation processes for the points were sent to the GPU for multi-threaded computation. In this study, the dose calculation was performed in a water-equivalent homogeneous phantom with 150³ voxels (2 mm calculation grid); the calculation speed on the GPU was compared to that on the CPU, and the accuracy of the PDD was evaluated. Results: The calculation times for the GPU and the CPU were 3.3 s and 4.4 h, respectively. The calculation speed on the GPU was thus 4800 times faster than on the CPU. The PDD curve for the GPU perfectly matched that for the CPU. Conclusion: The convolution calculation with the ideal kernel on the GPU was clinically acceptable in terms of computation time and may be more accurate in inhomogeneous regions. Intensity modulated arc therapy needs dose calculations for different gantry angles at many control points. Thus, it would be more practical for the kernel to use a coarse-spacing technique if the calculation is faster while keeping accuracy similar to a current treatment planning system.
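
    Schematically, the computation is a 3D convolution of a TERMA distribution with a dose-deposition kernel; the GPU work described above parallelizes exactly this step. A CPU-side sketch with a toy beam and an isotropic Gaussian standing in for the ideal kernel (all values invented):

```python
import numpy as np
from scipy.signal import fftconvolve

grid = (64, 64, 64)
terma = np.zeros(grid)
terma[28:36, 28:36, :] = 1.0                  # crude 8x8 beam along z
terma *= np.exp(-0.05 * np.arange(grid[2]))   # exponential attenuation

x = np.arange(-8, 9)
xx, yy, zz = np.meshgrid(x, x, x, indexing="ij")
kernel = np.exp(-(xx**2 + yy**2 + zz**2) / 8.0)  # toy deposition kernel
kernel /= kernel.sum()

dose = fftconvolve(terma, kernel, mode="same")   # the step moved to GPU
```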

  1. Time- and radiation-dose dependent changes in the plasma proteome after total body irradiation of non-human primates: Implications for biomarker selection.

    Directory of Open Access Journals (Sweden)

    Stephanie D Byrum

    Full Text Available Acute radiation syndrome (ARS) is a complex multi-organ disease resulting from total body exposure to high doses of radiation. Individuals can be exposed to total body irradiation (TBI) in a number of ways, including terrorist radiological weapons or nuclear accidents. In order to determine whether an individual has been exposed to high doses of radiation and needs countermeasure treatment, robust biomarkers are needed to estimate radiation exposure from biospecimens such as blood or urine. In order to identify such candidate biomarkers of radiation exposure, high-resolution proteomics was used to analyze plasma from non-human primates following whole body irradiation (Co-60, at 6.7 Gy and 7.4 Gy) with a twelve-day observation period. A total of 663 proteins were evaluated from the plasma proteome analysis. A panel of plasma proteins with characteristic time- and dose-dependent changes was identified. In addition to the plasma proteomics study reported here, we recently identified candidate biomarkers using urine from these same non-human primates. From the proteomic analysis of both plasma and urine, we identified ten overlapping proteins that significantly differentiate both time and dose variables. These shared plasma and urine proteins represent optimal candidate biomarkers of radiation exposure.

  2. Evaluating performance of MARTe as a real-time framework for feed-back control system at tokamak device

    Energy Technology Data Exchange (ETDEWEB)

    Yun, Sangwon; Lee, Woongryol; Lee, Taegu; Park, Mikyung; Lee, Sangil [National Fusion Research Institute (NFRI), Gwahangno 169-148, Yuseong-Gu, Daejeon 305-806 (Korea, Republic of); Neto, André C. [Associação EURATOM/IST, Instituto de Plasmas e Fusão Nuclear, Instituto Superior Técnico, Universidade Técnica de Lisboa, P-1049-001 Lisboa (Portugal); Wallander, Anders [ITER Organization, Route de Vinon sur Verdon, 13115 St Paul Lez Durance (France); Kim, Young-Kuk, E-mail: ykim@cnu.ac.kr [Chungnam National University, Daejeon 305-764 (Korea, Republic of)

    2013-10-15

    Highlights: •We measured the performance of MARTe by measuring response time and jitter. •We compared the performance of the application with and without MARTe. •We compared the performance of the MARTe application on different O/Ss. -- Abstract: The Korea Superconducting Tokamak Advanced Research (KSTAR) device is performing the task of “Demonstration and Evaluation of ITER CODAC Technologies at KSTAR”, whose objective is the evaluation of real-time technologies for decision making on real-time operating systems (RTOS), real-time frameworks and 10 GbE networks. In this task, the Multi-threaded Application Real-Time executor (MARTe) has been evaluated as a real-time framework for a real-time feedback control system. The performance of MARTe has been verified by measuring response time and jitter along a feedback-control path from an analog input of a monitoring system to an analog output of an actuator system. In addition, the evaluation has been performed in terms of the applicability of MARTe and its performance depending on the type of operating system and the tuning of CPU affinity and priority. This paper describes an overview of MARTe as a real-time framework, the results of the performance evaluation, and its implementation.

  3. Evaluating performance of MARTe as a real-time framework for feed-back control system at tokamak device

    International Nuclear Information System (INIS)

    Yun, Sangwon; Lee, Woongryol; Lee, Taegu; Park, Mikyung; Lee, Sangil; Neto, André C.; Wallander, Anders; Kim, Young-Kuk

    2013-01-01

    Highlights: •We measured the performance of MARTe by measuring response time and jitter. •We compared the performance of the application with and without MARTe. •We compared the performance of the MARTe application on different O/Ss. -- Abstract: The Korea Superconducting Tokamak Advanced Research (KSTAR) device is performing the task of “Demonstration and Evaluation of ITER CODAC Technologies at KSTAR”, whose objective is the evaluation of real-time technologies for decision making on real-time operating systems (RTOS), real-time frameworks and 10 GbE networks. In this task, the Multi-threaded Application Real-Time executor (MARTe) has been evaluated as a real-time framework for a real-time feedback control system. The performance of MARTe has been verified by measuring response time and jitter along a feedback-control path from an analog input of a monitoring system to an analog output of an actuator system. In addition, the evaluation has been performed in terms of the applicability of MARTe and its performance depending on the type of operating system and the tuning of CPU affinity and priority. This paper describes an overview of MARTe as a real-time framework, the results of the performance evaluation, and its implementation.

  4. Real time data analysis with the ATLAS Trigger at the LHC in Run-2

    CERN Document Server

    Beauchemin, Pierre-Hugues; The ATLAS collaboration

    2018-01-01

    The trigger selection capabilities of the ATLAS detector have been significantly enhanced for LHC Run-2 in order to cope with the higher event rates and with the large number of simultaneous interactions (pile-up) per proton-proton bunch crossing. A new hardware system, designed to analyse real-time event topologies at Level-1, came into full use in 2017. A hardware-based track reconstruction system, expected to be used in real time in 2018, is designed to provide track information to the high-level software trigger at its full input rate. The high-level trigger selections rely largely on offline-like reconstruction techniques and, in some cases, multivariate analysis methods. Despite the sudden change in LHC operations during the second half of 2017, which caused an increase in pile-up and therefore also in the CPU usage of the trigger algorithms, the set of triggers (the so-called trigger menu) running online has undergone only minor modifications thanks to the robustness and redundancy of the trigger system, a...

  5. Minimizing cache misses in an event-driven network server: A case study of TUX

    DEFF Research Database (Denmark)

    Bhatia, Sapan; Consel, Charles; Lawall, Julia Laetitia

    2006-01-01

    We analyze the performance of CPU-bound network servers and demonstrate experimentally that the degradation in the performance of these servers under high-concurrency workloads is largely due to inefficient use of the hardware caches. We then describe an approach to speeding up event-driven network servers by optimizing their use of the L2 CPU cache in the context of the TUX Web server, known for its robustness to heavy load. Our approach is based on a novel cache-aware memory allocator and a specific scheduling strategy that together ensure that the total working data set of the server stays ...

  6. Influence of overall treatment time in a fractionated total lymphoid irradiation as an immunosuppressive therapy in allogeneic bone marrow transplantation in mice

    International Nuclear Information System (INIS)

    Waer, M.; Ang, K.K.; Vandeputte, M.; Van der Schueren, E.

    1982-01-01

    Three groups of C57BL/Ka mice received total lymphoid irradiation (TLI) at a total dose of 34 Gy in three different fractionation schedules. The tolerance of all the schedules was excellent. No differences in the peripheral white blood cell and lymphocyte counts, or in the degree of immunosuppression as measured by phytohaemagglutinin- or concanavalin A-induced blastogenesis and the mixed lymphocyte reaction, were observed at the end of the treatment and up to 200 days. When bone marrow transplantation was performed one day after the end of each schedule, chimerism without signs of graft-versus-host disease was induced in all the groups. However, from the results in a limited number of animals it seems that concentrated schedules were less effective for chimerism induction. It has been demonstrated that it is possible to reduce drastically the overall treatment time for TLI before bone marrow transplantation. Further investigations are necessary in order to determine the optimal time-dose-fractionation factors and the different parameters involved in the transplantation.

  7. Improvement of the real-time processor in JT-60 data processing system

    International Nuclear Information System (INIS)

    Sakata, S.; Kiyono, K.; Sato, M.; Kominato, T.; Sueoka, M.; Hosoyama, H.; Kawamata, Y.

    2009-01-01

    The real-time processor (RTP) is a basic subsystem of the JT-60 data processing system and plays an important role in JT-60 feedback control for plasma experiments. During an experiment, the RTP acquires various diagnostic signals, processes them into physical values, and transfers them as sensor signals to the particle supply and heating control supervisor for feedback control, via reflective memory synchronized with 1 ms clock signals. After the start of RTP operation in 1997, to meet the demands of advanced plasma experiments, the RTP was improved continuously, for example by the addition of diagnostic signals with faster digitizers and by reducing data transfer time through the use of reflective memory instead of CAMAC. However, it is becoming increasingly difficult to maintain, manage, and improve the outdated RTP with its limited CPU capability. Currently, a prototype RTP system is being developed for the next real-time processing system, composed of a clustered system of VxWorks computers. The processes of the existing RTP system will be decentralized to the VxWorks computers to solve the issues of the existing system. The prototype RTP system will start to operate in August 2008.

  8. Total-factor energy efficiency in developing countries

    International Nuclear Information System (INIS)

    Zhang Xingping; Cheng Xiaomei; Yuan Jiahai; Gao Xiaojun

    2011-01-01

    This paper uses a total-factor framework to investigate energy efficiency in 23 developing countries during the period of 1980-2005. We explore the total-factor energy efficiency and change trends by applying data envelopment analysis (DEA) window, which is capable of measuring efficiency in cross-sectional and time-varying data. The empirical results indicate that Botswana, Mexico and Panama perform the best in terms of energy efficiency, whereas Kenya, Sri Lanka, Syria and the Philippines perform the worst during the entire research period. Seven countries show little change in energy efficiency over time. Eleven countries experienced continuous decreases in energy efficiency. Among five countries witnessing continuous increase in total-factor energy efficiency, China experienced the most rapid rise. Practice in China indicates that effective energy policies play a crucial role in improving energy efficiency. Tobit regression analysis indicates that a U-shaped relationship exists between total-factor energy efficiency and income per capita. - Research Highlights: → To measure the total-factor energy efficiency using DEA window analysis. → Focus on an application area of developing countries in the period of 1980-2005. → A U-shaped relationship was found between total-factor energy efficiency and income.

  9. Restricted Collision List method for faster Direct Simulation Monte-Carlo (DSMC) collisions

    Energy Technology Data Exchange (ETDEWEB)

    Macrossan, Michael N., E-mail: m.macrossan@uq.edu.au

    2016-08-15

    The ‘Restricted Collision List’ (RCL) method for speeding up the calculation of DSMC Variable Soft Sphere collisions, with Borgnakke–Larsen (BL) energy exchange, is presented. The method cuts down considerably on the number of random collision parameters which must be calculated (deflection and azimuthal angles, and the BL energy exchange factors). A relatively short list of these parameters is generated and the parameters required in any cell are selected from this list. The list is regenerated at intervals approximately equal to the smallest mean collision time in the flow, and the chance of any particle re-using the same collision parameters in two successive collisions is negligible. The results using this method are indistinguishable from those obtained with standard DSMC. The CPU time saving depends on how much of a DSMC calculation is devoted to collisions and how much is devoted to other tasks, such as moving particles and calculating particle interactions with flow boundaries. For 1-dimensional calculations of flow in a tube, the new method saves 20% of the CPU time per collision for VSS scattering with no energy exchange. With RCL applied to rotational energy exchange, the CPU saving can be greater; for small values of the rotational collision number, for which most collisions involve some rotational energy exchange, the CPU may be reduced by 50% or more.
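
    A minimal sketch of the idea, assuming illustrative values for the list length and the VSS scattering exponent: pre-generate a short list of deflection cosines and azimuthal angles, pick from it at collision time, and regenerate the list about once per smallest mean collision time:

```python
import numpy as np

rng = np.random.default_rng(42)
LIST_LEN, ALPHA = 4096, 1.4    # list size and VSS exponent (assumed)

def fresh_list():
    """Regenerated roughly once per smallest mean collision time."""
    r = rng.random(LIST_LEN)
    cos_chi = 2.0 * r ** (1.0 / ALPHA) - 1.0  # VSS deflection cosine
    eps = 2.0 * np.pi * rng.random(LIST_LEN)  # azimuthal angle
    return cos_chi, eps

cos_chi, eps = fresh_list()

def collision_params():
    """One cheap index draw replaces fresh per-collision sampling;
    with a list this long, a particle reusing the same entry in two
    successive collisions is vanishingly unlikely."""
    k = rng.integers(LIST_LEN)
    return cos_chi[k], eps[k]
```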

  10. Bayer image parallel decoding based on GPU

    Science.gov (United States)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In photoelectrical tracking systems, Bayer images are traditionally decoded on the CPU. However, this is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processing Unit (GPU), which supports the CUDA architecture. The decoding procedure can be divided into three parts: the first is a serial part, the second is a task-parallelism part, and the last is a data-parallelism part including inverse quantization, inverse discrete wavelet transform (IDWT) as well as an image post-processing part. To reduce the execution time, the task-parallelism part is optimized with OpenMP techniques. The data-parallelism part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization and texture memory optimization. In particular, the IDWT is significantly sped up by rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallelism part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the serial CPU method.

  11. Recent improvements in the performance of the muiltitasked TORT on time-shared Cray computers

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1996-01-01

    Coarse-grained angular domain decomposition of the mesh sweep algorithm has been implemented in ORNL's three-dimensional transport code TORT for Cray's macrotasking environment on platforms running the UNICOS operating system. A performance model constructed earlier is reviewed and its main result, namely the identification of the sources of parallelization overhead, is used to motivate the present work. The sources of overhead treated here are: redundant operations in the angular loop across participating tasks; repetitive task creation; and lock utilization to prevent overwriting the flux moment arrays accumulated by the participating tasks. Substantial reduction in the parallelization overhead is demonstrated via sample runs with fixed tuning, i.e. zero CPU hold time. Up to 50% improvement in the wall clock speedup over the previous implementation with autotuning is observed in some test problems.

  12. Mammography in Norway: Image quality and total performance

    International Nuclear Information System (INIS)

    Olsen, J.B.; Skretting, A.; Widmark, A.

    1997-04-01

    This report describes a method for assessing the total performance in mammography based on Receiver Operating Characteristic (ROC) analysis. In the time period from December 1993 to March 1994 the method was applied to assess the total performance of all the 45 Norwegian mammography laboratories operative at that time. Image quality characteristics in each laboratory were established by use of well-known phantoms.

  13. Coffee Production in Kigoma Region, Tanzania: Profitability and ...

    African Journals Online (AJOL)

    Farmers processed at CPU gained about TZS 1350/kg as coffee improvement gain. Coffee production ... explored, keeping in mind the theories put forth in the theoretical ... Information used in the gross margin analysis encompass total coffee ...

  14. A PC based multi-CPU severe accident simulation trainer

    International Nuclear Information System (INIS)

    Jankowski, M.W.; Bienarz, P.P.; Sartmadjiev, A.D.

    2004-01-01

    MELSIM Severe Accident Simulation Trainer is a personal computer based system being developed by the International Atomic Energy Agency and Risk Management Associates, Inc. for the purpose of training the operators of nuclear power stations. It also serves for evaluating accident management strategies as well as assessing complex interfaces between emergency operating procedures and accident management guidelines. The system is being developed for the Soviet designed WWER-440/Model 213 reactor and it is plant specific. The Bohunice V2 power station in the Slovak Republic has been selected for trial operation of the system. The trainer utilizes several CPUs working simultaneously on different areas of simulation. Detailed plant operation displays are provided on colour monitor mimic screens which show changing plant conditions in approximate real-time. Up to 28 000 curves can be plotted on a separate monitor as the MELSIM program proceeds. These plots proceed concurrently with the program, and time specific segments can be recalled for review. A benchmarking (limited in scope) against well validated thermal-hydraulic codes and selected plant accident data (WWER-440/213 Rovno NPP, Ukraine) has been initiated. Preliminary results are presented and discussed. (author)

  15. Synthetic Aperture Sequential Beamforming implemented on multi-core platforms

    DEFF Research Database (Denmark)

    Kjeldsen, Thomas; Lassen, Lee; Hemmsen, Martin Christian

    2014-01-01

    This paper compares several computational approaches to Synthetic Aperture Sequential Beamforming (SASB) targeting consumer-level parallel processors such as multi-core CPUs and GPUs. The proposed implementations demonstrate that ultrasound imaging using SASB can be executed in real-time with ...... per second) on an Intel Core i7 2600 CPU with an AMD HD7850 and a NVIDIA GTX680 GPU. The fastest CPU and GPU implementations use 14% and 1.3% of the real-time budget of 62 ms/frame, respectively. The maximum achieved processing rate is 1265 frames/s....

  16. High spatial resolution CT image reconstruction using parallel computing

    International Nuclear Information System (INIS)

    Yin Yin; Liu Li; Sun Gongxing

    2003-01-01

    Using a PC cluster system with 16 dual-CPU nodes, we accelerate the FBP and OR-OSEM reconstruction of high spatial resolution images (2048 x 2048). Based on the number of projections, we rewrite the reconstruction algorithms in parallel form and dispatch the tasks to each CPU. By parallel computing, the speedup factor is roughly equal to the number of CPUs, up to about 25 times when 25 CPUs are used. This technique is very suitable for real-time high spatial resolution CT image reconstruction. (authors)
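
    A minimal sketch of the projection-domain task split behind this kind of parallel FBP is shown below, using Python's multiprocessing as a stand-in for the cluster dispatch. The unfiltered nearest-neighbour backprojection and the small 256-pixel grid are simplifications for illustration, not the authors' implementation.

```python
# Hedged sketch: split backprojection over projection angles across CPUs,
# then sum the partial images. Random data stands in for filtered projections.
import numpy as np
from multiprocessing import Pool

N = 256  # image size; illustrative, not the paper's 2048 x 2048

def backproject_chunk(args):
    sino, angles = args                          # sino: (n_angles, N)
    xs = np.arange(N) - N / 2
    X, Y = np.meshgrid(xs, xs)
    img = np.zeros((N, N))
    for p, theta in zip(sino, angles):
        t = X * np.cos(theta) + Y * np.sin(theta)          # detector coordinate
        idx = np.clip(np.round(t + N / 2).astype(int), 0, N - 1)
        img += p[idx]                                      # nearest-neighbour
    return img

if __name__ == "__main__":
    n_proj, n_cpu = 180, 4
    angles = np.linspace(0, np.pi, n_proj, endpoint=False)
    sino = np.random.rand(n_proj, N)             # stand-in for filtered data
    chunks = [(sino[i::n_cpu], angles[i::n_cpu]) for i in range(n_cpu)]
    with Pool(n_cpu) as pool:
        img = sum(pool.map(backproject_chunk, chunks))     # sum partial images
```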

  17. Total spectral distributions from Hawking radiation

    Energy Technology Data Exchange (ETDEWEB)

    Broda, Boguslaw [University of Lodz, Department of Theoretical Physics, Faculty of Physics and Applied Informatics, Lodz (Poland)

    2017-11-15

    Taking into account the time dependence of the Hawking temperature and finite evaporation time of the black hole, the total spectral distributions of the radiant energy and of the number of particles have been explicitly calculated and compared to their temporary (initial) blackbody counterparts (spectral exitances). (orig.)

  18. A GPU-based calculation using the three-dimensional FDTD method for electromagnetic field analysis.

    Science.gov (United States)

    Nagaoka, Tomoaki; Watanabe, Soichi

    2010-01-01

    Numerical simulations with numerical human models using the finite-difference time domain (FDTD) method have recently been performed frequently in a number of fields in biomedical engineering. However, the FDTD calculation runs too slowly. We focus, therefore, on general purpose programming on the graphics processing unit (GPGPU). The three-dimensional FDTD method was implemented on the GPU using the Compute Unified Device Architecture (CUDA). In this study, we used the NVIDIA Tesla C1060 as a GPGPU board. The performance of the GPU is evaluated in comparison with the performance of a conventional CPU and a vector supercomputer. The results indicate that three-dimensional FDTD calculations using a GPU can significantly reduce run time in comparison with a conventional CPU, even for a naive GPU implementation of the three-dimensional FDTD method, while the GPU/CPU speed ratio varies with the calculation domain and thread block size.

  19. A real-time PUFF-model for accidental releases in complex terrain

    International Nuclear Information System (INIS)

    Thykier-Nielsen, S.; Mikkelsen, T.; Larsen, S.E.; Troen, I.; Baas, A.F. de; Kamada, R.; Skupniewicz, C.; Schacher, G.

    1990-01-01

    LINCOM-RIMPUFF, a combined flow/puff model, was developed at Risø National Laboratory for the Vandenberg AFB Meteorology and Plume Dispersion Handbook and is suitable as is for real-time response to emergency spills and vents of gases and radionuclides. LINCOM is a linear, diagnostic, spectral, potential flow model which extends the Jackson-Hunt theory of non-hydrostatic, adiabatic wind flow over hills to the mesoscale domain. It is embedded in a weighted objective analysis (WOA) of real-time Vandenberg tower winds and may be used in ultra-high speed lookup table mode. The mesoscale dispersion model RIMPUFF is a flexible Gaussian puff model equipped with computer-time effective features for terrain- and stability-dependent dispersion parameterization, plume rise formulas, inversion and ground-level reflection capabilities and wet/dry (source) depletion. It can treat plume bifurcation in complex terrain by using a puff-splitting scheme. It allows the flow model to compute the larger scale wind field, reserving turbulent diffusion calculations for the sub-grid scale. In diagnostic mode, toxic exposures are well assessed via the release of a single initial puff. With optimization, processing time for RIMPUFF should be on the order of 2 CPU minutes or less on a PC system. In prognostic mode with shifting winds, multiple puff releases may become necessary, thereby lengthening processing time.

  20. Fast time- and frequency-domain finite-element methods for electromagnetic analysis

    Science.gov (United States)

    Lee, Woochan

    Fast electromagnetic analysis in time and frequency domain is of critical importance to the design of integrated circuits (IC) and other advanced engineering products and systems. Many IC structures constitute a very large scale problem in modeling and simulation, the size of which also continuously grows with the advancement of the processing technology. This results in numerical problems beyond the reach of even the most powerful existing computational resources. Different from many other engineering problems, the structure of most ICs is special in the sense that its geometry is of Manhattan type and its dielectrics are layered. Hence, it is important to develop structure-aware algorithms that take advantage of the structure specialties to speed up the computation. In addition, among existing time-domain methods, explicit methods can avoid solving a matrix equation. However, their time step is traditionally restricted by the space step for ensuring the stability of a time-domain simulation. Therefore, making explicit time-domain methods unconditionally stable is important to accelerate the computation. In addition to time-domain methods, frequency-domain methods have suffered from an indefinite system that makes it difficult for an iterative solution to converge fast. The first contribution of this work is a fast time-domain finite-element algorithm for the analysis and design of very large-scale on-chip circuits. The structure specialty of on-chip circuits such as Manhattan geometry and layered permittivity is preserved in the proposed algorithm. As a result, the large-scale matrix solution encountered in the 3-D circuit analysis is turned into a simple scaling of the solution of a small 1-D matrix, which can be obtained in linear (optimal) complexity with negligible cost. Furthermore, the time step size is not sacrificed, and the total number of time steps to be simulated is also significantly reduced, thus achieving a total cost reduction in CPU time. The second contribution

  1. AcEST: BP915442 [AcEST

    Lifescience Database Archive (English)

    Full Text Available BlastX Result: TrEMBL hit tr|B8CPU2|B8CPU2_9GAMM, GGDEF domain protein OS=Shewanella piezotolerans WP3 (score 33 bits, E-value 5.5).

  2. Total and isoform-specific quantitative assessment of circulating Fibulin-1 using selected reaction monitoring mass spectrometry and time-resolved immunofluorometry

    DEFF Research Database (Denmark)

    Overgaard, Martin; Cangemi, Claudia; Jensen, Martin L

    2015-01-01

    PURPOSE: Targeted proteomics using SRM-MS combined with stable isotope dilution has emerged as a promising quantitative technique for the study of circulating protein biomarkers. The purpose of this study was to develop and characterize robust quantitative assays for the emerging cardiovascular biomarker fibulin-1 and its circulating isoforms in human plasma. EXPERIMENTAL DESIGN: We used bioinformatics analysis to predict total and isoform-specific tryptic peptides for absolute quantitation using SRM-MS. Fibulin-1 was quantitated in plasma by nanoflow-LC-SRM-MS in undepleted plasma and time-resolved immunofluorometric assay (TRIFMA). Both methods were validated and compared to a commercial ELISA (CircuLex). Molecular size determination was performed under native conditions by SEC analysis coupled to SRM-MS and TRIFMA. RESULTS: Absolute quantitation of total fibulin-1, isoforms -1C and -1D was performed by SRM...

  3. A Novel Ant Colony Algorithm for the Single-Machine Total Weighted Tardiness Problem with Sequence Dependent Setup Times

    Directory of Open Access Journals (Sweden)

    Fardin Ahmadizar

    2011-08-01

    Full Text Available This paper deals with the NP-hard single-machine total weighted tardiness problem with sequence dependent setup times. Incorporating fuzzy sets and genetic operators, a novel ant colony optimization algorithm is developed for the problem. In the proposed algorithm, artificial ants construct solutions as orders of jobs based on the heuristic information as well as pheromone trails. To calculate the heuristic information, three well-known priority rules are adopted as fuzzy sets and then aggregated. When all artificial ants have terminated their constructions, genetic operators such as crossover and mutation are applied to generate new regions of the solution space. A local search is then performed to improve the performance quality of some of the solutions found. Moreover, at run-time the pheromone trails are locally as well as globally updated, and limited between lower and upper bounds. The proposed algorithm is experimented on a set of benchmark problems from the literature and compared with other metaheuristics.
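
    A minimal sketch of the objective this algorithm minimizes, total weighted tardiness on a single machine with sequence-dependent setup times, may make the problem concrete. The data, and the convention that the first job's setup is read from the diagonal entry s[j][j], are illustrative assumptions, not the paper's benchmark.

```python
# Hedged sketch of the objective evaluated for each ant-constructed sequence.
def total_weighted_tardiness(seq, p, d, w, s):
    """seq: job order; p: processing times; d: due dates; w: weights;
    s[i][j]: setup when job j follows job i (s[j][j] used for the first job)."""
    t, total = 0, 0
    prev = seq[0]
    for j in seq:
        t += s[prev][j] + p[j]            # setup, then processing
        total += w[j] * max(0, t - d[j])  # weighted tardiness of job j
        prev = j
    return total

p = [4, 3, 5]; d = [5, 6, 12]; w = [2, 1, 3]        # toy instance
s = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
print(total_weighted_tardiness([0, 1, 2], p, d, w, s))  # -> 8
```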

  4. Total factor productivity (TFP) growth agriculture in pakistan: trends in different time horizons

    International Nuclear Information System (INIS)

    Ali, A.; Mushtaq, K.; Ashfaq, M.

    2008-01-01

    The present study estimated total factor productivity (TFP) growth of the agriculture sector of Pakistan for the period 1971-2006 by employing the Tornqvist-Theil (T-T) index number methodology. Most of the conventional inputs were used in constructing the input index. The output index includes major crops, minor crops, important fruits and vegetables and four categories of livestock products. The study estimated TFP growth rates for different decades. The results showed that the TFP growth rate was lowest during the decade of the 70s (0.96 percent) and highest during the last six years of the study period (2.86 percent). The decades of the 80s and 90s registered TFP growth rates of 2.24 percent and 2.46 percent, respectively. The results also show that TFP growth contributed about 33 percent to total agricultural output growth during the decade of the 70s and this contribution increased up to 83 percent during the last six years of the study period. The contribution of TFP growth to total agricultural output growth was 53 and 81 percent during the decades of the 80s and 90s, respectively. The study observed that macro-level government policies, institutional factors and weather conditions are the key factors that influenced TFP growth. (author)
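
    The Tornqvist-Theil TFP growth rate between two periods can be written as the difference of share-weighted log output and input indices; a minimal sketch with made-up shares and quantities follows. The study's actual input/output series are not reproduced here.

```python
# Hedged sketch of a Tornqvist-Theil TFP growth rate between two periods.
import numpy as np

def tornqvist_tfp_growth(y0, y1, ry0, ry1, x0, x1, sx0, sx1):
    """y: output quantities, ry: revenue shares; x: inputs, sx: cost shares."""
    out = np.sum(0.5 * (ry0 + ry1) * np.log(y1 / y0))   # log output index
    inp = np.sum(0.5 * (sx0 + sx1) * np.log(x1 / x0))   # log input index
    return out - inp                                     # log TFP growth

# Illustrative two-output, three-input economy.
y0, y1 = np.array([100., 50.]), np.array([108., 51.])
ry0, ry1 = np.array([0.6, 0.4]), np.array([0.62, 0.38])
x0, x1 = np.array([20., 30., 10.]), np.array([21., 30., 10.5])
sx0, sx1 = np.array([0.5, 0.3, 0.2]), np.array([0.5, 0.3, 0.2])
g = tornqvist_tfp_growth(y0, y1, ry0, ry1, x0, x1, sx0, sx1)
print(f"TFP growth = {100 * g:.2f}% (log approximation)")
```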

  5. Critical Care Admissions following Total Laryngectomy: Is It Time to Change Our Practice?

    Science.gov (United States)

    Walijee, Hussein; Morgan, Alexandria; Gibson, Bethan; Berry, Sandeep; Jaffery, Ali

    2016-01-01

    Critical Care Unit (CCU) beds are a limited resource and in increasing demand. Studies have shown that complex head and neck patients can be safely managed on a ward setting given the appropriate staffing and support. This retrospective case series aims to quantify the CCU care received by patients following total laryngectomy (TL) at a District General Hospital (DGH) and compare patient outcomes in an attempt to inform current practice. Data relating to TL were collected over a 5-year period from 1st January 2010 to 31st December 2015. A total of 22 patients were included. All patients were admitted to CCU postoperatively for an average length of stay of 25.5 hours. 95% of these patients were admitted to CCU for the purpose of close monitoring only, not requiring any active treatment prior to discharge to the ward. 73% of total complications were encountered after the first 24 hours postoperatively at which point patients had been stepped down to ward care. Avoiding the use of CCU beds and instead providing the appropriate level of care on the ward would result in a potential cost saving of approximately £8,000 with no influence on patient morbidity and mortality.

  6. A Hybrid Metaheuristic Approach for Minimizing the Total Flow Time in A Flow Shop Sequence Dependent Group Scheduling Problem

    Directory of Open Access Journals (Sweden)

    Antonio Costa

    2014-07-01

    Full Text Available Production processes in Cellular Manufacturing Systems (CMS) often involve groups of parts sharing the same technological requirements in terms of tooling and setup. The issue of scheduling such parts through a flow-shop production layout is known as the Flow-Shop Group Scheduling (FSGS) problem or, where setup times are sequence-dependent, the Flow-Shop Sequence-Dependent Group Scheduling (FSDGS) problem. This paper addresses the FSDGS issue, proposing a hybrid metaheuristic procedure integrating features from Genetic Algorithms (GAs) and Biased Random Sampling (BRS) search techniques with the aim of minimizing the total flow time, i.e., the sum of completion times of all jobs. A well-known benchmark of test cases, entailing problems with two, three, and six machines, is employed for both tuning the relevant parameters of the developed procedure and assessing its performances against two metaheuristic algorithms recently presented in the literature. The obtained results and a properly arranged ANOVA analysis highlight the superiority of the proposed approach in tackling the scheduling problem under investigation.

  7. Controlled overflowing of data-intensive jobs from oversubscribed sites

    CERN Document Server

    Sfiligoi, Igor; Bockelman, Brian Paul; Bradley, Daniel Charles; Tadel, Matevz; Bloom, Kenneth Arthur; Letts, James; Mrak Tadel, Alja

    2012-01-01

    The CMS analysis computing model has always relied on jobs running near the data, with data allocation between CMS compute centers organized at the management level, based on expected needs of the CMS community. While this model provided high CPU utilization during job run times, there were times when a large fraction of CPUs at certain sites were sitting idle due to lack of demand, all while Terabytes of data were never accessed. To improve the utilization of both CPU and disks, CMS is moving toward controlled overflowing of jobs from sites that have the data but are oversubscribed to others with spare CPU and network capacity, with those jobs accessing the data through real-time Xrootd streaming over the WAN. The major limiting factor for remote data access is the ability of the source storage system to serve such data, so the number of jobs accessing it must be carefully controlled. The CMS approach to this is to implement the overflowing by means of glideinWMS, a Condor-based pilot system, and by providing the WMS w...

  8. Sex differences in behavioral and PKA cascade responses to repeated cocaine administration.

    Science.gov (United States)

    Zhou, Luyi; Sun, Wei-Lun; Weierstall, Karen; Minerly, Ana Christina; Weiner, Jan; Jenab, Shirzad; Quinones-Jenab, Vanya

    2016-10-01

    Previous studies have shown sex differences in behavioral responses to cocaine. Here, we used a between-subject experimental design to study whether sex differences exist in the development of behavioral sensitization and tolerance to repeated cocaine, as well as the role of the protein kinase A (PKA) signaling cascade in this process. Ambulatory and rearing responses were recorded in male and female rats after 1 to 14 days of administration of saline or cocaine (15 mg/kg; ip). Corresponding PKA-associated signaling in the nucleus accumbens (NAc) and caudate-putamen (CPu) was measured at each time point. Our results showed that females exhibited higher cocaine-induced behavioral responses and developed behavioral sensitization and tolerance faster than males. Whereas females developed behavioral sensitization to cocaine after 2 days and tolerance after 14 days, male rats developed sensitization after 5 days. In addition, cocaine induced a sexually dimorphic pattern in the progression of neuronal adaptations in PKA cascade signaling in region- (NAc vs. CPu) and time- (days of cocaine administration) dependent manners. In general, more PKA signaling cascade changes were found in the NAc of males on day 5 and in the CPu of females with repeated cocaine injection. In addition, in females, behavioral activities positively correlated with FosB levels in the NAc and CPu and negatively correlated with Cdk5 and p35 in the CPu, while no correlation was observed in males. Our studies suggest that repeated cocaine administration induces different patterns of behavioral and molecular responses in the PKA cascade in male and female rats.

  9. Benchmarking worker nodes using LHCb productions and comparing with HEPSpec06

    Science.gov (United States)

    Charpentier, P.

    2017-10-01

    In order to estimate the capabilities of a computing slot with limited processing time, it is necessary to know its “power” with rather good precision. This allows, for example, pilot jobs to match a task for which the required CPU-work is known, or to define the number of events to be processed knowing the CPU-work per event. Otherwise one always runs the risk that the task is aborted because it exceeds the CPU capabilities of the resource. It also allows a better accounting of the consumed resources. The traditional way the CPU power has been estimated in WLCG since 2007 is using the HEP-Spec06 benchmark (HS06) suite, which was verified at the time to scale properly with a set of typical HEP applications. However, the hardware architecture of processors has evolved, all WLCG experiments have moved to using 64-bit applications, and they use different compilation flags from those advertised for running HS06. It is therefore interesting to check the scaling of HS06 with the HEP applications. For this purpose, we have been using CPU-intensive massive simulation productions from the LHCb experiment and compared their event throughput to the HS06 rating of the worker nodes. We also compared it with a much faster benchmark script that is used by the DIRAC framework, used by LHCb for evaluating at run time the performance of the worker nodes. This contribution reports on the findings of these comparisons: the main observation is that the scaling with HS06 is no longer fulfilled, while the fast benchmarks have better scaling but are less precise. One can also clearly see that some hardware or software features, when enabled on the worker nodes, may enhance their performance beyond expectation from either benchmark, depending on external factors.

  10. Comparative study of total shoulder arthroplasty versus total shoulder surface replacement for glenohumeral osteoarthritis with minimum 2-year follow-up

    NARCIS (Netherlands)

    Kooistra, B.W.; Willems, W.J.H.; Lemmens, E.; Hartel, B.P.; Bekerom, M.P. van den; Deurzen, D.F.P. van

    2017-01-01

    BACKGROUND: Compared with total shoulder arthroplasty (TSA), total shoulder surface replacement (TSSR) may offer the advantage of preservation of bone stock and shorter surgical time, possibly at the expense of glenoid component positioning and increasing lateral glenohumeral offset. We hypothesized

  11. Latitude-Time Total Electron Content Anomalies as Precursors to Japan's Large Earthquakes Associated with Principal Component Analysis

    Directory of Open Access Journals (Sweden)

    Jyh-Woei Lin

    2011-01-01

    Full Text Available The goal of this study is to determine whether principal component analysis (PCA) can be used to process latitude-time ionospheric TEC data on a monthly basis to identify earthquake-associated TEC anomalies. PCA is applied to latitude-time (mean-of-a-month) ionospheric total electron content (TEC) records collected from the Japan GEONET network to detect TEC anomalies associated with 18 earthquakes in Japan (M≥6.0) from 2000 to 2005. According to the results, PCA was able to discriminate clear TEC anomalies in the months when all 18 earthquakes occurred. After reviewing months when no M≥6.0 earthquakes occurred but geomagnetic storm activity was present, it is possible that the maximal principal eigenvalues PCA returned for these 18 earthquakes indicate earthquake-associated TEC anomalies. Previously PCA has been used to discriminate earthquake-associated TEC anomalies recognized by other researchers, who found that a statistical association between large earthquakes and TEC anomalies could be established in the 5 days before earthquake nucleation; however, since PCA uses the characteristics of principal eigenvalues to determine earthquake-related TEC anomalies, it is possible to show that such anomalies existed earlier than this 5-day statistical window.

  12. Distributed real time data processing architecture for the TJ-II data acquisition system

    International Nuclear Information System (INIS)

    Ruiz, M.; Barrera, E.; Lopez, S.; Machon, D.; Vega, J.; Sanchez, E.

    2004-01-01

    This article describes the performance of a new model of architecture that has been developed for the TJ-II data acquisition system in order to increase its real-time data processing capabilities. The current model consists of several PCI eXtensions for Instrumentation (PXI) standard chassis, each one with various digitizers. In this architecture, the data processing capability is restricted to the PXI controller's own performance. The controller must share its CPU resources between the data processing and the data acquisition tasks. In the new model, a distributed data processing architecture has been developed. The solution adds one or more processing cards to each PXI chassis. This way it is possible to plan how to distribute the data processing of all acquired signals among the processing cards and the available resources of the PXI controller. This model allows scalability of the system: more or fewer processing cards can be added based on the requirements of the system. The processing algorithms are implemented in LabVIEW (from National Instruments), providing efficiency and time-saving application development when compared with other efficient solutions.

  13. Parallel PWTD-Accelerated Explicit Solution of the Time Domain Electric Field Volume Integral Equation

    KAUST Repository

    Liu, Yang

    2016-03-25

    A parallel plane-wave time-domain (PWTD)-accelerated explicit marching-on-in-time (MOT) scheme for solving the time domain electric field volume integral equation (TD-EFVIE) is presented. The proposed scheme leverages pulse functions and Lagrange polynomials to spatially and temporally discretize the electric flux density induced throughout the scatterers, and a finite difference scheme to compute the electric fields from the Hertz electric vector potentials radiated by the flux density. The flux density is explicitly updated during time marching by a predictor-corrector (PC) scheme and the vector potentials are efficiently computed by a scalar PWTD scheme. The memory requirement and computational complexity of the resulting explicit PWTD-PC-EFVIE solver scale as O(N_s log N_s) and O(N_s N_t), respectively, where N_s is the number of spatial basis functions and N_t is the number of time steps. A scalable parallelization of the proposed MOT scheme on distributed-memory CPU clusters is described. The efficiency, accuracy, and applicability of the resulting (parallelized) PWTD-PC-EFVIE solver are demonstrated via its application to the analysis of transient electromagnetic wave interactions on canonical and real-life scatterers represented with up to 25 million spatial discretization elements.

  14. Parallel PWTD-Accelerated Explicit Solution of the Time Domain Electric Field Volume Integral Equation

    KAUST Repository

    Liu, Yang; Al-Jarro, Ahmed; Bagci, Hakan; Michielssen, Eric

    2016-01-01

    A parallel plane-wave time-domain (PWTD)-accelerated explicit marching-on-in-time (MOT) scheme for solving the time domain electric field volume integral equation (TD-EFVIE) is presented. The proposed scheme leverages pulse functions and Lagrange polynomials to spatially and temporally discretize the electric flux density induced throughout the scatterers, and a finite difference scheme to compute the electric fields from the Hertz electric vector potentials radiated by the flux density. The flux density is explicitly updated during time marching by a predictor-corrector (PC) scheme and the vector potentials are efficiently computed by a scalar PWTD scheme. The memory requirement and computational complexity of the resulting explicit PWTD-PC-EFVIE solver scale as O(N_s log N_s) and O(N_s N_t), respectively, where N_s is the number of spatial basis functions and N_t is the number of time steps. A scalable parallelization of the proposed MOT scheme on distributed-memory CPU clusters is described. The efficiency, accuracy, and applicability of the resulting (parallelized) PWTD-PC-EFVIE solver are demonstrated via its application to the analysis of transient electromagnetic wave interactions on canonical and real-life scatterers represented with up to 25 million spatial discretization elements.

  15. An Integer Batch Scheduling Model for a Single Machine with Simultaneous Learning and Deterioration Effects to Minimize Total Actual Flow Time

    Science.gov (United States)

    Yusriski, R.; Sukoyo; Samadhi, T. M. A. A.; Halim, A. H.

    2016-02-01

    In the manufacturing industry, several identical parts can be processed in batches, and setup time is needed between two consecutive batches. Since the processing times of batches are not always fixed during a scheduling period due to learning and deterioration effects, this research deals with batch scheduling problems with simultaneous learning and deterioration effects. The objective is to minimize total actual flow time, defined as the time interval between the arrival of all parts at the shop and their common due date. The decision variables are the number of batches, integer batch sizes, and the sequence of the resulting batches. This research proposes a heuristic algorithm based on Lagrange relaxation. The effectiveness of the proposed algorithm is determined by comparing its solutions to the respective optimal solutions obtained from the enumeration method. Numerical experiments show that the average difference between the solutions is 0.05%.

  16. An Integer Batch Scheduling Model for a Single Machine with Simultaneous Learning and Deterioration Effects to Minimize Total Actual Flow Time

    International Nuclear Information System (INIS)

    Yusriski, R; Sukoyo; Samadhi, T M A A; Halim, A H

    2016-01-01

    In the manufacturing industry, several identical parts can be processed in batches, and setup time is needed between two consecutive batches. Since the processing times of batches are not always fixed during a scheduling period due to learning and deterioration effects, this research deals with batch scheduling problems with simultaneous learning and deterioration effects. The objective is to minimize total actual flow time, defined as the time interval between the arrival of all parts at the shop and their common due date. The decision variables are the number of batches, integer batch sizes, and the sequence of the resulting batches. This research proposes a heuristic algorithm based on Lagrange relaxation. The effectiveness of the proposed algorithm is determined by comparing its solutions to the respective optimal solutions obtained from the enumeration method. Numerical experiments show that the average difference between the solutions is 0.05%. (paper)
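
    A hedged sketch of the "total actual flow time" objective described in the two records above may help: with a common due date d and batches scheduled backward so that the last batch completes at d, each part is counted from its batch's arrival (taken here as the batch start) to d. The just-in-time arrival convention, the constant unit processing time (no learning/deterioration), and all numbers are assumptions for illustration.

```python
# Hedged sketch: total actual flow time under backward scheduling from a
# common due date, with just-in-time part arrivals at each batch start.
def total_actual_flow_time(batch_sizes, unit_time, setup, d):
    finish = d
    total = 0.0
    for q in reversed(batch_sizes):        # schedule from the due date backward
        start = finish - q * unit_time     # batch processing interval
        total += q * (d - start)           # each of q parts waits (d - start)
        finish = start - setup             # setup separates consecutive batches
    return total

print(total_actual_flow_time([3, 2, 5], unit_time=1.0, setup=0.5, d=20.0))
```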

  17. GPU based contouring method on grid DEM data

    Science.gov (United States)

    Tan, Liheng; Wan, Gang; Li, Feng; Chen, Xiaohui; Du, Wenlong

    2017-08-01

    This paper presents a novel method to generate contour lines from grid DEM data based on the programmable GPU pipeline. Previous contouring approaches often use the CPU to construct a finite element mesh from the raw DEM data, and then extract contour segments from the elements. They also need a tracing or sorting strategy to generate the final continuous contours. These approaches can be heavily CPU-intensive and time-consuming. Meanwhile, the generated contours would be unsmooth if the raw data is sparsely distributed. Unlike the CPU approaches, we employ the GPU's vertex shader to generate a triangular mesh with arbitrary user-defined density, in which the height of each vertex is calculated through a third-order Cardinal spline function. Then in the same frame, segments are extracted from the triangles by the geometry shader, and transferred to the CPU side with an internal order in the GPU's transform feedback stage. Finally we propose a "Grid Sorting" algorithm to achieve continuous contour lines by traversing the segments only once. Our method makes use of multiple stages of the GPU pipeline for computation, which can generate smooth contour lines, and is significantly faster than previous CPU approaches. The algorithm can be easily implemented with the OpenGL 3.3 API or higher on consumer-level PCs.

  18. A finite volume method for cylindrical heat conduction problems based on local analytical solution

    KAUST Repository

    Li, Wang

    2012-10-01

    A new finite volume method for cylindrical heat conduction problems based on local analytical solution is proposed in this paper with detailed derivation. The calculation results of this new method are compared with the traditional second-order finite volume method. The newly proposed method is more accurate than conventional ones, even though the discretized expression of this proposed method is slightly more complex than the second-order central finite volume method, making it cost more calculation time on the same grids. Numerical results show that the total CPU time of the new method is significantly less than conventional methods for achieving the same level of accuracy. © 2012 Elsevier Ltd. All rights reserved.

  19. A finite volume method for cylindrical heat conduction problems based on local analytical solution

    KAUST Repository

    Li, Wang; Yu, Bo; Wang, Xinran; Wang, Peng; Sun, Shuyu

    2012-01-01

    A new finite volume method for cylindrical heat conduction problems based on local analytical solution is proposed in this paper with detailed derivation. The calculation results of this new method are compared with the traditional second-order finite volume method. The newly proposed method is more accurate than conventional ones, even though the discretized expression of this proposed method is slightly more complex than the second-order central finite volume method, making it cost more calculation time on the same grids. Numerical results show that the total CPU time of the new method is significantly less than conventional methods for achieving the same level of accuracy. © 2012 Elsevier Ltd. All rights reserved.

  20. Real time 3D structural and Doppler OCT imaging on graphics processing units

    Science.gov (United States)

    Sylwestrzak, Marcin; Szlag, Daniel; Szkulmowski, Maciej; Gorczyńska, Iwona; Bukowska, Danuta; Wojtkowski, Maciej; Targowski, Piotr

    2013-03-01

    In this report the application of graphics processing unit (GPU) programming for real-time 3D Fourier domain Optical Coherence Tomography (FdOCT) imaging, with implementation of Doppler algorithms for visualization of flows in capillary vessels, is presented. Generally, the processing time of FdOCT data on the main processor of the computer (CPU) constitutes the main limitation for real-time imaging. Employing additional algorithms, such as Doppler OCT analysis, makes this processing even more time consuming. Recently developed GPUs, which offer very high computational power, provide a solution to this problem. Taking advantage of them for massively parallel data processing allows for real-time imaging in FdOCT. The presented software for structural and Doppler OCT allows for the whole processing, with visualization, of 2D data consisting of 2000 A-scans generated from 2048-pixel spectra at a frame rate of about 120 fps. The 3D imaging in the same mode of volume data built of 220 × 100 A-scans is performed at a rate of about 8 frames per second. In this paper the software architecture, the organization of the threads and the optimizations applied are shown. For illustration, screen shots recorded during real-time imaging of a phantom (homogeneous water solution of Intralipid in a glass capillary) and the human eye in vivo are presented.

  1. The Time Course of Knee Swelling Post Total Knee Arthroplasty and Its Associations with Quadriceps Strength and Gait Speed.

    Science.gov (United States)

    Pua, Yong-Hao

    2015-07-01

    This study examines the time course of knee swelling post total knee arthroplasty (TKA) and its associations with quadriceps strength and gait speed. Eighty-five patients with unilateral TKA participated. Preoperatively and on post-operative days (PODs) 1, 4, 14, and 90, knee swelling was measured using bioimpedance spectrometry. Preoperatively and on PODs 14 and 90, quadriceps strength was measured using isokinetic dynamometry while fast gait speed was measured using the timed 10-meter walk. On POD1, knee swelling increased ~35% from preoperative levels, after which it decreased but remained ~11% above preoperative levels on POD90. In longitudinal, multivariable analyses, knee swelling was associated with quadriceps weakness (P<0.01) and slower gait speed (P=0.03). Interventions to reduce post-TKA knee swelling may be indicated to improve quadriceps strength and gait speed. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. The PAMELA storage and control unit

    International Nuclear Information System (INIS)

    Casolino, M.; Altamura, F.; Basili, A.; De Pascale, M.P.; Minori, M.; Nagni, M.; Picozza, P.; Sparvoli, R.; Adriani, O.; Papini, P.; Spillantini, P.; Castellini, G.; Boezio, M.

    2007-01-01

    The PAMELA Storage and Control Unit (PSCU) comprises a Central Processing Unit (CPU) and a Mass Memory (MM). The CPU of the experiment is based on a ERC-32 architecture (a SPARC v7 implementation) running a real time operating system (RTEMS). The main purpose of the CPU is to handle slow control, acquisition and store data on a 2 GB MM. Communications between PAMELA and the satellite are done via a 1553B bus. Data acquisition from the sub-detectors is performed via a 2 MB/s interface. Download from the PAMELA MM towards the satellite main storage unit is handled by a 16 MB/s bus. The maximum daily amount of data transmitted to ground is about 20 GB

  3. The PAMELA storage and control unit

    Energy Technology Data Exchange (ETDEWEB)

    Casolino, M. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy)]. E-mail: Marco.Casolino@roma2.infn.it; Altamura, F. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); Basili, A. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); De Pascale, M.P. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); Minori, M. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); Nagni, M. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); Picozza, P. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); Sparvoli, R. [INFN, Structure of Rome II, Physics Department, University of Rome II 'Tor Vergata', I-00133 Rome (Italy); Adriani, O. [INFN, Structure of Florence, Physics Department, University of Florence, I-50019 Sesto Fiorentino (Italy); Papini, P. [INFN, Structure of Florence, Physics Department, University of Florence, I-50019 Sesto Fiorentino (Italy); Spillantini, P. [INFN, Structure of Florence, Physics Department, University of Florence, I-50019 Sesto Fiorentino (Italy); Castellini, G. [CNR-Istituto di Fisica Applicata 'Nello Carrara', I-50127 Florence (Italy); Boezio, M. [INFN, Structure of Trieste, Physics Department, University of Trieste, I-34147 Trieste (Italy)

    2007-03-01

    The PAMELA Storage and Control Unit (PSCU) comprises a Central Processing Unit (CPU) and a Mass Memory (MM). The CPU of the experiment is based on a ERC-32 architecture (a SPARC v7 implementation) running a real time operating system (RTEMS). The main purpose of the CPU is to handle slow control, acquisition and store data on a 2 GB MM. Communications between PAMELA and the satellite are done via a 1553B bus. Data acquisition from the sub-detectors is performed via a 2 MB/s interface. Download from the PAMELA MM towards the satellite main storage unit is handled by a 16 MB/s bus. The maximum daily amount of data transmitted to ground is about 20 GB.

  4. Position list word aligned hybrid

    DEFF Research Database (Denmark)

    Deliege, Francois; Pedersen, Torben Bach

    2010-01-01

    Compressed bitmap indexes are increasingly used for efficiently querying very large and complex databases. The Word Aligned Hybrid (WAH) bitmap compression scheme is commonly recognized as the most efficient compression scheme in terms of CPU efficiency. However, WAH compressed bitmaps use a lot of storage space. This paper presents the Position List Word Aligned Hybrid (PLWAH) compression scheme that improves significantly over WAH compression by better utilizing the available bits and new CPU instructions. For typical bit distributions, PLWAH compressed bitmaps are often half the size of WAH bitmaps and, at the same time, offer an even better CPU efficiency. The results are verified by theoretical estimates and extensive experiments on large amounts of both synthetic and real-world data.
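
    A minimal sketch of the WAH-style 32-bit word encoding that PLWAH improves on follows: 31-bit groups become literal words, and runs of identical all-0/all-1 groups collapse into fill words. The bit layout shown (MSB as fill flag, next bit as fill value, low 30 bits as run count) is the commonly described WAH format; PLWAH's refinement of folding "nearly fill" literals into spare position bits is omitted here.

```python
# Hedged sketch of WAH-style word-aligned bitmap compression.
def wah_encode(bits):
    bits = bits + [0] * (-len(bits) % 31)            # pad to whole 31-bit groups
    groups = [bits[i:i + 31] for i in range(0, len(bits), 31)]
    words, i = [], 0
    while i < len(groups):
        g = groups[i]
        if all(b == g[0] for b in g):                # candidate fill group
            run = 1
            while i + run < len(groups) and groups[i + run] == g:
                run += 1
            # Fill word: MSB=1, next bit = fill value, low 30 bits = run length.
            words.append((1 << 31) | (g[0] << 30) | run)
            i += run
        else:                                        # literal word: MSB=0
            literal = 0
            for b in g:
                literal = (literal << 1) | b
            words.append(literal)
            i += 1
    return words

print([hex(w) for w in wah_encode([1] * 62 + [0, 1, 0] * 10)])
```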

  5. Wide-bandwidth low-voltage PLL for PowerPC™ microprocessors

    Science.gov (United States)

    Alvarez, Jose; Sanchez, Hector; Gerosa, Gianfranco; Countryman, Roger

    1995-04-01

    A 3.3 V Phase-Locked-Loop (PLL) clock synthesizer implemented in 0.5 micron CMOS technology is described. The PLL supports internal-to-external clock frequency ratios of 1, 1.5, 2, 3, and 4 as well as numerous static power-down modes for PowerPC™ microprocessors. The CPU clock lock range spans from 6 to 175 MHz. Lock times below 15 µs, PLL power dissipation below 10 mW, as well as phase error and jitter below ±100 ps have been measured. The total area of the PLL is 0.52 mm².

  6. Time-series MODIS image-based retrieval and distribution analysis of total suspended matter concentrations in Lake Taihu (China).

    Science.gov (United States)

    Zhang, Yuchao; Lin, Shan; Liu, Jianping; Qian, Xin; Ge, Yi

    2010-09-01

    Although there has been considerable effort to use remotely sensed images to provide synoptic maps of total suspended matter (TSM), there are limited studies on universal TSM retrieval models. In this paper, we have developed a TSM retrieval model for Lake Taihu using TSM concentrations measured in situ and a time series of quasi-synchronous MODIS 250 m images from 2005. After simple geometric and atmospheric correction, we found a significant relationship (R = 0.8736, N = 166) between in situ measured TSM concentrations and MODIS band normalization difference of band 3 and band 1. From this, we retrieved TSM concentrations in eight regions of Lake Taihu in 2007 and analyzed the characteristic distribution and variation of TSM. Synoptic maps of model-estimated TSM of 2007 showed clear geographical and seasonal variations. TSM in Central Lake and Southern Lakeshore were consistently higher than in other regions, while TSM in East Taihu was generally the lowest among the regions throughout the year. Furthermore, a wide range of TSM concentrations appeared from winter to summer. TSM in winter could be several times that in summer.
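
    A sketch of the retrieval step may clarify the approach: compute the band-3/band-1 normalized difference from MODIS reflectances and map it to TSM through a fitted regression. The linear model form and the coefficients below are placeholders, since the abstract reports only the correlation (R = 0.8736), not the calibrated equation.

```python
# Hedged sketch of a band-normalized-difference TSM retrieval.
import numpy as np

def band_normalized_difference(b3, b1):
    return (b3 - b1) / (b3 + b1)

def tsm_from_nd(nd, a=-120.0, b=-15.0):     # placeholder calibration constants
    return a * nd + b                        # TSM in mg/L, illustrative only

b1 = np.array([0.08, 0.12, 0.15])            # MODIS band 1 reflectance (red)
b3 = np.array([0.05, 0.06, 0.06])            # MODIS band 3 reflectance (blue)
nd = band_normalized_difference(b3, b1)
print(tsm_from_nd(nd))                        # synoptic TSM estimates per pixel
```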

  7. 32 CFR 286.29 - Collection of fees and fee rates.

    Science.gov (United States)

    2010-07-01

    ... support, operator, programmer, database administrator, or action officer). (ii) Machine time. Machine time involves only direct costs of the Central Processing Unit (CPU), input/output devices, and memory capacity...

  8. Novel methods in track-based alignment to correct for time-dependent distortions of the ATLAS Inner Detector

    CERN Document Server

    Estrada Pastor, Oscar; The ATLAS collaboration

    2017-01-01

    ATLAS is a multipurpose experiment at the LHC proton-proton collider. Its physics goals require high resolution and unbiased measurement of all charged particle kinematic parameters. These critically depend on the layout and performance of the tracking system and the quality of its alignment. For the LHC Run II, the system has been upgraded with the installation of a new pixel layer, the Insertable B-layer (IBL). The offline track alignment of the ATLAS tracking system has to deal with about 700,000 degrees of freedom (DoF) defining its geometrical parameters, representing a considerable numerical challenge in terms of both CPU time and precision. An outline of the track based alignment approach and its implementation within the ATLAS software is presented. Special attention is paid to describe the techniques allowing to pinpoint and eliminate track parameters biases. During Run-II, ATLAS Inner Detector Alignment framework has been adapted and upgraded to correct very short time scale movements of the sub-det...

  9. Novel methods in track-based alignment to correct for time-dependent distortions of the ATLAS Inner Detector

    CERN Document Server

    Estrada Pastor, Oscar; The ATLAS collaboration

    2017-01-01

    ATLAS is a multipurpose experiment at the LHC proton-proton collider. Its physics goals require high resolution, unbiased measurement of all charged particle kinematic parameters. These critically depend on the layout and performance of the tracking system and the quality of its offline alignment. For the LHC Run II, the system has been upgraded with the installation of a new pixel layer, the Insertable B-layer (IBL). Offline track alignment of the ATLAS tracking system has to deal with about 700,000 degrees of freedom (DoF) defining its geometrical parameters, representing a considerable numerical challenge in terms of both CPU time and precision. An outline of the track based alignment approach and its implementation within the ATLAS software will be presented. Special attention will be paid to describe the techniques allowing to pinpoint and eliminate track parameters biases due to alignment. During Run-II, ATLAS Inner Detector Alignment framework has been adapted and upgraded to correct very short time sc...

  10. High-throughput sequence alignment using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Trapnell Cole

    2007-12-01

    Full Text Available Abstract Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high-end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.

  11. Multichannel analyzer with real-time correction of counting losses based on a fast 16/32 bit microprocessor

    International Nuclear Information System (INIS)

    Westphal, G.P.; Kasa, T.

    1984-01-01

    It is demonstrated that from a modern microprocessor with 32-bit architecture and from standard VLSI peripheral chips, a multichannel analyzer with real-time correction of counting losses may be designed in a very flexible yet cost-effective manner. Throughput rates of 100,000 events/second are a good match even for high-rate spectroscopy systems and may be further enhanced by the use of already available CPU chips with higher clock frequency. Low power consumption and a very compact form factor make the design highly recommendable for portable applications. By means of a simple and easily reproducible rotating sample device, the dynamic response of the VPG counting loss correction method has been tested and found to be more than sufficient for conceivable real-time applications. Enhanced statistical accuracy of correction factors may be traded against speed of response by the mere change of one preset value, which lends itself to the simple implementation of self-adapting systems. Reliability as well as user convenience is improved by self-calibration of pulse evolution time in the VPG counting loss correction unit.

  12. Real time data analysis with the ATLAS trigger at the LHC in Run-2

    CERN Document Server

    Beauchemin, Pierre-Hugues; The ATLAS collaboration

    2018-01-01

    The trigger selection capabilities of the ATLAS detector have been significantly enhanced for LHC Run-2 in order to cope with the higher event rates and with the large number of simultaneous interactions (pile-up) per proton-proton bunch crossing. A new hardware system, designed to analyse real-time event topologies at Level-1, came into full use in 2017. A hardware-based track reconstruction system, expected to be used in real time in 2018, is designed to provide track information to the high-level software trigger at its full input rate. The high-level trigger selections rely largely on offline-like reconstruction techniques, and in some cases multi-variate analysis methods. Despite the sudden change in LHC operations during the second half of 2017, which caused an increase in pile-up and therefore also in CPU usage of the trigger algorithms, the set of triggers (the so-called trigger menu) running online has undergone only minor modifications thanks to the robustness and redundancy of the trigger system, ...

  13. The Trigger System of the CMS Experiment

    OpenAIRE

    Felcini, Marta

    2008-01-01

    We give an overview of the main features of the CMS trigger and data acquisition (DAQ) system. Then, we illustrate the strategies and trigger configurations (trigger tables) developed for the detector calibration and physics program of the CMS experiment, at start-up of LHC operations, as well as their possible evolution with increasing luminosity. Finally, we discuss the expected CPU time performance of the trigger algorithms and the CPU requirements for the event filter farm at start-up.

  14. Response Surface Optimized Extraction of Total Triterpene Acids ...

    African Journals Online (AJOL)

    Purpose: To optimize extraction of total triterpene acids from loquat leaf and evaluate their in vitro antioxidant activities. Methods: The independent variables were ethanol concentration, extraction time, and solvent ratio, while the dependent variable was content of total triterpene acids. Composite design and response ...

  15. MonetDB/X100 - A DBMS in the CPU cache

    NARCIS (Netherlands)

    M. Zukowski (Marcin); P.A. Boncz (Peter); N.J. Nes (Niels); S. Héman (Sándor)

    2005-01-01

    X100 is a new execution engine for the MonetDB system that improves execution speed and overcomes its main memory limitation. It introduces the concept of in-cache vectorized processing that strikes a balance between the existing column-at-a-time MIL execution primitives of MonetDB and

  16. Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS

    Science.gov (United States)

    Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.

    2018-03-01

    Simulation of breaking waves using the Navier-Stokes equations via the moving particle semi-implicit method (MPS) over a closed domain is given. The results show that parallel computing on a multicore architecture using the OpenMP platform can reduce the computational time to almost half of the serial time. Here, a comparison of two computer architectures (AMD and Intel) is performed. The results show that the Intel architecture performs better than the AMD in CPU time. However, in efficiency, the computer with the AMD architecture is slightly higher than the Intel. For the simulation with 1512 particles, the CPU times using Intel and AMD are 12662.47 and 28282.30, respectively. Moreover, for the efficiency with a similar number of particles, AMD obtains 50.09 % and Intel up to 49.42 %.
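
    As a quick illustration of how the reported CPU times and efficiencies relate, the sketch below recovers the parallel speedup from the efficiency via S = E · n. The thread count is an assumption consistent with the "almost half the serial time" statement, since the abstract does not state the core count used.

```python
# Hedged arithmetic check: speedup S = E * n_threads, serial time T_s = S * T_p.
t_intel, t_amd = 12662.47, 28282.30        # reported CPU times, 1512 particles
n_threads = 4                              # assumed; not stated in the abstract
for name, t_par, eff in [("Intel", t_intel, 0.4942), ("AMD", t_amd, 0.5009)]:
    speedup = eff * n_threads              # ~2, i.e. "almost half" serial time
    t_serial = speedup * t_par             # implied serial run time
    print(f"{name}: speedup ~ {speedup:.2f}, implied serial time ~ {t_serial:.0f}")
```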

  17. Investigations on application of multigrid method to MHD equilibrium analysis

    International Nuclear Information System (INIS)

    Ikuno, Soichiro

    2000-01-01

    The potential of applying the multigrid method to MHD equilibrium analysis is investigated. The nonlinear eigenvalue problem often appears when MHD equilibria are determined by solving the Grad-Shafranov equation numerically. After linearization of the equation, the problem is solved by use of an iterative method. Although the Red-Black SOR method or the Gauss-Seidel method is often used for the solution of the linearized equation, it takes much CPU time to solve the problem. The multigrid method is compared with the SOR method for the Poisson problem. The results of computations show that the CPU time required for the multigrid method is about 1000 times smaller than that for the SOR method. (author)
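
    To make the comparison concrete, here is a minimal two-grid V-cycle for a 1D Poisson problem, a toy stand-in for the multigrid solver applied to the linearized Grad-Shafranov equation. The grid size, sweep counts, and the direct coarse-grid solve are illustrative choices, not the paper's setup.

```python
# Hedged two-grid sketch for -u'' = f on [0,1] with zero boundary values.
import numpy as np

def gauss_seidel(u, f, h, sweeps):
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u

def two_grid(u, f, h, pre=3, post=3):
    u = gauss_seidel(u, f, h, pre)                    # pre-smoothing
    r = np.zeros_like(u)                              # residual r = f + u''
    r[1:-1] = f[1:-1] + (u[:-2] - 2 * u[1:-1] + u[2:]) / h**2
    rc = r[::2].copy()                                # restrict (injection)
    n = len(rc) - 2                                   # coarse interior points
    A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / (2 * h) ** 2
    ec = np.zeros_like(rc)
    ec[1:-1] = np.linalg.solve(A, rc[1:-1])           # exact coarse solve
    u += np.interp(np.arange(len(u)), np.arange(len(u))[::2], ec)  # prolongate
    return gauss_seidel(u, f, h, post)                # post-smoothing

n = 65
x = np.linspace(0, 1, n); h = x[1] - x[0]
f = np.pi**2 * np.sin(np.pi * x)                      # exact solution sin(pi x)
u = np.zeros(n)
for _ in range(10):
    u = two_grid(u, f, h)
print(np.max(np.abs(u - np.sin(np.pi * x))))          # ~ discretization error
```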

  18. Development of a real time activity monitoring Android application utilizing SmartStep.

    Science.gov (United States)

    Hegde, Nagaraj; Melanson, Edward; Sazonov, Edward

    2016-08-01

    Footwear-based activity monitoring systems are becoming popular in academic research as well as in consumer industry segments. In our previous work, we presented the developmental aspects of an insole-based activity and gait monitoring system, SmartStep, which is a socially acceptable, fully wireless and versatile insole. The present work describes the development of an Android application that captures the SmartStep data wirelessly over Bluetooth Low Energy (BLE), computes features on the received data, runs activity classification algorithms and provides real-time feedback. The development of the activity classification methods was based on data from a human study involving 4 participants. Participants were asked to perform the activities of sitting, standing, walking, and cycling while they wore the SmartStep insole system. Multinomial Logistic Discrimination (MLD) was utilized in the development of a machine learning model for activity prediction. The resulting classification model was implemented in an Android smartphone. The Android application was benchmarked for power consumption and CPU loading. Leave-one-out cross validation resulted in an average accuracy of 96.9% during the model training phase. The Android application for real-time activity classification was tested on a human subject wearing SmartStep, resulting in a testing accuracy of 95.4%.
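
    The on-device prediction step of a multinomial logistic model reduces to one linear score per class followed by an argmax; a sketch with hypothetical classes, features and weights (training happens offline, as in the study):

        // Sketch only: MLD/softmax prediction for one feature window.
        #include <vector>
        #include <algorithm>

        int predict(const std::vector<double>& x,                 // features of one window
                    const std::vector<std::vector<double>>& W,    // per-class weights
                    const std::vector<double>& b) {               // per-class biases
            std::vector<double> score(W.size());
            for (size_t k = 0; k < W.size(); ++k) {
                score[k] = b[k];
                for (size_t j = 0; j < x.size(); ++j) score[k] += W[k][j] * x[j];
            }
            // The softmax is monotone, so the argmax of the linear scores suffices.
            return (int)(std::max_element(score.begin(), score.end()) - score.begin());
        }

        int main() {
            std::vector<std::vector<double>> W = {{0.5, -0.2}, {-0.1, 0.3},
                                                  {0.0, 0.1}, {0.2, 0.0}};   // hypothetical
            std::vector<double> b = {0.1, 0.0, -0.1, 0.0};
            std::vector<double> x = {1.2, 0.7};   // e.g. mean and variance of a sensor window
            int cls = predict(x, W, b);           // assumed order: sit, stand, walk, cycle
            (void)cls;
        }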

  19. Analysis of an Infrastructure as a Service Implementation Using the Ubuntu Cloud Infrastructure

    Directory of Open Access Journals (Sweden)

    Norma Fitra Puspa Rahma

    2014-01-01

    Full Text Available As information technology becomes ever more advanced and pervasive in many aspects of life, universities, as institutions that develop science and technology, must respond positively. This also affects hardware, which must indirectly keep pace with developments in information technology, so that new equipment has to be added, in turn increasing the cost of purchasing new devices. This need can be met by using cloud computing technology. Cloud computing is a computing model in which resources such as computing power, storage, networking and software are provided as services over the internet. These computing resources can be provided by the cloud service model Infrastructure as a Service (IaaS). The Infrastructure as a Service was built using the Ubuntu Cloud Infrastructure. The operating system used is Ubuntu Server 12.04 LTS, and the software used to build the infrastructure is the Essex version of OpenStack. The result of this final project is the creation of a virtual machine based on the CPU, memory and disk specifications selected through the flavor m1.tiny, with 512 MB of memory, a 0 GB disk, 0 GB of ephemeral storage and 1 vCPU. The image used for the instance is Ubuntu Server 12.04.3 LTS. The CPU speed obtained on this virtual machine is 3000.106 MHz. CPU usage on the instance named "webserver" amounts to 0.3%, with 0.97% remaining, and memory usage is 422764k out of a total of 503496.

  20. 32 CFR 518.20 - Collection of fees and fee rates.

    Science.gov (United States)

    2010-07-01

    ..., programmer, database administrator, or action officer). (ii) Machine time. Machine time involves only direct costs of the Central Processing Unit (CPU), input/output devices, and memory capacity used in the actual...

  1. Optimizing SIEM Throughput on the Cloud Using Parallelization.

    Science.gov (United States)

    Alam, Masoom; Ihsan, Asif; Khan, Muazzam A; Javaid, Qaisar; Khan, Abid; Manzoor, Jawad; Akhundzada, Adnan; Khan, Muhammad Khurram; Farooq, Sajid

    2016-01-01

    Processing large amounts of data in real time to identify security issues poses several performance challenges, especially when hardware infrastructure is limited. Managed Security Service Providers (MSSPs), mostly hosting their applications on the Cloud, receive events at a very high rate that varies from a few hundred to a couple of thousand events per second (EPS). It is critical to process this data efficiently, so that attacks can be identified quickly and the necessary response initiated. This paper evaluates the performance of the security framework OSTROM, built on the Esper complex event processing (CEP) engine, under parallel and non-parallel computational frameworks. We explain three architectures under which Esper can be used to process events. We investigated the effect on throughput, memory and CPU usage in each configuration setting. The results indicate that the performance of the engine is limited by the number of incoming events rather than by the queries being processed. The architecture in which 1/4th of the total events are submitted to each instance and all the queries are processed by all the units shows the best results in terms of throughput, memory and CPU usage.
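
    The best-performing configuration can be pictured as an even split of the event stream across engine instances that all hold the full query set; a schematic sketch (the engine type is a stand-in, not the Esper API):

        // Sketch only: 1/N round-robin event dispatch to N identical engine instances.
        #include <vector>
        #include <cstddef>

        struct Event { /* payload fields omitted */ };

        struct EngineInstance {
            // Stand-in for a CEP engine instance with all queries registered.
            void send(const Event&) { /* evaluate all queries against the event */ }
        };

        void dispatch(const std::vector<Event>& batch,
                      std::vector<EngineInstance>& engines) {
            for (std::size_t i = 0; i < batch.size(); ++i)
                engines[i % engines.size()].send(batch[i]);   // 1/N of the stream each
        }

        int main() {
            std::vector<EngineInstance> engines(4);   // the 1/4th-per-instance setup
            std::vector<Event> batch(1000);
            dispatch(batch, engines);
        }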

  2. Optimizing SIEM Throughput on the Cloud Using Parallelization.

    Directory of Open Access Journals (Sweden)

    Masoom Alam

    Full Text Available Processing large amounts of data in real time to identify security issues poses several performance challenges, especially when hardware infrastructure is limited. Managed Security Service Providers (MSSPs), mostly hosting their applications on the Cloud, receive events at a very high rate that varies from a few hundred to a couple of thousand events per second (EPS). It is critical to process this data efficiently, so that attacks can be identified quickly and the necessary response initiated. This paper evaluates the performance of the security framework OSTROM, built on the Esper complex event processing (CEP) engine, under parallel and non-parallel computational frameworks. We explain three architectures under which Esper can be used to process events. We investigated the effect on throughput, memory and CPU usage in each configuration setting. The results indicate that the performance of the engine is limited by the number of incoming events rather than by the queries being processed. The architecture in which 1/4th of the total events are submitted to each instance and all the queries are processed by all the units shows the best results in terms of throughput, memory and CPU usage.

  3. AMITIS: A 3D GPU-Based Hybrid-PIC Model for Space and Plasma Physics

    Science.gov (United States)

    Fatemi, Shahab; Poppe, Andrew R.; Delory, Gregory T.; Farrell, William M.

    2017-05-01

    We have developed, for the first time, an advanced modeling infrastructure in space simulations (AMITIS) with an embedded three-dimensional self-consistent grid-based hybrid model of plasma (kinetic ions and fluid electrons) that runs entirely on graphics processing units (GPUs). The model uses NVIDIA GPUs and their associated parallel computing platform, CUDA, developed for general purpose processing on GPUs. The model uses a single CPU-GPU pair, where the CPU transfers data between the system and GPU memory, executes CUDA kernels, and writes simulation outputs on the disk. All computations, including moving particles, calculating macroscopic properties of particles on a grid, and solving hybrid model equations are processed on a single GPU. We explain various computing kernels within AMITIS and compare their performance with an already existing well-tested hybrid model of plasma that runs in parallel using multi-CPU platforms. We show that AMITIS runs ∼10 times faster than the parallel CPU-based hybrid model. We also introduce an implicit solver for computation of Faraday’s Equation, resulting in an explicit-implicit scheme for the hybrid model equation. We show that the proposed scheme is stable and accurate. We examine the AMITIS energy conservation and show that the energy is conserved with an error < 0.2% after 500,000 timesteps, even when a very low number of particles per cell is used.
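
    Schematically, the implicit treatment of Faraday's equation mentioned above amounts to evaluating the curl with the updated field; the notation below is assumed for illustration, not taken from the paper:

        % Faraday's law, advanced with a backward-Euler-type step:
        \frac{\partial \mathbf{B}}{\partial t} = -\nabla \times \mathbf{E}
        \quad \Longrightarrow \quad
        \mathbf{B}^{n+1} = \mathbf{B}^{n} - \Delta t \,\nabla \times \mathbf{E}\!\left(\mathbf{B}^{n+1}\right)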

  4. High-performance computing on GPUs for resistivity logging of oil and gas wells

    Science.gov (United States)

    Glinskikh, V.; Dudaev, A.; Nechaev, O.; Surodina, I.

    2017-10-01

    We developed and implemented into software an algorithm for the high-performance simulation of electrical logs of oil and gas wells using high-performance heterogeneous computing. The numerical solution of the 2D forward problem is based on the finite-element method and the Cholesky decomposition for solving a system of linear algebraic equations (SLAE). Software implementations of the algorithm were made using NVIDIA CUDA technology and computing libraries, allowing us to perform the decomposition of the SLAE and find its solution on the central processing unit (CPU) and the graphics processing unit (GPU). The calculation time is analyzed as a function of the matrix size and the number of its non-zero elements. We estimated the computing speed on the CPU and GPU, including high-performance heterogeneous CPU-GPU computing. Using the developed algorithm, we simulated resistivity data in realistic models.
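
    The CPU-side solver step has the following structure; a sketch using a small dense system and the Eigen library as a stand-in (the paper's implementation uses CUDA libraries and large sparse FEM matrices):

        // Sketch only: solve a symmetric positive-definite system A x = b by Cholesky.
        #include <Eigen/Dense>
        #include <iostream>

        int main() {
            const int n = 4;                                  // toy size for illustration
            Eigen::MatrixXd M = Eigen::MatrixXd::Random(n, n);
            Eigen::MatrixXd A = M * M.transpose()
                              + n * Eigen::MatrixXd::Identity(n, n);   // make A SPD
            Eigen::VectorXd b = Eigen::VectorXd::Ones(n);

            Eigen::LLT<Eigen::MatrixXd> llt(A);               // A = L * L^T
            Eigen::VectorXd x = llt.solve(b);
            std::cout << "residual: " << (A * x - b).norm() << "\n";
        }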

  5. Fast plane wave density functional theory molecular dynamics calculations on multi-GPU machines

    International Nuclear Information System (INIS)

    Jia, Weile; Fu, Jiyun; Cao, Zongyan; Wang, Long; Chi, Xuebin; Gao, Weiguo; Wang, Lin-Wang

    2013-01-01

    Plane wave pseudopotential (PWP) density functional theory (DFT) calculation is the most widely used method for material simulations, but its absolute speed has stagnated due to the inability to use large-scale CPU-based computers. By a drastic redesign of the algorithm, and by moving all the major computation parts onto the GPU, we have reached a speed of 12 s per molecular dynamics (MD) step for a 512-atom system using 256 GPU cards. This is about 20 times faster than the CPU version of the code regardless of the number of CPU cores used. Our tests and analysis on different GPU platforms and configurations shed light on the optimal GPU deployments for PWP-DFT calculations. An 1800-step MD simulation is used to study the liquid phase properties of GaInP

  6. A Practical Framework to Study Low-Power Scheduling Algorithms on Real-Time and Embedded Systems

    Directory of Open Access Journals (Sweden)

    Jian (Denny) Lin

    2014-05-01

    Full Text Available With the advanced technology used to design VLSI (Very Large Scale Integration) circuits, low power and energy efficiency have played important roles in hardware and software implementation. Real-time scheduling is one of the fields that has attracted extensive attention in the design of low-power, embedded/real-time systems. Dynamic voltage scaling (DVS) and CPU shut-down are the two most popular techniques used to design the algorithms. In this paper, we first review the fundamental advances in the research on energy-efficient, real-time scheduling. Then, a unified framework with a real Intel PXA255 Xscale processor, namely real-energy, is designed, which can be used to measure the real performance of the algorithms. We conduct a case study to evaluate several classical algorithms using the framework. The energy efficiency and the quantitative difference in their performance, as well as the practical issues found in the implementation of these algorithms, are discussed. Our experiments show a gap between the theoretical and real results. Our framework not only gives researchers a tool to evaluate their system designs, but also helps them to bridge this gap in their future work.
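
    The core decision a DVS scheduler makes is to run each task at the lowest frequency that still meets its deadline; a sketch with a hypothetical frequency table (illustrative values, not PXA255 data):

        // Sketch only: pick the lowest feasible CPU frequency for a task.
        #include <vector>
        #include <algorithm>

        double pick_frequency(double cycles,                 // worst-case cycles of the task
                              double slack_s,                // time left until the deadline
                              std::vector<double> freqs_hz)  // supported frequencies
        {
            std::sort(freqs_hz.begin(), freqs_hz.end());
            for (double f : freqs_hz)
                if (cycles / f <= slack_s)   // execution time at f fits the slack
                    return f;                // lowest feasible frequency -> least energy
            return freqs_hz.back();          // otherwise run flat out
        }

        int main() {
            // Illustrative values only: 2e7 cycles due in 50 ms selects 400 MHz.
            double f = pick_frequency(2.0e7, 0.05, {100e6, 200e6, 300e6, 400e6});
            (void)f;
        }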

  7. Hybrid monitoring scheme for end-to-end performance enhancement of multicast-based real-time media

    Science.gov (United States)

    Park, Ju-Won; Kim, JongWon

    2004-10-01

    As real-time media applications based on IP multicast networks spread widely, end-to-end QoS (quality of service) provisioning for these applications has become very important. To guarantee the end-to-end QoS of multi-party media applications, it is essential to monitor the time-varying status of both network metrics (i.e., delay, jitter and loss) and system metrics (i.e., CPU and memory utilization). In this paper, targeting the multicast-enabled AG (Access Grid), a next-generation group collaboration tool based on multi-party media services, the applicability of a hybrid monitoring scheme that combines active and passive monitoring is investigated. The active monitoring measures network-layer metrics (i.e., network condition) with probe packets, while the passive monitoring checks both application-layer metrics (i.e., user traffic condition, by analyzing RTCP packets) and system metrics. By comparing these hybrid results, we attempt to pinpoint the causes of performance degradation and explore corresponding reactions to improve the end-to-end performance. The experimental results show that the proposed hybrid monitoring can provide useful information to coordinate the performance improvement of multi-party real-time media applications.

  8. Design, Results, Evolution and Status of the ATLAS Simulation at Point1 Project

    CERN Document Server

    AUTHOR|(SzGeCERN)377840; Fressard-Batraneanu, Silvia Maria; Ballestrero, Sergio; Contescu, Alexandru Cristian; Fazio, Daniel; Di Girolamo, Alessandro; Lee, Christopher Jon; Pozo Astigarraga, Mikel Eukeni; Scannicchio, Diana; Sedov, Alexey; Twomey, Matthew Shaun; Wang, Fuquan; Zaytsev, Alexander

    2015-01-01

    During the LHC Long Shutdown 1 period (LS1), which started in 2013, the Simulation at Point1 (Sim@P1) Project takes advantage, in an opportunistic way, of the TDAQ (Trigger and Data Acquisition) HLT (High Level Trigger) farm of the ATLAS experiment. This farm provides more than 1300 compute nodes, which are particularly suited for running event generation and Monte Carlo production jobs that are mostly CPU- and not I/O-bound. It is capable of running up to 2700 virtual machines (VMs) provided with 8 CPU cores each, for a total of up to 22000 parallel running jobs. This contribution gives a review of the design, the results, and the evolution of the Sim@P1 Project, operating a large-scale OpenStack-based virtualized platform deployed on top of the ATLAS TDAQ HLT farm computing resources. During LS1, Sim@P1 was one of the most productive ATLAS sites: it delivered more than 50 million CPU-hours and generated more than 1.7 billion Monte Carlo events for various analysis communities. The design aspects a...

  9. Design, Results, Evolution and Status of the ATLAS simulation in Point1 project.

    CERN Document Server

    Ballestrero, Sergio; The ATLAS collaboration; Brasolin, Franco; Contescu, Alexandru Cristian; Fazio, Daniel; Di Girolamo, Alessandro; Lee, Christopher Jon; Pozo Astigarraga, Mikel Eukeni; Scannicchio, Diana; Sedov, Alexey; Twomey, Matthew Shaun; Wang, Fuquan; Zaytsev, Alexander

    2015-01-01

    During the LHC long shutdown period (LS1), which started in 2013, the Simulation in Point1 (Sim@P1) project takes advantage, in an opportunistic way, of the trigger and data acquisition (TDAQ) farm of the ATLAS experiment. The farm provides more than 1500 compute nodes, which are particularly suitable for running event generation and Monte Carlo production jobs that are mostly CPU- and not I/O-bound. It is capable of running up to 2500 virtual machines (VMs) provided with 8 CPU cores each, for a total of up to 20000 parallel running jobs. This contribution gives a thorough review of the design, the results and the evolution of the Sim@P1 project, operating a large-scale OpenStack-based virtualized platform deployed on top of the ATLAS TDAQ farm computing resources. During LS1, Sim@P1 was one of the most productive GRID sites: it delivered more than 50 million CPU-hours and generated more than 1.7 billion Monte Carlo events for various analysis communities within the ATLAS collaboration. The particular design ...

  10. The Attributable Proportion of Specific Leisure-Time Physical Activities to Total Leisure Activity Volume Among US Adults, National Health and Nutrition Examination Survey 1999-2006.

    Science.gov (United States)

    Watson, Kathleen Bachtel; Dai, Shifan; Paul, Prabasaj; Carlson, Susan A; Carroll, Dianna D; Fulton, Janet

    2016-11-01

    Previous studies have examined participation in specific leisure-time physical activities (PA) among US adults. The purpose of this study was to identify specific activities that contribute substantially to total volume of leisure-time PA in US adults. Proportion of total volume of leisure-time PA moderate-equivalent minutes attributable to 9 specific types of activities was estimated using self-reported data from 21,685 adult participants (≥ 18 years) in the National Health and Nutrition Examination Survey 1999-2006. Overall, walking (28%), sports (22%), and dancing (9%) contributed most to PA volume. Attributable proportion was higher among men than women for sports (30% vs. 11%) and higher among women than men for walking (36% vs. 23%), dancing (16% vs. 4%), and conditioning exercises (10% vs. 5%). The proportion was lower for walking, but higher for sports, among active adults than those insufficiently active and increased with age for walking. Compared with other racial/ethnic groups, the proportion was lower for sports among non-Hispanic white men and for dancing among non-Hispanic white women. Walking, sports, and dance account for the most activity time among US adults overall, yet some demographic variations exist. Strategies for PA promotion should be tailored to differences across population subgroups.

  11. Multipurpose assessment for the quantification of Vibrio spp. and total bacteria in fish and seawater using multiplex real-time polymerase chain reaction

    Science.gov (United States)

    Kim, Ji Yeun; Lee, Jung-Lim

    2014-01-01

    Background This study describes the first multiplex real-time polymerase chain reaction assay developed, as a multipurpose assessment, for the simultaneous quantification of total bacteria and three Vibrio spp. (V. parahaemolyticus, V. vulnificus and V. anguillarum) in fish and seawater. The consumption of raw finfish as sushi or sashimi has been increasing the chance of Vibrio outbreaks in consumers. Freshness and quality of fishery products also depend on the total bacterial populations present. Results The detection sensitivity of the specific targets for the multiplex assay was 1 CFU mL−1 in pure culture and seawater, and 10 CFU g−1 in fish. While total bacterial counts by the multiplex assay were similar to those obtained by cultural methods, the levels of Vibrio detected by the multiplex assay were generally higher than those obtained by cultural methods for the same populations. Among the natural samples without Vibrio spp. inoculation, eight out of 10 seawater and three out of 20 fish samples were determined to contain Vibrio spp. Conclusion Our data demonstrate that this multiplex assay could be useful for the rapid detection and quantification of Vibrio spp. and total bacteria, as a multipurpose tool for surveillance of fish and water quality as well as a diagnostic method. © 2014 The Authors. Journal of the Science of Food and Agriculture published by John Wiley & Sons Ltd on behalf of the Society of Chemical Industry. PMID:24752974

  12. US-Total Electron Content Product (USTEC)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The US Total Electron Content (US-TEC) product is designed to specify TEC over the Continental US (CONUS) in near real-time. The product uses a Kalman Filter data...

  13. Polydrug use among college students in Brazil: a nationwide survey

    OpenAIRE

    Oliveira,Lúcio Garcia de; Alberghini,Denis Guilherme; Santos,Bernardo dos; Andrade,Arthur Guerra de

    2013-01-01

    Objective: To estimate the frequency of polydrug use (alcohol and illicit drugs) among college students and its associations with gender and age group. Methods: A nationwide sample of 12,544 college students was asked to complete a questionnaire on their use of drugs according to three time parameters (lifetime, past 12 months, and last 30 days). The co-use of drugs was investigated as concurrent polydrug use (CPU) and simultaneous polydrug use (SPU), a subcategory of CPU that involves the ...

  14. Potential and limitation of mid-infrared attenuated total reflectance spectroscopy for real time analysis of raw milk in milking lines.

    Science.gov (United States)

    Linker, Raphael; Etzion, Yael

    2009-02-01

    Real-time information about milk composition would be very useful for managing the milking process. Mid-infrared spectroscopy, which relies on fundamental modes of molecular vibrations, is routinely used for off-line analysis of milk, and the purpose of the present study was to investigate the potential of attenuated total reflectance mid-infrared spectroscopy for real-time analysis of milk in milking lines. The study was conducted with 189 samples from over 70 cows that were collected during an 18-month period. Principal component analysis, wavelets and neural networks were used to develop various models for predicting protein and fat concentration. Although reasonable protein models were obtained for some seasonal sub-datasets (determination errors protein), the models lacked robustness and it was not possible to develop a model suitable for all the data. Determination of fat concentration proved even more problematic, and the determination errors remained unacceptably large regardless of the sub-dataset analyzed or of the spectral intervals used. These poor results can be explained by the limited penetration depth of the mid-infrared radiation, which causes the spectra to be very sensitive to the presence of fat globules or fat biofilms in the boundary layer that forms at the interface between the milk and the crystal that serves both as radiation waveguide and sensing element. Since manipulations such as homogenisation are not permissible for in-line analysis, these results show that the potential of mid-infrared attenuated total reflectance spectroscopy for in-line milk analysis is indeed quite limited.

  15. Optimisation of high-quality total ribonucleic acid isolation from cartilaginous tissues for real-time polymerase chain reaction analysis.

    Science.gov (United States)

    Peeters, M; Huang, C L; Vonk, L A; Lu, Z F; Bank, R A; Helder, M N; Doulabi, B Zandieh

    2016-11-01

    Studies which consider the molecular mechanisms of degeneration and regeneration of cartilaginous tissues are seriously hampered by problematic ribonucleic acid (RNA) isolations due to low cell density and the dense, proteoglycan-rich extracellular matrix of cartilage. Proteoglycans tend to co-purify with RNA, they can absorb the full spectrum of UV light, and they are potent inhibitors of polymerase chain reaction (PCR). Therefore, the objective of the present study was to compare and optimise different homogenisation methods and RNA isolation kits for an array of cartilaginous tissues. Tissue samples such as the nucleus pulposus (NP), annulus fibrosus (AF), articular cartilage (AC) and meniscus were collected from goats and homogenised by either the MagNA Lyser or the Freezer Mill. RNA of duplicate samples was subsequently isolated by either TRIzol (benchmark) or the RNeasy Lipid Tissue, RNeasy Fibrous Tissue, or Aurum Total RNA Fatty and Fibrous Tissue kits. RNA yield, purity, and integrity were determined, and gene expression levels of type II collagen and aggrecan were measured by real-time PCR. No differences between the two homogenisation methods were found. RNA isolation using the RNeasy Fibrous and Lipid kits resulted in the purest RNA (A260/A280 ratio), whereas TRIzol isolations resulted in RNA that was not as pure and showed a larger difference in gene expression between duplicate samples compared with both RNeasy kits. The Aurum kit showed low reproducibility. For the extraction of high-quality RNA from cartilaginous structures, we suggest homogenisation of the samples by the MagNA Lyser. For AC, NP and AF we recommend the RNeasy Fibrous kit, whereas for the meniscus the RNeasy Lipid kit is advised. Cite this article: M. Peeters, C. L. Huang, L. A. Vonk, Z. F. Lu, R. A. Bank, M. N. Helder, B. Zandieh Doulabi. Optimisation of high-quality total ribonucleic acid isolation from cartilaginous tissues for real-time polymerase chain reaction analysis. Bone Joint Res 2016

  16. A New Multiscale Technique for Time-Accurate Geophysics Simulations

    Science.gov (United States)

    Omelchenko, Y. A.; Karimabadi, H.

    2006-12-01

    Large-scale geophysics systems are frequently described by multiscale reactive flow models (e.g., wildfire and climate models, multiphase flows in porous rocks, etc.). Accurate and robust simulations of such systems by traditional time-stepping techniques face a formidable computational challenge. Explicit time integration suffers from global (CFL and accuracy) timestep restrictions due to inhomogeneous convective and diffusion processes, as well as closely coupled physical and chemical reactions. Application of adaptive mesh refinement (AMR) to such systems may not always be sufficient, since its success critically depends on a careful choice of domain refinement strategy. On the other hand, implicit and timestep-splitting integrations may result in a considerable loss of accuracy when fast transients in the solution become important. To address this issue, we developed an alternative explicit approach to time-accurate integration of such systems: Discrete-Event Simulation (DES). DES enables asynchronous computation by automatically adjusting the CPU resources in accordance with local timescales. This is done by encapsulating flux-conservative updates of numerical variables in the form of events, whose execution and synchronization is explicitly controlled by imposing accuracy and causality constraints. As a result, at each time step DES self-adaptively updates only a fraction of the global system state, which eliminates unnecessary computation of inactive elements. DES can be naturally combined with various mesh generation techniques. The event-driven paradigm results in robust and fast simulation codes, which can be efficiently parallelized via a new preemptive event processing (PEP) technique. We discuss applications of this novel technology to time-dependent diffusion-advection-reaction and CFD models representative of various geophysics applications.
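
    The event-driven loop replaces global time-stepping with a queue of self-rescheduling updates, so idle elements cost nothing; a minimal sketch (the update payload and the local-timestep rule are placeholders):

        // Sketch only: a discrete-event loop with per-element local timesteps.
        #include <queue>
        #include <vector>

        struct Event {
            double time;     // when this element must be updated next
            int    element;  // which element to update
            bool operator>(const Event& o) const { return time > o.time; }
        };

        int main() {
            std::priority_queue<Event, std::vector<Event>, std::greater<Event>> q;
            q.push({0.10, 0});
            q.push({0.05, 1});                            // initial self-scheduled events
            const double t_end = 1.0;
            while (!q.empty() && q.top().time < t_end) {
                Event e = q.top(); q.pop();
                // ...flux-conservative update of e.element at local time e.time...
                double dt_local = 0.1;                    // set by accuracy/causality constraints
                q.push({e.time + dt_local, e.element});   // reschedule asynchronously
            }
        }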

  17. Minimizing total weighted completion time in a proportionate flow shop

    NARCIS (Netherlands)

    Shakhlevich, N.V.; Hoogeveen, J.A.; Pinedo, M.L.

    1998-01-01

    We study the special case of the m-machine flow shop problem in which the processing time of each operation of job j is equal to p_j; this variant of the flow shop problem is known as the proportionate flow shop problem. We show that for any number of machines and for any regular performance

  18. Total neutron-counting plutonium inventory measurement systems (PIMS) and their potential application to near real time materials accountancy (NRTMA)

    International Nuclear Information System (INIS)

    Driscall, I.; Fox, G.H.; Orr, C.H.; Whitehouse, K.R.

    1988-01-01

    A radiometric method of determining the inventory of an operating plutonium plant is described. An array of total neutron counters distributed across the plant is used to estimate hold-up at each plant item. Corrections for the sensitivity of detectors to plutonium in adjacent plant items are achieved through a matrix approach. This paper describes our experience in design, calibration and operation of a Plutonium Inventory Measurement System (PIMS) on an oxalate precipitation plutonium finishing line. Data from a recent trial of Near-Real-Time Materials Accounting (NRTMA) using the PIMS are presented and used to illustrate its present performance and problem areas. The reader is asked to consider what role PIMS might have in future accountancy systems

  19. Bulk metal concentrations versus total suspended solids in rivers: Time-invariant & catchment-specific relationships.

    Science.gov (United States)

    Nasrabadi, Touraj; Ruegner, Hermann; Schwientek, Marc; Bennett, Jeremy; Fazel Valipour, Shahin; Grathwohl, Peter

    2018-01-01

    Suspended particles in rivers can act as carriers of potentially bioavailable metal species and are thus an emerging area of interest in river system monitoring. The delineation of bulk metal concentrations in river water into dissolved and particulate components is also important for risk assessment. Linear relationships between bulk metal concentrations in water (C_W,tot) and total suspended solids (TSS) in water can be used to easily evaluate dissolved (C_W, the intercept) and particle-bound (C_SUS, the slope) metal fluxes in streams: C_W,tot = C_W + C_SUS * TSS. In this study, we apply this principle to catchments in Iran (Haraz) and Germany (Ammer, Goldersbach, and Steinlach) that show differences in geology, geochemistry, land use and hydrological characteristics. For each catchment, particle-bound and dissolved concentrations for a suite of metals in water were calculated based on linear regressions of total suspended solids and total metal concentrations. Results were replicable across sampling campaigns in different years and seasons (between 2013 and 2016) and could be reproduced in a laboratory sedimentation experiment. C_SUS values generally showed little variability in different catchments and agree well with soil background values for some metals (e.g. lead and nickel), while other metals (e.g. copper) indicate anthropogenic influences. C_W was elevated in the Haraz (Iran) catchment, indicating higher bioavailability and potential human and ecological health concerns (where higher values of C_SUS/C_W are considered as a risk indicator).

  20. Modeling of the Ionospheric Scintillation and Total Electron Content Observations during the 21 August 2017 Total Solar Eclipse

    Science.gov (United States)

    Datta-Barua, S.; Gachancipa, J. N.; Deshpande, K.; Herrera, J. A.; Lehmacher, G. A.; Su, Y.; Gyuk, G.; Bust, G. S.; Hampton, D. L.

    2017-12-01

    High concentration of free electrons in the ionosphere can cause fluctuations in incoming electromagnetic waves, such as those from the different Global Navigation Satellite Systems (GNSS). The behavior of the ionosphere depends on time and location, and it is highly influenced by solar activity. The purpose of this study is to determine the impact of a total solar eclipse on the local ionosphere in terms of ionospheric scintillations, and on the global ionosphere in terms of TEC (Total Electron Content). The studied eclipse occurred on 21 August 2017 across the continental United States. During the eclipse, we expected to see a decrease in the scintillation strength, as well as in the TEC values. As a broader impact part of our recently funded NSF proposal, we temporarily deployed two GNSS receivers on the eclipse's totality path. One GNSS receiver was placed in Clemson, SC. This is a multi-frequency GNSS receiver (NovAtel GPStation-6) capable of measuring high and low rate scintillation data as well as TEC values from four different GNSS systems. We had the receiver operating before, during, and after the solar eclipse to enable the comparison between eclipse and non-eclipse periods. A twin receiver collected data at Daytona Beach, FL during the same time, where an 85% partial solar eclipse was observed. Additionally, we set up a ground receiver onsite in the path of totality in Perryville, Missouri, from which the Adler Planetarium of Chicago launched a high-altitude balloon to capture a 360-degree video of the eclipse from the stratosphere. By analyzing the collected data, this study looks at the effects of partial and total solar eclipse periods on high rate GNSS scintillation data at mid-latitudes, which had not been explored in detail. This study also explores the impact of solar eclipses on signals from different satellite constellations (GPS, GLONASS, and Galileo). Throughout the eclipse, the scintillation values did not appear to have dramatic changes

  1. Total nitrogen and total phosphorus removal from brackish aquaculture wastewater using effective microorganism

    Science.gov (United States)

    Mohamad, K. A.; Mohd, S. Y.; Sarah, R. S.; Mohd, H. Z.; Rasyidah, A.

    2017-09-01

    Aquaculture is one of the dominant food-based industries in the world, with an 8.3% annual growth rate, and its development has led to adverse effects on the environment. High nutrient production in the form of nitrogenous compounds and phosphorus contributes to environmental deterioration such as eutrophication and toxicity in the industry. The use of Effective Microorganisms (EM), a biological approach to removing Total Nitrogen (TN) and Total Phosphorus (TP) from aquaculture ponds, was proposed. Samples were obtained from Sea Bass intensive brackish aquaculture wastewater (AW) from a fish farm at Juru, Penang, and the parameters used to measure the removal of nitrogenous compounds include pH, EM dosage, shaking, contact time, and optimum variable conditions. From the study, day 6 is the optimum contact time for both TN and TP, with 99.74% and 62.78% removal respectively, while in terms of optimum pH, the highest TN removal was at pH 7 with 66.89%. The optimum dosage of EM, found appropriate during the experiment, is 1.5 ml at a ratio of 1:166 for 81.5% TN removal. At varied optimum conditions of EM, the removal efficiencies of TN and TP were 81.53% and 38.94% respectively, while the removal mechanism of TN was highly dependent on the decomposition rate of specific bacteria such as Nitrobacter bacteria, yeast, and Bacillus subtilis sp. The study has established the efficacy of EM in treating excessive nutrients of TN and TP in AW.

  2. Trends in television and computer/videogame use and total screen time in high school students from Caruaru city, Pernambuco, Brazil: A repeated panel study between 2007 and 2012

    Directory of Open Access Journals (Sweden)

    Luis José Lagos Aros

    2018-01-01

    Full Text Available Aim: to analyze the pattern and trends of use of screen-based devices and associated factors from two surveys conducted among public high school students in Caruaru-PE. Methods: two representative school-based cross-sectional surveys conducted in 2007 (n=600) and 2012 (n=715) among high school students (15-20 years old). The time of exposure to television (TV) and computer/videogames (PC/VG) was obtained through a validated questionnaire, and ≥3 hours/day was considered excessive exposure. The independent variables were socioeconomic status, school-related factors, and physical activity. Crude and adjusted binary logistic regression were employed to examine the factors associated with screen time. Statistical significance was set at p<0.05. Results: There was a significant reduction in TV time on weekdays and in total weekly TV time, but no change in the prevalence of excessive exposure. The proportion of exposure to PC/VG of ≥3 hours/day increased by 182.5% on weekdays and 69.5% on weekends (p<0.05). In 2007, being physically active was the only protective factor against excessive exposure to total screen time. In 2012, girls presented a lower chance of excessive exposure to all screen-based devices and total screen time. Other protective factors were studying at night and being physically active (PC/VG time), while residing in an urban area [OR 5.03 (2.77-7.41)] and having a higher family income [OR 1.55 (1.04-2.30)] were risk factors. Conclusion: Significant and important changes in the time trends and pattern of PC/VG use were observed during the interval of 5 years. This rapid increase could be associated with increased family income and improved access to these devices, driven by technological developments.

  3. Evaluation of FPGA Coprocessors for Accelerating the Execution of Track Reconstruction Algorithms in the ATLAS LVL2 Trigger

    CERN Document Server

    Khomich, Andrei

    2006-01-01

    In the scope of this thesis, one of the possible approaches to accelerating tracking algorithms using hybrid FPGA/CPU systems has been investigated. The TRT LUT-Hough algorithm, one of the tracking algorithms for the ATLAS Level-2 trigger, was selected for this purpose. It is a Look-Up Table (LUT) based Hough transform algorithm for the Transition Radiation Tracker (TRT). The algorithm was created with B-physics tasks in mind: a fast search for low-pT tracks in the entire TRT volume. Such a full subdetector scan requires a lot of computational power. A hybrid implementation of the algorithm (in which the most time-consuming part of the algorithm is accelerated by an FPGA co-processor while all other parts run on a general-purpose CPU) is integrated in the same software framework as a C++ implementation for comparison. Identical physical results are obtained for both the CPU and the hybrid implementations. Timing measurements show that the critical part, implemented in VHDL, runs on the FPGA co-processor ~4 ...

  4. Parallelization for X-ray crystal structural analysis program

    Energy Technology Data Exchange (ETDEWEB)

    Watanabe, Hiroshi [Japan Atomic Energy Research Inst., Tokyo (Japan); Minami, Masayuki; Yamamoto, Akiji

    1997-10-01

    In this report we study the vectorization and parallelization of an X-ray crystal structural analysis program. The target machine is the NEC SX-4, a distributed/shared-memory vector-parallel supercomputer. X-ray crystal structural analysis is surveyed, and a new multi-dimensional discrete Fourier transform method is proposed. The new method is designed to have a very long vector length, enabling performance 12.0 times higher than that of the original code. Besides the above-mentioned vectorization, parallelization by micro-task functions on the SX-4 reaches a 13.7-times acceleration in the multi-dimensional discrete Fourier transform part with 14 CPUs, and a 3.0-times acceleration in the whole program. In total, a 35.9-times acceleration over the original 1-CPU scalar version is achieved with vectorization and parallelization on the SX-4. (author)

  5. Double dissociation of the anterior and posterior dorsomedial caudate-putamen in the acquisition and expression of associative learning with the nicotine stimulus.

    Science.gov (United States)

    Charntikov, Sergios; Pittenger, Steven T; Swalve, Natashia; Li, Ming; Bevins, Rick A

    2017-07-15

    Tobacco use is the leading cause of preventable deaths worldwide. This habit is not only debilitating to individual users but also to those around them (second-hand smoking). Nicotine is the main addictive component of tobacco products and is a moderate stimulant and a mild reinforcer. Importantly, besides its unconditional effects, nicotine also has conditioned stimulus effects that may contribute to the tenacity of the smoking habit. Because the neurobiological substrates underlying these processes are virtually unexplored, the present study investigated the functional involvement of the dorsomedial caudate putamen (dmCPu) in learning processes with nicotine as an interoceptive stimulus. Rats were trained using the discriminated goal-tracking task where nicotine injections (0.4 mg/kg; SC), on some days, were paired with intermittent (36 per session) sucrose deliveries; sucrose was not available on interspersed saline days. Pre-training excitotoxic or post-training transient lesions of anterior or posterior dmCPu were used to elucidate the role of these areas in acquisition or expression of associative learning with nicotine stimulus. Pre-training lesion of p-dmCPu inhibited acquisition while post-training lesions of p-dmCPu attenuated the expression of associative learning with the nicotine stimulus. On the other hand, post-training lesions of a-dmCPu evoked nicotine-like responding following saline treatment indicating the role of this area in disinhibition of learned motor behaviors. These results, for the first time, show functionally distinct involvement of a- and p-dmCPu in various stages of associative learning using nicotine stimulus and provide an initial account of neural plasticity underlying these learning processes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. GPU Computing in Bayesian Inference of Realized Stochastic Volatility Model

    International Nuclear Information System (INIS)

    Takaishi, Tetsuya

    2015-01-01

    The realized stochastic volatility (RSV) model, which utilizes the realized volatility as additional information, has been proposed to infer the volatility of financial time series. We consider the Bayesian inference of the RSV model by the Hybrid Monte Carlo (HMC) algorithm. The HMC algorithm can be parallelized and thus performed on the GPU for speedup. The GPU code is developed with CUDA Fortran. We compare the computational time of the HMC algorithm on a GPU (GTX 760) and a CPU (Intel i7-4770, 3.4 GHz) and find that the GPU can be up to 17 times faster than the CPU. We also code the program with OpenACC and find that appropriate coding can achieve a speedup similar to that of CUDA Fortran
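
    The computational core that benefits from parallelization is the leapfrog integrator inside each HMC update; a sketch with a placeholder gradient (the RSV log-posterior gradient is what the paper actually evaluates):

        // Sketch only: leapfrog integration of (theta, p) for an HMC proposal.
        #include <vector>
        #include <cstddef>

        // Placeholder: gradient of U(theta) = 0.5 * theta^2 (a unit Gaussian).
        void grad(const std::vector<double>& theta, std::vector<double>& g) {
            for (std::size_t i = 0; i < theta.size(); ++i) g[i] = theta[i];
        }

        void leapfrog(std::vector<double>& theta, std::vector<double>& p,
                      double eps, int steps) {
            std::vector<double> g(theta.size());
            grad(theta, g);
            for (int s = 0; s < steps; ++s) {
                for (std::size_t i = 0; i < p.size(); ++i) p[i] -= 0.5 * eps * g[i];
                for (std::size_t i = 0; i < p.size(); ++i) theta[i] += eps * p[i];
                grad(theta, g);   // the dominant cost; parallelizes across components
                for (std::size_t i = 0; i < p.size(); ++i) p[i] -= 0.5 * eps * g[i];
            }
        }

        int main() {
            std::vector<double> theta(3, 1.0), p(3, 0.0);
            leapfrog(theta, p, 0.01, 100);
        }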

  7. Extending total parenteral nutrition hang time in the neonatal intensive care unit: is it safe and cost effective?

    Science.gov (United States)

    Balegar V, Kiran Kumar; Azeem, Mohammad Irfan; Spence, Kaye; Badawi, Nadia

    2013-01-01

    To investigate the effects of prolonging hang time of total parenteral nutrition (TPN) fluid on central line-associated blood stream infection (CLABSI), TPN-related cost and nursing workload. A before-after observational study comparing the practice of hanging TPN bags for 48 h (6 February 2009-5 February 2010) versus 24 h (6 February 2008-5 February 2009) in a tertiary neonatal intensive care unit was conducted. The main outcome measures were CLABSI, TPN-related expenses and nursing workload. One hundred thirty-six infants received 24-h TPN bags and 124 received 48-h TPN bags. Median (inter-quartile range) gestation (37 weeks (33,39) vs. 36 weeks (33,39)), mean (±standard deviation) admission weight of 2442 g (±101) versus 2476 g (±104) and TPN duration (9.7 days (±12.7) vs. 9.9 days (±13.4)) were similar (P > 0.05) between the 24- and 48-h TPN groups. There was no increase in CLABSI with longer hang time (0.8 vs. 0.4 per 1000 line days in the 24-h vs. 48-h group; P < 0.05). Annual cost saving using 48-h TPN was AUD 97,603.00. By using 48-h TPN, 68.3% of nurses indicated that their workload decreased and 80.5% indicated that time spent changing TPN reduced. Extending TPN hang time from 24 to 48 h did not alter CLABSI rate and was associated with a reduced TPN-related cost and perceived nursing workload. Larger randomised controlled trials are needed to more clearly delineate these effects. © 2012 The Authors. Journal of Paediatrics and Child Health © 2012 Paediatrics and Child Health Division (Royal Australasian College of Physicians).

  8. Dissociated time course between peak torque and total work recovery following bench press training in resistance trained men.

    Science.gov (United States)

    Ferreira, Diogo V; Gentil, Paulo; Ferreira-Junior, João B; Soares, Saulo R S; Brown, Lee E; Bottaro, Martim

    2017-10-01

    To evaluate the time course of peak torque and total work recovery after a resistance training session involving the bench press exercise. Repeated measures with a within subject design. Twenty-six resistance-trained men (age: 23.7±3.7years; height: 176.0±5.7cm; mass: 79.65±7.61kg) performed one session involving eight sets of the bench press exercise performed to momentary muscle failure with 2-min rest between sets. Shoulder horizontal adductors peak torque (PT), total work (TW), delayed onset muscle soreness (DOMS) and subjective physical fitness were measured pre, immediately post, 24, 48, 72 and 96h following exercise. The exercise protocol resulted in significant pectoralis major DOMS that lasted for 72h. Immediately after exercise, the reduction in shoulder horizontal adductors TW (25%) was greater than PT (17%). TW, as a percentage of baseline values, was also less than PT at 24, 48 and 96h after exercise. Additionally, PT returned to baseline at 96h, while TW did not. Resistance trained men presented dissimilar PT and TW recovery following free weight bench press exercise. This indicates that recovery of maximal voluntary contraction does not reflect the capability to perform multiple contractions. Strength and conditioning professionals should be cautious when evaluating muscle recovery by peak torque, since it can lead to the repetition of a training session sooner than recommended. Copyright © 2017. Published by Elsevier Inc.

  9. Periodicity and time trends in the prevalence of total births and conceptions with congenital malformations among Jews and Muslims in Israel, 1999-2006: a time series study of 823,966 births.

    Science.gov (United States)

    Agay-Shay, Keren; Friger, Michael; Linn, Shai; Peled, Ammatzia; Amitai, Yona; Peretz, Chava

    2012-06-01

    BACKGROUND Congenital malformations (CMs) are a leading cause of infant disability. Geophysical patterns such as 2-year, yearly, half-year, 3-month, and lunar cycles regulate much of the temporal biology of all life on Earth and may affect birth and birth outcomes in humans. Therefore, the aim of this study was to evaluate and compare trends and periodicity in total births and CM conceptions in two Israeli populations. METHODS Poisson nonlinear models (polynomial) were applied to study and compare trends and geophysical periodicity cycles of weekly births and weekly prevalence rate of CM (CMPR), in a time-series design of conception date within and between Jews and Muslims. The population included all live births and stillbirths (n = 823,966) and CM (three anatomic systems, eight CM groups [n = 2193]) in Israel during 2000 to 2006. Data were obtained from the Ministry of Health. RESULTS We describe the trend and periodicity cycles for total birth conceptions. Of eight groups of CM, periodicity cycles were statistically significant in four CM groups for either Jews or Muslims. Lunar month and biennial periodicity cycles not previously investigated in the literature were found to be statistically significant. Biennial cycle was significant in total births (Jews and Muslims) and syndactyly (Muslims), whereas lunar month cycle was significant in total births (Muslims) and atresia of small intestine (Jews). CONCLUSION We encourage others to use the method we describe as an important tool to investigate the effects of different geophysical cycles on human health and pregnancy outcomes, especially CM, and to compare between populations. Copyright © 2012 Wiley Periodicals, Inc.

  10. An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

    KAUST Repository

    Bonny, Talal

    2012-07-28

    Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in the development of bioinformatics. Sequence alignment algorithms must process large amounts of data, which may take a long time. Here, we introduce our Adaptive Hybrid Multiprocessor technique to accelerate the implementation of the Smith-Waterman algorithm. Our technique utilizes both the graphics processing unit (GPU) and the central processing unit (CPU). It adapts the implementation according to the number of CPUs given as input by efficiently distributing the workload between the processing units. Using existing resources (GPU and CPU) in an efficient way is a novel approach. The peak performance achieved for the platforms GPU + CPU, GPU + 2CPUs, and GPU + 3CPUs is 10.4 GCUPS, 13.7 GCUPS, and 18.6 GCUPS, respectively (with a query length of 511 amino acids). © 2010 IEEE.
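
    The adaptive part can be pictured as splitting the sequence database between the GPU and the CPUs in proportion to their measured throughput, so that all units finish at about the same time; a sketch with placeholder throughput figures:

        // Sketch only: throughput-proportional workload split between GPU and CPUs.
        #include <cstddef>

        struct Split { std::size_t gpu_seqs, cpu_seqs; };

        Split divide(std::size_t total_seqs, double gpu_gcups,
                     double cpu_gcups_each, int num_cpus) {
            double total = gpu_gcups + cpu_gcups_each * num_cpus;
            std::size_t gpu = (std::size_t)(total_seqs * (gpu_gcups / total));
            return { gpu, total_seqs - gpu };   // remainder is shared by the CPUs
        }

        int main() {
            Split s = divide(100000, 8.0, 1.5, 3);   // placeholder GCUPS figures
            (void)s;
        }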

  11. Multi-microprocessor control of the main ring magnet power supply of the 12 GeV KEK proton synchrotron

    International Nuclear Information System (INIS)

    Sueno, T.; Mikawa, K.; Toda, M.; Toyama, T.; Sato, H.; Matsumoto, S.

    1992-01-01

    A general description of the computer control system of the KEK 12 GeV PS main ring magnet power supply is given, including its peripheral devices. The system consists of the main HIDIC-V90/25 CPU and of the input and output controllers HISEC-04M. The main CPU, supervised by UNIX, provides the man-machine interfacing and implements the repetitive control algorithm to correct for any magnet current deviation from the reference. Two sub-CPUs are linked by a LAN and supported by a real-time multi-task monitor. The output process controller distributes the control patterns to 16-bit DACs, at a 1.67 ms clock period, in synchronism with the 3-phase AC line systems. The input controller logs the magnet current and voltage via 16-bit ADCs at the same clock rate. (author)

  12. Progress and new advances in simulating electron microscopy datasets using MULTEM

    Energy Technology Data Exchange (ETDEWEB)

    Lobato, I., E-mail: Ivan.Lobato@uantwerpen.be; Van Aert, S.; Verbeeck, J.

    2016-09-15

    A new version of the open source program MULTEM is presented here. It includes a graphical user interface, tapering truncation of the atomic potential, CPU multithreading functionality, single/double precision calculations, scanning transmission electron microscopy (STEM) simulations using experimental detector sensitivities, imaging STEM (ISTEM) simulations, energy filtered transmission electron microscopy (EFTEM) simulations, STEM electron energy loss spectroscopy (EELS) simulations along with other improvements in the algorithms. We also present a mixed channeling approach for the calculation of inelastic excitations, which allows one to considerably speed up time consuming EFTEM/STEM-EELS calculations. - Highlights: • We present a new version of the CPU/GPU open source program MULTEM. • A cross-platform graphical user interface is developed. • We include inelastic excitations for EFTEM/STEM-EELS calculations. • We add CPU multithreading functionality and single/double precision calculations.

  13. Energy Adaption for Multimedia Information Kiosks

    DEFF Research Database (Denmark)

    Urunuela, Richard; Muller, Gilles; Lawall, Julia Laetitia

    2006-01-01

    Video kiosks increasingly contain powerful PC-like embedded processors, allowing them to display video at a high level of quality. Such video display, however, entails significant energy consumption. This paper presents an approach to reducing energy consumption by adapting the CPU clock frequency.... In contrast to previous approaches, we exploit the specific behavior of a video kiosk. Because a kiosk plays the same set of movies over and over, we choose a CPU frequency for a given frame based on the computational requirements of the frame that were observed on earlier iterations. We have implemented our... approach in the legacy video player MPlayer. On a PC like those that can be found in kiosks, we observe increases in battery lifetime of up to 2 times as compared to running at the maximum CPU frequency on a set of high resolution divx movies....

  14. Progress and new advances in simulating electron microscopy datasets using MULTEM

    International Nuclear Information System (INIS)

    Lobato, I.; Van Aert, S.; Verbeeck, J.

    2016-01-01

    A new version of the open source program MULTEM is presented here. It includes a graphical user interface, tapering truncation of the atomic potential, CPU multithreading functionality, single/double precision calculations, scanning transmission electron microscopy (STEM) simulations using experimental detector sensitivities, imaging STEM (ISTEM) simulations, energy filtered transmission electron microscopy (EFTEM) simulations, STEM electron energy loss spectroscopy (EELS) simulations along with other improvements in the algorithms. We also present a mixed channeling approach for the calculation of inelastic excitations, which allows one to considerably speed up time consuming EFTEM/STEM-EELS calculations. - Highlights: • We present a new version of the CPU/GPU open source program MULTEM. • A cross-platform graphical user interface is developed. • We include inelastic excitations for EFTEM/STEM-EELS calculations. • We add CPU multithreading functionality and single/double precision calculations.

  15. A TBB-CUDA Implementation for Background Removal in a Video-Based Fire Detection System

    Directory of Open Access Journals (Sweden)

    Fan Wang

    2014-01-01

    Full Text Available This paper presents a parallel TBB-CUDA implementation for the acceleration of the single-Gaussian distribution model, which is effective for background removal in a video-based fire detection system. In this framework, TBB mainly handles the initialization of the estimated Gaussian model running on the CPU, and CUDA performs background removal and adaptation of the model running on the GPU. This implementation can exploit the combined computational power of TBB and CUDA, and it can be applied in real-time environments. Over 220 video sequences are utilized in the experiments. The experimental results illustrate that TBB+CUDA can achieve a higher speedup than either TBB or CUDA alone. The proposed framework can effectively overcome the disadvantages of the CPU's limited memory bandwidth and few execution units, and it reduces data transfer latency and memory latency between the CPU and GPU.
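
    The TBB side of the pipeline initializes the per-pixel means and variances of the Gaussian model in parallel on the CPU before the per-frame work moves to the GPU; a sketch with an assumed image layout and initial variance:

        // Sketch only: parallel initialization of a single-Gaussian background model.
        #include <tbb/parallel_for.h>
        #include <tbb/blocked_range.h>
        #include <vector>

        struct GaussianModel { std::vector<float> mean, var; };

        void init_model(GaussianModel& m, const std::vector<float>& first_frame) {
            m.mean.resize(first_frame.size());
            m.var.resize(first_frame.size());
            tbb::parallel_for(tbb::blocked_range<size_t>(0, first_frame.size()),
                [&](const tbb::blocked_range<size_t>& r) {
                    for (size_t i = r.begin(); i != r.end(); ++i) {
                        m.mean[i] = first_frame[i];   // seed the mean with the first frame
                        m.var[i]  = 15.0f * 15.0f;    // assumed initial variance
                    }
                });
        }

        int main() {
            GaussianModel m;
            std::vector<float> frame(640 * 480, 0.0f);   // assumed frame size
            init_model(m, frame);
        }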

  16. Real time analysis with the upgraded LHCb trigger in Run-III

    CERN Multimedia

    Szumlak, Tomasz

    2016-01-01

    The current LHCb trigger system consists of a hardware level, which reduces the LHC bunch-crossing rate of 40 MHz to 1 MHz, the rate at which the entire detector is read out, and a second level, implemented in a farm of around 20k parallel-processing CPUs, where the event rate is reduced to around 12.5 kHz. The LHCb experiment plans a major upgrade of the detector and DAQ system in the LHC long shutdown II (2018-2019). In this upgrade, a purely software-based trigger system is being developed, and it will have to process the full 30 MHz of bunch crossings with inelastic collisions. LHCb will also receive a factor of 5 increase in the instantaneous luminosity, which further contributes to the challenge of reconstructing and selecting events in real time with the CPU farm. We discuss the plans and progress towards achieving efficient reconstruction and selection with a 30 MHz throughput. Another challenge is to exploit the increased signal rate that results from removing the 1 MHz readout bottleneck, combined with the high...

  17. Totally optimal decision trees for Boolean functions

    KAUST Repository

    Chikalov, Igor

    2016-07-28

    We study decision trees which are totally optimal relative to different sets of complexity parameters for Boolean functions. A totally optimal tree is optimal relative to each parameter from the set simultaneously. We consider parameters characterizing both the time (in the worst and average case) and space complexity of decision trees, i.e., depth, total path length (average depth), and number of nodes. We have created tools based on extensions of dynamic programming to study totally optimal trees. These tools are applicable to both exact and approximate decision trees, and allow us to perform multi-stage optimization of decision trees relative to different parameters and to count the number of optimal trees. Based on the experimental results we have formulated the following hypotheses (and subsequently proved them): for almost all Boolean functions there exist totally optimal decision trees (i) relative to the depth and number of nodes, and (ii) relative to the depth and average depth.

  18. Total time on test processes and applications to failure data analysis

    International Nuclear Information System (INIS)

    Barlow, R.E.; Campo, R.

    1975-01-01

    This paper describes a new method for analyzing data. The method applies to non-negative observations such as times to failure of devices and survival times of biological organisms and involves a plot of the data. These plots are useful in choosing a probabilistic model to represent the failure behavior of the data. They also furnish information about the failure rate function and aid in its estimation. An important feature of these data plots is that incomplete data can be analyzed. The underlying random variables are, however, assumed to be independent and identically distributed. The plots have a theoretical basis, and converge to a transform of the underlying probability distribution as the sample size increases
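
    For reference, the plot described here is the scaled total time on test plot; with ordered failure times t_(1) <= ... <= t_(n), the standard construction (textbook form, not quoted from the paper) is:

        % total time on test accumulated up to the i-th ordered failure:
        T_i = \sum_{j=1}^{i} t_{(j)} + (n - i)\, t_{(i)},
        \qquad \text{plot}\ \left( \frac{i}{n},\ \frac{T_i}{T_n} \right),
        \quad i = 1, \dots, n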

  19. Impact of mobile intensive care unit use on total ischemic time and clinical outcomes in ST-elevation myocardial infarction patients - real-world data from the Acute Coronary Syndrome Israeli Survey.

    Science.gov (United States)

    Koifman, Edward; Beigel, Roy; Iakobishvili, Zaza; Shlomo, Nir; Biton, Yitschak; Sabbag, Avi; Asher, Elad; Atar, Shaul; Gottlieb, Shmuel; Alcalai, Ronny; Zahger, Doron; Segev, Amit; Goldenberg, Ilan; Strugo, Rafael; Matetzky, Shlomi

    2017-01-01

    Ischemic time has prognostic importance in ST-elevation myocardial infarction patients. Mobile intensive care unit use can reduce components of total ischemic time by appropriate triage of ST-elevation myocardial infarction patients. Data from the Acute Coronary Survey in Israel registry 2000-2010 were analyzed to evaluate factors associated with mobile intensive care unit use and its impact on total ischemic time and patient outcomes. The study comprised 5474 ST-elevation myocardial infarction patients enrolled in the Acute Coronary Survey in Israel registry, of whom 46% (n=2538) arrived via mobile intensive care units. There was a significant increase in the rate of mobile intensive care unit utilization, from 36% in 2000 to over 50% in 2010. Factors associated with mobile intensive care unit use included Killip class >1 (odds ratio=1.32). Patients arriving via mobile intensive care units benefitted from increased rates of primary reperfusion therapy (odds ratio=1.58) and from a shorter median total ischemic time compared with non-mobile intensive care unit patients (175 (interquartile range 120-262) vs 195 (interquartile range 130-333) min, respectively). Mobile intensive care unit use was the most important predictor of achieving the door-to-balloon time target, and adjusted one-year mortality was lower in the mobile intensive care unit group (odds ratio=0.79, 95% confidence interval (0.66-0.94), p=0.01). Among patients with ST-elevation myocardial infarction, the utilization of mobile intensive care units is associated with increased rates of primary reperfusion, a reduction in the time interval to reperfusion, and a reduction in one-year adjusted mortality.

  20. Factors affecting wound ooze in total knee replacement

    Science.gov (United States)

    Butt, U; Ahmad, R; Aspros, D; Bannister, GC

    2010-01-01

    INTRODUCTION Wound ooze is common following total knee arthroplasty (TKA), and persistent wound ooze is a risk factor for infection and for increased length and cost of hospitalisation. PATIENTS AND METHODS We undertook a prospective study to assess the effect of tourniquet time, peri-articular local anaesthesia and surgical approach on wound oozing after TKA. RESULTS The medial parapatellar approach was used in 59 patients (77%) and the subvastus approach in 18 patients (23%). Peri-articular local anaesthesia (0.25% Bupivacaine with 1:1,000,000 adrenalin) was used in 34 patients (44%). The mean tourniquet time was 83 min (range, 38-125 min). We found a significant association between cessation of oozing and peri-articular local anaesthesia (P = 0.003), length of tourniquet time (P = 0.03) and the subvastus approach (P = 0.01). CONCLUSIONS Peri-articular local anaesthesia, the subvastus approach and shorter tourniquet time were all associated with less wound oozing after total knee arthroplasty. PMID:20836920