WorldWideScience

Sample records for sequential processor cpu

  1. A Bayesian sequential processor approach to spectroscopic portal system decisions

    Energy Technology Data Exchange (ETDEWEB)

    Sale, K; Candy, J; Breitfeller, E; Guidry, B; Manatt, D; Gosnell, T; Chambers, D

    2007-07-31

    The development of faster more reliable techniques to detect radioactive contraband in a portal type scenario is an extremely important problem especially in this era of constant terrorist threats. Towards this goal the development of a model-based, Bayesian sequential data processor for the detection problem is discussed. In the sequential processor each datum (detector energy deposit and pulse arrival time) is used to update the posterior probability distribution over the space of model parameters. The nature of the sequential processor approach is that a detection is produced as soon as it is statistically justified by the data rather than waiting for a fixed counting interval before any analysis is performed. In this paper the Bayesian model-based approach, physics and signal processing models and decision functions are discussed along with the first results of our research.

  2. CPU and GPU (Cuda Template Matching Comparison

    Directory of Open Access Journals (Sweden)

    Evaldas Borcovas

    2014-05-01

    Full Text Available Image processing, computer vision or other complicated opticalinformation processing algorithms require large resources. It isoften desired to execute algorithms in real time. It is hard tofulfill such requirements with single CPU processor. NVidiaproposed CUDA technology enables programmer to use theGPU resources in the computer. Current research was madewith Intel Pentium Dual-Core T4500 2.3 GHz processor with4 GB RAM DDR3 (CPU I, NVidia GeForce GT320M CUDAcompliable graphics card (GPU I and Intel Core I5-2500K3.3 GHz processor with 4 GB RAM DDR3 (CPU II, NVidiaGeForce GTX 560 CUDA compatible graphic card (GPU II.Additional libraries as OpenCV 2.1 and OpenCV 2.4.0 CUDAcompliable were used for the testing. Main test were made withstandard function MatchTemplate from the OpenCV libraries.The algorithm uses a main image and a template. An influenceof these factors was tested. Main image and template have beenresized and the algorithm computing time and performancein Gtpix/s have been measured. According to the informationobtained from the research GPU computing using the hardwarementioned earlier is till 24 times faster when it is processing abig amount of information. When the images are small the performanceof CPU and GPU are not significantly different. Thechoice of the template size makes influence on calculating withCPU. Difference in the computing time between the GPUs canbe explained by the number of cores which they have.

  3. Synthetic Aperture Sequential Beamforming implemented on multi-core platforms

    DEFF Research Database (Denmark)

    Kjeldsen, Thomas; Lassen, Lee; Hemmsen, Martin Christian

    2014-01-01

    This paper compares several computational ap- proaches to Synthetic Aperture Sequential Beamforming (SASB) targeting consumer level parallel processors such as multi-core CPUs and GPUs. The proposed implementations demonstrate that ultrasound imaging using SASB can be executed in real- time with ...... per second) on an Intel Core i7 2600 CPU with an AMD HD7850 and a NVIDIA GTX680 GPU. The fastest CPU and GPU implementations use 14% and 1.3% of the real-time budget of 62 ms/frame, respectively. The maximum achieved processing rate is 1265 frames/s....

  4. GPU: the biggest key processor for AI and parallel processing

    Science.gov (United States)

    Baji, Toru

    2017-07-01

    Two types of processors exist in the market. One is the conventional CPU and the other is Graphic Processor Unit (GPU). Typical CPU is composed of 1 to 8 cores while GPU has thousands of cores. CPU is good for sequential processing, while GPU is good to accelerate software with heavy parallel executions. GPU was initially dedicated for 3D graphics. However from 2006, when GPU started to apply general-purpose cores, it was noticed that this architecture can be used as a general purpose massive-parallel processor. NVIDIA developed a software framework Compute Unified Device Architecture (CUDA) that make it possible to easily program the GPU for these application. With CUDA, GPU started to be used in workstations and supercomputers widely. Recently two key technologies are highlighted in the industry. The Artificial Intelligence (AI) and Autonomous Driving Cars. AI requires a massive parallel operation to train many-layers of neural networks. With CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For the autonomous driving cars, TOPS class of performance is required to implement perception, localization, path planning processing and again SoC with integrated GPU will play a key role there. In this paper, the evolution of the GPU which is one of the biggest commercial devices requiring state-of-the-art fabrication technology will be introduced. Also overview of the GPU demanding key application like the ones described above will be introduced.

  5. ITCA: Inter-Task Conflict-Aware CPU accounting for CMP

    OpenAIRE

    Luque, Carlos; Moreto Planas, Miquel; Cazorla Almeida, Francisco Javier; Gioiosa, Roberto; Valero Cortés, Mateo

    2010-01-01

    Chip-MultiProcessors (CMP) introduce complexities when accounting CPU utilization to processes because the progress done by a process during an interval of time highly depends on the activity of the other processes it is coscheduled with. We propose a new hardware CPU accounting mechanism to improve the accuracy when measuring the CPU utilization in CMPs and compare it with previous accounting mechanisms. Our results show that currently known mechanisms lead to a 16% average error when it com...

  6. ITCA: Inter-Task Conflict-Aware CPU accounting for CMPs

    OpenAIRE

    Luque, Carlos; Moreto Planas, Miquel; Cazorla, Francisco; Gioiosa, Roberto; Buyuktosunoglu, Alper; Valero Cortés, Mateo

    2009-01-01

    Chip-MultiProcessor (CMP) architectures are becoming more and more popular as an alternative to the traditional processors that only extract instruction-level parallelism from an application. CMPs introduce complexities when accounting CPU utilization. This is due to the fact that the progress done by an application during an interval of time highly depends on the activity of the other applications it is co-scheduled with. In this paper, we identify how an inaccurate measurement of the CPU ut...

  7. 3081/E processor

    International Nuclear Information System (INIS)

    Kunz, P.F.; Gravina, M.; Oxoby, G.

    1984-04-01

    The 3081/E project was formed to prepare a much improved IBM mainframe emulator for the future. Its design is based on a large amount of experience in using the 168/E processor to increase available CPU power in both online and offline environments. The processor will be at least equal to the execution speed of a 370/168 and up to 1.5 times faster for heavy floating point code. A single processor will thus be at least four times more powerful than the VAX 11/780, and five processors on a system would equal at least the performance of the IBM 3081K. With its large memory space and simple but flexible high speed interface, the 3081/E is well suited for the online and offline needs of high energy physics in the future

  8. The Effect of NUMA Tunings on CPU Performance

    Science.gov (United States)

    Hollowell, Christopher; Caramarcu, Costin; Strecker-Kellogg, William; Wong, Antonio; Zaytsev, Alexandr

    2015-12-01

    Non-Uniform Memory Access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems where each processor is directly connected to separate memory. Indirect access to other CPU's (remote) RAM is still possible, but such requests are slower as they must also pass through that memory's controlling CPU. In concert with a NUMA-aware operating system, the NUMA hardware architecture can help eliminate the memory performance reductions generally seen in SMP systems when multiple processors simultaneously attempt to access memory. The x86 CPU architecture has supported NUMA for a number of years. Modern operating systems such as Linux support NUMA-aware scheduling, where the OS attempts to schedule a process to the CPU directly attached to the majority of its RAM. In Linux, it is possible to further manually tune the NUMA subsystem using the numactl utility. With the release of Red Hat Enterprise Linux (RHEL) 6.3, the numad daemon became available in this distribution. This daemon monitors a system's NUMA topology and utilization, and automatically makes adjustments to optimize locality. As the number of cores in x86 servers continues to grow, efficient NUMA mappings of processes to CPUs/memory will become increasingly important. This paper gives a brief overview of NUMA, and discusses the effects of manual tunings and numad on the performance of the HEPSPEC06 benchmark, and ATLAS software.

  9. The Effect of NUMA Tunings on CPU Performance

    International Nuclear Information System (INIS)

    Hollowell, Christopher; Caramarcu, Costin; Strecker-Kellogg, William; Wong, Antonio; Zaytsev, Alexandr

    2015-01-01

    Non-Uniform Memory Access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems where each processor is directly connected to separate memory. Indirect access to other CPU's (remote) RAM is still possible, but such requests are slower as they must also pass through that memory's controlling CPU. In concert with a NUMA-aware operating system, the NUMA hardware architecture can help eliminate the memory performance reductions generally seen in SMP systems when multiple processors simultaneously attempt to access memory.The x86 CPU architecture has supported NUMA for a number of years. Modern operating systems such as Linux support NUMA-aware scheduling, where the OS attempts to schedule a process to the CPU directly attached to the majority of its RAM. In Linux, it is possible to further manually tune the NUMA subsystem using the numactl utility. With the release of Red Hat Enterprise Linux (RHEL) 6.3, the numad daemon became available in this distribution. This daemon monitors a system's NUMA topology and utilization, and automatically makes adjustments to optimize locality.As the number of cores in x86 servers continues to grow, efficient NUMA mappings of processes to CPUs/memory will become increasingly important. This paper gives a brief overview of NUMA, and discusses the effects of manual tunings and numad on the performance of the HEPSPEC06 benchmark, and ATLAS software. (paper)

  10. Enhancing Leakage Power in CPU Cache Using Inverted Architecture

    OpenAIRE

    Bilal A. Shehada; Ahmed M. Serdah; Aiman Abu Samra

    2013-01-01

    Power consumption is an increasingly pressing problem in modern processor design. Since the on-chip caches usually consume a significant amount of power so power and energy consumption parameters have become one of the most important design constraint. It is one of the most attractive targets for power reduction. This paper presents an approach to enhance the dynamic power consumption of CPU cache using inverted cache architecture. Our assumption tries to reduce dynamic write power dissipatio...

  11. Hardware Realization of an FPGA Processor – Operating System Call Offload and Experiences

    DEFF Research Database (Denmark)

    Hindborg, Andreas Erik; Schleuniger, Pascal; Jensen, Nicklas Bo

    2014-01-01

    Field-programmable gate arrays, FPGAs, are attractive implementation platforms for low-volume signal and image processing applications. The structure of FPGAs allows for an efficient implementation of parallel algorithms. Sequential algorithms, on the other hand, often perform better...... core that can be integrated in many signal and data processing platforms on FPGAs. We also show how we allow the processor to use operating system services. For a set of SPLASH-2 and SPEC CPU2006 benchmarks we show a speedup of up to 64% over a similar Xilinx MicroBlaze implementation while using 27...

  12. Graphics processor efficiency for realization of rapid tabular computations

    International Nuclear Information System (INIS)

    Dudnik, V.A.; Kudryavtsev, V.I.; Us, S.A.; Shestakov, M.V.

    2016-01-01

    Capabilities of graphics processing units (GPU) and central processing units (CPU) have been investigated for realization of fast-calculation algorithms with the use of tabulated functions. The realization of tabulated functions is exemplified by the GPU/CPU architecture-based processors. Comparison is made between the operating efficiencies of GPU and CPU, employed for tabular calculations at different conditions of use. Recommendations are formulated for the use of graphical and central processors to speed up scientific and engineering computations through the use of tabulated functions

  13. A combined PLC and CPU approach to multiprocessor control

    International Nuclear Information System (INIS)

    Harris, J.J.; Broesch, J.D.; Coon, R.M.

    1995-10-01

    A sophisticated multiprocessor control system has been developed for use in the E-Power Supply System Integrated Control (EPSSIC) on the DIII-D tokamak. EPSSIC provides control and interlocks for the ohmic heating coil power supply and its associated systems. Of particular interest is the architecture of this system: both a Programmable Logic Controller (PLC) and a Central Processor Unit (CPU) have been combined on a standard VME bus. The PLC and CPU input and output signals are routed through signal conditioning modules, which provide the necessary voltage and ground isolation. Additionally these modules adapt the signal levels to that of the VME I/O boards. One set of I/O signals is shared between the two processors. The resulting multiprocessor system provides a number of advantages: redundant operation for mission critical situations, flexible communications using conventional TCP/IP protocols, the simplicity of ladder logic programming for the majority of the control code, and an easily maintained and expandable non-proprietary system

  14. Promise of a low power mobile CPU based embedded system in artificial leg control.

    Science.gov (United States)

    Hernandez, Robert; Zhang, Fan; Zhang, Xiaorong; Huang, He; Yang, Qing

    2012-01-01

    This paper presents the design and implementation of a low power embedded system using mobile processor technology (Intel Atom™ Z530 Processor) specifically tailored for a neural-machine interface (NMI) for artificial limbs. This embedded system effectively performs our previously developed NMI algorithm based on neuromuscular-mechanical fusion and phase-dependent pattern classification. The analysis shows that NMI embedded system can meet real-time constraints with high accuracies for recognizing the user's locomotion mode. Our implementation utilizes the mobile processor efficiently to allow a power consumption of 2.2 watts and low CPU utilization (less than 4.3%) while executing the complex NMI algorithm. Our experiments have shown that the highly optimized C program implementation on the embedded system has superb advantages over existing PC implementations on MATLAB. The study results suggest that mobile-CPU-based embedded system is promising for implementing advanced control for powered lower limb prostheses.

  15. Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

    OpenAIRE

    Abdul Kareem PARCHUR; Kuppangari Krishna RAO; Fazal NOORBASHA; Ram Asaray SINGH

    2010-01-01

    As the processor architecture becomes more advanced, Intel introduced its Intel Core 2 Duo series processors. Performance impact on Intel Core 2 Duo processors are analyzed using SPEC CPU INT 2006 performance numbers. This paper studied the behavior of Artificial Intelligence (AI) benchmarks on Intel Core 2 Duo series processors. Moreover, we estimated the task completion time (TCT) @1 GHz, @2 GHz and @3 GHz Intel Core 2 Duo series processors frequency. Our results show the performance scalab...

  16. Design of a Message Passing Model for Use in a Heterogeneous CPU-NFP Framework for Network Analytics

    CSIR Research Space (South Africa)

    Pennefather, S

    2017-09-01

    Full Text Available of applications written in the Go programming language to be executed on a Network Flow Processor (NFP) for enhanced performance. This paper explores the need and feasibility of implementing a message passing model for data transmission between the NFP and CPU...

  17. Parallel processor for fast event analysis

    International Nuclear Information System (INIS)

    Hensley, D.C.

    1983-01-01

    Current maximum data rates from the Spin Spectrometer of approx. 5000 events/s (up to 1.3 MBytes/s) and minimum analysis requiring at least 3000 operations/event require a CPU cycle time near 70 ns. In order to achieve an effective cycle time of 70 ns, a parallel processing device is proposed where up to 4 independent processors will be implemented in parallel. The individual processors are designed around the Am2910 Microsequencer, the AM29116 μP, and the Am29517 Multiplier. Satellite histogramming in a mass memory system will be managed by a commercial 16-bit μP system

  18. 16-Bit RISC Processor Design for Convolution Application

    OpenAIRE

    Anand Nandakumar Shardul

    2013-01-01

    In this project, we propose a 16-bit non-pipelined RISC processor, which is used for signal processing applications. The processor consists of the blocks, namely, program counter, clock control unit, ALU, IDU and registers. Advantageous architectural modifications have been made in the incremented circuit used in program counter and carry select adder unit of the ALU in the RISC CPU core. Furthermore, a high speed and low power modified modifies multiplier has been designed and introduced in ...

  19. GeantV: from CPU to accelerators

    Science.gov (United States)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Arora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Sehgal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While the modern CPU architectures are being targeted first, resources such as GPGPU, Intel© Xeon Phi, Atom or ARM cannot be ignored anymore by HEP CPU-bound applications. The proof of concept GeantV prototype has been mainly engineered for CPU's having vector units but we have foreseen from early stages a bridge to arbitrary accelerators. A software layer consisting of architecture/technology specific backends supports currently this concept. This approach allows to abstract out the basic types such as scalar/vector but also to formalize generic computation kernels using transparently library or device specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, as a bonus, it comes with the insulation of the core application and algorithms from the technology layer. This allows our application to be long term maintainable and versatile to changes at the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel© Xeon Phi KNC architecture. We present the scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We also describe the current work and preliminary results for using the GeantV transport kernel on GPUs.

  20. GeantV: from CPU to accelerators

    International Nuclear Information System (INIS)

    Amadio, G; Bianchini, C; Iope, R; Ananya, A; Arora, A; Apostolakis, J; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S; Lima, G; Duhem, L

    2016-01-01

    The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While the modern CPU architectures are being targeted first, resources such as GPGPU, Intel© Xeon Phi, Atom or ARM cannot be ignored anymore by HEP CPU-bound applications. The proof of concept GeantV prototype has been mainly engineered for CPU's having vector units but we have foreseen from early stages a bridge to arbitrary accelerators. A software layer consisting of architecture/technology specific backends supports currently this concept. This approach allows to abstract out the basic types such as scalar/vector but also to formalize generic computation kernels using transparently library or device specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, as a bonus, it comes with the insulation of the core application and algorithms from the technology layer. This allows our application to be long term maintainable and versatile to changes at the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel© Xeon Phi KNC architecture. We present the scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We also describe the current work and preliminary results for using the GeantV transport kernel on GPUs. (paper)

  1. Heterogeneous Gpu&Cpu Cluster For High Performance Computing In Cryptography

    Directory of Open Access Journals (Sweden)

    Michał Marks

    2012-01-01

    Full Text Available This paper addresses issues associated with distributed computing systems andthe application of mixed GPU&CPU technology to data encryption and decryptionalgorithms. We describe a heterogenous cluster HGCC formed by twotypes of nodes: Intel processor with NVIDIA graphics processing unit and AMDprocessor with AMD graphics processing unit (formerly ATI, and a novel softwareframework that hides the heterogeneity of our cluster and provides toolsfor solving complex scientific and engineering problems. Finally, we present theresults of numerical experiments. The considered case study is concerned withparallel implementations of selected cryptanalysis algorithms. The main goal ofthe paper is to show the wide applicability of the GPU&CPU technology tolarge scale computation and data processing.

  2. Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

    Directory of Open Access Journals (Sweden)

    Abdul Kareem PARCHUR

    2012-08-01

    Full Text Available High performance is a critical requirement to all microprocessors manufacturers. The present paper describes the comparison of performance in two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310. The microarchitecture of these processors is implemented using the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many key application areas in modern generation. The scaling of performance in two major series of Intel Xeon processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310 has been analyzed using the performance numbers of 12 CPU2006 integer benchmarks, performance numbers that exhibit significant differences in performance. The results and analysis can be used by performance engineers, scientists and developers to better understand the performance scaling in modern generation processors.

  3. Fast parallel computation of polynomials using few processors

    DEFF Research Database (Denmark)

    Valiant, Leslie; Skyum, Sven

    1981-01-01

    It is shown that any multivariate polynomial that can be computed sequentially in C steps and has degree d can be computed in parallel in 0((log d) (log C + log d)) steps using only (Cd)0(1) processors....

  4. The LASS hardware processor

    International Nuclear Information System (INIS)

    Kunz, P.F.

    1976-01-01

    The problems of data analysis with hardware processors are reviewed and a description is given of a programmable processor. This processor, the 168/E, has been designed for use in the LASS multi-processor system; it has an execution speed comparable to the IBM 370/168 and uses the subset of IBM 370 instructions appropriate to the LASS analysis task. (Auth.)

  5. Porting AMG2013 to Heterogeneous CPU+GPU Nodes

    Energy Technology Data Exchange (ETDEWEB)

    Samfass, Philipp [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-01-26

    LLNL's future advanced technology system SIERRA will feature heterogeneous compute nodes that consist of IBM PowerV9 CPUs and NVIDIA Volta GPUs. Conceptually, the motivation for such an architecture is quite straightforward: While GPUs are optimized for throughput on massively parallel workloads, CPUs strive to minimize latency for rather sequential operations. Yet, making optimal use of heterogeneous architectures raises new challenges for the development of scalable parallel software, e.g., with respect to work distribution. Porting LLNL's parallel numerical libraries to upcoming heterogeneous CPU+GPU architectures is therefore a critical factor for ensuring LLNL's future success in ful lling its national mission. One of these libraries, called HYPRE, provides parallel solvers and precondi- tioners for large, sparse linear systems of equations. In the context of this intern- ship project, I consider AMG2013 which is a proxy application for major parts of HYPRE that implements a benchmark for setting up and solving di erent systems of linear equations. In the following, I describe in detail how I ported multiple parts of AMG2013 to the GPU (Section 2) and present results for di erent experiments that demonstrate a successful parallel implementation on the heterogeneous ma- chines surface and ray (Section 3). In Section 4, I give guidelines on how my code should be used. Finally, I conclude and give an outlook for future work (Section 5).

  6. Real-time autocorrelator for fluorescence correlation spectroscopy based on graphical-processor-unit architecture: method, implementation, and comparative studies

    Science.gov (United States)

    Laracuente, Nicholas; Grossman, Carl

    2013-03-01

    We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphical processor units (GPUs). Recent developments in hardware and software have allowed for general purpose computing with inexpensive GPU hardware. These devices are more suited for emulating hardware autocorrelators than traditional CPU-based software applications by emphasizing parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core graphics pci card in a 3.2 GHz Intel i5 CPU based computer running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College

  7. Reconstruction of the neutron spectrum using an artificial neural network in CPU and GPU

    International Nuclear Information System (INIS)

    Hernandez D, V. M.; Moreno M, A.; Ortiz L, M. A.; Vega C, H. R.; Alonso M, O. E.

    2016-10-01

    The increase in computing power in personal computers has been increasing, computers now have several processors in the CPU and in addition multiple CUDA cores in the graphics processing unit (GPU); both systems can be used individually or combined to perform scientific computation without resorting to processor or supercomputing arrangements. The Bonner sphere spectrometer is the most commonly used multi-element system for neutron detection purposes and its associated spectrum. Each sphere-detector combination gives a particular response that depends on the energy of the neutrons, and the total set of these responses is known like the responses matrix Rφ(E). Thus, the counting rates obtained with each sphere and the neutron spectrum is related to the Fredholm equation in its discrete version. For the reconstruction of the spectrum has a system of poorly conditioned equations with an infinite number of solutions and to find the appropriate solution, it has been proposed the use of artificial intelligence through neural networks with different platforms CPU and GPU. (Author)

  8. Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

    Directory of Open Access Journals (Sweden)

    Abdul Kareem PARCHUR

    2010-12-01

    Full Text Available As the processor architecture becomes more advanced, Intel introduced its Intel Core 2 Duo series processors. Performance impact on Intel Core 2 Duo processors are analyzed using SPEC CPU INT 2006 performance numbers. This paper studied the behavior of Artificial Intelligence (AI benchmarks on Intel Core 2 Duo series processors. Moreover, we estimated the task completion time (TCT @1 GHz, @2 GHz and @3 GHz Intel Core 2 Duo series processors frequency. Our results show the performance scalability in Intel Core 2 Duo series processors. Even though AI benchmarks have similar execution time, they have dissimilar characteristics which are identified using principal component analysis and dendogram. As the processor frequency increased from 1.8 GHz to 3.167 GHz the execution time is decreased by ~370 sec for AI workloads. In the case of Physics/Quantum Computing programs it was ~940 sec.

  9. Probabilistic programmable quantum processors

    International Nuclear Information System (INIS)

    Buzek, V.; Ziman, M.; Hillery, M.

    2004-01-01

    We analyze how to improve performance of probabilistic programmable quantum processors. We show how the probability of success of the probabilistic processor can be enhanced by using the processor in loops. In addition, we show that an arbitrary SU(2) transformations of qubits can be encoded in program state of a universal programmable probabilistic quantum processor. The probability of success of this processor can be enhanced by a systematic correction of errors via conditional loops. Finally, we show that all our results can be generalized also for qudits. (Abstract Copyright [2004], Wiley Periodicals, Inc.)

  10. A Programming Framework for Scientific Applications on CPU-GPU Systems

    Energy Technology Data Exchange (ETDEWEB)

    Owens, John

    2013-03-24

    At a high level, my research interests center around designing, programming, and evaluating computer systems that use new approaches to solve interesting problems. The rapid change of technology allows a variety of different architectural approaches to computationally difficult problems, and a constantly shifting set of constraints and trends makes the solutions to these problems both challenging and interesting. One of the most important recent trends in computing has been a move to commodity parallel architectures. This sea change is motivated by the industry’s inability to continue to profitably increase performance on a single processor and instead to move to multiple parallel processors. In the period of review, my most significant work has been leading a research group looking at the use of the graphics processing unit (GPU) as a general-purpose processor. GPUs can potentially deliver superior performance on a broad range of problems than their CPU counterparts, but effectively mapping complex applications to a parallel programming model with an emerging programming environment is a significant and important research problem.

  11. Control structures for high speed processors

    Science.gov (United States)

    Maki, G. K.; Mankin, R.; Owsley, P. A.; Kim, G. M.

    1982-01-01

    A special processor was designed to function as a Reed Solomon decoder with throughput data rate in the Mhz range. This data rate is significantly greater than is possible with conventional digital architectures. To achieve this rate, the processor design includes sequential, pipelined, distributed, and parallel processing. The processor was designed using a high level language register transfer language. The RTL can be used to describe how the different processes are implemented by the hardware. One problem of special interest was the development of dependent processes which are analogous to software subroutines. For greater flexibility, the RTL control structure was implemented in ROM. The special purpose hardware required approximately 1000 SSI and MSI components. The data rate throughput is 2.5 megabits/second. This data rate is achieved through the use of pipelined and distributed processing. This data rate can be compared with 800 kilobits/second in a recently proposed very large scale integration design of a Reed Solomon encoder.

  12. Online performance evaluation of RAID 5 using CPU utilization

    Science.gov (United States)

    Jin, Hai; Yang, Hua; Zhang, Jiangling

    1998-09-01

    Redundant arrays of independent disks (RAID) technology is the efficient way to solve the bottleneck problem between CPU processing ability and I/O subsystem. For the system point of view, the most important metric of on line performance is the utilization of CPU. This paper first employs the way to calculate the CPU utilization of system connected with RAID level 5 using statistic average method. From the simulation results of CPU utilization of system connected with RAID level 5 subsystem can we see that using multiple disks as an array to access data in parallel is the efficient way to enhance the on-line performance of disk storage system. USing high-end disk drivers to compose the disk array is the key to enhance the on-line performance of system.

  13. Embedded Processor Laboratory

    Data.gov (United States)

    Federal Laboratory Consortium — The Embedded Processor Laboratory provides the means to design, develop, fabricate, and test embedded computers for missile guidance electronics systems in support...

  14. Multithreading in vector processors

    Science.gov (United States)

    Evangelinos, Constantinos; Kim, Changhoan; Nair, Ravi

    2018-01-16

    In one embodiment, a system includes a processor having a vector processing mode and a multithreading mode. The processor is configured to operate on one thread per cycle in the multithreading mode. The processor includes a program counter register having a plurality of program counters, and the program counter register is vectorized. Each program counter in the program counter register represents a distinct corresponding thread of a plurality of threads. The processor is configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle.

  15. Thermally-aware composite run-time CPU power models

    OpenAIRE

    Walker, Matthew J.; Diestelhorst, Stephan; Hansson, Andreas; Balsamo, Domenico; Merrett, Geoff V.; Al-Hashimi, Bashir M.

    2016-01-01

    Accurate and stable CPU power modelling is fundamental in modern system-on-chips (SoCs) for two main reasons: 1) they enable significant online energy savings by providing a run-time manager with reliable power consumption data for controlling CPU energy-saving techniques; 2) they can be used as accurate and trusted reference models for system design and exploration. We begin by showing the limitations in typical performance monitoring counter (PMC) based power modelling approaches and illust...

  16. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures

    Energy Technology Data Exchange (ETDEWEB)

    Souris, Kevin, E-mail: kevin.souris@uclouvain.be; Lee, John Aldo [Center for Molecular Imaging and Experimental Radiotherapy, Institut de Recherche Expérimentale et Clinique, Université catholique de Louvain, Avenue Hippocrate 54, 1200 Brussels, Belgium and ICTEAM Institute, Université catholique de Louvain, Louvain-la-Neuve 1348 (Belgium); Sterpin, Edmond [Center for Molecular Imaging and Experimental Radiotherapy, Institut de Recherche Expérimentale et Clinique, Université catholique de Louvain, Avenue Hippocrate 54, 1200 Brussels, Belgium and Department of Oncology, Katholieke Universiteit Leuven, O& N I Herestraat 49, 3000 Leuven (Belgium)

    2016-04-15

    Purpose: Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. Methods: A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Results: Comparisons with GATE/GEANT4 for various geometries show deviations within 2%–1 mm. In spite of the limited memory bandwidth of the coprocessor simulation time is below 25 s for 10{sup 7} primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. Conclusions: MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus be used also for in vivo range verification.

  17. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures

    International Nuclear Information System (INIS)

    Souris, Kevin; Lee, John Aldo; Sterpin, Edmond

    2016-01-01

    Purpose: Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. Methods: A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Results: Comparisons with GATE/GEANT4 for various geometries show deviations within 2%–1 mm. In spite of the limited memory bandwidth of the coprocessor simulation time is below 25 s for 10"7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. Conclusions: MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus be used also for in vivo range verification.

  18. Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures.

    Science.gov (United States)

    Souris, Kevin; Lee, John Aldo; Sterpin, Edmond

    2016-04-01

    Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. A new Monte Carlo, called MCsquare (many-core Monte Carlo), has been designed and optimized for the last generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the gate/geant4 Monte Carlo application for homogeneous and heterogeneous geometries. Comparisons with gate/geant4 for various geometries show deviations within 2%-1 mm. In spite of the limited memory bandwidth of the coprocessor simulation time is below 25 s for 10(7) primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus be used also for in vivo range verification.

  19. A design of a computer complex including vector processors

    International Nuclear Information System (INIS)

    Asai, Kiyoshi

    1982-12-01

    We, members of the Computing Center, Japan Atomic Energy Research Institute have been engaged for these six years in the research of adaptability of vector processing to large-scale nuclear codes. The research has been done in collaboration with researchers and engineers of JAERI and a computer manufacturer. In this research, forty large-scale nuclear codes were investigated from the viewpoint of vectorization. Among them, twenty-six codes were actually vectorized and executed. As the results of the investigation, it is now estimated that about seventy percents of nuclear codes and seventy percents of our total amount of CPU time of JAERI are highly vectorizable. Based on the data obtained by the investigation, (1)currently vectorizable CPU time, (2)necessary number of vector processors, (3)necessary manpower for vectorization of nuclear codes, (4)computing speed, memory size, number of parallel 1/0 paths, size and speed of 1/0 buffer of vector processor suitable for our applications, (5)necessary software and operational policy for use of vector processors are discussed, and finally (6)a computer complex including vector processors is presented in this report. (author)

  20. Code compression for VLIW embedded processors

    Science.gov (United States)

    Piccinelli, Emiliano; Sannino, Roberto

    2004-04-01

    The implementation of processors for embedded systems implies various issues: main constraints are cost, power dissipation and die area. On the other side, new terminals perform functions that require more computational flexibility and effort. Long code streams must be loaded into memories, which are expensive and power consuming, to run on DSPs or CPUs. To overcome this issue, the "SlimCode" proprietary algorithm presented in this paper (patent pending technology) can reduce the dimensions of the program memory. It can run offline and work directly on the binary code the compiler generates, by compressing it and creating a new binary file, about 40% smaller than the original one, to be loaded into the program memory of the processor. The decompression unit will be a small ASIC, placed between the Memory Controller and the System bus of the processor, keeping unchanged the internal CPU architecture: this implies that the methodology is completely transparent to the core. We present comparisons versus the state-of-the-art IBM Codepack algorithm, along with its architectural implementation into the ST200 VLIW family core.

  1. An Integrated Pipeline of Open Source Software Adapted for Multi-CPU Architectures: Use in the Large-Scale Identification of Single Nucleotide Polymorphisms

    Directory of Open Access Journals (Sweden)

    B. Jayashree

    2007-01-01

    Full Text Available The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs. In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT, the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level.

  2. Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I

    Energy Technology Data Exchange (ETDEWEB)

    Schmalz, Mark S

    2011-07-24

    Statement of Problem - Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation {und G} for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G {yields} {und G}, which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient

  3. Integrated fuel processor development

    International Nuclear Information System (INIS)

    Ahmed, S.; Pereira, C.; Lee, S. H. D.; Krumpelt, M.

    2001-01-01

    The Department of Energy's Office of Advanced Automotive Technologies has been supporting the development of fuel-flexible fuel processors at Argonne National Laboratory. These fuel processors will enable fuel cell vehicles to operate on fuels available through the existing infrastructure. The constraints of on-board space and weight require that these fuel processors be designed to be compact and lightweight, while meeting the performance targets for efficiency and gas quality needed for the fuel cell. This paper discusses the performance of a prototype fuel processor that has been designed and fabricated to operate with liquid fuels, such as gasoline, ethanol, methanol, etc. Rated for a capacity of 10 kWe (one-fifth of that needed for a car), the prototype fuel processor integrates the unit operations (vaporization, heat exchange, etc.) and processes (reforming, water-gas shift, preferential oxidation reactions, etc.) necessary to produce the hydrogen-rich gas (reformate) that will fuel the polymer electrolyte fuel cell stacks. The fuel processor work is being complemented by analytical and fundamental research. With the ultimate objective of meeting on-board fuel processor goals, these studies include: modeling fuel cell systems to identify design and operating features; evaluating alternative fuel processing options; and developing appropriate catalysts and materials. Issues and outstanding challenges that need to be overcome in order to develop practical, on-board devices are discussed

  4. Reconstruction of the neutron spectrum using an artificial neural network in CPU and GPU; Reconstruccion del espectro de neutrones usando una red neuronal artificial (RNA) en CPU y GPU

    Energy Technology Data Exchange (ETDEWEB)

    Hernandez D, V. M.; Moreno M, A.; Ortiz L, M. A. [Universidad de Cordoba, 14002 Cordoba (Spain); Vega C, H. R.; Alonso M, O. E., E-mail: vic.mc68010@gmail.com [Universidad Autonoma de Zacatecas, 98000 Zacatecas, Zac. (Mexico)

    2016-10-15

    The increase in computing power in personal computers has been increasing, computers now have several processors in the CPU and in addition multiple CUDA cores in the graphics processing unit (GPU); both systems can be used individually or combined to perform scientific computation without resorting to processor or supercomputing arrangements. The Bonner sphere spectrometer is the most commonly used multi-element system for neutron detection purposes and its associated spectrum. Each sphere-detector combination gives a particular response that depends on the energy of the neutrons, and the total set of these responses is known like the responses matrix Rφ(E). Thus, the counting rates obtained with each sphere and the neutron spectrum is related to the Fredholm equation in its discrete version. For the reconstruction of the spectrum has a system of poorly conditioned equations with an infinite number of solutions and to find the appropriate solution, it has been proposed the use of artificial intelligence through neural networks with different platforms CPU and GPU. (Author)

  5. a method of gravity and seismic sequential inversion and its GPU implementation

    Science.gov (United States)

    Liu, G.; Meng, X.

    2011-12-01

    In this abstract, we introduce a gravity and seismic sequential inversion method to invert for density and velocity together. For the gravity inversion, we use an iterative method based on correlation imaging algorithm; for the seismic inversion, we use the full waveform inversion. The link between the density and velocity is an empirical formula called Gardner equation, for large volumes of data, we use the GPU to accelerate the computation. For the gravity inversion method , we introduce a method based on correlation imaging algorithm,it is also a interative method, first we calculate the correlation imaging of the observed gravity anomaly, it is some value between -1 and +1, then we multiply this value with a little density ,this value become the initial density model. We get a forward reuslt with this initial model and also calculate the correaltion imaging of the misfit of observed data and the forward data, also multiply the correaltion imaging result a little density and add it to the initial model, then do the same procedure above , at last ,we can get a inversion density model. For the seismic inveron method ,we use a mothod base on the linearity of acoustic wave equation written in the frequency domain,with a intial velociy model, we can get a good velocity result. In the sequential inversion of gravity and seismic , we need a link formula to convert between density and velocity ,in our method , we use the Gardner equation. Driven by the insatiable market demand for real time, high-definition 3D images, the programmable NVIDIA Graphic Processing Unit (GPU) as co-processor of CPU has been developed for high performance computing. Compute Unified Device Architecture (CUDA) is a parallel programming model and software environment provided by NVIDIA designed to overcome the challenge of using traditional general purpose GPU while maintaining a low learn curve for programmers familiar with standard programming languages such as C. In our inversion processing

  6. Logistic Fuel Processor Development

    National Research Council Canada - National Science Library

    Salavani, Reza

    2004-01-01

    ... to light gases then steam reform the light gases into hydrogen rich stream. This report documents the efforts in developing a fuel processor capable of providing hydrogen to a 3kW fuel cell stack...

  7. Logistic Fuel Processor Development

    National Research Council Canada - National Science Library

    Salavani, Reza

    2004-01-01

    The Air Base Technologies Division of the Air Force Research Laboratory has developed a logistic fuel processor that removes the sulfur content of the fuel and in the process converts logistic fuel...

  8. Adaptive signal processor

    Energy Technology Data Exchange (ETDEWEB)

    Walz, H.V.

    1980-07-01

    An experimental, general purpose adaptive signal processor system has been developed, utilizing a quantized (clipped) version of the Widrow-Hoff least-mean-square adaptive algorithm developed by Moschner. The system accommodates 64 adaptive weight channels with 8-bit resolution for each weight. Internal weight update arithmetic is performed with 16-bit resolution, and the system error signal is measured with 12-bit resolution. An adapt cycle of adjusting all 64 weight channels is accomplished in 8 ..mu..sec. Hardware of the signal processor utilizes primarily Schottky-TTL type integrated circuits. A prototype system with 24 weight channels has been constructed and tested. This report presents details of the system design and describes basic experiments performed with the prototype signal processor. Finally some system configurations and applications for this adaptive signal processor are discussed.

  9. Adaptive signal processor

    International Nuclear Information System (INIS)

    Walz, H.V.

    1980-07-01

    An experimental, general purpose adaptive signal processor system has been developed, utilizing a quantized (clipped) version of the Widrow-Hoff least-mean-square adaptive algorithm developed by Moschner. The system accommodates 64 adaptive weight channels with 8-bit resolution for each weight. Internal weight update arithmetic is performed with 16-bit resolution, and the system error signal is measured with 12-bit resolution. An adapt cycle of adjusting all 64 weight channels is accomplished in 8 μsec. Hardware of the signal processor utilizes primarily Schottky-TTL type integrated circuits. A prototype system with 24 weight channels has been constructed and tested. This report presents details of the system design and describes basic experiments performed with the prototype signal processor. Finally some system configurations and applications for this adaptive signal processor are discussed

  10. Array processor architecture

    Science.gov (United States)

    Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

    1983-01-01

    A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules and from hence the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occur with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode however, the processors are not locked-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.

  11. Functional unit for a processor

    NARCIS (Netherlands)

    Rohani, A.; Kerkhoff, Hans G.

    2013-01-01

    The invention relates to a functional unit for a processor, such as a Very Large Instruction Word Processor. The invention further relates to a processor comprising at least one such functional unit. The invention further relates to a functional unit and processor capable of mitigating the effect of

  12. STEM image simulation with hybrid CPU/GPU programming

    International Nuclear Information System (INIS)

    Yao, Y.; Ge, B.H.; Shen, X.; Wang, Y.G.; Yu, R.C.

    2016-01-01

    STEM image simulation is achieved via hybrid CPU/GPU programming under parallel algorithm architecture to speed up calculation on a personal computer (PC). To utilize the calculation power of a PC fully, the simulation is performed using the GPU core and multi-CPU cores at the same time to significantly improve efficiency. GaSb and an artificial GaSb/InAs interface with atom diffusion have been used to verify the computation. - Highlights: • STEM image simulation is achieved by hybrid CPU/GPU programming under parallel algorithm architecture to speed up the calculation in the personal computer (PC). • In order to fully utilize the calculation power of the PC, the simulation is performed by GPU core and multi-CPU cores at the same time so efficiency is improved significantly. • GaSb and artificial GaSb/InAs interface with atom diffusion have been used to verify the computation. The results reveal some unintuitive phenomena about the contrast variation with the atom numbers.

  13. CPU and cache efficient management of memory-resident databases

    NARCIS (Netherlands)

    Pirk, H.; Funke, F.; Grund, M.; Neumann, T.; Leser, U.; Manegold, S.; Kemper, A.; Kersten, M.L.

    2013-01-01

    Memory-Resident Database Management Systems (MRDBMS) have to be optimized for two resources: CPU cycles and memory bandwidth. To optimize for bandwidth in mixed OLTP/OLAP scenarios, the hybrid or Partially Decomposed Storage Model (PDSM) has been proposed. However, in current implementations,

  14. CPU and Cache Efficient Management of Memory-Resident Databases

    NARCIS (Netherlands)

    H. Pirk (Holger); F. Funke; M. Grund; T. Neumann (Thomas); U. Leser; S. Manegold (Stefan); A. Kemper (Alfons); M.L. Kersten (Martin)

    2013-01-01

    htmlabstractMemory-Resident Database Management Systems (MRDBMS) have to be optimized for two resources: CPU cycles and memory bandwidth. To optimize for bandwidth in mixed OLTP/OLAP scenarios, the hybrid or Partially Decomposed Storage Model (PDSM) has been proposed. However, in current

  15. STEM image simulation with hybrid CPU/GPU programming

    Energy Technology Data Exchange (ETDEWEB)

    Yao, Y., E-mail: yaoyuan@iphy.ac.cn; Ge, B.H.; Shen, X.; Wang, Y.G.; Yu, R.C.

    2016-07-15

    STEM image simulation is achieved via hybrid CPU/GPU programming under parallel algorithm architecture to speed up calculation on a personal computer (PC). To utilize the calculation power of a PC fully, the simulation is performed using the GPU core and multi-CPU cores at the same time to significantly improve efficiency. GaSb and an artificial GaSb/InAs interface with atom diffusion have been used to verify the computation. - Highlights: • STEM image simulation is achieved by hybrid CPU/GPU programming under parallel algorithm architecture to speed up the calculation in the personal computer (PC). • In order to fully utilize the calculation power of the PC, the simulation is performed by GPU core and multi-CPU cores at the same time so efficiency is improved significantly. • GaSb and artificial GaSb/InAs interface with atom diffusion have been used to verify the computation. The results reveal some unintuitive phenomena about the contrast variation with the atom numbers.

  16. Fast Parallel Computation of Polynomials Using Few Processors

    DEFF Research Database (Denmark)

    Valiant, Leslie G.; Skyum, Sven; Berkowitz, S.

    1983-01-01

    It is shown that any multivariate polynomial of degree $d$ that can be computed sequentially in $C$ steps can be computed in parallel in $O((\\log d)(\\log C + \\log d))$ steps using only $(Cd)^{O(1)} $ processors....

  17. 3081//sub E/ processor

    International Nuclear Information System (INIS)

    Kunz, P.F.; Gravina, M.; Oxoby, G.; Trang, Q.; Fucci, A.; Jacobs, D.; Martin, B.; Storr, K.

    1983-03-01

    Since the introduction of the 168//sub E/, emulating processors have been successful over an amazingly wide range of applications. This paper will describe a second generation processor, the 3081//sub E/. This new processor, which is being developed as a collaboration between SLAC and CERN, goes beyond just fixing the obvious faults of the 168//sub E/. Not only will the 3081//sub E/ have much more memory space, incorporate many more IBM instructions, and have much more memory space, incorporate many more IBM instructions, and have full double precision floating point arithmetic, but it will also have faster execution times and be much simpler to build, debug, and maintain. The simple interface and reasonable cost of the 168//sub E/ will be maintained for the 3081//sub E/

  18. The ATLAS Trigger Algorithms for General Purpose Graphics Processor Units

    CERN Document Server

    Tavares Delgado, Ademar; The ATLAS collaboration

    2016-01-01

    The ATLAS Trigger Algorithms for General Purpose Graphics Processor Units Type: Talk Abstract: We present the ATLAS Trigger algorithms developed to exploit General­ Purpose Graphics Processor Units. ATLAS is a particle physics experiment located on the LHC collider at CERN. The ATLAS Trigger system has two levels, hardware-­based Level 1 and the High Level Trigger implemented in software running on a farm of commodity CPU. Performing the trigger event selection within the available farm resources presents a significant challenge that will increase future LHC upgrades. are being evaluated as a potential solution for trigger algorithms acceleration. Key factors determining the potential benefit of this new technology are the relative execution speedup, the number of GPUs required and the relative financial cost of the selected GPU. We have developed a trigger demonstrator which includes algorithms for reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Cal...

  19. Parallelizing ATLAS Reconstruction and Simulation: Issues and Optimization Solutions for Scaling on Multi- and Many-CPU Platforms

    International Nuclear Information System (INIS)

    Leggett, C; Jackson, K; Tatarkhanov, M; Yao, Y; Binet, S; Levinthal, D

    2011-01-01

    Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance, to producing chip designs with multi- and many-core architectures. Further the cores themselves can run multiple threads as a zero overhead context switch allowing low level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the Atlas event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores, by means of event based parallelism, and final stage I/O synchronization. However, initial studies on 8 andl6 core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level, and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which due to it's size, places huge burdens on the memory infrastructure of today's processors.

  20. SpaceCubeX: A Framework for Evaluating Hybrid Multi-Core CPU FPGA DSP Architectures

    Science.gov (United States)

    Schmidt, Andrew G.; Weisz, Gabriel; French, Matthew; Flatley, Thomas; Villalpando, Carlos Y.

    2017-01-01

    The SpaceCubeX project is motivated by the need for high performance, modular, and scalable on-board processing to help scientists answer critical 21st century questions about global climate change, air quality, ocean health, and ecosystem dynamics, while adding new capabilities such as low-latency data products for extreme event warnings. These goals translate into on-board processing throughput requirements that are on the order of 100-1,000 more than those of previous Earth Science missions for standard processing, compression, storage, and downlink operations. To study possible future architectures to achieve these performance requirements, the SpaceCubeX project provides an evolvable testbed and framework that enables a focused design space exploration of candidate hybrid CPU/FPGA/DSP processing architectures. The framework includes ArchGen, an architecture generator tool populated with candidate architecture components, performance models, and IP cores, that allows an end user to specify the type, number, and connectivity of a hybrid architecture. The framework requires minimal extensions to integrate new processors, such as the anticipated High Performance Spaceflight Computer (HPSC), reducing time to initiate benchmarking by months. To evaluate the framework, we leverage a wide suite of high performance embedded computing benchmarks and Earth science scenarios to ensure robust architecture characterization. We report on our projects Year 1 efforts and demonstrate the capabilities across four simulation testbed models, a baseline SpaceCube 2.0 system, a dual ARM A9 processor system, a hybrid quad ARM A53 and FPGA system, and a hybrid quad ARM A53 and DSP system.

  1. Sequential Banking.

    OpenAIRE

    Bizer, David S; DeMarzo, Peter M

    1992-01-01

    The authors study environments in which agents may borrow sequentially from more than one leader. Although debt is prioritized, additional lending imposes an externality on prior debt because, with moral hazard, the probability of repayment of prior loans decreases. Equilibrium interest rates are higher than they would be if borrowers could commit to borrow from at most one bank. Even though the loan terms are less favorable than they would be under commitment, the indebtedness of borrowers i...

  2. The Central Trigger Processor (CTP)

    CERN Multimedia

    Franchini, Matteo

    2016-01-01

    The Central Trigger Processor (CTP) receives trigger information from the calorimeter and muon trigger processors, as well as from other sources of trigger. It makes the Level-1 decision (L1A) based on a trigger menu.

  3. Very Long Instruction Word Processors

    Indian Academy of Sciences (India)

    Pentium Processor have modified the processor architecture to exploit parallelism in a program. .... The type of operation itself is encoded using 14 bits. .... text of designing simple architectures with low power consump- tion and execute x86 ...

  4. Scientific programming on massively parallel processor CP-PACS

    International Nuclear Information System (INIS)

    Boku, Taisuke

    1998-01-01

    The massively parallel processor CP-PACS takes various problems of calculation physics as the object, and it has been designed so that its architecture has been devised to do various numerical processings. In this report, the outline of the CP-PACS and the example of programming in the Kernel CG benchmark in NAS Parallel Benchmarks, version 1, are shown, and the pseudo vector processing mechanism and the parallel processing tuning of scientific and technical computation utilizing the three-dimensional hyper crossbar net, which are two great features of the architecture of the CP-PACS are described. As for the CP-PACS, the PUs based on RISC processor and added with pseudo vector processor are used. Pseudo vector processing is realized as the loop processing by scalar command. The features of the connection net of PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The part that takes the time for processing most in the main loop is the product of matrix and vector (matvec), and the parallel processing of the matvec is explained. The time for the computation by the CPU is determined. As the evaluation of the performance, the evaluation of the time for execution, the short vector processing of pseudo vector processor based on slide window, and the comparison with other parallel computers are reported. (K.I.)

  5. PERFORMANCE EVALUATION OF OR1200 PROCESSOR WITH EVOLUTIONARY PARALLEL HPRC USING GEP

    Directory of Open Access Journals (Sweden)

    R. Maheswari

    2012-04-01

    Full Text Available In this fast computing era, most of the embedded system requires more computing power to complete the complex function/ task at the lesser amount of time. One way to achieve this is by boosting up the processor performance which allows processor core to run faster. This paper presents a novel technique of increasing the performance by parallel HPRC (High Performance Reconfigurable Computing in the CPU/DSP (Digital Signal Processor unit of OR1200 (Open Reduced Instruction Set Computer (RISC 1200 using Gene Expression Programming (GEP an evolutionary programming model. OR1200 is a soft-core RISC processor of the Intellectual Property cores that can efficiently run any modern operating system. In the manufacturing process of OR1200 a parallel HPRC is placed internally in the Integer Execution Pipeline unit of the CPU/DSP core to increase the performance. The GEP Parallel HPRC is activated /deactivated by triggering the signals i HPRC_Gene_Start ii HPRC_Gene_End. A Verilog HDL(Hardware Description language functional code for Gene Expression Programming parallel HPRC is developed and synthesised using XILINX ISE in the former part of the work and a CoreMark processor core benchmark is used to test the performance of the OR1200 soft core in the later part of the work. The result of the implementation ensures the overall speed-up increased to 20.59% by GEP based parallel HPRC in the execution unit of OR1200.

  6. The Molen Polymorphic Media Processor

    NARCIS (Netherlands)

    Kuzmanov, G.K.

    2004-01-01

    In this dissertation, we address high performance media processing based on a tightly coupled co-processor architectural paradigm. More specifically, we introduce a reconfigurable media augmentation of a general purpose processor and implement it into a fully operational processor prototype. The

  7. Dual-core Itanium Processor

    CERN Multimedia

    2006-01-01

    Intel’s first dual-core Itanium processor, code-named "Montecito" is a major release of Intel's Itanium 2 Processor Family, which implements the Intel Itanium architecture on a dual-core processor with two cores per die (integrated circuit). Itanium 2 is much more powerful than its predecessor. It has lower power consumption and thermal dissipation.

  8. Development of an Advanced Digital Reactor Protection System Using Diverse Dual Processors to Prevent Common-Mode Failure

    International Nuclear Information System (INIS)

    Shin, Hyun Kook; Nam, Sang Ku; Sohn, Se Do; Chang, Hoon Seon

    2003-01-01

    The advanced digital reactor protection system (ADRPS) with diverse dual processors has been developed to prevent common-mode failure (CMF). The principle of diversity is applied to both hardware design and software design. For hardware diversity, two different types of CPUs are used for the bistable processor and local coincidence logic (LCL) processor. The Versa Module Eurocard-based single board computers are used for the CPU hardware platforms. The QNX operating system and the VxWorks operating system were selected for software diversity. Functional diversity is also applied to the input and output modules, and to the algorithm in the bistable processors and LCL processors. The characteristics of the newly developed digital protection system are described together with the preventive capability against CMF. Also, system reliability analysis is discussed. The evaluation results show that the ADRPS has a good preventive capability against the CMF and is a highly reliable reactor protection system

  9. Multimode power processor

    Science.gov (United States)

    O'Sullivan, George A.; O'Sullivan, Joseph A.

    1999-01-01

    In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.

  10. Scalable Parallelization of Skyline Computation for Multi-core Processors

    DEFF Research Database (Denmark)

    Chester, Sean; Sidlauskas, Darius; Assent, Ira

    2015-01-01

    The skyline is an important query operator for multi-criteria decision making. It reduces a dataset to only those points that offer optimal trade-offs of dimensions. In general, it is very expensive to compute. Recently, multi-core CPU algorithms have been proposed to accelerate the computation...... of the skyline. However, they do not sufficiently minimize dominance tests and so are not competitive with state-of-the-art sequential algorithms. In this paper, we introduce a novel multi-core skyline algorithm, Hybrid, which processes points in blocks. It maintains a shared, global skyline among all threads...

  11. Liquid Cooling System for CPU by Electroconjugate Fluid

    Directory of Open Access Journals (Sweden)

    Yasuo Sakurai

    2014-06-01

    Full Text Available The dissipated power of CPU for personal computer has been increased because the performance of personal computer becomes higher. Therefore, a liquid cooling system has been employed in some personal computers in order to improve their cooling performance. Electroconjugate fluid (ECF is one of the functional fluids. ECF has a remarkable property that a strong jet flow is generated between electrodes when a high voltage is applied to ECF through the electrodes. By using this strong jet flow, an ECF-pump with simple structure, no sliding portion, no noise, and no vibration seems to be able to be developed. And then, by the use of the ECF-pump, a new liquid cooling system by ECF seems to be realized. In this study, to realize this system, an ECF-pump is proposed and fabricated to investigate the basic characteristics of the ECF-pump experimentally. Next, by utilizing the ECF-pump, a model of a liquid cooling system by ECF is manufactured and some experiments are carried out to investigate the performance of this system. As a result, by using this system, the temperature of heat source of 50 W is kept at 60°C or less. In general, CPU is usually used at this temperature or less.

  12. Heterogeneous CPU-GPU moving targets detection for UAV video

    Science.gov (United States)

    Li, Maowen; Tang, Linbo; Han, Yuqi; Yu, Chunlei; Zhang, Chao; Fu, Huiquan

    2017-07-01

    Moving targets detection is gaining popularity in civilian and military applications. On some monitoring platform of motion detection, some low-resolution stationary cameras are replaced by moving HD camera based on UAVs. The pixels of moving targets in the HD Video taken by UAV are always in a minority, and the background of the frame is usually moving because of the motion of UAVs. The high computational cost of the algorithm prevents running it at higher resolutions the pixels of frame. Hence, to solve the problem of moving targets detection based UAVs video, we propose a heterogeneous CPU-GPU moving target detection algorithm for UAV video. More specifically, we use background registration to eliminate the impact of the moving background and frame difference to detect small moving targets. In order to achieve the effect of real-time processing, we design the solution of heterogeneous CPU-GPU framework for our method. The experimental results show that our method can detect the main moving targets from the HD video taken by UAV, and the average process time is 52.16ms per frame which is fast enough to solve the problem.

  13. Video frame processor

    International Nuclear Information System (INIS)

    Joshi, V.M.; Agashe, Alok; Bairi, B.R.

    1993-01-01

    This report provides technical description regarding the Video Frame Processor (VFP) developed at Bhabha Atomic Research Centre. The instrument provides capture of video images available in CCIR format. Two memory planes each with a capacity of 512 x 512 x 8 bit data enable storage of two video image frames. The stored image can be processed on-line and on-line image subtraction can also be carried out for image comparisons. The VFP is a PC Add-on board and is I/O mapped within the host IBM PC/AT compatible computer. (author). 9 refs., 4 figs., 19 photographs

  14. Trigger and decision processors

    International Nuclear Information System (INIS)

    Franke, G.

    1980-11-01

    In recent years there have been many attempts in high energy physics to make trigger and decision processes faster and more sophisticated. This became necessary due to a permanent increase of the number of sensitive detector elements in wire chambers and calorimeters, and in fact it was possible because of the fast developments in integrated circuits technique. In this paper the present situation will be reviewed. The discussion will be mainly focussed upon event filtering by pure software methods and - rather hardware related - microprogrammable processors as well as random access memory triggers. (orig.)

  15. Optical Finite Element Processor

    Science.gov (United States)

    Casasent, David; Taylor, Bradley K.

    1986-01-01

    A new high-accuracy optical linear algebra processor (OLAP) with many advantageous features is described. It achieves floating point accuracy, handles bipolar data by sign-magnitude representation, performs LU decomposition using only one channel, easily partitions and considers data flow. A new application (finite element (FE) structural analysis) for OLAPs is introduced and the results of a case study presented. Error sources in encoded OLAPs are addressed for the first time. Their modeling and simulation are discussed and quantitative data are presented. Dominant error sources and the effects of composite error sources are analyzed.

  16. AMD's 64-bit Opteron processor

    CERN Multimedia

    CERN. Geneva

    2003-01-01

    This talk concentrates on issues that relate to obtaining peak performance from the Opteron processor. Compiler options, memory layout, MPI issues in multi-processor configurations and the use of a NUMA kernel will be covered. A discussion of recent benchmarking projects and results will also be included.BiographiesDavid RichDavid directs AMD's efforts in high performance computing and also in the use of Opteron processors...

  17. The effects of carbon nanotubes on CPU cooling

    Science.gov (United States)

    Challa, Sashi Kiran

    Computers today have evolved from being big bulky machines that took up rooms of space into small simple machines for net browsing and into small but complicated multi-core servers and supercomputing architectures. This has been possible due to the evolution of the processors. Today processors have reached 45nm specifications with millions of transistors. Transistors produce heat when they run. Today more than ever we have a growing need for managing this heat efficiently. It is indicated that increasing power density can cause a difficulty in managing temperatures on a chip. It is also mentioned that we need to move to a more temperature aware architecture. In this research we try and address the issue of handling the heat produced by processors in an efficient manner. We have tried to see if the use of carbon nanotubes will prove useful in dissipating the heat produced by the processor in a more efficient way. In the process we have also tried to come up with a repeatable experimental setup as there is not work that we have been able to find describing this exact procedure. The use of carbon nanotubes seemed natural as they have a very high thermal conductivity value. Also one of the uncertain aspects of the experiment is the use of carbon nanotubes as they are still under study and their properties have not been completely understood and there has been some inconsistency in the theoretical values of their properties and the experimental results obtained so far. The results that we got were not exactly what we expected but were close, and were in the right direction indicating that more work in future would show better and consistent results.

  18. Der ATLAS LVL2-Trigger mit FPGA-Prozessoren : Entwicklung, Aufbau und Funktionsnachweis des hybriden FPGA/CPU-basierten Prozessorsystems ATLANTIS

    CERN Document Server

    Singpiel, Holger

    2000-01-01

    This thesis describes the conception and implementation of the hybrid FPGA/CPU based processing system ATLANTIS as trigger processor for the proposed ATLAS experiment at CERN. CompactPCI provides the close coupling of a multi FPGA system and a standard CPU. The system is scalable in computing power and flexible in use due to its partitioning into dedicated FPGA boards for computation, I/O tasks and a private communication. Main focus of the research activities based on the usage of the ATLANTIS system are two areas in the second level trigger (LVL2). First, the acceleration of time critical B physics trigger algorithms is the major aim. The execution of the full scan TRT algorithm on ATLANTIS, which has been used as a demonstrator, results in a speedup of 5.6 compared to a standard CPU. Next, the ATLANTIS system is used as a hardware platform for research work in conjunction with the ATLAS readout systems. For further studies a permanent installation of the ATLANTIS system in the LVL2 application testbed is f...

  19. Implementation of an EPICS IOC on an Embedded Soft Core Processor Using Field Programmable Gate Arrays

    International Nuclear Information System (INIS)

    Douglas Curry; Alicia Hofler; Hai Dong; Trent Allison; J. Hovater; Kelly Mahoney

    2005-01-01

    At Jefferson Lab, we have been evaluating soft core processors running an EPICS IOC over μClinux on our custom hardware. A soft core processor is a flexible CPU architecture that is configured in the FPGA as opposed to a hard core processor which is fixed in silicon. Combined with an on-board Ethernet port, the technology incorporates the IOC and digital control hardware within a single FPGA. By eliminating the general purpose computer IOC, the designer is no longer tied to a specific platform, e.g. PC, VME, or VXI, to serve as the intermediary between the high level controls and the field hardware. This paper will discuss the design and development process as well as specific applications for JLab's next generation low-level RF controls and Machine Protection Systems

  20. Thermal Dissipation Efficiency in a Micro-Processor Using Carbon Nanotubes Based Composite

    Science.gov (United States)

    Thang, Bui Hung; Van Quang, Cao; Nghia, Van Trong; Hong, Phan Ngoc; Van Chuc, Nguyen; Tam, Ngo Thi Thanh; Quang, Le Dinh; Khang, Dao Duc; Khoi, Phan Hong; Minh, Phan Ngoc

    2009-09-01

    Modern electronic and optoelectronic devices such as μ-processor, light emitting diode, semiconductor laser issued a challenge in the thermal dissipation problem. Finding an effective way for thermal dissipation therefore becomes a very important issue. It is known that carbon nanotubes (CNTs) is one of the most valuable materials with high thermal conductivity (2000 W/m.K compared to thermal conductivity of Ag 419 W/m.K). This suggested an approach in applying the CNTs as an essential component for thermal dissipation media to improve the performance of computer processor and other high power electronic devices. In this work multi walled carbon nanotubes (MWCNTs) based composites were utilized as the thermal dissipation media in a micro processor of a personal computer. The MWCNTs of different concentrations were added into polyaniline, commercial silicon thermal paste and commercial silver thermal paste by mechanical methods. A personal computer with configuration: Intel Pentium IV 3.066 GHz, 512 MB of RAM and Windows XP Service Pack 2 Operating System was employed. The thermal dissipation efficiency of the system was evaluated by directly measure the temperature of the μ-processor during the operation of the computer in different CPU speeds. The measured results showed that the CNTs based composite could reduce the temperature of the u-processor more than 5° C, and the time for increasing the temperature of the μ-processor was three times longer than that when using commercial thermal paste.

  1. Composable processor virtualization for embedded systems

    NARCIS (Netherlands)

    Molnos, A.M.; Milutinovic, A.; She, D.; Goossens, K.G.W.

    2010-01-01

    Processor virtualization divides a physical processor's time among a set of virual machines, enabling efficient hardware utilization, application security and allowing co-existence of different operating systems on the same processor. Through initially intended for the server domain, virtualization

  2. Computing trends using graphic processor in high energy physics

    CERN Document Server

    Niculescu, Mihai

    2011-01-01

    One of the main challenges in Heavy Energy Physics is to make fast analysis of high amount of experimental and simulated data. At LHC-CERN one p-p event is approximate 1 Mb in size. The time taken to analyze the data and obtain fast results depends on high computational power. The main advantage of using GPU(Graphic Processor Unit) programming over traditional CPU one is that graphical cards bring a lot of computing power at a very low price. Today a huge number of application(scientific, financial etc) began to be ported or developed for GPU, including Monte Carlo tools or data analysis tools for High Energy Physics. In this paper, we'll present current status and trends in HEP using GPU.

  3. Speeding up the MATLAB complex networks package using graphic processors

    International Nuclear Information System (INIS)

    Zhang Bai-Da; Wu Jun-Jie; Li Xin; Tang Yu-Hua

    2011-01-01

    The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions, or more, of vertices. The MATLAB language, with its mass of statistical functions, is a good choice to rapidly realize an algorithm prototype of complex networks. The performance of the MATLAB codes can be further improved by using graphic processor units (GPU). This paper presents the strategies and performance of the GPU implementation of a complex networks package, and the Jacket toolbox of MATLAB is used. Compared with some commercially available CPU implementations, GPU can achieve a speedup of, on average, 11.3×. The experimental result proves that the GPU platform combined with the MATLAB language is a good combination for complex network research. (interdisciplinary physics and related areas of science and technology)

  4. GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores

    Directory of Open Access Journals (Sweden)

    Wang Kai

    2011-05-01

    Full Text Available Abstract Background Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs have multiple cores, whereas Graphics Processing Units (GPUs also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Findings Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1 the interaction of SNPs within it in parallel, and 2 the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. Conclusions GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.

  5. Distributed processor systems

    International Nuclear Information System (INIS)

    Zacharov, B.

    1976-01-01

    In recent years, there has been a growing tendency in high-energy physics and in other fields to solve computational problems by distributing tasks among the resources of inter-coupled processing devices and associated system elements. This trend has gained further momentum more recently with the increased availability of low-cost processors and with the development of the means of data distribution. In two lectures, the broad question of distributed computing systems is examined and the historical development of such systems reviewed. An attempt is made to examine the reasons for the existence of these systems and to discern the main trends for the future. The components of distributed systems are discussed in some detail and particular emphasis is placed on the importance of standards and conventions in certain key system components. The ideas and principles of distributed systems are discussed in general terms, but these are illustrated by a number of concrete examples drawn from the context of the high-energy physics environment. (Auth.)

  6. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing

    Directory of Open Access Journals (Sweden)

    Fan Zhang

    2016-04-01

    Full Text Available With the development of synthetic aperture radar (SAR technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC methods have been presented to accelerate SAR imaging, especially the GPU based methods. In the classical GPU based imaging algorithm, GPU is employed to accelerate image processing by massive parallel computing, and CPU is only used to perform the auxiliary work such as data input/output (IO. However, the computing capability of CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed tasks partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the part of CPU parallel imaging, the advanced vector extension (AVX method is firstly introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only the bottlenecks of memory limitation and frequent data transferring are broken, but also kinds of optimized strategies are applied, such as streaming, parallel pipeline and so on. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging on single-core CPU by 270 times and realizes the real-time imaging in that the imaging rate outperforms the raw data generation rate.

  7. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing.

    Science.gov (United States)

    Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin

    2016-04-07

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially the GPU based methods. In the classical GPU based imaging algorithm, GPU is employed to accelerate image processing by massive parallel computing, and CPU is only used to perform the auxiliary work such as data input/output (IO). However, the computing capability of CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed tasks partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the part of CPU parallel imaging, the advanced vector extension (AVX) method is firstly introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only the bottlenecks of memory limitation and frequent data transferring are broken, but also kinds of optimized strategies are applied, such as streaming, parallel pipeline and so on. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging on single-core CPU by 270 times and realizes the real-time imaging in that the imaging rate outperforms the raw data generation rate.

  8. Using the CPU and GPU for real-time video enhancement on a mobile computer

    CSIR Research Space (South Africa)

    Bachoo, AK

    2010-09-01

    Full Text Available . In this paper, the current advances in mobile CPU and GPU hardware are used to implement video enhancement algorithms in a new way on a mobile computer. Both the CPU and GPU are used effectively to achieve realtime performance for complex image enhancement...

  9. GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

    Science.gov (United States)

    Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

    2017-08-01

    The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy

  10. GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS on Intel Xeon Phi processors

    Directory of Open Access Journals (Sweden)

    H. Wang

    2017-08-01

    Full Text Available The Global Nested Air Quality Prediction Modeling System (GNAQPMS is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS, which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL. Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC, KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1 updating the pure Message Passing Interface (MPI parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2 fully employing the 512 bit wide vector processing units (VPUs on the KNL platform; (3 reducing unnecessary memory access to improve cache efficiency; (4 reducing the thread local storage (TLS in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5 changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined

  11. On developing B-spline registration algorithms for multi-core processors

    International Nuclear Information System (INIS)

    Shackleford, J A; Kandasamy, N; Sharp, G C

    2010-01-01

    Spline-based deformable registration methods are quite popular within the medical-imaging community due to their flexibility and robustness. However, they require a large amount of computing time to obtain adequate results. This paper makes two contributions towards accelerating B-spline-based registration. First, we propose a grid-alignment scheme and associated data structures that greatly reduce the complexity of the registration algorithm. Based on this grid-alignment scheme, we then develop highly data parallel designs for B-spline registration within the stream-processing model, suitable for implementation on multi-core processors such as graphics processing units (GPUs). Particular attention is focused on an optimal method for performing analytic gradient computations in a data parallel fashion. CPU and GPU versions are validated for execution time and registration quality. Performance results on large images show that our GPU algorithm achieves a speedup of 15 times over the single-threaded CPU implementation whereas our multi-core CPU algorithm achieves a speedup of 8 times over the single-threaded implementation. The CPU and GPU versions achieve near-identical registration quality in terms of RMS differences between the generated vector fields.

  12. Green Secure Processors: Towards Power-Efficient Secure Processor Design

    Science.gov (United States)

    Chhabra, Siddhartha; Solihin, Yan

    With the increasing wealth of digital information stored on computer systems today, security issues have become increasingly important. In addition to attacks targeting the software stack of a system, hardware attacks have become equally likely. Researchers have proposed Secure Processor Architectures which utilize hardware mechanisms for memory encryption and integrity verification to protect the confidentiality and integrity of data and computation, even from sophisticated hardware attacks. While there have been many works addressing performance and other system level issues in secure processor design, power issues have largely been ignored. In this paper, we first analyze the sources of power (energy) increase in different secure processor architectures. We then present a power analysis of various secure processor architectures in terms of their increase in power consumption over a base system with no protection and then provide recommendations for designs that offer the best balance between performance and power without compromising security. We extend our study to the embedded domain as well. We also outline the design of a novel hybrid cryptographic engine that can be used to minimize the power consumption for a secure processor. We believe that if secure processors are to be adopted in future systems (general purpose or embedded), it is critically important that power issues are considered in addition to performance and other system level issues. To the best of our knowledge, this is the first work to examine the power implications of providing hardware mechanisms for security.

  13. Processors and systems (picture processing)

    Energy Technology Data Exchange (ETDEWEB)

    Gemmar, P

    1983-01-01

    Automatic picture processing requires high performance computers and high transmission capacities in the processor units. The author examines the possibilities of operating processors in parallel in order to accelerate the processing of pictures. He therefore discusses a number of available processors and systems for picture processing and illustrates their capacities for special types of picture processing. He stresses the fact that the amount of storage required for picture processing is exceptionally high. The author concludes that it is as yet difficult to decide whether very large groups of simple processors or highly complex multiprocessor systems will provide the best solution. Both methods will be aided by the development of VLSI. New solutions have already been offered (systolic arrays and 3-d processing structures) but they also are subject to losses caused by inherently parallel algorithms. Greater efforts must be made to produce suitable software for multiprocessor systems. Some possibilities for future picture processing systems are discussed. 33 references.

  14. Seismometer array station processors

    International Nuclear Information System (INIS)

    Key, F.A.; Lea, T.G.; Douglas, A.

    1977-01-01

    A description is given of the design, construction and initial testing of two types of Seismometer Array Station Processor (SASP), one to work with data stored on magnetic tape in analogue form, the other with data in digital form. The purpose of a SASP is to detect the short period P waves recorded by a UK-type array of 20 seismometers and to edit these on to a a digital library tape or disc. The edited data are then processed to obtain a rough location for the source and to produce seismograms (after optimum processing) for analysis by a seismologist. SASPs are an important component in the scheme for monitoring underground explosions advocated by the UK in the Conference of the Committee on Disarmament. With digital input a SASP can operate at 30 times real time using a linear detection process and at 20 times real time using the log detector of Weichert. Although the log detector is slower, it has the advantage over the linear detector that signals with lower signal-to-noise ratio can be detected and spurious large amplitudes are less likely to produce a detection. It is recommended, therefore, that where possible array data should be recorded in digital form for input to a SASP and that the log detector of Weichert be used. Trial runs show that a SASP is capable of detecting signals down to signal-to-noise ratios of about two with very few false detections, and at mid-continental array sites it should be capable of detecting most, if not all, the signals with magnitude above msub(b) 4.5; the UK argues that, given a suitable network, it is realistic to hope that sources of this magnitude and above can be detected and identified by seismological means alone. (author)

  15. Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors.

    Science.gov (United States)

    Han, Bing; Taha, Tarek M

    2010-04-01

    There is currently a strong push in the research community to develop biological scale implementations of neuron based vision models. Systems at this scale are computationally demanding and generally utilize more accurate neuron models, such as the Izhikevich and the Hodgkin-Huxley models, in favor of the more popular integrate and fire model. We examine the feasibility of using graphics processing units (GPUs) to accelerate a spiking neural network based character recognition network to enable such large scale systems. Two versions of the network utilizing the Izhikevich and Hodgkin-Huxley models are implemented. Three NVIDIA general-purpose (GP) GPU platforms are examined, including the GeForce 9800 GX2, the Tesla C1060, and the Tesla S1070. Our results show that the GPGPUs can provide significant speedup over conventional processors. In particular, the fastest GPGPU utilized, the Tesla S1070, provided a speedup of 5.6 and 84.4 over highly optimized implementations on the fastest central processing unit (CPU) tested, a quadcore 2.67 GHz Xeon processor, for the Izhikevich and the Hodgkin-Huxley models, respectively. The CPU implementation utilized all four cores and the vector data parallelism offered by the processor. The results indicate that GPUs are well suited for this application domain.

  16. Deployment of IPv6-only CPU resources at WLCG sites

    Science.gov (United States)

    Babik, M.; Chudoba, J.; Dewhurst, A.; Finnern, T.; Froy, T.; Grigoras, C.; Hafeez, K.; Hoeft, B.; Idiculla, T.; Kelsey, D. P.; López Muñoz, F.; Martelli, E.; Nandakumar, R.; Ohrenberg, K.; Prelz, F.; Rand, D.; Sciabà, A.; Tigerstedt, U.; Traynor, D.

    2017-10-01

    The fraction of Internet traffic carried over IPv6 continues to grow rapidly. IPv6 support from network hardware vendors and carriers is pervasive and becoming mature. A network infrastructure upgrade often offers sites an excellent window of opportunity to configure and enable IPv6. There is a significant overhead when setting up and maintaining dual-stack machines, so where possible sites would like to upgrade their services directly to IPv6 only. In doing so, they are also expediting the transition process towards its desired completion. While the LHC experiments accept there is a need to move to IPv6, it is currently not directly affecting their work. Sites are unwilling to upgrade if they will be unable to run LHC experiment workflows. This has resulted in a very slow uptake of IPv6 from WLCG sites. For several years the HEPiX IPv6 Working Group has been testing a range of WLCG services to ensure they are IPv6 compliant. Several sites are now running many of their services as dual-stack. The working group, driven by the requirements of the LHC VOs to be able to use IPv6-only opportunistic resources, continues to encourage wider deployment of dual-stack services to make the use of such IPv6-only clients viable. This paper presents the working group’s plan and progress so far to allow sites to deploy IPv6-only CPU resources. This includes making experiment central services dual-stack as well as a number of storage services. The monitoring, accounting and information services that are used by jobs also need to be upgraded. Finally the VO testing that has taken place on hosts connected via IPv6-only is reported.

  17. Using Intel's Knight Landing Processor to Accelerate Global Nested Air Quality Prediction Modeling System (GNAQPMS) Model

    Science.gov (United States)

    Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.

    2016-12-01

    The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled Hg transport module to investigate the mercury pollution. In this study, we present our work of transplanting the GNAQPMS model on Intel Xeon Phi processor, Knights Landing (KNL) to accelerate the model. KNL is the second-generation product adopting Many Integrated Core Architecture (MIC) architecture. Compared with the first generation Knight Corner (KNC), KNL has more new hardware features, that it can be used as unique processor as well as coprocessor with other CPU. According to the Vtune tool, the high overhead modules in GNAQPMS model have been addressed, including CBMZ gas chemistry, advection and convection module, and wet deposition module. These high overhead modules were accelerated by optimizing code and using new techniques of KNL. The following optimized measures was done: 1) Changing the pure MPI parallel mode to hybrid parallel mode with MPI and OpenMP; 2.Vectorizing the code to using the 512-bit wide vector computation unit. 3. Reducing unnecessary memory access and calculation. 4. Reducing Thread Local Storage (TLS) for common variables with each OpenMP thread in CBMZ. 5. Changing the way of global communication from files writing and reading to MPI functions. After optimization, the performance of GNAQPMS is greatly increased both on CPU and KNL platform, the single-node test showed that optimized version has 2.6x speedup on two sockets CPU platform and 3.3x speedup on one socket KNL platform compared with the baseline version code, which means the KNL has 1.29x speedup when compared with 2 sockets CPU platform.

  18. Mathematical Methods and Algorithms of Mobile Parallel Computing on the Base of Multi-core Processors

    Directory of Open Access Journals (Sweden)

    Alexander B. Bakulev

    2012-11-01

    Full Text Available This article deals with mathematical models and algorithms, providing mobility of sequential programs parallel representation on the high-level language, presents formal model of operation environment processes management, based on the proposed model of programs parallel representation, presenting computation process on the base of multi-core processors.

  19. Applying graphics processor units to Monte Carlo dose calculation in radiation therapy

    Directory of Open Access Journals (Sweden)

    Bakhtiari M

    2010-01-01

    Full Text Available We investigate the potential in using of using a graphics processor unit (GPU for Monte-Carlo (MC-based radiation dose calculations. The percent depth dose (PDD of photons in a medium with known absorption and scattering coefficients is computed using a MC simulation running on both a standard CPU and a GPU. We demonstrate that the GPU′s capability for massive parallel processing provides a significant acceleration in the MC calculation, and offers a significant advantage for distributed stochastic simulations on a single computer. Harnessing this potential of GPUs will help in the early adoption of MC for routine planning in a clinical environment.

  20. From sequential to parallel programming with patterns

    CERN Document Server

    CERN. Geneva

    2018-01-01

    To increase in both performance and efficiency, our programming models need to adapt to better exploit modern processors. The classic idioms and patterns for programming such as loops, branches or recursion are the pillars of almost every code and are well known among all programmers. These patterns all have in common that they are sequential in nature. Embracing parallel programming patterns, which allow us to program for multi- and many-core hardware in a natural way, greatly simplifies the task of designing a program that scales and performs on modern hardware, independently of the used programming language, and in a generic way.

  1. XL-100S microprogrammable processor

    International Nuclear Information System (INIS)

    Gorbunov, N.V.; Guzik, Z.; Sutulin, V.A.; Forytski, A.

    1983-01-01

    The XL-100S microprogrammable processor providing the multiprocessor operation mode in the XL system crate is described. The processor meets the EUR 6500 CAMAC standards, address up to 4 Mbyte memory, and interacts with 7 CAMAC branchas. Eight external requests initiate operations preset by a sequence of microcommands in a memory of the capacity up to 64 kwords of 32-Git. The microprocessor architecture allows one to emulate commands of the majority of mini- or micro-computers, including floating point operations. The XL-100S processor may be used in various branches of experimental physics: for physical experiment apparatus control, fast selection of useful physical events, organization of the of input/output operations, organization of direct assess to memory included, etc. The Am2900 microprocessor set is used as an elementary base. The device is made in the form of a single width CAMAC module

  2. Making CSB + -Trees Processor Conscious

    DEFF Research Database (Denmark)

    Samuel, Michael; Pedersen, Anders Uhl; Bonnet, Philippe

    2005-01-01

    of the CSB+-tree. We argue that it is necessary to consider a larger group of parameters in order to adapt CSB+-tree to processor architectures as different as Pentium and Itanium. We identify this group of parameters and study how it impacts the performance of CSB+-tree on Itanium 2. Finally, we propose......Cache-conscious indexes, such as CSB+-tree, are sensitive to the underlying processor architecture. In this paper, we focus on how to adapt the CSB+-tree so that it performs well on a range of different processor architectures. Previous work has focused on the impact of node size on the performance...... a systematic method for adapting CSB+-tree to new platforms. This work is a first step towards integrating CSB+-tree in MySQL’s heap storage manager....

  3. Java Processor Optimized for RTSJ

    Directory of Open Access Journals (Sweden)

    Tu Shiliang

    2007-01-01

    Full Text Available Due to the preeminent work of the real-time specification for Java (RTSJ, Java is increasingly expected to become the leading programming language in real-time systems. To provide a Java platform suitable for real-time applications, a Java processor which can execute Java bytecode is directly proposed in this paper. It provides efficient support in hardware for some mechanisms specified in the RTSJ and offers a simpler programming model through ameliorating the scoped memory of the RTSJ. The worst case execution time (WCET of the bytecodes implemented in this processor is predictable by employing the optimization method proposed in our previous work, in which all the processing interfering predictability is handled before bytecode execution. Further advantage of this method is to make the implementation of the processor simpler and suited to a low-cost FPGA chip.

  4. Optical Array Processor: Laboratory Results

    Science.gov (United States)

    Casasent, David; Jackson, James; Vaerewyck, Gerard

    1987-01-01

    A Space Integrating (SI) Optical Linear Algebra Processor (OLAP) is described and laboratory results on its performance in several practical engineering problems are presented. The applications include its use in the solution of a nonlinear matrix equation for optimal control and a parabolic Partial Differential Equation (PDE), the transient diffusion equation with two spatial variables. Frequency-multiplexed, analog and high accuracy non-base-two data encoding are used and discussed. A multi-processor OLAP architecture is described and partitioning and data flow issues are addressed.

  5. Fast processor for dilepton triggers

    International Nuclear Information System (INIS)

    Katsanevas, S.; Kostarakis, P.; Baltrusaitis, R.

    1983-01-01

    We describe a fast trigger processor, developed for and used in Fermilab experiment E-537, for selecting high-mass dimuon events produced by negative pions and anti-protons. The processor finds candidate tracks by matching hit information received from drift chambers and scintillation counters, and determines their momenta. Invariant masses are calculated for all possible pairs of tracks and an event is accepted if any invariant mass is greater than some preselectable minimum mass. The whole process, accomplished within 5 to 10 microseconds, achieves up to a ten-fold reduction in trigger rate

  6. Commodity multi-processor systems in the ATLAS level-2 trigger

    International Nuclear Information System (INIS)

    Abolins, M.; Blair, R.; Bock, R.; Bogaerts, A.; Dawson, J.; Ermoline, Y.; Hauser, R.; Kugel, A.; Lay, R.; Muller, M.; Noffz, K.-H.; Pope, B.; Schlereth, J.; Werner, P.

    2000-01-01

    Low cost SMP (Symmetric Multi-Processor) systems provide substantial CPU and I/O capacity. These features together with the ease of system integration make them an attractive and cost effective solution for a number of real-time applications in event selection. In ATLAS the authors consider them as intelligent input buffers (active ROB complex), as event flow supervisors or as powerful processing nodes. Measurements of the performance of one off-the-shelf commercial 4-processor PC with two PCI buses, equipped with commercial FPGA based data source cards (microEnable) and running commercial software are presented and mapped on such applications together with a long-term program of work. The SMP systems may be considered as an important building block in future data acquisition systems

  7. Commodity multi-processor systems in the ATLAS level-2 trigger

    CERN Document Server

    Abolins, M; Bock, R; Bogaerts, J A C; Dawson, J; Ermoline, Y; Hauser, R; Kugel, A; Lay, R; Müller, M; Noffz, K H; Pope, B; Schlereth, J L; Werner, P

    2000-01-01

    Low cost SMP (symmetric multi-processor) systems provide substantial CPU and I/O capacity. These features together with the ease of system integration make them an attractive and cost effective solution for a number of real-time applications in event selection. In ATLAS we consider them as intelligent input buffers (an "active" ROB complex), as event flow supervisors or as powerful processing nodes. Measurements of the performance of one off-the-shelf commercial 4- processor PC with two PCI buses, equipped with commercial FPGA based data source cards (microEnable) and running commercial software are presented and mapped on such applications together with a long-term programme of work. The SMP systems may be considered as an important building block in future data acquisition systems. (9 refs).

  8. Some questions of using the algebraic coding theory for construction of special-purpose processors in high energy physics spectrometers

    International Nuclear Information System (INIS)

    Nikityuk, N.M.

    1989-01-01

    The results of investigations of using the algebraic coding theory for the creation of parallel encoders, majority coincidence schemes and coordinate processors for the first and second trigger levels are described. Concrete examples of calculation and structure of special-purpose processor using the table arithmetic method are given for multiplicity t ≤ 5. The question of using parallel and sequential syndrome coding methods for the registration of events with clusters is discussed. 30 refs.; 10 figs

  9. Software verification and validation methodology for advanced digital reactor protection system using diverse dual processors to prevent common mode failure

    International Nuclear Information System (INIS)

    Son, Ki Chang; Shin, Hyun Kook; Lee, Nam Hoon; Baek, Seung Min; Kim, Hang Bae

    2001-01-01

    The Advanced Digital Reactor Protection System (ADRPS) with diverse dual processors is being developed by the National Research Lab of KOPEC for ADRPS development. One of the ADRPS goals is to develop digital Plant Protection System (PPS) free of Common Mode Failure (CMF). To prevent CMF, the principle of diversity is applied to both hardware design and software design. For the hardware diversity, two different types of CPUs are used for Bistable Processor and Local Coincidence Logic Processor. The VME based Single Board Computers (SBC) are used for the CPU hardware platforms. The QNX Operating System (OS) and the VxWorks OS are used for software diversity. Rigorous Software Verification and Validation (V and V) is also required to prevent CMF. In this paper, software V and V methodology for the ADRPS is described to enhance the ADRPS software reliability and to assure high quality of the ADRPS software

  10. Very Long Instruction Word Processors

    Indian Academy of Sciences (India)

    Explicitly Parallel Instruction Computing (EPIC) is an instruction processing paradigm that has been in the spot- light due to its adoption by the next generation of Intel. Processors starting with the IA-64. The EPIC processing paradigm is an evolution of the Very Long Instruction. Word (VLIW) paradigm. This article gives an ...

  11. VON WISPR Family Processors: Volume 1

    National Research Council Canada - National Science Library

    Wagstaff, Ronald

    1997-01-01

    ...) and the background noise they are embedded in. Processors utilizing those fluctuations such as the von WISPR Family Processors discussed herein, are methods or algorithms that preferentially attenuate the fluctuating signals and noise...

  12. Design Principles for Synthesizable Processor Cores

    DEFF Research Database (Denmark)

    Schleuniger, Pascal; McKee, Sally A.; Karlsson, Sven

    2012-01-01

    As FPGAs get more competitive, synthesizable processor cores become an attractive choice for embedded computing. Currently popular commercial processor cores do not fully exploit current FPGA architectures. In this paper, we propose general design principles to increase instruction throughput...

  13. Inhibition of CPU0213, a Dual Endothelin Receptor Antagonist, on Apoptosis via Nox4-Dependent ROS in HK-2 Cells

    Directory of Open Access Journals (Sweden)

    Qing Li

    2016-06-01

    Full Text Available Background/Aims: Our previous studies have indicated that a novel endothelin receptor antagonist CPU0213 effectively normalized renal function in diabetic nephropathy. However, the molecular mechanisms mediating the nephroprotective role of CPU0213 remain unknown. Methods and Results: In the present study, we first detected the role of CPU0213 on apoptosis in human renal tubular epithelial cell (HK-2. It was shown that high glucose significantly increased the protein expression of Bax and decreased Bcl-2 protein in HK-2 cells, which was reversed by CPU0213. The percentage of HK-2 cells that showed Annexin V-FITC binding was markedly suppressed by CPU0213, which confirmed the inhibitory role of CPU0213 on apoptosis. Given the regulation of endothelin (ET system to oxidative stress, we determined the role of redox signaling in the regulation of CPU0213 on apoptosis. It was demonstrated that the production of superoxide (O2-. was substantially attenuated by CPU0213 treatment in HK-2 cells. We further found that CPU0213 dramatically inhibited expression of Nox4 protein, which gene silencing mimicked the role of CPU0213 on the apoptosis under high glucose stimulation. We finally examined the role of CPU0213 on ET-1 receptors and found that high glucose-induced protein expression of endothelin A and B receptors was dramatically inhibited by CPU0213. Conclusion: Taken together, these results suggest that this Nox4-dependenet O2- production is critical for the apoptosis of HK-2 cells in high glucose. Endothelin receptor antagonist CPU0213 has an anti-apoptosis role through Nox4-dependent O2-.production, which address the nephroprotective role of CPU0213 in diabetic nephropathy.

  14. Deterministic chaos in the processor load

    International Nuclear Information System (INIS)

    Halbiniak, Zbigniew; Jozwiak, Ireneusz J.

    2007-01-01

    In this article we present the results of research whose purpose was to identify the phenomenon of deterministic chaos in the processor load. We analysed the time series of the processor load during efficiency tests of database software. Our research was done on a Sparc Alpha processor working on the UNIX Sun Solaris 5.7 operating system. The conducted analyses proved the presence of the deterministic chaos phenomenon in the processor load in this particular case

  15. JPP: A Java Pre-Processor

    OpenAIRE

    Kiniry, Joseph R.; Cheong, Elaine

    1998-01-01

    The Java Pre-Processor, or JPP for short, is a parsing pre-processor for the Java programming language. Unlike its namesake (the C/C++ Pre-Processor, cpp), JPP provides functionality above and beyond simple textual substitution. JPP's capabilities include code beautification, code standard conformance checking, class and interface specification and testing, and documentation generation.

  16. Three Dimensional Simulation of Ion Thruster Plume-Spacecraft Interaction Based on a Graphic Processor Unit

    International Nuclear Information System (INIS)

    Ren Junxue; Xie Kan; Qiu Qian; Tang Haibin; Li Juan; Tian Huabing

    2013-01-01

    Based on the three-dimensional particle-in-cell (PIC) method and Compute Unified Device Architecture (CUDA), a parallel particle simulation code combined with a graphic processor unit (GPU) has been developed for the simulation of charge-exchange (CEX) xenon ions in the plume of an ion thruster. Using the proposed technique, the potential and CEX plasma distribution are calculated for the ion thruster plume surrounding the DS1 spacecraft at different thrust levels. The simulation results are in good agreement with measured CEX ion parameters reported in literature, and the GPU's results are equal to a CPU's. Compared with a single CPU Intel Core 2 E6300, 16-processor GPU NVIDIA GeForce 9400 GT indicates a speedup factor of 3.6 when the total macro particle number is 1.1×10 6 . The simulation results also reveal how the back flow CEX plasma affects the spacecraft floating potential, which indicates that the plume of the ion thruster is indeed able to alleviate the extreme negative floating potentials of spacecraft in geosynchronous orbit

  17. First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC

    CERN Document Server

    Halyo, V.; Lujan, P.; Karpusenko, V.; Vladimirov, A.

    2014-04-07

    Recent innovations focused around {\\em parallel} processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA's Tesla Graphics Processing Unit (GPU) or Intel's \\xphi, in the High Level Trigger. These accelerators have the potential to provide faster or more energy efficient event selection, thus opening up possibilities for new complex triggers that were not previously feasible. At the same time, it is crucial to explore the performance limits achievable on the latest generation multicore CPUs with the use of the best software optimization methods. In this article, a new tracking algorithm based on the Hough transform will be evaluated for the first time on a multi-core Intel Xeon E5-2697v2 CPU, an NVIDIA Tesla K20c GPU, and an Intel \\x...

  18. Designing High-Performance Fuzzy Controllers Combining IP Cores and Soft Processors

    Directory of Open Access Journals (Sweden)

    Oscar Montiel-Ross

    2012-01-01

    Full Text Available This paper presents a methodology to integrate a fuzzy coprocessor described in VHDL (VHSIC Hardware Description Language to a soft processor embedded into an FPGA, which increases the throughput of the whole system, since the controller uses parallelism at the circuitry level for high-speed-demanding applications, the rest of the application can be written in C/C++. We used the ARM 32-bit soft processor, which allows sequential and parallel programming. The FLC coprocessor incorporates a tuning method that allows to manipulate the system response. We show experimental results using a fuzzy PD+I controller as the embedded coprocessor.

  19. Online Fastbus processor for LEP

    International Nuclear Information System (INIS)

    Mueller, H.

    1986-01-01

    The author describes the online computing aspects of Fastbus systems using a processor module which has been developed at CERN and is now available commercially. These General Purpose Master/Slaves (GPMS) are based on 68000/10 (or optionally 68020/68881) processors. Applications include use as event-filters (DELPHI), supervisory controllers, Fastbus stand-alone diagnostic tools, and multiprocessor array components. The direct mapping of single, 32-bit assembly instructions to execute Fastbus protocols makes the use of a GPM both simple and flexible. Loosely coupled processing in Fastbus networks is possible between GPM's as they support access semaphores and use a two port memory as I/O buffer for Fastbus. Both master and slave-ports support block transfers up to 20 Mbytes/s. The CERN standard Fastbus software and the MoniCa symbolic debugging monitor are available on the GPM with real time, multiprocessing support. (Auth.)

  20. Invasive tightly coupled processor arrays

    CERN Document Server

    LARI, VAHID

    2016-01-01

    This book introduces new massively parallel computer (MPSoC) architectures called invasive tightly coupled processor arrays. It proposes strategies, architecture designs, and programming interfaces for invasive TCPAs that allow invading and subsequently executing loop programs with strict requirements or guarantees of non-functional execution qualities such as performance, power consumption, and reliability. For the first time, such a configurable processor array architecture consisting of locally interconnected VLIW processing elements can be claimed by programs, either in full or in part, using the principle of invasive computing. Invasive TCPAs provide unprecedented energy efficiency for the parallel execution of nested loop programs by avoiding any global memory access such as GPUs and may even support loops with complex dependencies such as loop-carried dependencies that are not amenable to parallel execution on GPUs. For this purpose, the book proposes different invasion strategies for claiming a desire...

  1. An Investigation of the Performance of the Colored Gauss-Seidel Solver on CPU and GPU

    International Nuclear Information System (INIS)

    Yoon, Jong Seon; Choi, Hyoung Gwon; Jeon, Byoung Jin

    2017-01-01

    The performance of the colored Gauss–Seidel solver on CPU and GPU was investigated for the two- and three-dimensional heat conduction problems by using different mesh sizes. The heat conduction equation was discretized by the finite difference method and finite element method. The CPU yielded good performance for small problems but deteriorated when the total memory required for computing was larger than the cache memory for large problems. In contrast, the GPU performed better as the mesh size increased because of the latency hiding technique. Further, GPU computation by the colored Gauss–Siedel solver was approximately 7 times that by the single CPU. Furthermore, the colored Gauss–Seidel solver was found to be approximately twice that of the Jacobi solver when parallel computing was conducted on the GPU.

  2. An Investigation of the Performance of the Colored Gauss-Seidel Solver on CPU and GPU

    Energy Technology Data Exchange (ETDEWEB)

    Yoon, Jong Seon; Choi, Hyoung Gwon [Seoul Nat’l Univ. of Science and Technology, Seoul (Korea, Republic of); Jeon, Byoung Jin [Yonsei Univ., Seoul (Korea, Republic of)

    2017-02-15

    The performance of the colored Gauss–Seidel solver on CPU and GPU was investigated for the two- and three-dimensional heat conduction problems by using different mesh sizes. The heat conduction equation was discretized by the finite difference method and finite element method. The CPU yielded good performance for small problems but deteriorated when the total memory required for computing was larger than the cache memory for large problems. In contrast, the GPU performed better as the mesh size increased because of the latency hiding technique. Further, GPU computation by the colored Gauss–Siedel solver was approximately 7 times that by the single CPU. Furthermore, the colored Gauss–Seidel solver was found to be approximately twice that of the Jacobi solver when parallel computing was conducted on the GPU.

  3. SU-E-J-60: Efficient Monte Carlo Dose Calculation On CPU-GPU Heterogeneous Systems

    Energy Technology Data Exchange (ETDEWEB)

    Xiao, K; Chen, D. Z; Hu, X. S [University of Notre Dame, Notre Dame, IN (United States); Zhou, B [Altera Corp., San Jose, CA (United States)

    2014-06-01

    Purpose: It is well-known that the performance of GPU-based Monte Carlo dose calculation implementations is bounded by memory bandwidth. One major cause of this bottleneck is the random memory writing patterns in dose deposition, which leads to several memory efficiency issues on GPU such as un-coalesced writing and atomic operations. We propose a new method to alleviate such issues on CPU-GPU heterogeneous systems, which achieves overall performance improvement for Monte Carlo dose calculation. Methods: Dose deposition is to accumulate dose into the voxels of a dose volume along the trajectories of radiation rays. Our idea is to partition this procedure into the following three steps, which are fine-tuned for CPU or GPU: (1) each GPU thread writes dose results with location information to a buffer on GPU memory, which achieves fully-coalesced and atomic-free memory transactions; (2) the dose results in the buffer are transferred to CPU memory; (3) the dose volume is constructed from the dose buffer on CPU. We organize the processing of all radiation rays into streams. Since the steps within a stream use different hardware resources (i.e., GPU, DMA, CPU), we can overlap the execution of these steps for different streams by pipelining. Results: We evaluated our method using a Monte Carlo Convolution Superposition (MCCS) program and tested our implementation for various clinical cases on a heterogeneous system containing an Intel i7 quad-core CPU and an NVIDIA TITAN GPU. Comparing with a straightforward MCCS implementation on the same system (using both CPU and GPU for radiation ray tracing), our method gained 2-5X speedup without losing dose calculation accuracy. Conclusion: The results show that our new method improves the effective memory bandwidth and overall performance for MCCS on the CPU-GPU systems. Our proposed method can also be applied to accelerate other Monte Carlo dose calculation approaches. This research was supported in part by NSF under Grants CCF

  4. CPU time reduction strategies for the Lambda modes calculation of a nuclear power reactor

    Energy Technology Data Exchange (ETDEWEB)

    Vidal, V.; Garayoa, J.; Hernandez, V. [Universidad Politecnica de Valencia (Spain). Dept. de Sistemas Informaticos y Computacion; Navarro, J.; Verdu, G.; Munoz-Cobo, J.L. [Universidad Politecnica de Valencia (Spain). Dept. de Ingenieria Quimica y Nuclear; Ginestar, D. [Universidad Politecnica de Valencia (Spain). Dept. de Matematica Aplicada

    1997-12-01

    In this paper, we present two strategies to reduce the CPU time spent in the lambda modes calculation for a realistic nuclear power reactor.The discretization of the multigroup neutron diffusion equation has been made using a nodal collocation method, solving the associated eigenvalue problem with two different techniques: the Subspace Iteration Method and Arnoldi`s Method. CPU time reduction is based on a coarse grain parallelization approach together with a multistep algorithm to initialize adequately the solution. (author). 9 refs., 6 tabs.

  5. Case Study of Using High Performance Commercial Processors in Space

    Science.gov (United States)

    Ferguson, Roscoe C.; Olivas, Zulema

    2009-01-01

    The purpose of the Space Shuttle Cockpit Avionics Upgrade project (1999 2004) was to reduce crew workload and improve situational awareness. The upgrade was to augment the Shuttle avionics system with new hardware and software. A major success of this project was the validation of the hardware architecture and software design. This was significant because the project incorporated new technology and approaches for the development of human rated space software. An early version of this system was tested at the Johnson Space Center for one month by teams of astronauts. The results were positive, but NASA eventually cancelled the project towards the end of the development cycle. The goal to reduce crew workload and improve situational awareness resulted in the need for high performance Central Processing Units (CPUs). The choice of CPU selected was the PowerPC family, which is a reduced instruction set computer (RISC) known for its high performance. However, the requirement for radiation tolerance resulted in the re-evaluation of the selected family member of the PowerPC line. Radiation testing revealed that the original selected processor (PowerPC 7400) was too soft to meet mission objectives and an effort was established to perform trade studies and performance testing to determine a feasible candidate. At that time, the PowerPC RAD750s were radiation tolerant, but did not meet the required performance needs of the project. Thus, the final solution was to select the PowerPC 7455. This processor did not have a radiation tolerant version, but had some ability to detect failures. However, its cache tags did not provide parity and thus the project incorporated a software strategy to detect radiation failures. The strategy was to incorporate dual paths for software generating commands to the legacy Space Shuttle avionics to prevent failures due to the softness of the upgraded avionics.

  6. Sequential Classification of Palm Gestures Based on A* Algorithm and MLP Neural Network for Quadrocopter Control

    Directory of Open Access Journals (Sweden)

    Wodziński Marek

    2017-06-01

    Full Text Available This paper presents an alternative approach to the sequential data classification, based on traditional machine learning algorithms (neural networks, principal component analysis, multivariate Gaussian anomaly detector and finding the shortest path in a directed acyclic graph, using A* algorithm with a regression-based heuristic. Palm gestures were used as an example of the sequential data and a quadrocopter was the controlled object. The study includes creation of a conceptual model and practical construction of a system using the GPU to ensure the realtime operation. The results present the classification accuracy of chosen gestures and comparison of the computation time between the CPU- and GPU-based solutions.

  7. Multi-threaded ATLAS simulation on Intel Knights Landing processors

    Science.gov (United States)

    Farrell, Steven; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea; ATLAS Collaboration

    2017-10-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with details on its multi-threaded design. Then, we will present a performance analysis of the application on KNL devices and compare it to a traditional x86 platform to demonstrate the capabilities of the architecture and evaluate the benefits of utilizing KNL platforms like Cori for ATLAS production.

  8. Multi-threaded ATLAS simulation on Intel Knights Landing processors

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00014247; The ATLAS collaboration; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea

    2017-01-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with detai...

  9. Multi-threaded ATLAS Simulation on Intel Knights Landing Processors

    CERN Document Server

    Farrell, Steven; The ATLAS collaboration; Calafiura, Paolo; Leggett, Charles

    2016-01-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), will be delivered to its users in two phases with the first phase online now and the second phase expected in mid-2016. Cori Phase 2 will be based on the KNL architecture and will contain over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a great use-case for the KNL architecture and supercomputers like Cori. Simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this presentation we will give an overview of the ATLAS simulation application with details on its multi-thr...

  10. CPU SIM: A Computer Simulator for Use in an Introductory Computer Organization-Architecture Class.

    Science.gov (United States)

    Skrein, Dale

    1994-01-01

    CPU SIM, an interactive low-level computer simulation package that runs on the Macintosh computer, is described. The program is designed for instructional use in the first or second year of undergraduate computer science, to teach various features of typical computer organization through hands-on exercises. (MSE)

  11. High performance technique for database applicationsusing a hybrid GPU/CPU platform

    KAUST Repository

    Zidan, Mohammed A.; Bonny, Talal; Salama, Khaled N.

    2012-01-01

    Hybrid GPU/CPU platform. In particular, our technique solves the problem of the low efficiency result- ing from running short-length sequences in a database on a GPU. To verify our technique, we applied it to the widely used Smith-Waterman algorithm

  12. Length-Bounded Hybrid CPU/GPU Pattern Matching Algorithm for Deep Packet Inspection

    Directory of Open Access Journals (Sweden)

    Yi-Shan Lin

    2017-01-01

    Full Text Available Since frequent communication between applications takes place in high speed networks, deep packet inspection (DPI plays an important role in the network application awareness. The signature-based network intrusion detection system (NIDS contains a DPI technique that examines the incoming packet payloads by employing a pattern matching algorithm that dominates the overall inspection performance. Existing studies focused on implementing efficient pattern matching algorithms by parallel programming on software platforms because of the advantages of lower cost and higher scalability. Either the central processing unit (CPU or the graphic processing unit (GPU were involved. Our studies focused on designing a pattern matching algorithm based on the cooperation between both CPU and GPU. In this paper, we present an enhanced design for our previous work, a length-bounded hybrid CPU/GPU pattern matching algorithm (LHPMA. In the preliminary experiment, the performance and comparison with the previous work are displayed, and the experimental results show that the LHPMA can achieve not only effective CPU/GPU cooperation but also higher throughput than the previous method.

  13. The CMSSW benchmarking suite: Using HEP code to measure CPU performance

    International Nuclear Information System (INIS)

    Benelli, G

    2010-01-01

    The demanding computing needs of the CMS experiment require thoughtful planning and management of its computing infrastructure. A key factor in this process is the use of realistic benchmarks when assessing the computing power of the different architectures available. In recent years a discrepancy has been observed between the CPU performance estimates given by the reference benchmark for HEP computing (SPECint) and actual performances of HEP code. Making use of the CPU performance tools from the CMSSW performance suite, comparative CPU performance studies have been carried out on several architectures. A benchmarking suite has been developed and integrated in the CMSSW framework, to allow computing centers and interested third parties to benchmark architectures directly with CMSSW. The CMSSW benchmarking suite can be used out of the box, to test and compare several machines in terms of CPU performance and report with the wanted level of detail the different benchmarking scores (e.g. by processing step) and results. In this talk we describe briefly the CMSSW software performance suite, and in detail the CMSSW benchmarking suite client/server design, the performance data analysis and the available CMSSW benchmark scores. The experience in the use of HEP code for benchmarking will be discussed and CMSSW benchmark results presented.

  14. DSM vs. NSM: CPU Performance Tradeoffs in Block-Oriented Query Processing

    NARCIS (Netherlands)

    M. Zukowski (Marcin); N.J. Nes (Niels); P.A. Boncz (Peter)

    2008-01-01

    textabstractComparisons between the merits of row-wise storage (NSM) and columnar storage (DSM) are typically made with respect to the persistent storage layer of database systems. In this paper, however, we focus on the CPU efficiency tradeoffs of tuple representations inside the query

  15. A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection.

    Directory of Open Access Journals (Sweden)

    Chun-Liang Lee

    Full Text Available The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms.

  16. Sequential charged particle reaction

    International Nuclear Information System (INIS)

    Hori, Jun-ichi; Ochiai, Kentaro; Sato, Satoshi; Yamauchi, Michinori; Nishitani, Takeo

    2004-01-01

    The effective cross sections for producing the sequential reaction products in F82H, pure vanadium and LiF with respect to the 14.9-MeV neutron were obtained and compared with the estimation ones. Since the sequential reactions depend on the secondary charged particles behavior, the effective cross sections are corresponding to the target nuclei and the material composition. The effective cross sections were also estimated by using the EAF-libraries and compared with the experimental ones. There were large discrepancies between estimated and experimental values. Additionally, we showed the contribution of the sequential reaction on the induced activity and dose rate in the boundary region with water. From the present study, it has been clarified that the sequential reactions are of great importance to evaluate the dose rates around the surface of cooling pipe and the activated corrosion products. (author)

  17. A solution for automatic parallelization of sequential assembly code

    Directory of Open Access Journals (Sweden)

    Kovačević Đorđe

    2013-01-01

    Full Text Available Since modern multicore processors can execute existing sequential programs only on a single core, there is a strong need for automatic parallelization of program code. Relying on existing algorithms, this paper describes one new software solution tool for parallelization of sequential assembly code. The main goal of this paper is to develop the parallelizator which reads sequential assembler code and at the output provides parallelized code for MIPS processor with multiple cores. The idea is the following: the parser translates assembler input file to program objects suitable for further processing. After that the static single assignment is done. Based on the data flow graph, the parallelization algorithm separates instructions on different cores. Once sequential code is parallelized by the parallelization algorithm, registers are allocated with the algorithm for linear allocation, and the result at the end of the program is distributed assembler code on each of the cores. In the paper we evaluate the speedup of the matrix multiplication example, which was processed by the parallelizator of assembly code. The result is almost linear speedup of code execution, which increases with the number of cores. The speed up on the two cores is 1.99, while on 16 cores the speed up is 13.88.

  18. Accuracies Of Optical Processors For Adaptive Optics

    Science.gov (United States)

    Downie, John D.; Goodman, Joseph W.

    1992-01-01

    Paper presents analysis of accuracies and requirements concerning accuracies of optical linear-algebra processors (OLAP's) in adaptive-optics imaging systems. Much faster than digital electronic processor and eliminate some residual distortion. Question whether errors introduced by analog processing of OLAP overcome advantage of greater speed. Paper addresses issue by presenting estimate of accuracy required in general OLAP that yields smaller average residual aberration of wave front than digital electronic processor computing at given speed.

  19. Functional Verification of Enhanced RISC Processor

    OpenAIRE

    SHANKER NILANGI; SOWMYA L

    2013-01-01

    This paper presents design and verification of a 32-bit enhanced RISC processor core having floating point computations integrated within the core, has been designed to reduce the cost and complexity. The designed 3 stage pipelined 32-bit RISC processor is based on the ARM7 processor architecture with single precision floating point multiplier, floating point adder/subtractor for floating point operations and 32 x 32 booths multiplier added to the integer core of ARM7. The binary representati...

  20. Conserved-peptide upstream open reading frames (CPuORFs are associated with regulatory genes in angiosperms

    Directory of Open Access Journals (Sweden)

    Richard A Jorgensen

    2012-08-01

    Full Text Available Upstream open reading frames (uORFs are common in eukaryotic transcripts, but those that encode conserved peptides (CPuORFs occur in less than 1% of transcripts. The peptides encoded by three plant CPuORF families are known to control translation of the downstream ORF in response to a small signal molecule (sucrose, polyamines and phosphocholine. In flowering plants, transcription factors are statistically over-represented among genes that possess CPuORFs, and in general it appeared that many CPuORF genes also had other regulatory functions, though the significance of this suggestion was uncertain (Hayden and Jorgensen, 2007. Five years later the literature provides much more information on the functions of many CPuORF genes. Here we reassess the functions of 27 known CPuORF gene families and find that 22 of these families play a variety of different regulatory roles, from transcriptional control to protein turnover, and from small signal molecules to signal transduction kinases. Clearly then, there is indeed a strong association of CPuORFs with regulatory genes. In addition, 16 of these families play key roles in a variety of different biological processes. Most strikingly, the core sucrose response network includes three different CPuORFs, creating the potential for sophisticated balancing of the network in response to three different molecular inputs. We propose that the function of most CPuORFs is to modulate translation of a downstream major ORF (mORF in response to a signal molecule recognized by the conserved peptide and that because the mORFs of CPuORF genes generally encode regulatory proteins, many of them centrally important in the biology of plants, CPuORFs play key roles in balancing such regulatory networks.

  1. Real-Time Adaptive Lossless Hyperspectral Image Compression using CCSDS on Parallel GPGPU and Multicore Processor Systems

    Science.gov (United States)

    Hopson, Ben; Benkrid, Khaled; Keymeulen, Didier; Aranki, Nazeeh; Klimesh, Matt; Kiely, Aaron

    2012-01-01

    The proposed CCSDS (Consultative Committee for Space Data Systems) Lossless Hyperspectral Image Compression Algorithm was designed to facilitate a fast hardware implementation. This paper analyses that algorithm with regard to available parallelism and describes fast parallel implementations in software for GPGPU and Multicore CPU architectures. We show that careful software implementation, using hardware acceleration in the form of GPGPUs or even just multicore processors, can exceed the performance of existing hardware and software implementations by up to 11x and break the real-time barrier for the first time for a typical test application.

  2. Alternative Water Processor Test Development

    Science.gov (United States)

    Pickering, Karen D.; Mitchell, Julie; Vega, Leticia; Adam, Niklas; Flynn, Michael; Wjee (er. Rau); Lunn, Griffin; Jackson, Andrew

    2012-01-01

    The Next Generation Life Support Project is developing an Alternative Water Processor (AWP) as a candidate water recovery system for long duration exploration missions. The AWP consists of biological water processor (BWP) integrated with a forward osmosis secondary treatment system (FOST). The basis of the BWP is a membrane aerated biological reactor (MABR), developed in concert with Texas Tech University. Bacteria located within the MABR metabolize organic material in wastewater, converting approximately 90% of the total organic carbon to carbon dioxide. In addition, bacteria convert a portion of the ammonia-nitrogen present in the wastewater to nitrogen gas, through a combination of nitrogen and denitrification. The effluent from the BWP system is low in organic contaminants, but high in total dissolved solids. The FOST system, integrated downstream of the BWP, removes dissolved solids through a combination of concentration-driven forward osmosis and pressure driven reverse osmosis. The integrated system is expected to produce water with a total organic carbon less than 50 mg/l and dissolved solids that meet potable water requirements for spaceflight. This paper describes the test definition, the design of the BWP and FOST subsystems, and plans for integrated testing.

  3. The UA1 trigger processor

    International Nuclear Information System (INIS)

    Grayer, G.H.

    1981-01-01

    Experiment UA1 is a large multi-purpose spectrometer at the CERN proton-antiproton collider, scheduled for late 1981. The principal trigger is formed on the basis of the energy deposition in calorimeters. A trigger decision taken in under 2.4 microseconds can avoid dead time losses due to the bunched nature of the beam. To achieve this we have built fast 8-bit charge to digital converters followed by two identical digital processors tailored to the experiment. The outputs of groups of the 2440 photomultipliers in the calorimeters are summed to form a total of 288 input channels to the ADCs. A look-up table in RAM is used to convert the digitised photomultiplier signals to energy in one processor, combinations of input channels, and also counts the number of clusters with electromagnetic or hadronic energy above pre-determined levels. Up to twelve combinations of these conditions, together with external information, may be combined in coincidence or in veto to form the final trigger. Provision has been made for testing using simulated data in an off-line mode, and sampling real data when on-line. (orig.)

  4. The relationship among CPU utilization, temperature, and thermal power for waste heat utilization

    International Nuclear Information System (INIS)

    Haywood, Anna M.; Sherbeck, Jon; Phelan, Patrick; Varsamopoulos, Georgios; Gupta, Sandeep K.S.

    2015-01-01

    Highlights: • This work graphs a triad relationship among CPU utilization, temperature and power. • Using a custom-built cold plate, we were able capture CPU-generated high quality heat. • The work undertakes a radical approach using mineral oil to directly cool CPUs. • We found that it is possible to use CPU waste energy to power an absorption chiller. - Abstract: This work addresses significant datacenter issues of growth in numbers of computer servers and subsequent electricity expenditure by proposing, analyzing and testing a unique idea of recycling the highest quality waste heat generated by datacenter servers. The aim was to provide a renewable and sustainable energy source for use in cooling the datacenter. The work incorporates novel approaches in waste heat usage, graphing CPU temperature, power and utilization simultaneously, and a mineral oil experimental design and implementation. The work presented investigates and illustrates the quantity and quality of heat that can be captured from a variably tasked liquid-cooled microprocessor on a datacenter server blade. It undertakes a radical approach using mineral oil. The trials examine the feasibility of using the thermal energy from a CPU to drive a cooling process. Results indicate that 123 servers encapsulated in mineral oil can power a 10-ton chiller with a design point of 50.2 kW th . Compared with water-cooling experiments, the mineral oil experiment mitigated the temperature drop between the heat source and discharge line by up to 81%. In addition, due to this reduction in temperature drop, the heat quality in the oil discharge line was up to 12.3 °C higher on average than for water-cooled experiments. Furthermore, mineral oil cooling holds the potential to eliminate the 50% cooling expenditure which initially motivated this project

  5. Thermoelectric mini cooler coupled with micro thermosiphon for CPU cooling system

    International Nuclear Information System (INIS)

    Liu, Di; Zhao, Fu-Yun; Yang, Hong-Xing; Tang, Guang-Fa

    2015-01-01

    In the present study, a thermoelectric mini cooler coupling with a micro thermosiphon cooling system has been proposed for the purpose of CPU cooling. A mathematical model of heat transfer, depending on one-dimensional treatment of thermal and electric power, is firstly established for the thermoelectric module. Analytical results demonstrate the relationship between the maximal COP (Coefficient of Performance) and Q c with the figure of merit. Full-scale experiments have been conducted to investigate the effect of thermoelectric operating voltage, power input of heat source, and thermoelectric module number on the performance of the cooling system. Experimental results indicated that the cooling production increases with promotion of thermoelectric operating voltage. Surface temperature of CPU heat source linearly increases with increasing of power input, and its maximum value reached 70 °C as the prototype CPU power input was equivalent to 84 W. Insulation between air and heat source surface can prevent the condensate water due to low surface temperature. In addition, thermal performance of this cooling system could be enhanced when the total dimension of thermoelectric module matched well with the dimension of CPU. This research could benefit the design of thermal dissipation of electronic chips and CPU units. - Highlights: • A cooling system coupled with thermoelectric module and loop thermosiphon is developed. • Thermoelectric module coupled with loop thermosiphon can achieve high heat-transfer efficiency. • A mathematical model of thermoelectric cooling is built. • An analysis of modeling results for design and experimental data are presented. • Influence of power input and operating voltage on the cooling system are researched

  6. Data register and processor for multiwire chambers

    International Nuclear Information System (INIS)

    Karpukhin, V.V.

    1985-01-01

    A data register and a processor for data receiving and processing from drift chambers of a device for investigating relativistic positroniums are described. The data are delivered to the register input in the form of the Grey 8 bit code, memorized and transformed to a position code. The register information is delivered to the KAMAK trunk and to the front panel plug. The processor selects particle tracks in a horizontal plane of the facility. ΔY maximum coordinate divergence and minimum point quantity on the track are set from the processor front panel. Processor solution time is 16 μs maximum quantity of simultaneously analyzed coordinates is 16

  7. Many - body simulations using an array processor

    International Nuclear Information System (INIS)

    Rapaport, D.C.

    1985-01-01

    Simulations of microscopic models of water and polypeptides using molecular dynamics and Monte Carlo techniques have been carried out with the aid of an FPS array processor. The computational techniques are discussed, with emphasis on the development and optimization of the software to take account of the special features of the processor. The computing requirements of these simulations exceed what could be reasonably carried out on a normal 'scientific' computer. While the FPS processor is highly suited to the kinds of models described, several other computationally intensive problems in statistical mechanics are outlined for which alternative processor architectures are more appropriate

  8. Sensitometric control of roentgen film processors

    International Nuclear Information System (INIS)

    Forsberg, H.; Karolinska Sjukhuset, Stockholm

    1987-01-01

    Monitoring of film processors performance is essential since image quality, patient dose and costs are influenced by the performance. A system for sensitometric constancy control of film processors and their associated components is described. Experience with the system for 3 years is given when implemented on 17 film processors. Modern high quality film processors have a stability that makes a test frequency of once a week sufficient to maintain adequate image quality. The test system is so sensitive that corrective actions almost invariably have been taken before any technical problem degraded the image quality to a visible degree. (orig.)

  9. Channel processor in 2D cluster finding algorithm for high energy physics application

    International Nuclear Information System (INIS)

    Paul, Rourab; Chakrabarti, Amlan; Mitra, Jubin; Khan, Shuaib A.; Nayak, Tapan; Mukherjee, Sanjoy

    2016-01-01

    In a Large Ion Collider Experiment (ALICE) at CERN 1 TB/s (approximately) data comes from front end electronics. Previously, we had 1 GBT link operated with a cluster clock frequencies of 133 MHz and 320 MHz in Run 1 and Run 2 respectively. The cluster algorithm proposed in Run 1 and 2 could not work in Run 3 as the data speed increased almost 20 times. Older version cluster algorithm receives data sequentially as a stream. It has 2 main sub processes - Channel Processor, Merging process. The initial step of channel processor finds a peak Q max and sums up pads (sensors) data from -2 time bin to +2 time bin in the time direction. The computed value stores in a register named cluster fragment data (cfd o ). The merging process merges cfd o in pad direction. The data streams in Run 2 comes sequentially, which processed by the channel processor and merging block in a sequential manner with very less resource over head. In Run 3 data comes parallely, 1600 data from 1600 pads of a single time instant comes at each 200 ns interval (5 MHz) which is very challenging to process in the budgeted resource platform of Arria 10 FPGA hardware with 250 to 320 MHz cluster clock

  10. A low power biomedical signal processor ASIC based on hardware software codesign.

    Science.gov (United States)

    Nie, Z D; Wang, L; Chen, W G; Zhang, T; Zhang, Y T

    2009-01-01

    A low power biomedical digital signal processor ASIC based on hardware and software codesign methodology was presented in this paper. The codesign methodology was used to achieve higher system performance and design flexibility. The hardware implementation included a low power 32bit RISC CPU ARM7TDMI, a low power AHB-compatible bus, and a scalable digital co-processor that was optimized for low power Fast Fourier Transform (FFT) calculations. The co-processor could be scaled for 8-point, 16-point and 32-point FFTs, taking approximate 50, 100 and 150 clock circles, respectively. The complete design was intensively simulated using ARM DSM model and was emulated by ARM Versatile platform, before conducted to silicon. The multi-million-gate ASIC was fabricated using SMIC 0.18 microm mixed-signal CMOS 1P6M technology. The die area measures 5,000 microm x 2,350 microm. The power consumption was approximately 3.6 mW at 1.8 V power supply and 1 MHz clock rate. The power consumption for FFT calculations was less than 1.5 % comparing with the conventional embedded software-based solution.

  11. CRI, 4-Processor VAX-11/780 Simulation of CRAY Multitasking System

    International Nuclear Information System (INIS)

    Werner, N.E.; Van Matre, S.W.

    1988-01-01

    1 - Description of program or function: CRI is a subroutine library and set of utilities which allow the use of a four-processor shared memory DEC VAX11/780-4 computer for parallel processing in a manner compatible with the present use of Cray Research, Inc.'s (CRI's) multitasking primitives on Cray computers. Included in the library are subroutines to perform resource initialization, task functions, lock operations, event signals, file sharing, and work queueing synchronization. 2 - Method of solution: A task consists of code and data that can be scheduled for execution on a CPU. Locks are the facility for monitoring critical regions of code. Events allow signaling between tasks; they have two states: cleared and posted. Posting an event allows all other tasks waiting on that event to resume execution. The CRI utilities consist of command procedures for creating the files needed to use the shared memory; for compiling and liking a multitasking program; for starting the logical processors on the physical processors after the time specified by submitting the job(s) to the selected generic batch queue and, optionally, interactively relinquishing control to the multiprocessor debugger; and for removing jobs from the batch queue and, optionally, un-mapping specified global sections from shared memory. CRIDEBUG utility does not work properly in this release

  12. DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

    Directory of Open Access Journals (Sweden)

    Kaufmann Michael

    2004-09-01

    Full Text Available Abstract Background Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Results Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b For alignments of large genomic sequences, we use a heuristics by splitting up sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. Conclusions By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.

  13. Producing chopped firewood with firewood processors

    International Nuclear Information System (INIS)

    Kaerhae, K.; Jouhiaho, A.

    2009-01-01

    The TTS Institute's research and development project studied both the productivity of new, chopped firewood processors (cross-cutting and splitting machines) suitable for professional and independent small-scale production, and the costs of the chopped firewood produced. Seven chopped firewood processors were tested in the research, six of which were sawing processors and one shearing processor. The chopping work was carried out using wood feeding racks and a wood lifter. The work was also carried out without any feeding appliances. Altogether 132.5 solid m 3 of wood were chopped in the time studies. The firewood processor used had the most significant impact on chopping work productivity. In addition to the firewood processor, the stem mid-diameter, the length of the raw material, and of the firewood were also found to affect productivity. The wood feeding systems also affected productivity. If there is a feeding rack and hydraulic grapple loader available for use in chopping firewood, then it is worth using the wood feeding rack. A wood lifter is only worth using with the largest stems (over 20 cm mid-diameter) if a feeding rack cannot be used. When producing chopped firewood from small-diameter wood, i.e. with a mid-diameter less than 10 cm, the costs of chopping work were over 10 EUR solid m -3 with sawing firewood processors. The shearing firewood processor with a guillotine blade achieved a cost level of 5 EUR solid m -3 when the mid-diameter of the chopped stem was 10 cm. In addition to the raw material, the cost-efficient chopping work also requires several hundred annual operating hours with a firewood processor, which is difficult for individual firewood entrepreneurs to achieve. The operating hours of firewood processors can be increased to the required level by the joint use of the processors by a number of firewood entrepreneurs. (author)

  14. CPU architecture for a fast and energy-saving calculation of convolution neural networks

    Science.gov (United States)

    Knoll, Florian J.; Grelcke, Michael; Czymmek, Vitali; Holtorf, Tim; Hussmann, Stephan

    2017-06-01

    One of the most difficult problem in the use of artificial neural networks is the computational capacity. Although large search engine companies own specially developed hardware to provide the necessary computing power, for the conventional user only remains the state of the art method, which is the use of a graphic processing unit (GPU) as a computational basis. Although these processors are well suited for large matrix computations, they need massive energy. Therefore a new processor on the basis of a field programmable gate array (FPGA) has been developed and is optimized for the application of deep learning. This processor is presented in this paper. The processor can be adapted for a particular application (in this paper to an organic farming application). The power consumption is only a fraction of a GPU application and should therefore be well suited for energy-saving applications.

  15. CPU Server

    CERN Multimedia

    The CERN computer centre has hundreds of racks like these. They are over a million times more powerful than our first computer in the 1960's. This tray is a 'dual-core' server. This means it effectively has two CPUs in it (eg. two of your home computers minimised to fit into a single box). Also note the copper cooling fins, to help dissipate the heat.

  16. SAFARI digital processing unit: performance analysis of the SpaceWire links in case of a LEON3-FT based CPU

    Science.gov (United States)

    Giusi, Giovanni; Liu, Scige J.; Di Giorgio, Anna M.; Galli, Emanuele; Pezzuto, Stefano; Farina, Maria; Spinoglio, Luigi

    2014-08-01

    SAFARI (SpicA FAR infrared Instrument) is a far-infrared imaging Fourier Transform Spectrometer for the SPICA mission. The Digital Processing Unit (DPU) of the instrument implements the functions of controlling the overall instrument and implementing the science data compression and packing. The DPU design is based on the use of a LEON family processor. In SAFARI, all instrument components are connected to the central DPU via SpaceWire links. On these links science data, housekeeping and commands flows are in some cases multiplexed, therefore the interface control shall be able to cope with variable throughput needs. The effective data transfer workload can be an issue for the overall system performances and becomes a critical parameter for the on-board software design, both at application layer level and at lower, and more HW related, levels. To analyze the system behavior in presence of the expected SAFARI demanding science data flow, we carried out a series of performance tests using the standard GR-CPCI-UT699 LEON3-FT Development Board, provided by Aeroflex/Gaisler, connected to the emulator of the SAFARI science data links, in a point-to-point topology. Two different communication protocols have been used in the tests, the ECSS-E-ST-50-52C RMAP protocol and an internally defined one, the SAFARI internal data handling protocol. An incremental approach has been adopted to measure the system performances at different levels of the communication protocol complexity. In all cases the performance has been evaluated by measuring the CPU workload and the bus latencies. The tests have been executed initially in a custom low level execution environment and finally using the Real- Time Executive for Multiprocessor Systems (RTEMS), which has been selected as the operating system to be used onboard SAFARI. The preliminary results of the carried out performance analysis confirmed the possibility of using a LEON3 CPU processor in the SAFARI DPU, but pointed out, in agreement

  17. Micro processors for plant protection

    International Nuclear Information System (INIS)

    McAffer, N.T.C.

    1976-01-01

    Micro computers can be used satisfactorily in general protection duties with economic advantages over hardwired systems. The reliability of such protection functions can be enhanced by keeping the task performed by each protection micro processor simple and by avoiding such a task being dependent on others in any substantial way. This implies that vital work done for any task is kept within it and that any communications from it to outside or to it from outside are restricted to those for controlling data transfer. Also that the amount of this data should be the minimum consistent with satisfactory task execution. Technology is changing rapidly and devices may become obsolete and be supplanted by new ones before their theoretical reliability can be confirmed or otherwise by field service. This emphasises the need for users to pool device performance data so that effective reliability judgements can be made within the lifetime of the devices. (orig.) [de

  18. Towards a Process Algebra for Shared Processors

    DEFF Research Database (Denmark)

    Buchholtz, Mikael; Andersen, Jacob; Løvengreen, Hans Henrik

    2002-01-01

    We present initial work on a timed process algebra that models sharing of processor resources allowing preemption at arbitrary points in time. This enables us to model both the functional and the timely behaviour of concurrent processes executed on a single processor. We give a refinement relation...

  19. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    These proceedings contain the articles presented at the named conference. These concern hardware and software for vector and parallel processors, numerical methods and algorithms for the computation on such processors, as well as applications of such methods to different fields of physics and related sciences. See hints under the relevant topics. (HSI)

  20. The communication processor of TUMULT-64

    NARCIS (Netherlands)

    Smit, Gerardus Johannes Maria; Jansen, P.G.

    1988-01-01

    Tumult (Twente University MULTi-processor system) is a modular extendible multi-processor system designed and implemented at the Twente University of Technology in co-operation with Oce Nederland B.V. and the Dr. Neher Laboratories (Dutch PTT). Characteristics of the hardware are: MIMD type,

  1. An interactive parallel processor for data analysis

    International Nuclear Information System (INIS)

    Mong, J.; Logan, D.; Maples, C.; Rathbun, W.; Weaver, D.

    1984-01-01

    A parallel array of eight minicomputers has been assembled in an attempt to deal with kiloparameter data events. By exporting computer system functions to a separate processor, the authors have been able to achieve computer amplification linearly proportional to the number of executing processors

  2. Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

    OpenAIRE

    Abdul Kareem PARCHUR; Ram Asaray SINGH

    2012-01-01

    High performance is a critical requirement to all microprocessors manufacturers. The present paper describes the comparison of performance in two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is implemented using the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many ke...

  3. Implementation of 4-way Superscalar Hash MIPS Processor Using FPGA

    Science.gov (United States)

    Sahib Omran, Safaa; Fouad Jumma, Laith

    2018-05-01

    Due to the quick advancements in the personal communications systems and wireless communications, giving data security has turned into a more essential subject. This security idea turns into a more confounded subject when next-generation system requirements and constant calculation speed are considered in real-time. Hash functions are among the most essential cryptographic primitives and utilized as a part of the many fields of signature authentication and communication integrity. These functions are utilized to acquire a settled size unique fingerprint or hash value of an arbitrary length of message. In this paper, Secure Hash Algorithms (SHA) of types SHA-1, SHA-2 (SHA-224, SHA-256) and SHA-3 (BLAKE) are implemented on Field-Programmable Gate Array (FPGA) in a processor structure. The design is described and implemented using a hardware description language, namely VHSIC “Very High Speed Integrated Circuit” Hardware Description Language (VHDL). Since the logical operation of the hash types of (SHA-1, SHA-224, SHA-256 and SHA-3) are 32-bits, so a Superscalar Hash Microprocessor without Interlocked Pipelines (MIPS) processor are designed with only few instructions that were required in invoking the desired Hash algorithms, when the four types of hash algorithms executed sequentially using the designed processor, the total time required equal to approximately 342 us, with a throughput of 4.8 Mbps while the required to execute the same four hash algorithms using the designed four-way superscalar is reduced to 237 us with improved the throughput to 5.1 Mbps.

  4. Neurovision processor for designing intelligent sensors

    Science.gov (United States)

    Gupta, Madan M.; Knopf, George K.

    1992-03-01

    A programmable multi-task neuro-vision processor, called the Positive-Negative (PN) neural processor, is proposed as a plausible hardware mechanism for constructing robust multi-task vision sensors. The computational operations performed by the PN neural processor are loosely based on the neural activity fields exhibited by certain nervous tissue layers situated in the brain. The neuro-vision processor can be programmed to generate diverse dynamic behavior that may be used for spatio-temporal stabilization (STS), short-term visual memory (STVM), spatio-temporal filtering (STF) and pulse frequency modulation (PFM). A multi- functional vision sensor that performs a variety of information processing operations on time- varying two-dimensional sensory images can be constructed from a parallel and hierarchical structure of numerous individually programmed PN neural processors.

  5. Development of a highly reliable CRT processor

    International Nuclear Information System (INIS)

    Shimizu, Tomoya; Saiki, Akira; Hirai, Kenji; Jota, Masayoshi; Fujii, Mikiya

    1996-01-01

    Although CRT processors have been employed by the main control board to reduce the operator's workload during monitoring, the control systems are still operated by hardware switches. For further advancement, direct controller operation through a display device is expected. A CRT processor providing direct controller operation must be as reliable as the hardware switches are. The authors are developing a new type of highly reliable CRT processor that enables direct controller operations. In this paper, we discuss the design principles behind a highly reliable CRT processor. The principles are defined by studies of software reliability and of the functional reliability of the monitoring and operation systems. The functional configuration of an advanced CRT processor is also addressed. (author)

  6. Online track processor for the CDF upgrade

    International Nuclear Information System (INIS)

    Thomson, E. J.

    2002-01-01

    A trigger track processor, called the eXtremely Fast Tracker (XFT), has been designed for the CDF upgrade. This processor identifies high transverse momentum (> 1.5 GeV/c) charged particles in the new central outer tracking chamber for CDF II. The XFT design is highly parallel to handle the input rate of 183 Gbits/s and output rate of 44 Gbits/s. The processor is pipelined and reports the result for a new event every 132 ns. The processor uses three stages: hit classification, segment finding, and segment linking. The pattern recognition algorithms for the three stages are implemented in programmable logic devices (PLDs) which allow in-situ modification of the algorithm at any time. The PLDs reside on three different types of modules. The complete system has been installed and commissioned at CDF II. An overview of the track processor and performance in CDF Run II are presented

  7. Computer Generated Inputs for NMIS Processor Verification

    International Nuclear Information System (INIS)

    J. A. Mullens; J. E. Breeding; J. A. McEvers; R. W. Wysor; L. G. Chiang; J. R. Lenarduzzi; J. T. Mihalczo; J. K. Mattingly

    2001-01-01

    Proper operation of the Nuclear Identification Materials System (NMIS) processor can be verified using computer-generated inputs [BIST (Built-In-Self-Test)] at the digital inputs. Preselected sequences of input pulses to all channels with known correlation functions are compared to the output of the processor. These types of verifications have been utilized in NMIS type correlation processors at the Oak Ridge National Laboratory since 1984. The use of this test confirmed a malfunction in a NMIS processor at the All-Russian Scientific Research Institute of Experimental Physics (VNIIEF) in 1998. The NMIS processor boards were returned to the U.S. for repair and subsequently used in NMIS passive and active measurements with Pu at VNIIEF in 1999

  8. Point and track-finding processors for multiwire chambers

    CERN Document Server

    Hansroul, M

    1973-01-01

    The hardware processors described below are designed to be used in conjunction with multi-wire chambers. They have the characteristic of being based on computational methods in contrast to analogue procedures. In a sense, they are hardware implementations of computer programs. But, being specially designed for their purpose, they are free of the restrictions imposed by the architecture of the computer on which the equivalent program is to run. The parallelism inherent in the algorithms can thus be fully exploited. Combined with the use of fast access scratch-pad memories and the non-sequential nature of the control program, the parallelism accounts for the fact that these processors are expected to execute 2-3 orders of magnitude faster than the equivalent Fortran programs on a CDC 7600 or 6600. As a consequence, methods which are simple and straightforward, but which are impractical because they require an exorbitant amount of computer time can on the contrary be very attractive for hardware implementation. ...

  9. Sequential stochastic optimization

    CERN Document Server

    Cairoli, Renzo

    1996-01-01

    Sequential Stochastic Optimization provides mathematicians and applied researchers with a well-developed framework in which stochastic optimization problems can be formulated and solved. Offering much material that is either new or has never before appeared in book form, it lucidly presents a unified theory of optimal stopping and optimal sequential control of stochastic processes. This book has been carefully organized so that little prior knowledge of the subject is assumed; its only prerequisites are a standard graduate course in probability theory and some familiarity with discrete-paramet

  10. High performance technique for database applicationsusing a hybrid GPU/CPU platform

    KAUST Repository

    Zidan, Mohammed A.

    2012-07-28

    Many database applications, such as sequence comparing, sequence searching, and sequence matching, etc, process large database sequences. we introduce a novel and efficient technique to improve the performance of database applica- tions by using a Hybrid GPU/CPU platform. In particular, our technique solves the problem of the low efficiency result- ing from running short-length sequences in a database on a GPU. To verify our technique, we applied it to the widely used Smith-Waterman algorithm. The experimental results show that our Hybrid GPU/CPU technique improves the average performance by a factor of 2.2, and improves the peak performance by a factor of 2.8 when compared to earlier implementations. Copyright © 2011 by ASME.

  11. Design Patterns for Sparse-Matrix Computations on Hybrid CPU/GPU Platforms

    Directory of Open Access Journals (Sweden)

    Valeria Cardellini

    2014-01-01

    Full Text Available We apply object-oriented software design patterns to develop code for scientific software involving sparse matrices. Design patterns arise when multiple independent developments produce similar designs which converge onto a generic solution. We demonstrate how to use design patterns to implement an interface for sparse matrix computations on NVIDIA GPUs starting from PSBLAS, an existing sparse matrix library, and from existing sets of GPU kernels for sparse matrices. We also compare the throughput of the PSBLAS sparse matrix–vector multiplication on two platforms exploiting the GPU with that obtained by a CPU-only PSBLAS implementation. Our experiments exhibit encouraging results regarding the comparison between CPU and GPU executions in double precision, obtaining a speedup of up to 35.35 on NVIDIA GTX 285 with respect to AMD Athlon 7750, and up to 10.15 on NVIDIA Tesla C2050 with respect to Intel Xeon X5650.

  12. Semiempirical Quantum Chemical Calculations Accelerated on a Hybrid Multicore CPU-GPU Computing Platform.

    Science.gov (United States)

    Wu, Xin; Koslowski, Axel; Thiel, Walter

    2012-07-10

    In this work, we demonstrate that semiempirical quantum chemical calculations can be accelerated significantly by leveraging the graphics processing unit (GPU) as a coprocessor on a hybrid multicore CPU-GPU computing platform. Semiempirical calculations using the MNDO, AM1, PM3, OM1, OM2, and OM3 model Hamiltonians were systematically profiled for three types of test systems (fullerenes, water clusters, and solvated crambin) to identify the most time-consuming sections of the code. The corresponding routines were ported to the GPU and optimized employing both existing library functions and a GPU kernel that carries out a sequence of noniterative Jacobi transformations during pseudodiagonalization. The overall computation times for single-point energy calculations and geometry optimizations of large molecules were reduced by one order of magnitude for all methods, as compared to runs on a single CPU core.

  13. Turbo Charge CPU Utilization in Fork/Join Using the ManagedBlocker

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Fork/Join is a framework for parallelizing calculations using recursive decomposition, also called divide and conquer. These algorithms occasionally end up duplicating work, especially at the beginning of the run. We can reduce wasted CPU cycles by implementing a reserved caching scheme. Before a task starts its calculation, it tries to reserve an entry in the shared map. If it is successful, it immediately begins. If not, it blocks until the other thread has finished its calculation. Unfortunately this might result in a significant number of blocked threads, decreasing CPU utilization. In this talk we will demonstrate this issue and offer a solution in the form of the ManagedBlocker. Combined with the Fork/Join, it can keep parallelism at the desired level.

  14. HEP specific benchmarks of virtual machines on multi-core CPU architectures

    International Nuclear Information System (INIS)

    Alef, M; Gable, I

    2010-01-01

    Virtualization technologies such as Xen can be used in order to satisfy the disparate and often incompatible system requirements of different user groups in shared-use computing facilities. This capability is particularly important for HEP applications, which often have restrictive requirements. The use of virtualization adds flexibility, however, it is essential that the virtualization technology place little overhead on the HEP application. We present an evaluation of the practicality of running HEP applications in multiple Virtual Machines (VMs) on a single multi-core Linux system. We use the benchmark suite used by the HEPiX CPU Benchmarking Working Group to give a quantitative evaluation relevant to the HEP community. Benchmarks are packaged inside VMs and then the VMs are booted onto a single multi-core system. Benchmarks are then simultaneously executed on each VM to simulate highly loaded VMs running HEP applications. These techniques are applied to a variety of multi-core CPU architectures and VM configurations.

  15. Design improvement of FPGA and CPU based digital circuit cards to solve timing issues

    International Nuclear Information System (INIS)

    Lee, Dongil; Lee, Jaeki; Lee, Kwang-Hyun

    2016-01-01

    The digital circuit cards installed at NPPs (Nuclear Power Plant) are mostly composed of a CPU (Central Processing Unit) and a PLD (Programmable Logic Device; these include a FPGA (Field Programmable Gate Array) and a CPLD (Complex Programmable Logic Device)). This type of structure is typical and is maintained using digital circuit cards. There are no big problems with this device as a structure. In particular, signal delay causes a lot of problems when various IC (Integrated Circuit) and several circuit cards are connected to the BUS of the backplane in the BUS design. This paper suggests a structure to improve the BUS signal timing problems in a circuit card consisting of CPU and FPGA. Nowadays, as the structure of circuit cards has become complex and mass data at high speed is communicated through the BUS, data integrity is the most important issue. The conventional design does not consider delay and the synchronicity of signal and this causes many problems in data processing. In order to solve these problems, it is important to isolate the BUS controller from the CPU and maintain constancy of the signal delay by using a PLD

  16. Improving the Performance of CPU Architectures by Reducing the Operating System Overhead (Extended Version

    Directory of Open Access Journals (Sweden)

    Zagan Ionel

    2016-07-01

    Full Text Available The predictable CPU architectures that run hard real-time tasks must be executed with isolation in order to provide a timing-analyzable execution for real-time systems. The major problems for real-time operating systems are determined by an excessive jitter, introduced mainly through task switching. This can alter deadline requirements, and, consequently, the predictability of hard real-time tasks. New requirements also arise for a real-time operating system used in mixed-criticality systems, when the executions of hard real-time applications require timing predictability. The present article discusses several solutions to improve the performance of CPU architectures and eventually overcome the Operating Systems overhead inconveniences. This paper focuses on the innovative CPU implementation named nMPRA-MT, designed for small real-time applications. This implementation uses the replication and remapping techniques for the program counter, general purpose registers and pipeline registers, enabling multiple threads to share a single pipeline assembly line. In order to increase predictability, the proposed architecture partially removes the hazard situation at the expense of larger execution latency per one instruction.

  17. A CFD Heterogeneous Parallel Solver Based on Collaborating CPU and GPU

    Science.gov (United States)

    Lai, Jianqi; Tian, Zhengyu; Li, Hua; Pan, Sha

    2018-03-01

    Since Graphic Processing Unit (GPU) has a strong ability of floating-point computation and memory bandwidth for data parallelism, it has been widely used in the areas of common computing such as molecular dynamics (MD), computational fluid dynamics (CFD) and so on. The emergence of compute unified device architecture (CUDA), which reduces the complexity of compiling program, brings the great opportunities to CFD. There are three different modes for parallel solution of NS equations: parallel solver based on CPU, parallel solver based on GPU and heterogeneous parallel solver based on collaborating CPU and GPU. As we can see, GPUs are relatively rich in compute capacity but poor in memory capacity and the CPUs do the opposite. We need to make full use of the GPUs and CPUs, so a CFD heterogeneous parallel solver based on collaborating CPU and GPU has been established. Three cases are presented to analyse the solver’s computational accuracy and heterogeneous parallel efficiency. The numerical results agree well with experiment results, which demonstrate that the heterogeneous parallel solver has high computational precision. The speedup on a single GPU is more than 40 for laminar flow, it decreases for turbulent flow, but it still can reach more than 20. What’s more, the speedup increases as the grid size becomes larger.

  18. Design improvement of FPGA and CPU based digital circuit cards to solve timing issues

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Dongil; Lee, Jaeki; Lee, Kwang-Hyun [KHNP CRI, Daejeon (Korea, Republic of)

    2016-10-15

    The digital circuit cards installed at NPPs (Nuclear Power Plant) are mostly composed of a CPU (Central Processing Unit) and a PLD (Programmable Logic Device; these include a FPGA (Field Programmable Gate Array) and a CPLD (Complex Programmable Logic Device)). This type of structure is typical and is maintained using digital circuit cards. There are no big problems with this device as a structure. In particular, signal delay causes a lot of problems when various IC (Integrated Circuit) and several circuit cards are connected to the BUS of the backplane in the BUS design. This paper suggests a structure to improve the BUS signal timing problems in a circuit card consisting of CPU and FPGA. Nowadays, as the structure of circuit cards has become complex and mass data at high speed is communicated through the BUS, data integrity is the most important issue. The conventional design does not consider delay and the synchronicity of signal and this causes many problems in data processing. In order to solve these problems, it is important to isolate the BUS controller from the CPU and maintain constancy of the signal delay by using a PLD.

  19. Research on the Prediction Model of CPU Utilization Based on ARIMA-BP Neural Network

    Directory of Open Access Journals (Sweden)

    Wang Jina

    2016-01-01

    Full Text Available The dynamic deployment technology of the virtual machine is one of the current cloud computing research focuses. The traditional methods mainly work after the degradation of the service performance that usually lag. To solve the problem a new prediction model based on the CPU utilization is constructed in this paper. A reference offered by the new prediction model of the CPU utilization is provided to the VM dynamic deployment process which will speed to finish the deployment process before the degradation of the service performance. By this method it not only ensure the quality of services but also improve the server performance and resource utilization. The new prediction method of the CPU utilization based on the ARIMA-BP neural network mainly include four parts: preprocess the collected data, build the predictive model of ARIMA-BP neural network, modify the nonlinear residuals of the time series by the BP prediction algorithm and obtain the prediction results by analyzing the above data comprehensively.

  20. LHCb: Statistical Comparison of CPU performance for LHCb applications on the Grid

    CERN Multimedia

    Graciani, R

    2009-01-01

    The usage of CPU resources by LHCb on the Grid id dominated by two different applications: Gauss and Brunel. Gauss the application doing the Monte Carlo simulation of proton-proton collisions. Brunel is the application responsible for the reconstruction of the signals recorded by the detector converting them into objects that can be used for later physics analysis of the data (tracks, clusters,…) Both applications are based on the Gaudi and LHCb software frameworks. Gauss uses Pythia and Geant as underlying libraries for the simulation of the collision and the later passage of the generated particles through the LHCb detector. While Brunel makes use of LHCb specific code to process the data from each sub-detector. Both applications are CPU bound. Large Monte Carlo productions or data reconstructions running on the Grid are an ideal benchmark to compare the performance of the different CPU models for each case. Since the processed events are only statistically comparable, only statistical comparison of the...

  1. Analytical Bounds on the Threads in IXP1200 Network Processor

    OpenAIRE

    Ramakrishna, STGS; Jamadagni, HS

    2003-01-01

    Increasing link speeds have placed enormous burden on the processing requirements and the processors are expected to carry out a variety of tasks. Network Processors (NP) [1] [2] is the blanket name given to the processors, which are traded for flexibility and performance. Network Processors are offered by a number of vendors; to take the main burden of processing requirement of network related operations from the conventional processors. The Network Processors cover a spectrum of design trad...

  2. Sequential memory: Binding dynamics

    Science.gov (United States)

    Afraimovich, Valentin; Gong, Xue; Rabinovich, Mikhail

    2015-10-01

    Temporal order memories are critical for everyday animal and human functioning. Experiments and our own experience show that the binding or association of various features of an event together and the maintaining of multimodality events in sequential order are the key components of any sequential memories—episodic, semantic, working, etc. We study a robustness of binding sequential dynamics based on our previously introduced model in the form of generalized Lotka-Volterra equations. In the phase space of the model, there exists a multi-dimensional binding heteroclinic network consisting of saddle equilibrium points and heteroclinic trajectories joining them. We prove here the robustness of the binding sequential dynamics, i.e., the feasibility phenomenon for coupled heteroclinic networks: for each collection of successive heteroclinic trajectories inside the unified networks, there is an open set of initial points such that the trajectory going through each of them follows the prescribed collection staying in a small neighborhood of it. We show also that the symbolic complexity function of the system restricted to this neighborhood is a polynomial of degree L - 1, where L is the number of modalities.

  3. Sequential Dependencies in Driving

    Science.gov (United States)

    Doshi, Anup; Tran, Cuong; Wilder, Matthew H.; Mozer, Michael C.; Trivedi, Mohan M.

    2012-01-01

    The effect of recent experience on current behavior has been studied extensively in simple laboratory tasks. We explore the nature of sequential effects in the more naturalistic setting of automobile driving. Driving is a safety-critical task in which delayed response times may have severe consequences. Using a realistic driving simulator, we find…

  4. Mining compressing sequential problems

    NARCIS (Netherlands)

    Hoang, T.L.; Mörchen, F.; Fradkin, D.; Calders, T.G.K.

    2012-01-01

    Compression based pattern mining has been successfully applied to many data mining tasks. We propose an approach based on the minimum description length principle to extract sequential patterns that compress a database of sequences well. We show that mining compressing patterns is NP-Hard and

  5. Effect of processor temperature on film dosimetry

    International Nuclear Information System (INIS)

    Srivastava, Shiv P.; Das, Indra J.

    2012-01-01

    Optical density (OD) of a radiographic film plays an important role in radiation dosimetry, which depends on various parameters, including beam energy, depth, field size, film batch, dose, dose rate, air film interface, postexposure processing time, and temperature of the processor. Most of these parameters have been studied for Kodak XV and extended dose range (EDR) films used in radiation oncology. There is very limited information on processor temperature, which is investigated in this study. Multiple XV and EDR films were exposed in the reference condition (d max. , 10 × 10 cm 2 , 100 cm) to a given dose. An automatic film processor (X-Omat 5000) was used for processing films. The temperature of the processor was adjusted manually with increasing temperature. At each temperature, a set of films was processed to evaluate OD at a given dose. For both films, OD is a linear function of processor temperature in the range of 29.4–40.6°C (85–105°F) for various dose ranges. The changes in processor temperature are directly related to the dose by a quadratic function. A simple linear equation is provided for the changes in OD vs. processor temperature, which could be used for correcting dose in radiation dosimetry when film is used.

  6. Optical Associative Processors For Visual Perception"

    Science.gov (United States)

    Casasent, David; Telfer, Brian

    1988-05-01

    We consider various associative processor modifications required to allow these systems to be used for visual perception, scene analysis, and object recognition. For these applications, decisions on the class of the objects present in the input image are required and thus heteroassociative memories are necessary (rather than the autoassociative memories that have been given most attention). We analyze the performance of both associative processors and note that there is considerable difference between heteroassociative and autoassociative memories. We describe associative processors suitable for realizing functions such as: distortion invariance (using linear discriminant function memory synthesis techniques), noise and image processing performance (using autoassociative memories in cascade with with a heteroassociative processor and with a finite number of autoassociative memory iterations employed), shift invariance (achieved through the use of associative processors operating on feature space data), and the analysis of multiple objects in high noise (which is achieved using associative processing of the output from symbolic correlators). We detail and provide initial demonstrations of the use of associative processors operating on iconic, feature space and symbolic data, as well as adaptive associative processors.

  7. Development of Innovative Design Processor

    International Nuclear Information System (INIS)

    Park, Y.S.; Park, C.O.

    2004-01-01

    The nuclear design analysis requires time-consuming and erroneous model-input preparation, code run, output analysis and quality assurance process. To reduce human effort and improve design quality and productivity, Innovative Design Processor (IDP) is being developed. Two basic principles of IDP are the document-oriented design and the web-based design. The document-oriented design is that, if the designer writes a design document called active document and feeds it to a special program, the final document with complete analysis, table and plots is made automatically. The active documents can be written with ordinary HTML editors or created automatically on the web, which is another framework of IDP. Using the proper mix-up of server side and client side programming under the LAMP (Linux/Apache/MySQL/PHP) environment, the design process on the web is modeled as a design wizard style so that even a novice designer makes the design document easily. This automation using the IDP is now being implemented for all the reload design of Korea Standard Nuclear Power Plant (KSNP) type PWRs. The introduction of this process will allow large reduction in all reload design efforts of KSNP and provide a platform for design and R and D tasks of KNFC. (authors)

  8. Onboard spectral imager data processor

    Science.gov (United States)

    Otten, Leonard J.; Meigs, Andrew D.; Franklin, Abraham J.; Sears, Robert D.; Robison, Mark W.; Rafert, J. Bruce; Fronterhouse, Donald C.; Grotbeck, Ronald L.

    1999-10-01

    Previous papers have described the concept behind the MightySat II.1 program, the satellite's Fourier Transform imaging spectrometer's optical design, the design for the spectral imaging payload, and its initial qualification testing. This paper discusses the on board data processing designed to reduce the amount of downloaded data by an order of magnitude and provide a demonstration of a smart spaceborne spectral imaging sensor. Two custom components, a spectral imager interface 6U VME card that moves data at over 30 MByte/sec, and four TI C-40 processors mounted to a second 6U VME and daughter card, are used to adapt the sensor to the spacecraft and provide the necessary high speed processing. A system architecture that offers both on board real time image processing and high-speed post data collection analysis of the spectral data has been developed. In addition to the on board processing of the raw data into a usable spectral data volume, one feature extraction technique has been incorporated. This algorithm operates on the basic interferometric data. The algorithm is integrated within the data compression process to search for uploadable feature descriptions.

  9. A data base processor semantics specification package

    Science.gov (United States)

    Fishwick, P. A.

    1983-01-01

    A Semantics Specification Package (DBPSSP) for the Intel Data Base Processor (DBP) is defined. DBPSSP serves as a collection of cross assembly tools that allow the analyst to assemble request blocks on the host computer for passage to the DBP. The assembly tools discussed in this report may be effectively used in conjunction with a DBP compatible data communications protocol to form a query processor, precompiler, or file management system for the database processor. The source modules representing the components of DBPSSP are fully commented and included.

  10. Hardware trigger processor for the MDT system

    CERN Document Server

    AUTHOR|(SzGeCERN)757787; The ATLAS collaboration; Hazen, Eric; Butler, John; Black, Kevin; Gastler, Daniel Edward; Ntekas, Konstantinos; Taffard, Anyes; Martinez Outschoorn, Verena; Ishino, Masaya; Okumura, Yasuyuki

    2017-01-01

    We are developing a low-latency hardware trigger processor for the Monitored Drift Tube system in the Muon spectrometer. The processor will fit candidate Muon tracks in the drift tubes in real time, improving significantly the momentum resolution provided by the dedicated trigger chambers. We present a novel pure-FPGA implementation of a Legendre transform segment finder, an associative-memory alternative implementation, an ARM (Zynq) processor-based track fitter, and compact ATCA carrier board architecture. The ATCA architecture is designed to allow a modular, staged approach to deployment of the system and exploration of alternative technologies.

  11. Analysis of impact of general-purpose graphics processor units in supersonic flow modeling

    Science.gov (United States)

    Emelyanov, V. N.; Karpenko, A. G.; Kozelkov, A. S.; Teterina, I. V.; Volkov, K. N.; Yalozo, A. V.

    2017-06-01

    Computational methods are widely used in prediction of complex flowfields associated with off-normal situations in aerospace engineering. Modern graphics processing units (GPU) provide architectures and new programming models that enable to harness their large processing power and to design computational fluid dynamics (CFD) simulations at both high performance and low cost. Possibilities of the use of GPUs for the simulation of external and internal flows on unstructured meshes are discussed. The finite volume method is applied to solve three-dimensional unsteady compressible Euler and Navier-Stokes equations on unstructured meshes with high resolution numerical schemes. CUDA technology is used for programming implementation of parallel computational algorithms. Solutions of some benchmark test cases on GPUs are reported, and the results computed are compared with experimental and computational data. Approaches to optimization of the CFD code related to the use of different types of memory are considered. Speedup of solution on GPUs with respect to the solution on central processor unit (CPU) is compared. Performance measurements show that numerical schemes developed achieve 20-50 speedup on GPU hardware compared to CPU reference implementation. The results obtained provide promising perspective for designing a GPU-based software framework for applications in CFD.

  12. Power-Energy Simulation for Multi-Core Processors in Bench-marking

    Directory of Open Access Journals (Sweden)

    Mona A. Abou-Of

    2017-01-01

    Full Text Available At Microarchitectural level, multi-core processor, as a complex System on Chip, has sophisticated on-chip components including cores, shared caches, interconnects and system controllers such as memory and ethernet controllers. At technological level, architects should consider the device types forecast in the International Technology Roadmap for Semiconductors (ITRS. Energy simulation enables architects to study two important metrics simultaneously. Timing is a key element of the CPU performance that imposes constraints on the CPU target clock frequency. Power and the resulting heat impose more severe design constraints, such as core clustering, while semiconductor industry is providing more transistors in the die area in pace with Moore’s law. Energy simulators provide a solution for such serious challenge. Energy is modelled either by combining performance benchmarking tool with a power simulator or by an integrated framework of both performance simulator and power profiling system. This article presents and asses trade-offs between different architectures using four cores battery-powered mobile systems by running a custom-made and a standard benchmark tools. The experimental results assure the Energy/ Frequency convexity rule over a range of frequency settings on different number of enabled cores. The reported results show that increasing the number of cores has a great effect on increasing the power consumption. However, a minimum energy dissipation will occur at a lower frequency which reduces the power consumption. Despite that, increasing the number of cores will also increase the effective cores value which will reflect a better processor performance.

  13. Design concepts for a virtualizable embedded MPSoC architecture enabling virtualization in embedded multi-processor systems

    CERN Document Server

    Biedermann, Alexander

    2014-01-01

    Alexander Biedermann presents a generic hardware-based virtualization approach, which may transform an array of any off-the-shelf embedded processors into a multi-processor system with high execution dynamism. Based on this approach, he highlights concepts for the design of energy aware systems, self-healing systems as well as parallelized systems. For the latter, the novel so-called Agile Processing scheme is introduced by the author, which enables a seamless transition between sequential and parallel execution schemes. The design of such virtualizable systems is further aided by introduction

  14. Multi-CPU plasma fluid turbulence calculations on a CRAY Y-MP C90

    International Nuclear Information System (INIS)

    Lynch, V.E.; Carreras, B.A.; Leboeuf, J.N.; Curtis, B.C.; Troutman, R.L.

    1993-01-01

    Significant improvements in real-time efficiency have been obtained for plasma fluid turbulence calculations by microtasking the nonlinear fluid code KITE in which they are implemented on the CRAY Y-MP C90 at the National Energy Research Supercomputer Center (NERSC). The number of processors accessed concurrently scales linearly with problem size. Close to six concurrent processors have so far been obtained with a three-dimensional nonlinear production calculation at the currently allowed memory size of 80 Mword. With a calculation size corresponding to the maximum allowed memory of 200 Mword in the next system configuration, they expect to be able to access close to ten processors of the C90 concurrently with a commensurate improvement in real-time efficiency. These improvements in performance are comparable to those expected from a massively parallel implementation of the same calculations on the Intel Paragon

  15. Multi-CPU plasma fluid turbulence calculations on a CRAY Y-MP C90

    International Nuclear Information System (INIS)

    Lynch, V.E.; Carreras, B.A.; Leboeuf, J.N.; Curtis, B.C.; Troutman, R.L.

    1993-01-01

    Significant improvements in real-time efficiency have been obtained for plasma fluid turbulence calculations by microtasking the nonlinear fluid code KITE in which they are implemented on the CRAY Y-MP C90 at the National Energy Research Supercomputer Center (NERSC). The number of processors accessed concurrently scales linearly with problem size. Close to six concurrent processors have so far been obtained with a three-dimensional nonlinear production calculation at the currently allowed memory size of 80 Mword. With a calculation size corresponding to the maximum allowed memory of 200 Mword in the next system configuration, we expect to be able to access close to nine processors of the C90 concurrently with a commensurate improvement in real-time efficiency. These improvements in performance are comparable to those expected from a massively parallel implementation of the same calculations on the Intel Paragon

  16. MILC Code Performance on High End CPU and GPU Supercomputer Clusters

    Science.gov (United States)

    DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug

    2018-03-01

    With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.

  17. MILC Code Performance on High End CPU and GPU Supercomputer Clusters

    Directory of Open Access Journals (Sweden)

    DeTar Carleton

    2018-01-01

    Full Text Available With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.

  18. Photonics and Fiber Optics Processor Lab

    Data.gov (United States)

    Federal Laboratory Consortium — The Photonics and Fiber Optics Processor Lab develops, tests and evaluates high speed fiber optic network components as well as network protocols. In addition, this...

  19. Forced Sequence Sequential Decoding

    DEFF Research Database (Denmark)

    Jensen, Ole Riis

    In this thesis we describe a new concatenated decoding scheme based on iterations between an inner sequentially decoded convolutional code of rate R=1/4 and memory M=23, and block interleaved outer Reed-Solomon codes with non-uniform profile. With this scheme decoding with good performance...... is possible as low as Eb/No=0.6 dB, which is about 1.7 dB below the signal-to-noise ratio that marks the cut-off rate for the convolutional code. This is possible since the iteration process provides the sequential decoders with side information that allows a smaller average load and minimizes the probability...... of computational overflow. Analytical results for the probability that the first Reed-Solomon word is decoded after C computations are presented. This is supported by simulation results that are also extended to other parameters....

  20. Keystone Business Models for Network Security Processors

    OpenAIRE

    Arthur Low; Steven Muegge

    2013-01-01

    Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor...

  1. Real time monitoring of electron processors

    International Nuclear Information System (INIS)

    Nablo, S.V.; Kneeland, D.R.; McLaughlin, W.L.

    1995-01-01

    A real time radiation monitor (RTRM) has been developed for monitoring the dose rate (current density) of electron beam processors. The system provides continuous monitoring of processor output, electron beam uniformity, and an independent measure of operating voltage or electron energy. In view of the device's ability to replace labor-intensive dosimetry in verification of machine performance on a real-time basis, its application to providing archival performance data for in-line processing is discussed. (author)

  2. Sequential Power-Dependence Theory

    NARCIS (Netherlands)

    Buskens, Vincent; Rijt, Arnout van de

    2008-01-01

    Existing methods for predicting resource divisions in laboratory exchange networks do not take into account the sequential nature of the experimental setting. We extend network exchange theory by considering sequential exchange. We prove that Sequential Power-Dependence Theory—unlike

  3. Modelling sequentially scored item responses

    NARCIS (Netherlands)

    Akkermans, W.

    2000-01-01

    The sequential model can be used to describe the variable resulting from a sequential scoring process. In this paper two more item response models are investigated with respect to their suitability for sequential scoring: the partial credit model and the graded response model. The investigation is

  4. Forced Sequence Sequential Decoding

    DEFF Research Database (Denmark)

    Jensen, Ole Riis; Paaske, Erik

    1998-01-01

    We describe a new concatenated decoding scheme based on iterations between an inner sequentially decoded convolutional code of rate R=1/4 and memory M=23, and block interleaved outer Reed-Solomon (RS) codes with nonuniform profile. With this scheme decoding with good performance is possible as low...... as Eb/N0=0.6 dB, which is about 1.25 dB below the signal-to-noise ratio (SNR) that marks the cutoff rate for the full system. Accounting for about 0.45 dB due to the outer codes, sequential decoding takes place at about 1.7 dB below the SNR cutoff rate for the convolutional code. This is possible since...... the iteration process provides the sequential decoders with side information that allows a smaller average load and minimizes the probability of computational overflow. Analytical results for the probability that the first RS word is decoded after C computations are presented. These results are supported...

  5. Accuracy Limitations in Optical Linear Algebra Processors

    Science.gov (United States)

    Batsell, Stephen Gordon

    1990-01-01

    One of the limiting factors in applying optical linear algebra processors (OLAPs) to real-world problems has been the poor achievable accuracy of these processors. Little previous research has been done on determining noise sources from a systems perspective which would include noise generated in the multiplication and addition operations, noise from spatial variations across arrays, and from crosstalk. In this dissertation, we propose a second-order statistical model for an OLAP which incorporates all these system noise sources. We now apply this knowledge to determining upper and lower bounds on the achievable accuracy. This is accomplished by first translating the standard definition of accuracy used in electronic digital processors to analog optical processors. We then employ our second-order statistical model. Having determined a general accuracy equation, we consider limiting cases such as for ideal and noisy components. From the ideal case, we find the fundamental limitations on improving analog processor accuracy. From the noisy case, we determine the practical limitations based on both device and system noise sources. These bounds allow system trade-offs to be made both in the choice of architecture and in individual components in such a way as to maximize the accuracy of the processor. Finally, by determining the fundamental limitations, we show the system engineer when the accuracy desired can be achieved from hardware or architecture improvements and when it must come from signal pre-processing and/or post-processing techniques.

  6. Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

    Science.gov (United States)

    Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

    2010-10-01

    Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.

  7. Acceleration of stereo-matching on multi-core CPU and GPU

    OpenAIRE

    Tian, Xu; Cockshott, Paul; Oehler, Susanne

    2014-01-01

    This paper presents an accelerated version of a\\ud dense stereo-correspondence algorithm for two different parallelism\\ud enabled architectures, multi-core CPU and GPU. The\\ud algorithm is part of the vision system developed for a binocular\\ud robot-head in the context of the CloPeMa 1 research project.\\ud This research project focuses on the conception of a new clothes\\ud folding robot with real-time and high resolution requirements\\ud for the vision system. The performance analysis shows th...

  8. VMware vSphere performance designing CPU, memory, storage, and networking for performance-intensive workloads

    CERN Document Server

    Liebowitz, Matt; Spies, Rynardt

    2014-01-01

    Covering the latest VMware vSphere software, an essential book aimed at solving vSphere performance problems before they happen VMware vSphere is the industry's most widely deployed virtualization solution. However, if you improperly deploy vSphere, performance problems occur. Aimed at VMware administrators and engineers and written by a team of VMware experts, this resource provides guidance on common CPU, memory, storage, and network-related problems. Plus, step-by-step instructions walk you through techniques for solving problems and shed light on possible causes behind the problems. Divu

  9. Simulation of small-angle scattering patterns using a CPU-efficient algorithm

    Science.gov (United States)

    Anitas, E. M.

    2017-12-01

    Small-angle scattering (of neutrons, x-ray or light; SAS) is a well-established experimental technique for structural analysis of disordered systems at nano and micro scales. For complex systems, such as super-molecular assemblies or protein molecules, analytic solutions of SAS intensity are generally not available. Thus, a frequent approach to simulate the corresponding patterns is to use a CPU-efficient version of the Debye formula. For this purpose, in this paper we implement the well-known DALAI algorithm in Mathematica software. We present calculations for a series of 2D Sierpinski gaskets and respectively of pentaflakes, obtained from chaos game representation.

  10. A Robust Ultra-Low Voltage CPU Utilizing Timing-Error Prevention

    OpenAIRE

    Hiienkari, Markus; Teittinen, Jukka; Koskinen, Lauri; Turnquist, Matthew; Mäkipää, Jani; Rantala, Arto; Sopanen, Matti; Kaltiokallio, Mikko

    2015-01-01

    To minimize energy consumption of a digital circuit, logic can be operated at sub- or near-threshold voltage. Operation at this region is challenging due to device and environment variations, and resulting performance may not be adequate to all applications. This article presents two variants of a 32-bit RISC CPU targeted for near-threshold voltage. Both CPUs are placed on the same die and manufactured in 28 nm CMOS process. They employ timing-error prevention with clock stretching to enable ...

  11. Improvement of CPU time of Linear Discriminant Function based on MNM criterion by IP

    Directory of Open Access Journals (Sweden)

    Shuichi Shinmura

    2014-05-01

    Full Text Available Revised IP-OLDF (optimal linear discriminant function by integer programming is a linear discriminant function to minimize the number of misclassifications (NM of training samples by integer programming (IP. However, IP requires large computation (CPU time. In this paper, it is proposed how to reduce CPU time by using linear programming (LP. In the first phase, Revised LP-OLDF is applied to all cases, and all cases are categorized into two groups: those that are classified correctly or those that are not classified by support vectors (SVs. In the second phase, Revised IP-OLDF is applied to the misclassified cases by SVs. This method is called Revised IPLP-OLDF.In this research, it is evaluated whether NM of Revised IPLP-OLDF is good estimate of the minimum number of misclassifications (MNM by Revised IP-OLDF. Four kinds of the real data—Iris data, Swiss bank note data, student data, and CPD data—are used as training samples. Four kinds of 20,000 re-sampling cases generated from these data are used as the evaluation samples. There are a total of 149 models of all combinations of independent variables by these data. NMs and CPU times of the 149 models are compared with Revised IPLP-OLDF and Revised IP-OLDF. The following results are obtained: 1 Revised IPLP-OLDF significantly improves CPU time. 2 In the case of training samples, all 149 NMs of Revised IPLP-OLDF are equal to the MNM of Revised IP-OLDF. 3 In the case of evaluation samples, most NMs of Revised IPLP-OLDF are equal to NM of Revised IP-OLDF. 4 Generalization abilities of both discriminant functions are concluded to be high, because the difference between the error rates of training and evaluation samples are almost within 2%.   Therefore, Revised IPLP-OLDF is recommended for the analysis of big data instead of Revised IP-OLDF. Next, Revised IPLP-OLDF is compared with LDF and logistic regression by 100-fold cross validation using 100 re-sampling samples. Means of error rates of

  12. A lock circuit for a multi-core processor

    DEFF Research Database (Denmark)

    2015-01-01

    An integrated circuit comprising a multiple processor cores and a lock circuit that comprises a queue register with respective bits set or reset via respective, connections dedicated to respective processor cores, whereby the queue register identifies those among the multiple processor cores...... that are enqueued in the queue register. Furthermore, the integrated circuit comprises a current register and a selector circuit configured to select a processor core and identify that processor core by a value in the current register. A selected processor core is a prioritized processor core among the cores...... configured with an integrated circuit; and a silicon die configured with an integrated circuit....

  13. Multi-processor developments in the United States for future high energy physics experiments and accelerators

    International Nuclear Information System (INIS)

    Gaines, I.

    1988-03-01

    The use of multi-processors for analysis and high-level triggering in High Energy Physics experiments, pioneered by the early emulator systems, has reached maturity, in particular with the multiple microprocessor systems in use at Fermilab. It is widely acknowledged that such systems will fulfill the major portion of the computing needs of future large experiments. Recent developments at Fermilab's Advanced Computer Program will make such systems even more powerful, cost-effective, and easier to use than they are at present. The next generation of microprocessors, already available, will provide CPU power of about one VAX 780 equivalent/$300, while supporting most VMS FORTRAN extensions and large (>8MB) amounts of memory. Low cost high density mass storage devices (based on video tape cartridge technology) will allow parallel I/O to remove potential I/O bottlenecks in systems of over 1000 VAX equipment processors. New interconnection schemes and system software will allow more flexible topologies and extremely high data bandwidth, especially for on-line systems. This talk will summarize the work at the Advanced Computer Program and the rest of the US in this field. 3 refs., 4 figs

  14. The TIGER trigger processor for the CAMERA detector at COMPASS-II

    Energy Technology Data Exchange (ETDEWEB)

    Baumann, Tobias; Buechele, Maximilian; Fischer, Horst; Gorzellik, Matthias; Grussenmeyer, Tobias; Herrmann, Florian; Joerg, Philipp; Kremser, Paul; Kunz, Tobias; Michalski, Christoph; Schopferer, Sebastian; Szameitat, Tobias [Physikalisches Institut der Universitaet Freiburg, Freiburg im Breisgau (Germany)

    2013-07-01

    In today's nuclear and high-energy physics experiments the background-induced occupancy of the detector channels can be quite high; therefore it is important to have sophisticated trigger subsystems which process the data in real-time to generate trigger objects for the global trigger decision. In this work we present a FPGA based low-latency trigger processor for the COMPASS-II experiment. TIGER is a high-performance trigger processor that was developed to fit perfectly in the GANDALF framework and extend its versatility. It is designed as a VXS module and is allocated to the central VXS switch slot, which has a direct link from every payload slot. The synchronous transfer protocol was optimized for low latencies and offers a bandwidth of up to 8 Gbit/s per link. The centerpiece of the board is a Xilinx Virtex-6 SX315T FPGA, offering vast programmable logic, embedded memory and DSP resources. It is accompanied by DDR3 memory, a COM Express CPU and a MXM GPU. Besides the VXS backplane ports, the board features two SFP+ transceivers, 32 LVDS inputs and 32 LVDS outputs to interface with the global trigger system and a Gigabit Ethernet port for configuration and monitoring.

  15. Application of multiwall carbon nanotubes for thermal dissipation in a micro-processor

    Energy Technology Data Exchange (ETDEWEB)

    Bui Hung Thang; Phan Ngoc Hong; Phan Hong Khoi; Phan Ngoc Minh [Institute of Materials Science, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Road, Cau Giay District, Hanoi (Viet Nam)], E-mail: minhpn@ims.vast.ac.vn

    2009-09-01

    One of the most valuable properties of the carbon nanotubes materials is its high thermal conductivity with 2000 W/m.K (compared to thermal conductivity of Ag 419 W/m.K). It suggested an approach in applying the CNTs in thermal dissipation media to improve the performance of computer processors and other high power electronic devices. In this research, the multiwall carbon nanotubes (MWCNTs) made by thermal chemical vapour deposition (CVD) at our laboratory was employed as the heat dissipation media in a microprocessor a Personal Computer with configuration: Intel Pentium IV 3.066 GHz, 512Mb of RAM and Windows XP Service Pack 2 Operating System. We directly measured the temperature of the microprocessor during the operation of the computer in two modes: 100% usage CPU mode and over-clocking mode. The measured results showed that when using our thermal dissipation media (a mixture of the mentioned commercial thermal compound and 2 wt.%. MWCNTs), the temperature of the microprocessor decreased 5 deg. C, and the time for increasing the temperature of the microprocessor was three times longer than that when using commercial thermal compound. In over-clocking mode, the processor speed reached 3.8 GHz with 165 MHz of system bus clock speed; it was 1.24 times higher than that in non over-clocking mode. The results confirmed a promising way of using MWCNTs as the thermal dissipation media for microprocessor and high power electronic devices.

  16. Application of multiwall carbon nanotubes for thermal dissipation in a micro-processor

    Science.gov (United States)

    Thang, Bui Hung; Hong, Phan Ngoc; Khoi, Phan Hong; Minh, Phan Ngoc

    2009-09-01

    One of the most valuable properties of the carbon nanotubes materials is its high thermal conductivity with 2000 W/m.K (compared to thermal conductivity of Ag 419 W/m.K). It suggested an approach in applying the CNTs in thermal dissipation media to improve the performance of computer processors and other high power electronic devices. In this research, the multiwall carbon nanotubes (MWCNTs) made by thermal chemical vapour deposition (CVD) at our laboratory was employed as the heat dissipation media in a microprocessor a Personal Computer with configuration: Intel Pentium IV 3.066 GHz, 512Mb of RAM and Windows XP Service Pack 2 Operating System. We directly measured the temperature of the microprocessor during the operation of the computer in two modes: 100% usage CPU mode and over-clocking mode. The measured results showed that when using our thermal dissipation media (a mixture of the mentioned commercial thermal compound and 2 wt.%. MWCNTs), the temperature of the microprocessor decreased 5°C, and the time for increasing the temperature of the microprocessor was three times longer than that when using commercial thermal compound. In over-clocking mode, the processor speed reached 3.8 GHz with 165 MHz of system bus clock speed; it was 1.24 times higher than that in non over-clocking mode. The results confirmed a promising way of using MWCNTs as the thermal dissipation media for microprocessor and high power electronic devices.

  17. Application of multiwall carbon nanotubes for thermal dissipation in a micro-processor

    International Nuclear Information System (INIS)

    Bui Hung Thang; Phan Ngoc Hong; Phan Hong Khoi; Phan Ngoc Minh

    2009-01-01

    One of the most valuable properties of the carbon nanotubes materials is its high thermal conductivity with 2000 W/m.K (compared to thermal conductivity of Ag 419 W/m.K). It suggested an approach in applying the CNTs in thermal dissipation media to improve the performance of computer processors and other high power electronic devices. In this research, the multiwall carbon nanotubes (MWCNTs) made by thermal chemical vapour deposition (CVD) at our laboratory was employed as the heat dissipation media in a microprocessor a Personal Computer with configuration: Intel Pentium IV 3.066 GHz, 512Mb of RAM and Windows XP Service Pack 2 Operating System. We directly measured the temperature of the microprocessor during the operation of the computer in two modes: 100% usage CPU mode and over-clocking mode. The measured results showed that when using our thermal dissipation media (a mixture of the mentioned commercial thermal compound and 2 wt.%. MWCNTs), the temperature of the microprocessor decreased 5 deg. C, and the time for increasing the temperature of the microprocessor was three times longer than that when using commercial thermal compound. In over-clocking mode, the processor speed reached 3.8 GHz with 165 MHz of system bus clock speed; it was 1.24 times higher than that in non over-clocking mode. The results confirmed a promising way of using MWCNTs as the thermal dissipation media for microprocessor and high power electronic devices.

  18. Architectural design and analysis of a programmable image processor

    International Nuclear Information System (INIS)

    Siyal, M.Y.; Chowdhry, B.S.; Rajput, A.Q.K.

    2003-01-01

    In this paper we present an architectural design and analysis of a programmable image processor, nicknamed Snake. The processor was designed with a high degree of parallelism to speed up a range of image processing operations. Data parallelism found in array processors has been included into the architecture of the proposed processor. The implementation of commonly used image processing algorithms and their performance evaluation are also discussed. The performance of Snake is also compared with other types of processor architectures. (author)

  19. ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors

    DEFF Research Database (Denmark)

    Liu, Weifeng; Vinter, Brian

    2014-01-01

    and its child nodes must be executed sequentially, and (2) heaps, even d-heaps (d-ary heaps or d-way heaps), cannot supply enough wide data parallelism to these processors. Recent research proposed more versatile asymmetric multicore processors (AMPs) that consist of two types of cores (latency......-oriented cores with high single-thread performance and throughput-oriented cores with wide vector processing capability), unified memory address space and faster synchronization mechanism among cores with different ISAs. To leverage the AMPs for the heap data structure, in this paper we propose ad......-heap, an efficient heap data structure that introduces an implicit bridge structure and properly apportions workloads to the two types of cores. We implement a batch k-selection algorithm and conduct experiments on simulated AMP environments composed of real CPUs and GPUs. In our experiments on two representative...

  20. Designing of Vague Logic Based 2-Layered Framework for CPU Scheduler

    Directory of Open Access Journals (Sweden)

    Supriya Raheja

    2016-01-01

    Full Text Available Fuzzy based CPU scheduler has become of great interest by operating system because of its ability to handle imprecise information associated with task. This paper introduces an extension to the fuzzy based round robin scheduler to a Vague Logic Based Round Robin (VBRR scheduler. VBRR scheduler works on 2-layered framework. At the first layer, scheduler has a vague inference system which has the ability to handle the impreciseness of task using vague logic. At the second layer, Vague Logic Based Round Robin (VBRR scheduling algorithm works to schedule the tasks. VBRR scheduler has the learning capability based on which scheduler adapts intelligently an optimum length for time quantum. An optimum time quantum reduces the overhead on scheduler by reducing the unnecessary context switches which lead to improve the overall performance of system. The work is simulated using MATLAB and compared with the conventional round robin scheduler and the other two fuzzy based approaches to CPU scheduler. Given simulation analysis and results prove the effectiveness and efficiency of VBRR scheduler.

  1. Energy consumption optimization of the total-FETI solver by changing the CPU frequency

    Science.gov (United States)

    Horak, David; Riha, Lubomir; Sojka, Radim; Kruzik, Jakub; Beseda, Martin; Cermak, Martin; Schuchart, Joseph

    2017-07-01

    The energy consumption of supercomputers is one of the critical problems for the upcoming Exascale supercomputing era. The awareness of power and energy consumption is required on both software and hardware side. This paper deals with the energy consumption evaluation of the Finite Element Tearing and Interconnect (FETI) based solvers of linear systems, which is an established method for solving real-world engineering problems. We have evaluated the effect of the CPU frequency on the energy consumption of the FETI solver using a linear elasticity 3D cube synthetic benchmark. In this problem, we have evaluated the effect of frequency tuning on the energy consumption of the essential processing kernels of the FETI method. The paper provides results for two types of frequency tuning: (1) static tuning and (2) dynamic tuning. For static tuning experiments, the frequency is set before execution and kept constant during the runtime. For dynamic tuning, the frequency is changed during the program execution to adapt the system to the actual needs of the application. The paper shows that static tuning brings up 12% energy savings when compared to default CPU settings (the highest clock rate). The dynamic tuning improves this further by up to 3%.

  2. Fast CPU-based Monte Carlo simulation for radiotherapy dose calculation

    Science.gov (United States)

    Ziegenhein, Peter; Pirner, Sven; Kamerling, Cornelis Ph; Oelfke, Uwe

    2015-08-01

    Monte-Carlo (MC) simulations are considered to be the most accurate method for calculating dose distributions in radiotherapy. Its clinical application, however, still is limited by the long runtimes conventional implementations of MC algorithms require to deliver sufficiently accurate results on high resolution imaging data. In order to overcome this obstacle we developed the software-package PhiMC, which is capable of computing precise dose distributions in a sub-minute time-frame by leveraging the potential of modern many- and multi-core CPU-based computers. PhiMC is based on the well verified dose planning method (DPM). We could demonstrate that PhiMC delivers dose distributions which are in excellent agreement to DPM. The multi-core implementation of PhiMC scales well between different computer architectures and achieves a speed-up of up to 37× compared to the original DPM code executed on a modern system. Furthermore, we could show that our CPU-based implementation on a modern workstation is between 1.25× and 1.95× faster than a well-known GPU implementation of the same simulation method on a NVIDIA Tesla C2050. Since CPUs work on several hundreds of GB RAM the typical GPU memory limitation does not apply for our implementation and high resolution clinical plans can be calculated.

  3. The “Chimera”: An Off-The-Shelf CPU/GPGPU/FPGA Hybrid Computing Platform

    Directory of Open Access Journals (Sweden)

    Ra Inta

    2012-01-01

    Full Text Available The nature of modern astronomy means that a number of interesting problems exhibit a substantial computational bound and this situation is gradually worsening. Scientists, increasingly fighting for valuable resources on conventional high-performance computing (HPC facilities—often with a limited customizable user environment—are increasingly looking to hardware acceleration solutions. We describe here a heterogeneous CPU/GPGPU/FPGA desktop computing system (the “Chimera”, built with commercial-off-the-shelf components. We show that this platform may be a viable alternative solution to many common computationally bound problems found in astronomy, however, not without significant challenges. The most significant bottleneck in pipelines involving real data is most likely to be the interconnect (in this case the PCI Express bus residing on the CPU motherboard. Finally, we speculate on the merits of our Chimera system on the entire landscape of parallel computing, through the analysis of representative problems from UC Berkeley’s “Thirteen Dwarves.”

  4. Sequential decay of Reggeons

    International Nuclear Information System (INIS)

    Yoshida, Toshihiro

    1981-01-01

    Probabilities of meson production in the sequential decay of Reggeons, which are formed from the projectile and the target in the hadron-hadron to Reggeon-Reggeon processes, are investigated. It is assumed that pair creation of heavy quarks and simultaneous creation of two antiquark-quark pairs are negligible. The leading-order terms with respect to ratio of creation probabilities of anti s s to anti u u (anti d d) are calculated. The production cross sections in the target fragmentation region are given in terms of probabilities in the initial decay of the Reggeons and an effect of manyparticle production. (author)

  5. Synthetic Aperture Sequential Beamforming

    DEFF Research Database (Denmark)

    Kortbek, Jacob; Jensen, Jørgen Arendt; Gammelmark, Kim Løkke

    2008-01-01

    A synthetic aperture focusing (SAF) technique denoted Synthetic Aperture Sequential Beamforming (SASB) suitable for 2D and 3D imaging is presented. The technique differ from prior art of SAF in the sense that SAF is performed on pre-beamformed data contrary to channel data. The objective is to im......A synthetic aperture focusing (SAF) technique denoted Synthetic Aperture Sequential Beamforming (SASB) suitable for 2D and 3D imaging is presented. The technique differ from prior art of SAF in the sense that SAF is performed on pre-beamformed data contrary to channel data. The objective...... is to improve and obtain a more range independent lateral resolution compared to conventional dynamic receive focusing (DRF) without compromising frame rate. SASB is a two-stage procedure using two separate beamformers. First a set of Bmode image lines using a single focal point in both transmit and receive...... is stored. The second stage applies the focused image lines from the first stage as input data. The SASB method has been investigated using simulations in Field II and by off-line processing of data acquired with a commercial scanner. The performance of SASB with a static image object is compared with DRF...

  6. Real time processor for array speckle interferometry

    Science.gov (United States)

    Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

    1989-02-01

    The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.

  7. The UA1 upgrade calorimeter trigger processor

    International Nuclear Information System (INIS)

    Bains, M.; Charleton, D.; Ellis, N.; Garvey, J.; Gregory, J.; Jimack, M.P.; Jovanovic, P.; Kenyon, I.R.; Baird, S.A.; Campbell, D.; Cawthraw, M.; Coughlan, J.; Flynn, P.; Galagedera, S.; Grayer, G.; Halsall, R.; Shah, T.P.; Stephens, R.; Biddulph, P.; Eisenhandler, E.; Fensome, I.F.; Landon, M.; Robinson, D.; Oliver, J.; Sumorok, K.

    1990-01-01

    The increased luminosity of the improved CERN Collider and the more subtle signals of second-generation collider physics demand increasingly sophisticated triggering. We have built a new first-level trigger processor designed to use the excellent granularity of the UA1 upgrade calorimeter. This device is entirely digital and handles events in 1.5 μs, thus introducing no dead time. Its most novel feature is fast two-dimensional electromagnetic cluster-finding with the possibility of demanding an isolated shower of limited penetration. The processor allows multiple combinations of triggers on electromagnetic showers, hadronic jets and energy sums, including a total-energy veto of multiple interactions and a full vector sum of missing transverse energy. This hard-wired processor is about five times more powerful than its predecessor, and makes extensive use of pipelining techniques. It was used extensively in the 1988 and 1989 runs of the CERN Collider. (orig.)

  8. Embedded processor extensions for image processing

    Science.gov (United States)

    Thevenin, Mathieu; Paindavoine, Michel; Letellier, Laurent; Heyrman, Barthélémy

    2008-04-01

    The advent of camera phones marks a new phase in embedded camera sales. By late 2009, the total number of camera phones will exceed that of both conventional and digital cameras shipped since the invention of photography. Use in mobile phones of applications like visiophony, matrix code readers and biometrics requires a high degree of component flexibility that image processors (IPs) have not, to date, been able to provide. For all these reasons, programmable processor solutions have become essential. This paper presents several techniques geared to speeding up image processors. It demonstrates that a gain of twice is possible for the complete image acquisition chain and the enhancement pipeline downstream of the video sensor. Such results confirm the potential of these computing systems for supporting future applications.

  9. The UA1 upgrade calorimeter trigger processor

    International Nuclear Information System (INIS)

    Bains, N.; Baird, S.A.; Biddulph, P.

    1990-01-01

    The increased luminosity of the improved CERN Collider and the more subtle signals of second-generation collider physics demand increasingly sophisticated triggering. We have built a new first-level trigger processor designed to use the excellent granularity of the UA1 upgrade calorimeter. This device is entirely digital and handles events in 1.5 μs, thus introducing no deadtime. Its most novel feature is fast two-dimensional electromagnetic cluster-finding with the possibility of demanding an isolated shower of limited penetration. The processor allows multiple combinations of triggers on electromagnetic showers, hadronic jets and energy sums, including a total-energy veto of multiple interactions and a full vector sum of missing transverse energy. This hard-wired processor is about five times more powerful than its predecessor, and makes extensive use of pipelining techniques. It was used extensively in the 1988 and 1989 runs of the CERN Collider. (author)

  10. Development methods for VLSI-processors

    International Nuclear Information System (INIS)

    Horninger, K.; Sandweg, G.

    1982-01-01

    The aim of this project, which was originally planed for 3 years, was the development of modern system and circuit concepts, for VLSI-processors having a 32 bit wide data path. The result of this first years work is the concept of a general purpose processor. This processor is not only logically but also physically (on the chip) divided into four functional units: a microprogrammable instruction unit, an execution unit in slice technique, a fully associative cache memory and an I/O unit. For the ALU of the execution unit circuits in PLA and slice techniques have been realized. On the basis of regularity, area consumption and achievable performance the slice technique has been prefered. The designs utilize selftesting circuitry. (orig.) [de

  11. Software-defined reconfigurable microwave photonics processor.

    Science.gov (United States)

    Pérez, Daniel; Gasulla, Ivana; Capmany, José

    2015-06-01

    We propose, for the first time to our knowledge, a software-defined reconfigurable microwave photonics signal processor architecture that can be integrated on a chip and is capable of performing all the main functionalities by suitable programming of its control signals. The basic configuration is presented and a thorough end-to-end design model derived that accounts for the performance of the overall processor taking into consideration the impact and interdependencies of both its photonic and RF parts. We demonstrate the model versatility by applying it to several relevant application examples.

  12. Time Manager Software for a Flight Processor

    Science.gov (United States)

    Zoerne, Roger

    2012-01-01

    Data analysis is a process of inspecting, cleaning, transforming, and modeling data to highlight useful information and suggest conclusions. Accurate timestamps and a timeline of vehicle events are needed to analyze flight data. By moving the timekeeping to the flight processor, there is no longer a need for a redundant time source. If each flight processor is initially synchronized to GPS, they can freewheel and maintain a fairly accurate time throughout the flight with no additional GPS time messages received. How ever, additional GPS time messages will ensure an even greater accuracy. When a timestamp is required, a gettime function is called that immediately reads the time-base register.

  13. Simulation of a parallel processor on a serial processor: The neutron diffusion equation

    International Nuclear Information System (INIS)

    Honeck, H.C.

    1981-01-01

    Parallel processors could provide the nuclear industry with very high computing power at a very moderate cost. Will we be able to make effective use of this power. This paper explores the use of a very simple parallel processor for solving the neutron diffusion equation to predict power distributions in a nuclear reactor. We first describe a simple parallel processor and estimate its theoretical performance based on the current hardware technology. Next, we show how the parallel processor could be used to solve the neutron diffusion equation. We then present the results of some simulations of a parallel processor run on a serial processor and measure some of the expected inefficiencies. Finally we extrapolate the results to estimate how actual design codes would perform. We find that the standard numerical methods for solving the neutron diffusion equation are still applicable when used on a parallel processor. However, some simple modifications to these methods will be necessary if we are to achieve the full power of these new computers. (orig.) [de

  14. Special purpose processors for high energy physics applications

    International Nuclear Information System (INIS)

    Verkerk, C.

    1978-01-01

    The review on the subject of hardware processors from very fast decision logic for the split field magnet facility at CERN, to a point-finding processor used to relieve the data-acquisition minicomputer from the task of monitoring the SPS experiment is given. Block diagrams of decision making processor, point-finding processor, complanarity and opening angle processor and programmable track selector module are presented and discussed. The applications of fully programmable but slower processor on the one hand, and very fast and programmable decision logic on the other hand are given in this review

  15. Fast data reconstructed method of Fourier transform imaging spectrometer based on multi-core CPU

    Science.gov (United States)

    Yu, Chunchao; Du, Debiao; Xia, Zongze; Song, Li; Zheng, Weijian; Yan, Min; Lei, Zhenggang

    2017-10-01

    Imaging spectrometer can gain two-dimensional space image and one-dimensional spectrum at the same time, which shows high utility in color and spectral measurements, the true color image synthesis, military reconnaissance and so on. In order to realize the fast reconstructed processing of the Fourier transform imaging spectrometer data, the paper designed the optimization reconstructed algorithm with OpenMP parallel calculating technology, which was further used for the optimization process for the HyperSpectral Imager of `HJ-1' Chinese satellite. The results show that the method based on multi-core parallel computing technology can control the multi-core CPU hardware resources competently and significantly enhance the calculation of the spectrum reconstruction processing efficiency. If the technology is applied to more cores workstation in parallel computing, it will be possible to complete Fourier transform imaging spectrometer real-time data processing with a single computer.

  16. An FPGA Based Multiprocessing CPU for Beam Synchronous Timing in CERN's SPS and LHC

    CERN Document Server

    Ballester, F J; Gras, J J; Lewis, J; Savioz, J J; Serrano, J

    2003-01-01

    The Beam Synchronous Timing system (BST) will be used around the LHC and its injector, the SPS, to broadcast timing meassages and synchronize actions with the beam in different receivers. To achieve beam synchronization, the BST Master card encodes messages using the bunch clock, with a nominal value of 40.079 MHz for the LHC. These messages are produced by a set of tasks every revolution period, which is every 89 us for the LHC and every 23 us for the SPS, therefore imposing a hard real-time constraint on the system. To achieve determinism, the BST Master uses a dedicated CPU inside its main Field Programmable Gate Array (FPGA) featuring zero-delay hardware task switching and a reduced instruction set. This paper describes the BST Master card, stressing the main FPGA design, as well as the associated software, including the LynxOS driver and the tailor-made assembler.

  17. Deployment of 464XLAT (RFC6877) alongside IPv6-only CPU resources at WLCG sites

    Science.gov (United States)

    Froy, T. S.; Traynor, D. P.; Walker, C. J.

    2017-10-01

    IPv4 is now officially deprecated by the IETF. A significant amount of effort has already been expended by the HEPiX IPv6 Working Group on testing dual-stacked hosts and IPv6-only CPU resources. Dual-stack adds complexity and administrative overhead to sites that may already be starved of resource. This has resulted in a very slow uptake of IPv6 from WLCG sites. 464XLAT (RFC6877) is intended for IPv6 single-stack environments that require the ability to communicate with IPv4-only endpoints. This paper will present a deployment strategy for 464XLAT, operational experiences of using 464XLAT in production at a WLCG site and important information to consider prior to deploying 464XLAT.

  18. A Robust Ultra-Low Voltage CPU Utilizing Timing-Error Prevention

    Directory of Open Access Journals (Sweden)

    Markus Hiienkari

    2015-04-01

    Full Text Available To minimize energy consumption of a digital circuit, logic can be operated at sub- or near-threshold voltage. Operation at this region is challenging due to device and environment variations, and resulting performance may not be adequate to all applications. This article presents two variants of a 32-bit RISC CPU targeted for near-threshold voltage. Both CPUs are placed on the same die and manufactured in 28 nm CMOS process. They employ timing-error prevention with clock stretching to enable operation with minimal safety margins while maximizing performance and energy efficiency at a given operating point. Measurements show minimum energy of 3.15 pJ/cyc at 400 mV, which corresponds to 39% energy saving compared to operation based on static signoff timing.

  19. A Bit String Content Aware Chunking Strategy for Reduced CPU Energy on Cloud Storage

    Directory of Open Access Journals (Sweden)

    Bin Zhou

    2015-01-01

    Full Text Available In order to achieve energy saving and reduce the total cost of ownership, green storage has become the first priority for data center. Detecting and deleting the redundant data are the key factors to the reduction of the energy consumption of CPU, while high performance stable chunking strategy provides the groundwork for detecting redundant data. The existing chunking algorithm greatly reduces the system performance when confronted with big data and it wastes a lot of energy. Factors affecting the chunking performance are analyzed and discussed in the paper and a new fingerprint signature calculation is implemented. Furthermore, a Bit String Content Aware Chunking Strategy (BCCS is put forward. This strategy reduces the cost of signature computation in chunking process to improve the system performance and cuts down the energy consumption of the cloud storage data center. On the basis of relevant test scenarios and test data of this paper, the advantages of the chunking strategy are verified.

  20. Enhanced round robin CPU scheduling with burst time based time quantum

    Science.gov (United States)

    Indusree, J. R.; Prabadevi, B.

    2017-11-01

    Process scheduling is a very important functionality of Operating system. The main-known process-scheduling algorithms are First Come First Serve (FCFS) algorithm, Round Robin (RR) algorithm, Priority scheduling algorithm and Shortest Job First (SJF) algorithm. Compared to its peers, Round Robin (RR) algorithm has the advantage that it gives fair share of CPU to the processes which are already in the ready-queue. The effectiveness of the RR algorithm greatly depends on chosen time quantum value. Through this research paper, we are proposing an enhanced algorithm called Enhanced Round Robin with Burst-time based Time Quantum (ERRBTQ) process scheduling algorithm which calculates time quantum as per the burst-time of processes already in ready queue. The experimental results and analysis of ERRBTQ algorithm clearly indicates the improved performance when compared with conventional RR and its variants.

  1. XOP, a fast versatile processor, as a building block for parallel processing in high energy physics experiments

    International Nuclear Information System (INIS)

    Baehler, P.; Bosco, N.; Lingjaerde, T.; Ljuslin, C.; Van Praag, A.; Werner, P.

    1986-01-01

    The XOP processor has been designed for trigger calculation and data compression in high energy physics experiments. Therefore, emphasis has been placed upon fast execution and high input/output rate. The fast execution is achieved by a wide instruction word holding operations which are executed concurrently. Thus, the arithmetic operations, data address calculations, data accessing, condition checking, loop count checking and next instruction evaluation all overlap in time. In conventional micro-processors these operations are performed sequentially. In addition, the instruction set comprises not only the classical computer instructions, but also specialized instructions suitable for trigger calculations, such as bit search, population count, loose compare and vector instructions. In order to achieve a high input/output rate, each XOP ECLine interface board is equipped with an input and an output port which fulfil the LeCroy ECLine specifications. The autonomous input port allows a data rate of 40 Mbytes/sec, while the program controlled output port allows 20 Mbytes/sec. For Fastbus based systems a dual Fastbus master interface is under design which allows to build up a Fastbus multi-processor system. This design is being done in collaboration with LAPP Annecy for the CERN Lep L3 experiment. Their scheme comprises 4-5 XOP processors, each of them with a master interface on a data input segment and a master interface on a data output segment. This paper describes the structure of the XOP processor, the interface capabilities and the software development and debugging tools. (Auth.)

  2. Noise limitations in optical linear algebra processors.

    Science.gov (United States)

    Batsell, S G; Jong, T L; Walkup, J F; Krile, T F

    1990-05-10

    A general statistical noise model is presented for optical linear algebra processors. A statistical analysis which includes device noise, the multiplication process, and the addition operation is undertaken. We focus on those processes which are architecturally independent. Finally, experimental results which verify the analytical predictions are also presented.

  3. Cassava processors' awareness of occupational and environmental ...

    African Journals Online (AJOL)

    A larger percentage (74.5%) of the respondents indicated that the Agricultural Development Programme (ADP) is their source of information. The result also showed that processor's awareness of occupational hazards associated with the different stages of cassava processing vary because their involvement in these stages

  4. A high-speed analog neural processor

    NARCIS (Netherlands)

    Masa, P.; Masa, Peter; Hoen, Klaas; Hoen, Klaas; Wallinga, Hans

    1994-01-01

    Targeted at high-energy physics research applications, our special-purpose analog neural processor can classify up to 70 dimensional vectors within 50 nanoseconds. The decision-making process of the implemented feedforward neural network enables this type of computation to tolerate weight

  5. Beeldverwerking met de Micron Automatic Processor

    OpenAIRE

    Goyens, Frank

    2017-01-01

    Deze thesis is een onderzoek naar toepassingen binnen beeldverwerking op de Micron Automata Processor hardware. De hardware wordt vergeleken met populaire hedendaagse hardware. Ook bevat dit onderzoek nuttige informatie en strategieën voor het ontwikkelen van nieuwe toepassingen. Bevindingen in dit onderzoek omvatten proof of concept algoritmes en een praktische toepassing.

  6. 7 CFR 1215.14 - Processor.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 10 2010-01-01 2010-01-01 false Processor. 1215.14 Section 1215.14 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (MARKETING AGREEMENTS... CONSUMER INFORMATION Popcorn Promotion, Research, and Consumer Information Order Definitions § 1215.14...

  7. Simplifying cochlear implant speech processor fitting

    NARCIS (Netherlands)

    Willeboer, C.

    2008-01-01

    Conventional fittings of the speech processor of a cochlear implant (CI) rely to a large extent on the implant recipient's subjective responses. For each of the 22 intracochlear electrodes the recipient has to indicate the threshold level (T-level) and comfortable loudness level (C-level) while

  8. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)

  9. Space Station Water Processor Process Pump

    Science.gov (United States)

    Parker, David

    1995-01-01

    This report presents the results of the development program conducted under contract NAS8-38250-12 related to the International Space Station (ISS) Water Processor (WP) Process Pump. The results of the Process Pumps evaluation conducted on this program indicates that further development is required in order to achieve the performance and life requirements for the ISSWP.

  10. Interleaved Subtask Scheduling on Multi Processor SOC

    NARCIS (Netherlands)

    Zhe, M.

    2006-01-01

    The ever-progressing semiconductor processing technique has integrated more and more embedded processors on a single system-on-achip (SoC). With such powerful SoC platforms, and also due to the stringent time-to-market deadlines, many functionalities which used to be implemented in ASICs are

  11. User manual Dieka PreProcessor

    NARCIS (Netherlands)

    Valkering, Kasper

    2000-01-01

    This is the user manual belonging to the Dieka-PreProcessor. This application was written by Wenhua Cao and revised and expanded by Kasper Valkering. The aim of this preproccesor is to be able to draw and mesh extrusion dies in ProEngineer, and do the FE-calculation in Dieka. The preprocessor makes

  12. Globe hosts launch of new processor

    CERN Multimedia

    2006-01-01

    Launch of the quadecore processor chip at the Globe. On 14 November, in a series of major media events around the world, the chip-maker Intel launched its new 'quadcore' processor. For the regions of Europe, the Middle East and Africa, the day-long launch event took place in CERN's Globe of Science and Innovation, with over 30 journalists in attendance, coming from as far away as Johannesburg and Dubai. CERN was a significant choice for the event: the first tests of this new generation of processor in Europe had been made at CERN over the preceding months, as part of CERN openlab, a research partnership with leading IT companies such as Intel, HP and Oracle. The event also provided the opportunity for the journalists to visit ATLAS and the CERN Computer Centre. The strategy of putting multiple processor cores on the same chip, which has been pursued by Intel and other chip-makers in the last few years, represents an important departure from the more traditional improvements in the sheer speed of such chips. ...

  13. Event analysis using a massively parallel processor

    International Nuclear Information System (INIS)

    Bale, A.; Gerelle, E.; Messersmith, J.; Warren, R.; Hoek, J.

    1990-01-01

    This paper describes a system for performing histogramming of n-tuple data at interactive rates using a commercial SIMD processor array connected to a work-station running the well-known Physics Analysis Workstation software (PAW). Results indicate that an order of magnitude performance improvement over current RISC technology is easily achievable

  14. Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

    Science.gov (United States)

    Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

    2017-11-01

    Current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphic processors have become as commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, big data created huge demand for data processing activities and such kind of throughput intensive applications inherently contains data level parallelism which is more suited for SIMD architecture based GPU. This paper reviews the architectural aspects of multi/many core processors and graphics processors. Different case studies are taken to compare performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.

  15. Data collection from FASTBUS to a DEC UNIBUS processor through the UNIBUS-Processor Interface

    International Nuclear Information System (INIS)

    Larwill, M.; Barsotti, E.; Lesny, D.; Pordes, R.

    1983-01-01

    This paper describes the use of the UNIBUS Processor Interface, an interface between FASTBUS and the Digital Equipment Corporation UNIBUS. The UPI was developed by Fermilab and the University of Illinois. Details of the use of this interface in a high energy physics experiment at Fermilab are given. The paper includes a discussion of the operation of the UPI on the UNIBUS of a VAX-11, and plans for using the UPI to perform data acquisition from FASTBUS to a VAX-11 Processor

  16. Performance of the OVERFLOW-MLP and LAURA-MLP CFD Codes on the NASA Ames 512 CPU Origin System

    Science.gov (United States)

    Taft, James R.

    2000-01-01

    The shared memory Multi-Level Parallelism (MLP) technique, developed last year at NASA Ames has been very successful in dramatically improving the performance of important NASA CFD codes. This new and very simple parallel programming technique was first inserted into the OVERFLOW production CFD code in FY 1998. The OVERFLOW-MLP code's parallel performance scaled linearly to 256 CPUs on the NASA Ames 256 CPU Origin 2000 system (steger). Overall performance exceeded 20.1 GFLOP/s, or about 4.5x the performance of a dedicated 16 CPU C90 system. All of this was achieved without any major modification to the original vector based code. The OVERFLOW-MLP code is now in production on the inhouse Origin systems as well as being used offsite at commercial aerospace companies. Partially as a result of this work, NASA Ames has purchased a new 512 CPU Origin 2000 system to further test the limits of parallel performance for NASA codes of interest. This paper presents the performance obtained from the latest optimization efforts on this machine for the LAURA-MLP and OVERFLOW-MLP codes. The Langley Aerothermodynamics Upwind Relaxation Algorithm (LAURA) code is a key simulation tool in the development of the next generation shuttle, interplanetary reentry vehicles, and nearly all "X" plane development. This code sustains about 4-5 GFLOP/s on a dedicated 16 CPU C90. At this rate, expected workloads would require over 100 C90 CPU years of computing over the next few calendar years. It is not feasible to expect that this would be affordable or available to the user community. Dramatic performance gains on cheaper systems are needed. This code is expected to be perhaps the largest consumer of NASA Ames compute cycles per run in the coming year.The OVERFLOW CFD code is extensively used in the government and commercial aerospace communities to evaluate new aircraft designs. It is one of the largest consumers of NASA supercomputing cycles and large simulations of highly resolved full

  17. Array processors based on Gaussian fraction-free method

    Energy Technology Data Exchange (ETDEWEB)

    Peng, S; Sedukhin, S [Aizu Univ., Aizuwakamatsu, Fukushima (Japan); Sedukhin, I

    1998-03-01

    The design of algorithmic array processors for solving linear systems of equations using fraction-free Gaussian elimination method is presented. The design is based on a formal approach which constructs a family of planar array processors systematically. These array processors are synthesized and analyzed. It is shown that some array processors are optimal in the framework of linear allocation of computations and in terms of number of processing elements and computing time. (author)

  18. Research on parallel algorithm for sequential pattern mining

    Science.gov (United States)

    Zhou, Lijuan; Qin, Bai; Wang, Yu; Hao, Zhongxiao

    2008-03-01

    Sequential pattern mining is the mining of frequent sequences related to time or other orders from the sequence database. Its initial motivation is to discover the laws of customer purchasing in a time section by finding the frequent sequences. In recent years, sequential pattern mining has become an important direction of data mining, and its application field has not been confined to the business database and has extended to new data sources such as Web and advanced science fields such as DNA analysis. The data of sequential pattern mining has characteristics as follows: mass data amount and distributed storage. Most existing sequential pattern mining algorithms haven't considered the above-mentioned characteristics synthetically. According to the traits mentioned above and combining the parallel theory, this paper puts forward a new distributed parallel algorithm SPP(Sequential Pattern Parallel). The algorithm abides by the principal of pattern reduction and utilizes the divide-and-conquer strategy for parallelization. The first parallel task is to construct frequent item sets applying frequent concept and search space partition theory and the second task is to structure frequent sequences using the depth-first search method at each processor. The algorithm only needs to access the database twice and doesn't generate the candidated sequences, which abates the access time and improves the mining efficiency. Based on the random data generation procedure and different information structure designed, this paper simulated the SPP algorithm in a concrete parallel environment and implemented the AprioriAll algorithm. The experiments demonstrate that compared with AprioriAll, the SPP algorithm had excellent speedup factor and efficiency.

  19. Adaptive sequential controller

    Energy Technology Data Exchange (ETDEWEB)

    El-Sharkawi, Mohamed A. (Renton, WA); Xing, Jian (Seattle, WA); Butler, Nicholas G. (Newberg, OR); Rodriguez, Alonso (Pasadena, CA)

    1994-01-01

    An adaptive sequential controller (50/50') for controlling a circuit breaker (52) or other switching device to substantially eliminate transients on a distribution line caused by closing and opening the circuit breaker. The device adaptively compensates for changes in the response time of the circuit breaker due to aging and environmental effects. A potential transformer (70) provides a reference signal corresponding to the zero crossing of the voltage waveform, and a phase shift comparator circuit (96) compares the reference signal to the time at which any transient was produced when the circuit breaker closed, producing a signal indicative of the adaptive adjustment that should be made. Similarly, in controlling the opening of the circuit breaker, a current transformer (88) provides a reference signal that is compared against the time at which any transient is detected when the circuit breaker last opened. An adaptive adjustment circuit (102) produces a compensation time that is appropriately modified to account for changes in the circuit breaker response, including the effect of ambient conditions and aging. When next opened or closed, the circuit breaker is activated at an appropriately compensated time, so that it closes when the voltage crosses zero and opens when the current crosses zero, minimizing any transients on the distribution line. Phase angle can be used to control the opening of the circuit breaker relative to the reference signal provided by the potential transformer.

  20. Adaptive sequential controller

    Science.gov (United States)

    El-Sharkawi, Mohamed A.; Xing, Jian; Butler, Nicholas G.; Rodriguez, Alonso

    1994-01-01

    An adaptive sequential controller (50/50') for controlling a circuit breaker (52) or other switching device to substantially eliminate transients on a distribution line caused by closing and opening the circuit breaker. The device adaptively compensates for changes in the response time of the circuit breaker due to aging and environmental effects. A potential transformer (70) provides a reference signal corresponding to the zero crossing of the voltage waveform, and a phase shift comparator circuit (96) compares the reference signal to the time at which any transient was produced when the circuit breaker closed, producing a signal indicative of the adaptive adjustment that should be made. Similarly, in controlling the opening of the circuit breaker, a current transformer (88) provides a reference signal that is compared against the time at which any transient is detected when the circuit breaker last opened. An adaptive adjustment circuit (102) produces a compensation time that is appropriately modified to account for changes in the circuit breaker response, including the effect of ambient conditions and aging. When next opened or closed, the circuit breaker is activated at an appropriately compensated time, so that it closes when the voltage crosses zero and opens when the current crosses zero, minimizing any transients on the distribution line. Phase angle can be used to control the opening of the circuit breaker relative to the reference signal provided by the potential transformer.

  1. Interaction sorting method for molecular dynamics on multi-core SIMD CPU architecture.

    Science.gov (United States)

    Matvienko, Sergey; Alemasov, Nikolay; Fomin, Eduard

    2015-02-01

    Molecular dynamics (MD) is widely used in computational biology for studying binding mechanisms of molecules, molecular transport, conformational transitions, protein folding, etc. The method is computationally expensive; thus, the demand for the development of novel, much more efficient algorithms is still high. Therefore, the new algorithm designed in 2007 and called interaction sorting (IS) clearly attracted interest, as it outperformed the most efficient MD algorithms. In this work, a new IS modification is proposed which allows the algorithm to utilize SIMD processor instructions. This paper shows that the improvement provides an additional gain in performance, 9% to 45% in comparison to the original IS method.

  2. Profiling CPU-bound workloads on Intel Haswell-EP platforms

    CERN Document Server

    Guerri, Marco; Cristovao, Cordeiro; CERN. Geneva. IT Department

    2017-01-01

    With the increasing adoption of public and private cloud resources to support the demands in terms of computing capacity of the WLCG, the HEP community has begun studying several benchmarking applications aimed at continuously assessing the performance of virtual machines procured from commercial providers. In order to characterise the behaviour of these benchmarks, in-depth profiling activities have been carried out. In this document we outline our experience in profiling one specific application, the ATLAS Kit Validation, in an attempt to explain an unexpected distribution in the performance samples obtained on systems based on Intel Haswell-EP processors.

  3. Lipsi: Probably the Smallest Processor in the World

    DEFF Research Database (Denmark)

    Schoeberl, Martin

    2018-01-01

    While research on high-performance processors is important, it is also interesting to explore processor architectures at the other end of the spectrum: tiny processor cores for auxiliary functions. While it is common to implement small circuits for such functions, such as a serial port, in dedica...... at a minimal cost....

  4. Space-charge-dominated beam dynamics simulations using the massively parallel processors (MPPs) of the Cray T3D

    International Nuclear Information System (INIS)

    Liu, H.

    1996-01-01

    Computer simulations using the multi-particle code PARMELA with a three-dimensional point-by-point space charge algorithm have turned out to be very helpful in supporting injector commissioning and operations at Thomas Jefferson National Accelerator Facility (Jefferson Lab, formerly called CEBAF). However, this algorithm, which defines a typical N 2 problem in CPU time scaling, is very time-consuming when N, the number of macro-particles, is large. Therefore, it is attractive to use massively parallel processors (MPPs) to speed up the simulations. Motivated by this, the authors modified the space charge subroutine for using the MPPs of the Cray T3D. The techniques used to parallelize and optimize the code on the T3D are discussed in this paper. The performance of the code on the T3D is examined in comparison with a Parallel Vector Processing supercomputer of the Cray C90 and an HP 735/15 high-end workstation

  5. A Hybrid Scheme Based on Pipelining and Multitasking in Mobile Application Processors for Advanced Video Coding

    Directory of Open Access Journals (Sweden)

    Muhammad Asif

    2015-01-01

    Full Text Available One of the key requirements for mobile devices is to provide high-performance computing at lower power consumption. The processors used in these devices provide specific hardware resources to handle computationally intensive video processing and interactive graphical applications. Moreover, processors designed for low-power applications may introduce limitations on the availability and usage of resources, which present additional challenges to the system designers. Owing to the specific design of the JZ47x series of mobile application processors, a hybrid software-hardware implementation scheme for H.264/AVC encoder is proposed in this work. The proposed scheme distributes the encoding tasks among hardware and software modules. A series of optimization techniques are developed to speed up the memory access and data transferring among memories. Moreover, an efficient data reusage design is proposed for the deblock filter video processing unit to reduce the memory accesses. Furthermore, fine grained macroblock (MB level parallelism is effectively exploited and a pipelined approach is proposed for efficient utilization of hardware processing cores. Finally, based on parallelism in the proposed design, encoding tasks are distributed between two processing cores. Experiments show that the hybrid encoder is 12 times faster than a highly optimized sequential encoder due to proposed techniques.

  6. Overtaking CPU DBMSes with a GPU in whole-query analytic processing with parallelism-friendly execution plan optimization

    NARCIS (Netherlands)

    A. Agbaria (Adnan); D. Minor (David); N. Peterfreund (Natan); E. Rozenberg (Eyal); O. Rosenberg (Ofer); Huawei Research

    2016-01-01

    textabstractExisting work on accelerating analytic DB query processing with (discrete) GPUs fails to fully realize their potential for speedup through parallelism: Published results do not achieve significant speedup over more performant CPU-only DBMSes when processing complete queries. This

  7. On the Feasibility and Limitations of Just-in-Time Instruction Set Extension for FPGA-Based Reconfigurable Processors

    Directory of Open Access Journals (Sweden)

    Mariusz Grad

    2012-01-01

    Full Text Available Reconfigurable instruction set processors provide the possibility of tailor the instruction set of a CPU to a particular application. While this customization process could be performed during runtime in order to adapt the CPU to the currently executed workload, this use case has been hardly investigated. In this paper, we study the feasibility of moving the customization process to runtime and evaluate the relation of the expected speedups and the associated overheads. To this end, we present a tool flow that is tailored to the requirements of this just-in-time ASIP specialization scenario. We evaluate our methods by targeting our previously introduced Woolcano reconfigurable ASIP architecture for a set of applications from the SPEC2006, SPEC2000, MiBench, and SciMark2 benchmark suites. Our results show that just-in-time ASIP specialization is promising for embedded computing applications, where average speedups of 5x can be achieved by spending 50 minutes for custom instruction identification and hardware generation. These overheads will be compensated if the applications execute for more than 2 hours. For the scientific computing benchmarks, the achievable speedup is only 1.2x, which requires significant execution times in the order of days to amortize the overheads.

  8. Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor

    KAUST Repository

    Malas, Tareq Majed Yasin

    2012-05-21

    Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the solution of partial differential equations, represents a challenge despite the regularity of memory access. Sophisticated optimization techniques are required to fully utilize the CPU. We propose a new method for constructing streaming numerical kernels using a high-level assembly synthesis and optimization framework. We describe an implementation of this method in Python targeting the IBM® Blue Gene®/P supercomputer\\'s PowerPC® 450 core. This paper details the high-level design, construction, simulation, verification, and analysis of these kernels utilizing a subset of the CPU\\'s instruction set. We demonstrate the effectiveness of our approach by implementing several three-dimensional stencil kernels over a variety of cached memory scenarios and analyzing the mechanically scheduled variants, including a 27-point stencil achieving a 1.7× speedup over the best previously published results. © The Author(s) 2012.

  9. Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems.

    Science.gov (United States)

    Andrade, G; Ferreira, R; Teodoro, George; Rocha, Leonardo; Saltz, Joel H; Kurc, Tahsin

    2014-10-01

    High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of its resources in real-world applications is a complex problem. Most current applications deployed to these machines are still being executed in a single processor, leaving other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks which are allocated to nodes of a distributed memory machine in coarse-grain, but each of them may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperforms other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time - HEFT, in cooperative executions using CPUs, GPUs, and MICs. We also experimentally show that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales.

  10. Quantum Inequalities and Sequential Measurements

    International Nuclear Information System (INIS)

    Candelpergher, B.; Grandouz, T.; Rubinx, J.L.

    2011-01-01

    In this article, the peculiar context of sequential measurements is chosen in order to analyze the quantum specificity in the two most famous examples of Heisenberg and Bell inequalities: Results are found at some interesting variance with customary textbook materials, where the context of initial state re-initialization is described. A key-point of the analysis is the possibility of defining Joint Probability Distributions for sequential random variables associated to quantum operators. Within the sequential context, it is shown that Joint Probability Distributions can be defined in situations where not all of the quantum operators (corresponding to random variables) do commute two by two. (authors)

  11. Bulk-memory processor for data acquisition

    International Nuclear Information System (INIS)

    Nelson, R.O.; McMillan, D.E.; Sunier, J.W.; Meier, M.; Poore, R.V.

    1981-01-01

    To meet the diverse needs and data rate requirements at the Van de Graaff and Weapons Neutron Research (WNR) facilities, a bulk memory system has been implemented which includes a fast and flexible processor. This bulk memory processor (BMP) utilizes bit slice and microcode techniques and features a 24 bit wide internal architecture allowing direct addressing of up to 16 megawords of memory and histogramming up to 16 million counts per channel without overflow. The BMP is interfaced to the MOSTEK MK 8000 bulk memory system and to the standard MODCOMP computer I/O bus. Coding for the BMP both at the microcode level and with macro instructions is supported. The generalized data acquisition system has been extended to support the BMP in a manner transparent to the user

  12. Design of Processors with Reconfigurable Microarchitecture

    Directory of Open Access Journals (Sweden)

    Andrey Mokhov

    2014-01-01

    Full Text Available Energy becomes a dominating factor for a wide spectrum of computations: from intensive data processing in “big data” companies resulting in large electricity bills, to infrastructure monitoring with wireless sensors relying on energy harvesting. In this context it is essential for a computation system to be adaptable to the power supply and the service demand, which often vary dramatically during runtime. In this paper we present an approach to building processors with reconfigurable microarchitecture capable of changing the way they fetch and execute instructions depending on energy availability and application requirements. We show how to use Conditional Partial Order Graphs to formally specify the microarchitecture of such a processor, explore the design possibilities for its instruction set, and synthesise the instruction decoder using correct-by-construction techniques. The paper is focused on the design methodology, which is evaluated by implementing a power-proportional version of Intel 8051 microprocessor.

  13. Real time processor for array speckle interferometry

    International Nuclear Information System (INIS)

    Chin, G.; Florez, J.; Borelli, R.; Fong, W.; Miko, J.; Trujillo, C.

    1989-01-01

    With the construction of several new large aperture telescopes and the development of large format array detectors in the near IR, the ability to obtain diffraction limited seeing via IR array speckle interferometry offers a powerful tool. We are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element 2D complex FFT, and to average the power spectrum all within the 25 msec coherence time for speckles at near IR wavelength. The processor is a compact unit controlled by a PC with real time display and data storage capability. It provides the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with off-line methods

  14. Parallel processor programs in the Federal Government

    Science.gov (United States)

    Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

    1985-01-01

    In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

  15. RISC Processors and High Performance Computing

    Science.gov (United States)

    Bailey, David H.; Saini, Subhash; Craw, James M. (Technical Monitor)

    1995-01-01

    This tutorial will discuss the top five RISC microprocessors and the parallel systems in which they are used. It will provide a unique cross-machine comparison not available elsewhere. The effective performance of these processors will be compared by citing standard benchmarks in the context of real applications. The latest NAS Parallel Benchmarks, both absolute performance and performance per dollar, will be listed. The next generation of the NPB will be described. The tutorial will conclude with a discussion of future directions in the field. Technology Transfer Considerations: All of these computer systems are commercially available internationally. Information about these processors is available in the public domain, mostly from the vendors themselves. The NAS Parallel Benchmarks and their results have been previously approved numerous times for public release, beginning back in 1991.

  16. Performance analysis of the FDTD method applied to holographic volume gratings: Multi-core CPU versus GPU computing

    Science.gov (United States)

    Francés, J.; Bleda, S.; Neipp, C.; Márquez, A.; Pascual, I.; Beléndez, A.

    2013-03-01

    The finite-difference time-domain method (FDTD) allows electromagnetic field distribution analysis as a function of time and space. The method is applied to analyze holographic volume gratings (HVGs) for the near-field distribution at optical wavelengths. Usually, this application requires the simulation of wide areas, which implies more memory and time processing. In this work, we propose a specific implementation of the FDTD method including several add-ons for a precise simulation of optical diffractive elements. Values in the near-field region are computed considering the illumination of the grating by means of a plane wave for different angles of incidence and including absorbing boundaries as well. We compare the results obtained by FDTD with those obtained using a matrix method (MM) applied to diffraction gratings. In addition, we have developed two optimized versions of the algorithm, for both CPU and GPU, in order to analyze the improvement of using the new NVIDIA Fermi GPU architecture versus highly tuned multi-core CPU as a function of the size simulation. In particular, the optimized CPU implementation takes advantage of the arithmetic and data transfer streaming SIMD (single instruction multiple data) extensions (SSE) included explicitly in the code and also of multi-threading by means of OpenMP directives. A good agreement between the results obtained using both FDTD and MM methods is obtained, thus validating our methodology. Moreover, the performance of the GPU is compared to the SSE+OpenMP CPU implementation, and it is quantitatively determined that a highly optimized CPU program can be competitive for a wider range of simulation sizes, whereas GPU computing becomes more powerful for large-scale simulations.

  17. Multi-Core Processor Memory Contention Benchmark Analysis Case Study

    Science.gov (United States)

    Simon, Tyler; McGalliard, James

    2009-01-01

    Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.

  18. VIRTUS: a multi-processor system in FASTBUS

    International Nuclear Information System (INIS)

    Ellett, J.; Jackson, R.; Ritter, R.; Schlein, P.; Yaeger, D.; Zweizig, J.

    1986-01-01

    VIRTUS is a system of parallel MC68000-based processors interconnected by FASTBUS that is used either on-line as an intelligent trigger component or off-line for full event processing. Each processor receives the complete set of data from one event. The host computer, a VAX 11/780, down-line loads all software to the processors, controls and monitors the functioning of all processors, and writes processed data to tape. Instructions, programs, and data are transferred among the processors and the host in the form of fixed format, variable length data blocks. (Auth.)

  19. Low-Latency Embedded Vision Processor (LLEVS)

    Science.gov (United States)

    2016-03-01

    algorithms, low-latency video processing, embedded image processor, wearable electronics, helmet-mounted systems, alternative night / day imaging...external subsystems and data sources with the device. The establishment of data interfaces in terms of data transfer rates, formats and types are...video signals from Near-visible Infrared (NVIR) sensor, Shortwave IR (SWIR) and Longwave IR (LWIR) is the main processing for Night Vision (NI) system

  20. Keystone Business Models for Network Security Processors

    Directory of Open Access Journals (Sweden)

    Arthur Low

    2013-07-01

    Full Text Available Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor” models nor the silicon intellectual-property licensing (“IP-licensing” models allow small technology companies to successfully compete. This article describes an alternative approach that produces an ongoing stream of novel network security processors for niche markets through continuous innovation by both large and small companies. This approach, referred to here as the "business ecosystem model for network security processors", includes a flexible and reconfigurable technology platform, a “keystone” business model for the company that maintains the platform architecture, and an extended ecosystem of companies that both contribute and share in the value created by innovation. New opportunities for business model innovation by participating companies are made possible by the ecosystem model. This ecosystem model builds on: i the lessons learned from the experience of the first author as a senior integrated circuit architect for providers of public-key cryptography solutions and as the owner of a semiconductor startup, and ii the latest scholarly research on technology entrepreneurship, business models, platforms, and business ecosystems. This article will be of interest to all technology entrepreneurs, but it will be of particular interest to owners of small companies that provide security solutions and to specialized security professionals seeking to launch their own companies.

  1. Silicon Processors Using Organically Reconfigurable Techniques (SPORT)

    Science.gov (United States)

    2014-05-19

    AFRL-OSR-VA-TR-2014-0132 SILICON PROCESSORS USING ORGANICALLY RECONFIGURABLE TECHNIQUES ( SPORT ) Dennis Prather UNIVERSITY OF DELAWARE Final Report 05...5a. CONTRACT NUMBER Silicon Processes for Organically Reconfigurable Techniques ( SPORT ) 5b. GRANT NUMBER FA9550-10-1-0363 5c...Contract: Silicon Processes for Organically Reconfigurable Techniques ( SPORT ) Contract #: FA9550-10-1-0363 Reporting Period: 1 July 2010 – 31 December

  2. Quantum chemistry on a superconducting quantum processor

    Energy Technology Data Exchange (ETDEWEB)

    Kaicher, Michael P.; Wilhelm, Frank K. [Theoretical Physics, Saarland University, 66123 Saarbruecken (Germany); Love, Peter J. [Department of Physics and Astronomy, Tufts University, Medford, MA 02155 (United States)

    2016-07-01

    Quantum chemistry is the most promising civilian application for quantum processors to date. We study its adaptation to superconducting (sc) quantum systems, computing the ground state energy of LiH through a variational hybrid quantum classical algorithm. We demonstrate how interactions native to sc qubits further reduce the amount of quantum resources needed, pushing sc architectures as a near-term candidate for simulations of more complex atoms/molecules.

  3. Debugging in a multi-processor environment

    International Nuclear Information System (INIS)

    Spann, J.M.

    1981-01-01

    The Supervisory Control and Diagnostic System (SCDS) for the Mirror Fusion Test Facility (MFTF) consists of nine 32-bit minicomputers arranged in a tightly coupled distributed computer system utilizing a share memory as the data exchange medium. Debugging of more than one program in the multi-processor environment is a difficult process. This paper describes what new tools were developed and how the testing of software is performed in the SCDS for the MFTF project

  4. Intelligent trigger processor for the crystal box

    International Nuclear Information System (INIS)

    Sanders, G.H.; Butler, H.S.; Cooper, M.D.

    1981-01-01

    A large solid angle modular NaI(Tl) detector with 432 phototubes and 88 trigger scintillators is being used to search simultaneously for three lepton flavor changing decays of muon. A beam of up to 10 6 muons stopping per second with a 6% duty factor would yield up to 1000 triggers per second from random triple coincidences. A reduction of the trigger rate to 10 Hz is required from a hardwired primary trigger processor described in this paper. Further reduction to < 1 Hz is achieved by a microprocessor based secondary trigger processor. The primary trigger hardware imposes voter coincidence logic, stringent timing requirements, and a non-adjacency requirement in the trigger scintillators defined by hardwired circuits. Sophisticated geometric requirements are imposed by a PROM-based matrix logic, and energy and vector-momentum cuts are imposed by a hardwired processor using LSI flash ADC's and digital arithmetic loci. The secondary trigger employs four satellite microprocessors to do a sparse data scan, multiplex the data acquisition channels and apply additional event filtering

  5. Multibus-based parallel processor for simulation

    Science.gov (United States)

    Ogrady, E. P.; Wang, C.-H.

    1983-01-01

    A Multibus-based parallel processor simulation system is described. The system is intended to serve as a vehicle for gaining hands-on experience, testing system and application software, and evaluating parallel processor performance during development of a larger system based on the horizontal/vertical-bus interprocessor communication mechanism. The prototype system consists of up to seven Intel iSBC 86/12A single-board computers which serve as processing elements, a multiple transmission controller (MTC) designed to support system operation, and an Intel Model 225 Microcomputer Development System which serves as the user interface and input/output processor. All components are interconnected by a Multibus/IEEE 796 bus. An important characteristic of the system is that it provides a mechanism for a processing element to broadcast data to other selected processing elements. This parallel transfer capability is provided through the design of the MTC and a minor modification to the iSBC 86/12A board. The operation of the MTC, the basic hardware-level operation of the system, and pertinent details about the iSBC 86/12A and the Multibus are described.

  6. Techniques for optimizing inerting in electron processors

    International Nuclear Information System (INIS)

    Rangwalla, I.J.; Korn, D.J.; Nablo, S.V.

    1993-01-01

    The design of an ''inert gas'' distribution system in an electron processor must satisfy a number of requirements. The first of these is the elimination or control of beam produced ozone and NO x which can be transported from the process zone by the product into the work area. Since the tolerable levels for O 3 in occupied areas around the processor are 3 in the beam heated process zone, or exhausting and dilution of the gas at the processor exit. The second requirement of the inerting system is to provide a suitable environment for completing efficient, free radical initiated addition polymerization. The competition between radical loss through de-excitation and that from O 2 quenching must be understood. This group has used gas chromatographic analysis of electron cured coatings to study the trade-offs of delivered dose, dose rate and O 2 concentrations in the process zone to determine the tolerable ranges of parameter excursions for production quality control purposes. These techniques are described for an ink coating system on paperboard, where a broad range of process parameters have been studied (D, D radical, O 2 ). It is then shown how the technique is used to optimize the use of higher purity (10-100 ppm O 2 ) nitrogen gas for inerting, in combination with lower purity (2-20,000 ppm O 2 ) non-cryogenically produced gas, as from a membrane or pressure swing adsorption generators. (author)

  7. Treecode with a Special-Purpose Processor

    Science.gov (United States)

    Makino, Junichiro

    1991-08-01

    We describe an implementation of the modified Barnes-Hut tree algorithm for a gravitational N-body calculation on a GRAPE (GRAvity PipE) backend processor. GRAPE is a special-purpose computer for N-body calculations. It receives the positions and masses of particles from a host computer and then calculates the gravitational force at each coordinate specified by the host. To use this GRAPE processor with the hierarchical tree algorithm, the host computer must maintain a list of all nodes that exert force on a particle. If we create this list for each particle of the system at each timestep, the number of floating-point operations on the host and that on GRAPE would become comparable, and the increased speed obtained by using GRAPE would be small. In our modified algorithm, we create a list of nodes for many particles. Thus, the amount of the work required of the host is significantly reduced. This algorithm was originally developed by Barnes in order to vectorize the force calculation on a Cyber 205. With this algorithm, the computing time of the force calculation becomes comparable to that of the tree construction, if the GRAPE backend processor is sufficiently fast. The obtained speed-up factor is 30 to 50 for a RISC-based host computer and GRAPE-1A with a peak speed of 240 Mflops.

  8. A heterogeneous CPU+GPU Poisson solver for space charge calculations in beam dynamics studies

    Energy Technology Data Exchange (ETDEWEB)

    Zheng, Dawei; Rienen, Ursula van [University of Rostock, Institute of General Electrical Engineering (Germany)

    2016-07-01

    In beam dynamics studies in accelerator physics, space charge plays a central role in the low energy regime of an accelerator. Numerical space charge calculations are required, both, in the design phase and in the operation of the machines as well. Due to its efficiency, mostly the Particle-In-Cell (PIC) method is chosen for the space charge calculation. Then, the solution of Poisson's equation for the charge distribution in the rest frame is the most prominent part within the solution process. The Poisson solver directly affects the accuracy of the self-field applied on the charged particles when the equation of motion is solved in the laboratory frame. As the Poisson solver consumes the major part of the computing time in most simulations it has to be as fast as possible since it has to be carried out once per time step. In this work, we demonstrate a novel heterogeneous CPU+GPU routine for the Poisson solver. The novel solver also benefits from our new research results on the utilization of a discrete cosine transform within the classical Hockney and Eastwood's convolution routine.

  9. Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers

    Science.gov (United States)

    Oyarzun, Guillermo; Borrell, Ricard; Gorobets, Andrey; Oliva, Assensi

    2017-10-01

    Nowadays, high performance computing (HPC) systems experience a disruptive moment with a variety of novel architectures and frameworks, without any clarity of which one is going to prevail. In this context, the portability of codes across different architectures is of major importance. This paper presents a portable implementation model based on an algebraic operational approach for direct numerical simulation (DNS) and large eddy simulation (LES) of incompressible turbulent flows using unstructured hybrid meshes. The strategy proposed consists in representing the whole time-integration algorithm using only three basic algebraic operations: sparse matrix-vector product, a linear combination of vectors and dot product. The main idea is based on decomposing the nonlinear operators into a concatenation of two SpMV operations. This provides high modularity and portability. An exhaustive analysis of the proposed implementation for hybrid CPU/GPU supercomputers has been conducted with tests using up to 128 GPUs. The main objective consists in understanding the challenges of implementing CFD codes on new architectures.

  10. Discrepancy Between Clinician and Research Assistant in TIMI Score Calculation (TRIAGED CPU

    Directory of Open Access Journals (Sweden)

    Taylor, Brian T.

    2014-11-01

    Full Text Available Introduction: Several studies have attempted to demonstrate that the Thrombolysis in Myocardial Infarction (TIMI risk score has the ability to risk stratify emergency department (ED patients with potential acute coronary syndromes (ACS. Most of the studies we reviewed relied on trained research investigators to determine TIMI risk scores rather than ED providers functioning in their normal work capacity. We assessed whether TIMI risk scores obtained by ED providers in the setting of a busy ED differed from those obtained by trained research investigators. Methods: This was an ED-based prospective observational cohort study comparing TIMI scores obtained by 49 ED providers admitting patients to an ED chest pain unit (CPU to scores generated by a team of trained research investigators. We examined provider type, patient gender, and TIMI elements for their effects on TIMI risk score discrepancy. Results: Of the 501 adult patients enrolled in the study, 29.3% of TIMI risk scores determined by ED providers and trained research investigators were generated using identical TIMI risk score variables. In our low-risk population the majority of TIMI risk score differences were small; however, 12% of TIMI risk scores differed by two or more points. Conclusion: TIMI risk scores determined by ED providers in the setting of a busy ED frequently differ from scores generated by trained research investigators who complete them while not under the same pressure of an ED provider. [West J Emerg Med. 2015;16(1:24–33.

  11. A Novel CPU/GPU Simulation Environment for Large-Scale Biologically-Realistic Neural Modeling

    Directory of Open Access Journals (Sweden)

    Roger V Hoang

    2013-10-01

    Full Text Available Computational Neuroscience is an emerging field that provides unique opportunities to studycomplex brain structures through realistic neural simulations. However, as biological details are added tomodels, the execution time for the simulation becomes longer. Graphics Processing Units (GPUs are now being utilized to accelerate simulations due to their ability to perform computations in parallel. As such, they haveshown significant improvement in execution time compared to Central Processing Units (CPUs. Most neural simulators utilize either multiple CPUs or a single GPU for better performance, but still show limitations in execution time when biological details are not sacrificed. Therefore, we present a novel CPU/GPU simulation environment for large-scale biological networks,the NeoCortical Simulator version 6 (NCS6. NCS6 is a free, open-source, parallelizable, and scalable simula-tor, designed to run on clusters of multiple machines, potentially with high performance computing devicesin each of them. It has built-in leaky-integrate-and-fire (LIF and Izhikevich (IZH neuron models, but usersalso have the capability to design their own plug-in interface for different neuron types as desired. NCS6is currently able to simulate one million cells and 100 million synapses in quasi real time by distributing dataacross these heterogeneous clusters of CPUs and GPUs.

  12. Framework for sequential approximate optimization

    NARCIS (Netherlands)

    Jacobs, J.H.; Etman, L.F.P.; Keulen, van F.; Rooda, J.E.

    2004-01-01

    An object-oriented framework for Sequential Approximate Optimization (SAO) isproposed. The framework aims to provide an open environment for thespecification and implementation of SAO strategies. The framework is based onthe Python programming language and contains a toolbox of Python

  13. Multi-processor network implementations in Multibus II and VME

    International Nuclear Information System (INIS)

    Briegel, C.

    1992-01-01

    ACNET (Fermilab Accelerator Controls Network), a proprietary network protocol, is implemented in a multi-processor configuration for both Multibus II and VME. The implementations are contrasted by the bus protocol and software design goals. The Multibus II implementation provides for multiple processors running a duplicate set of tasks on each processor. For a network connected task, messages are distributed by a network round-robin scheduler. Further, messages can be stopped, continued, or re-routed for each task by user-callable commands. The VME implementation provides for multiple processors running one task across all processors. The process can either be fixed to a particular processor or dynamically allocated to an available processor depending on the scheduling algorithm of the multi-processing operating system. (author)

  14. Sequentially pulsed traveling wave accelerator

    Science.gov (United States)

    Caporaso, George J [Livermore, CA; Nelson, Scott D [Patterson, CA; Poole, Brian R [Tracy, CA

    2009-08-18

    A sequentially pulsed traveling wave compact accelerator having two or more pulse forming lines each with a switch for producing a short acceleration pulse along a short length of a beam tube, and a trigger mechanism for sequentially triggering the switches so that a traveling axial electric field is produced along the beam tube in synchronism with an axially traversing pulsed beam of charged particles to serially impart energy to the particle beam.

  15. Merged ozone profiles from four MIPAS processors

    Science.gov (United States)

    Laeng, Alexandra; von Clarmann, Thomas; Stiller, Gabriele; Dinelli, Bianca Maria; Dudhia, Anu; Raspollini, Piera; Glatthor, Norbert; Grabowski, Udo; Sofieva, Viktoria; Froidevaux, Lucien; Walker, Kaley A.; Zehner, Claus

    2017-04-01

    The Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) was an infrared (IR) limb emission spectrometer on the Envisat platform. Currently, there are four MIPAS ozone data products, including the operational Level-2 ozone product processed at ESA, with the scientific prototype processor being operated at IFAC Florence, and three independent research products developed by the Istituto di Fisica Applicata Nello Carrara (ISAC-CNR)/University of Bologna, Oxford University, and the Karlsruhe Institute of Technology-Institute of Meteorology and Climate Research/Instituto de Astrofísica de Andalucía (KIT-IMK/IAA). Here we present a dataset of ozone vertical profiles obtained by merging ozone retrievals from four independent Level-2 MIPAS processors. We also discuss the advantages and the shortcomings of this merged product. As the four processors retrieve ozone in different parts of the spectra (microwindows), the source measurements can be considered as nearly independent with respect to measurement noise. Hence, the information content of the merged product is greater and the precision is better than those of any parent (source) dataset. The merging is performed on a profile per profile basis. Parent ozone profiles are weighted based on the corresponding error covariance matrices; the error correlations between different profile levels are taken into account. The intercorrelations between the processors' errors are evaluated statistically and are used in the merging. The height range of the merged product is 20-55 km, and error covariance matrices are provided as diagnostics. Validation of the merged dataset is performed by comparison with ozone profiles from ACE-FTS (Atmospheric Chemistry Experiment-Fourier Transform Spectrometer) and MLS (Microwave Limb Sounder). Even though the merging is not supposed to remove the biases of the parent datasets, around the ozone volume mixing ratio peak the merged product is found to have a smaller (up to 0.1 ppmv

  16. Application of the coupled code Athlet-Quabox/Cubbox for the extreme scenarios of the OECD/NRC BWR turbine trip benchmark and its performance on multi-processor computers

    International Nuclear Information System (INIS)

    Langenbuch, S.; Schmidt, K.D.; Velkov, K.

    2003-01-01

    The OECD/NRC BWR Turbine Trip (TT) Benchmark is investigated to perform code-to-code comparison of coupled codes including a comparison to measured data which are available from turbine trip experiments at Peach Bottom 2. This Benchmark problem for a BWR over-pressure transient represents a challenging application of coupled codes which integrate 3-dimensional neutron kinetics into thermal-hydraulic system codes for best-estimate simulation of plant transients. This transient represents a typical application of coupled codes which are usually performed on powerful workstations using a single CPU. Nowadays, the availability of multi-CPUs is much easier. Indeed, powerful workstations already provide 4 to 8 CPU, computer centers give access to multi-processor systems with numbers of CPUs in the order of 16 up to several 100. Therefore, the performance of the coupled code Athlet-Quabox/Cubbox on multi-processor systems is studied. Different cases of application lead to changing requirements of the code efficiency, because the amount of computer time spent in different parts of the code is varying. This paper presents main results of the coupled code Athlet-Quabox/Cubbox for the extreme scenarios of the BWR TT Benchmark together with evaluations of the code performance on multi-processor computers. (authors)

  17. A high performance image processing platform based on CPU-GPU heterogeneous cluster with parallel image reconstroctions for micro-CT

    International Nuclear Information System (INIS)

    Ding Yu; Qi Yujin; Zhang Xuezhu; Zhao Cuilan

    2011-01-01

    In this paper, we report the development of a high-performance image processing platform, which is based on CPU-GPU heterogeneous cluster. Currently, it consists of a Dell Precision T7500 and HP XW8600 workstations with parallel programming and runtime environment, using the message-passing interface (MPI) and CUDA (Compute Unified Device Architecture). We succeeded in developing parallel image processing techniques for 3D image reconstruction of X-ray micro-CT imaging. The results show that a GPU provides a computing efficiency of about 194 times faster than a single CPU, and the CPU-GPU clusters provides a computing efficiency of about 46 times faster than the CPU clusters. These meet the requirements of rapid 3D image reconstruction and real time image display. In conclusion, the use of CPU-GPU heterogeneous cluster is an effective way to build high-performance image processing platform. (authors)

  18. Computing effective properties of random heterogeneous materials on heterogeneous parallel processors

    Science.gov (United States)

    Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto

    2012-11-01

    In recent decades, finite element (FE) techniques have been extensively used for predicting effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, having a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C, and we subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing the obtained performances and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel application version features near to linear speed-up progression when using only the CPU cores. It executes more than 20 times faster when additionally using the GPU.

  19. Case Study of Using High Performance Commercial Processors in a Space Environment

    Science.gov (United States)

    Ferguson, Roscoe C.; Olivas, Zulema

    2009-01-01

    The purpose of the Space Shuttle Cockpit Avionics Upgrade project was to reduce crew workload and improve situational awareness. The upgrade was to augment the Shuttle avionics system with new hardware and software. A major success of this project was the validation of the hardware architecture and software design. This was significant because the project incorporated new technology and approaches for the development of human rated space software. An early version of this system was tested at the Johnson Space Center for one month by teams of astronauts. The results were positive, but NASA eventually cancelled the project towards the end of the development cycle. The goal to reduce crew workload and improve situational awareness resulted in the need for high performance Central Processing Units (CPUs). The choice of CPU selected was the PowerPC family, which is a reduced instruction set computer (RISC) known for its high performance. However, the requirement for radiation tolerance resulted in the reevaluation of the selected family member of the PowerPC line. Radiation testing revealed that the original selected processor (PowerPC 7400) was too soft to meet mission objectives and an effort was established to perform trade studies and performance testing to determine a feasible candidate. At that time, the PowerPC RAD750s where radiation tolerant, but did not meet the required performance needs of the project. Thus, the final solution was to select the PowerPC 7455. This processor did not have a radiation tolerant version, but faired better than the 7400 in the ability to detect failures. However, its cache tags did not provide parity and thus the project incorporated a software strategy to detect radiation failures. The strategy was to incorporate dual paths for software generating commands to the legacy Space Shuttle avionics to prevent failures due to the softness of the upgraded avionics.

  20. Digital Circuit Analysis Using an 8080 Processor.

    Science.gov (United States)

    Greco, John; Stern, Kenneth

    1983-01-01

    Presents the essentials of a program written in Intel 8080 assembly language for the steady state analysis of a combinatorial logic gate circuit. Program features and potential modifications are considered. For example, the program could also be extended to include clocked/unclocked sequential circuits. (JN)

  1. Interactive dose shaping - efficient strategies for CPU-based real-time treatment planning

    International Nuclear Information System (INIS)

    Ziegenhein, P; Kamerling, C P; Oelfke, U

    2014-01-01

    Conventional intensity modulated radiation therapy (IMRT) treatment planning is based on the traditional concept of iterative optimization using an objective function specified by dose volume histogram constraints for pre-segmented VOIs. This indirect approach suffers from unavoidable shortcomings: i) The control of local dose features is limited to segmented VOIs. ii) Any objective function is a mathematical measure of the plan quality, i.e., is not able to define the clinically optimal treatment plan. iii) Adapting an existing plan to changed patient anatomy as detected by IGRT procedures is difficult. To overcome these shortcomings, we introduce the method of Interactive Dose Shaping (IDS) as a new paradigm for IMRT treatment planning. IDS allows for a direct and interactive manipulation of local dose features in real-time. The key element driving the IDS process is a two-step Dose Modification and Recovery (DMR) strategy: A local dose modification is initiated by the user which translates into modified fluence patterns. This also affects existing desired dose features elsewhere which is compensated by a heuristic recovery process. The IDS paradigm was implemented together with a CPU-based ultra-fast dose calculation and a 3D GUI for dose manipulation and visualization. A local dose feature can be implemented via the DMR strategy within 1-2 seconds. By imposing a series of local dose features, equal plan qualities could be achieved compared to conventional planning for prostate and head and neck cases within 1-2 minutes. The idea of Interactive Dose Shaping for treatment planning has been introduced and first applications of this concept have been realized.

  2. Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease.

    Science.gov (United States)

    Shamonin, Denis P; Bron, Esther E; Lelieveldt, Boudewijn P F; Smits, Marion; Klein, Stefan; Staring, Marius

    2013-01-01

    Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e., for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: (i) parallelization on the CPU, to speed up the cost function derivative calculation; (ii) parallelization on the GPU building on and extending the OpenCL framework from ITKv4, to speed up the Gaussian pyramid computation and the image resampling step; (iii) exploitation of certain properties of the B-spline transformation model; (iv) further software optimizations. The accelerated registration tool is employed in a study on diagnostic classification of Alzheimer's disease and cognitively normal controls based on T1-weighted MRI. We selected 299 participants from the publicly available Alzheimer's Disease Neuroimaging Initiative database. Classification is performed with a support vector machine based on gray matter volumes as a marker for atrophy. We evaluated two types of strategies (voxel-wise and region-wise) that heavily rely on nonrigid image registration. Parallelization and optimization resulted in an acceleration factor of 4-5x on an 8-core machine. Using OpenCL a speedup factor of 2 was realized for computation of the Gaussian pyramids, and 15-60 for the resampling step, for larger images. The voxel-wise and the region-wise classification methods had an area under the receiver operator characteristic curve of 88 and 90%, respectively, both for standard and accelerated registration. We conclude that the image registration package elastix was substantially accelerated, with nearly identical results to the non-optimized version. The new functionality will become available in the next release of elastix as open source under the BSD license.

  3. Fast Parallel Image Registration on CPU and GPU for Diagnostic Classification of Alzheimer's Disease

    Directory of Open Access Journals (Sweden)

    Denis P Shamonin

    2014-01-01

    Full Text Available Nonrigid image registration is an important, but time-consuming taskin medical image analysis. In typical neuroimaging studies, multipleimage registrations are performed, i.e. for atlas-based segmentationor template construction. Faster image registration routines wouldtherefore be beneficial.In this paper we explore acceleration of the image registrationpackage elastix by a combination of several techniques: iparallelization on the CPU, to speed up the cost function derivativecalculation; ii parallelization on the GPU building on andextending the OpenCL framework from ITKv4, to speed up the Gaussianpyramid computation and the image resampling step; iii exploitationof certain properties of the B-spline transformation model; ivfurther software optimizations.The accelerated registration tool is employed in a study ondiagnostic classification of Alzheimer's disease and cognitivelynormal controls based on T1-weighted MRI. We selected 299participants from the publicly available Alzheimer's DiseaseNeuroimaging Initiative database. Classification is performed with asupport vector machine based on gray matter volumes as a marker foratrophy. We evaluated two types of strategies (voxel-wise andregion-wise that heavily rely on nonrigid image registration.Parallelization and optimization resulted in an acceleration factorof 4-5x on an 8-core machine. Using OpenCL a speedup factor of ~2was realized for computation of the Gaussian pyramids, and 15-60 forthe resampling step, for larger images. The voxel-wise and theregion-wise classification methods had an area under thereceiver operator characteristic curve of 88% and 90%,respectively, both for standard and accelerated registration.We conclude that the image registration package elastix wassubstantially accelerated, with nearly identical results to thenon-optimized version. The new functionality will become availablein the next release of elastix as open source under the BSD license.

  4. Modcomp MAX IV System Processors reference guide

    Energy Technology Data Exchange (ETDEWEB)

    Cummings, J.

    1990-10-01

    A user almost always faces a big problem when having to learn to use a new computer system. The information necessary to use the system is often scattered throughout many different manuals. The user also faces the problem of extracting the information really needed from each manual. Very few computer vendors supply a single Users Guide or even a manual to help the new user locate the necessary manuals. Modcomp is no exception to this, Modcomp MAX IV requires that the user be familiar with the system file usage which adds to the problem. At General Atomics there is an ever increasing need for new users to learn how to use the Modcomp computers. This paper was written to provide a condensed Users Reference Guide'' for Modcomp computer users. This manual should be of value not only to new users but any users that are not Modcomp computer systems experts. This Users Reference Guide'' is intended to provided the basic information for the use of the various Modcomp System Processors necessary to, create, compile, link-edit, and catalog a program. Only the information necessary to provide the user with a basic understanding of the Systems Processors is included. This document provides enough information for the majority of programmers to use the Modcomp computers without having to refer to any other manuals. A lot of emphasis has been placed on the file description and usage for each of the System Processors. This allows the user to understand how Modcomp MAX IV does things rather than just learning the system commands.

  5. Joint Optimized CPU and Networking Control Scheme for Improved Energy Efficiency in Video Streaming on Mobile Devices

    Directory of Open Access Journals (Sweden)

    Sung-Woong Jo

    2017-01-01

    Full Text Available Video streaming service is one of the most popular applications for mobile users. However, mobile video streaming services consume a lot of energy, resulting in a reduced battery life. This is a critical problem that results in a degraded user’s quality of experience (QoE. Therefore, in this paper, a joint optimization scheme that controls both the central processing unit (CPU and wireless networking of the video streaming process for improved energy efficiency on mobile devices is proposed. For this purpose, the energy consumption of the network interface and CPU is analyzed, and based on the energy consumption profile a joint optimization problem is formulated to maximize the energy efficiency of the mobile device. The proposed algorithm adaptively adjusts the number of chunks to be downloaded and decoded in each packet. Simulation results show that the proposed algorithm can effectively improve the energy efficiency when compared with the existing algorithms.

  6. Optical linear algebra processors - Architectures and algorithms

    Science.gov (United States)

    Casasent, David

    1986-01-01

    Attention is given to the component design and optical configuration features of a generic optical linear algebra processor (OLAP) architecture, as well as the large number of OLAP architectures, number representations, algorithms and applications encountered in current literature. Number-representation issues associated with bipolar and complex-valued data representations, high-accuracy (including floating point) performance, and the base or radix to be employed, are discussed, together with case studies on a space-integrating frequency-multiplexed architecture and a hybrid space-integrating and time-integrating multichannel architecture.

  7. The design of a graphics processor

    International Nuclear Information System (INIS)

    Holmes, M.; Thorne, A.R.

    1975-12-01

    The design of a graphics processor is described which takes into account known and anticipated user requirements, the availability of cheap minicomputers, the state of integrated circuit technology, and the overall need to minimise cost for a given performance. The main user needs are the ability to display large high resolution pictures, and to dynamically change the user's view in real time by means of fast coordinate processing hardware. The transformations that can be applied to 2D or 3D coordinates either singly or in combination are: translation, scaling, mirror imaging, rotation, and the ability to map the transformation origin on to any point on the screen. (author)

  8. Dual-scale topology optoelectronic processor.

    Science.gov (United States)

    Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H

    1991-12-15

    The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.

  9. Nuclear interactive evaluations on distributed processors

    International Nuclear Information System (INIS)

    Dix, G.E.; Congdon, S.P.

    1988-01-01

    BWR [boiling water reactor] nuclear design is a complicated process, involving trade-offs among a variety of conflicting objectives. Complex computer calculations and usually required for each design iteration. GE Nuclear Energy has implemented a system where the evaluations are performed interactively on a large number of small microcomputers. This approach minimizes the time it takes to carry out design iterations even through the processor speeds are low compared with modern super computers. All of the desktop microcomputers are linked to a common data base via an ethernet communications system so that design data can be shared and data quality can be maintained

  10. Integral Fast Reactor fuel pin processor

    International Nuclear Information System (INIS)

    Levinskas, D.

    1993-01-01

    This report discusses the pin processor which receives metal alloy pins cast from recycled Integral Fast Reactor (IFR) fuel and prepares them for assembly into new IFR fuel elements. Either full length as-cast or precut pins are fed to the machine from a magazine, cut if necessary, and measured for length, weight, diameter and deviation from straightness. Accepted pins are loaded into cladding jackets located in a magazine, while rejects and cutting scraps are separated into trays. The magazines, trays, and the individual modules that perform the different machine functions are assembled and removed using remote manipulators and master-slaves

  11. Lattice gauge theory using parallel processors

    International Nuclear Information System (INIS)

    Lee, T.D.; Chou, K.C.; Zichichi, A.

    1987-01-01

    The book's contents include: Lattice Gauge Theory Lectures: Introduction and Current Fermion Simulations; Monte Carlo Algorithms for Lattice Gauge Theory; Specialized Computers for Lattice Gauge Theory; Lattice Gauge Theory at Finite Temperature: A Monte Carlo Study; Computational Method - An Elementary Introduction to the Langevin Equation, Present Status of Numerical Quantum Chromodynamics; Random Lattice Field Theory; The GF11 Processor and Compiler; and The APE Computer and First Physics Results; Columbia Supercomputer Project: Parallel Supercomputer for Lattice QCD; Statistical and Systematic Errors in Numerical Simulations; Monte Carlo Simulation for LGT and Programming Techniques on the Columbia Supercomputer; Food for Thought: Five Lectures on Lattice Gauge Theory

  12. Introduction to programming multiple-processor computers

    International Nuclear Information System (INIS)

    Hicks, H.R.; Lynch, V.E.

    1985-04-01

    FORTRAN applications programs can be executed on multiprocessor computers in either a unitasking (traditional) or multitasking form. The latter allows a single job to use more than one processor simultaneously, with a consequent reduction in wall-clock time and, perhaps, the cost of the calculation. An introduction to programming in this environment is presented. The concepts of synchronization and data sharing using EVENTS and LOCKS are illustrated with examples. The strategy of strong synchronization and the use of synchronization templates are proposed. We emphasize that incorrect multitasking programs can produce irreproducible results, which makes debugging more difficult

  13. Remarks on sequential designs in risk assessment

    International Nuclear Information System (INIS)

    Seidenfeld, T.

    1982-01-01

    The special merits of sequential designs are reviewed in light of particular challenges that attend risk assessment for human population. The kinds of ''statistical inference'' are distinguished and the problem of design which is pursued is the clash between Neyman-Pearson and Bayesian programs of sequential design. The value of sequential designs is discussed and the Neyman-Pearson vs. Bayesian sequential designs are probed in particular. Finally, warnings with sequential designs are considered, especially in relation to utilitarianism

  14. Improvement of the real-time processor in JT-60 data processing system

    International Nuclear Information System (INIS)

    Sakata, S.; Kiyono, K.; Sato, M.; Kominato, T.; Sueoka, M.; Hosoyama, H.; Kawamata, Y.

    2009-01-01

    Real-time processor, RTP is a basic subsystem in the JT-60 data processing system and plays an important role in JT-60 feedback control for plasma experiment. During the experiment, RTP acquires various diagnostic signals, processes them into a form of physical values, and transfers them as sensor signals to the particle supply and heating control supervisor for feedback control via reflective memory synchronization with 1 ms clock signals. After the start of RTP operation in 1997, to meet the demand for advanced plasma experiment, RTP had been improved continuously such as by addition of diagnostic signals with faster digitizers, reducing time for data transfer utilizing reflective memory instead of CAMAC. However, it is becoming increasingly difficult to maintain, manage, and improve the outdated RTP with limited system CPU capability. Currently, a prototype RTP system is being developed for the next real-time processing system, which is composed of clustered system utilizing VxWorks computer. The processes on the existing RTP system will be decentralized to the VxWorks computer to solve the issues of the existing RTP system. The prototype RTP system will start to operate in August 2008.

  15. EXPERIENCE WITH FPGA-BASED PROCESSOR CORE AS FRONT-END COMPUTER

    International Nuclear Information System (INIS)

    HOFF, L.T.

    2005-01-01

    The RHIC control system architecture follows the familiar ''standard model''. LINUX workstations are used as operator consoles. Front-end computers are distributed around the accelerator, close to equipment being controlled or monitored. These computers are generally based on VMEbus CPU modules running the VxWorks operating system. I/O is typically performed via the VMEbus, or via PMC daughter cards (via an internal PCI bus), or via on-board I/O interfaces (Ethernet or serial). Advances in FPGA size and sophistication now permit running virtual processor ''cores'' within the FPGA logic, including ''cores'' with advanced features such as memory management. Such systems offer certain advantages over traditional VMEbus Front-end computers. Advantages include tighter coupling with FPGA logic, and therefore higher I/O bandwidth, and flexibility in packaging, possibly resulting in a lower noise environment and/or lower cost. This paper presents the experience acquired while porting the RHIC control system to a PowerPC 405 core within a Xilinx FPGA for use in low-level RF control

  16. Recommending the heterogeneous cluster type multi-processor system computing

    International Nuclear Information System (INIS)

    Iijima, Nobukazu

    2010-01-01

    Real-time reactor simulator had been developed by reusing the equipment of the Musashi reactor and its performance improvement became indispensable for research tools to increase sampling rate with introduction of arithmetic units using multi-Digital Signal Processor(DSP) system (cluster). In order to realize the heterogeneous cluster type multi-processor system computing, combination of two kinds of Control Processor (CP) s, Cluster Control Processor (CCP) and System Control Processor (SCP), were proposed with Large System Control Processor (LSCP) for hierarchical cluster if needed. Faster computing performance of this system was well evaluated by simulation results for simultaneous execution of plural jobs and also pipeline processing between clusters, which showed the system led to effective use of existing system and enhancement of the cost performance. (T. Tanaka)

  17. Cpu/gpu Computing for AN Implicit Multi-Block Compressible Navier-Stokes Solver on Heterogeneous Platform

    Science.gov (United States)

    Deng, Liang; Bai, Hanli; Wang, Fang; Xu, Qingxin

    2016-06-01

    CPU/GPU computing allows scientists to tremendously accelerate their numerical codes. In this paper, we port and optimize a double precision alternating direction implicit (ADI) solver for three-dimensional compressible Navier-Stokes equations from our in-house Computational Fluid Dynamics (CFD) software on heterogeneous platform. First, we implement a full GPU version of the ADI solver to remove a lot of redundant data transfers between CPU and GPU, and then design two fine-grain schemes, namely “one-thread-one-point” and “one-thread-one-line”, to maximize the performance. Second, we present a dual-level parallelization scheme using the CPU/GPU collaborative model to exploit the computational resources of both multi-core CPUs and many-core GPUs within the heterogeneous platform. Finally, considering the fact that memory on a single node becomes inadequate when the simulation size grows, we present a tri-level hybrid programming pattern MPI-OpenMP-CUDA that merges fine-grain parallelism using OpenMP and CUDA threads with coarse-grain parallelism using MPI for inter-node communication. We also propose a strategy to overlap the computation with communication using the advanced features of CUDA and MPI programming. We obtain speedups of 6.0 for the ADI solver on one Tesla M2050 GPU in contrast to two Xeon X5670 CPUs. Scalability tests show that our implementation can offer significant performance improvement on heterogeneous platform.

  18. SSC 254 Screen-Based Word Processors: Production Tests. The Lanier Word Processor.

    Science.gov (United States)

    Moyer, Ruth A.

    Designed for use in Trident Technical College's Secretarial Lab, this series of 12 production tests focuses on the use of the Lanier Word Processor for a variety of tasks. In tests 1 and 2, students are required to type and print out letters. Tests 3 through 8 require students to reformat a text; make corrections on a letter; divide and combine…

  19. Multiprocessor Real-Time Scheduling with Hierarchical Processor Affinities

    OpenAIRE

    Bonifaci , Vincenzo; Brandenburg , Björn; D'Angelo , Gianlorenzo; Marchetti-Spaccamela , Alberto

    2016-01-01

    International audience; Many multiprocessor real-time operating systems offer the possibility to restrict the migrations of any task to a specified subset of processors by setting affinity masks. A notion of " strong arbitrary processor affinity scheduling " (strong APA scheduling) has been proposed; this notion avoids schedulability losses due to overly simple implementations of processor affinities. Due to potential overheads, strong APA has not been implemented so far in a real-time operat...

  20. Sequential versus simultaneous market delineation

    DEFF Research Database (Denmark)

    Haldrup, Niels; Møllgaard, Peter; Kastberg Nielsen, Claus

    2005-01-01

    and geographical markets. Using a unique data setfor prices of Norwegian and Scottish salmon, we propose a methodologyfor simultaneous market delineation and we demonstrate that comparedto a sequential approach conclusions will be reversed.JEL: C3, K21, L41, Q22Keywords: Relevant market, econometric delineation......Delineation of the relevant market forms a pivotal part of most antitrustcases. The standard approach is sequential. First the product marketis delineated, then the geographical market is defined. Demand andsupply substitution in both the product dimension and the geographicaldimension...

  1. Sequential logic analysis and synthesis

    CERN Document Server

    Cavanagh, Joseph

    2007-01-01

    Until now, there was no single resource for actual digital system design. Using both basic and advanced concepts, Sequential Logic: Analysis and Synthesis offers a thorough exposition of the analysis and synthesis of both synchronous and asynchronous sequential machines. With 25 years of experience in designing computing equipment, the author stresses the practical design of state machines. He clearly delineates each step of the structured and rigorous design principles that can be applied to practical applications. The book begins by reviewing the analysis of combinatorial logic and Boolean a

  2. Coordinated Energy Management in Heterogeneous Processors

    Directory of Open Access Journals (Sweden)

    Indrani Paul

    2014-01-01

    Full Text Available This paper examines energy management in a heterogeneous processor consisting of an integrated CPU–GPU for high-performance computing (HPC applications. Energy management for HPC applications is challenged by their uncompromising performance requirements and complicated by the need for coordinating energy management across distinct core types – a new and less understood problem. We examine the intra-node CPU–GPU frequency sensitivity of HPC applications on tightly coupled CPU–GPU architectures as the first step in understanding power and performance optimization for a heterogeneous multi-node HPC system. The insights from this analysis form the basis of a coordinated energy management scheme, called DynaCo, for integrated CPU–GPU architectures. We implement DynaCo on a modern heterogeneous processor and compare its performance to a state-of-the-art power- and performance-management algorithm. DynaCo improves measured average energy-delay squared (ED2 product by up to 30% with less than 2% average performance loss across several exascale and other HPC workloads.

  3. Expert System Constant False Alarm Rate (CFAR) Processor

    National Research Council Canada - National Science Library

    Wicks, Michael C

    2006-01-01

    An artificial intelligence system improves radar signal processor performance by increasing target probability of detection and reducing probability of false alarm in a severe radar clutter environment...

  4. Fast track trigger processor for the OPAL detector at LEP

    Energy Technology Data Exchange (ETDEWEB)

    Carter, A A; Carter, J R; Ward, D R; Heuer, R D; Jaroslawski, S; Wagner, A

    1986-09-20

    A fast hardware track trigger processor being built for the OPAL experiment is described. The processor will analyse data from the central drift chambers of OPAL to determine whether any tracks come from the interaction region, and thereby eliminate background events. The processor will find tracks over a large angular range, vertical strokecos thetavertical stroke < or approx. 0.95. The design of the processor is described, together with a brief account of its hardware implementation for OPAL. The results of feasibility studies are also presented.

  5. Special processor for in-core control systems

    International Nuclear Information System (INIS)

    Golovanov, M.N.; Duma, V.R.; Levin, G.L.; Mel'nikov, A.V.; Polikanin, A.V.; Filatov, V.P.

    1978-01-01

    The BUTs-20 special processor is discussed, designed to control the units of the in-core control equipment which are incorporated into the VECTOR communication channel, and to provide preliminary data processing prior to computer calculations. A set of instructions and flowsheet of the processor, organization of its communication with memories and other units of the system are given. The processor components: a control unit and an arithmetic logical unit are discussed. It is noted that the special processor permits more effective utilization of the computer time

  6. Development of level 2 processor for the readout of TMC

    International Nuclear Information System (INIS)

    Arai, Y.; Ikeno, M.; Murata, T.; Sudo, F.; Emura, T.

    1995-01-01

    We have developed a prototype 8-bit processor for the level 2 data processing for the Time Memory Cell (TMC). The first prototype processor successfully runs with 18 MHz clock. The operation of same clock frequency as TMC (30 MHz) will be easily achieved with simple modifications. Although the processor is very primitive one but shows its powerful performance and flexibility. To realize the compact TMC/L2P (Level 2 Processor) system, it is better to include the microcode memory within the chip. Encoding logic of the microcode must be included to reduce the microcode memory in this case. (J.P.N.)

  7. Evaluation Using Sequential Trials Methods.

    Science.gov (United States)

    Cohen, Mark E.; Ralls, Stephen A.

    1986-01-01

    Although dental school faculty as well as practitioners are interested in evaluating products and procedures used in clinical practice, research design and statistical analysis can sometimes pose problems. Sequential trials methods provide an analytical structure that is both easy to use and statistically valid. (Author/MLW)

  8. Attack Trees with Sequential Conjunction

    NARCIS (Netherlands)

    Jhawar, Ravi; Kordy, Barbara; Mauw, Sjouke; Radomirović, Sasa; Trujillo-Rasua, Rolando

    2015-01-01

    We provide the first formal foundation of SAND attack trees which are a popular extension of the well-known attack trees. The SAND at- tack tree formalism increases the expressivity of attack trees by intro- ducing the sequential conjunctive operator SAND. This operator enables the modeling of

  9. Dynamic anticipatory processing of hierarchical sequential events: a common role for Broca's area and ventral premotor cortex across domains?

    Science.gov (United States)

    Fiebach, Christian J; Schubotz, Ricarda I

    2006-05-01

    This paper proposes a domain-general model for the functional contribution of ventral premotor cortex (PMv) and adjacent Broca's area to perceptual, cognitive, and motor processing. We propose to understand this frontal region as a highly flexible sequence processor, with the PMv mapping sequential events onto stored structural templates and Broca's Area involved in more complex, hierarchical or hypersequential processing. This proposal is supported by reference to previous functional neuroimaging studies investigating abstract sequence processing and syntactic processing.

  10. Environmentally adaptive processing for shallow ocean applications: A sequential Bayesian approach.

    Science.gov (United States)

    Candy, J V

    2015-09-01

    The shallow ocean is a changing environment primarily due to temperature variations in its upper layers directly affecting sound propagation throughout. The need to develop processors capable of tracking these changes implies a stochastic as well as an environmentally adaptive design. Bayesian techniques have evolved to enable a class of processors capable of performing in such an uncertain, nonstationary (varying statistics), non-Gaussian, variable shallow ocean environment. A solution to this problem is addressed by developing a sequential Bayesian processor capable of providing a joint solution to the modal function tracking and environmental adaptivity problem. Here, the focus is on the development of both a particle filter and an unscented Kalman filter capable of providing reasonable performance for this problem. These processors are applied to hydrophone measurements obtained from a vertical array. The adaptivity problem is attacked by allowing the modal coefficients and/or wavenumbers to be jointly estimated from the noisy measurement data along with tracking of the modal functions while simultaneously enhancing the noisy pressure-field measurements.

  11. Generation of sea ice geophysical flux estimates utilizing a multisensor data processor in preparation for the RADARSAT and EOS eras

    International Nuclear Information System (INIS)

    Holt, B.; Kwok, R.; Carsey, F.; Curlander, J.

    1991-01-01

    A geophysical processor for deriving sea ice type and ice motion information from sequential SAR image data has been designed and is in implementation phase for use with ERS-1 SAR data at the Alaska SAR Facility (ASF). This SAR ice data processor, called the ASF Geophysical Processing System, or ASF-GPS, will be in place for launch in May 1991. Descriptions of the salient aspects of ASF-GPS and its current status are presented. The next step in the evolution of processors for geophysical descriptions of sea ice is now in design phase; it involves the utilization of data from other sensors and sources and the generation of higher-level products. The augmented data are environmental, e.g., weather agency analyses, satellite-derived surface temperatures and drifting buoy data. These data serve to (1) improve the performance of the basic data product generation, the ice type and motion data sets, by increasing accuracy and shortening processing time, and (2) extend the level of the data products by computation of key geophysical fluxes. Geophysical quantities required from the sea ice processor include the surface heat, momentum, brine and freshwater fluxes, radiation balance, snow cover, melt pond cover and thermodynamic state. The estimation of two of these fluxes, brine and freshwater, is discussed, and the requirements for suitable environmental data are also presented. Finally, the system design of the ASF-GPS and the follow-on processor, designed initially to utilize SAR data from RADARSAT with weather and other inputs, e.g., AVHRR, and, after upgrade, from the suite of EOS instruments, will be presented. As now envisioned this system will have layered architecture with major branches in data management, user interface and science data analysis and will serve as a prototype design for a wide range of applications

  12. On-board landmark navigation and attitude reference parallel processor system

    Science.gov (United States)

    Gilbert, L. E.; Mahajan, D. T.

    1978-01-01

    An approach to autonomous navigation and attitude reference for earth observing spacecraft is described along with the landmark identification technique based on a sequential similarity detection algorithm (SSDA). Laboratory experiments undertaken to determine if better than one pixel accuracy in registration can be achieved consistent with onboard processor timing and capacity constraints are included. The SSDA is implemented using a multi-microprocessor system including synchronization logic and chip library. The data is processed in parallel stages, effectively reducing the time to match the small known image within a larger image as seen by the onboard image system. Shared memory is incorporated in the system to help communicate intermediate results among microprocessors. The functions include finding mean values and summation of absolute differences over the image search area. The hardware is a low power, compact unit suitable to onboard application with the flexibility to provide for different parameters depending upon the environment.

  13. Aspects of computation on asynchronous parallel processors

    International Nuclear Information System (INIS)

    Wright, M.

    1989-01-01

    The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues

  14. Efficient quantum walk on a quantum processor

    Science.gov (United States)

    Qiang, Xiaogang; Loke, Thomas; Montanaro, Ashley; Aungskunsiri, Kanin; Zhou, Xiaoqi; O'Brien, Jeremy L.; Wang, Jingbo B.; Matthews, Jonathan C. F.

    2016-01-01

    The random walk formalism is used across a wide range of applications, from modelling share prices to predicting population genetics. Likewise, quantum walks have shown much potential as a framework for developing new quantum algorithms. Here we present explicit efficient quantum circuits for implementing continuous-time quantum walks on the circulant class of graphs. These circuits allow us to sample from the output probability distributions of quantum walks on circulant graphs efficiently. We also show that solving the same sampling problem for arbitrary circulant quantum circuits is intractable for a classical computer, assuming conjectures from computational complexity theory. This is a new link between continuous-time quantum walks and computational complexity theory and it indicates a family of tasks that could ultimately demonstrate quantum supremacy over classical computers. As a proof of principle, we experimentally implement the proposed quantum circuit on an example circulant graph using a two-qubit photonics quantum processor. PMID:27146471

  15. The ALICE Central Trigger Processor (CTP) upgrade

    International Nuclear Information System (INIS)

    Krivda, M.; Alexandre, D.; Barnby, L.S.; Evans, D.; Jones, P.G.; Jusko, A.; Lietava, R.; Baillie, O. Villalobos; Pospíšil, J.

    2016-01-01

    The ALICE Central Trigger Processor (CTP) at the CERN LHC has been upgraded for LHC Run 2, to improve the Transition Radiation Detector (TRD) data-taking efficiency and to improve the physics performance of ALICE. There is a new additional CTP interaction record sent using a new second Detector Data Link (DDL), a 2 GB DDR3 memory and an extension of functionality for classes. The CTP switch has been incorporated directly onto the new LM0 board. A design proposal for an ALICE CTP upgrade for LHC Run 3 is also presented. Part of the development is a low latency high bandwidth interface whose purpose is to minimize an overall trigger latency

  16. Processor-in-memory-and-storage architecture

    Science.gov (United States)

    DeBenedictis, Erik

    2018-01-02

    A method and apparatus for performing reliable general-purpose computing. Each sub-core of a plurality of sub-cores of a processor core processes a same instruction at a same time. A code analyzer receives a plurality of residues that represents a code word corresponding to the same instruction and an indication of whether the code word is a memory address code or a data code from the plurality of sub-cores. The code analyzer determines whether the plurality of residues are consistent or inconsistent. The code analyzer and the plurality of sub-cores perform a set of operations based on whether the code word is a memory address code or a data code and a determination of whether the plurality of residues are consistent or inconsistent.

  17. Optimal processor for malfunction detection in operating nuclear reactor

    International Nuclear Information System (INIS)

    Ciftcioglu, O.

    1990-01-01

    An optimal processor for diagnosing operational transients in a nuclear reactor is described. Basic design of the processor involves real-time processing of noise signal obtained from a particular in core sensor and the optimality is based on minimum alarm failure in contrast to minimum false alarm criterion from the safe and reliable plant operation viewpoint

  18. Sojourn time tails in processor-sharing systems

    NARCIS (Netherlands)

    Egorova, R.R.

    2009-01-01

    The processor-sharing discipline was originally introduced as a modeling abstraction for the design and performance analysis of the processing unit of a computer system. Under the processor-sharing discipline, all active tasks are assumed to be processed simultaneously, receiving an equal share of

  19. ACP/R3000 processors in data acquisition systems

    International Nuclear Information System (INIS)

    Deppe, J.; Areti, H.; Atac, R.

    1989-02-01

    We describe ACP/R3000 processor based data acquisition systems for high energy physics. This VME bus compatible processor board, with a computational power equivalent to 15 VAX 11/780s or better, contains 8 Mb of memory for event buffering and has a high speed secondary bus that allows data gathering from front end electronics. 2 refs., 3 figs

  20. On the effective parallel programming of multi-core processors

    NARCIS (Netherlands)

    Varbanescu, A.L.

    2010-01-01

    Multi-core processors are considered now the only feasible alternative to the large single-core processors which have become limited by technological aspects such as power consumption and heat dissipation. However, due to their inherent parallel structure and their diversity, multi-cores are

  1. Bank switched memory interface for an image processor

    International Nuclear Information System (INIS)

    Barron, M.; Downward, J.

    1980-09-01

    A commercially available image processor is interfaced to a PDP-11/45 through an 8K window of memory addresses. When the image processor was not in use it was desired to be able to use the 8K address space as real memory. The standard method of accomplishing this would have been to use UNIBUS switches to switch in either the physical 8K bank of memory or the image processor memory. This method has the disadvantage of being rather expensive. As a simple alternative, a device was built to selectively enable or disable either an 8K bank of memory or the image processor memory. To enable the image processor under program control, GEN is contracted in size, the memory is disabled, a device partition for the image processor is created above GEN, and the image processor memory is enabled. The process is reversed to restore memory to GEN. The hardware to enable/disable the image and computer memories is controlled using spare bits from a DR-11K output register. The image processor and physical memory can be switched in or out on line with no adverse affects on the system's operation

  2. Digital image processing software system using an array processor

    International Nuclear Information System (INIS)

    Sherwood, R.J.; Portnoff, M.R.; Journeay, C.H.; Twogood, R.E.

    1981-01-01

    A versatile array processor-based system for general-purpose image processing was developed. At the heart of this system is an extensive, flexible software package that incorporates the array processor for effective interactive image processing. The software system is described in detail, and its application to a diverse set of applications at LLNL is briefly discussed. 4 figures, 1 table

  3. Designing a dataflow processor using CλaSH

    NARCIS (Netherlands)

    Niedermeier, A.; Wester, Rinse; Wester, Rinse; Rovers, K.C.; Baaij, C.P.R.; Kuper, Jan; Smit, Gerardus Johannes Maria

    2010-01-01

    In this paper we show how a simple dataflow processor can be fully implemented using CλaSH, a high level HDL based on the functional programming language Haskell. The processor was described using Haskell, the CλaSH compiler was then used to translate the design into a fully synthesisable VHDL code.

  4. Biomass is beginning to threaten the wood-processors

    International Nuclear Information System (INIS)

    Beer, G.; Sobinkovic, B.

    2004-01-01

    In this issue an exploitation of biomass in Slovak Republic is analysed. Some new projects of constructing of the stoke-holds for biomass processing are published. The grants for biomass are ascending the prices of wood raw material, which is thus becoming less accessible for the wood-processors. An excessive wood export threatens the domestic processors

  5. Digital Signal Processor System for AC Power Drivers

    Directory of Open Access Journals (Sweden)

    Ovidiu Neamtu

    2009-10-01

    Full Text Available DSP (Digital Signal Processor is the bestsolution for motor control systems to make possible thedevelopment of advanced motor drive systems. The motorcontrol processor calculates the required motor windingvoltage magnitude and frequency to operate the motor atthe desired speed. A PWM (Pulse Width Modulationcircuit controls the on and off duty cycle of the powerinverter switches to vary the magnitude of the motorvoltages.

  6. Evaluation of the Intel Sandy Bridge-EP server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2012-01-01

    In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing an 8-core “Sandy Bridge-EP” processor with Intel’s previous microarchitecture, the “Westmere-EP”. The Intel marketing names for these processors are “Xeon E5-2600 processor series” and “Xeon 5600 processor series”, respectively. Both processors are produced in a 32nm process, and both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores ...

  7. Recursive Matrix Inverse Update On An Optical Processor

    Science.gov (United States)

    Casasent, David P.; Baranoski, Edward J.

    1988-02-01

    A high accuracy optical linear algebraic processor (OLAP) using the digital multiplication by analog convolution (DMAC) algorithm is described for use in an efficient matrix inverse update algorithm with speed and accuracy advantages. The solution of the parameters in the algorithm are addressed and the advantages of optical over digital linear algebraic processors are advanced.

  8. CPU0213, a novel endothelin type A and type B receptor antagonist, protects against myocardial ischemia/reperfusion injury in rats

    Directory of Open Access Journals (Sweden)

    Z.Y. Wang

    2011-11-01

    Full Text Available The efficacy of endothelin receptor antagonists in protecting against myocardial ischemia/reperfusion (I/R injury is controversial, and the mechanisms remain unclear. The aim of this study was to investigate the effects of CPU0123, a novel endothelin type A and type B receptor antagonist, on myocardial I/R injury and to explore the mechanisms involved. Male Sprague-Dawley rats weighing 200-250 g were randomized to three groups (6-7 per group: group 1, Sham; group 2, I/R + vehicle. Rats were subjected to in vivo myocardial I/R injury by ligation of the left anterior descending coronary artery and 0.5% sodium carboxymethyl cellulose (1 mL/kg was injected intraperitoneally immediately prior to coronary occlusion. Group 3, I/R + CPU0213. Rats were subjected to identical surgical procedures and CPU0213 (30 mg/kg was injected intraperitoneally immediately prior to coronary occlusion. Infarct size, cardiac function and biochemical changes were measured. CPU0213 pretreatment reduced infarct size as a percentage of the ischemic area by 44.5% (I/R + vehicle: 61.3 ± 3.2 vs I/R + CPU0213: 34.0 ± 5.5%, P < 0.05 and improved ejection fraction by 17.2% (I/R + vehicle: 58.4 ± 2.8 vs I/R + CPU0213: 68.5 ± 2.2%, P < 0.05 compared to vehicle-treated animals. This protection was associated with inhibition of myocardial inflammation and oxidative stress. Moreover, reduction in Akt (protein kinase B and endothelial nitric oxide synthase (eNOS phosphorylation induced by myocardial I/R injury was limited by CPU0213 (P < 0.05. These data suggest that CPU0123, a non-selective antagonist, has protective effects against myocardial I/R injury in rats, which may be related to the Akt/eNOS pathway.

  9. Acoustooptic linear algebra processors - Architectures, algorithms, and applications

    Science.gov (United States)

    Casasent, D.

    1984-01-01

    Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.

  10. APRON: A Cellular Processor Array Simulation and Hardware Design Tool

    Science.gov (United States)

    Barr, David R. W.; Dudek, Piotr

    2009-12-01

    We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.

  11. Multiple Embedded Processors for Fault-Tolerant Computing

    Science.gov (United States)

    Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

    2005-01-01

    A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.

  12. Experimental testing of the noise-canceling processor.

    Science.gov (United States)

    Collins, Michael D; Baer, Ralph N; Simpson, Harry J

    2011-09-01

    Signal-processing techniques for localizing an acoustic source buried in noise are tested in a tank experiment. Noise is generated using a discrete source, a bubble generator, and a sprinkler. The experiment has essential elements of a realistic scenario in matched-field processing, including complex source and noise time series in a waveguide with water, sediment, and multipath propagation. The noise-canceling processor is found to outperform the Bartlett processor and provide the correct source range for signal-to-noise ratios below -10 dB. The multivalued Bartlett processor is found to outperform the Bartlett processor but not the noise-canceling processor. © 2011 Acoustical Society of America

  13. Simulation of a processor switching circuit with APLSV

    International Nuclear Information System (INIS)

    Dilcher, H.

    1979-01-01

    The report describes the simulation of a processor switching circuit with APL. Furthermore an APL function is represented to simulate a processor in an assembly like language. Both together serve as a tool for studying processor properties. By means of the programming function it is also possible to program other simulated processors. The processor is to be used in the processing of data in real time analysis that occur in high energy physics experiments. The data are already offered to the computer in digitalized form. A typical data rate is at 10 KB/ sec. The data are structured in blocks. The particular blocks are 1 KB wide and are independent from each other. Aprocessor has to decide, whether the block data belong to an event that is part of the backround noise and can therefore be forgotten, or whether the data should be saved for a later evaluation. (orig./WB) [de

  14. New development for low energy electron beam processor

    International Nuclear Information System (INIS)

    Takei, Taro; Goto, Hitoshi; Oizumi, Matsutoshi; Hirakawa, Tetsuya; Ochi, Masafumi

    2003-01-01

    Newly developed low-energy electron beam (EB) processors that have unique designs and configurations compared to conventional ones enable electron-beam treatment of small three-dimensional objects, such as grain-like agricultural products and small plastic parts. As the EB processor can irradiate the products from the whole angles, the uniform EB treatment can be achieved at one time regardless the complex shapes of the product. Here presented are two new EB processors: the first system has cylindrical process zone, which allows three-dimensional objects to be irradiated with one-pass treatment. The second is a tube-type small EB processor, achieving not only its compactor design, but also higher beam extraction efficiency and flexible installation of the irradiation heads. The basic design of each processor and potential applications with them will be presented in this paper. (author)

  15. MPC Related Computational Capabilities of ARMv7A Processors

    DEFF Research Database (Denmark)

    Frison, Gianluca; Jørgensen, John Bagterp

    2015-01-01

    In recent years, the mass market of mobile devices has pushed the demand for increasingly fast but cheap processors. ARM, the world leader in this sector, has developed the Cortex-A series of processors with focus on computationally intensive applications. If properly programmed, these processors...... are powerful enough to solve the complex optimization problems arising in MPC in real-time, while keeping the traditional low-cost and low-power consumption. This makes these processors ideal candidates for use in embedded MPC. In this paper, we investigate the floating-point capabilities of Cortex A7, A9...... and A15 and show how to exploit the unique features of each processor to obtain the best performance, in the context of a novel implementation method for the linear-algebra routines used in MPC solvers. This method adapts high-performance computing techniques to the needs of embedded MPC. In particular...

  16. APRON: A Cellular Processor Array Simulation and Hardware Design Tool

    Directory of Open Access Journals (Sweden)

    David R. W. Barr

    2009-01-01

    Full Text Available We present a software environment for the efficient simulation of cellular processor arrays (CPAs. This software (APRON is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.

  17. SAD PROCESSOR FOR MULTIPLE MACROBLOCK MATCHING IN FAST SEARCH VIDEO MOTION ESTIMATION

    Directory of Open Access Journals (Sweden)

    Nehal N. Shah

    2015-02-01

    Full Text Available Motion estimation is a very important but computationally complex task in video coding. Process of determining motion vectors based on the temporal correlation of consecutive frame is used for video compression. In order to reduce the computational complexity of motion estimation and maintain the quality of encoding during motion compensation, different fast search techniques are available. These block based motion estimation algorithms use the sum of absolute difference (SAD between corresponding macroblock in current frame and all the candidate macroblocks in the reference frame to identify best match. Existing implementations can perform SAD between two blocks using sequential or pipeline approach but performing multi operand SAD in single clock cycle with optimized recourses is state of art. In this paper various parallel architectures for computation of the fixed block size SAD is evaluated and fast parallel SAD architecture is proposed with optimized resources. Further SAD processor is described with 9 processing elements which can be configured for any existing fast search block matching algorithm. Proposed SAD processor consumes 7% fewer adders compared to existing implementation for one processing elements. Using nine PE it can process 84 HD frames per second in worse case which is good outcome for real time implementation. In average case architecture process 325 HD frames per second.

  18. Fundamental physics issues of multilevel logic in developing a parallel processor.

    Science.gov (United States)

    Bandyopadhyay, Anirban; Miki, Kazushi

    2007-06-01

    In the last century, On and Off physical switches, were equated with two decisions 0 and 1 to express every information in terms of binary digits and physically realize it in terms of switches connected in a circuit. Apart from memory-density increase significantly, more possible choices in particular space enables pattern-logic a reality, and manipulation of pattern would allow controlling logic, generating a new kind of processor. Neumann's computer is based on sequential logic, processing bits one by one. But as pattern-logic is generated on a surface, viewing whole pattern at a time is a truly parallel processing. Following Neumann's and Shannons fundamental thermodynamical approaches we have built compatible model based on series of single molecule based multibit logic systems of 4-12 bits in an UHV-STM. On their monolayer multilevel communication and pattern formation is experimentally verified. Furthermore, the developed intelligent monolayer is trained by Artificial Neural Network. Therefore fundamental weak interactions for the building of truly parallel processor are explored here physically and theoretically.

  19. Multi-agent sequential hypothesis testing

    KAUST Repository

    Kim, Kwang-Ki K.; Shamma, Jeff S.

    2014-01-01

    incorporate costs of taking private/public measurements, costs of time-difference and disagreement in actions of agents, and costs of false declaration/choices in the sequential hypothesis testing. The corresponding sequential decision processes have well

  20. Launching applications on compute and service processors running under different operating systems in scalable network of processor boards with routers

    Science.gov (United States)

    Tomkins, James L [Albuquerque, NM; Camp, William J [Albuquerque, NM

    2009-03-17

    A multiple processor computing apparatus includes a physical interconnect structure that is flexibly configurable to support selective segregation of classified and unclassified users. The physical interconnect structure also permits easy physical scalability of the computing apparatus. The computing apparatus can include an emulator which permits applications from the same job to be launched on processors that use different operating systems.

  1. Robustness of the Sequential Lineup Advantage

    Science.gov (United States)

    Gronlund, Scott D.; Carlson, Curt A.; Dailey, Sarah B.; Goodsell, Charles A.

    2009-01-01

    A growing movement in the United States and around the world involves promoting the advantages of conducting an eyewitness lineup in a sequential manner. We conducted a large study (N = 2,529) that included 24 comparisons of sequential versus simultaneous lineups. A liberal statistical criterion revealed only 2 significant sequential lineup…

  2. Sequential Probability Ration Tests : Conservative and Robust

    NARCIS (Netherlands)

    Kleijnen, J.P.C.; Shi, Wen

    2017-01-01

    In practice, most computers generate simulation outputs sequentially, so it is attractive to analyze these outputs through sequential statistical methods such as sequential probability ratio tests (SPRTs). We investigate several SPRTs for choosing between two hypothesized values for the mean output

  3. Random sequential adsorption of cubes

    Science.gov (United States)

    Cieśla, Michał; Kubala, Piotr

    2018-01-01

    Random packings built of cubes are studied numerically using a random sequential adsorption algorithm. To compare the obtained results with previous reports, three different models of cube orientation sampling were used. Also, three different cube-cube intersection algorithms were tested to find the most efficient one. The study focuses on the mean saturated packing fraction as well as kinetics of packing growth. Microstructural properties of packings were analyzed using density autocorrelation function.

  4. FPGA Implementation of Decimal Processors for Hardware Acceleration

    DEFF Research Database (Denmark)

    Borup, Nicolas; Dindorp, Jonas; Nannarelli, Alberto

    2011-01-01

    Applications in non-conventional number systems can benefit from accelerators implemented on reconfigurable platforms, such as Field Programmable Gate-Arrays (FPGAs). In this paper, we show that applications requiring decimal operations, such as the ones necessary in accounting or financial trans...... execution on the CPU of the hosting computer....

  5. Evaluation of Selected Resource Allocation and Scheduling Methods in Heterogeneous Many-Core Processors and Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Ciznicki Milosz

    2014-12-01

    Full Text Available Heterogeneous many-core computing resources are increasingly popular among users due to their improved performance over homogeneous systems. Many developers have realized that heterogeneous systems, e.g. a combination of a shared memory multi-core CPU machine with massively parallel Graphics Processing Units (GPUs, can provide significant performance opportunities to a wide range of applications. However, the best overall performance can only be achieved if application tasks are efficiently assigned to different types of processor units in time taking into account their specific resource requirements. Additionally, one should note that available heterogeneous resources have been designed as general purpose units, however, with many built-in features accelerating specific application operations. In other words, the same algorithm or application functionality can be implemented as a different task for CPU or GPU. Nevertheless, from the perspective of various evaluation criteria, e.g. the total execution time or energy consumption, we may observe completely different results. Therefore, as tasks can be scheduled and managed in many alternative ways on both many-core CPUs or GPUs and consequently have a huge impact on the overall computing resources performance, there are needs for new and improved resource management techniques. In this paper we discuss results achieved during experimental performance studies of selected task scheduling methods in heterogeneous computing systems. Additionally, we present a new architecture for resource allocation and task scheduling library which provides a generic application programming interface at the operating system level for improving scheduling polices taking into account a diversity of tasks and heterogeneous computing resources characteristics.

  6. First level trigger processor for the ZEUS calorimeter

    International Nuclear Information System (INIS)

    Dawson, J.W.; Talaga, R.L.; Burr, G.W.; Laird, R.J.; Smith, W.; Lackey, J.

    1990-01-01

    This paper discusses the design of the first level trigger processor for the ZEUS calorimeter. This processor accepts data from the 13,000 photomultipliers of the calorimeter which is topologically divided into 16 regions, and after regional preprocessing, performs logical and numerical operations which cross regional boundaries. Because the crossing period at the HERA collider is 96 ns, it is necessary that first-level trigger decisions be made in pipelined hardware. One microsecond is allowed for the processor to perform the required logical and numerical operations, during which time the data from ten crossings would be resident in the processor while being clocked through the pipelined hardware. The circuitry is implemented in 100K ECL, Advanced CMOS discrete devices, and programmable gate arrays, and operates in a VME environment. All tables and registers are written/read from VME, and all diagnostic codes are executed from VME. Preprocessed data flows into the processor at a rate of 5.2GB/s, and processed data flows from the processor to the Global First-Level Trigger at a rate of 700MB/s. The system allows for subsets of the logic to be configured by software and for various important variables to be histogrammed as they flow through the processor. 2 refs., 3 figs

  7. A dedicated line-processor as used at the SHF

    International Nuclear Information System (INIS)

    Bevan, A.V.; Hatley, R.W.; Price, D.R.; Rankin, P.

    1985-01-01

    A hardwired trigger processor was used at the SLAC Hybrid Facility to find evidence for charged tracks originating from the fiducial volume of a 40'' rapidcycling bubble chamber. Straight-line projections of these tracks in the plane perpendicular to the applied magnetic field were searched for using data from three sets of proportional wire chambers (PWC). This information was made directly available to the processor by means of a special digitizing card. The results memory of the processor simulated read-only memory in a 168/E processor and was accessible by it. The 168/E controlled the issuing of a trigger command to the bubble chamber flash tubes. The same design of digitizer card used by the line processor was incorporated into the 168/E, again as read only memory, which allowed it access to the raw data for continual monitoring of trigger integrity. The design logic of the trigger processor was verified by running real PWC data through a FORTRAN simulation of the hardware. This enabled the debugging to become highly automated since a step by step, computer controlled comparison of processor registers to simulation predictions could be made

  8. First-level trigger processor for the ZEUS calorimeter

    International Nuclear Information System (INIS)

    Dawson, J.W.; Talaga, R.L.; Burr, G.W.; Laird, R.J.; Smith, W.; Lackey, J.

    1990-01-01

    The design of the first-level trigger processor for the Zeus calorimeter is discussed. This processor accepts data from the 13,000 photomultipliers of the calorimeter, which is topologically divided into 16 regions, and after regional preprocessing performs logical and numerical operations that cross regional boundaries. Because the crossing period at the HERA collider is 96 ns, it is necessary that first-level trigger decisions be made in pipelined hardware. One microsecond is allowed for the processor to perform the required logical and numerical operations, during which time the data from ten crossings would be resident in the processor while being clocked through the pipelined hardware. The circuitry is implemented in 100K emitter-coupled logic (ECL), advanced CMOS discrete devices and programmable gate arrays, and operates in a VME environment. All tables and registers are written/read from VME, and all diagnostic codes are executed from VME. Preprocessed data flows into the processor at a rate of 5.2 Gbyte/s, and processed data flows from the processor to the global first-level trigger at a rate of 70 Mbyte/s. The system allows for subsets of the logic to be configured by software and for various important variables to be histogrammed as they flow through the processor

  9. On the incidence of meteorological and hydrological processors: Effect of resolution, sharpness and reliability of hydrological ensemble forecasts

    Science.gov (United States)

    Abaza, Mabrouk; Anctil, François; Fortin, Vincent; Perreault, Luc

    2017-12-01

    Meteorological and hydrological ensemble prediction systems are imperfect. Their outputs could often be improved through the use of a statistical processor, opening up the question of the necessity of using both processors (meteorological and hydrological), only one of them, or none. This experiment compares the predictive distributions from four hydrological ensemble prediction systems (H-EPS) utilising the Ensemble Kalman filter (EnKF) probabilistic sequential data assimilation scheme. They differ in the inclusion or not of the Distribution Based Scaling (DBS) method for post-processing meteorological forecasts and the ensemble Bayesian Model Averaging (ensemble BMA) method for hydrological forecast post-processing. The experiment is implemented on three large watersheds and relies on the combination of two meteorological reforecast products: the 4-member Canadian reforecasts from the Canadian Centre for Meteorological and Environmental Prediction (CCMEP) and the 10-member American reforecasts from the National Oceanic and Atmospheric Administration (NOAA), leading to 14 members at each time step. Results show that all four tested H-EPS lead to resolution and sharpness values that are quite similar, with an advantage to DBS + EnKF. The ensemble BMA is unable to compensate for any bias left in the precipitation ensemble forecasts. On the other hand, it succeeds in calibrating ensemble members that are otherwise under-dispersed. If reliability is preferred over resolution and sharpness, DBS + EnKF + ensemble BMA performs best, making use of both processors in the H-EPS system. Conversely, for enhanced resolution and sharpness, DBS is the preferred method.

  10. Robust real-time pattern matching using bayesian sequential hypothesis testing.

    Science.gov (United States)

    Pele, Ofir; Werman, Michael

    2008-08-01

    This paper describes a method for robust real time pattern matching. We first introduce a family of image distance measures, the "Image Hamming Distance Family". Members of this family are robust to occlusion, small geometrical transforms, light changes and non-rigid deformations. We then present a novel Bayesian framework for sequential hypothesis testing on finite populations. Based on this framework, we design an optimal rejection/acceptance sampling algorithm. This algorithm quickly determines whether two images are similar with respect to a member of the Image Hamming Distance Family. We also present a fast framework that designs a near-optimal sampling algorithm. Extensive experimental results show that the sequential sampling algorithm performance is excellent. Implemented on a Pentium 4 3 GHz processor, detection of a pattern with 2197 pixels, in 640 x 480 pixel frames, where in each frame the pattern rotated and was highly occluded, proceeds at only 0.022 seconds per frame.

  11. Novel memory architecture for video signal processor

    Science.gov (United States)

    Hung, Jen-Sheng; Lin, Chia-Hsing; Jen, Chein-Wei

    1993-11-01

    An on-chip memory architecture for video signal processor (VSP) is proposed. This memory structure is a two-level design for the different data locality in video applications. The upper level--Memory A provides enough storage capacity to reduce the impact on the limitation of chip I/O bandwidth, and the lower level--Memory B provides enough data parallelism and flexibility to meet the requirements of multiple reconfigurable pipeline function units in a single VSP chip. The needed memory size is decided by the memory usage analysis for video algorithms and the number of function units. Both levels of memory adopted a dual-port memory scheme to sustain the simultaneous read and write operations. Especially, Memory B uses multiple one-read-one-write memory banks to emulate the real multiport memory. Therefore, one can change the configuration of Memory B to several sets of memories with variable read/write ports by adjusting the bus switches. Then the numbers of read ports and write ports in proposed memory can meet requirement of data flow patterns in different video coding algorithms. We have finished the design of a prototype memory design using 1.2- micrometers SPDM SRAM technology and will fabricated it through TSMC, in Taiwan.

  12. A CNN-Specific Integrated Processor

    Directory of Open Access Journals (Sweden)

    Suleyman Malki

    2009-01-01

    Full Text Available Integrated Processors (IP are algorithm-specific cores that either by programming or by configuration can be re-used within many microelectronic systems. This paper looks at Cellular Neural Networks (CNN to become realized as IP. First current digital implementations are reviewed, and the memoryprocessor bandwidth issues are analyzed. Then a generic view is taken on the structure of the network, and a new intra-communication protocol based on rotating wheels is proposed. It is shown that this provides for guaranteed high-performance with a minimal network interface. The resulting node is small and supports multi-level CNN designs, giving the system a 30-fold increase in capacity compared to classical designs. As it facilitates multiple operations on a single image, and single operations on multiple images, with minimal access to the external image memory, balancing the internal and external data transfer requirements optimizes the system operation. In conventional digital CNN designs, the treatment of boundary nodes requires additional logic to handle the CNN value propagation scheme. In the new architecture, only a slight modification of the existing cells is necessary to model the boundary effect. A typical prototype for visual pattern recognition will house 4096 CNN cells with a 2% overhead for making it an IP.

  13. The ATLAS fast tracker processor design

    CERN Document Server

    Volpi, Guido; Albicocco, Pietro; Alison, John; Ancu, Lucian Stefan; Anderson, James; Andari, Nansi; Andreani, Alessandro; Andreazza, Attilio; Annovi, Alberto; Antonelli, Mario; Asbah, Needa; Atkinson, Markus; Baines, J; Barberio, Elisabetta; Beccherle, Roberto; Beretta, Matteo; Biesuz, Nicolo Vladi; Blair, R E; Bogdan, Mircea; Boveia, Antonio; Britzger, Daniel; Bryant, Partick; Burghgrave, Blake; Calderini, Giovanni; Camplani, Alessandra; Cavaliere, Viviana; Cavasinni, Vincenzo; Chakraborty, Dhiman; Chang, Philip; Cheng, Yangyang; Citraro, Saverio; Citterio, Mauro; Crescioli, Francesco; Dawe, Noel; Dell'Orso, Mauro; Donati, Simone; Dondero, Paolo; Drake, G; Gadomski, Szymon; Gatta, Mauro; Gentsos, Christos; Giannetti, Paola; Gkaitatzis, Stamatios; Gramling, Johanna; Howarth, James William; Iizawa, Tomoya; Ilic, Nikolina; Jiang, Zihao; Kaji, Toshiaki; Kasten, Michael; Kawaguchi, Yoshimasa; Kim, Young Kee; Kimura, Naoki; Klimkovich, Tatsiana; Kolb, Mathis; Kordas, K; Krizka, Karol; Kubota, T; Lanza, Agostino; Li, Ho Ling; Liberali, Valentino; Lisovyi, Mykhailo; Liu, Lulu; Love, Jeremy; Luciano, Pierluigi; Luongo, Carmela; Magalotti, Daniel; Maznas, Ioannis; Meroni, Chiara; Mitani, Takashi; Nasimi, Hikmat; Negri, Andrea; Neroutsos, Panos; Neubauer, Mark; Nikolaidis, Spiridon; Okumura, Y; Pandini, Carlo; Petridou, Chariclia; Piendibene, Marco; Proudfoot, James; Rados, Petar Kevin; Roda, Chiara; Rossi, Enrico; Sakurai, Yuki; Sampsonidis, Dimitrios; Saxon, James; Schmitt, Stefan; Schoening, Andre; Shochet, Mel; Shoijaii, Jafar; Soltveit, Hans Kristian; Sotiropoulou, Calliope-Louisa; Stabile, Alberto; Swiatlowski, Maximilian J; Tang, Fukun; Taylor, Pierre Thor Elliot; Testa, Marianna; Tompkins, Lauren; Vercesi, V; Wang, Rui; Watari, Ryutaro; Zhang, Jianhong; Zeng, Jian Cong; Zou, Rui; Bertolucci, Federico

    2015-01-01

    The extended use of tracking information at the trigger level in the LHC is crucial for the trigger and data acquisition (TDAQ) system to fulfill its task. Precise and fast tracking is important to identify specific decay products of the Higgs boson or new phenomena, as well as to distinguish the contributions coming from the many collisions that occur at every bunch crossing. However, track reconstruction is among the most demanding tasks performed by the TDAQ computing farm; in fact, complete reconstruction at full Level-1 trigger accept rate (100 kHz) is not possible. In order to overcome this limitation, the ATLAS experiment is planning the installation of a dedicated processor, the Fast Tracker (FTK), which is aimed at achieving this goal. The FTK is a pipeline of high performance electronics, based on custom and commercial devices, which is expected to reconstruct, with high resolution, the trajectories of charged-particle tracks with a transverse momentum above 1 GeV, using the ATLAS inner tracker info...

  14. Preventing Precipitation in the ISS Urine Processor

    Science.gov (United States)

    Muirhead, Dean; Carter, Layne; Williamson, Jill; Chambers, Antja

    2017-01-01

    The ISS Urine Processor Assembly (UPA) was initially designed to achieve 85% recovery of water from pretreated urine on ISS. Pretreated urine is comprised of crew urine treated with flush water, an oxidant (chromium trioxide), and an inorganic acid (sulfuric acid) to control microbial growth and inhibit precipitation. Unfortunately, initial operation of the UPA on ISS resulted in the precipitation of calcium sulfate at 85% recovery. This occurred because the calcium concentration in the crew urine was elevated in microgravity due to bone loss. The higher calcium concentration precipitated with sulfate from the pretreatment acid, resulting in a failure of the UPA due to the accumulation of solids in the Distillation Assembly. Since this failure, the UPA has been limited to a reduced recovery of water from urine to prevent calcium sulfate from reaching the solubility limit. NASA personnel have worked to identify a solution that would allow the UPA to return to a nominal recovery rate of 85%. This effort has culminated with the development of a pretreatment based on phosphoric acid instead of sulfuric acid. By eliminating the sulfate associated with the pretreatment, the brine can be concentrated to a much higher concentration before calcium sulfate reach the solubility limit. This paper summarizes the development of this pretreatment and the testing performed to verify its implementation on ISS.

  15. Multipurpose silicon photonics signal processor core.

    Science.gov (United States)

    Pérez, Daniel; Gasulla, Ivana; Crudgington, Lee; Thomson, David J; Khokhar, Ali Z; Li, Ke; Cao, Wei; Mashanovich, Goran Z; Capmany, José

    2017-09-21

    Integrated photonics changes the scaling laws of information and communication systems offering architectural choices that combine photonics with electronics to optimize performance, power, footprint, and cost. Application-specific photonic integrated circuits, where particular circuits/chips are designed to optimally perform particular functionalities, require a considerable number of design and fabrication iterations leading to long development times. A different approach inspired by electronic Field Programmable Gate Arrays is the programmable photonic processor, where a common hardware implemented by a two-dimensional photonic waveguide mesh realizes different functionalities through programming. Here, we report the demonstration of such reconfigurable waveguide mesh in silicon. We demonstrate over 20 different functionalities with a simple seven hexagonal cell structure, which can be applied to different fields including communications, chemical and biomedical sensing, signal processing, multiprocessor networks, and quantum information systems. Our work is an important step toward this paradigm.Integrated optical circuits today are typically designed for a few special functionalities and require complex design and development procedures. Here, the authors demonstrate a reconfigurable but simple silicon waveguide mesh with different functionalities.

  16. Element Load Data Processor (ELDAP) Users Manual

    Science.gov (United States)

    Ramsey, John K., Jr.; Ramsey, John K., Sr.

    2015-01-01

    Often, the shear and tensile forces and moments are extracted from finite element analyses to be used in off-line calculations for evaluating the integrity of structural connections involving bolts, rivets, and welds. Usually the maximum forces and moments are desired for use in the calculations. In situations where there are numerous structural connections of interest for numerous load cases, the effort in finding the true maximum force and/or moment combinations among all fasteners and welds and load cases becomes difficult. The Element Load Data Processor (ELDAP) software described herein makes this effort manageable. This software eliminates the possibility of overlooking the worst-case forces and moments that could result in erroneous positive margins of safety and/or selecting inconsistent combinations of forces and moments resulting in false negative margins of safety. In addition to forces and moments, any scalar quantity output in a PATRAN report file may be evaluated with this software. This software was originally written to fill an urgent need during the structural analysis of the Ares I-X Interstage segment. As such, this software was coded in a straightforward manner with no effort made to optimize or minimize code or to develop a graphical user interface.

  17. Scientific Computing Kernels on the Cell Processor

    Energy Technology Data Exchange (ETDEWEB)

    Williams, Samuel W.; Shalf, John; Oliker, Leonid; Kamil, Shoaib; Husbands, Parry; Yelick, Katherine

    2007-04-04

    The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the recently-released STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. First, we introduce a performance model for Cell and apply it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs. The difficulty of programming Cell, which requires assembly level intrinsics for the best performance, makes this model useful as an initial step in algorithm design and evaluation. Next, we validate the accuracy of our model by comparing results against published hardware results, as well as our own implementations on a 3.2GHz Cell blade. Additionally, we compare Cell performance to benchmarks run on leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1E) architectures. Our work also explores several different mappings of the kernels and demonstrates a simple and effective programming model for Cell's unique architecture. Finally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.

  18. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    Science.gov (United States)

    Hristov, Ivan; Goranov, Goran; Hristova, Radoslava

    2018-02-01

    We consider an interesting from computational point of view standing wave simulation by solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which explores both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy equivalent Intel architectures: 2× Xeon E5-2695 v2 processors, (code-named "Ivy Bridge-EP") in the Hybrilit cluster, and Xeon Phi 7250 processor (code-named "Knights Landing" (KNL). The results show 2 times better performance on KNL processor.

  19. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    Directory of Open Access Journals (Sweden)

    Hristov Ivan

    2018-01-01

    Full Text Available We consider an interesting from computational point of view standing wave simulation by solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which explores both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy equivalent Intel architectures: 2× Xeon E5-2695 v2 processors, (code-named “Ivy Bridge-EP” in the Hybrilit cluster, and Xeon Phi 7250 processor (code-named “Knights Landing” (KNL. The results show 2 times better performance on KNL processor.

  20. Median and Morphological Specialized Processors for a Real-Time Image Data Processing

    Directory of Open Access Journals (Sweden)

    Kazimierz Wiatr

    2002-01-01

    Full Text Available This paper presents the considerations on selecting a multiprocessor MISD architecture for fast implementation of the vision image processing. Using the author′s earlier experience with real-time systems, implementing of specialized hardware processors based on the programmable FPGA systems has been proposed in the pipeline architecture. In particular, the following processors are presented: median filter and morphological processor. The structure of a universal reconfigurable processor developed has been proposed as well. Experimental results are presented as delays on LCA level implementation for median filter, morphological processor, convolution processor, look-up-table processor, logic processor and histogram processor. These times compare with delays in general purpose processor and DSP processor.

  1. Reconfigurable VLIW Processor for Software Defined Radio, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — We will design and formally verify a VLIW processor that is radiation-hardened, and where the VLIW instructions consist of predicated RISC instructions from the...

  2. Detailed algorithmic description of a processor: a recipe for ...

    African Journals Online (AJOL)

    International Journal of Natural and Applied Sciences ... a simple developed compiler could generate the code of a simple programming language. ... It should be noted that such code generation must be done on a particular processor- for ...

  3. Analysis of Intel IA-64 Processor Support for Secure Systems

    National Research Council Canada - National Science Library

    Unalmis, Bugra

    2001-01-01

    .... Systems could be constructed for which serious security threats would be eliminated. This thesis explores the Intel IA-64 processor's hardware support and its relationship to software for building a secure system...

  4. Optical backplane interconnect switch for data processors and computers

    Science.gov (United States)

    Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.

    1989-01-01

    An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.

  5. High-speed packet filtering utilizing stream processors

    Science.gov (United States)

    Hummel, Richard J.; Fulp, Errin W.

    2009-04-01

    Parallel firewalls offer a scalable architecture for the next generation of high-speed networks. While these parallel systems can be implemented using multiple firewalls, the latest generation of stream processors can provide similar benefits with a significantly reduced latency due to locality. This paper describes how the Cell Broadband Engine (CBE), a popular stream processor, can be used as a high-speed packet filter. Results show the CBE can potentially process packets arriving at a rate of 1 Gbps with a latency less than 82 μ-seconds. Performance depends on how well the packet filtering process is translated to the unique stream processor architecture. For example the method used for transmitting data and control messages among the pseudo-independent processor cores has a significant impact on performance. Experimental results will also show the current limitations of a CBE operating system when used to process packets. Possible solutions to these issues will be discussed.

  6. 2009 Survey of Gulf of Mexico Dockside Seafood Processors

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This survey gathered and analyze economic data from seafood processors throughout the states in the Gulf region. The survey sought to collect financial variables...

  7. Huffman-based code compression techniques for embedded processors

    KAUST Repository

    Bonny, Mohamed Talal; Henkel, Jö rg

    2010-01-01

    % for ARM and MIPS, respectively. In our compression technique, we have conducted evaluations using a representative set of applications and we have applied each technique to two major embedded processor architectures, namely ARM and MIPS. © 2010 ACM.

  8. High-Performance Linear Algebra Processor using FPGA

    National Research Council Canada - National Science Library

    Johnson, J

    2004-01-01

    With recent advances in FPGA (Field Programmable Gate Array) technology it is now feasible to use these devices to build special purpose processors for floating point intensive applications that arise in scientific computing...

  9. Particle simulation on a distributed memory highly parallel processor

    International Nuclear Information System (INIS)

    Sato, Hiroyuki; Ikesaka, Morio

    1990-01-01

    This paper describes parallel molecular dynamics simulation of atoms governed by local force interaction. The space in the model is divided into cubic subspaces and mapped to the processor array of the CAP-256, a distributed memory, highly parallel processor developed at Fujitsu Labs. We developed a new technique to avoid redundant calculation of forces between atoms in different processors. Experiments showed the communication overhead was less than 5%, and the idle time due to load imbalance was less than 11% for two model problems which contain 11,532 and 46,128 argon atoms. From the software simulation, the CAP-II which is under development is estimated to be about 45 times faster than CAP-256 and will be able to run the same problem about 40 times faster than Fujitsu's M-380 mainframe when 256 processors are used. (author)

  10. Radiation Tolerant Software Defined Video Processor, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — MaXentric's is proposing a radiation tolerant Software Define Video Processor, codenamed SDVP, for the problem of advanced motion imaging in the space environment....

  11. Assembly processor program converts symbolic programming language to machine language

    Science.gov (United States)

    Pelto, E. V.

    1967-01-01

    Assembly processor program converts symbolic programming language to machine language. This program translates symbolic codes into computer understandable instructions, assigns locations in storage for successive instructions, and computer locations from symbolic addresses.

  12. Xeon Phi - A comparison between the newly introduced MIC architecture and a standard CPU through three types of problems.

    OpenAIRE

    Kristiansen, Joakim

    2016-01-01

    As Moore s law continues, processors keep getting more cores packed together on the chip. This thesis is an empirical study of the rather newly introduced Intel Many Integrated Core (IMIC) architecture found in the Intel Xeon Phi. With roughly 60 cores connected by a high performance on-die interconnect, the Intel Xeon Phi makes an interesting candidate for High Performance Computing. By digging into parallel algorithms solving three well known problems, our goal is to optimize, test and comp...

  13. Suboptimal processor for anomaly detection for system surveillance and diagnosis

    Energy Technology Data Exchange (ETDEWEB)

    Ciftcioglu, Oe.; Hoogenboom, J.E.; Dam, H. van

    1989-06-01

    Anomaly detection for nuclear reactor surveillance and diagnosis is described. The residual noise obtained as a result of autoregressive (AR) modelling is essential to obtain high sensitivity for anomaly detection. By means of the method of hypothesis testing a suboptimal anomaly detection processor is devised for system surveillance and diagnosis. Experiments are carried out to investigate the performance of the processor, which is in particular of interest for on-line and real-time applications.

  14. Reducing Competitive Cache Misses in Modern Processor Architectures

    OpenAIRE

    Prisagjanec, Milcho; Mitrevski, Pece

    2017-01-01

    The increasing number of threads inside the cores of a multicore processor, and competitive access to the shared cache memory, become the main reasons for an increased number of competitive cache misses and performance decline. Inevitably, the development of modern processor architectures leads to an increased number of cache misses. In this paper, we make an attempt to implement a technique for decreasing the number of competitive cache misses in the first level of cache memory. This tec...

  15. UA1 upgrade first-level calorimeter trigger processor

    International Nuclear Information System (INIS)

    Bains, N.; Charlton, D.; Ellis, N.; Garvey, J.; Gregory, J.; Jimack, M.P.; Jovanovic, P.; Kenyon, I.R.; Baird, S.A.; Campbell, D.; Cawthraw, M.; Coughlan, J.; Flynn, P.; Galagedera, S.; Grayer, G.; Halsall, R.; Shah, T.P.; Stephens, R.; Eisenhandler, E.; Fensome, I.; Landon, M.

    1989-01-01

    A new first-level trigger processor has been built for the UA1 experiment on the Cern SppS Collider. The processor exploits the fine granularity of the new UA1 uranium-TMP calorimeter to improve the selectivity of the trigger. The new electron trigger has improved hadron jet rejection, achieved by requiring low energy deposition around the electromagnetic cluster. A missing transverse energy trigger and a total energy trigger have also been implemented. (orig.)

  16. GA103: A microprogrammable processor for online filtering

    International Nuclear Information System (INIS)

    Calzas, A.; Danon, G.; Bouquet, B.

    1981-01-01

    GA 103 is a 16 bit microprogrammable processor which emulates the PDP 11 instruction set. It is based on the Am 2900 slices. It allows user-implemented microinstructions and addition of hardwired processors. It will perform on-line filtering tasks in the NA 14 experiment at CERN, based on the reconstruction of transverse momentum of photons detected in a lead glass calorimeter. (orig.)

  17. Preliminary Study of Image Reconstruction Algorithm on a Digital Signal Processor

    Science.gov (United States)

    2014-03-01

    5.2 Comparison of CPU-GPU, CPU-FPGA, and CPU-DSP Designs The work for implementing VHDL description of the back-projection algorithm on a physical...FPGA was not complete. Hence, the DSP implementation results are compared with the simulated results for the VHDL design. Simulating VHDL provides an...rather than at the software level. Depending on an application’s characteristics, FPGA implementations can provide a significant performance

  18. Sequential series for nuclear reactions

    International Nuclear Information System (INIS)

    Izumo, Ko

    1975-01-01

    A new time-dependent treatment of nuclear reactions is given, in which the wave function of compound nucleus is expanded by a sequential series of the reaction processes. The wave functions of the sequential series form another complete set of compound nucleus at the limit Δt→0. It is pointed out that the wave function is characterized by the quantities: the number of degrees of freedom of motion n, the period of the motion (Poincare cycle) tsub(n), the delay time t sub(nμ) and the relaxation time tausub(n) to the equilibrium of compound nucleus, instead of the usual quantum number lambda, the energy eigenvalue Esub(lambda) and the total width GAMMAsub(lambda) of resonance levels, respectively. The transition matrix elements and the yields of nuclear reactions also become the functions of time given by the Fourier transform of the usual ones. The Poincare cycles of compound nuclei are compared with the observed correlations among resonance levels, which are about 10 -17 --10 -16 sec for medium and heavy nuclei and about 10 -20 sec for the intermediate resonances. (auth.)

  19. The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

    CERN Document Server

    Biesuz, Nicolo Vladi; The ATLAS collaboration; Luciano, Pierluigi; Magalotti, Daniel; Rossi, Enrico

    2015-01-01

    The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on the use of ASICs (AM chips) designed on purpose to execute pattern matching with a high degree of parallelism. It finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 2 Gb/s serial links. This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. We report on the performance of the intermedia...

  20. The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

    CERN Document Server

    Andreani, A; The ATLAS collaboration; Beccherle, R; Beretta, M; Cipriani, R; Citraro, S; Citterio, M; Colombo, A; Crescioli, F; Dimas, D; Donati, S; Giannetti, P; Kordas, K; Lanza, A; Liberali, V; Luciano, P; Magalotti, D; Neroutsos, P; Nikolaidis, S; Piendibene, M; Sakellariou, A; Shojaii, S; Sotiropoulou, C-L; Stabile, A

    2014-01-01

    The Associative Memory (AM) system of the FTK processor has been designed to perform pattern matching using the hit information of the ATLAS silicon tracker. The AM is the heart of the FTK and it finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside the FTK, multiple designs and tests have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 2 Gb/s serial links. This paper reports on the design of the Serial Link Processor consisting of the AM chip, an ASIC designed and optimized to perform pattern matching, and two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. Special relevance will be given to the AMchip design that includes two custom cells optimized for low consumption. We repo...

  1. The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

    CERN Document Server

    Biesuz, Nicolo Vladi; The ATLAS collaboration; Luciano, Pierluigi; Magalotti, Daniel; Rossi, Enrico

    2015-01-01

    The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on the use of ASICs (AM chips) designed to execute pattern matching with a high degree of parallelism. The AM system finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 828 2 Gbit/s serial links for a total in/out bandwidth of 56 Gb/s. This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. ...

  2. Reconfigurable signal processor designs for advanced digital array radar systems

    Science.gov (United States)

    Suarez, Hernan; Zhang, Yan (Rockee); Yu, Xining

    2017-05-01

    The new challenges originated from Digital Array Radar (DAR) demands a new generation of reconfigurable backend processor in the system. The new FPGA devices can support much higher speed, more bandwidth and processing capabilities for the need of digital Line Replaceable Unit (LRU). This study focuses on using the latest Altera and Xilinx devices in an adaptive beamforming processor. The field reprogrammable RF devices from Analog Devices are used as analog front end transceivers. Different from other existing Software-Defined Radio transceivers on the market, this processor is designed for distributed adaptive beamforming in a networked environment. The following aspects of the novel radar processor will be presented: (1) A new system-on-chip architecture based on Altera's devices and adaptive processing module, especially for the adaptive beamforming and pulse compression, will be introduced, (2) Successful implementation of generation 2 serial RapidIO data links on FPGA, which supports VITA-49 radio packet format for large distributed DAR processing. (3) Demonstration of the feasibility and capabilities of the processor in a Micro-TCA based, SRIO switching backplane to support multichannel beamforming in real-time. (4) Application of this processor in ongoing radar system development projects, including OU's dual-polarized digital array radar, the planned new cylindrical array radars, and future airborne radars.

  3. PixonVision real-time video processor

    Science.gov (United States)

    Puetter, R. C.; Hier, R. G.

    2007-09-01

    PixonImaging LLC and DigiVision, Inc. have developed a real-time video processor, the PixonVision PV-200, based on the patented Pixon method for image deblurring and denoising, and DigiVision's spatially adaptive contrast enhancement processor, the DV1000. The PV-200 can process NTSC and PAL video in real time with a latency of 1 field (1/60 th of a second), remove the effects of aerosol scattering from haze, mist, smoke, and dust, improve spatial resolution by up to 2x, decrease noise by up to 6x, and increase local contrast by up to 8x. A newer version of the processor, the PV-300, is now in prototype form and can handle high definition video. Both the PV-200 and PV-300 are FPGA-based processors, which could be spun into ASICs if desired. Obvious applications of these processors include applications in the DOD (tanks, aircraft, and ships), homeland security, intelligence, surveillance, and law enforcement. If developed into an ASIC, these processors will be suitable for a variety of portable applications, including gun sights, night vision goggles, binoculars, and guided munitions. This paper presents a variety of examples of PV-200 processing, including examples appropriate to border security, battlefield applications, port security, and surveillance from unmanned aerial vehicles.

  4. Review of trigger and on-line processors at SLAC

    International Nuclear Information System (INIS)

    Lankford, A.J.

    1984-07-01

    The role of trigger and on-line processors in reducing data rates to manageable proportions in e + e - physics experiments is defined not by high physics or background rates, but by the large event sizes of the general-purpose detectors employed. The rate of e + e - annihilation is low, and backgrounds are not high; yet the number of physics processes which can be studied is vast and varied. This paper begins by briefly describing the role of trigger processors in the e + e - context. The usual flow of the trigger decision process is illustrated with selected examples of SLAC trigger processing. The features are mentioned of triggering at the SLC and the trigger processing plans of the two SLC detectors: The Mark II and the SLD. The most common on-line processors at SLAC, the BADC, the SLAC Scanner Processor, the SLAC FASTBUS Controller, and the VAX CAMAC Channel, are discussed. Uses of the 168/E, 3081/E, and FASTBUS VAX processors are mentioned. The manner in which these processors are interfaced and the function they serve on line is described. Finally, the accelerator control system for the SLC is outlined. This paper is a survey in nature, and hence, relies heavily upon references to previous publications for detailed description of work mentioned here. 27 references, 9 figures, 1 table

  5. High-Speed General Purpose Genetic Algorithm Processor.

    Science.gov (United States)

    Hoseini Alinodehi, Seyed Pourya; Moshfe, Sajjad; Saber Zaeimian, Masoumeh; Khoei, Abdollah; Hadidi, Khairollah

    2016-07-01

    In this paper, an ultrafast steady-state genetic algorithm processor (GAP) is presented. Due to the heavy computational load of genetic algorithms (GAs), they usually take a long time to find optimum solutions. Hardware implementation is a significant approach to overcome the problem by speeding up the GAs procedure. Hence, we designed a digital CMOS implementation of GA in [Formula: see text] process. The proposed processor is not bounded to a specific application. Indeed, it is a general-purpose processor, which is capable of performing optimization in any possible application. Utilizing speed-boosting techniques, such as pipeline scheme, parallel coarse-grained processing, parallel fitness computation, parallel selection of parents, dual-population scheme, and support for pipelined fitness computation, the proposed processor significantly reduces the processing time. Furthermore, by relying on a built-in discard operator the proposed hardware may be used in constrained problems that are very common in control applications. In the proposed design, a large search space is achievable through the bit string length extension of individuals in the genetic population by connecting the 32-bit GAPs. In addition, the proposed processor supports parallel processing, in which the GAs procedure can be run on several connected processors simultaneously.

  6. A UNIX-based prototype biomedical virtual image processor

    International Nuclear Information System (INIS)

    Fahy, J.B.; Kim, Y.

    1987-01-01

    The authors have developed a multiprocess virtual image processor for the IBM PC/AT, in order to maximize image processing software portability for biomedical applications. An interprocess communication scheme, based on two-way metacode exchange, has been developed and verified for this purpose. Application programs call a device-independent image processing library, which transfers commands over a shared data bridge to one or more Autonomous Virtual Image Processors (AVIP). Each AVIP runs as a separate process in the UNIX operating system, and implements the device-independent functions on the image processor to which it corresponds. Application programs can control multiple image processors at a time, change the image processor configuration used at any time, and are completely portable among image processors for which an AVIP has been implemented. Run-time speeds have been found to be acceptable for higher level functions, although rather slow for lower level functions, owing to the overhead associated with sending commands and data over the shared data bridge

  7. A digital retina-like low-level vision processor.

    Science.gov (United States)

    Mertoguno, S; Bourbakis, N G

    2003-01-01

    This correspondence presents the basic design and the simulation of a low level multilayer vision processor that emulates to some degree the functional behavior of a human retina. This retina-like multilayer processor is the lower part of an autonomous self-organized vision system, called Kydon, that could be used on visually impaired people with a damaged visual cerebral cortex. The Kydon vision system, however, is not presented in this paper. The retina-like processor consists of four major layers, where each of them is an array processor based on hexagonal, autonomous processing elements that perform a certain set of low level vision tasks, such as smoothing and light adaptation, edge detection, segmentation, line recognition and region-graph generation. At each layer, the array processor is a 2D array of k/spl times/m hexagonal identical autonomous cells that simultaneously execute certain low level vision tasks. Thus, the hardware design and the simulation at the transistor level of the processing elements (PEs) of the retina-like processor and its simulated functionality with illustrative examples are provided in this paper.

  8. Air-Lubricated Thermal Processor For Dry Silver Film

    Science.gov (United States)

    Siryj, B. W.

    1980-09-01

    Since dry silver film is processed by heat, it may be viewed on a light table only seconds after exposure. On the other hand, wet films require both bulky chemicals and substantial time before an image can be analyzed. Processing of dry silver film, although simple in concept, is not so simple when reduced to practice. The main concern is the effect of film temperature gradients on uniformity of optical film density. RCA has developed two thermal processors, different in implementation but based on the same philosophy. Pressurized air is directed to both sides of the film to support the film and to conduct the heat to the film. Porous graphite is used as the medium through which heat and air are introduced. The initial thermal processor was designed to process 9.5-inch-wide film moving at speeds ranging from 0.0034 to 0.008 inch per second. The processor configuration was curved to match the plane generated by the laser recording beam. The second thermal processor was configured to process 5-inch-wide film moving at a continuously variable rate ranging from 0.15 to 3.5 inches per second. Due to field flattening optics used in this laser recorder, the required film processing area was plane. In addition, this processor was sectioned in the direction of film motion, giving the processor the capability of varying both temperature and effective processing area.

  9. Exploring the sequential lineup advantage using WITNESS.

    Science.gov (United States)

    Goodsell, Charles A; Gronlund, Scott D; Carlson, Curt A

    2010-12-01

    Advocates claim that the sequential lineup is an improvement over simultaneous lineup procedures, but no formal (quantitatively specified) explanation exists for why it is better. The computational model WITNESS (Clark, Appl Cogn Psychol 17:629-654, 2003) was used to develop theoretical explanations for the sequential lineup advantage. In its current form, WITNESS produced a sequential advantage only by pairing conservative sequential choosing with liberal simultaneous choosing. However, this combination failed to approximate four extant experiments that exhibited large sequential advantages. Two of these experiments became the focus of our efforts because the data were uncontaminated by likely suspect position effects. Decision-based and memory-based modifications to WITNESS approximated the data and produced a sequential advantage. The next step is to evaluate the proposed explanations and modify public policy recommendations accordingly.

  10. Leveraging the checkpoint-restart technique for optimizing CPU efficiency of ATLAS production applications on opportunistic platforms

    CERN Document Server

    Cameron, David; The ATLAS collaboration

    2017-01-01

    Data processing applications of the ATLAS experiment, such as event simulation and reconstruction, spend considerable amount of time in the initialization phase. This phase includes loading a large number of shared libraries, reading detector geometry and condition data from external databases, building a transient representation of the detector geometry and initializing various algorithms and services. In some cases the initialization step can take as long as 10-15 minutes. Such slow initialization, being inherently serial, has a significant negative impact on overall CPU efficiency of the production job, especially when the job is executed on opportunistic, often short-lived, resources such as commercial clouds or volunteer computing. In order to improve this situation, we can take advantage of the fact that ATLAS runs large numbers of production jobs with similar configuration parameters (e.g. jobs within the same production task). This allows us to checkpoint one job at the end of its configuration step a...

  11. High-speed special-purpose processor for event selection by number of direct tracks

    International Nuclear Information System (INIS)

    Kalinnikov, V.A.; Krastev, V.R.; Chudakov, E.A.

    1986-01-01

    A processor which uses data on events from five detector planes is described. To increase economy and speed in parallel processing, the processor converts the input data to superposition code and recognizes tracks by a generated search mask. The resolving time of the processor is ≤300 nsec. The processor is CAMAC-compatible and uses ECL integrated circuits

  12. Sequential lineup presentation: Patterns and policy

    OpenAIRE

    Lindsay, R C L; Mansour, Jamal K; Beaudry, J L; Leach, A-M; Bertrand, M I

    2009-01-01

    Sequential lineups were offered as an alternative to the traditional simultaneous lineup. Sequential lineups reduce incorrect lineup selections; however, the accompanying loss of correct identifications has resulted in controversy regarding adoption of the technique. We discuss the procedure and research relevant to (1) the pattern of results found using sequential versus simultaneous lineups; (2) reasons (theory) for differences in witness responses; (3) two methodological issues; and (4) im...

  13. Event- and Time-Driven Techniques Using Parallel CPU-GPU Co-processing for Spiking Neural Networks.

    Science.gov (United States)

    Naveros, Francisco; Garrido, Jesus A; Carrillo, Richard R; Ros, Eduardo; Luque, Niceto R

    2017-01-01

    Modeling and simulating the neural structures which make up our central neural system is instrumental for deciphering the computational neural cues beneath. Higher levels of biological plausibility usually impose higher levels of complexity in mathematical modeling, from neural to behavioral levels. This paper focuses on overcoming the simulation problems (accuracy and performance) derived from using higher levels of mathematical complexity at a neural level. This study proposes different techniques for simulating neural models that hold incremental levels of mathematical complexity: leaky integrate-and-fire (LIF), adaptive exponential integrate-and-fire (AdEx), and Hodgkin-Huxley (HH) neural models (ranged from low to high neural complexity). The studied techniques are classified into two main families depending on how the neural-model dynamic evaluation is computed: the event-driven or the time-driven families. Whilst event-driven techniques pre-compile and store the neural dynamics within look-up tables, time-driven techniques compute the neural dynamics iteratively during the simulation time. We propose two modifications for the event-driven family: a look-up table recombination to better cope with the incremental neural complexity together with a better handling of the synchronous input activity. Regarding the time-driven family, we propose a modification in computing the neural dynamics: the bi-fixed-step integration method. This method automatically adjusts the simulation step size to better cope with the stiffness of the neural model dynamics running in CPU platforms. One version of this method is also implemented for hybrid CPU-GPU platforms. Finally, we analyze how the performance and accuracy of these modifications evolve with increasing levels of neural complexity. We also demonstrate how the proposed modifications which constitute the main contribution of this study systematically outperform the traditional event- and time-driven techniques under

  14. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    Science.gov (United States)

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted the graphic card with Graphic Processing Units (GPUs) and their associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, and only using the GPU capability to do the SW computations one by one. Hence, in this paper, we will propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search by using the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on GPU, a procedure is applied on CPU by using the frequency distance filtration scheme (FDFS) to eliminate the unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.

  15. The Bacterial Sequential Markov Coalescent.

    Science.gov (United States)

    De Maio, Nicola; Wilson, Daniel J

    2017-05-01

    Bacteria can exchange and acquire new genetic material from other organisms directly and via the environment. This process, known as bacterial recombination, has a strong impact on the evolution of bacteria, for example, leading to the spread of antibiotic resistance across clades and species, and to the avoidance of clonal interference. Recombination hinders phylogenetic and transmission inference because it creates patterns of substitutions (homoplasies) inconsistent with the hypothesis of a single evolutionary tree. Bacterial recombination is typically modeled as statistically akin to gene conversion in eukaryotes, i.e. , using the coalescent with gene conversion (CGC). However, this model can be very computationally demanding as it needs to account for the correlations of evolutionary histories of even distant loci. So, with the increasing popularity of whole genome sequencing, the need has emerged for a faster approach to model and simulate bacterial genome evolution. We present a new model that approximates the coalescent with gene conversion: the bacterial sequential Markov coalescent (BSMC). Our approach is based on a similar idea to the sequential Markov coalescent (SMC)-an approximation of the coalescent with crossover recombination. However, bacterial recombination poses hurdles to a sequential Markov approximation, as it leads to strong correlations and linkage disequilibrium across very distant sites in the genome. Our BSMC overcomes these difficulties, and shows a considerable reduction in computational demand compared to the exact CGC, and very similar patterns in simulated data. We implemented our BSMC model within new simulation software FastSimBac. In addition to the decreased computational demand compared to previous bacterial genome evolution simulators, FastSimBac provides more general options for evolutionary scenarios, allowing population structure with migration, speciation, population size changes, and recombination hotspots. FastSimBac is

  16. Biased lineups: sequential presentation reduces the problem.

    Science.gov (United States)

    Lindsay, R C; Lea, J A; Nosworthy, G J; Fulford, J A; Hector, J; LeVan, V; Seabrook, C

    1991-12-01

    Biased lineups have been shown to increase significantly false, but not correct, identification rates (Lindsay, Wallbridge, & Drennan, 1987; Lindsay & Wells, 1980; Malpass & Devine, 1981). Lindsay and Wells (1985) found that sequential lineup presentation reduced false identification rates, presumably by reducing reliance on relative judgment processes. Five staged-crime experiments were conducted to examine the effect of lineup biases and sequential presentation on eyewitness recognition accuracy. Sequential lineup presentation significantly reduced false identification rates from fair lineups as well as from lineups biased with regard to foil similarity, instructions, or witness attire, and from lineups biased in all of these ways. The results support recommendations that police present lineups sequentially.

  17. THOR Fields and Wave Processor - FWP

    Science.gov (United States)

    Soucek, Jan; Rothkaehl, Hanna; Ahlen, Lennart; Balikhin, Michael; Carr, Christopher; Dekkali, Moustapha; Khotyaintsev, Yuri; Lan, Radek; Magnes, Werner; Morawski, Marek; Nakamura, Rumi; Uhlir, Ludek; Yearby, Keith; Winkler, Marek; Zaslavsky, Arnaud

    2017-04-01

    If selected, Turbulence Heating ObserveR (THOR) will become the first spacecraft mission dedicated to the study of plasma turbulence. The Fields and Waves Processor (FWP) is an integrated electronics unit for all electromagnetic field measurements performed by THOR. FWP will interface with all THOR fields sensors: electric field antennas of the EFI instrument, the MAG fluxgate magnetometer, and search-coil magnetometer (SCM), and perform signal digitization and on-board data processing. FWP box will house multiple data acquisition sub-units and signal analyzers all sharing a common power supply and data processing unit and thus a single data and power interface to the spacecraft. Integrating all the electromagnetic field measurements in a single unit will improve the consistency of field measurement and accuracy of time synchronization. The scientific value of highly sensitive electric and magnetic field measurements in space has been demonstrated by Cluster (among other spacecraft) and THOR instrumentation will further improve on this heritage. Large dynamic range of the instruments will be complemented by a thorough electromagnetic cleanliness program, which will prevent perturbation of field measurements by interference from payload and platform subsystems. Taking advantage of the capabilities of modern electronics and the large telemetry bandwidth of THOR, FWP will provide multi-component electromagnetic field waveforms and spectral data products at a high time resolution. Fully synchronized sampling of many signals will allow to resolve wave phase information and estimate wavelength via interferometric correlations between EFI probes. FWP will also implement a plasma resonance sounder and a digital plasma quasi-thermal noise analyzer designed to provide high cadence measurements of plasma density and temperature complementary to data from particle instruments. FWP will rapidly transmit information about magnetic field vector and spacecraft potential to the

  18. Immediate Sequential Bilateral Cataract Surgery

    DEFF Research Database (Denmark)

    Kessel, Line; Andresen, Jens; Erngaard, Ditte

    2015-01-01

    The aim of the present systematic review was to examine the benefits and harms associated with immediate sequential bilateral cataract surgery (ISBCS) with specific emphasis on the rate of complications, postoperative anisometropia, and subjective visual function in order to formulate evidence......-based national Danish guidelines for cataract surgery. A systematic literature review in PubMed, Embase, and Cochrane central databases identified three randomized controlled trials that compared outcome in patients randomized to ISBCS or bilateral cataract surgery on two different dates. Meta-analyses were...... performed using the Cochrane Review Manager software. The quality of the evidence was assessed using the GRADE method (Grading of Recommendation, Assessment, Development, and Evaluation). We did not find any difference in the risk of complications or visual outcome in patients randomized to ISBCS or surgery...

  19. Random and cooperative sequential adsorption

    Science.gov (United States)

    Evans, J. W.

    1993-10-01

    Irreversible random sequential adsorption (RSA) on lattices, and continuum "car parking" analogues, have long received attention as models for reactions on polymer chains, chemisorption on single-crystal surfaces, adsorption in colloidal systems, and solid state transformations. Cooperative generalizations of these models (CSA) are sometimes more appropriate, and can exhibit richer kinetics and spatial structure, e.g., autocatalysis and clustering. The distribution of filled or transformed sites in RSA and CSA is not described by an equilibrium Gibbs measure. This is the case even for the saturation "jammed" state of models where the lattice or space cannot fill completely. However exact analysis is often possible in one dimension, and a variety of powerful analytic methods have been developed for higher dimensional models. Here we review the detailed understanding of asymptotic kinetics, spatial correlations, percolative structure, etc., which is emerging for these far-from-equilibrium processes.

  20. Digital signal processor for silicon audio playback devices; Silicon audio saisei kikiyo digital signal processor

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2000-03-01

    The digital audio signal processor (DSP) TC9446F series has been developed silicon audio playback devices with a memory medium of, e.g., flash memory, DVD players, and AV devices, e.g., TV sets. It corresponds to AAC (advanced audio coding) (2ch) and MP3 (MPEG1 Layer3), as the audio compressing techniques being used for transmitting music through an internet. It also corresponds to compressed types, e.g., Dolby Digital, DTS (digital theater system) and MPEG2 audio, being adopted for, e.g., DVDs. It can carry a built-in audio signal processing program, e.g., Dolby ProLogic, equalizer, sound field controlling, and 3D sound. TC9446XB has been lined up anew. It adopts an FBGA (fine pitch ball grid array) package for portable audio devices. (translated by NEDO)

  1. Multi­-Threaded Algorithms for General purpose Graphics Processor Units in the ATLAS High Level Trigger

    CERN Document Server

    Conde Mui\\~no, Patricia; The ATLAS collaboration

    2016-01-01

    General purpose Graphics Processor Units (GPGPU) are being evaluated for possible future inclusion in an upgraded ATLAS High Level Trigger farm. We have developed a demonstrator including GPGPU implementations of Inner Detector and Muon tracking and Calorimeter clustering within the ATLAS software framework. ATLAS is a general purpose particle physics experiment located on the LHC collider at CERN. The ATLAS Trigger system consists of two levels, with level 1 implemented in hardware and the High Level Trigger implemented in software running on a farm of commodity CPU. The High Level Trigger reduces the trigger rate from the 100 kHz level 1 acceptance rate to 1 kHz for recording, requiring an average per­-event processing time of ~250 ms for this task. The selection in the high level trigger is based on reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Calorimeter. Performing this reconstruction within the available farm resources presents a significant ...

  2. Processor core for real time background identification of HD video based on OpenCV Gaussian mixture model algorithm

    Science.gov (United States)

    Genovese, Mariangela; Napoli, Ettore

    2013-05-01

    The identification of moving objects is a fundamental step in computer vision processing chains. The development of low cost and lightweight smart cameras steadily increases the request of efficient and high performance circuits able to process high definition video in real time. The paper proposes two processor cores aimed to perform the real time background identification on High Definition (HD, 1920 1080 pixel) video streams. The implemented algorithm is the OpenCV version of the Gaussian Mixture Model (GMM), an high performance probabilistic algorithm for the segmentation of the background that is however computationally intensive and impossible to implement on general purpose CPU with the constraint of real time processing. In the proposed paper, the equations of the OpenCV GMM algorithm are optimized in such a way that a lightweight and low power implementation of the algorithm is obtained. The reported performances are also the result of the use of state of the art truncated binary multipliers and ROM compression techniques for the implementation of the non-linear functions. The first circuit has commercial FPGA devices as a target and provides speed and logic resource occupation that overcome previously proposed implementations. The second circuit is oriented to an ASIC (UMC-90nm) standard cell implementation. Both implementations are able to process more than 60 frames per second in 1080p format, a frame rate compatible with HD television.

  3. Sequential decision reliability concept and failure rate assessment

    International Nuclear Information System (INIS)

    Ciftcioglu, O.

    1990-11-01

    Conventionally, a reliability concept is considered together with both each basic unit and their integration in a complicated large scale system such as a nuclear power plant (NPP). Basically, as the plant's operational status is determined by the information obtained from various sensors, the plant's reliability and the risk assessment is closely related to the reliability of the sensory information and hence the sensor components. However, considering the relevant information-processing systems, e.g. fault detection processors, there exists a further question about the reliability of such systems, specifically the reliability of the systems' decision-based outcomes by means of which the further actions are performed. To this end, a general sequential decision reliability concept and the failure rate assessment methodology is introduced. The implications of the methodology are investigated and the importance of the decision reliability concept in system operation is demonstrated by means of sensory signals in real-time from the Borssele NPP in the Netherlands. (author). 21 refs.; 8 figs

  4. [Improving speech comprehension using a new cochlear implant speech processor].

    Science.gov (United States)

    Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

    2009-06-01

    The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise.In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg

  5. Accelerating 3D Elastic Wave Equations on Knights Landing based Intel Xeon Phi processors

    Science.gov (United States)

    Sourouri, Mohammed; Birger Raknes, Espen

    2017-04-01

    In advanced imaging methods like reverse-time migration (RTM) and full waveform inversion (FWI) the elastic wave equation (EWE) is numerically solved many times to create the seismic image or the elastic parameter model update. Thus, it is essential to optimize the solution time for solving the EWE as this will have a major impact on the total computational cost in running RTM or FWI. From a computational point of view applications implementing EWEs are associated with two major challenges. The first challenge is the amount of memory-bound computations involved, while the second challenge is the execution of such computations over very large datasets. So far, multi-core processors have not been able to tackle these two challenges, which eventually led to the adoption of accelerators such as Graphics Processing Units (GPUs). Compared to conventional CPUs, GPUs are densely populated with many floating-point units and fast memory, a type of architecture that has proven to map well to many scientific computations. Despite its architectural advantages, full-scale adoption of accelerators has yet to materialize. First, accelerators require a significant programming effort imposed by programming models such as CUDA or OpenCL. Second, accelerators come with a limited amount of memory, which also require explicit data transfers between the CPU and the accelerator over the slow PCI bus. The second generation of the Xeon Phi processor based on the Knights Landing (KNL) architecture, promises the computational capabilities of an accelerator but require the same programming effort as traditional multi-core processors. The high computational performance is realized through many integrated cores (number of cores and tiles and memory varies with the model) organized in tiles that are connected via a 2D mesh based interconnect. In contrary to accelerators, KNL is a self-hosted system, meaning explicit data transfers over the PCI bus are no longer required. However, like most

  6. Trial Sequential Methods for Meta-Analysis

    Science.gov (United States)

    Kulinskaya, Elena; Wood, John

    2014-01-01

    Statistical methods for sequential meta-analysis have applications also for the design of new trials. Existing methods are based on group sequential methods developed for single trials and start with the calculation of a required information size. This works satisfactorily within the framework of fixed effects meta-analysis, but conceptual…

  7. A programmable systolic trigger processor for FERA bus data

    International Nuclear Information System (INIS)

    Appelquist, G.; Hovander, B.; Sellden, B.; Bohm, C.

    1992-09-01

    A generic CAMAC based trigger processor module for fast processing of large amounts of ADC data, has been designed. This module has been realised using complex programmable gate arrays (LCAs from XILINX). The gate arrays have been connected to memories and multipliers in such a way that different gate array configurations can cover a wide range of module applications. Using this module, it is possible to construct complex trigger processors. The module uses both the fast ECL FERA bus and the CAMAC bus for inputs and outputs. The latter, however, is primarily used for set-up and control but may also be used for data output. Large numbers of ADCs can be served by a hierarchical arrangement of trigger processor modules, processing ADC data with pipe-line arithmetics producing the final result at the apex of the pyramid. The trigger decision will be transmitted to the data acquisition system via a logic signal while numeric results may be extracted by the CAMAC controller. The trigger processor was originally developed for the proposed neutral particle search experiment at CERN, NUMASS. There it was designed to serve as a second level trigger processor. It was required to correct all ADC raw data for efficiency and pedestal, calculate the total calorimeter energy, obtain the optimal time of flight data and calculate the particle mass. A suitable mass cut would then deliver the trigger decision. More complex triggers were also considered. (au)

  8. Low voltage 80 KV to 125 KV electron processors

    International Nuclear Information System (INIS)

    Lauppi, U.V.

    1999-01-01

    The classic electron beam technology made use of accelerating energies in the voltage range of 300 to 800 kV. The first EB processors - built for the curing of coatings - operated at 300 kV. The products to be treated were thicker than a simple layer of coating with thicknesses up to 100g and more. It was only in the beginning of the 1970's that industrial EB processors with accelerating voltages below 300 kV appeared on the market. Our company developed the first commercial electron accelerator without a beam scanner. The new EB machine featured a linear cathode, emitting a shower or 'curtain' of electrons over the full width of the product. These units were much smaller than anv previous EB processors and dedicated to the curing of coatings and other thin layers. ESI's first EB units operated with accelerating voltages between 150 and 200 kV. In 1993 ESI announced the introduction of a new generation of Electrocure. EB processors operating at 120 kV, and in 1998, at the RadTech North America '98 Conference in Chicago, the introduction of an 80 kV electron beam processor under the designation Microbeam LV

  9. Design of RISC Processor Using VHDL and Cadence

    Science.gov (United States)

    Moslehpour, Saeid; Puliroju, Chandrasekhar; Abu-Aisheh, Akram

    The project deals about development of a basic RISC processor. The processor is designed with basic architecture consisting of internal modules like clock generator, memory, program counter, instruction register, accumulator, arithmetic and logic unit and decoder. This processor is mainly used for simple general purpose like arithmetic operations and which can be further developed for general purpose processor by increasing the size of the instruction register. The processor is designed in VHDL by using Xilinx 8.1i version. The present project also serves as an application of the knowledge gained from past studies of the PSPICE program. The study will show how PSPICE can be used to simplify massive complex circuits designed in VHDL Synthesis. The purpose of the project is to explore the designed RISC model piece by piece, examine and understand the Input/ Output pins, and to show how the VHDL synthesis code can be converted to a simplified PSPICE model. The project will also serve as a collection of various research materials about the pieces of the circuit.

  10. Sequential lineup laps and eyewitness accuracy.

    Science.gov (United States)

    Steblay, Nancy K; Dietrich, Hannah L; Ryan, Shannon L; Raczynski, Jeanette L; James, Kali A

    2011-08-01

    Police practice of double-blind sequential lineups prompts a question about the efficacy of repeated viewings (laps) of the sequential lineup. Two laboratory experiments confirmed the presence of a sequential lap effect: an increase in witness lineup picks from first to second lap, when the culprit was a stranger. The second lap produced more errors than correct identifications. In Experiment 2, lineup diagnosticity was significantly higher for sequential lineup procedures that employed a single versus double laps. Witnesses who elected to view a second lap made significantly more errors than witnesses who chose to stop after one lap or those who were required to view two laps. Witnesses with prior exposure to the culprit did not exhibit a sequential lap effect.

  11. Multi-agent sequential hypothesis testing

    KAUST Repository

    Kim, Kwang-Ki K.

    2014-12-15

    This paper considers multi-agent sequential hypothesis testing and presents a framework for strategic learning in sequential games with explicit consideration of both temporal and spatial coordination. The associated Bayes risk functions explicitly incorporate costs of taking private/public measurements, costs of time-difference and disagreement in actions of agents, and costs of false declaration/choices in the sequential hypothesis testing. The corresponding sequential decision processes have well-defined value functions with respect to (a) the belief states for the case of conditional independent private noisy measurements that are also assumed to be independent identically distributed over time, and (b) the information states for the case of correlated private noisy measurements. A sequential investment game of strategic coordination and delay is also discussed as an application of the proposed strategic learning rules.

  12. Sequential Product of Quantum Effects: An Overview

    Science.gov (United States)

    Gudder, Stan

    2010-12-01

    This article presents an overview for the theory of sequential products of quantum effects. We first summarize some of the highlights of this relatively recent field of investigation and then provide some new results. We begin by discussing sequential effect algebras which are effect algebras endowed with a sequential product satisfying certain basic conditions. We then consider sequential products of (discrete) quantum measurements. We next treat transition effect matrices (TEMs) and their associated sequential product. A TEM is a matrix whose entries are effects and whose rows form quantum measurements. We show that TEMs can be employed for the study of quantum Markov chains. Finally, we prove some new results concerning TEMs and vector densities.

  13. An intercomparison of Canadian external dosimetry processors for radiation protection

    International Nuclear Information System (INIS)

    1989-10-01

    The five Canadian external dosimetry processors have participated in a two-stage intercomparison. The first stage involved dosimeters to known radiation fields under controlled laboratory conditions. The second stage involved exposing dosimeters to radiation fields in power reactor working environments. The results for each stage indicated the dose reported by each processor relative to an independently determined dose and relative to the others. The results of the intercomparisons confirm the original supposition: namely that the average differences in reported dose among five processors are much less than the uncertainty limits recommended by the ICRP. This report provides a description of the experimental methods as well as a discussion of the results for each stage. The report also includes a set of recommendations

  14. First Results of an “Artificial Retina” Processor Prototype

    International Nuclear Information System (INIS)

    Cenci, Riccardo; Bedeschi, Franco; Marino, Pietro; Morello, Michael J.; Ninci, Daniele; Piucci, Alessio; Punzi, Giovanni; Ristori, Luciano; Spinella, Franco; Stracka, Simone; Tonelli, Diego; Walsh, John

    2016-01-01

    We report on the performance of a specialized processor capable of reconstructing charged particle tracks in a realistic LHC silicon tracker detector, at the same speed of the readout and with sub-microsecond latency. The processor is based on an innovative pattern-recognition algorithm, called “artificial retina algorithm”, inspired from the vision system of mammals. A prototype of the processor has been designed, simulated, and implemented on Tel62 boards equipped with high-bandwidth Altera Stratix III FPGA devices. The prototype is the first step towards a real-time track reconstruction device aimed at processing complex events of high-luminosity LHC experiments at 40 MHz crossing rate

  15. Modal Processor Effects Inspired by Hammond Tonewheel Organs

    Directory of Open Access Journals (Sweden)

    Kurt James Werner

    2016-06-01

    Full Text Available In this design study, we introduce a novel class of digital audio effects that extend the recently introduced modal processor approach to artificial reverberation and effects processing. These pitch and distortion processing effects mimic the design and sonics of a classic additive-synthesis-based electromechanical musical instrument, the Hammond tonewheel organ. As a reverb effect, the modal processor simulates a room response as the sum of resonant filter responses. This architecture provides precise, interactive control over the frequency, damping, and complex amplitude of each mode. Into this framework, we introduce two types of processing effects: pitch effects inspired by the Hammond organ’s equal tempered “tonewheels”, “drawbar” tone controls, vibrato/chorus circuit, and distortion effects inspired by the pseudo-sinusoidal shape of its tonewheels and electromagnetic pickup distortion. The result is an effects processor that imprints the Hammond organ’s sonics onto any audio input.

  16. Safety-critical Java on a time-predictable processor

    DEFF Research Database (Denmark)

    Korsholm, Stephan E.; Schoeberl, Martin; Puffitsch, Wolfgang

    2015-01-01

    For real-time systems the whole execution stack needs to be time-predictable and analyzable for the worst-case execution time (WCET). This paper presents a time-predictable platform for safety-critical Java. The platform consists of (1) the Patmos processor, which is a time-predictable processor......; (2) a C compiler for Patmos with support for WCET analysis; (3) the HVM, which is a Java-to-C compiler; (4) the HVM-SCJ implementation which supports SCJ Level 0, 1, and 2 (for both single and multicore platforms); and (5) a WCET analysis tool. We show that real-time Java programs translated to C...... and compiled to a Patmos binary can be analyzed by the AbsInt aiT WCET analysis tool. To the best of our knowledge the presented system is the second WCET analyzable real-time Java system; and the first one on top of a RISC processor....

  17. Token-Aware Completion Functions for Elastic Processor Verification

    Directory of Open Access Journals (Sweden)

    Sudarshan K. Srinivasan

    2009-01-01

    Full Text Available We develop a formal verification procedure to check that elastic pipelined processor designs correctly implement their instruction set architecture (ISA specifications. The notion of correctness we use is based on refinement. Refinement proofs are based on refinement maps, which—in the context of this problem—are functions that map elastic processor states to states of the ISA specification model. Data flow in elastic architectures is complicated by the insertion of any number of buffers in any place in the design, making it hard to construct refinement maps for elastic systems in a systematic manner. We introduce token-aware completion functions, which incorporate a mechanism to track the flow of data in elastic pipelines, as a highly automated and systematic approach to construct refinement maps. We demonstrate the efficiency of the overall verification procedure based on token-aware completion functions using six elastic pipelined processor models based on the DLX architecture.

  18. Processor farming in two-level analysis of historical bridge

    Science.gov (United States)

    Krejčí, T.; Kruis, J.; Koudelka, T.; Šejnoha, M.

    2017-11-01

    This contribution presents a processor farming method in connection with a multi-scale analysis. In this method, each macro-scopic integration point or each finite element is connected with a certain meso-scopic problem represented by an appropriate representative volume element (RVE). The solution of a meso-scale problem provides then effective parameters needed on the macro-scale. Such an analysis is suitable for parallel computing because the meso-scale problems can be distributed among many processors. The application of the processor farming method to a real world masonry structure is illustrated by an analysis of Charles bridge in Prague. The three-dimensional numerical model simulates the coupled heat and moisture transfer of one half of arch No. 3. and it is a part of a complex hygro-thermo-mechanical analysis which has been developed to determine the influence of climatic loading on the current state of the bridge.

  19. A single chip pulse processor for nuclear spectroscopy

    International Nuclear Information System (INIS)

    Hilsenrath, F.; Bakke, J.C.; Voss, H.D.

    1985-01-01

    A high performance digital pulse processor, integrated into a single gate array microcircuit, has been developed for spaceflight applications. The new approach takes advantage of the latest CMOS high speed A/D flash converters and low-power gated logic arrays. The pulse processor measures pulse height, pulse area and the required timing information (e.g. multi detector coincidence and pulse pile-up detection). The pulse processor features high throughput rate (e.g. 0.5 Mhz for 2 usec gausssian pulses) and improved differential linearity (e.g. + or - 0.2 LSB for a + or - 1 LSB A/D). Because of the parallel digital architecture of the device, the interface is microprocessor bus compatible. A satellite flight application of this module is presented for use in the X-ray imager and high energy particle spectrometers of the PEM experiment on the Upper Atmospheric Research Satellite

  20. The ATLAS Level-1 Central Trigger Processor (CTP)

    CERN Document Server

    Spiwoks, Ralf; Ellis, Nick; Farthouat, P; Gällnö, P; Haller, J; Krasznahorkay, A; Maeno, T; Pauly, T; Pessoa-Lima, H; Resurreccion-Arcas, I; Schuler, G; De Seixas, J M; Torga-Teixeira, R; Wengler, T

    2005-01-01

    The ATLAS Level-1 Central Trigger Processor (CTP) combines information from calorimeter and muon trigger processors and makes the final Level-1 Accept (L1A) decision on the basis of lists of selection criteria (trigger menus). In addition to the event-selection decision, the CTP also provides trigger summary information to the Level-2 trigger and the data acquisition system. It further provides accumulated and bunch-by-bunch scaler data for monitoring of the trigger, detector and beam conditions. The CTP is presented and results are shown from tests with the calorimeter adn muon trigger processors connected to detectors in a particle beam, as well as from stand-alone full-system tests in the laboratory which were used to validate the CTP.

  1. Stepping motor control processor reference manual. Volume I

    International Nuclear Information System (INIS)

    Holloway, F.W.; VanArsdall, P.J.; Suski, G.J.; Gant, R.G.; Rash, M.

    1980-01-01

    This manual is intended to serve several purposes. The first goal is to describe the capabilities and operation of the SMC processor package from an operator or user point of view. Secondly, the manual will describe in some detail the basic hardware elements and how they can be used effectively to implement a step motor control system. Practical information on the use, installation and checkout of the hardware set is presented in the following sections along with programming suggestions. Available related system software is described in this manual for reference and as an aid in understanding the system architecture. Section two presents an overview and operations manual of the SMC processor describing its composition and functional capabilities. Section three contains hardware descriptions in some detail for the LLL-designed hardware used in the SMC processor. Basic theory of operation and important features are explained

  2. Embedded Processor Based Automatic Temperature Control of VLSI Chips

    Directory of Open Access Journals (Sweden)

    Narasimha Murthy Yayavaram

    2009-01-01

    Full Text Available This paper presents embedded processor based automatic temperature control of VLSI chips, using temperature sensor LM35 and ARM processor LPC2378. Due to the very high packing density, VLSI chips get heated very soon and if not cooled properly, the performance is very much affected. In the present work, the sensor which is kept very near proximity to the IC will sense the temperature and the speed of the fan arranged near to the IC is controlled based on the PWM signal generated by the ARM processor. A buzzer is also provided with the hardware, to indicate either the failure of the fan or overheating of the IC. The entire process is achieved by developing a suitable embedded C program.

  3. A Processor-Sharing Scheduling Strategy for NFV Nodes

    Directory of Open Access Journals (Sweden)

    Giuseppe Faraci

    2016-01-01

    Full Text Available The introduction of the two paradigms SDN and NFV to “softwarize” the current Internet is making management and resource allocation two key challenges in the evolution towards the Future Internet. In this context, this paper proposes Network-Aware Round Robin (NARR, a processor-sharing strategy, to reduce delays in traversing SDN/NFV nodes. The application of NARR alleviates the job of the Orchestrator by automatically working at the intranode level, dynamically assigning the processor slices to the virtual network functions (VNFs according to the state of the queues associated with the output links of the network interface cards (NICs. An extensive simulation set is presented to show the improvements achieved with respect to two more processor-sharing strategies chosen as reference.

  4. Multilevel sequential Monte Carlo samplers

    KAUST Repository

    Beskos, Alexandros; Jasra, Ajay; Law, Kody; Tempone, Raul; Zhou, Yan

    2016-01-01

    In this article we consider the approximation of expectations w.r.t. probability distributions associated to the solution of partial differential equations (PDEs); this scenario appears routinely in Bayesian inverse problems. In practice, one often has to solve the associated PDE numerically, using, for instance finite element methods which depend on the step-size level . hL. In addition, the expectation cannot be computed analytically and one often resorts to Monte Carlo methods. In the context of this problem, it is known that the introduction of the multilevel Monte Carlo (MLMC) method can reduce the amount of computational effort to estimate expectations, for a given level of error. This is achieved via a telescoping identity associated to a Monte Carlo approximation of a sequence of probability distributions with discretization levels . ∞>h0>h1⋯>hL. In many practical problems of interest, one cannot achieve an i.i.d. sampling of the associated sequence and a sequential Monte Carlo (SMC) version of the MLMC method is introduced to deal with this problem. It is shown that under appropriate assumptions, the attractive property of a reduction of the amount of computational effort to estimate expectations, for a given level of error, can be maintained within the SMC context. That is, relative to exact sampling and Monte Carlo for the distribution at the finest level . hL. The approach is numerically illustrated on a Bayesian inverse problem. © 2016 Elsevier B.V.

  5. Multilevel sequential Monte Carlo samplers

    KAUST Repository

    Beskos, Alexandros

    2016-08-29

    In this article we consider the approximation of expectations w.r.t. probability distributions associated to the solution of partial differential equations (PDEs); this scenario appears routinely in Bayesian inverse problems. In practice, one often has to solve the associated PDE numerically, using, for instance finite element methods which depend on the step-size level . hL. In addition, the expectation cannot be computed analytically and one often resorts to Monte Carlo methods. In the context of this problem, it is known that the introduction of the multilevel Monte Carlo (MLMC) method can reduce the amount of computational effort to estimate expectations, for a given level of error. This is achieved via a telescoping identity associated to a Monte Carlo approximation of a sequence of probability distributions with discretization levels . ∞>h0>h1⋯>hL. In many practical problems of interest, one cannot achieve an i.i.d. sampling of the associated sequence and a sequential Monte Carlo (SMC) version of the MLMC method is introduced to deal with this problem. It is shown that under appropriate assumptions, the attractive property of a reduction of the amount of computational effort to estimate expectations, for a given level of error, can be maintained within the SMC context. That is, relative to exact sampling and Monte Carlo for the distribution at the finest level . hL. The approach is numerically illustrated on a Bayesian inverse problem. © 2016 Elsevier B.V.

  6. Sequential Scintigraphy in Renal Transplantation

    Energy Technology Data Exchange (ETDEWEB)

    Winkel, K. zum; Harbst, H.; Schenck, P.; Franz, H. E.; Ritz, E.; Roehl, L.; Ziegler, M.; Ammann, W.; Maier-Borst, W. [Institut Fuer Nuklearmedizin, Deutsches Krebsforschungszentrum, Heidelberg, Federal Republic of Germany (Germany)

    1969-05-15

    Based on experience gained from more than 1600 patients with proved or suspected kidney diseases and on results on extended studies with dogs, sequential scintigraphy was performed after renal transplantation in dogs. After intravenous injection of 500 {mu}Ci. {sup 131}I-Hippuran scintiphotos were taken during the first minute with an exposure time of 15 sec each and thereafter with an exposure of 2 min up to at least 16 min.. Several examinations were evaluated digitally. 26 examinations were performed on 11 dogs with homotransplanted kidneys. Immediately after transplantation the renal function was almost normal arid the bladder was filled in due time. At the beginning of rejection the initial uptake of radioactive Hippuran was reduced. The intrarenal transport became delayed; probably the renal extraction rate decreased. Corresponding to the development of an oedema in the transplant the uptake area increased in size. In cases of thrombosis of the main artery there was no evidence of any uptake of radioactivity in the transplant. Similar results were obtained in 41 examinations on 15 persons. Patients with postoperative anuria due to acute tubular necrosis showed still some uptake of radioactivity contrary to those with thrombosis of the renal artery, where no uptake was found. In cases of rejection the most frequent signs were a reduced initial uptake and a delayed intrarenal transport of radioactive Hippuran. Infarction could be detected by a reduced uptake in distinct areas of the transplant. (author)

  7. Sequential provisional implant prosthodontics therapy.

    Science.gov (United States)

    Zinner, Ira D; Markovits, Stanley; Jansen, Curtis E; Reid, Patrick E; Schnader, Yale E; Shapiro, Herbert J

    2012-01-01

    The fabrication and long-term use of first- and second-stage provisional implant prostheses is critical to create a favorable prognosis for function and esthetics of a fixed-implant supported prosthesis. The fixed metal and acrylic resin cemented first-stage prosthesis, as reviewed in Part I, is needed for prevention of adjacent and opposing tooth movement, pressure on the implant site as well as protection to avoid micromovement of the freshly placed implant body. The second-stage prosthesis, reviewed in Part II, should be used following implant uncovering and abutment installation. The patient wears this provisional prosthesis until maturation of the bone and healing of soft tissues. The second-stage provisional prosthesis is also a fail-safe mechanism for possible early implant failures and also can be used with late failures and/or for the necessity to repair the definitive prosthesis. In addition, the screw-retained provisional prosthesis is used if and when an implant requires removal or other implants are to be placed as in a sequential approach. The creation and use of both first- and second-stage provisional prostheses involve a restorative dentist, dental technician, surgeon, and patient to work as a team. If the dentist alone cannot do diagnosis and treatment planning, surgery, and laboratory techniques, he or she needs help by employing the expertise of a surgeon and a laboratory technician. This team approach is essential for optimum results.

  8. Sequential pattern data mining and visualization

    Science.gov (United States)

    Wong, Pak Chung [Richland, WA; Jurrus, Elizabeth R [Kennewick, WA; Cowley, Wendy E [Benton City, WA; Foote, Harlan P [Richland, WA; Thomas, James J [Richland, WA

    2009-05-26

    One or more processors (22) are operated to extract a number of different event identifiers therefrom. These processors (22) are further operable to determine a number a display locations each representative of one of the different identifiers and a corresponding time. The display locations are grouped into sets each corresponding to a different one of several event sequences (330a, 330b, 330c. 330d, 330e). An output is generated corresponding to a visualization (320) of the event sequences (330a, 330b, 330c, 330d, 330e).

  9. Satellite on-board real-time SAR processor prototype

    Science.gov (United States)

    Bergeron, Alain; Doucet, Michel; Harnisch, Bernd; Suess, Martin; Marchese, Linda; Bourqui, Pascal; Desnoyers, Nicholas; Legros, Mathieu; Guillot, Ludovic; Mercier, Luc; Châteauneuf, François

    2017-11-01

    A Compact Real-Time Optronic SAR Processor has been successfully developed and tested up to a Technology Readiness Level of 4 (TRL4), the breadboard validation in a laboratory environment. SAR, or Synthetic Aperture Radar, is an active system allowing day and night imaging independent of the cloud coverage of the planet. The SAR raw data is a set of complex data for range and azimuth, which cannot be compressed. Specifically, for planetary missions and unmanned aerial vehicle (UAV) systems with limited communication data rates this is a clear disadvantage. SAR images are typically processed electronically applying dedicated Fourier transformations. This, however, can also be performed optically in real-time. Originally the first SAR images were optically processed. The optical Fourier processor architecture provides inherent parallel computing capabilities allowing real-time SAR data processing and thus the ability for compression and strongly reduced communication bandwidth requirements for the satellite. SAR signal return data are in general complex data. Both amplitude and phase must be combined optically in the SAR processor for each range and azimuth pixel. Amplitude and phase are generated by dedicated spatial light modulators and superimposed by an optical relay set-up. The spatial light modulators display the full complex raw data information over a two-dimensional format, one for the azimuth and one for the range. Since the entire signal history is displayed at once, the processor operates in parallel yielding real-time performances, i.e. without resulting bottleneck. Processing of both azimuth and range information is performed in a single pass. This paper focuses on the onboard capabilities of the compact optical SAR processor prototype that allows in-orbit processing of SAR images. Examples of processed ENVISAT ASAR images are presented. Various SAR processor parameters such as processing capabilities, image quality (point target analysis), weight and

  10. Benchmarking NWP Kernels on Multi- and Many-core Processors

    Science.gov (United States)

    Michalakes, J.; Vachharajani, M.

    2008-12-01

    Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost- performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine- grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc. (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.

  11. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    OpenAIRE

    Hristov Ivan; Goranov Goran; Hristova Radoslava

    2018-01-01

    We consider an interesting from computational point of view standing wave simulation by solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which explores both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy equivalent Intel architectures: 2× Xeon E5-2695 v2 processors, (code-named “Ivy Bridge-EP”) in the Hybrilit cluster, and Xeon Phi 7250 processor (code-named “Knights Landing” (KNL). The results show 2 times better per...

  12. The Danish real-time SAR processor: first results

    DEFF Research Database (Denmark)

    Dall, Jørgen; Jørgensen, Jørn Hjelm; Netterstrøm, Anders

    1993-01-01

    A real-time processor (RTP) for the Danish airborne Synthetic Aperture Radar (SAR) has been designed and constructed at the Electromagnetics Institute. The implementation was completed in mid 1992, and since then the RTP has been operated successfully on several test and demonstration flights....... The processor is capable of focusing the entire swath of the raw SAR data into full resolution, and depending on the choice made by the on-board operator, either a high resolution one-look zoom image or a spatially multilooked overview image is displayed. After a brief design review, the paper addresses various...

  13. Matrix preconditioning: a robust operation for optical linear algebra processors.

    Science.gov (United States)

    Ghosh, A; Paparao, P

    1987-07-15

    Analog electrooptical processors are best suited for applications demanding high computational throughput with tolerance for inaccuracies. Matrix preconditioning is one such application. Matrix preconditioning is a preprocessing step for reducing the condition number of a matrix and is used extensively with gradient algorithms for increasing the rate of convergence and improving the accuracy of the solution. In this paper, we describe a simple parallel algorithm for matrix preconditioning, which can be implemented efficiently on a pipelined optical linear algebra processor. From the results of our numerical experiments we show that the efficacy of the preconditioning algorithm is affected very little by the errors of the optical system.

  14. UNIBUS processor interface for a FASTBUS data acquisition system

    International Nuclear Information System (INIS)

    Larwill, M.; Lagerlund, T.D.; Barsotti, E.; Taff, L.M.; Franzen, J.

    1981-01-01

    Current work on a FASTBUS data acquisition system at Fermilab is described. The system will consist of three pieces of FASTBUS hardware: a UNIBUS processor interface (UPI), a dual-ported bulk memory, and a FASTBUS ''event builder'' (i.e., data acquisition processor). Primary efforts have been on specifying and constructing a UPI. The present specification includes capability for all basic FASTBUS operations, including list processing of consecutive FASTBUS operations. Some possible FASTBUS data acquisition system architectures employing the UPI are discussed along with some detailed specifications of the UPI itself

  15. Ring-array processor distribution topology for optical interconnects

    Science.gov (United States)

    Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.

    1992-01-01

    The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.

  16. Global synchronization of parallel processors using clock pulse width modulation

    Science.gov (United States)

    Chen, Dong; Ellavsky, Matthew R.; Franke, Ross L.; Gara, Alan; Gooding, Thomas M.; Haring, Rudolf A.; Jeanson, Mark J.; Kopcsay, Gerard V.; Liebsch, Thomas A.; Littrell, Daniel; Ohmacht, Martin; Reed, Don D.; Schenck, Brandon E.; Swetz, Richard A.

    2013-04-02

    A circuit generates a global clock signal with a pulse width modification to synchronize processors in a parallel computing system. The circuit may include a hardware module and a clock splitter. The hardware module may generate a clock signal and performs a pulse width modification on the clock signal. The pulse width modification changes a pulse width within a clock period in the clock signal. The clock splitter may distribute the pulse width modified clock signal to a plurality of processors in the parallel computing system.

  17. Post-silicon and runtime verification for modern processors

    CERN Document Server

    Wagner, Ilya

    2010-01-01

    The purpose of this book is to survey the state of the art and evolving directions in post-silicon and runtime verification. The authors start by giving an overview of the state of the art in verification, particularly current post-silicon methodologies in use in the industry, both for the domain of processor pipeline design and for memory subsystems. They then dive into the presentation of several new post-silicon verification solutions aimed at boosting the verification coverage of modern processors, dedicating several chapters to this topic. The presentation of runtime verification solution

  18. A VAX-FPS Loosely-Coupled Array of Processors

    International Nuclear Information System (INIS)

    Grosdidier, G.

    1987-03-01

    The main features of a VAX-FPS Loosely-Coupled Array of Processors (LCAP) set-up and the implementation of a High Energy Physics tracking program for off-line purposes will be described. This LCAP consists of a VAX 11/750 host and two FPS 64 bit attached processors. Before analyzing the performances of this LCAP, its characteristics will be outlined, especially from a user's point of vue, and will be briefly compared to those of the IBM-FPS LCAP

  19. Parallel Processor for 3D Recovery from Optical Flow

    Directory of Open Access Journals (Sweden)

    Jose Hugo Barron-Zambrano

    2009-01-01

    Full Text Available 3D recovery from motion has received a major effort in computer vision systems in the recent years. The main problem lies in the number of operations and memory accesses to be performed by the majority of the existing techniques when translated to hardware or software implementations. This paper proposes a parallel processor for 3D recovery from optical flow. Its main feature is the maximum reuse of data and the low number of clock cycles to calculate the optical flow, along with the precision with which 3D recovery is achieved. The results of the proposed architecture as well as those from processor synthesis are presented.

  20. FASTBUS Standard Routines implementation for Fermilab embedded processor boards

    International Nuclear Information System (INIS)

    Pangburn, J.; Patrick, J.; Kent, S.; Oleynik, G.; Pordes, R.; Votava, M.; Heyes, G.; Watson, W.A. III

    1992-10-01

    In collaboration with CEBAF, Fermilab's Online Support Department and the CDF experiment have produced a new implementation of the IEEE FASTBUS Standard Routines for two embedded processor FASTBUS boards: the Fermilab Smart Crate Controller (FSCC) and the FASTBUS Readout Controller (FRC). Features of this implementation include: portability (to other embedded processor boards), remote source-level debugging, high speed, optional generation of very high-speed code for readout applications, and built-in Sun RPC support for execution of FASTBUS transactions and lists over the network

  1. The associative memory system for the FTK processor at ATLAS

    CERN Document Server

    Magalotti, D; The ATLAS collaboration; Donati, S; Luciano, P; Piendibene, M; Giannetti, P; Lanza, A; Verzellesi, G; Sakellariou, Andreas; Billereau, W; Combe, J M

    2014-01-01

    In high energy physics experiments, the most interesting processes are very rare and hidden in an extremely large level of background. As the experiment complexity, accelerator backgrounds, and instantaneous luminosity increase, more effective and accurate data selection techniques are needed. The Fast TracKer processor (FTK) is a real time tracking processor designed for the ATLAS trigger upgrade. The FTK core is the Associative Memory system. It provides massive computing power to minimize the processing time of complex tracking algorithms executed online. This paper reports on the results and performance of a new prototype of Associative Memory system.

  2. Wavelength-encoded OCDMA system using opto-VLSI processors.

    Science.gov (United States)

    Aljada, Muhsen; Alameh, Kamal

    2007-07-01

    We propose and experimentally demonstrate a 2.5 Gbits/sper user wavelength-encoded optical code-division multiple-access encoder-decoder structure based on opto-VLSI processing. Each encoder and decoder is constructed using a single 1D opto-very-large-scale-integrated (VLSI) processor in conjunction with a fiber Bragg grating (FBG) array of different Bragg wavelengths. The FBG array spectrally and temporally slices the broadband input pulse into several components and the opto-VLSI processor generates codewords using digital phase holograms. System performance is measured in terms of the autocorrelation and cross-correlation functions as well as the eye diagram.

  3. A fast processor for di-lepton triggers

    CERN Document Server

    Kostarakis, P; Barsotti, E; Conetti, S; Cox, B; Enagonio, J; Haldeman, M; Haynes, W; Katsanevas, S; Kerns, C; Lebrun, P; Smith, H; Soszyniski, T; Stoffel, J; Treptow, K; Turkot, F; Wagner, R

    1981-01-01

    As a new application of the Fermilab ECL-CAMAC logic modules a fast trigger processor was developed for Fermilab experiment E-537, aiming to measure the higher mass di-muon production by antiprotons. The processor matches the hit information received from drift chambers and scintillation counters, to find candidate muon tracks and determine their directions and momenta. The tracks are then paired to compute an invariant mass: when the computed mass falls within the desired range, the event is accepted. The process is accomplished in times of 5 to 10 microseconds, while achieving a trigger rate reduction of up to a factor of ten. (5 refs).

  4. ARM Processor Based Embedded System for Remote Data Acquisition

    OpenAIRE

    Raj Kumar Tiwari; Santosh Kumar Agrahari

    2014-01-01

    The embedded systems are widely used for the data acquisition. The data acquired may be used for monitoring various activity of the system or it can be used to control the parts of the system. Accessing various signals with remote location has greater advantage for multisite operation or unmanned systems. The remote data acquisition used in this paper is based on ARM processor. The Cortex M3 processor used in this system has in-built Ethernet controller which facilitate to acquire the remote ...

  5. Digital control card based on digital signal processor

    International Nuclear Information System (INIS)

    Hou Shigang; Yin Zhiguo; Xia Le

    2008-01-01

    A digital control card based on digital signal processor was developed. Two Freescale DSP-56303 processors were utilized to achieve 3 channels proportional- integral-differential regulations. The card offers high flexibility for 100 MeV cyclotron RF system development. It was used as feedback controller in low level radio frequency control prototype, with the feedback gain parameters continuously adjustable. By using high precision analog to digital converter with 500 kHz sampling rate, a regulation bandwidth of 20 kHz was achieved. (authors)

  6. OLYMPUS system and development of its pre-processor

    International Nuclear Information System (INIS)

    Okamoto, Masao; Takeda, Tatsuoki; Tanaka, Masatoshi; Asai, Kiyoshi; Nakano, Koh.

    1977-08-01

    The OLYMPUS SYSTEM developed by K. V. Roverts et al. was converted and introduced in computer system FACOM 230/75 of the JAERI Computing Center. A pre-processor was also developed for the OLYMPUS SYSTEM. The OLYMPUS SYSTEM is very useful for development, standardization and exchange of programs in thermonuclear fusion research and plasma physics. The pre-processor developed by the present authors is not only essential for the JAERI OLYMPUS SYSTEM, but also useful in manipulation, creation and correction of program files. (auth.)

  7. Wavelength-encoded OCDMA system using opto-VLSI processors

    Science.gov (United States)

    Aljada, Muhsen; Alameh, Kamal

    2007-07-01

    We propose and experimentally demonstrate a 2.5 Gbits/sper user wavelength-encoded optical code-division multiple-access encoder-decoder structure based on opto-VLSI processing. Each encoder and decoder is constructed using a single 1D opto-very-large-scale-integrated (VLSI) processor in conjunction with a fiber Bragg grating (FBG) array of different Bragg wavelengths. The FBG array spectrally and temporally slices the broadband input pulse into several components and the opto-VLSI processor generates codewords using digital phase holograms. System performance is measured in terms of the autocorrelation and cross-correlation functions as well as the eye diagram.

  8. Reconfigurable lattice mesh designs for programmable photonic processors.

    Science.gov (United States)

    Pérez, Daniel; Gasulla, Ivana; Capmany, José; Soref, Richard A

    2016-05-30

    We propose and analyse two novel mesh design geometries for the implementation of tunable optical cores in programmable photonic processors. These geometries are the hexagonal and the triangular lattice. They are compared here to a previously proposed square mesh topology in terms of a series of figures of merit that account for metrics that are relevant to on-chip integration of the mesh. We find that that the hexagonal mesh is the most suitable option of the three considered for the implementation of the reconfigurable optical core in the programmable processor.

  9. Hardware Synchronization for Embedded Multi-Core Processors

    DEFF Research Database (Denmark)

    Stoif, Christian; Schoeberl, Martin; Liccardi, Benito

    2011-01-01

    Multi-core processors are about to conquer embedded systems — it is not the question of whether they are coming but how the architectures of the microcontrollers should look with respect to the strict requirements in the field. We present the step from one to multiple cores in this paper, establi......Multi-core processors are about to conquer embedded systems — it is not the question of whether they are coming but how the architectures of the microcontrollers should look with respect to the strict requirements in the field. We present the step from one to multiple cores in this paper...

  10. Accelerating the SCE-UA Global Optimization Method Based on Multi-Core CPU and Many-Core GPU

    Directory of Open Access Journals (Sweden)

    Guangyuan Kan

    2016-01-01

    Full Text Available The famous global optimization SCE-UA method, which has been widely used in the field of environmental model parameter calibration, is an effective and robust method. However, the SCE-UA method has a high computational load which prohibits the application of SCE-UA to high dimensional and complex problems. In recent years, the hardware of computer, such as multi-core CPUs and many-core GPUs, improves significantly. These much more powerful new hardware and their software ecosystems provide an opportunity to accelerate the SCE-UA method. In this paper, we proposed two parallel SCE-UA methods and implemented them on Intel multi-core CPU and NVIDIA many-core GPU by OpenMP and CUDA Fortran, respectively. The Griewank benchmark function was adopted in this paper to test and compare the performances of the serial and parallel SCE-UA methods. According to the results of the comparison, some useful advises were given to direct how to properly use the parallel SCE-UA methods.

  11. Hybrid GPU-CPU adaptive precision ray-triangle intersection tests for robust high-performance GPU dosimetry computations

    International Nuclear Information System (INIS)

    Perrotte, Lancelot; Bodin, Bruno; Chodorge, Laurent

    2011-01-01

    Before an intervention on a nuclear site, it is essential to study different scenarios to identify the less dangerous one for the operator. Therefore, it is mandatory to dispose of an efficient dosimetry simulation code with accurate results. One classical method in radiation protection is the straight-line attenuation method with build-up factors. In the case of 3D industrial scenes composed of meshes, the computation cost resides in the fast computation of all of the intersections between the rays and the triangles of the scene. Efficient GPU algorithms have already been proposed, that enable dosimetry calculation for a huge scene (800000 rays, 800000 triangles) in a fraction of second. But these algorithms are not robust: because of the rounding caused by floating-point arithmetic, the numerical results of the ray-triangle intersection tests can differ from the expected mathematical results. In worst case scenario, this can lead to a computed dose rate dramatically inferior to the real dose rate to which the operator is exposed. In this paper, we present a hybrid GPU-CPU algorithm to manage adaptive precision floating-point arithmetic. This algorithm allows robust ray-triangle intersection tests, with very small loss of performance (less than 5 % overhead), and without any need for scene-dependent tuning. (author)

  12. Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems

    Science.gov (United States)

    Downie, John D.; Goodman, Joseph W.

    1989-10-01

    The accuracy requirements of optical processors in adaptive optics systems are determined by estimating the required accuracy in a general optical linear algebra processor (OLAP) that results in a smaller average residual aberration than that achieved with a conventional electronic digital processor with some specific computation speed. Special attention is given to an error analysis of a general OLAP with regard to the residual aberration that is created in an adaptive mirror system by the inaccuracies of the processor, and to the effect of computational speed of an electronic processor on the correction. Results are presented on the ability of an OLAP to compete with a digital processor in various situations.

  13. Tradable permit allocations and sequential choice

    Energy Technology Data Exchange (ETDEWEB)

    MacKenzie, Ian A. [Centre for Economic Research, ETH Zuerich, Zurichbergstrasse 18, 8092 Zuerich (Switzerland)

    2011-01-15

    This paper investigates initial allocation choices in an international tradable pollution permit market. For two sovereign governments, we compare allocation choices that are either simultaneously or sequentially announced. We show sequential allocation announcements result in higher (lower) aggregate emissions when announcements are strategic substitutes (complements). Whether allocation announcements are strategic substitutes or complements depends on the relationship between the follower's damage function and governments' abatement costs. When the marginal damage function is relatively steep (flat), allocation announcements are strategic substitutes (complements). For quadratic abatement costs and damages, sequential announcements provide a higher level of aggregate emissions. (author)

  14. Sequential Generalized Transforms on Function Space

    Directory of Open Access Journals (Sweden)

    Jae Gil Choi

    2013-01-01

    Full Text Available We define two sequential transforms on a function space Ca,b[0,T] induced by generalized Brownian motion process. We then establish the existence of the sequential transforms for functionals in a Banach algebra of functionals on Ca,b[0,T]. We also establish that any one of these transforms acts like an inverse transform of the other transform. Finally, we give some remarks about certain relations between our sequential transforms and other well-known transforms on Ca,b[0,T].

  15. The Trigger Processor and Trigger Processor Algorithms for the ATLAS New Small Wheel Upgrade

    CERN Document Server

    Lazovich, Tomo; The ATLAS collaboration

    2015-01-01

    The ATLAS New Small Wheel (NSW) is an upgrade to the ATLAS muon endcap detectors that will be installed during the next long shutdown of the LHC. Comprising both MicroMegas (MMs) and small-strip Thin Gap Chambers (sTGCs), this system will drastically improve the performance of the muon system in a high cavern background environment. The NSW trigger, in particular, will significantly reduce the rate of fake triggers coming from track segments in the endcap not originating from the interaction point. We will present an overview of the trigger, the proposed sTGC and MM trigger algorithms, and the hardware implementation of the trigger. In particular, we will discuss both the heart of the trigger, an ATCA system with FPGA-based trigger processors (using the same hardware platform for both MM and sTGC triggers), as well as the full trigger electronics chain, including dedicated cards for transmission of data via GBT optical links. Finally, we will detail the challenges of ensuring that the trigger electronics can ...

  16. A Software Implementation of a Satellite Interface Message Processor.

    Science.gov (United States)

    Eastwood, Margaret A.; Eastwood, Lester F., Jr.

    A design for network control software for a computer network is described in which some nodes are linked by a communications satellite channel. It is assumed that the network has an ARPANET-like configuration; that is, that specialized processors at each node are responsible for message switching and network control. The purpose of the control…

  17. Optical linear algebra processors - Noise and error-source modeling

    Science.gov (United States)

    Casasent, D.; Ghosh, A.

    1985-01-01

    The modeling of system and component noise and error sources in optical linear algebra processors (OLAPs) are considered, with attention to the frequency-multiplexed OLAP. General expressions are obtained for the output produced as a function of various component errors and noise. A digital simulator for this model is discussed.

  18. Optical linear algebra processors: noise and error-source modeling.

    Science.gov (United States)

    Casasent, D; Ghosh, A

    1985-06-01

    The modeling of system and component noise and error sources in optical linear algebra processors (OLAP's) are considered, with attention to the frequency-multiplexed OLAP. General expressions are obtained for the output produced as a function of various component errors and noise. A digital simulator for this model is discussed.

  19. A post-processor for the PEST code

    International Nuclear Information System (INIS)

    Priesche, S.; Manickam, J.; Johnson, J.L.

    1992-01-01

    A new post-processor has been developed for use with output from the PEST tokamak stability code. It allows us to use quantities calculated by PEST and take better advantage of the physical picture of the plasma instability which they can provide. This will improve comparison with experimentally measured quantities as well as facilitate understanding of theoretical studies

  20. The Operational Semantics of a Java Secure Processor

    NARCIS (Netherlands)

    Hartel, Pieter H.; Butler, M.J.; Levy, M.; Alves-Foss, J.

    1999-01-01

    A formal specification of a Java Secure Processor is presented, which is mechanically checked for type consistency, well formedness and operational conservativity. The specification is executable and it is used to animate and study the behaviour of sample Java programs. The purpose of the semantics

  1. Analytic processor model for fast design-space exploration

    NARCIS (Netherlands)

    Jongerius, R.; Mariani, G.; Anghel, A.; Dittmann, G.; Vermij, E.; Corporaal, H.

    2015-01-01

    In this paper, we propose an analytic model that takes as inputs a) a parametric microarchitecture-independent characterization of the target workload, and b) a hardware configuration of the core and the memory hierarchy, and returns as output an estimation of processor-core performance. To validate

  2. Interactive high-resolution isosurface ray casting on multicore processors.

    Science.gov (United States)

    Wang, Qin; JaJa, Joseph

    2008-01-01

    We present a new method for the interactive rendering of isosurfaces using ray casting on multi-core processors. This method consists of a combination of an object-order traversal that coarsely identifies possible candidate 3D data blocks for each small set of contiguous pixels, and an isosurface ray casting strategy tailored for the resulting limited-size lists of candidate 3D data blocks. While static screen partitioning is widely used in the literature, our scheme performs dynamic allocation of groups of ray casting tasks to ensure almost equal loads among the different threads running on multi-cores while maintaining spatial locality. We also make careful use of memory management environment commonly present in multi-core processors. We test our system on a two-processor Clovertown platform, each consisting of a Quad-Core 1.86-GHz Intel Xeon Processor, for a number of widely different benchmarks. The detailed experimental results show that our system is efficient and scalable, and achieves high cache performance and excellent load balancing, resulting in an overall performance that is superior to any of the previous algorithms. In fact, we achieve an interactive isosurface rendering on a 1024(2) screen for all the datasets tested up to the maximum size of the main memory of our platform.

  3. Real-time trajectory optimization on parallel processors

    Science.gov (United States)

    Psiaki, Mark L.

    1993-01-01

    A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm that is suitable to do real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an INTEL iPSC/860 message passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on 3 example problems, the Goddard problem, the acceleration-limited, planar minimum-time to the origin problem, and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors. A 32-stage National Aerospace Plane problem required 2 hours when solved on 32 processors. A speed-up factor of 7.2 has been achieved by using 32-nodes instead of 1-node to solve a 64-stage Goddard problem.

  4. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

    Science.gov (United States)

    Manolakos, Elias S.

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332

  5. Dynamic overset grid communication on distributed memory parallel processors

    Science.gov (United States)

    Barszcz, Eric; Weeratunga, Sisira K.; Meakin, Robert L.

    1993-01-01

    A parallel distributed memory implementation of intergrid communication for dynamic overset grids is presented. Included are discussions of various options considered during development. Results are presented comparing an Intel iPSC/860 to a single processor Cray Y-MP. Results for grids in relative motion show the iPSC/860 implementation to be faster than the Cray implementation.

  6. Low-power analogue processor for Bonner sphere spectrometers

    International Nuclear Information System (INIS)

    Ciobanu, M.I.; Alevra, A.V.

    1998-01-01

    The electronic system proposed is compact, small-size (the dimensions of the prototype are 107 x 105 x 58 mm) and battery-powered. The whole detection system is portable and independent of the mains supply and is well shielded against external disturbances. Technical details of the analog processor are given. (M.D.)

  7. 50 CFR 648.6 - Dealer/processor permits.

    Science.gov (United States)

    2010-10-01

    ... of incorporation if the business is a corporation, and a copy of the partnership agreement and the names and addresses of all partners, if the business is a partnership, name of at-sea processor vessel... the fishing year to an applicant, unless the applicant fails to submit a completed application. An...

  8. The study of image processing of parallel digital signal processor

    International Nuclear Information System (INIS)

    Liu Jie

    2000-01-01

    The author analyzes the basic characteristic of parallel DSP (digital signal processor) TMS320C80 and proposes related optimized image algorithm and the parallel processing method based on parallel DSP. The realtime for many image processing can be achieved in this way

  9. An implementation of the SANE Virtual Processor using POSIX threads

    NARCIS (Netherlands)

    van Tol, M.W.; Jesshope, C.R.; Lankamp, M.; Polstra, S.

    2009-01-01

    The SANE Virtual Processor (SVP) is an abstract concurrent programming model that is both deadlock free and supports efficient implementation. It is captured by the μTC programming language. The work presented in this paper covers a portable implementation of this model as a C++ library on top of

  10. Sojourn time asymptotics in processor-sharing queues

    NARCIS (Netherlands)

    Borst, S.C.; Núñez Queija, R.; Zwart, B.

    2006-01-01

    Over the past few decades, the Processor-Sharing (PS) discipline has attracted a great deal of attention in the queueing literature. While the PS paradigm emerged in the sixties as an idealization of round-robin scheduling in time-shared computer systems, it has recently captured renewed interest as

  11. The impact of reneging in processor sharing queues

    NARCIS (Netherlands)

    Gromoll, H.C.; Robert, Ph.; Zwart, B.; Bakker, R.F.

    2006-01-01

    We investigate an overloaded processor sharing queue with renewal arrivals and generally distributed service times. Impatient customers may abandon the queue, or renege, before completing service. The random time representing a customer’s patience has a general distribution and may be dependent on

  12. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

    Science.gov (United States)

    Sharma, Anuj; Manolakos, Elias S

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.

  13. Sojourn times in finite-capacity processor-sharing queues

    NARCIS (Netherlands)

    Borst, S.C.; Boxma, O.J.; Hegde, N.

    2005-01-01

    Motivated by the need to develop simple parsimonious models for evaluating the performance of wireless data systems, we consider finite-capacity processor-sharing systems. For such systems, we analyze the sojourn time distribution, which presents a useful measure for the transfer delay of documents

  14. Word Processors: A Look at Four Popular Programs.

    Science.gov (United States)

    Press, Larry

    1980-01-01

    Described are types of programs used for processing text (editors, print formatters, and word processors), followed by the comparison of four word-processing packages: Auto Scribe, Electric Pencil, Magic Want and Word Star. With the exception of Auto Scribe, all programs reviewed are CP/M versions. (KC)

  15. The hardware track finder processor in CMS at CERN

    CERN Document Server

    Kluge, A

    1997-01-01

    The work covers the design of the Track Finder Processor in the high energy experiment CMS (Compact Muon Solenoid, planned for 2005) at CERN/Geneva. The task of this processor is to identify muons and measure their transverse momentum. The track finder processor makes it possible to determine the physical relevance of each high energetic collision and to forward only interesting data to the data an alysis units. Data of more than two hundred thousand detector cells are used to determine the location of muons and measure their transverse momentum. Each 25 ns a new data set is generated. Measurem ent of location and transverse momentum of the muons can be terminated within 350 ns by using an ASIC (Application Specific Integrated Circuit). A pipeline architecture processes new data sets with th e required data rate of 40 MHz to ensure dead time free operation. In the framework of this study specifications and the overall concept of the track finder processor were worked out in detail. Simul ations were performed...

  16. 21 CFR 864.3875 - Automated tissue processor.

    Science.gov (United States)

    2010-04-01

    ... 21 Food and Drugs 8 2010-04-01 2010-04-01 false Automated tissue processor. 864.3875 Section 864.3875 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES (CONTINUED) MEDICAL DEVICES HEMATOLOGY AND PATHOLOGY DEVICES Pathology Instrumentation and Accessories § 864.3875...

  17. High performance graphics processors for medical imaging applications

    International Nuclear Information System (INIS)

    Goldwasser, S.M.; Reynolds, R.A.; Talton, D.A.; Walsh, E.S.

    1989-01-01

    This paper describes a family of high- performance graphics processors with special hardware for interactive visualization of 3D human anatomy. The basic architecture expands to multiple parallel processors, each processor using pipelined arithmetic and logical units for high-speed rendering of Computed Tomography (CT), Magnetic Resonance (MR) and Positron Emission Tomography (PET) data. User-selectable display alternatives include multiple 2D axial slices, reformatted images in sagittal or coronal planes and shaded 3D views. Special facilities support applications requiring color-coded display of multiple datasets (such as radiation therapy planning), or dynamic replay of time- varying volumetric data (such as cine-CT or gated MR studies of the beating heart). The current implementation is a single processor system which generates reformatted images in true real time (30 frames per second), and shaded 3D views in a few seconds per frame. It accepts full scale medical datasets in their native formats, so that minimal preprocessing delay exists between data acquisition and display

  18. Single particle irradiation effect of digital signal processor

    International Nuclear Information System (INIS)

    Fan Si'an; Chen Kenan

    2010-01-01

    The single particle irradiation effect of high energy neutron on digital signal processor TMS320P25 in dynamic working condition has been studied. The influence of the single particle on the device has been explored through the acquired waveform and working current of TMS320P25. Analysis results, test data and test methods have also been presented. (authors)

  19. Evaluation of the Intel Westmere-EP server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2010-01-01

    In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing the 6-core “Westmere-EP” processor with Intel’s previous generation of the same microarchitecture, the “Nehalem-EP”. The former is produced in a new 32nm process, the latter in 45nm. Both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores via Simultaneous Multi-Threading (SMT), the cache sizes available, the memory configuration installed, as well...

  20. Evaluation of the Intel Nehalem-EX server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2010-01-01

    In this paper we report on a set of benchmark results recently obtained by the CERN openlab by comparing the 4-socket, 32-core Intel Xeon X7560 server with the previous generation 4-socket server, based on the Xeon X7460 processor. The Xeon X7560 processor represents a major change in many respects, especially the memory sub-system, so it was important to make multiple comparisons. In most benchmarks the two 4-socket servers were compared. It should be underlined that both servers represent the “top of the line” in terms of frequency. However, in some cases, it was important to compare systems that integrated the latest processor features, such as QPI links, Symmetric multithreading and over-clocking via Turbo mode, and in such situations the X7560 server was compared to a dual socket L5520 based system with an identical frequency of 2.26 GHz. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following ...